Technorati Blog


Technorati is experiencing a problem with our search result updating infrastructure. We continue to crawl and save data; however, post search results are stale and temporarily stuck at about 3pm Pacific on Fri. Aug 15. Link results (reactions) are stuck at Thu. Aug 14.

We have identified the root cause and are actively working on the issue. We expect to have the system caught up during the evening hours.

No data is being lost, but the most recent posts and reactions are not reflected in results at the moment.

UPDATE:

We have restored our post and tag search results. Link (reactions) results are catching up. We expect the system to be fully restored late this evening.

I was in Chicago last week to participate in ad-tech. The content and speakers struck me as particularly good this time around, with a major focus on social media.

The media shift of the past few years is fundamental – you can’t overestimate this – and it’s critical that brands adapt to life in this new environment. There was definitely an air of urgency on the part of everyone present to figure it all out.

Overwhelmingly, the two main themes I heard were:

Brands need to be part of or at least adjacent to the conversation

Brands need to go where their audiences are versus trying to bring audiences to them

A few highlights and how-tos from the sessions I attended:

The six drivers of brand credibility in social media environments*

  • Trust
  • Authenticity
  • Transparency
  • Affirmation
  • Listening
  • Responsiveness
The commitment needs to permeate the entire company, not just the marketing organization.


The conversation is less about brands and more about the issues and topics that surround brands, or that are passion points for the audiences of those brands.

Every brand is different: You might need to blog, you might need to listen and interact or you might simply need to be present alongside the conversation.

Speaking of execution:

The microsite was declared dead. Rising up in its place are media that function as the microsite, but do it one better by putting that content and interactivity where your audiences ARE: conversational ads and channels, widgets.

Even the most universally loved brands have their critics. Look at this new era not as a problem to solve but as an unprecedented opportunity to truly know what people think about you, and to engage with them.

The long tail is where you find influence. Even if a blogger has a relatively small number of followers, the level of influence and trust is exponentially higher than with large, mainstream media.

And finally, don’t wait for a crisis to get started. The case studies are there: conversational strategies are working.


“We’re not serving them dinner anymore, we’re at the dinner party.”
- Richard Binhammer, Dell, Inc

*Pete Blackshaw, EVP of Nielsen Online

Technorati is bringing you that much closer to attending Web 2.0 Expo NYC next month – we’ve got free tickets to give away! As a media sponsor for the event, Technorati has complimentary promotional tickets for the conference taking place Sept. 16-19 at the Javits Center in NYC.

For your chance to snag a ticket, email WebExNY@technorati.com by August 12, 2008. You’ll be entered into the drawing, and notified by August 14.

Good luck!

It’s been a while – we’ve had our heads down focused on building the business, so we’ve been a little quiet lately. I wanted to bring things up to date with what’s new today as well as fill you in on our core search business.

So we’re launching an ad network…

Why? Technorati was founded to help bloggers succeed and to bring audiences to blog content. Given our unique position of running a blog search engine, an ad network geared towards helping blog and social media publishers at every level to make some money just made sense.

We’ve been successful attracting premium brand advertisers to Technorati.com – and we’d like to extend those relationships for bloggers, as well as give our advertisers the deeper reach into blogging and social media they’ve been asking for.

Our first step was a private beta. We assembled a core of like-minded sites, founded to provide community and services to bloggers and to surface the best of blog content to consumers, and were successful in attracting advertisers to the network including: T-Mobile, Toyota, and Verizon.

These sites form the base of the Technorati network’s vertical content channels and reach an audience of 17 million (with that audience increasing very shortly with several other sites about to sign). Over the next several months, we’ll be adding blogs from the mid and long tail within those verticals. Here’s some of who’s in so far:

blogtalkradio
Blogcritics
blogcatalog
BlogTV
GeekAlerts
GPSMagazine
NerdApproved
Technabob


That doesn’t mean we’re moving away from our core. We’ve organized the company into two operating groups – the network and Technorati.com. Blog search is still and will always be the foundation of everything we do.

In our biggest internal initiative, we’re in the midst of a summer-long project to completely rewrite our crawler and search engine. Last week, Dorion addressed some of our recent challenges and fixes. An updated search infrastructure should address the vast majority of the complaints we receive, greatly reduce spam and give everyone a faster, more efficient utility. You’ll also see significant upgrades to blog claiming and Technorati Authority. Our product team has also spent a lot of this year getting feedback directly from the blogging community and has incorporated it into the development of our widgets – as we roll them out, a lot of you will recognize what you see.

You’ll see some new features designed for our readers as well, but I’ll leave this for a future update.

We strive to provide a great user experience, and that includes fast page load times. Last summer we worked very hard on this effort, and for many months now we have been able to achieve this goal.

Well, I'm disappointed that I have to tell you what you probably already know: we have stumbled a bit over the past two weeks. Page load times have been on the rise over the last week and, in a couple of instances, the site has been nearly unusable. We are working extremely hard to resolve the underlying problems, but I thought it important to let you all know that we are keenly aware of the problem and, like you, want the site to be screaming fast.

What happened?

A high volume of automated processes (AKA "bots") access our service for various purposes, many of them nefarious, exhibiting onerous behaviors and often masquerading as human users. These bots can, at times, impact the stability and performance of the service.

What are we doing about it?

We have an ongoing effort to reduce the impact of bots. At times we've had to throttle certain activities, particularly around feed and API requests. We continue to upgrade and add new hardware as load dictates. We are also configuring new detection and prevention mechanisms to help ensure that serving real end-user requests remains our top priority.
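For readers curious what throttling can look like in practice, here is a minimal sketch of a per-client token bucket. It is an illustration only, not a description of Technorati's actual defenses; the class names, the rate and capacity parameters, and the idea of keying clients by API key or IP address are all assumptions made for the example.

```python
import time


class TokenBucket:
    """Allow roughly `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class Throttle:
    """Keep one bucket per client key (hypothetically an API key or IP address)."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets = {}

    def allow(self, client):
        bucket = self.buckets.get(client)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self.buckets[client] = bucket
        return bucket.allow()
```

A caller would check throttle.allow(client_key) before serving a feed or API request and reject or delay the request when it returns False. Because each client has its own bucket, one aggressive bot running dry does not slow anyone else down.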

When will this all be done?

Several defenses have already gone live over the past week and these additions have resulted in a significant reduction in backend resource consumption and have stabilized parts of the overall system.

We constantly monitor the system and, as of this writing, have been able to cool things down again very close to our desired response time levels.

We appreciate your patience and understand that many of you have come to rely on our services in your daily use of the Internet. As I stated before, we strive to provide a great user experience. I want to thank the dedicated team I work with who is getting us through this difficult time. I hope you too can thank them when we achieve our goal once again.

UPDATE:

Well, we've had over 15 hours of excellent response times from the system. We have addressed the underlying capacity shortage and things have returned to normal. Thank you again for your patience.

Nowhere have we seen a bigger impact of blogging and social media on the American political landscape than on the 2008 presidential election. Candidate appearances formerly confined to a small town are uploaded to YouTube and seen by millions. Conversations once shared by small groups spread instantly and globally. Facebook and MySpace are as important as New Hampshire and Iowa.

According to Yahoo, 51% of internet users will turn to blogs to gather information and communicate about politics. Citizen journalists are the ones posting the stories that break through the campaigning and ask the hard questions.

Authenticity is what plays with this audience. Spread misinformation or spin, and more than 30,000 political blogs (tagged politics in the Technorati index) are ready to call foul.

There's a brilliant application of Technorati data over at Tech President. (Disclosure: co-founder Micah Sifry's brother David Sifry is Technorati's founder.)

View Technorati election data profiles.

Taking a pulse of the blogosphere today, what do the numbers tell us? (Keep in mind that Technorati is indexing in real time, so the numbers can vary even by a few minutes.)

Barack Obama has pulled into the lead – in terms of attention in the blogosphere. If 2008 is truly the social media election, as has been posited, all signs point to yes. As the only Republican candidate, should John McCain be benefiting from more focused attention – or will he rebound once the Democrats have picked a candidate?

Simple and telling: the tag cloud on Technorati's politics page.

Technorati Tag Cloud

Hillary Clinton

English posts that contain Hillary Clinton per day for the last 30 days.
Technorati Chart

Barack Obama

English posts that contain Obama per day for the last 30 days.
Technorati Chart

John McCain

English posts that contain McCain per day for the last 30 days.
Technorati Chart

D'Technology Blog posted today


So you’ve installed WordPress 2.5, now you want to show Technorati links on the dashboard. Here’s the code...


read the rest

If you're stuck on an old release because you didn't want to lose those inbound links in the administrative console, you're now free to move up to 2.5. Because of the widespread hacking of legacy WordPress installations, we strongly urge you to upgrade ASAP.

We're seeing thousands of blogs per day that we're not indexing because they're bearing symptoms of being compromised (see the previous post on the matter). If you're not using version 2.3.3 or 2.5, you must upgrade to protect yourself (perhaps 2.0.11 and 2.1.3 each fix this issue too; I'm looking for confirmation on that).

This is a follow up on our post regarding a problem affecting thousands of WordPress blogs, Patch or Upgrade Your Wordpress Installation, Now. WordPress has since released version 2.5. However, we've noticed that a large number of blogs remain vulnerable to the security issue addressed by the 2.3.3 release.

Blogs that have been compromised by this security vulnerability are typified by having links to spam destinations inserted onto the blog page. These link insertions may escape casual observation; the links are often obscured by style attributes that render them invisible. These links are still seen by crawlers such as Technorati's, Google's and Yahoo's. You can find these links by viewing the source of the blog pages or, when using Firefox, looking under "Tools" -> "Page Info" -> "Links". Blogs hosted on wordpress.com are not affected by this issue; only blogs hosted on their own installations of WordPress from wordpress.org require concern.
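If you'd rather check a page programmatically than by eye, here is a rough sketch using only Python's standard library. It is a heuristic for illustration and not the check Technorati's crawler performs: it only looks for two inline style patterns (display:none and visibility:hidden), it assumes reasonably well-nested markup, and the function and class names are made up for this example. Spammers use many other hiding tricks that it will miss.

```python
from html.parser import HTMLParser
import re

# Heuristic patterns only; external stylesheets, off-screen positioning,
# tiny fonts and similar tricks are not covered.
HIDDEN_STYLE = re.compile(r"display\s*:\s*none|visibility\s*:\s*hidden", re.I)

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class HiddenLinkFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []         # one bool per open element: hidden by inline style?
        self.hidden_links = []  # href values of links inside hidden elements

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        hidden = bool(HIDDEN_STYLE.search(attrs.get("style") or ""))
        inside_hidden = hidden or any(self.stack)
        if tag == "a" and inside_hidden and attrs.get("href"):
            self.hidden_links.append(attrs["href"])
        if tag not in VOID_TAGS:
            self.stack.append(hidden)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()


def find_hidden_links(html):
    """Return hrefs of links that sit inside an element hidden by inline style."""
    parser = HiddenLinkFinder()
    parser.feed(html)
    parser.close()
    return parser.hidden_links
```

Feed it the source of your blog's front page; any URL it returns that you did not put there yourself deserves a closer look. An empty result does not prove the page is clean.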

Because of this ongoing problem, we're discontinuing processing crawls of blogs that exhibit common symptoms of being compromised. We strongly recommend upgrading your WordPress installation, even if you don't believe you have been compromised: by the time you become aware of a compromise, a number of negative consequences may have already occurred (for instance, being flagged as spam by Technorati, Google or Yahoo!). This has been reported by many WordPress users.

If you have questions about installing WordPress or maintaining a WordPress installation, please refer to the WordPress Documentation or the WordPress Forums. If you feel that your blog is not vulnerable to this hack but your WordPress blog is not being updated, please contact Technorati support staff.

Technorati has seen a number of blogs exploited by a recently announced WordPress vulnerability. The fix for it is simple: upgrade your installation or patch it. If you're running a WordPress installation, please read about the WordPress 2.3.3 release to review your options.

Sorry about the goofy title; I'm in grave need of levity now due to some indexing troubles we had this past week and the ensuing recovery effort. We're currently in the midst of repairing most of the affected data, but I wanted to share what's going on with it.

Technorati's spiders were shut down for several hours on Thursday and at various intervals since then while we investigated a number of anomalies that were appearing in our data; essentially, a small percentage of recently created blogs were having their data scrambled. An example of this appears in this blog post. The spidering outages allowed us time to investigate, diagnose and make corrections that prevented further data corruption. We started running some corrective measures on Friday but found over the weekend that they were only partially effective. Technorati handles a large volume of data every day; isolating and devising remedies for these kinds of issues that affect a small percentage of the data flow is tricky. However, we think we're recovering now and the backlog of data processing is getting worked through.

Just to peek into the works a little bit, many distributed data systems rely on centrally dispensing identifiers for data elements, and Technorati has such a beast. What we found were cases of blogs new to our system (from within the last 3 weeks) losing their identifiers and those identifiers getting re-associated with other new blogs. No blogs that existed in our system before Dec. 18th (the vast majority) were impacted at all. The visible outward manifestations were mingled posts for blogs with a shared ID (a mashup the authors naturally were unhappy with) and mis-associated blog claims ("And you may tell yourself, this is not my beautiful blog").

This was an unprecedented case for us: it had been occurring in about 8% of those blogs (created on or after December 18) for about 2 days (beginning on Tuesday, January 8th), and until that time we had never encountered this phenomenon. An intensive investigation was launched, reconstructing operational timelines and correlating facts. What we found was that this stemmed from a failure incident with the primary system for identifier dispensing, another failure in the secondary system that took its place, and then a corrupted data set mistakenly taking over that one. Ouch! The first two blows appeared to be handled routinely but the third time was cursed; propagation of corrupted data was not detected for about 48 hours between Tuesday, when it started, and Thursday, when we pulled the emergency brakes on the spiders.
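To make the symptom concrete, here is a small sketch of the kind of consistency check that can flag an identifier held by more than one blog. It is purely illustrative and not our production code: the function name and the shape of the input (pairs of blog URL and identifier) are assumptions made for the example.

```python
from collections import defaultdict


def find_shared_ids(assignments):
    """Return {identifier: sorted blog URLs} for identifiers held by more than one blog.

    `assignments` is an iterable of (blog_url, identifier) pairs. The same
    pair appearing twice is not a conflict; two different blogs with the
    same identifier is.
    """
    blogs_by_id = defaultdict(set)
    for blog_url, identifier in assignments:
        blogs_by_id[identifier].add(blog_url)
    return {
        identifier: sorted(urls)
        for identifier, urls in blogs_by_id.items()
        if len(urls) > 1
    }
```

Run periodically over recently created records, a check along these lines would surface a shared identifier as soon as a second blog picks it up, rather than waiting for mingled posts to be noticed.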

So we're recovering now: most of the data is being restored to its previous state, and we have had a number of internal postmortem discussions about earlier fault detection and recovery. If your blog was created in our system within the prior three weeks (since December 18th) and you're seeing aberrant data associated with it or it's no longer there (try http://technorati.com/blogs/YOUR_BLOG_URL to check), please visit the support request page. A selection for 'The January 8th System Outage' will be available this month while we shake out any remaining issues that aren't covered by the remedial action under way now.
