The WSJ has just reviewed The Numerati by Stephen Baker, and I have already ordered my copy. How could I resist, when we’re mining the blogosphere for sentiment and are about to test our own home-grown splog detector? Check out this section of the review:
The Numerati are even mining the output of bloggers, those stream-of-consciousness online diarists and self-promoters. “What makes the blog world especially valuable to marketers,” Mr. Baker writes, is “its unfiltered immediacy.” What do consumers think of your new product? What desires are still not satisfied by products of this kind? You can commission a poll or wait for the sales figures to come in . . . or you can read the blogs. Better yet, you can hire Numerati to write programs that will read them for you, since there are now more than 20 million bloggers in the U.S. alone.
…But Adsense has set in motion an ugly arms race online as robot bloggers — clever computer programs — have generated hundreds of thousands of spam blogs, or “splogs.”
A splog, though unreadable, is seeded with words that will attract Google ads. A computer-user may be annoyed at finding himself staring at a screen full of gibberish but click on an ad anyway, allowing the robot blogger to harvest revenue. This sleight of hand has the Numerati hard at work getting their software to distinguish between a blog and a splog. Mr. Baker gives a helpful sketch of the math involved, each blog reduced to a vector in a space of several dozen dimensions.
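To make the "blog as a vector" idea concrete, here is a minimal sketch. It is not Baker's method and not our detector. It assumes four invented features (ad-keyword density, vocabulary diversity, share of the single most frequent word, and link density) instead of several dozen. It also uses a simple nearest-centroid rule in place of whatever classifier the Numerati actually use. The keyword list and the training snippets are hypothetical toy data.

```python
import math
import re
from collections import Counter

# Hypothetical ad-bait vocabulary; a real detector would learn this from data.
AD_TERMS = {"mortgage", "insurance", "loan", "casino", "credit"}


def features(text: str) -> list[float]:
    """Reduce a post to a small feature vector (illustrative features only)."""
    words = re.findall(r"[a-z']+", text.lower())
    n = len(words) or 1  # avoid division by zero on empty posts
    counts = Counter(words)
    ad_density = sum(counts[t] for t in AD_TERMS) / n
    type_token = len(counts) / n  # vocabulary diversity
    top_share = counts.most_common(1)[0][1] / n if counts else 0.0
    link_density = text.count("http") / n
    return [ad_density, type_token, top_share, link_density]


def centroid(vectors: list[list[float]]) -> list[float]:
    """Average the vectors dimension by dimension."""
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(len(vectors[0]))]


def classify(text: str, blog_c: list[float], splog_c: list[float]) -> str:
    """Label a post by whichever class centroid is closer in Euclidean distance."""
    v = features(text)
    return "splog" if math.dist(v, splog_c) < math.dist(v, blog_c) else "blog"


# Toy training data (hypothetical examples, not real blogs or splogs).
BLOGS = [
    "I spent the weekend hiking with my kids and we found a hidden waterfall near the old mill.",
    "Finally finished reading that novel; the ending surprised me more than I expected.",
]
SPLOGS = [
    "cheap loan cheap loan mortgage insurance loan casino credit loan http://x http://y",
    "casino casino credit mortgage loan insurance casino http://z",
]
BLOG_CENTROID = centroid([features(t) for t in BLOGS])
SPLOG_CENTROID = centroid([features(t) for t in SPLOGS])

if __name__ == "__main__":
    for post in [
        "Best loan mortgage loan insurance casino credit http://a",
        "We cooked dinner together and talked about the trip to the coast.",
    ]:
        print(classify(post, BLOG_CENTROID, SPLOG_CENTROID), "|", post)
```

With only two examples per class this is a toy. The point is that once every post becomes a point in feature space, "is this a splog?" turns into a geometry question about which cluster the point sits closer to.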
