
Crawl Packages: Aggregate Website Data in a Few Clicks

We’re excited to announce a new service at 80legs: Crawl Packages.

What crawl packages are:

Crawl packages are pre-configured crawls that you can access and run in just a few clicks.

For a specific website or group of websites, we’ve designed and set up an 80legs crawl, along with custom data extractors, to crawl that site and extract all the interesting information from it.  These are crawls you could have set up yourself, but we’ve gone ahead and done all the work for you.

Types of crawl packages available:

We’re currently offering crawl packages for social networks, retail/shopping sites and business directories.  We’ll be expanding our offerings to include other websites as well.  Initial plans include crawling blogs (and their comments), semantic annotation feeds of various websites, and so on.

Results & Pricing:

Most crawl packages will cost $350 per month and produce 10 – 20 million records per month.  The type of records produced depends on the crawl package.  Social network packages produce publicly-available profiles, retail packages produce product listings, etc.
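To put those numbers in perspective, here is a rough back-of-the-envelope sketch of what a typical package works out to per million records. It uses only the price and record range quoted above; the function and variable names are ours, made up for illustration, and actual volumes will vary by package.

```python
# Figures quoted in the announcement above.
MONTHLY_PRICE_USD = 350
RECORDS_PER_MONTH_LOW = 10_000_000
RECORDS_PER_MONTH_HIGH = 20_000_000


def cost_per_million(price_usd: float, records: int) -> float:
    """Return the price paid per one million records (hypothetical helper)."""
    return price_usd / (records / 1_000_000)


# More records for the same price means a lower cost per million.
best_case = cost_per_million(MONTHLY_PRICE_USD, RECORDS_PER_MONTH_HIGH)
worst_case = cost_per_million(MONTHLY_PRICE_USD, RECORDS_PER_MONTH_LOW)
print(f"${best_case:.2f} to ${worst_case:.2f} per million records")
```

Under these assumptions, a package lands somewhere between $17.50 and $35.00 per million records.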

Open Data:

We realize that the availability of crawl packages will raise some concerns over what data should and shouldn’t be crawled.  We only crawl publicly-available Web data.  We don’t crawl private data and have no interest in that.

What we are interested in is what our users can do with Web data that is more accessible.  Since our launch, we’ve seen many startups come to us asking for large amounts of Web data so that they can create additional value on top of that data.  They want to do interesting things like provide new insight into how people connect with one another, create CPIs of online product inventory, and more.  We want to make that possible, and crawl packages are a step in that direction.


On Microsoft & Yahoo

As several sources reported yesterday, Microsoft and Yahoo put together a deal whereby Microsoft will use Bing to power Yahoo search, while Yahoo will take over the advertising on all search results.  Overall, we feel that this is a good thing, assuming it does create a more viable challenger to Google.  While we’re fully invested in the success of companies using 80legs to power their own search technologies, we feel Bing represents the best opportunity to get consumers used to thinking of alternatives when it comes to search.  Right now your average person only thinks of one thing when it comes to search (Google) and doesn’t even consider any alternatives (niche, deep web, semantic, etc.).  Bing has been able to put some good-sized chinks in Google’s armor, and that will help the entire industry.

One interesting side issue is what will happen to all the efforts Yahoo has engaged in to help the search community, such as SearchMonkey, BOSS, etc.  From what we can gather, the fate of these products is sort of up in the air right now.  My guess is that they stay alive, as both Microsoft and Yahoo probably realize they need as many people taking on Google as possible.

Companies we wish we had met earlier: SearchMe

TechCrunch reported today that visual search engine SearchMe is (temporarily) closing its doors.  This is pretty disappointing to us.  From what we’ve heard, SearchMe was a pretty cool product with some real potential.  I find it especially ironic (in a bad way) that the site now redirects to Google.  Search is not a solved technology, and for companies like SearchMe to not be able to find enough traction to keep going hurts the industry in the long run.

We never got the chance to talk to the folks at SearchMe (I’ve tweeted Adams to see if he’d be interested in us as a way to cut down on crawling costs, though that seems unlikely since they appear to be re-focusing on a different market), but I’m guessing we could have helped them out.  I found the following quote particularly interesting:

So the plan now (unless a buyer or white knight jumps in at the last moment) is to significantly downsize, take the site down for a while (probably tomorrow) and refocus the tech in a space where we don’t have to have 3,000 servers costing a million a month to run on the back end.

I’m sure most of that 3,000 is not for things like crawling and processing web content, but I’m betting some of it is.  The infrastructure requirements of getting into the search game are so daunting that they make it very difficult to just throw your hat in.  I’ve told a lot of people that Google’s main advantage is not its technology, but its operational costs.  It can deliver at a lower cost than anyone else, due to its sheer size.

We’re hoping 80legs puts more than a few dents in that barrier to entry.