Yahoo! recently announced a new pricing scheme for their BOSS platform, so we thought it would be a good idea to provide a comparison between 80legs and BOSS.
Web-Scale Development vs. Re-packaged Yahoo! Search
The biggest difference between 80legs and BOSS is that 80legs is a platform for developing your own web-scale applications while BOSS is an API for retrieving search results from Yahoo!. In other words, with 80legs you can easily build any kind of web-scale app that accesses the entire Internet. With BOSS, you are ultimately re-packaging search results from Yahoo!.
Query Types
BOSS lets you make 4 types of queries:
- Spelling
- Web
- News
- Image
Each of these query types is logically the same type: keyword matching on text content. The difference between the four is the result type you get with each one. 80legs has no limitations on query types. With our service, you can do any of the following:
- Keyword matching on text content (includes all 4 BOSS query ‘types’)
- Visual matching on images (e.g., Is Image A similar to Image B?)
- Programmatic queries (e.g., On which pages does the word ‘Obama’ appear 4 times?)
- And any other query type you conceive
Because 80legs is an application development platform, you can create your own code to create any query type you want.
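To make the idea of a programmatic query concrete, here is a minimal sketch in Python of the ‘Obama’ example from the list above. It is not the 80legs API: the function names, the `pages` dictionary, and the example URLs are all hypothetical, and we assume "appears 4 times" means exactly four whole-word, case-sensitive matches.

```python
import re


def count_word(text: str, word: str) -> int:
    """Count whole-word, case-sensitive occurrences of `word` in `text`."""
    return len(re.findall(rf"\b{re.escape(word)}\b", text))


def pages_with_exact_count(pages: dict, word: str, n: int) -> list:
    """Return the URLs whose text contains `word` exactly `n` times."""
    return [url for url, text in pages.items() if count_word(text, word) == n]


# Hypothetical crawled content, keyed by URL (illustration only).
pages = {
    "http://example.com/a": "Obama spoke. Obama waved. Obama smiled. Obama left.",
    "http://example.com/b": "Obama spoke once.",
}

matches = pages_with_exact_count(pages, "Obama", 4)
print(matches)
```

The point of the sketch is that the query is ordinary code run against page content, so the matching rule can be changed freely (for example, to "at least 4 times") rather than chosen from a fixed menu.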
Filtering
Within some of the BOSS query types listed above, you can pass in a limited set of filter options to narrow down the result set your query returns. For example, with web queries, you can choose from a set of 6 file types. When filtering with 80legs, you pass in regular expressions instead of pre-defined options. This gives the developer far more freedom when it comes to filtering result sets.
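As a rough illustration of regular-expression filtering, the Python sketch below keeps only the URLs that match a pattern. The `filter_urls` helper and the sample URLs are hypothetical and are not part of 80legs; the sketch only shows why a pattern is more flexible than a fixed list of file types.

```python
import re


def filter_urls(urls, pattern):
    """Keep only the URLs in which the regular expression finds a match."""
    compiled = re.compile(pattern)
    return [url for url in urls if compiled.search(url)]


# Hypothetical result set (illustration only).
urls = [
    "http://example.com/report.pdf",
    "http://example.com/index.html",
    "http://example.com/data/2009/summary.PDF",
    "http://example.org/pdf-guide.html",
]

# Case-insensitive match on a ".pdf" extension at the end of the URL.
pdf_only = filter_urls(urls, r"(?i)\.pdf$")

# The same helper handles rules no fixed option list would anticipate,
# such as URLs containing a four-digit year as a path segment.
dated = filter_urls(urls, r"/\d{4}/")

print(pdf_only)
print(dated)
```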
Pricing
Here’s the pricing table for BOSS:

Each unit costs $0.10. This table is a bit opaque, but with a little math we can break it down as follows (MRR = million results returned):
- $0.10 per MRR: off-peak use
- $3.00 per MRR: 1,000 results/query, on-peak use
- $10.00 per MRR: 100 results/query, on-peak use
- $12.00 per MRR: 50 results/query, on-peak use
- $30.00 per MRR: 10 results/query, on-peak use
The cost to use 80legs is more straightforward (MPC = million pages crawled):
- $2.00 per MPC: for crawling/accessing content
- $0.03 per CPU-hr: for computing/analysis performed on content
Admittedly, this is an apples-to-oranges comparison (hopefully we’ve impressed upon you that 80legs is a different animal with far more features), but it gives you some sense of the difference in pricing. Companies interested in serious web-scale development could potentially save a lot by using BOSS during off-peak hours, but we wonder whether they would try BOSS at all, given the limitations mentioned above. (Also, it’s not yet clear what constitutes ‘off-peak’.) Smaller users will be paying less on a per-unit basis. Comparing the two pricing schemes is a bit odd, but we like to be thorough :).
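For readers who want to plug in their own numbers, the per-unit rates listed above can be turned into a small calculator. This is only a sketch: the function names and the sample workload (10 million results or pages, 100 CPU-hours) are made up for illustration, and a result returned by BOSS is not the same unit as a page crawled by 80legs.

```python
# Rates copied from the lists above, in US dollars.
BOSS_ON_PEAK_PER_MRR = {1000: 3.00, 100: 10.00, 50: 12.00, 10: 30.00}
BOSS_OFF_PEAK_PER_MRR = 0.10
LEGS_PER_MPC = 2.00        # per million pages crawled
LEGS_PER_CPU_HOUR = 0.03   # per CPU-hour of analysis


def boss_cost(results_returned, results_per_query, off_peak=False):
    """Estimate BOSS cost from the number of results returned."""
    if off_peak:
        rate = BOSS_OFF_PEAK_PER_MRR
    else:
        # Only the four on-peak tiers listed above are known.
        rate = BOSS_ON_PEAK_PER_MRR[results_per_query]
    return results_returned / 1_000_000 * rate


def eighty_legs_cost(pages_crawled, cpu_hours):
    """Estimate 80legs cost from pages crawled plus CPU-hours used."""
    return pages_crawled / 1_000_000 * LEGS_PER_MPC + cpu_hours * LEGS_PER_CPU_HOUR


# Hypothetical workload (illustration only).
print(f"BOSS on-peak, 50 results/query: ${boss_cost(10_000_000, 50):.2f}")
print(f"BOSS off-peak:                  ${boss_cost(10_000_000, 50, off_peak=True):.2f}")
print(f"80legs, 100 CPU-hours:          ${eighty_legs_cost(10_000_000, 100):.2f}")
```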
Conclusion
80legs and BOSS are two very different things. 80legs is a platform for making any kind of web-scale application. BOSS is a way to query Yahoo!. 80legs allows much more functionality and enables a much wider variety of services and products looking to do interesting things with Internet data.
Do you plan to have ready page-rank algorithms for usage over the crawled data? I understand that you could implement your own, but some kind of alternative page-rank infrastructure would be nice.
srw – We won’t have page-rank algorithms in place. Our goal is to be an infrastructure service provider, and we hope our customers create new and innovative page ranking algorithms using 80legs.
Hi Shion,
you are right that 80legs is much more flexible than Yahoo! BOSS, but with BOSS you can reduce the search space with keywords, so you don’t have to waste computing resources on all web servers by crawling every page each time you need a small subset of information.
The problem with 80legs is that if you don’t know which domains to seed with, you have to take a brute-force approach by using as many seed URLs as you can. Alternatively, you can use a Yahoo! BOSS search to determine which domains to crawl with 80legs.
What would be ideal is if 80legs collected all the pages it finds and cached or even indexed them, and made them available to API users, instead of crawling all the sites given by API users for each and every user and wasting resources. Alternatively, Google or Yahoo! could offer paid access to their index, or parts of it, without such great limitations.
That said, I really love 80legs and the ideas behind it! Please continue the good work!