Comparing 80legs to Yahoo! BOSS

Yahoo! recently announced a new pricing scheme for their BOSS platform, so we thought it would be a good idea to provide a comparison between 80legs and BOSS.

Web-Scale Development vs. Re-packaged Yahoo! Search

The biggest difference between 80legs and BOSS is that 80legs is a platform for developing your own web-scale applications, while BOSS is an API for retrieving search results from Yahoo!. In other words, with 80legs you can build any kind of web-scale app that accesses the entire Internet. With BOSS, you are ultimately re-packaging search results from Yahoo!.

Query Types

BOSS lets you make 4 types of queries:

  • Spelling
  • Web
  • News
  • Image

Each of these query types is logically the same type: keyword matching on text content.  The difference between the four is the result type you get with each one.  80legs has no limitations on query types.  With our service, you can do any of the following:

  • Keyword matching on text content (includes all 4 BOSS query ‘types’)
  • Visual matching on images (e.g., Is Image A similar to Image B?)
  • Programmatic queries (e.g., On which pages does the word ‘Obama’ appear 4 times?)
  • And any other query type you conceive

Because 80legs is an application development platform, you can write your own code to implement any query type you want.
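To make the "programmatic query" idea concrete, here is a minimal sketch in Python of the 'Obama' example from the list above. It is not 80legs code: the function names and the in-memory `pages` dictionary (URL mapped to page text) are assumptions made for illustration, and "appears 4 times" is read here as exactly 4 whole-word, case-sensitive matches.

```python
import re
from typing import Dict, List


def count_word(text: str, word: str) -> int:
    """Count whole-word, case-sensitive occurrences of `word` in `text`."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text))


def pages_with_exact_count(pages: Dict[str, str], word: str, n: int) -> List[str]:
    """Return the URLs of pages whose text contains `word` exactly `n` times."""
    return [url for url, text in pages.items() if count_word(text, word) == n]


# Hypothetical sample data standing in for crawled page content.
pages = {
    "http://example.com/a": "Obama spoke. Obama left. Obama returned. Obama waved.",
    "http://example.com/b": "Obama spoke once.",
}

matches = pages_with_exact_count(pages, "Obama", 4)
print(matches)
```

A keyword search API can tell you that a page mentions a word, but a count-based condition like this one needs code that runs over the page content itself.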


Within some of the BOSS query types listed above, you can pass in a limited set of filter options to narrow down the result set your query returns. For example, with web queries, you can choose from a set of 6 file types. When filtering with 80legs, you pass in regular expressions instead of pre-defined options. This gives the developer far more freedom when it comes to filtering result sets.
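As a rough illustration of regular-expression filtering versus a fixed menu of file types, the Python sketch below filters a list of URLs with an arbitrary pattern. The `filter_urls` helper and the sample URLs are hypothetical; this shows the general technique, not the 80legs interface.

```python
import re
from typing import List


def filter_urls(urls: List[str], pattern: str) -> List[str]:
    """Keep only the URLs that match the regular expression `pattern`."""
    regex = re.compile(pattern)
    return [url for url in urls if regex.search(url)]


# Hypothetical URLs for illustration.
urls = [
    "http://example.com/report.pdf",
    "http://example.com/index.html",
    "http://example.com/data/2009/summary.PDF",
    "http://example.org/blog/post-1",
]

# A file-type filter, written as a case-insensitive pattern.
pdfs = filter_urls(urls, r"(?i)\.pdf$")

# A filter no fixed option list would offer: URLs with a four-digit year path segment.
dated = filter_urls(urls, r"/\d{4}/")

print(pdfs)
print(dated)
```

The same helper covers file types, path structure, domains, or any other condition you can express as a pattern.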


Here’s the pricing table for BOSS:


Each unit costs $0.10.  This table is a bit opaque, but with a little math we can break it down as follows (MRR = million results returned):

  • $0.10 per MRR: off-peak use
  • $3.00 per MRR: 1,000 results/query, on-peak use
  • $10.00 per MRR: 100 results/query, on-peak use
  • $12.00 per MRR: 50 results/query, on-peak use
  • $30.00 per MRR: 10 results/query, on-peak use
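The per-MRR figures can be turned back into a per-query cost with a little arithmetic. The Python sketch below uses only the on-peak rates listed above and the $0.10 unit price; the function names are ours, and it assumes every query returns the full number of results requested.

```python
# On-peak BOSS rates listed above, in dollars per million results returned (MRR),
# keyed by results per query.
ON_PEAK_RATE_PER_MRR = {1000: 3.00, 100: 10.00, 50: 12.00, 10: 30.00}
UNIT_PRICE = 0.10  # dollars per BOSS unit, as stated above


def boss_cost(results_per_query: int, num_queries: int) -> float:
    """Dollar cost of `num_queries` on-peak queries at the given result size."""
    rate = ON_PEAK_RATE_PER_MRR[results_per_query]
    results_returned = results_per_query * num_queries  # assumes full result sets
    return rate * results_returned / 1_000_000


def boss_units(results_per_query: int, num_queries: int) -> float:
    """The same cost expressed in $0.10 BOSS units."""
    return boss_cost(results_per_query, num_queries) / UNIT_PRICE


# Cost and units for 1,000 queries at each result size.
for rpq in sorted(ON_PEAK_RATE_PER_MRR):
    print(rpq, round(boss_cost(rpq, 1000), 2), round(boss_units(rpq, 1000), 1))
```

Running it shows why the per-MRR rate climbs as result sets shrink: smaller queries cost less each, but far less than proportionally less.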

The cost to use 80legs is more straightforward (MPC = million pages crawled):

  • $2.00 per MPC: for crawling/accessing content
  • $0.03 per CPU-hr: for computing/analysis performed on content
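A job's cost under these two rates is a simple sum. Here is a small Python sketch, with a hypothetical function name and made-up job sizes, that estimates it:

```python
PRICE_PER_MPC = 2.00       # dollars per million pages crawled, as listed above
PRICE_PER_CPU_HOUR = 0.03  # dollars per CPU-hour, as listed above


def eighty_legs_cost(pages_crawled: int, cpu_hours: float) -> float:
    """Estimated dollar cost of a job: crawling charge plus compute charge."""
    crawl_cost = pages_crawled / 1_000_000 * PRICE_PER_MPC
    compute_cost = cpu_hours * PRICE_PER_CPU_HOUR
    return crawl_cost + compute_cost


# Hypothetical job: 5 million pages crawled and 100 CPU-hours of analysis.
estimate = eighty_legs_cost(5_000_000, 100)
print(round(estimate, 2))
```

Crawling and analysis are billed separately, so a job that does heavy processing on few pages and a job that does light processing on many pages can land at very different totals.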

Now, this is admittedly a bit of an apples-to-oranges comparison (hopefully we’ve impressed upon you that 80legs is a different animal with far more features), but it gives you some sense of the difference in pricing. Companies interested in serious web-scale development could potentially save a lot by going with BOSS during off-peak hours, but we wonder whether they would be trying BOSS at all, given the limitations we mentioned above. (Also, it’s not clear what constitutes ‘off-peak’ at this point.) Smaller users, who tend to request fewer results per query, will likely be paying less on a per-unit basis with 80legs. Comparing the two pricing schemes is a bit odd, but we like to be thorough :).


80legs and BOSS are two very different things. 80legs is a platform for making any kind of web-scale application. BOSS is a way to query Yahoo!. 80legs offers much more functionality and supports a much wider variety of services and products looking to do interesting things with Internet data.

6 Responses to “Comparing 80legs to Yahoo! BOSS”

  1. srw, April 2, 2009 at 10:35 pm

    Do you plan to have ready page-rank algorithms for usage over the crawled data? I understand that you could implement your own, but some kind of alternative page-rank infrastructure would be nice.

  2. Shion Deysarkar, April 5, 2009 at 4:54 am

    srw – We won’t have page-rank algorithms in place. Our goal is to be an infrastructure service provider, and we hope our customers create new and innovative page ranking algorithms using 80legs.

  3. anahap, September 2, 2010 at 6:40 am

    Hi Shion,
    you are right that 80legs is much more flexible than Yahoo! BOSS, but with Yahoo! BOSS you can reduce the search space with keywords, so you don’t have to waste computing resources on every web server by crawling all the pages every time you need a small subset of information.
    The problem with 80legs is that if you don’t know which domains to seed with, you have to take a brute-force approach and use as many seed URLs as you can. Alternatively, you can use a Yahoo! BOSS search to determine which domains to crawl with 80legs.
    What would be ideal is if 80legs collected all the pages found and cached them, or maybe even indexed them, and made them available to API users, instead of crawling all the sites given by API users for each and every user and wasting resources. Alternatively, Google or Yahoo! could offer paid access to their index, or parts of it, without such great limitations.

    That said, I really love 80legs and the ideas behind it! Please continue the good work!

