Posts Tagged 'larger crawls'

Testing out some improvements to our crawling back-end

Some of our users may have noticed that we recently lowered the limit on the number of pages crawled per job to just 10,000. This is a temporary measure while we test some major improvements to the crawling back-end. If the tests go well, we will raise the limit back to 1 million (probably 10 million, actually) by the end of this week or early next week. We think this will be the last major upgrade to our back-end, and it should allow 80legs to scale more easily into billions of pages crawled per job.

So please bear with us as we continue to work on the service. Thanks so much!


New Beta Release 0.76

Please see the website for the complete list of features and improvements (http://80legs.com/using.html#releases).

We have raised the maximum number of pages in a single crawl to 1,000,000 (still free for our current beta users). For a very broad crawl, you should expect 1M pages to take about 10-20 minutes. If your crawl is restricted or not very broad, it can take much longer than that, because we throttle ourselves to avoid hitting any single domain or server too hard.
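To see why a narrow crawl is so much slower, it helps to estimate crawl time under per-domain throttling. The post doesn't describe how 80legs actually throttles, so the following is only a rough sketch under two assumptions: a fixed minimum delay between requests to the same domain (1 second here), and a fixed overall throughput across all domains (1,000 pages per second here, which is roughly what 1M pages in 10-20 minutes implies, since that works out to about 800-1,700 pages per second). The function name `estimate_crawl_seconds` and both constants are hypothetical, chosen purely for illustration.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical numbers for illustration only; not 80legs's actual settings.
PER_DOMAIN_DELAY_S = 1.0      # assumed minimum gap between requests to one domain
GLOBAL_PAGES_PER_S = 1000.0   # assumed aggregate throughput across all domains


def estimate_crawl_seconds(urls, per_domain_delay=PER_DOMAIN_DELAY_S,
                           global_rate=GLOBAL_PAGES_PER_S):
    """Lower-bound estimate of crawl time under per-domain politeness throttling.

    Two limits apply: the crawl as a whole can't beat the global throughput,
    and each domain's pages must be spaced at least `per_domain_delay` apart.
    Whichever limit is slower dominates.
    """
    if not urls:
        return 0.0
    # Count how many pages we need from each host.
    per_domain = Counter(urlsplit(u).hostname for u in urls)
    global_bound = len(urls) / global_rate
    # n requests to one domain need (n - 1) gaps of per_domain_delay.
    busiest = max(per_domain.values())
    domain_bound = (busiest - 1) * per_domain_delay
    return max(global_bound, domain_bound)


if __name__ == "__main__":
    # Broad: 10,000 pages spread over 1,000 domains (10 pages each).
    broad = [f"http://site{d}.example.com/page/{p}"
             for d in range(1000) for p in range(10)]
    # Narrow: the same 10,000 pages, all on one domain.
    narrow = [f"http://onesite.example.com/page/{p}" for p in range(10000)]
    print(estimate_crawl_seconds(broad))   # 10.0  (throughput-bound)
    print(estimate_crawl_seconds(narrow))  # 9999.0 (politeness-bound)
```

With the same number of pages, the broad crawl is limited by overall throughput while the narrow one is limited by the per-domain delay. Scaled up under the same assumed 1-second gap, 1,000,000 pages from a single domain would need about 999,999 seconds, or roughly 11.6 days.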

We expect this to be our last release before 0.8, the first beta version to contain our processDocument() functionality.