New feature: 80app packs!

We’ve just deployed a new version of 80legs that adds an exciting new feature: 80app Packs!

Plus and Premium subscribers will now have access to a growing set of useful, pre-built 80apps. The following 80apps are currently available or will be available soon:

Plus:

  • Return Page Content
  • Regex Text Matcher
  • Regex Source Matcher
  • Image Resizer

Premium:

  • All Plus 80apps
  • Social Network Scrapers
  • E-commerce Site Scrapers
80legs users will be able to select these apps and get the information they want from crawls with zero programming.  Everything will be pre-built and ready to go.  We want to make things as easy as possible for our users.

We plan to keep adding more 80apps to the Plus and Premium plans. If you have an idea for an 80app you’d like to see, just let us know!

7 Responses to “New feature: 80app packs!”

  1. Hoze March 1, 2010 at 10:38 am

    Are you aware that your crawler is being used to crash web servers? Your crawler makes lots of connections from lots of different IPs. I still can’t get my server back up, and I have banned more than 500 IP addresses!

  2. Shion Deysarkar March 1, 2010 at 10:40 am

    Hoze – you can follow the instructions here to prevent 80legs from crawling your site:

  3. Janus March 12, 2010 at 2:53 pm

    The crawler also does NOT obey robots.txt OR HTML header directives, despite what the FAQ referenced above says.

  4. Shion March 12, 2010 at 2:56 pm

    We obey all standard robots.txt directives. Which directive in your robots.txt file are we not following?

  5. Janus March 16, 2010 at 8:03 pm

    Our robots.txt contains the following:

    User-agent: *
    Disallow: /exampledirectory/

    Your bot cruises merrily into /exampledirectory/, oblivious to the robots.txt directives. It also ignores robots directives in page headers, such as nofollow.
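For reference, a compliant crawler would check rules like the ones quoted above before fetching any URL. A minimal sketch using Python's stdlib robot-exclusion parser (the domain is hypothetical; "008" is the user-agent string 80legs' crawler identifies itself with):

```python
from urllib.robotparser import RobotFileParser

# Parse the robots.txt rules quoted in the comment above (no network fetch).
rules = RobotFileParser()
rules.parse([
    "User-agent: *",
    "Disallow: /exampledirectory/",
])

# A compliant crawler calls can_fetch() before requesting each URL.
print(rules.can_fetch("008", "http://example.com/exampledirectory/page.html"))  # False
print(rules.can_fetch("008", "http://example.com/other/page.html"))             # True
```

In practice the parser would be loaded from the live `http://example.com/robots.txt` via `set_url()` and `read()`; parsing literal lines here just keeps the sketch self-contained.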

  6. Shion March 17, 2010 at 8:39 am

    Please let me know which domain this robots.txt is on (either here or by contacting us:

    The nofollow tag is not meant to tell a bot not to crawl a page. It’s there to tell search engines such as Google not to index that content.

    – Shion
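For context on the distinction being drawn here, the two kinds of in-page robots directives look like this (illustrative markup; the page-level meta tag governs indexing and link-following, while a link-level rel="nofollow" only hints that one link shouldn't be followed or credited):

```html
<!-- Page-level robots meta tag: ask crawlers not to index this page
     and not to follow any links on it. -->
<meta name="robots" content="noindex, nofollow">

<!-- Link-level attribute: applies only to this one link, not to
     crawling of the page itself. -->
<a href="http://example.com/" rel="nofollow">example</a>
```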
