Published August 25, 2009
We just pushed out version 0.9, which is a big, big update to the system. This release includes several upgrades to our back-end architecture (allowing larger jobs), a Java API (allowing programmatic access), an easy-to-use job form (allowing easier access), and a bunch of other cool things!
Here’s a list of the specific features:
- Large crawls are now supported. Crawl up to 10 million pages per job!
- The API is officially released. Submit jobs, download results and much more using Java.
- A much easier-to-use job form. We realized the old job form was a bit clunky. The new one is much easier to understand.
- To go along with the new job form, we’ve updated the entire portal to be easier to navigate and use.
- You can now load external JARs into your 80Apps. This lets developers use third-party code more easily.
- Several improvements to the crawler, including:
  - Options to select your type of crawl. Choose among fast, comprehensive, and breadth-first.
  - Crawler now crawls https:// pages.
  - Crawler tries to fetch a page more than once before giving up.
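The retry behavior in the last crawler item can be sketched roughly as follows. This is only an illustration, not the crawler's actual code: the `RetryingFetcher` class, the `fetchWithRetry` method, and the attempt limit are hypothetical names and values chosen for the example.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of 80legs: retries a fetch a fixed number of times.
class RetryingFetcher {
    static <T> T fetchWithRetry(Supplier<T> fetch, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return fetch.get(); // success: return the page right away
            } catch (RuntimeException e) {
                lastFailure = e; // remember the failure and try again
            }
        }
        throw lastFailure; // every attempt failed, so give up
    }
}
```

A real crawler would probably also wait between attempts and retry only on transient failures such as timeouts.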
Since we just released 0.9, I suppose that technically makes us 0.1 from a beta exit! Some of the upcoming features are:
- Finalizing the payment system in preparation for beta exit and charging actual money.
- Providing useful default 80Apps for all users (this is also in preparation for the app store model we’ll be pursuing).
See full release log details at http://80legs.pbworks.com/Release-Log.
Published July 9, 2009
We pushed out 0.83 today. This release was mostly done to push out some improvements in our crawling and back-end data store, which should help the overall performance of 80legs.
We also took the opportunity to push out some new functionality, including allowing users to upload very large seed lists (up to 1 GB!). To upload these seed lists, you’ll need to go to the new “Seed Lists” section in the portal. The interface is still a bit on the “raw” side, so let us know if you encounter any problems.
You can see the full list of changes at http://80legs.pbworks.com/Release-Log#Release0838July2009.
Published June 26, 2009
We’ve just pushed out 0.82. Improvements and changes include:
- Smarter URL selection for larger crawls
- Sandbox jobs run automatically and the user gets access to stdout from their 80App
- Domain throttling information in the portal
- Time estimates shown in the portal
- Crawled result files additions:
  - page size
  - parse time in milliseconds
  - process time in milliseconds
  - compute timeouts get COMPUTE_TIMEOUT_GOOD or COMPUTE_TIMEOUT_BAD
- Several improvements for large job performance
- Users can specify data with the JAR upload, which gets passed into initialize() during the validation test
- Fixed problem with multiple Loading Code errors
- Improved default link parsing
- Better web portal login behavior
As usual, we’ve started working on the next release already, which will have things like:
- Allowing larger crawls
- Allowing larger seed lists
- Creating result files on the fly
Check out http://80legs.pbworks.com/Release-Log for all the details!
Published June 17, 2009
Published June 3, 2009
Tags: 80App, custom code, release
We’re very excited to announce that you can now run custom code on 80legs. We have just released version 0.8, which gives users the ability to write their own content analysis logic using processDocument() and their own link extraction logic using parseLinks(). For more information on how to write and run code on 80legs, please visit http://80legs.pbworks.com/Custom-Code.
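To give a feel for what such custom code might look like, here is a minimal sketch. The method signatures are assumptions: this post does not show the real IWebAnalysisConnector interface, so the example defines its own stand-in interface (`SketchConnector`, a hypothetical name) with guessed parameter and return types. Please check the Custom-Code page for the actual API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the 80legs connector interface.
// The real signatures may differ from the ones assumed here.
interface SketchConnector {
    List<String> parseLinks(byte[] content, String url);
    byte[] processDocument(byte[] content, String url);
}

// Example app: follows every double-quoted href and records each page's title.
class TitleApp implements SketchConnector {
    private static final Pattern HREF =
        Pattern.compile("href\\s*=\\s*\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);
    private static final Pattern TITLE =
        Pattern.compile("<title>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Link extraction: return the raw href values found in the page.
    public List<String> parseLinks(byte[] content, String url) {
        String html = new String(content, StandardCharsets.UTF_8);
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    // Content analysis: return the trimmed page title, or empty bytes if there is none.
    public byte[] processDocument(byte[] content, String url) {
        String html = new String(content, StandardCharsets.UTF_8);
        Matcher m = TITLE.matcher(html);
        String title = m.find() ? m.group(1).trim() : "";
        return title.getBytes(StandardCharsets.UTF_8);
    }
}
```

The regular expressions keep the sketch short. Production link extraction would more likely use a proper HTML parser and resolve relative links against the page URL.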
The full list of changes in this release includes:
- Custom code initial release (first IWebAnalysisConnector release with parseLinks() and processDocument())
- Option to analyze specific MIME types
- Option to preserve query strings when crawling
- Resulting crawl list shows status codes and other reasons for failing to crawl (e.g. robots.txt, DNS, etc.)
- Better handling of failed URLs
- Sandbox server for testing custom code on your own machine using the 80legs framework.
- Stop problem jobs automatically
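To see why the query-string option in the list above can matter, here is a minimal sketch of URL de-duplication with and without query strings. `UrlNormalizer`, `normalize`, and `uniqueUrls` are hypothetical names, and the idea that query strings are dropped unless the option is set is an assumption made only for this example, not a description of the crawler's internals.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of 80legs.
class UrlNormalizer {
    // Drops the fragment, and also the query string unless preserveQuery is true.
    static String normalize(String url, boolean preserveQuery) {
        int hash = url.indexOf('#');
        String noFragment = hash >= 0 ? url.substring(0, hash) : url;
        if (preserveQuery) {
            return noFragment;
        }
        int question = noFragment.indexOf('?');
        return question >= 0 ? noFragment.substring(0, question) : noFragment;
    }

    // Returns the distinct URLs left after normalization, in first-seen order.
    static Set<String> uniqueUrls(List<String> urls, boolean preserveQuery) {
        Set<String> seen = new LinkedHashSet<>();
        for (String url : urls) {
            seen.add(normalize(url, preserveQuery));
        }
        return seen;
    }
}
```

Under that assumption, pages such as `page?id=1` and `page?id=2` collapse into a single URL when query strings are dropped, and stay separate crawl targets when they are preserved.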
We’ve also granted access to several more users on our private beta list. If you haven’t received access yet, but would really like to get access soon, please let us know, and we’ll try to include you in the next set of beta users.
We’re already working on the new features, such as:
- A web service for programmatically submitting and managing jobs
- An “app store” that will allow users to run pre-built applications developed by trusted third parties
- Our payment system, which will be released first as a “demo”, allowing users to get used to the system before actually requiring payment
Published April 22, 2009
Tags: larger crawls, release
Please see the website for the complete list of features and improvements (http://80legs.com/using.html#releases).
We have bumped up the maximum number of pages to crawl in a single crawl to 1,000,000 (still free for our current beta users). For a very broad crawl, you should expect 1M pages to take about 10-20 minutes. If your crawl is restricted or is not very broad, it can take much longer than that because of the way we throttle ourselves to prevent hitting single domains and servers too hard.
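A rough way to picture the effect of throttling is as a lower bound on crawl time. The sketch below is only a back-of-the-envelope model. The `CrawlTimeEstimate` class is a hypothetical name, and both numbers used with it are assumptions: an overall rate of 1,000 pages per second (chosen so that 1M pages lands inside the 10-20 minute range above) and a delay of 1 second between requests to the same domain (the actual throttle rate is not stated here).

```java
// Hypothetical model, not part of 80legs.
class CrawlTimeEstimate {
    // Lower bound on crawl time in seconds. The crawl can finish no faster than
    // the overall fetch rate allows, and no faster than the per-domain delay
    // allows for the domain that contributes the most pages.
    static double estimateSeconds(long totalPages, double pagesPerSecond,
                                  long maxPagesOnOneDomain, double perDomainDelaySeconds) {
        double rateBound = totalPages / pagesPerSecond;
        double throttleBound = maxPagesOnOneDomain * perDomainDelaySeconds;
        return Math.max(rateBound, throttleBound);
    }

    public static void main(String[] args) {
        // Broad crawl: no domain contributes more than 100 pages.
        System.out.println(estimateSeconds(1_000_000, 1000.0, 100, 1.0));
        // Restricted crawl: all pages come from a single domain.
        System.out.println(estimateSeconds(1_000_000, 1000.0, 1_000_000, 1.0));
    }
}
```

With these assumed numbers, the broad crawl is bound by the overall rate (1,000 seconds, about 17 minutes), while 1M pages on a single domain would be bound by the throttle (1,000,000 seconds, more than 11 days).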
We are expecting this to be our last release before we push the first beta version containing our processDocument() functionality in 0.8.
Published April 14, 2009
0.71 has been released. Please visit the forum for details (http://forum.80legs.com/showthread.php?t=28).