Posts Tagged '80App'

You can now run custom code on 80legs – version 0.8 released!

We’re very excited to announce that you can now run custom code on 80legs.  We have just released version 0.8, which gives users the ability to write their own content analysis logic using processDocument() and their own link extraction logic using parseLinks().  For more information on how to write and run code on 80legs, please visit http://80legs.pbworks.com/Custom-Code.
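To give a feel for what this looks like, here is a minimal sketch of the two hooks in Java. It is an illustration only: `SketchConnector` and `SameHostWordCounter` are hypothetical names, and the method signatures are assumptions rather than the real IWebAnalysisConnector definitions, which are documented on the wiki. The sketch counts words in processDocument() and follows only same-host links in parseLinks().

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the 80legs connector interface.
// The real IWebAnalysisConnector signatures may differ; see the wiki.
interface SketchConnector {
    byte[] processDocument(String url, byte[] content);
    List<String> parseLinks(String url, byte[] content);
}

final class SameHostWordCounter implements SketchConnector {
    // Deliberately simple pattern: matches double-quoted href attributes only.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    // Content analysis: return the whitespace-separated word count as UTF-8 bytes.
    @Override
    public byte[] processDocument(String url, byte[] content) {
        String text = new String(content, StandardCharsets.UTF_8).trim();
        int words = text.isEmpty() ? 0 : text.split("\\s+").length;
        return Integer.toString(words).getBytes(StandardCharsets.UTF_8);
    }

    // Link extraction: resolve each href against the page URL and keep
    // only links that stay on the same host as the page.
    @Override
    public List<String> parseLinks(String url, byte[] content) {
        URI base = URI.create(url);
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(new String(content, StandardCharsets.UTF_8));
        while (m.find()) {
            try {
                URI resolved = base.resolve(m.group(1).trim());
                if (base.getHost() != null
                        && base.getHost().equalsIgnoreCase(resolved.getHost())) {
                    links.add(resolved.toString());
                }
            } catch (IllegalArgumentException e) {
                // Malformed href: skip it rather than fail the whole page.
            }
        }
        return links;
    }
}
```

Returning a byte[] from processDocument() matches how results are described for 80legs output; everything else about the shape of these methods should be checked against the published interface.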

The full list of changes in this release includes:

  • Custom code initial release (first IWebAnalysisConnector release with parseLinks() and processDocument())
  • Option to analyze specific MIME types
  • Option to preserve query strings when crawling
  • Resulting crawl list shows status codes and other reasons for failing to crawl (e.g. robots.txt, DNS)
  • Better handling of failed URLs
  • Sandbox server for testing custom code on your own machine using the 80legs framework
  • Stop problem jobs automatically

We’ve also granted access to several more users on our private beta list.  If you haven’t received access yet but would like it soon, please let us know, and we’ll try to include you in the next set of beta users.

We’re already working on new features, such as:

  • A web service for programmatically submitting and managing jobs
  • An “app store” that will allow users to run pre-built applications developed by trusted third parties
  • Our payment system, which will be released first as a “demo”, allowing users to get used to the system before actually requiring payment

Status on Custom Code release

We are still hard at work preparing our first beta release that allows our users to submit their own custom code.  We’ve decided to delay the release just slightly so we can include the following features:

  • parseLinks() – You will be able to control which links your crawl follows from the initial release.  We were going to put this in a later release, but we’ve decided to include it now so that our customers won’t need to make additional changes later.
  • Automated JAR approval – We are automating our JAR approval process so that any submitted custom code JARs get approved or denied automatically.
  • Improved and simplified custom code interface – Our wiki has details and instructions, and our open-source repository has the latest custom-code interface and an empty starter project.
  • Open-sourced default regex and string processing – This is packaged as an 80legs custom code JAR so you can modify and use this project for your own needs.  It is BSD licensed so you are not required to release any modifications (but you are certainly welcome to do so).
  • The 80legs JAR testing application (and source) – This allows you to test your custom code JAR locally on your machine before deploying it to our sandbox server for larger tests.
  • Simple results deserialization class – We’ve created an easy-to-use Java class for extracting your results from your 80legs output.  Since the results returned by your processDocument() function are binary (a byte[]), this class extracts the URL/result pairs from your output.
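As a rough illustration of the URL/result pairing described in the last item, the sketch below round-trips pairs through a byte stream. The framing here (a UTF-encoded URL, then a 4-byte length, then the raw result bytes) is purely an assumption made for the example, and `ResultPairsSketch` is a hypothetical name. The actual 80legs output format is whatever the provided deserialization class reads, so use that class for real results.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: NOT the 80legs deserialization class.
// Assumed framing per record: writeUTF(url), int length, result bytes.
final class ResultPairsSketch {
    private ResultPairsSketch() {}

    // Serialize URL/result pairs using the assumed framing.
    static byte[] write(Map<String, byte[]> pairs) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            for (Map.Entry<String, byte[]> entry : pairs.entrySet()) {
                out.writeUTF(entry.getKey());
                out.writeInt(entry.getValue().length);
                out.write(entry.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Read records until the input is exhausted, preserving their order.
    static Map<String, byte[]> read(byte[] data) {
        Map<String, byte[]> pairs = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            while (in.available() > 0) {
                String url = in.readUTF();
                byte[] result = new byte[in.readInt()];
                in.readFully(result);
                pairs.put(url, result);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return pairs;
    }
}
```

Because each result is an opaque byte[], it is up to your own code to decode it in the same way your processDocument() encoded it.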

To our current beta users – please go ahead and start working on your custom code using the interface and instructions.  We should release the version that works with them in the next few days.  To our patiently waiting soon-to-be-beta-users, it won’t be much longer!