We are still hard at work preparing our first beta release that allows our users to submit their own custom code. We’ve decided to delay the release just slightly so we can include the following features:
- parseLinks() – Starting with the initial release, you will be able to control which links your crawl follows. We were going to put this in a later release, but we’ve decided to include it now so that our customers won’t need to make additional changes later.
- Automated JAR approval – We are automating our JAR approval process so that any submitted custom code JARs get approved or denied automatically.
- Improved and simplified custom code interface – Our wiki has details and instructions, and our open-source repository has the latest custom code interface and an empty starter project.
- Open-sourced default regex and string processing – This is packaged as an 80legs custom code JAR, so you can modify and use this project for your own needs. It is BSD-licensed, so you are not required to release any modifications (but you are certainly welcome to do so).
- The 80legs JAR testing application (and source) – This allows you to test your custom code JAR locally on your machine before deploying it to our sandbox server for larger tests.
- Simple results deserialization class – We’ve created an easy-to-use Java class for extracting your results from your 80legs output. Because the results from your processDocument() function are binary (a byte array), this class extracts the URL/result pairs from your output for you.
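
To give a feel for what extracting URL/result pairs from binary output involves, here is a minimal Java sketch. It is not the 80legs deserialization class, and it does not use the real 80legs output format, which this post does not describe. The class name `ResultPairsSketch`, the methods `writePair` and `readPairs`, and the length-prefixed record layout are all assumptions made purely for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class ResultPairsSketch {

    // Hypothetical record layout (NOT the real 80legs format):
    // [int urlLength][url bytes, UTF-8][int resultLength][result bytes]
    static void writePair(DataOutputStream out, String url, byte[] result) throws IOException {
        byte[] urlBytes = url.getBytes(StandardCharsets.UTF_8);
        out.writeInt(urlBytes.length);
        out.write(urlBytes);
        out.writeInt(result.length);
        out.write(result);
    }

    // Reads records until the input is exhausted and returns them in file order.
    static Map<String, byte[]> readPairs(byte[] data) throws IOException {
        Map<String, byte[]> pairs = new LinkedHashMap<>();
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        while (in.available() > 0) {
            byte[] urlBytes = new byte[in.readInt()];
            in.readFully(urlBytes);
            byte[] result = new byte[in.readInt()];
            in.readFully(result);
            pairs.put(new String(urlBytes, StandardCharsets.UTF_8), result);
        }
        return pairs;
    }

    public static void main(String[] args) throws IOException {
        // Build a small sample in the hypothetical layout, then read it back.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        writePair(out, "http://example.com/a", "3 matches".getBytes(StandardCharsets.UTF_8));
        writePair(out, "http://example.com/b", "0 matches".getBytes(StandardCharsets.UTF_8));
        out.flush();

        for (Map.Entry<String, byte[]> pair : readPairs(buffer.toByteArray()).entrySet()) {
            String result = new String(pair.getValue(), StandardCharsets.UTF_8);
            System.out.println(pair.getKey() + " -> " + result);
        }
    }
}
```

The results stay as raw byte arrays until the caller decides how to interpret them, since only your own processDocument() code knows what those bytes mean.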
To our current beta users – please go ahead and start working on your custom code using the interface and instructions. The release that works with them should be out in the next few days. To our patiently waiting soon-to-be beta users, it won’t be much longer!