Archive for the 'Releases' Category

Crawl Packages: Aggregate Website Data in a Few Clicks

We’re excited to announce a new service at 80legs: Crawl Packages.

What crawl packages are:

Crawl packages are pre-configured crawls that you can access and run in just a few clicks.

For a specific website or group of websites, we’ve designed and set up an 80legs crawl, along with custom data extractors, to crawl that site and extract all the interesting information from it.  These are crawls you could have set up yourself, but we’ve gone ahead and done all the work for you.

Types of crawl packages available:

We’re currently offering crawl packages for social networks, retail/shopping sites and business directories.  We’ll be expanding our offerings to include other websites as well.  Initial plans include crawling blogs (and their comments), semantic annotation feeds of various websites, and so on.

Results & Pricing:

Most crawl packages will cost $350 per month and produce 10 – 20 million records per month.  The type of records produced depends on the crawl package.  Social network packages produce publicly-available profiles, retail packages produce product listings, etc.
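
To put those numbers in perspective, here is a quick back-of-the-envelope sketch of what a package works out to per million records.  It only uses the two figures quoted above ($350 per month, 10 to 20 million records per month); the function and variable names are ours for illustration and are not part of any 80legs API.

```python
# Rough cost per million records for a crawl package.
# Inputs are the figures quoted in the post; names are illustrative only.
MONTHLY_PRICE_USD = 350
RECORDS_LOW = 10_000_000   # low end of the monthly record range
RECORDS_HIGH = 20_000_000  # high end of the monthly record range


def cost_per_million(price_usd: float, records: int) -> float:
    """Return the price paid per one million records."""
    return price_usd / (records / 1_000_000)


worst_case = cost_per_million(MONTHLY_PRICE_USD, RECORDS_LOW)
best_case = cost_per_million(MONTHLY_PRICE_USD, RECORDS_HIGH)
print(f"${best_case:.2f} to ${worst_case:.2f} per million records")
```

In other words, a typical package lands somewhere between $17.50 and $35.00 per million records, depending on how many records the crawl produces that month.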

Open Data:

We realize that the availability of crawl packages will raise some concerns over what data should and shouldn’t be crawled.  We only crawl publicly-available Web data.  We don’t crawl private data and have no interest in that.

What we are interested in is what our users can do with Web data that is more accessible.  Since our launch, we’ve seen many startups come to us asking for large amounts of Web data so that they can create additional value on top of that data.  They want to do interesting things like provide new insight into how people connect with one another, create CPIs of online product inventory, and more.  We want to make that possible, and crawl packages are a step in that direction.


Python API Released

The 80legs Python API is now available for use.  To learn how to access and use it, visit the 80legs Python API documentation.

New feature: 80app packs!

We’ve just deployed a new version of 80legs that adds an exciting new feature: 80app Packs!

Plus and Premium subscribers will now have access to a growing set of useful, pre-built 80apps.  The following 80apps are currently available or will be available soon:

Plus Plan:

  • Return Page Content
  • Regex Text Matcher
  • Regex Source Matcher
  • Image Resizer

Premium Plan:

  • All Plus 80apps
  • Social Network Scrapers
  • E-commerce Site Scrapers

80legs users will be able to select these apps and get the information they want from crawls with zero programming.  Everything will be pre-built and ready to go.  We want to make things as easy as possible for our users.

We plan to keep on adding more and more 80apps to Plus and Premium Plans.  If you have an idea for 80apps you’d like to see, just let us know!

80legs Subscription Plans and Free Web-Crawling

We have just updated 80legs with some exciting new changes.  Starting today, the 80legs service will be divided into 3 tiers: Basic, Plus and Premium.  Since we launched, we’ve noticed that our customer base can be classified into 3 major groups – light, medium and heavy users.  Each plan targets one of these groups and is designed to meet that group’s specific needs.

Here are details on each plan:

Basic Plan:

  • Free to use
  • Normal crawling speed (up to 1 request/second/domain)
  • Access to 80legs Web Portal
  • 1 job running at a time
  • Up to 100K crawled pages per job
  • Low priority in 80legs job queue
  • No recurring jobs allowed

Plus Plan:

  • $99/month + crawling fees
  • Fast crawling speed (up to 5 requests/second/domain)
  • Access to 80legs Web Portal and API
  • Up to 3 jobs running at a time
  • Up to 1M crawled pages per job
  • Normal priority in 80legs job queue
  • Recurring jobs allowed

Premium Plan:

  • $299/month + crawling fees
  • Ultra-fast crawling speed (up to 10 requests/second/domain)
  • Access to 80legs Web Portal and API
  • Up to 5 jobs running at a time
  • Up to 10M crawled pages per job
  • Preferred priority in 80legs job queue
  • Recurring jobs allowed
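
If you are trying to decide which plan fits a particular crawl, the limits above can be written down as a small table and checked mechanically.  The sketch below is only an illustration: the `Plan` class, `cheapest_plan` and `min_hours_single_domain` are hypothetical names we made up here, not part of the 80legs API.  The numbers are transcribed from the lists above, and we assume the Basic Plan has no API access because its list mentions only the Web Portal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Plan:
    name: str
    monthly_fee_usd: int      # paid plans also incur crawling fees (not modeled)
    max_rps_per_domain: int   # requests/second/domain
    max_concurrent_jobs: int
    max_pages_per_job: int
    api_access: bool          # assumed False for Basic (portal only)
    recurring_jobs: bool


# Values transcribed from the plan lists above, cheapest first.
PLANS = [
    Plan("Basic", 0, 1, 1, 100_000, False, False),
    Plan("Plus", 99, 5, 3, 1_000_000, True, True),
    Plan("Premium", 299, 10, 5, 10_000_000, True, True),
]


def cheapest_plan(pages_per_job: int, concurrent_jobs: int = 1,
                  need_api: bool = False,
                  need_recurring: bool = False) -> Optional[Plan]:
    """Return the lowest-fee plan that satisfies the requirements, or None."""
    for plan in PLANS:  # PLANS is ordered by monthly fee
        if (pages_per_job <= plan.max_pages_per_job
                and concurrent_jobs <= plan.max_concurrent_jobs
                and (plan.api_access or not need_api)
                and (plan.recurring_jobs or not need_recurring)):
            return plan
    return None


def min_hours_single_domain(pages: int, plan: Plan) -> float:
    """Lower bound on crawl time when every page lives on one domain."""
    return pages / plan.max_rps_per_domain / 3600


plan = cheapest_plan(pages_per_job=500_000, need_recurring=True)
print(plan.name if plan else "no plan fits")
```

One thing the per-domain speed limit implies: a 100K-page job confined to a single domain needs at least about 27.8 hours on the Basic Plan (100,000 requests at 1 request/second), while the same job at the Premium rate of 10 requests/second needs at least about 2.8 hours.  Crawls spread across many domains are not bound in the same way.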

Existing users can sign up for a plan by going to the new Subscription section in the 80legs Web Portal, where there are complete details and instructions on signing up for a plan.

We’re really excited about these changes.  Of course, the Basic Plan now enables completely free web-crawling, which until today has been completely unheard of.  The Plus and Premium Plans give heavier users the ability to set up and run more intensive crawls.

If any of our users have questions about the changes, please contact us or submit a ticket.  We’re always happy to hear from you!

Our launch experience, part 3: Now the real work begins

I ended my last post mentioning that I figured we could take a small break the day after DEMO.  Boy, was I wrong.  When I woke up the next day, I saw several hundred emails, about 300 tweets referring to 80legs and dozens of articles discussing us.  So instead of checking out the beach, we spent the morning responding to emails and catching up on all the 80legs discussion.

I think we did a really good job of getting the word out for 80legs.  Here are some quick stats showing how well we did on this front:

  • # of articles on 80legs: 16
  • # of times 80legs was mentioned as “Best of DEMOfall09”: 2
  • # of re-tweets of articles: 700+

Here are just some of our favorite articles:

I should also note that we got posted to Hacker News, Digg and Slashdot.  Here’s what happened to our web traffic in the week following DEMO:


Having a 1 Gbps connection helps

Interesting note: most of our web traffic came from Hacker News.  We check HN regularly and participate in the discussions from time to time, so it was awesome to get so much interest from our own community.  Of course, our main focus is not our web traffic (which I think is pretty good for a non-consumer-facing service), but customer adoption.  Here are a few stats on that:

  • # of users that logged in since DEMO: 1554
  • # of jobs run since DEMO: 1557

Just as an aside, there were about 50 active beta users, and not every user that logged in has run a job.

Another interesting outcome from DEMO is that we’ve realized there’s demand for customized services on top of 80legs.  In other words, people want our team either to build customized products for them that are powered by 80legs, or to build the 80Apps that run within 80legs.  We originally expected third-party companies to build these services and products themselves over time as 80legs became more popular.  In the long term, that is most likely the key to 80legs’ sustainable success.  In the short term, however, we think it’s prudent to pursue these engagements ourselves.  In fact, it makes sense to modify our business model somewhat and form 2 additional product/service lines: one for developing value-added services on top of 80legs and another for custom implementation of 80Apps.  Of course, we need to consider how to manage these two additional lines while still managing and improving the core service.

I feel our team’s experience so far has been pretty awesome.  We spent about 2 years developing what we feel is a pretty cool technology and now we’re starting to see the fruits of our labor.  That said, I’m a big believer that developing good technology is just the first step of many when it comes to finding commercial success.  Now we get to focus on execution, customer satisfaction, and delivering on what we’ve been promising.  Now the real work begins.

Our launch experience, part 2: DEMO

Around July, we started thinking about how to launch the live service.  We were fortunate that our plans lined up with DEMO.  Of course, they also lined up with TechCrunch50.  I imagine some companies have to think about which one is best for them, but for us it was pretty easy.  TC50 required a company to have no public exposure before their event, which of course made us ineligible.

We did have to think a bit about the cost of DEMO.  I talked to my friends that had demoed there and was ultimately convinced that it was a great place to launch a product, provided you took full advantage of it with the press, PR and other media outlets.

Again though, I wasn’t sure we would even get into DEMO.  80legs was usable by this point, but again – here was a completely non-shiny service, void of any semblance of a bell or whistle.  Sure, any “big data”-nut is going to think what we do is the coolest thing since SSDs, but will anyone else?  We weren’t sure.

Carla from Guidewire was the one that talked to me about our application.  I gave the 5-second spiel, and was excited to hear that she understood it and really liked the idea.  She did wonder about how we could make the demo interesting.  I assured her we could (while making a note to myself: “Figure out how to make demo interesting!”).

A few weeks later…

Guys, I’ve got news.  We’re going live in September.

We got into DEMO?

Yep.  So we’ll be on stage.  Hundreds of people.  Thousands of Internet viewers.

So we have about 8 weeks to get everything stable, fully-tested, and scaled out.. oh and we need to make the web portal look a lot better.


Now, it’s not like we had been slacking off, but July to September was especially scrambly, particularly for our back-end guys.  On the business and marketing end, we wanted to make sure we take full advantage of not only DEMO itself, but the momentum it could generate after the event.

For that, I sought out a PR firm to help with the media.  I asked a bunch of tech/startup friends in Texas about who to go with, and almost all of them recommended Jones-Dilworth, run by the veteran Josh, who had just left Porter Novelli.  If every trusted source you have recommends the same firm, you should probably go with them!

Josh and his team met with us and mapped out a strategy to garner media attention for DEMO and keep momentum going afterward.  They also helped with training our team for handling interviews, which was a big help.  In the week leading up to DEMO, I did at least 1 interview almost every day.  It was pretty awesome talking to and being interviewed by the same folks I had been reading every day for the past few years.

We got into San Diego on Sunday.  The event and crew at DEMO were very nice and professional.  They definitely run a tight ship, but are also super-approachable.  Everyone on staff seemed to know all the details, where to be, etc.

On Monday, all the demoers went through a few introductory items and then we headed off to a happy hour by the bay.  Mingling with other startups, VCs, and press folks is pretty fun.  It’s pretty awesome to be at a party where everyone is doing something interesting or has something engaging to say.  Can’t say the same about most bars I go to :).

Not your usual bar scene

After that was the “CEO & Dealmakers” dinner, which was only attended by 1 member of each company as well as VCs and other such folks.  While the pre-dinner topic, “The Good, Bad and the Ugly of VC” is something I’ve read about ad nauseam, hearing it straight from guys like the president of the NVCA was pretty cool.  I got a chance to thank Matt Marshall and Chris Shipley for giving us the chance to DEMO.  Matt and Chris kind of seem like opposites.  Chris was cracking jokes about Pittsburgh (I went to CMU and she’s from there), but Matt was like “But seriously, what are you demoing?”.

CEO & Dealmakers Dinner

After dinner, I had a cool talk with Flip from Infochimps and Mike Olson from Cloudera about Hadoop and how we might use it for providing post-processing services of crawled data.  Yeah, that’s the kind of after-dinner conversation you have at DEMO :)

The real show started on Tuesday, with the first group of presenters in the morning.  There did seem to be a few network issues, which was unfortunate.  Digsby actually ran an “offline” version of their chat client to demo their new Twitter capabilities.  All the data was cached locally.  Now that’s what I call a backup plan!

After the presentations, the pavilion was open for a few hours.  Our booth traffic was a bit slow.  Although we had a fair number of people come by, it was nothing like Web 2.0, where a constant stream came by.  I think two factors contributed to this: 1) we hadn’t yet presented and 2) we had already talked to almost all the press folks.

Wednesday came along, which meant it was time to demo!  Although people say I always seem uber-calm, I must admit I was just a touch nervous :).  The staff guy pulled me up.  Chris called me out.  I walk out – cameras, lights, hundreds of people before me, time to launch.  “Hi, my name’s Shion Deysarkar and I’m here to show you a revolutionary new service called <dramatic pause> 80legs.”  I wonder if I’ll ever forget the lines?

I actually used a pretty cool semantic 80App written by a technology partner of ours and compared what positive and negative things people are saying about DEMO and TC50.  I thought this would be a fun demo for the audience, given the interesting history between the two shows.  I didn’t actually show who came out on top though – people had to come by the booth to find out!  It turns out that DEMO just eked out a win, with a 95% positive rating to TC50’s 91%.  If you want to learn more about the future of this app, check out these posts.

Side note:  Even though I poked a little fun at the TC crew, I thought they’d like the joke, given their sense of humor and attitude on DEMO.  Most of the audience cracked up at my joke, but a TC writer told me the joke was “lame”.  Oh well, can’t win them all.

The demo went pretty smoothly, which I was pretty happy about.  It was great to have it out of the way though.  About 2 hours after, I could feel my body crashing, as I could finally relax.  I don’t drink a lot of soda, but I went through about 3 Pepsis (why San Diego doesn’t have Coke is beyond me) before dinner to keep the energy levels up.

At the end of the show were the awards.  7 companies received DEMOGod awards, and 2 received media prizes – 1 company in the consumer category and 1 in the enterprise category.  I’ll admit that I was a bit miffed we didn’t win the enterprise category, but c’est la vie.  Oh, we also got treated to a little dance by the DEMO staff.

Maybe I was wrong about Matt...


DEMO was finally over.  It was a great experience, but I was looking forward to a little relaxation the next day.  I figured we’d sleep in, check out San Diego for a bit, and enjoy the moment.  I couldn’t have been more wrong…

Next-up, part 3: Post-DEMO, or “Now the Real Work Begins”

Our launch experience, part 1: beta

Wow, what a crazy few weeks it has been!  For those of you just tuning in, we just launched 80legs.  Since launching, we’ve been swamped with emails, press, tweets, and much more, but I thought I’d recap our experience, from beta to launch, including our experience at DEMO.

We announced our private beta at the Launch Pad event during the Web 2.0 Expo in San Francisco, back in April.  We had been working on 80legs since early 2008.  Around February, I decided we’d exhibit at the Web 2.0 Expo to get some early exposure.  When I signed the booth contract, we weren’t thinking of making 80legs available in April.  But then I came across the Launch Pad event that they had.  Applying was pretty straightforward – all I had to do was fill out a form.  But the form asked for what kind of demo I could show right now, so that the judges could get a sense for what we did.  At that point, you could run a crawl through 80legs, but there was no pretty interface to it.  It was just command-line Java.  So in the form I said something like “Nothing to show now, but trust me – it will be really cool in April!!” and submitted it.

I was pretty sure nothing would come of it.  Surely they had several applications for products that looked shiny and sexy and would never accept anything as obtuse as a “distributed computing service designed for crawling and processing web content”… that wasn’t even ready to show yet.  Then a few weeks later, I get an email welcoming us into Launch Pad.  Ohhh-k :)  I stood up from my desk (this is mid-February, I think) and said:

Guys, I’ve got news.  We’re launching our beta in April.

We are?

Yes.  At the Web 2.0 Expo.   In front of hundreds of people.  On stage.

We don’t have an interface.  Or any way for people to set up an account.  And we’re still making the crawling reliable.

Yeah.  I guess we have a month to do that!

So during March, we scrambled putting together the first version of the web portal, getting the crawling to an acceptable state, and a bunch of other stuff.  It was nose-to-the-ground, grind-away work, but at Launch Pad, we had something to show and it looked good (well, for a beta).  The Launch Pad garnered us some press as well.

We got about 300 sign-ups for the private beta – not bad for a technical product.  We decided on letting them into 80legs in periodic batches.  In retrospect, we could have handled this better.  The first couple of batches let in responded well and offered substantive feedback.  But later batches, which may have had to wait a few months, had forgotten about us.  The excitement had worn off.  It would have been better to let them all in at once, or to at least have sent them reminders.

During our beta period, we spent a ton of time on collecting feedback from users, quickly implementing suggestions we felt were important, and scaling up our crawling ability.  Every 2-3 weeks we worked on a major new feature, such as crawler improvements, 80Apps, the API and several others.  At the same time, we were implementing a ton of minor features to make the system more robust and usable.

Our beta was going well and was getting to the point where we were starting to think about going live.  But we wanted to make a splash with our live launch.  We needed something that would get the momentum going again.  Something big…

Stay tuned for part 2: DEMO ..!