80legs Subscription Plans and Free Web-Crawling

We have just updated 80legs with some exciting new changes.  Starting today, the 80legs service will be divided into 3 tiers: Basic, Plus and Premium.  Since we launched, we’ve noticed that our customer base can be classified into 3 major groups – light, medium and heavy users.  Each plan is targeted at one of these groups and designed to fulfill its specific needs.

Here are details on each plan:

Basic Plan:

  • Free to use
  • Normal crawling speed (up to 1 request/second/domain)
  • Access to 80legs Web Portal
  • 1 job running at a time
  • Up to 100K crawled pages per job
  • Low priority in 80legs job queue
  • No recurring jobs allowed

Plus Plan:

  • $99/month + crawling fees
  • Fast crawling speed (up to 5 requests/second/domain)
  • Access to 80legs Web Portal and API
  • Up to 3 jobs running at a time
  • Up to 1M crawled pages per job
  • Normal priority in 80legs job queue
  • Recurring jobs allowed

Premium Plan:

  • $299/month + crawling fees
  • Ultra-fast crawling speed (up to 10 requests/second/domain)
  • Access to 80legs Web Portal and API
  • Up to 5 jobs running at a time
  • Up to 10M crawled pages per job
  • Preferred priority in 80legs job queue
  • Recurring jobs allowed
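For readers who like to see limits as data, here is a rough sketch of the three plans as a small Python structure, with a helper that picks the cheapest plan a job would fit under.  The names (`Plan`, `PLANS`, `cheapest_plan_for`) are made up for illustration and are not part of the 80legs API; the numbers are copied from the lists above, and job-queue priority is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Plan:
    """Hypothetical container for the plan limits listed above."""
    name: str
    monthly_fee_usd: int        # subscription fee only; paid plans add crawling fees
    max_requests_per_sec: int   # per domain
    api_access: bool            # Basic has the Web Portal only
    max_concurrent_jobs: int
    max_pages_per_job: int
    recurring_jobs: bool

    @property
    def min_delay_seconds(self) -> float:
        """Smallest gap between two requests to the same domain."""
        return 1.0 / self.max_requests_per_sec


# Ordered cheapest first so the first match is the cheapest fit.
PLANS = (
    Plan("Basic", 0, 1, False, 1, 100_000, False),
    Plan("Plus", 99, 5, True, 3, 1_000_000, True),
    Plan("Premium", 299, 10, True, 5, 10_000_000, True),
)


def cheapest_plan_for(pages: int, concurrent_jobs: int = 1,
                      needs_api: bool = False,
                      recurring: bool = False) -> Optional[Plan]:
    """Return the cheapest plan whose limits cover the job, or None."""
    for plan in PLANS:
        if (pages <= plan.max_pages_per_job
                and concurrent_jobs <= plan.max_concurrent_jobs
                and (plan.api_access or not needs_api)
                and (plan.recurring_jobs or not recurring)):
            return plan
    return None
```

For example, `cheapest_plan_for(50_000)` lands on Basic, while the same crawl with `needs_api=True` needs Plus, because API access starts at that tier.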

Existing users can sign up for a plan by going to the new Subscription section in the 80legs Web Portal, where there are complete details and instructions on signing up for a plan.

We’re really excited about these changes.  Of course, the Basic Plan now enables completely free web-crawling, which until today has been completely unheard of.  The Plus and Premium Plans give heavier users the ability to set up and run more intensive crawls.

If any of our users have questions about the changes, please contact us or submit a ticket.  We’re always happy to hear from you!

Defrag Experience

This past week I was in Denver attending Defrag 2009, which is something of the uber-tech geek con and bills itself as:

…focused on the tools and technologies that accelerate the “aha” moment, and is a gathering place for the growing community of implementers, users, and thinkers that are building the next wave of software innovation.

It was a unique experience, to say the least.  We actually were unsure of our interest in attending when we first heard about Defrag.  Eric Norlin had contacted me several months ago about us being a sponsor.  Since “big data” is one of the themes at Defrag, he justifiably figured that we would fit right in.  Unfortunately, with DEMO looming, we were unsure whether Defrag was worth taking a chunk out of our budget.  I actually initially declined Eric’s offer, but he was persistent and contacted me again after DEMO.  Of course, I was even more cautious about committing now that all the money for DEMO had actually been spent!  But, after Eric offered the opportunity to speak, I decided we’d go for it.

Let me first say that deciding to attend Defrag was definitely the right move.  The quality of the audience is definitely the highest of any conference I’ve seen.  Each person who came by the booth was plugged-in, technical and business-savvy.  We actually managed to generate a good number of promising leads, which was impressive considering there were only about 350 people in attendance.  From a pure business perspective, just closing 2 or 3 of these leads would make the conference worth it for us.

We had some great one-on-one conversations with folks there, including talks with the guys at Infochimps, Robert Scoble, and Bill from Factual (previously of Y! BOSS).  We also gave some folks a sneak peek at what we’re working on with Language Computer.  Without providing too much detail, we’re building a service called Extractiv, which will let people turn any part of the web into highly structured, semantic data.

The one down vote I would give for Defrag is that the talks didn’t always live up to the “tech” billing I thought they would.  In many cases, on-stage discussion converged onto social media, Twitter, etc.  While those are important new developments, many of the speakers focused on how to create the right UI or visualize social content.  My personal opinion is that UI and visualization are not the hard problems to be solved in these spaces.  Rather, converting that content into meaningful and actionable data is.

Oh, and here’s my little presentation!

I think I ruffled some feathers by actually suggesting something could be better than the cloud in some cases (god forbid!). :)

Overall it was a great time, and I look forward to attending next year!

Web-Scale Apps Challenge

We just launched the 80legs Web-Scale Apps Challenge over at ChallengePost!  We’re challenging anyone and everyone to make the coolest apps for crawling and processing web content.  The top 3 entries will win some pretty sweet prizes, like a Kindle, an original mint-condition Atari, and more.

We issued this challenge in anticipation of our App Store launch, which will happen the week of November 16th.  The 80legs App Store will allow our users to buy and run 80Apps created by third-party developers.  Our users will get to run custom code without having to do their own development work, and developers get a way to monetize cool web content processing technologies.

More details on the App Store to come!  For now, check out the challenge at http://www.challengepost.com/challenge/80legs-web-scale-apps-competition!

Most of the web isn’t real-time

I should have gotten around to this post about a week ago, but we’ve been running around doing real work since our launch.  Anyway, a while back, Marshall Kirkpatrick wrote a post entitled “Ten Useful Examples of the Real-Time Web in Action” on ReadWriteWeb.  In it, he outlines several benefits that real-time web technologies can provide.  At #1 is “Real-Time Push to Replace Web Crawling”, where he references PubSubHubbub co-creator Brad Fitzpatrick wondering about something that certainly interests us:

…real-time push technologies could someday replace the need for most of the web crawling his employer Google does to maintain its index. If all webpages were PubSubHubbub enabled, for example, they could simply tell a Hub about any changes they had published and Google could find out via that Hub. There would be no reason for Google to poll websites for changes over and over again.

Although this idea is certainly very compelling, I don’t think it’s very likely that real-time push can replace crawling.  Here’s why:

  1. Real-time push is only useful for (surprisingly enough) real-time content, which is a small % of web content, and always will be (just do some simple induction to figure out why).  So unless you’ve been receiving pushes since “time 0”, you won’t be getting all the content you might want.
  2. Real-time push allows the site to only provide snippets of content, which means you’ll have to crawl if you want more.  Put another way, sometimes the guy making the request wants control over the response of that request.  Imagine that ;)
  3. This idea depends on all sites using real-time push, which I personally feel is highly unlikely to happen.  Just ask the semantic web guys how many webmasters use RDF markup.
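Point 1 is easy to see with a toy model.  The sketch below is not PubSubHubbub code; `Page`, `received_by_push` and `needs_crawl` are hypothetical names, and time is just a counter.  It assumes a subscriber only hears about pages published at or after the moment it subscribed.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Page:
    url: str
    published_at: int  # abstract time step, not a real timestamp


def received_by_push(pages: List[Page], subscribed_at: int) -> List[Page]:
    """Pages a push subscriber hears about: published at or after it subscribed."""
    return [p for p in pages if p.published_at >= subscribed_at]


def needs_crawl(pages: List[Page], subscribed_at: int) -> List[Page]:
    """Pages published before the subscription; these still have to be crawled."""
    return [p for p in pages if p.published_at < subscribed_at]


# A site that published one page per time step, from step 0 through step 9.
site = [Page(f"/post/{t}", t) for t in range(10)]

pushed = received_by_push(site, subscribed_at=7)
backlog = needs_crawl(site, subscribed_at=7)
print(len(pushed), "pages arrive by push;", len(backlog), "must be crawled")
```

The later you subscribe, the bigger the backlog, and on a site that has been publishing for years the backlog is most of the content.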

The above 3 points are general rebuttals to the idea that real-time push will be pervasive.  There’s also a specific reason why 80legs would still maintain an advantage over real-time push, and that’s because our distributed architecture would still provide performance and cost advantages when it comes to accessing and processing web content.  Simply put, we can throw more bandwidth and compute power at accessing and processing web content than someone could on their own, with a centralized data center.

Let me finish off by saying that I do think real-time push is a really cool technology.  For things like pulling status updates, news, etc., it can be really useful.  But I think the vast majority of the web will always need to be crawled, for many different purposes that real-time push can’t provide.

Our launch experience, part 3: Now the real work begins

I ended my last post mentioning that I figured we could take a small break the day after DEMO.  Boy, was I wrong.  When I woke up the next day, I saw several hundred emails, about 300 tweets referring to 80legs and dozens of articles discussing us.  So instead of checking out the beach, we spent the morning responding to emails and catching up on all the 80legs discussion.

I think we did a really good job of getting the word out for 80legs.  Here are some quick stats showing how well we did on this front:

  • # of articles on 80legs: 16
  • # of times 80legs was mentioned as “Best of DEMOfall09”: 2
  • # of re-tweets of articles: 700+

Here are just some of our favorite articles:

I should also note that we got posted to Hacker News, Digg and Slashdot.  Here’s what happened to our web traffic in the week following DEMO:

[Image: Google Analytics chart of web traffic]

Having a 1 Gbps connection helps

Interesting note: most of our web traffic came from Hacker News.  We check HN regularly and participate in the discussions from time to time, so it was awesome to get so much interest from our own community.  Of course, our main focus is not our web traffic (which I think is pretty good for a non-consumer-facing service), but customer adoption.  Here are a few stats on that:

  • # of users that logged in since DEMO: 1554
  • # of jobs run since DEMO: 1557

Just as an aside, there were about 50 active beta users, and not every user that logged in has run a job.

Another interesting outcome from DEMO is that we’ve realized there’s demand for customized services on top of 80legs.  In other words, people want to use our team to either build customized products for them that are powered by 80legs, or they want us to build the 80Apps that run within 80legs.  We originally expected third-party companies to build these services and products themselves over time as 80legs became more popular.  In the long-term, that is most likely the key to 80legs’ sustainable success.  In the short-term, however, we think it’s prudent to pursue these engagements ourselves.  In fact, it makes sense to modify our business model somewhat and form 2 additional product/service lines: one for developing value-added services on top of 80legs and another for custom implementation of 80Apps.  Of course, we need to consider how to manage these two additional lines while still managing and improving the core service.

I feel our team’s experience so far has been pretty awesome.  We spent about 2 years developing what we feel is a pretty cool technology and now we’re starting to see the fruits of our labor.  That said, I’m a big believer that developing good technology is just the first step of many when it comes to finding commercial success.  Now we get to focus on execution, customer satisfaction, and delivering on what we’ve been promising.  Now the real work begins.

Our launch experience, part 2: DEMO

Around July, we started thinking about how to launch the live service.  We were fortunate that our plans lined up with DEMO.  Of course, they also lined up with TechCrunch50.  I imagine some companies have to think about which one is best for them, but for us it was pretty easy.  TC50 required a company to have no public exposure before their event, which of course made us ineligible.

We did have to think a bit about the cost of DEMO.  I talked to my friends that had demoed there and was ultimately convinced that it was a great place to launch a product, provided you took full advantage of it with the press, PR and other media outlets.

Again though, I wasn’t sure we would even get into DEMO.  80legs was usable by this point, but again – here was a completely non-shiny service, void of any semblance of a bell or whistle.  Sure, any “big data”-nut is going to think what we do is the coolest thing since SSDs, but will anyone else?  We weren’t sure.

Carla from Guidewire was the one that talked to me about our application.  I gave the 5-second spiel, and was excited to hear that she understood it and really liked the idea.  She did wonder about how we could make the demo interesting.  I assured her we could (while making a note to myself: “Figure out how to make demo interesting!”).

A few weeks later…

Guys, I’ve got news.  We’re going live in September.

We got into DEMO?

Yep.  So we’ll be on stage.  Hundreds of people.  Thousands of Internet viewers.

So we have about 8 weeks to get everything stable, fully-tested, and scaled out.. oh and we need to make the web portal look a lot better.

Yep!

Now, it’s not like we had been slacking off, but July to September was especially scrambly, particularly for our back-end guys.  On the business and marketing end, we wanted to make sure we take full advantage of not only DEMO itself, but the momentum it could generate after the event.

For that, I sought out a PR firm to help with the media.  I asked a bunch of tech/startup friends in Texas about who to go with, and almost all of them recommended Jones-Dilworth, run by the veteran Josh, who had just left Porter Novelli.  If every trusted source you have recommends the same firm, you should probably go with them!

Josh and his team met with us and mapped out a strategy to garner media attention for DEMO and keep momentum going afterward.  They also helped with training our team for handling interviews, which was a big help.  In the week leading up to DEMO, I did at least 1 interview almost every day.  It was pretty awesome talking to and being interviewed by the same folks I had been reading every day for the past few years.

We got into San Diego on Sunday.  The event and crew at DEMO were very nice and professional.  They definitely run a tight ship, but are also super-approachable.  Everyone on staff seemed to know all the details, where to be, etc.

On Monday, all the demoers went through a few introductory items and then we headed off to a happy hour by the bay.  Mingling with other startups, VCs, and press folks is pretty fun.  It’s pretty awesome to be at a party where everyone is doing something interesting or has something engaging to say.  Can’t say the same about most bars I go to :).

Not your usual bar scene

After that was the “CEO & Dealmakers” dinner, which was only attended by 1 member of each company as well as VCs and other such folks.  While the pre-dinner topic, “The Good, Bad and the Ugly of VC”, is something I’ve read about ad nauseam, hearing it straight from guys like the president of the NVCA was pretty cool.  I got a chance to thank Matt Marshall and Chris Shipley for giving us the chance to DEMO.  Matt and Chris kind of seem like opposites.  Chris was cracking jokes about Pittsburgh (I went to CMU and she’s from there), but Matt was like “But seriously, what are you demoing?”.

CEO & Dealmakers Dinner

After dinner, I had a cool talk with Flip from Infochimps and Mike Olson from Cloudera about Hadoop and how we might use it for providing post-processing services of crawled data.  Yeah, that’s the kind of after-dinner conversation you have at DEMO :)

The real show started on Tuesday, with the first group of presenters in the morning.  There did seem to be a few network issues, which was unfortunate.  Digsby actually ran an “offline” version of their chat client to demo their new Twitter capabilities.  All the data was cached locally.  Now that’s what I call a backup plan!

After the presentations, the pavilion was open for a few hours.  Our booth traffic was a bit slow.  Although we had a fair number of people come by, it was nothing like Web 2.0, where a constant stream came by.  I think two factors contributed to this: 1) we hadn’t yet presented and 2) we had already talked to almost all the press folks.

Wednesday came along, which meant it was time to demo!  Although people say I always seem uber-calm, I must admit I was just a touch nervous :).  The staff guy pulled me up.  Chris called me out.  I walk out – cameras, lights, hundreds of people before me, time to launch.  “Hi, my name’s Shion Deysarkar and I’m here to show you a revolutionary new service called <dramatic pause> 80legs.”  I wonder if I’ll ever forget the lines?

I actually used a pretty cool semantic 80App written by a technology partner of ours and compared what positive and negative things people are saying about DEMO and TC50.  I thought this would be a fun demo for the audience, given the interesting history between the two shows.  I didn’t actually show who came out on top though – people had to come by the booth to find out!  It turns out that DEMO just eked out a win, with a 95% to 91% positive rating over TC50.  If you want to learn more about the future of this app, check out these posts.

Side note:  Even though I poked a little fun at the TC crew, I thought they’d like the joke, given their sense of humor and attitude on DEMO.  Most of the audience cracked up at my joke, but a TC writer told me the joke was “lame”.  Oh well, can’t win them all.

The demo went pretty smoothly, which I was pretty happy about.  It was great to have it out of the way though.  About 2 hours after, I could feel my body crashing, as I could finally relax.  I don’t drink a lot of soda, but I went through about 3 Pepsis (why San Diego doesn’t have Coke is beyond me) before dinner to keep the energy levels up.

At the end of the show were the awards.  7 companies received DEMOGod awards, and 2 received media prizes – 1 company in the consumer category and 1 in the enterprise category.  I’ll admit that I was a bit miffed we didn’t win the enterprise category, but c’est la vie.  Oh, we also got treated to a little dance by the DEMO staff.

Maybe I was wrong about Matt...

DEMO was finally over.  It was a great experience, but I was looking forward to a little relaxation the next day.  I figured we’d sleep in, check out San Diego for a bit, and enjoy the moment.  I couldn’t have been more wrong…

Next-up, part 3: Post-DEMO, or “Now the Real Work Begins”

Our launch experience, part 1: beta

Wow, what a crazy few weeks it has been!  For those of you just tuning in, we just launched 80legs.  Since launching, we’ve been swamped with emails, press, tweets, and much more, but I thought I’d recap our experience, from beta to launch, including our experience at DEMO.

We announced our private beta at the Launch Pad event during the Web 2.0 Expo in San Francisco, back in April.  We had been working on 80legs since early 2008.  Around February, I decided we’d exhibit at the Web 2.0 Expo to get some early exposure.  When I signed the booth contract, we weren’t thinking of making 80legs available in April.  But then I came across the Launch Pad event that they had.  Applying was pretty straightforward – all I had to do was fill out a form.  But the form asked for what kind of demo I could show right now, so that the judges could get a sense for what we did.  At that point, you could run a crawl through 80legs, but there was no pretty interface to it.  It was just command-line Java.  So in the form I said something like “Nothing to show now, but trust me – it will be really cool in April!!” and submitted it.

I was pretty sure nothing would come of it.  Surely they had several applications for products that looked shiny and sexy and would never accept anything as obtuse as a “distributed computing service designed for crawling and processing web content”… that wasn’t even ready to show yet.  Then a few weeks later, I get an email welcoming us into Launch Pad.  Ohhh-k :)  I stood up from my desk (this is mid-February, I think) and said:

Guys, I’ve got news.  We’re launching our beta in April.

We are?

Yes.  At the Web 2.0 Expo.   In front of hundreds of people.  On stage.

We don’t have an interface.  Or any way for people to set up an account.  And we’re still making the crawling reliable.

Yeah.  I guess we have a month to do that!

So during March, we scrambled putting together the first version of the web portal, getting the crawling to an acceptable state, and a bunch of other stuff.  It was nose-to-the-ground, grind-away work, but at Launch Pad, we had something to show and it looked good (well, for a beta).  The Launch Pad garnered us some press as well.

We got about 300 sign-ups for the private beta – not bad for a technical product.  We decided on letting them into 80legs in periodic batches.  In retrospect, we could have handled this better.  The first couple of batches let in responded well and offered substantive feedback.  But later batches, which may have had to wait a few months, had forgotten about us.  The excitement had worn off.  It would have been better to let them all in at once, or to at least have sent them reminders.

During our beta period, we spent a ton of time on collecting feedback from users, quickly implementing suggestions we felt were important, and scaling up our crawling ability.  Every 2-3 weeks we worked on a major new feature, such as crawler improvements, 80Apps, the API and several others.  At the same time, we were implementing a ton of minor features to make the system more robust and usable.

Our beta was going well and was getting to the point where we were starting to think about going live.  But we wanted to make a splash with our live launch.  We needed something that would get the momentum going again.  Something big…

Stay tuned for part 2: DEMO ..!

80legs has launched!!

The day is finally here!  We are now live, beta has exited, 1.0 is a go!

Before I go any further, I want to thank the many beta users that helped us over the last several months by providing feedback, suggestions for improvements, and identifying bugs.  Without your help, we wouldn’t have been able to get 80legs to where it is today.

During the private beta, we were working on several features, all of which are now ready for public use.  These features are:

  • True web-scale crawling: crawl up to 2 billion pages per day
  • Usability: easily design your own crawls using an intuitive job form
  • 80Apps: write and run your own applications on over 50,000 computers
  • API: programmatically control 80legs to work for you
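To put the first and third bullets in perspective, here is a quick back-of-the-envelope calculation.  It assumes, purely for illustration, that a 2-billion-page day is spread evenly over exactly 50,000 computers; real load will not be that uniform.

```python
PAGES_PER_DAY = 2_000_000_000    # "up to 2 billion pages per day"
NODES = 50_000                   # "over 50,000 computers", rounded down (assumption)
SECONDS_PER_DAY = 24 * 60 * 60

# Throughput of the whole grid, then the share each node would carry.
total_pages_per_sec = PAGES_PER_DAY / SECONDS_PER_DAY
pages_per_node_per_day = PAGES_PER_DAY / NODES
pages_per_node_per_sec = pages_per_node_per_day / SECONDS_PER_DAY

print(f"{total_pages_per_sec:,.0f} pages/s across the grid")
print(f"{pages_per_node_per_day:,.0f} pages/day per node")
print(f"{pages_per_node_per_sec:.2f} pages/s per node")
```

Under those assumptions the grid as a whole handles roughly 23,000 pages per second, while each individual computer fetches less than one page every two seconds.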

There’s also one big change that comes with leaving beta – 80legs is no longer completely free to use.  Our pricing is now in effect.  You can still dip your toes in the water and run jobs that crawl fewer than 100 pages.

We are doing our official launch announcement at DEMO.  If you happen to be at the show, please come and visit us at pavilion station #2!

Last day to vote for SxSWi panels!

Just a reminder – today is the last day to vote for SxSWi panels.  We’re really interested in four of them, so click the links below and vote if you love us!

Scaling the Semantic Web

Big Data, Big Dream

Semantic Search: Life Beyond Ten Blue Links

Semantic Search: Off to a Good Start

In support of Plura and Digsby

This is admittedly kind of an insider-y post that gets complicated very quickly.  Still, we thought it important to offer our POV and make clear our position on the matter.

If you’re not familiar, you can read about the recent controversy involving Digsby, an excellent instant messaging platform, here.  Unfortunately, Digsby learned the hard way that it’s important to go above and beyond in disclosures to users.

Plura, which gathers and resells compute power from a distributed grid of 50,000 individual machines (nodes, in our language), was also involved in the controversy because Plura is being used as part of the Digsby application (Digsby is a Plura affiliate).

The involvement of Plura as a supplementary business model angered and frustrated Digsby users and the press, because many of them did not know that their latent, excess CPU cycles were being utilized.  Most of the controversy centered on Digsby and whether they were as transparent as they could and should have been.  As a result, Digsby is releasing a new version of their application, and Plura has strengthened their Terms of Use.

For more about how Plura works, see here.

We are invested in this situation first because we share an investor with Plura.  We have also been both a beta user for Plura from the early stages, and one of their biggest customers.  We’re obviously biased because we’re sister companies, but Plura is as white hat as they come, and we’re big believers in the service they provide.

What does it mean that we are a Plura customer?

Basically, we get our raw computing power from Plura.  Our own IP centers around Web-scale crawling and processing, and our goal is to fundamentally democratize Web-scale crawling and processing, making it easy to use and available to anyone.  But we are essentially “powered by Plura” – they do the heavy Web-scale lifting, and we focus on what we do best – finding and storing information about Web content, with accuracy and speed.

We’re agnostic in the sense that we could eventually decide to build our own, stand-alone supercomputer, for example, to power 80legs.  But the purpose of the post is to voice our support for Plura.  Plura provides a high-quality service at an exceptionally affordable rate.  We would not have been able to get the 80legs service off the ground as quickly as we have had it not been for Plura.

Furthermore, Plura’s goal is essentially our own – the democratization of Web-scale capabilities of many kinds.

We feel very comfortable with the new Terms of Use that Plura has issued, and their promise to police those terms aggressively.  We feel strongly that companies of all kinds should be proactive in disclosing their business practices to users, even if to a fault, and we’re doing the same here at 80legs.

It’s also not lost on us that Plura is and may continue to be a somewhat controversial service, because it is taking a proven non-profit model (think SETI@Home, etc.) and turning it into a for-profit business.

Although the concept isn’t new, the trust barrier rose quite a bit, and for good reason.  We think that Plura is a sound organization, and we feel confident that they will be successful in building a brand and a service based on trust.

We also really like the idea that Plura offers a new, supplementary revenue stream to startups – especially in today’s economic climate, that’s a welcome development.

If you have questions about Plura and/or our experiences working with them, please don’t hesitate to ask!
