Powerset launches itself directly into the toilet
Stop me if you’ve heard this one: a “semi-stealth” startup spends 2.5 years, hires ~200 people and gives them all new MacBook Pros, takes untold millions in multiple rounds of VC funding, makes a bunch of noise about being the next generation of search and then launches with a search engine that basically re-indexes Wikipedia and not much else. ROFLOL, right?
Not if you’re Powerset, it’s not. This is exactly what they’ve done.
As far as I can tell, their burn rate can’t be any less than ~$21MM/year. This is counting 200 people at an average $100K/year salary and 1,000 small EC2 instances for a year. I’m pretty sure they are using more EC2 instances and lots of their people are making more than $100K/yr in downtown San Francisco, but I’m being conservative here. This doesn’t seem that bad until you realize that their revenue is $0/forever thus far. Conservatively, they’ve probably burned $40MM already building the product as it currently stands. Geez, that’s a lot. Google got something useful up and running for $100K.
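The ~$21MM figure can be roughly reproduced with a back-of-the-envelope sketch like the one below. The headcount, salary and instance count come from the estimate above; the $0.10 per instance-hour rate is an assumption (the post does not state the EC2 price it used), as are all the names in the code.

```python
# Back-of-the-envelope annual burn, using the figures from the paragraph above.
HEADCOUNT = 200
AVG_SALARY = 100_000        # USD per year, the post's conservative average
EC2_INSTANCES = 1_000       # small instances running all year
EC2_HOURLY_RATE = 0.10      # ASSUMPTION: USD per small-instance hour
HOURS_PER_YEAR = 24 * 365


def annual_burn(headcount, salary, instances, hourly_rate):
    """Return (payroll, compute, total) in USD per year."""
    payroll = headcount * salary
    compute = instances * hourly_rate * HOURS_PER_YEAR
    return payroll, compute, payroll + compute


payroll, compute, total = annual_burn(
    HEADCOUNT, AVG_SALARY, EC2_INSTANCES, EC2_HOURLY_RATE
)
print(f"payroll ${payroll / 1e6:.1f}MM, compute ${compute / 1e6:.2f}MM, "
      f"total ${total / 1e6:.1f}MM")
```

Under these assumptions payroll dominates: the compute bill is under $1MM, so the estimate is almost entirely a function of headcount and salary.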
And where is that product? While we were discussing it on Twitter today, my boy Cliff Moon (a Powerset engineer) sent me a link to show how good the results were. Here is that link:
http://skitch.com/moonpolysoft/mjx5/who-shot-the-man-who-shot-jfk-powerset
That’s a pretty awesome result, I have to admit. Direct and to the point and right with the info you’d need to win that bar bet. However, I replied back with this link, pertinent to some work I’m doing currently (and something that Wikipedia definitely has an article on):
http://www.powerset.com/explore/pset?q=mondrian+database&x=0&y=0
Not so much.
Now, this little exchange proves nothing. The real thing that struck me was that every result I searched for was basically a re-ambiguated list of Wikipedia results. Powerset claims to be using Wikipedia and Freebase as its base data for now so that makes some sense. However, I took a look at Freebase and it appears that most of Freebase is Wikipedia data, too! Thus, one could (semi)facetiously claim that the powerset of Powerset is {Wikipedia}. This is not great for them. I don’t really switch search engines for this little incentive. Why would I want to use Powerset when Wikipedia already has its own search engine? If that fails I can always hit Google and pare down the results to wikipedia.org, which I’m sure is what a lot of power-Wikipedia users already do.
The larger question is whether or not people would switch to Powerset. If it were an order of magnitude better, yeah, I think they would. However, the Powerset I’m seeing is nowhere near as robust or helpful as Google is today and those guys aren’t exactly standing still over there in Mountain View.
As well, I would question whether or not the question-based interface is as useful for everyday searching as the keyword-based interface of the current crop of search engines. Perhaps I’m just ingrained to that method by now, but the question-based interface seems clunkier and is definitely slower and less flexible than the keyword-based interface. There are just more ways to query an engine based on a set of keywords than there are if you have to formulate a question to do the same job. I can see how a question-based interface would be superior in certain cases, but in general? Doesn’t seem so to me.
On the advertiser side, the question-based interface brings problems, too. Today’s search engines allow advertisers to buy keywords in conjunction with the ads they want to display when said keywords are queried. How is Powerset to build a CPC engine on top of a question-based engine? Are advertisers expected to have to guess at the questions Powerset’s searchers are likely to query? That seems to me to be an order of magnitude more difficult than today’s keyword-guessing-game, which is already hard enough for the advertisers as it is.
Most of the “questions” I ask search engines don’t have one-paragraph answers. When I’m researching, I want to quickly skim a bunch of sites that relate to my general query topic and then get down deeper and deeper as I learn more. Only once I’ve done that might I have some “questions” that I could properly formulate for consumption by Powerset. Am I to assume that Powerset would have me use Google or MSN for the first 80% or more of my researching? I hope that’s not their goal or their VCs are taking an acid bath right now.
I can’t remember being this underwhelmed by such an overhyped product before. Powerset really let me down. If this is the next generation of search, I’m sticking with the current generation, thank you very much. As the eminent Internet sage Ted Dziuba would say: FAIL. Google’s probably throwing a victory party in the volleyball courts right now. I really hope that Powerset gets its act together and makes me look like an idiot for posting this, but after today, it’s looking to me like that will be a very tough proposition, indeed.
37signals and Divergent Reality
I was just reading about In/Out, 37signals’ internal Twitter clone, over on their blog and it struck me how they sometimes mutilate context in their writings. First of all, DHH knows the Twitter guys well. Second of all, they didn’t even mention that it was inspired by Twitter and even went so far as to refute this in the comments. I guess a place that’s known for its “creativity” has to keep up the appearances of being innovative…
UPDATE: Apparently, as the comments state, I am wrong about this. My bad. Never mind this post.
Is AppEngine Python's Rails?
I was thinking about AppEngine some more today and it occurred to me that not only could AppEngine be responsible for a lot of people learning/using Python, but it very well might be Python’s answer to Rails. Its so-called “killer app”, if you will.
Up until this point, Python has been plagued with multiple, competing Web frameworks all taking some mindshare and there really hasn’t been a strong rallying point in the Python Web community like Rails. It appears that Django has been winning out in the blogosphere lately, but it’s nothing like Rails’ devout following in Ruby-land.
However, with AppEngine, Google does Rails one better: instead of just making it easy to code your app, they make it just as easy as Rails to code, dirt simple to deploy, and they reduce the operational maintenance to near zero. The need for things like ActiveRecord and migrations is pretty reduced in the AppEngine environment, as is the need for all but a tiny knowledge of SQL (called GQL in that realm). That’s really, really attractive if you’re a Web consultancy shop that’s looking to turn over clients as fast as possible or a side project with a mandate for speed, quality and low cost. To me, that seems like it would be worth learning a little Python for.
AppEngine is pretty clearly aimed at Facebook’s F8 platform but it could end up hitting Python with a major boost in popularity as an aside. I bet Guido is smiling all the way to the bank on this one…
Google AppEngine Thoughts
I just read a little bit about the recent announcement of Google AppEngine and I think it’s a pretty good service overall for certain kinds of Web applications. Unlike the rumors that were floating about prior to its announcement, it’s not a competitor to AWS directly, nor even a competitor to Ning as some have also claimed. To me, it seems more directly in competition with Heroku, if such a thing could even be said. It is clearly based on Bigtable, though, so that part of the rumor appears to have been true.
Being so simple has some advantages and presents some interesting constraints. Because you don’t have root and can’t run the “box” yourself, you’re forced to think simply about the app itself. This seems as if it would be quite a welcome constraint for a lot of Web developers who don’t care about running a network. But the constraint really serves to enable Google-style scalability at its heart. As well, the use of CGI allows Google to run your code wherever they deem it best at the moment underneath, without you having to care about that sort of thing. This is something they already do quite well.
As well, the lack of a traditional RDBMS is something that obviously works well in a shared-nothing environment and makes a lot of sense since Google’s infrastructure is already based on such. This gives real credence to the idea that the RDBMS isn’t the be-all-end-all and will introduce a lot of developers out there to this new way of thinking.
A few things I don’t like:
- Users of an AppEngine app must log in with Google Accounts
- You do have to upload your Python source to Google and this leads to obvious privacy and IP concerns
- No recurring job scheduling (essential for lots of different kinds of apps)
One more obvious thing is that an application built on AppEngine is one that is much, much easier for Google to acquire than one that is not. Don’t underestimate that piece of it, as it’s likely this service will be a loss-leader for Google.
Finally, I think this announcement will be very good for Python. As AppEngine only supports Python right now and for the foreseeable future, anyone interested in AppEngine will have to learn some Python. This will shed some more light on Python for people who might not have otherwise given it a try.
UPDATE: Apparently, others agree about the acquisition potential of apps on AppEngine. Good stuff.
Twitter and me
I’ve been using Twitter for some time now and I post to Twitter orders of magnitude more than I post to this blog. With my blog, I don’t feel like I should post something unless I have something interesting or funny to say that others would like to spend some time reading. However, Twitter makes it easy to just post whatever I’m doing or thinking at any random time, reply to others’ conversations, keep up on what’s going on with friends, etc.
I really like the Twitter model, too, in that there are some very interesting constraints:
- messages are 140 characters or less
- Twitter auto-shortens links with tinyurl
- no embedded audio/video (I’m looking at you, Pownce)
- very simple network model (follow and be followed)
Twitter’s been the whipping post of the Internet for the past year because they had some well-known scaling issues and this was incorrectly blamed on their underlying Web framework, Ruby on Rails. Twitter’s got some things going for it as far as an interesting example of scaling, though, in that the model is so simple. I’ve been playing around in my mind with designing a Twitter clone in Erlang or Stackless with no RDBMS just as a mental exercise. More on that if it ever becomes concrete-er-ish.
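To give a sense of how small that model is, here is a minimal in-memory sketch in Python (not the Erlang or Stackless design mentioned above, and nothing like Twitter's actual implementation). The class and method names are hypothetical; it only captures the 140-character limit and the follow/be-followed graph, with plain dicts and lists standing in for any datastore.

```python
MAX_LENGTH = 140  # the message-length constraint from the list above


class TinyTwitter:
    """Hypothetical toy model: a follow graph plus an append-only post log."""

    def __init__(self):
        self.following = {}   # user -> set of users they follow
        self.posts = []       # (author, text) tuples, oldest first

    def follow(self, follower, followee):
        self.following.setdefault(follower, set()).add(followee)

    def post(self, author, text):
        if len(text) > MAX_LENGTH:
            raise ValueError("message exceeds 140 characters")
        self.posts.append((author, text))

    def timeline(self, user):
        """Posts by the user and everyone they follow, newest first."""
        sources = self.following.get(user, set()) | {user}
        return [(a, t) for a, t in reversed(self.posts) if a in sources]


t = TinyTwitter()
t.follow("alice", "bob")
t.post("bob", "hello")
t.post("carol", "not followed by alice")
t.post("alice", "hi bob")
print(t.timeline("alice"))
```

The hard part at scale is fanning posts out to many followers quickly, which this sketch sidesteps by scanning the whole log on every read.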
In any case, my blog isn’t going to die for a while but if you really want to keep up with every little thing with me you should follow me on Twitter ;-)
Running on Thin
After moving all my sites to Slicehost, I figured I could also now experiment with the backend of this blog, too. So, instead of a Mongrel-backed Rails app, I am backing Typo up with the Thin webserver. Thin is supposed to be faster and more concurrent than Mongrel in any case, even the evented Mongrel from Swiftcore. Here’s hoping things go well!
Slicehost, FTW
A couple hours ago I moved this blog, my main website and the PhillyLambda homepage over to the Slicehost service. I’ve been a Dreamhost customer since 2002 and back then they were great and cheap. They were great for a while after that, and then they were OK but now they are pretty bad. Routinely, when I SSH into Dreamhost, I see load averages in the double digits (one time it was 135+). Every time I post to this blog, I have to hit Publish and then hit back about 5 times to get it to really commit because the FastCGI keeps snapping between Apache and Rails and throwing up a 500 error. I was getting pretty tired of it all.
So, for just a little bit more money I get my own dedicated VPS where I can run whatever I want. I’d heard really good things about Slicehost for a while so I decided that when I got the time, I’d move my sites over there. I also am using DNS services from Nettica as they were very good to me while I was at Commerce360. Here’s hoping that when I press Publish this time it just does it! ;)
Upgraded to Typo 4.1.1
It’s about time! I’ve upgraded the blog to Typo 4.1.1. I also have a new RSS feed for this blog here. If you are subbed right now, you should switch to this new feed URL as I will be turning off the old one after a while.
I was previously running on 2.6.0 since the inception of this blog and I was just lazy in getting around to upgrading. Of course, I just spent the last hour tweaking things and changing themes and marking comments and trackbacks as ham/spam, but hey… what are Sundays for? Hope you all enjoy the new digs. Here’s hoping that Dreamhost’s FastCGI will somehow be improved by this upgrade ;)
Nettica DNS and Amazon EC2
When I started using Amazon’s EC2 service I realized pretty quickly that the traditional load-balancing solution of putting a big, honkin’ F5 BIG-IP in front of the servers wasn’t going to work out. Amazon doesn’t currently rent F5’s ;)
So, I went looking around for a DNS-based load-balancing solution that would be flexible enough to deal with the dynamic environment that EC2 provides. However, I pretty quickly found that the existing dynamic DNS APIs of most of the providers were not up to the task of programmatically updating a DNS record the way I needed. Specifically, I wanted to be able to register and deregister an EC2 instance with a round robin A record automatically upon instance startup and shutdown.
In the end, I was only able to find one dynamic DNS provider whose API was up to this task: Nettica.
Once I found them, it was pretty easy to wrap their SOAP-based API into a binary to drive this type of dynamic management of my DNS records. Apache Axis took care of turning the WSDL into Java (the only library that could do so across three programming languages, by the way) and the code for driving that wrapper was pretty simple. The end result of that effort is now open sourced for all to use.
The way I use the new Nettica client is to have the init script automatically register an instance on startup with the Nettica service. However, I remove the instance from Nettica a bit before shutting down the instance itself in order to deal with DNS caches that sustain the now-removed IP for longer than the specified TTL value. You might be able to automate this by throwing a sleep LARGENUMBER into the shutdown portion of the init script, but I haven’t tried this yet.
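The register-on-startup, deregister-then-wait-on-shutdown flow described above can be sketched roughly as follows. This is an illustration in Python, not the open-sourced Java client and not Nettica's API: `RoundRobinRecord` is a hypothetical in-memory stand-in for the round robin A record, and the grace multiplier is an assumed value standing in for the untested "sleep LARGENUMBER" idea.

```python
import time


class RoundRobinRecord:
    """Hypothetical in-memory stand-in for a round robin A record."""

    def __init__(self, hostname, ttl_seconds):
        self.hostname = hostname
        self.ttl_seconds = ttl_seconds
        self.addresses = []

    def register(self, ip):
        if ip not in self.addresses:
            self.addresses.append(ip)

    def deregister(self, ip):
        if ip in self.addresses:
            self.addresses.remove(ip)


def on_startup(record, ip):
    # Called from the init script's start section: join the rotation.
    record.register(ip)


def on_shutdown(record, ip, grace_multiplier=3, sleep=time.sleep):
    # Leave the rotation first, then keep serving for a while so that
    # caches holding the old IP past the TTL have time to expire.
    # ASSUMPTION: waiting a few multiples of the TTL is "long enough".
    record.deregister(ip)
    sleep(record.ttl_seconds * grace_multiplier)


record = RoundRobinRecord("www.example.com", ttl_seconds=60)
on_startup(record, "10.0.0.1")
on_startup(record, "10.0.0.2")
on_shutdown(record, "10.0.0.1", sleep=lambda seconds: None)  # skip the real wait
print(record.addresses)
```

Passing `sleep` as a parameter keeps the wait out of the way when trying the logic locally; a real init script would simply block for the full grace period before stopping the instance.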
In any case, I hope people find this client helpful when using EC2. I look forward to hearing about experiences with it and improving it for others. I’m planning for a release pretty soon to add support for RPM packaging, as most of the AMIs used today appear to be RedHat RPM based. Watch this space for updates.
WWW2007 Wrapup
Wow, that was a really great conference. I met a bunch of people there, saw a few I already knew from other fields and listened to some great talks. I wasn’t sure if I was going to feel that it was good enough to entertain thoughts of hitting next year’s conference in Beijing, but now I just might.
Also, I saw some more good talks yesterday:
Scaling Up All Pairs Similarity Search
Exposing Private Information by Timing Web Applications
On Anonymizing Query Logs via Token-based Hashing
Those are just some that were really good. I also have a stack of papers that I want to read but whose talks I missed. I’ll post the pics I took with the phone when I get off this damn airplane.