Friday, April 09, 2010

Infrastructure

If there's anything Google loves to build, it's infrastructure. Google had entire buildings' worth of machines, and lots of ways to make use of all of them. There's MapReduce, Bigtable, Blobstore, and all sorts of other distributed infrastructure. So much so that engineers frequently told me that developing without all that infrastructure would be crippling, and would slow them down too much.

The irony, of course, is that companies like FriendFeed give the lie to that statement. FriendFeed was launched in weeks! If you're not Google, you don't have to scale to billions of users right away. Existing tools can be made to be extremely scalable. For instance, even MySQL can be made to scale. The truth is, launching products at a big company takes much longer than at a startup for reasons that have nothing to do with coding. In fact, many of the engineers who made the statement above would find creative ways around missing infrastructure if they were at a startup: context is everything.

I remember attending a talk by YouTube engineers after the acquisition (this was at OSCON, so I know it's unclassified information). What impressed me was how close they always seemed to be to falling over completely. Yet they never did. Then it occurred to me that a startup should always be running at the ragged edge of what its systems can handle: to do otherwise would mean that you're not using all your resources efficiently. By contrast, Google can afford a few under-utilized machines. In addition, all that generic infrastructure has overhead. Generic cluster management software, for instance, doesn't (and can't) know enough about the overall job structure of your tasks to put compute-intensive tasks on the same machine as network-bound tasks. But a startup with a customized software stack can do that (and frequently must do so) because it doesn't have enough machines to do otherwise.
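
To make the co-location point concrete, here is a minimal sketch in Python. It assumes you already know each task's CPU and network demand as a percentage of one machine's capacity, which is exactly the job-specific knowledge a generic scheduler lacks. The names Task, Machine and place are hypothetical, made up for illustration, and this is not a description of how Google's or YouTube's systems work.

```python
from dataclasses import dataclass, field

CAPACITY = 100  # assumed: each machine offers 100% CPU and 100% network


@dataclass
class Task:
    name: str
    cpu: int  # percent of one machine's CPU
    net: int  # percent of one machine's network bandwidth


@dataclass
class Machine:
    tasks: list = field(default_factory=list)
    cpu_used: int = 0
    net_used: int = 0

    def fits(self, task):
        return (self.cpu_used + task.cpu <= CAPACITY
                and self.net_used + task.net <= CAPACITY)

    def imbalance_with(self, task):
        # How lopsided the machine would be after taking this task.
        return abs((self.cpu_used + task.cpu) - (self.net_used + task.net))

    def add(self, task):
        self.tasks.append(task)
        self.cpu_used += task.cpu
        self.net_used += task.net


def place(tasks):
    """Greedy placement: biggest tasks first, each onto the machine
    where it best balances CPU against network usage."""
    machines = []
    for task in sorted(tasks, key=lambda t: max(t.cpu, t.net), reverse=True):
        candidates = [m for m in machines if m.fits(task)]
        if candidates:
            target = min(candidates, key=lambda m: m.imbalance_with(task))
        else:
            target = Machine()
            machines.append(target)
        target.add(task)
    return machines


if __name__ == "__main__":
    jobs = [
        Task("encode-a", cpu=80, net=10),
        Task("encode-b", cpu=80, net=10),
        Task("serve-a", cpu=10, net=80),
        Task("serve-b", cpu=10, net=80),
    ]
    for i, m in enumerate(place(jobs)):
        print(i, [t.name for t in m.tasks], m.cpu_used, m.net_used)
```

With the four example jobs, each compute-heavy task ends up sharing a machine with a network-heavy one, so two machines suffice. A greedy heuristic like this is not optimal in general, but it is the kind of twenty-line tool a small team can write once it knows its own workload.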

In short, I think startups have to be very careful about building generic infrastructure just because that's the way Google did things. Google built generic infrastructure because its big problem (search) had to have massive scalability right away. Even with a single user, a search engine still has to search as much of the web as possible. But what applied to Google doesn't apply to all startups. Build only the tools you need as the need arises.