Friday, November 20, 2009

Programming Tools

Every time I think about programming tools, I get really annoyed. If you've been programming for a while, you probably started off with the basic PRINT statement as your debugging tool way back when micro-computers were too small and insufficiently powerful to run anything as sophisticated as a debugger.

When Turbo Pascal 3.0 came out for the PC, it was a revelation, at least for me. You could have a programming environment that compiled at lightning-fast speeds. But it, too, was restricted to debugging via print statements; debuggers only became available starting with Turbo Pascal 4.0.

When I got to college and had access to UNIX machines, having a debugger was a revelation. You could single-step through code, print variables, set break points (and even conditional break points), walk up and down the stack, and if you recompiled the code, you could restart the program and the debugger would automatically pick up the new binary. I got out of the habit of writing print statements.

As an intern at Geoworks, I became even more spoiled. Geoworks had an in-house debugger called swat, and the basic development environment was a SUN workstation connected to a PC via a serial cable. You would cross-compile on your SUN (using a distributed compiling environment), download the code via the serial cable to the PC, and swat would run on your workstation while talking to a debugging stub on the PC. Swat was ridiculously sophisticated; to this day, I still have not used a debugger that works as well. (The author, Adam de Boor, like most of the smart people I've ever met, now works at Google.) First, it had an extension language built into it (tcl). Second, the programmers working on GEOS had a very tool-oriented ethic: every time a new data structure was added, they would also write a swat extension that understood how the data structure was laid out in memory. This enabled you to type "heapwalk" at the swat prompt, and the debugger would then walk through memory and dump out all the data structures in human-readable form! If you had a linked list, you could tell it to walk the linked list and dump every element in it. If it was a linked list of a certain object, you could tell it to dump out the actual objects while walking through the list, rather than just dumping the pointer. Even though GEOS was written entirely in assembly (yes, even the applications; how do you think everything fit into 512KB?), it felt more sophisticated than any high level language except Lisp.
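To make the idea concrete, here is a minimal Python sketch of that kind of data-structure-aware extension. It is not swat, and it does not reproduce swat's tcl interface; the toy MEMORY table, the formatter registry, and names such as walk_list are all invented for illustration. It only shows the shape of the technique: register a formatter per type, then walk a linked list and dump the objects the nodes point at rather than the raw pointers.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for target memory: address -> raw record.
# A real debugger would read these bytes from the process being debugged.
MEMORY: Dict[int, dict] = {
    0x100: {"type": "Node", "payload": 0x200, "next": 0x110},
    0x110: {"type": "Node", "payload": 0x210, "next": 0},
    0x200: {"type": "Window", "title": "Editor", "width": 80},
    0x210: {"type": "Window", "title": "Console", "width": 40},
}

# Registry of per-type formatters, playing the role of debugger extensions.
FORMATTERS: Dict[str, Callable[[dict], str]] = {}


def formatter(type_name: str):
    """Decorator that registers a formatter for one data structure type."""
    def register(fn: Callable[[dict], str]) -> Callable[[dict], str]:
        FORMATTERS[type_name] = fn
        return fn
    return register


@formatter("Window")
def format_window(rec: dict) -> str:
    # Written alongside the (hypothetical) Window structure itself.
    return f"Window(title={rec['title']!r}, width={rec['width']})"


def dump(addr: int) -> str:
    """Render the record at addr, falling back to a bare type and address."""
    rec = MEMORY[addr]
    fmt = FORMATTERS.get(rec["type"])
    return fmt(rec) if fmt else f"<{rec['type']} at {addr:#x}>"


def walk_list(head: int) -> List[str]:
    """Follow 'next' pointers from head and dump each node's payload object."""
    out: List[str] = []
    seen = set()  # guards against a corrupted, cyclic list
    addr = head
    while addr != 0 and addr not in seen:
        seen.add(addr)
        node = MEMORY[addr]
        out.append(dump(node["payload"]))
        addr = node["next"]
    return out


if __name__ == "__main__":
    for line in walk_list(0x100):
        print(line)
```

The point of the registry is that adding a new structure means adding one small formatter next to it, which is the habit described above.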

When I graduated school and worked at Pure Software, we took a lot of pains to make sure that Purify would work with debuggers. Stack traces, etc., would work with whatever debugger you used, and variable names always remained intact. This was despite incremental linkers and other techniques that Purify applied to binaries under inspection. To this day, no other UNIX vendor or free software tool has deployed an incremental linker.

When I started having to do Windows development again, the IDEs such as Visual C++ felt like a step backwards. They had a lot of pretty visuals, but none of them were extensible, so you couldn't teach them about your new data structure, or get them to walk a list. Nevertheless, I still didn't need to write PRINT statements. When I ended up writing VxDs for a living in 1995, I had a much more primitive environment, and it was painful, but I quickly learned to abstract away most of the issues and avoid rewriting VxDs as much as possible.

Enter the internet server age, and I feel like it's 1986 again, and I might as well be programming on a PDP-11 using RSTS/E BASIC. Today, any kind of cloud programming that requires harnessing multiple machines essentially relies on RPCs. One would think that with all the knowledge we have from building old debuggers and such systems, we would be able to do things like single-step through a procedure from one machine to a remote machine, and still be able to do stack dumps, walk stack traces, and print data structures. The sad truth is, we can't. In fact, in many environments, you can barely attach a debugger to a remote process, and in some cases if you do attach a debugger and then detach it, the process immediately exits. Symbolic variable names? Thanks to C++ name mangling, I can barely decipher error messages from the compiler, let alone use a symbolic name in a debugger. Combine that with threads, remote systems, and other such setups, and pretty soon you're back to debugging using PRINT statements. You might dress it up and call it "logging" (and I know I've been guilty of doing that myself), but really, it's debugging via PRINTs, and as someone who calls himself a software engineer, whenever I put in yet another LOG statement I feel ashamed, both for myself and for my profession --- we had such beautiful tools in the 80s and 90s, but they are all wasted in the internet era. Yes, I'm well aware that people have written RPC analyzers --- but again, they're all after-the-fact analysis tools --- not nearly as useful as being able to "stop the state of the world and examine the state at leisure", which was what swat and the other tools were capable of doing.

What's responsible for this state of affairs? I think the big one is the decline of the market for programming tools. After Borland died, there was no longer an effective programming tools company with the kind of end-to-end reach needed to provide a sophisticated development environment. Microsoft all but stopped evolving its programming tools. Since it was impossible to compete against the free gdb/gcc/g++ tools (and now the free Eclipse), it became a case of "don't beat them, join them." Without end-to-end control of a development environment, it's hard to build a debugger that would do the right thing. Microsoft could probably do that for its environment, as can Apple, but neither is a powerhouse in client/server/distributed computing. Google and Yahoo could invest in their distributed debugging infrastructure, but have chosen to invest resources elsewhere. The net result: I don't feel like our programming tools have done anything but go backwards, despite all the progress we've made in other areas.