I read The Old New Thing a fair amount. It’s a blog written by a veteran systems-level guy at Microsoft. He puts up some fun articles – and I had to respond to this one. Here was his original article:
A car park in Birmingham switches from English to German in times of stress. Over a decade ago, a colleague of mine noticed an error message on the screen at the exit to the parking garage at the Seattle-Tacoma International Airport. The way the airport works, you pick up a ticket as you enter, and you pay your parking fee at vending machines stationed around the parking garage, and at the exit, you insert the (paid) ticket into the machine, which verifies that you paid your parking fee and opens the gate.
When my colleague pulled up to the machine, it had an error message. In German. Fortunately, my colleague knows German, and he recognized the error message as a Windows 95 serial port conflict resolution dialog. While he was trying to figure out how to click Abbrechen on a machine with no mouse or keyboard, an attendant walked up, took his ticket, and opened the gate.
A conversation ensued about what went wrong and the right way to fix it. Here was one guy’s response:
yuhong2: Well, as I remember, the dialog in question relates to multiple DOS virtual machines trying to access a serial port at the same time. There were several ways to virtualize a hardware device across multiple DOS VMs, and one of them was to allow only one DOS VM to access a device at a time. If two of them tried to access it at the same time, the only option was to pop up a conflict resolution dialog like this one, and Win3.x and Win9x had built-in support for doing this. More information on all this stuff can be found in the old DDK docs (like the Windows Server 2003 SP1 DDK) that had the VxD documentation.
At which point I needed to reply with the following, and got some serious ‘Amen brothers’:
@yuhong2 – your answer is totally plausible, but it shows what a lot of us do (me included): in the face of a bug like this, we just keep jumping down the engineering rat/rabbit hole without stopping to ask the more fundamental question of whether we’re even on the right track with our architecture. I’ve learned to stop looking down those holes and to try to smell them coming before I start. Trust me – there IS no bottom. And more importantly, when you’re spending your time looking off the cliff, you’re not looking at your goal anymore.
After being a software engineer for 10 years on major projects, I’ve learned this one lesson about code usability: if you’re using more than 3-4 sentences your mom can’t understand to describe a fix or architecture, something has likely gone, or soon will go, very wrong.
While I completely understand what yuhong2 is saying, and it’s very plausible and intelligent, if it’s true, it shows me that someone made a terrible choice of platform/architecture when choosing how to solve this problem of a parking meter. I’ve worked with more than a few tech leads who come up with horrendously complicated algorithms and architectures to solve problems that could be solved MUCH more easily. And you know what? The easier solution, while perhaps not the fastest-running or coolest-looking, is almost always the quickest to get working and the best long-term. Why? Because those complex architectures have even more complex problems. Case in point: yuhong2’s answer.
As the project grows, it just gets worse and worse – not better. Until you need a PhD to figure out why some part of your threaded, interconnected data structures is hanging or getting corrupted once every 30 days. This should be a moment to stop and ask yourself what you’re really doing. Sure, certain problems really do require complex solutions (e.g. 3D graphics, threaded applications and drivers – some of what I do), but know your tools and their strengths and limitations (the hardware platform, the language you’re using, the software/OS stack, etc.). Yes, you darn well better have a good toolbox of the latest algorithms, languages and OS info, but your toolbox will do nothing but get you into trouble if you don’t know how and when (or when NOT) to really use those tools. If yuhong2 is right, you can see how the complex solution just backed them into a corner, and it shows that someone probably picked the wrong stack or algorithmic approach when a very simple box that checked times would have worked.
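To give a feel for how little that "very simple box" needs to do, here is a minimal sketch in Python. Everything in it is my own assumption for illustration: the `Ticket` fields, the `may_exit` name, and the 20-minute grace period between paying and reaching the gate. It is not how the real garage works, just the whole decision boiled down to a comparison of times.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical grace period between paying at the vending machine
# and arriving at the exit gate.
EXIT_GRACE = timedelta(minutes=20)


@dataclass
class Ticket:
    entered_at: datetime
    paid_at: Optional[datetime] = None  # None means the fee was never paid


def may_exit(ticket: Ticket, now: datetime) -> bool:
    """Open the gate only if the ticket was paid within the grace period."""
    if ticket.paid_at is None:
        return False
    return now - ticket.paid_at <= EXIT_GRACE
```

No virtual machines and no shared serial ports: one ticket in, one yes/no answer out.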
I have worked long enough to also know that many times platform choices are out of your control as an engineer – budgeting and marketing often limit you. But if you’re forced to use a platform, design so as to avoid the limitations of that platform – don’t try to force it into doing what you want. As a rule of thumb, always stick with the simplest solution that completely solves the problem first; then you’ll be rewriting to solve performance and feature issues instead of core this-thing-doesn’t-even-work-yet problems. At worst, I’ll have a working system that’s slow, and I can then optimize and redesign the parts that need it. I won’t have wasted time optimizing things that may not have needed any help. Now the caveat is that you really must know what you’re doing and why you’re making those choices for simplicity. It’s just as bad to stupidly pick an architecture that’s too simple for the problem and backs you into a corner just as badly. But I’ll argue a working, slow system will always sell and ship before one that’s 10 months late and MIGHT be faster. Simplicity also makes the code more maintainable and much more extensible long term, as there’s less inherent inertia in the code to move about.
Software is like using clay to make art. Sure, you can beat and force and manipulate it to create very complex structures – but you start making more and more complex problems for yourself, e.g. if you try to build a car engine from clay. If you work *with* the clay/code’s natural strengths and don’t force it to do what it isn’t good at, however, you end up with a beautiful thing that’s simple, works well for its intended use, and is very easy to maintain (a clay bowl).
Another guy summarized it even better:
For any sufficiently complicated system, there will always be failure modes where the average Joe has but two options: call an expert to fix it, or arrange his life to not use that system. In the case of the parking garage, Joe’s only rational response is to use a different exit lane.
Computer interface design goes wrong when the programmer refuses to admit that the system is in such a state. The parking garage’s system should have displayed “Closed – use a different lane”, with the error message shown on a maintenance screen.
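That split between what the driver sees and what the maintenance crew sees is easy to sketch. The Python below is only an illustration under my own assumptions: `customer_display`, `process_ticket` and the `garage.maintenance` logger name are hypothetical, and a real system would route the log to an actual maintenance screen. The point is that any failure, however exotic, collapses to one plain message for the customer while the details go somewhere an expert will look.

```python
import logging

# Hypothetical channel standing in for the maintenance screen.
maintenance_log = logging.getLogger("garage.maintenance")

CLOSED_MESSAGE = "Closed – use a different lane"


def customer_display(process_ticket, ticket):
    """Return the text to show on the customer-facing screen.

    process_ticket is any callable that handles the ticket and returns
    the normal customer message; it may raise on internal faults.
    """
    try:
        return process_ticket(ticket)
    except Exception:
        # Full traceback goes to maintenance, never to the customer.
        maintenance_log.exception("Exit lane fault; lane marked closed")
        return CLOSED_MESSAGE
```

With this shape, a serial port conflict and a disk failure look identical to Joe, which is exactly what he needs.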
This echoes some of the great wisdom I’ve been learning from this great book:
I believe it should be on every programmer’s shelf. I’ll do some more postings as I get through it.