The Myth of Order

The real lesson of Y2K is that software operates just like any natural system: out of control.

By Ellen Ullman

From the April 1999 issue of Wired Magazine, available online at: http://www.wired.com/wired/archive/7.04/

Y2K has uncovered a hidden side of computing. It's always been there, of course, and always will be. It's simply been obscured by the pleasures we get from our electronic tools and toys, and then lost in the zingy glow of techno-boosterism. Y2K is showing everyone what technical people have been dealing with for years: the complex, muddled, bug-bitten systems we all depend on, and their nasty tendency toward the occasional disaster.

It's almost a betrayal. After being told for years that technology is the path to a highly evolved future, it's come as something of a shock to discover that a computer system is not a shining city on a hill - perfect and ever new - but something more akin to an old farmhouse built bit by bit over decades by nonunion carpenters.

The reaction has been anger, outrage even - how could all you programmers be so stupid? Y2K has challenged a belief in digital technology that has been almost religious. But it's not surprising. The public has had little understanding of the context in which Y2K exists. Glitches, patches, crashes - these are as inherent to the process of creating an intelligent electronic system as is the beauty of an elegant algorithm, the satisfaction of a finely tuned program, the gee-whiz pleasure of messages sent around the world at light speed. Until you understand that computers contain both of these aspects - elegance and error - you can't really understand Y2K.

"Bugs are an unintended source of inspiration. Many times I've seen a bug in a game and thought, 'That's cool - I wouldn't have thought of that in a million years.'"
- Will Wright, creator of SimCity and chief game designer at Maxis

"I've fixed about 1,000 bugs in my life. How many have I created? Undoubtedly more."
- Patrick Naughton, executive vice president of products, Infoseek

Technically speaking, the "millennium bug" is not a bug at all, but what is called a design flaw. Programmers are very sensitive to the difference, since a bug means the code is at fault (the program isn't doing what it was designed to do), and a design flaw means it's the designer's fault (the code is doing exactly what was specified in the design, but the design was wrong or inadequate). In the case of the millennium bug, of course, the code was designed to use two-digit years, and that's precisely what it's doing. The problem comes if computers misread the two-digit numbers - 00, 01, et cetera. Should these be seen as 1900 and 1901, or as 2000 and 2001? Two-digit dates were used originally to save space, since computer memory and disk storage were prohibitively expensive. The designers who chose to specify these two-digit "bugs" were not stupid, and perhaps they were not even wrong. By some estimates, the savings accrued by using two-digit years will have outweighed the entire cost of fixing the code for the year 2000.

But Y2K did not even begin its existence as a design flaw. Up until the mid-1980s - almost 30 years after two-digit years were first put into use - what we now call Y2K would have been called an "engineering trade-off," and a good one. A trade-off: To get something you need, you give up something else you need less urgently; to get more space on disk and in memory, you give up the precision of the century indicators. Perfectly reasonable. The correct decision.
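The arithmetic of that trade-off is easy to see in miniature. Below is a minimal sketch - in Python rather than the Cobol of the era - of how two-digit years behave perfectly within a single century and go wrong at the rollover; the "windowing" repair at the end, and its pivot value of 50, are illustrative assumptions, not anything the original designers specified.

    # A minimal sketch of the two-digit-year trade-off, in Python rather than
    # the Cobol of the era. The pivot value of 50 below is an illustrative
    # choice, not a standard prescribed by any of the original designs.

    def age_in_years(birth_yy: int, current_yy: int) -> int:
        """Naive arithmetic on two-digit years: fine so long as both fall in one century."""
        return current_yy - birth_yy

    def widen_year(yy: int, pivot: int = 50) -> int:
        """A 'windowing' repair: read 00-49 as 20xx and 50-99 as 19xx."""
        return 2000 + yy if yy < pivot else 1900 + yy

    print(age_in_years(65, 99))                  # 34  - correct in 1999
    print(age_in_years(65, 0))                   # -65 - at the rollover, a 35-year-old turns "minus 65"
    print(widen_year(99), widen_year(0))         # 1999 2000 - the window restores the century
    print(widen_year(0) - widen_year(65))        # 35  - and the arithmetic works again

The saving is real: two digits per date, across every record on every tape and disk, bought with nothing more than the assumption that the century would stay put.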
The surest sign of its correctness is what happened next: Two-digit years went on to have a long, successful life as a "standard." Computer systems could not work without standards - an agreement among programs and systems about how they will exchange information. Dates flowed from program to program, system to system, from tape to memory to paper, and back to disk - it all worked just fine for decades.

Though not for centuries, of course. The near immortality of computer software has come as a shock to programmers. Ask anyone who was there: We never expected this stuff to still be around.

Bug, design flaw, side effect, engineering trade-off - programmers have many names for system defects, the way Eskimos have many words for snow. And for the same reason: They're very familiar with the thing and can detect its fine gradations. To be a programmer is to develop a carefully managed relationship with error. There's no getting around it. You either make your accommodations with failure, or the work will become intolerable. Every program has a bug; every complex system has its blind spots. Occasionally, given just the right set of circumstances, something will fail spectacularly. There is a Silicon Valley company, formerly called Failure Analysis (now Exponent), whose business consists of studying system disasters. The company's sign used to face the freeway like a warning to every technical person heading north out of Silicon Valley: Failure Analysis.

No one simply accepts the inevitability of errors - no honest programmer wants to write a bug that will bring down a system. Both engineers and technical managers have continually looked for ways to normalize the process, to make it more reliable, predictable - schedulable, at the very least. They have talked perennially about certification programs, whereby programmers would have to prove minimal proficiency in standard skills. They have welcomed the advent of reusable software components, or "objects," because components are supposed to make programming more accessible, a process more like assembling hardware than proving a mathematical theorem. They've tried elaborate development methodologies. But the work of programming has remained maddeningly undefinable, some mix of mathematics, sculpting, scrupulous accounting, and wily, ingenious plumbing.

In the popular imagination, the programmer is a kind of traveler into the unknown, venturing near the margin of mind and meatspace. Maybe. For moments. On some extraordinary projects, sometimes - a new operating system, a newly conceived class of software. For most of us, though, programming is not a dramatic confrontation between human and machine; it's a confused conversation with programmers we will never meet, a frustrating wrangle with some other programmer's code.

"January 1 is a Saturday. So if the world comes to an end for a couple of days, it'll be OK. We've all had weekends like that."
- Reed Hundt, former FCC chair

"One guy in our office keeps a wooden head at the top of his cube - the God of Debugging. He makes offerings to it daily."
- Maurice Doucet, director of engineering at MetaCreations

Most modern programming is done through what are called application programming interfaces, or APIs. Your job is to write some code that will talk to another piece of code in a narrowly defined way using the specific methods offered by the interface, and only those methods. The interface is rarely documented well.
The code on the other side of the interface is usually sealed in a proprietary black box. And below that black box is another, and below that another - a receding tower of black boxes, each with its own errors. You can't envision the whole tower, you can't open the boxes, and what information you've been given about any individual box could be wrong. The experience is a little like looking at a madman's electronic bomb and trying to figure out which wire to cut. You try to do it carefully but sometimes things blow up.

At its core, programming remains irrational - a time-consuming, painstaking, error-stalked process, out of which comes a functional but flawed piece of work. And it most likely will remain so as long as we are using computers whose basic design descends from Eniac, a machine constructed to calculate the trajectory of artillery shells. A programmer is presented with a task that a program must accomplish. But it is a task as a human sees it: full of unexpressed knowledge, implicit associations, allusions to allusions. Its coherence comes from knowledge structures deep in the body, from experience, memory. Somehow all this must be expressed in the constricted language of the API, and all of the accumulated code must resolve into a set of instructions that can be performed by a machine that is, in essence, a giant calculator. It shouldn't be surprising if mistakes are made.

There is irrationality at the core of programming, and there is irrationality surrounding it from without. Factors external to the programmer - the whole enterprise of computing, its history and business practices - create an atmosphere in which flaws and oversights are that much more likely to occur.

The most irrational of all external factors, the one that makes the experience of programming feel most insane, is known as "aggressive scheduling." Whether software companies will acknowledge it or not, release schedules are normally driven by market demand, not the actual time it would take to build a reasonably robust system. The parts of the development process most often foreshortened are two crucial ones: design documentation and testing. I recently went to a party where a senior consultant - a woman who has been in the business for some 30 years, someone who founded and sold a significant software company - was explaining why she would no longer work with a certain client. She had presented a software development schedule to the client, who received it, read it, then turned it back to her, asking if she'd remake the schedule so that it took exactly half the time. There were many veteran programmers in the room; they nodded along in weary recognition.

Even if programmers were given rational development schedules, the systems they work on are increasingly complex, patched together - and incoherent. Systems have become something like Russian nesting dolls, with newer software wrapped around older software, which is wrapped around software that is older yet. We've come to see that code doesn't evolve; it accumulates. A young Web company founder I know - very young; Scott Hassan of eGroups.com - suggests that all programs should be replaced every two years. He's probably right. It would be a great relief to toss all our old code into that trash container where we dumped the computer we bought a couple of years ago.
Maybe on the Web we can constantly replenish our code: The developer never lets go of the software; it sits there on the server available for constant change, and the users have no choice but to take it as it comes. But software does not follow Moore's Law, doubling its power every 18 months. It's still the product of a handworked craft, with too much meticulous effort already put into it. Even eGroups.com, founded only nine months ago, finds itself stuck with code programmers have no time to redo. Said Carl Page, another of its founders, "We're living with code we wish we'd done better the first time."

"Debugging had to be discovered. I can remember the exact instant when I realized that a large part of my life from then on was going to be spent finding mistakes in my own programs."
- Maurice Wilkes, creator of the Edsac and consultant to Olivetti Research Lab

"Trust the computer industry to shorten 'Year 2000' to 'Y2K.' It was this kind of thinking that caused the problem in the first place."
- anonymous Net wisdom

The problem of old code is many times worse in a large corporation or a government office, where whole subsystems may have been built 20 or 30 years ago. Most of the original programmers are long gone, taking their knowledge with them - along with the programmers who followed them, and ones after that. The code, a sort of palimpsest by now, becomes difficult to understand. Even if the company had the time to replace it, it's no longer sure of everything the code does. So it is kept running behind wrappers of newer code - so-called middleware, or quickly developed user interfaces like the Web - which keeps the old code running, but as a fragile, precious object. The program runs, but is not understood; it can be used, but not modified. Eventually, a complex computer system becomes a journey backward through time. Look into the center of the most slick-looking Web banking site, built a few months ago, and you're bound to see a creaky database running on an aged mainframe.

Adding yet more complexity are the electronic connections that have been built between systems: customers, suppliers, financial clearinghouses, whole supply chains interlinking their systems. One patched-together wrapped-up system exchanges data with another patched-together wrapped-up system - layer upon layer of software involved in a single transaction, until the possibility of failure increases exponentially.

It's from deep in there - somewhere near the middle-most Russian doll in the innermost layer of software - that the millennium bug originates. One system sends it on to the next, along with the many bugs and problems we already know about, and the untold numbers that remain to be discovered. One day - maybe when we switch to the new version of the Internet Protocol, or when some router somewhere is replaced - one day the undiscovered bugs will come to light and we'll have to worry about each of them in turn. The millennium bug is not unique; it's just the flaw we see now, the most convincing evidence yet of the human fallibility that lives inside every system.

It's hard to overstate just how common bugs are. Every week, the computer trade paper InfoWorld prints a little box called "The Bug Report," showing problems in commonly used software, some of them very serious.
And the box itself is just a sampling from www.bugnet.com, where one day's search for bugs relating to "security" yielded a list of 68 links, many to other lists and to lists of links, reflecting what may be thousands of bugs related to this keyword alone. And that's just the ones that are known about and have been reported.

If you think about all the things that can go wrong, it'll drive you crazy. So technical people, who can't help knowing about the fragility of systems, have had to find some way to live with what they know. What they've done is develop a normal sense of failure, an everyday relationship with potential disaster. One approach is to ignore all thoughts about the consequences - to stay focused on the code on your desk. This is not that difficult to do, since programmers get high rewards for spending large amounts of time in front of a computer workstation, where they're expected to maintain a very deep and narrow sort of concentration.

A few months ago, I talked to a systems programmer who'd barely looked over the top of his cubicle for 30 years. He'd spent half that time working in the Federal Reserve System, backbone of the world banking order everyone fears will collapse come the millennium. But until he joined the Fed's Y2K project, he had never much considered the real-world effects of his work. "I read an article about how the Federal Reserve would crash everything if it went bad," said the man I'll call Jim Fuller, who agreed to talk only on condition of anonymity. "It was the first time in my life I understood everything the Federal Reserve did." He'd taken a rare look up and down the supply chain; the job of fixing Y2K in the context of an enormous, linked economic machine was now a task that stretched out in all directions far beyond his control. It scared him. "I discovered we were kind of important," he said uneasily.

If you can't stay focused on your code, another approach is to develop an odd sort of fatalism, a dark, defensive humor in the face of all the things you know can go wrong. Making fun of bugs is almost a sign of sophistication. It shows you know your way around a real system, that you won't shy away when things really start to fall apart. A friend of mine once worked as a software engineer at a Baby Bell. He liked to tell people how everyone in the company was amazed to pick up a handset and actually get a dial tone. It was almost a brag: Ha ha, my system's so screwed up you wouldn't believe it.

Now here comes a problem that's no joke. Technical people can't help hearing about the extreme consequences that will come down on the world if they don't find all the places Y2K is hiding. And they simultaneously know that it is impossible to find all the problems in any system, let alone in ones being used long beyond their useful life spans. Programmers feel under siege, caught between the long-standing knowledge of error and fragility they've learned to live with, and the sudden, unrealistic pressure to fix everything.

"To paraphrase Mark Twain, the difference between the right program and almost the right program is like the difference between lightning and a lightning bug. The difference is just a bug."
- Danny Hillis, in The Pattern on the Stone (1998)

"I am one of the culprits who created the problem. I used to write those programs back in the '60s and '70s, and was so proud of the fact that I was able to squeeze a few elements of space by not having to put '19' before the year."
- Alan Greenspan, Federal Reserve chair

"Y2K is a sort of perverse payback from the universe for all the hasty and incomplete development efforts over the last 10 years," said the Y2K testing lead for a midsize brokerage. Also speaking on condition of anonymity, Lawrence Bell (a pseudonym) said it like an I-told-you-so, a chance for him to get back at every programmer and programming manager who ever sent him junky software.

Bell is a tall, impeccably groomed young man whose entire workday consists of looking for bugs. He's in QA, quality assurance, the place where glitches are brought to light, kept on lists, managed, prioritized, and juggled - a complete department devoted to bugs. He has the tester's crisp manner, the precision of the quality seeker, in whom a certain amount of obsessive fussiness is a very good thing. Since Bell doesn't write code, and can't just concentrate on the program on his desk, he has no alternative but to affect a jaunty, fake cheer in the face of everything that can go wrong. "We have systems that have been developed in, shall we say, an 'uncontrolled' manner," he said.

The systems he's responsible for testing are classic journeys through time: new systems on Windows NT with graphical user interfaces, Unix relational databases on the sturdy client-server systems of the late '80s, command-line interfaces that were in vogue in the late '70s and early '80s, all the way back to an IBM midrange computer running programs "that nobody thinks about," said Bell, but "have to run or we're in trouble."

Bell's team is doing what they call "clean management": testing everything for Y2K problems, whether or not they suspect it has a date-related problem. In the course of it, as they go backward in time, they're coming across systems that have never been formally tested. "There was a day when things did not go through QA," said Bell, as if he were talking about another century. All this time, the untested systems have been out there, problems waiting to happen. "We find all sorts of functional bugs," he said affably. "Not Y2K. Just big old bugs."

Bell had all the complaints testers always have. Missing source code. No documentation. Third-party software vendors who won't give them information. Not enough people who know how the systems were put together. Users who won't take the time to explain how they work with the system. And what he calls the "ominous task" of fixing one of the oldest, least documented systems - the crucial trade-clearing system running on the IBM machines. "If one of the midrange computers goes down for a day, we're out of business without our backups," he said.

Still, quality assurance is the one place where the muddled side of computing is obvious, predominant, inescapable. Bell, as a good QA guy, is mostly inured to it all. "Come the year 2000, a couple of systems will fail," he said nonchalantly. "But that's what happens with any implementation. It's the same thing we've been doing for years." For Bell, it's no big deal that supposedly Y2K-compliant programs will be put into users' hands without thorough testing. He's comfortable with the idea that things can go very, very wrong and still not bring about the end of the world. Said Bell with a shrug, "It's just a big user test."

"We used to have 'bugs for bucks' prizes, because toward the end of debugging, the bugs get hard to find. We'd add $10 to the prize for each bug found. But then people would hold off reporting one until the price went up. It was an underground economy in bug reporting."
- Heidi Roizen, former VP of developer relations at Apple

The only thing about Y2K that was really bothering Lawrence Bell was the programmers. There is a classic animosity between programmer and tester - after all, the tester's role in life is to find everything the programmer did wrong. But Y2K and its real-world time pressures seem to have escalated the conflict. Bell thought that QA would manage - "it won't be pretty but we'll do it" - but no thanks to the programmers who developed the applications. "The application folks are never there," said Bell, deeply annoyed. "We're not getting analysis from the developers - it's really absurd."

The source of the hostility is documentation: Programmers are supposed to make a record of the code they've written. Documentation is how QA people know what the system is supposed to do, and therefore how to test it. But programmers hate to write documentation, and so they simply avoid doing it. "The turnover is high," said Bell, "or the programmers who have been here a long time get promoted. They don't want to go back to this project they wrote 10 years ago - and get punished for not documenting it."

Programmers have fun and leave us to clean up their messes, is Bell's attitude. They want to go off to new programs, new challenges, and the really annoying thing is, they can. "They say, 'I want to do something new,'" said Bell, truly angry now, "and they get away with it."

"No more programmers working without adult supervision!"

This was declaimed by Ed Yardeni, chief economist for Deutsche Bank Securities, before a crowded hotel ballroom. On the opening day of the Year 2000 Symposium, August 10, 1998 (with cameras from 60 Minutes rolling), Yardeni explained how the millennium bug would bring about a world recession on the order of the 1973-74 downturn, and this would occur because the world's systems "were put together over 30 to 40 years without any adult supervision whatsoever." Blame the programmers. The mood at the conference was like that of a spurned lover: All those coddled boys in T-shirts and cool eyewear, formerly fetishized for their adolescent ways, have betrayed us.

It has become popular wisdom to say that Y2K is the result of "shortsightedness." It's a theme that has been taken up as a near moral issue, as if the people who created the faulty systems were somehow derelict as human beings.

In fact, some of the most successful and long-lived technologies suffer from extreme shortsightedness. The design of the original IBM PC, for example, assumed there would never be more than one user, who would never be running more than one program at a time, which would never see more than 256K of memory. The original Internet protocol, IP, limited the number of server addresses it could handle to what seemed a very large number at the time, never imagining the explosive growth of the Web.

I once worked on a Cobol program that had been running for more than 15 years. It was written before the great inflation of the late 1970s. By the time I saw it, in 1981, the million-dollar figure in all dollar amounts was too large for the program's internal storage format, and so multiple millions of dollars simply disappeared without a trace.

We are surrounded by shortsighted systems. Right at this moment, some other program is surely about to burst the bounds of its format for money or number of shares traded or count of items sold.
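That kind of silent loss is easy to reproduce in miniature. Here is a rough sketch - in Python, with a six-digit field width assumed purely for illustration, since the real Cobol program's record layout is unknown - of how a fixed-width numeric field sheds its high-order digits without raising any error, so the millions simply never appear.

    # A rough sketch of how a fixed-width numeric field loses money silently.
    # The six-digit width is an assumption for illustration; the actual
    # program's internal storage format is long gone.

    FIELD_WIDTH = 6  # six digits: whole-dollar amounts up to $999,999

    def store_amount(dollars: int) -> str:
        """Write a dollar amount into a fixed-width character field.
        High-order digits that don't fit are simply cut off - no error is raised."""
        digits = str(dollars)
        return digits[-FIELD_WIDTH:].rjust(FIELD_WIDTH, "0")

    def read_amount(field: str) -> int:
        """Read the field back as an integer number of dollars."""
        return int(field)

    print(read_amount(store_amount(750_000)))     # 750000 - fits, round-trips intact
    print(read_amount(store_amount(3_250_000)))   # 250000 - three million dollars gone, no trace

The format was generous when it was designed; inflation, growth, and sheer longevity did the rest.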
The Dow Jones Industrial Average will one day break 10,000, the price of gas will top $9.99, the systems we're renovating now may live long enough to need renovation again. Some system designer, reacting to the scarce computer resource of our day - not memory but bandwidth - is specifying a piece of code that we will one day look back on as folly.

At the Year 2000 Symposium where Yardeni spoke, there was a technical workshop about creating a "time machine" - a virtual time environment for testing "fixed" Y2K programs. One of the presenters, Carl Gehr of the Edge Information Group, patiently explained that, when designing the test environment, "you have to specify an upper limit" for the year. While everyone scribbled notes, an awful thought occurred to me. "But what upper limit?" I said out loud. "Should we be worrying about the year 9000? 10,001?"

Gehr stopped talking, heads came up from their notes, and the room went quiet. It was as if this were the first time, in all the rush to fix their systems, the attendees had been able to stop, reflect, think about a faraway future. Finally, from the back of the room came a voice: "Good question."

Gehr glanced over at his colleague, Marilyn Frankel, who was waiting to talk about temporary "fixes" for Y2K-affected code. "Marilyn will address that later, I'm sure," he said.

---------------------------------------------------------

Ellen Ullman (ullman@well.com) is the author of Close to the Machine, a memoir that draws on her 20 years as a programmer. She is a frequent contributor to Harper's, Salon, and NPR.