Rush to Disaster, How to hose IT all up or down 101
Upgrade disasters made easy
We've got a huge number of upgrades to do, so let's do them all at the same time!
I’m the technical administrator at a large medical group in Canada. Among other things,
I’m responsible for the LAN, the WAN, all the desktops, laptops, peripherals, and a
medical-records application that’s at the core of our group’s operations. Over the last
couple of years, we’ve been struggling to make that app perform more reliably. At the
same time, our infrastructure has been growing fast, and sluggish performance from our
overloaded servers has become a problem.
We developed quite a wish list:
- A two-generation upgrade of the medical-records application, an app that’s critical
to the health and well-being of our patients;
- A two-generation upgrade of our revenue-critical Practice Management application;
- Moving our datacenter from an overcrowded, in-house facility to a third-party hosted center;
- Installing bigger, faster servers for in-house use;
- Upgrading our connectivity software from RDC/Terminal Server to Citrix ICA;
- Deploying server virtualization and SAN (storage area network) technology;
- Upgrading Windows Server, SQL Server, and software that supports printing across the WAN;
- Buying new scanning software to interface with the upgraded medical-records app.
OK, there were issues. Our CIO had no experience with Citrix ICA … or server virtualization
… or SANs. But we figured if we moved through this process one step at a time, everything would
be fine.
Then, in the infinite wisdom of the powers on high, our CIO got approval to make all these
changes simultaneously, during a one-week roll-out! I warned the CIO that the proverbial snowball
in hell would have a better chance of success than this ill-advised rush to disaster. But he
assured me that his staff would be able to handle any problems.
The roll-out started on Friday evening. By Saturday morning we were crippled. The only
functional apps were an old Exchange server and the staff time-clock software running on
the old server. The CIO and his staff of help-desk technicians had reserved Sunday for
testing. But nothing was working, and the level of chaos was so intense nobody knew what
to test first.
Any first-term computer science student would have known that a more incremental, staged
transition would have had a better chance of success. For much of the week the transition
team threw money at Microsoft, Citrix, our medical-records application vendor, an IT consulting
group, and the new datacenter hosting company, trying to isolate, identify, and repair the vast
number of issues that had arisen. The CIO spent the week hiding from a small army of inside
staffers and outside consultants, all of whom were waiting in line to call him bad names.
By 5 p.m. Thursday things were beginning to work, and by midday Friday I was reasonably
confident that we could care for our patients. I had been worried that a patient might
die because of the deranged state of our systems. But we got lucky, and no one suffered physical
harm. On the other hand, my best guess is that during the course of this week we spent roughly
twice the half-million dollars that had been allocated for the upgrade.
To my surprise, the CIO hasn’t been fired. Maybe that’s because HR can’t find his records.