I was just bitten by this when del.icio.us wasn't responding. Suddenly I couldn't look up that how-to I was working from last week.
Networked applications are great. They let us all communicate and collaborate better, they centralize things, and we're all more productive. The problem is that when networked applications fail, they tend to fail catastrophically. If the network is down, no one can work.
The problem really arose when we started getting always-on internet connections with good bandwidth. Before our networks were reliable and fast, things were designed for intermittent communication. Tools like FTP and CVS are designed to create a local copy for you to work on and then upload to the server when it's convenient. Now the bandwidth is high enough that we can just keep our documents on the server all the time, be they file shares or Google Spreadsheets. In the old model, if the network went down everyone could keep working on their local copies. Now when the network goes down, no one can work.
For this reason it makes a lot of sense to have backups, both of data and of procedures, for when your web applications and servers fail. Who do you talk to in order to submit that purchase order? Who will keep track of changes to a collaborative document? How will you get your files from one system to another? Some suggestions:
- Designate an individual to be in charge of each process, such as documents or purchasing or code check-ins, so that there is a central point of contact when the system stops.
- For web forms and submissions, make sure you have paper backups, that people know where they are, and that there is a supply on hand to last at least a day.
- Use technologies where users keep a local cache such as AFS and SVN instead of having all files reside on the server.
- Never entirely replace something with a networked counterpart. Leave the previous version around so people can roll back to something familiar when the new system fails.
- Make sure you have out-of-band communication with your systems. This is usually a keyboard and monitor or a serial connection to the system. If you get in the habit of SSHing into everything, you'll have a lot of trouble fixing a machine after you accidentally assign it an IP address of 192.168.1. Yes, that's three octets, and yes, I've done that when trying to remotely roll over a firewall while out traveling.
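
The three-octet mistake in that last item is the kind of thing a quick sanity check can catch before the change goes live. Here is a minimal sketch in Python, assuming you run it yourself before applying a new address. The function name `is_valid_ipv4` is made up for illustration; it leans on the standard library's `ipaddress` module, which rejects anything that isn't a full four-octet address.

```python
import ipaddress


def is_valid_ipv4(text: str) -> bool:
    """Return True only if text is a complete dotted-quad IPv4 address.

    Hypothetical helper for illustration; not part of any firewall tool.
    """
    try:
        ipaddress.IPv4Address(text)
    except ipaddress.AddressValueError:
        # Raised for too few or too many octets, out-of-range values, etc.
        return False
    return True


if __name__ == "__main__":
    for candidate in ("192.168.1", "192.168.1.1"):
        status = "ok" if is_valid_ipv4(candidate) else "REJECTED"
        print(f"{candidate}: {status}")
```

This only checks that the address is well formed. It won't tell you whether the address is the right one for that network, so it doesn't replace having a console or serial line to fall back on.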