back up, but not from backup
You *may* have noticed, this server was down for almost two days. What's up?
Well, the root server formerly known as server-wg.de officially died sometime around June 25th, 1am Nürnberg time. It had gone down around June 24th, 9am, and wouldn't restart. Several calls to tech support and some hours spent in rescue mode (booting into a RAM drive and mounting the HD afterwards for diagnosis) finally led to the discovery that said HD would not even be detected anymore. Some hours and more phone calls to tech support later, the HD was replaced. All seemed well, although this meant a fresh OS install with none of the old configurations or data, all lost with the old HD. So happily I started down the glorious road of setting the system up again from scratch. First I had to replace several critical software components (apache, qmail etc) with current versions before I could even think of configuring it to the state it had been in before (user accounts etc).
While it was happily compiling apache2 from source, I suddenly got I/O errors and it stopped reading from or writing to its HD. Again a call to tech support and I was back in rescue mode. No HD visible! At this point tech support decided this machine (which dates from 2002 and has been online ever since) was beyond recovery. I decided it was time to go to bed.
Around 11am on the 25th I called tech support about the status. Good news! They had decided to give me a complete hardware replacement, which also meant I would get almost current RAM and CPU grades (the old machine had a 900MHz CPU with 256MB of RAM).
Later that afternoon the new machine came online and I logged on as root for the first time. Surprise, surprise: they had initialized it with SuSE 10.2, the default OS install for puretec.de root servers. Now that is a current Linux OS, only not one I really care for (SuSE does things a bit differently from Debian or Ubuntu, which I am more familiar with).
So after some deliberating and poking around, I decided to wipe the OS and start from a clean minimal Ubuntu install (which they offer as an option, but not as default)... the re-initialisation was done painlessly over the web interface of puretec.de, and the only irksome thing about it is that you hit 'start', see your server go down, and then you can only WAIT. You do not know how long this will take, you can just guess. One hour minimum, but you will only know it failed if the server doesn't come back after a sufficient amount of time. Depending on the size of your install that could be anywhere from one to *several* hours...
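In hindsight, the waiting could at least have been automated. Here is a minimal sketch in Python of what I mean: it simply tries to open a TCP connection to the SSH port every so often until the server answers or a deadline passes. The function names, the port (22), the three hour limit and the one minute interval are all my own assumptions for illustration, nothing puretec.de provides.

```python
import socket
import time


def port_is_open(host, port, connect_timeout=5.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=connect_timeout):
            return True
    except OSError:
        # Refused, unreachable or timed out: the server is not back yet.
        return False


def wait_for_server(host, port=22, give_up_after=3 * 3600, interval=60.0):
    """Poll host:port until it answers or give_up_after seconds have passed.

    The defaults (SSH port, 3 hours, 1 minute) are assumed values.
    """
    deadline = time.monotonic() + give_up_after
    while True:
        if port_is_open(host, port):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_for_server("example.org")` (a placeholder host name) would return `True` as soon as SSH answers again, and `False` if the deadline passes first, which is about as close to "it failed" as you can get from the outside.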
But after about 1h the machine came back online and I was presented with a clean and almost empty Ubuntu 6.06 LTS minimal install.
The 'LTS' means Long Term Support, which means you will get updates for this base system for a long time. In the case of 6.06 that long time ends 12-2008, as the system has been around for quite a bit... and the new LTS, which will be supported until 2010, came out in April. So I thought "before I do anything, let's upgrade this one to the 8.04 LTS." Which is actually possible to do in one step... if you don't foul up on some of the choices the process gives you. Apparently that's what I did, as after the final reboot (it's about midnight by now) the machine wouldn't come back up.
Back to the by now well-known procedure of calling tech support to set the machine into rescue mode, scanning the HD etc... the machine looked ok, so I guess I misconfigured something important during the upgrade.
Ok, hard choices: I did the re-init once more and returned to the 6.06 Ubuntu minimal system. Again the one hour wait, not knowing if it would come back at all. (Remember, this is around 1am now, and I hadn't slept much since Tuesday...)
Around 2:30am the machine finally came back online and I was able to log in as root.
A clean fresh and minimal Linux server with nothing on it.
First order of business would be to set up the mailserver and re-create the mail and user accounts for the various people who had been without mail for almost two days now. As this is a *modern* Linux now, my recently acquired sendmail expertise seems inappropriate (sendmail is an ancient monster of Unix software you only use if your system supports nothing else, like the old one)
Having had the SuSE re-install on the ill-fated replacement HD for almost a day, I had been reading up on qmail configuration (that's what SuSE now uses)... but Ubuntu doesn't use qmail - and talking to several knowledgeable friends I quickly learned why (it's at least as bad as sendmail when it comes to configuration, and hasn't been updated in ages) - but rather it uses postfix! So here I was, needing to get an existing group of users set up with accounts and mail services again *quickly* AND also needing to learn a new set of skills (how postfix is configured)...
In the end I am very happy with the switch to postfix. Not only is it the mail software supported by Ubuntu out of the box, but it's actually very easy and clear to configure. The biggest challenge was to map my old sendmail setup to the workings of postfix, which is really a matter of translating one language to another. God bless the fact that postfix speaks a much clearer and saner language :) I got it all working in 'no time' and I even found a greylisting plug-in that actually worked out of the box!
Actually setting up the webserver and putting the various websites back where they belong was a piece of cake. Lucky me I had decided to do this last... it was now around 9am the next morning and I really have no idea how I managed to stay awake through the upload process...
So now it's almost all back up, rebuilt from scratch. Some missing parts here and there, and maybe one day of actual emails lost.
Would actual backups have helped in this process?
Well, for one, I had backups of all websites, as I generally build all my content offline and then upload it, even this weblog.
In the case of user accounts and mail configuration... a backup would only really have helped if I was moving from a crashed system to a fresh one *of the same kind*... but in this case that was neither possible nor desirable (the old system was SuSE 7.3 based with a ton of manual patches over the years - as the base system hadn't been supported anymore for about 4 years... and an upgrade wasn't possible for hardware reasons)
Also the dual HD crash and the foul-up with my OS upgrade had nothing to do with backups anyway...
All in all I guess I managed this quite well, considering that I am *not* a sysadmin or at all versed in Linux internals. I managed to re-create an established system while also porting it over to a modern OS setup, and did that in actually under 4 hours of real configuration time. Additionally I had to learn a completely new set of skills in that time.
(Oh btw, I spent all of yesterday sleeping and I still have quite the headache now)
All images, text and audio material © Martin Spernau; use and reproduction require the author's permission