I have a lot of data. Not, I am sure, a lot by the standards of an experimental scientist or someone who shoots a lot of video or takes a great many photographs, but a lot of small files which I don’t want to lose.
I have also had a number of data losses over the years – some from media failure, some from user error*. The most catastrophic was a motherboard failure on an old PC, in the days when storage was expensive and removable discs were tiny. The computer worked for the most part, but there was a fault which meant that it could not calculate checksums correctly. The first things to go were the compressed directories that I had set up so as to make better use of the hard disc – that was the signal to me that I needed to do a comprehensive backup. Which I did, very carefully zipping up all the stuff I cared about most onto multi-floppy archives. An operating system reinstall did not fix the problem, and I was more than a little put out when the computer came back from the mender and I found that every single one of those archives was corrupt, the cherished data lost.
I suppose I should be glad that this happened before I had a digital camera, but I lost a lot of writing.
I have just performed an upgrade on my laptop to allow more current software to run on it, and was looking at my hard disc to clear things out a bit. What I found was nested machine images, copies of previous computer hard drives resting inside each other, copies made as I have moved from machine to machine to machine: yig contains wendigo, which contains ithaqua, which contains dagon, nyogtha and daoloth – generations of computers going back fifteen years. There’s even a disc image from the first machine I owned with a hard disc in it, kurt the Acorn Archimedes, although that was recovered in a recent bout of data archaeology.
Each previous image contains the seed of the data I have on the system that superseded it.
I’ve been using the same basic layout of my data directories for a long time, so I can trace the development of projects over the lifetime of these machines. I have early versions of programs and websites that I’ve maintained over this whole period, and larval versions of stories, and early snapshots of photo directories. I also have multiple copies of large data projects, like the effort to digitise my vinyl, which at least means I haven’t lost anything.
The question really comes down to how much of this ancient data I want to keep around. The large data projects I want backed up in a couple of places, but the prehistoric data directories? Not so much, I think. I will make sure nothing in them is unique, but I feel less need to retain these old versions than I once did, when I started a repository project which was intended to act as a versioned history of all the work I have done.
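Checking that nothing in an old image is unique can be done by comparing file contents rather than names. What follows is a minimal sketch, not a description of any tool I actually used: it assumes Python 3, and the function names (file_digest, iter_files, unique_to_old) are made up for illustration. It hashes every file in the current tree, leaving out the nested old image itself, and then lists the files in the old image whose contents appear nowhere else.

```python
import hashlib
import os
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def iter_files(root, skip=None):
    """Yield regular files under root, pruning the skip directory if given."""
    root = Path(root).resolve()
    skip = Path(skip).resolve() if skip is not None else None
    for dirpath, dirnames, filenames in os.walk(root):
        here = Path(dirpath)
        if skip is not None:
            # Editing dirnames in place stops os.walk descending into skip.
            dirnames[:] = [d for d in dirnames if here / d != skip]
        for name in filenames:
            p = here / name
            if p.is_file() and not p.is_symlink():
                yield p


def unique_to_old(old_root, current_root):
    """List files under old_root whose contents exist nowhere else in current_root.

    old_root may be nested inside current_root; it is skipped while the
    current tree is hashed, so an old file never counts as its own copy.
    """
    known = {file_digest(p) for p in iter_files(current_root, skip=old_root)}
    return sorted(p for p in iter_files(old_root) if file_digest(p) not in known)
```

Calling unique_to_old("/home/me/wendigo", "/home/me") (both paths are hypothetical) would return the files that exist only inside the wendigo image. An empty result suggests the image holds nothing that is not already somewhere else on the disc. Since only contents are compared, a renamed or moved copy still counts as a duplicate, and every file is read in full, so a large image will take a while.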
Half the data on my hard disc on yig is duplicated, or stale, or just doesn’t need to be available at all times.
Time to clean up.
[*] aka PEBKAC – Problem Exists Between Keyboard And Chair