Data maintenance and recovery

Friday, 25 April, 2014

Several things landed across my desk this week which reminded me of the thorny problem of file formats, data media, archival and redundancy. Computers make our life easier - they can do repetitive tasks easily, store data and control external devices such as printers amongst other things. A nice example of the problem is my MSc thesis, which I wrote in the then industry standard of WordPerfect 5.1. Scarily, I blogged on this nearly a decade ago, noting the relatively short space of time in which WordPerfect files largely became redundant... yes, Microsoft Office can read the files, but not the complex formatting. So I could recover the data and, by exporting to PDF, I could recover a printable page. What I couldn't easily recover was a fully formatted, editable document.

This has strong echoes of the doomed BBC Domesday Project... an ambitious £2.5M 1986 project (about £10M in today's money!) designed to “capture the essence of life in the United Kingdom. Over a million people contributed to this digital snapshot of the country.” Yet within 15 years it was redundant (as opposed to the 900+ years of the original Domesday Book!) before the CAMiLEON Project was resourced to recover the data.

And in the news this week there was a similar tale about some artwork produced by Andy Warhol for Commodore. This was a three-year project to recover a dozen unknown digital works from the artist... yes, magnetic imaging was used so there was no physical contact with (or damage to) the original floppy discs. They then discovered an “unknown” file format which had to be reverse engineered in order to read the content.

That brings me to the two recent remote sensing examples which are nice:

The first, noted over at DPReview, shows the recovery of “lost” NASA Lunar Orbiter photos from the 1960s. These were, amazingly, photographed on to 70mm film and chemically processed automatically in the orbiter before being transmitted back to Earth as analogue signals. The original recordings have now been found and reprocessed to produce, arguably, better imagery than current digital sensors!

The second dates back a few years to the Soviet Venera missions to Venus in the 1980s. Here too the original data were accessible (no problems with media this time!), and multiple analogue transmissions of the same data were analysed in order to improve the fidelity of the imagery.

Stark warnings indeed of:

1. the media you store your data on
Be careful WHERE you store your data and how you back it up/archive it. Dispersing data between different cloud based services can make it difficult to identify WHAT you own and WHERE it is. My recommendation would be to keep a local copy and use cloud based services for file sharing (e.g. Dropbox) or backup (e.g. Carbonite). However, remember that data can be quite ephemeral - I use a portable HDD, but this has a finite life, so I not only back it up but, when it wears out, I will “format shift” to a new device. Offloading data on to DVDs or “old” HDDs can solve running out of storage, but be careful that the media doesn't become unreadable. Think of your data as “belongings” and the media as a bag. The bag is unimportant: it is there for storage and transport, and you will move your “belongings” from one bag to another. Take care of your data!

2. the file formats themselves (something I blogged about in terms of spatial data).
There are some formats we can choose to use that will likely be readable in the future (Word DOC/DOCX, TIFF, JPG etc.), but as applications become more specialist this becomes more difficult. We all get caught out with old formats, but if a file is valuable to you, make sure you have a future-proof route to ensure its longevity.
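
On the “format shift” point above, one way to check that your “belongings” arrived intact in the new “bag” is to compare checksums before and after the copy. What follows is only a minimal sketch in Python, not a recommendation of any particular tool: the function names (sha256_of, build_manifest, format_shift) and the choice of SHA-256 are my own assumptions for illustration, and it assumes both the old and new media are mounted as ordinary folders.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def format_shift(old_media: Path, new_media: Path) -> list[str]:
    """Copy everything to new_media and list files that did not copy intact."""
    before = build_manifest(old_media)  # checksums on the old "bag"
    shutil.copytree(old_media, new_media, dirs_exist_ok=True)
    after = build_manifest(new_media)   # checksums on the new "bag"
    return [name for name, checksum in before.items() if after.get(name) != checksum]
```

An empty list back from format_shift means every file on the old device has a byte-identical copy on the new one. Saving the manifest alongside the data would also let you re-check the same files years later, which is when media problems tend to show up. Note that this only guards the media; it does nothing for the file format problem in point 2.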
