NNSquad - Network Neutrality Squad
[ NNSquad ] Re: The disappearing web: Information decay is eating away our history
On Sep 23, 2012, at 5:54 PM, Vint Cerf wrote:

> here's my take on "bit rot" - it is more a problem of software that can still interpret the bits than anything else. One can keep rewriting bits but making sure the applications are still around and running to interpret them is hard. Solving this problem turns up all kinds of legal and business issues. I also agree that the legal "discovery" system is leading to wholesale policies of deletion of digitized content after a fixed period of time and that's going to frustrate historians and each of us when we are trying to remember something from 5-10-20 years ago...

A couple of years ago, I attended a talk about a project to create an "archive format" for documents that were intended to be interpretable even 100 or more years from now. Like all "standardization" efforts at the time, it relied on XML - the theory being that you could get away from all the specific details of particular document encodings by defining an XML Schema "once and for all". I haven't heard anything further about that project since, though I suppose it's still out there.

It struck me then as typical of proposals built on XML, promising much more than XML was inherently capable of providing. Besides, at best the proposal only covered "documents", in the sense of data representations of pixels on pieces of paper. Much of the information we have today is in the form of databases or actual active programs. Sure, you can print a spreadsheet - but in doing so you lose the computational relationships that actually define a spreadsheet. (And, sure, you can print out the cell formulas - but you're still missing something fundamental.)

My suggestion as an alternative: Virtual machines. We're pretty good these days at writing down hardware specifications in excruciating detail. This can be extended to the detailed definition of a saved virtual environment. (The different vendors of VMM implementations already read each other's saved files, so there's no mystery here.) Then if you want to save a Word document - save an image of an appropriate Windows environment, together with a copy of Word, and someone in the future will be able to see and interact with the document the way the author did.

Cross-architecture virtualization is already being used to preserve classic early programming environments that ran only on now long-dead machines. This is truly the only way to do so. No amount of description can match actually sitting at a TOPS-20 keyboard and experimenting.

There is one fly in this ointment: Software licensing. Imagine a historian 100 years from now, trying to analyze the history of some important political speech. He has the original Word document, complete with revision history. What a wonderful resource. He has the saved virtual images. Unfortunately … when he goes to read the document, he finds the license has long expired.

(Libraries and other institutions dedicated to historical preservation should probably be automatically granted perpetual licenses to the software needed to access the documents they preserve. It's unlikely to happen without a law requiring it - and I don't have much hope for such a law being passed. But one can always hope….)

-- Jerry
_______________________________________________
nnsquad mailing list
http://lists.nnsquad.org/mailman/listinfo/nnsquad