Code is data

Oct 16, 2020 Programming Archiving

It used to run…

I have seen a number of reports in the media recently about aging source code, and they have had me thinking. If you can’t run code, whatever relies on its execution dies with the code.

Sure, there is always the option of rewriting the code in a more modern language, but that doesn’t happen all that often. Projects still languish because they were never converted from Python 2 to 3, a migration that is mostly automated. Never mind finding someone who understands the intricacies of FORTRAN IV, PL/I or, dare I say, APL. Maybe someone can translate the code forward in time, but it rarely happens.

Cases where this is a problem come up all the time, and their number is growing rapidly. Here are a few I have seen in the news recently.

Small districts invested in electronic voting machines from a company called Diebold in the early 2000s. After Diebold left the voting machine business, the investment started to decay. Hardware failed, and security bugs in the software could no longer be fixed because the source code was not available. Nor could the code be audited when questions arose.
Some scientific results can’t be reproduced because the code used to process the data no longer runs. To highlight the issue, a challenge was set up to see which old code could still be used.
Large swaths of digital media in the form of old games are being lost as the systems that run them disappear along with the code. That is years of experiences, shared by a significant number of people, lost to time. Many multimedia “demo” applications, representing the music and visual art of a fledgling medium, are already gone.
Even well-maintained open source projects can run afoul of aging software when the APIs of underlying libraries and system interfaces are no longer available to compile or run against.

The financial industry keeps armies of COBOL programmers around to avoid being caught in this trap, but it is only a matter of time before a few lose that game of chicken. Other users can’t make an equivalent investment. Unfortunately, a lot of code (and the systems that rely on it) is in a bad place and won’t ever improve. When we talk about the Software Development Life Cycle (SDLC), we really don’t characterize the geriatric years at the end, and the late-stage investments are almost never made.

Freeze the machine in time

Many look to solve the problem by freezing the machines in time. I guess the thinking is that if the media and hardware never change, then the code will be able to run forever. Maybe.

The cost of keeping hardware alive grows and grows over time. I have been in a few computer rooms where the legacy system’s hardware support contract was the most expensive budget item. Teams of people used to keep old IBM hardware running for the FAA or IRS because those systems were required for policy or legal reasons. I remember talking to one person at the IRS who was giddy that they were going to be allowed to use emulators of a 50-year-old IBM. I think the only reason was that the cost of keeping the 1401 machines running was outstripping the available budget.

Freezing the operating system only goes so far, too. Some scientists are looking at Docker containers or VMs to lock down the environment, but how long are those really going to hold up? I’ll give them 10 years, but 20 I’m not so sure about. There are package management solutions like Guix that may offer more longevity, but they might need hardware preserved at some point to keep their promises (see the previous paragraph).

Many people in IT feel that the Java Virtual Machine (JVM) was the answer to this. “Write once, run anywhere” was a common refrain, but after 20 years of trying to do just that, not many people still repeat it. Many features needed local tweaking, and if the code relied on an underlying C library, the promise didn’t hold. You can still download old versions of Java, but how much longer is Oracle going to keep Java 1.1 through 5 available? Plus, where am I going to find a Windows NT 4.0 or SPARC Solaris 2 system to run them on?

Which door did I prop open with that IPC again?

There has been a lot of talk about a system like WebAssembly (WASM) being a better alternative that is open and richly maintained. Yeah, that makes some sense. Even as the industry moves on and extends it, the base instructions should still run for some time. Its precursor, asm.js, may make even more sense because it is just a subset of plain JavaScript. That should run in any browser for some time to come, but we don’t know when browsers will drop JavaScript or break something asm.js depends on. Probably not for a while, but that doesn’t help a damn bit after it has happened.

Archive language

Maybe the trick is to make a language that is easy to move forward in time. I know earlier I lambasted many examples of that failing, but not all languages seem to follow that path. The title of this entry comes from Lisp, where code and data are interchangeable. It is part of the Lisp superpowers that you hear old programmers talk about as they stroke their beards.
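To make “code is data” a little more concrete, here is a minimal sketch in Python rather than Lisp, assuming a toy subset: integers, three arithmetic symbols, and function calls. The helpers `tokenize`, `read`, `evaluate`, and `rewrite` are hypothetical names for illustration, not any real Lisp’s API. Program text is read into ordinary nested lists, which can either be run directly or edited with plain list code before running.

```python
import operator

# Hypothetical toy environment: symbol name -> binary Python function.
ENV = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def tokenize(src):
    """Split Lisp-style source text into tokens by padding parentheses."""
    return src.replace("(", " ( ").replace(")", " ) ").split()

def read(tokens):
    """Turn tokens into data: nested lists, ints, and symbol strings."""
    token = tokens.pop(0)
    if token == "(":
        lst = []
        while tokens[0] != ")":
            lst.append(read(tokens))
        tokens.pop(0)  # drop the closing ")"
        return lst
    try:
        return int(token)
    except ValueError:
        return token  # anything that is not an int is treated as a symbol

def evaluate(expr, env=ENV):
    """Run the data as code: ints are literals, strings are looked up, lists are calls."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        return env[expr]
    op, *args = expr
    fn = evaluate(op, env)
    return fn(*(evaluate(a, env) for a in args))

def rewrite(expr, old, new):
    """Treat the program as plain data: replace every symbol `old` with `new`."""
    if isinstance(expr, list):
        return [rewrite(e, old, new) for e in expr]
    return new if expr == old else expr

program = read(tokenize("(+ 1 (* 2 3))"))
print(program)                                # ['+', 1, ['*', 2, 3]]
print(evaluate(program))                      # 7
print(evaluate(rewrite(program, "*", "-")))   # 0, i.e. 1 + (2 - 3)
```

The whole interpreter is a few dozen lines, which hints at why Lisp-style languages are so easy to re-implement on new machines; a real Lisp adds special forms, lambdas, and macros on top of the same idea.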

Lisp was one of the first languages and still thrives with an active community today. It has two main variants, both easy to implement and extend. There are versions that run on everything from supercomputers and editors (like the Emacs I’m typing in right now) to embedded chips you might lose if you sneeze hard. Other languages with the same promise (like Forth) are often not expressive enough, but Lisp has held its own for decades and looks set to for many more.

Paul Graham has actually written about this in his essay on the hundred-year language and in his work on Arc, but more as a language concept. Larry Wall and Joe Armstrong have joked about Perl and Erlang becoming sacred hieroglyphics that no one touches for 300 years because they just work. In the end it would have to be something that is easy to port and re-implement and that abstracts the machine away. While Erlang would be fun to use for the rest of my life, I don’t see it happening.

I think the solution is a language that can sit in a particular sweet spot.

Easy to port/implement
Abstracts the machine
Expressive and easy enough that people want to use it
Fast enough that they don’t mind

While I don’t think games will be written in this magic archive language, it would be nice for art and science, where the goal is to put another stone in the foundation of human knowledge. Maybe Lisp won’t hit that “easy” bullet, but Lua might.

In the end I think there will be a lot of experiments and efforts to get to a passable solution. That basically means one that people will actually use and carry forward, which is not something we can decide with math or simulation.