AI clean room


Malus promises to liberate open source, but they just fired the opening shot across the bow of the software industry. Let’s dive in.

Open Source

First off, open source doesn’t need to be liberated from anything. You can read it and use it today. What you can’t do under some of the licenses is take it and not give back. The open source creators are giving us gifts. The whole idea of Malus liberating (their word, not mine) open source is to let someone steal the work of others and turn it into a proprietary product. The attitude isn’t welcome and it really smacks of greed, but enough of that, as there are bigger fish to fry.

Clean room

Malus is offering a service where you can upload source code and have it rewritten in a clean room by an AI agent. Clean room design is where a skilled programmer reverse engineers the functionality of a bit of code and then recreates it without using any of the original code. To be really clean, the reverse engineer normally writes a detailed spec, and that spec is used to write a new version by someone else who has no access to the original software.

Malus implies crap stain on code

I’m not going to jump into the process, as I probably don’t know all of what they are doing behind the scenes. If they are being smart about it, they have one AI write the spec, a second AI review it to confirm that none of the original content leaked in, and a third AI write the new code. The distasteful example they used, which I am displaying above, looks like they are just renaming the variables and claiming to remove all the smelly open source licenses (has the MIT license ever really bothered anyone?). If I were asked to examine that code, I would argue that their clean room process is what smells, but let’s give them the benefit of the doubt.
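To make that three-stage arrangement concrete, here is a minimal sketch in Python. It is not Malus's actual process, which I don't know. The names `Spec`, `leaked_lines`, `clean_room`, `demo_write_spec` and `demo_implement` are all hypothetical, and the two demo functions are hard-coded stand-ins for what would be AI agents. The only point it illustrates is the wall: the implementer is handed the spec and never the original source, and the reviewer rejects any spec that copies original lines verbatim.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Spec:
    """Functional description that is allowed to cross the clean room wall."""
    text: str


def leaked_lines(original: str, spec: Spec, min_len: int = 12) -> list[str]:
    """Reviewer stage: list non-trivial original lines copied verbatim into the spec."""
    leaks = []
    for line in original.splitlines():
        stripped = line.strip()
        # Very short lines ("}", "else:") are ignored to avoid noise.
        if len(stripped) >= min_len and stripped in spec.text:
            leaks.append(stripped)
    return leaks


def clean_room(
    original: str,
    write_spec: Callable[[str], Spec],
    implement: Callable[[Spec], str],
) -> str:
    """Run spec writer, reviewer, then implementer, in that order."""
    spec = write_spec(original)          # stage 1 sees the original
    leaks = leaked_lines(original, spec)  # stage 2 compares spec to original
    if leaks:
        raise ValueError(f"spec copies original code: {leaks}")
    return implement(spec)               # stage 3 sees only the spec


# Hypothetical stand-ins for the AI agents, hard-coded for illustration.
def demo_write_spec(source: str) -> Spec:
    return Spec("Provide add(a, b) returning the sum of two numbers.")


def demo_implement(spec: Spec) -> str:
    return "def add(x, y):\n    return x + y\n"
```

A verbatim-line check like this is a much weaker test than a court would apply. It would happily pass the rename-the-variables trick only if the renaming happened on the far side of a real spec, which is exactly the part worth being suspicious about.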

There are a lot of open source community members who are angry that this service will take their code and, for a fee per kilobyte, clean room implement it for use in a proprietary product. Every open source community should be angry at the way Malus is presenting themselves, the service and the problem. If an open source project wanted to give companies the ability to use the code any way they wanted, the project could have released the code into the public domain, like SQLite does. Like I said before, I’m going to leave this for others to complain about because there are bigger fish to fry.

Commercial threat

While Malus is only talking about open source, how long before the same service is used to clean room a commercial product? You are probably thinking that’s different, because the source code isn’t available. Hold on, I’m not sure it is different at all, and here is why.

Clean room design was originally developed to recreate commercial software. The concept that Malus is using is just as legal against Microsoft Excel as it is against LibreOffice Calc. No one would bat an eye if a company spent a few million dollars to do this, and in most cases their shareholders should be questioning decisions not to use these techniques where they are valuable.

Source code

The second argument is about the availability of the source code. Sure, the commented high level language implementation is not available to the public for most compiled commercial products, but the computer has to run something. It is perfectly reasonable to expect that the reverse engineering AI we mentioned above can write the clean room specification from the machine code that is available for every product. Don’t believe me? Well, nothing is stopping anyone from running a decompiler to generate higher level language from the machine code. Sure, you and I don’t like reading the decompiled code, but that AI will not notice a difference.

We already see LLMs that are fairly good at converting source code from one language into a second language. Decompilers have been proven technology for decades now. What is to stop someone from using a Malus-like service or technology to clean room implement anything?

Take away

Malus uses offensive language towards the open source community while offering a service to liberate code, but the people that should be worrying have millions or billions of dollars invested in commercial software.

Right now, LLMs have too small a context window to digest even a medium sized commercial software product. But by combining the LLMs with different types of memory and lookup, we should expect to see one successfully clean room a medium to large executable as soon as someone spends the time to try.
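As a rough illustration of what "memory and lookup" could mean here, the sketch below splits a large text (say, decompiler output) into pieces that fit a fixed budget, stores a short summary of each piece, and searches those summaries by keyword. Everything in it is an assumption made for illustration: the names `chunk_lines`, `build_index` and `lookup` are hypothetical, the budget counts whitespace-separated words rather than real model tokens, and `summarize` is whatever function you pass in, where a real system would presumably call an LLM.

```python
from typing import Callable


def chunk_lines(text: str, budget: int) -> list[str]:
    """Split text into chunks of whole lines, aiming for at most `budget` words each.

    A single line longer than the budget still becomes its own chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    used = 0
    for line in text.splitlines():
        cost = len(line.split())  # crude stand-in for a token count
        if current and used + cost > budget:
            chunks.append("\n".join(current))
            current, used = [], 0
        current.append(line)
        used += cost
    if current:
        chunks.append("\n".join(current))
    return chunks


def build_index(chunks: list[str], summarize: Callable[[str], str]) -> dict[int, str]:
    """Map each chunk number to a short summary that is cheap to search."""
    return {i: summarize(chunk) for i, chunk in enumerate(chunks)}


def lookup(index: dict[int, str], keyword: str) -> list[int]:
    """Return the numbers of chunks whose summary mentions the keyword."""
    needle = keyword.lower()
    return [i for i, summary in index.items() if needle in summary.lower()]
```

The idea is that the model only ever reads one chunk plus a handful of summaries at a time, so the size of the whole program stops being the limit.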

How many accountants would keep paying for Excel if they knew that the clean room version offered the same functionality for a fraction of the cost? How many database users will shell out for Oracle licenses when they can get the clean room version for a tenth of the cost? Sure, customer support will drive some sales, but how many people actually use support? How small that fraction is might surprise you.

I fully expect this threat will grow into a matter of national policy making. As larger code bases are threatened with automated clean room processing, we should expect to see new regulation and legislation to address the problem. Unfortunately, I doubt that any of those efforts will work well, and the unintended consequences might be devastating.