The precise mathematical definition of obfuscation and what is considered obfuscation for an average software engineer are two very different things.
In fact, the article is only about indistinguishability obfuscation. What is mostly discussed in this thread is the notion of virtual black box obfuscation (VBB).
VBB has been proven to be impossible in the general case (see https://www.wisdom.weizmann.ac.il/~oded/PS/obf4.pdf). There are a few special programs where VBB is feasible, such as point functions, but in general it cannot be achieved.
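A folklore example of the point-function case: a password checker that stores only a hash of the accepted input is, morally, an obfuscated point function. A minimal sketch in Python, assuming the hash is hard to invert:

    import hashlib

    # Point function: f(x) = True iff x equals the secret.
    # Publishing only the hash preserves the behavior while
    # hiding the secret point itself.
    STORED = hashlib.sha256(b"hunter2").hexdigest()

    def check(candidate: bytes) -> bool:
        return hashlib.sha256(candidate).hexdigest() == STORED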
Indistinguishability obfuscation (iO) means that if you obfuscate two programs that compute the same function, then you cannot distinguish the results. Or put in different words, if you get two obfuscated programs, then there is no better strategy than random guessing (up to an advantage that is negligible in the security parameter) to find out whether they stem from the same original program.
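Here is a minimal Python sketch of the distinguishing game behind that definition; iO and distinguisher are hypothetical callables, not a real library:

    import random

    def distinguishing_game(iO, distinguisher, C0, C1):
        # C0 and C1 must be functionally equivalent programs.
        # The challenger obfuscates one of them at random; the
        # distinguisher tries to guess which one it was given.
        b = random.randrange(2)
        guess = distinguisher(iO(C0 if b == 0 else C1))
        return guess == b

    # iO is secure if every efficient distinguisher wins this game
    # with probability at most 1/2 plus a negligible advantage.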
To try to make this concrete, if you have a program A that does bubble sort, and a program B that does selection sort:
1. VBB would be making it so you can't glean information about A or B by running VBB(A) or VBB(B) or examining them, for various definitions of "information".
You can't tell A is a bubble sort at all, and you can't tell B is a selection sort at all.
VBB is, as mentioned, impossible in the general case.
2. IO would be making it so if you are holding IO(A) and IO(B), you can't tell them apart, and can't tell if the original was A or B.
So you can have functionally identical programs, and when you run them through IO, you can't tell whether the original was A or B.
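Sketched in Python, such a functionally equivalent pair might look like this (either could then be fed to IO):

    def bubble_sort(xs):
        xs = list(xs)
        for i in range(len(xs)):
            for j in range(len(xs) - 1 - i):
                if xs[j] > xs[j + 1]:
                    xs[j], xs[j + 1] = xs[j + 1], xs[j]
        return xs

    def selection_sort(xs):
        xs = list(xs)
        for i in range(len(xs)):
            m = min(range(i, len(xs)), key=xs.__getitem__)
            xs[i], xs[m] = xs[m], xs[i]
        return xs

    # Same function: identical output on every input, e.g.
    assert bubble_sort([4, 3, 2, 1]) == selection_sort([4, 3, 2, 1])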
I'm kind of confused. I'm guessing this is a dumb question, but I'm obviously missing something. If a program is running bubble sort, and another is running selection sort... can't you just run them instruction-by-instruction to see which elements they swap? And deduce what they were based on that? Like if you have [4, 3, 2, 1], and the first swap the program does results in [1, 3, 2, 4], then it clearly wasn't bubble sort, right? What am I missing?
Yeah, I probably should have collected all the pieces into a single reply. Right now the number of comments is small enough that it’s not hard to find, though.
The latter is useful to hide who wrote the program? Like some features of code that might reveal your personal patterns?
Or what is it for?
And how does it deal with timing? If two programs I gave it have different complexity, would it pad the running time to some fixed value? If so, how is that value chosen?
Thanks, I had no idea what to make of the OP, so your comment is very valuable!
> Or put in different words,
I don't see how the second phrasing follows from the first one. In fact, I don't understand what the second statement is supposed to mean to begin with. Could you elaborate?
If you have a bubble sort and selection sort, and run them through IO, for either result, you can't tell if the original was bubble sort or selection sort.
Roughly: VBB is the ability to take any program and make it indistinguishable from random.
IO is the ability to take two programs that compute the same function and make them indistinguishable from each other.
The original goal was to be able to make public key systems out of secret key systems, so hiding existing running time is not part of the definition of indistinguishable.
It is about what information you can glean about either of the originals.
Formally (from the paper):
Indistinguishability: For every two ensembles $\{C_{0,\lambda}\}$ and $\{C_{1,\lambda}\}$ of polynomial-sized circuits that have the same size, input length, and output length, and are functionally equivalent, that is, $\forall \lambda,\ C_{0,\lambda}(x) = C_{1,\lambda}(x)$ for every input $x$, the following distributions are computationally indistinguishable:

$\{iO(1^\lambda, C_{0,\lambda})\}$ and $\{iO(1^\lambda, C_{1,\lambda})\}$
I.e. you can take a secret key system, hardwire the key into the program, and turn it into a public key system without fear that the secret key will be recovered; instead, it will not be distinguishable from other programs with different hardwired keys.
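A toy sketch of that shape in Python, with a hypothetical iO and a deliberately naive symmetric scheme. (This toy would NOT actually be secure as a public-key system even under perfect obfuscation, since the same pad encrypts and decrypts; real constructions, e.g. Sahai-Waters, use puncturable PRFs and a careful proof.)

    import hmac, hashlib, os

    SECRET_KEY = os.urandom(32)  # the hardwired secret

    def encrypt(message: bytes, nonce: bytes) -> bytes:
        # Toy symmetric encryption: XOR the message with a
        # PRF(key, nonce) pad. Messages up to 32 bytes.
        pad = hmac.new(SECRET_KEY, nonce, hashlib.sha256).digest()
        return bytes(m ^ p for m, p in zip(message, pad))

    # public_encryptor = iO(encrypt)   # hypothetical obfuscator
    # The idea: publish public_encryptor as a "public key"; anyone
    # can encrypt, but extracting SECRET_KEY from it should be hard.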
> I.e. you can take a secret key system, hardwire the key into the program, and turn it into a public key system without fear that the secret key will be recovered; instead, it will not be distinguishable from other programs with different hardwired keys.
I don't understand. Shouldn't a different secret key yield a different function / circuit?
That is a fascinating result. A remarkable mathematical achievement. And a nightmare.
This thesis says that it's possible to obfuscate code in such a way that there is a lower bound on the level of effort needed to de-obfuscate it. That lower bound can apparently be comparable to the level of effort required to break a cryptosystem.
So, coming soon, viruses and worms nobody can figure out. Code where no one can tell if it has a backdoor. ML classifiers where no one can be sure what they really do.
What you are worried about is VBB (roughly: can I take a program A and obfuscate it in a way that you get no information about the original from the obfuscated version?), which has been proven impossible.
This is about indistinguishability, which is a form of obfuscation, but not the kind you are thinking about.
This is really something like "If you have two programs A and B that compute the same function, it's possible to obfuscate them in a way that the resulting programs are indistinguishable from each other"
So if program A is "bubble sort" and program B is "selection sort" (function here in the mathematical sense, so these compute the same function), obfuscation can make it so you can't tell them apart, and you can't tell whether the original of what you are holding was bubble or selection.
Roughly: VBB is the ability to take any program and make it indistinguishable from random.
IO is the ability to take two programs that compute the same function and make them indistinguishable from each other.
"compute the same function" is actually what makes it feasible :)
The original use case was really trying to turn secret key systems into public key systems by hardwiring the secret key into the program and then obfuscating it.
So an example use case today would be "every DVD decoding program has a different key but computes the same function; can you make them all indistinguishable and thus hide the key?" or something like that.
> The original use case was really trying to turn secret key systems into public key systems by hardwiring the secret key into the program and then obfuscating it.
Wow, that definitely qualifies as a "crown jewel." I, for one, would not look forward to a future in which software running on my hardware is able to hide secrets, like one-off media encryption keys, from me!
In practice, as the thesis proves, you can use iO to generate a lot of things people care about in cryptography.
I will go out on a limb and say you can use it to build almost everything interesting :)
Now, it happens for a lot of those we don't need iO and iO may be impractical, but as a theoretical building block, it's quite nice.
As for your concern: yes, this is now possible, at least in the sense that it does as well as can possibly be done at making them computationally indistinguishable. This does not prevent you from attacking it in other ways :)
Two things, good or bad depending on how you look at it:
1. You can prove iO is at least as good as the best possible obfuscation scheme that can ever exist. So whatever that enables you to do or not, it's the limit.
2. It also means you can get away with handing over fewer secrets, ensure better isolation, etc.
The following things are possible with iO (trivially so), due to this paper:
Adaptively secure succinct garbled RAM - which would let you hand secure databases to untrusted providers and not worry about it.
Sender deniable encryption where you can't prove what the original plaintext was and various options are equally likely.
Fully deniable interactive encryption where secrecy of communication is protected even if the parties are later coerced (or willingly bribed) to expose their entire internal states: plaintexts, keys and randomness.
etc
These are just some examples.
Now, some of this, as I said, we know how to do already, some we don't. But this paper gives you iO as a building block that can do them without having to separately prove that it is as sound and secure as existing crypto systems are.
Again, iO only provides computational indistinguishability, not other things, but it is a nice primitive.
> Wow, that definitely qualifies as a "crown jewel." I, for one, would not look forward to a future in which software running on my hardware is able to hide secrets, like one-off media encryption keys, from me!
That future is already here for most people. See things like Secure Enclave or Intel's SGX.
We already have code that obfuscates itself. And runs differently based on whether it is running in a debugger or not. This has been the cat & mouse game played by:
virus writers vs anti-virus code.
game hackers vs anti-cheat software.
license checking code vs cheapskates.
Stuxnet.
> Code where no one can tell if it has a backdoor.
Back in 1984, Ken Thompson wrote Reflections on Trusting Trust [0]. The short answer is "no, you can never ever tell if it has a backdoor. Ever."
> The moral is obvious. You can't trust code that you did not totally create yourself. (Especially code from companies that employ people like me.) No amount of source-level verification or scrutiny will protect you from using untrusted code. In demonstrating the possibility of this kind of attack, I picked on the C compiler. I could have picked on any program-handling program such as an assembler, a loader, or even hardware microcode.
Back in the 90s, I was interested in hacking. The sort of hacking that starts with "this is the disassembler. Step one, hack the trial version of IDA Pro." It was fascinating at the time (and bores me now, my punishment for growing up, I guess) so I read & did lots of stuff like that.
You should require positive proof that code does what you want.
Similar to how e.g. Haskell's type system doesn't have to solve the halting problem [0]: it just rejects some programs that would be OK but don't conform to the type system.
First, for already closed source programs the obfuscation results don't make a difference at all. They remain just as obscure before as they are now. (Unless you routinely try to de-compile all closed source programs?)
The obfuscation result in the submitted article really only makes a difference to open source programs.
But: thanks to advances in mathematics and cryptography, a vendor can prove to you that their program does what it's supposed to, without having to reveal the source code or anything else about the program.
Among other things, tons of desktop software and most mobile apps and even javascript you run from most websites is obfuscated. How is it possible to avoid that? Do you run 100% open source for everything with javascript disabled?
I'm not sure. It seems to be applied to "circuits", which in this context seems to mean a one-way set of logic gates (such as AND, OR, NOT, NAND) which map a set of boolean inputs to a set of boolean outputs. In theory you can construct any finite digital function that way. It's a useful abstraction, like a Turing machine.
Unlike a Turing machine, a circuit is finite. All circuits are "solvable" (given outputs, compute an input that yields them) by trying all the input patterns.
There's no undecidability and no halting problem. There's just difficulty.
"Difficult" here means there's no way easier than trying all the patterns.
This is the same property sound cryptosystems are supposed to have - there's no easier way than trying all the keys.
Whether this result can be extended to programs with iteration I'm not sure. The paper doesn't seem to mention iteration or storage.
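A toy illustration of the "no undecidability, just difficulty" point in Python: an arbitrary 3-gate circuit, inverted by plain enumeration of its finite input space.

    from itertools import product

    # A tiny circuit over inputs (a, b, c): one output bit.
    def circuit(a, b, c):
        g1 = a and b        # AND gate
        g2 = not c          # NOT gate
        return g1 or g2     # OR gate

    # "Solving" the circuit: find an input producing output True.
    # No halting problem here, just (worst-case exponential) search.
    for bits in product([False, True], repeat=3):
        if circuit(*bits):
            print("found satisfying input:", bits)
            break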
A (polynomial-sized) circuit in this context is a sequence of DAGs whose nodes are binary logic gates, where the nth DAG computes 1 output bit from n input bits, such that some polynomial bounds the size of the graphs in the sequence. See https://en.wikipedia.org/wiki/Circuit_complexity
The class of circuits in question is P/poly which includes BPP (which includes P, the class of polynomial-time programs). So the result is quite general, since most practical programs are in P.
For an introduction to complexity theory, see Arora, Barak (2009) (Draft available here https://theory.cs.princeton.edu/complexity/book.pdf ); same Barak that did the presentation "On the (Im)Possibility of Obfuscating Programs" (2001) mentioned in the abstract.
IMO: Paranoid about the implications, slightly less worried about it applying in practice.
As shown by the DRM schemes used in modern games, this type of obfuscation comes at the cost of performance: Unless you want to compute a sensitive function at the user's end in an obfuscated manner, it'd be much simpler to just run that function on your end and optimize it in terms of running costs & performance.
Such a design would also run counter to the everything-as-a-service model that companies are trending towards, as it places more power back at the user's end, even if the user can't decipher the obfuscated function's inner mechanisms. Such a design would reduce the need to phone home, and thus the need for EaaS.
Would it be possible to use this to obfuscate other cryptographic primitives (e.g. a hash function or a simple program that checks whether the hash of some input equals a hardcoded value)?
The next question is: why didn’t the high-end shops spending a ton of money and recruiting effort on crypto and state malware, like the NSA or Russia(?), figure this out already? Or did they?
Because figuring out what a worm does doesn't help if it has already done it, and for that matter they're supposed to remain undetected. Furthermore people already cannot tell whether code has backdoors, or what ML classifiers do for that matter.
Unfortunately, the biggest industrial use case is making wi-fi routers that are absolutely impossible to install OpenWRT on, etc...
> The next question is: why didn’t the high-end shops spending a ton of money and recruiting effort on crypto and state malware, like the NSA or Russia(?), figure this out already? Or did they?
The fun open secret about working as a researcher for the NSA is that even after you leave the NSA, any and all research must be approved by them before publishing.
There are more efficient ways to hide malware from current scanners, and on the flip-side, the scanners that use heuristics/ML to classify the sequence of system calls made by the binary wouldn't be affected by this sort of obfuscation.
In theory? Absolutely; unrolling a loop is exactly what it sounds like. Recursion is just a fancy loop. But there's a reason no one does that, and why no compilers emit that as code.
It is not efficient enough that you need to worry about exciting developments using this scheme. But it’s a step towards more reasonable mathematical assumptions than previous constructions, assumptions that may actually be true. Give it a few more years and we might be able to use this stuff.
> I think Saberhagen's Berserkers actually had this feature.
Yes. Code decoded other code, etc. So if the code wasn't running normally, it couldn't be analyzed.
It was handwaving back then, but it may become real.
It was standard coding practice if you were selling games, to prevent people from hacking the license key stuff. Or by cheating software to prevent detection by anti-cheat software. One of the hooks built into Windows, "is a debugger connected", helps the virus/cheat code determine if it is "safe" to run or not.
You can be sure that it matches if the vendor (1) doesn't use an obfuscated binary, or (2) provides exact details of how the obfuscated binary was constructed (presumably including some kind of seed for the obfuscator).
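In other words, a reproducible-build style check. A sketch in Python, assuming the vendor's obfuscator (a hypothetical callable here) is deterministic given the published seed:

    def matches(vendor_binary: bytes, source: bytes, seed: bytes,
                obfuscate) -> bool:
        # Re-run the seeded, deterministic obfuscator yourself and
        # compare byte-for-byte with what the vendor shipped.
        return obfuscate(source, seed) == vendor_binary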
This was the point behind Reflections on Trusting Trust - you can't trust it. Unless you build the entire compiler/linker tool chain from source that you've already inspected.
I mean, the same argument has been applied to other cryptographic tools. Why encrypt your messages, unless you're sending something sketchy? Why obfuscate programs, unless you're hiding something?
Yet I think the utility of encryption is well-demonstrated, and not just theoretical. It can be used by good people to defend against adversarial intent. And it may well be that obfuscation (in its most direct application) has the same effect. (e.g. companies sharing proprietary algorithms for you to run at home, without revealing the secret sauce; or me delegating computation to AWS without revealing secrets)
Note that this is not why indistinguishability obfuscation (iO) is a crown jewel, here. Practically, iO is nowhere close to obfuscating anything larger than a tiny circuit. But it can still be useful to do things like obfuscate secret keys when designing cryptographic protocols. Theoretically, iO allows us to derive essentially every cryptographic primitive, which is why this paper is interesting, and why iO is called a crown jewel. And now, we can build iO for the first time from well-studied hardness assumptions.
> I mean, the same argument has been applied to other cryptographic tools. Why encrypt your messages, unless you're sending something sketchy? Why obfuscate programs, unless you're hiding something?
There is a huge difference: Encryption is used to hide something from a third party, while obfuscation is used to hide something from the intended recipient.
Or, if you'd like to argue that the intended recipient is the computer, not the user: To turn that computer into a deputy of the sender, while still formally belonging to the recipient.
> while obfuscation is used to hide something from the intended recipient.
No. Most obfuscation is done to increase the time needed to reverse the source of publicly available programs. The intended target is not the user (99% of the recipients) but the adversary. The users are just caught in the crossfire.
Unless you consider every user of apps as someone who has both the skill and the need to reverse engineer your app.
> No. Most obfuscation is done to increase the time needed to reverse the source of publicly available programs.
So to stop the intended recipient from deciphering the program.
"Adversaries" are the strawman used to hurt legitimate customers who are left with black boxes for device drivers and no functioning hardware since the manufacturers make it impossible to open source firmware updates.
In case of DRM sure, but there are cases like online games where "adversaries" are players (intended users of the app) trying to use cheat programs on game binary to get an advantage over other players.
I share the sentiment that in most cases it just serves DRM vendors and is disservice to users but that's not all the cases.
> I mean, the same argument has been applied to other cryptographic tools. Why encrypt your messages, unless you're sending something sketchy? Why obfuscate programs, unless you're hiding something?
It's easy to argue that everyone has legitimate interest in hiding some things in data, but what would be a legitimate case for hiding things in code?
Except we know (and have examples from cryptocurrency) that the problems with ownership in real life have little to do with cryptography.
The reason Coca-Cola doesn't use Pepsi's recipe isn't that mixing the ingredients under heat creates an irreversible mixture that you can't separate to recover the recipe. They don't use Pepsi's recipe because it's illegal.
That's true for a recipe that was developed in a past millennium. Trade secrets are still very much relevant for companies that work with more modern technologies than soda from before the Great War, and a useful protection against competition. Even if you do believe that companies within the US or EU wouldn't touch other's IP with a 10 foot pole due to strong laws, you'd have to be very naive to think this is valid in all parts of the globe.
Edit: it's also trivially not true for trade secrets in the legal sense -- if you upload your trade secret willingly to AWS, it's no longer a trade secret, unless you have extra contracts in place. With obfuscation, you could skip a lot of red tape to keep it secret.
You can't copyright a recipe. The reason it doesn't make sense for Pepsi to make an exact duplicate of Coke is that Coke is better at selling Coke than Pepsi is, and it's better for them to be a niche alternative some people prefer than an undifferentiated competitor. Reverse engineering trade secrets is perfectly legal if you haven't agreed otherwise.
You say the intent is different but don’t show why. What about obfuscation implies malicious intent? How come it doesn’t apply to encryption?
It can’t be about applications, because I can come up with a thousand applications for obfuscation with benign intent. Such as delegating private computation to AWS without Amazon learning my personal information.
If it's a total black box, wouldn't the NOBUS thing to do be to have some large key that it's watching input for that flips it into a malicious mode?
If BB(6) took years to execute, how long would you have to spend feeding random input to a suspected-hostile 10000 symbol Turing machine (whose source code and state you can't examine) in a sandbox before you decided it was safe?
Wouldn't this also be a field day for encryption maximalists?
I'd imagine if all code were to be perfectly obscured then to the public it wouldn't matter, but many would demand for more open source and that'd be the only way to trust.
Autonomous, polymorphic malware utilizing secure computation to perform operations that aren't revealed to systems they infect.
Seems like there is an opportunity for bot nets to perform computations, say BTC mining, in a completely sealed and zero-knowledge environment, which now may not even need a backdoor or are able to hide it perfectly.
Seems some of those cases (backdoors and classifiers) could be prevented by regulation and/or social convention (e.g. don't use programs/services obfuscated in this way).
The malware seems trickier. Maybe systems will need to require proof of unobfuscated source to run code?
If I’m reading Wikipedia [1] correctly, a program that does an AND of 32 booleans when obfuscated is 32 GiB. No indication of runtime overhead but something tells me it’s a large constant overhead (if I skimmed the paper correctly, the complexity class must be the same).
If mankind can't stomach 10% overhead to check array bounds and collect unused memory, what makes anybody think they can sell 1,000,000% overhead to make your DRM driver (and viruses) harder to reverse?
DRM implementations already use code obfuscation heavily. Can someone with knowledge of the math clarify why this is a big deal? Why would it matter that there is mathematical proof that someone cannot figure out what the code is doing? How does it apply to crypto?
The code obfuscation that DRM implementations use is not provably secure. I don't think anyone would (or should) hide a bitcoin secret key inside of an obfuscated program. There's no formal guarantee of security.
This work, on the other hand, shows a provably secure construction for obfuscation (assuming that some assumptions about the hardness of various well-studied mathematical problems are true). In other words, extracting a bitcoin secret key from an obfuscated algorithm is as hard as insert math problem here. (This is how all cryptography works; e.g. breaking RSA encryption is as hard as factoring large products of primes.) This paper is special because, for the first time, the hard math problem they use (to build the obfuscation) is "reasonable": they've been studied for decades, and no one knows how to break them. (Indeed, a lot of cryptography is built on top of the same assumptions/problems.)
Thus, we would be surprised if someone extracted our bitcoin secret key from our obfuscated program, because it means they solved some "thought-to-be-unsolvable" mathematical problem. This is a much nicer guarantee than the alternative, e.g. with current obfuscation, where we say "oh, it looks pretty random, let's stick my key in" and then trusting that no one will break it. (That's a lot of trust, when you might have a 100,000 Bitcoin at stake.)
Of course, none of this is really feasible; at best, we can securely obfuscate tiny circuits (at present). So the main utility is probably to hide keys and such, not to obfuscate entire programs. This might enable better secure MPC schemes, NIZKs, etc. So I guess, per your question, it's also targeted towards a different use case.
Now, as to why theoretical cryptographers care about obfuscation in general: it's not so much about the direct application of obfuscating programs. (It's not obvious why obfuscation is more interesting than encryption, etc, from an applied point of view.) Instead, from a theoretical point of view, if we can build indistinguishability obfuscation, we can directly build public key encryption, non-interactive zero knowledge proofs, Multiparty computation, etc. etc. (assuming one-way functions). So really, this is a primitive that somehow connects all of the other primitives, which is why theoreticians think it is a big deal.
> I don't think anyone would (or should) hide a bitcoin secret key inside of an obfuscated program.
I can't remember where I read this, or if it was just a hypothetical, but I think I heard of people doing this deliberately as a sort of canary. If the wallet gets drained, then you know someone cracked your obfuscation.
For example, we could write a program that hard-codes a secret key, and then signs bitcoin transactions with that secret key, but only if the transaction has value less than 0.5 bitcoin. Then anyone who has the program can sign things of small value on our behalf; if the program is indistinguishably-obfuscated, they won't be able to get any additional information about the key itself, or use it in any other way.
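A sketch of that restricted signer in Python; the signature scheme here is a stand-in toy MAC, and iO(...) is hypothetical. The point is only the shape: a key hardwired behind a policy check, with the whole thing obfuscated before release.

    import hmac, hashlib
    from dataclasses import dataclass

    SECRET_KEY = b"hardwired signing key"
    LIMIT = 0.5  # BTC

    @dataclass
    class Tx:
        value: float
        payload: bytes

    def sign(key: bytes, msg: bytes) -> bytes:
        # Stand-in for a real signature scheme (toy MAC, not ECDSA).
        return hmac.new(key, msg, hashlib.sha256).digest()

    def restricted_signer(tx: Tx) -> bytes:
        # Refuse anything at or above the limit; otherwise sign.
        if tx.value >= LIMIT:
            raise ValueError("transaction too large")
        return sign(SECRET_KEY, tx.payload)

    # One would then publish iO(restricted_signer): holders can sign
    # small transactions but, if iO does its job, learn nothing more
    # about SECRET_KEY than that interface reveals.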
I guess just encrypting the key won't let you use it to sign things.
Imagine you wrote a decoder program with the key embedded in it. From looking at the program you can't work out what the key is, but you can use it to decode a secret message. You can have a secret encoder program too. Effectively blackbox obfuscation among other things allows you to turn symmetric encryption into asymmetric encryption.
Another possible use is in cloud computing. You could prepare a program with private information in it and still safely submit it to a third party to execute. This would also be very useful with smartcontracts. Since the smart contract code is public it can't contain any secrets unless it's safely obfuscated.
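A sketch of the delegation idea, again with a hypothetical iO. One caveat worth stating: iO by itself does not promise to hide arbitrary embedded secrets (that would be VBB); whether a particular secret stays hidden has to be argued per construction.

    import hmac, hashlib

    API_SECRET = b"embedded credential"  # private input baked in

    def job(public_input: bytes) -> bytes:
        # Some computation that needs the secret but whose output
        # is safe to reveal, e.g. an authenticated digest.
        return hmac.new(API_SECRET, public_input, hashlib.sha256).digest()

    # shipped = iO(job)   # hypothetical obfuscator
    # The provider can evaluate shipped on inputs you care about,
    # but ideally cannot extract API_SECRET from it.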
As I understand it we are still quite a way off from any of these techniques being feasible, since the overhead at the moment is huge.
> Program obfuscation would enable a host of useful applications: For instance, you could use an obfuscated program to delegate particular tasks within your bank or email accounts to other individuals, without worrying that someone could use the program in a way it wasn’t intended for or read off your account passwords (unless the program was designed to output them).
I relate this to cracking: run a program and watch the memory, see where the password is checked, and edit the binary at that location to bypass it. So this level of obfuscation would make that impossible? Seems like magic, honestly.
Does this mean we will be able to embed private keys and API keys in deliverables? (i.e. to cryptographically sign artifacts, API tokens, and other niceties)
Sucks that DRM and user rights violations is about to get much much worse.
It's not weird, it's how academia works. It's not even about competition, it's about putting in the work in a sustained manner. It's about proving you can get funding (who you know, how often you publish, what type of publishing you do), how long you have been teaching, etc. In addition to experience, most institutions have specific criteria and expectations for promotion to full professorship: a demonstrated record of excellence in teaching, multiple significant contributions to the field through research, and active engagement in service and leadership roles within the academic community. This guy has a very, very sparse resume. Just because he did one impressive thing doesn't mean anything in academia (thankfully).
I wonder how this compares to homomorphic encryption? If I understand correctly, I could give someone a computation with an embedded key that operates over encrypted data, and they wouldn't be able to decrypt the data other than for the computation I gave them, right?
I think it compares more to a One Time Program or Functional Encryption.
Homomorphic Encryption won't prevent you from decrypting other outputs (if you have a decryption key), and with Functional Encryption it's assumed both parties know the function being evaluated, so OTP is really the closest.
But does it withstand a powerful LLM attack? Because cursory tests on existing commercial code obfuscation solutions so far suggest it's like clubbing baby seals for the LLMs.
This is not about software obfuscation, this is about cryptographic indistinguishability obfuscation (iO). It's targeting a different problem -- software obfuscation is more about hiding program behavior, whereas iO is more useful for white box cryptography, like hiding an AES key.
Maybe you could obfuscate software eventually, but we don't have practical efficiency even for tiny circuits. It's a bit like saying how fully homomorphic encryption allows you to securely run "software" on someone else's computer: theoretically maybe yes, but in practice it's used within software to run very specific computations.
Why? It's not a trash title. Explain yourself, and give us some substance instead of being inflammatory.
Indistinguishability obfuscation (plus one-way functions) implies basically every cryptographic primitive there is, including public key encryption, NIZKs, and MPC. It gets close to giving us FHE (open question, I think). In some sense it is a "unifying primitive". Absolutely a holy grail to cryptographers, even if it isn't for you.