UCLA computer grad constructs “crown jewel of cryptography” (acm.org)
346 points by geox on May 24, 2023 | hide | past | favorite | 133 comments


The precise mathematical definition of obfuscation and what is considered obfuscation for an average software engineer are two very different things.

In fact, the article is only about indistinguishability obfuscation. What is mostly discussed in this thread is the notion of virtual black box obfuscation (VBB). VBB has been proven to be impossible in the general case (see https://www.wisdom.weizmann.ac.il/~oded/PS/obf4.pdf). There are a few special programs where VBB is feasible, such as point functions, but in general it cannot be achieved.

Indistinguishability obfuscation (iO) means that if you obfuscate two programs that compute the same function, then you cannot distinguish them. Or put in different words, if you get two obfuscated programs, then there is no better way than random guessing (except for a factor that is negligible in some security parameter) to find out if they stem from the same original program.


To try to make this concrete, if you have a program A that does bubble sort, and a program B that does selection sort:

1. VBB would be making it so you can't glean information about A or B by running VBB(A) or VBB(B) or examining them, for various definitions of "information".

You can't tell A is a bubble sort at all, and you can't tell B is a selection sort at all.

VBB is, as mentioned, impossible in the general case.

2. IO would be making it so if you are holding IO(A) and IO(B), you can't tell them apart, and can't tell if the original was A or B.

So you can have functionally identical programs, and when you run them through IO, you can't tell whether the original was A or B.
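To make the "same function, different programs" point concrete, here is a toy pair of sorting programs of the kind being discussed (a sketch for illustration; real iO operates on circuits, not Python source):

```python
def bubble_sort(xs):
    """Repeatedly swap adjacent out-of-order elements."""
    xs = list(xs)
    for i in range(len(xs)):
        for j in range(len(xs) - 1 - i):
            if xs[j] > xs[j + 1]:
                xs[j], xs[j + 1] = xs[j + 1], xs[j]
    return xs

def selection_sort(xs):
    """Repeatedly move the smallest remaining element to the front."""
    xs = list(xs)
    for i in range(len(xs)):
        m = min(range(i, len(xs)), key=xs.__getitem__)
        xs[i], xs[m] = xs[m], xs[i]
    return xs

# Same function, different programs: identical output on every input.
assert bubble_sort([4, 3, 2, 1]) == selection_sort([4, 3, 2, 1]) == [1, 2, 3, 4]
```

Since both compute the same function (sorting), iO guarantees their obfuscated versions can't be told apart.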


I'm kind of confused. I'm guessing this is a dumb question, but I'm obviously missing something. If a program is running bubble sort, and another is running selection sort... can't you just run them instruction-by-instruction to see which elements they swap? And deduce what they were based on that? Like if you have [4, 3, 2, 1], and the first swap the program does results in [1, 3, 2, 4], then it clearly wasn't bubble sort, right? What am I missing?


Boolean circuits themselves do not have memory (though you can create memory within them).

See the formal model here: https://en.wikipedia.org/wiki/Circuit_(computer_science)

There are extensions of iO to RAM programs, but this is not it.


> You can't tell A is a bubble sort at all, and you can't tell B is a selection sort at all.

That doesn't make any sense. You can trace them while they execute, then compare and analyze the two traces?

By definition obfuscation is always reversible or the software wouldn't even work any more.

You can make it infeasible (timewise) to figure it out, but it's still possible, even if it takes an astronomically long time.


I'd suggest reading the paper - which covers this ;)


Thanks, this is a great explanation.


Ed: redundant - asked and answered in thread.


Yeah, I probably should have collected all the pieces into a single reply. Right now the number of comments is small enough that it's not hard to find them, though.


The latter is useful to hide who wrote the program? Like some features of code that might reveal your personal patterns?

Or what is it for?

And how does it deal with timing? If two programs I gave it have different complexity, would it pad the running time to some fixed value? If so, how is that value chosen?


Thanks, I had no idea what to make of the OP, so your comment is very valuable!

> Or put in different words,

I don't see how the second phrasing follows from the first one. In fact, I don't understand what the second statement is supposed to mean to begin with. Could you elaborate?


If you have a bubble sort and selection sort, and run them through IO, for either result, you can't tell if the original was bubble sort or selection sort.

Roughly: VBB is the ability to take any program and make it indistinguishable from random.

IO is the ability to take two programs that compute the same function and make them indistinguishable from each other.


How far does this analogy go?

For example, if I plot how long both programs take at various scales, at some point I should be able to determine which one is O(n log(n)) right?


I gave the formal definition of distinguishability in the other comment, but it does not include running time.


And that involves preventing the leakage of any data? Such as the time each take?


The original goal was to be able to make public key systems out of secret key systems, so hiding existing running time is not part of the definition of indistinguishable.

It is about what information you can glean about either of the originals. Formally (from the paper):

Indistinguishability: For every two ensembles {C_{0,λ}} and {C_{1,λ}} of polynomial-sized circuits that have the same size, input length, and output length, and are functionally equivalent, that is, ∀λ, C_{0,λ}(x) = C_{1,λ}(x) for every input x, the following distributions are computationally indistinguishable:

  {iO(1^λ, C_{0,λ})} and {iO(1^λ, C_{1,λ})}


IE you can take a secret key system, hardwire the key into the program, and turn it into a public key system without fear that the secret key will be recovered - instead, it will not be distinguishable from other programs with different hardwired keys.
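A toy sketch of that idea (everything here is hypothetical and insecure; the point is only where the obfuscator would be applied, the XOR "cipher" is a stand-in for a real secret-key scheme):

```python
from functools import partial

SECRET_KEY = 0b10110011  # symmetric key, hardwired below

def encrypt(key, plaintext_byte):
    # Toy secret-key cipher: XOR with the key (illustration only, not secure).
    return plaintext_byte ^ key

# The "public key" is the encryption program with the secret key baked in.
# iO would be applied to this program so the key can't be read back out.
public_encryptor = partial(encrypt, SECRET_KEY)

ciphertext = public_encryptor(ord('A'))
assert encrypt(SECRET_KEY, ciphertext) == ord('A')  # XOR is its own inverse
```

Anyone holding the (obfuscated) encryptor can encrypt, but only the key holder can decrypt.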


> IE you can take a secret key system, hardwire the key into the program, and turn it into a public key system without fear that the secret key will be recovered - instead, it will not be distinguishable from other programs with different hardwired keys.

I don't understand. Shouldn't a different secret key yield a different function / circuit?


You can actually do it both ways - where the ciphertext is an obfuscated program, and where keys are programs and ciphertexts are short.

https://eprint.iacr.org/2013/454.pdf and friends have a direct description of the latter and references to the former.

This is just one mechanism.


Ah I think I understand better. Thanks!


That is a fascinating result. A remarkable mathematical achievement. And a nightmare.

This thesis says that it's possible to obfuscate code in such a way that there is a lower bound on the level of effort needed to de-obfuscate it. That lower bound can apparently be comparable to the level of effort required to break a cryptosystem.

So, coming soon, viruses and worms nobody can figure out. Code where no one can tell if it has a backdoor. ML classifiers where no one can be sure what they really do.


Well, no.

What you are worried about is VBB (roughly can i take a program a and obfuscate it in a way that you get no information about the original from the obfuscated version), which has been proven impossible.

This is about indistinguishability, which is a form of obfuscation, but not the kind you are thinking about.

This is really something like "If you have two programs a and b that compute the same function, it's possible to obfuscate them in a way that the resulting programs are indistinguishable from each other"

So if program a is "bubble sort" and program b is "selection sort" (function here is in the mathematical sense, so these compute the same function), obfuscation can make it so you can't tell them apart, and you can't tell if the original of what you are holding was bubble or selection.

Roughly: VBB is the ability to take any program and make it indistinguishable from random.

IO is the ability to take two programs that compute the same function and make them indistinguishable from each other.

"compute the same function" is actually what makes it feasible :)

The original use case was really trying to turn secret key systems into public key systems by hardwiring the secret key into the program and then obfuscating it.

So an example use case today would be "every DVD decoding program has a different key but computes the same function, can you make them all indistinguishable and thus hide the key" or something like that.


> The original use case was really trying to turn secret key systems into public key systems by hardwiring the secret key into the program and then obfuscating it.

Wow, that definitely qualifies as a "crown jewel." I, for one, would not look forward to a future in which software running on my hardware is able to hide secrets, like one-off media encryption keys, from me!


In practice, as the thesis proves, you can use iO to generate a lot of things people care about in cryptography. I will go out on a limb and say you can use it to build almost everything interesting :)

Now, it happens for a lot of those we don't need iO and iO may be impractical, but as a theoretical building block, it's quite nice.

As for your concern, yes, this is now possible, at least in the sense that you do as well as it is possible to do in making them computationally indistinguishable. This does not prevent you from attacking it in other ways :)

Two things, good or bad depending on how you look at it:

1. You can prove iO is at least as good as the best possible obfuscation scheme that can ever exist. So whatever that enables you to do or not, it's the limit.

2. It also means you can get away with handing over less secrets, ensure better isolation, etc.

The following things are possible with iO (trivially so), due to this paper:

Adaptively secure succinct garbled RAM - which would let you hand secure databases to untrusted providers and not worry about it.

Sender deniable encryption where you can't prove what the original plaintext was and various options are equally likely.

Fully deniable interactive encryption where secrecy of communication is protected even if the parties are later coerced (or willingly bribed) to expose their entire internal states: plaintexts, keys and randomness.

etc

These are just some examples.

Now, some of this, as I said, we know how to do already, some we don't. But this paper gives you iO as a building block that can do them without having to separately prove that it is as sound and secure as existing crypto systems are.

Again, iO only provides computational indistinguishability, not other things, but it is a nice primitive.


Thank you for taking the time to respond so thoroughly :-)

Everything you write makes sense to me -- though I had to look up and skim Canetti and Holmgren's work to understand the bit about succinct garbled RAM.

We live in interesting times!


What's iO in this context? Is it indistinguishable obfuscation?


> Wow, that definitely qualifies as a "crown jewel." I, for one, would not look forward to a future in which software running on my hardware is able to hide secrets, like one-off media encryption keys, from me!

That future is already here for most people. See things like Secure Enclave or Intel's SGX.


We already have code that obfuscates itself. And runs differently based on whether it is running in a debugger or not. This has been the cat & mouse game played by:

virus writers vs anti-virus code.

game hackers vs anti-cheat software.

license checking code vs cheapskates.

Stuxnet.

> Code where no one can tell if it has a backdoor.

Back in 1984, Ken Thompson wrote Reflections on Trusting Trust [0]. The short answer is "no, you can never ever tell if it has a backdoor. Ever."

> The moral is obvious. You can't trust code that you did not totally create yourself. (Especially code from companies that employ people like me.) No amount of source-level verification or scrutiny will protect you from using untrusted code. In demonstrating the possibility of this kind of attack, I picked on the C compiler. I could have picked on any program-handling program such as an assembler, a loader, or even hardware microcode.

Back in the 90s, I was interested in hacking. The sort of hacking that starts with "this is the disassembler. Step one, hack the trial version of IDA Pro." It was fascinating at the time (and bores me now, my punishment for growing up, I guess) so I read & did lots of stuff like that.

0 - https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_Ref...


Just don't trust that code, and don't run it?

You should require positive proof that code does what you want.

Similar to how e.g. Haskell's type system doesn't have to solve the halting problem [0]: it just rejects some programs that would be ok, but don't conform to the type system.

See also proof-carrying code. https://en.wikipedia.org/wiki/Proof-carrying_code

[0] Ok, unless you use undecidable instances or something like that.


> You should require positive proof that code does what you want.

That's just saying you should only ever solve simple problems with your computer.


Why? People have eg written proven-correct C compilers. And those things ain't 'simple'.


Yes great so let's not use any software at all on our machines! only things we can program ourselves or are open-source. what a dystopia we live in...


Huh, why?

First, for already closed source programs the obfuscation results don't make a difference at all. They remain just as obscure before as they are now. (Unless you routinely try to de-compile all closed source programs?)

The obfuscation result in the submitted article really only makes a difference to open source programs.

But: thanks to advances in mathematics and cryptography, a vendor can prove to you that their program does what it's supposed to, without having to reveal the source code or anything else about the program.


> Just don't trust that code,

That's an even harder problem to solve.


Any program worth obfuscating is not worth trusting.


Among other things, tons of desktop software and most mobile apps and even javascript you run from most websites is obfuscated. How is it possible to avoid that? Do you run 100% open source for everything with javascript disabled?


Some people do take it that seriously: https://www.gnu.org/software/librejs/


People typically run website's javascript in a sandbox.

Mobile apps also typically have tightly controlled permissions these days.

And, of course, people don't typically review most software's source anyway. They rely on trust and reputation.


Wouldn't this be a great case for a sort of machine or program that delivers Zero Knowledge Proofs for each input and output?


You could deliver a zero knowledge proof that the program you are trying to sell me is benign, without revealing eg the source of the program.

I'm not sure what you would do with a zkp of each input and output of the program?


Is it just an existence proof or is it actually constructive? (I haven't read the dissertation.)


I'm not sure. It seems to be applied to "circuits", which in this context seems to mean a one-way set of logic gates (such as AND, OR, NOT, NAND) which map a set of boolean inputs to a set of boolean outputs. In theory you can construct any finite digital function that way. It's a useful abstraction, like a Turing machine.

Unlike a Turing machine, a circuit is finite. All circuits are "solvable" (given outputs, compute an input which yields them) by trying all the input patterns. There's no undecidability and no halting problem. There's just difficulty. "Difficult" here means there's no way easier than trying all the patterns. This is the same property sound cryptosystems are supposed to have - there's no easier way than trying all the keys.
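A tiny illustration of that "try all the input patterns" point, using a made-up 3-input circuit:

```python
from itertools import product

def circuit(a, b, c):
    # A small fixed boolean circuit: (a AND b) OR (NOT c)
    return (a and b) or (not c)

def invert(target):
    """Brute-force the circuit: try all 2^3 input patterns
    until one produces the desired output."""
    for bits in product([False, True], repeat=3):
        if circuit(*bits) == target:
            return bits
    return None

assert circuit(*invert(True)) is True
assert circuit(*invert(False)) is False
```

For n inputs this takes up to 2^n trials, which is exactly the kind of exhaustive-search hardness the comment compares to trying all keys of a cipher.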

Whether this result can be extended to programs with iteration I'm not sure. The paper doesn't seem to mention iteration or storage.


A (polynomial-sized) circuit in this context is a sequence of DAGs whose nodes are binary logic gates where the nth DAG infers 1 output bit from n input bits, such that some polynomial bounds the size of the graphs of the sequence. See https://en.wikipedia.org/wiki/Circuit_complexity

The class of circuits in question is P/poly which includes BPP (which includes P, the class of polynomial-time programs). So the result is quite general, since most practical programs are in P.

For an introduction to complexity theory, see Arora, Barak (2009) (Draft available here https://theory.cs.princeton.edu/complexity/book.pdf ); same Barak that did the presentation "On the (Im)Possibility of Obfuscating Programs" (2001) mentioned in the abstract.


IMO: Paranoid about the implications, slightly less worried about it applying in practice.

As shown by the DRM schemes used in modern games, this type of obfuscation comes at the cost of performance: Unless you want to compute a sensitive function at the user's end in an obfuscated manner, it'd be much simpler to just run that function on your end and optimize it in terms of running costs & performance.

Such a design would also run counter to the everything-as-a-service model that companies are trending towards, as it places more power back at the user's end, even if the user can't decipher the obfuscated function's inner mechanisms. Such a design would reduce the need to phone home, and thus the need for EaaS.


Would it be possible to use this to obfuscate other cryptographic primitives (e.g. a hash function or a simple program that checks whether the hash of some input equals a hardcoded value)?


The next question is why didn’t the high end shops spending a ton of money and recruiting effort on crypto and state malware like NSA .. or Russia(?) figure this out already or did they


Because figuring out what a worm does doesn't help if it has already done it, and for that matter they're supposed to remain undetected. Furthermore people already cannot tell whether code has backdoors, or what ML classifiers do for that matter.

Unfortunately, the biggest industrial use case is making wi-fi routers that are absolutely impossible to install OpenWRT on, etc...


> people already cannot tell whether code has backdoors

I was feeling crazy after reading that.


> The next question is why didn’t the high end shops spending a ton of money and recruiting effort on crypto and state malware like NSA .. or Russia(?) figure this out already or did they

There's no way to know. But the NSA (at least) has a documented history of making crypto breakthroughs and keeping them secret (e.g. public key crypto: https://en.wikipedia.org/wiki/Public-key_cryptography#Classi..., differential cryptanalysis: https://en.wikipedia.org/wiki/Differential_cryptanalysis#His...).


The fun open secret about working as a researcher for the NSA is that even after you leave the NSA, any and all research must be approved by them before publishing.


There are more efficient ways to hide malware from current scanners, and on the flip-side, the scanners that use heuristics/ML to classify the sequence of system calls made by the binary wouldn't be affected by this sort of obfuscation.


As mentioned on page 9 of the thesis, IO as constructed in the thesis implies IO for RAM machines, so yes you can do programs with iteration.


Can't finite iteration and recursion just be unrolled? And storage just considered another part of the input?


This is how SMT solvers deal with bounded quantifiers.


In theory? Absolutely; unrolling a loop is exactly what it sounds like. Recursion is just a fancy loop. But there's a reason no one does that, and why no compilers emit that as code.
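A hypothetical before/after of that unrolling, for a loop whose bound is known in advance:

```python
# Bounded loop (iteration count fixed at 4):
def popcount4_loop(x):
    count = 0
    for i in range(4):
        count += (x >> i) & 1
    return count

# The same function with the loop fully unrolled -- pure straight-line
# code, which is the form a circuit model requires:
def popcount4_unrolled(x):
    return ((x >> 0) & 1) + ((x >> 1) & 1) + ((x >> 2) & 1) + ((x >> 3) & 1)

# Identical on every 4-bit input.
assert all(popcount4_loop(x) == popcount4_unrolled(x) for x in range(16))
```

The catch, as noted above, is size: unrolling everything can blow the program up enormously, which is why compilers don't emit fully unrolled code in general.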


Fancy in the sense of being a loop with an implicit stack. Or maybe just some registers if TCO.

And yet the machines we can actually build are far closer to LBAs than TMs.


It is not efficient enough that you need to worry about exciting developments using this scheme. But it’s a step towards more reasonable mathematical assumptions than previous constructions, assumptions that may actually be true. Give it a few more years and we might be able to use this stuff.


Looks like it's constructive, but my math knowledge is probably equivalent to a freshman's at best.


Even if the vendor is willing to provide source, you can't be sure that the binary matches. Nice.

I think Saberhagen's Berserkers actually had this feature.


> I think Saberhagen's Berserkers actually had this feature.

Yes. Code decoded other code, etc. So if the code wasn't running normally, it couldn't be analyzed. It was handwaving back then, but it may become real.


It was standard coding practice if you were selling games - to prevent people from hacking the license key stuff. Or by cheating software to prevent detection by anti-cheat software. One of the hooks built into Windows - "is a debugger connected" - helps the virus/cheat code determine if it is "safe" to run or not.


You can be sure that it matches if the vendor (1) doesn't use an obfuscated binary, or (2) provides exact details of how the obfuscated binary was constructed (presumably including some kind of seed for the obfuscator).


This was the point behind Reflections on Trusting Trust - you can't trust it. Unless you build the entire compiler/linker tool chain from source that you've already inspected.

https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_Ref...


I’m sure DRM companies like Denuvo are having a field day.


Obfuscation is itself a malware signature.


I mean, the same argument has been applied to other cryptographic tools. Why encrypt your messages, unless you're sending something sketchy? Why obfuscate programs, unless you're hiding something?

Yet I think the utility of encryption is well-demonstrated, and not just theoretical. It can be used by good people to defend against adversarial intent. And it may well be that obfuscation (in its most direct application) has the same effect. (e.g. companies sharing proprietary algorithms for you to run at home, without revealing the secret sauce; or me delegating computation to AWS without revealing secrets)

Note that this is not why indistinguishability obfuscation (iO) is a crown jewel, here. Practically, iO is nowhere close to obfuscating anything larger than a tiny circuit. But it can still be useful to do things like obfuscate secret keys when designing cryptographic protocols. Theoretically, iO allows us to derive essentially every cryptographic primitive, which is why this paper is interesting, and why iO is called a crown jewel. And now, we can build iO for the first time from well-studied hardness assumptions.


> I mean, the same argument has been applied to other cryptographic tools. Why encrypt your messages, unless you're sending something sketchy? Why obfuscate programs, unless you're hiding something?

There is a huge difference: Encryption is used to hide something from a third party, while obfuscation is used to hide something from the intended recipient.

Or, if you'd like to argue that the intended recipient is the computer, not the user: To turn that computer into a deputy of the sender, while still formally belonging to the recipient.


> while obfuscation is used to hide something from the intended recipient.

No. Most obfuscation is done to increase the time needed to reverse the source of publicly available programs. Not the user (99% of the recipients) but the adversary is the intended target. The users are just caught in the crossfire.

Unless you consider every user of apps as someone who has both the skill and the need to reverse engineer your app.


> No. Most obfuscation is done to increase the time needed to reverse the source of publicly available programs.

So to stop the intended recipient from deciphering the program.

"Adversaries" are the strawman used to hurt legitimate customers who are left with black boxes for device drivers and no functioning hardware since the manufacturers make it impossible to open source firmware updates.


In case of DRM sure, but there are cases like online games where "adversaries" are players (intended users of the app) trying to use cheat programs on game binary to get an advantage over other players.

I share the sentiment that in most cases it just serves DRM vendors and is a disservice to users, but that's not all the cases.


> I mean, the same argument has been applied to other cryptographic tools. Why encrypt your messages, unless you're sending something sketchy? Why obfuscate programs, unless you're hiding something?

It's easy to argue that everyone has legitimate interest in hiding some things in data, but what would be a legitimate case for hiding things in code?


"This is a trade secret. We have not patented it, because it's so clever and valuable that it needs to be protected longer than a patent lasts"


Except we know (and have examples from cryptocurrency) that the problems with ownership in real life have little to do with cryptography.

The reason Coca-Cola doesn't use Pepsi's recipe isn't because mixing the ingredients under heat creates an irreversible mixture that you can't separate to find the recipe. They don't use Pepsi's recipe because it's illegal.


That's true for a recipe that was developed in a past millennium. Trade secrets are still very much relevant for companies that work with more modern technologies than soda from before the Great War, and a useful protection against competition. Even if you do believe that companies within the US or EU wouldn't touch other's IP with a 10 foot pole due to strong laws, you'd have to be very naive to think this is valid in all parts of the globe.

Edit: it's also trivially not true for trade secrets in the legal sense -- if you upload your trade secret willingly to AWS, it's no longer a trade secret, unless you have extra contracts in place. With obfuscation, you could skip a lot of red tape to keep it secret.


You can't copyright a recipe. The real reason it doesn't make sense for Pepsi to make an exact duplicate of Coke is that Coke is better at selling Coke than Pepsi is, and it's better for them to be a niche alternative some people prefer than an undifferentiated competitor. Reverse engineering trade secrets is perfectly legal if you haven't agreed otherwise.


Since when is coca cola niche?


That remains a fundamentally anti-social motivation...


Making it harder for bad actors to create malware would seem like one legitimate case.


> I mean, the same argument has been applied to other cryptographic tools.

Intent matters.

> Why encrypt your messages, unless you're sending something sketchy?

Because I am sending something private, I can't believe people bring this dumbass argument still, and even here.

> Why obfuscate programs, unless you're hiding something?

Why indeed. This one is the red flag, not message in your e-mail client being encrypted. Again. Intent, and context, matters.


You say the intent is different but don’t show why. What about obfuscation implies malicious intent? How come it doesn’t apply to encryption?

It can’t be about applications, because I can come up with a thousand applications for obfuscation with benign intent. Such as delegating private computation to AWS without Amazon learning my personal information.


> and why iO is called a crown jewel.

Ah, thanks. That's far from clear in the article.


Until it's standard practice for all companies to lock things up under the guise of consumer protection with all closed-source software.


like that time sony gave everyone a rootkit: https://en.wikipedia.org/wiki/Sony_BMG_copy_protection_rootk...


similar to starforce for those that don't know: https://www.youtube.com/watch?v=p-wyIalhdPU


Sure but it's a very weak signal.


Isn't that where we're already at with ML classifiers?


Well, we know they're ML classifiers and they're not making syscalls. So there's obviously a limit to "we don't know what they do".


You'd still know what syscalls a cryptographically obfuscated userland program was making.


If it's a total black box, wouldn't the NOBUS thing to do be to have some large key that it's watching input for that flips it into a malicious mode?

If BB(6) took years to execute, how long would you have to spend feeding random input to a suspected-hostile 10000 symbol Turing machine (whose source code and state you can't examine) in a sandbox before you decided it was safe?


Easy, have all executable segments read-only with no (or only fixed) syscall instructions.


Think it'd be a bit dangerous to let them make any though, unless you were single stepping.


Wouldn't this also be a field day for encryption maximalists?

I'd imagine if all code were to be perfectly obscured then to the public it wouldn't matter, but many would demand more open source, and that'd be the only way to establish trust.


Ah, "code is law". Good luck with that.


Just wait until the legislators catch wind of this.


Autonomous, polymorphic malware utilizing secure computation to perform operations that aren't revealed to systems they infect.

Seems like there is an opportunity for bot nets to perform computations, say BTC mining, in a completely sealed and zero-knowledge environment, which now may not even need a backdoor, or may be able to hide one perfectly.


> viruses and worms nobody can figure out

Sandboxing and modeling interaction are still possible and useful, and are a large part of figuring out viruses and worms. That's not going away.


Yeah pretty much. It is only going to get wilder in the ransomware scene. Good times ahead.


Seems some of those cases (backdoors and classifiers) could be prevented by regulation and/or social convention (e.g. don't use programs/services obfuscated in this way).

The malware seems trickier. Maybe systems will need to require proof of unobfuscated source to run code?


Could you steal source code, obfuscate it, and sell it as your own?


Does it run at the same speed as the original?


Across the same universe of discourse.


If I’m reading Wikipedia [1] correctly, a program that does an AND of 32 booleans when obfuscated is 32 GiB. No indication of runtime overhead but something tells me it’s a large constant overhead (if I skimmed the paper correctly, the complexity class must be the same).

[1] https://en.wikipedia.org/wiki/Indistinguishability_obfuscati...


But can it be compressed with something like UPX?


And can you tell where an operation begins and ends, for example, determine inputs and outputs?


If mankind can't stomach 10% overhead to check array bounds and collect unused memory, what makes anybody think they can sell 1,000,000% overhead to make your DRM driver (and viruses) harder to reverse?


We already accept something on the order of 1000% overhead for video DRM (software vs hardware decoding)


Same reason. Money.


Denuvo would like a word with you.


Here's the quanta article referenced by the title: https://www.quantamagazine.org/computer-scientists-achieve-c...


And its discussion from 3 years ago:

https://news.ycombinator.com/item?id=25046738


DRM implementations already use code obfuscation heavily. Can someone with knowledge of the math clarify why this is a big deal? Why would it matter that there is mathematical proof that someone cannot figure out what the code is doing? How does it apply to crypto?


The code obfuscation that DRM implementations use is not provably secure. I don't think anyone would (or should) hide a bitcoin secret key inside of an obfuscated program. There's no formal guarantee of security.

This work, on the other hand, shows a provably secure construction for obfuscation (assuming that some assumptions about the hardness of various well-studied mathematical problems are true). In other words, extracting a bitcoin secret key from an obfuscated algorithm is as hard as insert math problem here. (This is how all cryptography works; e.g. breaking RSA encryption is as hard as factoring large products of primes.) This paper is special because, for the first time, the hard math problem they use (to build the obfuscation) is "reasonable": they've been studied for decades, and no one knows how to break them. (Indeed, a lot of cryptography is built on top of the same assumptions/problems.)

Thus, we would be surprised if someone extracted our bitcoin secret key from our obfuscated program, because it means they solved some "thought-to-be-unsolvable" mathematical problem. This is a much nicer guarantee than the alternative, e.g. with current obfuscation, where we say "oh, it looks pretty random, let's stick my key in" and then trusting that no one will break it. (That's a lot of trust, when you might have a 100,000 Bitcoin at stake.)

Of course, none of this is feasible yet; at present, we can at best securely obfuscate tiny circuits. So the main utility is probably to hide keys and such, not to obfuscate entire programs. This might enable better secure MPC schemes, NIZKs, etc. So I guess, per your question, it's also targeted at a different use case.

Now, as to why theoretical cryptographers care about obfuscation in general: it's not so much about the direct application of obfuscating programs. (It's not obvious why obfuscation is more interesting than encryption, etc., from an applied point of view.) Instead, from a theoretical point of view, if we can build indistinguishability obfuscation, we can directly build public key encryption, non-interactive zero-knowledge proofs, multiparty computation, etc. (assuming one-way functions). So really, this is a primitive that somehow connects all of the other primitives, which is why theoreticians think it is a big deal.


> I don't think anyone would (or should) hide a bitcoin secret key inside of an obfuscated program.

I can't remember where I read this, or if it was just a hypothetical, but I think I heard of people doing this deliberately as a sort of canary. If the wallet gets drained, then you know someone cracked your obfuscation.


How does obfuscating a key differ from encrypting it?


For example, we could write a program that hard-codes a secret key, and then signs bitcoin transactions with that secret key, but only if the transaction has value less than 0.5 bitcoin. Then anyone who has the program can sign things of small value on our behalf; if the program is indistinguishably-obfuscated, they won't be able to get any additional information about the key itself, or use it in any other way.

I guess just encrypting the key won't let you use it to sign things.
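To make the idea concrete, here is a toy sketch of what such a program might look like *before* obfuscation. Everything here is illustrative: the hard-coded key, the 0.5 BTC policy limit, and the use of HMAC as a stand-in for a real ECDSA bitcoin signature are all assumptions, not anyone's actual scheme.

```python
import hmac
import hashlib

# Hypothetical: this key would be embedded in the program prior to obfuscation.
SECRET_KEY = b"hard-coded-signing-key"

def sign_transaction(tx_bytes: bytes, value_btc: float):
    """Sign a transaction only if its value is under the policy limit.

    HMAC-SHA256 stands in for a real bitcoin (ECDSA) signature here.
    """
    if value_btc >= 0.5:
        return None  # refuse to sign large transactions
    return hmac.new(SECRET_KEY, tx_bytes, hashlib.sha256).digest()
```

The point of obfuscating this program is that whoever holds the obfuscated version can produce valid signatures on small transactions, but (under the obfuscation guarantee) learns nothing more about `SECRET_KEY` than the signatures themselves reveal, and cannot bypass the value check.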


Imagine you wrote a decoder program with the key embedded in it. From looking at the program you can't work out what the key is, but you can use it to decode a secret message. You can have a secret encoder program too. Effectively, black-box obfuscation (among other things) lets you turn symmetric encryption into asymmetric encryption.

Another possible use is in cloud computing. You could prepare a program with private information in it and still safely submit it to a third party to execute. This would also be very useful with smart contracts. Since smart contract code is public, it can't contain any secrets unless it's securely obfuscated.

As I understand it we are still quite a way off from any of these techniques being feasible, since the overhead at the moment is huge.
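The symmetric-to-asymmetric trick above can be sketched in code. This is a deliberately toy construction (SHA-256 counter-mode keystream, made-up key) just to show the shape of it: the "public key" would be an *obfuscated* copy of `public_encrypt`, distributed to everyone, while the key holder keeps `private_decrypt`.

```python
import hashlib
import os

# Hypothetical embedded symmetric key; in the real idea this would be
# hidden inside an obfuscated copy of public_encrypt.
SYMMETRIC_KEY = b"embedded-secret"

def _keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    """Derive n keystream bytes by hashing key + nonce + a counter."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def public_encrypt(plaintext: bytes) -> bytes:
    """The 'public key': an obfuscated form of this program could be published."""
    nonce = os.urandom(16)
    ks = _keystream(SYMMETRIC_KEY, nonce, len(plaintext))
    return nonce + bytes(a ^ b for a, b in zip(plaintext, ks))

def private_decrypt(ciphertext: bytes) -> bytes:
    """Kept secret by the key holder; XOR with the same keystream undoes encryption."""
    nonce, body = ciphertext[:16], ciphertext[16:]
    ks = _keystream(SYMMETRIC_KEY, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, ks))
```

If the obfuscation really hid `SYMMETRIC_KEY`, publishing `public_encrypt` would let anyone encrypt to you without being able to decrypt, which is exactly the asymmetric-encryption functionality.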


For those, like me, wondering what this is about;

> Program obfuscation would enable a host of useful applications: For instance, you could use an obfuscated program to delegate particular tasks within your bank or email accounts to other individuals, without worrying that someone could use the program in a way it wasn’t intended for or read off your account passwords (unless the program was designed to output them).

I relate this to cracking: run a program and watch the memory, see where the password is checked, and edit the binary at that location to bypass it. So this level of obfuscation would make that impossible? Seems like magic, honestly.


> relate this to cracking; run a program and watch the memory, see where the password is checked and edit the binary at that location to bypass

You can frustrate this by computing a very long hash and sprinkling checks of its various pieces throughout the program's execution.
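A minimal sketch of this scattered-check idea (all names and the 4-byte piece size are illustrative, and a real implementation would inline each check rather than share a helper a cracker could patch once):

```python
import hashlib

# Hypothetical reference hash of the correct password (32 bytes for SHA-256).
EXPECTED = hashlib.sha256(b"correct-password").digest()

def check_piece(password: bytes, i: int) -> bool:
    """Verify only bytes [4*i, 4*i+4) of the hash, so no single patched
    comparison defeats the whole check."""
    h = hashlib.sha256(password).digest()
    return h[4 * i:4 * i + 4] == EXPECTED[4 * i:4 * i + 4]

def run_program(password: bytes) -> str:
    if not check_piece(password, 0):   # early, during startup
        return "denied"
    # ... unrelated work ...
    if not check_piece(password, 3):   # deep inside some other feature
        return "denied"
    # ... more work, more scattered checks ...
    if not check_piece(password, 7):   # just before the final output
        return "denied"
    return "ok"
```

The cracker now has to find and patch every check site instead of one, which raises the cost of the memory-watching attack described above (though it is still obfuscation by obscurity, not a cryptographic guarantee).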


Does this mean we will be able to embed private keys and API keys in deliverables? (i.e. to cryptographically sign artifacts, issue API tokens, and other niceties)

Sucks that DRM and user-rights violations are about to get much, much worse.


Your API tokens could still probably get sniffed in the requests?


No, because the requests are encrypted (with a public-key crypto method, for example).


The article doesn't explain what "indistinguishability obfuscation" does, nor why it is the "crown jewel of cryptography".


It links to the thesis, which has a good summary in the first few pages.


So wait, doesn't this also indirectly imply that we will never be able to decipher neural networks of sufficient size for the general case?


Weird how research of this quality only lands you an assistant professorship. The competition in academia is insane.


It's not weird; it's how academia works. It's not even about competition, it's about putting in the work in a sustained manner: proving you can get funding (who do you know, how often do you publish, what type of publishing do you do), how long you have been teaching, etc. In addition to experience, most institutions have specific criteria and expectations for promotion to full professorship: a demonstrated record of excellence in teaching, multiple significant contributions to the field through research, and active engagement in service and leadership roles within the academic community. This guy has a very, very sparse resume. Just because he did one impressive thing doesn't mean anything in academia (thankfully).


I wonder how this compares to homomorphic encryption? If I understand correctly, I could give someone a computation with an embedded key that operates over encrypted data, and they wouldn't be able to decrypt the data other than for the computation I gave them, right?


I think it compares more to a One Time Program or Functional Encryption.

Homomorphic Encryption won't prevent you from decrypting other outputs (if you have a decryption key), and with Functional Encryption it's assumed both parties know the function being evaluated, so OTP is really the closest.


But does it withstand a powerful LLM attack? Because cursory tests on existing commercial code obfuscation solutions so far look like it's clubbing baby seals for them LLMs.


Thanks to DannyBee for ITT de-obfuscating the finding and its practical implications.


1 year old


Trash title.

> established the feasibility of mathematically rigorous software obfuscation from well-studied hardness conjectures


This is not about software obfuscation; this is about cryptographic indistinguishability obfuscation (iO). It's targeting a different problem -- software obfuscation is more about hiding program behavior, whereas iO is more useful for white-box cryptography, like hiding an AES key.

Maybe you could obfuscate software eventually, but we don't have practical efficiency even for tiny circuits. It's a bit like saying how fully homomorphic encryption allows you to securely run "software" on someone else's computer: theoretically maybe yes, but in practice it's used within software to run very specific computations.


Why? It's not a trash title. Explain yourself, and give us some substance instead of being inflammatory.

Indistinguishability obfuscation (plus one-way functions) implies basically every cryptographic primitive there is, including public key encryption, NIZKs, and MPC. It gets close to giving us FHE (open question, I think). In some sense it is a "unifying primitive". Absolutely a holy grail to cryptographers, even if it isn't for you.

