Reverse Engineering Snapchat (Part II): Deobfuscating the Undeobfuscatable
2020-06-22
Black box
Many Hackernews users suggested using emulation to generate the token, treating the whole thing as a black box. The problem with this solution as I mentioned in a comment is that there’s way too many real device dependencies. X-Snapchat-Client-Auth-Token
isn’t random gibberish. It contains lots of info about the device, encrypted. Even if you use Corellium, I’ll just leave it at saying that the chances of it working are slim. I don’t see an alternative to reversing it.
Deoptimizing optimizations
First of all how is all this monstrosity achieved? When you compile hello.c
using clang
, clang
is just the front end, llvm
is the real deal; meaning clang
takes the C source and converts it to an intermediate representation (IR)1, llvm
then interprets this IR, optimizes and compiles it to machine code. The advantage of this system is that the backend remains language-agonstic, you just write an LLVM-IR-compatible-front end for each language and leave the rest to llvm
. Now what we’re interested in is optimizations AKA optimization passes. llvm
operates on the IR to generate more efficient code, which might look a bit different in assembly than what you expect.
That’s where obfuscation comes; instead of actually optimizing the code, we can change it however we want as long as we keep the semantics. That’s how O-LLVM
2 works; at the compiler level, because no one can maintain source code like that. (The team behind O-LLVM is now acquired by Snap 3)
If we understand the obfuscations can’t we just do the reverse and generate something that resembles the original assembly? As well articulated by one Hackernews user4, obfuscation is mostly lossy; deobfuscate all you want (many cool libraries5 allow you to do this), but you’re way better off getting your hands dirty with the binary directly.
“Evan Spiegel Hates This Trick!”: or, How to Bypass the Breakpoint Check
This might end my Snapchat reversing career (or the whole post really). One of the biggest hurdles with this binary is fuckup_debugging
. You won’t be able to do any kind of dynamic analysis (which is the only way to go in this binary in my opinion) because of it. And you can’t patch fuckup_debugging
to return the right path_key
so that it takes the right execution and not the infinite loop because there are anti-tamepring checks to stop you from communicating with the server in a modified binary. But first how do you even get the right path_key
, by defnition you’ll have to set a breakpoint so that you can get to fuckup_debugging
in the first place? Is it a dead lock? Let’s see it in action:
Here we’re in the function that returns the token, this is the block that calls fuckup_debugging
, I know that because when I debug it, the block right after that is an infinite loop.
So how do you “hide” breakpoints from fuckup_debugging
?
- Set a breakpoint right before it’s called.
- Disable this breakpoint (I disable all of them just in case):
(lldb) br dis
- Single step inside
fuckup_debugging
.(lldb) si
- Now as far as
fuckup_debugging
is concerned, no breakpoints exist. - Set a breakpoint before it returns and stop at it:
(lldb) b <you know where> (lldb) c
- Now you have the correct
path_key
inx0
. - Profit.
A couple of gotchas here:
- If you got smart like I did and after disabling all breakpoints, stepped over
fuckup_debugging
using(lldb) ni
instead of single stepping, you’ll get an infinite loop, because you’ve just set another breakpoint. That’s because stepping over is essentially a breakpoint at the address of the next instruction. While single stepping has a sweet littleptrace
command of its own (PTRACE_SINGLESTEP
)6, or in other words it executes at the CPU level, no code patching. - The weakness of
fuckup_debugging
is that it doesn’t check breakpoints for itself. That’s why you’d be able to set a breakpoints inside it harmlessly. There, a new challenge for future Snap reverse engineers after they patch this.
Now we’ve bypassed one breakpoint check for one function. Not every function in token generation is breakpoint-checked. But now for every check that you bypass using the above method, you can add a comment with the correct path key. So in the future it will be a matter of:
We stepped over fuckup_debugging
and patched the return value on the fly. We’re a bit slowed down because we’ll have to do this with every patched function, but now we can at least do some real debugging.
Getting what you want
Here’s what happens on a very high level with the token. We have the joint function gen_token
, which calls set_token_params
, which is a mammoth joint function that calls many other functions. Then the token is encrypted and returned from gen_token
. To get info like this, you have to spend some time on the binary and make educated guesses until you have a rough big picture, then you can go for something specific, or a bottom-up approach to be concise. Let’s go over an example of how you would reverse a specific token parameter.
Anchor addresses
Having bookmarks in critical areas of the binary will save you lots of time and headache. For example I had bookmarks where the token params are being written, right before the token is encrypted, etc. That way I can start working on a certain parameter just by knowing its offset in the token structure, since that’s the closest trace I know to where it’s generated. But how do you find these anchor addresses in the first place?
Watchpoints are underrated
If you asked me to give you only one word that will solve more than half of your problems with the Snap binary, I would tell you: watchpoints. For example you have no idea where to start with anything, but you know that gen_token
returns the token’s pointer, so that’s your lead. You then trace gen_token
before it returns, and you find that this pointer is actually written to by another function before being returned by gen_token
, say the equivalent C code:
Now before real_deal
is called, you set a watchpoint on token_out
, disable all breakpoints so that fuckup_debugging
is happy, continue execution, then the watchpoint will stop the process as it’s being written to/read from, getting you to a possibly critical point of the code that you can use as an anchor. There’s my second ace with the Snap binary.
No trick is too dirty
Watchpoints are nice and all, but what about registers? You can’t set a watchpoints on those, or in theory you could, but that’s for another time. So what if the value you’re interested in is in a register? The answer is an execution trace and a text editor that supports regex.
Suppose we’re in set_token_params
and we know that the value we’re interested in is in x2
:
Since that’s the start of the block and there’s no CFG, we have no idea where x2
came from.
What I did is I generated an execution trace using Frida’s Stalker7 from a point near set_token_params
enough and one that I know isn’t policed by fuckup_debugging
; because Frida patches instructions to hook to functions, which will trigger the breakpoints check. Now I had a sequential execution trace. Then I used a good old text editor to find the above block, and searched for points before it where x2
is written to, where the source of the data is.
Dreams of Assembly
Once you find that a certain paramter is generated by the function gen_param1
, I see no other way but to reverse it to its high level source code equivalent. I see C (haha) as the best fit for this as you can accurately translate data structures/types from assembly to C. And as close as C is to the CPU, you still have to know how struct alignment works, make educated guesses about certain values and their types, and keep track of all stack offsets. I had dreams of assembly instructions while working on this binary, no shit.
MBA, again
MBA expressions are the trickiest thing in this binary. The latest research8 on MBA says that there are effectively two ways to simplify MBA expressions. Let’s explain them then give an example from the binary.
Synthesis
In this method9 you treat the whole expression as a black box “oracle”, you give it input, observe the output, and try to generate a function simulating that behavior. The biggest problem with this is that it’s impractical for complex expressions8.
Re-write rules
In this method8 all the rules for simplification is hardcoded, so for example you have a rule that says:
(x | y) - y + (~x & y) == x ^ y
.
Then you try to match the rules with the expression recursively, after which you use an SMT solver to prove that the original expressions is identical to the simplified tone.
The problem with this is that the rules aren’t universal like boolean/algebraic simplification, so it won’t be able to handle expressions outside the hardcoded ones.
Standing on the shoulders of giants
The first step to this is extracting the expressions from the binary, we can do it using symbolic execution10. Which is basically a way of executing the binary where we don’t assign concrete values to variables, only symbols. A good fit for this is the awesome Triton11 framework. After extracting the expressions, for synthesis you could still use Triton12, and for re-write rules there exists SSPAM13 by Quarkslab, which is Python 2 only.
MBA simplification example
This example involves a 120+ instruction block, whose input is two values from the stack, and whose output is in 4 registers. The trickiest output is in x27, so we’ll do that one. First we need to extract the expressions. We symbolize the block’s input using Triton.symbolizeMemory
, then after emulating the block, we get the full expression in the register using Triton.getSymbolicRegister(x27).regAst().unroll()
, which, brace yourselves, prints to:
The real expression is less scary than that because Triton adds bit masks. Let’s see what SSPAM has to say:
That’s not too good. So SSPAM, the only tool that I know to exist for MBA re-write rules, didn’t give a meaningful simplification. So I thought let’s try some synthesis, I used Python’s eval
to plug in different inputs to this expression and see how the bits in the output react (try it yourself, it’s fun). In the end my synthesized expression was:
Can’t see it getting simpler than this, but is it identical to the obfuscated expression? Let’s ask Z3:
Bingo!
EDIT: @adriengnt brought to my attention Arybo which he worked on at Quarkslab (they seem to have a monopoly on MBA). It was able to simplify the expression above in one go14. I’d say this is state of the art as far as MBA obfuscation goes. Its concepts15 are interesting if you wanna get technical.
We cool, Snap?
There’s much more to token generation than obfuscations. Being able to work with the binary at this level is only half the battle. I won’t disclose how to communicate with the API, because if Godfather has taught us anything, it’s that the perfect number of sequels is one.