Reverse Engineering Snapchat (Part I): Obfuscation Techniques

2020-06-18

When you have 200+ million daily users, you’ll definitely want to keep your API private from spammers and co., so you’ll have to keep a secret in the binary that authorizes it to make API calls to your server. Snapchat (as of version 10.81.6.81) does this by including an X-Snapchat-Client-Auth-Token header in each request, a typical one would look like:

v8:7841EAFE02CD9DE06AE8E41C6478D504:2B8115D1C5873C8BD5A3A9DDA7F976B21A672A643D8AB2AC91CE223C84BA5F9EB112B65B7C85AFD9CEA86A9DC36D5F6405B8D23B369A94A5657894207F09E432CBD21953F8E4F50E44373B59FB39270360DE5113FA983D1F06FF71A0D540488403A848D1C52A2421AF4341E6BBCD702F4921E5DC134ECCF99EDBD599EAA1AAA8556C6122334A63C86711740E58E453A7049FE94634DEC8FFE2E26C28780FFA46994818F7D0915E6DB3061188784D46D381CE2BF4D15E83BEC1ABFFE29207D2A58906CAC598AD314F368CF41E1892CA032859485DC99882F97D5064D4C7C5C2A4A4975C59530F4D0289EF4BC4E7CFC89FC8279038FB6E623C88A8AB38678F1D2757F7C0914C1A162E4F5B173E694109CD67E73762D8C090D8780714861DB883977D3B85D6F503D8D8CD5167B43A2DB18B79804841FE8064AD1A8078EAEF472698AD482AA77BC5D7EB012F0946DAFB923CFD10BA06675730EF338A96D1D0081B174BE5989B77FD07DCEDCDC635DEF1EE986F65798D87A358742F152AA929800FD5BA2CC29E

Control flow obufscations:

Forget about doing static analysis on this binary. Here’s what they do on a high level: the CFG (control flow graph) is destroyed (not flattened), dead code, library calls are mostly dynamic, and all symbols for the token generation function (let’s call it gen_token) and its callees are stripped. They’re implemented in C and not Objective-C for this reason, because you can always use the ObjC runtime against itself1 (Trivia: they only started using Swift recently, but for other tasks.)

Indirect branches and opaque predicates

Let’s take a look at the very first block in gen_token. The block loads some values from different sections of the binary and then:

orr       w8,wzr,#0x3
cmp       x8,#0xb
orr       w9,wzr,#0x6
csel      x8,x9,x8,hi
adrp      x28,0x106941000
add       x28,x28,#0xe40 ; jump table
ldr       x8,[x28, x8, LSL #0x3]
br        x8

See the first two instructions. Why would they compare x8 with 0xb right after storing 0x3 in it? Opaque predicates1. The csel condition will always be false, but that doesn’t matter, because as far as the disassembler is concerned, this is a condition, and conditions need to be evaluated at runtime. Replace every single jump (including legit conditions) with a similar block, and you’ve completely destroyed the CFG for any modern disassembler. Now Ghidra/IDA would be happy to display what it thinks is a small function with a tail call, which is in fact a huge function. I’ll give Ghidra that it’s able to calculate the address in br x8 but only for the first block (because that’s where it thinks the function ends). Now that’s a plugin idea: use emulation to calculate all the addresses in indirect branches with opaque predicates. I actually worked for a bit on implementing this but then that’s not even half the battle for this binary.

Bogus instructions AKA dead code

Every few blocks you’ll find a block that loads a global constant, does some complex-looking operations on it, then just discards it and branches to somewhere else. Those are just there to confuse you and are easily detectible once you see them for a couple of times, so it’s not much of a hindrance.

Dynamic library calls

To make the code as bland as possible, and to prevent you from making educated guesses when you see a call to SecItemCopyMatching for example, most library calls are dynamic. So for example instead of a simple bl SecItemCopyMatching, they would do:

adrp x23 <address of SecItemCopyMatching>

Then, in another block they would:

blr x23

The disassembler doesn’t know the value of x23 here because, as stated above, it treats the block as if it doesn’t belong to the current function.

Loop unrolling

When you have a loop that comes with a pre-determined/fixed counter, you can get rid of the counter, and hardcode the loop iterations. This comes at a cost of the binary size, and it’s slightly faster than using a counter. Snap uses this technique in an encryption function. This block moves a huge array of bytes to another, notice how the offsets increment, replacing the counter:

ldr        w8,[sp, #0x278]
str        w8,[sp, #0x226c]
ldr        w8,[sp, #0x27c]
str        w8,[sp, #0x2268]
ldr        w8,[sp, #0x280]
str        w8,[sp, #0x2264]
ldr        w8,[sp, #0x284]
str        w8,[sp, #0x2260]
ldr        w8,[sp, #0x288]
str        w8,[sp, #0x225c]
ldr        w8,[sp, #0x28c]
str        w8,[sp, #0x2258]
ldr        w8,[sp, #0x290]
str        w8,[sp, #0x2254]
; and so on

Joint functions

Suppose you have a function that fills some structure with the right data and another that converts bytes to ASCII:

void set_struct_fields(some_struct *p);
void bin2ascii(char *in, char *out, size_t nbytes);

With a little effort you could intercept calls to both and understand what they do just by watching their behavior. Snapchat has quite a clever way of thwarting this. Instead of the two above, there would be:

void joint_function(uint64_t function_id, void *retval, void *argv[]) {
    switch (function_id) {
        case SET_STRUCT_FIELDS_FI:
            // Get argument from argv
            set_struct_fields(p);
            break;
        case BINS2ASCII_FI:
            bin2sacii(in, out, nbytes);
            break;
        // etc
    }
}

argv would include all the arguments needed. Now strip all symbols, add the above obfuscations and you’ve got an unintelligble mammoth of a function. You would think that you could still trace all calls to the joint function and treat the function_id as an identifier to the function you’re interested in. But breakpoints won’t act as you’d expect them to. See next.

The solution: not breakpoints (AKA anti-debugging measures)

Now most control flow obfuscation is against static analysis, using a debugger to get past the above would do it. Not so fast. Most functions call an anti-debugging function, which I named appropriately and whose signature is:

uint64_t fuckup_debugging(/* some args */, void *func);

There’s at least 9 such functions, all the same behavior. I haven’t taken the time to reverse them but their behavior is clear. Software breakpoints work by patching the instruction at the designated address in memory. The patch is an instruction that triggers an interrupt that’s handled by the parent process, the debugger3. That makes them easily detectable; if you have a checksum of what a certain area in memory looks like, a breakpoint in that area will invalidate the checksum. Or, you could look for the interrupt instruction’s (brk) bytes in the binary.

After doing its check, fuckup_debugging will return a uint64_t, its value which depends on whether there was a breakpoint detected. So actually there’s only two possible values. Isn’t that called a bool? Yes. But a boolean would be trivial to patch. But with an int you can’t guess the “right” value. The fuckup_debugging caller uses the return value (I’ll call it the path_key) to load an address from a jump table, if there was a breakpoint, the fetched address would lead to an infinite loop, leading the app to just keep loading with no feedback, which is the right way to do it.

Data flow obufscations

Data obfuscation is one of the harder things to work with in this binary. Here we have lots of MBA (mixed-boolean arithmetic) and scratch arguments passed to functions just to distract you.

Mixed-boolean arithmetic

One of the less resarched areas in obfuscation techniques is MBA (shoutout to the awesome Quarkslab for their research on this and many other things). Those are typically used in cryptography but can be utilized for obufscation. Basically they’re expressions that mix logical operations with pure arithmetic. For example, x = (a ^ b) + (a & b).

The interesting thing about those here is identities, for example x + y can be re-written as (x ^ y) + 2 * (x & y)5. Now imagine how huge the simple x + y expression could get if you substituted each term recursively with its MBA equivalent, crazy stuff.

An example in assembly. All what that block does is timestamp * 1000:

add        x0,sp,#0x1b8             ; struct timeval *tval
mov        x1,#0x0                  ; struct timezonze *tzone
adrp       x8,0x109499000
ldr        x8,[x8, #0x1d0]
blr        x8                       ; gettimeofday(tval, tone)
ldr        x8,[sp, #0x1b8]          ; tval->tv_sec
mov        w9,#0x3e8
mul        x8,x8,x9
ldrsw      x9,[sp, #0x1c0]
lsr        x9,x9,#0x3
mov        x10,#0xf7cf
movk       x10,#0xe353, LSL #16
movk       x10,#0x9ba5, LSL #32
movk       x10,#0x20c4, LSL #48
umulh      x9,x9,x10
mov        x10,#0xe6b3
movk       x10,#0x7dba, LSL #16
movk       x10,#0xecfa, LSL #32
movk       x10,#0xd0e1, LSL #48
add        x9,x10,x9, LSR #0x4
orr        x11,x9,x8
lsl        x11,x11,#0x1
eor        x8,x9,x8
sub        x8,x11,x8
eor        x9,x8,x10
mov        x10,#0xe6b3
movk       x10,#0x7dba, LSL #16
movk       x10,#0xecfa, LSL #32
movk       x10,#0x50e1, LSL #48
bic        x8,x10,x8
sub        x8,x9,x8, LSL #0x1     ; effectively tv_sec *= 1000

Scratch arguments

This one isn’t very prevalent in the binary but it’s still interesting to mention. I’ve seen it used in a function that reads the first 8 bytes at a pointer. It has the signature:

uint64_t get_first_qword(uint64_t scratch1, void *src, uint64_t scratch2);

scratch1 and 2 are overwritten without being used at all, again, there to slow you down a bit.

Clever shit/time buyers

In-house memmove?

To make your life even more miserable, Snap ocassionally deprives you of recognizing some basic standard lib functions, namely memmove, by implementing their own, or maybe just copying the source. You won’t be very happy after spending a day or two reversing a function to find it’s memmove in the end.

Loading by overflowing

Another honorary mention. This one has a base address and an index and it loads bytes from an array using a loop. Instead of simply adding the base address to the counter to get the byte, they do a calculation that yields two big 64-bit integers that will overflow but whose sum will be equivalent to the simple calculation. So instead of:

add        x10, sp, #0x338             ;base
ldr        x9, [sp, #0x270]            ;counter
ldrb       w9, [x10, x9]              

They do:

add        x10, sp, #0x338              ;base
ldr        x9,[sp, #0x270]              ;counter
mov        x11,#0x5bdd
movk       x11,#0x7d38, LSL #16
movk       x11,#0x1e74, LSL #32
movk       x11,#0x6d7c, LSL #48
add        x9,x9,x11
mov        x12,#0x3f94
movk       x12,#0x7886, LSL #16
movk       x12,#0xf6b2, LSL #32
movk       x12,#0xb119, LSL #48
add        x9,x9,x12
sub        x9,x9,x11
add        x9,x9,#0x10
mov        x11,#0xd943
movk       x11,#0xb8b5, LSL #16
movk       x11,#0x5fd9, LSL #32
movk       x11,#0x6bd2, LSL #48       ; x11 = 0x6bd25fd9b8b5d943
sub        x9,x9,x11
sub        x9,x9,x12
add        x9,x10,x9                  ; x9 = 0x942da027b272bb75
ldrb       w9,[x9, x11]               ; overflowing sum but right stack offset

__mod_init_func

In Mach-O binaries, functions whose pointers are in the __mod_init_func section run before main. Using otool to see how many of those are in Snap, we find a staggering 816 functions:

$ otool -s __DATA __mod_init_func Snapchat
Snapchat:
Contents of (__DATA,__mod_init_func) section
0000000106819610	0042de58 00000001 0042de58 00000001
0000000106819620	0042de58 00000001 0042de58 00000001
0000000106819630	0042de58 00000001 0042de58 00000001
0000000106819640	0042de58 00000001 0042de58 00000001
0000000106819650	0042de58 00000001 0042de58 00000001
0000000106819660	0042de58 00000001 0042de58 00000001
0000000106819670	0042de58 00000001 0042de58 00000001
0000000106819680	0042de58 00000001 0042de58 00000001
0000000106819690	0042de58 00000001 0042de58 00000001
00000001068196a0	0042de58 00000001 0042de58 00000001
...and a lot more

Hmm, seems too much to count manually. Let’s wc it:

$ otool -s __DATA __mod_init_func Snapchat | wc -l
     410

And since it’s two function pointers per line, their actual number is 816 (after discarding the first two lines). But wait, all of those point to the same function? They’re probably using duplicates as a distraction and to make your job harder, let’s see how many are there. After doing some regex to get the functions pointers, I found there’s 769 unique functions, still a huge number.

$ cat mod_init_func | sort -u | wc -l
     769

Some of those are dummy functions that do nothing useful. For example the very first one loads a constant, stores it on the stack, then discards it and returns:

sub        sp,sp,#0x10
adrp       x8,0x10641a000
add        x8,x8,#0x340
str        x8,[sp, #0x8]
add        sp,sp,#0x10
ret

Among those 769 functions some will definitely be doing some real initializations, and some could be there as another stealthy jailbreak/debugger detection. Filtering out the dummies should be easy, but we’re still talking about 700+ function, so to find the ones you’re interested in you’ll have to have some idea about how Snap is doing it, so you can get there without having to sift throught all those functions.

What’s next

I’ll probably do a part II on how to bypass all of this.