
CVE-2015-2783: Exploiting a Buffer Over-read in PHP's Phar

posted Apr 15, 2015, 9:24 PM by Emmanuel Law   [ updated Jun 1, 2015, 8:39 PM ]
Background

The Phar extension has been built into PHP since 5.3.0. It allows developers to manipulate the following archive formats: tar, zip and phar.

I found this vulnerability while assessing the security of the Phar extension. When parsing a phar file, it is possible to trigger a buffer over-read and leak memory contents. The vulnerability is interesting because exploiting it takes more than simply controlling the read size.

Affected versions are PHP < 5.6.8RC1.

Technical Details
A phar file's metadata is stored in PHP's serialized format. When processing a phar file, PHP attempts to unserialize the metadata in phar.c:

623  if (!php_var_unserialize(metadata, &p, p + buf_len, &var_hash TSRMLS_CC)) {


  • p points to the start of serialized metadata in the file buffer. 
  • p + buf_len points to the end
  • buf_len is a field specified in the phar file and user controllable

php_var_unserialize() is the same function that is called when unserialize() is invoked in PHP "user-land".

Within php_var_unserialize() there are sanity checks to ensure that p does not go beyond p + buf_len:

461  PHPAPI int php_var_unserialize(UNSERIALIZE_PARAMETER) //zval **rval, const unsigned char **p, const unsigned char *max, php_unserialize_data_t *var_hash TSRMLS_DC
     {
     ...
469  	if (YYCURSOR >= YYLIMIT) {
470  		return 0;
471  	}
     .....
892  yy48:
893  	++YYCURSOR;
894  	if ((YYLIMIT - YYCURSOR) < 2) YYFILL(2);
895  	yych = *YYCURSOR;
896  	if (yych <= '/') goto yy18;
897  	if (yych <= '9') goto yy48;
898  	if (yych >= ';') goto yy18;
899  	yych = *++YYCURSOR;
900  	if (yych != '"') goto yy18;
901  	++YYCURSOR;
902  	{
903  	size_t len, maxlen;
904  	char *str;
905
906  	len = parse_uiv(start + 2);
907  	maxlen = max - YYCURSOR;
908  	if (maxlen < len) {
909  		*p = start + 2;
910  		return 0;
911  	}

  • Line 469 checks that the cursor (YYCURSOR, i.e. p) has not already reached the limit p + buf_len.
  • Line 894 looks like a sanity check as well, but it is just template code emitted by re2c when generating var_unserializer.c. A re2c-generated lexer normally uses YYFILL to refill its input buffer when it runs low. In PHP's case, however, php_var_unserialize() always has the full serialized data in the buffer from the very beginning, so YYFILL is just an empty while(0) loop that does nothing. Interestingly, the compiler optimizes this line out entirely, so it never executes.
  • There's another sanity check at line 908 to ensure that the length of the data doesn't go beyond p + buf_len.

When we start unserializing a string in the format s:<len>:"<Data>", lines 890-900 are essentially a loop that extracts <len>. As long as the next byte is a digit, the loop keeps going, even if YYCURSOR moves beyond p + buf_len.
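
To make the runaway digit loop concrete, here is a minimal sketch in plain C. It is not PHP's code: digits_past_limit is a hypothetical helper that mimics the yy48 loop by consuming digits without ever comparing the cursor against the limit, and it assumes the underlying buffer eventually contains a non-digit byte.

```c
#include <stddef.h>

/* Hypothetical helper, not part of PHP. Consumes decimal digits starting at
 * `cursor` without checking `limit`, the way the generated loop behaves when
 * YYFILL is a no-op. Returns how many bytes past `limit` the cursor ends up,
 * or 0 if it stopped at or before the limit. Both pointers must point into
 * the same buffer, and that buffer must contain a non-digit byte somewhere
 * after `cursor`. */
size_t digits_past_limit(const unsigned char *cursor, const unsigned char *limit)
{
    while (*cursor >= '0' && *cursor <= '9') {
        ++cursor; /* no bounds check here */
    }
    return cursor > limit ? (size_t)(cursor - limit) : 0;
}
```

Calling it on a buffer holding s:0000000012:"AAAA" with the limit placed in the middle of the digits shows the cursor stopping at the colon, several bytes past the declared end.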

When YYCURSOR goes beyond max (a.k.a. p + buf_len, or YYLIMIT), the subtraction on line 907 underflows: maxlen is an unsigned size_t, so the negative difference wraps around to a huge value. Thus the sanity check on line 908 will always pass.
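
The wrap-around is easy to reproduce in isolation. The sketch below repeats the arithmetic of lines 907 and 908 with two hypothetical helpers (remaining_bytes and length_check_passes are names made up for this illustration) and assumes both pointers refer to the same buffer.

```c
#include <stddef.h>

/* Same arithmetic as `maxlen = max - YYCURSOR`: the signed pointer
 * difference is converted to an unsigned size_t, so a negative difference
 * wraps around to a value close to SIZE_MAX. */
size_t remaining_bytes(const unsigned char *max, const unsigned char *cursor)
{
    return (size_t)(max - cursor);
}

/* Mirrors the `if (maxlen < len)` test: returns 1 when the length is
 * accepted, 0 when it is rejected. */
int length_check_passes(const unsigned char *max, const unsigned char *cursor,
                        size_t len)
{
    size_t maxlen = remaining_bytes(max, cursor);
    return !(maxlen < len);
}
```

With the cursor 12 bytes before max, a requested length of 4096 is rejected. With the cursor 12 bytes past max, the same length is accepted, because maxlen has wrapped to SIZE_MAX - 11.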

There's also some format checks:

900  if (yych != '"') goto yy18;
     ....
915  YYCURSOR += len;
916
917  if (*(YYCURSOR) != '"') {
918  	*p = YYCURSOR;
919  	return 0;
920  }

Lines 900 and 915 to 920 check the format tokens of s:<len>:"<Data>": line 900 checks the opening double quote, and lines 915 to 920 check the closing one.



In essence, once YYCURSOR goes beyond max while extracting <len>, <len> can be as large as you want, and the data will still unserialize to a string successfully as long as it is in the format s:<len>:"<Data>".
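
Putting the pieces together, the following toy model walks through the same sequence of checks. It is a sketch written for this post, not PHP's generated lexer: toy_unserialize_string is a hypothetical function, it handles only the s:<len>:"<Data>" shape discussed here, and it assumes the caller's buffer really extends past max, as a phar file buffer does.

```c
#include <stddef.h>

/* Toy model of the string branch (hypothetical, not PHP code).
 * `p` is the start of the serialized data and `max` the declared end.
 * On success, stores the start of the string body in *out and returns its
 * length; returns -1 on any failed check. */
long toy_unserialize_string(const unsigned char *p, const unsigned char *max,
                            const unsigned char **out)
{
    const unsigned char *cur = p;
    size_t len = 0, maxlen;

    if (cur >= max) return -1;               /* models the check on line 469 */
    if (cur[0] != 's' || cur[1] != ':') return -1;
    cur += 2;
    while (*cur >= '0' && *cur <= '9') {     /* digit loop, no limit check */
        len = len * 10 + (size_t)(*cur - '0');
        ++cur;
    }
    if (*cur != ':') return -1;
    ++cur;
    if (*cur != '"') return -1;              /* models line 900 */
    ++cur;
    maxlen = (size_t)(max - cur);            /* models line 907, may wrap */
    if (maxlen < len) return -1;             /* models line 908 */
    if (cur[len] != '"') return -1;          /* models lines 915 to 920 */
    *out = cur;
    return (long)len;
}
```

If the declared end falls inside the digits, the wrapped maxlen lets a 16-byte string through even though only a handful of bytes were supposed to be readable. If the declared end lies after the opening quote, the same input is rejected by the line 908 check.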

This is what you want the state to look like during exploitation:





At this point one might ask: how do we ensure that the ending byte is " when it lies in a memory region beyond our control? For any single attempt you have roughly a 1 in 256 chance of striking gold. One possible way to exploit this is simply to keep "hammering" with various values of xxxxxxxxxxxx. Sooner or later you will encounter a ".
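
The hammering idea can be modelled offline. In the sketch below, each loop iteration stands in for one attempt with a different <len>. A real attacker cannot inspect the memory directly and only learns whether the unserialize succeeded. The function name, the window parameter and the memory layout are all assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical model of "hammering". `data` is where the string body would
 * start and `window` is how many bytes after it we pretend are readable.
 * Returns the smallest candidate length in [min_len, window) whose
 * terminating byte is a double quote, or (size_t)-1 if no candidate works. */
size_t first_len_hitting_quote(const unsigned char *data, size_t window,
                               size_t min_len)
{
    for (size_t len = min_len; len < window; ++len) {
        if (data[len] == '"') {
            return len; /* this attempt would pass the closing-quote check */
        }
    }
    return (size_t)-1;
}
```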

One might also ask: why can't this buffer over-read be triggered through a typical unserialize() call? For example, why can't we trigger it via unserialize("s:0010") plus some heap massaging? Why is it only exploitable when we unserialize through phar?
  • The reason is that a typical unserialize() call takes a PHP string, and a PHP string is always null-terminated. So essentially you are passing in s:0010\0. Even if we massage the heap such that :" appears immediately after the string, it still won't unserialize, because the null byte stops the digit loop and fails the format check.
  • In phar, the attacker controls the data immediately past p + buf_len, as it is still part of the phar file.
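
The difference between the two cases comes down to a single byte. This is a minimal sketch using a hypothetical helper, length_field_is_well_formed, which scans the digits and then looks for the :" that must follow them.

```c
#include <stddef.h>

/* Hypothetical helper, not PHP code. Skips the decimal digits at `digits`
 * and returns 1 if they are immediately followed by a colon and a double
 * quote, 0 otherwise. A NUL byte is not a digit, so it ends the scan and
 * fails the colon check. */
int length_field_is_well_formed(const unsigned char *digits)
{
    while (*digits >= '0' && *digits <= '9') {
        ++digits;
    }
    return digits[0] == ':' && digits[1] == '"';
}
```

For a user-land string laid out in memory as s:0010\0 followed by :", the scan stops at the NUL byte and the check fails. For phar data laid out as s:0010:" with no terminator in between, the check succeeds.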


Here's a screenshot of successful exploitation to leak memory a la Heartbleed style:



After Thoughts

I exploited this by using a serialized string type. It would be interesting to see if we can get code execution using other types.

The full bug report can be found here
