
PHP file verification – md5_file vs sha1_file vs crc32 and no native crc32_file

The following was originally posted under the ‘User Contributed Notes’ but was recently removed. Since this topic came up again with a colleague, I am making this research available on my blog.

If you are trying to decide on a function for file verification, I came to the conclusion that md5_file() is the best all-around solution.
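As a rough illustration of what file verification with md5_file() can look like, here is a minimal sketch. The helper name file_matches_md5() is hypothetical and not part of the original tests; it assumes you already have the expected checksum as a 32-character hex string.

```php
<?php
// file_matches_md5() is a hypothetical helper name used only for illustration.
// It returns true when the file's MD5 checksum equals the expected hex string.
function file_matches_md5($file, $expected_md5)
{
    if (!is_file($file)) {
        return false; // nothing to verify
    }

    $actual = md5_file($file); // lowercase hex string, or false on failure
    if ($actual === false) {
        return false;
    }

    // Compare case-insensitively, since published checksums are sometimes uppercase.
    return strtolower($expected_md5) === $actual;
}
```

A call such as file_matches_md5('/path/to/file.iso', $published_checksum) would then return true only when the file on disk matches the published value.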

The file_crc() function that Bulk at bulksplace dot com posted is the most efficient solution on Windows for small and medium-sized files, most likely because file_get_contents() uses memory mapping techniques. On Linux (Fedora), however, md5_file() gave slightly better results.

sha1_file() on large files is slower than md5_file(). The time taken by the __crc32_file() function found on the crc32 PHP manual page grows linearly with the size of the file, and it is very slow, so I would avoid using it. The file_crc() function will fail in file_get_contents() if the file is larger than the memory_limit setting in php.ini. Windows does not seem to apply memory_limit to file_get_contents(), but I did run into the error ‘FATAL: emalloc(): Unable to allocate x bytes’ when testing ISO files.
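One way around the memory_limit problem is to read the file in fixed-size chunks instead of loading it all at once. The sketch below was not part of the original tests and was not benchmarked. It assumes the hash extension is available (it is bundled with later PHP versions), and crc32_file_chunked() is a hypothetical name. The 'crc32b' algorithm gives the same checksum as crc32(), but as an 8-character hex string rather than the unsigned decimal string that file_crc() returns.

```php
<?php
// crc32_file_chunked() is a hypothetical helper, not a PHP built-in.
// It computes the CRC32 of a file without holding the whole file in memory.
function crc32_file_chunked($file, $chunk_size = 65536)
{
    $handle = fopen($file, 'rb');
    if ($handle === false) {
        return false;
    }

    // 'crc32b' matches the checksum produced by crc32().
    $context = hash_init('crc32b');

    while (!feof($handle)) {
        $chunk = fread($handle, $chunk_size);
        if ($chunk === false) {
            fclose($handle);
            return false;
        }
        hash_update($context, $chunk); // feed one chunk at a time
    }
    fclose($handle);

    // 8-character lowercase hex string, e.g. comparable to sprintf('%08x', crc32($data))
    return hash_final($context);
}
```

Only one chunk is in memory at a time, so the file size should no longer matter for memory_limit. Whether this is faster than md5_file() would have to be measured with the same tests.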

I ran the following tests on both Windows XP and Fedora 4 machines.

<?php
// File verification tests by Angelo Mandato (angelo [at] mandato {period} com)

// __crc32_file() is very slow, you can uncomment to test for yourself.
//require_once('crc32_file.php');
// Copy and paste the contents of the crc32_file() code found on
// the crc32 PHP manual page in a new file and save
// as crc32_file.php in the same directory as this script.

// Get the current time in seconds, with microsecond precision
function GetMicrotime()
{
    return microtime(true);
}

// file_crc() - function to test
function file_crc($file)
{
    $file_string = file_get_contents($file);
    $crc = crc32($file_string);
    return sprintf("%u", $crc);
}

$Methods = array('sha1_file()', 'md5_file()', 'file_crc()');
if (function_exists('__crc32_file')) {
    $Methods[] = '__crc32_file()';
}

$directory = '/path/to/directory/'; // Don't forget the trailing slash.
$files = scandir($directory);

for ($method_index = 0; $method_index < count($Methods); $method_index++) {
    $start_time = GetMicrotime();

    // foreach replaces the original each() loop, since each() was removed in PHP 8
    foreach ($files as $index => $file) {
        if ($file != '.' && $file != '..' && is_file($directory.$file)) {
            switch ($method_index) {
                case 0: // sha1_file()
                    $value = sha1_file($directory.$file);
                    break;
                case 1: // md5_file()
                    $value = md5_file($directory.$file);
                    break;
                case 2: // file_crc()
                    $value = file_crc($directory.$file);
                    break;
                case 3: // __crc32_file()
                    $value = __crc32_file($directory.$file);
                    break;
            }
        } else {
            // It is not part of our test results, so remove it from the array
            unset($files[$index]);
        }
    }

    $end_time = GetMicrotime();
    echo sprintf(
        "%s took %.03f seconds to calculate %d files.\n",
        $Methods[$method_index],
        $end_time - $start_time,
        count($files)
    );
}

echo "file verification tests completed.\n";
?>

In conclusion, the md5_file() function was the fastest all-around file verification function in PHP. I suspect that if a well-written crc32_file() function were incorporated into PHP, it would be the best way to verify files.