PHP file verification – md5_file vs sha1_file vs crc32 and no native crc32_file

The following was originally posted on php.net/crc32/ under the ‘User Contributed Notes’ but was recently removed. Since the topic came up again in conversation with a colleague, I am making this research available on my blog.

If you are trying to decide on a function for file verification, my conclusion is that md5_file() is the best all-around solution.
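As a rough illustration of what verification with md5_file() can look like, here is a minimal sketch. The helper name file_matches_md5() and its signature are my own assumptions for this example and are not part of PHP; md5_file() and hash_equals() are standard PHP functions (hash_equals() requires PHP 5.6 or later).

```php
<?php
// Hypothetical helper for illustration only; not part of PHP.
// Returns true when the MD5 checksum of $path equals $expectedMd5.
function file_matches_md5(string $path, string $expectedMd5): bool
{
    // Avoid the warning md5_file() emits for unreadable files.
    if (!is_file($path) || !is_readable($path)) {
        return false;
    }

    $actual = md5_file($path); // 32-character lowercase hex string, or false
    if ($actual === false) {
        return false;
    }

    // Normalize case so an uppercase published checksum still matches.
    return hash_equals(strtolower($expectedMd5), $actual);
}
```

A caller would pass the path of a downloaded file and the checksum published alongside it, and treat a false result as a failed verification.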

The file_crc() function that Bulk at bulksplace dot com posted on php.net/crc32/ is the most efficient solution on Windows for small and medium-sized files, most likely because file_get_contents() uses memory mapping techniques. On Linux (Fedora), however, md5_file() produced slightly better results.

sha1_file() is slower than md5_file() on large files. The time taken by the __crc32_file() function found on php.net/crc32/ grows linearly with the size of the file, and I would avoid using it. The file_crc() function will fail at the file_get_contents() call if the file is larger than the memory_limit setting in php.ini. Windows does not seem to apply memory_limit to file_get_contents(), but I did run into the error ‘FATAL: emalloc(): Unable to allocate x bytes’ when testing ISO files.
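One possible way around the memory_limit problem is to let PHP's hash extension read the file itself instead of loading it into a string first. The sketch below is not one of the functions I benchmarked; the name crc32_file_streamed() is hypothetical, and it assumes the hash extension is available (it has been bundled since PHP 5.1.2) and a 64-bit PHP build. The 'crc32b' algorithm is the one that corresponds to PHP's crc32().

```php
<?php
// Hypothetical helper for illustration only; not part of PHP and not
// one of the functions timed in this post.
// Returns the CRC32 of a file as an unsigned decimal string, in the same
// format as file_crc(), without reading the whole file into a PHP string.
function crc32_file_streamed(string $path): string
{
    // hash_file() reads the file internally, so the file contents are
    // never held in a PHP variable.
    $hex = hash_file('crc32b', $path);
    if ($hex === false) {
        throw new RuntimeException("Unable to read file: $path");
    }

    // Convert the 8-digit hex checksum to an unsigned decimal string.
    // Assumes a 64-bit build, where hexdec() returns an int here.
    return sprintf('%u', hexdec($hex));
}
```

I have not timed this against md5_file(), so treat it only as a way to avoid the allocation failure, not as a performance claim.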

I ran the following tests on both Windows XP and Fedora 4 machines.

<?php
// File verification tests by Angelo Mandato (angelo [at] mandato {period} com)

// __crc32_file() is very slow; you can uncomment to test for yourself.
//require_once('crc32_file.php');
// Copy and paste the contents of the crc32_file() code found on
// the php.net crc32 PHP manual page in a new file and save
// as crc32_file.php in the same directory as this script.

// Get the current time in seconds, with microsecond precision
function GetMicrotime()
{
    list($usec, $sec) = explode(" ", microtime());
    return ((float)$usec + (float)$sec);
}

// file_crc() - function to test
function file_crc($file)
{
    $file_string = file_get_contents($file);
    $crc = crc32($file_string);
    return sprintf("%u", $crc);
}

$Methods = array('sha1_file()', 'md5_file()', 'file_crc()');
if( function_exists('__crc32_file') )
    $Methods[] = '__crc32_file()';

$directory = '/path/to/directory/'; // Don't forget the trailing slash.
$files = scandir($directory);

for( $method_index = 0; $method_index < count($Methods); $method_index++ )
{
    $start_time = GetMicrotime();

    // foreach replaces the original while/each() loop; each() was removed in PHP 8.
    foreach( $files as $index => $file )
    {
        if( $file != '.' && $file != '..' && is_file($directory.$file) )
        {
            switch( $method_index )
            {
                case 0: // sha1_file()
                    $value = sha1_file($directory.$file);
                    break;
                case 1: // md5_file()
                    $value = md5_file($directory.$file);
                    break;
                case 2: // file_crc()
                    $value = file_crc($directory.$file);
                    break;
                case 3: // __crc32_file()
                    $value = __crc32_file($directory.$file);
                    break;
            }
        }
        else // It is not part of our test results, so remove it from the array
        {
            unset($files[$index]);
        }
    }

    $end_time = GetMicrotime();
    echo sprintf("%s took %.03f seconds to calculate %d files.\n",
        $Methods[$method_index], $end_time - $start_time, count($files));
}

echo "File verification tests completed.\n";
?>

In conclusion, md5_file() was the fastest all-around file verification function in PHP. I suspect that if a well-written crc32_file() function were incorporated into PHP, it would be the best way to verify files.
