PHP – Accurately detecting the type of a file

When files are being uploaded, you cannot rely on the MIME type the Web browser sends. This data is entirely under the control of the user and will not necessarily be accurate. The inaccuracy may be malicious (e.g. a user uploading a .php file and pretending it’s an image) or accidental (e.g. a user thinking that simply renaming a .tiff file to .jpeg turns it into a JPEG). Either way, your code should inspect uploaded files to get a better idea of their type.

Many files have a signature, sometimes called a magic number. This is the first few bytes of the file, and it should identify the file's type fairly accurately. The length of the signature depends on the format: PNG, for example, uses 8 bytes, while Java class (bytecode) files begin with the 4 bytes CAFEBABE in hexadecimal.
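To make that concrete, here is a minimal sketch of checking a string of bytes against a signature written in hex, where the signature's own length decides how many bytes are compared. `starts_with_signature()` is a hypothetical helper name, and the byte strings are built by hand rather than read from real files.

```php
<?php

// Hypothetical helper for illustration: does $bytes start with the
// signature given as a hex string? The number of bytes compared comes
// from the length of the signature itself.
function starts_with_signature($bytes, $hexSignature) {
    $magic = hex2bin($hexSignature); // e.g. 'CAFEBABE' -> "\xCA\xFE\xBA\xBE"
    return substr($bytes, 0, strlen($magic)) === $magic;
}

// The first bytes of a Java class file: magic number, then version bytes
$classHeader = "\xCA\xFE\xBA\xBE\x00\x00\x00\x34";

var_dump(starts_with_signature($classHeader, 'CAFEBABE'));         // bool(true)
var_dump(starts_with_signature($classHeader, '89504E470D0A1A0A')); // bool(false)
echo strtoupper(bin2hex(substr($classHeader, 0, 4))), "\n";         // CAFEBABE
```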

Say we had some image files uploaded and wanted to be sure they were JPEG, GIF or PNG. We could use the following function to do this:

<?php

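function check_type($filename) {
    // PNG, GIF, JFIF JPEG, EXIF JPEG (respectively)
    $allowed = array('89504E47', '47494638', 'FFD8FFE0', 'FFD8FFE1');

    // Open in binary mode ('rb') and give up if the file can't be opened
    $handle = fopen($filename, 'rb');
    if ($handle === false) {
        return false;
    }

    // Read the first 4 bytes and turn them into an upper-case hex string
    $bytes = strtoupper(bin2hex(fread($handle, 4)));
    fclose($handle);

    // Strict comparison avoids PHP's loose numeric-string matching
    return in_array($bytes, $allowed, true);
}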

In essence, I manually specify a list of allowed 4-byte signatures in hexadecimal, read the first 4 bytes of the file, convert them to hexadecimal and compare the result against the list. This function obviously needs more thorough error checking and could be an awful lot smarter and more adaptable, but it’s hopefully a start for you. You’ll notice that I define two different types of JPEG: JFIF and EXIF. The latter is a newer format and is used by iPhones (as well as other devices). There are also proprietary JPEG formats, for example Samsung’s, and these have different signatures.
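As one way of making the function more adaptable, the sketch below stores signatures of different lengths as raw bytes (comparing bytes directly instead of hex strings) and returns which type matched rather than just true or false. `detect_image_type()` is a hypothetical name, and the code assumes PHP 7.1+ for the `[$type, $magic]` destructuring in `foreach`. It uses PNG's full 8-byte signature, both GIF versions (GIF87a and GIF89a) and the 3-byte prefix FF D8 FF shared by the JFIF and EXIF JPEG variants above; it is a sketch, not an exhaustive list.

```php
<?php

// Hypothetical helper: return 'png', 'gif' or 'jpeg' for a recognised
// image, or null if the file can't be read or no signature matches.
function detect_image_type($filename) {
    // Signatures of different lengths, stored as raw bytes, not hex
    $signatures = array(
        array('png',  "\x89PNG\r\n\x1A\n"), // full 8-byte PNG signature
        array('gif',  'GIF87a'),             // GIF, 1987 version
        array('gif',  'GIF89a'),             // GIF, 1989 version
        array('jpeg', "\xFF\xD8\xFF"),       // prefix shared by JFIF and EXIF JPEGs
    );

    // '@' silences the warning for unreadable files; false is checked below
    $handle = @fopen($filename, 'rb');
    if ($handle === false) {
        return null;
    }
    // 8 bytes is enough for the longest signature in the table
    $header = fread($handle, 8);
    fclose($handle);
    if ($header === false) {
        return null;
    }

    foreach ($signatures as [$type, $magic]) {
        // Compare only as many bytes as this signature is long
        if (substr($header, 0, strlen($magic)) === $magic) {
            return $type;
        }
    }
    return null;
}
```

Returning the type instead of a boolean lets the caller both reject unknown files and know which format it is dealing with.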

Sadly, this technique cannot be used to detect the type of files which don’t have a signature – e.g. plain text files such as PHP scripts.
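For files without a signature, a different technique is content sniffing with PHP's Fileinfo extension, which wraps the libmagic library and falls back to heuristics, for instance reporting readable ASCII text as text/plain. This is a minimal sketch, assuming the extension is enabled (it is by default in standard builds since PHP 5.3); `sniff_mime_type()` is a hypothetical helper name. Its answer is still a best guess based on the file's contents, not a guarantee.

```php
<?php

// Hypothetical helper: ask libmagic (through PHP's Fileinfo extension)
// for a MIME type based on the file's contents, not its name.
// Returns a string such as "text/plain", or false on failure.
function sniff_mime_type($filename) {
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    return $finfo->file($filename);
}

// Usage: a temporary file containing plain ASCII text
$path = tempnam(sys_get_temp_dir(), 'txt');
file_put_contents($path, "Just some plain text.\n");
echo sniff_mime_type($path), "\n"; // text/plain
unlink($path);
```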

2 comments

  1. Good tip, Phil. But I usually don’t do file validation that complex. What I do is get the file name, use the explode and end functions to get the part after the last “.”, and use in_array to validate it.
    Example:

    <?php
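    function ifImage(){
        $name = "file.jpg";

        // end() takes its argument by reference, so store explode()'s result first
        $parts = explode('.', $name);
        // Lower-case the extension so "photo.JPG" is accepted too
        $ext = strtolower(end($parts));

        $allowed_ext = array('jpg', 'jpeg', 'png', 'gif');

        return in_array($ext, $allowed_ext);
    }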
    ?>

  2. Andrew,

    That is indeed an easier and less intensive way to detect the type. However, it does allow files such as .exe files to be uploaded as .jpeg. Malware uses this trick to update itself: it downloads update.jpeg, renames it to update.exe and runs it, so your site ends up hosting the malicious file. Obviously my method doesn’t make this impossible, just a little harder.

    I’ve also had luck with this technique for identifying the real type of a file. If a user has renamed a .png to .jpeg and your code passes it to imagecreatefromjpeg(), it will fail. I use the technique to call the correct imagecreatefrom*() function and to name the file correctly when it’s saved.

    Phil
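The approach described in the reply above, choosing the right GD loader and file extension from the detected signature, might be sketched as follows. `image_handler_for()` is a hypothetical helper that reuses the four 4-byte signatures from `check_type()` in the post. It only returns the loader's name, so it runs without GD; actually calling the loader requires the GD extension.

```php
<?php

// Hypothetical helper: map a file's 4-byte signature to the GD loader
// that can open it and the extension to save it with. Returns null for
// unrecognised or unreadable files. Only the function *name* is returned,
// so GD is not needed to run this part.
function image_handler_for($filename) {
    // Same four signatures as check_type() in the post
    $handlers = array(
        '89504E47' => array('imagecreatefrompng',  'png'),
        '47494638' => array('imagecreatefromgif',  'gif'),
        'FFD8FFE0' => array('imagecreatefromjpeg', 'jpg'),
        'FFD8FFE1' => array('imagecreatefromjpeg', 'jpg'),
    );

    // '@' silences the warning for unreadable files; false is checked below
    $handle = @fopen($filename, 'rb');
    if ($handle === false) {
        return null;
    }
    $bytes = strtoupper(bin2hex(fread($handle, 4)));
    fclose($handle);

    return isset($handlers[$bytes]) ? $handlers[$bytes] : null;
}
```

With GD available and after checking the result is not null, `list($loader, $ext) = image_handler_for($path);` followed by `$image = $loader($path);` would open the file with the matching function, and `$ext` can be used when naming the saved copy.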
