When files are being uploaded, you cannot rely on the MIME type the Web browser sends. This data is entirely under the control of the user and will not necessarily be accurate. The mismatch may be malicious (e.g. a user uploading a .php file and pretending it’s an image) or accidental (e.g. a user thinking that simply renaming .tiff to .jpeg makes it a JPEG). Either way, your code should check uploaded files to get a better idea of their type.
Many files have a signature, sometimes called a magic number. This is the first few bytes of the file, and it should fairly accurately indicate what type of file it is. The number of bytes depends on the format. For example, the hexadecimal representation of the signature of Java bytecode files is CAFEBABE.
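To make the idea concrete, here is a minimal Python sketch that reads the leading bytes of a file and shows them as hexadecimal. The helper name `read_signature` and the sample file are assumptions made for illustration only; the sample is just the four signature bytes followed by filler, not a valid class file.

```python
import tempfile
from pathlib import Path


def read_signature(path, length=4):
    """Return the first `length` bytes of the file as an uppercase hex string."""
    with open(path, "rb") as f:
        return f.read(length).hex().upper()


# Demonstration with a throwaway file that starts with the Java bytecode signature.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "Example.class"
    sample.write_bytes(bytes.fromhex("CAFEBABE") + b"\x00\x00\x00\x00")
    print(read_signature(sample))  # prints CAFEBABE
```

Note that the file is opened in binary mode; reading it as text could fail or alter the bytes before they are compared.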
Say we had some image files uploaded and wanted to be sure they were JPEG, GIF or PNG. We could use the following function to do this:
In essence, I manually specify a list of allowed 4-byte signatures as hexadecimal, read the first 4 bytes from the file, hex-encode them and compare the result to the list. This function obviously needs some error checking and could be an awful lot smarter and more adaptable, but it’s hopefully a start for you. You’ll notice that I define two different types of JPEG: JFIF and EXIF. The latter is a newer format and is used by iPhones (as well as other devices). There are also proprietary JPEG formats, for example Samsung’s, which have different signatures.
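The approach described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the original function: the names `ALLOWED_SIGNATURES` and `detect_image_type` are hypothetical, and the list covers only the JFIF and EXIF JPEG variants, GIF and PNG.

```python
# Allowed 4-byte signatures, written as uppercase hex, mapped to a label.
# (Hypothetical names; extend the table if you need other formats.)
ALLOWED_SIGNATURES = {
    "FFD8FFE0": "JPEG (JFIF)",
    "FFD8FFE1": "JPEG (EXIF)",
    "47494638": "GIF",   # ASCII "GIF8"
    "89504E47": "PNG",   # 0x89 followed by ASCII "PNG"
}


def detect_image_type(path):
    """Return the label for an allowed signature, or None if there is no match."""
    try:
        with open(path, "rb") as f:
            header = f.read(4)
    except OSError:
        # Unreadable or missing file: treat it as not allowed.
        return None
    if len(header) < 4:
        # Too short to carry any of the signatures above.
        return None
    return ALLOWED_SIGNATURES.get(header.hex().upper())
```

A caller would reject the upload whenever `detect_image_type` returns `None`. A TIFF renamed to .jpeg, for instance, begins with `49492A00` or `4D4D002A` and so fails the check. Because only four bytes are compared, a match shows that the file starts like an image, not that the rest of it is valid.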
Sadly, this technique cannot be used to detect the type of files which don’t have a signature – e.g. plain text files such as PHP scripts.