[xplorer²] - Inside file type storage
11 March 2007
 

I was planning a basic series of articles on text files and searching in them, when I realized that first I would have to clarify a more elementary notion, that of a "file". Surely everyone knows what a file is, you say, but do you know what makes one file play music and another show a picture? How does the computer know what to do with each file type? I understand that the people who appreciate xplorer² are computer savvy, but I suspect even in this league there must be people who would find this discussion illuminating. OK, if you know how to generate an NMI with a pen then you probably don't need to read any further, but other lesser mortals please read on!

Computers seem smart but deep down it's all about numbers. Pushing numbers around and manipulating them, that's all a computer can do. All files, be they documents, videos, pictures, or source code, are numbers stored on disk. Executable programs themselves are just numbers. Then we have a series of mappings that translate the numbers into more flexible features like letters and so on. In plain text files the number 65 corresponds to the letter "A", so when notepad reads a text file and finds the number 65 it prints out an "A". In monochrome bitmaps, each binary digit (bit) of a number is mapped to one screen pixel, say 1 for black and 0 for white. A file type is a contract that specifies how to map numbers to the particular features of the document.
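To make the two mappings concrete, here is a minimal Python sketch. The function name render_row and the choice of '#' for black and '.' for white are made up for this illustration, and the text part assumes plain ASCII.

```python
# Text: each stored number maps to one character (plain ASCII assumed).
data = bytes([72, 65, 76])        # three numbers, as they would sit on disk
text = data.decode("ascii")       # the text "contract" turns them into letters

# Monochrome bitmap: each bit of a number maps to one pixel.
# Convention assumed for this sketch: 1 = black ('#'), 0 = white ('.').
def render_row(value, width=8):
    bits = format(value, "0{}b".format(width))   # e.g. 60 -> '00111100'
    return bits.replace("1", "#").replace("0", ".")

print(text)                       # HAL
print(ord("A"))                   # 65
print(render_row(0b00111100))     # ..####..
```

The same three numbers would mean something else entirely under a different contract, which is the whole point of a file type.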

Being just numbers, it could get kind of hard telling the various file types apart. Windows has a very simple mechanism for this: each file type gets its own name extension. When explorer sees a file called music.txt it will recognize the extension .txt as marking a text file and it will open it with notepad. The base name reflects the content, the extension reveals its format, so to speak. Note that naming a text file music.txt as we did will not fool the system into believing it is a sonata. The base name is irrelevant in this respect; you just pick it so that it reminds you what you've stored in the file.
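Splitting a name into its two parts is easy to try for yourself. This small sketch uses Python's standard os.path.splitext; the helper name split_name is made up here, and lowercasing the extension is just a convenience for comparing names.

```python
import os.path

def split_name(filename):
    """Split a file name into (base name, lowercased extension)."""
    base, ext = os.path.splitext(filename)
    return base, ext.lower()

print(split_name("music.txt"))    # ('music', '.txt')
print(split_name("sonata.GIF"))   # ('sonata', '.gif')
```

Only the second part of each pair tells the system which program to launch.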

Parenthetically, if you can't see the extensions of file names, either use the xplorer² extension column (press <Alt+K> to select columns in detailed view) or turn extensions on from control panel (go to Folder Options > View tab and clear the checkbox "Hide extensions for known file types"). This simple action will raise your geek stock value with all your friends and relatives, easy!

Historically extensions are 3 letters long, reflecting the limitations of old file naming conventions (8 letters base, 3 letters extension), although nowadays 4 or more letters can also be used (e.g. xplorer² uses .cida for its scrap container "documents"). The registry key HKEY_CLASSES_ROOT has a list of all the extensions and the registered programs that handle each file type (extension). Notepad is normally associated with .txt files, Irfanview with .gif (image) files and so on. Some other day we'll see how you can use the framework to associate more than one program with each file type (using Folder Options > File Types tab).
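If you are curious what the registry says for a given extension, the following Python sketch reads it with the standard winreg module. The function name registered_command is made up for this example. It assumes the classic layout, where the default value of HKEY_CLASSES_ROOT\.txt names a file type (something like "txtfile") and that type's shell\open\command key holds the command line; newer Windows versions may layer per-user choices on top, which this sketch ignores. On anything other than Windows it simply returns None.

```python
import sys

def registered_command(extension):
    """Return the 'open' command registered for an extension, or None."""
    if sys.platform != "win32":
        return None               # the registry only exists on Windows
    import winreg
    try:
        # Default value of HKEY_CLASSES_ROOT\<extension> names the file type
        file_type = winreg.QueryValue(winreg.HKEY_CLASSES_ROOT, extension)
        # Assumed layout: <file type>\shell\open\command holds the command
        return winreg.QueryValue(winreg.HKEY_CLASSES_ROOT,
                                 file_type + r"\shell\open\command")
    except OSError:
        return None               # nothing registered under this layout

print(registered_command(".txt"))
```

What gets printed depends entirely on what is installed on your machine, so don't expect the same answer as your neighbour.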

Now that you're all pumped up after enabling file extensions by default, you may want to get cheeky and start changing them. So you take your music.txt and you rename it to music.gif. You will find this to be a futile experiment (be my guest to try though, renaming extensions is a perfectly legal and valid file operation). You will certainly fool explorer into launching irfanview or whatever viewer you have associated with GIF image files, but the load will fail: the internal representation of GIF and TXT files is totally different. That's why explorer warns you "If you change a file name extension the file may become unusable".
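The renaming experiment can be reproduced safely in a throwaway folder. This Python sketch (the helper rename_extension and the sample text are made up for the illustration) renames music.txt to music.gif and then reads the file back, showing that only the name changed and the stored numbers did not.

```python
import os
import tempfile

def rename_extension(path, new_ext):
    """Rename the file so its name ends with new_ext; return the new path."""
    base, _ = os.path.splitext(path)
    new_path = base + new_ext
    os.rename(path, new_path)
    return new_path

with tempfile.TemporaryDirectory() as folder:
    original = os.path.join(folder, "music.txt")
    with open(original, "wb") as f:
        f.write(b"just some plain text")
    renamed = rename_extension(original, ".gif")
    with open(renamed, "rb") as f:
        content = f.read()        # same bytes as before the rename

print(os.path.basename(renamed))  # music.gif
print(content)                    # b'just some plain text'
```

The file now claims to be a picture, but inside it is still the same plain text, which is why the viewer gives up.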

Almost all documents save some extra numbers at the beginning of the file as a means of type identification. So a picture data file, on top of the numerical values that correspond to the size and colors, will have a few bytes at the start signifying the format (type). You can do some detective work to see it yourself. The draft tab of the xplorer² quickviewer pane can be switched to show the raw content of files, be it image or whatever (right click and pick Text only from the context menu; see snapshot). Selecting a genuine GIF file will then show not the picture but its raw numbers, what is really stored on disk. You see from the snapshot that GIFs begin with a sequence of numbers (47, 49, 46, ... in hexadecimal) that "read" GIF89a. This identification lets a picture format aware application know what the file content is all about. Windows only cares about the extension, but Irfanview cares about the right content too!
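Here is a rough Python sketch of that detective work. The names looks_like_gif and hex_dump are invented for this example, and the header is a hand-made stand-in rather than a real picture. The numbers 47, 49, 46 are hexadecimal codes for the letters G, I, F; besides GIF89a there is also an older GIF87a signature, so the check accepts both.

```python
GIF_SIGNATURES = (b"GIF87a", b"GIF89a")

def looks_like_gif(data):
    """True if the first six bytes are one of the GIF signatures."""
    return data[:6] in GIF_SIGNATURES

def hex_dump(data, count=6):
    """Show the first few bytes as two-digit hexadecimal numbers."""
    return " ".join("{:02X}".format(b) for b in data[:count])

header = b"GIF89a" + bytes(10)    # stand-in for the start of a GIF file
print(hex_dump(header))                           # 47 49 46 38 39 61
print(looks_like_gif(header))                     # True
print(looks_like_gif(b"just some plain text"))    # False
```

To test a file on disk you would open it in binary mode ("rb"), read its first six bytes and pass them to looks_like_gif. A real viewer checks a good deal more than the signature, of course.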

So what if you take a text file, type "GIF89a" at the beginning and save it with a .gif extension? That is an exercise left to you upstart hackers :)

© 2002—2007 Nikos Bozinis, all rights reserved