Exploring the Complete Shell Namespace

The filesystem API is easy to use, but it offers no help when it comes to exploring the quirky namespace extensions and other virtual folders on a computer. Low-level shell COM programming is the only refuge. Its object-oriented design means that a single set of well-designed methods should be adequate for exploring both regular filesystem folders and virtual folders.

Topics: Exploring via COM | Virtual folders | Semi-virtual oddities

Using COM to explore the namespace

There are no two ways about it: using low-level COM to enumerate folders is quirky. Note that I didn't say "difficult"; it just takes some time to get used to. With COM you have to manage objects yourself: create the right one, ask for the specific interface that offers the functionality you are after, use it, and finally take care to release it. In contrast, in regular API-based programming the operating system does the management for you; you just call functions directly, oblivious to the objects and things behind the scenes.

To enumerate a folder via COM, you need an IShellFolder pointer to the folder object you're after. We've seen a sketch of how to obtain this interface by parsing a path name in the previous section. The next step is to ask the object to enumerate itself using its EnumObjects method, which creates a new object which exposes an IEnumIDList interface. This enumerator object has the contents we are after, as a collection of local PIDLs, one for each item in the folder.

The best way to illustrate all this is with an example. Let's try to create the COM equivalent of the EnumerateFolderFS() sample presented earlier, which read the contents of a filesystem folder given a full path to it. The following EnumerateFolder() sample produces the same results using a completely different approach.
#include <windows.h>
#include <shlobj.h>
#include <iostream>
using namespace std; // for cout and endl

void EnumerateFolder(LPCTSTR path)
{
   HRESULT hr; // COM result, you'd better examine it in your code!
   hr = CoInitialize(NULL); // initialize COM
   // NOTE: usually COM would be initialized just once in your main()

   LPMALLOC pMalloc = NULL; // memory manager, for freeing up PIDLs
   hr = SHGetMalloc(&pMalloc);

   LPSHELLFOLDER psfDesktop = NULL; // namespace root for parsing the path
   hr = SHGetDesktopFolder(&psfDesktop);

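   // IShellFolder::ParseDisplayName requires the path name in Unicode.
   // NOTE: the conversion below assumes an ANSI (non-UNICODE) build,
   // where LPCTSTR is a plain char string.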
   OLECHAR olePath[MAX_PATH]; // wide-char version of path name
   MultiByteToWideChar(CP_ACP, MB_PRECOMPOSED, path, -1, olePath, MAX_PATH);

   // parse path for absolute PIDL, and connect to target folder
   LPITEMIDLIST pidl = NULL; // general purpose
   hr = psfDesktop->ParseDisplayName(NULL, NULL, olePath, NULL, &pidl, NULL);
   LPSHELLFOLDER psfFolder = NULL;
   hr = psfDesktop->BindToObject(pidl, NULL, IID_IShellFolder, 
                                 (void**)&psfFolder);
   psfDesktop->Release(); // no longer required
   pMalloc->Free(pidl);

   LPENUMIDLIST penumIDL = NULL; // IEnumIDList interface for reading contents
   hr = psfFolder->EnumObjects(NULL, SHCONTF_FOLDERS | SHCONTF_NONFOLDERS, 
                               &penumIDL);
   while(1) {
      // retrieve a copy of next local item ID list
      hr = penumIDL->Next(1, &pidl, NULL);
      if(hr == NOERROR) {
         WIN32_FIND_DATA ffd; // let's cheat a bit :)
         hr = SHGetDataFromIDList(psfFolder, pidl, SHGDFIL_FINDDATA, &ffd, 
                                  sizeof(WIN32_FIND_DATA));

         cout << "Name = " << ffd.cFileName << endl;
         cout << "Type = " << ( (ffd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
                                ? "dir\n" : "file\n" );
         cout << "Size = " << ffd.nFileSizeLow << endl;
         
         pMalloc->Free(pidl);
      }
      // the expected "error" is S_FALSE, when the list is finished
      else break;
   }

   // release all remaining interface pointers
   penumIDL->Release();
   psfFolder->Release();
   pMalloc->Release();

   CoUninitialize(); // shut down COM
}

Oh dear, that sure was hard work. Over 50 lines of code to do exactly the same thing EnumerateFolderFS() managed in just 23 lines, less than half. But hey, that's C++ baby, either take it or leave it and join the VB club <g>. Since this is the first real COM example I have presented, I'll try to be gentle and explain it thoroughly. Many typical issues in COM programming appear in this code:

You can now appreciate why people say that COM is a technology with a steep learning curve. One has to be familiar with dozens of details before even the simplest "Hello COM" program can be built. And you'd better hurry up that climb while you're learning, because mikro$oft have already prepared the COM+ mountain for you to climb next, slippery slopes an'all. Will this torment ever end? <g>

Of course, these issues are all for starters. The main dish is learning about all the COM objects supplied by the framework, finding out what they can do for you, and how to work with them, what interfaces they support, etc. The online documentation is guaranteed to be the most complete and up-to-date source of reference information. As they say, if you don't like reading, probably software development is not your ideal vocation. What you read in that good book published in 1998 may already be obsolete, plus there will be dozens of new objects introduced since then.

ADVANCED: Fending for your address space
You may have noticed in the sample code the abundance of NULL arguments to interface methods, most of the time implying a default action. Although this is easy to do, there is a risk if the NULL stands in for a pointer meant to receive some result from the method. Take for example the pdwAttributes parameter of ParseDisplayName. The docs state clearly that by passing NULL you specify that you are not interested in receiving any attributes at this time. In an ideal world you would be safe, but in this world there are many cowboys out there developing shell and namespace extensions, who wouldn't think twice before attempting to write to a NULL pointer without doing the proper checks first. Since that offending amateur object runs in the same address space as your app, it will drag you down in its demise. The workaround is to provide dummy variables for all such potential trouble-makers; for our example this would mean defining a "DWORD dummyAttrs = 0;" and passing its address to ParseDisplayName, even if you have no intention of using it in the end.

Ok, but what about that namespace exploring?

Sorry folks, I got a bit carried away there. So, let's get back to the subject, folder contents enumeration. The EnumerateFolder() sample will produce almost the same results as the filesystem version EnumerateFolderFS(). Even the order of the items is the same, which hints that EnumObjects deep down must be using FindFirstFile et al for doing the actual folder reading. But "almost the same" implies there are some differences:

The most important difference, however, is the file date/time information. SHGetDataFromIDList doesn't fill in the file details in WIN32_FIND_DATA as thoroughly as an equivalent FindFirstFile would. Only the modification date is filled in, and even that is rounded to the nearest even second. That's the reason why I mentioned that you need both COM and the traditional API to obtain complete information for filesystem folders.
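One plausible explanation for the even seconds, offered here as an assumption rather than documented fact, is that the timestamp travels in MS-DOS date/time format, whose time word has only five bits for seconds and therefore stores them halved. The sketch below uses made-up helper names to show that layout. It truncates odd seconds, which is a simplification and may not be the rounding direction the shell actually uses.

```cpp
#include <cstdint>

// Hypothetical helpers illustrating the MS-DOS time word layout:
// bits 0-4 = seconds / 2, bits 5-10 = minutes, bits 11-15 = hours.
inline std::uint16_t PackDosTime(unsigned hour, unsigned minute, unsigned second)
{
   // second / 2 drops the lowest bit, so odd seconds cannot survive
   return static_cast<std::uint16_t>((hour << 11) | (minute << 5) | (second / 2));
}

inline unsigned DosTimeHours(std::uint16_t t)   { return (t >> 11) & 0x1Fu; }
inline unsigned DosTimeMinutes(std::uint16_t t) { return (t >> 5) & 0x3Fu; }
inline unsigned DosTimeSeconds(std::uint16_t t) { return (t & 0x1Fu) * 2; }
```

Packing 13:45:37 and unpacking it again yields 13:45:36: whatever the rounding rule, no odd second can come back out of such a word.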

ADVANCED: Stale PIDLs
SHGetDataFromIDList gets all the file data directly from the PIDL, without accessing the disc at all. Microsoft's implementation of filesystem PIDLs stores quite a large amount of data in each SHITEMID (cf. my earlier suggestion), except, of course, for the unfortunate creation date and the other dates that go missing. This is an advantage since the cached information can be accessed quickly, but it has to be interpreted carefully. A PIDL you obtained yesterday won't necessarily contain accurate information if, for example, the file was modified in the meantime. Still, even a stale PIDL is good enough to uniquely identify a file. To convince yourselves, take two PIDLs to the same file, obtained at different times, and see what CompareIDs returns.

Exploring virtual folders

Shell COM may be weak in terms of dates, but it really shines when virtual folders are concerned. In fact it is the only way to read into all these namespace extensions that litter your computer. And the beauty of it all is that the same procedure demonstrated in the EnumerateFolder() sample is applicable to both filesystem and virtual folders. That's object-oriented programming at its best. We'll modify the sample so as to read virtual folders, too.

You can identify special folders with pseudo-paths of the form "::{CLSID}", so there's no urgent need to change the signature of EnumerateFolder(LPCTSTR path). What needs to change is the way data are extracted from each local PIDL. Non-filesystem items do not necessarily have any of the usual file attributes like size, dates, etc. It all depends on how each specific folder decides to present its content. A virtual folder that allowed you to browse the registry from explorer would have keys for "folders" and values for "files". Although it is possible to assign "sizes" to these "files", the folder object that implements the extension may not necessarily do so.

On the other hand, the majority of virtual folders have content that is hierarchically organised — there wouldn't be any point building a namespace extension otherwise. The concept of a subfolder still exists, and these can have more subfolders in them as well as plain items. Hence we can always expect to read the item names and a "folder" attribute at minimum. Here's a snippet that demonstrates the idea:
   // this segment is meant to replace the reading loop in EnumerateFolder()
   // (StrRetToBuf additionally requires #include <shlwapi.h>)

   STRRET strDispName;
   TCHAR szDisplayName[MAX_PATH];

   while(1) {
      // retrieve a copy of next local item ID list
      hr = penumIDL->Next(1, &pidl, NULL);
      if(hr == NOERROR) {
         hr = psfFolder->GetDisplayNameOf(pidl, SHGDN_INFOLDER, &strDispName);
         hr = StrRetToBuf(&strDispName, pidl, szDisplayName, 
                          MAX_PATH); // buffer size in characters, not bytes
         cout << "Name = " << szDisplayName << endl;

         DWORD dwAttributes = SFGAO_FOLDER;
         hr = psfFolder->GetAttributesOf(1, (LPCITEMIDLIST*)&pidl, &dwAttributes);
         cout << "Type = " << ( (dwAttributes & SFGAO_FOLDER )
                                ? "folder\n" : "item\n" );

         pMalloc->Free(pidl);
      }
      // the expected "error" is S_FALSE, when the list is finished
      else break;
   }
   
   // remaining cleanup code as in EnumerateFolder()

You can see that PIDLs are still read via Next; what changes is how one extracts information off them. SHGetDataFromIDList is only valid for filesystem folders, so it's out of the window with it. IShellFolder contains many methods for obtaining standard information about items, that work in both filesystem and virtual folders. Two of them are used above.

GetDisplayNameOf extracts various names for items. We asked for a simple display name using the SHGDN_INFOLDER flag. Another useful flag is SHGDN_FORPARSING which, on its own, would return a fully qualified pathname to the item, for both filesystem and virtual folders. An interesting combination is SHGDN_FORPARSING | SHGDN_INFOLDER, which would return the "real" local name for filesystem items. That could be useful in case users have opted to hide extensions for registered filetypes: in that case the local display name returned by SHGDN_INFOLDER would be extensionless, whereas adding the SHGDN_FORPARSING flag bit would return the real name, including the extension. The documentation for SHGNO discusses many similarly interesting flag combinations for display names.

GetDisplayNameOf returns item names in a STRRET struct, which is awkward to work with, and leak prone if you are not careful — I plead guilty yer honour <g>. Thankfully since shell v4.71 StrRetToBuf does all the conversions worry-free, giving you a plain text string to work with. For better efficiency you can initialise STRRET's uType member to STRRET_OFFSET before calling GetDisplayNameOf. This will instruct the object to read the name directly from the PIDL, if possible. Still objects are not obliged to honour your request and can change the uType to whatever they see fit. This will always be the case if you ask for full parsing names, which aren't kept in the PIDL anyway.

GetAttributesOf tells us whether an item is a folder or a plain item, among other things. Don't confuse shell attributes with MS-DOS file attributes like system, archive etc. There is some partial equivalence for attributes like "folder-ness", but shell properties go a long way further. For efficiency you should initialise the rgfInOut parameter with the attribute bits that you are interested in, otherwise GetAttributesOf will determine all known attributes, which wastes time. In our example we're only interested in the SFGAO_FOLDER bit so we set it upfront; if the item turns out to be plain, GetAttributesOf will simply clear the bit.

There are many more attributes shell items expose, many of which we will be discovering in later sections of this document. You can get icons, context menus and loads of other things. If you are curious you can examine the documentation of IShellFolder for the complete story. Windows 2000 has brought a mini-revolution in the namespace, extending and formalising the handling of columns in detailed views. This extra information is available through the new IShellFolder2 interface exposed by folder objects, which we will be examining at a later point.

ADVANCED: Special semi-virtual folders

The namespace contains some filesystem folders which are irregular, like the system "Fonts" folder. When this is browsed with windows explorer, instead of the real filenames you get to see the font names contained in each file. Another example is "My Briefcase". How does explorer pull those tricks?

The answer lies in a file named desktop.ini. The existence of such a file advertises some special features supported by the folder. The extension ".ini" suggests an old-style settings file, which can be examined using GetPrivateProfileString. The most interesting section is [.ShellClassInfo]. Any user can edit this file to provide custom icons, infotips etc. More interesting for our subject is the existence of a "CLSID={guid}" key in this section. This is one of the ways namespace extensions advertise their existence and association to a certain filesystem folder.

Sure enough, special folders like Fonts and Briefcases have a "desktop.ini" file, but things don't go by the numbers. A casual search for desktop.ini in my win98 system has revealed that CLSID is the least common key appearing under [.ShellClassInfo]. More often you'd find keys called UICLSID or CLSID2, which never appear in the documentation. Some further tinkering about revealed that none of these COM objects expose IShellFolder, which would be expected by any decent namespace extension.

For example, the Fonts folder advertises an object with UICLSID={BD84B380-8CA2-1069-AB1D-08000948F534}. Converting this string with CLSIDFromString and then instantiating the object with CoCreateInstance will succeed, but the only interface I've managed to query for is IShellView, which is used by explorer only. The strange thing is that queries for IUnknown fail too, which kind of throws a spanner in the works in terms of conformance to the COM model. Briefcase has a regular CLSID key but the associated object doesn't support any interfaces I am aware of, not even IPersistFolder.

All in all, without IShellFolder such partial extensions are of no interest to namespace explorers like 2xExplorer. The best that can be done is to use the filesystem provided IShellFolder which will read the "raw" contents.

Additional information

TN059 - Using MFC MBCS/Unicode Conversion Macros
Q210341 - INFO: Unicode Support in Windows 95 and Windows 98
Q198871 - PRB: IShellFolder::GetDisplayNameOf Returns Names with GUIDs
Q235630 - PRB: SHGetFileInfo Caches Drive Information
Q179364 - HOWTO: Determine if File Extension Should Be Shown for a File
SHLoadNonloadedIconOverlayIdentifiers - Nothing important there; I just thought I'd share with you this shell API with the longest, daftest name, which would never be useful to anybody <g>



