Component Object Model (COM) basics

COM is tricky. It's a technology that intimidates everyone at first sight. Fortunately, you don't need to know much about COM to start exploring the shell. Trust me, when I first started programming 2xExplorer back in late 1998 I didn't have the foggiest idea about the whole thing (which should help explain why it was a whole year before drag/drop and the like were supported by the program).

Viewed simplistically, COM is a set of objects that do "stuff" for you. From such a "client" perspective it's not too hard to use COM. All you need to do is call CoInitialize at the beginning of the program to boot the COM subsystem and, if you would be so kind, remember to clean up the infrastructure by calling CoUninitialize just before you quit.

In between, you get access to various objects through so-called "interfaces", which are pointers that expose the functionality you are after. When you are done with an "object" you need to Release it, returning its resources to the system. Finally, if you indirectly obtain some memory allocated during your interaction with objects, it is your responsibility to Free it.

Surely, that's not too hard, is it? Being a mere client, you may remain oblivious to the complicated trickery that the COM subsystem has to come up with so that you can use the various objects. All that remains is to know what objects exist and what sort of things they can do for you. Hey, that's what Visual Basic programmers do all the time, need I say more? <g>

Topics: Object basics | The SuperUnknown | Shell Objects

Object basics

This is the object-oriented era. COM objects are abstractions for chunks of self-contained code that expose some functionality. For instance, there are objects that know how to read the contents of any folder, objects that can extract thumbnail images of image files, and so on. Each object has a unique identifier called a CLSID, which presumably stands for "class identifier". Technically speaking these are 128-bit numbers that look like {00BB2763-6A77-11D0-A535-00C04FD7D062}. Thankfully, for commonly used objects there are descriptive names that you can use instead, like CLSID_AutoComplete, which arguably is much friendlier to use. You may also hear terms like GUID and IID: GUID (globally unique identifier) is the general name for such a 128-bit number, while CLSIDs and IIDs are simply GUIDs that identify classes and interfaces respectively.
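To make the 128-bit idea concrete, here is a minimal, portable sketch of how such an identifier is laid out and printed. The names Guid, GuidToString, GuidEqual and kExampleClsid are made up for this illustration; on Windows you would use the GUID type and the constants from the SDK headers instead. The field split (32 + 16 + 16 + 8x8 bits) mirrors the usual GUID layout, and the value is the one quoted above.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Portable stand-in for the Windows GUID struct (hypothetical name).
struct Guid {
    std::uint32_t data1;
    std::uint16_t data2;
    std::uint16_t data3;
    std::uint8_t  data4[8];
};

// The identifier quoted in the text, split into its fields.
const Guid kExampleClsid = {
    0x00BB2763, 0x6A77, 0x11D0,
    {0xA5, 0x35, 0x00, 0xC0, 0x4F, 0xD7, 0xD0, 0x62}
};

// Format a Guid the way it appears in the registry.
std::string GuidToString(const Guid& g) {
    char buf[39];  // 38 visible characters plus the terminating zero
    std::snprintf(buf, sizeof buf,
                  "{%08lX-%04X-%04X-%02X%02X-%02X%02X%02X%02X%02X%02X}",
                  static_cast<unsigned long>(g.data1),
                  static_cast<unsigned>(g.data2),
                  static_cast<unsigned>(g.data3),
                  static_cast<unsigned>(g.data4[0]),
                  static_cast<unsigned>(g.data4[1]),
                  static_cast<unsigned>(g.data4[2]),
                  static_cast<unsigned>(g.data4[3]),
                  static_cast<unsigned>(g.data4[4]),
                  static_cast<unsigned>(g.data4[5]),
                  static_cast<unsigned>(g.data4[6]),
                  static_cast<unsigned>(g.data4[7]));
    return std::string(buf);
}

// Two identifiers are the same only if all 128 bits match.
bool GuidEqual(const Guid& a, const Guid& b) {
    return a.data1 == b.data1 && a.data2 == b.data2 && a.data3 == b.data3 &&
           std::memcmp(a.data4, b.data4, sizeof a.data4) == 0;
}

// Small self-check that doubles as a usage example.
void GuidDemo() {
    assert(GuidEqual(kExampleClsid, kExampleClsid));
    assert(GuidToString(kExampleClsid).size() == 38);
}
```

Whether the number is used as a CLSID or as an IID makes no difference to this layout; only the role it plays changes.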

Most COM objects are registered in your system. Anybody who has hacked about with regedit.exe, the registry editor, will undoubtedly have come across a key called HKEY_CLASSES_ROOT\CLSID, which contains about a million subkeys with funny names like the {00BB2763-6A77-11D0-A535-00C04FD7D062} one mentioned above. Well, now you know what all these items stand for: they are COM objects registered by the applications installed on your computer. More often than not, there is some DLL (dynamic link library) that implements a COM object. Each DLL may actually contain more than one object.

Once you know about an object, the next thing you need is to actually put it to use. Objects offer access to their functionality through so-called "interfaces". You may think of interfaces as groups of methods, or function calls, that perform a certain task. Each object exports at least one interface, and usually more than that. For example, shell folder objects export a number of interfaces; one is IShellFolder, which offers methods to enumerate contents, get file names and attributes etc; another is IContextMenu, which can show a context menu for items contained in a folder; and so on. Interfaces have unique identifiers, too, like COM objects, only here they are called IIDs (interface identifiers). However, deep down they are the same kind of 128-bit numbers as CLSIDs.

Let us consider an example. Let's assume that there is this COM object that has the identifier CLSID_MyObject, which exports an interface called IMyInterface, whose identifier is IID_IMyInterface. This interface contains a method called MethodIReallyNeed(), which for the sake of simplicity takes no arguments. Here's how you could access this method:

IMyInterface* pIFace = NULL;
if (SUCCEEDED(CoInitialize(NULL))) { // absolutely essential: initialize the COM subsystem
   // create the object and obtain a pointer to the sought interface
   HRESULT hr = CoCreateInstance(CLSID_MyObject, NULL, CLSCTX_ALL,
                                 IID_IMyInterface, (void**)&pIFace);
   if (SUCCEEDED(hr)) {
      pIFace->MethodIReallyNeed(); // use the object
      pIFace->Release(); // free the object
   }
   CoUninitialize(); // cleanup COM after you're done using its services
}

The above code fragment demonstrates the basic steps of using COM objects. First of all you have to initialize COM using CoInitialize. Then you instantiate the object you're after and request the target interface, all in one stroke, using the CoCreateInstance API. If successful, this will return a pointer to the requested interface, which allows you to use the object. After all is said and done, it's time for cleaning up: you need to free up the interface you requested (and hence the object itself) by calling the Release method, which is supported by all COM objects. Finally, the COM subsystem itself is shut down using CoUninitialize.

If you are using MFC, it's easier to initialise COM by calling AfxOleInit early in your application's InitInstance. Internally this starts COM through OleInitialize, which must be used instead of CoInitialize by any application that uses the system clipboard or implements drag/drop. In such cases COM is shut down with OleUninitialize, but MFC does this automatically so you don't have to worry.

The SuperUnknown

Interfaces are instruction manuals for COM objects. They contain methods which advertise what an object can do. You can't do anything else with objects but call the methods they expose through their interfaces. Let's take a look at the definition of the most elementary COM interface called IUnknown:

interface IUnknown {
public:
   virtual HRESULT QueryInterface(REFIID riid, void** ppvObject) = 0;
   virtual ULONG AddRef(void) = 0;
   virtual ULONG Release(void) = 0;
};

People familiar with C++ shouldn't have any problems understanding this definition. The keyword "interface" is merely a macro that expands to struct, so there's nothing peculiar there. Note that all methods are public (else there would be no point in presenting them to the world) and pure virtual, making sure that objects that derive (in the usual C++ class inheritance scheme) from this interface will do the actual implementation for these methods. An interface is just the protocol; the object is free to use whatever means to achieve the result expected by some method.

The definition of IUnknown is characteristic in that it lacks member variables. Clearly, the real object that exposes IUnknown will have many member variables but it won't make them available to outsiders. All communication will have to be achieved just through interface function calls. For example, the QueryInterface method above "returns" a new interface pointer to the caller via the second argument ppvObject.

HRESULT is a common return data type from most interface functions, describing the success or failure status of the call. Although this is a basic 4-byte data type, it is broken down into subfields and needs special macros to be interpreted. Most of the time you'd be using SUCCEEDED or FAILED, which are self-explanatory.
TIP: Developer Studio comes with a handy little tool called Error Lookup, which is available from the Tools menu. Apart from regular Windows error codes it also understands HRESULTs, and will provide a textual explanation for many COM errors, helping you figure out what went wrong.
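For the curious, here is a rough portable sketch of what those macros boil down to. The names HResult, Succeeded, Failed, Facility and Code are stand-ins invented for this example; the real SUCCEEDED, FAILED, HRESULT_FACILITY and HRESULT_CODE macros live in the Windows headers. The constants carry the well-known values of S_OK, S_FALSE, E_NOINTERFACE and E_ACCESSDENIED.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for HRESULT: a signed 32-bit value (hypothetical alias).
using HResult = std::int32_t;

// The top bit is the severity flag, so "failure" simply means "negative".
constexpr bool Succeeded(HResult hr) { return hr >= 0; }
constexpr bool Failed(HResult hr)    { return hr < 0; }

// Facility: which subsystem produced the code (bits 16 and up, below the flags).
constexpr unsigned Facility(HResult hr) {
    return (static_cast<std::uint32_t>(hr) >> 16) & 0x1FFFu;
}

// Code: the low 16 bits, the error number within that facility.
constexpr unsigned Code(HResult hr) {
    return static_cast<std::uint32_t>(hr) & 0xFFFFu;
}

// Well-known values, reproduced here so the sketch stays portable.
constexpr HResult kSOk           = 0;                                   // S_OK
constexpr HResult kSFalse        = 1;                                   // S_FALSE
constexpr HResult kENoInterface  = static_cast<HResult>(0x80004002u);   // E_NOINTERFACE
constexpr HResult kEAccessDenied = static_cast<HResult>(0x80070005u);   // E_ACCESSDENIED

// Small self-check that doubles as a usage example.
void HResultDemo() {
    assert(Succeeded(kSOk));
    assert(Failed(kENoInterface));
}
```

Note that S_FALSE counts as a success, which is one reason to test with SUCCEEDED or FAILED rather than comparing against S_OK. E_ACCESSDENIED splits into facility 7 (Win32) and code 5, the ordinary "access denied" error number.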

All COM objects have to expose IUnknown. This is the trademark of COM, and the starting point for all operations. QueryInterface is the most useful method here, since it asks the object for the other interfaces it supports. We've already seen the use of Release, which frees a COM object after you're done using it. You will rarely need to call AddRef directly.

IUnknown is also the base class for all other COM interfaces. Hence, if you have a pointer to any interface, you can use its QueryInterface, inherited from IUnknown, to request another interface supported by the same object. Similarly, you can use the inherited Release method for freeing up any interface. Once all the interface pointers obtained are released, the object itself can stop functioning, usually by self-destruction.

ADVANCED: Object life-cycle
All COM objects are dynamically created by class factories. Each object manages its own lifetime through reference counting: it keeps a count of the outstanding interface pointers held by its clients. When this number reaches 0, the object performs a "delete this;" suicide act to free up its resources. When you first obtain an interface pointer, say via CoCreateInstance, the object's reference counter is 1. Subsequently each successful call to QueryInterface, extracting more interface pointers from the object, increases the counter by 1. When you Release an interface, the counter is reduced by one. As long as the counter stays above zero the object is not deleted, since it still serves clients. It is only when the last interface pointer is released that the object can unload itself. Sometimes you may want to manipulate the counter yourself, e.g. by manually calling AddRef to keep a firm hold on the object and ensure that it remains alive; just don't forget to release this extra reference, too.
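The counting rules above can be tried out without any COM at all. What follows is a toy model, not real COM: IUnknownLike, IGreeter, ICounter, Widget and the InterfaceId enum are all invented for this sketch, QueryInterface returns a plain bool instead of an HRESULT, and the counter is not thread safe (real objects typically use interlocked operations).

```cpp
#include <cassert>

// Toy interface ids; real COM identifies interfaces by 128-bit IIDs.
enum InterfaceId { kIdUnknown, kIdGreeter, kIdCounter, kIdMissing };

// Simplified stand-in for IUnknown.
struct IUnknownLike {
    virtual bool QueryInterface(InterfaceId id, void** out) = 0;
    virtual unsigned long AddRef() = 0;
    virtual unsigned long Release() = 0;
protected:
    virtual ~IUnknownLike() {}  // clients destroy objects through Release only
};

struct IGreeter : IUnknownLike { virtual const char* Greet() = 0; };
struct ICounter : IUnknownLike { virtual int Next() = 0; };

int g_liveObjects = 0;  // lets us observe creation and destruction

class Widget : public IGreeter, public ICounter {
public:
    // Plays the role of the class factory: the object starts with one reference.
    static IGreeter* Create() { return new Widget; }

    bool QueryInterface(InterfaceId id, void** out) override {
        if (id == kIdUnknown || id == kIdGreeter) {
            *out = static_cast<IGreeter*>(this);
        } else if (id == kIdCounter) {
            *out = static_cast<ICounter*>(this);
        } else {
            *out = nullptr;
            return false;       // interface not supported, count unchanged
        }
        AddRef();               // every pointer handed out counts as a reference
        return true;
    }
    unsigned long AddRef() override { return ++refs_; }
    unsigned long Release() override {
        const unsigned long left = --refs_;
        if (left == 0) delete this;  // last reference gone
        return left;
    }
    const char* Greet() override { return "hello"; }
    int Next() override { return ++next_; }

private:
    Widget() { ++g_liveObjects; }
    ~Widget() override { --g_liveObjects; }
    unsigned long refs_ = 1;
    int next_ = 0;
};

// Walks through the life-cycle; returns the number of objects still alive.
int LifeCycleDemo() {
    IGreeter* greeter = Widget::Create();                    // count is 1
    ICounter* counter = nullptr;
    bool ok = greeter->QueryInterface(kIdCounter,
                                      reinterpret_cast<void**>(&counter));
    assert(ok);                                              // count is 2
    greeter->Release();                                      // count is 1, still alive
    counter->Next();                                         // object still usable
    counter->Release();                                      // count is 0, object deleted
    return g_liveObjects;
}
```

The object survives the first Release because the pointer obtained through QueryInterface still holds a reference; only the second Release triggers the "delete this".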

Shell Objects

Microsoft developers have written a lot of code for the Windows shell. This is organised in a number of COM objects that can do almost everything you could imagine to manipulate the filesystem and its interface to the end users. On the downside, there are tons of objects to deal with, many exposing a great many interfaces. All in all there's a steep learning curve here; you have to comprehend many items before you attempt even the simplest operations. But I can assure you that once you reach such a point of maturity, satisfaction is guaranteed.

There are several ways to get hold of a shell COM object. Most frequently an interface pointer is returned by some Windows API call like SHGetDesktopFolder, which returns the IShellFolder interface of the desktop folder. Once you have access to such an object, you may obtain other objects and interfaces through its regular methods. For example, IShellFolder::GetUIObjectOf will give you access to a number of useful interfaces like IContextMenu, IDropTarget etc.

Less often, you'd instantiate objects directly through CoCreateInstance as in the sample code above, for example when you need the IShellLink interface, which deals with creating and resolving links (shortcuts) to other shell objects. The common rule in all shell operations is that once you have a pointer to some interface, regardless of the exact route by which it was obtained, you need to call its Release method when you're done using it, so that the COM object can be freed.
TIP: The GUIDs for all shell objects that you'd need to access directly are already defined in header files, so instead of the 128-bit number you just need to know the equivalent constant identifier. More often than not this is the interface name prefixed with IID_; for instance the GUID for IShellLink is the constant IID_IShellLink.
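Since forgetting a Release is the classic mistake, one option is to let a small C++ holder do it for you when it goes out of scope. The sketch below is portable and purely illustrative: ReleaseGuard and FakeInterface are hypothetical names, and the guard assumes only that the wrapped type has a Release method. It adopts an existing reference and deliberately forbids copying, which would require an AddRef.

```cpp
#include <cassert>

// Minimal RAII holder for anything that has a Release() method (hypothetical).
template <typename T>
class ReleaseGuard {
public:
    ReleaseGuard() = default;
    explicit ReleaseGuard(T* p) : p_(p) {}   // adopts a reference you already own
    ~ReleaseGuard() { Reset(); }

    ReleaseGuard(const ReleaseGuard&) = delete;             // copying would need AddRef
    ReleaseGuard& operator=(const ReleaseGuard&) = delete;

    T* operator->() const { return p_; }
    T* Get() const { return p_; }

    // Drops the current pointer (if any) and exposes the slot for an API to fill in.
    T** Receive() { Reset(); return &p_; }

    void Reset() {
        if (p_) {
            p_->Release();
            p_ = nullptr;
        }
    }

private:
    T* p_ = nullptr;
};

// Stand-in for a COM interface, used only to exercise the guard.
struct FakeInterface {
    int uses = 0;
    int releases = 0;
    void Use() { ++uses; }
    unsigned long Release() { ++releases; return 0; }
};

// Returns how many times Release was called on the fake object.
int GuardDemo() {
    FakeInterface fake;
    {
        ReleaseGuard<FakeInterface> guard(&fake);
        guard->Use();
    }  // guard goes out of scope here and calls Release exactly once
    assert(fake.uses == 1);
    return fake.releases;
}
```

On Windows the same holder could wrap the desktop folder, along the lines of ReleaseGuard<IShellFolder> desktop; SHGetDesktopFolder(desktop.Receive());. For production code a ready-made smart pointer such as ATL's CComPtr is the more usual choice.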

A frequently used shell object is the shell memory manager, IMalloc. The idea is that the shell should provide a way for objects to allocate and free memory in a language-independent fashion. Whenever some shell object allocates memory as a result of your calling one of its methods, it is your responsibility to free that memory. A common example is PIDLs, which are the shell equivalents of filesystem pathnames to files. Whenever you use a shell object's method that returns a PIDL, it is your responsibility to free up the allocated memory, using the shell's allocator object. You can obtain a pointer to this object by calling the SHGetMalloc API. The method you'll be using most frequently is IMalloc::Free, to release memory returned by some shell interface. Note that the shell memory manager is a COM object itself, so after you're done using it you need to Release it.
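The "callee allocates, caller frees through the shared allocator" rule can be modelled in portable C++. Everything in this sketch is a stand-in: AllocatorLike mimics only the Alloc/Free pair of IMalloc, MakeItemId is a hypothetical method that hands back a buffer the way a shell method hands back a PIDL, and AllocatorFree routes the buffer back to the allocator automatically.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>

// Stand-in for the shell allocator (hypothetical; real code would use IMalloc).
struct AllocatorLike {
    int outstanding = 0;  // blocks handed out and not yet freed

    void* Alloc(std::size_t bytes) {
        void* p = std::malloc(bytes);
        if (p) ++outstanding;
        return p;
    }
    void Free(void* p) {
        if (p) {
            std::free(p);
            --outstanding;
        }
    }
};

// Hypothetical callee: the returned buffer belongs to the CALLER,
// who must free it through the same allocator.
char* MakeItemId(AllocatorLike& alloc, const char* name) {
    const std::size_t n = std::strlen(name) + 1;
    char* id = static_cast<char*>(alloc.Alloc(n));
    if (id) std::memcpy(id, name, n);
    return id;
}

// Deleter that returns the buffer to the allocator that produced it.
struct AllocatorFree {
    AllocatorLike* alloc;
    void operator()(void* p) const { alloc->Free(p); }
};
using ItemIdPtr = std::unique_ptr<char, AllocatorFree>;

// Returns the number of blocks leaked (0 when ownership is handled correctly).
int AllocatorDemo() {
    AllocatorLike alloc;
    {
        ItemIdPtr id(MakeItemId(alloc, "My Documents"), AllocatorFree{&alloc});
        assert(alloc.outstanding == 1);  // we now own one block
    }  // id goes out of scope: the block is returned via alloc.Free
    return alloc.outstanding;
}
```

Freeing such a buffer with plain free or delete instead would bypass the allocator, which is exactly the mistake the rule is meant to prevent.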

ADVANCED
All shell objects are supplied by in-process DLL servers. This is more efficient since they execute in the same address space as the main application that utilises them. Each object runs in its individual COM apartment, hence there are usually no thread synchronisation problems; the COM subsystem ensures that only one client accesses these objects at a time.



