Data objects and the clipboard

The clipboard must be the most deceptive item in Windows programming. Newcomers receive a big shock when they realize that to place a couple of text sentences in the clipboard they need to go through a manual 100 pages long, staring at funny names like STGMEDIUMs, an unmistakable sign of evil design and intelligence. Surely there must be another way! Surely Mr. Gates would have thought of some function like PutThatTextInTheClipboardForUsNewbiesPleaze() to deliver us from these demons? <g>

The answer is of course negative. I cannot understand why M/S did not export some easy-to-use clipboard API. On the other hand, they sure did a good job of making it extremely flexible and powerful, which explains why its manual is tall enough to sit on. I think the best strategy is to attack the beast head-on and understand it, rather than live in constant fear. This is going to be a rough ride, but it's worth the effort.

This chapter should be considered a stepping stone for managing shell data objects, which will be covered in the following chapter. If you already know about data objects you may skip this section. Otherwise all the necessary groundwork will be explained here; dealing with shell clipboard formats will be a natural extension of the basic operations presented below.

Topics: Clipboard Basics | The OLE Clipboard | KinderGarden Clipboard

Clipboard Basics

Even my grandmother knows of the clipboard: after she establishes the sex of the computer she's working on (turning it upside down if necessary in search of genitalia <g>), she wants to use the clipboard. It is this extreme user-friendliness that misleads developers into assuming that it should be easy to deal with programmatically. Alas, the easier it gets for the end user, the harder it's going to be for the programmer. Is that a cliche or what?

As a warming-up exercise, we are going to look into the old-fashioned clipboard API, which is much easier to work with but, as we'll soon discover, doesn't offer much. As we all know, the clipboard is a shared, system-wide resource into which applications put data in standard formats that other applications can read, so let's jump straight into some code that places some text in the clipboard.
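BOOL SetClipboardText(HWND hwndOwner, LPCSTR pszText)
{
   BOOL ok = FALSE;
   // pass a real window: with a NULL owner, EmptyClipboard leaves the
   // clipboard without an owner and SetClipboardData is documented to fail
   if(OpenClipboard(hwndOwner)) {
      EmptyClipboard(); // discard previous contents, become the owner
      // the clipboard wants a movable "global" memory handle;
      // CF_TEXT holds ANSI (char) text, hence the explicit A functions
      HGLOBAL hMem = GlobalAlloc(GMEM_MOVEABLE, lstrlenA(pszText) + 1);
      if(hMem) {
         LPSTR ptxt = (LPSTR)GlobalLock(hMem);
         lstrcpyA(ptxt, pszText);
         GlobalUnlock(hMem);
         // on success the clipboard owns hMem; on failure it is still ours
         ok = SetClipboardData(CF_TEXT, hMem) != NULL;
         if(!ok)
            GlobalFree(hMem);
      }
      CloseClipboard(); // relinquish it for other windows
   }
   return ok;
}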

Before you can do anything with the clipboard you need to open it with OpenClipboard. This ensures that only one program is manipulating it at a time, safeguarding the contents; the window handle you pass becomes the clipboard owner once you call EmptyClipboard. The text string has to be placed in "global" memory so that other applications can access it. Of course, there is no such thing as global memory accessible by all processes in Win32: the name is a leftover from 16-bit Windows. SetClipboardData simply insists on a movable memory handle allocated with GlobalAlloc, and the system takes care of making the data available to other processes; GlobalLock and GlobalUnlock turn that handle into a usable pointer and back. Odd as it looks, that's the way to do it. At present I'm reading J. Richter's "Programming Applications for Microsoft Windows", which is a real eye-opener, and I hope it will shed some more light on this obscure topic.

Once prepared, the text is entered in the clipboard by calling SetClipboardData. Note that SetClipboardData by itself does not empty the clipboard; that's the job of EmptyClipboard, which should be called right after OpenClipboard. The constant CF_TEXT is a predefined clipboard format holding ANSI text. There are a number of standard clipboard formats similarly defined, like CF_BITMAP et al. On top of those are various just-as-standard formats referred to as registered. The only difference is that they are defined by standard string constants like CFSTR_SHELLIDLIST instead of numeric constants. In such a case you need to use RegisterClipboardFormat to convert the name to a number you may pass to SetClipboardData.
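To make the "name to number" idea concrete, here is a toy model of what registration achieves, written in portable C++ so it compiles without Windows headers. FormatRegistry and its Register method are hypothetical and are not the Win32 RegisterClipboardFormat; the sketch only assumes the documented behaviour that registered formats get IDs in the range 0xC000 through 0xFFFF and that registering the same name again yields the same ID. It ignores details such as how names are compared.

```cpp
#include <map>
#include <string>

// Hypothetical model of clipboard format registration, NOT the Win32 API.
// Registered format IDs are handed out from the range 0xC000..0xFFFF.
class FormatRegistry {
public:
    unsigned Register(const std::string& name) {
        auto it = ids_.find(name);
        if (it != ids_.end())
            return it->second;          // same name -> same ID every time
        if (next_ > 0xFFFF)
            return 0;                   // out of IDs: 0 signals failure
        ids_[name] = next_;
        return next_++;
    }

private:
    std::map<std::string, unsigned> ids_;
    unsigned next_ = 0xC000;            // first registered format ID
};
```

In the real system the table is shared by all processes, which is exactly why two applications that register the same name can exchange data in that format without agreeing on a number in advance.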

The CF_xxx format identifiers offer a standard way of interpreting the data in the clipboard. Applications that see the CF_TEXT format expect to find a global text string and not a bitmap. Apart from a few conversions the system synthesizes between closely related formats (such as CF_TEXT and CF_UNICODETEXT), the clipboard doesn't interpret the data for you; it merely trusts that your code knows how to deal with the format you place into it. To wrap it all up we call CloseClipboard so that other programs can read the text we placed in it. From now on the clipboard, not us, is responsible for freeing the data with GlobalFree.

Applications that use the OLE clipboard functions described in the next section must initialise COM using OleInitialize, else nothing will work. The old-fashioned functions above belong to USER32 and are not documented as requiring COM; still, the strange thing I discovered when writing the SetClipboardText sample is that if I omitted this initialization, things seemingly worked, all functions returning TRUE as if all was well, but in fact nothing was placed in the clipboard in the end. I suppose this would be a real head-scratcher for people unfamiliar with COM, who would thus be excused for using foul language to vent their anger at miniature$oft <g>.

In the interest of completeness, here's a sample that reads text from the clipboard in the old-fashioned way:
int GetClipboardText(LPSTR pszBuf, int nLength)
{
   int i = 0;
   if(nLength <= 0) return 0; // no room even for a terminator
   *pszBuf = 0;
   if(OpenClipboard(NULL)) {
      // CF_TEXT is ANSI text, hence the LPSTR buffer
      HGLOBAL hMem = GetClipboardData(CF_TEXT);
      LPCSTR ptxt = hMem ? (LPCSTR)GlobalLock(hMem) : NULL;
      if(ptxt) { // format is available, extract text
         for( ; ++i < nLength && *ptxt; ) // copy as much as will fit
            *pszBuf++ = *ptxt++;
         *pszBuf = 0; // ensure terminator

         GlobalUnlock(hMem); // we don't free it; owned by clipboard
      }

      CloseClipboard();
   }
   return i; // characters copied + 1; 0 indicates failure
}

The drill is similar, if anything a bit easier, since we don't have to do much global memory management. GetClipboardData requests the text contents, if any exist. We could have used IsClipboardFormatAvailable first to ensure that CF_TEXT is actually available, but checking the handle returned by GetClipboardData is just as robust and simpler. Once again we just borrow the memory handle from the clipboard, copy the contents to the supplied pszBuf and return the handle unscathed.
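The copy loop is easy to misread, so here it is lifted into a standalone function that compiles without Windows headers. CopyBounded is a hypothetical name; the body is the same loop as in GetClipboardText, which copies at most nLength - 1 characters, always terminates the buffer, and returns the number of characters copied plus one, so that 0 can be reserved for failure.

```cpp
// Standalone version of the bounded copy loop used by GetClipboardText.
// Copies at most nLength - 1 characters, always writes a terminator and
// returns (characters copied + 1); 0 means there was no room at all.
int CopyBounded(char* pszBuf, int nLength, const char* ptxt)
{
    int i = 0;
    if (nLength <= 0)
        return 0;                        // no room even for the terminator
    for ( ; ++i < nLength && *ptxt; )    // copy as much as will fit
        *pszBuf++ = *ptxt++;
    *pszBuf = 0;                         // ensure terminator
    return i;
}
```

Copying "Hello" into a 4-character buffer yields "Hel" and a return value of 4, while an empty source string still returns 1, which is how the caller can tell "empty text" apart from "no text at all".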

The OLE Clipboard

Ok boyz, this is what the fuss is all about. OpenClipboard et al. are ok if you want to spend your life putting text in and out, but for the serious stuff you need the "modern" data-object-aware clipboard. Here, instead of placing e.g. some text directly into the clipboard, you create a COM data object which contains the text and place this object in the clipboard. In case you're wondering what the point of complicating things in this manner is, let me give you a list of advantages:

The list goes even further. With data objects it is possible for applications to add data in objects originally created by other applications (!). Clients can also register an interest in a specific object and request to be notified when its contents change, using the so called advice sink mechanism. An impressive set of features this, by anybody's standards.

Make no mistake: if you want to add support for all these features in your data objects you are going to have to do a lot of typing and sweating yourself, since you get almost nothing for free. However, knowing about all these features means that you can take advantage of the effort of other people who can afford more staff than you, and therefore have the resources to add all this functionality to their data objects (e.g. minimal$oft).

For example, take a look at what happens when you copy some text in WordPad. If you view the clipboard contents using DataObject Viewer (which, by the way, is an extremely useful tool and part of the Developer Studio), you'll discover no less than eight different formats for the words you copied (people work for their monies in Redmond). How does this viewer tool get all this information? Here's what it must be doing, if my hunch is correct:
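IDataObject* pDataObj = NULL;
// open the clipboard and access the current data object
HRESULT hr = OleGetClipboard(&pDataObj);
if(SUCCEEDED(hr)) {
   IEnumFORMATETC* pEnumFmt = NULL;
   // enumerate the available formats supported by the object
   hr = pDataObj->EnumFormatEtc(DATADIR_GET, &pEnumFmt);
   if(SUCCEEDED(hr)) {
      FORMATETC fmt;
      TCHAR szBuf[100];
      while(S_OK == pEnumFmt->Next(1, &fmt, NULL)) {
         // the size is in characters, not bytes; predefined formats such
         // as CF_TEXT have no registered name and make the call fail
         if(!GetClipboardFormatName(fmt.cfFormat, szBuf,
               sizeof(szBuf)/sizeof(szBuf[0])))
            szBuf[0] = 0;
         // remaining entries read from "fmt" members
         CoTaskMemFree(fmt.ptd); // caller frees ptd (may be NULL)
      }
      pEnumFmt->Release();
   }
   pDataObj->Release();
}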

Ain't it funny how COM always complicates things with all those long quirky data names? But we aren't doing anything even remotely complex, just using OleGetClipboard to get the current clipboard contents and examining the available formats. The main object and central theme of all OLE data transfer (that includes clipboard and drag/drop) is the COM object which exposes IDataObject. This interface has many member functions which we will be looking into in the sequel.

In this example we just used its EnumFormatEtc to "browse" its contents, i.e. the alternative formats supported by this object. Since this is COM, an enumerator object is used for this task, IEnumFORMATETC. If you recall IEnumIDList, the enumerator used for browsing the contents of a folder, you will immediately recognize the pattern. We just call its Next method repeatedly to get each contained item, until we receive an S_FALSE return code, which signifies that no more items exist. But what are those FORMATETC types?
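The Next contract is what makes the while loop terminate: an enumerator returns S_OK only when it delivers every element that was asked for, and S_FALSE otherwise. The following portable sketch mimics that contract with a hypothetical ToyEnum class and stand-in constants kS_OK and kS_FALSE (S_OK is 0 and S_FALSE is 1 in the real headers); it is a model of the pattern, not a COM enumerator.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Stand-ins for the COM result codes S_OK (0) and S_FALSE (1).
const long kS_OK = 0, kS_FALSE = 1;

// Toy enumerator following the IEnumXXX::Next contract: it hands out up to
// celt items and returns S_OK only if all celt of them were delivered.
template <typename T>
class ToyEnum {
public:
    explicit ToyEnum(std::vector<T> items) : items_(std::move(items)) {}

    long Next(unsigned long celt, T* out, unsigned long* pFetched) {
        unsigned long n = 0;
        while (n < celt && pos_ < items_.size())
            out[n++] = items_[pos_++];
        if (pFetched)
            *pFetched = n;               // how many were actually returned
        return n == celt ? kS_OK : kS_FALSE;
    }

    void Reset() { pos_ = 0; }           // start the enumeration over

private:
    std::vector<T> items_;
    std::size_t pos_ = 0;
};
```

Asking for one item at a time, as the viewer code does, means the first S_FALSE arrives exactly when the items run out; asking for several at once can return S_FALSE together with a partial batch, which is why the fetched count exists.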

Let's consider the enumeration of folder contents. They contain files, and the enumeration object IEnumIDList offers access to PIDLs which uniquely identify each file. The contents of a data object held in the clipboard are generic "data", which may be global memory items, OLE files and so on. IEnumFORMATETC offers a descriptor to all these data formats through a unique FORMATETC struct which can be used to identify each piece of data. I hope the parallel between the two enumerators is clear.

In the old days, a clipboard format constant like CF_TEXT was enough to identify a piece of data in the clipboard. This information can still be found in the cfFormat member of FORMATETC, but there are many more members on top, facilitating all the extra features of OLE data objects. In the code above we limit ourselves to obtaining the name that corresponds to the format identifier with the help of GetClipboardFormatName (predefined formats such as CF_TEXT have no registered name, so for those the call fails). The DataObject Viewer program just prints out the remaining contents of the FORMATETC. You may want to revisit the snapshot to confirm this. Note that most formats are pretty standard, having a dwAspect of DVASPECT_CONTENT (i.e. the actual data) and a storage medium tymed of TYMED_HGLOBAL (global memory handle). In a sense they don't really take advantage of the rich features of data objects; if we just wanted this behaviour we could have stuck with the simpler clipboard API. However, when you see items with mediums of IStorage, that's when it all becomes interesting.
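The tymed member is a bit mask rather than a single value, which matters as soon as we request several mediums at once. A viewer decoding it might do something like the following portable sketch; DescribeTymed is a hypothetical helper, and the numeric values are copied from the TYMED enumeration in objidl.h so that the code compiles without Windows headers.

```cpp
#include <string>

// Turns a tymed bit mask into readable names, the way a viewer might.
// Bit values mirror the TYMED enumeration from objidl.h.
std::string DescribeTymed(unsigned long tymed)
{
    static const struct { unsigned long bit; const char* name; } kNames[] = {
        {1, "TYMED_HGLOBAL"}, {2, "TYMED_FILE"}, {4, "TYMED_ISTREAM"},
        {8, "TYMED_ISTORAGE"}, {16, "TYMED_GDI"}, {32, "TYMED_MFPICT"},
        {64, "TYMED_ENHMF"},
    };
    std::string out;
    for (const auto& n : kNames) {
        if (tymed & n.bit) {
            if (!out.empty())
                out += " | ";            // several mediums may be set at once
            out += n.name;
        }
    }
    return out.empty() ? "TYMED_NULL" : out;
}
```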

But let us not get carried away yet. After we browse through all the available formats, we release the two COM objects calling the familiar Release method. Note that we don't have to explicitly "close" the clipboard; releasing its data object pointer does the job nicely and other applications can gain access to it.

Let's turn our attention to something more useful: how to read some text from the clipboard, the OLE way. Here we are going to add a new strange word to our vocabulary, the so-called STGMEDIUM, which basically is a generalization of the global memory handles used to transfer the actual data. Most probably the name stands for SToraGe MEDIUM.
int GetOLEClipboardText(LPSTR pszBuf, int nLength)
{
   int i = 0;
   if(nLength <= 0) return 0; // no room even for a terminator
   *pszBuf = 0;

   IDataObject* pDataObj = NULL;
   HRESULT hr = OleGetClipboard(&pDataObj); // access the clipboard
   if(FAILED(hr))
      return 0; // e.g. OleInitialize was not called

   // format characteristics of the data we are after (CF_TEXT is ANSI)
   FORMATETC fmt = {CF_TEXT, NULL, DVASPECT_CONTENT, -1, TYMED_HGLOBAL};
   STGMEDIUM stgm; // receives the data content
   hr = pDataObj->GetData(&fmt, &stgm);
   if(hr == S_OK) { // CF_TEXT format available
      // we asked for a global memory handle, must lock it for access
      LPCSTR ptxt = (LPCSTR)GlobalLock(stgm.hGlobal);
      if(ptxt) {
         for( ; ++i < nLength && *ptxt; ) // copy as much as will fit
            *pszBuf++ = *ptxt++;
         *pszBuf = 0; // ensure terminator
         GlobalUnlock(stgm.hGlobal);
      }
      ReleaseStgMedium(&stgm); // will call GlobalFree
   }

   pDataObj->Release();
   return i; // characters copied + 1; 0 indicates no text there
}

If you compare this implementation with its low-tech equivalent GetClipboardText you'll realize that we're not doing anything resembling rocket science just yet <g>. After we obtain access to the clipboard data object, we use its GetData method to request the text. This method accepts a FORMATETC parameter that prescribes how we wish to have the data, and the infamous STGMEDIUM which will receive the data as the object sees fit. Let me explain what I mean by this latter statement.

If the requested CF_TEXT is not available, then GetData will fail; there are no two ways about it. However, there is flexibility in the storage medium the actual data will be placed in. For instance, we could use the tymed member of FORMATETC to request the data in either global memory or a stream, by OR-ing TYMED_HGLOBAL | TYMED_ISTREAM. The data object would acknowledge our capability to handle either of those medium types, but at the end of the day it would select one of them and set the tymed member of STGMEDIUM to reflect that. To recap, the tymed in FORMATETC can specify as many mediums as our code can handle, and the same-name member of STGMEDIUM is the actual medium the object has opted for.
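Here is a toy model of that negotiation in portable C++. ChooseMedium and its preference list are assumptions made for illustration: a real data object is free to pick among the requested mediums however it likes, and when none matches GetData fails (typically with DV_E_TYMED). The bit values mirror TYMED_HGLOBAL, TYMED_ISTREAM and TYMED_ISTORAGE from objidl.h.

```cpp
// Bit values mirroring TYMED_HGLOBAL, TYMED_ISTREAM and TYMED_ISTORAGE.
const unsigned long kHGlobal = 1, kIStream = 4, kIStorage = 8;

// Hypothetical model of how a data object might answer a request.
// 'requested' is the caller's OR-ed tymed mask; 'preferred' lists the
// mediums this object can render, best first. Returns the single medium
// chosen (what would end up in STGMEDIUM.tymed), or 0 if none matches.
unsigned long ChooseMedium(unsigned long requested,
                           const unsigned long* preferred, int count)
{
    for (int k = 0; k < count; ++k)
        if (requested & preferred[k])
            return preferred[k];
    return 0;
}
```

An object that prefers streams and is asked for TYMED_HGLOBAL | TYMED_ISTREAM hands back a stream, while the same object asked for global memory alone still obliges with an HGLOBAL.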

Besides the medium type, the returned STGMEDIUM contains a "handle" to the actual data, efficiently held in a union for the various types supported. We asked for a global memory copy of the data, so we use hGlobal to access it. After that, the drill is the same as with all HGLOBAL data: lock, access and unlock. The important difference is that we own the handle and are responsible for freeing it after use; in GetClipboardText the clipboard was the custodian, not us. We could have used GlobalFree, but ReleaseStgMedium is more convenient since it understands how to clean up any of the supported TYMED_xxx. It's one size fits all, so we don't have to worry.

ADVANCED: Storage medium cleanup
The pUnkForRelease member in STGMEDIUM is usually NULL, implying that the standard cleanup action should be taken, i.e. GlobalFree for TYMED_HGLOBAL etc. However, some data objects may require special destruction procedures for the data, and supply a valid IUnknown in this member. In such a case, the Release method of this pointer should be used instead of the regular cleanup. The object can then take the appropriate actions, which could even be to avoid freeing the data altogether if, for example, it is to be reused for future clients. At any rate this is just for your information, since ReleaseStgMedium knows how to properly handle both NULL and valid pUnkForRelease values, so you don't have to take any action in your code.
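The pUnkForRelease rule can be shown with a few stand-in types. The sketch below is a hypothetical, portable model covering only the global-memory case: ToyMedium plays the part of STGMEDIUM, ToyReleaser the part of the pUnkForRelease object, and delete[] stands in for GlobalFree. It is not how ReleaseStgMedium is implemented, only the decision it makes.

```cpp
// Stand-in for the IUnknown in pUnkForRelease; counts Release calls.
struct ToyReleaser {
    int releases = 0;
    void Release() { ++releases; }
};

// Stand-in for STGMEDIUM, reduced to the global-memory case.
struct ToyMedium {
    enum Kind { None, HGlobal };
    Kind kind = None;
    char* hGlobal = nullptr;             // plays the role of the HGLOBAL
    ToyReleaser* pUnkForRelease = nullptr;
};

// Models the cleanup decision: with an owner present, only the owner's
// Release is called and the data is left alone; without one, the data is
// freed the standard way.
void ToyReleaseStgMedium(ToyMedium* m)
{
    if (m->pUnkForRelease)
        m->pUnkForRelease->Release();    // the owner decides about the data
    else if (m->kind == ToyMedium::HGlobal)
        delete[] m->hGlobal;             // stands in for GlobalFree
    // this model also clears the struct so it cannot be freed twice
    m->kind = ToyMedium::None;
    m->hGlobal = nullptr;
    m->pUnkForRelease = nullptr;
}
```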

ADVANCED: Impress your boss with IStream

For such a small piece of text data an HGLOBAL is efficient and makes perfect sense. However, you wouldn't command a higher salary without the ability to pull more trix than the next man. So, although it's not a recommended course of action, let's ask WordPad's data object (which has been helping us a lot in this chapter) to deliver the text in a stream rather than common-as-grass global memory.

For those in the home audience who haven't heard of COM compound files, IStream is the interface used to manage generic "files". These can be real files, sub-files in a real file, or even pseudo-files providing sequential access to memory. The latter case is what we are after for our trickery: access whatever text was copied in WordPad as if it were a memory file. IStream has methods which resemble the management of real files. Let's see them in action, modifying GetOLEClipboardText to handle streams.
int GetOLEClipboardTextAsStream(LPSTR pszBuf, int nLength)
{
   if(nLength <= 0) return 0; // no room even for a terminator
   *pszBuf = 0;
   IDataObject* pDataObj = NULL;
   HRESULT hr = OleGetClipboard(&pDataObj); // access the clipboard
   if(FAILED(hr))
      return 0;

   FORMATETC fmt = {CF_TEXT, NULL, DVASPECT_CONTENT, -1};
   fmt.tymed = TYMED_ISTREAM; // make this prominent
   STGMEDIUM stgm;
   ULONG cbRead = 0; // IStream::Read reports bytes in a ULONG
   hr = pDataObj->GetData(&fmt, &stgm);
   if(hr == S_OK) { // CF_TEXT format available
      if(stgm.tymed == TYMED_ISTREAM) {
         LARGE_INTEGER pos;
         pos.QuadPart = 0;
         // the seek pointer is not necessarily at the start: rewind first
         hr = stgm.pstm->Seek(pos, STREAM_SEEK_SET, NULL);
         // CF_TEXT is one byte per char; leave room for the terminator
         hr = stgm.pstm->Read(pszBuf, nLength - 1, &cbRead);
         if(FAILED(hr))
            cbRead = 0;
         pszBuf[cbRead] = 0;
      }
      ReleaseStgMedium(&stgm); // will release the IStream
   }

   pDataObj->Release();
   return (int)cbRead; // bytes read; 0 indicates no text there
}

Most things should be familiar by now. We specifically ask for a TYMED_ISTREAM, which is initialized by the object and placed in pstm for us. We then use it as if it were a regular file, although I can't understand why initially the file pointer is at the end, hence we have to use Seek to rewind the file to its beginning. The Read method should be familiar, too; note that it counts bytes, not characters, and doesn't append a terminator for us. For cleaning up, we just call ReleaseStgMedium, which internally calls pstm->Release(), fingers crossed <g>.
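As for the seek pointer, one plausible explanation (a hunch, not documented behaviour) is that the data object wrote the text into a fresh memory stream and handed it over without rewinding it. The portable ToyMemStream class below is a hypothetical stand-in for a stream on memory, not the IStream interface; it shows why a reader that doesn't rewind first gets nothing back.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical in-memory stream with a single seek pointer.
class ToyMemStream {
public:
    // Writing copies bytes at the seek pointer and advances it.
    void Write(const void* p, std::size_t cb) {
        if (cb == 0)
            return;
        if (pos_ + cb > data_.size())
            data_.resize(pos_ + cb);
        std::memcpy(data_.data() + pos_, p, cb);
        pos_ += cb;
    }
    // Like Seek(..., STREAM_SEEK_SET, ...); this model clamps to the end.
    void SeekSet(std::size_t pos) {
        pos_ = pos < data_.size() ? pos : data_.size();
    }
    // Reads up to cb bytes from the seek pointer; returns bytes read.
    std::size_t Read(void* p, std::size_t cb) {
        std::size_t avail = data_.size() - pos_;
        std::size_t n = cb < avail ? cb : avail;
        if (n)
            std::memcpy(p, data_.data() + pos_, n);
        pos_ += n;
        return n;
    }

private:
    std::vector<char> data_;
    std::size_t pos_ = 0;
};
```

If the provider writes "Hello" plus its terminator and passes the stream on untouched, the first Read returns 0 bytes; only after SeekSet(0) does the text come back.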

One last thing. If you remember that snapshot generated by DataObject Viewer for the WordPad clipboard data, the advertised tymed for CF_TEXT was hGlobal and not IStream. It is a testament to the power of OLE data objects that our request was granted after all. There's no magic here, just OLE working hard for you.

KinderGarden Clipboard

The hardest thing with the OLE clipboard is placing your own data in it. It's not that the API is hard; the problem is managing the data with some IDataObject you have to implement from scratch yourself. We have seen all the feats data objects can pull; it is you who has to insert all this functionality. Without a data object, there's no play.

The other day I waz in a playful mood myself, so I dug down in the ATL source code and nicked a basic IDataObject implementation, which you can use to explore the OLE clipboard. The main class is CDataObjectImpl, which derives from IDataObject and provides some bare-bones functionality, good enough to place some fixed text in the clipboard. The auxiliary CEnumFORMATETCImpl implements the companion IEnumFORMATETC, which the clipboard needs to probe our data object's contents.

Obviously I have cheated a little, since neither of these classes is a real COM object; they just have the structure to fool the clipboard into make-believe. A "proper" data object could have a class factory, be allocated dynamically and obviously support carrying more than the same old piece of text at all times <g>. Still, we couldn't get away without implementing the basic IUnknown functionality for reference counting, which will help us figure out the lifetime of the object. When a Release call brings the reference counter m_dwRef down to zero, a real object would delete itself. CDataObjectImpl, on the other hand, is meant to be used on the stack, so it just records the release without "freeing" any resources.
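The bookkeeping described above boils down to a counter. This portable sketch is not the actual CDataObjectImpl code: ToyStackObject is a hypothetical class that keeps an m_dwRef counter and, where a heap-allocated COM object would delete itself on the final Release, merely records that the final release happened.

```cpp
// Hypothetical model of IUnknown reference counting for an object that
// lives on the stack and therefore must never delete itself.
class ToyStackObject {
public:
    unsigned long AddRef() { return ++m_dwRef; }

    unsigned long Release() {
        if (--m_dwRef == 0)
            ++finalReleases_;            // a heap object would 'delete this' here
        return m_dwRef;
    }

    int FinalReleases() const { return finalReleases_; }

private:
    unsigned long m_dwRef = 0;
    int finalReleases_ = 0;
};
```

Watching when the count returns to zero (via OutputDebugString in the real class) is what tells you the clipboard has let go of the object.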

Other than that, a few IDataObject methods are implemented to allow our object to be used by the clipboard and other applications reading from it. Other complicated or irrelevant methods are defined but don't do much except returning E_NOTIMPL, politely informing clients that the service is not supported. That's about it, now let's see how you can set this object to the clipboard.
#include "DataObjectImpl.h"

int main(int argc, char* argv[])
{
   CDataObjectImpl testDO;
   HRESULT hr = OleInitialize(NULL); // the OLE clipboard needs OLE
   if(FAILED(hr))
      return 1;
   hr = OleSetClipboard(&testDO);

   // access our own object via the clipboard, using the
   // GetOLEClipboardText() sample defined earlier (CF_TEXT is ANSI)
   char szBuf[100];
   GetOLEClipboardText(szBuf, sizeof(szBuf));

   // if we still own the clipboard, release the darn thing
   hr = OleIsCurrentClipboard(&testDO);
   if(hr == S_OK)
      hr = OleFlushClipboard(); // release the clipboard's grip
   // else we have already been releazed

   OleUninitialize();
   return 0;
}

This is pretty straightforward use. Our data object is defined on the stack frame and OleSetClipboard is used to place it in the clipboard. The GetOLEClipboardText() sample function defined earlier is then used to read the clipboard contents, which barring any nasty surprises should be "Hello dude!", the only sentence our rather daft data object can spell <g>. Before we exit, we examine whether our object is still on the clipboard using OleIsCurrentClipboard; if this is the case, OleFlushClipboard tells the clipboard to copy the data and release the object. Otherwise, after we exited, the clipboard would have held on to a pointer to an object that no longer exists, and who knows what sort of firewerkz would have resulted.

I recommend running this program from the debugger so that you can monitor which methods are called, and in what order, for the various operations performed on the data object, courtesy of the OutputDebugString calls used extensively in our IDataObject implementation. If you really are in an inquisitive mood, here are a couple of tests you could perform, changing the main() function above:

I won't spoil the surprise but you are going to experience some pretty unexpected results, as I have before you. Or try some other quirky things that you can imagine. You'll be stocked up with party tricks to last you for a long time — and learn a lot about the clipboard at the same time.

ADVANCED: Proxy data objects
If you run the main() example above you'll see that the object pointer returned by OleGetClipboard is not the same as the object we inserted, &testDO. It seems that the clipboard is using a proxy object, which obviously ends up calling our real object, but not all of the time, as you'll discover if you experiment. Why does the clipboard use this proxy? I don't really know the answer, but I've got a couple of hunches. The first is that the clipboard wraps data placed by old-fashioned applications using OpenClipboard and the like, so it needs to be using its own data object at all times. Another one is inter-process communication. The data object may be in our address space, but it should be accessible to other applications, too. Perhaps that's why the clipboard is furnishing us with another instance, although I have to admit that, since ours is a full-blown COM object, inter-process communication shouldn't be an issue. The most plausible explanation must be the fact that the clipboard takes it upon itself to furnish the native data in more storage mediums than the real object may support. The jury is still out on this one.

Additional information

Q109545 - BUG: Retaining Clipboard IDataObject Causes Unexpected Result
Q112530 - HOWTO: Copying a Bitmap to the Clipboard with MFC