Friday, January 06, 2012

COM Interop Part 8: Strings, Arrays and Structures

In the last installment of this increasingly long series, we saw how to activate a new COM object for the first time on a local machine. One of the features of COM programming (and a big deal at the time it first came out) was the fact that remote, or "distributed", COM programming was almost as easy as local programming: all the complex work was handled by DCOM and the client could largely ignore the distinction.

Activating remote COM servers through COM interop turns out to be almost as easy as activating local ones. Translating those methods properly, however, means digging into some new details of the data marshalling logic: arrays, strings and structures. We're starting to get into the really good stuff here, but it's a lot to take in, so hold on tight and let's get to it.

Interop Structures And Strings

We've already seen how to create local instances of COM servers using purely managed code; creating a remote instance is simply a matter of embedding the remote server name into the type we get back from the CLSID:

var type = Type.GetTypeFromCLSID(CustomClsid, "DCOMSRVR");
var instance = Activator.CreateInstance(type);
The COM component in question must be registered on the remote server, not necessarily your local machine, for this to work. Also, keep in mind that GetTypeFromCLSID doesn't really do much work: it just returns a Type (a placeholder backed by System.__ComObject) that's ready to be activated by COM when you ask for it. If something is going to go wrong, it will be the CreateInstance call that does it.

As before, we could accomplish this same feat in unmanaged code, and get back our interface in a single step. This time it's definitely not worth the effort to do this in a real application; even after we have translated everything we need, using it is a total pain. But it is a good introduction to some important pieces of the interop system, so let's walk through it anyway.

To activate a remote COM server, obviously CoCreateInstance won't work, since there's no way to specify the remote machine. Instead, we use a new method introduced for DCOM: CoCreateInstanceEx:
HRESULT CoCreateInstanceEx(
  __in     REFCLSID rclsid,
  __in     IUnknown *punkOuter,
  __in     DWORD dwClsCtx,
  __in     COSERVERINFO *pServerInfo,
  __in     DWORD dwCount,
  __inout  MULTI_QI *pResults
);
This looks very similar to CoCreateInstance, but hiding in there is one particularly nasty aspect of COM interop. We have two new structured types here that we've never seen before: COSERVERINFO and MULTI_QI. Before we go any further, let's check those out and translate them.

First, COSERVERINFO, which MSDN tells us is found in objidl.h:
typedef struct _COSERVERINFO {
    DWORD dwReserved1;
    LPWSTR pwszName;
    COAUTHINFO *pAuthInfo;
    DWORD dwReserved2;
} COSERVERINFO;
As C structures go, this one's pretty tame. As you can see, the type name COSERVERINFO is just an alias for struct _COSERVERINFO. This is a pretty universal idiom in C when defining structures. The struct keyword introduces what's called a tag name for the type that follows it, so you'll often see these start with the word "tag" as well. You will rarely see the tag name used directly in code (apart from certain cases where it's mandatory), so we'll ignore it for now.

When translating these structure types, my personal preference is to change the name into something that looks more "managed C#ish", which means Pascal cased, no abbreviations, and basically following the same naming rules as any other type name. We saw this earlier, with Microsoft's renaming of CLSCTX into RegistrationClassContext. So, this structure will henceforth be called a ComServerInfo structure.

The first thing we need to do for a structure is determine what kind of layout it has. The C language provides developers a lot of control over the exact memory organization that a structured type has, including a special type called a union where all of the fields in the structure occupy the same memory. (An example of a union is the PROPVARIANT union used by the shell for property storage. I plan to talk about property storage once this introductory series is done, so we'll see how to translate that eventually.)

The memory footprint of a structured type includes two separate parts: the fields themselves, and the alignment packing added to the structure between fields. This packing process is a very low-level optimization technique that is related to how a CPU actually reads data from memory. The short explanation is, the compiler will add extra padding in between the fields in your structure, to guarantee that every field starts at an offset that is a multiple of its alignment (usually the field's own size, capped at 4 or 8 bytes) from the start of the structure.

All of these details about a structure are specified by using the StructLayoutAttribute when declaring the new type. The attribute requires that you provide a LayoutKind, of which there are two (three, actually, but Auto is illegal for interop): Sequential layout is the normal, standard, non-union type where fields are stored in source-code order, with appropriate alignment packing in between. Explicit layout means things aren't arranged quite so neatly, so we are going to tell the compiler exactly how we want it. Note that this is an all or nothing proposition: either every field in the structure is laid out sequentially, or else we have to provide the entire layout structure manually.
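To make the Explicit case a little less abstract, here is a minimal sketch of a C-style union translated with explicit layout. The IntOrFloat type and the UnionDemo helper are names I made up for illustration; they are not part of any Windows header.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical union: both fields start at offset 0, so they share the same 4 bytes.
[StructLayout(LayoutKind.Explicit)]
public struct IntOrFloat
{
    [FieldOffset(0)] public int AsInt;
    [FieldOffset(0)] public float AsFloat;
}

public static class UnionDemo
{
    // Writes a float into the union and reads the same bytes back as an int.
    public static int BitsOf(float value)
    {
        var u = new IntOrFloat();
        u.AsFloat = value;
        return u.AsInt;
    }

    public static void Main()
    {
        // 1.0f is 0x3F800000 in IEEE 754 single precision.
        Console.WriteLine(BitsOf(1.0f).ToString("X8"));          // 3F800000
        // The union is only as large as its largest member.
        Console.WriteLine(Marshal.SizeOf(typeof(IntOrFloat)));    // 4
    }
}
```

Every field needs its own FieldOffset here, which is the "all or nothing" part: once you pick Explicit, nothing is positioned for you.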

The attribute also lets you specify other information: the Pack property lets you change the default alignment, while the Size property allows you to specify a larger-than-normal size for your structure. When we include the two DWORD fields that we already know how to handle, our new structure is starting to take shape:
public struct ComServerInfo
{
    public uint dwReserved1;
    // LPWSTR pwszName;
    // COAUTHINFO *pAuthInfo
    public uint dwReserved2;
}
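If the alignment padding and the Pack property feel a bit hand-wavy, a small experiment helps. This is only a sketch: PaddedSample, PackedSample and PaddingDemo are hypothetical names of mine, and the numbers assume the usual 4-byte alignment for int.

```csharp
using System;
using System.Runtime.InteropServices;

// Default packing: 3 bytes of padding follow Flag so that Value is 4-byte aligned.
[StructLayout(LayoutKind.Sequential)]
public struct PaddedSample
{
    public byte Flag;
    public int Value;
}

// Pack = 1 removes the padding, so Value sits directly after Flag.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct PackedSample
{
    public byte Flag;
    public int Value;
}

public static class PaddingDemo
{
    // Returns the unmanaged byte offset of the field named "Value" in T.
    public static int OffsetOfValue<T>() where T : struct
    {
        return (int)Marshal.OffsetOf(typeof(T), "Value");
    }

    public static void Main()
    {
        Console.WriteLine(Marshal.SizeOf(typeof(PaddedSample)));  // 8
        Console.WriteLine(OffsetOfValue<PaddedSample>());         // 4
        Console.WriteLine(Marshal.SizeOf(typeof(PackedSample)));  // 5
        Console.WriteLine(OffsetOfValue<PackedSample>());         // 1
    }
}
```

Marshal.SizeOf and Marshal.OffsetOf report the unmanaged layout, which is the one that matters for interop. For Windows SDK structures you normally leave Pack alone unless the header says otherwise.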

CharSets and Marshalling String Data

The next new element here is that LPWSTR parameter type. As you might have guessed, that's a string parameter. And you will probably not be surprised to learn that COM has a lot of string types, and they're all different. These string types come in two basic flavors. The LPWSTR string type is one of several C-style types, in this case a typedef alias for wchar_t *, which means we have a pointer to the first character in an array of 16-bit Unicode characters, ending with a null terminator. There are a number of related types, notably LPSTR, which is the 8-bit ANSI equivalent. The BSTR type is a COM-specific type that was designed to match the VB String type (and thus, is much closer to a C# managed string), in that it includes information such as the string length embedded right into the type. BSTR types always use the wide character set, which in Windows means UTF-16. OLE and ActiveX interfaces tend to use BSTRs; Windows SDK code typically uses the LPWSTR type.
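You can see the difference between these flavors by asking the Marshal class to build each one and then peeking at the raw bytes. This is a rough sketch, assuming a little-endian machine; StringLayoutDemo and its helper names are my own invention.

```csharp
using System;
using System.Runtime.InteropServices;

public static class StringLayoutDemo
{
    // Copies 'count' raw bytes starting at 'ptr' into a managed array.
    private static byte[] Dump(IntPtr ptr, int count)
    {
        var bytes = new byte[count];
        for (int i = 0; i < count; i++) bytes[i] = Marshal.ReadByte(ptr, i);
        return bytes;
    }

    // LPWSTR style: 2 bytes per character plus a 2-byte null terminator.
    public static byte[] WideBytes(string text)
    {
        IntPtr p = Marshal.StringToHGlobalUni(text);
        try { return Dump(p, (text.Length + 1) * 2); }
        finally { Marshal.FreeHGlobal(p); }
    }

    // LPSTR style: 1 byte per character (for plain ASCII text) plus a 1-byte terminator.
    public static byte[] NarrowBytes(string text)
    {
        IntPtr p = Marshal.StringToHGlobalAnsi(text);
        try { return Dump(p, text.Length + 1); }
        finally { Marshal.FreeHGlobal(p); }
    }

    // BSTR: the 4 bytes just before the pointer hold the length in bytes.
    public static int BstrByteLength(string text)
    {
        IntPtr p = Marshal.StringToBSTR(text);
        try { return Marshal.ReadInt32(p, -4); }
        finally { Marshal.FreeBSTR(p); }
    }

    public static void Main()
    {
        Console.WriteLine(BitConverter.ToString(WideBytes("AB")));    // 41-00-42-00-00-00
        Console.WriteLine(BitConverter.ToString(NarrowBytes("AB")));  // 41-42-00
        Console.WriteLine(BstrByteLength("AB"));                      // 4
    }
}
```

The BSTR pointer itself still points at the first character, which is why a BSTR can be handed to code expecting an LPWSTR, but not the other way around.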

As far as the translation goes, we have a bit of a problem. Obviously, in C#, this is going to be a string parameter. (There's actually one case where an LPWSTR is not a string, which we will hopefully cover later.) But we've already seen that string could be translated into at least two different unmanaged types. In fact, there are eight different ways to marshal a managed string into unmanaged code, when you account for ANSI vs. Unicode and a few highly specialized cases. The default behavior for the marshaller is to select one of those options, based on what kind of interop is being done. This MSDN article has a good breakdown, but essentially, strings used by interface methods are UnmanagedType.BStr by default, while strings used in P/Invoke and structures are UnmanagedType.LPTStr by default.

That LPTSTR thing may need a bit of explanation. TSTR is a special Windows SDK type that was invented to help with the ANSI to Unicode migration, back when people still ran Windows OSs that were not Unicode based. In unmanaged code, the decision was made by the C/C++ compiler, at build time, to alias TSTR to either a char * or wchar_t *, depending on the state of the _UNICODE preprocessor macro. In the CLR, however, this decision is made at runtime based on which operating system you are running on: on a Windows NT-based system, UnmanagedType.LPTStr means UnmanagedType.LPWStr, while on a 95-based system, it means UnmanagedType.LPStr. This subtle difference can cause big problems when the two sides disagree. The most common situation you will see is where the CLR thinks a string is a 2-byte "wide character set" string (UTF-16, usually) but the unmanaged code expects a so-called "multi-byte character set" string (like UTF-8). (For example, your unmanaged code may return an 8 character string like "ABCDEFGH" but your managed code will treat it as the 4 character string "????". Many Windows API calls from that era come in two flavors, the so called A and W flavors, specifically to avoid this problem.) For our purposes, we're going to ignore everything pre-Windows XP and force all our interop to be done in Unicode, as nature intended.
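The "ABCDEFGH" mix-up is easy to reproduce without any unmanaged code at all, just by decoding 8-bit bytes with the wrong encoding. A quick sketch (CharSetMismatchDemo is a made-up name; the encodings stand in for what the marshaller would do on each side):

```csharp
using System;
using System.Text;

public static class CharSetMismatchDemo
{
    // Encodes text as 8-bit characters, then misreads those bytes as UTF-16.
    public static string MisreadAsUtf16(string text)
    {
        byte[] narrow = Encoding.ASCII.GetBytes(text);
        return Encoding.Unicode.GetString(narrow);
    }

    public static void Main()
    {
        string garbled = MisreadAsUtf16("ABCDEFGH");
        // Every pair of 8-bit characters collapses into one 16-bit character.
        Console.WriteLine(garbled.Length);                     // 4
        // 'A' (0x41) and 'B' (0x42) become the single code unit 0x4241.
        Console.WriteLine(((int)garbled[0]).ToString("X4"));   // 4241
    }
}
```

Most consoles can't render those code units, which is where the row of question marks comes from.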

We can change the default behavior for this field in two ways. We've already seen how to apply a MarshalAs attribute to individual parameters, and it works just the same way for fields in a structure. The UnmanagedType enumeration has values for all of the Windows SDK string types, so it's just a matter of picking the same one that's listed in the header, in this case, UnmanagedType.LPWStr. However, there's a second option, which changes the default for the entire structure. The StructLayout attribute has a CharSet property, which specifies a value from the CharSet enumeration. If we don't include one, the C# compiler uses CharSet.Ansi, which marshals strings as 8-bit ANSI; CharSet.Auto is the value that gives the TSTR behavior. Neither is what we want: we know we want Unicode behavior for this structure, so we can override the default by specifying CharSet.Unicode for that property. Our structure now looks like this:
[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
public struct ComServerInfo
{
    public uint dwReserved1;
    [MarshalAs(UnmanagedType.LPWStr)] public string pwszName;
    // COAUTHINFO *pAuthInfo
    public uint dwReserved2;
}
By the way, this is one case where I'm going to break my own general rule, and include an attribute that is redundant. Once we specify the CharSet for the structure, the MarshalAs attribute is now specifying the behavior that the marshaller would already have used on its own. Normally, I would not do that, but string is one of a handful of special cases where I always include a MarshalAs attribute. The reason is, there is not one single "correct" way to marshal a string to unmanaged code; in the same way as there's no one single way to marshal an object. (The other two cases where this happens are arrays and bool, both coming up.) Although I happen to know the default behavior for these cases, it makes the code much less maintainable if I, or more importantly someone else coming after me, has to go look it up. So, all string fields or parameters get MarshalAs attributes, period.

Omitting Optional Structures

We next see that this structure has another new type nested inside of it, which we also need to translate. And, it turns out, that type has another nested type inside it, plus five separate enumerations, and now we start to wonder if this ever ends?

Which brings us to a very important aspect of interop translation, which is knowing your target audience. Apart from wordy blog posts, we generally aren't translating unmanaged code into managed signatures just for fun, but to use them in some actual program. If I were translating this code for a component that other people would use, or which I knew would be used in a lot of remote COM activation scenarios, I would indeed have to follow this chain of new types until the very end. But the documentation for this structure tells me two very important things about that pAuthInfo field:
  1. If I set it to NULL, it will use default values, and
  2. The default values can be overridden per-component in the registry
At this point, I have to decide if the effort involved in translating that type is more or less important than the capabilities I lose by not doing so. Further reading does indicate one possibility: I'm losing the ability to specify a different client identity on the remote server. That sounds like it might be useful in the future; if I ever need that feature, I will have to come back and flesh out this translation a lot more. For now, since I know I'm not going to use this feature in practice, it would just be a distraction. So I'm going to use the same trick we used for the aggregation container parameter to CoCreateInstance, and declare it as an IntPtr so I can make it NULL.

With that decision out of the way, our final structure now looks like this:
[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
public struct ComServerInfo
{
    public uint dwReserved1;
    [MarshalAs(UnmanagedType.LPWStr)] public string pwszName;
    public IntPtr pAuthInfo;
    public uint dwReserved2;
}
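If you want to convince yourself that the marshaller really treats the name as a wide string, one way is to push the structure into unmanaged memory by hand and read the pointer back. This is just a sketch: ServerInfoDemo and RoundTripName are hypothetical helpers, and the structure is repeated so the snippet stands alone.

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
public struct ComServerInfo
{
    public uint dwReserved1;
    [MarshalAs(UnmanagedType.LPWStr)] public string pwszName;
    public IntPtr pAuthInfo;
    public uint dwReserved2;
}

public static class ServerInfoDemo
{
    // Marshals the structure to unmanaged memory, then reads pwszName back
    // through the raw pointer as a null-terminated UTF-16 string.
    public static string RoundTripName(string serverName)
    {
        var info = new ComServerInfo { pwszName = serverName, pAuthInfo = IntPtr.Zero };
        IntPtr buffer = Marshal.AllocHGlobal(Marshal.SizeOf(typeof(ComServerInfo)));
        try
        {
            Marshal.StructureToPtr(info, buffer, false);
            try
            {
                int nameOffset = (int)Marshal.OffsetOf(typeof(ComServerInfo), "pwszName");
                IntPtr namePtr = Marshal.ReadIntPtr(buffer, nameOffset);
                return Marshal.PtrToStringUni(namePtr);
            }
            finally
            {
                // Releases the string copy the marshaller allocated for pwszName.
                Marshal.DestroyStructure(buffer, typeof(ComServerInfo));
            }
        }
        finally
        {
            Marshal.FreeHGlobal(buffer);
        }
    }

    public static void Main()
    {
        // 16 bytes in a 32-bit process, 32 bytes in a 64-bit process.
        Console.WriteLine(Marshal.SizeOf(typeof(ComServerInfo)));
        Console.WriteLine(RoundTripName("DCOMSRVR"));   // DCOMSRVR
    }
}
```

The size difference between 32-bit and 64-bit comes entirely from the two pointer-sized fields and the padding needed to align them.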
By comparison, the MULTI_QI structure is a piece of cake. The definition from objidl.h says this:
typedef struct tagMULTI_QI {
    const IID *pIID;
    IUnknown *pItf;
    HRESULT hr;
} MULTI_QI;
Don't let that const keyword there throw you; the field is not read only. It's actually a "pointer to a const IID", which is a quirky aspect of programming with pointers, that boils down to making the memory pointed to by pIID read-only, as far as this structure is concerned. Since we don't deal with pointers in C#, we'll just pretend the const keyword isn't there. Once we get past that, everything we're left with is stuff we've seen before:
[StructLayout(LayoutKind.Sequential)]
public struct MultiQueryInterface
{
    [MarshalAs(UnmanagedType.LPStruct)] public Guid pIID;
    [MarshalAs(UnmanagedType.IUnknown)] public object pItf;
    public int hr;
}

Arrays in COM

Now that we have our data structures available, we can get back to translating the function call that uses them. Here is where it becomes crucial to read the documentation, because we have two parameters declared as pointers that behave quite differently. pServerInfo is a pointer to a structure that could be NULL, like the other pointers we've seen so far. pResults, on the other hand, is an array of structures, and for that we're going to have to give the marshaller a bit of help.

In standard C terms, there is little practical difference between an array of things and a pointer to a thing, so C-style arrays are often declared as pointers. Just like we saw with strings, COM also has a second flavor of arrays, a SAFEARRAY, that works more like a Visual Basic array. Internally, the two are very different, but both of them map to the same C# array types. In order to tell the difference, we turn once again to the MarshalAs attribute.

The primary difference between a C-style array and a COM-style SAFEARRAY is the size tracking. SAFEARRAYs are named as such because they include a size value as part of the type; C-style arrays are, in practice, just a block of memory containing a contiguous list of elements, with a pointer to the first one. There are various ways to "know" how big a C-style array is but they are all by convention, not by definition. It is common practice within the Windows SDK for any function that has an array parameter to also have a count parameter, specifying how big the array is. For the CoCreateInstanceEx function, we can see from the documentation that the dwCount parameter tells us how big the pResults array is supposed to be.
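The "pointer plus count" convention is easy to see if we play marshaller ourselves for a moment. This is a rough sketch using Marshal.Copy on a plain int array; CArrayDemo, ToNative and FromNative are illustrative names of mine, not anything from the SDK.

```csharp
using System;
using System.Runtime.InteropServices;

public static class CArrayDemo
{
    // Going "in": the managed array knows its own Length, so copying it out is easy.
    // The result is just a pointer to element zero; the length is NOT stored with it.
    public static IntPtr ToNative(int[] values)
    {
        IntPtr block = Marshal.AllocHGlobal(values.Length * sizeof(int));
        Marshal.Copy(values, 0, block, values.Length);
        return block;
    }

    // Coming back "out": all we have is a pointer, so the caller must supply the count.
    public static int[] FromNative(IntPtr block, int count)
    {
        var result = new int[count];
        Marshal.Copy(block, result, 0, count);
        return result;
    }

    public static void Main()
    {
        int[] source = { 10, 20, 30 };
        IntPtr block = ToNative(source);
        try
        {
            int[] copy = FromNative(block, source.Length);
            Console.WriteLine(string.Join(",", copy));   // 10,20,30
        }
        finally
        {
            Marshal.FreeHGlobal(block);
        }
    }
}
```

Nothing stops you from passing the wrong count to FromNative, which is exactly the hazard the separate count parameter in SDK functions is there to manage.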

Knowing that, we can now translate our function call into a P/Invoke signature:
[DllImport("ole32.dll", PreserveSig = false)]
public static extern void CoCreateInstanceEx(
    [MarshalAs(UnmanagedType.LPStruct)] Guid rclsid,
    IntPtr pUnkOuter,
    RegistrationClassContext dwClassContext,
    ref ComServerInfo pServerInfo,
    int dwCount,
    [In, Out, MarshalAs(UnmanagedType.LPArray, SizeParamIndex = 4)]
    MultiQueryInterface[] pResults);
Here, as with strings, we are supplying a MarshalAs attribute that specifies the default behavior anyway. I'm doing this for two reasons. First, the default behavior changes between P/Invoke and COM Interop: in P/Invoke, UnmanagedType.LPArray is the default, while in COM, the default is UnmanagedType.SafeArray. In keeping with our pattern so far, whenever there is ambiguity over what unmanaged type a given managed type can represent, we're going to be explicit.

The other reason is that the SizeParamIndex lets us tell the marshaller which method parameter holds the number of array elements. This is always a good idea if we expect data to come back out of the array at runtime; when going "in" from managed -> unmanaged, the marshaller knows how many elements our array has because it's a managed array with a Length property. When coming back "out" from unmanaged -> managed, the marshaller just has a pointer to element zero to work with, so it needs us to tell it where to find that length. (There is also a SizeConst field if you happen to have a fixed-size array.)

Also, notice that we tagged our array with the Out attribute, but passed it by value. For arrays, it is almost always an error to declare one as a ref parameter; passing a ref int[] to a method means that the method can replace the entire array with a new array instance, discarding the one you passed in. That's rarely what we want. (And to unmanaged code, introduces an extra level of pointer-ness that's guaranteed to cause a crash.) All we want is for the elements inside the array to be changed, and to get those changes back from COM. That's what the Out attribute does: we are telling the runtime marshaller "I know I passed this by value, but it really isn't an [in] parameter, and I really need you to pay attention to the data when this call returns."

Lastly, we were unable to use the return value rewriting trick here, because the final parameter is not an out parameter; we have to supply it on the call. Since I don't particularly care about the return value (it's either S_OK or we're getting an exception), I just went with void.
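For reference, here is roughly what PreserveSig = false is doing for us behind the scenes, using the Marshal helpers that translate HRESULTs into exceptions. This is a sketch with hypothetical names (HResultDemo, ToException, ThrowIfFailed). One thing worth keeping in mind: only failure codes (negative values) become exceptions, and the CoCreateInstanceEx documentation lists a success code, CO_S_NOTALLINTERFACES, so it may be worth running the same check on each element's hr field after the call.

```csharp
using System;
using System.Runtime.InteropServices;

public static class HResultDemo
{
    public const int S_OK = 0;
    public const int E_NOINTERFACE = unchecked((int)0x80004002);

    // Maps an HRESULT to the exception the runtime would throw, or null for success codes.
    // Passing IntPtr(-1) tells the runtime to ignore any IErrorInfo left on the thread.
    public static Exception ToException(int hr)
    {
        return Marshal.GetExceptionForHR(hr, new IntPtr(-1));
    }

    // Throws for failure HRESULTs (negative values) and does nothing for success codes.
    public static void ThrowIfFailed(int hr)
    {
        if (hr < 0)
        {
            Marshal.ThrowExceptionForHR(hr, new IntPtr(-1));
        }
    }

    public static void Main()
    {
        Console.WriteLine(ToException(S_OK) == null);                    // True
        Console.WriteLine(ToException(E_NOINTERFACE).GetType().Name);    // InvalidCastException
    }
}
```

With the structures from this post, that per-interface check would look something like HResultDemo.ThrowIfFailed(qis[0].hr) before touching qis[0].pItf.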

So, how do we use this method? Assuming that we had DCOM all set up properly, we just create the appropriate structures and go for it:
var server = new ComServerInfo
{
    pAuthInfo = IntPtr.Zero,
    pwszName = "DCOMSRVR"
};
var qis = new[] {
    new MultiQueryInterface { hr = 0, pItf = null, pIID = typeof(ICustom).GUID }
};
Ole32NativeMethods.CoCreateInstanceEx(CustomClsid, IntPtr.Zero,
    RegistrationClassContext.RemoteServer, ref server, 1, qis);
var custom = qis[0].pItf as ICustom;
Whew. That was a lot of effort for just one function call! All to accomplish essentially the same thing we saw previously with two lines of managed code. It was a good exercise, but not very realistic for production use.

Next time, we'll wrap up the COM activation portion of this series by looking at some other ways to activate a COM instance besides direct activation of a single instance. We'll also see how to activate a COM component that may not even be registered. Unlike the CoCreateInstance functions, these actually do see real use under certain cases, so stay tuned.