Wednesday, January 11, 2012

COM Interop Part 10: Recapping And Wrapping Up

Wow, this turned out to be a much bigger series than I originally planned, but we covered a LOT. To wrap up, I want to summarize the most important information we've covered so far, in preparation for future posts where we do cool things with COM Interop. Consider this something of a cheat sheet for interop translations.

Link And References Roundup

First, here's a list of all the posts in the series:

Part 1: Introduction
Part 2: Built-In Interop Mechanisms
Part 3: Basics of COM Marshalling
Part 4: A Look At IDL
Part 5: COM Interface Basics
Part 6: Basic Parameter And Return Value Marshalling
Part 7: Local Activation
Part 8: Strings, Structures, Arrays
Part 9: Remote Activation; Dynamically Loaded P/Invoke

In addition to the pages for individual attributes or methods, the following MSDN articles are particularly handy reference guides:

Default Marshalling Behavior
SDK to CLR Data Types

Commonly Used Attributes

Most of the work in COM Interop involves correctly applying attributes that change the marshaller's behavior. The most important attributes are in the following list (defaults, where specified, are for C#; VB defaults may be different):

ComImportAttribute - Specifies that an interface type was imported from a COM type library.

GuidAttribute - When applied to a ComImport interface, specifies the COM interface identifier (IID).

InterfaceTypeAttribute - (default: ComInterfaceType.InterfaceIsDual) Specifies the style of interface represented by the definition. Possible values:
  • ComInterfaceType.InterfaceIsIUnknown - Standard IUnknown-derived interface; supports early binding only.
  • ComInterfaceType.InterfaceIsIDispatch - IDispatch-derived dispinterface; supports late binding only.
  • ComInterfaceType.InterfaceIsDual - Default; OLE-style dual interface; supports early and late binding.
PreserveSigAttribute - When applied to a ComImport interface method, turns off default method signature rewriting, as described below.
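
As a rough sketch of how these four attributes combine, here is an interface declaration. The interface name, its IID, and its methods are invented for illustration and do not correspond to any real component; the Main method only reads the metadata back and never creates a COM object.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical COM interface: the name, IID, and methods are made up.
[ComImport]
[Guid("A1B2C3D4-0000-4000-8000-1234567890AB")]
[InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
interface ISampleCounter
{
    // Default for COM interface methods: the HRESULT is hidden and
    // failures surface as exceptions.
    void Reset();

    // PreserveSig turns signature rewriting off: the raw HRESULT is returned.
    [PreserveSig]
    int TryIncrement(out int newValue);
}

static class Program
{
    static void Main()
    {
        // No COM object is created here; this only inspects the declaration.
        Type t = typeof(ISampleCounter);
        Console.WriteLine(t.GUID);      // the IID supplied by GuidAttribute
        Console.WriteLine(t.IsImport);  // whether ComImportAttribute is applied
    }
}
```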

DllImportAttribute - Specifies that an extern method is an imported native P/Invoke function. In addition to the required filename parameter, the following fields can optionally be specified:
  • CharSet - (default: CharSet.Ansi) Specifies the default marshalling behavior for string parameters, and name mangling (see ExactSpelling). Possible values include:
    • CharSet.Ansi - Strings are converted from managed data to 8-bit characters using a multi-byte character set (ASCII, UTF-8, etc.). The mangled name is formed by appending an "A" to the function name.
    • CharSet.Unicode - Strings are passed using the same 16-bit wide character set as .NET (UTF-16). The mangled name is formed by appending a "W" to the function name.
    • CharSet.Auto - Selects Unicode or ANSI at runtime based on the running operating system.
  • ExactSpelling - (default: false) Enables or disables alternate entry point name lookup. When set to false, the CLR will also search for the "mangled", character-set-dependent entry point name in addition to the original entry point name.
  • EntryPoint - Overrides the entry point name in the DLL; useful for providing multiple managed definitions of the same entry point.
  • PreserveSig - (default: true) Indicates whether or not COM signature rewriting is enabled, as described below. Default value means signature rewriting is not enabled; to allow rewriting, set PreserveSig to false.
  • CallingConvention - (default: CallingConvention.StdCall) Indicates the calling convention for the function. Valid conventions for P/Invoke functions are CallingConvention.Cdecl and CallingConvention.StdCall.
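
To make the DllImport fields concrete, here is a minimal sketch that binds the same kernel32 export twice: once relying on name mangling, and once with an explicit EntryPoint. It assumes a Windows machine; the managed name GetModuleHandleExact is my own invention for the second declaration.

```csharp
using System;
using System.Runtime.InteropServices;

static class NativeMethods
{
    // CharSet.Unicode with ExactSpelling left at false: the runtime also
    // probes for the mangled export name "GetModuleHandleW".
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode)]
    public static extern IntPtr GetModuleHandle(string moduleName);

    // Same export under a different managed name. ExactSpelling = true
    // disables the A/W probing, so EntryPoint must be the exact export name.
    [DllImport("kernel32.dll", EntryPoint = "GetModuleHandleW", ExactSpelling = true,
               CharSet = CharSet.Unicode, CallingConvention = CallingConvention.StdCall)]
    public static extern IntPtr GetModuleHandleExact(string moduleName);
}

static class Program
{
    static void Main()
    {
        IntPtr a = NativeMethods.GetModuleHandle("kernel32.dll");
        IntPtr b = NativeMethods.GetModuleHandleExact("kernel32.dll");

        // Both declarations bind to the same export, so the handles should match.
        Console.WriteLine(a == b);
    }
}
```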
StructLayoutAttribute - Specifies the unmanaged memory layout of a managed structure. Requires one of the following LayoutKind values (the CharSet field can optionally be set, with the same behavior as the CharSet field in DllImportAttribute):
  • LayoutKind.Sequential - The structure is laid out in source-code order, including any alignment padding.
  • LayoutKind.Explicit - Each field is placed explicitly at the offset given by its FieldOffset attribute.
UnmanagedFunctionPointerAttribute - Identifies a delegate type as being a native callback or entry point. Requires specifying the CallingConvention; CharSet field behaves the same as the DllImportAttribute.
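
Here is a small sketch of both layout kinds plus a callback delegate. All of the type names are hypothetical. It runs without any native library, because it only asks the marshaller how it would lay the types out.

```csharp
using System;
using System.Runtime.InteropServices;

// Sequential: fields appear in declaration order, with padding after Tag
// so that Value is properly aligned.
[StructLayout(LayoutKind.Sequential)]
struct Tagged
{
    public byte Tag;
    public int Value;
}

// Explicit: both fields start at offset 0, which models a C union.
[StructLayout(LayoutKind.Explicit)]
struct IntOrFloat
{
    [FieldOffset(0)] public int AsInt;
    [FieldOffset(0)] public float AsFloat;
}

// A delegate type usable as a native callback; the calling convention
// must match what the native caller expects.
[UnmanagedFunctionPointer(CallingConvention.Cdecl)]
delegate int CompareCallback(IntPtr left, IntPtr right);

static class Program
{
    static void Main()
    {
        // Unmanaged size and field offset as computed by the marshaller.
        Console.WriteLine(Marshal.SizeOf(typeof(Tagged)));
        Console.WriteLine(Marshal.OffsetOf(typeof(Tagged), "Value"));
        Console.WriteLine(Marshal.SizeOf(typeof(IntOrFloat)));

        // Turn a managed method into a function pointer native code could call.
        CompareCallback cb = (l, r) => 0;
        IntPtr fn = Marshal.GetFunctionPointerForDelegate(cb);
        Console.WriteLine(fn != IntPtr.Zero);

        // Keep the delegate alive for as long as native code may call it.
        GC.KeepAlive(cb);
    }
}
```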

MarshalAsAttribute - Provides options to override the default marshalling behavior for a given parameter, field, or return value. See below for more details on marshalling behavior. Requires an UnmanagedType enumeration value; the following are the most commonly used:
  • UnmanagedType.IUnknown - Applies only to object parameters, indicating that the parameter is a COM interface derived from IUnknown.
    • Optional: Set the IidParameterIndex field to the parameter number of a Guid parameter that designates the interface identifier of the interface.
  • UnmanagedType.Bool - Applies only to bool parameters, indicating that the parameter is a 4-byte Windows SDK BOOL type (see below for alternatives).
  • UnmanagedType.LPWStr - Applies only to string parameters, indicating that the parameter is an LPWSTR (a wchar_t *), that is, a pointer to an array of 16-bit characters (see below for alternatives).
    • Optionally set the SizeParamIndex field to the parameter number of an integer-type parameter specifying how long the string is.
    • Optionally set the SizeConst field to an integer representing the length of a fixed-size string.
  • UnmanagedType.LPArray - Applies to array parameters, indicating that the parameter should be passed as a pointer to the first element of the array.
    • Optionally set the SizeParamIndex field to the parameter number of an integer-type parameter specifying how many elements the array has.
    • Optionally set the ArraySubType field to the UnmanagedType of the array elements; only needed when there is ambiguity, e.g. object[] or string[].
  • UnmanagedType.LPStruct - Applies to Guid parameters, indicating that the Guid should be passed as a pointer (e.g. a REFIID or REFCLSID) but is not actually a C# ref parameter.
  • UnmanagedType.ByValTStr, UnmanagedType.ByValArray - Apply only to fixed-size string or array fields (respectively) in a structure, indicating that the entire string/array should be embedded into the structure and not marshalled as a pointer.
    • Set the SizeConst field to the integral size of the string/array.
InAttribute - Specifies that the marshaller should copy data from caller to callee when making a call; see Default Marshalling Behavior below for details.

OutAttribute - Specifies that the marshaller should copy data back from callee to caller when returning from a call; see Default Marshalling Behavior below for details.
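
The embedded-field options are easiest to see in action. The sketch below uses a made-up DeviceInfo structure and round-trips it through unmanaged memory by hand, roughly what the marshaller would do on a call, so no native DLL is needed.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical structure; the names and sizes are invented for illustration.
[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
struct DeviceInfo
{
    public int Id;

    // Embedded as a fixed 16-character buffer rather than a pointer.
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 16)]
    public string Name;

    // Embedded as four inline bytes rather than a pointer to an array.
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 4)]
    public byte[] Address;
}

static class Program
{
    static void Main()
    {
        var info = new DeviceInfo
        {
            Id = 7,
            Name = "probe",
            Address = new byte[] { 10, 0, 0, 1 }
        };

        int size = Marshal.SizeOf(typeof(DeviceInfo));
        IntPtr buffer = Marshal.AllocHGlobal(size);
        try
        {
            // Copy to unmanaged memory and back again.
            Marshal.StructureToPtr(info, buffer, false);
            var copy = (DeviceInfo)Marshal.PtrToStructure(buffer, typeof(DeviceInfo));

            Console.WriteLine(size);                 // includes both inline buffers
            Console.WriteLine(copy.Name);
            Console.WriteLine(copy.Address.Length);
        }
        finally
        {
            Marshal.FreeHGlobal(buffer);
        }
    }
}
```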

Return Value Rewriting

In general, most COM methods return an HRESULT that encodes a variety of data, including the success/failure status of the method. By default, COM Interop methods undergo a signature rewriting process that changes this return value in two different but related ways:
  1. If the HRESULT indicates an error occurred (that is, it is negative), an exception is thrown back to the caller. The runtime will attempt to map well-known HRESULTs to meaningful exceptions (e.g. E_INVALIDARG becomes an ArgumentException); otherwise a general-purpose COMException is thrown, carrying the original HRESULT in its ErrorCode property.
  2. The managed signature can choose from three possible options when designating the value to return to a managed caller in the case where the method did not throw an exception:
    • If the managed signature is defined with the same parameters as the unmanaged signature, and returns an int or uint, then the actual HRESULT value is returned.
    • If the managed signature is defined with the same parameters as the unmanaged signature, and returns void, then a successful HRESULT is simply discarded.
    • If the unmanaged signature has an [out] parameter in the last position, the managed signature can optionally omit this parameter and specify its type as the return type. In this case, the value stored in the [out] parameter by the unmanaged code will be used as the managed return value, and the successful HRESULT value is discarded. (This is true even if the [out] parameter is itself an int or uint: if the parameter is omitted from the method signature, its value, and not the HRESULT, will be returned to the caller.)
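
The HRESULT-to-exception mapping can be observed directly, without calling any COM method, through Marshal.GetExceptionForHR. This is only a sketch: E_CUSTOM is an arbitrary failure code I picked to stand in for a component-specific error.

```csharp
using System;
using System.Runtime.InteropServices;

static class Program
{
    static void Main()
    {
        const int S_OK = 0;
        const int E_INVALIDARG = unchecked((int)0x80070057);
        // Arbitrary failure code chosen for illustration; assumed to have
        // no dedicated managed exception type.
        const int E_CUSTOM = unchecked((int)0x80040123);

        // Success codes are non-negative and produce no exception.
        Console.WriteLine(Marshal.GetExceptionForHR(S_OK) == null);

        // A well-known failure code maps to a specific exception type.
        Console.WriteLine(Marshal.GetExceptionForHR(E_INVALIDARG).GetType().Name);

        // Other failures fall back to COMException, which keeps the HRESULT.
        Exception ex = Marshal.GetExceptionForHR(E_CUSTOM);
        Console.WriteLine(ex.GetType().Name);
        var com = ex as COMException;
        if (com != null)
        {
            Console.WriteLine("0x{0:X8}", com.ErrorCode);
        }
    }
}
```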
This behavior can be controlled by the PreserveSig state of the managed type definition. If PreserveSig is set to true, then the unmanaged signature is kept intact, the managed signature must match it exactly (or risk an access violation), and failure HRESULTs are returned as normal. If PreserveSig is false, then the managed signature may optionally move the final parameter to the return value, or else may return void; failure HRESULTs always throw an exception and will never be returned to the caller. Note that it is possible to use the HRESULT-to-exception translation without rewriting the return value (by keeping all the parameters and returning void), but it is not possible to rewrite the return value while disabling the HRESULT-to-exception translation.

For methods that are part of a COM interface, the PreserveSig state is false by default, and must be explicitly enabled with the PreserveSigAttribute. For methods that are defined as P/Invoke functions, the PreserveSig state is true by default, and must be explicitly disabled with the PreserveSig field of the DllImportAttribute. For delegates that are unmanaged function pointers, PreserveSig is enabled by default and cannot easily be disabled.
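
Here is a sketch of the three signature shapes side by side. The two P/Invoke declarations bind to the real CoCreateGuid export in ole32.dll (Windows only); the managed name CoCreateGuidChecked and the ISampleCollection interface, including its IID, are invented for illustration.

```csharp
using System;
using System.Runtime.InteropServices;

static class NativeMethods
{
    // PreserveSig defaults to true for P/Invoke: the raw HRESULT comes back.
    [DllImport("ole32.dll")]
    public static extern int CoCreateGuid(out Guid guid);

    // PreserveSig = false: same parameters, void return. A failure HRESULT
    // is thrown as an exception instead of being returned.
    [DllImport("ole32.dll", EntryPoint = "CoCreateGuid", PreserveSig = false)]
    public static extern void CoCreateGuidChecked(out Guid guid);
}

// Hypothetical COM interface whose native method would be declared as
//   HRESULT GetCount([out, retval] int* count);
[ComImport, Guid("0F1E2D3C-0000-4000-8000-AABBCCDDEEFF"),
 InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
interface ISampleCollection
{
    // PreserveSig is false by default on COM interface methods, so the
    // trailing [out] parameter can be promoted to the managed return value.
    int GetCount();
}

static class Program
{
    static void Main()
    {
        Guid first;
        int hr = NativeMethods.CoCreateGuid(out first);
        Console.WriteLine("hr = 0x{0:X8}, guid = {1}", hr, first);

        Guid second;
        NativeMethods.CoCreateGuidChecked(out second); // would throw on failure
        Console.WriteLine(second);
    }
}
```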

Default Marshalling Behaviors

The runtime is able to make most marshalling decisions on its own, based on the C# type information included in the method signature. In general, it is only necessary to apply attributes when the C# type information is ambiguous, or when the unmanaged code behaves in an unusual way.

When determining which direction to marshal parameter data on a function call, the following rules apply:
  • For a passed by-value parameter, the marshaller treats the parameter as an [in] parameter of the appropriate type, and data is marshalled only from the caller to the callee.
  • Reference types passed by-value are treated as an [in] parameter of a pointer to the appropriate type, and data is marshalled only from the caller to the callee.
    • StringBuilder is a special case, and by default is treated as an [in, out] parameter.
  • For an out parameter, the marshaller treats the parameter as an [out] parameter that is a pointer to the appropriate type, and only marshals data from the callee to the caller on return from the function.
  • For a ref parameter, the marshaller treats the parameter as an [in, out] parameter that is a pointer to the appropriate type, and marshals the pointed-to data in both directions.
The In and Out attributes can be applied to individual parameters to override this behavior. Keep in mind that applying either attribute completely overrides the default behavior: applying an Out attribute to a value parameter does not result in [in, out] semantics, only [out] semantics. To get both behaviors, apply both attributes explicitly (or use a ref parameter).
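
The direction rules above can be sketched with three real Win32 functions (Windows only). The managed POINT and SystemTime types are my own declarations of the corresponding SDK structures.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

[StructLayout(LayoutKind.Sequential)]
struct POINT
{
    public int X;
    public int Y;
}

// A class (reference type) with sequential layout is passed as a pointer.
[StructLayout(LayoutKind.Sequential)]
class SystemTime
{
    public ushort Year, Month, DayOfWeek, Day, Hour, Minute, Second, Milliseconds;
}

static class NativeMethods
{
    // out parameter: data flows only from callee to caller.
    [DllImport("user32.dll")]
    [return: MarshalAs(UnmanagedType.Bool)]
    public static extern bool GetCursorPos(out POINT point);

    // A reference type passed by value defaults to [in]; [Out] asks the
    // marshaller to copy the fields filled in by the callee back.
    [DllImport("kernel32.dll")]
    public static extern void GetSystemTime([Out] SystemTime time);

    // StringBuilder is [in, out] by default, so no attributes are needed.
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode)]
    public static extern uint GetWindowsDirectory(StringBuilder buffer, uint size);
}

static class Program
{
    static void Main()
    {
        POINT p;
        if (NativeMethods.GetCursorPos(out p))
        {
            Console.WriteLine("{0}, {1}", p.X, p.Y);
        }

        var now = new SystemTime();
        NativeMethods.GetSystemTime(now);
        Console.WriteLine(now.Year);

        // The buffer must be pre-allocated before the call.
        var dir = new StringBuilder(260);
        NativeMethods.GetWindowsDirectory(dir, (uint)dir.Capacity);
        Console.WriteLine(dir);
    }
}
```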

By default, the "appropriate type" for fields/parameters is just the unmanaged equivalent of the managed type. An out or ref parameter of a reference type (for example, an out object) is marshalled as a double pointer, which is usually correct. The In and Out attributes have no effect on whether a parameter is marshalled as a pointer or not. To pass a by-value parameter as a pointer (for example, when using a structure), you must either declare the managed type as a reference type, or else declare the parameter as a ref parameter. The only exception is a Guid, passed to a P/Invoke method as an UnmanagedType.LPStruct.
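
As a sketch of the two ways to pass a Guid by pointer, here is the real StringFromGUID2 export from ole32.dll (Windows only) declared twice. The second managed name, StringFromGuidByValue, is invented.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

static class NativeMethods
{
    // Native: int StringFromGUID2(REFGUID rguid, LPOLESTR lpsz, int cchMax);
    // ref makes the marshaller pass a pointer to the Guid.
    [DllImport("ole32.dll", CharSet = CharSet.Unicode)]
    public static extern int StringFromGUID2(
        ref Guid guid, StringBuilder buffer, int capacity);

    // LPStruct passes the same pointer, but the managed parameter stays
    // an ordinary by-value Guid.
    [DllImport("ole32.dll", EntryPoint = "StringFromGUID2", CharSet = CharSet.Unicode)]
    public static extern int StringFromGuidByValue(
        [MarshalAs(UnmanagedType.LPStruct)] Guid guid, StringBuilder buffer, int capacity);
}

static class Program
{
    static void Main()
    {
        Guid id = Guid.NewGuid();

        // 38 characters for "{...}" plus the terminating null.
        var viaRef = new StringBuilder(39);
        NativeMethods.StringFromGUID2(ref id, viaRef, viaRef.Capacity);

        var viaLpStruct = new StringBuilder(39);
        NativeMethods.StringFromGuidByValue(id, viaLpStruct, viaLpStruct.Capacity);

        // Both calls should produce the same text for the same Guid.
        Console.WriteLine(viaRef);
        Console.WriteLine(viaLpStruct);
    }
}
```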

When determining the equivalent unmanaged type for a particular managed type, in most cases the runtime can automatically map built-in C# types and enumerated types to the same-sized unmanaged type (preserving the signed/unsigned nature). For user-defined structure types, the runtime will construct an unmanaged type by recursively mapping the individual fields of the complex type. The runtime will also automatically convert managed interfaces to unmanaged COM interfaces when using strongly-typed parameters. Managed ComVisible class instances are passed to COM interop methods by converting them into a pointer to their default interface (which may be an auto-generated class interface).

Certain types require special handling. These are types that have different typical representations in COM vs. P/Invoke methods. Due to the ambiguity, these types should always be decorated with the MarshalAs attribute, even when it specifies default behavior. The following list indicates the appropriate UnmanagedType enumeration value to match a particular unmanaged type:
  • object parameters: P/Invoke calls cannot use parameters of type object, but can accept complex types with object fields. The following UnmanagedType values can be applied only to object parameters:
    • UnmanagedType.Struct (default for parameters) - COM VARIANT, despite the name.
    • UnmanagedType.IUnknown (default for structure fields) - Interface derived from IUnknown
    • UnmanagedType.IDispatch - dispinterface derived from IDispatch
    • UnmanagedType.Interface - IDispatch interface if possible, IUnknown otherwise.
  • string parameters: The default marshalling for strings is controlled by the CharSet field in the DllImport, StructLayout, or UnmanagedFunctionPointer attribute applied to a type. (If not specified, CharSet defaults to CharSet.Ansi, which is almost always wrong, so be sure to specify CharSet.Unicode and the appropriate MarshalAs attribute!) The following UnmanagedType values can only be applied to strings (this is not an exhaustive list):
    • UnmanagedType.BStr (default for COM methods): COM-style BSTR
    • UnmanagedType.LPStr (default for CharSet.Ansi): Pointer to an (8-bit) character array.
    • UnmanagedType.LPWStr (default for CharSet.Unicode): Pointer to a wide character array.
    • UnmanagedType.LPTStr (default for CharSet.Auto): Marshaller will select LPStr or LPWStr at runtime based on operating system.
    • UnmanagedType.ByValTStr: Applies only to fields in structures; field will be marshalled as an inline fixed-length array of characters, whose size depends on the CharSet attribute of the structure (not the operating system).
  • StringBuilder instances can be used in place of string parameters (but not as fields in a structure) in any case where UnmanagedType.LPStr, UnmanagedType.LPWStr, or UnmanagedType.LPTStr is specified. The StringBuilder instance must be pre-allocated to the correct size before calling the function, and should be passed by value (it will automatically behave as an [in, out] parameter).
  • bool parameters are marshalled as integers of the appropriate size. The following UnmanagedType values can be applied to only booleans:
    • UnmanagedType.Bool - Windows SDK's BOOL, 4 bytes
    • UnmanagedType.VariantBool - COM's VARIANT_BOOL, 2 bytes
  • The UnmanagedType.U1 type can be used for the obsolete 1-byte BOOLEAN type from early Windows code. UnmanagedType.U1 is a generic "unsigned 1-byte" data type, which is used by default for byte parameters; it also can be used for char parameters, but probably shouldn't (it will prevent the normal CharSet conversion from working).
  • Arrays passed to unmanaged code are passed by-value with [in] semantics only by default, unlike arrays passed between managed methods. You can explicitly specify the In and Out attributes to get bi-directional semantics if required. If the array subtype is one of the other ambiguous types (for example, a string[]), you should also set the ArraySubType or SafeArraySubType field to the appropriate value. The following UnmanagedType values can only be applied to arrays:
    • UnmanagedType.SafeArray (default for COM methods) - COM-style SAFEARRAY
    • UnmanagedType.LPArray (default for P/Invoke) - C-style, pointer to the first element.
    • UnmanagedType.ByValArray - Fixed-size array embedded within a structure
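
To tie the special-case types together, here is a sketch that decorates bool, string, and array parameters explicitly. GetComputerName and GetKeyboardState are real Win32 functions (Windows only); the ISampleSettings interface, its IID, and its methods are hypothetical and appear only to show the COM-flavored choices.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

static class NativeMethods
{
    // Native: BOOL GetComputerNameW(LPWSTR lpBuffer, LPDWORD nSize);
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode)]
    [return: MarshalAs(UnmanagedType.Bool)]
    public static extern bool GetComputerName(
        [MarshalAs(UnmanagedType.LPWStr)] StringBuilder buffer, ref uint size);

    // Native: BOOL GetKeyboardState(PBYTE lpKeyState); fills 256 bytes.
    [DllImport("user32.dll")]
    [return: MarshalAs(UnmanagedType.Bool)]
    public static extern bool GetKeyboardState(
        [Out, MarshalAs(UnmanagedType.LPArray, SizeConst = 256)] byte[] keyState);
}

// Hypothetical COM interface showing the COM-style equivalents.
[ComImport, Guid("11223344-0000-4000-8000-556677889900"),
 InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
interface ISampleSettings
{
    [return: MarshalAs(UnmanagedType.VariantBool)]
    bool IsEnabled([MarshalAs(UnmanagedType.BStr)] string key);

    void SetValues(
        [In, MarshalAs(UnmanagedType.SafeArray, SafeArraySubType = VarEnum.VT_BSTR)]
        string[] values);
}

static class Program
{
    static void Main()
    {
        // MAX_COMPUTERNAME_LENGTH (15) plus the terminating null.
        var name = new StringBuilder(16);
        uint size = (uint)name.Capacity;
        if (NativeMethods.GetComputerName(name, ref size))
        {
            Console.WriteLine(name);
        }

        var keys = new byte[256];
        Console.WriteLine(NativeMethods.GetKeyboardState(keys));
    }
}
```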

That about wraps up this short (ha!) introduction to COM Interop. As you can see from this summary alone, it's a lot to digest, and takes a lot of practice to get comfortable using unmanaged code in your C# applications. Now that we have the basics down, look for future posts that demonstrate some of the real-world applications of COM Interop in your own applications.


    Transistor1 said...

    This was an excellent series; thank you very much for writing it. It really helped me fill in the missing details in understanding marshaling and COM.

    Jon said...

    Great article series, gave me a good understanding on interops. Thank you!

    Dũng Phạm Văn said...

    Hello Michael,

    I'm working on a experiment project in university using COM to communicate with legacy system on IBM AS400.

    I could use com interop to control client console of AS400. But it didn't ran as expectation. Could you please share complete source code of this com interop series?

    If you can please post public link share or send me through email

    Thanks a lot for your valuable articles.

    Dzung, Pham.