Link And References Roundup
First, here's a list of all the posts in the series:
Part 1: Introduction
Part 2: Built-In Interop Mechanisms
Part 3: Basics of COM Marshalling
Part 4: A Look At IDL
Part 5: COM Interface Basics
Part 6: Basic Parameter And Return Value Marshalling
Part 7: Local Activation
Part 8: Strings, Structures, Arrays
Part 9: Remote Activation; Dynamically Loaded P/Invoke
In addition to the pages for individual attributes or methods, the following MSDN articles are particularly handy reference guides:
Default Marshalling Behavior
SDK to CLR Data Types
Commonly Used Attributes
Most of the work in COM Interop involves correct application of attributes that change the marshaller's behavior. The most important attributes are in the following list (defaults, where specified, are for C#; VB defaults may be different):
ComImportAttribute - Specifies that an interface type was imported from a COM type library.
GuidAttribute - When applied to a ComImport interface, specifies the COM interface identifier (IID).
InterfaceTypeAttribute - (default: ComInterfaceType.InterfaceIsDual) Specifies the style of interface represented by the definition. Possible values:
- ComInterfaceType.InterfaceIsIUnknown - Standard IUnknown-derived interface; supports early binding only.
- ComInterfaceType.InterfaceIsIDispatch - IDispatch-derived dispinterface; supports late binding only.
- ComInterfaceType.InterfaceIsDual - Default; OLE-style dual interface; supports early and late binding.
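As a minimal sketch of how these three attributes combine, here is a declaration of the real IUnknown-derived IPersist interface (IID 0000010c-0000-0000-C000-000000000046). IPersist has a single method beyond the IUnknown methods, so the declaration stays short; the method order in a ComImport interface must match the native vtable order.

```csharp
using System;
using System.Runtime.InteropServices;

// Real COM interface IPersist, derived directly from IUnknown,
// so it is declared with InterfaceIsIUnknown (early binding only).
[ComImport]
[Guid("0000010c-0000-0000-C000-000000000046")]
[InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
public interface IPersist
{
    // Unmanaged: HRESULT GetClassID([out] CLSID *pClassID);
    // The HRESULT is consumed by signature rewriting; failures become exceptions.
    void GetClassID(out Guid pClassID);
}
```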
DllImportAttribute - Specifies that an extern method is an imported native P/Invoke function. In addition to the required DLL name parameter, the following fields can optionally be specified:
- CharSet - (default: CharSet.Ansi) Specifies the default marshalling behavior for string parameters, and name mangling (see ExactSpelling). Possible values include:
- CharSet.Ansi - Strings are converted from managed (UTF-16) data to 8-bit characters using a multi-byte character set (ASCII, UTF-8, etc.). The mangled name is formed by appending "A" to the function name.
- CharSet.Unicode - Strings are passed using the same 16-bit wide character set as .NET (UTF-16). The mangled name is formed by appending "W" to the function name.
- CharSet.Auto - Selects Unicode or ANSI at runtime based on the running operating system.
- ExactSpelling - (default: false) Enables or disables alternate entry point name lookup. When set to false, the CLR will also search for the "mangled", character-set-dependent entry point name in addition to the original entry point name.
- EntryPoint - Overrides the entry point name in the DLL; useful for providing multiple managed definitions of the same entry point.
- PreserveSig - (default: true) Indicates whether or not COM signature rewriting is enabled, as described below. Default value means signature rewriting is not enabled; to allow rewriting, set PreserveSig to false.
- CallingConvention - (default: CallingConvention.Winapi, the platform default, which is StdCall on 32-bit Windows) Indicates the calling convention for the function. The conventions most commonly used for P/Invoke functions are CallingConvention.Cdecl and CallingConvention.StdCall.
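A sketch of the DllImport fields in use. The wrapper class name NativeMethods is hypothetical; the exports it binds to (MessageBox/MessageBoxW in user32.dll, abs in msvcrt.dll) are real Windows and C runtime functions, so the declarations compile anywhere but can only be called on Windows.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper class; the exports it binds to are real Win32/CRT functions.
public static class NativeMethods
{
    // CharSet.Unicode with ExactSpelling = false (the default): besides "MessageBox",
    // the runtime also tries the mangled name "MessageBoxW".
    [DllImport("user32.dll", CharSet = CharSet.Unicode)]
    public static extern int MessageBox(IntPtr hWnd, string text, string caption, uint type);

    // EntryPoint binds a second managed name to the same export; with ExactSpelling = true
    // no "A"/"W" suffix is tried, so the exact export name must be given.
    [DllImport("user32.dll", EntryPoint = "MessageBoxW", CharSet = CharSet.Unicode, ExactSpelling = true)]
    public static extern int ShowMessage(IntPtr hWnd, string text, string caption, uint type);

    // C runtime functions use cdecl rather than the Winapi/StdCall default.
    [DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)]
    public static extern int abs(int value);
}
```

The ShowMessage declaration is the EntryPoint case from the list: two managed names bound to one export, each free to carry its own marshalling options.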
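StructLayoutAttribute - (default for C# structures: LayoutKind.Sequential) Specifies how the fields of a structure or formatted class are laid out in unmanaged memory. Its CharSet field also controls string fields (see below). The most commonly used values:
- LayoutKind.Sequential - structure is laid out in source-code order, with padding inserted for field alignment (see the Pack field).
- LayoutKind.Explicit - structure is laid out explicitly, using a FieldOffset attribute on each field.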
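A minimal illustration of the two layout kinds, using hypothetical structure names. Marshal.SizeOf and Marshal.OffsetOf report the unmanaged layout the marshaller will use, so the alignment padding can be checked without calling any native code.

```csharp
using System.Runtime.InteropServices;

// Hypothetical structure: Sequential keeps source order, and the marshaller
// pads after Flag so that Value starts on a 4-byte boundary.
[StructLayout(LayoutKind.Sequential)]
public struct SequentialSample
{
    public byte Flag;   // offset 0, followed by 3 bytes of padding
    public int Value;   // offset 4; total size 8
}

// Hypothetical C-style union: both fields share offset 0.
[StructLayout(LayoutKind.Explicit)]
public struct IntOrFloat
{
    [FieldOffset(0)] public int AsInt;
    [FieldOffset(0)] public float AsFloat;
}
```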
MarshalAsAttribute - Provides options to override the default marshalling behavior for a given parameter, field, or return value. See below for more details on marshalling behavior. Requires an UnmanagedType enumeration value; the following are the most commonly used:
- UnmanagedType.IUnknown - Applies only to object parameters, indicating that the parameter is a COM interface derived from IUnknown.
- Optional: Set the IidParameterIndex field to the parameter number of a Guid parameter that designates the interface identifier of the interface.
- UnmanagedType.Bool - Applies only to bool parameters, indicating that the parameter is a 4-byte Windows SDK BOOL type (see below for alternatives).
- UnmanagedType.LPWStr - Applies only to string parameters, indicating that the parameter is an LPWSTR (a wchar_t *), that is, a pointer to an array of 16-bit characters (see below for alternatives).
- Optionally set the SizeParamIndex field to the zero-based index of an integer-type parameter specifying how long the string is.
- Optionally set the SizeConst field to an integer representing the length of a fixed-size string.
- UnmanagedType.LPArray - Applies to array parameters, indicating that the array should be passed as a pointer to its first element (a C-style array of the array subtype).
- Optionally set the SizeParamIndex field to the zero-based index of an integer-type parameter specifying how many elements the array has.
- Optionally set the ArraySubType field to the UnmanagedType of the array elements; this is only needed when the element type is ambiguous, e.g. object or string.
- UnmanagedType.LPStruct - Applies to Guid parameters, indicating that the Guid should be passed as a pointer (e.g. a REFIID or REFCLSID) even though it is not declared as a C# ref parameter.
- UnmanagedType.ByValTStr, UnmanagedType.ByValArray - Apply only to fixed-size string or array fields (respectively) in a structure, indicating that the entire string/array should be embedded in the structure rather than marshalled as a pointer.
- Set the SizeConst field to the integral size of the string/array.
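The following sketch applies several of these MarshalAs options. DeviceInfo and MarshalAsSamples are hypothetical names; CoCreateInstance (ole32.dll) and SetCurrentDirectoryW (kernel32.dll) are real Windows exports, each shown with one plausible managed signature rather than the only possible one.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical structure with both "by value" field forms embedded inline.
[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
public struct DeviceInfo
{
    // WCHAR Name[32]: 64 bytes, room for 31 characters plus the terminator.
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 32)]
    public string Name;

    // DWORD Flags[4]: 16 bytes.
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 4)]
    public uint[] Flags;
}

// Hypothetical wrapper class; both exports are real Win32 functions.
public static class MarshalAsSamples
{
    // HRESULT CoCreateInstance(REFCLSID rclsid, LPUNKNOWN pUnkOuter, DWORD dwClsContext,
    //                          REFIID riid, LPVOID *ppv);
    // LPStruct passes each Guid by pointer without making it a C# ref parameter;
    // IidParameterIndex = 3 says parameter 3 (riid) names the interface returned in ppv.
    [DllImport("ole32.dll", ExactSpelling = true)]
    public static extern int CoCreateInstance(
        [MarshalAs(UnmanagedType.LPStruct)] Guid rclsid,
        IntPtr pUnkOuter,
        uint dwClsContext,
        [MarshalAs(UnmanagedType.LPStruct)] Guid riid,
        [MarshalAs(UnmanagedType.IUnknown, IidParameterIndex = 3)] out object ppv);

    // BOOL SetCurrentDirectoryW(LPCWSTR lpPathName);
    [DllImport("kernel32.dll", ExactSpelling = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    public static extern bool SetCurrentDirectoryW([MarshalAs(UnmanagedType.LPWStr)] string lpPathName);
}
```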
OutAttribute - Specifies that the marshaller should copy data back from callee to caller when returning from a call; see Default Marshalling Behavior below for details.
Return Value Rewriting
In general, most COM methods return an HRESULT that encodes a variety of data, including the success/failure status of the method. By default, COM Interop methods undergo a signature rewriting process that changes this return value in two different but related ways:
- If the HRESULT indicates that an error occurred (that is, it is negative), an exception is thrown back to the caller. The runtime will attempt to map well-known HRESULTs to meaningful exceptions (e.g. E_INVALIDARG becomes an ArgumentException); otherwise a general-purpose COMException is thrown with its ErrorCode property set to the HRESULT.
- The managed signature can choose from three possible options when designating the value to return to a managed caller in the case where the method did not throw an exception:
- If the managed signature is defined with the same parameters as the unmanaged signature, returns an int or uint, and has PreserveSig enabled (which also turns off the exception behavior above), then the actual HRESULT value is returned.
- If the managed signature is defined with the same parameters as the unmanaged signature, and returns void, then a successful HRESULT is simply discarded.
- If the unmanaged signature has an [out] parameter in the last position, the managed signature can optionally omit this parameter and specify its type as the return type. In this case, the value stored in the [out] parameter by the unmanaged code will be used as the managed return value, and the successful HRESULT value is discarded. (This is true even if the [out] parameter is itself an int or uint: if the parameter is omitted from the method signature, its value, and not the HRESULT, will be returned to the caller.)
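To make the three options concrete, here is a sketch of the real IPersist method HRESULT GetClassID([out] CLSID *pClassID), declared each way. The three interface names are hypothetical; in practice you would declare the interface once and pick a single form.

```csharp
using System;
using System.Runtime.InteropServices;

// Three alternative declarations of the same IPersist interface
// (hypothetical names; a real project would keep only one).

[ComImport, Guid("0000010c-0000-0000-C000-000000000046")]
[InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
public interface IPersistHResult
{
    // Rewriting disabled: the raw HRESULT comes back as an int.
    [PreserveSig]
    int GetClassID(out Guid pClassID);
}

[ComImport, Guid("0000010c-0000-0000-C000-000000000046")]
[InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
public interface IPersistVoid
{
    // Rewritten: failure HRESULTs throw, success HRESULTs are discarded.
    void GetClassID(out Guid pClassID);
}

[ComImport, Guid("0000010c-0000-0000-C000-000000000046")]
[InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
public interface IPersistRetval
{
    // Rewritten: the trailing [out] CLSID parameter becomes the return value.
    Guid GetClassID();
}
```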
For methods that are part of a COM interface, the PreserveSig state is false by default, and must be explicitly enabled with the PreserveSigAttribute. For methods that are defined as P/Invoke functions, the PreserveSig state is true by default, and must be explicitly disabled with the PreserveSig field of the DllImportAttribute. For delegates that are unmanaged function pointers, PreserveSig is enabled by default and cannot easily be disabled.
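For P/Invoke the default is reversed. A sketch, assuming the real ole32 export CLSIDFromString and a hypothetical wrapper class Ole32: with PreserveSig = false, the trailing CLSID* parameter becomes the return value and failure HRESULTs become exceptions.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper class around a real ole32 export:
//   HRESULT CLSIDFromString(LPCOLESTR lpsz, LPCLSID pclsid);
public static class Ole32
{
    // E_INVALIDARG, one of the well-known HRESULTs mapped to a specific exception.
    public const int E_INVALIDARG = unchecked((int)0x80070057);

    // PreserveSig = false enables rewriting for a P/Invoke function: the trailing
    // [out] CLSID becomes the return value, and a failure HRESULT is thrown.
    [DllImport("ole32.dll", ExactSpelling = true, PreserveSig = false)]
    public static extern Guid CLSIDFromString([MarshalAs(UnmanagedType.LPWStr)] string lpsz);
}
```

The same HRESULT-to-exception conversion is exposed directly by Marshal.GetExceptionForHR, which returns an ArgumentException for Ole32.E_INVALIDARG without any native call.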
Default Marshalling Behaviors
The runtime is able to make most marshalling decisions on its own, based on the C# type information included in the method signature. In general, it is only necessary to apply attributes when the C# type information is ambiguous, or when the unmanaged code behaves in an unusual way.
When determining which direction to marshal parameter data on a function call, the following rules apply:
- For a value-type parameter passed by value, the marshaller treats the parameter as an [in] parameter of the appropriate type, and data is marshalled only from the caller to the callee.
- Reference types passed by value are treated as an [in] parameter of a pointer to the appropriate type, and data is marshalled only from the caller to the callee.
- StringBuilder is a special case, and by default is treated as an [in, out] parameter.
- For an out parameter, the marshaller treats the parameter as an [out] parameter that is a pointer to the appropriate type, and only marshals data from the callee to the caller on return from the function.
- For a ref parameter, the marshaller treats the parameter as an [in, out] parameter that is a pointer to the appropriate type, and marshals the pointed-to data in both directions.
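A sketch of these direction rules using two real kernel32 functions inside a hypothetical Kernel32 wrapper class. The declarations compile anywhere but can only be called on Windows.

```csharp
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical wrapper class; both functions are real kernel32 exports.
public static class Kernel32
{
    // BOOL QueryPerformanceCounter(LARGE_INTEGER *lpPerformanceCount);
    // 'out' => [out] pointer: nothing is copied in, the value is copied back.
    [DllImport("kernel32.dll", ExactSpelling = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    public static extern bool QueryPerformanceCounter(out long count);

    // BOOL GetComputerNameW(LPWSTR lpBuffer, LPDWORD nSize);
    // StringBuilder => [in, out] character buffer, passed by value.
    // 'ref'         => [in, out] pointer: the buffer size goes in, the name length comes back.
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, ExactSpelling = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    public static extern bool GetComputerNameW(StringBuilder buffer, ref uint size);
}
```

On Windows, a caller would create a StringBuilder with enough capacity, set size to that capacity, and read both the buffer contents and the updated size after the call.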
By default, the "appropriate type" for fields/parameters is just the unmanaged equivalent of the managed type. An out or ref parameter of a reference type (for example, an out object) is marshalled as a double pointer, which is usually correct. The In and Out attributes have no effect on whether a parameter is marshalled as a pointer or not. To pass a by-value parameter as a pointer (for example, when using a structure), you must either declare the managed type as a reference type, or else declare the parameter as a ref parameter. The only exception is a Guid, passed to a P/Invoke method as an UnmanagedType.LPStruct.
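Both ways of passing a structure as a pointer, sketched against the real kernel32 GetSystemTime function (the managed type names are hypothetical): a value type passed as out, and a formatted class passed by value with [Out] so that results are copied back.

```csharp
using System.Runtime.InteropServices;

// SYSTEMTIME as a value type: it must be passed as out/ref to become a pointer.
[StructLayout(LayoutKind.Sequential)]
public struct SystemTime
{
    public ushort Year, Month, DayOfWeek, Day, Hour, Minute, Second, Milliseconds;
}

// The same layout as a class: passed by value it is already a pointer,
// but it is [in] by default, so [Out] requests the copy back to the caller.
[StructLayout(LayoutKind.Sequential)]
public class SystemTimeClass
{
    public ushort Year, Month, DayOfWeek, Day, Hour, Minute, Second, Milliseconds;
}

// Hypothetical wrapper class; both declarations bind to
//   void GetSystemTime(LPSYSTEMTIME lpSystemTime);
public static class TimeNative
{
    [DllImport("kernel32.dll", EntryPoint = "GetSystemTime")]
    public static extern void GetSystemTime(out SystemTime time);

    [DllImport("kernel32.dll", EntryPoint = "GetSystemTime")]
    public static extern void GetSystemTimeClass([Out] SystemTimeClass time);
}
```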
When determining the equivalent unmanaged type for a particular managed type, in most cases the runtime can automatically map built-in C# types and enumerated types to the same-sized unmanaged type (preserving the signed/unsigned nature). For user-defined structure types, the runtime will construct an unmanaged type by recursively mapping the individual fields of the complex type. The runtime will also automatically convert managed interfaces to unmanaged COM interfaces when using strongly-typed parameters. Managed ComVisible class instances are passed to COM interop methods by converting them into a pointer to their default interface (which may be an auto-generated class interface).
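A small sketch of the recursive mapping, with hypothetical types: the enum field takes the size of its underlying short, and the nested Point fields are laid out inline, so the outer structure's size and offsets follow from its parts.

```csharp
using System.Runtime.InteropServices;

// Hypothetical enum backed by a 16-bit integer.
public enum Shade : short { Light = 1, Dark = 2 }

[StructLayout(LayoutKind.Sequential)]
public struct Point { public int X; public int Y; }   // 8 bytes

[StructLayout(LayoutKind.Sequential)]
public struct Segment
{
    public Point Start;   // offset 0
    public Point End;     // offset 8
    public Shade Style;   // offset 16, 2 bytes, then padding to a multiple of 4 (size 20)
}
```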
Certain types require special handling. These are types that have different typical representations in COM vs. P/Invoke methods. Due to the ambiguity, these types should always be decorated with the MarshalAs attribute, even when it specifies the default behavior. The following list indicates the appropriate UnmanagedType enumeration value to match a particular unmanaged type:
- object parameters: P/Invoke calls cannot use parameters of type object, but can accept complex types with object fields. The following UnmanagedType values can be applied only to object parameters:
- UnmanagedType.Struct (default for parameters) - COM VARIANT, despite the name.
- UnmanagedType.IUnknown (default for structure fields) - Interface derived from IUnknown
- UnmanagedType.IDispatch - dispinterface derived from IDispatch
- UnmanagedType.Interface - IDispatch interface if possible, IUnknown otherwise.
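A sketch of object parameters on a real COM interface, IPropertyBag (IID 55272A00-42CB-11CE-8135-00AA004BB851), whose methods take VARIANT pointers. Declaring the IErrorLog pointer as an IntPtr is an assumption made here for brevity, not the only correct declaration.

```csharp
using System;
using System.Runtime.InteropServices;

// Real IUnknown-derived COM interface; only the marshalling attributes matter here.
[ComImport, Guid("55272A00-42CB-11CE-8135-00AA004BB851")]
[InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
public interface IPropertyBag
{
    // HRESULT Read([in] LPCOLESTR pszPropName, [in, out] VARIANT *pVar, [in] IErrorLog *pErrorLog);
    // 'ref object' + UnmanagedType.Struct => pointer to a VARIANT, marshalled both ways.
    // pErrorLog simplified to IntPtr (pass IntPtr.Zero) for this sketch.
    void Read([MarshalAs(UnmanagedType.LPWStr)] string pszPropName,
              [MarshalAs(UnmanagedType.Struct)] ref object pVar,
              IntPtr pErrorLog);

    // HRESULT Write([in] LPCOLESTR pszPropName, [in] VARIANT *pVar);
    // [In] limits the ref parameter to caller-to-callee copying.
    void Write([MarshalAs(UnmanagedType.LPWStr)] string pszPropName,
               [In, MarshalAs(UnmanagedType.Struct)] ref object pVar);
}
```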
- string parameters: The default marshalling for strings is controlled by the CharSet field in the DllImport, StructLayout, or UnmanagedFunctionPointer attribute applied to the method or type. (If not specified, CharSet defaults to CharSet.Ansi, which is almost always wrong, so be sure to specify CharSet.Unicode and the appropriate MarshalAs attribute!) The following UnmanagedType values can only be applied to strings (this is not an exhaustive list):
- UnmanagedType.BStr (default for COM methods): COM-style BSTR
- UnmanagedType.LPStr (default for CharSet.Ansi): Pointer to an (8-bit) character array.
- UnmanagedType.LPWStr (default for CharSet.Unicode): Pointer to a wide character array.
- UnmanagedType.LPTStr (default for CharSet.Auto): Marshaller will select LPStr or LPWStr at runtime based on operating system.
- UnmanagedType.ByValTStr: Applies only to fields in structures; field will be marshalled as an inline fixed-length array of characters, whose size depends on the CharSet attribute of the structure (not the operating system).
- StringBuilder instances can be used in place of string parameters (but not as fields in a structure) in any case where UnmanagedType.LPStr, UnmanagedType.LPWStr, or UnmanagedType.LPTStr is specified. The StringBuilder must be pre-allocated with sufficient capacity before calling the function, and should be passed by value (it will automatically behave as an [in, out] parameter).
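A sketch of the string rules, with hypothetical type names: the same ByValTStr field occupies 16 or 32 bytes depending on the structure's CharSet, and two real kernel32 functions show an [in] LPWSTR and a pre-allocated StringBuilder buffer (callable only on Windows).

```csharp
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical structures: the same ByValTStr field, sized by each struct's CharSet.
[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Ansi)]
public struct AnsiName
{
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 16)]
    public string Value;   // char[16]: 16 bytes
}

[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
public struct WideName
{
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 16)]
    public string Value;   // WCHAR[16]: 32 bytes
}

// Hypothetical wrapper class; both exports are real kernel32 functions.
public static class StringNative
{
    // int lstrlenW(LPCWSTR lpString); an [in] string passed as LPWSTR.
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, ExactSpelling = true)]
    public static extern int lstrlenW([MarshalAs(UnmanagedType.LPWStr)] string s);

    // UINT GetWindowsDirectoryW(LPWSTR lpBuffer, UINT uSize);
    // The StringBuilder is pre-allocated and passed by value; it behaves as [in, out].
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, ExactSpelling = true)]
    public static extern uint GetWindowsDirectoryW(StringBuilder buffer, uint size);
}
```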
- bool parameters are marshalled as integers of the appropriate size. The following UnmanagedType values can be applied only to booleans:
- UnmanagedType.Bool - Windows SDK's BOOL, 4 bytes
- UnmanagedType.VariantBool - COM's VARIANT_BOOL, 2 bytes
- The UnmanagedType.U1 type can be used for the obsolete 1-byte BOOLEAN type from early Windows code. UnmanagedType.U1 is a generic "unsigned 1-byte" data type, which is used by default for byte parameters; it also can be used for char parameters, but probably shouldn't (it will prevent the normal CharSet conversion from working).
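A sketch showing how MarshalAs picks the native width of a bool field; the structure names are hypothetical. BOOL and BOOLEAN store true as 1, while COM's VARIANT_BOOL uses VARIANT_TRUE (-1).

```csharp
using System.Runtime.InteropServices;

// Hypothetical structures: the same managed bool, three native widths.
[StructLayout(LayoutKind.Sequential)]
public struct Win32Flag
{
    [MarshalAs(UnmanagedType.Bool)] public bool Value;          // BOOL, 4 bytes
}

[StructLayout(LayoutKind.Sequential)]
public struct ComFlag
{
    [MarshalAs(UnmanagedType.VariantBool)] public bool Value;   // VARIANT_BOOL, 2 bytes
}

[StructLayout(LayoutKind.Sequential)]
public struct LegacyFlag
{
    [MarshalAs(UnmanagedType.U1)] public bool Value;            // BOOLEAN, 1 byte
}
```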
- Arrays passed to unmanaged code are passed by value with [in] semantics only by default, unlike arrays passed between managed methods. You can explicitly specify the In and Out attributes to get bi-directional semantics if required. If the array subtype is one of the other ambiguous types (for example, a string), you should also specify the ArraySubType or SafeArraySubType field to the appropriate value. The following UnmanagedType values can only be applied to arrays:
- UnmanagedType.SafeArray (default for COM methods) - COM-style SAFEARRAY.
- UnmanagedType.LPArray (default for P/Invoke functions) - C-style, pointer to the first element.
- UnmanagedType.ByValArray - Fixed-size array embedded within a structure
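A sketch of the array forms, with hypothetical names throughout: an inline ByValArray field, an LPArray parameter for an imaginary native export FillValues in hypothetical_native.dll (marked [In, Out] so the callee's writes are copied back), and PrintAll, which uses ArraySubType to pick the string element representation.

```csharp
using System.Runtime.InteropServices;

// Hypothetical structure with an inline fixed-size array, like BYTE Data[8].
[StructLayout(LayoutKind.Sequential)]
public struct Packet
{
    public int Length;                                  // offset 0
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 8)]
    public byte[] Data;                                 // offsets 4..11, embedded inline
}

// Hypothetical wrapper class for a hypothetical native library.
public static class ArrayNative
{
    // Hypothetical export: void FillValues(int *values, int count);
    // LPArray passes a pointer to the first element; SizeParamIndex = 1 names 'count'.
    [DllImport("hypothetical_native.dll")]
    public static extern void FillValues(
        [In, Out, MarshalAs(UnmanagedType.LPArray, SizeParamIndex = 1)] int[] values,
        int count);

    // Hypothetical export: void PrintAll(const wchar_t **items, int count);
    // ArraySubType resolves the ambiguous string element type.
    [DllImport("hypothetical_native.dll", CharSet = CharSet.Unicode)]
    public static extern void PrintAll(
        [MarshalAs(UnmanagedType.LPArray, ArraySubType = UnmanagedType.LPWStr)] string[] items,
        int count);
}
```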