Monday, January 02, 2012

COM Interop Part 2: Interop The Easy Way

Last time we got a brief introduction to interop, and in particular COM Interop, and why we might need to use it. The idea of using C# to talk to a non-C# application sounds like it could be a lot of work, and rife with potential bugs. Mostly because it is. Once we leave the safe, cozy confines of a managed environment, we open ourselves up to all kinds of errors that managed code left behind a long time ago. Fear not: avoiding these errors is a simple (!) matter of understanding the rules of interop, and following them religiously.

After my first post, I was asked why this COM stuff is even relevant to C# developers in 2011. While we managed developers are lucky enough to see Microsoft releasing lots of new .NET libraries, the core of the Windows OS is still unmanaged code. The basic features of Windows, which carried over from the Windows 3 days, are simple, non-object-based native functions, but a large chunk of the system is exposed only through COM interfaces. Just off the top of my head, our C# projects have used COM Interop for: shell preview handlers, search text filters, structured property storage, drag and drop sources, and exposing C# components to VBScript.

We are fortunate here that COM is conceptually very similar to the .NET style of development; it's just the details that are different. We can take advantage of these similarities to let Visual Studio's supporting tools do the heavy lifting for us in some cases. Depending on the use case, you may find that you don't need to do any work at all to make things function properly. Most often, you will find your COM interactions boil down to one of these three use cases:
  1. Accessing an existing COM component, such as an ActiveX control, from your project. In this case, you have an actual registered COM component, probably in the form of an OCX or DLL, and you need to reference it. This type of interop is almost seamless within Visual Studio, as we'll see in a minute.
  2. Exposing your own C# classes to COM consumers. Again, Visual Studio makes this very easy, though there are some extra steps you need to go through to allow COM to see your component.
  3. Using the built-in COM features of Windows itself. Here is where Visual Studio's automated tools start to fall apart, because we don't have an actual component to work with. Windows is the component. At this point, we need to fall back on doing some grunt work ourselves.
Since this last case is the source of some of the most useful and interesting things you can do with COM Interop, it's what I'll spend the rest of this series talking about. But first, a quick look at the built-in support for COM Interop with Visual Studio and the Windows SDK. It all starts with the typelib.

COM Type Libraries

Those of you who have done any C# development are probably at least vaguely aware of the concept of type metadata. This is the information that .NET embeds into your assembly that lists the names and data types of all the public elements in your assembly: the class names, method names, parameters, etc. The type metadata gives the compiler everything it needs to reference an external assembly at build-time, and tells the runtime what it needs to know to verify that you aren't doing anything silly (like passing a string instead of an integer).
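To make the idea of type metadata a little more concrete, here is a minimal sketch that reads it back at runtime through reflection. The Greeter class and the MetadataDump helper are hypothetical names invented for this illustration; the point is only that the method name, return type, and parameter types are all recoverable from the assembly itself.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical sample type; any public class would do.
public class Greeter
{
    public string Greet(string name, int times)
    {
        return string.Join(" ", Enumerable.Repeat("Hello, " + name + "!", times));
    }
}

public static class MetadataDump
{
    // Builds a signature string using nothing but the type metadata.
    public static string Describe(MethodInfo method)
    {
        var parameters = method.GetParameters()
            .Select(p => p.ParameterType.Name + " " + p.Name);
        return method.ReturnType.Name + " " + method.Name
            + "(" + string.Join(", ", parameters) + ")";
    }

    public static void Main()
    {
        // Only the public instance methods declared directly on Greeter.
        var flags = BindingFlags.Public | BindingFlags.Instance | BindingFlags.DeclaredOnly;
        foreach (MethodInfo method in typeof(Greeter).GetMethods(flags))
        {
            Console.WriteLine(Describe(method)); // String Greet(String name, Int32 times)
        }
    }
}
```

This is the same information the compiler consults when you reference an assembly, and it is roughly what a COM type library has to supply on the unmanaged side.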

In the COM world, this information is contained in a special type of file called the type library. This is a compiled file that contains a description of all of the type information a COM component exposes: interface names, GUIDs, parameter types, etc. Typically, a type library for a given COM component will be embedded as a resource into the same file that contains the COM code (probably a DLL or OCX file), but it can also be provided separately, in a standalone TLB file. To develop against a COM component at build time, you only need the type library; the actual component files don't have to be present and registered until runtime.

Interop Assemblies

Having a type library is all well and good, but the C# compiler doesn't speak type library; it speaks IL. Fortunately, the type library was designed for just this sort of situation: it provides all the information we need to generate the appropriate managed code with the type information in it. There are a number of ways to accomplish this code generation, but the end result is usually the same: an interop assembly. This is a managed assembly that consists mostly (or entirely) of type metadata derived from a type library. Occasionally, you will also hear the term primary interop assembly; this is just a fancy term for an interop assembly that was provided by the same people who wrote the COM object in the first place. For example, Microsoft provides us with interop assemblies for the Office COM API -- those are primary interop assemblies, or PIAs. On the other hand, we can generate our own interop assembly using the Office type libraries, and get the same metadata, but those would not be primary interop assemblies. PIAs are strongly-named, and thus can only be (legitimately) produced by the actual manufacturer, which gives them a slightly higher rank among interop assemblies, but otherwise they contain the same metadata.

Importing Type Libraries

If you have a type library available, the easiest way to access it is to let Visual Studio do it for you. The Add Reference... dialog includes a tab labelled "COM", which lists all of the registered COM components on your system. If you need to use a component that is not registered, the "Browse" tab will also list COM components, including type library files, for you to import. When you add a reference to a COM component or type library, Visual Studio processes the type library into the appropriate interop metadata and creates an interop assembly. You can see this interop assembly in your build output, with a name like Interop.TypeLibraryName.dll.

With Visual Studio 2010, there is an additional option to make life even simpler: the "Embed Interop Types" property on the reference. By default, the type information in your interop assemblies will be embedded directly into your main assembly. This was done primarily to make Office development easier, but it works for a wide range of COM components. With this option, you no longer have to deploy the interop assembly with your application.

If you're not using Visual Studio, or if you just like more control, you can use a utility called the Type Library Importer, TLBIMP, which is part of both Visual Studio and the Windows SDK. This program does the same basic job as the Visual Studio reference dialog, but on a command line. You supply the type library (or a DLL or OCX with an embedded one), and it will generate an interop assembly for you. Command-line switches let you control the output, such as the output file name (/out) and the namespace of the generated types (/namespace). This tool is also the way you would generate a PIA for your own COM components, using the /primary switch.

Exporting Type Libraries

On the other hand, if you are writing managed code to be exposed to COM, you don't need to import a type library, you need to create one. This time, you still have two options, but they are both command-line tools. The easiest option is to use the Assembly Registration Tool, REGASM, which registers your assembly with the COM subsystem on the local machine, and can also write out a type library if you pass the /tlb switch. For deployment purposes, you would use the /regfile command-line switch to ask REGASM to write out a registry file, and give that information to your installer to write directly into the target registry. If you just want to create the type library (for example, to give to a VB developer), you can instead use the Type Library Exporter, TLBEXP, which does just that.

In both cases, you will want to have some control over what does and doesn't go into your type library. I will probably go into more details on this bit later, but by default, everything public gets exported into your type library, and that usually produces a serious mess. There are a couple of useful attributes you can apply to your code that change this behavior. Particularly critical are the ComVisible attribute, which should be set to false at the assembly level, then true on those specific types that go into your type library, and the ClassInterface attribute, which helps prevent the typelib generator from generating class interfaces when it doesn't need to.
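As a rough sketch of how those two attributes fit together, something like the following keeps the exported type library down to one interface and one class. The ICalculator and Calculator names are hypothetical, and the GUIDs are placeholders made up for this example; generate your own for real code. Treat this as one reasonable arrangement, not the only one.

```csharp
using System;
using System.Runtime.InteropServices;

// Hide everything in the assembly from COM by default...
[assembly: ComVisible(false)]

namespace InteropSketch
{
    // ...then opt in, one type at a time.
    // Placeholder GUID: replace with one you generate yourself.
    [ComVisible(true)]
    [Guid("7D0E8A3B-6C1F-4E52-9B8A-2F4C6D1E0A11")]
    [InterfaceType(ComInterfaceType.InterfaceIsDual)]
    public interface ICalculator
    {
        int Add(int a, int b);
    }

    // ClassInterfaceType.None asks the exporter not to generate an
    // automatic class interface; COM clients use ICalculator instead.
    // Placeholder GUID: replace with one you generate yourself.
    [ComVisible(true)]
    [Guid("3A9F5C27-1B64-4D0E-8E73-5C2B9A7F4D22")]
    [ClassInterface(ClassInterfaceType.None)]
    public class Calculator : ICalculator
    {
        public int Add(int a, int b)
        {
            return a + b;
        }
    }

    public static class Program
    {
        public static void Main()
        {
            // Managed callers are unaffected by the COM attributes.
            ICalculator calc = new Calculator();
            Console.WriteLine(calc.Add(2, 3)); // prints 5
        }
    }
}
```

Declaring the interface explicitly means the shape COM clients see is the one you wrote, rather than whatever the exporter would infer from the class's public members.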

Hand-Crafted Interop Assemblies

Of course, all of those utilities work fine if you're talking about an external COM object, like an ActiveX control or automation server. But what if you're talking about Windows? Lots of parts of the Windows API are exposed as COM interfaces that you can acquire from Windows, or which you can expose yourself for Windows to use, and there are no type libraries involved. Here's where the hand-crafting of your interop assemblies comes into play!

If we were doing C++ development, then all the type information we need is already available for us, in the form of the Windows SDK. The C++ header files included with the SDK define the COM interfaces that Windows uses, and the native methods we use to acquire them. Sadly, the SDK doesn't include C# code, but who needs it? With the header files and MSDN documentation, we have everything we need to craft our own COM Interop types.
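To give a taste of what a hand-crafted interop type looks like, here is a minimal sketch. I've picked IPersist as the example purely for illustration, since it declares a single method in the SDK headers; the choice of interface is my assumption, not something dictated by the tools.

```csharp
using System;
using System.Runtime.InteropServices;

// Managed declaration of the COM IPersist interface, transcribed by hand
// from its C++ definition. The GUID is the interface ID (IID) of IPersist.
[ComImport]
[Guid("0000010c-0000-0000-C000-000000000046")]
[InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
public interface IPersist
{
    // C++: HRESULT GetClassID(CLSID *pClassID);
    // By default the runtime turns a failure HRESULT into an exception,
    // so the managed signature can return void.
    void GetClassID(out Guid pClassID);
}

public static class Program
{
    public static void Main()
    {
        // Nothing is called through COM here; we only confirm that the
        // metadata carries the interface ID we declared.
        Console.WriteLine(typeof(IPersist).GUID);
        Console.WriteLine(typeof(IPersist).IsImport);
    }
}
```

The declaration contains no implementation at all. Like a generated interop assembly, it is nothing more than metadata telling the runtime how to talk to an interface that lives elsewhere.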

The rest of this series is going to deal with the COM interfaces exposed by Windows through the SDK, and how to use them in C#. We will be starting with the interface definitions found in those header files, and building our own, working interop assembly from scratch. Along the way, hopefully we'll learn a whole lot more about the way COM Interop works!

Next time, we'll start looking at the guts of the interop system, and in particular, the workhorse of the bunch: the marshaller.