More Tales from the Unmanaged Side - System.String -> char*

Tags: .NET, C#, C++CLI

Today, I had a very tight deadline to achieve a very simple task: pass a managed .NET string to an API function that expects a null-terminated char*. Trivial, you would expect? Unfortunately it wasn't.

My first thought was to do the pinning trick that I mentioned in my last post, but in this case I needed my resulting char* to be null-terminated.

Second thought was to go to the System.Runtime.InteropServices.Marshal class and see what it had for me. I found two contenders:

1) Marshal::StringToBSTR() - this creates a COM-compatible BSTR. I found various tutorials about BSTRs saying that they MIGHT be, under SOME circumstances, compatible with zero-terminated wide-character strings. That didn't seem safe enough.

2) Marshal::StringToHGlobalAuto() - this allocates a block of memory, copies the characters in and even null-terminates it for me. This looked like a winner. Stage one was done - we managed to get an unmanaged string. But can we use it now?

The next problem was that StringToHGlobalAuto returns an IntPtr, and casting it to a char* led to a compilation error. The solution to that is either to cast the IntPtr to a (void*) before casting to (char*), or to do the same action by calling the IntPtr's ToPointer() method. The second option seems neater to those of us who like as few direct casts as possible - I'd rather my conversion was done by a method than by a possibly unsafe casting operation. I'm sure those more concerned with method-call overheads will disagree.
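For reference, here is a rough C++/CLI sketch of that step (it needs to be compiled with /clr). It uses Marshal::StringToHGlobalAnsi, the variant I end up with further down, so that the char* cast is actually meaningful. legacy_api is a made-up stand-in for the real API function, and the Marshal::FreeHGlobal call is my addition: the block is allocated on the unmanaged heap, so somebody has to release it.

```cpp
// C++/CLI sketch, compile with /clr. legacy_api is hypothetical.
#include <cstdio>

using namespace System;
using namespace System::Runtime::InteropServices;

// Stand-in for the unmanaged API that only accepts a null-terminated char*
static void legacy_api(const char* text)
{
    std::printf("%s\n", text);
}

int main()
{
    String^ managed = "hello";

    // Allocates unmanaged memory and copies the string in, null-terminated
    IntPtr block = Marshal::StringToHGlobalAnsi(managed);
    try
    {
        // ToPointer() returns void*, which can then be cast to char*
        const char* text = static_cast<const char*>(block.ToPointer());
        legacy_api(text);
    }
    finally
    {
        // Release the unmanaged block once the API call is done
        Marshal::FreeHGlobal(block);
    }
    return 0;
}
```

The try/finally is there only so the block is freed even if the API call throws.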

The next problem was that the result of this operation was a single-character string - just the first character of the expected string. C++ programmers who've struggled with Unicode will quickly spot the problem.

char* strings are null-terminated - the first byte containing 00 is the terminator. StringToHGlobalAuto picks the platform's default encoding, which on my machine meant Unicode, where each character takes 2 bytes. If it's a character from the lower reaches of the Unicode spectrum, the second byte, being the high-order byte, will be 00, so anything reading the buffer one byte at a time stops right after the first character. There are two options:

1. Instead of char*, use wchar_t* - a wide, 2-byte character string, terminated by two null bytes.
2. Use StringToHGlobalAnsi, which converts the string to standard 8-bit ANSI encoding. This should be used only if we know we can't receive any characters outside the ANSI code page, or (as in my case) when the API we call only accepts char*. :(
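The truncation is easy to reproduce in plain C++ without any .NET involved. The sketch below is only an illustration: kWide is a made-up buffer that stands in for what a wide conversion hands back, and the result assumes a little-endian machine (such as x86), where the low-order byte of each character comes first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// "Hi!" as 2-byte characters plus a 2-byte terminator (hypothetical sample data)
static const char16_t kWide[] = u"Hi!";

// What a char*-based API sees: it reads byte by byte and stops at the first 00
std::size_t narrow_length_of_wide_buffer()
{
    return std::strlen(reinterpret_cast<const char*>(kWide));
}

// What a wide-character API sees: it reads 2 bytes at a time
std::size_t wide_length()
{
    return std::char_traits<char16_t>::length(kWide);
}

// Usage: on a little-endian machine the bytes are 'H' 00 'i' 00 '!' 00 00 00
void check_lengths()
{
    assert(narrow_length_of_wide_buffer() == 1); // only "H" survives
    assert(wide_length() == 3);                  // the whole string
}
```

Option 1 corresponds to reading the buffer the way wide_length does; option 2 avoids the problem by never producing the 2-byte buffer in the first place.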

So that's a bit more C++ that haunted and taunted me today. See you next time.