More Tales from the Unmanaged Side - System.String -> char*
Today, I had a very tight deadline to achieve a very simple task: pass a managed .NET string to an API function that expects a null-terminated char*. Trivial, you would expect? Unfortunately it wasn't.
My first though was to do the pinning trick that I mentioned in my last post, but in this case I needed my resulting char* to be null-terminated.
Second thought was to go to the System.Runtime.InteropServices.Marshal class and see what it had for me. I found two contenders:
1) Marshal::StringToBSTR() - this creates a COM-compatible BSTR. I found various tutorials about BSTRs saying that they MIGHT be, under SOME circumstances, compatibles with zero-terminated wide-character strings. Didn't seem to be safe enough.
2) Marshal::StringToHGlobalAuto() - this allocates a block of memory, copies the characters in and even null-terminates it for me. This looked like a winner. Stage one was done - we managed to get an unmanaged string. But can we use it now?
The next problem was that StringToHGlobalAuto returns an IntPtr, and casting it to a char* led to a compilation error. The solution to that is either to cast the IntPtr to a (void*) before casting to (char*), or to do the same action by calling the IntPtr's ToPointer() method. The second option seems neater to those of us who like as few direct casts as possible - I'd rather my conversion was done by a method than by a possibly unsafe casting operation. I'm sure those more concerned with method-call overheads will disagree.
The next problem is that the result of this operation was a single character string - the first character from the expected string. C++ programmers who've struggled with Unicode will quickly spot the problem.
char* strings are null terminated - the first byte containing 00 is the terminator. For Unicode strings as returned by StringToHGlobalAuto, each character takes 2 bytes. If it's a character from the lower reaches of the Unicode spectrum, the second byte, being the high-order byte, will usually be 00, thus terminating the string. There are two options:
1. Instead of char*, use wchar_t* - wide, 2-byte character string, terminated by two null bytes.
2. use StringToHGlobalANSI, which converts the string to standard 8-bit ANSI encoding. This should be used only if we know we can't receive any unicode characters, or (as in my case) when the API we call only acceptts char*. :(
So that's a bit more C++ that haunted and taunted me today. See you next time.
3 Comments
Comments have been disabled for this content.
Jim Hewitt said
Thanks for the tip. I was a couple hours into pulling my hair out when I found your post.
Intuicia said
Who knows why the sun shines on empty?
Mozius said
Great insight, great article, and thanks for sharing it. How to subscribe on your blog ???