More Tales from the Unmanaged Side - System.String -> char*

Tuesday, August 28, 2007 4:08:50 PM

Today, I had a very tight deadline to achieve a very simple task: pass a managed .NET string to an API function that expects a null-terminated char*. Trivial, you would expect? Unfortunately it wasn't.

My first though was to do the pinning trick that I mentioned in my last post, but in this case I needed my resulting char* to be null-terminated.

Second thought was to go to the System.Runtime.InteropServices.Marshal class and see what it had for me. I found two contenders:

1) Marshal::StringToBSTR() - this creates a COM-compatible BSTR. I found various tutorials about BSTRs saying that they MIGHT be, under SOME circumstances, compatibles with zero-terminated wide-character strings. Didn't seem to be safe enough.

2) Marshal::StringToHGlobalAuto() - this allocates a block of memory, copies the characters in and even null-terminates it for me. This looked like a winner. Stage one was done - we managed to get an unmanaged string. But can we use it now?

The next problem was that StringToHGlobalAuto returns an IntPtr, and casting it to a char* led to a compilation error. The solution to that is either to cast the IntPtr to a (void*) before casting to (char*), or to do the same action by calling the IntPtr's ToPointer() method. The second option seems neater to those of us who like as few direct casts as possible - I'd rather my conversion was done by a method than by a possibly unsafe casting operation. I'm sure those more concerned with method-call overheads will disagree.

The next problem is that the result of this operation was a single character string - the first character from the expected string. C++ programmers who've struggled with Unicode will quickly spot the problem.

char* strings are null terminated - the first byte containing 00 is the terminator. For Unicode strings as returned by StringToHGlobalAuto, each character takes 2 bytes. If it's a character from the lower reaches of the Unicode spectrum, the second byte, being the high-order byte, will usually be 00, thus terminating the string. There are two options:

1. Instead of char*, use wchar_t* - wide, 2-byte character string, terminated by two null bytes.
2. use StringToHGlobalANSI, which converts the string to standard 8-bit ANSI encoding. This should be used only if we know we can't receive any unicode characters, or (as in my case) when the API we call only acceptts char*. :(

So that's a bit more C++ that haunted and taunted me today. See you next time.

3 Comments

Jim Hewitt said Thursday, December 18, 2008 12:11:58 AM

Thanks for the tip. I was a couple hours into pulling my hair out when I found your post.
Intuicia said Saturday, March 26, 2011 7:07:31 AM

Who knows why the sun shines on empty?
Mozius said Monday, June 6, 2011 5:53:43 AM

Great insight, great article, and thanks for sharing it. How to subscribe on your blog ???

Comments have been disabled for this content.