Easy high speed reading/writing of structured binary files using C#
Quite a bit has been written about reading structured binary data from files or writing it to them (see [1,2,3]). [1], for example, compares three different approaches. Unfortunately, none of them is as straightforward as the C/C++ code would be. Here's how you could read the ID3v1 tag from an MP3 file:
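struct ID3v1Tag
{
    char tag[3]; // == "TAG"
    char title[30];
    ...
};

ID3v1Tag t;
FILE *f = fopen("mysong.mp3", "rb"); // binary mode, so the CRT does not translate any bytes
fseek(f, -128, SEEK_END);            // the tag occupies the last 128 bytes of the file
fread(&t, 1, 128, f);
printf("%.30s\n", t.title);
fclose(f);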
Now, if you wanted to accomplish the same with C#, it would not look that easy anymore. The reason: you cannot read data from a file (stream) directly into a struct. A stream always requires a byte array as the target for read operations, and if you use a BinaryReader, its ReadBytes() method likewise returns a byte array. Either way, the data read into the byte array then needs to be copied into the target struct.
[1] uses Marshal.PtrToStructure() to do this, and [3] offers a much more elegant solution using an unsafe assignment like this:
[StructLayout(LayoutKind.Sequential, Pack=1)]
unsafe struct ID3v1Tag
{
    ...
    public ID3v1Tag(byte[] data)
    {
        fixed (byte* pData = data)
        {
            this = *(ID3v1Tag*)pData;
        }
    }
}
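For comparison, the Marshal.PtrToStructure() route mentioned for [1] might look roughly like the following. This is only a minimal sketch under my own assumptions, not the code from [1]: DemoHeader, MarshalCopyDemo and ReadStruct<T>() are hypothetical names, and a MemoryStream stands in for a real file. It makes the extra copy visible: the bytes first land in a byte[] and are then copied into the struct.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

// Hypothetical 6-byte record, used only for illustration.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct DemoHeader
{
    public int Id;
    public short Version;
}

static class MarshalCopyDemo
{
    // Reads sizeof(T) bytes into a temporary byte[] and copies them into a T.
    static T ReadStruct<T>(Stream s) where T : struct
    {
        int size = Marshal.SizeOf(typeof(T));
        byte[] buffer = new byte[size];

        // Stream.Read() may return fewer bytes than requested, so loop.
        int read = 0;
        while (read < size)
        {
            int n = s.Read(buffer, read, size - read);
            if (n == 0) throw new EndOfStreamException();
            read += n;
        }

        // Pin the array so the GC cannot move it while we copy out of it.
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            // This is the extra copy: byte[] -> struct.
            return (T)Marshal.PtrToStructure(handle.AddrOfPinnedObject(), typeof(T));
        }
        finally
        {
            handle.Free();
        }
    }

    static void Main()
    {
        byte[] raw = { 0x2A, 0, 0, 0, 0x02, 0 };
        using (MemoryStream ms = new MemoryStream(raw))
        {
            DemoHeader h = ReadStruct<DemoHeader>(ms);
            Console.WriteLine(h.Id + " v" + h.Version); // "42 v2" on a little-endian machine
        }
    }
}
```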
Alternatively, you could read the data from an input stream in small chunks using a BinaryReader, which means deserializing the data into each field by hand. This avoids the extra copy of the data, but requires much more effort on your side. You're paying for performance with lines of code.
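To give an idea of what "by hand" means, here is a small sketch of field-by-field deserialization with a BinaryReader. The record layout (DemoRecord with an 8-byte, zero-padded name) is invented for illustration and does not come from any of the cited articles. Note that the text field still goes through a small temporary byte[].

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical record: 4-byte id, 2-byte version, 8-byte zero-padded ASCII name.
struct DemoRecord
{
    public int Id;
    public short Version;
    public string Name;
}

static class BinaryReaderDemo
{
    // One explicit read call per field; BinaryReader always reads little-endian.
    static DemoRecord ReadRecord(BinaryReader r)
    {
        DemoRecord rec = new DemoRecord();
        rec.Id = r.ReadInt32();
        rec.Version = r.ReadInt16();

        byte[] name = r.ReadBytes(8); // returns fewer bytes at end of stream
        if (name.Length < 8) throw new EndOfStreamException();
        rec.Name = Encoding.ASCII.GetString(name).TrimEnd('\0');
        return rec;
    }

    static void Main()
    {
        byte[] raw =
        {
            0x2A, 0, 0, 0,                                  // Id = 42
            0x02, 0,                                        // Version = 2
            (byte)'d', (byte)'e', (byte)'m', (byte)'o', 0, 0, 0, 0
        };
        using (BinaryReader r = new BinaryReader(new MemoryStream(raw)))
        {
            DemoRecord rec = ReadRecord(r);
            Console.WriteLine(rec.Id + " v" + rec.Version + " " + rec.Name); // 42 v2 demo
        }
    }
}
```

Every new field in the file format means another line in ReadRecord(), which is exactly the effort mentioned above.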
That's what can be said about reading (and writing) binary data using C# (or managed code in general).
However, a customer engagement recently got me thinking about this. The customer needs to port C++ code that interacts heavily with binary files to C#. The approaches found in the literature, though, are too slow for him: the extra data copy really hurts the application's performance. So he kept essential parts of the code in C++ to benefit from the language's ease of use when accessing binary data.
I felt challenged by this problem. And here's my solution: easy reading/writing of structured binary data using C# 2.0, without the need for an extra data copy. Look at the following code for reading the ID3v1 tag of an MP3 file:
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public unsafe struct ID3v1Tag
{
    private fixed sbyte tag[3];
    private fixed sbyte title[30];
    ...
}

using (System.IO.BinaryFile fmp3 = new System.IO.BinaryFile("myfile.mp3", System.IO.FileMode.Open))
{
    ID3v1Tag t;
    unsafe
    {
        fmp3.Seek(-128, System.IO.SeekOrigin.End);
        fmp3.ReadStruct<ID3v1Tag>(&t);
    }
    if (t.Tag == "TAG")
    {
        Console.WriteLine("title: " + t.Title); ...
    }
}
I'd say it's as easy to read/write as the C++ equivalent above. Only generic functions get called, and no extra copies of the data are needed: the ID3v1 tag data is read directly into the ID3v1Tag struct passed to the ReadStruct() method.
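The struct listing above elides the members behind t.Tag and t.Title. One way such accessors over fixed-size buffers could look is sketched below. This is an assumption on my part, not necessarily what the downloadable sources do: DemoTag, FromFixed() and the ASCII decoding are hypothetical, and the struct only covers the first two fields. The code needs to be compiled with /unsafe.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

[StructLayout(LayoutKind.Sequential, Pack = 1)]
public unsafe struct DemoTag
{
    private fixed sbyte tag[3];
    private fixed sbyte title[30];

    // Decodes up to maxLen bytes, stopping at the first NUL byte.
    private static string FromFixed(sbyte* p, int maxLen)
    {
        int len = 0;
        while (len < maxLen && p[len] != 0) len++;
        return new string(p, 0, len, Encoding.ASCII);
    }

    public string Tag
    {
        // 'this' is movable inside a struct member, so the buffer must be fixed.
        get { fixed (sbyte* p = tag) { return FromFixed(p, 3); } }
    }

    public string Title
    {
        get { fixed (sbyte* p = title) { return FromFixed(p, 30); } }
    }
}

static class FixedBufferDemo
{
    static unsafe void Main()
    {
        // 33 bytes: "TAG" followed by a zero-padded 30-byte title.
        byte[] raw = new byte[33];
        Encoding.ASCII.GetBytes("TAGMy Song", 0, 10, raw, 0);

        DemoTag t;
        fixed (byte* p = raw)
        {
            t = *(DemoTag*)p; // fill the struct for the demo only
        }
        Console.WriteLine(t.Tag + " / " + t.Title); // TAG / My Song
    }
}
```

The strings are only created when a property is read, so the raw struct itself stays a plain block of 33 bytes that can be filled in one go.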
How is this done?
Well, I dropped the premise that underlies the usual literature on this topic: I don't use System.IO to access the file, but the old CRT fxxx() functions. The BinaryFile class above encapsulates the calls to the following C DLL functions:
[System.Runtime.InteropServices.DllImport("CRTFileIO.dll")]
private static extern int FileOpen(string filename, string mode);
[System.Runtime.InteropServices.DllImport("CRTFileIO.dll")]
private static extern void FileClose(int hStream);
[System.Runtime.InteropServices.DllImport("CRTFileIO.dll")]
private unsafe static extern bool FileReadBuffer(int hStream, void* buffer, short bufferLen);
[System.Runtime.InteropServices.DllImport("CRTFileIO.dll")]
private unsafe static extern bool FileWriteBuffer(int hStream, void* buffer, short bufferLen);
[System.Runtime.InteropServices.DllImport("CRTFileIO.dll")]
private unsafe static extern bool FileSeek(int hStream, int offset, short origin);
[System.Runtime.InteropServices.DllImport("CRTFileIO.dll")]
private unsafe static extern bool FileGetPos(int hStream, out int pos);
[System.Runtime.InteropServices.DllImport("CRTFileIO.dll")]
private unsafe static extern bool FileFlush(int hStream);
I just wrote a small unmanaged DLL wrapper around the basic stdio C functions like fopen(), fread() etc. That's all the magic there is. Look at my C function for reading data from a file:
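extern "C" DLLEXPORT short __stdcall FileReadBuffer(FILE *stream, void *buffer, int bufferLen)
{
    // fread() returns the number of 1-byte items actually read
    size_t n = fread(buffer, 1, bufferLen, stream);
    // nonzero (success) only if the whole buffer was filled
    return n == (size_t)bufferLen;
}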
This function is called by a method of a wrapper class, which makes it easier for application code to work with binary files. BinaryFile hides the CRT file handle and looks much like a FileStream (that's also the reason why I put BinaryFile into the System.IO namespace):
public unsafe bool ReadStruct<StructType>(void *buffer) where StructType : struct
{
    return Read(buffer, (short)System.Runtime.InteropServices.Marshal.SizeOf(typeof(StructType)));
}

public unsafe bool Read(void* buffer, short bufferLen)
{
    ...
    return FileReadBuffer(hFile, buffer, bufferLen);
}
You just need to pass this Read() method the address of the target struct that is to receive the data from the file, and the number of bytes to read. That's it. fread() will put the data right into the C# struct. No extra byte[], no explicit deserialization of fields. You just need to be willing to use unsafe code:
unsafe
{
    fmp3.ReadStruct<MyStruct>(&myStructVar);
}
I'd say it cannot get much easier or faster than this when reading from binary files.
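If you just want to see the principle at work without building the wrapper DLL, the following sketch calls the CRT directly through P/Invoke. It is not how BinaryFile is implemented: it assumes Windows with msvcrt.dll available, uses the cdecl calling convention of the CRT instead of the __stdcall wrapper functions, and DemoHeader and CrtReadDemo are made-up names. Compile with /unsafe.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

// Hypothetical 6-byte record, used only for illustration.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct DemoHeader
{
    public int Id;
    public short Version;
}

static unsafe class CrtReadDemo
{
    // Assumption: the Windows C runtime msvcrt.dll; FILE* is carried as IntPtr.
    [DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl, CharSet = CharSet.Ansi)]
    static extern IntPtr fopen(string filename, string mode);

    [DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)]
    static extern UIntPtr fread(void* buffer, UIntPtr size, UIntPtr count, IntPtr stream);

    [DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)]
    static extern int fclose(IntPtr stream);

    static void Main()
    {
        // Create a small test file: Id = 42, Version = 2 (little-endian).
        string path = Path.GetTempFileName();
        File.WriteAllBytes(path, new byte[] { 0x2A, 0, 0, 0, 0x02, 0 });

        IntPtr f = fopen(path, "rb");
        if (f == IntPtr.Zero) throw new IOException("fopen failed");
        try
        {
            DemoHeader h;
            uint size = (uint)sizeof(DemoHeader);

            // fread() writes straight into the struct on the stack: no byte[] in between.
            UIntPtr n = fread(&h, new UIntPtr(1u), new UIntPtr(size), f);
            if (n.ToUInt64() != size) throw new EndOfStreamException();

            Console.WriteLine(h.Id + " v" + h.Version);
        }
        finally
        {
            fclose(f);
            File.Delete(path);
        }
    }
}
```

Carrying the FILE* as an IntPtr rather than an int keeps the handle intact in a 64-bit process as well.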
If you'd like to give this approach a try, you can download the sources here.
In order to use the BinaryFile class just add a reference to CRTFileIO.Import.dll to your C# project and make sure the C wrapper CRTFileIO.dll gets copied to the same directory as CRTFileIO.Import.dll.
Enjoy!
Resources
[1] Anthony Baraff: Fast Binary File Reading with C#, http://www.codeproject.com/csharp/fastbinaryfileinput.asp
[2] Robert L. Bogue: Read binary files more efficiently using C#, http://www.builderau.com.au/architect/webservices/0,39024590,20277904,00.htm
[3] Eric Gunnerson: Unsafe and reading from files, http://blogs.msdn.com/ericgu/archive/2004/04/13/112297.aspx