A C# Set class based on enums - You don´t want to be without it when constructing a compiler ;-)
To refresh myself from obscure information models like Pile and restless thinking about software architecture I decided to pick up a new hobby: compiler construction ;-) Or rather I decided to apply my long time interest in this topic to a good cause: When I talked to Niklaus Wirth - the father of Pascal - in November at the iX conference he told me, he was about to release an updated version of his book "Compiler Construction" (read the complete book here: [PDF]) - which happens to be one of my all time favorite computer text books.
However, when I asked him, if he intended to bring his classical introduction to the world of .NET, he declined. He said, he deliberately chose Oberon as the implementation language for all examples in his book.
Now, since I´m a great fan of the book as well as the .NET Framework, I have the idea of translating all code samples in the book to C# to make the text more accessible to .NET developers. This sure is some undertaking and I don´t know, if I will find the time to really complete this effort - but at least I want to start.
So, here is my first very small piece of the puzzle: a set class. It´s very handy when you write a parser to check, if a symbol belongs to a certain set of expected symbols, e.g.
IF symbol IN [number, identifier, leftParent, minusOp] THEN ...
This checks, if the current symbol of the parser is a number or identifier etc. which means, if it might signify the start of an expression.
A set data type exists in Pascal and Oberon, but not in C# or VB. That´s pretty sad, because it´s sometimes quite convenient. Of course there exist some implementations like [1] and [2], but I thought, why not try it again using .NET 2.0 Generics? Also I though, enum data types were an ideal foundation to build a set on.
Why not be able to write:
public enum Colors
{
red, blue, green
}
Set<Colors> sc = new Set<Colors>();
sc.Add(Colors.red);
sc.Add(Colors.green);
Set<Colors> sc2 = new Set<Colors>();
sc2.Add(Colors.blue);
sc.Add(sc2); // union
Console.WriteLine(sc); // prints: [red,blue,green]
With such a set data type setting up a scanner and parser would be as easy as with Pascal, e.g.
public enum Symbols
{
identifier, number, string, leftParent, rightParent, ...
}
Set<Symbols> firstExpressions = new Set<Symbols>();
firstExpressions.Add(Symbols.identifier);
firstExpressions.Add(Symbols.number);
...
Symbols currentSymbol;
...
if(firstExpressions.Contains(currentSymbol))
{
...
}
I find using enums as a starting point very convenient. They look pretty much like the Pascal sets, but just lack any operations. So I devised a generic Set data type in C# to wrap an enum. So far it works ok for me:
public class Set<TEnum>
{
#region "Data"
bool flagsEnum;
System.Collections.BitArray members;
System.Collections.Generic.Dictionary<TEnum, int> mapMember2Index;
#endregion
#region "ctors"
public Set()
{
Initialize();
members = new System.Collections.BitArray(mapMember2Index.Count);
}
private Set(System.Collections.BitArray members)
{
Initialize();
this.members = members;
}
private void Initialize()
{
if (typeof(TEnum).BaseType != typeof(System.Enum))
throw new ApplicationException(string.Format("Generic type parameter <{0}> is not an enum type!",
typeof(TEnum).FullName));
flagsEnum = typeof(TEnum).GetCustomAttributes(typeof(System.FlagsAttribute), true).Length > 0;
int i = 0;
mapMember2Index = new Dictionary<TEnum, int>();
foreach (TEnum m in Enum.GetValues(typeof(TEnum)))
mapMember2Index.Add(m, i++);
}
#endregion
#region "Working with the set"
public void Clear()
{
members.SetAll(false);
}
public Set<TEnum> Add(TEnum member)
{
members.Set(mapMember2Index[member], true);
return this;
}
public Set<TEnum> Add(Set<TEnum> otherSet)
{
if(otherSet != null) members.Or(otherSet.members);
return this;
}
public Set<TEnum> Remove(TEnum member)
{
members.Set(mapMember2Index[member], false);
return this;
}
public Set<TEnum> Remove(Set<TEnum> otherSet)
{
if (otherSet != null)
for (int i = 0; i < members.Count; i++)
if (otherSet.members[i])
members[i] = false;
return this;
}
public Set<TEnum> Intersect(Set<TEnum> otherSet)
{
if (otherSet != null) members.And(otherSet.members);
return this;
}
public bool Contains(TEnum member)
{
return members[mapMember2Index[member]];
}
public int Cardinality
{
get
{
return members.Length;
}
}
#endregion
#region "Overrides"
public override bool Equals(object obj)
{
return this == (Set<TEnum>)obj;
}
public override int GetHashCode()
{
return members.GetHashCode();
}
public override string ToString()
{
string[] names = Enum.GetNames(typeof(TEnum));
StringBuilder memberNames = new StringBuilder("[");
for (int i = 0; i < members.Count; i++)
if (members[i])
{
if (memberNames.Length > 1) memberNames.Append(",");
memberNames.Append(names[i]);
}
memberNames.Append("]");
return memberNames.ToString();
}
#endregion
#region "Operators"
// intersection
public static Set<TEnum> operator &(Set<TEnum> left, Set<TEnum> right)
{
System.Collections.BitArray result = new System.Collections.BitArray(new bool[left.members.Count]);
result.Or(left.members);
if(right != null) result.And(right.members);
return new Set<TEnum>(result);
}
// union
public static Set<TEnum> operator |(Set<TEnum> left, Set<TEnum> right)
{
System.Collections.BitArray result = new System.Collections.BitArray(new bool[left.members.Count]);
result.Or(left.members);
if(right != null) result.Or(right.members);
return new Set<TEnum>(result);
}
public static bool operator ==(Set<TEnum> left, Set<TEnum> right)
{
if (right != null)
{
for (int i = 0; i < left.members.Count; i++)
if (left.members[i] != right.members[i]) return false;
return true;
}
else
return false;
}
public static bool operator !=(Set<TEnum> left, Set<TEnum> right)
{
return !(left == right);
}
#endregion
#region "Iterator"
public IEnumerator<TEnum> GetEnumerator()
{
TEnum[] memberValues = (TEnum[])Enum.GetValues(typeof(TEnum));
for (int i = 0; i < members.Count; i++)
if (members[i])
yield return memberValues[i];
}
#endregion
}
Enjoy!
PS: It seems there is one less excuse to not start translating the Compiler Construction sources... Let´s see when I really find time for that. Stay tuned!
[1] A Generic Set Data Structure
[2] Yet Another C# set class