Code Puzzle #2 - Generate random fake surnames - Recap
Code Puzzle #2 posed the following task: Write a simple function which generates fake but passable surnames (read more here). As I'd hoped, I got several great submissions with a range of interesting approaches. I'd like to say that we're all winners, but the rules were clear: "This is a fixed contest; my solution will win first place."
- Wim Hollebrandse was quickest on the draw. If he really cranked that out in 5 minutes, I'm definitely not going to admit to how long my solution took me.
- Mr. McSaeceeefaeeeeeedeffeeeerson's solution was unfortunately rejected by the judges due to allegations of blood doping.
- Keith Rull submitted a Filipino name generator. I was very much hoping this puzzle would venture beyond names of Western European origin, and Keith came through. Sadly, I have no idea if his names are good or not.
- Carlos M Perez came up with the most involved solution, which weighted digraphs by how frequently they appear in common usage.
- Rich McCollister took an innovative approach. He combined digraphs in a random order, then used a regular expression filter to weed out results that didn't make sense.
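Rich's generate-and-filter idea is easy to sketch. The digraph list, the two-to-three digraph length, and the rejection pattern below are my own assumptions for illustration, not his submission. The filter rejects any candidate with three vowels or three consonants in a row, and the generator simply tries again. `GenerateByFilter` is a hypothetical helper name.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical digraph list (an assumption, not taken from the submission).
string[] digraphs = { "an", "ar", "el", "en", "er", "in", "on", "or", "st", "th", "ma", "ri", "so", "te", "la" };

// Hypothetical filter: three vowels or three consonants in a row looks wrong.
Regex reject = new Regex("[aeiouy]{3}|[^aeiouy]{3}");

// Join two or three random digraphs; keep the first candidate the filter accepts.
string GenerateByFilter(Random rng)
{
    while (true)
    {
        int count = rng.Next(2, 4); // 2 or 3 digraphs
        string candidate = string.Concat(
            Enumerable.Range(0, count).Select(_ => digraphs[rng.Next(digraphs.Length)]));
        if (!reject.IsMatch(candidate))
            return char.ToUpperInvariant(candidate[0]) + candidate.Substring(1);
    }
}

var random = new Random(12345);
for (int i = 0; i < 5; i++)
    Console.WriteLine(GenerateByFilter(random));
```

Rejection sampling like this keeps the generator trivially simple and moves all of the "does this look like a name?" knowledge into the regular expression, at the cost of some wasted attempts.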
Finally, here's my solution. It's the ugliest solution by far, but I was pretty happy with the output. I divided the consonants into four groups:
- common - can appear anywhere, and appear frequently
- average - can appear anywhere, and appear with... um... average frequency
- middle - slightly lower frequency, and not allowed to start a name
- rare - appear rarely, but are allowed to start a name
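Here's a quick way to see what the group thresholds in my code (775, 875 and 985 out of 1000) actually work out to. This sketch mirrors that if/else chain in a hypothetical `PickGroup` helper and counts all 1000 possible rolls, once for the first consonant and once for a later one.

```csharp
using System;

// Mirrors the consonant-group if/else chain in GenerateSurname.
string PickGroup(int letterType, int position)
{
    if (letterType < 775) return "common";
    if (letterType < 875 && position > 0) return "middle"; // never at the start
    if (letterType < 985) return "average";
    return "rare";
}

// Count how many of the 1000 equally likely rolls land in each group.
int[] CountGroups(int position)
{
    string[] groups = { "common", "middle", "average", "rare" };
    int[] counts = new int[groups.Length];
    for (int roll = 0; roll < 1000; roll++)
        counts[Array.IndexOf(groups, PickGroup(roll, position))]++;
    return counts;
}

int[] first = CountGroups(0); // first consonant of the name
int[] later = CountGroups(1); // any later consonant
Console.WriteLine($"first: common {first[0]}, middle {first[1]}, average {first[2]}, rare {first[3]} (per 1000)");
Console.WriteLine($"later: common {later[0]}, middle {later[1]}, average {later[2]}, rare {later[3]} (per 1000)");
```

At the start of a name the middle group's share falls through to average (775 common, 0 middle, 210 average, 15 rare per 1000); after that it's 775, 100, 110 and 15. Within each group, the duplicate entries then decide which letter comes out.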
I seeded the random number generator with Guid.NewGuid().GetHashCode(), per Brendan Tompkins' tip. System.Random generates its sequence from a seed: if you pass in the same seed every time, you get the same sequence out every time. If you don't supply a seed, the .NET Framework uses the system clock (Environment.TickCount, to be precise). The problem is that if you create several Random instances in a tight loop, the clock hasn't changed between them, so they all get the same seed and return the same values. Seeding from a GUID's hash code gives each instance its own seed, and therefore its own sequence (though hash codes aren't guaranteed to be evenly distributed).
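A quick check of the seeding behavior: this sketch shows that two instances built from the same seed return identical sequences, then creates several GUID-seeded instances back to back. It also shows the other common fix, which isn't what my solution does: create one Random and reuse it. `FirstValues` is a hypothetical helper, and note that System.Random isn't thread-safe, so a shared instance needs protecting if several threads use it.

```csharp
using System;
using System.Linq;

// Draw the first few values from a generator so sequences can be compared.
int[] FirstValues(Random rng, int count) =>
    Enumerable.Range(0, count).Select(_ => rng.Next(1000)).ToArray();

// Same seed, same sequence.
int[] a = FirstValues(new Random(42), 5);
int[] b = FirstValues(new Random(42), 5);
Console.WriteLine(a.SequenceEqual(b)); // True

// GUID-derived seeds: instances created back to back in a tight loop
// still get (almost certainly) different seeds.
int[][] runs = Enumerable.Range(0, 3)
    .Select(_ => FirstValues(new Random(Guid.NewGuid().GetHashCode()), 5))
    .ToArray();
foreach (int[] run in runs)
    Console.WriteLine(string.Join(", ", run));

// Alternative: create one instance once and keep reusing it.
// System.Random is not thread-safe, so guard it if several threads share it.
Random shared = new Random();
int[] fromShared = FirstValues(shared, 5);
```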
You'll notice that my letter arrays contain duplicates of some values: common letters like a, e, s, and t are repeated multiple times. That's a cheap trick for weighting the random output toward some values, since an entry that appears three times is three times as likely to be picked as one that appears once.
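To make the duplicate-entry trick concrete: an array with repeats is equivalent to a list of distinct values whose integer weights equal their copy counts. This sketch derives those weights from my `vowels` array and then picks by walking a running total, which is one way to state the weights directly instead of by repetition. `WeightedPick` is a hypothetical helper, not part of my solution.

```csharp
using System;
using System.Linq;

// My vowel list, duplicates and all.
string[] vowels = "a,a,a,a,a,e,e,e,e,e,e,e,e,e,e,e,i,i,i,o,o,o,u,y,ee,ee,ea,ea,ey,eau,eigh,oa,oo,ou,ough,ay".Split(',');

// Collapse the duplicates into (value, weight) pairs: weight = number of copies.
var weighted = vowels.GroupBy(v => v)
                     .Select(g => (Value: g.Key, Weight: g.Count()))
                     .ToArray();

// Pick a value with probability proportional to its weight: roll a number
// below the total, then subtract weights until the roll falls inside one.
string WeightedPick((string Value, int Weight)[] items, Random rng)
{
    int total = items.Sum(item => item.Weight);
    int roll = rng.Next(total);
    foreach (var item in items)
    {
        if (roll < item.Weight) return item.Value;
        roll -= item.Weight;
    }
    throw new InvalidOperationException("Weights must be positive.");
}

var random = new Random(Guid.NewGuid().GetHashCode());
Console.WriteLine(string.Join(" ", Enumerable.Range(0, 10).Select(_ => WeightedPick(weighted, random))));
```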
I added common prefixes and suffixes after looking at the common surname list, then tweaked the weightings so they'd show up at the right frequency.
using System;
using System.Text.RegularExpressions;

// Lives inside a class; returns one randomly generated surname.
private static string GenerateSurname()
{
    string name = string.Empty;
    string[] currentConsonant;

    // Repeated entries make those letters more likely to be picked
    string[] vowels = "a,a,a,a,a,e,e,e,e,e,e,e,e,e,e,e,i,i,i,o,o,o,u,y,ee,ee,ea,ea,ey,eau,eigh,oa,oo,ou,ough,ay".Split(',');
    string[] commonConsonants = "s,s,s,s,t,t,t,t,t,n,n,r,l,d,sm,sl,sh,sh,th,th,th".Split(',');
    string[] averageConsonants = "sh,sh,st,st,b,c,f,g,h,k,l,m,p,p,ph,wh".Split(',');
    string[] middleConsonants = "x,ss,ss,ch,ch,ck,ck,dd,kn,rt,gh,mm,nd,nd,nn,pp,ps,tt,ff,rr,rk,mp,ll".Split(','); //Can't start
    string[] rareConsonants = "j,j,j,v,v,w,w,w,z,qu,qu".Split(',');

    Random rng = new Random(Guid.NewGuid().GetHashCode()); //http://codebetter.com/blogs/59496.aspx

    int[] lengthArray = new int[] { 2, 2, 2, 2, 2, 2, 3, 3, 3, 4 }; //Favor shorter names but allow longer ones
    int length = lengthArray[rng.Next(lengthArray.Length)];

    // Each pass appends one consonant (chosen by group) followed by one vowel
    for (int i = 0; i < length; i++)
    {
        int letterType = rng.Next(1000);
        if (letterType < 775) currentConsonant = commonConsonants;
        else if (letterType < 875 && i > 0) currentConsonant = middleConsonants;
        else if (letterType < 985) currentConsonant = averageConsonants;
        else currentConsonant = rareConsonants;

        name += currentConsonant[rng.Next(currentConsonant.Length)];
        name += vowels[rng.Next(vowels.Length)];

        if (name.Length > 4 && rng.Next(1000) < 800) break; //Getting long, must roll to save
        if (name.Length > 6 && rng.Next(1000) < 950) break; //Really long, roll again to save
        if (name.Length > 7) break; //Probably ridiculous, stop building and add ending
    }

    // Shift the ending roll based on how long the name already is
    int endingType = rng.Next(1000);
    if (name.Length > 6)
        endingType -= (name.Length * 25); //Don't add long endings if already long
    else
        endingType += (name.Length * 10); //Favor long endings if short

    if (endingType < 400) { } // Ends with vowel
    else if (endingType < 775) name += commonConsonants[rng.Next(commonConsonants.Length)];
    else if (endingType < 825) name += averageConsonants[rng.Next(averageConsonants.Length)];
    else if (endingType < 840) name += "ski";
    else if (endingType < 860) name += "son";
    else if (Regex.IsMatch(name, "(.+)(ay|e|ee|ea|oo)$") || name.Length < 5)
    {
        // "Mc" prefix for short names or names ending in these vowel sounds
        name = "Mc" + name.Substring(0, 1).ToUpper() + name.Substring(1);
        return name;
    }
    else name += "ez";

    name = name.Substring(0, 1).ToUpper() + name.Substring(1); //Capitalize first letter
    return name;
}
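The ending thresholds are easier to judge with numbers attached. This sketch replays the endingType shift and the if/else chain above for all 1000 possible rolls at a given name length. `EndingCounts` is a hypothetical helper, and it lumps the Mc/ez branch together because that choice also depends on the letters in the name.

```csharp
using System;
using System.Collections.Generic;

// Replays the ending logic from GenerateSurname for every possible roll
// and tallies which kind of ending would be chosen.
Dictionary<string, int> EndingCounts(int nameLength)
{
    var counts = new Dictionary<string, int>
    {
        ["vowel"] = 0, ["common"] = 0, ["average"] = 0,
        ["ski"] = 0, ["son"] = 0, ["Mc/ez"] = 0
    };
    for (int roll = 0; roll < 1000; roll++)
    {
        int endingType = roll;
        if (nameLength > 6) endingType -= nameLength * 25; // long names: favor short endings
        else endingType += nameLength * 10;                // short names: favor long endings

        string bucket =
            endingType < 400 ? "vowel" :
            endingType < 775 ? "common" :
            endingType < 825 ? "average" :
            endingType < 840 ? "ski" :
            endingType < 860 ? "son" : "Mc/ez";
        counts[bucket]++;
    }
    return counts;
}

foreach (int length in new[] { 4, 6, 7 })
    Console.WriteLine($"length {length}: " + string.Join(", ", EndingCounts(length)));
```

For a four-letter stem that comes to 360 vowel endings per 1000, 375 common consonants, 50 average consonants, 15 "ski", 20 "son" and 180 in the Mc/ez branch (always "Mc", since the name is shorter than five letters). At seven letters the special endings drop out entirely: 575 vowel, 375 common, 50 average.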
Please feel free to submit your solution. We've only covered a few ethnicities here; there are plenty more to cover.