A Quick Introduction to Text-To-Speech Synthesis in .NET 2.0

When I heard that the Indigo and Avalon Beta 1 RC had the new managed Speech APIs, I decided I just had to take a look.  You see, I’ve been a geek for quite a while now, and when I was but a wee lad (okay, I’ll admit it, I was never actually all that wee of a lad), I saw this movie called “Tin Man”.  It was a universally panned film about a deaf man who creates a speaking computer called Osgood.  Then in the late 80s or so, when the old Sound Blaster sound cards came out with passable text-to-speech, I used to sit at the keyboard and let my 5-year-old son talk to my computer, which of course I called Osgood.  He’d say hello, I’d type a response, and the words would come out of the computer, which utterly amazed my son.  Ah, the magic of fatherhood.

 

So, I’ve been interested in text-to-speech (TTS) for quite some time.  Seeing the ease with which I could get that same TTS synthesis programmatically, I thought of an interesting potential use of the technology and have implemented an example which I’ll walk through in this article.

 

One of the most common uses of speech synthesis is to provide a friendlier interface for novice users.  In that vein, I decided to create a Windows Forms control that plays an instructional message via TTS when the user sets focus on any form control, with the message being set by the developer at design time.

 

I started by creating a new Class Library project and adding the necessary references for Windows Forms and the System.Speech API.  The Speech.DLL should be located in the C:\WINDOWS\Microsoft.NET\Windows\v6.0.4030\ directory.  I then added a new component class called SpeechTipProvider.  By deriving the control from System.ComponentModel.Component, I can use it in the Windows Forms Designer by dragging it off the tool palette onto a Form.  Visual Studio 2005 is fortunately smart enough to add the component to the palette automatically under a project-specific tab.

 

What I needed to do next was make it possible for a developer to define an instructional message for each control on a Form.  I am able to do this thanks to one of the most powerful extensions to the Windows Forms architecture: the IExtenderProvider interface.  This interface, and an attribute or two, allows developers to add properties to existing controls.  It is how the ErrorProvider, ToolTip, and HelpProvider controls are implemented.  The interface defines only one method, CanExtend, which takes a reference to the object being tested and returns a Boolean value that tells the Designer whether that control can use the extender.  Figure 1 shows the implementation of CanExtend for the SpeechTipProvider class, which allows it to be used with any control.

 

 

Figure 1.
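As a rough sketch (not necessarily the exact code from the figure), a CanExtend implementation that accepts any control could look like this:

```csharp
// CanExtend is called by the Designer for each object on the Form to ask
// whether this extender's added property should be offered for it.
public bool CanExtend(object extendee)
{
    // Offer the SpeechTip property for any Windows Forms control.
    return extendee is System.Windows.Forms.Control;
}
```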

 

The next thing I did was use the ProvideProperty attribute to tell the Designer the name of the property I’m adding and which types of objects it applies to.  This attribute is added at the class level.  Figure 2 shows the ProvideProperty attribute applied to the SpeechTipProvider class.

 

Figure 2.
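A sketch of the class-level attribute described above (the declaration follows the article’s names, though the original figure’s code may differ):

```csharp
// Tells the Designer that this component adds a property named
// "SpeechTip" to any object of type Control.
[ProvideProperty("SpeechTip", typeof(Control))]
public class SpeechTipProvider : Component, IExtenderProvider
{
    // ...
}
```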

 

The next task is to implement the actual SpeechTip property.  This property isn’t implemented as a normal property, however, because it requires a reference to the control being extended to be passed in to both the getter and the setter.  Therefore, you implement this property as a pair of methods following the exact naming convention GetPropertyName and SetPropertyName.  So for the SpeechTip property the methods are GetSpeechTip and SetSpeechTip.

 

The SetSpeechTip method has two parameters: a reference to the control being extended and the String value of the SpeechTip property.  When SetSpeechTip is called, I place the String that the developer defined for the SpeechTip into a private StringDictionary with the control’s name as the key.  I then add a new EventHandler for the control’s Enter event, passing in the ControlEnter method from the SpeechTipProvider class.  By wiring up a method to that event, the SpeechTipProvider class will know whenever the user enters a control for which it has a SpeechTip set.  If the value passed into SetSpeechTip is null or String.Empty, I remove any previously set value from the StringDictionary and remove the event handler from the control’s Enter event.

 

The GetSpeechTip method is very simple: it just returns the current value stored in the StringDictionary for the control passed in as a parameter.  Figure 3 shows the implementation of the GetSpeechTip and SetSpeechTip methods.

 

Figure 3.
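Based on the description above, a sketch of the Get/Set pair might look like the following (the StringDictionary field name is my own guess, not necessarily the figure’s):

```csharp
// Maps a control's name to the instructional text to speak for it.
private StringDictionary speechTips = new StringDictionary();

public void SetSpeechTip(Control control, string value)
{
    // Always unhook first so that setting the tip twice doesn't
    // subscribe ControlEnter to the Enter event twice.
    control.Enter -= new EventHandler(ControlEnter);

    if (String.IsNullOrEmpty(value))
    {
        // Clearing the tip removes the stored text and the handler.
        speechTips.Remove(control.Name);
    }
    else
    {
        speechTips[control.Name] = value;
        control.Enter += new EventHandler(ControlEnter);
    }
}

public string GetSpeechTip(Control control)
{
    return speechTips[control.Name];
}
```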

 

Now that all of the plumbing work for the IExtenderProvider is done, I can get to the fun part of working with the Speech APIs.  I start by adding a System.Speech.Synthesis.SpeechSynthesizer field to the class, initializing it in the constructor.  Windows XP comes with a TTS synthesizer, and there are also engines installed as part of Windows XP Tablet PC Edition and Microsoft Office 2003.  You can download or buy others as well.  This example uses the standard Windows synthesizer.
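A sketch of that field and constructor (assuming the Beta 1 namespace matches the released System.Speech one):

```csharp
using System.Speech.Synthesis;

public class SpeechTipProvider : Component, IExtenderProvider
{
    // One synthesizer instance, created up front and reused for every tip.
    private SpeechSynthesizer synthesizer;

    public SpeechTipProvider()
    {
        synthesizer = new SpeechSynthesizer();
    }

    // ...
}
```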

 

In the ControlEnter method, the implementation of TTS is very simple.  I merely cast the sender as a Control and call the synthesizer’s SpeakAsync method, passing in the text tip from the StringDictionary for that control.  Figure 4 shows the source code for the ControlEnter method.

 

Figure 4.
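A sketch of ControlEnter along those lines (the enabled field backs the Enabled property covered later in the article; the field names are my own):

```csharp
private void ControlEnter(object sender, EventArgs e)
{
    // Respect the Enabled switch so power users can turn speech off.
    if (!enabled)
        return;

    Control control = (Control)sender;
    string tip = speechTips[control.Name];
    if (tip != null)
    {
        // SpeakAsync queues the speech without blocking the UI thread.
        synthesizer.SpeakAsync(tip);
    }
}
```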

 

Next I wanted to add some features to the SpeechTipProvider to let the developer specify which of the built-in voices to use for the TTS, and to let the developer turn off the speech so as not to irritate power users.  There are three voices installed in the standard Windows XP synthesizer, named Michael, Michelle, and Sam.  Each voice has a slightly different tone based on gender and age.  I added an enumeration to the project called Voices for the preinstalled voices, and then added a property to the SpeechTipProvider to let the developer set which voice to use.  For the property to be viewable within the Designer, I added attributes to specify that the property is Browsable and to set a Category for it in the Properties window.  In the setter for the Voice property, I set the synthesizer’s default voice to the selected one using the static InstalledVoices method.

 

I then added another property called Enabled, which allows the developer to turn off TTS at runtime.  In the ControlEnter method in Figure 4 you’ll see that I test the private field associated with that property before playing the tip.  Figure 5 shows the source code for these two properties.

 

Figure 5.
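A sketch of the two properties (the Beta 1 InstalledVoices call mentioned above differs from the released System.Speech API, so this uses SelectVoice as a rough equivalent; the assumption that the enumeration names match the installed voice names is mine):

```csharp
public enum Voices { Michael, Michelle, Sam }

private Voices voice = Voices.Michael;
private bool enabled = true;

[Browsable(true), Category("Speech")]
public Voices Voice
{
    get { return voice; }
    set
    {
        voice = value;
        // Beta 1 sets the default voice via the static InstalledVoices
        // method; in the released API the closest equivalent is
        // SelectVoice with the installed voice's name.
        synthesizer.SelectVoice(value.ToString());
    }
}

[Browsable(true), Category("Speech")]
public bool Enabled
{
    get { return enabled; }
    set { enabled = value; }
}
```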

 

Now that the implementation of the SpeechTipProvider is complete, I added a new Windows Forms project with a Form containing several controls.  I added TextBoxes for first and last names, a DateTimePicker control for selecting a BirthDate, and a group box with two radio buttons for selecting Gender.  I then dragged the SpeechTipProvider control off the toolbox palette and onto the form.  After doing that, I was able to use the Properties window for each control to set the SpeechTip property.  Figure 6 shows the Properties window with the last name TextBox selected.

 

Figure 6.

 

All that’s left to do is Build and Execute, and Osgood has once again come to life, this time smarter than ever.  The uses for the Speech APIs range from the somewhat trivial, like this example, to full-fledged voice recognition and response systems written entirely in managed code.  Watch my blog at http://weblogs.asp.net/PaulBallard for more experiments with voice recognition and other synthesizers.

 

Download the source for this post.
