Testing ASP.NET 2.0 and Visual Web Developer
Several people have asked for additional testing details after my recent Whidbey Update post where I talked a little about how we are building ASP.NET 2.0 and Visual Web Developer. Some specific questions I’ve been asked include: How do you build and track 105,000 test cases and 505,000 test scenarios? How big is the test team in relation to the dev team? What tools do we use to write and run them? What is the process used to manage all of this? Etc, etc. Hopefully the post below provides some answers.
Test Team Structure
Our test team is staffed by engineers who own writing test plans, developing automated tests, and building the test infrastructure required to run and analyze them. The job title we use to describe this role at Microsoft is SDE/T (Software Design Engineer in Test).
All members of the test team report through a Test Manager (TM), and similarly all members of the development team and program management team report through a Development Manager (DM) and Group Program Manager (GPM) respectively. The TM, DM and GPM are peers who report to a Product Unit Manager (PUM) who runs the overall product team (note: I'm this guy).
This partitioned reporting structure has a couple of benefits – one of the big ones being that it enables specialization and focus across the entire team, and enables deep career growth and skills mentoring for each job type. It also helps ensure that design, development and testing each get the focus they need throughout the product cycle.
In terms of staffing ratios, our test team is actually the largest of the three disciplines on my team. We currently have approximately 1.4 testers for every 1 developer.
Why is the test team larger than the development team?
I think there are two main reasons for this on my team:
1) We take quality pretty seriously at Microsoft – hence the reason we invest the time and resources.
2) We also have a lot of very hard requirements that necessitate a heck of a lot of careful planning and work to ensure high quality.
For ASP.NET 2.0 and Visual Web Developer, we have to be able to deliver a super high quality product that is rock solid from a functional perspective, can run the world’s largest sites/applications for months without hiccups, is bullet-proof secure, and is faster than previous versions despite having vastly more features (do a file size diff on System.Web.dll comparing V2 with V1.1 and you’ll see that it is 4 times larger).
Now doing all of the above is challenging. What makes it even harder is the fact that we need to deliver it on the same date on three radically different processor architectures (x86, IA-64, and x64), on 4 different major OS variations (Windows 2000, Windows XP, Windows 2003 and Longhorn), support design-time scenarios with 7 different Visual Studio SKUs, and be localized into 34+ languages (including BiDi languages which bring unique challenges).
Making things even more challenging is the fact that Microsoft supports all software for at least 10 years after the date of its release – which means that customers at any point during that timeframe can report a problem and request a QFE fix. We’ll also do periodic service packs (SPs) during those 10 years that roll up these fixes.
Each QFE or SP needs to be fully verified to ensure that it does not cause a functional, stress or performance regression. Likewise, my team needs to ensure that any widely distributed change (for example: a security GDR) to Windows, the CLR or Visual Studio (all of which we sit on top of) doesn’t cause regressions in our products either. We’ll probably end up having to do approximately 25 of these servicing analysis runs on a single product release in a given year. If you have multiple products released within a 10 year window, then you end up multiplying this number by the number of releases – three supported releases, for example, would mean roughly 75 servicing runs a year. It quickly gets large.
What is our process for testing?
Our high-level process for testing involves three essential steps:
1) We build detailed test plans that comprehensively cover all product scenarios
2) We automate the test scenarios in the test plans to eliminate the need for manual steps to test or verify functionality
3) We build and maintain infrastructure that enables us to rapidly run, analyze and report the status of these automated tests
Test Plans
Test plans are the first step, and happen as early as possible in the product cycle. A separate test plan will be written by a tester for each feature or feature area of the product. The goal with them is to comprehensively detail all of the scenarios needed to test a given feature. The test plan will group these scenarios into test cases (where a single test case might contain 10 or more separately verified scenarios), and assign a priority (P1, P2, or P3) to each test case.
The entire feature team (pm, dev, and test) will get together during a coding milestone to review the test plan and try to ensure that no scenarios are missing. The team will then use the test plan as the blueprint when they go to write and automate tests, and they will implement the test scenarios in the priority order defined by the plan.
During the product cycle we’ll often find new scenarios not covered by the original test plan. We call these missing scenarios “test holes”, and when found they’ll be added to the test plan and be automated. Every new bug opened during the product cycle will also be analyzed by test to ensure that it would be found by the test plan -- if not, a new test case is added to cover it.
Here is a pointer to a few pages from the test plan of our new GridView data control in ASP.NET 2.0: http://www.scottgu.com/blogposts/testingatmicrosoft/testplan/testplan.htm
The full test plan for this feature is 300+ pages and involves thousands of total scenarios – but hopefully this snippet provides a taste of what the overall document looks like. Note that some of the test cases have a number associated with them (look at the first AutoFormat one) – this indicates that this test case was missed during the original review of the document (meaning a test hole) and has been added in response to bugs being opened (110263 is the bug number).
Test Automation
After testers finalize their test plans, they will start writing and automating the tests defined within them. We use a variety of languages to test the product, and like to have a mixture of C#, VB and some J# so as to exercise the different compilers in addition to our own product.
Tests on my team are written using a testing framework that we’ve built internally. Long term we’ll use vanilla VSTS (Visual Studio Team System) infrastructure more and more, but given that it is still under active development we aren’t using it for our Whidbey release. The teams actually building the VSTS technology, though, are themselves “dogfooding” their own work and use it for their source control and testing infrastructure (and it is definitely being designed to handle internal Microsoft team scenarios). One of the really cool things about VSTS is that when it is released, you’ll be able to take all of the process described in this blog and apply it to your own projects/products with full Visual Studio infrastructure support.
My team’s test framework is optimized to enable a variety of rich web scenarios to be run, and allows us to automatically run tests under custom scenario contexts without test case modification. For example, we can automatically choose to run a DataGrid test within a code access security context, or under different process model accounts/settings, or against a UNC network share, etc – without the DataGrid test ever having to be aware of the environment it is running in.
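To make that idea concrete, here is a minimal, self-contained sketch of the pattern (this is not our actual framework code – the type and helper names below are made up for illustration): the harness owns the execution context, and the same environment-agnostic test body gets wrapped in whichever context the run calls for.

```csharp
using System;
using System.Collections.Generic;

// Minimal sketch (hypothetical, not the real internal framework): the harness
// wraps one environment-agnostic test body in different execution contexts.
class ContextualRunner
{
    // A "context" prepares the environment, runs the test, then cleans up.
    delegate void TestContext(Action test);

    static void RunInDefaultContext(Action test)
    {
        test();
    }

    static void RunAgainstUncShare(Action test)
    {
        // Hypothetical example: point the app root at a UNC share before running.
        Environment.SetEnvironmentVariable("TEST_APP_ROOT", @"\\server\share\app");
        try { test(); }
        finally { Environment.SetEnvironmentVariable("TEST_APP_ROOT", null); }
    }

    static void Main()
    {
        // The test body itself never inspects or configures its environment.
        Action dataGridTest = delegate
        {
            string root = Environment.GetEnvironmentVariable("TEST_APP_ROOT") ?? "local app root";
            Console.WriteLine("Running DataGrid test against " + root);
        };

        List<TestContext> contexts = new List<TestContext>
        {
            RunInDefaultContext,
            RunAgainstUncShare
        };

        foreach (TestContext context in contexts)
        {
            context(dataGridTest);
        }
    }
}
```

The real framework obviously does much more (code access security policies, process model accounts, etc), but the key design point is the same: the test body stays ignorant of the context it runs in, so the same tests can be reused across the whole environment matrix.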
The test cases themselves are often relatively straightforward and not too code-heavy. Instead, the bulk of the work goes into common test libraries that are shared across test scenarios and test cases. Here is a pointer to an example test case written for our new WebPart personalization framework in ASP.NET 2.0: http://www.scottgu.com/blogposts/testingatmicrosoft/testcase/testcase.htm
Note how the test case contains a number of distinct scenarios within it – each of which is verified along the way. This test case and the scenarios contained within it will match the test plan exactly. Each scenario then uses a common WebPart automation test library built by the SDE/T, which enables heavy re-use of code across test cases.
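Here is a rough sketch of the shape such a test case takes (the class and helper names below are hypothetical, not the actual code linked above): one test case, several scenarios verified as they run, with the repetitive page-driving work pushed into a shared library.

```csharp
using System;

// Hypothetical stand-in for the shared WebPart automation library.
static class WebPartTestLibrary
{
    public static bool AddWebPartToZone(string partName, string zoneName)
    {
        Console.WriteLine("Adding {0} to {1}", partName, zoneName);
        return true;   // a real library would drive the page and report success
    }

    public static bool PersonalizationRoundTrips(string userName)
    {
        Console.WriteLine("Re-requesting page as {0} and comparing state", userName);
        return true;   // a real library would verify the persisted state
    }
}

// One test case containing multiple separately verified scenarios.
class WebPartPersonalizationTest
{
    static void Verify(bool condition, string scenario)
    {
        Console.WriteLine("{0}: {1}", condition ? "PASS" : "FAIL", scenario);
    }

    static void Main()
    {
        // Scenario 1: a WebPart can be added to a zone.
        Verify(WebPartTestLibrary.AddWebPartToZone("WeatherPart", "LeftZone"),
               "WebPart can be added to a zone");

        // Scenario 2: the change is personalized (persisted) for the user.
        Verify(WebPartTestLibrary.PersonalizationRoundTrips("testuser1"),
               "Personalization round-trips for the user");
    }
}
```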
My team will have ~105,000 test cases and ~505,000 functional test scenarios covered when we ship Whidbey. Our hope/expectation is that these will yield us ~80-90% managed code block coverage of the products when we ship.
We use this code coverage number as a rough metric to track how well we are covering test scenarios with our functional tests. By code “blocks” we mean a set of statements in source code – and 90% block coverage would mean that after running all these functional tests 90% of the blocks have been exercised. We also then measure “arc” coverage, which includes measuring further individual code paths within a block (for example: a switch statement might count as a block – where each case statement within it would count as a separate arc). We measure both block and arc numbers regularly along the way when we do full test passes (like we are doing this week) to check whether we are on target or not. One really cool thing about VS 2005 is that VSTS includes support to automatically calculate code coverage for you – and will highlight your code in the source editor red/green to show which blocks and arcs of your code were exercised by your test cases.
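To make the block/arc distinction concrete, here is a simplified C# illustration (my own example, not tied to any particular coverage tool):

```csharp
using System;

// Simplified illustration of block vs. arc coverage.
class CoverageExample
{
    static string DescribePriority(int priority)
    {
        // The switch statement below roughly counts as a block...
        switch (priority)
        {
            case 1: return "P1 - highest priority";   // ...while each case is a separate arc
            case 2: return "P2";
            case 3: return "P3";
            default: return "unknown priority";
        }
    }

    static void Main()
    {
        // A functional test that only ever calls DescribePriority(1) gets credit for
        // exercising the block, but leaves three of the four arcs uncovered -- which
        // is why we track arc coverage in addition to block coverage.
        Console.WriteLine(DescribePriority(1));
    }
}
```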
There is always a percentage of code that cannot be easily exercised using functional tests (common examples: catastrophic situations involving a process running out of memory, difficult to reproduce threading scenarios, etc). Today we exercise these conditions using our stress lab – where we’ll run stress tests for days/weeks on end and put a variety of weird load and usage scenarios on the servers (for example: we have some tests that deliberately leak memory, some that AV (access violation) every once in a while, some that continually modify .config files to cause app-domain restarts under heavy load, etc). Stress is a whole additional blog topic that I’ll try and cover at some point in the future to do it full justice. Going forward, my team is also moving to a model where we’ll add more fault-injection specific tests to our functional test suites to try and get coverage of these scenarios through functional runs as well.
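As a rough sketch of what a fault-injection style functional test can look like (the interface and class names here are hypothetical – the point is that an injected failure exercises an error-handling path that normal runs rarely reach):

```csharp
using System;
using System.IO;

// Hypothetical seam: the code under test reads configuration through an
// interface, so a test can swap in a faulty implementation on purpose.
interface IConfigReader
{
    string ReadConfig(string path);
}

class FaultyConfigReader : IConfigReader
{
    public string ReadConfig(string path)
    {
        // Simulate the kind of catastrophic failure stress runs normally surface.
        throw new IOException("Injected fault: disk unavailable");
    }
}

class ConfigLoader
{
    readonly IConfigReader reader;
    public ConfigLoader(IConfigReader reader) { this.reader = reader; }

    public string LoadOrDefault(string path)
    {
        try { return reader.ReadConfig(path); }
        catch (IOException) { return "<defaults>"; }   // the error path we want covered
    }
}

class FaultInjectionTest
{
    static void Main()
    {
        ConfigLoader loader = new ConfigLoader(new FaultyConfigReader());
        string result = loader.LoadOrDefault("web.config");
        Console.WriteLine(result == "<defaults>" ? "PASS" : "FAIL");
    }
}
```

The benefit is that the error-handling path gets hit deterministically during a functional run, rather than waiting for a stress machine to trip over it.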
Running Tests
So once you have 105,000 tests – what do you do with them? Well, the answer is to run them regularly on the product – carefully organizing the runs to make sure that they cover all of the different scenarios we need to hit when we ship (for example: different processor architectures, different OS versions, different languages, etc).
My team uses an internally built system we affectionately call “Maddog” to handle managing and running our tests. Post Whidbey my team will be looking to transition to a VSTS one, but for right now Maddog is the one we use.
Maddog does a number of things for my team, including: managing test plans, managing test cases, providing a build system to build and deploy all of the test suites we want to execute during a given test run, providing infrastructure to image lab servers and run our tests on them, and ultimately providing a reporting system so that we can analyze failures and track the results.
My team currently has 4 labs where we keep approximately 1,200 machines that Maddog helps coordinate and keep busy. The machines vary in size and quality – with some being custom-built towers and others being rack-mounts. Here is a picture of what one row (there are many, many, many of them) in one of our labs in Building 42 looks like:
The magic happens when we use Maddog to help coordinate and keep all of these machines busy. A tester can use Maddog from their office to build a query of tests to run (selecting either a sub-node of feature areas, or doing a search for tests based on some other criteria), then pick what hardware and OS version the tests should run on, what language they should run under (Arabic, German, Japanese, etc), what ASP.NET and Visual Studio build should be installed on the machines, and how many machines the run should be distributed over.
Maddog will then identify free machines in the lab, automatically format and re-image them with the appropriate operating system, install the right build on them, build and deploy the tests selected onto them, and then run the tests. When the run is over the tester can examine the results within Maddog, investigate all failures, publish the results (all through the Maddog system), and then release the machines for other Maddog runs. Published test results stay in the system forever (or until we delete them) – allowing test leads and my test manager to review them and make sure everything is getting covered. All this gets done without the tester ever having to leave their office.
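Just to illustrate the kind of information a run request captures – this is NOT Maddog’s actual format or API, purely a hypothetical sketch of the knobs described above:

```csharp
using System;

// Hypothetical sketch of a test run request -- not Maddog's real data model.
class TestRunRequest
{
    public string TestQuery;        // e.g. a feature-area node or a saved search
    public string Architecture;     // x86, x64, or IA-64
    public string OperatingSystem;  // Windows 2000 / XP / 2003 / Longhorn
    public string Language;         // e.g. Japanese, German, Arabic
    public string ProductBuild;     // which ASP.NET / Visual Studio build to install
    public int MachineCount;        // how many lab machines to spread the run across
}

class Example
{
    static void Main()
    {
        TestRunRequest run = new TestRunRequest
        {
            TestQuery = @"ASP.NET\Controls\GridView",
            Architecture = "x86",
            OperatingSystem = "Windows Server 2003",
            Language = "Japanese",
            ProductBuild = "2.0.xxxxx",   // hypothetical build label
            MachineCount = 20
        };

        Console.WriteLine("Queuing '{0}' on {1} x {2} {3} ({4}) machines, build {5}",
            run.TestQuery, run.MachineCount, run.Language,
            run.OperatingSystem, run.Architecture, run.ProductBuild);
    }
}
```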
Below are some Maddog screenshots walking through this process. Click on any of the pictures to see a full-size version of them.
Picture 1: This shows browsing the tests in our test case system. This can be done both hierarchically by feature area and via a feature query.
Picture 2: This shows looking at one of the 105,000 test cases in more detail. Note that the test case plan and scenarios are stored in Maddog.
Picture 3: This shows how code for the test case is also stored in Maddog – allowing us to automatically compile and build the test harness based on what query of tests is specified.
Picture 4: This shows what a test looks like when run. Note the interface is very similar to what VSTS does when running a web scenario.
Picture 5: This shows how to pick a test query as part of a new test run (basically choosing what test cases to include as part of the run)
Picture 6: This shows picking what build of ASP.NET and Visual Studio to install on one of the test run machines.
Picture 7: This shows picking what OS image to install on the machines (in this case Japanese Windows Server 2003 on x86), and how many machines to distribute the tests across.
After everything is selected above, the tester can hit “go” and launch the test run. Anywhere from 30 minutes to 14 hours later it will be done and ready to be analyzed.
What tests are run when?
We run functional tests on an almost daily basis. As I mentioned earlier, we do a functional run on our shipping products every time we release a patch or QFE. We also do a functional run anytime a big software component in Microsoft releases a GDR (for example: a security patch to Windows).
With ASP.NET 2.0 and Visual Web Developer we’ll usually try and run a subset of our tests 2-3 times a week. This subset contains all of our P1 (highest priority) test cases and provides broad coverage of the product (about 12% of our total test cases). We’ll then try and complete a full automation run every 2-3 weeks that includes all of our automated test cases.
As we get closer to big milestones or product events (like a ZBB, Beta or RTM), we’ll do a full test pass where we’ll run everything – including manually running those tests that aren’t automated yet (as I mentioned in my earlier blog post – my team is doing this right now for our Beta2 ZBB milestone date).
Assuming we’ve kept test holes to a minimum, have deep code coverage throughout all features of the product, and the dev team fixes all the bugs that are found – then we’ll end up with a really, really solid product.
Summary
There is an old saying with software that three years from now, no one will remember if you shipped an awesome software release a few months late. What customers will still remember three years from now is if you shipped a software release that wasn’t ready a few months too soon. It takes multiple product releases to change people’s quality perception about one bad release.
Unfortunately there are no easy silver bullets to building super high quality software – it takes good engineering discipline, an unwillingness to compromise, and a lot of really hard work to get there. We are going to make very sure we deliver on all of this with ASP.NET 2.0 and Visual Web Developer.
November 3rd Update: For more details on how we track and manage bugs please read this new post: http://weblogs.asp.net/scottgu/archive/2004/11/03/251930.aspx