Database normalization and Microsoft sample applications

Tuesday, December 9, 2003

Some time ago we found severe normalizations issues with Microsoft's PetShop implementation (which by the way were fixed in the 3.0 version).

Now we were planning to build a PocketPC front end for the ASP.NET TimeTracker Starter Kit, using the new SQL Server CE support/Compact Framework Windows Forms generator in DeKlarit 3.0.

When we reverse-engineered the database, we found normalization problems again. They have a Projects table, similar to:

Projects:
   ProjectId (Primary Key)
   Name
   (... more fields )

And a Categories table:

Categories:
   CategoryId (Primary Key)
   ProjectId
   Name
   (... more fields )

Then they have a table to store the time tracking entries, and they designed it like:

EntryLog
   EntryLogId (Primary Key)
   ProjectId
   CategoryId
   (... more fields )

Of course, as a CategoryId implies a ProjectId, the ProjectId field should not be in the EntryLog table.

Two thoughts:

Microsoft cannot let this kind of mistakes see the light. Please find someone that attended to a DB Modeling 101 to review your database designs.
When we explain the database-normalization features in DeKlarit we always say that normalizing a data model is easy for simple applications but very hard for complex ones... This is a 6-table application!! So I think we'll use this to show how sometimes even professional developers can't do it right in very simple models.

8 Comments

Weren't you also concerned about the two database choice they used? I always found it particular weird that the petshop solution uses 2 catalogs instead of one.

Frans Bouma - Tuesday, December 9, 2003 10:44:00 PM

Doesn't it depend on how the application will be used? There have been many projects that I worked on where, because of performance considerations, I needed to de-normalize the database because it was killing me performance wise. I can't say that is the case here since I haven't looked at the code thouroughly but I think its a little arrogant to make your statemnet without knowing beforehand the purpose of the design.

Mark Struck - Tuesday, December 9, 2003 10:59:00 PM

I was wondering if you ran the ERD through some other tools like ERWin or Embacedero, etc.. These have analysis capabilities as well..

SBC - Tuesday, December 9, 2003 11:07:00 PM

Frans, the 2 catalogs one is because they wanted to use DTC in the application.

Mark, granted, it is a little arrogant, but this is a very simple application designed to be a used as a base for other developers to extend. If the denormalization was because of performance issues, it should be clearly explained and documented.

SBC, I really did not need to do that ;)

Andres Aguiar - Wednesday, December 10, 2003 1:08:00 AM

I agree totally with Mark’s point here. You just look at a DB in a vacuum and assume it's poorly designed. I used to think naively that every table in my DB had to be perfectly normalized as well until I realized that over-normalization is bad for performance and also more difficult to maintain (because the queries are more complicated).

There are also valid cases for doing what MS is doing here as well. Adding a level of indirection to your data store can be vital to maintaining valid data. For instance I have an app where i record which staff member put a record into a table. Now every staff member belongs to a dept. But I store both the StaffID and the DeptID in the record. Why? Is this bad design? Not at all! If that staff member moves dept.'s at some point I should definitely NOT move the fact that the record was added by this employee’s new Dept. The action was taken while under Dept.X and should remain this way.

You just need to be careful when you make a blanket assumption about whats good or bad design when the model may be better than you think. This is especially true with normalization as it's not always the best thing to do.

-- not to say that this isn't wrong either... in the greater context of the app it may be wrong. Still... you need to be careful.

Ray Jezek - Wednesday, December 10, 2003 6:22:00 PM

There is no such thing as 'over-normalization'. You have a normalized model or you don't (yeah, I know you know this, but I could not resist ;).

What you are saying is that having normalized data hurts querying performance and makes queries more difficult. This is always true. Redundancies are great for querying but very bad for updating. Granted, sometimes it's easier to maintain the redundancies than to use a normalized model, and of course that's something you can do to tune your application's performance.

The case you described is different, and it's a quite common case for tables that keep historic information. If you don't need to change the DeptId when the Staff member changes it's Dept, then is well normalized anyway.

In this case, I don't think they want to change the category's project and keep the old ProjectId in the EntryLog table.

On the other hand, if you are adding an entry to your time sheet, you probably want to first select the project and then the category, so it seems quite natural to design a table with both fields. If you want to do this, the best approach is probably to set CategoryId + ProjectId as the Category's table primary key.

Andres Aguiar - Wednesday, December 10, 2003 6:54:00 PM

I'm not familiar with the app (I usually avoid the examples), but here's the likely scenario (and one I've run into before): Business rules may state that it is perfectly reasonable to enter time and *not* select a category. Working around this by forcing every project to have at least one "stub" or "default" category may be cumbersome. NULL for category is a better answer, but Project must still be filled out.

For tuples where a category *is* selected, this is a minor infraction of 3NF, but the rule is a bit fuzzy: you can't state with impunity that ProjectID is dependant on CategoryID in *all* cases, only in non-null cases.

This is somewhat common in databases that use identity fields as their primary keys. However, NF is generally intended to solve UPDATE and INSERT issues. ON CASCADE DELETE on both fields from their respective parent table will solve deletions, INSERT is not an issue, and since identity fields aren't updated themselves, the only UPDATE issue is if a category is re-assigned to another project--an unlikely scenario, but one that could ideally be handled by the code implementing such a feature or a trigger on the Categories table.

Richard Tallent - Thursday, December 11, 2003 1:42:00 AM

I just posted a more complete and less-rambling response on my blog, linked to my name above...

Richard Tallent - Thursday, December 11, 2003 2:25:00 AM

Comments have been disabled for this content.