Li Chen's Blog

  • 5-minute guide to exposing .NET components to COM clients

    I occasionally write COM objects in .NET, and every time I have to remind myself of the right things to do. I finally decided to write it up so I don't have to do the same research again.

    If you register a .NET component for COM using the regasm tool, by default the tool will expose all exposable public classes. Exposable means that the class must satisfy certain rules, for example, the type must have a public default constructor. For the complete list of rules, see MSDN.

    The Visual Studio template for class libraries actually turns off COM visibility in the AssemblyInfo.cs file using the assembly attribute [assembly: ComVisible(false)]. This is a good practice, as we do not want to flood the registry with all the types in our library.

    Next, we need to apply the ComVisible(true) attribute to all classes and interfaces that we want to expose. It is also a good practice to apply the [ClassInterface(ClassInterfaceType.None)] attribute to classes and implement interfaces explicitly. Why? Because by default .NET automatically creates a class interface containing all exposable methods, and we do not have much control over that interface. So it is a good idea to turn off the class interface and define our own interfaces explicitly.

    Lastly, it is a good idea to apply Guid attributes explicitly to classes and interfaces. This helps avoid adding lots of orphan ClassId entries to the registry on our development machine. Visual Studio has a Create GUID tool right under the Tools menu.
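    Putting these recommendations together, a minimal sketch might look like the following (the interface, class and GUID values here are made up; generate your own GUIDs with the Create GUID tool):

        using System;
        using System.Runtime.InteropServices;

        // Hypothetical example: an explicitly defined interface plus the class that implements it.
        [ComVisible(true)]
        [Guid("11111111-AAAA-BBBB-CCCC-222222222222")]
        public interface ICalculator
        {
            int Add(int x, int y);
        }

        [ComVisible(true)]
        [Guid("33333333-AAAA-BBBB-CCCC-444444444444")]
        [ClassInterface(ClassInterfaceType.None)]   // turn off the auto-generated class interface
        public class Calculator : ICalculator
        {
            // A public default constructor is one of the rules for an exposable class.
            public Calculator() { }

            public int Add(int x, int y)
            {
                return x + y;
            }
        }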

    This is just a brief discussion and is no substitute for MSDN. Adam Nathan wrote the definitive book on the subject, .NET and COM. The book was out of print for a while, so even a used copy was very expensive; fortunately, the publisher has reprinted it. Adam also has some nice articles on InformIT (you need to click the Articles tab).

    (ISBN: 067232170X)

  • Take the pain out of Windows Communication Foundation (WCF) configuration (2)

    This post continues from the previous part. I am going to discuss how to troubleshoot WCF problems using WCF tracing and message logging. Again I am going to use the Microsoft Service Configuration Editor discussed in the previous part. When you click the Diagnostics node on the left, you will see the following:

    [Screenshot: the Diagnostics node in the Service Configuration Editor]

    You will want to enable tracing and message logging. If you want to see the log entries immediately, you will also want to enable log auto flush. What is the difference between tracing and message logging? Tracing captures the life cycle of a WCF service call, while message logging captures the message itself. If you get a message deserialization error, you need message logging to inspect the message. Once logging is enabled, you need to go to each listener to configure the log location, and to each source to configure the trace level. If you want to log the entire SOAP message, go to the Message Logging node and set LogEntireMessage to true. MSDN has a nice article on recommended settings for development and production environments.

    Also, do not change the .svclog file extension when you specify the log file; it is the registered extension for WCF tracing log files. If you double-click a log file, another tool in the same directory as the configuration editor, SvcTraceViewer, will open and display the contents of the log.
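    For reference, the tracing and message logging settings that the editor writes into web.config look roughly like the following sketch (the listener name, file path and switch level here are illustrative, not the exact values from my project):

        <system.diagnostics>
          <sources>
            <source name="System.ServiceModel" switchValue="Information, ActivityTracing"
                    propagateActivity="true">
              <listeners>
                <add name="traceListener" />
              </listeners>
            </source>
            <source name="System.ServiceModel.MessageLogging">
              <listeners>
                <add name="traceListener" />
              </listeners>
            </source>
          </sources>
          <sharedListeners>
            <!-- both sources write to one .svclog file that SvcTraceViewer can open -->
            <add name="traceListener"
                 type="System.Diagnostics.XmlWriterTraceListener"
                 initializeData="C:\logs\WcfTrace.svclog" />
          </sharedListeners>
          <trace autoflush="true" />
        </system.diagnostics>
        <system.serviceModel>
          <diagnostics>
            <messageLogging logEntireMessage="true"
                            logMessagesAtServiceLevel="true"
                            logMalformedMessages="true" />
          </diagnostics>
          <!-- services, bindings and behaviors omitted -->
        </system.serviceModel>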

    Recently, I had been seeing random deserialization errors in one of my WCF services. After turning on full message logging, I was able to inspect the full SOAP message, and I saw nothing wrong with it. So what was the problem? By default, to prevent denial-of-service attacks, WCF places limits on message length, array length and string length. These can be configured in the binding configuration:

    [Screenshot: the binding configuration in the Service Configuration Editor]

    The default limit for the size of a string parameter is 8192 bytes. There were clients sending strings longer than that, so I had to increase the MaxStringContentLength attribute under the ReaderQuotas property. For the defaults of the other quotas, please visit this page on MSDN.
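    As a sketch, the relevant part of the binding configuration ends up looking something like this (the binding name and quota value are just examples); the endpoint then has to reference it through its bindingConfiguration attribute:

        <bindings>
          <wsHttpBinding>
            <binding name="LargeMessageBinding">
              <!-- maxStringContentLength defaults to 8192 -->
              <readerQuotas maxStringContentLength="65536" />
            </binding>
          </wsHttpBinding>
        </bindings>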

    WCF is big, and I cannot discuss all the various configuration options; there is no substitute for MSDN or a good book. I hope I have covered some common scenarios to help you get through the trouble quickly without spending lots of time navigating the MSDN documentation.

  • Take the pain out of Windows Communication Foundation (WCF) configuration (1)

    WCF is a very flexible technology, and the flexibility comes with complexity: there are many configurable pieces. It is usually relatively easy to start with an ASP.NET project and add a WCF service to it. However, as we start changing things and deploying the web service to production, many unexpected problems can occur. I have written many WCF web services, but I do not spend a high enough percentage of my time working with WCF, so I go through the same struggle each time I come back to it. I thought I would start documenting the experience to stop the pain.

    If I start with an empty website called website1 and add a WCF service called DummyService, Visual Studio adds DummyService.svc as well as the associated interface and implementation in the App_Code directory.

    [Screenshot: the generated files in Solution Explorer]

    The content of the .svc file looks like this:

    [Screenshot: the generated .svc file]

    Everything works fine and makes sense, but both the service interface and the implementation class live in the global namespace. Now suppose I would like to put the business logic in a library project: I add a library project with the namespace DummyLib and move the interface and implementation into it. I add a reference to System.ServiceModel in the DummyLib project and a project reference to DummyLib in my website. Now I need to change my .svc file. Obviously, I no longer need the CodeBehind attribute, since my code is compiled by the library project. But how do I reference the class? According to MSDN, the Service attribute should point to "Service, ServiceNamespace". Unfortunately, that does not work. Fortunately, like any other place where a type is referenced, I can use a type name of the form "Namespace.TypeName, AssemblyName". Look at the web.config file for a type attribute and you will see what I am talking about. So now my .svc file looks like this:

    [Screenshot: the updated .svc file]
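    In other words, something along these lines (a reconstruction using the DummyLib names from this example, not the original screenshot):

        <%@ ServiceHost Language="C#" Debug="true"
            Service="DummyLib.DummyService, DummyLib" %>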

    Is that enough? No. WCF stores very little information in the class and a lot of information in web.config; that is how it stays so configurable. So we need to modify web.config to reference both the namespaced interface and the namespaced class (two changes).

    [Screenshot: the updated web.config]
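    A sketch of the result (the behavior name is whatever Visual Studio generated; the two changes are marked in comments):

        <system.serviceModel>
          <services>
            <!-- change 1: the service name now carries the namespace -->
            <service name="DummyLib.DummyService" behaviorConfiguration="DummyServiceBehavior">
              <!-- change 2: the contract now carries the namespace -->
              <endpoint address="" binding="wsHttpBinding" contract="DummyLib.IDummyService" />
              <endpoint address="mex" binding="mexHttpBinding" contract="IMetadataExchange" />
            </service>
          </services>
          <behaviors>
            <serviceBehaviors>
              <behavior name="DummyServiceBehavior">
                <serviceMetadata httpGetEnabled="true" />
                <serviceDebug includeExceptionDetailInFaults="false" />
              </behavior>
            </serviceBehaviors>
          </behaviors>
        </system.serviceModel>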

    Now it is time to examine our web.config file. There is a service element that points to our implementation type. The service exposes two endpoints: one is our service endpoint, and the "mex" endpoint is for retrieving metadata, in our case WSDL. The service endpoint points to wsHttpBinding and the mex endpoint points to mexHttpBinding. Finally, the service points to a behavior that has httpGetEnabled set and does not include exception details in faults. Confused? It is not easy to create a configuration like this from scratch, but once Visual Studio has created it for us, we can feel our way around. Fortunately, the Windows SDK has a really nice tool called SvcConfigEditor. It normally lives in C:\Program Files\Microsoft SDKs\Windows\v7.0A\Bin or C:\Program Files (x86)\Microsoft SDKs\Windows\v7.0A\Bin. If I open my web.config file with SvcConfigEditor and look at the choices for the endpoint binding:

    [Screenshot: the endpoint binding choices in SvcConfigEditor]

    There are lots of choices. If we work with SOAP, the two common choices are basicHttpBinding and wsHttpBinding. basicHttpBinding supports an older SOAP standard; it does not support message-level security, but it is very compatible with all kinds of clients. wsHttpBinding supports the newer SOAP standard and many of the WS-* extensions. Now suppose we have legacy clients, so we need to use basicHttpBinding, and we also want to use https to secure the transport level. Let us see how many changes we need to make.

    First, basicHttpBinding does not support https by default, so we need to create a binding configuration to change the default. I click the Bindings node on the left and add a binding configuration called basicHttpBindingConfiguration. Then I click the Security tab and change the Mode from "None" to "Transport". Now I can return to the endpoint, and the configuration I just created is available to select in the BindingConfiguration attribute. Next, we need to go to the behavior and set HttpsGetEnabled="True". Lastly, we need to go to the "mex" endpoint and change its binding from mexHttpBinding to mexHttpsBinding. So to change from http to https, we need to make three modifications, and our web.config now looks like this:

    [Screenshot: the web.config after switching to basicHttpBinding over https]
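    Again as a reconstruction rather than the original screenshot, the relevant pieces with the three changes marked:

        <system.serviceModel>
          <bindings>
            <basicHttpBinding>
              <!-- change 1: a binding configuration that turns on transport (https) security -->
              <binding name="basicHttpBindingConfiguration">
                <security mode="Transport" />
              </binding>
            </basicHttpBinding>
          </bindings>
          <services>
            <service name="DummyLib.DummyService" behaviorConfiguration="DummyServiceBehavior">
              <endpoint address="" binding="basicHttpBinding"
                        bindingConfiguration="basicHttpBindingConfiguration"
                        contract="DummyLib.IDummyService" />
              <!-- change 3: metadata exchange over https -->
              <endpoint address="mex" binding="mexHttpsBinding" contract="IMetadataExchange" />
            </service>
          </services>
          <behaviors>
            <serviceBehaviors>
              <behavior name="DummyServiceBehavior">
                <!-- change 2: serve the WSDL over https -->
                <serviceMetadata httpsGetEnabled="true" />
                <serviceDebug includeExceptionDetailInFaults="false" />
              </behavior>
            </serviceBehaviors>
          </behaviors>
        </system.serviceModel>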

    In the next part, I will talk about WCF tracing and message logging; they really make troubleshooting easier.

  • Sharing session state over multiple ASP.NET applications with ASP.NET state server

    There are many posts about sharing session state using SQL Server, but there is little information about doing it with the out-of-process (state server) session state. The primary difficulty is that there is no official way. While the source code for the SQL Server session state provider is available in the ASP.NET provider toolkit, information on the state server is scarce. The method introduced in this post is based on public information as well as a little help from Reflector. I should warn that this method is a hack, as I use reflection to change private members; therefore it will not work in a partially trusted environment. Future versions of the OutOfProcSessionStateStore class could also break compatibility; readers should use the principle discussed in this post to find their own solution.

    I will first discuss how the ASP.NET state server used by out-of-proc session state works. According to the published protocol specification, the ASP.NET state server is essentially a mini HTTP server talking a customized flavor of HTTP. The key that ASP.NET uses to look up data is composed of an application id from HttpRuntime.AppDomainAppId, a hash of it, and the SessionId stored in the cookie. This is very similar to the SQL Server session state. Developers have been using reflection to modify the _appDomainAppId private variable before it is used by SqlSessionStateStore. However, this method does not work with OutOfProcSessionStateStore because of the way it was written.

    When the ASP.NET session state http module is loaded, it uses the mode attribute of the sessionState element in web.config to determine which provider to load. If the mode is "StateServer", the http module loads OutOfProcSessionStateStore and initializes it. The partial key it constructs is stored in the s_uribase private static variable. Since SessionStateModule loads very early in the pipeline, it is difficult to change the _appDomainAppId variable before OutOfProcSessionStateStore is loaded.

    The solution is to modify the data in OutOfProcSessionStateStore using reflection. According to the ASP.NET life cycle, an ASP.NET application can actually load multiple HttpApplication instances. For each HttpApplication instance, ASP.NET loads the entire set of http modules configured in machine.config and web.config, in no particular order. After loading the http modules, ASP.NET calls the Init method of the HttpApplication class, which can be overridden via global.asax. Therefore, we can access the loaded OutOfProcSessionStateStore in the Init method and modify it so that all of our ASP.NET applications use the same application id to access the state server. ASP.NET is already good at keeping the session id the same across applications; if that is not enough, we can supply a SessionIdManager.

    So we just need to place the following code in global.asax or its code-behind:
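    A sketch of the approach (updated per the 10/17/2011 note below) looks like this; the private member names come from this post and from Reflector, are internal implementation details, and could change in future framework versions:

        using System;
        using System.Reflection;
        using System.Web;

        public class Global : HttpApplication
        {
            // Every application that should share session state must present the same id
            // to the state server; the value itself is arbitrary but must match everywhere.
            private const string SharedAppId = "/LM/W3SVC/1/ROOT/SharedSessionApp";

            public override void Init()
            {
                base.Init();

                // HttpRuntime keeps its state in a private singleton (_theRuntime);
                // _appDomainAppId on that instance feeds the key sent to the state server
                // when OutOfProcSessionStateStore is initialized later.
                FieldInfo runtimeField = typeof(HttpRuntime)
                    .GetField("_theRuntime", BindingFlags.Static | BindingFlags.NonPublic);
                object theRuntime = runtimeField.GetValue(null);

                FieldInfo appIdField = typeof(HttpRuntime)
                    .GetField("_appDomainAppId", BindingFlags.Instance | BindingFlags.NonPublic);
                appIdField.SetValue(theRuntime, SharedAppId);
            }
        }

    Remember that the final key has the form AppId(Hash), and the hash involves the machine key, so an explicit, identical machineKey must also be configured in each application (see the edit below).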

    As you can see, this code works purely because we hacked it according to the way OutOfProcSessionStateStore was written. Future changes to the OutOfProcSessionStateStore class may break our code, or may make our life harder or easier. I hope a future version of ASP.NET will make it easier to control the application id used against the state server. In fact, Windows Azure already has a session state provider that allows sharing session state across applications.

    Edited on 10/17/2011:

    The previous version of the code only worked in IIS 6 or IIS 7 classic mode. I made a change so that the code works in IIS 7 integrated mode as well. What happens is that in IIS 7, when Application.Init() is called, Module.Init() has not been called yet, so the _store variable is not yet initialized. We can, however, change the _appDomainAppId of the HttpRuntime at this time, as it will be picked up later when the session store is initialized. The session store uses a key of the form AppId(Hash) to access the session; the hash is generated from the AppId and the machine key. To ensure that all applications use the same key, it is important to explicitly specify the machine key in machine.config or web.config rather than using an auto-generated key. See http://msdn.microsoft.com/en-us/library/ff649308.aspx for details, especially the section "Sharing Authentication Tickets Across Applications".

  • Theory and Practice of Database and Data Analysis (4) – Capturing additional information on relations

    So far in this series, I have been talking about traversing related data by navigating down referential constraints. The physical database makes no distinction about the nature of a foreign key, whether it is a parent-child relationship or another type of reference. In logical ER modeling, we often capture more information. So in this post, I will explore the idea of capturing additional information so that we can build more powerful tools.

    Let me restrict myself to the fictitious e-commerce company example introduced in part 1 of this series: we have an Orders table to capture the order header and an OrderDetails table to capture the line items. An order must be placed by a customer, and may or may not originate from a marketing campaign. An order may be fulfilled by one or more shipments, and each shipment would ship order line items in part or in full. So we have an ER diagram like the following:

    [ER diagram: Orders and OrderDetails, Shipments and ShipmentDetails, Customers, Products and Campaigns]

    In the ER diagram, there is a clear parent-child relationship between Orders and OrderDetails and between Shipments and ShipmentDetails. If we send the data in an XML file, each OrderDetail element would be a child of an Order element and each ShipmentDetail element would be a child of a Shipment element. However, it appears that the ShipmentDetails table has two parents: Shipments and OrderDetails. We pick the parent by the importance of the relationship, so we pick Shipments as the parent. We then need an additional mechanism to store the cross-references between ShipmentDetails and OrderDetails in our XML file.

    Let us examine the other relations. Customer is a mandatory attribute of the Orders table and Product is a mandatory attribute of OrderDetails. Campaign is an optional attribute of Orders. We can further classify Products as a system table and Customers as a data table.

    So we have classified the relations in our simple example into several categories: parent-child, cross-reference, mandatory attribute to a system table, mandatory attribute to a data table, and optional attribute.

    Now let us use an example to see how such a classification can help us. Suppose we are building a tool to export all the data related to a transaction from our production database into an XML file and then import it into our development database. The ID of an element could change from one system to another. That is not a problem for parent-child relationships: we get a new key when we insert the parent, and we can propagate it down to the children. However, we need to take extra care to ensure that cross-references are maintained.

    We do not have to worry about system tables, but we do have to worry about data tables. Our development database may or may not have the same customer. So from our classification, we can generically determine that for a mandatory reference to a data table, we need to take a copy of the referenced record and its children and check whether they exist in the destination database.

    Besides this categorization, other information that we would like to capture includes one-to-one relationships; some of them can be inferred, since a referential constraint that references the primary key in both tables implies a one-to-one relationship. We would also like to capture hidden relationships; these are real references but cannot be enforced by a foreign key constraint in SQL Server. For example, some tables contain an ID column that can store multiple types of IDs, with a separate column indicating the type of ID stored in the ID column.

    With human input of extra information beyond what we can extract from the physical database, we shall be able to build better tools. I will explore some of these tools in upcoming posts in this series.

  • Theory and Practice of Database and Data Analysis (3) – Exhaustive Search by Traversing Table Relationships

    In this post, I will discuss building a script to find all the relevant data starting from a table name and a primary key value. In the previous post, I discussed how to get the list of tables that reference a table. From a parent table, we just need to get the list of child tables. We then loop through each child table to select records using the parent primary key value. If the child tables themselves have child tables, we repeat the same approach by recursively getting the grandchildren.

    I often need to remote-desktop into client systems where I only have read privileges; I can neither create stored procedures nor execute them. So I am going to present a script that requires only read privileges.

    In a Transact-SQL (T-SQL) batch, looping through records can be implemented using a cursor or a temp table. I used the latter approach because a temp table can grow, which means I can use it for the recursion as well.

    Recursion is the harder problem here. Recursion can be done in most programming languages by calling procedures recursively, but that is not possible here since I am restricted to a batch. In computer science, recursion can be implemented with a loop and a stack. In many programming languages, when a caller calls a callee, the program saves the current local data as well as the return location of the caller onto the stack. The area in the stack used by each call is called a stack frame. Once the callee returns, the program restores the local data from the stack and resumes from the saved location.

    In my script, I use the temp table #ref_constraints to accomplish both the looping and the recursion. Referential constraints from the top table are added to the table as they are discovered and removed from the table when they are consumed (that is, no longer needed). I used depth-first traversal in my script; with this structure, I could switch to breadth-first traversal with minimal effort.

    The #keyvalues temp table contains the primary key values for the tables that I have already traversed. I can get the records from the child tables by joining to the key values in this table, so I only need to query each child table once for each foreign key relationship.

    To make the script simpler, I eliminated the schema and assumed all tables are under the "dbo" schema. This works with our database, and it also works with databases in Microsoft Dynamics CRM 2011.

    I have also assumed that all primary keys contain a single integer column. This is true for our database. It is also true for Microsoft Dynamics CRM 2011, except that you need to change the integer id to a guid.

    I added comments to the script so that one can see where the code would correspond to looping and to procedure calls if it were written in a language like VB.NET or C#. The comments also indicate the insertion points if additional code is needed.
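    The full script is longer than what fits here, but a simplified skeleton of the mechanism (temp tables standing in for the call stack, dynamic SQL for the per-table queries) might look like the following; the starting table, id and variable names are illustrative:

        -- Starting point: a table name and a primary key value (placeholders).
        DECLARE @topTable sysname = 'Orders';
        DECLARE @topId int = 12345;

        DECLARE @parent sysname, @child sysname, @parentCol sysname, @childCol sysname;
        DECLARE @sql nvarchar(max);

        -- Primary key values already collected for the tables we have traversed.
        CREATE TABLE #keyvalues (TableName sysname, KeyValue int);
        INSERT INTO #keyvalues VALUES (@topTable, @topId);

        -- This table plays the role of the call stack: one row per referential
        -- constraint that has been discovered but not yet consumed.
        CREATE TABLE #ref_constraints
        (
            ParentTable  sysname,
            ChildTable   sysname,
            ParentColumn sysname,
            ChildColumn  sysname
        );

        -- "Procedure call" for the top table: push its child constraints onto the stack.
        INSERT INTO #ref_constraints (ParentTable, ChildTable, ParentColumn, ChildColumn)
        SELECT pk.TABLE_NAME, fk.TABLE_NAME, pkc.COLUMN_NAME, fkc.COLUMN_NAME
        FROM INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS rc
        JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS fk ON fk.CONSTRAINT_NAME = rc.CONSTRAINT_NAME
        JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS pk ON pk.CONSTRAINT_NAME = rc.UNIQUE_CONSTRAINT_NAME
        JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE fkc ON fkc.CONSTRAINT_NAME = fk.CONSTRAINT_NAME
        JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE pkc ON pkc.CONSTRAINT_NAME = pk.CONSTRAINT_NAME
                                                    AND pkc.ORDINAL_POSITION = fkc.ORDINAL_POSITION
        WHERE pk.TABLE_NAME = @topTable;

        -- The loop: pop one constraint, query the child table, repeat (depth-first).
        WHILE EXISTS (SELECT 1 FROM #ref_constraints)
        BEGIN
            SELECT TOP 1 @parent = ParentTable, @child = ChildTable,
                         @parentCol = ParentColumn, @childCol = ChildColumn
            FROM #ref_constraints;

            -- Select the child records by joining to the parent keys collected so far.
            SET @sql = N'SELECT c.* FROM dbo.' + QUOTENAME(@child) + N' AS c'
                     + N' JOIN #keyvalues AS k ON k.TableName = @parentTable'
                     + N' AND k.KeyValue = c.' + QUOTENAME(@childCol) + N';';
            EXEC sp_executesql @sql, N'@parentTable sysname', @parentTable = @parent;

            -- Insertion point: in the real script the child's key values are added to
            -- #keyvalues and the child's own child constraints are pushed here
            -- (the equivalent of the recursive call).

            -- Pop the consumed constraint.
            DELETE FROM #ref_constraints
            WHERE ParentTable = @parent AND ChildTable = @child
              AND ParentColumn = @parentCol AND ChildColumn = @childCol;
        END

        DROP TABLE #ref_constraints;
        DROP TABLE #keyvalues;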

    In future posts in this series, I will discuss how to capture human intelligence to make the search even more powerful.

  • The case for living in the cloud

    These days one cannot go a day without hearing about the cloud; vendors are pushing it hard. Google's Chrome Book is already nothing but the cloud. Apple's iOS and Microsoft's upcoming Windows 8 also have increasing cloud features. Until now, I have been skeptical about the cloud. Apart from security, my primary concern has been what happens if I lose my connection to the internet.

    Recently, I have been migrating to the cloud. The main reason is that I triple-boot my laptop now. I used to run Windows 7 as my primary OS and everything else in virtual machines. However, the performance of those virtual machines has been less than ideal. Windows 7 Virtual PC cannot run a 64-bit OS, so I had to rely on VirtualBox to run Windows 2008 R2. Recently, I have been booting Windows 8 Developer Preview and SharePoint/Windows 2008 from VHD. This is currently my most satisfying configuration: although booting from VHD is slightly slower than from the real hard drive, I have full access to the rest of the hardware. The issue I face now is that I need to access my data no matter which OS I boot into. That motivated me to migrate to the cloud.

    My primary data are my email, documents and code.

    Email is the least of my concerns. Both Google and Hotmail have been offering ample space for me to store my email. I do download a copy of my email into Windows 7 so I can access it when I do not have access to the internet.

    Documents have also become easier. Google has been offering Google Docs and web-based document editing for a while. Recently, Microsoft started offering 25GB of free space on Windows Live SkyDrive. From SkyDrive, I can edit documents using Office Web Apps, so accessing and editing documents from a boot OS that does not have Office installed is no longer a concern. From Office 2010, it is possible to save documents directly to SkyDrive. In addition, Windows Live Mesh can automatically sync the local file system with SkyDrive.

    Lastly, there are many free online source code versioning services. Microsoft CodePlex, Google Code and GitHub, just to name a few, all offer free source code hosting for open source projects. Other vendors such as Bitbucket also offer source control for closed source projects. Open source version control clients such as TortoiseHg or TortoiseSVN are easy to get. The free Visual Studio Express is also becoming more useful for real-world projects; the recent Windows 8 Developer Preview has Visual Studio Express 11 preinstalled.

    So my conclusion is that the currently available services and software are sufficient for my needs, and I am ready to live in the cloud.

  • Theory and Practice of Database and Data Analysis (2) – Navigating the table relationships

    In the first part of the series, I discussed how to use SQL Server metadata to find database objects. In this part, I will discuss how to find all the data related to a transaction by navigating the table relationships. Let us suppose that the transaction is a purchase order stored in a table called Orders. Anyone who has seen the Northwind database knows that it has a child entity called OrderDetails. In even a modest real-world database, the complexity can grow very fast. If we can ship a partial order and put the rest on back order, or if we need to ship from multiple warehouses, we could have multiple Shipping records for each Order record and have each ShippingDetails record linked to an OrderDetails record. Customers can return an order, in full or in part, with or without a return authorization number, and the actual return may or may not match the return authorization. The system needs to be flexible enough to capture all the events related to the order. As you can see, a very modest order transaction could easily grow to a dozen tables. If you are not familiar with the database, how can you find all the related tables? If you need to delete a record, how do you delete it cleanly? If you need to copy a transaction from a production system to a development system, how do you copy the entire set of data related to the transaction?

    Fortunately, in any well-designed database, table relationships can be navigated through the foreign key relationships. In SQL Server, we can usually find this information using the sp_help stored procedure. Here we will use the INFORMATION_SCHEMA views instead to obtain more refined results that we can use in our tools. The TABLE_CONSTRAINTS view contains the primary and foreign keys on a table. The REFERENTIAL_CONSTRAINTS view contains the name of the foreign key on the many side and the name of the unique constraint on the one side. It is possible to navigate the table relationships using these two views. The following query finds all the tables that reference the Sales.SalesOrderHeader table in the AdventureWorks database:
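    A query along these lines does the job (a sketch, not necessarily word for word the query from the original post):

        SELECT fk.TABLE_SCHEMA, fk.TABLE_NAME, rc.CONSTRAINT_NAME
        FROM INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS AS rc
        JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS AS fk
          ON fk.CONSTRAINT_NAME = rc.CONSTRAINT_NAME
         AND fk.CONSTRAINT_SCHEMA = rc.CONSTRAINT_SCHEMA
        JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS AS pk
          ON pk.CONSTRAINT_NAME = rc.UNIQUE_CONSTRAINT_NAME
         AND pk.CONSTRAINT_SCHEMA = rc.UNIQUE_CONSTRAINT_SCHEMA
        WHERE pk.TABLE_SCHEMA = 'Sales'
          AND pk.TABLE_NAME = 'SalesOrderHeader'
        ORDER BY fk.TABLE_SCHEMA, fk.TABLE_NAME;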

    Here are the results:
    [Screenshot of query results omitted]

    The following query returns all the tables referenced by the Sales.SalesOrderHeader table:
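    The reverse direction just swaps which side of the referential constraint we filter on; as a sketch:

        SELECT pk.TABLE_SCHEMA, pk.TABLE_NAME, rc.CONSTRAINT_NAME
        FROM INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS AS rc
        JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS AS fk
          ON fk.CONSTRAINT_NAME = rc.CONSTRAINT_NAME
         AND fk.CONSTRAINT_SCHEMA = rc.CONSTRAINT_SCHEMA
        JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS AS pk
          ON pk.CONSTRAINT_NAME = rc.UNIQUE_CONSTRAINT_NAME
         AND pk.CONSTRAINT_SCHEMA = rc.UNIQUE_CONSTRAINT_SCHEMA
        WHERE fk.TABLE_SCHEMA = 'Sales'
          AND fk.TABLE_NAME = 'SalesOrderHeader'
        ORDER BY pk.TABLE_SCHEMA, pk.TABLE_NAME;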

    And the results:

    [Screenshot of query results omitted]

    The Person.Address table appeared twice because we have two columns referencing the table.

    The CONSTRAINT_COLUMN_USAGE view contains the columns in the primary and the foreign keys. By using these 3 views, we can construct a simple tool that drills down into the data. In the next part of this series, we will construct such a tool.

    Added by Li Chen 9/26/2011

    The KEY_COLUMN_USAGE view also contains the columns in the primary and foreign keys. I prefer this view because it also contains ORDINAL_POSITION, which I can use to match the columns of the primary and foreign keys when we have composite keys. The following example shows how to match the columns; the last line in the query eliminates the records from a table joining to itself.
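    A sketch of such a query, matching the foreign key columns to the referenced key columns by ordinal position:

        SELECT fkc.TABLE_SCHEMA, fkc.TABLE_NAME, fkc.COLUMN_NAME,
               pkc.TABLE_SCHEMA AS REF_TABLE_SCHEMA,
               pkc.TABLE_NAME   AS REF_TABLE_NAME,
               pkc.COLUMN_NAME  AS REF_COLUMN_NAME
        FROM INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS AS rc
        JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE AS fkc
          ON fkc.CONSTRAINT_NAME = rc.CONSTRAINT_NAME
         AND fkc.CONSTRAINT_SCHEMA = rc.CONSTRAINT_SCHEMA
        JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE AS pkc
          ON pkc.CONSTRAINT_NAME = rc.UNIQUE_CONSTRAINT_NAME
         AND pkc.CONSTRAINT_SCHEMA = rc.UNIQUE_CONSTRAINT_SCHEMA
         AND pkc.ORDINAL_POSITION = fkc.ORDINAL_POSITION
        WHERE fkc.TABLE_NAME <> pkc.TABLE_NAME;  -- eliminate a table joining to itself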

  • Theory and Practice of Database and Data Analysis (1) – Searching for objects

    Recently, I have had to spend a significant portion of my time on production data support. Since we have a full-featured agency management system, I have to deal with parts of the application that I am not familiar with. Through accumulated experience and improved procedures, I have been able to locate problems with increasing speed. In this series, I will try to document my experiences and approaches. In this first part, I will discuss how to search for objects.

    Suppose we were told that there are problems in the address data. The very first thing we need to do is locate the table and the field that contain the data. Microsoft SQL Server has a set of sys* tables. It also supports the ANSI-style INFORMATION_SCHEMA views. In order to make the knowledge discussed here reusable with other database systems, I will use INFORMATION_SCHEMA as much as possible. So to find any table related to "Address", we can use the following query:
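    As a sketch:

        SELECT TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE
        FROM INFORMATION_SCHEMA.TABLES
        WHERE TABLE_NAME LIKE '%Address%'
        ORDER BY TABLE_SCHEMA, TABLE_NAME;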

    Using the AdventureWorks 2008 sample database, it yields the following results:

    [Screenshot of query results omitted]

    If we suspect there are columns in other tables related to address, we can search for the columns with the following query:
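    Again as a sketch:

        SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE
        FROM INFORMATION_SCHEMA.COLUMNS
        WHERE COLUMN_NAME LIKE '%Address%'
        ORDER BY TABLE_SCHEMA, TABLE_NAME, ORDINAL_POSITION;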

    We will get the following results this time:

    [Screenshot of query results omitted]

    Suppose we need to find all the stored procedures and functions that reference a table; we can use the sp_depends stored procedure. If we need to search more thoroughly using a string, we can use the following query:
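    One way to do it (a sketch; note that ROUTINE_DEFINITION only holds the first 4000 characters of the body):

        SELECT ROUTINE_SCHEMA, ROUTINE_NAME, ROUTINE_TYPE
        FROM INFORMATION_SCHEMA.ROUTINES
        WHERE ROUTINE_DEFINITION LIKE '%Address%';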

    However, we cannot find triggers through INFORMATION_SCHEMA; we have to use sysobjects:
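    For example (a sketch):

        SELECT tr.name AS TriggerName,
               OBJECT_NAME(tr.parent_obj) AS TableName
        FROM sysobjects AS tr
        WHERE tr.type = 'TR';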

    Once we find the triggers, we can view their text using sp_helptext or by querying the syscomments table.

    In the next part of the series, I will discuss navigating the table relationships.

  • ASP.NET Web Application Project (WAP) vs Website

    Recently, we moved a fairly large web application to a Website project. This is a very old decision point that we have gone back and forth on a few times. A Bing search shows two well-written entries early in the list:

    Here is how we reached the current decision:

    • Back in VS2002/VS2003, it was WAP. No choice.
    • In VS2005, it was Website. There was no choice initially, until WAP was added back to VS2005.
    • In VS2008, we moved to WAP. The main reason was that website projects do not have a project file to store assembly and project references. Project references are stored in the solution file, references to GAC assemblies are stored in web.config, and non-GAC DLL references go through a bunch of .refresh files that point to the location of the DLL files (library). This makes checking the solution file into version control important. The system is fragile and breaks easily.
    • Now we are moving back to Website. We understand the website reference mechanism better now, so it is less frustrating. In WAP, the entire code-behind is compiled into a single DLL; in a website, compilation can be much more granular, which works better in industries (such as financial services) that require a clean audit trail of artifacts. We have several thousand web pages. WAP requires us to compile the entire application before debugging, while with Website we can compile just a single page. In general, with the adoption of MVVM, most of the business logic now resides in separate library DLLs, so we do not have much code other than the data-binding code left in the code-behind. This gives us fine control over the granularity of compilation and deployment units.