Li Chen's Blog

  • 5-minute guide to exposing .NET components to COM clients

    I occasionally write COM objects in .NET, and every time I have to remind myself of the right things to do. I finally decided to write it up so I don't have to do the same research again.

    If you register a .NET component for COM using the regasm tool, by default the tool will expose all exposable public classes. Exposable means that the class must satisfy certain rules, for example, the type must have a public default constructor. For the complete list of rules, see MSDN.

    The Visual Studio template for class libraries actually turns off COM visibility in the AssemblyInfo.cs file using the assembly attribute [assembly: ComVisible(false)]. This is a good practice, as we do not want to flood the registry with all the types in our library.

    Next, we need to apply the ComVisible(true) attribute to all classes and interfaces that we want to expose. It is also a good practice to apply the [ClassInterface(ClassInterfaceType.None)] attribute to classes and implement interfaces explicitly. Why? Because by default .NET automatically creates a class interface containing all exposable methods, and we do not have much control over that interface. So it is a good idea to turn off the class interface and define our own interfaces explicitly.

    Lastly, it is a good idea to apply Guid attributes explicitly to classes and interfaces. This helps avoid adding lots of orphan ClassId entries to the registry on our development machine. Visual Studio has a Create GUID tool right under the Tools menu.
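    Putting these recommendations together, a minimal sketch might look like the following (the interface, class and GUID values here are made up; generate your own GUIDs with the Create GUID tool):

        using System;
        using System.Runtime.InteropServices;

        // Hypothetical example: an explicitly defined interface plus the class that implements it.
        [ComVisible(true)]
        [Guid("11111111-AAAA-BBBB-CCCC-222222222222")]
        public interface ICalculator
        {
            int Add(int x, int y);
        }

        [ComVisible(true)]
        [Guid("33333333-AAAA-BBBB-CCCC-444444444444")]
        [ClassInterface(ClassInterfaceType.None)]   // turn off the auto-generated class interface
        public class Calculator : ICalculator
        {
            // A public default constructor is one of the rules for an exposable class.
            public Calculator() { }

            public int Add(int x, int y)
            {
                return x + y;
            }
        }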

    This is just a brief discussion and is no substitute for MSDN. Adam Nathan wrote the definitive book on the subject, .NET and COM. The book was out of print for a while, so even a used copy was very expensive; fortunately, the publisher has reprinted it. Adam also has some nice articles on InformIT (you need to click the Articles tab).

    (ISBN: 067232170X)

  • Take the pain out of Windows Communication Foundation (WCF) configuration (2)

    This post continues from the previous part. I am going to discuss how to troubleshoot WCF problems using WCF tracing and message logging. Again I am going to use the Microsoft Service Configuration Editor discussed in the previous part. When you click the Diagnostics node on the left, you will see the following:

    [Screenshot: the Diagnostics node in the Service Configuration Editor]

    You will want to enable tracing and message logging. If you want to see the log entries immediately, you will also want to enable log auto flush. What is the difference between tracing and message logging? Tracing captures the life cycle of a WCF service call, while message logging captures the message itself. If you get a message deserialization error, you need message logging to inspect the message. Once logging is enabled, you need to go to each listener to configure the log location, and to each source to configure the trace level. If you want to log the entire SOAP message, go to the Message Logging node and set LogEntireMessage to true. MSDN has a nice article on recommended settings for development and production environments.

    Also, do not change the .svclog file extension when you specify the log file; it is the registered extension for WCF tracing log files. If you double-click a log file, another tool in the same directory as the configuration editor, SvcTraceViewer, will open and display the contents of the log.
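    For reference, the tracing and message logging settings that the editor writes into web.config look roughly like the following sketch (the listener name, file path and switch level here are illustrative, not the exact values from my project):

        <system.diagnostics>
          <sources>
            <source name="System.ServiceModel" switchValue="Information, ActivityTracing"
                    propagateActivity="true">
              <listeners>
                <add name="traceListener" />
              </listeners>
            </source>
            <source name="System.ServiceModel.MessageLogging">
              <listeners>
                <add name="traceListener" />
              </listeners>
            </source>
          </sources>
          <sharedListeners>
            <!-- both sources write to one .svclog file that SvcTraceViewer can open -->
            <add name="traceListener"
                 type="System.Diagnostics.XmlWriterTraceListener"
                 initializeData="C:\logs\WcfTrace.svclog" />
          </sharedListeners>
          <trace autoflush="true" />
        </system.diagnostics>
        <system.serviceModel>
          <diagnostics>
            <messageLogging logEntireMessage="true"
                            logMessagesAtServiceLevel="true"
                            logMalformedMessages="true" />
          </diagnostics>
          <!-- services, bindings and behaviors omitted -->
        </system.serviceModel>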

    Recently, I had been seeing random deserialization errors in one of my WCF services. After turning on full message logging, I was able to inspect the full SOAP message, and I saw nothing wrong with it. So what was the problem? By default, to prevent denial-of-service attacks, WCF places limits on message length, array length and string length. These can be configured in the binding configuration:

    [Screenshot: the binding configuration in the Service Configuration Editor]

    The default limit for the size of a string parameter is 8192 bytes. There were clients sending strings longer than that, so I had to increase the MaxStringContentLength attribute under the ReaderQuotas property. For the defaults of the other quotas, please visit this page on MSDN.
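    As a sketch, the relevant part of the binding configuration ends up looking something like this (the binding name and quota value are just examples); the endpoint then has to reference it through its bindingConfiguration attribute:

        <bindings>
          <wsHttpBinding>
            <binding name="LargeMessageBinding">
              <!-- maxStringContentLength defaults to 8192 -->
              <readerQuotas maxStringContentLength="65536" />
            </binding>
          </wsHttpBinding>
        </bindings>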

    WCF is big, and I cannot discuss all the various configuration options; there is no substitute for MSDN or a good book. I hope I have covered some common scenarios to help you get through the trouble quickly without spending lots of time navigating the MSDN documentation.

  • Take the pain out of Windows Communication Foundation (WCF) configuration (1)

    WCF is a very flexible technology, and the flexibility comes with complexity: there are many configurable pieces. It is usually relatively easy to start with an ASP.NET project and add a WCF service to it. However, as we start changing things and deploying the web service to production, many unexpected problems can occur. I have written many WCF web services, but I do not spend a high enough percentage of my time working with WCF, so I go through the same struggle each time I come back to it. I thought I would start documenting the experience to stop the pain.

    If I start with an empty website called website1 and add a WCF service called DummyService, Visual Studio adds DummyService.svc as well as the associated interface and implementation in the App_Code directory.

    [Screenshot: the generated files in Solution Explorer]

    The content of the .svc file looks like this:

    [Screenshot: the generated .svc file]

    Everything works fine and makes sense, but both the service interface and the implementation class live in the global namespace. Now suppose I would like to put the business logic in a library project: I add a library project with the namespace DummyLib and move the interface and implementation into it. I add a reference to System.ServiceModel in the DummyLib project and a project reference to DummyLib in my website. Now I need to change my .svc file. Obviously, I no longer need the CodeBehind attribute, since my code is compiled by the library project. But how do I reference the class? According to MSDN, the Service attribute should point to "Service, ServiceNamespace". Unfortunately, that does not work. Fortunately, like any other place where a type is referenced, I can use a type name of the form "Namespace.TypeName, AssemblyName". Look at the web.config file for a type attribute and you will see what I am talking about. So now my .svc file looks like this:

    [Screenshot: the updated .svc file]
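    In other words, something along these lines (a reconstruction using the DummyLib names from this example, not the original screenshot):

        <%@ ServiceHost Language="C#" Debug="true"
            Service="DummyLib.DummyService, DummyLib" %>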

    Is that enough? No. WCF stores very little information in the class and a lot of information in web.config; that is how it stays so configurable. So we need to modify web.config to reference both the namespaced interface and the namespaced class (two changes).

    [Screenshot: the updated web.config]
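    A sketch of the result (the behavior name is whatever Visual Studio generated; the two changes are marked in comments):

        <system.serviceModel>
          <services>
            <!-- change 1: the service name now carries the namespace -->
            <service name="DummyLib.DummyService" behaviorConfiguration="DummyServiceBehavior">
              <!-- change 2: the contract now carries the namespace -->
              <endpoint address="" binding="wsHttpBinding" contract="DummyLib.IDummyService" />
              <endpoint address="mex" binding="mexHttpBinding" contract="IMetadataExchange" />
            </service>
          </services>
          <behaviors>
            <serviceBehaviors>
              <behavior name="DummyServiceBehavior">
                <serviceMetadata httpGetEnabled="true" />
                <serviceDebug includeExceptionDetailInFaults="false" />
              </behavior>
            </serviceBehaviors>
          </behaviors>
        </system.serviceModel>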

    Now it is time to examine our web.config file. There is a service element that points to our implementation type. The service exposes two endpoints: one is our service endpoint, and the "mex" endpoint is for retrieving metadata, in our case WSDL. The service endpoint points to wsHttpBinding and the mex endpoint points to mexHttpBinding. Finally, the service points to a behavior that has httpGetEnabled set and does not include exception details in faults. Confused? It is not easy to create a configuration like this from scratch, but once Visual Studio has created it for us, we can feel our way around. Fortunately, the Windows SDK has a really nice tool called SvcConfigEditor. It normally lives in C:\Program Files\Microsoft SDKs\Windows\v7.0A\Bin or C:\Program Files (x86)\Microsoft SDKs\Windows\v7.0A\Bin. If I open my web.config file with SvcConfigEditor and look at the choices for the endpoint binding:

    [Screenshot: the endpoint binding choices in SvcConfigEditor]

    There are lots of choices. If we work with SOAP, the two common choices are basicHttpBinding and wsHttpBinding. basicHttpBinding supports an older SOAP standard; it does not support message-level security, but it is very compatible with all kinds of clients. wsHttpBinding supports the newer SOAP standard and many of the WS-* extensions. Now suppose we have legacy clients, so we need to use basicHttpBinding, and we also want to use https to secure the transport level. Let us see how many changes we need to make.

    First, basicHttpBinding does not support https by default, so we need to create a binding configuration to change the default. I click the Bindings node on the left and add a binding configuration called basicHttpBindingConfiguration. Then I click the Security tab and change the Mode from "None" to "Transport". Now I can return to the endpoint, and the configuration I just created is available to select in the BindingConfiguration attribute. Next, we need to go to the behavior and set HttpsGetEnabled="True". Lastly, we need to go to the "mex" endpoint and change its binding from mexHttpBinding to mexHttpsBinding. So to change from http to https, we need to make three modifications, and our web.config now looks like this:

    [Screenshot: the web.config after switching to basicHttpBinding over https]
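    Again as a reconstruction rather than the original screenshot, the relevant pieces with the three changes marked:

        <system.serviceModel>
          <bindings>
            <basicHttpBinding>
              <!-- change 1: a binding configuration that turns on transport (https) security -->
              <binding name="basicHttpBindingConfiguration">
                <security mode="Transport" />
              </binding>
            </basicHttpBinding>
          </bindings>
          <services>
            <service name="DummyLib.DummyService" behaviorConfiguration="DummyServiceBehavior">
              <endpoint address="" binding="basicHttpBinding"
                        bindingConfiguration="basicHttpBindingConfiguration"
                        contract="DummyLib.IDummyService" />
              <!-- change 3: metadata exchange over https -->
              <endpoint address="mex" binding="mexHttpsBinding" contract="IMetadataExchange" />
            </service>
          </services>
          <behaviors>
            <serviceBehaviors>
              <behavior name="DummyServiceBehavior">
                <!-- change 2: serve the WSDL over https -->
                <serviceMetadata httpsGetEnabled="true" />
                <serviceDebug includeExceptionDetailInFaults="false" />
              </behavior>
            </serviceBehaviors>
          </behaviors>
        </system.serviceModel>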

    In the next part, I will talk about WCF tracing and message logging; they really make troubleshooting easier.

  • Sharing session state over multiple ASP.NET applications with ASP.NET state server

    There are many posts about sharing session state using SQL Server, but there is little information about doing it with the out-of-process (state server) session state. The primary difficulty is that there is no official way. While the source code for the SQL Server session state provider is available in the ASP.NET provider toolkit, information on the state server is scarce. The method introduced in this post is based on public information as well as a little help from Reflector. I should warn that this method is a hack, as I use reflection to change private members; therefore it will not work in a partially trusted environment. Future versions of the OutOfProcSessionStateStore class could also break compatibility; readers should use the principle discussed in this post to find their own solution.

    I will first discuss how the ASP.NET state server used by out-of-proc session state works. According to the published protocol specification, the ASP.NET state server is essentially a mini HTTP server talking a customized flavor of HTTP. The key that ASP.NET uses to look up data is composed of an application id from HttpRuntime.AppDomainAppId, a hash of it, and the SessionId stored in the cookie. This is very similar to the SQL Server session state. Developers have been using reflection to modify the _appDomainAppId private variable before it is used by SqlSessionStateStore. However, this method does not work with OutOfProcSessionStateStore because of the way it was written.

    When the ASP.NET session state http module is loaded, it uses the mode attribute of the sessionState element in web.config to determine which provider to load. If the mode is "StateServer", the http module loads OutOfProcSessionStateStore and initializes it. The partial key it constructs is stored in the s_uribase private static variable. Since SessionStateModule loads very early in the pipeline, it is difficult to change the _appDomainAppId variable before OutOfProcSessionStateStore is loaded.

    The solution is to modify the data in OutOfProcSessionStateStore using reflection. According to the ASP.NET life cycle, an ASP.NET application can actually load multiple HttpApplication instances. For each HttpApplication instance, ASP.NET loads the entire set of http modules configured in machine.config and web.config, in no particular order. After loading the http modules, ASP.NET calls the Init method of the HttpApplication class, which can be overridden via global.asax. Therefore, we can access the loaded OutOfProcSessionStateStore in the Init method and modify it so that all of our ASP.NET applications use the same application id to access the state server. ASP.NET is already good at keeping the session id the same across applications; if that is not enough, we can supply a SessionIdManager.

    So we just need to place the following code in global.asax or its code-behind:
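    A sketch of the approach (updated per the 10/17/2011 note below) looks like this; the private member names come from this post and from Reflector, are internal implementation details, and could change in future framework versions:

        using System;
        using System.Reflection;
        using System.Web;

        public class Global : HttpApplication
        {
            // Every application that should share session state must present the same id
            // to the state server; the value itself is arbitrary but must match everywhere.
            private const string SharedAppId = "/LM/W3SVC/1/ROOT/SharedSessionApp";

            public override void Init()
            {
                base.Init();

                // HttpRuntime keeps its state in a private singleton (_theRuntime);
                // _appDomainAppId on that instance feeds the key sent to the state server
                // when OutOfProcSessionStateStore is initialized later.
                FieldInfo runtimeField = typeof(HttpRuntime)
                    .GetField("_theRuntime", BindingFlags.Static | BindingFlags.NonPublic);
                object theRuntime = runtimeField.GetValue(null);

                FieldInfo appIdField = typeof(HttpRuntime)
                    .GetField("_appDomainAppId", BindingFlags.Instance | BindingFlags.NonPublic);
                appIdField.SetValue(theRuntime, SharedAppId);
            }
        }

    Remember that the final key has the form AppId(Hash), and the hash involves the machine key, so an explicit, identical machineKey must also be configured in each application (see the edit below).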

    As you can see, this code works purely because we hacked it according to the way OutOfProcSessionStateStore was written. Future changes to the OutOfProcSessionStateStore class may break our code, or may make our life harder or easier. I hope a future version of ASP.NET will make it easier to control the application id used against the state server. In fact, Windows Azure already has a session state provider that allows sharing session state across applications.

    Edited on 10/17/2011:

    The previous version of the code only worked in IIS 6 or IIS 7 classic mode. I made a change so that the code works in IIS 7 integrated mode as well. What happens is that in IIS 7, when Application.Init() is called, Module.Init() has not been called yet, so the _store variable is not yet initialized. We can, however, change the _appDomainAppId of the HttpRuntime at this time, as it will be picked up later when the session store is initialized. The session store uses a key of the form AppId(Hash) to access the session; the hash is generated from the AppId and the machine key. To ensure that all applications use the same key, it is important to explicitly specify the machine key in machine.config or web.config rather than using an auto-generated key. See http://msdn.microsoft.com/en-us/library/ff649308.aspx for details, especially the section "Sharing Authentication Tickets Across Applications".

  • Theory and Practice of Database and Data Analysis (4) – Capturing additional information on relations

    So far in this series, I have been talking about traversing related data by navigating down referential constraints. The physical database makes no distinction about the nature of a foreign key, whether it is a parent-child relationship or another type of reference. In logical ER modeling, we often capture more information. So in this post, I will explore the idea of capturing additional information so that we can build more powerful tools.

    Let me restrict myself to the fictitious e-commerce company example introduced in part 1 of this series: we have an Orders table to capture the order header and an OrderDetails table to capture the line items. An order must be placed by a customer, and may or may not originate from a marketing campaign. An order may be fulfilled by one or more shipments, and each shipment would ship order line items in part or in full. So we have an ER diagram like the following:

    [ER diagram: Orders and OrderDetails, Shipments and ShipmentDetails, Customers, Products and Campaigns]

    In the ER diagram, there is a clear parent-child relationship between Orders and OrderDetails and between Shipments and ShipmentDetails. If we send the data in an XML file, each OrderDetail element would be a child of an Order element and each ShipmentDetail element would be a child of a Shipment element. However, it appears that the ShipmentDetails table has two parents: Shipments and OrderDetails. We pick the parent by the importance of the relationship, so we pick Shipments as the parent. We then need an additional mechanism to store the cross-references between ShipmentDetails and OrderDetails in our XML file.

    Let us examine the other relations. Customer is a mandatory attribute of the Orders table and Product is a mandatory attribute of OrderDetails. Campaign is an optional attribute of Orders. We can further classify Products as a system table and Customers as a data table.

    So we have classified the relations in our simple example into several categories: parent-child, cross-reference, mandatory attribute to a system table, mandatory attribute to a data table, and optional attribute.

    Now let us use an example to see how such a classification can help us. Suppose we are building a tool to export all the data related to a transaction from our production database into an XML file and then import it into our development database. The ID of an element could change from one system to another. That is not a problem for parent-child relationships: we get a new key when we insert the parent, and we can propagate it down to the children. However, we need to take extra care to ensure that cross-references are maintained.

    We do not have to worry about system tables, but we do have to worry about data tables. Our development database may or may not have the same customer. So from our classification, we can generically determine that for a mandatory reference to a data table, we need to take a copy of the referenced record and its children and check whether they exist in the destination database.

    Besides this categorization, other information that we would like to capture includes one-to-one relationships; some of them can be inferred, since a referential constraint that references the primary key in both tables implies a one-to-one relationship. We would also like to capture hidden relationships; these are real references but cannot be enforced by a foreign key constraint in SQL Server. For example, some tables contain an ID column that can store multiple types of IDs, with a separate column indicating the type of ID stored in the ID column.

    With human input of extra information beyond what we can extract from the physical database, we shall be able to build better tools. I will explore some of these tools in upcoming posts in this series.

  • Theory and Practice of Database and Data Analysis (3) – Exhaustive Search by Traversing Table Relationships

    In this post, I will discuss building a script to find all the relevant data starting from a table name and a primary key value. In the previous post, I discussed how to get the list of tables that reference a table. From a parent table, we just need to get the list of child tables. We then loop through each child table to select records using the parent primary key value. If the child tables themselves have child tables, we repeat the same approach by recursively getting the grandchildren.

    I often need to remote-desktop into client systems where I only have read privileges; I can neither create stored procedures nor execute them. So I am going to present a script that requires only read privileges.

    In a Transact-SQL (T-SQL) batch, looping through records can be implemented using a cursor or a temp table. I used the latter approach because a temp table can grow, which means I can use it for the recursion as well.

    Recursion is the harder problem here. Recursion can be done in most programming languages by calling procedures recursively, but that is not possible here since I am restricted to a batch. In computer science, recursion can be implemented with a loop and a stack. In many programming languages, when a caller calls a callee, the program saves the current local data as well as the return location of the caller onto the stack. The area in the stack used by each call is called a stack frame. Once the callee returns, the program restores the local data from the stack and resumes from the saved location.

    In my script, I use the temp table #ref_constraints to accomplish both the looping and the recursion. Referential constraints from the top table are added to the table as they are discovered and removed from the table when they are consumed (that is, no longer needed). I used depth-first traversal in my script; with this structure, I could switch to breadth-first traversal with minimal effort.

    The #keyvalues temp table contains the primary key values for the tables that I have already traversed. I can get the records from the child tables by joining to the key values in this table, so I only need to query each child table once for each foreign key relationship.

    To make the script simpler, I eliminated the schema and assumed all tables are under the "dbo" schema. This works with our database, and it also works with databases in Microsoft Dynamics CRM 2011.

    I have also assumed that all primary keys contain a single integer column. This is true for our database. It is also true for Microsoft Dynamics CRM 2011, except that you need to change the integer id to a guid.

    I added comments to the script so that one can see where the code would correspond to looping and to procedure calls if it were written in a language like VB.NET or C#. The comments also indicate the insertion points if additional code is needed.
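    The full script is longer than what fits here, but a simplified skeleton of the mechanism (temp tables standing in for the call stack, dynamic SQL for the per-table queries) might look like the following; the starting table, id and variable names are illustrative:

        -- Starting point: a table name and a primary key value (placeholders).
        DECLARE @topTable sysname = 'Orders';
        DECLARE @topId int = 12345;

        DECLARE @parent sysname, @child sysname, @parentCol sysname, @childCol sysname;
        DECLARE @sql nvarchar(max);

        -- Primary key values already collected for the tables we have traversed.
        CREATE TABLE #keyvalues (TableName sysname, KeyValue int);
        INSERT INTO #keyvalues VALUES (@topTable, @topId);

        -- This table plays the role of the call stack: one row per referential
        -- constraint that has been discovered but not yet consumed.
        CREATE TABLE #ref_constraints
        (
            ParentTable  sysname,
            ChildTable   sysname,
            ParentColumn sysname,
            ChildColumn  sysname
        );

        -- "Procedure call" for the top table: push its child constraints onto the stack.
        INSERT INTO #ref_constraints (ParentTable, ChildTable, ParentColumn, ChildColumn)
        SELECT pk.TABLE_NAME, fk.TABLE_NAME, pkc.COLUMN_NAME, fkc.COLUMN_NAME
        FROM INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS rc
        JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS fk ON fk.CONSTRAINT_NAME = rc.CONSTRAINT_NAME
        JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS pk ON pk.CONSTRAINT_NAME = rc.UNIQUE_CONSTRAINT_NAME
        JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE fkc ON fkc.CONSTRAINT_NAME = fk.CONSTRAINT_NAME
        JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE pkc ON pkc.CONSTRAINT_NAME = pk.CONSTRAINT_NAME
                                                    AND pkc.ORDINAL_POSITION = fkc.ORDINAL_POSITION
        WHERE pk.TABLE_NAME = @topTable;

        -- The loop: pop one constraint, query the child table, repeat (depth-first).
        WHILE EXISTS (SELECT 1 FROM #ref_constraints)
        BEGIN
            SELECT TOP 1 @parent = ParentTable, @child = ChildTable,
                         @parentCol = ParentColumn, @childCol = ChildColumn
            FROM #ref_constraints;

            -- Select the child records by joining to the parent keys collected so far.
            SET @sql = N'SELECT c.* FROM dbo.' + QUOTENAME(@child) + N' AS c'
                     + N' JOIN #keyvalues AS k ON k.TableName = @parentTable'
                     + N' AND k.KeyValue = c.' + QUOTENAME(@childCol) + N';';
            EXEC sp_executesql @sql, N'@parentTable sysname', @parentTable = @parent;

            -- Insertion point: in the real script the child's key values are added to
            -- #keyvalues and the child's own child constraints are pushed here
            -- (the equivalent of the recursive call).

            -- Pop the consumed constraint.
            DELETE FROM #ref_constraints
            WHERE ParentTable = @parent AND ChildTable = @child
              AND ParentColumn = @parentCol AND ChildColumn = @childCol;
        END

        DROP TABLE #ref_constraints;
        DROP TABLE #keyvalues;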

    In future posts in this series, I will discuss how to capture human intelligence to make the search even more powerful.

  • The case for living in the cloud

    These days one cannot go a day without hearing about the cloud; vendors are pushing it hard. Google's Chrome Book is already nothing but the cloud. Apple's iOS and Microsoft's upcoming Windows 8 also have increasing cloud features. Until now, I have been skeptical about the cloud. Apart from security, my primary concern has been what happens if I lose my connection to the internet.

    Recently, I have been migrating to the cloud. The main reason is that I triple-boot my laptop now. I used to run Windows 7 as my primary OS and everything else in virtual machines. However, the performance of those virtual machines has been less than ideal. Windows 7 Virtual PC cannot run a 64-bit OS, so I had to rely on VirtualBox to run Windows 2008 R2. Recently, I have been booting Windows 8 Developer Preview and SharePoint/Windows 2008 from VHD. This is currently my most satisfying configuration: although booting from VHD is slightly slower than from the real hard drive, I have full access to the rest of the hardware. The issue I face now is that I need to access my data no matter which OS I boot into. That motivated me to migrate to the cloud.

    My primary data are my email, documents and code.

    Email is the least of my concerns. Both Google and Hotmail have been offering ample space for me to store my email. I do download a copy of my email into Windows 7 so I can access it when I do not have access to the internet.

    Documents have also become easier. Google has been offering Google Docs and web-based document editing for a while. Recently, Microsoft started offering 25GB of free space on Windows Live SkyDrive. From SkyDrive, I can edit documents using Office Web Apps, so accessing and editing documents from a boot OS that does not have Office installed is no longer a concern. From Office 2010, it is possible to save documents directly to SkyDrive. In addition, Windows Live Mesh can automatically sync the local file system with SkyDrive.

    Lastly, there are many free online source code versioning services. Microsoft CodePlex, Google Code and GitHub, just to name a few, all offer free source code hosting for open source projects. Other vendors such as Bitbucket also offer source control for closed source projects. Open source version control clients such as TortoiseHg or TortoiseSVN are easy to get. The free Visual Studio Express is also becoming more useful for real-world projects; the recent Windows 8 Developer Preview has Visual Studio Express 11 preinstalled.

    So my conclusion is that the currently available services and software are sufficient for my needs, and I am ready to live in the cloud.

  • Theory and Practice of Database and Data Analysis (2) – Navigating the table relationships

    In the first part of the series, I discussed how to use SQL Server metadata to find database objects. In this part, I will discuss how to find all the data related to a transaction by navigating the table relationships. Let us suppose that the transaction is a purchase order stored in a table called Orders. Anyone who has seen the Northwind database knows that it has a child entity called OrderDetails. In even a modest real-world database, the complexity can grow very fast. If we can ship a partial order and put the rest on back order, or if we need to ship from multiple warehouses, we could have multiple Shipping records for each Order record and have each ShippingDetails record linked to an OrderDetails record. Customers can return an order, in full or in part, with or without a return authorization number, and the actual return may or may not match the return authorization. The system needs to be flexible enough to capture all the events related to the order. As you can see, a very modest order transaction could easily grow to a dozen tables. If you are not familiar with the database, how can you find all the related tables? If you need to delete a record, how do you delete it cleanly? If you need to copy a transaction from a production system to a development system, how do you copy the entire set of data related to the transaction?

    Fortunately, in any well-designed database, table relationships can be navigated through the foreign key relationships. In SQL Server, we can usually find this information using the sp_help stored procedure. Here we will use the INFORMATION_SCHEMA views instead to obtain more refined results that we can use in our tools. The TABLE_CONSTRAINTS view contains the primary and foreign keys on a table. The REFERENTIAL_CONSTRAINTS view contains the name of the foreign key on the many side and the name of the unique constraint on the one side. It is possible to navigate the table relationships using these two views. The following query finds all the tables that reference the Sales.SalesOrderHeader table in the AdventureWorks database:
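    A query along these lines does the job (a sketch, not necessarily word for word the query from the original post):

        SELECT fk.TABLE_SCHEMA, fk.TABLE_NAME, rc.CONSTRAINT_NAME
        FROM INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS AS rc
        JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS AS fk
          ON fk.CONSTRAINT_NAME = rc.CONSTRAINT_NAME
         AND fk.CONSTRAINT_SCHEMA = rc.CONSTRAINT_SCHEMA
        JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS AS pk
          ON pk.CONSTRAINT_NAME = rc.UNIQUE_CONSTRAINT_NAME
         AND pk.CONSTRAINT_SCHEMA = rc.UNIQUE_CONSTRAINT_SCHEMA
        WHERE pk.TABLE_SCHEMA = 'Sales'
          AND pk.TABLE_NAME = 'SalesOrderHeader'
        ORDER BY fk.TABLE_SCHEMA, fk.TABLE_NAME;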

    Here are the results:
    [Screenshot of query results omitted]

    The following query returns all the tables referenced by the Sales.SalesOrderHeader table:
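    The reverse direction just swaps which side of the referential constraint we filter on; as a sketch:

        SELECT pk.TABLE_SCHEMA, pk.TABLE_NAME, rc.CONSTRAINT_NAME
        FROM INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS AS rc
        JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS AS fk
          ON fk.CONSTRAINT_NAME = rc.CONSTRAINT_NAME
         AND fk.CONSTRAINT_SCHEMA = rc.CONSTRAINT_SCHEMA
        JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS AS pk
          ON pk.CONSTRAINT_NAME = rc.UNIQUE_CONSTRAINT_NAME
         AND pk.CONSTRAINT_SCHEMA = rc.UNIQUE_CONSTRAINT_SCHEMA
        WHERE fk.TABLE_SCHEMA = 'Sales'
          AND fk.TABLE_NAME = 'SalesOrderHeader'
        ORDER BY pk.TABLE_SCHEMA, pk.TABLE_NAME;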

    And the results:

    [Screenshot of query results omitted]

    The Person.Address table appeared twice because we have two columns referencing the table.

    The CONSTRAINT_COLUMN_USAGE view contains the columns in the primary and the foreign keys. By using these 3 views, we can construct a simple tool that drills down into the data. In the next part of this series, we will construct such a tool.

    Added by Li Chen 9/26/2011

    The KEY_COLUMN_USAGE view also contains the columns in the primary and foreign keys. I prefer this view because it also contains ORDINAL_POSITION, which I can use to match the columns of the primary and foreign keys when we have composite keys. The following example shows how to match the columns; the last line in the query eliminates the records from a table joining to itself.
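    A sketch of such a query, matching the foreign key columns to the referenced key columns by ordinal position:

        SELECT fkc.TABLE_SCHEMA, fkc.TABLE_NAME, fkc.COLUMN_NAME,
               pkc.TABLE_SCHEMA AS REF_TABLE_SCHEMA,
               pkc.TABLE_NAME   AS REF_TABLE_NAME,
               pkc.COLUMN_NAME  AS REF_COLUMN_NAME
        FROM INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS AS rc
        JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE AS fkc
          ON fkc.CONSTRAINT_NAME = rc.CONSTRAINT_NAME
         AND fkc.CONSTRAINT_SCHEMA = rc.CONSTRAINT_SCHEMA
        JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE AS pkc
          ON pkc.CONSTRAINT_NAME = rc.UNIQUE_CONSTRAINT_NAME
         AND pkc.CONSTRAINT_SCHEMA = rc.UNIQUE_CONSTRAINT_SCHEMA
         AND pkc.ORDINAL_POSITION = fkc.ORDINAL_POSITION
        WHERE fkc.TABLE_NAME <> pkc.TABLE_NAME;  -- eliminate a table joining to itself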

  • Theory and Practice of Database and Data Analysis (1) – Searching for objects

    Recently, I have had to spend a significant portion of my time on production data support. Since we have a full-featured agency management system, I have to deal with parts of the application that I am not familiar with. Through accumulated experience and improved procedures, I have been able to locate problems with increasing speed. In this series, I will try to document my experiences and approaches. In this first part, I will discuss how to search for objects.

    Suppose we were told that there are problems in the address data. The very first thing we need to do is locate the table and the field that contain the data. Microsoft SQL Server has a set of sys* tables. It also supports the ANSI-style INFORMATION_SCHEMA views. In order to make the knowledge discussed here reusable with other database systems, I will use INFORMATION_SCHEMA as much as possible. So to find any table related to "Address", we can use the following query:
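    As a sketch:

        SELECT TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE
        FROM INFORMATION_SCHEMA.TABLES
        WHERE TABLE_NAME LIKE '%Address%'
        ORDER BY TABLE_SCHEMA, TABLE_NAME;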

    Using the AdventureWorks 2008 sample database, it yields the following results:

    [Screenshot of query results omitted]

    If we suspect there are columns in other tables related to address, we can search for the columns with the following query:
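    Again as a sketch:

        SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE
        FROM INFORMATION_SCHEMA.COLUMNS
        WHERE COLUMN_NAME LIKE '%Address%'
        ORDER BY TABLE_SCHEMA, TABLE_NAME, ORDINAL_POSITION;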

    We will get the following results this time:

    [Screenshot of query results omitted]

    Suppose we need to find all the stored procedures and functions that reference a table; we can use the sp_depends stored procedure. If we need to search more thoroughly using a string, we can use the following query:
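    One way to do it (a sketch; note that ROUTINE_DEFINITION only holds the first 4000 characters of the body):

        SELECT ROUTINE_SCHEMA, ROUTINE_NAME, ROUTINE_TYPE
        FROM INFORMATION_SCHEMA.ROUTINES
        WHERE ROUTINE_DEFINITION LIKE '%Address%';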

    However, we cannot find triggers through INFORMATION_SCHEMA; we have to use sysobjects:
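    For example (a sketch):

        SELECT tr.name AS TriggerName,
               OBJECT_NAME(tr.parent_obj) AS TableName
        FROM sysobjects AS tr
        WHERE tr.type = 'TR';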

    Once we find the triggers, we can view their text using sp_helptext or by querying the syscomments table.

    In the next part of the series, I will discuss navigating the table relationships.

  • ASP.NET Web Application Project (WAP) vs Website

    Recently, we moved a fairly large web application to a Website project. This is a very old decision point that we have gone back and forth on a few times. A Bing search shows two well-written entries early in the list:

    Here is how we reached the current decision:

    • Back in VS2002/VS2003, it was WAP. No choice.
    • In VS2005, it was Website. There was no choice initially, until WAP was added back to VS2005.
    • In VS2008, we moved to WAP. The main reason was that website projects do not have a project file to store assembly and project references. Project references are stored in the solution file, references to GAC assemblies are stored in web.config, and non-GAC DLL references go through a bunch of .refresh files that point to the location of the DLL files (library). This makes checking the solution file into version control important. The system is fragile and breaks easily.
    • Now we are moving back to Website. We understand the website reference mechanism better now, so it is less frustrating. In WAP, the entire code-behind is compiled into a single DLL; in a website, compilation can be much more granular, which works better in industries (such as financial services) that require a clean audit trail of artifacts. We have several thousand web pages. WAP requires us to compile the entire application before debugging, while with Website we can compile just a single page. In general, with the adoption of MVVM, most of the business logic now resides in separate library DLLs, so we do not have much code other than the data-binding code left in the code-behind. This gives us fine control over the granularity of compilation and deployment units.