Clean up inactive anonymous users from ASP.NET Membership tables
ASP.NET 2.0 websites that allow anonymous visits and anonymous user profiles face a unique challenge: cleaning up the unused data generated by anonymous users who never come back. Every first visit creates an anonymous user, a page setup, and other user-specific content. If the user never comes back, that data still stays in the database permanently. The user might come back within a day, a week, or a month, but there is no guarantee that they ever will. Generally, sticky users are at most 30% of the total users who come to most websites, so you end up with 70% of the data never being needed. All of this requires cleanup; otherwise the database keeps growing uncontrollably and gets slower and slower.
This cleanup operation is humongous for busy websites. Think about deleting millions of rows from several tables, one after another, while maintaining foreign key constraints. The cleanup also needs to run while the site is live, without hampering the site's overall performance. The whole operation leaves heavily fragmented indexes and free space inside the MDF file. The log file also becomes enormous, because it has to keep track of the transactions. Hard drives get really hot and start sweating furiously. While the CPU keeps smiling, having nothing to do with it, it's really painful to watch SQL Server go through this every day. But unless you clean up the database and keep its size under control, you can't keep up with SQL Server's RAM and disk IO requirements.
When a user visits the site, the ASP.NET Membership Provider updates the LastActivityDate column of the aspnet_Users table. From this field, I can find out how long the user has been idle. The IsAnonymous bit field tells me whether the user account is anonymous or registered. If it is registered, there is no need to worry. But if it is anonymous and has been idle for more than 30 days, I can be sure that the user will never come back, because the cookie has already expired. Logging out adds to the trash too. If you repeatedly log out from your start page, all cookies related to the site get cleared. That means you produce one new anonymous user record on every logout. That anonymous record is never used, because you will soon log in to get your customized pages back, and then you will log out again. This results in yet another anonymous user account, which again becomes useless as soon as you log in.
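To get a feel for how much trash has piled up, the two columns described above can be queried directly. This is only a rough sketch: the 30-day threshold comes from the cookie lifetime mentioned above, and the use of GETUTCDATE() assumes that LastActivityDate is stored in UTC, which is how the standard SQL membership provider writes it as far as I know.

```sql
-- Assumed idle threshold in days (matches the cookie lifetime discussed above)
DECLARE @Days int
SET @Days = 30

-- One row per account type: IsAnonymous = 1 for anonymous, 0 for registered.
-- IdleUsers counts accounts with no activity during the last @Days days.
SELECT IsAnonymous,
       COUNT(*) AS TotalUsers,
       SUM(CASE WHEN LastActivityDate < DATEADD(day, -@Days, GETUTCDATE())
                THEN 1 ELSE 0 END) AS IdleUsers
FROM aspnet_Users
GROUP BY IsAnonymous
```

The IdleUsers figure on the anonymous row is roughly the number of rows the cleanup would eventually remove from aspnet_Users.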
Here’s how the whole cleanup process works:
- Find the users who have been idle long enough to be discarded
- Find the pages those users have
- Delete all the widget instances on those pages
- Then delete those pages
- Remove the rows for those users from the child tables related to aspnet_Users, like aspnet_Profile, aspnet_UsersInRoles, and aspnet_PersonalizationPerUser
- Remove the users from aspnet_Users
- Pray that you did not accidentally remove any good user
Here’s the giant DB script which does it all. I have put in enough inline comments so that you can understand what the script is doing:
-- Number of days of inactivity after which we give users 'bye bye'
DECLARE @Days int
SET @Days = 14

-- Number of users to delete per run. If it's too high, the database will get
-- stuck for a long time. If it's too low, you will end up having more trash
-- than you can clean up
DECLARE @NoOfUsersToDelete int
SET @NoOfUsersToDelete = 1000

-- Create temporary tables which hold the users and pages to delete.
-- As the user and page IDs are needed to find rows in other tables, instead
-- of running SELECT ID FROM ... repeatedly, it's better to have them
-- in a temp table
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[PagesToDelete]') AND type in (N'U'))
DROP TABLE [dbo].[PagesToDelete]
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[AspnetUsersToDelete]') AND type in (N'U'))
DROP TABLE [dbo].[AspnetUsersToDelete]

create table PagesToDelete (PageID int NOT NULL PRIMARY KEY)
create table AspnetUsersToDelete (UserID uniqueidentifier NOT NULL PRIMARY KEY)

-- Find out inactive anonymous users and store the UserID in the temporary
-- table. LastActivityDate is written in UTC by the membership provider,
-- so compare against GETUTCDATE() rather than the server's local time
insert into AspnetUsersToDelete
select top(@NoOfUsersToDelete) UserID from aspnet_Users where
(IsAnonymous = 1) and (LastActivityDate < DATEADD(day, -@Days, GETUTCDATE()))
order by UserID -- Saves SQL Server from sorting in clustered index again

print 'Users to delete: ' + convert(varchar(255), @@ROWCOUNT)
GO

-- Get the pages of the users which will be deleted
insert into PagesToDelete
select ID from Page where UserID in
(
select UserID from AspnetUsersToDelete
)

print 'Pages to delete: ' + convert(varchar(255), @@ROWCOUNT)
GO

-- Delete all Widget instances on the pages to be deleted
delete from WidgetInstance where PageID IN
( SELECT PageID FROM PagesToDelete )

print 'Widget Instances deleted: ' + convert(varchar(255), @@ROWCOUNT)
GO

-- Delete the pages
delete from Page where ID IN
( SELECT PageID FROM PagesToDelete )
GO

-- Delete User Setting
delete from UserSetting WHERE UserID IN
( SELECT UserID FROM AspnetUsersToDelete )
GO

-- Delete profile of users
delete from aspnet_Profile WHERE UserID IN
( SELECT UserID FROM AspnetUsersToDelete )
GO

-- Delete from aspnet_UsersInRoles
delete from aspnet_UsersInRoles WHERE UserID IN
( SELECT UserID FROM AspnetUsersToDelete )
GO

-- Delete from aspnet_PersonalizationPerUser
delete from aspnet_PersonalizationPerUser WHERE UserID IN
( SELECT UserID FROM AspnetUsersToDelete )
GO

-- Delete the users
delete from aspnet_Users where UserID IN
( SELECT UserID FROM AspnetUsersToDelete )

print 'Users deleted: ' + convert(varchar(255), @@ROWCOUNT)
GO

-- Drop the working tables
drop table PagesToDelete
drop table AspnetUsersToDelete
GO
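Instead of only praying, a quick sanity check can be run after AspnetUsersToDelete has been filled and before the first delete statement. The sketch below is not part of the script above. It assumes the standard membership schema, where registered users have a row in aspnet_Membership and anonymous users do not.

```sql
-- Count registered users that slipped into the delete list.
-- Expected result: 0. Anything else means the selection query needs a second look.
SELECT COUNT(*) AS RegisteredUsersInDeleteList
FROM AspnetUsersToDelete d
INNER JOIN aspnet_Users u ON u.UserId = d.UserID
LEFT JOIN aspnet_Membership m ON m.UserId = d.UserID
WHERE u.IsAnonymous = 0        -- flagged as a registered account
   OR m.UserId IS NOT NULL     -- has membership credentials
```

If the count is not zero, stop there and drop the two working tables without running the delete statements.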
Now the question is: when should I run this script? It depends on several factors:
- The lowest traffic period. For example, midnight in the USA, when everyone there is sleeping, if the majority of your users are from the USA.
- The period when no other maintenance task, like Index Defrag or Database Backup, is running. If by any chance another maintenance task conflicts with this enormous delete operation, SQL Server is dead.
- The operation will take from 10 minutes to hours, depending on the volume of trash to clean up. So, consider how long this script runs and plan other maintenance jobs accordingly.
- It’s best to run it 30 minutes before the INDEX DEFRAG jobs run. After the script completes, the tables will be heavily fragmented, so you need to defrag the indexes.
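To see how badly the cleanup has fragmented a table, and to defragment it afterwards, something like the following can be used on SQL Server 2005 or later. It is a sketch for aspnet_Users only; the same statements would have to be repeated for the other tables the script touches, and whether REORGANIZE is enough or a full REBUILD is needed depends on your own fragmentation numbers.

```sql
-- Fragmentation of every index on aspnet_Users in the current database
SELECT i.name AS IndexName,
       s.avg_fragmentation_in_percent,
       s.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.aspnet_Users'), NULL, NULL, 'LIMITED') AS s
INNER JOIN sys.indexes AS i
    ON i.object_id = s.object_id AND i.index_id = s.index_id
ORDER BY s.avg_fragmentation_in_percent DESC
GO

-- Defragment all indexes on the table; REORGANIZE runs online
ALTER INDEX ALL ON dbo.aspnet_Users REORGANIZE
GO
```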
Before running this script, there are some preparations to make:
- Make sure you have turned off AutoShrink in the database properties. The amount of data in the database drops after the cleanup, and if SQL Server tries to shrink the database, there will be heavy IO activity. Shrinking is pointless anyway, because the database will grow again.
- Make sure the log file’s initial size is big enough to hold such enormous transactions. You can specify one third of the MDF size as the LDF’s initial size. Also make sure the log file is not shrunk. Let it occupy the hard drive space. This saves SQL Server from repeatedly expanding and shrinking the file, both of which require high disk IO.
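Both preparations can also be done from a query window. In the sketch below, MyPortalDb and MyPortalDb_log are made-up names for the database and its logical log file, and 10GB is only a placeholder for "one third of the MDF size", so substitute your own values.

```sql
-- Check and turn off auto shrink (database name is hypothetical)
SELECT name, is_auto_shrink_on FROM sys.databases WHERE name = N'MyPortalDb'
GO
ALTER DATABASE [MyPortalDb] SET AUTO_SHRINK OFF
GO

-- Current file sizes in MB (sys.database_files reports size in 8 KB pages)
USE [MyPortalDb]
GO
SELECT name, type_desc, size * 8 / 1024 AS SizeMB FROM sys.database_files
GO

-- Pre-size the log file. The new SIZE must be larger than the current size,
-- otherwise SQL Server rejects the statement.
ALTER DATABASE [MyPortalDb]
MODIFY FILE (NAME = N'MyPortalDb_log', SIZE = 10GB)
GO
```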
Once the cleanup job and the INDEX DEFRAG have run, database performance improves significantly. The tables are now smaller, which means the indexes are smaller too, so SQL Server no longer needs to run through large indexes. Future INDEX DEFRAGs take less time because there is not much data left to optimize. SQL Server also needs less RAM because it has to work with a much smaller amount of data. Database backups get smaller as well, because the MDF no longer keeps growing indefinitely. As a result, the significant overhead of this cleanup operation is quite acceptable when compared to all the benefits.
Note: I will be posting some stuff from my old blog to the new blog. Please ignore it if you have read it before.