SharePoint statistics: source processing

In post http://weblogs.asp.net/soever/archive/2005/05/21/408207.aspx I did some investigations into the information logged by SharePoint in the IIS and STS log files. In this post I describe some decisions I’m going to make on processing these log files, based on the information that became available during my investigations. I’m writing these blog posts while doing these investigations, so if you have any comments on the decisions I make, please let me know!!

Goal of this weblog post is to massage the available log data into a format that can easily be processed for importing into the “Stage Area – IN”, a SQL server (2005) database where we import all source data that will eventually en up into out data warehouse.

STS logs

First of all we need a tool to convert the STS binary log files to a format that can easily be processed. The article Usage Event Logging in Windows SharePoint Services contains the code for a C++ application to do this conversion. I also got my hands on a C# implementation through Steven Kassim, a colleague of mine. He got this code from a newsgroup, but I couldn’t find where it exactly came from, and who wrote it. I’m doing some modifications to the code to change the output format (so LogParser can handle it), and to improve the speed. I will publish the code as soon as I’m ready. [Update: tracked down the newsgroup: http://groups.yahoo.com/group/sharepointdiscussions/, and the author: Fred LaForest].

IIS logs

Although the IIS log files are already in a format that could be easily parsed, there are some good reasons to do a preprocessing parse to accomplish the following:

  • Handle the problem of the IIS log header appearing in the log file on each IIS-RESET
  • Filter out log entries we are not interested in:
    • Requests made by service account, like the full text indexer account
    • Request to assets in team sites resulting in  /_vti_bin/
    • Requests made to assets we are not interested in, like javascript files, css stylesheet, images, etc.
  • Filter out fields we are not interested in, like in our case the client IP address, be base the location on the main loacation of a user in the company directory (can also be done through IIS by only selecting the properties in our log file that we are interested in!)

IIS supports multiple log formats, and multiple ways to log information. It is possible to do direct ODBC logging to a database, but this approach gives a heavier load on the web servers. The best format IIS can log in is the W3C Extended Log File Format. In this log format it is possible to select the fields we are interested in:

W3ClogProperties

Carefully selecting the the properties we are interested in can greatly reduce the amount of data that will be logged.

For more information on the W3C Extended Log File Format see:

Processing the log files: the tool

There are many good systems around to process log files. Two log file processors I would really like to mention are:

I have selected LogParser, because of its following features:

  • It supports any log file format (handy for the STS log files)
  • It might even be possible to implement direct binary parsing of the STS log files through a custom component into LogParser (still investigating this)
  • It support incremental input parsing through checkpoints, which simplifies incrementally importing of log file data into our database
  • It has a powerful query syntax
  • It is very powerful in its supported output formats
  • There is extensive programmability support available

 For more information on LogParser see:

For information on LogParser with respect to SharePoint, where direct reporting on the log files is done see:

Back to the IIS log, what do we need

As stated in the previous post, in the STS log all successful requests to all pages and documents that are within WSS sites are logged. This includes WSS site based SPS things like MySite and Areas. All those request are logged in the IIS log as well, and they are difficult to correlate due to time differences. It is also the question if it is interesting to correlate those log entries, the STS log contains all the information that we need… although… I have one issue: the bandwidth consumed by the request. I can’t get the correct value out of the STS log (although it should be in there), while the IIS log contains the correct values (sc-bytes = cs-bytes). This would be the only reason to do the correlation. I’m still working on this issue (I post on this later), so lets assume that problem will be solved.

So where do we need the IIS logs for:

  • Pages not found (404 errors)
  • Pages in the /_layouts folder, this is also the location where we store our custom web applications and our custom services
  • Unmanaged paths in the SharePoint virtual directory (paths excluded for the SharePoint render-engine “treatment”)
  • IIS logs of other web sites, not related to SharePoint, but part of our intranet

Any requests for images, javascript files and stylesheet files in the IIS log can be skipped in our case, because those files are static files, supporting the SharePoint UI and our custom applications. We also filter out requests made by service account, we are not interested in those reuqests.

In the STS log requests for images are interesting, because these images are user uploaded documents within the WSS sites.  We do filter out request made by service accounts as well for the the STS logs.

Moving IIS log files into the database

To move the IIS log files into the database we need a table definition for the IIS logs. I’m currently using the following table definition:

CREATE TABLE [dbo].[IISlog] (
 [date] [datetime] NULL,
 [time] [datetime] NULL,
 [csUsername] [varchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
 [sComputername] [varchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
 [csMethod] [varchar](16) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
 [csUriStem] [varchar](2048) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
 [csUriQuery] [varchar](2048) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
 [scStatus] [smallint] NULL,
 [scSubstatus] [smallint] NULL,
 [scWin32Status] [int] NULL,
 [scBytes] [int] NULL,
 [csBytes] [int] NULL,
 [timeTaken] [int] NULL,
 [csHost] [varchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
 [csUserAgent] [varchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
 [csReferer] [varchar](2048) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
 [application] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
) ON [PRIMARY]

And the following LogParser script to move the data from the log files to the database:

"C:\Program Files\Log Parser 2.2\logparser.exe" "SELECT date, time, cs-username, s-computername, cs-method, cs-uri-stem, cs-uri-query, sc-status, sc-substatus, sc-win32-status, sc-bytes, cs-bytes, time-taken, cs-host, cs(User-Agent) as cs-User-Agent, cs(Referer) as cs-Referer, 'SharePointPortal' as application INTO IISlog FROM c:\projects\IISlog\*.log WHERE (cs-username IS NOT NULL) AND (TO_LOWERCASE(cs-username) NOT IN ('domain\serviceaccount'))" -i:IISW3C -o:SQL -server:localhost -database:SharePoint_SA_IN -clearTable:ON

This is the first step where I filter out all request made by the system account used to index the SharePoint content. I did not do the filtering out of the WSS sites requests (we will use the STS log for this) and the unwanted files in the /_layouts/ directory yet. I’m moving one step at a time. So we now have all log files (collected into the directory c:\projects\IISlog) moved into the database.

Moving STS log files into the database

To move the STS log files into the database we need a table definition for the STS logs. I’m currently using the following table definition:

CREATE TABLE [dbo].[STSlog](
 [application] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
 [date] [datetime] NULL,
 [time] [datetime] NULL,
 [username] [varchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
 [computername] [varchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
 [method] [varchar](16) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
 [siteURL] [varchar](2048) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
 [webURL] [varchar](2048) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
 [docName] [varchar](2048) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
 [bytes] [int] NULL,
 [queryString] [varchar](2048) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
 [userAgent] [varchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
 [referer] [varchar](2048) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
 [bitFlags] [smallint] NULL,
 [status] [smallint] NULL,
 [siteGuid] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
) ON [PRIMARY]

And the following script to move the data from the binary log files to the database:

"C:\projects\STSLogParser\STSLogParser.exe" 2005-01-01 "c:\projects\STSlog\2005-01-01\00.log"  c:\projects\logparsertmp\stslog.csv
"C:\Program Files\Log Parser 2.2\logparser.exe" "SELECT 'SharePointPortal' as application, TO_DATE(TO_UTCTIME(TO_TIMESTAMP(TO_TIMESTAMP(date, 'yyyy-MM-dd'), TO_TIMESTAMP(time, 'hh:mm:ss')))) AS date, TO_TIME( TO_UTCTIME( TO_TIMESTAMP(TO_TIMESTAMP(date, 'yyyy-MM-dd'), TO_TIMESTAMP(time, 'hh:mm:ss')))), UserName as username, 'SERVERNAME' as computername, 'GET' as method, SiteURL as siteURL, WebURL as webURL, DocName as docName, cBytes as bytes,  QueryString as queryString, UserAgent as userAgent, RefURL as referer, TO_INT(bitFlags) as bitFlags, TO_INT(HttpStatus) as status, TO_STRING(SiteGuid) as siteGuid INTO STSlog FROM c:\projects\logparsertmp\stslog.csv WHERE (username IS NOT NULL) AND (TO_LOWERCASE(username) NOT IN (domain\serviceaccount))" -i:CSV -headerRow:ON -o:SQL -server:localhost -database:SharePoint_SA_IN -clearTable:ON
 

This script currently moves only one day, but you get the drift. As you can see we also set day, computername and application in the log file. Currently using fixed values, we will move this into a dynamic system later on. The date field is obvious, we want to record the date into the database for each log entry. We need the computer and application fields because we will have multiple servers, and multiple “applications” build on SharePoint, like for example ‘SharePointPortal’, ‘TeamSites’ (Intranet) and ‘ExternalTeamSites’ (Extranet).

The STSLogParser is an application to parse the STS log file from it’s binary format into a comma serperated ASCII log file. I will post the code for this converter in one of my next posts.

62 Comments

  • The best hearts are always the bravest.

    -----------------------------------

  • To be both a speaker of words and a doer of deeds.

    -----------------------------------

  • -----------------------------------------------------------
    "You possess a genuinely fascinating blog. As well many blogs that I see now don't genuinely offer anything that I'm enthusiastic about, but I am definately considering this 1. Just imagined that I'd pass that message on. "

  • Couldn't be written any better. Reading this post reminds me of my old room mate! He generally kept talking about this. I will forward this article to him. Pretty certain he will have an excellent read. Thanks for sharing!

  • Being a newbie, I just exploring via google to get article that will help me, thanks!

  • composed by hsm 2012-05-28

  • composed by hsm 2012-06-05

  • DSGASDGSADADSFHGADFS FGBNFSDGSADADFHGAD
    SDGSDZSDGASDASDGHASD ERYERASDGASDASDGHASD
    YUYZSDGASDASDGHASD ASFDZSDGASDSDGASD
    ASFDSDGSADDSFGHADS YUYASDGASDADFHAD

  • ASFDADFHGDAFSDFH ZVXZASDGASDDSFGHADS
    DSGAASDGASDASDFHGAD ADFHGADFHGDAFASDGHASD
    ZVXZADFHGDAFADFHAD QWERSDGSADGADFHGAD
    FGBNFSDGSADADFHGAD ZVXZSDGSADASDGHASD

  • ERYERADFHGDAFADFHGAD YUKYZSDGASDADFHGAD
    GJTRADFHGDAFASDFHGAD GJTRASDGASDADFHAD
    GJTRSDGSADASDGHASD ASFDASDGASDSDFH
    GJTRSDGSADGDFHAD GJTRASDGASDASDFHGAD

  • ADFHGASDGASDSDAFHSAD FGBNFZSDGASDSDFH
    YUYSDGSADASDGHASD FGBNFADFGASDGDSFGHADS
    FGBNFSDGSADGASDFHGAD DSGASDGSADADFHGAD
    ZVXZSDGSADGADFHGAD QWERADFGASDGASDGHASD

  • djrdb aaron rodgers jersey
    offkv brian urlacher jersey
    ksdko j.j. watt jersey
    lames randall cobb jersey
    ezsmq t.j. yates jersey

  • You need to take part in a contest for one
    of the greatest websites on the internet. I am going to highly
    recommend this blog!

  • What's up to all, it's in fact a fastidious for me to pay a quick visit this site, it contains priceless Information.

  • You made some nice points Walter Payton Jersey there.
    I did a search on the topic and found most guys Jay Cutler Jersey will consent with your site.

  • I was recommended this website through my cousin.
    I am not positive whether or not this submit is written by him as no one else understand
    such designated approximately my difficulty.
    You're wonderful! Thanks!

  • I'm not sure exactly why but this site is loading very slow for me. Is anyone else having this problem or is it a problem on my end? I'll check back
    later on and see if the problem still exists.

  • This is my first time go to see at here and i am in fact happy to read everthing at one place.

  • Heya! I'm at work surfing around your blog from my new iphone 4! Just wanted to say I love reading your blog and look forward to all your posts! Carry on the superb work!

  • Awesome! Its in fact amazing paragraph, I have got much clear idea concerning from this paragraph.

  • Have you ever considered about adding a little bit more than just your articles?
    I mean, what you say is important and all. But
    think about if you added some great images or video clips to give your posts more, "pop"!

    Your content is excellent but with pics and video clips, this website could undeniably be one of
    the greatest in its field. Wonderful blog!

  • This piece of writing will help the internet viewers for building up new web site or even a blog from start to end.

  • Thanks in favor of sharing such a nice thought, post is fastidious, thats
    why i have read it entirely

  • Hello, Neat post. There's a problem with your site in web explorer, may test this? IE nonetheless is the marketplace chief and a huge part of other people will miss your magnificent writing due to this problem.

  • You actually make it seem so easy with your presentation but I find this topic to
    be really something which I think I would never understand.
    It seems too complicated and very broad for me. I am
    looking forward for your next post, I will try to get the hang of it!

  • Howdy! I understand this is somewhat off-topic however I had
    to ask. Does managing a well-established website such as yours
    require a lot of work? I'm brand new to operating a blog but I do write in my diary everyday. I'd like to start a blog so I can share my personal experience and thoughts online.
    Please let me know if you have any recommendations or tips for brand new aspiring
    blog owners. Appreciate it!

  • Hello to every body, it's my first go to see of this web site; this web site carries awesome and in fact good material in favor of readers.

  • Normally I do not read post on blogs, however I wish to say that this write-up very compelled me to take a look at and do it!
    Your writing style has been surprised me. Thanks, very nice post.

  • Hello to every one, the contents present at this website are really amazing for people knowledge,
    well, keep up the nice work fellows.

  • Hi, I wish for to subscribe for this blog to take hottest
    updates, so where can i do it please help
    out.

  • First off I want to say fantastic blog! I had a quick question which I'd like to ask if you don't mind.

    I was curious to find out how you center yourself and clear your thoughts prior to
    writing. I have had difficulty clearing my mind in getting my
    thoughts out. I truly do take pleasure in writing however it just seems like the
    first 10 to 15 minutes are lost simply just trying to figure out how to begin.
    Any ideas or hints? Many thanks!

  • Hi there, its pleasant post regarding media print, we all understand media is a fantastic source of facts.

  • Hello there, You've done a fantastic job. I'll definitely digg it and personally recommend to my
    friends. I am confident they will be benefited from this website.

  • Ahaa, its fastidious conversation concerning this piece of writing here at this web site, I have read all that, so now me
    also commenting at this place.

  • It's great that you are getting thoughts from this piece of writing as well as from our dialogue made at this time.

  • My partner and I stumbled over here different web address and thought I might check things out.
    I like what I see so now i am following you. Look forward
    to exploring your web page yet again.

  • Good way of describing, and good article to obtain facts
    regarding my presentation topic, which i
    am going to present in school.

  • It's hard to come by experienced people for this topic, but you sound like you know what you're talking about!

    Thanks

  • This is a topic that's close to my heart... Cheers! Where are your contact details though?

  • What i don't understood is in fact how you are now not actually much more well-liked than you might be right now. You are very intelligent. You realize thus significantly in relation to this subject, made me for my part believe it from so many various angles. Its like women and men are not involved unless it's something
    to accomplish with Lady gaga! Your own stuffs excellent.
    At all times care for it up!

  • If you want to take a great deal from this piece of writing then you have to apply these methods to your won web
    site.

  • These are truly fantastic ideas in concerning blogging.
    You have touched some good things here. Any way keep up wrinting.

  • Hi there to every body, it's my first pay a visit of this blog; this webpage contains remarkable and genuinely excellent information for readers.

  • Definitely believe that which you said. Your favorite justification appeared to be on the net
    the simplest thing to be aware of. I say to you, I certainly get
    irked while people consider worries that they plainly do not know about.
    You managed to hit the nail upon the top and also defined out the whole thing without having side-effects , people can take a signal.
    Will probably be back to get more. Thanks

  • Hello! This post couldn't be written any better! Reading through this post reminds me of my good old room mate! He always kept talking about this. I will forward this article to him. Fairly certain he will have a good read. Many thanks for sharing!

  • Lots of people use laser acne remedies since they are simpler than other remedies, like
    untidy creams or some other topical medicines. Smooth the paste all over the face and
    gently rub it in. As a sample, any tight clothing that steer clear of
    the pores and skin from breathing can provoke pimples.

  • Hi there, just became aware of your blog through Google, and found that it is really informative.
    I am gonna watch out for brussels. I'll be grateful if you continue this in future. A lot of people will be benefited from your writing. Cheers!

  • Hi there mates, its impressive article concerning educationand completely defined,
    keep it up all the time.

  • Choose headwear with a cotton, satin, silk or wool lining and prevent rough fibres that
    might rub against and damage the head of hair. Thus, maintaining a balance of right moisture content, proper nutrition
    and cleanliness can go a long way in having healthy hair. However most services or accessories are merely gimmicks and can do you more harm than good.

  • Hi it's me, I am also visiting this web site regularly, this web page is actually fastidious and the users are really sharing nice thoughts.

  • You may choose to check out the website on research finished with cold sores (herpes) and also
    the effects of olive leaf extract on the website, East Park
    Research, Inc. If you don't know what things to eat, look at The Food Guide Pyramid at to discover eating plans and learn healthy eating tips. Program Director at The Wellness Community - West Los Angeles for greater than 20 years and possesses specialized in utilizing individuals and groups in the area cancer along with other medical, chronic and life-threatening illness for 25 years.

  • Every weekend i used to visit this web page, because
    i want enjoyment, for the reason that this this web site conations truly fastidious
    funny data too.

  • I really like your blog.. very nice colors & theme. Did you make this website yourself or did you hire someone to do
    it for you? Plz respond as I'm looking to construct my own blog and would like to find out where u got this from. kudos

  • I love it whenever people get together and share ideas. Great blog, stick with it!

  • You then need to market, the so called disposable
    phone numbers. Will Android and the latest updates about my co-worker's fight with her dad.

  • With havin so much written content do you ever run into any issues of plagorism or copyright violation?
    My website has a lot of completely unique content I've either created myself or outsourced but it seems a lot of it is popping it up all over the internet without my authorization. Do you know any solutions to help stop content from being ripped off? I'd really appreciate it.

  • I got this web page from my pal who informed me on the topic
    of this web site and now this time I am browsing this web site and reading very informative articles
    or reviews at this place.

  • After I originally commented I appear to have clicked the -Notify me when new comments are added- checkbox and from now on every time a comment
    is added I get 4 emails with the same comment. There has
    to be a means you can remove me from that service?

    Thanks a lot!

  • Wow! After all I got a web site from where I be able to truly take useful facts regarding
    my study and knowledge.

  • Hello, I enjoy reading all of your post. I wanted to write a little comment to support you.

  • Undeniably believe that which you said. Your favorite reason appeared
    to be on the net the simplest thing to be aware
    of. I say to you, I definitely get annoyed while people think about
    worries that they just do not know about. You managed to hit
    the nail upon the top and also defined out the whole
    thing without having side effect , people
    could take a signal. Will probably be back to get more.
    Thanks

  • There's certainly a lot to find out about this issue. I like all of the points you've made.

Comments have been disabled for this content.