Tagging Implementation with LINQ
I decided to implement tagging on my video library. My first solution was to use one table for the tags. Containing the videoid and tagname. When I loaded in 1,000,000 test tags this became slow to generate my tag cloud data. My new solution was to split up the tags into two tables. One containing distinct tags and the other joining videos to them. I will also show some things I came by when extracting my tag data.
Tables
My tables for tagging are as follows:
Tags = {
ID,
TagName
}
VideoTags = {
ID,
TagID,
VideoID
}
Inserting New Tags:
This is the only task that takes a
little more work using this solution. When adding a new tag to a video, we first must check
to see if the tag ecists in the Tags table, if it doesnt we need to add
it, otherwise we just need its ID. I created a method to aid in adding
Tags for me:
public long InsertTag(GeekTubeDataContext db) {
Tag tmp = db.Tags.SingleOrDefault(t => t.TagName == this.TagName);
long tagID = 0;
if (tmp != null) {
tagID = tmp.ID;
} else {
db.Tags.InsertOnSubmit(this);
db.SubmitChanges();
tagID = this.ID;
}
return tagID;
}
Used like so:
Tag newTag = new Tag() { TagName = "testtag" };
int tagID = newTag.InsertTag(db);
Adding tags to videos:
We have two cases adding a new video with tags and updating a videos tags.
public void InsertVideoWithTags(GeekTubeDataContext db, List<int> tags) {
foreach (var tag in tags) {
this.VideoTags.Add(new VideoTag { TagID = tag });
}
db.Videos.InsertOnSubmit(this);
db.SubmitChanges();
}
This inserts our videos and assigns the tags to it... The next method Deletes a videos current tags and adds the new ones:
public void UpdateVideoTags(GeekTubeDataContext db, List<int> tags) {
db.VideoTags.DeleteAllOnSubmit(this.VideoTags);
foreach (var tag in tags) {
this.VideoTags.Add(new VideoTag { TagID = tag });
}
db.SubmitChanges();
}
You would use it like this:
Video vid = new Video();
vid.Title = "foo";
vid.InsertVideoWithTags(db, TAGSIDLIST);
-or-
Video vid = Video.GetVideo(db, ID);
vid.UpdateVideoTags(db, TAGSIDLIST);
Getting tag cloud data:
Getting the tag cloud data this way prooved fairly quick and was much easier than with one table. My original LINQ query was this:
// Get top 20 most popular tags ordered by name...
var query = (from tag in db.Tags
join videotag in db.VideoTags on tag.ID equals videotag.TagID into g
orderby g.Count() descending
select new TagCloudItem {
Name = tag.TagName,
Count = g.Count()
}).Take(20).OrderBy(p=>p.Name);
return query;
This worked perfect but after looking at the SQL it was generated thought I could tidy it up a little so came up with this:
// Get top 20 most popular tags ordered by name...
var query = (from tag in db.Tags
join videotag in db.VideoTags on tag.ID equals videotag.TagID into g
select new { Name = tag.TagName, Count = g.Count() } into item
orderby item.Count descending
select new TagCloudItem {
Name = item.Name,
Count = item.Count
}).Take(20).OrderBy(item => item.Name);
return query;
After looking at the SQL the secound query generated and comparing the two by viewing the Execution plan I found the secound query was twice as fast as it had less nested selects. Qutie happy with that myself.
TagCloud Control:
I created a simple tagcloud control to display the data for me. I will not go into this in this post though.
Final:
I Think this example works well. One thing I am not 100% happy about is the way I implement my data access. I think this could be done a better way while still keeping it seperated from the UI layer. There are still a few things not shown here like extracting tags from frontend but that is because I have not finished that one yet :)..... Anyway hope noone fell asleep.
Thanks
Stefan
Update 20071222:
I have updated the tagcloud data generation query to the blow. I found it easier to read and it has no performance loss.
select new TagCloudItem {
Name = tag.TagName,
Count = (from videotag in db.VideoTags
where videotag.TagID == tag.ID
select videotag).Count()
}).OrderByDescending(tag => tag.Count).Take(20).OrderBy(tag => tag.Name);
Update 20071222:
I now added a Created column to the VideoTags table. This can now
allow me to get they say top 20 most recent tags. The query would now
be so:
var query = (from tag in db.Tags
select new TagCloudItem {
Name = tag.TagName,
Count = (from videotag in db.VideoTags
where videotag.TagID == tag.ID
select videotag).Count(),
LastUsed = (from videotag in db.VideoTags
where videotag.TagID == tag.ID
orderby videotag.Created descending
select videotag.Created).Take(1).SingleOrDefault()
}).OrderByDescending(tag => tag.LastUsed).Take(20).OrderBy(tag => tag.Name);
What we do now is get the most recent time a tag was assigned to a video. We then order by date descending and take 20. Then order by name and can display in our tagcloud.... Fun fun