Merging and Managing Documents and SharePoint
So here I was trying to figure out some ways to manage multiple parts of documents stored in SharePoint, and Word, IE, SharePoint, or some combination of the above are all behaving badly and causing me (and, judging from the newsgroups, others) grief. After many Cokes and munchies, I've updated the blog with some ideas around how to do this without shooting yourself in the head (and the gotchas I went through to discover this).
The problem is to store, say, 10 (or some number of) different documents (potentially from different document libraries) in SharePoint and:
- Allow users to edit the various individual documents AND (here's the kicker)
- Send out a compilation of all of these documents as a single one AND (now it gets even better, kids)
- Track the changes made to the document by someone external AND (final nail in the coffin)
- Merge the changes back into SharePoint (from, say, a copy of the document you sent out that came back as an email attachment).
Okay, besides having lots of ANDs, there are a few challenges here. Some are easy to solve, and some are driving me batty.
Problem #1. Store a number of different documents in the system, allow people to edit them, track changes, all that good stuff. Easy sleazy. Document library + Document + Word 2003 = Bob's your uncle. Boy, if only my job was this easy.
Problem #2. Create a compilation of these documents into a single document and send it to someone for editing. Okay, not too much of a biggie. Lots of ways to create a compound document. This KB article says not to use Word's Master Document feature with SharePoint. They're just not compatible. The workaround is to use the RD or INCLUDETEXT fields. There's a great article by Dian Chapman (I have a lot of respect for anyone who puts Bill Gates on their resume as a reference) on how to use these.
It took me a few minutes to realize the RD field (referenced document) wasn't going to pull in my sub-documents but is rather a facility to create an uber table of contents across all of them. The INCLUDETEXT field lets you specify a URL to a document and will basically pull the contents in with some nice options (formatting is always a problem with lots of different sources). Since all documents in SharePoint are URL addressable, I was all set. It creates what looks to a user like a single document in a document library. Click on it and it launches Word with all your documents brought together. Nice. Great.
Bundling it to send out is a bit of a challenge, since this compound document is just directives to include the other documents. The actual document will try to retrieve the others, which is great if you have access to them, but we need to create a real 500 page (or however big it is) document to send to someone externally. In any case, the document can be exported (or something along those lines) to create this.
Like I said, this compound document isn't a document at all. It's just that original document you create with the INCLUDETEXT fields to go retrieve each sub-document. While it looks like a single document, it isn't, but you can change this single document as if you're editing any regular document in a document library (no special codes or anything). Oh wait. Gotcha. Here's an image showing the setup:
So I open the compound document (indicated by A above), go down to page 50 (which happens to really be document B in Document Library 2), and merrily change the contents. Click File | Save and the typical Save dialog comes up (SharePoint aware). Any metadata on the document library will pop up so I can enter it. Fantastic. However, since the compound document isn't a copy of all my sub-documents, the changes must have been made in Document Library 2, right? Nope. Open the compound document again. Yup, the changes are there. Check the sub-document. Nothing. Flush the IE cache. Nope.
In the picture above, the compound document is A which looks something like this:
{INCLUDETEXT "http://servername/sites/sitename/document%20library%201/doc1.doc" \* MERGEFORMAT}
{INCLUDETEXT "http://servername/sites/sitename/document%20library%202/doc2.doc" \* MERGEFORMAT}
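If you have more than a couple of sub-documents, typing these field codes by hand gets old fast. Here's a rough Python sketch (not something from my actual setup) that builds the field code text from a list of library and file names. The helper name include_text_field and the parts list are made up for illustration, and the server and site names are the same placeholders as above. Keep in mind this only produces the text of the field code: in Word the field braces have to be inserted with Ctrl+F9, typing literal curly braces won't create a field.

```python
from urllib.parse import quote


def include_text_field(site_url, library, filename):
    """Build the text of an INCLUDETEXT field code for a document in a library.

    Hypothetical helper: percent-encodes the library and file names
    (spaces become %20) and appends them to the site URL.
    """
    path = "/".join(quote(part) for part in (library, filename))
    return '{INCLUDETEXT "%s/%s" \\* MERGEFORMAT}' % (site_url.rstrip("/"), path)


# Placeholder site and documents, matching the example fields above.
site = "http://servername/sites/sitename"
parts = [
    ("document library 1", "doc1.doc"),
    ("document library 2", "doc2.doc"),
]

for library, filename in parts:
    print(include_text_field(site, library, filename))
```

Running this prints one field code per sub-document, in the order they should appear in the compound document.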
One of the sub-documents is B which is what is being edited in the scenario above. However the B document doesn't get updated. For those that are still awake and keeping score:
- Edit A (the compound document). B (one of the individual documents) is not updated.
- Edit B. A shows the changes.
Remember, A is just 2 fields pointing to other documents. There's really nothing to edit there. Editing B will show the changes in the compiled document, but only after you do a Select All then Update Field (F9). Otherwise, A will continue to show the contents as of the last time the fields were updated (took me a few coffees to get that straight this morning).
Problem #3: Track the changes made by an external party. A couple of options here, as Word has (and has had for a while) a Track Changes option. You can password protect this option, but it's not hard to get past that. If you really want to secure the contents you need to drop in a Rights Management server and protect the document that way. Note: You'll need to expose your Rights Management server externally if you send out a document that has its contents protected, and there's a whole bunch of things that have to be done with this (end users needing Office 2003, etc.), so it will take more than a click of a button to enable it. It's pretty slick overall once you get the infrastructure and processes in place.
Anyways, without tracking changes (or if you don't trust that someone at the other end can't bypass it), it's not too difficult if you keep a copy of the fully merged document you sent out. Word 2003 has some nice Compare features to show you what's changed no matter what options were enabled, overridden, etc. Again, a caveat here is that you actually have to unprotect a document (if it's protected) before you can compare one document with another.
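To make the "keep a copy and compare" idea concrete, here's a small Python sketch of the same principle. It is only a stand-in for Word's Compare feature, not how Word does it, and it assumes you've already pulled the text of both copies out as lists of paragraphs (how you do that is outside this sketch). The function name and the sample paragraphs are invented for illustration.

```python
import difflib


def changed_paragraphs(sent, returned):
    """Return unified-diff lines between two lists of paragraph strings.

    'sent' is the copy you kept; 'returned' is what came back.
    """
    return list(
        difflib.unified_diff(
            sent, returned, fromfile="sent", tofile="returned", lineterm=""
        )
    )


# Invented sample paragraphs standing in for the two copies of the document.
sent = ["Introduction", "The budget is 100.", "Conclusion"]
returned = ["Introduction", "The budget is 250.", "Conclusion"]

for line in changed_paragraphs(sent, returned):
    print(line)
```

Lines starting with a minus sign are from the copy you sent out and lines starting with a plus sign are from the copy that came back, so the difference shows up regardless of whether the other party left Track Changes on.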
Problem #4: Merge the changes back into the copies of the documents in the document libraries. And here's the rub. If you send the document out as a single (new) merged document, even with the ability to see where the changes happened, how do you get any changes back into the various bits and pieces that made this thing up in the first place (without having your users jump through hoops editing large numbers of sub-documents individually)?
So, bottom line...
- You can use a compound document with SharePoint; however, you need to use the RD and/or INCLUDETEXT fields and not the Master Document feature of Word (see KB article above).
- Do not edit the compound document directly; instead, edit its parts and regenerate/update the compound view.
- If you have to send a compound document to an external editor, send them the parts and perhaps a read-only copy of the compiled set of documents (like a PDF). You won't be able to merge the compiled document back into the individual parts.
Clear as mud now?
Note: I keep reading this and parts of it still don't make sense; however, I do think there might be value in putting together a coherent post/article on "How to Support Compound Documents in SharePoint" or something. Watch for this in a couple of days. We'll just pretend I was having a bad morning.