Wednesday, 1 June 2011

XSLT transform your XML data into an OpenXml wordprocessing document

In this post I’ll explain how you can effectively and easily generate an OpenXml word-processing document from your XML data. I’m going to assume that you have some basic knowledge of OpenXml. If you’ve been living under a rock for the last 6 years and don’t know what OpenXml is, I gladly refer you to this PDF document. It’s one of the best, if not the best, resources out there on OpenXml.

So, to give you a basic idea of what we are going to do, I’ve drawn it in a little diagram:


We basically have 4 steps:
  1. We take our XML data and, if no XSD is available, create one
  2. With this XSD we create our template, with custom XML tags in it
  3. When the template is finished, we run a tool to get an XSLT document that can generate our document
  4. We feed the XML data to the XSLT transformer and let it do its magic
1. Get the XSD 
This step isn’t really a step but more of a hint. If you don’t have an XSD document for your XML data, you can easily generate one with Visual Studio: open the XML document in Visual Studio, go to the XML menu and select Create Schema.
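If you’d rather script this step than click through Visual Studio, the .NET `XmlSchemaInference` class (in `System.Xml.Schema`) can infer a schema from a sample document. This is a minimal sketch of my own, not part of the workflow above; the default file names (`data.xml`, the `data` prefix) are hypothetical, and whether the output matches Visual Studio’s Create Schema exactly is an assumption worth checking.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

static class XsdFromXml
{
    // Infers schemas from a sample XML document. The set holds one
    // XmlSchema per target namespace that occurs in the data.
    public static XmlSchemaSet Infer(string xmlPath)
    {
        using (XmlReader reader = XmlReader.Create(xmlPath))
        {
            return new XmlSchemaInference().InferSchema(reader);
        }
    }

    // Writes each inferred schema to "<prefix><n>.xsd" and returns the paths.
    public static List<string> Save(XmlSchemaSet schemas, string prefix)
    {
        var paths = new List<string>();
        foreach (XmlSchema schema in schemas.Schemas())
        {
            string path = prefix + paths.Count + ".xsd";
            using (StreamWriter writer = File.CreateText(path))
            {
                schema.Write(writer);
            }
            paths.Add(path);
        }
        return paths;
    }

    static void Main(string[] args)
    {
        // Hypothetical defaults; pass your own XML file and output prefix.
        string xmlPath = args.Length > 0 ? args[0] : "data.xml";
        string prefix = args.Length > 1 ? args[1] : "data";
        foreach (string path in Save(Infer(xmlPath), prefix))
        {
            Console.WriteLine("Wrote " + path);
        }
    }
}
```

Running it with `customers.xml customers` as arguments would write `customers0.xsd` (one file per namespace). Inference only guesses types and occurrence constraints from the sample, so review the XSD before importing it into Word.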

2. Generate the template 
To do this, I recommend Microsoft Word 2007 or later. You COULD do this manually, but since I’m very fond of my sanity I won’t. Just create an empty document in Word and check whether you have the Developer ribbon. If you don’t, check here to see how to make it appear.

In the Developer ribbon, go to Schema and import your XSD, as I did in the image.

In case it isn’t obvious: to import the schema you have to click ‘Add schema…’.

Select the option ‘Allow saving as XML even if not valid’. I don’t know exactly why this has to be on, but I’m guessing it has something to do with cross tagging of the custom XML tags.

Select your schema and you can start editing your document! Toggle the Structure button in the ribbon next to the Schema button. This gives you a side panel from which you can select the tags defined in your XSD.

If you have sections you want to generate per node, drag a selection around them and select the parent node. As an example, in the image I will be generating the selected text PER customer.

You can also apply this principle to tables. Here I generate a row PER product:


This is what the complete document looks like:

When this is done, save the file as a Word 2003 XML document. Now I can see you pondering: “huh wtf, I thought this was XSLT transformation with an OpenXml document???”. Yes, it still is, but in disguise; I’ll explain in step 3.

3. Get the XSLT
Well, you can do this in plenty of different ways. If you’re really hardcore, you can write the XSLT yourself. Send me a postcard from the mental institution when you’ve done this.

I saved the document as a Word 2003 XML document because I wanted the OpenXml package as a flat document. What do I mean by that? I want all the parts of the package slammed into one XML document. This of course has some disadvantages, performance being one of them.

Once we have this, we can feed the flat XML document to a tool called WML2XSLT. The usage of this tool is as follows:

WML2XSLT.EXE TheTemplate.xml -o Generator.xslt

When the tool launches, it lets you select a namespace. Select the namespace of your XSD:


After this the tool will have generated an XSLT for you!

4. Result
All we have to do now is transform our XML data document by applying the XSLT. We can do this in code, or in Visual Studio by selecting XML => Start XSLT (with/without debugging). In my case the result looks something like this:
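Doing it in code takes only a few lines with `XslCompiledTransform` from `System.Xml.Xsl`. This is a minimal sketch: `Generator.xslt` is the output name from the WML2XSLT command above, while `data.xml` and `Result.xml` are hypothetical names for the input data and the generated flat document.

```csharp
using System;
using System.Xml.Xsl;

static class GeneratorRunner
{
    // Applies the generated stylesheet to the data file and writes the
    // resulting flat WordprocessingML document to outputPath.
    public static void Run(string xsltPath, string dataPath, string outputPath)
    {
        var transform = new XslCompiledTransform();
        // Load compiles the stylesheet; one instance can be reused for many data files.
        transform.Load(xsltPath);
        // Transform(inputUri, resultsFile) reads dataPath and writes outputPath.
        transform.Transform(dataPath, outputPath);
    }

    static void Main(string[] args)
    {
        // Hypothetical defaults; Generator.xslt comes from the WML2XSLT step.
        string xslt = args.Length > 0 ? args[0] : "Generator.xslt";
        string data = args.Length > 1 ? args[1] : "data.xml";
        string output = args.Length > 2 ? args[2] : "Result.xml";
        Run(xslt, data, output);
        Console.WriteLine("Wrote " + output);
    }
}
```

If the generated stylesheet relies on embedded script or the `document()` function, use the `Load` overload that takes an `XsltSettings` argument, because both are disabled by default. Whether WML2XSLT output needs either is an assumption to check against your own stylesheet.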

Now we have one small issue. The result is an XML document, but we want a .docx document to keep our customer happy. This is pretty easy to solve, although I only learned about it a couple of weeks ago when a colleague bumped into it. You can create an empty package and add an alternative format import part to it, referenced from the document body by an altChunk element. Then guess what that part is going to contain? Yes, our flat, chunky document =)
Here is an excellent link that will guide you through this:

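using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
// Test1.docx must already exist and contain at least one paragraph.
using (WordprocessingDocument myDoc =
    WordprocessingDocument.Open("Test1.docx", true))
{
    string altChunkId = "AltChunkId1";
    MainDocumentPart mainPart = myDoc.MainDocumentPart;
    // Copy the content to embed into an alternative format import part.
    AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
        AlternativeFormatImportPartType.WordprocessingML, altChunkId);
    using (FileStream fileStream = File.Open("TestInsertedContent.docx", FileMode.Open))
        chunk.FeedData(fileStream);
    // Reference that part with an altChunk element after the last paragraph.
    AltChunk altChunk = new AltChunk { Id = altChunkId };
    mainPart.Document.Body.InsertAfter(altChunk, mainPart.Document.Body.Elements<Paragraph>().Last());
    mainPart.Document.Save();
}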
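The sample above opens an existing `Test1.docx`. To start from an empty package instead, as described earlier, a sketch along these lines should work: it creates a new .docx with an empty body and adds the flat XML as the chunk. The file names are hypothetical, and keeping the `WordprocessingML` import type for a Word 2003 XML file simply mirrors the sample above; that Word accepts it under that type in every version is an assumption to verify.

```csharp
using System;
using System.IO;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

static class FlatXmlPackager
{
    // Creates a new .docx whose body holds a single altChunk pointing at the flat XML.
    public static void Wrap(string flatXmlPath, string docxPath)
    {
        const string altChunkId = "AltChunkId1";
        using (WordprocessingDocument doc =
            WordprocessingDocument.Create(docxPath, WordprocessingDocumentType.Document))
        {
            // An empty package: a main document part with an empty body.
            MainDocumentPart mainPart = doc.AddMainDocumentPart();
            mainPart.Document = new Document(new Body());

            // Copy the flat XML into an alternative format import part.
            AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
                AlternativeFormatImportPartType.WordprocessingML, altChunkId);
            using (FileStream stream = File.OpenRead(flatXmlPath))
            {
                chunk.FeedData(stream);
            }

            // Reference the part from the body; Word imports it when the file is opened.
            mainPart.Document.Body.AppendChild(new AltChunk { Id = altChunkId });
            mainPart.Document.Save();
        }
    }

    static void Main(string[] args)
    {
        // Hypothetical defaults: the XSLT output and the .docx to produce.
        string flatXml = args.Length > 0 ? args[0] : "Result.xml";
        string docx = args.Length > 1 ? args[1] : "Result.docx";
        Wrap(flatXml, docx);
        Console.WriteLine("Wrote " + docx);
    }
}
```

Keep in mind that the chunk is only converted into regular document content when Word opens the file (and it stays converted once saved from Word); other tools reading the package will just see the altChunk reference.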

Voilà, that’s it. Nothing to it! You can find the files here.

The next blog post will be in a couple of weeks, because I’m off on a two-week trip to Rhodes now. Hell yeah! ;)

2 comments:

  1. Luc, everything was great until I reopened the template XML in Word. Surprisingly, all the previously visible XML tags were gone!

    I use Office 2010, and according to experts on the net the problem is that MS changed the way Word processes custom tags after the lawsuit it lost.

    Maybe you know how to deal with it in new versions of Word?


  2. Hi,

    Since Microsoft lost the lawsuit brought by i4i, they had to strip the feature from Office 2010. It’s very regrettable that they had to do so, but at the moment the only way to place the custom XML tags is either to use Office 2007 or to buy the i4i plugin. Especially when your main business revolves around document generation, it isn’t a bad purchase.

    You can also add the custom XML tags programmatically with the OpenXml SDK, but to be honest it’s a bit of a hassle and it almost comes down to writing the i4i plugin yourself :-s