Friday, 4 November 2011

Wordprocessing Serialization: move content between wordprocessing files

At the moment I’m working on a couple of side projects (which explains why I forgot to close last month off with a funny post). I’ll be blogging about them a bit more in the coming weeks or months. A small one-man project I’m working on has to do with OpenXml. For those who have been following my blog, it won’t come as much of a surprise that I’m pretty fond of doing document generation with OpenXml. I wanted to make it a bit easier though.

A recurring requirement I’ve been confronted with is copying content from one document to another. It can even go so far that you must be able to store this content and inject it into multiple documents at a later moment. One might think that this is an easy thing to do, but think again. If your document is crammed with custom styles, numbering and tables, you’ll soon notice that a simple copy of the paragraphs won’t do the trick. Paragraphs only refer to their styles and numbering by ID; the definitions themselves live in separate parts of the package, so they have to travel along with the content.

So what I’ve done is create a library that allows you to transfer content between two wordprocessing documents and keep the formatting of tables and styles. Not only can you transfer the content between documents, you can also save the content as a blob to your database or to any other file. You mark the content that you want to serialize with bookmarks. You place two bookmarks in the source document to determine the start and end of the content, and one bookmark in the target document to indicate where to insert the copied content.

So how does it work? There are basically three main objects in the library that you need to know about.

  • ContentSerializer
  • ContentDeserializer
  • ContentInserter


The content serializer will serialize OpenXmlElements between the two bookmarks that you specify.  Here is a small example:

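// Stream that will receive the serialized content
var memoryStream = new MemoryStream();

IContentSerializer serializer = new ContentSerializer("c:\\temp\\Test.docx");
// "Start" and "End" are placeholder bookmark names, use the names from your own document
serializer.SerializeElementsFullBetweenBookmarks(memoryStream, "Start", "End");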

As you can see, the ContentSerializer accepts a string that points to the location of the document. You can then call the SerializeElementsFullBetweenBookmarks method. This method accepts three parameters:

  • The stream to serialize to
  • The bookmark indicating the start of the text
  • The bookmark indicating the end of the text

The ContentSerializer will then serialize all OpenXmlElements between the two bookmarks together with their styles and numbering definition.  You can use the data in the stream to save the objects to a database or a binary file.

There is also a SerializeElementsBetweenBookmark function available which will only serialize the paragraphs between two bookmarks. For this example, though, I want to be able to serialize everything.
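
To give an idea of what "between two bookmarks" means at the XML level, here is a minimal sketch. It is not the library's code: it uses plain LINQ to XML on an in-memory fragment instead of the OpenXml SDK, and the class name BookmarkRangeSketch, its method names and the bookmark names Start and End are all made up for illustration. It also assumes that the content is every body-level paragraph or table strictly between the block holding the start bookmark and the block holding the end bookmark, which may differ from what the library does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class BookmarkRangeSketch
{
    // WordprocessingML main namespace (the "w:" prefix in document.xml)
    public static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the direct child of <w:body> that holds the named bookmark, or null.
    // The name comparison is ordinal, so bookmark names are case sensitive.
    public static XElement FindBookmarkBlock(XElement body, string bookmarkName)
    {
        XElement bookmark = body.Descendants(W + "bookmarkStart")
            .FirstOrDefault(b => (string)b.Attribute(W + "name") == bookmarkName);
        if (bookmark == null) return null;
        return bookmark.AncestorsAndSelf().FirstOrDefault(e => e.Parent == body);
    }

    // Assumption: "between" means strictly after the start block and strictly
    // before the end block, keeping only paragraphs and tables.
    public static List<XElement> BlocksBetween(XElement body, string startName, string endName)
    {
        XElement startBlock = FindBookmarkBlock(body, startName);
        XElement endBlock = FindBookmarkBlock(body, endName);
        if (startBlock == null || endBlock == null)
            throw new ArgumentException("Bookmark not found.");
        if (!endBlock.IsAfter(startBlock))
            throw new ArgumentException("The end bookmark must come after the start bookmark.");

        return startBlock.ElementsAfterSelf()
            .TakeWhile(e => e != endBlock)
            .Where(e => e.Name == W + "p" || e.Name == W + "tbl")
            .ToList();
    }

    public static void Main()
    {
        // Hand-written stand-in for the body of a document.xml part
        XElement body = XElement.Parse(
            @"<w:body xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>
                <w:p><w:bookmarkStart w:id='0' w:name='Start'/><w:bookmarkEnd w:id='0'/></w:p>
                <w:p><w:r><w:t>First</w:t></w:r></w:p>
                <w:tbl><w:tr><w:tc><w:p><w:r><w:t>Cell</w:t></w:r></w:p></w:tc></w:tr></w:tbl>
                <w:p><w:bookmarkStart w:id='1' w:name='End'/><w:bookmarkEnd w:id='1'/></w:p>
                <w:p><w:r><w:t>After</w:t></w:r></w:p>
              </w:body>");

        foreach (XElement block in BlocksBetween(body, "Start", "End"))
            Console.WriteLine(block.Name.LocalName + ": " + block.Value);
    }
}
```

The sketch only collects the paragraphs and tables themselves. The styles and numbering they refer to would still have to be gathered separately, which is the extra work the full serialization method takes care of.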


The ContentDeserializer will simply deserialize the serialized content into a form that can be injected into a document. Depending on whether you serialized only the paragraphs or the full content with styles and numbering, you can call one of the following two methods:

  • DeserializeContent: This will deserialize only paragraphs
  • DeserializeContentWithNumberingAndStyles: This method will deserialize the paragraphs together with the numbering and styles.

A small demonstration:

IContentDeserializer deserializer = new ContentDeserializer();
var contentWithNumbering = deserializer.DeserializeContentWithNumberingAndStyles(memoryStream);

This method will return an object that contains an IEnumerable of paragraphs, an IEnumerable of styles and a numbering definition. 


When you have the content deserialized you might want to insert it into another docx file.  The ContentInserter will do just this for you.  All you need to do is provide it with the location of the wordprocessing document and then call the InsertElementsWithNumberingInDocument method to insert the deserialized content at the provided bookmark.

IContentInserter contentInserter = new ContentInserter("c:\\temp\\InsertInDocument.docx");
contentInserter.InsertElementsWithNumberingInDocument(contentWithNumbering, "Paste");

So the InsertElementsWithNumberingInDocument method accepts the following parameters:

  • An ElementsFull element which contains the deserialized content
  • The name of the bookmark after which the content will be inserted.
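
As an illustration of inserting after a bookmark, here is a small sketch of the idea in plain LINQ to XML. Again, this is not how the library is implemented: the class InsertAfterBookmarkSketch and its method are hypothetical names, the XML is a hand-written fragment, and I'm assuming the copied blocks go right after the body-level element that holds the bookmark. Styles and numbering are left out entirely.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class InsertAfterBookmarkSketch
{
    // WordprocessingML main namespace (the "w:" prefix in document.xml)
    public static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Inserts copies of the given blocks after the body-level element
    // that holds the named bookmark (assumption, see the text above).
    public static void InsertAfterBookmark(
        XElement body, string bookmarkName, IEnumerable<XElement> blocks)
    {
        XElement bookmark = body.Descendants(W + "bookmarkStart")
            .FirstOrDefault(b => (string)b.Attribute(W + "name") == bookmarkName);
        if (bookmark == null)
            throw new ArgumentException("Bookmark '" + bookmarkName + "' not found.");

        // Climb up to the direct child of <w:body> that contains the bookmark
        XElement anchor = bookmark.AncestorsAndSelf().First(e => e.Parent == body);

        // Clone the blocks so the source elements are left untouched
        anchor.AddAfterSelf(blocks.Select(b => new XElement(b)).ToList());
    }

    public static void Main()
    {
        XElement body = XElement.Parse(
            @"<w:body xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>
                <w:p><w:r><w:t>Before</w:t></w:r></w:p>
                <w:p><w:bookmarkStart w:id='0' w:name='Paste'/><w:bookmarkEnd w:id='0'/></w:p>
                <w:p><w:r><w:t>After</w:t></w:r></w:p>
              </w:body>");

        // A single paragraph standing in for the deserialized content
        var copied = new List<XElement>
        {
            new XElement(W + "p",
                new XElement(W + "r",
                    new XElement(W + "t", "Copied")))
        };

        InsertAfterBookmark(body, "Paste", copied);

        foreach (XElement block in body.Elements())
            Console.WriteLine(block.Name.LocalName + ": " + block.Value);
    }
}
```

Cloning the blocks before inserting them keeps the source tree intact, which matters if the same content has to go into several documents.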

There are a few things that you should take note of:

  • When the document is under revision control, all revisions will be accepted prior to serializing or inserting content!
  • The bookmarks are case sensitive
  • This project uses the OpenXml Power Tools. It’s a cool open source project, so be sure to take a look at: http://powertools.codeplex.com/
  • The content that will be serialized between the two bookmarks currently only consists of paragraphs and all their children (run, text, runproperties, etc.) and tables and all their children (tablecell, etc.).

Some things that I might improve in the future are:
  • Include the serialization of images
  • Include the serialization of more elements like drawings and graphs
  • Optimize the serialization of the styles and numbering parts; at the moment there is no filter

Make any adjustments you like. Until next time ;)
