SaguiItay

Thursday, January 29, 2009

WordprocessingML: Part 2

In the first part of this series, we created a simple WordprocessingML document and added some basic content to it. It is now time to add some formatting, to make that content presentable.

In this part we'll focus on the two basic methods to modify the formatting of text:

  • Run properties
  • Paragraph properties
More advanced methods, such as styles, will be covered later on.

Run properties

Going back to the sample from part 1, let's take a look at a text run:

writer.WriteStartElement("r", wordmlNamespace);
writer.WriteStartElement("t", wordmlNamespace);
writer.WriteValue("Hello world");
writer.WriteEndElement(); // t
writer.WriteEndElement(); // r

After the start of the run ("r") element, let's add a "run properties" element:

writer.WriteStartElement("rPr", wordmlNamespace);
// formatting goes here
writer.WriteEndElement(); // rPr

Now we are free to define the format of the run. Simple attributes, such as bold, italic, and underline, are quite easy to define:

writer.WriteElementString("b", wordmlNamespace, "");
writer.WriteElementString("i", wordmlNamespace, "");
writer.WriteElementString("u", wordmlNamespace, "");

More complex attributes, like the font size, color, or superscript/subscript, are only slightly more interesting:

writer.WriteStartElement("sz", wordmlNamespace);
writer.WriteAttributeString("val", wordmlNamespace, "8"); // size is in half-points: 8 = 4 pt
writer.WriteEndElement(); // sz

writer.WriteStartElement("color", wordmlNamespace);
writer.WriteAttributeString("val", wordmlNamespace, Color.Red.ToArgb().ToString("X8").Substring(2)); // System.Drawing.Color; drops the alpha byte, leaving "FF0000"
writer.WriteEndElement(); // color

writer.WriteStartElement("vertAlign", wordmlNamespace);
writer.WriteAttributeString("val", wordmlNamespace, "superscript"); // or "subscript"
writer.WriteEndElement(); // vertAlign

As you can see, defining the basic settings of the text is simple work.
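To see how these pieces nest, here is a minimal, self-contained sketch that writes one formatted run into a string instead of a document part. The class and method names (RunFormattingSketch, BuildFormattedRun, ToHexRgb) are my own and purely illustrative, and I'm assuming an explicit "w" prefix just to make the output easier to read; it is not the only way to write it.

```csharp
using System.Text;
using System.Xml;

// Illustrative sketch only: class and method names are hypothetical.
public static class RunFormattingSketch
{
    const string WordmlNamespace =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Formats an RGB triple as the six-digit hex string used by the "color" element.
    public static string ToHexRgb(byte r, byte g, byte b)
    {
        return r.ToString("X2") + g.ToString("X2") + b.ToString("X2");
    }

    // Writes a bold, italic, red, 12 pt run and returns the resulting XML fragment.
    public static string BuildFormattedRun(string text)
    {
        var sb = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };

        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            writer.WriteStartElement("w", "r", WordmlNamespace);

            // Run properties come first, before the text element.
            writer.WriteStartElement("w", "rPr", WordmlNamespace);
            writer.WriteElementString("w", "b", WordmlNamespace, "");
            writer.WriteElementString("w", "i", WordmlNamespace, "");

            writer.WriteStartElement("w", "color", WordmlNamespace);
            writer.WriteAttributeString("w", "val", WordmlNamespace, ToHexRgb(255, 0, 0));
            writer.WriteEndElement(); // color

            writer.WriteStartElement("w", "sz", WordmlNamespace);
            writer.WriteAttributeString("w", "val", WordmlNamespace, "24"); // half-points: 24 = 12 pt
            writer.WriteEndElement(); // sz

            writer.WriteEndElement(); // rPr

            writer.WriteElementString("w", "t", WordmlNamespace, text);
            writer.WriteEndElement(); // r
        }

        // The writer is flushed when disposed, so the builder is complete here.
        return sb.ToString();
    }
}
```

Calling RunFormattingSketch.BuildFormattedRun("Hello world") returns the run as a single XML string, which is handy for eyeballing the markup before writing it into a real document part.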

Paragraph properties

In a very similar way, we can define properties at the paragraph level:

writer.WriteStartElement("pPr", wordmlNamespace);
// formatting goes here
writer.WriteEndElement(); // pPr

Alignment of the paragraph is defined using the "jc" (can anyone explain the JC name?!) element:

writer.WriteStartElement("jc", wordmlNamespace);
writer.WriteAttributeString("val", wordmlNamespace, "right");
writer.WriteEndElement(); // jc

Indentation is just as easy. Note that all units in these attributes are in TWIPS (1/20 of a point); with 72 points to an inch, there are 72 * 20 = 1440 TWIPS to an inch:

writer.WriteStartElement("ind", wordmlNamespace);
writer.WriteAttributeString("firstLine", wordmlNamespace, "720");
writer.WriteAttributeString("hanging", wordmlNamespace, "1440"); // note: firstLine and hanging are mutually exclusive, so in practice pick one
writer.WriteEndElement(); // ind

and line spacing:

writer.WriteStartElement("spacing", wordmlNamespace);
writer.WriteAttributeString("line", wordmlNamespace, "120");
writer.WriteAttributeString("after", wordmlNamespace, "240");
writer.WriteAttributeString("before", wordmlNamespace, "360");
writer.WriteEndElement(); // spacing

That's it - basic formatting of text is quite simple, yet very powerful. There are a lot more options to control the formatting of textual components.
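Since the TWIPS arithmetic is easy to get wrong, here is a small sketch that wraps it in helper methods and uses them to write a paragraph properties element. The names (ParagraphFormattingSketch, PointsToTwips, InchesToTwips, BuildParagraphProperties) are hypothetical, and the output goes to a string rather than a document part, so treat it as an illustration under those assumptions.

```csharp
using System;
using System.Text;
using System.Xml;

// Illustrative sketch only: class and method names are hypothetical.
public static class ParagraphFormattingSketch
{
    const string WordmlNamespace =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    const int TwipsPerPoint = 20;
    const int PointsPerInch = 72;

    public static int PointsToTwips(double points)
    {
        return (int)Math.Round(points * TwipsPerPoint);
    }

    public static int InchesToTwips(double inches)
    {
        return PointsToTwips(inches * PointsPerInch);
    }

    // Writes a right-aligned paragraph properties element with a first-line
    // indent (given in inches) and spacing after the paragraph (given in points).
    public static string BuildParagraphProperties(double firstLineInches, double spaceAfterPoints)
    {
        var sb = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };

        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            writer.WriteStartElement("w", "pPr", WordmlNamespace);

            writer.WriteStartElement("w", "spacing", WordmlNamespace);
            writer.WriteAttributeString("w", "after", WordmlNamespace,
                XmlConvert.ToString(PointsToTwips(spaceAfterPoints)));
            writer.WriteEndElement(); // spacing

            writer.WriteStartElement("w", "ind", WordmlNamespace);
            writer.WriteAttributeString("w", "firstLine", WordmlNamespace,
                XmlConvert.ToString(InchesToTwips(firstLineInches)));
            writer.WriteEndElement(); // ind

            writer.WriteStartElement("w", "jc", WordmlNamespace);
            writer.WriteAttributeString("w", "val", WordmlNamespace, "right");
            writer.WriteEndElement(); // jc

            writer.WriteEndElement(); // pPr
        }

        return sb.ToString();
    }
}
```

For example, BuildParagraphProperties(0.5, 12) asks for a half-inch first-line indent and 12 points of space after the paragraph, without anyone having to multiply by 20 or 1440 by hand.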


Wednesday, January 21, 2009

WordprocessingML: Part 1

Let's start with a public notice: Most of the OpenXML/WordprocessingML samples I've found work with TextWriters, and just write XML strings into the writer. Me, being the funny guy I am, prefer to work with XmlWriters. This makes sure I make no structural mistakes in the XML itself, allows me to generate well-formatted XML (useful during development), and just seems more "natural" to me.

OK. Now that we've got that out of the way, let's start with the very basics - creating an empty DOCX file:

Creating a WordprocessingDocument object:

Nothing could be simpler than this. Just call the static "Create" method of the WordprocessingDocument type, provide a filename or stream, and select the type of document you want to create. There are several types of documents, defined in the WordprocessingDocumentType enumeration. For more details on this, just go to http://msdn.microsoft.com/en-us/library/documentformat.openxml.wordprocessingdocumenttype.aspx

Here's the snippet:

using (WordprocessingDocument wpd = WordprocessingDocument.Create(filename, WordprocessingDocumentType.Document))
{
}

Notice how I used the "using" statement - WordprocessingDocument implements the IDisposable interface...

Adding the main part:

Each document consists of multiple parts, the "main" part being the document content itself. Other types (which I'll cover in future entries) include styles, numbering, properties and settings. There's nothing exciting in this part - we just ask our document to create a "main" part for itself, and we keep a reference to that part.

MainDocumentPart mainPart = wpd.AddMainDocumentPart();

Getting an XML writer:

Each part implements a "GetStream" method, so this is mostly boilerplate code:

XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;
settings.IndentChars = "\t";
settings.Encoding = new UTF8Encoding();

using (Stream stream = mainPart.GetStream())
using (XmlWriter writer = XmlWriter.Create(stream, settings))
{
}

Adding content:

I'm not going to go too deeply into this section - you can find various samples explaining the full structure of content in documents. For now, let's just say that text goes into paragraphs. Paragraphs consist of runs, and runs contain text.

string wordmlNamespace = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
writer.WriteStartElement("p", wordmlNamespace);
writer.WriteStartElement("r", wordmlNamespace);
writer.WriteStartElement("t", wordmlNamespace);
writer.WriteValue("Hello word"); // NOT A TYPO! :)
writer.WriteEndElement(); // t
writer.WriteEndElement(); // r
writer.WriteEndElement(); // p

That's it - you're free to add content to your document as you see fit. Just don't forget to call the "Close" method of the WordprocessingDocument instance in order to save the file (disposing it through the "using" block takes care of this for you).
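One thing the snippet above glosses over: in a real main part, the paragraph has to sit inside a "body" element, which in turn sits inside the root "document" element. Here is a minimal sketch of the full XML, written to any Stream; I'm assuming the stream would be the one returned by mainPart.GetStream(), and the class and method names (MainPartSketch, WriteHelloDocument) are made up for illustration.

```csharp
using System.IO;
using System.Text;
using System.Xml;

// Illustrative sketch only: class and method names are hypothetical.
public static class MainPartSketch
{
    const string WordmlNamespace =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Writes document > body > p > r > t to the given stream.
    // In the post's scenario the stream would come from mainPart.GetStream().
    public static void WriteHelloDocument(Stream stream, string text)
    {
        var settings = new XmlWriterSettings
        {
            Indent = true,
            IndentChars = "\t",
            Encoding = new UTF8Encoding(false) // UTF-8 without a byte-order mark
        };

        using (XmlWriter writer = XmlWriter.Create(stream, settings))
        {
            writer.WriteStartDocument();
            writer.WriteStartElement("w", "document", WordmlNamespace);
            writer.WriteStartElement("w", "body", WordmlNamespace);

            writer.WriteStartElement("w", "p", WordmlNamespace);
            writer.WriteStartElement("w", "r", WordmlNamespace);
            writer.WriteElementString("w", "t", WordmlNamespace, text);
            writer.WriteEndElement(); // r
            writer.WriteEndElement(); // p

            writer.WriteEndElement(); // body
            writer.WriteEndElement(); // document
            writer.WriteEndDocument();
        }
    }
}
```

Writing to a MemoryStream first and dumping the result to the console is a cheap way to check the structure before involving the package at all.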


Friday, January 16, 2009

WordprocessingML to the rescue

I've recently found myself needing to create documents out of various sources (other documents, information from an ECM system, and so on). I've tried several approaches, including:
  • Word Object Model
  • HTML generation
  • RTF generation
  • Aspose.Words
  • OpenXML

My first try was with the Microsoft Word Object Model, but I encountered several bumps: the complexity of the object model, the requirement of having Microsoft Word installed on the client machine, the non-fluent code - all of those made the whole experience something I'd rather forget.

Next, I tried generating HTML and RTF documents. HTML proved quite simple, but was a bit limited for my requirements, and generating a single-file HTML (MHT) would have required far too much manual work for my liking. RTF, with its specification, proved just nasty, for lack of a better word.

At first I was reluctant to use a third-party component, such as Aspose.Words. The component proved quite easy to learn, and quite powerful, but was lacking some of the key requirements (such as formatting tables, embedding objects, etc.), and therefore I had to drop it. I'm still planning to use it as a format-converting component, as it allows easy conversion between HTML, PDF, DOC, DOCX and so on, while retaining a high level of fidelity.

Lastly, I tackled the OpenXML SDK. To tell you the truth, I wasn't too happy to go there in the beginning - the SDK seems simple enough, but requires A LOT of manual work with XML - not very user friendly or code-efficient. However, to my surprise, the Markup Language Reference was quite easy to use, and the OpenXML format is EXTREMELY powerful (allowing me to do even more than I planned).

Although I am still experiencing some problems in generating numbered paragraphs and such, after a single day of playing with it, I find myself quite comfortable with the OpenXML format, and confident it will suit my needs.

Some useful resources for getting started with OpenXML:

As I continue my research and work, I'll try posting some code samples and guides.


Thursday, January 8, 2009

Retrieving Documentum repeating values

Documentum provides "repeating" properties - properties that have more than one value. Retrieving those values is a simple matter of getting the number of values for that property, and then requesting each one of the values.

Here's a small utility method:

private static object[] GetRepeatingValue(IDfSysObject dfObj, string attributeName)
{
    int valuesCount = dfObj.getValueCount(attributeName);
    object[] values = new object[valuesCount];

    IDfValue val = null;
    for (int index = 0; index < valuesCount; index++)
    {
        try
        {
            val = dfObj.getRepeatingValue(attributeName, index);
            values[index] = val.asString();
        }
        finally
        {
            NAR(val);
            val = null;
        }
    }
    return values;
}


Version comments for a Documentum object

Retrieving the version comments of a Documentum SysObject is an easy task:

private static string GetVersionsComment(IDfSysObject dfObj)
{
    StringBuilder sb = new StringBuilder();

    if (dfObj.getVersionLabelCount() > 0)
    {
        for (int i = 0; i < dfObj.getVersionLabelCount(); i++)
        {
            string versionLabel = dfObj.getVersionLabel(i);
            sb.AppendLine(versionLabel);
        }
        sb.AppendLine(dfObj.getLogEntry());
    }
    return sb.ToString().Trim();
}


Displaying properties of a Documentum object

When working with Documentum TypedObjects, you almost always need to retrieve their properties. Below is a method to print those properties to the Console. Note that this example uses the getAllRepeatingStrings() method - a useful method for displaying values to the user, but not very useful if you need to process and work with the actual values.

public static void DisplayItem(IDfTypedObject obj)
{
    if (obj == null)
        return;
    Console.WriteLine("-------------------------------------------");
    int attrCount = obj.getAttrCount();
    for (int i = 0; i < attrCount; i++)
    {
        IDfAttr attr = null;
        try
        {
            attr = obj.getAttr(i);
            string attrName = attr.getName();
            Console.Write(attrName + ": ");
            if (!obj.hasAttr(attrName) || obj.isNull(attrName))
            {
                Console.WriteLine("NULL");
                continue;
            }
            Console.WriteLine(obj.getAllRepeatingStrings(attrName, "; "));
        }
        finally
        {
            NAR(attr);
            attr = null;
        }
    }
}


Tuesday, January 6, 2009

Quality Doesn’t Just Happen

Judy McKay writes a very interesting article, "Quality Doesn’t Just Happen", about the project management process and how to place quality front and center in a project lifecycle:
"A quality-focused team produces a better project in a shorter amount of time, every time, but you have to have the right people to make it happen. We won't need the heroes to ride in at the end to save the project if it's never in distress. A well-planned project with a quality focus won't be in crisis. There may still be trade-off decisions, which is why we use risk-based testing to be sure we mitigate the highest risk first, but these can be informed decisions with measurable consequences."

As I gain more experience, both as a developer and as a team leader, I try to learn from past mistakes and pick up good practices and processes. However, I still had to smile to myself and feel uncomfortable in my seat while reading some of this article. This is a good thing - it means I'm still learning!


The Visitor design pattern

I've always been a huge advocate of design patterns, but up until recently, I didn't get a chance to actually implement the Visitor design pattern.

This changed last week, when one of my colleagues asked me to implement a small tool that handles one of our standard XML files. The tool was simple enough - scan the XML, and create dummy files based on information found in the XML. Nothing too fancy - just a small console application.

This triggered a light bulb in my head - we've been manipulating those XMLs outside of our product for some time now, mostly just for tests or special client requests. Instead of creating a one-time utility, I could create something much more useful - an infrastructure that will allow me to very quickly create any manipulation tool I'd like.

In comes the Visitor design pattern. I've very quickly created a Visitor abstract class, threw in a few classes that represent the various objects that are described in the XML, and voila - we're pretty much done.

The Visitor design pattern provides the infrastructure to "visit" a hierarchy of items, and "notify" the visitor when each item is handled (in my case, there's a Start-Children-End cycle). Users are now able to implement their own visitors, and handle the various "visits", by just overriding the virtual methods of the abstract Visitor.

From this point on, creating the requested tool took around 10 minutes - override the correct methods, retrieve the required information, and generate the dummy file.

More complex visitors were just as easy to implement - removing content from the XML, adding new content, and even just running some statistics on the items in the XML.
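To give a feel for the Start-Children-End cycle, here is a stripped-down sketch. It is not the actual infrastructure from our product: the Item class, the method names (Accept, VisitStart, VisitEnd) and the CountingVisitor are all hypothetical stand-ins, with a single item type instead of the several classes the real XML needs.

```csharp
using System.Collections.Generic;

// Hypothetical item type standing in for the objects described in the XML.
public class Item
{
    public string Name { get; private set; }
    public List<Item> Children { get; private set; }

    public Item(string name)
    {
        Name = name;
        Children = new List<Item>();
    }

    // Start -> Children -> End: notify the visitor, walk the children, notify again.
    public void Accept(Visitor visitor)
    {
        visitor.VisitStart(this);
        foreach (Item child in Children)
        {
            child.Accept(visitor);
        }
        visitor.VisitEnd(this);
    }
}

// Abstract visitor with empty virtual methods; concrete visitors
// override only the notifications they care about.
public abstract class Visitor
{
    public virtual void VisitStart(Item item) { }
    public virtual void VisitEnd(Item item) { }
}

// Example "statistics" visitor: counts every item in the hierarchy.
public class CountingVisitor : Visitor
{
    public int Count { get; private set; }

    public override void VisitStart(Item item)
    {
        Count++;
    }
}
```

A new tool then boils down to one small Visitor subclass plus a call to Accept on the root item; the traversal logic never has to be written again.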

For more details on design patterns, I strongly suggest reading AT LEAST one of the following books:

Design Patterns: Elements of Reusable Object-Oriented Software (Addison-Wesley Professional Computing Series)
or
Head First Design Patterns
