Troubleshooting BlogML - Schema

In the first post of my series on BlogML troubleshooting, I gave a general introduction to the topic. In this post I want to talk about issues related to the BlogML schema and specification.

BlogML is an XML-based format and, like many other XML-based formats, it has its own specification and structure.

This is a bit off-topic, but for those who don't know: there are three common ways to define the structure of an XML file: XSD schema, DTD and RELAX NG. XSD schema is the most common one and something that .NET developers use frequently. An XSD schema is itself an XML file, with a special structure for declaring the structure of other XML files, such as the order of elements, the list of allowed elements, their attributes, and so on.

In BlogML we use XSD schemas to define the structure of BlogML files and to apply our specification on top of the general XML format. This has some benefits for us:

The schema has not been a critical problem for BlogML users, but I think it's worth clarifying some points about it so users understand what we mean when we talk about the schema and the validation process. In many situations, the first thing I ask BlogML users is "Do you have a valid BlogML file?", and they answer "How can I make sure I have a valid BlogML file?". So let me talk about this topic here.

The BlogML XSD schema is what we defined to describe the BlogML structure. Each version of BlogML has its own schema and .NET API library, so you need to use matching versions of the schema, the .NET API library and the converter tool to perform a successful migration.
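Assuming each BlogML schema version declares its own target namespace, one quick way to see which schema a file was written for is to read the namespace of its root element. Below is a minimal Python sketch (not part of the BlogML download) that reads only the root element. The namespace string used for BlogML 2.0 is an assumption here; confirm it against the `targetNamespace` of the XSD that ships with your version.

```python
import io
import xml.etree.ElementTree as ET

# Assumption: the namespace BlogML 2.0 files declare on their root <blog>
# element. Check it against the targetNamespace of the XSD for your version.
BLOGML_2_NAMESPACE = "http://www.blogml.com/2006/09/BlogML"


def root_namespace(source):
    """Return (namespace, local_name) of the root element.

    source is a file path or a binary file object. Only the first "start"
    event is consumed, so a large export is not read in full. Prefer passing
    a file opened in a with block, because the parser is abandoned early.
    """
    events = ET.iterparse(source, events=("start",))
    _event, root = next(events)  # raises ET.ParseError if the file is empty or broken
    if root.tag.startswith("{"):
        namespace, local_name = root.tag[1:].split("}", 1)
        return namespace, local_name
    return "", root.tag


def targets_schema(source, expected_namespace=BLOGML_2_NAMESPACE):
    """True if the root element is <blog> in the expected namespace."""
    namespace, local_name = root_namespace(source)
    return local_name == "blog" and namespace == expected_namespace


if __name__ == "__main__":
    export = io.BytesIO(b'<blog xmlns="http://www.blogml.com/2006/09/BlogML"><title/></blog>')
    print(root_namespace(export))
```

If the namespace does not match the schema you are validating against, you are most likely mixing versions of the schema and the converter.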

The XSD file is included in the BlogML download package, and some migration tools also ship it alongside the converter (like my Community Server converter).

We recommend that users validate their BlogML files (exported from the source blogging tool) against this schema to make sure everything inside the document is correct. Validation helps you avoid further problems and can help you solve existing ones.

For example, last week a non-technical user reported some issues with his BlogML migration on the Community Server Gold forums. After receiving his file, I concluded that it had an incorrect structure, which could prevent him from using BlogML.

Below is a small part of the BlogML schema in schematic form (the actual schema is an XML file). As you can see, the BlogML schema declares the relationships between elements, their child elements and attributes, and the type of each element or attribute.

BlogML Schema
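To see how such declarations look inside an XSD file, here is a small Python sketch that reads a schema with the standard library and lists, for each named `xs:element`, its declared type, its direct child elements and its attributes. The XSD fragment in the example (including the type name `postType`) is a hypothetical miniature, not an excerpt of the real BlogML schema. The function resolves top-level named complex types but ignores groups, type extensions and attributes declared through `ref` targets, so on a full schema it shows only part of the picture.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"
COMPOSITORS = {XS + "sequence", XS + "choice", XS + "all"}

# Hypothetical miniature schema, for illustration only.
SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="postType">
    <xs:sequence>
      <xs:element name="title" type="xs:string"/>
      <xs:element name="content" type="xs:string" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute name="id" type="xs:string"/>
  </xs:complexType>
  <xs:element name="post" type="postType"/>
</xs:schema>"""


def _child_elements(node):
    """Names of element particles in a content model (descends into nested compositors only)."""
    for child in node:
        if child.tag == XS + "element":
            yield child.get("name") or child.get("ref")
        elif child.tag in COMPOSITORS:
            yield from _child_elements(child)


def describe_elements(xsd_text):
    """Map each named element declaration to its type, child elements and attributes."""
    schema = ET.fromstring(xsd_text)
    # Top-level named complex types, so that type="postType" can be resolved.
    named_types = {ct.get("name"): ct for ct in schema.findall(XS + "complexType")}
    description = {}
    for element in schema.iter(XS + "element"):
        name = element.get("name")
        if name is None:
            continue  # <xs:element ref="..."/> reuses a declaration made elsewhere
        type_name = element.get("type")
        complex_type = element.find(XS + "complexType")  # anonymous inline type
        if complex_type is None and type_name:
            complex_type = named_types.get(type_name.split(":")[-1])
        children, attributes = [], []
        if complex_type is not None:
            children = list(_child_elements(complex_type))
            attributes = [a.get("name") for a in complex_type.findall(XS + "attribute")]
        # Simplification: a later local declaration with the same name overwrites an earlier one.
        description[name] = {"type": type_name, "children": children, "attributes": attributes}
    return description


if __name__ == "__main__":
    for name, info in describe_elements(SAMPLE_XSD).items():
        print(name, info)
```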

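But how do you validate a BlogML file against this schema? Let me briefly explain XML validation for non-technical users. There are two general groups of issues with an XML file: either the file is not well-formed, meaning it breaks the basic syntax rules of XML (for example, a tag that is never closed or a character that XML does not allow), or it is well-formed but not valid, meaning it does not follow the structure that the schema declares.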

To validate a BlogML file against the BlogML schema, you have several choices; in fact, any XML validator will do. All you need to do is pick one, give it the location or content of your BlogML file and of the BlogML schema, and run the validation. I personally like to use the XML validation features of Visual Studio, but they have some limitations. The biggest one is that Visual Studio lists only a certain number of errors and warnings and omits any that come after that limit. Still, its integration with my favorite IDE is enough to make me use it.

We don't expect everyone to have Visual Studio installed, so since the early versions we've provided a built-in XML validation application as part of the BlogML download package. This validator isn't very sophisticated: it takes the content of your BlogML file and of the BlogML schema, performs the validation, and then shows each error or warning in a MessageBox, one by one. Depending on the size of your file, loading and validating it may take a while, so be patient, especially if you're a long-time blogger with many posts. It's loading all the posts you've written over the years! Also note that some errors and warnings may appear only after you fix the current ones, because earlier issues can hide them.
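If you don't have Visual Studio at hand, a first sanity check is whether the file is well-formed at all. Here is a minimal Python sketch using only the standard library; it is not the BlogML validator and does not check the file against the schema. Like most parsers, it stops at the first syntax error, which is one reason new errors can show up after you fix earlier ones.

```python
import io
import xml.etree.ElementTree as ET


def check_well_formed(source):
    """Return None if the XML is well-formed, else (line, column, message).

    source is a file path or a binary file object. Parsing stops at the
    first syntax error; fix it and run the check again to find the next one.
    """
    try:
        for _event, element in ET.iterparse(source, events=("end",)):
            element.clear()  # free the content of finished elements to save memory on big exports
    except ET.ParseError as error:
        line, column = error.position  # the parser counts columns from 0
        return line, column, str(error)
    return None


if __name__ == "__main__":
    broken = io.BytesIO(b"<blog>\n  <post>Hello</blog>")
    print(check_well_formed(broken))
```

Streaming with `iterparse` instead of loading the whole tree keeps memory use modest even for a blog with many years of posts.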

Recently I created a work item to improve this validator in the next version so that it works more like other XML validators. You can also write your own validator application with .NET. It's quite easy, and I've described how in an article about XML validation against XSD schemas in .NET 1.1 and 2.0.
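As an illustration of how small such a tool can be, here is a Python sketch (rather than .NET) that targets one common error class: characters that XML 1.0 does not allow, either raw or written as numeric character references such as `&#x1B;`. Unlike a parser, it keeps going after the first hit, so a single run lists every occurrence with its line and column. It is not a schema validator, and the reference check is a simplification that also matches text inside CDATA sections and comments.

```python
import re

# XML 1.0 production [2] Char; this class matches every character outside it.
INVALID_XML_CHAR = re.compile(r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]")
# Decimal (&#27;) or hexadecimal (&#x1B;) numeric character references.
CHAR_REF = re.compile(r"&#(x[0-9A-Fa-f]+|[0-9]+);")


def is_xml_char(code):
    """True if the code point is allowed by the XML 1.0 Char production."""
    return (code in (0x9, 0xA, 0xD) or 0x20 <= code <= 0xD7FF
            or 0xE000 <= code <= 0xFFFD or 0x10000 <= code <= 0x10FFFF)


def find_invalid_characters(lines):
    """Yield (line, column, hex_value, kind) for every invalid character.

    lines is any iterable of strings, e.g. a text file opened with open().
    Line and column numbers start at 1.
    """
    for line_number, line in enumerate(lines, start=1):
        for match in INVALID_XML_CHAR.finditer(line):
            yield line_number, match.start() + 1, f"0x{ord(match.group()):02X}", "character"
        for match in CHAR_REF.finditer(line):
            ref = match.group(1)
            code = int(ref[1:], 16) if ref.startswith("x") else int(ref)
            if not is_xml_char(code):
                yield line_number, match.start() + 1, f"0x{code:02X}", "character reference"
```

To scan an export, pass an open text file, for example `open("export.xml", encoding="utf-8")`; the file name and the UTF-8 encoding are assumptions about your export.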

After validating your BlogML file, you may get some errors or warnings. All validators (including the built-in BlogML validator) report the line and column number of each error or warning, along with a text description of its cause. In most cases, this description is very helpful in figuring out the problem. Depending on the type of error or warning and its cause, you need to take some steps to solve the issue and get a clean BlogML file. It's often hard to fix errors and warnings generated by export tools, because you have to deal with a large number of posts and therefore a large number of errors or warnings. Here is the point: a good converter tool generates output without errors or warnings. However, if you have a BlogML file with this problem, don't worry. You must fix the errors, but you can ignore the warnings and see what happens during the migration. We wrote most of the serialization code by hand to handle such problems, and we hope this makes things easier for users.
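When the same kind of error repeats across hundreds of posts, fixing each occurrence by hand is impractical. One frequent case is characters that XML 1.0 does not allow (for example, control characters such as U+001B). The minimal Python sketch below, which is not part of the BlogML tools, removes them, both as raw characters and as numeric character references. Removing characters is lossy, so run it on a copy of your export and validate the result again.

```python
import re

# XML 1.0 production [2] Char; this class matches every character outside it.
INVALID_XML_CHAR = re.compile(r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]")
# Decimal (&#27;) or hexadecimal (&#x1B;) numeric character references.
CHAR_REF = re.compile(r"&#(x[0-9A-Fa-f]+|[0-9]+);")


def is_xml_char(code):
    """True if the code point is allowed by the XML 1.0 Char production."""
    return (code in (0x9, 0xA, 0xD) or 0x20 <= code <= 0xD7FF
            or 0xE000 <= code <= 0xFFFD or 0x10000 <= code <= 0x10FFFF)


def remove_invalid_characters(text):
    """Return (cleaned_text, removed_count).

    Simplification: reference-like text inside CDATA sections is treated
    the same way as a real character reference.
    """
    removed = 0

    def drop_bad_reference(match):
        nonlocal removed
        ref = match.group(1)
        code = int(ref[1:], 16) if ref.startswith("x") else int(ref)
        if is_xml_char(code):
            return match.group(0)  # keep valid references such as &#x41;
        removed += 1
        return ""

    text = CHAR_REF.sub(drop_bad_reference, text)
    text, raw_count = INVALID_XML_CHAR.subn("", text)
    return text, removed + raw_count
```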

For example, below is a screenshot of a BlogML file being validated in Visual Studio.

XML Validation

As you can see, this BlogML file has four errors and one warning. The four errors are caused by invalid characters in the XML document, which the validator reports by their hexadecimal values, and the line and column of each error are shown in the window. The warning is caused by an unexpected <div> element in the BlogML file that contains a WordPress database error.
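Warnings like the one caused by the stray <div> can be located with a simple element-name check. The Python sketch below walks the document and reports the path of every element whose name is not in an allowed set. The allowed names in the example are a hypothetical, incomplete subset chosen for illustration; a real list would have to come from the BlogML XSD. This is much weaker than schema validation, since it ignores element order, cardinality and where each element may appear.

```python
import io
import xml.etree.ElementTree as ET


def find_unexpected_elements(source, allowed_names):
    """Yield a slash-separated path for each element whose local name is not allowed.

    source is a file path or a binary file object. Namespaces are ignored,
    so "{http://...}post" is compared as "post".
    """
    path = []
    for event, element in ET.iterparse(source, events=("start", "end")):
        local_name = element.tag.rsplit("}", 1)[-1]
        if event == "start":
            path.append(local_name)
            if local_name not in allowed_names:
                yield "/".join(path)
        else:
            path.pop()
            element.clear()  # release finished elements to keep memory low on big files


if __name__ == "__main__":
    # Hypothetical, incomplete subset of BlogML element names, for illustration only.
    allowed = {"blog", "posts", "post", "title", "content"}
    sample = io.BytesIO(
        b'<blog xmlns="http://www.blogml.com/2006/09/BlogML"><posts><post>'
        b"<title>Hello</title><div id='error'>database error</div></post></posts></blog>"
    )
    for found in find_unexpected_elements(sample, allowed):
        print("unexpected element:", found)
```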

Finally, note that at a minimum you need a BlogML file without errors to be able to use the migration tools. The versions of the BlogML schema, the BlogML .NET API and your converter tool must also match.


5 Comments : 09.15.07

Feedback

#1
Keyvan Nayyeri
09.21.2007 @ 12:09 PM
So far I've written two posts about troubleshooting BlogML: Introduction and Schema. In this post I want to
#2
Dale
09.22.2007 @ 10:29 AM
I have Visual Studio on my machine, but am not yet a heavy user of it. How does one run the validation on a BlogML file with VS? Aside from that, what is the easiest-to-install local validator I can use? You know my BlogML file, and that it is BIG (posts back to 2002, and lots of them). Dale
#3
Keyvan Nayyeri
09.28.2007 @ 6:14 PM
After writing an introduction to troubleshooting BlogML as well as two posts about BlogML schema and
#4
Keyvan Nayyeri
10.09.2007 @ 6:39 PM
As the last post of this series, I want to write shortly about BlogML issues related to import tools
#5
News
10.23.2007 @ 7:20 AM
In the second post of my series about BlogML troubleshooting, I discussed the BlogML schema and
