SonghaySystem(::)

Songhay.NET: An Overview of the Songhay XML Namespace

CodePlex

The code archive for the Songhay XML Namespace is available at CodePlex.com in the Songhay System Data Access Framework.

This document summarizes the design goals and designer intentions behind the Songhay.Xml namespace, written in C# for .NET 2.0.50727. This namespace, introduced in “Songhay.NET: An Overview of the Songhay Namespace,” is so large that it needs its own document! This is either an indicator of a Microsoft future of enhancing or supplementing System.Xml, or an indicator of the ignorance of the developer (me). Again, the contents of this namespace attempt to avoid the mistake of providing functionality that is already available in .NET. Rather, the Songhay.Xml namespace contains definitions that assert a specific purpose over the general purpose of the .NET Framework.

To explain the primal importance of XML is beyond the scope of this document. The specific purpose of the Songhay.Xml namespace is governed by these strong words:

The strong word, Utility, implies that “helper” methods are defined for use. So the Songhay.Xml.HtmlUtility and Songhay.Xml.XmlUtility classes in this namespace contain generic “helper” members concerned with HTML and XML, respectively. The distinction here between HTML and XML formally recognizes that HTML degrades into XML that is not well-formed, which demands the use of regular expressions instead of an XML parser.

The strong word, Glyph, recognizes the concept of typographic glyphs, specifically the Latin glyphs that exist outside of the ASCII character set. The specific problem of translating HTML entities into full UTF-8 glyphs is recognized and respected here. The Songhay.Xml.LatinGlyphs struct defines a response to this problem.

The Songhay.Xml.HtmlUtility Static Class

This class defines generic helper procedures for HTML processing.

The Songhay.Xml.HtmlUtility Public Members

ConvertToHtml()

Returns a string of marked-up text compatible with degenerate browsers that do not support XHTML (loosely conforming to the W3C HTML 4.x standard). These are the activities inside this member:

  • Minimize selected XHTML block elements.
  • Remove XHTML html element attributes.
  • Remove XHTML element minimization.
  • Remove XHTML attribute minimization.
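Three of these steps can be sketched with regular expressions, in keeping with the overview's note that HTML processing demands them. This is a minimal sketch, not the Songhay source: HtmlConversionSketch, ToHtml4 and both element/attribute lists are hypothetical, the first step (minimizing selected block elements) is not shown, and “remove XHTML attribute minimization” is read here as turning checked="checked" back into checked, which is an interpretation.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sketch; HtmlConversionSketch is not part of Songhay.Xml.
public static class HtmlConversionSketch
{
    // Assumed list of HTML 4.x Boolean attributes that may be written in minimized form.
    static readonly string[] BooleanAttributes =
        { "checked", "disabled", "multiple", "noshade", "nowrap", "readonly", "selected" };

    // Assumed list of empty elements whose XHTML minimized form (<br />) becomes <br>.
    static readonly Regex MinimizedEmptyElement = new Regex(
        @"<(br|hr|img|input|meta|link|col|base|area|param)\b([^<>]*?)\s*/>",
        RegexOptions.IgnoreCase);

    public static string ToHtml4(string xhtml)
    {
        if (xhtml == null) throw new ArgumentNullException("xhtml");

        // Remove the html element's attributes: <html xmlns="..."> becomes <html>.
        string html = Regex.Replace(xhtml, @"<html\b[^>]*>", "<html>", RegexOptions.IgnoreCase);

        // Remove element minimization: <br /> becomes <br>.
        html = MinimizedEmptyElement.Replace(html, "<$1$2>");

        // Restore attribute minimization: checked="checked" becomes checked.
        foreach (string name in BooleanAttributes)
        {
            string pattern = @"\b" + name + @"\s*=\s*[""']" + name + @"[""']";
            html = Regex.Replace(html, pattern, name, RegexOptions.IgnoreCase);
        }
        return html;
    }
}
```

For example, ToHtml4 turns <br /> into <br> and <input type="checkbox" checked="checked" /> into <input type="checkbox" checked>.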
ConvertToXml()

Attempts to convert HTML to well-formed XML. This task is simpler than converting to XHTML. These are the activities inside this member:

  • Remove xmlns attributes.
  • Close open elements.
  • Find attributes without quotes.
  • Generate attributes for checked, nobreak, nosave and selected.
  • Look for URI query strings with raw ampersands.
  • Replace the CDATA "xmlns" with "x&#173;mlns" (inserting a soft hyphen, U+00AD). For more information, read “XSLT Problem: The String "xmlns" Can Mangle Output.”
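Two of these steps, quoting unquoted attribute values and escaping raw ampersands in query strings, can be sketched with regular expressions. XmlConversionSketch and its methods are hypothetical names, not the Songhay API, and the patterns are assumptions that ignore edge cases such as a > inside an attribute value.

```csharp
using System.Text.RegularExpressions;

// Hypothetical sketch of two ConvertToXml() steps; not the Songhay.Xml source.
public static class XmlConversionSketch
{
    // A start tag: '<' followed by a letter, so end tags, comments and <!DOCTYPE> are skipped.
    static readonly Regex StartTag = new Regex(@"<[A-Za-z][^>]*>");

    // name = value, where value is double-quoted, single-quoted or unquoted.
    static readonly Regex Attribute = new Regex(
        @"([\w:-]+)\s*=\s*(""[^""]*""|'[^']*'|[^\s""'>]+)");

    // An ampersand that does not already start a named or numeric entity reference.
    static readonly Regex RawAmpersand = new Regex(
        @"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    public static string QuoteAttributes(string html)
    {
        // Scan attributes only inside start tags; quoted values are consumed whole,
        // so text inside an existing quoted value is never re-quoted.
        return StartTag.Replace(html, tag => Attribute.Replace(tag.Value, attr =>
        {
            string value = attr.Groups[2].Value;
            if (value.StartsWith("\"") || value.StartsWith("'"))
                return attr.Value; // already quoted: leave as-is
            return attr.Groups[1].Value + "=\"" + value + "\"";
        }));
    }

    public static string EscapeRawAmpersands(string html)
    {
        return RawAmpersand.Replace(html, "&amp;");
    }
}
```

Running QuoteAttributes before EscapeRawAmpersands turns <a href=page.aspx?id=1&mode=edit> into <a href="page.aspx?id=1&amp;mode=edit">, while existing references such as &amp; are left alone.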
FormatXhtmlElements()

Returns an XHTML string derived from a .NET procedure. This member addresses certain quirks: some well-formed XML constructs are not handled correctly by contemporary Web browsers. These are the activities inside this member:

  • Maximize selected empty minimized block elements. These are the elements that cannot be minimized: a, iframe, td, th and script.
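The maximization step can be sketched as a single regular expression over the five element names listed above. XhtmlFormattingSketch and MaximizeElements are hypothetical names, not the Songhay API.

```csharp
using System.Text.RegularExpressions;

// Hypothetical sketch; not the Songhay.Xml source.
public static class XhtmlFormattingSketch
{
    // The elements the article lists as unsafe to minimize: a, iframe, td, th and script.
    static readonly Regex MinimizedElement = new Regex(
        @"<(a|iframe|td|th|script)\b([^<>]*?)\s*/>", RegexOptions.IgnoreCase);

    public static string MaximizeElements(string xhtml)
    {
        // <script src="app.js" /> becomes <script src="app.js"></script>.
        return MinimizedElement.Replace(xhtml, "<$1$2></$1>");
    }
}
```

So <td /> becomes <td></td>, while <br /> stays minimized because br is not in the list.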
GetInnerXml()

Returns the “inner” fragment of XML from the specified unique element.

The strong word XML is somewhat abused here because what is returned may not be well-formed XML.
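One way to extract such an inner fragment is a non-greedy regular expression, sketched below under the assumption that the element occurs once (the “unique element” above) and is not nested inside itself. InnerXmlSketch and GetInnerFragment are hypothetical names.

```csharp
using System.Text.RegularExpressions;

// Hypothetical sketch; not the Songhay.Xml source.
public static class InnerXmlSketch
{
    // Returns the markup between the start and end tags of the first element named
    // elementName, or null when no such element is found. The lazy (.*?) stops at the
    // first matching end tag, so an element nested inside itself would be cut short.
    public static string GetInnerFragment(string markup, string elementName)
    {
        string name = Regex.Escape(elementName);
        Match match = Regex.Match(markup,
            "<" + name + @"\b[^>]*>(.*?)</" + name + @"\s*>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

Given <body class="main"><p>Hello<br></p></body>, the sketch returns <p>Hello<br></p>, which is exactly the kind of not-well-formed “XML” the caveat above warns about.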

PublicDocType()

Emits a public DOCTYPE declaration for HTML 4.0 Transitional (by default).

The Songhay.Xml.LatinGlyphs Struct

This data structure condenses and expands selected Latin glyphs. To condense a glyph means that its HTML entity form is ‘reduced’ to its single glyph; for example, &#233; is condensed into é. To expand a glyph is the reverse of this process. Note that the decimal form of entities is preferred here because XML predefines only five named entities (&amp;, &apos;, &gt;, &lt; and &quot;). However, some named entities are supported (see below).

This data structure uses a static, generic Dictionary<String, String> to store glyph-decimal-entity pairs. This should immediately suggest the brute-force nature of this struct, which was designed to address a specific problem!

The Songhay.Xml.LatinGlyphs Public Members

Condense()

Condenses selected decimal and named entities into their Latin glyph equivalent.

The named entities were selected for historical reasons. For Songhay System applications, these were the conventional/editorial entities most in use:

  • &copy;
  • &eacute;
  • &nbsp;
  • &reg;
  • &trade;
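A minimal sketch of Condense() follows, assuming that decimal entities in the 128 through 255 range (the range Expand() covers) are condensed along with the five named entities above. CondenseSketch is a hypothetical name; the real struct keeps its pairs in a Dictionary<String, String>, while this sketch uses a dictionary only for the named entities and computes the decimal ones.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical sketch; not the LatinGlyphs source.
public static class CondenseSketch
{
    // The five named entities listed in the article, mapped to their glyphs.
    static readonly Dictionary<string, string> NamedEntities = new Dictionary<string, string>
    {
        { "&copy;", "\u00A9" },   // ©
        { "&eacute;", "\u00E9" }, // é
        { "&nbsp;", "\u00A0" },   // no-break space
        { "&reg;", "\u00AE" },    // ®
        { "&trade;", "\u2122" }   // ™
    };

    // Three-digit decimal entities such as &#233;.
    static readonly Regex DecimalEntity = new Regex(@"&#([0-9]{3});");

    public static string Condense(string text)
    {
        foreach (KeyValuePair<string, string> pair in NamedEntities)
        {
            text = text.Replace(pair.Key, pair.Value);
        }

        return DecimalEntity.Replace(text, m =>
        {
            int code = int.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture);
            // Only 128 through 255 is condensed; other entities are left untouched.
            return (code >= 128 && code <= 255) ? ((char)code).ToString() : m.Value;
        });
    }
}
```

Entities outside that range, such as &#8212;, pass through unchanged.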
Expand()

Expands selected Latin glyphs into their decimal entity equivalent. These are the ranges expanded:

  • Characters 128–159
  • Characters 160–191
  • Characters 192–223
  • Characters 224–255

As of this writing, the range is simply 128 through 255.
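Because the expanded range is simply 128 through 255, Expand() can be sketched as a character loop instead of the dictionary lookup the struct uses; that swap is made here only for brevity. ExpandSketch is a hypothetical name.

```csharp
using System.Globalization;
using System.Text;

// Hypothetical sketch; not the LatinGlyphs source.
public static class ExpandSketch
{
    // Replaces every character from 128 through 255 with its decimal entity, e.g. é -> &#233;.
    public static string Expand(string text)
    {
        StringBuilder sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            if (c >= 128 && c <= 255)
                sb.Append("&#").Append(((int)c).ToString(CultureInfo.InvariantCulture)).Append(';');
            else
                sb.Append(c);
        }
        return sb.ToString();
    }
}
```

For example, "Café ©" becomes "Caf&#233; &#169;", while a character such as ™ (U+2122) lies outside the range and is left as-is.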

The Songhay.Xml.XmlUtility Static Class

This class defines generic helper procedures for XML processing.

The Songhay.Xml.XmlUtility Public Members

ExpandSpecialChars()

Expands selected characters in the specified System.String into the standard XML entities (&amp;, &apos;, &gt;, &lt; and &quot;). Comments are preserved.

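A sketch of this member, assuming “comments are preserved” means that text inside <!-- --> is copied through unescaped. A single regular expression matches either a whole comment or one special character, so a comment is consumed before its < can be escaped. SpecialCharsSketch is a hypothetical name, and the sketch assumes raw input: an existing &amp; would be escaped again.

```csharp
using System.Text.RegularExpressions;

// Hypothetical sketch; not the Songhay.Xml source.
public static class SpecialCharsSketch
{
    // Either a whole comment (group 1) or one of the five special characters.
    static readonly Regex Token = new Regex(@"(<!--.*?-->)|[&'<>""]", RegexOptions.Singleline);

    public static string ExpandSpecialChars(string text)
    {
        return Token.Replace(text, m =>
        {
            if (m.Groups[1].Success)
                return m.Value; // a comment: copy it through unchanged
            switch (m.Value)
            {
                case "&": return "&amp;";
                case "'": return "&apos;";
                case ">": return "&gt;";
                case "<": return "&lt;";
                default: return "&quot;";
            }
        });
    }
}
```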
GetAttributeValue()

An alternative to System.Xml.XPath.XPathNavigator.GetAttribute.

Returns the value of an XML attribute based on the specified IXPathNavigable set, XPath query and XmlNamespaceManager (optional).

GetInternalMessage()

Generates a conventional, ‘schema-free’ XML message used by XSLT applications.

This message format can be used to display exceptions (in a secure context of course) or the results of an operation that does not return data.

Remember that this member exists by convention. The sordid details of this cultural practice are beyond the scope of this document!

GetInternalMessageDoc()

Serves the same purpose as GetInternalMessage() but returns an XPathDocument navigable document.

GetInstance<>()

‘Hydrates’ a native .NET type based on the specified XML file and System.Type. This member is a wrapper for the XmlSerializer.Deserialize method using Generics.

FxCop considers this method ‘dangerous’ (see below).

GetInstanceRaw<>()

‘Hydrates’ a native .NET type based on the specified XML fragment and System.Type. This member is a wrapper for the XmlSerializer.Deserialize method using Generics.

FxCop considers this method ‘dangerous’ (see below).
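A sketch of the two deserialization wrappers, assuming GetInstance<>() takes a file path and GetInstanceRaw<>() takes an XML string. InstanceSketch and the Contact type are hypothetical. The call site must spell out the type argument, as in GetInstanceRaw<Contact>(xml), because no parameter is typed T; this is the situation the FxCop rule discussed below complains about.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical sketch; not the Songhay.Xml source.
public static class InstanceSketch
{
    // Deserializes a T from an XML file on disk.
    public static T GetInstance<T>(string path)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(T));
        using (FileStream stream = File.OpenRead(path))
        {
            return (T)serializer.Deserialize(stream);
        }
    }

    // Deserializes a T from an in-memory XML string.
    public static T GetInstanceRaw<T>(string xml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(T));
        using (StringReader reader = new StringReader(xml))
        {
            return (T)serializer.Deserialize(reader);
        }
    }
}

// Hypothetical type to hydrate; XmlSerializer needs a public type with a public parameterless constructor.
public class Contact
{
    public string Name;
    public string Email;
}
```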

GetNamespaceManager()

Returns a System.Xml.XmlNamespaceManager with respect to the document element of the specified IXPathNavigable set.

The requirement of namespace managers for simple navigable document operations can be quite tiresome. This member is here to relieve some misery!
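A sketch of GetNamespaceManager(), assuming it registers the namespaces declared on the document element. Since XPath 1.0 cannot address a default (unprefixed) namespace, the sketch takes a hypothetical defaultPrefix parameter under which xmlns="..." is registered. NamespaceManagerSketch is a hypothetical name; GetNamespacesInScope and XmlNamespaceManager.AddNamespace are standard System.Xml members.

```csharp
using System.Collections.Generic;
using System.Xml;
using System.Xml.XPath;

// Hypothetical sketch; not the Songhay.Xml source.
public static class NamespaceManagerSketch
{
    public static XmlNamespaceManager GetNamespaceManager(IXPathNavigable source, string defaultPrefix)
    {
        XPathNavigator nav = source.CreateNavigator();
        nav.MoveToRoot();
        nav.MoveToChild(XPathNodeType.Element); // the document element

        XmlNamespaceManager manager = new XmlNamespaceManager(nav.NameTable);
        // Local scope returns only the declarations made on the document element itself.
        foreach (KeyValuePair<string, string> ns in nav.GetNamespacesInScope(XmlNamespaceScope.Local))
        {
            // The default namespace has an empty prefix; register it under defaultPrefix instead.
            string prefix = (ns.Key.Length == 0) ? defaultPrefix : ns.Key;
            manager.AddNamespace(prefix, ns.Value);
        }
        return manager;
    }
}
```

With defaultPrefix "atom", the title of an Atom feed can then be selected as /atom:feed/atom:title.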

GetNavigableDocument()

Returns an XPathDocument based on the specified System.String (as an XML fragment) or the IXPathNavigable set.

GetNavigableNode()

Returns an XPathNavigator based on the specified IXPathNavigable set, XPath query and XmlNamespaceManager (optional).

GetNavigableNodes()

Returns an XPathNodeIterator based on the specified IXPathNavigable set, XPath query and XmlNamespaceManager (optional).

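These three helpers can be sketched as thin wrappers over XPathDocument and XPathNavigator. NavigableSketch is a hypothetical name; passing null for the XmlNamespaceManager selects the overload without a resolver, which is one way to make it optional.

```csharp
using System.IO;
using System.Xml;
using System.Xml.XPath;

// Hypothetical sketch; not the Songhay.Xml source.
public static class NavigableSketch
{
    public static XPathDocument GetNavigableDocument(string xmlFragment)
    {
        // XPathDocument reads everything in its constructor, so the reader can be disposed afterwards.
        using (StringReader reader = new StringReader(xmlFragment))
        {
            return new XPathDocument(reader);
        }
    }

    public static XPathNavigator GetNavigableNode(IXPathNavigable source, string xpath, XmlNamespaceManager nsManager)
    {
        XPathNavigator nav = source.CreateNavigator();
        return (nsManager == null) ? nav.SelectSingleNode(xpath) : nav.SelectSingleNode(xpath, nsManager);
    }

    public static XPathNodeIterator GetNavigableNodes(IXPathNavigable source, string xpath, XmlNamespaceManager nsManager)
    {
        XPathNavigator nav = source.CreateNavigator();
        return (nsManager == null) ? nav.Select(xpath) : nav.Select(xpath, nsManager);
    }
}
```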
GetNodeValue()

Returns a System.Object based on the specified XPathNavigator, XPath query and a Boolean that, when true, causes a new XmlException to be thrown for XPath queries that fail. This member also takes an optional default value to be used in case of XPath query failure.

This member is extremely important because it provides an alternative to schemas and strongly suggests that conventional XPath assertions can be used before formal schemas.

This member returns System.Object because ADO.NET parameters are objects.
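A sketch of GetNodeValue() under the signature described above. NodeValueSketch is a hypothetical name, and the default value is shown as a required parameter where the original makes it optional.

```csharp
using System.Xml;
using System.Xml.XPath;

// Hypothetical sketch; not the Songhay.Xml source.
public static class NodeValueSketch
{
    public static object GetNodeValue(XPathNavigator navigator, string xpath, bool throwOnFailure, object defaultValue)
    {
        XPathNavigator node = navigator.SelectSingleNode(xpath);
        if (node != null)
            return node.Value;

        // The XPath assertion failed: either fail loudly or fall back to the default.
        if (throwOnFailure)
            throw new XmlException("The XPath query \"" + xpath + "\" did not select a node.");
        return defaultValue;
    }
}
```

Passing DBNull.Value as the default lets a missing node flow straight into the Value property of an ADO.NET parameter.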

GetNodeValueAndParse<>()

This member works just like GetNodeValue() but adds the ability to parse for the specified .NET type. These are the supported types:

  • Boolean
  • Byte
  • DateTime
  • Decimal
  • Double
  • Int16
  • Int32
  • Int64
  • String
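A sketch of GetNodeValueAndParse<>(), assuming Convert.ChangeType with the invariant culture does the parsing. Every type in the list above implements IConvertible, but whether the original uses this technique or per-type Parse calls is not stated. NodeValueParseSketch is a hypothetical name.

```csharp
using System;
using System.Globalization;
using System.Xml.XPath;

// Hypothetical sketch; not the Songhay.Xml source.
public static class NodeValueParseSketch
{
    public static T GetNodeValueAndParse<T>(XPathNavigator navigator, string xpath, T defaultValue)
    {
        XPathNavigator node = navigator.SelectSingleNode(xpath);
        if (node == null)
            return defaultValue;

        // Invariant culture keeps parsing independent of the machine's regional settings.
        return (T)Convert.ChangeType(node.Value, typeof(T), CultureInfo.InvariantCulture);
    }
}
```

Because the default value is typed T, the compiler infers T at the call site; GetNodeValueAndParse(nav, "/order/qty", 0), for example, returns an int.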
GetText()

“Cleans” XML data, returning it in a System.IO.MemoryStream. The primary activity in this procedure is to remove \0 (null) characters.

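A minimal sketch of GetText(), assuming the input arrives as a System.String and is re-encoded as UTF-8. TextSketch is a hypothetical name.

```csharp
using System.IO;
using System.Text;

// Hypothetical sketch; not the Songhay.Xml source.
public static class TextSketch
{
    // Removes null (\0) characters and returns the cleaned text as UTF-8 bytes in a MemoryStream.
    public static MemoryStream GetText(string xml)
    {
        string cleaned = xml.Replace("\0", string.Empty);
        return new MemoryStream(Encoding.UTF8.GetBytes(cleaned));
    }
}
```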
GetXslResult()

Returns an XPathDocument for the transformation of the XSLT navigable document and the XML navigable document, with the option of providing a System.Xml.Xsl.XsltArgumentList.

GetXslString()

Returns a System.String for the transformation of the XSLT navigable document and the XML navigable document, with the option of providing a System.Xml.Xsl.XsltArgumentList.

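A sketch of GetXslString() built on XslCompiledTransform, the XSLT processor introduced in .NET 2.0. XslSketch is a hypothetical name, and whether the original uses XslCompiledTransform or the older XslTransform is not stated.

```csharp
using System.IO;
using System.Xml.XPath;
using System.Xml.Xsl;

// Hypothetical sketch; not the Songhay.Xml source.
public static class XslSketch
{
    // Transforms xml with xslt and returns the output as a string.
    // args may be null when the stylesheet needs no parameters.
    public static string GetXslString(IXPathNavigable xslt, IXPathNavigable xml, XsltArgumentList args)
    {
        XslCompiledTransform transform = new XslCompiledTransform();
        transform.Load(xslt);
        using (StringWriter writer = new StringWriter())
        {
            transform.Transform(xml, args, writer);
            return writer.ToString();
        }
    }
}
```

A stylesheet that declares <xsl:param name="greeting"/> receives the value added with args.AddParam("greeting", "", "Hello"); passing null for args leaves the parameter at its default.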
LoadXslTransform()

Returns a System.String for the transformation of the XSLT file and the XML file or navigable document, with the option of providing a System.Xml.Xsl.XsltArgumentList.

You also have the option of sending an XSLT command to a conventional XSLT parameter (cmd).

Also note that this member assumes that XSLT comes from file locations. The notion of, say, an ‘in-memory’ XSLT file (as a navigable set) is not recognized here.

StripNamespaces()

Strips the namespaces from the specified XML, passed either as a System.String or as a navigable document.

Stripping namespaces “flattens” the document and can cause local-name collisions. This routine does not remove namespace prefixes.
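A regular-expression sketch of the string case follows. StripNamespacesSketch is a hypothetical name and the pattern is an assumption; it removes xmlns declarations only, so a prefix such as dc: survives, as described above.

```csharp
using System.Text.RegularExpressions;

// Hypothetical sketch; not the Songhay.Xml source.
public static class StripNamespacesSketch
{
    // Matches xmlns="..." and xmlns:prefix="..." declarations, with either quote style.
    static readonly Regex Declaration = new Regex(@"\s+xmlns(:[\w.-]+)?\s*=\s*(""[^""]*""|'[^']*')");

    public static string StripNamespaces(string xml)
    {
        // Prefixes on element and attribute names (e.g. dc:title) are left in place.
        return Declaration.Replace(xml, string.Empty);
    }
}
```

The output for an Atom feed, <feed><dc:title>News</dc:title></feed>, uses a prefix that is no longer declared, which shows why stripping namespaces deserves caution.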

WriteReader()

Transfers the data in the source XmlTextReader to the destination XmlTextWriter. These arguments are passed by reference, which raises an error in FxCop (see below).

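A by-value sketch of WriteReader() using the standard XmlWriter.WriteNode method. XmlReader and XmlWriter are reference types (XmlTextReader and XmlTextWriter derive from them), so a method can advance the caller's reader and write to the caller's writer without ref parameters, which is what the DoNotPassTypesByReference rule asks for. WriteReaderSketch is a hypothetical name, and the sketch assumes the reader has not been read yet.

```csharp
using System.Xml;

// Hypothetical sketch; not the Songhay.Xml source.
public static class WriteReaderSketch
{
    public static void WriteReader(XmlReader reader, XmlWriter writer)
    {
        // From the reader's initial state, WriteNode copies the whole document and leaves the reader at EOF.
        writer.WriteNode(reader, true);
        writer.Flush();
    }
}
```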
WriteXslTransform()

Transforms the specified navigable XML input document and XSLT document, and writes the result to disk at the specified path. This member optionally takes an XmlReader as input.

Response to FxCop Analysis of this Namespace

As of this writing, the Songhay.Xml namespace ‘violates’ internationalization, as indicated by the SpecifyIFormatProvider error. This is a typical North American oversight that will be addressed in the future!

The Songhay.Xml namespace also gets the AvoidNamespacesWithFewTypes error. The hope is that this namespace will shrink further!

The DoNotPassTypesByReference error in Songhay.Xml.XmlUtility.WriteReader() only piques my interest in why this member is hanging around. A review of any applications depending on this code reveals nothing so far. This member was written for some ASP.NET scenario…

The DoNotExposeGenericLists error is just an FxCop mystery to me.

Both Songhay.Xml.XmlUtility.GetInstance<>() and Songhay.Xml.XmlUtility.GetInstanceRaw<>() methods cause the GenericMethodsShouldProvideTypeParameter error. The guidance for this error is this:

Methods where the type parameter cannot be inferred from the parameters and therefore has to be defined in the method call are too difficult to understand. Methods with a formal parameter typed as the generic method type parameter support inference. Methods with no formal parameter typed as the generic method type parameter don’t support inference.

I’m staring at this right now… Of course this is not ridiculous and it deserves some investigation.

 
This document was last reviewed on Tuesday, May 01, 2007 at 04:16 PM PDT.
Copyright © 2008 by Bryan D. Wilhite. All rights reserved. No part of this material may be used or reproduced in any form or by any means, or stored in a database or retrieval system, without prior written permission of the publisher except in the case of brief quotations embodied in critical articles and reviews. Making copies of any part of this material for any purpose other than your own personal use is a violation of United States copyright laws.

The information provided by Bryan D. Wilhite at kintespace.com is provided “as is” without warranty of any kind. In no event shall Bryan D. Wilhite or any of his affiliates be liable for any damages whatsoever including, but not limited to, direct, indirect, incidental, consequential, loss of business profits or special damages due to material published by Bryan D. Wilhite or any of his affiliates.