XML:Structure of XML Data.

Structure of XML Data

The fundamental construct in an XML document is the element. An element is simply a pair of matching start- and end-tags, and all the text that appears between them.

XML documents must have a single root element that encompasses all other elements in the document. In the example in Figure 10.1, the <bank> element forms the root element. Further, elements in an XML document must nest properly. For in- stance,

image

is not properly nested.

While proper nesting is an intuitive property, we may define it more formally.

Text is said to appear in the context of an element if it appears between the start-tag and end-tag of that element. Tags are properly nested if every start-tag has a unique matching end-tag that is in the context of the same parent element.

Note that text may be mixed with the subelements of an element, as in Figure 10.2.

As with several other features of XML, this freedom makes more sense in a document processing context than in a data-processing context, and is not particularly useful for representing more structured data such as database content in XML.

The ability to nest elements within other elements provides an alternative way to represent information. Figure 10.3 shows a representation of the bank information from Figure 10.1, but with account elements nested within customer elements. The nested representation makes it easy to find all accounts of a customer, although it would store account elements redundantly if they are owned by multiple customers.

Nested representations are widely used in XML data interchange applications to avoid joins. For instance, a shipping application would store the full address of sender and receiver redundantly on a shipping document associated with each shipment, whereas a normalized representation may require a join of shipping records with a company-address relation to get address information.

In addition to elements, XML specifies the notion of an attribute. For instance, the type of an account can represented as an attribute, as in Figure 10.4. The attributes of

image

image

an element appear as name=value pairs before the closing “>” of a tag. Attributes are strings, and do not contain markup. Furthermore, attributes can appear only once in a given tag, unlike subelements, which may be repeated.

Note that in a document construction context, the distinction between subelement and attribute is important—an attribute is implicitly text that does not appear in the printed or displayed document. However, in database and data exchange applications of XML, this distinction is less relevant, and the choice of representing data as an attribute or a subelement is frequently arbitrary.

One final syntactic note is that an element of the form <element></element>, which contains no subelements or text, can be abbreviated as <element/>; abbreviated elements may, however, contain attributes.

Since XML documents are designed to be exchanged between applications, a namespace mechanism has been introduced to allow organizations to specify globally unique names to be used as element tags in documents. The idea of a namespace is to prepend each tag or attribute with a universal resource identifier (for example, a Web address) Thus, for example, if First Bank wanted to ensure that XML documents

image

it created would not duplicate tags used by any business partner’s XML documents, it can prepend a unique identifier with a colon to each tag name. The bank may use a Web URL such as http://www.FirstBank.com as a unique identifier. Using long unique identifiers in every tag would be rather inconvenient, so the namespace standard provides a way to define an abbreviation for identifiers.

In Figure 10.5, the root element (bank) has an attribute xmlns:FB, which declares that FB is defined as an abbreviation for the URL given above. The abbreviation can then be used in various element tags, as illustrated in the figure.

A document can have more than one namespace, declared as part of the root element. Different elements can then be associated with different namespaces. A default namespace can be defined, by using the attribute xmlns instead of xmlns:FB in the root element. Elements without an explicit namespace prefix would then belong to the default namespace.

Sometimes we need to store values containing tags without having the tags interpreted as XML tags. So that we can do so, XML allows this construct:

<![CDATA[<account> ·· ·</account>]]>

Because it is enclosed within CDATA, the text <account> is treated as normal text data, not as a tag. The term CDATA stands for character data.

image

Comments

Popular posts from this blog

XML Document Schema

Extended Relational-Algebra Operations.

Distributed Databases:Concurrency Control in Distributed Databases