XML


The Extensible Markup Language (XML) is a general-purpose markup language. It is classified as an extensible language because it allows its users to define their own tags. Its primary purpose is to facilitate the sharing of structured data across different information systems, particularly via the Internet. It is used both to encode documents and to serialize data. In the latter context, it is comparable with other text-based serialization languages such as JSON and YAML.
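As a rough illustration of the serialization use case, the sketch below builds a small record with user-defined tags, writes it out as XML text, and parses it back. It uses Python's standard `xml.etree.ElementTree` module; the `book`, `title` and `year` names are invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; XML lets us choose our own tag and attribute names.
book = ET.Element("book", attrib={"id": "b1"})
ET.SubElement(book, "title").text = "XML Basics"
ET.SubElement(book, "year").text = "2008"

# Serialize the in-memory structure to XML text.
xml_text = ET.tostring(book, encoding="unicode")
print(xml_text)

# Round trip: parse the text back and read one field.
parsed = ET.fromstring(xml_text)
print(parsed.find("title").text)
```

The same data could be written as JSON or YAML; the XML form differs mainly in carrying explicit opening and closing tags around every field.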

It started as a simplified subset of the Standard Generalized Markup Language (SGML), and is designed to be relatively human-legible. By adding semantic constraints, application languages can be implemented in XML. These include XHTML, RSS, MathML, GraphML, Scalable Vector Graphics, MusicXML, and thousands of others. Moreover, XML is sometimes used as the specification language for such application languages.

XML is a World Wide Web Consortium (W3C) Recommendation. It is a fee-free open standard. The W3C Recommendation specifies both the lexical grammar and the requirements for parsing.


Advantages of XML

It is text-based.
It supports Unicode, allowing almost any information in any written human language to be communicated.
It can represent the most general computer science data structures: records, lists and trees.
Its self-documenting format describes structure and field names as well as specific values.
The strict syntax and parsing requirements make the necessary parsing algorithms extremely simple, efficient, and consistent.
XML is heavily used as a format for document storage and processing, both online and offline.
It is based on international standards.
It can be updated incrementally.
It allows validation using schema languages such as XSD and Schematron, which makes effective unit-testing, firewalls, acceptance testing, contractual specification and software construction easier.
The hierarchical structure is suitable for most (but not all) types of documents.
It manifests as plain text files, which are less restrictive than proprietary document formats.
It is platform-independent, thus relatively immune to changes in technology.
Forward and backward compatibility are relatively easy to maintain despite changes in DTD or Schema.
Its predecessor, SGML, has been in use since 1986, so there is extensive experience and software available.
An element fragment of a well-formed XML document is also a well-formed XML document.
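Two of the points above, the strict syntax and the well-formedness of element fragments, can be checked with a short sketch. It assumes Python's standard `xml.etree.ElementTree` parser; the helper name `is_well_formed` and the sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# A document containing a non-ASCII (Unicode) character.
doc = "<order><item>caf\u00e9</item><item>tea</item></order>"
# An element fragment taken from that document.
fragment = "<item>caf\u00e9</item>"
# A missing </item> end tag makes the text ill-formed.
broken = "<order><item>tea</order>"

print(is_well_formed(doc), is_well_formed(fragment), is_well_formed(broken))
```

A conforming parser must reject the third input rather than guess at the intended structure, which is what keeps parser behaviour consistent across implementations.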

Disadvantages of XML

XML syntax is redundant or large relative to binary representations of similar data.
The redundancy may affect application efficiency through higher storage, transmission and processing costs.
XML syntax is verbose relative to alternative text-based data transmission formats.
No intrinsic data type support: XML provides no specific notion of "integer", "string", "boolean", "date", and so on.
The hierarchical model for representation is limited in comparison to the relational model or an object-oriented graph.
Expressing overlapping (non-hierarchical) node relationships requires extra effort.
XML namespaces are problematic to use and namespace support can be difficult to correctly implement in an XML parser.
XML is commonly depicted as "self-documenting" but this depiction ignores critical ambiguities.
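The lack of intrinsic data types listed above shows up as soon as a document is parsed: without a schema, every value arrives as text. The following sketch uses Python's standard library; the `reading` document and its field names are hypothetical, and the conversions are the application's own choice rather than anything XML prescribes.

```python
import xml.etree.ElementTree as ET
from datetime import date

doc = ET.fromstring(
    "<reading><count>42</count><valid>true</valid>"
    "<taken>2008-05-01</taken></reading>"
)

# Without a schema, the parser hands back every value as a string.
raw = {child.tag: child.text for child in doc}

# The application must decide how each field is to be interpreted.
typed = {
    "count": int(raw["count"]),
    "valid": raw["valid"] == "true",
    "taken": date.fromisoformat(raw["taken"]),
}
print(raw)
print(typed)
```

Schema languages such as XSD can declare these types, but that information lives outside the core XML syntax.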

Processing XML files
Three traditional techniques for processing XML files are:

Using a programming language and the SAX API.
Using a programming language and the DOM API.
Using a transformation engine and a filter.


More recent and emerging techniques for processing XML files are:

Pull parsing
Data binding
Non-extractive XML Processing API

Simple API for XML (SAX)
SAX is a lexical, event-driven interface in which a document is read serially and its contents are reported as "callbacks" to various methods on a handler object of the user's design. SAX is fast and efficient to implement, but difficult to use for extracting information at random from the XML, since it tends to burden the application author with keeping track of what part of the document is being processed. It is better suited to situations in which certain types of information are always handled the same way, no matter where they occur in the document.
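A minimal sketch of the callback style, using Python's standard `xml.sax` module. The handler class `TitleCollector` and the sample document are invented for illustration; note how the handler has to keep its own state (`_in_title`) to know where in the document it is.

```python
import xml.sax

class TitleCollector(xml.sax.ContentHandler):
    """Collects the text of every <title> element, wherever it occurs."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False  # state the application must track itself
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False

SAMPLE = (
    b"<library><book><title>First</title></book>"
    b"<book><title>Second</title></book></library>"
)

handler = TitleCollector()
xml.sax.parseString(SAMPLE, handler)
print(handler.titles)
```

This fits the case described above, where one kind of information is handled the same way everywhere; extracting data that depends on its position in the tree would need considerably more bookkeeping.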

DOM
DOM is an interface-oriented Application Programming Interface that allows for navigation of the entire document as if it were a tree of "Node" objects representing the document's contents. A DOM document can be created by a parser, or can be generated manually by users (with limitations). Data types in DOM Nodes are abstract; implementations provide their own programming language-specific bindings. DOM implementations tend to be memory intensive, as they generally require the entire document to be loaded into memory and constructed as a tree of objects before access is allowed.
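For comparison, here is a small sketch of tree navigation using `xml.dom.minidom`, a lightweight DOM implementation in Python's standard library. The sample document is hypothetical. The whole document is loaded into memory before any node is accessed, and nodes can then be reached in any order or created by hand.

```python
from xml.dom import minidom

dom = minidom.parseString(
    "<library><book id='b1'><title>First</title></book>"
    "<book id='b2'><title>Second</title></book></library>"
)
root = dom.documentElement

# Random access: jump straight to the second book and read its title.
second = root.getElementsByTagName("book")[1]
second_title = second.getElementsByTagName("title")[0].firstChild.data

# Navigate upwards from a node to its parent.
parent_name = second.parentNode.tagName

# Nodes can also be created manually and attached to the tree.
new_book = dom.createElement("book")
new_book.setAttribute("id", "b3")
root.appendChild(new_book)
book_count = len(root.getElementsByTagName("book"))

print(second_title, parent_name, book_count)
```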

Transformation engines and filters
A filter in the Extensible Stylesheet Language (XSL) family can transform an XML file for displaying or printing.

XSL-FO is a declarative, XML-based page layout language. An XSL-FO processor can be used to convert an XSL-FO document into another non-XML format, such as PDF.

XSLT is a declarative, XML-based document transformation language. An XSLT processor can use an XSLT stylesheet as a guide for the conversion of the data tree represented by one XML document into another tree that can then be serialized as XML, HTML, plain text, or any other format supported by the processor.

XQuery is a W3C language for querying, constructing and transforming XML data.

XPath is a DOM-like node tree data model and path expression language for selecting data within XML documents. XSL-FO, XSLT and XQuery all make use of XPath. XPath also includes a useful function library.
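Path expressions can be tried out with Python's standard `xml.etree.ElementTree`, with the caveat that it supports only a limited subset of XPath and none of the function library; a full XPath, XSLT or XQuery processor would be needed for anything beyond simple selections. The sample document below is made up for illustration.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<library>"
    "<book year='2001'><title>First</title></book>"
    "<book year='2008'><title>Second</title></book>"
    "</library>"
)

# Child-axis path: every <title> directly under a <book> under the root.
titles = [t.text for t in root.findall("./book/title")]

# Descendant search with an attribute predicate.
recent = [b.find("title").text for b in root.findall(".//book[@year='2008']")]

print(titles, recent)
```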


Pull Parsing
A form of XML access that has become increasingly popular in recent years is pull parsing, which treats the document as a series of items read in sequence. Unlike SAX, where the parser pushes events to the application through callbacks, the application itself asks the parser for the next item. This allows the writing of recursive-descent parsers in which the structure of the parsing code mirrors the structure of the XML being parsed. Intermediate results can be held in local variables within the parsing methods, passed down as parameters to lower-level methods, or returned as values to higher-level methods.

For instance, in the Java programming language, the StAX framework can be used to create what is essentially an iterator that sequentially visits the elements, attributes, and data in an XML document. Code using this iterator can test the current item (to tell, for example, whether it is a start element, an end element, or text), inspect its properties (local name, namespace, values of XML attributes, value of text, and so on), and request that the iterator move to the next item. The code can thus extract information from the document as it traverses it.

One significant advantage of pull parsing is that it is typically much faster and more memory-efficient than DOM-style parsing, because no complete tree has to be built; in this respect it is generally comparable to SAX. Another advantage is that the recursive-descent approach lends itself to keeping data as typed local variables in the parsing code, whereas SAX typically requires the application to maintain intermediate data manually in a stack of the parent elements of the element being parsed. As a result, pull-parsing code is often more straightforward to understand and maintain than SAX parsing code.

Some potential disadvantages of pull parsing are that it is a newer approach that is not as well known among XML programmers (although the recursive-descent style is widely used in compilers and interpreters for languages other than XML), and that most existing pull parsers cannot yet perform advanced processing such as XML schema validation as they parse a document.
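The iterator-style traversal described above (the model StAX implements in Java, usually called pull parsing) can be sketched in Python with `xml.etree.ElementTree.iterparse`, which yields `(event, element)` pairs on request. This is an analogy rather than a port of StAX, and unlike StAX, `iterparse` still builds elements as it goes. The function names and sample document are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

SAMPLE = (
    b"<library>"
    b"<book><title>First</title><year>2001</year></book>"
    b"<book><title>Second</title><year>2008</year></book>"
    b"</library>"
)

def parse_book(events):
    """Consume events up to </book>; results live in typed local variables."""
    title, year = None, None
    for event, elem in events:
        if event == "end" and elem.tag == "title":
            title = elem.text
        elif event == "end" and elem.tag == "year":
            year = int(elem.text)
        elif event == "end" and elem.tag == "book":
            return title, year
    raise ValueError("unterminated <book>")

def parse_library(events):
    """Mirrors the document structure: a library is a sequence of books."""
    books = []
    for event, elem in events:
        if event == "start" and elem.tag == "book":
            books.append(parse_book(events))  # descend, sharing the iterator
    return books

# The application pulls items from this iterator; nothing is called back.
events = iter(ET.iterparse(io.BytesIO(SAMPLE), events=("start", "end")))
books = parse_library(events)
print(books)
```

Because both functions draw from the same iterator, `parse_book` picks up exactly where `parse_library` left off, and no explicit stack of parent elements is needed.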

Data binding
Another form of XML Processing API is data binding, where XML data is made available as a custom, strongly typed programming language data structure, in contrast to the interface-oriented DOM. Example data binding systems are the Java Architecture for XML Binding (JAXB) and the Strathclyde Novel Architecture for Querying XML (SNAQue).
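The idea can be sketched by hand in Python: XML elements are mapped onto a strongly typed class so that the rest of the program never touches generic nodes. Tools such as JAXB typically derive this kind of mapping from a schema or from annotations; the `Book` class and `bind_book` function below are hypothetical and written manually for illustration.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET

@dataclass
class Book:
    """Typed view of a <book> element (hypothetical structure)."""
    id: str
    title: str
    year: int

def bind_book(elem: ET.Element) -> Book:
    # Convert untyped XML text into typed fields in one place.
    return Book(
        id=elem.get("id"),
        title=elem.findtext("title"),
        year=int(elem.findtext("year")),
    )

root = ET.fromstring(
    "<library>"
    "<book id='b1'><title>First</title><year>2001</year></book>"
    "<book id='b2'><title>Second</title><year>2008</year></book>"
    "</library>"
)
books = [bind_book(e) for e in root.findall("book")]
print(books[1].title, books[1].year + 1)
```

Application code then works with `book.year` as an integer rather than navigating nodes and converting strings at each use.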

Non-extractive XML Processing API
Non-extractive XML Processing API is a new and emerging category of parsers. The most representative is VTD-XML, which dispenses with the object-oriented modeling of the XML hierarchy and instead uses 64-bit Virtual Token Descriptors (encoding the offsets, lengths, depths, and types) of XML tokens. VTD-XML's approach enables a number of interesting features, such as high performance, low memory usage, ASIC implementation, incremental update, and native XML indexing.
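The idea of a token descriptor can be illustrated with a small sketch: instead of copying each token out into its own object, a single 64-bit integer records where the token sits in the original, unmodified byte buffer. The bit layout, type codes and function names below are assumptions chosen for this example and should not be read as VTD-XML's actual format or API.

```python
XML = b"<a><b>hi</b></a>"

# Hypothetical token type codes.
TYPE_START_TAG = 0
TYPE_TEXT = 1

# Illustrative layout: 4 bits type | 8 bits depth | 20 bits length | 32 bits offset.
def pack(token_type: int, depth: int, length: int, offset: int) -> int:
    if not (0 <= token_type < 16 and 0 <= depth < 256):
        raise ValueError("type or depth out of range")
    if not (0 <= length < (1 << 20) and 0 <= offset < (1 << 32)):
        raise ValueError("length or offset out of range")
    return (token_type << 60) | (depth << 52) | (length << 32) | offset

def unpack(descriptor: int):
    return (
        (descriptor >> 60) & 0xF,
        (descriptor >> 52) & 0xFF,
        (descriptor >> 32) & 0xFFFFF,
        descriptor & 0xFFFFFFFF,
    )

def token_bytes(buffer: bytes, descriptor: int) -> bytes:
    """Look the token up in the original buffer; nothing was extracted earlier."""
    _, _, length, offset = unpack(descriptor)
    return buffer[offset:offset + length]

# Descriptors written by hand for the element names "a", "b" and the text "hi".
descriptors = [
    pack(TYPE_START_TAG, 0, 1, 1),
    pack(TYPE_START_TAG, 1, 1, 4),
    pack(TYPE_TEXT, 1, 2, 6),
]
tokens = [token_bytes(XML, d) for d in descriptors]
print(tokens)
```

Since the descriptors are plain integers pointing into the source bytes, the parsed form is compact, and the original document remains available for in-place updates.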

 
