XML/Unix Processing Tools Documentation

Usage

There are four tools. None of them take any command-line arguments; they are all simple filters, reading information from standard input in one format and writing the same information to standard output in a different format.

Tool name   Input   Output
xml2        XML     Flat
html2       HTML    Flat
2xml        Flat    XML
2html       Flat    HTML

The ``Flat'' format is specific to these tools. It is a syntax for representing structured markup in a way that makes it easy to process with line-oriented tools. The same format is used for HTML and XML; in fact, you can think of html2 as converting HTML to XHTML and running xml2 on the result. Likewise, you can think of 2html as running 2xml and converting the resulting XHTML to HTML. (Of course, this isn't how the implementation works.)
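
To illustrate why a line-oriented format is convenient, here is a minimal Python sketch that splits Flat lines into a path and an optional value. The helper name parse_flat_line and the sample lines are assumptions made for illustration; they are not part of the tools, and the sketch assumes each line has the form path or path=value.

```python
def parse_flat_line(line):
    """Split one Flat line into (path, value).

    Assumption: the path ends at the first '='. A line without '='
    carries no value, so None is returned in its place.
    """
    path, sep, value = line.rstrip("\n").partition("=")
    return (path, value if sep else None)


# Hypothetical sample input, written in the style of Flat output.
sample = [
    "/person/name=Juan Doé",
    "/person/address/city=El Dorado",
    "/person/important",
]

pairs = [parse_flat_line(line) for line in sample]

# Select every value stored under one path, much as grep would.
cities = [value for path, value in pairs if path == "/person/address/city"]
print(cities)  # ['El Dorado']
```

Because every line carries its full path, a filter like this needs no parser state; the same selection could be done with grep or sed on the output of xml2.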

File Format

To use these tools effectively, it's important to understand the ``Flat'' format. Unfortunately, I'm lazy and sloppy; rather than provide a precise definition of the relationship between XML and ``Flat'', I will simply give you a pile of examples and hope you can generalize correctly. (Good luck!)

XML                          Flat equivalent
<thing/>                     /thing

<thing><subthing/></thing>   /thing/subthing

<thing>stuff</thing>         /thing=stuff

<thing>
<subthing>substuff</subthing>
stuff
</thing>
/thing/subthing=substuff
/thing=stuff

<person>
<name>Juan Doé</name>
<occupation>Zillionaire</occupation>
<pet>Dogcow</pet>
<address>
123 Camino Real
<city>El Dorado</city>
<state>AZ</state>
<zip>12345</zip>
</address>
<important/>
</person>
/person/name=Juan Doé
/person/occupation=Zillionaire
/person/pet=Dogcow
/person/address=123 Camino Real
/person/address/city=El Dorado
/person/address/state=AZ
/person/address/zip=12345
/person/important

<collection>
<group>
<thing>stuff</thing>
<thing>stuff</thing>
</group>
</collection>
/collection/group/thing=stuff
/collection/group/thing
/collection/group/thing=stuff

<collection>
<group>
<thing>stuff</thing>
</group>
<group>
<thing>stuff</thing>
</group>
</collection>
/collection/group/thing=stuff
/collection/group
/collection/group/thing=stuff

<thing>
stuff

more stuff
&lt;other stuff&gt;
</thing>
/thing=stuff
/thing=
/thing=more stuff
/thing=<other stuff>

<thing flag="value">stuff</thing>
/thing/@flag=value
/thing=stuff

<?processing instruction?>
<thing/>
/?processing=instruction
/thing
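
Two patterns in the examples above are easy to miss: consecutive lines with the same path appear to continue one multi-line text value, and a bare path line appears to separate two sibling elements of the same name. The following Python sketch encodes that reading. It is a guess generalized from the examples, not a description of how 2xml actually works, and join_text is a hypothetical helper.

```python
def join_text(lines, path):
    """Collect the text values found at `path` in a list of Flat lines.

    Assumption (inferred from the examples only): a run of consecutive
    'path=value' lines forms one value joined by newlines, and any
    other line, including a bare 'path' line, ends the current run.
    """
    values = []
    run = None  # pieces of the value currently being collected
    for line in lines:
        p, sep, v = line.rstrip("\n").partition("=")
        if p == path and sep:
            if run is None:
                run = []
            run.append(v)
        elif run is not None:
            values.append("\n".join(run))
            run = None
    if run is not None:
        values.append("\n".join(run))
    return values


multiline = ["/thing=stuff", "/thing=", "/thing=more stuff", "/thing=<other stuff>"]
repeated = [
    "/collection/group/thing=stuff",
    "/collection/group/thing",
    "/collection/group/thing=stuff",
]

# One value spanning four lines of text.
print(join_text(multiline, "/thing"))
# Two separate values: ['stuff', 'stuff']
print(join_text(repeated, "/collection/group/thing"))
```

Under this reading, the bare /collection/group/thing line is what keeps the two identical elements from merging into a single two-line value.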
