    import com.ibm.xml.parser.*;
    import java.io.*;
    ....
    String filename;
    ....
    InputStream is = new FileInputStream(filename);
    TXDocument doc = new Parser(filename).readStream(is);
    is.close();
Parser#readStream() never returns null.
Used this way, the parser prints parse errors to the standard error stream.
To access the parse tree, use TXDocument#getDocumentElement() (see How to operate).
TXDocument#getDocumentElement() may return null when the XML document has serious errors.
A Parser instance cannot be reused: an application can call Parser#readStream() only once per instance.
NOTE: A TXDocument instance generated by Parser is also an instance of org.w3c.dom.Document. This is a DOM object tree.
You can write the parse tree back out as a stream in XML format.
    String charset = "ISO-8859-1";
    String jencode = MIME2Java.convert(charset);
    PrintWriter pw = new PrintWriter(new OutputStreamWriter(System.out, jencode));
    doc.setEncoding(charset);
    doc.print(pw, jencode);
You can configure the parser's behavior after creating a Parser instance and before calling readStream(), using the following methods:
setErrorNoByteMark(boolean)
setKeepComment(boolean)
setPreserveSpace(boolean)
setWarningNoDoctypeDecl(boolean)
setWarningNoXMLDecl(boolean)
setWarningRedefinedEntity(boolean)
    import com.ibm.xml.parser.*;
    ....
    String filename;
    ....
    Parser parse = new Parser(filename);
    parse.setWarningNoDoctypeDecl(false);
    parse.setWarningNoXMLDecl(false);
    InputStream is = new FileInputStream(filename);
    TXDocument doc = parse.readStream(is);
    is.close();
You can control the output of errors produced by the parser.
Make an instance of a class implementing ErrorListener, and pass the instance to the Parser constructor.
The Object key parameter of the error() method is an instance of String or Exception.
When key is a String, it identifies the type of error (see the source com/ibm/xml/parser/r/Message.java).
    import com.ibm.xml.parser.*;
    class ErrorIgnorer implements ErrorListener {
        public void error(String fname, int lineno, int charoff, Object key, String mes) {
            // do nothing
        }
    }
    ....
    String filename;
    ....
    InputStream is = new FileInputStream(filename);
    Parser parse = new Parser(filename, new ErrorIgnorer(), null);
    TXDocument doc = parse.readStream(is);
    is.close();
    import com.ibm.xml.parser.*;
    import java.awt.TextArea;
    class ErrorEater extends TextArea implements ErrorListener {
        String m_fname;
        int m_noferrors = 0;
        ErrorEater(String n) {
            super();
            m_fname = n;
        }
        public void error(String fname, int lineno, int charoff, Object key, String mes) {
            // Append one line per error to the text area and count it.
            append((null == fname ? m_fname : fname)+":"+lineno+":"+mes+"\n");
            m_noferrors ++;
        }
        public boolean hasError() {
            return 0 < m_noferrors;
        }
    }
    ....
    String filename;
    ....
    InputStream is = new FileInputStream(filename);
    ErrorEater ee = new ErrorEater(filename);
    Parser parse = new Parser(filename, ee, null);
    TXDocument doc = parse.readStream(is);
    is.close();
See the sources com/ibm/xml/parser/trlxml.java and com/ibm/xml/parser/Stderr.java.
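For comparison, the same collect-instead-of-print idea can be sketched with the JDK's built-in parser. This is not the com.ibm.xml.parser API: it uses org.xml.sax.ErrorHandler and javax.xml.parsers.DocumentBuilder, and the class name CollectingErrorHandler and its collect() helper are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical helper: records parser diagnostics instead of printing them.
class CollectingErrorHandler implements ErrorHandler {
    final List<String> messages = new ArrayList<>();

    private void record(SAXParseException e) {
        messages.add(e.getLineNumber() + ":" + e.getMessage());
    }

    public void warning(SAXParseException e) { record(e); }
    public void error(SAXParseException e) { record(e); }
    public void fatalError(SAXParseException e) { record(e); }

    // Parses the given XML text and returns every diagnostic that was reported.
    static List<String> collect(String xml) {
        CollectingErrorHandler handler = new CollectingErrorHandler();
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(handler);
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // The JDK parser may still abort after a fatal error;
            // the diagnostic has already been recorded by the handler.
        }
        return handler.messages;
    }

    public static void main(String[] args) {
        System.out.println(collect("<a><b></a>"));
    }
}
```

As with ErrorEater above, an application can then check whether the list is empty to decide if the document parsed cleanly.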
A TXDocument can have one TXElement instance, zero or one DTD instance, and instances of TXPI and TXComment as children.
All children of a TXDocument can be accessed with TXDocument#getChildren() / TXDocument#getChildrenArray().
The TXElement instance can also be accessed with TXDocument#getDocumentElement().
A TXElement can have instances of TXElement, TXText, TXPI, and TXComment as children.
All children of a TXElement can be accessed with TXElement#getChildren() / TXElement#getChildrenArray().
Some methods of TXDocument and TXElement return one or more instances of the Child interface.
These Child instances are also instances of TXElement, TXText, TXPI, TXComment, or DTD (the last only as a child of TXDocument).
To find out which class an instance belongs to, use Node#getNodeType() or the instanceof operator, as in the following:
    import com.ibm.xml.parser.*;
    import com.ibm.dom.*;
    ....
    TXDocument doc = ....;
    TXElement root = doc.getDocumentElement();
    NodeEnumerator ne = root.getChildren().getEnumerator();
    Node ch;
    while (null != (ch = ne.getNext())) {
        if (ch instanceof TXElement) {
            TXElement el = (TXElement)ch;
            ....
        } else if (ch instanceof TXText) {
            TXText te = (TXText)ch;
            ....
        }
    }
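The example above uses instanceof; the getNodeType() alternative might look like the following sketch. It is written against the JDK's org.w3c.dom and javax.xml.parsers classes rather than com.ibm.xml.parser, so the traversal calls differ from the NodeEnumerator shown above; the class name NodeTypeDemo and the one-letter labels are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NodeTypeDemo {
    // Returns a short label for each child of the root element, in order.
    static String describeChildren(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < children.getLength(); i++) {
                Node ch = children.item(i);
                // Dispatch on the node type code instead of the Java class.
                switch (ch.getNodeType()) {
                    case Node.ELEMENT_NODE:
                        sb.append("E:").append(ch.getNodeName());
                        break;
                    case Node.TEXT_NODE:
                        sb.append("T");
                        break;
                    case Node.COMMENT_NODE:
                        sb.append("C");
                        break;
                    case Node.PROCESSING_INSTRUCTION_NODE:
                        sb.append("P");
                        break;
                    default:
                        sb.append("?");
                }
                sb.append(' ');
            }
            return sb.toString().trim();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints "E:b T C P"
        System.out.println(describeChildren("<a><b/>text<!-- c --><?pi x?></a>"));
    }
}
```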
The processor keeps all white space and passes it to the application, in accordance with section 2.10, White Space Handling, of the XML 1.0 Proposed Recommendation.
The processor sets the IsIgnorableWhitespace flag on TXText instances that consist only of white space.
    <MEMBERS>
      <PERSON>Hiroshi</PERSON>
      <PERSON>Naohiko</PERSON>
      <PERSON>
        Kent
      </PERSON>
    </MEMBERS>
The processor parses this Element as follows:
    TXElement (getName():"MEMBERS", getText():"\n Hiroshi\n Naohiko\n \n Kent\n \n")
        TXText ("\n ", ignorable)
        TXElement (getName():"PERSON", getText():"Hiroshi")
            TXText ("Hiroshi")
        TXText ("\n ", ignorable)
        TXElement (getName():"PERSON", getText():"Naohiko")
            TXText ("Naohiko")
        TXText ("\n ", ignorable)
        TXElement (getName():"PERSON", getText():"\n Kent\n ")
            TXText ("\n Kent\n ")
        TXText ("\n", ignorable)
It is useful to call TXText#trim(String) / TXText#trim(String,boolean,boolean) when an application does not need leading/trailing spaces.
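The white space behaviour described above can be observed with any parser that preserves white space. The following sketch uses the JDK's built-in DOM parser, not com.ibm.xml.parser, and uses String.trim() as a stand-in for TXText#trim(); the class name WhitespaceDemo and the exact indentation of the sample document are assumptions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class WhitespaceDemo {
    // The MEMBERS document from the text, with assumed indentation.
    static final String XML =
        "<MEMBERS>\n"
        + "  <PERSON>Hiroshi</PERSON>\n"
        + "  <PERSON>Naohiko</PERSON>\n"
        + "  <PERSON>\n    Kent\n  </PERSON>\n"
        + "</MEMBERS>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Counts the root's text children that consist only of white space.
    static int whitespaceOnlyChildren(String xml) {
        NodeList children = parse(xml).getDocumentElement().getChildNodes();
        int count = 0;
        for (int i = 0; i < children.getLength(); i++) {
            Node ch = children.item(i);
            if (ch.getNodeType() == Node.TEXT_NODE
                    && ch.getNodeValue().trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    // Returns the text of the index-th element named tag, without
    // leading or trailing white space.
    static String trimmedText(String xml, String tag, int index) {
        return parse(xml).getElementsByTagName(tag).item(index)
            .getTextContent().trim();
    }

    public static void main(String[] args) {
        System.out.println(whitespaceOnlyChildren(XML)); // prints 4
        System.out.println(trimmedText(XML, "PERSON", 2)); // prints Kent
    }
}
```

The four white-space-only text nodes correspond to the four TXText entries marked ignorable in the parse tree above.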
    class AElementHandler implements ElementHandler {
        public TXElement handleElement(TXElement el) {
            ....
        }
    }
    ....
    Parser parse = new Parser(...);
    parse.addElementHandler(new AElementHandler(), "CHANNEL");
    TXDocument doc = parse.readStream(is);
During Parser#readStream(), this ElementHandler#handleElement() method is called after the parser has parsed each end tag (</CHANNEL>) and before the element is added to its parent.
The parser adds to the parent the TXElement instance returned by handleElement().
If handleElement() returns null, the parser does not add the TXElement instance to the parent.
There are two ways to register an ElementHandler:
For a specific TXElement: addElementHandler(handler, "CHANNEL"); The handler is called for each </CHANNEL> tag.
For all TXElements: addElementHandler(handler);
When more than one ElementHandler is registered in the parser, the parser first calls the ElementHandlers registered for the specific TXElement (first set, first called) and then calls the ElementHandlers registered for all TXElements.
Even if an ElementHandler changes the name of a TXElement, the parser still calls the other ElementHandlers registered for the original name.
When an ElementHandler returns null, the parser does not call the remaining ElementHandlers.
    Parser parse = new Parser(...);
    parse.addElementHandler(handler1);
    parse.addElementHandler(handler2, "CHANNEL");
    parse.addElementHandler(handler3, "CHANNEL");
    parse.addElementHandler(handler4);
    TXDocument doc = parse.readStream(is);
In this case, when the parser processes the </CHANNEL> tag, it calls handler2 first, and then handler3, handler1, and handler4.
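The ordering rule can be modelled in a few lines of plain Java. This is only a sketch of the rule as described above, not the parser's actual implementation: the class HandlerOrderDemo, its Registration record, and the use of strings in place of real ElementHandler objects are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;

class HandlerOrderDemo {
    // Hypothetical registration record; elementName == null means "all elements".
    static final class Registration {
        final String handler;
        final String elementName;

        Registration(String handler, String elementName) {
            this.handler = handler;
            this.elementName = elementName;
        }
    }

    private final List<Registration> registrations = new ArrayList<>();

    void addElementHandler(String handler) {
        registrations.add(new Registration(handler, null));
    }

    void addElementHandler(String handler, String elementName) {
        registrations.add(new Registration(handler, elementName));
    }

    // Returns the handlers in the order they would be called for an end tag.
    List<String> callOrder(String endTagName) {
        List<String> order = new ArrayList<>();
        // First the handlers registered for this specific name, in registration order.
        for (Registration r : registrations) {
            if (endTagName.equals(r.elementName)) {
                order.add(r.handler);
            }
        }
        // Then the handlers registered for all elements, in registration order.
        for (Registration r : registrations) {
            if (r.elementName == null) {
                order.add(r.handler);
            }
        }
        return order;
    }

    // Builds the same registrations as the example in the text.
    static HandlerOrderDemo example() {
        HandlerOrderDemo demo = new HandlerOrderDemo();
        demo.addElementHandler("handler1");
        demo.addElementHandler("handler2", "CHANNEL");
        demo.addElementHandler("handler3", "CHANNEL");
        demo.addElementHandler("handler4");
        return demo;
    }

    public static void main(String[] args) {
        // prints [handler2, handler3, handler1, handler4]
        System.out.println(example().callOrder("CHANNEL"));
    }
}
```

The sketch does not model the early exit that occurs when a handler returns null.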
Make a TXDocument instance: TXDocument doc = new TXDocument();
Add an element: doc.addElement(...);
Make a PrintWriter.
Set the encoding on the TXDocument if the encoding of the PrintWriter is not UTF-8.
Print the document: doc.print(...);
    TXDocument doc = new TXDocument();
    TXElement el = new TXElement("CHANNEL");
    ....
    doc.addElement(el);
    PrintWriter pw = new PrintWriter(new OutputStreamWriter(System.out, MIME2Java.convert("Shift_JIS")));
    doc.setEncoding("Shift_JIS");
    doc.print(pw);
To use a subclass of TXElement instead of the TXElement class itself, implement the ElementFactory interface and call Parser#setElementFactory().
Make a subclass of the TXElement class.
Make a class implementing ElementFactory, for example by extending the DefaultElementFactory class.
Call Parser#setElementFactory() with an instance of the class implementing ElementFactory.
    class MyElement extends TXElement {
        ....
    }
    class MyElementFactory extends DefaultElementFactory {
        ....
    }
    ....
    Parser parse = new Parser(...);
    parse.setElementFactory(new MyElementFactory());
    TXDocument doc = parse.readStream(is);
    // doc has not TXElement instances but MyElement instances
ElementFactory#createElement() is called when the processor reaches a start-tag.
ElementFactory#ripenElement() is called when the processor reaches an end-tag.
    String systemlit = "http://.../foobar.dtd";
    InputStream is = (new URL(systemlit)).openStream();
    Parser parse = new Parser(...);
    DTD dtd = parse.readDTDStream(is);
    Enumeration en = dtd.getAttributeDeclarations("FOO");
    while (en.hasMoreElements()) {
        AttDef attd = (AttDef)en.nextElement();
        // attd.getName() is attribute name
    }
First, get an AttDef instance by the above method or by DTD#getAttributeDeclaration(String,String).
Next, check the attribute type with AttDef#getType(), which returns one of the following values:
TXAttribute.T_CDATA
TXAttribute.T_ENTITIES
    Enumeration en = dtd.getEntities();
    while (en.hasMoreElements()) {
        EntityValue ev = (EntityValue)en.nextElement();
        if (ev.isNDATA()) {
            // Each ev.getName() is valid value.
        }
    }
TXAttribute.T_ENTITY
TXAttribute.T_ENUMERATION
The valid values are listed by AttDef#elements().
    Enumeration en = attd.elements();
    while (en.hasMoreElements()) {
        String s = (String)en.nextElement();
        // Each s is valid.
    }
TXAttribute.T_ID
A new ID value can be used only if DTD#checkID() returns null.
    String newid = ...;
    if (null != dtd.checkID(newid)) {
        // Can't use newid
    } else
        dtd.registID(element, newid);
TXAttribute.T_IDREF
    Enumeration en = dtd.IDs();
    while (en.hasMoreElements()) {
        String id = (String)en.nextElement();
        // The attribute can have one in a set of each id.
    }
TXAttribute.T_IDREFS
TXAttribute.T_NMTOKEN
TXAttribute.T_NMTOKENS
TXAttribute.T_NOTATION
The valid values are listed by AttDef#elements().
    Enumeration en = attd.elements();
    while (en.hasMoreElements()) {
        String s = (String)en.nextElement();
        // Each s is valid.
    }
<!ELEMENT PERSON (NAME, HEIGHT, WEIGHT, EMAIL?)>
With this declaration, a "PERSON" element must contain a "NAME" element first, a "HEIGHT" element second, and a "WEIGHT" element third; an "EMAIL" element may optionally follow.
Applications can query such rules with DTD#getInsertableElements() / DTD#getAppendableElements().
    TXElement el = new TXElement("PERSON");
    ....
    switch (dtd.getContentType("PERSON")) {
      case 0:
        // This element is not declared.
        break;
      case DTD.CM_EMPTY:
        // No element is insertable.
        break;
      case DTD.CM_ANY:
        // Any element is insertable.
        break;
      case DTD.CM_REGULAR:
        Hashtable tab = dtd.prepareTable("PERSON");
        // This hashtable is reusable for any elements.
        dtd.getAppendableElements(el, tab);
        if (((InsertableElement)tab.get(DTD.CM_ERROR)).status) {
            // This element has an incorrect structure.
        } else {
            Enumeration en = tab.elements();
            while (en.hasMoreElements()) {
                InsertableElement ie = (InsertableElement)en.nextElement();
                if (!ie.name.equals(DTD.CM_ERROR)
                    && !ie.name.equals(DTD.CM_EOC)
                    && ie.status) {
                    if (ie.name.equals(DTD.CM_PCDATA)) {
                        // Can append a TXText instance to el.
                    } else {
                        // Can append a TXElement instance named ie.name.
                    }
                }
            }
        }
        break;
    }
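To make the idea of "appendable elements" concrete, here is a minimal stand-alone sketch for the PERSON declaration above. It does not use the DTD class; it hard-codes the sequence (NAME, HEIGHT, WEIGHT, EMAIL?) in two arrays and assumes a simple sequence with distinct names. The class SequenceModelDemo and its appendable() method are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SequenceModelDemo {
    // Assumed encoding of <!ELEMENT PERSON (NAME, HEIGHT, WEIGHT, EMAIL?)>.
    static final String[] NAMES = {"NAME", "HEIGHT", "WEIGHT", "EMAIL"};
    static final boolean[] OPTIONAL = {false, false, false, true};

    // Returns the element names that may be appended after the given children,
    // or null when the children already violate the sequence.
    static List<String> appendable(List<String> children) {
        int pos = 0; // index of the next model item that may still be matched
        for (String child : children) {
            // Skip optional items that the document left out.
            while (pos < NAMES.length
                    && !NAMES[pos].equals(child)
                    && OPTIONAL[pos]) {
                pos++;
            }
            if (pos == NAMES.length || !NAMES[pos].equals(child)) {
                return null; // incorrect structure
            }
            pos++;
        }
        List<String> result = new ArrayList<>();
        for (int i = pos; i < NAMES.length; i++) {
            result.add(NAMES[i]);
            if (!OPTIONAL[i]) {
                break; // a required item blocks everything after it
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(appendable(Arrays.asList("NAME", "HEIGHT"))); // [WEIGHT]
        System.out.println(appendable(Arrays.asList("NAME", "HEIGHT", "WEIGHT"))); // [EMAIL]
        System.out.println(appendable(Arrays.asList("HEIGHT"))); // null
    }
}
```

The null result plays the role of the DTD.CM_ERROR entry in the example above; a real content model with choices and repetition needs a more general matcher.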
The namespace specification is still in progress, so this implementation is experimental.
Call Parser#setProcessNamespace(true) if you need the namespace feature.
| | "rdf:assertion" without namespace support | "rdf:assertion" with namespace support | "author" with namespace support |
|---|---|---|---|
| TXElement#getName() / TXAttribute#getName() | "rdf:assertion" | "assertion" | "author" |
| TXElement#getNameSpace() / TXAttribute#getNameSpace() | null | "rdf" | null |
| TXElement#getQName() / TXAttribute#getQName() | "rdf:assertion" | "rdf:assertion" | "author" |
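The "with namespace support" columns of the table amount to splitting a qualified name at its first colon. A minimal sketch in plain Java, with hypothetical helper names that are not part of TXElement or TXAttribute:

```java
class QNameDemo {
    // Returns the part before the first colon, or null when there is no colon.
    static String prefix(String qname) {
        int colon = qname.indexOf(':');
        return colon < 0 ? null : qname.substring(0, colon);
    }

    // Returns the part after the first colon, or the whole name when there is no colon.
    static String localName(String qname) {
        int colon = qname.indexOf(':');
        return colon < 0 ? qname : qname.substring(colon + 1);
    }

    public static void main(String[] args) {
        System.out.println(prefix("rdf:assertion"));    // rdf
        System.out.println(localName("rdf:assertion")); // assertion
        System.out.println(prefix("author"));           // null
        System.out.println(localName("author"));        // author
    }
}
```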