SgmlQL data types
The SgmlQL type system consists of atomic types (numbers, booleans, strings, and names), and complex types (elements, documents, sets, and lists), which are built up from the atomic types.
In SgmlQL there is a single numeric type. Numeric literals are specified in the usual floating point or integer formats, according to the following syntax:
[0-9]+("."[0-9]+)?([eE][-+]?[0-9]+)?
Examples
1 1.45 1977 44 .21e02
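The grammar above can be checked mechanically. The following is a minimal Python sketch, not part of SgmlQL itself: the function name `is_numeric_literal` is hypothetical, and the pattern is a direct transcription of the syntax line. Read literally, that syntax requires at least one digit before the decimal point, so the last example (.21e02) does not match it; whether the SgmlQL implementation accepts that form is not stated here.

```python
import re

# Pattern transcribed from the syntax line above; the quoted "." becomes an escaped dot.
NUMERIC_LITERAL = re.compile(r"[0-9]+(\.[0-9]+)?([eE][-+]?[0-9]+)?")


def is_numeric_literal(text: str) -> bool:
    """Return True if the whole string matches the grammar as written (hypothetical helper)."""
    return NUMERIC_LITERAL.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["1", "1.45", "1977", "44", ".21e02", "0.21e02"]:
        print(sample, is_numeric_literal(sample))
```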
There are two boolean values, TRUE and FALSE (upper case).
Examples
TRUE FALSE
Literal strings are enclosed either within single quotes or double quotes, in SGML style. The usual backslash rules apply for representing characters such as newline, tab, etc.:
\t  tab
\n  newline
\r  return
\f  form feed
\b  backspace
\"  double quote
\'  single quote
\%  percent sign
The empty string is denoted by "" or ''.
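To make the escape rules concrete, here is a small Python sketch that decodes the body of a string literal (the text between the quotes) using only the escapes listed above. The name `decode_string_body` is hypothetical, and leaving unknown escape sequences unchanged is an assumption; the text does not say how SgmlQL treats them.

```python
# Escape table taken from the list above: character after the backslash -> decoded character.
ESCAPES = {
    "t": "\t", "n": "\n", "r": "\r", "f": "\f", "b": "\b",
    '"': '"', "'": "'", "%": "%",
}


def decode_string_body(body: str) -> str:
    """Decode backslash escapes in the text between the quotes (hypothetical helper)."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body) and body[i + 1] in ESCAPES:
            out.append(ESCAPES[body[i + 1]])
            i += 2
        else:
            # Assumption: anything else, including an unknown escape, is kept as-is.
            out.append(ch)
            i += 1
    return "".join(out)
```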
Examples
"Hello" "Hello\n" "He said: 'Hello'" 'He said: "Hello"' ""
Names are identifiers which follow the SGML syntax for names, with the following exceptions:
- names do not contain lower case letters
- a sharp sign (#) is allowed at the beginning of a name. Note that in a comment, the sharp sign is also used, but must be followed by a space. No space is allowed between the sharp sign and the first letter of a name.
The resulting syntax is:
#?[A-Za-z][\-.0-9A-Za-z]*
Note that there is no length limitation.
Names starting with # are hidden names; all others are visible names. Hidden names are used only within an SgmlQL query. They do not appear in any SGML document, whether it is the document being queried or one created from the result of a query.
The hidden names #PCDATA and #FROM are predefined.
Examples
HEAD P DIV2 AUTHOR.NAME #PCDATA #FOO
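The name syntax and the hidden/visible distinction can be sketched in Python as follows. The function `classify_name` is a hypothetical helper, and the pattern is transcribed as given in the syntax line above, without enforcing any rule beyond it.

```python
import re

# Pattern transcribed from the name syntax above.
NAME = re.compile(r"#?[A-Za-z][\-.0-9A-Za-z]*")


def classify_name(text: str):
    """Return "hidden", "visible", or None if the text is not a name (hypothetical helper)."""
    if NAME.fullmatch(text) is None:
        return None
    return "hidden" if text.startswith("#") else "visible"


if __name__ == "__main__":
    for sample in ["HEAD", "P", "DIV2", "AUTHOR.NAME", "#PCDATA", "#FOO", "# FOO"]:
        print(repr(sample), classify_name(sample))
```

The last sample, with a space after the sharp sign, is rejected as a name, which is consistent with the rule that no space is allowed between the sharp sign and the first letter.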
An element is a triple composed of:
- a generic identifier of type name;
- an attribute-value set of type atvset;
- a content of type string or element list.
Elements whose content is a string are called pseudo-elements. A (hidden) generic identifier #PCDATA is generated for such elements. Pseudo-elements are treated as strings by operators that require strings as arguments.
Literal elements are enclosed within percent signs: % %.
Examples
%<HEAD>Subject: Energy cooperation: assessment</HEAD>%
%<MEMO>
<FROM>me</FROM>
<TO>you</TO>
<BODY>Hello!</BODY>
</MEMO>%
When a document is read by SgmlQL, the location of each element in the SGML tree is stored in a hidden attribute on the element itself, called #FROM. The location is represented as a string of numbers separated by dots, where each number indicates the sibling number at the corresponding level of the tree. For example, the location 1.2.1.3.1.2.1 starts at the root of the SGML tree (the initial 1), and descends taking the designated child at each node (i.e., the second child of the root, then the first child of this node, then the third child of this node, etc.). Note that PCDATA is considered as a node in the tree and therefore is given a location. For example, the elements in the following document
<DOCTYPE MEMO SYSTEM "memo.dtd">
<MEMO>
<FROM>me</FROM>
<TO>you</TO>
<BODY>Hello <EMPH>old</EMPH> friend!</BODY>
</MEMO>
have the following locations:
Element    Location
<MEMO>     1
<FROM>     1.1
me         1.1.1\1
<TO>       1.2
you        1.2.1\1
<BODY>     1.3
Hello      1.3.1\1
<EMPH>     1.3.2
old        1.3.2.1\1
friend!    1.3.3\1
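The numbering scheme can be illustrated with a short Python sketch that walks a tree and assigns dotted locations. The tree representation (a `(name, children)` tuple for an element, a plain string for PCDATA) and the function `assign_locations` are assumptions made for illustration, not SgmlQL's internal structures. The `\1` suffix shown for PCDATA nodes in the table is not explained in this section, so the sketch leaves it out and produces only the dotted part.

```python
def assign_locations(node, location="1"):
    """Yield (label, location) pairs in document order (hypothetical helper).

    An element is a (name, children) tuple; a PCDATA node is a plain string.
    """
    if isinstance(node, str):
        yield node, location
        return
    name, children = node
    yield f"<{name}>", location
    # Children are numbered from 1 at each level of the tree.
    for index, child in enumerate(children, start=1):
        yield from assign_locations(child, f"{location}.{index}")


# The MEMO document from the example above, in the assumed tuple form.
memo = ("MEMO", [
    ("FROM", ["me"]),
    ("TO", ["you"]),
    ("BODY", ["Hello ", ("EMPH", ["old"]), " friend!"]),
])

if __name__ == "__main__":
    for label, location in assign_locations(memo):
        print(f"{label!r:12} {location}")
```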
A document is a pair composed of:
- a doctype declaration, itself composed of:
  - a document keyword, of type name;
  - a document dtd, of type string;
- a body, of type element.
Literal documents are enclosed within percent signs: % %.
Example
%<DOCTYPE MEMO SYSTEM "memo.dtd">
<MEMO>
<FROM>me</FROM>
<TO>you</TO>
<BODY>Hello!</BODY>
</MEMO>%
Sets can contain members of any atomic type. Sets must be encoded within curly brackets, their members being separated by commas ({ ... , ... , ... }). The empty set is denoted by {}. Note that atomic objects are treated as sets of cardinality one.
Examples
{3, "hello", DIV}
{}
Special types of sets are associative sets, in which each member is a list of two elements, a key and a datum. In the current version, only attribute-value sets are implemented.
An attribute-value set, or atvset, is an associative set, of which each member is composed of:
- a key of type name (the attribute name);
- a datum of type string.
Literal atvsets are enclosed within curly brackets. Attribute names and values are separated by an equal sign.
Examples
{TYPE="section" , N="4" , ID="s4"}
{}
The datum associated with a given key is denoted by the sign ->.
Example
$d->TYPE returns: section
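As a rough analogy, an atvset behaves like a mapping from attribute names to string values, and the -> operator like a key lookup. The Python sketch below parses a literal atvset into a dictionary. The function `parse_atvset` is hypothetical, and it handles only double-quoted values, which is a simplifying assumption rather than a statement about SgmlQL's parser.

```python
import re

# One NAME="value" pair; the key follows the name syntax, the value is double-quoted.
PAIR = re.compile(r'(#?[A-Za-z][\-.0-9A-Za-z]*)\s*=\s*"([^"]*)"')


def parse_atvset(literal: str) -> dict:
    """Parse a literal such as {TYPE="section" , N="4"} into a dict (hypothetical helper)."""
    text = literal.strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError("an atvset literal is enclosed within curly brackets")
    return {key: value for key, value in PAIR.findall(text[1:-1])}


if __name__ == "__main__":
    d = parse_atvset('{TYPE="section" , N="4" , ID="s4"}')
    print(d["TYPE"])  # plays the role of $d->TYPE
```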
A list is an ordered set. Lists can contain members of any type except lists and sets.
Lists must be enclosed within [ ], their members being separated by commas ([ ... , ... , ... ]). The empty list is denoted by []. Note that atomic objects are treated as lists of length one.
Examples
[3, "hello", DIV]
[]
Special types of lists are element lists whose members are elements, possibly including pseudo-elements, in which case the list is a mixed content list. However, two pseudo-elements cannot be adjacent in a mixed content list.
Element lists can be written either using the notation above, or they can be written within % %, as for elements. When the % % notation is used, the content follows the format of SGML element content (i.e., no commas between elements, no quotes around strings, etc.).
Examples
[ %<FROM>me</FROM>% , %<TO>you</TO>% ]
[ "He was born on ", %<DATE>15 May 1950</DATE>%, " in New York" ]
[]
are equivalent respectively to:
%<FROM>me</FROM><TO>you</TO>%
%He was born on <DATE>15 May 1950</DATE> in New York%
%%
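The correspondence between the two notations, and the rule that two pseudo-elements cannot be adjacent, can be sketched in Python. The representation is an assumption made for illustration: a pseudo-element is a plain string and an element is a `(name, text)` tuple with text-only content and no attributes. The function `to_content` is hypothetical.

```python
def to_content(items):
    """Serialize an element list into the % % content notation (hypothetical helper).

    Each item is either a string (pseudo-element) or a (name, text) tuple.
    """
    # Mixed content rule from the text: no two adjacent pseudo-elements.
    for left, right in zip(items, items[1:]):
        if isinstance(left, str) and isinstance(right, str):
            raise ValueError("two pseudo-elements cannot be adjacent")
    parts = []
    for item in items:
        if isinstance(item, str):
            parts.append(item)  # strings appear unquoted in content form
        else:
            name, text = item
            parts.append(f"<{name}>{text}</{name}>")
    return "%" + "".join(parts) + "%"


if __name__ == "__main__":
    print(to_content([("FROM", "me"), ("TO", "you")]))
    print(to_content(["He was born on ", ("DATE", "15 May 1950"), " in New York"]))
    print(to_content([]))
```

Applied to the three list examples above, the sketch produces the three % % forms shown as their equivalents.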