Grammar Node attributes
The Grammar Node is the outer containing node for the grammar. It allows you to set default attribute values that are inherited by child Attribute Nodes, unless overridden by those child nodes. The Grammar Node may have any of the following attributes:
Attribute name | Description | Possible Values | Mandatory | Default |
byteOrder | The order of bytes. | L (for little endian) or B (for big endian). | No | B |
nibbleOrder | The order of individual nibbles (4 bits) within each byte. | L (for little endian) or B (for big endian). | No | B |
stringType | The character set used to encode strings in the file. | A string representing the name of the character set. | No | None |
complete | Whether this is a complete grammar. If set to T, an error is raised if any bytes are left over in the file once the grammar has completed (i.e. there are no further grammar nodes to process). | T for true or F for false. | No | T |
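The effect of these attributes can be illustrated outside the grammar. The Python below is a minimal sketch, not the reader's implementation: the function names read_uint, split_nibbles and check_complete are hypothetical, and the reading of nibbleOrder (B meaning the high nibble comes first) is an assumption.

```python
def read_uint(data, byte_order="B"):
    """Interpret bytes as an unsigned integer. "B" = big endian, "L" = little endian."""
    return int.from_bytes(data, "big" if byte_order == "B" else "little")


def split_nibbles(byte, nibble_order="B"):
    """Return the two 4-bit nibbles of one byte in reading order.

    Assumption: "B" yields the high nibble first, "L" the low nibble first.
    """
    high, low = byte >> 4, byte & 0x0F
    return (high, low) if nibble_order == "B" else (low, high)


def check_complete(data, position, complete="T"):
    """Raise an error if complete is "T" and bytes remain after the grammar ends."""
    if complete == "T" and position < len(data):
        raise ValueError(f"{len(data) - position} unread byte(s) left in the file")


raw = bytes([0x12, 0x34])
print(read_uint(raw, "B"))       # 4660, i.e. 0x1234
print(read_uint(raw, "L"))       # 13330, i.e. 0x3412
print(split_nibbles(0x12, "L"))  # (2, 1)
```

The same two bytes give different integers depending on byteOrder, which is why a default set once on the Grammar Node is convenient for files that use a single convention throughout.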
Repeat Node attributes
The child nodes of a Repeat Node may repeat 0, 1, or more times. In addition to common attributes (see 3.1) Repeat Nodes may have the following attributes:
Attribute name | Description | Possible Values | Mandatory | Default |
times | The number of times that the child nodes will repeat. | Any of: | No | No limit |
For example, the following grammar snippet:
Snippet 1:
<Repeat times="3">
<Attr name="text" bytes="2" type="String"/>
<Attr name="separator" discriminator="," type="String"/>
</Repeat>
<Attr name="nextAttribute" bytes="4" type="String"/>
could be used to read the following file:
aa,bb,cc,John
The grammar would read exactly three two-character strings, each followed by a comma, before moving to the next attribute. If the number of repeats were not specified, the grammar would have no way of knowing when to stop. It would read the bytes Jo as the next two-character string, and raise an error when it did not find the expected comma in the next byte.
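The behaviour described for Snippet 1 can be traced with a short Python sketch. It is an illustration only, assuming ASCII data; the function name read_repeat and its times parameter are hypothetical and merely mirror the Repeat Node attribute.

```python
def read_repeat(data, times=None):
    """Read 2-byte strings, each followed by a comma.

    times=None models a Repeat Node with no limit on the number of repeats.
    Returns the strings read and the offset of the first unread byte.
    """
    pos = 0
    texts = []
    while times is None or len(texts) < times:
        text = data[pos:pos + 2]
        if len(text) < 2:
            break  # ran out of data
        separator = data[pos + 2:pos + 3]
        if separator != b",":
            raise ValueError(f"expected ',' at offset {pos + 2}, found {separator!r}")
        texts.append(text.decode("ascii"))
        pos += 3
    return texts, pos


data = b"aa,bb,cc,John"
texts, pos = read_repeat(data, times=3)
print(texts, data[pos:pos + 4])  # ['aa', 'bb', 'cc'] b'John'

try:
    read_repeat(data)  # no limit: reads "Jo", then fails on "h"
except ValueError as error:
    print(error)
```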
This next snippet is the same as the first, except that the number of times to repeat is read from a variable defined in an ancestor node.
Snippet 2:
<Sequence name="commaSeparatedBlock" variables="repeatCount">
<Attr name="repeatCount" bytes="1" type="Integer"/>
<Repeat times="repeatCount">
<Attr name="text" bytes="2" type="String"/>
<Attr name="separator" discriminator="," type="String"/>
</Repeat>
</Sequence>
<Attr name="nextAttribute" bytes="4" type="String"/>
To use the attribute repeatCount as a variable in the times parameter of the Repeat block, it must first be declared on an ancestor node in a variables attribute. The scope of the variable is the hierarchy of nodes inside the node that it is declared on. The value in the variable will be set to null when the scope finishes. The variable may only be used within the defined scope.
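The scoping rule can be sketched in Python as follows. This is an illustration under assumptions, not the reader's implementation: the function name and the dictionary used to hold variables are hypothetical, and the count is taken to be a single unsigned byte.

```python
def read_comma_separated_block(data):
    """Model the Sequence Node commaSeparatedBlock from Snippet 2."""
    variables = {"repeatCount": None}   # declared via variables="repeatCount"

    variables["repeatCount"] = data[0]  # 1-byte Integer attribute
    pos = 1
    texts = []
    for _ in range(variables["repeatCount"]):
        texts.append(data[pos:pos + 2].decode("ascii"))
        if data[pos + 2:pos + 3] != b",":
            raise ValueError(f"expected ',' at offset {pos + 2}")
        pos += 3

    variables["repeatCount"] = None     # scope finished: value reset to null
    return texts, pos, variables


data = b"\x03aa,bb,cc,John"
texts, pos, variables = read_comma_separated_block(data)
print(texts, data[pos:pos + 4], variables)
```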
The next snippet is the same as the previous, except that the number of times to repeat is calculated by an Expression using a script variable declared on an ancestor node.
Snippet 3:
<Sequence name="commaSeparatedBlock" scriptVariables="repeatCount">
<Attr name="repeatCount" bytes="1" type="Integer"/>
<Repeat times="{$commaSeparatedBlock_repeatCount + 2}">
<Attr name="text" bytes="2" type="String"/>
<Attr name="separator" discriminator="," type="String"/>
</Repeat>
</Sequence>
<Attr name="nextAttribute" bytes="4" type="String"/>
To use the attribute repeatCount as a script variable in the times parameter of the Repeat block, it must first be declared on an ancestor node in a scriptVariables attribute. Note that in Snippet 2 the variable can be used simply as repeatCount, but in the snippet above the script variable must be used with the format $scope_variableName, i.e. $commaSeparatedBlock_repeatCount.
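The naming convention can be shown with a few lines of Python. This sketch only builds the name and mimics the arithmetic of the Expression in Snippet 3; the helper script_variable_name is hypothetical, and how the reader actually evaluates Expressions is not covered here.

```python
def script_variable_name(scope, variable):
    """Build the $scope_variableName form used inside an Expression."""
    return f"${scope}_{variable}"


name = script_variable_name("commaSeparatedBlock", "repeatCount")
script_variables = {name: 1}        # example value read from the repeatCount attribute
times = script_variables[name] + 2  # mirrors {$commaSeparatedBlock_repeatCount + 2}
print(name, times)
```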
If there is no way to determine the number of repeats expected, the grammar must include a way of identifying the end of a repeat loop. Typically this is with some sequence of terminating characters. These are specified in the grammar by using the discriminator attribute, as shown in the snippet below.
Snippet 4:
<Repeat>
<Attr name="text" bytes="2" type="String"/>
<Attr name="separator" discriminator="," type="String"/>
</Repeat>
<Attr name="terminator" discriminator="||" type="String"/>
In the above example the repeat loop continues until the string "||" is found, at which point it stops.
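A terminated repeat loop of this kind can be sketched in Python as below. The function name read_until_terminator is hypothetical and the data is assumed to be ASCII; the sketch shows the idea only, not the reader's matching logic.

```python
def read_until_terminator(data, terminator=b"||"):
    """Read 2-byte strings followed by commas until the terminator is found.

    Returns the strings read and the offset just past the terminator.
    """
    pos = 0
    texts = []
    while not data.startswith(terminator, pos):
        if pos + 3 > len(data):
            raise ValueError("terminator not found before end of data")
        if data[pos + 2:pos + 3] != b",":
            raise ValueError(f"expected ',' at offset {pos + 2}")
        texts.append(data[pos:pos + 2].decode("ascii"))
        pos += 3
    return texts, pos + len(terminator)


print(read_until_terminator(b"aa,bb,cc,||"))  # (['aa', 'bb', 'cc'], 11)
```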
The bytes that indicate the end of the repeat loop can also be specified as a hex string, using the hexDiscriminator attribute. The snippet below shows an example of this.
Snippet 5:
<Repeat>
<Attr name="text" bytes="2" type="String"/>
<Attr name="separator" discriminator="," type="String"/>
</Repeat>
<Attr name="terminator" discriminator="A0FF" type="Integer" hexDiscriminator="T"/>
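The difference between a plain and a hex discriminator is easiest to see in terms of the bytes being matched. The Python sketch below assumes that a hex discriminator string is decoded pairwise into bytes; the helper name discriminator_bytes is hypothetical.

```python
def discriminator_bytes(value, hex_discriminator=False):
    """Return the byte sequence that a discriminator value stands for."""
    if hex_discriminator:
        return bytes.fromhex(value)  # "A0FF" -> two bytes, 0xA0 and 0xFF
    return value.encode("ascii")     # "||"   -> the two characters themselves


print(discriminator_bytes("A0FF", hex_discriminator=True))  # b'\xa0\xff'
print(discriminator_bytes("||"))                            # b'||'
```

Written as hex, the terminator in Snippet 5 is two bytes long rather than the four characters A, 0, F, F.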
Sequence Node
Sequence Nodes simply define a collection of child nodes. They can be used to define a set of child nodes as a template, or to mark the start and end of a sequence of Attribute Nodes whose overall length is specified by another Attribute Node (for example, a Length Node - see 3.4.4) somewhere within the same node hierarchy.
In addition to common attributes (see 3.1) Sequence Nodes may have the following attributes:
Attribute name | Description | Possible Values | Mandatory | Default |
length | The overall expected number of bytes for all Attribute Nodes that are descendants of this node. This is helpful if a group of fields in the file includes optional fields – setting the length helps the grammar to determine where this group of fields has ended, and the next field or group of fields begins. | Any of: | No | None |
Consider a file which includes the following sequence of bytes:
6aabbcc8
The following grammar could be used to read this portion of the file:
<Attr name="bodyLength" bytes="1" type="Integer"/>
<Sequence name="body">
<Attr name="attribute1" bytes="2" type="String"/>
<Attr name="attribute2" bytes="2" type="String"/>
<Attr name="attribute3" bytes="2" type="String"/>
<Attr name="attribute4" bytes="1" type="Integer" optional="T"/>
</Sequence>
<Attr name="fileLength" bytes="1" type="Integer"/>
However, this grammar is ambiguous. It cannot determine whether the value 8 at the end of the data is the optional attribute4, or fileLength. To resolve this, the grammar below sets the length of the Sequence Node body, using the value in the Attribute Node bodyLength:
<Sequence name="dataBlock" variables="bodyLength">
...
<Attr name="bodyLength" bytes="1" type="Integer"/>
<Sequence name="body" length="bodyLength">
<Attr name="attribute1" bytes="2" type="String"/>
<Attr name="attribute2" bytes="2" type="String"/>
<Attr name="attribute3" bytes="2" type="String"/>
<Attr name="attribute4" bytes="1" type="Integer" optional="T"/>
</Sequence>
<Attr name="fileLength" bytes="1" type="Integer"/>
...
</Sequence>
In this grammar bodyLength is declared as a variable on an ancestor node (in this case, another Sequence Node, dataBlock). In the example data above the Attribute Node bodyLength has the value 6. Because the length of the Sequence Node body is now specified, the grammar can tell, when it reaches the value 8, that this must be fileLength, because it has already used 6 bytes for body.
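How a known length removes the ambiguity can be sketched in Python. This is an illustration under assumptions: the function name read_data_block is hypothetical, and the values shown as 6 and 8 in the example data are taken to be single bytes holding those integers.

```python
def read_data_block(data):
    """Read bodyLength, a length-bounded body, then fileLength."""
    body_length = data[0]
    body = data[1:1 + body_length]  # body may not extend past this slice

    fields = {
        "attribute1": body[0:2].decode("ascii"),
        "attribute2": body[2:4].decode("ascii"),
        "attribute3": body[4:6].decode("ascii"),
    }
    if len(body) > 6:               # optional attribute4 present only if bytes remain
        fields["attribute4"] = body[6]
    fields["fileLength"] = data[1 + body_length]
    return fields


print(read_data_block(b"\x06aabbcc\x08"))      # no attribute4, fileLength is 8
print(read_data_block(b"\x07aabbcc\x05\x08"))  # attribute4 is 5, fileLength is 8
```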
The snippet above could also be written using a Length Node (see 3.4.4) as:
<Sequence name="body">
<Length name="bodyLength" bytes="1" type="Integer"/>
<Attr name="attribute1" bytes="2" type="String"/>
<Attr name="attribute2" bytes="2" type="String"/>
<Attr name="attribute3" bytes="2" type="String"/>
<Attr name="attribute4" bytes="1" type="Integer" optional="T"/>
</Sequence>
<Attr name="fileLength" bytes="1" type="Integer"/>
The Length Node sets the length of its parent. This length is exclusive of the Length Node itself.
If the first attribute of the data indicated the length of the Sequence Node body inclusive of the Length Node itself, i.e. the data was:
7aabbcc8
then the snippet could be written as:
<Sequence name="body" variables="bodyLength" length="bodyLength">
<Attr name="bodyLength" bytes="1" type="Integer"/>
<Attr name="attribute1" bytes="2" type="String"/>
<Attr name="attribute2" bytes="2" type="String"/>
<Attr name="attribute3" bytes="2" type="String"/>
<Attr name="attribute4" bytes="1" type="Integer" optional="T"/>
</Sequence>
<Attr name="fileLength" bytes="1" type="Integer"/>
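The two length conventions differ only in where counting starts. The Python sketch below makes that explicit; the function name body_end_offset and its parameters are hypothetical, and offsets are measured from the start of the Sequence Node body.

```python
def body_end_offset(length_value, length_field_size=1, inclusive=False):
    """Return the offset, from the start of the Sequence, where its content ends.

    inclusive=False models a Length Node: the count starts after the length field.
    inclusive=True models length="bodyLength" on the Sequence: the count
    includes the length field itself.
    """
    if inclusive:
        return length_value
    return length_field_size + length_value


print(body_end_offset(6, inclusive=False))  # 7, for the data 6aabbcc8
print(body_end_offset(7, inclusive=True))   # 7, for the data 7aabbcc8
```

In both cases the body ends at offset 7, which is where fileLength (the value 8) is found.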
Choice Node
A Choice Node specifies that at most one of its child nodes (together with all of its descendant nodes) may be present in the file. Each child node is therefore optional, and setting the optional attribute to F on a child node has no effect.
For the binary file reader to determine which of the child nodes under a Choice Node are in a file, the discriminator attribute must be set on:
- the child node itself, or
- the first child Attribute or Record Node of the child node.
See 3.4.2 Attribute Node for details of discriminators. A Choice Node has no attributes other than the common attributes (see 3.1).
<Choice>
<Sequence name="option1">
<Attr name="type1" discriminator="1" type="String"/>
<Attr name="numberValue" bytes="2" type="Integer"/>
</Sequence>
<Sequence name="option2">
<Attr name="type2" discriminator="2" type="String"/>
<Attr name="stringValue" bytes="8" type="String"/>
</Sequence>
</Choice>
<Attr name="final" bytes="2" type="Integer"/>
In the snippet above, if the first byte encountered is 1 then the next section of the file is recognised as the Sequence Node option1 and the next 2 bytes are interpreted as a number, before the reader moves on to the Attribute Node final.
If the first byte encountered is 2 then the next section of the file is recognised as the Sequence Node option2 and the next 8 bytes are interpreted as a string, before the reader moves on to the Attribute Node final.
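Selection by discriminator can be sketched in Python as below. The function name read_choice is hypothetical; the sketch assumes the discriminators are the ASCII characters 1 and 2 and that numberValue is a big endian integer.

```python
def read_choice(data, pos=0):
    """Pick the child of the Choice Node whose discriminator matches.

    Returns (option name, value) and the offset of the first unread byte.
    """
    tag = data[pos:pos + 1]
    if tag == b"1":
        number = int.from_bytes(data[pos + 1:pos + 3], "big")
        return ("option1", number), pos + 3
    if tag == b"2":
        text = data[pos + 1:pos + 9].decode("ascii")
        return ("option2", text), pos + 9
    raise ValueError(f"no child of the Choice Node matches {tag!r}")


data = b"1\x00\x2a\x00\x07"
choice, pos = read_choice(data)
final = int.from_bytes(data[pos:pos + 2], "big")
print(choice, final)  # ('option1', 42) 7
```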
If the first byte encountered is a 3 then the reader will fail with an error since the snippet only handles 1 or 2 in the first byte. If a 3 is a valid possibility there are three possible solutions:
Solution 1
Add an additional Attribute Node to the Choice Node with a discriminator of 3, as below:
<Choice>
<Sequence name="option1">
<Attr name="type1" discriminator="1" type="String"/>
<Attr name="numberValue" bytes="2" type="Integer"/>
</Sequence>
<Sequence name="option2">
<Attr name="type2" discriminator="2" type="String"/>
<Attr name="stringValue" bytes="8" type="String"/>
</Sequence>
<Attr name="type3" discriminator="3" type="String"/>
</Choice>
<Attr name="final" bytes="2" type="Integer"/>
Solution 2
Add an Attribute Node to the Choice Node to pick up any value that is not 1 or 2, as below:
<Choice>
<Sequence name="option1">
<Attr name="type1" discriminator="1" type="String"/>
<Attr name="numberValue" bytes="2" type="Integer"/>
</Sequence>
<Sequence name="option2">
<Attr name="type2" discriminator="2" type="String"/>
<Attr name="stringValue" bytes="8" type="String"/>
</Sequence>
<Attr name="type3" bytes="1" type="String"/>
</Choice>
<Attr name="final" bytes="2" type="Integer"/>
Note that in the snippet above, if no discriminator is set on the Attribute Node type3 then the number of bytes or bits must be set. You cannot have more than one generic option like this in a Choice Node, since this will make the grammar ambiguous.
Solution 3
Make the whole Choice Node optional, as below:
<Choice optional="T">
<Sequence name="option1">
<Attr name="type1" discriminator="1" type="String"/>
<Attr name="numberValue" bytes="2" type="Integer"/>
</Sequence>
<Sequence name="option2">
<Attr name="type2" discriminator="2" type="String"/>
<Attr name="stringValue" bytes="8" type="String"/>
</Sequence>
</Choice>
<Attr name="final" bytes="2" type="Integer"/>
In the snippet above, if the first byte is not 1 or 2, the Choice Node will be skipped and that byte will be read as the start of the Attribute Node final.
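Skipping an optional Choice Node amounts to leaving the read position unchanged when nothing matches. The Python sketch below illustrates this; the function name read_optional_choice is hypothetical, and the discriminators are assumed to be the ASCII characters 1 and 2 with big endian integers.

```python
def read_optional_choice(data, pos=0):
    """Read an optional Choice Node followed by the 2-byte attribute final."""
    tag = data[pos:pos + 1]
    if tag == b"1":
        choice = ("option1", int.from_bytes(data[pos + 1:pos + 3], "big"))
        pos += 3
    elif tag == b"2":
        choice = ("option2", data[pos + 1:pos + 9].decode("ascii"))
        pos += 9
    else:
        choice = None  # nothing matched: skip the Choice Node, consume no bytes

    final = int.from_bytes(data[pos:pos + 2], "big")
    return choice, final


print(read_optional_choice(b"\x00\x07"))           # (None, 7)
print(read_optional_choice(b"1\x00\x2a\x00\x07"))  # (('option1', 42), 7)
```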