The following sections describe each of the nodes, and their attributes.
...
Common attributes
Some attributes are common to all nodes, except for Grammar Nodes. These are:
Attribute name | Description | Possible Values | Mandatory | Default |
optional | Whether the structure or value represented by the node may be absent from the file. | T for true or F for false | No | F |
template | Whether this node and all its children represent a template. Templates must be defined before any nodes that use them. | T for true or F for false | No | F |
definedBy | Indicates that this node uses a pre-defined template. If set, all of the attributes of the template node will be used as defaults for the attributes of this node, unless specific attribute values are set on this node to override the template values. | The name of the node defining the template. The referenced node must have the template attribute set to T. | No | None |
name | The name that will be reported in error messages, and that must be used when referring to this node from another node (e.g. to a template node). Although the name attribute is not mandatory on most nodes, it is best to always specify a name because this will improve error reporting if there is a problem processing a file. | Any alphanumeric string that does not start with a number or contain any spaces. | Yes for data nodes (Attribute, Tag, Length, Value, Bytes and Record). Otherwise no. | None |
scriptVariables | The names of Attribute Nodes that will be used as variables in Expressions elsewhere in the hierarchy of nodes inside this node. These variable names must match the names of Attribute Nodes contained somewhere in the hierarchy inside this node. | A comma-separated list of Attribute Node names, e.g. attr1,attr2. | No | None |
variables | The names of Attribute Nodes that will be used as variables in the bits, bytes, length or times attributes of other nodes. These variable names must match the names of Attribute Nodes contained somewhere in the hierarchy inside this node. | A comma-separated list of Attribute Node names, e.g. attr1,attr2. | No | None |
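The way template and definedBy interact can be illustrated outside the grammar language. The Python sketch below is only a model of the rule described in the table (template values act as defaults, values set on the node override them). The function name resolve_attributes and the dictionary representation are hypothetical, and the choice not to inherit the template's own name and template flag is an assumption.

```python
def resolve_attributes(node, templates):
    """Return the effective attributes of a node, applying template defaults."""
    template_name = node.get("definedBy")
    if template_name is None:
        return dict(node)
    template = templates[template_name]
    if template.get("template") != "T":
        raise ValueError(f"{template_name} is not marked template='T'")
    # Template values are defaults (assumption: name and template are not inherited).
    effective = {k: v for k, v in template.items() if k not in ("name", "template")}
    # Values set on the node itself override the template values.
    effective.update(node)
    return effective


templates = {
    "attrTemplate": {"name": "attrTemplate", "template": "T", "bytes": "2", "type": "Integer"}
}
node = {"name": "attr1", "definedBy": "attrTemplate", "type": "String"}
effective = resolve_attributes(node, templates)
```

Here the node keeps its own type (String) and picks up bytes from the template.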
...
The Grammar Node is the outer containing node for the grammar. It allows you to set default attribute values that are inherited by child Attribute Nodes, unless overridden by those child nodes. The Grammar Node may have any of the following attributes:
Attribute name | Description | Possible Values | Mandatory | Default |
byteOrder | The order of bytes. | L (for little endian) or B (for big endian). | No | B |
nibbleOrder | The order of individual nibbles (4 bits) within each byte. | L (for little endian) or B (for big endian). | No | B |
stringType | The character set used to encode strings in the file. | A string representing the name of the character set. | No | None |
complete | Whether this is a complete grammar. If set to T, an error is raised if any bytes are left over in the file once the grammar has completed (i.e. there are no further grammar nodes to process). | T for true or F for false. | No | T |
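As an illustration of what byteOrder and nibbleOrder mean for the raw data, the Python sketch below converts the same bytes under both settings. It is not the reader's implementation. The function names are hypothetical, and treating nibble order L as a swap of the two 4-bit halves of each byte is an assumption based on the description above.

```python
def read_integer(raw: bytes, byte_order: str = "B") -> int:
    """Interpret raw bytes as an unsigned integer, big endian (B) or little endian (L)."""
    return int.from_bytes(raw, "big" if byte_order == "B" else "little")


def swap_nibbles(raw: bytes) -> bytes:
    """Swap the high and low 4-bit halves of every byte."""
    return bytes(((b & 0x0F) << 4) | (b >> 4) for b in raw)


big = read_integer(b"\x01\x02", "B")      # 0x0102
little = read_integer(b"\x01\x02", "L")   # 0x0201
swapped = swap_nibbles(b"\x21\x43")       # each byte has its two nibbles exchanged
```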
...
The child nodes of a Repeat Node may repeat 0, 1, or more times. In addition to common attributes (see 3.1) Repeat Nodes may have the following attributes:
Attribute name | Description | Possible Values | Mandatory | Default |
times | The number of times that the child nodes will repeat | Any of:
| No | No limit |
...
. |
...
...
Sequence Nodes simply define a collection of child nodes. These can be used to define a set of child nodes as a template; or to define the start and end of a sequence of Attribute Nodes that have an overall length specified by another Attribute Node (for example, a Length Node - see 3.4.4) somewhere within the same node hierarchy.
In addition to common attributes (see 3.1) Sequence Nodes may have the following attributes:
Attribute name | Description | Possible Values | Mandatory | Default |
length | The overall expected number of bytes for all Attribute Nodes that are descendants of this node. This is helpful if a group of fields in the file includes optional fields – setting the length helps the grammar to determine where this group of fields has ended, and the next field or group of fields begins. | Any of:
| No | None |
Consider a file which includes the following sequence of bytes:
6aabbcc8
The following grammar could be used to read this portion of the file:
<Attr name="bodyLength" bytes="1" type="Integer"/>
<Sequence name="body">
<Attr name="attribute1" bytes="2" type="String"/>
<Attr name="attribute2" bytes="2" type="String"/>
<Attr name="attribute3" bytes="2" type="String"/>
<Attr name="attribute4" bytes="1" type="Integer" optional="T"/>
</Sequence>
<Attr name="fileLength" bytes="1" type="Integer"/>
However, this grammar is ambiguous. It cannot determine whether the value 8 at the end of the data is the optional attribute4, or fileLength. To resolve this, the grammar below sets the length of the Sequence Node body, using the value in the Attribute Node bodyLength:
<Sequence name="dataBlock" variables="bodyLength">
...
<Attr name="bodyLength" bytes="1" type="Integer"/>
<Sequence name="body" length="bodyLength">
<Attr name="attribute1" bytes="2" type="String"/>
<Attr name="attribute2" bytes="2" type="String"/>
<Attr name="attribute3" bytes="2" type="String"/>
<Attr name="attribute4" bytes="1" type="Integer" optional="T"/>
</Sequence>
<Attr name="fileLength" bytes="1" type="Integer"/>
...
</Sequence>
In this grammar, bodyLength is declared as a variable on an ancestor node (in this case, another Sequence Node, dataBlock). In the example data above, the Attribute Node bodyLength has the value 6. Because we have specified the length for the Sequence Node body, the grammar can now tell, when it reaches the value 8, that it must be fileLength because it has already used 6 bytes for body.
The snippet above could also be written using a Length Node (see 3.4.4) as:
<Sequence name="body">
<Length name="bodyLength" bytes="1" type="Integer"/>
<Attr name="attribute1" bytes="2" type="String"/>
<Attr name="attribute2" bytes="2" type="String"/>
<Attr name="attribute3" bytes="2" type="String"/>
<Attr name="attribute4" bytes="1" type="Integer" optional="T"/>
</Sequence>
<Attr name="fileLength" bytes="1" type="Integer"/>
The Length Node sets the length of its parent. This length is exclusive of the Length Node itself.
If the first attribute of the data indicated the length of the Sequence Node body inclusive of the Length Node itself, i.e. the data was:
7aabbcc8
then the snippet could be written as:
<Sequence name="body" variables="bodyLength" length="bodyLength">
<Attr name="bodyLength" bytes="1" type="Integer"/>
<Attr name="attribute1" bytes="2" type="String"/>
<Attr name="attribute2" bytes="2" type="String"/>
<Attr name="attribute3" bytes="2" type="String"/>
<Attr name="attribute4" bytes="1" type="Integer" optional="T"/>
</Sequence>
<Attr name="fileLength" bytes="1" type="Integer"/>
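The length-based disambiguation described in this section can be sketched in Python. This is an illustration of the idea only, not the reader's algorithm. The function parse_block is hypothetical, and it assumes that the 6 (or 7) and 8 in the example data are single-byte integers and that the string attributes are ASCII.

```python
def parse_block(data: bytes, inclusive: bool = False) -> dict:
    """Parse bodyLength, a length-bounded body, then fileLength."""
    body_length = data[0]
    # An exclusive length counts only the bytes after the length field;
    # an inclusive length also counts the length field itself.
    body_end = body_length if inclusive else 1 + body_length
    body = data[1:body_end]
    return {
        "bodyLength": body_length,
        "attribute1": body[0:2].decode("ascii"),
        "attribute2": body[2:4].decode("ascii"),
        "attribute3": body[4:6].decode("ascii"),
        # attribute4 is present only if the body still has a byte left.
        "attribute4": body[6] if len(body) > 6 else None,
        # The first byte after the body must be fileLength.
        "fileLength": data[body_end],
    }


without_optional = parse_block(b"\x06aabbcc\x08")
with_optional = parse_block(b"\x07aabbcc\x05\x08")
inclusive_length = parse_block(b"\x07aabbcc\x08", inclusive=True)
```

Because the body is cut to its declared length before its fields are read, the trailing 8 is never mistaken for attribute4.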
...
A Choice Node specifies that only one or zero of its child nodes (with all its descendant nodes) may be present in the file. Each child node is therefore optional, and setting the optional attribute to F will have no effect.
For the binary file reader to determine which of the child nodes under a Choice Node are in a file, the discriminator attribute must be set on:
- the child node itself, or
- the first child Attribute or Record Node of the child node.
See 3.4.2 Attribute Node for details of discriminators. A Choice Node has no attributes other than the common attributes (see 3.1).
<Choice>
<Sequence name="option1">
<Attr name="type1" discriminator="1" type="String"/>
<Attr name="numberValue" bytes="2" type="Integer"/>
</Sequence>
<Sequence name="option2">
<Attr name="type2" discriminator="2" type="String"/>
<Attr name="stringValue" bytes="8" type="String"/>
</Sequence>
</Choice>
<Attr name="final" bytes="2" type="Integer"/>
In the snippet above, if the first byte encountered is 1 then the next section of the file will be recognised as a Sequence Node option1, the next 2 bytes will be interpreted as a number, before moving on to the Attribute Node final.
If the first byte encountered is 2 then the next section of the file will be recognised as a Sequence Node option2, the next 8 bytes will be interpreted as a string, before moving on to the Attribute Node final.
If the first byte encountered is a 3 then the reader will fail with an error since the snippet only handles 1 or 2 in the first byte. If a 3 is a valid possibility there are three possible solutions:
Solution 1
Add an additional Attribute Node to the Choice Node with a discriminator of 3, as below:
<Choice>
<Sequence name="option1">
<Attr name="type1" discriminator="1" type="String"/>
<Attr name="numberValue" bytes="2" type="Integer"/>
</Sequence>
<Sequence name="option2">
<Attr name="type2" discriminator="2" type="String"/>
<Attr name="stringValue" bytes="8" type="String"/>
</Sequence>
<Attr name="type3" discriminator="3" type="String"/>
</Choice>
<Attr name="final" bytes="2" type="Integer"/>
Solution 2
Add an Attribute Node to the Choice Node to pick up any value that is not 1 or 2, as below:
<Choice>
<Sequence name="option1">
<Attr name="type1" discriminator="1" type="String"/>
<Attr name="numberValue" bytes="2" type="Integer"/>
</Sequence>
<Sequence name="option2">
<Attr name="type2" discriminator="2" type="String"/>
<Attr name="stringValue" bytes="8" type="String"/>
</Sequence>
<Attr name="type3" bytes="1" type="String"/>
</Choice>
<Attr name="final" bytes="2" type="Integer"/>
Note that in the snippet above, if no discriminator is set on the Attribute Node type3 then the number of bytes or bits must be set. You cannot have more than one generic option like this in a Choice Node, since this will make the grammar ambiguous.
Solution 3
Make the whole Choice Node optional, as below:
<Choice optional="T">
<Sequence name="option1">
<Attr name="type1" discriminator="1" type="String"/>
<Attr name="numberValue" bytes="2" type="Integer"/>
</Sequence>
<Sequence name="option2">
<Attr name="type2" discriminator="2" type="String"/>
<Attr name="stringValue" bytes="8" type="String"/>
</Sequence>
</Choice>
<Attr name="final" bytes="2" type="Integer"/>
In the snippet above, if the first byte is not 1 or 2, the Choice Node will be skipped and the first byte will be recognised as the Attribute Node final.
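The behaviour of the basic Choice Node and of Solution 3 can be sketched in Python as a dispatch on the first byte. This is a simplified model, not the reader's implementation. The function parse_choice is hypothetical, and it assumes the discriminators are the single byte values 1 and 2 and that integers are big endian.

```python
def parse_choice(data: bytes, optional: bool = False) -> dict:
    """Pick option1 or option2 from the first byte, then read the 2-byte field final."""
    discriminator = data[0]
    if discriminator == 1:
        result = {"option": "option1", "numberValue": int.from_bytes(data[1:3], "big")}
        rest = data[3:]
    elif discriminator == 2:
        result = {"option": "option2", "stringValue": data[1:9].decode("ascii")}
        rest = data[9:]
    elif optional:
        # Optional Choice Node: skip it and leave the byte for the next node.
        result = {"option": None}
        rest = data
    else:
        raise ValueError(f"no choice matches discriminator {discriminator}")
    result["final"] = int.from_bytes(rest[0:2], "big")
    return result


matched = parse_choice(b"\x01\x00\x07\x00\x09")
skipped = parse_choice(b"\x03\x00", optional=True)
```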
...
A Script Node allows the user to define an Expression to be evaluated at that point in the grammar. The Expression must be entered as text between the opening and closing Script tags, as shown below:
<Script>$complete = if($count == 10, 1, 0)</Script>
A CDATA tag must be used if the Expression includes any characters which could be interpreted as XML (e.g. a < sign), as shown below:
<Script>
<![CDATA[
$complete = if($count < 10, 1, 0)
]]>
</Script>
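The need for CDATA is a general property of XML rather than something specific to this grammar. The Python sketch below uses the standard library XML parser (not the file reader itself) to show that the expression is read back intact with CDATA, and that the same text without CDATA is rejected as malformed XML.

```python
import xml.etree.ElementTree as ET

with_cdata = "<Script><![CDATA[$complete = if($count < 10, 1, 0)]]></Script>"
without_cdata = "<Script>$complete = if($count < 10, 1, 0)</Script>"

# With CDATA the parser returns the expression text unchanged.
expression = ET.fromstring(with_cdata).text

# Without CDATA the bare < is treated as the start of a tag and parsing fails.
try:
    ET.fromstring(without_cdata)
    parsed_without_cdata = True
except ET.ParseError:
    parsed_without_cdata = False
```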
Dollar variables in an Expression may refer to preceding Attribute Nodes, which by then hold values read from the file, provided that these Attribute Nodes have been declared as script variables in the scriptVariables attribute of an ancestor node (one that is an ancestor of both the Attribute Node and the current Script Node). Dollar variables that refer to Attribute Nodes in this way must take the form $scope_attributeName, where scope is the name of the node on which the script variable was declared.
A Script Node has no attributes other than the common attributes (see 3.1).
A Script Node should not contain any child nodes. Any child nodes of a Script Node will be ignored.
...
A Validate Node is the same as a Script Node, except that a Validate Node will only be run if the Validate File Format box is ticked in the File Format Description tab of the File Collector configuration form.
...
Record Nodes represent records in the file. Record Nodes can be specified as target nodes, which will generate output records of the File Collector (see 4).
Record Nodes are made up of a collection of nodes, of various types. The Attribute Nodes under a Record Node generate the File Columns in the output records of the File Collector.
In addition to common attributes (see 3.1) Record Nodes may have the following attributes:
Attribute name | Description | Possible Values | Mandatory | Default |
prefix | If the Record Node has a child Name Node (see 3.4.5), setting this attribute will apply the value of the attribute, followed by an underscore, as a prefix to the name read from the file into the Name Node. | Any string | No | None |
discriminator | If a record has a child Tag Node (see 3.4.3) then this string will be used as the discriminator for the Tag Node, unless a discriminator is also set on the Tag Node itself, which will override it. | Any:
| No | None |
hexDiscriminator | Whether the discriminator will be given as a hex string. | T for true or F for false | No | F |
length | The overall length, in bytes, expected for all Attribute Nodes within the hierarchy of the Record Node. | Any of:
| No | None |
...
Attribute Nodes (and their five subtypes: Tag Nodes, Length Nodes, Name Nodes, Value Nodes and Bytes Nodes) are the only grammar nodes that make the file reader read bytes from the file.
An Attribute Node may have child nodes. For example, ASN.1 files have fields which consist of a tag value, followed by a length value, followed by the actual value of the field. This can be expressed in a grammar by placing child Tag, Length and Value (or Bytes) Nodes inside the Attribute Node.
If an Attribute Node has a child Value or Bytes Node then it will not itself read any data from the file, but will instead take its value from the Value or Bytes Node. In this case the bytes, bits, type, stringType, byteOrder and nibbleOrder attributes on the Attribute Node will merely act as defaults for the child Value or Bytes Node (although these defaults can be overridden by the child node if any of these attributes are set on the child node itself).
In addition to common attributes (see 3.1) Attribute Nodes may have the following attributes:
...
Attribute name | Description | Possible Values | Mandatory | Default |
prefix | If the Attribute Node has a child Name Node (see 3.4.5), setting this attribute will apply the value of the attribute, followed by an underscore, as a prefix to the name read from the file into the Name Node. This can be useful if the value read can start with a number, or other value which would be invalid when used in a PhixFlow Attribute Expression. | Any string | No | None |
discriminator | The discriminator specifies the full value of this field (i.e. the field represented by this Attribute Node) in the file. The discriminator will be converted, according to the type attribute, into the bytes that will be read in the file when this field is found. The discriminator can be used to determine whether optional fields are included in the file. | Any string, or a hex string (e.g. A0FF08) if the hexDiscriminator attribute is set to T. Note that the file reader will expect the minimum number of bytes required to hold the value of the discriminator. E.g. if the discriminator is the number 2, the file reader will expect a single byte. If the field uses more than the minimum number of bytes, specify the full value of the discriminator as a hex string. E.g. if the discriminator is the number 2, but is represented in the file with 4 bytes, padded with leading 0s, the discriminator would be "00000002". | No | None |
hexDiscriminator | Whether the discriminator will be given as a hex string. If the discriminator cannot be specified as an ASCII string then it can be specified as a hex string, where each pair of characters in the string represents one byte. | T for true or F for false | No | None |
bytes | The number of bytes to be read from the file for this field. | Any of: ... | No, in certain cases (see notes in Possible Values column). Otherwise yes. | None |
bits | The number of bits to be read from the file. This value will take precedence over the number of bytes if both are specified. | Any of: ... | No, in certain cases (see notes in Possible Values column). Otherwise yes. | None |
type | The type of the field to be read. This attribute dictates how the bytes read from the file will be converted. | Any of: BCD (the field is a binary coded decimal, commonly used for telephone numbers where each byte contains two numerals, each numeral represented by 4 bits; the result is returned as a string); Integer (a standard integer); String (a string); Float (a floating point number; this type assumes that the number has been encoded as a string which needs to be converted to a float); DateTime (a date and time; this type assumes that the date and time have been encoded as a string which needs to be converted to a date and time); Date (a date; this type assumes that the date has been encoded as a string, which may include a time component, but this will be set to zero, i.e. midnight, in the output); BERTag (the field is encoded as an ASN.1 tag in BER format; you do not need to specify a number of bytes for a BERTag field since the length is encoded as part of the data); BERLength (this node is an ASN.1 length value encoded in BER format; you do not need to specify a number of bytes for a BERLength attribute since the length is encoded as part of the data). The type you specify in the grammar is not case sensitive, e.g. it can be "STRING", "String" or "string". | No if this node has a child Value or Bytes Node, with the attribute type set. Otherwise yes. | None |
dateFormat | The format of a date in the file. This should only be set if the type has been set as Date or DateTime. If no date format is specified then the reader will try a variety of possible date formats in turn. It is therefore more efficient to specify a date format, if possible. | A valid date format string. Valid date formats are documented in the PhixFlow online help. | No | None |
stringType | The character set to use when converting the bytes read from the file into a string, or converting a discriminator value into the bytes that will be read in the file. | Any string which specifies a valid Java character set. E.g. UTF8 | No | Taken from Grammar Node |
byteOrder | The order that bytes will be read from the file when converting into a value. | L (for little endian) or B (for big endian). | No | Taken from Grammar Node |
nibbleOrder | The order that nibbles (4 bit blocks) will be read from the file when converting into a value. | L (for little endian) or B (for big endian). | No | Taken from Grammar Node |
tagType | If this grammar is for an ASN.1 file and this node is for a tag value, the type of tag. The ASN.1 definition document that describes the file using ASN.1 notation should specify the type of each tag. If not, see the notes at the end of the Tag Node description (3.4.3). This setting dictates how the tag value is translated. | Any of: Universal; Application; Context; Private. The tagType you specify in the grammar is not case sensitive, e.g. it can be "CONTEXT", "Context" or "context". | No | Context |
berConstruct | If this grammar is for an ASN.1 file and this node is for a tag value, this flag indicates whether the tag is for a constructed value (i.e. a record with sub values) or a simple value. The ASN.1 definition document that describes the file using ASN.1 notation should specify whether this tag is for a construct variable. If not, see the notes at the end of the Tag Node description (3.4.3). | T for true or F for false | No | F |
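The note on discriminator length can be made concrete with a short Python sketch. It models only what the discriminator row describes: a numeric discriminator occupies the minimum number of bytes, and a hex string gives the bytes exactly. The function discriminator_bytes is hypothetical, and it assumes an Integer field stored big endian.

```python
def discriminator_bytes(discriminator: str, hex_discriminator: bool = False) -> bytes:
    """Return the bytes a discriminator is expected to match in the file."""
    if hex_discriminator:
        # Each pair of hex characters is one byte, so leading zeros are kept.
        return bytes.fromhex(discriminator)
    value = int(discriminator)
    # Minimum number of bytes that can hold the value (at least one).
    length = max(1, (value.bit_length() + 7) // 8)
    return value.to_bytes(length, "big")


single_byte = discriminator_bytes("2")
padded = discriminator_bytes("00000002", hex_discriminator=True)
two_bytes = discriminator_bytes("300")
```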
...
The Tag Node is a sub type of the Attribute Node. It has the same attributes as the Attribute Node, and will read bytes from the file to calculate a tag value.
...
The only difference between a Tag Node and an Attribute Node is that a Tag Node will use the discriminator from its most immediate Sequence, Attribute or Record Node ancestor (any intervening Control Nodes being ignored), if no discriminator has been set on the Tag Node itself. If the most immediate Sequence/Attribute/Record Node ancestor does not have a discriminator specified, then the Tag Node will keep looking up the ancestor tree until it finds a Sequence/Attribute/Record Node with the discriminator set.
This is useful when creating templates such as the following:
<Attribute name="attrTemplate" template="T">
<Tag name="tag" type="berTag"/>
<Length name="length" type="berLength"/>
<Value name="value"/>
</Attribute>
The discriminator is not set on the Tag Node in the template since the template may be used in several places in the grammar, with different discriminator values. For example:
<Attribute definedBy="attrTemplate" name="attr1" type="String" discriminator="27"/>
<Attribute definedBy="attrTemplate" name="attr2" type="Integer" discriminator="99"/>
Each of the two Attribute Node definitions above uses the template, and so inherits all the child nodes of the template. However, across these two cases the child Tag Nodes must use different values for the discriminator. This is possible because each child Tag Node will inherit the discriminator set on its parent Attribute Node.
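The lookup rule for a Tag Node's discriminator can be sketched in Python as a walk up the ancestor chain. This is an illustrative model only. The Node class and the function effective_discriminator are hypothetical, and any node kind other than Sequence, Attribute or Record is simply skipped, which stands in for the Control Nodes mentioned above.

```python
class Node:
    """Minimal stand-in for a grammar node with a parent link."""

    def __init__(self, kind, name, discriminator=None, parent=None):
        self.kind = kind
        self.name = name
        self.discriminator = discriminator
        self.parent = parent


def effective_discriminator(tag):
    """Return the Tag Node's own discriminator, else the nearest ancestor's."""
    if tag.discriminator is not None:
        return tag.discriminator
    node = tag.parent
    while node is not None:
        if node.kind in ("Sequence", "Attribute", "Record") and node.discriminator is not None:
            return node.discriminator
        node = node.parent
    return None


attr1 = Node("Attribute", "attr1", discriminator="27")
repeat = Node("Repeat", "loop", parent=attr1)   # intervening node, ignored
tag = Node("Tag", "tag", parent=repeat)
own = Node("Tag", "tagWithOwn", discriminator="99", parent=attr1)
```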
Tag Node types
If a Tag Node is given the type BERTag then the attributes tagType and berConstruct must be set correctly, because the bytes stored in the file for an ASN.1 tag will depend on the values given for tag type and BER construct in the ASN.1 file definition.
A full ASN.1 file definition will specify these values for each tag.
For example, an ASN.1 file definition may specify a tag as context-specific, constructed with a value of 5. The grammar for this would be:
<Tag name="asn1Tag" type="BERTag" discriminator="5" tagType="Context" berConstruct="T"/>
If a full ASN.1 file definition is not available then you must assume that the tag value in the file is the full hex or decimal value of the bytes for this tag. With the same example tag as above, the documentation will give the full value of the bytes representing the tag as A5 in hex or 165 in decimal. In this case, the grammar can be written either with:
- a Tag Node with type set to BERTag; follow the steps in Appendix A - ASN.1 Tag Encoding to determine the appropriate discriminator, tagType and berConstruct settings (this will result in the grammar above)
- a Tag Node with type set to Integer, discriminator set to a hex string (in the example, A5) and the hexDiscriminator set to T, as shown below:
<Tag name="hexTag" type="Integer" discriminator="A5" hexDiscriminator="T"/>
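The relationship between the two forms of the tag (discriminator 5 with tagType Context and berConstruct T, versus the raw byte A5) follows from the standard BER identifier octet layout: two class bits, one constructed bit, then five bits of tag number. The Python sketch below covers only this single-byte form for tag numbers up to 30. The function name ber_tag_byte is hypothetical.

```python
TAG_CLASS_BITS = {"universal": 0b00, "application": 0b01, "context": 0b10, "private": 0b11}


def ber_tag_byte(tag_type: str, constructed: bool, number: int) -> int:
    """Build a single-byte BER identifier octet from class, constructed flag and tag number."""
    if not 0 <= number <= 30:
        raise ValueError("tag numbers above 30 use the multi-byte form, not covered here")
    return (TAG_CLASS_BITS[tag_type.lower()] << 6) | (int(constructed) << 5) | number


tag_byte = ber_tag_byte("Context", True, 5)
tag_hex = format(tag_byte, "02X")
```

For the example tag this gives A5 in hex, which is 165 in decimal.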
...
The Length Node is a sub type of the Attribute Node. It has the same attributes as the Attribute Node, and will read bytes from the file to obtain a length value.
The only difference between a Length Node and an Attribute Node is that once the Length Node has read a value from the file, it will use that value to set the length attribute on its most immediate Attribute, Sequence or Record Node ancestor (any intervening Control Nodes being ignored).
How this length is used depends on the type of ancestor node on which it is set.
If a Length Node sets the length on an ancestor Attribute Node, then the length will be the default bytes value of a child Value or Bytes Node of the Attribute Node. This default can be overridden by setting the bits or bytes on the child node. For example:
<Attr name="ASN1Attribute" type="Integer">
<Length name="attrLength" type="Integer"/>
<Value name="contents"/>
</Attr>
In this example the bytes attribute is not needed on the Value Node, since this is automatically inherited from the parent Attribute Node, which in turn receives it from the Length Node.
The following grammar is equivalent:
<Attr name="ASN1Attribute" type="Integer" variables="attrLength">
<Attr name="attrLength" type="Integer"/>
<Value name="contents" bytes="attrLength"/>
</Attr>
If a Length Node sets the length on a Sequence or Record Node, then the length determines how many bytes are expected across all remaining child Attribute Nodes of that Record or Sequence Node after the Length Node (i.e. not including any child Attribute Nodes before the Length Node). See examples in 3.2.3.
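For the Attribute Node case, where a length read from the file decides how many bytes the following value occupies, a Python sketch may help. It assumes the length is BER-encoded (as with type BERLength) and the value is a big endian integer. Both function names are hypothetical, and the indefinite-length form is deliberately not handled.

```python
def read_ber_length(data: bytes, offset: int = 0):
    """Decode a BER length at offset; return (length, offset of the next byte)."""
    first = data[offset]
    if first < 0x80:
        # Short form: the byte is the length itself.
        return first, offset + 1
    count = first & 0x7F
    if count == 0:
        raise ValueError("indefinite length is not handled in this sketch")
    # Long form: the next `count` bytes hold the length, big endian.
    length = int.from_bytes(data[offset + 1:offset + 1 + count], "big")
    return length, offset + 1 + count


def read_length_and_value(data: bytes, offset: int = 0):
    """Read a length, then that many bytes as a big endian integer value."""
    length, offset = read_ber_length(data, offset)
    value = int.from_bytes(data[offset:offset + length], "big")
    return value, offset + length


value, next_offset = read_length_and_value(b"\x02\x01\x00\xff")
```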
...
The Name Node is a sub type of the Attribute Node. It has the same attributes as the Attribute Node, and will read bytes from the file to obtain a name.
The only difference between a Name Node and an Attribute Node is that the Name Node will modify the name attribute of the most immediate Attribute Node ancestor (all intervening Control Nodes will be ignored). If the Attribute Node being modified has a prefix attribute set, the new name of the Attribute Node will be the prefix followed by an underscore, followed by the value of the Name Node.
If more than one Name Node is specified then the second and subsequent Name Nodes will append an underscore, followed by the value of the Name Node, to the end of the name created by the first Name Node.
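The naming rule can be summarised in a few lines of Python. This is a sketch of the rule as described, with a hypothetical function name compose_name: the optional prefix and each Name Node value are joined with underscores, in order.

```python
def compose_name(name_values, prefix=None):
    """Join an optional prefix and successive Name Node values with underscores."""
    parts = ([prefix] if prefix else []) + list(name_values)
    return "_".join(parts)


prefixed = compose_name(["1stField"], prefix="f")
appended = compose_name(["first", "second"])
```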
...
The Value Node is a sub type of the Attribute Node. It has the same attributes as the Attribute Node, and will read bytes from the file to calculate a value.
The only difference between a Value Node and an Attribute Node is that the Value Node will use the bytes read to set the value of its most immediate Attribute Node ancestor (all intervening Control Nodes will be ignored).
If the bytes, bits, type, stringType, byteOrder and nibbleOrder attributes are not set on a Value Node then the corresponding attributes on the most immediate Attribute Node ancestor will apply. Just as for Tag Nodes inheriting discriminators, this is useful when creating reusable templates (see 0). Rather than specifying these attributes in the template, they can be set on the Attribute Nodes that use the template. These attribute values will be inherited by the child nodes copied from the template. In this way the same template can be used for a range of Integer, String or BCD attributes with different bytes, nibbleOrder, etc. attributes.
If a Value Node has the type attribute set to String, and the parent node has a different type, e.g. Integer, then the Value Node will first read the bytes from the file as a string, but when the value is passed to the parent Attribute Node it will be converted to an integer. This can be useful when using the binary file reader to read an ASCII file (e.g. a CSV or JSON file), converting string representations of integers into actual integers.
For example:
<Attribute name="csvDate" type="Date">
<Value name="csvDateAsString" type="String">
<Repeat name="csvValueRepeat">
<Bytes name="csvValueBytes" bytes="1"/>
</Repeat>
</Value>
</Attribute>
<Attribute name="comma" type="String" discriminator=","/>
The grammar above will keep reading bytes until a comma is reached. This collection of bytes will be passed to the Value Node, which will convert them to a string. This string will then be passed to the outer Attribute Node, which will convert the string into a date. The same technique could be used to convert strings to Integers, Floats, Dates or DateTimes.
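The same two-step conversion (bytes to string, then string to date) can be sketched in Python. This only mirrors the idea of the grammar above and is not how the reader is implemented. The function read_csv_date is hypothetical, and the ASCII encoding and the date format %Y-%m-%d are assumptions made for the example.

```python
from datetime import datetime


def read_csv_date(data: bytes, offset: int = 0, date_format: str = "%Y-%m-%d"):
    """Read bytes up to the next comma, decode them, and convert the text to a date."""
    end = data.index(b",", offset)               # keep reading until a comma is reached
    text = data[offset:end].decode("ascii")      # Value Node step: bytes to string
    parsed = datetime.strptime(text, date_format)  # Attribute Node step: string to date
    return parsed, end + 1                       # resume after the comma


parsed, next_offset = read_csv_date(b"2020-01-31,next")
```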
...