...

Note that File Collectors can also be used to process files that reside inside compressed file archives such as zip files. Please see the section below on Compressed Files for further information.

This page describes, in detail, all configuration options for file collectors. If you are setting up a file collector to load email messages and/or attached files, a good starting place is the article How To: Read data from an email account, which covers a number of common examples.

Form: File Collector Details

...

Field | Type | Description
Name | Text | The name of the file collector.
Enabled | Checkbox | Tick when the configuration is complete and the file collector is ready for use.
Source Type | Dropdown | This field can have any of the following values:
  • Specified Directory: This option will cause the file collector to use the Import File Location (specified in System Configuration on the System Directories tab) as the root input directory when looking for files to load.
  • Managed File: This option will cause the file collector to use the File Upload Directory (specified in System Configuration on the System Directories tab) as the root input directory when looking for files to load.
Number of Header Lines | Text | The number of lines in the header of the file. These are ignored when reading the file. (This option is not available for Binary File, XML and HTML file types.)
Tag | Text Expression (special case: only string literals allowed, no PhixFlow variables) | This field is only available if the Source Type field is set to Managed File. When files are uploaded by PhixFlow, they are placed into a directory whose full path is a combination of the root File Upload Directory (specified in System Configuration on the System Directories tab), the tag value specified here, and the Input Directory specified below (hard-coded to 'in' for Managed File).
For example, if the System Configuration File Upload Directory is set to C:\ManagedFiles and Tag is set to CVFiles then the File Collector will look within C:\ManagedFiles\CVFiles\in for files to process.
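
The path arithmetic in this example can be sketched in Python. This is only an illustration of how the three parts combine, not PhixFlow code; the function name `managed_input_dir` is hypothetical.

```python
from pathlib import PureWindowsPath


def managed_input_dir(upload_dir, tag):
    """Combine the File Upload Directory, the Tag and the fixed 'in' folder."""
    # 'in' is the hard-coded Input Directory for Managed File collectors.
    return PureWindowsPath(upload_dir) / tag / "in"


# Values taken from the example above.
print(managed_input_dir(r"C:\ManagedFiles", "CVFiles"))
```

Using a path type rather than string concatenation avoids doubled or missing separators when the configured directory ends with a slash.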

If you are creating a file collector to load email messages and/or attached files, you can specify a tag here if one has been provided in the subject line of the incoming emails. See How To: Read data from an email account for further details.

Allow Non-Scheduled Collection | Checkbox | If this is turned on, then the collector will run as part of any ad-hoc Analysis Engine run which requires this data. If not, it will only run as part of a scheduled task under the Analysis Engine.
File Type | Dropdown | Can have values:
  • Comma Separated Values: fields are delimited by a comma (or another character).
  • Fixed Length Records: fields have a fixed column width.
  • Binary File: Data is extracted from the file using a Binary File Grammar (in XML) specified in the File Format Description tab.
  • File Details Only: Only attribute details about the file itself will be available.
  • Excel Spreadsheet: Data is extracted from an Excel spreadsheet with a .xls or .xlsx extension.

    Excel files of type .xlsx containing more than 10,000 rows are not supported by PhixFlow and should not be imported using a File Collector.

  • XML File: Data is extracted from an XML file.
  • HTML File: Data is extracted from an HTML file.
Next Sequence | | The next sequence number expected to be found within the name of the file being imported.

This field is only available if File Location Strategy = All Files in Folder.

FTP Site | | The FTP site on which the import file is stored. If no site is specified, the file is assumed to be on the local machine. If a site is specified, all directory paths specified on this form should be full paths to the file: the base directory specified in System Configuration is specific to the local machine and is therefore ignored.
Ignore Base Directory | | This field is only available if Source Type = Specified Directory.
Normally the base directory, specified on the System Directories tab of the System Configuration screen, is prepended to all directories specified on this form. However, if this flag is ticked, the base directory is not prepended and the directories specified on this form are used on their own as the full path to the import file.
File Location Strategy | Text | Can have values:
    • All Files in Folder: read all files matching the pattern specified in File Pattern Expression.
    • Read Paths: read in file path names from a collector or stream.

      This input database collector or stream must be attached to the file collector by a lookup pipe with no index set. The attribute of the input stream or collector which contains the file path names is specified in the field: File Name Attribute. The value entered into this field should be plain text, e.g. myFilePaths but not quoted "myFilePaths".

      Each file path name is interpreted as a pathname relative to the Import Directory. A path name may be a simple file name, or it may have multiple levels of directory, including compressed files (which will be interpreted as directories). The directory separator must be '/' (forward-slash), and not '\' (back-slash), even on a Windows platform. There should be no leading '/'.

      E.g. 'abc.csv', 'dir1/dir2.zip/abc.csv'

      Please note that once an attempt is made to read a file via the Read Paths method, the stream will not attempt to re-import the contents of that file if instructed to do so during another analysis run, even if the original import failed.
      Please also note that the stream containing the file paths will not run automatically in this situation, and so must be added to a taskplan if it is required to run it before the file collector stream.


    • Read Names: read in file locations from a collector or stream. This input collector or stream must be attached to the file collector by a pipe. The attribute of the input stream or collector which contains the file locations is specified in the field: File Name Attribute.

    • Read Names is deprecated.
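
The formatting rules given above for Read Paths entries (forward slashes only, no leading slash) can be checked before the paths reach the file collector. The following Python sketch is an assumption about how such a pre-check might look; `is_valid_read_path` is a hypothetical helper, and PhixFlow itself may apply further checks that are not described here.

```python
def is_valid_read_path(path):
    """Check a Read Paths entry against the formatting rules stated above."""
    if not path:
        return False
    if "\\" in path:
        # Back-slashes are not allowed as separators, even on Windows.
        return False
    if path.startswith("/"):
        # Paths are relative to the import directory: no leading '/'.
        return False
    return True


for candidate in ["abc.csv", "dir1/dir2.zip/abc.csv", "/abc.csv", "dir1\\abc.csv"]:
    print(candidate, is_valid_read_path(candidate))
```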
Input Directory Expression | Text Expression |

Source Type = Specified Directory

If the Source Type is Specified Directory, files will be read from the directory specified in Input Directory Expression.

Unless the flag Ignore Base Directory is ticked, the path specified in this field will be added to the default input directory root, which is the Import File Location specified in System Configuration. If the flag Ignore Base Directory is ticked, the full path for the input directory must be specified.

In fact, this field is an expression. This must evaluate to a plain text string. In the simple case, this will be text surrounded by quotes, e.g.

"C:/data/address/input/accountValues"
Also, because this is an expression, you must always use / rather than \, even on Windows platforms.

You can include PhixFlow variables in this expression, e.g.:

"C:/data/address/input/" + _inputMultiplier

If you need to include wildcards or some other variable element in the resulting path, you must use the Directory Pattern Expression.

If File Location Strategy = All Files in Folder, PhixFlow will look in this directory to find files matching the pattern specified in File Pattern Expression.

If File Location Strategy = Read Names, this is added to the start of the file location read from the file name attribute.
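
The effect of the Ignore Base Directory flag on the input directory, as described above, can be sketched in Python. This is an illustration under assumptions rather than PhixFlow's implementation: the function name `resolve_input_dir`, the way the two parts are joined, and the sample paths are all hypothetical.

```python
def resolve_input_dir(base_dir, input_dir, ignore_base_directory):
    """Return the directory that would be searched for files."""
    if ignore_base_directory:
        # The value of Input Directory Expression is used as the full path.
        return input_dir
    # Otherwise the configured base directory is prepended.
    # Forward slashes are used throughout, as the expression rules require.
    return base_dir.rstrip("/") + "/" + input_dir.lstrip("/")


print(resolve_input_dir("C:/data", "address/input", False))
print(resolve_input_dir("C:/data", "D:/feeds/address/input", True))
```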

Source Type = Managed File

If the Source Type is Managed File, this will contain a non-editable value of "in".
This will be appended to the combined path of System Configuration File Upload Directory and Tag to give the input directory that files will be read from.

Directory Pattern Expression | Regular Expression |

This field is used to identify valid sub-directories of the input directory.

If a Directory Pattern Expression is provided then PhixFlow will not only check the Input Directory for files but will also check all sub-directories of the Input Directory. Each file found will then not only have its name checked against the File Pattern Expression but will also have the relative path from the Input Directory to the file (referred to as the sub-directory path) checked against the Directory Pattern Expression.

For example, suppose the Input Directory has the sub-directories: 'region1/teamA'; 'region1/teamB'; 'region2/teamA'. If you want all the files across all regions for teamA, but not teamB, then you could use the following Directory Pattern Expression to pick out just the files for teamA:

".*/teamA/"

Alternatively, if you wanted all the files for all teams in region 1 only, you could use the following Directory Pattern Expression:

"region1/.*"

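The two patterns above can be tried out against the example sub-directories with a short Python sketch. Two assumptions are made here: that each sub-directory path is presented with a trailing "/" (which the pattern ".*/teamA/" needs in order to match), and that the whole path must match the pattern. Python's `re` module stands in for the regular expression engine; for simple patterns such as these the behaviour should be the same, but that is not confirmed by this page.

```python
import re

# Sub-directory paths relative to the Input Directory (trailing "/" assumed).
SUB_DIRS = ["region1/teamA/", "region1/teamB/", "region2/teamA/"]


def matching_dirs(pattern, sub_dirs):
    """Return the sub-directories whose whole path matches the pattern."""
    return [d for d in sub_dirs if re.fullmatch(pattern, d)]


print(matching_dirs(".*/teamA/", SUB_DIRS))   # teamA in every region
print(matching_dirs("region1/.*", SUB_DIRS))  # every team in region1
```
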
Regular expression rules are used to perform this match rather than the sort of pattern matching rules you might be used to when listing files. For example:

  • To match any string of characters, you must use ".*" and not "*"
  • To match a "." you must use "\\." and not "." (which means any character)
  • You must use forward slashes "/" instead of backslashes "\" for directory separators
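
The first two rules above are easy to get wrong, so here is a minimal Python sketch that shows the difference. Python's `re` module is used as a stand-in for the regular expression engine, and the helper names are hypothetical. As in the expressions on this page, the regular expression is written inside a quoted string, which is why the backslash is doubled.

```python
import re


def is_valid_regex(pattern):
    """Return True if the pattern compiles as a regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def matches(pattern, name):
    """Return True if the whole name matches the pattern."""
    return re.fullmatch(pattern, name) is not None


# "*" alone is a shell wildcard, not a valid regular expression.
print(is_valid_regex("*"), is_valid_regex(".*"))

# An unescaped "." matches any character, so "notes_txt" slips through.
print(matches(".*.txt", "notes_txt"))

# "\\." in a quoted string is the regex \. and matches only a literal dot.
print(matches(".*\\.txt", "notes_txt"), matches(".*\\.txt", "notes.txt"))
```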

A number of internal variables are available in these expressions:

  • _fromDate: the start date of the period of the stream being processed.
  • _toDate: the end date of the period of the stream being processed.

Note that there are also a number of predefined compressed file expressions that will always be checked to determine whether a file within a valid sub-directory is actually a compressed file. If so, the file will be assumed to be a valid compressed file and will be recursed into as if it were a standard matching directory. Please see Compressed Files for a list of valid compressed file expressions.

Exclude Dir. Pattern Expr. | Regular Expression |

This field can be used to exclude certain sub-directories found by the Directory Pattern Expression.

For example, suppose the Input Directory has the sub-directories: 'region1/teamA'; 'region1/teamB'; 'region2/teamA'. If you want all the files across all regions for teamA, but not teamB, then you could use the following Directory Pattern Expression to find all files:

".*"

combined with the following Exclude Dir. Pattern Expr. to exclude those for teamB:

".*/teamB/"

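The combination of an include pattern and an exclude pattern described above can be sketched in Python as follows. The function name `select_dirs` is hypothetical, the trailing "/" on each sub-directory path is an assumption (the pattern ".*/teamB/" needs it in order to match), and Python's `re` module stands in for the regular expression engine.

```python
import re


def select_dirs(sub_dirs, include_pattern, exclude_pattern=None):
    """Keep directories that match the include pattern but not the exclude pattern."""
    selected = []
    for d in sub_dirs:
        if not re.fullmatch(include_pattern, d):
            continue  # not found by the Directory Pattern Expression
        if exclude_pattern and re.fullmatch(exclude_pattern, d):
            continue  # removed by the Exclude Dir. Pattern Expr.
        selected.append(d)
    return selected


dirs = ["region1/teamA/", "region1/teamB/", "region2/teamA/"]
print(select_dirs(dirs, ".*", ".*/teamB/"))
```
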
Regular expression rules are used to perform this match rather than the sort of pattern matching rules you might be used to when listing files. For example:

  • To match any string of characters, you must use ".*" and not "*"
  • To match a "." you must use "\\." and not "." (which means any character)
  • You must use forward slashes "/" instead of backslashes "\" for directory separators

A number of internal variables are available in these expressions:

  • _fromDate: the start date of the period of the stream being processed.
  • _toDate: the end date of the period of the stream being processed.
File Pattern Expression | Text Expression | This field is only available if File Location Strategy = All Files in Folder. An expression used to generate a list of files to be read. This expression must itself resolve to a regular expression, which is used to match files in the input directory. Note that regular expression rules are used to perform this match, not the shell-style wildcard rules used in many file systems. E.g. to match all files, you must use ".*" and not "*". A number of internal variables are available in these expressions:
  • _fromDate: the start date of the period of the stream being processed.
  • _toDate: the end date of the period of the stream being processed.
  • %SEQ%: the current sequence number.

Examples:

"inputRecords.txt"

will read files called "inputRecords.txt" from the input directories.

".*"

will read all files in the input directories.

".*\\.txt"

will read all files in the input directories with the extension ".txt".

"teamA.*"

will read all files in the input directories whose names start with "teamA".

"record_" + toString(now(),"yyyy-MM-dd") + "\\.txt"

will read files in the input directories with the format "record_yyyy-MM-dd.txt", where yyyy-MM-dd is the current date. E.g. "record_2013-03-26.txt".
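
To see what the date-based expression above resolves to, the same construction can be reproduced in Python. This is a sketch only: `daily_file_pattern` is a hypothetical name, `strftime` stands in for the `toString` call, and a fixed date replaces `now()` so that the result is repeatable.

```python
import re
from datetime import date


def daily_file_pattern(day):
    """Build the regular expression for that day's record file."""
    # Mirrors: "record_" + toString(now(), "yyyy-MM-dd") + "\\.txt"
    return "record_" + day.strftime("%Y-%m-%d") + "\\.txt"


pattern = daily_file_pattern(date(2013, 3, 26))
print(pattern)
print(re.fullmatch(pattern, "record_2013-03-26.txt") is not None)
print(re.fullmatch(pattern, "record_2013-03-27.txt") is not None)
```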

"("+listToString(_context.f,"|")+")"

will read files whose names appear in the list of files uploaded by the Stream Action that caused the File Collector to run, but only if a Context Value called 'f' is set in the Action and its value expression is '_files'.
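
The last example joins a list of file names into a single alternation. The Python sketch below reproduces that construction and shows one consequence of it: because the result is a regular expression, a "." in a file name matches any character unless it is escaped. The function names are hypothetical, and `re.escape` is a Python facility; whether PhixFlow offers an equivalent is not stated on this page.

```python
import re


def pattern_from_names(names):
    """Join names as in "(" + listToString(_context.f, "|") + ")"."""
    return "(" + "|".join(names) + ")"


def escaped_pattern_from_names(names):
    """Same alternation, with regex metacharacters in each name escaped."""
    return "(" + "|".join(re.escape(n) for n in names) + ")"


uploaded = ["a.csv", "b.csv"]
plain = pattern_from_names(uploaded)
escaped = escaped_pattern_from_names(uploaded)

print(plain)
print(re.fullmatch(plain, "b.csv") is not None)
# The unescaped "." also lets "axcsv" match the plain pattern.
print(re.fullmatch(plain, "axcsv") is not None)
print(re.fullmatch(escaped, "axcsv") is not None)
```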

Archive Directory Expression | Regular Expression | If set, processed files will be written to this directory. This field is an expression that must resolve to a Regular Expression. Note that because this is an expression field, if you supply a simple directory definition in plain text it must be surrounded by quotes. Also, directory separators must be / and not \, even if the file is being moved to a directory on a Windows platform. E.g. "C:/data/address/archive/".
Error Directory Expression | Regular Expression | If set, files that error during processing will be written to this directory. This field is an expression that must resolve to a Regular Expression. Note that because this is an expression field, if you supply a simple directory definition in plain text it must be surrounded by quotes. Also, directory separators must be / and not \, even if the file is being moved to a directory on a Windows platform. E.g. "C:/data/address/error/".
Local Archive Directory | Text Expression |
Local Error Directory | Text Expression |

...