A File Collector describes the structure, content, naming patterns and location of files of data to be imported into PhixFlow.

Note that File Collectors can also be used to process files that reside inside compressed file archives such as zip files. Please see the section below on Compressed Files for further information.

Info
Image Removed This page describes, in detail, all configuration options for file collectors. But if you are setting up a file collector to load email messages and/ or attached files, a good starting place is the article How To: Read data from an email account, which covers a number of common examples.

Form: File Collector Details

The following fields are configured on the Details tab:

...

Specified Directory: This option will cause the file collector to use the Import File Location (specified in System Configuration on the System Directories tab) as the root input directory when looking for files to load.
Managed File: This option will cause the file collector to use the File Upload Directory (specified in System Configuration on the System Directories tab) as the root input directory when looking for files to load.

...

Comma Separated Values: fields are delimited by a comma, (or other character).
Fixed Length Records: fields have a fixed column width.
Binary File: Data is extracted from the file using a Binary File Grammar (in XML) specified in the File Format Description tab.
File Details Only: Only attribute details about the file itself will be available.
Excel Spreadsheet: Data is extracted from the an excel spreadsheet supporting a .xls or .xlsx extension.
.xlsx type excel files containing in excess of 10,000 rows are not supported by PhixFlow and should not be imported using a File Collector.
XML File: Data is extracted from an XML file
HTML File: Data is extracted from an HTML file

...

File Location Strategy

...

This input database collector or stream must be attached to the file collector by a lookup pipe with no index set. The attribute of the input stream or collector which contains the file path names is specified in the field: File Name Attribute. The value entered into this field should be plain text, e.g. myFilePaths but not quoted "myFilePaths".

Each file path name is interpreted as a pathname relative to the Import Directory. A path name may be a simple file name, or it may have multiple levels of directory, including compressed files (which will be interpreted as directories). The directory separator must be '/' (forward-slash), and not '\' (back-slash), even on a Windows platform. There should be no leading '/'.

...

Read Names: read in file locations from a collector or stream. This input collector or stream must be attached to the file collector by a pipe. The attribute of the input stream or collector which contains the file locations is specified in the field: File Name Attribute.

...

Text Expression

(special case: only string literals allowed, no PhixFlow variables)

...

This field is only available if the Source Type field is set to Managed File. When files are uploaded by PhixFlow they are placed into a directory whose full path is a combination of the root File Upload Directory (specified in System Configuration on the System Directories tab), the tag value specified here and the Input Directory specified below (hard coded to 'in' for Managed files).
For example, if the System Configuration File Upload Directory is set to C:\ManagedFiles and Tag is set to CVFiles then the File Collector will look within C:\ManagedFiles\CVFiles\in for files to process.

If you are creating a file collector to load email messages and/ or attached files, you can specify a tag here if one has been provided in the subject line of the incoming emails. See How To: Read data from an email account for further details.

...

Ignore Base Directory

...

Source Type = Specified Directory

If the Source Type is Specified Directory, files will be ready from the directory specified in Input Directory Expression.

Unless the flag Ignore Base Directory is ticked, the path specified in this field will be added to the default input directory root - this is specified in the System Configuration File Upload Location. If the flag Ignore Base Directory is ticked, the full path for the input directory must be specified.

In fact, this field is an expression. This must evaluate to a plain text string. In the simple case, this will be text surrounded by quotes, e.g.

Code Block
"C:/data/address/input/accountValues"

Also, because this is an expression, you must always use / rather than \, even on windows platforms.

You can include PhixFlow variables in this expression, e.g.:

Code Block
"C:/data/address/input/" + _inputMultiplier

If you need to include wildcards or some other variable element in the resulting path, you must use the Directory Pattern Expression.

If File Location Strategy = All Files in Folder PhixFlow will look in this directory to find files matching the pattern specified in File Pattern Expression.

If File Location Strategy = Read Names this is added to the start of the file location read from the file name attribute.

Source Type = Managed File

If the Source Type is Managed file, this will contain a non editable value of "in"
This will be appended to the combined path of System Configuration File Upload Directory and Tag to give the input directory that files will be read from.

...

This field is used to identify valid sub-directories of the input directory.

If a Directory Pattern Expression is provided then PhixFlow will not only check the Input Directory for files but will also check all sub-directories of the Input Directory. Each file found will then not only have its name checked against the File Pattern Expression but will also have the relative path from the Input Directory to the file (referred to as the sub-directory path) checked against the Directory Pattern Expression.

For example, suppose the Input Directory has the sub-directories: 'region1/teamA'; 'region1/teamB'; 'region2/teamA'. If you want all the files across all regions for teamA, but not teamB, then you could use the following Directory Pattern Expression to pick out just the files for teamA:

".*/teamA/"

Alternatively, if you wanted all the files for all teams in region 1 only, you could use the following Directory Pattern Expression:

"region1/.*"

Regular expression rules are used to perform this match rather than the sort of pattern matching rules you might be used to when listing files. For example:

To match any string of characters, you must use ".*" and not "*"
To match a "." you must use "\\." and not "." (which means any character)
You must use forward slashes "/" instead of backslashes "\" for directory separators

A number of internal variables are available in these expressions:

_fromDate: the start date of the period of the stream being processed.
_toDate: the end date of the period of the the stream being processed.

Note that there are also a number of predefined compressed file expressions that will always be checked to determine if a file within a valid sub directory is actually a compressed file. If so then this file will assumed to be a valid compressed file and hence will be recursed into as if it was a standard matching directory. Please see Compressed Files for a list of valid compressed file expressions.

...

This field can be used to exclude certain sub-directories found by the Directory Pattern Expression.

For example, suppose the Input Directory has the sub-directories: 'region1/teamA'; 'region1/teamB'; 'region2/teamA'. If you want all the files across all regions for teamA, but not teamB, then you could use the following Directory Pattern Expression to find all files:

".*"

combined with the following Exclude Dir. Pattern Expr to exclude those for teamB:

".*/teamB/"

Regular expression rules are used to perform this match rather than the sort of pattern matching rules you might be used to when listing files. For example:

To match any string of characters, you must use ".*" and not "*"
To match a "." you must use "\\." and not "." (which means any character)
You must use forward slashes "/" instead of backslashes "\" for directory separators

A number of internal variables are available in these expressions:

_fromDate: the start date of the period of the stream being processed.
_toDate: the end date of the period of the the stream being processed.

...

_fromDate: the start date of the period of the stream being processed.
_toDate: the end date of the period of the the stream being processed.
%SEQ%: the current sequence number.

Examples:

"inputRecords.txt"

will read files called "inputRecords.txt" from the input directories.

".*"

will read all files in the input directories.

".*\\.txt"

will read all files in the input directories with the extension ".txt"

"teamA.*"

will read all files in the input directories starting with "teamA."

"record_" + toString(now(),"yyyy-MM-dd") + "\\.txt"

will read files in the input directories with the format "record_yyyy-MM-dd.txt", where yyyy-MM-dd is the current date. E.g. "record_2013-03-26.txt".

"("+listToString(_context.f,"|")+")"

will read files with name contained in the list of files uploaded by the Stream Action which caused the File Collector to run, but only if a Context Value called 'f' is set in the Action and its value expression is '_files'

...

The next sequence number expected to be found within the name of the file being imported.

This field is only available if File Location Strategy = All Files in Folder.

...

Error: unreadable directories will be reported, and if any are found, the file search will fail.
Warning: unreadable directories will be reported, but the file search will continue unaffected.
Ignore: unreadable directories will be silently ignored

...

This field should be populated according to valid XPath syntax. Please see XPath Examples for how to use XPath expressions and how the returned data can be used and evaluated in the corresponding stream attribute expressions.

...

Text/

Dropdown

...

Text/

Dropdown

...

Text/

Dropdown

...

If this flag is set then PhixFlow will not throw an error if the record being read contains more columns than expected. If this flag is not set then an error will be reported if there are too many columns.

...

If this flag is set then PhixFlow will not throw an error if the record being read contains fewer columns than expected. If this flag is not set then an error will be reported if there are too few columns.

...

This field is only available if File Type = Excel Spreadsheet.

This field is an expression that must evaluate to a list of ranges with the format "WorksheetName!TopLeftCell:BottomRightCell"

If this field is left blank, PhixFlow will read the first worksheet it finds in the excel file (even if this is a hidden sheet) with a range covering the whole sheet.

E.g. if just a single range is needed: "DailyCallsSheet!A1:G100"

E.g. if a list of ranges is required: ["DailyCallsSheet!A1:G100", "A1:B20", "Calls!A1:C100"]

Remember that in all cases PhixFlow will only read the columns that have been defined in the File Columns tab.

Because this field is an expression, the resulting list can be generated with any valid PhixFlow expression. You can also use the internal variable _worksheets which gives you the list of worksheets that PhixFlow found in the file. See the example below for how you might use this.

Examples

All rows and columns in the default 1st worksheet:- leave this field empty as this is the default behaviour
Specified columns only on the default 1st worksheet:- "A:C"
Specified cell range only on the default 1st worksheet:- "B1:G10"
Specified cell range on a specified worksheet:- "DailyCallsSheet!A2:F20"
List of specified cell ranges on multiple worksheets:- ["DailyCallsSheet!A2:F20", "Calls!A1:C400", "Accounts!A5:F50"]
Examine the list of worksheets that have been found and specify ranges for only certain sheets, if they exist:

do (    $rangeList = [],

    forEach($sheet, _worksheets,

        if ( listContains(["sheetA", "sheetB"], $sheet),

            addElement($rangeList, $sheet + "!A1:B10")

),

    $rangeList

This expression will evaluate to the list of a maximum of two ranges, if both worksheets sheetA and sheetB exist - in this case, this is equivalent to ["sheetA!A1:B10", "sheetB!A1:B10"]. Crucially, if a sheet is not found, the range will not be included. This is important for error handling. If you specify a range that is not in the excel file PhixFlow will error. So if you are not sure that a worksheet will always be included, write an expression like this to check, and only specify the range when the sheet is found.

Related internal variables

See notes for the internal variables _worksheet and _range below. These can be used in stream attribute expressions to record the source worksheet and range for data you have loaded into PhixFlow.

Constraints

Note that if a worksheet is specified, then the full cell range must also be specified. Hence it is not possible to select a 'worksheet only' or 'columns only for a specified worksheet'. e.g DailyCallsSheet or DailyCallsSheet!A:C are not supported.

...

This checkbox is only available if File Type = Excel Spreadsheet.

If you are reading a spreadsheet which is password protected, enter the password here so that the file can be unlocked.

...

This checkbox is only available if File Type = Excel Spreadsheet.

...

The following fields are configured through the File Format Description tab on the form: Note : this tab is only available for Binary File Collectors.

...

Expression:
list of strings

...

A list of target node names - that is, the names of nodes that will generate an output record.

Example of a single target:

"DATA_RECORD"

Example of a list of targets:

["DATA_RECORD1","DATA_RECORD2","DATA_RECORD3"]

...

The following fields are configured through separate tabs on the form:

...

Forms: File Columns

The data columns present in the import file are defined here.

Form: File Column Details

...

String
Integer
Float
Date - please see valid Date formats
Datetime - please see valid Datetime formats
TrueFalse - please see valid TrueFalse values

...

Overview

Use a file collector to specify the structure, content, naming patterns and location of files of data. When you run the file collector, PhixFlow imports the data. You can also use a file collector to process files inside compressed file archives such as zip files; see Handling Compressed Files, below.

You can also use a file collector to import email messages and their attached files; see Reading Data From an Email Account for details.

To open a file collector's property tab, double-click on:

the file collector name in the repository
the file collector icon in a model.

Before running a model, you must upload files that do not already exist on the PhixFlow server. For example you may have CSV or Microsoft Excel files saved to your computer or the network.

In the property tab, set the Input Directory Expression.
In the model, right-click on the file collector icon to open the context toolbar.
Click
Insert excerpt
_upload_file
_upload_file
nopanel true
.
PhixFlow opens a file browser. Find and select the file.
Click
Insert excerpt
_upload_file
_upload_file
nopanel true
.

Insert excerpt

	_standard_settings
	_standard_settings
nopanel	true

Panel

borderColor	#7da054
titleColor	white
titleBGColor	#7da054
borderStyle	solid
title	Sections on this page

Table of Contents

indent	12px
style	none

File Collector Properties

Basic Settings

Field Description

Name Enter a name for this file collector.

Auto Configuration

This check box is available for a new file collector that has not been configured. It is ticked by default.

Insert excerpt

	_check_box_ticked
	_check_box_ticked
nopanel	true

to load an Excel spreadsheet and automatically configure the file collector and stream properties using the data in the spreadsheet; see Easy Loading for Excel Spreadsheets, below.

Insert excerpt

	_check_box_untick
	_check_box_untick
nopanel	true

to switch off auto-configuration. More file collector properties become available so that you can specify details about the file. You also need to add the stream and pipe to the model.

Enabled

Insert excerpt

	_check_box_ticked
	_check_box_ticked
nopanel	true

to indicate you have completed the properties and the file collector is ready to use.

Source Type

Select whether or not the file is already on the PhixFlow server.

Specified Directory

Use for files that are already stored on the PhixFlow server. By default, PhixFlow assumes files are in the System Configuration → System Directories → Import File Location, or a subdirectory of it. You must also specify the relative path to the file(s) in the Input Directory Expression.

If your file is not in the Import File Location, select Specified directory and tick Ignore base directory. In this case you must also specify the full path to the file(s) in the Input Directory Expression.

See also Input Directory Expression and Ignore base directory, below.

Managed File

Use for files that you have on your local machine or the network.
PhixFlow loads the file to its System Configuration → System Directories→ File Upload Directory. You can optionally specify a sub-directory of the File Upload Directory using the Tag property.

For a managed file, you must load the file before running the file collector.

Expand

title	How to load a managed file

In the analysis model, hover your mouse pointer over the new file collector to display the pop-up toolbar.
Click
Insert excerpt
_upload_file
_upload_file
nopanel true
.
In the file explorer, find your file and click Open.
In the Upload Managed File window, click the
Insert excerpt
_upload_button
_upload_button
nopanel true
button.

If you have selected Auto Configuration, PhixFlow automatically starts the process for loading a managed file; see Easy Loading for Excel Spreadsheets, below.

Number of Header Lines Enter the number of lines in the header of the file. These are ignored when reading the file. (This option is not available for Binary File, XML and HTML file types).

File Type

Can have values:

Comma Separated Values: fields are delimited by a comma, (or other character).
Fixed Length Records: fields have a fixed column width.
Binary File: Data is extracted from the file using a Binary File Grammar (in XML) specified in the File Format Description tab.
File Details Only: Only attribute details about the file itself will be available.
Excel Spreadsheet: Data is extracted from the an excel spreadsheet supporting a .xls or .xlsx extension.
Tip
.xlsx type excel files containing in excess of 10,000 rows are not supported by PhixFlow and should not be imported using a file collector.
XML File: Data is extracted from an XML file.
HTML File: Data is extracted from an HTML file.

For information about the PhixFlow internal variables you can use to specify the attributes you want, see File Collector Attributes, below.

Next Sequence

Available when File Location Strategy is All Files in Folder.

Enter the next sequence number expected to be found within the name of the file being imported.

Allow Non-Scheduled Collection

Insert excerpt

	_check_box_ticked
	_check_box_ticked
nopanel	true

to run the file collector as part of any analysis run that requires this data.

Insert excerpt

	_check_box_untick
	_check_box_untick
nopanel	true

to only run the file collector as part of a scheduled task.

FTP Site Specify the FTP Site on which the import file is stored. If no site is specified then the file is assumed to be on the local machine. If a site is specified then all directory paths specified on this form should be the full path to the file since the base directory specified in system configuration is ignored (since the base directory is specific to the local machine).

File Location Strategy

Select one of:

All Files in Folder: read all files matching the pattern specified in File Pattern Expression.
Read File Paths: read in file path names from a collector or stream.

This input database collector or stream must be attached to the file collector by a lookup pipe with no index set. The attribute of the input stream or collector which contains the file path names is specified in the field: File Name Attribute. The value entered into this field should be plain text, e.g. myFilePaths but not quoted "myFilePaths".

Each file path name is interpreted as a pathname relative to the Import Directory. A path name may be a simple file name, or it may have multiple levels of directory. Compressed files are interpreted as directories. On Linux and Windows platforms, the directory separator must be forward-slash / not backslash \ . Do not include a leading forward-slash /, for example: 'abc.csv' or 'dir1/dir2.zip/abc.csv'

Read Names: This option is deprecated.

Read in file locations from a collector or stream. This input collector or stream must be attached to the file collector by a pipe. The attribute of the input stream or collector which contains the file locations is specified in the field: File Name Attribute.

Note

After a stream has attempted to read a file using Read File Paths, the stream will not attempt to re-read the contents of that file during another analysis run, even if the original import failed.

This means the stream containing the file paths will not run during an analysis run. You must add it to a task plan in order to run it before the file collector stream.

Tag
Anchor
tag
tag

Available when Source Type field is Managed File.

Specify a sub-directory of System Configuration → System Directories → File Upload Directory using string literals only. Do not use PhixFlow variables.

PhixFlow loads files into a directory whose full path is a combination of:

File Upload Directory, for example C:\LoadedManagedFiles
Tag, for example CVFiles
Input Directory - this is automatically set to in for managed files.

For the example paths above, the file collector looks in the directory C:\LoadedManagedFiles\CVFiles\in

If you are creating a file collector to load email messages and/or attached files, you can specify a tag here if one has been provided in the subject line of the incoming emails. See Reading Data From an Email Account for further details.

Ignore Base Directory

Anchor

	ignore
	ignore

Available when Source Type is Specified Directory.

The base directory is set in System Configuration → System Directories → Import File Location.

Insert excerpt

	_check_box_untick
	_check_box_untick
nopanel	true

to automatically prepend the Import File Location directory path to the directories specified in the file collector properties.

Insert excerpt

	_check_box_ticked
	_check_box_ticked
nopanel	true

so that PhixFlow reads directories specified in the file collector properties as the full path to the file.

Note
For improved security, your administrator can set a System Configuration → System Directories → Restricted Directory. If it is set, PhixFlow will only load files from the Restricted Directory or a sub-directory of it. Even if you tick Ignore base directory, all directories specified in the file collector properties must be within the Restricted Directory.

Input Directory
Expression
Anchor
input
input

Specify the location of the files you want to load.

When Source Type is Managed File

This field contains a read-only value of in.
PhixFlow reads files from the combined path: File Upload Directory + Tag +in

When Source Type is Specified Directory

Specify a path:
- either to a directory relative to the Import File Location. PhixFlow reads files from the directory whose full path is a combination of Import File Location and the path specified here.
- or, if Ignore Base Directory is ticked, the full path to the file(s).

Expand

title	Tips on specifying a directory in an expression

This expression must evaluate to a plain text string.

Use string literals only.

Always use forward-slash / rather than backslash \, even on Windows.

Enclose the string in quotes, for example:

Code Block
"C:/data/address/input/accountValues"

Optionally include PhixFlow variables, for example:

Code Block
"C:/data/address/input/" + _inputMultiplier

Do not use wildcards. If you need to include wildcards or some other variable element in the resulting path, use the Directory Pattern Expression.

If File Location Strategy is:

All Files in Folder, PhixFlow looks in this directory to find files matching the pattern specified in File Pattern Expression.
Read Names, this is added to the start of the file location read from the file name attribute.

See also System Configuration for File Upload Directory and Import File Location.

Directory
Pattern Expression

Enter a regular expression to specify one or more relative paths from the Input Directory to its sub-directories. PhixFlow looks in the input directory and its sub-directories for files whose names match the File Pattern Expression.

PhixFlow can recognise some compressed file types; see Handling Compressed Files, below. PhixFlow treats these compressed files as a matching sub-directory.

Internal Variables

You can use the following PhixFlow internal variables in this expression:

_fromDate: the start date of the period of the stream being processed.
_toDate: the end date of the period of the the stream being processed.

Expand

title	Examples

The Input Directory specifies three directories: 'region1/teamA'; 'region1/teamB'; 'region2/teamA'

To load files from all regions for teamA only, enter: ".*/teamA/"
To load files for all teams in region 1 only, enter: "region1/.*"

Insert excerpt

	_regexp_tips
	_regexp_tips
nopanel	true

Exclude Dir.
Pattern Expr.

If the Directory Pattern Expression includes some sub-directories that you do not want to search, enter a regular expression that will exclude those sub-directories.

Internal Variables

You can use the following PhixFlow internal variables in this expression:

_fromDate: the start date of the period of the stream being processed.
_toDate: the end date of the period of the the stream being processed.

Expand

title	Examples

Input Directory has the sub-directories:
'region1/teamA'; 'region1/teamB'; 'region2/teamA'
Directory Pattern Expression is: ".*"
This means PhixFlow will load all files from all regions.

To exclude files from teamB, enter the expression: ".*/teamB/"

Insert excerpt

	_regexp_tips
	_regexp_tips
nopanel	true

File
Pattern Expression

Available when File Location Strategy is All Files in Folder.

Enter a regular expression that will match one or more files in the Input Directory, in order to generate a list of files to load.

PhixFlow uses the File Pattern Expression combined with the fields that specify the directories in which to look. This means PhixFlow:

looks in directories specified in the Input Directory
and looks in sub-directories specified the Directory Pattern Expression
but ignores sub-directories that have been excluded by the Exclude Dir Pattern Expr.

If the specified directories contain no files that match the regular expression, the file collector will not load any files.

Internal Variables

You can use the following PhixFlow internal variables in this expression:

_fromDate: the start date of the period of the stream being processed.
_toDate: the end date of the period of the the stream being processed.
%SEQ%: the current sequence number.

Expand

title	Examples

An Input Directory, inputRecords.txt, contains multiple files. To load:

a single file called inputRecords.txt, enter: "inputRecords.txt"
all the files, enter: ".*"
files with the extension .txt, enter: ".*\\.txt"
files whose names start with teamA, enter: "teamA.*"
files with the format record_yyyy-MM-dd.txt, where yyyy-MM-dd is the current date, such as
record_2013-03-26.txt, enter: "record_" + toString(now(),"yyyy-MM-dd") + "\\.txt"

You can also load files with name contained in the list of files uploaded by the Stream Action which caused the File Collector to run. The stream action must it must have a Context Value called 'f' with the value expression '_files'.

In this case, enter the file pattern expression: "("+listToString(_context.f,"|")+")"

Insert excerpt

	_regexp_tips
	_regexp_tips
nopanel	true

Archive Directory Expression

Optionally, enter an expression for a directory path. The expression must resolve to a Regular Expression.

Tip

To enter a plain text directory as an expression:

enclose the path in quotes
use forward slash / as a separator, even for paths on a Windows platform, for example, "C:/data/address/directory/".

The Archive Directory Expression is the location to which all files processed by the file collector will be written.

The Error Directory Expression is the location to which any files that cause an error during processing will be written.

Error Directory
Expression

Local Archive Directory

Available when FTP Site is specified.

Specify whether the archive directory is on the PhixFlow server (local) or on the original server.

Note

If you have uploaded a file to the file collector, when you run analysis, PhixFlow reads the data into the file collector's stream and then archives the uploaded data. This means the file collector no longer has any data. If you subsequently run analysis again, the stream set will be empty. To keep the data in the file collector, you must do one of the following:

do not specify an archive directory
upload the file to the file collector again
once the data is in the stream, make the stream, or the pipe to the stream, static.

Local Error Directory Available when FTP Site is specified. Specify whether the error directory is on the PhixFlow server (local) or on the original server.

Advanced

Field Description

Maximum Files Enter the maximum number of files that PhixFlow will process when the file collector runs.

Minimum Files Enter the minimum number of files that PhixFlow expects to find when the file collector runs. If fewer files are found then this is treated as an error.

Max Records Per File Enter the maximum number of records that PhixFlow will read from each file.

Errors Before Rollback

Anchor

	fileCollectorErrorsBeforeRollback
	fileCollectorErrorsBeforeRollback

The maximum number of errors PhixFlow will permit during the processing of a file before. Once this number has been exceeded, PhixFlow will abandon the attempted file load; see Error Handling below.

Parallel Readers Enter the number of files to process in parallel. If blank, this defaults to 1. If the file collector is configured to read files in sequence, this field is ignored and a single file reader is used.

Unreadable Directories

Select what PhixFlow will do if there are unreadable directories when it is searching a directory hierarchy for files to import.

Error: the search fails and the unreadable directories are reported in the log.
Warning: the search continues and unreadable directories are reported in the log.
Ignore: the search continues and unreadable directories are ignored.

XPath Expression

Available when File Type is XML File or HTML File.

Enter valid XPath syntax. For information about how to use XPath expressions and how to use the returned data in the corresponding stream attribute expressions, see XPath Examples.

Character Set

Insert excerpt

	_character_set
	_character_set
nopanel	true

Column Separator

Insert excerpt

	_column_separator
	_column_separator
nopanel	true

Separator Character

Insert excerpt

	_separator_character
	_separator_character
nopanel	true

Quote Style

Insert excerpt

	_quote_style
	_quote_style
nopanel	true

Quote Character

Insert excerpt

	_quote_character
	_quote_character
nopanel	true

Ignore Extra Columns

Anchor

	fileCollectorIgnoreExtraColumns
	fileCollectorIgnoreExtraColumns

Available when File Type is Comma Separated Values.

Insert excerpt

	_check_box_ticked
	_check_box_ticked
nopanel	true

PhixFlow will not report an error if the record being read contains more columns than expected.

Insert excerpt

	_check_box_untick
	_check_box_untick
nopanel	true

PhixFlow report will report an error if there are too many columns; see Error Handling below.

Ignore Missing Columns

Anchor

	fileCollectorIgnoreMissingColumns
	fileCollectorIgnoreMissingColumns

Available when File Type is Comma Separated Values.

Insert excerpt

	_check_box_ticked
	_check_box_ticked
nopanel	true

PhixFlow will not report an error if the record being read contains fewer columns than expected.

Insert excerpt

	_check_box_untick
	_check_box_untick
nopanel	true

PhixFlow will report an error if there are too few columns; see Error Handling below.

Import Rows Matching Enter an expression that PhixFlow compares to each line in the file against the expression. Only lines that match are imported.

Replace Text Matching In each imported line, find all occurrences of the expression that you enter in Replace Text Matching and replace it with the expression that you enter in With.

With

Excel Data Range Expression

Anchor

	excelRangeExpression
	excelRangeExpression

Available when File Type = Excel Spreadsheet.

Leave this field blank or enter an expression for the spreadsheet data range that PhixFlow will look in. The data that PhixFlow extracts from the range is defined in the File Columns section, below. The expression can specify:

blank: PhixFlow looks at the whole first worksheet in the Excel file, even if this is a hidden sheet.
columns on the first worksheet, for example, "A:C"
columns and rows on the first worksheet, for example, "A1:G10"
the worksheet and data range, with the format "WorksheetName!TopLeftCell:BottomRightCell", for example, "DailyCalls!A1:G100"

You cannot specify:

only a worksheet.
columns for a specified worksheet.

If the worksheet name contains:

spaces, you must enclose the name in single quotes, for example My Sheet is expressed as "'My Sheet'!A1:G10"
single quotes, you must
- enclose the whole name in single quotes
- and use two single quotes (apostrophes).
  For example Gary'sSheet is expressed as "'Gary''s Sheet'!A1:G100"

Tip
Use the internal variable _worksheets to list all worksheets that PhixFlow found in the file.

Expand

title	Simple Examples

To specify the Daily Calls worksheet, columns A to G, rows 1 to 10:

"'Daily Calls'!A1:G10"

To specify different areas from the same worksheet, or multiple worksheets:

["'Daily Calls'!A1:G100", "A1:B20", "'Monthly Totals'!A1:C100"]

To specify John's Data worksheet, columns A to G, rows 1 to 10, escaping the space and apostrophe:

"'John''s Data'!A1:G10"

Expand

title	Example using _worksheets

To examine the list of worksheets that have been found and specify ranges for only certain sheets, if they exist. The worksheet names contain spaces and/or single quotes.

Code Block

do (
    $rangeList = [],
    forEach($sheet, _worksheets,
        if ( listContains(["sheet A", "sheet'B"], $sheet),
            do(
                $safeSheet = "'" + replaceAll($sheet, "'", "''") + "'",
                $range = $safeSheet + "!A1:B10",
                addElement($rangeList, $range)
            )
        )
    ),
    $rangeList
)

This expression evaluates to the list of a maximum of two ranges, if both worksheets sheet A and sheet'B exist. If a sheet is not found, the range is not be included.

This is important for error handling. If you specify a range that is not in the Excel file, PhixFlow will error. If you are not sure that a worksheet will always be included, write an expression like this to check, and only specify the range when the sheet is found.

To ensure PhixFlow recognises the generated ranges as valid, you must use $safeSheet, as this doubles any embedded single quotes and wraps the result in single quotes.

Ignore Undefined Values

Available when File Type is Excel Spreadsheet. When importing the file:

Insert excerpt

	_check_box_ticked
	_check_box_ticked
nopanel	true

PhixFlow ignores all unsupported Excel values such as #N/A, #REF! #DIV/0 and replaces them with null values. When the import is complete, PhixFlow displays a single warning message stating the number of unsupported cell values with details about the first unsupported cell value.

Insert excerpt

	_check_box_untick
	_check_box_untick
nopanel	true

PhixFlow attempts to import all the values that it finds. PhixFlow reports individual warning or errors in the console for any unsupported Excel values. If the import exceeds the maximum number of errors/warnings, PhixFlow ends the process and the import will fail.

Maximum Excel File Size (MB)

Specify the maximum file size, in megabytes, to process. This prevents upload attempts on excessively large files from slowing down the server.

The default is 0, which means no restriction. Specify a number greater than 0, for example 500, to prevent the file collector from processing files larger than this size.

File Password

Available if File Type is Excel Spreadsheet.

If you are reading a spreadsheet which is password protected, enter the password here so that the file can be unlocked.

Confirm Password

Available when File Type is Excel Spreadsheet.

If you are reading a spreadsheet which is password protected, confirm the password here so that the file can be unlocked.

Log Traffic

Insert excerpt

	_log_traffic2
	_log_traffic2
nopanel	true

Log File Collector Traffic : when ticked, PhixFlow always logs the number of records read from files by file collectors, whatever is set here.

Insert excerpt

	_log_traffic1
	_log_traffic1
nopanel	true

File Columns
Anchor
file_columns
file_columns

Available when File Type is Comma Separated Values, Fixed Length Records and Excel Spreadsheet. Enter the attributes of the data columns that you want to extract from the input file. The grid has the standard toolbar and the extra buttons:

Insert excerpt
_attribute_populate
_attribute_populate
nopanel true
Once you have uploaded a file, automatically populate the File Columns grid with the data attributes. PhixFlow samples some rows in the file to determine the values to use.
Insert excerpt
_streams
_streams
nopanel true
Open a repository tab listing all the streams.
Insert excerpt
_file_collector
_file_collector
nopanel true
Open a repository tab listing all the file collectors.

To add attributes, click

Insert excerpt

	_add
	_add
nopanel	true

and to edit an attribute double-click on a row in the grid. PhixFlow opens an attribute form.

Field Description

Name

Enter the name of the column, which can contain any combination of letters, numbers and _ the underscore character.

If you use

Insert excerpt

	_attribute_populate
	_attribute_populate
nopanel	true

, PhixFlow uses data in the first row of a file to generate column names. Any invalid characters are stripped out.

PhixFlow always uses this name to refer to this attribute.

Order

Enter a number that matches the column number in the input file. For example, if you want to extract the third, first and fifth column of data from a file, the three rows in this grid will have the order:

3
1
5

Type

Enter one of the data types:

String
Integer
Float
Date - see Supported Date/Datetime Format Patterns, below
Datetime - see Supported Date/Datetime Format Patterns, below
TrueFalse - see Supported TrueFalse Values, below

If you use

Insert excerpt

	_attribute_populate
	_attribute_populate
nopanel	true

, PhixFlow uses the type it recognises from the data, or String.

Length

Enter the maximum length of the field in the input file

For String, the maximum length of the string
For Integer, the maximum number of digits.

This is:

required when File Type is Fixed Length Records
optional when File Type is Comma Separated Values or Excel Spreadsheet, as only the field sizes of the fields in the stream that this collector will write to are important.

If you use

Insert excerpt

	_attribute_populate
	_attribute_populate
nopanel	true

, PhixFlow uses the length of the longest record that PhixFlow has seen. As PhixFlow does not check all the records, longer fields may exist in the CSV file.

XML Namespaces

This section is available when File Type is XML. It has a toolbar with standard buttons. The grid contains a list of the namespaces defined in an XML response.

To add a namespace to the list, click

Insert excerpt

	_add
	_add
nopanel	true

. PhixFlow opens a new XML Namespace properties tab. To remove a namespace, use the

Insert excerpt

	_delete
	_delete
nopanel	true

in the toolbar.

File Format Description

Available when File Type is Binary File. Enter details about the binary input file format and the data you want to extract.

Field Description

Validate File Format

Tick to indicate you have completed the properties and the XML description matches the file format.

Stream Item Node

Enter an expression for a string or list of strings. Each string is a node name in the binary file that will generate an output record.

Expand

title	Examples

For a single output record, enter:

"DATA_RECORD_1"

For a list of output records, enter:

["DATA_RECORD_1","DATA_RECORD_2","DATA_RECORD_3"]

File Format Description Enter an expression, using XML Binary File Grammar, describing the format of data in the file.

Error Handling
Anchor
fileCollectorErrorHandlingSummary
fileCollectorErrorHandlingSummary

All the properties that support error handling are documented individually above; this is a summary of these features, and what to consider when designing the error handling needed for the files you are loading.

Ignore Missing Columns

If you tick Advanced → Ignore Missing Columns you will still import rows which are missing columns. The missing columns will give you blank values in the stream for the missing fields.

However, PhixFlow starts lining columns up from the left, based on the idea that if there are missing columns, they will be missing from the right hand side. This means if columns are missing from somewhere in the middle or at the beginning of the line, the line may fail to import if the data types don't line up. However, this is not an error trap, it is arbitrary - if you think lines might be missing values in the middle or at the beginning of lines you should be careful about using this option – in any case, you will need further validation.

Ignore Extra Columns

If you tick Advanced → Ignore Extra Columns - again, this assumes that extra columns will appear on the right. If they appear in the middle or at the beginning of the row, the row may or may not import depending on how the data types of the resulting record line up with a standard record. Again, if you think this may occur, you need further validation.

Errors Before Rollback

If you set a number in Advanced → Errors Before Rollback – providing that, across the run (not individual files), the error count is less than the threshold you have set - this will load all records that are not in error. You will get a warning message about any that fail in the console – with the file name, and details of the line that failed. But these are essentially discarded from the import. These files will then be placed in the archive directory.

If, across the run, the threshold is reached – all files that have been processed up to that point will be placed in the error directory, including files don’t contain errors (because these are associated with a failed run). The run will be in error - that is, a red line in the console, the resulting data set for the run (the stream set) will be invalid, and therefore not available to the next step in your process. Any files remaining after the error threshold was reached will remain, untouched, in the input directory.

Supported Formats and Values

Supported Date/Datetime Format Patterns
Anchor
validdateformats
validdateformats

The following formats are available for use in Date and Datetime type fields:

Valid Date Formats	Valid Datetime Formats
dd/MM/yy dd/MMM/yy dd/MM/yyyy dd/MMM/yyyy dd-MM-yy dd-MMM-yy dd-MM-yyyy dd-MMM-yyyy MM/dd/yy MMM/dd/yy MM/dd/yyyy MMM/dd/yyyy MM-dd-yy MMM-dd-yy MM-dd-yyyy MMM-dd-yyyy yyyyMMdd	dd/MM/yy HH:mm:ss dd/MMM/yy HH:mm:ss dd/MM/yyyy HH:mm:ss dd/MMM/yyyy HH:mm:ss dd-MM-yy HH:mm:ss dd-MMM-yy HH:mm:ss dd-MM-yyyy HH:mm:ss dd-MMM-yyyy HH:mm:ss MM/dd/yy HH:mm:ss MMM/dd/yy HH:mm:ss MM/dd/yyyy HH:mm:ss MMM/dd/yyyy HH:mm:ss MM-dd-yy HH:mm:ss MMM-dd-yy HH:mm:ss MM-dd-yyyy HH:mm:ss MMM-dd-yyyy HH:mm:ss yyyyMMdd.HHmmss

The symbols used in these formats are explained in the following table

Symbol	Meaning	Presentation	Examples
y	year	year	1996
M	month of year	month	Jul; 07
d	day of month	number	10
H	hour of day (0~23)	number	0
m	minute of hour	number	30
s	second of minute	number	55

The number of letters used in the pattern determines the format.

Number: The minimum number of digits. Shorter numbers are zero-padded to this amount.
Year: Numeric presentation for the year field are handled specially. For example, if the count of 'y' is 2, the year should be displayed as the zero-based year of the century, which is two digits.
Month: 3 or over, use text, otherwise use number.

Supported TrueFalse Values
Anchor
validtruefalsevalues
validtruefalsevalues

...

The following values are available in upper, lower and mixed cases for use in the TrueFalse type field:

Valid True Values	Valid False Values
true,yes,T,Y,1	false,no,F,N,0

Form Icons

The form provides the standard form icons.

File Collector Attributes

...

File Collector Attributes
Anchor
attributes
attributes

You can use the following PhixFlow internal variables to specify the attributes you want to load using a file collector. These attributes are available for all file types.

Attribute	Description
_fileName	The name of the file.
_lineNumber	The line number of the record within the file it was read from. The _lineNumber attribute is not available for file collectors of File

...

Typeis File Details Only.
_modifiedDate	The datetime of when the file was last modified. The last modified time of a single file residing within a .gz or a .tgz container can not be determined by phixflow, instead the datetime of when the corresponding gz/tgz container was created will be returned.
_path	The full path to the file which is the result of concatenating the _rootDirectory and the _subDirectory values.
_rootDirectory	The root base directory (if specified) concatenated with the value evaluated in the Collectors 'Input Directory Expr' field.
_size	The size of the file in bytes. The size of a single file residing within a .gz or a .tgz container can not be determined by phixflow, instead a size of -1 will be returned.
_subDirectory	The sub-directory relative to the _rootDirectory in which the corresponding file resides.
_worksheet	The name of the current worksheet of the

...

Excel file. The _worksheet is

...

only available if

...

the File Type is Excel Spreadsheet.
_range	The

...

Excel range expression that was used. The _range

...

attribute is only available if

...

the File Type is Excel Spreadsheet.

Handling Compressed Files
Anchor
containers
containers

...

In the majority of cases a compressed file will just contain a single file.

...

For example, simple zip file called DailyCalls.zip would contain a single file named DailyCalls.csv.

However, some compressed files contain directories, sub-directories, files and further compressed files. In such cases the compressed file can be thought of as a directory, and further, any compressed files within the compressed file can be thought of as directories in the directory structure inside the compressed file. Therefore,

...

compressed files are treated like normal directories and obey the same rules when matching the

...

Directory Pattern Expression

...

and the

...

Exclude Dir Pattern Expression

...

. Similarly all directories, sub-

...

directories, and compressed files within a compressed file will also be treated as normal directories when matching the

...

Directory Pattern Expression

...

and the

...

Exclude Dir Pattern Expression

...

. Files contained anywhere inside the directory structure in the compressed file (including files contained in a compressed file within the compressed file) are treated as normal files when matching the

...

File Pattern Expression

...

.

Supported Compressed Files

Compression Type	Files ending with extension	Description
zip	".zip"	A zip archive created by either windows programs such as winzip etc or unix commands such as zip.

...

For example, to create a compressed zip file called dailyCalls.zip that includes a single file called dailyCalls_20120918.csv:

...

zip

...

`dailyCalls.zip dailyCalls_20120918.csv`
tar	".tar"	A tar archive created by the unix tar command.

...

For example to create a tar file called dailyCalls.tar that includes a single file called dailyCalls_20120918.csv:

...

tar -cvf dailyCalls.tar

...

`dailyCalls_20120918.csv`
gz	".gz"	A gz archive created by the unix gzip command

...

. For example, to create a compressed gz file called dailyCalls_20120918.csv.gz

...

that includes a single file called dailyCalls_20120918.csv.

...

gzip dailyCalls_20120918.csv

The unix gzip command always assumes the named container has a single file of the same name contained within.

tgz

".tgz"

A joint tarred gz archive created by combining both the tar and gzip commands into a single command

...

. For example, to create a compressed tgz file called dailyCalls.tgz

...

that includes a single file called dailyCalls_20120918.csv

...

:

tar -cvzf dailyCalls.tgz dailyCalls_20120918.csv

Note
There is currently no support for rar type compressions.

File Compression Examples

The following table shows how each compressed file found will be treated given the following values for 'File Pattern Expression', 'Directory Pattern Expression' and the 'Exclude Dir Pattern Expression'.

Compressed File Name	Compressed File Sub System	File Pattern Expression	Directory Pattern Expression	Exclude Dir Pattern Expression	Matching/Processed Files
DailyCalls10.zip	/DailyCalls10.csv	".Calls10."	".*

...

		DailyCalls10.zip/DailyCalls10.csv
DailyCalls.tar	/subdir1/calls10.csv /subdir1/calls20.csv /subdir2/calls100.csv /subdir2/calls200.csv /subdir3/calls1000.csv /subdir3/calls2000.csv	".calls10."	".*	".subdir2.	DailyCalls.tar/subdir1/calls10.csv DailyCalls.tar/subdir3/calls1000.csv
Outer.zip	/subdir1/calls10.csv /subdir1/calls20.csv /subdir1/Inner.zip/innerdir/calls100.csv /subdir1/Inner.zip/subdir2/calls1000.csv Note that Outer.zip contains a compressed zip file called Inner.zip	".calls10."	".subdir1.	".subdir2.	Outer.zip/subdir1/calls10.csv Outer.zip/subdir1/Inner.zip/innerdir/calls100.csv
Outer.tar.gz	/Outer.tar/subdir1/calls10.csv /Outer.tar/innerdir/calls100.csv /Outer.tar/subdir1/Inner.zip/innerdir/calls1000.csv Note that Outer.tar.gz contains a tar container which in turn contains a compressed zip file called Inner.zip	".calls10."	".subdir1.innerdir.*

...

Outer.tar.gz/Outer.tar/subdir1/Inner.zip/innerdir/calls1000.csv

...

Anchor

The form provides the standard form icons as well as the following:

...

Image Removed

...

Shows the model predecessors of the file collector.

The form also provides the following icons on the File Columns tab:

...

Image Removed

...

If a valid location has been configured in the file collector to locate an existing CSV file, the user can click on a button at the top of the grid to automatically create the file column descriptions in this form.

The column names are taken from the first row of the file. To construct the name all invalid characters are stripped out of the value found in each cell and the result is assumed to be the name of the column.

The remaining rows are examined to try to determine the type and length for each column definition based on the values found in the file. If a type cannot be determined then the column is defined as a string. The length of the field is set to be the length of the longest value found.

...

Image Removed

...

Shows the list of File Collectors.

...

Image Removed

...

Shows the list of Streams.

...

Image Removed

...

Deletes the selected object from the list.

...

Image Removed

...

Adds a new File Attribute. See the File Collector Attribute Details form.

autoconf
autoconf
Easy Loading for Excel Spreadsheets

With PhixFlow, you can quickly and easily load all the data from a worksheet in an Excel file, without having to set all the different file collector and stream properties.

Drag a file collector into an analysis model.
In the file collector properties, enter a name for the collector. Leave Auto Configure ticked.
Hover your mouse pointer over the file collector and click
Insert excerpt
_upload_file
_upload_file
nopanel true
Managed File.
Find and select the Excel spreadsheet you want to load into PhixFlow. Click Open then Insert excerpt
_upload_button
_upload_button
nopanel true
.
For an .xlsx file only, PhixFlow prompts you to specify the worksheet that you want to upload to the file collector.
PhixFlow automatically:
- uploads the Excel data:
  - from the specified worksheet in an .xlsx file.
  - from the first worksheet in an .xls file.
- sets the file collector properties
- creates a stream
- sets the stream properties, including adding attributes for each data column in the Excel spreadsheet.

You cannot use the Auto Configure option to:

select specific Excel data columns to load
always load the latest version of an Excel worksheet when you run analysis on the model
load a data file that is not an Excel file.

Versions Compared

Old Version 24

New Version Current

Key

Form: File Collector Details

Source Type = Specified Directory

Source Type = Managed File

Forms: File Columns

Form: File Column Details

Overview

File Collector Properties

Basic Settings

Internal Variables

Internal Variables

Internal Variables

Advanced

File Columns
Anchor
file_columns
file_columns

XML Namespaces

File Format Description

Error Handling
Anchor
fileCollectorErrorHandlingSummary
fileCollectorErrorHandlingSummary

Ignore Missing Columns

Ignore Extra Columns

Errors Before Rollback

Supported Formats and Values

Supported Date/Datetime Format Patterns
Anchor
validdateformats
validdateformats

Supported TrueFalse Values
Anchor
validtruefalsevalues
validtruefalsevalues

Form Icons

File Collector Attributes

File Collector Attributes
Anchor
attributes
attributes

Handling Compressed Files
Anchor
containers
containers

Supported Compressed Files

File Compression Examples

Anchor

See Also

autoconf
autoconf
Easy Loading for Excel Spreadsheets

Page Comparison

Versions Compared

Old Version 24

New Version Current

Key

Form: File Collector Details

Source Type = Specified Directory

Source Type = Managed File

Forms: File Columns

Form: File Column Details

Overview

File Collector Properties

Basic Settings

Internal Variables

Internal Variables

Internal Variables

Advanced

File Columns Anchorfile_columnsfile_columns

XML Namespaces

File Format Description

Error Handling AnchorfileCollectorErrorHandlingSummaryfileCollectorErrorHandlingSummary

Ignore Missing Columns

Ignore Extra Columns

Errors Before Rollback

Supported Formats and Values

Supported Date/Datetime Format Patterns Anchorvaliddateformatsvaliddateformats

Supported TrueFalse Values Anchorvalidtruefalsevaluesvalidtruefalsevalues

Form Icons

File Collector Attributes

File Collector Attributes Anchorattributesattributes

Handling Compressed Files Anchorcontainerscontainers

Supported Compressed Files

File Compression Examples

Anchor

See Also

autoconfautoconfEasy Loading for Excel Spreadsheets

File Columns
Anchor
file_columns
file_columns

Error Handling
Anchor
fileCollectorErrorHandlingSummary
fileCollectorErrorHandlingSummary

Supported Date/Datetime Format Patterns
Anchor
validdateformats
validdateformats

Supported TrueFalse Values
Anchor
validtruefalsevalues
validtruefalsevalues

File Collector Attributes
Anchor
attributes
attributes

Handling Compressed Files
Anchor
containers
containers

autoconf
autoconf
Easy Loading for Excel Spreadsheets