...
- In the settings tab, set the Input Directory Expression.
- In the model, right-click on the file collector icon to open the context toolbar.
- Click Upload File. PhixFlow opens a file browser. Find and select the file.
- Click Upload File.
...
Sections on the Settings Tab
Basic Settings
Field | Description |
---|---|
Name | Enter a name for this file collector. |
Enabled | Tick to indicate that you have completed the settings and the file collector is ready to use. |
Source Type | Select the root directory on the PhixFlow server in which the file collector looks for input files. The directory locations are specified in the System Configuration tab → System Directories section. Specified Directory: for files that are already stored in a known location on the PhixFlow server. When you run the file collector, PhixFlow extracts data from files in this directory and saves the data to the server location specified in System Configuration tab → System Directories → Import File Location. If you do not want to use the Import File Location, select Specified Directory and tick Ignore Base Directory. Managed File: for files that you have on your local machine or the network. Upload the file before attempting to run the collector. PhixFlow uploads the file to the server, in the location specified in System Configuration tab → System Directories → File Upload Directory. When you run the file collector, PhixFlow extracts the data from the files in this location. |
Number of Header Lines | Enter the number of lines in the header of the file. PhixFlow ignores these lines when reading the file. This option is not available for the Binary File, XML File and HTML File file types. |
File Type | Can have values: Comma Separated Values: fields are delimited by a comma (or another character). Fixed Length Records: fields have a fixed column width. Binary File: data is extracted from the file using a Binary File Grammar (in XML) specified in the File Format Description tab. File Details Only: only attribute details about the file itself are available. Excel Spreadsheet: data is extracted from an Excel spreadsheet with a .xls or .xlsx extension. XML File: data is extracted from an XML file. HTML File: data is extracted from an HTML file. |
Next Sequence | Available when File Location Strategy is All Files in Folder. Enter the next sequence number expected in the name of the file being imported. |
Allow Non-Scheduled Collection | If this is turned on, the collector runs as part of any ad-hoc Analysis Engine run that requires this data. If it is turned off, the collector runs only as part of a scheduled task under the Analysis Engine. |
FTP Site | The FTP site on which the import file is stored. If no site is specified, the file is assumed to be on the local machine. If a site is specified, all directory paths specified on this form must be full paths, because the base directory specified in System Configuration is ignored (the base directory is specific to the local machine). |
File Location Strategy | Can have values: All Files in Folder: read all files matching the pattern specified in File Pattern Expression. Read File Paths: read file path names from a collector or stream. This input database collector or stream must be attached to the file collector by a lookup pipe with no index set. The attribute of the input stream or collector that contains the file path names is specified in the File Name Attribute field. Enter the attribute name as plain text, for example myFilePaths, not quoted as "myFilePaths". Each file path name is interpreted as a path name relative to the Import Directory. A path name may be a simple file name, or it may have multiple directory levels, including compressed files (which are interpreted as directories). The directory separator must be '/' (forward slash), not '\' (backslash), even on a Windows platform. There must be no leading '/'. For example: 'abc.csv', 'dir1/dir2.zip/abc.csv'. Read Names: this option is deprecated. Read file locations from a collector or stream. This input collector or stream must be attached to the file collector by a pipe. The attribute of the input stream or collector that contains the file locations is specified in the File Name Attribute field. |
Tag | This field is only available if the Source Type field is set to Managed File. Specify a directory using string literals only. Do not use PhixFlow variables. When PhixFlow uploads files, it places them in a directory whose full path combines the root File Upload Directory (specified in System Configuration on the System Directories tab), the tag value specified here and the Input Directory specified below (hard coded to 'in' for Managed files). If you are creating a file collector to load email messages and/or attached files, you can specify a tag here if one has been provided in the subject line of the incoming emails. See Reading Data From an Email Account for further details. |
Ignore Base Directory | This field is only available if Source Type is Specified Directory. Normally the base directory, specified in the System Directories tab of the System Configuration screen, is prepended to all directories specified on this form. If this flag is ticked, the base directory is not prepended, and the directories specified on this form are used on their own as the full path specifications for the import file. |
Input Directory Expression | When Source Type is Specified Directory: specify a directory using string literals only. Do not use PhixFlow variables. If the Source Type is Specified Directory, files are read from the directory specified in Input Directory Expression. Unless the Ignore Base Directory flag is ticked, the path specified in this field is added to the default input directory root, which is specified in the System Configuration File Upload Location. If the Ignore Base Directory flag is ticked, you must specify the full path for the input directory. This field is an expression, and it must evaluate to a plain text string; in the simple case, this is text surrounded by quotes. Because this is an expression, you must always use / rather than \, even on Windows platforms. You can include PhixFlow variables in this expression. If you need to include wildcards or some other variable element in the resulting path, you must use the Directory Pattern Expression. If File Location Strategy is All Files in Folder, PhixFlow looks in this directory to find files matching the pattern specified in File Pattern Expression. If File Location Strategy is Read Names, this is added to the start of the file location read from the File Name Attribute. When Source Type is Managed File: this field contains the non-editable value "in". |
Directory Pattern Expression | This field is used to identify valid sub-directories of the input directory. If a Directory Pattern Expression is provided, PhixFlow checks not only the Input Directory for files but also all sub-directories of the Input Directory. Each file found then has its name checked against the File Pattern Expression, and also has the relative path from the Input Directory to the file (referred to as the sub-directory path) checked against the Directory Pattern Expression. For example, suppose the Input Directory has the sub-directories 'region1/teamA', 'region1/teamB' and 'region2/teamA'. If you want all the files across all regions for teamA, but not teamB, you could use the following Directory Pattern Expression to pick out just the files for teamA: ".*/teamA/". Alternatively, if you wanted all the files for all teams in region1 only, you could use the following Directory Pattern Expression: "region1/.*". PhixFlow performs this match using regular expression rules, not the wildcard rules you might be used to when listing files; for example, to match everything, use ".*", not "*". A number of internal variables are available in these expressions. Note that a number of predefined compressed file expressions are always checked to determine whether a file within a valid sub-directory is actually a compressed file. If so, the file is assumed to be a valid compressed file and is recursed into as if it were a standard matching directory. See Compressed Files for a list of valid compressed file expressions. |
Exclude Dir. Pattern Expr. | This field can be used to exclude certain sub-directories found by the Directory Pattern Expression. For example, suppose the Input Directory has the sub-directories 'region1/teamA', 'region1/teamB' and 'region2/teamA'. If you want all the files across all regions for teamA, but not teamB, you could use the Directory Pattern Expression ".*" to find all files, combined with the Exclude Dir. Pattern Expr. ".*/teamB/" to exclude those for teamB. PhixFlow performs this match using regular expression rules, not the wildcard rules you might be used to when listing files. A number of internal variables are available in these expressions. |
File Pattern Expression | Available when File Location Strategy is All Files in Folder. If you want to use one file collector to process several files, enter an expression to generate the list of files to be read. The file names in this list must match the file names in the input directory. This expression must resolve to a regular expression, because PhixFlow uses regular expression rules to perform this match, not the shell wildcard rules used in many file systems. For example, to match all files, you must use ".*" and not "*". A number of internal variables are available in these expressions. |
Archive Directory Expression | Optionally, enter an expression for a directory path. The expression must resolve to a Regular Expression. The Archive Directory Expression is the location to which PhixFlow writes all files processed by the file collector. |
Error Directory Expression | Optionally, enter an expression for the location to which PhixFlow writes any files that cause an error during processing. |
Local Archive Directory | Available when FTP Site is specified. Specify whether the archive directory is on the PhixFlow server (local) or on the original server. |
Local Error Directory | Available when FTP Site is specified. Specify whether the error directory is on the PhixFlow server (local) or on the original server. |
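The way Source Type, Tag, Input Directory Expression, Ignore Base Directory and FTP Site combine into one input path can be sketched in Python. This is a hypothetical illustration based on the field descriptions above, not PhixFlow code: the function name `resolve_input_directory`, its parameters and the example directories are assumptions, and expression evaluation (quotes, PhixFlow variables) is not modelled.

```python
from pathlib import PurePosixPath


def resolve_input_directory(
    source_type: str,
    input_directory: str = "",
    base_directory: str = "",
    file_upload_directory: str = "",
    tag: str = "",
    ignore_base_directory: bool = False,
    ftp_site: bool = False,
) -> str:
    """Hypothetical sketch: assemble the directory a file collector reads from."""
    if "\\" in input_directory:
        # Directory expressions must use '/' even on Windows platforms.
        raise ValueError("use '/' rather than '\\' in directory expressions")

    if source_type == "Managed File":
        # File Upload Directory + Tag (optional) + the fixed Input Directory "in".
        parts = [p.strip("/") for p in (tag, "in") if p]
        return str(PurePosixPath(file_upload_directory).joinpath(*parts))

    if source_type == "Specified Directory":
        if ignore_base_directory or ftp_site:
            # The path on the form is used on its own as the full path.
            return str(PurePosixPath(input_directory))
        # Otherwise the base directory is prepended to the input directory.
        return str(PurePosixPath(base_directory).joinpath(input_directory.strip("/")))

    raise ValueError(f"unknown source type: {source_type!r}")


print(resolve_input_directory("Specified Directory", "daily/calls",
                              base_directory="/srv/phixflow/import"))
print(resolve_input_directory("Managed File",
                              file_upload_directory="/srv/phixflow/upload", tag="sales"))
```

`PurePosixPath` is used so that the sketch always joins with '/', matching the rule that directory expressions use forward slashes on every platform.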
Advanced
Field | Description |
---|---|
Maximum Files | Enter the maximum number of files that PhixFlow will process when the file collector runs. |
Minimum Files | Enter the minimum number of files that PhixFlow expects to find when the file collector runs. If fewer files are found, this is treated as an error. |
Max Records Per File | Enter the maximum number of records that PhixFlow will read from each file. |
Errors Before Rollback | See error handling summary below. |
Parallel Readers | Enter the number of files to process in parallel. If blank, this defaults to 1. If the file collector is configured to read files in sequence, this field is ignored and a single file reader is used. |
Unreadable Directories | Select what PhixFlow does if it finds unreadable directories when searching a directory hierarchy for files to import. |
XPath Expression | Available when File Type is XML File or HTML File. Enter valid XPath syntax. For information about how to use XPath expressions and how to use the returned data in the corresponding stream attribute expressions, see XPath Examples. |
Character Set | |
Column Separator | |
Separator Character | |
Quote Style | |
Quote Character | |
Ignore Extra Columns | See error handling summary below. |
Ignore Missing Columns | See error handling summary below. |
Import Rows Matching | Enter an expression that PhixFlow compares with each line in the file. Only lines that match are imported. |
Replace Text Matching | In each imported line, PhixFlow finds all occurrences of the expression that you enter in Replace Text Matching and replaces them with the expression that you enter in With. |
With | Enter the replacement for text that matches Replace Text Matching. |
Excel Data Range Expression | Leave this field blank or enter an expression for the spreadsheet data range that PhixFlow will look in. The data that PhixFlow extracts from the range is defined in the File Columns section, below. The expression can specify: ... You cannot specify: ... If the worksheet name contains: ... |
Ignore Undefined Values | Available when File Type is Excel Spreadsheet. When importing the file: ... |
File Password | Available when File Type is Excel Spreadsheet. If you are reading a password-protected spreadsheet, enter the password here so that the file can be unlocked. |
Confirm Password | Available when File Type is Excel Spreadsheet. If you are reading a password-protected spreadsheet, enter the password again to confirm it. |
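A minimal Python sketch of how Import Rows Matching and Replace Text Matching/With could be applied to each line. It assumes that both fields hold regular expressions and that a line is imported only when the whole line matches; neither detail is stated in the table above, so check them against your PhixFlow version. Replacement is applied only to imported lines, as the Replace Text Matching description says. The function name `preprocess_lines` and the sample lines are hypothetical.

```python
import re
from typing import Iterable, List, Optional


def preprocess_lines(
    lines: Iterable[str],
    import_rows_matching: Optional[str] = None,
    replace_text_matching: Optional[str] = None,
    replace_with: str = "",
) -> List[str]:
    """Hypothetical sketch of Import Rows Matching and Replace Text Matching/With."""
    imported = []
    for line in lines:
        # Assumption: the whole line must match the Import Rows Matching regex.
        if import_rows_matching and re.fullmatch(import_rows_matching, line) is None:
            continue
        # In each imported line, replace every occurrence of the pattern.
        if replace_text_matching:
            line = re.sub(replace_text_matching, replace_with, line)
        imported.append(line)
    return imported


raw = ["DATA,1;2", "COMMENT exported by system X", "DATA,3;4"]
print(preprocess_lines(raw, import_rows_matching="DATA,.*",
                       replace_text_matching=";", replace_with=","))
```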
File Columns
Available when File Type is Comma Separated Values, Fixed Length Records or Excel Spreadsheet. Enter the attributes of the data columns that you want to extract from the input file. The grid has the standard toolbar and these extra buttons:
- Populate attributes: once you have uploaded a file, automatically populate the File Columns grid with the data attributes. PhixFlow samples some rows in the file to determine the values to use.
- Show streams: open a repository browser listing all the streams.
- Show file collectors: open a repository browser listing all the file collectors.
To add attributes, click Add. To edit an attribute, double-click its row in the grid. PhixFlow opens an attribute form.
Field | Description |
---|---|
Name | Enter the name of the column, which can contain any combination of letters, numbers and the underscore character _. If you use ..., PhixFlow always uses this name to refer to this attribute. |
Order | Enter a number that matches the column number in the input file. For example, if you want to extract the third, first and fifth columns of data from a file, the three rows in this grid have the orders 3, 1 and 5. |
Type | Enter one of the data types: ... If you use ... |
Length | Enter the maximum length of the field in the input file. This is: ... If you use ... |
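The Order field maps each grid row to a 1-based column in the input file. The following Python sketch uses the standard `csv` module to extract the third, first and fifth columns, skipping header lines as Number of Header Lines does. The `FileColumn` class, the `extract_columns` function and the sample data are hypothetical; short rows raise `IndexError` here, and the sketch does not model Ignore Missing Columns or Ignore Extra Columns.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FileColumn:
    name: str   # attribute name in the File Columns grid
    order: int  # 1-based column number in the input file


def extract_columns(text: str, columns: List[FileColumn],
                    header_lines: int = 0) -> List[Dict[str, str]]:
    """Hypothetical sketch: pick columns from CSV text by their Order value."""
    rows = list(csv.reader(io.StringIO(text)))[header_lines:]
    # Order is 1-based, Python lists are 0-based.
    return [{c.name: row[c.order - 1] for c in columns} for row in rows]


sample = "id,name,region,phone,amount\n1,Ann,North,555-0100,10\n2,Bo,South,555-0101,25\n"
grid = [FileColumn("region", 3), FileColumn("id", 1), FileColumn("amount", 5)]
print(extract_columns(sample, grid, header_lines=1))
```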
XML Namespaces
Available when File Type is XML File. Enter details about the XML namespaces used in the XML input file.
...
Field | Description |
---|---|
Name | For the XML namespace, enter a name that matches the name you use in XPath expressions to extract data from the XML file. For example, if a default namespace is ... |
Value | Enter the XML namespace, for example http://schemas.xmlsoap.org/soap/envelope/ . |
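To illustrate why the Name must match the prefix used in XPath expressions, here is a sketch using Python's standard `xml.etree.ElementTree`, which resolves prefixes in a path through a separate name-to-URI mapping, much like the Name/Value pairs in this grid. This is an analogy, not PhixFlow's XPath engine: ElementTree supports only a subset of XPath, and the sample documents and the `urn:example:items` URI are hypothetical.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

envelope = ET.fromstring(
    '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
    '<soap:Body><order id="42"/></soap:Body>'
    "</soap:Envelope>"
)

# Name -> Value pairs, as they might be entered in the XML Namespaces grid.
namespaces = {"soap": SOAP_NS}
order = envelope.find("soap:Body/order", namespaces)
print(order.get("id"))

# The prefix in the path only has to match the Name in the mapping,
# not the prefix written in the document.
print(envelope.find("env:Body", {"env": SOAP_NS}) is not None)

# Elements in a default namespace also need a mapped prefix in the path.
items = ET.fromstring('<items xmlns="urn:example:items"><item>a</item><item>b</item></items>')
print([e.text for e in items.findall("d:item", {"d": "urn:example:items"})])
print(items.findall("item"))  # no match: the elements are namespaced
```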
File Format Description
Available when File Type is Binary File. Enter details about the binary input file format and the data you want to extract.
...
Compressed File Name | Compressed File Sub System | File Pattern Expression | Directory Pattern Expression | Exclude Dir Pattern Expression | Matching/Processed Files |
---|---|---|---|---|---|
DailyCalls10.zip | /DailyCalls10.csv | ".*Calls10.*" | ".*" | | DailyCalls10.zip/DailyCalls10.csv |
DailyCalls.tar | /subdir1/calls10.csv, /subdir1/calls20.csv, /subdir2/calls100.csv, /subdir2/calls200.csv, /subdir3/calls1000.csv, /subdir3/calls2000.csv | ".*calls10.*" | ".*" | ".*subdir2.*" | DailyCalls.tar/subdir1/calls10.csv, DailyCalls.tar/subdir3/calls1000.csv |
Outer.zip | /subdir1/calls10.csv, /subdir1/calls20.csv, /subdir1/Inner.zip/innerdir/calls100.csv, /subdir1/Inner.zip/subdir2/calls1000.csv (Outer.zip contains a compressed zip file called Inner.zip.) | ".*calls10.*" | ".*subdir1.*" | ".*subdir2.*" | Outer.zip/subdir1/calls10.csv, Outer.zip/subdir1/Inner.zip/innerdir/calls100.csv |
Outer.tar.gz | /Outer.tar/subdir1/calls10.csv, /Outer.tar/innerdir/calls100.csv, /Outer.tar/subdir1/Inner.zip/innerdir/calls1000.csv (Outer.tar.gz contains a tar container, which in turn contains a compressed zip file called Inner.zip.) | ".*calls10.*" | ".*subdir1.*innerdir.*" | | Outer.tar.gz/Outer.tar/subdir1/Inner.zip/innerdir/calls1000.csv |
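The selection rules behind the table above can be sketched in Python: a file is processed when its name matches the File Pattern Expression, its sub-directory path matches the Directory Pattern Expression, and that path does not match the Exclude Dir. Pattern Expr. This is a hypothetical illustration, not PhixFlow's implementation. It assumes that each pattern must match the whole string (as Java's `String.matches` does, hence patterns such as ".*calls10.*" rather than "calls10"), that a sub-directory path ends with '/' as in the ".*/teamA/" example, and that compressed files already appear as directories in the paths. It does not model how PhixFlow walks the directory tree. The function name `select_files` is hypothetical.

```python
import re
from typing import Iterable, List, Optional


def select_files(paths: Iterable[str], file_pattern: str, dir_pattern: str,
                 exclude_pattern: Optional[str] = None) -> List[str]:
    """Hypothetical sketch of the three-pattern file selection.

    Each path is relative to the Input Directory, with compressed files
    treated as directories. Assumption: every pattern must match the whole
    string, and a sub-directory path ends with '/'.
    """
    selected = []
    for path in paths:
        sub_dir, _, name = path.rpartition("/")
        sub_dir = sub_dir + "/" if sub_dir else ""
        if re.fullmatch(file_pattern, name) is None:
            continue  # file name does not match File Pattern Expression
        if re.fullmatch(dir_pattern, sub_dir) is None:
            continue  # sub-directory path does not match Directory Pattern Expression
        if exclude_pattern and re.fullmatch(exclude_pattern, sub_dir):
            continue  # sub-directory path is excluded
        selected.append(path)
    return selected


# The DailyCalls.tar row of the table.
tar_members = [
    "DailyCalls.tar/subdir1/calls10.csv", "DailyCalls.tar/subdir1/calls20.csv",
    "DailyCalls.tar/subdir2/calls100.csv", "DailyCalls.tar/subdir2/calls200.csv",
    "DailyCalls.tar/subdir3/calls1000.csv", "DailyCalls.tar/subdir3/calls2000.csv",
]
print(select_files(tar_members, ".*calls10.*", ".*", ".*subdir2.*"))

# "*" is a shell wildcard, not a regular expression, so it does not compile.
try:
    re.compile("*")
except re.error as exc:
    print("invalid pattern:", exc)
```

The same function reproduces the region/team examples given for Directory Pattern Expression and Exclude Dir. Pattern Expr. when passed paths such as "region1/teamA/a.csv".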
Also See
...