A pipe is a connector that links two elements in a PhixFlow model and sends data from the input to the output. Pipes allow you to control which attributes and which records from the input are delivered to the output, although in most cases, with minimal configuration, you will get all columns and all records from the current data set.
The pipe must be enabled to make it active.
For advanced configuration, see Advanced Pipe Configuration.
A pipe joining a datasource to a data collector has no editable details. All the output data set configuration occurs in the SQL query on the collector.
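For illustration only, the collector's query might look something like the sketch below; the Account table and its columns are hypothetical, and the exact syntax depends on your database.

```sql
-- Hypothetical collector query: the selected columns and the WHERE clause,
-- not the pipe, determine what reaches the output.
SELECT AccountNumber,
       CustomerName,
       Balance
FROM   Account
WHERE  Balance > 0
```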
Name
The name is used to refer to the pipe in other model elements.
The name must start with a letter, may contain no special characters other than the underscore character '_', and cannot be the name of an Attribute Function.
Type
There are 3 options available:
...
Data To Read
Specify what input data to use. There are 4 options available:
...
If the Only collect from same run flag is ticked, the pipe will only collect input data generated by the same analysis run that is generating the output data. This is only used when building a transactional model.
In some circumstances the input Stream may have Stream Sets with dates in the future relative to the Stream Set being generated for the output Stream. This may happen, for example, if you have rolled back a number of Stream Sets on the output Stream but have not rolled back the corresponding Stream Sets on the input Stream, and have then requested that the output Stream is brought up to date. Some of the Stream Sets on the input Stream will have dates in the future relative to some of the Stream Sets you are rebuilding.

By default, pipes ignore any Stream Sets with dates in the future relative to the Stream Set you are generating, so that if you are rebuilding an old Stream Set the pipe retrieves the same data on the rerun as it retrieved when the Stream Set was first built.

Similarly, if you are running a Transactional Stream, it is possible that, while your analysis run is taking place, other analysis runs which started after yours complete before yours. These will have generated additional Stream Sets on the input Stream with a future date relative to the date of the Stream Set you are generating. For Transactional input Streams you can tell the pipe not to ignore these future Stream Sets by ticking the Read Future Data tick box on the Advanced tab.
Static
Normally, when a pipe requests data from a non-static input Stream, that Stream will first attempt to bring itself up to date, generating new Stream Sets as necessary, before supplying the requested data. However, if this field is ticked, the input Stream will not attempt to do this.
Multiplier
This causes the pipe to present each candidate set to the output Stream in a different way from usual. The multiplier flag is on the Advanced tab of the form.
When processing data, a Stream usually takes one candidate set from each of its input pipes. If the multiplier flag is ticked on one of these pipes, the Stream instead takes one record at a time from the multiplier pipe, while the records on the other pipes contribute to the pool of candidate sets available to the Stream in the usual way. The process then repeats with the next record in the multiplier pipe's record set.
Filters, sorting and grouping, aggregating
Filters, sorting and grouping, and aggregating are configured through their own tabs on the form:
...
Pipe Form Reference
The following fields are configured on the Details tab:
...
This field is used to determine which Stream Sets to read from the input Stream.
...
Normally, when a pipe requests data from a non-static input Stream, that Stream will first attempt to bring itself up to date, generating new Stream Sets as necessary, before supplying the requested data. However, if this field is ticked, the input Stream will not attempt to do this.
...
If this flag is not ticked, it indicates to PhixFlow that the Stream is not ready to be used during any analysis runs and should therefore be ignored.
The following fields are available on the Details tab if you set Data To Read = Custom:
...
- Only collect from same run is not ticked
- Max Stream Sets is blank or zero
- Historied is not ticked
The following fields are configured on the Advanced tab:
...
Mandatory
...
If ticked, when multiple Streams are being merged there must be an input record from this pipe for an output record to be generated by the output Stream.
If this is a push pipe with positive offsets and this flag is ticked, the notification to create another Stream Set will only be pushed along the pipe if the last Stream Set created contains at least one record.
...
The Execution Strategy determines how this pipe should be implemented. See the section on Directed Merge Strategy.
...
The maximum number of concurrent worker tasks.
If blank, this defaults to 1.
...
The number of key values to read for a single worker task (each worker task runs a single select statement).
If blank, this defaults to 1000, which is also the maximum value that can be used when reading from an Oracle database.
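As a rough sketch of why this limit exists: each worker task reads its batch of key values in one select statement, and a plausible shape for that statement is shown below. The table and column names are hypothetical and the SQL PhixFlow actually generates may differ; the relevant point is that Oracle allows at most 1000 values in an IN list, which is why 1000 is both the default and the maximum for Oracle.

```sql
-- Hypothetical illustration: one worker task reads one batch of key values
-- with a single select. Oracle permits at most 1000 literals in an IN list.
SELECT *
FROM   Account
WHERE  AccountNumber IN (1001, 1002, 1003)  -- up to 1000 key values per batch
```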
...
The cache is used when carrying out lookups from Streams or database collectors. When doing a lookup, there are two common scenarios:
1. The pipe does a single lookup onto a Stream or database table to get a large number of records in one go (e.g. 10,000 records).
2. The pipe does many lookups, getting a small number of records for each lookup (e.g. 10 records at a time).
In the second case, the results returned are typically based on a key value, e.g. an account number. This key is used in the filter of the pipe if you are reading from a Stream, or in the query if you are reading from a database collector. For example, the query in a database collector will include the condition:
```sql
WHERE AccountNumber = _out.AccountNum
```
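As a fuller illustration, a complete lookup query on a database collector might take the following shape. This is only a sketch: the AccountTransaction table and its columns are hypothetical, and `_out.AccountNum` stands for the account number on the output record driving the lookup, as in the condition above.

```sql
-- Hypothetical lookup query on a database collector. PhixFlow substitutes
-- _out.AccountNum with the account number of the output record being
-- processed before running the query.
SELECT AccountNumber,
       TransactionDate,
       Amount
FROM   AccountTransaction
WHERE  AccountNumber = _out.AccountNum
```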
For efficiency, the records are cached (stored temporarily in memory) so that if the same set of records need to be looked up again they are readily available without going back to the database.
This field allows you to set a limit on the size of the cache. Setting a limit is important because, if you do not, the cache can become very large and consume a lot of memory, which can lead to a slowdown in both your tasks and those of other users of PhixFlow.
To set the cache size, try to estimate the largest number of records that the lookup pipe will return on a single read.
If you do not set a limit, it will default to the system-wide default, specified by Maximum Pipe Cache Size on the System Tuning tab of the System Configuration.
If a single read brings back over 90% of the specified cache size, a warning message will be logged to the console. If a single read brings back 100% or more of the cache size, a second warning message will be generated. If the Enforce Cache Size limit flag is ticked in System Configuration, an error will be generated instead of a warning, and the analysis run will stop completely.
Every time the lookup pipe is referenced, PhixFlow calculates the values of all of the variable elements of the query or pipe filter, and checks whether it already has a set of data in the cache retrieved using this set of variable values. If so, the data is immediately returned from the cache. Otherwise, a new set of data is read from the Stream or collector. If adding the new records to the cache would cause it to exceed the maximum cache size, previously cached results are removed to make enough room for the new results.
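To make the cache behaviour concrete, the hypothetical queries below show how repeated lookups through the same pipe are served. The table and key values are examples only; the point is that cached results are keyed on the calculated variable values.

```sql
-- Hypothetical sequence of lookups through the same pipe:
SELECT * FROM AccountTransaction WHERE AccountNumber = 1001;  -- first use of key 1001: read from the database, results cached
SELECT * FROM AccountTransaction WHERE AccountNumber = 1001;  -- same key: results returned from the cache, no database read
SELECT * FROM AccountTransaction WHERE AccountNumber = 2002;  -- new key: read from the database, cached separately
```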
...
During Look Ups
...
During File Export
...
During Drill Down
...
Where to look next
...
...