This page is for data modellers. It provides an introduction to streams and pipes.

Overview

In an analysis model a data set is represented by a stream. A stream is a bit like an Excel spreadsheet, in that it contains a set of data with:

  • columns - these are the stream attributes
  • rows - these are the data records.

Streams are connected to other modelling objects by pipes. A pipe is a connector that links two objects in a PhixFlow model and sends data from the input object to the output object. Objects are usually streams, but there are also objects that load and export data, such as a datasource or file exporter. A pipe with the default settings passes all attributes and all records from the current run on to the next object. However, you can use the pipe properties to control which attributes and which records from the input object are passed through.
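The behaviour described above can be illustrated with a short sketch. This is not PhixFlow code (pipes are configured in the GUI, not programmed); the `pipe` function and its parameters are hypothetical, and simply model a pipe as "pick some attributes, keep some records":

```python
# Illustrative sketch only: models pipe semantics as attribute selection
# plus record filtering. All names here are hypothetical.
def pipe(records, attributes=None, condition=None):
    """Pass records from an input object to an output object.

    With no arguments, all attributes and all records pass through,
    mirroring a pipe with default settings.
    """
    for record in records:
        if condition and not condition(record):
            continue  # record filtered out by the pipe
        if attributes:
            yield {name: record[name] for name in attributes}
        else:
            yield dict(record)

customers = [
    {"id": 1, "name": "Ada", "country": "UK"},
    {"id": 2, "name": "Grace", "country": "US"},
]

# Default pipe: everything passes through.
assert list(pipe(customers)) == customers

# Configured pipe: only UK customers, only the "name" attribute.
uk_names = list(pipe(customers, attributes=["name"],
                     condition=lambda r: r["country"] == "UK"))
assert uk_names == [{"name": "Ada"}]
```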

When you run an analysis model, the data is processed. This means the data set in a stream can change with each analysis run, so unlike Excel, a stream can hold multiple data sets over time. These are called stream sets. If there is a problem in an analysis run, you can "undo" it by rolling back the run. You can also move stream sets.

To look at the data in a stream you use a stream view. The default view shows data in a grid. You can also create different views such as graphs and charts. Stream view properties have lots of options to control which attributes are included in the view, and how to sort the records.

See Also

  • Stream properties
  • Rolling back stream sets


Types of Stream

There are several types of stream.

Calculate Stream
Calculate streams are the most basic stream type in PhixFlow. An output record will be produced for each input record.
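The one-output-per-input rule can be sketched as a row-wise transform. This is not PhixFlow code; the `calculate` function and its `expressions` parameter are hypothetical, standing in for the attribute expressions you would configure on the stream:

```python
# Illustrative sketch of calculate-stream semantics: one output record is
# produced for each input record. Names here are hypothetical.
def calculate(records, expressions):
    """Apply attribute expressions to each input record in turn."""
    return [{attr: fn(rec) for attr, fn in expressions.items()}
            for rec in records]

lines = [{"qty": 2, "price": 5.0}, {"qty": 3, "price": 1.5}]
out = calculate(lines, {
    "qty":   lambda r: r["qty"],
    "total": lambda r: r["qty"] * r["price"],  # derived attribute
})
assert len(out) == len(lines)   # one output per input record
assert out[0]["total"] == 10.0
```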

Merge Stream

Merge streams combine sets of input data. In each input pipe a grouping is defined, and an output record is produced for each key value combination that is produced by this grouping applied across all inputs.
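A sketch of these semantics, grouping several inputs by a shared key. This is not PhixFlow code; the `merge` function is hypothetical and only illustrates "one output record per key value seen across all inputs":

```python
# Illustrative sketch of merge-stream semantics. Each input pipe has a
# grouping key; one output record is produced per key value combination.
from collections import defaultdict

def merge(inputs, key):
    """inputs: dict of pipe name -> record list; key: grouping function."""
    grouped = defaultdict(lambda: defaultdict(list))
    for pipe_name, records in inputs.items():
        for record in records:
            grouped[key(record)][pipe_name].append(record)
    # One output record per key value, carrying each pipe's matching records.
    return dict(grouped)

invoices = [{"cust": "A", "amount": 100}, {"cust": "B", "amount": 40}]
payments = [{"cust": "A", "amount": 100}]

out = merge({"invoices": invoices, "payments": payments},
            key=lambda r: r["cust"])
assert set(out) == {"A", "B"}      # one output per key value
assert out["B"]["payments"] == []  # no matching payment for customer B
```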

Aggregate Stream

Aggregate streams aggregate input data. In the input pipe a grouping is defined, and an output record is produced for each key value combination that is produced by the grouping.
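The grouping-then-aggregating step can be sketched as follows. This is not PhixFlow code; `aggregate` and its parameters are hypothetical, showing one aggregated output record per key value:

```python
# Illustrative sketch of aggregate-stream semantics: group the single input
# by a key and emit one aggregated record per key value.
from collections import defaultdict

def aggregate(records, key, aggregator):
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)
    return {k: aggregator(v) for k, v in groups.items()}

tasks = [
    {"owner": "ops", "due": "2024-01-05"},
    {"owner": "ops", "due": "2024-01-02"},
    {"owner": "dev", "due": "2024-02-01"},
]
# Earliest task per owner, as in the task-list scenario later on this page.
earliest = aggregate(tasks, key=lambda t: t["owner"],
                     aggregator=lambda g: min(t["due"] for t in g))
assert earliest == {"ops": "2024-01-02", "dev": "2024-02-01"}
```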

Tip

Simple aggregations are better performed using aggregate pipes.

Aggregate streams are functionally identical to merge streams, but by convention an aggregate stream is used when there is only one input. It displays with its own icon in the model view, which often helps to clarify the purpose of the stream in the model.

Cartesian Stream

Cartesian streams perform a cartesian join across all inputs. Although this can be useful in some cases, mostly it is easier and simpler to multiply output records with either an output multiplier - which can be configured for any stream type - or to use a multiplier pipe.
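A cartesian join can be sketched with the standard library's `itertools.product`. This is not PhixFlow code; the `cartesian` function is hypothetical and only shows that the output is every combination of records across the inputs:

```python
# Illustrative sketch of cartesian-stream semantics: the output is the
# cartesian product of the records on every input.
from itertools import product

def cartesian(*inputs):
    for combo in product(*inputs):
        merged = {}
        for record in combo:
            merged.update(record)  # combine one record from each input
        yield merged

sizes = [{"size": "S"}, {"size": "M"}]
colours = [{"colour": "red"}, {"colour": "blue"}]

out = list(cartesian(sizes, colours))
assert len(out) == 4  # 2 x 2 input records
assert {"size": "S", "colour": "red"} in out
```

Note how quickly the record count multiplies; this is why an output multiplier or multiplier pipe is usually the simpler choice.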

Calculate by Set Stream

Calculate by Set streams are like calculate streams in that an output record is produced for each input record. In addition, a grouping can be configured on the input pipe, which allows related rows in the same group to be included in the calculations for each record processed.
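This can be sketched as a per-group running calculation. This is not PhixFlow code; `calculate_by_set` and its field names are hypothetical, illustrating a lookback to the previous record and a cumulative total within each group:

```python
# Illustrative sketch of calculate-by-set semantics: one output per input
# record, but each record can see other rows in its group, e.g. the
# previous record or a running total.
from collections import defaultdict

def calculate_by_set(records, key):
    running = defaultdict(float)
    previous = {}
    out = []
    for record in records:  # records assumed ordered within each group
        k = key(record)
        running[k] += record["amount"]
        out.append({**record,
                    "running_total": running[k],
                    "change": record["amount"]
                              - previous.get(k, record["amount"])})
        previous[k] = record["amount"]
    return out

txns = [{"acct": "X", "amount": 10.0},
        {"acct": "X", "amount": 25.0},
        {"acct": "Y", "amount": 5.0}]
out = calculate_by_set(txns, key=lambda t: t["acct"])
assert len(out) == len(txns)      # same record count in and out
assert out[1]["running_total"] == 35.0
assert out[1]["change"] == 15.0   # difference from the previous amount
```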

Which Stream to Use

| Scenario | Stream and pipe | Example |
| --- | --- | --- |
| You only have one source and want one record per input record. See When to Use a Calculate Stream. | Calculate | You have a comma-separated file that you want to load into PhixFlow. |
| You only have one source, but you want to group the data and only pull back aggregated information for each group. See When to Use an Aggregate Stream. | Aggregate | You want to find the earliest entry in a task list. |
| Combine data from 2 sources into 1 set of data. For each record in each data set, you get one record. See Merging Two Data Sets. | Calculate and Merge | You have a set of customers stored in one system and a set of customers in another system, with no overlaps. You want all your customers in one list. |
| Combining 2 sets of data that are a similar size and have a common key. For each pair of matching records from the data sets, a single record is produced in the output. See Merging Similar Data Sets. | Merge | Comparing a stream of thousands of invoice totals with a stream of thousands of payments for each customer. |
| Finding records with the same key within a large stream of data. For each pair of matching records, a single record is produced in the output. See Deduplicating Similar Data Sets. | Merge with directed pipe | Finding account details for 1 million records in a reference list of all (~20m) accounts. |
| Combining a large stream with data from a small stream, where the values of the small stream will be repeated throughout the result. For each pair of matching records from the data sets, a single record is produced in the output. See Enriching Data with Data From Another Set. | Calculate with lookup pipe, with order/index set | Find the description for each code in a stream of thousands, from a stream containing mapping data. There are only ~100 possible codes. |
| Combining a large stream with data from a small stream, where values in the small stream will only be used once in the result. For each pair of matching records from the data sets, a single record is produced in the output. See Combining Data Using a Lookup Pipe. | Calculate with lookup pipe, with filter | You have a stream containing all attendees of an upcoming football match and a small stream of people who are banned from attending matches. |
| Combining a large stream with data from a small stream, where the same rows from the small stream are repeated throughout the result but the filter values change slightly. For each pair of matching records from the data sets, a single record is produced in the output. See Combining Data Using a Cache Extraction Filter Lookup Pipe. | Calculate with lookup pipe, with cache extraction filter | You have a price list for 4 different products, with different prices between different dates. |
| You want to look back at a previous record within a group in a stream, or create a cumulative total per group. You get the same number of records as you put in. See Grouping and Referencing Data Using Calculate By Set Stream. | Calculate by set | For a given account, you want to find the difference between each consecutive debit/credit to the account. |


Publishing Streams

When you make changes to a stream's properties or its attributes, PhixFlow publishes the changes to the stream data tables in the PhixFlow database. This happens automatically in the background. Publishing many streams or streams with many attributes can take some time, and may slow performance.

If the stream properties are set incorrectly, PhixFlow will not be able to publish the stream data to the database. If this happens, the console will report the publishing error. PhixFlow will also display an error message if you try to interact with the stream, for example to view its data or to run analysis. You must correct the stream properties so that PhixFlow can retry publishing the stream.

During the publishing process, PhixFlow may create temporary tables in its database. These are kept for a period, then automatically removed when a system task runs.