This page is for data modellers. It provides an introduction to tables, streams and pipes.

Insert excerpt
_Banners
_Banners
nameanalysis
nopaneltrue

Overview

In an analysis model, data is held in a table. A table is a bit like an Excel spreadsheet, in that it contains a set of data with:

  • Columns - these are the attributes.
  • Rows - these are the data records.
Tables are connected to other modelling objects by pipes. A pipe sends data from the input object to the output object: usually from one table to another, but pipes also connect other types of modelling object, such as a datasource that loads data from a database, or a file exporter. By default, a pipe passes all attributes and records on to the next object. However, you can use the pipe properties to control which attributes and which records from the input object are passed through to the output.

When you run an analysis model, the data is processed. This means the data in a table can change with each analysis run, so, unlike Excel, each table can hold multiple datasets over time. These are called recordsets.

For example, here data is imported from a database using a datasource and data collector. The data is held in its original form in a table before being passed on to a second table, where it is enriched. You can add filters to any of the pipes to remove records, for example a business with no name, or a transaction older than a specified date.
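As a rough illustration, the pipe filters described above behave like the record filter sketched below. This is an analogy only: in PhixFlow you configure filters in the pipe properties rather than writing code, and the field names here are invented for the example.

```python
from datetime import date

# Hypothetical records as they might arrive from a datasource.
records = [
    {"name": "Acme Ltd", "tx_date": date(2024, 5, 1)},
    {"name": "", "tx_date": date(2024, 6, 12)},       # business with no name
    {"name": "Globex", "tx_date": date(2019, 1, 3)},  # older than the cut-off
]

CUTOFF = date(2020, 1, 1)

# A pipe filter passes through only the records that satisfy every condition.
filtered = [r for r in records if r["name"] and r["tx_date"] >= CUTOFF]

print(filtered)  # only the Acme Ltd record survives
```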



To move data between modelling objects, you run an analysis model. PhixFlow then uses the information in each object's properties to process the data. Because the data in a table can change with each analysis run, PhixFlow keeps the dataset from each run. If there is a problem in an analysis run, you can "undo" it by deleting datasets, rolling back to a selected recordset; see Rollback Recordsets. You can also copy or move data from a dataset; see Copying or Moving Table Data.

To look at the data in a table, you use a view. The default view shows the data in a grid. You can also create your own views, such as graphs and charts.


View properties have many options to control which attributes are included in the view, which records are shown, and how the records are sorted.

See also:

Types of Stream

There are several types of stream.

Anchor
calculate
calculate

Insert excerpt
_stream_calculate
_stream_calculate
nopaneltrue

Calculate streams are the most basic stream type in PhixFlow. An output record will be produced for each input record.
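The one-record-in, one-record-out behaviour of a calculate stream can be sketched in Python. This is an analogy only: in PhixFlow you configure attribute expressions in the stream properties rather than writing code, and the data and VAT rate here are invented for the example.

```python
# Hypothetical input records arriving down a pipe.
records = [
    {"name": "acme ltd", "net": 100.0},
    {"name": "globex", "net": 250.0},
]

VAT_RATE = 0.2  # assumed rate for illustration

# A calculate stream evaluates an expression for each attribute,
# producing exactly one output record per input record.
output = [
    {"name": r["name"].title(), "gross": round(r["net"] * (1 + VAT_RATE), 2)}
    for r in records
]

print(output)
# [{'name': 'Acme Ltd', 'gross': 120.0}, {'name': 'Globex', 'gross': 300.0}]
```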

Anchor
merge
merge

Insert excerpt
_stream_merge
_stream_merge
nopaneltrue

Merge streams combine sets of input data. On each input pipe a grouping is defined, and an output record is produced for each key value combination that this grouping produces across all inputs.

Anchor
aggregate
aggregate

Insert excerpt
_stream_aggregate
_stream_aggregate
nopaneltrue

Aggregate streams aggregate input data. On the input pipe a grouping is defined, and an output record is produced for each key value combination produced by the grouping.

Tip

Simple aggregations are better performed using aggregate pipes.

Aggregate streams are functionally identical to merge streams, but by convention an aggregate stream is used when there is only one input, and it displays with a distinct icon on the model view. Often this helps to clarify the purpose of the stream in the model.

Anchor
cartesian
cartesian

Cartesian Stream

Cartesian streams perform a cartesian join across all inputs. Although this can be useful in some cases, it is usually easier and simpler to multiply output records with either an output multiplier - which can be configured for any stream type - or a multiplier pipe.

Anchor
byset
byset

CalculateBySet Stream

Calculate by Set streams are like calculate streams, in that an output record is produced for each input record. In addition, a grouping can be configured on the input pipe, which allows related rows to be included in calculations for each record processed.

Which Stream to Use

Scenario: You only have one source and want 1 record per input record. See When to Use a Calculate Stream.
Stream and pipe: Calculate
Example: You have a comma separated file that you want to load into PhixFlow.

Scenario: You only have one source, but you want to group the data and only pull back aggregated information for each group. See When to Use an Aggregate Stream.
Stream and pipe: Aggregate
Example: You want to find the earliest entry in a task list.

Scenario: Combine data from 2 sources into 1 set of data. For each record in each data set, you get one record. See Merging Two Data Sets.
Stream and pipe: Calculate and Merge
Example: You have a set of customers stored in one system and a set of customers in another system, with no overlaps. You want all your customers in one list.

Scenario: Combining 2 sets of data that are a similar size and have a common key. For each pair of matching records from the data sets, a single record is produced in the output. See Merging Similar Data Sets.
Stream and pipe: Merge
Example: Comparing a stream of thousands of invoice totals with a stream of thousands of payments for each customer.

Scenario: Finding records with the same key in one large stream of data for another large stream of data. For each pair of matching records from the data sets, a single record is produced in the output. See Deduplicating Similar Data Sets.
Stream and pipe: Merge with directed pipe
Example: Finding account details for 1 million records in a reference list of all (~20m) accounts.

Scenario: Combining a large stream with data from a small stream, where the values of the small stream will be repeated throughout the result. For each pair of matching records from the data sets, a single record is produced in the output. See Enriching Data with Data From Another Set.
Stream and pipe: Calculate with lookup pipe, with order/index set
Example: Find the description for each code in a stream of thousands from a stream containing mapping data. There are only ~100 possible codes.

Scenario: Combining a large stream with data from a small stream, where values in the small stream will only be used once in the result. For each pair of matching records from the data sets, a single record is produced in the output. See Combining Data Using a Lookup Pipe.
Stream and pipe: Calculate with lookup pipe, with filter
Example: You have a stream containing all attendees of an upcoming football match and a small stream of people who are banned from attending matches.

Scenario: Combining a large stream with data from a small stream, where the same rows of the small stream are repeated throughout the result, but the filter values change slightly. For each pair of matching records from the data sets, a single record is produced in the output. See Combining Data Using a Cache Extraction Filter Lookup Pipe.
Stream and pipe: Calculate with lookup pipe, with cache extraction filter
Example: You have a price list for 4 different products with different prices between different dates.

Scenario: You want to look back at a previous record within a group in a stream, or create a cumulative total per group. You get the same number of records as you put in. See Grouping and Referencing Data Using Calculate By Set Stream.
Stream and pipe: Calculate by set
Example: For a given account, you want to find the difference between each consecutive debit/credit to the account.

Running Analysis

Running analysis is where PhixFlow evaluates your configuration and uses this information to process your data. Analysis is started from a table, and all modelling objects which precede the table then process the data in order. Note that:

  • Not all modelling objects need to appear in the same analysis model. This means running analysis can process more modelling objects than appear in your analysis model.
  • Some modelling objects can be set to static. These will not run unless analysis is run directly on them, and PhixFlow will not run any items which precede a static item. This is useful where there is information which you do not need to reprocess every time you run analysis, or which you want to update on a different schedule.

Solution

  1. To Run Analysis, hover over the
    Insert excerpt
    _tables
    _tables
    nopaneltrue
     and click
    Insert excerpt
    _run_analysis
    _run_analysis
    nopaneltrue
    .
  2. To Run Analysis automatically on a timed schedule, see Task Plans.

Static Objects

There are several reasons you may wish to set a modelling object to be static, including:

  • You have information which does not need to be updated using an analysis model, such as country ISO codes or VAT amounts. These can be updated by a user through a screen.
  • You want data to be updated on a separate schedule from the other data.

Solution

  1. Tables
    1. Switch on static: hover over the item on an analysis model and click
      Insert excerpt
      _static
      _static
      nopaneltrue
      .
    2. Switch off static:
      1. Locate the table in the repository and double-click it.
      2. In the properties window that opens, locate the Analysis Options section and expand it.
      3. Untick Static.
  2. Pipes
    1. Switch on static: hover over the pipe and click
      Insert excerpt
      _static
      _static
      nopaneltrue
      .
    2. Switch off static: hover over the pipe and click the play button.

Tables and Time Periods

A table can contain any number of datasets, each of which can in turn contain many records. Depending on how the data will be used, you need to set the time period which PhixFlow uses to collect the datasets. The period can be:

Insert excerpt
Table
Table
namePeriod
nopaneltrue

For non-transactional periods, PhixFlow checks for incomplete recordsets and reports an error if it finds any. However, pipes from transactional tables allow incomplete recordsets, as the data is constantly changing.
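The completeness check described above can be sketched as follows. This is illustrative only: the field names and the shape of a recordset are assumptions, not PhixFlow internals.

```python
# Hypothetical recordsets held for a table, one per analysis run.
recordsets = [
    {"id": 1, "complete": True},
    {"id": 2, "complete": False},  # e.g. a run that was interrupted
]

def incomplete_recordsets(recordsets, transactional):
    """Return the ids of recordsets that should be reported as errors."""
    if transactional:
        # Transactional tables tolerate incomplete recordsets,
        # because their data is constantly changing.
        return []
    return [rs["id"] for rs in recordsets if not rs["complete"]]

print(incomplete_recordsets(recordsets, transactional=False))  # reports recordset 2
print(incomplete_recordsets(recordsets, transactional=True))   # no error
```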

Pipes and Data to Read

Pipes have many uses (see Pipe); one key use is selecting which datasets to read. The two most common options are:

  1. All: This means that the pipe will pull the data from all datasets.
  2. Latest: This means the pipe will only pull information from the very latest dataset.

This option is available in the pipe properties → Data to Read.
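The difference between the two options can be sketched in Python. This is an analogy only: in PhixFlow the choice is a property of the pipe, and the data here is invented for the example.

```python
# A table accumulates one dataset per analysis run (hypothetical data).
datasets = [
    [{"id": 1}],              # run 1
    [{"id": 2}, {"id": 3}],   # run 2 (the latest)
]

def read(datasets, mode):
    """Sketch of the pipe's Data to Read behaviour."""
    if mode == "All":
        # Pull the data from all datasets.
        return [record for ds in datasets for record in ds]
    if mode == "Latest":
        # Pull information only from the very latest dataset.
        return list(datasets[-1])
    raise ValueError(f"unknown mode: {mode}")

print(read(datasets, "All"))     # 3 records
print(read(datasets, "Latest"))  # 2 records
```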

Publishing Tables
Anchor
publish
publish

When you make changes to a table's properties or its attributes, PhixFlow publishes the changes to the data tables in the PhixFlow database. This happens automatically in the background. Publishing many tables, or tables with many attributes, can take some time and may slow performance.

If the table properties are set incorrectly, PhixFlow will not be able to publish the table to the database. If this happens, the
Insert excerpt
_console
_console
nopaneltrue
 will report the publishing error. PhixFlow will also display an error message if you try to interact with the table, for example to view its data or to run analysis. You must correct the table properties so that PhixFlow can retry publishing the table.

During the publishing process, PhixFlow may create temporary tables in its database. These are kept for a period, then automatically removed when a system task runs. For information about temporary tables, see:

Insert excerpt
_publishing_space
_publishing_space
nopaneltrue

Learn More

Some options may still refer to streams and stream sets.

Insert excerpt
_terms_changing
_terms_changing
nopaneltrue