Analysis Models for Batch Processing Data
Analysis Models Overview
Analysis models are the graphical representation of the data processing that will be carried out by PhixFlow. Designed to handle large data sets, analysis models define how data is imported, transformed, viewed, analysed and exported. Each task is performed by a modelling object. For example, a Datasource is an object that allows PhixFlow to connect to an external database, and a Table allows you to write functions to manipulate data, such as calculating a value.
Running Analysis
Analysis models can be run in one of the following ways:
- Run Analysis: An ad-hoc request from a model, initiated by clicking Run Analysis in a table's popup toolbar.
- Scheduling a task: Initiated by a Task Plan.
- Triggered by an Actionflow: Actionflows can call an analysis model.
As PhixFlow runs the analysis model, the steps are recorded in the System Console. To open it at any time, click Administration, then System Console.
Analysis Model Window Layout
Importing Data
PhixFlow supports a range of methods for importing data, including files, emails, databases and APIs. See Importing Data.
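As an illustration only (plain Python, not PhixFlow configuration), the sketch below shows the general idea behind a file import: rows in a delimited file become records that a model can then process. The file contents and field names are made up.

```python
import csv
import io

# Stand-in for a delimited file arriving from an import directory (contents are illustrative).
raw = io.StringIO("account,amount\nA1,100.0\nA2,250.5\n")

# Each row becomes a record (one dictionary per row), ready for downstream processing.
records = list(csv.DictReader(raw))
print(records)
```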
Transforming Data
Transforming and enriching data is at the heart of analysis modelling, from looking up reference information to performing fuzzy logic deduplication of customers. There are a host of options and strategies, but begin by looking at Transforming Data.
Transformations are performed using a combination of modelling objects, such as tables, and Functions within those objects, such as replaceAll.
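To give a feel for the kind of work such a function does, here is a minimal sketch in plain Python (not PhixFlow expression syntax). It mimics a replaceAll-style transformation by stripping non-digit characters from a phone number; the records and field names are hypothetical.

```python
import re

# Illustrative records, standing in for rows arriving at a table (field names are made up).
records = [
    {"phone": "+44 (0)20 7946 0000"},
    {"phone": "020-7946-0001"},
]

def normalise_phone(value: str) -> str:
    """Strip everything except digits, mimicking a replaceAll-style expression."""
    return re.sub(r"\D", "", value)

for record in records:
    record["phone_clean"] = normalise_phone(record["phone"])

print(records)
```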
Enriching data is also extremely useful, for example structuring unstructured address data into individual address lines, or deriving additional information such as an asset category based on the wording of a description. See Enriching Data.
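The sketch below, again in plain Python rather than PhixFlow syntax, shows the shape of a simple derivation rule: an asset category is inferred from keywords in a description. The keyword-to-category mapping and asset data are entirely hypothetical.

```python
# Hypothetical keyword-to-category mapping; in practice the rules live in the model.
CATEGORY_RULES = {
    "router": "Network",
    "switch": "Network",
    "laptop": "End User Device",
    "server": "Data Centre",
}

def derive_category(description: str) -> str:
    """Return the first category whose keyword appears in the description."""
    text = description.lower()
    for keyword, category in CATEGORY_RULES.items():
        if keyword in text:
            return category
    return "Uncategorised"

assets = [
    {"description": "Cisco 48-port switch, rack 7"},
    {"description": "Dell Latitude laptop, finance team"},
]
for asset in assets:
    asset["category"] = derive_category(asset["description"])

print(assets)
```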
Reconciliation can be performed in analysis models. This ranges from simple master data checks, which report records processed against records output with supporting information, through to transactional reconciliation, where calculations are performed on the data to confirm that the processed results match the expected results. Reconciliation is particularly useful in data migrations for validating the accuracy of the data moved. See Reconciliation.
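As a rough sketch of both styles of check, using made-up source and migrated data, the Python below compares record counts (a master data check) and summed amounts (a transactional check).

```python
source = [
    {"account": "A1", "amount": 100.0},
    {"account": "A2", "amount": 250.5},
    {"account": "A3", "amount": 75.25},
]
migrated = [
    {"account": "A1", "amount": 100.0},
    {"account": "A2", "amount": 250.5},
]

# Master data check: records processed vs records output.
count_check = {"in": len(source), "out": len(migrated), "missing": len(source) - len(migrated)}

# Transactional check: does the calculated total match the expected result?
expected_total = sum(r["amount"] for r in source)
actual_total = sum(r["amount"] for r in migrated)
totals_match = abs(expected_total - actual_total) < 0.005

print(count_check, expected_total, actual_total, totals_match)
```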
Lookups enable data to be read from one table for the purpose of enriching another, for example checking the status of an order. For more information on looking up data from other tables, see Enriching Data → Lookup Information.
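Conceptually, a lookup behaves like the plain-Python sketch below: each order line fetches the matching product record by its code and copies across the fields it needs. The tables and field names are invented for illustration.

```python
# Reference table keyed by product code.
products = {
    "P100": {"name": "Widget", "status": "Active"},
    "P200": {"name": "Gadget", "status": "Discontinued"},
}

orders = [
    {"order_id": 1, "product_code": "P100"},
    {"order_id": 2, "product_code": "P200"},
    {"order_id": 3, "product_code": "P999"},  # no match in the lookup table
]

# Enrich each order with fields looked up from the product table.
for order in orders:
    match = products.get(order["product_code"], {})
    order["product_name"] = match.get("name")
    order["product_status"] = match.get("status")

print(orders)
```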
Performance
Performance is key when handling large data sets, so PhixFlow provides a number of features to help in this area, including caching data, memory lookups and indexing. See Performance and Tuning.
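The benefit of indexing and in-memory lookups can be illustrated outside PhixFlow with a small Python sketch: a linear scan re-reads the whole data set on every lookup, whereas a key-based index answers each lookup directly. The data here is generated purely for the example.

```python
# A large, made-up data set.
customers = [{"id": i, "name": f"Customer {i}"} for i in range(100_000)]

# Unindexed: every lookup walks the whole list.
def find_slow(customer_id):
    return next((c for c in customers if c["id"] == customer_id), None)

# Indexed: build the index once, then each lookup is a single hash hit.
index = {c["id"]: c for c in customers}

def find_fast(customer_id):
    return index.get(customer_id)

print(find_slow(99_999) == find_fast(99_999))
```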
Modelling Objects
Modelling Objects, such as Tables and Datasources, appear on the canvas and are connected by pipes.
Pipes perform different roles: some allow data to flow through (solid lines), while others perform lookups (dashed lines) to retrieve additional information, such as looking up a product code and returning its name.
See Analysis Properties.
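A hypothetical sketch of that structure in plain Python: a model reduced to a list of pipes, where "push" pipes carry data forward (solid lines on the canvas) and "lookup" pipes fetch extra detail (dashed lines). The object names are invented.

```python
# A tiny model described as pipes between modelling objects.
pipes = [
    {"from": "Orders Datasource", "to": "Orders Table", "type": "push"},
    {"from": "Products Table", "to": "Orders Table", "type": "lookup"},
]

for pipe in pipes:
    # Solid arrow for data flow, dashed arrow for lookups.
    arrow = "-->" if pipe["type"] == "push" else "- - >"
    print(f'{pipe["from"]} {arrow} {pipe["to"]}')
```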
Candidate Sets
Candidate sets are a fundamental concept of function calculation in PhixFlow.
Every time a function calculation is carried out, all the required input data is brought together and organised into sets of data - one set for each Key group.
The Key groups are worked out using the Pipe Grouping Attributes defined on the input Pipe for each table.
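A minimal Python sketch of the idea, assuming a single hypothetical grouping attribute called region: incoming records are split into one candidate set per key group, and the calculation then runs once per set, seeing only that set's records.

```python
from collections import defaultdict

# Records arriving down a pipe; "region" stands in for a grouping attribute.
incoming = [
    {"region": "North", "sales": 120},
    {"region": "South", "sales": 80},
    {"region": "North", "sales": 60},
    {"region": "South", "sales": 40},
]

# Build one candidate set per key group.
candidate_sets = defaultdict(list)
for record in incoming:
    key = (record["region"],)          # the pipe's grouping attributes
    candidate_sets[key].append(record)

# The calculation runs once per candidate set.
for key, records in candidate_sets.items():
    print(key, sum(r["sales"] for r in records))
```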
Recordsets
Each time you run analysis on a model, PhixFlow creates a new set of data in each table; see Table. A recordset is a collection of data within a table for a given period. Recordsets, and the data they contain, remain in the table until you archive the data (see Task) or manually remove it (see Rollback Recordsets).
Within analysis models, all records are processed by one modelling object before the data moves on to the next. This is in contrast to Actionflows, which process each record in its entirety before moving on to the next record.
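The difference in processing order can be sketched like this in plain Python, with hypothetical stages: the analysis-model style runs each stage over the whole set before the next stage starts, while the Actionflow style pushes each record through every stage in turn.

```python
records = ["r1", "r2", "r3"]

def stage_a(record):
    return f"{record}-cleaned"

def stage_b(record):
    return f"{record}-enriched"

# Analysis-model style: each stage processes the whole set before the next begins.
cleaned = [stage_a(r) for r in records]
enriched = [stage_b(r) for r in cleaned]

# Actionflow style: each record travels through every stage before the next record starts.
per_record = [stage_b(stage_a(r)) for r in records]

print(enriched == per_record)  # same results, different processing order
```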
Periods
The time period of a table determines how data in the table will be handled. The period is typically set to:
- Transactional: allows multiple users to run independent analysis tasks at the same time.
- Variable: generates or collects data from the most recent run of the table up to the current date (sketched below).
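A rough sketch of a variable period, with invented dates: only records created between the previous run and the current date are collected.

```python
from datetime import datetime

last_run = datetime(2024, 5, 1)
now = datetime(2024, 5, 8)

events = [
    {"id": 1, "created": datetime(2024, 4, 28)},
    {"id": 2, "created": datetime(2024, 5, 3)},
    {"id": 3, "created": datetime(2024, 5, 7)},
]

# Variable period: collect only the data created since the most recent run.
this_period = [e for e in events if last_run <= e["created"] < now]
print([e["id"] for e in this_period])
```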
Data Range
The data range determines which recordsets are displayed, as sketched below:
- Latest displays the records from the most recent recordset only.
- All displays the records from every recordset.
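Sketched in plain Python with made-up data: if each record carries the identifier of the recordset it was created in, Latest keeps only the most recent recordset while All keeps everything.

```python
# Each run appends a new recordset; records carry the identifier of the recordset they belong to.
table = [
    {"recordset": 1, "value": "a"},
    {"recordset": 1, "value": "b"},
    {"recordset": 2, "value": "c"},
]

latest_id = max(r["recordset"] for r in table)
latest = [r for r in table if r["recordset"] == latest_id]   # data range: Latest
everything = table                                           # data range: All

print(len(latest), len(everything))
```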
Rolling back
To remove data, the recordset must be rolled back; see Rollback Recordsets.
Exports
PhixFlow supports a range of methods for exporting data, including files, direct writes to databases and APIs. Details of each can be found in Exporting Data from an Analysis Model.
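As a generic illustration of a file export (not PhixFlow's exporter configuration), the sketch below writes a recordset out as a delimited file; database and API exports follow the same push-the-records-out shape. The data and file name are invented.

```python
import csv

results = [
    {"account": "A1", "amount": 100.0},
    {"account": "A2", "amount": 250.5},
]

# Write the records out as a delimited file.
with open("export.csv", "w", newline="") as handle:
    writer = csv.DictWriter(handle, fieldnames=["account", "amount"])
    writer.writeheader()
    writer.writerows(results)
```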
Scheduling
When working with data, applications and IT systems, there are routine processes that you need to run on a schedule. PhixFlow makes it easy for you to set up and manage these processes using Task Plans, to which you add tasks. See Task Plans.
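Conceptually, a task plan is a named schedule wrapping an ordered list of tasks. The plain-Python sketch below is purely illustrative; the plan name, start time and task names are invented.

```python
from datetime import time

# An illustrative task plan: a named schedule that runs an ordered list of tasks.
task_plan = {
    "name": "Nightly billing run",
    "start_time": time(hour=2, minute=0),
    "tasks": ["Import usage files", "Run analysis", "Archive old recordsets", "Export reports"],
}

def run_task_plan(plan):
    """Run each task in the plan, in order."""
    for task in plan["tasks"]:
        print(f"{plan['name']}: running task '{task}'")

run_task_plan(task_plan)
```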
Example Analysis Model
What Next
The PhixFlow Fundamentals course provides a practical guide to using PhixFlow, including analysing and transforming data using Analysis Models.
Already started PhixFlow Fundamentals?
Return to Analysis Fundamentals
Further Reading
- Importing Data
- Transforming Data
- Exporting Data from an Analysis Model
- Scheduling and Automating Analysis
- API Integration
- Managing Models and Data
- Common Modelling Scenarios