By the end of this chapter you will be able to:
- Use Calculate by Set Streams
- Use Calculate by Set Streams to give a record number to records in a Stream
- Use the Allow Partial Set Processing flag
- Use the cache() function
Create list of bills with group totals
Drag the Stream AM Bills onto a model view. You will add a Stream to show each bill individually, but also with the maximum, minimum and total amounts across all bills on that account:
- Add a Calculate by Set Stream to your model, with Name Bill with Account totals
- Add a pipe from AM Bills to Bill with Account totals
- Drag all attributes from AM Bills to Bill with Account totals
- Add the grouping attribute ACCOUNT_NUM to the pipe
- To complete configuration of the Stream Bill with Account totals:
- Update the expression for each of the attributes you dragged in from AM Bills to use the _current internal variable. This will get the value for the individual bill being processed.
- Add an attribute MaxBillAmount. Give this an expression to find the maximum bill amount across all bills on the account. Remember that you can access the full set of bills being processed for the current key value – the current account number – by referring to the pipe name.
- Similarly, add an attribute MinBillAmount to give the minimum bill amount across all bills on the account.
- Add an attribute TotalBillAmount to give the total bill amount across all bills on the account.
Run Analysis on Bill with Account totals and check your results.
Load records and give a record number to each
In this exercise you will load records and give a record number to each – simply the number of the record in the load. It is often useful to give each record a unique identifier, if there is nothing in the data itself that easily does this.
- Drag the Stream AM Trans List onto a model view
- Add a Calculate By Set Stream to your model, with Name Trans List with Identifiers
- Add a pipe from AM Trans List to Trans List with Identifiers
- Drag all attributes from AM Trans List to Trans List with Identifiers
- Update the expressions of the dragged attributes to work with a Calculate By Set (using the _current internal variable)
- Add an attribute Identifier with type Integer, and give this the expression: _itemNumber. Look this internal variable up in the help to see what it does.
Run Analysis on Trans List with Identifiers. You will see that the item number you saved in Identifier can act as a unique label for each record.
Make a further update to this configuration:
- Open the configuration form for Trans List with Identifiers
- Go to the Advanced tab
- Tick the flag Allow Partial Set Processing
- Save your changes
Run Analysis on Trans List with Identifiers. You will get the same result as from the previous run. The flag Allow Partial Set Processing does not change the results of a model. It is useful is just such cases as this, when you use a Calculate by Set to generate a unique identifier for every record in a data set, or when you want to use the cache() function (see the next exercise) to calculate a set of running totals – and some of your candidate sets may be large.
In this exercise, you are not using a grouping at all on the input pipe, and so you will use a single candidate set to handle all records in the input data set. Setting this flag means that PhixFlow will not attempt to load the full input data set before starting to process it.
This flag is not suitable when you wish to calculate group totals – sum, max, min, etc. – for each key value, since you may not have all records available when calculating these group totals.
Create a list of bills with running totals (using cache())
Drag the Stream AM Bills onto a model. You will now build a Stream that will show every bill, but with a running total so far, for each account, of: the number of bills the account has had; the total amount they have been billed so far:
- Create a new Calculate by Set Stream with Name Bill Running totals
- Add a pipe from AM Bills to Bill Running totals
- Drag all attributes from AM Bills to Bill Running totals
- Update the expressions of the dragged attributes to work with a Calculate By Set
- Add an attribute to Bill Running totals with Name NumberBillsSoFar
- Set the expression of this attribute so that it will keep a running total of the bills this account has had so far. Look up the cache() function in the help for guidance. Use the value in the ACCOUNT_NUM field as the key in the cache() function.
- Add ACCOUNT_NUM as a grouping attribute to the input pipe into Bill Running totals
- Add BILL_DATE asc as an ordering attribute to the pipe – remember to make sure that the Group flag is not ticked. This will ensure that bills are processed, and running totals calculated, in the correct order for each account.
Run Analysis on Bill Running totals. Check that the number of bills each account has received so far is calculated correctly.
You will now add a new attribute to calculate a running total for the total amount billed so far for each account:
- Add an attribute to Bill Running totals with Name BillsRunningTotal
- Use the cache() function to calculate a running total for the amount billed so far on each account. Remember to use, as a key in the cache function, a different key from that used in the expression to calculate the number of bills so far (if you use the same key for both, the values will clash). You can use the ACCOUNT_NUM field again, but this time concatenated with a fixed string, e.g. _out.ACCOUNT_NUM+"-RunTotal"
Run Analysis on Bill Running totals again. Check that both sets of running totals are calculated correctly for each account.