3 Combining Data Sets

By the end of this chapter you will be able to:

You will also learn how to use an internal variable, _pipeName.

In this exercise you will produce a single list of updates to customers' Cable TV packages. Customers can update their packages both through a customer care system and in shops. You will bring all these updates together into a single list.

Read in data sets from files

For this exercise you need two files from the zip file you downloaded in  2 Using Excel Templates, step 5. The files contain updates from the two sources: the customer care system and the shop.

  • <path>\train\inputData\ChannelPackages\CustomerCareUpdates\CustomerCareUpdates.txt
  • <path>\train\inputData\ChannelPackages\ShopUpdates\ShopUpdates.txt
  1. Create a new model with the Name, Channel Package Check.
  2. Add a file collector to your model and load the data as follows:
    1. From the model toolbar, drag a  File Collector into the model.
    2. PhixFlow displays the settings tab for the new file collector. Set the Name to Customer Care Updates.
    3. In the model, hover your mouse pointer over the Customer Care Updates file collector to display the context toolbar and click  Upload File.
    4.  Navigate to the directory <path>\train\inputData\ChannelPackages\CustomerCareUpdates.
    5. Select the file CustomerCareUpdates.txt, click Open, then click  Upload File.
    6. PhixFlow adds a new table to the model, called CustomerCareUpdates.
    7. In the model, hover over the new CustomerCareUpdates table and click   Run Analysis
    8. Check the data is loaded. There should be 8 lines of data, with columns for Customer Ref, Sales Date and Package.
    9. Now the data is loaded into the table, set the table to  Static.
  3. Add another file collector to your model and repeat the process described above, this time:
    1. Set the file collector Name to Shop Updates.
    2. Upload <path>\train\inputData\ChannelPackages\ShopUpdates\ShopUpdates.txt.
    3. PhixFlow loads 7 rows of data to this table.
    4. Remember to set the table to Static.

Screenshot of the Channel Package Check model so far:

Combine the data sets

  1. Create a new table in your model and populate its attributes:
    1. Hover over the CustomerCareUpdates table and click  Create New Table.
    2. In the settings, set the Name to  Combined Updates.
    3. In the model, show the table attributes for CustomerCareUpdates .
    4. Select all the attributes and drag them into the properties for the Combined Updates table. Drop them into the Attributes section.
    5. Click  Apply and Close.
  2. To save your model, in the model toolbar click  Save
  3. Connect the ShopUpdates table to the Combined Updates table:
    1. Hover over ShopUpdates and click  Connector.
    2. Click Combined Updates to link the pipe to the table.

      When you connect tables, PhixFlow automatically adds a reference to the input pipe name. This appears in the attribute expression. To merge this data successfully,  you must update the attribute expressions to remove the in. prefix; see step 4 below.

  4. Fix all the attribute expressions.
    1. Double-click on Combined Updates to open its settings tab. For each attribute:
    2. In the Attributes section, double-click on an attribute to open its settings.
    3. In the Basic Settings section, edit the Expression to remove in. and click  Apply and Close
  5. In the Combined Updates table settings tab, click  Apply and Close to save and close.
  6. In the model, hover your mouse pointer over Combined Updates and click  Run Analysis.
  7. When the table has run, check that the table has:
    • the columns Customer Ref, Sales Date and Package
    • 15 rows
    • all cells have data.
      The default view:

To count the number of rows on a view, select  More Options in the top right of the view window, and click  Toggle Paging Bar.

Record the source for each update

You will now update your settings to record the source for each update, in the set of combined updates. To do this:

  1. To update the name of the pipe from the CustomerCareUpdates table to the Combined Updates table:
    1. In the model, click on the pipe to open its settings tab.
    2. Set Name to CC.
    3. Click  Apply and Close to save and close the pipe settings.
  2. Repeat the above steps to name the pipe from the ShopUpdates table to the Combined Updates table, shop.
  3. Double-click the table Combined Updates to open its settings tab.
  4. In the Attributes section, click  Create New add a new attribute and set:
    1. NameSource.
    2. Expression:

      if (_pipeName == "CC",
       "Customer Care"
       ,
       "Shop"
       )
    3. Click  Apply and Close to save and close the attribute settings.
  5. To make Source the first attribute in the table, drag it to the top of the attributes list.
  6. Click  Apply and Close to save and close the table settings.
  7. Run analysis on Combined Updates.
  8. In the model window, click  Save.
  9. Check the output data set and make sure that the source has been recorded on each record correctly.

Snapshot of the Chanel Package Check model so far: