8 Merging data sets directly from database tables

By the end of this chapter you will be able to:

  • Merge directly from database collectors
  • Populate attributes with key values using the '_key[n]' notation

In this exercise you will repeat the comparison you did in the previous exercise, of the full channel list in your model against the list in the inventory database – but this time using a merge:

  1. Add a new merge table to your Channel Package Check model, by dragging in  Merge Table from the analysis toolbar.
  2. Name this table: Channel List Reconciliation 2
  3. Add a pipe from Latest Package All Channels, with the Name: sales
  4. Drag the following attributes from Latest Package All Channels to Channel List Reconciliation 2:
    1. CustomerRef
    2. Channel
  5. Add a pipe from the database collector SOURCE_ALL_CHANNELS_LIST to Channel List Reconciliation 2
    1. Name this pipe: inv

  6. Set up the grouping attributes on the pipe:

    1. Double click on the pipe sales
      1. Go to the Sort/Group section.
      2. Press  Show Attributes and drag in the attributes CustomerRef and Channel.
      3. Press  Apply and Close to save your changes.
    2. Double-click on the pipe inv
      1. Go to the Sort/Group section.
      2. Press .
      3. In the new grouping attribute form that pops up:
        1. Select the attribute CustomerRef from the drop down list.
        2. Tick the Group flag
        3. Press  Apply and Close.
      4. Add the grouping attribute Channel in the same way.
      5. Back in the pipe Properties, press  Apply and Close.

Now you will complete the configuration of the merge table, using the _key internal variable to write key values into the table:

  1. Double click on Channel List Reconciliation 2 to open the Properties.
  2. Double click the attribute CustomerRef.
    1. In the attribute details form that pops up, update Expression to: _key[1]
      1. This writes the first key value into this attribute.
    2. Press  Apply and Close
    3. Double click the attribute Channel
    4. In the attribute details form that pops up, update Expression to: _key[2]
      1. This writes the second key value into this attribute.
    5. Press  Apply and Close.
    6. Add an attribute SalesCheck, and set the Expression to
if (countElements(sales) == 1,
 "YES"
 ,
 "NO"
 )

3. Add an attribute InvCheck, and set the Expression to:

if (countElements(inv) == 1,
 "YES"
 ,
 "NO"
 )

4. Press  Apply and Close in the table Properties to save your changes.

5. Update the Expression in the SOURCE_ALL_CHANNELS_LIST database collector to:

select * from SOURCE_ALL_CHANNELS_LIST
 order by CustomerRef, Channel

6. Toggle  Enabled on the database collector.

The ordering (order by) is needed to support the merge.

When merging directly from database tables, you have to repeat the ordering of data across the query and the pipe; that is, the ordering of data in the database collector query must match the grouping on the pipe from the database collector.

6. Run Analysis on Channel List Reconciliation 2.

7. In your results you should see an additional 3 records that were not included in the reconciliation you did in the previous exercise – channels that are recorded in the inventory, but not in the channels list, are included in your output.

This is because a merge will generate an output record for every key value it finds across all input pipes, whereas if you do a data enrichment using a lookup pipe – you will only get an output record for every record from the input pipe.

Filter unmatched rows

To help you find unmatched rows in your output data, you will now add a filter:

  1. Open the recordset you have just generated in Channel List Reconciliation 2 by hovering over this table and pressing  Show Recordsets.
  2. Create a filter:
    1. Where ANY of the following are true
      1. SalesCheck equals NO
      2. InvCheck equals NO
  3. Press Apply to apply the filter without saving.
  4. You will get a list of all non-matched records, that is, all records where the SalesCheck value is NO or the InvCheck value is NO.
  5. In the model window, click  Save.