Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Insert excerpt
_Banners
_Banners
nameanalysis
nopaneltrue

Introduction

Merges can be used to consolidate records from two or more tables. In this page we will work through a simple merge scenario to explain how merges work. All

resource

resources used in this example are available here: 

Scenario

We have two sets of company information, Businesses 1.xlsx and Businesses 2.xlsx, that we want to merge together and remove duplicates. There are a number of challenges we must resolve to achieve this:

  1. The attribute names in each file are different.
  2. There are duplicate companies that exist in both files.

  3. There are duplicates within the Businesses 2.xlsx file.

Solution

Step 1  Import the data

  1. Using a

    Insert excerpt
    _file

collector import
  1. _collector
    _file_collector
    nopaneltrue
    , import each of the Businesses Excel documents into their own table and run analysis on each to populate them with data.

Make
  1.  Make both of the tables static using the 

    Insert excerpt
    _static
    _static
    nopaneltrue
     option from the table hover over toolbar.
    This ensures we only read the files once.

 Create

 Step 2  Create the Merge Table

From the Create section of
  1. In the analysis toolbar, click 

    Insert excerpt

_datasource_show
  1. _tables

_datasource_show
  1. _tables
    nopaneltrue

drag
  1. Drag the Image Modified Merge Table  onto your model.

  2. In the settings, set the Name for example Businesses Merged

  3. The attributes can be created manually using the plus button in the Attributes section. Here we will use the attributes from the Businesses 1 table to automatically setup our merge attributes.

    1. Hover over the Businesses 1 table and from the hover over toolbar click 

      Insert excerpt
      _attributes_show_hide
      _attributes_show_hide
      nopaneltrue
      .

    2. Tick all of the attributes and then drag them into the attributes section of the Merge Table. 

Click Image Removed OK
    1. If PhixFlow asks for confirmation. Select Use original attribute name. Then click the arrow Image Added to confirm.
      Image Added
  1. Click 

    Insert excerpt
    _finish
    _finish
    nopaneltrue
     to save and close the settings for the merge table.

Step 3  Connect the Inputs

  1. In the analysis model, hover over the Businesses 1 table.

  2. From the hover over toolbar, select Image Modified Connector, then click the end of the arrow onto the merge table. This connector will pipe data into the merge table.

  3. A properties window opens for the pipe, set the name to B1 to better indicate the use of the pipe.

  4. In the Sort/Group section of the properties click 

    Insert excerpt
    _attributes_show
    _attributes_show
    nopaneltrue
     and PhixFlow displays the list of attributes (columns).

  5. Drag the Name attribute into the Sort/Group section. This defines the attribute that will be used to merge on, we call this the key. Keys are listed in order and we reference them by number, therefore the keys must appear in the same order on all pipes leading into the merge table. This means attributes we are merging on do not need to have the same name.

  6. Similarly, add a connector from the table Businesses 2 to the merge table and add the Company Name to the Sort/Group.

    1. Businesses 2 has duplicates within its data, we can

define which records
    1. remove these and decide which record to keep. Here we will use the most recent

Onboarding Date
    1. Onboarded Date to decide which record to keep.

    2. Drag the

Onboarding
    1. Onboarded Date attribute into the Sort/Group section.

    2. Double-click on the attribute and untick

group
    1. Group and set Direction to Z-A.

What we have done here is to use th 
    1.  By unticking group we are telling PhixFlow to sort by this attribute but omit it from grouping.

    2. PhixFlow will group and sort on the Company name , and use the Onboarded Date to also sort the groups. We need to select the top record from the group, and we do this by setting the Maximum Number of Records Per Group to 1. This will ensure we get the most recently onboarded business. The configuration should look like this:  
      Image Added

Step 4  Configure the Merge Table

Finally we
  1. We now need to tell the Merge Table

what we are using to merge on, and which values from the tables we want to keep. For example, if there a company in both Business Tables, we can set the precedence to use the one from Business 1
  1. how to populate its attributes.

  2. Click on the Merge Table, double click on the Name attribute. We are merging on the
names set the
  1. first merge key
we want to use
  1. ,
we only have one 

Configure groupings on the pipes

  1. Double-click on the pipe from CustomerAddresses  to Customer All Details to open its settings.
    1. In the Basic Settings section, set:
      • Name: this pipe is automatically named in.
    2. In the Sort/Group section click Image Removed Show Attributes. PhixFlow displays the list of attributes (columns).

    3. Drag CustomerRef from the list into the settings.
    4. Close the list of attributes.
  2. Double-click on the pipe from CustomerAddresses to SOURCE_CUSTOMER_PHONE_NUMBERS. This pipe will be called in_2.
    1. In the Sort/Group section, click Image Removed Show Attributes. PhixFlow displays a list of the attributes (columns) in the CustomerAddresses  data.

    2. Drag CUSTOMER_REF into the Sort/Group section.
    3. Close the list of attributes.
  3. Now add attributes from CustomerAddresses to the merge stream:
    1. Hover over CustomerAddresses and in the pop-up toolbar, click Image Removed Show Attributes.
    2. Drag all attributes from this list and drop them onto the stream icon for Customer All Details.

      How to select all the attributes in a list

      To select all attributes, you can:

      • either click the first attribute then Shift+click the last attribute
      • or click the check box next to the Name header. 

      Image Removed

    3. PhixFlow asks for confirmation. If it is not already selected, select Use original attribute name. Then click the arrow Image Removed to confirm.
      Image Removed
  4. Similarly, drag all attributes from SOURCE_CUSTOMER_PHONE_NUMBERS to Customer All Details.
  5. To see the resulting settings for Customer All Details, double-click its icon to open the settings tab and look at the Attributes section. You can see a list of 8 attributes (data columns).

Run analysis and view the stream data

  • To run analysis on your new stream, hover your mouse pointer over the stream Customer All Details, and in the pop-up toolbar, click Image Removed Run Analysis.
  • To view the data you just loaded, hover over the stream and click Image Removed Show view. In the drop-down list, select Image Removed Default View. You will see the address data you loaded from the file combined with the phone numbers you loaded from the database
    1. the name of the business, enter _key[1] in the expression:
      Image Added
      If you were merging on a second attribute enter _key[2], and this can be repeated for any number of attributes.
    2. For the Address1 attribute, we can decide on which source will have the best data to create a merged record from. Here we will let Businesses 1 take precedence. Double click on Address 1 and enter ifNull(B1.Address1, B2.AddressLine1). This tell PhixFlow to use the Address from the Businesses 1, but if it is null then take it from Businesses 2.
    3. Repeat these steps for the remaining attributes and your configuration will look like this: 
      Image Added

    Step 5  Run Analysis

    1. Run analysis to populate the Merge Table.
      Image Added 
    2. Hover over the merge table and click 
      Insert excerpt
      _table_with_attributes
      _table_with_attributes
      nopaneltrue
       → Default View.

    Troubleshooting

    1. How do I clear the data?
      To remove data from a table see Removing Recordsets.
    2. Why is there no data when I run my import?
      When you run an import the file you uploaded will be archived. If you run the import a second time without uploading a new file there is nothing to import. Upload your file again and run analysis on the table.
    3. PhixFlow says it can't find the attribute.
      In the merge table ensure the values in your expressions match the names of the attributes coming from your input tables