Versions Compared
Version | Old Version 2 | New Version 3 |
---|---|---|
Changes made by | ||
Saved on |
Key
- This line was added.
- This line was removed.
- Formatting was changed.
Insert excerpt | ||||||||
---|---|---|---|---|---|---|---|---|
|
Introduction
Merges can be used to consolidate records from two or more tables. In this page we will work through a simple merge scenario to explain how merges work. All resource used in this example are available here:
Scenario
We have two sets of company information, Businesses 1.xlsx and Businesses 2.xlsx, that we want to merge together and remove duplicates. There are a number of challenges we must resolve to achieve this:
- The attribute names in each file are different.
There are duplicate companies that exist in both files.
There are duplicates within the Businesses
12.xlsx file.
Solution
Import the data
Using a file collector import each of the
BusiesssBusinesses Excel documents into their own table and run analysis on each to populate them with data.
Make both of the tables static using the
option from the table hover over toolbar.Insert excerpt _static _static nopanel true This ensures we only read the files once.
Create the Merge Table
From the Create section of the analysis toolbar, click
Insert excerpt _datasource_show_tables _datasource_show_tables nopanel true drag the Image Modified Merge Table onto your model.
In the settings, set the Name for example
Businesses Merged
The attributes can be created manually using the plus button in the Attributes section. Here we will use the attributes from the Businesses 1 table to automatically setup our merge attributes.
Hover over the Businesses 1 table and from the hover over toolbar click
.Insert excerpt _attributes_show_hide _attributes_show_hide nopanel true Tick all of the attributes and then drag them into the attributes section of the Merge Table.
Click Image Modified OK to save and close the settings for the merge table.
Connect the Inputs
In the analysis model, hover over the Businesses 1
table
.- In
From the hover over toolbar, select Image Modified Connector, then click the end of the arrow onto the merge table. This connector will pipe data into the merge table.
- In the
A properties window
thatopens for the pipe
in
, set the name to B1 to better indicate the use of the pipe.
It is good practice to utilise meaningful names. In the Sort/Group section of the properties click
and PhixFlow displays the list of attributes (columns).Insert excerpt _attributes_show _attributes_show nopanel true Drag the
Name
attribute into the Sort/Group section. This defines the attribute that will be used to merge on, we call this the key. Keys are listed in order and we reference them by number, therefore the keys must appear in the same order on all pipes leading into the merge table. This means attributes we are merging on do not need to have the same name.Similarly, add a
connectorconnector from the table
Businesses 2
to the merge table and add the Company Name to the Sort/Group.Businesses 2 has duplicates, we can define which records to keep. Here we will use the most recent
Onboarding Date
.Drag the
Onboarding Date
attribute into the Sort/Group section.Double click on the attribute and untick group and set Direction to Z-A. What we have done here is to use th
Configure the Merge Table
Finally we now need to tell the Merge Table what we are using to merge on, and which values from the tables we want to keep. For example, if there a company in both Business Tables, we can set the precedence to use the one from Business 1.
Click on the Merge Table, double click on the Name attribute. We are merging on the names set the merge key we want to use, we only have one
Configure groupings on the pipes
- Double-click on the pipe from
CustomerAddresses
toCustomer All Details
to open its settings.- In the Basic Settings section, set:
- Name: this pipe is automatically named
in
.
- Name: this pipe is automatically named
In the Sort/Group section click Show Attributes. PhixFlow displays the list of attributes (columns).
- Drag
CustomerRef
from the list into the settings. - Close the list of attributes.
- In the Basic Settings section, set:
- Double-click on the pipe from
CustomerAddresses
to
. This pipe will be calledSOURCE_CUSTOMER_PHONE_NUMBERS
in_2.
In the Sort/Group section, click Show Attributes. PhixFlow displays a list of the attributes (columns) in the
CustomerAddresses
data.- Drag
CUSTOMER_REF
into the Sort/Group section. - Close the list of attributes.
- Now add attributes from
CustomerAddresses
to the merge stream:- Hover over
CustomerAddresses
and in the pop-up toolbar, click Show Attributes. Drag all attributes from this list and drop them onto the stream icon for
Customer All Details
.How to select all the attributes in a list
To select all attributes, you can:
- either click the first attribute then Shift+click the last attribute
- or click the check box next to the Name header.
- PhixFlow asks for confirmation. If it is not already selected, select Use original attribute name. Then click the arrow to confirm.
- Hover over
- Similarly, drag all attributes from
toSOURCE_CUSTOMER_PHONE_NUMBERS
Customer All Details.
- To see the resulting settings for
Customer All Details
, double-click its icon to open the settings tab and look at the Attributes section. You can see a list of 8 attributes (data columns).
Run analysis and view the stream data
- To run analysis on your new stream, hover your mouse pointer over the stream
Customer All Details
, and in the pop-up toolbar, click Run Analysis. - To view the data you just loaded, hover over the stream and click Show view. In the drop-down list, select Default View. You will see the address data you loaded from the file combined with the phone numbers you loaded from the database.