It is often useful to merge two or more streams of data to add additional attributes for records from another source and / or to identify duplicates and missing records. The examples below show how to do this manually to understand the process as well as using a built in feature of PhixFlow that provides some useful additional attributes on the merged data.
The model for these examples is 001 Merge Streams
Example 1: Manually merging customers and accounts
In this first example, one stream contains Customer names and references and a second stream contains address details from accounts. The data is able to be merged because the Customer reference and Account ID are the same
Step-by-step guide
- Add a new merge stream to hold the merged data.
- Connect the source streams to the merge stream using pull pipes e.g in and in2
- Drag all the non-key attributes from the source streams into the destination stream i.e all attributes except the one(s) that will be used to define the key of the records.
- Hover over the destination stream and select Quick Configure Options->Configure Input
- For each of the source streams displayed, select Show the List of Attributes and drag the key attribute into the box below
- Save the changes
- Now add a new attribute to the destination stream for the key. In the attribute expression type _key
- Run the destination stream. The results will contain one record for each unique key in the source streams, with attributes from all source streams
Example 2: Merging customers and accounts using the Merge feature
In this example, additional attributes are created that show where customers whose address couldn't be found, addresses with no corresponding customer and duplicate records, i.e Customers with multiple records in the address file
Step-by-step guide
- Select the source streams by right clicking on them whilst holding down Shift
- Right click on the model background and select Merge Selected Streams
- For each of the source streams displayed, select Show the List of Attributes and drag the key attribute into the box below as in example 1
- Save the changes and select All attributes in the automatic stream configuration screen
- The stream will now be configured with additional attributes as shown below:
- gridRowColor = Highlights rows in red if there are multiple records / duplicates
- in_Count & in_2_Count = Reports the number of records matching they key in each of the source streams
- in_HasDate & in_2_HasData = Reports whether or not a matching record was found in each stream
Related articles
Filter by label
There are no items with the selected labels at this time.