Scenario
Combining a large stream with data from a small stream, where the values of the very small stream will be repeated throughout the result. For each pair of matching records from the data sets, a single record is produced in the output.
Example
Find the description for each code in a stream of thousands from a stream containing mapping data. There are only ~100 possible codes.
Solution
- To do this, use a calculate stream with an order/index lookup pipe.
- In the below screenshot, 'Source Stream 1' about 2000 records and we want to enrich this data with data from 'Source Stream 2', which contains about 50 records.
- The result stream type is set to 'Calculate'.
- The pipe from 'Source Stream 1' is a pull pipe with no grouping.
- The pipe from 'Source Stream 2' is a lookup pipe. An Order/Index entry should be added to define the joining key between the 2 streams.
- This will index all the records from the source stream 2 by the index attribute, so they can be searched quickly. The data will be queried once and the result put into memory.
- All stream attributes use the attribute name, prefixed by the pipe name. For example, in1.Attribute1.
Watch out for the multiple records returned by your lookup pipe. You will need to do one of the following:
- Choose a record in each group using a dot number after your attribute. For example,
in_2.Description.1
- Use an aggregate function to find the best record in the the group (e.g. max(contacts.ContactName),
- Tick 'multiplier' on the pipe, which will cause the resulting stream to display one row per record in the group, rather than one record for the group.
- Place 1 in the Max Number of Records per Group, this will take the first record in the group. Use a non-grouped attribute should you wish to sort the group to control the first record.