Scenario
Combine a large stream with data from a very small stream, where the values from the small stream are repeated throughout the result. Each pair of matching records from the two data sets produces a single record in the output.
Example
Find the description for each code in a stream of thousands of records, using a second stream that contains the code-to-description mapping. There are only ~100 possible codes.
Solution
- To do this, use a calculate stream with an order/index lookup pipe.
- In the screenshot below, 'Source Stream 1' contains about 2000 records, and we want to enrich this data with data from 'Source Stream 2', which contains about 50 records.
- The result stream type is set to 'Calculate'.
- The pipe from 'Source Stream 1' is a pull pipe with no grouping.
- The pipe from 'Source Stream 2' is a lookup pipe. An Order/Index entry should be added to define the joining key between the two streams.
- This indexes all the records from 'Source Stream 2' by the index attribute, so they can be searched quickly. The data is queried once and the result is held in memory.
- All stream attributes use the attribute name, prefixed by the pipe name. For example, in1.Attribute1.
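The indexing behaviour described above can be pictured with a small Python sketch. This is only an illustration of the general technique (build an in-memory index of the small data set once, then probe it for each record of the large one); it is not how the product implements lookup pipes, and the function names, field names and sample data are all hypothetical.

```python
from collections import defaultdict

def build_index(records, key):
    """Group the small data set by the joining key (done once, kept in memory)."""
    index = defaultdict(list)
    for rec in records:
        index[rec[key]].append(rec)
    return index

def enrich(large, small, key, field):
    """Add `field` from the small data set to every record of the large one."""
    index = build_index(small, key)
    out = []
    for rec in large:
        matches = index.get(rec[key], [])
        # Take the first match; None when the key has no entry in the index.
        value = matches[0][field] if matches else None
        out.append({**rec, field: value})
    return out

# Hypothetical sample data: a mapping stream and a larger stream of codes.
codes = [
    {"Code": "A", "Description": "Alpha"},
    {"Code": "B", "Description": "Beta"},
]
events = [
    {"Id": 1, "Code": "B"},
    {"Id": 2, "Code": "A"},
    {"Id": 3, "Code": "Z"},
]

result = enrich(events, codes, "Code", "Description")
```

Because the index is built once, each record in the large data set costs a single dictionary lookup instead of a scan of the small data set.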
Make sure that every attribute you refer to with the _out prefix in the joining key has a lower order number than the attributes that use the lookup pipe prefix. For example, in the screenshot above, it is essential that Attribute1 has a lower order number than Attribute3. If the order of the attributes were switched, Attribute3 would not return a value: Attribute1 would not yet have been calculated, so the order/index would look for records where Attribute1 is null.
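The effect of attribute order can be modelled with a short Python sketch. It assumes, purely for illustration, that attributes are evaluated one after another in order-number sequence and that an attribute not yet calculated reads as null; the functions, the RawCode field and the sample index are hypothetical and do not represent the product's evaluation engine.

```python
# Hypothetical in-memory index built from the lookup stream.
index = {"A": "Alpha", "B": "Beta"}

def attribute1(rec, out):
    """Calculated attribute that is later used as the joining key."""
    return rec["RawCode"].strip().upper()

def attribute3(rec, out):
    """Looks up the index using the already calculated Attribute1."""
    return index.get(out.get("Attribute1"))

def evaluate(rec, ordered_attributes):
    """Evaluate attributes strictly in the given order."""
    out = {}
    for name, compute in ordered_attributes:
        out[name] = compute(rec, out)
    return out

rec = {"RawCode": " a "}

# Attribute1 first: the key exists when Attribute3 performs the lookup.
good = evaluate(rec, [("Attribute1", attribute1), ("Attribute3", attribute3)])

# Attribute3 first: the key is still missing, so the lookup finds nothing.
bad = evaluate(rec, [("Attribute3", attribute3), ("Attribute1", attribute1)])
```

In this model, `good` contains a description for Attribute3, while `bad` leaves Attribute3 empty even though Attribute1 is eventually calculated.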
Watch out for multiple records returned by your lookup pipe. You will need to do one of the following:
- choose a record in each group (e.g. contacts.1.ContactName will choose the first record in the group),
- use an aggregate function to find the best record in the group (e.g. max(contacts.ContactName)),
- or tick 'multiplier' on the pipe, which will cause the resulting stream to display one row per record in the group, rather than one record for the group.
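The three options above can be compared in a minimal Python sketch. The data, field names and helper functions are hypothetical, and the code only mimics the described outcomes (first record, aggregate, one row per match); it does not reproduce the product's expression syntax. How the multiplier option treats records with no match is not covered here, so the sketch simply emits no row in that case, which is an assumption.

```python
# Hypothetical index: each customer key maps to a group of contact records.
contacts_index = {
    "C1": [{"ContactName": "Ann"}, {"ContactName": "Zed"}],
    "C2": [{"ContactName": "Bob"}],
}
customers = [{"CustomerId": "C1"}, {"CustomerId": "C2"}]

def pick_first(group):
    """Option 1: choose the first record in the group."""
    return group[0]["ContactName"] if group else None

def pick_max(group):
    """Option 2: aggregate over the group, here with max()."""
    return max((r["ContactName"] for r in group), default=None)

def join(rows, index, pick):
    """One output row per input row, with `pick` resolving the group."""
    return [
        {**row, "ContactName": pick(index.get(row["CustomerId"], []))}
        for row in rows
    ]

def join_multiplied(rows, index):
    """Option 3: one output row per matching record in the group."""
    return [
        {**row, "ContactName": match["ContactName"]}
        for row in rows
        for match in index.get(row["CustomerId"], [])
    ]

first_rows = join(customers, contacts_index, pick_first)
max_rows = join(customers, contacts_index, pick_max)
multiplied_rows = join_multiplied(customers, contacts_index)
```

The first two options keep one output row per input record, while the multiplied join grows the output to one row per match.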