Versions Compared
Key
- This line was added.
- This line was removed.
- Formatting was changed.
Insert excerpt | ||||||||
---|---|---|---|---|---|---|---|---|
|
Info | ||||
---|---|---|---|---|
| ||||
Finding records with the same key in a large stream table of data. For each pair of matching records from the data sets, a single record is produced in the output. |
Info | ||||
---|---|---|---|---|
| ||||
Finding account details for 1 million records in a reference list of all (~20m) accounts. |
The configuration for this scenario is initially similar to Merging Similar Data Sets. A pull pipe is created from each source stream table to the result stream table. A single grouping key is put on each pipe. The key fields chosen should contain the same sort of data in the same format in both source streams tables, but do not have to have the same attribute name.
Image Modified
For the example scenario, where you want to find account details for 1 million records in a reference list of all (~20m) accounts, 'Source Stream Table 1', in the above screenshot, would be the stream table containing data the 1 million records and 'Source Stream table 2' would be the reference list of all accounts.
The pipe from 'Source Stream Table 1' should be set to mandatory.
The pipe from 'Source Stream Table 2' should be set to 'Directed' and the 'Max Workers' and 'Worker Size' fields populated. 'Max Workers' is the number of groups of keys that will be processed at the same time, i.e. threads. The 'Worker Size' is the number of keys that you want in each group.
Info |
---|
Please note that if you set 'Max workers' to the maximum number of threads available in your environment, then no other tasks will run until your directed merge completes. |
Your stream table should be configured like a standard merge:
The attribute expressions for your key fields will be _key[n], where n is the order number of your key field grouping.
All other attributes will be referred to by prefixing their pipe name. For example, to return the contact name from the contacts pipe, you would write 'contacts.ContactName'.
Info |
---|
Watch out for the multiple records in each group. You will either need to:
|
Check List For a Directed Merge From Database Collector
- One mandatory pipe, sorted by the key field.
- One directed pipe, with max workers and worker size set, sorted by the key field.
- Query in database collector includes the following, where <key field> is replaced by the name of the key field: <key field> is in ({_keyList})
- Query in database collector is ordered by the key field.
- The key field is the stream table has the expression _key[1].