Versions Compared
Key
- This line was added.
- This line was removed.
- Formatting was changed.
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
Info | ||||
---|---|---|---|---|
| ||||
Finding records with the same key in a large stream of data. For each pair of matching records from the data sets, a single record is produced in the output. |
Info | ||||
---|---|---|---|---|
| ||||
Finding account details for 1 million records in a reference list of all (~20m) accounts. |
The configuration for this scenario is initially similar to Merging Similar Data Sets. A pull pipe is created from each source stream to the result stream. A single grouping key is put on each pipe. The key fields chosen should contain the same sort of data in the same format in both source streams, but do not have to have the same attribute name.
Image Modified
For the example scenario, where you want to find account details for 1 million records in a reference list of all (~20m) accounts, 'Source Stream 1', in the above screenshot, would be the stream containing data the 1 million records and 'Source Stream 2' would be the reference list of all accounts.
The pipe from 'Source Stream 1' should be set to mandatory.
The pipe from 'Source Stream 2' should be set to 'Directed' and the 'Max Workers' and 'Worker Size' fields populated. 'Max Workers' is the number of groups of keys that will be processed at the same time, i.e. threads. The 'Worker Size' is the number of keys that you want in each group.
Info |
---|
Please note that if you set 'Max workers' to the maximum number of threads available in your environment, then no other tasks will run until your directed merge completes. |
Your stream should be configured like a standard merge:
The attribute expressions for your key fields will be _key[n], where n is the order number of your key field grouping.
All other attributes will be referred to by prefixing their pipe name. For example, to return the contact name from the contacts pipe, you would write 'contacts.ContactName'.
Image Modified
Info |
---|
Watch out for the multiple records in each group. You will either need to:
|
Check List For a Directed Merge From Database Collector
- One mandatory pipe, sorted by the key field.
- One directed pipe, with max workers and worker size set, sorted by the key field.
- Query in database collector includes the following, where <key field> is replaced by the name of the key field: <key field> is in ({_keyList})
- Query in database collector is ordered by the key field.
- The key field is the stream has the expression _key[1].