...
- Merge All data and then filter out unwanted data
- Lookup to DB, via a lookup pipe to the DB collector
- Use a SQL IN clause via an Input Multiplier
Solution:
- Create a stream to serve as the source of the lookup.
- Create a DB collector to retrieve the data that needs to be "looked up"
- Implement data merges based on the 3 methods listed above.
Note : where stream and data sizes are small, the lookup method will have a negligible impact on the overall model. However, where data volumes are very large, choosing the correct lookup / enrichment solution can greatly impact the performance of the model. Try to determine which of the 3 methods listed here should be used in the following instances :
- Stream contains 5,000 records, and DB table for lookup contains 200,000,000 records.
- Stream contains 10,000,000 records, and DB table for lookup contains 10,000,000 records with an expected hit rate of 5%
- Stream contains 10,000,000 records, and DB table for lookup contains 10,000,000 records with an expected hit rate of 85%
- Stream contains 100,000,000 records, and DB table for lookup contains 100,000 records
What other data volume scenarios might impact data enrichment?
See Also
...