...

  • Merge all data, then filter out unwanted data.
  • Look up in the DB, via a lookup pipe to the DB collector.
  • Use a SQL IN clause via an Input Multiplier.

 

Solution:

  • Create a stream to serve as the source of the lookup.
  • Create a DB collector to retrieve the data that needs to be "looked up".
  • Implement data merges based on the 3 methods listed above.
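
The three methods can be sketched outside the modelling tool to make their differences concrete. The following Python example is only an illustration, not the product's own API: it assumes an in-memory SQLite table standing in for the DB collector's source, and the names `customers`, `stream`, and the three function names are hypothetical. It also assumes that the lookup key is unique in the table and that unmatched stream records are kept with an empty value.

```python
import sqlite3

# Hypothetical lookup table standing in for the DB collector's source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "alice"), (2, "bob"), (3, "carol"), (4, "dave")],
)

# Hypothetical stream that needs to be enriched with the customer name.
stream = [{"id": 2}, {"id": 4}, {"id": 9}]


def merge_all_then_filter(conn, stream):
    """Method 1: read the whole table once, merge in memory, discard unused rows."""
    table = dict(conn.execute("SELECT id, name FROM customers"))
    return [dict(rec, name=table.get(rec["id"])) for rec in stream]


def lookup_per_record(conn, stream):
    """Method 2: issue one keyed query to the DB for every stream record."""
    out = []
    for rec in stream:
        row = conn.execute(
            "SELECT name FROM customers WHERE id = ?", (rec["id"],)
        ).fetchone()
        out.append(dict(rec, name=row[0] if row else None))
    return out


def lookup_with_in_clause(conn, stream, batch_size=2):
    """Method 3: group stream keys into batches and fetch each batch with IN (...)."""
    out = []
    for start in range(0, len(stream), batch_size):
        batch = stream[start:start + batch_size]
        keys = [rec["id"] for rec in batch]
        placeholders = ",".join("?" * len(keys))
        rows = dict(
            conn.execute(
                f"SELECT id, name FROM customers WHERE id IN ({placeholders})",
                keys,
            )
        )
        out.extend(dict(rec, name=rows.get(rec["id"])) for rec in batch)
    return out


results = [
    merge_all_then_filter(conn, stream),
    lookup_per_record(conn, stream),
    lookup_with_in_clause(conn, stream),
]
```

All three functions return the same enriched records. They differ only in how many queries are issued and how many table rows are read, which is what matters as volumes grow.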

 

Note: Where stream and data sizes are small, the choice of lookup method has a negligible impact on the overall model. Where data volumes are very large, however, choosing the right lookup/enrichment solution can greatly affect the model's performance. Try to determine which of the 3 methods listed here should be used in each of the following cases:

  • Stream contains 5,000 records, and DB table for lookup contains 200,000,000 records.
  • Stream contains 10,000,000 records, and DB table for lookup contains 10,000,000 records with an expected hit rate of 5%.
  • Stream contains 10,000,000 records, and DB table for lookup contains 10,000,000 records with an expected hit rate of 85%.
  • Stream contains 100,000,000 records, and DB table for lookup contains 100,000 records.
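
When working through these cases, it can help to count queries and rows for each method. The sketch below is a back-of-envelope aid, not a performance model: the function `rough_costs` and the default `batch_size` of 1,000 are assumptions made for illustration. It also assumes unique lookup keys and treats the hit rate as the fraction of stream records that find a match. It deliberately ignores network latency, indexing, and memory, all of which affect the real answer.

```python
import math


def rough_costs(stream_rows, table_rows, hit_rate=1.0, batch_size=1000):
    """Count DB queries and rows fetched for each of the 3 methods.

    Assumes unique lookup keys; hit_rate is the fraction of stream
    records expected to find a match in the table.
    """
    matched = round(stream_rows * hit_rate)
    return {
        # One full read of the lookup table, regardless of stream size.
        "merge_all": {"queries": 1, "rows_fetched": table_rows},
        # One keyed query per stream record.
        "per_record_lookup": {"queries": stream_rows, "rows_fetched": matched},
        # One query per batch of stream keys.
        "in_clause": {
            "queries": math.ceil(stream_rows / batch_size),
            "rows_fetched": matched,
        },
    }


# First case: 5,000 stream records against a 200,000,000-row table.
for method, cost in rough_costs(5_000, 200_000_000).items():
    print(method, cost)
```

Running the function for each case shows where a full table read is small relative to the stream and where it dominates. It does not settle the choice by itself.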

What other data volume scenarios might impact data enrichment?

See Also

...