Remove Duplicates

Scenario

Files (or database records) often contain duplicate data. Sometimes this is acceptable, but sometimes the duplicate records must be ignored.

...

  • Load all data (including duplicates) into a stream.
  • Create a new stream from this stream and make it an aggregate stream.
  • Make the pipe linking the two streams an aggregate pipe, grouped on the field that contains the duplicated data and sorted by another field, depending on which record you want to keep. For example, if you want the latest record, you could sort by the updated date.
  • In the second stream, reference the data coming from the input pipe using an array index, e.g. in[1].value, to retrieve only the first of the grouped records.
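
The logic behind these steps (group on the duplicated field, sort within each group, keep the first record) can be sketched outside the product in plain Python. This is only an illustration of the idea, not the stream or pipe syntax itself. The function name remove_duplicates, its parameters, and the field names id, value and updated are hypothetical and chosen for this example.

```python
from itertools import groupby
from operator import itemgetter


def remove_duplicates(records, group_field, sort_field, newest_first=True):
    """Keep one record per distinct group_field value.

    Hypothetical helper: mirrors an aggregate pipe that groups on
    group_field, sorts on sort_field, and reads only the first record
    of each group.
    """
    # Sort by the secondary field first; Python's sort is stable, so the
    # second sort (by group) preserves this order inside each group.
    ordered = sorted(records, key=itemgetter(sort_field), reverse=newest_first)
    ordered.sort(key=itemgetter(group_field))
    # groupby needs its input sorted by the grouping key; next() takes
    # the first record of each group and ignores the rest.
    return [next(group) for _, group in groupby(ordered, key=itemgetter(group_field))]


# Sample data (made up for illustration): id 1 appears twice.
records = [
    {"id": 1, "value": "a", "updated": "2020-01-01"},
    {"id": 1, "value": "b", "updated": "2020-03-01"},
    {"id": 2, "value": "c", "updated": "2020-02-01"},
]

latest = remove_duplicates(records, group_field="id", sort_field="updated")
```

With newest_first=True the record with the most recent updated value is kept for each id; passing newest_first=False keeps the oldest instead, which corresponds to reversing the sort order on the aggregate pipe.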

See Also

...