  1. Click to select the stream that may contain duplicates.
  2. Right-click on the model view pane, and select 'Merge selected streams'. 

  3. In the pipe configuration dialog that pops up, group on the field with duplicated data and click the green tick to save your input.

  4. In the Automatic Stream Configuration dialog that appears, select 'just key attributes' from the drop-down list.
  5. Run analysis on the resulting stream. When you view the data, PhixFlow reports the number of records in the group for each value of the grouping key,
    and highlights the lines where that number is greater than one.
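
The result of the steps above can be pictured outside PhixFlow as a simple group-and-count. The following Python sketch is only an illustration of that logic, not PhixFlow syntax; the field name `customer_id` and the sample records are hypothetical.

```python
from collections import Counter

def count_by_key(records, key):
    """Return a mapping of each key value to the number of records that carry it."""
    return Counter(record[key] for record in records)

def duplicated_keys(records, key):
    """Return the key values that occur more than once, in first-seen order."""
    counts = count_by_key(records, key)
    return [value for value, n in counts.items() if n > 1]

# Hypothetical sample data; "customer_id" stands in for the grouping field.
records = [
    {"customer_id": "C1", "name": "Ann"},
    {"customer_id": "C2", "name": "Bob"},
    {"customer_id": "C1", "name": "Ann B."},
]

counts = count_by_key(records, "customer_id")
dups = duplicated_keys(records, "customer_id")
```

Here `counts` plays the role of the per-group record count, and `dups` lists the key values that would be highlighted because their count is greater than one.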

Step-by-step guide: Removing Duplicates

 

  1. Load all data (including duplicates) into a stream.
  2. Create a new stream from this stream and make it an aggregate stream.
  3. On the pipe linking the two streams, set the maximum number of records to one and group on the field with duplicated data. Apply sorting on another attribute, depending on which record you want to keep. For example, if you want the latest record, you could sort by the last updated date.
    In the second stream, reference data coming from the input pipe using an array index.
  4. As an alternative to setting the maximum number of records per group, use the syntax in[1].value in the attribute expressions of the aggregate stream. This retrieves only the first of the grouped records when a group contains more than one record.
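
The effect of grouping, sorting and keeping one record per group can be sketched outside PhixFlow as follows. This Python example is an illustration of the logic only, not PhixFlow syntax; the function name and the fields `customer_id` and `last_updated` are hypothetical, and dates are assumed to be ISO-formatted strings so that they sort correctly as text.

```python
def keep_first_per_group(records, key, sort_by, descending=True):
    """Keep one record per key value: the first one after sorting by sort_by.

    With descending=True and a date field, this keeps the latest record.
    """
    ordered = sorted(records, key=lambda r: r[sort_by], reverse=descending)
    first = {}
    for record in ordered:
        # setdefault stores only the first record seen for each key value.
        first.setdefault(record[key], record)
    return list(first.values())

# Hypothetical sample data with one duplicated customer_id.
records = [
    {"customer_id": "C1", "name": "Ann", "last_updated": "2024-01-10"},
    {"customer_id": "C2", "name": "Bob", "last_updated": "2024-02-05"},
    {"customer_id": "C1", "name": "Ann B.", "last_updated": "2024-03-01"},
]

deduplicated = keep_first_per_group(records, "customer_id", "last_updated")
by_id = {r["customer_id"]: r for r in deduplicated}
```

Sorting before grouping decides which of the duplicates survives, which is why the choice of sort attribute in step 3 matters.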

 
