This page is for data modellers or administrators who need to manage stream data retention and deletion.
How to Configure Stream Data Retention and Deletion
When data modellers create a stream, they should configure its Data Retention Settings to specify:
- number of days to keep stream sets and superseded records
- number of stream sets or stream sets with superseded records to keep.
Data older than these limits is deleted when you run a stream-data-delete task that acts on the stream; see Using Tasks and Task Plans and Task.
You can set up a stream-data-delete task to act on either:
- one or more specific streams. You may choose to do this, for example:
  - to manage the data in several related streams
  - to run frequently on a stream that contains a large amount of data
  - to run occasionally on streams with low volumes of data that change rarely.
- all streams, by ticking the All Streams option. The task will then run on all streams that:
  - are not in another stream-data-delete task
  - and have Data Retention Settings configured.
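The rule for which streams a task covers can be sketched as a small selection function. This is a minimal illustration, not PhixFlow code: the `Stream` and `DeleteTask` classes, their fields and `streams_for_task` are all hypothetical, and the sketch assumes that "in another stream-data-delete task" means named explicitly in some other task.

```python
from dataclasses import dataclass


@dataclass
class Stream:
    # Hypothetical model of a stream; not a PhixFlow API.
    name: str
    has_retention_settings: bool


@dataclass
class DeleteTask:
    # Hypothetical model of a stream-data-delete task.
    name: str
    all_streams: bool
    streams: list[str]  # explicitly selected streams; ignored when all_streams is True


def streams_for_task(task, tasks, streams):
    """Return the names of the streams a stream-data-delete task acts on."""
    if not task.all_streams:
        return list(task.streams)
    # Assumption: a stream "is in another task" if another task names it explicitly.
    claimed = {
        name
        for other in tasks
        if other is not task and not other.all_streams
        for name in other.streams
    }
    # All Streams only covers streams that have Data Retention Settings configured.
    return [s.name for s in streams if s.has_retention_settings and s.name not in claimed]


streams = [Stream("Calls", True), Stream("Invoices", True), Stream("Lookup", False)]
calls_task = DeleteTask("Delete calls", all_streams=False, streams=["Calls"])
catch_all = DeleteTask("Delete everything else", all_streams=True, streams=[])
tasks = [calls_task, catch_all]
print(streams_for_task(catch_all, tasks, streams))  # ['Invoices']
```

Here the All Streams task skips Calls, because another task already manages it, and skips Lookup, because it has no retention settings.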
Managing Full Stream Sets
The following table shows the different combinations of settings possible in stream properties → Data Retention Settings. It assumes that a stream currently contains 8 stream sets:
- 2 from the current day
- 1 from each of the previous 6 days.
The values are:
- N: a number of days
- X: a number of stream sets
- null: indicates no value has been entered for this option.
PhixFlow always retains the maximum number of active and superseded stream sets permitted by the settings, so that no conflicting stream sets are deleted.
Keep for N Days | Keep for X StreamSets | Result |
---|---|---|
null | null | No stream sets will be deleted. |
0 | null | All stream sets will be deleted. |
1 | null | The last day of valid stream sets will be retained. All earlier stream sets will be deleted. In our example, the 2 latest stream sets will be retained and the earliest 6 stream sets deleted. |
N | null | All stream sets older than N days before the latest valid stream set will be deleted. |
null | 0 | All stream sets will be deleted. |
null | 1 | The last valid stream set will be retained; all other stream sets will be deleted. |
null | X | The X most recent valid stream sets will be retained; all other stream sets will be deleted. |
0 | 0 | All stream sets will be deleted. |
0 | 1 | The last valid stream set will be retained; all other stream sets will be deleted. |
1 | 0 | The last day of valid stream sets will be retained. All earlier stream sets will be deleted. |
1 | 1 | The last day of valid stream sets will be retained, even if it contains more than one stream set. If there are no stream sets in the last day, the most recent earlier stream set will be retained instead. |
N | X | PhixFlow retains the maximum number of valid stream sets allowed by either setting, so that no conflicting stream sets will be deleted. |
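The rows of the table can be read as one rule: an unset option is ignored, and when both options are set a stream set is kept if either option keeps it. The sketch below encodes that reading and checks it against the 8-stream-set example. It is an illustration only; `retained_stream_sets` is a hypothetical name, the day counting (day 1 is the calendar day of the latest valid stream set, so N days keeps sets dated less than N days before it) is an assumption chosen to match the 1 | null row, and the checks for conflicting stream sets are not modelled.

```python
from datetime import date


def retained_stream_sets(set_dates, keep_days=None, keep_sets=None):
    """Return the indices of the stream sets kept under the retention settings.

    set_dates: date of each valid stream set, ordered oldest first.
    keep_days: "Keep for N Days" value, or None if no value is entered.
    keep_sets: "Keep for X StreamSets" value, or None if no value is entered.
    """
    if keep_days is None and keep_sets is None:
        return set(range(len(set_dates)))  # null | null: nothing is deleted
    kept = set()
    if set_dates and keep_days is not None:
        latest = max(set_dates)
        # Assumption: keep sets dated fewer than keep_days days before the latest one.
        kept |= {i for i, d in enumerate(set_dates) if (latest - d).days < keep_days}
    if keep_sets is not None:
        # Keep the keep_sets most recent stream sets (the last entries in the list).
        kept |= set(range(max(0, len(set_dates) - keep_sets), len(set_dates)))
    return kept


# The example stream: 1 stream set on each of 6 previous days, 2 on the current day.
today = date(2024, 1, 7)  # hypothetical "current day"
example = [date(2024, 1, d) for d in range(1, 7)] + [today, today]
print(sorted(retained_stream_sets(example, keep_days=1)))  # [6, 7]
```

Taking the union of the two selections is what makes the combined rows (for example 1 | 0 and 0 | 1) keep whatever the more generous setting keeps.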
Superseded Stream Sets
If only the Keep Superseded for N Days and Keep Superseded for X StreamSets fields are populated, the logic in the table above applies to the superseded records. Again, archiving always retains the maximum number of superseded stream sets, so that no conflicting stream sets are archived.
If both the full archive fields (Keep for N Days, Keep for X StreamSets) and the superseded archive fields (Keep Superseded for N Days, Keep Superseded for X StreamSets) are populated, PhixFlow applies the full archive values first, archiving and deleting the stream item records they select. Only then does it apply the Keep Superseded values to the remaining stream sets, archiving and deleting any superseded records that do not qualify for retention.
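The two-stage order can be sketched as two passes of the same selection rule. This is a hypothetical model, not PhixFlow's schema or implementation: `StreamSet`, `select_kept` and `apply_retention` are illustrative names, stream sets are assumed to be ordered oldest first, and the day counting follows the same assumption as the table (day 1 is the day of the latest stream set).

```python
from dataclasses import dataclass, field, replace
from datetime import date


@dataclass
class StreamSet:
    # Hypothetical model of a stream set and its records.
    run_date: date
    current: list = field(default_factory=list)
    superseded: list = field(default_factory=list)


def select_kept(dates, keep_days, keep_sets):
    """Indices kept by a (days, stream sets) pair, using the table's rules."""
    if keep_days is None and keep_sets is None:
        return set(range(len(dates)))
    kept = set()
    if dates and keep_days is not None:
        latest = max(dates)
        kept |= {i for i, d in enumerate(dates) if (latest - d).days < keep_days}
    if keep_sets is not None:
        kept |= set(range(max(0, len(dates) - keep_sets), len(dates)))
    return kept


def apply_retention(sets, keep_days=None, keep_sets=None, sup_days=None, sup_sets=None):
    """Apply the full settings first, then the superseded settings to what remains."""
    # Pass 1: the full archive settings remove whole stream sets.
    kept = select_kept([s.run_date for s in sets], keep_days, keep_sets)
    remaining = [s for i, s in enumerate(sets) if i in kept]
    # Pass 2: the Keep Superseded settings only see the remaining stream sets;
    # sets they do not retain lose their superseded records (returned as copies).
    keep_sup = select_kept([s.run_date for s in remaining], sup_days, sup_sets)
    return [s if i in keep_sup else replace(s, superseded=[]) for i, s in enumerate(remaining)]


sets = [StreamSet(date(2024, 1, d), ["c"], ["s"]) for d in range(1, 5)]
result = apply_retention(sets, keep_sets=3, sup_sets=1)
print([(s.run_date.day, s.superseded) for s in result])  # [(2, []), (3, []), (4, ['s'])]
```

Because the superseded pass runs on the output of the full pass, "Keep Superseded for 1 StreamSet" counts only the 3 surviving stream sets, not the original 4.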