...

In this exercise you will create a stream that runs daily and reads the files generated each day, which contain details of customer sign-ups by region.
Files containing details of daily sign-ups can be found in the input files, in the directory:
…\inputData\CustomerSignUps\input
First you will need to create a File Collector to read the daily files. Instead of doing an ad-hoc load of the file into CenterView PhixFlow as you have done so far, you will set up an automated file collection. The collector reads a copy of these files held on the CenterView PhixFlow server:

  • Create a new model called Process Daily Signup Files
  • Add a File Collector with Name Read CustSignUp Files
  • Double-click on your new File Collector to open the configuration form.
  • In the Details tab:
    • Tick the Enabled flag
    • Leave the Number of Header Lines as 1
    • Leave the flag Allow Non-Scheduled Collector ticked
    • In the field Input Directory Expr. enter the value: "CustomerSignUps/input"

      When setting up automated file collection, the directory is relative to the system-wide default input path set for CenterView PhixFlow. To see this:

  • Go to - Admin menu (top right-hand corner of the application)
  • Select System Configuration
  • Go to the System Directories tab
  • Note the entry Import File Location – the input directory you entered in your file collector gives the input directory for the files relative to this starting location

    In file collectors, when entering Input Directory Expr, File Pattern Expression, Archive Directory Expression or Error Directory Expression, you must surround a fixed text value with quotes. For example: "file.txt"
    When entering Input Directory Expr, Archive Directory Expression or Error Directory Expression, you must use forward slashes (/) as directory separators, even on a Windows platform. For example: "input/dataFiles"

    • In the field File Pattern Expression enter the value: ".*"
    • Press - Apply at the top of the File Collector form
  • Move to the File Columns tab
    • Press - Create the file attributes automatically from the header row in the file
    • CenterView PhixFlow will read the file and work out the attributes that make up the file. You will see these attributes appear in the Attributes list
  • In the banner of the File Collector details form, press - Create a new Stream using the File Collector attributes
  • Have a look at the attributes in the new stream – you can see that these have been derived from the columns in the file collector
  • Save all your changes
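The directory and pattern settings above can be pictured with a short Python sketch. This is not PhixFlow code: the base path /opt/phixflow/import and the helper names resolve_input_dir and matching_files are hypothetical, standing in for the Import File Location setting and for whatever the server does internally. The sketch also assumes that the File Pattern Expression is a regular expression matched against the whole file name, which the value ".*" suggests but the text above does not state.

```python
import re
from pathlib import PurePosixPath

# Hypothetical stand-in for the "Import File Location" system setting.
IMPORT_FILE_LOCATION = PurePosixPath("/opt/phixflow/import")


def resolve_input_dir(input_dir_expr: str) -> PurePosixPath:
    """Join a relative, forward-slash input directory onto the base location."""
    return IMPORT_FILE_LOCATION / input_dir_expr


def matching_files(file_names, pattern: str):
    """Return the names the pattern matches in full (assumed regex semantics)."""
    regex = re.compile(pattern)
    return [name for name in file_names if regex.fullmatch(name)]


input_dir = resolve_input_dir("CustomerSignUps/input")
names = ["custSignUp-20090323.txt", "custSignUp-20090324.txt"]
print(input_dir)
print(matching_files(names, ".*"))
```

With the pattern ".*" every file in the directory is a candidate, which is why a later step narrows the pattern to a single file per run.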

The file names in the input directory have the format custSignUp-yyyymmdd.txt. Your stream will be scheduled to run daily, and you want to ensure that the correct file is read in on each day. To do this, you will use one of the dates from the run on each day.
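As a rough illustration of the idea, the Python sketch below builds the expected file name from a date. The function name daily_file_name is hypothetical, and the sketch does not show how CenterView PhixFlow itself formats a run date; it only shows how the custSignUp-yyyymmdd.txt naming convention ties one file to one day.

```python
from datetime import date


def daily_file_name(run_date: date) -> str:
    """Build the expected file name for a given run date (yyyymmdd)."""
    return "custSignUp-" + run_date.strftime("%Y%m%d") + ".txt"


print(daily_file_name(date(2009, 3, 23)))  # custSignUp-20090323.txt
```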
Remember that in CenterView PhixFlow terms a run of a stream is called a Stream Set. In fact, for each Stream Set there are two dates:

...

  • Go to Admin and open System Configuration
  • Set the Effective Date field to 23/03/2009
  • If you wait for a while you will see the Effective Date appear in the bottom left-hand corner of your application – however, you don't need to wait for this; the change is applied right away

    System Date Override (Effective Date) can be useful on development and testing installations of CenterView PhixFlow – but should never be used on a production system!

  • Run ReadCustSignUpFiles

...

select * from SOURCE_CUST_SIGNUPS
where COLLECTION_DATE > to_date({_fromDate}, 'yyyymmdd.hh24miss')
and COLLECTION_DATE <= to_date({_toDate}, 'yyyymmdd.hh24miss')
Remember that in this query only those parts contained in {} are interpreted by CenterView PhixFlow – the rest is written in the query language of the external database you are collecting data from (in this case, Oracle).
So in this query, CenterView PhixFlow will insert _fromDate and _toDate - the start date and end date of each run (Stream Set) - into the query before running it against the database.
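The substitution step can be sketched in Python as follows. This is an illustration only: the function fill_query is hypothetical, and the sketch assumes the dates are inserted as quoted strings in the yyyymmdd.hh24miss layout so that the Oracle to_date call can parse them. How CenterView PhixFlow actually renders the values is not stated here.

```python
from datetime import datetime

# Python equivalent of the Oracle format mask 'yyyymmdd.hh24miss'.
DATE_FORMAT = "%Y%m%d.%H%M%S"

QUERY_TEMPLATE = (
    "select * from SOURCE_CUST_SIGNUPS\n"
    "where COLLECTION_DATE > to_date({_fromDate}, 'yyyymmdd.hh24miss')\n"
    "and COLLECTION_DATE <= to_date({_toDate}, 'yyyymmdd.hh24miss')"
)


def fill_query(template: str, from_date: datetime, to_date: datetime) -> str:
    """Replace the {...} placeholders; everything else is passed through as SQL."""
    values = {
        "{_fromDate}": "'" + from_date.strftime(DATE_FORMAT) + "'",
        "{_toDate}": "'" + to_date.strftime(DATE_FORMAT) + "'",
    }
    for placeholder, value in values.items():
        template = template.replace(placeholder, value)
    return template


print(fill_query(QUERY_TEMPLATE, datetime(2009, 3, 23), datetime(2009, 3, 24)))
```

The lower bound is exclusive and the upper bound inclusive, so consecutive runs neither overlap nor leave a gap between their date ranges.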

...

In this exercise you will create a Stream to read files that are generated with a sequence number. As in the previous exercise, these contain details of customer signups, by region.
These files can be found in the input files, in the directory:
…\inputData\CustomerSignUpsSeq\input
The file names have the format custSignUp-N.txt where N is a sequence number.
Add a file collector to read in a copy of these files on the CenterView PhixFlow server:

  • Add a File Collector, with Name Read CustSignUp Files Seq, to the model you created in exercise 
  • Open the configuration form for Read CustSignUp Files Seq
  • In the Details tab:
    • Tick the Enabled flag
    • Leave the Number of Header Lines as 1
    • Leave the flag Allow Non-Scheduled Collector ticked
    • In the field Input Directory Expr. enter the value: "CustomerSignUpsSeq/input"
    • In the field File Pattern Expression enter the value: ".*"
    • Press - Apply at the top of the File Collector form
  • Move to the File Columns tab
    • Press - Create the file attributes automatically from the header row in the file
  • In the banner of the File Collector details form, press - Create a new Stream using the File Collector attributes
  • Save your changes to the file collector
  • Add a sequence:
    • Find the button - Show the list of Sequences in the left-hand menu bar
    • Press to create a new sequence
    • Enter the following details into the sequence configuration form:
      • Name: SignUpSeq
      • Start Value: 1
      • Press OK to save your changes
  • Update the file collector to use the next sequence number:
    • Open the File Collector configuration form
    • Update the File Pattern Expression to "custSignUp-" + nextValue("SignUpSeq") + ".txt"
    • Press OK to save your changes
    • Run Customer SignUps Seq
    • On the first run (in the first Stream Set) you will see 6 records load
    • In the Console, double-click on the analysis run in the Completed Tasks section, and then at the bottom of the Console open the Imported/Exported Files tab – you can see the name of the file you have just read in
    • Run the stream two more times to load in the files with the next two sequence numbers
  • If you are finishing this exercise at this point, unset the Effective Date
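The effect of the sequence-based File Pattern Expression can be pictured with a small Python sketch. The Sequence class and the file_pattern function are hypothetical stand-ins, not PhixFlow internals, and the sketch assumes that the first call to nextValue returns the Start Value (1) and that each later call returns the next integer.

```python
class Sequence:
    """A named counter that hands out consecutive integers."""

    def __init__(self, start_value: int = 1):
        self._next = start_value

    def next_value(self) -> int:
        value = self._next
        self._next += 1
        return value


# Hypothetical registry of sequences, keyed by name.
sequences = {"SignUpSeq": Sequence(start_value=1)}


def file_pattern() -> str:
    """Mirror of: "custSignUp-" + nextValue("SignUpSeq") + ".txt"."""
    return "custSignUp-" + str(sequences["SignUpSeq"].next_value()) + ".txt"


# Three runs pick up three consecutive files.
for _ in range(3):
    print(file_pattern())
```

Because the counter advances on every evaluation, each run looks for exactly one file, and a missing file number would leave later files unread until it is resolved.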

...

select * from SOURCE_CUST_SIGNUPS
where COLLECTION_DATE > {_inputMultiplier}
and COLLECTION_DATE <= to_date({_toDate}, 'yyyymmdd.hh24miss')
Remember that in this query only those parts contained in {} are interpreted by CenterView PhixFlow – the rest is written in the query language of the external database you are collecting data from.
So in this query, CenterView PhixFlow will insert _inputMultiplier and _toDate into the query before running it against the database.

...

  • Update the Effective Date to 23/03/2009 and run the Stream; a Stream Set of 6 records will be loaded – for the period 01/01/2000 (the start date of the Stream) – 23/03/2009
  • Update the Effective Date to 24/03/2009 and run the Stream again; another Stream Set of 6 records will be loaded – this time for the period 23/03/2009 - 24/03/2009
  • Update the Effective Date to 25/03/2009 and run the Stream again; a third Stream Set will be loaded – this time for the period 24/03/2009 – 25/03/2009

    This is an important configuration – although in this example the results are the same as in exercise 2.2, in some cases you cannot guarantee that the key date of your input data will be lined up with your CenterView PhixFlow run (stream set) dates. Since this configuration is based on reading from the latest date of the previous read, the dates used are entirely based on the input data and not on the CenterView PhixFlow run dates.
    Note that in practice you generally would not need to add the configuration
    <= …{_toDate}…
    because you would simply read everything after the previous read date – taking all the data available in the table after that point. So in the example your query would be:
    select * from SOURCE_CUST_SIGNUPS
    where COLLECTION_DATE > {_inputMultiplier}

  • Unset the Effective Date
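The "read everything after the previous read date" approach described above can be sketched in Python with an in-memory SQLite table. Everything here is illustrative: the sample rows are made up, the function read_new_rows is hypothetical, and the variable last_read_date only plays the role the text gives to {_inputMultiplier}. It is not a description of how CenterView PhixFlow stores or computes that value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table SOURCE_CUST_SIGNUPS (REGION text, COLLECTION_DATE text)")
# Made-up sample rows; ISO date strings sort in date order.
conn.executemany(
    "insert into SOURCE_CUST_SIGNUPS values (?, ?)",
    [("North", "2009-03-22"), ("South", "2009-03-23"), ("East", "2009-03-24")],
)


def read_new_rows(connection, last_read_date):
    """Return rows newer than last_read_date, plus the new high-water mark."""
    rows = connection.execute(
        "select REGION, COLLECTION_DATE from SOURCE_CUST_SIGNUPS "
        "where COLLECTION_DATE > ? order by COLLECTION_DATE",
        (last_read_date,),
    ).fetchall()
    # If nothing new arrived, keep the previous mark so no data is skipped later.
    new_mark = rows[-1][1] if rows else last_read_date
    return rows, new_mark


rows, mark = read_new_rows(conn, "2000-01-01")
print(len(rows), mark)
rows, mark = read_new_rows(conn, mark)
print(len(rows), mark)
```

The second call returns no rows because the mark has moved to the latest COLLECTION_DATE already read, regardless of when the run itself took place.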