Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

PhixFlow generates processing statistics which are saved to the database. These statics are for use by administrators and designers who are tuning for performance.

See System Configuration for how to enable and manage stats generation.

Processing Statistics

The following statistics are currently recorded and available as data records in the STATS table. (See STATS record definition)

The performance statistics are classified by 3 major dimensions:

Insert excerpt
_Banners
_Banners
nameadministration
nopaneltrue

This page is for administrators and application designers who need data on how the system or their application is running.

Overview

The System Configuration option Log Statistics generates processing statistics into a database table.  We recommend this option is always ticked because log statistics are very useful:

  • when troubleshooting problems
  • when tuning applications or the PhixFlow system to improve performance.

Statistics Table Structure

Processing statistics are generated in a single table, stats.

The table structure for stats is as follows:

Column
Type
Description
from_dtmDatetimeThe start time of the period that this measure refers to
to_dtmDatetimeThe end time of the period that this measure refers to
initiator_typeString

The high-level object initiating the activity - eg. "TaskPlan", or "Action"

initiator_idId (String)The ID of the initiating TaskPlan or Action / other object.
initiator_nameString

The name of the initiating object.

context_typeStringThe activity triggered by the initiator. eg "Table"
context_idId (String)

e.g. The "Tables" ID that Analysis was run on.

context_nameStringeg. The name of the table that Analysis was run on.`
full_contextString

This a dotted notation indicating the full context.

eg. TaskPlan1.TableX.in

stats_typeString

The aspect of the system's behaviour that is measured

data_typeStringThe units of the measurement
data_valueDouble

The value of the measurement


Statistics Dimensions

The performance statistics are classified by 4 major dimensions:

  • Initiator: the high-level object initiating the system activity e.g. Task Plan
  • Context: the low-level object with which the system activity is associated e.g. Pipe
  • Stats Type: the aspect of system behaviour being measured e.g. database read times
  • Sample DateTime: Aggregate values apply to a period between a start and end time. Spot values apply to the end time.

Initiator

The initiator is the object representing the user activity that caused the system activity.

Initiators include:

  • running

    ...

      • an

    ...

      • action

    ...

      • a task plan
      • a table
    • viewing data for a

    ...

    • view
    • the system- for internal activities and things that can't be allocated to any specific cause e.g. memory and CPU usage.

    Where one initiator could include another, the activity is recorded against the initiator closest to the user

    ...

    . For example:

    • the user runs an

    ...

    • action which runs a

    ...

    • task plan - the initiator is the

    ...

    • action.

    ...

    • the user runs the same

    ...

    • task plan directly - the initiator is the

    ...

    • task plan

    With the exception of

    ...

    system, the initiator is identified by name, type and id.

    Context

    The context is the most detailed object to which the activity can be attributed.

    Contexts include:

    ...

    • table-action
    • task plan
    • table
    • pipe
    • collectors: file, database or HTTP
    • exporters: file, database or HTTP
    • system

    With the exception of

    ...

    system, the context is identified by name, full name, type and id.

    The full name is a dot-separated list of objects from the highest level to the lowest, ending in the context object.

    Stats Type / Data Type

    Stats Type is the specific aspect of system behaviour being measured.

    Data Type

    Data

    ...

    type contains the units of the measurement.

    Data types that represent a snapshot value are shown as the plural unit e.g. the Data Type for the amount of Java memory used is 'bytes'.

    Data

    ...

    types that represent a rate, or amount of throughput are shown as the average rate per second e.g. the data type for the number of items generated in a

    ...

    table is 'items/s' i.e. items per second. The value is calculated by dividing the total number or amount recorded in the sampling period

    ...

    by the duration of the sampling period in seconds. The rate per second is shown rather than the absolute number so that if the sampling period is changed, the numbers (the rate per second) stay in the same range.

    Data

    ...

    types include:

    Data TypeDescription
    activities/sActivities per second
    secondsA simple time value, e.g. a maximum wait time for a database statement to execute
    seconds/s

    Seconds per second. Stats that record times cumulative times (e.g. the total of internal wait times) are normalised per second.

    Where this applies to a single-threaded stats type, the value will always be between 0.0 and 1.0, where a value near 1.0 means that that part of the system is waiting nearly 100% of the time.

    itemsA number of items (records), e.g. the number of items in a pipe cache.
    items/sThe number of items processed per second e.g. the number of items generated per second.
    ops/sA number of operations per second (e.g.

    ...

    database reads)
    bytesA total number of bytes, e.g. the number of bytes of Java memory used.
    busyThe fraction of the time that a resource is busy e.g. the CPU utilisation.
    tasksA number of tasks

    Stats

    ...

    Type

    ...

    Each Stats Types record a single

    ...

    type of data. The data type is stored in the data as an indicator of the units of the data value for each stats type.

    Activity

    Activity stats record information about the high-level activities that the system was performing at any time. In general these are user-level actions e.g. running a Task Plan.

    Stats TypeData TypeDescription
    activity.startactivities/sThe number of activities per second that started
    activity.endactivities/sThe number of activities per second that finished
    activity.timeseconds/sThe time spent per second running the activities

    ...

    Database

    Database stats record aggregate values for low-level database operations.

    Stats TypeData TypeDescription
    data.exec.timeseconds/sTime spent executing database statements.
    data.read.timeseconds/sTime spent reading from a database.
    data.read.opsops/sNumber of database read operations

    ...

    per second
    data.read.itemsitems/sNumber of items (records) read from database

    ...

    per second
    data.write.timeseconds/sTime spent writing to a database.
    data.write.opsops/sNumber of database write operations

    ...

    per second
    data.write.itemsitems/sNumber of items (records) written to database

    ...

    per second

    Data Generation

    Data Generation stats record details of table data generation.

    Stats TypeData TypeDescription
    csf.create.timeseconds/sTime spent creating candidate sets
    csf.find.timeseconds/sTime spent finding data for candidate sets
    csf.process.timeseconds/sTime spent processing candidate sets to create records.
    generate.timeitems/sNumber of items generated

    ...

    per second
    generate.itemsseconds/sTime

    ...

    spent per second generating items 

    Data Output

    These stats record details of the output phase of Data Generation, in which items are written to an output queue so that they can be written out by one or more asynchronous writer processes.

    Stats TypeData TypeDescription
    output.enqueue.timeseconds/sTime spent adding generated items to the output queue
    output.dequeue.timeseconds/sTime spent taking generated items from the output queue
    output.dequeue.itemsitems/sThe number of items/s taken from the output queue
    output.wait.timeseconds/sTime spent waiting for the output writer
    output.write.timeseconds/sTime spent writing out items
    output.write.itemsitems/sThe number of items written out / second
    output.reject.itemsitems/sThe number of items rejected / second

    ...

    Pipes (Pull)

    These stats types record various aspects of the behaviour of pull pipes

    Stats TypeData TypeDescription
    pipe.pull.idle.timeseconds/sTime spent idle
    pipe.pull.prepare.timeseconds/sTime spent preparing (creating pipe candidate sets)
    pipe.pull.prepared.timeseconds/sTime spent after preparation
    pipe.pull.process.timeseconds/sTime spent creating candidate sets
    pipe.pull.processed.timeseconds/sTime spent after processing
    pipe.pull.read.timeseconds/sTime spent reading
    pipe.pull.readresponse.timeseconds/sTime spent processing read responses
    pipe.pull.submitted.timeseconds/sTime spent submitting read requests

    Pipes (Lookups)

    These stats types record various aspects of the behaviour of lookup pipes.

    Stats TypeData TypeDescription
    lookup.added.itemsitems/sThe number of items added to a pipe cache per second
    lookup.clash.opsopsThe number of lookups where another process was reading data for the same lookup
    lookup.miss.opsops/sThe number of lookups per second not satisfied by data already in the pipe cache
    lookup.removed.itemsitems/sThe number of items removed from a pipe cache per second
    lookup.size.itemsitemsThe number of items in a pipe cache
    pipe.lookup.opsops/sThe number of lookups / second
    pipe.lookup.timeseconds/sThe time spent doing lookups / second

    ...

    System Performance

    System stats record aspects of system behaviour.

    Stats TypeData TypeDescription
    java.memory.freebytesThe amount of java memory that is free
    java.memory.usedbytesThe amount of java memory that is used
    java.memory.totalbytesThe total amount of java memory
    system.cpubusyThe fraction of the time that the cpu is busy

    ...

    Activity Statistics

    Activity Statistics are recorded from TaskPlans executed, Actions triggered.

    ...

    STATS_TYPE

    ...

    INITIATOR_TYPE

    ...

    INITIATOR_ID

    ...

    TaskPlan/Action ID

    ...

    {Number of tasks started in this timeslot. The timeslot is set in the System Configuration > System Tuning Tab

    Stats DB Logging Seconds field}

    ...

    TaskPlan/Action ID

    ...

    Stats DB Logging Seconds field}

    ...

    TaskPlan/Action ID

    ...

    TaskPlan/Action ID

    ...

    Stats DB Logging Seconds field}

    Stream Statistics

    This is a record of data volumes written to Streams and Exporters

    ...

    STATS_TYPE

    ...

    INITIATOR_TYPE

    ...

    INITIATOR_ID

    ...

    TaskPlan/Action ID

    ...

    TaskPlan/Action ID

    ...

    TaskPlan/Action ID

    ...

    insertVolumeBytes

    (bytes)

    ...

    TaskPlan/Action ID

    ...

    insertTimeTotal

    (seconds)

    ...

    TaskPlan/Action ID

    ...

    TaskPlan/Action ID

    ...

    TaskPlan/Action ID

    ...

    updateVolumeBytes

    (bytes)

    ...

    TaskPlan/Action ID

    ...

    updateTimeTotal

    (seconds)

    ...

    The timeslot is set in the System Configuration > System Tuning Tab

    Stats DB Logging Seconds field

    Pipe Statistics

    This is a record of Pipe Read Statistics

    ...

    STATS_TYPE

    ...

    INITIATOR_TYPE

    ...

    INITIATOR_ID

    ...

    TaskPlan/Action ID

    ...

    TaskPlan/Action ID

    ...

    readTime

    (seconds)

    ...

    TaskPlan/Action ID

    ...

    readTimeMax

    (seconds)

    ...

    TaskPlan/Action ID

    ...

    This is a record of Pipe Queue Statistics

    ...

    STATS_TYPE

    ...

    INITIATOR_TYPE

    ...

    INITIATOR_ID

    ...

    TaskPlan/Action ID

    ...

    sumEnqueueTime

    (seconds)

    ...

    TaskPlan/Action ID

    ...

    TaskPlan/Action ID

    ...

    sumDequeueTime

    (seconds)

    ...

    TaskPlan/Action ID

    ...

    This is a record of Pipe Lookup Statistics

    ...

    STATS_TYPE

    ...

    INITIATOR_TYPE

    ...

    INITIATOR_ID

    ...

    TaskPlan/Action ID

    ...

    TaskPlan/Action ID

    ...

    TaskPlan/Action ID

    ...

    TaskPlan/Action ID

    ...

    TaskPlan/Action ID

    ...

    TaskPlan/Action ID

    ...

    Database Statistics Record Structure

    The table structure for STATS is as follows:

    ...

    Member

    ...

    Class

    ...

    Description

    ...

    Statistics Type:

    E.g. 'Activity, Pipe, PipeLookup, PipeQueue, Stream'

    ...

    The object initiating the activity - eg. "TaskPlan", or "Action"

    ...

    The name of the initiating TaskPlan or Action / other object.

    ...

    e.g. The "Streams" ID that Analysis was run on.

    ...

    This a dotted notation indicating the full context.

    eg. TaskPlan1.Stream X.in

    ...

    Work Queues

    PhixFlow maintains a number of queues for asynchronous task work / task processing.

    These statistics record the occupancy levels of these queues, comprising for each queue the number of tasks queued (waiting) + the number running.

    Stats TypeData TypeDescription
    workqueue.cspt.sizetasksThe number of tasks in the CSPT work queue
    workqueue.dgr.sizetasksThe number of tasks in the DGR work queue
    workqueue.pdp.sizetasksThe number of tasks in the PDP work queue
    workqueue.prepare.sizetasksThe number of tasks in the PREPARE work queue
    workqueue.read.sizetasksThe number of tasks in the READ work queue
    workqueue.write.sizetasksThe number of tasks in the WRITE work queue
    workqueue.other.sizetasksThe number of tasks in the OTHER work queue
    workqueue.export.sizetasksThe number of tasks in the EXPORT work queue
    workqueue.view.sizetasksThe number of tasks in the VIEW work queue

    Examples

    These are examples of extracting specific sub-sets of the stats data.

    These examples show the use of direct sql statements to extract useful sub-sets of the stats data, but you could also import all data into a table and analyse it there.

    Monitoring Memory Usage

    For example, to extract Java memory total and free:

    Code Block
    select *
    from stats
    where stats_type in ('java.memory.free', 'java.memory.total')
    order by to_dtm;

    Monitoring Pipe Cache Sizes

    For example, to extract all pipe caches over a certain size:

    Code Block
    select *
    from stats
    where stats_type = 'lookup.size.items' and data_value>1000000
    order by full_context, to_dtm;