Skip to end of metadata
Go to start of metadata

You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 3 Next »

PhixFlow generates processing statistics which are saved to the database. These statics are for use by administrators and designers who are tuning for performance.

See System Configuration for how to enable and manage stats generation.

Processing Statistics

The following statistics are currently recorded and available as data records in the STATS table. (See STATS record definition)

The performance statistics are classified by 3 major dimensions:

Initiator

The initiator is the object representing the user activity that caused the system activity.

Initiators include

  • (Running an) Action
  • (Running a) Task Plan
  • (Running a) Stream
  • (Viewing data for a) Stream View
  • System, for internal activities and things that can't be allocated to any specific cause e.g. memory and CPU usage.

Where one initiator could include another, the activity is recorded against the initiator closest to the user

e.g. the user runs an Action which runs a Task Plan - the initiator is the Action.

e.g. the user runs the same Task Plan directly - the initiator is the Task Plan

With the exception of System, the initiator is identified by name, type and id.

Context

The context is the most detailed object to which the activity can be attributed.

Contexts include

  • Stream Action
  • Task Plan
  • Stream
  • Pipe
  • File / Database / Http Collector
  • File / Database / Http Exporter
  • System

With the exception of System, the context is identified by name, full name, type and id.

The full name is a dot-separated list of objects from the highest level to the lowest, ending in the context object.

Stats Type / Data Type

Stats Type is the specific aspect of system behaviour being measured.

Data Type

Data Type contains the units of the measurement.

Data types that represent a snapshot value are shown as the plural unit e.g. the Data Type for the amount of Java memory used is 'bytes'.

Data Types that represent a rate, or amount of throughput are shown as the average rate per second e.g. the data type for the number of items generated in a stream is 'items/s' i.e. items per second. The value is calculated by dividing the total number or amount recorded in the sampling period and dividing it by the duration of the sampling period in seconds. The rate per second is shown rather than the absolute number so that if the sampling period is changed, the numbers (the rate per second) stay the same

Data Types include

Data TypeDescription
activities/sActivities per second
secondsA simple time value, e.g. a maximum wait time for a database statement to execute
seconds/s

Seconds per second. Stats that record times cumulative times (e.g. the total of internal wait times) are normalised per second.

Where this applies to a single-threaded stats type, the value will always be between 0.0 and 1.0, where a value near 1.0 means that that part of the system is waiting nearly 100% of the time.

itemsA number of items (records), e.g. the number of items in a pipe cache.
items/sThe number of items processed per second e.g. the number of items generated per second.
ops/sA number of operations per second (e.g. databaes reads)
bytesA total number of bytes, e.g. the number of bytes of Java memory used.
busyThe fraction of the time that a resource is busy e.g. the CPU utilisation.
taskA number of tasks

Stats Type

Each Stats Types record a single type of data. The data type is stored in the data as an indicator of the units of the data value for each stats type.

Activity

Activity stats record information about the high-level activities that the system was performing at any time. In general these are user-level actions e.g. running a Task Plan.

Stats TypeData TypeDescription
activity.startactivities/sThe number of activities that started
activity.endactivities/sThe number of activities that finished
activity.timeseconds/sThe time spent running the activities

System Performance

System stats record aspects of system behaviour.

Stats TypeData TypeDescription
java.memory.freebytesThe amount of java memory that is free
java.memory.usedbytesThe amount of java memory that is used
java.memory.totalbytesThe total amount of java memory
system.cpubusyThe fraction of the time that the cpu is busy

Database

Database stats record aggregate values for low-level database operations.

Stats TypeData TypeDescription
data.exec.timeseconds/sTime spent executing database statements.
data.read.timeseconds/sTime spent reading from a database.
data.read.opsops/sNumber of database read operations / second
data.read.itemsitems/sNumber of items (records) read from database / second
data.write.timeseconds/sTime spent writing to a database.
data.write.opsops/sNumber of database write operations / second
data.write.itemsitems/sNumber of items (records) written to database / second

Data Generation

Data Generation stats record details of stream data generation.

Stats TypeData TypeDescription
csf.create.timeseconds/sTime spent creating candidate sets
csf.find.timeseconds/sTime spent finding data for candidate sets
csf.process.timeseconds/sTime spent processing candidate sets to create stream items.
generate.timeitems/sNumber of items generated / second
generate.itemsseconds/sTime spent per second generating items 

Data Output

Data Output stats record details of the output phase of Data Gneration, where items are written to an output queue so that they can be written out by one or more writers.

Stats TypeData TypeDescription
output.enqueue.timeseconds/sTime spent adding generated items to the output queue
output.dequeue.timeseconds/sTime spent taking generated items from the output queue
output.dequeue.itemsitems/sThe number of items/s taken from the output queue
output.wait.timeseconds/sTime spent waiting for the output writer
output.write.timeseconds/sTime spent writing out items
output.write.itemsitems/sThe number of items written out / second
output.reject.itemsitems/sThe number of items rejected / second

Pipes (Pull)

These stats types record various aspects of the behaviour of pull pipes

Stats TypeData TypeDescription
pipe.pull.idle.timeseconds/sTime spent idle
pipe.pull.prepare.timeseconds/sTime spent preparing (creating pipe candidate sets)
pipe.pull.prepared.timeseconds/sTime spent after preparation
pipe.pull.process.timeseconds/sTime spent creating candidate sets
pipe.pull.processed.timeseconds/sTime spent after processing
pipe.pull.read.timeseconds/sTime spent reading
pipe.pull.readresponse.timeseconds/sTime spent processing read responses
pipe.pull.submitted.timeseconds/sTime spent submitting read requests

Pipes (Lookups)

These stats types record various aspects of the behaviour of lookup pipes.

Stats TypeData TypeDescription
lookup.added.itemsitems/sThe number of items added to a pipe cache per second
lookup.clash.opsopsThe number of lookups where another process was reading data for the same lookup
lookup.miss.opsops/sThe number of lookups per second not satisfied by data already in the pipe cache
lookup.removed.itemsitems/sThe number of items removed from a pipe cache per second
lookup.size.itemsitemsThe number of items in a pipe cache
pipe.lookup.opsops/sThe number of lookups / second
pipe.lookup.timeseconds/sThe time spent doing lookups / second

Work Queues

PhixFlow maintains a number of queues for asynchronous task work / task processing.

These statistics record the occupancy levels of these queues, comprising for each queue the number of tasks queued (waiting) + the number running.

Stats TypeData TypeDescription
workqueue.cspt.sizetasksThe number of tasks in the CSPT work queue
workqueue.dgr.sizetasksThe number of tasks in the DGR work queue
workqueue.pdp.sizetasksThe number of tasks in the PDP work queue
workqueue.prepare.sizetasksThe number of tasks in the PREPARE work queue
workqueue.read.sizetasksThe number of tasks in the READ work queue
workqueue.write.sizetasksThe number of tasks in the WRITE work queue
workqueue.other.sizetasksThe number of tasks in the OTHER work queue
workqueue.export.sizetasksThe number of tasks in the EXPORT work queue
workqueue.view.sizetasksThe number of tasks in the VIEW work queue





kup Statistics

STATS_TYPE
INITIATOR_TYPE
INITIATOR_ID
INITIATOR_NAMECONTEXT_TYPECONTEXT_IDCONTEXT_NAMEFULL_CONTEXTDATA_TYPEDATA_VALUE
PipeTaskPlan, Action

TaskPlan/Action ID

TaskPlan/Action namePipePipe IDPipe NameTaskPlan/Action Name.Stream Name.pipe NamecacheSize{Size of lookup pipe cache}
PipeTaskPlan, Action

TaskPlan/Action ID

TaskPlan/Action namePipePipe IDPipe NameTaskPlan/Action Name.Stream Name.pipe NameaddedToCache{Lookup pipe added to cache}
PipeTaskPlan, Action

TaskPlan/Action ID

TaskPlan/Action namePipePipe IDPipe NameTaskPlan/Action Name.Stream Name.pipe NameremovedFromCache{Lookup pipe removed from cache}
PipeTaskPlan, Action

TaskPlan/Action ID

TaskPlan/Action namePipePipe IDPipe NameTaskPlan/Action Name.Stream Name.pipe NamecacheHits{Count a read done from cache}
PipeTaskPlan, Action

TaskPlan/Action ID

TaskPlan/Action namePipePipe IDPipe NameTaskPlan/Action Name.Stream Name.pipe NamecacheMisses{Count a read not yet from the cache}
PipeTaskPlan, Action

TaskPlan/Action ID

TaskPlan/Action namePipePipe IDPipe NameTaskPlan/Action Name.Stream Name.pipe NamecacheParallelMisses{Total reads from cache by two simulatneous processes}

Database Statistics Record Structure

The table structure for STATS is as follows:

Member
Class
Description
stats_typeenum

Statistics Type:

E.g. 'Activity, Pipe, PipeLookup, PipeQueue, Stream'

fromDtmDatetimeThe start time of the period that this measure refers to
toDtmDatetimeThe end time of the period that this measure refers to
initiator_typeString

The object initiating the activity - eg. "TaskPlan", or "Action"

initiator_idUidThe ID of the initiating TaskPlan or Action / other object.
initiator_nameString

The name of the initiating TaskPlan or Action / other object.

context_typeStringThe activity triggered by the initiator. eg "Stream"
context_idUid

e.g. The "Streams" ID that Analysis was run on.

context_nameStringeg. The name of the stream that Analysis was run on.`
full_contextString

This a dotted notation indicating the full context.

eg. TaskPlan1.Stream X.in

data_typeStringThe type of statistics data being recorded for this stats type. Eg. totalRecordsWritten, totalWriteTime.
data_valueDouble

The value of the statistic being recorded for the given data_type.

  • No labels