This page is for administrators and application designers who need data on how the system or their application is running.

Overview

The System Configuration option Log Statistics writes processing statistics to a database table. We recommend that this option is always ticked, because the statistics are very useful:

when troubleshooting problems
when tuning applications or the PhixFlow system to improve performance.

Statistics Table Structure

Processing statistics are generated in a single table, stats.

The table structure for stats is as follows:

Column	Type	Description
from_dtm	Datetime	The start time of the period that this measure refers to
to_dtm	Datetime	The end time of the period that this measure refers to
initiator_type	String	The high-level object initiating the activity, e.g. "TaskPlan" or "Action".
initiator_id	Id (String)	The ID of the initiating task plan, action or other object.
initiator_name	String	The name of the initiating object.
context_type	String	The activity triggered by the initiator, e.g. "Table".
context_id	Id (String)	The ID of the context object, e.g. the ID of the table that analysis was run on.
context_name	String	The name of the context object, e.g. the name of the table that analysis was run on.
full_context	String	A dotted notation indicating the full context, e.g. TaskPlan1.TableX.in
stats_type	String	The aspect of the system's behaviour that is measured
data_type	String	The units of the measurement
data_value	Double	The value of the measurement
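
As a quick orientation, the following query lists which measurements are present in the table and the units each one is stored in. It is a minimal sketch that assumes only the stats table and the column names listed above; the sample_count alias is illustrative, and the syntax may need adjusting for your database.

```sql
-- List each stats type recorded so far, with its units and the number of samples.
select stats_type, data_type, count(*) as sample_count
from stats
group by stats_type, data_type
order by stats_type;
```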

Statistics Dimensions

The performance statistics are classified by 4 major dimensions:

Initiator: the high-level object initiating the system activity e.g. Task Plan
Context: the low-level object with which the system activity is associated e.g. Pipe
Stats Type: the aspect of system behaviour being measured e.g. database read times
Sample DateTime: Aggregate values apply to a period between a start and end time. Spot values apply to the end time.

Initiator

The initiator is the object representing the user activity that caused the system activity.

Initiators include:

running
- an action
- a task plan
- a table
viewing data for a view
the system, for internal activities and anything that cannot be allocated to a specific cause, e.g. memory and CPU usage.

Where one initiator could include another, the activity is recorded against the initiator closest to the user. For example:

the user runs an action which runs a task plan - the initiator is the action
the user runs the same task plan directly - the initiator is the task plan

With the exception of system, the initiator is identified by name, type and id.
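
To see how activity breaks down by initiator, a query along the following lines may help. It is a sketch that assumes the activity.time stats type, described later on this page, is being recorded against each initiator; the avg_activity_time alias is illustrative.

```sql
-- Average time per second spent running activities, grouped by initiator.
select initiator_type, initiator_name, avg(data_value) as avg_activity_time
from stats
where stats_type = 'activity.time'
group by initiator_type, initiator_name
order by avg_activity_time desc;
```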

Context

The context is the most detailed object to which the activity can be attributed.

Contexts include:

table-action
task plan
table
pipe
collectors: file, database or HTTP
exporters: file, database or HTTP
system

With the exception of system, the context is identified by name, full name, type and id.

The full name is a dot-separated list of objects from the highest level to the lowest, ending in the context object.
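
Because the full name starts with the highest-level object, a prefix match on full_context should select everything recorded beneath that object. The sketch below reuses the TaskPlan1 name from the table structure description as a placeholder; substitute the name of one of your own objects.

```sql
-- All statistics recorded for contexts beneath the object named TaskPlan1
-- (placeholder name taken from the full_context example).
select to_dtm, full_context, stats_type, data_type, data_value
from stats
where full_context like 'TaskPlan1.%'
order by full_context, to_dtm;
```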

Stats Type / Data Type

Stats Type is the specific aspect of system behaviour being measured.

Data Type

Data type contains the units of the measurement.

Data types that represent a snapshot value are shown as the plural unit e.g. the Data Type for the amount of Java memory used is 'bytes'.

Data types that represent a rate or an amount of throughput are shown as the average rate per second, e.g. the data type for the number of items generated in a table is 'items/s', i.e. items per second. The value is calculated by dividing the total number or amount recorded in the sampling period by the duration of the sampling period in seconds. For example, 600 items generated during a 60-second sampling period are recorded as 10 items/s. The rate per second is shown rather than the absolute number so that the values stay in the same range if the sampling period is changed.

Data types include:

Data Type	Description
activities/s	Activities per second
seconds	A simple time value, e.g. a maximum wait time for a database statement to execute
seconds/s	Seconds per second. Stats that record cumulative times (e.g. the total of internal wait times) are normalised per second. Where this applies to a single-threaded stats type, the value will always be between 0.0 and 1.0, where a value near 1.0 means that this part of the system is waiting nearly 100% of the time.
items	A number of items (records), e.g. the number of items in a pipe cache.
items/s	The number of items processed per second e.g. the number of items generated per second.
ops/s	A number of operations per second (e.g. database reads)
bytes	A total number of bytes, e.g. the number of bytes of Java memory used.
busy	The fraction of the time that a resource is busy e.g. the CPU utilisation.
tasks	A number of tasks

Stats Type

Each Stats Type records a single type of data. The data type is stored with each record to indicate the units of the data value for that stats type.

Activity

Activity stats record information about the high-level activities that the system was performing at any time. In general these are user-level actions e.g. running a Task Plan.

Stats Type	Data Type	Description
activity.start	activities/s	The number of activities per second that started
activity.end	activities/s	The number of activities per second that finished
activity.time	seconds/s	The time spent per second running the activities

Database

Database stats record aggregate values for low-level database operations.

Stats Type	Data Type	Description
data.exec.time	seconds/s	Time spent executing database statements.
data.read.time	seconds/s	Time spent reading from a database.
data.read.ops	ops/s	Number of database read operations per second
data.read.items	items/s	Number of items (records) read from database per second
data.write.time	seconds/s	Time spent writing to a database.
data.write.ops	ops/s	Number of database write operations per second
data.write.items	items/s	Number of items (records) written to database per second
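
For example, to find the contexts that spend the most time reading from the database, the read time can be averaged per context. This is a sketch that uses only the columns and stats types documented on this page; the avg_read_time alias is illustrative.

```sql
-- Average time per second spent reading from the database, per context.
select full_context, avg(data_value) as avg_read_time
from stats
where stats_type = 'data.read.time'
group by full_context
order by avg_read_time desc;
```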

Data Generation

Data Generation stats record details of table data generation.

Stats Type	Data Type	Description
csf.create.time	seconds/s	Time spent creating candidate sets
csf.find.time	seconds/s	Time spent finding data for candidate sets
csf.process.time	seconds/s	Time spent processing candidate sets to create records.
generate.time	seconds/s	Time spent per second generating items
generate.items	items/s	Number of items generated per second

Data Output

These stats record details of the output phase of Data Generation, in which items are written to an output queue so that they can be written out by one or more asynchronous writer processes.

Stats Type	Data Type	Description
output.enqueue.time	seconds/s	Time spent adding generated items to the output queue
output.dequeue.time	seconds/s	Time spent taking generated items from the output queue
output.dequeue.items	items/s	The number of items/s taken from the output queue
output.wait.time	seconds/s	Time spent waiting for the output writer
output.write.time	seconds/s	Time spent writing out items
output.write.items	items/s	The number of items written out / second
output.reject.items	items/s	The number of items rejected / second
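
Rejected items are often the first thing to check when output looks incomplete. The following sketch lists the sampling periods in which any items were rejected; it assumes only the columns and the output.reject.items stats type described on this page.

```sql
-- Sampling periods and contexts in which at least one item was rejected.
select from_dtm, to_dtm, full_context, data_value
from stats
where stats_type = 'output.reject.items' and data_value > 0
order by to_dtm;
```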

Pipes (Pull)

These stats types record various aspects of the behaviour of pull pipes.

Stats Type	Data Type	Description
pipe.pull.idle.time	seconds/s	Time spent idle
pipe.pull.prepare.time	seconds/s	Time spent preparing (creating pipe candidate sets)
pipe.pull.prepared.time	seconds/s	Time spent after preparation
pipe.pull.process.time	seconds/s	Time spent creating candidate sets
pipe.pull.processed.time	seconds/s	Time spent after processing
pipe.pull.read.time	seconds/s	Time spent reading
pipe.pull.readresponse.time	seconds/s	Time spent processing read responses
pipe.pull.submitted.time	seconds/s	Time spent submitting read requests

Pipes (Lookups)

These stats types record various aspects of the behaviour of lookup pipes.

Stats Type	Data Type	Description
lookup.added.items	items/s	The number of items added to a pipe cache per second
lookup.clash.ops	ops	The number of lookups where another process was reading data for the same lookup
lookup.miss.ops	ops/s	The number of lookups per second not satisfied by data already in the pipe cache
lookup.removed.items	items/s	The number of items removed from a pipe cache per second
lookup.size.items	items	The number of items in a pipe cache
pipe.lookup.ops	ops/s	The number of lookups / second
pipe.lookup.time	seconds/s	The time spent doing lookups / second

System Performance

System stats record aspects of system behaviour.

Stats Type	Data Type	Description
java.memory.free	bytes	The amount of Java memory that is free
java.memory.used	bytes	The amount of Java memory that is used
java.memory.total	bytes	The total amount of Java memory
system.cpu	busy	The fraction of the time that the CPU is busy
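
Since system.cpu is recorded as a fraction, periods of heavy load can be picked out with a simple threshold. In the sketch below the value 0.9 (CPU busy more than 90% of the time) is an illustrative assumption, not a recommended limit.

```sql
-- Sampling periods in which the CPU was busy more than 90% of the time.
-- The 0.9 threshold is illustrative; choose a value that suits your system.
select from_dtm, to_dtm, data_value
from stats
where stats_type = 'system.cpu' and data_value > 0.9
order by to_dtm;
```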

Work Queues

PhixFlow maintains a number of queues for asynchronous task processing.

These statistics record the occupancy level of each queue: the number of tasks queued (waiting) plus the number running.

Stats Type	Data Type	Description
workqueue.cspt.size	tasks	The number of tasks in the CSPT work queue
workqueue.dgr.size	tasks	The number of tasks in the DGR work queue
workqueue.pdp.size	tasks	The number of tasks in the PDP work queue
workqueue.prepare.size	tasks	The number of tasks in the PREPARE work queue
workqueue.read.size	tasks	The number of tasks in the READ work queue
workqueue.write.size	tasks	The number of tasks in the WRITE work queue
workqueue.other.size	tasks	The number of tasks in the OTHER work queue
workqueue.export.size	tasks	The number of tasks in the EXPORT work queue
workqueue.view.size	tasks	The number of tasks in the VIEW work queue
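
To compare the queues, it can be useful to look at the peak occupancy of each one. The following is a minimal sketch that relies on all of the work queue stats types sharing the workqueue. prefix, as in the table above; the peak_tasks alias is illustrative.

```sql
-- Highest recorded occupancy (queued plus running tasks) for each work queue.
select stats_type, max(data_value) as peak_tasks
from stats
where stats_type like 'workqueue.%'
group by stats_type
order by peak_tasks desc;
```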

Examples

These examples use direct SQL statements to extract useful subsets of the stats data. Alternatively, you could import all of the data into a table and analyse it there.

Monitoring Memory Usage

For example, to extract Java memory total and free:

select *
from stats
where stats_type in ('java.memory.free', 'java.memory.total')
order by to_dtm;

Monitoring Pipe Cache Sizes

For example, to extract all pipe caches over a certain size:

select *
from stats
where stats_type = 'lookup.size.items' and data_value > 1000000
order by full_context, to_dtm;
