This page is for a data modeller who needs to load data from an external source via HTTP.
Overview
An HTTP collector reads data from an HTTP datasource. The collector defines how the data needed from the datasource is extracted for use in PhixFlow.
To add a new HTTP collector to an analysis model:
- Go to the model's toolbar → Create group.
- Click HTTP to expand the menu.
- Drag an HTTP Collector onto the analysis model.
To add an existing HTTP collector to an analysis model, in the model diagram toolbar:
- Go to the model toolbar → List group.
- Click HTTP to expand the menu.
- Click HTTP Collector to open the list of available collectors.
- Drag an HTTP collector into the analysis model.
Table Values in an HTTP Collector
To drive the lookups made by an HTTP collector from a table, the two must be connected using a lookup pipe. For example, a URL for a server can either be captured or calculated in an attribute called "ServerURL". The URL is then passed via a lookup pipe to the HTTP collector to be used in its URL Expression.
If the pipe is called in, here is how the URL Expression would be written on the HTTP collector: {substring(in.ServerUrl,9)}
HTTP Collector Properties
Basic Settings
Field | Description |
---|---|
Name | Enter the name of the HTTP collector. |
Enabled | |
HTTP Datasource | Select the HTTP datasource that this collector will collect from. To add a new one, see HTTP Datasource Properties. The datasource can also be selected from a list. |
HTTP Request Method | |
Icon | Enter the name of an icon to display in controls when this HTTP collector is used. |
Timeout (secs) | The number of seconds to wait for a response from the corresponding HTTP datasource before a timeout is recorded. |
Allow Non-Scheduled Collection | If this is turned on, the collector will run as part of any ad-hoc Analysis Engine run which requires this data. If not, it will only run as part of a scheduled Task Plan under the Analysis Engine. |
Datasource Instance Expression | The datasource to which this collector is connected may list multiple instances from which the data may be accessed. Each HTTP datasource instance is identified by a unique string. This expression should evaluate to a string which allows the collector to determine the specific instance to use. If the expression is blank, the collector will assume that there is only one instance and will use that one by default. If there is more than one instance and no expression is provided here, an error will be thrown during analysis, since the collector will be unable to determine which source to use. See also HTTP datasource instance and Expressions and PhixScripts. |
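The instance selection rules described for Datasource Instance Expression can be sketched in Python as follows. This is only an illustration of the documented behaviour, not PhixFlow code: the function name `resolve_instance`, the dictionary of instances, and the handling of an unknown instance identifier are all assumptions made for the example.

```python
def resolve_instance(instances, expression_result=None):
    """Pick a datasource instance (illustrative sketch, not PhixFlow code).

    instances: dict mapping each instance's unique string to its settings.
    expression_result: the string the Datasource Instance Expression
    evaluated to, or None/"" if the expression was left blank.
    """
    if expression_result:
        # Assumption: an identifier that matches no instance is an error.
        if expression_result not in instances:
            raise LookupError(f"no instance named {expression_result!r}")
        return instances[expression_result]
    if len(instances) == 1:
        # Blank expression: the single instance is used by default.
        return next(iter(instances.values()))
    # Blank expression with several (or zero) instances: cannot choose.
    raise LookupError("cannot determine which datasource instance to use")


# Hypothetical instance identifiers and URLs, for illustration only.
example = {"uk": "uk.example.com/api", "us": "us.example.com/api"}
print(resolve_instance(example, "us"))
print(resolve_instance({"only": "example.com/api"}))
```

Calling `resolve_instance(example)` with no expression result raises an error, which mirrors the error thrown during analysis when several instances exist and no expression is provided.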
Send Message
Define details of the HTTP request sent to the HTTP Datasource to get the data required.
Field | Description |
---|---|
URL Expression | The URL to be used, without the leading http:// prefix. The URL may contain embedded expressions within { }. If this field is blank, the URL field on the httpDatasourceInstance is used directly. An expression can also add to the base URL provided by the HTTP datasource instance. The HTTP collector will follow any HTTP redirections and return the final response. See also HTTP datasource instance and Expressions and PhixScripts. |
Statement Expression | An expression to generate the data that will be sent by the collector to the datasource. The username and password for the HTTP datasource instance are available to the expression. The data will be encoded using the charset parameter specified by the Content-Type header if one is present. If no Content-Type header is set, ISO-8859-1 will be used. If the Content-Type header is set but does not specify a charset, PhixFlow will use a default character set dependent on the content type. |
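The character set rules for the statement data can be sketched in Python as follows. This is a hedged illustration of the three documented cases, not PhixFlow's implementation. The per-content-type defaults in `HYPOTHETICAL_TYPE_DEFAULTS` and the final fallback are assumptions, because the page does not list the defaults PhixFlow actually uses.

```python
from email.message import Message

NO_HEADER_CHARSET = "ISO-8859-1"  # documented default when no Content-Type is set

# Assumption: placeholder defaults per content type; the real values are not documented here.
HYPOTHETICAL_TYPE_DEFAULTS = {
    "application/json": "UTF-8",
    "text/plain": "ISO-8859-1",
}


def pick_charset(content_type):
    """Return the charset used to encode the request body (illustrative sketch)."""
    if content_type is None:
        return NO_HEADER_CHARSET
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_param("charset")  # None when no charset parameter is present
    if charset:
        return charset
    # Header set without a charset: fall back to a per-type default (assumed values).
    return HYPOTHETICAL_TYPE_DEFAULTS.get(msg.get_content_type(), NO_HEADER_CHARSET)


def encode_body(body, content_type):
    """Encode the statement text with the charset chosen above."""
    return body.encode(pick_charset(content_type))


print(pick_charset(None))
print(pick_charset("text/xml; charset=utf-8"))
print(encode_body("café", "text/plain; charset=utf-8"))
```

The standard library `email.message.Message` class is used here only as a convenient parser for the Content-Type header value.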
HTTP Headers
This section has a toolbar with standard buttons. The grid contains a list of the HTTP headers defined for this collector. To add an HTTP header to the list, use the toolbar.
Some headers will be set to default values if not provided. Automatically added headers may not appear in the debug log. The Content-Length header will be added to all requests and cannot be overridden by providing a value.
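The header behaviour described above can be sketched in Python as follows. This is an illustration under stated assumptions, not PhixFlow's implementation: the default headers in `HYPOTHETICAL_DEFAULTS` are invented for the example, and the function name `build_headers` is hypothetical.

```python
# Assumption: example defaults only; the real default headers are not listed on this page.
HYPOTHETICAL_DEFAULTS = {"Accept": "*/*", "User-Agent": "example-client"}


def build_headers(user_headers, body):
    """Merge user-defined headers with defaults (illustrative sketch).

    Header names are compared case-insensitively. Content-Length is always
    computed from the body, so a user-supplied value is ignored.
    """
    merged = {name.lower(): (name, value) for name, value in HYPOTHETICAL_DEFAULTS.items()}
    for name, value in user_headers.items():
        merged[name.lower()] = (name, value)  # a user value replaces a default
    merged["content-length"] = ("Content-Length", str(len(body)))
    return dict(merged.values())


headers = build_headers({"accept": "application/xml", "Content-Length": "999"}, b"hello")
print(headers)
```

In this sketch the user-supplied `Content-Length` of 999 is discarded and replaced by the actual body length.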
Response
Define the data response type/format that will be returned:
- HTML: response type allows an XPath expression to be specified in order to retrieve only the specified sections of the data into XML structures.
- XML: response type allows an XPath expression to be specified in order to retrieve only the specified sections of the data into XML structures. XML response types also support XML namespaces. The XML Namespaces tab will be available when this response type is chosen.
- String: response type will return the full data as a string value.
Please see Response Examples for how the returned data can be used and evaluated in the corresponding stream attribute expressions.
Field | Description |
---|---|
Return Type | The type of the expected response: XML, HTML or String. |
XPath | The XPath expression used to resolve or filter the data that comes back in XML or HTML format. |
XML Namespaces
This section has a toolbar with standard buttons. The grid contains a list of the namespaces defined in an XML response.
To add a namespace to the list or remove one from it, use the toolbar buttons.
Advanced
Field | Description |
---|---|
Log Traffic | |
Response Examples
XML Data
    <?xml version="1.0"?>
    <root xmlns:h="http://www.w3.org/TR/html4/">
      <main page="PF Main Page">
        <h:title h:name="PF Title">PF Title Text
          <h:datarow>
            <h:data h:initials="AA">Alistair Andrews</h:data>
            <h:data h:initials="BB">Bert Brown</h:data>
          </h:datarow>
        </h:title>
        <title name="Non namespace Title">Non namespace Title Text</title>
      </main>
    </root>
HTML Data
    <html>
      <body nodename="Html Body">
        <table>
          <tbody>
            <tr>
              <td initials="AA">Alistair Andrews</td>
              <td initials="BB">Bert Brown</td>
            </tr>
          </tbody>
        </table>
      </body>
    </html>
This data is referenced either by HTTP datasources or by XML/HTML file collectors.
The following table shows the different types of responses that can be returned from an HTTP collector and how these can be used in the corresponding stream attribute expressions. An HTTP collector response type of XML or HTML will mimic the responses from XML or HTML collectors respectively.
Response Type | XPath Expression | Explanation |
---|---|---|
String | n/a | A String response should be referenced in the stream attribute expressions as in.value. Note that in.value will contain the complete string data referenced above. |
XML | /root/main/h:title | This XPath expression will bring back all elements matching the XPath expression, including the parents/grandparents and all child elements/subelements. |
HTML | /html/body/table | This XPath expression will bring back all elements matching the XPath expression, including the parents/grandparents and all child elements/subelements. |
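To see what the XML XPath expression selects, the example XML data can be queried outside PhixFlow. The sketch below uses Python's standard `xml.etree.ElementTree` module. It is only an approximation of the collector's behaviour: ElementTree supports a subset of XPath and evaluates paths relative to the root element, so `/root/main/h:title` is written as `main/h:title`, and the prefix-to-URI mapping plays the role of the XML Namespaces settings. The helper names `select_titles` and `describe` are hypothetical.

```python
import xml.etree.ElementTree as ET

XML_DATA = """<?xml version="1.0"?>
<root xmlns:h="http://www.w3.org/TR/html4/">
  <main page="PF Main Page">
    <h:title h:name="PF Title">PF Title Text
      <h:datarow>
        <h:data h:initials="AA">Alistair Andrews</h:data>
        <h:data h:initials="BB">Bert Brown</h:data>
      </h:datarow>
    </h:title>
    <title name="Non namespace Title">Non namespace Title Text</title>
  </main>
</root>"""

# Prefix-to-URI mapping, comparable to an entry in the XML Namespaces grid.
NAMESPACES = {"h": "http://www.w3.org/TR/html4/"}
H = "{http://www.w3.org/TR/html4/}"  # ElementTree's notation for namespaced names


def select_titles(xml_text):
    """Return the elements matched by /root/main/h:title."""
    root = ET.fromstring(xml_text)
    return root.findall("main/h:title", NAMESPACES)


def describe(title):
    """Collect the attribute, text and child data of one matched element."""
    return {
        "name": title.get(H + "name"),
        "text": (title.text or "").strip(),
        "people": [
            (data.get(H + "initials"), data.text)
            for data in title.findall("h:datarow/h:data", NAMESPACES)
        ],
    }


for title in select_titles(XML_DATA):
    print(describe(title))
```

Only the namespaced `h:title` element is matched; the non-namespaced `title` element is not, which shows why the namespace prefix must be declared for an XML response.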