Importing Data Through an API

Overview

PhixFlow allows you to integrate with various APIs to import data. To achieve this you will require three elements:

  1. HTTP Datasource: Provides the information needed to connect to an external source of data via HTTP.
  2. HTTP Collector: Reads data from the HTTP Datasource. It defines how the data from the datasource is extracted and makes the API call to perform the extraction.
  3. Table: Stores the data from the HTTP Collector.

Example

We will connect to the UK Government bank holiday API and return the bank holiday dates for each country. See Bank Holidays for more detail on the API.

Solution

 HTTP Datasource

  1. Drag an HTTP Datasource from the toolbar and drop it on the analysis canvas.
  2. In the properties window set the following:
    1. Name, Set a name indicative of the API.
    2. Enabled, Tick this option. 
    3. Connection Type, Set this to match the API. In our example we will use HTTPS.
    4. HTTP Datasource Instances, Set the instance connection details. Add a new HTTP Datasource Instance and complete the details as follows:
      1. Name, Indicative of the use of the instance.
      2. Enabled, Tick to use the instance.
      3. Login details can be set if required. See HTTP Datasource for more information.
      4. URL, The URL where the API can be found. For the Gov Bank Holiday API we set this to: www.gov.uk/bank-holidays.json
      5. Click Apply and Close.
    5. Click Apply and Close.

 HTTP Collector

  1. Hover your mouse over the HTTP Datasource created in the stage above.
  2. From the popup menu select HTTP Collector. This adds a new HTTP Collector and sets it up to use our HTTP Datasource.
    1. You can drag a new HTTP Collector from the toolbar, but you will need to connect it to the HTTP Datasource.
  3. In the properties window that opens on the right set the following:
    1. Name, Set a name indicative of the data being collected. 
    2. Enabled, Tick this to use the collector.
    3. HTTP Request Method, This defaults to GET or POST. For our example that is correct. If your API requires something different, see HTTP Collector for more information.
    4. URL Expression, We can reference the URL from the HTTP Datasource using the syntax ${_url}. This can be useful if you have a base URL in your HTTP Datasource, e.g. www.phixflow.com, and you want to append to it in different collectors, e.g. ${_url}/myPage1
      1. To reference any of the attributes from the table calling the HTTP Collector we use ${_out.attributeName}. We encapsulate the attributes with ${}.
      2. For example, ${_out.token}
    5. Statement Expression, This is not required by our example. If your API requires interaction, such as requesting a session token, it can be carried out in this statement expression.
    6. HTTP Headers, Contains information about the request being sent to the API. This is not required for our example. An example header setup is illustrated below; for full details see HTTP Header.
    7. Response → Return Type, In the response section set Return Type to JSON. Other types of data can be returned; see HTTP Collector.
    8. Response Path, Specifies the data in the JSON you want to return. Most APIs specify the structure of the data they return, and the path is used to filter what is returned. The Response Path uses XPath-style path syntax (for JSON, expressions such as $..events follow JSONPath conventions); see HTTP Collector for syntax and further examples.
      1. In our example we only want the bank holidays for each country so we set the value to:

        $..events

        This gives us all countries, because .. acts as a wildcard that matches the events node at any level of the JSON.

        If we just wanted England and Wales we would specify: $.england-and-wales.events
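
To see what the two Response Path values above select, the following Python sketch mimics them on a trimmed sample of the response. The sample data, its assumed structure (one object per division, each with an events list) and the helper find_all are illustrative assumptions; PhixFlow evaluates the path itself, and this is not its implementation.

```python
# Trimmed, illustrative sample shaped like the bank holiday response.
# Assumed structure: one object per division, each holding an "events" list.
sample = {
    "england-and-wales": {
        "division": "england-and-wales",
        "events": [{"title": "New Year's Day", "date": "2024-01-01"}],
    },
    "scotland": {
        "division": "scotland",
        "events": [
            {"title": "New Year's Day", "date": "2024-01-01"},
            {"title": "2nd January", "date": "2024-01-02"},
        ],
    },
}


def find_all(node, key):
    """Hypothetical stand-in for `$..key`: collect every value stored
    under `key` at any depth of the structure."""
    found = []
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                found.append(value)
            found.extend(find_all(value, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_all(item, key))
    return found


# $..events : the events of every division, flattened into one list
all_events = [e for events in find_all(sample, "events") for e in events]

# $.england-and-wales.events : the events of one named division only
england_events = sample["england-and-wales"]["events"]

print(len(all_events), len(england_events))
```

With this sample the first path yields three records and the second yields one, which is the difference you would expect to see in the number of records collected.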

 Table

  1. Hover your mouse over the HTTP Collector created in the stage above.
  2. From the popup menu select Create New Table.
    1. You can drag a new Table from the toolbar, but you will need to connect it to the HTTP collector.
  3. In the properties window that opens on the right set the following:
    1. Name, Set a name indicative of the data being collected. 
    2. Attributes, Add the attributes you require from your collected data:
      1. Set Name to Country and Expression to in.^.^.division
      2. Set Name to Title and Expression to in.title
      3. Set Name to Date and Expression to toDate(in.date, "yyyy-MM-dd")
    3. You can navigate up the JSON nodes using the hat symbol ^. This is seen in the Country attribute above: in.^.^.division
    4. You can descend the JSON nodes using .. as a wildcard to include all nodes at that level. This is seen in the Response Path of the HTTP Collector to include all countries: $..events
      1. It is also possible to use the specific node name to descend into a specific node in the JSON.
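
The three attribute expressions above can be pictured with a short Python sketch. It assumes the same response structure as the example (each division object has a division name and an events list) and uses made-up sample data; the function to_rows is hypothetical and only illustrates what each expression refers to.

```python
from datetime import datetime

# Illustrative sample; the structure is an assumption about the API response.
response = {
    "england-and-wales": {
        "division": "england-and-wales",
        "events": [{"title": "New Year's Day", "date": "2024-01-01"}],
    },
    "scotland": {
        "division": "scotland",
        "events": [{"title": "2nd January", "date": "2024-01-02"}],
    },
}


def to_rows(data):
    """Build one row per event, mirroring the three table attributes."""
    rows = []
    for division in data.values():
        for event in division["events"]:
            rows.append({
                # in.^.^.division : up from the event to the events list,
                # then up again to the division object that owns it
                "Country": division["division"],
                # in.title : a field of the event itself
                "Title": event["title"],
                # toDate(in.date, "yyyy-MM-dd") : parse the text into a date
                "Date": datetime.strptime(event["date"], "%Y-%m-%d").date(),
            })
    return rows


rows = to_rows(response)
for row in rows:
    print(row["Country"], row["Title"], row["Date"].isoformat())
```

Each event becomes one record, and the Country value comes from two levels above the event, which is why the expression needs two ^ steps.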

Secured Example

Where we need to use sensitive information, such as a password or token, it can be held in Secret Keys and accessed using the prefix _datasource.

Secret Key Setup

  1. Open the HTTP Datasource properties.
  2. Navigate to the Secret Keys section.
  3. Add your sensitive information:
    1. Name, The name the secret key will be referenced by. For example clientSecret.
    2. Secret, The secret information to be stored in an encrypted format.
  4. Save all of your changes.
  5. The secret key can now be referenced by HTTP Collectors attached to the HTTP Datasource. It is available to use in the URL Expression and Statement Expression.
  6. To reference the secret key use the syntax: _datasource.secretname.
    1. For example, _datasource.clientSecret. Within an expression it is wrapped in ${}:
       client_id=client_secret=${_datasource.clientSecret}&resource=phixflow.com&grant_type=client_credentials
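
As a rough mental model of how ${...} placeholders such as ${_url} and ${_datasource.clientSecret} are filled in, here is a minimal Python sketch. The resolve helper, the values dictionary and the dummy secret are all assumptions made for illustration; PhixFlow performs this substitution internally and its actual behaviour may differ.

```python
import re

# Matches ${name}, capturing everything between the braces.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def resolve(expression, values):
    """Hypothetical helper: replace each ${name} with values[name].

    Raises KeyError if the expression refers to a name that is not defined.
    """
    def lookup(match):
        return str(values[match.group(1)])

    return PLACEHOLDER.sub(lookup, expression)


# Illustrative values only; never hard-code a real secret like this.
values = {
    "_url": "www.phixflow.com",
    "_datasource.clientSecret": "dummy-secret",
}

print(resolve("${_url}/myPage1", values))
print(resolve(
    "client_secret=${_datasource.clientSecret}&grant_type=client_credentials",
    values,
))
```

The point of the secret key is that the expression only ever contains the placeholder, so the sensitive value does not appear in the collector configuration.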

Troubleshooting

If you return 0 records

  1. Open the System Console
  2. In the Completed Tasks, click on the table that ran
  3. In the Messages section, double-click the line with the message "Response from URL:..."
  4. In the window that opens, click the Message Details tab
  5. The raw data returned from the API is displayed.
    1. Ensure your Response Path is set correctly to traverse to the required data.
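
Once you have copied the raw response from the Message Details tab, one way to check a path outside PhixFlow is to walk it key by key and see where it stops matching. The sketch below assumes a JSON response and a simple path made of named nodes; trace_path and the sample text are hypothetical and are not part of PhixFlow.

```python
import json


def trace_path(data, keys):
    """Walk `keys` from the root of `data`.

    Returns (matched_keys, available_keys). available_keys lists the names
    present at the point where the walk stopped, or None if every key matched.
    """
    node = data
    matched = []
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            available = sorted(node) if isinstance(node, dict) else []
            return matched, available
        node = node[key]
        matched.append(key)
    return matched, None


# Illustrative raw response text, as it might be copied from the message.
raw = '{"england-and-wales": {"division": "england-and-wales", "events": []}}'
data = json.loads(raw)

# A mistyped node name stops at the root and shows the names that do exist.
print(trace_path(data, ["england_and_wales", "events"]))

# The correct path matches every key.
print(trace_path(data, ["england-and-wales", "events"]))
```

If the walk stops early, compare the node name in your Response Path with the available names reported at that level.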

Diagnosing Issues

PhixFlow can log the traffic sent to and from an API. This can be helpful as it lets you see the actual communication data, such as the Statement Expression values or the responses from the API.

To enable logging:

  1. Click Administration Menu → System → Logging.
  2. Navigate down to Collector/Exporter Logging and tick:
    1. Log HTTP Collector Connection Details
    2. Log HTTP Exporter Connection Details
  3. This is illustrated below:
  4. Once you have resolved your issue, turn these settings back off to avoid filling your logs with unnecessary information.
  5. To view the results, open the System Console from the top right corner of PhixFlow.
  6. In the Completed Tasks section, click on your task.
  7. The Messages section on the right will now contain the logging for your activity.
  8. Double-click a message to open it. In the window that opens, click the Message Details tab to see detailed information. Below is an example showing the JSON response from the API:

More Information

For more information about the configuration options surrounding HTTP Datasources and Collectors see the following pages: