Adding a Custom Data Source
*Time required: 15 minutes*
Prerequisites
You must have:
- Admin or owner rights for the workspace you want to use.
Introduction
We support adding custom data sources for use in data imports. You can add a custom data source together with any related plugins, such as transforms or file bundles, all at once.
In this example we will create a custom version of the data source tap-spreadsheets-anywhere. We already support this data source, so we will have to make changes to the discovery YAML file we use to create the custom version. Alongside these changes, we will also add a related analyze file bundle containing some datasets, so you can immediately see your new custom data source’s data.
Adding Your Custom Data Source
We will be naming our “custom” tap-spreadsheets-anywhere data source tap-example-custom-data-source.
To add this custom data source you first need to navigate to the pipelines screen:
- Click on the Lab button.
- Go to the Pipelines page.
- Click + Import.
- Select the Custom tab at the top and click Connect on the Custom option.
- In the pop-up window, select your discovery.yml file (the file can have any name, but it must be in the correct YAML format), or paste in your plugin definition.
For tap-example-custom-data-source we will be using:
    extractors:
    - name: tap-example-custom-data-source
      variant: matatika
      namespace: tap_example_custom_data_source
      pip_url: git+https://github.com/ets/tap-spreadsheets-anywhere.git
      executable: tap-spreadsheets-anywhere
      capabilities:
      - catalog
      - discover
      - state
      settings:
      - name: tables
        kind: array
    files:
    - name: analyze-example-custom-data-source
      variant: matatika
      namespace: tap_example_custom_data_source
      update:
        analyze/datasets/tap-example-custom-data-source: true
      pip_url: git+https://github.com/Matatika/analyze-example-custom-data-source.git
We are including an analyze file bundle that contains Matatika datasets so we can see the data extracted by the tap in visualisations. By adding this bundle and giving it the same namespace as the extractor, we effectively tell the data import that whenever you configure a tap-example-custom-data-source data import, this bundle should be added as well.
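The namespace pairing described above can be sketched in a few lines of Python. This is only an illustration: the `definition` dict is a hand-copied subset of the plugin definition, and `bundles_for_extractor` is a hypothetical helper, not part of any Matatika or Meltano API. How the platform matches bundles internally is assumed here to be a simple namespace comparison.

```python
# Hand-copied subset of the plugin definition (names and namespaces only).
definition = {
    "extractors": [
        {
            "name": "tap-example-custom-data-source",
            "namespace": "tap_example_custom_data_source",
        },
    ],
    "files": [
        {
            "name": "analyze-example-custom-data-source",
            "namespace": "tap_example_custom_data_source",
        },
    ],
}


def bundles_for_extractor(definition, extractor_name):
    """Hypothetical helper: list file bundles sharing the extractor's namespace."""
    namespaces = {
        extractor["name"]: extractor["namespace"]
        for extractor in definition.get("extractors", [])
    }
    target = namespaces.get(extractor_name)
    if target is None:
        # Unknown extractor: nothing can be paired with it.
        return []
    return [
        bundle["name"]
        for bundle in definition.get("files", [])
        if bundle.get("namespace") == target
    ]


print(bundles_for_extractor(definition, "tap-example-custom-data-source"))
# → ['analyze-example-custom-data-source']
```

If the two namespace values differed, the list would be empty, which is a quick way to spot a typo before uploading the file.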
- Click next and you will now be on a screen that expects the settings for your pipeline.
- Expand the tap-example-custom-data-source section.
- Our custom data source will require a Tables array. We will use:

    [{
        "path": "https://raw.githubusercontent.com/Matatika/matatika-examples/master/example_adding_a_custom_data_source",
        "name": "imdb_top_20_films",
        "pattern": "imdb_top_20_films.csv",
        "start_date": "2021-01-01T00:00:00Z",
        "key_properties": ["rank"],
        "format": "csv"
    }]

- For this example we can leave Section 2 - Clean, transform and organise on Default.
- For this example we can leave Section 3 - Automate your import on Manual.
- Finally, click save.
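Because the Tables setting is pasted in as raw JSON, it can help to check it locally first. The sketch below uses only the Python standard library. The set of expected keys is taken from the example value in this guide and is an assumption, not an authoritative schema for the tap; `check_tables` is a hypothetical helper name.

```python
import json
from datetime import datetime

# The Tables value used in this guide.
TABLES_JSON = """
[{
    "path": "https://raw.githubusercontent.com/Matatika/matatika-examples/master/example_adding_a_custom_data_source",
    "name": "imdb_top_20_films",
    "pattern": "imdb_top_20_films.csv",
    "start_date": "2021-01-01T00:00:00Z",
    "key_properties": ["rank"],
    "format": "csv"
}]
"""

# Assumption: the keys in the example are the ones every entry should carry.
EXPECTED_KEYS = {"path", "name", "pattern", "start_date", "key_properties", "format"}


def check_tables(raw):
    """Return a list of human-readable problems; an empty list means none found."""
    tables = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(tables, list):
        return ["Tables must be a JSON array"]
    problems = []
    for index, table in enumerate(tables):
        if not isinstance(table, dict):
            problems.append(f"entry {index}: not a JSON object")
            continue
        missing = EXPECTED_KEYS - set(table)
        if missing:
            problems.append(f"entry {index}: missing keys {sorted(missing)}")
            continue
        try:
            # Same timestamp layout as the example value.
            datetime.strptime(table["start_date"], "%Y-%m-%dT%H:%M:%SZ")
        except (TypeError, ValueError):
            problems.append(f"entry {index}: start_date is not like 2021-01-01T00:00:00Z")
        if not isinstance(table["key_properties"], list):
            problems.append(f"entry {index}: key_properties must be an array")
    return problems


print(check_tables(TABLES_JSON))
# → []
```

An empty list means the value parsed as JSON and every entry has the same shape as the example; it does not confirm that the file at the given path exists.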
Your new custom data source will be added to your workspace repository, along with any other associated plugins, during a data import config job. When you go back to your data imports screen, this job will already be running. The config job will also publish the analyze file bundle’s datasets to your workspace.
Once the config job has completed, you can run your data import by pressing the start job button (the solid arrow).
When the data import job has completed, you will be able to see populated datasets in your workspace.
Adding Your Own Custom Data Source
More resources for adding your own custom data source: