What a busy month so far! In the last two weeks, we’ve pushed 250 changes into production, mostly new features based on your feedback. Keep that feedback coming!
Our Data Scientists and Data Heroes told us they need a better way to collaborate on their insights, and one that works with their existing tools.
You told us that you love Jupyter Notebooks: | So we used ETL tools to build and release a Python library that publishes your insights directly into your Matatika Mobile workspace |
You said you need to collaborate with your colleagues: | We added comments, likes, and views of datasets to the API and Matatika Mobile |
You said you need a command line to integrate change management and publishing into your DataOps: | So we added a CLI that works in any existing DevOps / DataOps pipeline. |
We’re going to tell you more about each of these features over the coming weeks. For today, let’s get technical and dive into a Jupyter Notebook and publish a dataset.
If you’re not a technical reader, don’t worry if the article stops making sense in a paragraph or two. We’re working towards plain English AI.
We will make Data Heroes of us all in no time!
Imagine you’re an avid Data Scientist, and you’ve been hand-crafting some beautiful charts in your Python Jupyter Notebooks. One of the great things about a Notebook is that you can document your findings and your analysis side by side, then share them. It is no surprise that Notebooks are the data scientists’ favourite tool, and there are more than 400,000 Notebooks just like this on Kaggle.com.
Matatika Mobile lets you share your work in the same way.
With just 3 lines of code, including the import, you can publish your datasets to your team in a private workspace.
Prerequisites:
The Matatika Python Library allows a user to programmatically publish a dataset to a workspace, whether that be within a Jupyter Notebook or a Python script.
To install, run:
pip install matatika
To publish a dataset, simply create a new Matatika client object and call the publish method:
from matatika.client import MatatikaClient
# auth_token, endpoint_url, workspace_id and datasets initialisation assumed
matatika = MatatikaClient(auth_token, endpoint_url, workspace_id)
matatika.publish(datasets)
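The snippet above assumes that auth_token, endpoint_url and workspace_id already exist. One way to supply them without pasting secrets into a Notebook is to read them from environment variables. The sketch below is a minimal illustration, not part of the Matatika library: the helper load_settings and the variable names MATATIKA_AUTH_TOKEN, MATATIKA_ENDPOINT_URL and MATATIKA_WORKSPACE_ID are hypothetical and chosen only for this example.

```python
import os

# Hypothetical environment variable names, chosen for this sketch only.
REQUIRED_VARS = {
    "auth_token": "MATATIKA_AUTH_TOKEN",
    "endpoint_url": "MATATIKA_ENDPOINT_URL",
    "workspace_id": "MATATIKA_WORKSPACE_ID",
}


def load_settings(environ=os.environ):
    """Return the three client settings, failing early if any is missing."""
    missing = [var for var in REQUIRED_VARS.values() if not environ.get(var)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[var] for name, var in REQUIRED_VARS.items()}


# Usage with the client call shown above (left as a comment so this
# block runs without the library installed):
# settings = load_settings()
# matatika = MatatikaClient(settings["auth_token"],
#                           settings["endpoint_url"],
#                           settings["workspace_id"])
```

Failing early with the list of missing variables keeps a misconfigured pipeline from reaching the publish step with empty credentials.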
Data must be provided as a dictionary object and must conform to the following specification:
Path | Description |
{dataset-alias} | A workspace unique identifier string that the dataset can be referenced by – multiple datasets can be defined in a single datasets dictionary |
{dataset-alias}.information | The dataset display name |
{dataset-alias}.questions | Questions the dataset might help in answering (interpreted by the Matatika elastic search service, powered by BERT) |
{dataset-alias}.description | A description of the dataset |
{dataset-alias}.rawData | The raw data of the dataset, conforming to the Google Charts specification |
{dataset-alias}.visualisation | The visualisation metadata for the dataset, conforming to the Google Charts specification |
datasets = {
    'planet-orbits': {
        'information': 'Planet Orbits in Our Solar System',
        'questions': 'How many Earth-years does it take for Jupiter to orbit the sun?',
        'description': '#Planet Orbits\nSun orbit data for all planets within our solar system.\n*Yes, Pluto is included!*',
        'rawData': '[["Planet", "Orbit Distance (Light-hours)", "Orbit Duration (Earth-years)"],["Mercury", 0.3336, 0.2500],["Venus", 0.6300, 0.5833],["Earth", 0.8708, 1],["Mars", 1.3242, 1.9167],["Jupiter", 4.5287, 11.8333],["Saturn", 8.2997, 29.5000], ["Uranus", 16.7030, 84.0833], ["Neptune", 26.1883, 164.9167], ["Pluto", 33.8475, 248.0833]]',
        'visualisation': '{"google-chart": {"chartType": "Bar"}}'
    }
}
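In the example above, rawData and visualisation are JSON written out by hand as strings, which is easy to get wrong once the data comes from a real analysis. A possible alternative is to keep the rows as ordinary Python lists and serialise them with json.dumps. This is a sketch under assumptions: make_dataset is a hypothetical helper, not part of the Matatika library, and the alias planet-orbit-durations is made up for illustration. It only uses the keys listed in the specification table.

```python
import json


def make_dataset(information, questions, description, rows, chart_type):
    """Build one dataset entry using the keys from the specification table.

    rows is a list of lists whose first row holds the column headers.
    """
    return {
        "information": information,
        "questions": questions,
        "description": description,
        # Both fields are JSON strings in the example above, so serialise
        # the Python objects rather than writing the strings by hand.
        "rawData": json.dumps(rows),
        "visualisation": json.dumps({"google-chart": {"chartType": chart_type}}),
    }


rows = [
    ["Planet", "Orbit Duration (Earth-years)"],
    ["Earth", 1],
    ["Jupiter", 11.8333],
]

datasets = {
    # Hypothetical alias, used only in this sketch.
    "planet-orbit-durations": make_dataset(
        information="Planet Orbit Durations",
        questions="How many Earth-years does it take for Jupiter to orbit the sun?",
        description="Orbit durations for a few planets.",
        rows=rows,
        chart_type="Bar",
    )
}
```

The resulting datasets dictionary has the same shape as the hand-written one, so it can be passed to matatika.publish(datasets) in the same way.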
Now head to the Matatika app and you should see your new dataset published in the workspace context!
Once you’re registered, try it out for yourself with our Getting Started Guide.
For more information on how we use ETL tools to deliver features and functionality, viewed from a company revenue perspective, see our in-depth piece: Ultimate Guide to ETL Tools for Modern Business Intelligence.
Stay up to date with our insights as they become available.