Concepts
A quick walkthrough of some fundamental Matatika concepts and their usage.
Modern Data Stack
There are many opinions on the “Modern Data Stack”, but frankly one size cannot fit both a Google-scale organisation and a five-person startup. At Matatika, we have selected components that we believe give you the best combination of scalability, performance, flexibility and low total cost of ownership. Ultimately, our goal is to provide a modern data stack with No Limits.
Extraction
Meltano is the primary tool used to manage data extraction in your Matatika workspace. Matatika are active contributors to Meltano and will continue to invest in other technologies that advance our customers’ ability to implement Data Ops methodologies.
Warehousing
PostgreSQL is the default data storage technology in your Matatika workspace. Data technologies will continue to advance, so we believe it is vital that your stack is database-agnostic. SQL over ODBC/JDBC is currently the most widely adopted business intelligence data interface. The Matatika Platform supports any JDBC-compliant database or serverless data warehouse, such as Google BigQuery or AWS Athena.
Transformation
dbt is the default transformation technology in your Matatika workspace. SQL-based transformations managed as code, with testing built in by design, provide the fundamentals required to support Data Ops methodologies at the transformation layer. The Matatika Platform can also execute any other transformation technology in isolated containers.
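As a rough illustration of what a SQL-based transformation with a built-in test can look like, the sketch below uses Python's standard sqlite3 module rather than dbt itself. The table and column names (raw_orders, customer_orders) are hypothetical. The test follows the convention dbt uses for data tests: a query that selects failing rows, which passes when it returns none.

```python
import sqlite3

# Hypothetical raw table standing in for extracted data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (order_id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, "acme", 20.0), (2, "acme", 5.0), (3, "globex", 12.5)],
)

# Transformation: a SELECT statement materialised as a table.
TRANSFORM_SQL = """
CREATE TABLE customer_orders AS
SELECT customer, COUNT(*) AS order_count, SUM(amount) AS total_amount
FROM raw_orders
GROUP BY customer
"""
conn.execute(TRANSFORM_SQL)

# Test: select rows that violate an expectation; zero rows means the test passes.
NOT_NULL_TEST_SQL = "SELECT * FROM customer_orders WHERE customer IS NULL"
failing_rows = conn.execute(NOT_NULL_TEST_SQL).fetchall()
test_passed = len(failing_rows) == 0

result = conn.execute(
    "SELECT customer, order_count, total_amount FROM customer_orders ORDER BY customer"
).fetchall()
```

Because both the transformation and its test are plain SQL text, they can be reviewed and versioned like any other code.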
Orchestration
Spring Cloud Data Flow is the main orchestration technology in your Matatika workspace. The Matatika Platform takes care of scheduling, log collection and credential management, and runs all workspace jobs in isolated containers.
Catalog
Your datasets and models are published, indexed and searchable via the Matatika API, CLI, or Configuration Repository. The Matatika Platform scores your datasets by usage to provide a personalised feed of insights.
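How the platform scores datasets is not described here, so the following is only a minimal sketch of the general idea: order datasets by how often they have been viewed. The Dataset class, its view_count field and the rank_by_usage function are hypothetical and are not part of the Matatika API.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    # Hypothetical fields; not the Matatika dataset schema.
    title: str
    view_count: int


def rank_by_usage(datasets):
    """Return datasets ordered from most to least viewed."""
    return sorted(datasets, key=lambda d: d.view_count, reverse=True)


feed = rank_by_usage([
    Dataset("Weekly signups", 4),
    Dataset("Revenue by region", 31),
    Dataset("Open support tickets", 12),
])
feed_titles = [d.title for d in feed]
```

A real scoring scheme would likely weigh more signals than a single counter, such as recency or who viewed the dataset.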
Analytics
Jupyter Notebooks and visualisations in Chart.js and Google Charts formats can be published to your workspace. The Matatika dataset format gives you full control of the chart visualisation as code, supporting Data Ops through to the analytics layer of your stack.
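To make "visualisation as code" concrete, here is a small sketch that builds a Chart.js-style configuration (type, data.labels, data.datasets) as a Python dict and serialises it to JSON. The Matatika dataset format itself is not shown in this page, so this is not that format; the build_bar_chart helper and the sample figures are hypothetical.

```python
import json


def build_bar_chart(title, labels, values):
    """Build a Chart.js-style bar chart configuration as a plain dict."""
    if len(labels) != len(values):
        raise ValueError("labels and values must be the same length")
    return {
        "type": "bar",
        "data": {
            "labels": list(labels),
            "datasets": [{"label": title, "data": list(values)}],
        },
    }


chart = build_bar_chart("Orders per month", ["Jan", "Feb", "Mar"], [120, 95, 143])
chart_json = json.dumps(chart, indent=2)  # plain text that can be versioned in Git
```

Keeping the chart definition as text means a change to a visualisation can be reviewed in the same way as a change to a transformation.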
Data Ops
Your Matatika workspace is a unique Business Intelligence as Code solution. All artifacts are managed in a Git Configuration Repository with credentials securely stored inside the platform. We enable you to deliver a robust analytics solution without restricting you to any specific Data Ops methodology.