Collect BeautifulSoup data into your data warehouse or ours. The Matatika pipelines will take care of the data collection and preparation for your analytics and BI tools.
Python library for pulling data out of HTML and XML files.
Attempt to download all pages recursively into the output directory prior to parsing files. Set this to False if you've previously run wget -r -A.html https://sdk.meltano.com/en/latest/
List of tags to exclude before extracting text content of the page.
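A minimal sketch of what tag exclusion looks like in BeautifulSoup terms (the HTML snippet and tag list here are illustrative, not taken from the tap's source): excluded tags are removed from the parse tree before the page text is extracted.

```python
from bs4 import BeautifulSoup

# Hypothetical page with script and navigation tags alongside real content
html = "<body><script>var x = 1;</script><nav>Menu</nav><p>Keep this text</p></body>"
soup = BeautifulSoup(html, "html.parser")

# Tags listed for exclusion (e.g. ["script", "nav"]) are stripped
# from the tree before text extraction
for tag in soup(["script", "nav"]):
    tag.decompose()

print(soup.get_text(strip=True))  # Keep this text
```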
This dict contains all the kwargs that should be passed to the find_all call in order to extract text from the pages.
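To show how such a kwargs dict behaves, here is a small sketch (the example kwargs and HTML are assumptions for illustration): the dict is unpacked directly into BeautifulSoup's find_all, and the matched elements' text is what gets extracted.

```python
from bs4 import BeautifulSoup

html = "<body><nav>Menu</nav><article class='doc'>Page text</article></body>"
soup = BeautifulSoup(html, "html.parser")

# A kwargs dict like this is forwarded straight to soup.find_all(**kwargs),
# so any argument find_all accepts (name, attrs, string, ...) can be used
find_all_kwargs = {"name": "article", "attrs": {"class": "doc"}}
matches = soup.find_all(**find_all_kwargs)

text = " ".join(el.get_text(strip=True) for el in matches)
print(text)  # Page text
```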
'True' to enable schema flattening and automatically expand nested properties.
The max depth to flatten schemas.
The file path where the intermediate downloaded HTML files are written.
The BeautifulSoup parser to use.
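As a quick illustration of the parser choice (a sketch, not part of the tap itself): "html.parser" ships with the Python standard library, while "lxml" and "html5lib" are faster or more lenient alternatives that require extra installs. All are passed as the second argument to BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Common parser choices: "html.parser" (stdlib, no extra dependency),
# "lxml" (fast, requires the lxml package), "html5lib" (most lenient)
soup = BeautifulSoup("<p>Unclosed paragraph", "html.parser")
print(soup.p.get_text())  # Unclosed paragraph
```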
The site you'd like to scrape. The tap will download all pages recursively into the output directory prior to parsing files.
The name of the source you're scraping. This will be used as the stream name.
User-defined config values to be used within map expressions.
Config object for stream maps capability. For more information check out Stream Maps.
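The settings above might be wired together in a meltano.yml fragment like the following. This is an illustrative sketch only: the setting names and stream name are assumptions based on Meltano SDK conventions, not confirmed by this page, so check them against the tap's own documentation.

```yaml
# Illustrative meltano.yml fragment (names assumed, verify before use)
config:
  stream_maps:
    page_content:            # hypothetical stream name
      raw_html: __NULL__     # drop a column from the output
      source_upper: source.upper()   # derive a new column via an expression
  stream_map_config:
    # user-defined values that map expressions can reference
    my_prefix: docs
```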