BeautifulSoup Connect

Load BeautifulSoup data into your data warehouse in minutes

Collect BeautifulSoup data into your data warehouse or ours. Matatika pipelines take care of data collection and preparation for your analytics and BI tools.

Automate BeautifulSoup from a single space with no code

BeautifulSoup is a Python library for pulling data out of HTML and XML files.

Settings

Download Recursively

Attempt to download all pages recursively into the output directory prior to parsing files. Set this to False if you've previously run wget -r -A.html https://sdk.meltano.com/en/latest/

Exclude tags

A list of tags to exclude before extracting the text content of each page.
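
As an illustration of the idea (a minimal sketch, not necessarily the tap's exact internals; the exclude list here is an assumption), removing tags with BeautifulSoup before extracting text looks like this:

    from bs4 import BeautifulSoup

    html = "<html><body><script>var x = 1;</script><p>Visible text.</p></body></html>"
    soup = BeautifulSoup(html, "html.parser")

    # Hypothetical value mirroring the "Exclude tags" setting.
    exclude_tags = ["script", "style", "nav", "footer"]

    # Remove each excluded tag from the tree before pulling out text.
    for tag in soup(exclude_tags):
        tag.decompose()

    print(soup.get_text(strip=True))  # -> "Visible text."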

Find All Kwargs

A dict of keyword arguments passed to BeautifulSoup's find_all call when extracting text from the pages.
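
For example (a minimal sketch; the kwargs value is an assumption), a setting of {"name": "main"} would restrict text extraction to <main> elements, since the dict is unpacked straight into find_all:

    from bs4 import BeautifulSoup

    html = "<html><body><main><p>Docs content.</p></main><aside>Sidebar</aside></body></html>"
    soup = BeautifulSoup(html, "html.parser")

    # Hypothetical value mirroring the "Find All Kwargs" setting.
    find_all_kwargs = {"name": "main"}

    # The kwargs are passed through to BeautifulSoup's find_all call.
    for element in soup.find_all(**find_all_kwargs):
        print(element.get_text(strip=True))  # -> "Docs content."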

Flattening Enabled

Set to True to enable schema flattening, which automatically expands nested properties.

Flattening Max Depth

The maximum depth to which nested properties are flattened.
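
To illustrate the concept (a generic sketch of flattening; the tap's exact key separator and semantics may differ), flattening {"a": {"b": {"c": 1}}} with a max depth of 1 expands one level of nesting and leaves the rest intact:

    def flatten(record, max_depth, sep="__", _depth=0):
        # Generic illustration of schema flattening, not the SDK's implementation.
        flat = {}
        for key, value in record.items():
            if isinstance(value, dict) and _depth < max_depth:
                for k, v in flatten(value, max_depth, sep, _depth + 1).items():
                    flat[f"{key}{sep}{k}"] = v
            else:
                flat[key] = value
        return flat

    print(flatten({"a": {"b": {"c": 1}}}, max_depth=1))
    # -> {'a__b': {'c': 1}}: nesting beyond the max depth is left as-is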

Output Folder

The path of the folder where intermediate downloaded HTML files are written.

Parser

The BeautifulSoup parser to use, e.g. 'html.parser' (built in), 'lxml', or 'html5lib'.
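
BeautifulSoup supports several parsers: 'html.parser' ships with Python, while 'lxml' and 'html5lib' require extra packages. A quick sketch:

    from bs4 import BeautifulSoup

    html = "<p>Hello<p>World"

    # Python's built-in parser; no extra dependency required.
    print(BeautifulSoup(html, "html.parser").get_text())

    # Alternatives, if installed (pip install lxml / pip install html5lib);
    # they differ in speed and in how they repair malformed markup.
    # BeautifulSoup(html, "lxml")
    # BeautifulSoup(html, "html5lib")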

Site URL

The site you'd like to scrape. The tap will download all pages recursively into the output directory prior to parsing files.

Source Name

The name of the source you're scraping. This will be used as the stream name.

Stream Map Config

User-defined config values to be used within map expressions.

Stream Maps

Config object for the stream maps capability, which lets you rename, filter, or transform streams inline. For more information, see Stream Maps.
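
Putting the settings together, a config might look like the following sketch, shown here as a Python dict. The key names are assumptions inferred from the setting labels above, so check them against the tap's documentation before use:

    # Hypothetical config; key names inferred from the setting labels, not confirmed.
    config = {
        "site_url": "https://sdk.meltano.com/en/latest/",
        "source_name": "sdk-docs",          # hypothetical stream name
        "output_folder": "output/",
        "parser": "html.parser",
        "download_recursively": True,
        "exclude_tags": ["script", "style"],
        "find_all_kwargs": {"name": "main"},
        "flattening_enabled": True,
        "flattening_max_depth": 2,
    }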


View source code

BeautifulSoup data you can trust

Extract, Transform, and Load BeautifulSoup data into your data warehouse or ours.

Interested in learning more?

Get in touch