Beautiful Soup
Beautiful Soup is a Python package for parsing HTML and XML documents, including documents with malformed markup such as unclosed tags (the name comes from "tag soup"). It builds a parse tree from each parsed page that can be used to extract data from the HTML,[3] which makes it useful for web scraping.
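As a minimal sketch of that parse-tree workflow, the snippet below feeds deliberately malformed HTML (unclosed `<p>` and `<li>` tags) to `BeautifulSoup` with Python's built-in `"html.parser"` backend, then pulls the heading and link data out of the resulting tree. It assumes `beautifulsoup4` is already installed; the markup and variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Malformed markup: the <p> and <li> tags are never closed.
html = """
<html><body>
<h1>Catalog</h1>
<p class="intro">Prices below
<ul>
  <li><a href="/apple">Apple</a> $1
  <li><a href="/pear">Pear</a> $2
</ul>
</body></html>
"""

# "html.parser" is Python's built-in parser, so no extra parser package is needed.
soup = BeautifulSoup(html, "html.parser")

# Navigate the parse tree despite the broken markup.
title = soup.h1.get_text(strip=True)                   # text of the first <h1>
links = [a["href"] for a in soup.find_all("a")]        # every link target, in document order
names = [a.get_text(strip=True) for a in soup.find_all("a")]  # every link label

print(title, links, names)
```

Because `find_all` searches the whole tree, the anchors are found regardless of how the parser chose to nest the unclosed `<li>` elements.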
Installation and Setup
pip install beautifulsoup4
Document Transformer
See a usage example.
from langchain.document_transformers import BeautifulSoupTransformer