XML

The UnstructuredXMLLoader loads XML (.xml) files. The page content of the resulting document is the text found inside the XML tags; the tags themselves are dropped.

from langchain.document_loaders import UnstructuredXMLLoader

# Point the loader at the XML file to read
loader = UnstructuredXMLLoader(
    "example_data/factbook.xml",
)

# load() returns a list of Document objects
docs = loader.load()
docs[0]
Document(page_content='United States\n\nWashington, DC\n\nJoe Biden\n\nBaseball\n\nCanada\n\nOttawa\n\nJustin Trudeau\n\nHockey\n\nFrance\n\nParis\n\nEmmanuel Macron\n\nSoccer\n\nTrinidad & Tobado\n\nPort of Spain\n\nKeith Rowley\n\nTrack & Field', metadata={'source': 'example_data/factbook.xml'})
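
To make the shape of that page content concrete, here is a minimal sketch of the same idea using only Python's standard library. It is not how UnstructuredXMLLoader is implemented. The sample XML and the `extract_text` helper are hypothetical, made up for illustration; the real contents of factbook.xml are not shown on this page.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, loosely modeled on the output above.
# This is NOT the actual factbook.xml file.
SAMPLE_XML = """<factbook>
  <country>
    <name>United States</name>
    <capital>Washington, DC</capital>
  </country>
  <country>
    <name>Canada</name>
    <capital>Ottawa</capital>
  </country>
</factbook>"""


def extract_text(xml_string: str) -> str:
    """Collect the non-empty text of every element, separated by blank lines."""
    root = ET.fromstring(xml_string)
    # itertext() yields all text nodes in document order; skip whitespace-only ones.
    pieces = [text.strip() for text in root.itertext() if text.strip()]
    return "\n\n".join(pieces)


page_content = extract_text(SAMPLE_XML)
print(repr(page_content))
```

As in the loader output above, the tag names disappear and only the text values remain, joined by blank lines.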