Microsoft OneNote
This notebook covers how to load documents from OneNote.
Prerequisites
- Register an application by following the Microsoft identity platform instructions.
- When registration finishes, the Azure portal displays the app registration's Overview pane. You see the Application (client) ID. Also called the client ID, this value uniquely identifies your application in the Microsoft identity platform.
- While following the registration steps in the first item, you can set the redirect URI as http://localhost:8000/callback
- While following the registration steps in the first item, generate a new password (client_secret) under the Application Secrets section.
- Follow the instructions at this document to add the following SCOPES (Notes.Read) to your application.
- Install the msal and bs4 packages using the commands pip install msal and pip install beautifulsoup4.
- At the end of the steps you must have the following values:
  - CLIENT_ID
  - CLIENT_SECRET
Instructions for ingesting your documents from OneNote
Authentication
By default, the OneNoteLoader expects the values of CLIENT_ID and CLIENT_SECRET to be stored as environment variables named MS_GRAPH_CLIENT_ID and MS_GRAPH_CLIENT_SECRET respectively. You can pass those environment variables through a .env file at the root of your application, or set them in your script with the following commands.
import os

# Replace the placeholders with the values from your app registration
os.environ['MS_GRAPH_CLIENT_ID'] = "YOUR CLIENT ID"
os.environ['MS_GRAPH_CLIENT_SECRET'] = "YOUR CLIENT SECRET"
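Because the loader reads its credentials from the environment, it can help to confirm that both variables are set before instantiating it. The following is a minimal sketch using only the standard library; the helper name missing_credentials is hypothetical and is not part of the loader.

```python
import os

# Variable names taken from the text above
REQUIRED_VARS = ("MS_GRAPH_CLIENT_ID", "MS_GRAPH_CLIENT_SECRET")


def missing_credentials(environ=os.environ):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper for illustration; not part of OneNoteLoader.
    """
    return [name for name in REQUIRED_VARS if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("Credentials found; the loader can be instantiated.")
```

Passing the mapping as a parameter keeps the check easy to exercise with a plain dictionary instead of the real environment.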
This loader uses an authentication flow called on behalf of a user. It is a two-step authentication with user consent. When you instantiate the loader, it prints a URL that the user must visit to grant the app the required permissions. After giving consent, the user must copy the URL of the resulting page and paste it back into the console. The method will then return True if the login attempt was successful.
from langchain.document_loaders.onenote import OneNoteLoader
loader = OneNoteLoader(notebook_name="NOTEBOOK NAME", section_name="SECTION NAME", page_title="PAGE TITLE")
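To make the two-step consent flow more concrete, here is a rough standard-library sketch of what such a flow generally involves: building a consent URL for the Microsoft identity platform, then pulling the authorization code out of the URL the user pastes back. This is not the loader's own code (the loader relies on msal, per the prerequisites). The function names are hypothetical, and the use of the "common" authority is an assumption.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Assumption: the multi-tenant "common" authority; your tenant may differ.
AUTHORIZE_ENDPOINT = "https://login.microsoftonline.com/common/oauth2/v2.0/authorize"


def build_consent_url(client_id, redirect_uri, scopes):
    """Build the URL the user visits to grant consent (hypothetical helper)."""
    params = {
        "client_id": client_id,
        "response_type": "code",  # request an authorization code
        "redirect_uri": redirect_uri,
        "scope": " ".join(scopes),
    }
    return f"{AUTHORIZE_ENDPOINT}?{urlencode(params)}"


def extract_auth_code(pasted_url):
    """Return the 'code' query parameter from the pasted redirect URL."""
    query = parse_qs(urlparse(pasted_url).query)
    codes = query.get("code")
    if not codes:
        raise ValueError("No authorization code found in the pasted URL")
    return codes[0]


if __name__ == "__main__":
    print(build_consent_url("YOUR CLIENT ID", "http://localhost:8000/callback", ["Notes.Read"]))
```

In a full flow the extracted code would then be exchanged for an access token, which is the part msal handles for the loader.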
Once the authentication has been done, the loader will store a token (onenote_graph_token.txt) in the ~/.credentials/ folder. This token can be used later to authenticate without the copy/paste steps explained earlier. To use this token for authentication, set the auth_with_token parameter to True when instantiating the loader.
from langchain.document_loaders.onenote import OneNoteLoader
loader = OneNoteLoader(notebook_name="NOTEBOOK NAME", section_name="SECTION NAME", page_title="PAGE TITLE", auth_with_token=True)
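One way to decide whether to set auth_with_token is to check if the token file mentioned above already exists. The sketch below assumes the file location given in the text (~/.credentials/onenote_graph_token.txt). The helper has_cached_token is hypothetical and only tests that the file is present and non-empty, not that the token is still valid.

```python
from pathlib import Path

# Location taken from the text above
DEFAULT_TOKEN_PATH = Path.home() / ".credentials" / "onenote_graph_token.txt"


def has_cached_token(token_path=DEFAULT_TOKEN_PATH):
    """Return True if a non-empty token file exists at token_path.

    Hypothetical helper; it does not verify that the token is unexpired.
    """
    path = Path(token_path)
    return path.is_file() and path.stat().st_size > 0


if __name__ == "__main__":
    print("Cached token available:", has_cached_token())
```

The result could then be passed along, for example as auth_with_token=has_cached_token(), so the first run falls back to the interactive consent flow.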
Alternatively, you can also pass the token directly to the loader. This is useful when you want to authenticate with a token that was generated by another application. For instance, you can use the Microsoft Graph Explorer to generate a token and then pass it to the loader.
from langchain.document_loaders.onenote import OneNoteLoader
loader = OneNoteLoader(notebook_name="NOTEBOOK NAME", section_name="SECTION NAME", page_title="PAGE TITLE", access_token="TOKEN")
Documents loader
Loading pages from a OneNote Notebook
OneNoteLoader can load pages from OneNote notebooks stored in OneDrive. You can specify any combination of notebook_name, section_name, and page_title to filter for pages under a specific notebook, under a specific section, or with a specific title respectively. For instance, suppose you want to load all pages that are stored under a section called Recipes within any of your notebooks in OneDrive.
from langchain.document_loaders.onenote import OneNoteLoader
loader = OneNoteLoader(section_name="Recipes", auth_with_token=True)
documents = loader.load()
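Since any combination of the three filters is allowed, it can be convenient to collect only the ones that were actually provided and pass them on as keyword arguments. The following sketch shows one way to do that; build_filter_kwargs is a hypothetical helper, and the example values are placeholders.

```python
def build_filter_kwargs(notebook_name=None, section_name=None, page_title=None):
    """Return only the filters that were provided, keyed by parameter name.

    Hypothetical helper for illustration; not part of OneNoteLoader.
    """
    candidates = {
        "notebook_name": notebook_name,
        "section_name": section_name,
        "page_title": page_title,
    }
    # Drop filters that were left unset so they are not passed to the loader
    return {key: value for key, value in candidates.items() if value is not None}


if __name__ == "__main__":
    print(build_filter_kwargs(section_name="Recipes"))
```

The resulting dictionary could be unpacked into the constructor, for example OneNoteLoader(**build_filter_kwargs(section_name="Recipes"), auth_with_token=True).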
Loading pages from a list of Page IDs
Another possibility is to provide a list of object_ids, one for each page you want to load. For that, you will need to query the Microsoft Graph API to find the IDs of all the documents you are interested in. This link provides a list of endpoints that will be helpful for retrieving the document IDs.
For instance, to retrieve information about all pages that are stored in your notebooks, you need to make a request to: https://graph.microsoft.com/v1.0/me/onenote/pages. Once you have the list of IDs that you are interested in, you can instantiate the loader with the following parameters.
from langchain.document_loaders.onenote import OneNoteLoader
loader = OneNoteLoader(object_ids=["ID_1", "ID_2"], auth_with_token=True)
documents = loader.load()
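The page IDs passed as object_ids can be collected from the pages endpoint mentioned above. The following is a minimal standard-library sketch, assuming you already hold a valid access token with the Notes.Read scope. The function names are hypothetical, and only the first page of results is read (a paged response carries an @odata.nextLink that this sketch does not follow).

```python
import json
import urllib.request

# Endpoint taken from the text above
PAGES_URL = "https://graph.microsoft.com/v1.0/me/onenote/pages"


def extract_page_ids(payload):
    """Collect the 'id' of each entry in a Graph collection response."""
    return [page["id"] for page in payload.get("value", []) if "id" in page]


def fetch_page_ids(access_token, url=PAGES_URL):
    """Request the pages endpoint and return the page IDs it lists.

    Hypothetical helper; requires network access and a valid token.
    """
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {access_token}"}
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return extract_page_ids(payload)
```

The returned list could then be passed to the loader as object_ids=fetch_page_ids("TOKEN").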