How to Automatically Transcribe a Notion MP3 File

Time is super valuable. If we can save ourselves a few minutes or hours each day from having to do repetitive tasks, we can spend that saved time on activities we love. As programmers, we can automate some of these repetitive tasks and get back to living the life we want.

Let’s look at creating an application to do just that. Let’s create an app in Python that will connect the popular productivity tool, Notion, with the up-and-coming Natural Language Processing (NLP) Package Manager, Steamship. Our app will take an mp3 file we place in Notion, run it through the Steamship audio-markdown bundle and return the markdown to the same page as the initial mp3 file. Check out the GitHub here.

In this post we’ll cover:

  • An Introduction to Notion
  • How to Programmatically Work with Notion
    • Activating Your Notion API
    • Retrieving Audio Data with the Notion API
    • Returning the Transcribed Text to Notion
  • An Introduction to Steamship, the NLP Package that Manages Cloud Infra for us
  • Creating a Steamship Package to Transcribe an MP3 File in a Notion Block
    • Transcribing our Audio File
    • Exposing the API Endpoint
  • A Summary of How to Automatically Transcribe MP3 Files in Notion

What is Notion

Notion is a project management software platform designed to help organizations manage their efficiency and productivity. You can configure the pages to hold whatever info you need. Each page can have as many “blocks” as you need to store the data in an organized fashion.

A block can be a table, a paragraph of text, an image, or any of the many content types Notion accepts. In our example, we create a block that holds an mp3 file. Then, we connect our Notion block to the Steamship audio-markdown bundle to transcribe the audio.

Notion Related Functions

The first set of functions we’ll make are related to Notion. The notion.py file contains code to retrieve and return the mp3 file data we have placed in our Notion workspace. In the following photo, you can see our workspace and the mp3 file we have saved. The name of the mp3 file is cows_crows.mp3. This file contains a recording of me saying, “Cows can’t catch crows.” 

Add an audio file block to Notion

Activating Your Notion API

To get started with the Notion API, go to their Developer Portal and Get Started with the API. From there, simply create your API integration and copy your API key. Save your Notion API key because we will use it later to connect Notion and Steamship.

Go to the Notion Developers Homepage and “Get Started”

Adjust your API settings as shown below – it should be able to read and update content.

Create a new Notion Integration and copy the secret
Check boxes for the necessary permissions

Before working with the API on the page, you need to go to the upper right-hand corner of your Notion page and click the three dots. The drop-down menu should look like the one shown in the image below. Click the “More” button to add the Notion Integration you created earlier. In this case, I am calling mine “Testing with Steamship”.

Once you have your app integration set up and you have copied your API key, you can include the key in the request headers. The headers also state the Notion API version we target and that the content we send and accept is JSON.

Retrieving Audio Data with the Notion API

We start the Python script by importing what we need: Dict from the typing module, plus the requests and json libraries. We use Dict to annotate the expected types of the key and value pairs in a dictionary, requests to send HTTP requests, and json to parse the JSON responses.

Next, we write a function called notion_headers to build the header information that Notion requires. To connect properly, we need to send the API key Notion provides for our developer app. This function takes one parameter, our API key, and returns a dictionary of the necessary headers.

Next, we create a function to retrieve the path that houses our mp3 file. According to the Notion developer API documentation, we can use their provided endpoint to connect directly to our workspace. We use the requests library imported earlier to send an HTTP GET request to the Notion endpoint to return our data in a json format. 

The notion_get function takes two parameters, a path to a block and our API key. The first thing we do in our function is generate the URL that we can get our block from via the Notion API. Next, we send a request and pass the headers we created earlier with the API key for authentication. Before returning the response, we convert it into JSON format. In this example, we also print out the response text and the JSON formatted response text for clarity and debugging.

from typing import Dict
import requests
import json

def notion_headers(api_key: str) -> Dict[str, str]:
    # Build the headers that every Notion API request needs
    return {
        "Authorization": f"Bearer {api_key}",  # your Notion integration secret
        "Content-Type": "application/json",
        "Notion-Version": "2022-06-28",
        "Accept": "application/json",
    }

def notion_get(path: str, api_key: str):
    url = f"https://api.notion.com/v1/{path}"
    response = requests.get(url, headers=notion_headers(api_key))
    print(response.text)
    res_json = json.loads(response.text)
    print(res_json)
    return res_json

The first two functions execute at the beginning of our API script, retrieving the initial mp3 file data for transcribing. The following two functions execute at the end of our application to return the audio markdown to Notion. We keep all four in the same Python file so that everything related to Notion lives in one module. Now let’s take a look at the functions that return the markdown to Notion.

Returning the Transcribed Text to Notion

We write a function called notion_patch() with the path of the mp3 audio block, the JSON formatted content, and the API key as parameters. It invokes the header information as well as an HTTP requests patch method to connect our JSON data to the Notion API endpoint. The Notion API documentation outlines how to properly define their block object types. The add_markdown() function takes the ID of the Notion page, the markdown transcription, and the API key as parameters. This function mimics their documentation and defines our specific paragraph block object content of audio markdown text data. This patch request will add a new block to our Notion workspace, which includes our audio markdown.

def notion_patch(path: str, content: Dict, api_key: str):
    url = f"https://api.notion.com/v1/{path}"
    response = requests.patch(url, json=content, headers=notion_headers(api_key))
    res_json = response.json()
    return res_json
 
def add_markdown(page_id: str, markdown: str, api_key: str):
    add_text_block = {
        "children":
            [{
                "object": "block",
                "type": "paragraph",
                "paragraph": {
                    "rich_text": [{
                        "type": "text",
                        "text": {
                            "content": markdown,
                        }
                    }]
                }
            }]
    }
    return notion_patch(f"blocks/{page_id}/children", add_text_block, api_key)
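
One thing add_markdown() does not handle is a long transcript. As far as I understand Notion's request limits, the content of a single rich text object is capped at 2000 characters, so a long recording could be rejected. Below is a minimal sketch of one way to work around that by splitting the text into several paragraph blocks. The names split_text, paragraph_blocks, and NOTION_TEXT_LIMIT are made up for this example and are not part of the original project, and the 2000 figure is an assumption you should check against the current Notion documentation.

```python
from typing import Dict, List

# Assumption: Notion caps text.content at 2000 characters per rich text object.
NOTION_TEXT_LIMIT = 2000


def split_text(text: str, limit: int = NOTION_TEXT_LIMIT) -> List[str]:
    """Split text into pieces of at most `limit` characters, breaking at spaces when possible."""
    chunks = []
    remaining = text
    while len(remaining) > limit:
        # Look for the last space that still keeps the chunk within the limit
        cut = remaining.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no usable space, so cut mid-word
        chunks.append(remaining[:cut])
        remaining = remaining[cut:].lstrip(" ")
    if remaining or not chunks:
        chunks.append(remaining)
    return chunks


def paragraph_blocks(markdown: str) -> Dict:
    """Build an append-children payload with one paragraph block per chunk."""
    return {
        "children": [
            {
                "object": "block",
                "type": "paragraph",
                "paragraph": {
                    "rich_text": [{"type": "text", "text": {"content": chunk}}]
                },
            }
            for chunk in split_text(markdown)
        ]
    }
```

The resulting dictionary has the same shape as add_text_block above, so it could be passed to notion_patch() in its place. Very long transcripts may also run into a per-request limit on the number of blocks, which this sketch does not address.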

What is Steamship

Steamship is a cutting-edge Natural Language Processing software built to let you add language AI to your software quickly and easily. The Steamship packages import into your code like standard Python modules, but they run on their own auto-managed stack in the cloud. And because they run in the cloud on an auto-managed stack designed for Natural Language Processing, you can quickly scale as many separate instances as you need without ever managing a heavy infrastructure.

Our application will blend these two apps together seamlessly and get us away from repetitive tasks and back to living our lives. To get started working with Steamship, we install the Steamship CLI and Python Library.

Creating a Steamship Package to Transcribe an MP3 File in a Notion Block

Once we have installed the Steamship CLI and Python library we can get started creating an automated way to transcribe audio files with Steamship. The first step is to create a Steamship package. Once we have created a package, we’ll see a folder structure like the one below. 

The main changes we’ve made from the default package lie in the /src folder. Our /src folder houses four important files: __init__.py, api.py, notion.py, and transcribe.py. While we could store all the functions in one file, it is better practice to write modular code and split the functions into aptly named files. The image below shows what our examples, src, and tests folders look like.

Steamship Package Structure

We use the __init__.py file to mark a directory as a Python package. Each directory whose code we import as a package should contain an __init__.py. The file can be empty, and in our case it is, because we have no use case that requires code in it. An empty __init__.py file is the usual default.

Transcribing our Audio File

The audio markdown package is one of the four current bundles offered by Steamship. They provide great documentation and an up-to-date GitHub repository with code we can place directly in our application. As before, we start by importing the modules we need.

We complete the Python script by defining our function called transcribe_audio. This function takes two parameters, the URL to the audio file and a Steamship object. We start by using Steamship to spin up an instance of the audio-markdown package. Then we invoke the instance to transcribe our file.

The invoked code executes on Steamship's cloud infrastructure and returns our audio markdown. If the request is unsuccessful, we’ll get a status message that tells us why. The code also caps the number of retries at 100 so the program cannot hang forever. For the same reason, it’s important to have break clauses in our while loops, so they don’t run indefinitely.

import time
from steamship import Steamship, TaskState

def transcribe_audio(audio_url: str, ship: Steamship):
    # Spin up an instance of the audio-markdown package and start the job
    instance = ship.use("audio-markdown", "audio-markdown-crows")
    transcribe_task = instance.invoke("transcribe_url", url=audio_url)
    task_id = transcribe_task["task_id"]
    status = transcribe_task["status"]

    # Wait for completion
    retries = 0
    while retries <= 100 and status != TaskState.succeeded:
        response = instance.invoke("get_markdown", task_id=task_id)
        status = response["status"]
        if status == TaskState.failed:
            print(f"[FAILED] {response['status_message']}")
            break

        print(f"[Try {retries}] Transcription {status}.")
        if status == TaskState.succeeded:
            break
        time.sleep(2)
        retries += 1

    # Get Markdown
    markdown = response["markdown"]
    return markdown
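
The retry loop above mixes the waiting logic with the Steamship calls, which makes it hard to try out without a real transcription job. Here is a minimal sketch of the same bounded-polling idea pulled into a standalone helper. The name poll_until_done and its parameters are hypothetical, and the "succeeded" and "failed" strings are stand-ins for the TaskState values used in the package code. Unlike the loop above, this version raises an exception on failure or timeout instead of printing and breaking, which is a design choice of the sketch rather than something the original code does.

```python
import time
from typing import Callable, Dict


def poll_until_done(
    fetch_status: Callable[[], Dict],
    max_retries: int = 100,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Call fetch_status until it reports success; raise on failure or timeout.

    "succeeded" and "failed" are placeholder status strings for this sketch.
    """
    for attempt in range(max_retries):
        response = fetch_status()
        status = response["status"]
        if status == "succeeded":
            return response
        if status == "failed":
            raise RuntimeError(f"Transcription failed: {response.get('status_message')}")
        print(f"[Try {attempt}] Transcription {status}.")
        sleep(delay_seconds)  # wait before asking again
    raise TimeoutError(f"Transcription not finished after {max_retries} tries")
```

In transcribe_audio, fetch_status could be a small lambda wrapping instance.invoke("get_markdown", task_id=task_id), with the markdown read from the returned response. Passing sleep as a parameter lets a test replace it with a no-op so nothing actually waits.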

Exposing our API Endpoint

In our api.py file, we tie together the pieces we created. We define classes that call each of the functions written above in the right order to ultimately return our markdown. Start by importing the modules needed to run the script. We also import the functions from our other Python scripts by name.

from typing import Type
from steamship.invocable import Config, create_handler, post, PackageService
 
from notion import notion_get, add_markdown
from transcribe import transcribe_audio

We define a class called NotionAutoTranscribeConfig that stores the API key we received earlier for the Notion integration. This configuration class is required to connect to our Notion integration app.

Next we write the NotionAutoTranscribe class, which ties together the Python scripts we wrote earlier. It reads the required API key from our config class to access the Notion integration. It then calls the notion_get() function we wrote to retrieve our mp3 file block.

After parsing the returned JSON to find the URL of the mp3 audio data, the transcribe_audio() function calls Steamship to transcribe the audio. The transcribed text, “cows can’t catch crows”, is written to a new Notion block using our add_markdown() function.

class NotionAutoTranscribeConfig(Config):
    """Config object containing required parameters to initialize a NotionAutoTranscribe instance."""
 
    notion_key: str  # Required
 
 
class NotionAutoTranscribe(PackageService):
    """Example steamship Package."""
 
    config: NotionAutoTranscribeConfig
 
    def config_cls(self) -> Type[Config]:
        """Return the Configuration class."""
        return NotionAutoTranscribeConfig
 
    @post("transcribe")
    def transcribe(self, url: str = None) -> str:
        """Transcribe the audio in the first Notion block of the page at `url` and append to the page.
 
        This uses the API Key provided at configuration time to fetch the Notion Page, transcribe the
        attached audio file, and then post the transcription results back to Notion as Markdown Text.
        """
 
        # Parse the Block ID from the Notion URL
        block_id = url.split("#")[1]
 
        # Get the Notion page
        print(f"Getting notion block {block_id}")
        notion_page = notion_get(f"blocks/{block_id}", self.config.notion_key)
 
        # Get the Page ID and Audio URL from the Notion File JSON
        audio_url = notion_page['audio']['file']['url']
        page_id = notion_page['parent']['page_id']
 
        print(f"Audio url: {audio_url}")
        print(f"Page ID: {page_id}")
 
        # Transcribe the file into Markdown
        markdown = transcribe_audio(audio_url, self.client)
 
        print(f"Markdown: {markdown}")
 
        # Add it to Notion
        res_json = add_markdown(page_id, markdown, self.config.notion_key)
 
        print(f"Res JSON: {res_json}")
 
        return res_json
 
handler = create_handler(NotionAutoTranscribe)

Transcribed audio file
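
The transcribe method assumes the URL contains a "#" followed by the block ID and that the block is an audio file hosted by Notion. The following sketch shows how those two lookups might be made a little more defensive. The function names block_id_from_url and audio_url_and_page_id are invented for this example, and the block layout is my understanding of the Notion block object (an audio entry whose type is either "file" or "external"), so treat it as an assumption rather than a reference.

```python
from typing import Dict, Tuple


def block_id_from_url(url: str) -> str:
    """Return the part after '#' in a Notion block link, or raise ValueError."""
    _, separator, fragment = url.partition("#")
    if not separator or not fragment:
        raise ValueError(f"No block ID found in URL: {url!r}")
    return fragment


def audio_url_and_page_id(block: Dict) -> Tuple[str, str]:
    """Pull the audio URL and parent page ID out of a Notion block response."""
    if block.get("type") != "audio":
        raise ValueError(f"Expected an audio block, got: {block.get('type')!r}")
    audio = block["audio"]
    # Assumption: audio["type"] names the key holding the URL ("file" or "external")
    audio_url = audio[audio["type"]]["url"]
    parent = block["parent"]
    if "page_id" not in parent:
        raise ValueError("Audio block is not a direct child of a page")
    return audio_url, parent["page_id"]
```

With helpers like these, a malformed link or a non-audio block would fail with a readable error message instead of an IndexError or KeyError deep inside the endpoint.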

Summary

In this article we automated transcribing an mp3 file in Notion. We programmed our custom application to connect the project management tool, Notion, with the NLP Package Manager, Steamship. Our app finds the mp3 audio file we placed in our project workspace, deploys the Steamship bundle designed to process our audio file, and then returns the complete audio markdown directly back to our Notion workspace. 
