As content on the web increases, content moderation becomes more and more important to protect sensitive groups such as children and people who have suffered from trauma. We’re going to learn how to create our own AI content moderator using Python, Selenium, Beautiful Soup 4, and The Text API.
Our AI content moderator will be built in three parts: a webscraper to scrape all the text from a page, a module that moderates content with AI using The Text API, and an orchestrator to put it all together.
In this post we’ll create the orchestrator to put the webscraper and the AI content moderation module together.
To create this orchestrator we need to:
- Import the Webscraper and Content Moderation Functions
- Create Orchestrator Function
- Get URL from Input
- Scrape the Page for All the Text
- Moderate the Scraped Text
- Test Orchestration
Import the Webscraper and Content Moderation Functions
An orchestrator is simply a module that “orchestrates” the rest of the functions and modules in the software. In our case, we only have two other modules, so those are the only two our orchestrator needs. Each module contains a single function, so we’ll import that function directly from each one.
# imports
from webscraper import scrape_page_text
from content_moderator import moderate
Create Orchestrator Function
With our imports in place, we now need to create our orchestrator. Our orchestrator function will take a URL from the user, scrape the text, and moderate the text. After moderating the text, it will return the content moderation rating and whether or not it contains a triggering word.
Get URL from Input
We want to be able to run our AI content moderation on any URL. So, the first thing we’ll do when we create our orchestrate function is get the URL. All we have to do for this is call the Python input function. We’ll use input to prompt the user for a URL and save it to a variable.
# function
def orchestrate():
    # ask user for website URL
    url = input("What URL would you like to moderate? ")
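Since input returns whatever the user types, a quick check before scraping can catch an obvious typo early. The block below is a minimal sketch, assuming we only want to accept absolute http or https URLs. The helper name is_valid_url is hypothetical and is not part of the original orchestrator.

```python
from urllib.parse import urlparse


def is_valid_url(url: str) -> bool:
    """Return True if url looks like an absolute http(s) URL.

    Hypothetical helper for illustration; the original code passes
    the raw input straight to the webscraper.
    """
    parsed = urlparse(url.strip())
    # An absolute web URL needs both a scheme and a host (netloc).
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


# Example values only; none of these come from the original post.
for candidate in ("https://example.com/post", "example.com", "ftp://example.com"):
    print(candidate, is_valid_url(candidate))
```

Inside orchestrate, one option would be to keep prompting until is_valid_url(url) is true, or to return early with a message.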
Scrape the Page for All the Text
After we get the URL, we’ll scrape it. First, we’ll print out a statement to tell the user that we’re scraping the page text. Then we’ll call the scrape_page_text function we imported from the webscraper and pass in the URL. We’ll save the returned text into a variable.
    # call webscraper on the URL
    print("Scraping Page Text ...")
    text = scrape_page_text(url)
Moderate the Scraped Text
Now that we have the text from the scraped URL, we have to moderate it. We’ll tell the user that we’re moderating the page text, and then moderate it. We will use the AI moderation function that we created earlier and pass it the text. Then we’ll save the output as the rating and whether or not there’s a trigger word.
    # call content moderator on the scraped data
    print("Moderating Page Text ...")
    rating, trigger = moderate(text)
Full Code for Orchestration Function
Here’s the full code for the orchestration function.
# function
def orchestrate():
    # ask user for website URL
    url = input("What URL would you like to moderate? ")
    # call webscraper on the URL
    print("Scraping Page Text ...")
    text = scrape_page_text(url)
    # call content moderator on the scraped data
    print("Moderating Page Text ...")
    rating, trigger = moderate(text)
    # return verdict
    return rating, trigger
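Test Orchestration
Now let’s test our orchestration function. All we’re going to do is print out the call to orchestrate, that is, print(orchestrate()). We’ll test my article about how I’m finally seeing the results of PythonAlgos’ effect on helping people learn Python.
The call prints the two values that orchestrate returns: the content moderation rating, followed by whether or not the text contains a trigger word.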
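To exercise the orchestrator without typing a URL, opening a browser, or calling The Text API, one option is to swap in stand-ins for the two imported functions and patch input. This is a rough sketch under those assumptions: the stub return values ("example-rating", False) and the example URL are made up for illustration and say nothing about what the real moderate function returns.

```python
from unittest.mock import patch


# Stand-ins for the functions normally imported from webscraper and
# content_moderator. Their return values are invented for this sketch.
def scrape_page_text(url):
    return f"placeholder text scraped from {url}"


def moderate(text):
    return "example-rating", False


# The orchestrator, same as in the full code above.
def orchestrate():
    url = input("What URL would you like to moderate? ")
    print("Scraping Page Text ...")
    text = scrape_page_text(url)
    print("Moderating Page Text ...")
    rating, trigger = moderate(text)
    return rating, trigger


# Patch input() so the run needs no typing; the URL is an example value.
with patch("builtins.input", return_value="https://example.com"):
    verdict = orchestrate()

print(verdict)
```

With the real modules in place, the stubs and the patch go away and the test is just the print(orchestrate()) call.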