How to Build an AI Content Moderation System

The media machine feeds on our attention. It constantly pokes and prods and finds ways to provoke, shock, or entice. At the same time, we’re no less sensitive to graphic content than we were before. Artificial Intelligence (AI) can be a force for good or a force for evil. It’s important to remember that it’s not AI itself, but how we use it, that determines the outcome. Let’s take a look at how we can use AI for good by building a content moderation system that protects our minds from the information assault on the internet.

In this post we’re going to go over how you can use the world’s most comprehensive AI sentiment analysis API with Python to create a content moderation system. Our Python-based Content Moderator will tell us whether or not a site’s text contains triggering words. First, you’ll need to go to The Text API and get your free API key. Then you’ll need to install the selenium, beautifulsoup4, and requests libraries. You’ll also need to download the Chromedriver executable that matches your version of Chrome. Remember to keep your API key and Chromedriver executable in an easily accessible folder.

Here are the steps to creating our own AI Content Moderation System with Python:

  1. Outline of the AI Content Moderation System
  2. Scrape the Webpage’s Content
  3. Create AI Content Moderator
  4. Orchestrate Webscraper and AI Content Moderation Module

Outline of the AI Content Moderation System

Figure: AI Content Moderation System design overview

Our AI Content Moderation System will consist of three modules: a webscraper, a content moderator, and an orchestrator. We’ll use Selenium and Beautiful Soup 4 for the webscraper. We’ll use The Text API and HTTP requests for the content moderator. Finally, the orchestrator will call the webscraper and the content moderator in sequence.

Scrape the Webpage’s Content

The first thing we’ll want to do is scrape all of the text from a webpage. In our webscraper module we will simply use Selenium and Beautiful Soup 4 to get all the text. First, we’ll set up a Selenium driver to run Chrome. We also need a function that will load a URL and scrape the webpage for text.

Our function will start by loading the URL with the Selenium driver we made. Then we will make the page content into an HTML soup with Beautiful Soup. After this step, we should quit the driver to free up the resources the browser is using. Once we have the HTML soup, we can get all the text. After getting the text, we should clean it up a bit. Here, I just drop the newline characters. Finally, we simply return the text.

For a full explanation of each line of code, read Create Your Own Content Moderator Part 1.

Code to Scrape the Full Text of a Web Page

Here’s the code to scrape the full text of a web page.

# pip install selenium beautifulsoup4
# imports
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
from time import sleep
 
chromedriver_path = "C:\\Users\\ytang\\Documents\\workspace\\content-moderation\\chromedriver.exe"
service = Service(chromedriver_path)
options = Options()
options.add_argument("--headless")
 
# function
def scrape_page_text(url: str):
    # create driver
    driver = webdriver.Chrome(service=service, options=options)
 
    # launch driver
    driver.get(url)
    sleep(3)
   
    # get soup from driver page
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    driver.quit()
   
    # scrape all the text from page
    text = soup.get_text()
    text = text.replace("\n", "")
   
    return text
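One thing to watch for in the cleanup step: dropping newlines outright can glue together words that were on separate lines, which may hide a keyword from the moderator. Here is a minimal sketch of an alternative that collapses whitespace into single spaces instead. The `clean_text` helper is a hypothetical name, not part of the original scraper.

```python
import re

def clean_text(raw: str) -> str:
    """Collapse every run of whitespace (newlines, tabs, spaces) into one space."""
    return re.sub(r"\s+", " ", raw).strip()

sample = "Breaking news\nTop stories\n\n  Weather\tSports"

# Dropping newlines merges "news" and "Top" into one token.
print(sample.replace("\n", ""))

# Collapsing whitespace keeps the words separated.
print(clean_text(sample))
```

If you prefer this behavior, you could call a helper like this in place of the `replace` call in `scrape_page_text`.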

Create AI Content Moderator

Our AI Moderator will be based on the Natural Language Processing power of The Text API. The first thing we’ll do is set up the headers, URL, and keywords that we’ll pass to the sentences_with_keywords API endpoint. Then we’ll create the function to do the content moderation.

The function to do content moderation will create a JSON body to send to the API endpoint. When we get the response back, we’ll loop through each of the keyword responses to do content moderation and check for trigger warnings. We’ll borrow labels from the movie rating system: the rating depends on how many matches the API returns for the first three keywords, with zero matches rated “safe”, one match rated “13+”, and anything more rated “18+”.

We’ll loop through the content moderation keywords first to determine a rating. Then we’ll loop through the keywords for trigger warnings and determine if there’s a trigger warning. Finally, we’ll return both the rating and whether or not there are any trigger words.
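To make the grading logic easier to reason about, here is a small sketch that separates it from the HTTP call so it can run without an API key. It assumes the response maps each keyword to a list of matching sentences, which is what the `len()` calls in the moderator imply. The `grade` function, its parameters, and the fake response are all hypothetical names made up for illustration.

```python
from typing import Dict, List, Tuple

# Same labels as the moderator: 0 matches is "safe", 1 match is "13+".
MODERATION = {0: "safe", 1: "13+"}

def grade(
    sentences_by_keyword: Dict[str, List[str]],
    rating_keywords: List[str],
    trigger_keywords: List[str],
) -> Tuple[str, bool]:
    """Return (rating label, trigger warning flag) for a keyword response."""
    # Add up the matches across the rating keywords.
    # A keyword missing from the response counts as zero matches.
    rating = sum(len(sentences_by_keyword.get(kw, [])) for kw in rating_keywords)
    # Anything above the highest key in MODERATION falls through to "18+".
    mod_status = MODERATION.get(rating, "18+")
    # A single match on any trigger keyword sets the warning.
    trigger_warning = any(sentences_by_keyword.get(kw) for kw in trigger_keywords)
    return mod_status, trigger_warning

# Hand-written stand-in for an API response (assumed shape).
fake_response = {
    "damn": ["Damn, that was close."],
    "gun": ["He bought a water gun."],
}
print(grade(fake_response, ["damn"], ["gun"]))
```

Keeping the grading in a pure function like this means you can test the thresholds with hand-written dictionaries before spending any API calls.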

For a full explanation of the code, line by line, check out Create Your Own AI Content Moderator, Part 2.

Full Code for the AI Content Moderator

Here is the full code for the AI Content Moderator.

# pip install requests
# get API key from https://www.thetextapi.com
# imports
import requests
import json
from config import apikey
# create headers
headers = {
    "Content-Type": "application/json",
    "apikey": apikey
}
# create keywords
keywords = ["fuck", "damn", "shit", "sexual assault", "rape", "gun"]
# url
url = "https://app.thetextapi.com/text/sentences_with_keywords"
 
moderation = {
    0: "safe",
    1: "13+"
}
# create function
def moderate(text: str):
    # create the body from the text
    body = {
        "text": text,
        "keywords": keywords
    }
    # pass in with keywords
    response = requests.post(url=url, headers=headers, json=body)
    # receive response and check for returned sentences
    _dict = json.loads(response.text)
 
    # grade returned sentences for 13+, 18+
    rating = 0
    mod_status = ""
    for kw in keywords[:3]:
        rating += len(_dict[kw])
    if rating in moderation:
        mod_status = moderation[rating]
    else:
        mod_status = "18+"
   
    # grade for trigger warning
    triggers = 0
    trigger_warning = False
    for kw in keywords[3:]:
        triggers += len(_dict[kw])
    if triggers > 0:
        trigger_warning = True
   
    # return response
    return mod_status, trigger_warning
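The moderator imports `apikey` from a `config` module that isn’t shown in this post. One possible way to write that module, sketched here as an assumption rather than the original file, is to read the key from an environment variable so it never ends up in source control. The variable name `THE_TEXT_API_KEY` and the `load_apikey` helper are hypothetical.

```python
import os

def load_apikey(env_var: str = "THE_TEXT_API_KEY") -> str:
    """Read the API key from an environment variable (the name is an assumption)."""
    key = os.environ.get(env_var, "")
    if not key:
        # Fail early with a readable message instead of sending an empty key.
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key

# In config.py you would then expose the name the moderator imports:
# apikey = load_apikey()
```

A plain `apikey = "your-key-here"` line in `config.py` works too, as long as you keep that file out of any public repository.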

Orchestrate Webscraper and AI Content Moderation Module

The orchestrator module will combine both the AI Content Moderator and the webscraper above. All we need to do is import the functions for scraping pages and moderating the content, and then create a function to run both of them.

Our function will start by prompting the user for a URL. We’ll then pass that URL into the webscraper to get the text from the web page. Once we get the text back, we’ll pass that to the AI content moderator function to get our rating and trigger warning. Finally, we return the rating and whether or not there are trigger words.

For a full explanation of the code, line by line, check out Create Your Own AI Content Moderator, Part 3.

Full Code for the Orchestrator Module

Here’s the full code for orchestrating the AI Content Moderation System.

# imports
from webscraper import scrape_page_text
from content_moderator import moderate
# function
def orchestrate():
    # ask user for website URL
    url = input("What URL would you like to moderate? ")
    # call webscraper on the URL
    print("Scraping Page Text ...")
    text = scrape_page_text(url)
    # call content moderator on the scraped data
    print("Moderating Page Text ...")
    rating, trigger = moderate(text)
    # return verdict
    return rating, trigger
 
# only run when executed as a script, not when imported
if __name__ == "__main__":
    print(orchestrate())
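If you want to try the orchestration flow without launching Chrome or calling the API, one option is to pass the scraper and moderator in as arguments. This is a sketch of that idea, not the original design. The `moderate_url` function and both stubs are hypothetical stand-ins for `scrape_page_text` and `moderate`.

```python
from typing import Callable, Tuple

def moderate_url(
    url: str,
    scraper: Callable[[str], str],
    moderator: Callable[[str], Tuple[str, bool]],
) -> Tuple[str, bool]:
    """Scrape the page at `url`, then grade its text."""
    text = scraper(url)
    return moderator(text)

# Stubs that stand in for the real webscraper and content moderator.
def fake_scraper(url: str) -> str:
    return "A perfectly calm page about gardening."

def fake_moderator(text: str) -> Tuple[str, bool]:
    return ("safe", False)

print(moderate_url("https://example.com", fake_scraper, fake_moderator))
```

With the real modules, the call would look like `moderate_url(url, scrape_page_text, moderate)`.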

Yujian Tang
