
Do More Polarizing YouTube Titles Get More Views?

Accompanying YouTube Video:

We’re all familiar with the concept of clickbait by now, so I was curious whether more polarizing YouTube titles get more views. Just like the last couple of articles involving YouTube, we’ll be using Selenium, BeautifulSoup, and The Text API. Just like when we found The Most Common Phrases on the Front Page of YouTube, we’ll analyze polarity vs views per day in two steps:

  1. Pulling Titles and Views per Day from the Front Page of YouTube
  2. Analyzing Title Polarity against Views per Day

Pulling Titles and Views per Day from the Front Page of YouTube

We already did this exact step when we looked at How Many Views per Day Do Front Page YouTube Videos Get? I will briefly cover it here. We’ll install our libraries:

pip install selenium beautifulsoup4 dateparser requests

After we install our Python libraries, we’ll want to install Chromedriver. Once you’re done downloading your chromedriver, head on over to The Text API website and sign up for an API key. When you land on the page, scroll all the way down and click the “Get Your API Key” button as shown below.

Once you log in, you should see your API key front and center at the top of the page, as shown in the image below. I have a paid account, but you can do this project with a free account, and paid accounts are in closed beta at the time of writing anyway.

Now that we’re done with setup we can get into the code. Below is the full code example, which can also be found on GitHub. All we do in the code is start up chromedriver, pull the link labels from YouTube’s homepage, and use Named Entity Recognition plus some string parsing to write our results to a JSON file. If you’d like a longer and more detailed explanation of the code, please refer to the article on How Many Views Do Front Page YouTube Videos Get? If you don’t want to bother with this code, you can also download the resulting JSON file I ended up with.

import re
import requests
import dateparser
import datetime
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from time import sleep
from random import randint
from bs4 import BeautifulSoup
import json
 
from text_api_config import headers, ner_url
 
chromedriver_path = "<your chrome driver path here>"
service = Service(chromedriver_path)
chrome_options = Options()
# Newer Selenium 4 releases removed the `headless` attribute; pass the flag instead
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(service=service, options=chrome_options)
 
home = "https://www.youtube.com/"
driver.get(home)
sleep(randint(2, 4))
# --------- run here and check for where to find videos ----------
 
soup = BeautifulSoup(driver.page_source, 'html.parser')
titles = soup.find_all("a", href=re.compile("watch.*"))
driver.quit()
# title, when uploaded, number of views
title_dict = {}
for title in titles:
    text = title.get('aria-label')
    if text is None:
        continue
    elements = text.split(' ')
    num_views = elements[-2]
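    # Split on the last " by " so titles that contain "by" (e.g. "baby") stay intact
    re_join = ' '.join(elements[:-2]).rsplit(' by ', 1)
    if len(re_join) < 2:
        continue
    title_text = re_join[0]
    when_uploaded = re_join[1]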
    body = {
        "text": when_uploaded
    }
    response = requests.post(ner_url, headers=headers, json=body)
    # {"ner":[["DATE","2 weeks ago"],["TIME","1 minute, 12 seconds"]]}
    x = re.findall(r"\"([0-9].*?)\"", response.text, re.DOTALL)
    if not x:
        # no quoted string starting with a digit was found; skip this video
        continue
    y = x[0]
    # print(x[0])
    # print(f"{title_text}: {num_views}\nUploaded on: {when_uploaded}")
    title_dict[title_text] = (y, num_views)
 
title_to_avg_daily_views = {}
for title in title_dict:
    print(title)
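    dt_object_then = dateparser.parse(title_dict[title][0])
    if dt_object_then is None:
        # dateparser could not interpret the string; skip this title
        continue
    days_since = (datetime.datetime.now() - dt_object_then).days
    # Treat videos uploaded today as one day old to avoid dividing by zero
    days_since = max(days_since, 1)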
    avg_views_per_day = int(title_dict[title][1].replace(',',''))/days_since
    print(avg_views_per_day)
    title_to_avg_daily_views[title] = avg_views_per_day
 
json_dict = json.dumps(title_to_avg_daily_views, indent=4)
with open("titles_and_views.json", "w") as f:
    f.write(json_dict)

exit()
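The regular expression in the loop above grabs the first quoted string that starts with a digit from the raw response text. As a minimal alternative sketch, the response can be parsed as JSON instead, assuming it has the shape shown in the code comment (a "ner" key holding label and text pairs). The helper names extract_upload_age and views_per_day are hypothetical and are not part of the original script.

```python
import json

def extract_upload_age(response_text):
    """Return the text of the first DATE entity, or None if there is none.

    Assumes the response looks like the sample in the comment above:
    {"ner": [["DATE", "2 weeks ago"], ["TIME", "1 minute, 12 seconds"]]}
    """
    try:
        entities = json.loads(response_text).get("ner", [])
    except (ValueError, AttributeError):
        # Not valid JSON, or not a JSON object
        return None
    for entity in entities:
        if len(entity) == 2 and entity[0] == "DATE":
            return entity[1]
    return None

def views_per_day(view_count, days_since_upload):
    """Average daily views; a video uploaded today counts as one day old."""
    views = int(view_count.replace(",", ""))
    return views / max(days_since_upload, 1)

sample = '{"ner":[["DATE","2 weeks ago"],["TIME","1 minute, 12 seconds"]]}'
print(extract_upload_age(sample))
print(views_per_day("1,400", 14))
```

Matching on the DATE label also avoids accidentally picking up the video's duration when no upload date is recognized.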

Analyzing Title Polarity Against Views per Day

Alright, let’s check it out: do more polarizing titles get more views per day than non-polarizing titles? We will plot the polarity of each title against its views per day. To do this we’ll need to install matplotlib, which we can do with pip:

pip install matplotlib

We’re going to use the pyplot module of matplotlib to plot our data. Let’s review what the data looks like:

The data is structured as a dictionary with the title as the key and the views per day as the value. Let’s get the polarity of these titles. We’ll be using The Text API for this, sending each title to the https://app.thetextapi.com/text/text_polarity endpoint to get our polarity scores.

To start, we’ll import the libraries we’ll need: requests to send HTTP requests, json to parse our input, and our API key, which I’ve stored in a config file.

import requests
import json
from text_api_config import apikey
 
with open("titles_and_views.json", "r") as f:
    entries = json.load(f)
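Both scripts in this article import from a text_api_config module that is never shown. Here is a minimal sketch of what such a file might contain. Everything in it is an assumption: the TEXT_API_KEY environment variable name is made up for this example, and the NER URL is patterned on the text_polarity URL used in this article, so check it against The Text API documentation before relying on it.

```python
# text_api_config.py (hypothetical contents; the original file is not shown)
import os

# Reading the key from an environment variable is an assumption made here
# to keep the key out of source control; a hard-coded string also works.
apikey = os.environ.get("TEXT_API_KEY", "<your API key here>")

# Same header layout that get_polarity builds in this article.
headers = {
    "Content-Type": "application/json",
    "apikey": apikey,
}

# Assumed NER endpoint; verify against The Text API documentation.
ner_url = "https://app.thetextapi.com/text/ner"
```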

Let’s define a function to get our polarity scores. This function will take an “entry”, which will be the title string. We’ll construct our headers to say that our request body is JSON, and pass our API key. The body of our request will tell the API that the text to be processed is the “entry”, which is the title. Finally, we parse the response and return the “text polarity” value.

Getting Text Polarity from The Text API

def get_polarity(entry):
    headers = {
        "Content-Type": "application/json",
        "apikey": apikey
    }
    body = {
        "text": entry
    }
    url = "https://app.thetextapi.com/text/text_polarity"
    res = json.loads(requests.post(url, headers=headers, json=body).text)
    _p = res["text polarity"]
    return _p

Let’s create a dictionary and then loop through all the entries and get their polarities. As we loop through, we’ll save the polarities and the views per day in another dictionary, _dict, and store _dict as the value corresponding to the title of the video in the new_doc dictionary. At the end, I dump it into a JSON and save it as a file, but you can feel free to keep it as a dictionary as we move on to plotting the text polarities against the views per day.

new_doc = {}
for entry in entries:
    _p = get_polarity(entry)
    _vpd = entries[entry]
    _dict = {
        "polarity": _p,
        "views_per_day": _vpd
    }
    new_doc[entry] = _dict
 
json_dict = json.dumps(new_doc, indent=4)
with open("views_by_polarity.json", "w") as f:
    f.write(json_dict)
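To see the shape of the resulting dictionary without spending API calls, here is a small sketch of the same loop with the polarity lookup passed in as a function. The titles, the numbers, and the fake_polarity stub are made up for illustration, and build_records is a hypothetical helper that is not part of the original code.

```python
import json

def build_records(entries, polarity_fn):
    """Map each title to its polarity and views per day, like new_doc above."""
    return {
        title: {"polarity": polarity_fn(title), "views_per_day": vpd}
        for title, vpd in entries.items()
    }

def fake_polarity(title):
    """Stand-in for get_polarity; returns a fixed score and makes no API call."""
    return 0.0

# Made-up entries in the same format as titles_and_views.json
entries = {"Example title one": 1500.0, "Example title two": 320.5}
records = build_records(entries, fake_polarity)
print(json.dumps(records, indent=4))
```

Passing get_polarity in place of fake_polarity should produce the same structure as the loop above, with real scores.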

Plotting Text Polarity against Views per Day

So we’ll start by importing the libraries we need: matplotlib.pyplot and json. We import matplotlib.pyplot as plt just as a convention. Then we’ll open up our JSON document and read it in just like a dictionary. The scatter plot function expects two lists for x and y so we’ll take our dictionary and split it into two lists. To do this, we loop through each entry and append the polarity value to a polarity list and the views per day value to a views per day list. Then we pass these in as the x and y values to our scatter plot, label it, and take a look.

import matplotlib.pyplot as plt
import json
 
with open("views_by_polarity.json", "r") as f:
    entries = json.load(f)
 
polarities = []
vpds = []
 
for entry in entries:
    polarities.append(entries[entry]["polarity"])
    vpds.append(entries[entry]["views_per_day"])
 
plt.scatter(polarities, vpds)
plt.title("Views per Day by Polarity of Title")
plt.xlabel("Polarity of Title")
plt.ylabel("Views per Day")
plt.show()

The expected output should look something like this:

Yours may look slightly different since YouTube changes their front page for everyone, but LOOK AT THAT! The video with the highest views per day has a title that looks entirely neutral! It looks like we can’t say that neutral titles are likely to get more views, but we can say the converse: that videos with more views are more likely to have neutral titles. Out of curiosity, I decided to look up which video had such high views per day compared to all of the other videos.

To do this, we’ll simply load the JSON file and sort it based on views per day:

import json
 
with open("views_by_polarity.json", "r") as f:
    vbp = json.load(f)
 
sorted_vbp = sorted(vbp.items(), key=lambda x:(x[1]["views_per_day"]))
print(sorted_vbp[-1])

We should get an output that looks like the following:

Okay, so personally I have no idea what this title means. I don’t know who Huggy is, and I’ve never heard of Poppy Playtime. I have no idea why this video has SO many views per day. I thought maybe it was posted today, but I actually took a look on YouTube and:

It was uploaded over a week ago! This is the first time I’ve seen this character, and honestly I’m not tempted to watch this video so we’ll leave this exploration at that.
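A scatter plot with one extreme outlier is hard to read by eye, so one optional follow-up is to put a number on the relationship. The sketch below computes a Pearson correlation between the absolute polarity of a title (how polarizing it is, in either direction) and its views per day. It assumes polarity scores are signed, with values near 0 meaning neutral. The pearson helper and the sample records are illustrative only and are not results from this dataset.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)

# Made-up records in the same shape as views_by_polarity.json
sample = {
    "a": {"polarity": -0.6, "views_per_day": 100.0},
    "b": {"polarity": 0.0, "views_per_day": 400.0},
    "c": {"polarity": 0.3, "views_per_day": 250.0},
}
strength = [abs(v["polarity"]) for v in sample.values()]
vpds = [v["views_per_day"] for v in sample.values()]
print(round(pearson(strength, vpds), 3))
```

To try it on real data, load views_by_polarity.json in place of the sample dictionary. A value near 0 would suggest little linear relationship, and a single dominant outlier like the one in the plot can swing the coefficient heavily, so treat the number with caution.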
