How Many Views Per Day do Front Page YouTube Videos Get?

YouTube is the celebrity maker of our generation. I don’t know about you, but I’m pretty curious as to how many views per day a YouTube video on the front page gets. Let’s take a look at the videos on YouTube’s front page and see how many views per day each of them gets. We’re going to write a Python program that will pull all this information down and analyze it for us. To do this, we’ll need the help of Selenium with Chromedriver, Beautiful Soup, and The Text API.

Let’s start by downloading our libraries (we’ll use the dateparser library to calculate when the video was posted, but there are other ways to do it too):

pip install selenium beautifulsoup4 dateparser requests

Note that the Beautiful Soup Python library is actually packaged under the name “beautifulsoup4”. After we install our Python libraries, we’ll want to go to the Chromedriver link provided above and install Chromedriver. You can pick whichever version of Chromedriver you want, and it should lead you to a page with “.zip” files to download that looks something like this:

Download the right “.zip” file for your operating system. I’m using Windows, so I downloaded the “_win32” version. Once you download it, you’ll need to extract it and either keep track of its location or move the chromedriver.exe file into a folder you can easily remember. Once you’re done downloading Chromedriver, head on over to The Text API website and sign up for an API key. When you land on the page, scroll all the way down and click the “Get Your API Key” button as shown below.

Sign up for The Text API

Once you log in, you should see your API key front and center at the top of the page, as shown in the image below. I have a paid account, but you can do this project with a free account. Paid accounts are currently in closed Beta.

Create a Web Scraper to Scrape YouTube

Alright, now that we’re done with all the setup, let’s get into the code! First we’ll have to import all the libraries we need. I’ve imported the re library for regular expressions, requests to send API requests to The Text API, dateparser to parse dates, datetime to get the current date, several Selenium modules for running Chromedriver with options, sleep from the time library to wait for the page to load, randint to randomize the waiting time so that we don’t look too much like a bot, BeautifulSoup from bs4 to parse the webpage, and json to write out JSON objects. I’ve also imported ner_url and headers from my text_api_config file. These point at The Text API endpoint for extracting the date, and the headers contain the API key.

import re
import requests
import dateparser
import datetime
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from time import sleep
from random import randint
from bs4 import BeautifulSoup
import json
 
from text_api_config import headers, ner_url

The URL endpoint we’ll be hitting is the “Named Entity Recognition (NER)” endpoint so that we can extract the date that the video was uploaded. Before we move into the actual logic, I’ll show you what the headers and ner_url objects should look like:

headers = {
    "Content-Type": "application/json",
    "apikey": <your API key here>
}
text_url = "https://app.thetextapi.com/text/"
ner_url = text_url + "ner"

The first thing we’ll want to do is actually launch Chromedriver and navigate to the front page of YouTube. We’ll use the Service class we imported from Selenium earlier to open up Chrome, and I’ll open it in “headless” mode, which simply means that the driver will control Chrome without displaying a browser window. You can opt to run in headless mode or not. Once we get to YouTube, we should stop here and check to see where we can find video titles based on the HTML elements of the page.

chromedriver_path = "<wherever you saved chromedriver>"
service = Service(chromedriver_path)
chrome_options = Options()
chrome_options.add_argument("--headless")  # newer Selenium versions drop the .headless attribute
driver = webdriver.Chrome(service=service, options=chrome_options)
 
home = "https://www.youtube.com/"
driver.get(home)
sleep(randint(2, 4))
# --------- run here and check for where to find videos ----------

Alright, once we’ve found where the videos live in the page, we’ll need to extract them. This is where BeautifulSoup comes in to help us out. We can use BeautifulSoup to parse the HTML of the page and pull all the “a” elements (links) whose href starts with “watch” (thank you, regular expressions) to find the locations of all the YouTube videos on the front page of YouTube. After we’re done getting the elements, we should quit Chromedriver so it doesn’t continue to run in the background. This is just best practice to reduce processing power, but you’re free to let it run if you’d like.

soup = BeautifulSoup(driver.page_source, 'html.parser')
titles = soup.find_all("a", href=re.compile("watch.*"))
driver.quit()

Now we’ve got to get to the logic of parsing the elements. The text of a YouTube video title is, at the time of writing, located in the “aria-label” attribute of the element. It should include the title, the channel, when it was posted, how long the video is, and how many views it has. An example looks like this:
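The screenshot from the original post isn’t reproduced here, but a hypothetical aria-label might read something like the following (the title, channel name, and all numbers are made up for illustration):

```
Learn Python in 20 Minutes by Example Channel 2 weeks ago 12 minutes, 42 seconds 1,234,567 views
```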

We can then split the title text on the “by” keyword to get the title in one half and the author, date of posting, and length of the video in the other.
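As a quick sanity check, here’s that split applied to a made-up aria-label (the title, channel, and counts are invented for illustration, and the real format may differ):

```python
# Hypothetical aria-label text -- not scraped from a live page
text = "Learn Python in 20 Minutes by Example Channel 2 weeks ago 12 minutes, 42 seconds 1,234,567 views"

elements = text.split(' ')
num_views = elements[-2]                       # second-to-last token is the view count
re_join = ' '.join(elements[:-2]).split('by')  # drop "views" and the count, split on "by"
title_text = re_join[0]     # everything before "by"
when_uploaded = re_join[1]  # channel, upload date, and video length

print(repr(title_text))     # 'Learn Python in 20 Minutes '
print(repr(when_uploaded))  # ' Example Channel 2 weeks ago 12 minutes, 42 seconds'
print(num_views)            # 1,234,567
```

Note that this simple split would misbehave if the title itself contained the word “by”, which is a limitation of the approach.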

Parse and Analyze YouTube Videos and Their Views

We’ll have to parse out the elements we want and send the second half (that is, the elements without the title) to The Text API so we can extract the date. We could also get the length of the video, but we won’t need that for the scope of this project. I’ve included a comment below the request so you can see what the ner_url endpoint returns. It comes back as a string because we access it via the .text attribute of the response. We’ll use a regular expression to look for the first quoted string that starts with a number, because we expect the date to be the first entry with a number. Of course, this will fail if the author’s name starts with a number, so we’ll put our extraction of the date in a try/except block. Then we’ll save our data into a dictionary keyed on the title, storing the relative date returned from the NER endpoint of The Text API alongside the total number of views.

# title, when uploaded, number of views
title_dict = {}
for title in titles:
    text = title.get('aria-label')
    if text is None:
        continue
    print(text)
    elements = text.split(' ')
    num_views = elements[-2]                       # second-to-last token is the view count
    re_join = ' '.join(elements[:-2]).split('by')  # split the title off from the rest
    print(re_join)
    if len(re_join) < 2:
        continue  # no "by" in the label; skip this element
    title_text = re_join[0]
    when_uploaded = re_join[1]  # channel, upload date, and video length
    body = {
        "text": when_uploaded
    }
    response = requests.post(ner_url, headers=headers, json=body)
    # {"ner":[["DATE","2 weeks ago"],["TIME","1 minute, 12 seconds"]]}
    x = re.findall(r"\"([0-9].*?)\"", response.text, re.DOTALL)
    try:
        y = x[0]  # first quoted string starting with a digit: the DATE entity
    except IndexError:
        continue

    # print(f"{title_text}: {num_views}\nUploaded on: {when_uploaded}")
    title_dict[title_text] = (y, num_views)
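To see why the regular expression works, here it is applied to a response body of the shape shown in the comment above (this is a hard-coded example string, not a live API response):

```python
import re

# Example response text matching the shape of the NER endpoint's output
response_text = '{"ner":[["DATE","2 weeks ago"],["TIME","1 minute, 12 seconds"]]}'

# Capture every quoted string that begins with a digit; the first one
# should be the DATE entity (this breaks if a channel name leads with a digit)
x = re.findall(r"\"([0-9].*?)\"", response_text, re.DOTALL)
print(x)  # ['2 weeks ago', '1 minute, 12 seconds']
```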

Now that we have the titles, their views, and when they were uploaded, we can calculate the average views per day that a video gets. To do this, we’ll use the dateparser library we installed earlier to turn the relative date into an absolute datetime, then use the datetime library to get the current date. We’ll take the difference between them and convert that to a number of days. To avoid a divide-by-zero case, we’ll convert 0 days into 1 day. Then we’ll parse the number of views by stripping the commas and converting the string into an int so we can perform division on it. Finally, we write the results to a JSON file so we can keep track of them for later.

title_to_avg_daily_views = {}
for title in title_dict:
    print(title)
    # dateparser returns None (rather than raising) when it can't parse a date
    dt_object_then = dateparser.parse(title_dict[title][0])
    if dt_object_then is None:
        continue
    print(dt_object_then)
    days_since = (datetime.datetime.now() - dt_object_then).days
    if days_since == 0:
        days_since = 1  # avoid dividing by zero for videos uploaded today
    avg_views_per_day = int(title_dict[title][1].replace(',', '')) / days_since
    print(avg_views_per_day)
    title_to_avg_daily_views[title] = avg_views_per_day
 
json_dict = json.dumps(title_to_avg_daily_views, indent=4)
with open("local_titles_and_views.json", "w") as f:
    f.write(json_dict)
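The arithmetic in that loop boils down to a few lines; here it is in isolation with made-up numbers:

```python
# Made-up values for illustration
raw_views = "1,234,567"   # view count as scraped, commas included
days_since = 14           # days since upload, from dateparser + datetime

# Strip commas, convert to int, then divide by the age of the video in days
avg_views_per_day = int(raw_views.replace(',', '')) / days_since
print(round(avg_views_per_day, 2))  # 88183.36
```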

The final JSON file should look something like this:
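The screenshot from the original post isn’t reproduced here, but the file will map video titles to average daily views, along the lines of this made-up sample:

```json
{
    "Learn Python in 20 Minutes ": 88183.36,
    "Some Other Front Page Video ": 412095.5
}
```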

That’s it! Now we know how many views per day each of the videos on the front page of YouTube gets. In our next articles, we’ll be exploring the most common phrases among these videos and whether or not more polarizing YouTube videos get more views per day!

Yujian Tang
