Combine Files that Start with the Same Words with Python

Recently I had to combine a bunch of files that started with the same prefix. I did an NLP analysis of a YouTube series that required splitting each episode's analysis across several API calls. After getting the partial analysis results, I wanted to combine them into one file per episode. Here's how to use Python to combine all the files in a directory that start with the same prefix.

In this post, we’ll cover:

  • Getting all the Files in a Directory with Python
    • Collecting the Files that Start with the Same Prefix
    • Sorting Files in Order
    • Full Code for Collecting the Files in Order
  • Opening the Files and Combining the Contents with Python
  • A Summary of Combining Files that Start with the Same Words with Python

Getting all the Files in a Directory with Python

Before we can collect the files that start with the same word, we need to get all the files in a directory. We'll use the os library for this. It's a built-in Python library, so we don't need to install anything.
In this example, we'll create a function that gets the names of all the files in one directory, ./jp/. The function takes no parameters; the directory is hard-coded. It starts by creating an empty list to hold the filenames. Next, it loops through every file in the directory using os.listdir, slices off the last five characters of each name (presumably a five-character extension such as .json), and appends what's left to the list. Those shortened names are the prefixes we'll match against later. You could change this function to get the files in any folder by passing in one parameter, the name of the directory.

import os
 
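# get the filename prefixes from the ./jp/ folder
def get_filenames():
    filenames = []
    for filename in os.listdir("./jp/"):
        # drop the last five characters (assumed to be the extension) to keep the prefix
        filenames.append(filename[:-5])
    return filenames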
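The paragraph above suggests passing the directory in as a parameter. Here's a minimal sketch of that variant. It assumes the five characters being sliced off are a file extension, so it uses os.path.splitext to strip whatever extension each file has instead of a fixed-length slice. It also skips subdirectories and sorts the result so the order doesn't depend on os.listdir.

```python
import os


def get_filenames(directory="./jp/"):
    """Return the name of each file in `directory` with its extension removed.

    Sketch of a parameterized get_filenames(); assumes every file name
    ends in an extension that marks the end of the prefix.
    """
    prefixes = []
    for filename in os.listdir(directory):
        # skip subdirectories; only regular files carry episode names
        if os.path.isfile(os.path.join(directory, filename)):
            root, _ext = os.path.splitext(filename)  # "ep_a.json" -> ("ep_a", ".json")
            prefixes.append(root)
    # os.listdir returns names in arbitrary order, so sort for repeatable output
    return sorted(prefixes)
```

Calling get_filenames() with no argument still reads ./jp/, so existing callers keep working.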

Collecting the Files that Start with the Same Prefix

Now that we can get all the files in a directory (using os.listdir), we can collect the files that start with a given prefix. We'll use the get_filenames function we created above to get the list of prefixes. Then we'll create an empty list, part_filenames, that will hold one list per prefix: the files that start with that prefix.

Next, we'll loop through each prefix. Inside the loop we'll initialize an empty list to hold the filenames that start with the current prefix. Then we'll loop through each file in the target directory, ./ners, and check whether its name starts with the prefix followed by an underscore. If it does, we'll append it to the list.

import os
from get_filenames import get_filenames
 
# collect all the separate mcps, ners, polarities, and summaries
filenames = get_filenames()
   
part_filenames = []
for filename in filenames:
    parts = []
    for part_name in os.listdir("./ners"):
        if part_name.startswith(filename+"_"):
            parts.append(part_name)
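As an alternative to looping over os.listdir and calling startswith, the standard glob module can do the prefix matching for us. This is a sketch of a swapped-in technique, not the approach used in this post; parts_for_prefix is a hypothetical helper name. glob.escape keeps characters such as "[" in an episode name from being read as wildcard syntax.

```python
import glob
import os


def parts_for_prefix(prefix, directory="./ners"):
    """Return the names of files in `directory` that start with prefix + "_"."""
    # escape both parts so literal brackets, "*" or "?" are not treated as wildcards
    pattern = os.path.join(glob.escape(directory), glob.escape(prefix) + "_*")
    # glob returns full paths; keep only the base names, like os.listdir does
    return [os.path.basename(path) for path in glob.glob(pattern)]
```

Like os.listdir, glob.glob returns matches in arbitrary order, so the results still need sorting.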

Sorting Files in Order

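Now that we have a list of all the filenames that begin with the desired prefix, we should sort them. The code below still belongs inside the outer for loop, but runs after the inner loop over the directory has finished. os.listdir returns names in arbitrary order, so we can't assume they're already sorted. We'll sort with a key of (length, name): every part for one prefix shares the same prefix and extension, so a shorter name means a part number with fewer digits, and names of equal length compare alphabetically, which for equal-length numbers matches numeric order.

A plain alphabetical sort would put files that end in _11 before files that end in _2, and sorting by length alone would leave same-length names (such as _1 and _2) in whatever order os.listdir returned them. After sorting all the files for one prefix, we'll append that list to the larger list.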

    # sort by length, then alphabetically, so _2 comes before _10 and _11
    parts = sorted(parts, key=lambda name: (len(name), name))
    part_filenames.append(parts)
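If part files might not share an extension, or you'd rather sort on the part number itself, a "natural sort" key that extracts the number works too. This is a sketch under the assumption that parts are named <prefix>_<number> with an optional extension; part_number is a hypothetical helper, not something from the original code.

```python
import re


def part_number(name):
    """Return the integer after the last underscore, e.g. "ep_11.txt" -> 11.

    Names without a trailing number get -1, so they sort first.
    """
    match = re.search(r"_(\d+)(?:\.[^.]*)?$", name)
    return int(match.group(1)) if match else -1


names = ["ep_2.txt", "ep_11.txt", "ep_1.txt"]
print(sorted(names))                   # ['ep_1.txt', 'ep_11.txt', 'ep_2.txt']
print(sorted(names, key=part_number))  # ['ep_1.txt', 'ep_2.txt', 'ep_11.txt']
```

The first line shows the alphabetical ordering problem described above; the second shows the intended part order.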

Full Code for Collecting the Files in Order

This is the full code for collecting all the files for a set of prefixes and appending them to a list of lists in sorted order.

import os
from get_filenames import get_filenames
 
# collect all the separate mcps, ners, polarities, and summaries
filenames = get_filenames()
   
part_filenames = []
for filename in filenames:
    parts = []
    for part_name in os.listdir("./ners"):
        if part_name.startswith(filename+"_"):
            parts.append(part_name)
    # sort by length, then alphabetically, so _2 comes before _10 and _11
    parts = sorted(parts, key=lambda name: (len(name), name))
    part_filenames.append(parts)
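One design alternative is to key the results by prefix in a dictionary instead of keeping part_filenames as a list that must line up, index for index, with filenames. The sketch below (collect_parts is a hypothetical name) does that, reads the directory once rather than once per prefix, and uses the same (length, name) sort as above.

```python
import os


def collect_parts(prefixes, directory="./ners"):
    """Map each prefix to its part files, ordered by part number."""
    all_files = os.listdir(directory)  # list the directory once, not once per prefix
    parts_by_prefix = {}
    for prefix in prefixes:
        parts = [name for name in all_files if name.startswith(prefix + "_")]
        # equal-length names compare alphabetically, so _2 < _9 < _10 < _11
        parts_by_prefix[prefix] = sorted(parts, key=lambda name: (len(name), name))
    return parts_by_prefix
```

With this structure, the combining step can loop over parts_by_prefix.items() and never needs enumerate.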

Opening the Files and Combining the Contents with Python

We've written code to get all the files in a folder and to collect the files that correspond to a specific prefix. Now we'll finish up by opening the files and combining their contents with Python.

We'll start by looping through the list of filename prefixes with enumerate, so we also have each prefix's index. Inside the loop, we first create a list to hold the string contents of each file we open. Next, we loop through the part filenames stored at that same index in part_filenames. For each of these files, we read its contents and append them to the list.

Once we've read every part file for the prefix, we write the combined contents to a new file named after the prefix. We open that file for writing, loop through the entries in our list, and write each string to it.

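# combine the parts for each prefix into a single file
for i, filename in enumerate(filenames):
    # ners: read every part for this prefix, in sorted order
    ners = []
    for part_filename in part_filenames[i]:
        with open(f"./ners/{part_filename}", "r") as f:
            ners.append(f.read())
    # write the combined contents to ./ners/<prefix>.txt
    with open(f"./ners/{filename}.txt", "w") as f:
        for entry in ners:
            f.write(entry)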
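For very large part files, holding every part in a list before writing can use a lot of memory. Here's a sketch that streams each part straight into the output with shutil.copyfileobj instead. combine_parts is a hypothetical helper; it opens files in binary mode so bytes are copied exactly, and, like the loop above, it adds no separator between parts.

```python
import os
import shutil


def combine_parts(directory, prefix, parts):
    """Concatenate `parts` (filenames inside `directory`) into <prefix>.txt."""
    out_path = os.path.join(directory, f"{prefix}.txt")
    with open(out_path, "wb") as out:
        for part in parts:
            with open(os.path.join(directory, part), "rb") as src:
                # copy in chunks rather than reading the whole part into memory
                shutil.copyfileobj(src, out)
    return out_path
```

Because the output is named <prefix>.txt rather than <prefix>_<n>.txt, rerunning the collection step won't pick it up as another part.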

A Summary of Combining Files that Start with the Same Words with Python

In this post, we learned how to combine files that start with the same word in Python. First, we used os.listdir() to get all the filenames in a directory. Then we looped through a list of files, collected the ones that share a prefix, and sorted them into part order. Finally, we looped through those files, opened them, and combined their contents into one file.


Yujian Tang
