Creating, updating, and interacting with files is an integral part of data pipelines. To work with files programmatically, we first have to be able to list them. Python's os module offers three ways to list the contents of a directory. In this post we’ll cover how to use the os module and three of its functions to list directories, subdirectories, and files in Python.
In this post we will:
- Get all File and Subdirectory Names
- Iterate Through Each Entry in the Directory
- List directories, subdirectories, and files with Python
Get all File and Subdirectory Names
The first way to list all the files and subdirectory names in a directory is the
os.listdir() function. This function returns a list of all the names of the entries in the directory as strings. This is the most basic way to list the subdirectories and file names in a directory.
This is the method we used to get all the filenames when combining files that start with the same word.
import os

root = "."
for obj in os.listdir(root):
    print(obj)
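Because os.listdir() returns bare names rather than paths, a common follow-up is to join each name back onto the directory before inspecting it. The sketch below shows one way to do that while keeping only files that start with a given word. The function name files_with_prefix and the "report" prefix are hypothetical, chosen only for illustration.

```python
import os


def files_with_prefix(root, prefix):
    """Return sorted full paths of regular files in root whose names start with prefix."""
    matches = []
    for name in os.listdir(root):
        # listdir yields bare names, so join each one to root to get a usable path
        full_path = os.path.join(root, name)
        if name.startswith(prefix) and os.path.isfile(full_path):
            matches.append(full_path)
    return sorted(matches)


# Hypothetical usage: files in the current directory that start with "report"
for path in files_with_prefix(".", "report"):
    print(path)
```

The results are sorted because os.listdir() makes no promise about the order of the names it returns.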
Iterate Through Each Entry in the Directory
A second way to get every entry in a directory is through the os.scandir() function. This function doesn’t return a list of strings, but rather an iterator of DirEntry objects. This method is more efficient than os.listdir() when we need more than the names of the entries. A DirEntry object contains not only the name and path of the entry, but also whether the entry is a file or subdirectory. It also tells us whether the entry is a symlink. The methods of a DirEntry object can make operating system calls, so we could raise an OSError while working with the results from an os.scandir() call.
import os

root = "."
for obj in os.scandir(root):
    print(obj)
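Printing a DirEntry only shows its representation, so it can help to see the attributes in use. The following is a minimal sketch that labels each entry as a file, directory, or symlink. The function name describe_entries and the label strings are assumptions made for this example.

```python
import os


def describe_entries(root):
    """Return sorted (name, kind) pairs for each entry directly inside root."""
    described = []
    # Using scandir as a context manager closes the iterator when we are done
    with os.scandir(root) as entries:
        for entry in entries:
            # Check for symlinks first, since is_dir() and is_file() follow them by default
            if entry.is_symlink():
                kind = "symlink"
            elif entry.is_dir():
                kind = "directory"
            elif entry.is_file():
                kind = "file"
            else:
                kind = "other"
            described.append((entry.name, kind))
    return sorted(described)


for name, kind in describe_entries("."):
    print(f"{kind:10} {name}")
```

If a directory might contain entries we are not allowed to read, wrapping the loop body in a try/except OSError block is one way to skip them.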
List Directories, Subdirectories, and Files with Python
The os.scandir() and os.listdir() functions are great for getting the entries in one directory, but what if you need the subdirectory entries as well? This is where the third os library function that can iterate through directories comes in: os.walk(). The os.walk() function returns a generator. Each item in the generator is a tuple of size three. The first entry in the tuple is the path of the directory being visited, the second is the list of its subdirectories, and the third is the list of its files. The walk function doesn’t just look in the starting directory; it also recursively walks through every subdirectory.
We can print all the files, including the ones nested in subdirectories, in a directory using the
os.walk() function. All we have to do is loop through all the filenames in the list of files and print out the concatenation of the current path and the filename.
import os

root = "."
for path, subdirs, files in os.walk(root):
    for filename in files:
        print(os.path.join(path, filename))
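The same loop can collect paths instead of printing them. As a rough sketch, the function below gathers every file under a directory whose name ends with a given extension. The name find_by_extension and the ".py" extension are hypothetical choices for illustration.

```python
import os


def find_by_extension(root, extension):
    """Return sorted paths of files anywhere under root whose names end with extension."""
    matches = []
    for path, subdirs, files in os.walk(root):
        for filename in files:
            if filename.endswith(extension):
                # path is the directory currently being visited, so join it with the name
                matches.append(os.path.join(path, filename))
    return sorted(matches)


# Hypothetical usage: every Python file under the current directory
for match in find_by_extension(".", ".py"):
    print(match)
```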
Summary of Ways to List Directories, Subdirectories, and Files with Python
In this post we learned about three ways to list the files in a directory. The first two methods list the files and subdirectories in the current directory, and the last method goes into all the subdirectories as well.
The `listdir` and `scandir` functions differ in the type of iterables they return and the metadata attached to the objects in the iterable: `listdir` returns a list of strings, while `scandir` returns an iterator of DirEntry objects. The `walk` function, on the other hand, returns a generator that yields tuples instead of objects or strings.
The `listdir`, `scandir`, and `walk` functions are the three os functions we used to list directory contents. The `listdir` function is best used when we just need file names in the current directory. If we also need the entry types and more metadata, then `scandir` is a better option. Finally, if we need access to the subdirectory files as well, we should use the `walk` function.
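To make the differences in return types concrete, here is a small sketch that checks what each of the three functions hands back for the same directory. The function name return_kinds and the dictionary keys are assumptions made for this example.

```python
import inspect
import os


def return_kinds(root):
    """Report what kind of object each of the three functions returns for root."""
    names = os.listdir(root)
    walker = os.walk(root)
    with os.scandir(root) as entries:
        first = next(entries, None)  # None if the directory is empty
        kinds = {
            # listdir builds the whole list of names up front
            "listdir_is_list": isinstance(names, list),
            # scandir returns an iterator, so iter() gives back the same object
            "scandir_is_iterator": iter(entries) is entries,
            "scandir_yields_direntry": first is None or isinstance(first, os.DirEntry),
            # walk is lazy: nothing is read until we start looping over it
            "walk_is_generator": inspect.isgenerator(walker),
        }
    return kinds


print(return_kinds("."))
```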