
Anagram Python Technical Interview Question Solutions

Anagrams are strings made up of exactly the same letters, each used the same number of times, in a different order. There are many ways to use anagrams in technical interview questions. The one I got in my most recent round of technical interviews was along the lines of “remove all words in a list that are anagrams of words that have come before”.

The key building block for any anagram challenge is a function that checks whether two strings are anagrams, so we’ll start by writing that check in Python.

In this post, we’ll cover:

  • Anagram check Python code
  • Removing all anagrams from a list with Python
  • Group anagrams with Python
  • Summary of Anagram Python Technical Interview Questions and Solutions

Check if two strings are anagrams

This is the core functionality of working with anagrams in Python. We need to be able to check if two strings are anagrams of each other. Remember that anagrams are words that have the same letters in different positions. For example, “slick” and “licks”.

So how do we check if two strings are anagrams? We need to make sure that they both contain the same letters and the same number of each letter. The easiest way to write this check in Python is to use the built-in `sorted` function.

We will create an `are_anagrams` function that takes two parameters, `a` and `b`, the two strings we want to compare. All it does is turn each string into a sorted list of its lowercase characters by calling `sorted` on the result of the string’s `lower()` method. Note that `sorted` returns a list, not a string! Spaces count as characters too, which is why “not slick” and “slick” are not anagrams in the example below.

def are_anagrams(a: str, b: str):
   a = sorted(a.lower())
   b = sorted(b.lower())
   return a == b
 
print(are_anagrams("abc", "cba"))
print(are_anagrams("GHHZ", "ZHgh"))
print(are_anagrams("not slick", "slick"))
print(are_anagrams("Yujian is Awesome", "Awesome Yujian is"))

Note: Why do we use `==` and not `is`? Because `==` compares the values of the two objects, while `is` checks whether both names refer to the same object. Running the code above prints `True`, `True`, `False`, and `True`, one per line.
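
To make the `==` versus `is` distinction concrete, here is a small sketch (the variable names are only for illustration). Each call to `sorted` builds a new list, so the two results are equal in value but are not the same object.

```python
# Each call to sorted() returns a brand-new list object.
first = sorted("slick")
second = sorted("licks")

same_value = first == second   # compares the contents of the two lists
same_object = first is second  # compares object identity

print(same_value)   # True
print(same_object)  # False
```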

Remove all anagrams from a list of strings with Python

Now that we’ve created a basic Python anagram check function, let’s see how we can use it in a broader context. Let’s explore one of the technical interview problems I recently encountered. 

You’re given a list of strings. Your goal is to find a way to return the number of distinct arrangements of letters in the list. In other words, your goal is to find the number of strings left once you remove all duplicate anagrams.

Let’s start by creating a function called `remove_duplicates`. Our function will take one parameter, the list of strings. Let’s create two lists in our function. One will hold the unique string configurations we see and the second will hold the “cleaned” version of the list of strings without repeated anagrams.

Next, we’ll loop through each entry in the entries list. If the sorted version of the lower case of the string is already in the list of uniques, then we should just skip this string. Otherwise, we’ll append the sorted, lowercase version of the string to uniques and append the string to the cleaned list. 

I have printed out the cleaned list just so we can see it. It should be the first instance of each distinct string configuration we have. In the example below, the length should be 3.

def remove_duplicates(entries: list[str]):
   uniques = []
   cleaned = []
   for entry in entries:
       if sorted(entry.lower()) in uniques:
           continue
       uniques.append(sorted(entry.lower()))
       cleaned.append(entry)
   print(cleaned)
   return len(cleaned)
  
 
entries = ["abc", "bca", "GhZH", "gzHH", "Yujian is Awesome", "Awesome Yujian Is"]
print(remove_duplicates(entries))

If we run this script, it prints the cleaned list, `['abc', 'GhZH', 'Yujian is Awesome']`, followed by its length, `3`.
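
Checking `in uniques` against a list scans the whole list on every lookup. As a variation (not the version above), this sketch stores each sorted string as a key in a set, where membership checks are typically constant time. The name `remove_duplicates_with_set` is made up for this example, and it returns the cleaned list instead of printing it.

```python
def remove_duplicates_with_set(entries: list[str]) -> list[str]:
    seen = set()   # sorted-letter keys we have already encountered
    cleaned = []   # first occurrence of each distinct arrangement
    for entry in entries:
        # Lists are not hashable, so join the sorted characters into a string key.
        key = "".join(sorted(entry.lower()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(entry)
    return cleaned


entries = ["abc", "bca", "GhZH", "gzHH", "Yujian is Awesome", "Awesome Yujian Is"]
cleaned = remove_duplicates_with_set(entries)
print(cleaned)
print(len(cleaned))
```

The trade-off is one extra `join` per entry in exchange for faster lookups, which mostly matters once the list gets long.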

Python anagram grouping problem

Another technical interview question that I’ve seen is grouping anagrams together. Just like the question above, you are provided with a list of strings. This time the problem statement is to find all the words that are anagrams of each other and group them together. In our example, we will return a dictionary of the words that are anagrams grouped together.

Our `group_anagrams` function will take one parameter, a list of strings. The first thing we’ll do is initialize an empty dictionary of all the grouped strings. Then, we’ll loop through each entry in the list and check if the sorted letters of an entry are in the dictionary. If the string is not, then we add an entry for it and initialize a list containing the current string. Otherwise, we append the current string to an existing entry. Finally, we return the dictionary.

def group_anagrams(entries: list[str]):
   grouped = {}
   for entry in entries:
       _sorted = "".join(sorted(entry.lower()))
       if _sorted not in grouped:
           grouped[_sorted] = [entry]
       else:
           grouped[_sorted].append(entry)
   return grouped
 
entries = ["abc", "bca", "GhZH", "gzHH", "Yujian is Awesome", "Awesome Yujian Is"]
print(group_anagrams(entries))

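Running the program above prints a dictionary with three groups: the key `'abc'` holds “abc” and “bca”, the key `'ghhz'` holds “GhZH” and “gzHH”, and the third key holds the two “Yujian” strings. Each key is the sorted, lowercase characters of its group, and the groups appear in the order their first member was seen. Notice that the third key starts with two spaces! Spaces are characters too, so they are kept, and they sort ahead of the letters. Joining with an empty string only glues the sorted characters back into one string.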
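
The membership check can also be handed off to `collections.defaultdict` from the standard library, which creates the empty list the first time a key is seen. This is a sketch of that alternative; `group_anagrams_dd` is a hypothetical name, not part of the code above.

```python
from collections import defaultdict


def group_anagrams_dd(entries: list[str]) -> dict[str, list[str]]:
    grouped = defaultdict(list)  # missing keys start out as an empty list
    for entry in entries:
        key = "".join(sorted(entry.lower()))
        grouped[key].append(entry)
    return dict(grouped)  # convert back to a plain dict for the caller


groups = group_anagrams_dd(["abc", "bca", "GhZH", "gzHH"])
print(groups)
```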

Summary of Working with Python Anagrams

Classifying strings as anagrams through Python is an interesting topic. It’s quite easy to do with Python’s built-in `sorted` function, which turns a string into a list of its characters in sorted order. All we have to do is convert the string to lowercase and call the `sorted` function on it.

Other than just comparing strings to see if they are anagrams, we also covered two technical interview questions around anagrams. We covered how to remove all the anagrams from a list so that it only contains distinct configurations of letters and how to group anagrams together with Python.

Further Reading

Learn More

To learn more, feel free to reach out to me @yujian_tang on Twitter, connect with me on LinkedIn, and join our Discord. Remember to follow the blog to stay updated with cool Python projects and ways to level up your Software and Python skills! If you liked this article, please Tweet it, share it on LinkedIn, or tell your friends!

I run this site to help you and others like you find cool projects and practice software skills. If this is helpful for you and you enjoy your ad free site, please help fund this site by donating below! If you can’t donate right now, please think of us next time.

Yujian Tang

I started my professional software career interning for IBM in high school after winning ACSL two years in a row. I got into AI/ML in college where I published a first author paper to IEEE Big Data. After college I worked on the AutoML infrastructure at Amazon before leaving to work in startups. I believe I create the highest quality software content so that’s what I’m doing now. Drop a comment to let me know!


How to Make a 4×4 Magic Square in Python

Magic squares are one of the oldest games in recreational mathematics. The smallest nontrivial magic squares are three by three, but they can be made in any larger size with the right formulas. Odd and even ordered magic squares use different formulas, but both can be generated easily. Programming has made it even easier to make magic squares, as we are about to see. Let’s make a 4×4 magic square in Python.

In this post, we’ll learn about:

  • What is a Magic Square?
  • How Do You Create a 4×4 Magic Square?
  • The Five Binary Encoding Patterns for 4×4 Magic Squares
  • Python Code to Make a 4 by 4 Magic Square
    • Creating the Magic Square with Lists in Python
    • Full Code for Python 4×4 Magic Square
  • Testing Our Python Magic Square
  • Summary of Creating a 4×4 Magic Square in Python

What is a Magic Square?

3×3 Magic Square from Wikipedia

A magic square is a square that is filled with numbers, usually positive integers, such that the rows, diagonals, and columns of the square all add up to the same number.
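
That definition translates directly into a check. The helper below is a sketch, with `is_magic_square` being a name chosen for this example. It only tests the equal-sums property and does not require the entries to be distinct.

```python
def is_magic_square(square: list[list[int]]) -> bool:
    n = len(square)
    # The grid must be non-empty and square.
    if n == 0 or any(len(row) != n for row in square):
        return False
    target = sum(square[0])
    rows_ok = all(sum(row) == target for row in square)
    cols_ok = all(sum(square[i][j] for i in range(n)) == target for j in range(n))
    main_diag = sum(square[i][i] for i in range(n))
    anti_diag = sum(square[i][n - 1 - i] for i in range(n))
    return rows_ok and cols_ok and main_diag == target and anti_diag == target


lo_shu = [[2, 7, 6], [9, 5, 1], [4, 3, 8]]  # a classic 3x3 magic square
print(is_magic_square(lo_shu))            # True
print(is_magic_square([[1, 2], [3, 4]]))  # False
```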

How Do You Create a 4×4 Magic Square?

Odd and even order magic squares are created with different formulas. The 4×4 magic square in this post is created using five binary encoded patterns. A binary encoded pattern is an overlay of the square in which each cell contains either a 0 or a 1. Let’s take a look at what the patterns for the 4×4 magic square are.

The Five Binary Encoding Patterns for 4×4 Magic Squares

These are the five binary encoding patterns that we add up to make a 4×4 magic square.

5 magic patterns for generating a 4×4 magic square

If you layer these patterns you get the square below. This base 4×4 magic square adds up to 12 in every row, column, and diagonal.

a basic 4×4 magic square
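
To see the layering in code, this sketch writes the five patterns out as nested lists (copied from the pattern comments in the function later in this post) and adds them cell by cell. The names `PATTERNS` and `layer` are invented for the example.

```python
# The five 0/1 patterns, written out row by row.
PATTERNS = [
    [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 0, 0]],  # pattern 1
    [[0, 1, 1, 0], [1, 0, 0, 1], [0, 1, 1, 0], [1, 0, 0, 1]],  # pattern 2
    [[0, 1, 0, 1], [0, 1, 0, 1], [1, 0, 1, 0], [1, 0, 1, 0]],  # pattern 3
    [[0, 1, 0, 1], [1, 0, 1, 0], [1, 0, 1, 0], [0, 1, 0, 1]],  # pattern 4
    [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]],  # pattern 5
]


def layer(patterns):
    # Add the patterns together cell by cell.
    return [[sum(p[i][j] for p in patterns) for j in range(4)] for i in range(4)]


base = layer(PATTERNS)
for row in base:
    print(row, "row sum:", sum(row))
```

Every row sum printed here is 12. The unscaled square repeats values, which is why the patterns are scaled by different parameters in the next section.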

Python Code to Make a 4 by 4 Magic Square

Now we know what a magic square is and what the basic 4×4 magic square looks like. Let’s use Python to create some 4×4 magic squares. We will use nested lists to represent our magic square and its binary encoded patterns. We don’t need any external Python libraries to create this function.

Creating the Magic Square with Lists in Python

The first thing we’re going to do is create a function to generate the magic square. Our function takes a single parameter: a list of five values, one for each binary encoding pattern, that scale the patterns before they are added together. For example, a list of 8, 4, 2, 1, 1 would yield the patterns below.

8, 4, 2, 1, 1 scaled 4×4 magic square generator patterns

These patterns will yield the following magic square, in which every row, column, and diagonal adds up to 34.

8, 4, 2, 1, 1 params 4 x 4 magic square
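
The value 34 can be predicted from the parameters alone. In each of the first four patterns every row, column, and diagonal contains exactly two 1s, and in the fifth pattern it contains four, so the magic sum works out to twice the sum of the first four parameters plus four times the fifth. A small sketch (`magic_sum` is a hypothetical helper, not part of the generator):

```python
def magic_sum(params: list[int]) -> int:
    # Patterns 1 to 4 contribute two scaled cells per line, pattern 5 contributes four.
    first_four = sum(params[:4])
    return 2 * first_four + 4 * params[4]


print(magic_sum([8, 4, 2, 1, 1]))  # 34
print(magic_sum([1, 1, 1, 1, 1]))  # 12
```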

The first thing we’re going to do in our function is create a placeholder list with one slot for each of the five patterns.

def create_square(params: list):
   patterns = [[None] for _ in range(5)]

Now let’s get started on creating the individual patterns.

First Magic Square Pattern in Python

Let’s implement the first pattern. Remember that we are implementing the square pattern as a nested list. This means that each row of the square should be represented as a list. There are only two different row patterns in this pattern, so we’ll create two lists that represent each pattern.

We use the first element of the parameter list to scale the binary pattern and create the square pattern using list comprehension on the two list patterns we created earlier.

   '''
   pattern 1
   0011
   1100
   0011
   1100
   '''
   pattern_1_1 = [0, 0, 1, 1]
   pattern_1_2 = [1, 1, 0, 0]
   patterns[0] = [[pattern*params[0] for pattern in pattern_1_1],
               [pattern*params[0] for pattern in pattern_1_2],
               [pattern*params[0] for pattern in pattern_1_1],
               [pattern*params[0] for pattern in pattern_1_2]]

Second Magic Square Pattern in Python

Our second magic square pattern will also need two different row patterns. Just like the first square pattern, we will create the two row patterns with lists. Then, we’ll create the square pattern using list comprehension to scale the patterns by the second entry in the parameters list.

   '''
   pattern 2
   0110
   1001
   0110
   1001
   '''
   pattern_2_1 = [0, 1, 1, 0]
   pattern_2_2 = [1, 0, 0, 1]
   patterns[1] = [[pattern*params[1] for pattern in pattern_2_1],
               [pattern*params[1] for pattern in pattern_2_2],
               [pattern*params[1] for pattern in pattern_2_1],
               [pattern*params[1] for pattern in pattern_2_2]]

Third Magic Square Pattern in Python

Our third magic square pattern is built in the same way as the first two. First, we’ll create another two list representations of the row pattern representations. Next, we use the third entry from the parameter list to scale these row patterns into the third magic square pattern.

   '''
   pattern 3
   0101
   0101
   1010
   1010
   '''
   pattern_3_1 = [0, 1, 0, 1]
   pattern_3_2 = [1, 0, 1, 0]
   patterns[2] = [[pattern*params[2] for pattern in pattern_3_1],
               [pattern*params[2] for pattern in pattern_3_1],
               [pattern*params[2] for pattern in pattern_3_2],
               [pattern*params[2] for pattern in pattern_3_2]]

Fourth Magic Square Pattern in Python

The process to create the first three magic square patterns has been almost the same. For the fourth one, we don’t need to recreate row patterns. We are simply reusing the row patterns that the third magic square uses in a different order and scaling by the fourth entry of the parameters instead of the third.

   '''
   pattern 4
   0101
   1010
   1010
   0101
   '''
   patterns[3] = [[pattern*params[3] for pattern in pattern_3_1],
               [pattern*params[3] for pattern in pattern_3_2],
               [pattern*params[3] for pattern in pattern_3_2],
               [pattern*params[3] for pattern in pattern_3_1]]

Fifth Magic Square Pattern in Python

This one is even easier than the fourth pattern. All we do here is create a matrix of ones and then scale by the fifth and last entry of the parameters list.

   '''
   pattern5
   1111
   1111
   1111
   1111
   '''
   patterns[4] = [[params[4] for _ in range(4)] for _ in range(4)]

Layering the Patterns into a Magic Square

Now that we have all five patterns created, we need to layer them together to create the magic square. To start off, we’ll create a magic square of all zeros. Then we’ll loop through the list of patterns and add each entry of each pattern to the corresponding entry in the magic square. Finally, we’ll return the square we created.

   square = [[0 for _ in range(4)] for _ in range(4)]
   for pattern in patterns:
       for i in range(4):
           for j in range(4):
               square[i][j] += pattern[i][j]
   return square

Full Code for Python 4×4 Magic Square Generator Function

That concludes the function to create a 4×4 magic square generator in Python. This is the full code for the create_square function.

def create_square(params: list):
   patterns = [[None] for _ in range(5)]
   '''
   pattern 1
   0011
   1100
   0011
   1100
   '''
   pattern_1_1 = [0, 0, 1, 1]
   pattern_1_2 = [1, 1, 0, 0]
   patterns[0] = [[pattern*params[0] for pattern in pattern_1_1],
               [pattern*params[0] for pattern in pattern_1_2],
               [pattern*params[0] for pattern in pattern_1_1],
               [pattern*params[0] for pattern in pattern_1_2]]
   '''
   pattern 2
   0110
   1001
   0110
   1001
   '''
   pattern_2_1 = [0, 1, 1, 0]
   pattern_2_2 = [1, 0, 0, 1]
   patterns[1] = [[pattern*params[1] for pattern in pattern_2_1],
               [pattern*params[1] for pattern in pattern_2_2],
               [pattern*params[1] for pattern in pattern_2_1],
               [pattern*params[1] for pattern in pattern_2_2]]
   '''
   pattern 3
   0101
   0101
   1010
   1010
   '''
   pattern_3_1 = [0, 1, 0, 1]
   pattern_3_2 = [1, 0, 1, 0]
   patterns[2] = [[pattern*params[2] for pattern in pattern_3_1],
               [pattern*params[2] for pattern in pattern_3_1],
               [pattern*params[2] for pattern in pattern_3_2],
               [pattern*params[2] for pattern in pattern_3_2]]
   '''
   pattern 4
   0101
   1010
   1010
   0101
   '''
   patterns[3] = [[pattern*params[3] for pattern in pattern_3_1],
               [pattern*params[3] for pattern in pattern_3_2],
               [pattern*params[3] for pattern in pattern_3_2],
               [pattern*params[3] for pattern in pattern_3_1]]
   '''
   pattern5
   1111
   1111
   1111
   1111
   '''
   patterns[4] = [[params[4] for _ in range(4)] for _ in range(4)]
   square = [[0 for _ in range(4)] for _ in range(4)]
   for pattern in patterns:
       for i in range(4):
           for j in range(4):
               square[i][j] += pattern[i][j]
   return square

Testing Our Python 4×4 Magic Square Generator

Now that we’ve made a magic square generator, let’s test our function. We will pass the function the parameter list [8, 4, 2, 1, 1], which should result in the magic square we showed right before the code.

sq = create_square([8, 4, 2, 1, 1])
for row in sq:
   print(row)

The printout from running this file with Python should show the four rows [1, 8, 13, 12], [14, 11, 2, 7], [4, 5, 16, 9], and [15, 10, 3, 6]. This is the same 34 magic square we showed above. Feel free to play around with the entries in your parameter list to generate different magic squares.

Python generated 4×4 magic square

Summary of Creating a 4×4 Magic Square in Python

In this post we covered how to create a 4×4 magic square in Python. First, we covered some background information on magic squares. Magic squares are n by n squares filled with numbers. Each row, column, and diagonal adds to the same magic number.

Next, we covered the five binary encoding patterns involved in creating a 4×4 magic square before diving into the Python code. We created a function that takes one list of five parameters and makes a magic square. Each of the five parameters corresponds to how we scale one of the patterns. We represented the five magic square patterns involved in making a 4×4 magic square as nested lists.

To finish off our magic square creation function, we created a matrix of 0s to represent the initial square and populated the square using the five magic patterns we created. Finally, we tested our magic square creator function and saw that it generated a magic square of 34 as predicted.


Kruskal’s Algorithm in Python (with Pseudocode)

Data structures and algorithms are a cornerstone of computer science. In our journey so far, we’ve looked at basic data structures like stacks, queues, deques, linked lists, and binary trees, and at algorithms like sorting algorithms, tree algorithms, and Dijkstra’s algorithm. Now, let’s take a look at another important graph algorithm: Kruskal’s.

Kruskal’s algorithm finds a minimum spanning tree (MST) of a connected, weighted, undirected graph. In this post we’ll cover:

  • Kruskal’s Algorithm Pseudocode
  • Kruskal’s Algorithm in Python
    • Creating a Graph Object
    • Python Union Find Algorithm
    • Implementing Kruskal’s Algorithm in Python
    • Full Python Code for Kruskal’s Graph Algorithm
    • Testing Kruskal’s Algorithm in Python
  • Summary of Kruskal’s Algorithm in Python

Kruskal’s Algorithm Pseudocode

Kruskal’s algorithm uses a greedy approach to build a minimum spanning tree. Let’s take a look at the pseudocode:

  1. Sort all the edges of the graph by weight, from lowest to highest
  2. Take the lowest weight edge that has not been considered yet and add it to the MST, as long as adding the edge doesn’t create a cycle
  3. Repeat step 2 until the MST has one fewer edge than the graph has vertices, at which point every vertex has been included

Kruskal’s Algorithm in Python

Let’s go over how to implement Kruskal’s Algorithm in Python. This implementation of Kruskal’s Algorithm is going to be a function in a Graph object. We’ll create a Graph object that will hold the number of vertices in the graph as well as a list of edges that represents the graph.

Each entry in the edge list will have three values: the two vertices and the weight of the edge between them. We also need to create functions to perform the find and union pieces of the union find algorithm.

Creating a Graph Object in Python

The first thing we’ll need to do is create our Graph object. All the functions we define later in the tutorial are methods of this object. We need an __init__ function and a function to add an edge. The __init__ function takes one parameter, the number of vertices. It sets one of the object properties to the number of vertices and creates an empty edge list to represent the edges in the graph.

class Graph(object):
   def __init__(self, num_vertices):
       self.V = num_vertices
       self.graph = []
  
   def add_edge(self, u, v, w):
       self.graph.append([u, v, w])
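
As a quick illustration of what the object stores, this sketch repeats the class so it runs on its own and adds two edges (the example edges are made up). Each call appends one `[u, v, w]` entry to `self.graph`.

```python
class Graph(object):
    def __init__(self, num_vertices):
        self.V = num_vertices  # number of vertices
        self.graph = []        # list of [u, v, w] edges

    def add_edge(self, u, v, w):
        self.graph.append([u, v, w])


g = Graph(3)
g.add_edge(0, 1, 4)
g.add_edge(1, 2, 2)
print(g.V)      # 3
print(g.graph)  # [[0, 1, 4], [1, 2, 2]]
```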

Python Union Find Algorithm

Union Find (also called a disjoint-set structure) keeps track of which vertices belong to the same connected set. It consists of two separate functions, find and union. The find function needs two parameters, and the union function needs four.

The find function takes a list of parent pointers, called root in the code, and a vertex number. If the entry at the vertex’s index is the vertex itself, the vertex is the representative of its set and we return it. Otherwise, we recursively call find with the same list on the vertex’s parent, root[i], until we reach the representative.

The union function’s four parameters are the root list, a rank list that gives a rough measure of how tall the tree under each representative is, and two vertices. First, we find the representatives of the two vertices. Then, we compare their ranks. If the ranks differ, the representative with the lower rank is attached under the one with the higher rank. If they are the same, we attach the second under the first and increase the first one’s rank by one.

   # union find
   def find(self, root, i):
       if root[i] == i:
           return i
       return self.find(root, root[i])
  
   def union(self, root, rank, x, y):
       xroot = self.find(root, x)
       yroot = self.find(root, y)
       if rank[xroot] < rank[yroot]:
           root[xroot] = yroot
       elif rank[xroot] > rank[yroot]:
           root[yroot] = xroot
       else:
           root[yroot] = xroot
           rank[xroot] += 1
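
To watch the two lists change, the sketch below uses standalone versions of the same two functions (without `self`, purely so the example runs on its own) on four vertices. The early return when both vertices already share a representative is an addition for this example; the class version relies on the caller to check that first.

```python
def find(root, i):
    # Follow parent pointers until we reach a vertex that is its own parent.
    if root[i] == i:
        return i
    return find(root, root[i])


def union(root, rank, x, y):
    xroot = find(root, x)
    yroot = find(root, y)
    if xroot == yroot:
        return  # already in the same set (guard added for this sketch)
    if rank[xroot] < rank[yroot]:
        root[xroot] = yroot
    elif rank[xroot] > rank[yroot]:
        root[yroot] = xroot
    else:
        root[yroot] = xroot
        rank[xroot] += 1


root = [0, 1, 2, 3]  # every vertex starts as its own representative
rank = [0, 0, 0, 0]
union(root, rank, 0, 1)  # equal ranks: 1 goes under 0
union(root, rank, 2, 3)  # equal ranks: 3 goes under 2
union(root, rank, 1, 3)  # representatives 0 and 2 have equal ranks: 2 goes under 0
print(root)           # [0, 0, 0, 2]
print(rank)           # [2, 0, 1, 0]
print(find(root, 3))  # 0
```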

Implementing Kruskal’s Algorithm in Python

Now we have our helper functions, let’s take a look at how we can implement Kruskal’s algorithm in Python. The vertex count and the edge list are already stored on the Graph object, so this function does not need any parameters.

The first thing we’ll need to do in our Kruskal’s function is initialize the results list, which will contain the MST and will be structured the same as the edge list. We also need to initialize the iteration index and the number of edges added to the MST. Then, we’ll sort the edge list by edge weight.

Now, we’ll initialize the root list, which stores each vertex’s parent in the union find structure, and the rank list, which stores the rank of each set’s tree. To start, every vertex is its own root and has rank 0.

Next, we’ll run a while loop that runs as long as we haven’t added the expected number of edges, one fewer than the number of vertices, to the MST. In the while loop, we’ll start by extracting the two vertices and the edge weight at the current index of the sorted edge list. Then we’ll use the find function to look up the set that each vertex currently belongs to.

If the two vertices are in different sets (x is not equal to y), the edge cannot create a cycle, so we increment the number of edges in the MST, add the vertices and edge weight to the results list, and run union to merge the two sets. If they are in the same set, we skip the edge. When the MST is complete, we’ll print it out.

   # applying kruskal's
   def kruskals(self):
       # initialize an empty MST list
       result = []
       # initialize i, the iteration and e, the edges added
       i, e  = 0, 0
       # sort the graph based on edge weights
       self.graph = sorted(self.graph, key = lambda item: item[2])
       # initialize root, which stores each vertex's parent in the union find structure
       # and rank, which tracks the rank (approximate tree height) of each set
       root = []
       rank = []
       for node in range(self.V):
           root.append(node)
           rank.append(0)
 
       # while the MST is incomplete and there are edges left to consider,
       # take the next lightest edge and check it with union find
       while e < self.V - 1 and i < len(self.graph):
           u, v, w = self.graph[i]
           i = i + 1
           x = self.find(root, u)
           y = self.find(root, v)
           print(f"x, y: {x}, {y}")
           if x != y:
               e = e + 1
               result.append([u, v, w])
               self.union(root, rank, x, y)
 
       for u, v, w in result:
           print(f'{u} - {v}: {w}')

Full Kruskal’s Graph Algorithm Python Code

Here’s the full code for Kruskal’s Algorithm in Python implemented with a Graph class:

# kruskal's in Python
class Graph(object):
   def __init__(self, num_vertices):
       self.V = num_vertices
       self.graph = []
  
   def add_edge(self, u, v, w):
       self.graph.append([u, v, w])
  
   # union find
   def find(self, root, i):
       if root[i] == i:
           return i
       print(i, root[i])
       return self.find(root, root[i])
  
   def union(self, root, rank, x, y):
       print(f"root: {root}, rank: {rank}")
       xroot = self.find(root, x)
       yroot = self.find(root, y)
       if rank[xroot] < rank[yroot]:
           root[xroot] = yroot
       elif rank[xroot] > rank[yroot]:
           root[yroot] = xroot
       else:
           root[yroot] = xroot
           rank[xroot] += 1
       print(f"root: {root}, rank: {rank}")
  
   # applying kruskal's
   def kruskals(self):
       # initialize an empty MST list
       result = []
       # initialize i, the iteration and e, the edges added
       i, e  = 0, 0
       # sort the graph based on edge weights
       self.graph = sorted(self.graph, key = lambda item: item[2])
       # initialize root, which stores each vertex's parent in the union find structure
       # and rank, which tracks the rank (approximate tree height) of each set
       root = []
       rank = []
       for node in range(self.V):
           root.append(node)
           rank.append(0)
 
       # while the MST is incomplete and there are edges left to consider,
       # take the next lightest edge and check it with union find
       while e < self.V - 1 and i < len(self.graph):
           u, v, w = self.graph[i]
           i = i + 1
           x = self.find(root, u)
           y = self.find(root, v)
           print(f"x, y: {x}, {y}")
           if x != y:
               e = e + 1
               result.append([u, v, w])
               self.union(root, rank, x, y)
 
       for u, v, w in result:
           print(f'{u} - {v}: {w}')

Testing Kruskal’s Algorithm in Python

Now let’s test Kruskal’s Algorithm. We’ll create the following graph:

Kruskal’s Algorithm Python Test, Initial Graph
g = Graph(6)
g.add_edge(0, 1, 4)
g.add_edge(0, 2, 4)
g.add_edge(1, 2, 2)
g.add_edge(1, 0, 4)
g.add_edge(2, 0, 4)
g.add_edge(2, 1, 2)
g.add_edge(2, 3, 3)
g.add_edge(2, 5, 2)
g.add_edge(2, 4, 4)
g.add_edge(3, 2, 3)
g.add_edge(3, 4, 3)
g.add_edge(4, 2, 4)
g.add_edge(4, 3, 3)
g.add_edge(5, 2, 2)
g.add_edge(5, 4, 3)
g.kruskals()

Alongside the debug lines, this will print the edges of the MST: 1 - 2: 2, 2 - 5: 2, 2 - 3: 3, 3 - 4: 3, and 0 - 1: 4.

Kruskal’s Minimum Spanning Tree Representation

The MST should correspond to this graph:

Kruskal’s Algorithm Python Generated Minimum Spanning Tree Example
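
Since `kruskals` only prints its result, it can be handy to have a version that returns the tree so it can be checked in code. The function below is a sketch along those lines, not the class method above: `kruskal_mst` is a hypothetical name, it takes the vertex count and a list of `(u, v, w)` tuples, and its `find` is iterative with path halving (a form of path compression), which is an optimization the original code does not include. The edges are the test graph from this section with each undirected edge listed once.

```python
def kruskal_mst(num_vertices, edges):
    root = list(range(num_vertices))  # each vertex starts as its own representative
    rank = [0] * num_vertices

    def find(i):
        # Walk up to the representative, shortening the path as we go.
        while root[i] != i:
            root[i] = root[root[i]]
            i = root[i]
        return i

    mst = []
    for u, v, w in sorted(edges, key=lambda edge: edge[2]):
        x, y = find(u), find(v)
        if x == y:
            continue  # same set already: this edge would create a cycle
        if rank[x] < rank[y]:
            x, y = y, x  # make x the representative with the higher rank
        root[y] = x
        if rank[x] == rank[y]:
            rank[x] += 1
        mst.append((u, v, w))
        if len(mst) == num_vertices - 1:
            break  # the tree is complete
    return mst


edges = [(0, 1, 4), (0, 2, 4), (1, 2, 2), (2, 3, 3),
         (2, 5, 2), (2, 4, 4), (3, 4, 3), (5, 4, 3)]
mst = kruskal_mst(6, edges)
total_weight = sum(w for _, _, w in mst)
print(mst)
print(total_weight)  # 14
```

Because the loop simply runs out of edges, a disconnected graph yields a spanning forest with fewer than `num_vertices - 1` edges instead of raising an error.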

Summary of Kruskal’s Algorithm in Python

Kruskal’s algorithm builds a minimum spanning tree of a graph by greedily adding the lowest weight edges that do not create a cycle. We implemented Kruskal’s algorithm in Python by creating a Graph object with methods for union find and for the algorithm itself.


Nested Lists in Python

A nested list is a list whose elements are themselves lists, which makes nested lists the usual way to represent two dimensional arrays in Python. Examples include a list of grocery lists for the month or matrices we can multiply. In this post we’re going to go over how to use Python to create and manipulate nested lists.

Python Nested Lists

The easiest way to create a nested list in Python is simply to create a list and put one or more lists in that list. In the example below we’ll create two nested lists. First, we’ll create a nested list by putting an empty list inside of another list. Then, we’ll create another nested list by putting two non-empty lists inside a list, separated by a comma as we would with regular list elements.

# create a nested list
nlist1 = [[]]
nlist2 = [[1,2],[3,4,5]]

List Comprehension with Python Nested Lists

We can also create nested lists with list comprehension. List comprehension is a concise way to build a list from any iterable, such as a range or another list. In our example below, we’ll create two lists with list comprehension in two ways.

First we’ll create a nested list using three separate list comprehensions. Second, we’ll create a nested list with nested list comprehension.

# create a list with list comprehension
nlist_comp1 = [[i for i in range(5)], [i for i in range(7)], [i for i in range(3)]]
nlist_comp2 = [[i for i in range(n)] for n in range(3)]
print(nlist_comp1)
print(nlist_comp2)

The results of the two lists should be: 

  • [[0, 1, 2, 3, 4], [0, 1, 2, 3, 4, 5, 6], [0, 1, 2]]
  • [[], [0], [0, 1]]
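
The nested comprehension can be read as two loops, one inside the other. This sketch spells out the loop form of `nlist_comp2` (the name `nlist_loops` is only for illustration) and checks that both give the same result.

```python
nlist_comp2 = [[i for i in range(n)] for n in range(3)]

nlist_loops = []
for n in range(3):      # outer comprehension: one inner list per value of n
    inner = []
    for i in range(n):  # inner comprehension: the values 0 up to n - 1
        inner.append(i)
    nlist_loops.append(inner)

print(nlist_loops)                 # [[], [0], [0, 1]]
print(nlist_loops == nlist_comp2)  # True
```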

Adding Lists to a Two Dimensional Array

Now that we’ve learned how to create nested lists in Python, let’s take a look at how to add lists to them. We work with nested lists the same way we work with regular lists. We can add an element to a nested list with the append() method, and when that element is itself a list, it becomes a new row. In our example, we create a list and append it to one of our existing lists from above.

# append a list
list1 = [8, 7, 6]
nlist_comp1.append(list1)
print(nlist_comp1)

This should result in this list: [[0, 1, 2, 3, 4], [0, 1, 2, 3, 4, 5, 6], [0, 1, 2], [8, 7, 6]]

Concatenating Two Dimensional Lists in Python

Other than adding a list to our 2D lists, we can also concatenate two nested lists together. List concatenation in Python is quite simple: all we need to do is add the lists with the addition sign. Adding nested lists works the same way as adding plain, unnested lists. In our example, we’ll add together the two lists we created using list comprehension.

# concat nested lists
concat_nlist = nlist_comp1 + nlist_comp2
print(concat_nlist)

The list we should see from this is: [[0, 1, 2, 3, 4], [0, 1, 2, 3, 4, 5, 6], [0, 1, 2], [8, 7, 6], [], [0], [0, 1]]

How to Reverse a Nested List in Python

Now that we’ve created, added to, and concatenated nested lists, let’s reverse them. There are multiple ways to reverse nested lists, including creating a reversed copy or using list comprehension. In this example though, we’ll reverse a list in place using the list’s reverse() method.

# reverse a nested list
concat_nlist.reverse()
print(concat_nlist)

This should print out the nested list: [[0, 1], [0], [], [8, 7, 6], [0, 1, 2], [0, 1, 2, 3, 4, 5, 6], [0, 1, 2, 3, 4]]
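The other approaches mentioned above leave the original list untouched. Here is a minimal sketch of building a reversed copy with slicing, with the built-in reversed() function, and with list comprehension. The list `nested` and the variable names are illustrative only, not part of the original example.

```python
# A small nested list used only for illustration
nested = [[1, 2], [3, 4, 5], []]

# Slicing with a step of -1 builds a new outer list in reverse order
reversed_slice = nested[::-1]

# reversed() returns an iterator, so wrap it in list() to get a list
reversed_builtin = list(reversed(nested))

# List comprehension walking the indices from last to first
reversed_comp = [nested[i] for i in range(len(nested) - 1, -1, -1)]

print(reversed_slice)  # [[], [3, 4, 5], [1, 2]]
print(nested)          # [[1, 2], [3, 4, 5], []] (unchanged)
```

All three results are shallow copies: the outer list is new, but the inner lists are the same objects as in the original.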

Reversing the Sub Elements of a Nested List

Okay, so we can easily reverse a list using the reverse function. What if we want to reverse the sub elements of a nested list? We can reverse each of the lists in a nested list by looping through each list in the nested list and calling the reverse function on it.

# reverse sub elements of nested list
for _list in concat_nlist:
   _list.reverse()
print(concat_nlist)

If we call the above code on the original concatenated list, we will see this list: [[4, 3, 2, 1, 0], [6, 5, 4, 3, 2, 1, 0], [2, 1, 0], [6, 7, 8], [], [0], [1, 0]]

Reverse the Sub Elements and the Elements of a 2D Python Array

Now that we can reverse the elements in a 2D list as well as reverse the elements of each nested list, we can put them together. To reverse the sub elements and the elements of a 2D list in Python, all we do is loop through each of the inside lists and reverse them, and then reverse the outside list after the loop. The order of the two steps doesn’t matter; reversing the outside list first gives the same result.

# reverse sub elements + elements of a nested list
for _list in concat_nlist:
   _list.reverse()
concat_nlist.reverse()
print(concat_nlist)

Running this on the original concat_nlist should give: [[1, 0], [0], [], [6, 7, 8], [2, 1, 0], [6, 5, 4, 3, 2, 1, 0], [4, 3, 2, 1, 0]]

Turning a 2D List into a Normal List

So far, we’ve learned how to create, append to, concatenate, and reverse a two-dimensional list in Python. Now, let’s take a look at turning that 2D list into a normal or flattened list. In this example, we’ll use list comprehension to extract each element from each sublist in the list.

# flatten list
flat_list = [ele for sublist in concat_nlist for ele in sublist]
print(flat_list)

When running the above code to flatten a list on the original concatenated lists, we should get this list: [0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 8, 7, 6, 0, 0, 1]
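As an alternative to the nested list comprehension, the standard library’s itertools.chain.from_iterable can do the same one-level flattening. This is a small sketch on an illustrative list (`nested` and `flat` are names chosen here), not the method used in the rest of this post.

```python
from itertools import chain

# A small nested list used only for illustration
nested = [[0, 1, 2], [8, 7, 6], [], [0]]

# chain.from_iterable yields the items of each sublist in turn
flat = list(chain.from_iterable(nested))

print(flat)  # [0, 1, 2, 8, 7, 6, 0]
```

Like the list comprehension, this only flattens one level: a list nested three deep would still contain lists afterwards.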

Reverse a Flattened Nested List Python

Let’s put it all together. We just flattened our 2D Python array into a one-dimensional list. Earlier, we reversed our 2D list. Now, let’s reverse our flattened list. Just like with the 2D list, all we have to do to reverse our flattened list is run the reverse function on the flattened list.

# reverse elements of a flattened list
flat_list.reverse()
print(flat_list)

After running the reverse function on the list we flattened above, we should get: [1, 0, 0, 6, 7, 8, 2, 1, 0, 6, 5, 4, 3, 2, 1, 0, 4, 3, 2, 1, 0]

Summary of Python Nested Lists

In this post about nested lists in Python we learned how to create, manipulate, and flatten nested lists. First we learned how to simply create nested lists by just putting lists into a list, then we learned how to create nested lists through list comprehension. 

Next, we learned how to do some manipulation of 2D arrays in Python. First, how to append a list, then how to concatenate two lists, and finally, how to reverse them. Lastly, we learned how to flatten a 2D list and reverse it.

Learn More

To learn more, feel free to reach out to me @yujian_tang on Twitter, connect with me on LinkedIn, and join our Discord. Remember to follow the blog to stay updated with cool Python projects and ways to level up your Software and Python skills! If you liked this article, please Tweet it, share it on LinkedIn, or tell your friends!

I run this site to help you and others like you find cool projects and practice software skills. If this is helpful for you and you enjoy your ad free site, please help fund this site by donating below! If you can’t donate right now, please think of us next time.

Yujian Tang


Level 1 Python: Pure Python Matrix Multiplication

Level 1 Python projects are projects that are more logically complex or require more libraries than the ones in the Super Simple Python series. Pure Python matrix multiplication is one of the projects that is more logically complex; we won’t be using any external libraries here. Most of these projects should take you between 30 and 45 minutes to build.

Matrix multiplication is an important part of linear algebra and machine learning. It’s actually quite easy to do in Python using the numpy library. However, it’s also useful to understand how matrix multiplication actually works. That’s why we’ll be taking a look at how we can do matrix multiplication with pure Python in this post.

In this post, we’ll cover:

  • Creating Some Example Matrices for Python matrix multiplication
  • Validate Matrices Being Multiplied
  • Pure Python Matrix Multiplication with Lists

Creating Some Example Matrices for Python Matrix Multiplication

In order to do matrix multiplication in pure Python, we’ll need to represent the matrices using native Python types. We’ll represent the matrices with lists of lists. Each entry in the outside list corresponds to a row in the matrix. Each entry in the inside list corresponds to a value in the matrix. 

For our example, we’ll create some square matrices to multiply. We’ll create two four by four matrices and two two by two matrices, then multiply the four by fours by each other and the two by twos by each other.

mat_a = [[1, 2, 3, 4],
        [2, 4, 1, 3],
        [4, 2, 3, 1],
        [3, 1, 4, 2]]
 
mat_b = [[4, 2, 1, 3],
        [3, 4, 1, 2],
        [1, 2, 4, 3],
        [2, 1, 3, 4]]
 
# expected result:
# [[21, 20, 27, 32],
#  [27, 25, 19, 29],
#  [27, 23, 21, 29],
#  [23, 20, 26, 31]]
 
mat_c = [[1, 2],
        [2, 1]]
 
mat_d = [[3, 4],
        [4, 3]]
 
# expected result:
# [[11, 10],
#  [10, 11]]

Validate Matrices Being Multiplied

Before we attempt matrix multiplication, we should validate that it’s possible to multiply the matrices. An n x m matrix can only be multiplied by an m x p matrix. That means that the number of columns in the first matrix must be equal to the number of rows in the second matrix. This is due to the way that matrix multiplication works: each entry of the result is the sum of the products of a row of the first matrix with a column of the second, so those rows and columns must have the same length.

We’ll create a validation function that takes two parameters. The first parameter is the first matrix and the second parameter is the second matrix. It’s important that these are in order because matrix multiplication is not commutative.

def validate(mat_a, mat_b):
   # number of columns in the first matrix
   cols_a = len(mat_a[0])
   # number of rows in the second matrix
   rows_b = len(mat_b)
   assert cols_a == rows_b
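To see the shape rule in action, it helps to try matrices that are not square. Below is a minimal sketch. `can_multiply` is a hypothetical helper (not part of the original post) that returns True or False instead of asserting, and it assumes both matrices are non-empty lists of equal-length rows.

```python
def can_multiply(mat_a, mat_b):
    """Return True if mat_a (n x m) can be multiplied by mat_b (m x p)."""
    cols_a = len(mat_a[0])  # number of columns in the first matrix
    rows_b = len(mat_b)     # number of rows in the second matrix
    return cols_a == rows_b

two_by_three = [[1, 2, 3],
                [4, 5, 6]]
three_by_one = [[1],
                [2],
                [3]]

print(can_multiply(two_by_three, three_by_one))  # True: 2x3 times 3x1
print(can_multiply(three_by_one, two_by_three))  # False: 3x1 times 2x3
print(can_multiply(two_by_three, two_by_three))  # False: 2x3 times 2x3
```

The first two calls use the same pair of matrices in opposite orders, which is one way to see why the order of the parameters matters.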

Pure Python Matrix Multiplication with Lists

Now that we’ve got some example matrices and a validation function, let’s create the function to actually do the matrix multiplication. Our matrix multiplication function will take two parameters. The first thing we’ll do is validate that the matrices can be multiplied. 

After validation, we’ll make matrix B easier to deal with in multiplication. To do this, we’ll transpose it by calling zip on its unpacked rows, which yields one tuple per column, and then wrap that in a list so we end up with a list of tuples. This is what allows us to easily access columns of B.

Next, we’ll use list comprehension to do our pure Python matrix multiplication calculations. We’ll create a list of lists. The internal list is a sum of products for each entry in a row of A and a column of B for each column of B in the zipped iterable of matrix B. The external list is created based on each row of A in matrix A.

def mat_mul(mat_a, mat_b):
   validate(mat_a, mat_b)
   # turn matrix b into a list of tuples
   iterable_b = list(zip(*mat_b))
   return [[sum(a*b for a, b in zip(row_a, col_b)) for col_b in iterable_b] for row_a in mat_a]
 
print(mat_mul(mat_a, mat_b))
print(mat_mul(mat_c, mat_d))

This is what we should see in our output:

[[21, 20, 27, 32], [27, 25, 19, 29], [27, 23, 21, 29], [23, 20, 26, 31]]
[[11, 10], [10, 11]]
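If the nested list comprehension is hard to read, it may help to see the same calculation written as explicit loops. This is a sketch for comparison only. `mat_mul_loops`, `small_a`, and `small_b` are names made up for this illustration, and the function assumes the shapes are already compatible.

```python
def mat_mul_loops(mat_a, mat_b):
    """Multiply two matrices (lists of rows) using explicit loops."""
    n_rows, n_inner, n_cols = len(mat_a), len(mat_b), len(mat_b[0])
    # start with an n_rows x n_cols matrix of zeros
    result = [[0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):          # each row of A
        for j in range(n_cols):      # each column of B
            for k in range(n_inner): # pair up row entries with column entries
                result[i][j] += mat_a[i][k] * mat_b[k][j]
    return result

small_a = [[1, 2, 3],
           [4, 5, 6]]
small_b = [[1, 0],
           [0, 1],
           [2, 2]]

# zip(*small_b) groups the entries of B by column
print(list(zip(*small_b)))              # [(1, 0, 2), (0, 1, 2)]
print(mat_mul_loops(small_a, small_b))  # [[7, 8], [16, 17]]
```

The innermost loop plays the role of the sum over zip(row_a, col_b) in the comprehension version.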

Summary of Matrix Multiplication in Pure Python

In this post we learned how to do matrix multiplication with pure Python and no libraries. First we created a couple sets of matrices to multiply. Then we created a validation function to ensure that the two matrices passed to the function would be multipliable. Finally, we created a function to do the matrix multiplication using list comprehension and the zip function in Python.


Python Single Responsibility Principle

You’ve probably heard about programming principles. A piece of advice that always gets thrown around is “have the smallest classes/functions/modules possible”. What does that mean though? How do you make a function as small as possible? The single responsibility principle. This programming principle dictates how small a function could possibly be.

In this post, we’ll cover:

  • The Single Responsibility Principle
  • SRP Example with Python Functions
    • Generating Data Points
    • Plotting Generated Data Points
    • Putting them Together
  • A Summary of the Single Responsibility Principle Example

The Single Responsibility Principle

The Single Responsibility Principle (SRP) applies to modules, classes, and functions. In this example, we’ll be demonstrating the SRP with functions. It states that each function should have responsibility over a single part of a program’s functionality and encapsulate that part. 

Encapsulation refers to the bundling of data with the methods that touch that data. It also refers to the idea of restricting direct access to the components of an object or function. It can be thought of like the way we use APIs. When encapsulation is done correctly, we can think of each function to be a black box that returns a specified return value given a set of parameters.

The Single Responsibility Principle is the reason we often see programs divided into many nested directory structures. For example, if we’re working with a blog website, we may see folders like utils, users, and posts. Under each of those folders we’ll see files for specific functionality. For example, under users we may see separate files for logging in, registering, logging out, deleting users, and updating users.

SRP Example with Python Functions

In this post, we’ll cover a small example of SRP applied to functions in Python. We’ll create two functions that each encapsulate one action, and then one orchestrator function. Before we start, you’ll need to install the matplotlib library, which you can do by running the line below in the terminal:

pip install matplotlib

In this example, we will generate a set of data points corresponding to the line y=2x for the x values from 0 to 1. Then, we will plot these data points. To show SRP, we’ll create one function which takes no parameters to generate the data points and one function which takes two parameters, lists x and y, and plots the data points.

When applying SRP on classes and modules, we typically split each class or module into its own file or directory. However, when working with functions we don’t need to do that. In this case, we will put the functions in the same file. We’ll open up our file as we always do with our imports. For this example, we’ll need to import matplotlib and random.

import matplotlib.pyplot as plt
import random

Generating Data Points

The first function we’ll make for this example is a function that generates data points. This function will take no parameters. We’ll start off by creating a list of 25 uniformly distributed x values from the range 0 to 1. Then we’ll sort those x values and create a list of y values equivalent to 2x for each value in the list. Finally, we’ll return both lists.

# function 1 - generate dataset
def gen():
    x = [random.uniform(0, 1) for _ in range(25)]
    x.sort()
    y = [2*_x for _x in x]
    return x, y

Plotting Generated Data Points

Our second function will plot two lists of data points. It will take two parameters, the list of x values and the list of y values. First, we’ll call the plot() function from the matplotlib.pyplot module, which we imported as plt, to plot the x and y values. Then, we will add a title to the plot. Finally, we’ll call the show() function to display the plot.

# function 2 - plot generated dataset
def plot(x: list, y: list):
    plt.plot(x, y)
    plt.title("plotting a generated dataset")
    plt.show()

Putting them Together

Now that we have our two functions, it’s time to put them together. We’ll create an orchestrate function which takes no parameters. This function will simply call the gen() function, assign its outputs to two variables, x and y, and then call the plot function to plot x and y.

Note that this function is not strictly necessary. If you wanted to, you could simply put the two lines in this function directly in the script. This function is only here to illustrate an example of best practices of encapsulation.

# function 3 - tie them together
def orchestrate():
    x, y = gen()
    plot(x, y)
   
orchestrate()

To call the function itself, we simply put a call to orchestrate() in our script. When we run our script, we should end up with a plot of the line y=2x.
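One practical payoff of this split is that the data-generating function can be checked on its own, without opening a plot window. Here is a minimal sketch. gen() is repeated from above so the snippet is self-contained, and `check_gen` is a hypothetical helper added only for illustration.

```python
import random

# same data generator as above, repeated so this snippet runs on its own
def gen():
    x = [random.uniform(0, 1) for _ in range(25)]
    x.sort()
    y = [2*_x for _x in x]
    return x, y

def check_gen():
    """Hypothetical check that exercises gen() alone, with no plotting involved."""
    x, y = gen()
    assert len(x) == len(y) == 25                     # 25 points in each list
    assert x == sorted(x)                             # x values are in order
    assert all(0 <= _x <= 1 for _x in x)              # x values lie in [0, 1]
    assert all(_y == 2*_x for _x, _y in zip(x, y))    # every point is on y=2x
    return True

print(check_gen())  # True
```

If gen() also drew the plot, a check like this would have to deal with matplotlib every time it ran.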

Single Responsibility Principle Summary

In this post we learned about the Single Responsibility Principle and how it affects program structures. We then went over an example of SRP applied to functions in Python. In our example, we built a program that creates a dataset and graphs it. We applied SRP by splitting it into one function that creates the dataset, one that graphs it, and one that puts them together.


Level 1 Python: Scientific Calculator

As part of the Super Simple Python series, we made a basic, four-function calculator. Since then, we’ve also covered how to do math in Python. From the math tutorial, we learned how we could use the Python math module to perform operations outside of the basic add, subtract, divide, and multiply. 

Let’s use the math module to create a scientific calculator in Python. We’ll be building off of the original calculator program, so if you have that, feel free to load it up. There are four steps to building a scientific calculator in Python:

  • Creating Function Definitions
  • The Operation Map
  • Getting the Desired Operation from the User
  • Mapping and Performing Operations

Creating Function Definitions

The first thing we need to do is create the functions for each operation we want our calculator to have. Last time we created four functions: add, subtract, divide, and multiply. This time, we’ll add four more functions. 

We’ll add square, square root (sqrt), log, and exponentiate. The square, square root, and log functions each take one parameter. Meanwhile, the exponentiate function takes two parameters like the other four functions we originally created.

import math
 
# create function declarations
def add(a, b):
    return a + b
 
def subtract(a, b):
    return a - b
 
def divide(a, b):
    return a/b
 
def multiply(a, b):
    return a*b
 
def square(a):
    return a**2
 
def sqrt(a):
    return math.sqrt(a)
 
def log(a):
    return math.log(a)
 
def exponentiate(a, b):
    return a**b

The Operation Map

Next, we’ll create an operation map. This is simply a dictionary that maps a string onto a function. This is one of the nice things about Python: functions are objects, so we can store a function as a value in a dictionary just like any other value. The only key that differs from its function name is “square root”, which maps to sqrt.

# create map
function_map = {
    "add": add,
    "subtract": subtract,
    "divide": divide,
    "multiply": multiply,
    "square": square,
    "square root": sqrt,
    "log": log,
    "exponentiate": exponentiate
}

Getting the Desired Operation from the User

Now we have all our functions written and the dictionary map that maps strings to functions. Next, we’ll write the code to ask the user for the input string. This string has to be one of the strings that is a key in the dictionary. We’ll give the user a list of possible operations, so they know which ones are available.

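# ask user for desired operation
# normalize the answer so "Add" or " add " still matches the lowercase keys in function_map
op = input("Which operation would you like to do? Add, subtract, divide, multiply, square, square root, log, or exponentiate? ").strip().lower()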

Mapping and Performing Operations

At this point, everything is set up except for the actual execution of the operations. Unlike the four-function calculator, we have two kinds of operations, ones that take one argument and ones that take two. That’s why we asked for the operation first this time.

We’ll have to check if the operation is one of the three operations that only take one input. If it is, then we’ll ask for one number. We’ll use the map to get the function and then pass the user input number to it and print our result.

If we are using one of the operations that require two parameters, we’ll ask the user for two numbers. Once we have the two numbers, we’ll call the function map to get the function and then pass the two input parameters to it. Then, we’ll print out the value returned from the function.

if op in ["square", "square root", "log"]:
    a = float(input("What number would you like to perform your operation on? "))
    x = function_map[op](a)
    print(x)
else:
    a = float(input("What is the first number? "))
    b = float(input("What is the second number? "))
    x = function_map[op](a, b)
    print(x)
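The code above will stop with an error if the user types an operation that isn’t in the map, divides by zero, or asks for the square root or log of a number outside the function’s domain. Below is a minimal sketch of one way to handle those cases. `OPERATIONS` and `safe_calculate` are hypothetical names, and the map is a shortened stand-in for function_map so the snippet is self-contained.

```python
import math

# Hypothetical, shortened stand-in for function_map
OPERATIONS = {
    "add": lambda a, b: a + b,
    "divide": lambda a, b: a / b,
    "square root": math.sqrt,
    "log": math.log,  # natural logarithm
}

def safe_calculate(op, *args):
    """Return the result, or an error message string if the operation fails."""
    func = OPERATIONS.get(op.strip().lower())
    if func is None:
        return f"Unknown operation: {op!r}"
    try:
        return func(*args)
    except ZeroDivisionError:
        return "Cannot divide by zero"
    except ValueError:
        # math.sqrt and math.log raise ValueError outside their domains
        return "Input is outside the domain of this operation"

print(safe_calculate("Add", 2.0, 3.0))       # 5.0
print(safe_calculate("divide", 1.0, 0.0))    # Cannot divide by zero
print(safe_calculate("square root", -4.0))   # Input is outside the domain of this operation
print(safe_calculate("cube", 2.0))           # Unknown operation: 'cube'
```

Returning a message instead of raising keeps the example short. A larger program might prefer to let the exception propagate or to ask the user again.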

Summary of Creating a Scientific Calculator in Python

In this post we extended the basic, four-function calculator into a scientific calculator. We added four functions, two of which use the math library, and three of which take only one parameter. We also extended the function dictionary. Then we changed up our input pattern to get the function first before asking for the numbers because the functions don’t take the same number of parameters anymore.


List Directories, Files, and Subdirectories with Python

Creating, updating, and interacting with files is an integral part of data pipelines. In order to interactively access files, we have to be able to list them. The os module gives us three ways to list the files in a directory with Python. In this post we’ll cover how to use the os module and three of its functions to list directories, subdirectories, and files in Python.

In this post we will:

  • Get all File and Subdirectory Names
  • Iterate Through Each Entry in the Directory
  • List directories, subdirectories, and files with Python

Get all File and Subdirectory Names

The first way to list all the files and subdirectory names in a directory is the os.listdir() function. This function returns a list of all the names of the entries in the directory as strings. This is the most basic way to list the subdirectories and file names in a directory.

This is the method we used to get all the filenames when combining files that start with the same word.

import os
 
root = "."
 
for obj in os.listdir(root):
    print(obj)

Iterate Through Each Entry in the Directory

A second way to get every entry in a directory is through the os.scandir() function. This function doesn’t return a list of strings, but rather an iterator of DirEntry objects. This method is more efficient than os.listdir() when we need more than the name of the entries.

Each DirEntry object contains not only the name and path of the entry, but also methods that tell us whether the entry is a file, a subdirectory, or a symlink. Some of these methods can make operating system calls, so an OSError may be raised while working with the results from an os.scandir() call.

import os
 
root = "."
 
for obj in os.scandir(root):
    print(obj)
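Printing a DirEntry only shows its name, so here is a short sketch of using its name attribute and its is_file() and is_dir() methods to separate files from subdirectories. `classify_entries` is a hypothetical helper, and the temporary directory is created only so the example runs anywhere.

```python
import os
import tempfile

def classify_entries(root):
    """Return (sorted file names, sorted subdirectory names) directly under root."""
    files, dirs = [], []
    # using scandir as a context manager closes the iterator when we are done
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_file():
                files.append(entry.name)
            elif entry.is_dir():
                dirs.append(entry.name)
    return sorted(files), sorted(dirs)

# build a throwaway directory with one file and one subdirectory
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    with open(os.path.join(tmp, "a.txt"), "w") as f:
        f.write("hello")
    print(classify_entries(tmp))  # (['a.txt'], ['sub'])
```

To run it on a real directory, call classify_entries(".") or pass any other path.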

List Directory, Subdirectory, and Files with Python

The os.scandir() and os.listdir() commands are great for getting the entries in one directory, but what if you need the subdirectory entries as well? This is where the third os library function that can iterate through directories comes in, os.walk().

The os.walk() function returns a generator. Each item in the generator is a tuple of size three. The first entry in the tuple is the path, the second is the list of subdirectories, and the third is the list of files. The walk function doesn’t just look in the current directory, it also recursively walks through every subdirectory. 

We can print all the files, including the ones nested in subdirectories, in a directory using the os.walk() function. All we have to do is loop through all the filenames in the list of files and print out the concatenation of the current path and the filename.

import os
 
root = "."
 
for path, subdirs, files in os.walk(root):
    for filename in files:
        print(os.path.join(path, filename))
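A common variation is to keep only the files with a certain extension. The sketch below builds on the same os.walk() loop. `find_files_with_extension` is a hypothetical helper, and the temporary directory tree exists only to make the example self-contained.

```python
import os
import tempfile

def find_files_with_extension(root, extension):
    """Return sorted paths, relative to root, of files ending in extension."""
    matches = []
    for path, subdirs, files in os.walk(root):
        for filename in files:
            if filename.endswith(extension):
                full_path = os.path.join(path, filename)
                # store the path relative to root so results are easy to read
                matches.append(os.path.relpath(full_path, root))
    return sorted(matches)

# build a throwaway tree: notes.txt, data/a.csv, data/raw/b.csv
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "data", "raw"))
    for rel in ["notes.txt",
                os.path.join("data", "a.csv"),
                os.path.join("data", "raw", "b.csv")]:
        with open(os.path.join(tmp, rel), "w") as f:
            f.write("")
    print(find_files_with_extension(tmp, ".csv"))
```

This should print the two .csv paths, including the one nested two directories deep. The path separator in the output depends on the operating system.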

Summary of Ways to List Directories, Subdirectories, and Files with Python

In this post we learned about three ways to list the files in a directory. The first two methods list the files and subdirectories in the current directory, and the last method goes into all the subdirectories as well. 

The listdir and scandir functions differ in what they return: listdir returns a list of name strings, while scandir returns an iterator of DirEntry objects that carry extra metadata. The `walk` function, on the other hand, returns a generator that yields a tuple for each directory it visits instead of objects or strings.

The listdir, scandir, and walk functions are the three os functions covered here for listing directory contents. The `listdir` function is best used when we just need file names in the current directory. If we also need the entry types and more metadata, then scandir is a better option. Finally, if we need access to the subdirectory files as well, we should use the walk function.


Download any YouTube Video with Python

Over 100,000 people search for ways to download YouTube videos every month. Here’s how you can download a YouTube video with Python, for free. In this post we’re going to cover how to use the youtube_dl library to download YouTube videos with Python. We’re going to first learn about what the youtube_dl library is, and then build a Python program that will download YouTube videos.

In this post we will cover:

  • What is the youtube_dl Library?
  • How to Download YouTube Videos with the youtube_dl Library

What is the youtube_dl Library?

The youtube_dl library is an open-source command line tool that can download YouTube videos, meaning you can run it from your terminal. In addition to its command line interface, it can also be used as a Python library, which lets you run the same commands from Python code. The advantage of using it in Python is a more customizable interface and the ability to bundle several options into one function call.

To follow this tutorial, you’ll need to install the Python library for youtube_dl. You can do so by running the line below in the terminal.

pip install youtube_dl

Learn more at yt-dl.org.

Downloading a YouTube Video with the youtube_dl Library

As always, we’ll begin our program by importing libraries. In this case, the only library we need is the youtube_dl library. Let’s create a function that will download a YouTube video from a link. The only parameter it will take is the link itself. We’ll start our function by defining some options for youtube_dl. We’ll tell it that we want our video in mp4 format and also to save it under the id of the video.

Next, we’ll strip any surrounding whitespace from the link. We’ll use the YoutubeDL object to extract the info from the video, including the metadata. Then we’ll print out and return the save location. The extract_info function does the downloading part. Running the download_video function on a link will save the video as a .mp4 file in your current directory.

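import youtube_dl
 
def download_video(link):
    # request mp4 and name the file <video id>.<ext> in the current directory
    ydl_opts = {
        'format': 'mp4',
        'outtmpl': "./%(id)s.%(ext)s",
    }
    # remove stray whitespace around the link
    _id = link.strip()
    # extract_info downloads the video and returns its metadata
    meta = youtube_dl.YoutubeDL(ydl_opts).extract_info(_id)
    save_location = meta['id'] + ".mp4"
    print(save_location)
    return save_location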

Further Reading

Learn More

To learn more, feel free to reach out to me @yujian_tang on Twitter, connect with me on LinkedIn, and join our Discord. Remember to follow the blog to stay updated with cool Python projects and ways to level up your Software and Python skills! If you liked this article, please Tweet it, share it on LinkedIn, or tell your friends!

I run this site to help you and others like you find cool projects and practice software skills. If this is helpful for you and you enjoy your ad free site, please help fund this site by donating below! If you can’t donate right now, please think of us next time.

Yujian Tang

I started my professional software career interning for IBM in high school after winning ACSL two years in a row. I got into AI/ML in college where I published a first author paper to IEEE Big Data. After college I worked on the AutoML infrastructure at Amazon before leaving to work in startups. I believe I create the highest quality software content so that’s what I’m doing now. Drop a comment to let me know!

Categories
General Python level 1 python

Combine Files that Start with the Same Words with Python

Recently I had to combine a bunch of files that started with the same prefix. I did an NLP analysis of a YouTube series where I had to split up my API calls for each episode. After getting the partial analysis results, I wanted to combine them into one file for each episode. This is how we can use Python to combine all the files in a directory that start with the same prefix.

In this post, we’ll cover:

  • Getting all the Files in a Directory with Python
    • Collecting the Files that Start with the Same Prefix
    • Sorting Files in Order
  • Opening the Files and Combining the Contents with Python
  • A Summary of Combining Files that Start with the Same Words with Python

Getting all the Files in a Directory with Python

Before we can collect the files that start with the same word, we need to get all the files in a directory. We’ll use the os library to do this. This is a built-in Python library so we don’t need to install anything.

In this example, we’ll create a function that will get all the filenames in a specified directory. Our function will take no parameters. It will start by creating an empty list to hold the filenames. Next, we’ll loop through all the files in a directory using os.listdir and append those filenames to the list. Note that the slice filename[:-5] drops the last five characters of each name, which removes a five-character extension (such as .json) and leaves just the prefix. You could change this function to get the files in any folder by passing in one parameter, the name of the directory.

import os
 
# get filenames
def get_filenames():
    filenames = []
    for filename in os.listdir("./jp/"):
        filenames.append(filename[:-5])
    return filenames
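
As a sketch of the one-parameter version mentioned above, here is a variant that takes the directory as an argument. The name get_prefixes and the demo file names are my own choices for illustration. It also swaps the fixed [:-5] slice for os.path.splitext, which removes the extension whatever its length.

```python
import os
import tempfile

def get_prefixes(directory: str) -> list[str]:
    """List the filenames in `directory` with their extensions removed."""
    prefixes = []
    # os.listdir makes no ordering promise, so sort for a predictable result
    for filename in sorted(os.listdir(directory)):
        # splitext splits "episode1.json" into ("episode1", ".json")
        stem, _ext = os.path.splitext(filename)
        prefixes.append(stem)
    return prefixes

# small demo in a throwaway directory with made-up file names
with tempfile.TemporaryDirectory() as tmp:
    for name in ["episode1.json", "episode2.json"]:
        open(os.path.join(tmp, name), "w").close()
    print(get_prefixes(tmp))
```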

Collecting the Files that Start with the Same Prefix

Now that we’ve got all the files in a directory (using os.listdir) we can also get all the files that start with a prefix. In this case, we’re going to use the get_filenames function we created above to get the list of prefixes (the import below assumes that function is saved in a file named get_filenames.py). Then we’ll create an empty list that will hold one list of matching files for each prefix in our filenames list.

Next, we’ll loop through each of the prefix filenames. Within each loop we’ll initialize an empty list. This list will hold all the filenames that start with the desired prefix. Now we’ll loop through each of the files in the target directory and check if the filename starts with the desired prefix followed by an underscore. If it does, we’ll append it to the list.

import os
from get_filenames import get_filenames
 
# collect all the separate mcps, ners, polarities, and summaries
filenames = get_filenames()
   
part_filenames = []
for filename in filenames:
    parts = []
    for part_name in os.listdir("./ners"):
        if part_name.startswith(filename+"_"):
            parts.append(part_name)

Sorting Files in Order

Now that we have a list of all the filenames that begin with the desired prefix, we should sort them. The code below still belongs in the first for loop but is executed after the directory traversal. We’ll use the sorted function to sort the filenames by len, or length. This is because an alphabetical sort compares names character by character, so it can’t put the numbered parts in numeric order. Note also that os.listdir returns names in arbitrary order, so we can’t count on any ordering before this step.

Without this call, an alphabetical listing would put files that end in _11 before files that end with _2. Sorting by length places the shorter, single-digit names first. After sorting all the files for one prefix, we’ll append that list to the larger list.

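    # sort by length first so _2 comes before _11, then by name to break ties
    parts = sorted(parts, key=lambda name: (len(name), name))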
    part_filenames.append(parts)
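
To see the difference, here is a small comparison using made-up file names. The tuple key (length, then name) is my own addition. It behaves like key=len but also orders names of equal length alphabetically, which helps because os.listdir makes no ordering promise.

```python
# made-up part files for one prefix, deliberately out of order
parts = ["ep1_11.txt", "ep1_2.txt", "ep1_10.txt", "ep1_1.txt"]

# alphabetical order compares character by character, so _10 and _11 land before _2
alphabetical = sorted(parts)

# length first, then name: single-digit parts come before double-digit ones
by_length = sorted(parts, key=lambda name: (len(name), name))

print(alphabetical)  # ['ep1_1.txt', 'ep1_10.txt', 'ep1_11.txt', 'ep1_2.txt']
print(by_length)     # ['ep1_1.txt', 'ep1_2.txt', 'ep1_10.txt', 'ep1_11.txt']
```

This trick works as long as the part numbers are not zero-padded and the rest of the name is the same for every part.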

Full Code for Collecting the Files in Order

This is the full code for collecting all the files for a set of prefixes and appending them to a list of lists in sorted order.

import os
from get_filenames import get_filenames
 
# collect all the separate mcps, ners, polarities, and summaries
filenames = get_filenames()
   
part_filenames = []
for filename in filenames:
    parts = []
    for part_name in os.listdir("./ners"):
        if part_name.startswith(filename+"_"):
            parts.append(part_name)
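    # sort by length first so _2 comes before _11, then by name to break ties
    parts = sorted(parts, key=lambda name: (len(name), name))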
    part_filenames.append(parts)
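
If you’d rather have this step as a reusable function, here is one possible sketch. The name collect_parts, its parameters, and the demo file names are all my own for illustration. It returns a dictionary keyed by prefix instead of a list of lists, and it reads the directory only once.

```python
import os
import tempfile

def collect_parts(prefixes, directory):
    """Map each prefix to its part files in `directory`, ordered by part number."""
    all_names = os.listdir(directory)  # read the directory once
    grouped = {}
    for prefix in prefixes:
        # keep only the names that start with "<prefix>_"
        parts = [name for name in all_names if name.startswith(prefix + "_")]
        # length first, then name, so _2 sorts before _10
        grouped[prefix] = sorted(parts, key=lambda name: (len(name), name))
    return grouped

# small demo in a throwaway directory with made-up file names
with tempfile.TemporaryDirectory() as tmp:
    for name in ["ep1_1.txt", "ep1_2.txt", "ep1_10.txt", "ep2_1.txt"]:
        open(os.path.join(tmp, name), "w").close()
    print(collect_parts(["ep1", "ep2"], tmp))
```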

Opening the Files and Combining the Contents with Python

We’ve created ways to get all the files in a folder and a way to collect the files that correspond to a specific prefix. Now, we are going to finish up by opening the files and combining the contents with Python. 

We’ll start by looping through and enumerating the list of filename prefixes. Inside the loop, we first create a list to hold the string content of each file we open. Next, we use the loop index to look up the list of part filenames for the current prefix and loop through it. For each of these part files, we’ll read the file in and append its contents to the list we created.

After we’ve read every part file for the prefix, we’ll write the collected contents out. We’ll open one output file named after the prefix, loop through each of the entries in the list we created, and write the string content.

for i, filename in enumerate(filenames):
    # ners
    ners = []
    for part_filename in part_filenames[i]:
        with open(f"./ners/{part_filename}", "r") as f:
            entries = f.read()
            ners.append(entries)
    with open(f"./ners/{filename}.txt", "w") as f:
        for entry in ners:
            f.write(entry)
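
The same idea can be wrapped in a function and tried out safely in a temporary directory. This is a minimal sketch under my own assumptions: combine_parts, its parameters, and the demo file names are hypothetical, and the files are assumed to be UTF-8 text. It writes each part straight to the output file instead of collecting the contents in a list first.

```python
import os
import tempfile

def combine_parts(directory, prefix, part_names):
    """Concatenate the given part files into <prefix>.txt and return its path."""
    out_path = os.path.join(directory, f"{prefix}.txt")
    with open(out_path, "w", encoding="utf-8") as out_file:
        for part_name in part_names:
            part_path = os.path.join(directory, part_name)
            with open(part_path, "r", encoding="utf-8") as part_file:
                out_file.write(part_file.read())
    return out_path

# small demo in a throwaway directory with made-up file names and contents
with tempfile.TemporaryDirectory() as tmp:
    for name, text in [("ep1_1.txt", "first\n"), ("ep1_2.txt", "second\n")]:
        with open(os.path.join(tmp, name), "w", encoding="utf-8") as f:
            f.write(text)
    combined_path = combine_parts(tmp, "ep1", ["ep1_1.txt", "ep1_2.txt"])
    with open(combined_path, encoding="utf-8") as f:
        print(f.read())
```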

Summary of Combining Files that Start with the Same Words with Python

In this post, we learned how to combine files that start with the same word in Python. First, we learned how to use os.listdir() to get all the file names in a directory. Then we learned how to loop through a list of files and grab all the ones with the same prefix. Finally, we saw how to loop through those files, open them up, and combine their contents into one file.

