Categories
data structures and algorithms

Out of Place Sort a Singly Linked List in Python

While there are many sorting algorithms in computer science, one useful way to classify them is by whether they sort in place or not in place (also called out of place). In place sorting algorithms, such as bubble sort, run without an auxiliary data structure. Out of place (or not in place) sorting algorithms, such as merge sort, need extra data structures. In this post, we learn how to out of place sort a singly linked list.

We take an unordered singly linked list and create a sorted list out of it. I haven’t seen this exact algorithm anywhere else on the internet; it is an intuitive sorting algorithm I made up in my head, although it works much like selection sort. The time complexity of this algorithm is O(n^2). The space complexity of this algorithm is O(n) – the auxiliary data structure we create becomes the sorted list.

We cover:

  • What Functions Do You Need to Out of Place Sort on a Singly Linked List?
    • Finding the Minimum
    • Removing the Minimum
    • Full Code for a Custom Singly Linked List Class
  • Not In Place Sort Function for a Singly Linked List
    • Visualization of How to Sort a Singly Linked List
    • Code for How to Sort a Singly Linked List in Python
  • Sorting an Example Linked List
  • Summary of Out of Place Sort on a Singly Linked List in Python

What Functions Do You Need to Out of Place Sort on a Singly Linked List?

We cover linked lists and binary trees and sorting algorithms as parts of our introduction to data structures and algorithms series here on PythonAlgos. Now, we are combining the two to learn how to sort a singly linked list. Our original linked list implementation had helper functions to do things like get the size, run a search, delete a node, add a node, or traverse the list. We also had a linked list class separate from the Node class. 

Most of these functions are useless to us right now. For sorting a singly linked list out of place, let’s assume that we won’t bother individually deleting nodes, traversing the list, searching for specific values, or care about the size. We focus solely on the functions we need to actually get a sorted list. The functions that we need are a function to add nodes, a way to print the nodes (this is actually also a nice to have), a way to find the minimum, and a way to remove the minimum.

To start off the Node class, we create a simple __init__ method which takes the required self parameter and a value parameter. This __init__ method creates an initial Node out of the passed in value. We also set the next attribute to None – this is the attribute that makes the singly linked list linked.

Next, we create the append function. This function takes a value and appends it to the end of the linked list. Starting at the root Node, we iterate through until the next attribute is None, meaning we’ve reached the end of the list. Once we’ve reached the end of the list, we set the next value of the last Node to a Node with the passed in value.

The print_nodes function prints all the values of the nodes in one line separated by a space. Similar to the append function, we loop through all of the Nodes until we reach one where the next value is None. For each of the nodes, we print out the value, using an end parameter of a space so that each one does not print a newline before moving onto the next Node. Finally, we print out the value of the last node.

class Node(object):
   def __init__(self, value):
       self.value = value
       self.next = None
  
   def append(self, value):
       while self.next is not None:
           self = self.next
       self.next = Node(value)
 
   def print_nodes(self):
       while self.next is not None:
           print(self.value, end=" ")
           self = self.next
       print(self.value)

Finding the Minimum

The out of place sort algorithm for our singly linked list relies on repeatedly finding the minimum of the linked list. I actually refactored this code multiple times because there are a couple edge cases to consider here. The possible cases you could come across while looking for the minimum of a singly linked list include:

  • There is only one node and it is automatically the minimum
  • The first node is the minimum, and there are multiple nodes
  • The minimum is somewhere in the linked list, and it is not the first one

Handling all of these cases presents some nuance. In addition, we also return the parent when we find the minimum node. Remember that linked lists are only kept in memory by a reference to the root node. If we want to remove a node, we need to get a pointer to the parent of the node so we don’t lose access to the whole list.

The first case that we handle is the case that there is only one node. In this case, we don’t need to do anything but return the node. However, since we are also returning the parent for the last case, we need to return two objects. In this case, we will return a None type object as the parent, and the Node itself as the minimum. The other two cases can be handled with the same strategy.

To start off, we set the minimum node to the self, or the root node, and the parent node to None. Next, we loop through the nodes until we reach one that does not point to any other node (the last node). For each of the nodes, we check if the value of the next node is less than the value of the current minimum node. If it is, we set the parent node to the node we are on and the minimum node to the next node. Either way, we set the self to the next node to ensure we continue iterating. After we’ve looped through all the nodes, we return the parent and the minimum node.

   # keep track of min and parent
   def find_min(self):
       if self.next is None:
           return None, self
       min = self
       parent = None
       while self.next is not None:
           if self.next.value < min.value:
               parent = self
               min = self.next
           self = self.next
       return parent, min
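
To see what find_min hands back in each of the three cases, here is a small self-contained sketch. It repeats a trimmed-down Node class with only append and find_min so that it runs on its own, and it uses a local variable instead of reassigning self. The build_list helper is a hypothetical convenience for this example, not part of the class above.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

    def append(self, value):
        node = self
        while node.next is not None:
            node = node.next
        node.next = Node(value)

    def find_min(self):
        # returns (parent_of_min, min_node); parent is None when the min is the root
        if self.next is None:
            return None, self
        smallest, parent, node = self, None, self
        while node.next is not None:
            if node.next.value < smallest.value:
                parent, smallest = node, node.next
            node = node.next
        return parent, smallest


def build_list(values):
    # hypothetical helper: build a linked list from a non-empty Python list
    root = Node(values[0])
    for value in values[1:]:
        root.append(value)
    return root


single = build_list([5])        # case 1: only one node
first = build_list([1, 4, 9])   # case 2: the root is the minimum
middle = build_list([8, 2, 4])  # case 3: the minimum is further along

for name, root in [("single", single), ("first", first), ("middle", middle)]:
    parent, smallest = root.find_min()
    print(name, None if parent is None else parent.value, smallest.value)
```

In the first two cases the parent comes back as None, which is exactly the signal remove_min uses to know that the root itself is being removed.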

Removing the Minimum

Now that we have a way to find the minimum node, we need a way to remove the minimum node. This function also has multiple cases to consider. Given that we have already found the parent and the minimum, the three cases that we need to consider when removing the minimum of a singly linked list are:

  • There is no parent node to account for
  • The minimum node is the last node and does not have a next node
  • We’re removing a node in the middle and need to account for the next node

The first case happens when the minimum node is the first node or the only node. We handle this case first. When the parent node is None, we set the self object to the next node and the minimum node’s next pointer to None to remove it. Then we return the self object and the minimum node. Note that if the minimum node does not have a next object, we have emptied our original linked list.

Once again, we handle the other two cases together using if statements. Instead of branching on the parent node though, we use a condition on the minimum node. If the minimum node’s next object is not None, we set the parent node’s next pointer to the minimum node’s original next object. Then, we set the minimum node’s next pointer to None to remove the node out of the linked list.

Otherwise, the minimum node’s next object is None, and we are removing the last node. We can simply remove the last node by setting the parent node’s next object to None. Once again, we set the minimum node’s next object to None. We can actually make this code cleaner/less repetitive by setting the min.next = None line outside of the if/else conditional. I’m not sure that makes it more clear, but it does make the file smaller.

   # find minimum and parent
   # point parent.next to min.next if it exists
   # point min.next to None to officially remove
   def remove_min(self):
       parent, min = self.find_min()
       if parent is None:
           self = min.next
           min.next = None
           return self, min
 
       if min.next is not None:
           parent.next = min.next
           min.next = None
       else:
           parent.next = None
           min.next = None
       return self, min
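
As a sketch of the less repetitive version mentioned above: both branches of the if/else end up pointing parent.next at whatever min.next was (which is None when the minimum is the last node), so they can collapse into one assignment. The standalone functions and the next argument on the constructor below are assumptions made to keep the example short and runnable by itself; they are not the class methods from this post.

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def find_min(root):
    # returns (parent_of_min, min_node); parent is None when the min is the root
    parent, smallest, node = None, root, root
    while node.next is not None:
        if node.next.value < smallest.value:
            parent, smallest = node, node.next
        node = node.next
    return parent, smallest


def remove_min(root):
    """Detach the minimum node and return (remaining_root, min_node)."""
    parent, smallest = find_min(root)
    if parent is None:
        remaining = smallest.next    # None if the list only had one node
    else:
        parent.next = smallest.next  # None when the minimum was the last node
        remaining = root
    smallest.next = None             # one place to unlink the removed node
    return remaining, smallest


root = Node(8, Node(4, Node(2)))
rest, smallest = remove_min(root)
print(smallest.value, rest.value, rest.next.value)
```

Whether this reads better than the explicit three-case version is a matter of taste; the behavior is intended to be the same.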

Full Code for a Custom Singly Linked List Class

We’ve covered the logic for each part of the Node class for a singly linked list separately. Here is the full code we use for this implementation.

class Node(object):
   def __init__(self, value):
       self.value = value
       self.next = None
  
   def append(self, value):
       while self.next is not None:
           self = self.next
       self.next = Node(value)
 
   def print_nodes(self):
       while self.next is not None:
           print(self.value, end=" ")
           self = self.next
       print(self.value)
  
   # keep track of min and parent
   # while the current node has a next node
   # # if the next value is less than the min, update min and parent
   def find_min(self):
       if self.next is None:
           return None, self
       min = self
       parent = None
       while self.next is not None:
           if self.next.value < min.value:
               parent = self
               min = self.next
           self = self.next
       return parent, min
  
   # find minimum and parent
   # point parent.next to min.next if it exists
   # point min.next to None to officially remove
   def remove_min(self):
       parent, min = self.find_min()
       if parent is None:
           self = min.next
           min.next = None
           return self, min
 
       if min.next is not None:
           parent.next = min.next
           min.next = None
       else:
           parent.next = None
           min.next = None
       return self, min

Not In Place Sort Function for a Singly Linked List

Now that we have a Node class to create a singly linked list, let’s write our sorting algorithm. As mentioned above, this sorting algorithm is not an in-place sorting algorithm. We actually build an entirely new linked list to sort our original one. I made this sorting algorithm on the fly as an intuitive way to sort a singly linked list. 

This algorithm passes through the linked list once for each element. Each pass through of the linked list consists of removing the current minimum element and appending that element to a new, sorted linked list. In the next two subsections we see a visualization and the code with an explanation of how this sorting algorithm works.

Visualization of How to Sort a Singly Linked List

Let’s visualize how this out of place sorting algorithm works on a simple three element linked list. Assuming we start with a totally out of order list of 8, 4, 2, our algorithm will take three passes. Round one finds the minimum of 2, removes it from the original list, and starts a new singly linked list from it. The second pass looks at the new list of just 8, 4, finds the minimum of 4, and appends it to the new list we made in round one. Finally, we remove the last element, 8, and append that to the new list. Now we are left with a sorted singly linked list: 2, 4, 8.
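
The passes described above can be traced with a rough sketch that uses plain Python lists as a stand-in for the two linked lists. The function name selection_passes is made up for this illustration; the real implementation works on Node objects, as shown in the next section.

```python
def selection_passes(values):
    """Return a (remaining, sorted_so_far) snapshot after each pass."""
    remaining = list(values)  # copy so the caller's list is untouched
    sorted_so_far = []
    snapshots = []
    while remaining:
        smallest = min(remaining)
        remaining.remove(smallest)      # removes the first occurrence
        sorted_so_far.append(smallest)
        snapshots.append((list(remaining), list(sorted_so_far)))
    return snapshots


for remaining, done in selection_passes([8, 4, 2]):
    print(remaining, "->", done)
# [8, 4] -> [2]
# [8] -> [2, 4]
# [] -> [2, 4, 8]
```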

Code for How to Sort a Singly Linked List in Python

I debated with myself over whether or not this sorting function should be a standalone function that acts on the class, or whether it should be a function in the class. I decided that it should be a standalone function because we are creating a totally new data structure. This sort function takes one parameter, the starting root node of the original list. It returns the new root of the sorted linked list.

As I break down in the comments above the function, this function does two things. First, remove the minimum and put it into a new linked list. Second, repeat step one until the original linked list is empty. Start by removing the minimum from the passed in root, which returns the remaining list and the minimum node. Then, set the new root of the sorted singly linked list to the returned minimum. While the modified original linked list has a next node, repeatedly remove the minimum and append its value to the new list. Finally, append the last remaining value in the linked list and return the root of the newly created linked list.

# 1. remove minimum and put it into a new linked list
# 2. repeat 1 until original linked list no longer exists
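def sort_nodes(root: Node):
   # a single node is already sorted
   if root.next is None:
       return root
   new_list, min = root.remove_min()
   new_root = min
   while new_list.next is not None:
       new_list, min = new_list.remove_min()
       new_root.append(min.value)
   # one node is left in the original list; it is the largest
   new_root.append(new_list.value)
   return new_root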

Sorting an Example Linked List

Let’s test our function with three examples. The first example is the three element example shown in the visualization: 8, 4, 2. Next, we also append 1 and 5 and sort, and then 7 and sort. The resulting sorted lists should be: 2, 4, 8, then 1, 2, 4, 5, 8, and 1, 2, 4, 5, 7, 8. The example code to test the sort function is shown below.

root = Node(8)
root.append(4)
root.append(2)
root = sort_nodes(root)
root.print_nodes()
 
root.append(1)
root.append(5)
root = sort_nodes(root)
root.print_nodes()
 
root.append(7)
root = sort_nodes(root)
root.print_nodes()

When we run the code above, the output matches the expected sorted lists: 2 4 8 on the first line, 1 2 4 5 8 on the second, and 1 2 4 5 7 8 on the third.

Summary of Out of Place Sort on a Singly Linked List in Python

In this post, we created a custom linked list implementation and an out of place sorting algorithm. The sorting algorithm we created runs in O(n^2) time and requires O(n) space. It works by finding and removing the minimum element from the original linked list, and then appending it to a new, sorted linked list.

I run this site to help you and others like you find cool projects and practice software skills. If this is helpful for you and you enjoy your ad free site, please help fund this site by donating below! If you can’t donate right now, please think of us next time.

Categories
NLP The Text API

Twitter, NLP, and the 2022 World Cup

This article is published at 9:55am EST, 5 minutes before the start of the 2022 World Cup Final between Argentina and France.

This is the umpteenth installment in my exploration of whether or not Twitter sentiment is good at predicting anything. Last year, I used it on NFL games and Starbucks stock prices. The results? Twitter sentiment was better than most bettors for NFL games, coming in at about a 60% correct prediction by betting on the lower sentiment team. However, it was absolutely abysmal for stock prices. For this post, we’re not just going to look at sentiment, but also get a summary, extract the most common phrases, and the most common named entities through named entity recognition.

In the midst of the 2022 World Cup hype, I thought I’d revive this project and see how it predicts Argentina vs France. In this post we’ll take a look at:

  • Project Outline
  • What Are We Getting From Twitter?
  • Applying NLP Techniques
    • Asynchronous Calls to The Text API
    • Getting the Most Common Named Entities
    • Putting it Together
  • Predicted Outcome From Twitter and Personal Thoughts
  • Extras + Disclaimers
    • Create Word Clouds
  • Summary

Project Outline

Before we get started, let’s take a look at what the outline of this project looks like. All the .png files except cloud_shape.png are produced by the program. Pay no attention to the __pycache__. The important files to look at here are orchestrator, pull_tweets, and inside of the text_analysis folder: async_pool, ner_processing, and text_orchestrator. The text_config and twitter_config files are the files I used to store my API keys so they don’t get uploaded to GitHub.

What Are We Getting From Twitter?

I used the Twitter API to get these Tweets. There are some complaints about the limited use of this API, but it’s good enough unless you’re one of those people who needs to get every single Tweet on Twitter. In which case, you’re out of luck; this tutorial won’t help you do that. Anyway, we’re also going to need the requests and json libraries to send our HTTP request and parse the response. This is the pull_tweets file.

Once we’ve set everything up, we need to create a header to send to the API endpoint. For the Twitter API, that’s simply the bearer token. It also needs to be preceded by Bearer . Annoying, I know. I also added some params to our Twitter search. Specifically, we’re only getting English Tweets (lang:en) without links (-has:links) that are not retweets (-is:retweet). If the search term starts with @, we search for Tweets from that account instead. We’re also only going to grab the latest 50 Tweets. This is more about the amount of time that the connection can stay open rather than the total number of Tweets we care to analyze.

Once we have everything set up, we simply send the request to get the Tweets. After getting the Twitter data returned, we extract just the data portion to get the list of Tweets. I also join them all up with periods and a space to create one text paragraph. This is mainly for further processing down the line.

import requests
import json
 
from twitter_config import bearertoken
 
search_recent_endpoint = "https://api.twitter.com/2/tweets/search/recent"
headers = {
   "Authorization": f"Bearer {bearertoken}"
}
 
# automatically builds a search query from the requested term
# looks for english tweets with no links that are not retweets
# returns the tweets
def search(term: str):
   if term[0] == '@':
       params = {
           "query": f'from:{term[1:]} lang:en -has:links -is:retweet',
           'max_results': 50
       }
   else:
       params = {
           "query": f'{term} lang:en -has:links -is:retweet',
           'max_results': 50
       }
   response = requests.get(url=search_recent_endpoint, headers=headers, params=params)
   res = json.loads(response.text)
   tweets = res["data"]
   # each tweet is a dict; keep only its text
   text = ". ".join([tweet["text"] for tweet in tweets])
   return text
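
The extraction step at the end of search can be tried without calling the API. The sketch below assumes the response is shaped like a recent-search payload, with a "data" list of Tweet objects that each carry a "text" field. The sample payloads are made up for illustration, and join_tweets is a hypothetical helper name. As far as I know, a search with no matches comes back without a "data" key at all, so the sketch falls back to an empty list in that case.

```python
def join_tweets(payload: dict) -> str:
    """Join the text of every tweet in a search response into one string."""
    # "data" may be missing when nothing matched, so default to an empty list
    tweets = payload.get("data", [])
    return ". ".join(tweet["text"] for tweet in tweets)


# made-up payloads for illustration, not real API output
sample_payload = {
    "data": [
        {"id": "1", "text": "Vamos Argentina"},
        {"id": "2", "text": "Allez les Bleus"},
    ],
    "meta": {"result_count": 2},
}
empty_payload = {"meta": {"result_count": 0}}

print(join_tweets(sample_payload))
print(repr(join_tweets(empty_payload)))
```

Using .get with a default keeps a search with zero results from raising a KeyError before the text ever reaches the NLP step.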

Applying NLP Techniques

Now we’re going to get into the text_analysis folder. There are three main files to look at there. First, the async_pool file to call The Text API, a comprehensive NLP API for text, to get a summary of the combined Tweets, the most common phrases, the named entities, and the overall sentiment. Second, a file to process the named entities into the most common named entities. Third, an orchestrator to put the two together.

Asynchronous Calls to The Text API

This is the async_pool file. We’ll need the asyncio, aiohttp, and json libraries to execute this file. We call four different API endpoints asynchronously using asyncio and aiohttp. The first thing we do is set the headers and API endpoints.

Next, we create a function to configure the request bodies that we send. We simply create a dictionary and assign a different request body as the value to each API endpoint key. Learn more about the optional values (proportion, labels, and num_phrases) in the documentation.

We need two more helper functions before we can pool the tasks and call the API asynchronously. One to gather the tasks behind a semaphore that limits how many run at once, and one to asynchronously call the API endpoints. Once we have these two, we simply create an aiohttp session and call the API endpoints through it. Learn more in this tutorial on how to call APIs asynchronously.

import asyncio
import aiohttp
import json
 
from .text_config import apikey
 
# configure request constants
headers = {
   "Content-Type": "application/json",
   "apikey": apikey
}
text_url = "https://app.thetextapi.com/text/"
summarize_url = text_url+"summarize"
ner_url = text_url+"ner"
mcp_url = text_url+"most_common_phrases"
polarity_url = text_url+"text_polarity"
 
# configure request bodies
# return a dict of url: body
def configure_bodies(text: str):
   _dict = {}
   _dict[summarize_url] = {
       "text": text,
       "proportion": 0.1
   }
   _dict[ner_url] = {
       "text": text,
       "labels": "ARTICLE"
   }
   _dict[mcp_url] = {
       "text": text,
       "num_phrases": 5
   }
   _dict[polarity_url] = {
       "text": text
   }
   return _dict
 
# configure async requests
# configure gathering of requests
async def gather_with_concurrency(n, *tasks):
   semaphore = asyncio.Semaphore(n)
   async def sem_task(task):
       async with semaphore:
           return await task
  
   return await asyncio.gather(*(sem_task(task) for task in tasks))
 
# create async post function
async def post_async(url, session, headers, body):
   async with session.post(url, headers=headers, json=body) as response:
       text = await response.text()
       return json.loads(text)
  
async def pool(text):
   conn = aiohttp.TCPConnector(limit=None, ttl_dns_cache=300)
   session = aiohttp.ClientSession(connector=conn)
   urls_bodies = configure_bodies(text)
   conc_req = 4
   summary, ner, mcp, polarity = await gather_with_concurrency(conc_req, *[post_async(url, session, headers, body) for url, body in urls_bodies.items()])
   await session.close()
   return summary["summary"], ner["ner"], mcp["most common phrases"], polarity["text polarity"]
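
To get a feel for what the semaphore does without needing an API key, here is a small sketch that swaps the HTTP calls for a stand-in coroutine. gather_with_concurrency is copied from the file above; fake_call, demo, and the tracker dictionary are hypothetical names used only for this demonstration.

```python
import asyncio


async def gather_with_concurrency(n, *tasks):
    semaphore = asyncio.Semaphore(n)

    async def sem_task(task):
        async with semaphore:
            return await task

    return await asyncio.gather(*(sem_task(task) for task in tasks))


async def fake_call(name, tracker):
    # stand-in for an HTTP request; tracker records how many run at once
    tracker["active"] += 1
    tracker["peak"] = max(tracker["peak"], tracker["active"])
    await asyncio.sleep(0.01)
    tracker["active"] -= 1
    return name.upper()


async def demo(limit):
    tracker = {"active": 0, "peak": 0}
    names = ["summary", "ner", "mcp", "polarity"]
    results = await gather_with_concurrency(
        limit, *[fake_call(name, tracker) for name in names]
    )
    return results, tracker["peak"]


results, peak = asyncio.run(demo(2))
print(results, peak)
```

With a limit of 2, at most two of the four stand-in calls are in flight at any moment, and asyncio.gather still returns the results in the order the tasks were passed in. That ordering is what lets pool unpack the four responses by position.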

Getting the Most Common Named Entities

This is the ner_processing file. The ner_processing file takes the named entities returned from the last file, async_pool, and processes them. We take the whole list, which is actually a list of lists, and loop through it twice. For each of the named entities in the list of lists, there are two entries. First, a named entity type, which could be a person, place, thing, organization, and so on. Second, the named entity itself. Learn more in this post about named entity recognition and its types.

We build a helper function to find the most common named entities by creating a nested dictionary. The key in the first layer is the named entity type. The inner dictionary contains key-value pairs of the named entity and how often it appears. Then we create a function to sort each of the inner dictionaries using lambda functions and return the top values to get the most common named entities.

# build dictionary of NERs
# extract most common NERs
# expects list of lists
def build_dict(ners: list):
   outer_dict = {}
   for ner in ners:
       entity_type = ner[0]
       entity_name = ner[1]
       if entity_type in outer_dict:
           if entity_name in outer_dict[entity_type]:
               outer_dict[entity_type][entity_name] += 1
           else:
               outer_dict[entity_type][entity_name] = 1
       else:
           outer_dict[entity_type] = {
               entity_name: 1
           }
   return outer_dict
 
# return most common entities after building the NERS out
def most_common(ners: list):
   _dict = build_dict(ners)
   mosts = {}
   for ner_type in _dict:
       sorted_types = sorted(_dict[ner_type], key=lambda x: _dict[ner_type][x], reverse=True)
       mosts[ner_type] = sorted_types[0]
   return mosts
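
The same counting idea can also be sketched with collections.Counter from the standard library. This is an alternative take, not the code used in this project, and the sample entities and the function name most_common_entities are made up for illustration.

```python
from collections import Counter, defaultdict


def most_common_entities(ners):
    """Map each entity type to its most frequent entity name."""
    counts = defaultdict(Counter)
    for entity_type, entity_name in ners:
        counts[entity_type][entity_name] += 1
    # most_common(1) returns a list holding one (name, count) pair
    return {
        entity_type: counter.most_common(1)[0][0]
        for entity_type, counter in counts.items()
    }


# made-up input in the same list-of-lists shape: [type, name]
sample = [
    ["PERSON", "Messi"],
    ["PERSON", "Mbappe"],
    ["PERSON", "Messi"],
    ["GPE", "Qatar"],
]
print(most_common_entities(sample))
# {'PERSON': 'Messi', 'GPE': 'Qatar'}
```

The nested dictionary in build_dict does the same bookkeeping by hand, which is easier to follow if you have not used Counter before.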

Putting it Together

This is the text_orchestrator file. The orchestrator for the text analysis simply strings the functionality we created above together. First, we run the asynchronous API calls to get our summary, named entities, most common phrases, and the overall polarity. Then, we process our named entities and replace the newlines in the summary for a pretty printout and return all the values.

# input: text body
import asyncio
 
from .async_pool import pool
from .ner_processing import most_common
 
def orchestrate_text_analysis(text:str):
   """Step 1"""
   # task to execute all requests
   summary, ner, mcp, polarity = asyncio.get_event_loop().run_until_complete(pool(text))
  
   """Step 2"""
   # do NER analysis
   most_common_ners = most_common(ner)
   summary = summary.replace("\n", "")
   return summary, most_common_ners, mcp, polarity
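
Newer Python versions discourage calling asyncio.get_event_loop() when no loop is running, so asyncio.run (available since Python 3.7) is one possible replacement for the run_until_complete line. The sketch below uses a hypothetical fake_pool coroutine in place of pool so that it runs without the API.

```python
import asyncio


async def fake_pool(text):
    # hypothetical stand-in for pool(text); returns two values instead of four
    await asyncio.sleep(0)
    return text.upper(), len(text)


def run_analysis(text):
    # asyncio.run creates a fresh event loop, runs the coroutine, then closes the loop
    return asyncio.run(fake_pool(text))


print(run_analysis("goal"))
```

One caveat: asyncio.run cannot be called from code that is already running inside an event loop, so this fits a script like the one in this post better than, say, a notebook cell.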

Predicted Outcome From Twitter and Personal Thoughts

We run the root level orchestrator file to put it all together and see what Twitter thinks. This file calls the functions we made earlier. We create a function to put the word tasks together and an orchestration function to string the word tasks in after calling the search function to pull from Twitter. I’ve also added timing to see how long things take. This is more for curiosity and benchmarking than anything else.

Finally, to orchestrate, we simply call the orchestrate function. In best practice, you would not do this in the same file as the function definitions. But this is simply an example tutorial so we will.

#imports
from pull_tweets import search
from text_analysis.text_orchestrator import orchestrate_text_analysis
from word_cloud import word_cloud
import time
 
def word_tasks(text: str, term: str):
   summary, most_common_ners, mcp, polarity = orchestrate_text_analysis(text)
   word_cloud(text, term)
   return summary, most_common_ners, mcp, polarity
 
# create function
def orchestrate(term: str):
   # pull tweets
   starttime = time.time()
   text = search(term)
   # call thread task for word stuff
   summary, most_common_ners, mcp, polarity = word_tasks(text, term)
   # thread tasks to create summary, ner, mcp, polarity, and word cloud tweets
   print(summary)
   print(most_common_ners)
   print(mcp)
   print(polarity)
   print(time.time()-starttime)
 
orchestrate("#argentina")
orchestrate("#france")
orchestrate("#worldcup")

Here are the results we get:

The important thing to note here is this: France has a higher sentiment than Argentina. This is the main thing that I wanted to explore. Can we use Twitter sentiment to predict sports matches? If we follow the NFL logic – 60% of the time the lower sentiment team won – we can expect that Argentina will likely win. 
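
The "bet on the lower sentiment team" rule can be written down as a tiny function. This is only a sketch of the rule: the function name is hypothetical and the polarity numbers are invented placeholders, not the values returned for this match.

```python
def pick_by_lower_sentiment(polarities: dict) -> str:
    """Return the team whose tweets scored the lowest polarity."""
    return min(polarities, key=polarities.get)


# invented placeholder scores for illustration only
example = {"Team A": 0.12, "Team B": 0.31}
print(pick_by_lower_sentiment(example))
```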

My personal prediction is also that Argentina will win – go Messi!

Extras + Disclaimers

I should put some disclaimers here – this is in no way a perfect method. This is simply a sample project that I thought would be fun to create to explore Twitter, NLP, and the World Cup! These Tweets were pulled about an hour ahead of time at 8:56am EST. 

Create Word Clouds

We also called a word cloud function in our orchestrator that I did not address earlier. The code below shows that function. Learn more in this tutorial on how to create a word cloud in Python.

from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
 
# wordcloud function
def word_cloud(text, filename):
   stopwords = set(STOPWORDS)
   frame_mask=np.array(Image.open("cloud_shape.png"))
   wordcloud = WordCloud(max_words=50, mask=frame_mask, stopwords=stopwords, background_color="white").generate(text)
   plt.imshow(wordcloud, interpolation='bilinear')
   plt.axis("off")
   plt.savefig(f'{filename}.png')

Here are the images of the #Argentina and #France word clouds:

Summary

In this tutorial project we pulled tweets from Twitter via the Twitter API, asynchronously called four API endpoints for NLP tasks, created a word cloud, and orchestrated all of it. Let’s go Argentina!

Categories
level 1 python

Converting Audio File Type in Python

“This audio format is not supported”

Have you ever gotten this error message? That’s what inspired this article. I had a file that I needed to be an mp3 file that was saved as an m4a. I needed the file type to be mp3 to work with it. There were two solutions available to me. I could find a site online, or I could do it myself.

I went with the latter. In this article, we’re going to cover how you can convert different audio types in Python.

What is PyDub AudioSegment?

PyDub is one of a few good Python audio manipulation libraries. The AudioSegment class from PyDub is the most useful part of the library. It provides an all around powerful interface for manipulating your audio data. You can use AudioSegment to clip audio data, to play with the volume, change frame rates, and much more. 

Most relevant to us at the moment, you can use PyDub AudioSegment to convert audio file types. Before we dive into the code, make sure that you have the PyDub library installed via pip install pydub. If you are using Anaconda, you should be able to install it with conda install -c conda-forge pydub. PyDub also relies on ffmpeg (or libav) to read and write formats other than wav, so make sure that is installed too.

Convert Audio File Types with Python

The code to convert audio file types with Python is incredibly easy to implement with PyDub. We start off our code by importing the AudioSegment class from PyDub. Then, we write a simple function that converts an audio file from one format to another.

This function needs three parameters. It needs to know the name of the file, the original format of the audio file, and the desired format that we want to convert it to. I’ve also added a short documentation blurb in the code below to describe the parameters. There’s no return value here, we’re not going to return the audio file, we’re just going to save it as the desired file type.

The actual audio file type conversion only takes two lines of Python. Isn’t that great? The first line creates an AudioSegment object using from_file, which takes two parameters. We need to pass the name of the file (including the file type), and the format that the file is in.

Now that we have a PyDub AudioSegment object, all we do is call the export function on it. The export function takes two parameters. We need to pass it the filename with the format that we want to convert to and the file format type as a string. That’s it. That’s all there is to creating a function that converts audio file types in Python.

from pydub import AudioSegment
 
def convert(filename: str, from_format: str, to_format: str):
   '''Converts audio file from one format to another and exports it
  
   Params:
       filename: name of original file
       from_format: format of og audio file
       to_format: desired format'''
   # join the name and extension with a dot, e.g. cows_crows.m4a
   raw_audio = AudioSegment.from_file(f"{filename}.{from_format}", format=from_format)
   raw_audio.export(f"{filename}.{to_format}", format=to_format)

Converting an M4A to an MP3 in Python

Before we wrap up, let’s take a look at what it looks like to use this function to convert an m4a file to an mp3 file. Just like the actual function to convert audio file type, calling the function is incredibly easy. For our example, we’ll use this audio file. This is an m4a type audio file that I recorded.

To convert this to an mp3 file, all we do is call the convert function and pass the three parameters we declared: the name, the original audio file type, and the target file type. Once we call this function, we should see a new file in our folder – cows_crows.mp3.

# to run:
convert("cows_crows", "m4a", "mp3")
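
If you would rather not get a traceback when the source file is missing, one option is to build the two paths with pathlib and check for the file first. This is a sketch under my own assumptions: build_paths and convert_if_present are hypothetical names, and pydub is imported inside the function so the path helper can be tried on a machine without pydub installed.

```python
from pathlib import Path


def build_paths(filename: str, from_format: str, to_format: str):
    """Return (source, target) paths such as cows_crows.m4a and cows_crows.mp3."""
    source = Path(f"{filename}.{from_format}")
    return source, source.with_suffix(f".{to_format}")


def convert_if_present(filename: str, from_format: str, to_format: str) -> bool:
    """Convert the file if it exists; return True when a conversion ran."""
    source, target = build_paths(filename, from_format, to_format)
    if not source.exists():
        return False
    # imported lazily so build_paths works even without pydub installed
    from pydub import AudioSegment
    audio = AudioSegment.from_file(str(source), format=from_format)
    audio.export(str(target), format=to_format)
    return True


source, target = build_paths("cows_crows", "m4a", "mp3")
print(source, target)
```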

Summary of Converting Audio File Type in Python

In this tutorial, we learned how to convert audio file types with Python. First we looked at the PyDub library and its AudioSegment class. Then we created a function that took three parameters – a file name, the original audio file type, and the target audio file type. Finally, we used that function to convert an audio file as an example.

Categories
level 1 python

Working with the JSON library in Python

JSON (JavaScript Object Notation) is a lightweight data interchange format. Its syntax is specified by the IETF (RFC 8259) and by Ecma International (ECMA-404), both of which publish technical standards. Because the format is standardized and Python ships a module for it, working with the JSON library in Python is straightforward.

JSON is a format that is used to send, receive and store data from systems in a network. It is “light-weight” in comparison to larger formats, such as HTML or XML. There are formatting techniques in Python that can be used to convert JSON to Python objects and vice versa. Python is built with a standard library of helpful modules and the JSON module is one of them. Once we import it using the following code, we have full use of its functionalities.

import json

JSON functions: json.dumps()

Syntax for json.dumps():

json.dumps(obj, *, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, default=None, sort_keys=False, **kw)

Some parameters used:

  • obj is the Python serializable object that you want to convert into a JSON format.
  • fp is not a parameter of json.dumps(). It is the file pointer that json.dump() (covered below) writes the JSON formatted data to.
  • skipkeys is default = False. If coded as True, then any dictionary keys that are not of a basic type (i.e. str, int, float, bool, None) will be skipped instead of raising a TypeError
  • ensure_ascii is default = True and the output will have all incoming non-ASCII characters escaped. Changing that to false, the characters will be the same on output as on input.
  • allow_nan is default = True. The JavaScript equivalents such as NaN, Infinity and -Infinity will be used. When marked as False, if we attempt to serialize out of range float values, we will get a ValueError
  • indent is used to make the JSON output more readable. It prints it in pretty-print format.
  • sort_keys is default = False. When it is marked as True, the output of dictionaries will be sorted by key
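To make the parameters above concrete, here is a small sketch that exercises several of them. The sample values (including the name "Zoë" and the tuple key) are invented for illustration.

```python
import json

queen = {"Name": "LaGanja Estranja", "fav_color": "green", "Drag Race Season": 6}

# indent pretty-prints the output, sort_keys orders the dictionary keys
pretty = json.dumps(queen, indent=2, sort_keys=True)
print(pretty)

# ensure_ascii=True (the default) escapes non-ASCII characters
escaped = json.dumps({"name": "Zoë"})
# ensure_ascii=False leaves them as they were on input
unescaped = json.dumps({"name": "Zoë"}, ensure_ascii=False)
print(escaped, unescaped)

# skipkeys=True silently drops keys that are not str, int, float, bool or None
skipped = json.dumps({"a": 1, (1, 2): "tuple key"}, skipkeys=True)
print(skipped)

# allow_nan=False turns out-of-range floats into a ValueError
try:
    json.dumps(float("nan"), allow_nan=False)
    nan_rejected = False
except ValueError:
    nan_rejected = True
print(nan_rejected)
```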

The json.dumps() method is used to convert a Python object to a JSON string. Let’s say we receive an HTTP request from an application to send over the details of a popular drag queen from RuPaul’s Drag Race. The data is stored in a database. We can retrieve the information and store it in a Python dictionary. Then we can convert the Python dictionary into a JSON formatted string to send as a response to the request. To do this, we can use the json.dumps() method.

import json
 
queen = {'Name': 'LaGanja Estranja', 'Drag Race Season': 6, 'fav_color': 'green', 'skills': ['death drops', 'performance', 'green goddess']}
 
new_json_object = json.dumps(queen)
 
print(type(new_json_object))
print(new_json_object)

Output:

<class 'str'>
{"Name": "LaGanja Estranja", "Drag Race Season": 6, "fav_color": "green", "skills": ["death drops", "performance", "green goddess"]}

The output is a converted Python object that is now a JSON string.

JSON functions: json.dump()

Syntax for json.dump():

json.dump(obj, fp, *, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, default=None, sort_keys=False, **kw)

The json.dump() method is used to serialize a Python object and write it to a file as JSON formatted data. The parameters are similar to those of the json.dumps() method above, with the addition of fp, the file to write to. Using the same example from above, we can convert the Python dictionary into JSON format and write it straight into a file for future use.

import json

queen = {'Name': 'LaGanja Estranja', 'Drag Race Season': 6, 'fav_color': 'green', 'skills': ['death drops', 'performance', 'green goddess']}

with open("queen.json", "w") as write_file:
  json.dump(queen, write_file)

After running the above code, we have created a JSON formatted data file called “queen.json”

Mapping during encoding and decoding

Mapping happens between JSON and Python entities while encoding JSON data. Encoding is also called serialization. Deserialization is the process of decoding the data. These are the transformations that turn our data into a series of bytes that are then stored and/or transmitted across a network, and back again.

To encode Python objects into their JSON equivalents, the Python json module uses a specific conversion. The json.dump() and json.dumps() methods both perform the conversions when encoding. The following is the conversion table from the official Python documentation. It displays the intuitive mapping between Python and JSON data types.

Python → JSON
dict → object
list, tuple → array
str → string
int, float, int- and float-derived Enums → number
True → true
False → false
None → null
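A quick way to see the mapping for yourself is to encode one value of each type. This is only an illustrative sketch; the sample values are arbitrary.

```python
import json

python_values = {
    "dict": {"k": "v"},
    "list": [1, 2],
    "tuple": (1, 2),
    "str": "text",
    "int": 7,
    "float": 1.5,
    "True": True,
    "False": False,
    "None": None,
}

# print the JSON text each Python value is encoded to
for name, value in python_values.items():
    print(f"{name:>5} -> {json.dumps(value)}")

# decoding is not a perfect mirror: a tuple comes back as a list
round_trip = json.loads(json.dumps((1, 2)))
print(type(round_trip))
```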

JSON functions: json.load()

Syntax for json.load():

json.load(fp, *, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw)

Some parameters used:

  • fp is a file pointer used to read JSON formatted data from a file. 
  • object_hook is the optional function that will be called with the result of any object literal that is decoded. It is a custom decoder that can convert JSON data types into Python types other than the primitive ones it is built to handle. 
  • object_pairs_hook is similar to the optional function above but in regard to object literals decoded with an ordered list of pairs.
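Here is a minimal sketch of the two hooks. The Queen class and the as_queen function are invented for this example, and io.StringIO stands in for a real file so the snippet runs on its own.

```python
import io
import json
from dataclasses import dataclass


@dataclass
class Queen:
    name: str
    skills: list


def as_queen(obj: dict):
    """Turn decoded JSON objects that look like a queen into Queen instances."""
    if "Name" in obj and "skills" in obj:
        return Queen(obj["Name"], obj["skills"])
    return obj  # leave every other object as a plain dict


text = '{"Name": "LaGanja Estranja", "skills": ["death drops", "performance"]}'

# object_hook receives each decoded JSON object as a dict
queen = json.load(io.StringIO(text), object_hook=as_queen)
print(queen)

# object_pairs_hook receives the same object as an ordered list of (key, value) pairs
pairs = json.load(io.StringIO('{"a": 1, "b": 2}'), object_pairs_hook=list)
print(pairs)
```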

JSON files are read using the json.load() method in Python. Another way to say this is that the load() method is used to deserialize a file to a Python object. Using the same example as above, we now have a separate JSON file named “queen.json” that contains the following data:

{"Name": "LaGanja Estranja", "Drag Race Season": 6, "fav_color": "green", "skills": ["death drops", "performance", "green goddess"]}

We can open and read the data file we created by using the json.load() function.

import json

with open("queen.json", "r") as read_file:
    queen_info = json.load(read_file)

for key, value in queen_info.items():
    print(key, ":", value)

Output:

Name : LaGanja Estranja
Drag Race Season : 6
fav_color : green
skills : ['death drops', 'performance', 'green goddess']

The output is the Python Dictionary converted from the JSON data file. 

JSON functions: json.loads()

The json.loads() method is used to convert a JSON string into a Python object. For a JSON object, like the one in our example, that means a Python dictionary. It will deserialize str, bytes, or bytearray instances containing JSON data. This method and the json.load() method both use the same mapping conversion table mentioned earlier.
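As a short sketch of the input types mentioned above, json.loads() accepts bytes and bytearray as well as str, and the result is only a dictionary when the JSON text is an object. The sample values are arbitrary.

```python
import json

from_str = json.loads('{"season": 6}')             # str input, JSON object -> dict
from_bytes = json.loads(b'{"season": 6}')          # bytes input works the same way
from_bytearray = json.loads(bytearray(b"[1, 2]"))  # JSON array -> list
from_literal = json.loads("true")                  # JSON true -> True

print(from_str, from_bytes, from_bytearray, from_literal)
```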

import json
 
queen = """{"Name": "LaGanja Estranja", "Drag Race Season": 6, "fav_color": "green", "skills": ["death drops", "performance", "green goddess"]}"""

print(f"the queen variable is of the type:{type(queen)}")
 
queenDictionary = json.loads(queen)
print(f"the queenDictionary variable is of the type:{type(queenDictionary)}")
print(queenDictionary)

Output:

the queen variable is of the type:<class 'str'>
the queenDictionary variable is of the type:<class 'dict'>
{'Name': 'LaGanja Estranja', 'Drag Race Season': 6, 'fav_color': 'green', 'skills': ['death drops', 'performance', 'green goddess']}

This returns a Python dictionary from the JSON string. We can also find specific data from the dictionary by accessing its keys directly.

queenDictionary = json.loads(queen)
 
print(queenDictionary["Name"])
print(queenDictionary["fav_color"])
print(queenDictionary["skills"])

Output:

LaGanja Estranja
green
['death drops', 'performance', 'green goddess']

Summary

This has been a tutorial on working with JSON data in Python. We covered a few of the functions and their implementations. If you enjoyed this article, please follow me, Z. Myricks, here on PythonAlgos for more guides in all things Python and Natural Language Processing. Connect with me on Twitter and LinkedIn also!

Further Reading

Categories
python starter projects

Can Python Guess Your Number?

Last year we created a High Low Guessing Game as part of our series on Super Simple Python projects. In that game, the computer picks a number and then you guess the number. With each guess, the computer will tell you if the number you guessed is higher or lower than the number it picked.

In this game, we’re going to do the opposite. You, the player, are going to think of a number between 1 and 100 and the computer is going to guess your number. With each guess our Python program makes, you will tell it if its guess is higher, lower, or just right.

We’ve split this program into two pieces. In this post, we’ll cover:

  • Setting the First “Guess”
  • Having Python Guess Your Number
  • Summary of How Python Can Guess Your Number

Setting the First Guess

The first thing we need to do is set up our first guess. To have the computer guess our number, we’ll need the random library. Specifically, we’ll need the randint function from the random library. This function generates a random integer between the low and high values passed to it, inclusive of both of those values. 

We start off our program by prompting the player (you) to think of a number between 1 and 100. Next, we have the computer guess a random integer between 1 and 100. We set l and h as the lower and upper (higher) bounds of our guessing range. This will come in handy shortly. As the first step in our game, we’ll ask the player if the first randomly generated number is higher, lower, or on point.

from random import randint
 
print("Think of a number between 1 and 100")
l = 1
h = 100
x = randint(l, h)
guessed = input(f"Is {x} higher, lower, or did we guess on point?(h, l, y) ")

Having Python Guess Your Number

At this point, we’ve gotten it so that the Python program can guess a number. Now, to have the computer guess your number, we just have to repeat this process with changing high/low bounds.

We’ll do this with a while loop. Our loop will run while our number has not been guessed by the computer. As long as the answer to the input isn’t y, we adjust our high/low bounds and guess again. If the number that Python guessed was lower, then we set the new low bound to the previous guess plus 1. Else if the number that Python guessed was higher, we set the new high bound to the previous guess minus 1.

Once we’ve reset the new bounds, we generate a new random integer and ask the user if that is their number.

while guessed != "y":
   if guessed == "l":
       l = x + 1
   elif guessed == "h":
       h = x - 1
   x = randint(l, h)
   guessed = input(f"How about {x}?(h, l, y) ")

The two screenshots below show what it would be like to play the game if we were thinking of the numbers 50 and 12, respectively. 

Computer Guesses the Number 50
Computer Guesses the Number 12

Summary of How Python Can Guess Your Number

In this post we learned how to create a simple Python program to guess a number you’re thinking of between 1 and 100. We used the randint function from the random library to generate random integers between a high and low bound. To kick off our guessing game, we present you, the player, with a number and ask you to tell the computer if it’s higher than, lower than, or exactly the number you have in your head.

Then, depending on your answer, the computer adjusts the high and low bounds. After adjusting the bounds, it then comes up with another guess and presents you with that. This goes on until the computer has guessed your number. 

BONUS – what’s the expected number of times that the computer has to guess before it guesses your number?
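If you want to explore the bonus question empirically, one option is to let the computer answer its own guesses and count them. This is only a sketch under that assumption; the function names guesses_needed and average_guesses are made up here, and the honest higher/lower answers are computed automatically instead of typed in.

```python
from random import randint


def guesses_needed(secret: int, low: int = 1, high: int = 100) -> int:
    """Play one game with honest answers and return how many guesses it took."""
    count = 0
    while True:
        guess = randint(low, high)
        count += 1
        if guess == secret:
            return count
        if guess < secret:
            low = guess + 1   # the guess was lower than the secret
        else:
            high = guess - 1  # the guess was higher than the secret


def average_guesses(trials: int = 10_000) -> float:
    """Average the number of guesses over many random secrets."""
    total = sum(guesses_needed(randint(1, 100)) for _ in range(trials))
    return total / trials


print(average_guesses())
```

Because the bounds always contain the secret, randint never receives an empty range here. In the interactive game, an inconsistent answer from the player could make the low bound exceed the high bound.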

Further Reading

Categories
level 1 python

Expense Tracker in Python – Level 1

Expense tracking is a common task used in every industry. In this post, we’re going to build a simple expense tracker in Python for exercise. This Python expense tracker will simply track your expenses in a CSV file. By the end of this tutorial, you’ll have a Python expense tracking program that shows you your expenses and allows you to add to the tracker.

In this post, we’ll cover:

  • Reading Expenses from a CSV File in Python
  • Writing to Your Python Expense Tracker
  • Seeing and Adding to Your Expenses
  • Summary of How to Build a Python Expense Tracker

Find this project on GitHub.

Reading Expenses from a CSV File in Python

We could start with a function to read from or write to the CSV file that we’re using to track our expenses. For this tutorial, I chose to start with reading expenses. First things first, we import the csv library. 

Our read_expenses function doesn’t need any parameters. We’re hard-coding in the name of our expense file. This means we need to make sure that we run this program in the same folder that we have the program code and the CSV file.

First, we try to open the expenses.csv file and create a CSV reader. Next, we declare an empty list that represents the expenses. We use the CSV reader to add all the rows of the expense tracker to the list of expenses. 

Now, we print out our expenses. Notice the comment there that denotes the way that the expenses are written. This column format has to be followed for both the expense reader and writer. We show the user that on a certain date, they spent the cost on the category.

If there is no expenses.csv file, we simply print out that the file doesn’t exist and move on.

import csv
 
def read_expenses():
    try:
        # newline="" is what the csv module recommends when opening files
        with open("expenses.csv", "r", newline="") as f:
            csv_reader = csv.reader(f, delimiter=",")
            expenses = []
            for row in csv_reader:
                expenses.append(row)
        # expenses come in the columns of date (0), category (1), price (2)
        for line in expenses:
            print(f"On {line[0]}, {line[2]} was spent on {line[1]}")
    except FileNotFoundError:
        print("No Expense Tracker File Exists Yet")

Writing to Your Python Expense Tracker

Now that we’ve created a function to read from our expense tracker, let’s create a function to write to it. Our write_expenses function executes a while loop while we are reporting expenses. To do this, we start with a reporting variable that we set to True and then open up the expenses file in append mode and create a CSV writer with it.

While we are reporting, we ask the user to input the date, category, and cost. We write this data that the user input into the expense tracker. Once we write the data in, we ask the user if they are done reporting. If they are, then we set our reporting variable to False to end the while loop. Finally, we close the file.

def write_expenses():
   reporting = True
   # newline="" avoids blank lines between rows on some platforms
   f = open("expenses.csv", "a", newline="")
   expense_writer = csv.writer(f, delimiter=",")
   while reporting:
       date = input("What date was the expense incurred? ")
       category = input("What category is the expense for? ")
       cost = input("How much money did you spend? ")
       expense_writer.writerow([date, category, cost])
       end = input("If you are done inputting expenses, type \"end\" ")
       if end == "end":
           reporting = False
 
   f.close()
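If you would like to test the writing logic without typing into prompts, one option is to separate the file handling from the input calls. The sketch below is an assumption about how that could look, not part of the original program; append_expense and load_expenses are hypothetical names.

```python
import csv


def append_expense(path, date, category, cost):
    """Append one expense row in the date, category, cost column order."""
    # the with block closes the file for us, even if writing fails
    with open(path, "a", newline="") as f:
        csv.writer(f, delimiter=",").writerow([date, category, cost])


def load_expenses(path):
    """Return every row of the expense file as a list of lists of strings."""
    with open(path, "r", newline="") as f:
        return list(csv.reader(f, delimiter=","))
```

The interactive loop can then call append_expense with the values it collects from input.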

Seeing and Adding to Your Expenses

Now it’s time to put the reading and writing functions together. First we tell the user that we’re going to show them the current state of the expense report. Then, we call the read_expenses function we made earlier. 

After showing them the current expense tracker (or the fact that it doesn’t exist), we ask the user if they want to report expenses. If they do, then we call the write_expenses function we made.

print("Current state of expense report: ")
read_expenses()
report = input("Would you like to report expenses?(y/n) ")
if report == "y":
   write_expenses()

Starting from no expense tracker file, this is what an expense report would look like when we run the program.

After we put in the initial expenses, we can run the program again and see that all our expenses were saved and we can continue to report expenses if we’d like.

Summary of How to Build a Python Expense Tracker

In this post we learned how to build a simple CSV expense tracker in Python. We made two functions to encapsulate the reading and writing to a CSV functionality of our expense tracker. Then, we wrote a few lines of Python to run the program as a script. Our Python expense tracker shows you the date you made the expenditure, the category that you spent on, and the amount you spent.

Categories
level 1 python

Level 1 Python: Create an Audio Clipping Tool

Clipping audio files is one of the most basic functions of working with audio data. The `pydub` library makes this super easy. Just like the piece about cropping and resizing images, the only reason this program makes it into the Level 1 Python category is the use of an external library.

In this post we’re going to cover how to use `pydub` to both clip audio, and save it to a file. See this post for a full guide to manipulating audio data in Python. It goes over how to resample, merge, and overlay audio data and more. Before we dive into the code, you’ll need to install `pydub` with your package manager. I use `pip install pydub`. 

In this post we’ll cover:

  • How to Clip an Audio File in Python
  • How to Save and Clip an Audio File
  • Summary of Clipping and Saving Audio Files with Python

How to Clip an Audio File in Python

The first thing that we do is import the `AudioSegment` object from `pydub`. This is going to do most of our work for us. In our `clip_audio` function, we take three parameters. The sound itself, and the start and end of the clip that we want. Start and end have to be specified in milliseconds.

In the function, we simply take advantage of `AudioSegment` objects supporting slicing like lists, with the slice indices given in milliseconds. We store the snippet from the passed in start to end milliseconds in a variable and return that variable. Technically, we could skip storing it and just return the slice of audio.

from pydub import AudioSegment
 
def clip_audio(sound: AudioSegment, start, end):
   extracted = sound[start:end]
   return extracted

How to Save and Clip an Audio File

This code goes in the same file as the code above. This function doesn’t just clip an audio file, but also saves it. The `clip_and_save_audio` function takes four parameters. The first three are the same as the `clip_audio` function, the fourth is a filename. 

We pass the first three parameters exactly as they’re passed into the `clip_audio` function we made above. This returns an audio clip to us that we then `export` to a filename and format. Here we export to WAV, but you can change the format to whatever you need. Just make sure that the filename you export to ends in the matching format extension.

def clip_and_save_audio(sound: AudioSegment, start, end, filename):
   extracted = clip_audio(sound, start, end)
   extracted.export(f"{filename}.wav", format="wav")

Summary of Clipping and Saving an Audio File in Python

Editing audio files doesn’t have to be hard. We can create simple Python tools that will help us do edits like clipping and saving in seconds. In this post we used `pydub` and its `AudioSegment` object to clip and save an audio file. See this post for a full guide to manipulating audio data in Python.

Categories
level 1 python

How to Create a Simple Memory Game in Python

Want to have a better memory? It’s been said that playing memory games helps. In this post, we build a simple memory game in Python. This game will test your memory by giving a series of strings for you to remember until you can’t do it anymore.

*Disclaimer: memory capacity improvement not guaranteed.

How to Create a Python Memory Game

We need three libraries to create a memory game in Python. We need the random library to generate random letters. The string library provides an easy way to pass a set of lowercase letters. The last library we need is the time library to enforce a time limit. These are all built-in Python libraries so you don’t have to install any external libraries.

We run the game in a play_game function that takes no parameters. We start by setting k, the number of letters displayed at once, to 3. Then we set loss, tracking whether the user has lost or not, to False and get the list of lowercase letters from the string library.

While the user hasn’t lost the game, we show a randomly generated string of length k for three seconds at a time. Then, we clear the screen by printing a ton of new lines. I’m using a terminal that’s small enough to be cleared with 100 lines. When the user inputs the correct string, we increment k by 1. As long as the user keeps winning, we keep playing. When the user misses a letter, we tell the user their score and end the game.
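The steps above can be sketched roughly as follows. This is a reconstruction from the description, not the original code: the helper make_sequence, the ask/show/pause parameters (added so the game can be run without a keyboard), the prompt text, and the way the score is counted (number of strings remembered correctly) are all assumptions.

```python
import random
import string
import time


def make_sequence(k, letters=string.ascii_lowercase):
    """Return a random string of k lowercase letters."""
    return "".join(random.choices(letters, k=k))


def play_game(ask=input, show=print, pause=time.sleep):
    k = 3          # number of letters displayed at once
    loss = False   # whether the user has lost yet
    while not loss:
        sequence = make_sequence(k)
        show(sequence)
        pause(3)            # keep the string on screen for three seconds
        show("\n" * 100)    # "clear" the screen by printing new lines
        answer = ask("What was the string? ")
        if answer == sequence:
            k += 1          # correct: make the next string one letter longer
        else:
            loss = True
    score = k - 3           # assumed scoring: strings remembered correctly
    show(f"You remembered {score} strings correctly")
    return score


# to run interactively:
# play_game()
```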

Categories
level 1 python

Image Resizing and Cropping Tool in Python

How do websites make those icons scale up and down with your screen size? By having multiple copies of the same image in different sizes. If you want to add your logo to your site in multiple different sizes, then you’ll need to learn how to do image resizing. Image resizing is always an annoying task. In this post, we’re going to learn not only how to resize images, but also how to crop them with Python.

In this post we’re going to cover:

  • What is PIL?
  • How to Crop an Image in Python with PIL
  • How to Resize an Image in Python with PIL
  • Using PIL to Save an Image
  • Testing Our Image Crop, Resize, and Save Functions
  • Summary of How to Resize, Crop, and Save Images in Python

What is PIL?

PIL stands for “Python Imaging Library”. It is an add-on library to Python for image processing. It was initially released in 1995 and discontinued in 2011. The current version of PIL that we use in this post was forked as “Pillow”. Pillow adds support for Python 3.

PIL, and subsequently Pillow, has a range of image manipulation tools. You can do per-pixel changes, masking and transparency, filtering, enhancement, adding text, and more. In this post, we will simply be using it to crop, resize, and save an image. The image we’re using is the word cloud background from this post on Word Clouds from Tweets.

Before we jump into the code, we have to install the library. We can do that with pip install pillow. If you are using Anaconda, you can use conda install pillow.

How to Crop an Image in Python with PIL

All of the code in this post belongs in one file. If you want to split it up, remember to import the PIL library each time. For our uses, we only need the Image object from PIL. The first thing that we’re going to do is open up an image and assign it to a variable. Next, we’ll print out the size just for our info.

Our crop_image function takes five parameters. The first parameter is the image itself; we require this to be an Image object. Next are the coordinates of the upper left and lower right corners of the rectangle we want to crop. It’s weird that Image takes this as a 4-tuple instead of two 2-tuples, but that’s the way the cookie crumbles.

The order of the integers that we need to pass are the leftmost value, the uppermost value, the rightmost value, and the bottommost value that we want to crop. In our function, we simply call the crop function of the Image object and pass a 4-tuple made from the integers passed in. We can show the image for clarity. At the end, we return the image so we can use it later.

from PIL import Image
 
im = Image.open("./cloud_shape.png")
 
width, height = im.size
print(width, height)
 
# left, upper combo gives the upper left corner coordinates
# right, lower combo gives the lower right corner coordinates
def crop_image(im: Image, left, upper, right, lower):
   im2 = im.crop((left, upper, right, lower))
   im2.show()
   return im2

How to Resize an Image with PIL in Python

Next, let’s take a look at resizing images with Python. This function only takes 3 parameters. The image itself is the first parameter. The other two parameters are the resulting width and height that we want to resize the image to.

Similar to cropping, all we do here is call the resize method. This method takes a tuple of the desired width and height of our image. Then we show the image for our info and return it for later use.

def resize_image(im:Image, width, height):
   im1 = im.resize((width, height))
   im1.show()
   return im1

Using PIL to Save an Image in Python

Finally, let’s make a third function to save images in Python. This function takes two parameters, one is the Image itself, and the other is the name of the file we want to save the image to. Similarly to cropping and resizing, we use the Image object to save the image. All we do is pass the filename to the save method to save the image to that file.

def save_image(im:Image, filename):
   im.save(filename)

Testing Our Image Crop, Resize, and Save Functions

Now that we have our three functions, let’s test them out. Let’s crop the image to a 210 by 210 square. The upper left corner we choose is the coordinate (210, 210) and the bottom right corner coordinate is (420, 420). For testing the resize command, we’ll resize to half the height and half the width. Note that you can also resize bigger (I also tested double the width and height).

Finally, we can test the image save function by passing in the resulting images and a filename. For this example, I just called the cropped image cropped.png and the resized image resized.png. You are not limited to PNG images though. 

cropped = crop_image(im, 210, 210, 420, 420)
resized = resize_image(im, width//2, height//2)
 
save_image(cropped, "cropped.png")
save_image(resized, "resized.png")
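If you resize by a factor often, a tiny helper can compute the target size for you. This is a sketch; scaled_size is a hypothetical name and it does not touch Pillow at all, it only does the arithmetic.

```python
def scaled_size(width, height, factor):
    """Return (width, height) scaled by factor, never smaller than 1 pixel."""
    return max(1, round(width * factor)), max(1, round(height * factor))


# half size and double size of a 1240x656 image
print(scaled_size(1240, 656, 0.5))
print(scaled_size(1240, 656, 2))

# usage with the function above would look like:
# resized = resize_image(im, *scaled_size(width, height, 0.5))
```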

The images we got from the cloud image are shown below.

Cropped from (210, 210) to (420, 420).

Python Cropped Image with PIL

Resized from 1240x656 to 620x328.

Python Resize Image with Pillow

Summary of How to Crop, Resize, and Save Images in Python

In this post we took an introductory look at the PIL, now Pillow, library in Python. Even though there’s a name change, we still import PIL. The original PIL is no longer maintained and only supported through Python 2, which is obsolete.

After our brief introduction to Pillow, we looked at how to crop, resize, and save an image in that order. We learned that cropping an image requires four integer values that dictate the upper left and lower right corners. Meanwhile, resizing an image requires two integer values representing the new size in pixels. Finally, saving an image only requires one string parameter – the filename we’re saving to.

Categories
General Python level 1 python

A Complete Guide to Python String Manipulation

I am pretty sure I’ve had to look up how to work with strings over 100,000 times now. Not just in Python, but for programming in general. Strings are not treated the same in each programming language though. For example, you have to use pointers to mess with strings in C. However, Python provides a lot of versatility and functionality with strings. In this post we’re going to cover many of the things that I’ve had to look up often enough to be annoyed.

If you want to learn about strings in R, check out this article on Concatenating and Splitting Strings in R by the amazing Kristen Kehrer from Comet ML. Sign up for a Comet ML Account to improve your ML Model Monitoring.

We cover:

  • Convert a Python String to Bytes
    • “TypeError: ‘str’ does not support the buffer interface” Note
  • Bytes to String with Python Decode Function
  • Python ljust function for Left Justification
  • String Indexing in Python (CharAt Java Equivalent)
  • What is the First Negative Index in a Python String?
  • Python String Copy Details
    • Python Copy String into a Shallow or Deep Copy Overview
    • Diagram of How Python String Copy Works
  • IsUpper Python Function (Equivalent to Java isUpperCase)
  • IsLower Python Function (Equivalent to Java isLowerCase)
  • Python Lower vs Casefold Function for Comparing Strings
  • Check if a String is Alphanumeric with the isAlNum Python Function
  • Summary of Python String Manipulation

Convert a Python String to Bytes

A basic programming task is to switch data types. Strings and bytes are both pretty common data types. Usually, you’ll want to switch a Python string to bytes if you want to store it. Computers don’t understand what a “string” is, but they do understand bytes.

Converting a Python string to bytes has some interesting nuances. Python 3 offers two ways to do this, shown below. I’ve also included the time module so we can see how long the two different functions take. The first way to convert a Python 3 string to bytes is using the bytes data type converter, the second is to use the encode function built-in to strings.

import time
 
# Python string to bytes
mystring = "Solar Power"
start = time.time()
b1 = bytes(mystring, 'utf-8')
print(f"bytes function took {time.time()-start} seconds")
start = time.time()
b2 = mystring.encode('utf-8')
print(f"encode function took {time.time()-start} seconds")

Running this code multiple times shows us that the two functions take nearly identical execution times, both near 0. What happens under the hood though? The bytes data type converter function actually calls encode under the hood for strings. In the end, this level of abstraction doesn’t really add much execution time.

Python String to Bytes with Bytes and Encode Multiple Timing Tests
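
A single call is too fast for time.time() to measure reliably, so here is a minimal sketch that repeats each conversion many times with the standard library timeit module. The repeat count (100,000) and the variable names are my own choices for illustration, and the exact timings will differ from machine to machine.

```python
import timeit

mystring = "Solar Power"

# Run each conversion many times; one call alone is too quick to time.
runs = 100_000
t_bytes = timeit.timeit(lambda: bytes(mystring, "utf-8"), number=runs)
t_encode = timeit.timeit(lambda: mystring.encode("utf-8"), number=runs)

print(f"bytes():  {t_bytes / runs * 1e9:.1f} ns per call")
print(f"encode(): {t_encode / runs * 1e9:.1f} ns per call")

# Both routes produce equal bytes objects.
same_result = bytes(mystring, "utf-8") == mystring.encode("utf-8")
print(same_result)
```

Averaging over many runs smooths out the noise you get from timing one call, which makes the comparison between the two functions easier to trust.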

The main thing to think about when considering which way you want to convert your Python string into bytes is this: which is more Pythonic? The encode function has a counterpart, decode, that we’ll see in action below. The bytes function is more flexible, so it can be used on more than just strings (for example, on a list of integers).

Coming from a Java background, I like the encode function a bit more. The encode function feels more or less equivalent to the Java string .getBytes function.

“TypeError: ‘str’ does not support the buffer interface” Note

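This will come up if you’re switching between Python 2 and 3. In Python 2, strings were byte strings, so you could implicitly write strings as bytes. In Python 3, str and bytes are separate types, so you have to explicitly encode a string to bytes. Newer Python 3 versions word the same error as “a bytes-like object is required, not ‘str’”.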
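
Here is a small sketch of how this kind of error shows up and how encoding fixes it. I’m using io.BytesIO as an in-memory stand-in for a file opened in binary mode, which is my own choice for the example. The exact error text depends on your Python version.

```python
import io

buffer = io.BytesIO()  # in-memory stand-in for a file opened in binary mode

try:
    buffer.write("Solar Power")  # passing a str where bytes are expected
except TypeError as err:
    error_message = str(err)
    print(f"TypeError: {error_message}")

# Encoding the string first gives the binary buffer what it expects.
buffer.write("Solar Power".encode("utf-8"))
written = buffer.getvalue()
print(written)
```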

Bytes to String with Python Decode Function

As we talked about above, one of the reasons to use the encode function is its nice symmetry with the decode function. We turn bytes back into a string by calling decode on them with the same encoding we used to encode, utf-8 here. 

# Python bytes to string
mystring = "Solar Power"
b1 = bytes(mystring, 'utf-8')
b2 = mystring.encode('utf-8')
s1 = b1.decode("utf-8")
s2 = b2.decode("utf-8")
print(s1)
print(s2)

As we can see in the image below, decoding the bytes objects produced by both methods gives back the original string. This shows that the encode and bytes functions perform (basically) the same action.

Decoding Python String to Bytes and Back to String
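
The round trip only works when you decode with the same encoding you encoded with. The sketch below uses a made-up variant of our string with one non-ASCII character to show that characters and bytes are not always one to one, and what happens when the codecs don’t match.

```python
text = "Solar Powér"  # hypothetical variant with one non-ASCII character

encoded = text.encode("utf-8")
print(len(text))     # number of characters
print(len(encoded))  # number of bytes; "é" takes two bytes in UTF-8

# Decoding with the same codec restores the original string.
round_trip = encoded.decode("utf-8")
print(round_trip == text)

# ASCII cannot represent those two bytes, so we ask decode to
# substitute a replacement character instead of raising an error.
lossy = encoded.decode("ascii", errors="replace")
print(lossy)
```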

Python ljust Function for Left Justification

Most strings are left aligned. This is because we read things from left to right in English. The Python ljust function puts a little twist on things. ljust makes strings left justified: it returns a new string of a fixed width with the original text at the left edge and padding added on the right. A left justified string still reads as left aligned; the difference is the padding that fills out the rest of the width.

Let’s take a look at how Python’s ljust string function works. The ljust function takes up to 2 parameters: one required parameter, the length of the resulting left justified string, and one optional parameter, the padding character. If we don’t specify a padding character, Python ljust automatically uses a space. If the requested length is not longer than the string, ljust returns the string unchanged.

The code below is a continuation of the file above with the same strings. We show three different ways to call ljust. First without the optional parameter, and then twice with different characters passed as the filler.

# Python ljust example
print(s1.ljust(25))
print(s1.ljust(25, "!"))
print(s2.ljust(25, "#"))

The output should look like the image below. Note that the first one is space padded, so the padding is invisible when we print it out.

Python ljust Padding Example Results

However, if we change the code slightly so that it shows the representation of that string using repr we see the spaces are there.

# Python ljust example
print(repr(s1.ljust(25)))
print(s1.ljust(25, "!"))
print(s2.ljust(25, "#"))

See how the string is now in quotes in the image below with multiple spaces behind it?

Python ljust Padding Example Results with Space Representation
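
A couple of edge cases are worth seeing in code. This sketch shows what ljust does when the width is shorter than the string, how it compares to rjust, and one common use: lining up columns. The rows data is made up for illustration.

```python
name = "Solar Power"

# Width smaller than the string: ljust returns it unchanged, never truncated.
unchanged = name.ljust(5)

# ljust pads on the right; rjust pads on the left instead.
left = name.ljust(15, ".")
right = name.rjust(15, ".")

print(repr(unchanged))
print(left)
print(right)

# A common use: lining up a column of labels with uneven lengths.
rows = [("Solar", 42), ("Wind", 7), ("Hydro", 19)]
lines = [f"{label.ljust(8)}{value}" for label, value in rows]
print("\n".join(lines))
```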

String Indexing in Python (CharAt Java Equivalent)

As I said above, I come from a Java background. String indexing in Python is so easy. You can access characters in a string the same way you access entries in a list. In Java, you use the charAt method to get the character at a specific index. In Python, you simply use brackets.

Let’s take a look at some examples of string indexing in Python on a left justified string. We call ljust on the string we’ve been using all along, Solar Power, and set that result to a new string. Then we use brackets to find the characters at each index. The example indices we’ll use are 0, 10, and 24. 

# Python String Indexing
s3 = s1.ljust(25, "$")
print(s3[0])
print(s3[10])
print(s3[24])

As we can see, we got the characters in the 1st, 11th, and 25th positions as expected. If you are new to programming – remember that Python/Java/C/etc are all 0 indexed. That means index 0 is the location of the first character.

Python String Indexing (CharAt) Example Output
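
Two things tend to trip people up with bracket indexing: going past the end of the string, and trying to assign to an index. Here is a short sketch of both, along with the usual workaround of building a new string.

```python
s = "Solar Power"

print(s[0])           # first character
print(s[len(s) - 1])  # last character

# Indexing past the end raises IndexError.
try:
    s[len(s)]
except IndexError as err:
    index_error = str(err)
    print(f"IndexError: {index_error}")

# Strings are immutable, so assigning to an index is not allowed.
try:
    s[0] = "P"
except TypeError as err:
    assign_error = str(err)
    print(f"TypeError: {assign_error}")

# Build a new string instead.
changed = "P" + s[1:]
print(changed)
```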

What is the First Negative Index in a Python String?

Transitioning to Python and seeing negative indices was so weird. I was like “what am I looking at?” However, negative indices in Python strings are not hard. So, what is the first negative index in a Python string? It’s -1, and it points at the last character! For a deeper dive, check out String Slicing in Python.

Using the same string as above, Solar Power left justified to 25 characters with $ padding, we can use negative indices to get the same values as we had before. Earlier, we accessed the 1st, 11th, and 25th characters in the string.

This time we’re going to access the same positions with negative indices. The rule for reaching index x in a string of length m with a negative index n is that x and the absolute value of n have to sum to m, so n = x - m. For example, index 0 and index -25 are the same, just like index 10 and -15 and 24 and -1. (Code from the last section is reproduced here for clarity.)

# Python ljust + negative indices
s3 = s1.ljust(25, "$")
print(s3[0])
print(s3[10])
print(s3[24])
 
print(s3[-25])
print(s3[-15])
print(s3[-1])

The image below shows that the string indices that we’re accessing are the same with the positive and negative index values.

Python negative to positive string indexing
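
The rule above can be sketched as a tiny helper. The function name to_negative_index is hypothetical, something I made up for this example rather than anything built into Python.

```python
def to_negative_index(x, length):
    """Return the negative index that addresses the same position as x."""
    if not 0 <= x < length:
        raise IndexError("index out of range")
    return x - length

s = "Solar Power".ljust(25, "$")
pairs = [(x, to_negative_index(x, len(s))) for x in (0, 10, 24)]
print(pairs)

# Every positive/negative pair addresses the same character.
all_match = all(s[pos] == s[neg] for pos, neg in pairs)
print(all_match)
```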

Python String Copy Details

Most programming languages, including Python, have two types of copying. There is “shallow” copying and “deep” copying. It’s especially important to pay attention to the type of copying you use when it comes to mutable objects (e.g. lists). For immutable objects like strings the distinction matters much less, but it is still worth seeing how they behave.

Let’s cover a few basic Python behaviors before we get deeper into how each of these copies work. Python is a “pass by alias” language (you’ll also see this called “pass by object reference” or “pass by assignment”). Some languages use “pass by reference”, meaning that a function receives a reference to the caller’s variable and can rebind it. Some languages use “pass by value”, meaning that a function receives a copy of the value stored in the variable.

The main difference between the functional effects of passing by reference or value is how variables are used in functions. Python’s “pass by alias” works similarly to a mix of pass by reference and pass by value. Python passes around a reference to the object itself, which lives on the heap. A function can modify a mutable object it receives, but rebinding the parameter name doesn’t affect the caller’s variable.

It’s most important to distinguish the way variables are used/passed when working with functions or variables that are traditionally used with pointers (e.g. lists). If you come from a C background you may be used to strings being pointers. However, in Python, strings are immutable objects. This means that an operation that looks like it changes a string doesn’t change the object in memory; instead, it creates a new object entirely. The overall gist is that it doesn’t really matter if you use a shallow or deep copy for strings. 

Python Copy String into a Shallow or Deep Copy Overview

A shallow copy of an object is a new object that contains references to the same objects the original contains. A deep copy does not retain those references; it contains copies of the original’s contents. The third way to do a Python string copy is to straight up use the =, which doesn’t copy anything and just binds a second name to the same object. Let’s look at some examples below.

We use the copy library to bring in both the copy and deepcopy functions. In the code below, we create three copies. First with an = operator, then with copy and deepcopy. Next, we show where these copies and the original are located in memory. (Check the image below the code out for expected behavior)

Next, we augment each of these strings to demonstrate a couple of things. First, changing one string doesn’t change the others no matter which copy method you use. Note that this is not the behavior that happens with mutable objects (e.g. lists). Second, as we change the objects, their locations in memory change as well. This shows that we are not changing an object so much as changing which object in memory the variable references.

# python string copy example + memory alloc
import copy
s4 = s3
s5 = copy.copy(s3)
s6 = copy.deepcopy(s3)
locations = map(id, [s3, s4, s5, s6])
for loc in locations:
   print(f"Memory located at: {loc}")
s3 += "a"
s4 += "x"
s5 += "y"
s6 += "z"
print(f"String 3: {s3}")
print(f"String 4: {s4}")
print(f"String 5: {s5}")
print(f"String 6: {s6}")
locations = map(id, [s3, s4, s5, s6])
for loc in locations:
   print(f"Memory located at: {loc}")

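From the picture below we can see that the strings all start out referencing the same place in memory. The = assignment just binds another name to the same object, and because strings are immutable, copy and deepcopy return the original string instead of building a new one. CPython also interns some strings (not like the people you hire for the summer), for example literals that look like identifiers, so that equal strings can share one object. Sharing immutables like strings helps us save memory space.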

Where strings are stored and saved in memory
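
To see why the choice of copy method matters for mutable objects but not for strings, here is a minimal sketch that runs the same three approaches on a nested list. The list contents are made up for illustration.

```python
import copy

# Strings: copy and deepcopy hand back the very same object.
s = "Solar Power".ljust(25, "$")
same_object = (copy.copy(s) is s) and (copy.deepcopy(s) is s)
print(same_object)

# Lists: the three approaches behave differently.
original = [[1, 2], [3, 4]]
alias = original                # another name for the same list
shallow = copy.copy(original)   # new outer list, shared inner lists
deep = copy.deepcopy(original)  # new outer list, new inner lists

original.append([5, 6])   # visible through the alias only
original[0].append(99)    # visible through the alias and the shallow copy

print(alias)
print(shallow)
print(deep)
```

The alias sees every change, the shallow copy sees only changes made to the shared inner lists, and the deep copy sees none of them.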

Diagram of How Python String Copy Works

We can take a look at how the string copying works underneath the hood. When we first create the copies, we can see that they all point to the same address (on the heap). The diagrams below picture this as a string pool: for strings that Python interns, it checks whether an equal string is already in the pool and reuses it instead of allocating a new one. Not every string created at runtime is interned, but the effect shown here is the same, since all four names refer to one object.

How the Stack Heap and String Pool work in Python – immutability

However, once we change the value of the string, we automatically point at different memory addresses because strings are immutable. Python assigns a new memory address to each new string.

How the Stack Heap and String Pool work in Python – different values

IsUpper Python Function (Equivalent to Java isUpperCase)

If I haven’t already said it enough times, I come from a Java background. It may not be immediately obvious, but the Python isupper function is the closest equivalent to the Java isUpperCase function. Java’s version checks a single character, while Python’s checks a whole string. It returns a boolean value that reports whether or not every cased character in the string is upper case.

Let’s see the function in action below. Once again, we have to do some string slicing in Python to get some different strings. We could just pass it the strings that we have already seen but that would be boring, so let’s take a look at some different strings.

# Python isupper function
print(f"{s3} is upper case? {s3.isupper()}")
print(f"{s3[:4]} is upper case? {s3[:4].isupper()}")
print(f"{s3[:1]} is upper case? {s3[:1].isupper()}")
print(f"{s3[:-2]} is upper case? {s3[:-2].isupper()}")
print(f"{s3[6]} is upper case? {s3[6].isupper()}")

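The above code results in an output like the one below. Note that the Python isupper function returns True if and only if every cased character in the string is upper case and the string has at least one cased character. Characters without case, like $ or digits, are ignored.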

Are these strings uppercase?
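
The edge cases are easier to remember once you’ve seen them. This sketch runs isupper on a handful of sample strings I picked to cover uncased characters, strings with no letters at all, and the empty string.

```python
samples = ["SOLAR", "SOLAR POWER 2022!", "Solar", "$$$", "", "S"]

# Map each sample to the result of isupper.
results = {text: text.isupper() for text in samples}
for text, flag in results.items():
    print(f"{text!r:22} -> {flag}")
```

Spaces, digits, and punctuation don’t count against a string, but a string with no cased characters at all is never upper case.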

IsLower Python Function (Equivalent to Java isLowerCase)

Logically, the islower Python function works the exact same way as the isupper function, just the other way around. Instead of detecting if an entire string is upper case, it detects if an entire string is lowercase. We use some different slices here, but the concept is the same.

Once again, we take 5 substrings of one of the strings we created earlier. This time, we run the islower function on them.

# Python islower function
print(f"{s3} is lower case? {s3.islower()}")
print(f"{s3[:4]} is lower case? {s3[:4].islower()}")
print(f"{s3[1:4]} is lower case? {s3[1:4].islower()}")
print(f"{s3[:-16]} is lower case? {s3[:-16].islower()}")
print(f"{s3[8]} is lower case? {s3[8].islower()}")

The code above produces an output similar to the image below. Notice that in both cases for islower and isupper, a one character string is recognized. 

Are these strings lowercase? Python Example Output

Python Lower vs Casefold Function for Comparing Strings

Python has two functions that convert the characters in your string to lowercase. First we have the classic lower function, which turns all the cased characters into lowercase characters. Second we have casefold which does the same thing as lower but more “aggressively”.

The difference is in which conditions you want to use the function. If you are looking to just convert a string to lowercase, use lower. Both functions handle Unicode, not just the 128 ASCII characters, but lower only applies each character’s standard lowercase mapping.
Meanwhile, if we want to compare text drawn from the 144,697 characters in Unicode 14.0, it’s suggested to use the Python casefold function. casefold is meant to compare strings irrespective of case. Unlike lower, casefold is not about producing a displayable lowercase string. It removes case distinctions that lower leaves in place, so that strings that come in with different casing compare equal.

# Python casefold vs lower
s3 += "ẞ"
print(s3.casefold())
print(s3.lower())

The code above shows the difference between the Python casefold and lower functions. See how casefold folds the German capital Eszett (ẞ) into an “ss”. Meanwhile, the Python lower function only maps it to the lowercase Eszett (ß) and never expands it to “ss”. This is a prime example of the difference between Python casefold and lower.

Python Casefold vs Python Lower
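
Since casefold is meant for comparisons, here is a minimal sketch of a caseless comparison. The helper name same_ignoring_case is hypothetical, and the German word is just a convenient example where lower and casefold disagree.

```python
def same_ignoring_case(a, b):
    """Compare two strings without regard to case, using casefold."""
    return a.casefold() == b.casefold()

# lower leaves the ß in place, so the two spellings don't match.
lower_match = "Straße".lower() == "STRASSE".lower()
print(lower_match)

# casefold turns ß into "ss", so they do.
print(same_ignoring_case("Straße", "STRASSE"))
print(same_ignoring_case("Solar Power", "SOLAR POWER"))
```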

Check if a String is Alphanumeric with the isAlNum Python Function

The last few functions we looked at, isupper, islower, lower, and casefold, all revolve around letter casing. In this section, we’re going to look at both letters and numbers. The isalnum function checks if a string is made up entirely of alphanumeric characters.

In the code below, we check four different strings with isalnum.

# check for alphanumeric with isalnum
print(f"Is {s3} alphanumeric? {s3.isalnum()}")
print(f"Is {s1} alphanumeric? {s1.isalnum()}")
print(f"Is {s5} alphanumeric? {s5.isalnum()}")
print(f"Is {s3[:3]} alphanumeric? {s3[:3].isalnum()}")

Did you think that s1 (“Solar Power”) was going to be True for isalnum? When I first started, I totally did. However, the string has a space in it! That means that it evaluates to False. The first three characters, “Sol”, make up the only string we tested that evaluates to True for isalnum.
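
If spaces are fine for your use case, there are a couple of simple ways to work around them. This is a sketch of two options, not the only way to do it.

```python
phrase = "Solar Power"

# The space makes the whole string fail the check.
whole = phrase.isalnum()
print(whole)

# Option 1: strip the spaces out before checking.
without_spaces = phrase.replace(" ", "").isalnum()
print(without_spaces)

# Option 2: check word by word, treating spaces as separators.
words_ok = all(word.isalnum() for word in phrase.split())
print(words_ok)

# An empty string has no characters to check, so isalnum returns False.
empty = "".isalnum()
print(empty)
```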

Summary of Python String Manipulation

What a gauntlet. You probably won’t remember everything that you learned here in just one go round. Maybe bookmark the page for review so you don’t lose it :). In this post we covered a ton of different string manipulation techniques.

We started from converting a Python string to bytes. There’s a difference between strings and byte strings in Python 3 as opposed to in Python 2. It’s important to know this or you may run into type errors. However, I wouldn’t be surprised if you never touch Python 2 again. It’s 2022 and Python 2 isn’t even officially supported anymore.

Next, we looked at ljust. This Python string function left justifies your string by padding it with a custom character. If you don’t pass a custom character, you get spaces as a default. After ljust, we looked at how to do string indexing. Being someone with a Java background, I refer to this as the Python charAt equivalent. 

In Python, string indexing is as simple as using brackets. Beyond that, Python also allows for negative indices. You can slice your strings forwards and backwards. After our brief foray into Python string slicing, we looked at copying.

There are multiple ways to handle copying in Python. We looked at 3 ways to copy a string. First, a direct = assignment, then the copy and deepcopy implementations. For immutables like strings, we see almost no difference in behavior. A further piece on lists will cover more differences.

Finally, we moved on to letter casing with isupper, islower, lower, and casefold. I want to mention here that upper is also a function. It does exactly what you think it does: it sets all the letters to uppercase. The last thing we did was check to see if a string was fully alphanumeric using Python’s isalnum function.
