Keeping things sorted with bisect.insort()

Don’t Repeat Yourself. If you find yourself re-sorting large lists again and again with the list.sort() method or the built-in sorted() function, you’re probably doing it wrong. Unless those lists really are large, though, you’re probably not noticing the cost (and I’d argue that if you know you’ll never work with large lists, bisect.insort() might be overkill).

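bisect.insort() uses array bisection (a binary search) to find where a new element belongs in an already sorted list and inserts it there, so the list stays sorted. Finding the position takes O(log n) comparisons. The insert itself still shifts every later element over by one, which is O(n), but that shift runs in C and is much cheaper than re-sorting the whole list.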
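A minimal illustration of the two halves of that idea, using made-up example values: bisect.bisect() (an alias of bisect_right) only finds the insertion point, while bisect.insort() (an alias of insort_right) finds it and inserts in one call.

```python
import bisect

# An already-sorted list (example values only)
scores = [10, 20, 30, 40]

# Binary search for where 25 belongs: O(log n) comparisons, no insert yet
position = bisect.bisect(scores, 25)
print(position)  # 2

# Search and insert in one call; the list stays sorted
bisect.insort(scores, 25)
print(scores)  # [10, 20, 25, 30, 40]
```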

To compare speeds, I wrote a simple script that builds a sorted list from the same random integers one element at a time: once by appending each value and calling list.sort(), and once with bisect.insort().

#!/usr/bin/env python3

import logging as log
import utils.dummydata as dummy
from utils.timing import timed
import bisect

LIST_SIZE = 10**5 # 100,000

def setup():
    setup_logging()

def setup_logging():
    logging_format = "%(asctime)s: %(message)s"
    log.basicConfig(
        format=logging_format,
        level=log.DEBUG,
        datefmt="%H:%M:%S"
    )

@timed
def basic_sort(vals: list[int]) -> list[int]:

    new_list: list[int] = []

    for v in vals:
        new_list.append(v)
        new_list.sort()

    return new_list

@timed
def insort_sort(vals: list[int]) -> list[int]:

    new_list: list[int] = []

    for v in vals:
        bisect.insort(new_list, v)

    return new_list

def main():
    setup()

    random_nums = dummy.random_list(int, LIST_SIZE)

    log.info(f"Built a random list of {LIST_SIZE:,} ints")

    log.info("Building new sorted list using `list.sort()`...")
    basic_sort(random_nums)

    log.info("Building a new sorted list using `bisect.insort()`...")
    insort_sort(random_nums)
    

if __name__ == "__main__":
    main()

Output:

❯ ./main.py
23:55:23: Built a random list of 100,000 ints
23:55:23: Building new sorted list using list.sort()…
23:56:10: Completed basic_sort in 47.511071838 seconds
23:56:10: Building a new sorted list using bisect.insort()…
23:56:12: Completed insort_sort in 1.136234315000003 seconds

That’s roughly a 42x speedup (about 47.5 seconds down to 1.1 seconds) for 100,000 ints!
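The script relies on two of my own helper modules, utils.dummydata and utils.timing, which aren’t shown here. To reproduce the comparison without them, here is a self-contained sketch that swaps in the standard library: random.randint for the random data and time.perf_counter for the timing. Those stand-ins are assumptions about what the helpers do, not their actual code, and LIST_SIZE is cut to 10,000 so it finishes quickly. It also times a single sorted() call as a baseline: sorting once at the end is the cheapest option when you only need the final result, while insort pays off when the list has to stay sorted between insertions.

```python
import bisect
import random
import time
from typing import Callable

LIST_SIZE = 10**4  # reduced from the original 10**5 so the sketch runs quickly


def basic_sort(vals: list[int]) -> list[int]:
    new_list: list[int] = []
    for v in vals:
        new_list.append(v)
        new_list.sort()  # re-sorts the whole list after every append
    return new_list


def insort_sort(vals: list[int]) -> list[int]:
    new_list: list[int] = []
    for v in vals:
        bisect.insort(new_list, v)  # binary search for the slot, then insert
    return new_list


def sort_once(vals: list[int]) -> list[int]:
    return sorted(vals)  # baseline: one sort after all values are known


def time_it(
    func: Callable[[list[int]], list[int]], vals: list[int]
) -> tuple[list[int], float]:
    """Stand-in for the @timed decorator: returns (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(vals)
    return result, time.perf_counter() - start


def main() -> None:
    # Stand-in for utils.dummydata.random_list(int, LIST_SIZE)
    random_nums = [random.randint(0, 10**9) for _ in range(LIST_SIZE)]
    expected = sorted(random_nums)
    for func in (basic_sort, insort_sort, sort_once):
        result, seconds = time_it(func, random_nums)
        assert result == expected  # all three approaches must agree
        print(f"{func.__name__:<12} {seconds:.4f} s")


if __name__ == "__main__":
    main()
```

Exact timings will vary by machine and Python version, so none are listed here.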

Python Application Starter

This guide will walk through setting up a Python application with argument parsing, logging, and a virtual environment. This serves as the foundation for all of the Python applications I create.

The following assumes that you have Python 3.x installed and are comfortable working in the terminal.

Setting Up the Virtual Environment

Virtual environments help us decouple the applications we’re building from our local machines. In effect, they’re little Python “sandboxes” with their own executables and library installations. Getting used to virtual environments will save you the headache of trying to debug issues with shared libraries. It is also essential if you’re planning on collaborating with other developers on your project.

Steps

First, create the folder for your application and move into it:

mkdir basic_python_application
cd basic_python_application

Then, use venv to create the virtual environment:

python3 -m venv env
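For reference, the same environment can also be created from Python through the standard library’s venv.create(). This is a sketch, not part of the workflow above, and create_project_env is a hypothetical helper name. One difference worth knowing: venv.create() only installs pip when with_pip=True is passed, whereas python3 -m venv installs it by default.

```python
import os
import venv
from pathlib import Path


def create_project_env(project_dir: Path, with_pip: bool = True) -> Path:
    """Create <project_dir>/env, roughly like running `python3 -m venv env`.

    create_project_env is a hypothetical helper for illustration.
    """
    env_dir = project_dir / "env"
    venv.create(
        env_dir,
        with_pip=with_pip,         # the CLI installs pip by default; the API does not
        symlinks=os.name != "nt",  # the CLI symlinks the interpreter on POSIX
    )
    return env_dir
```

Calling create_project_env(Path("basic_python_application")) would produce the same env directory layout shown below.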

Running tree shows the structure of the newly created env directory:

.
└── env
    ├── bin
    │   ├── Activate.ps1
    │   ├── activate
    │   ├── activate.csh
    │   ├── activate.fish
    │   ├── easy_install
    │   ├── easy_install-3.9
    │   ├── pip
    │   ├── pip3
    │   ├── pip3.9
    │   ├── python -> python3
    │   ├── python3 -> /usr/local/bin/python3
    │   └── python3.9 -> python3
    ├── include
    ├── lib
    │   └── python3.9
    └── pyvenv.cfg

5 directories, 13 files

Pay attention to the bin folder above. It holds the executables that we’ll “plug into” when we activate our virtual environment.

You can now activate the virtual environment by running source ./env/bin/activate from the root of your project directory, and deactivate it by running deactivate. (The deactivate command is only available while the environment is active.)
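To confirm from inside Python which environment you’re in, a common check is to compare sys.prefix (which points at the environment’s directory when one is in use) with sys.base_prefix (the Python installation the environment was created from). The two differ only inside a virtual environment, whether you got there by activating it or by calling env/bin/python directly. The function name in_virtualenv is hypothetical.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside a venv-created virtual environment."""
    # Inside a venv, sys.prefix points at the env directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print(f"In a virtual environment: {in_virtualenv()}")
print(f"Interpreter: {sys.executable}")
```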

Creating the Skeleton Program

Although we will not be installing any third-party libraries in this setup, it is good practice to make sure your virtual environment is activated before working on your project.

Steps

There’s really only one step here: copy and paste the following Gist into a main.py file at the root of your project directory, then make it executable with chmod +x main.py so it can be run as ./main.py.

#!/usr/bin/env python3

import argparse
import logging as log

def setup():
    setup_logging()
    setup_argument_parsing()

def setup_logging():
    logging_format = "%(asctime)s: %(message)s"
    log.basicConfig(
        format=logging_format,
        level=log.DEBUG,
        datefmt="%H:%M:%S"
    )

def setup_argument_parsing():
    parser = argparse.ArgumentParser(
        description='A sample description of the application'
    )

    """
    Example of a positional style argument
    (positional arguments take their dest from the name, so no dest= here)
    parser.add_argument(
        'integers'
    )
    """

    """
    Example of a named 'non-positional' style argument
    parser.add_argument(
        '-o', '--output',
        dest='output_directory',
        required=False,
        default='output'
    )
    """

    """
    Example of a boolean 'store_true' style argument
    parser.add_argument(
        '--skipBack',
        dest='skip_copy_back',
        action='store_true',
        required=False,
        default=False
    )
    """

    # Pass the values you need from args into configure_globals()
    args = parser.parse_args()

    configure_globals()

def configure_globals():
    """
    Configuration settings should be passed in as arguments to this
    function and set in the following form

    global SAMPLE_ARGUMENT
    SAMPLE_ARGUMENT = sample_argument
    """
    pass

def main():
    setup()

if __name__ == "__main__":
    main()

What’s Going on Here?

~95% of all my Python applications have at least the following in common:

  1. They need arguments to know what data to operate on or to modify the behavior of the script
  2. They need to log information about program execution

This skeleton script is just a simple program that sets that up and quits. A solid foundation for an application.

Setting Up Git

Setting up Git in your project is essential for collaborating with other programmers and is foundational to any rollback strategy. It can feel like annoying overhead until you find yourself debugging something at 2am, wishing you could undo the change you just pushed out.

Steps

From the root of your project directory, run git init.

STOP: If you have any experience with git you might be tempted to run git add .; git commit -m "First commit". But NOT YET.

I’ve often made the mistake of committing a bunch of junk into my repo by running git add . and committing without checking what I was actually staging. Doing this isn’t the end of the world, but a little prevention goes a long way here. Before moving forward, we’re going to add a .gitignore file, which keeps our repository clean by telling Git which files it should leave out of version control.

We’re going to grab GitHub’s Python .gitignore template, which works for most Python projects. Copy its contents into a .gitignore file at the root of your project directory.

Hint: You could also download and save it from the command line by running the following command from the root of your project directory:

> curl https://raw.githubusercontent.com/github/gitignore/master/Python.gitignore > .gitignore

After adding the .gitignore file, run git status to double-check what you’re about to commit. You should see something like this:

❯ git status
On branch master

No commits yet

Untracked files:
  (use "git add …" to include in what will be committed)
    .gitignore
    main.py

nothing added to commit but untracked files present (use "git add" to track)

Notice that the env directory is not in the list. This is because it’s listed in the .gitignore file we just pulled down.

Now that we know what will be added we can run the following to stage the files and make our first commit:

❯ git add .
❯ git commit -m "My first commit"
[master (root-commit) 858554c] My first commit
 2 files changed, 213 insertions(+)
 create mode 100644 .gitignore
 create mode 100755 main.py

Adding Functionality

Let’s say we wanted to write a simple program that took a person’s name as a parameter as well as some flags to configure whether to reverse and/or capitalize the string in a welcome message. Here are a few examples of what it might look like:

> ./main.py john
> 23:10:50: 🤖 Hello There john
> ...
> ./main.py john -c
> 23:10:55: 🤖 Hello There JOHN
> ...
> ./main.py john -c -r
> 23:11:05: 🤖 Hello There NHOJ

Steps

Starting from the application skeleton, replace the body of the setup_argument_parsing function with the following:

parser = argparse.ArgumentParser(
    description='An application that displays a welcome string when given a name'
)

parser.add_argument(
    'name'
)

parser.add_argument(
    '-c', '--capitalize',
    dest='capitalize',
    action='store_true',
    default=False
)

parser.add_argument(
    '-r', '--reverse',
    dest='reverse',
    action='store_true',
    default=False
)

args = parser.parse_args()

configure_globals(args.name, args.capitalize, args.reverse)

First we set up the argument parser with a description of our application (visible when passing --help as an argument).
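A handy way to try the parser without the terminal is to pass a list of strings to parse_args() instead of letting it read sys.argv, and format_help() returns the same text that --help prints. This sketch assumes the parser is built by a standalone build_parser() function, a hypothetical refactor of the setup_argument_parsing body above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Same arguments as setup_argument_parsing above (hypothetical refactor)."""
    parser = argparse.ArgumentParser(
        description='An application that displays a welcome string when given a name'
    )
    parser.add_argument('name')
    parser.add_argument('-c', '--capitalize', dest='capitalize',
                        action='store_true', default=False)
    parser.add_argument('-r', '--reverse', dest='reverse',
                        action='store_true', default=False)
    return parser


parser = build_parser()
# An explicit list stands in for the command line: ./main.py john -c -r
args = parser.parse_args(['john', '-c', '-r'])
print(args.name, args.capitalize, args.reverse)  # john True True
print(parser.format_help())  # the text shown by ./main.py --help
```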

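Next we register the arguments themselves: a positional name, plus two optional flags, -c/--capitalize and -r/--reverse. Because they use action='store_true', each flag is False unless it is passed on the command line. Finally we pull the parsed arguments and pass them along to configure_globals, which looks like this: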

def configure_globals(name, capitalize, reverse):

    global NAME
    global CAPITALIZE
    global REVERSE

    NAME = name
    CAPITALIZE = capitalize
    REVERSE = reverse

All we’re doing here is assigning the arguments, originally passed in from the parser, to module-level global variables. Don’t forget to define these at the top of your script like so:

NAME = ""
REVERSE = False
CAPITALIZE = False

Now everything should be set up to access these globals wherever you need in your application.
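The steps above stop short of the code that actually builds the welcome message. One way to finish it, assuming a hypothetical build_greeting() helper whose message format is copied from the example output at the top of this section:

```python
# Module-level defaults, defined at the top of main.py as described above
NAME = ""
REVERSE = False
CAPITALIZE = False


def configure_globals(name, capitalize, reverse):
    global NAME, CAPITALIZE, REVERSE
    NAME = name
    CAPITALIZE = capitalize
    REVERSE = reverse


def build_greeting() -> str:
    """Hypothetical helper: build the welcome message from the globals."""
    name = NAME.upper() if CAPITALIZE else NAME
    if REVERSE:
        name = name[::-1]  # a slice with step -1 reverses the string
    return f"🤖 Hello There {name}"


# In the real script, setup() fills in the globals from the command line,
# and main() would then call log.info(build_greeting()).
configure_globals("john", True, True)
print(build_greeting())  # 🤖 Hello There NHOJ
```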

Note: Things should only be given the scope they absolutely need to accomplish their goal. I’ve convinced myself that it makes sense here as they are arguments to the program itself. If you have a better approach ping me in the comments.
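For comparison, one common alternative to globals: keep the parsed values in a small immutable object and pass it explicitly to whatever needs it, so nothing depends on module-level state. This is a sketch of that pattern rather than a correction; Config, from_args, and greeting_for are hypothetical names.

```python
import argparse
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """Immutable bundle of command-line settings (hypothetical name)."""
    name: str
    capitalize: bool = False
    reverse: bool = False

    @classmethod
    def from_args(cls, args: argparse.Namespace) -> "Config":
        return cls(args.name, args.capitalize, args.reverse)


def greeting_for(config: Config) -> str:
    """Same message as before, but the settings are passed in explicitly."""
    name = config.name.upper() if config.capitalize else config.name
    if config.reverse:
        name = name[::-1]
    return f"🤖 Hello There {name}"


# setup_argument_parsing could return Config.from_args(parser.parse_args())
# instead of calling configure_globals; here a Namespace is built by hand.
config = Config.from_args(
    argparse.Namespace(name="john", capitalize=True, reverse=False)
)
print(greeting_for(config))  # 🤖 Hello There JOHN
```

Because the dataclass is frozen, nothing downstream can accidentally change a setting after parsing.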

Final Thoughts

Things to Remember

Always make sure that you’re working within your virtual environment. You should make it a habit to double-check the prompt before running any pip install <Package> command.

This is what I see in my shell (zsh with p10k)

Things for the Future

Having this definitely speeds things up for me, but in true Software Engineer Style™ I’m sure I could spend another 10 hours making it slightly faster. Current thoughts include using a Python fzf library to build things automatically, so that you could have a flow like this:

❯ pycreateproject test_proj
Building python project. Select features to include:
☐ Logging
☐ Argument Parsing
☐ Config Files
❯ pycreateproject test_proj
Building python project. Select features to include:
☒ Logging
☒ Argument Parsing
☐ Config Files
Done! Project created at ~/Documents/favorites/python/blog/test_proj

Purpose

Reflection

The process of understanding something contains at least the following two steps:

  1. Gather new information
  2. Synthesize that information into some conclusion

For the past few months, it feels like my brain has been stuck in the first step without ever coming to some conclusion or idea. Just a bunch of amorphous thoughts floating through, and eventually out of, my head. So the hope is that forcing myself to reflect on and crystallize my thoughts into a blog post might help ideas stick a little more.

There’s also something nice about the fact that this blog exists, currently, in a spot where it’s private enough that I don’t feel too nervous about setting things down but public enough that I still feel some need to make things coherent.

Keeping a Record

3D printing, automating the digitization of a family photo library, dabbling in woodworking…these are all things that can be considered “technical” but would never be captured in the green squares of a contribution grid (yet). I often forget this and can be much too hard on myself for not getting enough done. So I hope that with this record of work done it’ll be a little easier for me to see all technical output over time (I’ll never get it 100% but at least I can try to even things out a bit).

It would also be really nice to have an organized repository of guides and how-tos that I could reference when I find myself doing the same thing for the nth time. Future me has enough issues to worry about, so why not take Configuring a Local Kafka Test Cluster off his plate by writing up a nice guide?

In addition to future me, I think the guides would also be useful to others. Now, I want to preface this with something important: I love talking about programming. In the pre-COVID world it was definitely one of those “Don’t get him started” topics for me. But I am human, which in this context means two things: I only have so much time in a day to help people, and I only have so much patience for helping people with the same problem. I think any of us would go a little insane if our job was to show people how to set up a Python virtual environment day in & day out… day in … day out… day in… You get the point. If I can write a really great guide on how to do something, I not only save myself some time but also expand the reach of my assistance. Anyone Googling a topic I’ve covered could theoretically benefit.

Improving My Technical Writing Skills

I feel like I owe a lot of my understanding in this field to good technical writers. Authors of books, blog posts, & READMEs that stuck out to me as being engaging and educational. I’ve also read a lot of technical writing that wasn’t so great. Walls of text that seemed to actively resist understanding. Some days I feel like I’m the one writing those articles; sometimes out of laziness and sometimes out of a lack of practice. So I’m here to train that skill and get comfortable with my written voice.