One a Day One-Liners with Python — Week 1
Welcome aboard!
Every day I add a new Python One Liner to this thread. Feel free to leave comments here or clone the repo on GitHub and make a pull request if you think you’ve got a better solution. Benchmarks are welcome too!
Update (2023.01.29): I’m now writing the One Liners posts in weekly installments with themes! Head over to my main page to find more…
Jan 7, 2023
Tokenize text 📚
import re
tokens = [(i, t) for i, t in enumerate(re.sub(r'[^\w\s]', '', text).split())]
It’s not the most sophisticated tokenizer, though in some situations it’s all you need. It generates a list of 2-tuples, each containing a term’s position in the token sequence and the term itself. Along the way, it also removes punctuation. It’s also pretty fast, processing a ~200k word document in about 0.03 seconds.
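As a quick illustration, here is a minimal sketch that wraps the same regular expression in a helper. The function name `tokenize` and the sample sentence are my own assumptions, and this version calls `split()` with no argument so that runs of spaces and newlines do not produce empty tokens.

```python
import re

def tokenize(text):
    """Strip punctuation, then pair each remaining term with its position."""
    stripped = re.sub(r'[^\w\s]', '', text)  # drop everything except word chars and whitespace
    return list(enumerate(stripped.split()))  # split() with no argument splits on any whitespace

# Hypothetical sample text, including a double space and an apostrophe.
sample = "Hello, world!  It's me."
tokens = tokenize(sample)
print(tokens)
```

Note that the apostrophe in "It's" is removed along with the other punctuation, so the term comes out as "Its".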
Jan 6, 2023
Memoize a function 🐘
from functools import lru_cache
@lru_cache(maxsize=32, typed=False)
def top_n_terms(text, n=10): pass
Discussion
More goodness from the functools module today. This simple decorator caches, or “memoizes”, the result of a function the first time it is called with a given set of arguments. Subsequent calls to the function with the same arguments return the cached result.
In cases where a time-consuming function always produces the same result given the same input, memoizing is a nice way to optimize for performance, especially for computationally expensive routines.
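To make the effect visible, here is a minimal sketch with a hypothetical body for `top_n_terms` (the original only shows a stub, so the `Counter`-based implementation, the `call_count` variable and the sample text are assumptions). The counter shows that the function body runs only once for repeated arguments.

```python
from collections import Counter
from functools import lru_cache

call_count = 0  # counts how often the function body actually runs

@lru_cache(maxsize=32)
def top_n_terms(text, n=10):
    """Hypothetical implementation: the n most frequent whitespace-separated terms."""
    global call_count
    call_count += 1
    # Return a tuple so the cached value cannot be mutated by callers.
    return tuple(Counter(text.lower().split()).most_common(n))

doc = "the cat and the hat and the bat"
first = top_n_terms(doc, 2)   # computed
second = top_n_terms(doc, 2)  # served from the cache
print(first, call_count, top_n_terms.cache_info())
```

Keep in mind that every argument has to be hashable, which is why the text is passed as a string and not as a list of words.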
maxsize sets how many of the most recent calls to keep cached, and typed controls whether arguments of different types are cached separately. For example, if typed is True, a call with the value 3.0 (a float) and another with 3 (an int) are treated as distinct calls, each with its own cached result. If typed is False, calls whose arguments compare equal may share a single cached result.
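The effect of typed=True can be checked with `cache_info()`. This is a small sketch; the `square` function is a made-up example and not part of the original snippet.

```python
from functools import lru_cache

@lru_cache(maxsize=32, typed=True)
def square(x):
    return x * x

square(3)    # miss: first call with the int 3
square(3)    # hit: same value, same type
square(3.0)  # miss: 3.0 == 3, but typed=True keeps float and int apart

info = square.cache_info()
print(info)
```

With typed=True there is one hit and two misses, and the cache holds two entries, one per argument type.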
Jan 5, 2023
k-permutations of n 🧮
from functools import reduce
n, k = (10, 4)
reduce(lambda x, i: x*i, range(1, n+1), 1) // reduce(lambda x, i: x*i, range(1, n-k+1), 1)
Discussion
This one runs a little long, but it’s too fun not to share! It calculates the number of ways to arrange k elements, without repetition, chosen from a set of n choices.
To conceptualize this, I like to think about PIN codes. Let’s say we want to determine how many 4-digit PIN codes with no repeated digits exist. In this case n = 10, where the choices are 0–9, and k = 4, the length of the PIN code. The equation to calculate such a result is n! / (n-k)!, which here gives 10! / 6! = 5040.
Notice the division occurring roughly in the middle of the One Liner. Each side of it simply computes a factorial: n! on the left and (n-k)! on the right. The third argument to reduce is an initial value of 1, so that an empty range (which happens when n equals k) yields 0! = 1 instead of raising an error. Floor division (//) keeps the result an integer.
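The formula n! / (n-k)! for the PIN code example can be cross-checked in a few independent ways. This sketch assumes Python 3.8 or newer for `math.perm`; the variable names are mine.

```python
import math
from itertools import permutations

n, k = 10, 4

by_formula = math.factorial(n) // math.factorial(n - k)  # n! / (n-k)!
by_builtin = math.perm(n, k)                             # standard library shortcut (3.8+)
by_counting = sum(1 for _ in permutations(range(n), k))  # brute force: enumerate every arrangement

print(by_formula, by_builtin, by_counting)  # all three agree: 5040
```

The brute-force count is only practical for small n and k, but it is a handy sanity check for the closed form.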
This One Liner also uses functools.reduce, which applies a given function cumulatively to the elements of an iterable, reducing them to a single value. The arguments to the function are x, an accumulator, and i, the next element in the iterable.
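To see the accumulator at work, `itertools.accumulate` can show the intermediate values that `reduce` passes along as x. A small sketch computing 5!, with names of my own choosing:

```python
from functools import reduce
from itertools import accumulate
from operator import mul

values = range(1, 6)  # 1, 2, 3, 4, 5

# reduce keeps only the final accumulator value.
total = reduce(lambda x, i: x * i, values)

# accumulate yields the accumulator after every step.
steps = list(accumulate(values, mul))

print(total)  # 120
print(steps)  # [1, 2, 6, 24, 120]
```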
Jan 4, 2023
Spin up a web server from the command line… 🕸
python -m http.server 8192
Discussion
Often, I’ll use this concise utility to transfer files from one computer to another on my local network. Other times, I use it to serve a frontend production build for smoke testing prior to deployment.
By default the server serves files from the current working directory, though you can pass a --directory argument to use an alternative root.
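If you need the same thing from inside a script, the http.server module can be used directly. This is a rough sketch and not a replacement for the command: the helper name `make_server` and its defaults are assumptions, and it relies on the `directory` parameter of `SimpleHTTPRequestHandler` and on `ThreadingHTTPServer`, both available from Python 3.7.

```python
import functools
import http.server

def make_server(directory=".", port=8192, host=""):
    """Build (but do not start) a static file server rooted at `directory`.

    An empty host binds to all interfaces, similar to the command line default.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler,
        directory=directory,  # serve files from here instead of the current working directory
    )
    return http.server.ThreadingHTTPServer((host, port), handler)

# Usage (blocks until interrupted with Ctrl+C):
#   make_server("build").serve_forever()
```

As with the command line version, this is meant for local transfers and smoke tests, not for production traffic.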
Jan 3, 2023
Find the n closest matches to a given string, using difflib 🧐
import difflib
closest = difflib.get_close_matches(word, possibilities, n=3, cutoff=0.8)
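Here is a small runnable sketch with made-up inputs (the misspelled word and the candidate list are assumptions). It shows how n caps the number of results and how raising cutoff filters out weaker matches.

```python
import difflib

word = "appel"  # hypothetical misspelling
possibilities = ["ape", "apple", "peach", "puppy"]

# n limits how many matches come back; cutoff (0 to 1) is the minimum similarity score.
closest = difflib.get_close_matches(word, possibilities, n=3, cutoff=0.6)

# A stricter cutoff rejects every candidate in this list.
strict = difflib.get_close_matches(word, possibilities, n=3, cutoff=0.9)

print(closest)  # best match first
print(strict)   # []
```

Results are sorted by similarity, so the first element is the best candidate.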
Jan 2, 2023
Divide and floor with double slash, simple and effective 👍🏻
result = 8 // 3
Discussion
Not much to say about this little gem. It does what it does and does it well.
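One detail worth a quick sketch: // rounds toward negative infinity, which differs from truncation once negative numbers are involved. The variable names here are mine.

```python
quotient = 8 // 3        # 2
negative = -8 // 3       # -3, floors toward negative infinity
truncated = int(-8 / 3)  # -2, int() truncates toward zero instead
float_floor = 8.0 // 3   # 2.0, a float operand gives a float result
q, r = divmod(8, 3)      # (2, 2), quotient and remainder in one call

print(quotient, negative, truncated, float_floor, (q, r))
```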
Jan 1, 2023
Generate a random eight character id! 🥇
import random, string
random_id = ''.join(random.choices(string.ascii_letters + string.digits, k=8))
Discussion
A couple of things are important to think about with this solution. One is stability and the other is performance.
For stability, we need to ask ourselves just how robust it is. In other words, how likely are we to encounter collisions? I ran several tests of 100 epochs each. Generating 1 million ids per epoch, every one was unique. At 10 million per epoch I saw one or more collisions in about 20% of the epochs, or about 1 collision per 50 million ids. So, for smaller projects this would be a decent approach.
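These observations can be compared against the birthday approximation. The sketch below is an estimate under the assumption that ids are drawn uniformly from all 62^8 possibilities; the function name `collision_probability` is mine.

```python
import math
import string

ALPHABET_SIZE = len(string.ascii_letters + string.digits)  # 62 characters
ID_SPACE = ALPHABET_SIZE ** 8                              # number of possible 8-character ids

def collision_probability(num_ids, space=ID_SPACE):
    """Approximate chance of at least one collision among num_ids uniform random ids."""
    expected_pairs = num_ids * (num_ids - 1) / 2 / space  # expected number of colliding pairs
    return -math.expm1(-expected_pairs)                   # 1 - exp(-x), computed accurately

p_1m = collision_probability(1_000_000)
p_10m = collision_probability(10_000_000)
print(f"1M ids:  {p_1m:.4f}")
print(f"10M ids: {p_10m:.4f}")
```

The estimate comes out at roughly 0.2% for 1 million ids and roughly 20% for 10 million, which is in line with the test runs described above.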
For performance, the join method runs in O(n), so its running time depends on the size of the input. In this case we know the input size to be quite small. I found conflicting information about the running time of random.choices, with some saying it runs in constant time and others saying it is implementation dependent. Still, this should be a pretty fast solution for generating one or a few ids at a time.
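If you would like to measure it yourself, a rough timeit sketch follows. The helper `make_id` and the run count are assumptions, and the timing will vary from machine to machine.

```python
import random
import string
import timeit

ALPHABET = string.ascii_letters + string.digits

def make_id(length=8):
    """Same approach as the One Liner, wrapped in a function for timing."""
    return ''.join(random.choices(ALPHABET, k=length))

runs = 10_000
seconds = timeit.timeit(make_id, number=runs)  # total time for all runs
per_id_us = seconds / runs * 1e6               # average microseconds per id
print(f"{per_id_us:.2f} microseconds per id")
```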