Using Content Addressable Identifiers (CIDs) for Fun and Profit

I just discovered a gold mine of wallpapers on this Reddit thread @ https://www.reddit.com/r/hyprland/comments/1n1r6bw/where_do_you_guys_go_for_wallpapers/. I absolutely love it: so many GitHub repositories full of high-quality wallpapers. I downloaded many of them, all for my personal collection of nice wallpapers. But I ran into two problems. This post describes them, along with the solutions I came up with, one of which involves the BLAKE3 cryptographic hash function.

(1) I already maintain a collection of wallpapers, though not in a very organized form. This time I wanted to find one or more axes along which to organize my richer wallpaper collection.
(2) The more important problem was duplicate wallpapers. Many popular wallpapers appeared in multiple collectors' albums, under the same or different names. I wanted to get rid of the duplicates, i.e. build a set (think data structure) of wallpapers.

To address problem (1), I decided to sort all the wallpapers I collected into one of two buckets, based on the form factor of each wallpaper. The rule is simple: if image_width > image_height, it is Wide, else it is Narrow (so a square image counts as Narrow).

    # https://pypi.org/project/pillow/
    from PIL import Image

    def is_image_wide(image_path: str) -> bool:
        img = Image.open(image_path)
        (width, height) = img.size
        return width > height

I went with this simple bucketing technique because the devices on which I use these wallpapers (laptop, tablet or mobile) have either a wide or a narrow form factor. Ease of consumption.

Problem (2) is more interesting. We can use a hash function to produce a small digest for an arbitrarily large input string (here, the raw image data). Just by comparing two of those digests, we can tell whether the two input images were the same: different digests mean the files differ, and equal digests mean, almost certainly, that the files are byte-for-byte identical. There are many hash functions with varying degrees of collision resistance. But we want to be really sure about collision resistance, hence we go with a special class of hash functions called cryptographic hash functions. Another interesting property they possess is the avalanche effect: if you change even a single bit of the input to a cryptographic hash function, it produces a drastically different digest. NIST has standardized many such cryptographic hash functions, but I personally like BLAKE3. I am a great fan of its simple yet flexible design, which results in fast real-world performance. Hence I will use the BLAKE3 hash function to produce a 32-byte digest for each image, so that I can get a list of only unique wallpapers. Interested to learn more about BLAKE3? Have a look at their official GitHub repository @ https://github.com/BLAKE3-team/BLAKE3
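To make the digest idea concrete, here is a minimal sketch of hashing a file in chunks and observing the avalanche effect. It is not taken from my script. The names digest_bytes, digest_file and differing_bits are made up for illustration, and I am assuming the third-party blake3 package from PyPI is installed. If it is not, the sketch falls back to BLAKE2s from Python's standard library, which also gives a 32-byte digest but is a different hash function.

```python
import hashlib

try:
    # Assumption: the third-party package https://pypi.org/project/blake3/
    from blake3 import blake3 as new_hasher
except ImportError:
    # Stand-in so the sketch still runs without that package.
    # BLAKE2s is NOT BLAKE3; it just happens to give a 32-byte digest too.
    def new_hasher():
        return hashlib.blake2s(digest_size=32)


def digest_bytes(data: bytes) -> str:
    """Return the hex digest (64 hex characters = 32 bytes) of in-memory data."""
    hasher = new_hasher()
    hasher.update(data)
    return hasher.hexdigest()


def digest_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file chunk by chunk, so large images never sit fully in memory."""
    hasher = new_hasher()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            hasher.update(chunk)
    return hasher.hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions in which two equal-length hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


if __name__ == "__main__":
    original = bytes(range(256)) * 4                       # pretend image data
    flipped = bytes([original[0] ^ 1]) + original[1:]      # flip a single bit
    a, b = digest_bytes(original), digest_bytes(flipped)
    print(a)
    print(b)
    print("bits that differ, out of 256:", differing_bits(a, b))
```

With a well-behaved cryptographic hash, the last line should report a count somewhere around half of the 256 digest bits, even though the two inputs differ in just one bit.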

Back to solving both of our problems. I have to put only unique wallpapers into one of two buckets, i.e. wide or narrow. Each directory in a file system works like a set data structure: no two files in the same directory can have the same name. But many different wallpapers in my growing collection share a name, which is often just a decimal number. So I rename every wallpaper to its cryptographic digest (the output of running the BLAKE3 hash function on the content of the wallpaper, i.e. its raw bytes). That way each directory (a set data structure, provided by the file system) holds only unique wallpapers, and the issue of different wallpapers having the same name gets resolved too.
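The directory-as-a-set idea can be sketched roughly as follows. Again, this is an illustration under assumptions rather than the exact code of my script: content_digest and add_to_bucket are hypothetical names, the file is copied rather than moved, and the same BLAKE2s stand-in is used if the blake3 package from PyPI is not installed.

```python
import hashlib
import shutil
from pathlib import Path

try:
    # Assumption: the third-party package https://pypi.org/project/blake3/
    from blake3 import blake3 as new_hasher
except ImportError:
    # Stand-in only, NOT BLAKE3: BLAKE2s from the standard library.
    def new_hasher():
        return hashlib.blake2s(digest_size=32)


def content_digest(path: Path) -> str:
    """Hex digest of the raw bytes of a file."""
    hasher = new_hasher()
    hasher.update(path.read_bytes())
    return hasher.hexdigest()


def add_to_bucket(src: Path, bucket: Path) -> tuple[Path, bool]:
    """Copy src into bucket, named <digest><original extension>.

    Returns the destination path and whether the file was new to the bucket.
    Identical content always maps to the same name, so copies collapse into one.
    """
    bucket.mkdir(parents=True, exist_ok=True)
    dest = bucket / f"{content_digest(src)}{src.suffix}"
    if dest.exists():
        return dest, False  # same bytes already present, nothing to do
    shutil.copy2(src, dest)
    return dest, True
```

One caveat of keeping the extension in the name: the same bytes stored once as .jpg and once as .jpeg would end up as two files. Also, only byte-identical files are treated as duplicates, so a re-encoded or resized copy of the same picture gets a different digest.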

Both problems should be addressed by now. All my wallpapers are categorized into two buckets (more appropriately, directories), each holding only unique wallpapers, and each wallpaper is addressed by its BLAKE3 cryptographic digest. Addressing a file by its content, or more precisely by a short cryptographic digest of its arbitrarily large content, is known as Content Addressable Identification, or CID for short. Curious about CIDs? IPFS, where CID stands for Content Identifier, has tons of high-quality content on cryptographic-digest-based content retrieval. One such example is here.

Putting it all together, here is the Python script I wrote to sort wallpapers into either of two buckets.

    # HOW TO USE?
    git clone https://gist.github.com/ba8ab2a8f340ff5a5107c711744ea7c7.git
    pushd ba8ab2a8f340ff5a5107c711744ea7c7

    python -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt

    # It will create two directories in current working directory `wide` and `narrow`.
    # Then it will recursively find all images (with well-known file extensions)
    # in wallpaper-collection directory, put them into either bucket (actually directory),
    # while renaming each wallpaper to its corresponding BLAKE3 digest, preserving file extension.
    python sort_images.py path/to/wallpaper-collection

    deactivate
    popd


Created : August 29, 2025