I just discovered a gold mine of wallpapers in this Reddit thread @ https://www.reddit.com/r/hyprland/comments/1n1r6bw/where_do_you_guys_go_for_wallpapers/. I absolutely love it: so many GitHub repositories full of high-quality wallpapers. I downloaded many of them for my personal collection. But I ran into two problems. This post describes them, along with the solutions I came up with, which involve the BLAKE3 cryptographic hash function.
To address problem (1), I decided to sort every wallpaper I collected into one of two buckets, based on its form factor. The rule is simple: if image_width > image_height, the wallpaper is Wide; otherwise it is Narrow.
I went with this simple bucketing because the devices where I use these wallpapers (laptop, tablet or mobile) are themselves either wide or narrow, so picking a wallpaper for a given device stays easy.
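The bucketing rule above can be sketched as a single function. This is only an illustration: the name `bucket_for` is hypothetical, and it assumes the width and height in pixels have already been read from the image, for example with Pillow's `Image.open(path).size`, which returns a `(width, height)` tuple.

```python
def bucket_for(width: int, height: int) -> str:
    """Return "wide" when the image is wider than it is tall, else "narrow"."""
    return "wide" if width > height else "narrow"


print(bucket_for(1920, 1080))  # wide
print(bucket_for(1080, 1920))  # narrow
print(bucket_for(1000, 1000))  # narrow (a square image takes the else branch)
```

Note that with a strict `>` comparison, square wallpapers end up in the narrow bucket.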
Problem (2) is more interesting. We can use a hash function to produce a small digest for an arbitrarily large input (here, the raw image data), and just by comparing two of those digests we can say, almost certainly, whether the two input images are the same or not. There are many hash functions, with varying degrees of collision resistance. But we want to be really sure about collision resistance, hence we go with a special class of hash functions called cryptographic hash functions. Another interesting property they possess is the avalanche effect: if you change even a single bit of the input to a cryptographic hash function, it produces a drastically different digest. NIST has standardized several such cryptographic hash functions, but I personally like BLAKE3. I am a great fan of its simple yet flexible design, which results in fast real-world performance. Hence I will use the BLAKE3 hash function to produce a 32-byte digest for each image, so that I can get a list of only unique wallpapers. Interested to learn more about BLAKE3? Have a look at the official GitHub repository @ https://github.com/BLAKE3-team/BLAKE3
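As a rough sketch of the hashing step, the function below streams a file through a hasher and returns the 32-byte digest as 64 hex characters. It assumes the third-party `blake3` Python package (`pip install blake3`). The name `file_digest` and the fallback to BLAKE2b from the standard library are assumptions made so the sketch runs even without that package. BLAKE2b is a different hash function, so its digests will not match BLAKE3 digests.

```python
import hashlib
from pathlib import Path

try:
    # Third-party package, assumed to be installed: pip install blake3
    from blake3 import blake3 as new_hasher
except ImportError:
    # Stand-in only, so the sketch still runs without the package.
    # BLAKE2b is NOT BLAKE3; the two produce different digests.
    def new_hasher():
        return hashlib.blake2b(digest_size=32)


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the raw bytes of a file and return the digest as hex."""
    hasher = new_hasher()
    with open(path, "rb") as f:
        # Read in chunks so large images are never loaded fully into memory.
        while chunk := f.read(chunk_size):
            hasher.update(chunk)
    return hasher.hexdigest()
```

Two files with identical bytes give the same digest regardless of their names, while files differing in even one byte give unrelated digests.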
Back to solving both of our problems. I have to put only unique wallpapers into one of two buckets,
i.e. wide or narrow. Each directory in a file system works like a set data structure, meaning no two
files in the same directory can have the same name. But many different wallpapers in my growing
collection have the same name; often it is just a decimal number.
I can rename each wallpaper to its cryptographic digest (the output of running the BLAKE3 hash
function on the raw bytes of the wallpaper). That way each directory (a set data structure
provided by the file system) holds only unique wallpapers, and the issue of different
wallpapers having the same name is resolved.
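The directory-as-a-set idea can be sketched like this. The helper name `store_unique` is hypothetical, and `digest_hex` is assumed to be the hex digest of the file's raw bytes, computed beforehand. Keeping the original file extension is also an assumption, made so that image viewers still recognise the file type.

```python
import shutil
from pathlib import Path


def store_unique(src: Path, bucket_dir: Path, digest_hex: str) -> bool:
    """Copy src into bucket_dir under a name derived from its digest.

    Returns True if the file was stored, False if a file with the same
    digest-based name was already present, i.e. src is a duplicate.
    """
    bucket_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: keep the (lowercased) extension of the original file.
    dest = bucket_dir / f"{digest_hex}{src.suffix.lower()}"
    if dest.exists():
        return False  # same content already in this bucket
    shutil.copy2(src, dest)
    return True
```

Because the file name is derived from the content, two different wallpapers that were both called `1.png` no longer clash, and a second copy of the same wallpaper is skipped.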
Our problems should be addressed by now. All my wallpapers should be categorized into two buckets
(more appropriately, directories), each holding only unique wallpapers, with every wallpaper
addressed by its BLAKE3 cryptographic digest. Addressing a file by its content, or more precisely by
a short cryptographic digest of its arbitrarily large content, is known as content addressing, and
such an identifier is called a Content Identifier, or CID for short. Curious about CIDs? IPFS has
tons of high-quality content on cryptographic digest based content retrieval. One such example is here.
Putting it all together, here is the Python script I wrote to sort wallpapers into one of the two buckets.