![]() |
An interesting visual representation of sorting algorithms. From left to right: bubble, heap, and quick sort. |
An interesting visual representation of sorting algorithms, from left to right: bubble, heap, and quick sort.
In my previous 'Journey to Robot' article, I mentioned my interest in using image hashing as a way to check for duplicate images. In this article, I'll be going more in depth on how it works, and how to use it. If you'd like to skip to it's usage in python, click here.
Image hashing is the process of creating a hash value based on the image's contents. This means that different images will have different hash values, and similar images will have similar hash values. (The length and complexity of the hash value will depend on the hashing method your using). Comparing an image's hash value to the hash value of another can give you somewhat accurate duplicate detection.
There are a couple different image hashing algorithms out there, for these examples, I'll be using
Here's an example of an image hash generated using dhash:
c1c4e4a484a0809083fbffcc0040831e
So How Does It Work?
dHash goes through a couple of steps to create the has value:
- Converts the image to grayscale.
- Resizes it to a 9x9 thumbnail.
- Creates a 64-bit row hash (where a 1 bit represents increasing light intensity, and a 0 bit represents decreasing light intensity)
- Creates a 64-bit column hash (Same as the row hash in the previous step but for the columns of pixels.
- Combine the two 64 bit row and column hash values into one big 128 bit hash value.
A Bit, short for binary digit is the smallest unit of data. It can have a value of either 1 or 0. A 64-bit hash has 64 of these bits, each of which have a value of 1 or 0. In step 3, the computer moves from left to right, giving each bit a value on 1 if that pixel is brighter than the previous, or a value of 0 if it's dimmer.
Step 4 does the same as step 3, except it moves up to down.
The 64 bits of each are combined to create one long 128 bit hash.
How do I code this?
Now that we understand what's going on behinds the scenes, lets work on implementing this in python!
First, let's install dHash: pip install dhash
Next, you'll need to install one of 2 packages: wand, or Pillow.
I've had a couple issues installing Pillow so I'll be using wand , but you can use either. Due to their differences in converting images to grayscale, the hash values created may differ between the two (which is why it's a good idea to stick with one or the other and not use both.).
To install wand, use: pip install wand
If you're having issues installing packages with pip, check out my troubleshooting article.
Here's some example code explaining how to generate a hash value in python:
In the example above, I generate a hash value for the given image, 'doggo.jpg'. I'm using python 2, but this works just as fine in python 3.
The hexadecimal string in the output is your hash value!
As always, any comments, suggestions, or questions are greatly appreciated!
Leave a comment, or message me on twitter 🚀
Comments
Post a Comment