Hashing microscope images to check for similarity


I am trying to take two identical pictures with the Plugable microscope.

My process is simple: take two back-to-back images of what is under the microscope, then take the hash of each image. I expected the hashes to be the same, but they aren’t.

Is there a way to adjust settings to aid in this?

Results showing the hash of the two images:

6b3a1351201d0f60106eb10e468bcda23add57cbfa71334e41fffaa9acb94a90 2018-08-01-120905.jpg

92adc513194f7ef2c9e5fff2a363bfb4a52bdb540b87aa93610f5a46d9c1315c 2018-08-01-120900.jpg




Interesting idea!

However, it will be impossible to do a similarity comparison using cryptographic hashes. If a single bit of data differs between two images, the entire hash will be different, and the two hashes cannot be meaningfully compared. This is why hashes are used for binary authentication and driver signature enforcement: if absolutely anything about the data changes, the hash check will not pass.
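
To make the point concrete, here is a minimal Python sketch using the standard hashlib module. The 1000 zero bytes are only a stand-in for image data, and the helper names sha256_hex and differing_bits are made up for this illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count how many bits differ between two hex digests of equal length."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


# Stand-in for image data (assumption: any byte string behaves the same way).
original = bytes(1000)
modified = bytearray(original)
modified[-1] ^= 0x01  # flip exactly one bit in the last byte

h1 = sha256_hex(original)
h2 = sha256_hex(bytes(modified))

print(h1)
print(h2)
print(differing_bits(h1, h2), "of 256 digest bits differ")
```

Running it should show two digests with no visible relationship, even though the inputs differ by a single bit.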

In this example, even if the image data (the actual pixels you see) were completely identical, the two images would still not have matching hashes, because their metadata would differ. The two images were taken at different times, so their timestamp metadata is different, and therefore so are their hashes.
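
A rough Python sketch of that effect follows. It does not parse real JPEG or EXIF data; the "file" here is a toy container (one header line plus raw pixel bytes) invented for illustration, and make_file and pixel_payload are hypothetical helpers.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def make_file(timestamp: str, pixels: bytes) -> bytes:
    """Toy container (NOT real JPEG/EXIF): a header line, then pixel bytes."""
    return f"DateTime={timestamp}\n".encode("ascii") + pixels


def pixel_payload(file_bytes: bytes) -> bytes:
    """Drop the header line and return only the pixel bytes."""
    return file_bytes.split(b"\n", 1)[1]


# Identical pixel data, saved five seconds apart.
pixels = bytes(range(256)) * 4
file_a = make_file("2018:08:01 12:09:00", pixels)
file_b = make_file("2018:08:01 12:09:05", pixels)

# Hashing the whole file: the timestamps make the digests differ.
print(sha256_hex(file_a) == sha256_hex(file_b))

# Hashing only the pixel payload: the digests match.
print(sha256_hex(pixel_payload(file_a)) == sha256_hex(pixel_payload(file_b)))
```

The second comparison only matches because the pixel bytes in this toy example are exactly identical, which two real captures generally will not be.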

A better approach might be to use image comparison algorithms. The OpenCV library, available in many different languages, has image difference comparisons and there are a multitude of tutorials out there on how to do this.
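
As a dependency-free illustration of the idea (this is not OpenCV code, and it is far simpler than what OpenCV offers), a Python sketch could compare two grayscale frames by their mean absolute pixel difference and accept them as "the same" below a tolerance. The frames, the noise model, and the threshold of 2.0 are all assumptions chosen for the example.

```python
import random


def mean_abs_diff(img_a, img_b):
    """Mean absolute difference between two same-sized grayscale images,
    each given as a list of rows of 0-255 integers."""
    if len(img_a) != len(img_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(img_a, img_b)
    ):
        raise ValueError("images must have the same dimensions")
    total = 0
    count = 0
    for row_a, row_b in zip(img_a, img_b):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return total / count


def looks_same(img_a, img_b, threshold=2.0):
    """Treat images as matching if they differ by at most `threshold`
    gray levels per pixel on average (threshold is an arbitrary choice)."""
    return mean_abs_diff(img_a, img_b) <= threshold


rng = random.Random(0)

# Synthetic 64x48 gradient standing in for a captured frame.
frame_a = [[(x + y) % 256 for x in range(64)] for y in range(48)]
# Same scene with +/-1 gray level of simulated sensor noise.
frame_b = [
    [min(255, max(0, p + rng.choice((-1, 0, 1)))) for p in row]
    for row in frame_a
]
# A clearly different image: the inverted frame.
inverted = [[255 - p for p in row] for row in frame_a]

print(looks_same(frame_a, frame_b))   # noisy copy of the same scene
print(looks_same(frame_a, inverted))  # different image
```

Unlike a hash check, this kind of comparison tolerates small pixel-level differences, which is the property needed for two photos of the same subject.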


@sam_morgan Thanks for pointing out the metadata. I have confirmed it with exif:

EXIF tags in ‘2018-08-01-120900.jpg’ (‘Intel’ byte order):
Tag |Value
Model |USB Microscope
Software |Cheese 3.10.2
Date and Time |2018:08:01 12:09:00
X-Resolution |72
Y-Resolution |72
Resolution Unit |Inch
Exif Version |Exif Version 2.3
Date and Time (Origi|2018:08:01 12:09:00
FlashPixVersion |FlashPix Version 1.0
Color Space |Internal error (unknown value 65535)

EXIF tags in ‘2018-08-01-120905.jpg’ (‘Intel’ byte order):
Tag |Value
Model |USB Microscope
Software |Cheese 3.10.2
Date and Time |2018:08:01 12:09:05
X-Resolution |72
Y-Resolution |72
Resolution Unit |Inch
Exif Version |Exif Version 2.3
Date and Time (Origi|2018:08:01 12:09:05
FlashPixVersion |FlashPix Version 1.0
Color Space |Internal error (unknown value 65535)

I will try to remove the metadata, or at least modify it so that it matches in both files, and see whether that is causing the hash mismatch.

I fully understand your explanation of hashing; I just didn’t account for what I couldn’t see. I am hopeful this will work with some tweaking.


I stripped out the metadata with exiftool and then did an image diff:

Remove the metadata:
exiftool -all= 2018-08-01-120900.jpg
exiftool -all= 2018-08-01-120905.jpg

Perform the diff:
compare -compose src 2018-08-01-120900.jpg 2018-08-01-120905.jpg diff.jpg

What could explain why so many pixels differ here?

What is the microscope doing between the two captures, which appear to be 5 seconds apart?



Even with the metadata removed, a hash comparison is unlikely to be useful. Only one bit of data has to be different for the hash comparison to fail. Let’s do a quick experiment.

A 640x480 image has 307,200 pixels. If one pixel is different, we can say that the two images are 99.9997% similar. So, using the character 1 to represent a pixel, I pasted 307,200 1 characters into a SHA-256 hash generator. The generated hash was:


Changing the last 1 to a 0 resulted in the following hash:


Using the following string similarity tester, this shows that the two hashes are 38.28% similar:


So, even if you remove the metadata, unless each pixel in the image is 100% identical, you will not have an identical hash, or even one that is comparable.
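
For anyone who wants to repeat the experiment locally, here is a rough Python sketch. It assumes the string of 1 characters is hashed as ASCII, and it uses difflib.SequenceMatcher from the standard library as a stand-in for the online similarity tester, so the similarity figure it prints will not necessarily match the 38.28% above.

```python
import hashlib
from difflib import SequenceMatcher

# One "1" character per pixel of a 640x480 image.
pixels = "1" * (640 * 480)
# Same string with only the last character changed to "0".
changed = pixels[:-1] + "0"

hash_a = hashlib.sha256(pixels.encode("ascii")).hexdigest()
hash_b = hashlib.sha256(changed.encode("ascii")).hexdigest()

# Fraction of "pixels" that are identical between the two inputs.
pixel_similarity = 1 - 1 / len(pixels)
# Character-level similarity of the two digests (stand-in metric).
hash_similarity = SequenceMatcher(None, hash_a, hash_b).ratio()

print(f"input similarity: {pixel_similarity:.4%}")
print(hash_a)
print(hash_b)
print(f"hash similarity:  {hash_similarity:.2%}")
```

Whatever similarity metric is used, the digest similarity is only chance overlap between two unrelated hex strings and says nothing about how close the inputs were.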

Edit: Looks like you typed a reply while I was doing this. This reply is meant to be in response to this post: http://support.plugable.com/plugable/…


This is the nature of photography: it’s nearly impossible to take two pixel-identical images. Lighting, reflections, table vibrations, sensor interpretation, etc. all differ from moment to moment. Put that through a JPEG compression algorithm, and you will get different results every time.

What exactly are you trying to accomplish with this comparison?


Does this forum allow for direct messaging?

I was just thinking about all the variables that come into play here and how I can reduce them. Maybe take the image in a dark place and rely solely on the light from the LEDs, go with a lossless format, etc.

Is there a way to lock down any variable settings on the microscope to prevent the sensor from seeing the “same” subject differently?

Yes, I am starting to understand the difficulty of what I am attempting, which I’d rather explain privately.


We can certainly move this to a private discussion. Please email support@plugable.com and mention ticket #231620 and that will get you a direct line to me!


Thanks, I sent an email as described above.