similARiTy – Lightweight tool for image similarity search (2019)

When digital image collections are built up over years of work by different people, duplicates are inevitable. These redundant records are a thorn in the side of every data curator, who will try to merge or delete them whenever they are found. Once a collection reaches a certain size, however, finding duplicates is often left to chance or depends on reports from users. Thanks to major advances in image similarity search, the problem can now be tackled more systematically, for example with the similARiTy tool, which compares two collections of digital images using perceptual hashing and a BK-tree. First, a JSON file is created for each directory to be compared; the two files are then compared with each other. The command-line tool requires Python and ImageMagick.
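
The general idea behind perceptual hashing combined with a BK-tree can be sketched as follows. This is only an illustration and not similARiTy's actual code: the source does not say which hash function the tool uses, so an average hash over an already downscaled grayscale grid is assumed here, and all names (average_hash, hamming, BKTree) are hypothetical. Hashes from one collection are inserted into the tree, and each hash from the other collection is then looked up within a small Hamming distance.

```python
from typing import Dict, List, Optional, Tuple

# A node is (hash, file name, children keyed by distance to this node).
Node = Tuple[int, str, Dict[int, "Node"]]


def average_hash(pixels: List[List[int]]) -> int:
    """Assumed hash: one bit per pixel, set if the pixel is brighter than the mean.

    `pixels` is a small grayscale grid (e.g. 8x8) that has already been
    downscaled, for instance by an external tool.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


class BKTree:
    """Minimal BK-tree over integer hashes with Hamming distance."""

    def __init__(self) -> None:
        self.root: Optional[Node] = None

    def add(self, h: int, name: str) -> None:
        if self.root is None:
            self.root = (h, name, {})
            return
        node = self.root
        while True:
            d = hamming(h, node[0])
            child = node[2].get(d)
            if child is None:
                node[2][d] = (h, name, {})
                return
            node = child

    def query(self, h: int, max_dist: int) -> List[Tuple[int, str]]:
        """Return (distance, name) pairs within max_dist of h, sorted."""
        results: List[Tuple[int, str]] = []
        stack: List[Node] = [self.root] if self.root is not None else []
        while stack:
            node_hash, name, children = stack.pop()
            d = hamming(h, node_hash)
            if d <= max_dist:
                results.append((d, name))
            # Triangle inequality: only subtrees whose edge label lies in
            # [d - max_dist, d + max_dist] can contain matches.
            for edge, child in children.items():
                if d - max_dist <= edge <= d + max_dist:
                    stack.append(child)
        return sorted(results)
```

The benefit of the tree is that a lookup can skip whole subtrees instead of comparing a hash against every image in the other collection. The distance threshold passed to query is a tuning choice: 0 finds only identical hashes, while slightly larger values also catch rescaled or recompressed copies.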

similARiTy was developed by Thorsten Wübbena (former research director, Digital Humanities) from January to August 2019.