imagehash

Version: v1.0.7

Published: Feb 7, 2024 License: MIT Imports: 4 Imported by: 0

README

Large scale image similarity search (Golang)

This is a fast and RAM-friendly hash-table-based image comparison package for large image collections (thousands and more). Resized and near-duplicate images can be found with it.

When the numBuckets parameter is low (~4), the package serves as a rough pre-filtering step. A second, precise step with images4 is then needed on the image set produced by the first step. This two-step sequence (imagehash > images4) is necessary because direct one-to-all comparison with images4 might be slow for very large image collections. For small image sets it is easier to skip the first step altogether.
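
The two-step sequence can be sketched as a filter over candidate ids. This is a minimal illustration, not part of either package: confirmCandidates is a hypothetical helper, the candidate ids are made up, and the precise callback stands in for a real comparison such as images4.Similar on two icons.

```go
package main

import "fmt"

// confirmCandidates keeps only the candidate ids accepted by the precise
// check. The candidates are assumed to come from the rough hash-based step.
func confirmCandidates(candidates []int, precise func(id int) bool) []int {
	var confirmed []int
	for _, id := range candidates {
		if precise(id) {
			confirmed = append(confirmed, id)
		}
	}
	return confirmed
}

func main() {
	// Hypothetical output of the rough hash-based step.
	candidates := []int{3, 17, 42}

	// Stand-in for a precise comparison of the query icon with icon id.
	precise := func(id int) bool { return id != 17 }

	fmt.Println(confirmCandidates(candidates, precise)) // [3 42]
}
```

The expensive comparison runs only on the short candidate list, which is the reason for the pre-filtering step on large collections.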

When numBuckets is very high (~200), be sure to run a few tests, because, as in the example below, only 10 dimensions (pixels in the Y channel) are used out of the total 11x11x3 = 363 pixel values in the icon. This could under-represent some images, because the bucket width is very small for high numBuckets.
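
To get a feeling for how quickly the space is subdivided, the number of hypercubes is numBuckets raised to the number of dimensions. The sketch below only does that arithmetic for the 10-dimensional case; hypercubeCount is an illustrative name, not a function of the package.

```go
package main

import (
	"fmt"
	"math/big"
)

// hypercubeCount returns numBuckets^dims, the number of cells into which
// the quantized space is split. big.Int avoids overflow for large inputs.
func hypercubeCount(numBuckets, dims int64) *big.Int {
	return new(big.Int).Exp(big.NewInt(numBuckets), big.NewInt(dims), nil)
}

func main() {
	for _, n := range []int64{4, 230} {
		fmt.Printf("numBuckets=%d: %s hypercubes\n",
			n, hypercubeCount(n, 10).String())
	}
}
```

For numBuckets = 4 this gives 1048576 cells; for numBuckets = 230 the result has 24 digits, on the order of 4×10²³.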

Go doc

Algorithm

Demo (images4)

Example of comparing 2 photos using imagehash

The demo shows only the hash-based similarity test, without building an actual hash table. A full implementation would use a hash table.

package main

import (
	"fmt"
	"github.com/vitali-fedulov/imagehash"
	"github.com/vitali-fedulov/images4"
)

const (
	// Recommended hyper-space parameters for initial trials.

	// I usually do not change epsPct parameter.
	// epsPct defines the range of uncertainty at hypercube borders,
	// when a nearest similar point may end up in the nearby hypercube,
	// thus having a different hash. The larger the value, the larger
	// the uncertainty range is. Larger values may produce larger hashSets,
	// which could be compute-expensive. 0.25 corresponds to 25% of bucket
	// width.
	epsPct = 0.25

	// Experiment by increasing numBuckets from 4 to 230 or higher.
	// Higher values make searches faster and more precise, but maybe
	// too strict. numBuckets sets the granularity of hyperspace
	// quantization: the higher the value, the finer the N-space
	// sub-division. This example uses 10-dimensional vectors, so
	// numBuckets = 4 splits the 10-space into 4^10 = 1048576 hypercubes
	// (each pixel brightness range is split into 4 buckets).
	// For numBuckets = 230, there will be about 4×10²³ possible hypercubes.
	numBuckets = 4
)

func main() {

	// Open and decode photos (skipping error handling for clarity).
	img1, _ := images4.Open("1.jpg")
	img2, _ := images4.Open("2.jpg")

	// Icons are compact image representations needed for comparison.
	icon1 := images4.Icon(img1)
	icon2 := images4.Icon(img2)

	// Hash table values.

	// Value to save to the hash table as a key with corresponding
	// image ids. Table structure: map[centralHash][]imageId.
	// imageId is simply an image number in a directory tree.
	centralHash := imagehash.CentralHash(
		icon1, imagehash.HyperPoints10, epsPct, numBuckets)

	// Hash set to be used as a query to the hash table. Each hash from
	// the hashSet has to be checked against the hash table.
	// See more info in the package "hyper" README.
	hashSet := imagehash.HashSet(
		icon2, imagehash.HyperPoints10, epsPct, numBuckets)

	// Checking hash matches. In full implementation this will
	// be done on the hash table map[centralHash][]imageId.
	foundSimilarImage := false
	for _, hash := range hashSet {
		if centralHash == hash {
			foundSimilarImage = true
			break
		}
	}

	// Image comparison result.
	if foundSimilarImage {
		fmt.Println("Images are approximately similar.")
	} else {
		fmt.Println("Images are distinct.")
	}

	// Then use func Similar of package images4 for final
	// confirmation of image similarity. That is:
	// if images4.Similar(icon1, icon2) {
	//    fmt.Println("Images are definitely similar")
	// }
}
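
The demo compares one central hash with one hash set directly. A possible shape of the hash table mentioned in the comments (map[centralHash][]imageId) is sketched below. The Index type and its methods are hypothetical, and the hash values are placeholders; real values would come from imagehash.CentralHash (records) and imagehash.HashSet (queries).

```go
package main

import "fmt"

// Index maps a central hash to the ids of images stored under it.
type Index map[uint64][]int

// Add records an image id under its central hash.
func (ix Index) Add(centralHash uint64, imageID int) {
	ix[centralHash] = append(ix[centralHash], imageID)
}

// Query returns the deduplicated ids whose central hash occurs in hashSet.
func (ix Index) Query(hashSet []uint64) []int {
	seen := make(map[int]bool)
	var ids []int
	for _, h := range hashSet {
		for _, id := range ix[h] {
			if !seen[id] {
				seen[id] = true
				ids = append(ids, id)
			}
		}
	}
	return ids
}

func main() {
	ix := Index{}

	// Placeholder central hashes for three stored images.
	ix.Add(0xA1, 1)
	ix.Add(0xB2, 2)
	ix.Add(0xA1, 3)

	// Placeholder hash set for a query image.
	fmt.Println(ix.Query([]uint64{0xA1, 0xC3})) // [1 3]
}
```

Each lookup is a map access, so query cost depends on the size of the hash set rather than on the size of the collection.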

For advanced users

An alternative to using the images4 package is to generate multiple hash sets on different pixel sub-sets of the icon with package imagehash, so that the search results of one hash set can be joined with those of another, or of several others. Each join operation will improve the result. See the description of var HyperPoints10 to understand how to create such pixel sub-sets.
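
A join of two result lists could look like the sketch below. It assumes that "join" means keeping only the image ids found by both searches (an intersection); that reading, the intersect helper and the id values are assumptions made for illustration.

```go
package main

import "fmt"

// intersect returns the ids present in both a and b, in the order of a,
// without duplicates.
func intersect(a, b []int) []int {
	inB := make(map[int]bool, len(b))
	for _, id := range b {
		inB[id] = true
	}
	seen := make(map[int]bool)
	var out []int
	for _, id := range a {
		if inB[id] && !seen[id] {
			seen[id] = true
			out = append(out, id)
		}
	}
	return out
}

func main() {
	// Hypothetical results of two searches on different pixel sub-sets.
	resultsA := []int{1, 3, 7, 9}
	resultsB := []int{3, 4, 9}
	fmt.Println(intersect(resultsA, resultsB)) // [3 9]
}
```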

Documentation

Constants

This section is empty.

Variables

var HyperPoints10 = []image.Point{
	{2, 5}, {3, 3}, {3, 8}, {4, 6}, {5, 2},
	{6, 4}, {6, 7}, {8, 2}, {8, 5}, {8, 8}}

HyperPoints10 is a predefined convenience set of 10 points. The coordinates select the icon values that become the 10 dimensions needed for hash generation with package "hyper". These 10 points are the only pixels of an icon used for hash generation (unless you define your own set of hyper points with the CustomPoints function, or manually). The 10 points have been adjusted slightly by hand to avoid texture-like symmetries.

Functions

func CentralHash

func CentralHash(icon images4.IconT, hyperPoints []image.Point,
	epsPercent float64, numBuckets int) uint64

CentralHash generates a central hash for a given icon by sampling luma values at well-distributed icon points (hyperPoints, for example HyperPoints10) and then passing them to package "hyper". The hash can be used as a record or as a query. When it is used as a record, the query must be a hash set made with func HashSet, and vice versa. To better understand CentralHash, read the following doc: https://vitali-fedulov.github.io/algorithm-for-hashing-high-dimensional-float-vectors.html

func CustomPoints

func CustomPoints(n int) map[image.Point]bool

CustomPoints is a utility function that creates hyper points similar to HyperPoints10. It is needed if you plan to use the package with billions of images and might need a higher number of sample points (more dimensions). You may also decide to reduce the number of dimensions in order to reduce the number of hashes per image. In both cases CustomPoints helps generate point sets similar to HyperPoints10. The function chooses a set of points (pixels from the Icon) placed as far apart from each other as possible to increase variable independence. The number of chosen points corresponds to the number of dimensions n. The brightness value at each point represents one coordinate in n-dimensional space for hash generation with package "hyper". The final point patterns are somewhat irregular, which helps avoid occasional mutual pixel dependence caused by textures in images. For low n, to avoid texture-like symmetries and visible patterns, it is recommended to modify point positions slightly by hand, distributing the points irregularly across the Icon.

func HashSet

func HashSet(icon images4.IconT, hyperPoints []image.Point,
	epsPercent float64, numBuckets int) []uint64

HashSet generates a hash set for a given icon by sampling luma values at well-distributed icon points (hyperPoints, for example HyperPoints10) and then passing them to package "hyper". The hash set can be used for records or as a query. When it is used as a query, the records must be hashes made with func CentralHash, and vice versa. To better understand HashSet, read the following doc: https://vitali-fedulov.github.io/algorithm-for-hashing-high-dimensional-float-vectors.html
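
The relation between epsPercent, numBuckets and the size of a hash set can be illustrated in one dimension. The sketch below is not the implementation of package "hyper". It only shows the idea under stated assumptions: values lie in the range 0 to 255 (an assumption), each value is expanded by epsPercent of the bucket width on both sides, and every bucket touched by that interval is a possible coordinate. The number of hashes in a set is then taken as the product of the per-dimension counts.

```go
package main

import (
	"fmt"
	"math"
)

// bucketsFor returns the bucket indexes that the interval [v-eps, v+eps]
// touches, where eps = epsPct * bucket width.
func bucketsFor(v, min, max, epsPct float64, numBuckets int) []int {
	width := (max - min) / float64(numBuckets)
	eps := epsPct * width
	clamp := func(i int) int {
		if i < 0 {
			return 0
		}
		if i > numBuckets-1 {
			return numBuckets - 1
		}
		return i
	}
	lo := clamp(int(math.Floor((v - eps - min) / width)))
	hi := clamp(int(math.Floor((v + eps - min) / width)))
	var out []int
	for i := lo; i <= hi; i++ {
		out = append(out, i)
	}
	return out
}

// hashSetSize multiplies the per-dimension bucket counts.
func hashSetSize(values []float64, epsPct float64, numBuckets int) int {
	size := 1
	for _, v := range values {
		size *= len(bucketsFor(v, 0, 255, epsPct, numBuckets))
	}
	return size
}

func main() {
	fmt.Println(bucketsFor(100, 0, 255, 0.25, 4)) // [1]
	fmt.Println(bucketsFor(120, 0, 255, 0.25, 4)) // [1 2]
	fmt.Println(hashSetSize([]float64{100, 120, 130}, 0.25, 4)) // 4
}
```

In this model a value near a bucket border doubles the number of hashes, which is why larger epsPercent values may produce larger hash sets.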

Types

This section is empty.
