SparseHash: Embedding Jaccard Coefficient between Supports of Signals

D.Valsesia, S.M.Fosson, C.Ravazzi, T.Bianchi, E.Magli

2016 IEEE International Conference on Multimedia and Expo Workshops (ICME 2016 Workshops), Seattle, USA, July 11-15, 2016

Abstract

Embeddings provide compact representations of signals to be used to perform inference in a wide variety of tasks. Random projections have

been extensively used to preserve Euclidean distances or inner products of high dimensional signals into low dimensional representations.

Different techniques based on hashing have been used in the past to embed set similarity metrics such as the Jaccard coefficient. In this

paper we show that a class of random projections based on sparse matrices can be used to preserve the Jaccard coefficient between the

supports of sparse signals. Our proposed construction can be therefore used in a variety of tasks in machine learning and multimedia signal

processing where the overlap between signal supports is a relevant similarity metric. We also present an application in retrieval of similar text documents where SparseHash improves over MinHash.

Additional material

Click on an item to open a preview, then on (top-left) to download it.

Presentation