1 Star 0 Fork 1

guofei9987 / pyLSHash

加入 Gitee
与超过 1200万 开发者一起发现、参与优秀开源项目,私有仓库也完全免费 :)
免费加入
克隆/下载
README.md 3.97 KB
一键复制 编辑 原始数据 按行查看 历史
guofei9987 提交于 2022-07-24 02:47 . verion 0.1.1

pyLSHash

PyPI Python package codecov License Python Platform stars

A fast Python implementation of locality sensitive hashing.

I was using kayzhu/LSHash, but it stopped updating since 2013.
So I maintain it myself, and I have made a lot of improvement based on it.

Highlights

  • Fast hash calculation for large amount of high dimensional data through the use of numpy arrays.
  • Built-in support for persistency through Redis.
  • Multiple hash indexes support.
  • Built-in support for common distance/objective functions for ranking outputs.

Installation

pyLSHash depends on the following libraries:

  • numpy
  • redis (if persistency through Redis is needed)

To install:

$ pip install pyLSHash

Quickstart

To create 6-bit hashes for input data of 8 dimensions:

from pyLSHash import LSHash

lsh = LSHash(hash_size=6, input_dim=8)
lsh.index([1, 2, 3, 4, 5, 6, 7, 8])
lsh.index([2, 3, 4, 5, 6, 7, 8, 9])
# attach extra_data
lsh.index([2, 3, 4, 5, 6, 7, 8, 9], extra_data="some vector info")
lsh.index([10, 12, 99, 1, 5, 31, 2, 3])

res = lsh.query([1, 2, 3, 4, 5, 6, 7, 7])

[((1, 2, 3, 4, 5, 6, 7, 8), 1.0), ((2, 3, 4, 5, 6, 7, 8, 9), 11)]

User defined distance function

def l1norm_dist(x, y):
    return sum(abs(x - y))


res2 = lsh.query([1, 2, 3, 4, 5, 6, 7, 7], dist_func=l1norm_dist)

print(res2)

Use Redis

from pyLSHash import LSHash

lsh = LSHash(hash_size=6, input_dim=8
             , storage_instance=RedisStorage({'host': 'localhost', 'port': 6379, 'decode_responses': True}))

lsh.index([1, 2, 3, 4, 5, 6, 7, 8])
lsh.index([2, 3, 4, 5, 6, 7, 8, 9])
# attach extra_data
lsh.index([2, 3, 4, 5, 6, 7, 8, 9], extra_data="some vector info")
lsh.index([10, 12, 99, 1, 5, 31, 2, 3])
lsh.index([10, 12, 99, 1, 5, 31, 2, 3])

res = lsh.query([1, 2, 3, 4, 5, 6, 7, 7])

Use other database as storage

from pyLSHash import LSHash
from pyLSHash.storage import StorageBase
import redis
import json


class MyStorage(StorageBase):
    def __init__(self):
        self.storage = redis.StrictRedis(host='localhost', port=6379, decode_responses=True)

    def keys(self, pattern="*"):
        return self.storage.keys(pattern)

    def set_val(self, key, val):
        self.storage.set(key, val)

    def get_val(self, key):
        return self.storage.get(key)

    def append_val(self, key, val):
        self.storage.rpush(key, json.dumps(val))

    def get_list(self, key):
        res_list = [json.loads(val) for val in self.storage.lrange(key, 0, -1)]
        return tuple((tuple(item[0]), item[1]) for item in res_list)

    def clear(self):
        for key in self.storage.keys():
            self.storage.delete(key)


lsh = LSHash(hash_size=6, input_dim=8
             , storage_instance=MyStorage())

lsh.index([1, 2, 3, 4, 5, 6, 7, 8])
lsh.index([2, 3, 4, 5, 6, 7, 8, 9])
lsh.index([2, 3, 4, 5, 6, 7, 8, 9], extra_data="some vector info")
lsh.index([10, 12, 99, 1, 5, 31, 2, 3])
lsh.index([10, 12, 99, 1, 5, 31, 2, 3])

res = lsh.query([1, 2, 3, 4, 5, 6, 7, 7])

save&load model

lsh.save_uniform_planes("filename.pkl")
lsh.load_uniform_planes("filename.pkl")

clear indexed data

lsh.clear_storage()
Python
1
https://gitee.com/guofei9987/pyLSHash.git
git@gitee.com:guofei9987/pyLSHash.git
guofei9987
pyLSHash
pyLSHash
main

搜索帮助