2 changes: 1 addition & 1 deletion LICENSE
@@ -1,4 +1,4 @@
-Copyright (c) Beijing Wodong Tianjun Information Technology Co., Ltd. The Gamma Authors.
+Copyright (c) The Gamma Authors.

 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
112 changes: 56 additions & 56 deletions README.md
100644 → 100755
@@ -1,56 +1,56 @@
# Gamma Python SDK

The Gamma Python SDK and its Python wheel packages.

## Overview

This repository contains the Gamma Python SDK and provides scripts to build
wheel packages for the Gamma library.

[Python SDK API](./docs/APIPythonSDK.md) documents the Python SDK API.
The files in the `python` directory show how the Python SDK wraps Gamma.
`setup.py` builds the wheel packages for Gamma.

The easiest way to use this Python SDK is `pip install vearch`. This
repository is for building a custom Python SDK.

## Building source package

If there is a custom-built Gamma library on the system, build the source
package for the best performance.

### Prerequisite

You can build it with the Docker image `pypywheels/manylinux2010-pypy_x86_64:latest`.

The `auditwheel` tool must be installed first. You can install it with pip.

The package can be built once Gamma is built and installed.
See the official [Gamma installation
instructions](https://github.com/vearch/gamma/blob/master/README.md) for more
on how to build and install Gamma. In particular, building wheel packages
requires additional options when compiling Gamma.

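```bash
git clone https://github.com/vearch/vearch-python.git
# Enter the cloned repository before initializing the submodule
cd vearch-python
git submodule init
git submodule update
cd gamma
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release -DPERFORMANCE_TESTING=ON ..
make
# Return to the repository root, where the wheel scripts live
cd ../..
sh build-wheels.sh
sh install-vearch.sh
```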

The `.whl` file is then generated in the `wheelhouse` directory.

Building wheel packages requires SWIG 3.0.12 or later.

### Linux

On Linux, `auditwheel` is used to create Python wheel packages that contain
precompiled binary extensions.
Header locations and link flags can be customized with the `GAMMA_INCLUDE` and
`GAMMA_LDFLAGS` environment variables when building wheel packages.
Windows and OSX are not supported yet.
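
As an illustration of the customization described above, here is a minimal sketch of how a build script could read `GAMMA_INCLUDE` and `GAMMA_LDFLAGS`. The helper name `gamma_build_flags` and the fallback defaults are assumptions made for this example; they are not taken from this repository's `setup.py`.

```python
import os


def gamma_build_flags(environ=None):
    """Return (include_dirs, extra_link_args) for a Gamma extension build.

    Hypothetical helper: the default values below are placeholders chosen
    for this sketch, not values used by the repository.
    """
    environ = os.environ if environ is None else environ
    include = environ.get("GAMMA_INCLUDE", "/usr/local/include")
    ldflags = environ.get("GAMMA_LDFLAGS", "-lgamma")
    # Link flags may hold several space-separated tokens, so split them.
    return [include], ldflags.split()


if __name__ == "__main__":
    print(gamma_build_flags())
```

The two lists have the shape that a setuptools `Extension` accepts for its `include_dirs` and `extra_link_args` arguments.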
12 changes: 7 additions & 5 deletions build-wheels.sh
@@ -25,11 +25,13 @@ elif [ `expr substr ${OS} 1 5` == "Linux" ];then
     export GAMMA_LDFLAGS=$BASE_PATH/build/libgamma.so
     export GAMMA_INCLUDE=$BASE_PATH
     export LD_LIBRARY_PATH=$BASE_PATH/build/:$LD_LIBRARY_PATH
-    for PYBIN in /opt/python/cp38-cp38/bin; do
-        "${PYBIN}/pip" install -r dev-requirements.txt -i https://mirrors.aliyun.com/pypi/simple/
-        "${PYBIN}/python" setup.py bdist_wheel
-        auditwheel repair dist/vearch*
-        rm -rf dist build vearch.egg-info
+    for PYBIN in /opt/python/*/bin; do
+        if [[ ${PYBIN} =~ "cp" ]]; then
+            "${PYBIN}/pip" install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/
+            "${PYBIN}/python" setup.py bdist_wheel
+            auditwheel repair dist/vearch*
+            rm -rf dist build vearch.egg-info
+        fi
     done
 elif [ `expr substr ${OS} 1 10` == "MINGW" ];then
     echo "windows not support"
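
The new loop visits every interpreter directory under `/opt/python` and builds only for paths that contain `cp`. A rough Python equivalent of that filter is sketched below; the function name and the example paths are hypothetical and only illustrate the substring test.

```python
def cpython_bins(bin_dirs):
    """Keep only paths containing "cp", mirroring `[[ ${PYBIN} =~ "cp" ]]`."""
    return [path for path in bin_dirs if "cp" in path]


# Hypothetical directory names of the kind found in manylinux images.
candidates = [
    "/opt/python/cp38-cp38/bin",
    "/opt/python/cp39-cp39/bin",
    "/opt/python/pp37-pypy37_pp73/bin",
]
print(cpython_bins(candidates))
```

Because the pattern is quoted in the shell test, it behaves as a literal substring match, which is why a plain `in` check is used here.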
41 changes: 41 additions & 0 deletions demos/demo_scann_module.py
@@ -0,0 +1,41 @@
+import vearch
+import time
+import numpy as np
+print("create table")
+engine = vearch.Engine("files", "logs")
+table = {
+    "name" : "test_table",
+    "engine" : {
+        "index_size": 50000,
+        "retrieval_type": "VEARCH",
+        "retrieval_param": {
+            "metric_type": "InnerProduct",
+            "ncentroids": 512,
+            "nsubvector": 256,
+            "reordering": True
+        }
+    },
+    "properties" : {
+        "feature": {
+            "type": "vector",
+            "dimension": 512,
+            "store_type": "Mmap"
+        }
+    }
+}
+engine.create_table(table)
+print("add data")
+add_num = 100000
+X = np.random.rand(add_num, 512).astype('float32')
+engine.add2(X)
+print("search")
+nprobe, rerank, query_num= 20, 100, 10
+engine.set_nprobe(nprobe)
+engine.set_rerank(rerank)
+Q = np.random.rand(query_num, 512).astype('float32')
+indexed_num = 0
+while indexed_num != X.shape[0]:
+    indexed_num = engine.get_status()['min_indexed_num']
+    time.sleep(1)
+engine.search2(Q, query_num)
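
The demo waits for indexing by polling `get_status()` in a loop with no upper bound. A variant with a timeout might look like the following sketch. `wait_until_indexed` and its parameters are hypothetical, and the status function is passed in as a callable so the snippet runs without vearch installed.

```python
import time


def wait_until_indexed(get_status, expected, timeout=60.0, interval=1.0):
    """Poll get_status() until 'min_indexed_num' reaches `expected`.

    Returns True on success, False if the timeout elapses first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if get_status()["min_indexed_num"] >= expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In the demo this would be called as `wait_until_indexed(engine.get_status, X.shape[0])`, assuming `get_status()` keeps returning a dict with a `min_indexed_num` key as it does above.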

Empty file modified docs/APIPythonSDK.md
100755 → 100644
Empty file.
1 change: 0 additions & 1 deletion gamma
Submodule gamma deleted from 7dd623
6 changes: 3 additions & 3 deletions install-vearch.sh
@@ -17,11 +17,11 @@ if [ ${OS} == "Darwin" ];then
         pip install ${WHEEL}
     done
 elif [ `expr substr ${OS} 1 5` == "Linux" ];then
-    for PYBIN in /opt/python/cp38-cp38/bin; do
+    for PYBIN in /opt/python/*/bin; do
         python_tag=$(echo ${PYBIN} | cut -d '/' -f4)
         "${PYBIN}/pip" uninstall vearch --yes
-        "${PYBIN}/pip" install "wheelhouse/vearch-$version-${python_tag}-manylinux_2_12_x86_64.manylinux2010_x86_64.whl"
-        "${PYBIN}/python" -c "import vearch"
+        "${PYBIN}/pip" install "wheelhouse/vearch-${version}.3-${python_tag}-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
+        "${PYBIN}/python" -c "import vearch"
     done
 elif [];then
     echo "Windows not support!!!"
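
To make the wheel filename construction concrete, this sketch reproduces `cut -d '/' -f4` and the new filename pattern in Python. The helper names are hypothetical, and the version string `3.2` is an invented placeholder, since the script's actual `$version` value is not shown in this diff.

```python
def python_tag(pybin):
    """Fourth '/'-separated field of the path, like `cut -d '/' -f4`."""
    return pybin.split("/")[3]


def wheel_name(version, tag):
    """Filename pattern used by the updated install line."""
    return (
        f"vearch-{version}.3-{tag}"
        "-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
    )


tag = python_tag("/opt/python/cp38-cp38/bin")
print(wheel_name("3.2", tag))  # "3.2" is a placeholder version
```

Note that the `.3` suffix is hard-coded in the install line, so the pattern only matches wheels whose version ends in that component.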
86 changes: 80 additions & 6 deletions python/__init__.py
100755 → 100644
@@ -10,11 +10,6 @@
 2. vector similarity search
 3. use like a database

-Vearch have four builtins.object
-Engine
-EngineTable
-Item
-Query
 use help(vearch.Object) to get more detail infomations.
 """
 import time
@@ -566,6 +561,15 @@ def create_item(self, table, doc_id, doc_info):
         self.doc.SetKey(doc_id)
         return doc_info['_id']

+    def create_items(self, table, doc_ids, docs):
+        if len(table.vec_infos) != 1 or len(table.field_infos) != 1:
+            return 1
+        for doc_id in doc_ids:
+            doc = Doc()
+            doc.SetKey(doc_id)
+            docs.AddDoc(doc)
+        return 0
+
     def set_doc(self):
         for field in self.fields:
             if field.type == dataType.VECTOR:
@@ -1120,7 +1124,6 @@ def deserialize(self, table, buf):
             query_results.append(query_result)
         self.query_results = query_results

-
 class Engine:
     ''' vearch core
         It is used to store, update and delete feature vectors,
@@ -1203,6 +1206,38 @@ def add(self, docs_info):
             print("finish add cost %.4f s" % (time.time() - start))
         return doc_ids

+    def add2(self, data):
+        ''' add docs into table
+            data: raw vector
+            return: unique docs' id for docs
+        '''
+        if self.verbose:
+            start = time.time()
+        if not isinstance(data, np.ndarray):
+            ex = Exception('The add function takes an incorrect argument; it must be of a list type.')
+            raise ex
+        nb, d = data.shape
+        doc_ids = [self.create_id() for i in range(nb)]
+        docs = Docs()
+        doc = GammaDoc()
+        if doc.create_items(self.gamma_table, doc_ids, docs):
+            return []
+        results = swigCreateBatchResult(nb)
+        if self.verbose:
+            print("prepare add cost %.4f s" % (time.time() - start))
+            start = time.time()
+        if 0 == swigAddOrUpdateDocsCPP2(self.c_engine, docs, swig_ptr(data), results):
+            if self.verbose:
+                print("gamma add cost %.4f s" % (time.time() - start))
+                start = time.time()
+            for i in range(nb):
+                if results.Code(i) == 0:
+                    self.total_added_num += 1
+        swigDeleteBatchResult(results)
+        if self.verbose:
+            print("finish add cost %.4f s" % (time.time() - start))
+        return doc_ids
+
     def update_doc(self, doc_info, doc_id):
         ''' update doc's info. The docs_info must contain "_id" information.
             doc_info: doc's new info.
@@ -1330,6 +1365,45 @@ def search(self, query_info):
         if self.verbose:
             print("get results cost %f ms" %((time.time() - start) * 1000))
         return results
+
+    def set_nprobe(self, nprobe):
+        swigSetNprobe(self.c_engine, nprobe, self.gamma_table.engine['retrieval_type'])
+
+    def set_rerank(self, rerank):
+        swigSetRerank(self.c_engine, rerank, self.gamma_table.engine['retrieval_type'])
+
+    def search2(self, xq, k):
+        ''' search in table
+            xq: query data
+        '''
+        #if self.verbose:
+        #    start = time.time()
+        if not isinstance(xq, np.ndarray):
+            ex = Exception('The search2 function takes an incorrect argument; it must be of a list type.')
+            raise ex
+        # d should also check, TODO
+        if len(xq.shape) > 1:
+            n, d = xq.shape
+        elif len(xq.shape) == 1:
+            n = 1
+            d = xq.shape[0]
+        else:
+            return ()
+        distances = np.empty((n, k), dtype=np.float32)
+        labels = np.empty((n, k), dtype=np.int64)
+        result = swigCreateVectorResult(n, k, swig_ptr(distances), swig_ptr(labels))
+        result.query = swig_ptr(xq)
+        #if self.verbose:
+        #    print("prepare search cost %f ms" %((time.time() - start) * 1000))
+        #    start = time.time()
+        ret = swigSearchCPP2(self.c_engine, result)
+        swigDeleteVectorResult(result)
+        #if self.verbose:
+        #    print("gamma search cost %f ms" %((time.time() - start) * 1000))
+        if ret:
+            return ()
+        else:
+            return distances, labels

     def del_doc_by_query(self, query_info):
         ''' delete docs by query
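
`search2` accepts either a single query vector or a batch of them. The shape handling can be isolated as in the sketch below. `query_dims` is a hypothetical helper that works on a plain shape tuple, so it runs without NumPy or vearch.

```python
def query_dims(shape):
    """Return (n, d) for a 1-D or 2-D query shape, or None otherwise.

    Follows the branch structure of search2, with one deliberate
    difference: shapes with more than two dimensions are rejected here
    instead of failing on tuple unpacking.
    """
    if len(shape) == 2:
        return tuple(shape)
    if len(shape) == 1:
        return (1, shape[0])
    return None


print(query_dims((10, 512)))
print(query_dims((512,)))
```

A check like this could also be the place to compare `d` against the table's configured dimension, which the `TODO` comment in `search2` leaves open.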
35 changes: 33 additions & 2 deletions python/swigvearch.i
@@ -152,7 +152,7 @@ typedef int64_t size_t;
             return vec_res;
         }
     }

     tig_gamma::Request *swigCreateRequest() {
         return new tig_gamma::Request();
     }
@@ -163,7 +163,30 @@ typedef int64_t size_t;
             request = nullptr;
         }
     }
-
+
+    void swigSetNprobe(void *engine, int nprobe, std::string index_type) {
+        CPPSetNprobe(engine, nprobe, index_type);
+    }
+
+    void swigSetRerank(void *engine, int rerank, std::string index_type) {
+        CPPSetRerank(engine, rerank, index_type);
+    }
+
+    tig_gamma::VectorResult *swigCreateVectorResult(int n, int k, float *dists, int64_t *labels) {
+        tig_gamma::VectorResult *result = new tig_gamma::VectorResult();
+        result->init(n, k, dists, labels);
+        return result;
+    }
+
+    void swigDeleteVectorResult(tig_gamma::VectorResult *result) {
+        if (result) {
+            result->dists = nullptr;
+            result->docids = nullptr;
+            delete result;
+            result = nullptr;
+        }
+    }
+
     tig_gamma::Response *swigCreateResponse() {
         return new tig_gamma::Response();
     }
@@ -264,6 +287,10 @@ typedef int64_t size_t;
         return CPPSearch(engine, request, response);
     }

+    int swigSearchCPP2(void* engine, tig_gamma::VectorResult *result) {
+        return CPPSearch2(engine, result);
+    }
+
     int swigAddOrUpdateDocCPP(void* engine, tig_gamma::Doc *doc) {
         return CPPAddOrUpdateDoc(engine, doc);
     }
@@ -283,6 +310,10 @@ typedef int64_t size_t;
         return CPPAddOrUpdateDocs(engine, docs, results);
     }

+    int swigAddOrUpdateDocsCPP2(void* engine, tig_gamma::Docs *docs, float *data, tig_gamma::BatchResult *results) {
+        return CPPAddOrUpdateDocs2(engine, docs, data, results);
+    }
+
     int swigDelDocByQuery(void* engine, unsigned char *pRequest, int len){
         char* request_str = (char*)pRequest;
         return DelDocByQuery(engine, request_str, len);
3 changes: 1 addition & 2 deletions dev-requirements.txt → requirements.txt
100755 → 100644
@@ -1,3 +1,2 @@
-numpy
 flatbuffers==1.12.0
-delocate
+numpy>=1.16.0