Re: vector search support

From Jonathan S. Katz
Subject Re: vector search support
Date
Msg-id 49c7ba52-818a-6d0b-b8fd-eadef8e195a1@postgresql.org
In response to Re: vector search support  (Giuseppe Broccolo <g.broccolo.7@gmail.com>)
Responses Re: vector search support  (Giuseppe Broccolo <g.broccolo.7@gmail.com>)
List pgsql-hackers
On 4/26/23 9:31 AM, Giuseppe Broccolo wrote:
> Hi Nathan,
> 
> I find the patches really interesting. Personally, as Data/MLOps 
> Engineer, I'm involved in a project where we use embedding techniques to 
> generate vectors from documents, and use clustering and kNN searches to 
> find similar documents based on the spatial neighbourhood of the generated 
> vectors.

Thanks! This seems to be a pretty common use-case these days.

> We finally opted for ElasticSearch as search engine, considering that it 
> was providing what we needed:
> 
> * support to store dense vectors
> * support for kNN searches (last version of ElasticSearch allows this)

I do want to note that we can implement GiST-based indexing that performs 
K-NN searches using the "distance" support function[1], so adding the 
fundamental functions that known vector search techniques need could 
provide this functionality. We already have this today with "cube", but 
as Nathan mentioned, it's limited to 100 dimensions.
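
As a rough illustration only (this is not taken from the patches), here is 
a minimal Python sketch of the exact result an ordered K-NN scan is meant 
to return: the k rows closest to a query vector, nearest first. It uses a 
brute-force pass over every row, which is the work an index with distance 
support would try to avoid. The names "euclidean" and "knn" and the sample 
rows are hypothetical, and Euclidean distance is an assumption; other 
metrics would slot in the same way.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn(rows, query, k):
    """Return the k (row_id, distance) pairs closest to query, nearest first.

    Brute force: every row is scored, then the k smallest are kept.
    """
    scored = ((row_id, euclidean(vec, query)) for row_id, vec in rows)
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


# Hypothetical sample data: (row_id, vector) pairs.
rows = [
    (1, (0.0, 0.0, 0.0)),
    (2, (1.0, 0.0, 0.0)),
    (3, (0.0, 3.0, 4.0)),
    (4, (0.5, 0.5, 0.0)),
]

nearest = knn(rows, (0.0, 0.0, 0.0), 2)
for row_id, dist in nearest:
    print(row_id, round(dist, 4))
```

With the sample rows above, the two nearest neighbours of the origin are 
rows 1 and 4, in that order.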

> An internal benchmark showed us that we were able to achieve the 
> expected performance, although we are still lacking some points:
> 
> * clustering of vectors (this has to be done outside the search engine, 
> using DBScan for our use case)

From your experience, have you found any particular clustering 
algorithms better at driving a good performance/recall tradeoff?

> * concurrency in updating the ElasticSearch indexes storing the dense 
> vectors

I do think concurrent updates of vector-based indexes are one area 
where PostgreSQL can ultimately be pretty good, whether in core or in an 
extension.

> I found these patches really interesting, considering that they would 
> solve some of open issues when storing dense vectors. Index support 
> would help a lot with searches though.

Great -- thanks for the feedback,

Jonathan

[1] https://www.postgresql.org/docs/devel/gist-extensibility.html
