Re: [GENERAL] Creation of tsearch2 index is very slow

From	Craig A. James
Subject	Re: [GENERAL] Creation of tsearch2 index is very slow
Date	21 January 2006, 00:34:52
Message-ID	43D18EA9.8060409@modgraph-usa.com
In reply to	Re: [GENERAL] Creation of tsearch2 index is very slow (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-performance

Tom Lane wrote:
> Well, we're trying to split an index page that's gotten full into two
> index pages, preferably with approximately equal numbers of items in
> each new page (this isn't a hard requirement though).  ...  If that's
> correct, what you really want is to divide the values so that the unions
> of the two sets have minimal overlap ... which seems to me to have
> little to do with what the code does at present.

This problem has been studied extensively by chemists, and they haven't found any easy solutions.

The Jarvis-Patrick clustering algorithm might give you hints about a fast approach.  In theory it's K*O(N^2), but J-P
is preferred for large datasets (millions of molecules) because the coefficient K can be made quite low.  It starts with
a "similarity metric" for two bit strings, the Tanimoto or Tversky coefficients:

  http://www.daylight.com/dayhtml/doc/theory/theory.finger.html#RTFToC83
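
For readers who don't want to follow the link, here is a minimal sketch of both coefficients in Python.  It assumes each bit string is held as a non-negative Python int; the function names are made up for illustration, and returning 0.0 for an empty denominator is an arbitrary convention, not something taken from the Daylight documentation.

```python
def popcount(x: int) -> int:
    """Number of set bits in a non-negative int."""
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    """Bits set in both strings divided by bits set in either string."""
    either = popcount(a | b)
    if either == 0:
        return 0.0  # assumption: two empty bit strings count as dissimilar
    return popcount(a & b) / either


def tversky(a: int, b: int, alpha: float, beta: float) -> float:
    """Asymmetric variant: alpha weights bits only in a, beta bits only in b."""
    common = popcount(a & b)
    only_a = popcount(a & ~b)
    only_b = popcount(b & ~a)
    denom = alpha * only_a + beta * only_b + common
    if denom == 0:
        return 0.0  # same arbitrary convention as above
    return common / denom
```

With alpha = beta = 1 the Tversky coefficient reduces to Tanimoto; with alpha = 0 and beta = 1 it reaches 1.0 whenever every bit of b is also set in a.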

J-P Clustering is described here:

  http://www.daylight.com/dayhtml/doc/cluster/cluster.a.html#cl33

J-P Clustering is probably not the best for this problem (see the illustrations in the link above to see why), but the
general idea of computing N-nearest-neighbors, followed by a partitioning step, could be what's needed.
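
To make the two steps concrete, here is a rough Python sketch of the nearest-neighbors-then-partition idea, using one common formulation of Jarvis-Patrick: two items join the same cluster if each is in the other's list of k nearest neighbors and the two lists share at least k_min entries.  It assumes signatures are stored as ints and compared by Tanimoto similarity; the names nearest_neighbors, jarvis_patrick, k and k_min are illustrative only.  This is not what the tsearch2 code does, and the neighbor search is the naive O(N^2) version.

```python
from itertools import combinations


def tanimoto(a: int, b: int) -> float:
    """Shared set bits over total set bits; 0.0 if both are empty (assumption)."""
    either = bin(a | b).count("1")
    return bin(a & b).count("1") / either if either else 0.0


def nearest_neighbors(sigs, k):
    """For each signature, the index set of its k most similar other signatures."""
    result = []
    for i, s in enumerate(sigs):
        others = sorted(
            (j for j in range(len(sigs)) if j != i),
            key=lambda j: tanimoto(s, sigs[j]),
            reverse=True,
        )
        result.append(set(others[:k]))
    return result


def jarvis_patrick(sigs, k, k_min):
    """Group indices whose neighbor lists are mutual and overlap in >= k_min entries."""
    nn = nearest_neighbors(sigs, k)
    parent = list(range(len(sigs)))  # union-find forest over item indices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Partitioning step: merge every pair that passes the J-P test.
    for i, j in combinations(range(len(sigs)), 2):
        if i in nn[j] and j in nn[i] and len(nn[i] & nn[j]) >= k_min:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(sigs)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

Note that this yields however many clusters the data suggests, not exactly two, so a page split would still need a further step to assign the clusters to the two new pages.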

Craig
