Re: why is gist index taking so much space on the disc

Поиск
Список
Период
Сортировка
От Martijn van Oosterhout
Тема Re: why is gist index taking so much space on the disc
Дата
Msg-id 20051121194527.GG16764@svana.org
обсуждение исходный текст
Ответ на Re: why is gist index taking so much space on the disc  (Grzegorz Jaskiewicz <gj@pointblue.com.pl>)
Ответы Re: why is gist index taking so much space on the disc  (Oleg Bartunov <oleg@sai.msu.su>)
Список pgsql-hackers
On Mon, Nov 21, 2005 at 08:14:44PM +0100, Grzegorz Jaskiewicz wrote:
> >You mean you sometimes put the same elements in the two halves? You
> >shouldn't do that. The whole point is that the search will descend any
> >node that matches consistant, but any single key should only appear
> >once in each index.
> >
> >picksplit should *split* the set, not return two sets about the same
> >size as you started...
>
> Nope, I mean that 'masks' created to match either 'half' sometimes
> match elements in the other one.
> This shouldn't be a big deal, just one level to go down on query to
> much more specific result set.
> I have fixed that with, somewhat hack.

It's not a hack, that's how it's supposed to work. An entry should only
appear once in the index, but it could appear in multiple places. Like
you say, some entries can go into either half.

B-Trees are the rather special case that you can always split a set of
values into two non-overlapping sets. With geometric types (like your
bitmasks) you can't avoid overlap sometimes so you have to follow
multiple branches to check if an element is there or not.

Your pseudo code is good and will work fine. However, ideally you want
to divide the overlap in such a way that later splits work better.
Maybe by trying to decide which mask is "closer".

The better the splitting the more efficient your tree will become.
Ofcourse, perfect splitting may be expensive but then it depends on how
many inserts vs how many selects. If you do a lot of searches it may be
worth the time.

BTW, I glad you're making progress and hopefully you might be able to
publish some code. PostgreSQL could do with some example GiST indexes
on bitmaps.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: PostgreSQL 8.1.0 catalog corruption
Следующее
От: Tino Wildenhain
Дата:
Сообщение: Re: plpython and bytea