Re: Hash Indexes. (Was: planner complaints)

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Hash Indexes. (Was: planner complaints)
Дата
Msg-id 26085.954797589@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: Hash Indexes. (Was: planner complaints)  (Mark Dalphin <mdalphin@amgen.com>)
Список pgsql-sql
Mark Dalphin <mdalphin@amgen.com> writes:
> Tom Lane wrote:
>> Why would you do that?  The hash index method doesn't have any advantage
>> over btree that I can see, and it's got a lot of disadvantages.

> Tom, I have heard this stated several times in this list and yet it
> contradicts what I was taught in my course on databases. It was
> explained that using a HASH index could be faster than a BTREE index
> for direct lookup of an item, however, the tradeoff was that you
> couldn't do "unequal" comparisons (ie COLUMN < SomeValue).  The speed
> gain was because the HASH index could go directly to the page
> containing the data while the btree index might need to load several
> pages to get to the final data, especially for large BTREE indexes.
> Is this simply not true for PostgreSQL, or do you think it isn't true
> in general (for most implementations of HASH and BTREE)?

Hmm.  I haven't actually timed hash versus btree lookups in Postgres;
it's possible that hash is faster (at least for some tables).  I'm not
sure I believe that a hash "can go directly to the right page", though.
You could go directly to the right hash bucket, but how many index
entries will be in the bucket?  You might still have to follow a chain
of overflow pages.  As Professor Knuth remarked about hashing, when
you use it you are absolutely depending on the laws of probability:
the average case is really good, but the worst case is awful.  Btrees
have more predictable performance.

When I commented that hashes have disadvantages, I was thinking of other
issues that are Postgres-specific: PG's hash indexes support fewer data
types than our btrees do, and our btrees support concurrent index updates
while hashes don't.  If you use a hash index you are pretty much back
to the bad old pre-MVCC days: you can have N readers *or* one writer
at any instant.  (I imagine this could be fixed if anyone cared to
invest the work.)  The btree code is also a lot more heavily used by
most people, so I'd expect it to have fewer bugs.

And, as you say, btrees support order-related queries while hashes only
support equality.  IMHO this alone is sufficient to justify choosing
btree, unless you are quite certain that you know all the kinds of query
that you will be throwing at the table and none of them involve
ordering.
        regards, tom lane


В списке pgsql-sql по дате отправления:

Предыдущее
От: Michael Fork
Дата:
Сообщение: User/Groups
Следующее
От: "tjk@tksoft.com"
Дата:
Сообщение: Re: User/Groups