Обсуждение: using hash index when BETWEEN is specified

Поиск

Список

Период

Сортировка

using hash index when BETWEEN is specified

От

Zdenek Kotala

Дата:

10 сентября 2008 г., 10:39:04

I has played with new hash index implementation and I tried following 
command:

postgres=# select * from test where id between 1 and 5;
Time: 9651,033 ms
postgres=# explain select * from test where id between 1 and 5;                       QUERY PLAN
--------------------------------------------------------- Seq Scan on test  (cost=0.00..141681.00 rows=1 width=4)
Filter:((id >= 1) AND (id <= 5))
 
(2 rows)


Hash index is created on id column. However when I use

postgres=# explain select * from test where id in (1,2,3,4,5);                               QUERY PLAN
------------------------------------------------------------------------- Bitmap Heap Scan on test  (cost=22.24..332.53
rows=83width=4)   Recheck Cond: (id = ANY ('{1,2,3,4,5}'::integer[]))   ->  Bitmap Index Scan on test_idx
(cost=0.00..22.22rows=83 width=0)         Index Cond: (id = ANY ('{1,2,3,4,5}'::integer[]))
 
(4 rows)

Time: 1,352 ms

I'm not planner guru but it seems to me that BETWEEN clause could be 
rewritten as a IN clause for integer data types and small interval.

    Zdenek

Re: using hash index when BETWEEN is specified

От

"Asko Oja"

Дата:

10 сентября 2008 г., 10:52:28

On Wed, Sep 10, 2008 at 1:39 PM, Zdenek Kotala <Zdenek.Kotala@sun.com> wrote:

I has played with new hash index implementation and I tried following command:

postgres=# select * from test where id between 1 and 5;
Time: 9651,033 ms
postgres=# explain select * from test where id between 1 and 5;
QUERY PLAN
---------------------------------------------------------
Seq Scan on test (cost=0.00..141681.00 rows=1 width=4)
Filter: ((id >= 1) AND (id <= 5))
(2 rows)

Hash index is created on id column. However when I use

postgres=# explain select * from test where id in (1,2,3,4,5);
QUERY PLAN
-------------------------------------------------------------------------
Bitmap Heap Scan on test (cost=22.24..332.53 rows=83 width=4)
Recheck Cond: (id = ANY ('{1,2,3,4,5}'::integer[]))
-> Bitmap Index Scan on test_idx (cost=0.00..22.22 rows=83 width=0)
Index Cond: (id = ANY ('{1,2,3,4,5}'::integer[]))
(4 rows)

Time: 1,352 ms

I'm not planner guru but it seems to me that BETWEEN clause could be rewritten as a IN clause for integer data types and small interval.

Where should the line be drawn.
Define small :)

Zdenek

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: using hash index when BETWEEN is specified

От

"Robert Haas"

Дата:

10 сентября 2008 г., 11:24:11

>> I'm not planner guru but it seems to me that BETWEEN clause could be
>> rewritten as a IN clause for integer data types and small interval.
>
> Where should the line be drawn.
> Define small :)

When the estimated cost is lower?

...Robert

Re: using hash index when BETWEEN is specified

От

Hannu Krosing

Дата:

10 сентября 2008 г., 12:25:46

On Wed, 2008-09-10 at 07:13 -0400, Robert Haas wrote:
> >> I'm not planner guru but it seems to me that BETWEEN clause could be
> >> rewritten as a IN clause for integer data types and small interval.
> >
> > Where should the line be drawn.
> > Define small :)
> 
> When the estimated cost is lower?

You still need to draw a line for when to even try estimating the cost .

Will this be interval of 10 ? or 100 ? or 10000 ?

----------------
Hannu

Re: using hash index when BETWEEN is specified

От

Zdenek Kotala

Дата:

10 сентября 2008 г., 14:31:09

Hannu Krosing napsal(a):
> On Wed, 2008-09-10 at 07:13 -0400, Robert Haas wrote:
>>>> I'm not planner guru but it seems to me that BETWEEN clause could be
>>>> rewritten as a IN clause for integer data types and small interval.
>>> Where should the line be drawn.
>>> Define small :)
>> When the estimated cost is lower?
> 
> You still need to draw a line for when to even try estimating the cost .
> 
> Will this be interval of 10 ? or 100 ? or 10000 ?

I think it depends of ration of unique integer number in a table and 
numbers of requested interval, number distribution and total number of rows.

For example if you have 10 distinct number and each has 100 occurrence 
then full scan is better (for between 1 and 5). But if each number 
occurs 100000x. Then using hash index should be effective.
    Zdenek

Re: using hash index when BETWEEN is specified

От

Tom Lane

Дата:

10 сентября 2008 г., 16:13:08

Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
> I think it depends of ration of unique integer number in a table and 
> numbers of requested interval, number distribution and total number of rows.

> For example if you have 10 distinct number and each has 100 occurrence 
> then full scan is better (for between 1 and 5). But if each number 
> occurs 100000x. Then using hash index should be effective.

I think this discussion is a complete waste of time.  Hash indexes don't
win against btrees for single indexscans currently.  Even if that ever
gets fixed, it's highly unlikely that they'd win for N separate
indexscans versus 1 indexscan, which is what a query rewrite of this
sort would produce.  Remember that the btree will have the desired range
of keys stored adjacently, whereas in a hash they are almost certainly
in distinct buckets, and likely not even close-together buckets if the
hash function is doing its job well.  So you really are talking about a
factor of N both in indexscan setup overhead and in I/O costs.
        regards, tom lane

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: using hash index when BETWEEN is specified

using hash index when BETWEEN is specified

Re: using hash index when BETWEEN is specified

Re: using hash index when BETWEEN is specified

Re: using hash index when BETWEEN is specified

Re: using hash index when BETWEEN is specified

Re: using hash index when BETWEEN is specified