Обсуждение: Use of index for 50% column restriction

Поиск
Список
Период
Сортировка

Use of index for 50% column restriction

От
Bruce Momjian
Дата:
As part of my research on the parsing/planning behavior of PREPARE, I
found a surprising behavior --- a WHERE clause that is 50% restrictive
is using an index.  I thought only <10% restrictions used indexes.  To
setup the test:
DROP TABLE IF EXISTS test;CREATE TABLE test (c1 INT, c2 INT, c3 INT);INSERT INTO test SELECT c1, 0, 0 FROM
generate_series(1,10000) AS a(c1);INSERT INTO test SELECT c1, 1, 1 FROM generate_series(10001, 20000) AS a(c1);CREATE
INDEXi_test_c2 ON test (c2);ANALYZE test;EXPLAIN SELECT * FROM test WHERE c2 = 0;
 

The output is:
                                  QUERY PLAN
----------------------------------------------------------------------------- Index Scan using i_test_c2 on test
(cost=0.29..349.29rows=10000 width=12)  ----------    Index Cond: (c2 = 0) (2 rows)
 

\timing does show the optimizer is making the right decision to use the
index, and this behavior is the same back to at least 9.3.  Setting
effective_cache_size = '8kB' does not change this behavior.  What am I
missing?  Is my 10% assumption wrong?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +



Re: Use of index for 50% column restriction

От
Tom Lane
Дата:
Bruce Momjian <bruce@momjian.us> writes:
> As part of my research on the parsing/planning behavior of PREPARE, I
> found a surprising behavior --- a WHERE clause that is 50% restrictive
> is using an index.  I thought only <10% restrictions used indexes.

There's no such hard-and-fast rule.  The cost estimate break point depends
greatly on the index order correlation (which is 100% in your example),
as well as some other factors like the index size versus
effective_cache_size.

For randomly-ordered data I believe the cutover is actually well below 10%.
        regards, tom lane



Re: Use of index for 50% column restriction

От
Bruce Momjian
Дата:
On Wed, Jun  8, 2016 at 01:28:54PM -0400, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > As part of my research on the parsing/planning behavior of PREPARE, I
> > found a surprising behavior --- a WHERE clause that is 50% restrictive
> > is using an index.  I thought only <10% restrictions used indexes.
> 
> There's no such hard-and-fast rule.  The cost estimate break point depends
> greatly on the index order correlation (which is 100% in your example),
> as well as some other factors like the index size versus
> effective_cache_size.
> 
> For randomly-ordered data I believe the cutover is actually well below 10%.

Ah, I had not considered the correlation order of the rows in the table.
This test returns the sequential scan I expected by using floor(random()
* 2):
DROP TABLE IF EXISTS test;CREATE TABLE test (c1 INT, c2 INT, c3 INT);INSERT INTO test SELECT c1, floor(random() * 2), 0
FROMgenerate_series(1, 10000) AS a(c1);INSERT INTO test SELECT c1, floor(random() * 2), 1 FROM generate_series(10001,
20000)AS a(c1);CREATE INDEX i_test_c2 ON test (c2);ANALYZE test;EXPLAIN SELECT * FROM test WHERE c2 = 0;
 

Thanks.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +



Re: Use of index for 50% column restriction

От
Bruce Momjian
Дата:
On Wed, Jun  8, 2016 at 05:07:34PM -0400, Bruce Momjian wrote:
> > For randomly-ordered data I believe the cutover is actually well below 10%.
> 
> Ah, I had not considered the correlation order of the rows in the table.
> This test returns the sequential scan I expected by using floor(random()
> * 2):
> 
>     DROP TABLE IF EXISTS test;
>     CREATE TABLE test (c1 INT, c2 INT, c3 INT);
>     INSERT INTO test SELECT c1, floor(random() * 2), 0 FROM generate_series(1, 10000) AS a(c1);
>     INSERT INTO test SELECT c1, floor(random() * 2), 1 FROM generate_series(10001, 20000) AS a(c1);
>     CREATE INDEX i_test_c2 ON test (c2);
>     ANALYZE test;
>     EXPLAIN SELECT * FROM test WHERE c2 = 0;
> 
> Thanks.

Just a follow-up, but even with a randomized correlation order, it seems
25% restrictivity generates a Bitmap Index Scan:
DROP TABLE IF EXISTS test;CREATE TABLE test (c1 INT, c2 INT, c3 INT);INSERT INTO test SELECT c1, abs(floor(random() *
4)-1),abs(floor(random() * 4)-1) FROM generate_series(1, 10000) AS a(c1);INSERT INTO test SELECT c1, abs(floor(random()
*4)-1), abs(floor(random() * 4)-1) FROM generate_series(10001, 15000) AS a(c1);INSERT INTO test SELECT c1,
abs(floor(random()* 4)-1), abs(floor(random() * 4)-1) FROM generate_series(15001, 20000) AS a(c1);CREATE INDEX
i_test_c2ON test (c2);ANALYZE test;
 
SELECT c2, COUNT(*) FROM test GROUP BY c2 ORDER BY 1; c2 | count----+-------  0 |  5020  25%  1 | 10006  50%  2 |  4974
25%
 
EXPLAIN SELECT * FROM TEST WHERE c2 = 1;                            QUERY
PLAN-----------------------------------------------------------Seq Scan on test  (cost=0.00..359.00 rows=10006
width=12)  Filter: (c2 = 1)
 
EXPLAIN SELECT * FROM TEST WHERE c2 = 0;                                 QUERY
PLAN----------------------------------------------------------------------------Bitmap Heap Scan on test
(cost=99.19..270.94rows=5020 width=12)   Recheck Cond: (c2 = 0)   ->  Bitmap Index Scan on i_test_c2  (cost=0.00..97.94
rows=5020width=0)         Index Cond: (c2 = 0)
 

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +



Re: Use of index for 50% column restriction

От
Jim Nasby
Дата:
On 6/8/16 4:36 PM, Bruce Momjian wrote:
> Just a follow-up, but even with a randomized correlation order, it seems
> 25% restrictivity generates a Bitmap Index Scan:

AFAIK we do the bitmap heap scan in heap order, thereby eliminating the 
effect of correlation?
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461



Re: Use of index for 50% column restriction

От
Bruce Momjian
Дата:
On Tue, Jun 14, 2016 at 03:24:12PM -0500, Jim Nasby wrote:
> On 6/8/16 4:36 PM, Bruce Momjian wrote:
> >Just a follow-up, but even with a randomized correlation order, it seems
> >25% restrictivity generates a Bitmap Index Scan:
> 
> AFAIK we do the bitmap heap scan in heap order, thereby eliminating the
> effect of correlation?

High correlation would still cause us to access fewer heap pages than
random data.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +