Re: querying with index on jsonb slower than standard column. Why?

Поиск
Список
Период
Сортировка
От Adrian Klaver
Тема Re: querying with index on jsonb slower than standard column. Why?
Дата
Msg-id 5485C449.4020204@aklaver.com
обсуждение исходный текст
Ответ на Re: querying with index on jsonb slower than standard column. Why?  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: querying with index on jsonb slower than standard column. Why?  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-sql
On 12/07/2014 05:28 PM, Tom Lane wrote:
> Tim Dudgeon <tdudgeon.ml@gmail.com> writes:
>> The index created is not a gin index. Its a standard btree index on the
>> data extracted from the json. So the indexes on the standard columns and
>> the ones on the 'fields' extracted from the json seem to be equivalent.
>> But perform differently.
> 
> I don't see any particular difference ...
> 
> regression=# explain analyze select count(*) from json_test where (data->>'assay1_ic50')::float > 90
> and (data->>'assay2_ic50')::float < 10;
>                                                                                       QUERY PLAN
>
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>   Aggregate  (cost=341613.79..341613.80 rows=1 width=0) (actual time=901.207..901.208 rows=1 loops=1)
>     ->  Bitmap Heap Scan on json_test  (cost=123684.69..338836.02 rows=1111111 width=0) (actual time=497.982..887.128
rows=100690loops=1)
 
>           Recheck Cond: ((((data ->> 'assay2_ic50'::text))::double precision < 10::double precision) AND (((data ->>
'assay1_ic50'::text))::doubleprecision > 90::double precision))
 
>           Heap Blocks: exact=77578
>           ->  BitmapAnd  (cost=123684.69..123684.69 rows=1111111 width=0) (actual time=476.585..476.585 rows=0
loops=1)
>                 ->  Bitmap Index Scan on idx_data_json_assay2_ic50  (cost=0.00..61564.44 rows=3333333 width=0)
(actualtime=219.287..219.287 rows=999795 loops=1)
 
>                       Index Cond: (((data ->> 'assay2_ic50'::text))::double precision < 10::double precision)
>                 ->  Bitmap Index Scan on idx_data_json_assay1_ic50  (cost=0.00..61564.44 rows=3333333 width=0)
(actualtime=208.197..208.197 rows=1000231 loops=1)
 
>                       Index Cond: (((data ->> 'assay1_ic50'::text))::double precision > 90::double precision)
>   Planning time: 0.128 ms
>   Execution time: 904.196 ms
> (11 rows)
> 
> regression=# explain analyze select count(*) from json_test where assay1_ic50 > 90 and assay2_ic50 < 10;
>                                                                             QUERY PLAN
>
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
>   Aggregate  (cost=197251.24..197251.25 rows=1 width=0) (actual time=895.238..895.238 rows=1 loops=1)
>     ->  Bitmap Heap Scan on json_test  (cost=36847.25..197003.24 rows=99197 width=0) (actual time=495.427..881.033
rows=100690loops=1)
 
>           Recheck Cond: ((assay2_ic50 < 10::double precision) AND (assay1_ic50 > 90::double precision))
>           Heap Blocks: exact=77578
>           ->  BitmapAnd  (cost=36847.25..36847.25 rows=99197 width=0) (actual time=474.201..474.201 rows=0 loops=1)
>                 ->  Bitmap Index Scan on idx_data_col_assay2_ic50  (cost=0.00..18203.19 rows=985434 width=0) (actual
time=219.060..219.060rows=999795 loops=1)
 
>                       Index Cond: (assay2_ic50 < 10::double precision)
>                 ->  Bitmap Index Scan on idx_data_col_assay1_ic50  (cost=0.00..18594.21 rows=1006637 width=0) (actual
time=206.066..206.066rows=1000231 loops=1)
 
>                       Index Cond: (assay1_ic50 > 90::double precision)
>   Planning time: 0.129 ms
>   Execution time: 898.237 ms
> (11 rows)
> 
> regression=# \timing
> Timing is on.
> regression=# select count(*) from json_test where (data->>'assay1_ic50')::float > 90
> and (data->>'assay2_ic50')::float < 10;
>   count
> --------
>   100690
> (1 row)
> 
> Time: 882.607 ms
> regression=# select count(*) from json_test where assay1_ic50 > 90 and assay2_ic50 < 10;
>   count
> --------
>   100690
> (1 row)
> 
> Time: 881.071 ms
> 
>             regards, tom lane
> 
> 

Running the above on my machine I do see the slow down the OP reports. I ran it several times 
and it stayed around 3.5x. It might be interesting to get the OS and architecture information 
from the OP.

test=# select version();

version


------------------------------------------------------------------------------------------------------------------------------
                                                    PostgreSQL 9.4rc1 on i686-pc-linux-gnu, compiled by gcc (SUSE
Linux)4.8.1 20130909 [gcc-4_8-branch revision 202388], 32-bit                                                       
 
(1 row)                           


test=# \timing 
Timing is on.
test=# select count(*) from json_test where (data->>'assay1_ic50')::float > 90 
test-# and (data->>'assay2_ic50')::float < 10;count 
-------99288
(1 row)

Time: 9092.966 ms


test=# select count(*) from json_test where assay1_ic50 > 90 and assay2_ic50 < 10;count 
-------99288
(1 row)
                                                                 
 
Time: 2542.294 ms           


explain analyze select count(*) from json_test where (data->>'assay1_ic50')::float > 90 
and (data->>'assay2_ic50')::float < 10;
                                                                                    QUERY PLAN
                                                           
 

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------Aggregate
(cost=332209.79..332209.80 rows=1 width=0) (actual time=8980.009..8980.009 rows=1 loops=1)  ->  Bitmap Heap Scan on
json_test (cost=123684.69..329432.02 rows=1111111 width=0) (actual time=538.688..8960.308 rows=99288 loops=1)
RecheckCond: ((((data ->> 'assay2_ic50'::text))::double precision < 10::double precision) AND (((data ->>
'assay1_ic50'::text))::doubleprecision > 90::double precision))        Rows Removed by Index Recheck: 7588045
HeapBlocks: exact=20894 lossy=131886        ->  BitmapAnd  (cost=123684.69..123684.69 rows=1111111 width=0) (actual
time=531.066..531.066rows=0 loops=1)              ->  Bitmap Index Scan on idx_data_json_assay2_ic50
(cost=0.00..61564.44rows=3333333 width=0) (actual time=258.717..258.717 rows=998690 loops=1)                    Index
Cond:(((data ->> 'assay2_ic50'::text))::double precision < 10::double precision)              ->  Bitmap Index Scan on
idx_data_json_assay1_ic50 (cost=0.00..61564.44 rows=3333333 width=0) (actual time=251.664..251.664 rows=997880 loops=1)
                  Index Cond: (((data ->> 'assay1_ic50'::text))::double precision > 90::double precision)Planning time:
0.391msExecution time: 8980.391 ms
 
(12 rows)



test=# explain analyze select count(*) from json_test where assay1_ic50 > 90 and assay2_ic50 < 10;
                                                   QUERY PLAN
               
 

----------------------------------------------------------------------------------------------------------------------------------------------------------------Aggregate
(cost=196566.38..196566.39 rows=1 width=0) (actual time=2609.545..2609.545 rows=1 loops=1)  ->  Bitmap Heap Scan on
json_test (cost=37869.00..196304.39 rows=104796 width=0) (actual time=550.273..2590.093 rows=99288 loops=1)
RecheckCond: ((assay2_ic50 < 10::double precision) AND (assay1_ic50 > 90::double precision))        Rows Removed by
IndexRecheck: 7588045        Heap Blocks: exact=20894 lossy=131886        ->  BitmapAnd  (cost=37869.00..37869.00
rows=104796width=0) (actual time=542.666..542.666 rows=0 loops=1)              ->  Bitmap Index Scan on
idx_data_col_assay2_ic50 (cost=0.00..18871.73 rows=1021773 width=0) (actual time=263.959..263.959 rows=998690 loops=1)
                 Index Cond: (assay2_ic50 < 10::double precision)              ->  Bitmap Index Scan on
idx_data_col_assay1_ic50 (cost=0.00..18944.62 rows=1025624 width=0) (actual time=257.912..257.912 rows=997880 loops=1)
                 Index Cond: (assay1_ic50 > 90::double precision)Planning time: 0.834 msExecution time: 2609.960 ms
 
(12 rows)


-- 
Adrian Klaver
adrian.klaver@aklaver.com



В списке pgsql-sql по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: querying with index on jsonb slower than standard column. Why?
Следующее
От: Tom Lane
Дата:
Сообщение: Re: querying with index on jsonb slower than standard column. Why?