Thread: Query is slower with a large proportion of NULLs in several columns


Query is slower with a large proportion of NULLs in several columns

From:
Lars Bergeson
Date:
First of all, here is the version of PostgreSQL I'm using:
PostgreSQL 13.3 on x86_64-pc-linux-gnu, compiled by x86_64-pc-linux-gnu-gcc (GCC) 7.4.0, 64-bit

I'm new to PostgreSQL, and I'm deciding if I should make columns in my database nullable or not.

I have no need to distinguish between blank/zero and null. But I have noticed that using NULL for unused values does save disk space, as opposed to using zero/blank default values.

In my table I have 142 columns and 18,508,470 rows. Using NULLs instead of zero/blank reduces the table storage space from 11.462 GB down to 9.120 GB. That's a 20% reduction in size, and from what I know about the data it's about right that 20% of the values in the database are unused.
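(The sizes can be compared with the standard size functions; a quick check, noting that pg_table_size() excludes indexes while pg_total_relation_size() includes them:)

-- heap size of each table, indexes excluded
SELECT pg_size_pretty(pg_table_size('eu.royalty_no_null'))   AS no_null_size,
       pg_size_pretty(pg_table_size('eu.royalty_with_null')) AS with_null_size;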

I would think that any full table scan would run faster against the table with the null values, since there are fewer data pages to read. Actually, the opposite is true, and by quite a large margin (about 50% slower!).

In the "Slow Query Questions" guide, it says to mention if the table
  • has a large proportion of NULLs in several columns
Yes, it does: I'd estimate that about 20% of the column values are null. Why does this matter? Is this a known thing about PostgreSQL performance? If so, where do I read about it?
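(For reference, the actual per-column null fraction can be read back from the planner statistics once the table is analyzed; shown here for the nullable table:)

-- null fraction per column, as estimated by ANALYZE
SELECT attname, null_frac
  FROM pg_stats
 WHERE schemaname = 'eu' AND tablename = 'royalty_with_null'
 ORDER BY null_frac DESC;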

The table does not contain large objects, has been freshly loaded (so not a lot of UPDATEs/DELETEs), is not growing, has only its one primary-key index, and does not use triggers.

Anyway, below are the query results. The field being selected (creation_user) is not in any index, which forces a full table scan:

--> 18.844 sec to execute when all columns are defined NOT NULL with defaults; table size is 11.462 GB:
select creation_user, count(*)
   from eu.royalty_no_null
 group by creation_user;

creation_user|count   |
-------------+--------+
[BLANK]      |   84546|
BACOND       |      10|
BALUN        |    2787|
FOGGOL       |     109|
TRBATCH      |18421018|
QUERY PLAN
Finalize GroupAggregate  (cost=1515478.96..1515479.72 rows=3 width=15) (actual time=11133.324..11135.311 rows=5 loops=1)
  Group Key: creation_user
  I/O Timings: read=1884365.335
  ->  Gather Merge  (cost=1515478.96..1515479.66 rows=6 width=15) (actual time=11133.315..11135.300 rows=13 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        I/O Timings: read=1884365.335
        ->  Sort  (cost=1514478.94..1514478.95 rows=3 width=15) (actual time=11127.396..11127.398 rows=4 loops=3)
              Sort Key: creation_user
              Sort Method: quicksort  Memory: 25kB
              I/O Timings: read=1884365.335
              Worker 0:  Sort Method: quicksort  Memory: 25kB
              Worker 1:  Sort Method: quicksort  Memory: 25kB
              ->  Partial HashAggregate  (cost=1514478.89..1514478.92 rows=3 width=15) (actual time=11127.370..11127.372 rows=4 loops=3)
                    Group Key: creation_user
                    Batches: 1  Memory Usage: 24kB
                    I/O Timings: read=1884365.335
                    Worker 0:  Batches: 1  Memory Usage: 40kB
                    Worker 1:  Batches: 1  Memory Usage: 40kB
                    ->  Parallel Seq Scan on royalty_no_null  (cost=0.00..1475918.59 rows=7712059 width=7) (actual time=0.006..9339.296 rows=6169490 loops=3)
                          I/O Timings: read=1884365.335
Settings: effective_cache_size = '21553496kB', maintenance_io_concurrency = '1', search_path = 'public, public, "$user"'
Planning Time: 0.098 ms
Execution Time: 11135.368 ms


--> 30.57 sec to execute when all columns are nullable instead of defaulting to zero/blank; table size is 9.120 GB:
select creation_user, count(*)
   from eu.royalty_with_null
 group by creation_user;

creation_user|count   |
-------------+--------+
BACOND       |      10|
BALUN        |    2787|
FOGGOL       |     109|
TRBATCH      |18421018|
[NULL]       |   84546|
QUERY PLAN
Finalize GroupAggregate  (cost=1229649.93..1229650.44 rows=2 width=15) (actual time=25404.925..25407.262 rows=5 loops=1)
  Group Key: creation_user
  I/O Timings: read=17141420.771
  ->  Gather Merge  (cost=1229649.93..1229650.40 rows=4 width=15) (actual time=25404.917..25407.249 rows=12 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        I/O Timings: read=17141420.771
        ->  Sort  (cost=1228649.91..1228649.91 rows=2 width=15) (actual time=25398.004..25398.006 rows=4 loops=3)
              Sort Key: creation_user
              Sort Method: quicksort  Memory: 25kB
              I/O Timings: read=17141420.771
              Worker 0:  Sort Method: quicksort  Memory: 25kB
              Worker 1:  Sort Method: quicksort  Memory: 25kB
              ->  Partial HashAggregate  (cost=1228649.88..1228649.90 rows=2 width=15) (actual time=25397.918..25397.920 rows=4 loops=3)
                    Group Key: creation_user
                    Batches: 1  Memory Usage: 24kB
                    I/O Timings: read=17141420.771
                    Worker 0:  Batches: 1  Memory Usage: 40kB
                    Worker 1:  Batches: 1  Memory Usage: 40kB
                    ->  Parallel Seq Scan on royalty_with_null  (cost=0.00..1190094.92 rows=7710992 width=7) (actual time=1.063..21481.517 rows=6169490 loops=3)
                          I/O Timings: read=17141420.771
Settings: effective_cache_size = '21553496kB', maintenance_io_concurrency = '1', search_path = 'public, public, "$user"'
Planning Time: 0.112 ms
Execution Time: 25407.318 ms

The query runs about 50% longer, even though I would think there are 20% fewer disk pages to read!

In this particular column, creation_user, very few values are unused; in many other columns of the table, far more rows have an unused value (blank/zero or NULL).

It seems to make no difference if the column selected is near the beginning of the row or the end, results are about the same.

What is it about null values in the table that slows down the full table scan?

If I populate blank/zero for all of the unused values in columns that are NULLable, the query is fast again. So just defining the columns as NULLable isn't what slows it down -- it's actually the NULL values in the rows that seem to degrade performance.
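(A sketch of the kind of backfill I mean -- only two columns shown; the real statement touches every nullable column:)

-- replace NULLs with the type-appropriate "unused" value
UPDATE eu.royalty_with_null
   SET sub_division   = COALESCE(sub_division, ''),
       top_price_perc = COALESCE(top_price_perc, 0)
       -- ...and likewise for the remaining nullable columns
 WHERE sub_division IS NULL OR top_price_perc IS NULL;
VACUUM FULL eu.royalty_with_null;  -- rewrite the table so the old row versions go away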

Do other operations (besides full table scan) get slowed down by null values as well?

Here is the table definition with no nulls; the other table is identical except that all columns are NULLable.
CREATE TABLE eu.royalty_no_null (
isn int4 NOT NULL,
contract_key numeric(19) NOT NULL DEFAULT 0,
co_code numeric(7) NOT NULL DEFAULT 0,
rec_status numeric(1) NOT NULL DEFAULT 0,
record_type numeric(1) NOT NULL DEFAULT 0,
contract_no numeric(6) NOT NULL DEFAULT 0,
subcon_no numeric(4) NOT NULL DEFAULT 0,
sub_division varchar(6) NOT NULL DEFAULT '',
top_price_perc numeric(3) NOT NULL DEFAULT 0,
bar_code_ind varchar(1) NOT NULL DEFAULT '',
process_step_no varchar(1) NOT NULL DEFAULT '',
main_code_group varchar(1) NOT NULL DEFAULT '',
condition varchar(1) NOT NULL DEFAULT '',
neg_sales_ind varchar(1) NOT NULL DEFAULT '',
con_type_ind varchar(1) NOT NULL DEFAULT '',
exchg_tape_ind varchar(1) NOT NULL DEFAULT '',
bagatelle_ind varchar(1) NOT NULL DEFAULT '',
restrict_terr_ind varchar(1) NOT NULL DEFAULT '',
sys_esc_ind varchar(1) NOT NULL DEFAULT '',
equiv_prc_ind varchar(1) NOT NULL DEFAULT '',
scal_fact_ind varchar(1) NOT NULL DEFAULT '',
sell_off_ind varchar(1) NOT NULL DEFAULT '',
cut_rate_ind varchar(1) NOT NULL DEFAULT '',
gross_nett_ind varchar(1) NOT NULL DEFAULT '',
sleeve_ind varchar(1) NOT NULL DEFAULT '',
rate_ind varchar(1) NOT NULL DEFAULT '',
price_basis_calc varchar(1) NOT NULL DEFAULT '',
source_price_basis varchar(1) NOT NULL DEFAULT '',
esc_ind varchar(1) NOT NULL DEFAULT '',
payment_period varchar(1) NOT NULL DEFAULT '',
subcon_reserve_ind varchar(1) NOT NULL DEFAULT '',
release_reason_code varchar(1) NOT NULL DEFAULT '',
bagatelle_qty_val_ind varchar(1) NOT NULL DEFAULT '',
payable_ind varchar(1) NOT NULL DEFAULT '',
record_sequence numeric(1) NOT NULL DEFAULT 0,
sales_type numeric(1) NOT NULL DEFAULT 0,
rate_index numeric(1) NOT NULL DEFAULT 0,
participation numeric(7, 4) NOT NULL DEFAULT 0,
contract_co numeric(7) NOT NULL DEFAULT 0,
reporting_co numeric(7) NOT NULL DEFAULT 0,
article_no varchar(13) NOT NULL DEFAULT '',
art_cat_adm numeric(7) NOT NULL DEFAULT 0,
equiv_config varchar(2) NOT NULL DEFAULT '',
music_class numeric(2) NOT NULL DEFAULT 0,
sales_reference_no numeric(7) NOT NULL DEFAULT 0,
sales_processing_no numeric(2) NOT NULL DEFAULT 0,
sales_trans_code numeric(4) NOT NULL DEFAULT 0,
terr_combination numeric(6) NOT NULL DEFAULT 0,
rate_no numeric(7) NOT NULL DEFAULT 0,
rate varchar(20) NOT NULL DEFAULT '',
rate_normal numeric(7, 3) NOT NULL DEFAULT 0,
rate_esc_1 numeric(7, 3) NOT NULL DEFAULT 0,
sales_process_qty numeric(9) NOT NULL DEFAULT 0,
roy_price numeric(11, 4) NOT NULL DEFAULT 0,
royalty_fee numeric(11, 2) NOT NULL DEFAULT 0,
unit_fee numeric(9, 4) NOT NULL DEFAULT 0,
article_release_date numeric(6) NOT NULL DEFAULT 0,
royalty_rate numeric(7, 3) NOT NULL DEFAULT 0,
orig_article_no varchar(13) NOT NULL DEFAULT '',
no_of_records_in_set numeric(3) NOT NULL DEFAULT 0,
sales_end_period_date numeric(7) NOT NULL DEFAULT 0,
sales_settlement_period numeric(5) NOT NULL DEFAULT 0,
sales_processing_date numeric(7) NOT NULL DEFAULT 0,
sales_date numeric(7) NOT NULL DEFAULT 0,
sales_record_no numeric(9) NOT NULL DEFAULT 0,
equiv_prc_perc numeric(5, 2) NOT NULL DEFAULT 0,
scal_fact_perc numeric(5, 2) NOT NULL DEFAULT 0,
sell_off_perc numeric(5, 2) NOT NULL DEFAULT 0,
price_adj numeric(5, 2) NOT NULL DEFAULT 0,
sleeve_allowance numeric(5, 2) NOT NULL DEFAULT 0,
sales_percentage numeric(5, 2) NOT NULL DEFAULT 0,
price_basis_perc numeric(5, 2) NOT NULL DEFAULT 0,
qtr_date_yyyyq numeric(5) NOT NULL DEFAULT 0,
orig_subcon_no numeric(4) NOT NULL DEFAULT 0,
orig_sales_channel varchar(2) NOT NULL DEFAULT '',
orig_sales_terr numeric(3) NOT NULL DEFAULT 0,
subcon_reserve_code numeric(4) NOT NULL DEFAULT 0,
gross_sales_qty numeric(9) NOT NULL DEFAULT 0,
net_sales_qty numeric(9) NOT NULL DEFAULT 0,
sales_reserved_qty numeric(9) NOT NULL DEFAULT 0,
sales_reserved_perc numeric(5, 2) NOT NULL DEFAULT 0,
released_qty numeric(9) NOT NULL DEFAULT 0,
source_tax_perc numeric(5, 2) NOT NULL DEFAULT 0,
source_tax_amount numeric(11, 2) NOT NULL DEFAULT 0,
income_amount numeric(11, 2) NOT NULL DEFAULT 0,
gross_income numeric(11, 2) NOT NULL DEFAULT 0,
exchange_rate numeric(13, 7) NOT NULL DEFAULT 0,
sales_origin_idc varchar(1) NOT NULL DEFAULT '',
black_box varchar(30) NOT NULL DEFAULT '',
contract_expiry_date numeric(6) NOT NULL DEFAULT 0,
contract_expiry_period numeric(3) NOT NULL DEFAULT 0,
bagatelle_amount numeric(9) NOT NULL DEFAULT 0,
to_be_released_date numeric(6) NOT NULL DEFAULT 0,
sales_start_period_date numeric(7) NOT NULL DEFAULT 0,
selling_company numeric(7) NOT NULL DEFAULT 0,
royalty_amount numeric(11, 2) NOT NULL DEFAULT 0,
currency_code numeric(3) NOT NULL DEFAULT 0,
roy_price_curr numeric(11, 4) NOT NULL DEFAULT 0,
royalty_fee_curr numeric(11, 2) NOT NULL DEFAULT 0,
unit_fee_curr numeric(9, 4) NOT NULL DEFAULT 0,
source_tax_amount_curr numeric(11, 2) NOT NULL DEFAULT 0,
royalty_amount_curr numeric(11, 2) NOT NULL DEFAULT 0,
change_description varchar(40) NOT NULL DEFAULT '',
error_no numeric(2) NOT NULL DEFAULT 0,
first_rel_ind varchar(1) NOT NULL DEFAULT '',
max50_percentage numeric(5, 2) NOT NULL DEFAULT 0,
max50_compare_ind varchar(1) NOT NULL DEFAULT '',
ppd_fin_curr numeric(11, 4) NOT NULL DEFAULT 0,
project_ref_nr varchar(15) NOT NULL DEFAULT '',
reserve_priority_ind varchar(1) NOT NULL DEFAULT '',
esca_seqno numeric(7) NOT NULL DEFAULT 0,
cut_rate_perc numeric(5, 2) NOT NULL DEFAULT 0,
cp_details_c int2 NOT NULL DEFAULT 0,
price_info_c int2 NOT NULL DEFAULT 0,
arts_con_recording_c int2 NOT NULL DEFAULT 0,
processing_company numeric(7) NOT NULL DEFAULT 0,
date_time_change_p numeric(13) NOT NULL DEFAULT 0,
locked_ind_b bytea NOT NULL DEFAULT '\x00',
date_time_cleanup_p numeric(13) NOT NULL DEFAULT 0,
min_price_perc numeric(5, 2) NOT NULL DEFAULT 0,
reserve_sale_type varchar(1) NOT NULL DEFAULT '',
uplift_perc numeric(5, 2) NOT NULL DEFAULT 0,
double_ind bytea NOT NULL DEFAULT '\x00',
min_unit_fee numeric(9, 4) NOT NULL DEFAULT 0,
sales_batch_type varchar(1) NOT NULL DEFAULT '',
prev_esca_qty numeric(9) NOT NULL DEFAULT 0,
arts_con_recording_2_c int2 NOT NULL DEFAULT 0,
aif_share_percentage numeric(5, 2) NOT NULL DEFAULT 0,
aif_rate_percentage numeric(5, 2) NOT NULL DEFAULT 0,
original_recording_id varchar(12) NOT NULL DEFAULT '',
inter_companied_line bytea NOT NULL DEFAULT '\x00',
last_chg_date_time timestamp(6) NOT NULL DEFAULT '0001-01-01',
last_chg_by_id varchar(8) NOT NULL DEFAULT '',
creation_date timestamp(6) NOT NULL DEFAULT '0001-01-01',
creation_user varchar(8) NOT NULL DEFAULT '',
aif_sublic_ind varchar(1) NOT NULL DEFAULT '',
sublicensee numeric(7) NOT NULL DEFAULT 0,
scal_fact_qty numeric(5) NOT NULL DEFAULT 0,
pay_delay numeric(3) NOT NULL DEFAULT 0,
recalc_date timestamp(6) NOT NULL DEFAULT '0001-01-01',
recalc_userid varchar(8) NOT NULL DEFAULT '',
sent_to_sap_ind varchar(1) NOT NULL DEFAULT '',
CONSTRAINT royalty_no_null_pkey PRIMARY KEY (isn)
);

Re: Query is slower with a large proportion of NULLs in several columns

From:
"David G. Johnston"
Date:
On Monday, December 20, 2021, Lars Bergeson <larsavatar@gmail.com> wrote:

What is it about null values in the table that slows down the full table scan?

If I populate blank/zero for all of the unused values in columns that are NULLable, the query is fast again. So just defining the columns as NULLable isn't what slows it down -- it's actually the NULL values in the rows that seem to degrade performance.

The presence or absence of the constraint has zero effect on the contents of the page/tuple.  As soon as you have a single null in a row, a null bitmap [1] is added to the stored tuple.  From then on, for every single column, the system has to consult that bitmap to check whether the column's value is null.  Given the number of columns in your table, it's not surprising that this is noticeable.
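A minimal illustration (exact byte counts vary with alignment, but the row containing a NULL carries a bitmap in its header while storing one less value):

-- compare tuple sizes: three int4 attributes vs. the same row with one NULL
SELECT pg_column_size(ROW(1::int, 2::int, 3::int))    AS no_nulls,
       pg_column_size(ROW(1::int, NULL::int, 3::int)) AS with_null;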

David J.

Re: Query is slower with a large proportion of NULLs in several columns

From:
Tom Lane
Date:
Lars Bergeson <larsavatar@gmail.com> writes:
> What is it about null values in the table that slows down the full table
> scan?

If a row has any nulls, then it contains a "nulls bitmap" [1] that says
which columns are null, and that bitmap has to be consulted while
walking through the row contents.  So the most obvious theory here
is that that adds overhead that's significant in your case.  But there
are some holes in that theory, mainly that the I/O timings you are
showing don't seem very consistent:

no nulls:
>   I/O Timings: read=1884365.335
> Execution Time: 11135.368 ms

with nulls:
>   I/O Timings: read=17141420.771
> Execution Time: 25407.318 ms

Regardless of CPU time required, it should not take 10X less I/O
time to read a physically larger table.  So there's something
fairly bogus going on there.  One thing you might try is disabling
parallelism (set max_parallel_workers_per_gather = 0) to see if
that's confusing the numbers somehow.

            regards, tom lane

[1] https://www.postgresql.org/docs/current/storage-page-layout.html#STORAGE-TUPLE-LAYOUT



Re: Query is slower with a large proportion of NULLs in several columns

From:
Lars Bergeson
Date:
ok, here are results after I did:
set max_parallel_workers_per_gather = 0;

no nulls table is 11.462 GB:
QUERY PLAN
HashAggregate  (cost=1676432.13..1676432.16 rows=3 width=15) (actual time=19908.343..19908.345 rows=5 loops=1)
  Group Key: roys_creation_user
  Batches: 1  Memory Usage: 24kB
  I/O Timings: read=532369.898
  ->  Seq Scan on royalty_no_null  (cost=0.00..1583887.42 rows=18508942 width=7) (actual time=0.013..16705.734 rows=18508470 loops=1)
        I/O Timings: read=532369.898
Settings: effective_cache_size = '21553496kB', maintenance_io_concurrency = '1', max_parallel_workers_per_gather = '0', search_path = 'public, public, "$user"'
Planning Time: 0.056 ms
Execution Time: 19908.383 ms

with nulls table is 9.120 GB:
QUERY PLAN
HashAggregate  (cost=1390580.70..1390580.72 rows=2 width=15) (actual time=30369.758..30369.761 rows=5 loops=1)
  Group Key: roys_creation_user
  Batches: 1  Memory Usage: 24kB
  I/O Timings: read=6440851.540
  ->  Seq Scan on royalty_with_null  (cost=0.00..1298048.80 rows=18506380 width=7) (actual time=0.015..25525.104 rows=18508470 loops=1)
        I/O Timings: read=6440851.540
Settings: effective_cache_size = '21553496kB', maintenance_io_concurrency = '1', max_parallel_workers_per_gather = '0', search_path = 'public, public, "$user"'
Planning Time: 0.060 ms
Execution Time: 30369.796 ms

Still taking 10X more I/O to read the smaller table. Very odd.

Regarding the earlier comment from David Johnston: if I put null values in just one of the columns for all rows, that should force a null bitmap to be created for every row, with the same amount of bitmap-checking required. However, the query still runs faster even though the table is larger:

The with-nulls table is 11.604 GB when all values are filled except for one column that is mostly null. The extra 0.14 GB (11.604 GB - 11.462 GB) is probably the space consumed by the null bitmaps:
QUERY PLAN
HashAggregate  (cost=1693765.03..1693765.06 rows=3 width=15) (actual time=26452.653..26452.655 rows=5 loops=1)
  Group Key: roys_creation_user
  Batches: 1  Memory Usage: 24kB
  I/O Timings: read=2706123.209
  ->  Seq Scan on royalty_with_null_cols_filled  (cost=0.00..1601218.02 rows=18509402 width=7) (actual time=0.014..22655.366 rows=18508470 loops=1)
        I/O Timings: read=2706123.209
Settings: effective_cache_size = '21553496kB', maintenance_io_concurrency = '1', max_parallel_workers_per_gather = '0', search_path = 'public, public, "$user"'
Planning Time: 0.068 ms
Execution Time: 26452.691 ms

It seems to be the actual presence of null values that slows things down, even when the same sized null bitmap exists for each row.
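(A sketch of how such a test table can be built; the specific column nulled out is only illustrative:)

-- copy the fully-populated table (CREATE TABLE ... AS drops the NOT NULL constraints),
-- then null out the unused values of one column so those rows gain a null bitmap
CREATE TABLE eu.royalty_with_null_cols_filled AS
  SELECT * FROM eu.royalty_no_null;
UPDATE eu.royalty_with_null_cols_filled
   SET change_description = NULL
 WHERE change_description = '';
VACUUM FULL eu.royalty_with_null_cols_filled;
ANALYZE eu.royalty_with_null_cols_filled;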

On Mon, Dec 20, 2021 at 5:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> If a row has any nulls, then it contains a "nulls bitmap" [1] that says
> which columns are null, and that bitmap has to be consulted while
> walking through the row contents.  [...]  One thing you might try is
> disabling parallelism (set max_parallel_workers_per_gather = 0) to see
> if that's confusing the numbers somehow.
>
> [1] https://www.postgresql.org/docs/current/storage-page-layout.html#STORAGE-TUPLE-LAYOUT

Re: Query is slower with a large proportion of NULLs in several columns

From:
Justin Pryzby
Date:
On Mon, Dec 20, 2021 at 08:11:42PM -0800, Lars Bergeson wrote:
> ok, here are results after I did:
> set max_parallel_workers_per_gather = 0;
> 
> HashAggregate  (cost=1676432.13..1676432.16 rows=3 width=15) (actual time=19908.343..19908.345 rows=5 loops=1)
>   I/O Timings: read=532369.898
> Execution Time: 19908.383 ms

> HashAggregate  (cost=1390580.70..1390580.72 rows=2 width=15) (actual time=30369.758..30369.761 rows=5 loops=1)
>   I/O Timings: read=6440851.540
> Execution Time: 30369.796 ms

> Still taking 10X more I/O to read the smaller table. Very odd.

If I'm not wrong, it's even worse than that ?
It takes 20 or 30sec to run the query - but it says the associated I/O times
are ~500sec or ~6000sec ?

What architecture and OS/version are you running ?
How did you install postgres?  From a package or compiled from source ?

It might be interesting to know the output from something like this command,
depending on whether and where the headers like pg_config_x86_64.h are installed.

grep -r HAVE_CLOCK_GETTIME /usr/pgsql-13/include

-- 
Justin



Re: Query is slower with a large proportion of NULLs in several columns

From:
Tom Lane
Date:
Justin Pryzby <pryzby@telsasoft.com> writes:
> On Mon, Dec 20, 2021 at 08:11:42PM -0800, Lars Bergeson wrote:
>> Still taking 10X more I/O to read the smaller table. Very odd.

> If I'm not wrong, it's even worse than that ?
> It takes 20 or 30sec to run the query - but it says the associated I/O times
> are ~500sec or ~6000sec ?

It would help if somebody had labeled the units of I/O Time
... but I'm guessing those are microsec vs. the millisec
of the other times, because otherwise it's completely wrong.
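(For scale, taking the non-parallel runs above: read=532369.898 interpreted as microseconds is about 0.53 s against a 19.9 s runtime, and read=6440851.540 is about 6.4 s against 30.4 s. Interpreted as milliseconds they would be roughly 532 s and 6,441 s, far beyond the wall-clock times.)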

            regards, tom lane



Re: Query is slower with a large proportion of NULLs in several columns

From:
"David G. Johnston"
Date:
On Monday, December 20, 2021, Justin Pryzby <pryzby@telsasoft.com> wrote:
> If I'm not wrong, it's even worse than that ?
> It takes 20 or 30sec to run the query - but it says the associated I/O times
> are ~500sec or ~6000sec ?

The docs indicate you'll only see I/O timing information if using EXPLAIN (BUFFERS), but I'm not seeing any of the other buffer-related information in these plans.  Thoughts?
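For reference, the invocation that should show both the buffer counts and the timing lines (track_io_timing must be enabled, which needs appropriate privileges):

SET track_io_timing = on;
EXPLAIN (ANALYZE, BUFFERS)
SELECT creation_user, count(*)
  FROM eu.royalty_with_null
 GROUP BY creation_user;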

David J.

Re: Query is slower with a large proportion of NULLs in several columns

From:
"David G. Johnston"
Date:


On Monday, December 20, 2021, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It would help if somebody had labeled the units of I/O Time
> ... but I'm guessing those are microsec vs. the millisec
> of the other times, because otherwise it's completely wrong.


Related to my preceding observation, from the explain (buffers) docs:

“…and the time spent reading and writing data file blocks (in milliseconds) if track_io_timing is enabled.“

David J.

Re: Query is slower with a large proportion of NULLs in several columns

From:
Tom Lane
Date:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Monday, December 20, 2021, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> It would help if somebody had labeled the units of I/O Time
>> ... but I'm guessing those are microsec vs. the millisec
>> of the other times, because otherwise it's completely wrong.

> Related to my preceding observation, from the explain (buffers) docs:
> “…and the time spent reading and writing data file blocks (in milliseconds)
> if track_io_timing
> <https://www.postgresql.org/docs/current/runtime-config-statistics.html#GUC-TRACK-IO-TIMING>
> is enabled.“

Hmm ... the code sure looks like it's supposed to be millisec:

            appendStringInfoString(es->str, "I/O Timings:");
            if (!INSTR_TIME_IS_ZERO(usage->blk_read_time))
                appendStringInfo(es->str, " read=%0.3f",
                                 INSTR_TIME_GET_MILLISEC(usage->blk_read_time));
            if (!INSTR_TIME_IS_ZERO(usage->blk_write_time))
                appendStringInfo(es->str, " write=%0.3f",
                                 INSTR_TIME_GET_MILLISEC(usage->blk_write_time));

And when I try some cases here, I get I/O timing numbers that are
consistent with the overall time reported by EXPLAIN, for example

 Seq Scan on foo  (cost=0.00..843334.10 rows=11000010 width=508) (actual time=0.015..1897.492 rows=11000000 loops=1)
   Buffers: shared hit=15874 read=717460
   I/O Timings: read=1184.638
 Planning:
   Buffers: shared hit=5 read=2
   I/O Timings: read=0.025
 Planning Time: 0.229 ms
 Execution Time: 2151.529 ms

So now we have a real mystery about what is happening on Lars'
system.  Those numbers can't be right.

            regards, tom lane