Обсуждение: EXISTS by itself vs SELECT EXISTS much slower in query.
A description of what you are trying to achieve and what results you expect.:
I have two equivalent queries, one with an EXISTS clause by itself and one wrapped in a (SELECT EXISTS) and the "naked" exists is much slower.
I would expect both to be the same speed / have same execution plan.
-- slow
explain (analyze, buffers)
SELECT
parent.*,
EXISTS (SELECT * FROM child WHERE child.parent_id=parent.parent_id) AS child_exists
FROM parent
ORDER BY parent_id LIMIT 10;
-- fast
explain (analyze, buffers)
SELECT
parent.*,
(SELECT EXISTS (SELECT * FROM child WHERE child.parent_id=parent.parent_id)) AS child_exists
FROM parent
ORDER BY parent_id LIMIT 10;
-- slow
https://explain.depesz.com/s/DzcK
-- fast
https://explain.depesz.com/s/EftS
Setup:
CREATE TABLE parent(parent_id BIGSERIAL PRIMARY KEY, name text);
CREATE TABLE child(child_id BIGSERIAL PRIMARY KEY, parent_id bigint references parent(parent_id), name text);
-- random name and sequential primary key for 100 thousand parents.
INSERT INTO parent
SELECT
nextval('parent_parent_id_seq'),
md5(random()::text)
FROM generate_series(1, 100000);
-- 1 million children.
-- set every odd id parent to have children. even id parent gets none.
INSERT INTO child
SELECT
nextval('child_child_id_seq'),
((generate_series/2*2) % 100000)::bigint + 1,
md5(random()::text)
FROM generate_series(1, 1000000);
CREATE INDEX ON child(parent_id);
VACUUM ANALYZE parent, child;
Both queries return the same results - I have taken a md5 of both queries without the LIMIT clause to confirm.
Tables have been vacuumed and analyzed.
No other queries are being executed.
Reproducible with LIMIT 1 or LIMIT 100 or LIMIT 500.
Changing work_mem makes no difference.
-[ RECORD 1 ]--+---------
relname | parent
relpages | 935
reltuples | 100000
relallvisible | 935
relkind | r
relnatts | 2
relhassubclass | f
reloptions |
pg_table_size | 7700480
-[ RECORD 2 ]--+---------
relname | child
relpages | 10310
reltuples | 1e+06
relallvisible | 10310
relkind | r
relnatts | 3
relhassubclass | f
reloptions |
pg_table_size | 84516864
PostgreSQL version number you are running:
PostgreSQL 13.4 on arm-apple-darwin20.5.0, compiled by Apple clang version 12.0.5 (clang-1205.0.22.9), 64-bit
How you installed PostgreSQL:
Using homebrew for mac.
brew install postgres
Changes made to the settings in the postgresql.conf file: see Server Configuration for a quick way to list them all.
checkpoint_completion_target | 0.9 | configuration file
checkpoint_timeout | 30min | configuration file
client_encoding | UTF8 | client
cpu_tuple_cost | 0.03 | configuration file
effective_cache_size | 4GB | configuration file
log_directory | log | configuration file
log_min_duration_statement | 25ms | configuration file
log_statement | none | configuration file
log_temp_files | 0 | configuration file
log_timezone | America/Anchorage | configuration file
maintenance_work_mem | 512MB | configuration file
max_parallel_maintenance_workers | 2 | configuration file
max_parallel_workers | 4 | configuration file
max_parallel_workers_per_gather | 4 | configuration file
max_stack_depth | 2MB | environment variable
max_wal_size | 10GB | configuration file
max_worker_processes | 4 | configuration file
min_wal_size | 80MB | configuration file
random_page_cost | 1.1 | configuration file
shared_buffers | 512MB | configuration file
shared_preload_libraries | auto_explain | configuration file
track_io_timing | on | configuration file
vacuum_cost_limit | 1000 | configuration file
wal_buffers | 64MB | configuration file
wal_compression | on | configuration file
work_mem | 128MB | configuration file
Operating system and version:
macOS Big Sur 11.2.3
I have confirmed this to happen on ubuntu linux however.
What program you're using to connect to PostgreSQL:
psql
Is there anything relevant or unusual in the PostgreSQL server logs?:
no
Hardware specs:
MacBook Air10,1 M1
8GB RAM
APPLE SSD AP0512Q 500.28GB
I have two equivalent queries, one with an EXISTS clause by itself and one wrapped in a (SELECT EXISTS) and the "naked" exists is much slower.
I would expect both to be the same speed / have same execution plan.
-- slow
explain (analyze, buffers)
SELECT
parent.*,
EXISTS (SELECT * FROM child WHERE child.parent_id=parent.parent_id) AS child_exists
FROM parent
ORDER BY parent_id LIMIT 10;
-- fast
explain (analyze, buffers)
SELECT
parent.*,
(SELECT EXISTS (SELECT * FROM child WHERE child.parent_id=parent.parent_id)) AS child_exists
FROM parent
ORDER BY parent_id LIMIT 10;
-- slow
https://explain.depesz.com/s/DzcK
-- fast
https://explain.depesz.com/s/EftS
Setup:
CREATE TABLE parent(parent_id BIGSERIAL PRIMARY KEY, name text);
CREATE TABLE child(child_id BIGSERIAL PRIMARY KEY, parent_id bigint references parent(parent_id), name text);
-- random name and sequential primary key for 100 thousand parents.
INSERT INTO parent
SELECT
nextval('parent_parent_id_seq'),
md5(random()::text)
FROM generate_series(1, 100000);
-- 1 million children.
-- set every odd id parent to have children. even id parent gets none.
INSERT INTO child
SELECT
nextval('child_child_id_seq'),
((generate_series/2*2) % 100000)::bigint + 1,
md5(random()::text)
FROM generate_series(1, 1000000);
CREATE INDEX ON child(parent_id);
VACUUM ANALYZE parent, child;
Both queries return the same results - I have taken a md5 of both queries without the LIMIT clause to confirm.
Tables have been vacuumed and analyzed.
No other queries are being executed.
Reproducible with LIMIT 1 or LIMIT 100 or LIMIT 500.
Changing work_mem makes no difference.
-[ RECORD 1 ]--+---------
relname | parent
relpages | 935
reltuples | 100000
relallvisible | 935
relkind | r
relnatts | 2
relhassubclass | f
reloptions |
pg_table_size | 7700480
-[ RECORD 2 ]--+---------
relname | child
relpages | 10310
reltuples | 1e+06
relallvisible | 10310
relkind | r
relnatts | 3
relhassubclass | f
reloptions |
pg_table_size | 84516864
PostgreSQL version number you are running:
PostgreSQL 13.4 on arm-apple-darwin20.5.0, compiled by Apple clang version 12.0.5 (clang-1205.0.22.9), 64-bit
How you installed PostgreSQL:
Using homebrew for mac.
brew install postgres
Changes made to the settings in the postgresql.conf file: see Server Configuration for a quick way to list them all.
checkpoint_completion_target | 0.9 | configuration file
checkpoint_timeout | 30min | configuration file
client_encoding | UTF8 | client
cpu_tuple_cost | 0.03 | configuration file
effective_cache_size | 4GB | configuration file
log_directory | log | configuration file
log_min_duration_statement | 25ms | configuration file
log_statement | none | configuration file
log_temp_files | 0 | configuration file
log_timezone | America/Anchorage | configuration file
maintenance_work_mem | 512MB | configuration file
max_parallel_maintenance_workers | 2 | configuration file
max_parallel_workers | 4 | configuration file
max_parallel_workers_per_gather | 4 | configuration file
max_stack_depth | 2MB | environment variable
max_wal_size | 10GB | configuration file
max_worker_processes | 4 | configuration file
min_wal_size | 80MB | configuration file
random_page_cost | 1.1 | configuration file
shared_buffers | 512MB | configuration file
shared_preload_libraries | auto_explain | configuration file
track_io_timing | on | configuration file
vacuum_cost_limit | 1000 | configuration file
wal_buffers | 64MB | configuration file
wal_compression | on | configuration file
work_mem | 128MB | configuration file
Operating system and version:
macOS Big Sur 11.2.3
I have confirmed this to happen on ubuntu linux however.
What program you're using to connect to PostgreSQL:
psql
Is there anything relevant or unusual in the PostgreSQL server logs?:
no
Hardware specs:
MacBook Air10,1 M1
8GB RAM
APPLE SSD AP0512Q 500.28GB
Вложения
postgresql 14, linux with: CREATE TABLE child(child_id bigint generated always as identity PRIMARY KEY, parent_id bigint references parent(parent_id), name text); CREATE TABLE child(child_id bigint generated always as identity PRIMARY KEY, parent_id bigint references parent(parent_id), name text); --------- INSERT INTO parent(name) SELECT md5(random()::text) FROM generate_series(1, 100000); --------- INSERT INTO child(parent_id, name) SELECT ((generate_series/2*2) % 100000)::bigint + 1, md5(random()::text) FROM generate_series(1, 1000000); --------- CREATE INDEX ON child(parent_id); VACUUM ANALYZE parent, child; slow: explain (analyze, buffers) SELECT parent.*, EXISTS (SELECT * FROM child WHERE child.parent_id=parent.parent_id) AS child_exists FROM parent ORDER BY parent_id LIMIT 10; https://explain.depesz.com/s/Sx9t fast: explain (analyze, buffers) SELECT parent.*, (SELECT EXISTS (SELECT * FROM child WHERE child.parent_id=parent.parent_id)) AS child_exists FROM parent ORDER BY parent_id LIMIT 10; https://explain.depesz.com/s/mIXR ------- so, this looks strange. On 11/8/21, Jimmy A <jimmypsql@gmail.com> wrote: > A description of what you are trying to achieve and what results you > expect.: > I have two equivalent queries, one with an EXISTS clause by itself and one > wrapped in a (SELECT EXISTS) and the "naked" exists is much slower. > I would expect both to be the same speed / have same execution plan. > > -- slow > explain (analyze, buffers) > SELECT > parent.*, > EXISTS (SELECT * FROM child WHERE child.parent_id=parent.parent_id) AS > child_exists > FROM parent > ORDER BY parent_id LIMIT 10; > > -- fast > explain (analyze, buffers) > SELECT > parent.*, > (SELECT EXISTS (SELECT * FROM child WHERE > child.parent_id=parent.parent_id)) AS child_exists > FROM parent > ORDER BY parent_id LIMIT 10; > > -- slow > https://explain.depesz.com/s/DzcK > > -- fast > https://explain.depesz.com/s/EftS > > Setup: > CREATE TABLE parent(parent_id BIGSERIAL PRIMARY KEY, name text); > CREATE TABLE child(child_id BIGSERIAL PRIMARY KEY, parent_id bigint > references parent(parent_id), name text); > > -- random name and sequential primary key for 100 thousand parents. > INSERT INTO parent > SELECT > nextval('parent_parent_id_seq'), > md5(random()::text) > FROM generate_series(1, 100000); > > -- 1 million children. > -- set every odd id parent to have children. even id parent gets none. > INSERT INTO child > SELECT > nextval('child_child_id_seq'), > ((generate_series/2*2) % 100000)::bigint + 1, > md5(random()::text) > FROM generate_series(1, 1000000); > > CREATE INDEX ON child(parent_id); > VACUUM ANALYZE parent, child; > > Both queries return the same results - I have taken a md5 of both queries > without the LIMIT clause to confirm. > Tables have been vacuumed and analyzed. > No other queries are being executed. > Reproducible with LIMIT 1 or LIMIT 100 or LIMIT 500. > Changing work_mem makes no difference. > > -[ RECORD 1 ]--+--------- > relname | parent > relpages | 935 > reltuples | 100000 > relallvisible | 935 > relkind | r > relnatts | 2 > relhassubclass | f > reloptions | > pg_table_size | 7700480 > -[ RECORD 2 ]--+--------- > relname | child > relpages | 10310 > reltuples | 1e+06 > relallvisible | 10310 > relkind | r > relnatts | 3 > relhassubclass | f > reloptions | > pg_table_size | 84516864 > > PostgreSQL version number you are running: > PostgreSQL 13.4 on arm-apple-darwin20.5.0, compiled by Apple clang version > 12.0.5 (clang-1205.0.22.9), 64-bit > > How you installed PostgreSQL: > Using homebrew for mac. > brew install postgres > > Changes made to the settings in the postgresql.conf file: see Server > Configuration for a quick way to list them all. > checkpoint_completion_target | 0.9 | configuration > file > checkpoint_timeout | 30min | configuration > file > client_encoding | UTF8 | client > cpu_tuple_cost | 0.03 | configuration > file > effective_cache_size | 4GB | configuration > file > log_directory | log | configuration > file > log_min_duration_statement | 25ms | configuration > file > log_statement | none | configuration > file > log_temp_files | 0 | configuration > file > log_timezone | America/Anchorage | configuration > file > maintenance_work_mem | 512MB | configuration > file > max_parallel_maintenance_workers | 2 | configuration > file > max_parallel_workers | 4 | configuration > file > max_parallel_workers_per_gather | 4 | configuration > file > max_stack_depth | 2MB | environment > variable > max_wal_size | 10GB | configuration > file > max_worker_processes | 4 | configuration > file > min_wal_size | 80MB | configuration > file > random_page_cost | 1.1 | configuration > file > shared_buffers | 512MB | configuration > file > shared_preload_libraries | auto_explain | configuration > file > track_io_timing | on | configuration > file > vacuum_cost_limit | 1000 | configuration > file > wal_buffers | 64MB | configuration > file > wal_compression | on | configuration > file > work_mem | 128MB | configuration > file > > Operating system and version: > macOS Big Sur 11.2.3 > I have confirmed this to happen on ubuntu linux however. > > What program you're using to connect to PostgreSQL: > psql > > Is there anything relevant or unusual in the PostgreSQL server logs?: > no > > Hardware specs: > MacBook Air10,1 M1 > 8GB RAM > APPLE SSD AP0512Q 500.28GB > -- Respectfully, Boytsov Vasya
Jimmy A <jimmypsql@gmail.com> writes: > I have two equivalent queries, one with an EXISTS clause by itself and one > wrapped in a (SELECT EXISTS) and the "naked" exists is much slower. > I would expect both to be the same speed / have same execution plan. That is a dangerous assumption. In general, wrapping (SELECT ...) around something has a significant performance impact, because it pushes Postgres to try to decouple the sub-select's execution from the outer query. As an example, postgres=# select x, random() from generate_series(1,3) x; x | random ---+--------------------- 1 | 0.08595356832524814 2 | 0.6444265043474005 3 | 0.6878852071694332 (3 rows) postgres=# select x, (select random()) from generate_series(1,3) x; x | random ---+-------------------- 1 | 0.7028987801136708 2 | 0.7028987801136708 3 | 0.7028987801136708 (3 rows) That's not a bug: it's expected that the second query will evaluate random() only once. In the case at hand, I suspect you're getting a "hashed subplan" in one query and not the other. The depesz.com display doesn't really show that, but EXPLAIN VERBOSE would. regards, tom lane
I see, I never knew that.
Indeed there is a hashed subplan for the EXISTS by itself. So that explains it.
Thanks Tom.
Indeed there is a hashed subplan for the EXISTS by itself. So that explains it.
Thanks Tom.
On Mon, Nov 8, 2021 at 12:35 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Jimmy A <jimmypsql@gmail.com> writes:
> I have two equivalent queries, one with an EXISTS clause by itself and one
> wrapped in a (SELECT EXISTS) and the "naked" exists is much slower.
> I would expect both to be the same speed / have same execution plan.
That is a dangerous assumption. In general, wrapping (SELECT ...) around
something has a significant performance impact, because it pushes Postgres
to try to decouple the sub-select's execution from the outer query.
As an example,
postgres=# select x, random() from generate_series(1,3) x;
x | random
---+---------------------
1 | 0.08595356832524814
2 | 0.6444265043474005
3 | 0.6878852071694332
(3 rows)
postgres=# select x, (select random()) from generate_series(1,3) x;
x | random
---+--------------------
1 | 0.7028987801136708
2 | 0.7028987801136708
3 | 0.7028987801136708
(3 rows)
That's not a bug: it's expected that the second query will evaluate
random() only once.
In the case at hand, I suspect you're getting a "hashed subplan"
in one query and not the other. The depesz.com display doesn't
really show that, but EXPLAIN VERBOSE would.
regards, tom lane