Re: Puzzling table scan in a CTE

Поиск
Список
Период
Сортировка
От Elliot
Тема Re: Puzzling table scan in a CTE
Дата
Msg-id 528F8F11.40100@gmail.com
обсуждение исходный текст
Ответ на Puzzling table scan in a CTE  (<slapo@centrum.sk>)
Ответы Re: [GENERAL] Puzzling table scan in a CTE  (<slapo@centrum.sk>)
Список pgsql-general
On 2013-11-22 11:54, slapo@centrum.sk wrote:

Good day,

 

I have a recursive CTE where a table scan occurs, even though there doesn't seem to be a good reason for it.

It seems the planner came to the conclusion that columns that are not actually used in the output, joins or a where clause are a part of the output.

It's not a performance problem now and the query runs quickly (which is why I haven't posted it on the performance mailing list). What bothers me is that the scan seems to be there without being really necessary and I would like to avoid any extra I/O. I would expect the plan to be close the simplified query without CTE posted after the CTE query below.

I'm also worried that this could get worse over time and that it might influence the query's performance.

 

The question:

Am I missing something and the table scan is necessary?

If not, is there a way to avoid it?

 

This is the query:

WITH RECURSIVE user_subordinates 

AS 

(

SELECT

ur1."id" AS var_id,

ur1.id_user_parent AS var_id_user_parent,

ur1.login AS var_login,

ur1_1.login AS var_login_parent,

1::smallint AS var_sort_order

FROM

"user" ur1

JOIN

"user" ur1_1

ON(ur1.id_user_parent=ur1_1."id")

WHERE

ur1.id = 3970

AND ur1.disabled=false

UNION ALL

SELECT

ur2."id" AS var_id,

ur2.id_user_parent AS var_id_user_parent,

ur2.login AS var_login,

ups1.var_login AS var_login_parent,

(ups1.var_sort_order + 1)::smallint AS var_sort_order

FROM

user_subordinates ups1

JOIN

"user" ur2

ON( ups1.var_id = ur2.id_user_parent

AND ups1.var_id <> ur2.id

)

)

SELECT

var_id,

var_id_user_parent,

var_login,

var_login_parent,

var_sort_order

FROM

user_subordinates

ORDER BY

var_sort_order,       

var_id,             

var_id_user_parent

;

This is its execution plan:

http://explain.depesz.com/s/4hY

 

This is a simplified version of the query without CTE:

SELECT

ur2."id" AS var_id,

ur2.id_user_parent AS var_id_user_parent,

ur2.login AS var_login,

ups1.var_login AS var_login_parent,

(ups1.var_sort_order + 1)::smallint AS var_sort_order

FROM

(

SELECT

ur1."id" AS var_id,

ur1.id_user_parent AS var_id_user_parent,

ur1.login AS var_login,

ur1_1.login AS var_login_parent,

1::smallint AS var_sort_order

FROM

"user" ur1

JOIN

"user" ur1_1

ON(ur1.id_user_parent=ur1_1."id")

WHERE

ur1.id = 3970

AND ur1.disabled=false

) ups1

JOIN

"user" ur2

ON( ups1.var_id = ur2.id_user_parent

AND ups1.var_id <> ur2.id

)

;

 

Its plan is here:

http://explain.depesz.com/s/Ak3

 

Here's the table's DDL:

CREATE TABLE "user"

(

id bigserial NOT NULL, -- Unique row identifier

id_user_parent bigint NOT NULL DEFAULT 3, -- Identifier of user's parent user

id_address bigint NOT NULL,

login character varying NOT NULL, -- Login name of a user

password character varying NOT NULL, -- Login password of a user

date_created timestamp without time zone NOT NULL DEFAULT now(), -- Date and time at which the record was created

id_user_created bigint NOT NULL, -- Identifier of a user who created the row

date_modified timestamp without time zone, -- Date and time at which the record was modified

id_user_modified bigint, -- Identifier of a user who modified the row

disabled boolean NOT NULL DEFAULT true, -- Its value is set to true when user account is disabled, false when enabled.

activation_hash character varying NOT NULL, -- Activation hash that a hashed string sent to the user matches against during activation.

invoiced boolean NOT NULL DEFAULT false, -- True is user wishes to be invoiced, false otherwise.

CONSTRAINT user_pkey PRIMARY KEY (id),

CONSTRAINT user_id_address_fkey FOREIGN KEY (id_address) REFERENCES address (id),

CONSTRAINT user_id_user_created_fkey FOREIGN KEY (id_user_created) REFERENCES "user" (id),

CONSTRAINT user_id_user_modified_fkey FOREIGN KEY (id_user_modified) REFERENCES "user" (id),

CONSTRAINT u_login UNIQUE (login)

);

CREATE INDEX fki_user_id_address_fkey ON "user" USING btree(id_address);

CREATE INDEX ix__user__id_user_parent ON "user" USING btree(id_user_parent);

CREATE UNIQUE INDEX ix__user__login ON "user" USING btree (lower(login::text) COLLATE pg_catalog."default");

CREATE UNIQUE INDEX ix__user__login__varchar_pattern_ops ON "user" USING btree (lower(login::text) COLLATE pg_catalog."default" varchar_pattern_ops);

 

Environment: Windows 7 Professional x64, PostgreSQL 9.3.1. The table has been analysed before executing the query and getting the explain result. It's on my workstation and nobody else uses the database but me.

 

Any thought on this are appreciated.

 

Thank you.

 

Peter Slapansky

 

You can "set enable_seqscan = off" to see what the planner thinks of using the index on id_user_parent.
Does every user have a parent? Or a child? Those will affect how many matches you'll find, and if it's a lot then it'll be faster doing the table scan.
Also, have you changed any configuration settings? random_page_cost might need some tuning.


В списке pgsql-general по дате отправления:

Предыдущее
От: Alvaro Herrera
Дата:
Сообщение: Re: include all the postgres libraries (C)
Следующее
От:
Дата:
Сообщение: Re: [GENERAL] Puzzling table scan in a CTE