Discussion: PG planning randomly?
Hi,

I'm having some issues with this simple query:

SELECT
    _comment.*,
    _article.title AS article_title,
    _article.reference AS article_reference
FROM
    _comment
    INNER JOIN _article ON _article.id = _comment.parent_id
WHERE
    _comment.path <@ '0.1.3557034'
ORDER BY
    _comment.date_publishing DESC
OFFSET 0 LIMIT 5
;

The varying information here is the ltree path "0.1.3557034".

Usually it's quite fast (<1s) but sometimes, after an ANALYZE on the
_comment table, it gets so slow it's killing our servers. And it's really
random. We run our servers with default_statistics_target=100; I tried
setting it to 1000 (the max) but it does not change this wrong behavior.

I executed the same query on our 11 servers; 3 of them executed the query
slowly after the ANALYZE. Sometimes it happens to more, sometimes to fewer.

Here is the EXPLAIN ANALYZE output on those 3 servers before and after the
ANALYZE execution.

===== Server 1 =( ======

===== The query on server 1 before an ANALYZE =====

Limit  (cost=16286.04..16286.06 rows=5 width=567) (actual time=62.521..62.526 rows=5 loops=1)
  ->  Sort  (cost=16286.04..16289.89 rows=1539 width=567) (actual time=62.519..62.520 rows=5 loops=1)
        Sort Key: _comment.date_publishing
        ->  Nested Loop  (cost=0.00..16204.57 rows=1539 width=567) (actual time=2.063..44.517 rows=3606 loops=1)
              ->  Index Scan using gist_idx_comment_path on _comment  (cost=0.00..4736.73 rows=1539 width=534) (actual time=2.038..20.487 rows=3748 loops=1)
                    Index Cond: (path <@ '0.1.14666029'::ltree)
              ->  Index Scan using _article_pkey on _article  (cost=0.00..7.44 rows=1 width=41) (actual time=0.004..0.004 rows=1 loops=3748)
                    Index Cond: (_article.id = _comment.parent_id)
Total runtime: 64.844 ms

===== The query on server 1 after an ANALYZE =====

Limit  (cost=0.00..11082.13 rows=5 width=569) (actual time=313945.051..693805.921 rows=5 loops=1)
  ->  Nested Loop  (cost=0.00..34057601.77 rows=15366 width=569) (actual time=313945.049..693805.912 rows=5 loops=1)
        ->  Index Scan Backward using idx_comment_date_publishing on _comment  (cost=0.00..33949736.04 rows=15366 width=536) (actual time=313923.129..693755.772 rows=5 loops=1)
              Filter: (path <@ '0.1.14666029'::ltree)
        ->  Index Scan using _article_pkey on _article  (cost=0.00..7.01 rows=1 width=41) (actual time=10.016..10.018 rows=1 loops=5)
              Index Cond: (_article.id = _comment.parent_id)
Total runtime: 693806.044 ms

===== Poor Server 2 ='( ======

===== The query on server 2 before an ANALYZE =====

Limit  (cost=21096.49..21096.51 rows=5 width=586) (actual time=34.184..34.187 rows=5 loops=1)
  ->  Sort  (cost=21096.49..21100.33 rows=1535 width=586) (actual time=34.182..34.184 rows=5 loops=1)
        Sort Key: _comment.date_publishing
        ->  Nested Loop  (cost=0.00..21015.26 rows=1535 width=586) (actual time=0.119..25.232 rows=3606 loops=1)
              ->  Index Scan using gist_idx_comment_path on _comment  (cost=0.00..6325.53 rows=1535 width=553) (actual time=0.100..11.066 rows=3748 loops=1)
                    Index Cond: (path <@ '0.1.14666029'::ltree)
              ->  Index Scan using _article_pkey on _article  (cost=0.00..9.56 rows=1 width=41) (actual time=0.002..0.003 rows=1 loops=3748)
                    Index Cond: (_article.id = _comment.parent_id)
Total runtime: 34.658 ms

===== The query on server 2 after an ANALYZE =====

Limit  (cost=0.00..18806.13 rows=5 width=585) (actual time=363344.748..575823.722 rows=5 loops=1)
  ->  Nested Loop  (cost=0.00..57764897.33 rows=15358 width=585) (actual time=363344.747..575823.715 rows=5 loops=1)
        ->  Index Scan Backward using idx_comment_date_publishing on _comment  (cost=0.00..57618270.03 rows=15358 width=552) (actual time=363344.681..575823.502 rows=5 loops=1)
              Filter: (path <@ '0.1.14666029'::ltree)
        ->  Index Scan using _article_pkey on _article  (cost=0.00..9.53 rows=1 width=41) (actual time=0.036..0.036 rows=1 loops=5)
              Index Cond: (_article.id = _comment.parent_id)
Total runtime: 575823.796 ms

===== Poor Server 3 ='(((( ======

===== The query on server 3 before an ANALYZE =====

Limit  (cost=20563.80..20563.81 rows=5 width=585) (actual time=31.424..31.428 rows=5 loops=1)
  ->  Sort  (cost=20563.80..20567.64 rows=1539 width=585) (actual time=31.423..31.424 rows=5 loops=1)
        Sort Key: _comment.date_publishing
        ->  Nested Loop  (cost=0.00..20482.32 rows=1539 width=585) (actual time=1.198..22.912 rows=3606 loops=1)
              ->  Index Scan using gist_idx_comment_path on _comment  (cost=0.00..6341.85 rows=1539 width=552) (actual time=1.160..9.641 rows=3748 loops=1)
                    Index Cond: (path <@ '0.1.14666029'::ltree)
              ->  Index Scan using _article_pkey on _article  (cost=0.00..9.18 rows=1 width=41) (actual time=0.002..0.003 rows=1 loops=3748)
                    Index Cond: (_article.id = _comment.parent_id)
Total runtime: 31.850 ms

===== The query on server 3 after an ANALYZE =====

Limit  (cost=0.00..18726.66 rows=5 width=585) (actual time=171378.294..286416.273 rows=5 loops=1)
  ->  Nested Loop  (cost=0.00..57577000.69 rows=15373 width=585) (actual time=171378.293..286416.269 rows=5 loops=1)
        ->  Index Scan Backward using idx_comment_date_publishing on _comment  (cost=0.00..57436080.63 rows=15373 width=552) (actual time=171378.249..286416.062 rows=5 loops=1)
              Filter: (path <@ '0.1.14666029'::ltree)
        ->  Index Scan using _article_pkey on _article  (cost=0.00..9.15 rows=1 width=41) (actual time=0.034..0.034 rows=1 loops=5)
              Index Cond: (_article.id = _comment.parent_id)
Total runtime: 286416.339 ms

How can we pin the planner to the faster execution plan?

Please help our poor servers, they are tired ;)

-- 
Laurent Raufaste
<http://www.glop.org/>
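[A workaround not mentioned in the thread, sketched here under the assumption that the GiST scan stays cheap: `OFFSET 0` in a subquery acts as an optimization fence in PostgreSQL, so the planner applies the ltree filter before it ever considers the backward scan on date_publishing. Untested against this schema.]

```sql
-- Sketch: isolate the ltree filter in a subquery with OFFSET 0 (an
-- optimization fence), so only the small matching set is sorted/limited.
SELECT
    c.*,
    _article.title AS article_title,
    _article.reference AS article_reference
FROM (
    SELECT *
    FROM _comment
    WHERE _comment.path <@ '0.1.3557034'
    OFFSET 0   -- fence: prevents the planner from flattening the subquery
) AS c
INNER JOIN _article ON _article.id = c.parent_id
ORDER BY c.date_publishing DESC
LIMIT 5;
```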
2008/2/26, Laurent Raufaste <analogue@glop.org>:
> Hi,
>
> I'm having some issues with this simple query:
>
> SELECT
>     _comment.*,
>     _article.title AS article_title,
>     _article.reference AS article_reference
> FROM
>     _comment
>     INNER JOIN _article ON _article.id = _comment.parent_id
> WHERE
>     _comment.path <@ '0.1.3557034'
> ORDER BY
>     _comment.date_publishing DESC
> OFFSET 0 LIMIT 5
> ;

I forgot the table definition, here it is ;)

                               Table "ob2._comment"
      Column       |            Type             |                          Modifiers
-------------------+-----------------------------+--------------------------------------------------------------
 id                | bigint                      | not null default nextval('element_id_sequence'::regclass)
 parent_id         | bigint                      |
 path              | ltree                       |
 data              | text                        |
 date_creation     | timestamp without time zone | not null default now()
 date_publishing   | timestamp without time zone | not null default now()
 date_modification | timestamp without time zone | not null default now()
 counters          | hstore                      |
 reference         | integer                     | not null default nextval('_comment_reference_seq'::regclass)
 text              | text                        |
Indexes:
    "_comment_pkey" PRIMARY KEY, btree (id), tablespace "indexspace"
    "gist_idx_comment_path" gist (path), tablespace "indexspace"
    "idx_comment_date_creation" btree (date_creation), tablespace "indexspace"
    "idx_comment_date_publishing" btree (date_publishing), tablespace "indexspace"
    "idx_comment_parent_id" btree (parent_id), tablespace "indexspace"
    "idx_comment_reference" btree (reference), tablespace "indexspace"
Inherits: _element

Thanks for looking into our problem!

-- 
Laurent Raufaste
<http://www.glop.org/>
"Laurent Raufaste" <analogue@glop.org> writes:
> I'm having some issues with this simple query:
> SELECT
>     _comment.*,
>     _article.title AS article_title,
>     _article.reference AS article_reference
> FROM
>     _comment
>     INNER JOIN _article ON _article.id = _comment.parent_id
> WHERE
>     _comment.path <@ '0.1.3557034'
> ORDER BY
>     _comment.date_publishing DESC
> OFFSET 0 LIMIT 5
> ;
> The varying information here is the ltree path "0.1.3557034"

What PG version is this?

If it's 8.2 or later then increasing the stats target for _comment.path
to 100 or more would likely help.

			regards, tom lane
2008/2/26, Tom Lane <tgl@sss.pgh.pa.us>:
>
> What PG version is this?
>
> If it's 8.2 or later then increasing the stats target for _comment.path
> to 100 or more would likely help.
>

I'm using PG 8.2.4.

We are using 100 as default_statistics_target and all our columns are
using this value:

# SELECT attname, attstattarget FROM pg_attribute
  WHERE attrelid=16743 AND attname='path' ;
 attname | attstattarget
---------+---------------
 path    |            -1

I tried increasing the stats target with the command:

SET default_statistics_target=1000 ;

That's the command I launched before executing the ANALYZE shown in the
previous mail. The ANALYZE took longer to complete, but it did not change
the planner behavior (sometimes right, sometimes wrong).

I did not try setting the stats target directly with an ALTER TABLE
because it implies a LOCK on our replication cluster. Do you think the
planner will act differently with an ALTER TABLE rather than just the
"SET default_statistics_target" command? If so, I will try it =)

Thanks.

-- 
Laurent Raufaste
<http://www.glop.org/>
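[For reference, a sketch of the per-column approach discussed above — it persists the target in the catalog instead of depending on the session's default_statistics_target, at the cost of a brief lock on the table:]

```sql
-- Sketch: pin the statistics target on the one column whose row-count
-- estimate is unstable, then re-gather statistics.
ALTER TABLE _comment ALTER COLUMN path SET STATISTICS 1000;
ANALYZE _comment;

-- Verify the per-column target is now set (no longer -1):
SELECT attname, attstattarget
FROM pg_attribute
WHERE attrelid = '_comment'::regclass AND attname = 'path';
```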
"Laurent Raufaste" <analogue@glop.org> writes:
> 2008/2/26, Tom Lane <tgl@sss.pgh.pa.us>:
>> If it's 8.2 or later then increasing the stats target for _comment.path
>> to 100 or more would likely help.

> I'm using PG 8.2.4.
> We are using 100 as default_statistics_target by default and all our
> columns are using this value:

Hmm, that ought to be enough to activate the better selectivity
estimator.

Unless ... did you update this database from a pre-8.2 DB that already
had contrib/ltree in it? If so, did you just load the existing old
definition of ltree as part of your dump, or did you install 8.2's
version fresh? I'm wondering if you have a definition of operator <@
that doesn't specify the new selectivity estimator. Please try a
pg_dump -s and see what it shows as the definition of <@.

			regards, tom lane
2008/2/26, Tom Lane <tgl@sss.pgh.pa.us>:
> "Laurent Raufaste" <analogue@glop.org> writes:
>
> > 2008/2/26, Tom Lane <tgl@sss.pgh.pa.us>:
>
> >> If it's 8.2 or later then increasing the stats target for _comment.path
> >> to 100 or more would likely help.
>
> > I'm using PG 8.2.4.
> > We are using 100 as default_statistics_target by default and all our
> > columns are using this value:
>
> Hmm, that ought to be enough to activate the better selectivity
> estimator.
>
> Unless ... did you update this database from a pre-8.2 DB that already
> had contrib/ltree in it? If so, did you just load the existing old
> definition of ltree as part of your dump, or did you install 8.2's
> version fresh? I'm wondering if you have a definition of operator <@
> that doesn't specify the new selectivity estimator. Please try a
> pg_dump -s and see what it shows as the definition of <@.
>
> regards, tom lane

Here's the first definition of the <@ operator in my dump:

--
-- Name: <@; Type: OPERATOR; Schema: public; Owner: postgres
--

CREATE OPERATOR <@ (
    PROCEDURE = ltree_risparent,
    LEFTARG = ltree,
    RIGHTARG = ltree,
    COMMUTATOR = @>,
    RESTRICT = ltreeparentsel,
    JOIN = contjoinsel
);

ALTER OPERATOR public.<@ (ltree, ltree) OWNER TO postgres;

Our data was created on an older PG (8.1.x), but we installed 8.2.x from
scratch, only dumping the schema and the data into it. I used the ltree
found in the 8.2.4 source.

Do you think an update of ltree, or better of the database, will fix the
problem? We plan on upgrading to the 8.3 branch in the next weeks, but
this can't wait that long, as our servers are overloaded from time to
time =(

Thanks for your help ;)

-- 
Laurent Raufaste
<http://www.glop.org/>
On Tue, Feb 26, 2008 at 11:12 AM, Laurent Raufaste <analogue@glop.org> wrote:
> 2008/2/26, Tom Lane <tgl@sss.pgh.pa.us>:
> >
> > What PG version is this?
> >
> > If it's 8.2 or later then increasing the stats target for _comment.path
> > to 100 or more would likely help.
>
> I'm using PG 8.2.4.
> We are using 100 as default_statistics_target by default and all our
> columns are using this value:
> # SELECT attname,attstattarget FROM pg_attribute WHERE attrelid=16743
> AND attname='path' ;
>  attname | attstattarget
> ---------+---------------
>  path    |            -1
>
> I tried increasing the stats target with the command:
> SET default_statistics_target=1000 ;
> That's the command I launched before executing the ANALYZE showed in
> the previous mail.
> The ANALYZE were longer to complete, but it did not change the planner
> behavior (sometimes right, sometimes wrong).

You're doing it wrong. The default target affects newly created
columns / tables. You need to use alter table to change a stats
target after creation. Like so:

alter table abc alter column xyz set statistics 100;
"Laurent Raufaste" <analogue@glop.org> writes:
> 2008/2/26, Tom Lane <tgl@sss.pgh.pa.us>:
>> ... I'm wondering if you have a definition of operator <@
>> that doesn't specify the new selectivity estimator. Please try a
>> pg_dump -s and see what it shows as the definition of <@.

> Here's the first definition of the <@ operator in my dump:

> CREATE OPERATOR <@ (
>     PROCEDURE = ltree_risparent,
>     LEFTARG = ltree,
>     RIGHTARG = ltree,
>     COMMUTATOR = @>,
>     RESTRICT = ltreeparentsel,
>     JOIN = contjoinsel
> );

That's the right RESTRICT function, but what exactly did you mean by
"first definition"? Are there more?

It may be that it's just not possible for the estimator to come up with
accurate rowcount estimates given the amount of info it has available.
The query you are complaining about confuses the issue quite a lot by
involving other issues. Would you try just "explain analyze select 1
from _comment where path <@ '....';" for various typical path values,
and see if it's coming up with estimated rowcounts that are in the
right ballpark compared to the actual ones?

			regards, tom lane
"Scott Marlowe" <scott.marlowe@gmail.com> writes:
> On Tue, Feb 26, 2008 at 11:12 AM, Laurent Raufaste <analogue@glop.org> wrote:
>> I tried increasing the stats target with the command:
>> SET default_statistics_target=1000 ;
>> That's the command I launched before executing the ANALYZE showed in
>> the previous mail.

> You're doing it wrong. The default target affects newly created
> columns / tables. You need to use alter table to change a stats
> target after creation. Like so:
> alter table abc alter column xyz set statistics 100;

That's completely incorrect. If the column doesn't have a specific
stats target (indicated by -1 in attstattarget, which Laurent showed us
was the case), then ANALYZE will use the current value of
default_statistics_target. The table-creation-time value of that
parameter isn't relevant at all.

			regards, tom lane
2008/2/26, Tom Lane <tgl@sss.pgh.pa.us>:
>
> That's the right RESTRICT function, but what exactly did you mean by
> "first definition"? Are there more?

I thought it was enough; here is the complete definition of the <@
operator:

--
-- Name: <@; Type: OPERATOR; Schema: public; Owner: postgres
--

CREATE OPERATOR <@ (
    PROCEDURE = ltree_risparent,
    LEFTARG = ltree,
    RIGHTARG = ltree,
    COMMUTATOR = @>,
    RESTRICT = ltreeparentsel,
    JOIN = contjoinsel
);

ALTER OPERATOR public.<@ (ltree, ltree) OWNER TO postgres;

--
-- Name: <@; Type: OPERATOR; Schema: public; Owner: postgres
--

CREATE OPERATOR <@ (
    PROCEDURE = _ltree_r_isparent,
    LEFTARG = ltree,
    RIGHTARG = ltree[],
    COMMUTATOR = @>,
    RESTRICT = contsel,
    JOIN = contjoinsel
);

ALTER OPERATOR public.<@ (ltree, ltree[]) OWNER TO postgres;

--
-- Name: <@; Type: OPERATOR; Schema: public; Owner: postgres
--

CREATE OPERATOR <@ (
    PROCEDURE = _ltree_risparent,
    LEFTARG = ltree[],
    RIGHTARG = ltree,
    COMMUTATOR = @>,
    RESTRICT = contsel,
    JOIN = contjoinsel
);

ALTER OPERATOR public.<@ (ltree[], ltree) OWNER TO postgres;

--
-- Name: <@; Type: OPERATOR; Schema: public; Owner: postgres
--

CREATE OPERATOR <@ (
    PROCEDURE = hs_contained,
    LEFTARG = hstore,
    RIGHTARG = hstore,
    COMMUTATOR = @>,
    RESTRICT = contsel,
    JOIN = contjoinsel
);

ALTER OPERATOR public.<@ (hstore, hstore) OWNER TO postgres;

>
> It may be that it's just not possible for the estimator to come up with
> accurate rowcount estimates given the amount of info it has available.
> The query you are complaining about confuses the issue quite a lot by
> involving other issues. Would you try just "explain analyze select 1
> from _comment where path <@ '....';" for various typical path values,
> and see if it's coming up with estimated rowcounts that are in the
> right ballpark compared to the actual ones?

It might be the source of the problem =)

I executed the following query on all the servers with a varying path
(but with the same path on each server), before and after launching an
ANALYZE _comment:

EXPLAIN ANALYZE SELECT 1 FROM _comment WHERE path <@ '0.1.810879' ;

On every server except one it showed the same plan before and after the
ANALYZE:

Bitmap Heap Scan on _comment  (cost=174.87..6163.31 rows=1536 width=0) (actual time=1.072..1.495 rows=1070 loops=1)
  Recheck Cond: (path <@ '0.1.14155763'::ltree)
  ->  Bitmap Index Scan on gist_idx_comment_path  (cost=0.00..174.48 rows=1536 width=0) (actual time=1.058..1.058 rows=1070 loops=1)
        Index Cond: (path <@ '0.1.14155763'::ltree)
Total runtime: 1.670 ms

On a random server, the plan before the ANALYZE was:

Bitmap Heap Scan on _comment  (cost=15833.00..440356.99 rows=155649 width=0) (actual time=1.581..2.885 rows=1070 loops=1)
  Recheck Cond: (path <@ '0.1.14155763'::ltree)
  ->  Bitmap Index Scan on gist_idx_comment_path  (cost=0.00..15794.09 rows=155649 width=0) (actual time=1.552..1.552 rows=1070 loops=1)
        Index Cond: (path <@ '0.1.14155763'::ltree)
Total runtime: 3.160 ms

The runtime is OK, but the planned cost is huge, because the row count
of the index scan is estimated at 100x more rows. After the ANALYZE it
was like the others.

If this wrong row count happens, I understand why the planner tries to
find an alternative plan for the first query I showed you in a previous
mail. How can I help it to better estimate the row count? Setting
default_stats_target to 1000 did not help =(

-- 
Laurent Raufaste
<http://www.glop.org/>
"Laurent Raufaste" <analogue@glop.org> writes:
> On a random server, the plan before the ANALYZE was:

> Bitmap Heap Scan on _comment  (cost=15833.00..440356.99 rows=155649
> width=0) (actual time=1.581..2.885 rows=1070 loops=1)
>   Recheck Cond: (path <@ '0.1.14155763'::ltree)
>   ->  Bitmap Index Scan on gist_idx_comment_path
>       (cost=0.00..15794.09 rows=155649 width=0) (actual time=1.552..1.552
>       rows=1070 loops=1)
>         Index Cond: (path <@ '0.1.14155763'::ltree)
> Total runtime: 3.160 ms

> The runtime is OK, but the planned cost is huge, because the row count
> of the index scan is estimated at 100x more rows. After the ANALYZE it
> was like the others. If this wrong row count happens, I understand why
> the planner tries to find an alternative plan for the first query I
> showed you in a previous mail.

> How can I help it to better estimate the row count? Setting
> default_stats_target to 1000 did not help =(

Are you sure the table had been analyzed recently at all on that server?

If it had, then what you must be dealing with is a different result from
a different random sample. The laws of statistics say that sometimes a
random sample won't be very representative ... but if the sample is
reasonably large they also say that won't happen very often. You could
try ANALYZEing over and over and seeing what rowcount estimate you get
after each one. If you frequently get a bad estimate, maybe it would be
worth looking at the pg_stats row for _comment.path to see if there's
anything obviously bogus about the bad samples.

			regards, tom lane
2008/2/27, Tom Lane <tgl@sss.pgh.pa.us>:
>
> Are you sure the table had been analyzed recently at all on that server?
>
> If it had, then what you must be dealing with is a different result from
> a different random sample. The laws of statistics say that sometimes a
> random sample won't be very representative ... but if the sample is
> reasonably large they also say that won't happen very often. You could
> try ANALYZEing over and over and seeing what rowcount estimate you get
> after each one. If you frequently get a bad estimate, maybe it would be
> worth looking at the pg_stats row for _comment.path to see if there's
> anything obviously bogus about the bad samples.

Thanks for your help Tom, it's greatly appreciated here =)

Yes, I ANALYZE the table before any bunch of requests. I isolated the
problem: it happens when a lot (thousands) of rows in the _comment table
match the query. I can now reproduce the problem at will, on any server,
even on our development server.

- I took 3 paths matching thousands of rows: 0.1.4108047, 0.1.15021804
  and 0.1.4749259.
- I wrote the following script:

ANALYZE _comment ; -- default_stats_target is 100
EXPLAIN ANALYZE SELECT 1 FROM _comment WHERE path <@ '0.1.4108047' ;
EXPLAIN ANALYZE SELECT 1 FROM _comment WHERE path <@ '0.1.15021804' ;
EXPLAIN ANALYZE SELECT 1 FROM _comment WHERE path <@ '0.1.4749259' ;
SELECT * FROM pg_stats WHERE tablename = '_comment' AND attname='path' ;

If I execute it a lot of times, approx. 2/3 of the executed query costs
are OK:

Bitmap Heap Scan on _comment  (cost=114.24..4634.75 rows=1540 width=0) (actual time=6.715..13.836 rows=12589 loops=1)
  Recheck Cond: (path <@ '0.1.15021804'::ltree)
  ->  Bitmap Index Scan on gist_idx_comment_path  (cost=0.00..113.85 rows=1540 width=0) (actual time=6.515..6.515 rows=12589 loops=1)
        Index Cond: (path <@ '0.1.15021804'::ltree)

And 1/3 of the executed queries are huge:

Bitmap Heap Scan on _comment  (cost=10366.65..342840.71 rows=156174 width=0) (actual time=6.513..12.984 rows=12589 loops=1)
  Recheck Cond: (path <@ '0.1.15021804'::ltree)
  ->  Bitmap Index Scan on gist_idx_comment_path  (cost=0.00..10327.61 rows=156174 width=0) (actual time=6.313..6.313 rows=12589 loops=1)
        Index Cond: (path <@ '0.1.15021804'::ltree)

The pg_stats table shows no strange values, and the only columns that
vary between runs are null_frac and correlation. avg_width goes from 56
to 57 and n_distinct stays at -1 (which is OK, all paths are distinct)
after each ANALYZE:

[schemaname]        => ob2
[tablename]         => _comment
[attname]           => path
[null_frac]         => 6.66667e-05
[avg_width]         => 56
[n_distinct]        => -1
[most_common_vals]  =>
[most_common_freqs] =>
[correlation]       => -0.256958

If I do the same test with default_stats_target=1000, I get the same
behavior (huge row counts) but a bit closer to reality. Instead of only
getting 2 different estimations across all the requests (rows=1540 and
rows=156174), I get 3 different ones (rows=1543, rows=15446 and
rows=61784). The problem is that the cost is still huge compared to
reality, and the query we use in our production environment switches to
a different way of running it.

Fast version:

Limit  (cost=15557.29..15557.30 rows=5 width=570) (actual time=1305.824..1305.829 rows=5 loops=1)
  ->  Sort  (cost=15557.29..15561.14 rows=1540 width=570) (actual time=1305.822..1305.825 rows=5 loops=1)
        Sort Key: _comment.date_publishing
        ->  Nested Loop  (cost=0.00..15475.75 rows=1540 width=570) (actual time=0.185..847.502 rows=61537 loops=1)
              ->  Index Scan using gist_idx_comment_path on _comment  (cost=0.00..4746.26 rows=1540 width=537) (actual time=0.118..307.553 rows=64825 loops=1)
                    Index Cond: (path <@ '0.1.4108047'::ltree)
              ->  Index Scan using _article_pkey on _article  (cost=0.00..6.95 rows=1 width=41) (actual time=0.006..0.006 rows=1 loops=64825)
                    Index Cond: (_article.id = _comment.parent_id)

Slow version:

Limit  (cost=0.00..1047.60 rows=5 width=566) (actual time=0.352..1.625 rows=5 loops=1)
  ->  Nested Loop  (cost=0.00..32663447.76 rows=155897 width=566) (actual time=0.351..1.620 rows=5 loops=1)
        ->  Index Scan Backward using idx_comment_date_publishing on _comment  (cost=0.00..31719108.69 rows=155897 width=533) (actual time=0.286..1.412 rows=5 loops=1)
              Filter: (path <@ '0.1.4108047'::ltree)
        ->  Index Scan using _article_pkey on _article  (cost=0.00..6.04 rows=1 width=41) (actual time=0.038..0.039 rows=1 loops=5)
              Index Cond: (_article.id = _comment.parent_id)

Don't you think an increase in some RAM parameter would help the server
with this kind of query? We have 20+ GB of RAM on those servers.

Thanks.

-- 
Laurent Raufaste
<http://www.glop.org/>
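[On the RAM question above: the planner never looks at physical RAM directly, only at cost parameters describing it. A sketch of the knobs involved, with purely illustrative values for a 20+ GB machine — not a tuning recommendation:]

```sql
-- Sketch: tell the planner most of the working set is cached, which
-- lowers the estimated cost of index-heavy plans. Illustrative values.
SET effective_cache_size = '16GB';  -- planner's idea of OS+PG cache size
SET random_page_cost = 2.0;         -- default 4.0 assumes mostly-uncached disk

-- Re-check whether the estimates now favor the GiST scan:
EXPLAIN ANALYZE SELECT 1 FROM _comment WHERE path <@ '0.1.4108047' ;
```

Note that these settings shift relative plan costs; they do not fix the underlying row-count misestimate, so the backward-scan plan can still win when the estimate is 100x too high.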