Discussion: PGOPTIONS="-fh" make check gets stuck since Postgres 11
Hi all,

I have begun playing with regressplans.sh, which enforces various combinations of "-f s|i|n|m|h" when running the regression tests, and I have noticed that -fh can cause the server to become stuck in the test join_hash.sql with this query (not sure which portion of the SET LOCAL parameters is involved):

select count(*) from simple r join extremely_skewed s using (id);

This does not happen with REL_10_STABLE, where the test executes immediately, so we visibly have an issue caused by v11 here.

Any thoughts?
--
Michael
On Mon, Jul 8, 2019 at 5:53 PM Michael Paquier <michael@paquier.xyz> wrote:
> I have begun playing with regressplans.sh, which enforces various
> combinations of "-f s|i|n|m|h" when running the regression tests, and
> I have noticed that -fh can cause the server to become stuck in the
> test join_hash.sql with this query (not sure which portion of the SET
> LOCAL parameters is involved):
>
> select count(*) from simple r join extremely_skewed s using (id);
>
> This does not happen with REL_10_STABLE, where the test executes
> immediately, so we visibly have an issue caused by v11 here.

If you don't allow hash joins it makes this plan:

 Aggregate
   ->  Nested Loop
         Join Filter: (r.id = s.id)
         ->  Seq Scan on simple r
         ->  Materialize
               ->  Seq Scan on extremely_skewed s

"simple" has 20k rows and "extremely_skewed" has 20k rows, but the planner thinks the latter only has 2. So this is going to take O(n^2) time, and n is 20k. Not sure what to do about that. Maybe "join_hash" should be skipped for the -h (= no hash joins please) case?

--
Thomas Munro
https://enterprisedb.com
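The stuck state can be reproduced by hand with the same GUC that -fh sets globally (a sketch; it assumes the join_hash test tables "simple" and "extremely_skewed" are already loaded):

```sql
-- Reproduction sketch: disabling hash joins forces the nested loop
-- plan shown above, which is O(n^2) on these ~20k-row tables.
set enable_hashjoin = off;
explain (costs off)
select count(*) from simple r join extremely_skewed s using (id);
```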
On Mon, Jul 08, 2019 at 06:49:44PM +1200, Thomas Munro wrote:
> "simple" has 20k rows and "extremely_skewed" has 20k rows, but the
> planner thinks the latter only has 2. So this is going to take O(n^2)
> time, and n is 20k. Not sure what to do about that. Maybe "join_hash"
> should be skipped for the -h (= no hash joins please) case?

Ah, thanks. Yes, that's going to take a while :)

Well, another thing I'd like to think about is whether there is any point in keeping regressplans.sh and the relevant options in Postgres at this stage. From the top of the file one can read that:

# This script runs the Postgres regression tests with all useful combinations
# of the backend options that disable various query plan types. If the
# results are not all the same, it may indicate a bug in a particular
# plan type, or perhaps just a regression test whose results aren't fully
# determinate (eg, due to lack of an ORDER BY keyword).

However, if you run make check with any of these options, some tests fail in every run. We can improve the situation for some of them by adding ORDER BY to the queries, but some EXPLAIN queries are sensitive to that, and the history around regressplans.sh does not speak in its favor (do people really use it?). If you look at the latest commits for it, it has not really been touched in 19 years. So I would rather be in favor of nuking it.

--
Michael
Michael Paquier <michael@paquier.xyz> writes:
> I have begun playing with regressplans.sh, which enforces various
> combinations of "-f s|i|n|m|h" when running the regression tests, and
> I have noticed that -fh can cause the server to become stuck in the
> test join_hash.sql with this query (not sure which portion of the SET
> LOCAL parameters is involved):
> select count(*) from simple r join extremely_skewed s using (id);
> This does not happen with REL_10_STABLE, where the test executes
> immediately, so we visibly have an issue caused by v11 here.

Yeah, these test cases were added by fa330f9ad in v11. What it looks like to me is that some of these test cases force "enable_mergejoin = off", so if you also have enable_hashjoin off then you are going to get a nestloop plan, and it's hardly surprising that that takes an unreasonable amount of time on the rather large test tables used in these tests.

Given the purposes of this test, I think it'd be reasonable to force both enable_hashjoin = on and enable_mergejoin = off at the very top of the join_hash script, or the corresponding place in join.sql in v11. Thomas, was there a specific reason for forcing enable_mergejoin = off for only some of these tests?

regards, tom lane
Michael Paquier <michael@paquier.xyz> writes:
> Well, another thing I'd like to think about is whether there is any
> point in keeping regressplans.sh and the relevant options in Postgres
> at this stage. From the top of the file one can read that:

The point of regressplans.sh is to see if anything goes seriously wrong when forcing non-default plan choices --- seriously wrong being defined as crashes or semantically wrong answers. It's not expected that the regression tests will automatically pass when you do that, because of their dependencies on output row ordering, not to mention all the EXPLAINs. I'm not for removing it --- the fact that its results require manual evaluation doesn't make it useless.

Having said that, join_hash.sql in particular seems to have zero value if it's not testing hash joins, so I think it'd be reasonable for it to override a global enable_hashjoin = off setting. None of the other regression test scripts seem to take nearly as much of a performance hit from globally forcing poor plans.

regards, tom lane
On Mon, Jul 08, 2019 at 03:21:41PM -0400, Tom Lane wrote:
> Having said that, join_hash.sql in particular seems to have zero
> value if it's not testing hash joins, so I think it'd be reasonable
> for it to override a global enable_hashjoin = off setting. None of
> the other regression test scripts seem to take nearly as much of a
> performance hit from globally forcing poor plans.

I am a bit confused here. Don't you mean to have enable_hashjoin = *on* at the top of join_hash.sql instead, like in the attached?

--
Michael
On Tue, Jul 9, 2019 at 2:19 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Given the purposes of this test, I think it'd be reasonable to force
> both enable_hashjoin = on and enable_mergejoin = off at the very top
> of the join_hash script, or the corresponding place in join.sql in
> v11. Thomas, was there a specific reason for forcing enable_mergejoin
> = off for only some of these tests?

Based on a suggestion from Andres (if I recall correctly), I wrapped each individual test in savepoint/rollback, and then set just the GUCs needed to get the plan shape and execution code path I wanted to exercise, and I guess I found that I only needed to disable merge joins for some of them. The idea was that the individual tests could be understood independently.

--
Thomas Munro
https://enterprisedb.com
Michael Paquier <michael@paquier.xyz> writes:
> On Mon, Jul 08, 2019 at 03:21:41PM -0400, Tom Lane wrote:
>> Having said that, join_hash.sql in particular seems to have zero
>> value if it's not testing hash joins, so I think it'd be reasonable
>> for it to override a global enable_hashjoin = off setting. None of
>> the other regression test scripts seem to take nearly as much of a
>> performance hit from globally forcing poor plans.

> I am a bit confused here. Don't you mean to have enable_hashjoin =
> *on* at the top of join_hash.sql instead, like in the attached?

Right, overriding any enable_hashjoin = off that might've come from PGOPTIONS or wherever.

regards, tom lane
Thomas Munro <thomas.munro@gmail.com> writes:
> On Tue, Jul 9, 2019 at 2:19 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Given the purposes of this test, I think it'd be reasonable to force
>> both enable_hashjoin = on and enable_mergejoin = off at the very top
>> of the join_hash script, or the corresponding place in join.sql in
>> v11. Thomas, was there a specific reason for forcing enable_mergejoin
>> = off for only some of these tests?

> Based on a suggestion from Andres (if I recall correctly), I wrapped
> each individual test in savepoint/rollback, and then set just the GUCs
> needed to get the plan shape and execution code path I wanted to
> exercise, and I guess I found that I only needed to disable merge
> joins for some of them. The idea was that the individual tests could
> be understood independently.

But per this discussion, they can only be "understood independently" if you make some assumptions about the prevailing values of the planner GUCs.

regards, tom lane
On Tue, Jul 9, 2019 at 2:22 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
>> Based on a suggestion from Andres (if I recall correctly), I wrapped
>> each individual test in savepoint/rollback, and then set just the GUCs
>> needed to get the plan shape and execution code path I wanted to
>> exercise, and I guess I found that I only needed to disable merge
>> joins for some of them. The idea was that the individual tests could
>> be understood independently.
>
> But per this discussion, they can only be "understood independently"
> if you make some assumptions about the prevailing values of the
> planner GUCs.

Yeah. I had obviously never noticed that test script. +1 for just enabling hash joins at the top of join_hash.sql in 12+, and in the equivalent section of 11's join.sql (which is luckily at the end of the file).

--
Thomas Munro
https://enterprisedb.com
On Tue, Jul 09, 2019 at 02:30:51PM +1200, Thomas Munro wrote:
> Yeah. I had obviously never noticed that test script. +1 for just
> enabling hash joins at the top of join_hash.sql in 12+, and in the
> equivalent section of 11's join.sql (which is luckily at the end of
> the file).

Right, I did not pay much attention to REL_11_STABLE. In this case the test begins around line 2030 and runs to the bottom of the file. I would actually add a RESET at the bottom of it to avoid impacting any later tests, as bug-fix tests are usually just appended. Thomas, perhaps you would prefer fixing it yourself? Or should I?

--
Michael
Michael Paquier <michael@paquier.xyz> writes:
> On Tue, Jul 09, 2019 at 02:30:51PM +1200, Thomas Munro wrote:
>> Yeah. I had obviously never noticed that test script. +1 for just
>> enabling hash joins at the top of join_hash.sql in 12+, and in the
>> equivalent section of 11's join.sql (which is luckily at the end of
>> the file).

> Right, I did not pay much attention to REL_11_STABLE. In this case
> the test begins around line 2030 and runs to the bottom of the file.
> I would actually add a RESET at the bottom of it to avoid impacting
> any later tests, as bug-fix tests are usually just appended.

Agreed that the scope should be limited. But in 12/HEAD, I think the relevant tests are all wrapped into one transaction block, so that using SET LOCAL would be enough. Not sure if 11 is the same.

regards, tom lane
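In 12/HEAD the override being discussed would then reduce to a couple of SET LOCAL lines inside the existing transaction block, something like this sketch (the exact GUC list and placement are up to the final patch):

```sql
-- Sketch only: force the plan shapes join_hash.sql is meant to test,
-- overriding any enable_hashjoin = off inherited from PGOPTIONS.
-- SET LOCAL settings revert automatically at transaction end, so no
-- explicit RESET is needed inside the block.
begin;
set local enable_hashjoin = on;
set local enable_mergejoin = off;
-- ... existing hash join test queries ...
rollback;
```

For 11's join.sql, which is not wrapped in a single transaction, a plain SET at the start of the section plus a RESET at the end would serve the same purpose.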
On Tue, Jul 9, 2019 at 2:45 PM Michael Paquier <michael@paquier.xyz> wrote:
> Right, I did not pay much attention to REL_11_STABLE. In this case
> the test begins around line 2030 and runs to the bottom of the file.
> I would actually add a RESET at the bottom of it to avoid impacting
> any later tests, as bug-fix tests are usually just appended. Thomas,
> perhaps you would prefer fixing it yourself? Or should I?

It's my mistake. I'll fix it later today.

--
Thomas Munro
https://enterprisedb.com
On Tue, Jul 09, 2019 at 03:04:27PM +1200, Thomas Munro wrote:
> It's my mistake. I'll fix it later today.

Thanks!

--
Michael
On Mon, Jul 8, 2019 at 12:21 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The point of regressplans.sh is to see if anything goes seriously
> wrong when forcing non-default plan choices --- seriously wrong being
> defined as crashes or semantically wrong answers. It's not expected
> that the regression tests will automatically pass when you do that,
> because of their dependencies on output row ordering, not to mention
> all the EXPLAINs. I'm not for removing it --- the fact that its
> results require manual evaluation doesn't make it useless.
It might be worth post-processing results files to ignore row ordering
in some cases to allow for easier comparison. Has this been proposed
in the past?
--
Melanie Plageman
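A minimal form of such post-processing (a hypothetical sketch, not an existing tool; `normalized_diff` is a made-up name) would be to sort both results files before diffing, so that row order no longer matters:

```shell
#!/bin/sh
# Hypothetical sketch: compare two regression results files while
# ignoring row ordering. Sorting whole files is too blunt for real
# regression output (it also reorders headers and EXPLAIN lines),
# but it illustrates the basic idea.
normalized_diff() {
    sort "$1" > "$1.sorted"
    sort "$2" > "$2.sorted"
    diff "$1.sorted" "$2.sorted"
}
```

In practice a tool like this would have to sort only the data rows of each individual result set, which is exactly what makes the problem harder than a plain sort-then-diff.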
On Tue, Jul 09, 2019 at 11:54:29AM -0700, Melanie Plageman wrote:
> It might be worth post-processing results files to ignore row ordering
> in some cases to allow for easier comparison. Has this been proposed
> in the past?

Not that I recall.

--
Michael
On Wed, Jul 10, 2019 at 10:12 AM Michael Paquier <michael@paquier.xyz> wrote:
> On Tue, Jul 09, 2019 at 11:54:29AM -0700, Melanie Plageman wrote:
>> It might be worth post-processing results files to ignore row ordering
>> in some cases to allow for easier comparison. Has this been proposed
>> in the past?
>
> Not that I recall.

It would be good if we could come up with something like that. It would be helpful for zheap, where in some cases we get different row ordering due to in-place updates. As of now, we try to add ORDER BY or do some extra magic to get consistent row ordering.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Wed, Jul 10, 2019 at 12:51:28PM +0530, Amit Kapila wrote:
> It would be good if we could come up with something like that. It
> would be helpful for zheap, where in some cases we get different row
> ordering due to in-place updates. As of now, we try to add ORDER BY
> or do some extra magic to get consistent row ordering.

That was an issue for me as well when working with Postgres-XC, where the row ordering was not guaranteed depending on the number of nodes (speaking of which, Greenplum has the same issues, no?). Adding ORDER BY clauses to a set of tests may make sense, but then this may impact the plans generated for some of them.

--
Michael
On Wed, Jul 10, 2019 at 12:40 AM Michael Paquier <michael@paquier.xyz> wrote:
> On Wed, Jul 10, 2019 at 12:51:28PM +0530, Amit Kapila wrote:
>> It would be good if we could come up with something like that. It
>> would be helpful for zheap, where in some cases we get different row
>> ordering due to in-place updates. As of now, we try to add ORDER BY
>> or do some extra magic to get consistent row ordering.
>
> That was an issue for me as well when working with Postgres-XC, where
> the row ordering was not guaranteed depending on the number of nodes
> (speaking of which, Greenplum has the same issues, no?). Adding ORDER
> BY clauses to a set of tests may make sense, but then this may impact
> the plans generated for some of them.
> --
> Michael
We have a tool that does this. gpdiff [1] is used for results post-processing
and it uses a perl module called atmsort [2] to deal with the specific ORDER BY
case discussed here.
[1] https://github.com/greenplum-db/gpdb/blob/master/src/test/regress/gpdiff.pl
[2] https://github.com/greenplum-db/gpdb/blob/master/src/test/regress/atmsort.pl
--
Melanie Plageman
Michael Paquier <michael@paquier.xyz> writes:
> On Wed, Jul 10, 2019 at 12:51:28PM +0530, Amit Kapila wrote:
>> It would be good if we could come up with something like that. It
>> would be helpful for zheap, where in some cases we get different row
>> ordering due to in-place updates. As of now, we try to add ORDER BY
>> or do some extra magic to get consistent row ordering.

> That was an issue for me as well when working with Postgres-XC, where
> the row ordering was not guaranteed depending on the number of nodes
> (speaking of which, Greenplum has the same issues, no?). Adding ORDER
> BY clauses to a set of tests may make sense, but then this may impact
> the plans generated for some of them.

Yeah, I do not want to get into a situation where we can't test queries that lack ORDER BY. Also, the fact that tableam X doesn't reproduce heap's row ordering is not a good reason to relax the strength of the tests for heap. So I'm wondering about some postprocessing that we could optionally apply. Perhaps the tools Melanie mentions could help.

regards, tom lane
On Wed, Jul 10, 2019 at 6:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael@paquier.xyz> writes:
>> On Wed, Jul 10, 2019 at 12:51:28PM +0530, Amit Kapila wrote:
>>> It would be good if we could come up with something like that. It
>>> would be helpful for zheap, where in some cases we get different row
>>> ordering due to in-place updates. As of now, we try to add ORDER BY
>>> or do some extra magic to get consistent row ordering.
>
>> That was an issue for me as well when working with Postgres-XC, where
>> the row ordering was not guaranteed depending on the number of nodes
>> (speaking of which, Greenplum has the same issues, no?). Adding ORDER
>> BY clauses to a set of tests may make sense, but then this may impact
>> the plans generated for some of them.
>
> Yeah, I do not want to get into a situation where we can't test
> queries that lack ORDER BY. Also, the fact that tableam X doesn't
> reproduce heap's row ordering is not a good reason to relax the
> strength of the tests for heap. So I'm wondering about some
> postprocessing that we could optionally apply. Perhaps the tools
> Melanie mentions could help.
Surprisingly, I have been working for a couple of days on using those Perl tools from Greenplum for Zedstore. For Zedstore, plans differ for many regress tests because the relation size is not the same as heap's, and for similar reasons the row ordering changes as well. So the goal is to use the tests untouched to validate Zedstore, and yes, I was thinking it would help Zheap testing as well. I also tested the same for regressplans.sh, and it will lift a lot of the manual burden of investigating the results, as one can specify to completely ignore EXPLAIN plan outputs in the comparison between results and expected. I will post a patch for the tool once I get it into a little more decent shape.
On Wed, Jul 10, 2019 at 09:11:41AM -0700, Ashwin Agrawal wrote:
> I will post a patch for the tool once I get it into a little more
> decent shape.

That would be nice! We may be able to get something into v13 this way then.

--
Michael