Thread: installcheck failing on psql_crosstab
Hi all,

With a low value of work_mem, like 1MB, I am noticing that the new
psql_crosstab is generating a couple of diffs with installcheck (tested
only on OSX 10.11):

***************
*** 127,134 ****
  \crosstabview v h i
   v  |   h0   | h1 | h2 | h4 | #null#
  ----+--------+----+----+----+--------
! v1 | #null# |    | 3 +|    |
!    |        |    | 7  |    |
  v2 |        | 3  |    |    |
  v0 |        |    |    | 4 +| 5
     |        |    |    | -3 |
--- 127,134 ----
  \crosstabview v h i
   v  |   h0   | h1 | h2 | h4 | #null#
  ----+--------+----+----+----+--------
! v1 | #null# |    | 7 +|    |
!    |        |    | 3  |    |
  v2 |        | 3  |    |    |
  v0 |        |    |    | 4 +| 5
     |        |    |    | -3 |
***************
*** 143,150 ****
  ----+------+-----+-----
  h0 | baz  |     |
  h1 |      | bar |
! h2 | foo +|     |
!    | quux |     |
  h4 |      |     | qux+
     |      |     | dbl
     |      |     | qux
--- 143,150 ----
  ----+------+-----+-----
  h0 | baz  |     |
  h1 |      | bar |
! h2 | quux+|     |
!    | foo  |     |
  h4 |      |     | qux+
     |      |     | dbl
     |      |     | qux
***************
*** 156,163 ****
  \crosstabview 1 "h" 4
   v  | h0  | h1  |  h2  | h4  |
  ----+-----+-----+------+-----+-----
! v1 | baz |     | foo +|     |
!    |     |     | quux |     |
  v2 |     | bar |      |     |
  v0 |     |     |      | qux+| qux
     |     |     |      | dbl |
--- 156,163 ----
  \crosstabview 1 "h" 4
   v  | h0  | h1  |  h2  | h4  |
  ----+-----+-----+------+-----+-----
! v1 | baz |     | quux+|     |
!    |     |     | foo  |     |
  v2 |     | bar |      |     |
  v0 |     |     |      | qux+| qux
     |     |     |      | dbl |
======================================================================

I know that we guarantee that make installcheck may not work on many
target servers as a lot of tests are very GUC-sensitive, but this looks
a bit oversensitive to me, especially knowing that it is the only diff
generated by the whole test suite.

Don't you think that those tests could be made more portable?

Regards,
-- 
Michael
Michael Paquier wrote:

> I know that we guarantee that make installcheck may not work on many
> target servers as a lot of tests are very GUC-sensitive, but this
> looks a bit oversensitive to me, especially knowing that it is the
> only diff generated by the whole test suite.
> Don't you think that those tests could be made more portable?

Not sure about that.  I think the only way to change this would be to
remove those particular tests completely, and I don't think that's a
good tradeoff.  If somebody can show a way to avoid the problem without
removing those tests for multiline functionality, I'm all ears.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Michael Paquier wrote:
>> I know that we guarantee that make installcheck may not work on many
>> target servers as a lot of tests are very GUC-sensitive, but this
>> looks a bit oversensitive to me, especially knowing that it is the
>> only diff generated by the whole test suite.
>> Don't you think that those tests could be made more portable?

> Not sure about that.  I think the only way to change this would be to
> remove those particular tests completely, and I don't think that's a
> good tradeoff.  If somebody can show a way to avoid the problem without
> removing those tests for multiline functionality, I'm all ears.

Presumably what is happening is that the planner is switching from hash
to sort aggregation.  We could force the plan choice via enable_hashagg,
which seems OK from the standpoint that this is only meant to test psql
not the backend.  However, I'm dubious of the entire project.  I think
if you push the value of work_mem a bit further in either direction,
you will find other tests falling over.  I'm not excited about changing
this one just because it's currently the first to fail.

			regards, tom lane
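[For reference, the plan-pinning Tom mentions could look something like
the sketch below.  enable_hashagg is an existing planner GUC; the exact
placement inside the test file is hypothetical, and the thread did not
settle on this approach.]

```sql
-- Hypothetical sketch: disable hash aggregation around the affected
-- queries so the planner always picks the sort-based GroupAggregate,
-- making the row order fed to string_agg independent of work_mem.
SET enable_hashagg = off;

-- ... the \crosstabview test queries would run here ...

RESET enable_hashagg;
```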
Tom Lane wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > Michael Paquier wrote:
> >> I know that we guarantee that make installcheck may not work on many
> >> target servers as a lot of tests are very GUC-sensitive, but this
> >> looks a bit oversensitive to me, especially knowing that it is the
> >> only diff generated by the whole test suite.
> >> Don't you think that those tests could be made more portable?
>
> > Not sure about that.  I think the only way to change this would be to
> > remove those particular tests completely, and I don't think that's a
> > good tradeoff.  If somebody can show a way to avoid the problem without
> > removing those tests for multiline functionality, I'm all ears.
>
> Presumably what is happening is that the planner is switching from hash
> to sort aggregation.  We could force the plan choice via enable_hashagg,
> which seems OK from the standpoint that this is only meant to test psql
> not the backend.  However, I'm dubious of the entire project.  I think
> if you push the value of work_mem a bit further in either direction,
> you will find other tests falling over.  I'm not excited about changing
> this one just because it's currently the first to fail.

I can't imagine that the server is avoiding hash aggregation on a 1MB
work_mem limit for data that's a few dozen bytes.  Is it really doing
that?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Tom Lane wrote:
>> Presumably what is happening is that the planner is switching from hash
>> to sort aggregation.

> I can't imagine that the server is avoiding hash aggregation on a 1MB
> work_mem limit for data that's a few dozen bytes.  Is it really doing
> that?

Yup:

regression=# explain SELECT v, h, string_agg(i::text, E'\n') AS i
             FROM ctv_data GROUP BY v, h ORDER BY h, v;
                               QUERY PLAN
------------------------------------------------------------------------
 Sort  (cost=33.87..34.37 rows=200 width=96)
   Sort Key: h, v
   ->  HashAggregate  (cost=23.73..26.23 rows=200 width=96)
         Group Key: h, v
         ->  Seq Scan on ctv_data  (cost=0.00..16.10 rows=610 width=68)
(5 rows)

regression=# set work_mem = '1MB';
SET
regression=# explain SELECT v, h, string_agg(i::text, E'\n') AS i
             FROM ctv_data GROUP BY v, h ORDER BY h, v;
                               QUERY PLAN
------------------------------------------------------------------------
 GroupAggregate  (cost=44.32..55.97 rows=200 width=96)
   Group Key: h, v
   ->  Sort  (cost=44.32..45.85 rows=610 width=68)
         Sort Key: h, v
         ->  Seq Scan on ctv_data  (cost=0.00..16.10 rows=610 width=68)
(5 rows)

Now that you mention it, this does seem a bit odd, although I remember
that there's a pretty substantial fudge factor in there when we have
no statistics (which we don't in this example).  If I ANALYZE ctv_data
then it sticks to the hashagg plan all the way down to 64kB work_mem.

			regards, tom lane
I wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> Tom Lane wrote:
>>> Presumably what is happening is that the planner is switching from hash
>>> to sort aggregation.

>> I can't imagine that the server is avoiding hash aggregation on a 1MB
>> work_mem limit for data that's a few dozen bytes.  Is it really doing
>> that?

> Yup:

I looked more closely and found that the reason it's afraid to use hash
aggregation is the amount of transition space potentially needed by
string_agg.  That's being estimated as 8kB per group, and with the
(default) estimate of 200 groups, you get about 1.6MB estimated to be
needed.

Also, I confirmed my suspicion that some other regression tests fail
when you reduce work_mem below 1MB.  So I'm not really excited about
changing this one.

			regards, tom lane
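[As a quick arithmetic check of the estimate described above, assuming
nothing beyond the numbers Tom quotes:]

```sql
-- 8kB of estimated string_agg transition space per group, times the
-- planner's default guess of 200 groups when the table has never been
-- analyzed, comes to roughly 1.6MB, which is more than a 1MB work_mem,
-- so the hash aggregation plan is rejected.
SELECT 8 * 1024 * 200 AS estimated_transition_bytes,        -- 1638400
       pg_size_pretty((8 * 1024 * 200)::bigint) AS pretty;  -- 1600 kB
```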
Tom Lane wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > I can't imagine that the server is avoiding hash aggregation on a 1MB
> > work_mem limit for data that's a few dozen bytes.  Is it really doing
> > that?
>
> Yup:

Aha.  Thanks for testing.

> Now that you mention it, this does seem a bit odd, although I remember
> that there's a pretty substantial fudge factor in there when we have
> no statistics (which we don't in this example).  If I ANALYZE ctv_data
> then it sticks to the hashagg plan all the way down to 64kB work_mem.

Hmm, so we could solve the complaint by adding an ANALYZE.  I'm open to
that; other opinions?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
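[In test-script terms, the fix being floated is just the following; the
placement within the script is assumed, not taken from an actual commit.]

```sql
-- Collect statistics on the crosstab test table after it is populated,
-- so the planner's group-count estimate is realistic and the
-- HashAggregate plan stays stable even at small work_mem settings.
ANALYZE ctv_data;
```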
On Tue, Jun 7, 2016 at 12:28 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Tom Lane wrote:
>> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>>> I can't imagine that the server is avoiding hash aggregation on a 1MB
>>> work_mem limit for data that's a few dozen bytes.  Is it really doing
>>> that?
>>
>> Yup:
>
> Aha.  Thanks for testing.
>
>> Now that you mention it, this does seem a bit odd, although I remember
>> that there's a pretty substantial fudge factor in there when we have
>> no statistics (which we don't in this example).  If I ANALYZE ctv_data
>> then it sticks to the hashagg plan all the way down to 64kB work_mem.
>
> Hmm, so we could solve the complaint by adding an ANALYZE.  I'm open to
> that; other opinions?

We could just enforce work_mem to 64kB and then reset it.

-- 
Michael
On Tue, Jun 7, 2016 at 12:31 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Tue, Jun 7, 2016 at 12:28 AM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
>> Hmm, so we could solve the complaint by adding an ANALYZE.  I'm open to
>> that; other opinions?
>
> We could just enforce work_mem to 64kB and then reset it.

Or just set up work_mem to a wanted value for the duration of the run
of psql_crosstab.  Attached is my proposal.

-- 
Michael
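[Presumably the attached patch amounts to something like the following
sketch; the exact value and placement are guesses, not taken from the
patch itself.]

```sql
-- Pin work_mem for the duration of the psql_crosstab tests so the plan
-- choice, and hence the row order inside each string_agg group, cannot
-- vary with the server's configured work_mem.
SET work_mem = '64kB';

-- ... \crosstabview test queries ...

RESET work_mem;
```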
Attachments
Michael Paquier <michael.paquier@gmail.com> writes:
> On Tue, Jun 7, 2016 at 12:31 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Tue, Jun 7, 2016 at 12:28 AM, Alvaro Herrera
>> <alvherre@2ndquadrant.com> wrote:
>>> Hmm, so we could solve the complaint by adding an ANALYZE.  I'm open to
>>> that; other opinions?

>> We could just enforce work_mem to 64kB and then reset it.

> Or just set up work_mem to a wanted value for the duration of the run
> of psql_crosstab.  Attached is my proposal.

I liked the ANALYZE idea better; this seems pretty ad-hoc.

			regards, tom lane
Tom Lane wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
> > On Tue, Jun 7, 2016 at 12:31 PM, Michael Paquier
> > <michael.paquier@gmail.com> wrote:
> >> On Tue, Jun 7, 2016 at 12:28 AM, Alvaro Herrera
> >> <alvherre@2ndquadrant.com> wrote:
> >>> Hmm, so we could solve the complaint by adding an ANALYZE.  I'm open to
> >>> that; other opinions?
>
> >> We could just enforce work_mem to 64kB and then reset it.
>
> > Or just set up work_mem to a wanted value for the duration of the run
> > of psql_crosstab.  Attached is my proposal.
>
> I liked the ANALYZE idea better; this seems pretty ad-hoc.

Done that way.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Jun 8, 2016 at 8:21 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Tom Lane wrote:
>> Michael Paquier <michael.paquier@gmail.com> writes:
>> > Or just set up work_mem to a wanted value for the duration of the run
>> > of psql_crosstab.  Attached is my proposal.
>>
>> I liked the ANALYZE idea better; this seems pretty ad-hoc.
>
> Done that way.

OK, thanks for fixing the issue!

-- 
Michael