Обсуждение: Window-functions patch handling of aggregates
The window functions patch is laboring under the delusion that it can call an aggregate's final function and then go back to invoking the transfn some more on the same data. This is merest fantasy :-( regression=# select array_agg(q1) over(order by q1) from int8_tbl; server closed the connection unexpectedly This probably means the server terminated abnormally before or whileprocessing the request. The connection to the server was lost. Attempting reset: Failed. Unless we want to move the goalposts on what an aggregate is allowed to do internally, we're going to have to change this to re-aggregate repeatedly. Neither prospect is appetizing in the least. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: > The window functions patch is laboring under the delusion that it can > call an aggregate's final function and then go back to invoking the > transfn some more on the same data. This is merest fantasy :-( > > regression=# select array_agg(q1) over(order by q1) from int8_tbl; > server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > The connection to the server was lost. Attempting reset: Failed. > > Unless we want to move the goalposts on what an aggregate is allowed > to do internally, we're going to have to change this to re-aggregate > repeatedly. Neither prospect is appetizing in the least. Does it currently copy the state datum before calling the final function? Would that help? It does seem like the abstract interface we have for aggregate functions is a strength and we should leverage that. The burden of supporting more complex cases should be borne by the users that are bending the rules. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support!
Gregory Stark <stark@enterprisedb.com> writes: > Tom Lane <tgl@sss.pgh.pa.us> writes: >> Unless we want to move the goalposts on what an aggregate is allowed >> to do internally, we're going to have to change this to re-aggregate >> repeatedly. Neither prospect is appetizing in the least. > Does it currently copy the state datum before calling the final function? > Would that help? No. The entire point of what we have now formalized as "aggregates with internal-type transition values" is that the transition value isn't necessarily a single palloc chunk. For stuff like array_agg, the performance costs of requiring that are enormous. On looking at what array_agg does, it seems the issue there is that the final-function actually deletes the working state when it thinks it's done with it (see construct_md_array). It would probably be possible to fix things so that it doesn't do that when it's called by a WindowAgg instead of a regular Agg. What I'm more concerned about is what third-party code will break. contrib/intagg has done this type of thing since forever, and I'm sure people have copied that... regards, tom lane
2008/12/25 Tom Lane <tgl@sss.pgh.pa.us>: > Gregory Stark <stark@enterprisedb.com> writes: >> Tom Lane <tgl@sss.pgh.pa.us> writes: >>> Unless we want to move the goalposts on what an aggregate is allowed >>> to do internally, we're going to have to change this to re-aggregate >>> repeatedly. Neither prospect is appetizing in the least. > >> Does it currently copy the state datum before calling the final function? >> Would that help? > > No. The entire point of what we have now formalized as "aggregates with > internal-type transition values" is that the transition value isn't > necessarily a single palloc chunk. For stuff like array_agg, the > performance costs of requiring that are enormous. > > On looking at what array_agg does, it seems the issue there is that > the final-function actually deletes the working state when it thinks > it's done with it (see construct_md_array). It would probably be > possible to fix things so that it doesn't do that when it's called by > a WindowAgg instead of a regular Agg. What I'm more concerned about > is what third-party code will break. contrib/intagg has done this > type of thing since forever, and I'm sure people have copied that... I have concerned it once before on the first design of the window functions. I don't have much idea about all over the aggregate functions but at least count(*) does some assumption of AggState in its context. So I concluded when the window functions are introduced it must be announced that if you use aggregate in the window context, you must be sure it supports window as well as aggregate. It is because currently (<= 8.3) aggregates are assumed it is called in AggState only but the assumption would be broken now. It is designed, and announcing helps much third party code to support or not to support window functions (i.e. you can stop by error if window is not supported by the function). Regards, -- Hitoshi Harada
Yeah, it seems like adding a flag like iswindowable to aggregate functions is the safest option. It would be nice if it represented an abstract property of the state function or final function rather than just "works with the implementation of window functions". I'm not sure what that property is though - isidempotent? isreentrant? Maybe just a vague isrepeatable? -- Greg On 24 Dec 2008, at 20:16, "Hitoshi Harada" <umi.tanuki@gmail.com> wrote: > 2008/12/25 Tom Lane <tgl@sss.pgh.pa.us>: >> Gregory Stark <stark@enterprisedb.com> writes: >>> Tom Lane <tgl@sss.pgh.pa.us> writes: >>>> Unless we want to move the goalposts on what an aggregate is >>>> allowed >>>> to do internally, we're going to have to change this to re- >>>> aggregate >>>> repeatedly. Neither prospect is appetizing in the least. >> >>> Does it currently copy the state datum before calling the final >>> function? >>> Would that help? >> >> No. The entire point of what we have now formalized as "aggregates >> with >> internal-type transition values" is that the transition value isn't >> necessarily a single palloc chunk. For stuff like array_agg, the >> performance costs of requiring that are enormous. >> >> On looking at what array_agg does, it seems the issue there is that >> the final-function actually deletes the working state when it thinks >> it's done with it (see construct_md_array). It would probably be >> possible to fix things so that it doesn't do that when it's called by >> a WindowAgg instead of a regular Agg. What I'm more concerned about >> is what third-party code will break. contrib/intagg has done this >> type of thing since forever, and I'm sure people have copied that... > > I have concerned it once before on the first design of the window > functions. I don't have much idea about all over the aggregate > functions but at least count(*) does some assumption of AggState in > its context. So I concluded when the window functions are introduced > it must be announced that if you use aggregate in the window context, > you must be sure it supports window as well as aggregate. It is > because currently (<= 8.3) aggregates are assumed it is called in > AggState only but the assumption would be broken now. It is designed, > and announcing helps much third party code to support or not to > support window functions (i.e. you can stop by error if window is not > supported by the function). > > Regards, > > -- > Hitoshi Harada
2008/12/25 Greg Stark <greg.stark@enterprisedb.com>: > Yeah, it seems like adding a flag like iswindowable to aggregate functions > is the safest option. > > It would be nice if it represented an abstract property of the state > function or final function rather than just "works with the implementation > of window functions". I'm not sure what that property is though - > isidempotent? isreentrant? Maybe just a vague isrepeatable? No, I meant wrinting such like: Datum some_trans_fn(PG_FUNCTION_ARGS) { if (fcinfo->context && IsA(fcinfo->context, WindowAggState)) elog(ERROR, "some_agg does not support window aggregate"); ... } rather than adding column to catalog. To add flag you must add new syntax for CREATE AGGREGATE, which is slightly more painful. Regards, -- Hitoshi Harada
2008/12/25 Hitoshi Harada <umi.tanuki@gmail.com>: > 2008/12/25 Greg Stark <greg.stark@enterprisedb.com>: >> Yeah, it seems like adding a flag like iswindowable to aggregate functions >> is the safest option. >> >> It would be nice if it represented an abstract property of the state >> function or final function rather than just "works with the implementation >> of window functions". I'm not sure what that property is though - >> isidempotent? isreentrant? Maybe just a vague isrepeatable? > > No, I meant wrinting such like: > > Datum > some_trans_fn(PG_FUNCTION_ARGS) > { > if (fcinfo->context && IsA(fcinfo->context, WindowAggState)) > elog(ERROR, "some_agg does not support window aggregate"); > > ... > } > > rather than adding column to catalog. To add flag you must add new > syntax for CREATE AGGREGATE, which is slightly more painful. > enhancing of CREATE AGGREGATE syntax should be better, it could solve problem with compatibility. regards Pavel Stehule > Regards, > > -- > Hitoshi Harada > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >
2008/12/25 Pavel Stehule <pavel.stehule@gmail.com>: > 2008/12/25 Hitoshi Harada <umi.tanuki@gmail.com>: >> 2008/12/25 Greg Stark <greg.stark@enterprisedb.com>: >>> Yeah, it seems like adding a flag like iswindowable to aggregate functions >>> is the safest option. >>> >>> It would be nice if it represented an abstract property of the state >>> function or final function rather than just "works with the implementation >>> of window functions". I'm not sure what that property is though - >>> isidempotent? isreentrant? Maybe just a vague isrepeatable? >> >> No, I meant wrinting such like: >> >> Datum >> some_trans_fn(PG_FUNCTION_ARGS) >> { >> if (fcinfo->context && IsA(fcinfo->context, WindowAggState)) >> elog(ERROR, "some_agg does not support window aggregate"); >> >> ... >> } >> >> rather than adding column to catalog. To add flag you must add new >> syntax for CREATE AGGREGATE, which is slightly more painful. >> > > enhancing of CREATE AGGREGATE syntax should be better, it could solve > problem with compatibility. > Most of the aggregates have no problem with this issue and the rest are fixable like array_agg. So adding a column and syntax is too much, I guess. My suggestion of raising error is urgent patch for third party aggregates that are copied from contrib/intagg. But if there is a chance of general approach to call repeatable aggregate other than WindowAgg, that should be considered. Regards, -- Hitoshi Harada
Greg Stark <greg.stark@enterprisedb.com> writes: > Yeah, it seems like adding a flag like iswindowable to aggregate > functions is the safest option. I agree with Hitoshi-san: that's passing information in the wrong direction. The right direction is to make it visible to the called function which context it's being called in, so that it can do the right thing (or at worst, throw an error if it can't). The current state of play is that the documented expectation for aggregate functions is that they should do if (fcinfo->context && IsA(fcinfo->context, AggState)) ... optimized code for aggregate case ...else ... support regularcall, or throw error ... However, a bit of grepping for AggState shows that this expectation isn't met very well within the core distribution, let alone elsewhere. There are about 10 aggregates that have optimizations like this, only 8 of which are playing by the rules --- the violators are: tsearch2's tsa_rewrite_accum: just assumes it's been passed an AggState, dumps core (or worse) if not array_agg: Asserts it's been passed an AggState, dumps core if not As submitted, Hitoshi's patch made the WindowAgg code pass its WindowAggState to the aggregate functions, which at best would have the result of disabling the internal optimizations of the aggregates, and would result in a core dump in any aggregate that was taking a shortcut. The working copy I have right now does this: if (numaggs > 0) { /* * Set up a mostly-dummy AggState to be passed to plain aggregates * so theyknow they can optimize things. We don't supply any useful * info except for the ID of the aggregate-lifetimecontext. */ winstate->aggstate = makeNode(AggState); winstate->aggstate->aggcontext= winstate->wincontext; } and then passes the dummy AggState to plain aggregates, instead of WindowAggState. This makes count(*) and most of the other aggs work as desired, but it's not sufficient for array_agg because of that release-the-data-structure-in-the-final-function optimization. So the alternatives I see are: 1. Go back to Hitoshi's plan of passing WindowAggState to the aggregates. This will require changing every one of the ten aggregates in the core distro, as well as every third-party aggregate that has a similar optimization; and we just have to keep our fingers crossed that anyone who's taking a short-cut will fix their code before it fails in the field. 2. Use an intermediate dummy AggState as I have in my version, but document some convention for telling this from a "real" AggState when needed. (Not hard, we just pick some field that would never be zero in a real AggState and document testing that.) This is certainly on the ugly side, but it would very substantially cut the number of places that need changes. Only aggregates that are doing something irreversible in their final-functions would need to be touched. If we were working in a green field then #1 would clearly be the preferable choice, but worrying about compatibility with existing third-party aggregates is making me lean to #2. Comments? regards, tom lane
On Fri, 2008-12-26 at 14:17 -0500, Tom Lane wrote: > Greg Stark <greg.stark@enterprisedb.com> writes: > > Yeah, it seems like adding a flag like iswindowable to aggregate > > functions is the safest option. > > So the alternatives I see are: > > 1. Go back to Hitoshi's plan of passing WindowAggState to the > aggregates. This will require changing every one of the ten aggregates > in the core distro, as well as every third-party aggregate that has > a similar optimization; and we just have to keep our fingers crossed > that anyone who's taking a short-cut will fix their code before it > fails in the field. > > 2. Use an intermediate dummy AggState as I have in my version, but > document some convention for telling this from a "real" AggState > when needed. (Not hard, we just pick some field that would never be > zero in a real AggState and document testing that.) This is certainly > on the ugly side, but it would very substantially cut the number of > places that need changes. Only aggregates that are doing something > irreversible in their final-functions would need to be touched. > > If we were working in a green field then #1 would clearly be the > preferable choice, but worrying about compatibility with existing > third-party aggregates is making me lean to #2. Comments? > > regards, tom lane > I believe the goal should be correctness but why not both? Fix what we can and put in place a "work around" that would be removed in 8.5? Joshua D. Drake -- PostgreSQL Consulting, Development, Support, Training 503-667-4564 - http://www.commandprompt.com/ The PostgreSQL Company,serving since 1997
"Joshua D. Drake" <jd@commandprompt.com> writes: > I believe the goal should be correctness but why not both? Fix what we > can and put in place a "work around" that would be removed in 8.5? Why not both what? The driving concern here is that there might be third-party aggregates that will dump core if invoked as window functions, and there is simply not anything we can do to prevent that. Short of refusing to call them at all, which strikes me as an extreme overreaction since most of them can be expected to work fine. (In particular, if we went with solution #1 then *all* the ones that were playing by the documented rules would still work.) Also, there is no such thing as "a workaround we can remove in 8.5". If we put in something like a "safe as window function" attribute for CREATE AGGREGATE, we'll have to support it till the end of time. regards, tom lane
On Fri, Dec 26, 2008 at 02:17:29PM -0500, Tom Lane wrote: > So the alternatives I see are: > > 1. Go back to Hitoshi's plan of passing WindowAggState to the > aggregates. This will require changing every one of the ten aggregates > in the core distro, as well as every third-party aggregate that has > a similar optimization; and we just have to keep our fingers crossed > that anyone who's taking a short-cut will fix their code before it > fails in the field. > > 2. Use an intermediate dummy AggState as I have in my version, but > document some convention for telling this from a "real" AggState > when needed. (Not hard, we just pick some field that would never be > zero in a real AggState and document testing that.) This is certainly > on the ugly side, but it would very substantially cut the number of > places that need changes. Only aggregates that are doing something > irreversible in their final-functions would need to be touched. > > If we were working in a green field then #1 would clearly be the > preferable choice, but worrying about compatibility with existing > third-party aggregates is making me lean to #2. Comments? Exactly how large is this third-party aggregate problem? Rather than support a huge wart, we could just tell people starting now and prominently in the release notes that such things need a re-do and point to examples of how it's done. The case this won't work for is where a vendor of a proprietary C extension is gone and/or won't update their stuff for 8.4, and it seems to me that we can't, and shouldn't try to, take responsibility for that use case anyhow. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
> 1. Go back to Hitoshi's plan of passing WindowAggState to the > aggregates. This will require changing every one of the ten aggregates > in the core distro, as well as every third-party aggregate that has > a similar optimization; and we just have to keep our fingers crossed > that anyone who's taking a short-cut will fix their code before it > fails in the field. Unfortunately, if we don't want to add an explicit iswindowable flag (and I understand that that's ugly), then I think this is the way to go. It's a shame that people will have to make code changes, but inventing a fake AggState object just to get around this problem sounds worse. The array_agg code is new and the fact that it doesn't follow the design pattern should be considered a bug in that code rather than a justification for an ugly workaround. ...Robert
"Robert Haas" <robertmhaas@gmail.com> writes: > Unfortunately, if we don't want to add an explicit iswindowable flag > (and I understand that that's ugly), then I think this is the way to > go. It's a shame that people will have to make code changes, but > inventing a fake AggState object just to get around this problem > sounds worse. The array_agg code is new and the fact that it doesn't > follow the design pattern should be considered a bug in that code > rather than a justification for an ugly workaround. Well, array_agg may be new but it's simply a re-implementation of a design pattern that existed in contrib/intagg since 7.3 or so. I have no problem with fixing array_agg --- what I'm wondering about is who has copied intagg before. regards, tom lane
2008/12/27 Tom Lane <tgl@sss.pgh.pa.us>: > "Robert Haas" <robertmhaas@gmail.com> writes: >> Unfortunately, if we don't want to add an explicit iswindowable flag >> (and I understand that that's ugly), then I think this is the way to >> go. It's a shame that people will have to make code changes, but >> inventing a fake AggState object just to get around this problem >> sounds worse. The array_agg code is new and the fact that it doesn't >> follow the design pattern should be considered a bug in that code >> rather than a justification for an ugly workaround. > > Well, array_agg may be new but it's simply a re-implementation of a > design pattern that existed in contrib/intagg since 7.3 or so. I have > no problem with fixing array_agg --- what I'm wondering about is who > has copied intagg before. We agree that the best solution for ten core aggregates is to rewrite them to support or not support WindowAgg, so the care for third party aggregates copied from intagg is nothing but announcing that the behavior is changing. -- if we had better alternative we should do it, but it seems to me that there's no way not to break the non-core aggregates. SInce at t least you must compile the modules again on 8.4 release, compiling time warnings or something is the best announcing for now. Or any other suggestions? Regards, -- Hitoshi Harada