Discussion: Re: PostgreSQL crashes with SIGSEGV
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: tested, passed

Hi,

All information relates to WIP-tuplesort-memcontext-fix.patch. The patch applies and fixes the bug on REL9_6_STABLE, and LGTM. However, REL9_5_STABLE is affected as well, and the patch is not applicable to it. That may deserve a separate commitfest entry, though, because the differences in tuplesort there may require changes in the fix too.

The new status of this patch is: Ready for Committer
Aleksandr Parfenov <a.parfenov@postgrespro.ru> writes: > The new status of this patch is: Ready for Committer I don't feel particularly comfortable committing a patch that was clearly labeled as a rushed draft by its author. Peter, where do you stand on this work? In a quick look at the patches, WIP-kludge-fix.patch seems clearly unacceptable for back-patching because it changes the signature and behavior of ExecResetTupleTable, which external code might well be using. regards, tom lane
On Wed, Jan 17, 2018 at 11:23 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Aleksandr Parfenov <a.parfenov@postgrespro.ru> writes: >> The new status of this patch is: Ready for Committer > > I don't feel particularly comfortable committing a patch that > was clearly labeled as a rushed draft by its author. > Peter, where do you stand on this work? I would like to take another pass over WIP-tuplesort-memcontext-fix.patch, to be on the safe side. I'm currently up to my neck in parallel CREATE INDEX work, though, and would prefer to avoid context switching for a week or two, if possible. How time sensitive do you think this is? I think we'll end up doing this: * Committing the minimal modifications made (in both WIP patches) to tuplesort_getdatum() to both v10 and master branches. tuplesort_getdatum() must follow the example of tuplesort_gettupleslot() on these branches, since tuplesort_gettupleslot() already manages to get everything right in recent releases. (There is no known tuplesort_getdatum() crash on these versions, but this still seems necessary on general principle.) * Committing something pretty close to WIP-tuplesort-memcontext-fix.patch to 9.6. * Committing another fix to 9.5. This fix will apply the same principles as WIP-tuplesort-memcontext-fix.patch, but will be simpler mechanically, since the whole batch memory mechanism added to tuplesort.c for 9.6 isn't involved. I'm not sure whether or not we should also apply this still-to-be-written 9.5 patch to 9.4 and 9.3, since those versions don't have grouping sets, and so cannot crash. ISTM that we should leave them alone, since tuplesort has had this problem forever. Perhaps you should go ahead and commit a patch with just the changes to tuplesort_getdatum() now. This part seems low risk, and worth doing in a single, (almost) consistent pass over the back branches. This part is a trivial backpatch, and could be thought of as an independent problem. 
(It will be nice to get v10 and master branches completely out of the way quickly.) > In a quick look at the patches, WIP-kludge-fix.patch seems clearly > unacceptable for back-patching because it changes the signature and > behavior of ExecResetTupleTable, which external code might well be using. The signature change isn't required, and it was silly of me to add it. But I also really dislike the general approach it takes, and mostly posted it because I thought that it was a useful counterpoint. -- Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes: > On Wed, Jan 17, 2018 at 11:23 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I don't feel particularly comfortable committing a patch that >> was clearly labeled as a rushed draft by its author. >> Peter, where do you stand on this work? > I would like to take another pass over > WIP-tuplesort-memcontext-fix.patch, to be on the safe side. I'm > currently up to my neck in parallel CREATE INDEX work, though, and > would prefer to avoid context switching for a week or two, if > possible. How time sensitive do you think this is? Probably not very. It'd be nice to have it done by the next minor releases, ie before 5-Feb ... but given that these bugs are years old, missing that deadline would not be catastrophic. > I'm not sure whether or not we should also apply this > still-to-be-written 9.5 patch to 9.4 and 9.3, since those versions > don't have grouping sets, and so cannot crash. ISTM that we should > leave them alone, since tuplesort has had this problem forever. +1. If the problem isn't known to be reproducible in those branches, the risk of adding new bugs seems to outweigh any benefit. regards, tom lane
On Wed, Jan 17, 2018 at 12:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Probably not very. It'd be nice to have it done by the next minor > releases, ie before 5-Feb ... but given that these bugs are years > old, missing that deadline would not be catastrophic. Got it. >> I'm not sure whether or not we should also apply this >> still-to-be-written 9.5 patch to 9.4 and 9.3, since those versions >> don't have grouping sets, and so cannot crash. ISTM that we should >> leave them alone, since tuplesort has had this problem forever. > > +1. If the problem isn't known to be reproducible in those branches, > the risk of adding new bugs seems to outweigh any benefit. You could make the same objection to changing tuplesort_getdatum() outside of the master branch, though. I think that going back further than that for the (arguably independent) tuplesort_getdatum() subset fix might still be a good idea. I wonder where you stand on this. -- Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes: > On Wed, Jan 17, 2018 at 12:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> +1. If the problem isn't known to be reproducible in those branches, >> the risk of adding new bugs seems to outweigh any benefit. > You could make the same objection to changing tuplesort_getdatum() > outside of the master branch, though. I think that going back further > than that for the (arguably independent) tuplesort_getdatum() subset > fix might still be a good idea. I wonder where you stand on this. I haven't been following the thread very closely, so I don't have an opinion on that. regards, tom lane
On Wed, Jan 17, 2018 at 1:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> You could make the same objection to changing tuplesort_getdatum() >> outside of the master branch, though. I think that going back further >> than that for the (arguably independent) tuplesort_getdatum() subset >> fix might still be a good idea. I wonder where you stand on this. > > I haven't been following the thread very closely, so I don't have an > opinion on that. A complicating factor for this fix of mine is that mode_final() seems to have its own ideas about tuple memory lifetime, over and above what tuplesort_getdatum() explicitly promises, as can be seen here: /* * Note: we *cannot* clean up the tuplesort object here, because the value * to be returned is allocated inside its sortcontext. We could use * datumCopy to copy it out of there, but it doesn't seem worth the * trouble, since the cleanup callback will clear the tuplesort later. */ My WIP-tuplesort-memcontext-fix.patch fix is premised on the idea that nodeAgg.c/grouping sets got it right: nodeAgg.c should be able to continue to assume that in "owning" the memory used for a tuple (in a table slot), it has it in its own memory context -- otherwise, the whole tts_shouldFree tuple slot mechanism is prone to double-frees. This comment directly contradicts/undermines that premise. ISTM that either grouping sets or mode_final() must necessarily be wrong, because each oversteps, and infers a different contract from tuplesort tuple fetching routines (different assumptions about memory contexts are made in each case). Only one can be right, unless it's okay to have one rule for tuplesort_getdatum() and another for tuplesort_gettupleslot() (which seems questionable to me). I still think that grouping sets is right (and that mode_final() is wrong). Do you? -- Peter Geoghegan
On Wed, Jan 17, 2018 at 2:23 PM, Peter Geoghegan <pg@bowt.ie> wrote: > A complicating factor for this fix of mine is that mode_final() seems > to have its own ideas about tuple memory lifetime, over and above what > tuplesort_getdatum() explicitly promises, as can be seen here: > > /* > * Note: we *cannot* clean up the tuplesort object here, because the value > * to be returned is allocated inside its sortcontext. We could use > * datumCopy to copy it out of there, but it doesn't seem worth the > * trouble, since the cleanup callback will clear the tuplesort later. > */ > ISTM that either grouping sets or mode_final() must necessarily be > wrong, because each oversteps, and infers a different contract from > tuplesort tuple fetching routines (different assumptions about memory > contexts are made in each case). Only one can be right, unless it's > okay to have one rule for tuplesort_getdatum() and another for > tuplesort_gettupleslot() (which seems questionable to me). I still > think that grouping sets is right (and that mode_final() is wrong). Do > you? It would be nice to get an opinion on this mode_final() + tuplesort memory lifetime business from you, Tom. Note that you removed the quoted comment within be0ebb65, back in October. -- Peter Geoghegan
On 18 January 2018 at 03:23, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Aleksandr Parfenov <a.parfenov@postgrespro.ru> writes:
> The new status of this patch is: Ready for Committer
I don't feel particularly comfortable committing a patch that
was clearly labeled as a rushed draft by its author.
Peter, where do you stand on this work?
In a quick look at the patches, WIP-kludge-fix.patch seems clearly
unacceptable for back-patching because it changes the signature and
behavior of ExecResetTupleTable, which external code might well be using.
Definitely is using, in the case of BDR and pglogical. But we can patch in a version check easily enough.
On Tue, Feb 6, 2018 at 5:54 PM, Craig Ringer <craig@2ndquadrant.com> wrote: >> In a quick look at the patches, WIP-kludge-fix.patch seems clearly >> unacceptable for back-patching because it changes the signature and >> behavior of ExecResetTupleTable, which external code might well be using. > Definitely is using, in the case of BDR and pglogical. But we can patch in a > version check easily enough. That won't be necessary. The WIP-kludge-fix.patch approach is never going to be used, and was only really posted for illustrative purposes. -- Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes: > On Wed, Jan 17, 2018 at 2:23 PM, Peter Geoghegan <pg@bowt.ie> wrote: >> A complicating factor for this fix of mine is that mode_final() seems >> to have its own ideas about tuple memory lifetime, over and above what >> tuplesort_getdatum() explicitly promises, as can be seen here: >> /* >> * Note: we *cannot* clean up the tuplesort object here, because the value >> * to be returned is allocated inside its sortcontext. We could use >> * datumCopy to copy it out of there, but it doesn't seem worth the >> * trouble, since the cleanup callback will clear the tuplesort later. >> */ >> ISTM that either grouping sets or mode_final() must necessarily be >> wrong, because each oversteps, and infers a different contract from >> tuplesort tuple fetching routines (different assumptions about memory >> contexts are made in each case). Only one can be right, unless it's >> okay to have one rule for tuplesort_getdatum() and another for >> tuplesort_gettupleslot() (which seems questionable to me). I still >> think that grouping sets is right (and that mode_final() is wrong). Do >> you? > It would be nice to get an opinion on this mode_final() + tuplesort > memory lifetime business from you, Tom. I'm fairly sure that that bit in mode_final() was just a hack to make it work. If there's a better, more principled way, let's go for it. > Note that you removed the quoted comment within be0ebb65, back in October. There were multiple instances of basically that same comment before. AFAICS I just consolidated them into one place, in the header comment for ordered_set_shutdown. regards, tom lane
On Wed, Feb 7, 2018 at 4:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Peter Geoghegan <pg@bowt.ie> writes: >> It would be nice to get an opinion on this mode_final() + tuplesort >> memory lifetime business from you, Tom. > > I'm fairly sure that that bit in mode_final() was just a hack to make > it work. If there's a better, more principled way, let's go for it. This is the more principled way. It is much easier to make every single tuplesort caller on every release branch follow this rule (or those on 9.5+, at least). My big concern with making tuplesort_getdatum() deliberately copy into caller's context is that that could introduce memory leaks in caller code that evolved in a world where tuplesort_end() frees pass-by-reference datum memory. Seems like the only risky case is certain ordered set aggregate code, though. And, it's probably just a matter of fixing the comments. I'll read through the bug report thread that led up to October's commit be0ebb65 [1] tomorrow, to make sure of this. I just noticed that process_ordered_aggregate_single() says that the behavior I propose for tuplesort_getdatum() is what it *already* expects: /* * Note: if input type is pass-by-ref, the datums returned by the sort are * freshly palloc'd in the per-query context, so we must be careful to * pfree them when they are no longer needed. */ This supports the idea that ordered set aggregate code is the odd one out when it comes to beliefs about tuplesort memory contexts. Even just among tuplesort_getdatum() callers, though even more so among tuplesort callers in general. One simple rule for all tuplesort fetch routines is the high-level goal here. >> Note that you removed the quoted comment within be0ebb65, back in October. > > There were multiple instances of basically that same comment before. > AFAICS I just consolidated them into one place, in the header comment for > ordered_set_shutdown. I see. 
So, to restate my earlier remarks in terms of the newer organization: The last line in that header comment will need to be removed, since it will become false under my scheme. The line is: "Note that many of the finalfns could *not* free the tuplesort object, at least not without extra data copying, because what they return is a pointer to a datum inside the tuplesort object". [1] https://www.postgresql.org/message-id/flat/CAB4ELO5RZhOamuT9Xsf72ozbenDLLXZKSk07FiSVsuJNZB861A%40mail.gmail.com#CAB4ELO5RZhOamuT9Xsf72ozbenDLLXZKSk07FiSVsuJNZB861A@mail.gmail.com -- Peter Geoghegan
On Wed, Feb 7, 2018 at 7:02 PM, Peter Geoghegan <pg@bowt.ie> wrote: > On Wed, Feb 7, 2018 at 4:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Peter Geoghegan <pg@bowt.ie> writes: >>> It would be nice to get an opinion on this mode_final() + tuplesort >>> memory lifetime business from you, Tom. >> >> I'm fairly sure that that bit in mode_final() was just a hack to make >> it work. If there's a better, more principled way, let's go for it. > > This is the more principled way. It is much easier to make every > single tuplesort caller on every release branch follow this rule (or > those on 9.5+, at least). I now think that my approach to fixing 9.6 in WIP-tuplesort-memcontext-fix.patch is too complicated to justify backpatching. I had the right idea there, and have no reason to think it won't work, but it now seems like the complexity simply isn't worth it. The advantage of WIP-tuplesort-memcontext-fix.patch was that it avoided an extra copy within tuplesort_gettupleslot() on the earlier Postgres versions it targeted (the versions where that function does not have a "copy" argument), by arranging to make sure that low-level routines have tuplesort caller context passed all the way down. However, now that I consider the frequency that WIP-tuplesort-memcontext-fix.patch would avoid such copying given real 9.6 workloads, its approach looks rather unappealing -- we should instead just do a copy in all cases. Another way of putting it is that it now seems like the approach taken in bugfix commit d8589946d should be taken even further for 9.6, so that we *always* copy for the tuplesort_gettupleslot() caller, rather than just copying in the most common cases. We'd also sometimes have to free the redundant memory allocated by tuplesort_gettuple_common() within tuplesort_gettupleslot() if we went this way -- the should_free = true case would have tuplesort_gettuple_common() do a pfree() after copying. 
Needing a pfree() is a consequence of allocating memory for the caller, and then copying it for the caller when we know that we're using the caller's memory context. A bit weird, but certainly very simple. New plan:

* For 9.5 and 9.6, the approach taken in bugfix commit d8589946d should be taken even further -- we should always copy. Moreover, we should always copy within tuplesort_getdatum(), for the same reasons.

* For 9.5, 9.6, 10, and master, we should make sure that tuplesort_getdatum() uses the caller's memory context. The fact that it doesn't already do so seems like a simple oversight. We should do this to be consistent with tuplesort_gettupleslot(). (This isn't critical, but seems like a good idea.)

* For 9.5, 9.6, 10, and master, we should adjust some comments from tuplesort_getdatum() callers, so that they no longer say that tuplesort datum tuple memory lives in the tuplesort context. That won't be true anymore.

Anyone have an opinion on this? The advantages of this approach are:

- It's far simpler than WIP-tuplesort-memcontext-fix.patch, and can be applied to 9.5 and 9.6 with only small adjustments.

- It leaves all branches essentially consistent with v10+. v10+ gets everything right already (except for that one minor tuplesort_getdatum() + caller context issue), and it seems sensible to treat v10 as a kind of model to follow here.

There are also some disadvantages to this new plan, though:

- There is a slightly awkward question for tuplesort_getdatum() in 9.6: is it acceptable for tuplesort_getdatum() to *always* copy explicitly, given that tuplesort_getdatum() is not actually known to cause a crash there? I believe it is, since tuplesort_getdatum() *always* copies on Postgres v10+ anyway, and even on 9.6 copying is already the common case.

- There is a new overhead in 9.5. As I said, 9.6 mostly already copies anyway, since it already has d8589946d -- 9.5 never got that commit.
This is very similar to the situation we faced about a year ago with d8589946d on 9.6, since there isn't going to be much more copying than what d8589946d already implies. ISTM that d8589946d set a precedent that makes the situation that this creates for 9.5 okay today. -- Peter Geoghegan
On Mon, Feb 12, 2018 at 6:15 PM, Peter Geoghegan <pg@bowt.ie> wrote: > * For 9.5 and 9.6, the approach taken in bugfix commit d8589946d > should be taken even further -- we should always copy. Moreover, we > should always copy within tuplesort_getdatum(), for the same reasons. > > * For 9.5, 9.6, 10, and master, we should make sure that > tuplesort_getdatum() uses the caller's memory context. The fact that > it doesn't already do so seems like a simple oversight. We should do > this to be consistent with tuplesort_gettupleslot(). (This isn't > critical, but seems like a good idea.) > > * For 9.5, 9.6, 10, and master, we should adjust some comments from > tuplesort_getdatum() callers, so that they no longer say that > tuplesort datum tuple memory lives in tuplesort context. That won't be > true anymore. Attached patches do it that way. I'm happy with what I came up with, which is a lot simpler than my first approach. The extra copying seems likely to be well worth it, since it is fairly isolated in practice, especially on 9.6. There is no extra copying from v10+, since they don't need the first fix at all. -- Peter Geoghegan
Attachments
At Thu, 22 Feb 2018 16:22:47 -0800, Peter Geoghegan <pg@bowt.ie> wrote in <CAH2-Wz=QW6FNpLEfQpFKmKiu_WkfxCpmPDmp6ZUf=7SrARsNpQ@mail.gmail.com> > On Mon, Feb 12, 2018 at 6:15 PM, Peter Geoghegan <pg@bowt.ie> wrote: > > * For 9.5 and 9.6, the approach taken in bugfix commit d8589946d > > should be taken even further -- we should always copy. Moreover, we > > should always copy within tuplesort_getdatum(), for the same reasons. > > > > * For 9.5, 9.6, 10, and master, we should make sure that > > tuplesort_getdatum() uses the caller's memory context. The fact that > > it doesn't already do so seems like a simple oversight. We should do > > this to be consistent with tuplesort_gettupleslot(). (This isn't > > critical, but seems like a good idea.) > > > > * For 9.5, 9.6, 10, and master, we should adjust some comments from > > tuplesort_getdatum() callers, so that they no longer say that > > tuplesort datum tuple memory lives in tuplesort context. That won't be > > true anymore. > > Attached patches do it that way. I'm happy with what I came up with, > which is a lot simpler than my first approach. The extra copying seems > likely to be well worth it, since it is fairly isolated in practice, > especially on 9.6. There is no extra copying from v10+, since they > don't need the first fix at all. FWIW, I examined the case myself, and confirmed that the cause is the tuple being freed after tuplesort_end(), as a result of shouldFree. As a principle, since it is declared that the caller is responsible for freeing the result, tuplesort shouldn't free it outside of the caller's control. Even where it is not currently causing a use-after-free problem, one could still happen. From this point of view, the patch for 9.5 and 9.6 looks fine and actually fixes the problem. For 10+, copying is controlled by the caller side, but only tuplesort_getdatum() didn't make the copy in the caller's context. That is just an oversight, and the fix looks reasonable. 
I'm not the best person to judge the comment wording, but it makes sense to me. If no one objects, I'll mark this as Ready for Commit in a couple of days. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
On Mon, Mar 26, 2018 at 5:14 AM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: >> Attached patches do it that way. I'm happy with what I came up with, >> which is a lot simpler than my first approach. The extra copying seems >> likely to be well worth it, since it is fairly isolated in practice, >> especially on 9.6. There is no extra copying from v10+, since they >> don't need the first fix at all. > If no one objects, I'll mark this as Ready for Commit in a couple > of days. Thank you for the review, Horiguchi-san. It's hard to decide how important each goal is when coming up with a back-patchable fix like this. When the goals are somewhat in competition with each other, a second or a third opinion is particularly appreciated. -- Peter Geoghegan
At Mon, 26 Mar 2018 20:40:51 -0700, Peter Geoghegan <pg@bowt.ie> wrote in <CAH2-Wzk4MTSPWnCv3ENz9HZrtiJGut8TOtLTaf56AxmJ9VbcEA@mail.gmail.com> > On Mon, Mar 26, 2018 at 5:14 AM, Kyotaro HORIGUCHI > <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > >> Attached patches do it that way. I'm happy with what I came up with, > >> which is a lot simpler than my first approach. The extra copying seems > >> likely to be well worth it, since it is fairly isolated in practice, > >> especially on 9.6. There is no extra copying from v10+, since they > >> don't need the first fix at all. > > > If no one objects, I'll mark this as Ready for Commit in a couple > > of days. > > Thank you for the review, Horiguchi-san. It's hard to decide how > important each goal is when coming up with a back-patchable fix like > this. When the goals are somewhat in competition with each other, a > second or a third opinion is particularly appreciated. Understood. I'll wait for the other opinions. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Tue, Mar 27, 2018 at 2:23 AM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: >> > If no one objects, I'll mark this as Ready for Commit in a couple >> > of days. >> >> Thank you for the review, Horiguchi-san. It's hard to decide how >> important each goal is when coming up with a back-patchable fix like >> this. When the goals are somewhat in competition with each other, a >> second or a third opinion is particularly appreciated. > > Understood. I'll wait for the other opinions. I wasn't specifically requesting that you not mark the patch as ready for committer, actually. I was just expressing that your input was valuable. Sorry for being unclear. -- Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes: > On Mon, Mar 26, 2018 at 5:14 AM, Kyotaro HORIGUCHI > <horiguchi.kyotaro@lab.ntt.co.jp> wrote: >> If no one objects, I'll mark this as Ready for Commit in a couple >> of days. > Thank you for the review, Horiguchi-san. It's hard to decide how > important each goal is when coming up with a back-patchable fix like > this. When the goals are somewhat in competition with each other, a > second or a third opinion is particularly appreciated. It looks good to me. The only real objection would be if someone came up with a test case proving that there's a significant performance degradation from the extra copies. But given that these are back branches, it would take a pretty steep penalty for me to want to take the risk of refactoring to avoid that. I've pushed it with some cosmetic adjustments. regards, tom lane
On Wed, Mar 28, 2018 at 10:30 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > It looks good to me. The only real objection would be if someone came > up with a test case proving that there's a significant performance > degradation from the extra copies. But given that these are back > branches, it would take a pretty steep penalty for me to want to take > the risk of refactoring to avoid that. > > I've pushed it with some cosmetic adjustments. Thank you, Tom. -- Peter Geoghegan