Discussion: StandbyAcquireAccessExclusiveLock doesn't necessarily
Commit 37c54863c removed the code in StandbyAcquireAccessExclusiveLock that checked the return value of LockAcquireExtended. AFAICS this was flat out wrong, because it's still passing reportMemoryError = false to LockAcquireExtended, meaning there are still cases where LOCKACQUIRE_NOT_AVAIL will be returned. In such a situation, the startup process would believe it had acquired exclusive lock even though it hadn't, with potentially dire consequences.

While we could certainly put back a test there, it's not clear to me that it could do anything more useful than erroring out, at least not without largely reverting 37c54863c. So my inclination is to remove the reportMemoryError = false parameter, and just let an error happen in the unlikely situation that we hit OOM for the lock table. That would allow this code to not use LockAcquireExtended at all.

Indeed, I'd be rather tempted to remove that parameter from LockAcquireExtended altogether, as I don't believe it's either particularly useful, or at all well tested, or even testable.

			regards, tom lane
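To make the failure mode concrete, here is a toy model in Python of the ignored-return-value hazard (purely illustrative; the real code is C in the lock manager, and the names below merely mirror it). With reportMemoryError = false, a full lock table yields LOCKACQUIRE_NOT_AVAIL rather than raising an error, and a caller that discards the result proceeds as if the lock were held:

```python
# Toy model of the hazard described above; names mirror the real
# LockAcquireExtended API but the logic is purely illustrative.
LOCKACQUIRE_NOT_AVAIL = 0
LOCKACQUIRE_OK = 1

LOCK_TABLE_CAPACITY = 4   # stand-in for shared lock-table space
locks_held = []

def lock_acquire_extended(lock, report_memory_error=True):
    """Return LOCKACQUIRE_NOT_AVAIL on a full table unless the caller
    asked for an error to be raised instead (analogous to elog(ERROR))."""
    if len(locks_held) >= LOCK_TABLE_CAPACITY:
        if report_memory_error:
            raise MemoryError("out of shared memory")
        return LOCKACQUIRE_NOT_AVAIL
    locks_held.append(lock)
    return LOCKACQUIRE_OK

# Fill the table, then reproduce the buggy call pattern: the result of
# the fifth call is discarded, so the caller would believe it holds
# lock "t5" when it does not.
for t in ("t1", "t2", "t3", "t4"):
    assert lock_acquire_extended(t, report_memory_error=False) == LOCKACQUIRE_OK

result = lock_acquire_extended("t5", report_memory_error=False)
assert result == LOCKACQUIRE_NOT_AVAIL and "t5" not in locks_held
```

The proposed fix corresponds to dropping the report_memory_error=False argument, so a full table surfaces as a hard error in the startup process instead of a silently ignored return code.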
On 8 September 2018 at 00:37, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Commit 37c54863c removed the code in StandbyAcquireAccessExclusiveLock
> that checked the return value of LockAcquireExtended. AFAICS this was
> flat out wrong, because it's still passing reportMemoryError = false
> to LockAcquireExtended, meaning there are still cases where
> LOCKACQUIRE_NOT_AVAIL will be returned. In such a situation, the
> startup process would believe it had acquired exclusive lock even
> though it hadn't, with potentially dire consequences.
>
> While we could certainly put back a test there, it's not clear to me
> that it could do anything more useful than erroring out, at least
> not without largely reverting 37c54863c.
>
> So my inclination is to remove the reportMemoryError = false parameter,
> and just let an error happen in the unlikely situation that we hit OOM
> for the lock table.
>
> That would allow this code to not use LockAcquireExtended at all.
>
> Indeed, I'd be rather tempted to remove that parameter from
> LockAcquireExtended altogether, as I don't believe it's either
> particularly useful, or at all well tested, or even testable.
I've never seen an out of memory on the lock table and that seems even less likely since changes in 9.2.
So no problem removing that.
Are you looking for a patch to backpatch, or just a change for the future? Changing the parameter in a backpatch seems like more trouble than it's worth.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Sep 7, 2018 at 6:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> So my inclination is to remove the reportMemoryError = false parameter,
> and just let an error happen in the unlikely situation that we hit OOM
> for the lock table.

Wouldn't that take down the entire cluster with no restart?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 10 September 2018 at 19:16, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Sep 7, 2018 at 6:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> So my inclination is to remove the reportMemoryError = false parameter,
>> and just let an error happen in the unlikely situation that we hit OOM
>> for the lock table.

> Wouldn't that take down the entire cluster with no restart?
Please explain why you think that would be with no restart.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Simon Riggs <simon@2ndquadrant.com> writes:
> On 10 September 2018 at 19:16, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Fri, Sep 7, 2018 at 6:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> So my inclination is to remove the reportMemoryError = false parameter,
>>> and just let an error happen in the unlikely situation that we hit OOM
>>> for the lock table.

>> Wouldn't that take down the entire cluster with no restart?

> Please explain why you think that would be with no restart.

More to the point, what are our alternatives, and how much can we really improve it? If the WAL stream requires more concurrent AELs than the standby's lock table can hold, there isn't going to be any solution short of manual intervention to increase the standby's max_locks_per_transaction.

The point of the previous coding here was that perhaps there's some range of number-of-locks-needed where kicking hot-standby queries off of locks would allow recovery to proceed. However, it is (as far as I know) unproven that that actually works, let alone is effective enough to justify maintaining very-hard-to-test code for. The field demand for such behavior can be measured by the number of complaints we've had since breaking it in 9.6, namely zero.

So no, I do not want to re-implement and maintain that behavior on the strength of just a guess that sometimes it might be useful. If somebody else feels a burning passion for it, they can do the work --- but I'd be inclined to argue that it'd be a HEAD-only performance improvement, not a back-patchable bug fix.

			regards, tom lane
On Tue, Sep 11, 2018 at 5:54 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Please explain why you think that would be with no restart.

Because the startup process will die, and if that happens, IIRC, there's no crash-and-restart loop. You're just done.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Sep 11, 2018 at 10:25 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The point of the previous coding here was that perhaps there's some
> range of number-of-locks-needed where kicking hot-standby queries
> off of locks would allow recovery to proceed. However, it is (as
> far as I know) unproven that that actually works, let alone is
> effective enough to justify maintaining very-hard-to-test code for.
> The field demand for such behavior can be measured by the number of
> complaints we've had since breaking it in 9.6, namely zero.
>
> So no, I do not want to re-implement and maintain that behavior on
> the strength of just a guess that sometimes it might be useful.
> If somebody else feels a burning passion for it, they can do the
> work --- but I'd be inclined to argue that it'd be a HEAD-only
> performance improvement, not a back-patchable bug fix.

Mmph. Well, I'm not in love with that position, because having the standby exit in such a way as to require manual intervention when an automated recovery strategy is possible is undesirable, but I'm not volunteering to do the work, either, so maybe we don't have many alternatives.

I think, though, that it is pretty obvious that the intermediate zone which you refer to as "perhaps" existing does indeed exist. Queries running on the standby consume lock table slots, and killing them frees up those slots. Q.E.D.

I suspect the reason why this hasn't come up much in practice is because (1) there are guards against setting various GUCs including max_connections and max_locks_per_transaction lower on the standby than they are set on the master (cf. CheckRequiredParameterValues) and (2) if those guards are not quite sufficient to ensure that the lock table on the standby is always as large as it is on the master, it doesn't end up mattering because the number of AccessExclusiveLocks on relations is generally going to be a very small percentage of the total number of lock table slots.

But if somebody's interested in working on this, maybe we could construct a TAP test case that involves the master running "BEGIN; LOCK TABLE a1, a2, a3, a4, ....;" concurrently with some "select pg_sleep from empty1, empty2, ..." queries on the standby.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
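The "intermediate zone" argument can be sketched in a few lines (an illustrative Python toy model, not the actual lock manager, whose slot accounting is considerably more complex): standby queries and replayed AccessExclusiveLocks compete for the same fixed pool of slots, so cancelling standby queries frees exactly the capacity the startup process needs:

```python
# Toy model of the "intermediate zone": standby queries and replayed
# AccessExclusiveLocks (AELs) share one fixed pool of lock slots.
CAPACITY = 8

standby_query_locks = ["q1", "q2", "q3", "q4", "q5", "q6"]
startup_ael_locks = ["t1", "t2"]

def free_slots():
    return CAPACITY - len(standby_query_locks) - len(startup_ael_locks)

# The WAL stream now needs 4 more AELs, but no slots are free...
needed = ["t3", "t4", "t5", "t6"]
assert free_slots() < len(needed)

# ...so killing hot-standby queries releases the slots required.
while free_slots() < len(needed):
    standby_query_locks.pop()     # cancel one standby query, freeing its lock
startup_ael_locks.extend(needed)  # recovery can now proceed

assert free_slots() >= 0 and len(startup_ael_locks) == 6
```

Whether this works under a continuous inbound query stream, without also suppressing new lock acquisitions, is the open question raised later in the thread.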
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Sep 11, 2018 at 5:54 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Please explain why you think that would be with no restart.

> Because the startup process will die, and if that happens, IIRC,
> there's no crash-and-restart loop. You're just done.

Unless we think that the startup process will never never ever throw an error, that might be a behavior that needs discussion in itself. Obviously an infinite crash-and-restart loop would be bad, but perhaps the postmaster could have logic that would allow restarting the startup process some small number of times. I think the hard part would be in deciding whether a previous restart had succeeded (ie made progress beyond the prior crash point), so that it should no longer count against the retry limit.

			regards, tom lane
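The retry policy being floated could look roughly like this (a hypothetical sketch in Python, not postmaster code; the function names and the WAL-position-as-integer representation are invented for illustration): count consecutive failures, but restart the count whenever a retry replays past the previous crash point:

```python
# Hypothetical sketch of the restart policy discussed above: allow a
# small number of startup-process retries, but only count a crash
# against the limit if no progress was made past the prior crash point.
MAX_RETRIES = 3

def supervise(run_startup):
    """run_startup() returns the WAL position reached before crashing,
    or None if recovery completed.  Purely illustrative."""
    failures = 0
    last_crash_lsn = -1
    while failures < MAX_RETRIES:
        progress = run_startup()
        if progress is None:
            return "recovery complete"
        if progress > last_crash_lsn:
            failures = 1          # progress was made: restart the count
        else:
            failures += 1         # crashed at the same point again
        last_crash_lsn = max(last_crash_lsn, progress)
    return "giving up"

# A startup process that crashes twice at the same spot, then succeeds:
attempts = iter([100, 100, None])
assert supervise(lambda: next(attempts)) == "recovery complete"

# One that is stuck at the same WAL position forever is bounded:
assert supervise(lambda: 100) == "giving up"
```

The sketch makes the hard part visible: "progress" has to be measured against the prior crash point, or a deterministic failure (like lock-table OOM at the same WAL record) would loop forever.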
On 11 September 2018 at 16:11, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Sep 11, 2018 at 10:25 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The point of the previous coding here was that perhaps there's some
>> range of number-of-locks-needed where kicking hot-standby queries
>> off of locks would allow recovery to proceed. However, it is (as
>> far as I know) unproven that that actually works, let alone is
>> effective enough to justify maintaining very-hard-to-test code for.
>> The field demand for such behavior can be measured by the number of
>> complaints we've had since breaking it in 9.6, namely zero.

Agreed.

>> So no, I do not want to re-implement and maintain that behavior on
>> the strength of just a guess that sometimes it might be useful.
>> If somebody else feels a burning passion for it, they can do the
>> work --- but I'd be inclined to argue that it'd be a HEAD-only
>> performance improvement, not a back-patchable bug fix.

> Mmph. Well, I'm not in love with that position, because having the
> standby exit in such a way as to require manual intervention when an
> automated recovery strategy is possible is undesirable, but I'm not
> volunteering to do the work, either, so maybe we don't have many
> alternatives.
>
> I think, though, that it is pretty obvious that the intermediate zone
> which you refer to as "perhaps" existing does indeed exist. Queries
> running on the standby consume lock table slots, and killing them
> frees up those slots. Q.E.D.
>
> I suspect the reason why this hasn't come up much in practice is
> because (1) there are guards against setting various GUCs including
> max_connections and max_locks_per_transaction lower on the standby
> than they are set on the master (cf. CheckRequiredParameterValues) and
> (2) if those guards are not quite sufficient to ensure that the lock
> table on the standby is always as large there as it is on the master,
> it doesn't end up mattering because the number of AccessExclusiveLocks
> on relations is generally going to be a very small percentage of the
> total number of lock table slots. But if somebody's interested in
> working on this, maybe we could construct a TAP test case that
> involves the master running "BEGIN; LOCK TABLE a1, a2, a3, a4, ....;"
> concurrently with some "select pg_sleep from empty1, empty2, ..."
> queries on the standby.
max_connections on the standby must be the same or higher than on the master.

Standby users are not allowed to request strong locks, so the only strong locks coming through are AccessExclusiveLocks from the master.

max_locks_per_transaction has a minimum of 10, and it would be reasonable to assume it is set the same or higher than on the master as well.

Workloads on the master are also subject to memory errors, so excessive use of locks on the master would hit limits there, and that would naturally prune the workload before it reached the standby.

It's hard to see how any reasonable workload would affect the standby. And if it did, you'd change the parameter and restart, just as you already have to do if someone changes max_connections on the master first.
So I don't see any problem or anything abnormal in what Tom suggests.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,

On 2018-09-11 16:23:44 +0100, Simon Riggs wrote:
> It's hard to see how any reasonable workload would affect the standby. And
> if it did, you'd change the parameter and restart, just like you already
> have to do if someone changes max_connections on master first.

Isn't one of the most common ways to run into "out of shared memory" / "You might need to increase max_locks_per_transaction." to run pg_dump? And that's pretty commonly done against standbys?

Greetings,

Andres Freund
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Sep 11, 2018 at 10:25 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The point of the previous coding here was that perhaps there's some
>> range of number-of-locks-needed where kicking hot-standby queries
>> off of locks would allow recovery to proceed. However, it is (as
>> far as I know) unproven that that actually works, let alone is
>> effective enough to justify maintaining very-hard-to-test code for.

> I think, though, that it is pretty obvious that the intermediate zone
> which you refer to as "perhaps" existing does indeed exist. Queries
> running on the standby consume lock table slots, and killing them
> frees up those slots. Q.E.D.

Sounds like lock whack-a-mole to me. If there are enough standby queries running to create the problem, there's probably a continuous inbound query stream; you aren't going to be able to free up locktable space on net without some way of suppressing new lock acquisitions altogether. That's still more mechanism that'd have to be designed, written, and tested.

And believe you me, I will insist on a test case, now that we know this has been broken for at least two years with nobody noticing.

			regards, tom lane
Andres Freund <andres@anarazel.de> writes:
> On 2018-09-11 16:23:44 +0100, Simon Riggs wrote:
>> It's hard to see how any reasonable workload would affect the standby. And
>> if it did, you'd change the parameter and restart, just like you already
>> have to do if someone changes max_connections on master first.

> Isn't one of the most common ways to run into "out of shared memory"
> "You might need to increase max_locks_per_transaction." to run pg_dump?
> And that's pretty commonly done against standbys?

If the startup process has acquired enough AELs to approach locktable full, any concurrent pg_dump has probably failed already, because it'd be trying to share-lock every table and so would have a huge conflict cross-section; it's hard to believe it wouldn't get cancelled pretty early in that process. (Again, if you think this scenario is probable, you have to explain the lack of field complaints.)

The case where this behavior might really be of some use, IMO, is the lots-of-small-transactions scenario --- but we lack the stop-new-locks mechanism that would be needed to make the behavior actually effective for that case.

			regards, tom lane
Hi,

On 2018-09-11 12:03:44 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
>> Isn't one of the most common ways to run into "out of shared memory"
>> "You might need to increase max_locks_per_transaction." to run pg_dump?
>> And that's pretty commonly done against standbys?
>
> If the startup process has acquired enough AELs to approach locktable
> full, any concurrent pg_dump has probably failed already, because it'd
> be trying to share-lock every table and so would have a huge conflict
> cross-section; it's hard to believe it wouldn't get cancelled pretty
> early in that process. (Again, if you think this scenario is probable,
> you have to explain the lack of field complaints.)

I was thinking of the other way round - there's a running pg_dump and then somebody does a bit of DDL (say a DROP SCHEMA CASCADE in a multi-tenant scenario).

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> On 2018-09-11 12:03:44 -0400, Tom Lane wrote:
>> If the startup process has acquired enough AELs to approach locktable
>> full, any concurrent pg_dump has probably failed already, because it'd
>> be trying to share-lock every table and so would have a huge conflict
>> cross-section; it's hard to believe it wouldn't get cancelled pretty
>> early in that process. (Again, if you think this scenario is probable,
>> you have to explain the lack of field complaints.)

> I was thinking of the other way round - there's a running pg_dump and
> then somebody does a bit of DDL (say a DROP SCHEMA CASCADE in a
> multi-tenant scenario).

Doesn't matter: startup would hit a lock conflict and cancel the pg_dump to get out of it, long before approaching locktable full.

			regards, tom lane
On 2018-09-11 12:18:59 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
>> I was thinking of the other way round - there's a running pg_dump and
>> then somebody does a bit of DDL (say a DROP SCHEMA CASCADE in a
>> multi-tenant scenario).
>
> Doesn't matter: startup would hit a lock conflict and cancel the pg_dump
> to get out of it, long before approaching locktable full.

Only if all that's happening in the same database, which is far from a given.

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> On 2018-09-11 12:18:59 -0400, Tom Lane wrote:
>> Doesn't matter: startup would hit a lock conflict and cancel the pg_dump
>> to get out of it, long before approaching locktable full.

> Only if all that's happening in the same database, which is far from a
> given.

Well, there remains the fact that we've seen no field reports that seem to trace to failure-to-acquire-AEL since 9.6 came out. So arguing that this *could* be a probable scenario fails to comport with the available evidence.

My inclination is to fix it as I've suggested and wait to see if there are field complaints before expending a whole lot of effort to create a better solution. In any case, I am not willing to create that better solution myself, and neither is Robert; are you?

			regards, tom lane
On 2018-09-11 12:26:44 -0400, Tom Lane wrote:
> Well, there remains the fact that we've seen no field reports that seem
> to trace to failure-to-acquire-AEL since 9.6 came out. So arguing that
> this *could* be a probable scenario fails to comport with the available
> evidence.

True. But it seems like it'd be really hard to actually encounter any problems due to the failing lock, especially in a way that is visible enough to be reported. In most cases the missing AEL will let things just continue to operate as normal, and in some cases you'll get errors about not being able to access files. I have *zero* trust that we'd actually see this kind of error reported.

It might even be that we've seen reports of this, but didn't attribute the errors correctly. There were a few reports about standbys having corruptly replayed truncations - and a truncation that happens concurrently to buffer accesses (due to the missing AEL) could e.g. explain that, by later then re-enlarging the relations due to a non-purged dirty block.

> My inclination is to fix it as I've suggested and wait to see if there
> are field complaints before expending a whole lot of effort to create
> a better solution. In any case, I am not willing to create that better
> solution myself, and neither is Robert; are you?

On master I'd be ok with that, but on the backbranches that seems like playing with users' setups.

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> On 2018-09-11 12:26:44 -0400, Tom Lane wrote:
>> Well, there remains the fact that we've seen no field reports that seem
>> to trace to failure-to-acquire-AEL since 9.6 came out. So arguing that
>> this *could* be a probable scenario fails to comport with the available
>> evidence.

> It might even be that we've seen reports of this, but didn't attribute
> the errors correctly.

Yeah, that's certainly possible. One good thing about the change I'm recommending is that the symptom would be very clear (ie, "out of shared memory" from the startup process). If we do start getting reports of it, we'd know where the problem is.

>> My inclination is to fix it as I've suggested and wait to see if there
>> are field complaints before expending a whole lot of effort to create
>> a better solution. In any case, I am not willing to create that better
>> solution myself, and neither is Robert; are you?

> On master I'd be ok with that, but on the backbranches that seems like
> playing with user's setups.

I am not sure which part of "I will not fix this" you didn't understand. *If* we get clear evidence that it happens for a non-negligible number of users, I might be willing to reconsider. Right now my reading of the evidence is that it hasn't, and won't; so I judge that putting in a complex mechanism to try to cope with the situation would be a net loss for reliability. Back-patching such a mechanism seems like it'd be an even worse engineering decision.

			regards, tom lane
On 2018-09-11 12:50:06 -0400, Tom Lane wrote:
> I am not sure which part of "I will not fix this" you didn't understand.

Maybe the "this is an open list, and we can discuss things" bit?