Обсуждение: Pruning never visible changes
A user asked me whether we prune never visible changes, such as BEGIN; INSERT... UPDATE.. (same row) COMMIT; Once committed, the original insert is no longer visible to anyone, so "ought to be able to be pruned", sayeth the user. And they also say that changing the app is much harder, as ever. After some thought, Yes, we can prune, but not in all cases - only if the never visible tuple is at the root end of the update chain. The only question is can that be done cheaply enough to bother with. The answer in one specific case is Yes, in other cases No. This patch adds a new test for this use case, and code to remove the never visible row when the changes are made by the same xid. (I'm pretty sure there used to be a test for this some years back and I'm guessing it was removed because it isn't always possible to remove the tuple, which this new patch honours.) Please let me know what you think. -- Simon Riggs http://www.EnterpriseDB.com/
Вложения
Simon Riggs <simon.riggs@enterprisedb.com> writes: > A user asked me whether we prune never visible changes, such as > BEGIN; > INSERT... > UPDATE.. (same row) > COMMIT; Didn't we just have this discussion in another thread? You cannot do that, at least not without checking that the originating transaction has no snapshots that could see the older row version. I'm not sure whether or not snapmgr.c has enough information to determine that, but in any case this formulation is surely unsafe, because it isn't even checking whether that transaction is our own, let alone asking snapmgr.c. I'm dubious that a safe version would fire often enough to be worth the cycles spent. regards, tom lane
On Fri, 16 Sept 2022 at 15:26, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Simon Riggs <simon.riggs@enterprisedb.com> writes: > > A user asked me whether we prune never visible changes, such as > > BEGIN; > > INSERT... > > UPDATE.. (same row) > > COMMIT; > > Didn't we just have this discussion in another thread? Not that I was aware of, but it sounds like a different case anyway. > You cannot > do that, at least not without checking that the originating > transaction has no snapshots that could see the older row version. I'm saying this is possible only AFTER the end of the originating xact, so there are no issues with additional snapshots. i.e. the never visible row has to be judged RECENTLY_DEAD before we even check. -- Simon Riggs http://www.EnterpriseDB.com/
Simon Riggs <simon.riggs@enterprisedb.com> writes: > On Fri, 16 Sept 2022 at 15:26, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> You cannot >> do that, at least not without checking that the originating >> transaction has no snapshots that could see the older row version. > I'm saying this is possible only AFTER the end of the originating > xact, so there are no issues with additional snapshots. I see, so the point is just that we can prune even if the originating xact hasn't yet passed the global xmin horizon. I agree that's safe, but will it fire often enough to be worth the trouble? Also, why does it need to be restricted to certain shapes of HOT chains --- that is, why can't we do exactly what we'd do if the xact *were* past the xmin horizon? regards, tom lane
On Fri, 2022-09-16 at 10:26 -0400, Tom Lane wrote: > Simon Riggs <simon.riggs@enterprisedb.com> writes: > > A user asked me whether we prune never visible changes, such as > > BEGIN; > > INSERT... > > UPDATE.. (same row) > > COMMIT; > > Didn't we just have this discussion in another thread? For reference: that was https://postgr.es/m/f6a491b32cb44bb5daaafec835364f7149348fa1.camel@cybertec.at Yours, Laurenz Albe
On Fri, 16 Sept 2022 at 21:07, Laurenz Albe <laurenz.albe@cybertec.at> wrote: > > On Fri, 2022-09-16 at 10:26 -0400, Tom Lane wrote: > > Simon Riggs <simon.riggs@enterprisedb.com> writes: > > > A user asked me whether we prune never visible changes, such as > > > BEGIN; > > > INSERT... > > > UPDATE.. (same row) > > > COMMIT; > > > > Didn't we just have this discussion in another thread? > > For reference: that was > https://postgr.es/m/f6a491b32cb44bb5daaafec835364f7149348fa1.camel@cybertec.at Thanks. I confirm I hadn't seen that, and indeed, I wrote the patch on 5 Sept before you posted. -- Simon Riggs http://www.EnterpriseDB.com/
On Fri, 16 Sept 2022 at 18:37, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Simon Riggs <simon.riggs@enterprisedb.com> writes: > > On Fri, 16 Sept 2022 at 15:26, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> You cannot > >> do that, at least not without checking that the originating > >> transaction has no snapshots that could see the older row version. > > > I'm saying this is possible only AFTER the end of the originating > > xact, so there are no issues with additional snapshots. > > I see, so the point is just that we can prune even if the originating > xact hasn't yet passed the global xmin horizon. I agree that's safe, > but will it fire often enough to be worth the trouble? It is an edge case with limited utility, I agree, which is why I looked for a cheap test (xmin == xmax only). This additional test is also valid, but too expensive to apply: TransactionIdGetTopmostTranactionId(xmax) == TransactionIdGetTopmostTranactionId(xmin) > Also, why > does it need to be restricted to certain shapes of HOT chains --- > that is, why can't we do exactly what we'd do if the xact *were* > past the xmin horizon? I thought it important to maintain the integrity of the HOT chain, in that the xmax of one tuple version matches the xmin of the next. So pruning only from the root of the chain allows us to maintain that validity check. I'm on the fence myself, which is why I didn't post it immediately I had written it. If not, it would be useful to add a note in comments to the code to explain why we don't do this. -- Simon Riggs http://www.EnterpriseDB.com/
On Fri, 16 Sept 2022 at 10:27, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Simon Riggs <simon.riggs@enterprisedb.com> writes: > > A user asked me whether we prune never visible changes, such as > > BEGIN; > > INSERT... > > UPDATE.. (same row) > > COMMIT; > > Didn't we just have this discussion in another thread? Well..... not "just" :) commit 44e4bbf75d56e643b6afefd5cdcffccb68cce414 Author: Tom Lane <tgl@sss.pgh.pa.us> Date: Fri Apr 29 16:29:42 2011 -0400 Remove special case for xmin == xmax in HeapTupleSatisfiesVacuum(). VACUUM was willing to remove a committed-dead tuple immediately if it was deleted by the same transaction that inserted it. The idea is that such a tuple could never have been visible to any other transaction, so we don't need to keep it around to satisfy MVCC snapshots. However, there was already an exception for tuples that are part of an update chain, and this exception created a problem: we might remove TOAST tuples (which are never part of an update chain) while their parent tuple stayed around (if it was part of an update chain). This didn't pose a problem for most things, since the parent tuple is indeed dead: no snapshot will ever consider it visible. But MVCC-safe CLUSTER had a problem, since it will try to copy RECENTLY_DEAD tuples to the new table. It then has to copy their TOAST data too, and would fail if VACUUM had already removed the toast tuples. Easiest fix is to get rid of the special case for xmin == xmax. This may delay reclaiming dead space for a little bit in some cases, but it's by far the most reliable way to fix the issue. Per bug #5998 from Mark Reid. Back-patch to 8.3, which is the oldest version with MVCC-safe CLUSTER.
On Mon, 19 Sept 2022 at 01:16, Greg Stark <stark@mit.edu> wrote: > > On Fri, 16 Sept 2022 at 10:27, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > Simon Riggs <simon.riggs@enterprisedb.com> writes: > > > A user asked me whether we prune never visible changes, such as > > > BEGIN; > > > INSERT... > > > UPDATE.. (same row) > > > COMMIT; > > > > Didn't we just have this discussion in another thread? > > Well..... not "just" :) This recent thread [0] mentioned the same, and I mentioned it in [1] too last year. Kind regards, Matthias van de Meent [0] https://www.postgresql.org/message-id/flat/2031521.1663076724%40sss.pgh.pa.us#01542683a64a312e5c21541fecd50e63 Subject: Re: Tuples inserted and deleted by the same transaction Date: 2022-09-13 14:13:44 [1] https://www.postgresql.org/message-id/CAEze2Whjnhg96Wt2-DxtBydhmMDmVm2WfWOX3aGcB2C2Hbry0Q%40mail.gmail.com Subject: Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic Date: 2021-06-14 09:53:47 (in a thread about a PS comment)
On Mon, 19 Sept 2022 at 00:16, Greg Stark <stark@mit.edu> wrote: > > On Fri, 16 Sept 2022 at 10:27, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > Simon Riggs <simon.riggs@enterprisedb.com> writes: > > > A user asked me whether we prune never visible changes, such as > > > BEGIN; > > > INSERT... > > > UPDATE.. (same row) > > > COMMIT; > > > > Didn't we just have this discussion in another thread? > > Well..... not "just" :) > > commit 44e4bbf75d56e643b6afefd5cdcffccb68cce414 > Author: Tom Lane <tgl@sss.pgh.pa.us> > Date: Fri Apr 29 16:29:42 2011 -0400 > > Remove special case for xmin == xmax in HeapTupleSatisfiesVacuum(). > > VACUUM was willing to remove a committed-dead tuple immediately if it was > deleted by the same transaction that inserted it. The idea is that such a > tuple could never have been visible to any other transaction, so we don't > need to keep it around to satisfy MVCC snapshots. However, there was > already an exception for tuples that are part of an update chain, and this > exception created a problem: we might remove TOAST tuples (which are never > part of an update chain) while their parent tuple stayed around (if it was > part of an update chain). This didn't pose a problem for most things, > since the parent tuple is indeed dead: no snapshot will ever consider it > visible. But MVCC-safe CLUSTER had a problem, since it will try to copy > RECENTLY_DEAD tuples to the new table. It then has to copy their TOAST > data too, and would fail if VACUUM had already removed the toast tuples. > > Easiest fix is to get rid of the special case for xmin == xmax. This may > delay reclaiming dead space for a little bit in some cases, but it's by far > the most reliable way to fix the issue. > > Per bug #5998 from Mark Reid. Back-patch to 8.3, which is the oldest > version with MVCC-safe CLUSTER. Good research Greg, thank you. Only took 10 years for me to notice it was gone ;-) -- Simon Riggs http://www.EnterpriseDB.com/
On 2022-Sep-22, Simon Riggs wrote: > On Mon, 19 Sept 2022 at 00:16, Greg Stark <stark@mit.edu> wrote: > > VACUUM was willing to remove a committed-dead tuple immediately if it was > > deleted by the same transaction that inserted it. The idea is that such a > > tuple could never have been visible to any other transaction, so we don't > > need to keep it around to satisfy MVCC snapshots. However, there was > > already an exception for tuples that are part of an update chain, and this > > exception created a problem: we might remove TOAST tuples (which are never > > part of an update chain) while their parent tuple stayed around (if it was > > part of an update chain). This didn't pose a problem for most things, > > since the parent tuple is indeed dead: no snapshot will ever consider it > > visible. But MVCC-safe CLUSTER had a problem, since it will try to copy > > RECENTLY_DEAD tuples to the new table. It then has to copy their TOAST > > data too, and would fail if VACUUM had already removed the toast tuples. > Good research Greg, thank you. Only took 10 years for me to notice it > was gone ;-) But this begs the question: is the proposed change safe, given that ancient consideration? I don't think TOAST issues have been mentioned in this thread so far, so I wonder if there is a test case that verifies that this problem doesn't occur for some reason. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
On Thu, 22 Sept 2022 at 15:16, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > On 2022-Sep-22, Simon Riggs wrote: > > > On Mon, 19 Sept 2022 at 00:16, Greg Stark <stark@mit.edu> wrote: > > > > VACUUM was willing to remove a committed-dead tuple immediately if it was > > > deleted by the same transaction that inserted it. The idea is that such a > > > tuple could never have been visible to any other transaction, so we don't > > > need to keep it around to satisfy MVCC snapshots. However, there was > > > already an exception for tuples that are part of an update chain, and this > > > exception created a problem: we might remove TOAST tuples (which are never > > > part of an update chain) while their parent tuple stayed around (if it was > > > part of an update chain). This didn't pose a problem for most things, > > > since the parent tuple is indeed dead: no snapshot will ever consider it > > > visible. But MVCC-safe CLUSTER had a problem, since it will try to copy > > > RECENTLY_DEAD tuples to the new table. It then has to copy their TOAST > > > data too, and would fail if VACUUM had already removed the toast tuples. > > > Good research Greg, thank you. Only took 10 years for me to notice it > > was gone ;-) > > But this begs the question: is the proposed change safe, given that > ancient consideration? I don't think TOAST issues have been mentioned > in this thread so far, so I wonder if there is a test case that verifies > that this problem doesn't occur for some reason. Oh, completely agreed. I will submit a modified patch that adds a test case and just a comment to explain why we can't remove such rows. -- Simon Riggs http://www.EnterpriseDB.com/