Discussion: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin
Hi,

If vacuum fails to remove a tuple with xmax older than VacuumCutoffs->OldestXmin and younger than GlobalVisState->maybe_needed, it will ERROR out when determining whether or not to freeze the tuple with "cannot freeze committed xmax".

In back branches starting with 14, failing to remove tuples older than OldestXmin during pruning caused vacuum to infinitely loop in lazy_scan_prune(), as investigated on this [1] thread. On master, after 1ccc1e05ae removed the retry loop in lazy_scan_prune() and stopped comparing tuples to OldestXmin, the hang could no longer happen, but we can still attempt to freeze dead tuples visibly killed before OldestXmin -- resulting in an ERROR.

Pruning may fail to remove dead tuples with xmax before OldestXmin if the tuple is not considered removable by GlobalVisState. For vacuum, the GlobalVisState is initially calculated at the beginning of vacuuming the relation -- at the same time and with the same value as VacuumCutoffs->OldestXmin. A backend's GlobalVisState may be updated again when it is accessed if a new snapshot is taken or if something caused ComputeXidHorizons() to be called. This can happen, for example, at the end of a round of index vacuuming when GetOldestNonRemovableTransactionId() is called.

Normally this may result in GlobalVisState's horizon moving forward -- potentially allowing more dead tuples to be removed. However, if a disconnected standby with a running transaction older than VacuumCutoffs->OldestXmin reconnects to the primary after vacuum initially calculates GlobalVisState and OldestXmin but before GlobalVisState is updated, the value of GlobalVisState->maybe_needed could go backwards. If this happens in the middle of vacuum's first pruning and freezing pass, it is possible that pruning/freezing could then encounter a tuple whose xmax is younger than GlobalVisState->maybe_needed and older than VacuumCutoffs->OldestXmin.
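To make the window concrete, here is a tiny Python sketch (a hypothetical simplification, not the actual C code: XIDs are plain integers here, while the real code uses 32-bit wraparound-aware comparisons; satisfies_vacuum and pre_freeze_check stand in for heap_prune_satisfies_vacuum() and heap_pre_freeze_checks()):

```python
# Toy model of the two horizons described above (hypothetical, not the
# actual PostgreSQL C code).

def satisfies_vacuum(xmax, maybe_needed):
    """Pruning decision based only on GlobalVisState, as on master."""
    return "DEAD" if xmax < maybe_needed else "RECENTLY_DEAD"

def pre_freeze_check(xmax, oldest_xmin):
    """Sanity check in the style of heap_pre_freeze_checks(): a committed
    xmax older than OldestXmin should already have been pruned away."""
    if xmax < oldest_xmin:
        raise RuntimeError("cannot freeze committed xmax")

# Vacuum starts: both horizons are computed together and agree.
oldest_xmin = 1000   # VacuumCutoffs->OldestXmin, fixed for the whole vacuum
maybe_needed = 1000  # GlobalVisState->maybe_needed, may be recomputed later

# A standby with an old running transaction reconnects: maybe_needed regresses.
maybe_needed = 900

# A tuple deleted before OldestXmin but after the regressed horizon:
xmax = 950
assert satisfies_vacuum(xmax, maybe_needed) == "RECENTLY_DEAD"  # kept by pruning

# ... so the freeze path later sees the tuple and errors out:
try:
    pre_freeze_check(xmax, oldest_xmin)
except RuntimeError as e:
    print(e)  # cannot freeze committed xmax
```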
heap_prune_satisfies_vacuum() would deem the tuple HEAPTUPLE_RECENTLY_DEAD and would not remove it. But the heap_pre_freeze_checks() would ERROR out with "cannot freeze committed xmax". This check is to avoid freezing dead tuples.

We can fix this by always removing tuples considered dead before VacuumCutoffs->OldestXmin. This is okay even if a reconnected standby has a transaction that sees that tuple as alive, because it will simply wait to replay the removal until it would be correct to do so or recovery conflict handling will cancel the transaction that sees the tuple as alive and allow replay to continue.

Attached is the suggested fix for master plus a repro. I wrote it as a recovery suite TAP test, but I am _not_ proposing we add it to the ongoing test suite. It is, amongst other things, definitely prone to flaking. I also had to use loads of data to force two index vacuuming passes now that we have TIDStore, so it is a slow test.

If you want to run the repro with meson, you'll have to add 't/099_vacuum_hang.pl' to src/test/recovery/meson.build and then run it with:

    meson test postgresql:recovery / recovery/099_vacuum_hang

If you use autotools, you can run it with:

    make check PROVE_TESTS="t/099_vacuum_hang.pl"

The repro forces a round of index vacuuming after the standby reconnects and before pruning a dead tuple whose xmax is older than OldestXmin. At the end of the round of index vacuuming, _bt_pendingfsm_finalize() calls GetOldestNonRemovableTransactionId(), thereby updating the backend's GlobalVisState and moving maybe_needed backwards. Then vacuum's first pass will continue with pruning and find our later inserted and updated tuple HEAPTUPLE_RECENTLY_DEAD when compared to maybe_needed but HEAPTUPLE_DEAD when compared to OldestXmin. I make sure that the standby reconnects between vacuum_get_cutoffs() and pruning because I have a cursor on the page keeping VACUUM FREEZE from getting a cleanup lock.
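Under the same kind of toy model (again a hypothetical Python sketch, not the patch itself), the fix amounts to treating VacuumCutoffs->OldestXmin as a hard floor for pruning:

```python
# Hypothetical sketch of the proposed pruning rule: a tuple whose xmax
# precedes VacuumCutoffs->OldestXmin is removed even when the (possibly
# regressed) GlobalVisState horizon alone would only call it recently dead.

def satisfies_vacuum_patched(xmax, maybe_needed, oldest_xmin):
    if xmax < maybe_needed:
        return "DEAD"          # removable per GlobalVisState
    if oldest_xmin is not None and xmax < oldest_xmin:
        return "DEAD"          # new: removable per OldestXmin as well
    return "RECENTLY_DEAD"

# The problematic tuple from the scenario above is now pruned instead of
# reaching the freeze code and triggering "cannot freeze committed xmax".
assert satisfies_vacuum_patched(950, maybe_needed=900, oldest_xmin=1000) == "DEAD"
assert satisfies_vacuum_patched(1050, maybe_needed=900, oldest_xmin=1000) == "RECENTLY_DEAD"
```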
See the repro for step-by-step explanations of how it works. I have a modified version of this that repros the infinite loop on 14-16 with substantially less data. See it here [2]. Also, the repro attached to this mail won't work on 14 and 15 because of changes to background_psql. - Melanie [1] https://postgr.es/m/20240415173913.4zyyrwaftujxthf2%40awork3.anarazel.de#1b216b7768b5bd577a3d3d51bd5aadee [2] https://www.postgresql.org/message-id/CAAKRu_Y_NJzF4-8gzTTeaOuUL3CcGoXPjXcAHbTTygT8AyVqag%40mail.gmail.com
Attachments
Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin
On Thu, Jun 20, 2024 at 7:42 PM Melanie Plageman <melanieplageman@gmail.com> wrote: > If vacuum fails to remove a tuple with xmax older than > VacuumCutoffs->OldestXmin and younger than > GlobalVisState->maybe_needed, it will ERROR out when determining > whether or not to freeze the tuple with "cannot freeze committed > xmax". > > In back branches starting with 14, failing to remove tuples older than > OldestXmin during pruning caused vacuum to infinitely loop in > lazy_scan_prune(), as investigated on this [1] thread. This is a great summary. > We can fix this by always removing tuples considered dead before > VacuumCutoffs->OldestXmin. This is okay even if a reconnected standby > has a transaction that sees that tuple as alive, because it will > simply wait to replay the removal until it would be correct to do so > or recovery conflict handling will cancel the transaction that sees > the tuple as alive and allow replay to continue. I think that this is the right general approach. > The repro forces a round of index vacuuming after the standby > reconnects and before pruning a dead tuple whose xmax is older than > OldestXmin. > > At the end of the round of index vacuuming, _bt_pendingfsm_finalize() > calls GetOldestNonRemovableTransactionId(), thereby updating the > backend's GlobalVisState and moving maybe_needed backwards. Right. I saw details exactly consistent with this when I used GDB against a production instance. I'm glad that you were able to come up with a repro that involves exactly the same basic elements, including index page deletion. -- Peter Geoghegan
Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin
Hi, Melanie! I'm glad to hear that you have found the root cause of the problem. Thank you for that!
> heap_prune_satisfies_vacuum() would deem the tuple HEAPTUPLE_RECENTLY_DEAD
> and would not remove it. But the heap_pre_freeze_checks() would ERROR out
> with "cannot freeze committed xmax". This check is to avoid freezing dead
> tuples. We can fix this by always removing tuples considered dead before
> VacuumCutoffs->OldestXmin. This is okay even if a reconnected standby has
> a transaction that sees that tuple as alive, because it will simply wait
> to replay the removal until it would be correct to do so or recovery
> conflict handling will cancel the transaction that sees the tuple as
> alive and allow replay to continue.
This is an interesting and difficult case! I noticed that when initializing the cluster, in my opinion, we do excessive freezing. Initialization takes a long time, which can lead, for example, to longer test execution. I got rid of this by adding a check that OldestMxact is not FirstMultiXactId, and it works fine.
if (prstate->cutoffs &&
TransactionIdIsValid(prstate->cutoffs->OldestXmin) &&
prstate->cutoffs->OldestMxact != FirstMultiXactId &&
NormalTransactionIdPrecedes(dead_after, prstate->cutoffs->OldestXmin))
return HEAPTUPLE_DEAD;
Can I keep it?
To be honest, the meson test is new for me, but I see its useful features. I think I will use it for checking my features.
I couldn't understand why the replica is necessary here. Now I am digging into why I got similar behavior without a replica, when I have only one instance. I'm still checking this in my test, but I believe this patch fixes the original problem because the symptoms were the same.
-- Regards, Alena Rybakina Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin
On 21/06/2024 03:02, Peter Geoghegan wrote: > On Thu, Jun 20, 2024 at 7:42 PM Melanie Plageman > <melanieplageman@gmail.com> wrote: >> If vacuum fails to remove a tuple with xmax older than >> VacuumCutoffs->OldestXmin and younger than >> GlobalVisState->maybe_needed, it will ERROR out when determining >> whether or not to freeze the tuple with "cannot freeze committed >> xmax". >> >> In back branches starting with 14, failing to remove tuples older than >> OldestXmin during pruning caused vacuum to infinitely loop in >> lazy_scan_prune(), as investigated on this [1] thread. > > This is a great summary. +1 >> We can fix this by always removing tuples considered dead before >> VacuumCutoffs->OldestXmin. This is okay even if a reconnected standby >> has a transaction that sees that tuple as alive, because it will >> simply wait to replay the removal until it would be correct to do so >> or recovery conflict handling will cancel the transaction that sees >> the tuple as alive and allow replay to continue. > > I think that this is the right general approach. +1 >> The repro forces a round of index vacuuming after the standby >> reconnects and before pruning a dead tuple whose xmax is older than >> OldestXmin. >> >> At the end of the round of index vacuuming, _bt_pendingfsm_finalize() >> calls GetOldestNonRemovableTransactionId(), thereby updating the >> backend's GlobalVisState and moving maybe_needed backwards. > > Right. I saw details exactly consistent with this when I used GDB > against a production instance. > > I'm glad that you were able to come up with a repro that involves > exactly the same basic elements, including index page deletion. Would it be possible to make it robust so that we could always run it with "make check"? This seems like an important corner case to regression test. -- Heikki Linnakangas Neon (https://neon.tech)
Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin
On Mon, Jun 24, 2024 at 4:10 AM Alena Rybakina <lena.ribackina@yandex.ru> wrote: > > We can fix this by always removing tuples considered dead before > VacuumCutoffs->OldestXmin. This is okay even if a reconnected standby > has a transaction that sees that tuple as alive, because it will > simply wait to replay the removal until it would be correct to do so > or recovery conflict handling will cancel the transaction that sees > the tuple as alive and allow replay to continue. > > This is an interesting and difficult case) I noticed that when initializing the cluster, in my opinion, we provide excessive freezing. Initialization takes a long time, which can lead, for example, to longer test execution. I got rid of this by adding the OldestMxact checkbox is not FirstMultiXactId, and it works fine. > > if (prstate->cutoffs && > TransactionIdIsValid(prstate->cutoffs->OldestXmin) && > prstate->cutoffs->OldestMxact != FirstMultiXactId && > NormalTransactionIdPrecedes(dead_after, prstate->cutoffs->OldestXmin)) > return HEAPTUPLE_DEAD; > > Can I keep it? This looks like an addition to the new criteria I added to heap_prune_satisfies_vacuum(). Is that what you are suggesting? If so, it looks like it would only return HEAPTUPLE_DEAD (and thus only remove) a subset of the tuples my original criteria would remove. When vacuum calculates OldestMxact as FirstMultiXactId, it would not remove those tuples deleted before OldestXmin. It seems like OldestMxact will equal FirstMultiXactID sometimes right after initdb and after transaction ID wraparound. I'm not sure I totally understand the criteria. One thing I find confusing about this is that this would actually remove fewer tuples than with my criteria -- which could lead to more freezing. When vacuum calculates OldestMxact == FirstMultiXactID, we would not remove tuples deleted before OldestXmin and thus return HEAPTUPLE_RECENTLY_DEAD for those tuples. Then we would consider freezing them.
So, it seems like we would do more freezing by adding this criterion. Could you explain more about how the criterion you are suggesting works? Are you saying it does less freezing than master or less freezing than with my patch? > Attached is the suggested fix for master plus a repro. I wrote it as a > recovery suite TAP test, but I am _not_ proposing we add it to the > ongoing test suite. It is, amongst other things, definitely prone to > flaking. I also had to use loads of data to force two index vacuuming > passes now that we have TIDStore, so it is a slow test. -- snip -- > I have a modified version of this that repros the infinite loop on > 14-16 with substantially less data. See it here [2]. Also, the repro > attached to this mail won't work on 14 and 15 because of changes to > background_psql. > > I couldn't understand why the replica is necessary here. Now I am digging why I got the similar behavior without a replica when I have only one instance. I'm still checking this in my test, but I believe this patch fixes the original problem because the symptoms were the same. Did you get similar behavior on master or on back branches? Was the behavior you observed the infinite loop or the error during heap_prepare_freeze_tuple()? In my examples, the replica is needed because something has to move the horizon on the primary backwards. When a standby reconnects with an older oldest running transaction ID than any of the running transactions on the primary and the vacuuming backend recomputes its RecentXmin, the horizon may move backwards when compared to the horizon calculated at the beginning of the vacuum. Vacuum does not recompute cutoffs->OldestXmin during vacuuming a relation but it may recompute the values in the GlobalVisState it uses for pruning. We knew of only one other way that the horizon could move backwards which Matthias describes here [1]. 
However, this is thought to be its own concurrency-related bug in the commit-abort path that should be fixed -- as opposed to the standby reconnecting with an older oldest running transaction ID which can be expected. Do you know if you were seeing the effects of the scenario Matthias describes? - Melanie [1] https://www.postgresql.org/message-id/CAEze2WjMTh4KS0%3DQEQB-Jq%2BtDLPR%2B0%2BzVBMfVwSPK5A%3DWZa95Q%40mail.gmail.com
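The subset relationship discussed above can be made concrete with a small sketch (a hypothetical Python model, not the real C code; FirstMultiXactId is 1 in PostgreSQL):

```python
FIRST_MULTI_XACT_ID = 1  # value of FirstMultiXactId in the real code

def removed_by_patch(xmax, oldest_xmin):
    # Proposed criterion: tuple deleted before OldestXmin => remove.
    return xmax < oldest_xmin

def removed_with_extra_condition(xmax, oldest_xmin, oldest_mxact):
    # Alena's additional condition: skip removal when OldestMxact is
    # still FirstMultiXactId (e.g. right after initdb or wraparound).
    return oldest_mxact != FIRST_MULTI_XACT_ID and xmax < oldest_xmin

# Same tuple, same horizons: the extra condition removes a strict subset,
# leaving such tuples RECENTLY_DEAD and therefore candidates for freezing.
assert removed_by_patch(950, 1000) is True
assert removed_with_extra_condition(950, 1000, oldest_mxact=FIRST_MULTI_XACT_ID) is False
assert removed_with_extra_condition(950, 1000, oldest_mxact=5) is True
```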
Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin
On Mon, Jun 24, 2024 at 4:27 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > > On 21/06/2024 03:02, Peter Geoghegan wrote: > > On Thu, Jun 20, 2024 at 7:42 PM Melanie Plageman > > <melanieplageman@gmail.com> wrote: > > > >> The repro forces a round of index vacuuming after the standby > >> reconnects and before pruning a dead tuple whose xmax is older than > >> OldestXmin. > >> > >> At the end of the round of index vacuuming, _bt_pendingfsm_finalize() > >> calls GetOldestNonRemovableTransactionId(), thereby updating the > >> backend's GlobalVisState and moving maybe_needed backwards. > > > > Right. I saw details exactly consistent with this when I used GDB > > against a production instance. > > > > I'm glad that you were able to come up with a repro that involves > > exactly the same basic elements, including index page deletion. > > Would it be possible to make it robust so that we could always run it > with "make check"? This seems like an important corner case to > regression test. I'd have to look into how to ensure I can stabilize some of the parts that seem prone to flaking. I can probably stabilize the vacuum bit with a query of pg_stat_activity making sure it is waiting to acquire the cleanup lock. I don't, however, see a good way around the large amount of data required to trigger more than one round of index vacuuming. I could generate the data more efficiently than I am doing here (generate_series() in the from clause). Perhaps with a copy? I know it is too slow now to go in an ongoing test, but I don't have an intuition around how fast it would have to be to be acceptable. Is there a set of additional tests that are slower that we don't always run? I didn't follow how the wraparound test ended up, but that seems like one that would have been slow. - Melanie
On 6/24/24 16:53, Melanie Plageman wrote: > On Mon, Jun 24, 2024 at 4:27 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: >> >> On 21/06/2024 03:02, Peter Geoghegan wrote: >>> On Thu, Jun 20, 2024 at 7:42 PM Melanie Plageman >>> <melanieplageman@gmail.com> wrote: >>> >>>> The repro forces a round of index vacuuming after the standby >>>> reconnects and before pruning a dead tuple whose xmax is older than >>>> OldestXmin. >>>> >>>> At the end of the round of index vacuuming, _bt_pendingfsm_finalize() >>>> calls GetOldestNonRemovableTransactionId(), thereby updating the >>>> backend's GlobalVisState and moving maybe_needed backwards. >>> >>> Right. I saw details exactly consistent with this when I used GDB >>> against a production instance. >>> >>> I'm glad that you were able to come up with a repro that involves >>> exactly the same basic elements, including index page deletion. >> >> Would it be possible to make it robust so that we could always run it >> with "make check"? This seems like an important corner case to >> regression test. > > I'd have to look into how to ensure I can stabilize some of the parts > that seem prone to flaking. I can probably stabilize the vacuum bit > with a query of pg_stat_activity making sure it is waiting to acquire > the cleanup lock. > > I don't, however, see a good way around the large amount of data > required to trigger more than one round of index vacuuming. I could > generate the data more efficiently than I am doing here > (generate_series() in the from clause). Perhaps with a copy? I know it > is too slow now to go in an ongoing test, but I don't have an > intuition around how fast it would have to be to be acceptable. Is > there a set of additional tests that are slower that we don't always > run? I didn't follow how the wraparound test ended up, but that seems > like one that would have been slow. > I think it depends on what is the impact on the 'make check' duration. 
If it could be added to one of the existing test groups, then it depends on how long the slowest test in that group is. If the new test needs to be in a separate group, it probably needs to be very fast. But I was wondering how much time we are talking about, so I tried:

creating a table, filling it with 300k rows => 250ms
creating an index => 180ms
delete 90% of the data => 200ms
vacuum t => 130ms

which with m_w_m=1MB does two rounds of index cleanup. That's ~760ms. I'm not sure how much more stuff the test needs to do, but this would be pretty reasonable, if we could add it to an existing group. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Jun 20, 2024 at 7:42 PM Melanie Plageman <melanieplageman@gmail.com> wrote: > We can fix this by always removing tuples considered dead before > VacuumCutoffs->OldestXmin. I don't have a great feeling about this fix. It's not that I think it's wrong. It's just that the underlying problem here is that we have heap_page_prune_and_freeze() getting both GlobalVisState *vistest and struct VacuumCutoffs *cutoffs, and the vistest wants to be in charge of deciding what gets pruned, but that doesn't actually work, because as I pointed out in http://postgr.es/m/CA+Tgmob1BtWcP6R5-toVHB5wqHasPTSR2TJkcDCutMzaUYBaHQ@mail.gmail.com it's not properly synchronized with vacrel->cutoffs.OldestXmin. Your fix is to consider both variables, which again may be totally correct, but wouldn't it be a lot better if we didn't have two variables fighting for control of the same behavior? (I'm not trying to be a nuisance here -- I think it's great that you've done the work to pin this down and perhaps there is no better fix than what you've proposed.) -- Robert Haas EDB: http://www.enterprisedb.com
Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin
On Mon, Jun 24, 2024 at 11:44 AM Robert Haas <robertmhaas@gmail.com> wrote: > I don't have a great feeling about this fix. It's not that I think > it's wrong. It's just that the underlying problem here is that we have > heap_page_prune_and_freeze() getting both GlobalVisState *vistest and > struct VacuumCutoffs *cutoffs, and the vistest wants to be in charge > of deciding what gets pruned, but that doesn't actually work, because > as I pointed out in > http://postgr.es/m/CA+Tgmob1BtWcP6R5-toVHB5wqHasPTSR2TJkcDCutMzaUYBaHQ@mail.gmail.com > it's not properly synchronized with vacrel->cutoffs.OldestXmin. Your > fix is to consider both variables, which again may be totally correct, > but wouldn't it be a lot better if we didn't have two variables > fighting for control of the same behavior? Why would it be better? It's to our advantage to have vistest prune away extra tuples when possible. Andres placed a lot of emphasis on that during the snapshot scalability work -- vistest can be updated on the fly. The problem here is that OldestXmin is supposed to be more conservative than vistest, which it almost always is, except in this one edge case. I don't think that plugging that hole changes the basic fact that there is one source of truth about what *needs* to be pruned. There is such a source of truth: OldestXmin. -- Peter Geoghegan
On Mon, Jun 24, 2024 at 12:43 PM Peter Geoghegan <pg@bowt.ie> wrote: > The problem here is that OldestXmin is supposed to be more > conservative than vistest, which it almost always is, except in this > one edge case. I don't think that plugging that hole changes the basic > fact that there is one source of truth about what *needs* to be > pruned. There is such a source of truth: OldestXmin. Well, another approach could be to make it so that OldestXmin actually is always more conservative than vistest rather than almost always. I agree with you that letting the pruning horizon move forward during vacuum is desirable. I'm just wondering if having the vacuum code need to know a second horizon is really the best way to address that. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin
On Mon, Jun 24, 2024 at 1:05 PM Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Jun 24, 2024 at 12:43 PM Peter Geoghegan <pg@bowt.ie> wrote: > > The problem here is that OldestXmin is supposed to be more > > conservative than vistest, which it almost always is, except in this > > one edge case. I don't think that plugging that hole changes the basic > > fact that there is one source of truth about what *needs* to be > > pruned. There is such a source of truth: OldestXmin. > > Well, another approach could be to make it so that OldestXmin actually > is always more conservative than vistest rather than almost always. If we did things like that then it would still be necessary to write a patch like the one Melanie came up with, on the grounds that we'd really need to be paranoid about having missed some subtlety. We might as well just rely on the mechanism directly. I just don't think that it makes much difference. -- Peter Geoghegan
Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin
On Mon, Jun 24, 2024 at 1:05 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Mon, Jun 24, 2024 at 12:43 PM Peter Geoghegan <pg@bowt.ie> wrote: > > The problem here is that OldestXmin is supposed to be more > > conservative than vistest, which it almost always is, except in this > > one edge case. I don't think that plugging that hole changes the basic > > fact that there is one source of truth about what *needs* to be > > pruned. There is such a source of truth: OldestXmin. > > Well, another approach could be to make it so that OldestXmin actually > is always more conservative than vistest rather than almost always. For the purposes of pruning, we are effectively always using the more conservative of the two with this patch. Are you more concerned about having a single horizon for pruning or about having a horizon that does not move backwards after being established at the beginning of vacuuming the relation? Right now, in master, we do use a single horizon when determining what is pruned -- that from GlobalVisState. OldestXmin is only used for freezing and full page visibility determinations. Using a different horizon for pruning by vacuum than freezing is what is causing the error on master. > I agree with you that letting the pruning horizon move forward during > vacuum is desirable. I'm just wondering if having the vacuum code need > to know a second horizon is really the best way to address that. I was thinking about this some more and I realized I don't really get why we think using GlobalVisState for pruning will let us remove more tuples in the common case. I had always thought it was because the vacuuming backend's GlobalVisState will get updated periodically throughout vacuum and so, assuming the oldest running transaction changes, our horizon for vacuum would change. But, in writing this repro, it is actually quite hard to get GlobalVisState to update. Our backend's RecentXmin needs to have changed. 
And there aren't very many places where we take a new snapshot after starting to vacuum a relation. One of those is at the end of index vacuuming, but that can only affect the pruning horizon if we have to do multiple rounds of index vacuuming. Is that really the case we are thinking of when we say we want the pruning horizon to move forward during vacuum? - Melanie
Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin
On Mon, Jun 24, 2024 at 3:23 PM Melanie Plageman <melanieplageman@gmail.com> wrote: > I had always thought it was because the vacuuming backend's > GlobalVisState will get updated periodically throughout vacuum and so, > assuming the oldest running transaction changes, our horizon for > vacuum would change. I believe that it's more of an aspirational thing at this point. That it is currently aspirational (it happens to some extent but isn't ever particularly useful) shouldn't change the analysis about how to fix this bug. > One of those is at the > end of index vacuuming, but that can only affect the pruning horizon > if we have to do multiple rounds of index vacuuming. Is that really > the case we are thinking of when we say we want the pruning horizon to > move forward during vacuum? No, that's definitely not what we were thinking of. It's just an accident that it's almost the only thing that'll do that. -- Peter Geoghegan
Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin
On Thu, Jun 20, 2024 at 7:42 PM Melanie Plageman <melanieplageman@gmail.com> wrote: > > If vacuum fails to remove a tuple with xmax older than > VacuumCutoffs->OldestXmin and younger than > GlobalVisState->maybe_needed, it will ERROR out when determining > whether or not to freeze the tuple with "cannot freeze committed > xmax". One thing I don't understand is why it is okay to freeze the xmax of a dead tuple just because it is from an aborted update. heap_prepare_freeze_tuple() is called on HEAPTUPLE_RECENTLY_DEAD tuples with normal xmaxes (non-multis) so that it can freeze tuples from aborted updates. The only case in which we freeze dead tuples with a non-multi xmax is if the xmax is from before OldestXmin and is also not committed (so from an aborted update). Freezing dead tuples replaces their xmax with InvalidTransactionId -- which would make them look alive. So, it makes sense we don't do this for dead tuples in the common case. But why is it 1) okay and 2) desirable to freeze xmaxes of tuples from aborted updates? Won't it make them look alive again? - Melanie
On Mon, Jun 24, 2024 at 3:23 PM Melanie Plageman <melanieplageman@gmail.com> wrote: > Are you more concerned about having a single horizon for pruning or > about having a horizon that does not move backwards after being > established at the beginning of vacuuming the relation? I'm not sure I understand. The most important thing here is fixing the bug. But if we have a choice of how to fix the bug, I'd prefer to do it by having the pruning code test one horizon that is always correct, rather than (as I think the patch does) having it test against two horizons as a way of covering possible discrepancies between those values. > Right now, in master, we do use a single horizon when determining what > is pruned -- that from GlobalVisState. OldestXmin is only used for > freezing and full page visibility determinations. Using a different > horizon for pruning by vacuum than freezing is what is causing the > error on master. Agreed. > I had always thought it was because the vacuuming backend's > GlobalVisState will get updated periodically throughout vacuum and so, > assuming the oldest running transaction changes, our horizon for > vacuum would change. But, in writing this repro, it is actually quite > hard to get GlobalVisState to update. Our backend's RecentXmin needs > to have changed. And there aren't very many places where we take a new > snapshot after starting to vacuum a relation. One of those is at the > end of index vacuuming, but that can only affect the pruning horizon > if we have to do multiple rounds of index vacuuming. Is that really > the case we are thinking of when we say we want the pruning horizon to > move forward during vacuum? I thought the idea was that the GlobalVisTest stuff would force a recalculation now and then, but maybe it doesn't actually do that? Suppose process A begins a transaction, acquires an XID, and then goes idle. Process B now begins a giant vacuum. At some point in the middle of the vacuum, A ends the transaction. 
Are you saying that B's GlobalVisTest never really notices that this has happened? -- Robert Haas EDB: http://www.enterprisedb.com
Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin
On Mon, Jun 24, 2024 at 3:36 PM Melanie Plageman <melanieplageman@gmail.com> wrote: > One thing I don't understand is why it is okay to freeze the xmax of a > dead tuple just because it is from an aborted update. We don't do that with XID-based xmaxes. Though perhaps we should, since we'll already prune away the successor tuple, and so might as well go one tiny step further and clear the xmax for the original tuple via freezing/setting it to InvalidTransactionId. Instead we just leave the original tuple largely undisturbed, with its original xmax. We do something like that with Multi-based xmax fields, though not with the specific goal of cleaning up after aborts in mind (we can also remove lockers that are no longer running, regardless of where they are relative to OldestXmin, stuff like that). The actual goal with that is to enforce MultiXactCutoff, independently of whether or not their member XIDs are < FreezeLimit yet. > The only case in which we freeze dead tuples > with a non-multi xmax is if the xmax is from before OldestXmin and is > also not committed (so from an aborted update). Perhaps I misunderstand, but: we simply don't freeze DEAD (not RECENTLY_DEAD) tuples in the first place, because we don't have to (pruning removes them instead). It doesn't matter if they're DEAD due to being from aborted transactions or DEAD due to being deleted/updated by a transaction that committed (committed and < OldestXmin). The freezing-related code paths in heapam.c don't particularly care whether a tuple xmax is RECENTLY_DEAD or LIVE to HTSV + OldestXmin. Just as long as it's not fully DEAD (then it should have been pruned). -- Peter Geoghegan
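To make the DEAD/RECENTLY_DEAD distinction above concrete, here is a toy, self-contained model. It is not the real heapam/procarray code: the function name, the enum, and the flat `unsigned int` XID comparison (which ignores wraparound) are all invented simplifications. A committed deleter older than the horizon makes the tuple DEAD and prunable; a newer one leaves it RECENTLY_DEAD, which is what the freeze code may later encounter.

```c
#include <assert.h>

typedef unsigned int TransactionId;

typedef enum
{
    TUPLE_LIVE,
    TUPLE_RECENTLY_DEAD,   /* deleter committed, but not before the horizon */
    TUPLE_DEAD             /* deleter committed before the horizon: prunable */
} TupleStatus;

/*
 * Toy classification of a tuple by its committed deleter's XID (0 means
 * "not deleted"), relative to a removal horizon.  Plain < comparison:
 * XID wraparound is ignored for simplicity.
 */
TupleStatus
model_classify(TransactionId xmax, TransactionId horizon)
{
    if (xmax == 0)
        return TUPLE_LIVE;
    if (xmax < horizon)
        return TUPLE_DEAD;           /* pruning removes it; freezing never sees it */
    return TUPLE_RECENTLY_DEAD;      /* kept on the page for now */
}
```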
On Mon, Jun 24, 2024 at 4:36 PM Robert Haas <robertmhaas@gmail.com> wrote: > I'm not sure I understand. The most important thing here is fixing the > bug. But if we have a choice of how to fix the bug, I'd prefer to do > it by having the pruning code test one horizon that is always correct, > rather than (as I think the patch does) having it test against two > horizons as a way of covering possible discrepancies between > those values. Your characterization of OldestXmin + vistest as two horizons seems pretty arbitrary to me. I know what you mean, of course, but it seems like a distinction without a difference. > I thought the idea was that the GlobalVisTest stuff would force a > recalculation now and then, but maybe it doesn't actually do that? It definitely can do that. Just not in a way that meaningfully increases the number of heap tuples that we can recognize as DEAD and remove. At least not currently. > Suppose process A begins a transaction, acquires an XID, and then goes > idle. Process B now begins a giant vacuum. At some point in the middle > of the vacuum, A ends the transaction. Are you saying that B's > GlobalVisTest never really notices that this has happened? That's my understanding, yes. That is, vistest is approximately the same thing as OldestXmin anyway. At least for now. -- Peter Geoghegan
On Mon, Jun 24, 2024 at 4:42 PM Peter Geoghegan <pg@bowt.ie> wrote:
> On Mon, Jun 24, 2024 at 3:36 PM Melanie Plageman
> <melanieplageman@gmail.com> wrote:
> > One thing I don't understand is why it is okay to freeze the xmax of a
> > dead tuple just because it is from an aborted update.
>
> We don't do that with XID-based xmaxes. Though perhaps we should, since
> we'll already prune away the successor tuple, and so might as well go
> one tiny step further and clear the xmax for the original tuple via
> freezing/setting it to InvalidTransactionId. Instead we just leave the
> original tuple largely undisturbed, with its original xmax.

I thought that was the case too, but we call heap_prepare_freeze_tuple() on HEAPTUPLE_RECENTLY_DEAD tuples, and then:

    else if (TransactionIdIsNormal(xid))
    {
        /* Raw xmax is normal XID */
        freeze_xmax = TransactionIdPrecedes(xid, cutoffs->OldestXmin);
    }

And then later we do:

    if (freeze_xmax)
        frz->xmax = InvalidTransactionId;

and then, when we execute freezing the tuple in heap_execute_freeze_tuple():

    HeapTupleHeaderSetXmax(tuple, frz->xmax);

which sets the xmax to InvalidTransactionId. Or am I missing something?

> > The only case in which we freeze dead tuples
> > with a non-multi xmax is if the xmax is from before OldestXmin and is
> > also not committed (so from an aborted update).
>
> Perhaps I misunderstand, but: we simply don't freeze DEAD (not
> RECENTLY_DEAD) tuples in the first place, because we don't have to
> (pruning removes them instead). It doesn't matter if they're DEAD due
> to being from aborted transactions or DEAD due to being
> deleted/updated by a transaction that committed (committed and <
> OldestXmin).

Right, I'm talking about HEAPTUPLE_RECENTLY_DEAD tuples. HEAPTUPLE_DEAD tuples are pruned away. But we can't replace the xmax of a tuple that has been deleted or updated by a transaction that committed with InvalidTransactionId. And it seems like the code does that?
Why even call heap_prepare_freeze_tuple() on HEAPTUPLE_RECENTLY_DEAD tuples? Is it mainly to handle MultiXact freezing? > The freezing related code paths in heapam.c don't particularly care > whether a tuple xmax is RECENTLY_DEAD or LIVE to HTSV + OldestXmin. > Just as long as it's not fully DEAD (then it should have been pruned). But it just seems like we shouldn't freeze RECENTLY_DEAD either. - Melanie
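The failure mode being debated can be condensed into a toy model. This is a deliberately simplified sketch, not the actual heapam.c logic: the function names are invented, XIDs are plain integers with no wraparound handling, and multixacts are ignored. It shows how a committed xmax that is older than OldestXmin but not older than the pruning horizon (maybe_needed) slips past pruning and lands in the "cannot freeze committed xmax" check.

```c
#include <assert.h>

typedef unsigned int TransactionId;

enum
{
    FREEZE_OK,      /* xmax cleared to InvalidTransactionId */
    FREEZE_SKIP,    /* tuple pruned, or xmax too recent to freeze */
    FREEZE_ERROR    /* "cannot freeze committed xmax" */
};

/*
 * Toy model of prune-then-freeze for a tuple with a normal (non-multi)
 * xmax.  Pruning removes the tuple only if its committed xmax precedes
 * the vistest horizon (maybe_needed); otherwise the freeze code wants
 * to freeze any xmax older than OldestXmin, and the pre-freeze check
 * rejects freezing a committed xmax.
 */
int
model_prune_then_freeze(TransactionId xmax, int xmax_committed,
                        TransactionId maybe_needed,
                        TransactionId oldest_xmin)
{
    if (xmax_committed && xmax < maybe_needed)
        return FREEZE_SKIP;     /* pruned away; freezing never sees it */
    if (xmax >= oldest_xmin)
        return FREEZE_SKIP;     /* too recent to freeze */
    if (xmax_committed)
        return FREEZE_ERROR;    /* dead before OldestXmin but not pruned */
    return FREEZE_OK;           /* aborted updater: clear the xmax */
}
```

As long as maybe_needed is at or ahead of OldestXmin, the ERROR branch is unreachable; it only opens up when the pruning horizon slips behind OldestXmin, as after the standby reconnect described upthread.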
On Mon, Jun 24, 2024 at 4:51 PM Peter Geoghegan <pg@bowt.ie> wrote: > > On Mon, Jun 24, 2024 at 4:36 PM Robert Haas <robertmhaas@gmail.com> wrote: > > I thought the idea was that the GlobalVisTest stuff would force a > > recalculation now and then, but maybe it doesn't actually do that? > > It definitely can do that. Just not in a way that meaningfully > increases the number of heap tuples that we can recognize as DEAD and > remove. At least not currently. > > > Suppose process A begins a transaction, acquires an XID, and then goes > > idle. Process B now begins a giant vacuum. At some point in the middle > > of the vacuum, A ends the transaction. Are you saying that B's > > GlobalVisTest never really notices that this has happened? > > That's my understanding, yes. That is, vistest is approximately the > same thing as OldestXmin anyway. At least for now. Exactly. Something has to cause this backend to update its view of the horizon. At the end of index vacuuming, GetOldestNonRemovableTransactionId() will explicitly call ComputeXidHorizons(), which will update our backend's GlobalVisStates. Otherwise, if our backend's RecentXmin is updated, by taking a new snapshot, then we may update our GlobalVisStates. See GlobalVisTestShouldUpdate() for the conditions under which we would update our GlobalVisStates during the normal visibility checks happening during pruning. Vacuum used to open indexes after calculating horizons before starting its first pass. This led to a recomputation of the horizon. But, in master, there aren't many obvious places where such a thing would be happening. - Melanie
On Mon, Jun 24, 2024 at 04:51:24PM -0400, Peter Geoghegan wrote: > On Mon, Jun 24, 2024 at 4:36 PM Robert Haas <robertmhaas@gmail.com> wrote: > > I'm not sure I understand. The most important thing here is fixing the > > bug. But if we have a choice of how to fix the bug, I'd prefer to do > > it by having the pruning code test one horizon that is always correct, > > rather than (as I think the patch does) having it test against two > > horizons because as a way of covering possible discrepancies between > > those values. > > Your characterizing of OldestXmin + vistest as two horizons seems > pretty arbitrary to me. I know what you mean, of course, but it seems > like a distinction without a difference. "Two horizons" matches how I model it. If the two were _always_ indicating the same notion of visibility, we wouldn't have this thread. On Mon, Jun 24, 2024 at 03:23:39PM -0400, Melanie Plageman wrote: > Right now, in master, we do use a single horizon when determining what > is pruned -- that from GlobalVisState. OldestXmin is only used for > freezing and full page visibility determinations. Using a different > horizon for pruning by vacuum than freezing is what is causing the > error on master. Agreed, and I think using different sources for pruning and freezing is a recipe for future bugs. Fundamentally, both are about answering "is snapshot_considers_xid_in_progress(snapshot, xid) false for every snapshot?" That's not to say this thread shall unify the two, but I suspect that's the right long-term direction.
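Noah's framing can be spelled out as a predicate over all live snapshots. This is a hedged, illustrative sketch only — PostgreSQL has no function literally named this, and real snapshots are more than an xmin: with flat XIDs (no wraparound), "snapshot_considers_xid_in_progress(snapshot, xid) is false for every snapshot" reduces to "xid precedes the minimum xmin across all snapshots", and that single predicate is what both pruning and freezing are ultimately trying to answer.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int TransactionId;

/*
 * Illustrative spelling of the predicate above: an xid is safe to prune
 * or freeze only when no snapshot anywhere can still consider it in
 * progress, i.e. it precedes every snapshot's xmin.  Flat comparison,
 * ignoring XID wraparound.
 */
int
xid_removable_everywhere(TransactionId xid,
                         const TransactionId *snapshot_xmins, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        if (xid >= snapshot_xmins[i])
            return 0;   /* some snapshot may still see it in progress */
    }
    return 1;           /* no snapshot can; pruning and freezing agree */
}
```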
On Mon, Jun 24, 2024 at 9:30 PM Noah Misch <noah@leadboat.com> wrote: > On Mon, Jun 24, 2024 at 03:23:39PM -0400, Melanie Plageman wrote: > > Right now, in master, we do use a single horizon when determining what > > is pruned -- that from GlobalVisState. OldestXmin is only used for > > freezing and full page visibility determinations. Using a different > > horizon for pruning by vacuum than freezing is what is causing the > > error on master. > > Agreed, and I think using different sources for pruning and freezing is a > recipe for future bugs. Fundamentally, both are about answering "is > snapshot_considers_xid_in_progress(snapshot, xid) false for every snapshot?" > That's not to say this thread shall unify the two, but I suspect that's the > right long-term direction. What does it really mean to unify the two, though? If the OldestXmin field was located in struct GlobalVisState (next to definitely_needed and maybe_needed), but everything worked in essentially the same way as it will with Melanie's patch in place, would that count as unifying the two? Why or why not? -- Peter Geoghegan
On Mon, Jun 24, 2024 at 09:49:53PM -0400, Peter Geoghegan wrote: > On Mon, Jun 24, 2024 at 9:30 PM Noah Misch <noah@leadboat.com> wrote: > > On Mon, Jun 24, 2024 at 03:23:39PM -0400, Melanie Plageman wrote: > > > Right now, in master, we do use a single horizon when determining what > > > is pruned -- that from GlobalVisState. OldestXmin is only used for > > > freezing and full page visibility determinations. Using a different > > > horizon for pruning by vacuum than freezing is what is causing the > > > error on master. > > > > Agreed, and I think using different sources for pruning and freezing is a > > recipe for future bugs. Fundamentally, both are about answering "is > > snapshot_considers_xid_in_progress(snapshot, xid) false for every snapshot?" > > That's not to say this thread shall unify the two, but I suspect that's the > > right long-term direction. > > What does it really mean to unify the two, though? > > If the OldestXmin field was located in struct GlobalVisState (next to > definitely_needed and maybe_needed), but everything worked in > essentially the same way as it will with Melanie's patch in place, > would that count as unifying the two? Why or why not? To me, no, unification would mean removing the data redundancy. Relocating the redundant field and/or code that updates the redundant field certainly could reduce the risk of bugs, so I'm not opposing everything short of removing the data redundancy. I'm just agreeing with the "prefer" from https://postgr.es/m/CA+TgmoYzS_bkt_MrNxr5QrXDKfedmh4tStn8UBTTBXqv=3JTew@mail.gmail.com
Hi, On 2024-06-24 16:35:50 -0400, Robert Haas wrote: > On Mon, Jun 24, 2024 at 3:23 PM Melanie Plageman > <melanieplageman@gmail.com> wrote: > > Are you more concerned about having a single horizon for pruning or > > about having a horizon that does not move backwards after being > > established at the beginning of vacuuming the relation? > > I'm not sure I understand. The most important thing here is fixing the > bug. But if we have a choice of how to fix the bug, I'd prefer to do > it by having the pruning code test one horizon that is always correct, > rather than (as I think the patch does) having it test against two > horizons as a way of covering possible discrepancies between > those values. I think that's going in the wrong direction. We *want* to prune more aggressively if we can (*), the necessary state is represented by the vistest. That's a different thing than *having* to prune tuples beyond a certain xmin (the cutoff determined by vacuum.c/vacuumlazy.c). The problem we're having here is that the two states can get out of sync due to the vistest "moving backwards", because of hot_standby_feedback (and perhaps also an issue around aborts). To prevent that we can a) make sure that we always take the hard cutoff into account, or b) prevent vistest from going backwards (*) we really ought to become more aggressive, by removing intermediary row versions when they're not visible to anyone, but not yet old enough to be below the horizon. But that realistically will only be possible in *some* cases, e.g. when the predecessor row version is on the same page. > > I had always thought it was because the vacuuming backend's > > GlobalVisState will get updated periodically throughout vacuum and so, > > assuming the oldest running transaction changes, our horizon for > > vacuum would change. But, in writing this repro, it is actually quite > > hard to get GlobalVisState to update. Our backend's RecentXmin needs > > to have changed.
And there aren't very many places where we take a new > > snapshot after starting to vacuum a relation. One of those is at the > > end of index vacuuming, but that can only affect the pruning horizon > > if we have to do multiple rounds of index vacuuming. Is that really > > the case we are thinking of when we say we want the pruning horizon to > > move forward during vacuum? > > I thought the idea was that the GlobalVisTest stuff would force a > recalculation now and then, but maybe it doesn't actually do that? It forces an accurate horizon to be determined the first time it would require it to determine visibility. The "first time" is determined by RecentXmin not having changed. The main goal of the vistest stuff was to not have to determine an accurate horizon in GetSnapshotData(). Determining an accurate horizon requires accessing each backend's ->xmin, which causes things to scale badly, as ->xmin changes so frequently. The cost of determining the accurate horizon is irrelevant for vacuums, but it's not at all irrelevant for on-access pruning. > Suppose process A begins a transaction, acquires an XID, and then goes > idle. Process B now begins a giant vacuum. At some point in the middle > of the vacuum, A ends the transaction. Are you saying that B's > GlobalVisTest never really notices that this has happened? Not at the moment, but we should add heuristics like that. Greetings, Andres Freund
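The lazy recomputation Andres describes has roughly this two-boundary shape. This is a simplified model with invented names and flat XIDs, not the real GlobalVisTestIsRemovableXid()/GlobalVisUpdate() code (which, among other things, also maintains definitely_needed and gates recomputation on RecentXmin): answers outside the [maybe_needed, definitely_needed) band are free, and only the uncertain band forces the expensive accurate-horizon computation.

```c
#include <assert.h>

typedef unsigned int TransactionId;

typedef struct
{
    TransactionId maybe_needed;       /* older, approximate boundary */
    TransactionId definitely_needed;  /* newer, conservative boundary */
    int           recomputations;     /* times the expensive path ran */
} VisTestModel;

/*
 * Toy version of the two-boundary visibility test: XIDs before
 * maybe_needed are removable, XIDs at or after definitely_needed are
 * not, and only the band in between forces an accurate (expensive)
 * horizon computation, supplied here by the caller.
 */
int
model_is_removable(VisTestModel *t, TransactionId xid,
                   TransactionId accurate_horizon)
{
    if (xid < t->maybe_needed)
        return 1;                     /* cheaply known removable */
    if (xid >= t->definitely_needed)
        return 0;                     /* cheaply known not removable */
    t->recomputations++;              /* lazy, accurate recomputation */
    t->maybe_needed = accurate_horizon;
    return xid < accurate_horizon;
}
```

This is why the cost of an accurate horizon matters for on-access pruning but not for vacuum: most lookups fall outside the band and never pay for it.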
On Tue, Jun 25, 2024 at 8:03 AM Andres Freund <andres@anarazel.de> wrote: > I think that's going in the wrong direction. We *want* to prune more > aggressively if we can (*), the necessary state is represented by the > vistest. That's a different thing than *having* to prune tuples beyond a > certain xmin (the cutoff determined by vacuum.c/vacuumlazy.c). The problem > we're having here is that the two states can get out of sync due to the > vistest "moving backwards", because of hot_standby_feedback (and perhaps also > an issue around aborts). I agree that we want to prune more aggressively if we can. I think that fixing this by preventing vistest from going backward is reasonable, and I like it better than what Melanie proposed, although I like what Melanie proposed much better than not fixing it! I'm not sure how to do that cleanly, but one of you may have an idea. I do think that having a bunch of different XID values that function as horizons and a vistest object that holds some more XID horizons floating around in vacuum makes the code hard to understand. The relationships between the various values are not well-documented. For instance, the vistest has to be after vacrel->cutoffs.OldestXmin for correctness, but I don't think there's a single comment anywhere saying that; meanwhile, the comments for VacuumCutoffs say "OldestXmin is the Xid below which tuples deleted by any xact (that committed) should be considered DEAD, not just RECENTLY_DEAD." Surely the reader can be forgiven for thinking that this is the cutoff that will actually be used by pruning, but it isn't. And more generally, it seems like a fairly big problem to me that LVRelState directly stores NewRelfrozenXid; contains a VacuumCutoffs object that stores relfrozenxid, OldestXmin, and FreezeLimit; and also points to a GlobalVisState object that contains definitely_needed and maybe_needed. That is six different XID cutoffs for one vacuum operation. That's a lot. 
I can't describe how they're all different from each other or what the necessary relationships between them are off-hand, and I bet nobody else could either, at least until recently, else we might not have this bug. I feel like if it were possible to have fewer of them and still have things work, we'd be better off. I'm not sure that's doable. But six seems like a lot. -- Robert Haas EDB: http://www.enterprisedb.com
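As a reading aid only — this is not a proposal, the field names are copied from Robert's list but gathered into one invented struct, and the comments compress subtleties discussed elsewhere in the thread — the six values can be jotted down with the ordering the code implicitly relies on, including the relationship this bug violated: the pruning horizon must not fall behind OldestXmin.

```c
#include <assert.h>

typedef unsigned int TransactionId;

/*
 * The six per-vacuum XID cutoffs Robert enumerates, gathered in one
 * place purely for illustration (the real fields live in LVRelState,
 * VacuumCutoffs, and GlobalVisState).
 */
typedef struct
{
    TransactionId relfrozenxid;      /* pg_class value at vacuum start */
    TransactionId FreezeLimit;       /* freeze XIDs older than this */
    TransactionId OldestXmin;        /* deleters older than this => DEAD */
    TransactionId maybe_needed;      /* vistest approximate horizon */
    TransactionId definitely_needed; /* vistest conservative horizon */
    TransactionId NewRelfrozenXid;   /* value to set at vacuum end */
} VacuumXidCutoffs;

/*
 * Sanity-check the intended ordering (flat XIDs, ignoring wraparound).
 * The "OldestXmin <= maybe_needed" line is exactly the invariant that a
 * standby reconnect can break, producing the ERROR in this thread.
 */
int
cutoffs_are_sane(const VacuumXidCutoffs *c)
{
    return c->relfrozenxid <= c->FreezeLimit
        && c->FreezeLimit <= c->OldestXmin
        && c->OldestXmin <= c->maybe_needed
        && c->maybe_needed <= c->definitely_needed
        && c->relfrozenxid <= c->NewRelfrozenXid;
}
```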
On 2024-06-25 08:42:02 -0400, Robert Haas wrote: > On Tue, Jun 25, 2024 at 8:03 AM Andres Freund <andres@anarazel.de> wrote: > > I think that's going in the wrong direction. We *want* to prune more > > aggressively if we can (*), the necessary state is represented by the > > vistest. That's a different thing than *having* to prune tuples beyond a > > certain xmin (the cutoff determined by vacuum.c/vacuumlazy.c). The problem > > we're having here is that the two states can get out of sync due to the > > vistest "moving backwards", because of hot_standby_feedback (and perhaps also > > an issue around aborts). > > I agree that we want to prune more aggressively if we can. I think > that fixing this by preventing vistest from going backward is > reasonable, and I like it better than what Melanie proposed, although > I like what Melanie proposed much better than not fixing it! I'm not > sure how to do that cleanly, but one of you may have an idea. It's not hard - but it has downsides. It'll mean that - outside of vacuum - we'll much more often not react to horizons going backwards due to hot_standby_feedback. Which means that hot_standby_feedback, when used without slots, will prevent fewer conflicts. > I do think that having a bunch of different XID values that function > as horizons and a vistest object that holds some more XID horizons > floating around in vacuum makes the code hard to understand. The > relationships between the various values are not well-documented. For > instance, the vistest has to be after vacrel->cutoffs.OldestXmin for > correctness, but I don't think there's a single comment anywhere > saying that; It is somewhat documented:

     * Note: the approximate horizons (see definition of GlobalVisState) are
     * updated by the computations done here. That's currently required for
     * correctness and a small optimization. Without doing so it's possible that
     * heap vacuum's call to heap_page_prune_and_freeze() uses a more conservative
     * horizon than later when deciding which tuples can be removed - which the
     * code doesn't expect (breaking HOT).

> And more generally, it seems like a fairly big problem to me that > LVRelState directly stores NewRelfrozenXid; contains a VacuumCutoffs > object that stores relfrozenxid, OldestXmin, and FreezeLimit; and also > points to a GlobalVisState object that contains definitely_needed and > maybe_needed. That is six different XID cutoffs for one vacuum > operation. That's a lot. I can't describe how they're all different > from each other or what the necessary relationships between them are > off-hand, and I bet nobody else could either, at least until recently, > else we might not have this bug. I feel like if it were possible to > have fewer of them and still have things work, we'd be better off. I'm > not sure that's doable. But six seems like a lot.

Agreed. I don't think you can just unify things though, they actually are all different for good, or at least decent, reasons. I think improving the naming alone could help a good bit though. Greetings, Andres Freund
On Tue, Jun 25, 2024 at 9:07 AM Andres Freund <andres@anarazel.de> wrote: > It's not hard - but it has downsides. It'll mean that - outside of vacuum - > we'll much more often not react to horizons going backwards due to > hot_standby_feedback. Which means that hot_standby_feedback, when used without > slots, will prevent fewer conflicts. Can you explain this in more detail? -- Robert Haas EDB: http://www.enterprisedb.com
On Mon, Jun 24, 2024 at 4:10 AM Alena Rybakina <lena.ribackina@yandex.ru> wrote:
> > We can fix this by always removing tuples considered dead before
> > VacuumCutoffs->OldestXmin. This is okay even if a reconnected standby
> > has a transaction that sees that tuple as alive, because it will
> > simply wait to replay the removal until it would be correct to do so
> > or recovery conflict handling will cancel the transaction that sees
> > the tuple as alive and allow replay to continue.
>
> This is an interesting and difficult case) I noticed that when
> initializing the cluster, in my opinion, we provide excessive freezing.
> Initialization takes a long time, which can lead, for example, to
> longer test execution. I got rid of this by adding a check that
> OldestMxact is not FirstMultiXactId, and it works fine.
>
>     if (prstate->cutoffs &&
>         TransactionIdIsValid(prstate->cutoffs->OldestXmin) &&
>         prstate->cutoffs->OldestMxact != FirstMultiXactId &&
>         NormalTransactionIdPrecedes(dead_after, prstate->cutoffs->OldestXmin))
>         return HEAPTUPLE_DEAD;
>
> Can I keep it?

This looks like an addition to the new criteria I added to heap_prune_satisfies_vacuum(). Is that what you are suggesting? If so, it looks like it would only return HEAPTUPLE_DEAD (and thus only remove) a subset of the tuples my original criteria would remove. When vacuum calculates OldestMxact as FirstMultiXactId, it would not remove those tuples deleted before OldestXmin. It seems like OldestMxact will equal FirstMultiXactId sometimes right after initdb and after transaction ID wraparound. I'm not sure I totally understand the criteria. One thing I find confusing about this is that this would actually remove fewer tuples than with my criteria -- which could lead to more freezing. When vacuum calculates OldestMxact == FirstMultiXactId, we would not remove tuples deleted before OldestXmin and thus return HEAPTUPLE_RECENTLY_DEAD for those tuples. Then we would consider freezing them. So, it seems like we would do more freezing by adding this criteria.
Could you explain more about how the criterion you are suggesting works? Are you saying it does less freezing than master or less freezing than with my patch?
At first, I noticed that with this patch, vacuum fouls the nodes more often than before, and it seemed to me that more time was spent initializing the cluster with this patch than before, so I suggested considering this condition. After checking again, I found that the problem was with my laptop. So, sorry for the noise.
I'm sorry, I need a little more time to figure this out. I will answer this question later.

> Attached is the suggested fix for master plus a repro. I wrote it as a
> recovery suite TAP test, but I am _not_ proposing we add it to the
> ongoing test suite. It is, amongst other things, definitely prone to
> flaking. I also had to use loads of data to force two index vacuuming
> passes now that we have TIDStore, so it is a slow test.

-- snip --

I have a modified version of this that repros the infinite loop on 14-16 with substantially less data. See it here [2]. Also, the repro attached to this mail won't work on 14 and 15 because of changes to background_psql.

I couldn't understand why the replica is necessary here. Now I am digging into why I got similar behavior without a replica when I have only one instance. I'm still checking this in my test, but I believe this patch fixes the original problem because the symptoms were the same.

Did you get similar behavior on master or on back branches? Was the behavior you observed the infinite loop or the error during heap_prepare_freeze_tuple()? In my examples, the replica is needed because something has to move the horizon on the primary backwards. When a standby reconnects with an older oldest running transaction ID than any of the running transactions on the primary and the vacuuming backend recomputes its RecentXmin, the horizon may move backwards when compared to the horizon calculated at the beginning of the vacuum. Vacuum does not recompute cutoffs->OldestXmin during vacuuming a relation, but it may recompute the values in the GlobalVisState it uses for pruning. We knew of only one other way that the horizon could move backwards, which Matthias describes here [1]. However, this is thought to be its own concurrency-related bug in the commit-abort path that should be fixed -- as opposed to the standby reconnecting with an older oldest running transaction ID, which can be expected.
Do you know if you were seeing the effects of the scenario Matthias describes? [1] https://www.postgresql.org/message-id/CAEze2WjMTh4KS0%3DQEQB-Jq%2BtDLPR%2B0%2BzVBMfVwSPK5A%3DWZa95Q%40mail.gmail.com
-- Regards, Alena Rybakina Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Tue, Jun 25, 2024 at 10:31 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Tue, Jun 25, 2024 at 9:07 AM Andres Freund <andres@anarazel.de> wrote: > > It's not hard - but it has downsides. It'll mean that - outside of vacuum - > > we'll much more often not react to horizons going backwards due to > > hot_standby_feedback. Which means that hot_standby_feedback, when used without > > slots, will prevent fewer conflicts. > > Can you explain this in more detail? If we prevent GlobalVisState from moving backward, then we would less frequently be pushing the horizon backward on the primary in response to hot standby feedback. Then, the primary would do more things that would not be safely replayable on the standby -- so the standby could end up encountering more recovery conflicts. - Melanie
On Tue, Jun 25, 2024 at 11:39 AM Melanie Plageman <melanieplageman@gmail.com> wrote: > On Tue, Jun 25, 2024 at 10:31 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Tue, Jun 25, 2024 at 9:07 AM Andres Freund <andres@anarazel.de> wrote: > > > It's not hard - but it has downsides. It'll mean that - outside of vacuum - > > > we'll much more often not react to horizons going backwards due to > > > hot_standby_feedback. Which means that hot_standby_feedback, when used without > > > slots, will prevent fewer conflicts. > > > > Can you explain this in more detail? > > If we prevent GlobalVisState from moving backward, then we would less > frequently be pushing the horizon backward on the primary in response > to hot standby feedback. Then, the primary would do more things that > would not be safely replayable on the standby -- so the standby could > end up encountering more recovery conflicts. I don't get it. hot_standby_feedback only moves horizons backward on the primary, AFAIK, when it first connects, or when it reconnects. Which I guess could be frequent for some users with flaky networks, but does that really rise to the level of "much more often"? -- Robert Haas EDB: http://www.enterprisedb.com
Hi, On 2024-06-25 12:31:11 -0400, Robert Haas wrote: > On Tue, Jun 25, 2024 at 11:39 AM Melanie Plageman > <melanieplageman@gmail.com> wrote: > > On Tue, Jun 25, 2024 at 10:31 AM Robert Haas <robertmhaas@gmail.com> wrote: > > > On Tue, Jun 25, 2024 at 9:07 AM Andres Freund <andres@anarazel.de> wrote: > > > > It's not hard - but it has downsides. It'll mean that - outside of vacuum - > > > > we'll much more often not react to horizons going backwards due to > > > > hot_standby_feedback. Which means that hot_standby_feedback, when used without > > > > slots, will prevent fewer conflicts. > > > > > > Can you explain this in more detail? > > > > If we prevent GlobalVisState from moving backward, then we would less > > frequently be pushing the horizon backward on the primary in response > > to hot standby feedback. Then, the primary would do more things that > > would not be safely replayable on the standby -- so the standby could > > end up encountering more recovery conflicts. > > I don't get it. hot_standby_feedback only moves horizons backward on > the primary, AFAIK, when it first connects, or when it reconnects. > Which I guess could be frequent for some users with flaky networks, > but does that really rise to the level of "much more often"? Well, the thing is that with the "prevent it from going backwards" approach, once the horizon is set to something recent in a backend, it's "sticky". If a replica is a bit behind or if there's a long-lived snapshot and disconnects, the vistest state will advance beyond where the replica needs it to be. Even if the standby later reconnects, the vistest in long-lived sessions will still have the more advanced state. So all future pruning these backends do runs into the risk of performing pruning that removes rows the standby still deems visible and thus causes recovery conflicts. I.e. you don't even need frequent disconnects, you just need one disconnect and sessions that aren't short-lived.
That said, obviously there will be plenty setups where this won't cause an issue. I don't really have a handle on how often it'd be a problem. Greetings, Andres Freund
On Tue, Jun 25, 2024 at 1:10 PM Andres Freund <andres@anarazel.de> wrote: > That said, obviously there will be plenty setups where this won't cause an > issue. I don't really have a handle on how often it'd be a problem. Fair enough. Even if it's not super-common, it doesn't seem like a great idea to regress such scenarios in the back-branches. Is there any way that we could instead tweak things so that we adjust the visibility test object itself? Like, can we have a GlobalVisTest API where we can supply the OldestXmin from the VacuumCutoffs and have it ... do something useful with that? -- Robert Haas EDB: http://www.enterprisedb.com
On 2024-06-25 14:35:00 -0400, Robert Haas wrote: > Is there any way that we could instead tweak things so that we adjust > the visibility test object itself? Like can have a GlobalVisTest API > where we can supply the OldestXmin from the VacuumCutoffs and have it > ... do something useful with that? I doubt that's doable in the back branches. And even on HEAD, I don't think it's a particularly attractive option - there's just a global vistest for each of the types of objects with a specific horizon (they need to be updated occasionally, e.g. when taking snapshots). So there's not really a spot to put an associated OldestXmin. We could put it there and remove it at the end of vacuum / in an exception handler, but that seems substantially worse.
On Tue, Jun 25, 2024 at 4:41 PM Andres Freund <andres@anarazel.de> wrote: > I doubt that's doable in the back branches. And even on HEAD, I don't think > it's a particularly attractive option - there's just a global vistest for each > of the types of objects with a specific horizon (they need to be updated > occasionally, e.g. when taking snapshots). So there's not really a spot to put > an associated OldestXmin. We could put it there and remove it at the end of > vacuum / in an exception handler, but that seems substantially worse. Oh, right: I forgot that the visibility test objects were just pointers to global variables. Well, I don't know. I guess that doesn't leave any real options but to fix it as Melanie proposed. But I still don't like it very much. I feel like having to test against two different thresholds in the pruning code is surely a sign that we're doing something wrong. -- Robert Haas EDB: http://www.enterprisedb.com