Re: Synchronizing slots from primary to standby
От | James Coleman |
---|---|
Тема | Re: Synchronizing slots from primary to standby |
Дата | |
Msg-id | CAAaqYe-bB5Rh_fSkZyEUFwUfWfLSqatP8apTVUui4x-S4=1Ung@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Synchronizing slots from primary to standby (Peter Eisentraut <peter.eisentraut@enterprisedb.com>) |
Список | pgsql-hackers |
On Fri, Feb 11, 2022 at 9:26 AM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote: > > The way I understand it: > > 1. This feature (probably) depends on the "Minimal logical decoding on > standbys" patch. The details there aren't totally clear (to me). That > patch had some activity lately but I don't see it in a state that it's > nearing readiness. > > 2. I think the way this (my) patch is currently written needs some > refactoring about how we launch and manage workers. Right now, it's all > mangled together with logical replication, since that is a convenient > way to launch and manage workers, but it really doesn't need to be tied > to logical replication, since it can also be used for other logical slots. > > 3. It's an open question how to configure this. My patch show a very > minimal configuration that allows you to keep all logical slots always > behind one physical slot, which addresses one particular use case. In > general, you might have things like, one set of logical slots should > stay behind one physical slot, another set behind another physical slot, > another set should not care, etc. This could turn into something like > the synchronous replication feature, where it ends up with its own > configuration language. > > Each of these are clearly significant jobs on their own. > Thanks for bringing this topic back up again. I haven't gotten a chance to do any testing on the patch yet, but here are my initial notes from reviewing it: First, reusing the logical replication launcher seems a bit gross. It's obviously a pragmatic choice, but I find it confusing and likely to become only moreso given the fact there's nothing about slot syncing that's inherently limited to logical slots. Plus the feature currently is about syncing slots on a physical replica. So I think that probably should change. Second, it seems to me that the worker-per-DB architecture means that this is unworkable on a cluster with a large number of DBs. The original thread said that was because "logical slots are per database, walrcv_exec needs db connection, etc". As to the walrcv_exec, we're (re)connecting to the primary for each synchronization anyway, so that doesn't seem like a significant reason. I don't understand why logical slots being per-database means we have to do it this way. Is there something about the background worker architecture (I'm revealing my own ignorance here I suppose) that requires this? Also it seems that we reconnect to the primary every time we want to synchronize slots. Maybe that's OK, but it struck me as a bit odd, so I wanted to ask about it. Third, wait_for_standby_confirmation() needs a function comment. Andres noted this earlier, but it seems like we're doing quite a bit of work in this function for each commit. Some of it is obviously duplicative like the parsing of standby_slot_names. The waiting introduced also doesn't seem like a good idea. Andres also commented on that earlier; I'd echo his comments here absent an explanation of why it's preferable/necessary to do it this way. > + if (flush_pos >= commit_lsn && wait_slots_remaining > 0) > + wait_slots_remaining --; I might be missing something re: project style, but the space before the "--" looks odd to my eyes. > * Call either PREPARE (for two-phase transactions) or COMMIT (for > * regular ones). > */ > + > + wait_for_standby_confirmation(commit_lsn); > + > if (rbtxn_prepared(txn)) > rb->prepare(rb, txn, commit_lsn); > else It appears the addition of this call splits the comment from the code it goes with. > + * Wait for remote slot to pass localy reserved position. Typo ("localy" -> "locally"). This patch would be a significant improvement for us; I'm hoping we can see some activity on it. I'm also hoping to try to do some testing next week and see if I can poke any holes in the functionality (with the goal of verifying Andres's concerns about the safety without the minimal logical decoding on a replica patch). Thanks, James Coleman
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Peter GeogheganДата:
Сообщение: Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Следующее
От: David ZhangДата:
Сообщение: Re: postgres_fdw: commit remote (sub)transactions in parallel during pre-commit