Re: Synchronizing slots from primary to standby

Поиск
Список
Период
Сортировка
От James Coleman
Тема Re: Synchronizing slots from primary to standby
Дата
Msg-id CAAaqYe-bB5Rh_fSkZyEUFwUfWfLSqatP8apTVUui4x-S4=1Ung@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Synchronizing slots from primary to standby  (Peter Eisentraut <peter.eisentraut@enterprisedb.com>)
Список pgsql-hackers
On Fri, Feb 11, 2022 at 9:26 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
>
> The way I understand it:
>
> 1. This feature (probably) depends on the "Minimal logical decoding on
> standbys" patch.  The details there aren't totally clear (to me).  That
> patch had some activity lately but I don't see it in a state that it's
> nearing readiness.
>
> 2. I think the way this (my) patch is currently written needs some
> refactoring about how we launch and manage workers.  Right now, it's all
> mangled together with logical replication, since that is a convenient
> way to launch and manage workers, but it really doesn't need to be tied
> to logical replication, since it can also be used for other logical slots.
>
> 3. It's an open question how to configure this.  My patch show a very
> minimal configuration that allows you to keep all logical slots always
> behind one physical slot, which addresses one particular use case.  In
> general, you might have things like, one set of logical slots should
> stay behind one physical slot, another set behind another physical slot,
> another set should not care, etc.  This could turn into something like
> the synchronous replication feature, where it ends up with its own
> configuration language.
>
> Each of these are clearly significant jobs on their own.
>

Thanks for bringing this topic back up again.

I haven't gotten a chance to do any testing on the patch yet, but here
are my initial notes from reviewing it:

First, reusing the logical replication launcher seems a bit gross.
It's obviously a pragmatic choice, but I find it confusing and likely
to become only moreso given the fact there's nothing about slot
syncing that's inherently limited to logical slots. Plus the feature
currently is about syncing slots on a physical replica. So I think
that probably should change.

Second, it seems to me that the worker-per-DB architecture means that
this is unworkable on a cluster with a large number of DBs. The
original thread said that was because "logical slots are per database,
walrcv_exec needs db connection, etc". As to the walrcv_exec, we're
(re)connecting to the primary for each synchronization anyway, so that
doesn't seem like a significant reason. I don't understand why logical
slots being per-database means we have to do it this way. Is there
something about the background worker architecture (I'm revealing my
own ignorance here I suppose) that requires this?

Also it seems that we reconnect to the primary every time we want to
synchronize slots. Maybe that's OK, but it struck me as a bit odd, so
I wanted to ask about it.

Third, wait_for_standby_confirmation() needs a function comment.
Andres noted this earlier, but it seems like we're doing quite a bit
of work in this function for each commit. Some of it is obviously
duplicative like the parsing of standby_slot_names. The waiting
introduced also doesn't seem like a good idea. Andres also commented
on that earlier; I'd echo his comments here absent an explanation of
why it's preferable/necessary to do it this way.

> + if (flush_pos >= commit_lsn && wait_slots_remaining > 0)
> +         wait_slots_remaining --;

I might be missing something re: project style, but the space before
the "--" looks odd to my eyes.

>    * Call either PREPARE (for two-phase transactions) or COMMIT (for
>    * regular ones).
>    */
> +
> + wait_for_standby_confirmation(commit_lsn);
> +
>   if (rbtxn_prepared(txn))
>           rb->prepare(rb, txn, commit_lsn);

>  else

It appears the addition of this call splits the comment from the code
it goes with.

> + * Wait for remote slot to pass localy reserved position.

Typo ("localy" -> "locally").

This patch would be a significant improvement for us; I'm hoping we
can see some activity on it. I'm also hoping to try to do some testing
next week and see if I can poke any holes in the functionality (with
the goal of verifying Andres's concerns about the safety without the
minimal logical decoding on a replica patch).

Thanks,
James Coleman



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Geoghegan
Дата:
Сообщение: Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Следующее
От: David Zhang
Дата:
Сообщение: Re: postgres_fdw: commit remote (sub)transactions in parallel during pre-commit