RE: Synchronizing slots from primary to standby

Поиск
Список
Период
Сортировка
От Zhijie Hou (Fujitsu)
Тема RE: Synchronizing slots from primary to standby
Дата
Msg-id OS3PR01MB57184FA4AB9F2E6F1BF802DC945A2@OS3PR01MB5718.jpnprd01.prod.outlook.com
обсуждение исходный текст
Ответ на Re: Synchronizing slots from primary to standby  (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
Ответы Re: Synchronizing slots from primary to standby  (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
Re: Synchronizing slots from primary to standby  (Amit Kapila <amit.kapila16@gmail.com>)
Список pgsql-hackers
On Friday, February 23, 2024 6:12 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote:
>
> Here are some random comments:

Thanks for the comments!

>
> 1 ===
>
> Commit message "Allow logical walsenders to wait for the physical"
>
> s/physical/physical standby/?
>
> 2 ==
>
> +++ b/src/backend/replication/logical/logicalfuncs.c
> @@ -30,6 +30,7 @@
>  #include "replication/decode.h"
>  #include "replication/logical.h"
>  #include "replication/message.h"
> +#include "replication/walsender.h"
>
> Is this include needed?

Removed.

>
> 3 ===
>
> +        * Slot sync is currently not supported on the cascading
> + standby. This is
>
> s/on the/on a/?

Changed.

>
> 4 ===
>
> +       if (!ok)
> +               GUC_check_errdetail("List syntax is invalid.");
> +
> +       /*
> +        * If there is a syntax error in the name or if the replication slots'
> +        * data is not initialized yet (i.e., we are in the startup process), skip
> +        * the slot verification.
> +        */
> +       if (!ok || !ReplicationSlotCtl)
> +       {
> +               pfree(rawname);
> +               list_free(elemlist);
> +               return ok;
> +       }
>
> we are testing the "ok" value twice, what about using if...else if... instead and
> test it once? If so, it might be worth to put the:
>
> "
> +       pfree(rawname);
> +       list_free(elemlist);
> +       return ok;
> "
>
> in a "goto".

There were comments to remove the 'goto' statement and avoid
duplicate free code, so I prefer the current style.

>
> 5 ===
>
> +        * for which all standbys to wait for. Even if we have
> + physical-slots
>
> s/physical-slots/physical slots/?

Changed.

>
> 6 ===
>
>         * Switch to the same memory context under which GUC variables are
>
> s/to the same memory/to the memory/?

Changed.

>
> 7 ===
>
> + * Return a copy of standby_slot_names_list if the copy flag is set to
> + true,
>
> Not sure, but would it be worth explaining why one would want to set to flag to
> true or false? (i.e why one would not want to receive the original list).

I think the usage can be found from the caller's code, e.g we need to remove
the slots that caught up from the list each time, so we cannot directly modify
the global list. The GetStandbySlotList function is general function and I feel
we can avoid adding more comments here.

>
> 8 ===
>
> +       if (RecoveryInProgress())
> +               return NIL;
>
> The need is well documented just above, but are we not violating the fact that
> we return the original list or a copy of it? (that's what the comment above the
> GetStandbySlotList() function definition is saying).
>
> I think the comment above the GetStandbySlotList() function needs a bit of
> rewording to cover that case.

Adjusted.


>
> 9 ===
>
> +                        * harmless, a WARNING should be enough, no need to
> error-out.
>
> s/error-out/error out/?

Changed.

>
> 10 ===
>
> +                       if (slot->data.invalidated != RS_INVAL_NONE)
> +                       {
> +                               /*
> +                                * Specified physical slot have been invalidated,
> so no point
> +                                * in waiting for it.
>
> We discovered in [1], that if the wal_status is "unreserved" then the slot is still
> serving the standby. I think we should handle this case differently, thoughts?

I think the 'invalidated' slot can still be used is a separate bug. Because
once the slot is invalidated, it can neither protect WALs or ROWs from being
removed even if the restart_lsn of the slot can be moved forward after being invalidated.

If the standby can move restart_lsn forward for invalidated slots, then
it should also set the 'invalidated' flag back to NONE, otherwise the slot
cannot serve its purpose anymore. I also reported similar bug before[1].

>
> 11 ===
>
> +                                * Specified physical slot have been
> + invalidated, so no point
>
> s/have been/has been/?

Changed.

>
> 12 ===
>
> +++ b/src/backend/replication/slotfuncs.c
> @@ -22,6 +22,7 @@
>  #include "replication/logical.h"
>  #include "replication/slot.h"
>  #include "replication/slotsync.h"
> +#include "replication/walsender.h"
>
> Is this include needed?

No, it's not needed. Removed.

Attach the V98 patch set which addressed above comments.
I also adjusted few comments based on off-list comments from Shveta.

The discussion for wait behavior is on-going, so I didn't change the behavior in this version.

[1]
https://www.postgresql.org/message-id/flat/OS0PR01MB5716A626A4AF5814E057CEE39484A@OS0PR01MB5716.jpnprd01.prod.outlook.com

Best Regards,
Hou zj

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andy Fan
Дата:
Сообщение: Re: Shared detoast Datum proposal
Следующее
От: Andrei Lepikhov
Дата:
Сообщение: Re: "type with xxxx does not exist" when doing ExecMemoize()