Hi,
On 11/8/23 4:50 AM, Amit Kapila wrote:
> On Tue, Nov 7, 2023 at 7:58 PM Drouvot, Bertrand
> <bertranddrouvot.pg@gmail.com> wrote:
>>
>> If we think this window is too short we could:
>>
>> - increase it
>> or
>> - don't drop the slot once created (even if there is no activity
>> on the primary during PrimaryCatchupWaitAttempt attempts) so that
>> the next loop of attempts will compare with "older" LSN/xmin (as compare to
>> dropping and re-creating the slot). That way the window would be since the
>> initial slot creation.
>>
>
> Yeah, this sounds reasonable but we can't mark such slots to be
> synced/available for use after failover.
Yeah. Currently we are fine, as the slots are dropped in
wait_for_primary_slot_catchup() if we are no longer in recovery.
> I think if we want to follow
> this approach then we need to also monitor these slots for any change
> in the consecutive cycles and if we are able to sync them then
> accordingly we enable them to use after failover.
What about adding a new field in ReplicationSlotPersistentData
indicating that we are waiting for "sync", and dropping such slots during
promotion and/or if not in recovery?
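
To make the idea concrete, here is a minimal standalone sketch (not taken
from the patch). SketchSlotData is a simplified, hypothetical stand-in for
ReplicationSlotPersistentData, and the sync_pending field and
drop_sync_pending_slots() are made-up names for illustration only:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for ReplicationSlotPersistentData. */
typedef struct SketchSlotData
{
    char        name[64];
    bool        in_use;         /* slot exists */
    bool        sync_pending;   /* hypothetical new field: waiting for "sync" */
} SketchSlotData;

/*
 * Drop every slot that is still waiting for sync once we are no longer in
 * recovery (e.g. during promotion). Returns the number of slots dropped.
 * While still in recovery nothing is dropped, so a later cycle can retry.
 */
static int
drop_sync_pending_slots(SketchSlotData *slots, size_t nslots, bool in_recovery)
{
    int         dropped = 0;

    assert(slots != NULL || nslots == 0);

    if (in_recovery)
        return 0;

    for (size_t i = 0; i < nslots; i++)
    {
        if (slots[i].in_use && slots[i].sync_pending)
        {
            /* "Dropping" is modeled here by simply marking the slot unused. */
            slots[i].in_use = false;
            slots[i].sync_pending = false;
            dropped++;
        }
    }
    return dropped;
}
```

Slots that completed their sync (sync_pending cleared) would survive
promotion and be usable after failover; the ones still flagged would not.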
> Another somewhat related point is that right now, we just wait for the
> change on the first slot (the patch refers to it as the monitoring
> slot) for computing nap_time before which we will recheck all the
> slots. I think we can improve that as well such that even if any
> slot's information is changed, we don't consider changing naptime.
>
Yeah, that sounds reasonable to me.
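
One possible reading of this, as a standalone sketch (not from the patch):
compare every slot's state between two cycles rather than only the first
slot's, and keep the short nap time if any of them moved. The struct, the
function names and both nap-time constants below are assumptions made up
for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical nap times, in milliseconds. */
#define SKETCH_ACTIVE_NAPTIME_MS 10
#define SKETCH_IDLE_NAPTIME_MS   10000

/* Hypothetical snapshot of the remote slot values we compare per cycle. */
typedef struct SketchSlotState
{
    uint64_t    restart_lsn;
    uint32_t    catalog_xmin;
} SketchSlotState;

/* Return true if any slot (not just the first one) changed between cycles. */
static bool
any_slot_changed(const SketchSlotState *prev, const SketchSlotState *cur,
                 size_t nslots)
{
    assert((prev != NULL && cur != NULL) || nslots == 0);

    for (size_t i = 0; i < nslots; i++)
    {
        if (prev[i].restart_lsn != cur[i].restart_lsn ||
            prev[i].catalog_xmin != cur[i].catalog_xmin)
            return true;
    }
    return false;
}

/* Keep the short nap time while there is activity on any slot. */
static long
next_naptime_ms(bool changed)
{
    return changed ? SKETCH_ACTIVE_NAPTIME_MS : SKETCH_IDLE_NAPTIME_MS;
}
```

With this, the longer nap time would only kick in when none of the slots
showed any change during the last cycle.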
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com