On 4 August 2018 at 07:56, Michael Paquier <michael@paquier.xyz> wrote:
> On Sat, Aug 04, 2018 at 07:44:59AM +0100, Simon Riggs wrote:
>> I think the problem is that writing the online checkpoint is deferred
>> after promotion, so this is a timing issue that probably doesn't show
>> in our regression tests.
>
> Somewhat. It is a performance improvement of 9.3 to let the startup
> process request a checkpoint from the checkpointer process instead of
> doing it itself.

Yes, and so issuing a manual CHECKPOINT would remove that benefit.

>> Sounds like we should write a pending timeline change to the control
>> file and have pg_rewind check that instead.
>>
>> I'd call this a timing bug, not a doc issue.
>
> Well, having pg_rewind enforce a checkpoint on the promoted standby
> could cause a performance hit as well if we do it unconditionally: if
> there is a delay between the promotion and the rewind, a checkpoint
> could already have happened. So for me it is first a documentation bug
> regarding the failover workflow, and potentially a patch for a new
> feature which makes pg_rewind trigger a checkpoint directly.

pg_rewind doesn't work correctly. Documenting a workaround doesn't change that.
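
For reference, the workaround being discussed (promote, force a checkpoint on the promoted standby, then rewind the old primary) could be sketched roughly as below. This is only an illustration of the ordering, not a recommended procedure: the helper name failover_rewind_commands, the data directory paths and the connection string are all hypothetical, and the function only builds the command lines rather than running them.

```python
import shlex


def failover_rewind_commands(standby_pgdata, old_primary_pgdata, standby_conninfo):
    """Return, in order, the commands for the manual-checkpoint workaround.

    All three arguments are placeholders supplied by the caller; nothing is
    executed here.
    """
    return [
        # 1. Promote the standby. The end-of-recovery checkpoint is only
        #    requested from the checkpointer at this point, not completed.
        ["pg_ctl", "promote", "-D", standby_pgdata],
        # 2. Force a checkpoint on the promoted standby so that its control
        #    file reflects the new timeline before pg_rewind reads it.
        ["psql", "-d", standby_conninfo, "-c", "CHECKPOINT"],
        # 3. Rewind the old primary (which must be shut down) against the
        #    promoted standby.
        [
            "pg_rewind",
            "--target-pgdata=" + old_primary_pgdata,
            "--source-server=" + standby_conninfo,
        ],
    ]


if __name__ == "__main__":
    # Hypothetical paths and connection string, for illustration only.
    for cmd in failover_rewind_commands(
        "/path/to/standby", "/path/to/old_primary", "host=standby.example dbname=postgres"
    ):
        print(shlex.join(cmd))
```

Step 2 is exactly the manual CHECKPOINT that gives up the 9.3 optimisation, which is why it is a workaround and not a fix.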
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services