On Mon, May 15, 2023 at 03:38:17PM +1200, Thomas Munro wrote:
> On Fri, May 12, 2023 at 6:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
> > 2023-05-11 20:19:22.248 MSK [2037134] FATAL: invalid memory alloc request size 2021163525
> > 2023-05-11 20:19:22.248 MSK [2037114] LOG: startup process (PID 2037134) exited with exit code 1
>
> Thanks Alexander. Looking into this. I think it is probably
> something like: recycled standby pages are not zeroed (something we
> already needed to do something about[1]), and when we read a recycled
> garbage size (like your "xxxx") at the end of a page at an offset
> where we don't have a full record header on one page, we skip the
> ValidXLogRecordHeader() call (and always did), but the check in
> allocate_recordbuf() which previously handled that "gracefully" (well,
> it would try to allocate up to 1GB bogusly, but it wouldn't try to
> allocate more than that and ereport) is a bit too late. I probably
> need to add an earlier not-too-big validation. Thinking.

I agree about an earlier not-too-big validation. Like the attached? I
haven't tested it with Alexander's recipe or pondered it thoroughly.
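
For readers without the attachment, here is a minimal sketch of what an
earlier not-too-big check could look like. It is not the attached patch.
The function and macro names are hypothetical, the 24-byte minimum is an
assumed fixed header size, and the upper bound is meant to mirror the
backend's 1GB allocation limit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; not the names used in the attached patch. */
#define SKETCH_MAX_ALLOC_SIZE  ((uint32_t) 0x3fffffff)  /* 1 GB - 1, assumed allocator limit */
#define SKETCH_MIN_RECORD_SIZE ((uint32_t) 24)          /* assumed fixed record header size */

/*
 * Decide whether a total-length field read from a possibly recycled page
 * is plausible enough to allocate for, before the rest of the record
 * header is available for full validation.
 */
static bool
sketch_tot_len_is_plausible(uint32_t tot_len)
{
    if (tot_len < SKETCH_MIN_RECORD_SIZE)
        return false;           /* shorter than any record header */
    if (tot_len > SKETCH_MAX_ALLOC_SIZE)
        return false;           /* would exceed the allocation limit */
    return true;
}
```

A caller would run a check like this right after reading the length and
report an invalid record when it fails, instead of reaching the allocation.
A garbage length of the magnitude in Alexander's log would be rejected here.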

> [1] https://www.postgresql.org/message-id/20210505010835.umylslxgq4a6rbwg@alap3.anarazel.de

Regarding [1], is it still worth zeroing recycled pages on standbys and/or
reading the whole header before allocating xl_tot_len bytes? (Are there
benefits other than avoiding a 1G backend allocation or 4G frontend
allocation, and is that benefit alone worth the cycles?)