Discussion: Why does PostgresNode.pm set such a low value of max_wal_senders?
I noticed this recent buildfarm failure:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sidewinder&dt=2020-09-29%2018%3A45%3A17

which boils down to

error running SQL: 'psql:<stdin>:1: ERROR: could not connect to the publisher: FATAL: number of requested standby connections exceeds max_wal_senders (currently 5)'
while running 'psql -XAtq -d port=62411 host=/tmp/cmXKiWUDs9 dbname='postgres' -f - -v ON_ERROR_STOP=1' with sql 'ALTER SUBSCRIPTION sub2 REFRESH PUBLICATION' at /home/pgbf/buildroot/HEAD/pgsql.build/src/test/subscription/../../../src/test/perl/PostgresNode.pm line 1546.

Digging in the postmaster log shows that indeed we were at the limit of
5 wal senders. One was about to exit (else this test could never
succeed at all), but it had not done so fast enough to avoid this failure.

Further digging in the buildfarm archives shows that "number of requested
standby connections exceeds max_wal_senders" seems rather common on our
slower buildfarm members, eg there are two such complaints in prairiedog's
latest successful HEAD build. Apparently, most of the time this gets
masked by automatic restart of logrep workers; but when a test script
involves explicit execution of a replication command, it's going to
notice if that try fails to connect.

So I wonder why PostgresNode.pm is doing

    print $conf "max_wal_senders = 5\n";

Considering that our default these days is 10 senders, and that a
walsender slot doesn't really cost much, this seems unduly cheapskate.
I propose raising this to 10.

There might be some value in the fact that this situation is exercising
the automatic-reconnection behavior, but if so I'd like to find a more
consistent way of testing that.

regards, tom lane
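The race described in this message can be illustrated with a toy model. This is a minimal Python sketch, not PostgreSQL code: `ToyPublisher`, `connect_with_retry`, and `demo` are hypothetical names invented for illustration, and the assumption that five senders are busy when the command runs is taken from the log excerpt above. It shows why a one-shot replication command notices a full set of walsender slots while a retrying worker does not.

```python
class SlotLimitExceeded(Exception):
    """Raised when every walsender slot is still occupied (toy model)."""


class ToyPublisher:
    """Hypothetical stand-in for a publisher with a fixed number of slots."""

    def __init__(self, max_wal_senders):
        self.max_wal_senders = max_wal_senders
        self.active = 0

    def connect(self):
        # Refuse the connection while all slots are taken.
        if self.active >= self.max_wal_senders:
            raise SlotLimitExceeded(
                "number of requested standby connections exceeds "
                f"max_wal_senders (currently {self.max_wal_senders})"
            )
        self.active += 1

    def disconnect(self):
        self.active -= 1


def connect_with_retry(publisher, attempts, between_attempts):
    """Retry like a restarted worker; between_attempts models time passing."""
    for attempt in range(1, attempts + 1):
        try:
            publisher.connect()
            return attempt
        except SlotLimitExceeded:
            between_attempts()
    raise SlotLimitExceeded("gave up after retries")


def demo(limit, busy=5):
    """Return (one-shot result, attempt number on which the retry succeeded)."""
    pub = ToyPublisher(limit)
    for _ in range(busy):  # senders already running, one of them about to exit
        pub.connect()
    try:
        pub.connect()  # explicit command: a single try, no retry
        pub.disconnect()
        one_shot = "ok"
    except SlotLimitExceeded:
        one_shot = "failed"
    # A worker that retries succeeds once the exiting sender frees its slot.
    attempt = connect_with_retry(pub, attempts=3, between_attempts=pub.disconnect)
    return one_shot, attempt


print(demo(5))   # ('failed', 2)
print(demo(10))  # ('ok', 1)
```

With a limit of 5 the single attempt fails and only the retry gets through; with a limit of 10 both succeed immediately, which is the effect the proposed change is after.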
On 2020-Sep-29, Tom Lane wrote:

> So I wonder why PostgresNode.pm is doing
>
>     print $conf "max_wal_senders = 5\n";
>
> Considering that our default these days is 10 senders, and that a
> walsender slot doesn't really cost much, this seems unduly cheapskate.
> I propose raising this to 10.

I suggest to remove that line. max_wal_senders used to default to 0
when PostgresNode was touched to have this line in commit 89ac7004dad;
the global default was raised in f6d6d2920d2c.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> On 2020-Sep-29, Tom Lane wrote:
>> So I wonder why PostgresNode.pm is doing
>>     print $conf "max_wal_senders = 5\n";
>> Considering that our default these days is 10 senders, and that a
>> walsender slot doesn't really cost much, this seems unduly cheapskate.
>> I propose raising this to 10.

> I suggest to remove that line. max_wal_senders used to default to 0
> when PostgresNode was touched to have this line in commit 89ac7004dad;
> the global default was raised in f6d6d2920d2c.

Hm. We could do so back to v10 where that came in, and there are
no src/test/subscription tests before v10, so that should be
sufficient. Sold.

regards, tom lane
Michael Paquier <michael@paquier.xyz> writes:
> On Wed, Sep 30, 2020 at 10:38:59PM -0700, Noah Misch wrote:
>> In favor of minimal values, we've had semaphore-starved buildfarm members in
>> the past. Perhaps those days are over, seeing that this commit has not yet
>> broken a buildfarm member in that particular way. Keeping max_wal_senders=10
>> seems fine.

> Indeed, I am not spotting anything suspicious here.

Yeah, so far so good. Note that PostgresNode.pm does attempt to cater
for semaphore-starved machines, by cutting max_connections as much as
it can. In practice the total semaphore usage of a subscription test
is probably still less than that of one postmaster with default
max_connections.

>> No, PostgreSQL commit 54c2ecb changed that. I recommend an explicit
>> max_wal_senders=10 in PostgresNode, which makes it easy to test
>> wal_level=minimal:

> Ah, thanks, I have missed this piece. So we really need to have a
> value set in this module after all.

Agreed, I'll go put it back.

On the other point, I think that we should continue to complain
about max_wal_senders > 0 with wal_level = minimal. If we reduce
that to a LOG message, which'd be the net effect of trying to be
laxer, people wouldn't see it and would then wonder why they can't
start replication.

regards, tom lane
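The constraint under discussion here, that max_wal_senders > 0 is rejected when wal_level = minimal, can be sketched as a small check. This is a hedged Python illustration rather than the server's or PostgresNode.pm's actual code: `check_wal_settings` and `build_conf` are hypothetical helpers, the error wording is invented, and the choice of wal_level = replica for the streaming case is an assumption. Only the values max_wal_senders = 10 and wal_level = minimal come from the thread.

```python
def check_wal_settings(wal_level, max_wal_senders):
    """Return an error string if the combination is rejected, else None.

    Toy model of the rule discussed in the thread; the message text is
    made up for this sketch.
    """
    if wal_level == "minimal" and max_wal_senders > 0:
        return "max_wal_senders > 0 is not allowed with wal_level = minimal"
    return None


def build_conf(allows_streaming):
    """Hypothetical config builder that always sets both values explicitly."""
    if allows_streaming:
        # Assumption: a streaming-capable test node uses wal_level = replica.
        return {"wal_level": "replica", "max_wal_senders": 10}
    # Setting 0 explicitly keeps the node startable with wal_level = minimal,
    # whatever the server-wide default for max_wal_senders happens to be.
    return {"wal_level": "minimal", "max_wal_senders": 0}


for streaming in (True, False):
    conf = build_conf(streaming)
    print(conf, check_wal_settings(**conf))

# Relying on a nonzero default while switching to minimal trips the check.
print(check_wal_settings("minimal", 10))
```

The design point is that a module which writes an explicit value in both branches never depends on the global default, so a test that wants wal_level = minimal does not have to remember to zero out max_wal_senders itself.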
At Wed, 30 Sep 2020 22:38:59 -0700, Noah Misch <noah@leadboat.com> wrote in
noah> Perhaps wal_level=minimal should stop its pedantic call for max_wal_senders=0.
noah> As long as the relevant error messages are clear, it would be fine for
noah> wal_level=minimal to ignore max_wal_senders and size resources as though
noah> max_wal_senders=0. That could be one less snag for end users. (It's not
noah> worth changing solely to save a line in PostgresNode, though.)

At Thu, 01 Oct 2020 09:42:52 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in
tgl> On the other point, I think that we should continue to complain
tgl> about max_wal_senders > 0 with wal_level = minimal. If we reduce
tgl> that to a LOG message, which'd be the net effect of trying to be
tgl> laxer, people wouldn't see it and would then wonder why they can't
tgl> start replication.

FWIW, I'm on Noah's side. One reason for that is that if we implement
the in-place setting of relation persistence for bulk-data loading,
wal_level would get flipped between minimal and replica or logical and
then back again. The restriction on max_wal_senders is a pain in the
ass in that case.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center