Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane wrote:
>> Still meditating on this ... and it strikes me that the pgstat.c code
>> is really uncommunicative about problems. In particular,
>> pgstat_read_statsfile_timestamp and pgstat_read_statsfile don't complain
>> at all about being unable to read a stats file.
> Yeah, I had the same thought.
OK, I'll add some logging.
>> Lastly, backend_read_statsfile is designed to send an inquiry message
>> every time through the loop, i.e., every 10 msec. This is said to be in
>> case the stats collector drops one. But is this enough to flood the
>> collector and make things worse? I wonder if there should be some
>> backoff there.
> I also think the autovacuum worker minimum timestamp may be playing
> games with the retry logic too. Maybe a worker is requesting a new file
> continuously because pgstat is not able to provide one before the
> deadline has passed, and thus overloading it. I still think that 500ms is
> too much for a worker, but backing off all the way to 10ms seems too
> much. Maybe it should just be, say, 100ms.
But we don't advance the deadline within the wait loop, so (in theory)
a single requestor shouldn't be able to trigger more than one stats file
update. I wonder though if an autovac worker could make many such
requests over its lifespan ...
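
To make the backoff idea concrete, here is a rough standalone sketch, not
the actual pgstat.c code. It simulates a requestor that still checks for the
file every poll interval, but resends its inquiry only when a resend timer
expires, doubling that timer up to a cap. The deadline is fixed on entry and
never advanced inside the loop. The names wait_with_backoff and WaitResult,
and all of the parameters, are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counters for this simulation; not a PostgreSQL structure. */
typedef struct WaitResult
{
    bool got_file;   /* did the file show up before the deadline? */
    int  polls;      /* how many times we checked for the file */
    int  inquiries;  /* how many inquiry messages we sent */
} WaitResult;

/*
 * Simulate waiting for a stats file that becomes available at ready_at_ms.
 * We check every poll_ms, but resend the inquiry only when the resend
 * timer expires; the resend interval doubles each time, capped at
 * max_resend_ms.  The deadline (max_wait_ms) is never advanced.
 */
WaitResult
wait_with_backoff(int ready_at_ms, int poll_ms, int max_wait_ms,
                  int first_resend_ms, int max_resend_ms)
{
    WaitResult r = {false, 0, 0};
    int resend_interval = first_resend_ms;
    int next_inquiry_at = 0;

    assert(poll_ms > 0 && first_resend_ms > 0);
    assert(max_resend_ms >= first_resend_ms);

    for (int now = 0; now < max_wait_ms; now += poll_ms)
    {
        r.polls++;

        if (now >= ready_at_ms)
        {
            r.got_file = true;      /* fresh file is there, stop waiting */
            break;
        }

        if (now >= next_inquiry_at)
        {
            r.inquiries++;          /* (re)send the inquiry message */
            next_inquiry_at = now + resend_interval;

            resend_interval *= 2;   /* back off, up to the cap */
            if (resend_interval > max_resend_ms)
                resend_interval = max_resend_ms;
        }
    }

    return r;
}
```

With a 10 ms poll, an arbitrary 500 ms wait, and resends backing off from
10 ms to a 160 ms cap, a requestor that never gets its file sends 7
inquiries instead of the 50 it would send by asking on every pass. The
numbers are only there to show the shape of the tradeoff.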
regards, tom lane