Обсуждение: Interpreting autovacuum logs (9.6)

Поиск
Список
Период
Сортировка

Interpreting autovacuum logs (9.6)

От
Gabe Kopley
Дата:
Hi all,

Please see this graph of data points extracted from autovacuum logs: https://imgur.com/a/OCRKoDn . It's from a 9.6 instance with default params for autovacuum with the exceptions of autovacuum_work_mem=10000 and log_autovacuum_min_duration=100.

1. How should we interpret the # tuples remain reported by the autovac logs? The source code says it's the "estimated total # of tuples" which to me means # dead + # live. But that is invalidated by the pattern here where the orange points (# tuples remain) are dramatically higher than # dead not removable (blue points) + # dead removed (green points) + # live (which never exceeded 1M during this entire interval, per count query).

2. Beginning around the first 6/19 tick, what could be causing # tuples remain to drop steeply after periods of growth when # tuples removed is 0? I confirmed there was no truncation. And what could the # tuples remain recurrent asymptote at ~22M mean?

(further context for those curious: the discontinuity at 6/23 is due to an individual autovacuum run getting stuck. After manually killing that run, the next one succeeded and you see that reporting toward the right of the graph)

Thanks!

Gabe


Re: Interpreting autovacuum logs (9.6)

От
Adrian Klaver
Дата:
On 6/26/20 11:47 AM, Gabe Kopley wrote:
> Hi all,
> 
> Please see this graph of data points extracted from autovacuum logs: 
> https://imgur.com/a/OCRKoDn . It's from a 9.6 instance with default 
> params for autovacuum with the exceptions of autovacuum_work_mem=10000 
> and log_autovacuum_min_duration=100.
> 
> 1. How should we interpret the # tuples remain reported by the autovac 
> logs? The source code says it's the "estimated total # of tuples" which 
> to me means # dead + # live. But that is invalidated by the pattern here 
> where the orange points (# tuples remain) are dramatically higher than # 
> dead not removable (blue points) + # dead removed (green points) + # 
> live (which never exceeded 1M during this entire interval, per count query).
> 
> 2. Beginning around the first 6/19 tick, what could be causing # tuples 
> remain to drop steeply after periods of growth when # tuples removed is 
> 0? I confirmed there was no truncation. And what could the # tuples 
> remain recurrent asymptote at ~22M mean?

AFAIK there is not dedicated autovac log, so something is pulling this 
out of the Postgres log correct?

What is the program that is doing that and what is the raw output?

> 
> (further context for those curious: the discontinuity at 6/23 is due to 
> an individual autovacuum run getting stuck. After manually killing that 
> run, the next one succeeded and you see that reporting toward the right 
> of the graph)
> 
> Thanks!
> 
> Gabe
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Interpreting autovacuum logs (9.6)

От
Gabe Kopley
Дата:
That's right Adrian, no dedicated log, but rather the messages to the Postgres log emitted I believe here: https://github.com/postgres/postgres/blob/REL9_6_18/src/backend/commands/vacuumlazy.c#L382 . I'm just using a regex to pull out those [removed, remain, dead not removable] values and plot them.

On Fri, Jun 26, 2020 at 12:02 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 6/26/20 11:47 AM, Gabe Kopley wrote:
> Hi all,
>
> Please see this graph of data points extracted from autovacuum logs:
> https://imgur.com/a/OCRKoDn . It's from a 9.6 instance with default
> params for autovacuum with the exceptions of autovacuum_work_mem=10000
> and log_autovacuum_min_duration=100.
>
> 1. How should we interpret the # tuples remain reported by the autovac
> logs? The source code says it's the "estimated total # of tuples" which
> to me means # dead + # live. But that is invalidated by the pattern here
> where the orange points (# tuples remain) are dramatically higher than #
> dead not removable (blue points) + # dead removed (green points) + #
> live (which never exceeded 1M during this entire interval, per count query).
>
> 2. Beginning around the first 6/19 tick, what could be causing # tuples
> remain to drop steeply after periods of growth when # tuples removed is
> 0? I confirmed there was no truncation. And what could the # tuples
> remain recurrent asymptote at ~22M mean?

AFAIK there is not dedicated autovac log, so something is pulling this
out of the Postgres log correct?

What is the program that is doing that and what is the raw output?

>
> (further context for those curious: the discontinuity at 6/23 is due to
> an individual autovacuum run getting stuck. After manually killing that
> run, the next one succeeded and you see that reporting toward the right
> of the graph)
>
> Thanks!
>
> Gabe
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com

Re: Interpreting autovacuum logs (9.6)

От
Adrian Klaver
Дата:
On 6/26/20 12:13 PM, Gabe Kopley wrote:
> That's right Adrian, no dedicated log, but rather the messages to the 
> Postgres log emitted I believe here: 
> https://github.com/postgres/postgres/blob/REL9_6_18/src/backend/commands/vacuumlazy.c#L382 
> . I'm just using a regex to pull out those [removed, remain, dead not 
> removable] values and plot them.

Further on in the source there is:

_("tuples: %.0f removed, %.0f remain, %.0f are dead but not yet 
removable, oldest xmin: %u\n"),
 
vacrelstats->tuples_deleted,
 
vacrelstats->new_rel_tuples,
 
vacrelstats->new_dead_tuples,
                                                          OldestXmin);

To me that tracks with your graph. The orange line is both the new live 
and dead tuples and the difference to the blue line is the number of new 
live tuples relative to the number of new dead tuples. The flat line for 
tuples removed seems to track the fact that the other tuples lines 
increase because the VACUUM process was not working. The two data points 
at the far right, when the auto VACUUM kicked in, back that up.

> 
> On Fri, Jun 26, 2020 at 12:02 PM Adrian Klaver 
> <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote:
> 
>     On 6/26/20 11:47 AM, Gabe Kopley wrote:
>      > Hi all,
>      >
>      > Please see this graph of data points extracted from autovacuum logs:
>      > https://imgur.com/a/OCRKoDn . It's from a 9.6 instance with default
>      > params for autovacuum with the exceptions of
>     autovacuum_work_mem=10000
>      > and log_autovacuum_min_duration=100.
>      >
>      > 1. How should we interpret the # tuples remain reported by the
>     autovac
>      > logs? The source code says it's the "estimated total # of tuples"
>     which
>      > to me means # dead + # live. But that is invalidated by the
>     pattern here
>      > where the orange points (# tuples remain) are dramatically higher
>     than #
>      > dead not removable (blue points) + # dead removed (green points) + #
>      > live (which never exceeded 1M during this entire interval, per
>     count query).
>      >
>      > 2. Beginning around the first 6/19 tick, what could be causing #
>     tuples
>      > remain to drop steeply after periods of growth when # tuples
>     removed is
>      > 0? I confirmed there was no truncation. And what could the # tuples
>      > remain recurrent asymptote at ~22M mean?
> 
>     AFAIK there is not dedicated autovac log, so something is pulling this
>     out of the Postgres log correct?
> 
>     What is the program that is doing that and what is the raw output?
> 
>      >
>      > (further context for those curious: the discontinuity at 6/23 is
>     due to
>      > an individual autovacuum run getting stuck. After manually
>     killing that
>      > run, the next one succeeded and you see that reporting toward the
>     right
>      > of the graph)
>      >
>      > Thanks!
>      >
>      > Gabe
>      >
>      >
> 
> 
>     -- 
>     Adrian Klaver
>     adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com