Обсуждение: Issue enabling track_counts to launch autovacuum in 9.4.5

Поиск
Список
Период
Сортировка

Issue enabling track_counts to launch autovacuum in 9.4.5

От
Derek Elder
Дата:
Good day,

(I apologize if this isn't the right place for this, I haven't used the mailing list before and I'm not a Postgres expert.)

We've run into an issue where autovacuum is not running on one of our servers using 9.4.5.

We discovered that track_counts appears to be off:

2016-03-02 14:58:09 EST [14366][2-1] DEBUG: loaded library "repmgr_funcs"
2016-03-02 14:58:09 EST [14366][3-1] DEBUG: SlruScanDirectory invoking callback on pg_notify/0000
2016-03-02 14:58:09 EST [14366][4-1] DEBUG: removing file "pg_notify/0000"
2016-03-02 14:58:09 EST [14366][5-1] DEBUG: dynamic shared memory system will support 690 segments
2016-03-02 14:58:09 EST [14366][6-1] DEBUG: created dynamic shared memory control segment 1804289383 (5532 bytes)
2016-03-02 14:58:09 EST [14366][7-1] DEBUG: max_safe_fds = 984, usable_fds = 1000, already_open = 6
2016-03-02 14:58:09 EST [14366][8-1] LOG: could not resolve "localhost": Name or service not known
2016-03-02 14:58:09 EST [14366][9-1] LOG: disabling statistics collector for lack of working socket
2016-03-02 14:58:09 EST [14366][10-1] WARNING: autovacuum not started because of misconfiguration
2016-03-02 14:58:09 EST [14366][11-1] HINT: Enable the "track_counts" option.
2016-03-02 14:58:09 EST [14366][12-1] LOG: database system is ready to accept connections
2016-03-02 14:58:10 EST [14366][13-1] DEBUG: forked new backend, pid=14377 socket=8
2016-03-02 14:58:10 EST [14366][14-1] DEBUG: server process (PID 14377) exited with exit code 0
2016-03-02 14:58:10 EST [14366][15-1] DEBUG: forked new backend, pid=14379 socket=8
2016-03-02 14:58:10 EST [14366][16-1] DEBUG: server process (PID 14379) exited with exit code 0
2016-03-02 15:00:01 EST [14366][17-1] DEBUG: forked new backend, pid=14425 socket=8
2016-03-02 15:00:01 EST [14366][18-1] DEBUG: server process (PID 14425) exited with exit code 0
2016-03-02 15:03:26 EST [14366][19-1] DEBUG: postmaster received signal 2
2016-03-02 15:03:26 EST [14366][20-1] LOG: received fast shutdown request
2016-03-02 15:03:26 EST [14366][21-1] LOG: aborting any active transactions
2016-03-02 15:03:26 EST [14366][22-1] DEBUG: cleaning up dynamic shared memory control segment with ID 1804289383


show autovacuum;
autovacuum 
------------
on

show track_counts;
track_counts 
--------------
off



From what I had read, this setting should be on by default. When I checked our other servers I see that track_counts is on and the autovacuum process is working correctly on them. Indeed we don't even have the setting explicitly listed in our postgresql.conf on these servers.

We first tried a simple restart, but that didn't work.

Next I attempted to add track_counts = on to the postgresql.conf and reload it, but the setting doesn't seem to take effect.

I even tried "SET track_counts = on;", but that only works in the current session and doesn't persist.

Is there any other way of turning this setting on or is there something somewhere that could be preventing track_counts from being enabled correctly?

Thank you very much,

Derek

This message, and any documents attached hereto, may contain confidential or proprietary information intended only for the use of the addressee(s) named above or may contain information that is legally  privileged. If you are not the intended addressee, or the person responsible for delivering it to the intended addressee, you are hereby notified that reading, disseminating, distributing or copying this message is strictly prohibited. If you have received this message by mistake, please immediately notify us by replying to the message and delete the original message and any copies immediately thereafter.  Thank you for your cooperation.

Re: Issue enabling track_counts to launch autovacuum in 9.4.5

От
Alvaro Herrera
Дата:
Derek Elder wrote:

> From what I had read, this setting should be on by default. When I checked
> our other servers I see that track_counts is on and the autovacuum process
> is working correctly on them. Indeed we don't even have the setting
> explicitly listed in our postgresql.conf on these servers.
>
> We first tried a simple restart, but that didn't work.

Try
  ALTER SYSTEM RESET track_counts;

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Issue enabling track_counts to launch autovacuum in 9.4.5

От
"David G. Johnston"
Дата:

On Wed, Mar 2, 2016 at 2:29 PM, Derek Elder <dereke@mirthcorp.com> wrote:

2016-03-02 14:58:09 EST [14366][8-1] LOG: could not resolve "localhost": Name or service not known
2016-03-02 14:58:09 EST [14366][9-1] LOG: disabling statistics collector for lack of working socket

I'm reasonably certain the above is the root of the problem.
 

2016-03-02 14:58:09 EST [14366][10-1] WARNING: autovacuum not started because of misconfiguration
2016-03-02 14:58:09 EST [14366][11-1] HINT: Enable the "track_counts" option.

​This is simply a symptom.  The hint is misleading in this situation.

Someone more informed can hopefully offer more guidance fixing your localhost ​resolution problem.

David J.

Re: Issue enabling track_counts to launch autovacuum in 9.4.5

От
Derek Elder
Дата:
That was indeed the root cause. The /etc/hosts file on the server had incorrect permissions which caused localhost to not resolve.

Going to file this away in the knowledge base. Thank you so much for the help David!

Derek

On Wed, Mar 2, 2016 at 1:37 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:

On Wed, Mar 2, 2016 at 2:29 PM, Derek Elder <dereke@mirthcorp.com> wrote:

2016-03-02 14:58:09 EST [14366][8-1] LOG: could not resolve "localhost": Name or service not known
2016-03-02 14:58:09 EST [14366][9-1] LOG: disabling statistics collector for lack of working socket

I'm reasonably certain the above is the root of the problem.
 

2016-03-02 14:58:09 EST [14366][10-1] WARNING: autovacuum not started because of misconfiguration
2016-03-02 14:58:09 EST [14366][11-1] HINT: Enable the "track_counts" option.

​This is simply a symptom.  The hint is misleading in this situation.

Someone more informed can hopefully offer more guidance fixing your localhost ​resolution problem.

David J.



This message, and any documents attached hereto, may contain confidential or proprietary information intended only for the use of the addressee(s) named above or may contain information that is legally  privileged. If you are not the intended addressee, or the person responsible for delivering it to the intended addressee, you are hereby notified that reading, disseminating, distributing or copying this message is strictly prohibited. If you have received this message by mistake, please immediately notify us by replying to the message and delete the original message and any copies immediately thereafter.  Thank you for your cooperation.

Re: Issue enabling track_counts to launch autovacuum in 9.4.5

От
Derek Elder
Дата:
The root cause ended up being an /etc/hosts file with incorrect permissions, but I'll file this command away in the knowledge base.

Thanks for the assist Alvaro!

Derek

On Wed, Mar 2, 2016 at 1:36 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Derek Elder wrote:

> From what I had read, this setting should be on by default. When I checked
> our other servers I see that track_counts is on and the autovacuum process
> is working correctly on them. Indeed we don't even have the setting
> explicitly listed in our postgresql.conf on these servers.
>
> We first tried a simple restart, but that didn't work.

Try
  ALTER SYSTEM RESET track_counts;

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


This message, and any documents attached hereto, may contain confidential or proprietary information intended only for the use of the addressee(s) named above or may contain information that is legally  privileged. If you are not the intended addressee, or the person responsible for delivering it to the intended addressee, you are hereby notified that reading, disseminating, distributing or copying this message is strictly prohibited. If you have received this message by mistake, please immediately notify us by replying to the message and delete the original message and any copies immediately thereafter.  Thank you for your cooperation.

Re: Issue enabling track_counts to launch autovacuum in 9.4.5

От
Tom Lane
Дата:
Derek Elder <dereke@mirthcorp.com> writes:
> That was indeed the root cause. The /etc/hosts file on the server had
> incorrect permissions which caused localhost to not resolve.

It strikes me that this should not have been so hard to solve.  The
stats collector was trying to tell you what was wrong, but evidently
you could not interpret those messages correctly.  I am thinking that
we need to do some work on the message wording; or maybe there is one
more message that needs to be emitted so you can follow the causal
chain?

In particular, perhaps it wasn't immediately obvious that the first
of these messages was the cause of the second:

> 2016-03-02 14:58:09 EST [14366]: [8-1] LOG: could not resolve "localhost": Name or service not known
> 2016-03-02 14:58:09 EST [14366]: [9-1] LOG: disabling statistics collector for lack of working socket

in which case maybe we could rephrase the first message along the
lines of "could not resolve "localhost" to establish statistics
collector socket: <strerror detail here>".  (There are a few other
messages in the same area that would need to be changed similarly.)

Or maybe the problem was that when we forced track_counts off because of
no stats collector, we didn't emit any bleat noting that, which if we had
might have led you to realize that the above messages were the direct
cause of the next one:

> 2016-03-02 14:58:09 EST [14366]: [10-1] WARNING: autovacuum not started because of misconfiguration
> 2016-03-02 14:58:09 EST [14366]: [11-1] HINT: Enable the "track_counts" option.

Or both changes, or something else entirely?

I'd be interested to hear how you perceived these log messages and
what you think might help the next person.

            regards, tom lane


Re: Issue enabling track_counts to launch autovacuum in 9.4.5

От
"David G. Johnston"
Дата:
On Wed, Mar 2, 2016 at 3:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Derek Elder <dereke@mirthcorp.com> writes:
> That was indeed the root cause. The /etc/hosts file on the server had
> incorrect permissions which caused localhost to not resolve.

It strikes me that this should not have been so hard to solve.  The
stats collector was trying to tell you what was wrong, but evidently
you could not interpret those messages correctly.  I am thinking that
we need to do some work on the message wording; or maybe there is one
more message that needs to be emitted so you can follow the causal
chain?

In particular, perhaps it wasn't immediately obvious that the first
of these messages was the cause of the second:

> 2016-03-02 14:58:09 EST [14366]: [8-1] LOG: could not resolve "localhost": Name or service not known
> 2016-03-02 14:58:09 EST [14366]: [9-1] LOG: disabling statistics collector for lack of working socket

in which case maybe we could rephrase the first message along the
lines of "could not resolve "localhost" to establish statistics
collector socket: <strerror detail here>".  (There are a few other
messages in the same area that would need to be changed similarly.)

Or maybe the problem was that when we forced track_counts off because of
no stats collector, we didn't emit any bleat noting that, which if we had
might have led you to realize that the above messages were the direct
cause of the next one:

> 2016-03-02 14:58:09 EST [14366]: [10-1] WARNING: autovacuum not started because of misconfiguration
> 2016-03-02 14:58:09 EST [14366]: [11-1] HINT: Enable the "track_counts" option.

Or both changes, or something else entirely?

I'd be interested to hear how you perceived these log messages and
what you think might help the next person.

​The fact that the first two are only LOG level and not WARNING would seems like the easiest improvement to make.  I had the benefit of basically knowing track_counts was a red-herring given the provided context so I went and started looking at anything preceding the first warning that could give me a hint as to the nature of the "misconfiguration".

It probably would help to specify, if known, whether the suspected mis-configuration is external or internal to PostgreSQL - i.e., do I need to fix postgres.conf or is something external (like the hosts file) to blame.  In this case since we don't control "localhost" it would be "external misconfiguration".

This also doesn't help:
show autovacuum;
autovacuum 
------------
on

Why do we indirectly disable autovacuum via disabling one of its required parameters instead of just disabling the main property.  I don't suppose we can add a third option (on, off, broken) to this which would allow distinguishing between a user-specified condition (off) and a system imposed one (broken).

This is getting a bit deep for a rare problem like this - I think that making ​the root messages WARNING (or ERROR) instead of info (and ideally linking the two explicitly if possible) would have the desired effect of pointing the user to the first thing they need to fix - and assume they would ignore all subsequent messages (and hints) until the first one is handled (i.e. use good trouble-shooting practices).  The hint and the change to track_counts then becomes a non-issue.


David J.

Re: Issue enabling track_counts to launch autovacuum in 9.4.5

От
"Joshua D. Drake"
Дата:
On 03/02/2016 02:49 PM, Tom Lane wrote:

> Or maybe the problem was that when we forced track_counts off because of
> no stats collector, we didn't emit any bleat noting that, which if we had
> might have led you to realize that the above messages were the direct
> cause of the next one:
>
>> 2016-03-02 14:58:09 EST [14366]: [10-1] WARNING: autovacuum not started because of misconfiguration
>> 2016-03-02 14:58:09 EST [14366]: [11-1] HINT: Enable the "track_counts" option.
>
> Or both changes, or something else entirely?
>
> I'd be interested to hear how you perceived these log messages and
> what you think might help the next person.

I can tell you that as I read them, they meant nothing and I know what
they mean. This is why:

2016-03-02 14:58:09 EST [14366]: [8-1] LOG: could not resolve
"localhost": Name or service not known
2016-03-02 14:58:09 EST [14366]: [9-1] LOG: disabling statistics
collector for lack of working socket

Why would I care about a "LOG" message. Generally speaking they don't
tell us anything useful. If, it was:

ERROR: could not resolve "localhost" ...
ERROR: Disabling statistics collector ....

It would mean a whole lot more. The second set makes sense to me but
what isn't clear is that the first one is really an error that can cause
the warnings of the second.

I do think that it should be an error if we are disabling statistics due
to something being wrong. We can't really operate properly without them.

Sincerely,

JD

--
Command Prompt, Inc.                  http://the.postgres.company/
                         +1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.


Re: Issue enabling track_counts to launch autovacuum in 9.4.5

От
Tom Lane
Дата:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> ​The fact that the first two are only LOG level and not WARNING would seems
> like the easiest improvement to make.

Unfortunately, that would be a disimprovement, because in many common
configurations WARNING messages don't appear in the postmaster log at all.
In fact, I'd say it's a bug that the "autovacuum not started" message is
emitted as a WARNING not LOG.

Maybe we need some way of printing something at LOG priority but having
the printed text be WARNING.  I'm afraid that might cause about as much
confusion as it would solve, though.

> It probably would help to specify, if known, whether the suspected
> mis-configuration is external or internal to PostgreSQL - i.e., do I need
> to fix postgres.conf or is something external (like the hosts file) to
> blame.

In the case of a name resolution failure, the problem is certainly
external to Postgres, but we don't have enough information to say more
than that.  We could print a hint guessing at causes (like broken
/etc/host or /etc/resolv.conf files), but it would be guesses --- and
I'm afraid there's enough cross-system variation in the way this stuff is
configured that any hint would be likely to just be misleading.

> This is getting a bit deep for a rare problem like this - I think that
> making ​the root messages WARNING (or ERROR)

ERROR would mean that the postmaster fails to start at all.  That doesn't
seem like an improvement either.

            regards, tom lane


Re: Issue enabling track_counts to launch autovacuum in 9.4.5

От
"David G. Johnston"
Дата:
On Wed, Mar 2, 2016 at 4:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> ​The fact that the first two are only LOG level and not WARNING would seems
> like the easiest improvement to make.

Unfortunately, that would be a disimprovement, because in many common
configurations WARNING messages don't appear in the postmaster log at all.
In fact, I'd say it's a bug that the "autovacuum not started" message is
emitted as a WARNING not LOG.

Maybe we need some way of printing something at LOG priority but having
the printed text be WARNING.  I'm afraid that might cause about as much
confusion as it would solve, though.

​Yes, I recall this unusual situation now...​

In this specific case, when we know that "WARNING: autovacuum not started because of misconfiguration" was emitted, if the previous two messages were also at WARNING would they have been emitted as well?


> It probably would help to specify, if known, whether the suspected
> mis-configuration is external or internal to PostgreSQL - i.e., do I need
> to fix postgres.conf or is something external (like the hosts file) to
> blame.

In the case of a name resolution failure, the problem is certainly
external to Postgres, but we don't have enough information to say more
than that.  We could print a hint guessing at causes (like broken
/etc/host or /etc/resolv.conf files), but it would be guesses --- and
I'm afraid there's enough cross-system variation in the way this stuff is
configured that any hint would be likely to just be misleading.

​I was trying to restrict it to simply internal/external though - I wouldn't care where the resolution comes other than we known that nothing in PostgreSQL is involved as a server, only as a client.​  So saying "its not our fault" seems appropriate and sufficient so the user doesn't spend time with ALTER SYSTEM or editing the configuration file.


> This is getting a bit deep for a rare problem like this - I think that
> making ​the root messages WARNING (or ERROR)

ERROR would mean that the postmaster fails to start at all.  That doesn't
seem like an improvement either.


Its only a problem if the postmaster starts and we emit error...​do we have FATAL that could imply the postmaster doesn't start and use ERROR if one of the optional related processes (statistics, auto-vacuum) doesn't start?

​David J.