Thread: fails to start - pg_stat_tmp ?

fails to start - pg_stat_tmp ?

From
lejeczek
Date:
Hi guys.

I have a simple cluster managed by what I believe is a popular toolset, and one standby/slave node, after it was shut down and rebooted, is unable to start, neither via the cluster manager nor manually.
I get:
...
2023-11-10 10:18:45.491 UTC [54858] LOG:  starting PostgreSQL 14.9 (Ubuntu 14.9-0ubuntu0.22.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
2023-11-10 10:18:45.491 UTC [54858] LOG:  listening on IPv4 address "0.0.0.0", port 5433
2023-11-10 10:18:45.491 UTC [54858] LOG:  listening on IPv6 address "::", port 5433
2023-11-10 10:18:45.495 UTC [54858] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5433"
2023-11-10 10:18:45.505 UTC [54860] LOG:  database system was interrupted while in recovery at log time 2023-11-10 09:53:18 UTC
2023-11-10 10:18:45.505 UTC [54860] HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
2023-11-10 10:18:45.540 UTC [54862] drwszelakiadmmdisc@drwszelaki_discourse FATAL:  the database system is starting up
2023-11-10 10:18:45.545 UTC [54863] drwszelakiadmmdisc@drwszelaki_discourse FATAL:  the database system is starting up
2023-11-10 10:18:45.795 UTC [54860] LOG:  entering standby mode
2023-11-10 10:18:45.799 UTC [54860] FATAL:  could not open directory "/var/run/postgresql/14-paf.pg_stat_tmp": No such file or directory
2023-11-10 10:18:45.800 UTC [54858] LOG:  startup process (PID 54860) exited with exit code 1
2023-11-10 10:18:45.800 UTC [54858] LOG:  aborting startup due to startup process failure
2023-11-10 10:18:45.802 UTC [54858] LOG:  database system is shut down


Indeed, that directory does not exist on the node. From what I can see, among the nodes comprising the cluster this path exists only on the _master_ node.

What would this specific error suggest or mean? Do I have something misconfigured, or perhaps the toolset itself does not deal well, or at all, with a "mere" reboot?
The toolset is the HA/pcs (Pacemaker) cluster tools.

many thanks, L.

Re: fails to start - pg_stat_tmp ?

From
Laurenz Albe
Date:
On Fri, 2023-11-10 at 11:33 +0100, lejeczek wrote:
> I have simple cluster managed by a popular toolset, I believe, and I get a standby/slave node which
> after it was shutdown, rebooted, is unable to start, not by the that cluster-manager nor manually.

Thanks for telling us all the detail...

>  FATAL:  could not open directory "/var/run/postgresql/14-paf.pg_stat_tmp": No such file or directory
>  LOG:  startup process (PID 54860) exited with exit code 1
>
> Indeed that dir, that path does not exist on the node - from what I can see, among the nodes
> comprising the cluster this path exists only on the _master_ node.

Looks like somebody randomly deleted directories on that machine.

Create it again, and make sure it belongs to the postgres user and
nobody else has write access.
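
A minimal sketch of that step, assuming the directory should be mode 0700 and that the owning account and group are both named `postgres` (the usual packaging default, but not confirmed in this thread). The function name `ensure_stats_dir` is hypothetical; the path is the one from the FATAL message.

```python
import os
import shutil
from typing import Optional


def ensure_stats_dir(path: str, owner: Optional[str] = None, mode: int = 0o700) -> None:
    """Create `path` if it is missing and restrict access to its owner."""
    # exist_ok makes the call safe to repeat, e.g. on every boot.
    os.makedirs(path, exist_ok=True)
    # makedirs honours the umask, so set the final mode explicitly.
    os.chmod(path, mode)
    if owner is not None:
        # Changing ownership normally requires root.
        # Assumption: the group has the same name as the user.
        shutil.chown(path, user=owner, group=owner)


# Example call (run as root on the affected node):
# ensure_stats_dir("/var/run/postgresql/14-paf.pg_stat_tmp", owner="postgres")
```

Mode 0700 is stricter than "nobody else has write access" requires; it is chosen here only because it is the simplest setting that satisfies it.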

Yours,
Laurenz Albe



Re: fails to start - pg_stat_tmp ?

From
Scott Ribe
Date:
/var/run is intended for volatile data; depending on files there being permanent is dangerous.

There could easily be some "cleanup" utility removing whatever its idea of "old" files is.
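
One way to check how volatile the location is: on systemd-based distributions /var/run is normally a symlink to /run, which is a tmpfs and therefore starts empty on every boot. The sketch below is a rough illustration, not a definitive tool; it looks up the filesystem type for a path from text in the format of /proc/mounts. The helper name `filesystem_type` is hypothetical, and mount points containing escaped spaces are not handled.

```python
from typing import Optional


def filesystem_type(path: str, mounts_text: str) -> Optional[str]:
    """Return the filesystem type of the longest mount point that contains `path`."""
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip malformed or empty lines
        mount_point, fs_type = fields[1], fields[2]
        # Match on whole path components so "/run" does not match "/runner".
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if len(mount_point) > len(best_mount):
                best_mount, best_type = mount_point, fs_type
    return best_type


# Example use on a Linux host (resolve the /var/run symlink first):
# import os
# real = os.path.realpath("/var/run/postgresql")
# with open("/proc/mounts") as f:
#     print(filesystem_type(real, f.read()))
```

If the result is `tmpfs`, nothing needs to have "cleaned up" the directory: it simply has to be recreated by something at each boot.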




Re: fails to start - pg_stat_tmp ?

From
Ron
Date:
On 11/10/23 06:45, Scott Ribe wrote:
> /var/run is intended for volatile data--depending on files there being permanent is dangerous
>
> There could easily be some "cleanup" utility removing whatever its idea of "old" files is

And I wonder what other critical files were "cleaned up"...

-- 
Born in Arizona, moved to Babylonia.



Re: fails to start - pg_stat_tmp ?

From
lejeczek
Date:

On 10/11/2023 13:20, Laurenz Albe wrote:
> On Fri, 2023-11-10 at 11:33 +0100, lejeczek wrote:
>> I have simple cluster managed by a popular toolset, I believe, and I get a standby/slave node which
>> after it was shutdown, rebooted, is unable to start, not by the that cluster-manager nor manually.
> Thanks for telling us all the detail...
>
>>   FATAL:  could not open directory "/var/run/postgresql/14-paf.pg_stat_tmp": No such file or directory
>>   LOG:  startup process (PID 54860) exited with exit code 1
>>
>> Indeed that dir, that path does not exist on the node - from what I can see, among the nodes
>> comprising the cluster this path exists only on the _master_ node.
> Looks like somebody randomly deleted directories on that machine.
>
> Create it again, and make sure it belongs to the postgres user and
> nobody else has write access.
>
> Yours,
> Laurenz Albe
>
>
Indeed, something removes the path at or after reboot (its last component, the stats directory). I started fiddling with _systemd_ tmpfiles, which I thought was the culprit, but it seems systemd neither creates nor cleans it.
I also have bucardo; cleaning it up did not help, nor did removing bucardo completely.
pg_ctlcluster start does work: with this wrapper the path gets created and the server starts.
The OS journal boot logs do not mention this path either.
It's a true sticky wicket for me right now. To add, the other two virtually identical nodes seem free from this issue.