Discussion: [HACKERS] Active zombies at AIX

[HACKERS] Active zombies at AIX

From
Konstantin Knizhnik
Date:
Hi hackers,

Yet another story about AIX. For some reason AIX is very slow to clean up zombie processes.
If we launch pgbench with the -C parameter, the limit on the maximum number of connections is exhausted very soon.
If the maximum number of connections is set to 1000, then after ten seconds of pgbench activity we get about 900 zombie processes, and it takes about 100 seconds (!)
before all of them are terminated.

proctree shows a lot of defunct processes:

[14:44:41]root@postgres:~ # proctree 26084446
26084446 /opt/postgresql/xlc/9.6/bin/postgres -D /postg_fs/postgresql/xlc
4784362 <defunct>
4980786 <defunct>
11403448 <defunct>
11468930 <defunct>
11993176 <defunct>
12189710 <defunct>
12517390 <defunct>
13238374 <defunct>
13565974 <defunct>
13893826 postgres: wal writer process
14024716 <defunct>
15401000 <defunct>
...
25691556 <defunct>

But ps shows that the status of the process is <exiting>:

[14:46:02]root@postgres:~ # ps -elk | grep 25691556

  • A - 25691556 - - - - - <exiting>

A breakpoint set in the reaper() function in the postmaster shows that each invocation of this function (called by the SIGCHLD handler) processes 5-10 PIDs.
So there are two hypotheses: either AIX is very slow to deliver SIGCHLD to the parent, or process exit takes too much time.

The fact that the backends are in the exiting state makes the second hypothesis more plausible.
We have tried different Postgres configurations: with local and TCP sockets, with different amounts of shared buffers, and built with both gcc and xlc.
In all cases the behavior is similar: zombies do not want to die.
Since it is not possible to attach a debugger to a defunct process, it is not clear how to find out what is going on.
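
To illustrate why a single reaper() call can handle several PIDs: standard signals such as SIGCHLD are not queued, so a handler typically loops over waitpid() with WNOHANG until no exited child remains. The following is a minimal sketch of that pattern, not the postmaster's actual reaper(); the function name reap_exited_children is hypothetical.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Collect every child that has already terminated, without blocking.
 * Returns the number of children reaped by this call.
 * Hypothetical helper for illustration; NOT the postmaster's reaper().
 */
int
reap_exited_children(void)
{
    int     reaped = 0;
    int     status;
    pid_t   pid;

    for (;;)
    {
        pid = waitpid(-1, &status, WNOHANG);
        if (pid > 0)
        {
            /* One terminated child collected; look for more. */
            reaped++;
            continue;
        }
        if (pid < 0 && errno == EINTR)
            continue;           /* interrupted by a signal, retry */

        /*
         * pid == 0: children exist, but none has finished exiting yet.
         * pid < 0 with ECHILD: there are no children at all.
         */
        break;
    }
    return reaped;
}
```

A loop like this can only collect children that waitpid() reports as terminated, so a backend that is still working through its exit in the kernel would presumably not be picked up until that exit completes.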

I wonder if somebody has encountered similar problems on AIX and can suggest a solution.
Thanks in advance
-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: [HACKERS] Active zombies at AIX

From
Tom Lane
Date:
Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes:
> But ps shows that status of process is <existing>
> [14:46:02]root@postgres:~ # ps -elk | grep 25691556
>   * A - 25691556 - - - - - <exiting>

As far as I could find by googling, this means that the process is
not actually a zombie yet, so it's not the postmaster's fault.

Apparently it's possible in some versions of AIX for an exiting process to
get stuck while releasing its reference to a socket, though I couldn't
find much detail about that.  I wonder how old your AIX is ...
        regards, tom lane



Re: [HACKERS] Active zombies at AIX

From
Konstantin Knizhnik
Date:

On 24.01.2017 18:26, Tom Lane wrote:
> Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes:
>> But ps shows that status of process is <existing>
>> [14:46:02]root@postgres:~ # ps -elk | grep 25691556
>>    * A - 25691556 - - - - - <exiting>
> As far as I could find by googling, this means that the process is
> not actually a zombie yet, so it's not the postmaster's fault.
>
> Apparently it's possible in some versions of AIX for an exiting process to
> get stuck while releasing its reference to a socket, though I couldn't
> find much detail about that.  I wonder how old your AIX is ...

It is AIX 7.1 (I believe it is the most recent version of AIX).


>
>             regards, tom lane

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] Active zombies at AIX

From
Konstantin Knizhnik
Date:
Last update on the problem.
Using the kdb tool (thanks to Tony Reix for advice and help) we got the following stack trace of a Postgres backend in the exiting state:

pvthread+073000 STACK:
[005E1958]slock+000578 (00000000005E1958, 8000000000001032 [??])
[00009558].simple_lock+000058 ()
[00651DBC]vm_relalias+00019C (??, ??, ??, ??, ??)
[006544AC]vm_map_entry_delete+00074C (??, ??, ??)
[00659C30]vm_map_delete+000150 (??, ??, ??, ??)
[00659D88]vm_map_deallocate+000048 (??, ??)
[0011C588]kexitx+001408 (??)
[000BB08C]kexit+00008C ()
___ Recovery (FFFFFFFFFFF9290) ___
WARNING: Eyecatcher/version mismatch in RWA


So there seems to be lock contention while unmapping memory segments.
My assumption was that Postgres detaches all attached segments before exit (in the shmem_exit callback or earlier).
I added logging around the proc_exit_prepare function (which is called by the atexit callback) and checked that it completes immediately.
So I thought that this vm_map_deallocate could be related to deallocation of normal (malloced) memory, because on Linux the memory allocator may use mmap.
But on AIX this is not the case.
Below is the report from Bergamini Demien (once again, many thanks for the help with investigating the problem):

The memory allocator in AIX libc does not use mmap and vm_relalias() is only called for shared memory mappings.

I talked with the AIX VMM expert at IBM and he said that what you hit is one of the most common performance bottlenecks in AIX memory management.

He also said that SysV Shared Memory (shmget/shmat) performs better on AIX than mmap.

Some improvements have been made in AIX 6.1 (see “perf suffers when procs sharing the same segs all exit at once”: http://www-01.ibm.com/support/docview.wss?uid=isg1IZ83819) but it does not help in your case.

In src/backend/port/sysv_shmem.c, it says that PostgreSQL 9.3 switched from using SysV Shared Memory to using mmap.

Maybe you could try to switch back to using SysV Shared Memory on AIX to see if it helps performance-wise.

Also, the good news is that there are some restricted tunables in AIX that can be tweaked to help different workloads which may have different demands.

One of them is relalias_percentage which works with force_relalias_lite:

# vmo -h relalias_percentage

Help for tunable relalias_percentage:

Purpose:

If force_relalias_lite is set to 0, then this specifies the factor used in the heuristic to decide whether to avoid locking the source mmapped segment or not.

Values:

        Default: 0
        Range: 0 - 32767
        Type: Dynamic
        Unit:

Tuning:

This is used when tearing down an mmapped region and is a scalability statement, where avoiding the lock may help system throughput, but, in some cases, at the cost of more compute time used. If the number of pages being unmapped is less than this value divided by 100 and multiplied by the total number of pages in memory in the source mmapped segment, then the source lock will be avoided. A value of 0 for relalias_percentage, with force_relalias_lite also set to 0, will cause the source segment lock to always be taken. Effective values for relalias_percentage will vary by workload, however, a suggested value is: 200.
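
The comparison described in the help text can be written out as plain arithmetic. The sketch below only restates the documented rule for the case force_relalias_lite = 0; it is not AIX kernel code, and the function and parameter names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Restates the rule from the vmo help text (force_relalias_lite = 0):
 * the source segment lock is avoided when
 *
 *     pages_unmapped < (relalias_percentage / 100) * pages_in_segment
 *
 * Both sides are multiplied by 100 to stay in integer arithmetic.
 * Illustration only; page counts are assumed small enough that the
 * multiplications do not overflow 64 bits.
 */
bool
source_lock_avoided(uint64_t pages_unmapped,
                    uint64_t pages_in_segment,
                    uint32_t relalias_percentage)
{
    return pages_unmapped * 100 <
        (uint64_t) relalias_percentage * pages_in_segment;
}
```

With the default of 0 the right-hand side is always 0, so the lock is always taken, which matches the help text. With the suggested value of 200, the lock is avoided whenever fewer than twice the segment's in-memory pages are being unmapped.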

 
You may also try to play with the munmap_npages vmo tunable.
Your vmo settings for lgpg_size, lgpg_regions and v_pinshm already seem correct.


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: [HACKERS] Active zombies at AIX

From
Konstantin Knizhnik
Date:
I tried rebuilding Postgres without mmap and the problem disappeared (pgbench with -C no longer exhausts the connection limit).
Unfortunately there is no convenient way to configure PostgreSQL not to use mmap.
I had to edit the sysv_shmem.c file, commenting out

#ifndef EXEC_BACKEND
#define USE_ANONYMOUS_SHMEM
#endif

I wonder why we now prohibit configuring Postgres without mmap?


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: [HACKERS] Active zombies at AIX

From
Peter Eisentraut
Date:
On 2/6/17 6:28 AM, Konstantin Knizhnik wrote:
> I wonder why do we prohibit now configuration of Postgres without mmap?

It's not really prohibited, but it's not something that people generally
need, and we want to keep the number of configuration variations low.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Active zombies at AIX

From
Andres Freund
Date:
On 2017-02-06 15:39:10 -0500, Peter Eisentraut wrote:
> On 2/6/17 6:28 AM, Konstantin Knizhnik wrote:
> > I wonder why do we prohibit now configuration of Postgres without mmap?
> 
> It's not really prohibited, but it's not something that people generally
> need, and we want to keep the number of configuration variations low.

I think that was a fairly bad call. Making it hard to use anything but
mmap (on mmap supporting platforms) caused a fair bit of trouble and
performance regressions on several platforms by now (freebsd reported it
fairly quickly, and now aix), all to avoid a trivial amount of code and
one guc.

FWIW, there's a patch somewhere in the archive making it configurable.

- Andres



Re: [HACKERS] Active zombies at AIX

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2017-02-06 15:39:10 -0500, Peter Eisentraut wrote:
>> On 2/6/17 6:28 AM, Konstantin Knizhnik wrote:
>>> I wonder why do we prohibit now configuration of Postgres without mmap?

>> It's not really prohibited, but it's not something that people generally
>> need, and we want to keep the number of configuration variations low.

> I think that was a fairly bad call. Making it hard to use anything but
> mmap (on mmap supporting platforms) caused a fair bit of trouble and
> performance regressions on several platforms by now (freebsd reported it
> fairly quickly, and now aix), all to avoid a trivial amount of code and
> one guc.

> FWIW, there's a patch somewhere in the archive making it configurable.

Clearly we should do something, but I'm not sure that a GUC is the right
answer; far too few people would set it correctly.  I think it might be
better to have the per-platform "template" files decide whether to set
USE_ANONYMOUS_SHMEM or not.
        regards, tom lane



Re: [HACKERS] Active zombies at AIX

From
Andres Freund
Date:
On 2017-02-06 16:06:25 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2017-02-06 15:39:10 -0500, Peter Eisentraut wrote:
> >> On 2/6/17 6:28 AM, Konstantin Knizhnik wrote:
> >>> I wonder why do we prohibit now configuration of Postgres without mmap?
> 
> >> It's not really prohibited, but it's not something that people generally
> >> need, and we want to keep the number of configuration variations low.
> 
> > I think that was a fairly bad call. Making it hard to use anything but
> > mmap (on mmap supporting platforms) caused a fair bit of trouble and
> > performance regressions on several platforms by now (freebsd reported it
> > fairly quickly, and now aix), all to avoid a trivial amount of code and
> > one guc.
> 
> > FWIW, there's a patch somewhere in the archive making it configurable.
> 
> Clearly we should do something, but I'm not sure that a GUC is the right
> answer; far too few people would set it correctly.  I think it might be
> better to have the per-platform "template" files decide whether to set
> USE_ANONYMOUS_SHMEM or not.

Well, sysv shmem will be less "comfortable" to use on those platforms
too. And you'll usually only hit the performance problems on bigger
installations. I don't think it'll be an improvement if after an upgrade
postgres doesn't work anymore because people have gotten used to not
having to configure sysv shmem.

I suspect a better solution would be to have a list GUC with a
platform-dependent default (i.e. sysv, anonymous on freebsd/aix; the
other way round on linux). At startup we'd then try those in order.
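
A minimal sketch of the "try those in order" idea, assuming a hypothetical table of allocation methods. The names ShmemMethod, shmem_alloc_in_order and the stub allocators are illustrative only and do not come from the PostgreSQL source or from any posted patch. The stubs stand in for the real sysv and anonymous mmap paths so the fallback logic can be exercised by itself.

```c
#include <stddef.h>
#include <stdlib.h>

typedef void *(*shmem_alloc_fn) (size_t size);

/* One entry of the ordered method list (hypothetical type). */
typedef struct
{
    const char     *name;
    shmem_alloc_fn  alloc;
} ShmemMethod;

/* Stub: simulates a method that cannot be used on this system. */
void *
stub_alloc_fail(size_t size)
{
    (void) size;
    return NULL;
}

/* Stub: simulates a working method; uses malloc, NOT real shared memory. */
void *
stub_alloc_ok(size_t size)
{
    return malloc(size);
}

/*
 * Try each method in the given order and stop at the first that succeeds.
 * Returns the name of the method that worked, or NULL if all failed;
 * *addr_out receives the allocated address (NULL on total failure).
 */
const char *
shmem_alloc_in_order(const ShmemMethod *methods, size_t nmethods,
                     size_t size, void **addr_out)
{
    size_t  i;

    *addr_out = NULL;
    for (i = 0; i < nmethods; i++)
    {
        void   *addr = methods[i].alloc(size);

        if (addr != NULL)
        {
            *addr_out = addr;
            return methods[i].name;
        }
    }
    return NULL;
}
```

With a platform default of { sysv, anonymous }, a system whose SysV limits are too low would fall through to the anonymous mapping instead of failing to start, which is the upgrade concern raised above.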

Regards,

Andres