Thread: Postgres service stops when I kill client backend on Windows

Postgres service stops when I kill client backend on Windows

From

Dmitry Vasilyev

Date:

October 9, 2015, 12:52:48

I started a PostgreSQL server on Windows, and when I killed a client
backend’s process with taskkill, the service stopped:

postgres=# select pg_backend_pid();
 pg_backend_pid
----------------
           1976

postgres=# \! taskkill /pid 1976 /f
SUCCESS: The process with PID 1976 has been terminated.
postgres=# select 1;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>


If I kill a backend’s process on Linux, the service does not fail. So
what’s the problem? Why is PostgreSQL so strange on Windows?


------
Dmitry Vasilyev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Postgres service stops when I kill client backend on Windows

From

"Charles Clavadetscher"

Date:

October 9, 2015, 13:14:08

Hello Dmitry

> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Dmitry Vasilyev
> Sent: Friday, October 9, 2015 11:52
> To: pgsql-hackers@postgresql.org
> Subject: [HACKERS] Postgres service stops when I kill client backend on Windows
>
> I started a PostgreSQL server on Windows, and when I killed a client
> backend’s process with taskkill, the service stopped:
>
> postgres=# select pg_backend_pid();
>  pg_backend_pid
> ----------------
>            1976
>
> postgres=# \! taskkill /pid 1976 /f
> SUCCESS: The process with PID 1976 has been terminated.
> postgres=# select 1;
> server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> !>
>
>
> If I kill a backend’s process on Linux, the service does not fail. So
> what’s the problem? Why is PostgreSQL so strange on Windows?

I can't say what happens on Windows, but I don't understand why you want to kill the session you are in.
Besides that, why don't you use pg_terminate_backend?

db=> select pg_backend_pid();
 pg_backend_pid
----------------
           8808
(1 row)

db=> select pg_terminate_backend(8808);
FATAL:  terminating connection due to administrator command
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
db=> select pg_backend_pid();
 pg_backend_pid
----------------
           8500
(1 row)

Regards
Charles


Re: Postgres service stops when I kill client backend on Windows

From

Dmitry Vasilyev

Date:

October 9, 2015, 13:26:01

This code stopped the server too:

postgres=# do $$ unpack p,1x8 $$ language plperlu;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>





Re: Postgres service stops when I kill client backend on Windows

From

Robert Haas

Date:

October 10, 2015, 02:43:17

On Fri, Oct 9, 2015 at 5:52 AM, Dmitry Vasilyev
<d.vasilyev@postgrespro.ru> wrote:
> I started a PostgreSQL server on Windows, and when I killed a client
> backend’s process with taskkill, the service stopped:
>
> postgres=# select pg_backend_pid();
>  pg_backend_pid
> ----------------
>            1976
>
> postgres=# \! taskkill /pid 1976 /f
> SUCCESS: The process with PID 1976 has been terminated.
> postgres=# select 1;
> server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> !>
>
>
> If I kill a backend’s process on Linux, the service does not fail. So
> what’s the problem? Why is PostgreSQL so strange on Windows?

Hmm.  I'd expect that to cause a crash-and-restart cycle, just like a
SIGQUIT would cause a crash-and-restart cycle on Linux.  But I would
expect the server to end up running again at the end, not stopped.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Postgres service stops when I kill client backend on Windows

From

Tom Lane

Date:

October 10, 2015, 18:24:00

Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Oct 9, 2015 at 5:52 AM, Dmitry Vasilyev
>> postgres=# select 1;
>> server closed the connection unexpectedly
>> This probably means the server terminated abnormally
>> before or while processing the request.
>> The connection to the server was lost. Attempting reset: Failed.

> Hmm.  I'd expect that to cause a crash-and-restart cycle, just like a
> SIGQUIT would cause a crash-and-restart cycle on Linux.  But I would
> expect the server to end up running again at the end, not stopped.

It *is* a crash and restart cycle, or at least no evidence to the
contrary has been provided.

Whether psql's attempt to do an immediate reconnect succeeds or not is
very strongly timing-dependent, on both Linux and Windows.  It's easy
for it to attempt the reconnection before crash recovery is complete,
and then you get the above symptom.  Personally I get a "Failed" result
more often than not, regardless of platform.
        regards, tom lane
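
To tell a crash-and-restart cycle apart from a real stop, one can keep retrying for longer than psql's single immediate reset attempt. Below is a minimal Python sketch of that idea. The function name `wait_until_up`, the `connect` callable, and the timing parameters are illustrative assumptions, not part of psql or libpq.

```python
import time


def wait_until_up(connect, attempts=10, delay=1.0, sleep=time.sleep):
    """Call connect() until it succeeds or the attempts run out.

    connect is any zero-argument callable that raises on failure
    (hypothetical here; in practice it would wrap a driver's connect call).
    Returns the 1-based number of the successful attempt, or None if the
    server never came back within the allowed attempts.
    """
    for attempt in range(1, attempts + 1):
        try:
            connect()
            return attempt
        except Exception:
            # Crash recovery may still be running; wait before retrying.
            if attempt < attempts:
                sleep(delay)
    return None
```

A `None` result after a generous number of attempts would suggest the postmaster is really gone, whereas success on a later attempt would point to the timing effect described above.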

Re: Postgres service stops when I kill client backend on Windows

From

Dmitry Vasilyev

Date:

October 10, 2015, 18:33:32

As I wrote, the service stopped. This is repeatable.
You can run the command 'psql -c "do $$ unpack p,1x8 $$ language plperlu;"'
and after that the Windows service will stop.


Re: Postgres service stops when I kill client backend on Windows

From

Tom Lane

Date:

October 10, 2015, 18:56:03

Dmitry Vasilyev <d.vasilyev@postgrespro.ru> writes:
> As I wrote, the service stopped. This is repeatable.
> You can run the command 'psql -c "do $$ unpack p,1x8 $$ language plperlu;"'
> and after that the Windows service will stop.

Well, (a) that probably means that your plperl installation is broken,
and (b) you still haven't convinced me that you had an actual service
stop, and not just that the recovery time was longer than psql would
wait before retrying the connection.  Can you start a fresh psql
session after waiting a few seconds?
        regards, tom lane

Re: Postgres service stops when I kill client backend on Windows

From

Dmitry Vasilyev

Date:

October 10, 2015, 19:04:11

Hello Tom!

On Sat, 2015-10-10 at 10:55 -0500, Tom Lane wrote:
> Dmitry Vasilyev <d.vasilyev@postgrespro.ru> writes:
> > As I wrote, the service stopped. This is repeatable.
> > You can run the command 'psql -c "do $$ unpack p,1x8 $$ language
> > plperlu;"'
> > and after that the Windows service will stop.
> 
> Well, (a) that probably means that your plperl installation is
> broken,
> and (b) you still haven't convinced me that you had an actual service
> stop, and not just that the recovery time was longer than psql would
> wait before retrying the connection.  Can you start a fresh psql
> session after waiting a few seconds?
> 
>             regards, tom lane

This is a known bug in Perl:

perl -e ' unpack p,1x8'
Segmentation fault (core dumped)

The Postgres backend crashed, and the Windows service is stopped:

C:\Users\vadv>sc query postgresql-X64-9.4 | findstr /i "STATE"
        STATE              : 1  STOPPED
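
For scripted checks, the STATE line that `sc query` prints can be parsed rather than read by eye. The following Python sketch shows one way to do it, assuming the `STATE : <code>  <name>` layout seen in the output above; `parse_service_state` is a hypothetical helper name.

```python
import re


def parse_service_state(sc_output):
    """Return (code, name) from the STATE line of `sc query` output, or None.

    Assumes the layout "STATE : <number>  <NAME>" as in the output above.
    """
    match = re.search(r"STATE\s*:\s*(\d+)\s+(\w+)", sc_output)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)
```

Called on the line above, this would yield the numeric code 1 together with the name STOPPED.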


You can see the log below:

2015-10-10 19:00:13 AST LOG:  database system was interrupted; last
known up at 2015-10-10 18:54:47 AST
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 5 to 13
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 5 to 13
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 5 to 13
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 5 to 13
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 5 to 13
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 5 to 13
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 5 to 13
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 5 to 13
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 5 to 13
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 5 to 13
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 5 to 13
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 5 to 13
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 5 to 13
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 5 to 13
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 5 to 13
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 5 to 13
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 5 to 13
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 5 to 13
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 5 to 13
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 5 to 13
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 5 to 13
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 5 to 13
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 5 to 13
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 5 to 13
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 5 to 13
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 5 to 13
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 5 to 13
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 5 to 13
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 2 to 2
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 2 to 2
2015-10-10 19:00:13 AST DEBUG:  checkpoint record is at 0/16A01C8
2015-10-10 19:00:13 AST DEBUG:  redo record is at 0/16A01C8; shutdown
TRUE
2015-10-10 19:00:13 AST DEBUG:  next transaction ID: 0/678; next OID:
16393
2015-10-10 19:00:13 AST DEBUG:  next MultiXactId: 1; next
MultiXactOffset: 0
2015-10-10 19:00:13 AST DEBUG:  oldest unfrozen transaction ID: 667, in
database 1
2015-10-10 19:00:13 AST DEBUG:  oldest MultiXactId: 1, in database 1
2015-10-10 19:00:13 AST DEBUG:  transaction ID wrap limit is
2147484314, limited by database with OID 1
2015-10-10 19:00:13 AST DEBUG:  MultiXactId wrap limit is 2147483648,
limited by database with OID 1
2015-10-10 19:00:13 AST DEBUG:  starting up replication slots
2015-10-10 19:00:13 AST LOG:  database system was not properly shut
down; automatic recovery in progress
2015-10-10 19:00:13 AST DEBUG:  resetting unlogged relations: cleanup 1
init 0
2015-10-10 19:00:13 AST LOG:  redo starts at 0/16A0230
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 80 to 17
2015-10-10 19:00:13 AST CONTEXT:  xlog redo insert: rel
1663/12135/12057; tid 0/3
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 80 to 17
2015-10-10 19:00:13 AST CONTEXT:  xlog redo insert: rel
1663/12135/12059; tid 1/3
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 80 to 17
2015-10-10 19:00:13 AST CONTEXT:  xlog redo insert: rel
1663/12135/12060; tid 1/2
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 80 to 17
2015-10-10 19:00:13 AST CONTEXT:  xlog redo insert: rel
1663/12135/11979; tid 31/63
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 80 to 17
2015-10-10 19:00:13 AST CONTEXT:  xlog redo insert: rel
1663/12135/11984; tid 16/34
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 80 to 17
2015-10-10 19:00:13 AST CONTEXT:  xlog redo insert: rel
1663/12135/11889; tid 67/5
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 80 to 17
2015-10-10 19:00:13 AST CONTEXT:  xlog redo insert: rel
1663/12135/11894; tid 9/132
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 80 to 17
2015-10-10 19:00:13 AST CONTEXT:  xlog redo insert: rel
1663/12135/11895; tid 18/81
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 80 to 17
2015-10-10 19:00:13 AST CONTEXT:  xlog redo insert: rel
1663/12135/12003; tid 48/62
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 80 to 17
2015-10-10 19:00:13 AST CONTEXT:  xlog redo insert: rel
1663/12135/12005; tid 28/16
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 80 to 17
2015-10-10 19:00:13 AST CONTEXT:  xlog redo insert: rel
1663/12135/12006; tid 27/24
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 80 to 17
2015-10-10 19:00:13 AST CONTEXT:  xlog redo insert: rel
1663/12135/11950; tid 0/5
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 80 to 17
2015-10-10 19:00:13 AST CONTEXT:  xlog redo insert: rel
1663/12135/11952; tid 1/3
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 80 to 17
2015-10-10 19:00:13 AST CONTEXT:  xlog redo insert: rel
1663/12135/11953; tid 1/5
2015-10-10 19:00:13 AST LOG:  record with zero length at 0/16AB308
2015-10-10 19:00:13 AST LOG:  redo done at 0/16AB2D8
2015-10-10 19:00:13 AST LOG:  last completed transaction was at log
time 2015-10-10 18:55:09.464+03
2015-10-10 19:00:13 AST DEBUG:  resetting unlogged relations: cleanup 0
init 1
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 5 to 13
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 5 to 13
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 5 to 13
2015-10-10 19:00:13 AST DEBUG:  performing replication slot checkpoint
2015-10-10 19:00:13 AST DEBUG:  attempting to remove WAL segments older
than log file 000000000000000000000000
2015-10-10 19:00:13 AST DEBUG:  SlruScanDirectory invoking callback on
pg_multixact/offsets/0000
2015-10-10 19:00:13 AST DEBUG:  SlruScanDirectory invoking callback on
pg_multixact/members/0000
2015-10-10 19:00:13 AST DEBUG:  SlruScanDirectory invoking callback on
pg_multixact/offsets/0000
2015-10-10 19:00:13 AST DEBUG:  oldest MultiXactId member is at offset
0
2015-10-10 19:00:13 AST LOG:  MultiXact member wraparound protections
are now enabled
2015-10-10 19:00:13 AST DEBUG:  MultiXact member stop limit is now
4294914944 based on MultiXact 1
2015-10-10 19:00:13 AST DEBUG:  shmem_exit(0): 1 before_shmem_exit
callbacks to make
2015-10-10 19:00:13 AST DEBUG:  shmem_exit(0): 3 on_shmem_exit
callbacks to make
2015-10-10 19:00:13 AST DEBUG:  proc_exit(0): 2 callbacks to make
2015-10-10 19:00:13 AST DEBUG:  exit(0)
2015-10-10 19:00:13 AST DEBUG:  shmem_exit(-1): 0 before_shmem_exit
callbacks to make
2015-10-10 19:00:13 AST DEBUG:  shmem_exit(-1): 0 on_shmem_exit
callbacks to make
2015-10-10 19:00:13 AST DEBUG:  proc_exit(-1): 0 callbacks to make
2015-10-10 19:00:13 AST DEBUG:  reaping dead processes
2015-10-10 19:00:13 AST LOG:  database system is ready to accept
connections
2015-10-10 19:00:13 AST LOG:  autovacuum launcher started
2015-10-10 19:00:13 AST DEBUG:  InitPostgres
2015-10-10 19:00:13 AST DEBUG:  my backend ID is 1
2015-10-10 19:00:13 AST DEBUG:  checkpointer updated shared memory
configuration values
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 2 to 2
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 2 to 2
2015-10-10 19:00:13 AST DEBUG:  forked new backend, pid=3432
socket=1288
2015-10-10 19:00:13 AST DEBUG:  StartTransaction
2015-10-10 19:00:13 AST DEBUG:  name: unnamed;
blockState:       DEFAULT; state: INPROGR, xid/subid/cid: 0/1/0,
nestlvl: 1, children: 
2015-10-10 19:00:13 AST DEBUG:  CommitTransaction
2015-10-10 19:00:13 AST DEBUG:  name: unnamed;
blockState:       STARTED; state: INPROGR, xid/subid/cid: 0/1/0,
nestlvl: 1, children: 
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 2 to 2
2015-10-10 19:00:13 AST DEBUG:  received inquiry for database 0
2015-10-10 19:00:13 AST DEBUG:  writing stats file
"pg_stat_tmp/global.stat"
2015-10-10 19:00:13 AST DEBUG:  postgres child[3432]: starting with (
2015-10-10 19:00:13 AST DEBUG:      postgres
2015-10-10 19:00:13 AST DEBUG:  )
2015-10-10 19:00:13 AST DEBUG:  InitPostgres
2015-10-10 19:00:13 AST DEBUG:  my backend ID is 2
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 2 to 2
2015-10-10 19:00:13 AST DEBUG:  StartTransaction
2015-10-10 19:00:13 AST DEBUG:  name: unnamed;
blockState:       DEFAULT; state: INPROGR, xid/subid/cid: 0/1/0,
nestlvl: 1, children: 
2015-10-10 19:00:13 AST FATAL:  role "WIN-TDLBFCTPHT0$" does not exist
2015-10-10 19:00:13 AST DEBUG:  shmem_exit(1): 1 before_shmem_exit
callbacks to make
2015-10-10 19:00:13 AST DEBUG:  shmem_exit(1): 6 on_shmem_exit
callbacks to make
2015-10-10 19:00:13 AST DEBUG:  proc_exit(1): 3 callbacks to make
2015-10-10 19:00:13 AST DEBUG:  exit(1)
2015-10-10 19:00:13 AST DEBUG:  shmem_exit(-1): 0 before_shmem_exit
callbacks to make
2015-10-10 19:00:13 AST DEBUG:  shmem_exit(-1): 0 on_shmem_exit
callbacks to make
2015-10-10 19:00:13 AST DEBUG:  proc_exit(-1): 0 callbacks to make
2015-10-10 19:00:13 AST DEBUG:  reaping dead processes
2015-10-10 19:00:13 AST DEBUG:  server process (PID 3432) exited with
exit code 1
2015-10-10 19:00:16 AST DEBUG:  forked new backend, pid=148 socket=1288
2015-10-10 19:00:16 AST DEBUG:  postgres child[148]: starting with (
2015-10-10 19:00:16 AST DEBUG:      postgres
2015-10-10 19:00:16 AST DEBUG:  )
2015-10-10 19:00:16 AST DEBUG:  InitPostgres
2015-10-10 19:00:16 AST DEBUG:  my backend ID is 2
2015-10-10 19:00:16 AST DEBUG:  mapped win32 error code 2 to 2
2015-10-10 19:00:16 AST DEBUG:  StartTransaction
2015-10-10 19:00:16 AST DEBUG:  name: unnamed;
blockState:       DEFAULT; state: INPROGR, xid/subid/cid: 0/1/0,
nestlvl: 1, children: 
2015-10-10 19:00:16 AST FATAL:  role "vadv" does not exist
2015-10-10 19:00:16 AST DEBUG:  shmem_exit(1): 1 before_shmem_exit
callbacks to make
2015-10-10 19:00:16 AST DEBUG:  shmem_exit(1): 6 on_shmem_exit
callbacks to make
2015-10-10 19:00:16 AST DEBUG:  proc_exit(1): 3 callbacks to make
2015-10-10 19:00:16 AST DEBUG:  exit(1)
2015-10-10 19:00:16 AST DEBUG:  shmem_exit(-1): 0 before_shmem_exit
callbacks to make
2015-10-10 19:00:16 AST DEBUG:  shmem_exit(-1): 0 on_shmem_exit
callbacks to make
2015-10-10 19:00:16 AST DEBUG:  proc_exit(-1): 0 callbacks to make
2015-10-10 19:00:16 AST DEBUG:  reaping dead processes
2015-10-10 19:00:16 AST DEBUG:  server process (PID 148) exited with
exit code 1
2015-10-10 19:00:20 AST DEBUG:  forked new backend, pid=5024
socket=1288
2015-10-10 19:00:20 AST DEBUG:  postgres child[5024]: starting with (
2015-10-10 19:00:20 AST DEBUG:      postgres
2015-10-10 19:00:20 AST DEBUG:  )
2015-10-10 19:00:20 AST DEBUG:  InitPostgres
2015-10-10 19:00:20 AST DEBUG:  my backend ID is 2
2015-10-10 19:00:20 AST DEBUG:  mapped win32 error code 2 to 2
2015-10-10 19:00:20 AST DEBUG:  StartTransaction
2015-10-10 19:00:20 AST DEBUG:  name: unnamed;
blockState:       DEFAULT; state: INPROGR, xid/subid/cid: 0/1/0,
nestlvl: 1, children: 
2015-10-10 19:00:20 AST DEBUG:  mapped win32 error code 2 to 2
2015-10-10 19:00:20 AST DEBUG:  CommitTransaction
2015-10-10 19:00:20 AST DEBUG:  name: unnamed;
blockState:       STARTED; state: INPROGR, xid/subid/cid: 0/1/0,
nestlvl: 1, children: 
2015-10-10 19:00:32 AST DEBUG:  StartTransactionCommand
2015-10-10 19:00:32 AST STATEMENT:  do $$ unpack p,1x8 $$ language
plperlu;
2015-10-10 19:00:32 AST DEBUG:  StartTransaction
2015-10-10 19:00:32 AST STATEMENT:  do $$ unpack p,1x8 $$ language
plperlu;
2015-10-10 19:00:32 AST DEBUG:  name: unnamed;
blockState:       DEFAULT; state: INPROGR, xid/subid/cid: 0/1/0,
nestlvl: 1, children: 
2015-10-10 19:00:32 AST STATEMENT:  do $$ unpack p,1x8 $$ language
plperlu;
2015-10-10 19:00:32 AST DEBUG:  ProcessUtility
2015-10-10 19:00:32 AST STATEMENT:  do $$ unpack p,1x8 $$ language
plperlu;
2015-10-10 19:00:32 AST DEBUG:  reaping dead processes
2015-10-10 19:00:32 AST DEBUG:  server process (PID 5024) was
terminated by exception 0xC0000005
2015-10-10 19:00:32 AST DETAIL:  Failed process was running: do $$
unpack p,1x8 $$ language plperlu;
2015-10-10 19:00:32 AST HINT:  See C include file "ntstatus.h" for a
description of the hexadecimal value.
2015-10-10 19:00:32 AST LOG:  server process (PID 5024) was terminated
by exception 0xC0000005
2015-10-10 19:00:32 AST DETAIL:  Failed process was running: do $$
unpack p,1x8 $$ language plperlu;
2015-10-10 19:00:32 AST HINT:  See C include file "ntstatus.h" for a
description of the hexadecimal value.
2015-10-10 19:00:32 AST LOG:  terminating any other active server
processes
2015-10-10 19:00:32 AST DEBUG:  sending SIGQUIT to process 1848
2015-10-10 19:00:32 AST DEBUG:  sending SIGQUIT to process 968
2015-10-10 19:00:32 AST DEBUG:  shmem_exit(-1): 0 before_shmem_exit
callbacks to make
2015-10-10 19:00:32 AST DEBUG:  shmem_exit(-1): 0 on_shmem_exit
callbacks to make
2015-10-10 19:00:32 AST DEBUG:  sending SIGQUIT to process 1100
2015-10-10 19:00:32 AST DEBUG:  proc_exit(-1): 0 callbacks to make
2015-10-10 19:00:32 AST DEBUG:  shmem_exit(-1): 0 before_shmem_exit
callbacks to make
2015-10-10 19:00:32 AST DEBUG:  shmem_exit(-1): 0 on_shmem_exit
callbacks to make
2015-10-10 19:00:32 AST DEBUG:  proc_exit(-1): 0 callbacks to make
2015-10-10 19:00:32 AST DEBUG:  sending SIGQUIT to process 1856
2015-10-10 19:00:32 AST DEBUG:  shmem_exit(-1): 0 before_shmem_exit
callbacks to make
2015-10-10 19:00:32 AST DEBUG:  shmem_exit(-1): 0 on_shmem_exit
callbacks to make
2015-10-10 19:00:32 AST DEBUG:  proc_exit(-1): 0 callbacks to make
2015-10-10 19:00:32 AST DEBUG:  sending SIGQUIT to process 1104
2015-10-10 19:00:32 AST WARNING:  terminating connection because of
crash of another server process
2015-10-10 19:00:32 AST DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2015-10-10 19:00:32 AST HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2015-10-10 19:00:32 AST DEBUG:  shmem_exit(-1): 0 before_shmem_exit
callbacks to make
2015-10-10 19:00:32 AST DEBUG:  shmem_exit(-1): 0 on_shmem_exit
callbacks to make
2015-10-10 19:00:32 AST DEBUG:  proc_exit(-1): 0 callbacks to make
2015-10-10 19:00:32 AST DEBUG:  writing stats file
"pg_stat/global.stat"
2015-10-10 19:00:32 AST DEBUG:  reaping dead processes
2015-10-10 19:00:32 AST DEBUG:  writing stats file
"pg_stat/db_12135.stat"
2015-10-10 19:00:32 AST DEBUG:  reaping dead processes
2015-10-10 19:00:32 AST DEBUG:  removing temporary stats file
"pg_stat_tmp/db_12135.stat"
2015-10-10 19:00:32 AST DEBUG:  writing stats file "pg_stat/db_0.stat"
2015-10-10 19:00:32 AST DEBUG:  reaping dead processes
2015-10-10 19:00:32 AST DEBUG:  removing temporary stats file
"pg_stat_tmp/db_0.stat"
2015-10-10 19:00:32 AST DEBUG:  reaping dead processes
2015-10-10 19:00:32 AST DEBUG:  reaping dead processes
2015-10-10 19:00:32 AST LOG:  all server processes terminated;
reinitializing
2015-10-10 19:00:32 AST DEBUG:  shmem_exit(1): 0 before_shmem_exit
callbacks to make
2015-10-10 19:00:32 AST DEBUG:  shmem_exit(1): 3 on_shmem_exit
callbacks to make
2015-10-10 19:00:32 AST DEBUG:  cleaning up dynamic shared memory
control segment with ID 851401618
2015-10-10 19:00:32 AST DEBUG:  invoking
IpcMemoryCreate(size=290095104)
2015-10-10 19:00:42 AST FATAL:  pre-existing shared memory block is
still in use
2015-10-10 19:00:42 AST HINT:  Check if there are any old server
processes still running, and terminate them.
2015-10-10 19:00:42 AST DEBUG:  shmem_exit(1): 0 before_shmem_exit
callbacks to make
2015-10-10 19:00:42 AST DEBUG:  shmem_exit(1): 0 on_shmem_exit
callbacks to make
2015-10-10 19:00:42 AST DEBUG:  proc_exit(1): 2 callbacks to make
2015-10-10 19:00:42 AST DEBUG:  exit(1)
2015-10-10 19:00:42 AST DEBUG:  shmem_exit(-1): 0 before_shmem_exit
callbacks to make
2015-10-10 19:00:42 AST DEBUG:  shmem_exit(-1): 0 on_shmem_exit
callbacks to make
2015-10-10 19:00:42 AST DEBUG:  proc_exit(-1): 0 callbacks to make
2015-10-10 19:00:42 AST DEBUG:  logger shutting down
2015-10-10 19:00:42 AST DEBUG:  shmem_exit(0): 0 before_shmem_exit
callbacks to make
2015-10-10 19:00:42 AST DEBUG:  shmem_exit(0): 0 on_shmem_exit
callbacks to make
2015-10-10 19:00:42 AST DEBUG:  proc_exit(0): 0 callbacks to make
2015-10-10 19:00:42 AST DEBUG:  exit(0)
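
Reading the log above, the backend (PID 5024) is terminated by exception 0xC0000005, the postmaster logs "all server processes terminated; reinitializing", and ten seconds later it fails with "pre-existing shared memory block is still in use" before the logger shuts down. The Python sketch below looks for that pattern in a log. It assumes the timestamp prefix and the mail-wrapped lines shown here; `join_wrapped` and `summarize` are hypothetical helper names.

```python
import re

# Start of a log entry such as "2015-10-10 19:00:32 AST LOG:  ..."
ENTRY_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \S+ [A-Z]+:")


def join_wrapped(lines):
    """Rejoin log entries that mail wrapping split across several lines."""
    entries = []
    for line in lines:
        if ENTRY_START.match(line) or not entries:
            entries.append(line.strip())
        else:
            # Continuation of the previous entry.
            entries[-1] += " " + line.strip()
    return entries


def summarize(lines):
    """Report whether a backend crashed and whether reinitialization failed."""
    entries = join_wrapped(lines)
    crashed = any("was terminated by exception" in e for e in entries)
    reinit_failed = any(
        "FATAL:" in e and "pre-existing shared memory block is still in use" in e
        for e in entries
    )
    return {"backend_crashed": crashed, "reinit_failed": reinit_failed}
```

If both flags are set, the postmaster tried to restart after the crash but gave up, which would be consistent with the service ending up stopped rather than merely slow to recover.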

Re: Postgres service stops when I kill client backend on Windows

From

Pavel Stehule

Date:

October 10, 2015, 20:19:24

2015-10-10 18:04 GMT+02:00 Dmitry Vasilyev <d.vasilyev@postgrespro.ru>:

> Hello Tom!
>
> On Sat, 2015-10-10 at 10:55 -0500, Tom Lane wrote:
> > Dmitry Vasilyev <d.vasilyev@postgrespro.ru> writes:
> > > As I wrote, the service stopped. This is repeatable.
> > > You can run the command 'psql -c "do $$ unpack p,1x8 $$ language
> > > plperlu;"'
> > > and after that the Windows service will stop.
> >
> > Well, (a) that probably means that your plperl installation is
> > broken,
> > and (b) you still haven't convinced me that you had an actual service
> > stop, and not just that the recovery time was longer than psql would
> > wait before retrying the connection. Can you start a fresh psql
> > session after waiting a few seconds?
> >
> > regards, tom lane
>
> This is a known bug in Perl:
>
> perl -e ' unpack p,1x8'
> Segmentation fault (core dumped)

So this is expected behavior. After any unexpected client failure, the server is restarted.

Regards

Pavel

2015-10-10 19:00:13 AST DEBUG:  resetting unlogged relations: cleanup 0
init 1
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 5 to 13
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 5 to 13
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 5 to 13
2015-10-10 19:00:13 AST DEBUG:  performing replication slot checkpoint
2015-10-10 19:00:13 AST DEBUG:  attempting to remove WAL segments older
than log file 000000000000000000000000
2015-10-10 19:00:13 AST DEBUG:  SlruScanDirectory invoking callback on
pg_multixact/offsets/0000
2015-10-10 19:00:13 AST DEBUG:  SlruScanDirectory invoking callback on
pg_multixact/members/0000
2015-10-10 19:00:13 AST DEBUG:  SlruScanDirectory invoking callback on
pg_multixact/offsets/0000
2015-10-10 19:00:13 AST DEBUG:  oldest MultiXactId member is at offset
0
2015-10-10 19:00:13 AST LOG:  MultiXact member wraparound protections
are now enabled
2015-10-10 19:00:13 AST DEBUG:  MultiXact member stop limit is now
4294914944 based on MultiXact 1
2015-10-10 19:00:13 AST DEBUG:  shmem_exit(0): 1 before_shmem_exit
callbacks to make
2015-10-10 19:00:13 AST DEBUG:  shmem_exit(0): 3 on_shmem_exit
callbacks to make
2015-10-10 19:00:13 AST DEBUG:  proc_exit(0): 2 callbacks to make
2015-10-10 19:00:13 AST DEBUG:  exit(0)
2015-10-10 19:00:13 AST DEBUG:  shmem_exit(-1): 0 before_shmem_exit
callbacks to make
2015-10-10 19:00:13 AST DEBUG:  shmem_exit(-1): 0 on_shmem_exit
callbacks to make
2015-10-10 19:00:13 AST DEBUG:  proc_exit(-1): 0 callbacks to make
2015-10-10 19:00:13 AST DEBUG:  reaping dead processes
2015-10-10 19:00:13 AST LOG:  database system is ready to accept
connections
2015-10-10 19:00:13 AST LOG:  autovacuum launcher started
2015-10-10 19:00:13 AST DEBUG:  InitPostgres
2015-10-10 19:00:13 AST DEBUG:  my backend ID is 1
2015-10-10 19:00:13 AST DEBUG:  checkpointer updated shared memory
configuration values
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 2 to 2
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 2 to 2
2015-10-10 19:00:13 AST DEBUG:  forked new backend, pid=3432
socket=1288
2015-10-10 19:00:13 AST DEBUG:  StartTransaction
2015-10-10 19:00:13 AST DEBUG:  name: unnamed;
blockState:       DEFAULT; state: INPROGR, xid/subid/cid: 0/1/0,
nestlvl: 1, children:
2015-10-10 19:00:13 AST DEBUG:  CommitTransaction
2015-10-10 19:00:13 AST DEBUG:  name: unnamed;
blockState:       STARTED; state: INPROGR, xid/subid/cid: 0/1/0,
nestlvl: 1, children:
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 2 to 2
2015-10-10 19:00:13 AST DEBUG:  received inquiry for database 0
2015-10-10 19:00:13 AST DEBUG:  writing stats file
"pg_stat_tmp/global.stat"
2015-10-10 19:00:13 AST DEBUG:  postgres child[3432]: starting with (
2015-10-10 19:00:13 AST DEBUG:    postgres
2015-10-10 19:00:13 AST DEBUG:  )
2015-10-10 19:00:13 AST DEBUG:  InitPostgres
2015-10-10 19:00:13 AST DEBUG:  my backend ID is 2
2015-10-10 19:00:13 AST DEBUG:  mapped win32 error code 2 to 2
2015-10-10 19:00:13 AST DEBUG:  StartTransaction
2015-10-10 19:00:13 AST DEBUG:  name: unnamed;
blockState:       DEFAULT; state: INPROGR, xid/subid/cid: 0/1/0,
nestlvl: 1, children:
2015-10-10 19:00:13 AST FATAL:  role "WIN-TDLBFCTPHT0$" does not exist
2015-10-10 19:00:13 AST DEBUG:  shmem_exit(1): 1 before_shmem_exit
callbacks to make
2015-10-10 19:00:13 AST DEBUG:  shmem_exit(1): 6 on_shmem_exit
callbacks to make
2015-10-10 19:00:13 AST DEBUG:  proc_exit(1): 3 callbacks to make
2015-10-10 19:00:13 AST DEBUG:  exit(1)
2015-10-10 19:00:13 AST DEBUG:  shmem_exit(-1): 0 before_shmem_exit
callbacks to make
2015-10-10 19:00:13 AST DEBUG:  shmem_exit(-1): 0 on_shmem_exit
callbacks to make
2015-10-10 19:00:13 AST DEBUG:  proc_exit(-1): 0 callbacks to make
2015-10-10 19:00:13 AST DEBUG:  reaping dead processes
2015-10-10 19:00:13 AST DEBUG:  server process (PID 3432) exited with
exit code 1
2015-10-10 19:00:16 AST DEBUG:  forked new backend, pid=148 socket=1288
2015-10-10 19:00:16 AST DEBUG:  postgres child[148]: starting with (
2015-10-10 19:00:16 AST DEBUG:    postgres
2015-10-10 19:00:16 AST DEBUG:  )
2015-10-10 19:00:16 AST DEBUG:  InitPostgres
2015-10-10 19:00:16 AST DEBUG:  my backend ID is 2
2015-10-10 19:00:16 AST DEBUG:  mapped win32 error code 2 to 2
2015-10-10 19:00:16 AST DEBUG:  StartTransaction
2015-10-10 19:00:16 AST DEBUG:  name: unnamed;
blockState:       DEFAULT; state: INPROGR, xid/subid/cid: 0/1/0,
nestlvl: 1, children:
2015-10-10 19:00:16 AST FATAL:  role "vadv" does not exist
2015-10-10 19:00:16 AST DEBUG:  shmem_exit(1): 1 before_shmem_exit
callbacks to make
2015-10-10 19:00:16 AST DEBUG:  shmem_exit(1): 6 on_shmem_exit
callbacks to make
2015-10-10 19:00:16 AST DEBUG:  proc_exit(1): 3 callbacks to make
2015-10-10 19:00:16 AST DEBUG:  exit(1)
2015-10-10 19:00:16 AST DEBUG:  shmem_exit(-1): 0 before_shmem_exit
callbacks to make
2015-10-10 19:00:16 AST DEBUG:  shmem_exit(-1): 0 on_shmem_exit
callbacks to make
2015-10-10 19:00:16 AST DEBUG:  proc_exit(-1): 0 callbacks to make
2015-10-10 19:00:16 AST DEBUG:  reaping dead processes
2015-10-10 19:00:16 AST DEBUG:  server process (PID 148) exited with
exit code 1
2015-10-10 19:00:20 AST DEBUG:  forked new backend, pid=5024
socket=1288
2015-10-10 19:00:20 AST DEBUG:  postgres child[5024]: starting with (
2015-10-10 19:00:20 AST DEBUG:    postgres
2015-10-10 19:00:20 AST DEBUG:  )
2015-10-10 19:00:20 AST DEBUG:  InitPostgres
2015-10-10 19:00:20 AST DEBUG:  my backend ID is 2
2015-10-10 19:00:20 AST DEBUG:  mapped win32 error code 2 to 2
2015-10-10 19:00:20 AST DEBUG:  StartTransaction
2015-10-10 19:00:20 AST DEBUG:  name: unnamed;
blockState:       DEFAULT; state: INPROGR, xid/subid/cid: 0/1/0,
nestlvl: 1, children:
2015-10-10 19:00:20 AST DEBUG:  mapped win32 error code 2 to 2
2015-10-10 19:00:20 AST DEBUG:  CommitTransaction
2015-10-10 19:00:20 AST DEBUG:  name: unnamed;
blockState:       STARTED; state: INPROGR, xid/subid/cid: 0/1/0,
nestlvl: 1, children:
2015-10-10 19:00:32 AST DEBUG:  StartTransactionCommand
2015-10-10 19:00:32 AST STATEMENT:  do $$ unpack p,1x8 $$ language
plperlu;
2015-10-10 19:00:32 AST DEBUG:  StartTransaction
2015-10-10 19:00:32 AST STATEMENT:  do $$ unpack p,1x8 $$ language
plperlu;
2015-10-10 19:00:32 AST DEBUG:  name: unnamed;
blockState:       DEFAULT; state: INPROGR, xid/subid/cid: 0/1/0,
nestlvl: 1, children:
2015-10-10 19:00:32 AST STATEMENT:  do $$ unpack p,1x8 $$ language
plperlu;
2015-10-10 19:00:32 AST DEBUG:  ProcessUtility
2015-10-10 19:00:32 AST STATEMENT:  do $$ unpack p,1x8 $$ language
plperlu;
2015-10-10 19:00:32 AST DEBUG:  reaping dead processes
2015-10-10 19:00:32 AST DEBUG:  server process (PID 5024) was
terminated by exception 0xC0000005
2015-10-10 19:00:32 AST DETAIL:  Failed process was running: do $$
unpack p,1x8 $$ language plperlu;
2015-10-10 19:00:32 AST HINT:  See C include file "ntstatus.h" for a
description of the hexadecimal value.
2015-10-10 19:00:32 AST LOG:  server process (PID 5024) was terminated
by exception 0xC0000005
2015-10-10 19:00:32 AST DETAIL:  Failed process was running: do $$
unpack p,1x8 $$ language plperlu;
2015-10-10 19:00:32 AST HINT:  See C include file "ntstatus.h" for a
description of the hexadecimal value.
2015-10-10 19:00:32 AST LOG:  terminating any other active server
processes
2015-10-10 19:00:32 AST DEBUG:  sending SIGQUIT to process 1848
2015-10-10 19:00:32 AST DEBUG:  sending SIGQUIT to process 968
2015-10-10 19:00:32 AST DEBUG:  shmem_exit(-1): 0 before_shmem_exit
callbacks to make
2015-10-10 19:00:32 AST DEBUG:  shmem_exit(-1): 0 on_shmem_exit
callbacks to make
2015-10-10 19:00:32 AST DEBUG:  sending SIGQUIT to process 1100
2015-10-10 19:00:32 AST DEBUG:  proc_exit(-1): 0 callbacks to make
2015-10-10 19:00:32 AST DEBUG:  shmem_exit(-1): 0 before_shmem_exit
callbacks to make
2015-10-10 19:00:32 AST DEBUG:  shmem_exit(-1): 0 on_shmem_exit
callbacks to make
2015-10-10 19:00:32 AST DEBUG:  proc_exit(-1): 0 callbacks to make
2015-10-10 19:00:32 AST DEBUG:  sending SIGQUIT to process 1856
2015-10-10 19:00:32 AST DEBUG:  shmem_exit(-1): 0 before_shmem_exit
callbacks to make
2015-10-10 19:00:32 AST DEBUG:  shmem_exit(-1): 0 on_shmem_exit
callbacks to make
2015-10-10 19:00:32 AST DEBUG:  proc_exit(-1): 0 callbacks to make
2015-10-10 19:00:32 AST DEBUG:  sending SIGQUIT to process 1104
2015-10-10 19:00:32 AST WARNING:  terminating connection because of
crash of another server process
2015-10-10 19:00:32 AST DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2015-10-10 19:00:32 AST HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2015-10-10 19:00:32 AST DEBUG:  shmem_exit(-1): 0 before_shmem_exit
callbacks to make
2015-10-10 19:00:32 AST DEBUG:  shmem_exit(-1): 0 on_shmem_exit
callbacks to make
2015-10-10 19:00:32 AST DEBUG:  proc_exit(-1): 0 callbacks to make
2015-10-10 19:00:32 AST DEBUG:  writing stats file
"pg_stat/global.stat"
2015-10-10 19:00:32 AST DEBUG:  reaping dead processes
2015-10-10 19:00:32 AST DEBUG:  writing stats file
"pg_stat/db_12135.stat"
2015-10-10 19:00:32 AST DEBUG:  reaping dead processes
2015-10-10 19:00:32 AST DEBUG:  removing temporary stats file
"pg_stat_tmp/db_12135.stat"
2015-10-10 19:00:32 AST DEBUG:  writing stats file "pg_stat/db_0.stat"
2015-10-10 19:00:32 AST DEBUG:  reaping dead processes
2015-10-10 19:00:32 AST DEBUG:  removing temporary stats file
"pg_stat_tmp/db_0.stat"
2015-10-10 19:00:32 AST DEBUG:  reaping dead processes
2015-10-10 19:00:32 AST DEBUG:  reaping dead processes
2015-10-10 19:00:32 AST LOG:  all server processes terminated;
reinitializing
2015-10-10 19:00:32 AST DEBUG:  shmem_exit(1): 0 before_shmem_exit
callbacks to make
2015-10-10 19:00:32 AST DEBUG:  shmem_exit(1): 3 on_shmem_exit
callbacks to make
2015-10-10 19:00:32 AST DEBUG:  cleaning up dynamic shared memory
control segment with ID 851401618
2015-10-10 19:00:32 AST DEBUG:  invoking
IpcMemoryCreate(size=290095104)
2015-10-10 19:00:42 AST FATAL:  pre-existing shared memory block is
still in use
2015-10-10 19:00:42 AST HINT:  Check if there are any old server
processes still running, and terminate them.
2015-10-10 19:00:42 AST DEBUG:  shmem_exit(1): 0 before_shmem_exit
callbacks to make
2015-10-10 19:00:42 AST DEBUG:  shmem_exit(1): 0 on_shmem_exit
callbacks to make
2015-10-10 19:00:42 AST DEBUG:  proc_exit(1): 2 callbacks to make
2015-10-10 19:00:42 AST DEBUG:  exit(1)
2015-10-10 19:00:42 AST DEBUG:  shmem_exit(-1): 0 before_shmem_exit
callbacks to make
2015-10-10 19:00:42 AST DEBUG:  shmem_exit(-1): 0 on_shmem_exit
callbacks to make
2015-10-10 19:00:42 AST DEBUG:  proc_exit(-1): 0 callbacks to make
2015-10-10 19:00:42 AST DEBUG:  logger shutting down
2015-10-10 19:00:42 AST DEBUG:  shmem_exit(0): 0 before_shmem_exit
callbacks to make
2015-10-10 19:00:42 AST DEBUG:  shmem_exit(0): 0 on_shmem_exit
callbacks to make
2015-10-10 19:00:42 AST DEBUG:  proc_exit(0): 0 callbacks to make
2015-10-10 19:00:42 AST DEBUG:  exit(0)

Re: Postgres service stops when I kill client backend on Windows

From

Ali Akbar

Date:

11 October 2015, 02:55:07

Greetings,

2015-10-11 0:18 GMT+07:00 Pavel Stehule <pavel.stehule@gmail.com>:

2015-10-10 18:04 GMT+02:00 Dmitry Vasilyev <d.vasilyev@postgrespro.ru>:

On Sat, 2015-10-10 at 10:55 -0500, Tom Lane wrote:
> Dmitry Vasilyev <d.vasilyev@postgrespro.ru> writes:
> > I have written, what service stopped. This action is repeatable.
> > You can run command 'psql -c "do $$ unpack p,1x8 $$ language
> > plperlu;"'
> > and after this windows service will stop.
>

so this is expected behaviour. After any unexpected client failure, the server is restarted

I can confirm this too. On Linux (I use Fedora 22), this is what happens when a server process is killed:

=== 1. before:

$ sudo systemctl status postgresql.service
postgresql.service - PostgreSQL database server
   Loaded: loaded (/usr/lib/systemd/system/postgresql.service; enabled)
   Active: active (running) since Jum 2015-10-09 16:25:43 WIB; 1 day 14h ago
  Process: 778 ExecStart=/usr/bin/pg_ctl start -D ${PGDATA} -s -o -p ${PGPORT} -w -t 300 (code=exited, status=0/SUCCESS)
  Process: 747 ExecStartPre=/usr/bin/postgresql-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)
 Main PID: 783 (postgres)
   CGroup: /system.slice/postgresql.service
           ├─ 783 /usr/bin/postgres -D /var/lib/pgsql/data -p 5432
           ├─ 812 postgres: logger process
           ├─ 821 postgres: checkpointer process
           ├─ 822 postgres: writer process
           ├─ 823 postgres: wal writer process
           ├─ 824 postgres: autovacuum launcher process
           ├─ 825 postgres: stats collector process
           └─17181 postgres: postgres test [local] idle

=== 2. killing and attempt to reconnect:

$ sudo kill 17181

test=# select 1;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

=== 3. service status after:

$ sudo systemctl status postgresql.service
postgresql.service - PostgreSQL database server
   Loaded: loaded (/usr/lib/systemd/system/postgresql.service; enabled)
   Active: active (running) since Jum 2015-10-09 16:25:43 WIB; 1 day 14h ago
  Process: 778 ExecStart=/usr/bin/pg_ctl start -D ${PGDATA} -s -o -p ${PGPORT} -w -t 300 (code=exited, status=0/SUCCESS)
  Process: 747 ExecStartPre=/usr/bin/postgresql-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)
 Main PID: 783 (postgres)
   CGroup: /system.slice/postgresql.service
           ├─ 783 /usr/bin/postgres -D /var/lib/pgsql/data -p 5432
           ├─ 812 postgres: logger process
           ├─ 821 postgres: checkpointer process
           ├─ 822 postgres: writer process
           ├─ 823 postgres: wal writer process
           ├─ 824 postgres: autovacuum launcher process
           ├─ 825 postgres: stats collector process
           └─17422 postgres: postgres test [local] idle

===

The service status is still active (running), and new process 17422 handles the client.

But this is what happens on Windows (Windows 7 32-bit, PostgreSQL 9.4):

=== 1. before:

C:\Windows\system32>sc queryex postgresql-9.4

SERVICE_NAME: postgresql-9.4
        TYPE               : 10  WIN32_OWN_PROCESS
        STATE              : 4  RUNNING
                                (STOPPABLE, PAUSABLE, ACCEPTS_SHUTDOWN)
        WIN32_EXIT_CODE    : 0  (0x0)
        SERVICE_EXIT_CODE  : 0  (0x0)
        CHECKPOINT         : 0x0
        WAIT_HINT          : 0x0
        PID                : 3716
        FLAGS              :

=== 2. killing & attempt to reconnect:

postgres=# select pg_backend_pid();
 pg_backend_pid
----------------
           2080
(1 row)

C:\Windows\system32>taskkill /F /PID 2080
SUCCESS: The process with PID 2080 has been terminated.

postgres=# select 1;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

=== 3. service status after:

C:\Windows\system32>sc query postgresql-9.4

SERVICE_NAME: postgresql-9.4
        TYPE               : 10  WIN32_OWN_PROCESS
        STATE              : 1  STOPPED
        WIN32_EXIT_CODE    : 0  (0x0)
        SERVICE_EXIT_CODE  : 0  (0x0)
        CHECKPOINT         : 0x0
        WAIT_HINT          : 0x0

===

The client cannot reconnect. The service is dead. This is nasty, because any client can exploit a segfault bug like the one in Perl that Dmitry mentioned upthread and bring the PostgreSQL service down.

Note: terminating the server process with pg_terminate_backend does not cause this behavior. The client reconnects normally, and the service is still running.

Regards,

Ali Akbar

Re: Postgres service stops when I kill client backend on Windows

From

Michael Paquier

Date:

11 October 2015, 06:55:29

On Sun, Oct 11, 2015 at 8:54 AM, Ali Akbar <the.apaan@gmail.com> wrote:
> C:\Windows\system32>taskkill /F /PID 2080
> SUCCESS: The process with PID 2080 has been terminated.

taskkill /f *forcefully* terminates the process targeted [1]. Isn't
that equivalent to a kill -9? If you headshot a backend process on
Linux with kill -9, an instance won't restart either.
[1]: http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/taskkill.mspx?mfr=true
-- 
Michael

Re: Postgres service stops when I kill client backend on Windows

From

Tom Lane

Date:

11 October 2015, 07:40:10

Dmitry Vasilyev <d.vasilyev@postgrespro.ru> writes:
> On Sat, 2015-10-10 at 10:55 -0500, Tom Lane wrote:
>> and (b) you still haven't convinced me that you had an actual service
>> stop, and not just that the recovery time was longer than psql would
>> wait before retrying the connection.

> The log you can see below:
> ...
> 2015-10-10 19:00:32 AST DEBUG:  cleaning up dynamic shared memory control segment with ID 851401618
> 2015-10-10 19:00:32 AST DEBUG:  invoking IpcMemoryCreate(size=290095104)
> 2015-10-10 19:00:42 AST FATAL:  pre-existing shared memory block is still in use
> 2015-10-10 19:00:42 AST HINT:  Check if there are any old server processes still running, and terminate them.

Thanks for providing some detail!  It's clear from the above log excerpt
that we're timing out after 10 seconds in win32_shmem.c's version of
PGSharedMemoryCreate, because CreateFileMapping is still reporting that
the old shared memory segment still exists.  When we last discussed this
sort of problem in
http://www.postgresql.org/message-id/flat/49FA3B6F.6080906@dunslane.net
there was no evidence that such a failure could persist for longer than a
second or two.  Now it seems that on your machine the failure state can
persist for at least 10 seconds, but I don't know why.

If I had to guess, on the basis of no evidence, I'd wonder whether the
DSM code broke it; there is evidently at least one DSM segment in play
in your use-case.  But that's only a guess.
        regards, tom lane

Re: Postgres service stops when I kill client backend on Windows

From

Amit Kapila

Date:

11 October 2015, 09:01:35

On Sun, Oct 11, 2015 at 10:09 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Dmitry Vasilyev <d.vasilyev@postgrespro.ru> writes:
> > The log you can see below:
> > ...
> > 2015-10-10 19:00:32 AST DEBUG: cleaning up dynamic shared memory control segment with ID 851401618
> > 2015-10-10 19:00:32 AST DEBUG: invoking IpcMemoryCreate(size=290095104)
> > 2015-10-10 19:00:42 AST FATAL: pre-existing shared memory block is still in use
> > 2015-10-10 19:00:42 AST HINT: Check if there are any old server processes still running, and terminate them.
>
..
>
> If I had to guess, on the basis of no evidence, I'd wonder whether the
> DSM code broke it; there is evidently at least one DSM segment in play
> in your use-case. But that's only a guess.
>

Based on the DEBUG messages above, it is possible that DSM causes this
problem, but I think the last message ("pre-existing shared memory block
is still in use") is not logged for DSM. We create the new DSM segment in
the code path dsm_postmaster_startup() -> dsm_impl_op() ->
dsm_impl_windows():

dsm_impl_windows()
{
    if (op == DSM_OP_CREATE)
    ..
}

In this path, if creation fails with ERROR_ALREADY_EXISTS, we retry
creating the DSM segment under a different name.
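
The retry-under-a-different-name behaviour described above can be sketched in portable C. This is a toy model and not the PostgreSQL source: `name_in_use`, `create_segment`, `create_with_fresh_name`, and `sequential_id` are hypothetical names, a small array stands in for the kernel's namespace of mapping objects, and the starting ID is simply the control segment ID that appears in the log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the kernel's namespace of named mapping objects. */
#define MAX_NAMES 8
static uint32_t used_names[MAX_NAMES];
static size_t   used_count = 0;

static bool name_in_use(uint32_t id)
{
    for (size_t i = 0; i < used_count; i++)
        if (used_names[i] == id)
            return true;
    return false;
}

/* Returns false when the name is taken (the "already exists" case)
 * or when the toy table is full. */
static bool create_segment(uint32_t id)
{
    if (name_in_use(id) || used_count == MAX_NAMES)
        return false;
    used_names[used_count++] = id;
    return true;
}

/* Deterministic candidate generator (hypothetical; real code would
 * more likely pick random IDs). */
static uint32_t next_candidate = 851401618u;
static uint32_t sequential_id(void)
{
    return next_candidate++;
}

/*
 * Try candidate IDs in turn until one is free.
 * Returns the ID that was created, or 0 if every attempt collided.
 */
static uint32_t create_with_fresh_name(uint32_t (*next_id)(void), int max_tries)
{
    assert(max_tries > 0);
    for (int i = 0; i < max_tries; i++)
    {
        uint32_t id = next_id();

        if (id != 0 && create_segment(id))
            return id;
    }
    return 0;
}
```

Because a collision only leads to another candidate name, a leftover DSM segment would not by itself produce the "still in use" failure; that is the point being made about this path.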

To diagnose the cause of the problem, I think we can write a diagnostic
patch that does the following two things:

1. Increase the loop count below from 10 to 50 or 100 in win32_shmem.c;
alternatively, instead of the loop count, we can increase the sleep time.

PGSharedMemoryCreate()
{
    ..
    for (i = 0; i < 10; i++)
    ..
        if (GetLastError() == ERROR_ALREADY_EXISTS)
        {
            ..
            Sleep(1000);
            continue;
        }
}

2. Add more log messages in both win32_shmem.c and the DSM-related code,
which can help us narrow down the problem.
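
The retry loop in item 1 can be modelled in portable C. This is a sketch under assumptions, not the code in win32_shmem.c: `fake_kernel`, `fake_create_mapping`, and `create_shmem_with_retry` are hypothetical names, and a counter stands in for the real `Sleep()` call so that nothing actually waits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the operating system's view of the old segment. */
typedef struct
{
    int busy_attempts;  /* attempts during which the old segment still
                         * exists; a negative value means "forever" */
    int calls;          /* number of creation attempts made so far */
    int slept_ms;       /* total time the caller would have slept */
} fake_kernel;

/* Returns true once the old segment is gone and creation succeeds. */
static bool fake_create_mapping(fake_kernel *k)
{
    k->calls++;
    if (k->busy_attempts < 0)
        return false;
    return k->calls > k->busy_attempts;
}

/*
 * Retry creation up to max_attempts times, sleeping after each failure.
 * Returns false if the old segment never went away.
 */
static bool create_shmem_with_retry(fake_kernel *k, int max_attempts, int sleep_ms)
{
    assert(max_attempts > 0 && sleep_ms >= 0);
    for (int i = 0; i < max_attempts; i++)
    {
        if (fake_create_mapping(k))
            return true;
        k->slept_ms += sleep_ms;    /* stands in for Sleep(sleep_ms) */
    }
    return false;
}
```

With 10 attempts and a 1000 ms sleep, a segment that is never released costs 10 seconds before the loop gives up, which matches the gap between 19:00:32 and 19:00:42 in the log. In this model, raising the count to 100 only lengthens the wait when something keeps the old segment alive indefinitely.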

If you find this a reasonable approach to diagnosing the root cause of
the problem, I can work on writing a diagnostic patch.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Postgres service stops when I kill client backend on Windows

From

Magnus Hagander

Date:

11 October 2015, 12:58:40

On Sun, Oct 11, 2015 at 5:55 AM, Michael Paquier <michael.paquier@gmail.com> wrote:

On Sun, Oct 11, 2015 at 8:54 AM, Ali Akbar <the.apaan@gmail.com> wrote:
> C:\Windows\system32>taskkill /F /PID 2080
> SUCCESS: The process with PID 2080 has been terminated.

taskkill /f *forcefully* terminates the process targeted [1]. Isn't
that equivalent to a kill -9? If you headshot a backend process on
Linux with kill -9, an instance won't restart either.
[1]: http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/taskkill.mspx?mfr=true

It does. If you want a "graceful kill" on Windows, you must use "pg_ctl kill", which can send an "emulated term-signal".

Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

Re: Postgres service stops when I kill client backend on Windows

From

Andrew Dunstan

Date:

11 October 2015, 17:33:15

On 10/11/2015 05:58 AM, Magnus Hagander wrote:
>
>
> On Sun, Oct 11, 2015 at 5:55 AM, Michael Paquier 
> <michael.paquier@gmail.com <mailto:michael.paquier@gmail.com>> wrote:
>
>     On Sun, Oct 11, 2015 at 8:54 AM, Ali Akbar <the.apaan@gmail.com
>     <mailto:the.apaan@gmail.com>> wrote:
>     > C:\Windows\system32>taskkill /F /PID 2080
>     > SUCCESS: The process with PID 2080 has been terminated.
>
>     taskkill /f *forcefully* terminates the process targeted [1]. Isn't
>     that equivalent to a kill -9? If you headshot a backend process on
>     Linux with kill -9, an instance won't restart either.
>     [1]:
>     http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/taskkill.mspx?mfr=true
>
>
>
> It does. If you want a "graceful kill" on Windows, you must use 
> "pg_ctl kill" which can send an "emulated term-signal".
>
>

Nevertheless, we'd like a hard crash of a backend other than the 
postmaster not to have worse effects than on *nix, where killing a 
backend even with SIGKILL doesn't halt the server:
   andrew=# select pg_backend_pid();
    pg_backend_pid
   ----------------
             24359
   (1 row)

   andrew=# \! kill -9 24359
   andrew=# select 1;
   server closed the connection unexpectedly
           This probably means the server terminated abnormally
           before or while processing the request.
   The connection to the server was lost. Attempting reset: Succeeded.
   andrew=#

Amit's proposals elsewhere to increase the shmem timeout and increase 
logging seem reasonable.

cheers

andrew

Re: Postgres service stops when I kill client backend on Windows

From

Tom Lane

Date:

11 October 2015, 18:23:05

Andrew Dunstan <andrew@dunslane.net> writes:
> Amit's proposals elsewhere to increase the shmem timeout and increase 
> logging seem reasonable.

I'm back to the position I had in the previous thread, which is that
we don't really understand why any delay is needed here at all, and
we ought to try to remedy that lack rather than just hoping that more
and more delay will fix it.  It may be that there's some proactive
measure we can take to improve matters.

I'm a bit suspicious that we may have leaked a handle to the shared
memory block someplace, for example.  That would explain why this
symptom is visible now when it was not in 2009.  Or maybe it's dependent
on some feature that we didn't test back then --- for instance, if
the logging collector is in use, could it have inherited a handle and
not closed it?

One thing I noticed in the CreateFileMapping docs is that Windows
apparently implements the sort of anonymous mapping we're doing as
a mapping of part of the "system paging file".  I wonder if it's too
dumb (perhaps in only some releases) to realize that it doesn't
really need to flush dirty pages to disk when the last reference
to the mapping is abandoned.  In that case maybe an explicit flush
request would move things along.
        regards, tom lane

Re: Postgres service stops when I kill client backend on Windows

From

Magnus Hagander

Date:

11 October 2015, 18:30:01

On Sun, Oct 11, 2015 at 4:32 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

On 10/11/2015 05:58 AM, Magnus Hagander wrote:

On Sun, Oct 11, 2015 at 5:55 AM, Michael Paquier <michael.paquier@gmail.com <mailto:michael.paquier@gmail.com>> wrote:

On Sun, Oct 11, 2015 at 8:54 AM, Ali Akbar <the.apaan@gmail.com
<mailto:the.apaan@gmail.com>> wrote:
> C:\Windows\system32>taskkill /F /PID 2080
> SUCCESS: The process with PID 2080 has been terminated.

taskkill /f *forcefully* terminates the process targeted [1]. Isn't
that equivalent to a kill -9? If you headshot a backend process on
Linux with kill -9, an instance won't restart either.
[1]:
http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/taskkill.mspx?mfr=true

It does. If you want a "graceful kill" on Windows, you must use "pg_ctl kill", which can send an "emulated term-signal".

Nevertheless, we'd like a hard crash of a backend other than the postmaster not to have worse effects than on *nix, where killing a backend even with SIGKILL doesn't halt the server:

Oh, absolutely. I was just pointing out that something like taskkill *should* result in a hard restart of *all* backends, and if you want to kill off just the one you should never use it; you should instead use pg_ctl kill. But of course, neither of those should lead to the scenario explained here, where the server does not come back up again.

Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

Re: Postgres service stops when I kill client backend on Windows

From

Magnus Hagander

Date:

11 October 2015, 18:38:04

On Sun, Oct 11, 2015 at 5:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andrew Dunstan <andrew@dunslane.net> writes:
> Amit's proposals elsewhere to increase the shmem timeout and increase
> logging seem reasonable.

I'm back to the position I had in the previous thread, which is that
we don't really understand why any delay is needed here at all, and
we ought to try to remedy that lack rather than just hoping that more
and more delay will fix it. It may be that there's some proactive
measure we can take to improve matters.

I'm a bit suspicious that we may have leaked a handle to the shared
memory block someplace, for example. That would explain why this
symptom is visible now when it was not in 2009. Or maybe it's dependent
on some feature that we didn't test back then --- for instance, if
the logging collector is in use, could it have inherited a handle and
not closed it?

Even if we leaked it, it should go away when the other processes died.

What would be interesting to know is whether at this point *any* postgres.exe process is still running, and if so, which one. It should then be possible to use Process Explorer to figure out which process it is (by looking at the "fake title"), and probably also which shared memory handles it has open (even though they don't have a name, their info might explain things).

So if someone with a reproducible case could check that as well, I think it would be valuable information.

One thing I noticed in the CreateFileMapping docs is that Windows
apparently implements the sort of anonymous mapping we're doing as
a mapping of part of the "system paging file". I wonder if it's too
dumb (perhaps in only some releases) to realize that it doesn't
really need to flush dirty pages to disk when the last reference
to the mapping is abandoned. In that case maybe an explicit flush
request would move things along.

First of all, note that the "system paging file" is exactly the same as the "swap file" or "swap partition" on Unix, just in case that is unclear.

And I'm pretty sure it doesn't do that. Surely we would've seen performance issues from that before in that case. But I don't really have any facts to back that up :)

We do get, AIUI, the SEC_COMMIT behaviour which commits the pages initially to make sure there is actually space for them. I don't believe that one specifically says anything about when you close it.

Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

Re: Postgres service stops when I kill client backend on Windows

From

Tom Lane

Date:

11 October 2015, 18:42:17

Magnus Hagander <magnus@hagander.net> writes:
> On Sun, Oct 11, 2015 at 5:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm a bit suspicious that we may have leaked a handle to the shared
>> memory block someplace, for example.  That would explain why this
>> symptom is visible now when it was not in 2009.  Or maybe it's dependent
>> on some feature that we didn't test back then --- for instance, if
>> the logging collector is in use, could it have inherited a handle and
>> not closed it?

> Even if we leaked it, it should go away when the other processes died.

I'm fairly certain that we do not kill/restart the logging collector
during a database restart (because it's impossible to reproduce the
original stderr destination if we do).  Not sure if any other postmaster
children are allowed to survive.
        regards, tom lane

Re: Postgres service stops when I kill client backend on Windows

From

Michael Paquier

Date:

12 October 2015, 02:18:51

> On Sun, Oct 11, 2015 at 5:55 AM, Michael Paquier wrote:
>> On Sun, Oct 11, 2015 at 8:54 AM, Ali Akbar <the.apaan@gmail.com> wrote:
>> > C:\Windows\system32>taskkill /F /PID 2080
>> > SUCCESS: The process with PID 2080 has been terminated.
>>
>> taskkill /f *forcefully* terminates the process targeted [1]. Isn't
>> that equivalent to a kill -9? If you headshot a backend process on
>> Linux with kill -9, an instance won't restart either.
>> [1]:
>> http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/taskkill.mspx?mfr=true
> It does. If you want a "graceful kill" on Windows, you must use "pg_ctl
> kill" which can send an "emulated term-signal".

Ah, yes. Sure. I had restart_after_crash = off on this instance...
-- 
Michael

Re: Postgres service stops when I kill client backend on Windows

From

Amit Kapila

Date:

12 October 2015, 08:55:39

On Sun, Oct 11, 2015 at 9:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Magnus Hagander <magnus@hagander.net> writes:
> > On Sun, Oct 11, 2015 at 5:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> I'm a bit suspicious that we may have leaked a handle to the shared
> >> memory block someplace, for example. That would explain why this
> >> symptom is visible now when it was not in 2009. Or maybe it's dependent
> >> on some feature that we didn't test back then --- for instance, if
> >> the logging collector is in use, could it have inherited a handle and
> >> not closed it?
>
> > Even if we leaked it, it should go away when the other processes died.
>
> I'm fairly certain that we do not kill/restart the logging collector
> during a database restart (because it's impossible to reproduce the
> original stderr destination if we do).

True, and it seems this is the reason for the issue we are discussing here.

The reason why this happens is that during creation of shared memory
(PGSharedMemoryCreate()), we duplicate the handle such that it
becomes inheritable by all child processes. Then during fork
(syslogger_forkexec()->postmaster_forkexec()->internal_forkexec()) we
always inherit the handles, which causes the syslogger to get a copy of
the shared memory handle that it neither uses nor closes.

I could easily reproduce the issue if the logging collector is on, and even if
we try to increase the loop count or sleep time in PGSharedMemoryCreate(),
it doesn't change the situation, as the syslogger has a valid handle to
shared memory. One way to fix this is to just close the shared memory handle
in the syslogger, as we are not going to need it; the attached patch, which
does this, fixes the issue for me. Another, more invasive fix, in case we want
to retain the shared memory handle for some purpose (a future requirement),
could be to send some signal to the syslogger in the restart path so that it
can release the shared memory handle.
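
The mechanism described above can be sketched as a toy model in portable C. This is not PostgreSQL or Win32 code: the `Section` and `Proc` types and every function name are hypothetical, and the only rule assumed is that a kernel object stays alive for as long as some process holds an open handle to it.

```c
#include <stdbool.h>

/* Toy model (hypothetical names): a named kernel object that exists
 * for as long as at least one process holds an open handle to it. */
typedef struct
{
    int         open_handles;
} Section;

typedef struct
{
    bool        holds_handle;
} Proc;

/* A process obtains a handle, by creating the object or by inheritance. */
static void
proc_acquire(Section *s, Proc *p)
{
    p->holds_handle = true;
    s->open_handles++;
}

/* Close the handle explicitly; process exit does the same implicitly. */
static void
proc_release(Section *s, Proc *p)
{
    if (p->holds_handle)
    {
        p->holds_handle = false;
        s->open_handles--;
    }
}

static bool
section_exists(const Section *s)
{
    return s->open_handles > 0;
}

/*
 * Model a crash restart: the backend dies, the postmaster drops its own
 * handle, and the syslogger keeps running.  Returns true if the old
 * segment is still alive afterwards, which is what blocks re-creation.
 */
static bool
old_segment_survives_restart(bool syslogger_closes_handle)
{
    Section     seg = {0};
    Proc        postmaster = {false};
    Proc        backend = {false};
    Proc        syslogger = {false};

    proc_acquire(&seg, &postmaster);    /* creates the segment */
    proc_acquire(&seg, &backend);       /* inherited at fork/exec */
    proc_acquire(&seg, &syslogger);     /* inherited, never used */

    if (syslogger_closes_handle)
        proc_release(&seg, &syslogger); /* the proposed fix */

    proc_release(&seg, &backend);       /* killed backend exits */
    proc_release(&seg, &postmaster);    /* postmaster cleans up */

    return section_exists(&seg);
}
```

In this model `old_segment_survives_restart(false)` is true and `old_segment_survives_restart(true)` is false, which is consistent with the observation that waiting longer in PGSharedMemoryCreate() cannot help while the syslogger keeps its handle.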

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachments

fix_syslogger_dangling_shmhandle_v1.patch

Re: Postgres service stops when I kill client backend on Windows

From

Michael Paquier

Date:

October 12, 2015, 13:15:20

On Mon, Oct 12, 2015 at 2:55 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Sun, Oct 11, 2015 at 9:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I could easily reproduce the issue if logging collector is on and even if
> we try to increase the loop count or sleep time in PGSharedMemoryCreate(),
> it doesn't change the situation as the syslogger has a valid handle to
> shared memory.  One way to fix is to just close the shared memory handle
> in sys logger as we are not going to need it and attached patch which does
> this fixes the issue for me.  Another invasive fix in case we want to
> retain shared memory handle for some purpose (future requirement) could
> be to send some signal to syslogger in restart path so that it can release
> the shared memory handle.

+#ifdef EXEC_BACKEND
+    if (!CloseHandle(UsedShmemSegID))
+        elog(LOG, "could not close handle to shared memory: error code %lu", GetLastError());
+#endif
I am pretty sure that you would want a WIN32 block here, not
EXEC_BACKEND as the latter can be used on non-Windows platforms as
well to emulate Windows behavior.
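
The difference between the two guards can be sketched in a few lines of C. This is only an illustration: the function names are hypothetical, EXEC_BACKEND is defined by hand to imitate such a build, and `_WIN32` is checked alongside `WIN32` only because the sketch does not include PostgreSQL's own headers.

```c
#include <stdbool.h>

/* Emulate an EXEC_BACKEND build, as can be done on Unix for testing. */
#ifndef EXEC_BACKEND
#define EXEC_BACKEND 1
#endif

/* Would a block guarded by "#ifdef EXEC_BACKEND" be compiled here? */
static bool
compiled_under_exec_backend_guard(void)
{
#ifdef EXEC_BACKEND
    return true;
#else
    return false;
#endif
}

/* Would a block guarded by "#ifdef WIN32" be compiled here? */
static bool
compiled_under_win32_guard(void)
{
#if defined(WIN32) || defined(_WIN32)
    return true;
#else
    return false;
#endif
}
```

On a Unix EXEC_BACKEND build the first function returns true and the second returns false, so a CloseHandle() call placed under the first guard would reach a compiler that has no Win32 API available.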
-- 
Michael

Re: Postgres service stops when I kill client backend on Windows

From

Andres Freund

Date:

October 12, 2015, 13:25:42

On 2015-10-12 11:25:35 +0530, Amit Kapila wrote:
>      /*
> +     * Close the shared memory handle as the syslogger doesn't need to
> +     * attach to it.  For EXEC_BACKEND case, the shared memory handle
> +     * is inherited by all postmaster child processes irrespective of
> +     * whether they need it or not.
> +     */
> +#ifdef EXEC_BACKEND
> +    if (!CloseHandle(UsedShmemSegID))
> +        elog(LOG, "could not close handle to shared memory: error code %lu", GetLastError());
> +#endif
> +

It feels wrong to do this in syslogger.c - I mean it's not the only
process that's not attached to shared memory. Sure, the others get
killed, but nonetheless...

Greetings,

Andres Freund

Re: Postgres service stops when I kill client backend on Windows

From

Magnus Hagander

Date:

October 12, 2015, 13:26:21

On Mon, Oct 12, 2015 at 12:25 PM, Andres Freund <andres@anarazel.de> wrote:

> On 2015-10-12 11:25:35 +0530, Amit Kapila wrote:
>>      /*
>> +     * Close the shared memory handle as the syslogger doesn't need to
>> +     * attach to it.  For EXEC_BACKEND case, the shared memory handle
>> +     * is inherited by all postmaster child processes irrespective of
>> +     * whether they need it or not.
>> +     */
>> +#ifdef EXEC_BACKEND
>> +    if (!CloseHandle(UsedShmemSegID))
>> +        elog(LOG, "could not close handle to shared memory: error code %lu", GetLastError());
>> +#endif
>> +
>
> It feels wrong to do this in syslogger.c - I mean it's not the only
> process that's not attached to shared memory. Sure, the others get
> killed, but nonetheless...

+1. It feels like we're setting ourselves up for repeating this mistake at some later time :)

Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

Re: Postgres service stops when I kill client backend on Windows

From

Amit Kapila

Date:

October 12, 2015, 14:16:32

On Mon, Oct 12, 2015 at 3:45 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
>
> On Mon, Oct 12, 2015 at 2:55 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Sun, Oct 11, 2015 at 9:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > I could easily reproduce the issue if logging collector is on and even if
> > we try to increase the loop count or sleep time in PGSharedMemoryCreate(),
> > it doesn't change the situation as the syslogger has a valid handle to
> > shared memory. One way to fix is to just close the shared memory handle
> > in sys logger as we are not going to need it and attached patch which does
> > this fixes the issue for me. Another invasive fix in case we want to
> > retain shared memory handle for some purpose (future requirement) could
> > be to send some signal to syslogger in restart path so that it can release
> > the shared memory handle.
>
> +#ifdef EXEC_BACKEND
> +    if (!CloseHandle(UsedShmemSegID))
> +        elog(LOG, "could not close handle to shared memory: error code %lu", GetLastError());
> +#endif
> I am pretty sure that you would want a WIN32 block here, not
> EXEC_BACKEND as the latter can be used on non-Windows platforms as
> well to emulate Windows behavior.
>

Agreed, I can change the patch to use WIN32, but it seems not all
people want to follow this approach. So let's first try to see what
the best way to fix this is.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Postgres service stops when I kill client backend on Windows

From

Michael Paquier

Date:

October 12, 2015, 15:38:18

On Mon, Oct 12, 2015 at 7:26 PM, Magnus Hagander <magnus@hagander.net> wrote:
>
>
> On Mon, Oct 12, 2015 at 12:25 PM, Andres Freund <andres@anarazel.de> wrote:
>>
>> On 2015-10-12 11:25:35 +0530, Amit Kapila wrote:
>> >       /*
>> > +      * Close the shared memory handle as the syslogger doesn't need to
>> > +      * attach to it.  For EXEC_BACKEND case, the shared memory handle
>> > +      * is inherited by all postmaster child processes irrespective of
>> > +      * whether they need it or not.
>> > +      */
>> > +#ifdef EXEC_BACKEND
>> > +     if (!CloseHandle(UsedShmemSegID))
>> > +             elog(LOG, "could not close handle to shared memory: error code %lu", GetLastError());
>> > +#endif
>> > +
>>
>> It feels wrong to do this in syslogger.c - I mean it's not the only
>> process that's not attached to shared memory. Sure, the others get
>> killed, but nonetheless...
>
>
> +1. It feels like we're setting ourselves up for repeating this mistake at
> some later time :)

Actually, doesn't this apply as well to the archiver and the pgstat
collector? So perhaps we may want to do that in SubPostmasterMain with
PGSharedMemoryDetach. See for example the attached as an idea (patch
completely untested).
--
Michael

Attachments

20151012_detach_shmem.patch

Re: Postgres service stops when I kill client backend on Windows

From

Andres Freund

Date:

October 12, 2015, 15:45:55

On 2015-10-12 21:38:12 +0900, Michael Paquier wrote:
> >> It feels wrong to do this in syslogger.c - I mean it's not the only
> >> process that's not attached to shared memory. Sure, the others get
> >> killed, but nonetheless...
> >
> >
> > +1. It feels like we're setting ourselves up for repeating this mistake at
> > some later time :)
> 
> Actually, doesn't this apply as well to the archiver and the pgstat
> collector?

As mentioned above? The difference is that the archiver et al get killed
by postmaster during a PANIC restart thus don't present the problem
discussed here.

> So perhaps we may want to do that in SubPostmasterMain with
> PGSharedMemoryDetach. See for example the attached as an idea (patch
> completely untested).

> +    /*
> +     * Close any existing shared memory segment as those processes do not
> +     * need to have an access to it. This state is inherited from the
> +     * postmaster whether they need it or not.
> +     */
> +    if (strcmp(argv[1], "--forkarch") == 0 ||
> +        strcmp(argv[1], "--forkcol") == 0 ||
> +        strcmp(argv[1], "--forklog") == 0)
> +        PGSharedMemoryDetach();
> +

Well, in those cases we won't have attached to shared memory, so I'm not
convinced that this is the right solution. In fact, won't this lead to
hitting the elog in

void
PGSharedMemoryDetach(void)
{
    if (UsedShmemSegAddr != NULL)
    {
        if (!UnmapViewOfFile(UsedShmemSegAddr))
            elog(LOG, "could not unmap view of shared memory: error code %lu", GetLastError());

        UsedShmemSegAddr = NULL;
    }
}

UsedShmemSegAddr will have been setup by read_backend_variables(), but
the process won't have anything mapped at this point?

Greetings,

Andres Freund

Re: Postgres service stops when I kill client backend on Windows

From

Dmitry Vasilyev

Date:

October 12, 2015, 16:42:41

Hello, Amit!

On Mon, 2015-10-12 at 11:25 +0530, Amit Kapila wrote:
> On Sun, Oct 11, 2015 at 9:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Magnus Hagander <magnus@hagander.net> writes:
> > > On Sun, Oct 11, 2015 at 5:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > >> I'm a bit suspicious that we may have leaked a handle to the shared
> > >> memory block someplace, for example. That would explain why this
> > >> symptom is visible now when it was not in 2009. Or maybe it's dependent
> > >> on some feature that we didn't test back then --- for instance, if
> > >> the logging collector is in use, could it have inherited a handle and
> > >> not closed it?
> >
> > > Even if we leaked it, it should go away when the other processes died.
> >
> > I'm fairly certain that we do not kill/restart the logging collector
> > during a database restart (because it's impossible to reproduce the
> > original stderr destination if we do).
>
> True and it seems this is the reason for issue we are discussing here.
> The reason why this happens is that during creation of shared memory
> (PGSharedMemoryCreate()), we duplicate the handle such that it
> become inheritable by all child processes. Then during fork
> (syslogger_forkexec()->postmaster_forkexec()->internal_forkexec) we
> always inherit the handles which causes syslogger to get a copy of
> shared memory handle which it neither uses and nor closes it.
>
> I could easily reproduce the issue if logging collector is on and even if
> we try to increase the loop count or sleep time in PGSharedMemoryCreate(),
> it doesn't change the situation as the syslogger has a valid handle to
> shared memory. One way to fix is to just close the shared memory handle
> in sys logger as we are not going to need it and attached patch which does
> this fixes the issue for me. Another invasive fix in case we want to
> retain shared memory handle for some purpose (future requirement) could
> be to send some signal to syslogger in restart path so that it can release
> the shared memory handle.
>
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com

The specified patch with "#ifdef WIN32" is working for me. Maybe it is also necessary to check for open handles from replication, for example?

--------------

Dmitry Vasilyev

Postgres Professional: http://www.postgrespro.com

The Russian Postgres Company

Re: Postgres service stops when I kill client backend on Windows

From

Oleg Bartunov

Date:

October 12, 2015, 17:04:32

On Mon, Oct 12, 2015 at 4:42 PM, Dmitry Vasilyev <d.vasilyev@postgrespro.ru> wrote:

> Hello, Amit!
>
> On Mon, 2015-10-12 at 11:25 +0530, Amit Kapila wrote:
> > On Sun, Oct 11, 2015 at 9:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > >
> > > Magnus Hagander <magnus@hagander.net> writes:
> > > > On Sun, Oct 11, 2015 at 5:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > > >> I'm a bit suspicious that we may have leaked a handle to the shared
> > > >> memory block someplace, for example. That would explain why this
> > > >> symptom is visible now when it was not in 2009. Or maybe it's dependent
> > > >> on some feature that we didn't test back then --- for instance, if
> > > >> the logging collector is in use, could it have inherited a handle and
> > > >> not closed it?
> > >
> > > > Even if we leaked it, it should go away when the other processes died.
> > >
> > > I'm fairly certain that we do not kill/restart the logging collector
> > > during a database restart (because it's impossible to reproduce the
> > > original stderr destination if we do).
> >
> > True and it seems this is the reason for issue we are discussing here.
> > The reason why this happens is that during creation of shared memory
> > (PGSharedMemoryCreate()), we duplicate the handle such that it
> > become inheritable by all child processes. Then during fork
> > (syslogger_forkexec()->postmaster_forkexec()->internal_forkexec) we
> > always inherit the handles which causes syslogger to get a copy of
> > shared memory handle which it neither uses and nor closes it.
> >
> > I could easily reproduce the issue if logging collector is on and even if
> > we try to increase the loop count or sleep time in PGSharedMemoryCreate(),
> > it doesn't change the situation as the syslogger has a valid handle to
> > shared memory. One way to fix is to just close the shared memory handle
> > in sys logger as we are not going to need it and attached patch which does
> > this fixes the issue for me. Another invasive fix in case we want to
> > retain shared memory handle for some purpose (future requirement) could
> > be to send some signal to syslogger in restart path so that it can release
> > the shared memory handle.
> >
> > With Regards,
> > Amit Kapila.
> > EnterpriseDB: http://www.enterprisedb.com
>
> The specified patch with "#ifdef WIN32" is working for me. Maybe it is also necessary to check for open handles from replication, for example?

Assuming the problem will be fixed, should we release Beta2 soon?

> --------------
> Dmitry Vasilyev
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company

Re: Postgres service stops when I kill client backend on Windows

From

Tom Lane

Date:

October 12, 2015, 17:05:06

Andres Freund <andres@anarazel.de> writes:
> On 2015-10-12 21:38:12 +0900, Michael Paquier wrote:
>> Actually, doesn't this apply as well to the archiver and the pgstat
>> collector?

> As mentioned above? The difference is that the archiver et al get killed
> by postmaster during a PANIC restart thus don't present the problem
> discussed here.

I thought your objection to the original patch was exactly that we should
not treat syslogger as a special case for this purpose.

> Well, in those cases we won't have attached to shared memory, so I'm not
> convinced that this is the right solution.

No, you're missing the point.  In Windows builds, child processes inherit
a "handle" reference to the shared memory mapping, whether or not they
make any use of the handle to re-attach to that shared memory.  The point
here is that we need to close that handle if we're not going to use it.

I think the right thing is something close to Michael's proposed patch,
though not duplicating and reversing the previous if-test like that.
In other words, something like this in SubPostmasterMain:

     /*
      * If appropriate, physically re-attach to shared memory segment. We want
      * to do this before going any further to ensure that we can attach at the
      * same address the postmaster used.
+     * If we're not re-attaching, close the inherited handle to avoid leaks.
      */
     if (strcmp(argv[1], "--forkbackend") == 0 ||
         strcmp(argv[1], "--forkavlauncher") == 0 ||
         strcmp(argv[1], "--forkavworker") == 0 ||
         strcmp(argv[1], "--forkboot") == 0 ||
         strncmp(argv[1], "--forkbgworker=", 15) == 0)
         PGSharedMemoryReAttach();
+#ifdef WIN32
+    else
+        close the handle;
+#endif
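
The if-test in the snippet above can be isolated as a small predicate, which makes it easier to see which child types fall into the "else" branch. This is a minimal sketch in plain C: the `--fork*` switch names are the ones quoted in the thread, while the function name is hypothetical and not part of PostgreSQL.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Sketch: does a child started with this --fork* switch re-attach to
 * shared memory?  Any switch that returns false here is a child that
 * would need to close its inherited handle instead.
 */
static bool
child_reattaches_to_shmem(const char *fork_arg)
{
    return strcmp(fork_arg, "--forkbackend") == 0 ||
        strcmp(fork_arg, "--forkavlauncher") == 0 ||
        strcmp(fork_arg, "--forkavworker") == 0 ||
        strcmp(fork_arg, "--forkboot") == 0 ||
        /* 15 is the length of "--forkbgworker=", a prefix match */
        strncmp(fork_arg, "--forkbgworker=", 15) == 0;
}
```

With this predicate, `--forklog`, `--forkarch` and `--forkcol` (the syslogger, archiver and stats collector switches mentioned earlier in the thread) all return false.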

        regards, tom lane

Re: Postgres service stops when I kill client backend on Windows

From

Tom Lane

Date:

October 12, 2015, 17:07:01

Oleg Bartunov <obartunov@gmail.com> writes:
> Assuming the problem will be fixed, should we release Beta2 soon ?

This bug has existed since we had native Windows support.  It's entirely
immaterial for beta purposes, and I have a hard time thinking it's
critical enough to justify a short release cycle for the back branches
either.
        regards, tom lane

Re: Postgres service stops when I kill client backend on Windows

From

Andres Freund

Date:

October 12, 2015, 17:15:33

On 2015-10-12 10:04:55 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2015-10-12 21:38:12 +0900, Michael Paquier wrote:
> >> Actually, doesn't this apply as well to the archiver and the pgstat
> >> collector?
> 
> > As mentioned above? The difference is that the archiver et al get killed
> > by postmaster during a PANIC restart thus don't present the problem
> > discussed here.
> 
> I thought your objection to the original patch was exactly that we should
> not treat syslogger as a special case for this purpose.

Yes. The above was just about this not being actively broken - I'd
mentioned the other processes before and to me it sounded like Michael
thought there might be an active problem.

> > Well, in those cases we won't have attached to shared memory, so I'm not
> > convinced that this is the right solution.
> 
> No, you're missing the point.

Don't think so.

> In Windows builds, child processes inherit
> a "handle" reference to the shared memory mapping, whether or not they
> make any use of the handle to re-attach to that shared memory.  The point
> here is that we need to close that handle if we're not going to use it.

Right. But that doesn't mean it's right to call PGSharedMemoryDetach()
without other changes as done in Michael's proposed patch? That'll do an
UnmapViewOfFile() which'll fail because nothing is mapped, but still not
close UsedShmemSegID?

Greetings,

Andres Freund

Re: Postgres service stops when I kill client backend on Windows

From

Tom Lane

Date:

October 12, 2015, 17:21:50

Andres Freund <andres@anarazel.de> writes:
> Right. But that doesn't mean it's right to call PGSharedMemoryDetach()
> without other changes as done in Michael's proposed patch? That'll do an
> UnmapViewOfFile() which'll fail because nothing is mapped, but still not
> close UsedShmemSegID?

Ah, right, I'd not noticed that he proposed changing
CloseHandle(UsedShmemSegID) to PGSharedMemoryDetach().  The latter is
clearly the wrong thing.

I'm not sure whether we should just put the CloseHandle call in
postmaster.c, or invent a function in win32_shmem.c to provide a
layer of abstraction.
        regards, tom lane

Re: Postgres service stops when I kill client backend on Windows

From

Tom Lane

Date:

October 12, 2015, 17:40:55

I wrote:
> Andres Freund <andres@anarazel.de> writes:
>> Right. But that doesn't mean it's right to call PGSharedMemoryDetach()
>> without other changes as done in Michael's proposed patch? That'll do an
>> UnmapViewOfFile() which'll fail because nothing is mapped, but still not
>> close UsedShmemSegID?

> Ah, right, I'd not noticed that he proposed changing
> CloseHandle(UsedShmemSegID) to PGSharedMemoryDetach().  The latter is
> clearly the wrong thing.

Actually, now that I look at it, it's even more obvious that this is the
wrong thing because *all the subprocess types in question already call
PGSharedMemoryDetach*.  That's necessary on Unix, but I should think that
on Windows all it will do is provoke the log message:
           elog(LOG, "could not unmap view of shared memory: error code %lu", GetLastError());

Could someone confirm whether syslogger, archiver, stats collector
processes reliably produce that log message at startup on Windows?
        regards, tom lane

Re: Postgres service stops when I kill client backend on Windows

From

Amit Kapila

Date:

October 12, 2015, 17:49:08

On Mon, Oct 12, 2015 at 8:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> I wrote:
> > Andres Freund <andres@anarazel.de> writes:
> >> Right. But that doesn't mean it's right to call PGSharedMemoryDetach()
> >> without other changes as done in Michael's proposed patch? That'll do an
> >> UnmapViewOfFile() which'll fail because nothing is mapped, but still not
> >> close UsedShmemSegID?
>
> > Ah, right, I'd not noticed that he proposed changing
> > CloseHandle(UsedShmemSegID) to PGSharedMemoryDetach(). The latter is
> > clearly the wrong thing.
>
> Actually, now that I look at it, it's even more obvious that this is the
> wrong thing because *all the subprocess types in question already call
> PGSharedMemoryDetach*. That's necessary on Unix, but I should think that
> on Windows all it will do is provoke the log message:
>
> elog(LOG, "could not unmap view of shared memory: error code %lu", GetLastError());
>
> Could someone confirm whether syslogger, archiver, stats collector
> processes reliably produce that log message at startup on Windows?
>

I tried this approach of calling PGSharedMemoryDetach() for the
syslogger before writing the CloseHandle() patch; I saw that message
and understood that it is not going to work.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Postgres service stops when I kill client backend on Windows

From

Tom Lane

Date:

October 12, 2015, 18:04:24

I wrote:
> Actually, now that I look at it, it's even more obvious that this is the
> wrong thing because *all the subprocess types in question already call
> PGSharedMemoryDetach*.

Ah, scratch that: in most of them, the call is in #ifndef EXEC_BACKEND
stanzas.  The exception is bgworker start for a non-attached-to-shmem
worker, and in that case there's no log message because in fact
SubPostmasterMain did reattach.

This is kind of a mess :-(.  But it does look like what we want is
for SubPostmasterMain to do more than nothing when it chooses not to
reattach.  Probably that should include resetting UsedShmemSegAddr to
NULL, as well as closing the handle.
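
One possible shape for that cleanup can be sketched with mocked kernel calls, so that it runs anywhere. Everything here is an assumption made for illustration: `mock_unmap_view()` and `mock_close_handle()` stand in for UnmapViewOfFile() and CloseHandle(), `NO_HANDLE` stands in for an invalid handle value, and the lower-case variables only imitate UsedShmemSegAddr and UsedShmemSegID.

```c
#include <stdbool.h>
#include <stddef.h>

#define NO_HANDLE (-1)              /* stand-in for an invalid handle */

/* Mock kernel state for one child process (hypothetical). */
static bool view_mapped = false;    /* child never mapped the segment */
static bool handle_open = true;     /* but it did inherit the handle */
static int  unmap_failures = 0;     /* counts would-be LOG messages */

/* Values restored from the postmaster at child startup. */
static int  dummy_segment;
static void *used_shmem_seg_addr = &dummy_segment;
static int  used_shmem_seg_id = 42;

static bool
mock_unmap_view(void *addr)
{
    (void) addr;
    if (!view_mapped)
        return false;               /* nothing mapped: the call fails */
    view_mapped = false;
    return true;
}

static bool
mock_close_handle(int handle)
{
    if (handle == NO_HANDLE || !handle_open)
        return false;
    handle_open = false;
    return true;
}

/* Detach: unmap if we believe we are mapped, then close the handle. */
static void
shmem_detach(void)
{
    if (used_shmem_seg_addr != NULL)
    {
        if (!mock_unmap_view(used_shmem_seg_addr))
            unmap_failures++;
        used_shmem_seg_addr = NULL;
    }
    if (used_shmem_seg_id != NO_HANDLE)
    {
        (void) mock_close_handle(used_shmem_seg_id);
        used_shmem_seg_id = NO_HANDLE;
    }
}

/* Not re-attaching: forget the stale address first, then detach. */
static void
shmem_no_reattach(void)
{
    used_shmem_seg_addr = NULL;
    shmem_detach();
}
```

Calling `shmem_no_reattach()` on the initial state closes the handle without attempting an unmap. Calling `shmem_detach()` directly on the initial state would instead record one failed unmap, which corresponds to the spurious LOG message discussed in the thread.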
        regards, tom lane

Re: Postgres service stops when I kill client backend on Windows

From

Tom Lane

Date:

October 12, 2015, 23:35:18

I wrote:
> This is kind of a mess :-(.  But it does look like what we want is
> for SubPostmasterMain to do more than nothing when it chooses not to
> reattach.  Probably that should include resetting UsedShmemSegAddr to
> NULL, as well as closing the handle.

After poking around a bit more, I propose the attached patch.  I've
checked that this is happy with an EXEC_BACKEND Unix build, but I'm not
able to test it on Windows ... would somebody do that?

BTW, it appears from this that Cygwin builds have been broken right along
in a different way: according to the code in sysv_shmem's
PGSharedMemoryReAttach, Cygwin does cause a re-attach to occur, which we
were not undoing for putatively-not-connected-to-shmem child processes.
That's a robustness problem because it breaks the postmaster's expectation
that it's safe to not reinitialize shmem after a crash of one of those
processes.  I believe this patch fixes that problem as well, though if
anyone can test it on Cygwin that wouldn't be a bad thing either.

            regards, tom lane

diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c
index 8be5bbe..c7a3a91 100644
*** a/src/backend/port/sysv_shmem.c
--- b/src/backend/port/sysv_shmem.c
*************** PGSharedMemoryReAttach(void)
*** 619,624 ****
--- 619,652 ----

      UsedShmemSegAddr = hdr;        /* probably redundant */
  }
+
+ /*
+  * PGSharedMemoryNoReAttach
+  *
+  * Clean up if we choose *not* to re-attach to an already existing shared
+  * memory segment.  This is not used in the non EXEC_BACKEND case, either.
+  *
+  * UsedShmemSegID and UsedShmemSegAddr are implicit parameters to this
+  * routine.  The caller must have already restored them to the postmaster's
+  * values.
+  */
+ void
+ PGSharedMemoryNoReAttach(void)
+ {
+     Assert(UsedShmemSegAddr != NULL);
+     Assert(IsUnderPostmaster);
+
+ #ifdef __CYGWIN__
+     /* cygipc (currently) appears to not detach on exec. */
+     PGSharedMemoryDetach();
+ #endif
+
+     /* For cleanliness, reset UsedShmemSegAddr to show we're not attached. */
+     UsedShmemSegAddr = NULL;
+     /* And the same for UsedShmemSegID. */
+     UsedShmemSegID = 0;
+ }
+
  #endif   /* EXEC_BACKEND */

  /*
*************** PGSharedMemoryReAttach(void)
*** 629,634 ****
--- 657,665 ----
   * (it will have an on_shmem_exit callback registered to do that).  Rather,
   * this is for subprocesses that have inherited an attachment and want to
   * get rid of it.
+  *
+  * UsedShmemSegID and UsedShmemSegAddr are implicit parameters to this
+  * routine.
   */
  void
  PGSharedMemoryDetach(void)
diff --git a/src/backend/port/win32_shmem.c b/src/backend/port/win32_shmem.c
index db67627..8152522 100644
*** a/src/backend/port/win32_shmem.c
--- b/src/backend/port/win32_shmem.c
***************
*** 17,23 ****
  #include "storage/ipc.h"
  #include "storage/pg_shmem.h"

! HANDLE        UsedShmemSegID = 0;
  void       *UsedShmemSegAddr = NULL;
  static Size UsedShmemSegSize = 0;

--- 17,23 ----
  #include "storage/ipc.h"
  #include "storage/pg_shmem.h"

! HANDLE        UsedShmemSegID = INVALID_HANDLE_VALUE;
  void       *UsedShmemSegAddr = NULL;
  static Size UsedShmemSegSize = 0;

*************** PGSharedMemoryCreate(Size size, bool mak
*** 218,226 ****
          elog(LOG, "could not close handle to shared memory: error code %lu", GetLastError());


-     /* Register on-exit routine to delete the new segment */
-     on_shmem_exit(pgwin32_SharedMemoryDelete, PointerGetDatum(hmap2));
-
      /*
       * Get a pointer to the new shared memory segment. Map the whole segment
       * at once, and let the system decide on the initial address.
--- 218,223 ----
*************** PGSharedMemoryCreate(Size size, bool mak
*** 254,259 ****
--- 251,259 ----
      UsedShmemSegSize = size;
      UsedShmemSegID = hmap2;

+     /* Register on-exit routine to delete the new segment */
+     on_shmem_exit(pgwin32_SharedMemoryDelete, PointerGetDatum(hmap2));
+
      *shim = hdr;
      return hdr;
  }
*************** PGSharedMemoryReAttach(void)
*** 299,321 ****
  }

  /*
   * PGSharedMemoryDetach
   *
   * Detach from the shared memory segment, if still attached.  This is not
!  * intended for use by the process that originally created the segment. Rather,
   * this is for subprocesses that have inherited an attachment and want to
   * get rid of it.
   */
  void
  PGSharedMemoryDetach(void)
  {
      if (UsedShmemSegAddr != NULL)
      {
          if (!UnmapViewOfFile(UsedShmemSegAddr))
!             elog(LOG, "could not unmap view of shared memory: error code %lu", GetLastError());

          UsedShmemSegAddr = NULL;
      }
  }


--- 299,368 ----
  }

  /*
+  * PGSharedMemoryNoReAttach
+  *
+  * Clean up if we choose *not* to re-attach to an already existing shared
+  * memory segment.
+  *
+  * UsedShmemSegID and UsedShmemSegAddr are implicit parameters to this
+  * routine.  The caller must have already restored them to the postmaster's
+  * values.
+  */
+ void
+ PGSharedMemoryNoReAttach(void)
+ {
+     Assert(UsedShmemSegAddr != NULL);
+     Assert(IsUnderPostmaster);
+
+     /*
+      * Under Windows we will not have mapped the segment, so we don't need to
+      * un-map it.  Just reset UsedShmemSegAddr to show we're not attached
+      * (this is important in case somebody calls PGSharedMemoryDetach later).
+      */
+     UsedShmemSegAddr = NULL;
+
+     /*
+      * We *must* close the inherited shmem segment handle, else Windows will
+      * consider the existence of this process to mean it can't release the
+      * shmem segment yet.  We can now use PGSharedMemoryDetach to do that.
+      */
+     PGSharedMemoryDetach();
+ }
+
+ /*
   * PGSharedMemoryDetach
   *
   * Detach from the shared memory segment, if still attached.  This is not
!  * intended for use by the process that originally created the segment
!  * (it will have an on_shmem_exit callback registered to do that).  Rather,
   * this is for subprocesses that have inherited an attachment and want to
   * get rid of it.
+  *
+  * UsedShmemSegID and UsedShmemSegAddr are implicit parameters to this
+  * routine.
   */
  void
  PGSharedMemoryDetach(void)
  {
+     /* Unmap the view, if it's mapped */
      if (UsedShmemSegAddr != NULL)
      {
          if (!UnmapViewOfFile(UsedShmemSegAddr))
!             elog(LOG, "could not unmap view of shared memory: error code %lu",
!                  GetLastError());

          UsedShmemSegAddr = NULL;
      }
+
+     /* And close the shmem handle, if we have one */
+     if (UsedShmemSegID != INVALID_HANDLE_VALUE)
+     {
+         if (!CloseHandle(UsedShmemSegID))
+             elog(LOG, "could not close handle to shared memory: error code %lu",
+                  GetLastError());
+
+         UsedShmemSegID = INVALID_HANDLE_VALUE;
+     }
  }


*************** PGSharedMemoryDetach(void)
*** 326,334 ****
  static void
  pgwin32_SharedMemoryDelete(int status, Datum shmId)
  {
      PGSharedMemoryDetach();
-     if (!CloseHandle(DatumGetPointer(shmId)))
-         elog(LOG, "could not close handle to shared memory: error code %lu", GetLastError());
  }

  /*
--- 373,380 ----
  static void
  pgwin32_SharedMemoryDelete(int status, Datum shmId)
  {
+     Assert(DatumGetPointer(shmId) == UsedShmemSegID);
      PGSharedMemoryDetach();
  }

  /*
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 24e8404..90c2f4a 100644
*** a/src/backend/postmaster/postmaster.c
--- b/src/backend/postmaster/postmaster.c
*************** SubPostmasterMain(int argc, char *argv[]
*** 4628,4634 ****
      /*
       * If appropriate, physically re-attach to shared memory segment. We want
       * to do this before going any further to ensure that we can attach at the
!      * same address the postmaster used.
       */
      if (strcmp(argv[1], "--forkbackend") == 0 ||
          strcmp(argv[1], "--forkavlauncher") == 0 ||
--- 4628,4635 ----
      /*
       * If appropriate, physically re-attach to shared memory segment. We want
       * to do this before going any further to ensure that we can attach at the
!      * same address the postmaster used.  On the other hand, if we choose not
!      * to re-attach, we may have other cleanup to do.
       */
      if (strcmp(argv[1], "--forkbackend") == 0 ||
          strcmp(argv[1], "--forkavlauncher") == 0 ||
*************** SubPostmasterMain(int argc, char *argv[]
*** 4636,4641 ****
--- 4637,4644 ----
          strcmp(argv[1], "--forkboot") == 0 ||
          strncmp(argv[1], "--forkbgworker=", 15) == 0)
          PGSharedMemoryReAttach();
+     else
+         PGSharedMemoryNoReAttach();

      /* autovacuum needs this set before calling InitProcess */
      if (strcmp(argv[1], "--forkavlauncher") == 0)
diff --git a/src/include/storage/pg_shmem.h b/src/include/storage/pg_shmem.h
index 0b169af..9dbcbce 100644
*** a/src/include/storage/pg_shmem.h
--- b/src/include/storage/pg_shmem.h
*************** extern void *UsedShmemSegAddr;
*** 61,66 ****
--- 61,67 ----

  #ifdef EXEC_BACKEND
  extern void PGSharedMemoryReAttach(void);
+ extern void PGSharedMemoryNoReAttach(void);
  #endif

  extern PGShmemHeader *PGSharedMemoryCreate(Size size, bool makePrivate,

Re: Postgres service stops when I kill client backend on Windows

From

Dmitry Vasilyev

Date:

13 October 2015, 00:24:57

Hello Tom!
On Mon, 2015-10-12 at 16:35 -0400, Tom Lane wrote:
> I wrote:
> > This is kind of a mess :-(.  But it does look like what we want is
> > for SubPostmasterMain to do more than nothing when it chooses not to
> > reattach.  Probably that should include resetting UsedShmemSegAddr to
> > NULL, as well as closing the handle.
>
> After poking around a bit more, I propose the attached patch.  I've
> checked that this is happy with an EXEC_BACKEND Unix build, but I'm not
> able to test it on Windows ... would somebody do that?
>
> BTW, it appears from this that Cygwin builds have been broken right along
> in a different way: according to the code in sysv_shmem's
> PGSharedMemoryReAttach, Cygwin does cause a re-attach to occur, which we
> were not undoing for putatively-not-connected-to-shmem child processes.
> That's a robustness problem because it breaks the postmaster's expectation
> that it's safe to not reinitialize shmem after a crash of one of those
> processes.  I believe this patch fixes that problem as well, though if
> anyone can test it on Cygwin that wouldn't be a bad thing either.
>
>             regards, tom lane
>

This patch works for me.
Binaries: https://goo.gl/32j7QE (MSVC 2010; build script: https://github.com/postgrespro/pgwininstall).


------
Dmitry Vasilyev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Postgres service stops when I kill client backend on Windows

From

Michael Paquier

Date:

13 October 2015, 11:06:56

On Tue, Oct 13, 2015 at 5:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> This is kind of a mess :-(.  But it does look like what we want is
>> for SubPostmasterMain to do more than nothing when it chooses not to
>> reattach.  Probably that should include resetting UsedShmemSegAddr to
>> NULL, as well as closing the handle.
>
> After poking around a bit more, I propose the attached patch.  I've
> checked that this is happy with an EXEC_BACKEND Unix build, but I'm not
> able to test it on Windows ... would somebody do that?
>
> BTW, it appears from this that Cygwin builds have been broken right along
> in a different way: according to the code in sysv_shmem's
> PGSharedMemoryReAttach, Cygwin does cause a re-attach to occur, which we
> were not undoing for putatively-not-connected-to-shmem child processes.
> That's a robustness problem because it breaks the postmaster's expectation
> that it's safe to not reinitialize shmem after a crash of one of those
> processes.  I believe this patch fixes that problem as well, though if
> anyone can test it on Cygwin that wouldn't be a bad thing either.

I don't have a Cygwin environment at hand, unfortunately.

Looking at the patch, clearly +1 for the additional routine in both
win32_shmem.c and sysv_shmem.c to clean up the shmem state at backend
level. I have also played with the patch on Windows and it behaves
as expected: without the patch, killing a process with taskkill /f
stops the server outright even if restart_after_crash is on. With the
patch, the server restarts correctly.

(Sorry, I should have mentioned that my last patch was untested and
*surely broken*; it was the result of a 3-minute guess at making the
cleanup more generic for child processes that do not need to be
attached to shmem.)
Regards,
-- 
Michael
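
The cleanup discussed in this message, for a child process that chooses
not to re-attach, comes down to two steps named earlier in the thread:
reset the recorded segment address to NULL and close the inherited handle.
What follows is only a minimal POSIX model of that idea, not the patch
itself. The type ShmemState, the function shmem_no_reattach(), and the use
of a plain file descriptor in place of a Windows mapping handle are all
assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical per-process bookkeeping, loosely modeled on the
 * UsedShmemSegAddr / UsedShmemSegID pair mentioned in the thread.
 */
typedef struct ShmemState
{
    void       *seg_addr;       /* address the parent used; NULL = not attached */
    int         seg_handle;     /* inherited descriptor (stand-in for a handle); -1 = none */
} ShmemState;

/*
 * Called by a child that will not use shared memory.
 * Returns 0 on success, -1 if closing the inherited handle failed.
 */
int
shmem_no_reattach(ShmemState *st)
{
    int         rc = 0;

    assert(st != NULL);

    /* Nothing is mapped here, so just record that we are not attached. */
    st->seg_addr = NULL;

    /* Drop the inherited handle so this process no longer pins the segment. */
    if (st->seg_handle >= 0)
    {
        if (close(st->seg_handle) != 0)
            rc = -1;
        st->seg_handle = -1;
    }
    return rc;
}
```

The function is deliberately idempotent: calling it a second time finds no
handle left and does nothing, which keeps later exit-time cleanup safe.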

Re: Postgres service stops when I kill client backend on Windows

From

Andrew Dunstan

Date:

13 October 2015, 16:49:08


On 10/12/2015 04:35 PM, Tom Lane wrote:
> I wrote:
>> This is kind of a mess :-(.  But it does look like what we want is
>> for SubPostmasterMain to do more than nothing when it chooses not to
>> reattach.  Probably that should include resetting UsedShmemSegAddr to
>> NULL, as well as closing the handle.
> After poking around a bit more, I propose the attached patch.  I've
> checked that this is happy with an EXEC_BACKEND Unix build, but I'm not
> able to test it on Windows ... would somebody do that?
>
> BTW, it appears from this that Cygwin builds have been broken right along
> in a different way: according to the code in sysv_shmem's
> PGSharedMemoryReAttach, Cygwin does cause a re-attach to occur, which we
> were not undoing for putatively-not-connected-to-shmem child processes.
> That's a robustness problem because it breaks the postmaster's expectation
> that it's safe to not reinitialize shmem after a crash of one of those
> processes.  I believe this patch fixes that problem as well, though if
> anyone can test it on Cygwin that wouldn't be a bad thing either.
>

OK, I can test it. But it's not quite clear to me from your description 
how I should test Cygwin.


cheers

andrew

Re: Postgres service stops when I kill client backend on Windows

From

Tom Lane

Date:

13 October 2015, 16:57:36

Andrew Dunstan <andrew@dunslane.net> writes:
> On 10/12/2015 04:35 PM, Tom Lane wrote:
>> BTW, it appears from this that Cygwin builds have been broken right along
>> in a different way: according to the code in sysv_shmem's
>> PGSharedMemoryReAttach, Cygwin does cause a re-attach to occur, which we
>> were not undoing for putatively-not-connected-to-shmem child processes.
>> That's a robustness problem because it breaks the postmaster's expectation
>> that it's safe to not reinitialize shmem after a crash of one of those
>> processes.  I believe this patch fixes that problem as well, though if
>> anyone can test it on Cygwin that wouldn't be a bad thing either.

> OK, I can test it. But it's not quite clear to me from your description 
> how I should test Cygwin.

The point is that I think that right now, the logging collector subprocess
remains connected to shared memory, which it should not (and won't, if my
patch is doing the right thing).  I do not know if there's an easy way to
inspect the process state to verify that on Windows.

If nothing else, you could put a bogus access to some shared-memory data
structure into the syslogger loop, and check that it succeeds now and
crashes after applying the patch ...
        regards, tom lane
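
The "bogus access" check suggested above can be modeled outside PostgreSQL.
The sketch below is an assumption-laden POSIX illustration, not syslogger
code: the function name child_survives_shmem_access() is hypothetical, an
anonymous shared mmap() region stands in for the shared memory segment, and
munmap() in the child stands in for what a no-reattach cleanup would achieve.
A child that is still attached reads the segment and exits cleanly; a child
that detached first is expected to be killed by a signal when it touches
the same address.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS on glibc */
#endif

#include <assert.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that reads one word from a shared mapping.
 * If detach_first is nonzero, the child unmaps the region before reading.
 * Returns 1 if the child exited cleanly, 0 if it died or read a wrong
 * value, and -1 if the experiment could not be set up.
 */
int
child_survives_shmem_access(int detach_first)
{
    long        page = sysconf(_SC_PAGESIZE);
    size_t      len;
    void       *raw;
    volatile int *seg;
    pid_t       pid;
    int         status;

    assert(page > 0);
    len = (size_t) page;

    raw = mmap(NULL, len, PROT_READ | PROT_WRITE,
               MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return -1;
    seg = raw;
    seg[0] = 42;                /* the "shared data structure" */

    pid = fork();
    if (pid < 0)
    {
        munmap(raw, len);
        return -1;
    }
    if (pid == 0)
    {
        if (detach_first)
            munmap(raw, len);   /* child is no longer attached */
        /* The bogus access: faults if the mapping is gone. */
        _exit(seg[0] == 42 ? 0 : 1);
    }

    if (waitpid(pid, &status, 0) < 0)
    {
        munmap(raw, len);
        return -1;
    }
    munmap(raw, len);
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 1 : 0;
}
```

In this model, child_survives_shmem_access(0) corresponds to the unpatched
behavior (the access succeeds) and child_survives_shmem_access(1) to the
patched behavior (the access crashes the child).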

Re: Postgres service stops when I kill client backend on Windows

From

Tom Lane

Date:

13 October 2015, 18:29:07

Michael Paquier <michael.paquier@gmail.com> writes:
> On Tue, Oct 13, 2015 at 5:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> After poking around a bit more, I propose the attached patch.  I've
>> checked that this is happy with an EXEC_BACKEND Unix build, but I'm not
>> able to test it on Windows ... would somebody do that?

> Looking at the patch, clearly +1 for the additional routine in both
> win32_shmem.c and sysv_shmem.c to clean up the shmem state at backend
> level. I have also played with the patch on Windows and it behaves
> as expected: without the patch, killing a process with taskkill /f
> stops the server outright even if restart_after_crash is on. With the
> patch, the server restarts correctly.

OK, pushed with some additional comment-smithing.

I noticed while looking at this that for subprocesses that aren't supposed
to be attached to shared memory, we do pgwin32_ReserveSharedMemoryRegion()
anyway in internal_forkexec(), and then that's never undone anywhere,
so that that segment of the subprocess's memory space remains reserved.
I'm not sure if this is worth changing, but if it is, we could do so now
by calling VirtualFree() in PGSharedMemoryNoReAttach().

BTW, I am suspicious that the DSM stuff may have related issues --- do
we use inheritable mapping handles for DSM segments on Windows?
        regards, tom lane
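
The VirtualFree() idea mentioned above is about handing back address space
that was reserved but is never going to be used. The sketch below shows only
the reserve/release pairing, under stated assumptions: reserve_region() and
release_region() are hypothetical helper names, the reservation is made in
the calling process rather than by a parent on behalf of a child, and the
non-Windows branch uses mmap()/munmap() with PROT_NONE as a rough analog so
the example builds elsewhere. It is not the code in internal_forkexec() or
PGSharedMemoryNoReAttach().

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS on glibc */
#endif

#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
#else
#include <sys/mman.h>
#endif

/*
 * Reserve a range of address space without committing any memory.
 * Returns the base address, or NULL on failure.
 */
void *
reserve_region(size_t size)
{
    assert(size > 0);
#ifdef _WIN32
    return VirtualAlloc(NULL, size, MEM_RESERVE, PAGE_NOACCESS);
#else
    void       *p = mmap(NULL, size, PROT_NONE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    return (p == MAP_FAILED) ? NULL : p;
#endif
}

/*
 * Release a reservation made by reserve_region().
 * Returns 1 on success, 0 on failure.
 */
int
release_region(void *addr, size_t size)
{
#ifdef _WIN32
    (void) size;                /* MEM_RELEASE requires a size of 0 */
    return VirtualFree(addr, 0, MEM_RELEASE) ? 1 : 0;
#else
    return munmap(addr, size) == 0 ? 1 : 0;
#endif
}
```

A child that never attaches would call the release step once, early, with
the same base address that was reserved for the segment.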

Re: Postgres service stops when I kill client backend on Windows

From

Amit Kapila

Date:

14 October 2015, 06:58:59

On Tue, Oct 13, 2015 at 8:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Michael Paquier <michael.paquier@gmail.com> writes:
> > On Tue, Oct 13, 2015 at 5:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> After poking around a bit more, I propose the attached patch. I've
> >> checked that this is happy with an EXEC_BACKEND Unix build, but I'm not
> >> able to test it on Windows ... would somebody do that?
>
> > Looking at the patch, clearly +1 for the additional routine in both
> > win32_shmem.c and sysv_shmem.c to clean up the shmem state at backend
> > level. I have also played with the patch on Windows and it behaves
> > as expected: without the patch, killing a process with taskkill /f
> > stops the server outright even if restart_after_crash is on. With the
> > patch, the server restarts correctly.
>
> OK, pushed with some additional comment-smithing.
>
> I noticed while looking at this that for subprocesses that aren't supposed
> to be attached to shared memory, we do pgwin32_ReserveSharedMemoryRegion()
> anyway in internal_forkexec(), and then that's never undone anywhere,
> so that that segment of the subprocess's memory space remains reserved.
> I'm not sure if this is worth changing, but if it is, we could do so now
> by calling VirtualFree() in PGSharedMemoryNoReAttach().
>

I think it is worth doing, as it can save memory for processes that
don't attach to shared memory. Another thing is that we allocate
handles (by duplicating them) in save_backend_variables(), and I am
not sure they are required for all the processes; anyway, this doesn't
seem worth the trouble.

> BTW, I am suspicious that the DSM stuff may have related issues --- do
> we use inheritable mapping handles for DSM segments on Windows?
>

Not by default. There is an API, dsm_pin_segment(), which duplicates the
handle for the postmaster process so that the shared memory segment is
retained until postmaster shutdown. In general, I don't see such issues
for DSM, but please point out anything you find problematic.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
