Обсуждение: client waits for end of update operation and server proc is idle

Поиск
Список
Период
Сортировка

client waits for end of update operation and server proc is idle

От
Matthias Apitz
Дата:
Hello,

We face (somehow reproducible) the following situation with a 13.1
server and a client written in ESQL/C:

# ps ax | grep post
...
27106 ?        Ss     0:00 postgres: sisis sisis ::1(49518) idle in transaction

# lsof | grep 49518
INDEX     27074                      sisis    2u     IPv6           26465150       0t0        TCP
localhost:49518->localhost:postgresql(ESTABLISHED)
 
postmaste 27106                   postgres    9u     IPv6           26464149       0t0        TCP
localhost:postgresql->localhost:49518(ESTABLISHED)
 


# psql -Usisis -dsisis
psql (11.4, server 13.1)
WARNING: psql major version 11, server major version 13.
         Some psql features might not work.
Type "help" for help.

sisis=# select * from pg_stat_activity where client_port = 49518;

 datid  | datname |  pid  | leader_pid | usesysid | usename |        application_name         | client_addr |
client_hostname| client_port |          backend_start          |           xact_start            |
query_start          |          state_change           | wait_event_type | wait_event |        state        |
backend_xid| backend_xmin |        query        |  backend_type
 

--------+---------+-------+------------+----------+---------+---------------------------------+-------------+-----------------+-------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+-----------------+------------+---------------------+-------------+--------------+---------------------+----------------
 951941 | sisis   | 27106 |            |    16384 | sisis   | SunRise DBCALL V7.2 (pid=27074) | ::1         |
     |       49518 | 23.04.2021 05:57:19.536206 CEST | 23.04.2021 05:57:19.562102 CEST | 23.04.2021 05:57:19.664471
CEST| 23.04.2021 05:57:19.664495 CEST | Client          | ClientRead | idle in transaction |    18228156 |     18228156
|fetch per_kette_seq | client backend
 
(1 row)


i.e. the server is in 'ClientRead | idle in transaction';


The ESQL/C code the client is executing at this moment is:

  /*
   * update the actual row
   */
  posDebug(POSDEBUG_FLOW, "ES UPDATE per_kette set .. WHERE CURRENT OF hc_per_kette");
  EXEC SQL UPDATE per_kette
      SET  desknr =:hrec_per_kette.desknr,
      frei =:hrec_per_kette.frei,
      modif =:hrec_per_kette.modif,
      anz =:hrec_per_kette.anz,
      katlist =:hrec_per_kette.katlist
  WHERE CURRENT OF hc_per_kette;

An attached GDB shows:

# gdb /opt/lib/sisis/catserver/bin/INDEX 27074

#0  0x00007f1e65c1b5c4 in poll () from /lib64/libc.so.6
#1  0x00007f1e648ac22f in pqSocketPoll () from /usr/local/sisis-pap/pgsql/lib/libpq.so.5
#2  0x00007f1e648ac0ff in pqSocketCheck () from /usr/local/sisis-pap/pgsql/lib/libpq.so.5
#3  0x00007f1e648abf83 in pqWaitTimed () from /usr/local/sisis-pap/pgsql/lib/libpq.so.5
#4  0x00007f1e648abf55 in pqWait () from /usr/local/sisis-pap/pgsql/lib/libpq.so.5
#5  0x00007f1e648a7d3a in PQgetResult () from /usr/local/sisis-pap/pgsql/lib/libpq.so.5
#6  0x00007f1e648a8478 in PQexecFinish () from /usr/local/sisis-pap/pgsql/lib/libpq.so.5
#7  0x00007f1e648a812e in PQexecParams () from /usr/local/sisis-pap/pgsql/lib/libpq.so.5
#8  0x00007f1e65818216 in ecpg_execute ()
   from /usr/local/sisis-pap/pgsql/lib/libecpg.so.6
#9  0x00007f1e658195de in ecpg_do () from /usr/local/sisis-pap/pgsql/lib/libecpg.so.6
#10 0x00007f1e658196ef in ECPGdo () from /usr/local/sisis-pap/pgsql/lib/libecpg.so.6
#11 0x00007f1e67e091c6 in update_record () at per_kette.pgc:1968
#12 0x00007f1e67e04d16 in per_kette (zugriff=10, scroll=-20000, lock=-20000, key=-20000,
    sto=-20000, p_daten=0x7fff4e6db9b0, sel_anw=0x0, where_anw=0x0, p_btw_daten=0x0,
    order_by=0x0, auf_ab=0x0, group_by=0x0, having=0x0, into_temp=0x0,
    count=0x7fff4e6da6d0) at per_kette.pgc:497
#13 0x00007f1e67d7d70f in DB_rewr (tabmodul=0x7f1e67e047ca <per_kette>,
    p_daten=0x7fff4e6db9b0) at dbcall.pgc:3274
#14 0x00007f1e68ade821 in BKDeskInsert (setid=FSTAB_Personen, kette=...,
    katkey=168952979) at ./BKDesk.c:1447
#15 0x00007f1e68ae79c1 in InvAufnehmen (setid=FSTAB_Personen, katkey=168952979)
#16 0x00007f1e68ae6e17 in BKBearbeiteIndexAuftrag (setid=FSTAB_Personen, auftrag=...)
    at ./BKInvert.c:109
#17 0x0000000000405384 in main (argc=3, argv=0x7fff4e6ec1d8) at ./INDEX.c:525

i.e. our client issued ECPGdo() and now waits on finish; the provided
data looks fine:

(gdb) p hrec_per_kette.desknr
$2 = 1384118
(gdb) p hrec_per_kette.katlist
value requires 130000 bytes, which is more than max-value-size

(gdb) x/1s hrec_per_kette.katlist
0x7f1e6864af6e <hrec_per_kette+14>:
"\\xee9e01001eaf010057af010005ce010082d20100d4f801008e0a0200af340200253a0200a58e0200ae8e020044a602003ea7020020f1030024f1030054170400a04304003f670400ba8904003bed000005490100925201007454010083600100828301"...

the server pocess is in epoll_pwait():

# gdb /usr/local/sisis-pap/pgsql-13.0/bin/postmaster 27106
...
(gdb) bt
#0  0x00007fe7ee953fe0 in epoll_pwait () from /lib64/libc.so.6
#1  0x00000000008acf10 in WaitEventSetWaitBlock ()
#2  0x00000000008acde9 in WaitEventSetWaitBlock ()
#3  0x000000000073c5a5 in check_ssl_key_file_permissions ()
#4  0x000000000074636e in RemoveSocketFiles ()
#5  0x000000000074662b in pq_getbyte_if_available ()
#6  0x000000000074697b in pq_getmessage ()
#7  0x00000000008d543b in pg_analyze_and_rewrite_params ()
#8  0x00007f54fec14780 in ?? ()
#9  0x00007ffffec148a0 in ?? ()
#10 0x00007ffffec14780 in ?? ()
#11 0x00000053008d3b1b in ?? ()
#12 0x00007ffffec14780 in ?? ()
#13 0x00000000008d5482 in pg_analyze_and_rewrite_params ()

The client has a log file for this update which shows the actual time of
this operation:

# ls -l --full-time /var/spool/sisis/catserver/tmp/.lock_index-A.sisis.2
-rw-rw---- 1 sisis sisis 6 2021-04-23 05:57:19.529476471 +0200 /var/spool/sisis/catserver/tmp/.lock_index-A.sisis.2


The serverlog has around this time (sorry for German):

2021-04-23 05:55:23.591 CEST [2317] LOG:  unvollständige Message vom Client
2021-04-23 05:55:23.593 CEST [2317] FEHLER:  Speicher aufgebraucht
2021-04-23 05:55:23.593 CEST [2317] DETAIL:  Kann Zeichenkettenpuffer mit 0 Bytes nicht um 1380986158 Bytes
vergrößern.
2021-04-23 05:55:23.593 CEST [2317] LOG:  konnte Daten nicht an den Client senden: Datenübergabe unterbrochen (broken
pipe)
2021-04-23 05:55:23.593 CEST [2317] FATAL:  Verbindung zum Client wurde verloren
2021-04-23 05:57:20.339 CEST [27190] LOG:  unerwartetes EOF auf Client-Verbindung mit einer offenen Transaktion
2021-04-23 05:59:17.419 CEST [26827] LOG:  unerwartetes EOF auf Client-Verbindung mit einer offenen Transaktion

What else could be checked for this? Any hints about this issue?

Thanks

    matthias
-- 
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub



Re: client waits for end of update operation and server proc is idle

От
Laurenz Albe
Дата:
On Fri, 2021-04-23 at 10:15 +0200, Matthias Apitz wrote:
> We face (somehow reproducible) the following situation with a 13.1
> server and a client written in ESQL/C:
> 
> # ps ax | grep post
> ...
> 27106 ?        Ss     0:00 postgres: sisis sisis ::1(49518) idle in transaction
> 
> # lsof | grep 49518
> INDEX     27074                      sisis    2u     IPv6           26465150       0t0        TCP
localhost:49518->localhost:postgresql(ESTABLISHED)
 
> postmaste 27106                   postgres    9u     IPv6           26464149       0t0        TCP
localhost:postgresql->localhost:49518(ESTABLISHED)
 
> 
> sisis=# select * from pg_stat_activity where client_port = 49518;
> 
> [...] ClientRead | idle in transaction [...]
> 
> The ESQL/C code the client is executing at this moment is:
> 
>   /*
>    * update the actual row
>    */
>   posDebug(POSDEBUG_FLOW, "ES UPDATE per_kette set .. WHERE CURRENT OF hc_per_kette");
>   EXEC SQL UPDATE per_kette
>       SET  desknr =:hrec_per_kette.desknr,
>       frei =:hrec_per_kette.frei,
>       modif =:hrec_per_kette.modif,
>       anz =:hrec_per_kette.anz,
>       katlist =:hrec_per_kette.katlist
>   WHERE CURRENT OF hc_per_kette;
> 
> An attached GDB shows:
> 
> # gdb /opt/lib/sisis/catserver/bin/INDEX 27074
> 
> #0  0x00007f1e65c1b5c4 in poll () from /lib64/libc.so.6
> #1  0x00007f1e648ac22f in pqSocketPoll () from /usr/local/sisis-pap/pgsql/lib/libpq.so.5
> #2  0x00007f1e648ac0ff in pqSocketCheck () from /usr/local/sisis-pap/pgsql/lib/libpq.so.5
> #3  0x00007f1e648abf83 in pqWaitTimed () from /usr/local/sisis-pap/pgsql/lib/libpq.so.5
> #4  0x00007f1e648abf55 in pqWait () from /usr/local/sisis-pap/pgsql/lib/libpq.so.5
> #5  0x00007f1e648a7d3a in PQgetResult () from /usr/local/sisis-pap/pgsql/lib/libpq.so.5
> #6  0x00007f1e648a8478 in PQexecFinish () from /usr/local/sisis-pap/pgsql/lib/libpq.so.5
> #7  0x00007f1e648a812e in PQexecParams () from /usr/local/sisis-pap/pgsql/lib/libpq.so.5
> #8  0x00007f1e65818216 in ecpg_execute ()
>    from /usr/local/sisis-pap/pgsql/lib/libecpg.so.6
> #9  0x00007f1e658195de in ecpg_do () from /usr/local/sisis-pap/pgsql/lib/libecpg.so.6
> #10 0x00007f1e658196ef in ECPGdo () from /usr/local/sisis-pap/pgsql/lib/libecpg.so.6
>
> i.e. our client issued ECPGdo() and now waits on finish; the provided
> data looks fine:
> 
> the server pocess is in epoll_pwait():
> 
> # gdb /usr/local/sisis-pap/pgsql-13.0/bin/postmaster 27106
> ...
> (gdb) bt
> #0  0x00007fe7ee953fe0 in epoll_pwait () from /lib64/libc.so.6
> #1  0x00000000008acf10 in WaitEventSetWaitBlock ()
> #2  0x00000000008acde9 in WaitEventSetWaitBlock ()
> #3  0x000000000073c5a5 in check_ssl_key_file_permissions ()
> #4  0x000000000074636e in RemoveSocketFiles ()
> #5  0x000000000074662b in pq_getbyte_if_available ()
> #6  0x000000000074697b in pq_getmessage ()
> #7  0x00000000008d543b in pg_analyze_and_rewrite_params ()
> 
> The client has a log file for this update which shows the actual time of
> this operation:
> 
> # ls -l --full-time /var/spool/sisis/catserver/tmp/.lock_index-A.sisis.2
> -rw-rw---- 1 sisis sisis 6 2021-04-23 05:57:19.529476471 +0200 /var/spool/sisis/catserver/tmp/.lock_index-A.sisis.2
> 
> 
> The serverlog has around this time (sorry for German):
> 
> 2021-04-23 05:55:23.591 CEST [2317] LOG:  unvollständige Message vom Client
> 2021-04-23 05:55:23.593 CEST [2317] FEHLER:  Speicher aufgebraucht
> 2021-04-23 05:55:23.593 CEST [2317] DETAIL:  Kann Zeichenkettenpuffer mit 0 Bytes nicht um 1380986158 Bytes
vergrößern.
> 2021-04-23 05:55:23.593 CEST [2317] LOG:  konnte Daten nicht an den Client senden: Datenübergabe unterbrochen (broken
pipe)
> 2021-04-23 05:55:23.593 CEST [2317] FATAL:  Verbindung zum Client wurde verloren
> 2021-04-23 05:57:20.339 CEST [27190] LOG:  unerwartetes EOF auf Client-Verbindung mit einer offenen Transaktion
> 2021-04-23 05:59:17.419 CEST [26827] LOG:  unerwartetes EOF auf Client-Verbindung mit einer offenen Transaktion
> 
> What else could be checked for this? Any hints about this issue?

It seems pretty clear that both the client and the server are waiting for each other.

The server is of the opinion that it has completed the query, while the client is
still waiting for the response.

I would look for the problem in the network between both.  The error messages
(unexpected EOF on client connection, broken pipe) seem to point in the same direction.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: client waits for end of update operation and server proc is idle

От
Karsten Hilbert
Дата:
Am Fri, Apr 23, 2021 at 10:48:24AM +0200 schrieb Laurenz Albe:

> > The serverlog has around this time (sorry for German):
> >
> > 2021-04-23 05:55:23.591 CEST [2317] LOG:  unvollständige Message vom Client
> > 2021-04-23 05:55:23.593 CEST [2317] FEHLER:  Speicher aufgebraucht
> > 2021-04-23 05:55:23.593 CEST [2317] DETAIL:  Kann Zeichenkettenpuffer mit 0 Bytes nicht um 1380986158 Bytes
vergrößern.
> > 2021-04-23 05:55:23.593 CEST [2317] LOG:  konnte Daten nicht an den Client senden: Datenübergabe unterbrochen
(brokenpipe) 
> > 2021-04-23 05:55:23.593 CEST [2317] FATAL:  Verbindung zum Client wurde verloren
> > 2021-04-23 05:57:20.339 CEST [27190] LOG:  unerwartetes EOF auf Client-Verbindung mit einer offenen Transaktion
> > 2021-04-23 05:59:17.419 CEST [26827] LOG:  unerwartetes EOF auf Client-Verbindung mit einer offenen Transaktion
> >
> > What else could be checked for this? Any hints about this issue?
>
> It seems pretty clear that both the client and the server are waiting for each other.
>
> The server is of the opinion that it has completed the query, while the client is
> still waiting for the response.
>
> I would look for the problem in the network between both.  The error messages
> (unexpected EOF on client connection, broken pipe) seem to point in the same direction.

It also reports an out-of-memory situation:

>> 2021-04-23 05:55:23.593 CEST [2317] FEHLER:  Speicher aufgebraucht
>> 2021-04-23 05:55:23.593 CEST [2317] DETAIL:  Kann Zeichenkettenpuffer mit 0 Bytes nicht um 1380986158 Bytes
vergrößern.

It seems what is to be sent to the client is "too large".

Following which sending the data seems (intentionally?)
aborted by the server, which the other side likely reports as
loss-of-signal.

Also, where does the "imcomplete message from client"
originate from ?

Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B



Re: client waits for end of update operation and server proc is idle

От
Matthias Apitz
Дата:
El día viernes, abril 23, 2021 a las 10:48:24a. m. +0200, Laurenz Albe escribió:

> > The serverlog has around this time (sorry for German):
> >
> > 2021-04-23 05:55:23.591 CEST [2317] LOG:  unvollständige Message vom Client
> > 2021-04-23 05:55:23.593 CEST [2317] FEHLER:  Speicher aufgebraucht
> > 2021-04-23 05:55:23.593 CEST [2317] DETAIL:  Kann Zeichenkettenpuffer mit 0 Bytes nicht um 1380986158 Bytes
vergrößern.
> > 2021-04-23 05:55:23.593 CEST [2317] LOG:  konnte Daten nicht an den Client senden: Datenübergabe unterbrochen
(brokenpipe) 
> > 2021-04-23 05:55:23.593 CEST [2317] FATAL:  Verbindung zum Client wurde verloren
> > 2021-04-23 05:57:20.339 CEST [27190] LOG:  unerwartetes EOF auf Client-Verbindung mit einer offenen Transaktion
> > 2021-04-23 05:59:17.419 CEST [26827] LOG:  unerwartetes EOF auf Client-Verbindung mit einer offenen Transaktion
> >
> > What else could be checked for this? Any hints about this issue?
>
> It seems pretty clear that both the client and the server are waiting for each other.

Yes, they are both alive as processes. Both are running on the same
Linux server, i.e. no real network between. And any loss of data would
have caused perhaps the end of one (and then the other) process.
I will try to gather a TCPDUMP for this.

>
> The server is of the opinion that it has completed the query, while the client is
> still waiting for the response.
>
> I would look for the problem in the network between both.  The error messages
> (unexpected EOF on client connection, broken pipe) seem to point in the same direction.


The other 'unerwartetes EOF auf Client' (unexpected EOF client) are
logged by other PostgreSQL server processes, not invloved here, and
caused by killing other clients with 'kill -9'.

    matthias
--
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub
¡Con Cuba no te metas!  «»  Don't mess with Cuba!  «»  Leg Dich nicht mit Kuba an!
http://www.cubadebate.cu/noticias/2020/12/25/en-video-con-cuba-no-te-metas/



Re: client waits for end of update operation and server proc is idle

От
Matthias Apitz
Дата:
El día viernes, abril 23, 2021 a las 11:32:02a. m. +0200, Matthias Apitz escribió:

> El día viernes, abril 23, 2021 a las 10:48:24a. m. +0200, Laurenz Albe escribió:
> 
> > > The serverlog has around this time (sorry for German):
> > > 
> > > 2021-04-23 05:55:23.591 CEST [2317] LOG:  unvollständige Message vom Client
> > > 2021-04-23 05:55:23.593 CEST [2317] FEHLER:  Speicher aufgebraucht
> > > 2021-04-23 05:55:23.593 CEST [2317] DETAIL:  Kann Zeichenkettenpuffer mit 0 Bytes nicht um 1380986158 Bytes
vergrößern.
> > > 2021-04-23 05:55:23.593 CEST [2317] LOG:  konnte Daten nicht an den Client senden: Datenübergabe unterbrochen
(brokenpipe)
 
> > > 2021-04-23 05:55:23.593 CEST [2317] FATAL:  Verbindung zum Client wurde verloren
> > > 2021-04-23 05:57:20.339 CEST [27190] LOG:  unerwartetes EOF auf Client-Verbindung mit einer offenen Transaktion
> > > 2021-04-23 05:59:17.419 CEST [26827] LOG:  unerwartetes EOF auf Client-Verbindung mit einer offenen Transaktion
> > > 
> > > What else could be checked for this? Any hints about this issue?
> > 
> > It seems pretty clear that both the client and the server are waiting for each other.
> 
> Yes, they are both alive as processes. Both are running on the same
> Linux server, i.e. no real network between. And any loss of data would
> have caused perhaps the end of one (and then the other) process.
> I will try to gather a TCPDUMP for this.
> 

I hit the same situation again. The ESQL/C client sent an UPDATE to the
PostgreSQL server:

# gdb /opt/lib/sisis/catserver/bin/INDEX 31864
...
#0  0x00007f688bb955c4 in poll () from /lib64/libc.so.6
#1  0x00007f688a82622f in pqSocketPoll () from /usr/local/sisis-pap/pgsql/lib/libpq.so.5
#2  0x00007f688a8260ff in pqSocketCheck () from /usr/local/sisis-pap/pgsql/lib/libpq.so.5
#3  0x00007f688a825f83 in pqWaitTimed () from /usr/local/sisis-pap/pgsql/lib/libpq.so.5
#4  0x00007f688a825f55 in pqWait () from /usr/local/sisis-pap/pgsql/lib/libpq.so.5
#5  0x00007f688a821d3a in PQgetResult () from /usr/local/sisis-pap/pgsql/lib/libpq.so.5
#6  0x00007f688a822478 in PQexecFinish () from /usr/local/sisis-pap/pgsql/lib/libpq.so.5
#7  0x00007f688a82212e in PQexecParams () from /usr/local/sisis-pap/pgsql/lib/libpq.so.5
#8  0x00007f688b792216 in ecpg_execute ()
   from /usr/local/sisis-pap/pgsql/lib/libecpg.so.6
#9  0x00007f688b7935de in ecpg_do () from /usr/local/sisis-pap/pgsql/lib/libecpg.so.6
#10 0x00007f688b7936ef in ECPGdo () from /usr/local/sisis-pap/pgsql/lib/libecpg.so.6
#11 0x00007f688dd831c6 in update_record () at per_kette.pgc:1968
#12 0x00007f688dd7ed16 in per_kette (zugriff=10, scroll=-20000, lock=-20000, key=-20000, 
    sto=-20000, p_daten=0x7ffef0499950, sel_anw=0x0, where_anw=0x0, p_btw_daten=0x0, 
    order_by=0x0, auf_ab=0x0, group_by=0x0, having=0x0, into_temp=0x0, 
    count=0x7ffef0498670) at per_kette.pgc:497
#13 0x00007f688dcf770f in DB_rewr (tabmodul=0x7f688dd7e7ca <per_kette>, 
    p_daten=0x7ffef0499950) at dbcall.pgc:3274
#14 0x00007f688ea58821 in BKDeskInsert (setid=FSTAB_Personen, kette=..., 
    katkey=168952979) at ./BKDesk.c:1447


and the Pos server sits idle:

# select * from pg_stat_activity where client_port = 52288;

 datid  | datname |  pid  | leader_pid | usesysid | usename |        application_name         | client_addr |
client_hostname| client_port |          backend_start          |           xact_start            |
query_start          |          state_change           | wait_event_type | wait_event |        state        |
backend_xid| backend_xmin |        query        |  backend_type  
 

--------+---------+-------+------------+----------+---------+---------------------------------+-------------+-----------------+-------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+-----------------+------------+---------------------+-------------+--------------+---------------------+----------------
 951941 | sisis   | 31894 |            |    16384 | sisis   | SunRise DBCALL V7.2 (pid=31864) | 127.0.0.1   |
     |       52288 | 25.04.2021 10:57:15.922137 CEST | 25.04.2021 10:57:15.968081 CEST | 25.04.2021 10:57:16.051256
CEST| 25.04.2021 10:57:16.051273 CEST | Client          | ClientRead | idle in transaction |    18314796 |     18314796
|fetch per_kette_seq | client backend
 


But this time I was collecting all TCP traffic of all clients to and
from the port 5432. This TCPDUMP command writes all traffic as binary
data to files of 100.000.000 bytes in size (-C 100) and appends a number
when a new file is needed:

# /usr/sbin/tcpdump -n -i lo -C 100 -w /var/spool/sisis/tcp5432. port 5432 &

Later one can look into the files for the local port 52288 of the
connection with:

# tcpdump -n -r tcp5432.2 -Xx -s1024 port 52288 > INDEX.52288.txt


10:57:15.921783 IP 127.0.0.1.52288 > 127.0.0.1.5432: Flags [S], seq 3652783347, win 65495, options [mss 65495,sackOK,TS
val3424445579 ecr 0,nop,wscale 7], length 0
 
    0x0000:  4500 003c b720 4000 4006 8599 7f00 0001  E..<..@.@.......
    0x0010:  7f00 0001 cc40 1538 d9b9 0cf3 0000 0000  .....@.8........
    0x0020:  a002 ffd7 fe30 0000 0204 ffd7 0402 080a  .....0..........
    0x0030:  cc1c e48b 0000 0000 0103 0307            ............
10:57:15.921792 IP 127.0.0.1.5432 > 127.0.0.1.52288: Flags [S.], seq 4241480135, ack 3652783348, win 65483, options
[mss65495,sackOK,TS val 3424445579 ecr 3424445579,nop,wscale 7], length 0
 
    0x0000:  4500 003c 0000 4000 4006 3cba 7f00 0001  E..<..@.@.<.....
    0x0010:  7f00 0001 1538 cc40 fccf d9c7 d9b9 0cf4  .....8.@........
    0x0020:  a012 ffcb fe30 0000 0204 ffd7 0402 080a  .....0..........
    0x0030:  cc1c e48b cc1c e48b 0103 0307            ............
10:57:15.921799 IP 127.0.0.1.52288 > 127.0.0.1.5432: Flags [.], ack 1, win 512, options [nop,nop,TS val 3424445579 ecr
3424445579],length 0
 
    0x0000:  4500 0034 b721 4000 4006 85a0 7f00 0001  E..4.!@.@.......
    0x0010:  7f00 0001 cc40 1538 d9b9 0cf4 fccf d9c8  .....@.8........
    0x0020:  8010 0200 fe28 0000 0101 080a cc1c e48b  .....(..........
    0x0030:  cc1c e48b                                ....



And here you see the last command (the UPDATE) sent to the Pos server:

10:57:16.051388 IP 127.0.0.1.52288 > 127.0.0.1.5432: Flags [P.], seq 14407:14812, ack 120304, win 512, options
[nop,nop,TSval 3424445708 ecr 3424445708], length 405
 
    0x0000:  4500 01c9 b84f 4000 4006 82dd 7f00 0001  E....O@.@.......
    0x0010:  7f00 0001 cc40 1538 d9b9 453a fcd1 afb7  .....@.8..E:....
    0x0020:  8018 0200 ffbd 0000 0101 080a cc1c e50c  ................
    0x0030:  cc1c e50c 5000 0000 7e00 7570 6461 7465  ....P...~.update
    0x0040:  2070 6572 5f6b 6574 7465 2073 6574 2064  .per_kette.set.d
    0x0050:  6573 6b6e 7220 3d20 2431 2020 2c20 6672  esknr.=.$1..,.fr
    0x0060:  6569 203d 2024 3220 202c 206d 6f64 6966  ei.=.$2..,.modif
    0x0070:  203d 2024 3320 202c 2061 6e7a 203d 2024  .=.$3..,.anz.=.$
    0x0080:  3420 202c 206b 6174 6c69 7374 203d 2024  4..,.katlist.=.$
    0x0090:  3520 2077 6865 7265 2063 7572 7265 6e74  5..where.current
    0x00a0:  206f 6620 6863 5f70 6572 5f6b 6574 7465  .of.hc_per_kette
    0x00b0:  0000 0042 0000 00ff 0000 0000 0005 0000  ...B............
    0x00c0:  0007 3133 3834 3131 3800 0000 0130 0000  ..1384118....0..
    0x00d0:  0001 3000 0000 0232 3600 0000 d25c 7865  ..0....26....\xe
    0x00e0:  6539 6530 3130 3031 6561 6630 3130 3035  e9e01001eaf01005
    0x00f0:  3761 6630 3130 3030 3563 6530 3130 3038  7af010005ce01008
    0x0100:  3264 3230 3130 3064 3466 3830 3130 3038  2d20100d4f801008
    0x0110:  6530 6130 3230 3061 6633 3430 3230 3032  e0a0200af3402002
    0x0120:  3533 6130 3230 3061 3538 6530 3230 3061  53a0200a58e0200a
    0x0130:  6538 6530 3230 3034 3461 3630 3230 3033  e8e020044a602003
    0x0140:  6561 3730 3230 3032 3066 3130 3330 3032  ea7020020f103002
    0x0150:  3466 3130 3330 3035 3431 3730 3430 3061  4f1030054170400a
    0x0160:  3034 3330 3430 3033 6636 3730 3430 3062  04304003f670400b
    0x0170:  6138 3930 3430 3033 6265 6430 3030 3030  a8904003bed00000
    0x0180:  3534 3930 3130 3039 3235 3230 3130 3037  5490100925201007
    0x0190:  3435 3430 3130 3038 3336 3030 3130 3038  4540100836001008
    0x01a0:  3238 3330 3130 3039 3330 3431 3230 6100  28301009304120a.
    0x01b0:  0100 0044 0000 0006 5000 4500 0000 0900  ...D....P.E.....
    0x01c0:  0000 0000 5300 0000 04                   ....S....
10:57:16.051395 IP 127.0.0.1.5432 > 127.0.0.1.52288: Flags [.], ack 14812, win 512, options [nop,nop,TS val 3424445708
ecr3424445708], length 0
 
    0x0000:  4500 0034 453c 4000 4006 f785 7f00 0001  E..4E<@.@.......
    0x0010:  7f00 0001 1538 cc40 fcd1 afb7 d9b9 46cf  .....8.@......F.
    0x0020:  8010 0200 fe28 0000 0101 080a cc1c e50c  .....(..........
    0x0030:  cc1c e50c                                ....

The send buffer is complete, the bytea string \x ....00 is terminated
by a NULL byte. TCP ACK'ed the package and then no further reaction of
the Pos server on this connection. It still sits there if someone needs
more information about this connection.

From the collected data, I tend to say: this must be a bug in the Pos
server...

Any ideas?

    matthias
-- 
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub



Re: client waits for end of update operation and server proc is idle

От
Matthias Apitz
Дата:
El día domingo, abril 25, 2021 a las 01:54:49p. m. +0200, Matthias Apitz escribió:

>From the collected data, I tend to say: this must be a bug in the Pos
> server...
> 

At the end of the day, it turned out that out client caused the problem.

Because we were hunting some other issue, the client was logging some
message with fprintf(stderr, ...) but the fd=2 was not stderr, fd=2 was the
connection to the socket of the PostgreSQL server. And ofc the Pos
server could not understand our human message.

Problem solved. Sorry for all the noise.

    matthias
-- 
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub



Re: client waits for end of update operation and server proc is idle

От
Tom Lane
Дата:
Matthias Apitz <guru@unixarea.de> writes:
> At the end of the day, it turned out that out client caused the problem.
> Because we were hunting some other issue, the client was logging some
> message with fprintf(stderr, ...) but the fd=2 was not stderr, fd=2 was the
> connection to the socket of the PostgreSQL server. And ofc the Pos
> server could not understand our human message.

Hmph.  So why wasn't this visible in the tcpdump trace?

My initial thought was that the server process should log a message
and drop the connection once it received something that didn't look
like a valid Postgres-protocol message, on the assumption that
message-boundary sync could never be recovered reliably.  But on
further thought, it would start by taking the second through fifth
bytes of the input as a message length, and it wouldn't really do
anything interesting until it had collected that much input ---
and, if those bytes were text, that would look like a large number.
So what the server is really doing is waiting for the rest of the
imaginary message, which will never arrive.

Maybe we should have more sanity checking on alleged message lengths.
I recall there is some checking of that sort in libpq, but I'm less
sure about the server.

            regards, tom lane



Re: client waits for end of update operation and server proc is idle

От
Matthias Apitz
Дата:
El día domingo, abril 25, 2021 a las 11:51:45a. m. -0400, Tom Lane escribió:

> Matthias Apitz <guru@unixarea.de> writes:
> > At the end of the day, it turned out that out client caused the problem.
> > Because we were hunting some other issue, the client was logging some
> > message with fprintf(stderr, ...) but the fd=2 was not stderr, fd=2 was the
> > connection to the socket of the PostgreSQL server. And ofc the Pos
> > server could not understand our human message.
> 
> Hmph.  So why wasn't this visible in the tcpdump trace?

It was visible. That's why I detected it while looking up the TCP
packages before the UPDATE:

10:57:16.051326 IP 127.0.0.1.52288 > 127.0.0.1.5432: Flags [P.], seq 14280:14407, ack 120304, win 512, options
[nop,nop,TSval 3424445708 ecr 3424445708], length 127
 
        0x0000:  4500 00b3 b84e 4000 4006 83f4 7f00 0001  E....N@.@.......
        0x0010:  7f00 0001 cc40 1538 d9b9 44bb fcd1 afb7  .....@.8..D.....
        0x0020:  8018 0200 fea7 0000 0101 080a cc1c e50c  ................
        0x0030:  cc1c e50c 5352 502d 3236 3937 363a 2072  ....SRP-26976:.r
        0x0040:  6573 746f 7265 6420 7374 7275 6374 206b  estored.struct.k
        0x0050:  6574 7465 7361 747a 2066 6f72 2073 6574  ettesatz.for.set
        0x0060:  6964 3a32 2c20 6465 736b 3a33 3831 3832  id:2,.desk:38182
        0x0070:  3038 302c 2066 656c 646e 723a 362c 206b  080,.feldnr:6,.k
        0x0080:  6174 6b65 793a 3136 3839 3532 3937 392c  atkey:168952979,
        0x0090:  206b 6574 7465 7361 747a 2e66 7265 693a  .kettesatz.frei:
        0x00a0:  302c 206b 6574 7465 7361 747a 2e61 6e7a  0,.kettesatz.anz
        0x00b0:  3235 0a                                  25.
10:57:16.051388 IP 127.0.0.1.52288 > 127.0.0.1.5432: Flags [P.], seq 14407:14812, ack 120304, win 512, options
[nop,nop,TSval 3424445708 ecr 3424445708], length 405
 
        0x0000:  4500 01c9 b84f 4000 4006 82dd 7f00 0001  E....O@.@.......
        0x0010:  7f00 0001 cc40 1538 d9b9 453a fcd1 afb7  .....@.8..E:....
        0x0020:  8018 0200 ffbd 0000 0101 080a cc1c e50c  ................
        0x0030:  cc1c e50c 5000 0000 7e00 7570 6461 7465  ....P...~.update
        0x0040:  2070 6572 5f6b 6574 7465 2073 6574 2064  .per_kette.set.d
        0x0050:  6573 6b6e 7220 3d20 2431 2020 2c20 6672  esknr.=.$1..,.fr

> 
> My initial thought was that the server process should log a message
> and drop the connection once it received something that didn't look
> like a valid Postgres-protocol message, on the assumption that
> message-boundary sync could never be recovered reliably.  But on
> further thought, it would start by taking the second through fifth
> bytes of the input as a message length, and it wouldn't really do
> anything interesting until it had collected that much input ---
> and, if those bytes were text, that would look like a large number.
> So what the server is really doing is waiting for the rest of the
> imaginary message, which will never arrive.
> 
> Maybe we should have more sanity checking on alleged message lengths.
> I recall there is some checking of that sort in libpq, but I'm less
> sure about the server.

Yes, this would make sense.

    matthias

-- 
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub



Re: client waits for end of update operation and server proc is idle

От
Tom Lane
Дата:
Matthias Apitz <guru@unixarea.de> writes:
> El día domingo, abril 25, 2021 a las 11:51:45a. m. -0400, Tom Lane escribió:
>> Hmph.  So why wasn't this visible in the tcpdump trace?

> It was visible. That's why I detected it while looking up the TCP
> packages before the UPDATE:

> 10:57:16.051326 IP 127.0.0.1.52288 > 127.0.0.1.5432: Flags [P.], seq 14280:14407, ack 120304, win 512, options
[nop,nop,TSval 3424445708 ecr 3424445708], length 127 
>         0x0000:  4500 00b3 b84e 4000 4006 83f4 7f00 0001  E....N@.@.......
>         0x0010:  7f00 0001 cc40 1538 d9b9 44bb fcd1 afb7  .....@.8..D.....
>         0x0020:  8018 0200 fea7 0000 0101 080a cc1c e50c  ................
>         0x0030:  cc1c e50c 5352 502d 3236 3937 363a 2072  ....SRP-26976:.r
>         0x0040:  6573 746f 7265 6420 7374 7275 6374 206b  estored.struct.k
>         0x0050:  6574 7465 7361 747a 2066 6f72 2073 6574  ettesatz.for.set
>         0x0060:  6964 3a32 2c20 6465 736b 3a33 3831 3832  id:2,.desk:38182
>         0x0070:  3038 302c 2066 656c 646e 723a 362c 206b  080,.feldnr:6,.k
>         0x0080:  6174 6b65 793a 3136 3839 3532 3937 392c  atkey:168952979,
>         0x0090:  206b 6574 7465 7361 747a 2e66 7265 693a  .kettesatz.frei:
>         0x00a0:  302c 206b 6574 7465 7361 747a 2e61 6e7a  0,.kettesatz.anz
>         0x00b0:  3235 0a                                  25.

Ah.  On looking at the server code, I see there *is* some validation
on the message type code --- but here, the first byte is 'S' which
happens to be a legal message type, so you got past that check.
Still, the apparent message length is then "RP-2" (0x52502d32)
which is bigger than 1GB so I don't quite understand why you didn't
get an "out of memory" error.

In any case, we could clearly do with some more sanity checking than
is there now.  I'll post a patch for that in a bit.

            regards, tom lane



Re: client waits for end of update operation and server proc is idle

От
Tom Lane
Дата:
Karsten Hilbert <Karsten.Hilbert@gmx.net> writes:
> Am Sun, Apr 25, 2021 at 01:21:25PM -0400 schrieb Tom Lane:
>> Still, the apparent message length is then "RP-2" (0x52502d32)
>> which is bigger than 1GB so I don't quite understand why you didn't
>> get an "out of memory" error.

> But he did, didn't he ?

Those log messages do appear to match the theory, but in the case at
hand, the server seems to have hung up without throwing any error.

However, we'd reproduce that if the apparent "message length" was
less than 1GB (assuming the server process could malloc that much
memory).  That'd occur for any second byte less than 0x40, such
as an ASCII digit.

            regards, tom lane



Re: client waits for end of update operation and server proc is idle

От
Matthias Apitz
Дата:
El día domingo, abril 25, 2021 a las 08:05:21p. m. +0200, Karsten Hilbert escribió:

> > > > The serverlog has around this time (sorry for German):
> > > >
> > > > 2021-04-23 05:55:23.591 CEST [2317] LOG:  unvollständige Message vom Client
> 
>     "incomplete message from client"
> 
> > > > 2021-04-23 05:55:23.593 CEST [2317] FEHLER:  Speicher aufgebraucht
> 
>     "memory exhausted"
> 
> > > > 2021-04-23 05:55:23.593 CEST [2317] DETAIL:  Kann Zeichenkettenpuffer mit 0 Bytes nicht um 1380986158 Bytes
vergrößern.
> 
>     "cannot extend 0-byte string buffer by 1380986158 bytes"
> 
> > > > 2021-04-23 05:55:23.593 CEST [2317] LOG:  konnte Daten nicht an den Client senden: Datenübergabe unterbrochen
(brokenpipe)
 
> 

These messages at 05:55:xx are caused by our automatic deployment of the
software every day which starts at 05:55:00 by cron and short after installing all
software all servers (the PostgreSQL clients) get stopped (i.e. killed)
and restarted. The hung appears a few minutes later at 05:57:xx.

    matthias
-- 
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub



Re: client waits for end of update operation and server proc is idle

От
Matthias Apitz
Дата:
El día domingo, abril 25, 2021 a las 01:54:49p. m. +0200, Matthias Apitz escribió:

...

> # select * from pg_stat_activity where client_port = 52288;
> 
>  datid  | datname |  pid  | leader_pid | usesysid | usename |        application_name         | client_addr |
client_hostname| client_port |          backend_start          |           xact_start            |
query_start          |          state_change           | wait_event_type | wait_event |        state        |
backend_xid| backend_xmin |        query        |  backend_type  
 
>
--------+---------+-------+------------+----------+---------+---------------------------------+-------------+-----------------+-------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+-----------------+------------+---------------------+-------------+--------------+---------------------+----------------
>  951941 | sisis   | 31894 |            |    16384 | sisis   | SunRise DBCALL V7.2 (pid=31864) | 127.0.0.1   |
       |       52288 | 25.04.2021 10:57:15.922137 CEST | 25.04.2021 10:57:15.968081 CEST | 25.04.2021 10:57:16.051256
CEST| 25.04.2021 10:57:16.051273 CEST | Client          | ClientRead | idle in transaction |    18314796 |     18314796
|fetch per_kette_seq | client backend
 

Having solved this issue, one last question: Is there any good way for a
C-written or ESQL/C-written client to get the local port number or even
the PID of the PostgreSQL server process on the other end of the
connection for logging/tracing purpose?

Thanks

    matthias
-- 
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub
¡Con Cuba no te metas!  «»  Don't mess with Cuba!  «»  Leg Dich nicht mit Kuba an!
http://www.cubadebate.cu/noticias/2020/12/25/en-video-con-cuba-no-te-metas/



Re: client waits for end of update operation and server proc is idle

От
Matthias Apitz
Дата:
El día lunes, abril 26, 2021 a las 07:38:40a. m. +0200, Matthias Apitz escribió:

> Having solved this issue, one last question: Is there any good way for a
> C-written or ESQL/C-written client to get the local port number or even
> the PID of the PostgreSQL server process on the other end of the
> connection for logging/tracing purpose?

Ah, I found it as

int PQbackendPID(const PGconn *conn);

One should read firt the fine documentatio befor asking stupid
questions: https://www.postgresql.org/docs/11/libpq-status.html

    matthias

-- 
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub
¡Con Cuba no te metas!  «»  Don't mess with Cuba!  «»  Leg Dich nicht mit Kuba an!
http://www.cubadebate.cu/noticias/2020/12/25/en-video-con-cuba-no-te-metas/