This patch fixes a problem where a backend blocks in pgwin32_waitforsinglesocket()
when it tries to send something to the stats collector.
The patch does two things:
1) pgwin32_waitforsinglesocket(): WaitForMultipleObjectsEx is now called with a
finite timeout (100ms) in the case of FD_WRITE on a UDP socket. If the timeout
expires, pgwin32_waitforsinglesocket() returns EINTR. Reason: as the tests (see
below) show, the process may sleep forever in WaitForMultipleObjectsEx when the
timeout is infinite.
2) pgwin32_send(): add a loop around WSASend and pgwin32_waitforsinglesocket().
Reason: for a UDP socket, an 'ok' result from pgwin32_waitforsinglesocket()
does not guarantee that the socket is still free. It can become busy again, and
the following WSASend call will then fail with a WSAEWOULDBLOCK error.
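The control flow of the two changes can be sketched in portable C. This is only
a simulation under stated assumptions, not the patch itself: sim_socket,
sim_send(), sim_wait_writable() and sim_send_with_retry() are hypothetical
stand-ins for the socket, WSASend, pgwin32_waitforsinglesocket() and
pgwin32_send(), and the scripted counters merely imitate a busy UDP socket and
an expired wait timeout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcome codes; the real code uses Winsock errors and errno. */
typedef enum { SIM_OK, SIM_WOULD_BLOCK, SIM_TIMEOUT } sim_status;

/* Hypothetical fake socket with scripted behaviour. */
typedef struct
{
    int busy_sends;     /* send attempts that will still report "would block" */
    int wait_timeouts;  /* waits that will still expire with a timeout */
    int send_calls;     /* number of send attempts made so far */
    int wait_calls;     /* number of waits made so far */
} sim_socket;

/* Stand-in for WSASend: fails with "would block" while the socket is busy. */
sim_status
sim_send(sim_socket *s)
{
    s->send_calls++;
    if (s->busy_sends > 0)
    {
        s->busy_sends--;
        return SIM_WOULD_BLOCK;
    }
    return SIM_OK;
}

/* Stand-in for the finite-timeout wait: either "ok" or "timed out". */
sim_status
sim_wait_writable(sim_socket *s)
{
    s->wait_calls++;
    if (s->wait_timeouts > 0)
    {
        s->wait_timeouts--;
        return SIM_TIMEOUT;
    }
    return SIM_OK;
}

/*
 * Retry loop: an "ok" from the wait does not guarantee that the next send
 * succeeds, so try again. A timeout is reported to the caller (here as -1)
 * instead of sleeping forever. Returns 0 once the send succeeds.
 */
int
sim_send_with_retry(sim_socket *s)
{
    assert(s != NULL);
    for (;;)
    {
        if (sim_send(s) == SIM_OK)
            return 0;
        if (sim_wait_writable(s) == SIM_TIMEOUT)
            return -1;
        /* The wait said "ok", but the socket may be busy again: loop. */
    }
}
```

With a socket scripted to be busy for two sends and no timeouts, the loop makes
three send attempts and two waits before succeeding. With a scripted timeout, it
gives up after the first failed send and leaves the retry decision to the caller.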
Note that both situations occur only under very high load and very rarely, about
once every several hours. Personally, I don't like the approach in 1), but I
can't find a better solution.
To reproduce the bug, I developed a test suite
(http://www.sigaev.ru/misc/wintest.tgz). The test runs one 'collector' and several
clients (32 by default), which send a lot of packets to the collector. The socket
library is taken directly from pgsql. Installation & testing (under MinGW):
% tar xzvf wintest.tgz
% cd wintest
% make
% ./serveres
The archive contains two versions of socket.c:
socket.c.orig - as it is in pgsql
socket.c - already patched
fprintf() calls are added to pgwin32_waitforsinglesocket(), and with
socket.c.orig several clients never return from it. It usually takes 1-3 minutes
to reproduce. The test suite stresses the socket harder than pgsql does, so the
block occurs even on a uniprocessor box. It may be necessary to increase the
number of clients to reproduce the bug reliably.
Objections, comments, advice, suggestions?
I intend to commit the patch to all affected branches today or tomorrow if there
are no objections or better ideas.
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/