pgsql: Fix broken error handling in parallel pg_dump/pg_restore.

From: Tom Lane
Msg-id: E1b5br6-0004jn-Dj@gemulon.postgresql.org
List: pgsql-committers
Fix broken error handling in parallel pg_dump/pg_restore.

In the original design for parallel dump, worker processes reported errors
by sending them up to the master process, which would print the messages.
This is unworkably fragile for a couple of reasons: it risks deadlock if a
worker sends an error at an unexpected time, and if the master has already
died for some reason, the user will never get to see the error at all.
Revert that idea and go back to just always printing messages to stderr.
This approach means that if all the workers fail for similar reasons (e.g.,
bad password or server shutdown), the user will see N copies of that
message, not only one as before.  While that's slightly annoying, it's
certainly better than not seeing any message; not to mention that we
shouldn't assume that only the first failure is interesting.
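To illustrate the "workers print their own errors" approach, here is a minimal POSIX sketch, not the actual parallel.c code. The names run_workers() and worker_main() and the simulated error text are hypothetical. Each forked worker writes its failure straight to stderr and exits with a nonzero status; the master never relays message text, only counts failures. If all N workers fail the same way, N similar lines appear, which is exactly the trade-off described above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical worker body: report the failure directly on stderr and exit.
 * Nothing is sent to the master, so the message is visible even if the
 * master has already died. */
static void worker_main(int worker_id)
{
    fprintf(stderr, "worker %d: simulated failure (e.g. bad password)\n",
            worker_id);
    _exit(1);   /* skip atexit handlers and stdio buffers inherited from the master */
}

/* Hypothetical master: fork n workers, wait for all, return how many failed. */
static int run_workers(int n)
{
    int failed = 0;

    assert(n >= 0);
    fflush(NULL);   /* don't let children re-flush the master's buffered output */

    for (int i = 0; i < n; i++)
    {
        pid_t pid = fork();

        if (pid < 0)
        {
            perror("fork");
            exit(1);
        }
        if (pid == 0)
            worker_main(i);   /* never returns */
    }

    for (int i = 0; i < n; i++)
    {
        int status;

        if (wait(&status) < 0)
            break;
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failed++;
    }
    return failed;
}
```

Calling run_workers(3) prints three worker lines on stderr and returns 3; the master's only job is to notice the nonzero exit statuses.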

An additional problem in the same area was that the master failed to
disable SIGPIPE (at least until much too late), which meant that sending a
command to an already-dead worker would cause the master to crash silently.
That was bad enough in itself but was made worse by the total reliance on
the master to print errors: even if the worker had reported an error, you
would probably not see it, depending on timing.  Instead disable SIGPIPE
right after we've forked the workers, before attempting to send them
anything.
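The SIGPIPE problem can be reproduced with a short POSIX sketch. It assumes a plain pipe as the master-to-worker channel; send_to_dead_worker() and the command text are hypothetical. The master ignores SIGPIPE immediately after fork(), so the worker keeps the default disposition. Writing to a worker that has already exited then fails with EPIPE, where it would otherwise kill the master without any message.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a "worker" that dies immediately, then have
 * the master send it a command.  Returns the errno reported by write(),
 * or 0 if the write unexpectedly succeeded.
 */
static int send_to_dead_worker(void)
{
    int fds[2];   /* fds[0]: worker's read end, fds[1]: master's write end */

    if (pipe(fds) != 0)
        return errno;

    pid_t pid = fork();
    if (pid < 0)
    {
        int fork_err = errno;
        close(fds[0]);
        close(fds[1]);
        return fork_err;
    }
    if (pid == 0)
        _exit(1);   /* worker fails before reading anything */

    /*
     * Master: ignore SIGPIPE right after forking and before sending
     * anything.  The worker keeps the default disposition because the
     * change happens after fork().  Without this, the write() below
     * would terminate the master silently.
     */
    signal(SIGPIPE, SIG_IGN);

    close(fds[0]);                   /* master keeps only the write end */
    pid_t reaped = waitpid(pid, NULL, 0);
    assert(reaped == pid);           /* the worker is definitely gone */
    (void) reaped;

    const char cmd[] = "DUMP 1234";  /* hypothetical command text */
    ssize_t rc = write(fds[1], cmd, sizeof cmd);
    int err = (rc < 0) ? errno : 0;  /* save errno before close() */

    close(fds[1]);
    return err;
}
```

With SIGPIPE ignored, the master gets an ordinary error return (EPIPE) that it can report, instead of disappearing.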

Additionally, the master relies on seeing socket EOF to realize that a
worker has exited prematurely --- but on Windows, there would be no EOF
since the socket is attached to the process that includes both the master
and worker threads, so it remains open.  Make archive_close_connection()
close the worker end of the sockets so that this acts more like the Unix
case.  It's not perfect, because if a worker thread exits without going
through exit_nicely() the closures won't happen; but that's not really
supposed to happen.
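The Windows EOF issue comes down to who holds the worker's end of the socket. The sketch below models it under stated assumptions. It uses POSIX socketpair() and poll() rather than Winsock. It also uses a single thread to stand in for "master and worker threads in one process." The name master_sees_eof() is hypothetical. The master sees EOF only if something explicitly closes the worker's end, which is the role this commit gives archive_close_connection().

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical model: master and worker share one process, so the
 * worker's socket end is not closed automatically when the worker
 * finishes.  Returns 1 if the master sees EOF within 100 ms, 0 if it
 * does not, and -1 on setup failure.
 */
static int master_sees_eof(int close_worker_end)
{
    int sv[2];   /* sv[0]: master's end, sv[1]: worker's end */

    assert(close_worker_end == 0 || close_worker_end == 1);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    /*
     * The worker "exits" here.  The descriptor belongs to the whole
     * process, so it stays open unless closed explicitly.
     */
    if (close_worker_end)
        close(sv[1]);

    struct pollfd pfd = { .fd = sv[0], .events = POLLIN };
    int eof = 0;

    if (poll(&pfd, 1, 100) == 1)
    {
        char c;
        eof = (read(sv[0], &c, 1) == 0);   /* 0 bytes read means EOF */
    }

    close(sv[0]);
    if (!close_worker_end)
        close(sv[1]);
    return eof;
}
```

Without the explicit close, the master's poll simply times out and it has no way to learn that the worker is gone. As the commit notes, the fix only helps when the worker leaves through the normal exit path.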

This has been wrong all along, so back-patch to 9.3 where parallel dump
was introduced.

Report: <2458.1450894615@sss.pgh.pa.us>

Branch
------
REL9_4_STABLE

Details
-------
http://git.postgresql.org/pg/commitdiff/ea274b2f4bc2dfcb6196526dd749f6dca07cbc8b

Modified Files
--------------
src/bin/pg_dump/parallel.c        | 180 ++++++++++++++------------------------
src/bin/pg_dump/parallel.h        |   4 +-
src/bin/pg_dump/pg_backup_utils.c |  32 ++++++-
3 files changed, 98 insertions(+), 118 deletions(-)

