Thread: Hung Postgres Processes

Hung Postgres Processes

From
"Josh Berkus"
Date:
Folks,

I just had a very unpleasant experience for the first time.  I had an
overnight series of data transformations running (usually, they run
from 12:30am to 1:20am) and the process hung.  Badly.  It required a
"fast" system shutdown and restoring the database from backup.

Here are the details:
Platform:  Hand-built Dual Athlon MP/Molex RAID 5 (UW SCSI) system.
    PostgreSQL 7.2.3
    SuSE Linux 7.3

Data imports started normally at 12:00am and apparently completed.
The data transformation process (16-35 UPDATEs and INSERTs affecting a
combined 1,300,000 records) started at about 12:30am, after the import
ended.  The data transformations are a series of functions called by a
Perl script through cron as the root user.

Sometime during the transformation process, a statement hung.  The
procedure continued running for at least 2 hours, at which point
another script, set up to detect such problems, ran a "pg_ctl -m fast
stop".  Instead of stopping, the PostgreSQL server hung.

When I got to the machine in the morning, there were 3 frozen
processes: one query, one checkpoint process, and the postmaster.
SIGHUP and SIGTERM were ignored by these; SIGKILL was able to kill
the postmaster process, but the two other processes went to "D" status
and were untouchable.

I was forced to fast-shutdown the server. While Postgres did restart OK
after restarting the machine, I did not trust the data integrity, and
restored from backup.

Has anyone else encountered this kind of situation?  Is there a way to
prevent it, or a less drastic way to resolve it?  What are likely
causes?

-Josh Berkus

Re: Hung Postgres Processes

From
Tom Lane
Date:
"Josh Berkus" <josh@agliodbs.com> writes:
> When I got to the machine in the morning, there were 3 processes, one
> query, one checkpoint process and the postmaster which were frozen.
>   SIGHUP and SIGTERM were ignored by these; SIGKILL was able to kill
> the postmaster process, but the two other processes went to "D" status
> and were untouchable.

You've got hardware problems.  It's not Postgres' fault if the disk
stops responding, which is what a process stuck in disk-wait state
implies.

            regards, tom lane
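
For anyone wanting to check for the "D" (uninterruptible disk-wait) state described above, here is a minimal sketch, not something from either poster. It assumes a Linux system with /proc mounted and relies on the stat line format "pid (command) state ..." documented in proc(5). The function names parse_state and uninterruptible_pids are made up for illustration.

```python
import os


def parse_state(stat_line):
    """Return the one-letter state from a /proc/<pid>/stat line, or ""."""
    # The command name sits in parentheses and may itself contain spaces
    # or parentheses, so take everything after the LAST ')'.
    _, sep, rest = stat_line.rpartition(")")
    fields = rest.split()
    return fields[0] if sep and fields else ""


def uninterruptible_pids(proc_root="/proc"):
    """List (pid, command) pairs for processes currently in state 'D'."""
    found = []
    if not os.path.isdir(proc_root):
        return found  # not Linux, or /proc is not mounted
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        path = os.path.join(proc_root, entry, "stat")
        try:
            with open(path, errors="replace") as f:
                line = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        if parse_state(line) == "D":
            command = line[line.index("(") + 1:line.rindex(")")]
            found.append((int(entry), command))
    return found


if __name__ == "__main__":
    # Hypothetical use: run this from the watchdog before any pg_ctl stop.
    for pid, command in uninterruptible_pids():
        print(pid, command)
```

A process that shows up here only briefly is normal, since any disk read passes through "D" for a moment. One that stays listed across repeated checks is stuck waiting on I/O, and signals (including SIGKILL) will not take effect until that I/O completes, which points at the disk or controller rather than at Postgres.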