Discussion: Missing domain socket after reboot.


Missing domain socket after reboot.

From: Bill Moseley
Date:
After a reboot today, PostgreSQL 8.1 came back up and started
accepting connections over TCP, but the Unix socket file was missing.

This is on Debian Stable, and I can't imagine what might have removed
the file.

Running psql I get:

    $ psql test
    psql: could not connect to server: No such file or directory
            Is the server running locally and accepting
            connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?

Yep, missing:


    $ ls -la /var/run/postgresql
    total 8
    drwxrwsr-x   2 postgres postgres 4096 2006-06-21 17:03 .
    drwxr-xr-x  16 root     root     4096 2006-06-21 21:10 ..

Config looks ok:

    /etc/postgresql/8.1/main$ fgrep unix_socket_dir postgresql.conf
    unix_socket_directory = '/var/run/postgresql'

Startup option:

    $ ps ux -u postgres | grep unix_socket
    postgres  1512  0.0  0.3  17564  3476 ?        S    17:02   0:00 /usr/lib/postgresql/8.1/bin/postmaster -D
        /var/lib/postgresql/8.1/main -c unix_socket_directory=/var/run/postgresql -c
        config_file=/etc/postgresql/8.1/main/postgresql.conf -c hba_file=/etc/postgresql/8.1/main/pg_hba.conf -c
        ident_file=/etc/postgresql/8.1/main/pg_ident.conf

Hum.  lsof knows about the file.

    $ lsof -p 1512 | grep /var/run
    postmaste 1512 postgres    4u  unix 0xf78b5980           1631 /var/run/postgresql/.s.PGSQL.5432


Any ideas what happened to the socket?


I had to stop and start the postmaster to get the socket back.



--
Bill Moseley
moseley@hank.org


Re: Missing domain socket after reboot.

From: Douglas McNaught
Date: Thu, Jun 22, 2006 08:16:05 -0400
Bill Moseley <moseley@hank.org> writes:

> Hum.  lsof knows about the file.
>
>     $ lsof -p 1512 | grep /var/run
>     postmaste 1512 postgres    4u  unix 0xf78b5980           1631 /var/run/postgresql/.s.PGSQL.5432
>
>
> Any ideas what happened to the socket?

Maybe something in your bootup process tried to clean up /var/run and
deleted it after the postmaster had started?
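The symptom pattern (postmaster still running, psql getting "No such file or directory", lsof still listing the path) fits this explanation: the socket file is created by bind() at startup, and deleting it later does not close the listening socket; it only removes the name clients use to find it. Presumably lsof still shows the path because the kernel remembers the name the socket was bound to. The sketch below reproduces that in a scratch directory; the path there is a stand-in, not the real /var/run/postgresql, and it assumes a Unix-like system with AF_UNIX support.

```python
import os
import socket
import tempfile


def can_connect(path):
    """Return True if a new client can connect to the Unix socket at path."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        try:
            client.connect(path)
            return True
        except (FileNotFoundError, ConnectionRefusedError):
            return False


def unlink_while_listening():
    """Bind and listen on a Unix socket, delete its file, and report whether
    new clients could connect before and after the deletion."""
    with tempfile.TemporaryDirectory() as tmpdir:
        # Stand-in for /var/run/postgresql/.s.PGSQL.5432, in a scratch directory.
        path = os.path.join(tmpdir, ".s.PGSQL.5432")
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
            server.bind(path)   # bind() creates the socket file on disk
            server.listen(5)
            before = can_connect(path)
            os.unlink(path)     # simulate a boot-time cleanup removing the file
            # The server socket is still open and listening here; only its
            # filesystem name is gone, so path-based clients get ENOENT.
            after = can_connect(path)
    return before, after


if __name__ == "__main__":
    print(unlink_while_listening())  # (True, False)
```

This also suggests why a stop and start brings the socket back: a fresh bind() is the only step that recreates the file.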

> I had to stop and start the postmaster to get the socket back.

Be interesting to see if you can reproduce it...

-Doug

Re: Missing domain socket after reboot.

From: Bill Moseley
Date:
On Thu, Jun 22, 2006 at 08:16:05AM -0400, Douglas McNaught wrote:
> Bill Moseley <moseley@hank.org> writes:
>
> > Hum.  lsof knows about the file.
> >
> >     $ lsof -p 1512 | grep /var/run
> >     postmaste 1512 postgres    4u  unix 0xf78b5980           1631 /var/run/postgresql/.s.PGSQL.5432
> >
> >
> > Any ideas what happened to the socket?
>
> Maybe something in your bootup process tried to clean up /var/run and
> deleted it after the postmaster had started?

That's what I thought, but my quick look couldn't find anything in
the init scripts, not that that's conclusive:

    $ fgrep /var/run * | grep rm
    apache2:                [ -f /var/run/apache2/ssl_scache ] && rm -f /var/run/apache2/*ssl_scache*
    bootclean.sh:   rm -f /var/run/.clean
    bootmisc.sh:rm -f /tmp/.clean /var/run/.clean /var/lock/.clean
    portmap:          rm -f /var/run/portmap.upgrade-state
    portmap:            rm -f /var/run/portmap.state
    rsync:  rm -f /var/run/rsync.pid
    rsync:          rm -f /var/run/rsync.pid
    rsync:          rm -f /var/run/rsync.pid
    umountnfs.sh:rm -f /tmp/.clean /var/lock/.clean /var/run/.clean
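The one-liner above only reports lines that contain both `/var/run` and `rm`, so it would miss, for example, a script that does `cd /var/run` on one line and `find . -delete` on a later one. A somewhat wider net, sketched below as a heuristic (the deletion patterns are assumptions, not an exhaustive list), flags every deletion-looking line in any script that mentions `/var/run` anywhere.

```python
import os
import re

# Heuristic patterns for file-deleting commands; an assumption, not exhaustive.
DELETE_RE = re.compile(r"\brm\b|\bunlink\b|-delete\b")


def deletion_lines(scripts, run_dir="/var/run"):
    """scripts maps name -> script text. For every script that mentions
    run_dir anywhere, return (line number, line) pairs that look like
    deletions, even when the path and the command are on different lines."""
    hits = {}
    for name, text in sorted(scripts.items()):
        if run_dir not in text:
            continue
        lines = [(n, line.strip()) for n, line in enumerate(text.splitlines(), 1)
                 if DELETE_RE.search(line)]
        if lines:
            hits[name] = lines
    return hits


def load_scripts(init_dir="/etc/init.d"):
    """Read every regular file in init_dir into a {name: text} dict."""
    scripts = {}
    for name in os.listdir(init_dir):
        path = os.path.join(init_dir, name)
        if os.path.isfile(path):
            with open(path, errors="replace") as f:
                scripts[name] = f.read()
    return scripts


if __name__ == "__main__":
    for name, lines in deletion_lines(load_scripts()).items():
        for n, line in lines:
            print(f"{name}:{n}: {line}")
```

The output is a list of places to read by hand, not a verdict; plenty of the matches will be harmless PID-file cleanups like the rsync ones above.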

But maybe postgresql is started too early.

    $ ls /etc/rc?.d  | grep postgres | head -1
    K20postgresql-8.1
    K20postgresql-8.1
    S20postgresql-8.1
    S20postgresql-8.1
    S20postgresql-8.1
    S20postgresql-8.1
    K20postgresql-8.1


Apache, for example, starts at S91.

    /etc/rc2.d:
    K10atd                    S20courier-imap      S20mysqld-helper      S21nfs-common
    K10cron                   S20courier-imap-ssl  S20netatalk           S21quotarpc
    K10syslog-ng              S20courier-mta       S20nfs-kernel-server  S23ntp-server
    S10sysklogd               S20courier-pop       S20ntop               S25mdadm
    S11klogd                  S20courier-pop-ssl   S20oidentd            S30sysctl
    S14ppp                    S20darwinss          S20postfix            S89cron
    S15logical                S20exim4             S20postgresql-8.1     S91apache2
    S16mountnfsforlogical.sh  S20grlogcheck        S20rmnologin          S91ifp_httpd
    S18atd                    S20httpd             S20rsync              S99jabber
    S18portmap                S20httpd2            S20saslauthd          S99stop-bootlogd
    S19spamassassin           S20inetd             S20ssh                S99ud
    S19syslog-ng              S20jabber            S20syslog-ng
    S20binfmt-support         S20makedev           S20sysstat
    S20courier-authdaemon     S20mysqld            S20xmail
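If something at boot deleted the socket after the postmaster started, it would have to be a script started later in the same runlevel. Assuming the rc script starts the `S*` links in plain lexical (C-locale) order of their names, the candidates can be listed as sketched below; `start_scripts_after` is a hypothetical helper, not part of any Debian tool.

```python
import os


def start_scripts_after(names, target):
    """Return the S* scripts that would start after target, assuming
    start links run in plain lexical (C-locale) order of their names."""
    starts = sorted(n for n in names if n.startswith("S"))
    if target not in starts:
        raise ValueError(f"{target!r} is not a start script")
    return starts[starts.index(target) + 1:]


if __name__ == "__main__":
    for name in start_scripts_after(os.listdir("/etc/rc2.d"), "S20postgresql-8.1"):
        print(name)
```

Note that ties at the same sequence number are broken by the rest of the name, so under this assumption S20postgresql-8.1 runs before S20rmnologin and S20rsync but after S20postfix.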



> Be interesting to see if you can reproduce it...

Next reboot I'll look again. It's a production machine, so I can't
really bring it up one service at a time.
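For checking after the next reboot, a small probe like the one sketched below could be run by hand or from a late boot script; the default path is the one from the psql error in this thread, and the status strings are made up for illustration.

```python
import os
import stat

# Default path taken from the psql error message in this thread.
SOCKET_PATH = "/var/run/postgresql/.s.PGSQL.5432"


def socket_file_status(path=SOCKET_PATH):
    """Return 'ok' if path exists and is a Unix socket, 'missing' if it
    does not exist, or 'not-a-socket' if something else is there."""
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing"
    return "ok" if stat.S_ISSOCK(mode) else "not-a-socket"


if __name__ == "__main__":
    print(socket_file_status())
```

Running it at a few points during boot (or logging its result with a timestamp) would narrow down when the file disappears, without having to bring services up one at a time.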

--
Bill Moseley
moseley@hank.org