Discussion: could not create lock file postmaster.pid: No such file or directory, but file does exist
could not create lock file postmaster.pid: No such file or directory, but file does exist
Hi,
This is my first post to this list, so I hope I am posting it to the correct lists. But I am really stuck and getting pretty desperate at the moment.
This weekend my database crashed while importing some OpenStreetMap data and I can’t get it back to work again. It has happened before, and normally I would reset the WAL directory with the pg_resetxlog command. I would lose some data but that would be all.
This time it is somehow different because it doesn’t recognize any of the important files anymore. For example, when I try to start PostgreSQL again with the command:
/usr/lib/postgresql/9.1/bin/pg_ctl -D OSM/ start
I get the following error:
FATAL: could not create lock file "postmaster.pid": No such file or directory
But when I do an ls -l on the directory I can see the file exists.
drwx------ 0 postgres postgres 0 Jan 24 10:07 backup
drwx------ 0 postgres postgres 0 Feb 14 11:10 base
drwx------ 0 postgres postgres 0 Feb 17 09:46 global
drwx------ 0 postgres postgres 0 Oct 11 10:49 pg_clog
-rwxr-xr-x 0 postgres postgres 4476 Oct 11 10:49 pg_hba.conf
-rwxr-xr-x 0 postgres postgres 1636 Oct 11 10:49 pg_ident.conf
drwx------ 0 postgres postgres 0 Feb 17 11:29 pg_log
drwx------ 0 postgres postgres 0 Oct 11 10:49 pg_multixact
drwx------ 0 postgres postgres 0 Feb 17 08:58 pg_notify
drwx------ 0 postgres postgres 0 Oct 11 10:49 pg_serial
drwx------ 0 postgres postgres 0 Feb 12 09:58 pg_stat_tmp
drwx------ 0 postgres postgres 0 Feb 14 09:01 pg_subtrans
drwx------ 0 postgres postgres 0 Oct 11 10:49 pg_tblspc
drwx------ 0 postgres postgres 0 Oct 11 10:49 pg_twophase
-rwxr-xr-x 0 postgres postgres 4 Oct 11 10:49 PG_VERSION
drwx------ 0 postgres postgres 0 Feb 14 13:37 pg_xlog
-rwxr-xr-x 0 postgres postgres 19168 Oct 11 11:41 postgresql.conf
-rwxr-xr-x 0 postgres postgres 121 Feb 17 08:57 postmaster.opts
-rwxr-xr-x 0 postgres postgres 88 Feb 17 08:58 postmaster.pid
I cannot perform any action on the postmaster.pid file. I tried cp, mv and rm, but nothing works. Is there anything I can do to make the system recognize this file again? And get my database up and running? Or is all hopelessly lost?
I have PostgreSQL 9.1 installed on Ubuntu 12.04.
Kind regards,
Rob.
Rob Goethals wrote:
> This is my first post to this list, so I hope I am posting it to the correct lists. But I am really
> stuck and getting pretty desperate at the moment.

You should not post to more than one list.

> This weekend my database crashed while importing some Openstreetmapdata and I can’t get it back to
> work again. It happened before and normally I would reset the WAL-dir with the pg_resetxlog command. I
> would loose some data but that would be all.

That is not a good idea. PostgreSQL should recover from a crash automatically.
If you run pg_resetxlog your database cluster is damaged, and all you should do is pg_dump all the data you can, run initdb and import the data.

> This time it is somehow different because he doesn’t recognize any of the important files anymore. For
> example when I try to start Postgresql again with the command:
>
> /usr/lib/postgresql/9.1/bin/pg_ctl -D OSM/ start
>
> I get the following error:
>
> FATAL: could not create lock file "postmaster.pid": No such file or directory
>
> But when I do a ls –l on the directory I can see the file exists.
[...]
> -rwxr-xr-x 0 postgres postgres 88 Feb 17 08:58 postmaster.pid
>
> I cannot perform any action on the postmaster.pid file. I tried cp, mv and rm, but nothing works. Is
> there anything I can do to make the system recognize this file again? And get my database up and
> running? Or is all hopelessly lost?
>
> I have Postgresql 9.1 installed on Ubuntu 12.04.

What is the error message you get for cp, mv or rm?

Can you describe the crash of your machine in greater detail? What was the cause?

One wild guess: could it be that the OS automatically remounted the file system read-only because it encountered a problem? Check your /var/log/messages (I hope the location is the same on Ubuntu and on RHEL).
In that case unmount, fsck and remount should solve the problem.

Yours,
Laurenz Albe
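The read-only guess above can also be tested directly, without digging through kernel logs. What follows is only a minimal sketch, assuming a Unix-like host with Python 3; the function names are hypothetical, and the only library facilities used are the standard `os.statvfs` call and the `os.ST_RDONLY` flag.

```python
import os


def flags_read_only(f_flag):
    """Return True if the statvfs flag word has the read-only bit set."""
    return bool(f_flag & os.ST_RDONLY)


def is_mounted_read_only(path):
    """Return True if the filesystem holding `path` is mounted read-only.

    os.statvfs reports on the filesystem that contains `path`, so passing
    the data directory is enough; no mount table parsing is needed.
    """
    return flags_read_only(os.statvfs(path).f_flag)
```

Calling `is_mounted_read_only("OSM/")` from the directory where `pg_ctl` is run would be one way to use it. A `False` result does not prove the filesystem is healthy; it only rules out this one explanation.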
Re: could not create lock file postmaster.pid: No such file or directory, but file does exist
> -----Original message-----
> From: Albe Laurenz [mailto:laurenz.albe@wien.gv.at]
> Sent: Monday 17 February 2014 14:22
> To: Rob Goethals
> Subject: RE: could not create lock file postmaster.pid: No such file or
> directory, but file does exist
>
> Dear Rob,
>
> you should send your reply to the list.
> This way
> a) people know that your problem is solved and won't spend their time trying
> to help you.
> b) others can benefit from the information.

OK, clear. I hereby send this reply also to the list.

>
> >>> This weekend my database crashed while importing some
> >>> Openstreetmapdata and I can’t get it back to work again. It happened
> >>> before and normally I would reset the WAL-dir with the pg_resetxlog
> >>> command. I would loose some data but that would be all.
> >>
> >> That is not a good idea. PostgreSQL should recover from a crash
> >> automatically.
> >> If you run pg_resetxlog your database cluster is damaged, and all you
> >> should do is pg_dump all the data you can, run initdb and import the data.
> >
> > But what if Postgresql doesn't recover automatically? When my database
> > crashed and I try to restart it, I most of the time get a message like:
> > LOG: could not open file "pg_xlog/0000000100000114000000D2" (log file
> > 276, segment 210): No such file or directory
> > LOG: invalid primary checkpoint record
> > LOG: invalid secondary checkpoint link in control file
> > PANIC: could not locate a valid checkpoint record
> > LOG: startup process (PID 3604) was terminated by signal 6: Aborted
> > LOG: aborting startup due to startup process failure
>
> Interesting.
> How did you get PostgreSQL into this state? Did you set fsync=off or similar?
> Which storage did you put pg_xlog on?
>

I am adding OSM-changefiles to my database with the command:
osm2pgsql --append --database $database --username $user --slim --cache 3000 --number-processes 6 --style /usr/share/osm2pgsql/default.style --extra-attributes changes.osc.gz

At the moment of the crash the postgresql-log says:
2014-02-15 00:49:04 CET LOG: WAL writer process (PID 1127) was terminated by signal 6: Aborted
2014-02-15 00:49:04 CET LOG: terminating any other active server processes
2014-02-15 00:49:04 CET [unknown] WARNING: terminating connection because of crash of another server process
2014-02-15 00:49:04 CET [unknown] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

So what exactly is happening, I don't know. When it is trying to startup again this is the logfile output:
2014-02-15 00:49:08 CET LOG: could not open temporary statistics file "global/pgstat.tmp": Input/output error
2014-02-15 00:49:14 CET LOG: all server processes terminated; reinitializing
2014-02-15 00:49:17 CET LOG: database system was interrupted; last known up at 2014-02-15 00:32:01 CET
2014-02-15 00:49:33 CET [unknown] [unknown]LOG: connection received: host=[local]
2014-02-15 00:49:33 CET [unknown] FATAL: the database system is in recovery mode
2014-02-15 00:49:56 CET LOG: database system was not properly shut down; automatic recovery in progress
2014-02-15 00:49:57 CET [unknown] [unknown]LOG: connection received: host=[local]
2014-02-15 00:49:57 CET [unknown] FATAL: the database system is in recovery mode
2014-02-15 00:50:01 CET LOG: redo starts at 114/C8B27330
2014-02-15 00:50:02 CET LOG: could not open file "pg_xlog/0000000100000114000000CB" (log file 276, segment 203): No such file or directory
2014-02-15 00:50:02 CET LOG: redo done at 114/CAFFFF80
2014-02-15 00:50:02 CET LOG: checkpoint starting: end-of-recovery immediate
2014-02-15 00:50:05 CET PANIC: could not create file "pg_xlog/xlogtemp.5390": Input/output error
2014-02-15 00:50:22 CET [unknown] [unknown]LOG: connection received: host=[local]
2014-02-15 00:50:22 CET [unknown] FATAL: the database system is in recovery mode
2014-02-15 00:50:23 CET LOG: startup process (PID 5390) was terminated by signal 6: Aborted
2014-02-15 00:50:23 CET LOG: aborting startup due to startup process failure

Furthermore I checked my conf-file and my fsync is indeed set to off.
I mounted a directory on a NTFS network-disk (because of the available size and considering the amount of OSM-data is pretty big). This is where I put all my database data, so also the pg_xlog.

> > Is there a better procedure to follow when something like this
> > happens? I am fairly new at the whole Postgresql thing so I am very
> > willing to learn all about it anyway I can from experienced users. I
> > am googling all my way round the internet to try and solve all the
> > questions I have, but as with many things there's most of the time more
> > than 1 answer to a problem and for me it is very hard to figure out what is the
> > best solution.
>
> No, in that case I would restore from a backup.
>
> >> One wild guess: could it be that the OS automatically remounted the
> >> file system read-only because it encountered a problem? Check your
> >> /var/log/messages (I hope the location is the same on Ubuntu and on
> >> RHEL).
> >> In that case unmount, fsck and remount should solve the problem.
> >
> > I am impressed. Your wild guess exactly did the trick. Manually
> > unmounting, checking and remounting was all it needed. Thank you very
> > much!!
>
> That would suggest that you have a hardware problem with your storage.
> It may be that your file system is corrupted. Did you fsck it?

The fsck didn't work as it was mounted as cifs. So I guess I should let Windows do the checking.

>
> Yours,
> Laurenz Albe
On 17 February 2014 14:42, Rob Goethals / SNP <Rob.Goethals@snp.nl> wrote:
> 2014-02-15 00:49:04 CET LOG: WAL writer process (PID 1127) was terminated by signal 6: Aborted

Signal 6 is usually caused by hardware issues.

Then again, you also say:

> I mounted a directory on a NTFS network-disk (because of the available size and considering the
> amount of OSM-data is pretty big). This is where I put all my database data, so also the pg_xlog.

That will cause problems as well. SMBFS does not support all the necessary file flags, locks and such that the database needs to operate on those files in a safe way. That's probably worse than running with sciss... ehr... fsync=off

Alban Hertroys.
--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
Rob Goethals / SNP <Rob.Goethals@snp.nl> writes:
> When it is trying to startup again this is the logfile output:
> ...
> 2014-02-15 00:50:05 CET PANIC: could not create file "pg_xlog/xlogtemp.5390": Input/output error

The above PANIC is the reason for the abort that happens immediately thereafter. On local storage I'd think this meant disk hardware problems, but since you say you've got the database on an NTFS volume, what it more likely means is that there's a bug in the kernel's NTFS support. Anyway, it's fruitless to try to get Postgres going again until you have a stable filesystem underneath it.

Generally speaking, longtime Postgres users are very suspicious of running Postgres atop any kind of networked filesystem. We find that network filesystems are invariably less stable than local ones. NTFS seems likely to be a particularly unfortunate choice from this standpoint, as you get to benefit from Windows' bugs along with Linux's.

regards, tom lane
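Whether a data directory sits on an SMB share can be checked before PostgreSQL is ever started on it. The following is a rough sketch, assuming a Linux host whose mount table has the usual `/proc/mounts` layout (device, mount point, filesystem type, options). The helper names and the `SMB_FS_TYPES` set are illustrative assumptions, not an official list of unsupported filesystems.

```python
import os

# Illustrative set of SMB-style filesystem type names (an assumption,
# not an exhaustive or official list).
SMB_FS_TYPES = {"cifs", "smbfs", "smb3"}


def find_mount(path, mounts_text):
    """Return (mount_point, fs_type) of the deepest mount containing path.

    `mounts_text` is the text of a mount table in /proc/mounts layout.
    Returns None if no entry matches.
    """
    path = os.path.normpath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        # Spaces in mount points are written as the escape \040.
        mount_point = fields[1].replace("\\040", " ")
        fs_type = fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # Prefer the longest mount point; on a tie the later entry wins.
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fs_type)
    return best


def is_on_smb(path, mounts_text):
    """Return True if path lives on one of the SMB-style filesystem types."""
    mount = find_mount(path, mounts_text)
    return mount is not None and mount[1] in SMB_FS_TYPES
```

On a real system the second argument would be the contents of `/proc/mounts`, and `path` should be the absolute path of the data directory.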
Rob Goethals wrote:
> OK, clear. I hereby send this reply also to the list.

Cool.

>> Interesting.
>> How did you get PostgreSQL into this state? Did you set fsync=off or similar?
>> Which storage did you put pg_xlog on?

> 2014-02-15 00:49:04 CET LOG: WAL writer process (PID 1127) was terminated by signal 6: Aborted

Ouch.

> Furthermore I checked my conf-file and my fsync is indeed set to off.

Well, that is one reason why crash recovery is not working.

> I mounted a directory on a NTFS network-disk (because of the available size and considering the amount
> of OSM-data is pretty big). This is where I put all my database data, so also the pg_xlog.

Double ouch.
CIFS is not a supported file system.

At least that explains your problems.
Try with a local file system or NFS with hard foreground mount.

Yours,
Laurenz Albe
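Since fsync=off is one of the reasons crash recovery failed here, it can be worth checking the setting in a configuration file before trusting a cluster with real data. This is only a simplified sketch: `read_fsync` is a hypothetical helper, it looks at a single file's text, and it ignores include directives and abbreviated boolean spellings that the server itself accepts. When the parameter is not set at all, PostgreSQL's default is on.

```python
import re

_TRUE = {"on", "true", "yes", "1"}
_FALSE = {"off", "false", "no", "0"}

# "fsync = value" or "fsync value"; the value may be single-quoted.
_FSYNC_LINE = re.compile(r"fsync(?:\s*=\s*|\s+)'?(\w+)'?$", re.IGNORECASE)


def read_fsync(conf_text):
    """Return True/False for the last fsync setting found in conf_text.

    Returns None when fsync is not set (or its value is not a recognised
    boolean spelling), in which case the server default applies.
    """
    value = None
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        match = _FSYNC_LINE.match(line)
        if not match:
            continue
        word = match.group(1).lower()
        if word in _TRUE:
            value = True
        elif word in _FALSE:
            value = False
    return value
```

Passing the text of `postgresql.conf` from the data directory and getting `False` back would match the situation described in this thread.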
Re: could not create lock file postmaster.pid: No such file or directory, but file does exist
OK, it is clear to me that I didn't make the best choices setting up this database. :(
I am happy I found this list because I am learning a lot in a very short period of time. :) Thank you all for your tips and comments.
I will definitely move the database to a Linux-system and set fsync to on. I hope this will give me a more stable environment.
Furthermore I'll dive into the whole database-backup subject so next time I'll have something to restore if things go wrong.

Rob Goethals.

> -----Original message-----
> From: Albe Laurenz [mailto:laurenz.albe@wien.gv.at]
> Sent: Monday 17 February 2014 16:20
> To: Rob Goethals
> CC: 'pgsql-general@postgresql.org'
> Subject: RE: could not create lock file postmaster.pid: No such file or
> directory, but file does exist
>
> Rob Goethals wrote:
> > OK, clear. I hereby send this reply also to the list.
>
> Cool.
>
> >> Interesting.
> >> How did you get PostgreSQL into this state? Did you set fsync=off or
> >> similar?
> >> Which storage did you put pg_xlog on?
>
> > 2014-02-15 00:49:04 CET LOG: WAL writer process (PID 1127) was
> > terminated by signal 6: Aborted
>
> Ouch.
>
> > Furthermore I checked my conf-file and my fsync is indeed set to off.
>
> Well, that is one reason why crash recovery is not working.
>
> > I mounted a directory on a NTFS network-disk (because of the available
> > size and considering the amount of OSM-data is pretty big). This is where I
> > put all my database data, so also the pg_xlog.
>
> Double ouch.
> CIFS is not a supported file system.
>
> At least that explains your problems.
> Try with a local file system or NFS with hard foreground mount.
>
> Yours,
> Laurenz Albe