Discussion: what happened to my database
Hi,
I am faced with a database disappearance and am seeking some explanations other than gremlins.
I had a database running at
cat /etc/sysconfig/pgsl/postmaster
PGDATA=/qmsvol/pg_8.1.9/data
PGLOG=/var/log/pgsql/pgstartup.log
Where /qmsvol is an iSCSI block device
A couple of days ago, my server was rebooted and by the time I got to it my database was deleted, gone, zapped, not there any more.
I looked at my pgstartup.log where I see the following....
postmaster cannot access the server configuration file "/qmsvol/pg_8.1.9/postgresql.conf": Permission denied
over 17 times, followed by...
The database cluster will be initialized with locale en_US.UTF-8.
I think the following happened...
Since my PGDATA was on an iSCSI device, the device below it was not yet available by the time /etc/rc3.d/S64postgresql was executed. (Question: why does the error say "Permission denied" rather than "file not found"?) In the meantime, pg_ctl kept trying and finally concluded that the data directory was blank, so this must be an out-of-box case and it was safe to initdb the PGDATA. As it called initdb to do the job, the iSCSI volume below it came online, and by then the bomb had already been dropped.
Now I need to find some facts to support this...
Where else can I look for forensics?
Thanks
Medi
On Wed, 28 May 2008 18:37:06 -0700
"Medi Montaseri" <montaseri@gmail.com> wrote:

> I think the following happened...
> Since my PGDATA was on an iSCSI device, the device below it was not yet
> available by the time /etc/rc3.d/S64postgresql was executed. (Question:
> why does the error say "Permission denied" rather than "file not found"?)
> In the meantime, pg_ctl kept trying and finally concluded that the data
> directory was blank, so this must be an out-of-box case and it was safe
> to initdb the PGDATA. As it called initdb to do the job, the iSCSI volume
> below it came online, and by then the bomb had already been dropped.
>
> Now I need to find some facts to support this...

When you mount a partition on Linux, it does this by overlaying its root directory on the existing one on the parent volume. Ownerships and permissions are also replaced. I expect that the /qmsvol directory will be owned by root, with fairly restrictive access rights. This will not be the case for the root ( . ) directory on the external device, which will be postgres-friendly.

> Where else can I look for forensics

I don't think you need any more! To fix this, I'd do 2 things.

First, start postgres much later in the boot sequence:

    cd /etc/rc3.d ; mv S64postgresql S99postgresql

( and the same in rc5.d if you're using a gui at all ), and do the converse to whichever script mounts your external devices.

Also add in a test that the device is mounted in the start) block of /etc/init.d/postgresql... something simple like

    while [ ! -d /qmsvol/pg_8.1.9/data ]
    do
        sleep 5
    done

( well, something that can't hang forever would be preferable! ).

> Thanks
> Medi

hth,

Steve
--
Steve Holdoway <steve.holdoway@firetrust.com>
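As a rough illustration of the "something that can't hang forever" variant mentioned above, here is a minimal sketch of a bounded wait. The function name wait_for_dir and its timeout and interval arguments are hypothetical and not part of any packaged init script; only the data directory path comes from this thread.

```shell
#!/bin/sh
# Hypothetical helper (not from any distro script): wait for a directory
# to appear, giving up after a timeout instead of looping forever.
#   $1 = directory to wait for
#   $2 = maximum seconds to wait (default 60)
#   $3 = seconds between checks (default 5)
wait_for_dir() {
    dir=$1
    timeout=${2:-60}
    interval=${3:-5}
    waited=0
    while [ ! -d "$dir" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "timed out after ${waited}s waiting for $dir" >&2
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0
}

# Possible use in the start) block, with the path from this thread:
#   wait_for_dir /qmsvol/pg_8.1.9/data 120 5 || exit 1
```

Note that a plain directory test only shows that the path exists. If an earlier initdb already created the same path on the parent volume, the test would pass even with the iSCSI device unmounted, so a check for a file known to live only on the real volume may be a safer signal.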
Steve Holdoway <steve.holdoway@firetrust.com> writes:
> I don't think you need any more! To fix this, I'd do 2 things. First,
> start postgres much later in the boot sequence:
>     cd /etc/rc3.d ; mv S64postgresql S99postgresql
> ( and the same in rc5.d if you're using a gui at all ).

The other thing to do is remove the auto-initdb behavior in your startup script. We've done that in recent releases because of prior reports of this type of problem. The OP's script is evidently still old-school, though.

regards, tom lane
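One way to picture a startup script without the auto-initdb behavior is a guard that refuses to start when the data directory does not look like an existing cluster, rather than creating one. This is only a sketch: the function names cluster_present and start_guard are hypothetical, and the actual packaged scripts may check different things. It relies on initdb writing a PG_VERSION file at the top of the data directory.

```shell
#!/bin/sh
# Hypothetical guard for the start) block of an init script. It never
# runs initdb; it only decides whether starting the server is safe.

# A data directory created by initdb contains a PG_VERSION file.
cluster_present() {
    [ -f "$1/PG_VERSION" ]
}

# Returns 0 if the cluster is there, otherwise explains and returns 1.
start_guard() {
    pgdata=$1
    if cluster_present "$pgdata"; then
        return 0
    fi
    echo "$pgdata is missing or empty; not starting the server." >&2
    echo "If this is a new installation, run initdb by hand first." >&2
    return 1
}

# Possible use, with the path from this thread:
#   start_guard /qmsvol/pg_8.1.9/data || exit 1
```

With a guard like this, an unmounted volume leads to a failed start and a log message instead of a freshly initialized, empty cluster.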
Yes, this type of presumptuous behavior, wiping out a production database based on a few checks, is too risky...
Behavior one:
On the first out-of-box start, pg_ctl does not find any database files, so it tells the user "sorry, I did not find any database to start... see initdb".
Result: we have a semi-unhappy user/admin who says... what is initdb?
Behavior two:
In order to enhance the out-of-box experience, we have wiped out a production environment, leading to many unhappy staff and customers...
PG developers... I am not impressed at all...
Medi
On Wed, May 28, 2008 at 7:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The other thing to do is remove the auto-initdb behavior in your startup
> script. We've done that in recent releases because of prior reports of
> this type of problem. The OP's script is evidently still old-school,
> though.
>
> regards, tom lane
On Wed, May 28, 2008 at 11:14 PM, Medi Montaseri <montaseri@gmail.com> wrote:
> Yes, this type of presumptuous behavior to wipe out a production database
> based on a few checks is too risky...
>
> Behavior one:
> First out-of-box time, pg_ctl does not find any database files, it tells the
> user that "sorry I did not find any database to start....see initdb....
> Result: we have a semi-unhappy user/admin that says... what is initdb
>
> Behavior two:
> In order to enhance the out-of-box experience, we have wiped out a
> production environment, leading to many unhappy staff and customers....
>
> PG developers...I am not impressed at all...

In defense of the pg developers, the behaviour you describe was removed long ago BECAUSE of the issues you mention.

The fact is that pg developers can't police every distro out there to make sure they've removed such hinky behaviour from their startup scripts. So, the persons to NOT be impressed with at all are the folks who maintain your OS's postgresql packaging, not the pg developers.

Course, you can always switch to MySQL, or Oracle, or MSSQL where nothing like that ever happens. uh huh.
On Tue, Jun 10, 2008 at 12:49 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> In defense of the pg developers, the behaviour you describe was
> removed long ago BECAUSE of the issues you mention.
>
> The fact is that pg developers can't police every distro out there to
> make sure they've removed such hinky behaviour from their startup
> scripts. So, the persons to NOT be impressed with at all are the
> folks who maintain your OS's postgresql packaging, not the pg
> developers.

I stand corrected.

> Course, you can always switch to MySQL, or Oracle, or MSSQL where
> nothing like that ever happens. uh huh.

Never... I'd rather stay and fix it than run away to a different country.

Thanks
On Tue, Jun 10, 2008 at 1:57 PM, Medi Montaseri <montaseri@gmail.com> wrote:
>> Course, you can always switch to MySQL, or Oracle, or MSSQL where
>> nothing like that ever happens. uh huh.
>
> Never...I rather stay and fix it...than run away to a different country

Me too. Sorry, I really shoulda tossed a smiley on the end there... :) There's one now...