Thread: Out of memory in CIFS leads to database crash

Out of memory in CIFS leads to database crash

From: Umesh Kirdat
Date:
Hello All,

In our setup under heavy load (too many client performing updates) we have
observed the underlying CIFS module runs out of memory and the database
crashes or goes in recovery mode.

Nov 28 09:17:32 ng78 kernel: CIFS VFS (1006f1e5e,pid=19342): Error in Open
= Out of memory<3> ISVS(0615,i=96) ro nOe u fmoy<3> CIFS
VFS(1006f16,pid196) ror i pn u fmmor
Nov 28 09:17:32 ng78 postgres[19342]: [10-1] 192.168.20.78 19342
2013-11-28 09:17:32.882 PST ERROR: could not open file "base/16384/16794":
Cannot allocate memory
Nov 28 09:17:32 ng78 postgres[19342]: [10-2] 192.168.20.78 19342
2013-11-28 09:17:32.882 PST STATEMENT: select

The physical memory on the machine is 64 GB
Postgres version 9.0.4
Hardware is 64 bit

I wish to know why is the database crashing if the file open fails? Why
can't it handle it gracefully by rolling back the transaction?

Umesh

Re: Out of memory in CIFS leads to database crash

From: Jeff Janes
Date:
On Tue, Jan 7, 2014 at 2:03 AM, Umesh Kirdat <umesh.kirdat@yahoo.com> wrote:

> Hello All,
>
> In our setup under heavy load (too many client performing updates) we have
> observed the underlying CIFS module runs out of memory and the database
> crashes or goes in recovery mode.
>
> Nov 28 09:17:32 ng78 kernel: CIFS VFS (1006f1e5e,pid=19342): Error in Open
> = Out of memory<3> ISVS(0615,i=96) ro nOe u fmoy<3> CIFS
> VFS(1006f16,pid196) ror i pn u fmmor
> Nov 28 09:17:32 ng78 postgres[19342]: [10-1] 192.168.20.78 19342
> 2013-11-28 09:17:32.882 PST ERROR: could not open file "base/16384/16794":
> Cannot allocate memory
> Nov 28 09:17:32 ng78 postgres[19342]: [10-2] 192.168.20.78 19342
> 2013-11-28 09:17:32.882 PST STATEMENT: select
>
> The physical memory on the machine is 64 GB
> Postgres version 9.0.4
> Hardware is 64 bit
>
> I wish to know why is the database crashing if the file open fails? Why
> can't it handle it gracefully by rolling back the transaction?
>

Based on the section of the log you are showing, it looks like it did just
roll back the transaction.  A crash should be showing you PANIC messages,
not just ERROR.  Is there more to the log than you are showing?  If you are
logging over CIFS as well, perhaps the PANIC messages are getting lost
because they can't be logged.
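
One quick way to check for this is to count the severity levels that actually appear in the server log. The following is only a rough sketch: it assumes the log format shown in the excerpt above (severity keyword followed by a colon), and the helper name `count_severities` is made up for illustration.

```python
import re
from collections import Counter

# Severity keywords as they appear in the excerpt ("... PST ERROR: ...").
# Assumption: the log line prefix does not itself contain these keywords.
SEVERITY = re.compile(r"\b(ERROR|FATAL|PANIC):")

def count_severities(lines):
    """Count ERROR/FATAL/PANIC occurrences across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = SEVERITY.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# The two postgres lines quoted in the original report.
sample = [
    'Nov 28 09:17:32 ng78 postgres[19342]: [10-1] 192.168.20.78 19342 '
    '2013-11-28 09:17:32.882 PST ERROR: could not open file '
    '"base/16384/16794": Cannot allocate memory',
    'Nov 28 09:17:32 ng78 postgres[19342]: [10-2] 192.168.20.78 19342 '
    '2013-11-28 09:17:32.882 PST STATEMENT: select',
]

counts = count_severities(sample)
print(dict(counts))  # {'ERROR': 1}
```

For the quoted excerpt this finds one ERROR and no PANIC, which is consistent with a rolled-back transaction rather than a crash. Running it over the complete log file (for example `count_severities(open(path))`) would show whether any PANIC lines were recorded at all.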

I don't think that running with the data directory on CIFS is supported.  I
certainly wouldn't be brave enough to do that with data I care about.

Cheers,

Jeff

Re: Out of memory in CIFS leads to database crash

From: Tom Lane
Date:
Jeff Janes <jeff.janes@gmail.com> writes:
> On Tue, Jan 7, 2014 at 2:03 AM, Umesh Kirdat <umesh.kirdat@yahoo.com> wrote:
>> I wish to know why is the database crashing if the file open fails? Why
>> can't it handle it gracefully by rolling back the transaction?

> Based on the section of the log you are showing, it looks like it did just
> roll back the transaction.  A crash should be showing you PANIC messages,
> not just ERROR.  Is there more to the log than you are showing?  If you are
> logging over CIFS as well, perhaps the PANIC messages are getting lost
> because they can't be logged.

We will PANIC on I/O failure involving the WAL log, but as you say, this
log extract isn't showing instances of that.  I/O failures on ordinary
data files shouldn't result in a panic.  (I'm not sure whether it'd be
practical to downgrade the panic for WAL write failures.  Certainly, the
database won't be good for much if it can't commit transactions.  A WAL
write failure also implies that data from transactions besides the one
doing the write may be in jeopardy, so just pretending that the system
as a whole can carry on doesn't sound all that safe.)
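
To make the distinction concrete, here is a toy model of the policy described above. It is not PostgreSQL code; the function names and the "data"/"wal" labels are invented for illustration, and it only encodes what this message states: a failure on an ordinary data file is reported as ERROR, a failure involving WAL as PANIC.

```python
import errno
import os

def severity_for_io_failure(file_kind):
    """Map the kind of file that failed to the severity described above.

    file_kind is a hypothetical label: "data" for ordinary relation files,
    "wal" for write-ahead log files.
    """
    if file_kind == "wal":
        return "PANIC"   # whole server stops and goes through recovery
    if file_kind == "data":
        return "ERROR"   # only the current transaction is rolled back
    raise ValueError(f"unknown file kind: {file_kind!r}")

def format_failure(file_kind, path, err):
    """Build a log-style message for a failed open() with errno value err."""
    severity = severity_for_io_failure(file_kind)
    return f'{severity}: could not open file "{path}": {os.strerror(err)}'

# The failure from the original report: ENOMEM while opening a data file.
print(format_failure("data", "base/16384/16794", errno.ENOMEM))
```

Under this model the reported failure maps to ERROR, matching the log extract; only a failure on the WAL path would be expected to bring the server down.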

> I don't think that running with the data directory on CIFS is supported.  I
> certainly wouldn't be brave enough to do that with data I care about.

You should certainly be keeping the WAL log on a trustworthy filesystem;
and frankly I'm not sure what the point is of using a database on
known-untrustworthy storage of any breed.  We can't be more reliable
than the underlying storage is.

            regards, tom lane
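
For readers who want to verify which filesystem a data or WAL directory lives on, a minimal sketch follows. It assumes a Linux host where mount information is available in the `/proc/mounts` format (device, mount point, filesystem type, options); the sample paths, the `NETWORK_FS` set, and the function name are illustrative assumptions, and mount points containing escaped spaces are not handled.

```python
import posixpath

# Filesystem types treated as network storage here (an illustrative list).
NETWORK_FS = {"cifs", "smb3", "nfs", "nfs4"}

def filesystem_for(path, mounts_text):
    """Return (mount_point, fs_type) for the longest mount point containing path.

    mounts_text is text in /proc/mounts format. Returns None if nothing matches.
    """
    path = posixpath.normpath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix):
            if best is None or len(mount_point) > len(best[0]):
                best = (mount_point, fs_type)
    return best

# Hypothetical mount table and directory, for demonstration only.
sample_mounts = """\
/dev/sda1 / ext4 rw,relatime 0 0
//fileserver/share /mnt/share cifs rw,relatime 0 0
"""

mount_point, fs_type = filesystem_for("/mnt/share/pgdata", sample_mounts)
print(mount_point, fs_type, fs_type in NETWORK_FS)  # /mnt/share cifs True
```

On a real system the second argument would come from `open("/proc/mounts").read()`, and the first from the server's data directory and the location of its WAL directory.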