Обсуждение: fd.c doesn't remove files on a crash-restart

Поиск
Список
Период
Сортировка

fd.c doesn't remove files on a crash-restart

От
"Joshua D. Drake"
Дата:
Hello,

fd.c[1] will remove files from pgsql_tmp on a restart but not a 
crash-restart per this comment:

/*
* NOTE: we could, but don't, call this during a post-backend-crash restart
* cycle.  The argument for not doing it is that someone might want to 
examine
* the temp files for debugging purposes.  This does however mean that
* OpenTemporaryFile had better allow for collision with an existing temp
* file name.
*/

I understand that this is designed this way. I think it is a bad idea 
because:

1. The majority crash-restarts in the wild are going to be diagnosed 
rather easily within the OS itself. They fall into things like OOM 
killer and out of disk space.

2. It can cause significant issues, we ran into this yesterday:

-bash-4.1$ ls pgsql_tmp31227*|du -sh
250G    

There is no active process/backend with PID 31227. The database itself 
is only 55G, but we are taking up an 5x that with dead files.

3. The problem can get worse over time. If you have a very long running 
instance, any time the backend crash-restarts you have to potential to 
increase disk space used for no purpose.

Sincerely,

JD

P.S. Thanks to AndrewG for his assistance in finding this.

1. http://doxygen.postgresql.org/fd_8c_source.html




-- 
Command Prompt, Inc.                  http://the.postgres.company/                        +1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.



Re: fd.c doesn't remove files on a crash-restart

От
Andres Freund
Дата:
On 2016-03-16 10:53:42 -0700, Joshua D. Drake wrote:
> Hello,
> 
> fd.c[1] will remove files from pgsql_tmp on a restart but not a
> crash-restart per this comment:
> 
> /*
> * NOTE: we could, but don't, call this during a post-backend-crash restart
> * cycle.  The argument for not doing it is that someone might want to
> examine
> * the temp files for debugging purposes.  This does however mean that
> * OpenTemporaryFile had better allow for collision with an existing temp
> * file name.
> */
> 
> I understand that this is designed this way. I think it is a bad idea
> because:
> 
> 1. The majority crash-restarts in the wild are going to be diagnosed rather
> easily within the OS itself. They fall into things like OOM killer and out
> of disk space.

I don't buy 1), like at all. I've seen numerous production instances
with crashes outside of os triggered things.


> 2. It can cause significant issues, we ran into this yesterday:
> 
> -bash-4.1$ ls pgsql_tmp31227*|du -sh
> 250G    
> 
> There is no active process/backend with PID 31227. The database itself is
> only 55G, but we are taking up an 5x that with dead files.
> 
> 3. The problem can get worse over time. If you have a very long running
> instance, any time the backend crash-restarts you have to potential to
> increase disk space used for no purpose.

But I think these outweigh the debugging benefit.


Andres



Re: fd.c doesn't remove files on a crash-restart

От
"Joshua D. Drake"
Дата:
On 03/16/2016 10:56 AM, Andres Freund wrote:

>> I understand that this is designed this way. I think it is a bad idea
>> because:
>>
>> 1. The majority crash-restarts in the wild are going to be diagnosed rather
>> easily within the OS itself. They fall into things like OOM killer and out
>> of disk space.
>
> I don't buy 1), like at all. I've seen numerous production instances
> with crashes outside of os triggered things.

I don't argue that. I argue that 9 times out of 10, it will not be one 
of those things but ....


>> 3. The problem can get worse over time. If you have a very long running
>> instance, any time the backend crash-restarts you have to potential to
>> increase disk space used for no purpose.
>
> But I think these outweigh the debugging benefit.

Well as Andrew said, we could also create postmaster start option that 
defaults to don't save.

JD

-- 
Command Prompt, Inc.                  http://the.postgres.company/                        +1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.



Re: fd.c doesn't remove files on a crash-restart

От
Andres Freund
Дата:
On 2016-03-16 11:02:09 -0700, Joshua D. Drake wrote:
> >>3. The problem can get worse over time. If you have a very long running
> >>instance, any time the backend crash-restarts you have to potential to
> >>increase disk space used for no purpose.
> >
> >But I think these outweigh the debugging benefit.
> 
> Well as Andrew said, we could also create postmaster start option that
> defaults to don't save.

I think these days you'd simply use restart_after_crash = false. For
debugging I found that to be rather valuable.


Andres



Re: fd.c doesn't remove files on a crash-restart

От
Tom Lane
Дата:
"Joshua D. Drake" <jd@commandprompt.com> writes:
> Hello,
> fd.c[1] will remove files from pgsql_tmp on a restart but not a 
> crash-restart per this comment:

> /*
> * NOTE: we could, but don't, call this during a post-backend-crash restart
> * cycle.  The argument for not doing it is that someone might want to 
> examine
> * the temp files for debugging purposes.  This does however mean that
> * OpenTemporaryFile had better allow for collision with an existing temp
> * file name.
> */

> I understand that this is designed this way. I think it is a bad idea 
> because:

Possible compromise: remove files only in non-Assert builds?
        regards, tom lane



Re: fd.c doesn't remove files on a crash-restart

От
"Joshua D. Drake"
Дата:
On 03/16/2016 11:04 AM, Andres Freund wrote:
> On 2016-03-16 11:02:09 -0700, Joshua D. Drake wrote:
>>>> 3. The problem can get worse over time. If you have a very long running
>>>> instance, any time the backend crash-restarts you have to potential to
>>>> increase disk space used for no purpose.
>>>
>>> But I think these outweigh the debugging benefit.
>>
>> Well as Andrew said, we could also create postmaster start option that
>> defaults to don't save.
>
> I think these days you'd simply use restart_after_crash = false. For
> debugging I found that to be rather valuable.

That would have created an extended outage for this installation. I am 
not sure we can force that for something that should not happen in the 
first place (filling up the hard drive with dead files).

JD

>
>
> Andres
>
>


-- 
Command Prompt, Inc.                  http://the.postgres.company/                        +1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.



Re: fd.c doesn't remove files on a crash-restart

От
"Joshua D. Drake"
Дата:
On 03/16/2016 11:05 AM, Tom Lane wrote:
> "Joshua D. Drake" <jd@commandprompt.com> writes:
>> Hello,
>> fd.c[1] will remove files from pgsql_tmp on a restart but not a
>> crash-restart per this comment:
>
>> /*
>> * NOTE: we could, but don't, call this during a post-backend-crash restart
>> * cycle.  The argument for not doing it is that someone might want to
>> examine
>> * the temp files for debugging purposes.  This does however mean that
>> * OpenTemporaryFile had better allow for collision with an existing temp
>> * file name.
>> */
>
>> I understand that this is designed this way. I think it is a bad idea
>> because:
>
> Possible compromise: remove files only in non-Assert builds?

Oh, that seems absolutely reasonable. If we are using assert builds it 
is because we are debugging something (or should be) anyway.

Sincerely,

JD

>
>             regards, tom lane
>


-- 
Command Prompt, Inc.                  http://the.postgres.company/                        +1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.



Re: fd.c doesn't remove files on a crash-restart

От
Andres Freund
Дата:
On 2016-03-16 11:06:23 -0700, Joshua D. Drake wrote:
> On 03/16/2016 11:04 AM, Andres Freund wrote:
> >On 2016-03-16 11:02:09 -0700, Joshua D. Drake wrote:
> >>>>3. The problem can get worse over time. If you have a very long running
> >>>>instance, any time the backend crash-restarts you have to potential to
> >>>>increase disk space used for no purpose.
> >>>
> >>>But I think these outweigh the debugging benefit.
> >>
> >>Well as Andrew said, we could also create postmaster start option that
> >>defaults to don't save.
> >
> >I think these days you'd simply use restart_after_crash = false. For
> >debugging I found that to be rather valuable.
> 
> That would have created an extended outage for this installation. I am not
> sure we can force that for something that should not happen in the first
> place (filling up the hard drive with dead files).

Nah, I meant that if you want to debug something you'd set that (i.e. in
the case you need the temp files), not you should set that.



Re: fd.c doesn't remove files on a crash-restart

От
"Joshua D. Drake"
Дата:
On 03/16/2016 11:08 AM, Andres Freund wrote:

>>>> Well as Andrew said, we could also create postmaster start option that
>>>> defaults to don't save.
>>>
>>> I think these days you'd simply use restart_after_crash = false. For
>>> debugging I found that to be rather valuable.
>>
>> That would have created an extended outage for this installation. I am not
>> sure we can force that for something that should not happen in the first
>> place (filling up the hard drive with dead files).
>
> Nah, I meant that if you want to debug something you'd set that (i.e. in
> the case you need the temp files), not you should set that.

Aww, yes that would be reasonable as well.

JD




-- 
Command Prompt, Inc.                  http://the.postgres.company/                        +1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.



Re: fd.c doesn't remove files on a crash-restart

От
Robert Haas
Дата:
On Wed, Mar 16, 2016 at 2:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Joshua D. Drake" <jd@commandprompt.com> writes:
>> Hello,
>> fd.c[1] will remove files from pgsql_tmp on a restart but not a
>> crash-restart per this comment:
>
>> /*
>> * NOTE: we could, but don't, call this during a post-backend-crash restart
>> * cycle.  The argument for not doing it is that someone might want to
>> examine
>> * the temp files for debugging purposes.  This does however mean that
>> * OpenTemporaryFile had better allow for collision with an existing temp
>> * file name.
>> */
>
>> I understand that this is designed this way. I think it is a bad idea
>> because:
>
> Possible compromise: remove files only in non-Assert builds?

That sorta seems like tying two things together that aren't obviously
related.  I think building with --enable-cassert is support to enable
debugging cross-checks, not change behavior.

I would vote for removing them always, and if somebody doesn't want to
remove them, well then they can comment the code out.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: fd.c doesn't remove files on a crash-restart

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Mar 16, 2016 at 2:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Possible compromise: remove files only in non-Assert builds?

> That sorta seems like tying two things together that aren't obviously
> related.  I think building with --enable-cassert is support to enable
> debugging cross-checks, not change behavior.

Well, it's support to enable debugging, and I would classify not
destroying evidence as being debugging support.
        regards, tom lane



Re: fd.c doesn't remove files on a crash-restart

От
Peter Geoghegan
Дата:
On Wed, Mar 16, 2016 at 10:53 AM, Joshua D. Drake <jd@commandprompt.com> wrote:
> fd.c[1] will remove files from pgsql_tmp on a restart but not a
> crash-restart per this comment:
>
> /*
> * NOTE: we could, but don't, call this during a post-backend-crash restart
> * cycle.  The argument for not doing it is that someone might want to
> examine
> * the temp files for debugging purposes.  This does however mean that
> * OpenTemporaryFile had better allow for collision with an existing temp
> * file name.
> */
>
> I understand that this is designed this way. I think it is a bad idea

FWIW, I've seen this get out of hand several times myself.

-- 
Peter Geoghegan



Re: fd.c doesn't remove files on a crash-restart

От
Jim Nasby
Дата:
On 3/16/16 2:16 PM, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, Mar 16, 2016 at 2:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Possible compromise: remove files only in non-Assert builds?
>
>> That sorta seems like tying two things together that aren't obviously
>> related.  I think building with --enable-cassert is support to enable
>> debugging cross-checks, not change behavior.
>
> Well, it's support to enable debugging, and I would classify not
> destroying evidence as being debugging support.

Another option: keep stuff around for a single restart. I don't think 
this would be that hard by having a file that's a list of files to 
remove on the next restart. On restart, remove everything in that file 
(and the file itself). If there's anything left, create a new file 
that's the list of what's left.

The other nice thing about having this list is it would tell the DBA 
exactly what files were left after the crash vs what's new.

Actually, I guess another option would be to have a separate directory 
to move all these files into. On restart, nuke the directory if it 
exists, then move stuff in there if necessary.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com