Обсуждение: fd.c doesn't remove files on a crash-restart
Hello, fd.c[1] will remove files from pgsql_tmp on a restart but not a crash-restart per this comment: /* * NOTE: we could, but don't, call this during a post-backend-crash restart * cycle. The argument for not doing it is that someone might want to examine * the temp files for debugging purposes. This does however mean that * OpenTemporaryFile had better allow for collision with an existing temp * file name. */ I understand that this is designed this way. I think it is a bad idea because: 1. The majority crash-restarts in the wild are going to be diagnosed rather easily within the OS itself. They fall into things like OOM killer and out of disk space. 2. It can cause significant issues, we ran into this yesterday: -bash-4.1$ ls pgsql_tmp31227*|du -sh 250G There is no active process/backend with PID 31227. The database itself is only 55G, but we are taking up an 5x that with dead files. 3. The problem can get worse over time. If you have a very long running instance, any time the backend crash-restarts you have to potential to increase disk space used for no purpose. Sincerely, JD P.S. Thanks to AndrewG for his assistance in finding this. 1. http://doxygen.postgresql.org/fd_8c_source.html -- Command Prompt, Inc. http://the.postgres.company/ +1-503-667-4564 PostgreSQL Centered full stack support, consulting and development. Everyone appreciates your honesty, until you are honest with them.
On 2016-03-16 10:53:42 -0700, Joshua D. Drake wrote: > Hello, > > fd.c[1] will remove files from pgsql_tmp on a restart but not a > crash-restart per this comment: > > /* > * NOTE: we could, but don't, call this during a post-backend-crash restart > * cycle. The argument for not doing it is that someone might want to > examine > * the temp files for debugging purposes. This does however mean that > * OpenTemporaryFile had better allow for collision with an existing temp > * file name. > */ > > I understand that this is designed this way. I think it is a bad idea > because: > > 1. The majority crash-restarts in the wild are going to be diagnosed rather > easily within the OS itself. They fall into things like OOM killer and out > of disk space. I don't buy 1), like at all. I've seen numerous production instances with crashes outside of os triggered things. > 2. It can cause significant issues, we ran into this yesterday: > > -bash-4.1$ ls pgsql_tmp31227*|du -sh > 250G > > There is no active process/backend with PID 31227. The database itself is > only 55G, but we are taking up an 5x that with dead files. > > 3. The problem can get worse over time. If you have a very long running > instance, any time the backend crash-restarts you have to potential to > increase disk space used for no purpose. But I think these outweigh the debugging benefit. Andres
On 03/16/2016 10:56 AM, Andres Freund wrote: >> I understand that this is designed this way. I think it is a bad idea >> because: >> >> 1. The majority crash-restarts in the wild are going to be diagnosed rather >> easily within the OS itself. They fall into things like OOM killer and out >> of disk space. > > I don't buy 1), like at all. I've seen numerous production instances > with crashes outside of os triggered things. I don't argue that. I argue that 9 times out of 10, it will not be one of those things but .... >> 3. The problem can get worse over time. If you have a very long running >> instance, any time the backend crash-restarts you have to potential to >> increase disk space used for no purpose. > > But I think these outweigh the debugging benefit. Well as Andrew said, we could also create postmaster start option that defaults to don't save. JD -- Command Prompt, Inc. http://the.postgres.company/ +1-503-667-4564 PostgreSQL Centered full stack support, consulting and development. Everyone appreciates your honesty, until you are honest with them.
On 2016-03-16 11:02:09 -0700, Joshua D. Drake wrote: > >>3. The problem can get worse over time. If you have a very long running > >>instance, any time the backend crash-restarts you have to potential to > >>increase disk space used for no purpose. > > > >But I think these outweigh the debugging benefit. > > Well as Andrew said, we could also create postmaster start option that > defaults to don't save. I think these days you'd simply use restart_after_crash = false. For debugging I found that to be rather valuable. Andres
"Joshua D. Drake" <jd@commandprompt.com> writes: > Hello, > fd.c[1] will remove files from pgsql_tmp on a restart but not a > crash-restart per this comment: > /* > * NOTE: we could, but don't, call this during a post-backend-crash restart > * cycle. The argument for not doing it is that someone might want to > examine > * the temp files for debugging purposes. This does however mean that > * OpenTemporaryFile had better allow for collision with an existing temp > * file name. > */ > I understand that this is designed this way. I think it is a bad idea > because: Possible compromise: remove files only in non-Assert builds? regards, tom lane
On 03/16/2016 11:04 AM, Andres Freund wrote: > On 2016-03-16 11:02:09 -0700, Joshua D. Drake wrote: >>>> 3. The problem can get worse over time. If you have a very long running >>>> instance, any time the backend crash-restarts you have to potential to >>>> increase disk space used for no purpose. >>> >>> But I think these outweigh the debugging benefit. >> >> Well as Andrew said, we could also create postmaster start option that >> defaults to don't save. > > I think these days you'd simply use restart_after_crash = false. For > debugging I found that to be rather valuable. That would have created an extended outage for this installation. I am not sure we can force that for something that should not happen in the first place (filling up the hard drive with dead files). JD > > > Andres > > -- Command Prompt, Inc. http://the.postgres.company/ +1-503-667-4564 PostgreSQL Centered full stack support, consulting and development. Everyone appreciates your honesty, until you are honest with them.
On 03/16/2016 11:05 AM, Tom Lane wrote: > "Joshua D. Drake" <jd@commandprompt.com> writes: >> Hello, >> fd.c[1] will remove files from pgsql_tmp on a restart but not a >> crash-restart per this comment: > >> /* >> * NOTE: we could, but don't, call this during a post-backend-crash restart >> * cycle. The argument for not doing it is that someone might want to >> examine >> * the temp files for debugging purposes. This does however mean that >> * OpenTemporaryFile had better allow for collision with an existing temp >> * file name. >> */ > >> I understand that this is designed this way. I think it is a bad idea >> because: > > Possible compromise: remove files only in non-Assert builds? Oh, that seems absolutely reasonable. If we are using assert builds it is because we are debugging something (or should be) anyway. Sincerely, JD > > regards, tom lane > -- Command Prompt, Inc. http://the.postgres.company/ +1-503-667-4564 PostgreSQL Centered full stack support, consulting and development. Everyone appreciates your honesty, until you are honest with them.
On 2016-03-16 11:06:23 -0700, Joshua D. Drake wrote: > On 03/16/2016 11:04 AM, Andres Freund wrote: > >On 2016-03-16 11:02:09 -0700, Joshua D. Drake wrote: > >>>>3. The problem can get worse over time. If you have a very long running > >>>>instance, any time the backend crash-restarts you have to potential to > >>>>increase disk space used for no purpose. > >>> > >>>But I think these outweigh the debugging benefit. > >> > >>Well as Andrew said, we could also create postmaster start option that > >>defaults to don't save. > > > >I think these days you'd simply use restart_after_crash = false. For > >debugging I found that to be rather valuable. > > That would have created an extended outage for this installation. I am not > sure we can force that for something that should not happen in the first > place (filling up the hard drive with dead files). Nah, I meant that if you want to debug something you'd set that (i.e. in the case you need the temp files), not you should set that.
On 03/16/2016 11:08 AM, Andres Freund wrote: >>>> Well as Andrew said, we could also create postmaster start option that >>>> defaults to don't save. >>> >>> I think these days you'd simply use restart_after_crash = false. For >>> debugging I found that to be rather valuable. >> >> That would have created an extended outage for this installation. I am not >> sure we can force that for something that should not happen in the first >> place (filling up the hard drive with dead files). > > Nah, I meant that if you want to debug something you'd set that (i.e. in > the case you need the temp files), not you should set that. Aww, yes that would be reasonable as well. JD -- Command Prompt, Inc. http://the.postgres.company/ +1-503-667-4564 PostgreSQL Centered full stack support, consulting and development. Everyone appreciates your honesty, until you are honest with them.
On Wed, Mar 16, 2016 at 2:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Joshua D. Drake" <jd@commandprompt.com> writes: >> Hello, >> fd.c[1] will remove files from pgsql_tmp on a restart but not a >> crash-restart per this comment: > >> /* >> * NOTE: we could, but don't, call this during a post-backend-crash restart >> * cycle. The argument for not doing it is that someone might want to >> examine >> * the temp files for debugging purposes. This does however mean that >> * OpenTemporaryFile had better allow for collision with an existing temp >> * file name. >> */ > >> I understand that this is designed this way. I think it is a bad idea >> because: > > Possible compromise: remove files only in non-Assert builds? That sorta seems like tying two things together that aren't obviously related. I think building with --enable-cassert is support to enable debugging cross-checks, not change behavior. I would vote for removing them always, and if somebody doesn't want to remove them, well then they can comment the code out. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Wed, Mar 16, 2016 at 2:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Possible compromise: remove files only in non-Assert builds? > That sorta seems like tying two things together that aren't obviously > related. I think building with --enable-cassert is support to enable > debugging cross-checks, not change behavior. Well, it's support to enable debugging, and I would classify not destroying evidence as being debugging support. regards, tom lane
On Wed, Mar 16, 2016 at 10:53 AM, Joshua D. Drake <jd@commandprompt.com> wrote: > fd.c[1] will remove files from pgsql_tmp on a restart but not a > crash-restart per this comment: > > /* > * NOTE: we could, but don't, call this during a post-backend-crash restart > * cycle. The argument for not doing it is that someone might want to > examine > * the temp files for debugging purposes. This does however mean that > * OpenTemporaryFile had better allow for collision with an existing temp > * file name. > */ > > I understand that this is designed this way. I think it is a bad idea FWIW, I've seen this get out of hand several times myself. -- Peter Geoghegan
On 3/16/16 2:16 PM, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Wed, Mar 16, 2016 at 2:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Possible compromise: remove files only in non-Assert builds? > >> That sorta seems like tying two things together that aren't obviously >> related. I think building with --enable-cassert is support to enable >> debugging cross-checks, not change behavior. > > Well, it's support to enable debugging, and I would classify not > destroying evidence as being debugging support. Another option: keep stuff around for a single restart. I don't think this would be that hard by having a file that's a list of files to remove on the next restart. On restart, remove everything in that file (and the file itself). If there's anything left, create a new file that's the list of what's left. The other nice thing about having this list is it would tell the DBA exactly what files were left after the crash vs what's new. Actually, I guess another option would be to have a separate directory to move all these files into. On restart, nuke the directory if it exists, then move stuff in there if necessary. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com