Re: [RFC] Should we fix postmaster to avoid slow shutdown?
From | Tom Lane |
---|---|
Subject | Re: [RFC] Should we fix postmaster to avoid slow shutdown? |
Date | |
Msg-id | 21221.1479830385@sss.pgh.pa.us |
In response to | Re: [RFC] Should we fix postmaster to avoid slow shutdown? ("Tsunakawa, Takayuki" <tsunakawa.takay@jp.fujitsu.com>) |
Responses |
Re: [RFC] Should we fix postmaster to avoid slow shutdown?
(Robert Haas <robertmhaas@gmail.com>)
|
List | pgsql-hackers |
"Tsunakawa, Takayuki" <tsunakawa.takay@jp.fujitsu.com> writes:
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
>> The point I was trying to make is that I think the forced-removal behavior
>> is not desirable, and therefore committing a patch that makes it be graven
>> in stone is not desirable either.

> I totally agree that we should pursue the direction for escaping from the
> complete loss of stats files. Personally, I would like to combine that with
> the idea of persistent performance diagnosis information for long-term
> analysis (IIRC, someone proposed it.) However, I don't think my patch will
> make everyone forget about the problem of stats file loss during recovery.
> The problem exists with or without my patch, and my patch doesn't have the
> power to dilute the importance of the problem. If you are worried about
> memory, we can add an entry for the problem in the TODO list that Bruce-san
> is maintaining.

> Or, maybe we can just stop removing the stats files during recovery by
> keeping the files of previous generation and using it as the current one.
> I haven't seen how fresh the previous generation is (500ms ago?). A bit
> older might be better than nothing.

Freshness isn't the issue. The stats file isn't there at all, in the
permanent stats directory, unless the collector takes the time to write
it before exiting. Without that, we have unrecoverable loss of the stats
data. Now, that isn't as bad as loss of the SQL data content, but it's
not good either.

It's already the case that the pgstats code writes the stats data under a
temporary file name and then renames it into place atomically. So the
prospects for corrupt data are not large, and I do not think that the
existing removal behavior was intended to prevent that. Rather, the
concern was that if you do a point-in-time recovery to someplace much
earlier on the WAL timeline, the stats file will be out of sync with
what's now in your database. That's a valid point, but deleting the
stats file during *any* recovery seems like an overreaction.

The simplest solution I can think of is to delete the stats file when
doing a PITR operation, but not during simple crash recovery. I've not
looked to see how hard it would be to do that, but it seems like it
should be a fairly minor logic tweak. Maybe decide to do the removal
at the point where we intentionally stop following WAL someplace
earlier than its end.

Another angle we might take, independently of that, is to delete the
stats file if the stats collector process itself crashes. This would
provide a recovery avenue if somehow we did have a stats file that
was corrupt enough to crash the collector. And it would not matter
for post-startup crashes of the stats collector, because the file
would not be there anyway.

			regards, tom lane
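Editorial illustration (not part of the original message): the two ideas above, writing the stats file under a temporary name and renaming it into place, and removing the file only for point-in-time recovery or after a collector crash, can be sketched roughly as follows. This is a minimal Python sketch, not PostgreSQL's actual C code. The names `RecoveryKind`, `write_stats_atomically`, and `should_remove_stats` are hypothetical, and the `fsync` call is an assumption of this sketch rather than something the message describes.

```python
import os
import tempfile
from enum import Enum


class RecoveryKind(Enum):
    """Hypothetical classification of how the server is starting up."""
    NONE = "none"            # clean startup, no WAL replay needed
    CRASH = "crash"          # replay WAL all the way to its end
    POINT_IN_TIME = "pitr"   # intentionally stop replay before the end of WAL


def write_stats_atomically(path: str, data: bytes) -> None:
    """Write data under a temporary name, then rename it over path.

    Readers see either the old file or the complete new one, never a
    partially written file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory (same filesystem)
    # for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".stats-", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # assumption: flush to disk before the rename
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def should_remove_stats(recovery: RecoveryKind, collector_crashed: bool = False) -> bool:
    """Remove the stats file only when it may be out of sync or suspect.

    Plain crash recovery keeps the file; PITR or a crashed stats
    collector discards it.
    """
    return recovery is RecoveryKind.POINT_IN_TIME or collector_crashed
```

Under these assumptions, `should_remove_stats(RecoveryKind.CRASH)` keeps the file, while a point-in-time recovery or a collector crash discards it.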