Discussion: Usability improvements for pg_stop_backup()
Hackers,

Since Gabrielle has improved archiving with pg_stat_archiver in 9.4, I'd like to go further and improve the usability of pg_stop_backup(). However, based on my IRC discussion with Vik, there might not be consensus on what the right behavior *should* be. This is for 9.5, of course.

Currently, if archive_command is failing, pg_stop_backup() will hang forever. The only way to figure out what's wrong with pg_stop_backup() is to tail the PostgreSQL logs. This is difficult for users to troubleshoot, and strongly resists any kind of automation.

Yes, we can work around this by setting statement_timeout, but that has two issues: (a) the user has to remember to do it before the problem occurs, and (b) it won't differentiate between archive failure and other reasons it might time out.

As such, I propose that pg_stop_backup() should error with an appropriate error message ("Could not archive WAL segments") after three archiving attempts. We could also add an optional parameter to raise the number of attempts from the default of three.

An alternative, if we were doing this from scratch, would be for pg_stop_backup to return false or -1 or something if it couldn't archive; there are reasons why a user might not care that archive_command was failing (shared storage comes to mind). However, that would be a surprising break with backwards compatibility, since currently users don't check the result value of pg_stop_backup().

Thoughts?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
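To make the proposed semantics concrete, here is a minimal Python sketch of "error after three archiving attempts, with an optional parameter to raise the limit". It is only a simulation of the idea, not PostgreSQL code: stop_backup, try_archive, max_attempts, and ArchiveFailedError are hypothetical names invented for illustration, and the only thing taken from the proposal is the default of three attempts and the error message text.

```python
class ArchiveFailedError(Exception):
    """Hypothetical error: WAL segments were not archived within the attempt limit."""


def stop_backup(try_archive, max_attempts=3):
    """Simulate the proposed behavior of giving up instead of hanging forever.

    try_archive is a hypothetical callable that returns True once archiving
    has succeeded and False when an attempt fails.
    Returns the number of attempts that were needed.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(1, max_attempts + 1):
        if try_archive():
            return attempt

    # Every attempt failed: report a specific error rather than blocking.
    raise ArchiveFailedError("Could not archive WAL segments")
```

Unlike a statement_timeout expiry, a caller catching ArchiveFailedError in this sketch knows the failure came from archiving, which is the distinction point (b) above is about.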
Josh Berkus <josh@agliodbs.com> wrote:

> Currently, if archive_command is failing, pg_stop_backup() will hang
> forever. The only way to figure out what's wrong with pg_stop_backup()
> is to tail the PostgreSQL logs. This is difficult for users to
> troubleshoot, and strongly resists any kind of automation.

That is bad.

> Yes, we can work around this by setting statement_timeout, but that has
> two issues (a) the user has to remember to do it before the problem
> occurs, and (b) it won't differentiate between archive failure and other
> reasons it might time out.

Clearly not a long-term solution.

> As such, I propose that pg_stop_backup() should error with an
> appropriate error message ("Could not archive WAL segments") after
> three archiving attempts. We could also add an optional parameter to
> raise the number of attempts from the default of three.

That sounds sane to me.

> An alternative, if we were doing this from scratch, would be for
> pg_stop_backup to return false or -1 or something if it couldn't
> archive; there are reasons why a user might not care that
> archive_command was failing (shared storage comes to mind). However,
> that would be a surprising break with backwards compatibility, since
> currently users don't check the result value of pg_stop_backup().

Some might, which is a stronger argument against changing what gets returned. Even in a green field though, I would argue that pg_stop_backup() should return information about the minimum range of WAL files needed to perform a consistent recovery -- or possibly duplicate everything in the backup history file. An error seems much more appropriate to indicate that the user does not have a valid backup.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company