Обсуждение: [HACKERS] Clarifying "server starting" messaging in pg_ctl start without --wait
[HACKERS] Clarifying "server starting" messaging in pg_ctl start without --wait
От
Ryan Murphy
Дата:
Hi Postgres Devs,
I had a suggestion regarding the output pg_ctl gives when you use it to start the postgres server. At first I was going to write a patch, but then I decided to just ask you guys first to see what you think.I'm happy to write a patch for this if it's helpful, though it's such a simple change that if one of the core devs wants this s/he can probably more easily just add it themselves.
Re: [HACKERS] Clarifying "server starting" messaging in pg_ctl startwithout --wait
От
Peter Eisentraut
Дата:
On 12/20/16 3:31 PM, Ryan Murphy wrote: > I'm concerned some new users may not understand this behavior of pg_ctl, > so I wanted to suggest that we add some additional messaging after > "server starting" - something like: > > $ pg_ctl -D datadir -l logfile start > server starting > (to wait for confirmation that server actually started, try pg_ctl again > with --wait) Maybe the fix is to make --wait the default? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Clarifying "server starting" messaging in pg_ctl startwithout --wait
От
David Fetter
Дата:
On Tue, Dec 20, 2016 at 03:43:11PM -0500, Peter Eisentraut wrote: > On 12/20/16 3:31 PM, Ryan Murphy wrote: > > I'm concerned some new users may not understand this behavior of pg_ctl, > > so I wanted to suggest that we add some additional messaging after > > "server starting" - something like: > > > > $ pg_ctl -D datadir -l logfile start > > server starting > > (to wait for confirmation that server actually started, try pg_ctl again > > with --wait) > > Maybe the fix is to make --wait the default? +1 It's not super useful to have the prompt return while the server is still starting up. Best, David. -- David Fetter <david(at)fetter(dot)org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Ryan Murphy <ryanfmurphy@gmail.com> writes: > I'm concerned some new users may not understand this behavior of pg_ctl, so > I wanted to suggest that we add some additional messaging after "server > starting" - something like: > $ pg_ctl -D datadir -l logfile start > server starting > (to wait for confirmation that server actually started, try pg_ctl again > with --wait) That seems annoyingly verbose and nanny-ish. Perhaps we could get the point across like this: $ pg_ctl -D datadir -l logfile start requested server to start $ regards, tom lane
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > Maybe the fix is to make --wait the default? I was wondering about that too ... does anyone remember the rationale for the current behavior? But the message for the non-wait case seems like it could stand to be improved independently of that. regards, tom lane
Re: [HACKERS] Clarifying "server starting" messaging in pg_ctl startwithout --wait
От
Alvaro Herrera
Дата:
Tom Lane wrote: > Ryan Murphy <ryanfmurphy@gmail.com> writes: > > I'm concerned some new users may not understand this behavior of pg_ctl, so > > I wanted to suggest that we add some additional messaging after "server > > starting" - something like: > > > $ pg_ctl -D datadir -l logfile start > > server starting > > (to wait for confirmation that server actually started, try pg_ctl again > > with --wait) > > That seems annoyingly verbose and nanny-ish. Perhaps we could get the > point across like this: > > $ pg_ctl -D datadir -l logfile start > requested server to start +1, but also +1 to making --wait the default. Extra points if systemd start scripts are broken by the change ;-) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Clarifying "server starting" messaging in pg_ctl startwithout --wait
От
"David G. Johnston"
Дата:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> Maybe the fix is to make --wait the default?
I was wondering about that too ... does anyone remember the rationale
for the current behavior? But the message for the non-wait case seems
like it could stand to be improved independently of that.
Not totally independent.
If the default is changed to --wait then the message can be written assuming the user understands what "--no-wait" does; but if the default is left "--no-wait" then cluing the user into the asynchronous behavior and telling them how to get the more expected synchronous behavior would be helpful.
David J.
Re: [HACKERS] Clarifying "server starting" messaging in pg_ctl startwithout --wait
От
Peter Eisentraut
Дата:
On 12/20/16 3:49 PM, Tom Lane wrote: > Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: >> Maybe the fix is to make --wait the default? > I was wondering about that too ... does anyone remember the rationale > for the current behavior? Probably because that didn't work reliably before pg_ctl learned how to get the right port number and PQping() and such things. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Clarifying "server starting" messaging in pg_ctl startwithout --wait
От
Peter Eisentraut
Дата:
On 12/20/16 3:43 PM, Peter Eisentraut wrote: > On 12/20/16 3:31 PM, Ryan Murphy wrote: >> I'm concerned some new users may not understand this behavior of pg_ctl, >> so I wanted to suggest that we add some additional messaging after >> "server starting" - something like: >> >> $ pg_ctl -D datadir -l logfile start >> server starting >> (to wait for confirmation that server actually started, try pg_ctl again >> with --wait) > > Maybe the fix is to make --wait the default? Here is a patch for that. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
Re: [HACKERS] Clarifying "server starting" messaging in pg_ctl startwithout --wait
От
Michael Paquier
Дата:
On Fri, Dec 23, 2016 at 10:47 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 12/20/16 3:43 PM, Peter Eisentraut wrote: >> On 12/20/16 3:31 PM, Ryan Murphy wrote: >>> I'm concerned some new users may not understand this behavior of pg_ctl, >>> so I wanted to suggest that we add some additional messaging after >>> "server starting" - something like: >>> >>> $ pg_ctl -D datadir -l logfile start >>> server starting >>> (to wait for confirmation that server actually started, try pg_ctl again >>> with --wait) >> >> Maybe the fix is to make --wait the default? > > Here is a patch for that. Is there still a use case for --no-wait in the real world? Why not simply ripping it out? -- Michael
Michael Paquier <michael.paquier@gmail.com> writes: > Is there still a use case for --no-wait in the real world? Sure. Most system startup scripts aren't going to want to wait. If we take it out those people will go back to starting the postmaster by hand. regards, tom lane
On 12/23/16 6:10 PM, Tom Lane wrote: > Michael Paquier <michael.paquier@gmail.com> writes: >> Is there still a use case for --no-wait in the real world? > > Sure. Most system startup scripts aren't going to want to wait. > If we take it out those people will go back to starting the postmaster > by hand. Presumably they could just background it... since it's not going to be long-lived it's presumably not that big a deal. Though, seems like many startup scripts like to make sure what they're starting is actually working. What might be interesting is a mode that waited for everything but recovery so at least you know the config is valid, the port is available, etc. That would be much harder to handle externally. </feature_creep> -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)
Re: [HACKERS] Clarifying "server starting" messaging in pg_ctl startwithout --wait
От
Robert Haas
Дата:
On Fri, Dec 23, 2016 at 7:25 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote: > On 12/23/16 6:10 PM, Tom Lane wrote: >> Michael Paquier <michael.paquier@gmail.com> writes: >>> Is there still a use case for --no-wait in the real world? >> >> Sure. Most system startup scripts aren't going to want to wait. >> If we take it out those people will go back to starting the postmaster >> by hand. > > Presumably they could just background it... since it's not going to be > long-lived it's presumably not that big a deal. Though, seems like many > startup scripts like to make sure what they're starting is actually working. Making --wait the default may or may not be sensible -- I'm not sure -- but removing --no-wait is clearly a bad idea, and we shouldn't do it. The fact that the problems created by removing it might be solvable doesn't mean that it's a good idea to create them in the first place. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Clarifying "server starting" messaging in pg_ctl startwithout --wait
От
Ryan Murphy
Дата:
Making --wait the default may or may not be sensible -- I'm not sure
-- but removing --no-wait is clearly a bad idea, and we shouldn't do
it. The fact that the problems created by removing it might be
solvable doesn't mean that it's a good idea to create them in the
first place.
I agree with Robert - pg_ctl is no doubt used in all kinds of scripts that would then have to change.
It may make sense to have --wait be the default though - certainly less confusing to new users!
[HACKERS] Re: Clarifying "server starting" messaging in pg_ctl start without--wait
От
Ryan Murphy
Дата:
The following review has been posted through the commitfest application: make installcheck-world: tested, failed Implements feature: tested, passed Spec compliant: tested, passed Documentation: tested, passed (Though I could not check "make installcheck-world" as passed because it failed 1 test, I think it basically SHOULD pass- see my comment below.) Patch looks good to me and does what we talked about, and Docs seem clear and correct. I was able to build Postgres and run pg_ctl and observe that it waited by default for the 'start' action, which addressesmy original concern. `make` and `make install` went fine, and `make check` did as well, but `make installcheck-world` said (after a while): =======================1 of 55 tests failed. ======================= The diff summary is here: *** /home/my_secret_local_username/my/secret/path/to/postgres/src/interfaces/ecpg/test/expected/connect-test5.stderr 2016-08-2310:00:53.000000000 -0500 --- /home/my_secret_local_username/my/secret/path/to/postgres/src/interfaces/ecpg/test/results/connect-test5.stderr 2017-01-0600:08:40.000000000 -0600 *************** *** 36,42 **** [NO_PID]: sqlca: code: 0, state: 00000 [NO_PID]: ECPGconnect: opening database <DEFAULT> on <DEFAULT> port<DEFAULT> for user regress_ecpg_user2 [NO_PID]: sqlca: code: 0, state: 00000 ! [NO_PID]: ECPGconnect: could not open database: FATAL: database "regress_ecpg_user2" does not exist [NO_PID]: sqlca:code: 0, state: 00000 [NO_PID]: ecpg_finish: connection main closed --- 36,42 ---- [NO_PID]: sqlca: code: 0, state: 00000 [NO_PID]: ECPGconnect: opening database <DEFAULT> on <DEFAULT> port<DEFAULT> for user regress_ecpg_user2 [NO_PID]: sqlca: code: 0, state: 00000 ! [NO_PID]: ECPGconnect: could not open database: FATAL: database "my_secret_local_username" does not exist [NO_PID]: sqlca:code: 0, state: 00000 [NO_PID]: ecpg_finish: connection main closed *************** *** 73,79 **** [NO_PID]: sqlca: code: -220, state: 08003 [NO_PID]: ECPGconnect: opening database <DEFAULT> on <DEFAULT> port<DEFAULT> for user regress_ecpg_user2 [NO_PID]: sqlca: code: 0, state: 00000 ! [NO_PID]: ECPGconnect: could not open database: FATAL: database "regress_ecpg_user2" does not exist [NO_PID]: sqlca:code: 0, state: 00000 [NO_PID]: ecpg_finish: connection main closed --- 73,79 ---- [NO_PID]: sqlca: code: -220, state: 08003 [NO_PID]: ECPGconnect: opening database <DEFAULT> on <DEFAULT> port<DEFAULT> for user regress_ecpg_user2 [NO_PID]: sqlca: code: 0, state: 00000 ! [NO_PID]: ECPGconnect: could not open database: FATAL: database "my_secret_local_username" does not exist [NO_PID]: sqlca:code: 0, state: 00000 [NO_PID]: ecpg_finish: connection main closed ====================================================================== I don't actually believe this to indicate a problem though - I think perhaps there's a problem with this test, or with howI am running it. The only diff was that when it (correctly) complained of a nonexistent database, it referred to my usernamethat I was logged in as, instead of the test database name "regress_ecpg_user2". I don't think this has anythingto do with the changes to pg_ctl. I could be wrong though! I am going to leave this as "Needs review" until someone more familiar with the project double-checksthis.
On 1/6/17 12:24 AM, Ryan Murphy wrote: > I don't actually believe this to indicate a problem though - I think perhaps there's a problem with this test, or withhow I am running it. The only diff was that when it (correctly) complained of a nonexistent database, it referred tomy username that I was logged in as, instead of the test database name "regress_ecpg_user2". I don't think this has anythingto do with the changes to pg_ctl. Hrm, I'm not able to reproduce that problem. Can you run make installworld-check on a checkout of master and see if you get the same thing? -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)
On 1/7/17 11:14 PM, Ryan Murphy wrote: > So I realized that I've never actually done `make world` before, and > when I try that I get a funny error: > > make -C doc all > make -C src all > make -C sgml all ... > *** > ERROR: `osx' is missing on your system. ... > Do you have any idea what that means? I googled it but couldn't find > anything. I can dig around more or ask the mailing list if you have no > idea. It's good to reply on the list (which I've cc'd) as there's lots of folks that can help you that way. To answer your question, the error has something to do with building docs. You don't really need to do that, and it can be rather painful to get setup to build them. I wouldn't bother for now. > But anyway, last time I think I was running `make installworld-check` > without first running `make world`, which I think is wrong right? - need > `make world` first? No, you don't. Some of our make targets can be a bit confusing in this regard... installworld-check will install all code (and docs, if they've been built) and then run full tests against them. There's no tests for the docs, so it doesn't matter if they get installed. The only "test" for docs is whether they build, but IMHO it's not worth it to ask a new reviewer to try and test that unless the patch has a *lot* of changes to the docs. In any case, docs won't explain why you were seeing a test failure and I wasn't. Hmm... I just thought of something though... do you have PGUSER set? That might break installworld-check. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)
Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctlstart without --wait
От
Beena Emerson
Дата:
On Fri, Jan 6, 2017 at 11:54 AM, Ryan Murphy <ryanfmurphy@gmail.com> wrote:
The following review has been posted through the commitfest application:
make installcheck-world: tested, failed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: tested, passed
(Though I could not check "make installcheck-world" as passed because it failed 1 test, I think it basically SHOULD pass - see my comment below.)
Patch looks good to me and does what we talked about, and Docs seem clear and correct.
I was able to build Postgres and run pg_ctl and observe that it waited by default for the 'start' action, which addresses my original concern.
`make` and `make install` went fine, and `make check` did as well, but `make installcheck-world` said (after a while):
=======================
1 of 55 tests failed.
=======================
I am sure you would get this error even without the patch.
Thank you,
Beena Emerson
Have a Great Day!
Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctlstart without --wait
От
Beena Emerson
Дата:
Hello,
On Wed, Jan 11, 2017 t 6:06 PM, Beena Emerson <memissemerson@gmail.com> wrote:
On Fri, Jan 6, 2017 at 11:54 AM, Ryan Murphy <ryanfmurphy@gmail.com> wrote:The following review has been posted through the commitfest application:
make installcheck-world: tested, failed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: tested, passed
(Though I could not check "make installcheck-world" as passed because it failed 1 test, I think it basically SHOULD pass - see my comment below.)
Patch looks good to me and does what we talked about, and Docs seem clear and correct.
I was able to build Postgres and run pg_ctl and observe that it waited by default for the 'start' action, which addresses my original concern.
`make` and `make install` went fine, and `make check` did as well, but `make installcheck-world` said (after a while):
=======================
1 of 55 tests failed.
=======================I am sure you would get this error even without the patch.
The patch is good. I do not have any comments to make about the patch.
Ryan try to run 'make install-world' then 'make -i installcheck-world', -i option will ignore the error and proceed. You can check if any other tests fails. This is a separate issue, unrelated to this patch. I do not think we should stop from changing the status because of this.
The status is now updated to 'Ready for committer'
Thank you,
Beena Emerson
Have a Great Day!
Thanks for the review Beena, I'm glad the patch is ready to go!
I think because of my environment/setup, I get errors when I try "make install-world", but I'm at work now, when I have time I will go back and try again and figure out what is wrong. I'll let you guys know if I have any questions.
Take care,
Ryan
Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctlstart without --wait
От
Peter Eisentraut
Дата:
On 1/11/17 11:20 AM, Ryan Murphy wrote: > Thanks for the review Beena, I'm glad the patch is ready to go! committed, thanks -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Jan 14, 2017 at 11:19 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 1/11/17 11:20 AM, Ryan Murphy wrote: >> Thanks for the review Beena, I'm glad the patch is ready to go! > > committed, thanks Sorry for speaking up late. This change may confuse the users who run "pg_ctl start" to perform a crash recovery, archive recovery and standby server (with hot_standby=off) because "pg_ctl start" would not return so long time. Also during that long time, the error message "FATAL: the database system is starting up" keeps outputing. This makes me think that -W is better as the default of at least "pg_ctl start". Regards, -- Fujii Masao
Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctlstart without --wait
От
Peter Eisentraut
Дата:
On 1/15/17 11:40 PM, Fujii Masao wrote: > This change may confuse the users who run "pg_ctl start" to perform a crash > recovery, archive recovery and standby server (with hot_standby=off) because > "pg_ctl start" would not return so long time. Well, this change was made because the previous behavior confused people as well, because pg_ctl would return before it was actually done. The new state shouldn't be confusing because pg_ctl prints out progress messages. > Also during that long time, > the error message "FATAL: the database system is starting up" keeps outputing. We could potentially avoid some of this by using the new facilities in pg_ctl to read pg_control and not use PQping() before the state is out of DB_IN_CRASH_RECOVERY. Note, however, that this isn't a new problem. The way pg_ctl start -w is implemented hasn't changed. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Jan 17, 2017 at 11:27 AM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 1/15/17 11:40 PM, Fujii Masao wrote: >> This change may confuse the users who run "pg_ctl start" to perform a crash >> recovery, archive recovery and standby server (with hot_standby=off) because >> "pg_ctl start" would not return so long time. > > Well, this change was made because the previous behavior confused people > as well, because pg_ctl would return before it was actually done. > > The new state shouldn't be confusing because pg_ctl prints out progress > messages. But what if we're restarting after, say, rebooting? Then there's nobody to see the progress messages, perhaps. The system just seems to take an eternity to return to the usual runlevel. I saw the discussion on this thread, but I didn't realize that it meant that pg_ctl was going to wait for crash recovery, let alone archive recovery. That seems not good. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctlstart without --wait
От
Stephen Frost
Дата:
* Robert Haas (robertmhaas@gmail.com) wrote: > On Tue, Jan 17, 2017 at 11:27 AM, Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: > > On 1/15/17 11:40 PM, Fujii Masao wrote: > >> This change may confuse the users who run "pg_ctl start" to perform a crash > >> recovery, archive recovery and standby server (with hot_standby=off) because > >> "pg_ctl start" would not return so long time. > > > > Well, this change was made because the previous behavior confused people > > as well, because pg_ctl would return before it was actually done. > > > > The new state shouldn't be confusing because pg_ctl prints out progress > > messages. > > But what if we're restarting after, say, rebooting? Then there's > nobody to see the progress messages, perhaps. The system just seems > to take an eternity to return to the usual runlevel. Not unlike an fsck. > I saw the discussion on this thread, but I didn't realize that it > meant that pg_ctl was going to wait for crash recovery, let alone > archive recovery. That seems not good. I disagree. The database isn't done starting up until it's gone through recovery. If there are other bits of the system which are depending on the database being online, shouldn't they wait until it's actually online to be started? Admittedly, such processes should probably be prepared to try reconnecting to the database on a failure, but I don't see this as really all that different from how a journaling filesystem operates. Now, perhaps the init process needs to be adjusted so that the progress reports from pg_ctl are seen by the user, but that's outside of PG. Thanks! Stephen
On Tue, Jan 17, 2017 at 4:46 PM, Stephen Frost <sfrost@snowman.net> wrote: >> But what if we're restarting after, say, rebooting? Then there's >> nobody to see the progress messages, perhaps. The system just seems >> to take an eternity to return to the usual runlevel. > > Not unlike an fsck. Right. That's why people developed journaled filesystems like ext3 and ext4 - because waiting for increasingly-large disks to be checked for errors sucked. And that made fsck times vastly lower and everyone said "huzzah". Because waiting for things to happen stinks, and people want to do as little of it as is reasonably possible. >> I saw the discussion on this thread, but I didn't realize that it >> meant that pg_ctl was going to wait for crash recovery, let alone >> archive recovery. That seems not good. > > I disagree. The database isn't done starting up until it's gone through > recovery. If there are other bits of the system which are depending on > the database being online, shouldn't they wait until it's actually > online to be started? They aren't necessarily depending on the database; they could be entirely unrelated. > Admittedly, such processes should probably be prepared to try > reconnecting to the database on a failure, but I don't see this as > really all that different from how a journaling filesystem operates. A journaling filesystem doesn't have a mode where it enters archive recovery mode and stays there permanently leaving the system in an unusable state. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctlstart without --wait
От
Stephen Frost
Дата:
* Robert Haas (robertmhaas@gmail.com) wrote: > On Tue, Jan 17, 2017 at 4:46 PM, Stephen Frost <sfrost@snowman.net> wrote: > >> But what if we're restarting after, say, rebooting? Then there's > >> nobody to see the progress messages, perhaps. The system just seems > >> to take an eternity to return to the usual runlevel. > > > > Not unlike an fsck. > > Right. That's why people developed journaled filesystems like ext3 > and ext4 - because waiting for increasingly-large disks to be checked > for errors sucked. And that made fsck times vastly lower and everyone > said "huzzah". Because waiting for things to happen stinks, and > people want to do as little of it as is reasonably possible. Sure, but they still have a recovery process that they go through when recovering from a crash, just as we do, and those things which are waiting for the filesystem have to wait until it is. If a PG user has an issue with waiting for recovery to finish then they should make checkpoints happen more often (typically by reducing checkpoint_timeout...), so that we don't have as much to replay through since the last one. Just as a user could reduce the journal size of ext4 if they're worried that it'll take too long for the system to replay the last set of journaled entires during recovery after a crash. > >> I saw the discussion on this thread, but I didn't realize that it > >> meant that pg_ctl was going to wait for crash recovery, let alone > >> archive recovery. That seems not good. > > > > I disagree. The database isn't done starting up until it's gone through > > recovery. If there are other bits of the system which are depending on > > the database being online, shouldn't they wait until it's actually > > online to be started? > > They aren't necessarily depending on the database; they could be > entirely unrelated. Not in modern boot systems today... If they aren't depending on the database then they can get started as soon as everything they *do* depend on is up and running. Those daemons or what-have-you which depend on the database say so through the init dependency system. > > Admittedly, such processes should probably be prepared to try > > reconnecting to the database on a failure, but I don't see this as > > really all that different from how a journaling filesystem operates. > > A journaling filesystem doesn't have a mode where it enters archive > recovery mode and stays there permanently leaving the system in an > unusable state. Now there I agree with you, whatever we're doing with pg_ctl here shouldn't mean that it never returns, but is that actually what happens with pg_ctl --wait? If so, then that's what is wrong, not this particular patch which is just making --wait the default. If I'm understanding your concern correctly, you're worried about the case of a cold standby where the database is only replaying WAL but not configured to come up as a hot standby and therefore PQping() won't ever succeed? Except, that isn't what would ever happen because the timeout for the --wait option is 60s, according to the pg_ctl docs anyway, after which it'll throw an error and say the server didn't start up, even if it would have after a few minutes. One could wonder why we have the default set to a value lower than checkpoint_timeout, making it entirely likely that the database recovery would take longer than the timeout on a busy/high-volume server that's actually checkpointing on-time, but just barely. Perhaps we need a way for pg_ctl to realize a cold-standby case and throw an error or warning if --wait is specified then, but that hardly seems like the common use-case. It also wouldn't make any sense to have anything in the init system which depended on PG being up in such a case because, well, PG isn't ever going to be 'up'. Thanks! Stephen
Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctlstart without --wait
От
Michael Paquier
Дата:
On Wed, Jan 18, 2017 at 7:31 AM, Stephen Frost <sfrost@snowman.net> wrote: > Perhaps we need a way for pg_ctl to realize a cold-standby case and > throw an error or warning if --wait is specified then, but that hardly > seems like the common use-case. It also wouldn't make any sense to have > anything in the init system which depended on PG being up in such a case > because, well, PG isn't ever going to be 'up'. Yeah, it seems to me that we are likely looking for a wait mode saying to exit pg_ctl once Postgres is happily rejecting connections, because that means that it is up and that it is sorting out something first before accepting them. This would basically filter the states in the control file that we find as acceptable if the connection test continues complaining about PQPING_REJECT. -- Michael
Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctlstart without --wait
От
Michael Paquier
Дата:
On Wed, Jan 18, 2017 at 10:35 AM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Wed, Jan 18, 2017 at 7:31 AM, Stephen Frost <sfrost@snowman.net> wrote: >> Perhaps we need a way for pg_ctl to realize a cold-standby case and >> throw an error or warning if --wait is specified then, but that hardly >> seems like the common use-case. It also wouldn't make any sense to have >> anything in the init system which depended on PG being up in such a case >> because, well, PG isn't ever going to be 'up'. > > Yeah, it seems to me that we are likely looking for a wait mode saying > to exit pg_ctl once Postgres is happily rejecting connections, because > that means that it is up and that it is sorting out something first > before accepting them. This would basically filter the states in the > control file that we find as acceptable if the connection test > continues complaining about PQPING_REJECT. Another option would be as well to log the state of the control file to the user to let him know what currently happens, and document that increasing the wait timeout is recommended if the recovery time since the last redo point takes longer. -- Michael
Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctlstart without --wait
От
Peter Eisentraut
Дата:
On 1/17/17 8:35 PM, Michael Paquier wrote: > Yeah, it seems to me that we are likely looking for a wait mode saying > to exit pg_ctl once Postgres is happily rejecting connections, because > that means that it is up and that it is sorting out something first > before accepting them. This would basically filter the states in the > control file that we find as acceptable if the connection test > continues complaining about PQPING_REJECT. Note that you can still use pg_ctl start --no-wait if that is what you want. But that should be the exception. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctlstart without --wait
От
Stephen Frost
Дата:
Michael, * Michael Paquier (michael.paquier@gmail.com) wrote: > On Wed, Jan 18, 2017 at 7:31 AM, Stephen Frost <sfrost@snowman.net> wrote: > > Perhaps we need a way for pg_ctl to realize a cold-standby case and > > throw an error or warning if --wait is specified then, but that hardly > > seems like the common use-case. It also wouldn't make any sense to have > > anything in the init system which depended on PG being up in such a case > > because, well, PG isn't ever going to be 'up'. > > Yeah, it seems to me that we are likely looking for a wait mode saying > to exit pg_ctl once Postgres is happily rejecting connections, because > that means that it is up and that it is sorting out something first > before accepting them. This would basically filter the states in the > control file that we find as acceptable if the connection test > continues complaining about PQPING_REJECT. If you're suggesting this *only* in the case where PG is starting up as a cold standby, then, ok, maybe. I don't think '-w' should mean anything less than "up and accepting connections" for regular or hot standby systems. I'm not really convinced that the code is worth the trouble to handle this case, but I'm not going to argue if someone wants to write it. Thanks! Stephen
Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctlstart without --wait
От
Stephen Frost
Дата:
Michael, * Michael Paquier (michael.paquier@gmail.com) wrote: > On Wed, Jan 18, 2017 at 10:35 AM, Michael Paquier > <michael.paquier@gmail.com> wrote: > > On Wed, Jan 18, 2017 at 7:31 AM, Stephen Frost <sfrost@snowman.net> wrote: > >> Perhaps we need a way for pg_ctl to realize a cold-standby case and > >> throw an error or warning if --wait is specified then, but that hardly > >> seems like the common use-case. It also wouldn't make any sense to have > >> anything in the init system which depended on PG being up in such a case > >> because, well, PG isn't ever going to be 'up'. > > > > Yeah, it seems to me that we are likely looking for a wait mode saying > > to exit pg_ctl once Postgres is happily rejecting connections, because > > that means that it is up and that it is sorting out something first > > before accepting them. This would basically filter the states in the > > control file that we find as acceptable if the connection test > > continues complaining about PQPING_REJECT. > > Another option would be as well to log the state of the control file > to the user to let him know what currently happens, and document that > increasing the wait timeout is recommended if the recovery time since > the last redo point takes longer. I was actually thinking about it the other way- start out by changing them to both be 5m and then document next to checkpoint_timeout (and max_wal_size, perhaps...) that if you go changing those parameters (eg: bumping up checkpoint_timeout to 30 minutes and max_wal_size up enough that you're still checkpointing based on time and not due to running out of WAL space) then you might need to consider raising the timeout for pg_ctl to wait around for the server to finish going through crash recovery due to all of the outstanding changes since the last checkpoint. In particular, I like to think that will help people understand the downsides of raising those values; I've run into a number of cases where people seem to feel it's a win-win without any downside, but that isn't really the case. Thanks! Stephen
Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctlstart without --wait
От
Peter Eisentraut
Дата:
On 1/18/17 8:25 AM, Stephen Frost wrote: > I was actually thinking about it the other way- start out by changing > them to both be 5m and then document next to checkpoint_timeout (and > max_wal_size, perhaps...) that if you go changing those parameters (eg: > bumping up checkpoint_timeout to 30 minutes and max_wal_size up enough > that you're still checkpointing based on time and not due to running out > of WAL space) then you might need to consider raising the timeout for > pg_ctl to wait around for the server to finish going through crash > recovery due to all of the outstanding changes since the last > checkpoint. It is important for users to be aware of this, but I don't think the relationship between checkpoint_timeout and recovery time is linear, so it's unclear what the exact advice should be. Personally, I think the timeout in pg_ctl is wrong and needs to be disabled in practical applications, but that is a different discussion. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctlstart without --wait
От
Stephen Frost
Дата:
* Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote: > On 1/18/17 8:25 AM, Stephen Frost wrote: > > I was actually thinking about it the other way- start out by changing > > them to both be 5m and then document next to checkpoint_timeout (and > > max_wal_size, perhaps...) that if you go changing those parameters (eg: > > bumping up checkpoint_timeout to 30 minutes and max_wal_size up enough > > that you're still checkpointing based on time and not due to running out > > of WAL space) then you might need to consider raising the timeout for > > pg_ctl to wait around for the server to finish going through crash > > recovery due to all of the outstanding changes since the last > > checkpoint. > > It is important for users to be aware of this, but I don't think the > relationship between checkpoint_timeout and recovery time is linear, so > it's unclear what the exact advice should be. I don't understand what I'm missing when it comes to checkpoint_timeout and the time required to recover from a crash. You aren't the first person to question that association, but it seems pretty clear to me. When doing recovery, we have to replay everything since the last checkpoint. If we are checkpointing at least every 5 minutes then we can't have any more than 5 minutes worth of WAL to replay, right? We could certainly have *less* WAL to replay after a crash, but we shouldn't ever have more, which makes checkpoint_timeout an upper bound on replay time. Now, if max_wal_size is set such that you don't have enough room in the WAL to store all of the changes and a checkpoint is forced early, then your recovery time will be based on how fast your system can replay max_wal_size amount of data, but even in that case, it can't be more than checkpoint_timeout, so that still serves as an upper bound. I think all I'm pointing out here is that we should make it clear that checkpoint_timeout serves as an upper bound on recovery time. Increasing it *could* lead to recovery taking longer and users should be aware of that. If they want to have a good handle on how long recovery is *likely* to take on their system then they'd need to measure their WAL rate and test their hardware to see how fast WAL is able to be replayed. > Personally, I think the timeout in pg_ctl is wrong and needs to be > disabled in practical applications, but that is a different discussion. I have to admit that the pg_ctl timeout does seem a bit odd. Thanks! Stephen
On Tue, Jan 17, 2017 at 5:31 PM, Stephen Frost <sfrost@snowman.net> wrote: > If I'm understanding your concern correctly, you're worried about the > case of a cold standby where the database is only replaying WAL but not > configured to come up as a hot standby and therefore PQping() won't ever > succeed? I think we've changed the defaults to make things better for an attended startup and worse for an unattended startup. But I think most PostgreSQL startups are probably unattended. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctlstart without --wait
От
Stephen Frost
Дата:
Robert, * Robert Haas (robertmhaas@gmail.com) wrote: > On Tue, Jan 17, 2017 at 5:31 PM, Stephen Frost <sfrost@snowman.net> wrote: > > If I'm understanding your concern correctly, you're worried about the > > case of a cold standby where the database is only replaying WAL but not > > configured to come up as a hot standby and therefore PQping() won't ever > > succeed? > > I think we've changed the defaults to make things better for an > attended startup and worse for an unattended startup. But I think > most PostgreSQL startups are probably unattended. I don't understand how it's worse for an unattended startup to tell the init system that the database is now up and running when, in fact, it isn't. If that isn't what you meant, then it would be really helpful if you could explain a bit more what you see as being "worse" with this change for unattended startup. Thanks! Stephen
On Wed, Jan 18, 2017 at 3:43 PM, Stephen Frost <sfrost@snowman.net> wrote: > * Robert Haas (robertmhaas@gmail.com) wrote: >> On Tue, Jan 17, 2017 at 5:31 PM, Stephen Frost <sfrost@snowman.net> wrote: >> > If I'm understanding your concern correctly, you're worried about the >> > case of a cold standby where the database is only replaying WAL but not >> > configured to come up as a hot standby and therefore PQping() won't ever >> > succeed? >> >> I think we've changed the defaults to make things better for an >> attended startup and worse for an unattended startup. But I think >> most PostgreSQL startups are probably unattended. > > I don't understand how it's worse for an unattended startup to tell the > init system that the database is now up and running when, in fact, it > isn't. > > If that isn't what you meant, then it would be really helpful if you > could explain a bit more what you see as being "worse" with this change > for unattended startup. This seems clear as day to me, so I'm not sure what to explain. Anybody who has got a script that runs pg_ctl unattended mode likely now has to go update that script to add --no-wait. If they don't, their script may hang for whatever the timeout is (currently 60 seconds, and it sounds like Peter wants to change that to infiniity). If they have been wanting their script to hang all along, as you seem to be saying, then they'll be happy that it now does. If, on the other hand, they don't want that, then they'll be sad. In short, I bet we will get multiple reports of people getting hosed by this change. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctlstart without --wait
От
Stephen Frost
Дата:
Robert, * Robert Haas (robertmhaas@gmail.com) wrote: > On Wed, Jan 18, 2017 at 3:43 PM, Stephen Frost <sfrost@snowman.net> wrote: > > * Robert Haas (robertmhaas@gmail.com) wrote: > >> I think we've changed the defaults to make things better for an > >> attended startup and worse for an unattended startup. But I think > >> most PostgreSQL startups are probably unattended. > > > > I don't understand how it's worse for an unattended startup to tell the > > init system that the database is now up and running when, in fact, it > > isn't. > > > > If that isn't what you meant, then it would be really helpful if you > > could explain a bit more what you see as being "worse" with this change > > for unattended startup. > > This seems clear as day to me, so I'm not sure what to explain. > Anybody who has got a script that runs pg_ctl unattended mode likely > now has to go update that script to add --no-wait. If they don't, > their script may hang for whatever the timeout is (currently 60 > seconds, and it sounds like Peter wants to change that to infiniity). > If they have been wanting their script to hang all along, as you seem > to be saying, then they'll be happy that it now does. If, on the > other hand, they don't want that, then they'll be sad. Ok, you're talking about for the cold standby case here, correct? If so, then I agree that we should work a bit harder to make pg_ctl realize that configuration and have it not wait for the entire timeout in that specific situation. That said, I, honestly, don't remember the last time I ran into a cold standby configuration. For non-cold standby configurations, pg_ctl is going to return just as soon as the database has finished crash recovery, which in most cases will probably be on the order of a few seconds. Note that pg_ctl retries connecting every second, it doesn't just hang and do nothing until the timeout. For any case where the user is actually calling pg_ctl to start the database so that they can make connections to the database and to use it for something, which seems much more likely, this change will mean that they can remove the random 'sleep' they stuck in their script that they added when they discovered that pg_ctl wasn't waiting for the database to actually be online before returning. Or the '--wait' they already put in there will now be unnecessary, but that's not going to hurt anything. Or the loop they put in to keep trying to connect until the database is actually up. > In short, I bet we will get multiple reports of people getting hosed > by this change. I would be very surprised if we got any reports of people being hosed by this change, but I do think we should mention it in the release notes. Thanks! Stephen
On Wed, Jan 18, 2017 at 3:59 PM, Stephen Frost <sfrost@snowman.net> wrote: > For non-cold standby configurations, pg_ctl is going to return just as > soon as the database has finished crash recovery, which in most cases > will probably be on the order of a few seconds. /me is poleaxed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctlstart without --wait
От
Stephen Frost
Дата:
On Wed, Jan 18, 2017 at 16:15 Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Jan 18, 2017 at 3:59 PM, Stephen Frost <sfrost@snowman.net> wrote:
> For non-cold standby configurations, pg_ctl is going to return just as
> soon as the database has finished crash recovery, which in most cases
> will probably be on the order of a few seconds.
/me is poleaxed.
--------
Yeah, that was a confused sentence- most of the time it's going to return on the order of a few seconds because we're doing regular startup without having to do any crash recovery.
For actual crash recovery cases, it'll take between a few seconds and checkpoint_timeout, as I described up-thread, based on the amount of outstanding WAL.
Thanks!
Stephen
Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctlstart without --wait
От
Michael Paquier
Дата:
On Thu, Jan 19, 2017 at 6:20 AM, Stephen Frost <sfrost@snowman.net> wrote: > On Wed, Jan 18, 2017 at 16:15 Robert Haas <robertmhaas@gmail.com> wrote: >> On Wed, Jan 18, 2017 at 3:59 PM, Stephen Frost <sfrost@snowman.net> wrote: >> > For non-cold standby configurations, pg_ctl is going to return just as >> > soon as the database has finished crash recovery, which in most cases >> > will probably be on the order of a few seconds. >> >> /me is poleaxed. >> >> Yeah, that was a confused sentence- most of the time it's going to return >> on the order of a few seconds because we're doing regular startup without >> having to do any crash recovery. >> >> >> For actual crash recovery cases, it'll take between a few seconds and >> checkpoint_timeout, as I described up-thread, based on the amount of >> outstanding WAL. Recovering up to the minumum recovery point could take minutes! -- Michael
Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctlstart without --wait
От
Michael Paquier
Дата:
On Thu, Jan 19, 2017 at 5:01 AM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 1/18/17 8:25 AM, Stephen Frost wrote: >> I was actually thinking about it the other way- start out by changing >> them to both be 5m and then document next to checkpoint_timeout (and >> max_wal_size, perhaps...) that if you go changing those parameters (eg: >> bumping up checkpoint_timeout to 30 minutes and max_wal_size up enough >> that you're still checkpointing based on time and not due to running out >> of WAL space) then you might need to consider raising the timeout for >> pg_ctl to wait around for the server to finish going through crash >> recovery due to all of the outstanding changes since the last >> checkpoint. > > It is important for users to be aware of this, but I don't think the > relationship between checkpoint_timeout and recovery time is linear, so > it's unclear what the exact advice should be. This is a right assumption for steady workloads with few DDLs, but for example once a couple of CREATE DATABASE records are in such a law is broken. -- Michael
Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctlstart without --wait
От
Stephen Frost
Дата:
* Michael Paquier (michael.paquier@gmail.com) wrote: > On Thu, Jan 19, 2017 at 6:20 AM, Stephen Frost <sfrost@snowman.net> wrote: > > On Wed, Jan 18, 2017 at 16:15 Robert Haas <robertmhaas@gmail.com> wrote: > >> On Wed, Jan 18, 2017 at 3:59 PM, Stephen Frost <sfrost@snowman.net> wrote: > >> > For non-cold standby configurations, pg_ctl is going to return just as > >> > soon as the database has finished crash recovery, which in most cases > >> > will probably be on the order of a few seconds. > >> > >> /me is poleaxed. > >> > >> Yeah, that was a confused sentence- most of the time it's going to return > >> on the order of a few seconds because we're doing regular startup without > >> having to do any crash recovery. > >> > >> > >> For actual crash recovery cases, it'll take between a few seconds and > >> checkpoint_timeout, as I described up-thread, based on the amount of > >> outstanding WAL. > > Recovering up to the minumum recovery point could take minutes! Right, in an actual crash recovery case, it'll take longer, up to checkpoint_timeout. I am certainly hopeful that, most of the time, we're actually starting up from a cleanly shut down system. Thanks! Stephen
Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctlstart without --wait
От
Stephen Frost
Дата:
* Michael Paquier (michael.paquier@gmail.com) wrote: > On Thu, Jan 19, 2017 at 5:01 AM, Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: > > On 1/18/17 8:25 AM, Stephen Frost wrote: > >> I was actually thinking about it the other way- start out by changing > >> them to both be 5m and then document next to checkpoint_timeout (and > >> max_wal_size, perhaps...) that if you go changing those parameters (eg: > >> bumping up checkpoint_timeout to 30 minutes and max_wal_size up enough > >> that you're still checkpointing based on time and not due to running out > >> of WAL space) then you might need to consider raising the timeout for > >> pg_ctl to wait around for the server to finish going through crash > >> recovery due to all of the outstanding changes since the last > >> checkpoint. > > > > It is important for users to be aware of this, but I don't think the > > relationship between checkpoint_timeout and recovery time is linear, so > > it's unclear what the exact advice should be. > > This is a right assumption for steady workloads with few DDLs, but for > example once a couple of CREATE DATABASE records are in such a law is > broken. I don't expect CREATE DATABASE to be terribly frequent, and it doesn't actually change the rule that the checkpoint_timeout is a maximum. Thanks! Stephen
Ryan try to run 'make install-world' then 'make -i installcheck-world', -i option will ignore the error and proceed. You can check if any other tests fails. This is a separate issue, unrelated to this patch. I do not think we should stop from changing the status because of this.
Beena, when I try to run 'make install-world' I get a lot of errors from the 'osx' executable which I think has to do with building the SGML documentation:
make -C doc install
make -C src install
make -C sgml install
{ \
echo "<!ENTITY version \"10devel\">"; \
echo "<!ENTITY majorversion \"10\">"; \
} > version.sgml
'/opt/local/bin/perl' ./mk_feature_tables.pl YES ../../../src/backend/catalog/ sql_feature_packages.txt ../../../src/backend/catalog/ sql_features.txt > features-supported.sgml
'/opt/local/bin/perl' ./mk_feature_tables.pl NO ../../../src/backend/catalog/ sql_feature_packages.txt ../../../src/backend/catalog/ sql_features.txt > features-unsupported.sgml
'/opt/local/bin/perl' ./generate-errcodes-table.pl ../../../src/backend/utils/ errcodes.txt > errcodes-table.sgml
/usr/local/bin/osx -D. -x lower -i include-xslt-index postgres.sgml >postgres.xmltmp
/usr/local/bin/osx:postgres. sgml:3:55:W: cannot generate system identifier for public text "-//OASIS//DTD DocBook V4.2//EN"
/usr/local/bin/osx:postgres. sgml:12:0:E: reference to entity "BOOK" for which no system identifier could be generated
/usr/local/bin/osx:postgres. sgml:3:0: entity was defined here
/usr/local/bin/osx:postgres. sgml:12:0:E: DTD did not contain element declaration for document type name
/usr/local/bin/osx:postgres. sgml:14:9:E: there is no attribute "ID"
/usr/local/bin/osx:postgres. sgml:14:19:E: element "BOOK" undefined
/usr/local/bin/osx:postgres. sgml:15:7:E: element "TITLE" undefined
/usr/local/bin/osx:postgres. sgml:17:10:E: element "BOOKINFO" undefined
/usr/local/bin/osx:postgres. sgml:18:13:E: element "CORPAUTHOR" undefined
/usr/local/bin/osx:postgres. sgml:19:14:E: element "PRODUCTNAME" undefined
/usr/local/bin/osx:postgres. sgml:20:16:E: element "PRODUCTNUMBER" undefined
/usr/local/bin/osx:legal.sgml: 3:5:E: element "DATE" undefined
/usr/local/bin/osx:legal.sgml: 5:10:E: element "COPYRIGHT" undefined
/usr/local/bin/osx:legal.sgml: 6:6:E: element "YEAR" undefined
...
/usr/local/bin/osx:history.sgml:173:13:E: element "LISTITEM" undefined
/usr/local/bin/osx:history.sgml:174:10:E: element "PARA" undefined
/usr/local/bin/osx:history.sgml:175:14:E: element "ACRONYM" undefined
/usr/local/bin/osx:I: maximum number of errors (200) reached; change with -E option
make[3]: *** [postgres.xml] Error 1
make[2]: *** [install] Error 2
make[1]: *** [install] Error 2
make: *** [install-world-doc-recurse] Error 2
make -C src install
make -C sgml install
{ \
echo "<!ENTITY version \"10devel\">"; \
echo "<!ENTITY majorversion \"10\">"; \
} > version.sgml
'/opt/local/bin/perl' ./mk_feature_tables.pl YES ../../../src/backend/catalog/
'/opt/local/bin/perl' ./mk_feature_tables.pl NO ../../../src/backend/catalog/
'/opt/local/bin/perl' ./generate-errcodes-table.pl ../../../src/backend/utils/
/usr/local/bin/osx -D. -x lower -i include-xslt-index postgres.sgml >postgres.xmltmp
/usr/local/bin/osx:postgres.
/usr/local/bin/osx:postgres.
/usr/local/bin/osx:postgres.
/usr/local/bin/osx:postgres.
/usr/local/bin/osx:postgres.
/usr/local/bin/osx:postgres.
/usr/local/bin/osx:postgres.
/usr/local/bin/osx:postgres.
/usr/local/bin/osx:postgres.
/usr/local/bin/osx:postgres.
/usr/local/bin/osx:postgres.
/usr/local/bin/osx:legal.sgml:
/usr/local/bin/osx:legal.sgml:
/usr/local/bin/osx:legal.sgml:
...
/usr/local/bin/osx:history.sgml:173:13:E: element "LISTITEM" undefined
/usr/local/bin/osx:history.sgml:174:10:E: element "PARA" undefined
/usr/local/bin/osx:history.sgml:175:14:E: element "ACRONYM" undefined
/usr/local/bin/osx:I: maximum number of errors (200) reached; change with -E option
make[3]: *** [postgres.xml] Error 1
make[2]: *** [install] Error 2
make[1]: *** [install] Error 2
make: *** [install-world-doc-recurse] Error 2
Jim Nasby said I shouldn't necessarily need to build the docs / the whole world in order to review patches. But the Review form needs a `make installworld-check`. Do I need to install the whole world in order to meet this requirement? Happy to do so if required, but in that case, I wonder why 'osx' is having so much trouble parsing the SGML into XML?
Thanks for the help,
Ryan
Jim Nasby said I shouldn't necessarily need to build the docs / the whole world in order to review patches. But the Review form needs a `make installworld-check`. Do I need to install the whole world in order to meet this requirement? Happy to do so if required, but in that case, I wonder why 'osx' is having so much trouble parsing the SGML into XML?
Aha, I was able to run a "make -i install-world", which ignored the SGML errors and built the rest of the world. Now trying "make -i installcheck-world".
So I ran "make -i installcheck-world" and it ran to completion. This is on a freshly "git pull"ed postgres source tree. Certain tests failed, but most succeeded. There was no "overall" indication of success or failure at the very end, so I'm not sure what this means.
The only "unusual" thing I'm doing that I'm aware of is I'm setting the installation directory to "/path/to/git/of/postgres/install_dir" rather than wherever it installs by default. I've attached my configure options and the result of the "make -i installcheck-world", in the hopes that someone can tell me whether this is an acceptable result or if it means there's something wrong with my install? I'd like to be able to review patches, and I'm not sure if this result I just got indicates a success or a failure of "make installcheck-world".
The only "unusual" thing I'm doing that I'm aware of is I'm setting the installation directory to "/path/to/git/of/postgres/install_dir" rather than wherever it installs by default. I've attached my configure options and the result of the "make -i installcheck-world", in the hopes that someone can tell me whether this is an acceptable result or if it means there's something wrong with my install? I'd like to be able to review patches, and I'm not sure if this result I just got indicates a success or a failure of "make installcheck-world".
Please let me know if there's any more info I should provide.
Thanks again,
Ryan
Вложения
Ryan Murphy <ryanfmurphy@gmail.com> writes: > So I ran "make -i installcheck-world" and it ran to completion. This is on > a freshly "git pull"ed postgres source tree. Certain tests failed, but > most succeeded. Those results look pretty broken :-( > There was no "overall" indication of success or failure at > the very end, so I'm not sure what this means. You told make to ignore errors (with "-i"), so it didn't notice that anything was wrong. The on-screen output isn't all that helpful for diagnosing what went wrong. You might learn more by looking at the regression.diffs files. Remember that errors tend to cascade, so the first one(s) in any particular test suite are the most important --- the rest might just be fallout. regards, tom lane
The on-screen output isn't all that helpful for diagnosing what went
wrong. You might learn more by looking at the regression.diffs files.
Remember that errors tend to cascade, so the first one(s) in any
particular test suite are the most important --- the rest might just
be fallout.
Thanks Tom, I'll check out the diffs.
Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctlstart without --wait
От
Peter Eisentraut
Дата:
On 1/18/17 3:12 PM, Stephen Frost wrote: > I don't understand what I'm missing when it comes to checkpoint_timeout > and the time required to recover from a crash. You aren't the first > person to question that association, but it seems pretty clear to me. > > When doing recovery, we have to replay everything since the last > checkpoint. If we are checkpointing at least every 5 minutes then we > can't have any more than 5 minutes worth of WAL to replay, right? But writing WAL and replaying WAL are two entirely different operations.Writing a WAL record involves writing a few bytessequentially. Replaying a WAL record might involve hopping all over the system and applying the changes that the WAL record describes. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctlstart without --wait
От
Stephen Frost
Дата:
Peter, * Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote: > On 1/18/17 3:12 PM, Stephen Frost wrote: > > I don't understand what I'm missing when it comes to checkpoint_timeout > > and the time required to recover from a crash. You aren't the first > > person to question that association, but it seems pretty clear to me. > > > > When doing recovery, we have to replay everything since the last > > checkpoint. If we are checkpointing at least every 5 minutes then we > > can't have any more than 5 minutes worth of WAL to replay, right? > > But writing WAL and replaying WAL are two entirely different operations. Sure, but we're talking about replaying WAL vs. doing a checkpoint, not about writing WAL vs. replaying WAL. Replaying WAL and doing a checkpoint both require writing to lots of different places across the filesystem, of course. There can be cases where individual backends are having to evict pages to clear space for new pages to be pulled in and that could have an effect of making a checkpoint happen with multiple processes, but that's a bit of a different situation. Of course, it would be nice if we could make our replay of WAL multi-process, and checkpointing too, for that matter. Thanks! Stephen
Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctlstart without --wait
От
Peter Eisentraut
Дата:
On 1/18/17 3:47 PM, Robert Haas wrote: > Anybody who has got a script that runs pg_ctl unattended mode likely > now has to go update that script to add --no-wait. The state of init scripts and other start scripts out there is such a mess, it's hard to make general statements like this. Many start scripts still start the postmaster directly and have confusing or outdated advice about whether or not to use pg_ctl. Some implement their own waiting logic after starting. Some just ignore the issue and do wrong or inconsistent things. With this change, together with the systemd support that is already out there with 9.6, and with the new promote wait support, we'll at least have a consistent approach going forward and have a better shot at sorting out the current mess. Someone who really wants the old behavior can add the command-line option in a backward-compatible way. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctlstart without --wait
От
Stephen Frost
Дата:
* Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote: > On 1/18/17 3:47 PM, Robert Haas wrote: > > Anybody who has got a script that runs pg_ctl unattended mode likely > > now has to go update that script to add --no-wait. > > The state of init scripts and other start scripts out there is such a > mess, it's hard to make general statements like this. Many start > scripts still start the postmaster directly and have confusing or > outdated advice about whether or not to use pg_ctl. Some implement > their own waiting logic after starting. Some just ignore the issue and > do wrong or inconsistent things. > > With this change, together with the systemd support that is already out > there with 9.6, and with the new promote wait support, we'll at least > have a consistent approach going forward and have a better shot at > sorting out the current mess. +1. Thanks! Stephen
Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctlstart without --wait
От
Peter Eisentraut
Дата:
On 1/19/17 9:53 AM, Stephen Frost wrote: > Sure, but we're talking about replaying WAL vs. doing a checkpoint, not > about writing WAL vs. replaying WAL. Replaying WAL and doing a > checkpoint both require writing to lots of different places across the > filesystem, of course. Yeah, but they are each doing different things, so you can't say that one will take the same amount of time or strictly less than the other. It might be a good first estimation, but my practical experience is that it's not when it really matters. :-/ -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctlstart without --wait
От
Stephen Frost
Дата:
* Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote: > On 1/19/17 9:53 AM, Stephen Frost wrote: > > Sure, but we're talking about replaying WAL vs. doing a checkpoint, not > > about writing WAL vs. replaying WAL. Replaying WAL and doing a > > checkpoint both require writing to lots of different places across the > > filesystem, of course. > > Yeah, but they are each doing different things, so you can't say that > one will take the same amount of time or strictly less than the other. > It might be a good first estimation, but my practical experience is that > it's not when it really matters. :-/ You've found that WAL replay following a crash takes longer than checkpointing and therefore crash recovery requires more time than checkpoint_timeout is set to? WAL replay does do more work, generally speaking (the WAL has to be read, the checksum validated on it, and then the write has to go out, while the checkpointer just writes the page out from memory), but it's also dealing with less contention on the system (there aren't a bunch of backends hammering the disks to pull data in with reads when you're doing crash recovery...). We did make the WAL checksum routines a lot faster with 9.6, as I recall, so perhaps there's been some change there too. Thanks! Stephen
Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctlstart without --wait
От
Michael Paquier
Дата:
On Fri, Jan 20, 2017 at 12:06 AM, Stephen Frost <sfrost@snowman.net> wrote: > We did make the WAL checksum routines a lot > faster with 9.6, as I recall, so perhaps there's been some change there > too. 9.5, commit 5028f22f with Abhijit's and Heikki's work on CRC-32 computations. -- Michael
Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctlstart without --wait
От
Stephen Frost
Дата:
* Michael Paquier (michael.paquier@gmail.com) wrote: > On Fri, Jan 20, 2017 at 12:06 AM, Stephen Frost <sfrost@snowman.net> wrote: > > We did make the WAL checksum routines a lot > > faster with 9.6, as I recall, so perhaps there's been some change there > > too. > > 9.5, commit 5028f22f with Abhijit's and Heikki's work on CRC-32 computations. Oh, was it 9.5 where we added the code to use CPU OPs to speed up the calculations..? If so then that surprises me a bit, as, if I recall correctly, I had a 9.5-based server where the replica simply couldn't keep up with replay and was entirely CPU-bound.. I had thought those changes went into 9.6. If they were in 9.5 then that's possibly a bit disappointing. Of course, this was a while ago, perhaps that was actually a 9.4 server.. mmm, will have to go find out. Thanks! Stephen
Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctlstart without --wait
От
Andres Freund
Дата:
On 2017-01-19 10:06:09 -0500, Stephen Frost wrote: > WAL replay does do more work, generally speaking (the WAL has to be > read, the checksum validated on it, and then the write has to go out, > while the checkpointer just writes the page out from memory), but it's > also dealing with less contention on the system (there aren't a bunch of > backends hammering the disks to pull data in with reads when you're > doing crash recovery...). There's a huge difference though: WAL replay is single threaded, whereas generating WAL is not. Especially if there's synchronous IO required (most commonly reading in data, because more data was modified in the current checkpointthan fit in shared buffers, so FPIs don't pre-fill buffers), you can be significantly slower than generating the WAL. Especially when you deal with SSDs, which can handle a lot of IO in parallel, you can easily run into such issues. Greetings, Andres Freund
Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctlstart without --wait
От
Stephen Frost
Дата:
* Andres Freund (andres@anarazel.de) wrote: > On 2017-01-19 10:06:09 -0500, Stephen Frost wrote: > > WAL replay does do more work, generally speaking (the WAL has to be > > read, the checksum validated on it, and then the write has to go out, > > while the checkpointer just writes the page out from memory), but it's > > also dealing with less contention on the system (there aren't a bunch of > > backends hammering the disks to pull data in with reads when you're > > doing crash recovery...). > > There's a huge difference though: WAL replay is single threaded, whereas > generating WAL is not. I'm aware- but *checkpointing* is still single-threaded, unless, as I mentioned, you end up with backends pushing out their own changes to the heap to make room for new pages to come in. Or is there some other way the checkpoint ends up being performed with multiple processes? > Especially if there's synchronous IO required > (most commonly reading in data, because more data was modified in the > current checkpointthan fit in shared buffers, so FPIs don't pre-fill > buffers), you can be significantly slower than generating the WAL. That is an interesting point, if I'm following what you're saying correctly- during the replay we can end up having more pages modified than fit in shared buffers, which means that we have to read back in pages that we pushed out to implement the non-FPI WAL changes to that page. I wonder if we should have a way to configure the amount of memory allowed to be used for WAL replay, independent of shared_buffers? I mean, really, during crash recovery on a dedicated database box, you'd probably want to say "ALL the memory can be used if it makes crash recovery faster!". That said, I wonder if our eviction algorithm could be improved/changed when performing WAL replay too to reduce the chances that we'll have to read a page back in. Very interesting. Thanks! Stephen
Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctlstart without --wait
От
Andres Freund
Дата:
On 2017-01-19 20:45:57 -0500, Stephen Frost wrote: > * Andres Freund (andres@anarazel.de) wrote: > > On 2017-01-19 10:06:09 -0500, Stephen Frost wrote: > > > WAL replay does do more work, generally speaking (the WAL has to be > > > read, the checksum validated on it, and then the write has to go out, > > > while the checkpointer just writes the page out from memory), but it's > > > also dealing with less contention on the system (there aren't a bunch of > > > backends hammering the disks to pull data in with reads when you're > > > doing crash recovery...). > > > > There's a huge difference though: WAL replay is single threaded, whereas > > generating WAL is not. > > I'm aware- but *checkpointing* is still single-threaded, unless, as I > mentioned, you end up with backends pushing out their own changes to the > heap to make room for new pages to come in. Sure, but buffer checkpointing isn't necessarily that large a portion of the work done in one checkpoint cycle, in comparison to all the WAL being generated. Quite commonly a lot of the buffers will already have been flushed to disk by backend and/or bgwriter, and are clean by the time checkpointer gets to them. So I don't think checkpointer being single threaded necessarily means much WRT replay performance. > > Especially if there's synchronous IO required > > (most commonly reading in data, because more data was modified in the > > current checkpointthan fit in shared buffers, so FPIs don't pre-fill > > buffers), you can be significantly slower than generating the WAL. > > That is an interesting point, if I'm following what you're saying > correctly- during the replay we can end up having more pages modified > than fit in shared buffers, which means that we have to read back in > pages that we pushed out to implement the non-FPI WAL changes to that > page. Right. (And not just during replay obviously, also during the intial WAL generation). > I wonder if we should have a way to configure the amount of memory > allowed to be used for WAL replay, independent of shared_buffers? I don't quite see how that'd work, especially with HS. We just use the normal shared buffers code etc, and there we can't just resize the amount of shared_buffers allocated after doing crash recovery. > That said, I wonder if our eviction algorithm could be > improved/changed when performing WAL replay too to reduce the chances > that we'll have to read a page back in. I don't think that's a that promising angle of attach. Having a separate pre-fetching backend that parses the WAL and pre-reads everything necessary seems more promising. Greetings, Andres Freund
Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctlstart without --wait
От
Stephen Frost
Дата:
* Andres Freund (andres@anarazel.de) wrote: > On 2017-01-19 20:45:57 -0500, Stephen Frost wrote: > > * Andres Freund (andres@anarazel.de) wrote: > > > On 2017-01-19 10:06:09 -0500, Stephen Frost wrote: > > > > WAL replay does do more work, generally speaking (the WAL has to be > > > > read, the checksum validated on it, and then the write has to go out, > > > > while the checkpointer just writes the page out from memory), but it's > > > > also dealing with less contention on the system (there aren't a bunch of > > > > backends hammering the disks to pull data in with reads when you're > > > > doing crash recovery...). > > > > > > There's a huge difference though: WAL replay is single threaded, whereas > > > generating WAL is not. > > > > I'm aware- but *checkpointing* is still single-threaded, unless, as I > > mentioned, you end up with backends pushing out their own changes to the > > heap to make room for new pages to come in. > > Sure, but buffer checkpointing isn't necessarily that large a portion of > the work done in one checkpoint cycle, in comparison to all the WAL > being generated. Quite commonly a lot of the buffers will already have > been flushed to disk by backend and/or bgwriter, and are clean by the > time checkpointer gets to them. So I don't think checkpointer being > single threaded necessarily means much WRT replay performance. Yes, good point, we also have the bgwriter going through and helping. > > > Especially if there's synchronous IO required > > > (most commonly reading in data, because more data was modified in the > > > current checkpointthan fit in shared buffers, so FPIs don't pre-fill > > > buffers), you can be significantly slower than generating the WAL. > > > > That is an interesting point, if I'm following what you're saying > > correctly- during the replay we can end up having more pages modified > > than fit in shared buffers, which means that we have to read back in > > pages that we pushed out to implement the non-FPI WAL changes to that > > page. > > Right. (And not just during replay obviously, also during the intial WAL > generation). Sure. > > I wonder if we should have a way to configure the amount of memory > > allowed to be used for WAL replay, independent of shared_buffers? > > I don't quite see how that'd work, especially with HS. We just use the > normal shared buffers code etc, and there we can't just resize the > amount of shared_buffers allocated after doing crash recovery. It wouldn't work with HS (or, at least, I have no idea how it would). I was specifically thinking about *just* during crash recovery there (sorry that I didn't make that clear), and my thought was that we'd just allocate the memory locally, not as shared memory, and then drop the whole thing and allocate shared_buffers after crash recovery was done. Obviously, this is a lot of hand-waving, but that's what I was thinking. > > That said, I wonder if our eviction algorithm could be > > improved/changed when performing WAL replay too to reduce the chances > > that we'll have to read a page back in. > > I don't think that's a that promising angle of attach. Having a separate > pre-fetching backend that parses the WAL and pre-reads everything > necessary seems more promising. I agree, that would be helpful and could help with HS too, which I agree is an important piece. Thanks! Stephen