Discussion: alpha3 release schedule?
Do people want more time to play with hot standby? Otherwise alpha3 should go out on Monday or Tuesday.
> Do people want more time to play with hot standby? Otherwise alpha3
> should go out on Monday or Tuesday.

Well, I want to know whether the problem I referred to
in http://archives.postgresql.org/pgsql-hackers/2009-12/msg01641.php
is must-fix or not.

This problem is a corollary of the deadlock problem. This is less catastrophic
but more likely to happen.

If you leave this problem, for example, any long-running transactions,
holding any cursors in whatever tables, have a possibility of freezing
whole recovery work in HotStandby node until the transaction commit.

regards,

--
Hiroyuki YAMADA
Kokolink Corporation
yamada@kokolink.net
On Sat, Dec 19, 2009 at 7:20 AM, Peter Eisentraut <peter_e@gmx.net> wrote:
> Do people want more time to play with hot standby? Otherwise alpha3
> should go out on Monday or Tuesday.

I think we should try to wrap it promptly. It's true that Hot Standby almost certainly has bugs and/or annoying limitations, as one would expect with a feature of this magnitude, but I think we'll get a better idea what they are and which ones are the most important by getting something out there for people to test. AIUI, the reason why Simon has been busting ass to get this committed is precisely so that it could go into alpha3 and get more testing, and speaking in my capacity as a guy who is anal about the schedule, I couldn't be happier about that! Postponing alpha3 would seem to defeat the purpose of all that hard work.

...Robert
Hiroyuki Yamada <yamada@kokolink.net> writes:
> Well, I want to know whether the problem I referred to
> in http://archives.postgresql.org/pgsql-hackers/2009-12/msg01641.php
> is must-fix or not.

> This problem is a corollary of the deadlock problem. This is less catastrophic
> but more likely to happen.

> If you leave this problem, for example, any long-running transactions,
> holding any cursors in whatever tables, have a possibility of freezing
> whole recovery work in HotStandby node until the transaction commit.

Seems like something we should fix ASAP, but I do not see why it need hold up an alpha release. Alpha releases are expected to have bugs, and this one doesn't look like it would stop people from finding other bugs.

regards, tom lane
Tom Lane wrote:
> Hiroyuki Yamada <yamada@kokolink.net> writes:
>> Well, I want to know whether the problem I referred to
>> in http://archives.postgresql.org/pgsql-hackers/2009-12/msg01641.php
>> is must-fix or not.
>
>> This problem is a corollary of the deadlock problem. This is less catastrophic
>> but more likely to happen.
>
>> If you leave this problem, for example, any long-running transactions,
>> holding any cursors in whatever tables, have a possibility of freezing
>> whole recovery work in HotStandby node until the transaction commit.
>
> Seems like something we should fix ASAP, but I do not see why it need
> hold up an alpha release. Alpha releases are expected to have bugs,
> and this one doesn't look like it would stop people from finding
> other bugs.

yeah afaik alpha tarballs are a form of checkpoint at the end of a commitfest to get people a reasonable testing target. Every feature (not only HS) deserves getting serious testing so I vote for getting alpha3 out as soon as possible.

Stefan
On Sat, 2009-12-19 at 18:12 +0100, Stefan Kaltenbrunner wrote:
> > Seems like something we should fix ASAP, but I do not see why it need
> > hold up an alpha release. Alpha releases are expected to have bugs,
> > and this one doesn't look like it would stop people from finding
> > other bugs.
>
> yeah afaik alpha tarballs are a form of checkpoint at the end of a
> commitfest to get people a reasonable testing target. Every feature
> (not only HS) deserves getting serious testing so I vote for getting
> alpha3 out as soon as possible.

+1 for both.

--
Devrim GÜNDÜZ, RHCE
Command Prompt - http://www.CommandPrompt.com
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org Twitter: http://twitter.com/devrimgunduz
> Hiroyuki Yamada <yamada@kokolink.net> writes:
>> Well, I want to know whether the problem I referred to
>> in http://archives.postgresql.org/pgsql-hackers/2009-12/msg01641.php
>> is must-fix or not.
>
>> This problem is a corollary of the deadlock problem. This is less catastrophic
>> but more likely to happen.
>
>> If you leave this problem, for example, any long-running transactions,
>> holding any cursors in whatever tables, have a possibility of freezing
>> whole recovery work in HotStandby node until the transaction commit.
>
> Seems like something we should fix ASAP, but I do not see why it need
> hold up an alpha release. Alpha releases are expected to have bugs,
> and this one doesn't look like it would stop people from finding
> other bugs.

At the beginning of this commit fest, Heikki said in
http://archives.postgresql.org/pgsql-hackers/2009-11/msg00914.php

> Of course there should be several phases! We've *already* punted a lot
> of stuff from this first increment we're currently working on. The
> criteria for getting this first phase committed is: could we release
> with no further changes?

And other patches seem to be checked with similar criteria, as long as
I read mails in this list. So I wanted to know whether the problem is
must-fix, and if it is, why the criteria has been changed during the
commit fest.

Anyway, thanks for answering my question.

regards,

--
Hiroyuki YAMADA
Kokolink Corporation
yamada@kokolink.net
Hiroyuki Yamada wrote:
>> Hiroyuki Yamada <yamada@kokolink.net> writes:
>>> Well, I want to know whether the problem I referred to
>>> in http://archives.postgresql.org/pgsql-hackers/2009-12/msg01641.php
>>> is must-fix or not.
>>> This problem is a corollary of the deadlock problem. This is less catastrophic
>>> but more likely to happen.
>>> If you leave this problem, for example, any long-running transactions,
>>> holding any cursors in whatever tables, have a possibility of freezing
>>> whole recovery work in HotStandby node until the transaction commit.
>> Seems like something we should fix ASAP, but I do not see why it need
>> hold up an alpha release. Alpha releases are expected to have bugs,
>> and this one doesn't look like it would stop people from finding
>> other bugs.
>
> At the beginning of this commit fest, Heikki said in
> http://archives.postgresql.org/pgsql-hackers/2009-11/msg00914.php
>
>> Of course there should be several phases! We've *already* punted a lot
>> of stuff from this first increment we're currently working on. The
>> criteria for getting this first phase committed is: could we release
>> with no further changes?
>
> And other patches seem to be checked with similar criteria, as long as
> I read mails in this list. So I wanted to know whether the problem is
> must-fix, and if it is, why the criteria has been changed during the
> commit fest.

Well, that was the criteria I used to decide whether to commit or not. Not everyone agreed to begin with, and the reason I used that criteria was a selfish one: I didn't want to be forced to fix loose ends after the commitfest myself. The big reason for that was that I didn't know how much time I would have for that. I have no complaints about Simon's commit. Knowing that I'm not on the hook to close the loose ends, I'm very happy that it's finally in. (That doesn't mean that I'll stop paying attention to this patch; I will do as much as I have time to.)
Regarding the bugs you found, I put them on the TODO list at https://wiki.postgresql.org/wiki/Hot_Standby_TODO, under the must-fix category. I think they need to be fixed before final release, but there's no need to delay the alpha release for them. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
> Well, that was the criteria I used to decide whether to commit or not.
> Not everyone agreed to begin with, and the reason I used that criteria
> was a selfish one: I didn't want to be forced to fix loose ends after
> the commitfest myself. The big reason for that was that I didn't know
> how much time I would have for that. I have no complaints about Simon's
> commit. Knowing that I'm not on the hook to close the loose ends, I'm
> very happy that it's finally in. (That doesn't mean that I'll stop
> paying attention to this patch; I will do as much as I have time to.)
>
> Regarding the bugs you found, I put them on the TODO list at
> https://wiki.postgresql.org/wiki/Hot_Standby_TODO, under the must-fix
> category. I think they need to be fixed before final release, but
> there's no need to delay the alpha release for them.

I don't think it's selfish at all. But I see.

Thanks for your kind reply.

regards,

--
Hiroyuki YAMADA
Kokolink Corporation
yamada@kokolink.net
On Sat, 2009-12-19 at 20:59 +0200, Heikki Linnakangas wrote:
> Well, that was the criteria I used to decide whether to commit or not.
> Not everyone agreed to begin with, and the reason I used that criteria
> was a selfish one: I didn't want to be forced to fix loose ends after
> the commitfest myself. The big reason for that was that I didn't know
> how much time I would have for that. I have no complaints about Simon's
> commit. Knowing that I'm not on the hook to close the loose ends, I'm
> very happy that it's finally in. (That doesn't mean that I'll stop
> paying attention to this patch; I will do as much as I have time to.)
>
> Regarding the bugs you found, I put them on the TODO list at
> https://wiki.postgresql.org/wiki/Hot_Standby_TODO, under the must-fix
> category. I think they need to be fixed before final release, but
> there's no need to delay the alpha release for them.

Hmmm, well, if you are still paying attention you'll know that neither of those issues are bugs in the code that was committed. One of them has been fixed and the deadlock problem has a workaround applied. That workaround, err... works, but I accept it's not ideal. But then a few things are not ideal and it does seem unlikely that every one of them will have a perfect fix in the next two months.

I'll change the TODO page.

--
Simon Riggs www.2ndQuadrant.com
On Sat, 2009-12-19 at 14:20 +0200, Peter Eisentraut wrote:
> Do people want more time to play with hot standby? Otherwise alpha3
> should go out on Monday or Tuesday.

No thanks. There were no known bugs in the code I committed, excepting the need to address VACUUM FULL. That will take longer than two days and isn't sufficient reason to halt Alpha3, IMHO.

If others wish to revoke, then maybe we should consider a specific Alpha version just for Hot Standby.

--
Simon Riggs www.2ndQuadrant.com
On Sat, 2009-12-19 at 20:59 +0200, Heikki Linnakangas wrote:
> I put them on the TODO list at
> https://wiki.postgresql.org/wiki/Hot_Standby_TODO, under the must-fix
> category.

I notice you also re-arranged other items on there, specifically the notion that starting from a shutdown checkpoint is somehow important. It's definitely not any kind of bug.

We've discussed this on-list and I've requested that you justify this. So far, nothing you've said on that issue has been at all convincing for me or others. The topic is already mentioned on the HS todo, since if one person requests something we should track that, just in case others eventually agree. But having said that, it clearly isn't a priority, so rearranging the item like that was not appropriate, unless you were thinking of doing it yourself, though that wasn't marked.

--
Simon Riggs www.2ndQuadrant.com
On Sat, 2009-12-19 at 23:22 +0900, Hiroyuki Yamada wrote:
> > Do people want more time to play with hot standby? Otherwise alpha3
> > should go out on Monday or Tuesday.
>
> Well, I want to know whether the problem I referred to
> in http://archives.postgresql.org/pgsql-hackers/2009-12/msg01641.php
> is must-fix or not.
>
> This problem is a corollary of the deadlock problem. This is less catastrophic
> but more likely to happen.
>
> If you leave this problem, for example, any long-running transactions,
> holding any cursors in whatever tables, have a possibility of freezing
> whole recovery work in HotStandby node until the transaction commit.

You seem very insistent on bringing up problems just before release. Almost as if you have a reason to back some other technology other than this one.

The problem you mention here has been documented and very accessible for months and not a single person mentioned it up to now. What's more, the equivalent problem happens in the latest production version of Postgres - users can delay VACUUM endlessly in just the same way, yet I've not seen this raised as an issue in many years of using Postgres. Similarly, there are some ways that Postgres can deadlock that it need not, yet those negative behaviours are accepted and nobody is rushing to fix them, nor demanding that they should be. Few things are theoretically perfect on their first release.

--
Simon Riggs www.2ndQuadrant.com
On Sun, Dec 20, 2009 at 3:42 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Sat, 2009-12-19 at 20:59 +0200, Heikki Linnakangas wrote:
>> I put them on the TODO list at
>> https://wiki.postgresql.org/wiki/Hot_Standby_TODO, under the must-fix
>> category.
>
> I notice you also re-arranged other items on there, specifically the
> notion that starting from a shutdown checkpoint is somehow important.
> It's definitely not any kind of bug.
>
> We've discussed this on-list and I've requested that you justify this.
> So far, nothing you've said on that issue has been at all convincing for
> me or others. The topic is already mentioned on the HS todo, since if
> one person requests something we should track that, just in case others
> eventually agree. But having said that, it clearly isn't a priority, so
> rearranging the item like that was not appropriate, unless you were
> thinking of doing it yourself, though that wasn't marked.

This doesn't match my recollection of the previous discussion on this topic. I am not sure that I'd call it a bug, but I'd definitely like to see it fixed, and I think I mentioned that previously, though I don't have the email in front of me ATM. I am also not aware that anyone other than yourself has opined that we should not worry about fixing it, although I might be wrong about that too. At any rate, "clearly not a priority" seems like an overstatement relative to my memory of that conversation.

...Robert
Simon Riggs wrote:
> On Sat, 2009-12-19 at 20:59 +0200, Heikki Linnakangas wrote:
>> I put them on the TODO list at
>> https://wiki.postgresql.org/wiki/Hot_Standby_TODO, under the must-fix
>> category.
>
> I notice you also re-arranged other items on there, specifically the
> notion that starting from a shutdown checkpoint is somehow important.

I didn't rearrange anything. I added that item because it was missing. Yes, it is important in my opinion.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
> The problem you mention here has been documented and very accessible for
> months and not a single person mentioned it up to now. What's more, the
> equivalent problem happens in the latest production version of Postgres
> - users can delay VACUUM endlessly in just the same way, yet I've not
> seen this raised as an issue in many years of using Postgres. Similarly,
> there are some ways that Postgres can deadlock that it need not, yet
> those negative behaviours are accepted and nobody is rushing to fix
> them, nor demanding that they should be. Few things are theoretically
> perfect on their first release.

First of all, sorry for annoying you.

Well, this is certainly a well-known problem, but I think the cursor example (or the deadlock example) reveals that the problem is more severe than was previously assumed.

The following comments in backup.sgml (which have now been replaced by the deadlock example)

> Waits for buffer cleanup locks do not currently result in query
> cancellation. Long waits are uncommon, though can happen in some cases
> with long running nested loop joins.

...referred only to the case where the startup process has to wait until the end of a single query, and long waits were assumed to be uncommon.

The cursor example shows, however, that the wait can last as long as a whole transaction, and that it occurs in an ordinary use case. FYI, I described a typical freeze scenario in the mail I posted in the original deadlock example thread.

In that case the startup process may have to wait until the end of the transaction, and we cannot predict when the pin-holding transaction will end.

Also, you mentioned the VACUUM case in the production version, but the following two problems have different impacts:

* One VACUUM process freezes until the end of a certain transaction.
* The startup process (and the whole recovery work) freezes until the end of a certain transaction.

The startup process is the last process that should freeze, so I guess this problem may become must-fix.

Anyway, the patch is committed and alpha3 is to be released.
Do you think this problem is must-fix for the final release ? regards, -- Hiroyuki YAMADA Kokolink Corporation yamada@kokolink.net
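As an illustration of the wait described in the message above, here is a toy Python model. It is not PostgreSQL code, and the function and parameter names are hypothetical. It assumes that WAL records are replayed strictly in order, one per tick, and that one record needs a cleanup lock on a page that a standby transaction keeps pinned through an open cursor until it commits.

```python
def simulate_recovery(n_records, cleanup_at, cursor_xact_commits_at):
    """Return the tick at which each WAL record is replayed (toy model).

    Record number `cleanup_at` needs a cleanup lock on a page that a
    standby transaction keeps pinned until tick `cursor_xact_commits_at`.
    All names here are illustrative assumptions, not PostgreSQL internals.
    """
    replayed_at = []
    tick = 0
    for record in range(n_records):
        if record == cleanup_at and tick < cursor_xact_commits_at:
            # The single replay process waits for the pin to go away,
            # so every later record is delayed as well.
            tick = cursor_xact_commits_at
        replayed_at.append(tick)
        tick += 1
    return replayed_at


if __name__ == "__main__":
    # The pin is held until tick 100, so replay stalls at record 2.
    print(simulate_recovery(5, cleanup_at=2, cursor_xact_commits_at=100))
    # No pin is held, so replay proceeds one record per tick.
    print(simulate_recovery(5, cleanup_at=2, cursor_xact_commits_at=0))
```

In this sketch the delay seen by every record after the blocked one is set by when the pin-holding transaction commits, which is the difference from a single stalled VACUUM that the message points out.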
On Mon, 2009-12-21 at 18:42 +0900, Hiroyuki Yamada wrote:
> Do you think this problem is must-fix for the final release ?

We should be clear that this is a behaviour I told you about, not a shock discovery by yourself. There is no permanent freeze, just a wait, from which the Startup process wakes up at the appropriate time. There is no crash or hang as is usually implied by the word freeze.

It remains to be seen whether this is a priority for usability enhancement in this release. There are other issues as well and it is doubtful that every user will be fully happy with the functionality in this release. I will work on things in the order in which I understand them to be important for the majority, given my time and budget constraints and the resolvability of the issues.

When you report bugs, I say thanks. When you start agitating about already-documented restrictions and I see which other software you promote, I think you may have other motives. Regrettably that reduces the weight I give your claims, in relation to other potential users.

If you genuinely care about this topic then I hope and expect that you would start thinking about improvements, or even writing some. I am already in touch with many potential users and will be engaging more widely to understand users' reactions from the Alpha release.

--
Simon Riggs www.2ndQuadrant.com
On Sun, 2009-12-20 at 19:11 -0500, Robert Haas wrote:
> On Sun, Dec 20, 2009 at 3:42 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > On Sat, 2009-12-19 at 20:59 +0200, Heikki Linnakangas wrote:
> >> I put them on the TODO list at
> >> https://wiki.postgresql.org/wiki/Hot_Standby_TODO, under the must-fix
> >> category.
> >
> > I notice you also re-arranged other items on there, specifically the
> > notion that starting from a shutdown checkpoint is somehow important.
> > It's definitely not any kind of bug.
> >
> > We've discussed this on-list and I've requested that you justify this.
> > So far, nothing you've said on that issue has been at all convincing for
> > me or others. The topic is already mentioned on the HS todo, since if
> > one person requests something we should track that, just in case others
> > eventually agree. But having said that, it clearly isn't a priority, so
> > rearranging the item like that was not appropriate, unless you were
> > thinking of doing it yourself, though that wasn't marked.
>
> This doesn't match my recollection of the previous discussion on this
> topic. I am not sure that I'd call it a bug, but I'd definitely like
> to see it fixed, and I think I mentioned that previously, though I
> don't have the email in front ATM. I am also not aware that anyone
> other than yourself has opined that we should not worry about fixing
> it, although I might be wrong about that too. At any rate, "clearly
> not a priority" seems like an overstatement relative to my memory of
> that conversation.

Please check the thread then. Nobody but me has "opined that we should not worry about fixing it", but then nobody else other than Heikki has suggested it is even a feature worthy of inclusion, ever. One person agreed with my position, nobody has spoken in favour of Heikki's position. However, I had already included the feature on the todo; it was further down the todo before a second copy was added, second copy now removed.
If you are saying being able to start Hot Standby from a shutdown checkpoint is an important feature for you, then say so, and why. Please also be careful that you don't mix this up with other improvements, nor say "they all need fixing". This isn't a general discussion on those points. There are other important things. -- Simon Riggs www.2ndQuadrant.com
On 22.12.09 9:34 , Simon Riggs wrote:
> If you are saying being able to start Hot Standby from a shutdown
> checkpoint is an important feature for you, then say so, and why.

I think it's not so much an important feature but more the removal of a footgun.

Imagine a reporting database where all transactions but a few daily bulk imports are read-only. To spread the load, you do your bulk loads on the master, but run the reporting queries against a read-only HS slave. Now you take the master down for maintenance. Since all clients but the bulk loader use the slave already, and since the bulk loads can be deferred until after the maintenance window closes again, you don't actually do a fail-over.

Now you're already pointing at your foot with the gun. All it takes to ruin your day is *some* reason for the slave to restart. Maybe due to a junior DBA's typo, or maybe due to a bug in postgres. Anyway, once the slave is down, it won't come up until you manage to get the master up and running again. And this limitation is pretty surprising, since one would assume that if the slave survives a *crash* of the master, it'd certainly survive a simple *shutdown*.

best regards,
Florian Pflug
On Tue, Dec 22, 2009 at 8:34 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> If you are saying being able to start Hot Standby from a shutdown
> checkpoint is an important feature for you, then say so, and why.

Can you explain the consequences of missing this? It sounds to me like if I lose my master and it happened to be while it was shut down for whatever reason then I'll be stuck and won't be able to use my standby. If that's true it seems like it's a major problem. Or does it just mean I would have to follow a different procedure when failing over?

I'm not sure if it's relevant but one thing to realize is that a lot of MySQL people are used to doing failovers to do regular maintenance tasks like creating indexes or making schema changes. Besides, a lot of sites build in regular failovers to ensure that their failover procedure works. In both cases they usually want to do a clean shutdown of the master to ensure they don't lose any transactions during the failover.

--
greg
On Tue, 2009-12-22 at 12:32 +0100, Florian Pflug wrote:
> On 22.12.09 9:34 , Simon Riggs wrote:
> > If you are saying being able to start Hot Standby from a shutdown
> > checkpoint is an important feature for you, then say so, and why.
>
> I think it's not so much an important feature but more the removal of a
> footgun.
>
> Imagine a reporting database where all transactions but a few daily bulk
> imports are read-only. To spread the load, you do your bulk loads on the
> master, but run the reporting queries against a read-only HS slave. Now
> you take the master down for maintenance. Since all clients but the bulk
> loader use the slave already, and since the bulk loads can be deferred
> until after the maintenance window closes again, you don't actually do a
> fail-over.
>
> Now you're already pointing at your foot with the gun. All it takes to
> ruin your day is *some* reason for the slave to restart. Maybe due to a
> junior DBA's typo, or maybe due to a bug in postgres. Anyway, once the
> slave is down, it won't come up until you manage to get the master up
> and running again. And this limitation is pretty surprising, since one
> would assume that if the slave survives a *crash* of the master, it'd
> certainly survive a simple *shutdown*.

Well, you either wait for master to come up again and restart, or you flip into normal mode and keep running queries from there. You aren't prevented from using the server, except by your own refusal to failover. That's not enough for me to raise the priority for this feature. But it was already on the list and remains there now.

If someone does add this, it will require careful thought about how to avoid introducing further subtle ways to break HS, all of which will need testing and re-testing to avoid regression. So I'm not personally going to be working on it, for this release and likely the next also, nor will I encourage others to do so, for anyone looking to assist. There are more important things for us to do, IMHO.
-- Simon Riggs www.2ndQuadrant.com
On Tue, 2009-12-22 at 11:41 +0000, Greg Stark wrote:
> On Tue, Dec 22, 2009 at 8:34 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > If you are saying being able to start Hot Standby from a shutdown
> > checkpoint is an important feature for you, then say so, and why.
>
> Can you explain the consequences of missing this? It sounds to me like
> if I lose my master and it happened to be while it was shut down for
> whatever reason then I'll be stuck and won't be able to use my
> standby. If that's true it seems like it's a major problem. Or does it
> just mean I would have to follow a different procedure when failing
> over?

Failover isn't prevented in this case.

If we were going to spend time on anything it would be to make failover and switchback easier so that people aren't afraid of it. I've spent a few weeks trying to remove the shutdown checkpoint, but no luck so far. Switchback optimization is probably something for next release now, unless you're looking for a project?

--
Simon Riggs www.2ndQuadrant.com
Simon Riggs wrote:
> If someone does
> add this, it will require careful thought about how to avoid introducing
> further subtle ways to break HS, all of which will need testing and
> re-testing to avoid regression.

Well, I *did* add that, but you removed it...

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Tue, 2009-12-22 at 16:09 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > If someone does
> > add this, it will require careful thought about how to avoid introducing
> > further subtle ways to break HS, all of which will need testing and
> > re-testing to avoid regression.
>
> Well, I *did* add that, but you removed it...

It was already on there when you added the second one. It is still there now, even after I removed the duplicate entry.

By "add" I meant to write the feature, test and then support it afterwards, not to re-discuss editing the Wiki.

--
Simon Riggs www.2ndQuadrant.com
On 22.12.09 13:21 , Simon Riggs wrote:
> On Tue, 2009-12-22 at 12:32 +0100, Florian Pflug wrote:
>> Imagine a reporting database where all transactions but a few daily
>> bulk imports are read-only. To spread the load, you do your bulk
>> loads on the master, but run the reporting queries against a
>> read-only HS slave. Now you take the master down for maintenance.
>> Since all clients but the bulk loader use the slave already, and
>> since the bulk loads can be deferred until after the maintenance
>> window closes again, you don't actually do a fail-over.
>>
>> Now you're already pointing at your foot with the gun. All it
>> takes to ruin your day is *some* reason for the slave to restart.
>> Maybe due to a junior DBA's typo, or maybe due to a bug in
>> postgres. Anyway, once the slave is down, it won't come up until you
>> manage to get the master up and running again. And this limitation
>> is pretty surprising, since one would assume that if the slave
>> survives a *crash* of the master, it'd certainly survive a simple
>> *shutdown*.
>
> Well, you either wait for master to come up again and restart, or you
> flip into normal mode and keep running queries from there. You aren't
> prevented from using the server, except by your own refusal to
> failover.

Very true. However, that "refusal" as you put it might actually be the most sensible thing to do in a lot of setups. Not everyone needs extreme up-time guarantees, and for those people setting up, testing and *continuously* exercising fail-over is just not worth the effort. Especially since fail-over with asynchronous replication is tricky to get right if you want to avoid data loss.

So I still believe that there are very real use-cases for HS where this limitation can be quite a PITA.

But you are of course free to work on whatever you feel like, and probably need to satisfy your client's needs first.
So I'm in no way implying that this issue is a must-fix issue, or that you're in any way obliged to take care of it. I merely wanted to make the point that there *are* valid use-cases where this behavior is not ideal. best regards, Florian Pflug
On Tue, Dec 22, 2009 at 3:32 PM, Florian Pflug <fgp.phlo.org@gmail.com> wrote:
>> Well, you either wait for master to come up again and restart, or you
>> flip into normal mode and keep running queries from there. You aren't
>> prevented from using the server, except by your own refusal to
>> failover.
>
> Very true. However, that "refusal" as you put it might actually be the
> most sensible thing to do in a lot of setups. Not everyone needs extreme
> up-time guarantees, and for those people setting up, testing and
> *continuously* exercising fail-over is just not worth the effort.
> Especially since fail-over with asynchronous replication is tricky to
> get right if you want to avoid data loss.

To say nothing of the fact that the replica might not be a suitable master at all. It could be running on inferior hardware or be on a separate network perhaps too slow to reach from production services.

HA is not the only use case for HS or even the main one in my experience.

--
greg
On Tue, 2009-12-22 at 16:32 +0100, Florian Pflug wrote:
> But you are of course free to work on whatever you feel like, and
> probably need to satisfy your client's needs first.

Alluding to me as whimsical or mercenary isn't likely to change my mind.

IMHO this isn't one of the more important features, for the majority, in this release. I do intend to check that.

If there are people that believe otherwise, knock yourselves out.

--
Simon Riggs www.2ndQuadrant.com
On Tue, 2009-12-22 at 15:38 +0000, Greg Stark wrote:
> On Tue, Dec 22, 2009 at 3:32 PM, Florian Pflug <fgp.phlo.org@gmail.com> wrote:
> >> Well, you either wait for master to come up again and restart, or you
> >> flip into normal mode and keep running queries from there. You aren't
> >> prevented from using the server, except by your own refusal to
> >> failover.
> >
> > Very true. However, that "refusal" as you put it might actually be the
> > most sensible thing to do in a lot of setups. Not everyone needs extreme
> > up-time guarantees, and for those people setting up, testing and
> > *continuously* exercising fail-over is just not worth the effort.
> > Especially since fail-over with asynchronous replication is tricky to
> > get right if you want to avoid data loss.
>
> To say nothing that the replica might not be a suitable master at all.
> It could be running on inferior hardware or be on a separate network
> perhaps too slow to reach from production services.
>
> HA is not the only use case for HS or even the main one in my experience

I can invent scenarios in which all the outstanding issues give problems. What I have to do is balance which of those is more likely and which have useful workarounds. This is about priority and in particular, my priority. IMHO my time would be misplaced to work upon this issue, though I will check that other users feel that way also.

--
Simon Riggs www.2ndQuadrant.com
Simon Riggs wrote:
> By "add" I meant to write the feature, test and then support it
> afterwards, not to re-discuss editing the Wiki.

That's exactly what I meant too. I *did* write the feature, but you removed it before committing. I can extract the removed parts from the git repository and send you as a new patch for review, if you'd like.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Tue, 2009-12-22 at 18:17 +0200, Heikki Linnakangas wrote: > Simon Riggs wrote: > > By "add" I meant to write the feature, test and then support it > > afterwards, not to re-discuss editing the Wiki. > > That's exactly what I meant too. I *did* write the feature, but you > removed it before committing. I removed it because you showed it wouldn't work. If you want to fix that problem, test, commit and support it, go right ahead. -- Simon Riggs www.2ndQuadrant.com
Simon Riggs wrote: > On Tue, 2009-12-22 at 18:17 +0200, Heikki Linnakangas wrote: >> Simon Riggs wrote: >>> By "add" I meant to write the feature, test and then support it >>> afterwards, not to re-discuss editing the Wiki. >> That's exactly what I meant too. I *did* write the feature, but you >> removed it before committing. > > I removed it because you showed it wouldn't work. I did? I believe this is the discussion that led to you removing it (6th of December, thread "Hot Standby, recent changes"): Simon Riggs wrote: > > On Sun, 2009-12-06 at 12:32 +0200, Heikki Linnakangas wrote: >> > 4. Need to handle the case where master is started up with >> > wal_standby_info=true, shut down, and restarted with >> > wal_standby_info=false, while the standby server runs continuously. And >> > the code in StartupXLog() to initialize recovery snapshot from a >> > shutdown checkpoint needs to check that too. > > I don't really understand the use case for shutting down the server and > then using it as a HS base backup. Why would anyone do that? Why would > they have their server down for potentially hours, when they can take > the backup while the server is up? If the server is idle, it can be > up-and-idle just as easily as down-and-idle, in which case we wouldn't > need to support this at all. Adding yards of code for this capability > isn't important to me. I'd rather strip the whole lot out than keep > fiddling with a low priority area. Please justify this as a real world > solution before we continue trying to support it. The issue I mentioned had nothing to do with starting from a shutdown checkpoint - it's still a problem if you keep the standby running through the restart cycle in the master - but maybe you thought it was? Or was there something else? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Tue, 2009-12-22 at 18:40 +0200, Heikki Linnakangas wrote: > The issue I mentioned had nothing to do with starting from a shutdown > checkpoint - it's still a problem if you keep the standby running > through the restart cycle in the master - but maybe you thought it was? > Or was there something else? Strangely enough that exact same problem already happens with archive_mode, and we see a fix coming for that soon also. That fix takes the same approach as HS already takes. HS will flip out when it sees the next record (checkpoint). The only way out is to re-take base backup, just the same. Even after that fix is applied, HS will still work as well as archive-mode, so if anything HS is ahead of other functionality. Fixing obscure cases where people actively try to get past configuration options is not a priority. I'm not sure why you see it as important, especially when you've argued we don't even need the parameter in the first place. You've been perfectly happy for *years* with the situation that recovery would fail if max_prepared_transactions was not set correctly. You're not going to tell me you never noticed? Why is avoidance of obvious misconfiguration of HS such a heavy priority when nothing else ever was? I'm going to concentrate on fixing important issues. I'd rather you helped with those. -- Simon Riggs www.2ndQuadrant.com
Simon Riggs wrote: > You've been perfectly happy for *years* with the situation that recovery > would fail if max_prepared_transactions was not set correctly. You're not > going to tell me you never noticed? Why is avoidance of obvious > misconfiguration of HS such a heavy priority when nothing else ever was? That's not a priority, and I never said it was. It almost sounds like we're in violent agreement: this issue of flipping wal_standby_info in the master has nothing to do with the removal of the capability to start standby from a shutdown checkpoint. So what *was* the reason? Was there something wrong with it? If not, please put it back. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Simon Riggs wrote: > On Mon, 2009-12-21 at 18:42 +0900, Hiroyuki Yamada wrote: > > > Do you think this problem is must-fix for the final release ? > > We should be clear that this is a behaviour I told you about, not a > shock discovery by yourself. There is no permanent freeze, just a wait, > from which the Startup process wakes up at the appropriate time. There > is no crash or hang as is usually implied by the word freeze. > > It remains to be seen whether this is a priority for usability > enhancement in this release. There are other issues as well and it is > doubtful that every user will be fully happy with the functionality in > this release. I will work on things in the order in which I understand > them to be important for the majority, given my time and budget > constraints and the resolvability of the issues. > > When you report bugs, I say thanks. When you start agitating about > already-documented restrictions and I see which other software you > promote, I think you may have other motives. Regrettably that reduces > the weight I give your claims, in relation to other potential users. Simon, where did this come from? "Other software?" I think Simon's comments are way off base here and only serve to increase tension in this discussion. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Tue, 2009-12-22 at 19:30 +0200, Heikki Linnakangas wrote: > Simon Riggs wrote: > > You've been perfectly happy for *years* with the situation that recovery > > would fail if max_prepared_transactions was not set correctly. You're not > > going to tell me you never noticed? Why is avoidance of obvious > > misconfiguration of HS such a heavy priority when nothing else ever was? > > That's not a priority, and I never said it was. > > It almost sounds like we're in violent agreement: this issue of > flipping wal_standby_info in the master has nothing to do with the > removal of the capability to start standby from a shutdown checkpoint. I removed the capability to start at shutdown checkpoints because you said it would cause this bug. That gives two choices: fix the bug, or remove the feature. I don't think it is a priority to support that feature, so I removed it in favour of other work. I will work on issues in priority order and this was already on the list and remains so. I don't have endless time, so realistically, given its current priority it is unlikely to be addressed in this release. -- Simon Riggs www.2ndQuadrant.com
On 22.12.09 16:45 , Simon Riggs wrote: > On Tue, 2009-12-22 at 16:32 +0100, Florian Pflug wrote: >> But you are of course free to work on whatever you feel like, and >> probably need to satisfy your client's needs first. > > Alluding to me as whimsical or mercenary isn't likely to change my > mind. Simon, you *completely* misunderstood my last paragraph! I never intended to call you whimsical or mercenary, and I honestly don't believe I did. The only thing I "alluded" to was you seeing HS mostly as a solution for HA setups, whereas I felt that it has quite a few use-cases besides that. Plus that your view of what the important use-cases are is influenced by the projects you usually work on, and that it's perfectly reasonable for your priorities to reflect that view. None of this was meant as an insult of any kind. best regards, Florian Pflug
On Tue, 2009-12-22 at 19:53 +0100, Florian Pflug wrote: > None of this was meant as an insult of any kind. Then I apologise completely. I've clearly been working too hard and will retire for some rest (even though that is not listed as a task on the Wiki). -- Simon Riggs www.2ndQuadrant.com
On Dec 22, 2009, at 11:02 AM, Simon Riggs wrote: > I've clearly been working too hard and will retire for some rest (even > though that is not listed as a task on the Wiki). Someone add it! David
On Tue, Dec 22, 2009 at 11:04:29AM -0800, David Wheeler wrote: > On Dec 22, 2009, at 11:02 AM, Simon Riggs wrote: > > > I've clearly been working too hard and will retire for some rest (even > > though that is not listed as a task on the Wiki). > > Someone add it! Done! :) Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate