Обсуждение: SIGQUIT and lost sequence WAL records
Hi all, Lately a couple of colleagues have pinged me regarding the fact that a server signaled with SIGQUIT, or "pg_ctl stop -m immediate", may ignore sequence WAL records. See for example the script attached seqtest.py (simplified test case provided from my colleague Nikhil Deshpande), that will fail while for example the following command is run in parallel: for i in {1..1000}; do pg_ctl restart -m immediate -w; sleep 1; done The failure happening being for example: AssertionError: DB returned reused sequence value prev=1118 curr=958 Here it means that the last return value of nextval() was 958, even if the highest value ever fetched was 1118. Note that this test script uses psycopg2, but that it does *not* commit transactions on purpose. Also note that those WAL records do not disappear all the time. Sometimes they are here and the sequence keeps increasing monotically after server is restarted. Trying to find out if there were records ignored I added some logs in the server thanks to the patch attached. In the case of this last failure there have been a couple of records after the last one found at recovery: WARNING: Logged new sequence value at 0/1505570 for nextval = 957 WARNING: Logged new sequence value at 0/15066D8 for nextval = 990 WARNING: Logged new sequence value at 0/1507828 for nextval = 1023 WARNING: Logged new sequence value at 0/1508990 for nextval = 1056 WARNING: Logged new sequence value at 0/1509AE0 for nextval = 1089 WARNING: Logged new sequence value at 0/150AC48 for nextval = 1122 And if I looked at the WAL records those were not of course not present. That's not really surprising as recovery didn't use them to restore the sequence up to where it should have. func.sgml mentions that a nextval() operation is *never* rolled back even if a transaction is aborted, though in this case this is exactly what is happening, even if when the server is stopped the existing transactions running are forcibly aborted. I am lacking of fuel to look more at that today, but at this point I wanted to get some input to see if I am missing something obvious. So, code bug or documentation bug? -- Michael
Вложения
Michael Paquier <michael.paquier@gmail.com> writes: > Lately a couple of colleagues have pinged me regarding the fact that a > server signaled with SIGQUIT, or "pg_ctl stop -m immediate", may > ignore sequence WAL records. I've not gone through this in detail, but I think what is happening is that since the test case never commits any transactions, the sequence-advance WAL records may not get flushed to disk, and thus of course are not available for replay. This is not a bug IMO, as by the exact same token, no uses of those nextval() values can appear in any committed tuples, so no database inconsistency is possible. If an application is using nextval() results *externally to the database*, it's unsafe for it to rely on those values being unique unless/until it commits the nextval() calls. I seem to recall some past discussions about whether a transaction that commits after writing only a sequence advance (that is, it did nextval() and nothing else) needs to flush WAL. I think at one point it did not but we changed it because of this consideration. [ digs around... ] That was a long time ago: see 01747692f. regards, tom lane
On Wed, Feb 17, 2016 at 11:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Michael Paquier <michael.paquier@gmail.com> writes: >> Lately a couple of colleagues have pinged me regarding the fact that a >> server signaled with SIGQUIT, or "pg_ctl stop -m immediate", may >> ignore sequence WAL records. > > I've not gone through this in detail, but I think what is happening > is that since the test case never commits any transactions, the > sequence-advance WAL records may not get flushed to disk, and thus > of course are not available for replay. This is not a bug IMO, as > by the exact same token, no uses of those nextval() values can appear > in any committed tuples, so no database inconsistency is possible. > > If an application is using nextval() results *externally to the > database*, it's unsafe for it to rely on those values being unique > unless/until it commits the nextval() calls. That's for the confirmation. Yeah I already suggested those folks to issue a commit to ensure that nextval() remains consistent. > I seem to recall some past discussions about whether a transaction > that commits after writing only a sequence advance (that is, it > did nextval() and nothing else) needs to flush WAL. I think at one > point it did not but we changed it because of this consideration. > > [ digs around... ] That was a long time ago: see 01747692f. That's a part of history... The documentation is stating the following regarding this behavior: <important> <para> To avoid blocking concurrent transactions that obtain numbers from the same sequence, a <function>nextval</function> operation is never rolled back; that is, once a value has been fetched it is considered used, even if the transaction that did the <function>nextval</function> later aborts. This means that aborted transactions might leave unused <quote>holes</quote> in the sequence of assigned values. </para> </important> Do you think we should add a mention regarding SIGQUIT/stop-immediate regarding the fact that the application needs to commit the transaction that used nextval() with synchronous_commit = on to keep it consistent at recovery? -- Michael
Michael Paquier <michael.paquier@gmail.com> writes: > Do you think we should add a mention regarding SIGQUIT/stop-immediate > regarding the fact that the application needs to commit the > transaction that used nextval() with synchronous_commit = on to keep > it consistent at recovery? Meh. In the normal case where you just do "SELECT nextval()" and use the result externally, there's going to be an implicit commit before you get the data back. So I think you really have to be taking deliberate aim at your foot in order to get burnt like this. regards, tom lane