Discussion: Reliable and fast money transaction design
I need a way to perform a series of money transactions (row inserts) together with some row updates in such a way that integrity is ensured and performance is high.

I have two tables:

  ACCOUNTS (
    account_id int,
    balance int
  );

  TRANSACTIONS (
    transaction_id int,
    source_account_id int,
    destination_account_id int,
    amount int
  );

When a money transaction from account_id = 111 to account_id = 222 with the amount of 123 is performed, the following things must happen as an atomic event:

  1) INSERT INTO TRANSACTIONS
       (source_account_id, destination_account_id, amount)
     VALUES (111, 222, 123)
  2) UPDATE ACCOUNTS SET balance = balance-123 WHERE account_id=111
  3) UPDATE ACCOUNTS SET balance = balance+123 WHERE account_id=222

A lot of such money transactions will happen in parallel, so I need to ensure the integrity of the rows in ACCOUNTS. This might be done by creating an *immutable* function that performs the three steps, but this will block unnecessarily if two completely unrelated money transactions are attempted in parallel.

Any suggestions on how to perform steps 1-3 while ensuring integrity?

QUESTION 2:

For various reasons I might need to modify the ACCOUNTS table to

  ACCOUNTS (
    account_id int,
    transaction_id int,
    balance int,
    <some other info>
  );

so that the balance for account_id=111 is given by

  SELECT balance FROM ACCOUNTS
  WHERE account_id=111
  ORDER BY transaction_id DESC
  LIMIT 1

How will that affect how I should perform steps 1-3 above?

Thanks
On Tue, Aug 28, 2007 at 05:48:50PM +0200, cluster wrote:
> When a money transaction from account_id = 111 to account_id = 222 with
> the amount of 123 is performed, the following things must happen as an
> atomic event:
> 1) INSERT INTO TRANSACTIONS
>      (source_account_id, destination_account_id, amount)
>    VALUES (111, 222, 123)
> 2) UPDATE ACCOUNTS SET balance = balance-123 WHERE account_id=111
> 3) UPDATE ACCOUNTS SET balance = balance+123 WHERE account_id=222

Why do you think you need to do anything special for this? As long as you're doing these 3 steps in a single transaction, everything should be fine. At most, you might need to set your transaction isolation level to serializable, but I don't think that's actually needed.
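The single-transaction approach suggested here might be sketched as follows (using the table and column names from the original post; this is a minimal illustration, not a vetted implementation). Because the two UPDATEs take row-level locks only on the two accounts involved, unrelated transfers do not block each other:

```sql
BEGIN;

-- Step 1: record the transfer
INSERT INTO transactions (source_account_id, destination_account_id, amount)
VALUES (111, 222, 123);

-- Steps 2 and 3: move the money; row locks are held only on these two rows
UPDATE accounts SET balance = balance - 123 WHERE account_id = 111;
UPDATE accounts SET balance = balance + 123 WHERE account_id = 222;

COMMIT;  -- all three statements become visible atomically, or none do
```

Two transfers in opposite directions between the same pair of accounts can deadlock if they lock the rows in opposite orders; performing the two UPDATEs in a consistent order (e.g. lowest account_id first) avoids that. A CHECK (balance >= 0) constraint on ACCOUNTS could additionally guard against overdrafts, if that is a requirement.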
--
Decibel!, aka Jim Nasby   decibel@decibel.org
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)
OK, thanks. But what about the second question, in which the UPDATE is based on a SELECT max(...) statement on another table? How can I ensure that no other process inserts a row between my SELECT max() and my UPDATE, making my SELECT max() invalid?

A table lock could be an option, but I am only interested in blocking row insertions for this particular account_id. Insertions for other account_ids will not make the SELECT max() invalid and should therefore be allowed.
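One way to get exactly that per-account granularity (an assumption on my part, not a technique proposed later in the thread) is to take a row-level lock on a single row representing the account, e.g. in a one-row-per-account parent table, here called ACCOUNT_MASTER (a hypothetical name). Writers for account 111 then serialize against each other, while writers for other accounts are unaffected:

```sql
BEGIN;

-- Per-account mutex: blocks only other writers for account 111
SELECT 1 FROM account_master WHERE account_id = 111 FOR UPDATE;

-- This read can no longer be invalidated by a concurrent insert for the
-- same account, because such a writer must first take the lock above.
SELECT balance FROM accounts
WHERE account_id = 111
ORDER BY transaction_id DESC
LIMIT 1;

-- ... compute and INSERT the new balance row for account 111 here ...

COMMIT;  -- releases the per-account lock
```

Locking the latest ACCOUNTS row directly with FOR UPDATE would not be safe here: a second transaction could end up holding a lock on a row that is no longer the newest one after the first commits.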
On 08/29/07 07:27, cluster wrote:
> How can I ensure that no other process inserts a row between my
> SELECT max() and UPDATE - making my SELECT max() invalid?

Well, concurrency and transactional consistency *allows* other processes to update the table after you start your transaction. You just won't *see* their updates while you're inside of a transaction.

Of course, if you truly want exclusive access, you could LOCK the table. It's well explained in the documentation...

--
Ron Johnson, Jr.
Jefferson LA USA

Give a man a fish, and he eats for a day.
Hit him with a fish, and he goes away for good!
On Wed, Aug 29, 2007 at 08:37:26AM -0500, Ron Johnson wrote:
> Well, concurrency and transactional consistency *allows* other
> processes to update the table after you start your transaction. You
> just won't *see* their updates while you're inside of a transaction.

Just make sure and read up about transaction isolation... in the default of READ COMMITTED mode, you can sometimes see changes made by other transactions.
On 08/29/07 09:34, Decibel! wrote:
> Just make sure and read up about transaction isolation... in the default
> of READ COMMITTED mode, you can sometimes see changes made by other
> transactions.

Argh!!! The RDBMS that I typically use defaults to SERIALIZABLE.
Ron Johnson wrote:
> Argh!!! The RDBMS that I typically use defaults to SERIALIZABLE.

SERIALIZABLE is really slow :). You should look into SERIALIZABLE only for those transactions that need it. There is also SELECT FOR UPDATE.

Joshua D. Drake
Command Prompt, Inc.  http://www.commandprompt.com/
Ron Johnson <ron.l.johnson@cox.net> writes:
> Argh!!! The RDBMS that I typically use defaults to SERIALIZABLE.

You can change default_transaction_isolation if you like.

			regards, tom lane
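For reference, that default can be changed at several levels in PostgreSQL (the database name below is a hypothetical placeholder):

```sql
-- For the current session:
SET default_transaction_isolation = 'serializable';

-- As a per-database default:
ALTER DATABASE mydb SET default_transaction_isolation = 'serializable';

-- Or for a single transaction:
BEGIN ISOLATION LEVEL SERIALIZABLE;
-- ... statements ...
COMMIT;
```

It can also be set cluster-wide in postgresql.conf.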
On 08/29/07 10:40, Joshua D. Drake wrote:
> SERIALIZABLE is really slow :). You should look into SERIALIZABLE only
> for those transactions that need it. There is also SELECT FOR UPDATE.

We use SERIALIZABLE (with all its locking "issues") to guarantee the I in ACID. ISTM that READ COMMITTED can only deliver "ACD".

--
Ron Johnson, Jr.
Jefferson LA USA
On 08/29/07 10:47, Tom Lane wrote:
> You can change default_transaction_isolation if you like.

You misunderstand: we do that on purpose, and I had forgotten that most RDBMSs don't do that.

--
Ron Johnson, Jr.
Jefferson LA USA
Ron Johnson wrote:
> We use SERIALIZABLE (with all its locking "issues") to guarantee
> the I in ACID. ISTM that READ COMMITTED can only deliver "ACD".

You are using serializable for select statements?

Joshua D. Drake
On 08/29/07 11:20, Joshua D. Drake wrote:
> You are using serializable for select statements?

READ ONLY, which defaults to SERIALIZABLE. (It's not PostgreSQL...)

--
Ron Johnson, Jr.
Jefferson LA USA
"Joshua D. Drake" <jd@commandprompt.com> writes:
> SERIALIZABLE is really slow :).

Say what? If anything it's probably faster than READ COMMITTED, because it doesn't take as many snapshots. But the difference is likely down in the noise anyway.

			regards, tom lane
Tom Lane wrote:
> Say what? If anything it's probably faster than READ COMMITTED, because
> it doesn't take as many snapshots. But the difference is likely down in
> the noise anyway.

Not in production it isn't.

Joshua D. Drake
"Joshua D. Drake" <jd@commandprompt.com> writes:
> Not in production it isn't.

Well, I can believe that specific applications might be slower overall due to having to retry transactions that get serialization failures, or perhaps because they take more locks to prevent such failures. But it's not slower as far as the database engine is concerned. If you think otherwise I'd like to see a test case.

			regards, tom lane
Tom Lane wrote:
> Well, I can believe that specific applications might be slower overall
> due to having to retry transactions that get serialization failures,
> or perhaps because they take more locks to prevent such failures.
> But it's not slower as far as the database engine is concerned.

Well, I can only speak to live production loads. I have never profiled the difference from that low of a level. I can definitely say that in a standard web app, under velocity, serializable is a huge performance killer.

Sincerely,

Joshua D. Drake
"Joshua D. Drake" <jd@commandprompt.com> writes:
> Well I can only speak to live production loads. I have never profiled
> the difference from that low of a level. I can definitely say that in a
> standard web app, under velocity, serializable is a huge performance killer.

Are you having to retry after serialization failures frequently?

There's no reason for an individual transaction to take longer in SERIALIZABLE mode. In fact I believe SERIALIZABLE mode is actually measurably faster in benchmarks but haven't run one in READ COMMITTED mode recently (for that reason).

--
Gregory Stark
EnterpriseDB  http://www.enterprisedb.com
Gregory Stark wrote:
> Are you having to retry after serialization failures frequently?
>
> There's no reason for an individual transaction to take longer in
> SERIALIZABLE mode. In fact I believe SERIALIZABLE mode is actually
> measurably faster in benchmarks but haven't run one in READ COMMITTED
> mode recently (for that reason).

Oddly enough, I am in the exact opposite boat :). We found that READ COMMITTED was faster a while back and haven't looked back except where the logic requires. The only recent testing I have done is with our PostgreSQL Analytics software. We are using Psycopg2, which defaults to serializable. We were having serious performance problems under high-concurrency selects. We moved to READ COMMITTED and the problem went away.

I will see if I can do some digging and get some actual numbers for us.

Joshua D. Drake
On Wed, Aug 29, 2007 at 10:22:32PM +0100, Gregory Stark wrote:
> In fact I believe SERIALIZABLE mode is actually measurably faster in
> benchmarks but haven't run one in READ COMMITTED mode recently (for that
> reason).

I think there's a reason why SERIALIZABLE could be slower, and that is that it's waiting on possibly-conflicting (but not actually conflicting) commits to happen in READ COMMITTED mode. No? Won't it have to check those things when it COMMITs?

A

--
Andrew Sullivan | ajs@crankycanuck.ca
The whole tendency of modern prose is away from concreteness.
		--George Orwell
Andrew Sullivan <ajs@crankycanuck.ca> writes:
> I think there's a reason why SERIALIZABLE could be slower, and that
> is that it's waiting on possibly-conflicting (but not actually
> conflicting) commits to happen in READ COMMITTED mode. No? Won't it
> have to check those things when it COMMITs?

SERIALIZABLE mode does not introduce any waits that wouldn't happen anyway. It only affects what happens after you stop waiting.

The sequence is that if you go to update or delete a row, and you see there's already an uncommitted change on the row, you have to wait for that transaction to commit or roll back. If it rolls back, you can proceed. If it commits, then either throw an error (in SERIALIZABLE mode) or attempt to update/delete the newest tuple version (in READ COMMITTED mode).

			regards, tom lane
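That wait-then-decide sequence can be sketched as a two-session exchange (session labels are illustrative, using the ACCOUNTS table from the original post):

```sql
-- Session A:
BEGIN;
UPDATE accounts SET balance = balance - 123 WHERE account_id = 111;
-- (A has not committed yet)

-- Session B, concurrently:
BEGIN;
UPDATE accounts SET balance = balance - 50 WHERE account_id = 111;
-- B blocks here until A commits or rolls back:
--   A rolls back:               B proceeds against the original row version.
--   A commits, READ COMMITTED:  B re-evaluates and updates the newest version.
--   A commits, SERIALIZABLE:    B fails with "could not serialize access due
--                               to concurrent update" and must retry.
```

The block happens identically in both isolation levels; only B's behavior after A's commit differs, which is Tom's point above.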
On Thu, Aug 30, 2007 at 02:21:56PM -0400, Tom Lane wrote:
> SERIALIZABLE mode does not introduce any waits that wouldn't happen
> anyway. It only affects what happens after you stop waiting.

Ok, this makes me think I'm deeply confused about something. (Uh, well, on this specific topic. Anyone who's ever read any of my posts or talked to me for 10 seconds will attest that I'm deeply confused about plenty of things generally!)

I had the impression that, when working in READ COMMITTED mode, you could see (for instance) _new_ rows that were INSERTed by others who were also doing work. In SERIALIZABLE, you couldn't. So in cases where the additional rows met criteria in your WHERE clause late in your transaction, SERIALIZABLE had to throw them away. For instance, in READ COMMITTED, in a longish transaction, the "WHERE processed_on IS NULL" might match more rows than were available at the beginning of the transaction when you SELECTed them, but in SERIALIZABLE, you can't see those additional rows. Is that wrong? (If so, I'm going to have to spend some time working out clarifications for the manual.)

And doesn't the SERIALIZABLE transaction have to figure out that "this row doesn't count for me"?

A

--
Andrew Sullivan | ajs@crankycanuck.ca
Andrew Sullivan <ajs@crankycanuck.ca> writes:
> I had the impression that, when working in READ COMMITTED mode, you
> could see (for instance) _new_ rows that were INSERTed by others who
> were also doing work. In SERIALIZABLE, you couldn't.
> And doesn't the SERIALIZABLE transaction have to figure out that
> "this row doesn't count for me"?

Sure, but so does READ COMMITTED. Both of them work with MVCC "snapshot" data structures that tell them which tuples to consider good and which were changed too recently to consider visible. The only difference is that SERIALIZABLE takes one snapshot at transaction start and works with that for the whole transaction, whereas READ COMMITTED takes a new snap for each statement.

It's certainly true that a later snapshot might allow more tuples to be seen, but it might allow fewer as well. I don't see that that argument proves anything in general --- it'd all depend on details of how your application works.

			regards, tom lane
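The per-statement versus per-transaction snapshot difference can be observed directly with two sessions (a sketch; session labels are illustrative and the TRANSACTIONS table is from the original post):

```sql
-- Session A, READ COMMITTED (the default):
BEGIN;
SELECT count(*) FROM transactions;  -- snapshot taken for this statement
-- Session B now runs: INSERT INTO transactions ...; COMMIT;
SELECT count(*) FROM transactions;  -- fresh snapshot: B's row is counted
COMMIT;

-- Session A, SERIALIZABLE:
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT count(*) FROM transactions;  -- snapshot fixed at the first query
-- Session B inserts and commits as before
SELECT count(*) FROM transactions;  -- same snapshot: B's row is invisible
COMMIT;
```

In both modes every tuple read is checked against a snapshot; the modes differ only in when that snapshot is taken.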
On Thu, Aug 30, 2007 at 03:32:40PM -0400, Tom Lane wrote:
> difference is that SERIALIZABLE takes one snapshot at transaction start
> and works with that for the whole transaction, whereas READ COMMITTED
> takes a new snap for each statement.

Oh, I get it. This explains then why in principle READ COMMITTED oughta be faster in the absence of conflicts: additional snapshot checks are not needed?

(Sorry to be obtuse. I think I had a backward mental picture of how this worked: like SERIALIZABLE did everything RC did, and then threw stuff away, or in any case did additional work to ensure a nearly-mathematical serializability.)

A

--
Andrew Sullivan | ajs@crankycanuck.ca
Andrew Sullivan <ajs@crankycanuck.ca> writes:
> Oh, I get it. This explains then why in principle READ COMMITTED
> oughta be faster in the absence of conflicts: additional snapshot
> checks are not needed?

To my mind it ought to be slower in principle: computing snapshots isn't free (though it's not tremendously expensive either, unless you have a huge number of active backends). The actual tuple visibility checks are going to be about the same speed either way, it's only a question of which snapshot you are using.

Anyway, this is probably all down in the noise compared to the details of what is happening on the application level. If you end up having to retry a lot of serialization failures, or if you use stronger locking to avoid such failures (thereby losing concurrency), you'll more than swamp out any possible engine-level difference. I suspect that something of the sort is responsible for JD's experiences.

			regards, tom lane