Discussion: Re: atomic commit;begin for long running transactions , in combination with savepoint.

Re: atomic commit;begin for long running transactions , in combination with savepoint.

From:
Syan Tan
Date:
thanks. I'll try it out, but it sounds true enough; so there is no
isolation level where the transaction state keeps track of all the read versions
of any item read so far by the transaction?
 The main question is whether there's a way of avoiding implementing
either of those patterns of offline optimistic locking (version checking)
or offline pessimistic locking (lock table checking) when
you've got interactive client applications?

If there was an isolation level that kept track of the versions of items
a transaction has read, and there was a "commit with continue" command, then
you could avoid that extra work in clients - or maybe warping transactions
for this purpose makes it more difficult to write transactions for
the usual batch processing purposes.
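
For reference, the version-checking pattern being discussed usually looks something like the sketch below. The table t1(id, x, version) and the client-side values (x0, v0, x1) are made up purely for illustration:

-- Offline optimistic locking sketch; t1(id, x, version) is illustrative only.
-- Read in one short transaction and remember the version on the client:
SELECT x, version FROM t1 WHERE id = 2;      -- client keeps x0 and v0

-- ... the user edits the value, possibly much later, outside any transaction ...

-- Write in a second short transaction, guarded by the remembered version:
UPDATE t1 SET x = :x1, version = version + 1
 WHERE id = 2 AND version = :v0;
-- 0 rows affected means another session committed a change in between:
-- re-read the row, show the user the new value, and ask before overwriting.

This works at any isolation level, but it is exactly the per-application bookkeeping in question.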


On Mon Oct 15 5:28, "Trevor Talbot" sent:

>On 10/15/07, Syan Tan  wrote:
>
>> >In order to detect a change occurred, what you want is a SERIALIZABLE
>> >transaction: you want the update to fail if the row it matches is no
>> >longer the same version as your snapshot.  However, in order to read
>> >the new value to decide if you want to update it anyway, you need to
>> >leave your current snapshot.  As soon as you do that, ALL previously
>> >read values lose the update checks that snapshot provided you.
>>
>> you read the old value at the beginning of the transaction, and
>> you don't re-read it, assuming it is infrequently changed, so
>> if someone updates it concurrently, when you try to write, then
>> you detect the conflict, and rollback to the savepoint.
>
>Transactions don't operate based on what you've read.  "UPDATE ...
>WHERE ..." finds the row(s) to update using the WHERE clause right
>now, not based on any previously-read values.  It does not know what
>you've read before.
>
>The only difference is in the data you _can_ read.  For the
>SERIALIZABLE isolation level, that data was decided at the beginning
>of the transaction.  A row that was updated by another transaction
>will make the version that you can see effectively "read only", so
>when the UPDATE tries to change it, there will be a transactional
>conflict due to the isolation level.
>
>> You DONT want a serializable transaction, because then you can't read
>> the other committed value after rolling back to the savepoint.
>
>Correct.  But the READ COMMITTED isolation level does not limit what
>data you can see at the beginning of the transaction, so an UPDATE
>will always find the latest version of a row.  There is no conflict as
>far as the transaction is concerned.
>
>> >A way to do this using PostgreSQL's own row version data came up
>> >recently on this list.  Have a look at this post and the one following
>> >it:
>> >http://archives.postgresql.org/pgsql-general/2007-10/msg00503.php
>>
>> this is the same as using your own version ids and incrementing them
>> within the application, which would leave the xmin within postgresql's
>> domain, and would also mean the application's sql is not tied
>> to postgresql.
>
>Yes.  You were asking for a feature in PostgreSQL that doesn't match
>standard transaction semantics though, so I figured you wouldn't mind
>a PostgreSQL-specific option :)
>
>
>Assuming READ COMMITTED isolation level:
>
>> begin;
>> select x from t1 where id=2;
>> (store in variable x0 , display to user)
>> ...(much later)
>> (user changes stored x, at client, now x1)
>> savepoint a;
>>
>> answ = n;
>>
>> do:
>> try:
>>     update t1 set x=x1 where id = 2;
>>     commit-and-continue;
>> catch:
>>      rollback to savepoint a;
>>      select x from t1 where id=2 ( store as x2)
>>      if (x2 != x0) notify user "x has changed from x0 to x2, continue to write x1?"
>>      input answ;
>>
>> while answ ==y;
>>
>>
>> In the first pass of the loop, the transaction hasn't read x a second
>> time so the transaction state for x is at x0,
>
>Transaction state is not based on what you've read; it doesn't know.
>
>> if x has been changed by another transaction's commit, then the catch will
>> execute,
>
>The UPDATE will find the latest version of the row.  The change made
>by the other transaction is not a problem at this isolation level (you
>can see it), so the UPDATE will simply proceed and change it anyway.
>The catch block will never execute.
>
>> If the transaction was serializable, and another transaction has committed,
>> this would never work, because a commit would nullify the other
>> transaction's write without this transaction ever having seen the other
>> transaction's write, so this transaction would always be forced to roll back.
>
>Exactly.  But without SERIALIZABLE isolation, any operations you
>perform always see the other transaction's write, so there is never
>any conflict.
>
>Also keep in mind that MVCC is not the only way to implement
>transactions; pure locking is more common in other databases.  In the
>locking model, most transactions prevent others from writing until
>after they are finished.  Rows simply can't have different versions
>(and of course concurrent performance is awful).
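
To make the contrast above concrete, here is a sketch of two concurrent sessions against a made-up table t1(id, x). The values are illustrative; the behavior shown is what PostgreSQL does at each isolation level:

-- Session A                                    -- Session B
BEGIN;
SELECT x FROM t1 WHERE id = 2;   -- sees x0
                                                BEGIN;
                                                UPDATE t1 SET x = 99 WHERE id = 2;
                                                COMMIT;
SAVEPOINT a;
UPDATE t1 SET x = :x1 WHERE id = 2;
-- Under READ COMMITTED (the default) the UPDATE finds the latest committed
-- version of the row (x = 99) and overwrites it; the catch branch never runs.
COMMIT;

-- If session A instead starts with BEGIN ISOLATION LEVEL SERIALIZABLE,
-- its UPDATE fails with a serialization error
-- ("could not serialize access due to concurrent update"), and after
-- ROLLBACK TO SAVEPOINT a it still sees only its original snapshot,
-- so it cannot read the value session B committed.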




Re: atomic commit;begin for long running transactions , in combination with savepoint.

From:
"Trevor Talbot"
Date:
On 10/15/07, Syan Tan <kittylitter@people.net.au> wrote:
> thanks. I'll try it out, but it sounds true enough; so there is no
> isolation level where the transaction state keeps track of all the read versions
> of any item read so far by the transaction?

Right.  That would be a whole different set of semantics.

>  The main question is whether there's a way of avoiding implementing
> either of those patterns of offline optimistic locking (version checking)
> or offline pessimistic locking (lock table checking) when
> you've got interactive client applications?

Those are the only two methods I'm aware of.  If anyone else knows of
others, I'm sure they'll speak up.
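
For completeness, the pessimistic variant ("lock table checking") usually amounts to something like the sketch below; the edit_locks table and its columns are invented for illustration:

-- Offline pessimistic locking sketch; edit_locks is illustrative only.
-- Claim the row before letting the user start editing:
INSERT INTO edit_locks (table_name, row_id, owner, acquired_at)
VALUES ('t1', 2, :client_id, now());
-- a unique constraint on (table_name, row_id) makes a second claim fail

-- ... the user edits at leisure; other clients that check the lock table
-- (or hit the unique constraint) know the row is taken ...

UPDATE t1 SET x = :x1 WHERE id = 2;
DELETE FROM edit_locks
 WHERE table_name = 't1' AND row_id = 2 AND owner = :client_id;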

> If there was an isolation level that kept track of the versions of items
> a transaction has read, and there was a "commit with continue" command, then
> you could avoid that extra work in clients - or maybe warping transactions
> for this purpose makes it more difficult to write transactions for
> the usual batch processing purposes.

Transactions are governed by the SQL standard, so changing that would
not really be an option anyway.  Even so, trying to implement that
behavior (in any form) would get really complicated when you consider
situations other than single identifiable rows.

For instance, if you do "SELECT * FROM foo LIMIT 5" you essentially
get 5 arbitrary rows back.  If they don't actually have a primary key,
you can't identify them individually.  What would "previously read"
mean in such a situation?
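
A minimal illustration of that problem, with a hypothetical keyless table:

-- Hypothetical table with no primary key:
CREATE TABLE foo (val int);
SELECT * FROM foo LIMIT 5;   -- five arbitrary rows, in no particular order;
                             -- nothing identifies them, so there is no way to
                             -- tie a later write back to "the rows previously read".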

Or what about rows constructed via joins from other tables?  There is
more than one way to join data; if you read the same rows from the
underlying tables, but join them in a different way to get the final
result, does that count as being previously read or not?  What about
aggregate results; is that one previously read row, or all of the rows
that went into the aggregate?  What if all of this is happening
through a VIEW, and the application doesn't know it's coming from the
same tables underneath?

Using transactions as a unit of isolation works well because they
don't have any semantics that rely on specific rows; they cover what
data you are able to see and that's about it.  An application can
understand that a row is individually identifiable because that's how
it was designed, but in general terms there's no such guarantee.
Trying to define that kind of isolation level in the general case
would be really hard.