Discussion: Nested transactions

Nested transactions

From
Alvaro Herrera
Date:
Hackers,

I've been looking at what is involved in making nested transactions
possible.  So far I've identified the following things.  Please comment.
Additions are welcome.


Resource Management
-------------------

We will create and select a new memory context for each subtransaction.
This context will be deleted on transaction abort/commit.

Some At{Abort,Commit,EOX}act_*() actions will be executed at the end of the
subtransaction (indexes, GUC, Locks, Memory), while others will be delayed
(on_commit_actions, smgrDoPendingDeletes) until toplevel transaction commit.
Most of these routines will have to be revised so they do The Right Thing.


Locking
-------

The ProcReleaseLocks()/LockReleaseAll() interaction will have to be
modified to not just "release all locks" on abort.  It currently does
the right thing on commit, but the check for abort will have to be more
fine-grained.
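One way to make the abort path more fine-grained is sketched below, assuming (hypothetically) that each lock the backend holds records the nesting level of the (sub)transaction that acquired it.  On subtransaction abort only locks taken at that level or deeper are released; on subtransaction commit they are handed to the parent level, so they are kept until toplevel end as today.  HeldLock, sub_abort_release and sub_commit_transfer are illustrative names, not the lock manager's API, and the sketch does not model ProcReleaseLocks()/LockReleaseAll() themselves.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-backend record of a held lock (illustrative only). */
typedef struct {
    unsigned int lock_id;  /* which object is locked */
    int level;             /* nesting level that acquired it; 0 = toplevel */
    int held;              /* 1 while the lock is held */
} HeldLock;

/* Abort of the subtransaction at nesting `level`: release only the locks
 * acquired at that level or deeper.  Returns how many were released. */
static size_t sub_abort_release(HeldLock *locks, size_t n, int level)
{
    size_t released = 0;
    assert(level > 0);  /* toplevel abort can keep "release all" */
    for (size_t i = 0; i < n; i++) {
        if (locks[i].held && locks[i].level >= level) {
            locks[i].held = 0;  /* stand-in for the real release call */
            released++;
        }
    }
    return released;
}

/* Commit of the subtransaction at `level`: keep its locks, but make the
 * parent level their owner so a later parent abort releases them. */
static void sub_commit_transfer(HeldLock *locks, size_t n, int level)
{
    assert(level > 0);
    for (size_t i = 0; i < n; i++)
        if (locks[i].held && locks[i].level >= level)
            locks[i].level = level - 1;
}
```

Tagging by nesting level rather than by xid is just one choice; an xid tag would work the same way if the parent chain is available locally.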


Transaction State
-----------------

We will add a field to TransactionStateData with the xid of the parent
transaction.  If it's set to InvalidTransactionId, then the transaction is a
parent transaction [maybe we only need a boolean, since we can get the
parent transaction from the subtransaction tree anyway -- another idea
would be using an integer to denote nesting level, but how is that
useful?]

Whenever a transaction aborts, the backend is put into TRANS_ABORT state
only if it is a toplevel transaction.  If it is not, the backend is returned
to TRANS_INPROGRESS, the TransactionId goes back to the parent
(sub)transaction Id, and the pg_clog records the transaction as aborted.
The subtransaction tree may delete the corresponding branch.


Commit/abort protocol in pg_clog
--------------------------------

For a toplevel transaction, commit is:
- write 01 as state of xid
- write 01 as state of each non-aborted subtransaction

For a toplevel transaction, abort is:
- write 10 as state of xid
- write 10 as state of each non-aborted subtransaction

For a non-toplevel transaction, commit does nothing.

For a non-toplevel transaction, abort is:
- write 10 as state of xid
- write 10 as state of each non-aborted subtransaction.
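The four rules above can be sketched against a toy pg_clog, an array of two-bit states packed four per byte and all 00 at start.  The list of non-aborted subtransactions is passed in explicitly; the sketch assumes it covers descendants at any depth, since a subtransaction commit writes nothing and its own children are therefore still 00 when the toplevel ends.  clog_set, clog_get and end_transaction are hypothetical names, not the real clog interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Two-bit states used in the proposal. */
enum { XS_INPROGRESS = 0x0, XS_COMMITTED = 0x1, XS_ABORTED = 0x2 };

#define MAX_XID 1024
/* Toy stand-in for pg_clog: four 2-bit states per byte, all 00 initially. */
static uint8_t clog[MAX_XID / 4];

static void clog_set(uint32_t xid, int status)
{
    assert(xid < MAX_XID);
    int shift = (int)(xid % 4) * 2;
    clog[xid / 4] = (uint8_t)((clog[xid / 4] & ~(0x3 << shift)) | (status << shift));
}

static int clog_get(uint32_t xid)
{
    assert(xid < MAX_XID);
    return (clog[xid / 4] >> ((xid % 4) * 2)) & 0x3;
}

/* End a (sub)transaction following the protocol above.  `live_children`
 * lists its non-aborted subtransactions (assumed: at any depth). */
static void end_transaction(uint32_t xid, int toplevel, int commit,
                            const uint32_t *live_children, size_t nchildren)
{
    int status;
    if (commit) {
        if (!toplevel)
            return;  /* non-toplevel commit does nothing */
        status = XS_COMMITTED;
    } else {
        status = XS_ABORTED;  /* abort is the same at any level */
    }
    clog_set(xid, status);  /* own xid first, then the children */
    for (size_t i = 0; i < nchildren; i++)
        clog_set(live_children[i], status);
}
```

Because the toplevel xid is written before its children, a reader can briefly see a committed parent next to a child that still reads 00; the visibility rules below have to cope with that window.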


Tuple Visibility
----------------

We keep the xmin/xmax protocol for tuple visibility.  To determine the
state of a transaction, we determine if it's a toplevel transaction.  If it
is, report the pg_clog value directly.

For a non-toplevel transaction, the protocol is as follows:
- if the state is 01, it is committed
- if the state is 10, it is aborted
- if the state is 00:
  - if it's toplevel, it's in progress
  - if parent is 01, start again
  - if parent is 10, it is aborted
  - if parent is 00, check parent
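The check above can be sketched as a loop, reading "check parent" as moving one level up while ancestors are still 00, and "start again" as re-reading the original xid, on the assumption that a 01 ancestor means the toplevel commit is still writing its children's states.  The arrays clog_state and parent_of are toy stand-ins for pg_clog and the subtransaction tree, and 0 plays the role of InvalidTransactionId; xid_status is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

enum { XS_INPROGRESS = 0, XS_COMMITTED = 1, XS_ABORTED = 2 };
#define MAX_XID 1024
#define NO_PARENT 0u  /* plays the role of InvalidTransactionId */

static uint8_t clog_state[MAX_XID];  /* toy pg_clog, one byte per xid */
static uint32_t parent_of[MAX_XID];  /* toy subtransaction tree */

static int xid_status(uint32_t xid)
{
    assert(xid < MAX_XID);
    for (;;) {  /* "start again" jumps back here */
        int s = clog_state[xid];
        if (s != XS_INPROGRESS)
            return s;  /* 01 committed, 10 aborted */
        uint32_t anc = parent_of[xid];
        if (anc == NO_PARENT)
            return XS_INPROGRESS;  /* toplevel still at 00 */
        int as;
        /* Walk up while ancestors are 00 and not yet the toplevel. */
        while ((as = clog_state[anc]) == XS_INPROGRESS && parent_of[anc] != NO_PARENT)
            anc = parent_of[anc];
        if (as == XS_ABORTED)
            return XS_ABORTED;  /* some ancestor aborted */
        if (as == XS_INPROGRESS)
            return XS_INPROGRESS;  /* reached a toplevel still at 00 */
        /* An ancestor reads 01: its commit is still writing the child
         * states, so re-read this xid.  In a real system this window is
         * transient; in this toy model it would spin forever. */
    }
}
```

Only the 00 case ever touches the parent chain, so committed and aborted xids cost a single clog lookup as they do now.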
 


Subtransaction tree
-------------------

I don't know how this will be, but the subsystem will have to answer the
following questions:

- Xid of my parent transaction
- List of Xids of non-aborted child transactions

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"One must remember that existence in the cosmos, and particularly the
development of civilizations within it, are, unfortunately, anything
but idyllic" (Ijon Tichy)


Re: Nested transactions

From
Manfred Koizar
Date:
On Tue, 18 Mar 2003 00:20:40 -0400, Alvaro Herrera
<alvherre@dcc.uchile.cl> wrote:
>We will add a field to TransactionStateData with the xid of the parent
>transaction.  If it's set to InvalidTransactionId, then the transaction is a
>parent transaction

We need a stack of currently executing transactions.
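A minimal sketch of such a stack, assuming a per-backend array where BEGIN pushes an entry with a fresh xid and records the xid of the entry below it as parent, and COMMIT or ROLLBACK pops the innermost entry so the enclosing transaction becomes current again.  XactEntry, XactStack, xact_begin and xact_end are hypothetical names; the counter stands in for the real xid allocator, and pg_clog updates are left out.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_DEPTH 64

/* Hypothetical stack entry; real state would carry much more. */
typedef struct {
    uint32_t xid;
    uint32_t parent_xid;  /* 0 for the toplevel transaction */
} XactEntry;

typedef struct {
    XactEntry entries[MAX_DEPTH];
    int depth;            /* 0 = no transaction open */
    uint32_t next_xid;    /* toy xid counter */
} XactStack;

/* BEGIN: open a toplevel transaction or a subtransaction of the current one. */
static uint32_t xact_begin(XactStack *st)
{
    assert(st->depth < MAX_DEPTH);
    XactEntry *e = &st->entries[st->depth];
    e->xid = st->next_xid++;
    e->parent_xid = st->depth > 0 ? st->entries[st->depth - 1].xid : 0;
    st->depth++;
    return e->xid;
}

/* COMMIT or ROLLBACK of the innermost level: pop it and return the xid of
 * the enclosing transaction, which becomes current again (0 if none). */
static uint32_t xact_end(XactStack *st)
{
    assert(st->depth > 0);
    st->depth--;
    return st->depth > 0 ? st->entries[st->depth - 1].xid : 0;
}
```

With a stack like this the parent of the current subtransaction is simply the entry below it, which is enough locally; other backends still need a shared way to find it.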

>Whenever a transaction aborts, the backend is put into TRANS_ABORT state
>only if it is a toplevel transaction.  If it is not, the backend is returned
>to TRANS_INPROGRESS, the TransactionId goes back to the parent
>(sub)transaction Id, and the pg_clog records the transaction as aborted.
>The subtransaction tree may delete the corresponding branch.

First we have to work on the semantics.  Immediately aborting the
subtransaction is not consistent.  Better stay in the subtransaction
and mark it with a new SUBTRANS_ABORT state (or just TRANS_ABORT
because we know we are in a subtransaction).

On ROLLBACK: end the current subtransaction, pop it from the stack,
restore state of the enclosing transaction.

On COMMIT: end the current subtransaction (marking it as aborted in
pg_clog), pop it from the stack, set the enclosing transaction to
(SUB)TRANS_ABORT.

On BEGIN: start a new subtransaction, mark it as SUBTRANS_ABORT!  This
may sound weird, but I think we need it to make nested transactions
useful for scripts and functions.

Possible micro optimisation: don't assign a new xid, but keep track of
nesting level.

Any other command is ignored.

Do we need new convenience commands ROLLBACK ALL and/or COMMIT ALL?


>Commit/abort protocol in pg_clog [...]
>Tuple Visibility [...]

I had the strange feeling that this has already been done in more
detail and was almost going to refer you to
http://archives.postgresql.org/pgsql-hackers/2002-11/msg01124.php,
but checking the link I realised that this one message is truncated
near the beginning.  The rest of the thread is ok but the discussion
looks a bit out of context :-(

So I repost that message here, although it doesn't fully reflect my
current opinions.  (I'll try to integrate objections and suggestions
into a new proposal and post it tomorrow.)

On Fri, 29 Nov 2002 18:03:56 +0100, I wrote:
|On Thu, 28 Nov 2002 12:59:21 -0500 (EST), Bruce Momjian
|<pgman@candle.pha.pa.us> wrote:
|>Yes, locking is one possible solution, but no one likes that.  One hack
|>lock idea would be to create a subtransaction-only lock, [...]
|>
|>> [...] without
|>> having to touch the xids in the tuple headers.
|>
|>Yes, you could do that, but we can easily just set the clog bits
|>atomically,
|
|From what I read above I don't think we can *easily* set more than one
|transaction's bits atomically.
|
|> and it will not be needed --- the tuple bits really don't
|>help us, I think.
|
|Yes, this is what I said, or at least tried to say.  I just wanted to
|make clear how this new approach (use the fourth status) differs from
|older proposals (replace subtransaction ids in tuple headers).
|
|>OK, we put it in a file.  And how do we efficiently clean it up?
|>Remember, it is only to be used for a _brief_ period of time.  I think a
|>file system solution is doable if we can figure out a way not to create
|>a file for every xid.
|
|I don't want to create one file for every transaction, but rather a
|huge (sparse) array of parent xids.  This array is divided into
|manageable chunks, represented by files, "pg_subtrans_NNNN".  These
|files are only created when necessary.  At any time only a tiny part
|of the whole array is kept in shared buffers.  This concept is similar
|or almost equal to pg_clog, which is an array of doublebits.
|
|>Maybe we write the xid's to a file in a special directory in sorted
|>order, and backends can do a btree search of each file in that directory
|>looking for the xid, and then knowing the master xid, look up that
|>status, and once all the children xid's are updated, you delete the
|>file.
|
|Yes, dense arrays or btrees are other possible implementations.  But
|for simplicity I'd do it pg_clog style.
|
|>Yes, but again, the xid status of subtransactions is only updated just
|>before commit of the main transaction, so there is little value to
|>having those visible.
|
|Having them visible solves the atomicity problem without requiring
|long locks.  Updating the status of a single (main or sub) transaction
|is atomic, just like it is now.
|
|Here is what is to be done for some operations:
|
|BEGIN main transaction:
|    Get a new xid (no change to current behaviour).
|    pg_clog[xid] is still 00, meaning active.
|    pg_subtrans[xid] is still 0, meaning no parent.
|
|BEGIN subtransaction:
|    Push current transaction info onto local stack.
|    Get a new xid.
|    Record parent xid in pg_subtrans[xid].
|    pg_clog[xid] is still 00.
|
|ROLLBACK subtransaction:
|    Set pg_clog[xid] to 10 (aborted).
|    Optionally set clog bits for subsubtransactions to 10.
|    Pop transaction info from stack.
|
|COMMIT subtransaction:
|    Set pg_clog[xid] to 11 (committed subtrans).
|    Don't touch clog bits for subsubtransactions!
|    Pop transaction info from stack.
|
|ROLLBACK main transaction:
|    Set pg_clog[xid] to 10 (aborted).
|    Optionally set clog bits for subtransactions to 10.
|    
|COMMIT main transaction:
|    Set pg_clog[xid] to 01 (committed).
|    Optionally set clog bits for subtransactions from 11 to 01.
|    Don't touch clog bits for aborted subtransactions!
|
|Visibility check by other transactions:  If a tuple is visited and its
|XMIN/XMAX_IS_COMMITTED/ABORTED flags are not yet set, pg_clog has to
|be consulted to find out the status of the inserting/deleting
|transaction xid.  If pg_clog[xid] is ...
|
|    00:  transaction still active
|
|    10:  aborted
|
|    01:  committed
|
|    11:  committed subtransaction, have to check parent
|
|Only in this last case do we have to get parentxid from pg_subtrans.
|Now we look at pg_clog[parentxid].  If we find ...
|
|    00:  parent still active, so xid is considered active, too
|
|    10:  parent aborted, so xid is considered aborted,
|         optionally set pg_clog[xid] = 10
|
|    01:  parent committed, so xid is considered committed,
|         optionally set pg_clog[xid] = 01
|
|    11:  recursively check grandparent(s) ...
|
|For brevity the following operations are not covered in detail:
|. Visibility checks for tuples inserted/deleted by a (sub)transaction
|belonging to the current transaction tree (have to check local
|transaction stack whenever we look at a xid or switch to a parent xid)
|. HeapTupleSatisfiesUpdate (sometimes has to wait for parent
|transaction)
|
|The trick here is, that subtransaction status is immediately updated
|in pg_clog on commit/abort.  Main transaction commit is atomic (just
|set its commit bit).  Status 11 is short-lived, it is replaced with
|the final status by one or more of
|
|    - COMMIT/ROLLBACK of the main transaction
|    - a later visibility check (as a side effect)
|    - VACUUM
|
|pg_subtrans cleanup:  A pg_subtrans_NNNN file covers a known range of
|transaction ids.  As soon as none of these transactions has a pg_clog
|status of 11, the pg_subtrans_NNNN file can be removed.  VACUUM can do
|this, and it won't even have to check the heap.
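The operations and the visibility check in the quoted message can be sketched with two toy arrays: clog (one byte per xid holding 00, 01, 10 or 11) and subtrans (parent xid, 0 meaning none).  The sketch implements the optional "replace 11 with the final state" side effect inside the visibility check and the VACUUM-style test for dropping a pg_subtrans segment; it skips the optional child updates on rollback and the checks against the local transaction stack.  All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Four clog states from the 2002 proposal. */
enum { ST_ACTIVE = 0x0, ST_COMMITTED = 0x1, ST_ABORTED = 0x2, ST_SUBCOMMITTED = 0x3 };

#define MAX_XID 1024
static uint8_t clog[MAX_XID];       /* toy pg_clog, one byte per xid */
static uint32_t subtrans[MAX_XID];  /* toy pg_subtrans: parent xid, 0 = none */

static void begin_sub(uint32_t xid, uint32_t parent) { subtrans[xid] = parent; }
static void commit_sub(uint32_t xid)    { clog[xid] = ST_SUBCOMMITTED; }
static void rollback_xact(uint32_t xid) { clog[xid] = ST_ABORTED; }  /* main or sub */
static void commit_main(uint32_t xid)   { clog[xid] = ST_COMMITTED; }

/* Visibility check by another backend: only status 11 needs pg_subtrans. */
static int xid_status(uint32_t xid)
{
    assert(xid < MAX_XID);
    uint32_t cur = xid;
    while (clog[cur] == ST_SUBCOMMITTED) {  /* 11: look at the parent */
        cur = subtrans[cur];
        assert(cur != 0);  /* a status-11 xid always has a parent */
    }
    int result = clog[cur];  /* 00 active, 01 committed or 10 aborted */
    /* Optional side effect: once the outcome is final, overwrite the
     * short-lived 11s on the path; "active" is left untouched. */
    if (result != ST_ACTIVE)
        for (uint32_t x = xid; clog[x] == ST_SUBCOMMITTED; x = subtrans[x])
            clog[x] = (uint8_t)result;
    return result;
}

/* pg_subtrans cleanup: a segment covering [first, first + count) can be
 * removed once none of its xids still has clog status 11. */
static int subtrans_segment_removable(uint32_t first, uint32_t count)
{
    assert(first + count <= MAX_XID);
    for (uint32_t x = first; x < first + count; x++)
        if (clog[x] == ST_SUBCOMMITTED)
            return 0;
    return 1;
}
```

Main transaction commit stays a single write of 01, which is where the atomicity comes from; everything else is cleanup that readers or VACUUM can do lazily.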

>I don't know how this will be, but the subsystem will have to answer the
>following questions:
>
>- Xid of my parent transaction

Yes, and not only *my* but *any* transaction.  So this has to be
globally visible.

>- List of Xids of non-aborted child transactions

Only if we do "parent commit updates all child states" as suggested by
Tom in
http://archives.postgresql.org/pgsql-hackers/2002-11/msg01129.php.
This can be in a data structure that's private to the transaction.

Servus
 Manfred


Re: Nested transactions

From
Tom Lane
Date:
Manfred Koizar <mkoi-pg@aon.at> writes:
> On COMMIT: end the current subtransaction (marking it as aborted in
> pg_clog), pop it from the stack, set the enclosing transaction to
> (SUB)TRANS_ABORT.

Surely not.  The outer transaction must remain alive, else there's no
point in the whole thing.
        regards, tom lane


Re: Nested transactions

From
Alvaro Herrera
Date:
On Tue, Mar 18, 2003 at 10:32:58AM +0100, Manfred Koizar wrote:
> On Tue, 18 Mar 2003 00:20:40 -0400, Alvaro Herrera
> <alvherre@dcc.uchile.cl> wrote:
> >We will add a field to TransactionStateData with the xid of the parent
> >transaction.  If it's set to InvalidTransactionId, then the transaction is a
> >parent transaction
> 
> We need a stack of currently executing transactions.

Oh, sure.

> >Whenever a transaction aborts, the backend is put into TRANS_ABORT state
> >only if it is a toplevel transaction.  If it is not, the backend is returned
> >to TRANS_INPROGRESS, the TransactionId goes back to the parent
> >(sub)transaction Id, and the pg_clog records the transaction as aborted.
> >The subtransaction tree may delete the corresponding branch.
> 
> First we have to work on the semantics.  Immediately aborting the
> subtransaction is not consistent.  Better stay in the subtransaction
> and mark it with a new SUBTRANS_ABORT state (or just TRANS_ABORT
> because we know we are in a subtransaction).

That's what TBLOCK_ABORT state is (note that it's different from
TRANS_ABORT).  On error, we go into TBLOCK_ABORT state.  On transaction
end, we go from there to TBLOCK_ENDABORT. (And TRANS_ABORT, finish
processing the transaction and get the parent transaction from the
stack.)

If we are in TBLOCK_ABORT state and are issued a BEGIN, a new
subtransaction starts and is put in TBLOCK_ABORT state too.  (I think this
is what you intended to say?)  We cannot ignore the BEGIN, and while we
can start with TBLOCK_INPROGRESS state it's only a loss of information,
because at transaction end we can register "aborted" in pg_clog
immediately.


> Possible micro optimisation: don't assign a new xid, but keep track of
> nesting level.

And how do you register things in pg_clog?  I don't think it's optional.


> Do we need new convenience commands ROLLBACK ALL and/or COMMIT ALL?

Probably, but that won't be until the basic machinery is working (same
for SAVEPOINTs: they will create a named transaction, and rollback to
it will work by looking at the transaction stack).


> >Commit/abort protocol in pg_clog [...]
> >Tuple Visibility [...]
> 
> I had the strange feeling that this has already been done in more
> detail and was almost going to refer you to
> http://archives.postgresql.org/pgsql-hackers/2002-11/msg01124.php,
> but checking the link I realised that this one message is truncated
> near the beginning.  The rest of the thread is ok but the discussion
> looks a bit out of context :-(

I read the whole thread from Google.  I borrowed Tom's proposal (in
.../2002-11/msg01129.php IIRC); it seems the simplest correct mechanism.

> |The trick here is, that subtransaction status is immediately updated
> |in pg_clog on commit/abort.  Main transaction commit is atomic (just
> |set its commit bit).  Status 11 is short-lived, it is replaced with
> |the final status by one or more of
> |
> |    - COMMIT/ROLLBACK of the main transaction
> |    - a later visibility check (as a side effect)
> |    - VACUUM

I think we obtain the same benefits without using the 11 state.  The
only downside is that non-aborted subtransactions will have their status
updated only at toplevel transaction commit.  We have two ways of doing
this:
1. create a new clog function that allows a list of xid's to be marked
committed (pg_clog lock is held longer),

2. call TransactionIdSetStatus multiple times (the lock is taken several
times, each for a short time).  I think this will create more contention at
toplevel transaction commit, because when checking for a subtransaction
whose toplevel xact is just committing, the caller will have to obtain
the lock multiple times to get the final status (one for the subtrans,
another for the parent, another for the subtrans again).
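The difference between the two options is mostly how often the clog lock is taken.  Below is a minimal sketch that counts acquisitions, using a C11 atomic_flag spinlock as a stand-in for the real pg_clog lock; set_status_list (option 1) and set_status_one (option 2) are hypothetical helpers, not existing clog functions, and the sketch does not model readers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_XID 1024
enum { XS_INPROGRESS = 0, XS_COMMITTED = 1, XS_ABORTED = 2 };

static uint8_t clog_state[MAX_XID];               /* toy pg_clog */
static atomic_flag clog_lock = ATOMIC_FLAG_INIT;  /* stand-in for the clog lock */
static unsigned lock_acquisitions;                /* updated only under the lock */

static void clog_lock_acquire(void)
{
    while (atomic_flag_test_and_set(&clog_lock))
        ;  /* spin until the current holder clears the flag */
    lock_acquisitions++;
}

static void clog_lock_release(void) { atomic_flag_clear(&clog_lock); }

/* Option 1: one (longer) lock hold covers the toplevel xid and all the
 * non-aborted subtransactions, so readers never see a partial update. */
static void set_status_list(const uint32_t *xids, size_t n, int status)
{
    clog_lock_acquire();
    for (size_t i = 0; i < n; i++) {
        assert(xids[i] < MAX_XID);
        clog_state[xids[i]] = (uint8_t)status;
    }
    clog_lock_release();
}

/* Option 2: one short lock hold per xid; a reader can slip in between. */
static void set_status_one(uint32_t xid, int status)
{
    assert(xid < MAX_XID);
    clog_lock_acquire();
    clog_state[xid] = (uint8_t)status;
    clog_lock_release();
}
```

Committing a toplevel xid with two subtransactions takes the lock once under option 1 and three times under option 2, before counting the extra lookups a concurrent reader may need.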


> >I don't know how this will be, but the subsystem will have to answer the
> >following questions:
> >
> >- Xid of my parent transaction
> 
> Yes, and not only *my* but *any* transaction.  So this has to be
> globally visible.

Oh, sure.  It's the only reason why we need pg_subtrans to be shared,
AFAIU; else we could save this in local memory and be done with it.

> >- List of Xids of non-aborted child transactions
> 
> Only if we do "parent commit updates all child states" as suggested by
> Tom in
> http://archives.postgresql.org/pgsql-hackers/2002-11/msg01129.php.

Yes, I think this is the best proposal so far.

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
I'm going to wipe out all the humans / the humans I will wipe out
I'm going to wipe them all out / all the humans I will wipe out (Bender)