Обсуждение: XactLockTableWait doesn't set wait_event correctly

Поиск
Список
Период
Сортировка

XactLockTableWait doesn't set wait_event correctly

От
Simon Riggs
Дата:
New (9.4) XactLockTableWait() sets the reason for the wait, so that if
we emit a log message then it will produce a log message to say its
not waiting on a transaction, its waiting on a lock.

Nice. log_lock_waits works great.

New (9.6) wait_event code is not correctly set. In ProcSleep() we set
the wait reason according to the actual lock tag, effectively ignoring
the information specifically provided by XactLockTableWait().

Ugh.

pg_stat_activity shows a wait_event of 'transactionid' rather than 'tuple'
pg_locks shows a transactionid lock rather than a wait for tuple lock

That looks like a bug since we get different answers from
log_lock_wait and wait_event, which is annoying and confusing.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: XactLockTableWait doesn't set wait_event correctly

От
Amit Kapila
Дата:
On Tue, Nov 29, 2016 at 4:56 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> New (9.4) XactLockTableWait() sets the reason for the wait, so that if
> we emit a log message then it will produce a log message to say its
> not waiting on a transaction, its waiting on a lock.
>
> Nice. log_lock_waits works great.
>
> New (9.6) wait_event code is not correctly set. In ProcSleep() we set
> the wait reason according to the actual lock tag, effectively ignoring
> the information specifically provided by XactLockTableWait().
>

The information provided by XactLockTableWait() helps to display the
more information via context, but the message still suggests
transaction (refer LOG line in below message):

LOG:  process 6460 still waiting for ShareLock on transaction 42960
after 41822.775 ms
DETAIL:  Process holding the lock: 5704. Wait queue: 6460.
CONTEXT:  while updating tuple (0,1) in relation "t1"
STATEMENT:  update t1 set c1=3 where c1=1;

So I am not sure if displaying tuple in pg_stat_activity is better
than displaying transactionid and how will we distinguish it when some
process is actually waiting on tuple lock?  The process waiting on
tuple lock displays log message as below:

LOG:  process 648 still waiting for ExclusiveLock on tuple (0,1) of
relation 137344 of database 12245 after 1045.220 ms
DETAIL:  Process holding the lock: 6460. Wait queue: 648.
STATEMENT:  update t1 set c1=4 where c1=1;

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: XactLockTableWait doesn't set wait_event correctly

От
Robert Haas
Дата:
On Tue, Nov 29, 2016 at 6:26 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> New (9.4) XactLockTableWait() sets the reason for the wait, so that if
> we emit a log message then it will produce a log message to say its
> not waiting on a transaction, its waiting on a lock.
>
> Nice. log_lock_waits works great.
>
> New (9.6) wait_event code is not correctly set. In ProcSleep() we set
> the wait reason according to the actual lock tag, effectively ignoring
> the information specifically provided by XactLockTableWait().
>
> Ugh.
>
> pg_stat_activity shows a wait_event of 'transactionid' rather than 'tuple'
> pg_locks shows a transactionid lock rather than a wait for tuple lock
>
> That looks like a bug since we get different answers from
> log_lock_wait and wait_event, which is annoying and confusing.

What you find annoying or confusing is up to you, but it's not a bug,
and it's not particularly inconsistent, either.  log_lock_waits
reports a transaction lock, pg_stat_activity reports a transactionid
lock, and pg_locks reports a transactionid lock.  Now, it is true *in
addition* to reporting that the lock in question is a transaction
lock, log_lock_waits also reports some context information:

2016-11-29 13:49:59.702 EST [33658] LOG:  process 33658 still waiting
for ShareLock on transaction 2997486 after 1000.748 ms
2016-11-29 13:49:59.702 EST [33658] DETAIL:  Process holding the lock:
33653. Wait queue: 33658.
2016-11-29 13:49:59.702 EST [33658] CONTEXT:  while updating tuple
(0,67) in relation "pgbench_accounts"
2016-11-29 13:49:59.702 EST [33658] STATEMENT:  update
pgbench_accounts set filler = 'b' where aid = 1;

...but right there on the first line, it clearly says "waiting for
ShareLock on transaction 2997".

I'm not totally unsympathetic to the argument that perhaps
XactLockTableWait() ought to pass down the context information through
LockAcquire() so that it reaches ProcSleep() and we can set the wait
event to something like Tuple/Update instead of Lock/Transaction.  But
I don't think it's a slam-dunk, either.  If we did that, it would be
consistent with the additional detail provided by log_lock_waits
rather than the primary message, and it would be inconsistent with
pg_locks.  Also, I think that the reason why we originally used an
error-context callback here rather than passing down some additional
detail into LockAcquire() is that the breaking the abstraction layer
seemed messy and unappealing.  IIRC, Tom, in particular, was concerned
about that last point.

But feel free to propose a patch...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: XactLockTableWait doesn't set wait_event correctly

От
Simon Riggs
Дата:
On 29 November 2016 at 19:03, Amit Kapila <amit.kapila16@gmail.com> wrote:

> how will we distinguish it when some
> process is actually waiting on tuple lock?

The point is that both those actions are waiting for a tuple lock.

Obtaining a tuple lock requires two separate actions: First we do
LockTuple() and then we do XactLockTableWait().

So log_lock_wait output has two separate log entries for the same
thing, which is inconsistent. (One mentions the relation name, the
other mentions the relation id).

(Note that I'm not saying that all callers of XactLockTableWait are
tuple lockers; if they were it would be less confusing).

But at least that info is visible. log_lock_waits output allows you to
see that a XactLockTableWait is actually held for a tuple. There is no
way to do that for pg_stat_activity or pg_locks output, which is
inconsistent.

I'm not worried about abstraction layers, I'd just like to get useful
information out of the server to diagnose locking issues. Right now,
nobody can do that.

My proposal to resolve this is...

1. Make log_lock_wait output the same for both cases... following this format
LOG:  process 648 still waiting for ExclusiveLock on tuple (0,1) of
relation 137344 of database 12245 after 1045.220 ms
DETAIL:  Process holding the lock: 6460. Wait queue: 648.
STATEMENT:  update t1 set c1=4 where c1=1;
Nobody will miss the other format, since the above format has all the
same information.

2. Set wait_event_type = tuple when we wait during XactLockTableWait.
We need the reason info, not the actual wait info, since this is for
users not for our own diagnostics. This isn't very important, since
wait_event_type doesn't include details like which tuple or relation
caused the wait.

3. pg_locks output can't fit both locktag and reason info inside the
LOCKTAG struct, so we'd need to do something like store the reason
info in a separate hash table, so it can be used to explain a
transaction lock entry. I'm sure that will raise an objection, so
we'll need something like a view called pg_lock_wait_reason. Better
suggestions welcome.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: XactLockTableWait doesn't set wait_event correctly

От
Robert Haas
Дата:
On Wed, Nov 30, 2016 at 6:50 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Obtaining a tuple lock requires two separate actions: First we do
> LockTuple() and then we do XactLockTableWait().

I think that's kind of a confusing way of looking at it.  LockTuple()
waits for a "tuple" lmgr lock, and XactLockTableWait waits for a
"transaction" lmgr lock.  Those two things are both part of a larger
protocol for managing access to what we refer to as tuple locks at the
SQL level.  I don't think conflating those things would be a very good
idea, because it's useful to know which phase you're currently doing -
e.g. anybody waiting on a tuple lock is not first in the queue.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: XactLockTableWait doesn't set wait_event correctly

От
Simon Riggs
Дата:
On 2 December 2016 at 00:28, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Nov 30, 2016 at 6:50 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Obtaining a tuple lock requires two separate actions: First we do
>> LockTuple() and then we do XactLockTableWait().
>
> I think that's kind of a confusing way of looking at it.  LockTuple()
> waits for a "tuple" lmgr lock, and XactLockTableWait waits for a
> "transaction" lmgr lock.  Those two things are both part of a larger
> protocol for managing access to what we refer to as tuple locks at the
> SQL level.  I don't think conflating those things would be a very good
> idea, because it's useful to know which phase you're currently doing -
> e.g. anybody waiting on a tuple lock is not first in the queue.

Why is it useful to know which phase you're at? What can I do with that info?

Why is knowing that you're doing a "transaction lock" more important
than the fact you're doing a tuple lock on a particular db, relation,
page and tuple? The "transaction lock" tells you nothing a user can
act upon to improve the situation.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: XactLockTableWait doesn't set wait_event correctly

От
Robert Haas
Дата:
On Thu, Dec 1, 2016 at 7:35 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 2 December 2016 at 00:28, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Nov 30, 2016 at 6:50 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> Obtaining a tuple lock requires two separate actions: First we do
>>> LockTuple() and then we do XactLockTableWait().
>>
>> I think that's kind of a confusing way of looking at it.  LockTuple()
>> waits for a "tuple" lmgr lock, and XactLockTableWait waits for a
>> "transaction" lmgr lock.  Those two things are both part of a larger
>> protocol for managing access to what we refer to as tuple locks at the
>> SQL level.  I don't think conflating those things would be a very good
>> idea, because it's useful to know which phase you're currently doing -
>> e.g. anybody waiting on a tuple lock is not first in the queue.
>
> Why is it useful to know which phase you're at? What can I do with that info?

Well, I just mentioned one thing you can do with it...

> Why is knowing that you're doing a "transaction lock" more important
> than the fact you're doing a tuple lock on a particular db, relation,
> page and tuple? The "transaction lock" tells you nothing a user can
> act upon to improve the situation.

I think the charter of pg_locks is to provide visibility into what's
actually going on in the lock table.  It could be useful to provide
additional supplementary information as well, if we can do that in a
reasonable way, but I think trying to paper over the fact that
internally it's a transaction lock probably isn't a good idea.  For
example, suppose next week somebody decides to replace the "waiting"
column in pg_locks with a column indicating the position in the lock
queue, with NULL indicating a granted lock.   Well, if transaction
locks look like tuple locks, the person writing that patch will find
it impossible to display something sane in those cases, because the
transaction lock queue is separate from the tuple lock queue and you
can't somehow glue those together without distorting what the lock
manager is actually doing.  That kind of thing is the natural
consequence of displaying something other than what the system is
actually doing internally.  But I have no objection to you adding
supplementary detail also.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company