Re: BUG #18472: SELECT FOR UPDATE locking more rows than expected

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: BUG #18472: SELECT FOR UPDATE locking more rows than expected
Дата
Msg-id 2760373.1716044766@sss.pgh.pa.us
обсуждение исходный текст
Ответ на BUG #18472: SELECT FOR UPDATE locking more rows than expected  (PG Bug reporting form <noreply@postgresql.org>)
Список pgsql-bugs
PG Bug reporting form <noreply@postgresql.org> writes:
> And then retry I can see that I get the expected 5 rows back in the query
> from tab 2 leading me to believe that when both the LIMIT and ORDER BY are
> present postgres is locking more rows than I believe should be necessary
> based on the documentation that I've read and experiments that I've done.

This is not a bug, because there is no guarantee that SELECT FOR
UPDATE will lock only what you think is the minimum number of rows.

I realize that this is probably an oversimplified example of your real
problem, but in the given case the issue is use of the sub-select;
do you really need that?  EXPLAIN shows the plan as

 Limit
   ->  LockRows
         ->  Sort
               Sort Key: t1.b
               ->  Nested Loop
                     ->  Seq Scan on tab1 t1
                     ->  Subquery Scan on t2
                           ->  Limit
                                 ->  LockRows
                                       ->  Seq Scan on tab2
                                             Filter: (t1.a = a_ref)

The upper LockRows node should indeed not lock any rows that
aren't returned, but the trouble is the lower one, which will
probably end up locking every tab2 row that has a match in tab1.
(Remember the Sort will not be able to return any row until
it's run the Nested Loop to completion.)

The reason that's there is that the outer query interprets
application of FOR UPDATE to a sub-SELECT-in-FROM as being
a request to push down a FOR UPDATE into the sub-SELECT.
You could get rid of that by specifying just "FOR UPDATE OF t1"
in the outer query, but of course then you'd end with no lock
applied to the joined t2 row, so I don't know if that works
for you.

I think the answer could be to get rid of the inner LIMIT,
which is what's preventing optimization of the subquery.
I dislike that on semantic grounds anyway, because as this
is written it's totally undefined which tab2 rows get joined
to which tab1 rows.  Without that I get

explain (costs off)
SELECT *
FROM public.tab1 t1,
LATERAL (SELECT * 
FROM public.tab2
WHERE "a"="a_ref" 
) t2
ORDER BY b ASC
LIMIT 1
FOR UPDATE;
                     QUERY PLAN                      
-----------------------------------------------------
 Limit
   ->  LockRows
         ->  Sort
               Sort Key: t1.b
               ->  Merge Join
                     Merge Cond: (t1.a = tab2.a_ref)
                     ->  Sort
                           Sort Key: t1.a
                           ->  Seq Scan on tab1 t1
                     ->  Sort
                           Sort Key: tab2.a_ref
                           ->  Seq Scan on tab2
(12 rows)

which would behave a lot better in terms of locking
only the returned row(s).

            regards, tom lane



В списке pgsql-bugs по дате отправления:

Предыдущее
От: Dmitry Dolgov
Дата:
Сообщение: Re: BUG #18465: Wrong results from SELECT DISTINCT MIN in scalar subquery using HashAggregate
Следующее
От: Tom Lane
Дата:
Сообщение: Re: BUG #18465: Wrong results from SELECT DISTINCT MIN in scalar subquery using HashAggregate