Discussion: ERROR: tuple concurrently updated


ERROR: tuple concurrently updated

From: Stephen Frost
Date:

Greetings,
Subject pretty much says it all.  I've put up with this error in the
past when it has caused me trouble but it's now starting to hit our
clients on occasion, which is just unacceptable.
 
The way I've seen it happen (and this is just empirical, so I'm not
sure that it's exactly right) is something like this:

 - Run with pg_autovacuum on the system
 - Run a long-running PL/PgSQL function which creates tables
 - Wait for some sort of overlap, and the PL/PgSQL function dies with
   the above error.
 
 I've also seen it happen when I've got a long-running PL/PgSQL function going and I'm creating tables in another
back-end.
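
For concreteness, a stripped-down sketch of the kind of function I
mean (all table and function names here are made up):

    -- Illustrative only: a long-running PL/PgSQL function that keeps
    -- creating tables while pg_autovacuum is active in the background.
    CREATE OR REPLACE FUNCTION build_work_tables() RETURNS void AS $$
    BEGIN
        FOR i IN 1..100 LOOP
            EXECUTE 'CREATE TABLE work_' || i || ' (id integer, val text)';
            -- ... lengthy per-table processing here, giving autovacuum
            -- plenty of time to overlap with us ...
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;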
From a prior discussion I *think* the issue is the lack of
versioning/visibility information in the SysCache, which means that if
the long-running function attempts to look up data about a table which
was created *after* the long-running function started, but which was
put into the common SysCache by another backend, the long-running
function gets hit with the 'tuple concurrently updated' error and ends
up failing and being rolled back.
 
If this is correct then the solution seems to be either add versioning
to the SysCache data, or have an overall 'this SysCache is only good
for data past transaction X' marker, so that a backend which is prior
to that version could just accept that it can't use the SysCache and
fall back to accessing the data directly (if that's possible).  I'm not
very familiar with the way the SysCache system is envisioned, but I'm
not a terrible programmer (imv anyway) and, given some direction on the
correct approach to solving this problem, I'd be happy to spend some
time working on it.  I'd *really* like to see this error just go away
completely for all non-broken use-cases.
 
     Thanks,
    Stephen

Re: ERROR: tuple concurrently updated

From: Tom Lane
Date:

Stephen Frost <sfrost@snowman.net> writes:
>   Subject pretty much says it all.  I've put up with this error in the
>   past when it has caused me trouble but it's now starting to hit our
>   clients on occasion, which is just unacceptable.

Have you tracked down the exact scenario making it happen?

>   If this is correct then the solution seems to be either add versioning
>   to the SysCache data,

You have provided no evidence that that would fix anything at all.
To my mind a "concurrently updated" failure is more likely to mean
insufficient locking around update operations.
        regards, tom lane


Re: ERROR: tuple concurrently updated

From: Stephen Frost
Date:

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> >   Subject pretty much says it all.  I've put up with this error in the
> >   past when it has caused me trouble but it's now starting to hit our
> >   clients on occasion, which is just unacceptable.
>
> Have you tracked down the exact scenario making it happen?

I think I might have confused two different issues, sorry. :/

> >   If this is correct then the solution seems to be either add versioning
> >   to the SysCache data,
>
> You have provided no evidence that that would fix anything at all.
> To my mind a "concurrently updated" failure is more likely to mean
> insufficient locking around update operations.

I haven't built a reliable test case yet, but I *think* the tuple
concurrently updated problem is with an analyze being run inside of a
function and also being run by autovacuum.  The SysCache stuff I was
thinking about previously was actually for another problem that I hadn't
seen in a long time (because I hadn't been doing a particular set of
operations, not because it's that difficult to have happen) but just ran
into again today:

ERROR:  cache lookup failed for relation ...
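
The analyze collision, as far as I can tell, needs nothing more exotic
than this sort of pattern (table names invented for the example):

    -- Illustrative: an ANALYZE issued from inside a function can race
    -- an autovacuum-issued ANALYZE of the same table; both try to
    -- update the table's pg_statistic/pg_class entries, and the loser
    -- reports "ERROR: tuple concurrently updated".
    CREATE OR REPLACE FUNCTION reload_and_analyze() RETURNS void AS $$
    BEGIN
        TRUNCATE staging_data;
        INSERT INTO staging_data SELECT * FROM source_data;
        ANALYZE staging_data;   -- can collide with autovacuum here
    END;
    $$ LANGUAGE plpgsql;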

I first started seeing this happen, iirc, when I created a function
which went against pg_class/pg_attribute/pg_type and used
pg_table_is_visible().  This function (and another which uses pg_class,
pg_constraint and pg_attribute) gets used over and over again from
another PL/pgSQL function.  Basically we are walking through a list of
tables, pulling the column names and primary keys, and then running some
checks on the tables.  Anyhow, while this is running (and it takes
upwards of half an hour to run), other activity on the database (this
time it was creating a view in a separate, completely unrelated schema)
can cause the lookup failure, which kills the long-running function
(which gets to be very frustrating...).
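
The lookup itself is nothing fancy; simplified, it's along these lines
(table name invented):

    -- Simplified form of the catalog lookup: fetch the column names of
    -- a visible table.  While a long-running transaction repeats this,
    -- DDL in other backends can leave it holding a stale OID, and the
    -- result is "ERROR: cache lookup failed for relation ...".
    SELECT a.attname
      FROM pg_class c
      JOIN pg_attribute a ON a.attrelid = c.oid
     WHERE c.relname = 'some_table'
       AND a.attnum > 0
       AND NOT a.attisdropped
       AND pg_table_is_visible(c.oid);
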
Thanks,
    Stephen

Re: ERROR: tuple concurrently updated

From: Tom Lane
Date:

Stephen Frost <sfrost@snowman.net> writes:
> I haven't built a reliable test case yet, but I *think* the tuple
> concurrently updated problem is with an analyze being run inside of a
> function and also being run by autovacuum.

If so it should be fixed as of 8.2 --- I believe we changed the locking
rules to ensure only one ANALYZE at a time for any one table.

Conflicts from concurrent ANALYZEs are the only cases I've heard of
before that make this error occur in the field, but I suppose it would
be possible to get it from other things such as concurrently trying to
CREATE OR REPLACE the same function.
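
For example, two sessions doing this at about the same time (function
name arbitrary):

    -- Session A:
    CREATE OR REPLACE FUNCTION f() RETURNS int
        AS 'SELECT 1' LANGUAGE sql;

    -- Session B, concurrently:
    CREATE OR REPLACE FUNCTION f() RETURNS int
        AS 'SELECT 2' LANGUAGE sql;

    -- Both sessions try to update the same pg_proc row; the loser
    -- can report "ERROR: tuple concurrently updated".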

> The SysCache stuff I was
> thinking about previously was actually for another problem that I hadn't
> seen in a long time (because I hadn't been doing a particular set of
> operations, not because it's that difficult to have happen) but just ran
> into again today:
> ERROR:  cache lookup failed for relation ...

I think we've got a solution for that in 8.2, also --- at least, the
only common case I know of should be fixed, namely where a RENAME or
similar has caused the same table name to be assigned to a new OID.
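
That is, a sequence along these lines (names arbitrary):

    ALTER TABLE t RENAME TO t_old;
    CREATE TABLE t (id integer);   -- same name, brand-new OID
    -- A backend still holding the old name-to-OID mapping can then
    -- fail with "cache lookup failed for relation ...".
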
        regards, tom lane


Re: ERROR: tuple concurrently updated

From: Stephen Frost
Date:

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > I haven't built a reliable test case yet, but I *think* the tuple
> > concurrently updated problem is with an analyze being run inside of a
> > function and also being run by autovacuum.
>
> If so it should be fixed as of 8.2 --- I believe we changed the locking
> rules to ensure only one ANALYZE at a time for any one table.
[...]
> I think we've got a solution for that in 8.2, also --- at least, the
> only common case I know of should be fixed, namely where a RENAME or
> similar has caused the same table name to be assigned to a new OID.

Great!  These were on 8.1 and I was actually just working to try to
reproduce them on 8.2 (without success so far!).  I'll see about
upgrading the production systems to 8.2 soon and will let y'all know
if I see them again there.
Thanks!
    Stephen