Discussion: ERROR: tuple concurrently updated
Greetings,

Subject pretty much says it all. I've put up with this error in the past when it has caused me trouble, but it's now starting to hit our clients on occasion, which is just unacceptable.

The way I've seen it happen (and this is just empirical, so I'm not sure that it's exactly right) is something like this:

- Run with pg_autovacuum on the system.
- Run a long-running PL/pgSQL function which creates tables.
- Wait for some sort of overlap, and the PL/pgSQL function dies with the above error.

I've also seen it happen when I've got a long-running PL/pgSQL function going and I'm creating tables in another backend.

From a prior discussion I *think* the issue is the lack of versioning/visibility information in the SysCache, which means that if the long-running function attempts to look up data about a table which was created *after* the long-running function started, but was put into the common SysCache by another backend, the long-running function gets hit with the 'tuple concurrently updated' error and ends up failing and being rolled back.

If this is correct, then the solution seems to be to either add versioning to the SysCache data, or have an overall 'this SysCache is only good for data past transaction X' marker, so that a backend which is prior to that version could just accept that it can't use the SysCache and fall back to accessing the data directly (if that's possible). I'm not very familiar with the way the SysCache system is envisioned, but I'm not a terrible programmer (imv, anyway) and, given some direction on the correct approach to solving this problem, I'd be happy to spend some time working on it. I'd *really* like to see this error just go away completely for all non-broken use-cases.

Thanks,

Stephen
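For what it's worth, the failing pattern looks roughly like the sketch below. The names and loop counts are made up for illustration (this is not our actual schema), but it's the same shape: a long-running PL/pgSQL function churning out tables while autovacuum is active.

```sql
-- Hypothetical repro sketch: a long-running PL/pgSQL function that
-- creates and populates many tables while autovacuum runs in the
-- background. With enough overlap, the function can die with
-- "ERROR: tuple concurrently updated".
CREATE OR REPLACE FUNCTION build_partitions() RETURNS void AS $$
DECLARE
    i integer;
BEGIN
    FOR i IN 1..1000 LOOP
        EXECUTE 'CREATE TABLE part_' || i || ' (id integer, val text)';
        EXECUTE 'INSERT INTO part_' || i
             || ' SELECT g, g::text FROM generate_series(1, 10000) g';
    END LOOP;
END;
$$ LANGUAGE plpgsql;

SELECT build_partitions();
```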
Stephen Frost <sfrost@snowman.net> writes:
> Subject pretty much says it all. I've put up with this error in the
> past when it has caused me trouble but it's now starting to hit our
> clients on occasion which is just unacceptable.

Have you tracked down the exact scenario making it happen?

> If this is correct then the solution seems to be either add versioning
> to the SysCache data,

You have provided no evidence that that would fix anything at all. To my mind a "concurrently updated" failure is more likely to mean insufficient locking around update operations.

regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > Subject pretty much says it all. I've put up with this error in the
> > past when it has caused me trouble but it's now starting to hit our
> > clients on occasion which is just unacceptable.
>
> Have you tracked down the exact scenario making it happen?

I think I might have confused two different issues, sorry. :/

> > If this is correct then the solution seems to be either add versioning
> > to the SysCache data,
>
> You have provided no evidence that that would fix anything at all.
> To my mind a "concurrently updated" failure is more likely to mean
> insufficient locking around update operations.

I haven't built a reliable test case yet, but I *think* the tuple concurrently updated problem is with an analyze being run inside of a function and also being run by autovacuum.

The SysCache stuff I was thinking about previously was actually for another problem that I hadn't seen in a long time (because I hadn't been doing a particular set of operations, not because it's that difficult to have happen) but just ran into again today:

ERROR: cache lookup failed for relation ...

I first started seeing this happen, iirc, when I created a function which went against pg_class/pg_attribute/pg_type and used pg_table_is_visible(). This function (and another which uses pg_class, pg_constraint and pg_attribute) gets used over and over again from another pl/pgsql function. Basically we are walking through a list of tables, pulling the column names and primary keys, and then running some checks on the tables. Anyhow, while this is running (and it takes upwards of half an hour to run), other activity on the database (this time it was creating a view in a separate, completely unrelated schema) can cause the lookup failure which kills the long-running function (which gets to be very frustrating...).

Thanks,

Stephen
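To give a feel for it, the column-listing helper looks something like this. It's simplified and the names are illustrative, not our real code, but it shows the pattern: catalog queries filtered with pg_table_is_visible(), called repeatedly from a long-running driver function.

```sql
-- Hypothetical sketch of the helper: walk pg_class/pg_attribute and
-- keep only visible, non-dropped user columns of the named table.
CREATE OR REPLACE FUNCTION table_columns(p_table name)
RETURNS SETOF name AS $$
    SELECT a.attname
    FROM pg_class c
    JOIN pg_attribute a ON a.attrelid = c.oid
    WHERE c.relname = $1
      AND pg_table_is_visible(c.oid)
      AND a.attnum > 0
      AND NOT a.attisdropped
    ORDER BY a.attnum;
$$ LANGUAGE sql STABLE;

-- A driver function calls this for each table in a list, for ~30
-- minutes. While it runs, concurrent DDL elsewhere (e.g. CREATE VIEW
-- in an unrelated schema) could provoke, on 8.1:
--   ERROR: cache lookup failed for relation ...
```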
Stephen Frost <sfrost@snowman.net> writes:
> I haven't built a reliable test case yet but I *think* the tuple
> concurrently updated problem is with an analyze being run inside of a
> function and also being run by autovacuum.

If so it should be fixed as of 8.2 --- I believe we changed the locking rules to ensure only one ANALYZE at a time for any one table. Conflicts from concurrent ANALYZEs are the only cases I've heard of before that make this error occur in the field, but I suppose it would be possible to get it from other things such as concurrently trying to CREATE OR REPLACE the same function.

> The SysCache stuff I was
> thinking about previously was actually for another problem that I hadn't
> seen in a long time (because I hadn't been doing a particular set of
> operations, not because it's that difficult to have happen) but just ran
> into again today:
> ERROR: cache lookup failed for relation ...

I think we've got a solution for that in 8.2, also --- at least, the only common case I know of should be fixed, namely where a RENAME or similar has caused the same table name to be assigned to a new OID.

regards, tom lane
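For anyone trying to reproduce the pre-8.2 behavior, the conflicting case is simply two backends analyzing the same table at overlapping times. This is a sketch of the race, not a guaranteed repro, and `some_table` is a placeholder:

```sql
-- Session 1: ANALYZE inside an open transaction, so its updates to
-- the statistics catalogs stay uncommitted for a while.
BEGIN;
ANALYZE some_table;
-- (transaction left open)

-- Session 2 (or autovacuum), while session 1 is still open:
ANALYZE some_table;
-- On 8.1 and earlier this race could end in:
--   ERROR: tuple concurrently updated
-- As of 8.2, the locking rules permit only one ANALYZE per table
-- at a time, so the second ANALYZE simply waits.
```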
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > I haven't built a reliable test case yet but I *think* the tuple
> > concurrently updated problem is with an analyze being run inside of a
> > function and also being run by autovacuum.
>
> If so it should be fixed as of 8.2 --- I believe we changed the locking
> rules to ensure only one ANALYZE at a time for any one table.

[...]

> I think we've got a solution for that in 8.2, also --- at least, the
> only common case I know of should be fixed, namely where a RENAME or
> similar has caused the same table name to be assigned to a new OID.

Great! These were on 8.1 and I was actually just working to try and reproduce them on 8.2 (without success so far!). I'll see about upgrading the production systems to 8.2 soon and will let y'all know if I see them again there.

Thanks!

Stephen