Thread: Tuple concurrency issue in large objects

Tuple concurrency issue in large objects

From: Shalini
Date:
Hi all,

I am working on a project that allows multiple users to work on a single 
large text document. I am using lo_put to apply only the diff to the 
large object, without replacing it with a new large object. While working 
on it, I encountered the error "tuple concurrently updated".
The error can be reproduced with two psql clients.

Setup:

mydb=# create table text_docs(id serial primary key, data oid);
CREATE TABLE
mydb=# insert into text_docs(data) select lo_import('./upload.txt');
INSERT 0 1
mydb=# select * from text_docs;
 id |  data
----+---------
  1 | 5810130
(1 row)

Now, if we open two psql clients and execute the following commands:

Client 1:

mydb=# begin;
BEGIN
mydb=# select lo_put(5810130, 10, '\xaa');
UPDATE 1

Client 2:

mydb=# select lo_put(5810130, 10, '\xaa');
(the call blocks while Client 1's transaction is still open)

Client 1:
mydb=# commit;
COMMIT

Client 2 (the blocked call returns once Client 1 has committed):
mydb=# select lo_put(5810130, 10, '\xaa');
ERROR:  tuple concurrently updated

Is there a workaround to this concurrency issue without creating a new 
large object?

Regards
Shalini





Re: Tuple concurrency issue in large objects

From: Rene Romero Benavides
Date:
Hi Shalini. The usual diagnostic info includes your PostgreSQL server version, both major and minor: in 12.1, for example, the major version is 12 and the minor (patch) version is 1.
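
For example, either of these reports it from psql:

mydb=# show server_version;
mydb=# select version();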


On Fri, Dec 6, 2019 at 9:26 AM Shalini <shalini@saralweb.com> wrote:
> Is there a workaround to this concurrency issue without creating a new
> large object?

--
Genius is 1% inspiration and 99% perspiration.
Thomas Alva Edison
http://pglearn.blogspot.mx/

Re: Tuple concurrency issue in large objects

From: Shalini
Date:

Hi Rene,

I am using PostgreSQL 11.2: the major version is 11 and the minor version is 2.

On 12/10/2019 11:24 AM, Rene Romero Benavides wrote:
> Hi Shalini. The usual diagnostic info includes your PostgreSQL server
> version, both major and minor: in 12.1, for example, the major version
> is 12 and the minor (patch) version is 1.


Re: Tuple concurrency issue in large objects

From: "Daniel Verite"
Date:
    Shalini wrote:

> > Is there a workaround to this concurrency issue without creating a
> > new large object?

The transaction failing with the "Tuple concurrently updated"
error could be resubmitted by the client, as if it were a
serialization failure.
Alternatively, the failure could be prevented by using advisory locks:
https://www.postgresql.org/docs/current/explicit-locking.html
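
For instance, a minimal sketch of the advisory-lock approach, assuming every
writer follows the same pattern and simply reuses the large object's OID value
as the bigint lock key:

begin;
-- serialize writers on this large object: a second writer waits here
-- instead of failing inside lo_put
select pg_advisory_xact_lock(5810130);
select lo_put(5810130, 10, '\xaa');
commit;  -- the transaction-level advisory lock is released automatically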


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: Tuple concurrency issue in large objects

From: "Daniel Verite"
Date:
    Shalini wrote:

> Could you also please state the reason why is it happening in case
> of large objects? Because concurrent transactions are very well
> handled for other data types, but the same is not happening for
> lobs. Is it because the former are stored in a toast table and there is
> no support for concurrent txns in pg_largeobject table?

Keeping in mind that large objects are not a datatype, but rather a
functionality that is built on top of the bytea and oid datatypes plus
a set of functions, I wouldn't say that concurrent writes would be
better handled if you had a table: document(id serial, contents bytea)
with "contents" being indeed toastable.

To illustrate with a basic example: transactions Tx1 and Tx2
want to update the contents of the same document concurrently,
with this order of execution:

Tx1: begin
Tx1: update document set contents=... where id=...
Tx2: begin
Tx2: update the same document (gets blocked)
Tx1: commit
Tx2: commit

If using the read committed isolation level, Tx2 will be put to wait
until Tx1 commits, and then the update by Tx1 will be overwritten by
Tx2. That's a well-known anomaly called a "lost update", and
probably not what you want.

If using a better isolation level (repeatable read or serializable),
the update by Tx2 will be rejected with a serialization failure,
which, to me, seems the moral equivalent of the "Tuple concurrently
updated" error you're reporting with large objects.
When this occurs, your application can fetch the latest value in a new
transaction and see how it can apply its change to the new value,
unless another conflict arises and so on.
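
For illustration, a sketch of the same schedule run at repeatable read against
the document(id, contents) table above; the error text is what PostgreSQL
reports for this kind of conflict:

Tx1: begin isolation level repeatable read;
Tx1: update document set contents = '\x01' where id = 1;
Tx2: begin isolation level repeatable read;
Tx2: update document set contents = '\x02' where id = 1;   -- blocks
Tx1: commit;
Tx2: ERROR:  could not serialize access due to concurrent update
Tx2: rollback;   -- the application retries in a new transaction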

In short, the best the database can do in case of conflicting writes
is to inform the application. It can't know which write should be
prioritized or if the changes should be merged before being written.


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: Tuple concurrency issue in large objects

From: Justin
Date:
I have a question reading through this email chain. Does the large objects table, when used through these functions, work like normal MVCC, where there can be two versions of a large object in pg_largeobject? My gut says no, as moving/copying potentially 4 TB of data would kill any I/O.

I cannot find any documentation discussing how these functions actually work with respect to transaction isolation, MVCC, and snapshots.

On Wed, Dec 18, 2019 at 10:05 AM Daniel Verite <daniel@manitou-mail.org> wrote:
> In short, the best the database can do in case of conflicting writes
> is to inform the application. It can't know which write should be
> prioritized or if the changes should be merged before being written.


Re: Tuple concurrency issue in large objects

From: Tom Lane
Date:
Justin <zzzzz.graf@gmail.com> writes:
> I have a question reading through this email chain.   Does Large Objects
> table using these functions work like normal MVCC where there can be two
> versions of a large object in pg_largeobject .

Yes, otherwise you could never roll back a transaction that'd modified
a large object.

> My gut says no as
> moving/copying potentially 4 TB of data would kill any IO.

Well, it's done on a per-chunk basis (normally about 2K per chunk),
so you won't do that much I/O unless you're changing all of a 4TB
object.
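
For example, a sketch of how to see that chunking directly (reading
pg_largeobject normally requires superuser privileges; 5810130 is the OID
from the transcript earlier in the thread):

-- one row per chunk; pageno is the chunk number and each chunk holds
-- at most LOBLKSIZE bytes (2048 by default)
select loid, pageno, octet_length(data) as chunk_bytes
from pg_largeobject
where loid = 5810130
order by pageno;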

            regards, tom lane



Re: Tuple concurrency issue in large objects

From: Justin
Date:
I now see what is causing this specific issue...

The updates and the row versioning happen one 2 kB chunk at a time. That's going to make tracking what other clients are doing a difficult task.

All the clients would have to have some means of notifying the other clients that an update occurred in a given chunk, which could force a complete reload of the data if the update spilled into adjoining rows. The notifications and re-fetching of data to keep the clients in sync are going to make this a network-chatty app (a sketch of one way to do the notifications follows below).

Maybe add a bit to the documentation stating "row versioning occurs every X chunks".
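
A hypothetical sketch of that notification idea using LISTEN/NOTIFY (the
channel name and payload format are made up for illustration):

-- every client, once per session:
listen doc_updates;

-- a writer, right after a successful lo_put on large object 5810130:
select pg_notify('doc_updates', '5810130:offset 10');

-- listeners receive the asynchronous notification and re-fetch the affected
-- range, e.g. with lo_get(5810130, 10, 1)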

On Wed, Dec 18, 2019 at 11:12 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Well, it's done on a per-chunk basis (normally about 2K per chunk),
> so you won't do that much I/O unless you're changing all of a 4TB
> object.

Re: Tuple concurrency issue in large objects

From: Tom Lane
Date:
Justin <zzzzz.graf@gmail.com> writes:
> I now see what is causing this specific issue...
> The update and row versions is happening on 2kb chunk at a time,  That's
> going to make tracking what other clients are doing a difficult task.

Yeah, it's somewhat unfortunate that the chunkiness of the underlying
data storage becomes visible to clients if they try to do concurrent
updates of the same large object.  Ideally you'd only get a concurrency
failure if you tried to overwrite the same byte(s) that somebody else
did, but as it stands, modifying nearby bytes might be enough --- or
not, if there's a chunk boundary between.
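
A hypothetical transcript of that effect, with offsets chosen assuming the
default 2048-byte chunk size:

Client 1: begin;
Client 1: select lo_put(5810130, 10, '\xaa');     -- modifies chunk 0
Client 2: select lo_put(5810130, 5000, '\xbb');   -- chunk 2: a different row, succeeds at once
Client 2: select lo_put(5810130, 100, '\xbb');    -- chunk 0: blocks on Client 1's uncommitted change
Client 1: commit;
Client 2: ERROR:  tuple concurrently updated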

On the whole, though, it's not clear to me why concurrent updates of
sections of large objects is a good application design.  You probably
ought to rethink how you're storing your data.

            regards, tom lane



Re: Tuple concurrency issue in large objects

From: Justin
Date:
I agree completely.

I do not think PostgreSQL is a good fit for Shalini based on the conversation so far.

Tracking concurrency is going to be a killer... but I see the temptation to use a DB for this, since the updates are ACID and the data is less likely to get corrupted for one reason or another.

On Wed, Dec 18, 2019 at 12:12 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> On the whole, though, it's not clear to me why concurrent updates of
> sections of large objects is a good application design.  You probably
> ought to rethink how you're storing your data.

Re: Tuple concurrency issue in large objects

From: Shalini
Date:
Well... it seems I have to rethink my application design. Anyway, thank you all for your insights and suggestions.

On 12/18/2019 10:46 PM, Justin wrote:
> I do not think PostgreSQL is a good fit for Shalini based on the
> conversation so far.