Thread: pg_advisory_lock problem


pg_advisory_lock problem

From
Rémi Cura
Date:
Hey dear list,

Following the advice of Depesz, I'm trying to use advisory locks.

I'm trying to perform parallel upserts.
I have 2 different sessions in which I concurrently run a very complicated query (lots of CTEs). In fact this complicated query is simply wrapped in a plpgsql function for ease of calling.

The query performs upserts into 3 different tables, each time using a dedicated plpgsql function that looks like this:
---------
PERFORM pg_advisory_lock(<1, 2 or 3 depending on the table>, <id of the row I want to upsert>);
WITH stuff_to_upsert AS (...)
, updating AS (UPDATE ... RETURNING id)
, inserting AS (INSERT ... if not updated)
PERFORM pg_advisory_unlock(<same arguments as above>);
--------

The query calls each such plpgsql function like this:
CTE_N AS (
SELECT r.*
FROM result_to_be_upserted, function_upserting(...) AS r
)
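
For concreteness, a minimal self-contained sketch of one such upsert function (the val column is only illustrative; result_intersection is the table from the error below; this is the failing version, with an explicit unlock):
---------
CREATE OR REPLACE FUNCTION upsert_result_intersection(p_node_id int, p_val text)
RETURNS void AS $$
BEGIN
    -- lock key = (table number, row id)
    PERFORM pg_advisory_lock(1, p_node_id);
    UPDATE result_intersection SET val = p_val WHERE node_id = p_node_id;
    IF NOT FOUND THEN
        INSERT INTO result_intersection(node_id, val) VALUES (p_node_id, p_val);
    END IF;
    -- released here, before the surrounding transaction commits
    PERFORM pg_advisory_unlock(1, p_node_id);
END;
$$ LANGUAGE plpgsql;
---------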

Yet I still get duplicate-primary-key errors, because the same row gets inserted twice.
ERROR:  duplicate key value violates unique constraint "result_intersection_pkey"
DETAIL:  Key (node_id)=(12621) already exists.

Of course, when running the query sequentially I don't get any errors, even when running it several times.

I have read the 2 documentation pages relevant to advisory locks, and I release all advisory locks before executing the test that gives these errors.

After the error happens (which means one process completed and the other failed), I can see that there is a lock in pg_locks with the id of the row whose upsert caused the error:
advisory;2953366;;;;;;3;12621;2;8/0;11380;ExclusiveLock;t;f
(reading the pg_locks columns: locktype=advisory, classid=3, objid=12621, objsubid=2, mode=ExclusiveLock, granted=t)

Any help is greatly appreciated, I have tried everything I could think of.

Thanks,
Cheers,
Rémi-C

Re: pg_advisory_lock problem

From
David G Johnston
Date:
Rémi Cura wrote:
> [snip; full message quoted above]

Without digging into this too deeply, I suspect there are some MVCC
visibility issues going on here.  Basically, while pg_advisory_lock()
can see, and therefore wait for, the lock issued by the other session,
it cannot see the newly inserted record from that other session, so once
the lock is released it goes ahead, tries to insert too, and fails.
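
Schematically, the failing interleaving would look something like this (the session layout and key values are illustrative):
---------
-- Session A                              -- Session B
BEGIN;
SELECT pg_advisory_lock(1, 12621);
                                          BEGIN;
                                          SELECT pg_advisory_lock(1, 12621); -- blocks
UPDATE ...;      -- no row yet, so:
INSERT ... (12621);
SELECT pg_advisory_unlock(1, 12621);
                                          -- lock acquired, but A has not committed:
                                          UPDATE ...;         -- sees no row 12621
                                          INSERT ... (12621); -- waits on A, then
COMMIT;                                   -- duplicate key error once A commits
---------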

Unfortunately I am not familiar with this particular means of upsert so I
cannot be of greater help.

A self-contained example would probably help.

David J.

P.S. one "r" in query...I generally avoid this kind of thing but it stood
out and you did it 4 times so I didn't think it was a typo...






Re: pg_advisory_lock problem

From
John R Pierce
Date:
---------
>PERFORM pg_advisory_lock(<1, 2 or 3 depending on the table>, <id of the row I want to upsert>);
>WITH stuff_to_upsert AS (...)
>, updating AS (UPDATE ... RETURNING id)
>, inserting AS (INSERT ... if not updated)
>PERFORM pg_advisory_unlock(<same arguments as above>);
>--------


Ah, you're releasing the lock before the insert is committed; since this
is all within a function call, it's entirely within a single transaction.



--
john r pierce                                      37N 122W
somewhere on the middle of the left coast



Re: pg_advisory_lock problem

From
Rémi Cura
Date:



2014-08-11 5:33 GMT+02:00 John R Pierce <pierce@hogranch.com>:

> Ah, you're releasing the lock before the insert is committed; since this is all within a function call, it's entirely within a single transaction.
> [snip]


Hey,
thanks to you two!

Oh no, I hoped it was something fixable,
because I don't think I can fix this transaction problem.
I guess because the big query is called in one plpgsql function it will all be in one transaction,
so even if I resorted to plpython in the upserting functions to issue a commit, it wouldn't work.
The only way would be to transform the big plpgsql query function into another pl function,
but I can't really afford that :-(

Damn, it is really hard to use a table from 2 different sessions at the same time!

Thanks for your answers, and for the much needed typo correction ;-)

Cheers,
Rémi-C

Re: pg_advisory_lock problem

From
Kevin Grittner
Date:
Rémi Cura <remi.cura@gmail.com> wrote:

> 2014-08-11 5:33 GMT+02:00 John R Pierce <pierce@hogranch.com>:

>> Ah, you're releasing the lock before the insert is committed;
>> since this is all within a function call, it's entirely within a
>> single transaction.

> Oh no I hoped it was something fixable

Well, it might be.  Try using a transactional advisory lock and
letting it expire at the end of the transaction, rather than
explicitly releasing it before the transaction commits.  Depending
on some other details, that might get it to do what you want.
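
A minimal sketch of that variant (the table t and its columns are hypothetical):
---------
CREATE OR REPLACE FUNCTION upsert_t(p_id int, p_val text) RETURNS void AS $$
BEGIN
    -- xact-scoped: held until the surrounding transaction commits or
    -- rolls back; there is no explicit unlock
    PERFORM pg_advisory_xact_lock(1, p_id);
    UPDATE t SET val = p_val WHERE id = p_id;
    IF NOT FOUND THEN
        INSERT INTO t(id, val) VALUES (p_id, p_val);
    END IF;
END;
$$ LANGUAGE plpgsql;
---------
A blocked session then waits until the inserting transaction commits; in READ COMMITTED, each statement in a volatile function takes a fresh snapshot, so its UPDATE should then see the committed row.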

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pg_advisory_lock problem

From
Merlin Moncure
Date:
On Mon, Aug 11, 2014 at 9:49 AM, Kevin Grittner <kgrittn@ymail.com> wrote:
> Well, it might be.  Try using a transactional advisory lock and
> letting it expire at the end of the transaction, rather than
> explicitly releasing it before the transaction commits.  Depending
> on some other details, that might get it to do what you want.

Better to use a vanilla LOCK TABLE statement, in my opinion, for this purpose.

merlin


Re: pg_advisory_lock problem

From
Rémi Cura
Date:
Hey,
As you (both?) suggested, it works using advisory locks taken at the beginning of the transaction and released at the end. This way there is no upsert at all if the element is locked. (I used regular advisory locks, but in the same way as transactional locks.)
The issue in this case is simple: I have to use about 100k advisory locks, which is a big memory requirement for my hardware :-(

Merlin, I'm afraid I don't understand what a "vanilla" LOCK TABLE is.

I can't really use a table lock because each query upserts sequentially into 3 tables, doing lots of computing in between.
I use parallel queries to compute faster (load dividing). I guess it would be very slow with about 8 parallel queries holding locks.

I should test this lock approach to be sure.

Thanks to both of you!
Cheers,
Rémi-C




Re: pg_advisory_lock problem

From
Kevin Grittner
Date:
Rémi Cura <remi.cura@gmail.com> wrote:

> As you (both?) suggested, it works using advisory locks taken at
> the beginning of the transaction and released at the end. This way
> there is no upsert at all if the element is locked. (I used regular
> advisory locks, but in the same way as transactional locks.)

This is too vague to comment on.

> The issue in this case is simple : I have to use about 100k
> advisory locks, which is a big memory requirement for my hardware
> :-(

... and that doesn't seem to make any sense.  Either you are not
understanding advisory locks or you are doing something very, very
unusual.

> Merlin I'm afraid I don't understand what is "vanilla" LOCK
> TABLE.

See the LOCK TABLE command.

http://www.postgresql.org/docs/current/interactive/sql-lock.html

http://www.postgresql.org/docs/current/interactive/explicit-locking.html

> I can't really use a table lock because each query upserts
> sequentially into 3 tables, doing lots of computing in between.

Now *that* I understand.  :-)  It's not an unusual requirement,
but can be a challenge when using snapshot isolation (where writes
don't block reads and reads don't block anything).  There are two
main approaches -- introduce blocking to serialize some of the
operations, or use the SERIALIZABLE transaction isolation level to
ensure that the behavior of all concurrent transactions is
consistent with the behavior you would see if they were run one at
a time.  The latter approach doesn't introduce any new blocking,
but it can cause transactions to get an ERROR with a SQLSTATE of
40001 at just about any point, so you need to be prepared to
recognize that and retry those transactions from the beginning (not
just the last statement of the transaction), ignoring any data read
during the failed attempt.
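
The pattern, sketched in SQL (the retry loop itself has to live in the client application):
---------
BEGIN ISOLATION LEVEL SERIALIZABLE;
-- run the whole upserting query here
COMMIT;
-- If any statement (or the COMMIT itself) fails with SQLSTATE 40001,
-- ROLLBACK and rerun everything from BEGIN, discarding anything read
-- during the failed attempt.
---------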

You may want to read the entire chapter on concurrency control:

http://www.postgresql.org/docs/current/interactive/mvcc.html

If you are considering using SERIALIZABLE transactions, you should
probably review the examples in the Wiki, to get an idea of how it
behaves in various cases:

http://wiki.postgresql.org/wiki/SSI

> I use parallel query to compute faster (load dividing). I guess
> it would be very slow with about 8 parallel queries with locks.

Well, if you introduce blocking you reduce your parallelism, but if
you use serializable transactions and there are actually a lot of
conflicts you can see poor performance because of the errors
rolling back transactions and the need to retry them from the
start.  The techniques used to implement serializable transactions
in PostgreSQL are basically a refinement of the Optimistic
Concurrency Control (OCC) techniques, but generally with far fewer
retries needed -- the point being that it optimistically assumes
that there will not be a conflict so that concurrency is better,
but has to cancel things if that optimism proves to be unfounded.

To make related changes to multiple tables and maintain coherent
data, you will probably need to do one or the other.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pg_advisory_lock problem

From
Rémi Cura
Date:



2014-08-11 22:48 GMT+02:00 Kevin Grittner <kgrittn@ymail.com>:
> [snip]


Hey, thanks for your detailed answer.

The particularity here is that I use a big query with CTEs instead of a more procedural approach.
I do sophisticated geometric computing using PostGIS. I guess it is a hack of both Postgres and PostGIS.

Let me explain better the pg_advisory_lock usages I have tried.

First, the classic use of pg_advisory_lock, not working:
CTE_1 (find which rows will be upserted into table_1)
CTE_2 (find which rows will be upserted into table_2)
CTE_3 (find which rows will be upserted into table_3)
CTE_4 (compute the result to be upserted into table_1)
CTE_5 (upsert into table_1 using a custom upsert function)
CTE_6 (compute the result to be upserted into table_2)
CTE_7 (upsert into table_2 using a custom upsert function)
CTE_8 (compute the result to be upserted into table_3)
CTE_9 (upsert into table_3 using a custom upsert function)
CTE_10 (end of query)
Each of the upserting functions is plpgsql and does something like:
pg_advisory_lock(table_number, id of row to be upserted)
with updating AS (update table)
insert into table if not updated
pg_advisory_unlock(table_number, id of row to be upserted)

According to what the list said, it doesn't work because of visibility issues: the locking works, so we know the processes will upsert the same thing sequentially. However, it fails because each process has no visibility of the inserts done by the others. So when the second process upserts the same thing, it still tries to insert, and we get 2 inserts for the same row.

Second, a non-classic use of pg_advisory, working but too taxing on resources:
CTE_1 (find which rows will be upserted into table_1, pg_try_advisory_lock(1, id1))
CTE_2 (find which rows will be upserted into table_2, pg_try_advisory_lock(2, id2))
CTE_3 (find which rows will be upserted into table_3, pg_try_advisory_lock(3, id3))
CTE_4 (compute the result to be upserted into table_1)
CTE_5 (upsert into table_1 using a custom upsert function, only if pg_try_advisory_lock was true)
CTE_6 (compute the result to be upserted into table_2)
CTE_7 (upsert into table_2 using a custom upsert function, only if pg_try_advisory_lock was true)
CTE_8 (compute the result to be upserted into table_3)
CTE_9 (upsert into table_3 using a custom upsert function, only if pg_try_advisory_lock was true)
CTE_10 (release the locks for (1, id1), (2, id2), (3, id3))
CTE_11 (end of query)

This works because we don't even try to upsert a row that is already being taken care of by another process, so there is no visibility issue.
The problem is with resources: I have to hold about 100k locks for several minutes, which greatly increases memory use.

The difference between the 2 is essentially this: in the first case we lock around each individual upsert statement; in the second case we lock at the query level.
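
Schematically, the second pattern does something like this (a sketch; the table and function names are hypothetical):
---------
WITH to_upsert AS (
    SELECT s.id, s.val,
           pg_try_advisory_lock(1, s.id) AS got_lock  -- false if another
    FROM   source_rows s                              -- session holds it
)
SELECT upsert_table_1(t.id, t.val)   -- custom upsert function
FROM   to_upsert t
WHERE  t.got_lock;                   -- skip rows another session is handling
-- the locks taken here are released by a later CTE, or at session end
---------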

On the bright side, I don't need to wait for the insert into table_1 to compute table_2, and likewise for table_2 and table_3.
However, I can't control when the CTE upserting into each table is executed (as far as I know, there is no guarantee of execution order when using multiple CTEs).

Now about locking tables: I guess it wouldn't work around each individual upsert statement, for the same reasons.
If I use it at the whole-query level, other parallel processes will be waiting for the whole query to finish before being allowed to go on?
Moreover, whatever LOCK I use will be within the same big transaction, so the only solution may be:

lock table_1, table_2, table_3;
execute big query;
unlock table_1, table_2, table_3;

I didn't find any "vanilla" lock. Is "vanilla" some kind of Postgres slang?

Thanks for the links to the docs. I didn't know about SERIALIZABLE transactions, nor about MVCC.
I guess it would be equivalent to putting an exception block in the upsert function to catch the duplicate primary key?
I don't see how to use it at the query level. The whole query being in one transaction, I would have to retry all of it because a few percent of the rows are duplicates!
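
For reference, the exception-block variant mentioned above (roughly the merge_db example from the PostgreSQL documentation; the table t is hypothetical) looks like:
---------
CREATE OR REPLACE FUNCTION upsert_t(p_id int, p_val text) RETURNS void AS $$
BEGIN
    LOOP
        UPDATE t SET val = p_val WHERE id = p_id;
        IF FOUND THEN RETURN; END IF;
        BEGIN
            INSERT INTO t(id, val) VALUES (p_id, p_val);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- another session inserted the same key concurrently;
            -- do nothing and loop back to try the UPDATE again
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
---------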

Thanks all for your help,
Cheers,
Rémi-C

Re: pg_advisory_lock problem

From
Merlin Moncure
Date:
On Tue, Aug 12, 2014 at 5:07 AM, Rémi Cura <remi.cura@gmail.com> wrote:
> [snip]
> I didn't find any "vanilla" lock. Is "vanilla" some kind of Postgres slang?

'vanilla' is common English slang for 'regular'.   Basically I was
referring to the built-in LOCK TABLE statement.  LOCK has no
corresponding 'UNLOCK'; you lock a table or tables and the locks are
released for you at the end of the transaction.  What you are trying to
do, but isn't really possible, is to 'temporarily' lock an object
inside a transaction.  Transactions group related work and must be
handled as one unit.

The default lock mode of LOCK TABLE is very heavy, but you can pass an
optional lockmode if you want to allow readers.   Definitely read this
chapter: http://www.postgresql.org/docs/9.3/static/explicit-locking.html.
The point I was making is that using advisory locks to block access to
a table usually isn't a good idea; just use the built-in locks.
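
For example (a sketch; EXCLUSIVE mode blocks concurrent writers but still allows plain SELECTs, unlike the default ACCESS EXCLUSIVE mode taken by a bare LOCK TABLE):
---------
BEGIN;
LOCK TABLE table_1, table_2, table_3 IN EXCLUSIVE MODE;
-- run the big upserting query here
COMMIT;  -- the table locks are released now; there is no UNLOCK statement
---------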

merlin


Re: pg_advisory_lock problem

From
Rémi Cura
Date:
Hey,
just a quick follow-up for archive:

Problem solved with pg_try_advisory_lock.
Using LOCK TABLE in ROW EXCLUSIVE mode doesn't work (same duplicate key error).
Using LOCK TABLE in ACCESS EXCLUSIVE mode works (no error) but stops the parallelism (one thread has to wait for another to go on).

The optimal solution would be to slice the big CTE into several successive temp table creations, using a procedural language that allows transaction control.
This would make it possible to control visibility and to use the powerful MVCC features.
(I'm unwilling to do that.)

Cheers everybody and many thanks for the help,
Rémi-C

