Обсуждение: How can I see if my code is "concurrency safe"?

Поиск
Список
Период
Сортировка

How can I see if my code is "concurrency safe"?

От
Janne H
Дата:
Hi there!

Today I realised that my knowledge concerning how postgres handles concurrency is not very good, and its even worse
whenit comes to using that knowledge in real-life. 

Let me give you an example.
I have this table 

create table userpositions ( userID int,  positionID int, unique (userID,positionID));

For a given userID there can be many positionIDs.

There are then two operations performed on this table, the first is "select positionID from userpositions where
userID=..."to get all positions for a user, and the second is to replace all positions for the user with new positions.
Forthis I was thinking about running it in a transaction 

begin;
  delete from userpositions where userID=...;
  insert into userpositions (userID,positionID) values ....;
commit;

But will this work? I don't want select to return empty results, I don't want two inserts running causing a unique
violation.
Experimenting with it tells me yes, it will work, but how should I reason to "convinse" my self that it will work?

Quoting the docs:  "The partial transaction isolation provided by Read Committed mode is adequate for many
applications,and this mode is fast and simple to use; however, it is not sufficient for all cases. Applications that do
complexqueries and updates might require a more rigorously consistent view of the database than Read Committed mode
provides."


How do I know I'm not creating one of those complex queries? Or to put it differntly, how do you identify a complex
querywith potential issues? Any special techniques to analyze? 


/J

Re: How can I see if my code is "concurrency safe"?

От
Ben Chobot
Дата:
On Apr 25, 2012, at 5:17 PM, Janne H wrote:

> Hi there!
>
> Today I realised that my knowledge concerning how postgres handles concurrency is not very good, and its even worse
whenit comes to using that knowledge in real-life. 
>
> Let me give you an example.
> I have this table
>
> create table userpositions ( userID int,  positionID int, unique (userID,positionID));
>
> For a given userID there can be many positionIDs.
>
> There are then two operations performed on this table, the first is "select positionID from userpositions where
userID=..."to get all positions for a user, and the second is to replace all positions for the user with new positions.
Forthis I was thinking about running it in a transaction 
>
> begin;
>   delete from userpositions where userID=...;
>   insert into userpositions (userID,positionID) values ....;
> commit;
>
> But will this work? I don't want select to return empty results, I don't want two inserts running causing a unique
violation.
> Experimenting with it tells me yes, it will work, but how should I reason to "convinse" my self that it will work?
>
> Quoting the docs:  "The partial transaction isolation provided by Read Committed mode is adequate for many
applications,and this mode is fast and simple to use; however, it is not sufficient for all cases. Applications that do
complexqueries and updates might require a more rigorously consistent view of the database than Read Committed mode
provides."
>
>
> How do I know I'm not creating one of those complex queries? Or to put it differntly, how do you identify a complex
querywith potential issues? Any special techniques to analyze? 

Think about it this way: once one session starts a transaction, any modifications it makes are invisible to other
sessionsuntil you commit your transaction, at which point they all become visible atomically. (Unless those other
sessionshave explicitly said, "yes, I want to play with fire" and changed their isolation mode to something other than
ReadCommitted.)  

So given what you've said, it seems like you should be set, so long as you don't have two different sessions try to
insertthe same {userID,positionID} tuple. If that happens, then the second one that tries to commit will fail. You can
avoidthat by using a sequence for positionID, which will result in gaps and non-sequential IDs, but also no uniqueness
failures.

Re: How can I see if my code is "concurrency safe"?

От
"David Johnston"
Дата:
> -----Original Message-----
> From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
> owner@postgresql.org] On Behalf Of Ben Chobot
> Sent: Wednesday, April 25, 2012 7:29 PM
> To: Janne H
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] How can I see if my code is "concurrency safe"?
>
> On Apr 25, 2012, at 5:17 PM, Janne H wrote:
>
> > Hi there!
> >
> > Today I realised that my knowledge concerning how postgres handles
> concurrency is not very good, and its even worse when it comes to using
that
> knowledge in real-life.
> >
> > Let me give you an example.
> > I have this table
> >
> > create table userpositions ( userID int,  positionID int, unique
> > (userID,positionID));
> >
> > For a given userID there can be many positionIDs.
> >
> > There are then two operations performed on this table, the first is
> > "select positionID from userpositions where userID=..." to get all
> > positions for a user, and the second is to replace all positions for
> > the user with new positions. For this I was thinking about running it
> > in a transaction
> >
> > begin;
> >   delete from userpositions where userID=...;
> >   insert into userpositions (userID,positionID) values ....; commit;
> >
> > But will this work? I don't want select to return empty results, I don't
want
> two inserts running causing a unique violation.
> > Experimenting with it tells me yes, it will work, but how should I
reason to
> "convinse" my self that it will work?
> >
> > Quoting the docs:  "The partial transaction isolation provided by Read
> Committed mode is adequate for many applications, and this mode is fast
> and simple to use; however, it is not sufficient for all cases.
Applications that
> do complex queries and updates might require a more rigorously consistent
> view of the database than Read Committed mode provides."
> >
> >
> > How do I know I'm not creating one of those complex queries? Or to put
it
> differntly, how do you identify a complex query with potential issues? Any
> special techniques to analyze?
>
> Think about it this way: once one session starts a transaction, any
> modifications it makes are invisible to other sessions until you commit
your
> transaction, at which point they all become visible atomically. (Unless
those
> other sessions have explicitly said, "yes, I want to play with fire" and
changed
> their isolation mode to something other than Read Committed.)
>
> So given what you've said, it seems like you should be set, so long as you
> don't have two different sessions try to insert the same
{userID,positionID}
> tuple. If that happens, then the second one that tries to commit will
fail. You
> can avoid that by using a sequence for positionID, which will result in
gaps
> and non-sequential IDs, but also no uniqueness failures.
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make
> changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

=====================================================

Please keep in mind that simply adding in a "SERIAL" (sequence) value to
avoid duplicate keys is dangerous as well.  While it avoids a physical
duplicate you are now able to insert a logical duplicate into the system.

Specific to your code could you just perform the following?

UPDATE userpositions SET userid = new_userid WHERE userid = old_userid;

In response to Ben: it appears that positionID is a FK in this situation
which means whether the corresponding PK is serial or not is irrelevant.
To Janne: If this indeed is a foreign key situation (multi-multi reference
table) you should define both columns using "FOREIGN KEY ... REFERENCES ..."

Since you are inserting records for a different userID than the one you are
deleting you currently have no checking in place to ensure that one of your
INSERT values is not already on the table.  You also have not defined the
relationship between position and user (i.e., can more than one person hold
a given position).  Assuming each position can only have one person active
at a time you should define the positionID field as UNIQUE by itself as
well.  Then, if you ensure you only add records that correspond to the
deleted records shown you can ensure that no duplicates will exist since
someone trying to add a duplicate position before you commit will have to
wait on the DELETE to unlock the position and by the time that happens you
will already have insert a new record for that position.

RISKY: Your risk, as shown, is between obtaining your "values" and running
the transaction delete/insert your "values" become invalid.  Say after you
select but before you delete/insert someone adds a new position to the user.
Your delete will remove the newly added position but your subsequent insert
will not have the new data and thus that position will end up un-filled
after your routine.  Same thing applies if they delete a position entry -
you will end up filling it again.  It is also possible that the position
could be double-filled if the userID is changed during your window.  Since
you have not provided ALL of your code (especially the SELECT/VALUES
portion) I would assume that you are obtaining the data nievely and so you
do have a concurrency issue to address.

Dave



Re: How can I see if my code is "concurrency safe"?

От
Chris Travers
Дата:
On Wed, Apr 25, 2012 at 4:17 PM, Janne H <jannehson51@yahoo.com> wrote:
> Hi there!
>
> Today I realised that my knowledge concerning how postgres handles concurrency is not very good, and its even worse
whenit comes to using that knowledge in real-life. 

I think what everyone here is trying to say is that while PostgreSQL
is very good at handling concurrency itself, you may want to be
careful about making sure your relational constraints are correct.
This means proper use of multi-column keys (either as primary keys,
unique constraints, or foreign keys).  If you do a good job here,
PostgreSQL will handle your concurrency needs for you.

On an internal level, look at how MVCC works. It isn't really a
question of overall concurrency so much as it is a matter of
concurrency performance, since locking is no longer as necessary.
Also check out the serializable transaction level, though the need for
this sort of concurrency protection is rare.

Best Wishes,
Chris Travers

Re: How can I see if my code is "concurrency safe"?

От
Bill Moran
Дата:
On Wed, 25 Apr 2012 16:17:53 -0700 (PDT) Janne H <jannehson51@yahoo.com> wrote:

> Hi there!
>
> Today I realised that my knowledge concerning how postgres handles concurrency is not very good, and its even worse
whenit comes to using that knowledge in real-life. 
>
> Let me give you an example.
> I have this table 
>
> create table userpositions ( userID int,  positionID int, unique (userID,positionID));
>
> For a given userID there can be many positionIDs.
>
> There are then two operations performed on this table, the first is "select positionID from userpositions where
userID=..."to get all positions for a user, and the second is to replace all positions for the user with new positions.
Forthis I was thinking about running it in a transaction 
>
> begin;
>   delete from userpositions where userID=...;
>   insert into userpositions (userID,positionID) values ....;
> commit;
>
> But will this work? I don't want select to return empty results, I don't want two inserts running causing a unique
violation.
> Experimenting with it tells me yes, it will work, but how should I reason to "convinse" my self that it will work?
>
> Quoting the docs:  "The partial transaction isolation provided by Read Committed mode is adequate for many
applications,and this mode is fast and simple to use; however, it is not sufficient for all cases. Applications that do
complexqueries and updates might require a more rigorously consistent view of the database than Read Committed mode
provides."
>
>
> How do I know I'm not creating one of those complex queries? Or to put it differntly, how do you identify a complex
querywith potential issues? Any special techniques to analyze? 

Just try it.

Open two psql sessions and manually enter the commands one at a time,
alternating between windows.  Try some scenarios that you're concerned
about and see what happens.  Experience will teach you a lot very
quickly.

I can say, from experience, that what you have above is not
sufficient to protect you from empty result sets and duplicate
keys in rare, but annoyingly persistent cases.  Especially if your
front-end is a web-app, you may be shocked to find out how often
users will double-click instead of single click, thus resulting in
the exact same queries running close enough together to conflict
with each other.  Of course, this can be fixed in many ways, but
understanding why it happens is the first step to choosing the
best approach for your application.

--
Bill Moran <wmoran@potentialtech.com>