Обсуждение: coalesce in plpgsql, and other style questions

Поиск

Список

Период

Сортировка

coalesce in plpgsql, and other style questions

От

Ross Boylan

Дата:

12 июня 2012 г., 17:47:17

I just wrote my first pl/pgsql function, and would appreciate any
comments people have on it.  I'll be writing a bunch of similar
functions, with semantics "give me the id of the object if exists,
otherwise create it and give me the id."

My solution seems excessively procedural to me.  I thought I could get
the right semantics with something like
select coalesce((select id from mytable where name='foo'),
    (insert into mytable ('name') values('foo') returning id))
but I could not get that to work in plgsql.

Also, I wonder if calling a column 'name' is asking for trouble.

The actual function is a little more complicated because the table has
the possibility of canonical entries with other entries pointing to
them,  I use 'did' for database id to avoid confusion with various other
message ids the system uses; it's for tracking emails.

Here's the code.  As I said, I'd love comments on any aspect of it.

/* gom = get or make functions retrieve did if available, otherwise create record
   and return its did. */

create or replace function gom_hostids(IN hostname text, IN canonicalname text = NULL,
       OUT hostid bigint, OUT canonicalid bigint) language plpgsql as $$
DECLARE
BEGIN
        select did, canonical into hostid, canonicalid from host
        where name = hostname;
        if FOUND then
           return;
           end if;
        if canonicalname is not NULL then
           select did  into canonicalid from host where name = canonicalname;
           if not FOUND then
              insert into host (name) values(canonicalname) returning did into canonicalid;
              end if;
           end if;
        if hostname != canonical then
                insert into host (name, canonical) values(hostname, canonicalid)
                       returning did into hostid;
        else
                hostid := canonicalid;
        return;
END
$$;

Table definition:
create table host (
       did bigint primary key default (nextval('rb_id_seq')), ---database id, like oid
       name  text, --IP address legal, usually enclosed in [].  More commonly an internet domain.
                   -- domain is a sql keyword and so I used host.
       canonical bigint references host (did) --if not null, preferred name for host
                 --may refer to self
) inherits (RBObject);

I'm using Posgresql 8.4.

Thanks.
Ross

P.S.  The recommended style of using "into", which puts the into near
the start of selects but the end of inserts and deletes seems irregular
to me.  Also, I put the language clause of create function before the as
clause because it seemed more natural to know that information before
reading the body.  I suppose if they are all the same language the
language clause is just kind of noise, and that's why the examples have
it at the end.

Re: coalesce in plpgsql, and other style questions

От

Jeff Davis

Дата:

12 июня 2012 г., 23:31:00

On Tue, 2012-06-12 at 10:46 -0700, Ross Boylan wrote:
> I just wrote my first pl/pgsql function, and would appreciate any
> comments people have on it.  I'll be writing a bunch of similar
> functions, with semantics "give me the id of the object if exists,
> otherwise create it and give me the id."
>
> My solution seems excessively procedural to me.

I think what you're trying to do is procedural, and that's not
necessarily bad. You are trying to get an existing ID if available, or
assign a new ID if not; and I assume you are doing so to save the space
associated with storing the full names (essentially like dictionary
encoding).

>   I thought I could get
> the right semantics with something like
> select coalesce((select id from mytable where name='foo'),
>     (insert into mytable ('name') values('foo') returning id))
> but I could not get that to work in plgsql.

In pl/pgsql, you can't just use SELECT, you have to use something like
SELECT INTO the_id ... or use PERFORM.

Also, you have a potential race condition there, because someone might
insert "foo" into mytable after the select (first arg to coalesce) and
before the insert (second arg to coalesce).

> Also, I wonder if calling a column 'name' is asking for trouble.

I recommend against it.

> The actual function is a little more complicated because the table has
> the possibility of canonical entries with other entries pointing to
> them,  I use 'did' for database id to avoid confusion with various other
> message ids the system uses; it's for tracking emails.
>
> Here's the code.  As I said, I'd love comments on any aspect of it.
>
> /* gom = get or make functions retrieve did if available, otherwise create record
>    and return its did. */
>
> create or replace function gom_hostids(IN hostname text, IN canonicalname text = NULL,
>        OUT hostid bigint, OUT canonicalid bigint) language plpgsql as $$

Sometimes it's good to declare your variables with names like
in_hostname or out_canonicalid, to prevent confusion reading
similarly-named variables coming from the tables.

> DECLARE
> BEGIN
>         select did, canonical into hostid, canonicalid from host
>         where name = hostname;
>         if FOUND then
>            return;
>            end if;
>         if canonicalname is not NULL then
>            select did  into canonicalid from host where name = canonicalname;
>            if not FOUND then
>               insert into host (name) values(canonicalname) returning did into canonicalid;
>               end if;
>            end if;
>         if hostname != canonical then

Is canonical a proper variable here? It's not in the argument list, and
it's not DECLAREd. Did you mean canonicalname?

>                 insert into host (name, canonical) values(hostname, canonicalid)
>                        returning did into hostid;
>         else
>                 hostid := canonicalid;

I would recommend keeping more explicit track of NULL values and what
should happen to them. It looks like a NULL canonicalname would fall
into this else branch (I could be mistaken), but it's a little hard to
follow exactly what you want to happen. A few comments would help.

>         return;
> END
> $$;

Again, you have a potential race between checking for the canonicalname
in host, and inserting it into host when it doesn't exist. If another
transaction runs in between those two, you will get duplicates in the
host table.

> Table definition:
> create table host (
>        did bigint primary key default (nextval('rb_id_seq')), ---database id, like oid
>        name  text, --IP address legal, usually enclosed in [].  More commonly an internet domain.
>                    -- domain is a sql keyword and so I used host.
>        canonical bigint references host (did) --if not null, preferred name for host
>                  --may refer to self
> ) inherits (RBObject);
>
> I'm using Posgresql 8.4.

I recommend a unique constraint on host.name. That will catch the race
conditions I mentioned above, and turn them into errors. You can catch
those errors in plpgsql, allowing you to do what you want in that case.

Regards,
    Jeff Davis

Re: coalesce in plpgsql, and other style questions

От

Ross Boylan

Дата:

13 июня 2012 г., 00:44:07

Thanks for your responses.  Some followup below.
On Tue, 2012-06-12 at 16:30 -0700, Jeff Davis wrote:
> On Tue, 2012-06-12 at 10:46 -0700, Ross Boylan wrote:
> > I just wrote my first pl/pgsql function, and would appreciate any
> > comments people have on it.  I'll be writing a bunch of similar
> > functions, with semantics "give me the id of the object if exists,
> > otherwise create it and give me the id."
> >
> > My solution seems excessively procedural to me.
>
> I think what you're trying to do is procedural, and that's not
> necessarily bad. You are trying to get an existing ID if available, or
> assign a new ID if not; and I assume you are doing so to save the space
> associated with storing the full names (essentially like dictionary
> encoding).
>
> >   I thought I could get
> > the right semantics with something like
> > select coalesce((select id from mytable where name='foo'),
> >     (insert into mytable ('name') values('foo') returning id))
> > but I could not get that to work in plgsql.
>
> In pl/pgsql, you can't just use SELECT, you have to use something like
> SELECT INTO the_id ... or use PERFORM.
>
> Also, you have a potential race condition there, because someone might
> insert "foo" into mytable after the select (first arg to coalesce) and
> before the insert (second arg to coalesce).
Practically, it's just me so there shouldn't be any risk.  But I'd like
to understand the general issue.  I thought transaction would take care
of this, so that within a transaction the state of the database does not
change from actions in other sessions.  Then if I commit and have
conflict, the commit fails.

I guess the sequence I'm using for did assures that did is unique across
all transactions, and so the 2 transactions would not be in conflict,
since they have different primary keys.

As you said later, a unique constraint on one/some of the other fields
will at least prevent bogus records that are "the same" from my point of
view.  I was a bit on the fence about making it unique, since I thought
I might have different hosts that shared the same name.  But those
scenarios are unlikely, and I really don't want to deal with it if it
does happen.

But is my whole model of how transactions are operating off?  I'm
basically generalizing from Gemstone, an object database.

>
> > Also, I wonder if calling a column 'name' is asking for trouble.
>
> I recommend against it.
I'll change it.
>
> > The actual function is a little more complicated because the table has
> > the possibility of canonical entries with other entries pointing to
> > them,  I use 'did' for database id to avoid confusion with various other
> > message ids the system uses; it's for tracking emails.
> >
> > Here's the code.  As I said, I'd love comments on any aspect of it.
> >
> > /* gom = get or make functions retrieve did if available, otherwise create record
> >    and return its did. */
> >
> > create or replace function gom_hostids(IN hostname text, IN canonicalname text = NULL,
> >        OUT hostid bigint, OUT canonicalid bigint) language plpgsql as $$
>
> Sometimes it's good to declare your variables with names like
> in_hostname or out_canonicalid, to prevent confusion reading
> similarly-named variables coming from the tables.
I'm slowly discovering the problems of name clashes; thanks for the
suggestion.
>
> > DECLARE
BTW, is DECLARE necessary if there are no declarations?

> > BEGIN
> >         select did, canonical into hostid, canonicalid from host
> >         where name = hostname;
> >         if FOUND then
> >            return;
> >            end if;
> >         if canonicalname is not NULL then
> >            select did  into canonicalid from host where name = canonicalname;
> >            if not FOUND then
> >               insert into host (name) values(canonicalname) returning did into canonicalid;
> >               end if;
> >            end if;
> >         if hostname != canonical then
>
> Is canonical a proper variable here? It's not in the argument list, and
> it's not DECLAREd. Did you mean canonicalname?
canonical is a column name in the table.  Perhaps canonical_did would be
more appropriate for it (and rename the output parameter
out_canonical_did from canonicalid).
>
> >                 insert into host (name, canonical) values(hostname, canonicalid)
> >                        returning did into hostid;
> >         else
> >                 hostid := canonicalid;
>
> I would recommend keeping more explicit track of NULL values and what
> should happen to them. It looks like a NULL canonicalname would fall
> into this else branch (I could be mistaken),
that's right.
> but it's a little hard to
> follow exactly what you want to happen. A few comments would help.
I trimmed a comment before the start of the function; it doesn't exactly
hit that issue, but it does discuss NULL's:
/* Note that if this function creates a canonical host, the canonical field
   of that host will be NULL rather than pointing to itself.
   This function may return canonicalid = NULL; it does not currently set canonicalid
   to hostid in that case.

   Also if caller supplies an existing hostname with a bogus canonicalname the
   canonicalid on file of that hostname will be returned. */


>
> >         return;
> > END
> > $$;
>
> Again, you have a potential race between checking for the canonicalname
> in host, and inserting it into host when it doesn't exist. If another
> transaction runs in between those two, you will get duplicates in the
> host table.
>
> > Table definition:
> > create table host (
> >        did bigint primary key default (nextval('rb_id_seq')), ---database id, like oid
> >        name  text, --IP address legal, usually enclosed in [].  More commonly an internet domain.
> >                    -- domain is a sql keyword and so I used host.
> >        canonical bigint references host (did) --if not null, preferred name for host
> >                  --may refer to self
> > ) inherits (RBObject);
> >
> > I'm using Posgresql 8.4.
>
> I recommend a unique constraint on host.name. That will catch the race
> conditions I mentioned above, and turn them into errors. You can catch
> those errors in plpgsql, allowing you to do what you want in that case.
>
> Regards,
>     Jeff Davis
>

Re: coalesce in plpgsql, and other style questions

От

Jeff Davis

Дата:

13 июня 2012 г., 01:17:57

On Tue, 2012-06-12 at 17:42 -0700, Ross Boylan wrote:
> Practically, it's just me so there shouldn't be any risk.  But I'd like
> to understand the general issue.  I thought transaction would take care
> of this, so that within a transaction the state of the database does not
> change from actions in other sessions.  Then if I commit and have
> conflict, the commit fails.

That is true in SERIALIZABLE mode in version 9.1 or later. Perhaps that
will eventually be the default.

But a UNIQUE constraint on host.name will fix your particular problem
nicely and efficiently, so you don't have to be on 9.1 (though I
encourage you to upgrade to 9.1.4 if you can). And it's just good
practice to declare such constraints unless there's a reason not to.

> I guess the sequence I'm using for did assures that did is unique across
> all transactions, and so the 2 transactions would not be in conflict,
> since they have different primary keys.

You are using sequences (nextval() operates on a sequence) and a
sequence will never produce the same value from nextval() (unless you
explicitly reset it). You have a UNIQUE constraint on did, but even if
you didn't, then the values should be unique. However, I still recommend
keeping the UNIQUE constraint on host.did unless there's a reason not
to.

> But is my whole model of how transactions are operating off?  I'm
> basically generalizing from Gemstone, an object database.

I think you have the right idea, and 9.1 SERIALIZABLE mode makes what
you say true. But, unfortunately, that is not true in 9.0 or before (nor
in lower isolation modes, like the READ COMMITTED default). If you're
interested in the details, please read the 8.4 explanation here:

http://www.postgresql.org/docs/8.4/static/transaction-iso.html

thoroughly, and then see the 9.1 version:

http://www.postgresql.org/docs/8.4/static/transaction-iso.html

Each of these isolation modes has a purpose, but in my opinion, you
should use SERIALIZABLE in version 9.1+ (a.k.a. "true serializability")
unless you understand the other modes and have a reason to use one of
the other ones. I say this because true serializability matches your
intuitive understanding, while the other modes have some subtle
behaviors that might surprise you.

> > > DECLARE
> BTW, is DECLARE necessary if there are no declarations?

Nope.

> > > BEGIN
> > >         select did, canonical into hostid, canonicalid from host
> > >         where name = hostname;
> > >         if FOUND then
> > >            return;
> > >            end if;
> > >         if canonicalname is not NULL then
> > >            select did  into canonicalid from host where name = canonicalname;
> > >            if not FOUND then
> > >               insert into host (name) values(canonicalname) returning did into canonicalid;
> > >               end if;
> > >            end if;
> > >         if hostname != canonical then
> >
> > Is canonical a proper variable here? It's not in the argument list, and
> > it's not DECLAREd. Did you mean canonicalname?
> canonical is a column name in the table.  Perhaps canonical_did would be
> more appropriate for it (and rename the output parameter
> out_canonical_did from canonicalid).

Does the function actually work like it is now? It looks like
"canonical" is being used outside of the scope of any query, so I don't
see how it comes from the table. Maybe I'm still confused.

Regards,
    Jeff Davis

Re: coalesce in plpgsql, and other style questions

От

Ross Boylan

Дата:

13 июня 2012 г., 01:58:47

On Tue, 2012-06-12 at 18:17 -0700, Jeff Davis wrote:
> On Tue, 2012-06-12 at 17:42 -0700, Ross Boylan wrote:

> > > > BEGIN
> > > >         select did, canonical into hostid, canonicalid from host
> > > >         where name = hostname;
> > > >         if FOUND then
> > > >            return;
> > > >            end if;
> > > >         if canonicalname is not NULL then
> > > >            select did  into canonicalid from host where name = canonicalname;
> > > >            if not FOUND then
> > > >               insert into host (name) values(canonicalname) returning did into canonicalid;
> > > >               end if;
> > > >            end if;
> > > >         if hostname != canonical then
> > >
> > > Is canonical a proper variable here? It's not in the argument list, and
> > > it's not DECLAREd. Did you mean canonicalname?
> > canonical is a column name in the table.  Perhaps canonical_did would be
> > more appropriate for it (and rename the output parameter
> > out_canonical_did from canonicalid).
>
> Does the function actually work like it is now?
No :)  Thanks for your persistence.  I made some changes after I tested
it.  That last bit should be
        if hostname != canonicalname then
                insert into host (name, canonical) values(hostname, canonicalid)
                       returning did into hostid;
        else
                hostid := canonicalid;
                end if;
        return;

There was a missing end if too.  I may rethink my indenting the "end if"
with the previous block, at least if I have an "else".  I'm also not
crazy about how emacs is handling indentation.

Ross


> It looks like
> "canonical" is being used outside of the scope of any query, so I don't
> see how it comes from the table. Maybe I'm still confused.
>
> Regards,
>     Jeff Davis
>

Re: coalesce in plpgsql, and other style questions

От

Daniel Staal

Дата:

13 июня 2012 г., 13:52:22

On 2012-06-12 20:42, Ross Boylan wrote:
> Thanks for your responses.  Some followup below.
> On Tue, 2012-06-12 at 16:30 -0700, Jeff Davis wrote:
>> On Tue, 2012-06-12 at 10:46 -0700, Ross Boylan wrote:
>> > I just wrote my first pl/pgsql function, and would appreciate any
>> > comments people have on it.  I'll be writing a bunch of similar
>> > functions, with semantics "give me the id of the object if exists,
>> > otherwise create it and give me the id."
>> >
>> > My solution seems excessively procedural to me.
>>
>> I think what you're trying to do is procedural, and that's not
>> necessarily bad. You are trying to get an existing ID if available,
>> or
>> assign a new ID if not; and I assume you are doing so to save the
>> space
>> associated with storing the full names (essentially like dictionary
>> encoding).
>>
>> >   I thought I could get
>> > the right semantics with something like
>> > select coalesce((select id from mytable where name='foo'),
>> >     (insert into mytable ('name') values('foo') returning id))
>> > but I could not get that to work in plgsql.
>>
>> In pl/pgsql, you can't just use SELECT, you have to use something
>> like
>> SELECT INTO the_id ... or use PERFORM.
>>
>> Also, you have a potential race condition there, because someone
>> might
>> insert "foo" into mytable after the select (first arg to coalesce)
>> and
>> before the insert (second arg to coalesce).
> Practically, it's just me so there shouldn't be any risk.  But I'd
> like
> to understand the general issue.  I thought transaction would take
> care
> of this, so that within a transaction the state of the database does
> not
> change from actions in other sessions.  Then if I commit and have
> conflict, the commit fails.
>
> I guess the sequence I'm using for did assures that did is unique
> across
> all transactions, and so the 2 transactions would not be in conflict,
> since they have different primary keys.
>
> As you said later, a unique constraint on one/some of the other
> fields
> will at least prevent bogus records that are "the same" from my point
> of
> view.  I was a bit on the fence about making it unique, since I
> thought
> I might have different hosts that shared the same name.  But those
> scenarios are unlikely, and I really don't want to deal with it if it
> does happen.
>
> But is my whole model of how transactions are operating off?  I'm
> basically generalizing from Gemstone, an object database.

A transaction means you see a consistent view of the data in the
database - that view is consistent and valid through the life of the
transaction, but no other transaction needs to see the same view.  Other
transactions may be going on at the same time, and changing data as they
see fit - you will see those changes within your transaction, but only
once they've finished their transaction.  Transactions do not lock
tables or rows, even at the serializable level.

What you appear to be thinking is that the transaction locks the
tables, and then tries to do a resolution of the lock at the end of the
transaction - serializable comes close to that, if everyone is modifying
the same table, but not quite.  Even under serializable, you'd still be
able to insert invalid data using the 'select, then insert if not in
select' logic.  (The difference is that under serializable, you couldn't
check to see if it had happened afterwards - at least not in the same
transaction.)

You need either a unique constraint - so the database checks as part of
the insert operation whether there is another of the same value - or a
table lock.  A table lock would allow you to set that no one else is
allowed to modify the table until you are done with it.

Your basic assumption for how transactions work is close to how simple
databases work, that basically lock the whole table or database while
you are working on it.  Which is simple and cheap to do - as long as you
aren't doing a whole lot in the database.  If you have a lot of action
going on in the database, the lock resolution eventually overwhelms the
simplicity - which is the whole reason why there are databases like
Postgresql, which can maintain good performance and data integrity
without that locking.

Daniel T. Staal

---------------------------------------------------------------
This email copyright the author.  Unless otherwise noted, you
are expressly allowed to retransmit, quote, or otherwise use
the contents for non-commercial purposes.  This copyright will
expire 5 years after the author's death, or in 30 years,
whichever is longer, unless such a period is in excess of
local copyright law.
---------------------------------------------------------------

Re: coalesce in plpgsql, and other style questions

От

Merlin Moncure

Дата:

13 июня 2012 г., 14:02:41

On Tue, Jun 12, 2012 at 12:46 PM, Ross Boylan <ross@biostat.ucsf.edu> wrote:
> I just wrote my first pl/pgsql function, and would appreciate any
> comments people have on it.  I'll be writing a bunch of similar
> functions, with semantics "give me the id of the object if exists,
> otherwise create it and give me the id."
>
> My solution seems excessively procedural to me.  I thought I could get
> the right semantics with something like
> select coalesce((select id from mytable where name='foo'),
>        (insert into mytable ('name') values('foo') returning id))
> but I could not get that to work in plgsql.

for posterity:

with a as (select id from mytable where name='foo'),
b as
(
  insert into mytable (name)
    select 'foo' where not exists (select 1 from a)
   returning id
)
select * from a union all select * from b;

I definitely appreciate the desire to avoid procedural implementations
of things like this.  Just be advised that this is a postgresql-ism
(data modifying 'with' is not standard syntax).   This also (as Jeff
notes) has no bearing on the race to the id: you must be prepared to
retry the above statement in face of concurrent attempts to insert to
the same unique value  unless you have taken a lock to guard against
this.  I don't think it's possible to work that lock into the CTE.

merlin

Re: coalesce in plpgsql, and other style questions

От

Ross Boylan

Дата:

13 июня 2012 г., 18:35:52

On Wed, 2012-06-13 at 09:02 -0500, Merlin Moncure wrote:
> On Tue, Jun 12, 2012 at 12:46 PM, Ross Boylan <ross@biostat.ucsf.edu> wrote:
> > I just wrote my first pl/pgsql function, and would appreciate any
> > comments people have on it.  I'll be writing a bunch of similar
> > functions, with semantics "give me the id of the object if exists,
> > otherwise create it and give me the id."
> >
> > My solution seems excessively procedural to me.  I thought I could get
> > the right semantics with something like
> > select coalesce((select id from mytable where name='foo'),
> >        (insert into mytable ('name') values('foo') returning id))
> > but I could not get that to work in plgsql.
>
> for posterity:
>
> with a as (select id from mytable where name='foo'),
> b as
> (
>   insert into mytable (name)
>     select 'foo' where not exists (select 1 from a)
>    returning id
> )
> select * from a union all select * from b;
Oh my!

Is that legal plpgsql code, or just regular (postgres) sql?

Also, what's CTE (below)?

Thanks.
Ross
>
> I definitely appreciate the desire to avoid procedural implementations
> of things like this.  Just be advised that this is a postgresql-ism
> (data modifying 'with' is not standard syntax).   This also (as Jeff
> notes) has no bearing on the race to the id: you must be prepared to
> retry the above statement in face of concurrent attempts to insert to
> the same unique value  unless you have taken a lock to guard against
> this.  I don't think it's possible to work that lock into the CTE.
>
> merlin

Re: coalesce in plpgsql, and other style questions

От

Merlin Moncure

Дата:

13 июня 2012 г., 18:54:46

On Wed, Jun 13, 2012 at 1:34 PM, Ross Boylan <ross@biostat.ucsf.edu> wrote:
> On Wed, 2012-06-13 at 09:02 -0500, Merlin Moncure wrote:
>> > My solution seems excessively procedural to me.  I thought I could get
>> > the right semantics with something like
>> > select coalesce((select id from mytable where name='foo'),
>> >        (insert into mytable ('name') values('foo') returning id))
>> > but I could not get that to work in plgsql.
>>
>> for posterity:
>>
>> with a as (select id from mytable where name='foo'),
>> b as
>> (
>>   insert into mytable (name)
>>     select 'foo' where not exists (select 1 from a)
>>    returning id
>> )
>> select * from a union all select * from b;
> Oh my!
>
> Is that legal plpgsql code, or just regular (postgres) sql?

It is both (but only in 9.1+, sorry!).  It's plain SQL, so is
acceptable in any place sql is allowed -- directly from the client,
sql functions, plpgsql functions, etc.  The ability to chain
'returning' into other queries via 'with' was a new feature which we
call 'data modifying with' added as of postgresql 9.1.  Vanilla CTEs
aka common table exrpressions  aka WITH statements -- were added in
8.4 but you can only use them with select statements.

Aside: I encourage you to continue with pl/pgsql.  It's the secret sauce.

merlin

Re: coalesce in plpgsql, and other style questions

От

Jeff Davis

Дата:

13 июня 2012 г., 19:10:39

On Wed, 2012-06-13 at 09:52 -0400, Daniel Staal wrote:
> What you appear to be thinking is that the transaction locks the
> tables, and then tries to do a resolution of the lock at the end of the
> transaction - serializable comes close to that, if everyone is modifying
> the same table, but not quite.  Even under serializable, you'd still be
> able to insert invalid data using the 'select, then insert if not in
> select' logic.  (The difference is that under serializable, you couldn't
> check to see if it had happened afterwards - at least not in the same
> transaction.)

Your statement was true in 9.0 and before, but in 9.1, SERIALIZABLE
means *truly serializable*.

Try it, and see. The 'select, then insert if not in select' logic will
throw a serialization error if another transaction races it, even if
there is no UNIQUE.

> You need either a unique constraint

I agree that a unique constraint is the right way to do it, because it's
a declarative constraint.

> Your basic assumption for how transactions work is close to how simple
> databases work, that basically lock the whole table or database while
> you are working on it.  Which is simple and cheap to do - as long as you
> aren't doing a whole lot in the database.  If you have a lot of action
> going on in the database, the lock resolution eventually overwhelms the
> simplicity - which is the whole reason why there are databases like
> Postgresql, which can maintain good performance and data integrity
> without that locking.

I strongly encourage you to do some experimentation on 9.1+ with
serializable transactions (all transactions must be serializable for it
to work).

See if you can find any anomalies, or any performance degradation. The
only expected performance degradation (aside from very strange cases) is
that there will be serialization errors, and you'll need to retry those
transactions. It does not cause any transactions to block that wouldn't
otherwise.

Think of it as a magic mode that turns SQL race conditions into errors.
You should still use appropriate locking and declarative constraints,
because those will allow more transactions to succeed (obviously, under
intense workloads you don't want a high rollback rate). And declarative
constraints also add to the readability/maintainability.

Regards,
    Jeff Davis

Re: coalesce in plpgsql, and other style questions

От

Daniel Staal

Дата:

14 июня 2012 г., 01:20:13

--As of June 13, 2012 12:10:18 PM -0700, Jeff Davis is alleged to have said:

> Your statement was true in 9.0 and before, but in 9.1, SERIALIZABLE
> means *truly serializable*.

You're right; sorry.  Bad me for relying on faulty memory and an old
manual.  ;)

>> You need either a unique constraint
>
> I agree that a unique constraint is the right way to do it, because it's
> a declarative constraint.

This would be my preference as well, regardless of the rest.  If you find
yourself trying to implement something that's solvable as an SQL constraint
with application-level code, you are doing something wrong.

>> Your basic assumption for how transactions work is close to how simple
>> databases work, that basically lock the whole table or database while
>> you are working on it.  Which is simple and cheap to do - as long as you
>> aren't doing a whole lot in the database.  If you have a lot of action
>> going on in the database, the lock resolution eventually overwhelms the
>> simplicity - which is the whole reason why there are databases like
>> Postgresql, which can maintain good performance and data integrity
>> without that locking.
>
> I strongly encourage you to do some experimentation on 9.1+ with
> serializable transactions (all transactions must be serializable for it
> to work).
>
> See if you can find any anomalies, or any performance degradation. The
> only expected performance degradation (aside from very strange cases) is
> that there will be serialization errors, and you'll need to retry those
> transactions. It does not cause any transactions to block that wouldn't
> otherwise.

I would in no way expect Postgresql to implement that change unless they
were sure it didn't cause any major issues.  I'm impressed though that they
managed it in the first place.  ;)  My statement above was to address why
you would tend to find this type of behavior on low-end databases as the
default: Because it's cheap and easy to implement poorly.  Just lock
everything.  Implementing it well is another thing entirely.

Daniel T. Staal

---------------------------------------------------------------
This email copyright the author.  Unless otherwise noted, you
are expressly allowed to retransmit, quote, or otherwise use
the contents for non-commercial purposes.  This copyright will
expire 5 years after the author's death, or in 30 years,
whichever is longer, unless such a period is in excess of
local copyright law.
---------------------------------------------------------------

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: coalesce in plpgsql, and other style questions