Обсуждение: partitioning question -- how to guarantee uniqueness across partitions
I have a simple example copied from the 8.3 manual on partitioning (http://www.postgresql.org/docs/8.3/interactive/ddl-partitioning.html). My question is, if you create a serial type in the parent table which is meant to be the primary key across all the partitions, how does one guarantee uniqueness of that key? This example is contrived, but it shows that there is no constraint to prevent insertion of duplicate 'id' values into separate partition tables. (after running the example there should be two id=100 rows in the table). This makes sense because each partition only guarantees uniqueness for it's own rows, and the primary key constraint on the parent table never comes into play. Do I need to add a test in the trigger function to prevent insertion of duplicate 'id' values? I realize that if I used a partitioning function that was based on ranges of the primary key, then I would be guaranteed unique ranges of id for each table, but most partitioning examples I've seen usually test for date ranges which is similar to this example. Any practical thoughts on this issue would be appreciated. Thanks, Whit begin; CREATE TABLE part_main ( id serial primary key, data_value integer not null ); CREATE TABLE part_10 (CHECK (data_value <= 10)) inherits (part_main); ALTER TABLE part_10 ADD PRIMARY KEY (id); CREATE TABLE part_20 (CHECK (data_value > 10 and data_value <= 20 )) inherits (part_main); ALTER TABLE part_20 ADD PRIMARY KEY (id); CREATE OR REPLACE FUNCTION part_insert_trigger() RETURNS TRIGGER AS $$ BEGIN IF ( NEW.data_value <= 10) THEN INSERT INTO part_10 VALUES (NEW.*); ELSEIF ( NEW.data_value > 10 and NEW.data_value <= 20) THEN INSERT INTO part_20 VALUES (NEW.*); ELSE RAISE EXCEPTION 'value out of range. Fix the part_insert_trigger() function!'; END IF; RETURN NULL; END; $$ LANGUAGE plpgsql; CREATE TRIGGER insert_part_trigger BEFORE INSERT ON part_main FOR EACH ROW EXECUTE PROCEDURE part_insert_trigger(); insert into part_main (id,data_value) values (100,10), (100,20); commit;
Whit Armstrong <armstrong.whit@gmail.com> writes: > I have a simple example copied from the 8.3 manual on partitioning > (http://www.postgresql.org/docs/8.3/interactive/ddl-partitioning.html). > My question is, if you create a serial type in the parent table which > is meant to be the primary key across all the partitions, how does one > guarantee uniqueness of that key? One doesn't. That is not an appropriate way to set up a partitioned table. You need a primary key that can actually be used as a meaningful partitioning key. In this example, the id is completely useless and what you should be looking at is making the data_value be the primary key. regards, tom lane
Thanks, Tom. Let me give a little more detail on my actual data rather than the simple example I sent. I have a 60GB table of loan balances, which I've partitioned into 26 tables. The loan id's are a sequence of 6 characters, so the partitioning rule I've used is the first character of the loan id, which yields roughly equal sized partitions of 2.8 GB or so. Each loan can only have one balance per month, so the primary key on each partition is set to be loan_id and asofdate. However, this data is meant to be available via a rails application, hence, the need for a surrogate key of integers which is unique across the entire set of partitions. Creation of new rows in the partitioned tables should not be an issue under normal circumstances because I see that all of the child tables use the same sequence for generating new id's. However, what makes me nervous is that there is no explicit constraint in the database that prevents duplicate id's from being created, and I'm not sure how the rails app would react if for whatever reason duplicate id keys wound up in the table. Any suggestions? Thanks, Whit On Sun, Jun 28, 2009 at 1:27 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > Whit Armstrong <armstrong.whit@gmail.com> writes: >> I have a simple example copied from the 8.3 manual on partitioning >> (http://www.postgresql.org/docs/8.3/interactive/ddl-partitioning.html). > >> My question is, if you create a serial type in the parent table which >> is meant to be the primary key across all the partitions, how does one >> guarantee uniqueness of that key? > > One doesn't. That is not an appropriate way to set up a partitioned > table. You need a primary key that can actually be used as a meaningful > partitioning key. In this example, the id is completely useless and > what you should be looking at is making the data_value be the primary > key. > > regards, tom lane >
> Hi Whit, > > you could consider using GUIDs instead of Integer for primary keys. > Here is a nice blog post explaining how to do this in Rails: > http://ariejan.net/2008/08/12/ruby-on-rails-uuid-as-your-activerecord... > > Best Regards, > Arndt Lehmann just a note: As per above blog post, the GUID are generated using a Ruby module. PostgreSQL actually comes with a baked in UUID column type. That probably will be a more efficient solution. Regards, Arndt
On Jun 29, 3:45 am, armstrong.w...@gmail.com (Whit Armstrong) wrote: > Thanks, Tom. > > Let me give a little more detail on my actual data rather than the > simple example I sent. > > I have a 60GB table of loan balances, which I've partitioned into 26 tables. > > The loan id's are a sequence of 6 characters, so the partitioning rule > I've used is the first character of the loan id, which yields roughly > equal sized partitions of 2.8 GB or so. > > Each loan can only have one balance per month, so the primary key on > each partition is set to be loan_id and asofdate. > > However, this data is meant to be available via a rails application, > hence, the need for a surrogate key of integers which is unique across > the entire set of partitions. > > Creation of new rows in the partitioned tables should not be an issue > under normal circumstances because I see that all of the child tables > use the same sequence for generating new id's. > > However, what makes me nervous is that there is no explicit constraint > in the database that prevents duplicate id's from being created, and > I'm not sure how the rails app would react if for whatever reason > duplicate id keys wound up in the table. > > Any suggestions? > > Thanks, > Whit > > > > On Sun, Jun 28, 2009 at 1:27 PM, Tom Lane<t...@sss.pgh.pa.us> wrote: > > Whit Armstrong <armstrong.w...@gmail.com> writes: > >> I have a simple example copied from the 8.3 manual on partitioning > >> (http://www.postgresql.org/docs/8.3/interactive/ddl-partitioning.html). > > >> My question is, if you create a serial type in the parent table which > >> is meant to be the primary key across all the partitions, how does one > >> guarantee uniqueness of that key? > > > One doesn't. That is not an appropriate way to set up a partitioned > > table. You need a primary key that can actually be used as a meaningful > > partitioning key. In this example, the id is completely useless and > > what you should be looking at is making the data_value be the primary > > key. > > > regards, tom lane > > -- > Sent via pgsql-general mailing list (pgsql-gene...@postgresql.org) > To make changes to your subscription:http://www.postgresql.org/mailpref/pgsql-general Hi Whit, you could consider using GUIDs instead of Integer for primary keys. Here is a nice blog post explaining how to do this in Rails: http://ariejan.net/2008/08/12/ruby-on-rails-uuid-as-your-activerecord-primary-key/ Best Regards, Arndt Lehmann
> Hi Whit, > > you could consider using GUIDs instead of Integer for primary keys. > Here is a nice blog post explaining how to do this in Rails: > http://ariejan.net/2008/08/12/ruby-on-rails-uuid-as-your-activerecord... > > Best Regards, > Arndt Lehmann just a note: As per above blog post, the GUID are generated using a Ruby module. PostgreSQL actually comes with a baked in UUID column type. That probably will be a more efficient solution. Regards, Arndt
On Jun 28, 2009, at 11:45 AM, Whit Armstrong wrote: > Thanks, Tom. > > Let me give a little more detail on my actual data rather than the > simple example I sent. > > I have a 60GB table of loan balances, which I've partitioned into 26 > tables. > > The loan id's are a sequence of 6 characters, so the partitioning rule > I've used is the first character of the loan id, which yields roughly > equal sized partitions of 2.8 GB or so. > > Each loan can only have one balance per month, so the primary key on > each partition is set to be loan_id and asofdate. > > However, this data is meant to be available via a rails application, > hence, the need for a surrogate key of integers which is unique across > the entire set of partitions. > > Creation of new rows in the partitioned tables should not be an issue > under normal circumstances because I see that all of the child tables > use the same sequence for generating new id's. > > However, what makes me nervous is that there is no explicit constraint > in the database that prevents duplicate id's from being created, and > I'm not sure how the rails app would react if for whatever reason > duplicate id keys wound up in the table. As long as your inserts always use the default value, nextval('sequence_name'), for the id values then that can never happen unless you at some point use setval('sequence_name', X) where X <= the max value already present in your partitioned table, which you should never be doing anyway. Erik Jones, Database Administrator Engine Yard Support, Scalability, Reliability 866.518.9273 x 260 Location: US/Pacific IRC: mage2k