Обсуждение: Help with pre-loaded arbitrary key sequences

Поиск
Список
Период
Сортировка

Help with pre-loaded arbitrary key sequences

От
"James B. Byrne"
Дата:
I am prototyping a system migration that is to employ Ruby, Rails and
PostgreSQL.  Rails has the convention that the primary key of a row is an
arbitrary integer value assigned by the database manager through a
sequence.  As it turns out, the legacy application employs essentially the
same convention in most instances.

My question is this: Can one assign an id number to a sequenced key column
on create and override the sequencer?  If one does this then can and, if
so, how does the sequencer in Postgresql handle the eventuality of running
into a block of keys holding previously assigned numbers?

For example.  The existing client master dataset employs an eight digit
account number as primary key.  The values in use tend to cluster in
groups at each thousand increment, thus 1..375, 1001..1288, 2001..2225,
3001..3312, ..., 2001001..2001476, ..., etc.

Assuming that these existing entries were all loaded into the new table
with the values given as their primary keys and given that one could not
simply start the sequencer at a value above the highest existing value: If
one was to add a record and auto-generate a sequence number then can the
sequencer handle looping from 1 to 375 and returning 376 and then continue
until it reaches 1001, when it needs be loop again until 1288 and then
return 1289 and so forth?

During the load of the initial table data it would probably be necessary
to disable the sequencer for this column. Is this in fact the case? If so,
how is this done and how is the sequencer restored after the initial
migration of data is complete?

I presume that I can write my own sequencer function to accomplish this in
any case but I wish to know if the existing method handles this case.
Another solution is to simply decouple the existing key value from the new
and treat the exiting client number as a piece of data (with or without an
index), but that seems redundant since the new arbitrary key value might
just as well be the existing arbitrary key value.

Dispensing with the existing arbitrary number is another option but, the
fact is that present business practice is for employees to refer to their
clients and vendors by account number. The existing computer system is 25
years old but employs account numbers that predate automation. The firm is
well past the century mark and some of these numbers have been in use with
a few clients from the end of 1800's.  So, while not strictly a business
case, both the firm and some of its clients have a strong, if irrational,
attachment to preserving the existing scheme.

I regret if these questions appear naive but I am struggling with a lot of
new information on a number of fronts and as usual wish to get quick
answers to questions that may be far more involved than I realize.

Sincerely,

--
***          E-Mail is NOT a SECURE channel          ***
James B. Byrne                mailto:ByrneJB@Harte-Lyne.ca
Harte & Lyne Limited          http://www.harte-lyne.ca
9 Brockley Drive              vox: +1 905 561 1241
Hamilton, Ontario             fax: +1 905 561 0757
Canada  L8E 3C3


Re: Help with pre-loaded arbitrary key sequences

От
Martijn van Oosterhout
Дата:
On Fri, Jan 11, 2008 at 11:43:54AM -0500, James B. Byrne wrote:
> My question is this: Can one assign an id number to a sequenced key column
> on create and override the sequencer?  If one does this then can and, if
> so, how does the sequencer in Postgresql handle the eventuality of running
> into a block of keys holding previously assigned numbers?

You can set the counter during create, or at any time later. However,
the counter is not defined by the column as such and will happily
return numbers already in the table if you screw it up.

The usual process is to insert normally when loading the data and then
do a setval() on the sequence to past the values already stored.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Those who make peaceful revolution impossible will make violent revolution inevitable.
>  -- John F Kennedy

Вложения

Re: Help with pre-loaded arbitrary key sequences

От
"Scott Marlowe"
Дата:
On Jan 11, 2008 10:43 AM, James B. Byrne <byrnejb@harte-lyne.ca> wrote:
> I am prototyping a system migration that is to employ Ruby, Rails and
> PostgreSQL.  Rails has the convention that the primary key of a row is an
> arbitrary integer value assigned by the database manager through a
> sequence.  As it turns out, the legacy application employs essentially the
> same convention in most instances.
>
> My question is this: Can one assign an id number to a sequenced key column
> on create and override the sequencer?  If one does this then can and, if
> so, how does the sequencer in Postgresql handle the eventuality of running
> into a block of keys holding previously assigned numbers?
>
> For example.  The existing client master dataset employs an eight digit
> account number as primary key.  The values in use tend to cluster in
> groups at each thousand increment, thus 1..375, 1001..1288, 2001..2225,
> 3001..3312, ..., 2001001..2001476, ..., etc.
>
> Assuming that these existing entries were all loaded into the new table
> with the values given as their primary keys and given that one could not
> simply start the sequencer at a value above the highest existing value: If
> one was to add a record and auto-generate a sequence number then can the
> sequencer handle looping from 1 to 375 and returning 376 and then continue
> until it reaches 1001, when it needs be loop again until 1288 and then
> return 1289 and so forth?

You're essentially wanting to fill in the blanks here.  If you need
good performance, then what you'll need to do is to preallocate all
the numbers that haven't been assigned somewhere.  So, we make a table
something like:

create table locatorcodes (i int, count_id serial);

Then we insert an id into that table for everyone that's missing from
the main table:

insert into locatorcodes (i)
    select b.i from (
        select * from generate_series(1,1000000)as i
    ) as b
    left join main_table a on (b.i=a.i)
    where a.i is null;

Or something like that.

Now, we've got a table with all the unused ids, and a serial count
assigned to them.  Create another sequence:

create checkout_sequence;

and use that to "check out" numbers from locatorcodes:

select i from locatorcodes where count_id=nextval('checkout_sequence');

And since the sequence will just count up, there's little or no
problems with performance.

There's lots of ways of handling this.  That's just one of the ones
that doesn't slow your database down a lot.

If you need to, you can shuffle the numbers going into the
locatorcodes table with an order by random() when you create it.

Re: Help with pre-loaded arbitrary key sequences

От
"James B. Byrne"
Дата:
On Wed, January 16, 2008 18:40, Scott Marlowe wrote:
> You're essentially wanting to fill in the blanks here.  If you need
> good performance, then what you'll need to do is to preallocate all
> the numbers that haven't been assigned somewhere.  So, we make a table
> something like:
>
> create table locatorcodes (i int, count_id serial);
>
> Then we insert an id into that table for everyone that's missing from
> the main table:
>
> insert into locatorcodes (i)
>     select b.i from (
>         select * from generate_series(1,1000000)as i
>     ) as b
>     left join main_table a on (b.i=a.i)
>     where a.i is null;
>
> Or something like that.
>
> Now, we've got a table with all the unused ids, and a serial count
> assigned to them.  Create another sequence:
>
> create checkout_sequence;
>
> and use that to "check out" numbers from locatorcodes:
>
> select i from locatorcodes where count_id=nextval('checkout_sequence');
>
> And since the sequence will just count up, there's little or no
> problems with performance.
>
> There's lots of ways of handling this.  That's just one of the ones
> that doesn't slow your database down a lot.
>
> If you need to, you can shuffle the numbers going into the
> locatorcodes table with an order by random() when you create it.
>

Martin and Scott,

Thank you both for your responses.

If the entries involved numbered in the millions then Scott's approach has
considerable merit.  In my case, as the rate of additions is very low and
the size of the existing blocks is in the hundreds rather than hundreds of
thousands then I believe that I will simply write my own iterator and do a
repetitive select when on the incrementally proposed values until an
opening is found then insert the new entry and update the iterator next
value accordingly.

Regards,

--
***          E-Mail is NOT a SECURE channel          ***
James B. Byrne                mailto:ByrneJB@Harte-Lyne.ca
Harte & Lyne Limited          http://www.harte-lyne.ca
9 Brockley Drive              vox: +1 905 561 1241
Hamilton, Ontario             fax: +1 905 561 0757
Canada  L8E 3C3


Re: Help with pre-loaded arbitrary key sequences

От
"Scott Marlowe"
Дата:
On Jan 17, 2008 9:05 AM, James B. Byrne <byrnejb@harte-lyne.ca> wrote:
>
> If the entries involved numbered in the millions then Scott's approach has
> considerable merit.  In my case, as the rate of additions is very low and
> the size of the existing blocks is in the hundreds rather than hundreds of
> thousands then I believe that I will simply write my own iterator and do a
> repetitive select when on the incrementally proposed values until an
> opening is found then insert the new entry and update the iterator next
> value accordingly.

If race conditions are a possible issue, you use a sequence and
increment that until you get a number that isn't used.  That way two
clients connecting at the same time can get different, available
numbers.

Re: Help with pre-loaded arbitrary key sequences

От
"James B. Byrne"
Дата:
On Thu, January 17, 2008 10:15, Scott Marlowe wrote:
>
> If race conditions are a possible issue, you use a sequence and
> increment that until you get a number that isn't used.  That way two
> clients connecting at the same time can get different, available
> numbers.
>

That is close to the idea that I originally had.  I was simply wondering
if the built-in sequencer could handle this case or whether I need to roll
my own.

--
***          E-Mail is NOT a SECURE channel          ***
James B. Byrne                mailto:ByrneJB@Harte-Lyne.ca
Harte & Lyne Limited          http://www.harte-lyne.ca
9 Brockley Drive              vox: +1 905 561 1241
Hamilton, Ontario             fax: +1 905 561 0757
Canada  L8E 3C3


Re: Help with pre-loaded arbitrary key sequences

От
"Scott Marlowe"
Дата:
On Jan 17, 2008 9:19 AM, James B. Byrne <byrnejb@harte-lyne.ca> wrote:
>
> On Thu, January 17, 2008 10:15, Scott Marlowe wrote:
> >
> > If race conditions are a possible issue, you use a sequence and
> > increment that until you get a number that isn't used.  That way two
> > clients connecting at the same time can get different, available
> > numbers.
> >
>
> That is close to the idea that I originally had.  I was simply wondering
> if the built-in sequencer could handle this case or whether I need to roll
> my own.

Yeah, the built in sequencer just increments by one, nothing else.
But it should be pretty easy to write a pl/pgsql function that grabs
the next value and loop until it finds one that's available.

Re: Help with pre-loaded arbitrary key sequences

От
"Scott Marlowe"
Дата:
On Jan 17, 2008 9:19 AM, James B. Byrne <byrnejb@harte-lyne.ca> wrote:
>
> On Thu, January 17, 2008 10:15, Scott Marlowe wrote:
> >
> > If race conditions are a possible issue, you use a sequence and
> > increment that until you get a number that isn't used.  That way two
> > clients connecting at the same time can get different, available
> > numbers.
> >
>
> That is close to the idea that I originally had.  I was simply wondering
> if the built-in sequencer could handle this case or whether I need to roll
> my own.

Got bored, hacked this aggregious pl/pgsql routine up.  It looks
horrible, but I wanted it to be able to use indexes.  Seems to work.
Test has ~750k rows and returns in it and returns a new id in  < 1ms
on my little server.

File attached.

Вложения

Re: Help with pre-loaded arbitrary key sequences

От
"James B. Byrne"
Дата:
On Thu, January 17, 2008 11:48, Scott Marlowe wrote:

> Got bored, hacked this aggregious pl/pgsql routine up.  It looks
> horrible, but I wanted it to be able to use indexes.  Seems to work.
> Test has ~750k rows and returns in it and returns a new id in  < 1ms
> on my little server.
>
> File attached.
>

Many thanks.

--
***          E-Mail is NOT a SECURE channel          ***
James B. Byrne                mailto:ByrneJB@Harte-Lyne.ca
Harte & Lyne Limited          http://www.harte-lyne.ca
9 Brockley Drive              vox: +1 905 561 1241
Hamilton, Ontario             fax: +1 905 561 0757
Canada  L8E 3C3