Discussion: Could postgres12 support millions of sequences? (like 10 million)

Could postgres12 support millions of sequences? (like 10 million)

From:
pabloa98
Date:
Hello,

My schema requires a counter for each combination of 2 values. Something like:

CREATE TABLE counter(
group INT NOT NULL,
element INT NOT NULL,
seq_number INT NOT NULL default 0,
CONSTRAINT PRIMARY KEY (group, element)
);

For each entry in counter, aka for each (group, element) pair, the model requires a seq_number.

If I use a table "counter", I could still have counter collisions between 2 transactions. I need true sequence behavior. Is that possible by using a table like "counter", where the counter could be increased outside of the transaction so it performs as a sequence without having race conditions between concurrent transactions?

The other option is to create sequences for each new pair of (group, element) using triggers. There are millions of pairs. So this approach will generate millions of sequences.

How would a PostgreSQL database behave having millions of sequences in a schema? Would it degrade its performance? Is there any negative impact?

Regards

Pablo



Re: Could postgres12 support millions of sequences? (like 10 million)

From:
Rob Sargent
Date:

> On Mar 19, 2020, at 3:36 PM, pabloa98 <pabloa98@gmail.com> wrote:
>
> Hello,
>
> My schema requires a counter for each combination of 2 values. Something like:
>
> CREATE TABLE counter(
> group INT NOT NULL,
> element INT NOT NULL,
> seq_number INT NOT NULL default 0,
> CONSTRAINT PRIMARY KEY (group, element)
> );
>
> For each entry in counter, aka for each (group, element) pair, the model requires a seq_number.
>
> If I use a table "counter", I could still have counter collisions between 2 transactions. I need truly sequence behavior. Is that possible by using a table like "counter" table, where the counter could be increased out of the transaction so it performs as a sequence without having race conditions between concurrent transactions?
>
> The other option is to create sequences for each new pair of (group, element) using triggers. There are millions of pairs. So this approach will generate millions of sequences.
>
> How a PostgreSQL database would behave having millions of sequences in a schema? Would it degrade its performance? Is there any negative impact?
>
> Regards
>
> Pablo
>

To clarify, are you hoping for consecutive numbers as each row is added to the table, i.e. “serial”?

What is the intention of “seq_number”?
>
>




Re: Could postgres12 support millions of sequences? (like 10 million)

From:
pabloa98
Date:


On Thu, Mar 19, 2020 at 2:50 PM Rob Sargent <robjsargent@gmail.com> wrote:


> On Mar 19, 2020, at 3:36 PM, pabloa98 <pabloa98@gmail.com> wrote:
>
> Hello,
>
> My schema requires a counter for each combination of 2 values. Something like:
>
> CREATE TABLE counter(
> group INT NOT NULL,
> element INT NOT NULL,
> seq_number INT NOT NULL default 0,
> CONSTRAINT PRIMARY KEY (group, element)
> );
>
> For each entry in counter, aka for each (group, element) pair, the model requires a seq_number.
>
> If I use a table "counter", I could still have counter collisions between 2 transactions. I need truly sequence behavior. Is that possible by using a table like "counter" table, where the counter could be increased out of the transaction so it performs as a sequence without having race conditions between concurrent transactions?
>
> The other option is to create sequences for each new pair of (group, element) using triggers. There are millions of pairs. So this approach will generate millions of sequences.
>
> How a PostgreSQL database would behave having millions of sequences in a schema? Would it degrade its performance? Is there any negative impact?
>
> Regards
>
> Pablo
>

To clarify, are you hoping for consecutive numbers as the each row is added to the table, i.e. “serial”?

What is the intension of “seq_number”?
>
>


The idea is to have something like a serial sequence, but one for each pair of (group, element),

so that when we insert rows in another table, we could have something like:


group, element, event_id, ...
1, 1, 1
1, 1, 2
1, 1, 3
2, 1, 1
1, 1, 4
1, 3, 1
1, 1, 5
1, 3, 2
2, 1, 2
2, 1, 3

The 3rd column is the sequence number we get from the appropriate sequence created by the trigger.
 
I want to implement a variation of https://stackoverflow.com/a/30204854 and that will generate millions of sequences.


Re: Could postgres12 support millions of sequences? (like 10 million)

From:
Rob Sargent
Date:


On Mar 19, 2020, at 4:13 PM, pabloa98 <pabloa98@gmail.com> wrote:



On Thu, Mar 19, 2020 at 2:50 PM Rob Sargent <robjsargent@gmail.com> wrote:


> On Mar 19, 2020, at 3:36 PM, pabloa98 <pabloa98@gmail.com> wrote:
>
> Hello,
>
> My schema requires a counter for each combination of 2 values. Something like:
>
> CREATE TABLE counter(
> group INT NOT NULL,
> element INT NOT NULL,
> seq_number INT NOT NULL default 0,
> CONSTRAINT PRIMARY KEY (group, element)
> );
>
> For each entry in counter, aka for each (group, element) pair, the model requires a seq_number.
>
> If I use a table "counter", I could still have counter collisions between 2 transactions. I need truly sequence behavior. Is that possible by using a table like "counter" table, where the counter could be increased out of the transaction so it performs as a sequence without having race conditions between concurrent transactions?
>
> The other option is to create sequences for each new pair of (group, element) using triggers. There are millions of pairs. So this approach will generate millions of sequences.
>
> How a PostgreSQL database would behave having millions of sequences in a schema? Would it degrade its performance? Is there any negative impact?
>
> Regards
>
> Pablo
>

To clarify, are you hoping for consecutive numbers as the each row is added to the table, i.e. “serial”?

What is the intension of “seq_number”?
>
>


the idea is to have like a serial sequence, but for each pair of (group, element).

so that when we insert rows in another table, we could have something like:


group, element, event_id, ...
1, 1, 1
1, 1, 2
1, 1, 3
2, 1, 1
1, 1, 4
1, 3, 1
1, 1, 5
1, 3, 2
2, 1, 2
2, 1, 3

The 3rd column is the sequence number we get from the appropriate sequence created by the trigger.
 
I want to implement a variation of https://stackoverflow.com/a/30204854 and that will generate millions of sequences.


Then I don’t think group/element can be a PRIMARY KEY


Re: Could postgres12 support millions of sequences? (like 10 million)

From:
pabloa98
Date:


On Thu, Mar 19, 2020 at 3:17 PM Rob Sargent <robjsargent@gmail.com> wrote:


On Mar 19, 2020, at 4:13 PM, pabloa98 <pabloa98@gmail.com> wrote:



On Thu, Mar 19, 2020 at 2:50 PM Rob Sargent <robjsargent@gmail.com> wrote:


> On Mar 19, 2020, at 3:36 PM, pabloa98 <pabloa98@gmail.com> wrote:
>
> Hello,
>
> My schema requires a counter for each combination of 2 values. Something like:
>
> CREATE TABLE counter(
> group INT NOT NULL,
> element INT NOT NULL,
> seq_number INT NOT NULL default 0,
> CONSTRAINT PRIMARY KEY (group, element)
> );
>
> For each entry in counter, aka for each (group, element) pair, the model requires a seq_number.
>
> If I use a table "counter", I could still have counter collisions between 2 transactions. I need truly sequence behavior. Is that possible by using a table like "counter" table, where the counter could be increased out of the transaction so it performs as a sequence without having race conditions between concurrent transactions?
>
> The other option is to create sequences for each new pair of (group, element) using triggers. There are millions of pairs. So this approach will generate millions of sequences.
>
> How a PostgreSQL database would behave having millions of sequences in a schema? Would it degrade its performance? Is there any negative impact?
>
> Regards
>
> Pablo
>

To clarify, are you hoping for consecutive numbers as the each row is added to the table, i.e. “serial”?

What is the intension of “seq_number”?
>
>


the idea is to have like a serial sequence, but for each pair of (group, element).

so that when we insert rows in another table, we could have something like:


group, element, event_id, ...
1, 1, 1
1, 1, 2
1, 1, 3
2, 1, 1
1, 1, 4
1, 3, 1
1, 1, 5
1, 3, 2
2, 1, 2
2, 1, 3

The 3rd column is the sequence number we get from the appropriate sequence created by the trigger.
 
I want to implement a variation of https://stackoverflow.com/a/30204854 and that will generate millions of sequences.


Then I don’t thing group/element can be a PRIMARY KEY 


Thank you for your answer. Let me explain:

Table "counter" will have triggers that create a sequence when a new row is inserted, named counter_1_1_seq, counter_2_1_seq, counter_1_3_seq, etc., that will be used to insert values in **another** table. It will be used for nothing else. When we insert millions of group/element pairs, the trigger on that table will generate millions of sequences.
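A minimal sketch of what those triggers would produce for the pair (1, 1), assuming the naming scheme counter_<group>_<element>_seq (table and column names here are illustrative; Pablo posts his full trigger function later in the thread):

    -- created by the trigger when the row (1, 1) is inserted into "counter"
    CREATE SEQUENCE IF NOT EXISTS counter_1_1_seq;

    -- inserts into the other table then draw from that pair's own sequence
    -- INSERT INTO other_table (..., seq_number) VALUES (..., nextval('counter_1_1_seq'));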

My question is how PostgreSQL will behave. Could it deal with millions of sequences? What about system operations such as VACUUM, etc.?

Pablo

Re: Could postgres12 support millions of sequences? (like 10 million)

From:
"David G. Johnston"
Date:
On Thu, Mar 19, 2020 at 3:33 PM pabloa98 <pabloa98@gmail.com> wrote:
Table "counter" will have triggers that will create a sequence with a new row is inserted called counter_1_1_seq, counter_2_1_seq, counter_1_3_seq, etc that will be used to insert values in **another** table. It will be used for nothing else. When we insert millions of pairs group/element, the trigger in that table will generate millions of sequences. 

My question is how PostgreSQL will behave. Could it deal with millions of sequences? What about system operations as vacuum, etc?


First, it sounds like you care about there being no gaps in the records you end up saving.  If that is the case then sequences will not work for you.

.
.
.

If you are still reading because you can live with some gaps - then does having one sequence per pair really matter?

Regardless, at this scale you should probably set up a performance test as part of your continuous integration/test infrastructure, and let it answer the "which performs better" question.

However, one other consideration with sequences: do you care that PostgreSQL will cache/pin (i.e., no release) every single sequence you touch for the lifetime of the session? (I do not think DISCARD matters here but I'm just guessing)

A big determinant would seem to be how your sessions would go about using the sequences.  You've described the model requirement but haven't described the process by which the model will be used; and without that information useful comments pertaining to alternative implementations are difficult to formulate.

David J.

Re: Could postgres12 support millions of sequences? (like 10 million)

From:
Adrian Klaver
Date:
On 3/19/20 3:32 PM, pabloa98 wrote:
> 
> 
> On Thu, Mar 19, 2020 at 3:17 PM Rob Sargent <robjsargent@gmail.com 
> <mailto:robjsargent@gmail.com>> wrote:
> 
> 
> 
>>     On Mar 19, 2020, at 4:13 PM, pabloa98 <pabloa98@gmail.com
>>     <mailto:pabloa98@gmail.com>> wrote:
>>
>>
>>
>>     On Thu, Mar 19, 2020 at 2:50 PM Rob Sargent <robjsargent@gmail.com
>>     <mailto:robjsargent@gmail.com>> wrote:
>>
>>
>>
>>         > On Mar 19, 2020, at 3:36 PM, pabloa98 <pabloa98@gmail.com
>>         <mailto:pabloa98@gmail.com>> wrote:
>>         >
>>         > Hello,
>>         >
>>         > My schema requires a counter for each combination of 2
>>         values. Something like:
>>         >
>>         > CREATE TABLE counter(
>>         > group INT NOT NULL,
>>         > element INT NOT NULL,
>>         > seq_number INT NOT NULL default 0,
>>         > CONSTRAINT PRIMARY KEY (group, element)
>>         > );
>>         >
>>         > For each entry in counter, aka for each (group, element)
>>         pair, the model requires a seq_number.
>>         >
>>         > If I use a table "counter", I could still have counter
>>         collisions between 2 transactions. I need truly sequence
>>         behavior. Is that possible by using a table like "counter"
>>         table, where the counter could be increased out of the
>>         transaction so it performs as a sequence without having race
>>         conditions between concurrent transactions?
>>         >
>>         > The other option is to create sequences for each new pair of
>>         (group, element) using triggers. There are millions of pairs.
>>         So this approach will generate millions of sequences.
>>         >
>>         > How a PostgreSQL database would behave having millions of
>>         sequences in a schema? Would it degrade its performance? Is
>>         there any negative impact?
>>         >
>>         > Regards
>>         >
>>         > Pablo
>>         >
>>
>>         To clarify, are you hoping for consecutive numbers as the each
>>         row is added to the table, i.e. “serial”?
>>
>>         What is the intension of “seq_number”?
>>         >
>>         >
>>
>>
>>     the idea is to have like a serial sequence, but for each pair of
>>     (group, element).
>>
>>     so that when we insert rows in another table, we could have
>>     something like:
>>
>>
>>     group, element, event_id, ...
>>     1, 1, 1
>>     1, 1, 2
>>     1, 1, 3
>>     2, 1, 1
>>     1, 1, 4
>>     1, 3, 1
>>     1, 1, 5
>>     1, 3, 2
>>     2, 1, 2
>>     2, 1, 3
>>
>>     The 3rd column is the sequence number we get from the appropriate
>>     sequence created by the trigger.
>>     I want to implement a variation of
>>     https://stackoverflow.com/a/30204854 and that will generate
>>     millions of sequences.
>>
>>
>     Then I don’t thing group/element can be a PRIMARY KEY
> 
> 
> Thank you for your answer. Let me explain:
> 
> Table "counter" will have triggers that will create a sequence with a 
> new row is inserted called counter_1_1_seq, counter_2_1_seq, 
> counter_1_3_seq, etc that will be used to insert values in **another** 
> table. It will be used for nothing else. When we insert millions of 
> pairs group/element, the trigger in that table will generate millions of 
> sequences.

This has train wreck written all over it. If you are going to have a 
trigger for each combination of (group, element) I gotta believe the 
table will fall over before you run into sequence issues. Not sure why 
there can't be a serial column that supplies the sequence numbers and 
therefore only one sequence in play. From what I see all you care about 
is that:

group, element, event_id

has an increasing event_id for (group, element) pairs.
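A minimal sketch of that single-sequence idea (an interpretation, not Adrian's code; group is renamed to group_id since group is a reserved word):

    CREATE TABLE event(
        group_id INT NOT NULL,
        element  INT NOT NULL,
        event_id BIGSERIAL NOT NULL,   -- one global sequence behind the scenes
        PRIMARY KEY (group_id, element, event_id)
    );

    -- a per-pair, increasing numbering can then be derived at query time:
    SELECT group_id, element,
           row_number() OVER (PARTITION BY group_id, element ORDER BY event_id) AS code
    FROM event;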


> 
> My question is how PostgreSQL will behave. Could it deal with millions 
> of sequences? What about system operations as vacuum, etc?
> 
> Pablo


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Could postgres12 support millions of sequences? (like 10 million)

From:
pabloa98
Date:


On Thu, Mar 19, 2020 at 5:13 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 3/19/20 3:32 PM, pabloa98 wrote:
>
>
> On Thu, Mar 19, 2020 at 3:17 PM Rob Sargent <robjsargent@gmail.com
> <mailto:robjsargent@gmail.com>> wrote:
>
>
>
>>     On Mar 19, 2020, at 4:13 PM, pabloa98 <pabloa98@gmail.com
>>     <mailto:pabloa98@gmail.com>> wrote:
>>
>>
>>
>>     On Thu, Mar 19, 2020 at 2:50 PM Rob Sargent <robjsargent@gmail.com
>>     <mailto:robjsargent@gmail.com>> wrote:
>>
>>
>>
>>         > On Mar 19, 2020, at 3:36 PM, pabloa98 <pabloa98@gmail.com
>>         <mailto:pabloa98@gmail.com>> wrote:
>>         >
>>         > Hello,
>>         >
>>         > My schema requires a counter for each combination of 2
>>         values. Something like:
>>         >
>>         > CREATE TABLE counter(
>>         > group INT NOT NULL,
>>         > element INT NOT NULL,
>>         > seq_number INT NOT NULL default 0,
>>         > CONSTRAINT PRIMARY KEY (group, element)
>>         > );
>>         >
>>         > For each entry in counter, aka for each (group, element)
>>         pair, the model requires a seq_number.
>>         >
>>         > If I use a table "counter", I could still have counter
>>         collisions between 2 transactions. I need truly sequence
>>         behavior. Is that possible by using a table like "counter"
>>         table, where the counter could be increased out of the
>>         transaction so it performs as a sequence without having race
>>         conditions between concurrent transactions?
>>         >
>>         > The other option is to create sequences for each new pair of
>>         (group, element) using triggers. There are millions of pairs.
>>         So this approach will generate millions of sequences.
>>         >
>>         > How a PostgreSQL database would behave having millions of
>>         sequences in a schema? Would it degrade its performance? Is
>>         there any negative impact?
>>         >
>>         > Regards
>>         >
>>         > Pablo
>>         >
>>
>>         To clarify, are you hoping for consecutive numbers as the each
>>         row is added to the table, i.e. “serial”?
>>
>>         What is the intension of “seq_number”?
>>         >
>>         >
>>
>>
>>     the idea is to have like a serial sequence, but for each pair of
>>     (group, element).
>>
>>     so that when we insert rows in another table, we could have
>>     something like:
>>
>>
>>     group, element, event_id, ...
>>     1, 1, 1
>>     1, 1, 2
>>     1, 1, 3
>>     2, 1, 1
>>     1, 1, 4
>>     1, 3, 1
>>     1, 1, 5
>>     1, 3, 2
>>     2, 1, 2
>>     2, 1, 3
>>
>>     The 3rd column is the sequence number we get from the appropriate
>>     sequence created by the trigger.
>>     I want to implement a variation of
>>     https://stackoverflow.com/a/30204854 and that will generate
>>     millions of sequences.
>>
>>
>     Then I don’t thing group/element can be a PRIMARY KEY
>
>
> Thank you for your answer. Let me explain:
>
> Table "counter" will have triggers that will create a sequence with a
> new row is inserted called counter_1_1_seq, counter_2_1_seq,
> counter_1_3_seq, etc that will be used to insert values in **another**
> table. It will be used for nothing else. When we insert millions of
> pairs group/element, the trigger in that table will generate millions of
> sequences.

This has train wreck written all over it. If you are going to have a
trigger for each combination of (group, element) I gotta believe the
table will fall over before you run into sequence issues. Not sure why
there can't be serial column that has supplies the sequence numbers and
therefore only one sequence in play. From what I see all you care about
is that:

group, element, event_id

has an increasing event_id for (group, element) pairs.



I need a different sequence because it is a business requirement.
Please forget about the first example. I was not clear. It will be something like this.

I have a table called "pair":

CREATE TABLE pair(
group INT NOT NULL,
element INT NOT NULL,
CONSTRAINT PRIMARY KEY (group, element)
);

I must add entries to the table "event". This event table will have a code that will be generated using a sequence that is a function of (group, element).
CREATE TABLE event(
group INT NOT NULL,
element INT NOT NULL,
code INT NOT NULL,
CONSTRAINT PRIMARY KEY(code, element, group)
);

I could make column code SERIAL and be done with it. HOWEVER, the "code" column starting from 1 for each pair (group, element) is a business requirement. The business domain dictates that for each pair of (group, element) there is a code that will start from 1 to 99999999. It is OK to have gaps in the sequence.

So this table "event" will contain entries like:

     1, 1, 1
     1, 1, 2
     1, 1, 3
     2, 1, 1
     1, 1, 4
     1, 3, 1
     1, 1, 5
     1, 3, 2
     2, 1, 2
     2, 1, 3

These triplets will be inserted by several processes (aka several concurrent transactions could happen at the same time). Since they are concurrent transactions, it could be possible that a trivial calculation of "code" column (something like SELECT MAX(code) + 1 FROM event WHERE group=1 AND element = 1;) will generate collisions because 2 transactions could come with the same value of "code" at the same time.

So I need something like a sequence for each pair of (group, element)

So I added to the table "pair" a trigger to create/drop these sequences each time a new row is inserted or deleted.

Something like:

    CREATE FUNCTION make_pair_sequence() RETURNS trigger
      LANGUAGE plpgsql
    AS $make_pair_sequence$
      BEGIN
        IF (TG_OP = 'INSERT' OR TG_OP = 'UPDATE') THEN
          EXECUTE format('
            CREATE SEQUENCE IF NOT EXISTS code_%s_%s_seq
              INCREMENT 1
              START 1
              CACHE 1
              MINVALUE 1
              MAXVALUE 99999999  -- 8 digits
          ', NEW.group, NEW.element);
        ELSEIF (TG_OP = 'DELETE') THEN
          EXECUTE format('
            DROP SEQUENCE IF EXISTS code_%s_%s_seq
          ', OLD.group, OLD.element);  -- OLD, not NEW: NEW is null in a DELETE trigger
        -- ELSEIF (TG_OP = 'TRUNCATE') THEN
        END IF;
        RETURN NULL;  -- return value of an AFTER trigger is ignored
      END
    $make_pair_sequence$;

    DROP TRIGGER IF EXISTS make_pair_sequence
      ON pair CASCADE;
    CREATE TRIGGER make_pair_sequence
      AFTER INSERT OR UPDATE OR DELETE ON pair
      FOR EACH ROW EXECUTE FUNCTION make_pair_sequence();

each time I insert a pair with something like:

INSERT INTO PAIR( group, element) VALUES (1, 1);   -- trigger creates sequence code_1_1_seq
INSERT INTO PAIR( group, element) VALUES (1, 3);   -- trigger creates sequence code_1_3_seq
INSERT INTO PAIR( group, element) VALUES (2, 1);   -- trigger creates sequence code_2_1_seq
 
Later I will insert values in the "event" table with values like:

INSERT INTO event(group, element, code) VALUES (1, 1, nextval('code_1_1_seq'));
INSERT INTO event(group, element, code) VALUES (1, 3, nextval('code_1_3_seq'));
INSERT INTO event(group, element, code) VALUES (2, 1, nextval('code_2_1_seq'));
etc

And since there are millions of possible pairs, the database will get millions of sequences. The "code" column value generation on table "event" will have no race conditions and everything will work fine from the business point of view.

Now the questions are:

- Could PostgreSQL deal with millions of sequences? What will happen with maintenance commands (like VACUUM)? Will they suffer performance issues?
- Is there a better method of having independent, transaction-safe code counters with no locks?

Sorry about the first email. I did not explain clearly what the problem was.




>
> My question is how PostgreSQL will behave. Could it deal with millions
> of sequences? What about system operations as vacuum, etc?
>
> Pablo


--
Adrian Klaver
adrian.klaver@aklaver.com

Re: Could postgres12 support millions of sequences? (like 10 million)

From:
Rob Sargent
Date:


On Mar 19, 2020, at 6:45 PM, pabloa98 <pabloa98@gmail.com> wrote:



On Thu, Mar 19, 2020 at 5:13 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 3/19/20 3:32 PM, pabloa98 wrote:
> 
> 
> On Thu, Mar 19, 2020 at 3:17 PM Rob Sargent <robjsargent@gmail.com 
> <mailto:robjsargent@gmail.com>> wrote:

I have a table called "pair":

CREATE TABLE pair(
group INT NOT NULL,
element INT NOT NULL,
CONSTRAINT PRIMARY KEY (group, element)
);

I must add entries to the table "event". This table event will have a code that will be generated using a sequence of that is a function of s(group,element). 
CREATE TABLE event(
group INT NOT NULL,
element INT NOT NULL,
code INT NOT NULL,
CONSTRAINT PRIMARY KEY(code, element, group)
);

Unless the event table is searched by code more than by group, you probably want to maintain the key order from the pair table's primary key.

If gaps are OK, do you still need multiple sequences?


Re: Could postgres12 support millions of sequences? (like 10 million)

From:
pabloa98
Date:


On Thu, Mar 19, 2020 at 6:16 PM Rob Sargent <robjsargent@gmail.com> wrote:


On Mar 19, 2020, at 6:45 PM, pabloa98 <pabloa98@gmail.com> wrote:



On Thu, Mar 19, 2020 at 5:13 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 3/19/20 3:32 PM, pabloa98 wrote:
> 
> 
> On Thu, Mar 19, 2020 at 3:17 PM Rob Sargent <robjsargent@gmail.com 
> <mailto:robjsargent@gmail.com>> wrote:

I have a table called "pair":

CREATE TABLE pair(
group INT NOT NULL,
element INT NOT NULL,
CONSTRAINT PRIMARY KEY (group, element)
);

I must add entries to the table "event". This table event will have a code that will be generated using a sequence of that is a function of s(group,element). 
CREATE TABLE event(
group INT NOT NULL,
element INT NOT NULL,
code INT NOT NULL,
CONSTRAINT PRIMARY KEY(code, element, group)
);

Unless event table is searched by code more than group, you probably want to maintain the key order from the pair table's primary key.

OK. I will do that.

If gaps are ok do you still near multiple sequences?

I need to start each "code" value from 1 in each (group, element) pair. This is because we only have 99999999 codes for each pair. We do not want to waste numbers. If there is a gap it is OK, but not a gap of millions of numbers.

Re: Could postgres12 support millions of sequences? (like 10 million)

From:
Rob Sargent
Date:


On Mar 19, 2020, at 7:35 PM, pabloa98 <pabloa98@gmail.com> wrote:



On Thu, Mar 19, 2020 at 6:16 PM Rob Sargent <robjsargent@gmail.com> wrote:


On Mar 19, 2020, at 6:45 PM, pabloa98 <pabloa98@gmail.com> wrote:



On Thu, Mar 19, 2020 at 5:13 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 3/19/20 3:32 PM, pabloa98 wrote:
> 
> 
> On Thu, Mar 19, 2020 at 3:17 PM Rob Sargent <robjsargent@gmail.com 
> <mailto:robjsargent@gmail.com>> wrote:

I have a table called "pair":

CREATE TABLE pair(
group INT NOT NULL,
element INT NOT NULL,
CONSTRAINT PRIMARY KEY (group, element)
);

I must add entries to the table "event". This table event will have a code that will be generated using a sequence of that is a function of s(group,element). 
CREATE TABLE event(
group INT NOT NULL,
element INT NOT NULL,
code INT NOT NULL,
CONSTRAINT PRIMARY KEY(code, element, group)
);

Unless event table is searched by code more than group, you probably want to maintain the key order from the pair table's primary key.

OK. I will do that. 

If gaps are ok do you still near multiple sequences?

I need to start each "code" value from 1 in each (group, element) pair. This is because we only have 99999999 codes for each pair. We do not want to waste numbers. If there is a gap is OK but no a gap of millions of numbers.


Re: Could postgres12 support millions of sequences? (like 10 million)

From:
Michael Lewis
Date:


On Thu, Mar 19, 2020, 5:48 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
However, one other consideration with sequences: do you care that PostgreSQL will cache/pin (i.e., no release) every single sequence you touch for the lifetime of the session? (I do not think DISCARD matters here but I'm just guessing)


Would you expand on this point or is there someplace specific in the documentation on this?

Re: Could postgres12 support millions of sequences? (like 10 million)

From:
Tom Lane
Date:
Michael Lewis <mlewis@entrata.com> writes:
> On Thu, Mar 19, 2020, 5:48 PM David G. Johnston <david.g.johnston@gmail.com>
> wrote:
>> However, one other consideration with sequences: do you care that
>> PostgreSQL will cache/pin (i.e., no release) every single sequence you
>> touch for the lifetime of the session? (I do not think DISCARD matters here
>> but I'm just guessing)

> Would you expand on this point or is there someplace specific in the
> documentation on this?

I think what David is worried about is that a sequence object is a
one-row table in PG's implementation.  Thus

(1) each sequence requires a dozen or two rows in assorted system
catalogs (not sure exactly how many offhand).

(2) each sequence uses up 8KB on disk for its table file.

(3) each sequence you actually access within a particular session
results in creation of relcache and catcache entries in that
session's local memory.  I'm not sure offhand how big those are
either, but a few KB per sequence would be a reasonable guess.

(4) each sequence competes for space in the shared-buffer arena,
since its 8K block has to be swapped into there whenever you try
to access/increment the sequence's value.

This is all mighty inefficient of course, and there's been talk
of trying to reduce the per-sequence overhead; but I don't know
of anyone actively working on that.  As things stand, I think
having millions of sequences would be quite painful performance-
wise, especially if your workload were actively hitting a lot
of them concurrently.  It would work, for some value of "work",
but it wouldn't perform very well.

Also, as multiple people mentioned already, this still wouldn't
guarantee gap-free sequences of ID values.

            regards, tom lane
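A rough way to see that footprint on a scratch database (a sketch; exact catalog row counts vary by version):

    CREATE SEQUENCE probe_seq;

    -- one 8KB block on disk for the sequence's relation file
    SELECT pg_relation_size('probe_seq');                                    -- 8192

    -- rows spread across assorted system catalogs
    SELECT count(*) FROM pg_class     WHERE relname  = 'probe_seq';
    SELECT count(*) FROM pg_sequence  WHERE seqrelid = 'probe_seq'::regclass;
    SELECT count(*) FROM pg_attribute WHERE attrelid = 'probe_seq'::regclass;
    SELECT count(*) FROM pg_depend    WHERE objid    = 'probe_seq'::regclass;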



Re: Could postgres12 support millions of sequences? (like 10 million)

From:
Adrian Klaver
Date:
On 3/19/20 7:38 PM, Michael Lewis wrote:
> 
> 
> On Thu, Mar 19, 2020, 5:48 PM David G. Johnston 
> <david.g.johnston@gmail.com <mailto:david.g.johnston@gmail.com>> wrote:
> 
>     However, one other consideration with sequences: do you care that
>     PostgreSQL will cache/pin (i.e., no release) every single sequence
>     you touch for the lifetime of the session? (I do not think DISCARD
>     matters here but I'm just guessing)
> 
> 
> 
> Would you expand on this point or is there someplace specific in the 
> documentation on this?
> 

See the section starting here:

https://www.postgresql.org/docs/12/sql-createsequence.html

Notes

"Unexpected results might be obtained if a cache setting greater than 
one is used for a sequence object that will be used concurrently by 
multiple sessions. Each session will allocate and cache successive 
sequence values during one access to the sequence object and increase 
the sequence object's last_value accordingly. Then, the next cache-1 
uses of nextval within that session simply return the preallocated 
values without touching the sequence object. So, any numbers allocated 
but not used within a session will be lost when that session ends, 
resulting in “holes” in the sequence.

...
"


-- 
Adrian Klaver
adrian.klaver@aklaver.com
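A small sketch of the behavior those notes describe (the sequence name is illustrative):

    CREATE SEQUENCE cached_seq CACHE 5;

    -- session A
    SELECT nextval('cached_seq');   -- returns 1; values 2-5 are preallocated to session A
    -- session B
    SELECT nextval('cached_seq');   -- returns 6, not 2; session B preallocates 6-10
    -- if session A now disconnects, values 2-5 are never handed out: a hole of 4 numbers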



Re: Could postgres12 support millions of sequences? (like 10 million)

From:
pabloa98
Date:


On Thu, Mar 19, 2020 at 9:12 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 3/19/20 7:38 PM, Michael Lewis wrote:
>
>
> On Thu, Mar 19, 2020, 5:48 PM David G. Johnston
> <david.g.johnston@gmail.com <mailto:david.g.johnston@gmail.com>> wrote:
>
>     However, one other consideration with sequences: do you care that
>     PostgreSQL will cache/pin (i.e., no release) every single sequence
>     you touch for the lifetime of the session? (I do not think DISCARD
>     matters here but I'm just guessing)
>
>
>
> Would you expand on this point or is there someplace specific in the
> documentation on this?
>

See the section starting here:

https://www.postgresql.org/docs/12/sql-createsequence.html

Notes

"Unexpected results might be obtained if a cache setting greater than
one is used for a sequence object that will be used concurrently by
multiple sessions. Each session will allocate and cache successive
sequence values during one access to the sequence object and increase
the sequence object's last_value accordingly. Then, the next cache-1
uses of nextval within that session simply return the preallocated
values without touching the sequence object. So, any numbers allocated
but not used within a session will be lost when that session ends,
resulting in “holes” in the sequence.

...
"

We will use a CACHE 1. This is because when nextval('seq') is invoked, we are hitting 3 or 4 more tables so the sequence will not be a performance blocker (compared with all the operations in the transaction).


--
Adrian Klaver
adrian.klaver@aklaver.com

Re: Could postgres12 support millions of sequences? (like 10 million)

From:
pabloa98
Date:
I see.

Any suggestion? It should behave like a sequence in the sense that concurrent transactions will get different numbers from this alternative sequence-like solution.

In our case, we will need to call nextval('some_seq') (or similar) from different processes no more than twice every minute.


It would be nice to have a sequence data type. Correct me if I am wrong, but it seems to me that a sequence data type would cost the same or less than the current sequence implementation.

The schema would be more clear too. We could have a table like:

CREATE TABLE pair(
group INT NOT NULL,
element INT NOT NULL,
seq SEQUENCE INCREMENT 1
              START 1
              CACHE 1
              MINVALUE 1
              MAXVALUE 99999999
        NOT NULL,
CONSTRAINT PRIMARY KEY (group, element)
);

And then:

INSERT INTO event(group, element, code)
VALUES (
         1,
         1,
         ( SELECT seq.nextval('code_1_1_seq') FROM pair p WHERE p.group=1 and p.code=1 )
);

Or perhaps storing all the sequences in the same table as rows will have the same behavior.

Pablo


On Thu, Mar 19, 2020 at 7:56 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Michael Lewis <mlewis@entrata.com> writes:
> On Thu, Mar 19, 2020, 5:48 PM David G. Johnston <david.g.johnston@gmail.com>
> wrote:
>> However, one other consideration with sequences: do you care that
>> PostgreSQL will cache/pin (i.e., no release) every single sequence you
>> touch for the lifetime of the session? (I do not think DISCARD matters here
>> but I'm just guessing)

> Would you expand on this point or is there someplace specific in the
> documentation on this?

I think what David is worried about is that a sequence object is a
one-row table in PG's implementation.  Thus

(1) each sequence requires a dozen or two rows in assorted system
catalogs (not sure exactly how many offhand).

(2) each sequence uses up 8KB on disk for its table file.

(3) each sequence you actually access within a particular session
results in creation of relcache and catcache entries in that
session's local memory.  I'm not sure offhand how big those are
either, but a few KB per sequence would be a reasonable guess.

(4) each sequence competes for space in the shared-buffer arena,
since its 8K block has to be swapped into there whenever you try
to access/increment the sequence's value.

This is all mighty inefficient of course, and there's been talk
of trying to reduce the per-sequence overhead; but I don't know
of anyone actively working on that.  As things stand, I think
having millions of sequences would be quite painful performance-
wise, especially if your workload were actively hitting a lot
of them concurrently.  It would work, for some value of "work",
but it wouldn't perform very well.

Also, as multiple people mentioned already, this still wouldn't
guarantee gap-free sequences of ID values.

                        regards, tom lane

Re: Could postgres12 support millions of sequences? (like 10 million)

From:
"David G. Johnston"
Date:
On Thursday, March 19, 2020, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Michael Lewis <mlewis@entrata.com> writes:
> On Thu, Mar 19, 2020, 5:48 PM David G. Johnston <david.g.johnston@gmail.com>
> wrote:
>> However, one other consideration with sequences: do you care that
>> PostgreSQL will cache/pin (i.e., no release) every single sequence you
>> touch for the lifetime of the session? (I do not think DISCARD matters here
>> but I'm just guessing)

> Would you expand on this point or is there someplace specific in the
> documentation on this?

I think what David is worried about is that a sequence object is a
one-row table in PG's implementation.  Thus

(1) each sequence requires a dozen or two rows in assorted system
catalogs (not sure exactly how many offhand).

Actually I seemed to have missed that dynamic.  I was actually referring to the SeqTable hash table specified here:


I wouldn’t think there would be much spreading of data throughout the catalog if the sequences are unowned (by tables).

David J.

Re: Could postgres12 support millions of sequences? (like 10 million)

From:
"David G. Johnston"
Date:
On Thursday, March 19, 2020, pabloa98 <pabloa98@gmail.com> wrote:

We will use a CACHE 1. This is because when nextval('seq') is invoked, we are hitting 3 or 4 more tables so the sequence will not be a performance blocker (compared with all the operations in the transaction).

The other implementation detail to remember is:

“Because nextval and setval calls are never rolled back, sequence objects cannot be used if gapless assignment of sequence numbers is needed.”

David J.
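A sketch of what that means in practice, using one of the sequence names from earlier in the thread:

    CREATE SEQUENCE code_1_1_seq;

    BEGIN;
    SELECT nextval('code_1_1_seq');   -- returns 1
    ROLLBACK;                         -- the transaction is undone, the sequence advance is not

    SELECT nextval('code_1_1_seq');   -- returns 2, so value 1 becomes a permanent gap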

 

Re: Could postgres12 support millions of sequences? (like 10 million)

From:
rob stone
Date:
Hello,

On Thu, 2020-03-19 at 14:36 -0700, pabloa98 wrote:
> Hello,
> 
> My schema requires a counter for each combination of 2 values.
> Something like:
> 
> CREATE TABLE counter(
> group INT NOT NULL,
> element INT NOT NULL,
> seq_number INT NOT NULL default 0,
> CONSTRAINT PRIMARY KEY (group, element)
> );
> 
> For each entry in counter, aka for each (group, element) pair, the
> model requires a seq_number.
> 
> If I use a table "counter", I could still have counter collisions
> between 2 transactions. I need truly sequence behavior. Is that
> possible by using a table like "counter" table, where the counter
> could be increased out of the transaction so it performs as a
> sequence without having race conditions between concurrent
> transactions?
> 
> The other option is to create sequences for each new pair of (group,
> element) using triggers. There are millions of pairs. So this
> approach will generate millions of sequences.
> 
> How a PostgreSQL database would behave having millions of sequences
> in a schema? Would it degrade its performance? Is there any negative
> impact?
> 
> Regards
> 
> Pablo
> 
> 
> 


Have you tried the following:-

BEGIN;
SELECT (seq_number + 1) FROM counter WHERE group = $1 
AND element = $2 FOR UPDATE; 

If that gives a NOTFOUND exception, then you need to insert into table
counter using a seq_number = 1, or update it with (seq_number + 1).
Then do the insert into whatever table needs the seq_number value.

If no errors detected, then COMMIT else ROLLBACK.
It is all in one transaction block.
We use this approach to assign invoice numbers.

You could pre-seed table counter when a group/element pair is created
for the first time.

HTH,
Robert
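A compact sketch of the same pattern using INSERT ... ON CONFLICT instead of a separate SELECT ... FOR UPDATE (column names are illustrative, with group renamed because it is a reserved word):

    BEGIN;

    INSERT INTO counter (group_id, element, seq_number)
    VALUES ($1, $2, 1)
    ON CONFLICT (group_id, element)
    DO UPDATE SET seq_number = counter.seq_number + 1
    RETURNING seq_number;             -- the row stays locked until COMMIT/ROLLBACK

    -- use the returned seq_number for the dependent insert, then COMMIT;
    -- a ROLLBACK also undoes the counter bump, so no gap is created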





Re: Could postgres12 support millions of sequences? (like 10 million)

From:
Adrian Klaver
Date:
On 3/19/20 10:31 PM, pabloa98 wrote:
> I see.
> 
> Any suggestion? It should behave like a sequence in the sense that 
> concurrent transitions will get different numbers from this alternative 
> sequence like solution.
> 
> In our case, we will need to do a call nextval('some_seq') (or similar) 
> from different processes no more than twice every minute.
> 
> 
> It would be nice to have a sequence data type. Correct me if I am wrong, 
> but It seems to me that a sequence data type would cost the same or less 
> than the current sequence implementation.
> 
> The schema would be more clear too. We could have a table like:
> 
> CREATE TABLE pair(
> group INT NOT NULL,
> element INT NOT NULL,
> seq SEQUENCE INCREMENT 1
>                START 1
>                CACHE 1
>                MINVALUE 1
>                MAXVALUE 99999999
>          NOT NULL,
> CONSTRAINT PRIMARY KEY (group, element)
> );
> 
> And then:
> 
> INSERT INTO event(group, element, code)
> VALUES (
>           1,
>           1,
>           ( SELECT seq.nextval('code_1_1_seq') FROM pair p WHERE 
> p.group=1 and p.code=1 )
> );
> 
> Or perhaps storing all the sequences in the same table as rows will have 
> the same behavior.

If code is just something to show the sequence of insertion for group, 
element combinations then maybe something like below:

CREATE TABLE event(
group_id INT NOT NULL, --changed as group is reserved word
element INT NOT NULL,
insert_ts timestamptz NOT NULL DEFAULT clock_timestamp(),
PRIMARY KEY(group_id, element, insert_ts)
);


insert into event(group_id, element) VALUES
(1, 1),
(1, 1),
(1, 1),
(2, 1),
(1, 1),
(1, 3),
(1, 1),
(1, 3),
(2, 1),
(2, 1);


select * from event ;
  group_id | element |           insert_ts
----------+---------+--------------------------------
         1 |       1 | 03/20/2020 09:51:12.675926 PDT
         1 |       1 | 03/20/2020 09:51:12.675985 PDT
         1 |       1 | 03/20/2020 09:51:12.675991 PDT
         2 |       1 | 03/20/2020 09:51:12.675996 PDT
         1 |       1 | 03/20/2020 09:51:12.676 PDT
         1 |       3 | 03/20/2020 09:51:12.676004 PDT
         1 |       1 | 03/20/2020 09:51:12.676008 PDT
         1 |       3 | 03/20/2020 09:51:12.676012 PDT
         2 |       1 | 03/20/2020 09:51:12.676016 PDT
         2 |       1 | 03/20/2020 09:51:12.67602 PDT
(10 rows)


select group_id, element, row_number() OVER (partition by (group_id, 
element) order by (group_id, element)) AS code from event;
  group_id | element | code
----------+---------+------
         1 |       1 |    1
         1 |       1 |    2
         1 |       1 |    3
         1 |       1 |    4
         1 |       1 |    5
         1 |       3 |    1
         1 |       3 |    2
         2 |       1 |    1
         2 |       1 |    2
         2 |       1 |    3
(10 rows)

> 
> Pablo
> 
> 
> On Thu, Mar 19, 2020 at 7:56 PM Tom Lane <tgl@sss.pgh.pa.us 
> <mailto:tgl@sss.pgh.pa.us>> wrote:
> 
>     Michael Lewis <mlewis@entrata.com <mailto:mlewis@entrata.com>> writes:
>      > On Thu, Mar 19, 2020, 5:48 PM David G. Johnston
>     <david.g.johnston@gmail.com <mailto:david.g.johnston@gmail.com>>
>      > wrote:
>      >> However, one other consideration with sequences: do you care that
>      >> PostgreSQL will cache/pin (i.e., no release) every single
>     sequence you
>      >> touch for the lifetime of the session? (I do not think DISCARD
>     matters here
>      >> but I'm just guessing)
> 
>      > Would you expand on this point or is there someplace specific in the
>      > documentation on this?
> 
>     I think what David is worried about is that a sequence object is a
>     one-row table in PG's implementation.  Thus
> 
>     (1) each sequence requires a dozen or two rows in assorted system
>     catalogs (not sure exactly how many offhand).
> 
>     (2) each sequence uses up 8KB on disk for its table file.
> 
>     (3) each sequence you actually access within a particular session
>     results in creation of relcache and catcache entries in that
>     session's local memory.  I'm not sure offhand how big those are
>     either, but a few KB per sequence would be a reasonable guess.
> 
>     (4) each sequence competes for space in the shared-buffer arena,
>     since its 8K block has to be swapped into there whenever you try
>     to access/increment the sequence's value.
> 
>     This is all mighty inefficient of course, and there's been talk
>     of trying to reduce the per-sequence overhead; but I don't know
>     of anyone actively working on that.  As things stand, I think
>     having millions of sequences would be quite painful performance-
>     wise, especially if your workload were actively hitting a lot
>     of them concurrently.  It would work, for some value of "work",
>     but it wouldn't perform very well.
> 
>     Also, as multiple people mentioned already, this still wouldn't
>     guarantee gap-free sequences of ID values.
> 
>                              regards, tom lane
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Could postgres12 support millions of sequences? (like 10 million)

From:
Adrian Klaver
Date:
On 3/20/20 9:59 AM, Adrian Klaver wrote:
> On 3/19/20 10:31 PM, pabloa98 wrote:
>> I see.
>>
>> Any suggestion? It should behave like a sequence in the sense that 
>> concurrent transitions will get different numbers from this 
>> alternative sequence like solution.
>>
>> In our case, we will need to do a call nextval('some_seq') (or 
>> similar) from different processes no more than twice every minute.
>>
>>
>> It would be nice to have a sequence data type. Correct me if I am 
>> wrong, but It seems to me that a sequence data type would cost the 
>> same or less than the current sequence implementation.
>>
>> The schema would be more clear too. We could have a table like:
>>
>> CREATE TABLE pair(
>> group INT NOT NULL,
>> element INT NOT NULL,
>> seq SEQUENCE INCREMENT 1
>>                START 1
>>                CACHE 1
>>                MINVALUE 1
>>                MAXVALUE 99999999
>>          NOT NULL,
>> CONSTRAINT PRIMARY KEY (group, element)
>> );
>>
>> And then:
>>
>> INSERT INTO event(group, element, code)
>> VALUES (
>>           1,
>>           1,
>>           ( SELECT seq.nextval('code_1_1_seq') FROM pair p WHERE 
>> p.group=1 and p.code=1 )
>> );
>>
>> Or perhaps storing all the sequences in the same table as rows will 
>> have the same behavior.
> 
> If code is just something to show the sequence of insertion for group, 
> element combinations then maybe something like below:
> 
> CREATE TABLE event(
> group_id INT NOT NULL, --changed as group is reserved word
> element INT NOT NULL,
> insert_ts timestamptz NOT NULL DEFAULT clock_timestamp(),
> PRIMARY KEY(group_id, element, insert_ts)
> );
> 
> 
> insert into event(group_id, element) VALUES
> (1, 1),
> (1, 1),
> (1, 1),
> (2, 1),
> (1, 1),
> (1, 3),
> (1, 1),
> (1, 3),
> (2, 1),
> (2, 1);
> 
> 
> select * from event ;
>   group_id | element |           insert_ts
> ----------+---------+--------------------------------
>          1 |       1 | 03/20/2020 09:51:12.675926 PDT
>          1 |       1 | 03/20/2020 09:51:12.675985 PDT
>          1 |       1 | 03/20/2020 09:51:12.675991 PDT
>          2 |       1 | 03/20/2020 09:51:12.675996 PDT
>          1 |       1 | 03/20/2020 09:51:12.676 PDT
>          1 |       3 | 03/20/2020 09:51:12.676004 PDT
>          1 |       1 | 03/20/2020 09:51:12.676008 PDT
>          1 |       3 | 03/20/2020 09:51:12.676012 PDT
>          2 |       1 | 03/20/2020 09:51:12.676016 PDT
>          2 |       1 | 03/20/2020 09:51:12.67602 PDT
> (10 rows)
> 
> 
> select group_id, element, row_number() OVER (partition by (group_id, 
> element) order by (group_id, element)) AS code from event;
>   group_id | element | code
> ----------+---------+------
>          1 |       1 |    1
>          1 |       1 |    2
>          1 |       1 |    3
>          1 |       1 |    4
>          1 |       1 |    5
>          1 |       3 |    1
>          1 |       3 |    2
>          2 |       1 |    1
>          2 |       1 |    2
>          2 |       1 |    3
> (10 rows)

Oops the above actually returned:

select group_id, element, row_number() OVER (partition by (group_id, 
element) order by (group_id, element)) AS code, insert_ts from event;
  group_id | element | code |           insert_ts
----------+---------+------+--------------------------------
         1 |       1 |    1 | 03/20/2020 09:51:12.675926 PDT
         1 |       1 |    2 | 03/20/2020 09:51:12.675985 PDT
         1 |       1 |    3 | 03/20/2020 09:51:12.675991 PDT
         1 |       1 |    4 | 03/20/2020 09:51:12.676008 PDT
         1 |       1 |    5 | 03/20/2020 09:51:12.676 PDT
         1 |       3 |    1 | 03/20/2020 09:51:12.676004 PDT
         1 |       3 |    2 | 03/20/2020 09:51:12.676012 PDT
         2 |       1 |    1 | 03/20/2020 09:51:12.67602 PDT
         2 |       1 |    2 | 03/20/2020 09:51:12.676016 PDT
         2 |       1 |    3 | 03/20/2020 09:51:12.675996 PDT
(10 rows)


Needs to be:

select group_id, element, row_number() OVER (partition by (group_id, 
element) order by (group_id, element, insert_ts)) AS code, insert_ts 
from event;
  group_id | element | code |           insert_ts
----------+---------+------+--------------------------------
         1 |       1 |    1 | 03/20/2020 09:51:12.675926 PDT
         1 |       1 |    2 | 03/20/2020 09:51:12.675985 PDT
         1 |       1 |    3 | 03/20/2020 09:51:12.675991 PDT
         1 |       1 |    4 | 03/20/2020 09:51:12.676 PDT
         1 |       1 |    5 | 03/20/2020 09:51:12.676008 PDT
         1 |       3 |    1 | 03/20/2020 09:51:12.676004 PDT
         1 |       3 |    2 | 03/20/2020 09:51:12.676012 PDT
         2 |       1 |    1 | 03/20/2020 09:51:12.675996 PDT
         2 |       1 |    2 | 03/20/2020 09:51:12.676016 PDT
         2 |       1 |    3 | 03/20/2020 09:51:12.67602 PDT
(10 rows)

> 
>>
>> Pablo
>>
>>
>> On Thu, Mar 19, 2020 at 7:56 PM Tom Lane <tgl@sss.pgh.pa.us 
>> <mailto:tgl@sss.pgh.pa.us>> wrote:
>>
>>     Michael Lewis <mlewis@entrata.com <mailto:mlewis@entrata.com>> 
>> writes:
>>      > On Thu, Mar 19, 2020, 5:48 PM David G. Johnston
>>     <david.g.johnston@gmail.com <mailto:david.g.johnston@gmail.com>>
>>      > wrote:
>>      >> However, one other consideration with sequences: do you care that
>>      >> PostgreSQL will cache/pin (i.e., no release) every single
>>     sequence you
>>      >> touch for the lifetime of the session? (I do not think DISCARD
>>     matters here
>>      >> but I'm just guessing)
>>
>>      > Would you expand on this point or is there someplace specific 
>> in the
>>      > documentation on this?
>>
>>     I think what David is worried about is that a sequence object is a
>>     one-row table in PG's implementation.  Thus
>>
>>     (1) each sequence requires a dozen or two rows in assorted system
>>     catalogs (not sure exactly how many offhand).
>>
>>     (2) each sequence uses up 8KB on disk for its table file.
>>
>>     (3) each sequence you actually access within a particular session
>>     results in creation of relcache and catcache entries in that
>>     session's local memory.  I'm not sure offhand how big those are
>>     either, but a few KB per sequence would be a reasonable guess.
>>
>>     (4) each sequence competes for space in the shared-buffer arena,
>>     since its 8K block has to be swapped into there whenever you try
>>     to access/increment the sequence's value.
>>
>>     This is all mighty inefficient of course, and there's been talk
>>     of trying to reduce the per-sequence overhead; but I don't know
>>     of anyone actively working on that.  As things stand, I think
>>     having millions of sequences would be quite painful performance-
>>     wise, especially if your workload were actively hitting a lot
>>     of them concurrently.  It would work, for some value of "work",
>>     but it wouldn't perform very well.
>>
>>     Also, as multiple people mentioned already, this still wouldn't
>>     guarantee gap-free sequences of ID values.
>>
>>                              regards, tom lane
>>
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Could postgres12 support millions of sequences? (like 10 million)

From:
pabloa98
Date:


On Fri, Mar 20, 2020 at 5:39 AM rob stone <floriparob@gmail.com> wrote:
Hello,

On Thu, 2020-03-19 at 14:36 -0700, pabloa98 wrote:
> Hello,
>
> My schema requires a counter for each combination of 2 values.
> Something like:
>
> CREATE TABLE counter(
> group INT NOT NULL,
> element INT NOT NULL,
> seq_number INT NOT NULL default 0,
> CONSTRAINT PRIMARY KEY (group, element)
> );
>
> For each entry in counter, aka for each (group, element) pair, the
> model requires a seq_number.
>
> If I use a table "counter", I could still have counter collisions
> between 2 transactions. I need truly sequence behavior. Is that
> possible by using a table like "counter" table, where the counter
> could be increased out of the transaction so it performs as a
> sequence without having race conditions between concurrent
> transactions?
>
> The other option is to create sequences for each new pair of (group,
> element) using triggers. There are millions of pairs. So this
> approach will generate millions of sequences.
>
> How a PostgreSQL database would behave having millions of sequences
> in a schema? Would it degrade its performance? Is there any negative
> impact?
>
> Regards
>
> Pablo
>
>
>


Have you tried the following:-

BEGIN;
SELECT (seq_number + 1) FROM counter WHERE group = $1
AND element = $2 FOR UPDATE;

If that gives a NOTFOUND exception, then you need to insert into table
counter using a seq_number = 1, or update it with (seq_number + 1).
Then do the insert into whatever table needs the seq_number value.

If no errors detected, then COMMIT else ROLLBACK.
It is all in one transaction block.
We use this approach to assign invoice numbers.


We do that in other parts and with another set of tables. They have higher concurrency than the problem I bring to the list.
The issue in doing this is that the client and the PostgreSQL server talk too much. Too many messages! Increasing latency.

We replaced it with a lock on the row so we can increase the counter and commit the transaction. The problem with this approach is that we are serializing access to this table and it has performance implications. It is basically a global serialization of updates on this table across all the servers in the cluster.


 
You could pre-seed table counter when a group/element pair is created
for the first time.

HTH,
Robert


Re: Could postgres12 support millions of sequences? (like 10 million)

From:
pabloa98
Date:


On Fri, Mar 20, 2020 at 10:26 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 3/20/20 9:59 AM, Adrian Klaver wrote:
> On 3/19/20 10:31 PM, pabloa98 wrote:
>> I see.
>>
>> Any suggestion? It should behave like a sequence in the sense that
>> concurrent transitions will get different numbers from this
>> alternative sequence like solution.
>>
>> In our case, we will need to do a call nextval('some_seq') (or
>> similar) from different processes no more than twice every minute.
>>
>>
>> It would be nice to have a sequence data type. Correct me if I am
>> wrong, but It seems to me that a sequence data type would cost the
>> same or less than the current sequence implementation.
>>
>> The schema would be more clear too. We could have a table like:
>>
>> CREATE TABLE pair(
>> group INT NOT NULL,
>> element INT NOT NULL,
>> seq SEQUENCE INCREMENT 1
>>                START 1
>>                CACHE 1
>>                MINVALUE 1
>>                MAXVALUE 99999999
>>          NOT NULL,
>> CONSTRAINT PRIMARY KEY (group, element)
>> );
>>
>> And then:
>>
>> INSERT INTO event(group, element, code)
>> VALUES (
>>           1,
>>           1,
>>           ( SELECT seq.nextval('code_1_1_seq') FROM pair p WHERE
>> p.group=1 and p.code=1 )
>> );
>>
>> Or perhaps storing all the sequences in the same table as rows will
>> have the same behavior.
>
> If code is just something to show the sequence of insertion for group,
> element combinations then maybe something like below:
>
> CREATE TABLE event(
> group_id INT NOT NULL, --changed as group is reserved word
> element INT NOT NULL,
> insert_ts timestamptz NOT NULL DEFAULT clock_timestamp(),
> PRIMARY KEY(group_id, element, insert_ts)
> );
>
>
> insert into event(group_id, element) VALUES
> (1, 1),
> (1, 1),
> (1, 1),
> (2, 1),
> (1, 1),
> (1, 3),
> (1, 1),
> (1, 3),
> (2, 1),
> (2, 1);
>
>
> select * from event ;
>   group_id | element |           insert_ts
> ----------+---------+--------------------------------
>          1 |       1 | 03/20/2020 09:51:12.675926 PDT
>          1 |       1 | 03/20/2020 09:51:12.675985 PDT
>          1 |       1 | 03/20/2020 09:51:12.675991 PDT
>          2 |       1 | 03/20/2020 09:51:12.675996 PDT
>          1 |       1 | 03/20/2020 09:51:12.676 PDT
>          1 |       3 | 03/20/2020 09:51:12.676004 PDT
>          1 |       1 | 03/20/2020 09:51:12.676008 PDT
>          1 |       3 | 03/20/2020 09:51:12.676012 PDT
>          2 |       1 | 03/20/2020 09:51:12.676016 PDT
>          2 |       1 | 03/20/2020 09:51:12.67602 PDT
> (10 rows)
>
>
> select group_id, element, row_number() OVER (partition by (group_id,
> element) order by (group_id, element)) AS code from event;
>   group_id | element | code
> ----------+---------+------
>          1 |       1 |    1
>          1 |       1 |    2
>          1 |       1 |    3
>          1 |       1 |    4
>          1 |       1 |    5
>          1 |       3 |    1
>          1 |       3 |    2
>          2 |       1 |    1
>          2 |       1 |    2
>          2 |       1 |    3
> (10 rows)

Oops the above actually returned:

select group_id, element, row_number() OVER (partition by (group_id,
element) order by (group_id, element)) AS code, insert_ts from event;
  group_id | element | code |           insert_ts
----------+---------+------+--------------------------------
         1 |       1 |    1 | 03/20/2020 09:51:12.675926 PDT
         1 |       1 |    2 | 03/20/2020 09:51:12.675985 PDT
         1 |       1 |    3 | 03/20/2020 09:51:12.675991 PDT
         1 |       1 |    4 | 03/20/2020 09:51:12.676008 PDT
         1 |       1 |    5 | 03/20/2020 09:51:12.676 PDT
         1 |       3 |    1 | 03/20/2020 09:51:12.676004 PDT
         1 |       3 |    2 | 03/20/2020 09:51:12.676012 PDT
         2 |       1 |    1 | 03/20/2020 09:51:12.67602 PDT
         2 |       1 |    2 | 03/20/2020 09:51:12.676016 PDT
         2 |       1 |    3 | 03/20/2020 09:51:12.675996 PDT
(10 rows)


Needs to be:

select group_id, element, row_number() OVER (partition by (group_id,
element) order by (group_id, element, insert_ts)) AS code, insert_ts
from event;
  group_id | element | code |           insert_ts
----------+---------+------+--------------------------------
         1 |       1 |    1 | 03/20/2020 09:51:12.675926 PDT
         1 |       1 |    2 | 03/20/2020 09:51:12.675985 PDT
         1 |       1 |    3 | 03/20/2020 09:51:12.675991 PDT
         1 |       1 |    4 | 03/20/2020 09:51:12.676 PDT
         1 |       1 |    5 | 03/20/2020 09:51:12.676008 PDT
         1 |       3 |    1 | 03/20/2020 09:51:12.676004 PDT
         1 |       3 |    2 | 03/20/2020 09:51:12.676012 PDT
         2 |       1 |    1 | 03/20/2020 09:51:12.675996 PDT
         2 |       1 |    2 | 03/20/2020 09:51:12.676016 PDT
         2 |       1 |    3 | 03/20/2020 09:51:12.67602 PDT
(10 rows)



Clever. :D
I will use it on other things.

The problem for this specific case is that if someone deletes a row, several codes will change. For this problem, codes should not change once assigned.


Re: Could postgres12 support millions of sequences? (like 10 million)

От
"Peter J. Holzer"
Дата:
On 2020-03-19 16:48:19 -0700, David G. Johnston wrote:
> First, it sounds like you care about there being no gaps in the records you end
> up saving.  If that is the case then sequences will not work for you.

I think (but I would love to be proven wrong), that *nothing* will work
reliably, if

1) you need gapless numbers which are strictly allocated in sequence
2) you have transactions
3) you don't want to block

Rationale:

Regardless of how you get the next number, the following scenario is
always possible:

Session1: get next number
Session2: get next number
Session1: rollback
Session2: commit

At this point you have a gap.

If you can afford to block, I think a simple approach like

    create table s(id int, counter int);
    ...
    begin;
    ...
    update s set counter = counter + 1 where id = $whatever returning counter;
    -- use counter
    commit;

should work. But that effectively serializes your transactions and may
cause some to be aborted to prevent deadlocks.
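A slightly more concrete sketch of that pattern, applied to a per-(group, element) counter table (the table and column names here are illustrative, not from the thread):

    -- one row per (group_id, element) pair; rows are created when the pair is registered
    CREATE TABLE pair_counter(
        group_id   int NOT NULL,
        element    int NOT NULL,
        last_value int NOT NULL DEFAULT 0,
        PRIMARY KEY (group_id, element)
    );

    BEGIN;
    -- the UPDATE takes a row lock, so concurrent transactions on the same
    -- pair queue up here and each one sees a distinct value
    UPDATE pair_counter
       SET last_value = last_value + 1
     WHERE group_id = 1 AND element = 3
    RETURNING last_value;
    -- use the returned value as the code, then
    COMMIT;

Because the increment happens inside the transaction, a rollback undoes it as well, which is what keeps this gapless but blocking.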

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"


Re: Could postgres12 support millions of sequences? (like 10 million)

От
Rob Sargent
Дата:
> On Mar 20, 2020, at 4:59 PM, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
>
> On 2020-03-19 16:48:19 -0700, David G. Johnston wrote:
>> First, it sounds like you care about there being no gaps in the records you end
>> up saving.  If that is the case then sequences will not work for you.
>
> I think (but I would love to be proven wrong), that *nothing* will work
> reliably, if
>
> 1) you need gapless numbers which are strictly allocated in sequence
> 2) you have transactions
> 3) you don't want to block
>
> Rationale:
>
> Regardless of how you get the next number, the following scenario is
> always possible:
>
> Session1: get next number
> Session2: get next number
> Session1: rollback
> Session2: commit
>
> At this point you have a gap.
>
> If you can afford to block, I think a simple approach like
>
>    create table s(id int, counter int);
>    ...
>    begin;
>    ...
>    update s set counter = counter + 1 where id = $whatever returning counter;
>    -- use counter
>    commit;
>
> should work. But that effectively serializes your transactions and may
> cause some to be aborted to prevent deadlocks.
>
>        hp

OP has said small gaps are ok. To me that says the requirement is capricious but we haven’t heard the rationale for
the requirement yet (or I missed it).

Aside: apologies for the empty message earlier




Re: Could postgres12 support millions of sequences? (like 10 million)

От
"Peter J. Holzer"
Дата:
On 2020-03-20 17:11:42 -0600, Rob Sargent wrote:
> On Mar 20, 2020, at 4:59 PM, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
> > On 2020-03-19 16:48:19 -0700, David G. Johnston wrote:
> >> First, it sounds like you care about there being no gaps in the records you end
> >> up saving.  If that is the case then sequences will not work for you.
> >
> > I think (but I would love to be proven wrong), that *nothing* will work
> > reliably, if
> >
> > 1) you need gapless numbers which are strictly allocated in sequence
> > 2) you have transactions
> > 3) you don't want to block
> >
> > Rationale:
> >
> > Regardless of how you get the next number, the following scenario is
> > always possible:
[...]
> > At this point you have a gap.
> >
> > If you can afford to block, I think a simple approach like
[...]
> > should work. But that effectively serializes your transactions and may
> > cause some to be aborted to prevent deadlocks.
>
> OP  has said small gaps are ok.

Yes. This wasn't a response to the OP's requirements, but to David's
(rather knee-jerk, IMHO) "don't use sequences" response. Very often the
requirements which would preclude sequences also preclude any other
solution.

(In the case of the OP's problem, I'd agree that sequences are probably
a bad idea for the reasons he anticipates)

> To me that says the requirement

Which requirement? The OP's or the one I posed here?

> is capricious but we haven’t heard the rationale for the requirement
> yet (or I missed it)

The OP gave a rationale: He has to fit the counter into an 8-digit
field, and a global counter would overflow that. So he needs per-element
counters.

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"


Re: Could postgres12 support millions of sequences? (like 10 million)

От
Rob Sargent
Дата:

> On Mar 20, 2020, at 5:29 PM, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
>
> 
>>
>>
>> OP  has said small gaps are ok.
>
> Yes. This wasn't a response to the OP's requirements, but to David's
> (rather knee-jerk, IMHO) "don't use sequences" response. Very often the
> requirements which would preclude sequences also preclude any other
> solution.
>
> (In the case of the OP's problem, I'd agree that sequences are probably
> a bad idea for the reasons he anticipates)
>
>> To me that says the requirement
>
> Which requirement? The OP's or the one I posed here?
>
>> is capricious but we haven’t heard the rationale for the requirement
>> yet (or I missed it)
>
> The OP gave a rationale: He has to fit the counter into an 8-digit
> field, and a global counter would overflow that. So he needs per-element
> counters.
>
>        hp

Isn’t that the implementation of a req along the lines of “we want to number the occurrences of these pairs because
...”?
>
> --
>   _  | Peter J. Holzer    | Story must make more sense than reality.
> |_|_) |                    |
> | |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
> __/   | http://www.hjp.at/ |       challenge!"



Re: Could postgres12 support millions of sequences? (like 10 million)

От
Adrian Klaver
Дата:
On 3/20/20 4:29 PM, Peter J. Holzer wrote:
> On 2020-03-20 17:11:42 -0600, Rob Sargent wrote:
>> On Mar 20, 2020, at 4:59 PM, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
>>> On 2020-03-19 16:48:19 -0700, David G. Johnston wrote:
>>>> First, it sounds like you care about there being no gaps in the records you end
>>>> up saving.  If that is the case then sequences will not work for you.
>>>
>>> I think (but I would love to be proven wrong), that *nothing* will work
>>> reliably, if
>>>
>>> 1) you need gapless numbers which are strictly allocated in sequence
>>> 2) you have transactions
>>> 3) you don't want to block
>>>
>>> Rationale:
>>>
>>> Regardless of how you get the next number, the following scenario is
>>> always possible:
> [...]
>>> At this point you have a gap.
>>>
>>> If you can afford to block, I think a simple approach like
> [...]
>>> should work. But that effectively serializes your transactions and may
>>> cause some to be aborted to prevent deadlocks.
>>
>> OP  has said small gaps are ok.
> 
> Yes. This wasn't a response to the OP's requirements, but to David's
> (rather knee-jerk, IMHO) "don't use sequences" response. Very often the
> requirements which would preclude sequences also preclude any other
> solution.

I don't see  a knee-jerk reaction in this:

https://www.postgresql.org/message-id/CAKFQuwZ%3D%3Dri5_m2geFA-GPOdfnVggmJRu3zEi%2B1EwJdJA%3D9AeQ%40mail.gmail.com

The response was that if you cared about gaps (not something the OP had 
specified at that point) then a sequence would not work. If not, then 
they were something that would need testing to determine suitability.

> 
> (In the case of the OP's problem, I'd agree that sequences are probably
> a bad idea for the reasons he anticipates)
> 
>> To me that says the requirement
> 
> Which requirement? The OP's or the one I posed here?
> 
>> is capricious but we haven’t heard the rationale for the requirement
>> yet (or I missed it)

The requirement is that (group, element) pairs have a sequence 
number/code so:

1,1,1
1,1,2
1,1,3
2,2,1
2,2,2

> 
> The OP gave a rationale: He has to fit the counter into an 8-digit
> field, and a global counter would overflow that. So he needs per-element
> counters.

I must have missed that post. There was this(and alternates):

CREATE TABLE counter(
group INT NOT NULL,
element INT NOT NULL,
seq_number INT NOT NULL default 0,
CONSTRAINT PRIMARY KEY (group, element)
);

Nothing I saw that said int could not become bigint.


> 
>          hp
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Could postgres12 support millions of sequences? (like 10 million)

От
Adrian Klaver
Дата:
On 3/20/20 2:13 PM, pabloa98 wrote:
> 
> 
> On Fri, Mar 20, 2020 at 10:26 AM Adrian Klaver 
> <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote:
> 
>     On 3/20/20 9:59 AM, Adrian Klaver wrote:
>      > On 3/19/20 10:31 PM, pabloa98 wrote:
>      >> I see.
>      >>
>      >> Any suggestion? It should behave like a sequence in the sense that
>      >> concurrent transactions will get different numbers from this
>      >> alternative sequence like solution.
>      >>
>      >> In our case, we will need to do a call nextval('some_seq') (or
>      >> similar) from different processes no more than twice every minute.
>      >>
>      >>
>      >> It would be nice to have a sequence data type. Correct me if I am
>      >> wrong, but It seems to me that a sequence data type would cost the
>      >> same or less than the current sequence implementation.
>      >>
>      >> The schema would be more clear too. We could have a table like:
>      >>
>      >> CREATE TABLE pair(
>      >> group INT NOT NULL,
>      >> element INT NOT NULL,
>      >> seq SEQUENCE INCREMENT 1
>      >>                START 1
>      >>                CACHE 1
>      >>                MINVALUE 1
>      >>                MAXVALUE 99999999
>      >>          NOT NULL,
>      >> CONSTRAINT PRIMARY KEY (group, element)
>      >> );
>      >>
>      >> And then:
>      >>
>      >> INSERT INTO event(group, element, code)
>      >> VALUES (
>      >>           1,
>      >>           1,
>      >>           ( SELECT seq.nextval('code_1_1_seq') FROM pair p WHERE
>      >> p.group=1 and p.code=1 )
>      >> );
>      >>
>      >> Or perhaps storing all the sequences in the same table as rows will
>      >> have the same behavior.
>      >
>      > If code is just something to show the sequence of insertion for
>     group,
>      > element combinations then maybe something like below:
>      >
>      > CREATE TABLE event(
>      > group_id INT NOT NULL, --changed as group is reserved word
>      > element INT NOT NULL,
>      > insert_ts timestamptz NOT NULL DEFAULT clock_timestamp(),
>      > PRIMARY KEY(group_id, element, insert_ts)
>      > );
>      >
>      >
>      > insert into event(group_id, element) VALUES
>      > (1, 1),
>      > (1, 1),
>      > (1, 1),
>      > (2, 1),
>      > (1, 1),
>      > (1, 3),
>      > (1, 1),
>      > (1, 3),
>      > (2, 1),
>      > (2, 1);
>      >
>      >
>      > select * from event ;
>      >   group_id | element |           insert_ts
>      > ----------+---------+--------------------------------
>      >          1 |       1 | 03/20/2020 09:51:12.675926 PDT
>      >          1 |       1 | 03/20/2020 09:51:12.675985 PDT
>      >          1 |       1 | 03/20/2020 09:51:12.675991 PDT
>      >          2 |       1 | 03/20/2020 09:51:12.675996 PDT
>      >          1 |       1 | 03/20/2020 09:51:12.676 PDT
>      >          1 |       3 | 03/20/2020 09:51:12.676004 PDT
>      >          1 |       1 | 03/20/2020 09:51:12.676008 PDT
>      >          1 |       3 | 03/20/2020 09:51:12.676012 PDT
>      >          2 |       1 | 03/20/2020 09:51:12.676016 PDT
>      >          2 |       1 | 03/20/2020 09:51:12.67602 PDT
>      > (10 rows)
>      >
>      >
>      > select group_id, element, row_number() OVER (partition by (group_id,
>      > element) order by (group_id, element)) AS code from event;
>      >   group_id | element | code
>      > ----------+---------+------
>      >          1 |       1 |    1
>      >          1 |       1 |    2
>      >          1 |       1 |    3
>      >          1 |       1 |    4
>      >          1 |       1 |    5
>      >          1 |       3 |    1
>      >          1 |       3 |    2
>      >          2 |       1 |    1
>      >          2 |       1 |    2
>      >          2 |       1 |    3
>      > (10 rows)
> 
>     Oops the above actually returned:
> 
>     select group_id, element, row_number() OVER (partition by (group_id,
>     element) order by (group_id, element)) AS code, insert_ts from event;
>        group_id | element | code |           insert_ts
>     ----------+---------+------+--------------------------------
>               1 |       1 |    1 | 03/20/2020 09:51:12.675926 PDT
>               1 |       1 |    2 | 03/20/2020 09:51:12.675985 PDT
>               1 |       1 |    3 | 03/20/2020 09:51:12.675991 PDT
>               1 |       1 |    4 | 03/20/2020 09:51:12.676008 PDT
>               1 |       1 |    5 | 03/20/2020 09:51:12.676 PDT
>               1 |       3 |    1 | 03/20/2020 09:51:12.676004 PDT
>               1 |       3 |    2 | 03/20/2020 09:51:12.676012 PDT
>               2 |       1 |    1 | 03/20/2020 09:51:12.67602 PDT
>               2 |       1 |    2 | 03/20/2020 09:51:12.676016 PDT
>               2 |       1 |    3 | 03/20/2020 09:51:12.675996 PDT
>     (10 rows)
> 
> 
>     Needs to be:
> 
>     select group_id, element, row_number() OVER (partition by (group_id,
>     element) order by (group_id, element, insert_ts)) AS code, insert_ts
>     from event;
>        group_id | element | code |           insert_ts
>     ----------+---------+------+--------------------------------
>               1 |       1 |    1 | 03/20/2020 09:51:12.675926 PDT
>               1 |       1 |    2 | 03/20/2020 09:51:12.675985 PDT
>               1 |       1 |    3 | 03/20/2020 09:51:12.675991 PDT
>               1 |       1 |    4 | 03/20/2020 09:51:12.676 PDT
>               1 |       1 |    5 | 03/20/2020 09:51:12.676008 PDT
>               1 |       3 |    1 | 03/20/2020 09:51:12.676004 PDT
>               1 |       3 |    2 | 03/20/2020 09:51:12.676012 PDT
>               2 |       1 |    1 | 03/20/2020 09:51:12.675996 PDT
>               2 |       1 |    2 | 03/20/2020 09:51:12.676016 PDT
>               2 |       1 |    3 | 03/20/2020 09:51:12.67602 PDT
>     (10 rows)
> 
> 
> 
> Clever. :D
> I will use it on other things.
> 
> The problem for this specific case is that if someone deletes a row, 
> several codes will change. For this problem, codes should not be changed.

A new requirement.

To get a better response, I would suggest posting a detailed model of 
what you are after and why.

> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Could postgres12 support millions of sequences? (like 10 million)

От
pabloa98
Дата:


On Fri, Mar 20, 2020 at 3:59 PM Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
On 2020-03-19 16:48:19 -0700, David G. Johnston wrote:
> First, it sounds like you care about there being no gaps in the records you end
> up saving.  If that is the case then sequences will not work for you.

I think (but I would love to be proven wrong), that *nothing* will work
reliably, if

1) you need gapless numbers which are strictly allocated in sequence
A little gap is acceptable. We cannot afford a gap of 100, though.

2) you have transactions
3) you don't want to block

Rationale:

Regardless of how you get the next number, the following scenario is
always possible:

Session1: get next number
Session2: get next number
Session1: rollback
Session2: commit

At this point you have a gap.

If you can afford to block, I think a simple approach like

    create table s(id int, counter int);
    ...
    begin;
    ...
    update s set counter = counter + 1 where id = $whatever returning counter;
    -- use counter
    commit;

should work. But that effectively serializes your transactions and may
cause some to be aborted to prevent deadlocks.

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Re: Could postgres12 support millions of sequences? (like 10 million)

От
pabloa98
Дата:

Nothing I saw that said int could not become bigint.


My bad. The code cannot be a bigint. Or it could be a bigint between 1 and 99999999 :)
I thought it was not important. The code could be a number from 1 to 99999999 (so an int will be OK) assigned in order-ish. This is because of business requirements. The code should not change in case a row is deleted. That rules out using window functions, at least for this problem. There could be some gaps if they are not too big (similar to a serial column when a transaction is rolled back). We are more concerned about code generation collisions (for example, 2 transactions calculating the same code value) than gaps. For each pair (group, element) the code, once assigned, should remain constant. This is because the group, the element, and the code will become an id (business-related). So:

group, element, code = bid
1, 3, 1 = bid 131
2, 1, 1 = bid 211
etc

This calculated bid follows the convention described here and it is used as a reference in other tables. Therefore it should not change.
All this weirdness is because of business requirements. I am good with a classic serial column. But that will not map our business rules.

Regarding the implementation of this: our concern is to choose something that is not slow (it does not need to be super fast because there will be more operations in other tables using the same transaction), and I thought that creating/dropping sequences could be a solution. But I was not sure. I am not sure how it will behave with millions of sequences.

If there is another solution, different from millions of sequences, that does not block, generates few gaps (and keeps those gaps small), and never generates collisions, then I will happily use it.

I hope I described the problem completely.

Pablo



Re: Could postgres12 support millions of sequences? (like 10 million)

От
"David G. Johnston"
Дата:
On Friday, March 20, 2020, pabloa98 <pabloa98@gmail.com> wrote:

If there is another solution different than millions of sequences that do not block, generate few gaps (and those gaps are small) and never generate collisions then I will happily use it.

You are going to have to lose the not blocking requirement, or at least define what blocking and non-blocking interactions look like.

What does it mean that bid# 123 exists?  Can you, by way of example, just create a table of all possible bid numbers and update a null column with a timestamp saying “this bid came into existence at this moment”?
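One possible non-blocking reading of that idea, sketched with illustrative names (whether pre-populating a pool of this size is practical is a separate question):

    -- pool of pre-generated codes; claimed_at stays NULL until a code is handed out
    CREATE TABLE bid_pool(
        group_id   int NOT NULL,
        element    int NOT NULL,
        code       int NOT NULL,
        claimed_at timestamptz,
        PRIMARY KEY (group_id, element, code)
    );

    -- claim the lowest unclaimed code for (1, 3) without waiting on other sessions
    UPDATE bid_pool b
       SET claimed_at = now()
     WHERE b.group_id = 1 AND b.element = 3
       AND b.code = (SELECT code
                       FROM bid_pool
                      WHERE group_id = 1 AND element = 3 AND claimed_at IS NULL
                      ORDER BY code
                      LIMIT 1
                      FOR UPDATE SKIP LOCKED)
    RETURNING b.code;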

David J.

Re: Could postgres12 support millions of sequences? (like 10 million)

От
John W Higgins
Дата:


On Fri, Mar 20, 2020 at 8:13 PM pabloa98 <pabloa98@gmail.com> wrote:

I hope I described the problem completely.


1) What is a group - does it exist prior to records being inserted? How many groups are you working with? How long do they live for? 
2) #1 but for element
3) How many records per second per group per element are you anticipating? Are there spikes?
4) How long does a group/element live for? You can put 10 records per second over 3 years and still be under 100 million records (the size of your seq_number)
5) How quickly do you need the seq_number - if you batch created them once a minute would that work? Does it need to be quicker than that? Slower than that? Or is it an immediate need?

That's a starting point.....

John W Higgins

Re: Could postgres12 support millions of sequences? (like 10 million)

От
Adrian Klaver
Дата:
On 3/20/20 8:13 PM, pabloa98 wrote:
> 
>     Nothing I saw that said int could not become bigint.
> 
> 
> My bad. The code cannot be a bigint. Or it could be a bigint between 1 
> to 99999999 :)


Aah, that was the counter Peter was talking about. I missed that.

As to below that is going to require more thought.

> I thought it was not important. The code could be a number from 1 to 
> 99999999 (so an Int will be OK) assigned in order-ish. This is because 
> of business requirements. The code should not change in case a row is 
> deleted. That rules out using windowed functions. At least for this 
> problem. There could be some gaps if they are not too big (similar to a 
> serial column when a transaction is rolled back). We are more concerned 
> about code generation collisions (for example 2 transactions calculating 
> the same code value) than gaps. For each pair (group, element) the code, 
> once assigned should remain constant. This is because the group, the 
> element, and the code will become an id (business-related). So:
> 
> group, element, code = bid
> 1, 3, 1 = bid 131
> 2, 1, 1 = bid 211
> etc
> 
> This calculated bid follows the convention described here and it is used 
> as a reference in other tables. Therefore it should not change.
> All this weirdness is because of business requirements. I am good with a 
> classic serial column. But that will not map our business rules.
> 
> Regarding to the implementation of this. Our concern is to choose 
> something not slow (it does not need to be super fast because there will 
> be more operations in other tables using the same transaction) and I 
> thought that creating/dropping sequences could be a solution. But I was 
> not sure. I am not sure how it will behave with millions of sequences.
> 
> If there is another solution different than millions of sequences that 
> do not block, generate few gaps (and those gaps are small) and never 
> generate collisions then I will happily use it.
> 
> I hope I described the problem completely.
> 
> Pablo
> 
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Could postgres12 support millions of sequences? (like 10 million)

От
Rob Sargent
Дата:

> On Mar 21, 2020, at 10:47 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>
> On 3/20/20 8:13 PM, pabloa98 wrote:
>>    Nothing I saw that said int could not become bigint.
>> My bad. The code cannot be a bigint. Or it could be a bigint between 1 to 99999999 :)
>
>
> Aah, that was the counter Peter was talking about. I missed that.
>
> As to below that is going to require more thought.
>
Still no word on the actual requirement. As someone who believes consecutive numbers on digital invoices is simply a
mistaken interpretation of the paper based system, I suspect a similar error here. But again we haven’t really heard,
far as I know. Something really fishy about 99999999.
>



Re: Could postgres12 support millions of sequences? (like 10 million)

От
Ron
Дата:
On 3/21/20 12:02 PM, Rob Sargent wrote:
>> On Mar 21, 2020, at 10:47 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>>
>> On 3/20/20 8:13 PM, pabloa98 wrote:
>>>     Nothing I saw that said int could not become bigint.
>>> My bad. The code cannot be a bigint. Or it could be a bigint between 1 to 99999999 :)
>> Aah, that was the counter Peter was talking about. I missed that.
>>
>> As to below that is going to require more thought.
>>
> Still no word on the actual requirement. As someone who believes consecutive numbers on digital invoices is simply a
> mistaken interpretation of the paper based system, I suspect a similar error here. But again we haven’t really heard,
> far as I know. Something really fishy about 99999999.
 

Why?  "Print" and "screen" forms have all sorts of practical restrictions 
like this.

-- 
Angular momentum makes the world go 'round.



Re: Could postgres12 support millions of sequences? (like 10 million)

От
pabloa98
Дата:

> As to below that is going to require more thought.
>
Still no word on the actual requirement. As someone who believes consecutive numbers on digital invoices is simply a mistaken interpretation of the paper based system, I suspect a similar error here. But again we haven’t really heard, far as I know. Something really fishy about 99999999.
>
It is fishy. This is the thing. The code, element, and group are part of a bigger code called item identifier (or ii).

An item identifier is a result of:  code || group || element || qualifier (a constant like 55) || check digit coming from some check algorithm.

For example:

for a triplet (group, element, code) like (1, 3, 63) the item identifier (or ii) is: 630010003558  (the last 8 is the check digit).
This number is converted to a bigint and stored (and used as PK or FK on other tables, etc, etc).

In an item identifier the room is assigned like:

3 digits for group
4 digits for element
8 digits for code (not padded with 0s)
2 digits for qualifier
1 digit for the check digit.
-----------------------------
18 digits for item identifier.

And that is why we have 8 digits maximum for the code. So when a "code" is generated, it is important that there are no collisions, no big gaps (little gaps are OK, but because we cannot waste too many codes, keeping gaps small is important) and no locks.
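For illustration, the example identifier above could be assembled like this (the check digit algorithm is not specified in the thread, so the final digit is hard-coded as a placeholder):

    -- (group, element, code) = (1, 3, 63), qualifier 55, check digit 8 (placeholder)
    SELECT (63::text                   -- code, not zero-padded
            || lpad(1::text, 3, '0')   -- group, 3 digits
            || lpad(3::text, 4, '0')   -- element, 4 digits
            || '55'                    -- qualifier constant
            || '8'                     -- check digit from the unspecified algorithm
           )::bigint AS item_identifier;   -- 630010003558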

The lock part is because we solved a similar problem with a counter by row locking the counter and increasing it in another part of the database. The result is that all the queries using that table are queued by pair (group, element), which is not that bad because we are not inserting thousands of rows per second. Still, it is killing cluster performance (although performance is still OK from the business point of view). The problem with using locks is that they are too sensitive to developer errors and bugs. Sometimes a connected client aborts and the connection is returned to the pool with the lock still active, until the connection is closed or someone unlocks the row. I would prefer to have something more resilient to developer/programming errors, if possible.




Re: Could postgres12 support millions of sequences? (like 10 million)

От
pabloa98
Дата:

On Fri, Mar 20, 2020 at 9:04 PM John W Higgins <wishdev@gmail.com> wrote:


On Fri, Mar 20, 2020 at 8:13 PM pabloa98 <pabloa98@gmail.com> wrote:

I hope I described the problem completely.


1) What is a group - does it exist prior to records being inserted? How many groups are you working with? How long do they live for? 
2) #1 but for element
Group and element are 2 numbers assigned in another part of the business. They exist before codes are created. They will continue existing after each code is deleted. If they are deleted, it is because we are evicting that whole branch of information from the database. When that happens, everything goes via FK.

3) How many records per second per group per element are you anticipating? Are their spikes?
We are thinking about 100K every 2 to 3 months max, during business hours (8 hours each day).
 
4) How long does a group/element live for? You can put 10 record per second over 3 years and still be under 100 million records (the size of your seq_number)
That depends. Some group/element pairs will live seconds because they are used for testing. Others will live for years.

5) How quickly do you need the seq_number - if you batch created them once a minute would that work? Does it need to be quicker than that? Slower than that? Or is it an immediate need?
We will need each code to be created in less than 1 second (1 second is OK if that is the max time). Regarding the group/element pair, it is OK if it takes no more than 3 seconds tops.



That's a starting point.....

John W Higgins

Re: Could postgres12 support millions of sequences? (like 10 million)

От
pabloa98
Дата:

Why?  "Print" and "screen" forms have all sorts of practical restrictions
like this.

Legacy, I guess. It is all digital stuff, but the final result is an identifier that people can read and recognize what they are talking about.

Pablo

Re: Could postgres12 support millions of sequences? (like 10 million)

От
Rob Sargent
Дата:


On Mar 21, 2020, at 12:18 PM, pabloa98 <pabloa98@gmail.com> wrote:



Why?  "Print" and "screen" forms have all sorts of practical restrictions
like this.

Legacy I guess. These are all digital stuff. But the final result is an identifier that people can read and realize what they are talking about.


I think  the “why” was directed at my comment. 
I’m suspicious of the practicality. 

To me the description of the ID smacks of database-in-the-name folly. I recognize that OP is likely unable to take another path. I’ll not push this any further. 

Pablo

Re: Could postgres12 support millions of sequences? (like 10 million)

От
"Peter J. Holzer"
Дата:
On 2020-03-20 17:53:11 -0700, Adrian Klaver wrote:
> On 3/20/20 4:29 PM, Peter J. Holzer wrote:
> > On 2020-03-20 17:11:42 -0600, Rob Sargent wrote:
> > > On Mar 20, 2020, at 4:59 PM, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
> > > > On 2020-03-19 16:48:19 -0700, David G. Johnston wrote:
> > > > > First, it sounds like you care about there being no gaps in the records you end
> > > > > up saving.  If that is the case then sequences will not work for you.
> > > >
> > > > I think (but I would love to be proven wrong), that *nothing* will work
> > > > reliably, if
> > > >
> > > > 1) you need gapless numbers which are strictly allocated in sequence
> > > > 2) you have transactions
> > > > 3) you don't want to block
[...]
> > Yes. This wasn't a response to the OP's requirements, but to David's
> > (rather knee-jerk, IMHO) "don't use sequences" response. Very often the
> > requirements which would preclude sequences also preclude any other
> > solution.
>
> I don't see  a knee-jerk reaction in this:
>
> https://www.postgresql.org/message-id/CAKFQuwZ%3D%3Dri5_m2geFA-GPOdfnVggmJRu3zEi%2B1EwJdJA%3D9AeQ%40mail.gmail.com
>
> The response was if you cared about gaps(not something the OP had specified
> at that point) then a sequence would not work.

And I think that "care about gaps -> sequence doesn't work" is a
knee-jerk reaction. It's similar to "can't parse HTML with regexps".
True in the general case, and therefore people tend to blurt it out
every time the topic comes up. But not necessarily true in specific
cases. As I wrote above, there is no perfect solution - so you have to
think about the actual requirements and the consequences of various
solutions - and maybe using a sequence is the best (or least bad)
solution.

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"


Re: Could postgres12 support millions of sequences? (like 10 million)

От
"Peter J. Holzer"
Дата:
On 2020-03-21 12:55:33 -0600, Rob Sargent wrote:
> To me the description of the ID smacks of database-in-the-name folly. I
> recognize that OP is likely unable to take another path. I’ll not push this any
> further.

Identifiers often have internal structure. In Austria for example, the
social security number contains the birth date. Invoice numbers, project
numbers or court case numbers often contain the year.

That's because they are used by *people*, and people like their
identifiers to make some kind of sense. The computer doesn't care.

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"


Re: Could postgres12 support millions of sequences? (like 10 million)

От
pabloa98
Дата:


On Sat, Mar 21, 2020 at 12:08 PM Peter J. Holzer <hjp-pgsql@hjp.at> wrote:


And I think that "care about gaps -> sequence doesn't work" is a
knee-jerk reaction. It's similar to "can't parse HTML with regexps".
True in the general case, and therefore people tend to blurt it out
every time the topic comes up. But not necessarily true in specific
cases. As I wrote above, there is no perfect solution - so you have to
think about the actual requirements and the consequences of various
solutions - and maybe using a sequence is the best (or least bad)
solution.


OK. In that case, I will proceed with the millions of sequences. We will see.

Anyway, it would be awesome to have a sequence data type in a future version of PostgreSQL. It would solve a lot of problems similar to this one.

Pablo

Re: Could postgres12 support millions of sequences? (like 10 million)

От
Rob Sargent
Дата:

> On Mar 21, 2020, at 1:13 PM, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
>
> On 2020-03-21 12:55:33 -0600, Rob Sargent wrote:
>> To me the description of the ID smacks of database-in-the-name folly. I
>> recognize that OP is likely unable to take another path. I’ll not push this any
>> further.
>
> Identifiers often have internal structure. In Austria for example, the
> social security number contains the birth date. Invoice numbers, project
> numbers or court case numbers often contain the year.
>
> That's because they are used by *people*, and people like their
> identifiers to make some kind of sense. The computer doesn't care.

Since OP said this was digital not paper, I see this as a presentation problem bleeding into database design (assuming
your response was an invitation to push further).




Re: Could postgres12 support millions of sequences? (like 10 million)

От
Adrian Klaver
Дата:
On 3/21/20 1:45 PM, pabloa98 wrote:
> 
> 
> On Sat, Mar 21, 2020 at 12:08 PM Peter J. Holzer <hjp-pgsql@hjp.at 
> <mailto:hjp-pgsql@hjp.at>> wrote:
> 
> 
> 
>     And I think that "care about gaps -> sequence doesn't work" is a
>     knee-jerk reaction. It's similar to "can't parse HTML with regexps".
>     True in the general case, and therefore people tend to blurt it out
>     every time the topic comes up. But not necessarily true in specific
>     cases. As I wrote above, there is no perfect solution - so you have to
>     think about the actual requirements and the consequences of various
>     solutions - and maybe using a sequence is the best (or least bad)
>     solution.
> 
> 
> OK. In that case, I will proceed with the millions of sequences. We will 
> see.
> 
> Anyway, It will be awesome if we have a sequence data type in a future 
> version of postgresql. They will solve a lot of problems similar to this 
> one.

Actually there are already two:

https://www.postgresql.org/docs/12/datatype-numeric.html#DATATYPE-SERIAL

https://www.postgresql.org/docs/12/sql-createtable.html
"
GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY [ ( sequence_options ) ]

     This clause creates the column as an identity column. It will have 
an implicit sequence attached to it and the column in new rows will 
automatically have values from the sequence assigned to it.
"

What you want is a built in method to pull from different sequences 
depending on the value of another column or values from multiple columns 
in the same table.
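For reference, a minimal example of the identity-column syntax quoted above (one implicit sequence per table, not per (group, element) pair; the table name is illustrative):

    CREATE TABLE example_event(
        id      bigint GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1),
        payload text
    );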


> 
> Pablo
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Could postgres12 support millions of sequences? (like 10 million)

От
pabloa98
Дата:


On Sat, Mar 21, 2020 at 4:37 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:

> Anyway, It will be awesome if we have a sequence data type in a future
> version of postgresql. They will solve a lot of problems similar to this
> one.

Actually there are already two:

https://www.postgresql.org/docs/12/datatype-numeric.html#DATATYPE-SERIAL

https://www.postgresql.org/docs/12/sql-createtable.html
"
GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY [ ( sequence_options ) ]

     This clause creates the column as an identity column. It will have
an implicit sequence attached to it and the column in new rows will
automatically have values from the sequence assigned to it.
"

What you want is a built in method to pull from different sequences
depending on the value of another column or values from multiple columns
in the same table.

Exactly. That will work fine too.

Re: Could postgres12 support millions of sequences? (like 10 million)

От
"Peter J. Holzer"
Дата:
On 2020-03-21 14:51:35 -0600, Rob Sargent wrote:
> > On Mar 21, 2020, at 1:13 PM, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
> >
> > On 2020-03-21 12:55:33 -0600, Rob Sargent wrote:
> >> To me the description of the ID smacks of database-in-the-name folly. I
> >> recognize that OP is likely unable to take another path. I’ll not push this any
> >> further.
> >
> > Identifiers often have internal structure. In Austria for example, the
> > social security number contains the birth date. Invoice numbers, project
> > numbers or court case numbers often contain the year.
> >
> > That's because they are used by *people*, and people like their
> > identifiers to make some kind of sense. The computer doesn't care.
>
> Since OP said this was digital not paper, I see this as a presentation
> problem bleeding into database design (assuming your response was an
> invitation to push further).

Well, that's the old natural primary key vs. surrogate primary key
debate.

(Trigger warning: You probably already heard that a gazillion times)

A primary key is an attribute (or a combination of attributes) which
unambiguously identifies each entity.

If the entity in question already has such an attribute (e.g. an SSN,
invoice number, username, ...) and you use that as the primary key, it
is called a natural primary key[1].

If you generate a new attribute not related to existing attributes it is
called a surrogate (primary) key.

If the entity already has an attribute suitable as a primary key, why
would you want to use another one?

* The key candidate may be unwieldy: It might be a combination of
  several attributes, it might be a very long string, etc.
* You may suspect that the key candidate is not in fact a key. There
  have been cases of duplicate SSNs in several countries, and of course
  not every person has an SSN.

Some people say you should always use surrogate keys, never natural
keys. Mostly because of the second reason, I think: When you design a
system you have limited knowledge, so you can never be absolutely sure
that what appears to be a key candidate actually is one.

OTOH, surrogate keys have downsides, too: The attributes which would
have been a natural key are still there and must be stored, input,
output, maybe generated, used in business rules, etc. So that's a
(usually modest) increase in application complexity. More importantly,
everywhere you would have had a (to a domain expert) meaningful key you
now have a meaningless key - which makes it harder to understand the
data and makes extra joins necessary. So to inspect the contents of a
table instead of a simple "select * from tablename" you need a join over
three or four tables.

To get back to the OP's problem: As I understand it he needs to generate
that compound identifier anyway, because it is used outside of the
application. He could use a surrogate key in addition to that, but I
don't see any advantage to that. Most importantly it doesn't solve the
problem he wants to discuss in this thread[2].

        hp


[1] But note that these attributes are almost always already artificial:
    Humans aren't born with an SSN - that is assigned by an authority to
    uniquely identify their clients; courts have sat for centuries
    without the need to number their cases; etc.

[2] I admit I haven't been very helpful in this regard, either, going
    off on tangents at every opportunity.

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"


Re: Could postgres12 support millions of sequences? (like 10 million)

От
"Peter J. Holzer"
Дата:
On 2020-03-21 13:45:54 -0700, pabloa98 wrote:
> On Sat, Mar 21, 2020 at 12:08 PM Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
>     And I think that "care about gaps -> sequence doesn't work" is a
>     knee-jerk reaction. It's similar to "can't parse HTML with regexps".
>     True in the general case, and therefore people tend to blurt it out
>     every time the topic comes up. But not necessarily true in specific
>     cases. As I wrote above, there is no perfect solution - so you have to
>     think about the actual requirements and the consequences of various
>     solutions - and maybe using a sequence is the best (or least bad)
>     solution.
>
> OK. In that case, I will proceed with the millions of sequences. We will see.

Nonononononononono! (channelling Jean-Claude Laprie[1])

I was trying to say that in each case one should think about the pros
and cons of different approaches and be wary of dogmatic rules.

I wasn't recommending the use of sequences in your case. I would
actually be surprised if they work well for you.

But having said that - by all means try it. Maybe it does work, and
you'll give a talk about "The 50 million sequences database" at a
conference next year :-). Or maybe it doesn't, and then you'll have
learned something about where the actual limits are.

        hp

[1] I don't expect you to know him :-)


--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"


Re: Could postgres12 support millions of sequences? (like 10 million)

От
Adrian Klaver
Дата:
On 3/21/20 11:08 AM, pabloa98 wrote:
> 
>      > As to below that is going to require more thought.
>      >
>     Still no word on the actual requirement. As someone who believes
>     consecutive numbers on digital invoices is simply a mistaken
>     interpretation of the paper based system, I suspect a similar error
>     here. But again we haven’t really heard, far as I know. Something
>     really fishy about 99999999.
>      >
> 
> It is fishy. This is the thing. The code, element, and group is part of 
> a bigger code called item identifier (or ii).
> 
> An item identifier is a result of:  code || group || element || 
> qualifier (a constant like 55) || check digit coming from some check 
> algorithm.
> 
> For example:
> 
> for a triplet (group, element, code) like (1, 3, 63) the item identifier 
> (or ii) is: 630010003558  (the last 8 is the check digit).
> This number is converted to a bigint and stored (and used as PK or FK on 
> other tables, etc, etc).
> 
> In an item identifier the room is assigned like:

Revised, to make it match the identifier above:

> 
> 8 digits for code (not padded with 0s)
> 3 digits for group
> 4 digits for element
> 2 digits for qualifier
> 1 digit for the check digit.
> -----------------------------
> 18 digits for item identifier.
> 


So the question may actually be:

How do we improve our locking code, so we don't have to spawn millions 
of sequences?

What is the locking method you are using?


> The lock part is because we solved a similar problem with a counter by 
> row locking the counter and increasing it in another part of the 
> database. The result is that all the queries using that table are queued 
> by pair (group, element) that is not that bad because we are not 
> inserting thousands of rows by second. Still is killing cluster 
> performance (but performance is still OK from the business point of 
> view). The problem using locks is that they are too sensitive to 
> developer errors and bugs. Sometimes connected clients aborts and the 
> connection is returned to the pool with the lock active until the 
> connection is closed or someone unlocks the row. I would prefer to have 
> something more resilient to developers/programming errors, if possible.
> 
> 
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Could postgres12 support millions of sequences? (like 10 million)

От
pabloa98
Дата:

So the question may actually be:

How do we improve our locking code, so we don't have to spawn millions
of sequences?

What is the locking method you are using?

I am not using locking with the million sequence solution. I do not want something that locks because of the problems described below.

I prefer a solution that generates a gap (skips a couple of numbers) and does not use locks.



> The lock part is because we solved a similar problem with a counter by
> row locking the counter and increasing it in another part of the
> database. The result is that all the queries using that table are queued
> by pair (group, element) that is not that bad because we are not
> inserting thousands of rows by second. Still is killing cluster
> performance (but performance is still OK from the business point of
> view). The problem using locks is that they are too sensitive to
> developer errors and bugs. Sometimes connected clients aborts and the
> connection is returned to the pool with the lock active until the
> connection is closed or someone unlocks the row. I would prefer to have
> something more resilient to developers/programming errors, if possible.
>

Now that I read this paragraph, I realize I was not clear enough.
I am saying we do not want to use locks because of all the problems described.
 

Re: Could postgres12 support millions of sequences? (like 10 million)

От
Adrian Klaver
Дата:
On 3/22/20 2:53 PM, pabloa98 wrote:
> 
>     So the question may actually be:
> 
>     How do we improve our locking code, so we don't have to spawn millions
>     of sequences?
> 
>     What is the locking method you are using?
> 
> 
> I am not using locking with the million sequence solution. I do not want 
> something that locks because the problems described below
> 
> I prefer the solution generates a gap (skip a couple of numbers) and not 
> using locks.
> 
> 
> 
>      > The lock part is because we solved a similar problem with a
>     counter by
>      > row locking the counter and increasing it in another part of the
>      > database. The result is that all the queries using that table are
>     queued
>      > by pair (group, element) that is not that bad because we are not
>      > inserting thousands of rows by second. Still is killing cluster
>      > performance (but performance is still OK from the business point of
>      > view). The problem using locks is that they are too sensitive to
>      > developer errors and bugs. Sometimes connected clients aborts and
>     the
>      > connection is returned to the pool with the lock active until the
>      > connection is closed or someone unlocks the row. I would prefer
>     to have
>      > something more resilient to developers/programming errors, if
>     possible.
>      >
> 
> Now I read this paragraph, I realize I was not clear enough.
> I am saying we do not want to use locks because of all the problems 
> described.

And what I was asking is: what locking were you doing?

And it might be better to ask the list how to solve those problems, than 
to create a whole new set of problems by using millions of sequences.


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Could postgres12 support millions of sequences? (like 10 million)

От
pabloa98
Дата:

> Now I read this paragraph, I realize I was not clear enough.
> I am saying we do not want to use locks because of all the problems
> described.

And what I was asking is: what locking were you doing?

And it might be better to ask the list how to solve those problems, than
to create a whole new set of problems by using millions of sequences.

What we are doing in the **OTHER section** with row locks is basically:

BEGIN;
...
x = SELECT last_value + 1 FROM counter WHERE group = ... AND element = .... FOR UPDATE;
...

if everything is fine, then:
    UPDATE counter SET last_value = x WHERE group = ... AND element = ....;
    COMMIT;
else:
    ROLLBACK;



Re: Could postgres12 support millions of sequences? (like 10 million)

От
Christopher Browne
Дата:
On Sun, 22 Mar 2020 at 17:54, pabloa98 <pabloa98@gmail.com> wrote:

So the question may actually be:

How do we improve our locking code, so we don't have to spawn millions
of sequences?

What is the locking method you are using?

I am not using locking with the million sequence solution. I do not want something that locks because the problems described below

I prefer the solution generates a gap (skip a couple of numbers) and not using locks.

If you can cope with gaps, that's a good thing.  That means that some kind of usage of sequences would be compatible with your application.
 

> The lock part is because we solved a similar problem with a counter by
> row locking the counter and increasing it in another part of the
> database. The result is that all the queries using that table are queued
> by pair (group, element) that is not that bad because we are not
> inserting thousands of rows by second. Still is killing cluster
> performance (but performance is still OK from the business point of
> view). The problem using locks is that they are too sensitive to
> developer errors and bugs. Sometimes connected clients aborts and the
> connection is returned to the pool with the lock active until the
> connection is closed or someone unlocks the row. I would prefer to have
> something more resilient to developers/programming errors, if possible.
>

Now I read this paragraph, I realize I was not clear enough.
I am saying we do not want to use locks because of all the problems described.

Cool, that means you have been thinking through similar considerations to what others have in mind, and it doesn't sound like there are dramatically different understandings.

Let's circle back to the schema that you provided...

CREATE TABLE counter(
group INT NOT NULL,
element INT NOT NULL,
seq_number INT NOT NULL default 0,
CONSTRAINT PRIMARY KEY (group, element)
);

That's not a table using the sequential values; that's what you're imagining you could have as a way of referencing the sequences, right?

I would represent it slightly differently...

create table counter (
   group integer not null,
   element integer not null,
   sequence_name name,
   primary key (group, element)
);
Arguably, there's no need for sequence_name altogether, as it's never directly referenced by anything.

And then have a function that might fire upon creation of new entries in this table.

create or replace function generate_sequence (i_group integer, i_element integer) returns name
as $$
declare
  c_seqname name;
  c_query text;
begin
   c_seqname := 'obj_counter_' || i_group || '_'  || i_element;
   c_query := 'create sequence if not exists ' || c_seqname || ';';
   execute c_query;
   update counter set sequence_name = c_seqname where group = i_group and element = i_element;
   return c_seqname;
end
$$ language plpgsql;

You'd need a trigger function to put onto the table that runs this function; that is left as an exercise for the reader.
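For illustration only, a minimal, untested sketch of such a trigger (note that group is a reserved word, so it is quoted when used as a column name):

create or replace function counter_create_sequence() returns trigger
as $$
begin
   -- create the per-pair sequence for the row that was just inserted
   perform generate_sequence(new."group", new.element);
   return null;  -- the return value is ignored for AFTER triggers
end
$$ language plpgsql;

create trigger trg_counter_create_sequence
   after insert on counter
   for each row execute function counter_create_sequence();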

Then, on any of the tables where you need to assign sequence values, you'd need to run an "after" trigger to do the assignment.  The function that finds the sequence value is kind of analogous:
create or replace function get_next_counter (i_group integer, i_element integer) returns integer -- or bigint?
as $$
declare
  c_seqname name;
  c_query text;
  c_seqval integer;
begin
   c_seqname := 'obj_counter_' || i_group || '_'  || i_element;
   c_query := 'select nextval(' || quote_literal(c_seqname) || ');';
   execute c_query into c_seqval;
   return c_seqval;
end
$$ language plpgsql;

Again, that just grabs a nextval(); you'd need to execute this inside a trigger function called ON INSERT on any of the tables that need sequence values assigned.
That encapsulates the usage of this horde of sequences.  You're probably calling get_next_counter() millions of times, so perhaps that code gets expanded directly into place in the trigger function.
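As a usage sketch (table and column names are illustrative), the insert trigger could look like this; a BEFORE trigger is used so that the assignment to new.code takes effect:

create or replace function event_assign_code() returns trigger
as $$
begin
   -- fetch the next value from the per-pair sequence and store it on the new row
   new.code := get_next_counter(new.group_id, new.element);
   return new;
end
$$ language plpgsql;

create trigger trg_event_assign_code
   before insert on event
   for each row execute function event_assign_code();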

--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"

Re: Could postgres12 support millions of sequences? (like 10 million)

От
"David G. Johnston"
Дата:
On Sun, Mar 22, 2020 at 5:36 PM Christopher Browne <cbbrowne@gmail.com> wrote:

Then, on any of the tables where you need to assign sequence values, you'd need to run an "after" trigger to do the assignment.  The function that finds the sequence value is kind of analagous:
create or replace function get_next_counter (i_group integer, i_element integer) returns integer -- or bigint?
as $$
declare
  c_seqname name;
  c_query text;
  c_seqval integer;
begin
   c_seqname := 'obj_counter_' || i_group || '_'  || i_element;
   c_query := 'select nextval(' || quote_ident( c_seqname_ || ');';

or

c_query := format('select nextval(%I);', c_seqname);
You're probably calling get_next_counter() millions of times, so perhaps that code gets expanded directly into place in the trigger function.

not tested but something like:

execute format('select nextval("obj_counter_%s_%s");', i_group, i_element) into strict c_seqval;

or, more paranoidly:

execute format('select nextval(%I);', format('obj_counter_%s_%s', i_group, i_element)) into strict c_seqval;

David J.

Re: Could postgres12 support millions of sequences? (like 10 million)

От
pabloa98
Дата:


On Sun, Mar 22, 2020 at 5:36 PM Christopher Browne <cbbrowne@gmail.com> wrote:
On Sun, 22 Mar 2020 at 17:54, pabloa98 <pabloa98@gmail.com> wrote:

So the question may actually be:

How do we improve our locking code, so we don't have to spawn millions
of sequences?

What is the locking method you are using?

I am not using locking with the million sequence solution. I do not want something that locks because the problems described below

I prefer the solution generates a gap (skip a couple of numbers) and not using locks.

If you can cope with gaps, that's a good thing.  That means that some kind of usage of sequences would be compatible with your application.

Indeed
 
 

> The lock part is because we solved a similar problem with a counter by
> row locking the counter and increasing it in another part of the
> database. The result is that all the queries using that table are queued
> by pair (group, element) that is not that bad because we are not
> inserting thousands of rows by second. Still is killing cluster
> performance (but performance is still OK from the business point of
> view). The problem using locks is that they are too sensitive to
> developer errors and bugs. Sometimes connected clients aborts and the
> connection is returned to the pool with the lock active until the
> connection is closed or someone unlocks the row. I would prefer to have
> something more resilient to developers/programming errors, if possible.
>

Now I read this paragraph, I realize I was not clear enough.
I am saying we do not want to use locks because of all the problems described.

Cool, that means you have been thinking through similar considerations to what others have in mind, and it doesn't sound like there are dramatically different understandings.

Let's circle back to the schema that you provided...

CREATE TABLE counter(
group INT NOT NULL,
element INT NOT NULL,
seq_number INT NOT NULL default 0,
CONSTRAINT PRIMARY KEY (group, element)
);

That's not a table using the sequential values; that's what you imagining you could have as a way of referencing the sequences, right?

I would represent it slightly differently...

create table counter (
   "group" integer not null,
   element integer not null,
   sequence_name name,
   primary key ("group", element)
);
Arguably, there's no need for sequence_name at all, as it's never directly referenced by anything.

And then have a function that might fire upon creation of new entries in this table.

create or replace function generate_sequence (i_group integer, i_element integer) returns name
as $$
declare
  c_seqname name;
  c_query text;
begin
   c_seqname := 'obj_counter_' || i_group || '_'  || i_element;
   c_query := 'create sequence if not exists ' || c_seqname || ';';
   execute c_query;
   update counter set sequence_name = c_seqname where "group" = i_group and element = i_element;
   return c_seqname;
end
$$ language plpgsql;

You'd need a trigger function to put onto the table that runs this function; that is left as an exercise for the reader.
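
For illustration only, one way that exercise could look, assuming an AFTER INSERT timing so the UPDATE inside generate_sequence() can see the new counter row (the trigger and function names are made up):

create or replace function counter_create_sequence() returns trigger
as $$
begin
   -- create the per-pair sequence and record its name in the new row
   perform generate_sequence(NEW."group", NEW.element);
   return NEW;
end
$$ language plpgsql;

create trigger counter_create_sequence_trg
    after insert on counter
    for each row execute function counter_create_sequence();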

Then, on any of the tables where you need to assign sequence values, you'd need to run a "before" trigger to do the assignment.  The function that finds the sequence value is kind of analogous:
create or replace function get_next_counter (i_group integer, i_element integer) returns integer -- or bigint?
as $$
declare
  c_seqname name;
  c_query text;
  c_seqval integer;
begin
   c_seqname := 'obj_counter_' || i_group || '_'  || i_element;
   c_query := 'select nextval(' || quote_literal(c_seqname) || ');';
   execute c_query into c_seqval;
   return c_seqval;
end
$$ language plpgsql;

Again, that just grabs a nextval(); you'd need to execute this inside a trigger function fired ON INSERT on any of the tables that need sequence values assigned.
That encapsulates the usage of this horde of sequences.  You're probably calling get_next_counter() millions of times, so perhaps that code gets expanded directly into place in the trigger function.
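
A sketch of that insert trigger, assuming a hypothetical data table called measurement that carries the ("group", element) pair; a before-insert trigger is used so the fetched value can be stored in the row being inserted (all names here are illustrative, not from the thread):

create table measurement (
    "group"    integer not null,
    element    integer not null,
    seq_number integer,
    payload    text
);

create or replace function measurement_assign_seq() returns trigger
as $$
begin
   -- fetch the next value from the per-pair sequence
   NEW.seq_number := get_next_counter(NEW."group", NEW.element);
   return NEW;
end
$$ language plpgsql;

create trigger measurement_assign_seq_trg
    before insert on measurement
    for each row execute function measurement_assign_seq();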


I like this approach.
When I have a medium number of sequences I will report how it behaves. It will take some time though.

Pablo
 

Re: Could postgres12 support millions of sequences? (like 10 million)

От
pabloa98
Дата:


On Sun, Mar 22, 2020 at 6:58 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Sun, Mar 22, 2020 at 5:36 PM Christopher Browne <cbbrowne@gmail.com> wrote:

Then, on any of the tables where you need to assign sequence values, you'd need to run a "before" trigger to do the assignment.  The function that finds the sequence value is kind of analogous:
create or replace function get_next_counter (i_group integer, i_element integer) returns integer -- or bigint?
as $$
declare
  c_seqname name;
  c_query text;
  c_seqval integer;
begin
   c_seqname := 'obj_counter_' || i_group || '_'  || i_element;
   c_query := 'select nextval(' || quote_literal(c_seqname) || ');';

or

c_query := format('select nextval(%L);', c_seqname);
You're probably calling get_next_counter() millions of times, so perhaps that code gets expanded directly into place in the trigger function.

not tested but something like:

execute format('select nextval("obj_counter_%s_%s");', i_group, i_element) into strict c_seqval;

or, more paranoidly:

execute format('select nextval(%L);', format('obj_counter_%s_%s', i_group, i_element)) into strict c_seqval;

David J.


I will add this to the previous solution.

Thank you all for all the ideas and suggestions.

I hope that in the future there will be a sequence data type, plus support and optimizations in PostgreSQL for dealing with a lot of sequences. They would be very useful not only for me :) but also for problems like monitoring and counting things by the zillions.

Pablo

Re: Could postgres12 support millions of sequences? (like 10 million)

От
"Daniel Verite"
Дата:
    pabloa98 wrote:

> When I have a medium number of sequences I will report how it behaves. It
> will take some time though.

Be aware that creating the sequences on the fly has the kind of race
condition that you wanted to avoid in the first place.

For instance consider this execution in two concurrent sessions:

S1: BEGIN;

S1: CREATE SEQUENCE IF NOT EXISTS seq1;

S2:  BEGIN;

S2:  CREATE SEQUENCE IF NOT EXISTS seq1;
S2:  (now blocked waiting for S1)

S1: COMMIT;

S2:    ERROR:  duplicate key value violates unique constraint
"pg_type_typname_nsp_index"
DETAIL : Key (typname, typnamespace)=(seq1, 36434) already exists.

The client could catch these errors and retry, but then it could also
do that with serializable transactions on serialization failures
(SQLSTATE 40001), and you'd get the guarantee of consecutive
numbering without creating all these sequences, not to mention
the protection against other potential concurrency anomalies.
See https://www.postgresql.org/docs/current/transaction-iso.html


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
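
For illustration, a minimal sketch of that alternative, assuming the counter table from the start of the thread, example pair values (1, 42), and an application that retries the whole transaction when it receives SQLSTATE 40001 (the retry handling lives in the client and is only described in comments here):

BEGIN ISOLATION LEVEL SERIALIZABLE;

-- take the next number for the pair; numbering stays consecutive because
-- the transaction either commits as a whole or is retried
UPDATE counter
   SET seq_number = seq_number + 1
 WHERE "group" = 1 AND element = 42
RETURNING seq_number;

-- ... insert the row that uses the returned number here ...

COMMIT;
-- on ERROR with SQLSTATE 40001 (serialization_failure): ROLLBACK and
-- rerun the transaction from the top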



Re: Could postgres12 support millions of sequences? (like 10 million)

От
pabloa98
Дата:


On Mon, Mar 23, 2020 at 9:58 AM Daniel Verite <daniel@manitou-mail.org> wrote:
        pabloa98 wrote:

> When I have a medium number of sequences I will report how it behaves. It
> will take some time though.

Be aware that creating the sequences on the fly has the kind of race
condition that you wanted to avoid in the first place.

For instance consider this execution in two concurrent sessions:

S1: BEGIN;

S1: CREATE SEQUENCE IF NOT EXISTS seq1;

S2:  BEGIN;

S2:  CREATE SEQUENCE IF NOT EXISTS seq1;
S2:  (now blocked waiting for S1)

S1: COMMIT;

S2:    ERROR:  duplicate key value violates unique constraint
"pg_type_typname_nsp_index"
DETAIL : Key (typname, typnamespace)=(seq1, 36434) already exists.

The client could catch these errors and retry, but then it could also
do that with serializable transactions on serialization failures
(SQLSTATE 40001), and you'd get the guarantee of consecutive
numbering without creating all these sequences, not to mention
the protection against other potential concurrency anomalies.
See https://www.postgresql.org/docs/current/transaction-iso.html


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

Indeed. We have a "creation state" in the application. During that phase, the (group, element) pair and its sequence are created.
After the creation process is done, the entry is in "production state", and only then is the sequence used.
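
A sketch of what that creation step could look like, under the assumption that it runs once per pair before the pair goes into production; the pair values and sequence name are illustrative, and an insert trigger like the one sketched earlier could create the sequence instead of the explicit statement:

BEGIN;
-- register the pair; the primary key makes a second registration a no-op
INSERT INTO counter ("group", element) VALUES (1, 42)
ON CONFLICT DO NOTHING;
-- create its dedicated sequence
CREATE SEQUENCE IF NOT EXISTS obj_counter_1_42;
COMMIT;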

Pablo