Discussion: Big table and partition


Big table and partition

From
Rajesh Kumar
Date:
Hi,

In my environment, I have a 30 GB table into which data is inserted
directly whenever a new block is created in the blockchain...

Sometimes a query runs to check whether the most recent block has been
created.

I have been asked to partition the table. I told my manager that since
we are not using the past data at all, partitioning is not required.
Instead, we can stop the sync (no downtime required), rename the table,
create a new table with the same structure under the original name, and
keep only this month's data. Then we sync the blockchain again so that
new data goes into the newly created table, and the old data stays put
purely as storage, since we never query it...

Note: in the long run, data will keep coming in whenever a new block is
created.

Is that right? Any suggestions? Better options?
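
For concreteness, the swap described above might look roughly like the
following SQL; the table name "blocks" and the blocknumber watermark
are hypothetical placeholders, not the poster's actual schema:

  BEGIN;
  -- Park the old data under a new name.
  ALTER TABLE blocks RENAME TO blocks_archive;
  -- Recreate the original name with the same structure
  -- (defaults, constraints, indexes).
  CREATE TABLE blocks (LIKE blocks_archive INCLUDING ALL);
  -- Caution: a serial id column would still share the old sequence.
  -- Carry over only the current month's rows, here approximated by a
  -- blocknumber watermark (9999999 is purely illustrative).
  INSERT INTO blocks
  SELECT * FROM blocks_archive WHERE blocknumber >= 9999999;
  COMMIT;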



Re: Big table and partition

From
Ron
Date:
On 10/14/23 15:46, Rajesh Kumar wrote:
> Hi,
>
> In my environment, I have a 30 GB table

That's not necessarily huge.

> into which data is inserted directly whenever a new block is created
> in the blockchain...

How many records for that 30GB?

How many days (or months)?

Will the rate of insertion increase?

What retention is required?

> Sometimes a query runs to check whether the most recent block has
> been created.
>
> I have been asked to partition the table. I told my manager that since
> we are not using the past data at all, partitioning is not required.
> Instead, we can stop the sync (no downtime required), rename the table,
> create a new table with the same structure under the original name, and
> keep only this month's data. Then we sync the blockchain again so that
> new data goes into the newly created table, and the old data stays put
> purely as storage, since we never query it...

I've been burned by inefficient queries when having to add a date column to 
the primary key.  We "departitioned" every table except the two that had 
large bytea columns and stored up to 200GB per month.

>
> Note: in the long run, data will keep coming in whenever a new block
> is created.
>
> Is that right? Any suggestions? Better options?

In my experience, PostgreSQL deletes based on an indexed date field are 
pretty efficient.  Sometimes I delete a month of data at once, and sometimes 
one day of data at a time.
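
A sketch of that pattern, assuming a hypothetical table "blocks" with
an indexed "created_at" timestamp column:

  CREATE INDEX IF NOT EXISTS blocks_created_at_idx
      ON blocks (created_at);

  -- Delete one month of data; the index keeps the row lookup cheap,
  -- though the deletes themselves still generate vacuum work.
  DELETE FROM blocks
  WHERE created_at < now() - interval '1 month';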


-- 
Born in Arizona, moved to Babylonia.



Re: Big table and partition

From
Christophe Courtois
Date:
Hi,

On 14/10/2023 at 22:46, Rajesh Kumar wrote:
> In my environment, I have a 30 GB table into which data is inserted
> directly whenever a new block is created in the blockchain...
> I have been asked to partition the table. I told my manager that since
> we are not using the past data at all, partitioning is not required.
> Instead, we can stop the sync (no downtime required), rename the table,
> create a new table with the same structure under the original name, and
> keep only this month's data. Then we sync the blockchain again so that
> new data goes into the newly created table, and the old data stays put
> purely as storage, since we never query it...

In fact, you want to do manual partitioning.
Why reinvent the wheel?

> Note: in the long run, data will keep coming in whenever a new block
> is created.
> Is that right? Any suggestions? Better options?

What problem do you have to solve? 30 GB is not so big, but how long 
have you been storing data? Will you purge the table? What is the main 
access key of your queries (critical for choosing the partitioning key)?

Without other information, I'd say that if you never read the past 
months again, partition by month, or perhaps by year.
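
A minimal sketch of that layout with PostgreSQL declarative
partitioning, assuming a hypothetical "created_at" timestamp column
promoted out of the jsonb (note that the primary key of a partitioned
table must include the partition key):

  CREATE TABLE blocks (
      id          bigint GENERATED ALWAYS AS IDENTITY,
      blocknumber bigint NOT NULL,
      jsondata    jsonb,
      created_at  timestamptz NOT NULL DEFAULT now(),
      PRIMARY KEY (id, created_at)
  ) PARTITION BY RANGE (created_at);

  CREATE TABLE blocks_2023_10 PARTITION OF blocks
      FOR VALUES FROM ('2023-10-01') TO ('2023-11-01');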

-- 
Christophe Courtois
Consultant DALIBO



Re: Big table and partition

From
Rajesh Kumar
Date:
The columns are id, blocknumber and jsondata. 

I have not been told anything about data retention; I will ask them.

The problem I am trying to resolve is that 30 GB is big and it will 
keep growing forever. By doing this, I am making scans easier (despite 
having indexes).

Growth will be about 10 GB of data per year.

In case we do partition, please provide a script that does the 
partitioning automatically. We are using Postgres version 15.
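
There is no built-in scheduler for this, but a small DO block run
monthly (for example from cron or the pg_cron extension) can pre-create
the next partition. A hedged sketch, assuming the range-partitioned
"blocks" layout discussed above:

  DO $$
  DECLARE
      next_month date := date_trunc('month', now()) + interval '1 month';
      part_name  text := format('blocks_%s',
                                to_char(next_month, 'YYYY_MM'));
  BEGIN
      -- Build and run the DDL for next month's partition.
      EXECUTE format(
          'CREATE TABLE IF NOT EXISTS %I PARTITION OF blocks
               FOR VALUES FROM (%L) TO (%L)',
          part_name, next_month, next_month + interval '1 month');
  END
  $$;

The pg_partman extension automates the same bookkeeping if an external
dependency is acceptable.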






Re: Big table and partition

From
Ron
Date:
On 10/15/23 15:46, Rajesh Kumar wrote:
> The columns are id, blocknumber and jsondata.

No date column?

How much JSON data?

> I have not been told anything about data retention; I will ask them.
>
> The problem I am trying to resolve is that 30 GB is big

Big is relative.  30 GB in 2023 is pretty darned small.

> and it will keep growing forever. By doing this, I am making scans
> easier (despite having indexes).
>
> Growth will be about 10 GB of data per year.

That's less than a gigabyte per month.

> In case we do partition, please provide a script that does the
> partitioning automatically. We are using Postgres version 15.

Keep It Simple, Stanley, and just DELETE FROM <table> WHERE id < nnnnnnn;.  
Do that every month.



--
Born in Arizona, moved to Babylonia.

Re: Big table and partition

From
JP Pozzi
Date:
Hello,

Partitioning is not so simple; it is better to have a key (or part of 
a key) or a date on which to base a useful partitioning decision.
If you want to delete (or archive and delete) some data, it is simple 
to drop a whole partition. And if the partitioning key (or part of the 
key) appears in most queries, partitioning is a good solution for 
better performance.
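
A sketch of that retention pattern, with a hypothetical monthly
partition name:

  -- Keep the old rows around, detached from the main table:
  ALTER TABLE blocks DETACH PARTITION blocks_2023_01;
  -- ...or discard them outright, which is nearly instant:
  DROP TABLE blocks_2023_01;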

Regards

JP P


De: "Ron" <ronljohnsonjr@gmail.com>
À: "pgsql-admin" <pgsql-admin@lists.postgresql.org>
Envoyé: Dimanche 15 Octobre 2023 23:03:49
Objet: Re: Big table and partition

On 10/15/23 15:46, Rajesh Kumar wrote:
The columns are id, blocknumber and jsondata.

No date column?

How much json data?


I have not been told anything ABT data retention and I will ask them.

Problem I am trying to resolve is, 30gb is big

Big is relative.  30GB in 2023 is pretty darned small.

and it will be every growing always. By doing this, i am making the scan easier (despite having indexes).

 Growth will be 10gb per year data.

That's less than a gigabyte per month.


Incase I am doing partition, Plz provide any script that does partitioning automatically. We are using postgres version 15.

Keep It Simple, Stanley and just DELETE WHERE id < nnnnnnn;.  Do that every month.


On Sun, 15 Oct, 2023, 8:07 PM Christophe Courtois, <christophe.courtois@dalibo.com> wrote:
Hi,

Le 14/10/2023 à 22:46, Rajesh Kumar a écrit :
> In my env, I have a table 30GB where data is getting inserted directly
> whenever the new block is created in block chain...
> I have been asked to do partition, i said manager since we are not at
> all using the past data, partition is not required. Instead we can stop
> the sync (downtime not required), change the name of table , create a
> new table with same structure with original name and keep only tis month
> data. Then sync the blockchain so that new data will come to newly
> created table and old data will be there simply as a storage as we r not
> selecting any query from old data...

In fact, you want to do manual partitioning.
Why reinvent the wheel?

> Note: on a longer run, data will keep on coming in whenever new block is
> created.
> Is that right ? Any suggestions ? Better options ?

What problem do you have to solve? 30 GB is not so big, but since when
do you store data? Will you purge the table? What is the main access key
of your queries (critical for the partitioning key)?

Without other information I'd say that if you never read the former
months again, partition by month, perhaps by year.

--
Christophe Courtois
Consultant DALIBO

--
Born in Arizona, moved to Babylonia.

Re: Big table and partition

From
Rajesh Kumar
Date:
The timestamp is inside the jsonb. The reason I am hesitant to 
partition is that only two queries ever run: 1) insert the new block 
(no WHERE condition used); 2) a SELECT that checks whether the most 
recently inserted row is there (max blocknumber)...

This is automatic.


Can I show them some stats from pg_stat_statements showing that 
performance is not impacted as things stand?
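
For that second query, a plain btree index already reduces the recency
check to a single index probe, whatever the table size. A sketch, with
hypothetical names:

  CREATE INDEX IF NOT EXISTS blocks_blocknumber_idx
      ON blocks (blocknumber);

  -- The planner satisfies max() by reading one entry off the end
  -- of the index rather than scanning the table:
  SELECT max(blocknumber) FROM blocks;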


Re: Big table and partition

From
Rajesh Kumar
Date:
As I said, we have the timestamp inside the jsonb, in case we want to 
partition.

How do I convince them that partitioning is not needed for this table?




Re: Big table and partition

From
Ron
Date:
On 10/16/23 08:09, Rajesh Kumar wrote:
> As I said, we have the timestamp inside the jsonb, in case we want to
> partition.

That's not going to work.  The timestamp must be its own column.

> How do I convince them that partitioning is not needed for this table?

By demonstrating that DELETE FROM <table> WHERE id < nnnnnnn; is Fast 
Enough.  (And remember what your mother told you about not stuffing your 
mouth full of food: delete in bite-sized chunks.  There's no need to 
delete half the records in one statement.)
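
A sketch of such bite-sized deletes; the cutoff id and the 10,000-row
batch size are illustrative only, and the block must run outside an
explicit transaction so that the COMMIT after each batch is allowed
(PostgreSQL 11 or later):

  DO $$
  DECLARE
      batch integer;
  BEGIN
      LOOP
          -- Delete at most one batch of old rows per iteration.
          DELETE FROM blocks
          WHERE id IN (SELECT id FROM blocks
                       WHERE id < 9999999
                       LIMIT 10000);
          GET DIAGNOSTICS batch = ROW_COUNT;
          EXIT WHEN batch = 0;
          COMMIT;  -- release locks and let autovacuum keep up
      END LOOP;
  END
  $$;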



--
Born in Arizona, moved to Babylonia.

Re: Big table and partition

From
Rajesh Kumar
Date:
I was able to create a partition using the jsonb column.
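
Presumably something along these lines, partitioning on an expression
over the jsonb; the "timestamp" field name is a guess, and note that an
expression partition key rules out a conventional primary key on the
parent table:

  CREATE TABLE blocks (
      id          bigint,
      blocknumber bigint,
      jsondata    jsonb
  ) PARTITION BY RANGE (((jsondata ->> 'timestamp')::timestamptz));

  CREATE TABLE blocks_2023_10 PARTITION OF blocks
      FOR VALUES FROM ('2023-10-01') TO ('2023-11-01');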

I want to show them there are no performance-related issues with this 
table, so that I can convince them (I believe so). What data can I show 
them (mean execution time from pg_stat_statements, no CPU bottleneck, 
no long-running queries on this table...), and what more can I tell 
them?
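
A starting point for that evidence; pg_stat_statements must be enabled
via shared_preload_libraries, and the filter on the query text is a
hypothetical match on the table name:

  SELECT query,
         calls,
         round(mean_exec_time::numeric, 2)  AS mean_ms,
         round(total_exec_time::numeric, 2) AS total_ms,
         rows
  FROM pg_stat_statements
  WHERE query ILIKE '%blocks%'
  ORDER BY total_exec_time DESC
  LIMIT 10;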
