Discussion: Big table and partition


Big table and partition

From
Rajesh Kumar
Date:
Hi,

In my environment, I have a 30 GB table into which data is inserted
directly whenever a new block is created in the blockchain...

Sometimes a query runs to check whether the most recent block has been
created.

I have been asked to partition the table. I told my manager that since
we are not using the past data at all, partitioning is not required.
Instead, we can stop the sync (no downtime required), rename the table,
create a new table with the same structure under the original name, and
keep only this month's data. Then we sync the blockchain again so that
new data goes into the newly created table, and the old data stays put
purely as storage, since we never query it...

Note: in the long run, data will keep coming in whenever a new block is
created.

Is that right? Any suggestions? Better options?
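
For concreteness, the swap described above might look roughly like the
following SQL; the table name "blocks" and the blocknumber watermark
are hypothetical placeholders, not the poster's actual schema:

  BEGIN;
  -- Park the old data under a new name.
  ALTER TABLE blocks RENAME TO blocks_archive;
  -- Recreate the original name with the same structure
  -- (defaults, constraints, indexes).
  CREATE TABLE blocks (LIKE blocks_archive INCLUDING ALL);
  -- Caution: a serial id column would still share the old sequence.
  -- Carry over only the current month's rows, here approximated by a
  -- blocknumber watermark (9999999 is purely illustrative).
  INSERT INTO blocks
  SELECT * FROM blocks_archive WHERE blocknumber >= 9999999;
  COMMIT;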



Re: Big table and partition

From
Ron
Date:
On 10/14/23 15:46, Rajesh Kumar wrote:
> Hi,
>
> In my environment, I have a 30 GB table

That's not necessarily huge.

> into which data is inserted directly whenever a new block is created
> in the blockchain...

How many records for that 30GB?

How many days (or months)?

Will the rate of insertion increase?

What retention is required?

> Sometimes a query runs to check whether the most recent block has
> been created.
>
> I have been asked to partition the table. I told my manager that since
> we are not using the past data at all, partitioning is not required.
> Instead, we can stop the sync (no downtime required), rename the table,
> create a new table with the same structure under the original name, and
> keep only this month's data. Then we sync the blockchain again so that
> new data goes into the newly created table, and the old data stays put
> purely as storage, since we never query it...

I've been burned by inefficient queries when having to add a date column to 
the primary key.  We "departitioned" every table except the two that had 
large bytea columns and stored up to 200GB per month.

>
> Note: in the long run, data will keep coming in whenever a new block
> is created.
>
> Is that right? Any suggestions? Better options?

In my experience, PostgreSQL deletes based on an indexed date field are 
pretty efficient.  Sometimes I delete a month of data at once, and sometimes 
one day of data at a time.
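
A sketch of that pattern, assuming a hypothetical table "blocks" with
an indexed "created_at" timestamp column:

  CREATE INDEX IF NOT EXISTS blocks_created_at_idx
      ON blocks (created_at);

  -- Delete one month of data; the index keeps the row lookup cheap,
  -- though the deletes themselves still generate vacuum work.
  DELETE FROM blocks
  WHERE created_at < now() - interval '1 month';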


-- 
Born in Arizona, moved to Babylonia.



Re: Big table and partition

From
Christophe Courtois
Date:
Hi,

On 14/10/2023 at 22:46, Rajesh Kumar wrote:
> In my environment, I have a 30 GB table into which data is inserted
> directly whenever a new block is created in the blockchain...
> I have been asked to partition the table. I told my manager that since
> we are not using the past data at all, partitioning is not required.
> Instead, we can stop the sync (no downtime required), rename the table,
> create a new table with the same structure under the original name, and
> keep only this month's data. Then we sync the blockchain again so that
> new data goes into the newly created table, and the old data stays put
> purely as storage, since we never query it...

In fact, you want to do manual partitioning.
Why reinvent the wheel?

> Note: in the long run, data will keep coming in whenever a new block
> is created.
> Is that right? Any suggestions? Better options?

What problem do you have to solve? 30 GB is not so big, but how long 
have you been storing data? Will you purge the table? What is the main 
access key of your queries (critical for choosing the partitioning key)?

Without other information, I'd say that if you never read the past 
months again, partition by month, or perhaps by year.
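
A minimal sketch of that layout with PostgreSQL declarative
partitioning, assuming a hypothetical "created_at" timestamp column
promoted out of the jsonb (note that the primary key of a partitioned
table must include the partition key):

  CREATE TABLE blocks (
      id          bigint GENERATED ALWAYS AS IDENTITY,
      blocknumber bigint NOT NULL,
      jsondata    jsonb,
      created_at  timestamptz NOT NULL DEFAULT now(),
      PRIMARY KEY (id, created_at)
  ) PARTITION BY RANGE (created_at);

  CREATE TABLE blocks_2023_10 PARTITION OF blocks
      FOR VALUES FROM ('2023-10-01') TO ('2023-11-01');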

-- 
Christophe Courtois
Consultant DALIBO



Re: Big table and partition

From
Rajesh Kumar
Date:
The columns are id, blocknumber and jsondata. 

I have not been told anything about data retention; I will ask them.

The problem I am trying to resolve is that 30 GB is big and it will 
keep growing forever. By doing this, I am making scans easier (despite 
having indexes).

Growth will be about 10 GB of data per year.

In case we do partition, please provide a script that does the 
partitioning automatically. We are using Postgres version 15.
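
There is no built-in scheduler for this, but a small DO block run
monthly (for example from cron or the pg_cron extension) can pre-create
the next partition. A hedged sketch, assuming the range-partitioned
"blocks" layout discussed above:

  DO $$
  DECLARE
      next_month date := date_trunc('month', now()) + interval '1 month';
      part_name  text := format('blocks_%s',
                                to_char(next_month, 'YYYY_MM'));
  BEGIN
      -- Build and run the DDL for next month's partition.
      EXECUTE format(
          'CREATE TABLE IF NOT EXISTS %I PARTITION OF blocks
               FOR VALUES FROM (%L) TO (%L)',
          part_name, next_month, next_month + interval '1 month');
  END
  $$;

The pg_partman extension automates the same bookkeeping if an external
dependency is acceptable.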






Re: Big table and partition

From
Ron
Date:
On 10/15/23 15:46, Rajesh Kumar wrote:
> The columns are id, blocknumber and jsondata.

No date column?

How much JSON data?

> I have not been told anything about data retention; I will ask them.
>
> The problem I am trying to resolve is that 30 GB is big

Big is relative.  30 GB in 2023 is pretty darned small.

> and it will keep growing forever. By doing this, I am making scans
> easier (despite having indexes).
>
> Growth will be about 10 GB of data per year.

That's less than a gigabyte per month.

> In case we do partition, please provide a script that does the
> partitioning automatically. We are using Postgres version 15.

Keep It Simple, Stanley, and just DELETE FROM <table> WHERE id < nnnnnnn;.  
Do that every month.



--
Born in Arizona, moved to Babylonia.

Re: Big table and partition

From
JP Pozzi
Date:
Hello,

Partitioning is not so simple; it is better to have a key (or part of 
a key) or a date on which to base a useful partitioning decision.
If you want to delete (or archive and delete) some data, it is simple 
to drop a whole partition. And if the partitioning key (or part of the 
key) appears in most queries, partitioning is a good solution for 
better performance.
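
A sketch of that retention pattern, with a hypothetical monthly
partition name:

  -- Keep the old rows around, detached from the main table:
  ALTER TABLE blocks DETACH PARTITION blocks_2023_01;
  -- ...or discard them outright, which is nearly instant:
  DROP TABLE blocks_2023_01;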

Regards

JP P


De: "Ron" <ronljohnsonjr@gmail.com>
À: "pgsql-admin" <pgsql-admin@lists.postgresql.org>
Envoyé: Dimanche 15 Octobre 2023 23:03:49
Objet: Re: Big table and partition

On 10/15/23 15:46, Rajesh Kumar wrote:
The columns are id, blocknumber and jsondata.

No date column?

How much json data?


I have not been told anything ABT data retention and I will ask them.

Problem I am trying to resolve is, 30gb is big

Big is relative.  30GB in 2023 is pretty darned small.

and it will be every growing always. By doing this, i am making the scan easier (despite having indexes).

 Growth will be 10gb per year data.

That's less than a gigabyte per month.


Incase I am doing partition, Plz provide any script that does partitioning automatically. We are using postgres version 15.

Keep It Simple, Stanley and just DELETE WHERE id < nnnnnnn;.  Do that every month.


On Sun, 15 Oct, 2023, 8:07 PM Christophe Courtois, <christophe.courtois@dalibo.com> wrote:
Hi,

Le 14/10/2023 à 22:46, Rajesh Kumar a écrit :
> In my env, I have a table 30GB where data is getting inserted directly
> whenever the new block is created in block chain...
> I have been asked to do partition, i said manager since we are not at
> all using the past data, partition is not required. Instead we can stop
> the sync (downtime not required), change the name of table , create a
> new table with same structure with original name and keep only tis month
> data. Then sync the blockchain so that new data will come to newly
> created table and old data will be there simply as a storage as we r not
> selecting any query from old data...

In fact, you want to do manual partitioning.
Why reinvent the wheel?

> Note: on a longer run, data will keep on coming in whenever new block is
> created.
> Is that right ? Any suggestions ? Better options ?

What problem do you have to solve? 30 GB is not so big, but since when
do you store data? Will you purge the table? What is the main access key
of your queries (critical for the partitioning key)?

Without other information I'd say that if you never read the former
months again, partition by month, perhaps by year.

--
Christophe Courtois
Consultant DALIBO

--
Born in Arizona, moved to Babylonia.

Re: Big table and partition

From
Rajesh Kumar
Date:
The timestamp is inside the jsonb. The reason I am hesitant to 
partition is that only two queries ever run: 1) insert the new block 
(no WHERE condition used); 2) a SELECT that checks whether the most 
recently inserted row is there (max blocknumber)...

This is automatic.


Can I show them some stats from pg_stat_statements showing that 
performance is not impacted as things stand?
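
For that second query, a plain btree index already reduces the recency
check to a single index probe, whatever the table size. A sketch, with
hypothetical names:

  CREATE INDEX IF NOT EXISTS blocks_blocknumber_idx
      ON blocks (blocknumber);

  -- The planner satisfies max() by reading one entry off the end
  -- of the index rather than scanning the table:
  SELECT max(blocknumber) FROM blocks;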


Re: Big table and partition

From
Rajesh Kumar
Date:
As I said, we have the timestamp inside the jsonb, in case we want to 
partition.

How do I convince them that partitioning is not needed for this table?




Re: Big table and partition

From
Ron
Date:
On 10/16/23 08:09, Rajesh Kumar wrote:
> As I said, we have the timestamp inside the jsonb, in case we want to
> partition.

That's not going to work.  The timestamp must be its own column.

> How do I convince them that partitioning is not needed for this table?

By demonstrating that DELETE FROM <table> WHERE id < nnnnnnn; is Fast 
Enough.  (And remember what your mother told you about not stuffing your 
mouth full of food: delete in bite-sized chunks.  There's no need to 
delete half the records in one statement.)
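
A sketch of such bite-sized deletes; the cutoff id and the 10,000-row
batch size are illustrative only, and the block must run outside an
explicit transaction so that the COMMIT after each batch is allowed
(PostgreSQL 11 or later):

  DO $$
  DECLARE
      batch integer;
  BEGIN
      LOOP
          -- Delete at most one batch of old rows per iteration.
          DELETE FROM blocks
          WHERE id IN (SELECT id FROM blocks
                       WHERE id < 9999999
                       LIMIT 10000);
          GET DIAGNOSTICS batch = ROW_COUNT;
          EXIT WHEN batch = 0;
          COMMIT;  -- release locks and let autovacuum keep up
      END LOOP;
  END
  $$;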



--
Born in Arizona, moved to Babylonia.

Re: Big table and partition

From
Rajesh Kumar
Date:
I was able to create a partition using the jsonb column.
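
Presumably something along these lines, partitioning on an expression
over the jsonb; the "timestamp" field name is a guess, and note that an
expression partition key rules out a conventional primary key on the
parent table:

  CREATE TABLE blocks (
      id          bigint,
      blocknumber bigint,
      jsondata    jsonb
  ) PARTITION BY RANGE (((jsondata ->> 'timestamp')::timestamptz));

  CREATE TABLE blocks_2023_10 PARTITION OF blocks
      FOR VALUES FROM ('2023-10-01') TO ('2023-11-01');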

I want to show them there are no performance-related issues with this 
table, so that I can convince them (I believe so). What data can I show 
them (mean execution time from pg_stat_statements, no CPU bottleneck, 
no long-running queries on this table...), and what more can I tell 
them?
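
A starting point for that evidence; pg_stat_statements must be enabled
via shared_preload_libraries, and the filter on the query text is a
hypothetical match on the table name:

  SELECT query,
         calls,
         round(mean_exec_time::numeric, 2)  AS mean_ms,
         round(total_exec_time::numeric, 2) AS total_ms,
         rows
  FROM pg_stat_statements
  WHERE query ILIKE '%blocks%'
  ORDER BY total_exec_time DESC
  LIMIT 10;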
