Thread: Optimal database table optimization method


Optimal database table optimization method

From
Roger Tannous
Date:

Hello,

I have a database table that is growing too big (a few hundred million rows) and needs to be optimized, but before I get into partitioning it, I thought I'd ask for suggestions.

Here is the usage:

  0. Table contains about 10 columns of length about 20 bytes each.

  1. INSERTs are performed at a rate of hundreds of times per second.

  2. SELECT statements are performed based on column 'a' (where a='xxxx') a few times per hour.

  3. DELETE statements are performed based on a DATE column (delete where date older than 1 year), usually once per day.


The key requirement is to speed up INSERT and SELECT statements, and be able to keep one year of history without locking the whole table while deleting.

I would guess that I need two indexes, one on column 'a' and the other on the date field, or is it possible to optimize both?

Will there be a necessary trade-off between SELECT speed and DELETE speed?

Is partitioning the only solution? What are good strategies for partitioning such a table?

I'm using a PostgreSQL 8.4 database.


Best Regards,

Roger Tannous.

Re: Optimal database table optimization method

From
John R Pierce
Date:
Roger Tannous wrote:
>
> Hello,
>
> I have a database table that is growing too big (a few hundred million
> rows) and needs to be optimized, but before I get into partitioning
> it, I thought I'd ask for suggestions.
>
> Here is the usage:
>
>   0. Table contains about 10 columns of length about 20 bytes each.
>   1. INSERTs are performed at a rate of hundreds of times per second.
>   2. SELECT statements are performed based on column 'a'
>      (where a='xxxx') a few times per hour.
>   3. DELETE statements are performed based on a DATE column
>      (delete where date older than 1 year), usually once per day.
>
> The key requirement is to speed up INSERT and SELECT statements, and
> be able to keep one year of history without locking the whole table
> while deleting.
>
> I would guess that I need two indexes, one on column 'a' and the
> other on the date field, or is it possible to optimize both?
>
> Will there be a necessary trade-off between SELECT speed and DELETE
> speed?
>
> Is partitioning the only solution? What are good strategies for
> partitioning such a table?
>
> I'm using a PostgreSQL 8.4 database.
>

we partition similar tables by date, typically by month.  This way you
can simply drop an old month's table after updating the triggers to put
new data into the new month's table.
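On 8.4 this means inheritance-based partitioning (declarative partitioning didn't exist yet). A minimal sketch, with made-up table and column names, since the original post only mentions a column 'a' and a DATE column:

```sql
-- Sketch of month-based partitioning on PostgreSQL 8.4 (inheritance +
-- trigger).  Table/column names here are hypothetical.
CREATE TABLE events (
    a          varchar(20) NOT NULL,
    event_date date        NOT NULL
    -- ... the other ~8 columns ...
);

-- One child table per month, CHECK-constrained so the planner can use
-- constraint exclusion.
CREATE TABLE events_2010_03 (
    CHECK (event_date >= DATE '2010-03-01'
       AND event_date <  DATE '2010-04-01')
) INHERITS (events);

-- Route new rows into the current month's child.
CREATE OR REPLACE FUNCTION events_insert_trigger() RETURNS trigger AS $$
BEGIN
    IF NEW.event_date >= DATE '2010-03-01'
       AND NEW.event_date < DATE '2010-04-01' THEN
        INSERT INTO events_2010_03 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'no partition for date %', NEW.event_date;
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER events_insert
    BEFORE INSERT ON events
    FOR EACH ROW EXECUTE PROCEDURE events_insert_trigger();

-- Dropping a month is then instant and takes no row-level locks:
DROP TABLE events_2009_03;
```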


Re: Optimal database table optimization method

From
Adrian von Bidder
Date:
Heyho!


Disclaimer: I don't have much experience with big databases, so this is
based on my understanding of the theory.

On Thursday 04 March 2010 20.32:46 Roger Tannous wrote:
> I have a database table that is growing too big (a few hundred million
> rows) that needs to be optimized, but before I get into partitioning
> it, I thought I'd ask for suggestions.
>
> Here is the usage:
>
>   0. Table contains about 10 columns of length about 20 bytes each.
>   1. INSERTs are performed at a rate of hundreds of times per second.
>   2. SELECT statements are performed based on column 'a' (where a='xxxx')
>      a few times per hour.
>   3. DELETE statements are performed based on a DATE column (delete
>      where date older than 1 year), usually once per day.
>
> The key requirement is to speed up INSERT and SELECT statements, and
> be able to keep one year of history without locking the whole table
> while deleting.

How are your INSERTs coming in?  In batches, from multiple sources?

> I would guess that I must have two indexes, one for column 'a', and the
> other for the date field. or is it possible to optimize both ?

Sounds reasonable, but see below.

> Will there be a necessary trade-off between speed on select and speed of
> delete?
>
> Is partitioning the only solution ?

How far have you already optimized your setup?  WAL on a separate disk
(perhaps even solid state?), index and data separated?

> What are good strategies for
> partitioning such table?

With partitioning by date, you might be able to drop the index on the date
column (if it's not needed for queries): since you just drop the oldest
partition (and create a new one) each day, the "delete" step doesn't
involve any queries; you just drop the table.

If you can make the INSERT logic smart so that it inserts into the partition
directly (instead of selecting the partition via trigger/rule), INSERT speed
presumably isn't slower than without partitioning.
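For instance, with month-named child tables (hypothetical names, matching an inheritance setup), the application can compute the target partition itself:

```sql
-- Write straight into the current month's child table, skipping the
-- routing trigger/rule on the parent (table names are hypothetical):
INSERT INTO events_2010_03 (a, event_date)
VALUES ('xxxx', CURRENT_DATE);
```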

I have no idea how good pg is with select queries over 365 partitions (1
year).  OTOH: perhaps, with date-based partitioning, you can cache the
result of the select queries and just add the results from the newest data,
so the SELECT would only need to access the recent data?  (Obviously depends
on the nature of these queries...)

cheers
-- vbi

--
Verbing weirds language.
        -- Calvin & Hobbes


Re: Optimal database table optimization method

From
"Dann Corbit"
Date:

From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Roger Tannous
Sent: Thursday, March 04, 2010 11:33 AM
To: pgsql-general@postgresql.org
Subject: [GENERAL] Optimal database table optimization method

 

Hello,

I have a database table that is growing too big (a few hundred million rows) and needs to be optimized, but before I get into partitioning it, I thought I'd ask for suggestions.

Here is the usage:

  0. Table contains about 10 columns of length about 20 bytes each.

  1. INSERTs are performed at a rate of hundreds of times per second.

  2. SELECT statements are performed based on column 'a' (where a='xxxx') a few times per hour.

  3. DELETE statements are performed based on a DATE column (delete where date older than 1 year), usually once per day.

The key requirement is to speed up INSERT and SELECT statements, and be able to keep one year of history without locking the whole table while deleting.

I would guess that I need two indexes, one on column 'a' and the other on the date field, or is it possible to optimize both?

Will there be a necessary trade-off between SELECT speed and DELETE speed?

Is partitioning the only solution? What are good strategies for partitioning such a table?

I'm using a PostgreSQL 8.4 database.

>> 

INSERT statements are slowed down by adding indexes.  Both the SELECT and DELETE statements should speed up by adding the indexes.

I suggest adding the indexes in a test environment to see if the changes are beneficial.

<< 
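A test along these lines could look as follows (names are hypothetical; EXPLAIN ANALYZE shows whether the planner actually uses the new indexes):

```sql
-- Add the two candidate indexes in a test environment
-- (table/column names are hypothetical):
CREATE INDEX events_a_idx    ON events (a);
CREATE INDEX events_date_idx ON events (event_date);

-- Compare plans and timings before/after:
EXPLAIN ANALYZE SELECT * FROM events WHERE a = 'xxxx';
EXPLAIN ANALYZE DELETE FROM events
    WHERE event_date < CURRENT_DATE - INTERVAL '1 year';
```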

Re: Optimal database table optimization method

From
John R Pierce
Date:
> I have no idea how good pg is with select queries over 365 partitions (1
> year).

no need to use a partition per day; I'm pretty sure he won't mind keeping
a few extra days or weeks.  I'd go by month and drop a whole month at
a time, so there are only 12-13 partitions.

and doing this, there's no need for a date index if he's never searching
by date, just an index on the primary key.  yes, the selects will have
to search all 12-13 tables, but SELECT ... WHERE pk=?; is a very fast
search, so doing this on 12 tables is pretty painless.
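Under that scheme, the steady-state operations might look like this (partition names are hypothetical; on 8.4, constraint_exclusion = partition is the default, so date-bounded queries skip irrelevant months):

```sql
-- Monthly rollover: create next month's child with its own index on 'a'
-- (names hypothetical, continuing the inheritance sketch):
CREATE TABLE events_2010_04 (
    CHECK (event_date >= DATE '2010-04-01'
       AND event_date <  DATE '2010-05-01')
) INHERITS (events);
CREATE INDEX events_2010_04_a_idx ON events_2010_04 (a);

-- Dropping the oldest month is instant: no row-by-row DELETE, no long lock.
DROP TABLE events_2009_04;

-- A lookup on 'a' probes each month's small index, cheap per partition:
SELECT * FROM events WHERE a = 'xxxx';
```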