Re: two memory-consuming postgres processes

From: PFC
Subject: Re: two memory-consuming postgres processes
Date:
Msg-id: op.uajp1iotcigqcu@apollo13.peufeu.com
In reply to: Re: two memory-consuming postgres processes  (Alexy Khrabrov <deliverable@gmail.com>)
Responses: Re: two memory-consuming postgres processes  (Matthew Wakeling <matthew@flymine.org>)
List: pgsql-performance
> I created several indices for the primary table, yes.  Sure I can do a
> table for a volatile column, but then I'll have to create a new such
> table for each derived column -- that's why I tried to add a column to
> the existing table.  Yet seeing this is really slow, and I need to do
> many derived analyses like this -- which are later scanned in other
> computations, so should persist -- I indeed see no other way but to
> procreate derived tables with the same key, one column per each...

    OK, so in that case you could do all of your derived column
calculations in one query, like this:

CREATE TABLE derived AS SELECT ... FROM ... (perform all your derived
calculations here)

    or:

BEGIN;    -- important: creating the table inside the transaction avoids writing xlog
CREATE TABLE derived (...);
INSERT INTO derived SELECT ... FROM ... (perform all your derived
calculations here);
COMMIT;
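
    To make that concrete, here is a minimal sketch of both patterns (pick
one); big_table and the derived columns are made-up names, just for
illustration:

-- One-pass version: every derived column is computed in a single scan of big_table.
CREATE TABLE derived AS
SELECT id,
       a + b                           AS sum_ab,    -- first derived column
       a * b                           AS prod_ab,   -- second derived column
       CASE WHEN b <> 0 THEN a / b END AS ratio_ab   -- third derived column
FROM   big_table;

-- Transactional version (alternative to the above): the table is created and
-- filled in the same transaction, so the INSERT can skip the xlog.
BEGIN;
CREATE TABLE derived (id integer, sum_ab numeric, prod_ab numeric, ratio_ab numeric);
INSERT INTO derived
SELECT id, a + b, a * b, CASE WHEN b <> 0 THEN a / b END
FROM   big_table;
COMMIT;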

    Basically, updating the entire table several times just to add a few simple
columns is a bad idea. If you can compute all the data you need in one
query, like the above, it will be much faster, especially if you join the one
large table to several smaller ones, as long as the huge data set
doesn't need to be sorted (check the query plan using EXPLAIN). Try to do
as much as possible in one query so the large dataset is scanned only once.
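
    For example (again with the hypothetical big_table and a made-up
small_table lookup), a quick way to check that no sort of the big table
sneaks into the plan:

EXPLAIN
SELECT b.id, b.a + s.factor AS adjusted   -- s.factor is a made-up column on the small lookup table
FROM   big_table b
JOIN   small_table s ON s.id = b.kind;

    If the plan shows a Sort node over the large table, rethink the query
before running it for real.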

    Note that the above will be faster than updating the entire table, since
it needs to write much less data: it doesn't have to delete the old rows,
and it doesn't have to write the transaction log (if the transaction
rolls back, the table never existed anyway). Also, since your newly created
table doesn't have any indexes yet, there are no indexes to update.
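
    If the derived table does need indexes for the later scans (a sketch, not
something from your original plan), it is cheaper to build them once after
the load than to maintain them during the inserts:

-- one bulk index build after the data is loaded
CREATE INDEX derived_id_idx ON derived (id);
-- refresh planner statistics for the new table
ANALYZE derived;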

    If you really need to update an entire table multiple times, you will
need to:

    - use hardware that can handle disk writes at a decent speed (that isn't
a characteristic of a laptop drive)
    - use MyISAM, yes (but if you need to make complex queries on the data
afterwards, it could suck).

