Re: count(*) performance improvement ideas

Поиск
Список
Период
Сортировка
От Stephen Denne
Тема Re: count(*) performance improvement ideas
Дата
Msg-id F0238EBA67824444BC1CB4700960CB4804EACCFC@dmpeints002.isotach.com
обсуждение исходный текст
Ответ на Re: count(*) performance improvement ideas  ("Pavan Deolasee" <pavan.deolasee@gmail.com>)
Ответы Re: count(*) performance improvement ideas  ("Stephen Denne" <Stephen.Denne@datamail.co.nz>)
Список pgsql-hackers
Pavan Deolasee wrote
> On Thu, Mar 20, 2008 at 3:24 AM, Stephen Denne
> <Stephen.Denne@datamail.co.nz> wrote:
>
> >
> >  Pavan also refers to deferred triggers, which has got me
> thinking about another possible solution:
> >
> >  Instead of inserting a delta row, that will be updated a
> lot of times, create an on commit drop temp table named after
> the txid and the grouping value (which is an integer in my
> case, perhaps hash it if you're grouping by something that
> doesn't easily convert to part of a table name),
> >  create an after insert initially deferred constraint to
> call a function which will move the 'at commit' values of the
> rows in the temp table to the summary table.
> >  The temp table will only have one row inserted, updated
> many times, then on commit the trigger is fired once, and the
> temp table is dropped.
> >
> >  Does anyone think this will or won't work for some reason?
>
>
> I think this should work, although you may need to take some
> extra steps
> to manage the summary table. Also, I think a single temp
> table per transaction
> should suffice. The temp table would have one row per "group
> by" or "where"
> condition on which you want to track the count. The
> corresponding row will
> be updated as and when the corresponding count changes. You would need
> INSERT/DELETE/UPDATE triggers to do that. If there are any
> subtransaction
> aborts, that will be taken care by MVCC.

Thanks for that. I had gone ahead and tried out the idea, and it was working 'ok'.

Using one table per transaction has the benefit of less temp tables (but the same number of triggers waiting to run).
Italso removes the grouping key from the table name. 

I was using a single table per grouping key, with a single updated row in it.
The benefit was simpler queries, and I could create an ON INSERT trigger that would be triggered only once when the
temptable was created, and a 'zero' row was inserted, thereby separating the setup of the trigger from the maintenance
ofthe delta. 

I haven't explored the transactional implications of updating vs inserting delta rows in the summary table at the time
oftransaction commit. The code below updates the summary table, which I think could lead to a large delay or deadlocks
ifthere are other styles of updates on that table (other than on-commit triggers)? 

I also hadn't considered sub-transactions.

Below is a cut-down version of what I tried out.

I was summarizing more than just the number of documents shown below, I was storing a sum, and two maximums of
timestamps(using the 'greatest' function for aggregating each record). These were extra fields in both the summary
tableand the temp tables. 
This is able to be made more generic by changing get_temp_table_name() to take an additional couple of parameters
specifyingthe name of the function to run at commit, and an identifying key (eg 'doc_count' vs 'doc_size_sum'), or
perhapsincluding the delta value too, (which looks like it would simplify the triggers on the tables whose changes we
wishto summarize, except that it doesn't cater for storing greatest or least aggregates.) 

I took a wild guess at a way of finding out whether the temp table already exists:not exists(select tablename from
pg_catalog.pg_tableswhere tablename=temp_delta_txid_group) 
Is there a better/safer way?

Here's part of the code I've got at the moment (edited here to cut it down to the core example, so it may contain
errors):

CREATE TABLE doc_type_summary ( document_type_id integer NOT NULL, documents bigint NOT NULL DEFAULT 0, CONSTRAINT
pk_doc_type_summaryPRIMARY KEY (document_type_id) 
);

CREATE OR REPLACE FUNCTION process_delta() RETURNS TRIGGER AS
$$  BEGIN     EXECUTE 'UPDATE doc_type_summary set documents=doc_type_summary.documents+d.documents from ' ||
TG_ARGV[0]|| ' as d where document_type_id = ' || TG_ARGV[1];     RETURN NULL;  END; 
$$
LANGUAGE 'plpgsql';

CREATE OR REPLACE FUNCTION get_temp_table_name(document_type_id integer)
RETURNS text AS
$$
DECLARE  temp_delta_txid_group text;
BEGIN  temp_delta_txid_group := 'temp_delta_' || txid_current() || '_' || document_type_id;  IF not exists(select
tablenamefrom pg_catalog.pg_tables where tablename=temp_delta_txid_group) THEN     EXECUTE 'CREATE TEMP TABLE ' ||
temp_delta_txid_group|| '(documents bigint NOT NULL DEFAULT 0) ON COMMIT DROP';     EXECUTE 'CREATE CONSTRAINT TRIGGER
'|| temp_delta_txid_group || '_trig AFTER INSERT ON ' || temp_delta_txid_group || ' DEFERRABLE INITIALLY DEFERRED FOR
EACHROW EXECUTE PROCEDURE process_delta ("' || temp_delta_txid_group || '", ' || document_type_id || ')';     EXECUTE
'INSERTINTO ' || temp_delta_txid_group || ' DEFAULT VALUES';  END IF;  RETURN temp_delta_txid_group; 
END;
$$
LANGUAGE 'plpgsql';

CREATE OR REPLACE FUNCTION document_count_trig()
RETURNS TRIGGER AS
$$
DECLARE  temp_delta_txid_group text;
BEGIN  IF TG_OP = 'INSERT' THEN     temp_delta_txid_group := get_temp_table_name(NEW.document_type_id);     EXECUTE
'UPDATE' || temp_delta_txid_group || ' set documents=documents+1';     RETURN NEW;  ELSIF TG_OP = 'DELETE' THEN
temp_delta_txid_group:= get_temp_table_name(OLD.document_type_id);     EXECUTE 'UPDATE ' || temp_delta_txid_group || '
setdocuments=documents-1';     RETURN OLD;  END IF; 
END;
$$
LANGUAGE 'plpgsql';

--
Regards,
Stephen Denne.

Disclaimer:
At the Datamail Group we value team commitment, respect, achievement, customer focus, and courage. This email with any
attachmentsis confidential and may be subject to legal privilege.  If it is not intended for you please advise by reply
immediately,destroy it and do not copy, disclose or use it in any way. 
__________________________________________________________________ This email has been scanned by the DMZGlobal
BusinessQuality             Electronic Messaging Suite. 
Please see http://www.dmzglobal.com/dmzmessaging.htm for details.
__________________________________________________________________




В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bruce Momjian
Дата:
Сообщение: Re: Recovery of Multi-stage WAL actions
Следующее
От: Bruce Momjian
Дата:
Сообщение: Re: xlogdump