Thread: Trigger concurrent execution


Trigger concurrent execution

From:
Blagoj Petrushev
Date:
Hi,

I'm thinking of an extension to trigger functionality like this:

CREATE TRIGGER trigger_name
    AFTER event
    ON table
    CONCURRENTLY EXECUTE PROCEDURE trigger_fc

This would call the trigger after the end of the transaction.

The following is a use case; please tell me if I'm doing it wrong.

I have a big table with big text column article and a nullable
tsvector column fts_article. On each insert or update that changes the
article, I trigger-issue 'NOTIFY article_changed row_id', then, with a
daemon listener, I catch the notification and update fts_article
accordingly with my_fts_fc(article). The reason I don't do this
directly in my trigger is because my_fts_fc is slow for big articles,
fts_article has a gin index, and also, on heavy load, my listener can
do these updates concurrently. Now, with a concurrent execution of
triggers, I can just call my_fts_fc inside the trigger instead of the
notify roundtrip.
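
For concreteness, the current setup amounts to roughly the sketch below.
Only article, fts_article, my_fts_fc and the article_changed channel come
from the description above; the table name articles and its id column are
assumptions made for illustration.

    -- Sketch only; "articles" and "id" are assumed names.
    CREATE OR REPLACE FUNCTION notify_article_changed() RETURNS trigger AS $$
    BEGIN
        -- pg_notify() takes a channel name and a text payload,
        -- here the id of the changed row.
        PERFORM pg_notify('article_changed', NEW.id::text);
        RETURN NULL;  -- the return value is ignored for AFTER ROW triggers
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER article_changed_notify
        AFTER INSERT OR UPDATE OF article ON articles
        FOR EACH ROW
        EXECUTE PROCEDURE notify_article_changed();

The daemon, in a separate session, does LISTEN article_changed and reacts
to each payload with the slow UPDATE that fills fts_article.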

Kind regards,
Blagoj Petrushev



Re: Trigger concurrent execution

From:
David G Johnston
Date:
Blagoj Petrushev wrote
> Hi,
> 
> I'm thinking of an extension to trigger functionality like this:
> 
> CREATE TRIGGER trigger_name
>     AFTER event
>     ON table
>     CONCURRENTLY EXECUTE PROCEDURE trigger_fc
> 
> This would call the trigger after the end of the transaction.
> 
> The following is a use case; please tell me if I'm doing it wrong.
> 
> I have a big table with big text column article and a nullable
> tsvector column fts_article. On each insert or update that changes the
> article, I trigger-issue 'NOTIFY article_changed row_id', then, with a
> daemon listener, I catch the notification and update fts_article
> accordingly with my_fts_fc(article). The reason I don't do this
> directly in my trigger is because my_fts_fc is slow for big articles,
> fts_article has a gin index, and also, on heavy load, my listener can
> do these updates concurrently. Now, with a concurrent execution of
> triggers, I can just call my_fts_fc inside the trigger instead of the
> notify roundtrip.

Conceptually, trigger actions run in-transaction and can cause it to
ROLLBACK, so how would "after the end of the transaction" work?  Since the
easy way would be to have COMMIT block until all the AFTER-event concurrent
triggers have fired, I presume you would want something more like a task
queue for background workers, where, at commit, the function call is placed
in a FIFO queue and the calling session is allowed to move on to other
activity.
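
A rough sketch of that queue-table idea using only existing features; all
object names here (fts_work_queue, articles, id) are illustrative
assumptions rather than anything from the proposal:

    CREATE TABLE fts_work_queue (
        item_id    bigserial PRIMARY KEY,
        article_id bigint NOT NULL
    );

    CREATE OR REPLACE FUNCTION enqueue_fts_work() RETURNS trigger AS $$
    BEGIN
        -- The queued item commits or rolls back together with the
        -- triggering transaction, so nothing is enqueued on ROLLBACK.
        INSERT INTO fts_work_queue (article_id) VALUES (NEW.id);
        RETURN NULL;
    END;
    $$ LANGUAGE plpgsql;

    -- A background worker pops the oldest item roughly like this; concurrent
    -- workers block on the locked row and may have to retry (FOR UPDATE
    -- SKIP LOCKED, where available, avoids that).
    DELETE FROM fts_work_queue
    WHERE item_id = (SELECT item_id
                     FROM fts_work_queue
                     ORDER BY item_id
                     LIMIT 1
                     FOR UPDATE)
    RETURNING article_id;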

It is not clear what you mean by "my listener can do these updates
concurrently". Concurrently with each other, or concurrently with other DML
activity on the table?  I assume you have multiple listeners, since the
potential rate of insertion of documents is likely much greater than the
rate of update/indexing.

Also, it would seem you'd typically want the GIN index to be updated once
the corresponding transaction commits and makes the rest of the data
available.  Or does your use case allow for some delay between the article
being physically in the database and it being available in the index?

David J.







Re: Trigger concurrent execution

From:
Craig Ringer
Date:
On 05/16/2014 08:06 AM, Blagoj Petrushev wrote:
> Hi,
> 
> I'm thinking of an extension to trigger functionality like this:
> 
> CREATE TRIGGER trigger_name
>     AFTER event
>     ON table
>     CONCURRENTLY EXECUTE PROCEDURE trigger_fc
> 
> This would call the trigger after the end of the transaction.

If "after the end of the transaction" is what you mean by
"concurrently", then that's the wrong word to choose.

"AFTER COMMIT" ?

The concept of running a trigger "concurrently" just doesn't make sense
in PostgreSQL, because the backend is single threaded. You wouldn't be
able to run any SQL commands until the trigger finished.

It isn't possible to do anything useful without a transaction, so
PostgreSQL would need to start a transaction for the trigger and commit
the transaction at the end, as if you'd run SELECT my_procedure();.
Because it's outside the scope of the transaction it probably wouldn't
be possible to do FOR EACH ROW with a NEW and OLD var, unless you
stashed them as materialized rows in the queue of pending "AFTER COMMIT"
triggers.
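
With today's machinery, "stashing them as materialized rows" can be
approximated by copying the needed columns of NEW into an ordinary table
from a plain AFTER trigger (the pending_fts and articles names below are
illustrative assumptions):

    CREATE TABLE pending_fts (
        article_id bigint NOT NULL,
        article    text   NOT NULL
    );

    CREATE OR REPLACE FUNCTION stash_article() RETURNS trigger AS $$
    BEGIN
        -- The copy of NEW commits (or rolls back) together with the
        -- triggering transaction, so whatever processes it after commit
        -- needs no access to NEW/OLD or to the original session.
        INSERT INTO pending_fts (article_id, article)
        VALUES (NEW.id, NEW.article);
        RETURN NULL;
    END;
    $$ LANGUAGE plpgsql;

Only the row data survives that way, of course; actually running the
deferred work still has to be driven from somewhere outside the
transaction.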

Finally, because it's after transaction commit, you couldn't easily
guarantee that the trigger would really run. If the backend crashed /
the server was shut down  / etc after the commit but before your trigger
finished, you'd have a committed transaction but the trigger would not
run. To fix that you'd need to somehow make the trigger queue WAL-logged
and run it during replay, which from my rather limited understanding of
this area would be ... "interesting" to do. It'd also mean the trigger
couldn't have any session context.

This isn't easy, if it's practical at all.

> I have a big table with big text column article and a nullable
> tsvector column fts_article. On each insert or update that changes the
> article, I trigger-issue 'NOTIFY article_changed row_id', then, with a
> daemon listener, I catch the notification and update fts_article
> accordingly with my_fts_fc(article). The reason I don't do this
> directly in my trigger is because my_fts_fc is slow for big articles,
> fts_article has a gin index, and also, on heavy load, my listener can
> do these updates concurrently. Now, with a concurrent execution of
> triggers, I can just call my_fts_fc inside the trigger instead of the
> notify roundtrip.

I don't think that really fits.

It seems like you want to run the trigger procedure in the background on
another back-end. That'd be quite cool, but also not trivial to do,
especially if you wanted to guarantee that it happened reliably and in a
crash-safe manner.



--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Trigger concurrent execution

From:
Blagoj Petrushev
Date:
Hi,

Thanks, all, for being patient; apparently I didn't quite understand how
trigger execution normally works.

On 16 May 2014 07:55, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 05/16/2014 08:06 AM, Blagoj Petrushev wrote:
>> Hi,
>>
>> I'm thinking of an extension to trigger functionality like this:
>>
>> CREATE TRIGGER trigger_name
>>     AFTER event
>>     ON table
>>     CONCURRENTLY EXECUTE PROCEDURE trigger_fc
>>
>> This would call the trigger after the end of the transaction.
>
> If "after the end of the transaction" is what you mean by
> "concurrently", then that's the wrong word to choose.
>
> "AFTER COMMIT" ?

You're right; 'concurrently' is the wrong word.

>
> The concept of running a trigger "concurrently" just doesn't make sense
> in PostgreSQL, because the backend is single threaded. You wouldn't be
> able to run any SQL commands until the trigger finished.
>
> It isn't possible to do anything useful without a transaction, so
> PostgreSQL would need to start a transaction for the trigger and commit
> the transaction at the end, as if you'd run SELECT my_procedure();.
> Because it's outside the scope of the transaction it probably wouldn't
> be possible to do FOR EACH ROW with a NEW and OLD var,

Right. Didn't think of this.

> unless you
> stashed them as materialized rows in the queue of pending "AFTER COMMIT"
> triggers.
>
> Finally, because it's after transaction commit, you couldn't easily
> guarantee that the trigger would really run. If the backend crashed /
> the server was shut down  / etc after the commit but before your trigger
> finished, you'd have a committed transaction but the trigger would not
> run. To fix that you'd need to somehow make the trigger queue WAL-logged
> and run it during replay, which from my rather limited understanding of
> this area would be ... "interesting" to do. It'd also mean the trigger
> couldn't have any session context.
>
> This isn't easy, if it's practical at all.
>
>> I have a big table with big text column article and a nullable
>> tsvector column fts_article. On each insert or update that changes the
>> article, I trigger-issue 'NOTIFY article_changed row_id', then, with a
>> daemon listener, I catch the notification and update fts_article
>> accordingly with my_fts_fc(article). The reason I don't do this
>> directly in my trigger is because my_fts_fc is slow for big articles,
>> fts_article has a gin index, and also, on heavy load, my listener can
>> do these updates concurrently. Now, with a concurrent execution of
>> triggers, I can just call my_fts_fc inside the trigger instead of the
>> notify roundtrip.
>
> I don't think that really fits.
>
> It seems like you want to run the trigger procedure in the background on
> another back-end. That'd be quite cool, but also not trivial to do,
> especially if you wanted to guarantee that it happened reliably and in a
> crash-safe manner.
>
>
>
> --
>  Craig Ringer                   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services

I'll also try to reply to David G Johnston's answer here, since I didn't
actually get the email.

"""Conceptually, trigger actions run in-transaction and can cause it to
ROLLBACK; so how would "after the end of the transaction" work?  Since the
easy way is to have COMMIT; block until all the AFTER event concurrent
triggers fire I presume you would want something more like a task queue for
background workers where, at commit, the function call is in place in a FIFO
queue and the calling session is allowed to move onto other activity."""

As I see it, for my problem it would be great if there were a way to put a
function call into a queue for a background worker. I don't know how to do
this, however, and the crash handling and the passing of NEW/OLD vars would
remain problems nonetheless.

"""It is not clear what you mean by "my listener can do these updates
concurrently"? Concurrently with each other or concurrently with other DML
action on table?    I assume you have multiple listeners since the potential
rate of insert of the documents is likely much greater than the rate of
update/indexing."""

I meant concurrently with additional inserts into the table, and also that
my listener can receive new notifications while it is updating a record.
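
In SQL terms, each listener session behaves roughly like this; the articles
table and its id column are again assumed names:

    LISTEN article_changed;

    -- For every notification received, re-run the slow update with the
    -- payload as the row id (shown here as a prepared statement):
    PREPARE refresh_fts(bigint) AS
        UPDATE articles
        SET    fts_article = my_fts_fc(article)
        WHERE  id = $1;

    EXECUTE refresh_fts(42);  -- 42 stands in for a notified row id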

"""Also, it would seem you'd typically want the GIN index to be updated once
the corresponding transaction committed and makes the rest of the data
available.  Or does your use case allow for some delay between the article
being in the database physically and it being available in the index?"""

The latter: the search feature can 'lag' a few seconds behind the content
update.

Thanks,
Blagoj Petrushev