Обсуждение: Backfill bgworker Extension?

Поиск
Список
Период
Сортировка

Backfill bgworker Extension?

От
Jeremy Finzel
Дата:
One of our challenges we have is that our engineers have written frameworks to backfill data in several different DSLs, and every time they adopt a new language, they maybe need to write another one.

To be clear, what I mean is batch updating a large set of data in small pieces so as to avoid things like lock contention and replication lags.  Sometimes these have a driving table that has the source data to update in a destination table based on a key column, but sometimes it is something like setting just a single specific value for a huge table.

I would love instead to have a Postgres extension that uses postgres background workers to accomplish this, especially if it were part of core.  Before I venture into exploring writing something like this as an extension, would this ever be considered something appropriate as an extension in Postgres core?  Would that be appropriate?

Thanks,
Jeremy

Re: Backfill bgworker Extension?

От
Peter Eisentraut
Дата:
On 12/12/17 13:03, Jeremy Finzel wrote:
> To be clear, what I mean is batch updating a large set of data in small
> pieces so as to avoid things like lock contention and replication lags. 
> Sometimes these have a driving table that has the source data to update
> in a destination table based on a key column, but sometimes it is
> something like setting just a single specific value for a huge table.
> 
> I would love instead to have a Postgres extension that uses postgres
> background workers to accomplish this, especially if it were part of
> core.  Before I venture into exploring writing something like this as an
> extension, would this ever be considered something appropriate as an
> extension in Postgres core?  Would that be appropriate?

I don't see what the common ground between different variants of this
use case would be.  Aren't you basically just looking to execute a
use-case-specific stored procedure in the background?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Backfill bgworker Extension?

От
Jeremy Finzel
Дата:
On Tue, Dec 12, 2017 at 2:26 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
On 12/12/17 13:03, Jeremy Finzel wrote:
> To be clear, what I mean is batch updating a large set of data in small
> pieces so as to avoid things like lock contention and replication lags. 
> Sometimes these have a driving table that has the source data to update
> in a destination table based on a key column, but sometimes it is
> something like setting just a single specific value for a huge table.
>
> I would love instead to have a Postgres extension that uses postgres
> background workers to accomplish this, especially if it were part of
> core.  Before I venture into exploring writing something like this as an
> extension, would this ever be considered something appropriate as an
> extension in Postgres core?  Would that be appropriate?

I don't see what the common ground between different variants of this
use case would be.  Aren't you basically just looking to execute a
use-case-specific stored procedure in the background?

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

The common ground is some column in some table needs to be bulk updated. I may not be explaining well, but in our environment we have done hundreds of these using a generic framework to build a backfill. So I’m not sure what you are questioning about the need? We have had to build a worker to accomplish this because it can’t be done as a sql script alone.

I’m not sure what you mean by a stored procedure in the background. Since it would not be a single transaction, it doesn’t fit as a stored procedure at least in Postgres when a function is 1 transaction.

Sorry if I’m misunderstanding.

Thanks,
Jeremy 


Re: Backfill bgworker Extension?

От
Peter Eisentraut
Дата:
On 12/15/17 23:50, Jeremy Finzel wrote:
> The common ground is some column in some table needs to be bulk updated.
> I may not be explaining well, but in our environment we have done
> hundreds of these using a generic framework to build a backfill. So I’m
> not sure what you are questioning about the need? We have had to build a
> worker to accomplish this because it can’t be done as a sql script alone.

I'm trying to identify the independently useful pieces in your use case.
 A background worker to backfill large tables is a very specific use
case.  If instead we had a job/scheduler mechanism and a way to have
server-side scripts that can control transactions, then that might
satisfy your requirements as well (I'm not sure), but it would also
potentially address many other uses.

> I’m not sure what you mean by a stored procedure in the background.
> Since it would not be a single transaction, it doesn’t fit as a stored
> procedure at least in Postgres when a function is 1 transaction.

In progress: https://commitfest.postgresql.org/16/1360/

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Backfill bgworker Extension?

От
Jeremy Finzel
Дата:
On Sat, Dec 16, 2017 at 8:31 AM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
On 12/15/17 23:50, Jeremy Finzel wrote:
> The common ground is some column in some table needs to be bulk updated.
> I may not be explaining well, but in our environment we have done
> hundreds of these using a generic framework to build a backfill. So I’m
> not sure what you are questioning about the need? We have had to build a
> worker to accomplish this because it can’t be done as a sql script alone.

I'm trying to identify the independently useful pieces in your use case.
 A background worker to backfill large tables is a very specific use
case.  If instead we had a job/scheduler mechanism and a way to have
server-side scripts that can control transactions, then that might
satisfy your requirements as well (I'm not sure), but it would also
potentially address many other uses.

I follow you now.  Yes, I think it probably would.  I think it would at least provide a framework on which to build the tool I want.  It would be great to have a "worker-capable" tool inside postgres than always having to write external logic to do things like this.

> I’m not sure what you mean by a stored procedure in the background.
> Since it would not be a single transaction, it doesn’t fit as a stored
> procedure at least in Postgres when a function is 1 transaction.

In progress: https://commitfest.postgresql.org/16/1360/

Looking forward to this.  I think this will help.  A stored procedure with subtransactions could have the logic for the backfill in it, but would still need an external worker that could retry it in case of failure especially when things like a server restart happens.

Thanks,
Jeremy