Discussion: Commit every N rows in PL/pgsql


Commit every N rows in PL/pgsql

From:
Len Walter
Date:
Hi,

I need to populate a new column in a Postgres 8.3 table. The SQL would be something like "update t set col_c = col_a + col_b". Unfortunately, this table has 110 million rows, so running that query runs out of memory.
In Oracle, I'd turn auto-commit off and write a pl/sql procedure that keeps a counter and commits every 10000 rows (pseudocode):

define cursor curs as select col_a from t
while fetch_from_cursor(curs) into a
     update t set col_c = col_a + col_b where col_a = a
     i++
     if i > 10000
         commit; i=0;
     end if;
end while;
commit;

PL/pgsql doesn't allow that because it doesn't support nested transactions. 
Is there an equivalent Postgres way of doing this?

cheers,
Len

Re: Commit every N rows in PL/pgsql

From:
Jaime Casanova
Date:
On Wed, May 26, 2010 at 1:27 AM, Len Walter <len.walter@gmail.com> wrote:
>
> PL/pgsql doesn't allow that because it doesn't support nested transactions.
> Is there an equivalent Postgres way of doing this?

what about this?

create function f() returns void as $$
declare
    r record;
begin
   for r in select col_a from t loop
       update t set col_c = col_a + col_b where col_a = r.col_a;
   end loop;
   return;
end;
$$ language plpgsql;
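
Note that the function above still runs in a single transaction, so it avoids one huge statement but does not actually commit per batch. For later readers: in PostgreSQL 11 and above (not 8.3), a procedure can commit inside the loop, which is the direct analogue of the Oracle approach. A minimal sketch, assuming the same table (the procedure name is made up):

create procedure f_batched() language plpgsql as $$
declare
    r record;
    i integer := 0;
begin
    for r in select col_a from t loop
        update t set col_c = col_a + col_b where col_a = r.col_a;
        i := i + 1;
        if i >= 10000 then
            commit;  -- allowed in procedures since PostgreSQL 11
            i := 0;
        end if;
    end loop;
end;
$$;

-- run it outside an explicit transaction block:
-- CALL f_batched();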

--
Jaime Casanova         www.2ndQuadrant.com
PostgreSQL support and training

Re: Commit every N rows in PL/pgsql

From:
Pavel Stehule
Date:
Hello

it is unnecessary in PostgreSQL - it doesn't matter whether you commit one
or a billion updated rows. PostgreSQL has a different implementation of
transactions, so some of Oracle's issues don't exist here.

Regards
Pavel Stehule

2010/5/26 Len Walter <len.walter@gmail.com>:
> Hi,
> I need to populate a new column in a Postgres 8.3 table. The SQL would be
> something like "update t set col_c = col_a + col_b". Unfortunately, this
> table has 110 million rows, so running that query runs out of memory.
> In Oracle, I'd turn auto-commit off and write a pl/sql procedure that keeps
> a counter and commits every 10000 rows (pseudocode):
> define cursor curs as select col_a from t
> while fetch_from_cursor(curs) into a
>      update t set col_c = col_a + col_b where col_a = a
>      i++
>      if i > 10000
>          commit; i=0;
>      end if;
> end while;
> commit;
> PL/pgsql doesn't allow that because it doesn't support nested transactions.
> Is there an equivalent Postgres way of doing this?
> cheers,
> Len
> --
> len.walter@gmail.com skype:lenwalter  msn:len.walter@gmail.com
>

Re: Commit every N rows in PL/pgsql

From:
Alban Hertroys
Date:
On 26 May 2010, at 8:27, Len Walter wrote:

> Hi,
>
> I need to populate a new column in a Postgres 8.3 table. The SQL would be something like "update t set col_c = col_a + col_b". Unfortunately, this table has 110 million rows, so running that query runs out of memory.

That's unusual, what is the error you get?

Your table will grow (on disk) to twice the size it had previously, as new rows will be created for the transaction you're running the update from, but it will shrink again with vacuuming and usage. So you may run out of disk space, but /never/ out of memory. If you do then you probably have configured Postgres to use more memory than you have.
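
For completeness: a plain VACUUM only makes the dead row versions reusable for future writes; actually returning the space to the operating system takes VACUUM FULL (or CLUSTER), at the cost of an exclusive lock. In short:

vacuum t;       -- dead space becomes reusable; the file stays roughly the same size
vacuum full t;  -- compacts the table and gives space back, but locks it exclusively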

> In Oracle, I'd turn auto-commit off and write a pl/sql procedure that keeps a counter and commits every 10000 rows (pseudocode):
>
> define cursor curs as select col_a from t
> while fetch_from_cursor(curs) into a
>      update t set col_c = col_a + col_b where col_a = a
>      i++
>      if i > 10000
>          commit; i=0;
>      end if;
> end while;
> commit;
>
> PL/pgsql doesn't allow that because it doesn't support nested transactions.
> Is there an equivalent Postgres way of doing this?

Nested transactions wouldn't solve the problem, as the rows you "commit" here still aren't allowed to be visible to other transactions and so both versions of the rows need to be kept around until the outer transaction commits. It's not going to save you any space.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.



Re: Commit every N rows in PL/pgsql

From:
hubert depesz lubaczewski
Date:
On Wed, May 26, 2010 at 04:27:22PM +1000, Len Walter wrote:
> Hi,
>
> I need to populate a new column in a Postgres 8.3 table. The SQL would be
> something like "update t set col_c = col_a + col_b". Unfortunately, this
> table has 110 million rows, so running that query runs out of memory.
> In Oracle, I'd turn auto-commit off and write a pl/sql procedure that keeps
> a counter and commits every 10000 rows (pseudocode):
>
> define cursor curs as select col_a from t
> while fetch_from_cursor(curs) into a
>      update t set col_c = col_a + col_b where col_a = a
>      i++
>      if i > 10000
>          commit; i=0;
>      end if;
> end while;
> commit;

you can't do it easily with plpgsql because plpgsql cannot influence
transactions.

what you can do is use some client (like psql) and have it simply
issue a lot of queries.

for example, let's assume your table t has a column id, which is the primary
key and contains values from 1 to 100000.

now you can:
perl -e 'for ($i=1; $i<100000; $i+=1000) {printf "update t set col_c = col_a + col_b where id between %u and %u;\n", $i, $i+999}' | psql -U ... -d ...
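
that pipeline just feeds psql a stream of statements like:

update t set col_c = col_a + col_b where id between 1 and 1000;
update t set col_c = col_a + col_b where id between 1001 and 2000;
...

and since psql runs in autocommit mode by default, each update commits as its
own transaction - exactly the commit-every-N behaviour from the original
pseudocode.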

Best regards,

depesz

--
Linkedin: http://www.linkedin.com/in/depesz  /  blog: http://www.depesz.com/
jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007

Re: Commit every N rows in PL/pgsql

From:
Len Walter
Date:

> I need to populate a new column in a Postgres 8.3 table. The SQL would be something like "update t set col_c = col_a + col_b". Unfortunately, this table has 110 million rows, so running that query runs out of memory.

That's unusual, what is the error you get?
 
Here it is:
TopMemoryContext: 57608 total in 7 blocks; 4072 free (6 chunks); 53536 used
  RI compare cache: 8192 total in 1 blocks; 1800 free (0 chunks); 6392 used
  RI query cache: 24576 total in 2 blocks; 14136 free (5 chunks); 10440 used
  TopTransactionContext: 57344 total in 3 blocks; 10088 free (8 chunks); 47256 used
    AfterTriggerEvents: 2642403328 total in 327 blocks; 10176 free (319 chunks); 2642393152 used
    Combo CIDs: 8192 total in 1 blocks; 4936 free (0 chunks); 3256 used
  Operator class cache: 8192 total in 1 blocks; 3848 free (0 chunks); 4344 used
  Operator lookup cache: 24576 total in 2 blocks; 14072 free (6 chunks); 10504 used
  MessageContext: 32768 total in 3 blocks; 11688 free (5 chunks); 21080 used
  smgr relation table: 8192 total in 1 blocks; 1776 free (0 chunks); 6416 used
  TransactionAbortContext: 32768 total in 1 blocks; 32752 free (0 chunks); 16 used
  Portal hash: 8192 total in 1 blocks; 3912 free (0 chunks); 4280 used
  PortalMemory: 8192 total in 1 blocks; 8040 free (0 chunks); 152 used
    PortalHeapMemory: 1024 total in 1 blocks; 896 free (0 chunks); 128 used
      ExecutorState: 57344 total in 3 blocks; 24384 free (21 chunks); 32960 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 8192 total in 1 blocks; 8128 free (0 chunks); 64 used
  Relcache by OID: 8192 total in 1 blocks; 2856 free (0 chunks); 5336 used
  CacheMemoryContext: 2283848 total in 25 blocks; 986328 free (93 chunks); 1297520 used
    t_idx_3: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    t_idx_2: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    t_idx_1: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    t_idx_0: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    t_fromuid: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    t_exportid: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    t_callid_hashtext: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    t_pkey: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    CachedPlan: 3072 total in 2 blocks; 1512 free (0 chunks); 1560 used
    CachedPlanSource: 3072 total in 2 blocks; 1672 free (1 chunks); 1400 used
    SPI Plan: 1024 total in 1 blocks; 912 free (0 chunks); 112 used
    company_pkey: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    CachedPlan: 3072 total in 2 blocks; 1512 free (0 chunks); 1560 used
    CachedPlanSource: 3072 total in 2 blocks; 1672 free (1 chunks); 1400 used
    SPI Plan: 1024 total in 1 blocks; 912 free (0 chunks); 112 used
    timezone_code_key: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    timezone_pkey: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_trigger_tgconstrname_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_trigger_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    timezoneregion_pkey: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_toast_113593282_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
    pg_constraint_contypid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_constraint_conrelid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_constraint_conname_nsp_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_toast_113593269_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
    pg_attrdef_adrelid_adnum_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_attrdef_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_shdepend_depender_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_shdepend_reference_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_depend_depender_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_depend_reference_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_roles: 7168 total in 3 blocks; 2824 free (0 chunks); 4344 used
    globalconfig_pkey: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    schemaversion_pkey: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_index_indrelid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_ts_dict_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_aggregate_fnoid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_language_name_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_statistic_relid_att_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_ts_dict_dictname_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
    pg_namespace_nspname_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_opfamily_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_opclass_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_ts_parser_prsname_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
    pg_amop_fam_strat_index: 1024 total in 1 blocks; 88 free (0 chunks); 936 used
    pg_opclass_am_name_nsp_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_trigger_tgrelid_tgname_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_cast_source_target_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_auth_members_role_member_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
    pg_attribute_relid_attnum_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_ts_config_cfgname_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
    pg_authid_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_ts_config_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_conversion_default_index: 1024 total in 1 blocks; 128 free (0 chunks); 896 used
    pg_language_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_enum_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_proc_proname_args_nsp_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_ts_parser_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_database_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_conversion_name_nsp_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
    pg_class_relname_nsp_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_attribute_relid_attnam_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_class_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_amproc_fam_proc_index: 1024 total in 1 blocks; 88 free (0 chunks); 936 used
    pg_operator_oprname_l_r_n_index: 1024 total in 1 blocks; 88 free (0 chunks); 936 used
    pg_index_indexrelid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_type_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_rewrite_rel_rulename_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_authid_rolname_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_auth_members_member_role_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
    pg_enum_typid_label_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
    pg_constraint_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_conversion_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_ts_template_tmplname_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
    pg_ts_config_map_index: 1024 total in 1 blocks; 192 free (0 chunks); 832 used
    pg_namespace_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_type_typname_nsp_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_operator_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_amop_opr_fam_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_proc_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_opfamily_am_name_nsp_index: 1024 total in 1 blocks; 192 free (0 chunks); 832 used
    pg_ts_template_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
  MdSmgr: 8192 total in 1 blocks; 5872 free (0 chunks); 2320 used
  LOCALLOCK hash: 8192 total in 1 blocks; 1856 free (0 chunks); 6336 used
  Timezones: 48616 total in 2 blocks; 5968 free (0 chunks); 42648 used
  ErrorContext: 8192 total in 1 blocks; 8176 free (4 chunks); 16 used
2010-05-20 22:53:40 EST ERROR:  out of memory
2010-05-20 22:53:40 EST DETAIL:  Failed on request of size 32.
2010-05-20 22:53:40 EST STATEMENT:  update t
     set col_c = col_a + col_c;
 
Your table will grow (on disk) to twice the size it had previously, as new rows will be created for the transaction you're running the update from, but it will shrink again with vacuuming and usage. So you may run out of disk space, but /never/ out of memory. If you do then you probably have configured Postgres to use more memory than you have.
 
If it does grow to double on disk, that would be a problem. The table is 40GB and there's 40GB free on the disk... From the error, I thought it was a memory problem though.
 
Nested transactions wouldn't solve the problem, as the rows you "commit" here still aren't allowed to be visible to other transactions and so both versions of the rows need to be kept around until the outer transaction commits. It's not going to save you any space.
 
I think that Oracle stored procedures aren't wrapped in an implicit transaction the way they are in Postgres. We ended up solving the problem outside Postgres, but it's still an interesting problem :-)


Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.



--
len.walter@gmail.com skype:lenwalter  msn:len.walter@gmail.com

Re: Commit every N rows in PL/pgsql

From:
Tom Lane
Date:
Len Walter <len.walter@gmail.com> writes:
>>> I need to populate a new column in a Postgres 8.3 table. The SQL would be
>>> something like "update t set col_c = col_a + col_b". Unfortunately, this
>>> table has 110 million rows, so running that query runs out of memory.
>>
>> That's unusual, what is the error you get?

> Here it is:
> ...
>     AfterTriggerEvents: 2642403328 total in 327 blocks; 10176 free (319 chunks); 2642393152 used

And there's the problem.  Evidently you have an AFTER trigger on the
table, and the queued events for that trigger are overrunning memory.

Figuring out what to do about that would depend on what the trigger's
for, but usually the best bet is to drop the trigger and instead do
whatever it's doing in some bulk fashion instead of retail.
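
A minimal sketch of that approach (shown generically; whether it applies depends on what the trigger does):

begin;
alter table t disable trigger user;   -- suppresses user-level triggers only; FK triggers still fire
update t set col_c = col_a + col_b;
-- ...redo whatever bookkeeping the trigger maintains, in one set-based statement...
alter table t enable trigger user;
commit;

If the queued events come from a foreign key rather than a user trigger, dropping the constraint before the update and re-adding it afterwards achieves the same effect, since ADD CONSTRAINT re-validates the whole table in one pass instead of row by row.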

(We have a TODO item to allow the trigger queue to spill to disk instead
of always being kept in memory; but it's fairly low on the priority
list, because quite frankly once the event list gets this large, you'd
not want to wait around for all the triggers to execute anyway ...)

            regards, tom lane

Re: Commit every N rows in PL/pgsql

From:
Chris Browne
Date:
len.walter@gmail.com (Len Walter) writes:
> I need to populate a new column in a Postgres 8.3 table. The SQL would be something like "update t set col_c = col_a + col_b". Unfortunately, this table has 110 million rows, so running that query runs out of memory.

Unnecessary.  On Oracle, the typical scenario is "ORA-1562 FAILED TO
EXTEND ROLLBACK SEGMENT."

PostgreSQL doesn't have a rollback segment, so there's nothing to run
out of here.  Where Oracle would tend to encourage you to keep your
transactions rather small, PostgreSQL doesn't require you to care about
that.

Big transactions, on PostgreSQL, are really no big deal.
--
select 'cbbrowne' || '@' || 'cbbrowne.com';
http://cbbrowne.com/info/internet.html
"MS  apparently now  has a  team dedicated  to tracking  problems with
Linux  and publicizing them.   I guess  eventually they'll  figure out
this back fires... ;)" -- William Burrow <aa126@DELETE.fan.nb.ca>

Re: Commit every N rows in PL/pgsql

From:
Chris Browne
Date:
Chris Browne <cbbrowne@acm.org> writes:
> len.walter@gmail.com (Len Walter) writes:
>> I need to populate a new column in a Postgres 8.3 table. The SQL would be something like "update t set col_c = col_a + col_b". Unfortunately, this table has 110 million rows, so running that query runs out of memory.
>
> Unnecessary.  On Oracle, the typical scenario is "ORA-1562 FAILED TO
> EXTEND ROLLBACK SEGMENT."
>
> PostgreSQL doesn't have a rollback segment, so there's nothing to run
> out of here.  Where Oracle would tend to encourage you to keep your
> transactions rather small, PostgreSQL doesn't require you to care about
> that.
>
> Big transactions, on PostgreSQL, are really no big deal.

Unless, of course, there's a trigger on that table.  In which case
it's pretty likely that you'd want to suppress the trigger...
--
select 'cbbrowne' || '@' || 'cbbrowne.com';
http://cbbrowne.com/info/internet.html
"MS  apparently now  has a  team dedicated  to tracking  problems with
Linux  and publicizing them.   I guess  eventually they'll  figure out
this back fires... ;)" -- William Burrow <aa126@DELETE.fan.nb.ca>

Re: Commit every N rows in PL/pgsql

From:
Len Walter
Date:

> ...
>     AfterTriggerEvents: 2642403328 total in 327 blocks; 10176 free (319 chunks); 2642393152 used

And there's the problem.  Evidently you have an AFTER trigger on the
table, and the queued events for that trigger are overrunning memory.

That's interesting - I don't know of any triggers on this table, but it does have a lot of foreign key constraints. Would they fall into the same category?
 
--
len.walter@gmail.com skype:lenwalter  msn:len.walter@gmail.com

Re: Commit every N rows in PL/pgsql

From:
Howard Rogers
Date:
On 06/03/2010 08:26 AM, Chris Browne wrote:
> len.walter@gmail.com (Len Walter) writes:
>> I need to populate a new column in a Postgres 8.3 table. The SQL would be something like "update t set col_c = col_a + col_b". Unfortunately, this table has 110 million rows, so running that query runs out of memory.
>
> Unnecessary.  On Oracle, the typical scenario is "ORA-1562 FAILED TO
> EXTEND ROLLBACK SEGMENT."
>
> PostgreSQL doesn't have a rollback segment, so there's nothing to run
> out of here.  Where Oracle would tend to encourage you to keep your
> transactions rather small, PostgreSQL doesn't require you to care about
> that.
>
> Big transactions, on PostgreSQL, are really no big deal.

Oracle would most certainly NOT tend to encourage you to keep your
transactions rather small! As in any other relational database, they
would -I suggest- encourage you to commit when it is "transactionally
appropriate".

Do you care if your load fails part-way through and the rows inserted up
to that point remain behind? If not, commit frequently. But if it has to
be 'all 110 million or none', then you have one large transaction on
your hands, and you'll have to provision resources for that -no matter
which database you're using.

Oracle may run out of rollback segment space (for the past 7 years,
they're called undo segments, but no real matter). But other databases
will run out of transaction log space, or have a problem handling
multi-version concurrency control issues for an extended length of time.
No matter what a particular RDBMS may call it, all would have resource
issues to deal with when coping with a long-running, multi-million row
transaction.