Discussion: Commit every N rows in PL/pgsql


Commit every N rows in PL/pgsql

From:
Len Walter
Date:
Hi,

I need to populate a new column in a Postgres 8.3 table. The SQL would be something like "update t set col_c = col_a + col_b". Unfortunately, this table has 110 million rows, so running that query runs out of memory.
In Oracle, I'd turn auto-commit off and write a pl/sql procedure that keeps a counter and commits every 10000 rows (pseudocode):

define cursor curs as select col_a from t
while fetch_from_cursor(curs) into a
     update t set col_c = col_a + col_b where col_a = a
     i++
     if i > 10000
         commit; i=0;
     end if;
end while;
commit;

PL/pgsql doesn't allow that because it doesn't support nested transactions. 
Is there an equivalent Postgres way of doing this?

cheers,
Len

Re: Commit every N rows in PL/pgsql

From:
Jaime Casanova
Date:
On Wed, May 26, 2010 at 1:27 AM, Len Walter <len.walter@gmail.com> wrote:
>
> PL/pgsql doesn't allow that because it doesn't support nested transactions.
> Is there an equivalent Postgres way of doing this?

what about this?

create function f() returns void as $$
declare
    r record;
begin
   for r in select col_a from t loop
       update t set col_c = col_a + col_b where col_a = r.col_a;
   end loop;
   return;
end;
$$ language plpgsql;
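
Note that the function above still runs in a single transaction, so it avoids one huge statement but does not actually commit per batch. For later readers: in PostgreSQL 11 and above (not 8.3), a procedure can commit inside the loop, which is the direct analogue of the Oracle approach. A minimal sketch, assuming the same table (the procedure name is made up):

create procedure f_batched() language plpgsql as $$
declare
    r record;
    i integer := 0;
begin
    for r in select col_a from t loop
        update t set col_c = col_a + col_b where col_a = r.col_a;
        i := i + 1;
        if i >= 10000 then
            commit;  -- allowed in procedures since PostgreSQL 11
            i := 0;
        end if;
    end loop;
end;
$$;

-- run it outside an explicit transaction block:
-- CALL f_batched();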

--
Jaime Casanova         www.2ndQuadrant.com
PostgreSQL support and training

Re: Commit every N rows in PL/pgsql

From:
Pavel Stehule
Date:
Hello

it is unnecessary in PostgreSQL - it doesn't matter whether you commit one
or a billion updated rows. PostgreSQL has a different implementation of
transactions, so some of Oracle's issues don't exist here.

Regards
Pavel Stehule

2010/5/26 Len Walter <len.walter@gmail.com>:
> Hi,
> I need to populate a new column in a Postgres 8.3 table. The SQL would be
> something like "update t set col_c = col_a + col_b". Unfortunately, this
> table has 110 million rows, so running that query runs out of memory.
> In Oracle, I'd turn auto-commit off and write a pl/sql procedure that keeps
> a counter and commits every 10000 rows (pseudocode):
> define cursor curs as select col_a from t
> while fetch_from_cursor(curs) into a
>      update t set col_c = col_a + col_b where col_a = a
>      i++
>      if i > 10000
>          commit; i=0;
>      end if;
> end while;
> commit;
> PL/pgsql doesn't allow that because it doesn't support nested transactions.
> Is there an equivalent Postgres way of doing this?
> cheers,
> Len
> --
> len.walter@gmail.com skype:lenwalter  msn:len.walter@gmail.com
>

Re: Commit every N rows in PL/pgsql

From:
Alban Hertroys
Date:
On 26 May 2010, at 8:27, Len Walter wrote:

> Hi,
>
> I need to populate a new column in a Postgres 8.3 table. The SQL would be something like "update t set col_c = col_a + col_b". Unfortunately, this table has 110 million rows, so running that query runs out of memory.

That's unusual, what is the error you get?

Your table will grow (on disk) to twice the size it had previously, as new rows will be created for the transaction you're running the update from, but it will shrink again with vacuuming and usage. So you may run out of disk space, but /never/ out of memory. If you do then you probably have configured Postgres to use more memory than you have.
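
For completeness: a plain VACUUM only makes the dead row versions reusable for future writes; actually returning the space to the operating system takes VACUUM FULL (or CLUSTER), at the cost of an exclusive lock. In short:

vacuum t;       -- dead space becomes reusable; the file stays roughly the same size
vacuum full t;  -- compacts the table and gives space back, but locks it exclusively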

> In Oracle, I'd turn auto-commit off and write a pl/sql procedure that keeps a counter and commits every 10000 rows (pseudocode):
>
> define cursor curs as select col_a from t
> while fetch_from_cursor(curs) into a
>      update t set col_c = col_a + col_b where col_a = a
>      i++
>      if i > 10000
>          commit; i=0;
>      end if;
> end while;
> commit;
>
> PL/pgsql doesn't allow that because it doesn't support nested transactions.
> Is there an equivalent Postgres way of doing this?

Nested transactions wouldn't solve the problem, as the rows you "commit" here still aren't allowed to be visible to other transactions and so both versions of the rows need to be kept around until the outer transaction commits. It's not going to save you any space.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.



Re: Commit every N rows in PL/pgsql

From:
hubert depesz lubaczewski
Date:
On Wed, May 26, 2010 at 04:27:22PM +1000, Len Walter wrote:
> Hi,
>
> I need to populate a new column in a Postgres 8.3 table. The SQL would be
> something like "update t set col_c = col_a + col_b". Unfortunately, this
> table has 110 million rows, so running that query runs out of memory.
> In Oracle, I'd turn auto-commit off and write a pl/sql procedure that keeps
> a counter and commits every 10000 rows (pseudocode):
>
> define cursor curs as select col_a from t
> while fetch_from_cursor(curs) into a
>      update t set col_c = col_a + col_b where col_a = a
>      i++
>      if i > 10000
>          commit; i=0;
>      end if;
> end while;
> commit;

you can't do it easily with plpgsql because plpgsql cannot influence
transactions.

what you can do is use some client (like psql) and have it simply
issue a lot of queries.

for example, let's assume your table t has a column id, which is the primary
key and contains values from 1 to 100000.

now you can:
perl -e 'for ($i=1; $i<100000; $i+=1000) {printf "update t set col_c = col_a + col_b where id between %u and %u;\n", $i, $i+999}' | psql -U ... -d ...
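
that pipeline just feeds psql a stream of statements like:

update t set col_c = col_a + col_b where id between 1 and 1000;
update t set col_c = col_a + col_b where id between 1001 and 2000;
...

and since psql runs in autocommit mode by default, each update commits as its
own transaction - exactly the commit-every-N behaviour from the original
pseudocode.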

Best regards,

depesz

--
Linkedin: http://www.linkedin.com/in/depesz  /  blog: http://www.depesz.com/
jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007

Re: Commit every N rows in PL/pgsql

From:
Len Walter
Date:

> I need to populate a new column in a Postgres 8.3 table. The SQL would be something like "update t set col_c = col_a + col_b". Unfortunately, this table has 110 million rows, so running that query runs out of memory.

That's unusual, what is the error you get?
 
Here it is:
TopMemoryContext: 57608 total in 7 blocks; 4072 free (6 chunks); 53536 used
  RI compare cache: 8192 total in 1 blocks; 1800 free (0 chunks); 6392 used
  RI query cache: 24576 total in 2 blocks; 14136 free (5 chunks); 10440 used
  TopTransactionContext: 57344 total in 3 blocks; 10088 free (8 chunks); 47256 used
    AfterTriggerEvents: 2642403328 total in 327 blocks; 10176 free (319 chunks); 2642393152 used
    Combo CIDs: 8192 total in 1 blocks; 4936 free (0 chunks); 3256 used
  Operator class cache: 8192 total in 1 blocks; 3848 free (0 chunks); 4344 used
  Operator lookup cache: 24576 total in 2 blocks; 14072 free (6 chunks); 10504 used
  MessageContext: 32768 total in 3 blocks; 11688 free (5 chunks); 21080 used
  smgr relation table: 8192 total in 1 blocks; 1776 free (0 chunks); 6416 used
  TransactionAbortContext: 32768 total in 1 blocks; 32752 free (0 chunks); 16 used
  Portal hash: 8192 total in 1 blocks; 3912 free (0 chunks); 4280 used
  PortalMemory: 8192 total in 1 blocks; 8040 free (0 chunks); 152 used
    PortalHeapMemory: 1024 total in 1 blocks; 896 free (0 chunks); 128 used
      ExecutorState: 57344 total in 3 blocks; 24384 free (21 chunks); 32960 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 8192 total in 1 blocks; 8128 free (0 chunks); 64 used
  Relcache by OID: 8192 total in 1 blocks; 2856 free (0 chunks); 5336 used
  CacheMemoryContext: 2283848 total in 25 blocks; 986328 free (93 chunks); 1297520 used
    t_idx_3: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    t_idx_2: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    t_idx_1: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    t_idx_0: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    t_fromuid: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    t_exportid: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    t_callid_hashtext: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    t_pkey: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    CachedPlan: 3072 total in 2 blocks; 1512 free (0 chunks); 1560 used
    CachedPlanSource: 3072 total in 2 blocks; 1672 free (1 chunks); 1400 used
    SPI Plan: 1024 total in 1 blocks; 912 free (0 chunks); 112 used
    company_pkey: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    CachedPlan: 3072 total in 2 blocks; 1512 free (0 chunks); 1560 used
    CachedPlanSource: 3072 total in 2 blocks; 1672 free (1 chunks); 1400 used
    SPI Plan: 1024 total in 1 blocks; 912 free (0 chunks); 112 used
    timezone_code_key: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    timezone_pkey: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_trigger_tgconstrname_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_trigger_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    timezoneregion_pkey: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_toast_113593282_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
    pg_constraint_contypid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_constraint_conrelid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_constraint_conname_nsp_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_toast_113593269_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
    pg_attrdef_adrelid_adnum_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_attrdef_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_shdepend_depender_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_shdepend_reference_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_depend_depender_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_depend_reference_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_roles: 7168 total in 3 blocks; 2824 free (0 chunks); 4344 used
    globalconfig_pkey: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    schemaversion_pkey: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_index_indrelid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_ts_dict_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_aggregate_fnoid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_language_name_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_statistic_relid_att_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_ts_dict_dictname_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
    pg_namespace_nspname_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_opfamily_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_opclass_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_ts_parser_prsname_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
    pg_amop_fam_strat_index: 1024 total in 1 blocks; 88 free (0 chunks); 936 used
    pg_opclass_am_name_nsp_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_trigger_tgrelid_tgname_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_cast_source_target_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_auth_members_role_member_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
    pg_attribute_relid_attnum_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_ts_config_cfgname_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
    pg_authid_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_ts_config_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_conversion_default_index: 1024 total in 1 blocks; 128 free (0 chunks); 896 used
    pg_language_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_enum_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_proc_proname_args_nsp_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_ts_parser_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_database_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_conversion_name_nsp_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
    pg_class_relname_nsp_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_attribute_relid_attnam_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_class_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_amproc_fam_proc_index: 1024 total in 1 blocks; 88 free (0 chunks); 936 used
    pg_operator_oprname_l_r_n_index: 1024 total in 1 blocks; 88 free (0 chunks); 936 used
    pg_index_indexrelid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_type_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_rewrite_rel_rulename_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_authid_rolname_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_auth_members_member_role_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
    pg_enum_typid_label_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
    pg_constraint_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_conversion_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_ts_template_tmplname_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
    pg_ts_config_map_index: 1024 total in 1 blocks; 192 free (0 chunks); 832 used
    pg_namespace_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_type_typname_nsp_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_operator_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_amop_opr_fam_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_proc_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_opfamily_am_name_nsp_index: 1024 total in 1 blocks; 192 free (0 chunks); 832 used
    pg_ts_template_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
  MdSmgr: 8192 total in 1 blocks; 5872 free (0 chunks); 2320 used
  LOCALLOCK hash: 8192 total in 1 blocks; 1856 free (0 chunks); 6336 used
  Timezones: 48616 total in 2 blocks; 5968 free (0 chunks); 42648 used
  ErrorContext: 8192 total in 1 blocks; 8176 free (4 chunks); 16 used
2010-05-20 22:53:40 EST ERROR:  out of memory
2010-05-20 22:53:40 EST DETAIL:  Failed on request of size 32.
2010-05-20 22:53:40 EST STATEMENT:  update t
     set col_c = col_a + col_c;
 
Your table will grow (on disk) to twice the size it had previously, as new rows will be created for the transaction you're running the update from, but it will shrink again with vacuuming and usage. So you may run out of disk space, but /never/ out of memory. If you do then you probably have configured Postgres to use more memory than you have.
 
If it does grow to double on disk, that would be a problem. The table is 40GB and there's 40GB free on the disk... From the error, I thought it was a memory problem though.
 
Nested transactions wouldn't solve the problem, as the rows you "commit" here still aren't allowed to be visible to other transactions and so both versions of the rows need to be kept around until the outer transaction commits. It's not going to save you any space.
 
I think that Oracle stored procedures aren't wrapped in an implicit transaction the way they are in Postgres. We ended up solving the problem outside Postgres, but it's still an interesting problem :-)


Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.



--
len.walter@gmail.com skype:lenwalter  msn:len.walter@gmail.com

Re: Commit every N rows in PL/pgsql

From:
Tom Lane
Date:
Len Walter <len.walter@gmail.com> writes:
>>> I need to populate a new column in a Postgres 8.3 table. The SQL would be
>>> something like "update t set col_c = col_a + col_b". Unfortunately, this
>>> table has 110 million rows, so running that query runs out of memory.
>>
>> That's unusual, what is the error you get?

> Here it is:
> ...
>     AfterTriggerEvents: 2642403328 total in 327 blocks; 10176 free (319 chunks); 2642393152 used

And there's the problem.  Evidently you have an AFTER trigger on the
table, and the queued events for that trigger are overrunning memory.

Figuring out what to do about that would depend on what the trigger's
for, but usually the best bet is to drop the trigger and instead do
whatever it's doing in some bulk fashion instead of retail.
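
A minimal sketch of that approach (shown generically; whether it applies depends on what the trigger does):

begin;
alter table t disable trigger user;   -- suppresses user-level triggers only; FK triggers still fire
update t set col_c = col_a + col_b;
-- ...redo whatever bookkeeping the trigger maintains, in one set-based statement...
alter table t enable trigger user;
commit;

If the queued events come from a foreign key rather than a user trigger, dropping the constraint before the update and re-adding it afterwards achieves the same effect, since ADD CONSTRAINT re-validates the whole table in one pass instead of row by row.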

(We have a TODO item to allow the trigger queue to spill to disk instead
of always being kept in memory; but it's fairly low on the priority
list, because quite frankly once the event list gets this large, you'd
not want to wait around for all the triggers to execute anyway ...)

            regards, tom lane

Re: Commit every N rows in PL/pgsql

From:
Chris Browne
Date:
len.walter@gmail.com (Len Walter) writes:
> I need to populate a new column in a Postgres 8.3 table. The SQL would be something like "update t set col_c = col_a + col_b". Unfortunately, this table has 110 million rows, so running that query runs out of memory.

Unnecessary.  On Oracle, the typical scenario is "ORA-1562 FAILED TO
EXTEND ROLLBACK SEGMENT."

PostgreSQL doesn't have a rollback segment, so there's nothing to run
out of here.  Where Oracle would tend to encourage you to keep your
transactions rather small, PostgreSQL doesn't require you to care about
that.

Big transactions, on PostgreSQL, are really no big deal.
--
select 'cbbrowne' || '@' || 'cbbrowne.com';
http://cbbrowne.com/info/internet.html
"MS  apparently now  has a  team dedicated  to tracking  problems with
Linux  and publicizing them.   I guess  eventually they'll  figure out
this back fires... ;)" -- William Burrow <aa126@DELETE.fan.nb.ca>

Re: Commit every N rows in PL/pgsql

From:
Chris Browne
Date:
Chris Browne <cbbrowne@acm.org> writes:
> len.walter@gmail.com (Len Walter) writes:
>> I need to populate a new column in a Postgres 8.3 table. The SQL would be something like "update t set col_c = col_a + col_b". Unfortunately, this table has 110 million rows, so running that query runs out of memory.
>
> Unnecessary.  On Oracle, the typical scenario is "ORA-1562 FAILED TO
> EXTEND ROLLBACK SEGMENT."
>
> PostgreSQL doesn't have a rollback segment, so there's nothing to run
> out of here.  Where Oracle would tend to encourage you to keep your
> transactions rather small, PostgreSQL doesn't require you to care about
> that.
>
> Big transactions, on PostgreSQL, are really no big deal.

Unless, of course, there's a trigger on that table.  In which case
it's pretty likely that you'd want to suppress the trigger...
--
select 'cbbrowne' || '@' || 'cbbrowne.com';
http://cbbrowne.com/info/internet.html
"MS  apparently now  has a  team dedicated  to tracking  problems with
Linux  and publicizing them.   I guess  eventually they'll  figure out
this back fires... ;)" -- William Burrow <aa126@DELETE.fan.nb.ca>

Re: Commit every N rows in PL/pgsql

From:
Len Walter
Date:

> ...
>     AfterTriggerEvents: 2642403328 total in 327 blocks; 10176 free (319 chunks); 2642393152 used

And there's the problem.  Evidently you have an AFTER trigger on the
table, and the queued events for that trigger are overrunning memory.

That's interesting - I don't know of any triggers on this table, but it does have a lot of foreign key constraints. Would they fall into the same category?
 
--
len.walter@gmail.com skype:lenwalter  msn:len.walter@gmail.com

Re: Commit every N rows in PL/pgsql

From:
Howard Rogers
Date:
On 06/03/2010 08:26 AM, Chris Browne wrote:
> len.walter@gmail.com (Len Walter) writes:
>> I need to populate a new column in a Postgres 8.3 table. The SQL would be something like "update t set col_c = col_a + col_b". Unfortunately, this table has 110 million rows, so running that query runs out of memory.
>
> Unnecessary.  On Oracle, the typical scenario is "ORA-1562 FAILED TO
> EXTEND ROLLBACK SEGMENT."
>
> PostgreSQL doesn't have a rollback segment, so there's nothing to run
> out of here.  Where Oracle would tend to encourage you to keep your
> transactions rather small, PostgreSQL doesn't require you to care about
> that.
>
> Big transactions, on PostgreSQL, are really no big deal.

Oracle would most certainly NOT tend to encourage you to keep your
transactions rather small! As in any other relational database, they
would -I suggest- encourage you to commit when it is "transactionally
appropriate".

Do you care if your load fails part-way through and the rows inserted up
to that point remain behind? If not, commit frequently. But if it has to
be 'all 110 million or none', then you have one large transaction on
your hands, and you'll have to provision resources for that -no matter
which database you're using.

Oracle may run out of rollback segment space (for the past 7 years,
they're called undo segments, but no real matter). But other databases
will run out of transaction log space, or have a problem handling
multi-version concurrency control issues for an extended length of time.
No matter what a particular RDBMS may call it, all would have resource
issues to deal with when coping with a long-running, multi-million row
transaction.