Discussion: Commit every N rows in PL/pgsql
Hi,
I need to populate a new column in a Postgres 8.3 table. The SQL would be something like "update t set col_c = col_a + col_b". Unfortunately, this table has 110 million rows, so running that query runs out of memory.
In Oracle, I'd turn auto-commit off and write a PL/SQL procedure that keeps a counter and commits every 10000 rows (pseudocode):

    define cursor curs as select col_a from t
    while fetch_from_cursor(curs) into a
        update t set col_c = col_a + col_b where col_a = a
        i++
        if i > 10000
            commit; i = 0;
        end if;
    end while;
    commit;
PL/pgsql doesn't allow that because it doesn't support nested transactions.
Is there an equivalent Postgres way of doing this?
cheers,
Len
On Wed, May 26, 2010 at 1:27 AM, Len Walter <len.walter@gmail.com> wrote:
> PL/pgsql doesn't allow that because it doesn't support nested transactions.
> Is there an equivalent Postgres way of doing this?

what about this?

    create function f() returns void as $$
    declare
        r record;
    begin
        for r in select col_a from t loop
            update t set col_c = col_a + col_b where col_a = r.col_a;
        end loop;
        return;
    end;
    $$ language plpgsql;

--
Jaime Casanova
www.2ndQuadrant.com
Soporte y capacitación de PostgreSQL
Hello

it is useless in PostgreSQL - it isn't important whether you commit one or a billion updated rows. PostgreSQL has a different implementation of transactions, so some of Oracle's issues simply don't exist here.

Regards
Pavel Stehule

2010/5/26 Len Walter <len.walter@gmail.com>:
> I need to populate a new column in a Postgres 8.3 table. The SQL would be
> something like "update t set col_c = col_a + col_b". Unfortunately, this
> table has 110 million rows, so running that query runs out of memory.
> [snip]
> PL/pgsql doesn't allow that because it doesn't support nested transactions.
> Is there an equivalent Postgres way of doing this?
On 26 May 2010, at 8:27, Len Walter wrote:
> I need to populate a new column in a Postgres 8.3 table. The SQL would be
> something like "update t set col_c = col_a + col_b". Unfortunately, this
> table has 110 million rows, so running that query runs out of memory.

That's unusual, what is the error you get?

Your table will grow (on disk) to twice the size it had previously, as new rows will be created for the transaction you're running the update from, but it will shrink again with vacuuming and usage. So you may run out of disk space, but /never/ out of memory. If you do then you probably have configured Postgres to use more memory than you have.

> In Oracle, I'd turn auto-commit off and write a pl/sql procedure that keeps
> a counter and commits every 10000 rows (pseudocode):
> [snip]
> PL/pgsql doesn't allow that because it doesn't support nested transactions.
> Is there an equivalent Postgres way of doing this?

Nested transactions wouldn't solve the problem, as the rows you "commit" here still aren't allowed to be visible to other transactions, and so both versions of the rows need to be kept around until the outer transaction commits. It's not going to save you any space.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.
On Wed, May 26, 2010 at 04:27:22PM +1000, Len Walter wrote:
> In Oracle, I'd turn auto-commit off and write a pl/sql procedure that keeps
> a counter and commits every 10000 rows (pseudocode):
> [snip]

you can't do it easily with plpgsql because plpgsql cannot influence transactions.

what you can do is to use some client (like psql) and make it simply issue a lot of queries.

for example, let's assume your table t has a column id, which is the primary key and contains values from 1 to 100000. now you can:

    perl -e 'for ($i=1; $i<100000; $i+=1000) {printf "update t set col_c = col_a + col_b where id between %u and %u;\n", $i, $i+999}' | psql -U ... -d ...

Best regards,

depesz

--
Linkedin: http://www.linkedin.com/in/depesz / blog: http://www.depesz.com/
jid/gtalk: depesz@depesz.com / aim: depeszhdl / skype: depesz_hdl / gg: 6749007
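[Editor's note: the perl one-liner above just prints one UPDATE per id range and lets psql, in autocommit mode, run each statement in its own transaction. A minimal Python sketch of the same idea follows; the table `t` and columns `id`, `col_a`, `col_b`, `col_c` come from the thread, while the batch size and psql invocation are placeholders to adapt.]

```python
def batch_ranges(max_id, batch_size):
    """Yield inclusive (lo, hi) id ranges covering 1..max_id."""
    for lo in range(1, max_id + 1, batch_size):
        yield lo, min(lo + batch_size - 1, max_id)

def batch_statements(max_id, batch_size):
    """One UPDATE per range; under psql autocommit, each is its own transaction."""
    return [
        "update t set col_c = col_a + col_b "
        "where id between %d and %d;" % (lo, hi)
        for lo, hi in batch_ranges(max_id, batch_size)
    ]

if __name__ == "__main__":
    # Pipe the output into psql, e.g.:
    #   python batch_update.py | psql -U ... -d ...
    for stmt in batch_statements(100000, 1000):
        print(stmt)
```

As with the perl version, this assumes a dense integer primary key; with gaps the batches are simply smaller, which is harmless.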
> I need to populate a new column in a Postgres 8.3 table. The SQL would be
> something like "update t set col_c = col_a + col_b". Unfortunately, this
> table has 110 million rows, so running that query runs out of memory.

That's unusual, what is the error you get?
Here it is:
TopMemoryContext: 57608 total in 7 blocks; 4072 free (6 chunks); 53536 used
RI compare cache: 8192 total in 1 blocks; 1800 free (0 chunks); 6392 used
RI query cache: 24576 total in 2 blocks; 14136 free (5 chunks); 10440 used
TopTransactionContext: 57344 total in 3 blocks; 10088 free (8 chunks); 47256 used
AfterTriggerEvents: 2642403328 total in 327 blocks; 10176 free (319 chunks); 2642393152 used
Combo CIDs: 8192 total in 1 blocks; 4936 free (0 chunks); 3256 used
Operator class cache: 8192 total in 1 blocks; 3848 free (0 chunks); 4344 used
Operator lookup cache: 24576 total in 2 blocks; 14072 free (6 chunks); 10504 used
MessageContext: 32768 total in 3 blocks; 11688 free (5 chunks); 21080 used
smgr relation table: 8192 total in 1 blocks; 1776 free (0 chunks); 6416 used
TransactionAbortContext: 32768 total in 1 blocks; 32752 free (0 chunks); 16 used
Portal hash: 8192 total in 1 blocks; 3912 free (0 chunks); 4280 used
PortalMemory: 8192 total in 1 blocks; 8040 free (0 chunks); 152 used
PortalHeapMemory: 1024 total in 1 blocks; 896 free (0 chunks); 128 used
ExecutorState: 57344 total in 3 blocks; 24384 free (21 chunks); 32960 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 8192 total in 1 blocks; 8128 free (0 chunks); 64 used
Relcache by OID: 8192 total in 1 blocks; 2856 free (0 chunks); 5336 used
CacheMemoryContext: 2283848 total in 25 blocks; 986328 free (93 chunks); 1297520 used
t_idx_3: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
t_idx_2: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
t_idx_1: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
t_idx_0: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
t_fromuid: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
t_exportid: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
t_callid_hashtext: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
t_pkey: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
CachedPlan: 3072 total in 2 blocks; 1512 free (0 chunks); 1560 used
CachedPlanSource: 3072 total in 2 blocks; 1672 free (1 chunks); 1400 used
SPI Plan: 1024 total in 1 blocks; 912 free (0 chunks); 112 used
company_pkey: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
CachedPlan: 3072 total in 2 blocks; 1512 free (0 chunks); 1560 used
CachedPlanSource: 3072 total in 2 blocks; 1672 free (1 chunks); 1400 used
SPI Plan: 1024 total in 1 blocks; 912 free (0 chunks); 112 used
timezone_code_key: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
timezone_pkey: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
pg_trigger_tgconstrname_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
pg_trigger_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
timezoneregion_pkey: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
pg_toast_113593282_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
pg_constraint_contypid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
pg_constraint_conrelid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
pg_constraint_conname_nsp_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
pg_toast_113593269_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
pg_attrdef_adrelid_adnum_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
pg_attrdef_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
pg_shdepend_depender_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_shdepend_reference_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
pg_depend_depender_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_depend_reference_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_roles: 7168 total in 3 blocks; 2824 free (0 chunks); 4344 used
globalconfig_pkey: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
schemaversion_pkey: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
pg_index_indrelid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
pg_ts_dict_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
pg_aggregate_fnoid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
pg_language_name_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
pg_statistic_relid_att_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
pg_ts_dict_dictname_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
pg_namespace_nspname_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
pg_opfamily_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
pg_opclass_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
pg_ts_parser_prsname_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
pg_amop_fam_strat_index: 1024 total in 1 blocks; 88 free (0 chunks); 936 used
pg_opclass_am_name_nsp_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_trigger_tgrelid_tgname_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
pg_cast_source_target_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
pg_auth_members_role_member_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
pg_attribute_relid_attnum_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
pg_ts_config_cfgname_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
pg_authid_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
pg_ts_config_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
pg_conversion_default_index: 1024 total in 1 blocks; 128 free (0 chunks); 896 used
pg_language_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
pg_enum_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
pg_proc_proname_args_nsp_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_ts_parser_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
pg_database_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
pg_conversion_name_nsp_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
pg_class_relname_nsp_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
pg_attribute_relid_attnam_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
pg_class_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
pg_amproc_fam_proc_index: 1024 total in 1 blocks; 88 free (0 chunks); 936 used
pg_operator_oprname_l_r_n_index: 1024 total in 1 blocks; 88 free (0 chunks); 936 used
pg_index_indexrelid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
pg_type_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
pg_rewrite_rel_rulename_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
pg_authid_rolname_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
pg_auth_members_member_role_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
pg_enum_typid_label_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
pg_constraint_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
pg_conversion_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
pg_ts_template_tmplname_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
pg_ts_config_map_index: 1024 total in 1 blocks; 192 free (0 chunks); 832 used
pg_namespace_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
pg_type_typname_nsp_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
pg_operator_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
pg_amop_opr_fam_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
pg_proc_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
pg_opfamily_am_name_nsp_index: 1024 total in 1 blocks; 192 free (0 chunks); 832 used
pg_ts_template_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
MdSmgr: 8192 total in 1 blocks; 5872 free (0 chunks); 2320 used
LOCALLOCK hash: 8192 total in 1 blocks; 1856 free (0 chunks); 6336 used
Timezones: 48616 total in 2 blocks; 5968 free (0 chunks); 42648 used
ErrorContext: 8192 total in 1 blocks; 8176 free (4 chunks); 16 used
2010-05-20 22:53:40 EST ERROR: out of memory
2010-05-20 22:53:40 EST DETAIL: Failed on request of size 32.
2010-05-20 22:53:40 EST STATEMENT: update t
set col_c = col_a + col_c;
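[Editor's note: for scale, the AfterTriggerEvents context alone in the dump above accounts for roughly two and a half gigabytes; a quick check of the reported figure:]

```python
# Figure taken from the AfterTriggerEvents line in the memory-context dump above
after_trigger_bytes = 2_642_403_328
gib = after_trigger_bytes / 2**30
print(f"AfterTriggerEvents: {gib:.2f} GiB")  # queued trigger events held in RAM
```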
> Your table will grow (on disk) to twice the size it had previously, as new rows will be created for the transaction you're running the update from, but it will shrink again with vacuuming and usage. So you may run out of disk space, but /never/ out of memory. If you do then you probably have configured Postgres to use more memory than you have.
If it does grow to double on disk, that would be a problem. The table is 40GB and there's 40GB free on the disk... From the error, I thought it was a memory problem though.
> Nested transactions wouldn't solve the problem, as the rows you "commit" here still aren't allowed to be visible to other transactions and so both versions of the rows need to be kept around until the outer transaction commits. It's not going to save you any space.
I think that Oracle stored procedures aren't wrapped in an implicit transaction the way they are in Postgres. We ended up solving the problem outside Postgres, but it's still an interesting problem :-)
--
len.walter@gmail.com skype:lenwalter msn:len.walter@gmail.com
Len Walter <len.walter@gmail.com> writes:
>>> I need to populate a new column in a Postgres 8.3 table. The SQL would be
>>> something like "update t set col_c = col_a + col_b". Unfortunately, this
>>> table has 110 million rows, so running that query runs out of memory.
>>
>> That's unusual, what is the error you get?

> Here it is:
> ...
> AfterTriggerEvents: 2642403328 total in 327 blocks; 10176 free (319
> chunks); 2642393152 used

And there's the problem. Evidently you have an AFTER trigger on the table, and the queued events for that trigger are overrunning memory. Figuring out what to do about that would depend on what the trigger's for, but usually the best bet is to drop the trigger and instead do whatever it's doing in some bulk fashion instead of retail.

(We have a TODO item to allow the trigger queue to spill to disk instead of always being kept in memory; but it's fairly low on the priority list, because quite frankly once the event list gets this large, you'd not want to wait around for all the triggers to execute anyway ...)

regards, tom lane
len.walter@gmail.com (Len Walter) writes:
> I need to populate a new column in a Postgres 8.3 table. The SQL would be
> something like "update t set col_c = col_a + col_b". Unfortunately, this
> table has 110 million rows, so running that query runs out of memory.

Unnecessary. On Oracle, the typical scenario is "ORA-1562 FAILED TO EXTEND ROLLBACK SEGMENT."

PostgreSQL doesn't have a rollback segment, so there's nothing to run out of here. Where Oracle would tend to encourage you to keep your transactions rather small, PostgreSQL doesn't require you to care about that.

Big transactions, on PostgreSQL, are really no big deal.

--
select 'cbbrowne' || '@' || 'cbbrowne.com';
http://cbbrowne.com/info/internet.html
"MS apparently now has a team dedicated to tracking problems with
Linux and publicizing them. I guess eventually they'll figure out
this back fires... ;)" -- William Burrow <aa126@DELETE.fan.nb.ca>
Chris Browne <cbbrowne@acm.org> writes:
> PostgreSQL doesn't have a rollback segment, so there's nothing to run
> out of here. Where Oracle would tend to encourage you to keep your
> transactions rather small, PostgreSQL doesn't require you to care about
> that.
>
> Big transactions, on PostgreSQL, are really no big deal.

Unless, of course, there's a trigger on that table. In which case it's pretty likely that you'd want to suppress the trigger...
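[Editor's note: a sketch of what "suppressing the trigger" could look like, assuming superuser access; `DISABLE TRIGGER ALL` also disables the internal foreign-key triggers, whose checks are simply skipped while disabled, so this is only safe when the update cannot violate the constraints.]

```sql
BEGIN;
-- Disables both user triggers and the internal FK triggers
-- (superuser is required for the latter).
ALTER TABLE t DISABLE TRIGGER ALL;

UPDATE t SET col_c = col_a + col_b;

ALTER TABLE t ENABLE TRIGGER ALL;
COMMIT;
```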
> > AfterTriggerEvents: 2642403328 total in 327 blocks; 10176 free (319
> > chunks); 2642393152 used
>
> And there's the problem. Evidently you have an AFTER trigger on the
> table, and the queued events for that trigger are overrunning memory.
That's interesting - I don't know of any triggers on this table, but it does have a lot of foreign key constraints. Would they fall into the same category?
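[Editor's note: they do - foreign-key constraints are implemented as internal AFTER triggers, so each updated row queues FK-check events just like a user trigger would. On PostgreSQL 9.0 and later they can be listed as sketched below; on 8.3 the flag column is `tgisconstraint` rather than `tgisinternal`. Table name `t` is from the thread.]

```sql
-- List all triggers on t, including the internal ones that enforce FKs.
SELECT tgname, tgisinternal
FROM pg_trigger
WHERE tgrelid = 't'::regclass;
```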
On 06/03/2010 08:26 AM, Chris Browne wrote:
> Where Oracle would tend to encourage you to keep your
> transactions rather small, PostgreSQL doesn't require you to care about
> that.
>
> Big transactions, on PostgreSQL, are really no big deal.

Oracle would most certainly NOT tend to encourage you to keep your transactions rather small! As in any other relational database, they would - I suggest - encourage you to commit when it is "transactionally appropriate". Do you care if your load fails part-way through and the rows inserted up to that point remain behind? If not, commit frequently. But if it has to be 'all 110 million or none', then you have one large transaction on your hands, and you'll have to resource for that - no matter which database you're using.

Oracle may run out of rollback segment space (for the past 7 years they've been called undo segments, but no real matter). Other databases will run out of transaction log space, or have a problem handling multi-version concurrency control issues for an extended length of time. No matter what a particular RDBMS may call it, all would have resource issues to deal with when coping with a long-running, multi-million row transaction.