Re: Out of memory condition

Поиск
Список
Период
Сортировка
От Carlos Henrique Reimer
Тема Re: Out of memory condition
Дата
Msg-id CAJnnue1BEtddyjAjDhKnKX7Mhay5Qp9Q0x-v+r3ZWz9x0Qx9BQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Out of memory condition  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-general
Hi,

Yes, I agree, 8.3 is out of support for a long time and this is the reason we are trying to migrate to 9.3 using SLONY to minimize downtime.

I eliminated the possibility of data corruption as the limit/offset technique indicated different rows each time it was executed. Actually, the failure is still happening and as it is running in a virtual machine, memory size configuration for this virtual machine was increased from 48GB to 64GB and we have scheduled a server shutdown/restart for the next coming weekend in order to try to get rid of the failure.

The replication activity was aborted: SLONY triggers removed, SLONY processes terminated and SLONY schemas removed.

Ulimit output was appended at the end of this note.

Memory statistics dump from postmaster log resulted from a select * from "8147_spunico"."sincdc"; command:

Thank you!

TopMemoryContext: 80800 total in 9 blocks; 4088 free (10 chunks); 76712 used
  Operator class cache: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
  TopTransactionContext: 8192 total in 1 blocks; 7648 free (1 chunks); 544 used
  MessageContext: 57344 total in 3 blocks; 40760 free (6 chunks); 16584 used
  smgr relation table: 24576 total in 2 blocks; 13904 free (4 chunks); 10672 used
  TransactionAbortContext: 32768 total in 1 blocks; 32736 free (0 chunks); 32 used
  Portal hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
  PortalMemory: 8192 total in 1 blocks; 7888 free (0 chunks); 304 used
    PortalHeapMemory: 1024 total in 1 blocks; 720 free (0 chunks); 304 used
      ExecutorState: 381096528 total in 6 blocks; 49856 free (30 chunks); 381046672 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
  Relcache by OID: 24576 total in 2 blocks; 14912 free (3 chunks); 9664 used
  CacheMemoryContext: 817392 total in 20 blocks; 230456 free (3 chunks); 586936 used
    pg_toast_729119_index: 2048 total in 1 blocks; 608 free (0 chunks); 1440 used
    idx_sincdc_situacao: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    idx_sincdc_esqtab: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    idx_sincdc_datahoraexp: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    pk_sincdc: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    pg_index_indrelid_index: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
    pg_attrdef_adrelid_adnum_index: 2048 total in 1 blocks; 608 free (0 chunks); 1440 used
    pg_ts_dict_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_aggregate_fnoid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_language_name_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_statistic_relid_att_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used
    pg_ts_dict_dictname_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_namespace_nspname_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
    pg_opfamily_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_opclass_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
    pg_ts_parser_prsname_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_amop_fam_strat_index: 3072 total in 2 blocks; 1384 free (2 chunks); 1688 used
    pg_opclass_am_name_nsp_index: 3072 total in 2 blocks; 1624 free (3 chunks); 1448 used
    pg_trigger_tgrelid_tgname_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used
    pg_cast_source_target_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_auth_members_role_member_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_attribute_relid_attnum_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used
    pg_ts_config_cfgname_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_authid_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
    pg_ts_config_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_conversion_default_index: 3072 total in 2 blocks; 1432 free (3 chunks); 1640 used
    pg_language_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_enum_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_proc_proname_args_nsp_index: 3072 total in 2 blocks; 1624 free (3 chunks); 1448 used
    pg_ts_parser_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_database_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
    pg_conversion_name_nsp_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_class_relname_nsp_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used
    pg_attribute_relid_attnam_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_class_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
    pg_amproc_fam_proc_index: 3072 total in 2 blocks; 1384 free (2 chunks); 1688 used
    pg_operator_oprname_l_r_n_index: 3072 total in 2 blocks; 1432 free (3 chunks); 1640 used
    pg_index_indexrelid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
    pg_type_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
    pg_rewrite_rel_rulename_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_authid_rolname_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
    pg_auth_members_member_role_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_enum_typid_label_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_constraint_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_conversion_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_ts_template_tmplname_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_ts_config_map_index: 3072 total in 2 blocks; 1624 free (3 chunks); 1448 used
    pg_namespace_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_type_typname_nsp_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_operator_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_amop_opr_fam_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_proc_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_opfamily_am_name_nsp_index: 3072 total in 2 blocks; 1624 free (3 chunks); 1448 used
    pg_ts_template_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
  MdSmgr: 8192 total in 1 blocks; 7648 free (0 chunks); 544 used
  LOCALLOCK hash: 24576 total in 2 blocks; 15984 free (5 chunks); 8592 used
  Timezones: 53584 total in 2 blocks; 3744 free (0 chunks); 49840 used
  ErrorContext: 8192 total in 1 blocks; 8160 free (3 chunks); 32 used


Ulimit output:
ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 385725
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1024
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited


On Thu, Dec 11, 2014 at 1:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Carlos Henrique Reimer <carlos.reimer@opendb.com.br> writes:
> I've facing an out of memory condition after running SLONY several hours to
> get a 1TB database with about 23,000 tables replicated. The error occurs
> after about 50% of the tables were replicated.

I'd try bringing this up with the Slony crew.

> I guess postgresql is trying to perform an atomic allocation (those which
> cannot wait for reclaim) to get a continues memory area and is failing due
> to memory fragmentation.

This theory has nothing to do with reality.  More likely it's just a
garden variety memory leak.  If it was an out-of-memory error reported
by Postgres, there should have been a memory statistics dump written in
the postmaster log --- can you find that and post it?

Another possible theory is that you're just looking at lots of memory
needed to hold relcache entries for all 23000 tables :-(.  If so there
may not be any easy way around it, except perhaps replicating subsets
of the tables.  Unless you can boost the memory available to the backend
--- since this is a 64 bit build, the only reason I can see for
out-of-memory failures would be a restrictive ulimit setting.

> After SLONY gets the out of memory condition, select * of the table also
> does not work:
> FiscalWeb=# select * from "8147_spunico"."sincdc";
> ERROR:  out of memory
> DETAIL:  Failed on request of size 268435456.

That's odd ... looks more like data corruption than anything else.
Does this happen even in a fresh session?  What do you have to do
to get rid of the failure?

>  PostgreSQL 8.3.21 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC)
> 4.4.6 20120305 (Red Hat 4.4.6-4)

You realize of course that this version is years out of support, and that
even if this problem traces to a bug in Postgres, 8.3 is not going to get
fixed.

                        regards, tom lane



--
Reimer
47-3347-1724 47-9183-0547 msn: carlos.reimer@opendb.com.br

В списке pgsql-general по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Out of memory condition
Следующее
От: Jim McLaughlin
Дата:
Сообщение: libpq equivalent of jdbc:default:connection