Thread: is there a way to firmly cap postgres worker memory consumption?

is there a way to firmly cap postgres worker memory consumption?

From:
Steve Kehlet
Date:
I'm running into some scenarios where errant postgres queries are consuming all memory and bringing down production boxes. I'm running Postgres 9.3.4 on CentOS 5.x VMs under XenServer. My custom config is here: https://gist.github.com/skehlet/9984666

The incidents may be related to reports of an array_agg memory leak, and another recent incident seems like it was due to the xpath() function provided by --with-libxml. There's some more information on the former in my post at http://dba.stackexchange.com/questions/62587/why-does-this-query-cause-the-postgres-worker-process-to-consume-so-much-memory, in case it's of interest. I'd like to dig into that further, maybe on another thread.

But for now, the devs are asking me for some way to put a cap on a postgres query's total memory consumption. I'm familiar with the available settings on the "Resource Consumption" docs (and you can see my settings in my gist above, including work_mem turned way down to 1MB for testing), but it seems like there are things like Materialize that remain uncappable, since they're not constrained by the shared_buffers and work_mem limits. For example, in my dba.stackexchange post above, I found that by doing "set enable_material=false;" I could stop a particularly nasty query from consuming all memory, which narrowed the problem somewhat, but that query is one we have to live with in the near term.

With all that laid out, can anyone help with the following questions:

- Is there any way to set a total memory cap on a worker process's memory consumption?
- Are there other (even undocumented) settings to cap memory usage?
- Other ideas?

In the meantime, to stop the bleeding I'm looking into using the postgres feature to coerce the OOM killer into killing rogue child processes first (-DLINUX_OOM_ADJ). Unfortunately, there's another setback: even the latest kernel on CentOS 5.x doesn't allow non-root processes to write to /proc/self/oom_adj, so it isn't working. So I'll either need to patch the CentOS kernel and rebuild, move to a newer kernel, or maybe do something hacky like running a background process to reassign the oom_adj value for all postgres workers found.
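
To illustrate the hacky option, the watcher I have in mind would be something like this (just a sketch, run as root; the pidfile path, interval, and adjustment value are guesses for my setup):

    #!/bin/bash
    # Periodically mark all postgres backends as preferred OOM-killer
    # victims, since on this kernel the postmaster can't set oom_adj
    # for its children itself.
    PIDFILE=/var/lib/pgsql/9.3/data/postmaster.pid   # assumed location
    while true; do
        postmaster=$(head -1 "$PIDFILE")
        for pid in $(pgrep -P "$postmaster"); do
            echo 15 > "/proc/$pid/oom_adj" 2>/dev/null   # 15 = most killable
        done
        sleep 5
    done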

Thanks for any help/ideas!

Re: is there a way to firmly cap postgres worker memory consumption?

From:
Amador Alvarez
Date:
Hi Steve,

Did you check whether your OS allows you to limit memory usage on a per-user or per-process basis?

Did you profile or debug it to see what is going on?

When memory consumption grows, doesn't it ever swap out?


Thanks,
A.A.

Re: is there a way to firmly cap postgres worker memory consumption?

From:
Steve Kehlet
Date:
Did you check whether your OS allows you to limit memory usage on a per-user or per-process basis?

I'll definitely dig more into this. I found some stuff saying it isn't as easy as one would expect, i.e. not just a ulimit and done. Or maybe it is, in this case. I will play around.
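
Concretely, the first thing I'll try is a hard cap in the init script before the postmaster starts, since backends inherit its rlimits. An untested sketch (the value is just for experimenting, and the data directory path is specific to my setup):

    ulimit -v 1572864                        # 1.5 GiB of virtual memory, in KiB
    pg_ctl -D /var/lib/pgsql/9.3/data start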
 
Did you profile or debug it to see what is going on?

I would love to learn more about how to do this, to get to the bottom of the memory usage. I can google around, but can you suggest any reads?
 
When memory consumption grows, doesn't it ever swap out?

Yes, it consumes all swap.

Re: is there a way to firmly cap postgres worker memory consumption?

From:
Tom Lane
Date:
Steve Kehlet <steve.kehlet@gmail.com> writes:
> But for now, the devs are asking me for some way to put a cap on a postgres
> query's total memory consumption. I'm familiar with the available settings
> on the "Resource Consumption" docs (and you can see my settings in my gist
> above, including work_mem turned way down to 1MB for testing), but it seems
> like there are things like Materialize that remain uncappable, since
> they're not constrained by the shared_buffers and work_mem limits.

Materialize nodes should honor the work_mem limit.  That's per node of
course, but you only have half a dozen of them in this query, so I doubt
that's where the problem is.  You mentioned array_agg ... how are you
using that exactly, and how large a result are you expecting it to create?
If you had an array_agg being used in a HashAggregate plan node, I'd
figure for sure that was the source of your problem, but the explain
output shows it's a GroupAggregate so there should only be one array_agg
active at a time.

Anyway, what I would try doing is starting the postmaster with a "ulimit
-d" value corresponding to the max per-process data segment size you want.
Something in the range of 50-100MB would probably be reasonable if your
queries aren't too complex.
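
For instance (values illustrative only; ulimit -d takes kilobytes):

    ulimit -d 102400          # ~100MB data segment per process
    pg_ctl -D $PGDATA start   # backends inherit the postmaster's rlimits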

            regards, tom lane


Re: is there a way to firmly cap postgres worker memory consumption?

From:
Tom Lane
Date:
Steve Kehlet <steve.kehlet@gmail.com> writes:
>> Did you profile or debug it to see what is going on?

> I would love to learn more about how to do this, to get to the bottom of
> the memory usage. I can google around, or can you suggest any reads?

Once you've got a ulimit in place so that malloc eventually fails with
ENOMEM, the backend process should print a memory context dump on stderr
when it hits that.  Make sure your logging setup captures the process
stderr someplace (logging_collector works for this, syslog does not).
Post the dump here when you've got it.
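
For example, in postgresql.conf (the directory and filename settings shown are just typical choices, not requirements):

    logging_collector = on                     # capture backend stderr
    log_directory = 'pg_log'                   # relative to the data directory
    log_filename = 'postgresql-%Y-%m-%d.log'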

            regards, tom lane


Re: is there a way to firmly cap postgres worker memory consumption?

From:
Steve Kehlet
Date:
On Tue, Apr 8, 2014 at 12:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Once you've got a ulimit in place so that malloc eventually fails with
ENOMEM, the backend process should print a memory context dump on stderr
when it hits that.  Make sure your logging setup captures the process
stderr someplace (logging_collector works for this, syslog does not).
Post the dump here when you've got it.

Thank you. For some reason I couldn't get it to trip with "ulimit -d 51200", but "ulimit -v 1572864" (1.5GiB) got me this in serverlog. I hope this is readable, if not it's also here: https://gist.github.com/skehlet/10206741
 
TopMemoryContext: 123760 total in 16 blocks; 2720 free (7 chunks); 121040 used
  TableSpace cache: 8192 total in 1 blocks; 3216 free (0 chunks); 4976 used
  Type information cache: 24240 total in 2 blocks; 3744 free (0 chunks); 20496 used
  Operator lookup cache: 24576 total in 2 blocks; 11888 free (5 chunks); 12688 used
  TopTransactionContext: 8192 total in 1 blocks; 7368 free (0 chunks); 824 used
  MessageContext: 4186112 total in 9 blocks; 633960 free (19 chunks); 3552152 used
    JoinRelHashTable: 8192 total in 1 blocks; 624 free (0 chunks); 7568 used
  Operator class cache: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
  smgr relation table: 24576 total in 2 blocks; 9808 free (4 chunks); 14768 used
  TransactionAbortContext: 32768 total in 1 blocks; 32736 free (0 chunks); 32 used
  Portal hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
  PortalMemory: 8192 total in 1 blocks; 7888 free (0 chunks); 304 used
    PortalHeapMemory: 1024 total in 1 blocks; 816 free (0 chunks); 208 used
      ExecutorState: 566208 total in 12 blocks; 195256 free (2 chunks); 370952 used
        TupleSort: 17817440 total in 36 blocks; 9326832 free (8801 chunks); 8490608 used
        TupleSort: 17500568 total in 86 blocks; 12016040 free (2896 chunks); 5484528 used
        TupleSort: 17616584 total in 86 blocks; 9975520 free (3649 chunks); 7641064 used
        TupleSort: 10825056 total in 14 blocks; 3896400 free (45082 chunks); 6928656 used
        TupleSort: 17459720 total in 67 blocks; 8495600 free (2226 chunks); 8964120 used
        TupleSort: 10899360 total in 23 blocks; 2112448 free (23706 chunks); 8786912 used
        TupleSort: 11908776 total in 15 blocks; 8344064 free (145642 chunks); 3564712 used
        printtup: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
        ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
        ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 812638208 total in 108 blocks; 183520 free (171 chunks); 812454688 used
        AggContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
  Relcache by OID: 24576 total in 2 blocks; 13872 free (3 chunks); 10704 used
  CacheMemoryContext: 1342128 total in 21 blocks; 424872 free (1 chunks); 917256 used
    pg_toast_16646_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
    d_mc2_fki: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
    d_mc2_pkey: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
    d_mcm2_pkey: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
    d_mm2_fki: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    d_mm2_index2: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
    d_mm2_index1: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
    d_mm2_pkey: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
    d_m2_index1: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
    d_m2_pkey: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_index_indrelid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_attrdef_adrelid_adnum_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
    pg_db_role_setting_databaseid_rol_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
    pg_user_mapping_user_server_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
    pg_user_mapping_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_type_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_type_typname_nsp_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
    pg_ts_template_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_ts_template_tmplname_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
    pg_ts_parser_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_ts_parser_prsname_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
    pg_ts_dict_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_ts_dict_dictname_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
    pg_ts_config_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_ts_config_cfgname_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
    pg_ts_config_map_index: 3072 total in 2 blocks; 1784 free (2 chunks); 1288 used
    pg_tablespace_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_statistic_relid_att_inh_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
    pg_class_relname_nsp_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
    pg_range_rngtypid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_proc_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_proc_proname_args_nsp_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
    pg_opfamily_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_opfamily_am_name_nsp_index: 3072 total in 2 blocks; 1784 free (2 chunks); 1288 used
    pg_operator_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_operator_oprname_l_r_n_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
    pg_namespace_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_namespace_nspname_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_language_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_language_name_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_foreign_table_relid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_foreign_server_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_foreign_server_name_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_foreign_data_wrapper_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_foreign_data_wrapper_name_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_event_trigger_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_event_trigger_evtname_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_enum_typid_label_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
    pg_enum_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_default_acl_role_nsp_obj_index: 3072 total in 2 blocks; 1784 free (2 chunks); 1288 used
    pg_conversion_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_constraint_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_conversion_name_nsp_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
    pg_conversion_default_index: 3072 total in 2 blocks; 1784 free (2 chunks); 1288 used
    pg_collation_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_collation_name_enc_nsp_index: 3072 total in 2 blocks; 1784 free (2 chunks); 1288 used
    pg_opclass_am_name_nsp_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
    pg_cast_source_target_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
    pg_auth_members_role_member_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
    pg_attribute_relid_attnam_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
    pg_amop_fam_strat_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
    pg_amop_opr_fam_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
    pg_aggregate_fnoid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_auth_members_member_role_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
    pg_authid_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_authid_rolname_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_database_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_database_datname_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
    pg_trigger_tgrelid_tgname_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
    pg_rewrite_rel_rulename_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
    pg_amproc_fam_proc_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
    pg_opclass_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_index_indexrelid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_attribute_relid_attnum_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
    pg_class_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
  MdSmgr: 8192 total in 1 blocks; 6112 free (0 chunks); 2080 used
  ident parser context: 0 total in 0 blocks; 0 free (0 chunks); 0 used
  hba parser context: 3072 total in 2 blocks; 80 free (1 chunks); 2992 used
  LOCALLOCK hash: 24576 total in 2 blocks; 13920 free (4 chunks); 10656 used
  Timezones: 83472 total in 2 blocks; 3744 free (0 chunks); 79728 used
  ErrorContext: 8192 total in 1 blocks; 8160 free (3 chunks); 32 used
2014-04-08 23:00:14 GMT [31229]: [1-1] ERROR:  out of memory
2014-04-08 23:00:14 GMT [31229]: [2-1] DETAIL:  Failed on request of size 597.
2014-04-08 23:00:14 GMT [31229]: [3-1] STATEMENT:  SELECT MESSAGE.ID,
      MESSAGE.SERVER_ID,
      MESSAGE.RECEIVED_DATE,
      MESSAGE.PROCESSED,
      MESSAGE.ORIGINAL_ID,
      MESSAGE.IMPORT_ID,
      MESSAGE.IMPORT_CHANNEL_ID,
      METADATA_IDS
FROM
 ( SELECT M.ID,
          ARRAY_TO_STRING(ARRAY_AGG(MM.ID::CHARACTER VARYING), ',') AS METADATA_IDS
  FROM D_M2 M
  JOIN D_MM2 MM ON MM.MESSAGE_ID = M.ID
  LEFT JOIN D_MCM2 MCM ON MCM.MESSAGE_ID = MM.MESSAGE_ID
  AND MCM.METADATA_ID = MM.ID
  LEFT JOIN D_MC2 MC_1 ON MC_1.MESSAGE_ID = M.ID
  AND MC_1.IS_ENCRYPTED = FALSE
  AND MC_1.METADATA_ID = 0
  AND ((MM.ID = 0
        AND MC_1.CONTENT_TYPE = 1)
       OR (MM.ID > 0
           AND MC_1.CONTENT_TYPE = 4))
  LEFT JOIN D_MC2 MC_2 ON MC_2.MESSAGE_ID = M.ID
  AND MC_2.IS_ENCRYPTED = FALSE
  AND MC_2.METADATA_ID = MM.ID
  AND MC_2.CONTENT_TYPE = '2'
  LEFT JOIN D_MC2 MC_3 ON MC_3.MESSAGE_ID = M.ID
  AND MC_3.IS_ENCRYPTED = FALSE
  AND MC_3.METADATA_ID = MM.ID
  AND MC_3.CONTENT_TYPE = '3'
  LEFT JOIN D_MC2 MC_4 ON MC_4.MESSAGE_ID = M.ID
  AND MC_4.IS_ENCRYPTED = FALSE
  AND MC_4.METADATA_ID = MM.ID
  AND MC_4.CONTENT_TYPE = '4'
  LEFT JOIN D_MC2 MC_5 ON MC_5.MESSAGE_ID = M.ID
  AND MC_5.IS_ENCRYPTED = FALSE
  AND MC_5.METADATA_ID = MM.ID
  AND MC_5.CONTENT_TYPE = '5'
  LEFT JOIN D_MC2 MC_6 ON MC_6.MESSAGE_ID = M.ID
  AND MC_6.IS_ENCRYPTED = FALSE
  AND MC_6.METADATA_ID = MM.ID
  AND MC_6.CONTENT_TYPE = '6'
  LEFT JOIN D_MC2 MC_7 ON MC_7.MESSAGE_ID = M.ID
  AND MC_7.IS_ENCRYPTED = FALSE
  AND MC_7.METADATA_ID = MM.ID
  AND MC_7.CONTENT_TYPE = '7'
  LEFT JOIN D_MC2 MC_8 ON MC_8.MESSAGE_ID = M.ID
  AND MC_8.IS_ENCRYPTED = FALSE
  AND MC_8.METADATA_ID = MM.ID
  AND MC_8.CONTENT_TYPE ='8'
  WHERE M.ID <= '230716'
    AND ( M.ID::VARCHAR ILIKE '%' || 'rwar mea' || '%'
         OR MM.CONNECTOR_NAME ILIKE '%' || 'rwar mea' || '%'
         OR MC_1.CONTENT ILIKE '%' || 'rwar mea' || '%'
         OR MC_2.CONTENT ILIKE '%' || 'rwar mea' || '%'
         OR MC_3.CONTENT ILIKE '%' || 'rwar mea' || '%'
         OR MC_4.CONTENT ILIKE '%' || 'rwar mea' || '%'
         OR MC_5.CONTENT ILIKE '%' || 'rwar mea' || '%'
         OR MC_6.CONTENT ILIKE '%' || 'rwar mea' || '%'
         OR MC_7.CONTENT ILIKE '%' || 'rwar mea' || '%'
         OR MC_8.CONTENT ILIKE '%' || 'rwar mea' || '%'
         OR MCM."SOURCE" ILIKE '%' || 'rwar mea' || '%'
         OR MCM."TYPE" ILIKE '%' || 'rwar mea' || '%' )
  GROUP BY M.ID
  ORDER BY M.ID DESC
  OFFSET 0 LIMIT 21 ) AS SUBQUERY
INNER JOIN D_M2 MESSAGE ON SUBQUERY.ID = MESSAGE.ID
ORDER BY MESSAGE.ID DESC
;
2014-04-08 23:00:14 GMT [31229]: [4-1] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp31229.7", size 124600320
2014-04-08 23:00:14 GMT [31229]: [5-1] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp31229.6", size 36167680
2014-04-08 23:00:14 GMT [31229]: [6-1] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp31229.5", size 1073741824
2014-04-08 23:00:14 GMT [31229]: [7-1] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp31229.4", size 823541760
2014-04-08 23:00:14 GMT [31229]: [8-1] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp31229.3", size 5324800
2014-04-08 23:00:14 GMT [31229]: [9-1] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp31229.2", size 402513920
2014-04-08 23:00:14 GMT [31229]: [10-1] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp31229.1", size 38617088
2014-04-08 23:00:14 GMT [31229]: [11-1] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp31229.0", size 5079040

Re: is there a way to firmly cap postgres worker memory consumption?

From:
Steve Kehlet
Date:
I should add I had to move testing to a different VM, with less RAM and different tunings; they are here: https://gist.github.com/skehlet/10207086

Re: is there a way to firmly cap postgres worker memory consumption?

From:
Tom Lane
Date:
Steve Kehlet <steve.kehlet@gmail.com> writes:
> Thank you. For some reason I couldn't get it to trip with "ulimit -d
> 51200", but "ulimit -v 1572864" (1.5GiB) got me this in serverlog. I hope
> this is readable, if not it's also here:

Well, here's the problem:

>         ExprContext: 812638208 total in 108 blocks; 183520 free (171
> chunks); 812454688 used

So something involved in expression evaluation is eating memory.
Looking at the query itself, I'd have to bet on this:

>            ARRAY_TO_STRING(ARRAY_AGG(MM.ID::CHARACTER VARYING), ',')

My guess is that this aggregation is being done across a lot more rows
than you were expecting, and the resultant array/string therefore eats
lots of memory.  You might try replacing that with COUNT(*), or even
better SUM(LENGTH(MM.ID::CHARACTER VARYING)), just to get some definitive
evidence about what the query is asking to compute.
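
That is, keep the grouping but swap out the aggregate, leaving the rest of the query untouched; a sketch (the alias names here are made up):

    SELECT M.ID,
           COUNT(*) AS rows_per_group,
           SUM(LENGTH(MM.ID::CHARACTER VARYING)) AS total_id_chars
    FROM D_M2 M
    JOIN D_MM2 MM ON MM.MESSAGE_ID = M.ID
    -- ... same remaining joins, WHERE clause, and LIMIT as the original ...
    GROUP BY M.ID
    ORDER BY M.ID DESC;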

Meanwhile, it seems like ulimit -v would provide the safety valve
you asked for originally.  I too am confused about why -d didn't
do it, but as long as you found a variant that works ...

            regards, tom lane


Re: is there a way to firmly cap postgres worker memory consumption?

From:
Steve Kehlet
Date:
On Tue, Apr 8, 2014 at 6:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Well, here's the problem:
>         ExprContext: 812638208 total in 108 blocks; 183520 free (171
> chunks); 812454688 used

So something involved in expression evaluation is eating memory.
Looking at the query itself, I'd have to bet on this:

>            ARRAY_TO_STRING(ARRAY_AGG(MM.ID::CHARACTER VARYING), ',')

My guess is that this aggregation is being done across a lot more rows
than you were expecting, and the resultant array/string therefore eats
lots of memory.  You might try replacing that with COUNT(*), or even
better SUM(LENGTH(MM.ID::CHARACTER VARYING)), just to get some definitive
evidence about what the query is asking to compute.

The devs have moved on and want to stick with their new query, so I'll just chalk this up to a bad query and let it go. But I'm glad to have learned a few new tricks, thanks.
 
Meanwhile, it seems like ulimit -v would provide the safety valve
you asked for originally.

Thank you Amador and Tom for the ulimit solution, that's exactly what I needed.