Discussion: Out of memory on SELECT (from sort?) in 8.3
I have a SELECT query that causes an out-of-memory error on my production Postgres 8.3 server. I believe the culprit is the sort in the query, but it's running out on queries that return 300,000 results, which I didn't think was very large. work_mem is set to 128MB, but I thought the server would switch to an on-disk sort method if it couldn't perform the sort in memory. Is this not the case? In the log file, I see the following:

2008-08-16 02:02:43 CDT lms_nna ERROR: out of memory
2008-08-16 02:02:43 CDT lms_nna DETAIL: Failed on request of size 16384.
TopMemoryContext: 105376 total in 12 blocks; 9288 free (10 chunks); 96088 used
LibxmlContext: 8380416 total in 10 blocks; 3828016 free (0 chunks); 4552400 used
TopTransactionContext: 8192 total in 1 blocks; 7648 free (0 chunks); 544 used
PL/PgSQL function context: 8192 total in 1 blocks; 5512 free (4 chunks); 2680 used
Type information cache: 24576 total in 2 blocks; 11888 free (5 chunks); 12688 used
PL/PgSQL function context: 8192 total in 1 blocks; 4648 free (7 chunks); 3544 used
CFuncHash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
Rendezvous variable hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
PLpgSQL function cache: 24224 total in 2 blocks; 3744 free (0 chunks); 20480 used
Record information cache: 24576 total in 2 blocks; 15984 free (5 chunks); 8592 used
Prepared Queries: 24576 total in 2 blocks; 15984 free (5 chunks); 8592 used
Operator class cache: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
Operator lookup cache: 24576 total in 2 blocks; 11888 free (5 chunks); 12688 used
MessageContext: 8192 total in 1 blocks; 6976 free (1 chunks); 1216 used
smgr relation table: 24576 total in 2 blocks; 13904 free (4 chunks); 10672 used
TransactionAbortContext: 32768 total in 1 blocks; 32736 free (0 chunks); 32 used
Portal hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
PortalMemory: 8192 total in 1 blocks; 7616 free (0 chunks); 576 used
PortalHeapMemory: 2048 total in 1 blocks; 424 free (0 chunks); 1624 used
ExecutorState: 841031232 total in 51159 blocks; 1712232 free (56 chunks); 839319000 used
TIDBitmap: 2088960 total in 8 blocks; 243200 free (25 chunks); 1845760 used
TupleSort: 28303408 total in 13 blocks; 5346944 free (14 chunks); 22956464 used
ExprContext: 8192 total in 1 blocks; 8096 free (0 chunks); 96 used
ExprContext: 8192 total in 1 blocks; 8096 free (0 chunks); 96 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
PortalHeapMemory: 1024 total in 1 blocks; 944 free (0 chunks); 80 used
Relcache by OID: 24576 total in 2 blocks; 13872 free (3 chunks); 10704 used
CacheMemoryContext: 817392 total in 20 blocks; 23488 free (1 chunks); 793904 used
CachedPlan: 15360 total in 4 blocks; 1528 free (0 chunks); 13832 used
CachedPlanSource: 15360 total in 4 blocks; 1440 free (0 chunks); 13920 used
unnamed prepared statement: 8192 total in 1 blocks; 3856 free (2 chunks); 4336 used
CachedPlan: 1024 total in 1 blocks; 312 free (0 chunks); 712 used
CachedPlanSource: 1024 total in 1 blocks; 56 free (0 chunks); 968 used
SPI Plan: 1024 total in 1 blocks; 832 free (0 chunks); 192 used
CachedPlan: 1024 total in 1 blocks; 336 free (0 chunks); 688 used
CachedPlanSource: 1024 total in 1 blocks; 80 free (0 chunks); 944 used
SPI Plan: 1024 total in 1 blocks; 808 free (0 chunks); 216 used
CachedPlan: 3072 total in 2 blocks; 488 free (0 chunks); 2584 used
CachedPlanSource: 1024 total in 1 blocks; 24 free (0 chunks); 1000 used
SPI Plan: 1024 total in 1 blocks; 808 free (0 chunks); 216 used
CachedPlan: 1024 total in 1 blocks; 816 free (0 chunks); 208 used
CachedPlanSource: 1024 total in 1 blocks; 656 free (0 chunks); 368 used
pg_toast_18005_index: 2048 total in 1 blocks; 608 free (0 chunks); 1440 used
reporting_modified_idx: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
reporting_last_processed_date_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
reporting_date_idx: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
lead_created_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
lead_processing_step_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
lead_destination_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
lead_modified_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
lead_source_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
processing_state_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
lead_pkey: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
CachedPlan: 1024 total in 1 blocks; 816 free (0 chunks); 208 used
CachedPlanSource: 1024 total in 1 blocks; 664 free (1 chunks); 360 used
CachedPlan: 1024 total in 1 blocks; 312 free (0 chunks); 712 used
CachedPlanSource: 1024 total in 1 blocks; 56 free (0 chunks); 968 used
SPI Plan: 1024 total in 1 blocks; 832 free (0 chunks); 192 used
locks_pkey: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
CachedPlan: 3072 total in 2 blocks; 648 free (0 chunks); 2424 used
CachedPlanSource: 3072 total in 2 blocks; 1248 free (1 chunks); 1824 used
SPI Plan: 1024 total in 1 blocks; 808 free (0 chunks); 216 used
CachedPlan: 1024 total in 1 blocks; 816 free (0 chunks); 208 used
CachedPlanSource: 1024 total in 1 blocks; 664 free (1 chunks); 360 used
xslt_style_sheet_pkey: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
external_system_user_name_key: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
external_system_name_key: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
external_system_pkey: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
dealer_external_system_id_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
dealer_dealer_code_key: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
dealer_pkey: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
pg_index_indrelid_index: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
pg_ts_dict_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
pg_aggregate_fnoid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
pg_language_name_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
pg_statistic_relid_att_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used
pg_ts_dict_dictname_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
pg_namespace_nspname_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
pg_opfamily_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
pg_opclass_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
pg_ts_parser_prsname_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
pg_amop_fam_strat_index: 3072 total in 2 blocks; 1384 free (2 chunks); 1688 used
pg_opclass_am_name_nsp_index: 3072 total in 2 blocks; 1576 free (3 chunks); 1496 used
pg_trigger_tgrelid_tgname_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used
pg_cast_source_target_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used
pg_auth_members_role_member_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
pg_attribute_relid_attnum_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used
pg_ts_config_cfgname_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
pg_authid_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
pg_ts_config_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
pg_conversion_default_index: 3072 total in 2 blocks; 1432 free (3 chunks); 1640 used
pg_language_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
pg_enum_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
pg_proc_proname_args_nsp_index: 3072 total in 2 blocks; 1576 free (3 chunks); 1496 used
pg_ts_parser_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
pg_database_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
pg_conversion_name_nsp_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
pg_class_relname_nsp_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used
pg_attribute_relid_attnam_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
pg_class_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
pg_amproc_fam_proc_index: 3072 total in 2 blocks; 1384 free (2 chunks); 1688 used
pg_operator_oprname_l_r_n_index: 3072 total in 2 blocks; 1384 free (2 chunks); 1688 used
pg_index_indexrelid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
pg_type_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
pg_rewrite_rel_rulename_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
pg_authid_rolname_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
pg_auth_members_member_role_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
pg_enum_typid_label_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
pg_constraint_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
pg_conversion_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
pg_ts_template_tmplname_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
pg_ts_config_map_index: 3072 total in 2 blocks; 1624 free (3 chunks); 1448 used
pg_namespace_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
pg_type_typname_nsp_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used
pg_operator_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
pg_amop_opr_fam_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used
pg_proc_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
pg_opfamily_am_name_nsp_index: 3072 total in 2 blocks; 1624 free (3 chunks); 1448 used
pg_ts_template_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
MdSmgr: 8192 total in 1 blocks; 7104 free (0 chunks); 1088 used
LOCALLOCK hash: 24576 total in 2 blocks; 15984 free (5 chunks); 8592 used
Timezones: 53584 total in 2 blocks; 3744 free (0 chunks); 49840 used
ErrorContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
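An aside on the spill-to-disk question above: in 8.3 a sort that exceeds work_mem does switch to an external (on-disk) merge rather than failing, and EXPLAIN ANALYZE reports which sort method was actually used. A hypothetical check (the reduced work_mem setting is just to force a spill for illustration):

```sql
-- Illustrative only: verify a large sort spills to disk instead of erroring.
SET work_mem = '1MB';
EXPLAIN ANALYZE SELECT id FROM lead ORDER BY id;
-- The Sort node's output should include a line of the form:
--   Sort Method:  external merge  Disk: ...kB
```

If the sort itself were the problem, that is the behavior you would expect to see; an out-of-memory error instead points at allocations outside the sort, which is where the rest of this thread goes.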
"Matt Magoffin" <postgresql.org@msqr.us> writes:
> I have a SELECT query that causes an out-of-memory error on my production
> Postgres 8.3 server.

8.3.which, and what exactly is the query and its EXPLAIN plan?

> I believe the culprit is from the sort in the query,

No, the problem seems to be here:

> ExecutorState: 841031232 total in 51159 blocks; 1712232 free (56
> chunks); 839319000 used

and it's impossible to guess what that's about without a lot more details than you provided. The sort's only eating 28MB:

> TupleSort: 28303408 total in 13 blocks; 5346944 free (14 chunks);
> 22956464 used

			regards, tom lane
Matt Magoffin wrote:
> I have a SELECT query that causes an out-of-memory error on my production
> Postgres 8.3 server. I believe the culprit is the sort in the query, but
> it's running out on queries that return 300,000 results, which I didn't
> think was very large.

It's not the sort. There's a memory leak in the system somewhere. Can you post a reproducible test case? (If you can do it in the next couple of days, there's a chance the fix will make it to 8.3.4.)

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
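In hindsight (given the xml_out leak identified later in this thread), a minimal test case could have looked something like the following sketch. The table name, payload, and row count here are invented for illustration; they are not from the thread:

```sql
-- Hypothetical repro sketch: return many rows of an xml column in one query.
CREATE TABLE leak_test (id bigserial PRIMARY KEY, doc xml);

INSERT INTO leak_test (doc)
SELECT xmlparse(document '<lead><id>' || g::text || '</id></lead>')
FROM generate_series(1, 300000) AS g;

-- On an unpatched 8.3.3 backend, the text output of each xml value was
-- retained in ExecutorState for the life of the query, so a SELECT like
-- this grows memory roughly in proportion to the result size:
SELECT id, doc FROM leak_test ORDER BY id;
```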
>> I have a SELECT query that causes an out-of-memory error on my
>> production Postgres 8.3 server.
>
> 8.3.which, and what exactly is the query and its EXPLAIN plan?
>
>> I believe the culprit is from the sort in the query,
>
> No, the problem seems to be here
>
>> ExecutorState: 841031232 total in 51159 blocks; 1712232 free (56
>> chunks); 839319000 used
>
> and it's impossible to guess what that's about without a lot more
> details than you provided.

I see, I was looking at the log entries wrong, sorry. This is Postgres 8.3.3 + an xml.c patch (from Kris - http://archives.postgresql.org/pgsql-general/2008-07/msg00110.php) to fix a memory leak there. The EXPLAIN for this query was this:

                                QUERY PLAN
--------------------------------------------------------------------------------
 Sort  (cost=238861.67..239149.85 rows=115274 width=70)
   Sort Key: id
   ->  Bitmap Heap Scan on lead  (cost=3828.21..229170.17 rows=115274 width=70)
         Recheck Cond: (((date(timezone('-06:00:00'::interval, created)) >= '2008-08-15'::date) AND (date(timezone('-06:00:00'::interval, created)) <= '2008-08-15'::date)) OR ((date(timezone('-06:00:00'::interval, modified)) >= '2008-08-15'::date) AND (date(timezone('-06:00:00'::interval, modified)) <= '2008-08-15'::date)))
         ->  BitmapOr  (cost=3828.21..3828.21 rows=115274 width=0)
               ->  Bitmap Index Scan on reporting_date_idx  (cost=0.00..4.91 rows=1 width=0)
                     Index Cond: ((date(timezone('-06:00:00'::interval, created)) >= '2008-08-15'::date) AND (date(timezone('-06:00:00'::interval, created)) <= '2008-08-15'::date))
               ->  Bitmap Index Scan on reporting_modified_idx  (cost=0.00..3765.67 rows=115274 width=0)
                     Index Cond: ((date(timezone('-06:00:00'::interval, modified)) >= '2008-08-15'::date) AND (date(timezone('-06:00:00'::interval, modified)) <= '2008-08-15'::date))
(9 rows)

This particular query fails with this out-of-memory error when the result size is about 300k or so. I'm also seeing this TopMemoryContext output in the logs from some other situations (sorry, I'm not familiar with what/how this gets logged), like this autovacuum task that appears to dump this into the log:

2008-07-17 00:19:26 CDT ERROR: canceling autovacuum task
2008-07-17 00:19:26 CDT CONTEXT: automatic vacuum of table "lms_infiniti._lms.sl_log_2"
TopMemoryContext: 129952 total in 15 blocks; 32088 free (139 chunks); 97864 used
LibxmlContext: 8380416 total in 10 blocks; 3123904 free (0 chunks); 5256512 used
TopTransactionContext: 8192 total in 1 blocks; 7648 free (0 chunks); 544 used
PL/PgSQL function context: 8192 total in 1 blocks; 4400 free (3 chunks); 3792 used
PL/PgSQL function context: 8192 total in 1 blocks; 5512 free (4 chunks); 2680 used
Type information cache: 24576 total in 2 blocks; 11888 free (5 chunks); 12688 used
Operator lookup cache: 24576 total in 2 blocks; 11888 free (5 chunks); 12688 used
Operator class cache: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
PL/PgSQL function context: 24576 total in 2 blocks; 21000 free (10 chunks); 3576 used
CFuncHash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
Rendezvous variable hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
PLpgSQL function cache: 24224 total in 2 blocks; 3744 free (0 chunks); 20480 used
Record information cache: 24576 total in 2 blocks; 15984 free (5 chunks); 8592 used
Prepared Queries: 24576 total in 2 blocks; 15984 free (5 chunks); 8592 used
MessageContext: 8192 total in 1 blocks; 6976 free (1 chunks); 1216 used
smgr relation table: 24576 total in 2 blocks; 9776 free (4 chunks); 14800 used
TransactionAbortContext: 32768 total in 1 blocks; 32736 free (0 chunks); 32 used
Portal hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
PortalMemory: 8192 total in 1 blocks; 7616 free (0 chunks); 576 used
PortalHeapMemory: 2048 total in 1 blocks; 360 free (0 chunks); 1688 used
ExecutorState: 1019641424 total in 92998 blocks; 6834024 free (66 chunks); 1012807400 used
TIDBitmap: 4186112 total in 9 blocks; 1774784 free (26 chunks); 2411328 used
TupleSort: 31449136 total in 13 blocks; 3451184 free (12 chunks); 27997952 used
ExprContext: 8192 total in 1 blocks; 8096 free (0 chunks); 96 used
ExprContext: 8192 total in 1 blocks; 8096 free (0 chunks); 96 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
PortalHeapMemory: 1024 total in 1 blocks; 944 free (0 chunks); 80 used
Relcache by OID: 24576 total in 2 blocks; 12832 free (3 chunks); 11744 used
CacheMemoryContext: 1341680 total in 21 blocks; 620264 free (1633 chunks); 721416 used
pg_toast_16580_index: 2048 total in 1 blocks; 608 free (0 chunks); 1440 used
CachedPlan: 15360 total in 4 blocks; 1528 free (0 chunks); 13832 used
CachedPlanSource: 15360 total in 4 blocks; 1440 free (0 chunks); 13920 used
reporting_modified_idx: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
reporting_last_processed_date_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
reporting_date_idx: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
lead_created_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
lead_processing_step_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
lead_destination_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
lead_modified_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
lead_source_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
processing_state_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
lead_pkey: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
unnamed prepared statement: 8192 total in 1 blocks; 3856 free (2 chunks); 4336 used
locks_pkey: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
CachedPlan: 3072 total in 2 blocks; 648 free (0 chunks); 2424 used
pg_language_oid_index: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
xslt_style_sheet_pkey: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
offline_report_key_key: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
offline_report_pkey: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
pg_type_typname_nsp_index: 2048 total in 1 blocks; 608 free (0 chunks); 1440 used
pg_statistic_relid_att_index: 2048 total in 1 blocks; 608 free (0 chunks); 1440 used
pg_amop_opr_fam_index: 2048 total in 1 blocks; 608 free (0 chunks); 1440 used
external_system_user_name_key: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
external_system_name_key: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
external_system_pkey: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
pg_index_indrelid_index: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
pg_cast_source_target_index: 2048 total in 1 blocks; 608 free (0 chunks); 1440 used
pg_operator_oprname_l_r_n_index: 2048 total in 1 blocks; 392 free (0 chunks); 1656 used
pg_class_relname_nsp_index: 2048 total in 1 blocks; 608 free (0 chunks); 1440 used
pg_type_oid_index: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
pg_proc_oid_index: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
pg_proc_proname_args_nsp_index: 2048 total in 1 blocks; 584 free (0 chunks); 1464 used
pg_namespace_oid_index: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
pg_namespace_nspname_index: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
CachedPlan: 3072 total in 2 blocks; 1160 free (0 chunks); 1912 used
CachedPlan: 1024 total in 1 blocks; 312 free (0 chunks); 712 used
CachedPlanSource: 1024 total in 1 blocks; 56 free (0 chunks); 968 used
SPI Plan: 1024 total in 1 blocks; 832 free (0 chunks); 192 used
CachedPlan: 1024 total in 1 blocks; 336 free (0 chunks); 688 used
CachedPlanSource: 1024 total in 1 blocks; 80 free (0 chunks); 944 used
SPI Plan: 1024 total in 1 blocks; 832 free (0 chunks); 192 used
CachedPlan: 1024 total in 1 blocks; 336 free (0 chunks); 688 used
CachedPlanSource: 1024 total in 1 blocks; 80 free (0 chunks); 944 used
SPI Plan: 1024 total in 1 blocks; 832 free (0 chunks); 192 used
CachedPlan: 1024 total in 1 blocks; 336 free (0 chunks); 688 used
CachedPlanSource: 1024 total in 1 blocks; 80 free (0 chunks); 944 used
SPI Plan: 1024 total in 1 blocks; 832 free (0 chunks); 192 used
CachedPlan: 1024 total in 1 blocks; 336 free (0 chunks); 688 used
CachedPlanSource: 1024 total in 1 blocks; 80 free (0 chunks); 944 used
SPI Plan: 1024 total in 1 blocks; 832 free (0 chunks); 192 used
CachedPlan: 1024 total in 1 blocks; 336 free (0 chunks); 688 used
CachedPlanSource: 1024 total in 1 blocks; 80 free (0 chunks); 944 used
SPI Plan: 1024 total in 1 blocks; 832 free (0 chunks); 192 used
CachedPlan: 3072 total in 2 blocks; 1984 free (0 chunks); 1088 used
CachedPlanSource: 3072 total in 2 blocks; 1792 free (1 chunks); 1280 used
SPI Plan: 1024 total in 1 blocks; 808 free (0 chunks); 216 used
CachedPlan: 3072 total in 2 blocks; 1984 free (0 chunks); 1088 used
CachedPlanSource: 3072 total in 2 blocks; 1792 free (1 chunks); 1280 used
SPI Plan: 1024 total in 1 blocks; 808 free (0 chunks); 216 used
CachedPlan: 3072 total in 2 blocks; 1984 free (0 chunks); 1088 used
CachedPlanSource: 3072 total in 2 blocks; 1792 free (1 chunks); 1280 used
SPI Plan: 1024 total in 1 blocks; 808 free (0 chunks); 216 used
CachedPlan: 3072 total in 2 blocks; 1984 free (0 chunks); 1088 used
CachedPlanSource: 3072 total in 2 blocks; 1792 free (1 chunks); 1280 used
SPI Plan: 1024 total in 1 blocks; 808 free (0 chunks); 216 used
CachedPlan: 1024 total in 1 blocks; 312 free (0 chunks); 712 used
CachedPlanSource: 1024 total in 1 blocks; 56 free (0 chunks); 968 used
SPI Plan: 1024 total in 1 blocks; 832 free (0 chunks); 192 used
CachedPlan: 1024 total in 1 blocks; 336 free (0 chunks); 688 used
CachedPlanSource: 1024 total in 1 blocks; 80 free (0 chunks); 944 used
SPI Plan: 1024 total in 1 blocks; 808 free (0 chunks); 216 used
CachedPlanSource: 1024 total in 1 blocks; 24 free (0 chunks); 1000 used
SPI Plan: 1024 total in 1 blocks; 808 free (0 chunks); 216 used
CachedPlan: 1024 total in 1 blocks; 816 free (0 chunks); 208 used
CachedPlanSource: 1024 total in 1 blocks; 664 free (1 chunks); 360 used
CachedPlan: 1024 total in 1 blocks; 312 free (0 chunks); 712 used
CachedPlanSource: 1024 total in 1 blocks; 56 free (0 chunks); 968 used
SPI Plan: 1024 total in 1 blocks; 832 free (0 chunks); 192 used
CachedPlanSource: 3072 total in 2 blocks; 1248 free (1 chunks); 1824 used
SPI Plan: 1024 total in 1 blocks; 808 free (0 chunks); 216 used
CachedPlan: 1024 total in 1 blocks; 816 free (0 chunks); 208 used
CachedPlanSource: 1024 total in 1 blocks; 664 free (1 chunks); 360 used
pg_opclass_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
pg_amop_fam_strat_index: 3072 total in 2 blocks; 1384 free (2 chunks); 1688 used
pg_trigger_tgrelid_tgname_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used
pg_attribute_relid_attnum_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used
pg_class_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
pg_amproc_fam_proc_index: 3072 total in 2 blocks; 1384 free (2 chunks); 1688 used
pg_index_indexrelid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
pg_rewrite_rel_rulename_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
pg_operator_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
MdSmgr: 8192 total in 1 blocks; 7296 free (54 chunks); 896 used
LOCALLOCK hash: 24576 total in 2 blocks; 15984 free (5 chunks); 8592 used
Timezones: 53584 total in 2 blocks; 3744 free (0 chunks); 49840 used
ErrorContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used

If there is any other information I can provide, please let me know.

-- m@
> It's not the sort. There's a memory leak in the system somewhere. Can
> you post a reproducible test case? (If you can do it in the next couple
> of days, there's a chance the fix will make it to 8.3.4).

I'm not sure how to go about creating a test case for this. I have had a nagging suspicion about a memory leak somewhere in this system (starting back with this thread: http://archives.postgresql.org/pgsql-general/2008-07/msg00048.php) but I really don't have any ideas on where to even begin with creating a test case. I did have a small test case with that thread (http://archives.postgresql.org/pgsql-general/2008-07/msg00063.php) but that was based on my guess about the XML stuff causing the leak. I have no real proof of this, however, nor do I have any idea of a way to narrow down the problem.

-- m@
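One generic way to narrow a per-row leak down, sketched here as a hypothetical bisection (not a step actually taken in the thread): grow the result set in stages while watching the backend process's memory from the operating system.

```sql
-- Find the backend PID, then watch it with top/ps while stepping LIMIT up;
-- memory growth that scales with rows returned suggests a per-row leak.
SELECT pg_backend_pid();
SELECT id, xml FROM lead ORDER BY id LIMIT 50000;
SELECT id, xml FROM lead ORDER BY id LIMIT 100000;
SELECT id, xml FROM lead ORDER BY id LIMIT 200000;
-- Repeating the steps without the xml column in the select list would
-- show whether the growth follows that one column.
```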
"Matt Magoffin" <postgresql.org@msqr.us> writes:
> I see, I was looking at the log entries wrong, sorry. This is Postgres
> 8.3.3 + an xml.c patch (from Kris -
> http://archives.postgresql.org/pgsql-general/2008-07/msg00110.php) to fix
> a memory leak there. The EXPLAIN for this query was this:

Is this particular query using xml features, and if so which? Actually, could you just show us the whole query and the schemas of the tables it's using?

> I'm also seeing this TopMemoryContext output in the logs from some other
> situations (sorry I'm not familiar with what/how this gets logged), like
> this autovacuum task that appears to dump this into the log:

The memory context dump happens when we're about to report an "out of memory" error, so actually the relevant ERROR line would come after the context dump, not before. I'm supposing it's another instance of the same problem, since the main bloat seems to be here:

> ExecutorState: 1019641424 total in 92998 blocks; 6834024 free (66
> chunks); 1012807400 used
> TIDBitmap: 4186112 total in 9 blocks; 1774784 free (26 chunks);
> 2411328 used
> TupleSort: 31449136 total in 13 blocks; 3451184 free (12 chunks);
> 27997952 used

Thinking about it, it's actually a bit hard to see how it could be an xml.c problem --- AFAIR that code should only be running in per-tuple contexts. A leak at the ExecutorState level might perhaps have something to do with misbehaving aggregates, or some such. Anyway we really need an example query to test in order to find it.

			regards, tom lane
> Is this particular query using xml features, and if so which? Actually,
> could you just show us the whole query and the schemas of the tables
> it's using?

No, other than returning a column of type xml. The query for the EXPLAIN output I posted before is this:

select lead.id as leadId,
    lead.xml as xml,
    lead.processing_state as processingState,
    lead.processing_step as processingStep,
    lead.processing_attempts as processingAttempts,
    lead.created as createdDate,
    lead.last_processed as lastProcessedDate
from lead lead
where (date(lead.created at time zone interval '-06')
        between date('2008-08-15') and date('2008-08-15'))
    or (date(lead.modified at time zone interval '-06')
        between date('2008-08-15') and date('2008-08-15'))
order by lead.id

And the schema of the lead table is this:

lms_nna=# \d lead
                   Table "public.lead"
       Column        |           Type           | Modifiers
---------------------+--------------------------+-----------
 id                  | bigint                   | not null
 xml                 | xml                      |
 source              | character varying(50)    | not null
 destination         | character varying(50)    |
 processing_state    | character varying(20)    | not null
 created             | timestamp with time zone | not null
 modified            | timestamp with time zone | not null
 last_processed      | timestamp with time zone |
 processing_step     | integer                  |
 processing_attempts | integer                  |
Indexes:
    "lead_pkey" PRIMARY KEY, btree (id), tablespace "lms_index"
    "lead_created_idx" btree (created), tablespace "lms_index"
    "lead_destination_idx" btree (destination), tablespace "lms_index"
    "lead_modified_idx" btree (modified), tablespace "lms_index"
    "lead_processing_step_idx" btree (processing_step), tablespace "lms_index"
    "lead_source_idx" btree (source), tablespace "lms_index"
    "processing_state_idx" btree (processing_state), tablespace "lms_index"
    "reporting_date_idx" btree (date(timezone('-06:00:00'::interval, created))), tablespace "lms_index"
    "reporting_last_processed_date_idx" btree (date(timezone('-06:00:00'::interval, last_processed))), tablespace "lms_index"
    "reporting_modified_idx" btree (date(timezone('-06:00:00'::interval, modified)))
Foreign-key constraints:
    "fk_lead_destination" FOREIGN KEY (destination) REFERENCES external_system(name) ON UPDATE RESTRICT ON DELETE RESTRICT
    "fk_lead_source" FOREIGN KEY (source) REFERENCES external_system(name) ON UPDATE RESTRICT ON DELETE RESTRICT
Triggers:
    _lms_logtrigger_11 AFTER INSERT OR DELETE OR UPDATE ON lead FOR EACH ROW EXECUTE PROCEDURE _lms.logtrigger('_lms', '11', 'kvvvvvvvvv')
    create_lead_reporting_data AFTER INSERT OR UPDATE ON lead FOR EACH ROW EXECUTE PROCEDURE update_lead_reporting_data()
    set_modified BEFORE INSERT OR UPDATE ON lead FOR EACH ROW EXECUTE PROCEDURE update_modified()

Would you also like the schemas for the referenced FOREIGN KEY tables? You can see we have some insert/update triggers there, which I can provide the source for if you need (update_lead_reporting_data() is quite large, though).

-- m@
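As a side note, the bitmap index scans in the earlier plan line up with these expression indexes because the planner matches the indexed expression exactly: AT TIME ZONE is shorthand for the timezone() function used in the index definitions. An illustrative pairing (the predicate below is invented, not from the thread):

```sql
-- The functional index, as defined in the schema above:
--   CREATE INDEX reporting_date_idx
--       ON lead (date(timezone('-06:00:00'::interval, created)));

-- "created AT TIME ZONE INTERVAL '-06'" parses to the same
-- timezone('-06:00:00'::interval, created) expression, so a predicate
-- written either way is able to use the index:
SELECT id FROM lead
WHERE date(created AT TIME ZONE INTERVAL '-06') = DATE '2008-08-15';
```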
"Matt Magoffin" <postgresql.org@msqr.us> writes:
>> Is this particular query using xml features, and if so which? Actually,
>> could you just show us the whole query and the schemas of the tables
>> it's using?

> No, other than returning a column of type xml.

Huh, I was overthinking the problem. xml_out has a memory leak :-(

More than one in fact: a large leak (equal to size of output) in the executor context (hence query lifespan) and a small one in LibxmlContext (hence transaction lifespan).

Try the patch here:
http://archives.postgresql.org/pgsql-committers/2008-09/msg00159.php

(In the long run it might be better if we didn't run datatype output functions in a query-lifespan context ...)

			regards, tom lane
> Huh, I was overthinking the problem. xml_out has a memory leak :-(
>
> More than one in fact: a large leak (equal to size of output) in
> the executor context (hence query lifespan) and a small one in
> LibxmlContext (hence transaction lifespan).
>
> Try the patch here:
> http://archives.postgresql.org/pgsql-committers/2008-09/msg00159.php

I've applied this patch now to our staging and production environments, and Postgres performed quite well after testing very large result sets that were previously failing with out-of-memory errors... so the patch is successful for the problem I was having. In fact, I didn't see any significant increase in memory over the course of the query execution.

Thanks!

-- m@
"Matt Magoffin" <postgresql.org@msqr.us> writes:
>> Try the patch here:
>> http://archives.postgresql.org/pgsql-committers/2008-09/msg00159.php

> I've applied this patch now to our staging and production environments,
> and Postgres performed quite well after testing very large result sets
> that were previously failing with out of memory errors... so the patch is
> successful for this problem I was having. I didn't see any significant
> increase in memory over the course of the query execution, in fact.

Good to hear. That patch is baked into 8.3.4, if you want to go back to using a stock release...

			regards, tom lane