Обсуждение: free(): invalid pointer , when to dumping data to a new server

Поиск
Список
Период
Сортировка

free(): invalid pointer , when to dumping data to a new server

От
Lars Aksel Opsahl
Дата:

Hi


I have problem with a new server that crashes when restoring data to it. It's around 1.5 TB of raw data with both spatial and raster data.


I use Postgres 12.4 and POSTGIS="3.0.2 2fb2a18" [EXTENSION] PGSQL="120" GEOS="3.8.1-CAPI-1.13.3" SFCGAL="1.3.7" PROJ="7.1.1" GDAL="GDAL 3.1.3, released 2020/09/01" LIBXML="2.9.7" LIBJSON="0.13.1" LIBPROTOBUF="1.3.0" WAGYU="0.4.3 (Internal)" TOPOLOGY RASTER

(I have also tested with postgis 3.1 and latest form master)


For Os I use CentOS Linux release 8.2.2004 (Core) and kernel 4.18.0-193.19.1.el8_2.x86_64 on server with 64 “Intel(R) Xeon(R) Gold 6246R CPU @ 3.40GHz”


First I have restored 3 schemas with no errors the logs

388G data/sl/schema_grl_met_wisline_v2

49G data/sl/schema_met_vaer_dyrkbar

118G data/sl/schema_met_vaer_wisline


using this commands :

pg_restore -j 50 --format=d -d sl data/sl/schema_grl_met_wisline_v2/ 2> log/sl/restor_grl_met_wisline_v2.log

pg_restore -j 50 --format=d -d sl data/sl/schema_met_vaer_wisline/ 2> log/sl/restor_met_vaer_wisline.log

pg_restore -j 50 --format=d -d sl data/sl/schema_met_vaer_dyrkbar/ 2> log/sl/restor_met_vaer_dyrkbar.log


Then i start to restore the rest of data below.

744G data/sl/schema_sl_rest_ofd


Here I split the restore job in 3 parts, like below .


Command 1 below went OK and no errors in the client log file or on the server side.

pg_restore --section=pre-data -1 --format=d -d sl data/sl/schema_sl_rest_ofd/ 2> log/sl/restore_sl_rest_ofdata_pre_data.log


Command 2 below went OK according to logs the client side but not according logs on the server side and the last table with one more than one billion rows are empty.

pg_restore --section=data -j 50 --format=d -d sl data/sl/schema_sl_rest_ofd/ 2> log/sl/restore_sl_rest_ofdata_data.log


I do not understand why I do not have any errors in client log file, because on the server side I this.

< [local] postgres, sl, 2020-10-22 00:52:47.823 CEST >LOG: duration: 3417759.209 ms statement: COPY grl_sk.dtm10_norge_retiled (rid, rast, filename) FROM stdin;

< [local] postgres, sl, 2020-10-22 00:56:05.771 CEST >LOG: duration: 3615706.705 ms statement: COPY sl_sos.lsk_arome_v1 (flateid, point_uid_ref, epoch_hours, tg_l03, tg_l04, tswi_l03, tswi_l04

, tg_l05, tswi_l05, pr, t2m, sfcwindl, snw, twpr) FROM stdin;

free(): invalid pointer

free(): invalid pointer

free(): invalid pointer

free(): invalid pointer

free(): invalid pointer

free(): invalid pointer

free(): invalid pointer

free(): invalid pointer

free(): invalid pointer

free(): invalid pointer

free(): invalid pointer

free(): invalid pointer

free(): invalid pointer

free(): invalid pointer

free(): invalid pointer

free(): invalid pointer

free(): invalid pointer

free(): invalid pointer

free(): invalid pointer

free(): invalid pointer

free(): invalid pointer

free(): invalid pointer

free(): invalid pointer

free(): invalid pointer

free(): invalid pointer

free(): invalid pointer

free(): invalid pointer

< , , 2020-10-22 00:56:05.830 CEST >LOG: server process (PID 516912) was terminated by signal 6: Aborted

< , , 2020-10-22 00:56:05.830 CEST >LOG: terminating any other active server processes

< , , 2020-10-22 00:56:05.830 CEST >WARNING: terminating connection because of crash of another server process

< , , 2020-10-22 00:56:05.830 CEST >DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

< , , 2020-10-22 00:56:05.830 CEST >HINT: In a moment you should be able to reconnect to the database and repeat your command.

< , , 2020-10-22 00:56:05.830 CEST >WARNING: terminating connection because of crash of another server process

< , , 2020-10-22 00:56:05.830 CEST >DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

< , , 2020-10-22 00:56:05.830 CEST >HINT: In a moment you should be able to reconnect to the database and repeat your command.

< , , 2020-10-22 00:56:05.830 CEST >WARNING: terminating connection because of crash of another server process

< , , 2020-10-22 00:56:05.830 CEST >DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

< , , 2020-10-22 00:56:05.830 CEST >HINT: In a moment you should be able to reconnect to the database and repeat your command.

< , , 2020-10-22 00:56:05.830 CEST >WARNING: terminating connection because of crash of another server process

< , , 2020-10-22 00:56:05.830 CEST >DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

< , , 2020-10-22 00:56:05.830 CEST >HINT: In a moment you should be able to reconnect to the database and repeat your command.

< , , 2020-10-22 00:56:10.892 CEST >LOG: all server processes terminated; reinitializing

< , , 2020-10-22 00:56:11.324 CEST >LOG: database system was interrupted; last known up at 2020-10-22 00:52:33 CEST

< , , 2020-10-22 00:56:11.480 CEST >LOG: database system was not properly shut down; automatic recovery in progress

< , , 2020-10-22 00:56:11.521 CEST >LOG: redo starts at B9C/EA83BBF0

< , , 2020-10-22 00:56:35.074 CEST >LOG: redo done at BA0/FDFFEF48

< , , 2020-10-22 00:56:51.250 CEST >LOG: database system is ready to accept connections


I don't care about the last table, we dont't need that data,  so I run the last command below.

pg_restore --section=post-data -j 1 --format=d -d sl data/sl/schema_sl_rest_ofd/ 2> log/sl/restore_sl_rest_ofdata_post_data.log


Then I see this in the logs so I to run the above command multiple times.


First time :

< [local] postgres, sl, 2020-10-22 03:46:27.349 CEST >LOG: duration: 1746.370 ms statement: CREATE INDEX o_8_dtm10_norge_retiled_st_convexhull_idx ON grl_sk.o_8_dtm10_norge_retiled USING gist

(public.st_convexhull(rast));

free(): invalid pointer

free(): invalid pointer

< , , 2020-10-22 03:46:30.283 CEST >LOG: background worker "parallel worker" (PID 658814) was terminated by signal 6: Aborted

< , , 2020-10-22 03:46:30.283 CEST >LOG: terminating any other active server processes

< [local] postgres, sl, 2020-10-22 03:46:30.283 CEST >WARNING: terminating connection because of crash of another server process

< [local] postgres, sl, 2020-10-22 03:46:30.283 CEST >DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

< [local] postgres, sl, 2020-10-22 03:46:30.283 CEST >HINT: In a moment you should be able to reconnect to the database and repeat your command.

< , , 2020-10-22 03:46:30.283 CEST >WARNING: terminating connection because of crash of another server process

< , , 2020-10-22 03:46:30.283 CEST >DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

< , , 2020-10-22 03:46:30.283 CEST >HINT: In a moment you should be able to reconnect to the database and repeat your command.

< , , 2020-10-22 03:46:30.599 CEST >LOG: all server processes terminated; reinitializing

< , , 2020-10-22 03:46:31.019 CEST >LOG: database system was interrupted; last known up at 2020-10-22 03:41:23 CEST

< , , 2020-10-22 03:46:31.178 CEST >LOG: database system was not properly shut down; automatic recovery in progress

< , , 2020-10-22 03:46:31.224 CEST >LOG: redo starts at BAA/B07178F8

< , , 2020-10-22 03:46:37.209 CEST >LOG: redo done at BAC/7F3DFD70


Second time :

< [local] postgres, sl, 2020-10-22 08:13:08.882 CEST >LOG: duration: 417316.937 ms statement: CREATE INDEX geoidx_org_forestmap____ostl3_predictions ON org_forestmap.ostl3_predictions USING gi

st (geo);

< [local] munin, template1, 2020-10-22 08:13:36.797 CEST >FATAL: role "munin" does not exist

< [local] munin, template1, 2020-10-22 08:13:36.835 CEST >FATAL: role "munin" does not exist

< [local] postgres, sl_help, 2020-10-22 08:13:39.106 CEST >FATAL: database "sl_help" does not exist

< [local] postgres, sl_help, 2020-10-22 08:13:39.183 CEST >FATAL: database "sl_help" does not exist

< [local] munin, template1, 2020-10-22 08:13:44.607 CEST >FATAL: role "munin" does not exist

< [local] munin, template1, 2020-10-22 08:13:44.684 CEST >FATAL: role "munin" does not exist

< [local] munin, template1, 2020-10-22 08:18:36.225 CEST >FATAL: role "munin" does not exist

< [local] munin, template1, 2020-10-22 08:18:36.266 CEST >FATAL: role "munin" does not exist

< [local] postgres, sl_help, 2020-10-22 08:18:38.611 CEST >FATAL: database "sl_help" does not exist

< [local] postgres, sl_help, 2020-10-22 08:18:38.691 CEST >FATAL: database "sl_help" does not exist

< [local] munin, template1, 2020-10-22 08:18:44.234 CEST >FATAL: role "munin" does not exist

< [local] munin, template1, 2020-10-22 08:18:44.314 CEST >FATAL: role "munin" does not exist

free(): invalid pointer

free(): invalid pointer

< , , 2020-10-22 08:19:58.932 CEST >LOG: background worker "parallel worker" (PID 835249) was terminated by signal 6: Aborted

< , , 2020-10-22 08:19:58.932 CEST >LOG: terminating any other active server processes

< [local] postgres, sl, 2020-10-22 08:19:58.932 CEST >WARNING: terminating connection because of crash of another server process

< [local] postgres, sl, 2020-10-22 08:19:58.932 CEST >DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

< [local] postgres, sl, 2020-10-22 08:19:58.932 CEST >HINT: In a moment you should be able to reconnect to the database and repeat your command.

< , , 2020-10-22 08:19:58.932 CEST >WARNING: terminating connection because of crash of another server process

< , , 2020-10-22 08:19:58.932 CEST >DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

< , , 2020-10-22 08:19:58.932 CEST >HINT: In a moment you should be able to reconnect to the database and repeat your command.

< 192.168.229.48 postgres, sl, 2020-10-22 08:19:58.932 CEST >WARNING: terminating connection because of crash of another server process

< 192.168.229.48 postgres, sl, 2020-10-22 08:19:58.932 CEST >DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

< 192.168.229.48 postgres, sl, 2020-10-22 08:19:58.932 CEST >HINT: In a moment you should be able to reconnect to the database and repeat your command.

< 192.168.229.48 postgres, sl, 2020-10-22 08:19:58.970 CEST >FATAL: the database system is in recovery mode

< , , 2020-10-22 08:19:59.041 CEST >LOG: all server processes terminated; reinitializing

< , , 2020-10-22 08:19:59.476 CEST >LOG: database system was interrupted; last known up at 2020-10-22 08:11:17 CEST

< , , 2020-10-22 08:20:01.901 CEST >LOG: database system was not properly shut down; automatic recovery in progress

< , , 2020-10-22 08:20:02.005 CEST >LOG: redo starts at BB1/AA4309A0

< , , 2020-10-22 08:20:03.517 CEST >LOG: invalid record length at BB1/FBF33FB0: wanted 24, got 0

< , , 2020-10-22 08:20:03.517 CEST >LOG: redo done at BB1/FBF33F78

< , , 2020-10-22 08:20:05.573 CEST >LOG: database system is ready to accept connections


Any idea about why this happens and what I can do to fix it ?


Do you believe Postgres 13 will work better ?


Thanks

Lars

Re: free(): invalid pointer , when to dumping data to a new server

От
Lars Aksel Opsahl
Дата:



From: Lars Aksel Opsahl <Lars.Opsahl@nibio.no>
Sent: Thursday, October 22, 2020 9:32 AM
To: pgsql-admin@lists.postgresql.org <pgsql-admin@lists.postgresql.org>
Subject: free(): invalid pointer , when to dumping data to a new server
 
>
>
>From: Lars Aksel Opsahl <Lars.Opsahl@nibio.no>Sent: Thursday, October 22, 2020 9:32 AMTo: pgsql-admin@lists.postgresql.org <pgsql-admin@lists.postgresql.org>Subject: free(): invalid pointer , when to dumping data to a new server
>
>
>Hi
>
>
>I have problem with a new server that crashes when restoring data to it. It's around 1.5 TB of raw data with both spatial and raster data.
>
>
>I use Postgres 12.4 and POSTGIS="3.0.2 2fb2a18" [EXTENSION] PGSQL="120" GEOS="3.8.1-CAPI-1.13.3" SFCGAL="1.3.7" PROJ="7.1.1" GDAL="GDAL 3.1.3, released 2020/09/01" LIBXML="2.9.7" LIBJSON="0.13.1" LIBPROTOBUF="1.3.0" WAGYU="0.4.3 (Internal)" TOPOLOGY RASTER
>
>(I have also tested with postgis 3.1 and latest form master)
>
>
>For Os I use CentOS Linux release 8.2.2004 (Core) and kernel 4.18.0-193.19.1.el8_2.x86_64 on server with 64 “Intel(R) Xeon(R) Gold 6246R CPU @ 3.40GHz”
>
>
>First I have restored 3 schemas with no errors the logs
>
>388G data/sl/schema_grl_met_wisline_v2
>
>49G data/sl/schema_met_vaer_dyrkbar
>
>118G data/sl/schema_met_vaer_wisline
>
>
>using this commands :
>
>pg_restore -j 50 --format=d -d sl data/sl/schema_grl_met_wisline_v2/ 2> log/sl/restor_grl_met_wisline_v2.log
>
>pg_restore -j 50 --format=d -d sl data/sl/schema_met_vaer_wisline/ 2> log/sl/restor_met_vaer_wisline.log
>
>pg_restore -j 50 --format=d -d sl data/sl/schema_met_vaer_dyrkbar/ 2> log/sl/restor_met_vaer_dyrkbar.log
>
>
>Then i start to restore the rest of data below.
>
>744G data/sl/schema_sl_rest_ofd
>
>
>Here I split the restore job in 3 parts, like below .
>
>
>Command 1 below went OK and no errors in the client log file or on the server side.
>
>pg_restore --section=pre-data -1 --format=d -d sl data/sl/schema_sl_rest_ofd/ 2> log/sl/restore_sl_rest_ofdata_pre_data.log
>
>
>Command 2 below went OK according to logs the client side but not according logs on the server side and the last table with one more than one billion rows are empty.
>
>pg_restore --section=data -j 50 --format=d -d sl data/sl/schema_sl_rest_ofd/ 2> log/sl/restore_sl_rest_ofdata_data.log
>
>
>I do not understand why I do not have any errors in client log file, because on the server side I this.
>
>< [local] postgres, sl, 2020-10-22 00:52:47.823 CEST >LOG: duration: 3417759.209 ms statement: COPY grl_sk.dtm10_norge_retiled (rid, rast, filename) FROM stdin;
>
>< [local] postgres, sl, 2020-10-22 00:56:05.771 CEST >LOG: duration: 3615706.705 ms statement: COPY sl_sos.lsk_arome_v1 (flateid, point_uid_ref, epoch_hours, tg_l03, tg_l04, tswi_l03, tswi_l04
>
>, tg_l05, tswi_l05, pr, t2m, sfcwindl, snw, twpr) FROM stdin;
>
>free(): invalid pointer
>
>free(): invalid pointer
>
>free(): invalid pointer
>
>free(): invalid pointer
>
>free(): invalid pointer
>
>free(): invalid pointer
>
>free(): invalid pointer
>
>free(): invalid pointer
>
>free(): invalid pointer
>
>free(): invalid pointer
>
>free(): invalid pointer
>
>free(): invalid pointer
>
>free(): invalid pointer
>
>free(): invalid pointer
>
>free(): invalid pointer
>
>free(): invalid pointer
>
>free(): invalid pointer
>
>free(): invalid pointer
>
>free(): invalid pointer
>
>free(): invalid pointer
>
>free(): invalid pointer
>
>free(): invalid pointer
>
>free(): invalid pointer
>
>free(): invalid pointer
>
>free(): invalid pointer
>
>free(): invalid pointer
>
>free(): invalid pointer
>
>< , , 2020-10-22 00:56:05.830 CEST >LOG: server process (PID 516912) was terminated by signal 6: Aborted
>
>< , , 2020-10-22 00:56:05.830 CEST >LOG: terminating any other active server processes
>
>< , , 2020-10-22 00:56:05.830 CEST >WARNING: terminating connection because of crash of another server process
>
>< , , 2020-10-22 00:56:05.830 CEST >DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
>
>< , , 2020-10-22 00:56:05.830 CEST >HINT: In a moment you should be able to reconnect to the database and repeat your command.
>
>< , , 2020-10-22 00:56:05.830 CEST >WARNING: terminating connection because of crash of another server process
>
>< , , 2020-10-22 00:56:05.830 CEST >DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
>
>< , , 2020-10-22 00:56:05.830 CEST >HINT: In a moment you should be able to reconnect to the database and repeat your command.
>
>< , , 2020-10-22 00:56:05.830 CEST >WARNING: terminating connection because of crash of another server process
>
>< , , 2020-10-22 00:56:05.830 CEST >DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
>
>< , , 2020-10-22 00:56:05.830 CEST >HINT: In a moment you should be able to reconnect to the database and repeat your command.
>
>< , , 2020-10-22 00:56:05.830 CEST >WARNING: terminating connection because of crash of another server process
>
>< , , 2020-10-22 00:56:05.830 CEST >DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
>
>< , , 2020-10-22 00:56:05.830 CEST >HINT: In a moment you should be able to reconnect to the database and repeat your command.
>
>< , , 2020-10-22 00:56:10.892 CEST >LOG: all server processes terminated; reinitializing
>
>< , , 2020-10-22 00:56:11.324 CEST >LOG: database system was interrupted; last known up at 2020-10-22 00:52:33 CEST
>
>< , , 2020-10-22 00:56:11.480 CEST >LOG: database system was not properly shut down; automatic recovery in progress
>
>< , , 2020-10-22 00:56:11.521 CEST >LOG: redo starts at B9C/EA83BBF0
>
>< , , 2020-10-22 00:56:35.074 CEST >LOG: redo done at BA0/FDFFEF48
>
>< , , 2020-10-22 00:56:51.250 CEST >LOG: database system is ready to accept connections
>
>
>I don't care about the last table, we dont't need that data,  so I run the last command below.
>
>pg_restore --section=post-data -j 1 --format=d -d sl data/sl/schema_sl_rest_ofd/ 2> log/sl/restore_sl_rest_ofdata_post_data.log
>
>
>Then I see this in the logs so I to run the above command multiple times.
>
>
>First time :
>
>< [local] postgres, sl, 2020-10-22 03:46:27.349 CEST >LOG: duration: 1746.370 ms statement: CREATE INDEX o_8_dtm10_norge_retiled_st_convexhull_idx ON grl_sk.o_8_dtm10_norge_retiled USING gist
>
>(public.st_convexhull(rast));
>
>free(): invalid pointer
>
>free(): invalid pointer
>
>< , , 2020-10-22 03:46:30.283 CEST >LOG: background worker "parallel worker" (PID 658814) was terminated by signal 6: Aborted
>
>< , , 2020-10-22 03:46:30.283 CEST >LOG: terminating any other active server processes
>
>< [local] postgres, sl, 2020-10-22 03:46:30.283 CEST >WARNING: terminating connection because of crash of another server process
>
>< [local] postgres, sl, 2020-10-22 03:46:30.283 CEST >DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
>
>< [local] postgres, sl, 2020-10-22 03:46:30.283 CEST >HINT: In a moment you should be able to reconnect to the database and repeat your command.
>
>< , , 2020-10-22 03:46:30.283 CEST >WARNING: terminating connection because of crash of another server process
>
>< , , 2020-10-22 03:46:30.283 CEST >DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
>
>< , , 2020-10-22 03:46:30.283 CEST >HINT: In a moment you should be able to reconnect to the database and repeat your command.
>
>< , , 2020-10-22 03:46:30.599 CEST >LOG: all server processes terminated; reinitializing
>
>< , , 2020-10-22 03:46:31.019 CEST >LOG: database system was interrupted; last known up at 2020-10-22 03:41:23 CEST
>
>< , , 2020-10-22 03:46:31.178 CEST >LOG: database system was not properly shut down; automatic recovery in progress
>
>< , , 2020-10-22 03:46:31.224 CEST >LOG: redo starts at BAA/B07178F8
>
>< , , 2020-10-22 03:46:37.209 CEST >LOG: redo done at BAC/7F3DFD70
>
>
>Second time :
>
>< [local] postgres, sl, 2020-10-22 08:13:08.882 CEST >LOG: duration: 417316.937 ms statement: CREATE INDEX geoidx_org_forestmap____ostl3_predictions ON org_forestmap.ostl3_predictions USING gi
>
>st (geo);
>
>< [local] munin, template1, 2020-10-22 08:13:36.797 CEST >FATAL: role "munin" does not exist
>
>< [local] munin, template1, 2020-10-22 08:13:36.835 CEST >FATAL: role "munin" does not exist
>
>< [local] postgres, sl_help, 2020-10-22 08:13:39.106 CEST >FATAL: database "sl_help" does not exist
>
>< [local] postgres, sl_help, 2020-10-22 08:13:39.183 CEST >FATAL: database "sl_help" does not exist
>
>< [local] munin, template1, 2020-10-22 08:13:44.607 CEST >FATAL: role "munin" does not exist
>
>< [local] munin, template1, 2020-10-22 08:13:44.684 CEST >FATAL: role "munin" does not exist
>
>< [local] munin, template1, 2020-10-22 08:18:36.225 CEST >FATAL: role "munin" does not exist
>
>< [local] munin, template1, 2020-10-22 08:18:36.266 CEST >FATAL: role "munin" does not exist
>
>< [local] postgres, sl_help, 2020-10-22 08:18:38.611 CEST >FATAL: database "sl_help" does not exist
>
>< [local] postgres, sl_help, 2020-10-22 08:18:38.691 CEST >FATAL: database "sl_help" does not exist
>
>< [local] munin, template1, 2020-10-22 08:18:44.234 CEST >FATAL: role "munin" does not exist
>
>< [local] munin, template1, 2020-10-22 08:18:44.314 CEST >FATAL: role "munin" does not exist
>
>free(): invalid pointer
>
>free(): invalid pointer
>
>< , , 2020-10-22 08:19:58.932 CEST >LOG: background worker "parallel worker" (PID 835249) was terminated by signal 6: Aborted
>
>< , , 2020-10-22 08:19:58.932 CEST >LOG: terminating any other active server processes
>
>< [local] postgres, sl, 2020-10-22 08:19:58.932 CEST >WARNING: terminating connection because of crash of another server process
>
>< [local] postgres, sl, 2020-10-22 08:19:58.932 CEST >DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
>
>< [local] postgres, sl, 2020-10-22 08:19:58.932 CEST >HINT: In a moment you should be able to reconnect to the database and repeat your command.
>
>< , , 2020-10-22 08:19:58.932 CEST >WARNING: terminating connection because of crash of another server process
>
>< , , 2020-10-22 08:19:58.932 CEST >DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
>
>< , , 2020-10-22 08:19:58.932 CEST >HINT: In a moment you should be able to reconnect to the database and repeat your command.
>
>< 192.168.229.48 postgres, sl, 2020-10-22 08:19:58.932 CEST >WARNING: terminating connection because of crash of another server process
>
>< 192.168.229.48 postgres, sl, 2020-10-22 08:19:58.932 CEST >DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
>
>< 192.168.229.48 postgres, sl, 2020-10-22 08:19:58.932 CEST >HINT: In a moment you should be able to reconnect to the database and repeat your command.
>
>< 192.168.229.48 postgres, sl, 2020-10-22 08:19:58.970 CEST >FATAL: the database system is in recovery mode
>
>< , , 2020-10-22 08:19:59.041 CEST >LOG: all server processes terminated; reinitializing
>
>< , , 2020-10-22 08:19:59.476 CEST >LOG: database system was interrupted; last known up at 2020-10-22 08:11:17 CEST
>
>< , , 2020-10-22 08:20:01.901 CEST >LOG: database system was not properly shut down; automatic recovery in progress
>
>< , , 2020-10-22 08:20:02.005 CEST >LOG: redo starts at BB1/AA4309A0
>
>< , , 2020-10-22 08:20:03.517 CEST >LOG: invalid record length at BB1/FBF33FB0: wanted 24, got 0
>
>< , , 2020-10-22 08:20:03.517 CEST >LOG: redo done at BB1/FBF33F78
>
>< , , 2020-10-22 08:20:05.573 CEST >LOG: database system is ready to accept connections
>
>
>Any idea about why this happens and what I can do to fix it ?
>
>
>Do you believe Postgres 13 will work better ?
>
>Thanks
>
>Lars

I did a test with Postgres 13 and right after running this cmd "pg_restore  --section=pre-data -1 --format=d -d sl data/sl/schema_sl_rest_ofd" where the rest of tables and schemes was created with no errors  and before I was running this cmd  "pg_restore  --section=data -j 50 --format=d -d sl data/sl/schema_sl_rest_ofd" I found errors in the server log.

(The table grl_met_wisline_v2.nora_bc25_observation_200912 was with no errors two hours before.
"LOG:  duration: 589215.382 ms  statement: COPY grl_met_wisline_v2.nora_bc25_observation_200912 (point_uid_ref, epoch_hours, tg_l03, tg_l04, tswi_l03, tswi_l04, tg_l05, tswi_l05, pr, t2m, sfcwindl, snw, twpr) FROM stdin")

I the server log I found this

<  , , 2020-10-22 14:23:44.167 CEST >LOG:  server process (PID 1085979) was terminated by signal 6: Aborted
<  , , 2020-10-22 14:23:44.167 CEST >LOG:  terminating any other active server processes
<  , , 2020-10-22 14:23:44.167 CEST >WARNING:  terminating connection because of crash of another server process
<  , , 2020-10-22 14:23:44.167 CEST >DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
<  , , 2020-10-22 14:23:44.167 CEST >HINT:  In a moment you should be able to reconnect to the database and repeat your command.
<  , , 2020-10-22 14:23:44.167 CEST >CONTEXT:  while scanning block 393158 of relation "grl_met_wisline_v2.nora_bc25_observation_201001"
<  , , 2020-10-22 14:23:44.167 CEST >WARNING:  terminating connection because of crash of another server process
<  , , 2020-10-22 14:23:44.167 CEST >DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
<  , , 2020-10-22 14:23:44.167 CEST >HINT:  In a moment you should be able to reconnect to the database and repeat your command.
<  , , 2020-10-22 14:23:44.167 CEST >CONTEXT:  while scanning block 59409 of relation "grl_met_wisline_v2.nora_bc25_observation_200912"
<  , , 2020-10-22 14:23:44.167 CEST >WARNING:  terminating connection because of crash of another server process
<  , , 2020-10-22 14:23:44.167 CEST >DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
<  , , 2020-10-22 14:23:44.167 CEST >HINT:  In a moment you should be able to reconnect to the database and repeat your command.
<  , , 2020-10-22 14:23:44.167 CEST >CONTEXT:  while scanning block 51913 of relation "grl_met_wisline_v2.nora_bc25_observation_200910"
<  , , 2020-10-22 14:23:44.167 CEST >WARNING:  terminating connection because of crash of another server process
<  , , 2020-10-22 14:23:44.167 CEST >DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
<  , , 2020-10-22 14:23:44.167 CEST >HINT:  In a moment you should be able to reconnect to the database and repeat your command.
<  , , 2020-10-22 14:23:44.185 CEST >LOG:  all server processes terminated; reinitializing
<  , , 2020-10-22 14:23:44.616 CEST >LOG:  database system was interrupted; last known up at 2020-10-22 14:23:21 CEST
<  , , 2020-10-22 14:23:44.782 CEST >LOG:  database system was not properly shut down; automatic recovery in progress
<  , , 2020-10-22 14:23:44.823 CEST >LOG:  redo starts at 146/A2485468
<  , , 2020-10-22 14:23:46.035 CEST >LOG:  redo done at 146/A83A7F98
<  , , 2020-10-22 14:23:47.043 CEST >LOG:  database system is ready to accept connections

Lars

The


Re: free(): invalid pointer , when to dumping data to a new server

От
Lars Aksel Opsahl
Дата:




From: Lars Aksel Opsahl <Lars.Opsahl@nibio.no>
Sent: Thursday, October 22, 2020 9:32 AM
To: pgsql-admin@lists.postgresql.org <pgsql-admin@lists.postgresql.org>
Subject: free(): invalid pointer , when to dumping data to a new server
 
<>Hi
<>
<>
<>I have problem with a new server that crashes when restoring data to it. It's around 1.5 TB of raw data with both spatial and raster data.
<>

I have done some more testing and postgres also crashes right after  'pg_restore  --section=pre-data -1' is done also.  Here I also see the 'free(): invalid pointer' in the postgres log.

This happens on with Postgres 12 and Postgres 13. and any version of postgis I have tested with.

The pre-data input is little bit more than 200.000 lines with more than 6000 CREATE statements, if I test with the first 100 lines it does not crash. It seems like no data are lost here because the server does recovery at startup.

I tested on another with CentOS and the same software and it worked ok

The server diagnostic does not indicate any memory errors.

Dnf search does not give mee any debuginfo file for postgres
    • yum search debuginfo|grep postgres
      Last metadata expiration check: 0:17:59 ago on Mon 26 Oct 2020 10:53:03 AM CET.
      postgresql12-odbc-debuginfo.x86_64 : Debug information for package postgresql12-odbc


From gdb I get this

gdb /usr/pgsql-12/bin/postmaster /var/lib/systemd/coredump/core.postmaster.26.6f7d98107f544085a41c6f8b876418d0.3832.1603701965000000
GNU gdb (GDB) Red Hat Enterprise Linux 8.2-11.el8
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/pgsql-12/bin/postmaster...Reading symbols from .gnu_debugdata for /usr/pgsql-12/bin/postgres...(no debugging symbols found)...done.
(no debugging symbols found)...done.
BFD: warning: /var/lib/systemd/coredump/core.postmaster.26.6f7d98107f544085a41c6f8b876418d0.3832.1603701965000000 is truncated: expected core file size >= 2195427328, found: 2147483648

warning: core file may not match specified executable file.
[New LWP 3832]
Cannot access memory at address 0x7fdbd7d42128
Cannot access memory at address 0x7fdbd7d42120
Failed to read a valid object file image from memory.
Core was generated by `postgres: postgres sl [local] idle                     '.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007fdbd508b70f in ?? ()
Missing separate debuginfos, use: yum debuginfo-install postgresql12-server-12.4-1PGDG.rhel8.x86_64
(gdb) bt
#0  0x00007fdbd508b70f in ?? ()
Backtrace stopped: Cannot access memory at address 0x7ffd0e7285c0
(gdb) quit

Any ideas ?

Lars

Re: free(): invalid pointer , when to dumping data to a new server

От
Laurenz Albe
Дата:
On Mon, 2020-10-26 at 10:22 +0000, Lars Aksel Opsahl wrote:
> I have done some more testing and postgres also crashes right after  'pg_restore  --section=pre-data -1'
>  is done also.  Here I also see the 'free(): invalid pointer' in the postgres log.
> 
> This happens on with Postgres 12 and Postgres 13. and any version of postgis I have tested with. 
> 
> The pre-data input is little bit more than 200.000 lines with more than 6000 CREATE statements,
>  if I test with the first 100 lines it does not crash. It seems like no data are lost here because
>  the server does recovery at startup.
> 
> I tested on another with CentOS and the same software and it worked ok
> 
> The server diagnostic does not indicate any memory errors.
> 
> Dnf search does not give mee any debuginfo file for postgres

You'll have to edit "/etc/yum.repos.d/pgdg-fedora-all.repo" and
set the "PostgreSQL 13 for Fedora $releasever - $basearch - Debuginfo"
repository to "enabled=1".

Then you can install the debuginfo packages and generate a stack trace.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: free(): invalid pointer , when to dumping data to a new server

От
Lars Aksel Opsahl
Дата:

>
>From: Laurenz Albe <laurenz.albe@cybertec.at>Sent: Tuesday, October 27, 2020 3:10 PMTo: Lars Aksel Opsahl <Lars.Opsahl@nibio.no>; pgsql-admin@lists.postgresql.org <pgsql-admin@lists.postgresql.org>Subject: Re: free(): invalid pointer , when to dumping data to a new server
>
.
.
>> Dnf search does not give mee any debuginfo file for postgres
>
>You'll have to edit "/etc/yum.repos.d/pgdg-fedora-all.repo" and
>set the "PostgreSQL 13 for Fedora $releasever - $basearch - Debuginfo"
>repository to "enabled=1".
>
>Then you can install the debuginfo packages and generate a stack trace.
>
>Yours,
>Laurenz Albe
>--
>Cybertec | https://www.cybertec-postgresql.com
>

HI

Thanks that fixed the debug install yes.

Here is the new stackstrace not too much info that helps mee thou.

gdb /usr/pgsql-12/bin/postmaster /var/lib/systemd/coredump/core.postmaster.26.6f7d98107f544085a41c6f8b876418d0.298120.1603810048000000
GNU gdb (GDB) Red Hat Enterprise Linux 8.2-11.el8
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/pgsql-12/bin/postmaster...Reading symbols from /usr/lib/debug/usr/pgsql-12/bin/postgres-12.4-1PGDG.rhel8.x86_64.debug...done.
done.
BFD: warning: /var/lib/systemd/coredump/core.postmaster.26.6f7d98107f544085a41c6f8b876418d0.298120.1603810048000000 is truncated: expected core file size >= 2326896640, found: 2147483648

warning: core file may not match specified executable file.
[New LWP 298120]
Cannot access memory at address 0x7f79c5966128
Cannot access memory at address 0x7f79c5966120
Failed to read a valid object file image from memory.
Core was generated by `postgres: postgres test1 [local] idle                  '.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f79c2caf70f in ?? ()
(gdb) bt
#0  0x00007f79c2caf70f in ?? ()
Backtrace stopped: Cannot access memory at address 0x7ffc037c6ae0
(gdb)

Lars

Re: free(): invalid pointer , when to dumping data to a new server

От
Alvaro Herrera
Дата:
On 2020-Oct-27, Lars Aksel Opsahl wrote:

> Type "apropos word" to search for commands related to "word"...
> Reading symbols from /usr/pgsql-12/bin/postmaster...Reading symbols from
/usr/lib/debug/usr/pgsql-12/bin/postgres-12.4-1PGDG.rhel8.x86_64.debug...done.
> done.
> BFD: warning: /var/lib/systemd/coredump/core.postmaster.26.6f7d98107f544085a41c6f8b876418d0.298120.1603810048000000
istruncated: expected core file size >= 2326896640, found: 2147483648
 
> 
> warning: core file may not match specified executable file.

Uh :-(  The core file would have been too large, so the system wrote a
truncated copy, with the result that GDB cannot interpret it correctly.

I don't know if you can raise the 2TB limit on core files, but one thing
you can do is change the "core filter" so that it doesn't dump shared
memory, which is often where the bloat comes from.  This will result in
a much smaller core file, but still useful to find the problem.  You can
do this:

echo 0x01 > /proc/$pid/coredump_filter
where the PID can be the postmaster's, before running pg_dump.



Re: free(): invalid pointer , when to dumping data to a new server

От
Tom Lane
Дата:
Lars Aksel Opsahl <Lars.Opsahl@nibio.no> writes:
> Here is the new stackstrace not too much info that helps mee thou.

This is probably the reason why "bt" is producing garbage:

> BFD: warning: /var/lib/systemd/coredump/core.postmaster.26.6f7d98107f544085a41c6f8b876418d0.298120.1603810048000000
istruncated: expected core file size >= 2326896640, found: 2147483648 

So the core file is being truncated at 2GB, but the process image was a
little bit larger than that.  You need to get a complete core file.

First thing I'd check is the "ulimit -c" setting that the postmaster is
being started under.  I see that systemd is capturing the core dump,
so it could also be that there's some systemd configuration you need
to adjust ("man systemd-coredump" might help here).

            regards, tom lane



Re: free(): invalid pointer , when to dumping data to a new server

От
Lars Aksel Opsahl
Дата:


>
>
>From: Tom Lane <tgl@sss.pgh.pa.us>Sent: Tuesday, October 27, 2020 5:04 PMTo: Lars Aksel Opsahl <Lars.Opsahl@nibio.no>Cc: Laurenz Albe <laurenz.albe@cybertec.at>; pgsql-admin@lists.postgresql.org <pgsql-admin@lists.postgresql.org>Subject: Re: free(): invalid pointer , when to dumping data to a new server
>
>Lars Aksel Opsahl <Lars.Opsahl@nibio.no> writes:
>> Here is the new stackstrace not too much info that helps mee thou.
>
>This is probably the reason why "bt" is producing garbage:
>
>> BFD: warning: /var/lib/systemd/coredump/core.postmaster.26.6f7d98107f544085a41c6f8b876418d0.298120.1603810048000000 is truncated: expected core file size >= 2326896640, found: 2147483648
>
>So the core file is being truncated at 2GB, but the process image was a
>little bit larger than that.  You need to get a complete core file.
>
>First thing I'd check is the "ulimit -c" setting that the postmaster is
>being started under.  I see that systemd is capturing the core dump,
>so it could also be that there's some systemd configuration you need
>to adjust ("man systemd-coredump" might help here).
>
>                        regards, tom lane
>
>

Hi

Thanks that fixed core dump output . The ulimit -c was unlimited.

We now have a stacktrace. Right now I test with Postgis 2.5, but it also failed with Postgis 3 also, but I may go back to Postgis 3 because that's what we plan to use in production on this server and then I can remove proj-* .

Here is the output from bt

Core was generated by `postgres: postgres test1 [local] idle                  '.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007fa9c241e70f in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: yum debuginfo-install CGAL-4.14-1.rhel8.x86_64 SuperLU-5.2.0-7.el8.x86_64 armadillo-9.700.2-1.el8.x86_64 arpack-3.7.0-1.el8.x86_64 atlas-3.10.3-7.el8.x86_64 audit-libs-3.0-0.17.20191104git1c2f876.el8.x86_64 blas-3.8.0-8.el8.x86_64 boost-atomic-1.66.0-7.el8.x86_64 boost-chrono-1.66.0-7.el8.x86_64 boost-date-time-1.66.0-7.el8.x86_64 boost-serialization-1.66.0-7.el8.x86_64 boost-system-1.66.0-7.el8.x86_64 boost-thread-1.66.0-7.el8.x86_64 brotli-1.0.6-1.el8.x86_64 bzip2-libs-1.0.6-26.el8.x86_64 cfitsio-3.47-1.el8.x86_64 cyrus-sasl-lib-2.1.27-1.el8.x86_64 expat-2.2.5-3.el8.x86_64 fontconfig-2.13.1-3.el8.x86_64 freetype-2.9.1-4.el8.x86_64 freexl-1.0.6-1.el8.x86_64 gdal31-libs-3.1.3-2.rhel8.x86_64 geos38-3.8.1-2.rhel8.x86_64 giflib-5.1.4-3.el8.x86_64 glibc-2.28-101.el8.x86_64 gmp-6.1.2-10.el8.x86_64 hdf5-1.10.5-4.el8.x86_64 json-c-0.13.1-0.2.el8.x86_64 keyutils-libs-1.5.10-6.el8.x86_64 krb5-libs-1.17-18.el8.x86_64 lapack-3.8.0-8.el8.x86_64 libblkid-2.32.1-22.el8.x86_64 libcap-2.26-3.el8.x86_64 libcap-ng-0.7.9-5.el8.x86_64 libcom_err-1.45.4-3.el8.x86_64 libcurl-7.61.1-12.el8.x86_64 libgcc-8.3.1-5.el8.0.2.x86_64 libgcrypt-1.8.3-4.el8.x86_64 libgfortran-8.3.1-5.el8.0.2.x86_64 libgomp-8.3.1-5.el8.0.2.x86_64 libgpg-error-1.31-1.el8.x86_64 libgta-1.2.1-1.el8.x86_64 libicu-60.3-2.el8_1.x86_64 libidn2-2.2.0-1.el8.x86_64 libjpeg-turbo-1.5.3-10.el8.x86_64 libmount-2.32.1-22.el8.x86_64 libnghttp2-1.33.0-3.el8_2.1.x86_64 libpng-1.6.34-5.el8.x86_64 libpsl-0.20.2-5.el8.x86_64 libselinux-2.9-3.el8.x86_64 libspatialite-5.0.0-beta0_1.el8.x86_64 libstdc++-8.3.1-5.el8.0.2.x86_64 libtiff-4.0.9-17.el8.x86_64 libtirpc-1.1.4-4.el8.x86_64 libuuid-2.32.1-22.el8.x86_64 libwebp-1.0.0-1.el8.x86_64 libxcrypt-4.1.1-4.el8.x86_64 libxml2-2.9.7-7.el8.x86_64 libzstd-1.4.2-2.el8.x86_64 lz4-libs-1.8.1.2-4.el8.x86_64 mpfr-3.1.6-1.el8.x86_64 netcdf-4.7.0-2.el8.x86_64 nspr-4.25.0-2.el8_2.x86_64 openblas-openmp-0.3.3-5.el8.x86_64 openblas-threads-0.3.3-5.el8.x86_64 openldap-2.4.46-11.el8_1.x86_64 openssl-libs-1.1.1c-15.el8.x86_64 pam-1.3.1-8.el8.x86_64 pcre-8.42-4.el8.x86_64 pgrouting_12-3.1.0-2.rhel8.x86_64 poppler-0.66.0-26.el8.x86_64 postgis25_12-2.5.5-1.rhel8.x86_64 proj-6.3.2-4.el8.x86_64 protobuf-c-1.3.0-4.el8.x86_64 sqlite-libs-3.26.0-6.el8.x86_64 sssd-client-2.2.3-20.el8.x86_64 systemd-libs-239-31.el8_2.2.x86_64 xz-libs-5.2.4-3.el8.x86_64 zlib-1.2.11-16.el8_2.x86_64
(gdb) bt
#0  0x00007fa9c241e70f in raise () from /lib64/libc.so.6
#1  0x00007fa9c2408b25 in abort () from /lib64/libc.so.6
#2  0x00007fa9c2461897 in __libc_message () from /lib64/libc.so.6
#3  0x00007fa9c2467fdc in malloc_printerr () from /lib64/libc.so.6
#4  0x00007fa9c24698dc in _int_free () from /lib64/libc.so.6
#5  0x00007f992fe099f9 in osgeo::proj::common::UnitOfMeasure::~UnitOfMeasure() () from /lib64/libproj.so.15
#6  0x00007fa9c2420e9c in __run_exit_handlers () from /lib64/libc.so.6
#7  0x00007fa9c2420fd0 in exit () from /lib64/libc.so.6
#8  0x000000000075beb0 in proc_exit (code=code@entry=0) at ipc.c:152
#9  0x000000000077fd21 in PostgresMain (argc=<optimized out>, argv=argv@entry=0x182b6d8, dbname=<optimized out>, username=<optimized out>) at postgres.c:4455
#10 0x00000000007085a1 in BackendRun (port=0x1819b90, port=0x1819b90) at postmaster.c:4448
#11 BackendStartup (port=0x1819b90) at postmaster.c:4139
#12 ServerLoop () at postmaster.c:1704
#13 0x00000000007094d0 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x17e7800) at postmaster.c:1377
#14 0x0000000000482c7e in main (argc=3, argv=0x17e7800) at main.c:228

Lars

Re: free(): invalid pointer , when to dumping data to a new server

От
Tom Lane
Дата:
Lars Aksel Opsahl <Lars.Opsahl@nibio.no> writes:
> (gdb) bt
> #0  0x00007fa9c241e70f in raise () from /lib64/libc.so.6
> #1  0x00007fa9c2408b25 in abort () from /lib64/libc.so.6
> #2  0x00007fa9c2461897 in __libc_message () from /lib64/libc.so.6
> #3  0x00007fa9c2467fdc in malloc_printerr () from /lib64/libc.so.6
> #4  0x00007fa9c24698dc in _int_free () from /lib64/libc.so.6
> #5  0x00007f992fe099f9 in osgeo::proj::common::UnitOfMeasure::~UnitOfMeasure() () from /lib64/libproj.so.15
> #6  0x00007fa9c2420e9c in __run_exit_handlers () from /lib64/libc.so.6
> #7  0x00007fa9c2420fd0 in exit () from /lib64/libc.so.6
> #8  0x000000000075beb0 in proc_exit (code=code@entry=0) at ipc.c:152
> #9  0x000000000077fd21 in PostgresMain (argc=<optimized out>, argv=argv@entry=0x182b6d8, dbname=<optimized out>,
username=<optimizedout>) at postgres.c:4455 
> #10 0x00000000007085a1 in BackendRun (port=0x1819b90, port=0x1819b90) at postmaster.c:4448
> #11 BackendStartup (port=0x1819b90) at postmaster.c:4139
> #12 ServerLoop () at postmaster.c:1704
> #13 0x00000000007094d0 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x17e7800) at postmaster.c:1377
> #14 0x0000000000482c7e in main (argc=3, argv=0x17e7800) at main.c:228

OK, I'd say that definitely puts the bug outside of core Postgres.
It looks like proj has installed an atexit() handler that is trying
to free something that was already freed.  This might be proj's
fault directly, or perhaps a caller did something wrong (a likely
bet, perhaps, is using palloc for something that proj expects to
survive till program exit).

Probably the next step is to take this to the postgis support lists
and see what they think about it.

            regards, tom lane



Re: free(): invalid pointer , when to dumping data to a new server

От
Lars Aksel Opsahl
Дата:



From: Tom Lane <tgl@sss.pgh.pa.us>
>From: Tom Lane <tgl@sss.pgh.pa.us>Sent: Tuesday, October 27, 2020 5:51 PMTo: Lars Aksel Opsahl <Lars.Opsahl@nibio.no>Cc: Alvaro Herrera <alvherre@alvh.no-ip.org>; Laurenz Albe <laurenz.albe@cybertec.at>; pgsql-admin@lists.postgresql.org <pgsql-admin@lists.postgresql.org>Subject: Re: free(): invalid pointer , when to dumping data to a new server
>
>Lars Aksel Opsahl <Lars.Opsahl@nibio.no> writes:
>> (gdb) bt
>> #0  0x00007fa9c241e70f in raise () from /lib64/libc.so.6
>> #1  0x00007fa9c2408b25 in abort () from /lib64/libc.so.6
>> #2  0x00007fa9c2461897 in __libc_message () from /lib64/libc.so.6
>> #3  0x00007fa9c2467fdc in malloc_printerr () from /lib64/libc.so.6
>> #4  0x00007fa9c24698dc in _int_free () from /lib64/libc.so.6
>> #5  0x00007f992fe099f9 in osgeo::proj::common::UnitOfMeasure::~UnitOfMeasure() () from /lib64/libproj.so.15
>> #6  0x00007fa9c2420e9c in __run_exit_handlers () from /lib64/libc.so.6
>> #7  0x00007fa9c2420fd0 in exit () from /lib64/libc.so.6
>> #8  0x000000000075beb0 in proc_exit (code=code@entry=0) at ipc.c:152
>> #9  0x000000000077fd21 in PostgresMain (argc=<optimized out>, argv=argv@entry=0x182b6d8, dbname=<optimized out>, username=<optimized out>) at postgres.c:4455
>> #10 0x00000000007085a1 in BackendRun (port=0x1819b90, port=0x1819b90) at postmaster.c:4448
>> #11 BackendStartup (port=0x1819b90) at postmaster.c:4139
>> #12 ServerLoop () at postmaster.c:1704
>> #13 0x00000000007094d0 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x17e7800) at postmaster.c:1377
>> #14 0x0000000000482c7e in main (argc=3, argv=0x17e7800) at main.c:228
>
>OK, I'd say that definitely puts the bug outside of core Postgres.
>It looks like proj has installed an atexit() handler that is trying
>to free something that was already freed.  This might be proj's
>fault directly, or perhaps a caller did something wrong (a likely
>bet, perhaps, is using palloc for something that proj expects to
>survive till program exit).
>
>Probably the next step is to take this to the postgis support lists
>and see what they think about it.
>
>                        regards, tom lane
>

Hi

Thanks a lot for your help I will take it to the Postgis lists.

Lars

Re: free(): invalid pointer , when to dumping data to a new server

От
Magnus Hagander
Дата:
On Tue, Oct 27, 2020 at 5:52 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Lars Aksel Opsahl <Lars.Opsahl@nibio.no> writes:
> > (gdb) bt
> > #0  0x00007fa9c241e70f in raise () from /lib64/libc.so.6
> > #1  0x00007fa9c2408b25 in abort () from /lib64/libc.so.6
> > #2  0x00007fa9c2461897 in __libc_message () from /lib64/libc.so.6
> > #3  0x00007fa9c2467fdc in malloc_printerr () from /lib64/libc.so.6
> > #4  0x00007fa9c24698dc in _int_free () from /lib64/libc.so.6
> > #5  0x00007f992fe099f9 in osgeo::proj::common::UnitOfMeasure::~UnitOfMeasure() () from /lib64/libproj.so.15
> > #6  0x00007fa9c2420e9c in __run_exit_handlers () from /lib64/libc.so.6
> > #7  0x00007fa9c2420fd0 in exit () from /lib64/libc.so.6
> > #8  0x000000000075beb0 in proc_exit (code=code@entry=0) at ipc.c:152
> > #9  0x000000000077fd21 in PostgresMain (argc=<optimized out>, argv=argv@entry=0x182b6d8, dbname=<optimized out>,
username=<optimizedout>) at postgres.c:4455
 
> > #10 0x00000000007085a1 in BackendRun (port=0x1819b90, port=0x1819b90) at postmaster.c:4448
> > #11 BackendStartup (port=0x1819b90) at postmaster.c:4139
> > #12 ServerLoop () at postmaster.c:1704
> > #13 0x00000000007094d0 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x17e7800) at postmaster.c:1377
> > #14 0x0000000000482c7e in main (argc=3, argv=0x17e7800) at main.c:228
>
> OK, I'd say that definitely puts the bug outside of core Postgres.
> It looks like proj has installed an atexit() handler that is trying
> to free something that was already freed.  This might be proj's
> fault directly, or perhaps a caller did something wrong (a likely
> bet, perhaps, is using palloc for something that proj expects to
> survive till program exit).
>
> Probably the next step is to take this to the postgis support lists
> and see what they think about it.


FYI, looking at this one in a bit more details it looks like it's
standard C++ destructors -- they're added to the list of exit handlers
automatically when C++ is used, AIUI.

Still leaves the problem with postgis/proj of course, but as a
noteworthy detail. It's not as easy as an explicit atexit().

However, mostly this looks like a packaging issue on RHEL8, or more
likely a combination of a packaging issue and some updated packages
that somehow trigger a new issue. It seems to load multiple versions
of some of the required C++ libraries at the same time, and I bet
they're conflicting with each other.

--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: free(): invalid pointer , when to dumping data to a new server

От
"Andres Freund"
Дата:
Hi,


> FYI, looking at this one in a bit more details it looks like it's
> standard C++ destructors -- they're added to the list of exit handlers
> automatically when C++ is used, AIUI.
> 
> Still leaves the problem with postgis/proj of course, but as a
> noteworthy detail. It's not as easy as an explicit atexit().

Yes, there has to be a bug leading to the memory already having been freed or such.

Probably worth trying to run this under valgrind or such.

Andres