Обсуждение: Memory leak (possibly connected to postgis) leading to server crash

Поиск

Список

Период

Сортировка

Memory leak (possibly connected to postgis) leading to server crash

От

Roman Cervenak

Дата:

06 декабря 2019 г., 12:22:42

Hello,

this may be a serious issue and I am not very experienced with reporting this kind of stuff, I will try my best. I believe there may be a memory leak somewhere in postgres/postgis, which can eat up all the available memory, leading to postgres crash (signal 9 kill of backend, postmaster terminating others, recovery).

My setup and workload:

Postgres 12.1 (Ubuntu 12.1-1.pgdg18.04+1) installed via apt on Ubuntu server 18.04 LTS, in VM in Azure cloud.

PostGIS version: 3.0 USE_GEOS=1 USE_PROJ=1 USE_STATS=1

VM has 8cores, 56 GB RAM, 7 TB RAID from managed disks (azure network storage)

shared_buffers = 12GB

work_mem = 256MB

maintenance_work_mem = 2GB

I was running 8 clients (on different VMs, over network) using the database, batch processing geographic stuff. Each worker is running one query at a time, so pretty much 8 parallel queries 100% of the time. Queries themselves are fairly short (5-60 seconds), SELECTing rows by geometry index (GIST) and doing stuff like ST_SimplifyPreserveTopology, ST_Buffer, ST_Union on them, essentially using all 8 cores to 100%. There is a lot of things to process, so this was going on for maybe 12 hours, when I noticed (in "htop") that memory usage is unusually high, somewhere about 50 GB. It was suspicious, because it is way higher than it should be with 12 GB shared buffers and 256MB work_mem with 8 clients, and it continued to grow slowly (I could see numbers in RES column in htop slowly rise for all backends). Until eventually:

2019-12-06 00:04:24 UTC [21487-8] LOG: server process (PID 52059) was terminated by signal 9: Killed

2019-12-06 00:04:24 UTC [21487-10] LOG: terminating any other active server processes
2019-12-06 00:04:24 UTC [91091-1] WARNING: terminating connection because of crash of another server process
2019-12-06 00:04:24 UTC [91091-2] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

...

FATAL: the database system is in recovery mode

...

I realize, this is probably not enough for you. But I will be running this workload again soon, so if this repeats, what should I do to help you diagnose this?

Re: Memory leak (possibly connected to postgis) leading to servercrash

От

Tomas Vondra

Дата:

06 декабря 2019 г., 14:01:54

On Fri, Dec 06, 2019 at 10:22:42AM +0100, Roman Cervenak wrote:
>Hello,
>this may be a serious issue and I am not very experienced with reporting
>this kind of stuff, I will try my best. I believe there may be a memory
>leak somewhere in postgres/postgis, which can eat up all the available
>memory, leading to postgres crash (signal 9 kill of backend, postmaster
>terminating others, recovery).
>
>My setup and workload:
>Postgres 12.1 (Ubuntu 12.1-1.pgdg18.04+1) installed via apt on Ubuntu
>server 18.04 LTS, in VM in Azure cloud.
>PostGIS version: 3.0 USE_GEOS=1 USE_PROJ=1 USE_STATS=1
>VM has 8cores, 56 GB RAM, 7 TB RAID from managed disks (azure network
>storage)
>
>shared_buffers = 12GB
>work_mem = 256MB
>maintenance_work_mem = 2GB
>
>I was running 8 clients (on different VMs, over network) using the
>database, batch processing geographic stuff. Each worker is running one
>query at a time, so pretty much 8 parallel queries 100% of the time.
>Queries themselves are fairly short (5-60 seconds), SELECTing rows by
>geometry index (GIST) and doing stuff like ST_SimplifyPreserveTopology,
>ST_Buffer, ST_Union on them, essentially using all 8 cores to 100%. There
>is a lot of things to process, so this was going on for maybe 12 hours,
>when I noticed (in "htop") that memory usage is unusually high, somewhere
>about 50 GB. It was suspicious, because it is way higher than it should be
>with 12 GB shared buffers and 256MB work_mem with 8 clients, and it
>continued to grow slowly (I could see numbers in RES column in htop slowly
>rise for all backends). Until eventually:
>
>2019-12-06 00:04:24 UTC [21487-8]  LOG:  server process (PID 52059) was
>terminated by signal 9: Killed
>2019-12-06 00:04:24 UTC [21487-10]  LOG:  terminating any other active
>server processes
>2019-12-06 00:04:24 UTC [91091-1]  WARNING:  terminating connection because
>of crash of another server process
>2019-12-06 00:04:24 UTC [91091-2]  DETAIL:  The postmaster has commanded
>this server process to roll back the current transaction and exit, because
>another server process exited abnormally and possibly corrupted shared
>memory.
>...
>FATAL:  the database system is in recovery mode
>...
>
>I realize, this is probably not enough for you. But I will be running this
>workload again soon, so if this repeats, what should I do to help you
>diagnose this?

The kill was initiated by oom killer, I suppose? It might be interesting
to see the message written to dmesg by it, it usually includes info
about how much memory was used by the backend etc.

The other thing that might be useful is dump of memory contexts - you'll
have to wait until the memory usage gets excessive (i.e. shortly before
the oom killer would kick in), attach gdb to a backend and call
MemoryContextStats on TopMemoryContext. So, something like this:

$ gdb -p $PID
(gdb) p MemoryContextStats(TopMemoryContext)
(gdb) q

which writes a bunch of info about memory contexts into the server log.

But, I'm not sure this will actually help. Based on what you wrote, the
memory stays allocated across queries. So either it's allocated in one
of the long-lived contexts (which queries usually don't do), or it's
allocated directly through malloc() and not through our memory context
infrastructure (hence it'll be invisible in the context stats).

I'm not particularly familiar with PostGIS, but AFAIK it's using various
libraries, and those are naturally using malloc/free directly. So maybe
it's not freeing the memory in some cases.

What would really help is having some sort of reproducer, and/or running
the queries with valgrind, which can detect memory leaks. 

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Memory leak (possibly connected to postgis) leading to server crash

От

Roman Cervenak

Дата:

06 декабря 2019 г., 14:46:44

Yes, it was killed by oom killer:

[2037990.376427] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/system-postgresql.slice,task=postgres,pid=52059,uid=111
[2037990.376433] Out of memory: Kill process 52059 (postgres) score 294 or sacrifice child
[2037990.384186] Killed process 52059 (postgres) total-vm:17508832kB, anon-rss:4309296kB, file-rss:108kB, shmem-rss:12641580kB
[2037990.516504] oom_reaper: reaped process 52059 (postgres), now anon-rss:0kB, file-rss:0kB, shmem-rss:12641580kB

(full dmesg.log attached, if it is interesting; there are more postgres backends visible, but they were inactive at the time)

I can try the gdb dump next time I will see it. But I cannot imagine giving you reproducible case - it is 500 GB proprietary database, and without it, queries would be hardly useful, I presume? I can try to make "sample" with generated data and find out if I can reproduce the issue with my queries that way, but that will be quite time consuming.

Would it help to dump the memory of the backend process and deliver the dump (by some private channel) to somebody to identify who is consuming all that memory? (that is the usual drill in windows)

On Fri, Dec 6, 2019 at 12:02 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

On Fri, Dec 06, 2019 at 10:22:42AM +0100, Roman Cervenak wrote:
>Hello,
>this may be a serious issue and I am not very experienced with reporting
>this kind of stuff, I will try my best. I believe there may be a memory
>leak somewhere in postgres/postgis, which can eat up all the available
>memory, leading to postgres crash (signal 9 kill of backend, postmaster
>terminating others, recovery).
>
>My setup and workload:
>Postgres 12.1 (Ubuntu 12.1-1.pgdg18.04+1) installed via apt on Ubuntu
>server 18.04 LTS, in VM in Azure cloud.
>PostGIS version: 3.0 USE_GEOS=1 USE_PROJ=1 USE_STATS=1
>VM has 8cores, 56 GB RAM, 7 TB RAID from managed disks (azure network
>storage)
>
>shared_buffers = 12GB
>work_mem = 256MB
>maintenance_work_mem = 2GB
>
>I was running 8 clients (on different VMs, over network) using the
>database, batch processing geographic stuff. Each worker is running one
>query at a time, so pretty much 8 parallel queries 100% of the time.
>Queries themselves are fairly short (5-60 seconds), SELECTing rows by
>geometry index (GIST) and doing stuff like ST_SimplifyPreserveTopology,
>ST_Buffer, ST_Union on them, essentially using all 8 cores to 100%. There
>is a lot of things to process, so this was going on for maybe 12 hours,
>when I noticed (in "htop") that memory usage is unusually high, somewhere
>about 50 GB. It was suspicious, because it is way higher than it should be
>with 12 GB shared buffers and 256MB work_mem with 8 clients, and it
>continued to grow slowly (I could see numbers in RES column in htop slowly
>rise for all backends). Until eventually:
>
>2019-12-06 00:04:24 UTC [21487-8] LOG: server process (PID 52059) was
>terminated by signal 9: Killed
>2019-12-06 00:04:24 UTC [21487-10] LOG: terminating any other active
>server processes
>2019-12-06 00:04:24 UTC [91091-1] WARNING: terminating connection because
>of crash of another server process
>2019-12-06 00:04:24 UTC [91091-2] DETAIL: The postmaster has commanded
>this server process to roll back the current transaction and exit, because
>another server process exited abnormally and possibly corrupted shared
>memory.
>...
>FATAL: the database system is in recovery mode
>...
>
>I realize, this is probably not enough for you. But I will be running this
>workload again soon, so if this repeats, what should I do to help you
>diagnose this?

The kill was initiated by oom killer, I suppose? It might be interesting
to see the message written to dmesg by it, it usually includes info
about how much memory was used by the backend etc.

The other thing that might be useful is dump of memory contexts - you'll
have to wait until the memory usage gets excessive (i.e. shortly before
the oom killer would kick in), attach gdb to a backend and call
MemoryContextStats on TopMemoryContext. So, something like this:

$ gdb -p $PID
(gdb) p MemoryContextStats(TopMemoryContext)
(gdb) q

which writes a bunch of info about memory contexts into the server log.

But, I'm not sure this will actually help. Based on what you wrote, the
memory stays allocated across queries. So either it's allocated in one
of the long-lived contexts (which queries usually don't do), or it's
allocated directly through malloc() and not through our memory context
infrastructure (hence it'll be invisible in the context stats).

I'm not particularly familiar with PostGIS, but AFAIK it's using various
libraries, and those are naturally using malloc/free directly. So maybe
it's not freeing the memory in some cases.

What would really help is having some sort of reproducer, and/or running
the queries with valgrind, which can detect memory leaks.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

dmesg.log

Re: Memory leak (possibly connected to postgis) leading to servercrash

От

Tomas Vondra

Дата:

06 декабря 2019 г., 15:41:19

On Fri, Dec 06, 2019 at 12:46:44PM +0100, Roman Cervenak wrote:
>Yes, it was killed by oom killer:
>
>[2037990.376427]

>oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/system-postgresql.slice,task=postgres,pid=52059,uid=111
>[2037990.376433] Out of memory: Kill process 52059 (postgres) score 294 or
>sacrifice child
>[2037990.384186] Killed process 52059 (postgres) total-vm:17508832kB,
>anon-rss:4309296kB, file-rss:108kB, shmem-rss:12641580kB
>[2037990.516504] oom_reaper: reaped process 52059 (postgres), now
>anon-rss:0kB, file-rss:0kB, shmem-rss:12641580kB
>
>(full dmesg.log attached, if it is interesting; there are more postgres
>backends visible, but they were inactive at the time)
>

OK, so it allocated extra ~4.3GB or so (plus shared buffers).

>I can try the gdb dump next time I will see it. But I cannot imagine giving
>you reproducible case - it is 500 GB proprietary database, and without it,
>queries would be hardly useful, I presume? I can try to make "sample" with
>generated data and find out if I can reproduce the issue with my queries
>that way, but that will be quite time consuming.
>

Yeah, I understand the data is proprietary. But we need to identify
where the issue is, somehow. And being able to reproduce it is likely
the best way to do that. One way to do that is to either generate
synthetic data with similar structure/features (geometries of similar
size etc.), and hope that it triggers the issue too. Or distill a subset
of data triggering the issue and anonymize it. Or something like that.

Yes, it's going to be quite time consuming, but I don't have better
ideas. Maybe running it under valgrind would help - you'd have to
rebuild PostgreSQL with valgrind support and run the workload on it. The
queries would be much slower (possibly by an order of magnitude or so),
so you'd have to run it on a different machine, but that's machine time,
not time wasted by a human.

Anyway, let's start with the easy stuff - try running it again and get
the memory context stats. Maybe it's a simple leak in PostgreSQL, in
which case it should be easier to investigate it. If that turns out to
be untrue, you can try this more complicated stuff (valgrind, ...).

>Would it help to dump the memory of the backend process and deliver the
>dump (by some private channel) to somebody to identify who is consuming all
>that memory? (that is the usual drill in windows)
>

I don't think that'd be very helpful - if this really is memory leak in
one of the libraries, I have no idea how to spot that in the memory
dump. Also, if you claim the data is sensitive/proprietary, you should
not really be sharing the dumps.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Memory leak (possibly connected to postgis) leading to server crash

От

Roman Cervenak

Дата:

11 декабря 2019 г., 14:09:11

Hi,

I am trying to do

(gdb) p MemoryContextStats(TopMemoryContext)

on postgres backend process, but the response is: 'TopMemoryContext' has unknown type; cast it to its declared type

I never debugged postgres (or anything under linux), and google is not helpful. Can you please advise? Full output is attached.

Regards,

Roman

On Fri, Dec 6, 2019 at 1:41 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

On Fri, Dec 06, 2019 at 12:46:44PM +0100, Roman Cervenak wrote:
>Yes, it was killed by oom killer:
>
>[2037990.376427]
>oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/system-postgresql.slice,task=postgres,pid=52059,uid=111
>[2037990.376433] Out of memory: Kill process 52059 (postgres) score 294 or
>sacrifice child
>[2037990.384186] Killed process 52059 (postgres) total-vm:17508832kB,
>anon-rss:4309296kB, file-rss:108kB, shmem-rss:12641580kB
>[2037990.516504] oom_reaper: reaped process 52059 (postgres), now
>anon-rss:0kB, file-rss:0kB, shmem-rss:12641580kB
>
>(full dmesg.log attached, if it is interesting; there are more postgres
>backends visible, but they were inactive at the time)
>

OK, so it allocated extra ~4.3GB or so (plus shared buffers).

>I can try the gdb dump next time I will see it. But I cannot imagine giving
>you reproducible case - it is 500 GB proprietary database, and without it,
>queries would be hardly useful, I presume? I can try to make "sample" with
>generated data and find out if I can reproduce the issue with my queries
>that way, but that will be quite time consuming.
>

Yeah, I understand the data is proprietary. But we need to identify
where the issue is, somehow. And being able to reproduce it is likely
the best way to do that. One way to do that is to either generate
synthetic data with similar structure/features (geometries of similar
size etc.), and hope that it triggers the issue too. Or distill a subset
of data triggering the issue and anonymize it. Or something like that.

Yes, it's going to be quite time consuming, but I don't have better
ideas. Maybe running it under valgrind would help - you'd have to
rebuild PostgreSQL with valgrind support and run the workload on it. The
queries would be much slower (possibly by an order of magnitude or so),
so you'd have to run it on a different machine, but that's machine time,
not time wasted by a human.

Anyway, let's start with the easy stuff - try running it again and get
the memory context stats. Maybe it's a simple leak in PostgreSQL, in
which case it should be easier to investigate it. If that turns out to
be untrue, you can try this more complicated stuff (valgrind, ...).

>Would it help to dump the memory of the backend process and deliver the
>dump (by some private channel) to somebody to identify who is consuming all
>that memory? (that is the usual drill in windows)
>

I don't think that'd be very helpful - if this really is memory leak in
one of the libraries, I have no idea how to spot that in the memory
dump. Also, if you claim the data is sensitive/proprietary, you should
not really be sharing the dumps.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

postgres_gdb.txt

Re: Memory leak (possibly connected to postgis) leading to servercrash

От

Tomas Vondra

Дата:

11 декабря 2019 г., 17:23:10

Hi,

On Wed, Dec 11, 2019 at 12:09:11PM +0100, Roman Cervenak wrote:
>Hi,
>I am trying to do
>
>(gdb) p MemoryContextStats(TopMemoryContext)
>
>on postgres backend process, but the response is: 'TopMemoryContext' has
>unknown type; cast it to its declared type
>
>I never debugged postgres (or anything under linux), and google is not
>helpful. Can you please advise? Full output is attached.
>

Hmmmm, I don't recall having this issue. Can you try installing the
package with debug symbols, it should be called -debug or so.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Memory leak (possibly connected to postgis) leading to server crash

От

Tom Lane

Дата:

11 декабря 2019 г., 19:49:09

Roman Cervenak <roman@cervenak.info> writes:
> I am trying to do
> (gdb) p MemoryContextStats(TopMemoryContext)

I think what you wanted to do is "call MemoryContextStats(TopMemoryContext)".
"p" means "print".

            regards, tom lane

Re: Memory leak (possibly connected to postgis) leading to server crash

От

Roman Cervenak

Дата:

11 декабря 2019 г., 20:23:39

Same error with call.

On Wed, Dec 11, 2019, 17:49 Tom Lane <tgl@sss.pgh.pa.us> wrote:

Roman Cervenak <roman@cervenak.info> writes:
> I am trying to do
> (gdb) p MemoryContextStats(TopMemoryContext)

I think what you wanted to do is "call MemoryContextStats(TopMemoryContext)".
"p" means "print".

regards, tom lane

Re: Memory leak (possibly connected to postgis) leading to server crash

От

Roman Cervenak

Дата:

13 декабря 2019 г., 13:51:55

With "postgresql-12-dbgsym" installed, call MemoryContextStats(TopMemoryContext) succeeded on postgres backend process with high memory consumption.

I cut it from server logs, see attached file (I have redacted queries and table structures, I left SELECT parts where is visible that geometric operations are involved).

Regards,

Roman Cervenak

On Wed, Dec 11, 2019 at 5:49 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Roman Cervenak <roman@cervenak.info> writes:
> I am trying to do
> (gdb) p MemoryContextStats(TopMemoryContext)

I think what you wanted to do is "call MemoryContextStats(TopMemoryContext)".
"p" means "print".

regards, tom lane

Вложения

pg_memory_context.txt

Re: Memory leak (possibly connected to postgis) leading to servercrash

От

Tomas Vondra

Дата:

13 декабря 2019 г., 16:20:36

On Fri, Dec 13, 2019 at 11:51:55AM +0100, Roman Cervenak wrote:
>With "postgresql-12-dbgsym" installed,
>call MemoryContextStats(TopMemoryContext) succeeded on postgres backend
>process with high memory consumption.
>I cut it from server logs, see attached file (I have redacted queries and
>table structures, I left SELECT parts where is visible that geometric
>operations are involved).
>

The way I read the context stats, the memory leak has to be somewhere in
a library calling malloc directly, or something like that. Because per
the log, the PostgreSQL memory contexts are a whopping 5MB in total: 

   Grand total: 5683464 bytes in 1082 blocks; 2027536 free (362 chunks); 3655928

So this is likely a memory leak either somewhere in PostGIS (I see ~200
malloc calls in the current source code) or maybe in one of the
libraries it's using.

I think at this point we need to get more information about the queries
and data, or even better a reproducer. Or try running the queries under
valgrind, which does have a memory leak detector.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Memory leak (possibly connected to postgis) leading to server crash

От

Roman Cervenak

Дата:

17 декабря 2019 г., 14:51:27

Hey guys,

I have reproducer for you.

Luckily, the issue is not specific for our data, and can be demonstrated also on OpenStreepMap data. I have imported OSM planet with Imposm, which creates tables with PostGIS geometry columns.

I have used roads for linestring testing and landuse for polygon testing, with queries combining several geometric operations (similar to my production workload).

The linesting test with 7 concurrent workers increases memory consumption by about 6 GB per hour on my environment.

If you don't have OSM database with geometries, I have dumped the roads table (about 25GB download): https://sygicpublic.blob.core.windows.net/postgres-bug-data/osm_road.sql.gzip

Here is the tool running queries in parallel workers, written in .NET Core (just fill in database connection info in Program.cs, build and run): https://github.com/rouen-sk/PostgisMemoryLeakDemoCore

Let me know if I can help further.

Regards,

Roman Cervenak

On Fri, Dec 13, 2019 at 2:20 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

On Fri, Dec 13, 2019 at 11:51:55AM +0100, Roman Cervenak wrote:
>With "postgresql-12-dbgsym" installed,
>call MemoryContextStats(TopMemoryContext) succeeded on postgres backend
>process with high memory consumption.
>I cut it from server logs, see attached file (I have redacted queries and
>table structures, I left SELECT parts where is visible that geometric
>operations are involved).
>

The way I read the context stats, the memory leak has to be somewhere in
a library calling malloc directly, or something like that. Because per
the log, the PostgreSQL memory contexts are a whopping 5MB in total:

Grand total: 5683464 bytes in 1082 blocks; 2027536 free (362 chunks); 3655928

So this is likely a memory leak either somewhere in PostGIS (I see ~200
malloc calls in the current source code) or maybe in one of the
libraries it's using.

I think at this point we need to get more information about the queries
and data, or even better a reproducer. Or try running the queries under
valgrind, which does have a memory leak detector.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Memory leak (possibly connected to postgis) leading to servercrash

От

Tomas Vondra

Дата:

17 декабря 2019 г., 15:50:35

On Tue, Dec 17, 2019 at 12:51:27PM +0100, Roman Cervenak wrote:
>Hey guys,
>I have reproducer for you.
>
>Luckily, the issue is not specific for our data, and can be demonstrated
>also on OpenStreepMap data. I have imported OSM planet with Imposm, which
>creates tables with PostGIS geometry columns.
>I have used roads for linestring testing and landuse for polygon testing,
>with queries combining several geometric operations (similar to my
>production workload).
>
>The linesting test with 7 concurrent workers increases memory consumption
>by about 6 GB per hour on my environment.
>
>If you don't have OSM database with geometries, I have dumped the roads
>table (about 25GB download):
>https://sygicpublic.blob.core.windows.net/postgres-bug-data/osm_road.sql.gzip
>Here is the tool running queries in parallel workers, written in .NET Core
>(just fill in database connection info in Program.cs, build and run):
>https://github.com/rouen-sk/PostgisMemoryLeakDemoCore
>
>Let me know if I can help further.
>

I'll try reproducing it, but I don't have any environment to run c#
stuff. Is it possible to reproduce a single query reproducing the issue,
with fixed parameters? Or is the randomization necessary?

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Memory leak (possibly connected to postgis) leading to server crash

От

Roman Cervenak

Дата:

17 декабря 2019 г., 16:20:41

.NET core is cross platform, you can run it under windows or linux, just install .net core runtime.

I did not try to run loop with fixed params, I tried to mimic my production workload (which is not random of course, but highly variable by location). You can certainly try.

On Tue, Dec 17, 2019, 13:50 Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

On Tue, Dec 17, 2019 at 12:51:27PM +0100, Roman Cervenak wrote:
>Hey guys,
>I have reproducer for you.
>
>Luckily, the issue is not specific for our data, and can be demonstrated
>also on OpenStreepMap data. I have imported OSM planet with Imposm, which
>creates tables with PostGIS geometry columns.
>I have used roads for linestring testing and landuse for polygon testing,
>with queries combining several geometric operations (similar to my
>production workload).
>
>The linesting test with 7 concurrent workers increases memory consumption
>by about 6 GB per hour on my environment.
>
>If you don't have OSM database with geometries, I have dumped the roads
>table (about 25GB download):
>https://sygicpublic.blob.core.windows.net/postgres-bug-data/osm_road.sql.gzip
>Here is the tool running queries in parallel workers, written in .NET Core
>(just fill in database connection info in Program.cs, build and run):
>https://github.com/rouen-sk/PostgisMemoryLeakDemoCore
>
>Let me know if I can help further.
>

I'll try reproducing it, but I don't have any environment to run c#
stuff. Is it possible to reproduce a single query reproducing the issue,
with fixed parameters? Or is the randomization necessary?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Memory leak (possibly connected to postgis) leading to servercrash

От

Tomas Vondra

Дата:

17 декабря 2019 г., 21:09:57

On Tue, Dec 17, 2019 at 02:20:41PM +0100, Roman Cervenak wrote:
>.NET core is cross platform, you can run it under windows or linux, just
>install .net core runtime.

That's possible, but I have zero experience with .net core, and
unfortunately I have no system with it installed. I've tried running it
under mono I just installed, but quickly ran into issue that I don't
know how to solve. I'm willing to spend time investigating the issue,
but this seems a bit too much ...

>I did not try to run loop with fixed params, I tried to mimic my production
>workload (which is not random of course, but highly variable by location).
>You can certainly try.
>

Can you capture a couple of queries, and share those? I'll run that
under valgrind and see if that produces something interesting.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Memory leak (possibly connected to postgis) leading to server crash

От

Roman Cervenak

Дата:

18 декабря 2019 г., 11:49:22

Sure, here it is, 10k queries generated by this tool: https://sygicpublic.blob.core.windows.net/postgres-bug-data/postgis_memory_leak_queries.txt

On Tue, Dec 17, 2019 at 7:10 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

On Tue, Dec 17, 2019 at 02:20:41PM +0100, Roman Cervenak wrote:
>.NET core is cross platform, you can run it under windows or linux, just
>install .net core runtime.

That's possible, but I have zero experience with .net core, and
unfortunately I have no system with it installed. I've tried running it
under mono I just installed, but quickly ran into issue that I don't
know how to solve. I'm willing to spend time investigating the issue,
but this seems a bit too much ...

>I did not try to run loop with fixed params, I tried to mimic my production
>workload (which is not random of course, but highly variable by location).
>You can certainly try.
>

Can you capture a couple of queries, and share those? I'll run that
under valgrind and see if that produces something interesting.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Memory leak (possibly connected to postgis) leading to servercrash

От

Tomas Vondra

Дата:

18 декабря 2019 г., 17:55:21

On Wed, Dec 18, 2019 at 09:49:22AM +0100, Roman Cervenak wrote:
>Sure, here it is, 10k queries generated by this tool:
>https://sygicpublic.blob.core.windows.net/postgres-bug-data/postgis_memory_leak_queries.txt
>

Thanks. I've tried running this on current branches (so PostgreSQL 13dev
and PostGIS 3.1.0dev), but that does not seem to be leaking so far. I'll
leave running it for a bit longer and then I'll try running it on 12.1
and 3.0.0 which I think are the versions you reported. I don't have an
Ubuntu box at hand, so I'll have to build it locally, though.

BTW it's customary to reply in-line here, top-posting makes it harder to
follow the discussion etc.

regards

>RC
>
>On Tue, Dec 17, 2019 at 7:10 PM Tomas Vondra <tomas.vondra@2ndquadrant.com>
>wrote:
>
>> On Tue, Dec 17, 2019 at 02:20:41PM +0100, Roman Cervenak wrote:
>> >.NET core is cross platform, you can run it under windows or linux, just
>> >install .net core runtime.
>>
>> That's possible, but I have zero experience with .net core, and
>> unfortunately I have no system with it installed. I've tried running it
>> under mono I just installed, but quickly ran into issue that I don't
>> know how to solve. I'm willing to spend time investigating the issue,
>> but this seems a bit too much ...
>>
>> >I did not try to run loop with fixed params, I tried to mimic my
>> production
>> >workload (which is not random of course, but highly variable by location).
>> >You can certainly try.
>> >
>>
>> Can you capture a couple of queries, and share those? I'll run that
>> under valgrind and see if that produces something interesting.
>>
>>
>> regards
>>
>> --
>> Tomas Vondra                  http://www.2ndQuadrant.com
>> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>>

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Memory leak (possibly connected to postgis) leading to server crash

Вложения

Вложения

Вложения