Обсуждение: BUG #3881: lo_open leaks memory
The following bug has been logged online: Bug reference: 3881 Logged by: Michael Akinde Email address: michael.akinde@met.no PostgreSQL version: 8.2.5 Operating system: Linux Debian Etch Description: lo_open leaks memory Details: We are using large objects to store gridded data, and wish to provide function to permit the extraction of a single point from the data grids. One issue we have run into, unfortunately, is that lo_open seems to leak memory somewhat terribly. Simplified test case: create or replace function f() RETURNS setof bytea as $body$ declare r oid; fd int; ret bytea; begin for r in select gridoid FROM gridvalue LIMIT 150000 LOOP fd := lo_open( r, 262144 ); -- 262144 = "INV_READ" --PERFORM lo_lseek( fd, 120, 0 ); -- 0 = "SEEK_SET" --ret := loread( fd, 4 ); PERFORM lo_close( fd ); --RETURN NEXT ret; END LOOP; end; $body$ language plpgsql; SELECT * FROM f(); (Note that several lines are commented out - behavior is essentially the same with or without). We find that the above function (on a 8.2.5 setup) will rapidly max out 1GB of shared memory (it seems to goblle up 10-20kb for each lo_open), and performs very poorly. Obviously, this only gets worse with increasing queries (many of our queries will be retrieving points from over 10 million grids, so simply increasing memory is unfortunately not a viable solution). We are aware that this is a "known" bug (inasmuch as the comments on the backend source seems to explicitly state that the lo_* functions are known to leak memory). Questions: - Any likelihood that this may be fixed in the near future (or ever)? - Might there be a simple workaround for this problem? We have looked at the backend code ourselves, but I suspect that it would probably be easier for us to use toasted binary objects (esentially developing our own specialized lo_* system of functions) than to try and patch this on our own. Or might that run into similar (or different) problems? Regards, Michael Akinde Database Architect, met.no
"Michael Akinde" <michael.akinde@met.no> writes: > Description: lo_open leaks memory Hmm, I cannot replicate any memory leak with your example. I'm testing 8.2.6 (well, really 8.2 branch tip) not 8.2.5, but I don't recall that there were any recent fixes in this area. Perhaps there is some contributing factor you didn't mention? regards, tom lane
Tom Lane wrote:
The table gridvalue is of course a table with an attribute gridoid OID, containing at least 150,000 objects (in our own case, I simply run it on our valuetable which contains 4 million 90K binary objects). The problem doesn't occur with "fake" OIDs - and (at least from what I have seen) - neither does it occur if if it is just 150,000 copies of the same OID.
The testcase displays the behavior on a 64-bit debian etch setup, with Postgres 8.2.5. The testcase given (running on our value table), easily chews up 1.9 GB of memory in no time. The table in question is fairly large (20 or so attributes) and fairly heavily indexed, but we don't seem to be able to chew up main memory in the same way, without the lo_open call.
I'll try to build a more complete test case for you (incl. table and blob generation), but am a bit hampered by the limits on my workstation at the moment. It seems though as if the problem does not crop up with small files (at least, the attempts I've done so far to recreate the testcase with small volumes of synthetic data haven't recreated the problem), so this might take a while. I'll try to get something ready by monday.
Regards,
Michael A.
No doubt. I'm not sure what it would be, though."Michael Akinde" <michael.akinde@met.no> writes:Description: lo_open leaks memoryHmm, I cannot replicate any memory leak with your example. I'm testing 8.2.6 (well, really 8.2 branch tip) not 8.2.5, but I don't recall that there were any recent fixes in this area. Perhaps there is some contributing factor you didn't mention?
The table gridvalue is of course a table with an attribute gridoid OID, containing at least 150,000 objects (in our own case, I simply run it on our valuetable which contains 4 million 90K binary objects). The problem doesn't occur with "fake" OIDs - and (at least from what I have seen) - neither does it occur if if it is just 150,000 copies of the same OID.
The testcase displays the behavior on a 64-bit debian etch setup, with Postgres 8.2.5. The testcase given (running on our value table), easily chews up 1.9 GB of memory in no time. The table in question is fairly large (20 or so attributes) and fairly heavily indexed, but we don't seem to be able to chew up main memory in the same way, without the lo_open call.
I'll try to build a more complete test case for you (incl. table and blob generation), but am a bit hampered by the limits on my workstation at the moment. It seems though as if the problem does not crop up with small files (at least, the attempts I've done so far to recreate the testcase with small volumes of synthetic data haven't recreated the problem), so this might take a while. I'll try to get something ready by monday.
Regards,
Michael A.
Вложения
Michael Akinde <michael.akinde@met.no> writes: > Tom Lane wrote: >> Hmm, I cannot replicate any memory leak with your example. >> I'm testing 8.2.6 (well, really 8.2 branch tip) not 8.2.5, but I >> don't recall that there were any recent fixes in this area. >> Perhaps there is some contributing factor you didn't mention? >> > No doubt. I'm not sure what it would be, though. Just to avoid some possibly-wasted time, please update to 8.2.6 and confirm you still see the problem, before trying to extract a test case. I'm certain there are no relevant post-8.2.6 fixes, but I might have forgotten something relevant since 8.2.5. regards, tom lane
I updated the database to PostgreSQL 8.2.6, but this does not appear to make any difference. I use the following script to create a test table. For /tmp/oidfile.temp, I use "FAQ_farsi" from the PostgreSQL installation (about 140kb). create table gridvalue (gridoid oid); --delete from gridvalue; create or replace function load() RETURNS INTEGER AS $body$ declare i integer; r oid; begin for i IN 1..150000 LOOP r := lo_import('/tmp/oidfile.tmp'); insert into gridvalue values (r); END LOOP; RETURN i; end; $body$ language plpgsql; select load(); select count(*) from gridvalue; And the following script runs the . create or replace function f() RETURNS setof bytea as $body$ declare r oid; fd int; ret bytea; begin for r in select gridoid FROM gridvalue LIMIT 150000 LOOP fd := lo_open( r, 262144 ); -- 262144 = "INV_READ" --PERFORM lo_lseek( fd, 120, 0 ); -- 0 = "SEEK_SET" --ret := loread( fd, 4 ); PERFORM lo_close( fd ); --RETURN NEXT ret; END LOOP; end; $body$ language plpgsql; SELECT * FROM f(); On our 64bit Debian setup with 16 GB memory (2GB shared buffers), running f() will rapidly eat up 1.4 GB of memory on the first run; around 800-900 MB on subsequent runs. This seems a bit excessive, considering that I am just opening the OID, without reading or writing anything. Adding more or less iterations seems to scale up (or down) the amount of memory eaten up by the lo_open loop. Some observations: - With small blobs, the memory usage doesn't blow up in this way. The problem seems to require "big" blobs (although 140kb isn't really that much). - Running the same query (with 50,000 iterations, due to hd and admin limitations) on my 32bit laptop with Fedora5 doesn't show up the problem, as it simply runs within the limits of the few MB it has available on shared buffers. I suspect it would also gobble up memory, if it was available, but I don't know. Just to verify that there is not something within our database setup that is affecting this, I'll try to run the tests again on a clean installation of the database on the 64bit machine (just need to get some disk space allocated first). Regards, Michael Akinde Database Architect, met.no
Вложения
Michael Akinde <michael.akinde@met.no> writes: > Some observations: > - With small blobs, the memory usage doesn't blow up in this way. The > problem seems to require "big" blobs (although 140kb isn't really that > much). Ah, that may be the root of the difference in our results --- I was trying with small blobs. Will retest. regards, tom lane
Michael Akinde <michael.akinde@met.no> writes: > I use the following script to create a test table. For > /tmp/oidfile.temp, I use "FAQ_farsi" from the PostgreSQL installation > (about 140kb). Okay, I ran this with about 900MB of shared buffers (about as much as I thought I could make it without descending into swap hell ...) and there is no memory leak that I can see. What I *do* see is that the process size as reported by "top" quickly jumps to 900MB plus and then sits there. This is not a memory leak though, it is just a side effect of the way "top" reports usage of shared memory. Basically, a shared buffer starts getting charged against a given process the first time that process touches that buffer. Your test case involves reading a lot of blocks of pg_largeobject and that results in touching a lot of buffers. So basically I don't see a problem here. If you are noticing a performance issue in this area, it may indicate that you have shared_buffers set too large, ie, using more RAM than the machine can really afford to spare. That leads to swapping which drives performance down. regards, tom lane
Thanks for taking the time to look into the problem. It is much appreciated. Tom Lane wrote: > Michael Akinde <michael.akinde@met.no> writes: > >> I use the following script to create a test table. For >> /tmp/oidfile.temp, I use "FAQ_farsi" from the PostgreSQL installation >> (about 140kb). >> > Okay, I ran this with about 900MB of shared buffers (about as much as I > thought I could make it without descending into swap hell ...) and there > is no memory leak that I can see. What I *do* see is that the process > size as reported by "top" quickly jumps to 900MB plus and then sits > there. This is not a memory leak though, it is just a side effect of > the way "top" reports usage of shared memory. Basically, a shared > buffer starts getting charged against a given process the first time > that process touches that buffer. Your test case involves reading a lot > of blocks of pg_largeobject and that results in touching a lot of > buffers. > Hmm. Just some questions to clarify for our benefit, if you can find the time. Why does it make a difference to lo_open what the size of the blob is? Other than simply opening the blob to get the file descriptor, after all, we don't touch the blob itself. Also, since the blob is opened and closed, why does the process allocate new memory to open a new blob, rather than reuse existing memory? If this is the intended behavior (as it seems), is there someway we could force lo_open to reuse the memory (as this would seem to be a desirable behavior, at least to us)? > So basically I don't see a problem here. If you are noticing a > performance issue in this area, it may indicate that you have > shared_buffers set too large, ie, using more RAM than the machine > can really afford to spare. That leads to swapping which drives > performance down. > I suppose this goes for another mailing list, but 2GB shared buffers on a 16GB server which isn't used for anything else doesn't seem like that much. Is there an up to date source containing a thorough discussion of these settings (as everything we've found seems to be from 2003)? The case worries us a bit for two reasons. Firstly, we expect both much bigger retrieval queries in production (1 million rows, rather than 100 thousand) , and we've already seen that the database will max out physical memory usage at around 14 GB (shared memory usage is still reported at 2GB) and allocate huge globs of virtual memory (~30 GB) for queries of this kind. Some part of that memory usage is of course caused by our code (which also does retrieval on many other tables), but the huge majority of that memory usage appears to be caused by lo_open. Secondly, we will see exponential growth in the size of the blobs over the lifetime of the system, which bodes fairly poorly for this approach to a solution - memory is getting cheaper, but not that fast. Asking the users to "please don't ask big queries" is obviously not the kind of option which is tenable in the long run, so if we can't "tune" the problem away, and the large object facility is working as designed, then we'll need to figure out a different solution for the system. We had been considering toasted ByteAs, but as I understand it, they don't support random I/O, which would seem to rule them out as a practical alternative. Do you think it would be practicable to fix the Postgres large object implementation so that it is more suitable for our needs, or would it be simpler to do our own implementation of blobs in Postgres? Regards, Michael Akinde Database Architect, met.no
Вложения
Sorry, I forgot to mention: When we were building the test case, we ran a lot of experiments with 1 GB of shared buffers, and were taking a clear performance hit anytime the shared buffers seemed to hit the 1GB barrier. Increasing the shared buffer size to 2GB, improved performance significantly (since the query now "fits" in shared memory). This seems to contradict the conclusion that the problem is only a result of the way top is reporting. Regards, Michael A. Tom Lane wrote: > Okay, I ran this with about 900MB of shared buffers (about as much as I > thought I could make it without descending into swap hell ...) and there > is no memory leak that I can see. What I *do* see is that the process > size as reported by "top" quickly jumps to 900MB plus and then sits > there. This is not a memory leak though, it is just a side effect of > the way "top" reports usage of shared memory. Basically, a shared > buffer starts getting charged against a given process the first time > that process touches that buffer. Your test case involves reading a lot > of blocks of pg_largeobject and that results in touching a lot of > buffers. > > So basically I don't see a problem here. If you are noticing a > performance issue in this area, it may indicate that you have > shared_buffers set too large, ie, using more RAM than the machine > can really afford to spare. That leads to swapping which drives > performance down. >
Вложения
On Tue, 22 Jan 2008, Michael Akinde wrote: >> What I *do* see is that the process size as reported by "top" >> quickly jumps to 900MB plus and then sits there. This is not a >> memory leak though, it is just a side effect of the way "top" >> reports usage of shared memory. > > Also, since the blob is opened and closed, why does the process allocate > new memory to open a new blob, rather than reuse existing memory? I think a process does not allocate new memory, it just uses his shared buffer. The OS does not give physical memory for a process immediately when it is allocated for example by malloc, it gives it in chunks - only when it is first read or written to. Regards Tometzky -- ...although Eating Honey was a very good thing to do, there was a moment just before you began to eat it which was better than when you were... Winnie the Pooh
Thanks for the reply, Tomasz.
We have now done some more performance tests working with pure C/C++ code, and the results we are finding seem to indicate that the disk thrashing has to do with the OS disk cache, and not as a result of the lo_open call. Notably, we have been unable to recreate the problems we found with the size of the shared buffers affecting performance, which confirms Tom's conclusions.
We still have a performance issue, but the latest round of tests indicate a number of places we should try to tweak.
Thanks for your patience, Tom.
Regards,
Michael Akinde
Database Architect, met.no
Tomasz Ostrowski wrote:
We have now done some more performance tests working with pure C/C++ code, and the results we are finding seem to indicate that the disk thrashing has to do with the OS disk cache, and not as a result of the lo_open call. Notably, we have been unable to recreate the problems we found with the size of the shared buffers affecting performance, which confirms Tom's conclusions.
We still have a performance issue, but the latest round of tests indicate a number of places we should try to tweak.
Thanks for your patience, Tom.
Regards,
Michael Akinde
Database Architect, met.no
Tomasz Ostrowski wrote:
On Tue, 22 Jan 2008, Michael Akinde wrote:What I *do* see is that the process size as reported by "top" quickly jumps to 900MB plus and then sits there. This is not a memory leak though, it is just a side effect of the way "top" reports usage of shared memory.Also, since the blob is opened and closed, why does the process allocate new memory to open a new blob, rather than reuse existing memory?I think a process does not allocate new memory, it just uses his shared buffer. The OS does not give physical memory for a process immediately when it is allocated for example by malloc, it gives it in chunks - only when it is first read or written to. Regards Tometzky
Вложения
Michael Akinde <michael.akinde@met.no> writes: > Why does it make a difference to lo_open what the size of the blob is? > Other than simply opening the blob to get the file descriptor, after > all, we don't touch the blob itself. I believe lo_open() fetches the first chunk of the blob's data, essentially as a way of validating that there is a blob of that OID. With larger blobs those first chunks would be spread across more pages of pg_largeobject, thus this process would involve touching more buffers. > Also, since the blob is opened and closed, why does the process allocate > new memory to open a new blob, rather than reuse existing memory? If > this is the intended behavior (as it seems), is there someway we could > force lo_open to reuse the memory (as this would seem to be a desirable > behavior, at least to us)? It will recycle those buffers, once it runs out of unused ones. Again, if you don't like the amount of memory that's going into this, maybe you need to back off your shared_buffer setting. > Firstly, we expect both much bigger retrieval queries in production (1 > million rows, rather than 100 thousand) , and we've already seen that > the database will max out physical memory usage at around 14 GB (shared > memory usage is still reported at 2GB) and allocate huge globs of > virtual memory (~30 GB) for queries of this kind. To be blunt, I'm not sure that either of us knows what you're measuring here. Are you counting OS-level disk cache as consumed memory? It's really not a problem if that's where unused memory is going. regards, tom lane