Discussion: One source of constant annoyance identified
...or so it seems: I configured our webserver not to use persistent connections et voilà - no more 200+MB backends! I knew there was supposed to be some bug in PHP, so users of Apache/PHP are discouraged from using persistent connections. As we are using ColdFusion/IIS on Win2k Server with ODBC, I never suspected that there would be a similar issue with my configuration.

Now we have just switched off the persistent connection option (which was a true winner for our Oracle DB, performance-wise) and noticed the lack of those giant backends we had learned to fear and endure before. There's still the odd 250MB backend lingering around for some time, but it's not four or five of them any more, which is a big gain when there's 250MB of swap in use more often than not.

Are there any known issues concerning persistent connections apart from the pg_pconnect thing with PHP? Is anyone running the same combination of *nix/Postgres + IIS/ColdFusion?

Regards,

Markus
"Markus Wollny" <Markus.Wollny@computec.de> writes: > [ turned off persistent connections ] > Now there's still the odd 250MB backend lingering around for some time, > but it's not four or five of them any more, wich is a big gain when > there's 250MB swap around more often than not. So you haven't really solved the problem --- somewhere there is a query being issued that ramps the backend up to a lot of memory. All you've done is ensured that the backend won't hang around very long. The persistent connection isn't really at fault, except in that it causes backends to keep being used after their memory usage has become bloated. Although this might be enough to eliminate your immediate performance problem, you should keep looking. Starting lots more backends than you need to is a performance hit, so turning off persistent connections is really only a stopgap not a desirable answer. And I'm still wondering whether you've exposed a fixable memory leak bug. We need to identify exactly what query is causing the backends to eat memory. regards, tom lane
Markus Wollny wrote:
>
> Are there any known issues concerning persistent connections apart from
> the pg_pconnect thing with PHP? Is anyone running the same combination
> of *nix/Postgres + IIS/ColdFusion?

PostgreSQL 7.1 on i386-unknown-openbsd2.9, compiled by GCC 2.95.3
ColdFusion 4.5.1 SP2 on Windows NT 4 SP6a, ODBC 7.01.00.06

But it is all super light load; all the queries return 1 row, or they are cached on the ColdFusion side. The only issue I ever had was that I needed to explicitly cast text columns to varchar, or else the query would return an error (which I have been unable to reproduce with later ODBC drivers).

If you are running CF MX, try the JDBC driver. I haven't finished testing it, but so far it is looking good.

Jochem

--
Jochem van Dieten
Team Macromedia Volunteer for ColdFusion
http://www.macromedia.com/support/forums/team_macromedia/
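For readers hitting the same error with older ODBC drivers, here is a minimal sketch of the cast workaround Jochem describes. The table and column names are purely illustrative, not from his setup:

    -- Hypothetical table with a column of type text:
    CREATE TEMP TABLE notes (id integer, body text);

    -- Casting the text column to varchar worked around the ODBC error:
    SELECT id, body::varchar AS body FROM notes;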
Hi!

I see - so the beast is still roaming, but the current settings sort of keep it at bay for a while... I'd surely like to find the specific query or queries which cause this behaviour; all I could find out as yet was the database for the query. If I switch on query_stats and query output in the logfile, I get about 300-500MB of log per hour - even during low-load times. Scanning through these logs is like searching for a needle in a haystack. I can indeed identify those queries with exceptionally high execution times, but there's no mention of memory usage anywhere - so I'm still left in the dark. I can get the PIDs from top output and scan through the log, taking down execution times, but then there's nothing much out of the ordinary - mostly below 0.05 seconds or much, much less. Again I cannot guess at any correlation between execution time and memory usage.

As one backend processes lots of queries in sequence, I find it easy to imagine that no single one of these queries is in any way extraordinary, but that there might be some issue with freeing up memory that belonged to the predecessor when the backend begins a new query, thus grabbing more and more memory during its lifetime. With the current tools I know of, I shall have a hard time finding the error - if I ever do. I cannot claim however to really know a lot :) How would you, being "real DBAs" and knowing the innards of this particular DB probably much better than your very own, go about this task of tracking down the memory-eater?

Regards,

Markus

> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Thursday, June 27, 2002 16:41
> To: Markus Wollny
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] One source of constant annoyance identified
>
> [full reply snipped - quoted in Tom's mail above]
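One way to make those logs easier to correlate with top output - a sketch using the logging options as I remember them from the 7.2 postgresql.conf (treat the exact option names as assumptions to verify against your own config file):

    # postgresql.conf: prefix each log line with the backend PID and a
    # timestamp, and log every query, so a bloated PID seen in top can
    # be matched to the statements it has been executing.
    log_pid = true
    log_timestamp = true
    debug_print_query = true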
On Thu, 27 Jun 2002, Tom Lane wrote:

> So you haven't really solved the problem --- somewhere there is a query
> being issued that ramps the backend up to a lot of memory. All you've
> done is ensured that the backend won't hang around very long. The
> persistent connection isn't really at fault, except in that it causes
> backends to keep being used after their memory usage has become bloated.

Yeah, but if the queries after that are not using all of the mapped memory, that should be swapped out fairly quickly because the machine is short on memory. Same for memory leaks; if you're losing a lot of memory, you'd think there would be a fair number of pages you never touch that could then be swapped out.

The bloated processes, at least from the top fragment I saw, appear to have a working set of 200-250 MB; basically the entire data space is resident. So what's touching all of those pages often enough that they don't get swapped?

cjs
--
Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light. --XTC
Hi!

It seems I found one of the queries which suck up memory as if there were terabytes available.

If a user starts a search for e.g. "Ich brauche Mitleid" on one of our websites (site-ID is 43 in this example), we construct our select like this:

    select MESSAGE.BOARD_ID
         , MESSAGE.THREAD_ID
         , MESSAGE.MESSAGE_ID
         , MESSAGE.TITLE
         , MESSAGE.USER_ID
         , USERS.LOGIN
         , USERS.STATUS
         , USERS.RIGHTS
         , to_char(MESSAGE.CREATED,'DD.MM.YY hh24:mi') as DATUM
         , MESSAGE.COUNT_REPLY
         , (select count(*)
              from CT_COM_USER_THREAD_FOLLOW
             where USER_ID = '484387'
               and thread_id = MESSAGE.THREAD_ID) as TFUID
      from CT_COM_BOARD_MESSAGE MESSAGE
         , CT_COM_USER USERS
         , CT_COM_BOARD_RULES READRULE
         , CT_COM_SITE_BOARDS SITE
     where SITE.SITE_ID = '43'
       and (    lower(MESSAGE.TEXT)  like '%ich%'
             or lower(MESSAGE.TEXT)  like 'ich%'
             or lower(MESSAGE.TEXT)  like '%ich'
             or lower(MESSAGE.TITLE) like '%ich%'
             or lower(MESSAGE.TITLE) like 'ich%'
             or lower(MESSAGE.TITLE) like '%ich' )
       and (    lower(MESSAGE.TEXT)  like '%brauche%'
             or lower(MESSAGE.TEXT)  like 'brauche%'
             or lower(MESSAGE.TEXT)  like '%brauche'
             or lower(MESSAGE.TITLE) like '%brauche%'
             or lower(MESSAGE.TITLE) like 'brauche%'
             or lower(MESSAGE.TITLE) like '%brauche' )
       and (    lower(MESSAGE.TEXT)  like '%mitleid%'
             or lower(MESSAGE.TEXT)  like 'mitleid%'
             or lower(MESSAGE.TEXT)  like '%mitleid'
             or lower(MESSAGE.TITLE) like '%mitleid%'
             or lower(MESSAGE.TITLE) like 'mitleid%'
             or lower(MESSAGE.TITLE) like '%mitleid' )
       and MESSAGE.STATE_ID = 0
       and MESSAGE.USER_ID = USERS.USER_ID
       and USERS.STATUS > 0
       and SITE.BOARD_ID = MESSAGE.BOARD_ID
       and READRULE.BOARD_ID = MESSAGE.BOARD_ID
       and READRULE.RULE_ID = 1
       and READRULE.VALUE <= '5'
     order by MESSAGE.LAST_REPLY desc

Now I think it's the bit with the LIKEs that kills us, especially as the database refuses to create an index on MESSAGE.TEXT for being too big or whatever - search me, but it just wouldn't do it (the field is of type varchar with a maximum length of 10,000 characters). This query is a true killer, taking over 2 minutes to complete while eating up more than a quarter of a gig of memory. Oracle wasn't too bothered about this one, but PostgreSQL seems to act very differently...

Now as far as I have searched through the docs and the archives, there's this full-text-search method provided in contrib (http://developer.postgresql.org/cvsweb.cgi/pgsql/contrib/fulltextindex/?only_with_tag=REL7_2_STABLE); is this an equivalent of the ConText cartridge provided by Oracle? This lack of full-text search might be the main issue in our attempts to migrate from Oracle to PostgreSQL, so to me it looks like it might just be the saving straw. Please feel free to correct me if I'm wrong... I never had much experience with this CVS system and so far left it to those ambitious enough to tinker with the innards of their projects, but right now it seems like I am forced to risk a try... Unfortunately, from my point of view, this thing lacks a bit in terms of documentation - do I need to recompile the whole of PostgreSQL or just this bit? How would I go about installing it, up to the point of actually running it and making use of it on two columns (TITLE and TEXT in the MESSAGE table)? Or am I completely misled concerning this full-text-search option - and there's some cheaper way to speed things up without reducing functionality?

Regards,

Markus
LIKE '%dfjdsklfdfjdklfjds' is what is killing you.

LIKE 'sdfdklf%' can be indexed, but the leading wildcard forces an index search.

Markus Wollny wrote:
> Hi!
>
> It seems I found one of the queries which suck up memory as if there
> were terabytes available.

--
P. J. "Josh" Rovero                Sonalysts, Inc.
Email: rovero@sonalysts.com        www.sonalysts.com
215 Parkway North                  Work: (860)326-3671 or 442-4355
Waterford CT 06385
***********************************************************************
Should have said: "leading % forces a sequential scan". Sorry.

P.J. "Josh" Rovero wrote:
> LIKE '%dfjdsklfdfjdklfjds' is what is killing you.
>
> LIKE 'sdfdklf%' can be indexed, but the leading wildcard
> forces an index search.

--
P. J. "Josh" Rovero                Sonalysts, Inc.
Email: rovero@sonalysts.com        www.sonalysts.com
215 Parkway North                  Work: (860)326-3671 or 442-4355
Waterford CT 06385
***********************************************************************
Curt Sampson <cjs@cynic.net> writes:
> On Thu, 27 Jun 2002, Tom Lane wrote:
>> So you haven't really solved the problem --- somewhere there is a query
>> being issued that ramps the backend up to a lot of memory. All you've
>> done is ensured that the backend won't hang around very long. The
>> persistent connection isn't really at fault, except in that it causes
>> backends to keep being used after their memory usage has become bloated.

> Yeah, but if the queries after that are not using all of the mapped
> memory, that should be swapped out fairly quickly because the
> machine is short on memory.

And the swapping activity is exactly the problem, isn't it?

In any case, we can't make much progress until we identify the query that is making the backend's address space grow. (Markus, you don't happen to have SORT_MEM set to a large value, do you?)

			regards, tom lane
> And the swapping activity is exactly the problem, isn't it?

Yupp, I guess so; once swapping is reduced, there should be not only much more fast memory available to the queries but also more processing time.

> In any case, we can't make much progress until we identify the query
> that is making the backend's address space grow.

This here is one of them (see full text in my mail from 13:25):

> -----Original Message-----
> From: Markus Wollny
> Sent: Friday, June 28, 2002 13:25
> To: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] One source of constant annoyance identified
>
> Hi!
>
> It seems I found one of the queries which suck up memory as if there
> were terabytes available.
>
> [full search query snipped - quoted in the 13:25 mail above]

> (Markus, you don't happen to have SORT_MEM set to a large value,
> do you?)

Not at all - recommendations are 2-4% of available RAM AFAIK. On a 1GB machine this would be 21000 to 42000KB. Nevertheless I reduced it from these values to a meagre sort_mem = 8192, that's 8MB, but to no avail. Reducing this value any further doesn't really make too much sense, does it?

Anyway, I think that one of our main problems is the lack of options for full-text indexing. I'm currently trying to find out how to get FTI (from CVS/contrib) up and running, because I hope that this will solve 90% of my problems, if not more :)

Regards,

Markus
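As a side note on the setting Tom asked about: sort_mem is measured in kilobytes and is a ceiling per sort operation, not per server, so several concurrent sorts can each claim that much. A minimal sketch of adjusting it without a restart, assuming a 7.2 server:

    -- Per-session override; postgresql.conf supplies the default.
    SET sort_mem = 8192;   -- 8 MB per sort operation
    SHOW sort_mem;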
On Fri, Jun 28, 2002 at 07:40:46AM -0400, P.J. Josh Rovero wrote:
> LIKE '%dfjdsklfdfjdklfjds' is what is killing you.
>
> LIKE 'sdfdklf%' can be indexed, but the leading wildcard
> forces an index search.

You mean it forces a sequential scan, of course.

Cheers,

Neil

--
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC
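To make the distinction concrete, a small sketch (the index name is hypothetical; note also that in 7.2 a btree index can only serve LIKE when the cluster was initdb'ed in the C locale, which, as emerges later in this thread, is not the case here):

    -- Hypothetical index on the title column:
    CREATE INDEX idx_message_title ON ct_com_board_message (title);

    -- Anchored pattern: an index scan is possible (C locale only).
    SELECT message_id FROM ct_com_board_message
     WHERE title LIKE 'mitleid%';

    -- Leading %: every row must be examined, so a sequential scan.
    SELECT message_id FROM ct_com_board_message
     WHERE title LIKE '%mitleid%';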
"Markus Wollny" <Markus.Wollny@computec.de> writes: > lower(MESSAGE.TEXT) like '%ich%' > or lower(MESSAGE.TEXT) like 'ich%' > or lower(MESSAGE.TEXT) like '%ich' Is whoever wrote this under the misimpression that % can't match zero characters? You could reduce the number of LIKE tests by a factor of 3, because the foo% and %foo tests are completely redundant. But, back to the problem at hand --- it seems like a fair bet that we must have a memory leak in lower() or LIKE or both. Did you build with locale or multibyte (or both) enabled? If so, what locale and what database encoding are you using, respectively? regards, tom lane
Hi!

> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Friday, June 28, 2002 17:03
> To: Markus Wollny
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] One source of constant annoyance identified
>
> "Markus Wollny" <Markus.Wollny@computec.de> writes:
> >            lower(MESSAGE.TEXT) like '%ich%'
> >         or lower(MESSAGE.TEXT) like 'ich%'
> >         or lower(MESSAGE.TEXT) like '%ich'
>
> Is whoever wrote this under the misimpression that % can't match zero
> characters? You could reduce the number of LIKE tests by a factor of 3,
> because the foo% and %foo tests are completely redundant.

Wasn't me :) I think there might be the odd generous wastage of processing time still in the code, just because we could afford it under Oracle. We intend to implement this very bit using regular expressions, as we hope that this will improve performance a bit. So we might get away without using LIKE at all in this particular case. We cannot, however, remove LIKE completely from every bit of code.

> But, back to the problem at hand --- it seems like a fair bet that
> we must have a memory leak in lower() or LIKE or both. Did you build
> with locale or multibyte (or both) enabled? If so, what locale and
> what database encoding are you using, respectively?

Compilation options were:

    --enable-locale
    --enable-recode
    --enable-multibyte
    --with-perl
    --enable-odbc
    --enable-syslog

Environment variables RC_LANG/LC_CTYPE are set to de_DE@euro, and the encoding for the databases is SQL_ASCII. As we are hosting German websites and communities, we need to sort data containing the characters ÄÖÜäöüß in the correct context (AÄ, OÖ, UÜ, sß), so I figured we'd need locale support.

Would it be worth a try recompiling without multibyte enabled? Can I dump/reimport the current DB afterwards?

Regards,

Markus
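A quick way to check whether the locale collation is actually in effect - a sketch with a throwaway table (names and data are illustrative):

    CREATE TEMP TABLE umlaut_test (s varchar(20));
    INSERT INTO umlaut_test VALUES ('Apfel');
    INSERT INTO umlaut_test VALUES ('Äpfel');
    INSERT INTO umlaut_test VALUES ('Zebra');

    -- Under LC_COLLATE=de_DE@euro, 'Äpfel' sorts next to 'Apfel';
    -- under LC_COLLATE=C it sorts by byte value, after 'Zebra'.
    SELECT s FROM umlaut_test ORDER BY s;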
Hi,

On Fri, Jun 28, 2002 at 05:30:58PM +0200, Markus Wollny wrote:
>
> Compilation options were:
>
>     --enable-locale
>     --enable-recode
>     --enable-multibyte
>     --with-perl
>     --enable-odbc
>     --enable-syslog
>
> Environment variables RC_LANG/LC_CTYPE are set to de_DE@euro, and the
> encoding for the databases is SQL_ASCII. As we are hosting German
> websites and communities, we need to sort data containing the
> characters ÄÖÜäöüß in the correct context (AÄ, OÖ, UÜ, sß), so I
> figured we'd need locale support.

AFAIK the compile options are OK, but did you run the initdb command with the proper LC_ALL env set? It should be de_DE or de_DE@euro.

BTW: I found a wrong i18n source file on my Linux distribution (Slackware 7.x). After fixing this and compiling the new locale, everything worked fine. It's corrected in the newer Slackware.

-tb
--
Thomas Beutin                                  tb@laokoon.IN-Berlin.DE
Beam me up, Scotty. There is no intelligent life down in Redmond.
Hi!

> -----Original Message-----
> From: Thomas Beutin
> Sent: Fri, June 28, 2002 19:43
> To: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] One source of constant annoyance identified
>
> AFAIK the compile options are OK, but did you run the initdb command
> with the proper LC_ALL env set? It should be de_DE or de_DE@euro.

For the initdb option "--locale=" I found in the documentation: "If this option is not specified, the locale is inherited from the environment that initdb runs in." So I presume that if I didn't explicitly set it to anything other than my current environment, which I didn't, it should be okay.

> BTW: I found a wrong i18n source file on my Linux distribution
> (Slackware 7.x). After fixing this and compiling the new locale,
> everything worked fine. It's corrected in the newer Slackware.

I use SuSE 7.3, a German distro, so I cannot imagine that some serious problem with German localization would have gone unnoticed. Of course I did not use the PostgreSQL version provided with the distro - I got the latest stable release 7.2.1 and compiled it myself. So I guess that's not the source of the problem.

Regards,

Markus
"Markus Wollny" <Markus.Wollny@computec.de> writes: > For initdb-option "--locale=" I found in the documentation "If this > option is not specified, the locale is inherited from the environment > that initdb runs in." So I presume that if I didn't explicitly set it to > anything other than my current environment, which I didn't, it should be > okay. If you're not certain about this, compile contrib/pg_controldata and see what it says the locale values in your pg_control file are. regards, tom lane
On Fri, 28 Jun 2002, Tom Lane wrote:

> > Yeah, but if the queries after that are not using all of the mapped
> > memory, that should be swapped out fairly quickly because the
> > machine is short on memory.
>
> And the swapping activity is exactly the problem, isn't it?

That particular swapping activity would not be a problem. Memory that's not used gets paged out and that's the end of it. The problem is that something is *using* that memory, so it's not being paged out, or if it does get paged out, it gets paged back in again.

cjs
--
Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light. --XTC
Curt Sampson <cjs@cynic.net> writes:
> On Fri, 28 Jun 2002, Tom Lane wrote:
>> And the swapping activity is exactly the problem, isn't it?

> That particular swapping activity would not be a problem. Memory
> that's not used gets paged out and that's the end of it. The problem
> is that something is *using* that memory, so it's not being paged
> out, or if it does get paged out, it gets paged back in again.

Yeah --- but typical implementations of malloc are very pager-unfriendly; they tend to traverse data structures that consist of a word or two at the head of each randomly-sized chunk of data or former-now-freed data. PG adds its own layer of not-very-paging-friendly allocation logic on top of whatever sins your local malloc may commit.

Bottom line is that a PG backend that's swollen to a couple hundred MB is trouble. Don't assume it'll play nice with the swapper; it won't.

			regards, tom lane
Hi!

I did as you suggested - and the locale was indeed not set correctly, it seems. Here's the output:

    pg_control version number:            71
    [...]
    LC_COLLATE:                           C
    LC_CTYPE:                             de_DE@euro

So LC_COLLATE wasn't set correctly; I dumped all databases, recompiled (without multibyte and recode support this time, just to reduce possible sources of failure), did initdb again and reimported the dumpfile. The pg_controldata output is now:

    pg_control version number:            71
    [...]
    LC_COLLATE:                           de_DE@euro
    LC_CTYPE:                             de_DE@euro

Configure options were:

    ./configure --prefix=/opt/pgsql/ --with-perl --enable-odbc --enable-locale --enable-syslog

So that's most definitely fixed. It didn't do anything for me, though - there are still several giant backends, as you can see from the top output:

    12:29pm  up 1 day, 21:04,  3 users,  load average: 1.44, 1.13, 1.18
    85 processes: 81 sleeping, 4 running, 0 zombie, 0 stopped
    CPU0 states: 87.3% user,  9.2% system,  0.0% nice,  3.1% idle
    CPU1 states: 34.0% user,  6.0% system,  0.0% nice, 59.5% idle
    CPU2 states: 47.3% user,  1.4% system,  0.0% nice, 50.5% idle
    CPU3 states: 44.6% user,  7.3% system,  0.0% nice, 47.3% idle
    Mem:  1029400K av, 1002272K used,   27128K free,  0K shrd,  5352K buff
    Swap: 2097136K av,   13160K used, 2083976K free            842792K cached

     PID  USER     PRI NI SIZE  RSS  SHARE STAT %CPU %MEM TIME COMMAND
     9269 postgres  10  0 257M  257M  256M S    96.2 25.5 0:54 postmaster
     9264 postgres   9  0 250M  250M  249M S     1.9 24.9 0:42 postmaster
     9375 postgres   9  0 127M  127M  126M S     1.2 12.6 0:05 postmaster
     9372 postgres   9  0 106M  106M  105M S     3.9 10.5 0:26 postmaster
     9312 postgres   9  0 68900  67M 67616 S     1.8  6.6 0:02 postmaster
     9379 postgres   9  0 41144  40M 39880 S     2.8  3.9 0:04 postmaster
     9377 postgres   9  0 39276  38M 38104 S     0.0  3.8 0:14 postmaster
     9380 postgres   9  0 34548  33M 33388 S     0.0  3.3 1:14 postmaster
     9381 postgres   9  0 32140  31M 30760 S     0.0  3.1 0:26 postmaster
     9373 postgres   9  0 24068  23M 23064 S     0.0  2.3 0:01 postmaster
     9353 postgres   9  0 22224  21M 20332 S     7.8  2.1 0:37 postmaster
     9371 postgres   9  0 22036  21M 21032 S     0.0  2.1 0:01 postmaster
     9382 postgres   9  0 17840  17M 16652 S     0.0  1.7 0:01 postmaster
     9268 postgres  12  0 16104  15M 15036 R     0.7  1.5 0:16 postmaster
     9352 postgres   9  0 15584  15M 14076 S     0.0  1.5 0:19 postmaster
     9351 postgres  19  0 14644  14M 13080 R    36.9  1.4 0:16 postmaster
     9313 postgres   9  0 14068  13M 13036 S     5.1  1.3 0:14 postmaster
     9374 postgres   9  0 13996  13M 12976 S     3.1  1.3 0:14 postmaster
     9354 postgres   9  0 13116  12M 12088 S     1.3  1.2 0:03 postmaster
     9376 postgres   9  0 12744  12M 11732 S     0.9  1.2 0:02 postmaster
     9452 postgres   9  0 12356  12M 11208 S     0.0  1.1 0:10 postmaster
     9370 postgres   9  0 12300  12M 11332 S     1.8  1.1 0:02 postmaster
     9355 postgres   9  0 11768  11M 10756 S     0.0  1.1 0:00 postmaster
     9524 postgres   9  0 10744  10M  9740 S     3.6  1.0 0:00 postmaster
     9476 postgres   9  0 10020 9.8M  9044 S     1.9  0.9 0:00 postmaster
     9602 postgres  18  0  9472 9468  8356 R    58.0  0.9 0:03 postmaster
     9535 postgres   9  0  9368 9364  8216 S     0.7  0.9 0:00 postmaster

Any more ideas?

Regards,

Markus

> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Friday, June 28, 2002 22:42
> To: Markus Wollny
> Cc: Thomas Beutin; pgsql-general@postgresql.org
> Subject: Re: [GENERAL] One source of constant annoyance identified
>
> If you're not certain about this, compile contrib/pg_controldata and
> see what it says the locale values in your pg_control file are.
>
>			regards, tom lane
"Markus Wollny" <Markus.Wollny@computec.de> writes: > Now I think it's the bit with the LIKEs that kills us, That's what I thought too, but I've tried and failed to reproduce any memory leak with lower/LIKE and the same configuration options that you used. It might be that some other part of the query is the problem, or maybe I'm not duplicating the setup correctly. Could I trouble you for the exact schemas of the tables used by the problem query? (The output of pg_dump -s would be the best thing to send.) regards, tom lane
Hello!

Sorry I took so long - I have attached the schema as requested.

Actually, it seems to be very often the case that certain operations suck up more than 25% of available memory and processing capacity. I managed to customize and install the full-text-index option (fti) from the contrib directory (substrings are at least 3 characters in length for us, and I defined a list of stopwords not to be included). Right now I have started filling the fti table with the substrings; I tried using the supplied Perl script, but the results are quite dissatisfactory because it doesn't exclude any stopwords, nor does it limit itself to alphanumeric characters - you may be able to imagine what happens when there are loads of kiddies posting "!!!!!!!!!!!!!!!!!!" (ad nauseam) and suchlike in quite a lot of postings. So I used the already implemented trigger to execute the fti function:

    update ct_com_board_message
       set state_id = 0
     where state_id = 0
       and to_char(last_reply, 'yyyymmdd') between '20020301' and '20020830';

I took a quick look at top: even this humble query causes memory and processor load like a giant: 266M RAM, 38.3% processor time, 26.4% memory usage. Okay, it's calling the trigger for each row, which in turn inserts some new tuples into ct_com_board_fti, but is it expected to cause so much load?

Regards,

Markus

> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Sunday, June 30, 2002 21:34
> To: Markus Wollny
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] One source of constant annoyance identified
>
> That's what I thought too, but I've tried and failed to reproduce any
> memory leak with lower/LIKE and the same configuration options that
> you used. It might be that some other part of the query is the
> problem, or maybe I'm not duplicating the setup correctly. Could I
> trouble you for the exact schemas of the tables used by the problem
> query? (The output of pg_dump -s would be the best thing to send.)
>
>			regards, tom lane
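For readers unfamiliar with contrib/fulltextindex: once the trigger has populated the fti table, searches join it back to the base table instead of scanning with LIKE. A sketch assuming ct_com_board_fti follows the contrib README layout (a string column plus an id column holding the base row's oid - verify against the actual trigger definition):

    -- Find messages containing both 'brauche' and 'mitleid' via the
    -- substring table (column names string/id are assumptions):
    SELECT m.message_id, m.title
      FROM ct_com_board_message m,
           ct_com_board_fti f1,
           ct_com_board_fti f2
     WHERE f1.string = 'brauche'
       AND f2.string = 'mitleid'
       AND f1.id = f2.id
       AND m.oid = f1.id;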
"Markus Wollny" <Markus.Wollny@computec.de> writes: > Sorry I took so long - I attached the schema as asked. Thanks. But I'm still unable to reproduce the memory bloat you see on SELECTs. This seems very peculiar. You said you were running SuSE 7.3 --- how recent is that? Which glibc version is it running? (I've been reduced to speculating about memory leakage inside libc, which is a pretty long shot but...) > So used the already implemented trigger to > execute the fti-function: > update ct_com_board_message > set state_id=3D0 > where state_id=3D0 > and to_char(last_reply, 'yyyymmdd') between '20020301' and '20020830'; > I took a quick look at top: Even this humble query causes memory- and > processor-load like a giant: 266M RAM, 38.3% processor time, 26.4% > memory usage. Okay, it's calling the trigger for each row which in turn > inserts some new tuples into ct_com_board_fti, but is it expected to > cause so much load? Wouldn't surprise me. Since you're using an AFTER trigger, the pending trigger events have to be saved up for commit time, so the list of pending events is going to grow quite large. (How many rows do you have in ct_com_board_message, anyway? How many did that query try to update?) This however does not explain your problem with SELECT, since selects don't fire triggers. Could I see the output of EXPLAIN for that problem SELECT on your machine? regards, tom lane
On Tue, 2 Jul 2002, Tom Lane wrote:

> "Markus Wollny" <Markus.Wollny@computec.de> writes:
> > Sorry I took so long - I have attached the schema as requested.
>
> [...]
>
> Wouldn't surprise me. Since you're using an AFTER trigger, the pending
> trigger events have to be saved up for commit time, so the list of
> pending events is going to grow quite large. (How many rows do you
> have in ct_com_board_message, anyway? How many did that query try to
> update?)

Whoa, that's what I was trying to remember. I had problems at one time when loading a large amount of data into a table with a txtidx type column. It might not have been a memory problem I had, though; it could just have been slow loading. I was loading with COPY in a transaction and ended up just doing the COPY outside of a transaction. It still took a while, but then it's only a low-powered machine.

If that wasn't the process footprint growing huge, then that problem was occurring for me when doing selects. I can't remember what it was that I did that fixed it, though. I wonder if it's in the list's archives, since the issue was raised here.

> This however does not explain your problem with SELECT, since
> selects don't fire triggers.
>
> Could I see the output of EXPLAIN for that problem SELECT on your
> machine?

--
Nigel J. Andrews
Director
---
Logictree Systems Limited
Computer Consultants
Hello!

Thank you very much for your efforts - we appreciate that very much :)

> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Wednesday, July 3, 2002 00:28
> To: Markus Wollny
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] One source of constant annoyance identified
>
> Thanks. But I'm still unable to reproduce the memory bloat you see on
> SELECTs. This seems very peculiar. You said you were running SuSE 7.3
> --- how recent is that? Which glibc version is it running? (I've been
> reduced to speculating about memory leakage inside libc, which is a
> pretty long shot but...)

I agree - it is a long shot: SuSE 7.3 ships kernel 2.4.10 and glibc 2.2.4; it was released in October 2001. I tried using SuSE 8.0, released in late April this year, but I was more than unhappy with some of the new "features" - and it seems that hardly anybody has switched to 8.0 for server usage as yet. Generally SuSE 7.3 (which is probably by far the most popular distro in Germany) is considered quite stable and current enough for server usage. Would it really be worth the hassle of updating glibc to 2.2.5?

> > I took a quick look at top: even this humble query causes memory and
> > processor load like a giant: 266M RAM, 38.3% processor time, 26.4%
> > memory usage. Okay, it's calling the trigger for each row, which in
> > turn inserts some new tuples into ct_com_board_fti, but is it
> > expected to cause so much load?
>
> Wouldn't surprise me. Since you're using an AFTER trigger, the pending
> trigger events have to be saved up for commit time, so the list of
> pending events is going to grow quite large.

Okay, so there are indeed situations when this kind of backend size can be considered normal. That's some sort of relief :)

> (How many rows do you have in ct_com_board_message, anyway? How many
> did that query try to update?) This however does not explain your
> problem with SELECT, since selects don't fire triggers.

Currently there are 362,154 rows in ct_com_board_message and 85,101 rows in ct_com_user. I don't know if this can be considered a lot; we expect that to grow at an accelerating rate during the next months, so more than a million rows can be expected within the next 12 months or so. We scarcely use any database-specific features like triggers as yet; it's 99.99% pure SQL - tables, indexes, data, selects, inserts, updates, deletes, no "fancy stuff". We'll have to get to know PostgreSQL better before using anything specific; we never needed to get too deep into Oracle either, which was an advantage when porting the whole thing over to PostgreSQL, but it seems that the latter calls for a bit more attention and know-how...

> Could I see the output of EXPLAIN for that problem SELECT on your
> machine?
We are currently working on a new version of the search I gave you before, but here's another one that's causing similar symptoms - extra long running time and most probably huge backends (it's 10:20 am, far from peak time, and we already have ~280MB swap and six backends >100MB). This one generates an overview of all the forum threads in one board, which has 41,624 messages, 2,971 of them thread-starters (FATHER_ID=0):

    select MESSAGE.BOARD_ID
         , MESSAGE.THREAD_ID
         , MESSAGE.FATHER_ID
         , MESSAGE.MESSAGE_ID
         , MESSAGE.USER_ID
         , MESSAGE.USER_LOGIN as LOGIN
         , MESSAGE.USER_STATUS as STATUS
         , MESSAGE.USER_RIGHTS as RIGHTS
         , MESSAGE.TITLE
         , MESSAGE.COUNT_REPLY as COUNT_REPLY
         , to_char(MESSAGE.LAST_REPLY,'DD.MM.YY hh24:mi') as LAST_REPLY
         , round((date_part('epoch',CURRENT_TIMESTAMP)
                - date_part('epoch',MESSAGE.LAST_REPLY))/60) as diff_posting
         , to_char(MESSAGE.CREATED,'DD.MM.YY hh24:mi') as DATUM
         , (select count(*)
              from CT_COM_USER_THREAD_FOLLOW
             where USER_ID = '295798'
               and thread_id = MESSAGE.THREAD_ID) as TFUID
      from CT_COM_BOARD_MESSAGE MESSAGE
     where (0=0)
       and MESSAGE.BOARD_ID = 10
       and MESSAGE.FATHER_ID = 0
       and MESSAGE.STATE_ID = 0
     order by MESSAGE.LAST_REPLY desc

    Sort  (cost=30695.27..30695.27 rows=7693 width=154)
          (actual time=9745.94..9751.58 rows=4663 loops=1)
      ->  Index Scan using idx_bm_show_topics on ct_com_board_message message
            (cost=0.00..30198.72 rows=7693 width=154)
            (actual time=111.56..9549.99 rows=4663 loops=1)
            SubPlan
              ->  Aggregate  (cost=5.83..5.83 rows=1 width=0)
                    (actual time=0.91..0.91 rows=1 loops=4663)
                    ->  Index Scan using idx_user_thread_follow on ct_com_user_thread_follow
                          (cost=0.00..5.83 rows=1 width=0)
                          (actual time=0.88..0.88 rows=0 loops=4663)
    Total runtime: 9835.57 msec

I do hope you can make anything of this...

Regards,

Markus
Okay, now it's even more annoying...

We just upgraded the server from 1GB to 2GB of RAM because it kept swapping out about 300MB. I updated postgresql.conf accordingly, setting

    max_connections = 190 (this is the minimum we need to satisfy webservers and backend jobs)
    shared_buffers = 60000 (that's 468.75MB; I took this value assuming a recommended value of 25% of RAM)
    sort_mem = 80000 (that's 78.125MB; recommended value is 4% of RAM, equalling 82MB)

Now top output sorted by memory usage turns out to be:

    2:43pm  up 1:34,  3 users,  load average: 3.29, 2.25, 2.08
    123 processes: 118 sleeping, 5 running, 0 zombie, 0 stopped
    CPU0 states: 60.5% user,  3.1% system,  0.0% nice, 35.4% idle
    CPU1 states: 47.3% user, 15.3% system,  0.0% nice, 36.4% idle
    CPU2 states: 51.0% user, 17.1% system,  0.0% nice, 31.3% idle
    CPU3 states: 75.4% user,  4.0% system,  0.0% nice, 20.0% idle
    Mem:  2061560K av, 2054268K used,    7292K free,  0K shrd, 13924K buff
    Swap: 2097136K av,    7788K used, 2089348K free           1825104K cached

     PID  USER     PRI NI SIZE  RSS  SHARE STAT %CPU %MEM TIME COMMAND
     1652 postgres   9  0 469M  469M  467M S     0.0 23.3 2:21 postmaster
     2041 postgres   9  0 431M  431M  429M S     1.2 21.4 1:32 postmaster
     1588 postgres   9  0 411M  411M  410M S    15.6 20.4 1:25 postmaster
     1597 postgres   9  0 289M  289M  287M S     0.1 14.3 0:31 postmaster
     1849 postgres   9  0 259M  259M  258M S     0.0 12.9 0:14 postmaster
     2046 postgres   9  0 239M  239M  238M S     0.0 11.9 0:25 postmaster
     1973 postgres   9  0 172M  172M  171M S     0.0  8.5 0:21 postmaster
     2142 postgres   9  0 128M  128M  127M S     0.0  6.3 0:04 postmaster
     2156 postgres   9  0 116M  116M  114M S     0.0  5.7 0:02 postmaster
     1598 postgres   9  0 86548  84M 84752 S     1.7  4.1 2:36 postmaster
     1608 postgres   9  0 60932  59M 59356 S     0.0  2.9 2:09 postmaster
     1582 postgres   9  0 57624  56M 55444 S     0.0  2.7 1:58 postmaster
     1609 postgres   9  0 56408  55M 55164 S     8.3  2.7 2:05 postmaster
     1766 postgres   9  0 45248  44M 43824 S     0.0  2.1 1:09 postmaster
     2139 postgres   9  0 45276  44M 43892 S     0.0  2.1 0:01 postmaster
     2045 postgres   9  0 41500  40M 39820 S     1.7  2.0 2:43 postmaster
     1610 postgres   9  0 41336  40M 40172 S     0.0  2.0 0:53 postmaster
     2044 postgres  14  0 38328  37M 37056 R    25.0  1.8 0:33 postmaster
     1881 postgres   9  0 34936  34M 33676 S    17.2  1.6 0:29 postmaster
     2042 postgres   9  0 33144  32M 31920 S     3.7  1.6 1:13 postmaster
     1679 postgres   9  0 32516  31M 31288 S     0.0  1.5 0:05 postmaster
     1678 postgres   9  0 31996  31M 30812 S     0.0  1.5 0:19 postmaster
     1653 postgres   9  0 29424  28M 28180 S     0.0  1.4 0:05 postmaster
     2048 postgres   9  0 27772  27M 26556 S     0.0  1.3 0:09 postmaster
     1802 postgres   9  0 26676  26M 25504 S     0.0  1.2 0:06 postmaster
     2211 postgres   9  0 25940  25M 24592 S     0.0  1.2 0:02 postmaster
     2047 postgres  14  0 25588  24M 23532 R     8.0  1.2 0:55 postmaster
     2065 postgres  11  0 25584  24M 24404 S     3.9  1.2 0:04 postmaster
     1980 postgres  11  0 24584  24M 22864 S     3.5  1.1 0:07 postmaster
     1872 postgres   9  0 23908  23M 22800 S     0.0  1.1 0:04 postmaster
     2068 postgres   9  0 21352  20M 20188 S     0.0  1.0 0:01 postmaster
     2138 postgres   9  0 20928  20M 19644 S     0.0  1.0 0:06 postmaster
     1983 postgres   9  0 20544  20M 19344 S     0.0  0.9 0:04 postmaster
     2342 postgres   9  0 20352  19M 13772 S     0.0  0.9 0:23 postmaster
     2357 postgres  15  0 20260  19M 18532 R    93.1  0.9 1:34 postmaster
     2204 postgres   9  0 19816  19M 18244 S     0.0  0.9 0:01 postmaster
     2199 postgres   9  0 16840  16M 15452 S     0.0  0.8 0:00 postmaster
     2207 postgres   9  0 16784  16M 15512 S     0.0  0.8 0:00 postmaster
     2050 postgres   9  0 15880  15M 14136 S     0.0  0.7 0:06 postmaster
     2200 postgres   9  0 15568  15M 14080 S     0.0  0.7 0:00 postmaster
     2301 postgres   9  0 15076  14M 13940 S     0.0  0.7 0:00 postmaster
     2236 postgres   9  0 14132  13M 12824 S     0.0  0.6 0:00 postmaster
     2346 postgres   9  0 14080  13M 12972 S     0.0  0.6 0:09 postmaster
     2347 postgres   9  0 14064  13M 12960 S     0.0  0.6 0:09 postmaster
     2205 postgres   9  0 13904  13M 12412 S     0.0  0.6 0:00 postmaster
     2339 postgres  11  0 12660  12M 11448 S    36.1  0.6 0:27 postmaster

which is not at all good...

And still the major problem in finding the cause of all this is to identify the query which causes these huge backends. At the moment I tend to suspect that there certainly are several queries which need a bit of RAM - not that much, however - and the backend fails to free up the memory used by processed queries. What puzzles me here is that all these 400MB+ backend processes are marked as idle (ps output) respectively sleeping, so they don't actually do much. But how come a sleeping process grabs 20% of 2GB of RAM and 15% of processing time?

Another fact that hints at backends just eating memory without reason is that when I stop the database, processes keep lingering - kill just doesn't help, I have to kill -9 the last of the lot to get rid of them before restarting the database:

    dunkles:/var/lib/pgsql/data/base # /etc/init.d/postgresql stop
    Shutting down PostgreSQL done
    dunkles:/var/lib/pgsql/data/base # ps ax|grep post
      863 pts/1    S    0:00 login -- postgres
      885 pts/2    S    0:00 login -- postgres
     1552 pts/0    S    0:01 /opt/pgsql/bin/postmaster -i -D/var/lib/pgsql/data/base
     1553 pts/0    S    0:00 postgres: stats buffer process
     1554 pts/0    S    0:01 postgres: stats collector process
     1650 pts/0    S    0:00 postgres: postgres template1 212.123.109.25 idle
     1681 pts/0    S    0:00 postgres: postgres abo 212.123.109.25 idle
     1682 pts/0    S    0:00 postgres: postgres bluebox 212.123.109.25 idle
     1683 pts/0    S    0:00 postgres: postgres kidszone 212.123.109.25 idle
     1684 pts/0    S    0:00 postgres: postgres mcv 212.123.109.25 idle
     1685 pts/0    S    0:00 postgres: postgres mpo 212.123.109.25 idle
     1686 pts/0    S    0:00 postgres: postgres nzone 212.123.109.25 idle
     1687 pts/0    S    0:00 postgres: postgres pcaction 212.123.109.25 idle
     1688 pts/0    S    0:00 postgres: postgres pcgames 212.123.109.25 idle
     1689 pts/0    S    0:00 postgres: postgres phppgadmin 212.123.109.25 idle
     1690 pts/0    S    0:00 postgres: postgres pszone 212.123.109.25 idle
     1691 pts/0    S    0:00 postgres: postgres saturn 212.123.109.25 idle
     1693 pts/0    S    0:00 postgres: postgres template1 212.123.109.25 idle
     1780 pts/0    S    0:00 postgres: postgres template1 212.123.108.149 idle
     1781 pts/0    S    0:00 postgres: postgres template1 212.123.108.149 idle
     1784 pts/0    S    0:00 postgres: postgres pcgames 212.123.108.149 idle

This just has some sort of Windows look & feel to it - processes not responding any more.

Now I have set back the original values from before the RAM upgrade: shared_buffers = 32768 (256MB) and sort_mem = 51200 (50MB). The limit for backend size now seems to be some 265-odd MB (about four to six of those around at any given time, as always), and top reports:

    3:11pm  up 2:03,  3 users,  load average: 3.24, 1.65, 1.81
    163 processes: 156 sleeping, 7 running, 0 zombie, 0 stopped
    CPU0 states: 81.3% user, 15.5% system,  0.0% nice,  2.2% idle
    CPU1 states: 67.1% user, 26.0% system,  0.0% nice,  6.3% idle
    CPU2 states: 69.2% user, 21.0% system,  0.0% nice,  9.3% idle
    CPU3 states: 65.4% user, 20.2% system,  0.0% nice, 13.3% idle
    Mem:  2061560K av, 2025392K used,   36168K free,  0K shrd, 13108K buff
    Swap: 2097136K av,    7732K used, 2089404K free           1750556K cached

Which isn't quite as bad any more - although it cannot be the optimum performance for this machine, I can't imagine.

Now what on earth can I do to get this DB running - and when I say "running" I don't mean "sort of crawling uphill"? Are there any OS patches or environment settings I need to consider? Are there known memory leaks? It just doesn't seem to matter whichever settings I pick in postgresql.conf; the database just eats up any available memory in any case. I cannot use it for production purposes this way. I am quite despairing right now - and I am definitely running out of time. Which bits of Linux (glibc, kernel, whatnot) does PostgreSQL 7.2.1 need, and in which version, to run smoothly? As I mentioned before, we don't use any "fancy" features of the database like foreign keys or triggers (except one); it's just basic functionality that seems to fall down on us...

I am willing to try almost anything - but I need to squeeze more performance out of this thing and I need some hints on which tools to use to identify the problem. And unfortunately I need all this as soon as possible... Hints and help are very, very much appreciated. Thank you!

Regards,

Markus
Hello!

Scanning Google for strange backend behaviour of PostgreSQL, I stumbled over the following bug report:

http://postgresql.linux.cz/mhonarc/pgsql-ports/1999-01/msg00001.html

It's referring to a very old version of PostgreSQL (6.0), but could it be that we suffer from the very same phenomenon? I do notice "connection reset by peer" messages in the logfile - I always thought that this happened when a user terminated his request or a connection broke for other (mostly harmless) reasons. Now to me it seems quite likely that our backends sort of "go to sleep" too, hogging memory and (for some reason) processing time as well. Has the mentioned bug report been addressed since then?

Regards,

Markus
"Markus Wollny" <Markus.Wollny@computec.de> writes: > http://postgresql.linux.cz/mhonarc/pgsql-ports/1999-01/msg00001.html > Has the mentioned bug report been adressed since then? I don't think it's a bug. TCP is not designed to detect connection failure instantly, and it's definitely not designed to waste network resources when neither side of a connection is sending anything. In the scenario described in the report, the backend is sitting idle and there's no way to know that the client side thinks the connection has failed. In recent releases we set the keepalive feature on, which means that after a sufficiently long timeout the server side will probe to see if the client is still there, and will then discover that the connection is dead. Whereupon the backend will receive a kernel error report and will exit. But up to that moment, there's no reason to think that anything is wrong. regards, tom lane
On Wed, Jul 03, 2002 at 03:23:39PM +0200, Markus Wollny wrote:
> Okay, now it's even more annoying...
>
> We just upgraded the server from 1GB to 2GB of RAM because it kept
> swapping out about 300MB.

Well, you've solved the swapping problem.

>  PID  USER     PRI NI SIZE  RSS  SHARE STAT %CPU %MEM TIME COMMAND
>  1652 postgres   9  0 469M  469M  467M S     0.0 23.3 2:21 postmaster
>  2041 postgres   9  0 431M  431M  429M S     1.2 21.4 1:32 postmaster
>  1588 postgres   9  0 411M  411M  410M S    15.6 20.4 1:25 postmaster
>  1597 postgres   9  0 289M  289M  287M S     0.1 14.3 0:31 postmaster

That %CPU refers to the average CPU over the lifetime of the process. For example, that first process has done nearly two and a half minutes of work. That's either a lot of queries or one really big one. Now I can imagine that many queries mapping in all of shared memory. Note that RSS is around the size of the shared memory you have. So all those 460M processes seem to be all the same memory, all shared. What exactly is the problem here? You're not swapping.

> And still the major problem in finding the cause of all this is to
> identify the query which causes these huge backends. [...] What puzzles
> me here is that all these 400MB+ backend processes are marked as idle
> (ps output) respectively sleeping, so they don't actually do much. But
> how come a sleeping process grabs 20% of 2GB of RAM and 15% of
> processing time?

My theory is that it's not using RAM at all; it's simply mapping the shared memory in, which inflates the RSS. If you reduce the amount of shared memory, does the RSS of the processes go down too?

> Another fact that hints at backends just eating memory without reason
> is that when I stop the database, processes keep lingering - kill just
> doesn't help, I have to kill -9 the last of the lot to get rid of them
> before restarting the database:

Never kill -9. Recipe for disaster. If you want those processes to die, perhaps you should stop the processes that are accessing the DB. They're the ones that are idle. Are you using persistent connections at all? If so, how long for? Note that pg_ctl has various stop modes: smart, fast and immediate. Make sure you're using the one you want.

>  1689 pts/0    S    0:00 postgres: postgres phppgadmin 212.123.109.25 idle
>  1690 pts/0    S    0:00 postgres: postgres pszone 212.123.109.25 idle
>  1691 pts/0    S    0:00 postgres: postgres saturn 212.123.109.25 idle
>  1693 pts/0    S    0:00 postgres: postgres template1 212.123.109.25 idle
>  1780 pts/0    S    0:00 postgres: postgres template1 212.123.108.149 idle
>  1781 pts/0    S    0:00 postgres: postgres template1 212.123.108.149 idle
>  1784 pts/0    S    0:00 postgres: postgres pcgames 212.123.108.149 idle

Note that these are not the processes you listed above. The process times don't match. You also have quite a lot of connections to template1.

>     3:11pm  up 2:03,  3 users,  load average: 3.24, 1.65, 1.81

Look at that load average. You have a whole CPU idle. CPU power is not your limitation. No swap, so that's not the problem. You really need to work out what is slow.

> Which isn't quite as bad any more - although it cannot be the optimum
> performance for this machine, I can't imagine.

So is it actually slow, or are you looking for some ethereal "faster"?

> Now what on earth can I do to get this DB running - and when I say
> "running" I don't mean "sort of crawling uphill"? Are there any

Look. 90% of performance improvement comes from modifying queries. Tweaking the config settings can really only account for so much. There is nothing in any of your messages where you have provided anything that we can use to help you. There are no optimal settings; they depend entirely on what your queries are.

> OS patches or environment settings I need to consider? Are there known
> memory leaks? It just doesn't seem to matter whichever settings I pick
> in postgresql.conf; the database just eats up any available memory in
> any case. I cannot use it for production purposes this way. I am quite
> despairing right now - and I am definitely running out of time. Which
> bits of Linux (glibc, kernel, whatnot) does PostgreSQL 7.2.1 need, and
> in which version, to run smoothly? As I mentioned before, we don't use
> any "fancy" features of the database like foreign keys or triggers
> (except one); it's just basic functionality that seems to fall down on
> us...

By the way, you do notice the little figure in your top output labeled "cached" that is around 1.7GB? I do believe that means the kernel is using that much memory for disk cache. You are not running out of memory by any stretch of the imagination (unless one of your queries is using a lot, but it appears to be getting freed at the end).

> I am willing to try almost anything - but I need to squeeze more
> performance out of this thing and I need some hints on which tools to
> use to identify the problem. And unfortunately I need all this as soon
> as possible... Hints and help are very, very much appreciated. Thank
> you!

Work out where your bottleneck is. It's not memory, and it doesn't appear to be CPU. With that much cache I can't imagine it's disk transfer rate either. So the only explanation is the individual queries.

As for hints, the straightforward method is the best. Find whatever action is the slowest and profile it. How much in the front end, how much in the database, how much in client latency. Only once you understand where the time is going can you do any meaningful optimisation. Tweaking config settings will not help you here.

--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> There are 10 kinds of people in the world, those that can do binary
> arithmetic and those that can't.
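A concrete way to do the per-query half of that profiling, since the server here is 7.2: EXPLAIN ANALYZE runs the statement and annotates every plan node with actual times and row counts. The query below is only an illustration:

    -- Compare estimated and actual costs for one suspect statement:
    EXPLAIN ANALYZE
    SELECT count(*)
      FROM ct_com_board_message
     WHERE board_id = 10
       AND father_id = 0;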
Martijn van Oosterhout <kleptog@svana.org> writes:
> My theory is that it's not using RAM at all; it's simply mapping the
> shared memory in, which inflates the RSS.

Interesting idea. By rights the whole shared memory segment should be charged against a backend's SIZE all the time, since it's certainly all there in the address space. But if for some reason a page of shmem isn't charged until first touched, then indeed that could explain Markus' observation --- and, perhaps, my inability to duplicate it on other platforms. (I tried on Red Hat 7.2, which you'd think would be fairly close to his SuSE release, but maybe it's different on just this point. HPUX seems to completely ignore shared segments in both SIZE and RSS, but it's way out in left field on a number of fronts...)

			regards, tom lane
On Wed, 3 Jul 2002 10:34:53 +0200, "Markus Wollny" <Markus.Wollny@computec.de> wrote:
> This one generates an overview of all the forum threads in one board,
> which has 41,624 messages, 2,971 of them thread-starters (FATHER_ID=0):
[long SQL statement follows]

Markus, that's strange. Your explain says:

> Sort  (cost=30695.27..30695.27 rows=7693 width=154)
>       (actual time=9745.94..9751.58 rows=4663 loops=1)

You report 2,971 thread-starters, but the plan actually returns 4,663 rows.

Anyway, that's not my point here. I'd like you to do an EXPLAIN ANALYZE of another SQL statement which IMHO does the same. From the schema you posted I see that (user_id, thread_id) is the primary key, or at least a unique index, on CT_COM_USER_THREAD_FOLLOW, so the sub-select

> (select count(*)
>    from CT_COM_USER_THREAD_FOLLOW
>   where USER_ID = '295798' and thread_id = MESSAGE.THREAD_ID) as TFUID

can only give 0 or 1. So following my first rule of thumb, "Avoid subselects; use joins wherever possible", I'd write:

    select MESSAGE.BOARD_ID
         , MESSAGE.THREAD_ID
         , MESSAGE.FATHER_ID
         , MESSAGE.MESSAGE_ID
         , MESSAGE.USER_ID
         , MESSAGE.USER_LOGIN as LOGIN
         , MESSAGE.USER_STATUS as STATUS
         , MESSAGE.USER_RIGHTS as RIGHTS
         , MESSAGE.TITLE
         , MESSAGE.COUNT_REPLY as COUNT_REPLY
         , to_char(MESSAGE.LAST_REPLY,'DD.MM.YY hh24:mi') as LAST_REPLY
         , round((date_part('epoch',CURRENT_TIMESTAMP)
                - date_part('epoch',MESSAGE.LAST_REPLY))/60) as diff_posting
         , to_char(MESSAGE.CREATED,'DD.MM.YY hh24:mi') as DATUM
         , count(TH.THREAD_ID) as TFUID
      from CT_COM_BOARD_MESSAGE MESSAGE
      left join CT_COM_USER_THREAD_FOLLOW TH
             on (TH.USER_ID = '295798' and TH.thread_id = MESSAGE.THREAD_ID)
     where (0=0)
       and MESSAGE.BOARD_ID = 10
       and MESSAGE.FATHER_ID = 0
       and MESSAGE.STATE_ID = 0
     order by MESSAGE.LAST_REPLY desc;

Could you EXPLAIN ANALYZE this and post the result? And please cross-check whether it returns the same result set as your original query.

Servus
 Manfred
On Wed, 03 Jul 2002 21:09:24 +0200, I wrote:
> select MESSAGE.BOARD_ID
>      , [...]
>      , count(TH.THREAD_ID) as TFUID

Oops! COUNT won't work unless you add GROUP BY <all other fields> at the end of the query. I had

    CASE WHEN th.thread_id IS NULL THEN 0 ELSE 1 END

here and thoughtlessly replaced it by COUNT(...) for brevity. How foolish!

Servus
 Manfred
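Putting the correction together: the select-list entry becomes a CASE expression and no GROUP BY is needed, since no aggregate remains. A sketch of the repaired query, under the same schema assumptions as Manfred's version above:

    select MESSAGE.BOARD_ID
         , MESSAGE.THREAD_ID
         , MESSAGE.FATHER_ID
         , MESSAGE.MESSAGE_ID
         , MESSAGE.USER_ID
         , MESSAGE.USER_LOGIN as LOGIN
         , MESSAGE.USER_STATUS as STATUS
         , MESSAGE.USER_RIGHTS as RIGHTS
         , MESSAGE.TITLE
         , MESSAGE.COUNT_REPLY as COUNT_REPLY
         , to_char(MESSAGE.LAST_REPLY,'DD.MM.YY hh24:mi') as LAST_REPLY
         , round((date_part('epoch',CURRENT_TIMESTAMP)
                - date_part('epoch',MESSAGE.LAST_REPLY))/60) as diff_posting
         , to_char(MESSAGE.CREATED,'DD.MM.YY hh24:mi') as DATUM
         -- 1 if this user follows the thread, else 0:
         , CASE WHEN TH.THREAD_ID IS NULL THEN 0 ELSE 1 END as TFUID
      from CT_COM_BOARD_MESSAGE MESSAGE
      left join CT_COM_USER_THREAD_FOLLOW TH
             on (TH.USER_ID = '295798' and TH.thread_id = MESSAGE.THREAD_ID)
     where MESSAGE.BOARD_ID = 10
       and MESSAGE.FATHER_ID = 0
       and MESSAGE.STATE_ID = 0
     order by MESSAGE.LAST_REPLY desc;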