Обсуждение: Postmaster won't -HUP

Поиск
Список
Период
Сортировка

Postmaster won't -HUP

От
Jerry Lynde
Дата:
Hello out there,

    I'm having a problem with a production server. Actually, there are two
problems. The semi-trivial problem is that Postgres won't die using the
service mechanism. As root, I "service postgres stop" and then "service
postgres start" after a reasonable wait. The restart will earn me a
"StreamServerPort: cannot bind to port" which indicates tht the process
never died. A ps ax confirms the persistance of postmaster. When I kill -9
the processes (postmaster, the /bin/sh -c postgres, and logger) they
process will claim to start with "service postgres start" but it reports no
PID and doesn't show up in ps ax. It is clearly not running at this point.

    The real problem, which caused all this debugging, is that twice so far,
for no apparent reason, I have pegged the processors on the server. The
machine has two 500mHz processors with 256 MB ram. I have a hard time
believing that one small query can bring that machine to its knees, but it
has, twice. The queries were run through a hard coded php front end (for
testing purposes). Any insight on these two problems would be appreciated
greatly.

thanks,

Jerry Lynde



Jerry Lynde
IT - Invocation/Evocation, Banishing, et al.
"In E-commerce, the E stands for Evil."
Due Diligence Inc.
http://www.diligence.com
Phone: (406) 728-0001 x232
Fax: (406) 728-0006

Re: Postmaster won't -HUP

От
Joseph Shraibman
Дата:
In version 7.0 postgres waits for all clients to close their connections
before exiting.  Before it just quit.

Jerry Lynde wrote:
>
> Hello out there,
>
>         I'm having a problem with a production server. Actually, there are two
> problems. The semi-trivial problem is that Postgres won't die using the
> service mechanism. As root, I "service postgres stop" and then "service
> postgres start" after a reasonable wait. The restart will earn me a
> "StreamServerPort: cannot bind to port" which indicates tht the process
> never died. A ps ax confirms the persistance of postmaster. When I kill -9
> the processes (postmaster, the /bin/sh -c postgres, and logger) they
> process will claim to start with "service postgres start" but it reports no
> PID and doesn't show up in ps ax. It is clearly not running at this point.
>
>         The real problem, which caused all this debugging, is that twice so far,
> for no apparent reason, I have pegged the processors on the server. The
> machine has two 500mHz processors with 256 MB ram. I have a hard time
> believing that one small query can bring that machine to its knees, but it
> has, twice. The queries were run through a hard coded php front end (for
> testing purposes). Any insight on these two problems would be appreciated
> greatly.
>
> thanks,
>
> Jerry Lynde
>
> Jerry Lynde
> IT - Invocation/Evocation, Banishing, et al.
> "In E-commerce, the E stands for Evil."
> Due Diligence Inc.
> http://www.diligence.com
> Phone: (406) 728-0001 x232
> Fax: (406) 728-0006

Re: Postmaster won't -HUP

От
Ed Loehr
Дата:
Jerry Lynde wrote:
>
>         I'm having a problem with a production server. Actually, there are two
> problems. The semi-trivial problem is that Postgres won't die using the
> service mechanism. As root, I "service postgres stop" and then "service
> postgres start" after a reasonable wait. The restart will earn me a
> "StreamServerPort: cannot bind to port" which indicates tht the process
> never died. A ps ax confirms the persistance of postmaster. When I kill -9
> the processes (postmaster, the /bin/sh -c postgres, and logger) they
> process will claim to start with "service postgres start" but it reports no
> PID and doesn't show up in ps ax. It is clearly not running at this point.
>         The real problem, which caused all this debugging, is that twice so far,
> for no apparent reason, I have pegged the processors on the server. The
> machine has two 500mHz processors with 256 MB ram. I have a hard time
> believing that one small query can bring that machine to its knees, but it
> has, twice. The queries were run through a hard coded php front end (for
> testing purposes). Any insight on these two problems would be appreciated
> greatly.

Hello Jerry,

What version are you running?

What does your server log show?  Any other logs?

Can you show us the small query, # of rows, etc?

Regards,
Ed Loehr

Re: Postmaster won't -HUP

От
Martijn van Oosterhout
Дата:
Jerry Lynde wrote:
>
> Hello out there,
>
>         I'm having a problem with a production server. Actually, there are two
> problems. The semi-trivial problem is that Postgres won't die using the
> service mechanism. As root, I "service postgres stop" and then "service
> postgres start" after a reasonable wait. The restart will earn me a
> "StreamServerPort: cannot bind to port" which indicates tht the process
> never died. A ps ax confirms the persistance of postmaster. When I kill -9
> the processes (postmaster, the /bin/sh -c postgres, and logger) they
> process will claim to start with "service postgres start" but it reports no
> PID and doesn't show up in ps ax. It is clearly not running at this point.

Hmm, on debian it always dies, though it does takes a while sometimes.
Also, try to avoid kill -9 because then it can't clean up shared memory,
locks, temporary files, etc. Try kill -INT or -TERM.

>         The real problem, which caused all this debugging, is that twice so far,
> for no apparent reason, I have pegged the processors on the server. The
> machine has two 500mHz processors with 256 MB ram. I have a hard time
> believing that one small query can bring that machine to its knees, but it
> has, twice. The queries were run through a hard coded php front end (for
> testing purposes). Any insight on these two problems would be appreciated
> greatly.

Heh. Yes, I also wonder why "select * from bigtable" where bigtable is
a 400,000 row table crashes the machine also :). (This is pg_dump btw).
Maybe you should post the query.

Actually, there is one other situation I've killed the machine nearly.
I have two big tables, "bigtable" and "largetable". Then do this
query:

select * from bigtable where largetable.a = 1;

(ofcourse the actual query was much longer but this one demonstrates the
problem). It actually does a join between those two tables even though
only one is mentioned in the from part. Running explain over this told
that this would take a *very* long time to complete.

This is probably not a problem in newer versions though. What version
are you using?
--
Martijn van Oosterhout <kleptog@cupid.suninternet.com>
http://cupid.suninternet.com/~kleptog/

Re: Postmaster won't -HUP

От
Jerry Lynde
Дата:
Wow! Thanks for all the replies!

The version I'm running is 7.0.0

    It could certainly be that the connections are remaining open and the
server isn't dying due to that. I'll play with it a little when I get the
chance, probably this weekend (when clients aren't hitting the server.)
Thanks for the tip on -INT and -TERM.

    As for the query I'm running, it was simply  select * from bigtable (about
2-300k lines) where
                         firstname=<obscure fname> and
                        lastname=<obscure lname> and
                        DOB=<the exact DOB for the above named individual>

    Essentially, the query was hard-coded to pull a specific record from the
data and disply it on the page. While the dataset is  medium to large
(IMHO) it shouldn't peg two 500's. The problem happened the first time
reliably when I ran the query. I was using phpdb for the php interface (I
have stopped using that for the time being, because it maxes the system
every time.)






Jerry Lynde
IT - Invocation/Evocation, Banishing, et al.
"In E-commerce, the E stands for Evil."
Due Diligence Inc.
http://www.diligence.com
Phone: (406) 728-0001 x232
Fax: (406) 728-0006

Re: Postmaster won't -HUP

От
Ed Loehr
Дата:
Jerry Lynde wrote:
>
>         As for the query I'm running, it was simply  select * from bigtable (about
> 2-300k lines) where
>                                                  firstname=<obscure fname> and
>                                                 lastname=<obscure lname> and
>                                                 DOB=<the exact DOB for the above named individual>

What indices do you have on those fields?

Regards,
Ed Loehr

Re: Postmaster won't -HUP

От
Tom Lane
Дата:
Ed Loehr <eloehr@austin.rr.com> writes:
> Jerry Lynde wrote:
>>
>> As for the query I'm running, it was simply  select * from bigtable (about
>> 2-300k lines) where
>> firstname=<obscure fname> and
>> lastname=<obscure lname> and
>> DOB=<the exact DOB for the above named individual>

> What indices do you have on those fields?

And is it using them --- ie, what does EXPLAIN say about the query?

            regards, tom lane

Re: Postmaster won't -HUP

От
Jerry Lynde
Дата:
At 12:11 PM 6/1/00 -0500, Ed Loehr wrote:
>Jerry Lynde wrote:
> >
> >         As for the query I'm running, it was simply  select * from
> bigtable (about
> > 2-300k lines) where
> >                                                  firstname=<obscure
> fname> and
> >                                                 lastname=<obscure
> lname> and
> >                                                 DOB=<the exact DOB for
> the above named individual>
>
>What indices do you have on those fields?
>
>Regards,
>Ed Loehr


They are all indexed, the DOB index is actually  DOBYear DOBDay and
DOBMonth and all 5 fields are indexed



Re: Postmaster won't -HUP

От
Jerry Lynde
Дата:
At 01:21 PM 6/1/00 -0400, Tom Lane wrote:
>Ed Loehr <eloehr@austin.rr.com> writes:
> > Jerry Lynde wrote:
> >>
> >> As for the query I'm running, it was simply  select * from bigtable (about
> >> 2-300k lines) where
> >> firstname=<obscure fname> and
> >> lastname=<obscure lname> and
> >> DOB=<the exact DOB for the above named individual>
>
> > What indices do you have on those fields?
>
>And is it using them --- ie, what does EXPLAIN say about the query?
>
>                         regards, tom lane


It started indexing on DOBMonth until I did a VACUUM ANALYZE, after which
it uses the lastname index



Re: Postmaster won't -HUP

От
Joseph Shraibman
Дата:
Jerry Lynde wrote:
>
> At 12:11 PM 6/1/00 -0500, Ed Loehr wrote:
> >Jerry Lynde wrote:
> > >
> > >         As for the query I'm running, it was simply  select * from
> > bigtable (about
> > > 2-300k lines) where
> > >                                                  firstname=<obscure
> > fname> and
> > >                                                 lastname=<obscure
> > lname> and
> > >                                                 DOB=<the exact DOB for
> > the above named individual>
> >
> >What indices do you have on those fields?
> >
> >Regards,
> >Ed Loehr
>
> They are all indexed, the DOB index is actually  DOBYear DOBDay and
> DOBMonth and all 5 fields are indexed

Do you have 5 indexes or do you have an index that spans more than one
field?

Re: Postmaster won't -HUP

От
Jerry Lynde
Дата:
At 05:19 PM 6/1/00 -0400, Joseph Shraibman wrote:
>Jerry Lynde wrote:
> >
> > At 12:11 PM 6/1/00 -0500, Ed Loehr wrote:
> > >Jerry Lynde wrote:
> > > >
> > > >         As for the query I'm running, it was simply  select * from
> > > bigtable (about
> > > > 2-300k lines) where
> > > >                                                  firstname=<obscure
> > > fname> and
> > > >                                                 lastname=<obscure
> > > lname> and
> > > >                                                 DOB=<the exact DOB for
> > > the above named individual>
> > >
> > >What indices do you have on those fields?
> > >
> > >Regards,
> > >Ed Loehr
> >
> > They are all indexed, the DOB index is actually  DOBYear DOBDay and
> > DOBMonth and all 5 fields are indexed
>
>Do you have 5 indexes or do you have an index that spans more than one
>field?


Sorry for being less than explicit. There are 5 separate indices, one per
field.



Jerry Lynde
IT - Invocation/Evocation, Banishing, et al.
"In E-commerce, the E stands for Evil."
Due Diligence Inc.
http://www.diligence.com
Phone: (406) 728-0001 x232
Fax: (406) 728-0006

Re: Postmaster won't -HUP

От
Tom Lane
Дата:
Jerry Lynde <jlynde@diligence.com> writes:
>>>> They are all indexed, the DOB index is actually  DOBYear DOBDay and
>>>> DOBMonth and all 5 fields are indexed
>>
>> Do you have 5 indexes or do you have an index that spans more than one
>> field?

> Sorry for being less than explicit. There are 5 separate indices, one per
> field.

So your query is really something more like

    ... WHERE firstname = 'joe' AND lastname = 'blow' AND
          DOByear = 1999 AND DOBmonth = 1 AND DOBday = 1

?

The problem here is that only one index can be used in any individual
scan.  If I were the optimizer I'd probably figure that lastname is
going to be the most selective of the five available choices, too.

I'd suggest storing the DOB as *one* field of type 'date'.  You can
pull out the subparts for display with date_part() when you need to,
but for searches you'll be a lot better off with

        WHERE DOB = '1999-01-01'

            regards, tom lane

Re: Postmaster won't -HUP

От
Jerry Lynde
Дата:
At 05:58 PM 6/1/00 -0400, you wrote:
Jerry Lynde <jlynde@diligence.com> writes:
 >>>> They are all indexed, the DOB index is actually  DOBYear DOBDay and
 >>>> DOBMonth and all 5 fields are indexed
 >>
 >> Do you have 5 indexes or do you have an index that spans more than one
 >> field?

 > Sorry for being less than explicit. There are 5 separate indices, one per
 > field.

So your query is really something more like

    ... WHERE firstname = 'joe' AND lastname = 'blow' AND
          DOByear = 1999 AND DOBmonth = 1 AND DOBday = 1

?

yes



The problem here is that only one index can be used in any individual
scan.  If I were the optimizer I'd probably figure that lastname is
going to be the most selective of the five available choices, too.

and it did, and that's ok


I'd suggest storing the DOB as *one* field of type 'date'.  You can
pull out the subparts for display with date_part() when you need to,
but for searches you'll be a lot better off with

        WHERE DOB = '1999-01-01'

            regards, tom lane


    Thanks for the tip. I might indeed take that approach in the future,
however that's not really the problem I'm trying to tackle right now.
Indexing by Last Name is fine with me, currently. What's not working for me
is the part where the dual pentium 500 machine with 256MB RAM goes into
deep thought indefinitely for one simple hard-coded query.
    I used to think that the problem was due to the phpdb module that I was
invoking, since the behavior exhibited itself consistently doing the
aforementioned query with the phpdb module. Using nothing but straight php
I have been able to make the query run smoothly.
    The reason I no longer believe the problem was tied to phpdb is that the
behavior with the processors (all processor time devoted to user processes)
happened when I was not making use of phpdb anymore. In fact I wasn't even
making queries at the time, so it may not be tied to postgres at all, but I
suspect it might, since the problem happens at random currently, but was
consistent and predictable with the phpdb-driven postgres query.



Re: Postmaster won't -HUP

От
Tom Lane
Дата:
Jerry Lynde <jlynde@diligence.com> writes:
>     Thanks for the tip. I might indeed take that approach in the future,
> however that's not really the problem I'm trying to tackle right now.
> Indexing by Last Name is fine with me, currently. What's not working for me
> is the part where the dual pentium 500 machine with 256MB RAM goes into
> deep thought indefinitely for one simple hard-coded query.

Ah, sorry ... I've been seeing so many optimizer questions lately that
I tend to zero right in on anything that looks like a misoptimization
issue.

I'm not aware of any reason that a query such as you describe would
tend to hang up the machine.  It would be useful to know what you see
in "top" or some other monitoring program when the problem happens.
Is there just one backend process sucking all the CPU time?  More than
one?  Is the process(es) memory usage stable, or climbing?

An even more useful bit of info is a stack trace from a backend that's
suffering the problem: if you do a "kill -ABORT" on it you should get
a coredump and be able to backtrace with gdb.  (Note this will cause
a database system restart, ie all the other backends will commit
harakiri too, so I wouldn't advise doing it during normal usage of the
system.)

            regards, tom lane