Обсуждение: Re: performance enhancements for PostgreSQL: update

Поиск
Список
Период
Сортировка

Re: performance enhancements for PostgreSQL: update

От
"Johnson, Shaunn"
Дата:

--thanks for the reply all

--this is what i have so far:

--[snip]

> In an effort to enhance / streamline performance, I've done
> the following:
>
> * memory upgrade from 512M to 1G
> * move RAID5 to scsi drives (10K RPM)
> * set up cron script to vacuum database weekly
> * set number of client connects ( i.e., /usr/bin/postmaster -i -B 128 -N 64
> -d 4 )

--in the startup script, i launch / end postgres this way:

--[snip from startup]

su -l postgres -s /bin/sh -c "/usr/bin/pg_ctl -D $PGDATA -o '-i -B 128 -N 64 -d 2' \
-p /usr/bin/postmaster start >/dev/null "

--and--

su -l postgres -s /bin/sh -c "/usr/bin/pg_ctl stop -D $PGDATA -s -m fast" > /dev/null 2>&1

--[/snip]

--i have been trying to use the postgres.conf file but i haven't
--had much success with it.  will put that up higher in the priority
--food chain.

--can someone explain what the benefits for that (allocating
--more buffer blocks) will be?  rather, can someone post a link
--to the docs that explains why it would benefit me?

You could probably allocate WAY more buffer blocks than that.  I run 4000
on most of my medium weight machines, and with a gig of ram you could get
away with quite a bit more, but you'll need to increads shmmax and shmall
to go very high.   But I don't think that's your major problem.
 
--working on it now ... had to take the '-d 4' part of my start up options
--down to a '-d 2' ... the system was thinking / writing for too long just
--to test a simple query ...

More important, don't start the postmaster that way.  Edit the
$PGDATA/postgresql.conf file, then use the pg_ctl command to start and
stop it.

> But I'm at the point now that I can't kill some jobs.  Yes, I know
> I shouldn't use 'kill' in any forceful way, but just a kill seems to do
> nothing (or, if it is doing something, it's not fast enough
> for the user community and it's stopping production).

--i've seen it first hand why it's not good, but, i got this not
--too long ago:

--[excerpt from old email]
Use kill -9. Do a:

killall -9 postgres
killall -9 postmaster

The tip message

'Don't kill -9 the postmaster'

is old and was added when havoc could be caused by an old
postgres backend process. Consider the scenrio:

1. postmaster started
2. postgres started  (Session #1)
3. postmaster killed (-9)
4. postmaster restarted
5. postgres started (Session #2)

Now there isn't any synchronization between Sesison #1 and
Session #2 at all, which would lead to data corruption. This
scenario was fixed a long time ago (7.1?). The whole 'Don't kill
-9 the postmaster' comment was actually a tongue-in-cheek remark
by me regarding a parallel discussion of RedHat init scripts.
The corruption possibility has long-since been fixed. Since I've
seen FUD claiming PostgreSQL doesn't have sufficient
crash-recovery because of the tip, I suggest the tip be changed to:

'Feel free to kill -9 the postmaster'

--[/excerpt]

--is this true?   perhaps someone can verify this.

You can kill individual backends pretty safely, it's the postmaster you
can't kill -9 safely.  If you see a postgres child running away with all
your memory etc... you can kill -9 that pid pretty safely. 

> I got this from a co-worker:
>
> [snip]
>
> the server is blocking on access to the metadata tables.  not even
> logins are being processed.  I'm not sure what caused the
> problem, but I think a database restart is the best course,
> which I have been trying to do.  Do not kill -9, as it will corrupt  the
> WAL.

--yup.  tried that and nothing changed for about 20 minutes or so ... so
--i had to reboot the server (i *hate* doing that).

If you want to shut down the server and it doesn't seem to respond to
pg_ctl stop, try 'pg_ctl -m fast stop' and see if that works.

--[snip raid errors]

--this is what the  /proc/mdstat says

--[snip from /proc/mdstats]

Personalities :
read_ahead not set
unused devices: <none>

--[/snip from /proc/mdstats]

That looks like a dead drive in your RAID array.  What does 'cat
/proc/mdstat' say about the drive

--i'm doing a linux software raid

(are you using linux software raid, or a hardware controller?)

> * have can I figure out how access to the metadata tables
>   are being stopped? (my guess is the error on the scsi drive, but ... )

Sounds like processes are hanging, and users are just trying to
reconnect over and over and you're running out of connections. 
This is a symptom, not the problem, which is your machine is
having issues.

[snip rest of email]

--i'm thinking more and more that while someone was trying to
--update / insert data into a table the scsi disk stopped writing
--(or just gave a lot of errors while writing).  the thing is, *some* people
--could do work (albeit very little).

--at any rate, i'll have to investigate more later (probably the
--weekend) and stress test the array. 

--also, i hope to get more messages in the log file with a
--better debug level to help track the problem.

--thanks again!

-X

Re: performance enhancements for PostgreSQL: update

От
"scott.marlowe"
Дата:
On Mon, 18 Nov 2002, Johnson, Shaunn wrote:

> su -l postgres -s /bin/sh -c "/usr/bin/pg_ctl -D $PGDATA -o '-i -B 128 -N 64
> -d 2' \
> -p /usr/bin/postmaster start >/dev/null "

Doing it that way, you should be able to leave out the parts from -o on
and get it to use the postgresql.conf file settings.

> --is this true?   perhaps someone can verify this.

Generally speaking, while it still considered bad practice (or at least
somewhat "rude" :-) to kill -9 the postmaster, it isn't particularly
dangerous.  I've done it while heavily testing a database (pgbench -c 100
-t 5000) and it never once corrupted my database.

> --this is what the  /proc/mdstat says
>
> --[snip from /proc/mdstats]
>
> Personalities :
> read_ahead not set
> unused devices: <none>
>
> --[/snip from /proc/mdstats]
>
> That looks like a dead drive in your RAID array.  What does 'cat
> /proc/mdstat' say about the drive
>
> --i'm doing a linux software raid

Is there anymore to the /proc/mdstat entry?  normally you should have a
line that has a bit that names the drive partitions in the software raid
and has a bit showing which drives are online that looks something like
this: [UUUU] for all drives being UP.  And like this: [UU_U] where the
underscore shows a missing drive.  If you don't have a line like that
then Linux isn't doing the software RAID array.  Could it be that you
aren't actually running RAID but think you are?  Just wondering.


Re: performance enhancements for PostgreSQL: update

От
Medi Montaseri
Дата:
Actually pg_ctl -D $PGDATA is redundant....


 -D datadir
              Specifies  the  file  system  location of the database
files. If
              this is omitted, the environment variable PGDATA is used.


In fact if $PGDATA is for some reasong not loaded, you'll paint yourself
to a bad spot
where -D -o will be regarded as -o is the datadir location.


scott.marlowe wrote:

>On Mon, 18 Nov 2002, Johnson, Shaunn wrote:
>
>
>
>>su -l postgres -s /bin/sh -c "/usr/bin/pg_ctl -D $PGDATA -o '-i -B 128 -N 64
>>-d 2' \
>>-p /usr/bin/postmaster start >/dev/null "
>>
>>
>
>Doing it that way, you should be able to leave out the parts from -o on
>and get it to use the postgresql.conf file settings.
>
>
>
>>--is this true?   perhaps someone can verify this.
>>
>>
>
>Generally speaking, while it still considered bad practice (or at least
>somewhat "rude" :-) to kill -9 the postmaster, it isn't particularly
>dangerous.  I've done it while heavily testing a database (pgbench -c 100
>-t 5000) and it never once corrupted my database.
>
>
>
>>--this is what the  /proc/mdstat says
>>
>>--[snip from /proc/mdstats]
>>
>>Personalities :
>>read_ahead not set
>>unused devices: <none>
>>
>>--[/snip from /proc/mdstats]
>>
>>That looks like a dead drive in your RAID array.  What does 'cat
>>/proc/mdstat' say about the drive
>>
>>--i'm doing a linux software raid
>>
>>
>
>Is there anymore to the /proc/mdstat entry?  normally you should have a
>line that has a bit that names the drive partitions in the software raid
>and has a bit showing which drives are online that looks something like
>this: [UUUU] for all drives being UP.  And like this: [UU_U] where the
>underscore shows a missing drive.  If you don't have a line like that
>then Linux isn't doing the software RAID array.  Could it be that you
>aren't actually running RAID but think you are?  Just wondering.
>
>
>---------------------------(end of broadcast)---------------------------
>TIP 4: Don't 'kill -9' the postmaster
>
>