Thread: RAID 1 - drive failed - very slow queries even after drive replaced

RAID 1 - drive failed - very slow queries even after drive replaced

From
Merrick
Date:
Hi,

I am looking for some advice on where to troubleshoot after 1 drive in
a RAID 1 failed.

Thank you.

I am running v 7.41, I am currently importing the data to another
physical server running 8.4 and will test with that once I can. In the
meantime here is relevant info:

Backups used to take 25 minutes and now take 110 minutes. Before
replacing the drive, it became clear the backup was not going to finish,
since in 120 minutes it had only finished 200 MB of 2.8 GB.

Before replacing the drive:
-----------------------------------
We noticed all of the queries were slow, many taking over 100 seconds.
After we replaced the drives we noticed the queries are running 40
seconds or more and most are 8 seconds or more where the same query
used to take only 1 second. We have replaced a drive in this RAID 1
before and nothing like this happened. The schema was not touched for
at least 1 week prior to this.

Since replacing the drive I have:
-------------------------------------------
Restored from a backup a few hours before the queries became very
slow.
Reindex all tables
Vacuum all tables
Analyze all tables

Here is what I get with iostat:

iostat -k /dev/sda2
Linux 2.6.26-2-686-bigmem (db1)
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          19.61    0.00    8.34    1.60    0.00   70.45

Re: RAID 1 - drive failed - very slow queries even after drive replaced

From
tv@fuzzy.cz
Date:
> Hi,
>
> I am looking for some advice on where to troubleshoot after 1 drive in
> a RAID 1 failed.
>
> Thank you.
>
> I am running v 7.41, I am currently importing the data to another
> physical server running 8.4 and will test with that once I can. In the
> meantime here is relevant info:
>
> Backups used to take 25 minutes, and now take 110 minutes, before
> replacing the drive it became clear the backup was not going to finish
> since in 120 minutes it had only finished 200mb of 2.8gb.

What kind of RAID is this? I know it's RAID1, but is it built using linux
sw raid (md), or is it some other solution (like a hw card)? Can you post
some diagnostics info like

mdadm --detail /dev/md1

or something like that? I guess this would be reflected in the iostat
output (at least for the 'md') but I'm not sure. Maybe there's something
still in progress (sync of the new drive?).
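
To make the "sync still in progress?" check concrete, here is a minimal sketch. It assumes Linux software RAID (md), which nobody in the thread has confirmed yet; a hardware controller would need its vendor tool instead. The function name `md_sync_status` is made up for illustration. It reads text in the `/proc/mdstat` layout and reports degraded arrays and any rebuild that is still running.

```shell
#!/bin/sh
# Hypothetical helper: summarise md array health from /proc/mdstat-style
# text on stdin. Assumes Linux software RAID (md).
md_sync_status() {
    awk '
        # Remember which array the following indented lines belong to.
        /^md[0-9]+ :/ { dev = $1 }

        # An underscore in the member map (for example [U_]) marks a
        # missing or not yet synced member.
        /\[[U_]*_[U_]*\]/ { print dev ": degraded" }

        # Progress line printed while the kernel rebuilds or resyncs.
        /(recovery|resync) *=/ {
            kind = ($0 ~ /recovery/) ? "recovery" : "resync"
            pct = "unknown"
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) { pct = $i; sub(/^=/, "", pct) }
            print dev ": " kind " in progress, " pct
        }
    '
}
```

Run it as `md_sync_status < /proc/mdstat`. No output would suggest every mirror is complete and idle; any output means measurements taken now are competing with rebuild I/O.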

Tomas


Re: RAID 1 - drive failed - very slow queries even after drive replaced

From
Merlin Moncure
Date:
On Wed, Mar 23, 2011 at 3:33 AM, Merrick <merrick@gmail.com> wrote:
> Hi,
>
> I am looking for some advice on where to troubleshoot after 1 drive in
> a RAID 1 failed.
>
> Thank you.
>
> I am running v 7.41, I am currently importing the data to another
> physical server running 8.4 and will test with that once I can. In the
> meantime here is relevant info:
>
> Backups used to take 25 minutes, and now take 110 minutes, before
> replacing the drive it became clear the backup was not going to finish
> since in 120 minutes it had only finished 200mb of 2.8gb.
>
> Before replacing the drive:
> -----------------------------------
> We noticed all of the queries were slow, many taking over 100 seconds.
> After we replaced the drives we noticed the queries are running 40
> seconds or more and most are 8 seconds or more where the same query
> used to take only 1 second. We have replaced a drive in this RAID 1
> before and nothing like this happened. The schema was not touched for
> at least 1 week prior to this.
>
> Since replacing the drive I have:
> -------------------------------------------
> Restored from a backup a few hours before the queries became very
> slow.
> Reindex all tables
> Vacuum all tables
> Analyze all tables
>
> Here is what I get with iostat:
>
> iostat -k /dev/sda2
> Linux 2.6.26-2-686-bigmem (db1)
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>          19.61    0.00    8.34    1.60    0.00   70.45

probably the replacement drive is bunk, or some esoteric hw problem is
tripping you up.  some iostat numbers while you are having the problem
would be more telling.  the solution is obvious -- in terms of this
server, it's time to ramble on...
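
For anyone wanting those numbers: when iostat is run without an interval, as in the output quoted above, it prints a single report averaged over the time since boot, so it says little about the current slowdown. A minimal sketch of sampling while a slow query or backup is running follows. The helper name `iowait_from_iostat` is hypothetical; it only pulls the `%iowait` column out of whatever iostat prints, locating it by header name rather than by position.

```shell
#!/bin/sh
# Hypothetical helper: print the %iowait value from each avg-cpu report
# in iostat output read from stdin.
iowait_from_iostat() {
    awk '
        /^avg-cpu:/ {
            # The header line starts with the "avg-cpu:" label and the
            # value line below it does not, so the value column is the
            # header column minus one.
            for (i = 2; i <= NF; i++)
                if ($i == "%iowait") col = i - 1
            want = 1
            next
        }
        # The line right after the header carries the values.
        want && col { print $col; want = 0 }
    '
}
```

Something like `iostat -k 5 3 | iowait_from_iostat` prints three values: the first is the since-boot average, the next two cover live 5-second intervals. If the live values are far higher than the first, the disks are the bottleneck right now.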

merlin

Re: RAID 1 - drive failed - very slow queries even after drive replaced

From
Merrick
Date:
Thank you Merlin, I had my suspicions about the hardware as well.

The backup server is blazing fast, it is definitely

"time to ramble on..."

On Mar 23, 7:11 am, mmonc...@gmail.com (Merlin Moncure) wrote:
> On Wed, Mar 23, 2011 at 3:33 AM, Merrick <merr...@gmail.com> wrote:
> > Hi,
>
> > I am looking for some advice on where to troubleshoot after 1 drive in
> > a RAID 1 failed.
>
> > Thank you.
>
> > I am running v 7.41, I am currently importing the data to another
> > physical server running 8.4 and will test with that once I can. In the
> > meantime here is relevant info:
>
> > Backups used to take 25 minutes, and now take 110 minutes, before
> > replacing the drive it became clear the backup was not going to finish
> > since in 120 minutes it had only finished 200mb of 2.8gb.
>
> > Before replacing the drive:
> > -----------------------------------
> > We noticed all of the queries were slow, many taking over 100 seconds.
> > After we replaced the drives we noticed the queries are running 40
> > seconds or more and most are 8 seconds or more where the same query
> > used to take only 1 second. We have replaced a drive in this RAID 1
> > before and nothing like this happened. The schema was not touched for
> > at least 1 week prior to this.
>
> > Since replacing the drive I have:
> > -------------------------------------------
> > Restored from a backup a few hours before the queries became very
> > slow.
> > Reindex all tables
> > Vacuum all tables
> > Analyze all tables
>
> > Here is what I get with iostat:
>
> > iostat -k /dev/sda2
> > Linux 2.6.26-2-686-bigmem (db1)
> > avg-cpu:  %user   %nice %system %iowait  %steal   %idle
> >          19.61    0.00    8.34    1.60    0.00   70.45
>
> probably the replacement drive is bunk, or some esoteric hw problem is
> tripping you up.  some iostat numbers while you are having the problem
> would be more telling.  the solution is obvious -- in terms of this
> server, it's time to ramble on...
>
> merlin


Re: RAID 1 - drive failed - very slow queries even after drive replaced

From
Alban Hertroys
Date:
On 23 Mar 2011, at 9:33, Merrick wrote:

> Backups used to take 25 minutes, and now take 110 minutes, before
> replacing the drive it became clear the backup was not going to finish
> since in 120 minutes it had only finished 200mb of 2.8gb.

A few obvious questions:
1. Are you sure you replaced the correct drive?
2. Did the mirror finish resyncing before you did above measurements?

> Before replacing the drive:
> -----------------------------------
> We noticed all of the queries were slow, many taking over 100 seconds.
> After we replaced the drives we noticed the queries are running 40
> seconds or more and most are 8 seconds or more where the same query
> used to take only 1 second. We have replaced a drive in this RAID 1
> before and nothing like this happened. The schema was not touched for
> at least 1 week prior to this.
>
> Since replacing the drive I have:
> -------------------------------------------
> Restored from a backup a few hours before the queries became very
> slow.
> Reindex all tables
> Vacuum all tables
> Analyze all tables

Are you sure it was the drive that broke? Or maybe you have some
collateral hardware damage to, for example, your RAID controller,
cables, disk controller or motherboard? Maybe the new drive has
different requirements than the old one had (more power, for example)?
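
One way to look for that kind of collateral damage is the kernel log, since failing cables and controllers often show up there as link resets or I/O errors. The sketch below is only a starting point: the helper name `disk_errors` is made up, the patterns are a non-exhaustive guess at common libata and block-layer messages, and it assumes the disks are visible to the Linux kernel rather than hidden behind a hardware RAID card.

```shell
#!/bin/sh
# Hypothetical helper: keep only kernel log lines (read from stdin) that
# look like disk, link or controller trouble. The pattern list is an
# assumption and far from complete.
disk_errors() {
    grep -E -i 'ata[0-9]+(\.[0-9]+)?: (exception|failed command|hard resetting link|link is slow)|SError|I/O error|medium error'
}
```

Usage would be `dmesg | disk_errors`. Repeated hits on one `ataN` port point at that drive, its cable or its port; hits on several ports point more towards the controller or power.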


Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.



Re: RAID 1 - drive failed - very slow queries even after drive replaced

From
Andrew Sullivan
Date:
On Wed, Mar 23, 2011 at 08:08:47PM +0100, Alban Hertroys wrote:
> er or motherboard? Maybe the new drive has different requirements than the old one had (more power, for example)?

Or a newer but backward-plug-compatible interface?  Often, the new
drive in the old plug only uses the Right Features once its
compatibility mode has been turned off.  (This is at least true in my
experience.  Not saying it's the cause of the present issue, though.)
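
A rough way to see what the old plug and the new drive actually negotiated is to read it from the kernel's boot messages. This is a sketch under assumptions: the helper name `link_modes` is hypothetical, and it expects libata-style lines such as "SATA link up ... Gbps" and "configured for UDMA/...", which will not appear if the drives sit behind a hardware RAID controller.

```shell
#!/bin/sh
# Hypothetical helper: extract the negotiated SATA link speed and the
# transfer mode per port from kernel log text on stdin.
# Assumes libata-style messages.
link_modes() {
    sed -n -E \
        -e 's#^.*(ata[0-9]+): SATA link up ([0-9.]+ Gbps).*$#\1 link \2#p' \
        -e 's#^.*(ata[0-9]+\.[0-9]+): configured for ([A-Z]+/[0-9]+).*$#\1 mode \2#p'
}
```

With `dmesg | link_modes`, the two mirror members can be compared side by side. A new drive showing a lower link speed or transfer mode than its partner would fit the compatibility-mode theory.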

A

--
Andrew Sullivan
ajs@crankycanuck.ca