Discussion: RAID 1 - drive failed - very slow queries even after drive replaced
Hi,

I am looking for some advice on where to troubleshoot after 1 drive in a RAID 1 failed.

Thank you.

I am running v 7.41. I am currently importing the data to another physical server running 8.4 and will test with that once I can. In the meantime, here is the relevant info:

Backups used to take 25 minutes and now take 110 minutes. Before replacing the drive, it became clear the backup was not going to finish, since in 120 minutes it had only finished 200 MB of 2.8 GB.

Before replacing the drive:
-----------------------------------
We noticed all of the queries were slow, many taking over 100 seconds. After we replaced the drives we noticed the queries are running 40 seconds or more, and most are 8 seconds or more, where the same query used to take only 1 second. We have replaced a drive in this RAID 1 before and nothing like this happened. The schema was not touched for at least 1 week prior to this.

Since replacing the drive I have:
-------------------------------------------
Restored from a backup taken a few hours before the queries became very slow.
Reindexed all tables
Vacuumed all tables
Analyzed all tables

Here is what I get with iostat:

iostat -k /dev/sda2
Linux 2.6.26-2-686-bigmem (db1)
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          19.61    0.00    8.34    1.60    0.00   70.45
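For scale, the backup figures in the post above can be turned into rough throughput numbers. This is only a sketch: it assumes the 2.8 GB figure is the full backup size, that 1 GB = 1024 MB, and that each run progressed at a steady rate. The function name is made up for illustration.

```python
def throughput_mb_per_min(megabytes: float, minutes: float) -> float:
    """Average throughput in MB per minute over the whole run."""
    return megabytes / minutes


# Assumption: 2.8 GB is the full backup size and 1 GB = 1024 MB.
BACKUP_MB = 2.8 * 1024

healthy = throughput_mb_per_min(BACKUP_MB, 25)    # before the drive failed
degraded = throughput_mb_per_min(200, 120)        # failed drive still in the array
current = throughput_mb_per_min(BACKUP_MB, 110)   # after the drive was replaced

print(f"healthy:  {healthy:.1f} MB/min")
print(f"degraded: {degraded:.1f} MB/min")
print(f"current:  {current:.1f} MB/min")
print(f"slowdown after replacement: {healthy / current:.1f}x")
```

Under these assumptions the backup is still several times slower than before the failure, even though it is far faster than it was with the failed drive in place.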
> Hi,
>
> I am looking for some advice on where to troubleshoot after 1 drive in
> a RAID 1 failed.
>
> Thank you.
>
> I am running v 7.41, I am currently importing the data to another
> physical server running 8.4 and will test with that once I can. In the
> meantime here is relevant info:
>
> Backups used to take 25 minutes, and now take 110 minutes, before
> replacing the drive it became clear the backup was not going to finish
> since in 120 minutes it had only finished 200mb of 2.8gb.

What kind of RAID is this? I know it's RAID 1, but is it built using Linux software RAID (md), or is it some other solution (like a hardware card)?

Can you post some diagnostic info, like the output of

mdadm --detail /dev/md1

or something like that?

I guess this would be reflected in the iostat output (at least for the 'md' device), but I'm not sure. Maybe there's something still in progress (a sync of the new drive?).

Tomas
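If the array does turn out to be Linux software RAID (md), which the thread has not established, the kernel reports rebuild progress in /proc/mdstat. The sketch below parses that text to flag a degraded array or a rebuild still in progress. The sample text is invented, the function name is hypothetical, and the parser only handles the common layout shown here.

```python
import re
from typing import Dict

# Matches e.g. "recovery = 12.6%" or "resync = 48.0%" on the progress line.
_PROGRESS = re.compile(r"(resync|recovery)\s*=\s*([\d.]+)%")
# Matches the member-status field: "[UU]" is healthy, "[U_]" has a missing member.
_STATUS = re.compile(r"\[([U_]+)\]")
_HEADER = re.compile(r"^(md\d+)\s*:")


def md_status(mdstat_text: str) -> Dict[str, dict]:
    """Return {array: {"degraded": bool, "rebuild_percent": float or None}}."""
    arrays: Dict[str, dict] = {}
    current = None
    for line in mdstat_text.splitlines():
        header = _HEADER.match(line)
        if header:
            current = header.group(1)
            arrays[current] = {"degraded": False, "rebuild_percent": None}
            continue
        if current is None:
            continue
        if not line.strip():
            current = None  # a blank line ends the block for this array
            continue
        status = _STATUS.search(line)
        if status and "_" in status.group(1):
            arrays[current]["degraded"] = True
        progress = _PROGRESS.search(line)
        if progress:
            arrays[current]["rebuild_percent"] = float(progress.group(2))
    return arrays


# Invented sample in the usual /proc/mdstat layout, for illustration only.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md1 : active raid1 sdb2[2] sda2[0]
      976630336 blocks [2/1] [U_]
      [==>..................]  recovery = 12.6% (123456789/976630336) finish=127.5min speed=33440K/sec

unused devices: <none>
"""

print(md_status(SAMPLE_MDSTAT))
```

On a real md system the same function could be fed the contents of /proc/mdstat. With a hardware RAID card this file says nothing, and the controller's own tools would be needed instead.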
On Wed, Mar 23, 2011 at 3:33 AM, Merrick <merrick@gmail.com> wrote:
> Hi,
>
> I am looking for some advice on where to troubleshoot after 1 drive in
> a RAID 1 failed.
>
> Thank you.
>
> I am running v 7.41, I am currently importing the data to another
> physical server running 8.4 and will test with that once I can. In the
> meantime here is relevant info:
>
> Backups used to take 25 minutes, and now take 110 minutes, before
> replacing the drive it became clear the backup was not going to finish
> since in 120 minutes it had only finished 200mb of 2.8gb.
>
> Before replacing the drive:
> -----------------------------------
> We noticed all of the queries were slow, many taking over 100 seconds.
> After we replaced the drives we noticed the queries are running 40
> seconds or more and most are 8 seconds or more where the same query
> used to take only 1 second. We have replaced a drive in this RAID 1
> before and nothing like this happened. The schema was not touched for
> at least 1 week prior to this.
>
> Since replacing the drive I have:
> -------------------------------------------
> Restored from a backup a few hours before the queries became very
> slow.
> Reindex all tables
> Vacuum all tables
> Analyze all tables
>
> Here is what I get with iostat:
>
> iostat -k /dev/sda2
> Linux 2.6.26-2-686-bigmem (db1)
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>           19.61    0.00    8.34    1.60    0.00   70.45

Probably the replacement drive is bunk, or some esoteric hw problem is tripping you up. Some iostat numbers while you are having the problem would be more telling. The solution is obvious -- in terms of this server, it's time to ramble on...

merlin
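On the point about iostat numbers: when iostat is run without an interval, its report covers the time since boot, so the output quoted above says little about the slow period. What matters is the change in the kernel's counters between two samples taken while the problem is happening. The sketch below computes read and write rates and device utilisation from two snapshots of /proc/diskstats. The snapshot values are invented, the function names are hypothetical, and "sda" is assumed to be the whole-disk device.

```python
from typing import Dict, NamedTuple

SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


class DiskCounters(NamedTuple):
    sectors_read: int
    sectors_written: int
    io_ms: int  # total milliseconds the device spent doing I/O


def parse_diskstats(text: str) -> Dict[str, DiskCounters]:
    """Parse /proc/diskstats text into per-device cumulative counters."""
    counters: Dict[str, DiskCounters] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip short or malformed lines
        # Columns: major minor name, then the statistics fields.
        counters[fields[2]] = DiskCounters(
            sectors_read=int(fields[5]),
            sectors_written=int(fields[9]),
            io_ms=int(fields[12]),
        )
    return counters


def io_rates(before: str, after: str, device: str, interval_s: float) -> Dict[str, float]:
    """Rates for one device between two snapshots taken interval_s seconds apart."""
    b = parse_diskstats(before)[device]
    a = parse_diskstats(after)[device]
    kb_per_sector = SECTOR_BYTES / 1024
    return {
        "read_kb_s": (a.sectors_read - b.sectors_read) * kb_per_sector / interval_s,
        "write_kb_s": (a.sectors_written - b.sectors_written) * kb_per_sector / interval_s,
        "util_percent": 100.0 * (a.io_ms - b.io_ms) / (interval_s * 1000),
    }


# Invented snapshots taken 5 seconds apart, for illustration only.
BEFORE = "   8       0 sda 1000 0 20000 500 2000 0 40000 900 0 1200 1400"
AFTER = "   8       0 sda 1100 0 30000 900 2100 0 44000 1300 2 5700 6000"

print(io_rates(BEFORE, AFTER, "sda", 5.0))
```

A device that sits near 100% utilisation while moving very little data would fit the bad-drive or bad-hardware theory. Low utilisation during slow queries would point away from the disk.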
Thank you Merlin, I had my suspicions about the hardware as well. The backup server is blazing fast; it is definitely "time to ramble on..."

On Mar 23, 7:11 am, mmonc...@gmail.com (Merlin Moncure) wrote:
> probably the replacement drive is bunk, or some esoteric hw problem is
> tripping you up. some iostat numbers while you are having the problem
> would be more telling. the solution is obvious -- in terms of this
> server, it's time to ramble on...
On 23 Mar 2011, at 9:33, Merrick wrote:

> Backups used to take 25 minutes, and now take 110 minutes, before
> replacing the drive it became clear the backup was not going to finish
> since in 120 minutes it had only finished 200mb of 2.8gb.

A few obvious questions:
1. Are you sure you replaced the correct drive?
2. Did the mirror finish resyncing before you took the above measurements?

> Before replacing the drive:
> -----------------------------------
> We noticed all of the queries were slow, many taking over 100 seconds.
> After we replaced the drives we noticed the queries are running 40
> seconds or more and most are 8 seconds or more where the same query
> used to take only 1 second. We have replaced a drive in this RAID 1
> before and nothing like this happened. The schema was not touched for
> at least 1 week prior to this.
>
> Since replacing the drive I have:
> -------------------------------------------
> Restored from a backup a few hours before the queries became very
> slow.
> Reindex all tables
> Vacuum all tables
> Analyze all tables

Are you sure it was the drive that broke? Or maybe you have some collateral hardware damage to, for example, your RAID controller, cables, disk controller or motherboard? Maybe the new drive has different requirements than the old one had (more power, for example)?

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.
On Wed, Mar 23, 2011 at 08:08:47PM +0100, Alban Hertroys wrote:
> [...] controller or motherboard? Maybe the new drive has different
> requirements than the old one had (more power, for example)?

Or a newer but backward-plug-compatible interface? Often, the new drive in the old plug only uses the Right Features once its compatibility mode has been turned off. (This is at least true in my experience. Not saying it's the cause of the present issue, though.)

A

--
Andrew Sullivan
ajs@crankycanuck.ca
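One way to look for the situation Andrew describes is to compare the link speed each port negotiated at boot. This assumes the drives are SATA and handled by the Linux libata driver, which the thread does not say. The sketch below scans kernel log text for "SATA link up" lines. The sample log lines are invented and the function name is hypothetical.

```python
import re
from typing import Dict

# Matches libata messages such as "ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)".
_LINK_UP = re.compile(r"(ata\d+(?:\.\d+)?): SATA link up ([\d.]+) Gbps")


def sata_link_speeds(dmesg_text: str) -> Dict[str, float]:
    """Map each ATA port to the last link speed (in Gbps) reported in the log."""
    speeds: Dict[str, float] = {}
    for match in _LINK_UP.finditer(dmesg_text):
        speeds[match.group(1)] = float(match.group(2))  # later lines override earlier ones
    return speeds


# Invented kernel log excerpt, for illustration only.
SAMPLE_DMESG = """\
[    2.104511] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    2.416203] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
"""

print(sata_link_speeds(SAMPLE_DMESG))
```

A new drive that negotiated a lower speed than its mirror partner would be a hint worth following up, not proof of the cause.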