On 8/11/06, Tom Laudeman <twl8n@virginia.edu> wrote:
> Merlin,
> The problem is disk. I've got a WD Caviar. hdparm says it does 44MB/sec
> (I ran that in single user mode so there was nothing interfering). A WD
> Caviar SE SATA in one of my servers at home gets 56MB/sec on a quiescent
> system at runlevel 3. What kind of values does hdparm give for a SATA
> Raptor?
i dont have one handy on a linux box to test, but all raptors are 10k
drives with 16 mb cache. they feel about twice as fast as 7200 rpm ide
drives in general use.
> I think my Dell Precision 650 has SATA on the motherboard. The boss says
> I can order one drive, so what should I get? How much faster is RAID 0+1
> than a single drive?
depends on a lot of factors, near 100% improvement is realistic even
with software raid. I would tell your boss that you could buy 2 36g
raptors (110$ each) do a simple raid 0. just be aware that either
drive failing will take you out. or, you could do raid 1 for
redundancy.
> Aside from size, I can't see much difference between these drives (WD
> Raptors at NewEgg):
>
http://www.newegg.com/Product/ProductList.asp?DEPA=0&type=&Description=raptor&Submit=ENE&Ntk=all&N=0&minPrice=&maxPrice=&Go.x=0&Go.y=0
the retail parts are more expensive as is the silly drive that you can
look into. buy the cheapest part at the size level you need.
> CLUSTER certainly helped. Each of the following queries would have
> returned roughly 50,000 records. Note that selecting a single record
> from blast_result using an index is plenty fast ( ~ 50 ms), so my
> primary concern is pulling back larger subsets of data.
maybe. you may have had table bloat as well, cluster does a full table
rebuild like vacuum fuul.
> It appears that count(*) on a CLUSTERed table uses the index (as opposed
> to the old way of doing a sequential scan). Count on the table after
> CLUSTER appears to be a *lot* faster, maybe almost 100x. I know we
> shouldn't count, but we've been too lazy to keep the record counts in
> another table, and our customers occasionally want to know how many
> records are in a certain subset.
no, afaik count(*) uses the table still (try explain analyze). you
just compacted and optimized the table for efficient sequential scans.
are you vacuuming regulary?
merlin