Re: Scaling with memory & disk planning
From | Curt Sampson |
---|---|
Subject | Re: Scaling with memory & disk planning |
Date | |
Msg-id | Pine.NEB.4.43.0205311146251.448-100000@angelic.cynic.net |
In reply to | Re: Scaling with memory & disk planning (terry@greatgulfhomes.com) |
List | pgsql-general |
Jean-Luc Lachance writes:
> > I think your understanding of RAID 5 is wrong also.
> >
> > For a general N disk RAID 5 the process is:
> > 1) Read sector
> > 2) XOR with data to write
> > 3) Read parity sector
> > 4) XOR with result above
> > 5) Write data
> > 6) Write parity

Yes, generally. There are a couple of tricks you can do to help get
around this, though.

One, which works very nicely when doing sequential writes, is to hold
off on the write until you collect an entire stripe's worth of data.
Then you can calculate the parity based on what's in memory, and write
the new blocks across all of the disks without worrying about what was
on them before. 3ware's Escalade IDE RAID controllers (the 3W-7x50
series, anyway) do this. Their explanation is at
http://www.3ware.com/NewFaq/general_operating_and_troubleshooting.htm#R5_Fusion_Explained

Another tactic is just to buffer entire stripes. Baydel does this with
their disk arrays, which are actually RAID-3, not RAID-5. (Since they
do only full-stripe reads and writes, it doesn't really make any
difference which they use.) You want a fair amount of RAM in your
controller for buffering in this case, but it keeps the computer's
"read, modify, write" cycle on one block from turning into "read,
read, modify, write".

Terry Fielder writes:
> My simplification was intended, anyway it still equates to the same,
> because in a performance machine (lots of memory) reads are (mostly)
> pulled from cache (not disk IO). So the real cost is disk writes, and
> 2 = 2.

Well, it really depends on your workload. If you're selecting stuff
almost completely randomly scattered about a large table (like the
25 GB one I'm dealing with right now), it's going to be a bit pricey
to get hold of a machine with enough memory to cache that effectively.
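The parity arithmetic behind the two strategies above can be sketched in a few lines of Python. This is an illustrative model only (not any controller's actual code): `parity`, `rmw_parity`, and the tiny three-block stripe are all made up for the example. It shows why a full-stripe write needs no pre-reads, while a single-block update needs the old data and old parity.

```python
from functools import reduce

def parity(blocks):
    """Full-stripe parity: XOR all data blocks together (no pre-reads needed)."""
    return bytes(reduce(lambda a, b: a ^ b, t) for t in zip(*blocks))

def rmw_parity(old_parity, old_data, new_data):
    """Read-modify-write parity update for one block:
    new_parity = old_parity XOR old_data XOR new_data
    (requires reading the old data and old parity first)."""
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))

# A toy three-block stripe.
stripe = [bytes([1, 2]), bytes([3, 4]), bytes([5, 6])]
p = parity(stripe)

# Updating block 0 via read-modify-write yields the same parity as
# recomputing it from the complete new stripe held in memory.
new_block = bytes([9, 9])
assert rmw_parity(p, stripe[0], new_block) == parity([new_block, stripe[1], stripe[2]])
```

Either way the resulting parity is identical; the difference is purely in I/O cost: the read-modify-write path costs two reads and two writes per updated block, while a buffered full-stripe write costs only the writes.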
Kurt Gunderson writes:
] Likewise, when writing to the mirrored pair (and using 'write-through',
] never 'write-back'), the controller will pass along the 'data written'
] flag to the CPU when the first disk of the pair writes the data. The
] second will sync eventually but the controller need not wait for both.

I hope not! I think that controller ought to wait for both to be
written, because otherwise you can have this scenario:

1. Write of block X scheduled for drives A and B.
2. Block written to drive A. Still pending on drive B.
3. Controller returns "block committed to stable storage" to the
   application.
4. Power failure. Pending write to drive B is never written.

Now, how do you know, when the system comes back up, that you have a
good copy of the block on drive A, but not on drive B?

cjs
--
Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light. --XTC
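The four-step failure scenario above can be walked through with a toy model. Everything here is hypothetical (the `Mirror` class and its method names are invented for illustration, not any real controller's API); it just shows the state the mirror is left in when the ack is sent before the second drive is durable.

```python
# Toy model of a mirrored pair whose controller acknowledges a write
# as soon as the FIRST drive has it (the behavior being questioned).
class Mirror:
    def __init__(self):
        self.a = {}          # blocks durable on drive A
        self.b = {}          # blocks durable on drive B
        self.pending_b = {}  # writes queued for B, not yet flushed

    def write_ack_first(self, block_no, data):
        self.a[block_no] = data          # step 2: A commits
        self.pending_b[block_no] = data  # B will "sync eventually"
        return "committed"               # step 3: ack before B is durable

    def power_failure(self):
        self.pending_b.clear()           # step 4: queued writes to B lost

m = Mirror()
m.write_ack_first(7, b"new")
m.power_failure()

# After restart, A and B disagree for block 7, yet the write was already
# acknowledged as committed -- and nothing on disk records which copy is good.
assert m.a.get(7) == b"new"
assert m.b.get(7) is None
```

If the controller instead waited for both drives before acknowledging, an acknowledged block would be present on both copies, so a post-crash mismatch could only involve writes the application was never told had committed.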