Re: Question: BlockSize > 8192 with FusionIO

From Scott Carey
Subject Re: Question: BlockSize > 8192 with FusionIO
Date
Msg-id 42AF139A-0385-4226-B81C-9569FB64873E@richrelevance.com
In reply to Re: Question: BlockSize > 8192 with FusionIO  (Merlin Moncure <mmoncure@gmail.com>)
List pgsql-performance
On Jan 4, 2011, at 8:48 AM, Merlin Moncure wrote:

> On Mon, Jan 3, 2011 at 9:13 PM, Greg Smith <greg@2ndquadrant.com> wrote:
>> Strange, John W wrote:
>>>
>>> Has anyone had a chance to recompile and try a larger blocksize
>>> than 8192 with pSQL 8.4.x?
>>
>> While I haven't done the actual experiment you're asking about, the problem
>> working against you here is how WAL data is used to protect against partial
>> database writes.  See the documentation for full_page_writes at
>> http://www.postgresql.org/docs/current/static/runtime-config-wal.html
>>  Because full size copies of the blocks have to get written there, attempts
>> to chunk writes into larger pieces end up requiring a correspondingly larger
>> volume of writes to protect against partial writes to those pages.  You
>> might get a nice efficiency gain on the read side, but the situation when
>> under a heavy write load (the main thing you have to be careful about with
>> these SSDs) is much less clear.
>
> most flash drives, especially mlc flash, use huge blocks anyways on
> physical level.  the numbers claimed here
> (http://www.fusionio.com/products/iodrive/)  (141k write iops) are
> simply not believable without write buffering.  i didn't see any note
> of how fault tolerance is maintained through the buffer (anyone
> know?).


Flash may have very large erase blocks (4k to 16M), but you can write to it sequentially at much smaller block sizes.

It has to erase a block in bulk, but it can write to an erased block piece by piece, sequentially (typically 512 or
4096 bytes at a time, but some flash uses 8k or 16k).
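To make the erase/write asymmetry concrete, here is a minimal sketch of a single erase block. It is only a toy model: the class name EraseBlock, the 64-pages-per-block figure, and the 4096-byte page size are assumptions chosen for illustration, not the geometry of any particular drive.

```python
class EraseBlock:
    """Toy model of one NAND erase block (sizes are illustrative assumptions)."""

    def __init__(self, pages_per_block=64, page_size=4096):
        self.page_size = page_size
        self.pages = [None] * pages_per_block  # None means "erased"
        self.next_page = 0  # pages are programmed strictly in order

    def program(self, data):
        """Write one page into the next erased slot and return its index."""
        if len(data) > self.page_size:
            raise ValueError("data larger than one page")
        if self.next_page >= len(self.pages):
            raise RuntimeError("block full: the whole block must be erased first")
        index = self.next_page
        self.pages[index] = bytes(data)
        self.next_page += 1
        return index

    def erase(self):
        """Erase is all-or-nothing: every page in the block is cleared."""
        self.pages = [None] * len(self.pages)
        self.next_page = 0
```

Small writes land one page at a time, but reclaiming space always costs a whole-block erase.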

Older MLC NAND flash could be written to a couple of bytes at a time -- but drives today incorporate too much ECC and
use larger chunks, so they can no longer do that.  The minimum write size now is caused by the ECC requirements and
not the physical NAND flash requirements.

So, buffering isn't that big of a requirement with the current LBA-to-physical translations, which change all writes --
random or not -- to sequential writes in one erase block.  But performance when waiting for each write to complete will
not be all that good, especially with MLC.  Turn off the buffer on an Intel SLC drive for example, and write IOPS is
cut by 1/3 or more -- to 'only' 1000 or so iops.
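
As a rough illustration of the LBA-to-physical translation described above, the sketch below appends every write to the next free page regardless of the logical address. The ToyFTL class and its 4-pages-per-block geometry are hypothetical, and garbage collection, wear leveling, and ECC are deliberately left out; real drive firmware is far more involved.

```python
class ToyFTL:
    """Maps logical block addresses to sequentially allocated physical pages."""

    def __init__(self, pages_per_block=4):
        self.pages_per_block = pages_per_block  # assumed geometry, illustration only
        self.mapping = {}    # LBA -> (erase block number, page index)
        self.write_head = 0  # next free physical page, counted across blocks

    def write(self, lba):
        """Place the write at the head of the log, whatever the LBA is."""
        block, page = divmod(self.write_head, self.pages_per_block)
        self.mapping[lba] = (block, page)  # any older copy of this LBA is now stale
        self.write_head += 1
        return block, page


ftl = ToyFTL()
# Scattered logical addresses, including a rewrite of LBA 7.
placements = [ftl.write(lba) for lba in (900, 7, 31337, 7)]
print(placements)
```

In this model the four "random" writes fill consecutive pages of erase block 0, and the rewrite of LBA 7 simply leaves a stale page behind rather than updating in place.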
