>> In summary, the capability to span would be the next resort after someone
>> has maxed out their blocksize. Each OS would have a different blocksize
>> max...an AIX driver breaks when going past 16k...don't know about others.
>>
>> I'd say make the blocksize a run-time variable and then do the spanning.
>
>If we could query to find the file system block size at runtime in a
>portable way, that would help us pick the best block size, no?
No, I would really not suggest that. Having one default page size is the best
approach. If we have chaining, making the default 4k is probably a good choice.
Most commercial DBMSs have a tuneable block size with a 2k or 4k default.
Tables usually differ so much that specifying one optimal block size would not
be possible.
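(For reference, the kind of runtime query suggested in the quoted mail would, on
POSIX systems, be a minimal sketch along these lines; statvfs() reports the
filesystem's preferred I/O size in f_bsize. The point stands that I would not
derive the page size from it.)

    #include <stdio.h>
    #include <sys/statvfs.h>

    /* Print the filesystem's preferred I/O block size for a given path.
     * Sketch only; the server would still want one fixed default page
     * size, as argued above. */
    int
    main(int argc, char **argv)
    {
        struct statvfs vfs;

        if (argc < 2 || statvfs(argv[1], &vfs) != 0)
        {
            perror("statvfs");
            return 1;
        }
        printf("preferred block size: %lu\n", (unsigned long) vfs.f_bsize);
        return 0;
    }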
The chained-row performance hit could be offset by implementing good read ahead:
the consecutive pages would already be in memory by the time they are needed. A good
OS does this anyway.
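A minimal sketch of hinting that to the kernel explicitly, assuming a system
that provides posix_fadvise() (not available everywhere):

    #define _XOPEN_SOURCE 600
    #include <fcntl.h>
    #include <stdio.h>

    /* Tell the kernel a file will be read sequentially, so it can read
     * ahead the consecutive pages before they are requested. */
    int
    hint_sequential(const char *path)
    {
        int fd = open(path, O_RDONLY);

        if (fd < 0)
            return -1;
        /* offset 0, length 0 means: apply the advice to the whole file */
        if (posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL) != 0)
            fprintf(stderr, "posix_fadvise failed\n");
        return fd;
    }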
Tests with dd show that the block size does have a performance impact
when doing IO on a file system file. Even if the OS does the read ahead in an
appropriate block size, small block sizes cause a lot more system calls.
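The effect is easy to see even without dd: a plain read() loop with a small
buffer issues far more system calls for the same amount of data. A rough sketch
(file name and block sizes are just placeholders; compare e.g. 512 vs. 65536):

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* Read a whole file with the given buffer size and report how many
     * read() system calls it took. */
    int
    main(int argc, char **argv)
    {
        size_t  bufsize = (argc > 2) ? (size_t) atol(argv[2]) : 8192;
        char   *buf = malloc(bufsize);
        long    calls = 0;
        ssize_t n;
        int     fd;

        if (argc < 2 || buf == NULL || (fd = open(argv[1], O_RDONLY)) < 0)
        {
            fprintf(stderr, "usage: %s file [bufsize]\n", argv[0]);
            return 1;
        }
        while ((n = read(fd, buf, bufsize)) > 0)
            calls++;
        printf("bufsize %lu: %ld read() calls\n",
               (unsigned long) bufsize, calls);
        return 0;
    }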
Where the block size shows up dramatically is on raw devices.
So for IO reasons the block size should be large (on my box 64k or 256k).
But such large blocks have a negative effect on the buffer cache hit ratio.
Usually you say:
OLTP systems     --> small block size to maximize buffer usage and simultaneous access
DSS/OLAP systems --> large block size to maximize sequential scan performance
Conclusion:
The size of a block is always a trade-off between IO bandwidth and memory usage.
Therefore, making it a tuneable parameter per instance or database is best.
Andreas