Re: ext4 finally doing the right thing

From: Greg Stark
Subject: Re: ext4 finally doing the right thing
Msg-id 407d949e1001210313w1668d7e2jaee3b4d7984a059@mail.gmail.com
In reply to: Re: ext4 finally doing the right thing  (Greg Smith <greg@2ndquadrant.com>)
List: pgsql-performance

Both of those refer to the *drive* cache. 

greg

On 21 Jan 2010 05:58, "Greg Smith" <greg@2ndquadrant.com> wrote:

Greg Stark wrote:
> That doesn't sound right. The kernel having 10% of memory dirty doesn't mean...

Most of the safe ways ext3 knows to initiate a write-out on something that must go (because it has received an fsync on data there) require flushing every outstanding write to that filesystem along with it.  So as soon as a single WAL write shows up, bam!  The whole cache is emptied (or at least everything associated with that filesystem), and the caller who asked for that little write is stuck waiting for everything to clear before their fsync returns success.
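
A rough way to see this kind of stall from user space is to time a small write followed by fsync, which is the pattern a WAL flush produces. The sketch below is only illustrative: the helper name `timed_fsync_write`, the file name, and the 8 kB payload are assumptions made up for this example, and it measures whatever the local filesystem does rather than reproducing the ext3 behavior specifically.

```python
import os
import time


def timed_fsync_write(path, payload=b"x" * 8192):
    """Append payload to path, fsync it, and return the elapsed seconds.

    Hypothetical helper for illustration; the 8 kB default is an assumption
    chosen to resemble a small WAL write.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        start = time.perf_counter()
        os.write(fd, payload)
        os.fsync(fd)  # blocks until the filesystem reports the data as flushed
        return time.perf_counter() - start
    finally:
        os.close(fd)


if __name__ == "__main__":
    elapsed = timed_fsync_write("fsync_probe.dat")
    print(f"small write + fsync took {elapsed * 1000:.1f} ms")
```

Running this in a loop while something else dirties a lot of data on the same filesystem would be one way to check whether the small fsync gets stuck behind the unrelated writes.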

This particular issue absolutely killed Firefox when they switched to using SQLite not too long ago; high-level discussion at http://shaver.off.net/diary/2008/05/25/fsyncers-and-curveballs/ and confirmation/discussion of the issue on lkml at https://kerneltrap.org/mailarchive/linux-fsdevel/2008/5/26/1941354 .
Note the comment from the first article saying "those delays can be 30 seconds or more".  On multiple occasions, I've measured systems with dozens of disks in a high-performance RAID1+0 with battery-backed controller that could grind to a halt for 10, 20, or more seconds in this situation, when running pgbench on a big database.  As was the case on the latest one I saw, if you've got 32GB of RAM and have let 3.2GB of random I/O from background writer/checkpoint writes back up because Linux has been lazy about getting to them, that takes a while to clear no matter how good the underlying hardware.
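
The arithmetic behind a stall of that size can be sketched as follows. The 32GB of RAM and the 10% dirty fraction (giving 3.2GB) come from the discussion above; the 8 kB write size and the aggregate random-write rate are assumptions picked purely for illustration, not measurements from any of those systems.

```python
def dirty_backlog_seconds(ram_gb, dirty_fraction, write_kb, random_write_iops):
    """Estimate how long a backlog of dirty pages takes to flush as random writes."""
    backlog_kb = ram_gb * dirty_fraction * 1024 * 1024  # GB -> kB
    writes = backlog_kb / write_kb  # number of individual random writes
    return writes / random_write_iops


if __name__ == "__main__":
    # 32GB of RAM with 10% dirty is the 3.2GB backlog mentioned above.
    # 8 kB writes and 20,000 random writes/s are assumed values.
    seconds = dirty_backlog_seconds(32, 0.10, 8, 20_000)
    print(f"roughly {seconds:.0f} s to clear the backlog")
```

Under those assumed numbers the backlog lands in the same tens-of-seconds range described above; a slower array or smaller writes would push it higher.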

Write barriers were supposed to improve all this when added to ext3, but they just never seemed to work right for many people.  After reading that lkml thread, among others, I know I was left not trusting anything beyond the simplest path through this area of the filesystem.  Slow is better than corrupted.

So the good news I was relaying is that it looks like this finally works on ext4, giving it the behavior you described and expected, which hasn't actually been there until now.  I was hoping someone with more free time than me might be interested in investigating further if I pointed the advance out.  I'm stuck with too many production systems to play with new kernels at the moment, but am quite curious.

-- Greg Smith    2ndQuadrant   Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQu...
