Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance
From | Claudio Freire |
---|---|
Subject | Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance |
Date | |
Msg-id | CAGTBQpYaO345De38yh-LCkORgL8gdmhq+acGOd4PTBsMCJ2szQ@mail.gmail.com |
In reply to | Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance (Stephen Frost <sfrost@snowman.net>) |
List | pgsql-hackers |
On Tue, Jan 14, 2014 at 2:39 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Jan 14, 2014 at 12:20 PM, James Bottomley
> <James.Bottomley@hansenpartnership.com> wrote:
>> On Tue, 2014-01-14 at 15:15 -0200, Claudio Freire wrote:
>>> On Tue, Jan 14, 2014 at 2:12 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> > In terms of avoiding double-buffering, here's my thought after reading
>>> > what's been written so far. Suppose we read a page into our buffer
>>> > pool. Until the page is clean, it would be ideal for the mapping to
>>> > be shared between the buffer cache and our pool, sort of like
>>> > copy-on-write. That way, if we decide to evict the page, it will
>>> > still be in the OS cache if we end up needing it again (remember, the
>>> > OS cache is typically much larger than our buffer pool). But if the
>>> > page is dirtied, then instead of copying it, just have the buffer pool
>>> > forget about it, because at that point we know we're going to write
>>> > the page back out anyway before evicting it.
>>> >
>>> > This would be pretty similar to copy-on-write, except without the
>>> > copying. It would just be forget-from-the-buffer-pool-on-write.
>>>
>>> But... either copy-on-write or forget-on-write needs a page fault, and
>>> thus a page mapping.
>>>
>>> Is a page fault more expensive than copying 8k?
>>>
>>> (I really don't know).
>>
>> A page fault can be expensive, yes ... but perhaps you don't need one.
>>
>> What you want is a range of memory that's read from a file but treated
>> as anonymous for writeout (i.e. written to swap if we need to reclaim
>> it). Then at some time later, you want to designate it as written back
>> to the file instead so you control the writeout order. I'm not sure we
>> can do this: the separation between file backed and anonymous pages is
>> pretty deeply ingrained into the OS, but if it were possible, is that
>> what you want?
>
> Doesn't sound exactly like what I had in mind. What I was suggesting
> is an analogue of read() that, if it reads full pages of data to a
> page-aligned address, shares the data with the buffer cache until it's
> first written instead of actually copying the data. The pages are
> write-protected so that an attempt to write the address range causes a
> page fault. In response to such a fault, the pages become anonymous
> memory and the buffer cache no longer holds a reference to the page.

Yes, that's basically zero-copy reads.

It could be done. The kernel can remap the page to the physical page
holding the shared buffer and mark it read-only, then expire the
buffer and transfer ownership of the page if any page fault happens.

But that incurs:

- Page faults, lots
- Hugely bloated mappings, unless KSM is somehow leveraged for this

And there's a nice bingo. Had forgotten about KSM. KSM could help lots.

I could try to see if madvising shared_buffers as mergeable helps. But
this should be an automatic case of KSM, i.e., when reading into a
page-aligned address, the kernel should summarily apply KSM-style
sharing without hinting. The current madvise interface puts the burden
of figuring out what duplicates what on the kernel, but postgres
already knows.
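
For readers unfamiliar with the hint mentioned above, here is a minimal sketch in C of what "madvising a region as mergeable" looks like on Linux. It is not taken from PostgreSQL. The helper names `alloc_pool` and `advise_mergeable` are hypothetical, and the region is a private anonymous mapping standing in for a buffer pool. Per the madvise(2) man page, KSM merges only private anonymous pages, so whether the hint could do anything for a real shared-memory `shared_buffers` is part of the open question rather than something this sketch settles.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map a page-aligned, private anonymous region that
 * stands in for a buffer pool. Returns NULL on failure.
 */
void *alloc_pool(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/*
 * Hypothetical helper: hint that pages in [addr, addr + len) may be merged
 * by KSM. Returns 0 on success, otherwise the errno value. EINVAL is what
 * a kernel built without CONFIG_KSM reports; ENOSYS is returned here when
 * the headers do not define MADV_MERGEABLE at all.
 */
int advise_mergeable(void *addr, size_t len)
{
#ifdef MADV_MERGEABLE
    if (madvise(addr, len, MADV_MERGEABLE) == 0)
        return 0;
    return errno;
#else
    (void) addr;
    (void) len;
    return ENOSYS;
#endif
}
```

A caller would map the pool once with `alloc_pool(len)` and then call `advise_mergeable(pool, len)`. The hint only marks the range as a candidate: the KSM daemon still has to be enabled (by writing 1 to `/sys/kernel/mm/ksm/run`) and it finds duplicates by scanning, which is the "burden on the kernel" the message refers to. Merging progress can be watched in `/sys/kernel/mm/ksm/pages_sharing`.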