Re: Monitoring time of fsyncing WALs

Поиск
Список
Период
Сортировка
От Michael Paquier
Тема Re: Monitoring time of fsyncing WALs
Дата
Msg-id 20180701032927.GA2109@paquier.xyz
обсуждение исходный текст
Ответ на Re: Monitoring time of fsyncing WALs  (Michael Paquier <michael@paquier.xyz>)
Ответы Re: Monitoring time of fsyncing WALs  (Craig Ringer <craig@2ndquadrant.com>)
Список pgsql-hackers
On Fri, Jun 29, 2018 at 10:39:00PM +0900, Michael Paquier wrote:
> On Fri, Jun 29, 2018 at 08:48:33AM -0400, Robert Haas wrote:
>> Are there other instances of fsync() that also need to be covered?
>
> Yeah, I double-checked the calls to pg_fsync back when I looked at this
> patch, but now that I look again I see at least two more of them:
> - fsync_fname_ext.

This one is used internally by things like durable_rename or such, on
which I am not sure that we actually can pass down correctly a wait
event as multiple syncs may happen within each call, so that would
require a weird API layer.

> - write_auto_conf_file, where a wait event for a write() is missing as
> well.

I have been looking at the archives and this has been left out on
purpose:
https://postgr.es/m/CAGPqQf0bzVfTTZdxcr4qHb3WDbn=+iH6sWchbN4HGjhwtcbPYQ@mail.gmail.com

> Hm.  I am wondering if it would not be worth extending pg_fsync() with
> an argument for a wait event and introduce a sort of pg_write() which
> wraps write() with an extra wait event argument, and similarly for
> read() (warning, conflict with Windows ahead!).  That's somehow related
> to the feeling I had when working with transient file's writes and reads
> a couple of days back.  Those are most likely going to be forgotten
> again and again in the future.  In both cases the caller would still be
> responsible for looking at errno and decide the error handling, but I
> got no feedback about the idea on transient files.

That would finish by being sort of ugly as durable_rename or such would
also need some treatment.  That would be quite invasive.

> There are also a couple of places where the wait events are longer than
> they should.  For example in writeTimeLineHistory, there is a wait event
> for write() which is still switched on even within an error code path.
> And on top of it I think that a call to pgstat_report_wait_end() is
> missing in the error code path as on ERROR the session still exists.
> That's a bug.  Those need an extra lookup and visibly some cleanup back
> to v10.

On ERROR a backend would switch back quickly on ClientRead.  Perhaps
we'd want to call pgstat_report_wait_end() when an error or higher is
thrown?  That's a different discussion.

So at the end, I would like to use the proposed patch and call it a
day.  Thoughts?
--
Michael

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Postgres 11 release notes
Следующее
От: Amit Kapila
Дата:
Сообщение: Re: Explain buffers wrong counter with parallel plans