Обсуждение: [PATCH] xlogreader: do not read a file block twice
Hello hackers, Grigory noticed that one of our utilities has very slow performance when xlogreader reads zlib archives. We found out that xlogreader sometimes reads a WAL file block twice. zlib has slow performance when you read an archive not in sequential order. I think reading a block twice in same position isn't sequential, because gzread() moves current position forward and next call gzseek() to the same position moves it back. It seems that the attached patch solves the issue. I think when reqLen == state->readLen the requested block already is in the xlogreader's buffer. What do you think? -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Вложения
Hm, looks like it could speed up PostgreSQL recovery, but is it safe? On 02/11/2019 07:25 PM, Arthur Zakirov wrote: > Hello hackers, > > Grigory noticed that one of our utilities has very slow performance > when xlogreader reads zlib archives. We found out that xlogreader > sometimes reads a WAL file block twice. > > zlib has slow performance when you read an archive not in sequential > order. I think reading a block twice in same position isn't > sequential, because gzread() moves current position forward and next > call gzseek() to the same position moves it back. > > It seems that the attached patch solves the issue. I think when reqLen > == state->readLen the requested block already is in the xlogreader's > buffer. > > What do you think? > -- Grigory Smolkin Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Mon, Feb 11, 2019 at 07:32:59PM +0300, Grigory Smolkin wrote: > Hm, looks like it could speed up PostgreSQL recovery, but is it > safe? (Please avoid top-posting.) > On 02/11/2019 07:25 PM, Arthur Zakirov wrote: >> Grigory noticed that one of our utilities has very slow performance when >> xlogreader reads zlib archives. We found out that xlogreader sometimes >> reads a WAL file block twice. >> >> What do you think? I think that such things, even if they look simple, need a careful lookup, and I have not looked at the proposal yet. Could you add it to the next commit fest so as we don't lose track of it? https://commitfest.postgresql.org/22/ -- Michael
Вложения
On 12.02.2019 07:23, Michael Paquier wrote: >> On 02/11/2019 07:25 PM, Arthur Zakirov wrote: >>> Grigory noticed that one of our utilities has very slow performance when >>> xlogreader reads zlib archives. We found out that xlogreader sometimes >>> reads a WAL file block twice. >>> >>> What do you think? > > I think that such things, even if they look simple, need a careful > lookup, and I have not looked at the proposal yet. Could you add it > to the next commit fest so as we don't lose track of it? > https://commitfest.postgresql.org/22/ Of course. Agree, it may be a non trivial case. Added as a bug fix: https://commitfest.postgresql.org/22/1994/ -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
On 11.02.2019 21:25, Arthur Zakirov wrote: > Hello hackers, > > Grigory noticed that one of our utilities has very slow performance when > xlogreader reads zlib archives. We found out that xlogreader sometimes > reads a WAL file block twice. > > zlib has slow performance when you read an archive not in sequential > order. I think reading a block twice in same position isn't sequential, > because gzread() moves current position forward and next call gzseek() > to the same position moves it back. > > It seems that the attached patch solves the issue. I think when reqLen > == state->readLen the requested block already is in the xlogreader's > buffer. > > What do you think? I looked at the history of the code changes: --------------------------------------------------------------- 7fcbf6a405f (Alvaro Herrera 2013-01-16 16:12:53 -0300 539) reqLen < state->readLen) 1bb2558046c (Heikki Linnakangas 2010-01-27 15:27:51 +0000 9349) targetPageOff == readOff && targetRecOff < readLen) eaef111396e (Tom Lane 2006-04-03 23:35:05 +0000 3842) len = XLOG_BLCKSZ - RecPtr->xrecoff % XLOG_BLCKSZ; 4d14fe0048c (Tom Lane 2001-03-13 01:17:06 +0000 3843) if (total_len > len) --------------------------------------------------------------- In the original code of Tom Lane, condition (total_len > len) caused a page reread from disk. As I understand it, this is equivalent to your proposal. Th code line in commit 1bb2558046c seems tantamount to the corresponding line in commit 7fcbf6a405f but have another semantics: the targetPageOff value can't be more or equal XLOG_BLCKSZ, but the reqLen value can be. It may be a reason of appearance of possible mistake, introduced by commit 7fcbf6a405f. -- Andrey Lepikhov Postgres Professional https://postgrespro.com The Russian Postgres Company
On 12.02.2019 20:47, Andrey Lepikhov wrote: > I looked at the history of the code changes: > > --------------------------------------------------------------- > 7fcbf6a405f (Alvaro Herrera 2013-01-16 16:12:53 -0300 539) reqLen < > state->readLen) > > 1bb2558046c (Heikki Linnakangas 2010-01-27 15:27:51 +0000 9349) > targetPageOff == readOff && targetRecOff < readLen) > > eaef111396e (Tom Lane 2006-04-03 23:35:05 +0000 3842) > len = XLOG_BLCKSZ - RecPtr->xrecoff % XLOG_BLCKSZ; > 4d14fe0048c (Tom Lane 2001-03-13 01:17:06 +0000 3843) > if (total_len > len) > --------------------------------------------------------------- > > In the original code of Tom Lane, condition (total_len > len) caused a > page reread from disk. As I understand it, this is equivalent to your > proposal. > Th code line in commit 1bb2558046c seems tantamount to the corresponding > line in commit 7fcbf6a405f but have another semantics: the targetPageOff > value can't be more or equal XLOG_BLCKSZ, but the reqLen value can be. > It may be a reason of appearance of possible mistake, introduced by > commit 7fcbf6a405f. Thank you for your research. Indeed, it makes sense now. In my case after reading a page both `reqLen` and `state->readLen` equal to XLOG_BLCKSZ. This leads to a page reread, since `pageptr` is the same as the previous read. But `targetRecOff` is different in the second case because we want to read next record, which probably doesn't fit into the page wholly (that's why `reqLen` is equal to XLOG_BLCKSZ). -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
On Tue, Feb 12, 2019 at 11:44:14AM +0300, Arthur Zakirov wrote: > Of course. Agree, it may be a non trivial case. Added as a bug fix: > https://commitfest.postgresql.org/22/1994/ I have been looking at the patch, and I agree that the current coding is a bit crazy. If the wanted data has already been read, it makes little sense to require reading it again if the size requested by the caller of ReadPageInternal() exactly equals what has been read already, and that's what the code is doing. Now I don't actually agree that this qualifies as a bug fix. As things stand, a page may finish by being more than once if what has been read previously equals what is requested, however this does not prevent the code to work correctly. The performance gain is also heavily dependent on the callback reading a page and the way the WAL reader is used. How do you actually read WAL pages in your own plugin with compressed data? It begins by reading a full page once, then it moves on to a per-record read after making sure that the page has been read? -- Michael
Вложения
On 14.02.2019 09:51, Michael Paquier wrote: > Now I don't actually agree that this qualifies as a bug fix. As > things stand, a page may finish by being more than once if what has > been read previously equals what is requested, however this does not > prevent the code to work correctly. The performance gain is also > heavily dependent on the callback reading a page and the way the WAL > reader is used. How do you actually read WAL pages in your own > plugin with compressed data? It begins by reading a full page once, > then it moves on to a per-record read after making sure that the page > has been read? Yes, an application reads WAL pages wholly at a time. It is done within SimpleXLogPageRead() (it is a read_page callback passed to XLogReaderAllocate()). It returns XLOG_BLCKSZ. Here is the part of the code, not sure that it will be useful though: SimpleXLogPageRead(...) { ... targetPageOff = targetPagePtr % private_data->xlog_seg_size; ... if (gzseek(private_data->gz_xlogfile, (z_off_t) targetPageOff, SEEK_SET) == -1) ... if (gzread(private_data->gz_xlogfile, readBuf, XLOG_BLCKSZ) != XLOG_BLCKSZ) ... return XLOG_BLCKSZ; } So we read whole page with size XLOG_BLCKSZ. The full code: https://github.com/postgrespro/pg_probackup/blob/c052651b8c8864733bcabbc2660c387b792229d8/src/parsexlog.c#L1074 Here is the little optimization I made. Mainly I just add a buffer to store previous read page: https://github.com/postgrespro/pg_probackup/blob/c052651b8c8864733bcabbc2660c387b792229d8/src/parsexlog.c#L1046 -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
On Thu, Feb 14, 2019 at 11:20:56AM +0300, Arthur Zakirov wrote: > So we read whole page with size XLOG_BLCKSZ. The full code: > https://github.com/postgrespro/pg_probackup/blob/c052651b8c8864733bcabbc2660c387b792229d8/src/parsexlog.c#L1074 > > Here is the little optimization I made. Mainly I just add a buffer to store > previous read page: > https://github.com/postgrespro/pg_probackup/blob/c052651b8c8864733bcabbc2660c387b792229d8/src/parsexlog.c#L1046 Thanks, I see what you have done. I cannot comment if your shortcut is actually fully correct based on my knowledge of this code, but things cannot be in the best conditions without having the WAL reader handle properly the limits. So I am planning to commit what you propose after an extra pass on it in the next couple of days or so. -- Michael
Вложения
On Fri, Feb 15, 2019 at 08:06:16AM +0900, Michael Paquier wrote: > Thanks, I see what you have done. I cannot comment if your shortcut > is actually fully correct based on my knowledge of this code, but > things cannot be in the best conditions without having the WAL reader > handle properly the limits. So I am planning to commit what you > propose after an extra pass on it in the next couple of days or so. And done, after doing and extra pass, doing more testing using by own plugins, pg_waldump and more fancy stuff with a primary/standby and pgbench. -- Michael
Вложения
On 18.02.2019 04:09, Michael Paquier wrote: > And done, after doing and extra pass, doing more testing using by own > plugins, pg_waldump and more fancy stuff with a primary/standby and > pgbench. Thank you Michael. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company