Discussion: XLog changes for 9.3
When I worked on the XLogInsert scaling patch, it became apparent that
some changes to the WAL format would make it a lot easier. So for 9.3,
I'd like to do some refactoring:

1. Use a 64-bit integer instead of the two-variable log/seg
representation, for identifying a WAL segment. This has no user-visible
effect, but makes the code a bit simpler.

2. Don't waste the last WAL segment in each logical 4GB file. Currently,
we skip the WAL segment ending with "FF". The comments claim that
wasting the last segment "ensures that we don't have problems
representing last-byte-position-plus-1", but in my experience, it just
makes things more complicated. You have two ways to represent the
segment boundary, and some functions are picky on which one is used. For
example, XLogWrite() assumes that when you want to flush to the end of a
logical log file, you use the "5/FF000000" representation, not
"6/00000000". Other functions, like XLogPageRead(), expect the latter.

This is a backwards-incompatible change for external utilities that know
how the WAL segment numbering works. Hopefully there aren't too many of
those around.

3. Move the only field, xl_rem_len, from the continuation record header
straight to the xlog page header, eliminating XLogContRecord altogether.
This makes it easier to calculate in advance how much space a WAL record
requires, as it no longer depends on how many pages it has to be split
across. This wastes 4-8 bytes on every xlog page, but that's not much.

4. Allow WAL record header to be split across page boundaries.
Currently, if there are less than SizeOfXLogRecord bytes left on the
current WAL page, it is wasted, and the next record is inserted at the
beginning of the next page. The problem with that is again that it makes
it impossible to know in advance exactly how much space a WAL record
requires, because it depends on how many bytes need to be wasted at the
end of current page.

These changes will help the XLogInsert scaling patch, by making the
space calculations simpler. In essence, to reserve space for a WAL
record of size X, you just need to do "bytepos += X". There's a lot
more details with that, like mapping from the contiguous byte position
to an XLogRecPtr that takes page headers into account, and noticing
RedoRecPtr changes safely, but it's a start.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
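To make points 1 and 2 concrete, here is a minimal sketch in C of how stepping to the next segment differs between the two schemes. It assumes the default 16 MB segment size, so 256 segments per logical 4GB file. LogSegPair, next_segment_old, next_segment_new and split_segno are hypothetical names used for illustration only; they are not PostgreSQL functions.

```c
#include <stdint.h>

#define SEGS_PER_LOGID 256u   /* 4 GB / 16 MB; assumes the default segment size */

/* Two-variable representation: logical log file number plus segment in it. */
typedef struct
{
    uint32_t log;
    uint32_t seg;
} LogSegPair;                 /* hypothetical type, for illustration */

/* Current rule as described in the thread: segment 0xFF is never used. */
LogSegPair next_segment_old(LogSegPair p)
{
    if (p.seg >= SEGS_PER_LOGID - 2)   /* 0xFE is the last usable segment */
    {
        p.log++;
        p.seg = 0;
    }
    else
        p.seg++;
    return p;
}

/* Proposed rule: a single 64-bit counter with no hole. */
uint64_t next_segment_new(uint64_t segno)
{
    return segno + 1;
}

/* Split a 64-bit segment number back into the two halves. */
LogSegPair split_segno(uint64_t segno)
{
    LogSegPair p;

    p.log = (uint32_t) (segno / SEGS_PER_LOGID);
    p.seg = (uint32_t) (segno % SEGS_PER_LOGID);
    return p;
}
```

Starting from 5/FE, the old rule jumps to 6/00, while the 64-bit counter moves on to 5/FF, the segment that is skipped today.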
On Thursday, June 07, 2012 03:50:35 PM Heikki Linnakangas wrote:
> When I worked on the XLogInsert scaling patch, it became apparent that
> some changes to the WAL format would make it a lot easier. So for 9.3,
> I'd like to do some refactoring:
>
> 1. Use a 64-bit integer instead of the two-variable log/seg
> representation, for identifying a WAL segment. This has no user-visible
> effect, but makes the code a bit simpler.

+1

We can define a sensible InvalidXLogRecPtr instead of doing that locally
in loads of places! Yipee.

> 2. Don't waste the last WAL segment in each logical 4GB file. Currently,
> we skip the WAL segment ending with "FF". The comments claim that
> wasting the last segment "ensures that we don't have problems
> representing last-byte-position-plus-1", but in my experience, it just
> makes things more complicated. You have two ways to represent the
> segment boundary, and some functions are picky on which one is used. For
> example, XLogWrite() assumes that when you want to flush to the end of a
> logical log file, you use the "5/FF000000" representation, not
> "6/00000000". Other functions, like XLogPageRead(), expect the latter.
>
> This is a backwards-incompatible change for external utilities that know
> how the WAL segment numbering works. Hopefully there aren't too many of
> those around.

+1

> 3. Move the only field, xl_rem_len, from the continuation record header
> straight to the xlog page header, eliminating XLogContRecord altogether.
> This makes it easier to calculate in advance how much space a WAL record
> requires, as it no longer depends on how many pages it has to be split
> across. This wastes 4-8 bytes on every xlog page, but that's not much.

+1. I don't think this will waste a measurable amount in real-world
scenarios. A very big percentage of pages have continuation records.

> 4. Allow WAL record header to be split across page boundaries.
> Currently, if there are less than SizeOfXLogRecord bytes left on the
> current WAL page, it is wasted, and the next record is inserted at the
> beginning of the next page. The problem with that is again that it makes
> it impossible to know in advance exactly how much space a WAL record
> requires, because it depends on how many bytes need to be wasted at the
> end of current page.

+0.5. It's somewhat convenient to be able to look at a record before you
have reassembled it over multiple pages. But it's probably not worth the
implementation complexity.
If we do that we can remove all the alignment padding as well. Which would
be a problem for you anyway, wouldn't it?

> These changes will help the XLogInsert scaling patch, by making the
> space calculations simpler. In essence, to reserve space for a WAL
> record of size X, you just need to do "bytepos += X". There's a lot
> more details with that, like mapping from the contiguous byte position
> to an XLogRecPtr that takes page headers into account, and noticing
> RedoRecPtr changes safely, but it's a start.

Hm. Wouldn't you need to remove short/long page headers for that as well?

Andres

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> When I worked on the XLogInsert scaling patch, it became apparent that
> some changes to the WAL format would make it a lot easier. So for 9.3,
> I'd like to do some refactoring:

> 1. Use a 64-bit integer instead of the two-variable log/seg
> representation, for identifying a WAL segment. This has no user-visible
> effect, but makes the code a bit simpler.

> 2. Don't waste the last WAL segment in each logical 4GB file. Currently,
> we skip the WAL segment ending with "FF". The comments claim that
> wasting the last segment "ensures that we don't have problems
> representing last-byte-position-plus-1", but in my experience, it just
> makes things more complicated.

I think that's actually an indivisible part of point #1. The issue in
the 32+32 representation is that you'd overflow the low-order half when
trying to represent last-byte-of-file-plus-1, and have to do something
with propagating that to the high half. In a 64-bit continuous
addressing scheme the problem goes away, and it would just get more
complicated not less to preserve the "hole".

regards, tom lane
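The overflow described above can be illustrated with a small C sketch. It assumes 16 MB segments and uses the xlogid/xrecoff field names from the thread; SplitRecPtr and the three advance functions are hypothetical helpers written for this illustration, not code from xlog.c.

```c
#include <stdint.h>

/* 32+32 representation, with the field names used in the thread. */
typedef struct
{
    uint32_t xlogid;    /* high half: logical log file number */
    uint32_t xrecoff;   /* low half: byte offset within that file */
} SplitRecPtr;          /* hypothetical type, for illustration */

/* Arithmetic on the low half alone: the carry is silently lost. */
SplitRecPtr advance_low_half_only(SplitRecPtr p, uint32_t nbytes)
{
    p.xrecoff += nbytes;            /* unsigned wrap-around modulo 2^32 */
    return p;
}

/* The extra work the split representation needs: propagate the carry. */
SplitRecPtr advance_with_carry(SplitRecPtr p, uint32_t nbytes)
{
    uint32_t before = p.xrecoff;

    p.xrecoff += nbytes;
    if (p.xrecoff < before)         /* wrapped, so bump the high half */
        p.xlogid++;
    return p;
}

/* With one 64-bit position the carry happens by itself. */
uint64_t advance_64(uint64_t pos, uint32_t nbytes)
{
    return pos + nbytes;
}
```

Advancing 5/FF000000 by one 16 MB segment yields 5/00000000 with the naive version, but 6/00000000 once the carry is handled, which is what plain 64-bit addition produces without any special case.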
On 07.06.2012 17:18, Andres Freund wrote:
> On Thursday, June 07, 2012 03:50:35 PM Heikki Linnakangas wrote:
>> 3. Move the only field, xl_rem_len, from the continuation record header
>> straight to the xlog page header, eliminating XLogContRecord altogether.
>> This makes it easier to calculate in advance how much space a WAL record
>> requires, as it no longer depends on how many pages it has to be split
>> across. This wastes 4-8 bytes on every xlog page, but that's not much.
>
> +1. I don't think this will waste a measurable amount in real-world
> scenarios. A very big percentage of pages have continuation records.

Yeah, although the way I'm planning to do it, you'll waste 4 bytes (on
64-bit architectures) even when there is a continuation record, because
of alignment:

typedef struct XLogPageHeaderData
{
    uint16      xlp_magic;      /* magic value for correctness checks */
    uint16      xlp_info;       /* flag bits, see below */
    TimeLineID  xlp_tli;        /* TimeLineID of first record on page */
    XLogRecPtr  xlp_pageaddr;   /* XLOG address of this page */

+   uint32      xlp_rem_len;    /* bytes remaining of continued record */
} XLogPageHeaderData;

The page header is currently 16 bytes in length, so adding a 4-byte
field to it bumps the aligned size to 24 bytes. Nevertheless, I think we
can well live with that.

>> 4. Allow WAL record header to be split across page boundaries.
>> Currently, if there are less than SizeOfXLogRecord bytes left on the
>> current WAL page, it is wasted, and the next record is inserted at the
>> beginning of the next page. The problem with that is again that it makes
>> it impossible to know in advance exactly how much space a WAL record
>> requires, because it depends on how many bytes need to be wasted at the
>> end of current page.
>
> +0.5. It's somewhat convenient to be able to look at a record before you
> have reassembled it over multiple pages. But it's probably not worth the
> implementation complexity.

Looking at the code, I think it'll be about the same complexity for
XLogInsert in its current form (it will help the patch I'm working on),
and makes ReadRecord() a bit more complicated. But not much.

> If we do that we can remove all the alignment padding as well. Which
> would be a problem for you anyway, wouldn't it?

It's not a problem. You just MAXALIGN the size of the record when you
calculate how much space it needs, and then all records become naturally
MAXALIGNed. We could quite easily remove the alignment on-disk if we
wanted to, ReadRecord() already always copies the record to an aligned
buffer, but I wasn't planning to do that.

>> These changes will help the XLogInsert scaling patch, by making the
>> space calculations simpler. In essence, to reserve space for a WAL
>> record of size X, you just need to do "bytepos += X". There's a lot
>> more details with that, like mapping from the contiguous byte position
>> to an XLogRecPtr that takes page headers into account, and noticing
>> RedoRecPtr changes safely, but it's a start.
>
> Hm. Wouldn't you need to remove short/long page headers for that as well?

No, those are ok because they're predictable. Although it would make the
mapping simpler. To convert from a contiguous xlog byte position that
excludes all headers, to XLogRecPtr, you need to do something like this
(I just made this up, probably has bugs, but it's about this complex):

    #define UsableBytesInPage (XLOG_BLCKSZ - SizeOfXLogShortPHD)
    #define UsableBytesInSegment ((XLOG_SEG_SIZE / XLOG_BLCKSZ) * \
        UsableBytesInPage - (SizeOfXLogLongPHD - SizeOfXLogShortPHD))

    uint64 xlogrecptr;
    uint64 full_segments = bytepos / UsableBytesInSegment;
    int offset_in_segment = bytepos % UsableBytesInSegment;

    xlogrecptr = full_segments * XLOG_SEG_SIZE;
    /* is it on the first page? */
    if (offset_in_segment < XLOG_BLCKSZ - SizeOfXLogLongPHD)
        xlogrecptr += SizeOfXLogLongPHD + offset_in_segment;
    else
    {
        /* first page is fully used */
        xlogrecptr += XLOG_BLCKSZ;
        /* add other full pages */
        offset_in_segment -= XLOG_BLCKSZ - SizeOfXLogLongPHD;
        xlogrecptr += (offset_in_segment / UsableBytesInPage) * XLOG_BLCKSZ;
        /* and finally the short header plus the offset within the last page */
        xlogrecptr += SizeOfXLogShortPHD + offset_in_segment % UsableBytesInPage;
    }
    /* finally convert the 64-bit xlogrecptr to a XLogRecPtr struct */
    XLogRecPtr.xlogid = xlogrecptr >> 32;
    XLogRecPtr.xrecoff = xlogrecptr & 0xffffffff;

Encapsulated in a function, that's not too bad. But if we want to make
that simpler, one idea would be to allocate the whole 1st page in each
WAL segment for metadata. That way all the actual xlog pages would hold
the same amount of xlog data.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
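The mapping from a header-free byte position to a physical WAL position can be written as a self-contained C function along the following lines. This is only a sketch under stated assumptions: 8 kB pages, 16 MB segments, a 24-byte short page header (the aligned size mentioned in the message) and a 40-byte long header are all illustrative values, and bytepos_to_recptr is a hypothetical name. Positions that land on the second or later page of a segment have to skip that page's short header as well.

```c
#include <stdint.h>

/* All sizes are assumptions chosen for illustration. */
#define XLOG_BLCKSZ         8192u
#define XLOG_SEG_SIZE       (16u * 1024 * 1024)
#define SizeOfXLogShortPHD  24u
#define SizeOfXLogLongPHD   40u

#define UsableBytesInPage   (XLOG_BLCKSZ - SizeOfXLogShortPHD)
#define UsableBytesInSegment \
    ((XLOG_SEG_SIZE / XLOG_BLCKSZ) * UsableBytesInPage - \
     (SizeOfXLogLongPHD - SizeOfXLogShortPHD))

/*
 * Convert a contiguous byte position that excludes all page headers into
 * a 64-bit physical position that includes them.
 */
uint64_t bytepos_to_recptr(uint64_t bytepos)
{
    uint64_t full_segments = bytepos / UsableBytesInSegment;
    uint64_t offset = bytepos % UsableBytesInSegment;
    uint64_t recptr = full_segments * XLOG_SEG_SIZE;

    if (offset < XLOG_BLCKSZ - SizeOfXLogLongPHD)
    {
        /* on the first page of the segment, after the long header */
        recptr += SizeOfXLogLongPHD + offset;
    }
    else
    {
        /* first page is fully used */
        recptr += XLOG_BLCKSZ;
        offset -= XLOG_BLCKSZ - SizeOfXLogLongPHD;
        /* whole pages in between */
        recptr += (offset / UsableBytesInPage) * XLOG_BLCKSZ;
        /* short header of the final page plus the offset within it */
        recptr += SizeOfXLogShortPHD + offset % UsableBytesInPage;
    }
    return recptr;
}
```

With these sizes, byte position 0 maps to 40 (just past the long header), and the last usable byte of a segment maps to the last physical byte of that segment. Since the function maps positions rather than records, the end of a reserved range would presumably be found by mapping bytepos + X in the same way.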
On 7 June 2012 14:50, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

> These changes will help the XLogInsert scaling patch

...and as I'm sure you're aware will junk much of the replication code
and almost certainly set back the other work that we have brewing for
9.3. So this is a very large curve ball you're throwing there.

Personally, I don't think we should do this until we have a better
regression test suite around replication and recovery because the
impact will be huge but I welcome the suggested changes themselves.

If you are going to do this in 9.3, then it has to be early in the
first Commit Fest and you'll need to be around to quickly follow
through on all of the other subsequent breakages it will cause,
otherwise every other piece of work in this area will be halted or
delayed.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi,

On Thursday, June 07, 2012 05:51:07 PM Simon Riggs wrote:
> On 7 June 2012 14:50, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
> > These changes will help the XLogInsert scaling patch
>
> ...and as I'm sure you're aware will junk much of the replication code
> and almost certainly set back the other work that we have brewing for
> 9.3. So this is a very large curve ball you're throwing there.

It's not that bad. Most of that code is pretty abstracted, the changes to
adapt to that should be less than 20 lines. And it would remove some of
the complexity.

> Personally, I don't think we should do this until we have a better
> regression test suite around replication and recovery because the
> impact will be huge but I welcome the suggested changes themselves.

Hm. One could regard the logical rep stuff as a testsuite ;)

> If you are going to do this in 9.3, then it has to be early in the
> first Commit Fest and you'll need to be around to quickly follow
> through on all of the other subsequent breakages it will cause,
> otherwise every other piece of work in this area will be halted or
> delayed.

Yea, I would definitely welcome an early patch.

Greetings,

Andres

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Jun 7, 2012 at 5:56 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> Hi,
>
> On Thursday, June 07, 2012 05:51:07 PM Simon Riggs wrote:
>> On 7 June 2012 14:50, Heikki Linnakangas
>> <heikki.linnakangas@enterprisedb.com> wrote:
>> > These changes will help the XLogInsert scaling patch
>>
>> ...and as I'm sure you're aware will junk much of the replication code
>> and almost certainly set back the other work that we have brewing for
>> 9.3. So this is a very large curve ball you're throwing there.
>
> It's not that bad. Most of that code is pretty abstracted, the changes to
> adapt to that should be less than 20 lines. And it would remove some of
> the complexity.
>
>> Personally, I don't think we should do this until we have a better
>> regression test suite around replication and recovery because the
>> impact will be huge but I welcome the suggested changes themselves.
>
> Hm. One could regard the logical rep stuff as a testsuite ;)
>
>> If you are going to do this in 9.3, then it has to be early in the
>> first Commit Fest and you'll need to be around to quickly follow
>> through on all of the other subsequent breakages it will cause,
>> otherwise every other piece of work in this area will be halted or
>> delayed.
>
> Yea, I would definitely welcome an early patch.

Just as I'm sure everybody else would welcome *your* patches landing
in the first commitfest and that you all guarantee to be around to
quickly follow through on all potential breakages *that* can cause.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
On Thursday, June 07, 2012 06:02:12 PM Magnus Hagander wrote:
> On Thu, Jun 7, 2012 at 5:56 PM, Andres Freund <andres@2ndquadrant.com>
> wrote:
> > Hi,
> >
> > On Thursday, June 07, 2012 05:51:07 PM Simon Riggs wrote:
> >> On 7 June 2012 14:50, Heikki Linnakangas
> >> <heikki.linnakangas@enterprisedb.com> wrote:
> >> > These changes will help the XLogInsert scaling patch
> >>
> >> ...and as I'm sure you're aware will junk much of the replication code
> >> and almost certainly set back the other work that we have brewing for
> >> 9.3. So this is a very large curve ball you're throwing there.
> >
> > It's not that bad. Most of that code is pretty abstracted, the changes
> > to adapt to that should be less than 20 lines. And it would remove some
> > of the complexity.
> >
> >> Personally, I don't think we should do this until we have a better
> >> regression test suite around replication and recovery because the
> >> impact will be huge but I welcome the suggested changes themselves.
> >
> > Hm. One could regard the logical rep stuff as a testsuite ;)
> >
> >> If you are going to do this in 9.3, then it has to be early in the
> >> first Commit Fest and you'll need to be around to quickly follow
> >> through on all of the other subsequent breakages it will cause,
> >> otherwise every other piece of work in this area will be halted or
> >> delayed.
> >
> > Yea, I would definitely welcome an early patch.
>
> Just as I'm sure everybody else would welcome *your* patches landing
> in the first commitfest and that you all guarantee to be around to
> quickly follow through on all potential breakages *that* can cause.

Agreed.

Andres

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 07.06.2012 18:51, Simon Riggs wrote:
> On 7 June 2012 14:50, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>
>> These changes will help the XLogInsert scaling patch
>
> ...and as I'm sure you're aware will junk much of the replication code
> and almost certainly set back the other work that we have brewing for
> 9.3. So this is a very large curve ball you're throwing there.

I don't think this has much impact on what you're doing (although it's a
bit hard to tell without more details). The way WAL records work is the
same, it's just the code that lays them out on a page, and reads back
from a page, that's changed. And that's fairly isolated in xlog.c.

> If you are going to do this in 9.3, then it has to be early in the
> first Commit Fest and you'll need to be around to quickly follow
> through on all of the other subsequent breakages it will cause,
> otherwise every other piece of work in this area will be halted or
> delayed.

Yeah, the plan is to get this in early, in the first commit fest. Not
only because of possible breakage, but also because my ultimate goal is
the XLogInsert refactoring, and I want to do that early in the release
cycle, too.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Thu, Jun 7, 2012 at 11:51 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> So this is a very large curve ball you're throwing there.

This is not exactly unexpected. At least the first two of these items
were previously discussed in the context of the XLOG scaling patch, many
months ago. It shouldn't come as a surprise to anyone that Heikki is
planning to continue to work on that patch even though it didn't make
9.2.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thursday, June 07, 2012 05:35:11 PM Heikki Linnakangas wrote:
> On 07.06.2012 17:18, Andres Freund wrote:
> > On Thursday, June 07, 2012 03:50:35 PM Heikki Linnakangas wrote:
> >> 3. Move the only field, xl_rem_len, from the continuation record header
> >> straight to the xlog page header, eliminating XLogContRecord altogether.
> >> This makes it easier to calculate in advance how much space a WAL record
> >> requires, as it no longer depends on how many pages it has to be split
> >> across. This wastes 4-8 bytes on every xlog page, but that's not much.
> >
> > +1. I don't think this will waste a measurable amount in real-world
> > scenarios. A very big percentage of pages have continuation records.
>
> Yeah, although the way I'm planning to do it, you'll waste 4 bytes (on
> 64-bit architectures) even when there is a continuation record, because
> of alignment:
>
> typedef struct XLogPageHeaderData
> {
>     uint16      xlp_magic;      /* magic value for correctness checks */
>     uint16      xlp_info;       /* flag bits, see below */
>     TimeLineID  xlp_tli;        /* TimeLineID of first record on page */
>     XLogRecPtr  xlp_pageaddr;   /* XLOG address of this page */
>
> +   uint32      xlp_rem_len;    /* bytes remaining of continued record */
> } XLogPageHeaderData;
>
> The page header is currently 16 bytes in length, so adding a 4-byte
> field to it bumps the aligned size to 24 bytes. Nevertheless, I think we
> can well live with that.

At that point we can just do the

    #define SizeofXLogPageHeaderData \
        (offsetof(XLogPageHeaderData, xlp_rem_len) + sizeof(uint32))

dance. If the record can be smeared over two pages there is no point in
storing it aligned. Then we don't waste any additional space in
comparison to the current state.

> > If we do that we can remove all the alignment padding as well. Which
> > would be a problem for you anyway, wouldn't it?
>
> It's not a problem. You just MAXALIGN the size of the record when you
> calculate how much space it needs, and then all records become naturally
> MAXALIGNed. We could quite easily remove the alignment on-disk if we
> wanted to, ReadRecord() already always copies the record to an aligned
> buffer, but I wasn't planning to do that.

What's the reasoning for having alignment on disk if the records aren't
stored contiguously?

> >> These changes will help the XLogInsert scaling patch, by making the
> >> space calculations simpler. In essence, to reserve space for a WAL
> >> record of size X, you just need to do "bytepos += X". There's a lot
> >> more details with that, like mapping from the contiguous byte position
> >> to an XLogRecPtr that takes page headers into account, and noticing
> >> RedoRecPtr changes safely, but it's a start.
> >
> > Hm. Wouldn't you need to remove short/long page headers for that as
> > well?
>
> No, those are ok because they're predictable.

I haven't read your scalability patch, so I am not really sure what you
need... The "bytepos += X" from above isn't as easy that way. But yes,
it's not that complicated.

> Although it would make the
> mapping simpler. To convert from a contiguous xlog byte position that
> excludes all headers, to XLogRecPtr, you need to do something like this
> (I just made this up, probably has bugs, but it's about this complex):
>
>     #define UsableBytesInPage (XLOG_BLCKSZ - SizeOfXLogShortPHD)
>     #define UsableBytesInSegment ((XLOG_SEG_SIZE / XLOG_BLCKSZ) * \
>         UsableBytesInPage - (SizeOfXLogLongPHD - SizeOfXLogShortPHD))
>
>     uint64 xlogrecptr;
>     uint64 full_segments = bytepos / UsableBytesInSegment;
>     int offset_in_segment = bytepos % UsableBytesInSegment;
>
>     xlogrecptr = full_segments * XLOG_SEG_SIZE;
>     /* is it on the first page? */
>     if (offset_in_segment < XLOG_BLCKSZ - SizeOfXLogLongPHD)
>         xlogrecptr += SizeOfXLogLongPHD + offset_in_segment;
>     else
>     {
>         /* first page is fully used */
>         xlogrecptr += XLOG_BLCKSZ;
>         /* add other full pages */
>         offset_in_segment -= XLOG_BLCKSZ - SizeOfXLogLongPHD;
>         xlogrecptr += (offset_in_segment / UsableBytesInPage) * XLOG_BLCKSZ;
>         /* and finally the short header plus the offset within the last page */
>         xlogrecptr += SizeOfXLogShortPHD + offset_in_segment % UsableBytesInPage;
>     }
>     /* finally convert the 64-bit xlogrecptr to a XLogRecPtr struct */
>     XLogRecPtr.xlogid = xlogrecptr >> 32;
>     XLogRecPtr.xrecoff = xlogrecptr & 0xffffffff;

It's a bit more complicated than that, records can span a good bit more
than just two pages (even more than two segments) and you need to decide
for each of those whether it has a long or a short header.

> Encapsulated in a function, that's not too bad. But if we want to make
> that simpler, one idea would be to allocate the whole 1st page in each
> WAL segment for metadata. That way all the actual xlog pages would hold
> the same amount of xlog data.

It's a bit easier then, but you probably still need to loop over the size
and subtract until you reach the final point. It's no problem to produce
a 100MB WAL record. But then that's probably nothing to design for.

Andres

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
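The header-size arithmetic discussed in this exchange can be checked with a short C sketch. It rests on several assumptions: the macro in the message is taken to mean "the size up to and including the new xlp_rem_len field", the page address is modelled as a single 64-bit integer, MAXALIGN is assumed to round up to 8 bytes as on typical 64-bit platforms, and the names UnpaddedPageHeaderSize and MAXALIGN8 are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TimeLineID;

/* Page header with the proposed extra field; layout is illustrative. */
typedef struct XLogPageHeaderData
{
    uint16_t    xlp_magic;      /* magic value for correctness checks */
    uint16_t    xlp_info;       /* flag bits */
    TimeLineID  xlp_tli;        /* TimeLineID of first record on page */
    uint64_t    xlp_pageaddr;   /* XLOG address of this page (assumed 64-bit) */
    uint32_t    xlp_rem_len;    /* bytes remaining of continued record */
} XLogPageHeaderData;

/* Round up to a multiple of 8, standing in for MAXALIGN on 64-bit builds. */
#define MAXALIGN8(len)  (((size_t) (len) + 7) & ~(size_t) 7)

/* Header size without trailing padding: end of the last field. */
#define UnpaddedPageHeaderSize \
    (offsetof(XLogPageHeaderData, xlp_rem_len) + sizeof(uint32_t))
```

Under these assumptions the unpadded size is 20 bytes and the aligned size is 24 bytes, so the difference between the two approaches is the 4 bytes per page mentioned in the thread.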
Andres Freund <andres@2ndquadrant.com> writes:
> dance. If the record can be smeared over two pages there is no point in
> storing it aligned.

I think this is not true. The value of requiring alignment is that you
can read the record-length field without first having to copy it
somewhere. In particular, it will get really ugly if the record length
field itself could cross a page boundary. I think we want to be able to
determine the record length before we do any data copying, so that we
can malloc the record buffer and then just do one copy step.

The real reason for the current behavior of not letting the record
header get split across multiple pages is so that the length field is
guaranteed to be in the first page. We can still guarantee that if we
(1) put the length field first and (2) require at least int32 alignment.
I think losing that property will be pretty bad though.

regards, tom lane
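The guarantee described above (length field first, at least int32 alignment) can be checked by brute force. The following C sketch assumes an 8 kB page and a 4-byte length field at the very start of each record, and it ignores page headers for simplicity; both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define XLOG_BLCKSZ 8192u   /* assumed WAL page size */

/* True if a 4-byte length field starting at 'off' lies within one page. */
bool length_field_in_one_page(uint64_t off)
{
    uint64_t last_byte = off + sizeof(uint32_t) - 1;

    return off / XLOG_BLCKSZ == last_byte / XLOG_BLCKSZ;
}

/*
 * Try every record start offset in one page that is a multiple of 'align'
 * and count how many would split the length field across the page end.
 */
unsigned count_straddling_offsets(unsigned align)
{
    unsigned bad = 0;

    for (uint64_t off = 0; off < XLOG_BLCKSZ; off += align)
    {
        if (!length_field_in_one_page(off))
            bad++;
    }
    return bad;
}
```

With 4-byte or 8-byte alignment no start offset splits the length field, whereas with byte alignment the last three offsets of the page do, which is the case the message warns about.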
On Thursday, June 07, 2012 06:53:58 PM Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > dance. If the record can be smeared over two pages there is no point in
> > storing it aligned.
>
> I think this is not true. The value of requiring alignment is that you
> can read the record-length field without first having to copy it
> somewhere. In particular, it will get really ugly if the record length
> field itself could cross a page boundary. I think we want to be able to
> determine the record length before we do any data copying, so that we
> can malloc the record buffer and then just do one copy step.

Hm, I had assumed the record would get copied into a temp/static buffer
first and only get reassembled together with the data afterwards. But if
that's not the way to go, sure, storing it aligned so that the length can
always be read aligned within a page is sensible.

Andres
On 7 June 2012 17:12, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> On 07.06.2012 18:51, Simon Riggs wrote:
>>
>> On 7 June 2012 14:50, Heikki Linnakangas
>> <heikki.linnakangas@enterprisedb.com> wrote:
>>
>>> These changes will help the XLogInsert scaling patch
>>
>> ...and as I'm sure you're aware will junk much of the replication code
>> and almost certainly set back the other work that we have brewing for
>> 9.3. So this is a very large curve ball you're throwing there.
>
> I don't think this has much impact on what you're doing (although it's a
> bit hard to tell without more details). The way WAL records work is the
> same, it's just the code that lays them out on a page, and reads back
> from a page, that's changed. And that's fairly isolated in xlog.c.

I wasn't worried about the code overlap, but the subsidiary breakage
looks pretty enormous to me.

Anything changing filenames will break every HA config anybody has
anywhere. So you can pretty much kiss goodbye to the idea of
pg_upgrade. For me, this one thing alone is sufficient to force next
release to be 10.0.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thursday, June 07, 2012 07:03:32 PM Simon Riggs wrote:
> On 7 June 2012 17:12, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
> > On 07.06.2012 18:51, Simon Riggs wrote:
> >> On 7 June 2012 14:50, Heikki Linnakangas
> >> <heikki.linnakangas@enterprisedb.com> wrote:
> >>> These changes will help the XLogInsert scaling patch
> >>
> >> ...and as I'm sure you're aware will junk much of the replication code
> >> and almost certainly set back the other work that we have brewing for
> >> 9.3. So this is a very large curve ball you're throwing there.
> >
> > I don't think this has much impact on what you're doing (although it's
> > a bit hard to tell without more details). The way WAL records work is
> > the same, it's just the code that lays them out on a page, and reads
> > back from a page, that's changed. And that's fairly isolated in xlog.c.
>
> I wasn't worried about the code overlap, but the subsidiary breakage
> looks pretty enormous to me.

The xlog arithmetic will still be encapsulated, so not much difference
there. Removing reading of XLogContRecord isn't complicated and would
result in less code. Shouldn't be much more than that.

> Anything changing filenames will break every HA config anybody has
> anywhere. So you can pretty much kiss goodbye to the idea of
> pg_upgrade. For me, this one thing alone is sufficient to force next
> release to be 10.0.

Hm? WAL isn't relevant for pg_upgrade. And the HA setups should rely on
archive_command and such and not do computation of the next/last name.
I would guess removing that corner-case actually fixes more tools than
it breaks.

Andres

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Simon Riggs <simon@2ndQuadrant.com> wrote:

> Anything changing filenames will break every HA config anybody has
> anywhere.

It will impact our scripts related to backup and archiving, but I
think we're talking about two or three staff days to cover it in our
shop.

We should definitely make sure that this change is conspicuously
noted. The scariest part is that there will now be files that
matter with names that previously didn't exist, so lack of action
will cause failure to capture a usable backup. I don't know that it
merits a bump to 10.0, though. We test every backup for usability,
as I believe any shop should; failure to cover this should cause
pretty obvious errors pretty quickly if you are testing your
backups.

-Kevin
On Thu, Jun 7, 2012 at 1:15 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Simon Riggs <simon@2ndQuadrant.com> wrote:
>
>> Anything changing filenames will break every HA config anybody has
>> anywhere.
>
> It will impact our scripts related to backup and archiving, but I
> think we're talking about two or three staff days to cover it in our
> shop.
>
> We should definitely make sure that this change is conspicuously
> noted. The scariest part is that there will now be files that
> matter with names that previously didn't exist, so lack of action
> will cause failure to capture a usable backup.

But if you're just using regexp matching against pathnames, your tool
will be just fine. Do your tools actually rely on the occasional
absence of a file in what would otherwise be the usual sequence of
files?

...Robert
Robert Haas <robertmhaas@gmail.com> wrote:

> But if you're just using regexp matching against pathnames, your
> tool will be just fine. Do your tools actually rely on the
> occasional absence of a file in what would otherwise be the usual
> sequence of files?

To save "snapshot" backups for the long term, we generate a list of
the specific WAL files needed to reach a consistent recovery point
from a given base backup. We keep monthly snapshot backups for a
year. We currently determine the first and last file needed, and
then create a list of all the WAL files to save. We error out if
any are missing, so we do skip the FF file.

-Kevin
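A tool of the kind described above might build its expected file list roughly as in the following C sketch. It assumes 16 MB segments (256 per logical 4GB file) and the usual 24-hex-digit segment file name made of timeline, log and segment numbers; wal_file_name and list_wal_files are hypothetical helpers, and the skip_ff flag stands for the pre-9.3 behaviour of never creating the FF segment.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SEGS_PER_LOGID 256u   /* 4 GB / 16 MB; assumes the default segment size */

/* Format the file name for a timeline and a 64-bit segment number. */
void wal_file_name(char *buf, size_t buflen, uint32_t tli, uint64_t segno)
{
    snprintf(buf, buflen, "%08X%08X%08X",
             (unsigned) tli,
             (unsigned) (segno / SEGS_PER_LOGID),
             (unsigned) (segno % SEGS_PER_LOGID));
}

/*
 * List the files expected between 'first' and 'last' inclusive and return
 * how many there are.  If 'out' is NULL, the names are only counted.
 */
unsigned list_wal_files(uint32_t tli, uint64_t first, uint64_t last,
                        bool skip_ff, FILE *out)
{
    unsigned count = 0;
    char     name[25];          /* 24 hex digits plus terminating NUL */

    for (uint64_t segno = first; segno <= last; segno++)
    {
        if (skip_ff && segno % SEGS_PER_LOGID == 0xFF)
            continue;           /* old scheme: this file never exists */
        wal_file_name(name, sizeof(name), tli, segno);
        if (out != NULL)
            fprintf(out, "%s\n", name);
        count++;
    }
    return count;
}
```

For the range 5/FD through 6/01 this yields four files with skip_ff set and five without it. The extra one, 0000000100000005000000FF on timeline 1, is exactly the file a script written for the old numbering would fail to save.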
On Thu, Jun 7, 2012 at 1:40 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Robert Haas <robertmhaas@gmail.com> wrote:
>
>> But if you're just using regexp matching against pathnames, your
>> tool will be just fine. Do your tools actually rely on the
>> occasional absence of a file in what would otherwise be the usual
>> sequence of files?
>
> To save "snapshot" backups for the long term, we generate a list of
> the specific WAL files needed to reach a consistent recovery point
> from a given base backup. We keep monthly snapshot backups for a
> year. We currently determine the first and last file needed, and
> then create a list of all the WAL files to save. We error out if
> any are missing, so we do skip the FF file.

OK, I see. Still, I think there are a lot of people who don't do
anything that complex, and won't be affected. But I agree we had
better clearly release-note it as an incompatibility.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Simon Riggs <simon@2ndQuadrant.com> writes:
> Anything changing filenames will break every HA config anybody has
> anywhere.

This seems like nonsense to me. How many external scripts are likely to
know that we skip the FF page? There might be some, but not many.

> So you can pretty much kiss goodbye to the idea of pg_upgrade.

And that is certainly nonsense. I don't think pg_upgrade even knows
about this, and if it does we can surely fix it.

> For me, this one thing alone is sufficient to force next release to be
> 10.0.

Huh? We make incompatible changes in major versions all the time. This
one does not appear to me to be worse than many others.

regards, tom lane
On 7 June 2012 19:52, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> Anything changing filenames will break every HA config anybody has
>> anywhere.
>
> This seems like nonsense to me. How many external scripts are likely to
> know that we skip the FF page? There might be some, but not many.

If that is the only change in filenames, then all is forgiven.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Simon Riggs <simon@2ndQuadrant.com> writes:
> On 7 June 2012 19:52, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> This seems like nonsense to me. How many external scripts are likely to
>> know that we skip the FF page? There might be some, but not many.

> If that is the only change in filenames, then all is forgiven.

Oh, now I see what you're on about. Yes, I agree that we should
maintain the same formatting of WAL segment file names, even though
it will be rather artificial in the 64-bit-arithmetic world. The only
externally visible change should be the creation of FF-numbered files
where formerly those were skipped.

regards, tom lane
On Thu, Jun 07, 2012 at 02:52:04PM -0400, Tom Lane wrote:
> > So you can pretty much kiss goodbye to the idea of pg_upgrade.
>
> And that is certainly nonsense. I don't think pg_upgrade even knows
> about this, and if it does we can surely fix it.

pg_upgrade doesn't know anything about xlog files --- all its
interaction in that area is through pg_resetxlog and it doesn't look at
the xlog details.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +