Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()
From | Simon Riggs |
---|---|
Subject | Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation() |
Date | |
Msg-id | CANP8+jJ1finWKnc7Q_i9K-nnT2KsSmKXfM5mvks2rXJr+dX4Aw@mail.gmail.com |
In reply to | Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation() (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation() |
List | pgsql-hackers |
On 12 January 2018 at 15:45, Tom Lane <tgl@sss.pgh.pa.us> wrote:

>> I have some reservations about whether this makes the mechanism less
>> reliable.
>
> Yeah, it scares me too. The xl_prev field is our only way of detecting
> that we're looking at old WAL data when we cross a sector boundary.
> I have no faith that we can prevent old WAL data from reappearing in the
> file system across an OS crash, so I find Simon's assertion that we can
> dodge the problem through file manipulation to be simply unbelievable.

Not really sure what you mean by "file manipulation". Maybe the proposal wasn't clear.

We need a way of detecting that we are looking at old WAL data. More specifically, we need to know whether we are looking at a current file or an older file. My main assertion here is that the detection only needs to happen at file level, not at record level, so it is OK to lose some bits of information without changing our ability to protect data; they were not being used productively.

Let's do the math to see if it is believable, or not.

The new two byte value is protected by CRC. The 2 byte value repeats every 32768 WAL files. Any bit error in that value that made it appear to be a current value would need a rare set of circumstances.

1. We would need to suffer a bit error that did not get caught by the CRC.

2. An old WAL record would need to occur right on the boundary of the last WAL record.

3. The bit error would need to occur within the 2 byte value. WAL records are usually fairly long, so this has a probability of <1/16.

4. The bit error would need to change an old value to the current value of the new 2 byte field. If the current value is N, and the previous value is M, then a single bit error that takes M -> N can only happen if N and M differ by a power of 2. The maximum probability of an issue would occur when we reuse WAL every 3 files, so the probability of such a change would be 1/16.
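To make the single-bit-error condition in point 4 concrete, here is a minimal sketch in C. It is not taken from the patch: the function names are hypothetical, and it assumes the new field is a plain 16-bit unsigned integer. It checks whether a stale value M is exactly one bit flip away from the current value N, and counts how many 16-bit values have that property for a given N.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true when m and n differ in exactly one bit position,
 * i.e. a single bit error could turn the stale value m into n.
 */
static bool
differs_by_one_bit(uint16_t m, uint16_t n)
{
    uint16_t diff = m ^ n;

    /* A nonzero value with one set bit shares no bits with (diff - 1). */
    return diff != 0 && (diff & (diff - 1)) == 0;
}

/*
 * Hypothetical helper: count the 16-bit values that a single bit flip
 * could turn into "current".
 */
static int
count_one_flip_neighbours(uint16_t current)
{
    int count = 0;

    for (uint32_t m = 0; m <= UINT16_MAX; m++)
        if (differs_by_one_bit((uint16_t) m, current))
            count++;

    assert(count == 16);    /* one neighbour per bit position */
    return count;
}
```

Under these assumptions, only 16 of the 65536 possible 16-bit values sit one bit flip away from any given current value. A numeric difference that is a power of two is necessary but not sufficient: 1 and 2 differ by 1, yet turning one into the other takes two bit flips.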
If the distance between M and N is not a power of two then a single bit error cannot change M into N. So what probability do we assign to the situation that M and N are exactly a power of two apart?

So this requires a single undetectable bit error, and would then happen less than 1 in 256 times, but arguably much less. Notice that this probability is therefore at least 2 orders of magnitude smaller than the chance that a single bit error occurs and simply corrupts data, a mere rounding error in risk.

I don't find that unbelievable at all.

If you still do, then I would borrow Andres' idea of using the page header. If we copy the new 2 byte value into the page header, we can use that to match against in the case of error. XLogPageHeaderData can be extended by 2 bytes without increasing its size when using 8 byte alignment. The new 2 byte value is the same anywhere in the file, so that works quickly and easily. And it doesn't increase the size of the header.

So with that change it looks completely viable.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services