Обсуждение: btreecheck extension
As discussed at the developer meeting at pgCon, I think that there is a lot to be said for a tool that checks nbtree index invariants on live systems. Attached prototype patch adds contrib extension, btreecheck. This extension provides SQL-callable functions for checking these conditions on nbtree indexes on live systems. This code generally works by using the same insertion ScanKey infrastructure that the B-Tree code itself uses in servicing all kinds of index scans. There are two major goals for this patch. In order of their importance in my mind: 1. Improve testing around the B-Tree code. Give 9.4 beta testers the tools needed to detect subtle errors in a semi-automated way. I'm quite encouraged by recent efforts along these lines by Jeff Janes, Heikki, and others. Beyond 9.4, I think that stress testing is likely to become indispensable as more complex performance optimizations are pursued. Personally, I see some good opportunities to improve the performance of the B-Tree code, and if those ideas are ever to be put into action, there is clearly a big need for automated stress-testing. I see this tool as a start. I'm not just talking about my recent work on sort support for text, and the idea of generalizing that; I think that there are many techniques that we could pursue for decreasing the I/O and CPU overhead of using B-Tree indexes that are just too risky to pursue today because of the subtle interactions of many moving parts. I'd like to lower the risk. 2. Allow users to sanity check indexes in production. This is not my immediate goal here, though; I will not add this patch to the next CommitFest. That's something to think about in more detail a bit later. Clearly there is a big demand for this kind of thing, though. Since I believe the B-Tree code changes of 9.4 are generally thought to be the most plausible source of serious bugs in 9.4 purely because of their subtlety and fundamental importance, it seemed valuable to get this prototype into the hands of beta testers as soon as possible. For that reason, I have not given some aspects of the design as much thought as I would have preferred. I haven't really come up with a plausible plan of attack for finding bugs in the B-Tree code. Rather, I've more or less just considered what invariant conditions exist that can be correctly verified without too much effort, and then implemented checks for each. I've also stuck to a design that only has the tool share lock a single B-Tree buffer at a time, and copy the contents into local memory before operating on that (relation level heavyweight locks are used in various different ways too), in a similar manner to contrib/pageinspect. Copying the entire page into local memory is probably the right thing to do even from a performance perspective, since in each case the entire page is inspected, which takes much longer than, say, a binary search. Even though it's supposed to be possible to run this on a live system without causing too much disruption, subtle buffer locking protocols were avoided (although there is a rationale for why things are okay in the module, that considers current buffer locking protocols particular to the B-Tree code). This is a principle that I think I can stick to, and would certainly prefer to stick to for obvious reasons. The checks performed (the entire set of invariant conditions checked by all SQL-callable functions in the patch) are currently: * That data items on each B-Tree page are in logical order. * That non-rightmost pages (which are the only pages that have a high key) have only data items that respect that high key as an upper bound. * That a down-link data item in a parent page points to a child page where the parent down-link data item is respected as a lower bound. * That a down-link data item in a parent page points to a child pages whose high key (if any) is equal to the next data item on the parent page (if any), while taking into account incomplete page splits and right-most-ness. * That a down-link data item in a parent page points to a child page whose right link (if any) points to the same block/page as pointed to by the next data item on the parent page (if any). In other words, the children and their parent are in agreement about the children being siblings, while taking into account incomplete page splits and right-most-ness. In other other words, the second phase of a page split where a downlink is inserted into the parent was sane (so this is similar to the previous invariant check listed). The two prior checks may not be sufficient, though, if we want to operate with weak, non-disruptive locks -- what about cousins and grandparents, and even great-grandparents and second cousins? It's a good start, though. * That on each level, starting at the true root level and working down the the leaf level (level 0), there is mutual agreement between siblings that they are siblings (that is, their right and left links comport). I won't go into locking protocols and their correctness here. That's something that I would like others to look over, though - there are fairly detailed comments. I would also like to coordinate with anyone who thinks they can come up with a good stress testing workload that introduces a lot of variability for the B-Tree code to deal with. I think it's appropriate that that general effort should mostly drive the development of this tool, and not the other way around. When you consider the difficulty of isolating the kind of bugs we're looking for here, a lower lock strength is not just a nice-to-have. I think it might actually be an important contribution to the effectiveness of the tool generally. I envisage tests running with a high amount of concurrency on big databases over days or even longer. Thoughts? -- Peter Geoghegan
Вложения
On Mon, Jun 16, 2014 at 9:47 PM, Peter Geoghegan <pg@heroku.com> wrote: > As discussed at the developer meeting at pgCon, I think that there is > a lot to be said for a tool that checks nbtree index invariants on > live systems. Me too. > Attached prototype patch adds contrib extension, btreecheck. I don't feel qualified to comment on any of the substantive issues you raise, so instead I'd like to bikeshed the name. I suggest that we create one extension to be a repository for index-checking machinery (and perhaps also heap-checking machinery) and view this as the first of possibly several checkers to live there. Otherwise, we may eventually end up with separate extensions for btreecheck, hashcheck, gistcheck, gincheck, spgistcheck, minmaxcheck, vodkacheck, heapcheck, toastcheck, etc. which seems like excessive namespace pollution. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Jun 17, 2014 at 1:16 PM, Robert Haas <robertmhaas@gmail.com> wrote: > I don't feel qualified to comment on any of the substantive issues you > raise, so instead I'd like to bikeshed the name. I suggest that we > create one extension to be a repository for index-checking machinery > (and perhaps also heap-checking machinery) and view this as the first > of possibly several checkers to live there. Otherwise, we may > eventually end up with separate extensions for btreecheck, hashcheck, > gistcheck, gincheck, spgistcheck, minmaxcheck, vodkacheck, heapcheck, > toastcheck, etc. which seems like excessive namespace pollution. I agree. I hope that we'll eventually be able to move code like this into each AM, with something like an amverifyintegrity pg_am entry optionally provided. There'd also be a utility statement that would perform this kind of verification. It seems natural to do this, as the patch I've posted arguably adds a big modularity violation. Besides, it seems worthwhile to pepper the regular regression tests with calls like these, at least in some places, and putting something in core is the most convenient way to do that. -- Peter Geoghegan
On Tue, Jun 17, 2014 at 5:10 PM, Peter Geoghegan <pg@heroku.com> wrote: > On Tue, Jun 17, 2014 at 1:16 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> I don't feel qualified to comment on any of the substantive issues you >> raise, so instead I'd like to bikeshed the name. I suggest that we >> create one extension to be a repository for index-checking machinery >> (and perhaps also heap-checking machinery) and view this as the first >> of possibly several checkers to live there. Otherwise, we may >> eventually end up with separate extensions for btreecheck, hashcheck, >> gistcheck, gincheck, spgistcheck, minmaxcheck, vodkacheck, heapcheck, >> toastcheck, etc. which seems like excessive namespace pollution. > > I agree. > > I hope that we'll eventually be able to move code like this into each > AM, with something like an amverifyintegrity pg_am entry optionally > provided. There'd also be a utility statement that would perform this > kind of verification. It seems natural to do this, as the patch I've > posted arguably adds a big modularity violation. Besides, it seems > worthwhile to pepper the regular regression tests with calls like > these, at least in some places, and putting something in core is the > most convenient way to do that. I think there's something to be said for that, but I think at the moment I like the idea of a functional interface better. The reason is that I'm not sure we can predict all of the checks we're going to want to add. For example, maybe somebody will come up with another btree checker that's different from your btree checker, and maybe there will be good reasons not to merge the two - e.g. different locking requirements, or different runtimes, or whatever. Even more likely, I think there will be things we want to do that fall under the broad umbrella of integrity checking that we just can't predict now: scan the table and check whether all the keys are present in every index, scan the TOAST table and make sure that all the chunks are the right size, etc. If we have a bunch of functions for this sort of thing, it's easy and relatively uncontroversial to add more. If we put it in core and give it real grammar support, then we've got to fight with keywords and contemplate grammar bloat and, basically, I think every change will be likely to get a higher level of scrutiny. I'd rather not go there. Now, we could. We could come up with an extensible syntax, like this: CHECK relation [ USING { checktype [ '(' arg [, ...] '}' [, ...] ]; But frankly I'm kind of uncompelled by that. This isn't a feature that seems to me to really need to be in core. It doesn't particularly need grammar support, and it doesn't need WAL logging, and not everyone needs it at all, and especially if we eventually end up with a robust suite of tools in this area, not everyone may even want it, if it means a bigger installed footprint or more security concerns to worry about. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Jun 17, 2014 at 5:11 PM, Robert Haas <robertmhaas@gmail.com> wrote: > I think there's something to be said for that, but I think at the > moment I like the idea of a functional interface better. The reason > is that I'm not sure we can predict all of the checks we're going to > want to add. That's true. Clearly this is very hand-wavy, and as I said I think it's appropriate that the tool evolve in response to our testing requirements, but I would like to figure out a way to have verification tooling available for some kind of semi-standard tests, possibly even including the standard regression tests. It's probably too early to discuss, though. > Now, we could. We could come up with an extensible syntax, like this: > > CHECK relation [ USING { checktype [ '(' arg [, ...] '}' [, ...] ]; That's what I had in mind. Using the same trick that you came up with for EXPLAIN, to avoid grammar bloat and let the am figure out for itself what to name the various check types, with a generic default check. -- Peter Geoghegan
* Peter Geoghegan (pg@heroku.com) wrote: > On Tue, Jun 17, 2014 at 5:11 PM, Robert Haas <robertmhaas@gmail.com> wrote: > > Now, we could. We could come up with an extensible syntax, like this: > > > > CHECK relation [ USING { checktype [ '(' arg [, ...] '}' [, ...] ]; > > That's what I had in mind. Using the same trick that you came up with > for EXPLAIN, to avoid grammar bloat and let the am figure out for > itself what to name the various check types, with a generic default > check. I'm fine with having these start out as external tools which are doing checks, but I've been specifically asked about (and have desired myself from time-to-time...) an in-core capability to check index/heap/etc validity. Folks coming from certain other RDBMS's find it amazing that we don't have any support for that when what they really want is a background worker which is just automatically going around doing these checks. Now, perhaps we could have the background worker without the syntax for running these by hand, but I don't particularly like that idea. Being able to run these checks by hand is extremely useful and I'd much prefer to be able to force that than to have some mechanism where I have to submit a request for a check to another process through a queue or something along those lines. Thanks, Stephen
On Wed, Jun 18, 2014 at 4:48 AM, Stephen Frost <sfrost@snowman.net> wrote: > I'm fine with having these start out as external tools which are doing > checks, but I've been specifically asked about (and have desired myself > from time-to-time...) an in-core capability to check index/heap/etc > validity. Folks coming from certain other RDBMS's find it amazing that > we don't have any support for that when what they really want is a > background worker which is just automatically going around doing these > checks. Yes, I think that being able to verify the integrity of index and heap relations is an important feature, and I think it's notable that we're the only major RDBMS that lacks this support. I'm not quite sure what that should entail just yet, but clearly it should be somewhat like what I have here. I think the big open questions to make something like this work are mostly around the cost/benefit ratio of each of the checks I've outlined. Certainly, for that use case minimizing disruption on a live system becomes even more important. I'll probably look at a buffer access strategy, just to give an example of that off the top of my head. -- Peter Geoghegan
--On 16. Juni 2014 18:47:30 -0700 Peter Geoghegan <pg@heroku.com> wrote: > Attached prototype patch adds contrib extension, btreecheck. This > extension provides SQL-callable functions for checking these > conditions on nbtree indexes on live systems. What's the current state of this module? I see you are interested in stress testing, but i'm not sure how far this all is gone? This tool actually served a very good job during identifying index corruption due to collation issues[1]. [1] <http://www.postgresql.org/message-id/CB4D1C6BAA80CF146CB0D4F2@eje.credativ.lan> -- Thanks Bernd
On Wed, Oct 7, 2015 at 2:59 AM, Bernd Helmle <mailings@oopsware.de> wrote: > What's the current state of this module? I see you are interested in stress > testing, but i'm not sure how far this all is gone? > > This tool actually served a very good job during identifying index > corruption due to collation issues[1]. I tweaked it a bit. I actually don't really know if I'll pick it up anytime soon. I'm glad that it helped you, but the goals for the tool need to be thrashed out a bit more. We should discuss it again. -- Peter Geoghegan