Discussion: pgcheck - data integrity check
Hi,

I am working on a data integrity check tool (pgcheck), and I would like to discuss the following issues.

Right now I am working on a function which checks the validity of pages in a relation. The relation is passed to the function as an argument (its OID). For the integrity check of a page I am using an AccessShare lock on the relation, as you can see at http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/pgcheck/pokus/pgcheck_page/ .

In the near future, I would like to extend the functionality with a recovery tool able to repair broken pages. If the function is also to repair corrupted data on a page, an AccessShare lock on the relation would not be sufficient. On the other hand, an AccessExclusive lock on the entire relation could significantly affect the performance of the database. So far I see these possibilities:

1. Use an AccessShare lock for the integrity check function and, in case of a faulty page, pass that page to a special function which would take an AccessExclusive lock on the relation for the correction.
   + would not affect performance as much
   - higher complexity

2. Use one function which would lock the relation with an AccessExclusive lock, check the integrity of the data and, in case of faulty pages, repair them at once.
   + easier to implement
   - could cause a performance downturn because of a relatively long exclusive lock on some relations

3. Use an AccessShare lock for the integrity check and a special "single-user mode" for recovery of the data.
   + safest option for recovery of data
   - long downtime of the database

Further, I would like to know your opinions on what should be checked next in order to verify the integrity of data in a database. I am thinking of checking:
- in the case of variable-length data, comparing the formal and actual size of the data
- whether constraints applied to items are fulfilled
- whether the data in indexes matches the indexed tables

Robert

P.S. Any comments on the C function I have made so far are welcome.
On Fri, Aug 10, 2007 at 05:48:08AM +0200, Robert Mach wrote:
> In the near future, I would like to extend the functionality with a
> recovery tool able to repair broken pages. If the function is also to
> repair corrupted data on a page, an AccessShare lock on the relation
> would not be sufficient. On the other hand, an AccessExclusive lock on
> the entire relation could significantly affect the performance of the
> database.

I don't think data repairs are the best thing to be doing on a running system. At the very least, I'd lock the relation for now. If you're certain you can do page-level locking safely, that might be an option, but I'd consider that a bonus. Also, be careful not to introduce deadlocks by upgrading locks.

A separate repair function might be a good idea (perhaps the check function can output a recordset that can be loaded into a table, and then the repair function is run against that...).

> Further, I would like to know your opinions on what should be checked
> next in order to verify the integrity of data in a database. I am
> thinking of checking:
> - in the case of variable-length data, comparing the formal and actual
>   size of the data

How would you do that? The only way to tell the length is via the varlena header. The best I think you could do would be to calculate the length of a row and compare that to the item pointers.

> - whether constraints applied to items are fulfilled

I don't know that there's much value in this...

> - whether the data in indexes matches the indexed tables

I think this should probably be top priority; since the index code is fairly complex, this has a much higher likelihood of bugs/problems.

--
Decibel!, aka Jim Nasby decibel@decibel.org
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Robert,

> I am working on a data integrity check tool (pgcheck).
> I would like to discuss the following issues:

I'm a little confused. I assumed that your project would check the pages of a *shut-down* database, or one in recovery (single-user) mode, as part of recovery after a crash or HW failure. We don't really want users running the database in normal mode if there's any significant chance of data-page corruption.

Can you back up and explain what you're trying to accomplish with this tool?

--
Josh Berkus
PostgreSQL @ Sun
San Francisco
Josh Berkus wrote:
> Robert,
>
>> I am working on a data integrity check tool (pgcheck).
>> I would like to discuss the following issues:
>
> I'm a little confused. I assumed that your project would check the pages
> of a *shut-down* database, or one in recovery (single-user) mode, as part
> of recovery after a crash or HW failure. We don't really want users
> running the database in normal mode if there's any significant chance of
> data-page corruption.
>
> Can you back up and explain what you're trying to accomplish with this
> tool?

So far I was treating it just as a tool for validating the data in a database: a tool which would be run from psql and which would tell the user whether the data is OK, and if there is any corrupted data, where the problem is. For just checking data integrity, I think this is a workable approach.

It will probably be different in the case of the data recovery tool, and that is what I am trying to get opinions on. Maybe one possible way (though probably not very convenient) is to run the check from psql to find out where the problem is, and then run the recovery tool on the corrupted files, or use the "single-user" mode for recovery with the database shut down.

The integrity check tool could also be implemented to be run on a *shut-down* database (for the case of a database crash), but I think being able to run it under *normal* conditions would be helpful as well.

Robert