Thread: Vacuum return codes (vacuum as db integrity check?)
General PostgreSQL administration question:

I'm trying to use "vacuumdb" as a way to perform a database integrity check before starting my db-enabled application. I used to run it as "vacuumdb foo", and if the return code was anything other than zero, I knew the database was most likely hosed. (At which point I would destroy everything and reload from backup.)

Now I've made a change and find myself using the command "vacuumdb -f foo" to perform a full vacuum. However, I've noticed that vacuum full seems to return non-zero values much more often. (Meaning the database is corrupt?) Is this known? Are the possible return values published somewhere, without having to dig through all of the source code?

(This is PostgreSQL v7.2 compiled for Linux, running on an older 2.2.x kernel.)

BTW -- is this the "right" thing to do for a database integrity check? Is there a better way to do this than a vacuum?

Stephane Charette
"Stephane Charette" <stephanecharette@telus.net> writes: > Now I've made a change and I find myself using the command "vacuumdb > -f foo" to perform a full vacuum. However, I've noticed that vacuum > full seems to return non-zero return values much more often. This is not a very useful statement. What would be useful is to see the error messages you are getting. (Look in the postmaster log, if your script is discarding stderr.) regards, tom lane
To re-state the reason behind my posting:

>> [...]
>> I'm trying to use "vacuumdb" as a way to perform a database integrity
>> check before starting my db-enabled application.
>> [...]
>> Now I've made a change and I find myself using the command "vacuumdb
>> -f foo" to perform a full vacuum. However, I've noticed that vacuum
>> full seems to return non-zero return values much more often.

...to which Tom Lane (thank you!) replied:

> This is not a very useful statement. What would be useful is to see the
> error messages you are getting. (Look in the postmaster log, if your
> script is discarding stderr.)

Unfortunately, I don't have the situation reproduced at the moment. I will look the next time this happens. However, I do have some other related questions:

1) Where is the postmaster log kept? I've looked, but I can't seem to find it.

2) Does a non-zero return value from vacuumdb necessarily mean the database is corrupted?

3) If vacuumdb returns non-zero, does it mean we should blow away the database and restore from the latest backup? Or is there a typical "fixdb" application that people run?

4) Are the exit codes in src/bin/pgsql/settings.h the only exit codes that can be used in the source? Or are there others defined in another file somewhere else?

5) Is vacuumdb the "right" thing to do for a database integrity check?

6) Is there a better way to do a database integrity check than a vacuumdb?

And perhaps most importantly,

7) When power fails on a box that is running a 7.2 PostgreSQL database, do *you* ever find that the database is corrupt when the box comes back up?

(A few technical details: db version is 7.2; Linux kernel 2.2.14; postmaster is running with the command line "/usr/local/pgsql/bin/postmaster -D /foo/data -S"; we haven't changed any of the default tuning values in the p*.conf files.)

Thanks in advance for any assistance you can provide,
Stephane Charette
"Stephane Charette" <stephanecharette@telus.net> writes: > 1) Where is the postmaster log kept? I've looked, but I'm cannot seem > to find it. I'm talking about the postmaster's stderr output. Look to see what your postmaster startup script does with stderr ... if it routes it to /dev/null, you'll need to change the script. > 2) Does a non-zero return value from vacuumdb necessarily mean the > database is corrupted? Impossible to tell without seeing the error messages. > 5) Is vacuumdb the "right" thing to do for a database integrity check? Not particularly. I'd consider a successful pg_dumpall run to be a more thorough check (or at least an equally good, and quite different, one). > 7) When power fails on a box that is running a 7.2 PostgreSQL database, > do *you* ever find that the database is corrupt when the box comes back > up? IIRC, there were some problems with sequences going backwards after a crash in 7.2. If you are on 7.2.x for x < 3, you are running a version with serious known bugs; you should update before complaining too much ... regards, tom lane
Tom Lane wrote:
> IIRC, there were some problems with sequences going backwards after a
> crash in 7.2. If you are on 7.2.x for x < 3, you are running a version
> with serious known bugs; you should update before complaining too much
> ...

Crazy question, but how do I determine the version number? If I run "postmaster --version", I'm told "postmaster (PostgreSQL) 7.2". Does that really mean "7.2"? We were under the impression that we'd upgraded these boxes to 7.2.1 earlier this summer, so we're wondering if the minor version number is perhaps not being displayed. All of the shell scripts in .../pgsql/bin/ contain the word "7.2" -- no mention of minor version numbers such as "7.2.0", "7.2.1", etc.

(Next time we'll skip the binaries and download & compile the source ourselves so we know for certain what we're using!)

Best regards,
Stephane Charette
"Stephane Charette" <stephanecharette@telus.net> writes: > If I run "postmaster --version", I get told "postmaster (PostgreSQL) > 7.2". > Does that really mean "7.2"? Yes, it does. You could double-check with "select version()" though. regards, tom lane
Our client is running a large-ish database, and has infrequently been getting the following messages:

Nov 6 04:17:09 dolidb-n1 logger: NOTICE: Rel pg_type: Uninitialized page 6 - fixing
Nov 6 04:17:09 dolidb-n1 logger: NOTICE: Rel pg_type: Uninitialized page 7 - fixing
Nov 6 04:17:09 dolidb-n1 logger: NOTICE: Rel pg_type: Uninitialized page 8 - fixing
Nov 6 04:17:09 dolidb-n1 logger: NOTICE: Rel pg_type: Uninitialized page 9 - fixing
Nov 6 04:17:09 dolidb-n1 logger: NOTICE: Rel pg_type: Uninitialized page 10 - fixing

This goes on for about 2000 lines. A Google search found Tom Lane telling someone with this symptom that they might have a serious hardware problem. However, _that_ person was getting these messages on a consistent basis, whereas we hadn't seen this message in about 4 months, and then all of a sudden the other night during a nightly vacuum it popped up.

I'd like to know what this means to our system -- what causes an uninitialized page? What could cause it to be so infrequent?

Here are our system details:
PostgreSQL: 7.2.3 (~4GB in size)
os: Redhat 7.1sbe (Seawolf)
kernel: 2.4.9-12smp
ram: ~1GB ECC

Thanks,
Tim
--
Dyrect Media Group
P.O. Box 486
6000 Goodrich Rd
Clarence Center, NY 14032-0486
OFFICE: 716-504-1141 ext 208
CELL: 716-510-2451
AIM: somecallmetim100
Timothy D McKernan <tdm4@dyrectmedia.com> writes:
> Nov 6 04:17:09 dolidb-n1 logger: NOTICE: Rel pg_type: Uninitialized
> page 6 - fixing

> I'd like to know what this means to our system - what causes an
> uninitialized page? What could cause it to be so infrequent?

"Uninitialized" means "page contains zeroes" (or at least a few critical page-header fields contain zeroes, which they should never do). Usually I take this as an indication of hardware problems. But:

> Here are our system details:
> postgre: 7.2.3 (~4GB in size)
> os: Redhat 7.1sbe (Seawolf)
> kernel: 2.4.9-12smp
> ram: ~1GB ECC

SMP? Are you actually using SMP hardware? I seem to recall that the 2.4 Linux kernels weren't stable on SMP machines till 2.4.15 or thereabouts. In any case, RedHat 7.1 is pretty long in the tooth. Perhaps an OS update would make life better.

			regards, tom lane
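[Editor's note: Tom's definition above -- "page contains zeroes" -- can be checked directly against a relation's on-disk file. A minimal sketch, assuming the default 8 kB page size; locating the correct file under $PGDATA is left out, and the function name is illustrative.]

```shell
#!/bin/sh
# Print the number of non-zero bytes in one 8 kB page of a file.
# Output of 0 means the page is entirely zeroes -- "uninitialized"
# in VACUUM's sense.
page_nonzero_bytes() {
    file="$1"
    page="$2"
    # Read exactly one page, strip NUL bytes, count what remains.
    dd if="$file" bs=8192 skip="$page" count=1 2>/dev/null \
        | tr -d '\0' | wc -c
}

# Usage sketch (path is an assumption):
#   page_nonzero_bytes "$PGDATA/base/12345/67890" 6
```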
>> Now I've made a change and I find myself using the command "vacuumdb
>> -f foo" to perform a full vacuum. However, I've noticed that vacuum
>> full seems to return non-zero return values much more often.
>
> This is not a very useful statement. What would be useful is to see the
> error messages you are getting. (Look in the postmaster log, if your
> script is discarding stderr.)

Ok -- I redirected stdout/stderr for postmaster and vacuumdb, and the problem finally recurred a few minutes ago. This is what happens:

- database is started using "postmaster -D /foo >/tmp/dblog 2>&1 &"
- pg_dumpall is run, which results in a return code of zero (Tom Lane mentioned a few days ago that pg_dumpall might be a better "database integrity check" than running vacuumdb)
- vacuumdb is started using "vacuumdb -f log >/tmp/vacuumdblog 2>&1"
- vacuumdb results in a return code of 1! (I'm testing how valid our previous attempts at "integrity checks" might be)

The error received when we run "vacuumdb" is:
-> ERROR: Cannot insert a duplicate key into unique index pg_class_oid_index
-> vacuumdb: vacuum log failed

The error logged by postmaster is exactly the same:
-> ERROR: Cannot insert a duplicate key into unique index pg_class_oid_index

Now in the past, when vacuumdb returned non-zero codes, we've considered the database to be hosed, and thus would blow it away and rebuild it. Many days of data would sometimes be lost. However, pg_dumpall shows that we still seem to have access to the data.

Looking up "vacuum cannot insert a duplicate key into unique index" on Usenet returns quite a few postings, but not many suggestions or solutions. In our case, we are using 7.2 on a Linux 2.2.14 kernel. Binaries were downloaded via RPM directly from RedHat.

My questions now would be:
1) How serious is the vacuumdb error?
2) How do we fix it?
3) Is the database hosed?

Thanks in advance,
Stephane Charette
"Stephane Charette" <stephanecharette@telus.net> writes: > The error received when we run "vacuumdb" is: > -> ERROR: Cannot insert a duplicate key into unique index > pg_class_oid_index That's pretty interesting. My first thought is a corrupt index. Please try reindexing pg_class. You'll need to do this in a standalone backend --- read the REINDEX reference page carefully. > In our case, we are using 7.2 on a linux 2.2.14 kernel. Binaries were > downloaded via RPM directly from RedHat. 7.2? You should be using 7.2.3. regards, tom lane
Tom Lane wrote:
> Timothy D McKernan <tdm4@dyrectmedia.com> writes:
>> Nov 6 04:17:09 dolidb-n1 logger: NOTICE: Rel pg_type: Uninitialized
>> page 6 - fixing
>
>> I'd like to know what this means to our system - what causes an
>> uninitialized page? What could cause it to be so infrequent?
>
> "Uninitialized" means "page contains zeroes" (or at least a few critical
> page-header fields contain zeroes, which they should never do). Usually
> I take this as an indication of hardware problems. But:
>
> SMP? Are you actually using SMP hardware? I seem to recall that the
> 2.4 Linux kernels weren't stable on SMP machines till 2.4.15 or
> thereabouts. In any case, RedHat 7.1 is pretty long in the tooth.
> Perhaps an OS update would make life better.

Yes, we have a dual processor board. Per your suggestion we'll be upgrading the system soon and watching the results of that.

Thanks,
Tim