Discussion: index and table corruption


index and table corruption

From:
"Anand Kumar, Karthik"
Date:
Hi,

We're looking for help with possible corruption of our indexes and tables.

Seemingly in the middle of normal operations, we will run into errors like
the following:

ERROR:  index "mv_visits_p03_n2" contains unexpected zero page at block
15939
ERROR:  invalid page header in block 344713 of relation
pg_tblspc/4376157/PG_9.1_201105231/16393/8367465

After these errors appear, the database keeps running, but I/O creeps up
until the server finally becomes unresponsive. The database itself has
never crashed, though.

A majority of the tables are the same each time, although new ones will
come in, and old ones will go out. A total of about 84 out of 452 tables
have gotten this error so far.
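
(For reference, a bare filenode like 8367465 in the error above can be
mapped back to a relation with a query along these lines, run in the
affected database; the filenode value is just copied from the error
message.)

  -- map the relfilenode from the error message back to a relation name
  SELECT n.nspname, c.relname, c.relkind
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
  WHERE c.relfilenode = 8367465;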

We run postgres version 9.1.2, installed via the PGDG RPMs.
The server runs CentOS 5.6, and the disk backend is a NetApp-based SAN.
It's a 24-CPU box with 768GB of RAM.
The database is about 1TB. It's a single database cluster.

Things we've tried so far:

- Every time we run into the error, we restore the database from a previous
snapshot (a block-level NetApp snapshot). Snapshots are taken with
postgres hot backup mode enabled, and are clean. They are block level, so
ideally going back to a snapshot should remove any block-level corruption
that occurred on the device.

- We set zero_damaged_pages = on and ran a full vacuum and reindex of 4
tables (the commands were along the lines sketched below this list). Both
the full vacuum and the reindex completed successfully, with no errors,
but the same tables showed up when it failed again.

- We've had the sysadmins check for errors with the hardware: no errors
so far about any h/w problems, either on the box, with the SAN switches,
or on the filer. We are going to switch over to a different server on the
same SAN backend, to see if that helps.

- We suspected it might have something to do with
http://wiki.postgresql.org/wiki/20120924updaterelease, and upgraded to
postgres 9.1.11; that hasn't helped.

- We had shared_buffers set to 60G, and reduced that down to 8G and then
to 4G, suspecting problems with the background writer handling such high
shared buffers; that hasn't helped either.
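
For reference, the zero_damaged_pages repair attempt above was along these
lines per table (the table name here is an example only; run as superuser):

  SET zero_damaged_pages = on;         -- zero out pages with bad headers as they are read
  VACUUM FULL VERBOSE mv_visits_p03;   -- rewrite the table
  REINDEX TABLE mv_visits_p03;         -- rebuild its indexes
  RESET zero_damaged_pages;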

Our postgres configuration is:

Version: PostgreSQL 9.1.2 on x86_64-unknown-linux-gnu, compiled by gcc
(GCC) 4.1.2 20080704 (Red Hat 4.1.2-50), 64-bit
Upgraded to: PostgreSQL 9.1.11 on x86_64-unknown-linux-gnu, compiled by
gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-50), 64-bit
             name             |                     current_setting
------------------------------+------------------------------------------------------------
application_name             | psql
archive_command              | /usr/bin/archiver.sh %f %p
archive_mode                 | on
checkpoint_completion_target | 0.8
checkpoint_segments          | 25
checkpoint_timeout           | 10min
checkpoint_warning           | 2min
client_encoding              | UTF8
commit_siblings              | 25
custom_variable_classes      | symmetric
DateStyle                    | ISO, MDY
default_statistics_target    | 300
default_text_search_config   | pg_catalog.english
effective_cache_size         | 128GB
lc_messages                  | en_US.UTF-8
lc_monetary                  | en_US.UTF-8
lc_numeric                   | en_US.UTF-8
lc_time                      | en_US.UTF-8
listen_addresses             | *
log_checkpoints              | on
log_destination              | syslog
log_directory                | /var/lib/pgsql/cmates/admin
log_filename                 | postgresql-%a.log
log_line_prefix              | user=%u,db=%d,ip=%h
log_min_duration_statement   | 0
log_rotation_age             | 1d
log_rotation_size            | 0
log_timezone                 | US/Pacific
log_truncate_on_rotation     | on
logging_collector            | off
maintenance_work_mem         | 32MB
max_connections              | 1500
max_locks_per_transaction    | 1000
max_stack_depth              | 2MB
max_wal_senders              | 5
port                         | 5432
search_path                  | activities, alert, announce, askme, audit,
 authentication, book, btdt, bulletinboard, cache, cas, cdc, cmates, cmdba,
 collection, dep, emailsubscription, emailvalidation, eventmail, feeds,
 friend, geo, inbox, invitation, ir, kkumar, merge, myvisitor, people,
 photos, prepsports, profile, provisioning, quiz, registrant_icons,
 registration, reunion, school, schoolfeed, shortlist, socialauth,
 statspack, story, symmetricds, target, yearbook, "$user", public
shared_buffers               | 8GB
synchronous_commit           | off
syslog_facility              | local0
syslog_ident                 | postgres
TimeZone                     | US/Pacific
vacuum_freeze_table_age      | 0
wal_buffers                  | 16MB
wal_level                    | archive
wal_sync_method              | fsync
work_mem                     | 8MB
(47 rows)


Any help would be most appreciated!

Thanks,
Karthik

Re: index and table corruption

From:
Shaun Thomas
Date:
On 12/19/2013 12:42 PM, Anand Kumar, Karthik wrote:

> ERROR:  index "mv_visits_p03_n2" contains unexpected zero page at block
> 15939
> ERROR:  invalid page header in block 344713 of relation
> pg_tblspc/4376157/PG_9.1_201105231/16393/8367465

I don't care what kind of checks your admins have performed. You have
either bad memory, a bad controller card, SAN, or an otherwise unstable
system. Do not continue to use this platform in its current state if you
care about your data.

> A majority of the tables are the same each time, although new ones will
> come in, and old ones will go out. A total of about 84 out of 452 tables
> have gotten this error so far.

This is screaming memory or disk-based corruption.

> We run postgres version 9.1.2, installed via the PGDG RPMs.
> The server runs CentOS 5.6, and the disk backend is a NetApp-based SAN.
> It's a 24-CPU box with 768GB of RAM.
> The database is about 1TB. It's a single database cluster.

That's a pretty nice system. :)

> - We set zero_damaged_pages = on, ran a full vacuum and re-index of 4
> tables. Both the full vacuum and reindex completed successfully, with no
> errors. The same tables showed up when it failed again.

Because they're being corrupted again.

> - We've had the sysadmins check for errors with the hardware: no errors
> so far about any h/w problems, either on the box, with the SAN switches,
> or on the filer. We are going to switch over to a different server on the
> same SAN backend, to see if that helps

Do this. Do nothing else but this. Regardless of the checks the admins
have run, you need to verify the data remains uncorrupted by removing
variables. If this doesn't help, your SAN itself may be the problem.

> - We suspected it might have something to do with
> http://wiki.postgresql.org/wiki/20120924updaterelease, and upgraded to
> postgres 9.1.11, that hasn't helped.

You should do that anyway. There is literally no benefit to running an
old minor release. Unlike 9.1 to 9.2, 9.1.2 to 9.1.11 costs nothing but
a database restart. The amount of potential data corruption, crash, and
planner bugs you avoid by doing so should never be underestimated.

> - We had shared_buffers set to 60G, and reduced that down to 8G, and then
> to 4G, suspecting problems with the background writer handling such high
> shared buffers, that hasn't helped either.

60GB is way too high. Use 4GB or 8GB, like you said. 60GB means a very,
very long checkpoint, requires a far longer checkpoint_timeout and more
checkpoint_segments to really be useful, and greatly amplifies recovery
time.
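
As a starting point, something along these lines in postgresql.conf would
be saner than what you have now (values are illustrative, not tuned for
your workload):

  shared_buffers               = 8GB
  checkpoint_segments          = 64      # more WAL between checkpoints than the current 25
  checkpoint_timeout           = 15min
  checkpoint_completion_target = 0.8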

> log_min_duration_statement   | 0

You're logging every statement to pass through the server? If you have a
lot of volume, that's a ton of IO all by itself. Enough that I would
recommend putting the logs on another LUN entirely.
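
Something like this would be far gentler while still catching slow queries
(the threshold is arbitrary; pick whatever fits your traffic):

  log_min_duration_statement = 250ms    # log only statements slower than 250 ms; 0 logs everything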

> maintenance_work_mem         | 32MB

This is way too low. This setting is used for doing vacuum operations
and other work necessary to maintain the database and its contents. You
have more than enough memory to set this at 1GB.
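
For example (the values are illustrative; it can also be raised per
session before a one-off maintenance job):

  # postgresql.conf
  maintenance_work_mem = 1GB

  -- or per session, e.g. before a big REINDEX:
  SET maintenance_work_mem = '4GB';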

> synchronous_commit           | off

Turn this back on. So long as you're having corruption issues, you need
this to be as stable as possible. Having transactions report commit
before the WAL is successfully written means potentially losing
transactions, especially with the frequent shutdowns these corruptions
are causing.
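
Roughly, assuming you manage postgresql.conf by hand:

  # postgresql.conf
  synchronous_commit = on

  -- then, as a superuser, pick up the change without a restart:
  SELECT pg_reload_conf();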

Aside from that, you're not likely to find much from asking here. You
clearly have a hardware problem somewhere along the chain. Until that's
resolved, you will have random corrupt data on your most frequently
modified objects. Snapshot restores and WAL recovery can help repair the
issues, but it's only a matter of time before a corruption is written
right into the WAL itself, forcing you to do PITR instead.

Good luck!

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com



Re: index and table corruption

From:
"Anand Kumar, Karthik"
Date:
Thanks Shaun!

Yes, we're getting synchronous_commit on right now.

The log_min_duration was briefly set to 0 at the time I sent out the post,
just to see what statements were logged right before everything went to
hell. Didn't yield much since we very quickly realized we couldn't cope
with the volume of logs.

We also noticed that when trying to recover from a snapshot and replay
archived wal logs, it would corrupt right away, in under an hour. When
recovering from snapshots *without* replaying wal logs, we go on for a day
or two without the problem, so it does seem like wal logs are probably not
being flushed to disk as expected.

Will update once we get onto the new h/w to see if that fixes it.

Thanks,
Karthik



Re: index and table corruption

From:
Jerry Sievers
Date:
"Anand Kumar, Karthik" <Karthik.AnandKumar@classmates.com> writes:

> Thanks Shaun!
>
> Yes, we're getting synchronous_commit on right now.
>
> The log_min_duration was briefly set to 0 at the time I sent out the post,
> just to see what statements were logged right before everything went to
> hell. Didn't yield much since we very quickly realized we couldn't cope
> with the volume of logs.
>
> We also noticed that when trying to recover from a snapshot and replay
> archived wal logs, it would corrupt right away, in under an hour. When
> recovering from snapshots *without* replaying wal logs, we go on for a day
> or two without the problem, so it does seem like wal logs are probably not
> being flushed to disk as expected.

Make sure your snapshots are as atomic as you probably assume they are;
they in fact must be if you expect a consistent cluster after startup and
crash recovery.

That is, snaps taken at random times are not safe unless you wrap them
with pg_start_backup()/pg_stop_backup() *and* replay WAL to a consistent
recovery point.

If you're snapping something like a remote-site mirror running SAN
block-level replication, unless the snap is done at the end of flushing
all changed blocks since last tick, then the image you're snapping may
not be consistent.
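
Roughly, the wrapper should look like this (a sketch only; the label and
connection details are examples, and you'd substitute your own snapshot
command in the middle):

  psql -U postgres -d postgres -c "SELECT pg_start_backup('netapp-snap', true);"
  # ... trigger the NetApp snapshot of the data and tablespace volumes here ...
  psql -U postgres -d postgres -c "SELECT pg_stop_backup();"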

I say that because I came into a company that had been doing snaps this
way since eons ago, and thought that since the clusters would start up
and could pass trivial checks, things were OK.

As soon as you subjected an instance derived this way to something
wide-ranging, such as an all-table vac/analyze, dumpall, etc., corruption
was observed soon after launching it.

FWIW

>
> Will update once we get onto the new h/w to see if that fixes it.
>
> Thanks,
> Karthik

--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net
p: 312.241.7800


Re: index and table corruption

From:
"Anand Kumar, Karthik"
Date:
Hi Jerry,

Thanks for the suggestion

Yes, until about a month ago, we weren't wrapping our snapshots with
pg_start_backup and pg_stop_backup. For the same reason you mentioned: the
database would start up and "trivial checks" would be okay, so we figured
"why write a script?".

However, we did change that a month or so ago, and have had the problem
since then. Every snapshot we have tried to actually recover from has been
wrapped in a pg_start_backup and pg_stop_backup, so we are leaning more
towards server/disk corruption at this time.

We also synced our snapshot to an alternate SAN, ran a script to update
every row of every table, and did a full vacuum and reindex of every
table; there were no error messages about bad blocks.
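
(For anyone following along, the vacuum/reindex portion of that sweep is
roughly what the stock client tools do; the row-update pass was a custom
script:

  vacuumdb --all --full --verbose
  reindexdb --all
)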

Thanks,
Karthik



Re: index and table corruption

From:
"Anand Kumar, Karthik"
Date:
>
>Thanks Shaun!
>
>Yes, we're getting synchronous_commit on right now.
>
>The log_min_duration was briefly set to 0 at the time I sent out the post,
>just to see what statements were logged right before everything went to
>hell. Didn't yield much since we very quickly realized we couldn't cope
>with the volume of logs.
>
>We also noticed that when trying to recover from a snapshot and replay
>archived wal logs, it would corrupt right away, in under an hour. When
>recovering from snapshots *without* replaying wal logs, we go on for a day
>or two without the problem, so it does seem like wal logs are probably not
>being flushed to disk as expected.
>
>Will update once we get onto the new h/w to see if that fixes it.

>>> FYI - the change of server hardware didn't help.
>>> Also, I'm able to run a full pg_dump of the database (without output
>>> to /dev/null) and it completes okay.
>>> And then a few hours later the issue shows up again.
>>> We run an ext3 filesystem with journaling=ordered, and kernel 2.6.18.
>>> Currently testing an upgrade to the kernel and turning off journaling
>>> on the filesystem to see if that will help.

>
>Thanks,
>Karthik