Re: Re: corruption issue after server crash - ERROR: unexpected chunk number 0

From: Shaun Thomas
Subject: Re: Re: corruption issue after server crash - ERROR: unexpected chunk number 0
Date:
Msg-id: 0683F5F5A5C7FE419A752A034B4A0B97975AAAED@sswchi5pmbx2.peak6.net
In response to: Re: Re: corruption issue after server crash - ERROR: unexpected chunk number 0  (Mike Broers <mbroers@gmail.com>)
List: pgsql-general
> So if this problem replicated to our standby servers does that indicate
> that the potential problematic fsync occurred during a pg_xlog write?

Pretty much. You have a couple of issues here, and no easy way to approach them. Primarily, you got data corruption during async operation. This means either the OS or the hardware somewhere along the line lied about the write, or the write was corrupted and the filesystem log replayed incorrectly upon reboot. Once that happens, you can't trust *any* data in your database. Pre-checksum PostgreSQL has no way to verify the integrity of existing data, and system crashes can corrupt quite a bit of data that was only tangentially involved.
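For what it's worth, if you ever rebuild this cluster on 9.3 or later, you can turn on data checksums at initdb time and at least get page-level corruption detection going forward. Roughly something like this (the paths are just examples):

    # see whether the current cluster was initialized with checksums
    pg_controldata /var/lib/pgsql/data | grep -i checksum

    # a new cluster has to be initialized with them; they can't be
    # switched on after the fact in these versions
    initdb --data-checksums -D /var/lib/pgsql/data_new

That won't help with what's already corrupted, but it catches the next mangled page the moment it's read.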

What likely happens in these scenarios is that the database startup succeeds, and then it reads some rows in from a corrupted table. By corrupted, I mean even a single data page with a mangled pointer. That mangled pointer gave the database incorrect information about the state of that data page's contents, and the database continued on that information. That means subsequent transaction logs from that point are *also* corrupt, and hence any streaming or warm standby replicas are subsequently damaged as well. But they'll be damaged differently, because they likely didn't have the initial corruption, just the byte changes dictated by the WAL stream.

Unless you know where the initial corruption came from, the system that caused it should be quarantined for verification. RAM, disk, CPU, everything should pass integrity checks before putting it back into production.
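At minimum, I'd run something like the following on the quarantined box before trusting it again (device names, sizes, and tool choices here are just examples of what I mean; a thorough RAM check really wants memtest86+ from boot media):

    # userspace memory test: size and iteration count are examples
    memtester 4096M 3

    # long SMART self-test on each disk, then read back the results
    smartctl -t long /dev/sda
    smartctl -a /dev/sda

    # load the CPUs for ten minutes and watch the logs for errors/MCEs
    stress --cpu 8 --timeout 600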

> Would breaking replication at the time of the crash have prevented
> this from cascading or was it already too late at that point?

Most likely. If, at the time of the crash, you had switched to one of your replicas and made it the new master, it would have given you the opportunity to check out the crashed system before it spread the love. Even if you don't have a true STONITH model, starting up a potentially data-compromised node in an active cluster is a gamble.
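Promotion itself is the easy part. On the replica you're keeping, it's basically one command (the data directory path is just an example; older trigger_file-based setups just touch the trigger file instead):

    # as the postgres user on the chosen standby
    pg_ctl promote -D /var/lib/pgsql/data

The hard part is having the discipline to *not* start the crashed node back up until it's been checked.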

I did something similar once. One of our DRBD nodes crashed and came back up and re-attached to the DRBD pair after a quick data discard and replay. I continued with some scheduled system maintenance, and performed a node failover with no incident. It wasn't until 20 minutes later that the corrupt disk pages started making their presence felt, and by then it was too late. Luckily we were still verifying, but with our secondaries ruined, we had to restore from backup. A 30-minute outage became a 4-hour one.

Afterwards, we put in a new policy that any crash means a DRBD verify at minimum, and until the node passes, it is to be considered invalid and unusable. If you haven't already, I suggest something similar for your setup. Verify a crashed node before using it again, no matter how much pressure you're under. It can always get worse.
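In DRBD terms, that policy looks roughly like this (the resource name r0 is an example, and online verify needs a verify-alg set in the resource config):

    # kick off an online verify and watch for out-of-sync blocks
    drbdadm verify r0
    cat /proc/drbd            # a non-zero "oos" count means trouble

    # if anything is out of sync, disconnect/reconnect to resync it
    drbdadm disconnect r0
    drbdadm connect r0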

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd | Suite 500 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com


______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email

