Discussion: Dangers of fsync = off

Dangers of fsync = off

From:
Joel Dice
Date:
Hello all.

It's clear from the documentation for the fsync configuration option that
turning it off may lead to unrecoverable data corruption.  I'd like to
learn more about why this is possible and how likely it really is.

A quick look at xlog.h reveals that each record in the transaction log
contains a CRC checksum, a transaction ID, a length, etc.  Assuming the
worst thing that can happen due to a crash is that the end of the log is
filled with random garbage, there seems to be little danger that the
recovery process will misinterpret any of that garbage as a valid
transaction record, complete with matching checksum.
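To make Joel's reasoning concrete, here is a minimal Python sketch of a checksummed log record. The layout (a 4-byte payload length, a 4-byte transaction ID, the payload, then a CRC-32 over everything before it) is a hypothetical simplification, not PostgreSQL's actual XLogRecord format; it only shows why trailing garbage is unlikely to be accepted as a valid record.

```python
import struct
import zlib

# Hypothetical record layout (NOT PostgreSQL's real XLogRecord):
#   uint32 payload length | uint32 transaction id | payload | uint32 CRC-32
HEADER = struct.Struct("<II")
CRC = struct.Struct("<I")


def encode_record(xid: int, payload: bytes) -> bytes:
    """Serialize one record and append a CRC-32 over header + payload."""
    body = HEADER.pack(len(payload), xid) + payload
    return body + CRC.pack(zlib.crc32(body))


def decode_record(buf: bytes):
    """Return (xid, payload) if buf starts with a valid record, else None."""
    if len(buf) < HEADER.size + CRC.size:
        return None  # too short to hold even an empty record
    length, xid = HEADER.unpack_from(buf, 0)
    end = HEADER.size + length
    if end + CRC.size > len(buf):
        return None  # length field points past the available bytes
    (stored_crc,) = CRC.unpack_from(buf, end)
    if zlib.crc32(buf[:end]) != stored_crc:
        return None  # checksum mismatch: treat as the end of the valid log
    return xid, buf[HEADER.size:end]


good = encode_record(42, b"new tuple data")
print(decode_record(good))            # (42, b'new tuple data')

damaged = bytearray(good)
damaged[10] ^= 0xFF                   # flip every bit of one payload byte
print(decode_record(bytes(damaged)))  # None
```

A block of random bytes passes this check only if its stored CRC happens to equal the computed one, roughly a 1 in 2^32 chance per candidate record, which supports Joel's intuition that garbage at the end of the log is not the real worry.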

If my assumption is incorrect (i.e. garbage at the end of the log is not
the worst that can happen), what else might happen, and how would this
lead to unrecoverable corruption?  Also, are there any filesystems
available which avoid such cases?

Sorry if this has been discussed before - in which case please point me to
that discussion.

Thanks.

  - Joel

Re: Dangers of fsync = off

From:
Tom Lane
Date:
Joel Dice <dicej@mailsnare.net> writes:
> It's clear from the documentation for the fsync configuration option that
> turning it off may lead to unrecoverable data corruption.  I'd like to
> learn more about why this is possible and how likely it really is.

As you note, WAL is not particularly vulnerable --- the worst likely
consequence is not being able to read the last few WAL entries that
were made.

The real problem with fsync off is that there is essentially no
guarantee about the relative write order of WAL and data files.
In particular, some data-file updates might hit disk before the
corresponding WAL entries.  If other data-file updates that are part of
the same transaction did *not* reach disk before a crash, then
replay of WAL might not cause those updates to happen (because
the relevant WAL records are unreadable), leaving you with
inconsistent data.
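A deliberately simplified toy model of Tom's first scenario, in Python. It treats each data page as a single value and WAL replay as "overwrite the page with the logged value", ignoring MVCC, commit records and full-page writes entirely; the page names, the transaction and the numbers are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CrashedDisk:
    """What survived on stable storage when the machine died (toy model)."""
    pages: dict                              # page name -> value on disk
    wal: list = field(default_factory=list)  # WAL records that are readable


def replay(disk: CrashedDisk) -> dict:
    """Redo every readable WAL record on top of the surviving pages."""
    pages = dict(disk.pages)
    for page, value in disk.wal:
        pages[page] = value  # redo: overwrite the page with the logged value
    return pages


# Hypothetical transaction T1 moves 100 from page "A" to page "B"
# (both start at 500) and logs one WAL record per page it touches.
t1_wal = [("A", 400), ("B", 600)]

# fsync = on: WAL is forced to disk before any data page, so even if
# neither data page was written, replay rebuilds both.
with_fsync = CrashedDisk(pages={"A": 500, "B": 500}, wal=list(t1_wal))
print(replay(with_fsync))     # {'A': 400, 'B': 600}

# fsync = off: the OS happened to write page A, but neither page B nor
# the WAL records reached the disk before the crash.
without_fsync = CrashedDisk(pages={"A": 400, "B": 500}, wal=[])
print(replay(without_fsync))  # {'A': 400, 'B': 500}  (100 units vanished)
```

Nothing in the second case looks broken to the recovery code: there is simply no readable WAL left that could repair page B.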

Another scenario is that a checkpoint is shown as completed by WAL but
not all of the before-the-checkpoint data-file updates actually reached
disk.  WAL replay will start from the checkpoint and therefore not fix
the missing updates.
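The checkpoint scenario can be sketched the same way. This Python toy assumes redo starts right after the most recent checkpoint record, which simplifies PostgreSQL's real redo-start logic; the LSNs, pages and values are illustrative only.

```python
# Toy WAL: (lsn, page, value) entries; a CHECKPOINT entry promises that
# every change logged before it is already safely in the data files.
WAL = [
    (1, "A", 400),            # change made before the checkpoint
    (2, "CHECKPOINT", None),
    (3, "B", 600),            # change made after the checkpoint
]


def recover(disk_pages: dict, wal: list) -> dict:
    """Replay only entries after the last checkpoint (simplified redo start)."""
    pages = dict(disk_pages)
    redo_start = max(lsn for lsn, page, _ in wal if page == "CHECKPOINT")
    for lsn, page, value in wal:
        if lsn > redo_start:
            pages[page] = value
    return pages


# fsync = on: the checkpoint is not recorded until page A is really on disk.
print(recover({"A": 400, "B": 500}, WAL))  # {'A': 400, 'B': 600}

# fsync = off: the checkpoint record survived, but page A's write was
# still in the OS cache, so the pre-checkpoint change is silently lost.
print(recover({"A": 500, "B": 500}, WAL))  # {'A': 500, 'B': 600}
```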

Either way you have inconsistencies in on-disk data, such as missing
tuples, multiple live versions of the same tuple, index contents not
consistent with heap, or outright-corrupt index structure.  The extent
to which these things are visible to applications is hard to predict,
but it's frequently ugly :-(.  Index problems can always be fixed with
REINDEX, but there's no fix for inconsistent heap contents.

            regards, tom lane

Re: Dangers of fsync = off

From:
Joel Dice
Date:
Thanks for the explanation, Tom.  I understand the problem now.

My next question is this: what are the dangers of turning fsync off in the
context of a high-availability cluster using asynchronous replication?

In particular, we are using Slony-I and linux-ha to provide a two-node,
master-slave cluster.  As you may know, Slony-I uses triggers to provide
asynchronous replication.  If the master (X) fails, the slave (Y) becomes
active.  At this point, the administrator manually performs a recovery by
reintroducing X so that Y is the master and X is the slave.  This task
involves dropping any databases on X and having it sync with the versions
on Y.  Thus, database corruption on X is irrelevant since our first step
is to drop them.

It would seem that our only exposure is that both machines fail before the
administrator is able to perform the recovery.  Even that could be solved
by leaving fsync turned on for the slave, so that when failover occurs and
the slave becomes active, we only turn fsync off once we've safely
reintroduced the other machine (which, in turn, will have fsync turned on).

There was a discussion about this here:

   http://gborg.postgresql.org/pipermail/slony1-general/2005-March/001760.html

However, that discussion seems to assume that the administrator needs to
salvage the databases on the failed machine, which is not necessary in
our case.

In short, is there any danger (besides losing a few transactions) of
turning fsync off on the master of a cluster using asynchronous
replication, assuming we don't need to recover the data from the master
when it fails?

Thanks.

  - Joel

Re: Dangers of fsync = off

From:
Andrew Sullivan
Date:
On Fri, May 04, 2007 at 08:54:10AM -0600, Joel Dice wrote:
>
> My next question is this: what are the dangers of turning fsync off in the
> context of a high-availability cluster using asynchronous replication?

My real question is why you want to turn it off.  If you're using a
battery-backed cache on your disk controller, then fsync ought to be
pretty close to free.  Are you sure that turning it off will deliver
the benefit you think it will?

> on Y.  Thus, database corruption on X is irrelevant since our first step
> is to drop them.

Not if the corruption introduces problems for replication, which is
indeed possible.

A
--
Andrew Sullivan  | ajs@crankycanuck.ca
A certain description of men are for getting out of debt, yet are
against all taxes for raising money to pay it off.
        --Alexander Hamilton

Re: Dangers of fsync = off

From:
Joel Dice
Date:
Thanks for your response, Andrew.

On Tue, 8 May 2007, Andrew Sullivan wrote:

> On Fri, May 04, 2007 at 08:54:10AM -0600, Joel Dice wrote:
>>
>> My next question is this: what are the dangers of turning fsync off in the
>> context of a high-availability cluster using asynchronous replication?
>
> My real question is why you want to turn it off.  If you're using a
> battery-backed cache on your disk controller, then fsync ought to be
> pretty close to free.  Are you sure that turning it off will deliver
> the benefit you think it will?

You may very well be right.  I tend to think in terms of software
solutions, but a hardware solution may be most appropriate here.  In any
case, I'm not at all sure this will bring a significant performance
improvement.  I just want to understand the implications before I start
fiddling; if fsync=off is dangerous, it doesn't matter what the
performance benefits may be.

>> on Y.  Thus, database corruption on X is irrelevant since our first step
>> is to drop them.
>
> Not if the corruption introduces problems for replication, which is
> indeed possible.

That's exactly what I want to understand.  How, exactly, is this possible?
If the danger of fsync is that it may leave the on-disk state of the
database in an inconsistent state after a crash, it would not seem to have
any implications for activity occurring prior to the crash.  In
particular, a trigger-based replication system would seem to be immune.

In other words, while there may be ways the master could cause corruption
on the slave, I don't see how they could be related to the fsync setting.

  - Joel

Re: Dangers of fsync = off

From:
Bill Moran
Date:
In response to Joel Dice <dicej@mailsnare.net>:

> Thanks for your response, Andrew.
>
> On Tue, 8 May 2007, Andrew Sullivan wrote:
>
> > On Fri, May 04, 2007 at 08:54:10AM -0600, Joel Dice wrote:
> >>
> >> My next question is this: what are the dangers of turning fsync off in the
> >> context of a high-availability cluster using asynchronous replication?
> >
> > My real question is why you want to turn it off.  If you're using a
> > battery-backed cache on your disk controller, then fsync ought to be
> > pretty close to free.  Are you sure that turning it off will deliver
> > the benefit you think it will?
>
> You may very well be right.  I tend to think in terms of software
> solutions, but a hardware solution may be most appropriate here.  In any
> case, I'm not at all sure this will bring a significant performance
> improvement.  I just want to understand the implications before I start
> fiddling; if fsync=off is dangerous, it doesn't matter what the
> performance benefits may be.
>
> >> on Y.  Thus, database corruption on X is irrelevant since our first step
> >> is to drop them.
> >
> > Not if the corruption introduces problems for replication, which is
> > indeed possible.
>
> That's exactly what I want to understand.  How, exactly, is this possible?
> If the danger of fsync is that it may leave the on-disk state of the
> database in an inconsistent state after a crash, it would not seem to have
> any implications for activity occurring prior to the crash.  In
> particular, a trigger-based replication system would seem to be immune.

If you mean Slony, no.  It's not immune.  Slony maintains its state in
tables in the database.  If fsync is off, there's no guarantee that Slony's
state information is sane, which means replication is not guaranteed to be
or do anything.

> In other words, while there may be ways the master could cause corruption
> on the slave, I don't see how they could be related to the fsync setting.

Specifically, I can imagine a system crashing, then _seeming_ to restart
properly, but Slony starts re-replicating transactions that have already
been replicated once because the ACKs were never written to disk on the
master.  Take the example of a query "UPDATE tablename SET x = x + 1".
When this query is erroneously issued twice, data corruption will occur.

Other scenarios may be possible.

--
Bill Moran
http://www.potentialtech.com

Re: Dangers of fsync = off

From:
Csaba Nagy
Date:
> [snip] Take the example of a query "UPDATE tablename SET x = x + 1".
> When this query is erroneously issued twice, data corruption will occur.

Huh? I thought Slony replicates data, not queries... what is "UPDATE
tablename SET x = x + 1" on the master translates to "UPDATE tablename
SET x = new_value" on the slave, where new_value is the value of x + 1
computed on the master. That's why Slony works well even if you do
"UPDATE tablename SET x = now()".
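A small Python sketch of the distinction Csaba draws: re-executing a statement on the replica versus applying a value that was computed on the master. Both functions are hypothetical and do not model Slony-I's actual log tables or ordering; they only show why shipping values makes re-applying the same change idempotent while replaying the statement is not.

```python
def apply_statement(row: dict) -> None:
    """Statement-based: re-execute 'SET x = x + 1' on the replica."""
    row["x"] = row["x"] + 1


def apply_row_value(row: dict, new_value: int) -> None:
    """Value-based: 'SET x = <value computed on the master>'."""
    row["x"] = new_value


# The master runs UPDATE tablename SET x = x + 1 once: x goes 10 -> 11.
master_new_x = 11

stmt_replica = {"x": 10}
value_replica = {"x": 10}
for _ in range(2):  # the same change is (erroneously) delivered twice
    apply_statement(stmt_replica)
    apply_row_value(value_replica, master_new_x)

print(stmt_replica)   # {'x': 12}  -> diverged from the master
print(value_replica)  # {'x': 11}  -> matches the master
```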

Cheers,
Csaba.



Re: Dangers of fsync = off

From:
Bill Moran
Date:
In response to Csaba Nagy <nagy@ecircle-ag.com>:

> > [snip] Take the example of a query "UPDATE tablename SET x = x + 1".
> > When this query is erroneously issued twice, data corruption will occur.
>
> Huh? I thought Slony replicates data, not queries... what is "UPDATE
> tablename SET x = x + 1" on the master translates to "UPDATE tablename
> SET x = new_value" on the slave, where new_value is the value of x + 1
> computed on the master. That's why Slony works well even if you do
> "UPDATE tablename SET x = now()".

True.  My mistake.

I still wouldn't trust Slony with fsync off.  Another scenario would be
the Slony trigger writes a change to the Slony DB, the db crashes before
it gets committed to disk.  When the DB is started, no errors prevent
startup, but that transaction is lost.

I mean, you have to weigh all these possibilities against your tolerance
for data loss.  If you're a bank, none of this is acceptable.  If you're
MySpace, who f*cking cares if you lose data (I saw an article where the
CIO of MySpace admitted that was their policy -- must be nice to have a
job where nobody cares if you do it wrong!)

--
Bill Moran
http://www.potentialtech.com

Re: Dangers of fsync = off

From:
Scott Ribe
Date:
> I still wouldn't trust Slony with fsync off.  Another scenario would be
> the Slony trigger writes a change to the Slony DB, the db crashes before
> it gets committed to disk.  When the DB is started, no errors prevent
> startup, but that transaction is lost.

I'm not sure, but I think the questioner was proposing a policy of "if it
crashes, we go to the standby, no attempt at recovery, ever", and I think
that would be safe.

And, personally, given my experience with pg, I think that's reasonable.
Because the day I see pg crash I'm going to assume I have a hardware problem
;-)

--
Scott Ribe
scott_ribe@killerbytes.com
http://www.killerbytes.com/
(303) 722-0567 voice



Re: Dangers of fsync = off

From:
Brad Nicholson
Date:
On Wed, 2007-05-09 at 08:26 -0600, Scott Ribe wrote:
> > I still wouldn't trust Slony with fsync off.  Another scenario would be
> > the Slony trigger writes a change to the Slony DB, the db crashes before
> > it gets committed to disk.  When the DB is started, no errors prevent
> > startup, but that transaction is lost.
>
> I'm not sure, but I think the questioner was proposing a policy of "if it
> crashes, we go to the standby, no attempt at recovery, ever", and I think
> that would be safe.

Just make sure that there is no way that the database would come back up
after the crash.  If it did, the slons could pick up and cause you
trouble.

If you disable all start up scripts, and operate under the assumption
that crash=corruption=failover to Slony replica, you should be okay.
You will lose whatever transactions were not replicated to the
subscriber, but that's inherent to async replication.
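One way to enforce Brad's "no way the database comes back up" rule is a guard that a startup script consults before launching PostgreSQL. The Python sketch below is an assumption-laden illustration, not a tested tool: it assumes `pg_controldata` is on the PATH and that its `Database cluster state:` line reads `shut down` after a clean shutdown (after a crash it typically still reads `in production`). The function names are hypothetical.

```python
import os
import re
import subprocess


def cluster_state(controldata_output: str) -> str:
    """Return the value of the 'Database cluster state' line."""
    match = re.search(r"^Database cluster state:[ \t]*(.+?)[ \t]*$",
                      controldata_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Database cluster state' line found")
    return match.group(1)


def safe_to_start(state: str) -> bool:
    """Under a 'crash means failover' policy, trust only a clean shutdown."""
    return state == "shut down"


def check_datadir(datadir: str) -> bool:
    """Run pg_controldata on datadir and apply the policy above."""
    env = dict(os.environ, LC_ALL="C")  # ask for untranslated output
    result = subprocess.run(["pg_controldata", datadir], env=env,
                            capture_output=True, text=True, check=True)
    return safe_to_start(cluster_state(result.stdout))
```

A hypothetical init-script wrapper would call `check_datadir()` on the master's data directory and, if it returns False, leave PostgreSQL stopped so the administrator fails over instead.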

> And, personally, given my experience with pg, I think that's reasonable.
> Because the day I see pg crash I'm going to assume I have a hardware problem
> ;-)

If you care about your data, leave fsync on.  Period.  If you can accept
the potential for data loss, and you've proven that there is a
worthwhile performance benefit from turning it off (which there may not
be), and you've gotten your boss/clients/stakeholders to sign off
(preferably in writing) that data loss is acceptable if the db crashes,
then go ahead and turn it off.

--
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.


Re: Dangers of fsync = off

From:
"Dawid Kuroczko"
Date:
On 5/8/07, Joel Dice <dicej@mailsnare.net> wrote:
> On Tue, 8 May 2007, Andrew Sullivan wrote:
> > My real question is why you want to turn it off.  If you're using a
> > battery-backed cache on your disk controller, then fsync ought to be
> > pretty close to free.  Are you sure that turning it off will deliver
> > the benefit you think it will?
>
> You may very well be right.  I tend to think in terms of software
> solutions, but a hardware solution may be most appropriate here.  In any
> case, I'm not at all sure this will bring a significant performance
> improvement.  I just want to understand the implications before I start
> fiddling; if fsync=off is dangerous, it doesn't matter what the
> performance benefits may be.

Well, fsync=off makes failures harder to cope with.

Normally, when your operating system crashes or the power fails, your
master server should start up cleanly.  If it doesn't, you've got the slave.

Now, with fsync=off you should promote the slave to master whenever
you experience a crash or power failure, just to be safe.  A battery-backed
unit may be cheaper than the cost of failovers (DBA time costs
money, and so does downtime ;)).  Do some testing, do some
calculations.

> >> on Y.  Thus, database corruption on X is irrelevant since our first step
> >> is to drop them.
> >
> > Not if the corruption introduces problems for replication, which is
> > indeed possible.
>
> That's exactly what I want to understand.  How, exactly, is this possible?
> If the danger of fsync is that it may leave the on-disk state of the
> database in an inconsistent state after a crash, it would not seem to have
> any implications for activity occurring prior to the crash.  In
> particular, a trigger-based replication system would seem to be immune.
>
> In other words, while there may be ways the master could cause corruption
> on the slave, I don't see how they could be related to the fsync setting.

OK, let's assume you have machine mdb as a master database,
and sdb as slave database.  mdb has fsync=off and Slony-I is used
as a replication system.

You have a power failure/system crash/whatever.  mdb goes down.
Your sdb is consistent, but it's missing, let's say, the last 15 seconds
of transactions, which didn't manage to replicate.
You don't do failover yet.  Your mdb starts up, PostgreSQL replays
its Write Ahead Log.  Everything seems fine, mdb is up and running,
and these 15 seconds of transactions are replicated to sdb.

Oops.  PostgreSQL seemed to be fine, but since fsync was off,
the rows in Money_Transactions were never flushed to disk, while
PostgreSQL assumed they were already there (WAL was only replayed
from the last known CHECKPOINT), so you didn't actually replicate
these transactions.  If you are really unlucky, you've replicated
some old contents of the database, and now both your mdb and sdb
contain erroneous data.
Of course sdb is consistent in terms of "internal structure", but
try explaining that to the poor soul who happened to be doing
updates on the Money_Transactions table. ;-)

Of course, the likelihood of this happening isn't very high: PostgreSQL
really tries to safeguard your data (an elephant never forgets ;)),
but only as long as you give it a chance. ;)

   Regards,
      Dawid

Re: Dangers of fsync = off

From:
Joel Dice
Date:
Thanks, Bill and Scott, for your responses.

To summarize, turning fsync off on the master of a Slony-I cluster is
probably safe if you observe the following:

   1. When failover occurs, drop all databases on the failed machine and
sync it with the new master before re-introducing it into the cluster.
Note that the failed machine must not be returned to use until this is
done.

   2. Be aware that the above implies that you will lose any transactions
which did not reach the standby machine prior to failure, violating the
Durability component of ACID.  This is true of any system which relies on
asynchronous replication and automatic failover.

  - Joel