Thread: turning fsync off for WAL

turning fsync off for WAL

From
"Ram Ravichandran"
Date:
Hey,

I am running a PostgreSQL server on Amazon EC2. My current plan is to mount an Amazon S3 bucket as a drive using PersistentFS, which is a POSIX-compliant file system.
I will be using this for write-ahead logging. The issue with S3 is that, though the actual storage is cheap, they charge $1 per 100,000 put requests, so frequent fsyncs will cost me a lot.

I've been talking to the makers of PersistentFS, and one possible solution is for the file system to disobey fsyncs. I am trying to find out the implications of this method in case of a crash. Will I only lose information since the last fsync? Or will the earlier data, in general, be corrupted due to some out-of-order writes (I remember seeing this somewhere)?

Thanks,

Ram

Re: turning fsync off for WAL

From
"Scott Marlowe"
Date:
On Mon, Jun 2, 2008 at 6:12 PM, Ram Ravichandran <ramkaka@gmail.com> wrote:
> Hey,
> I am running a postgresql server on Amazon EC2. My current plan is to mount
> an Amazon S3 bucket as a drive using PersistentFS which is a POSIX-compliant
> file system.
> I will be using this for write-ahead-logging. The issue with S3 is that
> though the actual storage is cheap, they charge $1 per 100,000 put requests
>  - so frequent fsyncs will
> cost me a lot.
> I've been talking to the makers of persistentFS, and one possible solution
> is for the file system to disobey fsyncs. I am trying to find out the
> implications of this method in
> case of a crash. Will I only lose information since the last fsync? Or will
> the earlier data, in general, be corrupted due to some out-of-order writes
> (I remember seeing this somewhere)?

Running without fsyncs is likely to lead to a corrupted db if you get
a crash / loss of connection etc...

Re: turning fsync off for WAL

From
"Ram Ravichandran"
Date:

> Running without fsyncs is likely to lead to a corrupted db if you get
> a crash / loss of connection etc...

Just to clarify: by "corrupted db" you mean that all information (even what was written prior to the last fsync) will be lost. Right?

Thanks,

Ram

Re: turning fsync off for WAL

From
"Scott Marlowe"
Date:
On Mon, Jun 2, 2008 at 6:42 PM, Ram Ravichandran <ramkaka@gmail.com> wrote:
>
>> Running without fsyncs is likely to lead to a corrupted db if you get
>> a crash / loss of connection etc...
>
> Just to clarify, by corrupted db you mean that all information (even the
> ones prior to the last fsync) will be lost. Right?


Well, it might be there, might not. More likely most of it will be
there, but you'll get errors accessing files and could have incoherent
data coming out of the db.

But yeah, it's way more than just losing the last transaction.
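
Editor's sketch: a toy model of why ignoring fsync can cost more than the last transaction. It assumes a simplified rule that a WAL record must be durable before the data page it describes, and that a file system ignoring fsync may persist any subset of pending writes. The names `WRITES`, `possible_disk_states` and `is_recoverable` are made up for illustration and are not PostgreSQL internals.

```python
from itertools import combinations

# Writes issued in order: the WAL record first, then the data page it describes.
WRITES = ["wal_record", "data_page"]


def possible_disk_states(writes, honors_fsync):
    """Return the sets of writes that may be on disk when a crash hits."""
    if honors_fsync:
        # An honored fsync after the WAL write acts as a barrier, so only
        # prefixes of the issue order can survive a crash.
        return [set(writes[:n]) for n in range(len(writes) + 1)]
    # With fsync ignored, the file system may persist any subset of the writes.
    return [
        set(combo)
        for n in range(len(writes) + 1)
        for combo in combinations(writes, n)
    ]


def is_recoverable(state):
    """In this toy model, a data page without its WAL record breaks replay."""
    return not ("data_page" in state and "wal_record" not in state)


for honors in (True, False):
    states = possible_disk_states(WRITES, honors)
    bad = [sorted(s) for s in states if not is_recoverable(s)]
    print(f"fsync honored={honors}: unrecoverable states = {bad}")
```

With the barrier in place every crash state is a prefix of the issue order; without it, the data page can land alone, which is the kind of state replay cannot repair in this model.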

Re: turning fsync off for WAL

From
Gregory Stark
Date:
"Ram Ravichandran" <ramkaka@gmail.com> writes:

> Hey,
> I am running a postgresql server on Amazon EC2. My current plan is to mount
> an Amazon S3 bucket as a drive using PersistentFS which is a POSIX-compliant
> file system.
> I will be using this for write-ahead-logging. The issue with S3 is that
> though the actual storage is cheap, they charge $1 per 100,000 put requests
>  - so frequent fsyncs will
> cost me a lot.

Wow, this is a fascinating situation. Are you sure the fsyncs are the only
thing to worry about though? Postgres will call write(2) many times even if
you disabled fsync entirely. Surely the kernel and filesystem will eventually
send some of them through even if no fsyncs arrive?

Is it only fsyncs on the write-ahead-log that matter? Or on the data as well?
Checkpoints fsync the data files. The logs are fsynced on every commit and
also whenever a buffer has to be flushed if the logs for the last changes in
that buffer haven't been synced yet.

> I've been talking to the makers of persistentFS, and one possible solution
> is for the file system to disobey fsyncs. I am trying to find out the
> implications of this method in case of a crash. Will I only lose information
> since the last fsync? Or will the earlier data, in general, be corrupted due
> to some out-of-order writes (I remember seeing this somewhere)?

There actually is an option in Postgres to not call fsync. However your fear
is justified. If your file system can flush buffers to disk in a different
order than they were written (and most can) then it's possible for a database
with fsync off to become corrupted. Typical examples would be things like
records missing index pointers (or worse, index pointers to wrong records), or
duplicate or missing records (consider if an update is only partly written).

This is only an issue in the event of either a kernel crash or power failure
(whatever that means for a virtual machine...). In which case the only safe
course of action is to restore from backup. It's possible that in the context
of Amazon these would be rare enough events and restoring from backups easy
enough that that might be worth considering?

However a safer and more interesting option with Postgres 8.3 would be to
disable "synchronous_commit" and set a very large wal_writer_delay.
Effectively this would do the same thing, disabling fsync for every
transaction, but not risk the data integrity.

The default wal_writer_delay is 200ms meaning 5 fsyncs per second but you
could raise that substantially to get fewer fsyncs, possibly into the range of
minutes. If you raise it *too* far then you'll start observing fsyncs due to
processing being forced to flush dirty buffers before their changes have been
logged and synced. The only way to raise that would be to increase the
shared_buffers which would have complex effects.

You'll also have to look at the checkpoint_timeout and checkpoint_segments
parameters. And probably the bgwriter variables as well (lest it start trying
to flush buffers whose changes haven't been logged yet too).


--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!
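
Editor's sketch: a rough estimate of how wal_writer_delay translates into S3 charges. It uses the price quoted in the thread ($1 per 100,000 put requests) and assumes, purely for illustration, that every WAL fsync becomes exactly one put and that the WAL writer has something to flush on every cycle. Real PersistentFS behavior may differ, so treat the result as an upper bound, not a bill.

```python
# Price quoted in the thread: $1 per 100,000 put requests.
PUT_PRICE_USD = 1.0 / 100_000
SECONDS_PER_DAY = 86_400


def daily_put_cost(wal_writer_delay_s, puts_per_fsync=1):
    """Estimated USD per day if the WAL is fsynced once per delay interval.

    puts_per_fsync is an assumption: how many S3 puts one fsync triggers.
    """
    fsyncs_per_day = SECONDS_PER_DAY / wal_writer_delay_s
    return fsyncs_per_day * puts_per_fsync * PUT_PRICE_USD


# 0.2 s is the default mentioned above; the other delays are illustrative.
for delay in (0.2, 1.0, 10.0):
    print(f"wal_writer_delay={delay:>4}s -> ${daily_put_cost(delay):.2f}/day")
```

At the default 200ms this works out to about $4.32 a day under these assumptions, and the cost falls in proportion as the delay grows.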

Re: turning fsync off for WAL

From
Greg Smith
Date:
On Mon, 2 Jun 2008, Ram Ravichandran wrote:

> My current plan is to mount an Amazon S3 bucket as a drive using
> PersistentFS which is a POSIX-compliant file system.

Are you sure this will work correctly for database use at all?  The known
issue listed at http://www.persistentfs.com/documentation/Release_Notes
sounded like a much bigger consistency concern than the fsync trivia
you're bringing up:

"In the current Technology Preview release, any changes to an open file's
meta data are not saved to S3 until the file is closed. As a result, if
PersistentFS or the system crashes while writing a file, it is possible
for the file size in the file's directory entry to be greater than the
actual number of file blocks written to S3..."

This sounds like you'll face potential file corruption every time the
database goes down for some reason, on whatever database files happen to
be open at the time.

Given the current state of EC2, I don't know why you'd take this approach
instead of just creating an AMI to install the database into.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: turning fsync off for WAL

From
"Ram Ravichandran"
Date:

> Are you sure this will work correctly for database use at all?  The known issue listed at http://www.persistentfs.com/documentation/Release_Notes sounded like a much bigger consistency concern than the fsync trivia you're bringing up:
>
> "In the current Technology Preview release, any changes to an open file's meta data are not saved to S3 until the file is closed. As a result, if PersistentFS or the system crashes while writing a file, it is possible for the file size in the file's directory entry to be greater than the actual number of file blocks written to S3..."
>
> This sounds like you'll face potential file corruption every time the database goes down for some reason, on whatever database files happen to be open at the time.
>
> Given the current state of EC2, I don't know why you'd take this approach instead of just creating an AMI to install the database into.
>
> --
> * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD


The problem I am facing is that EC2 has no persistent storage (at least currently). So if the server restarts for some reason, all data on the local disks is gone. The idea was to store the tables on the non-persistent local disk and write the WAL to an S3-mounted drive. If the server goes down for some reason, I was hoping to recover by replaying the WAL. I was hoping that by faking the fsyncs, I would not incur the actual charges from Amazon until the file system writes into S3.
Also, since the WAL is on a separate FS, it will not affect my disk-write rates.

Ram





Re: turning fsync off for WAL

From
"Ram Ravichandran"
Date:

> Wow, this is a fascinating situation. Are you sure the fsyncs are the only
> thing to worry about though? Postgres will call write(2) many times even if
> you disabled fsync entirely. Surely the kernel and filesystem will eventually
> send some of them through even if no fsyncs arrive?

Given that I am only worried about WAL being persistent, are these other issues 
still pertinent? I am sorry I am such a newbie.
 

> Is it only fsyncs on the write-ahead-log that matter? Or on the data as well?
> Checkpoints fsync the data files. The logs are fsynced on every commit and
> also whenever a buffer has to be flushed if the logs for the last changes in
> that buffer haven't been synced yet.

I was talking only about the WAL. Basically, I am just trying to make sure that if my EC2 instance goes down,
I will be able to recover by replaying my write-ahead logs. I am assuming checkpoints are for the
actual tables on the disk (and not for logging / backup). Am I correct?



> There actually is an option in Postgres to not call fsync. However your fear
> is justified. If your file system can flush buffers to disk in a different
> order than they were written (and most can) then it's possible for a database
> with fsync off to become corrupted. Typical examples would be things like
> records missing index pointers (or worse, index pointers to wrong records), or
> duplicate or missing records (consider if an update is only partly written).
>
> This is only an issue in the event of either a kernel crash or power failure
> (whatever that means for a virtual machine...). In which case the only safe
> course of action is to restore from backup. It's possible that in the context
> of Amazon these would be rare enough events and restoring from backups easy
> enough that that might be worth considering?
>
> However a safer and more interesting option with Postgres 8.3 would be to
> disable "synchronous_commit" and set a very large wal_writer_delay.
> Effectively this would do the same thing, disabling fsync for every
> transaction, but not risk the data integrity.
>
> The default wal_writer_delay is 200ms meaning 5 fsyncs per second but you
> could raise that substantially to get fewer fsyncs, possibly into the range of
> minutes. If you raise it *too* far then you'll start observing fsyncs due to
> processing being forced to flush dirty buffers before their changes have been
> logged and synced. The only way to raise that would be to increase the
> shared_buffers which would have complex effects.

This seems like a much better idea. So, I should 
a) disable synchronous_commit 
b) set wal_writer_delay to say 1 minute (and leave fsync on)
c) symlink pg_xlog to the PersistentFS on S3.

If there is a crash, I should be able to restore entirely from the WAL. Although, doesn't
this have the same problem as disabling the fsyncs?

BTW, if the wal_writer_delay is too long, then the fsyncs to flush dirty buffers would also fsync the
WAL, right? Is that bad (as far as data integrity goes), or is it just that the fsyncs would be more frequent?

Thanks everyone for all the help.

Ram


Re: turning fsync off for WAL

From
Simon Riggs
Date:
On Tue, 2008-06-03 at 00:04 -0400, Ram Ravichandran wrote:

> This seems like a much better idea. So, I should
> a) disable synchronous_commit
> b) set wal_writer_delay to say 1 minute (and leave fsync on)
> c) symlink pg_xlog to the PersistentFS on S3.
>

a) sounds good. b) has a max setting of 10 seconds, which I think is a
realistic maximum in this case also.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


Re: turning fsync off for WAL

From
Gregory Stark
Date:
"Ram Ravichandran" <ramkaka@gmail.com> writes:

> The problem that I am facing is that EC2 has no persistent storage (at least
> currently). So, if the server restarts for some reason, all data on the
> local disks are gone. The idea was to store the tables on the non-persistent
> local disk, and do the WAL on to an S3 mounted drive. If the server goes
> down for some reason, I was hoping to recover by replaying the WAL. I was
> hoping that by faking the fsyncs, I would not incur the actual charges from
> Amazon until the file system writes into S3.
> Also, since WAL is on a separate FS, it will not affect my disk-write
> rates.

Ahh. I think you can use this effectively but not the way you're describing.

Instead of writing the wal directly to persistentFS what I think you're better
off doing is treating persistentFS as your backup storage. Use "Archiving" as
described here to archive the WAL files to persistentFS:

http://postgresql.com.cn/docs/8.3/static/runtime-config-wal.html#GUC-ARCHIVE-MODE

Then if your database goes down you'll have to restore from backup (stored in
persistentFS) and then run recovery from the archived WAL files (from
persistentFS) and be back up.

You will lose any transactions which haven't been archived yet but you can
control how many transactions you're at risk of losing versus how much you pay
for all the "puts". The more "puts" the fewer transactions you'll be putting
at risk but the more you'll pay.

You can also trade off paying for more frequent "puts" of hot backup images
(make sure to read how to use pg_start_backup() properly) against longer
recovery times. TANSTAAFL :(

If you do this then you may as well turn fsync off on the server since you're
resigned to having to restore from backup on a server crash anyways...

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!
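
Editor's sketch: one way the archiving approach above might be wired up, assuming a small helper script is used as archive_command. The script name, its install path and the `ARCHIVE_DIR` mount point are hypothetical; `archive_mode`, `archive_command` and the `%p` / `%f` placeholders are the PostgreSQL 8.3 settings the linked page describes. In postgresql.conf this would look roughly like `archive_mode = on` and `archive_command = 'python3 /usr/local/bin/archive_wal.py %p %f'`.

```python
import os
import shutil
import sys

# Hypothetical mount point of the PersistentFS-backed archive directory.
ARCHIVE_DIR = "/mnt/persistentfs/wal_archive"


def archive_segment(src_path, archive_dir, file_name):
    """Copy one completed WAL segment into archive_dir without overwriting."""
    dest = os.path.join(archive_dir, file_name)
    if os.path.exists(dest):
        # Never overwrite an archived segment; failing makes the error visible.
        raise FileExistsError(f"refusing to overwrite {dest}")
    tmp = dest + ".tmp"
    with open(src_path, "rb") as src, open(tmp, "wb") as out:
        shutil.copyfileobj(src, out)
        out.flush()
        os.fsync(out.fileno())  # make the copy durable before publishing it
    # Rename last, so a half-copied file never appears under the final name.
    os.rename(tmp, dest)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Invoked as: archive_wal.py <path of segment (%p)> <segment file name (%f)>
    # An uncaught exception ends the process with a non-zero exit status.
    archive_segment(sys.argv[1], ARCHIVE_DIR, sys.argv[2])
```

PostgreSQL treats a non-zero exit status from archive_command as a failure and retries the segment later, which is why the sketch raises instead of silently skipping. Each archived segment costs puts, so the put-versus-risk trade-off described above still applies.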

Re: turning fsync off for WAL

From
"Ram Ravichandran"
Date:

> Ahh. I think you can use this effectively but not the way you're describing.
>
> Instead of writing the wal directly to persistentFS what I think you're better
> off doing is treating persistentFS as your backup storage. Use "Archiving" as
> described here to archive the WAL files to persistentFS:
>
> http://postgresql.com.cn/docs/8.3/static/runtime-config-wal.html#GUC-ARCHIVE-MODE

Looks like this is the best solution.

Thanks,

Ram