Thread: Filesystem options for storing pg_data

Filesystem options for storing pg_data

From
Joe Maldonado
Date:
Hello all,

I am in a position where I'm torn between using ext2 vs ext3 to keep the
pg_data, pg_xlog, and pg_clog contents.

My main concern is that ext2 will not respond well to an
improper shutdown or power loss.

My question is: what is the preferred filesystem for this data, to
optimize performance and still have some fault tolerance?

-Joe

Re: Filesystem options for storing pg_data

From
Scott Marlowe
Date:
On Wed, 2005-04-20 at 11:07, Joe Maldonado wrote:
> Hello all,
>
> I am in a position where I'm torn between using ext2 vs ext3 to keep the
> pg_data, pg_xlog, and pg_clog contents.
>
> The main concern is that switching to ext2 will not respond well to an
> improper shutdown, power loss.
>
> My question is what is the preferred filesystem to keep this data to be
> able to optimize performance and still have some fault tolerance.

Generally XFS and JFS are considered superior to ext2/3.

ext3, in my experience, isn't much slower than ext2.  Plus the decreased
time required to bring up a server after a power outage is worth
something too.

Having used ext3 quite a bit, I'd say it's fairly stable and reliable,
but I have seen references here to known, possibly unfixable bugs.

I've used XFS a few years back, and there was no great gain for what we
were doing at the time, as we were CPU, not I/O bound.

Re: Filesystem options for storing pg_data

From
Scott Marlowe
Date:
On Wed, 2005-04-20 at 11:18, Scott Marlowe wrote:
> On Wed, 2005-04-20 at 11:07, Joe Maldonado wrote:
> > Hello all,
> >
> > I am in a position where I'm torn between using ext2 vs ext3 to keep the
> > pg_data, pg_xlog, and pg_clog contents.
> >
> > The main concern is that switching to ext2 will not respond well to an
> > improper shutdown, power loss.
> >
> > My question is what is the preferred filesystem to keep this data to be
> > able to optimize performance and still have some fault tolerance.
>
> Generally XFS and JFS are considered superior to ext2/3.
>
> ext3, in my experience, isn't much slower than ext2.  Plus the decreased
> time required to bring up a server after a power outage is worth
> something too.
>
> Having used ext3 quite a bit, I'd say it's fairly stable and reliable,
> but I have seen references here to known, possibly unfixable bugs.
>
> I've used XFS a few years back, and there was no great gain for what we
> were doing at the time, as we were CPU, not I/O bound.

Oh, and if you use ext3, definitely turn off atime (use the noatime
option at mount time)
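
To confirm that a data directory's filesystem really is mounted with noatime (or a particular data= mode), the mount table can be inspected. A minimal sketch, assuming a Linux system where /proc/mounts lists one mount per line as "device mount_point fstype options dump pass"; the function name, the sample text, and the /var/lib/pgsql path are hypothetical and only for illustration:

```python
def mount_options(mounts_text, mount_point):
    """Return the set of mount options for mount_point, or None if not listed.

    mounts_text is expected in /proc/mounts format:
    device mount_point fstype options dump pass
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            # Keep scanning: the last matching entry is the one in effect.
            found = set(fields[3].split(","))
    return found


# Hypothetical sample; on a real system pass open("/proc/mounts").read().
SAMPLE = """\
/dev/sda1 / ext3 rw,data=ordered 0 0
/dev/sdb1 /var/lib/pgsql ext3 rw,noatime,data=ordered 0 0
"""

opts = mount_options(SAMPLE, "/var/lib/pgsql")
print("noatime" in opts)
```

The lookup matches the mount point exactly, so it has to be given the mount point itself rather than a directory somewhere below it.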

Re: Filesystem options for storing pg_data

From
Marco Colombo
Date:
[I've got a private reply from Scott, which I won't quote here, but which
can be fairly (I hope) summarized as "search the pgsql-performance
list". Well, I've done that, and I feel it's right to bring the issue
back in public. So if it seems I'm replying to myself, I'm not:
I'm replying to Scott. I realized the reply was private only
just before sending this out.]

> > On Wed, 2005-04-20 at 12:07, Marco Colombo wrote:
> > > On Wed, 2005-04-20 at 11:18 -0500, Scott Marlowe wrote:
> > >
> > > Generally XFS and JFS are considered superior to ext2/3.
> >
> > Do you mind posting a reference? I'm really interested in the comparison
> > but every time I asked for a pointer, I got no valid resource, so far.
>
[...]

Well, my point is that the ones I do find lead to the conclusion that EXT3 is
"considered superior" to XFS and JFS. One example for all:

http://www.oracle.com/technology/oramag/webcolumns/2002/techarticles/scalzo_linux02.html

"It's reassuring when various industry-standard benchmarks yield similar
results. In case you're wondering, I obtained similar results with
Benchmark Factory's other half dozen or so database benchmarks-so for
me, it'll be ext3."

Have a look at the graphs, EXT3 is almost twice as fast in these
(database) benchmarks.

Another one is:
http://www.kerneltraffic.org/kernel-traffic/kt20020401_160.html#8

Again ext3 is the winner (among journalled filesystems), but only by a
small margin. And again, there are a lot of variables. For example, using
data=journal with a big journal file on a different disk would
be extremely interesting, just as putting the WAL on a different disk
is at the PostgreSQL level (the result might be the same).

All the other benchmarks I've found, with a simple search for
'filesystem benchmark' on the pgsql-performance list, either are
the usual Bonnie/iozone irrelevant benchmarks, or don't seem to care
to tune ext3 mount options and use the defaults (thus comparing apples
to oranges).

I'm not stating that EXT3 is better. My opinion on the matter is that
you shouldn't care about the filesystem much (EXT3, JFS, XFS being the
same for _most_ purposes with PostgreSQL). That is, it's a small little
spot in the big picture of performance tuning. You'd better look at the
big picture.

I'm only countering your claim:
"Generally XFS and JFS are considered superior to ext2/3".

There's no general agreement on the lists about that, so just handwaving
and saying "look at the lists" isn't enough. Mind posting a pointer
to _any_ serious benchmark, based on PostgreSQL (or at least on some
database), that consistently shows XFS and JFS as superior? One that
cares to show ext3/noatime/data=ordered,data=writeback,data=journal
results, too?

If I were to choose based on the results posted on the list (that I've
managed to find), ext3 would be the winner. Maybe I've missed something.

> > > Having used ext3 quite a bit, I'd say it's fairly stable and reliable,
> > > but I have seen references here to known, possibly unfixable bugs.
> >
> > Again, mind posting a reference?
>
[...]

I've searched for 'EXT3 bug' but got nothing. I'm (loosely) following
l-k, and never heard of "possibly unfixable bugs" in EXT3 by any
developer. Care to post any real reference? There have been bugs of
course, but that holds true for everything, XFS and JFS included.

Having re-read many, many messages right now, I'm under an even stronger
impression that _all_ negative comments on both the stability and the
performance of EXT3 start with "I've heard that...", with almost no one
providing direct experience. Many comments display little understanding
of the subject: some don't know about the data= mount option (there's little
point in comparing to XFS if you don't use data=writeback), some have
misconceptions about what the option does, and about what difference it makes
when the application keeps _syncing_ the files (I don't know that well
either). See the data=journal case.

.TM.
--
      ____/  ____/   /
     /      /       /                   Marco Colombo
    ___/  ___  /   /                  Technical Manager
   /          /   /                      ESI s.r.l.
 _____/ _____/  _/                      Colombo@ESI.it


Re: Filesystem options for storing pg_data

From
Dawid Kuroczko
Date:
On 4/21/05, Marco Colombo <pgsql@esiway.net> wrote:
> > > > Generally XFS and JFS are considered superior to ext2/3.
> > > Do you mind posting a reference? I'm really interested in the comparison
> > > > but every time I asked for a pointer, I got no valid resource, so far.
> Well, my point being the ones I find lead to the conclusion that EXT3 is
> "considered superior" to XFS and JFS. One for all:

First of all, my workload is not IO bound, so don't consider what
I write as solutions for IO heavy setups.

Personally I use ext3 (with ~128 KB per inode ratio, to save some
space and keep inodes more closely together), with noatime option.
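
For a rough idea of the space involved, the arithmetic can be sketched as below. This is only an illustration under stated assumptions: the ratio is taken to be the bytes-per-inode value one would pass to mke2fs -i, the 128-byte on-disk inode size and the 8192 bytes/inode baseline are assumptions rather than figures from this thread, and the 100 GiB volume is hypothetical. Real filesystems round inode counts per block group, so actual numbers differ somewhat.

```python
def inode_table_bytes(fs_bytes, bytes_per_inode, inode_size=128):
    """Approximate space taken by inode tables for a bytes-per-inode ratio.

    inode_size=128 is an assumed on-disk inode size, not a measured value.
    """
    inode_count = fs_bytes // bytes_per_inode
    return inode_count * inode_size


GIB = 1024 ** 3
MIB = 1024 ** 2
fs = 100 * GIB  # hypothetical 100 GiB volume

# Compare an assumed 8192 bytes/inode baseline with the ~128 KB ratio.
for ratio in (8192, 128 * 1024):
    mib = inode_table_bytes(fs, ratio) / MIB
    print(f"{ratio} bytes/inode: about {mib:.0f} MiB of inode tables")
```

Under these assumptions the larger ratio shrinks the inode tables by a factor of 16, at the cost of a much lower limit on the number of files, which is rarely a problem for a PostgreSQL data directory holding comparatively few, large files.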

I tried JFS some time ago and moved away from it soon after. The
reasons were:
 1. JFS dynamic inode allocation left less free space for apps than
    ext3 (I usually decrease inode ratio to some reasonable limit
    (like 4 times current ratio for given directory set)).  (Yeah, not
    a serious issue, yet I admit I tend to consider it).
 2. FSCK.  Back then JFS had an ugly feature of mounting only
    'clean' filesystems, i.e. fsck had to be done in userspace
    (unlike ext3 which does it as a part of mount process).
    I don't know if it is still that way.
 3. Performance.  For my workload, mostly single threaded and
    bursty, ext3 appeared a bit faster.

But that was a good while ago, and JFS might have changed a good bit
since then.  I have no experience with XFS, but I've heard a lot
of good things about it.

> Again ext3 is the winner (among journalled fs), but by a small edge
> only. And again, there are a lot of variables. Using for example
> data=journal with a big journal file on a different disk would
> be extremely interesting, just as using a different disk for WALs
> is at PostgreSQL level (the result might be the same).

Some time ago I thought it could be a nice thought experiment
to 'tune' ext3 for PostgreSQL's needs.  (Mark WAL files for
immediate updates, journal other updates (file size changes,
creations etc.), and keep the journal close to the WAL files... ;)

> I'm not stating that EXT3 is better. My opinion on the matter is that
> you shouldn't care about the filesystem much (EXT3, JFS, XFS being the
> same for _most_ purposes with PostgreSQL). That is, it's a small little
> spot in the big picture of performance tuning. You'd better look at the
> big picture.
>
> I'm only countering your claim:
> "Generally XFS and JFS are considered superior to ext2/3".

You can certainly say that XFS/JFS are more complex and were
engineered to better handle high work load.  Ext3 is relatively
simple; and its simplicity may also be a big advantage when
handling high load.

Summary: I'm not arguing that JFS/XFS are worse or the same.  All I want to say
is that ext3 is a decent filesystem.  Ext3's greatest advantage, I guess,
is the ease of deployment -- it comes "out of the box" with most
distributions.  With a little tuning it can perform reasonably well for
most needs.

   Regards,
      Dawid

Re: Filesystem options for storing pg_data

From
Scott Marlowe
Date:

Re: Filesystem options for storing pg_data

From
Marco Colombo
Date:
On Thu, 21 Apr 2005, Scott Marlowe wrote:

> References:
>
> http://archives.postgresql.org/pgsql-performance/2005-01/msg00131.php
> http://archives.postgresql.org/pgsql-performance/2004-05/msg00130.php
> http://archives.postgresql.org/pgsql-performance/2003-08/msg00191.php
> http://groups-beta.google.com/group/comp.os.linux.misc/msg/b299a71fd540c2b8?q=ext2+corrupt+%22power+failure%22&hl=en&lr=&ie=UTF-8&rnum=9
> http://oss.sgi.com/projects/xfs/papers/filesystem-perf-tm.pdf
> http://www.oracle.com/technology/oramag/webcolumns/2002/techarticles/scalzo_linux02.html
> http://jamesthornton.com/hotlist/linux-filesystems/
>
> It took me all of about 10 minutes to find all of those.  But I've got
> work to do, so I'll leave further research here to the rest of the list.

Thanks for your precious time, but when I say I searched the archives
I really mean it. If you cared to read _my_ message, I was looking for
any benchmark (or comment) under the following conditions:

1) PostgreSQL load - that is, a benchmark based on PostgreSQL, or,
    alternatively, on another database, or on artificial write+fsync load.
    Any other (cached) write load is _meaningless_ for our purposes.
2) the author was aware of mount options, and actually used them.
    I think there's enough evidence that ext3 default mount options
    are on the safe side (_safer_ than other fses, it seems), so there's
    no point in comparing default ext3 alone (comparing all modes
    _is_ interesting, though).
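
As an illustration of what "artificial write+fsync load" in 1) could look like, here is a minimal sketch. It is not a benchmark used by anyone in this thread; the function name and defaults are hypothetical, and the 8 kB block size is chosen only because it matches PostgreSQL's default page size. Note that fsync results also depend on whether the drive's own write cache honors the flush, which this sketch cannot detect.

```python
import os
import tempfile
import time


def fsync_write_rate(directory, writes=200, block_size=8192):
    """Append `writes` blocks to a scratch file, calling fsync after each one.

    Returns the number of synced writes per second.
    """
    block = b"\0" * block_size
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(writes):
            os.write(fd, block)
            os.fsync(fd)  # ask the OS to push the data to stable storage
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)  # remove the scratch file again
    return writes / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Point this at a directory on the filesystem under test.
    with tempfile.TemporaryDirectory() as scratch:
        print(f"{fsync_write_rate(scratch):.0f} synced writes/s")
```

Running it against directories on filesystems mounted with data=ordered, data=writeback and data=journal would give the kind of mode-by-mode comparison that 2) asks for, though a real PostgreSQL workload remains the better test.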

I've spent much more than 10 minutes of my time, and found nothing
but the links that _I_ posted.

I'll invest more time, and comment on the links you posted
(which I had read already, of course):

http://archives.postgresql.org/pgsql-performance/2005-01/msg00131.php
it's not clear at all; it possibly fails both 1) and 2). The author
says nothing about a write+fsync benchmark or about ext3 mount options.

http://archives.postgresql.org/pgsql-performance/2004-05/msg00130.php
that's the one I got Bert Scalzo's article from. The other links there
fail to meet 1), and some fail 2) as well. Note that fsync is likely to
disrupt most optimizations. The fact that a filesystem "scales better"
under normal (cached) load means nothing when it comes to fsyncing.

http://archives.postgresql.org/pgsql-performance/2003-08/msg00191.php
this _defends_ ext2 from the accusation of being buggy. The author
prefers XFS, "but I only have fuzzy reasons, as opposed to metrics."
I was looking for metrics. It says nothing about ext3, so it does not
apply.

These are not from postgresql lists, but anyway:


http://groups-beta.google.com/group/comp.os.linux.misc/msg/b299a71fd540c2b8?q=ext2+corrupt+%22power+failure%22&hl=en&lr=&ie=UTF-8&rnum=9
"People are referring to the old ext2 filesystem here. The new ext3 is very
resistant to this issue."
If you're referring to what "Jinny" said, well all the evidence
is "...recently I have come to know from a reliable group that Linux
is not so stable". Does not meet 1) and 2), sorry.

http://oss.sgi.com/projects/xfs/papers/filesystem-perf-tm.pdf
Yes, surprisingly enough I've read this one, too. The only interesting
part is "[XFS] Performance features include asynchronous write ahead
logging (similar to Ext2 " - no, ext3 - " with data=writeback), ...". This
confirms my comment about comparing apples and oranges, and completely
justifies my requirement 2) - and it comes from an XFS paper!
It's not clear at all if what they call OLTP Workload really performs
fsync after write. Anyway, there's only _one_ graph in the results
(how weird) and all filesystems are pretty close. No tests with
data=journal. All other graphs in the Appendix fail requirement 1).

http://www.oracle.com/technology/oramag/webcolumns/2002/techarticles/scalzo_linux02.html
thanks, this is the link that _I_ posted. Have _you_ read it?
It shows that EXT3 is almost twice as fast as JFS. Too bad there's no
XFS here.
BTW, this meets 1), I'm not sure about 2), but the options they used
seem enough to outperform JFS.

http://jamesthornton.com/hotlist/linux-filesystems/
this is just a collection of links. It's not clear which one would
back up your claim of XFS and JFS being _generally_ considered superior
for PostgreSQL or other database usage.

Let's see:
http://www-106.ibm.com/developerworks/linux/library/l-fs8.html
"data=ordered mode effectively solves the corruption problem found in
  data=writeback mode and _most other journaled filesystems_, and it does
  so without requiring full data journaling"

(emphasis mine) interestingly enough, most journaled filesystems do have
a corruption problem; ext3 in default mode doesn't.
But this does not really apply to us: it refers to normal writes, not
write+fsyncs. I think any fs has to be badly broken if it loses data
after fsync, anyway.

http://www-106.ibm.com/developerworks/library/l-fs9.html
  "Other than that, XFS performance was very close to that of ReiserFS and
   generally surpasses that of ext3... "

uh, this sounds interesting... but wait...

  "... One of the nicest things about XFS is that, like ReiserFS, it doesn't
   generate a lot of unnecessary disk activity. XFS tries to cache as much
   data in memory as possible, and generally only writes things out to disk
   when memory pressure dictates that it do so."

so, if a benchmark shows XFS is faster, it's a matter of better caching,
right? But it comes at a price of possible (data) corruption...
Thankfully, it's pretty useless to us, with every write followed by a sync.


I'm sorry, but with the links _you_ selected, applying my filter
conditions 1) and 2), which are necessary for a fair comparison,
one could say there's general consensus on EXT3 being far superior
to other filesystems, not the opposite.

Note that I'm not interested in supporting such a claim. As I already
wrote I think FS selection has generally a minimal impact on PostgreSQL
performance.

But again, what was your original claim
  "Generally XFS and JFS are considered superior to ext2/3."
based upon?

I apologize if I sound somewhat harsh; it's not really intended.
But next time please assume that:
- I'm able to do a 10 minute search;
- I've got some work to do, too, but I'm willing to spend more than
   10 minutes on this research (it already took me more than 2 hours
   actually, of my spare time);
- if I say I've searched the lists and read many messages, I've
   really done so.

You're absolutely entitled to have your opinion, if you like XFS and
JFS go ahead and use them, because of their name, the names of their
sponsors (IBM and SGI), or their features, or your personal experience,
or whatever. Just please don't claim that's general consensus for the
pgsql lists. There's _no_ general consensus. There's _no_ clear winner.
And if you do want a winner anyway, it's ext3, so far.

This "ext3 is not as good as XFS or JFS" is a recurring subject, just
like "ext3 is buggy". _Every single time_ I've asked for
references to back up such claims, nothing valuable was presented.
On the contrary, the only references I've found are on the
"ext3 is equal or better" side.

Now, feel free to prove me wrong.

.TM.
--
       ____/  ____/   /
      /      /       /            Marco Colombo
     ___/  ___  /   /              Technical Manager
    /          /   /             ESI s.r.l.
  _____/ _____/  _/               Colombo@ESI.it

Re: Filesystem options for storing pg_data

From
Scott Marlowe
Date:
Whoa, hold on.  My original post was this:

QUOTE:

Generally XFS and JFS are considered superior to ext2/3.

ext3, in my experience, isn't much slower than ext2.  Plus the decreased
time required to bring up a server after a power outage is worth
something too.

Having used ext3 quite a bit, I'd say it's fairly stable and reliable,
but I have seen references here to known, possibly unfixable bugs.

I've used XFS a few years back, and there was no great gain for what we
were doing at the time, as we were CPU, not I/O bound.

ENDQUOTE:

So where do you get off saying I'm such a big fan of XFS and am trashing ext3?

You do the research, I'm tired of trying to have a civilized conversation with you.

If you wanna argue, go pay someone a quarter to do it, I'm done.