Thread: optimization by removing the file system layer?


optimization by removing the file system layer?

From: Mark Stier
Date:
Hi,

I just wonder if it would be possible to speed up PostgreSQL a bit if it
used partitions rather than files for data storage.

That could also lead to the next generation of highly complex file systems that
integrate more seamlessly into graphical UIs.

Regards,

Mark

p.s.: please don't hang me high for this one :-)

Re: optimization by removing the file system layer?

From: Jurgen Defurne
Date:
Mark Stier wrote:

> Hi,
>
> I just wonder if it would be possible to speed up PostgreSQL a bit if it
> used partitions rather than files for data storage.
>
> That could also lead to the next generation of highly complex file systems that
> integrate more seamlessly into graphical UIs.
>
> Regards,
>
> Mark
>
> p.s.: please don't hang me high for this one :-)

I think that the Un*x filesystem is one of the reasons that large database vendors
prefer raw devices to filesystem storage files.

Using a raw device on the disk gives them complete control over their files,
indices and objects without being bothered by the operating system. This speeds
things up in several ways:
- the least possible OS intervention
- block sizes chosen to suit the application
- reduced fragmentation
- data packed into nearby cylinders
- any other ideas -> the sky is the limit here

It also aids portability, at least on platforms that have an equivalent of a raw
device.

It is also independent of the standard Un*x filesystem implementations, for which
you will have to pay extra if you want extra measures against power loss.

The problem with e.g. ext2fs is that it is not robust enough if a CPU fails. This
is due to the memory caching of i-nodes. To limit the damage, the update program
flushes them every (what?) seconds, but that is no true panacea. Of course, there
are other filesystems for Linux, but I won't go out hunting for them and
installing them. I have other things to do.

With the above scheme, the database vendor can design the internal file system so
that there is no data loss in case of a power failure.

Jurgen Defurne
defurnj@glo.be



Re: optimization by removing the file system layer?

From: Giles Lean
Date:

> I think that the Un*x filesystem is one of the reasons that large
> database vendors prefer raw devices to filesystem storage files.

This used to be the preference, back in the late 80s and possibly
early 90s.  I'm seeing a preference toward using the filesystem now,
possibly with some sort of async I/O and co-operation from the OS
about how the filesystem cache is used.

Performance preferences don't stand still.  The hardware changes, the
software changes, the volume of data changes, and different solutions
become preferable.

> Using a raw device on the disk gives them the possibility to have
> complete control over their files, indices and objects without being
> bothered by the operating system.
>
> This speeds up things in several ways :
> - the least possible OS intervention

Not that this is especially useful, necessarily.  If the "raw" device
is in fact managed by a logical volume manager doing mirroring onto
some sort of storage array there is still plenty of OS code involved.

The cost of using a filesystem in addition may not be much, if anything,
and of course a filesystem is considerably more flexible to
administer (backup, move, change size, check integrity, etc.).

> - choose block sizes according to applications
> - reducing fragmentation
> - packing data in nearby cylinders

... but when this storage area is spread over multiple mechanisms in a
smart storage array with write caching, you've no idea what is where
anyway.  Better to let the hardware or at least the OS manage this;
there are so many levels of caching between a database and the
magnetic media that working hard to influence layout is almost
certainly a waste of time.

Kirk McKusick tells a lovely story that once upon a time it used to be
sensible to check some registers on a particular disk controller to
find out where the heads were when scheduling I/O.  Needless to say,
that is history now!

There's a considerable cost in complexity and code in using "raw"
storage too, and it's not a one-off cost: as the technologies change,
the "fast" way to do things will change and the code will have to be
updated to match.  Better to leave this to the OS vendor where
possible, and take advantage of the tuning they do.

> - Anyone other ideas -> the sky is the limit here

> It also aids portability, at least on platforms that have an
> equivalent of a raw device.

I don't understand that claim.  Not much is portable about raw
devices, and they're typically not nearly as well documented as the
filesystem interfaces.

> It is also independent of the standard implemented Un*x filesystems,
> for which you will have to pay extra if you want to take extra
> measures against power loss.

Rather, the opposite.  With a Unix filesystem you get quite
well-defined semantics about what is written when.

> The problem with e.g. e2fs, is that it is not robust enough if a CPU
> fails.

ext2fs doesn't even claim to have Unix filesystem semantics.

Regards,

Giles