Discussion: Beginner Question: Why is Postgres always better than plain CSV file storage for disaster recovery?

I am a student who is interested in database kernels. While reviewing my database course, a question confused me.

On a plain file system, if an error occurs while I am inserting data into a data store, the existing data can be corrupted beyond recovery.

But when I checked the Postgres code, I found that Postgres also uses the write function (which is a UNIX file system API).

My question is:

Since it is all built on top of the file system, why is Postgres always better than plain CSV file storage for disaster recovery?

Thanks in advance!
On 7/3/22 20:06, Wen Yi wrote:
> I am a student who is interested in database kernels. While reviewing
> my database course, a question confused me.
> 
> On a plain file system, if an error occurs while I am inserting data
> into a data store, the existing data can be corrupted beyond recovery.
> 
> But when I checked the Postgres code, I found that Postgres also uses
> the write function (which is a UNIX file system API).
> 
> My question is:
> 
> Since it is all built on top of the file system, why is Postgres always
> better than plain CSV file storage for disaster recovery?

https://www.postgresql.org/docs/current/wal.html

> 
> Thanks in advance!


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Wen Yi <chuxuec@outlook.com> writes:
> Since it is all built on top of the file system, why is Postgres always
> better than plain CSV file storage for disaster recovery?

Sure, Postgres cannot be any more reliable than the filesystem it's
sitting on top of (nor the physical storage underneath that, etc etc).

However, if you're comparing to some program that just writes a
flat file in CSV format or the like, that program is probably
not even *trying* to offer reliable storage.  Some things that
are likely missing:

* POSIX-compatible file systems promise nothing about the durability
of data that hasn't been successfully fsync'd.  You need to issue
fsync's, and you need a plan about what to do if you crash between
writing some data and getting an fsync confirmation, because maybe
those bits are safely down on disk, or maybe they aren't, or maybe
just some of them are.
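To make that concrete, here is a minimal sketch of my own (not Postgres code; the path and helper name are invented) showing what a durability-aware append has to do: check write()'s return value, because POSIX allows partial writes, and call fsync() before claiming success, because until fsync returns the kernel has promised nothing about the data reaching disk.

```python
# Sketch of a durable append on a POSIX filesystem (illustrative only).
import os

def durable_append(path, data: bytes) -> None:
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        n = os.write(fd, data)          # may write fewer bytes than asked
        if n != len(data):
            raise IOError("partial write; on-disk state is now unknown")
        os.fsync(fd)                    # without this, a crash may lose the data
    finally:
        os.close(fd)

durable_append("/tmp/demo.csv", b"some,row\n")
```

A program that just writes a flat CSV file typically skips both checks, so after a crash it cannot say which rows made it to disk.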

* If you did crash partway through an update, you'd like some
assurances that the user-visible state after recovery will be
what it was before starting the failed update.  That CSV-using
program probably isn't even trying to do that.  Getting back
to a consistent state after a crash typically involves some
scheme along the lines of replaying a write-ahead log.
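The write-ahead idea can be sketched in a few lines. This is an invented toy format, nothing like Postgres's actual WAL: the point is only the ordering, i.e. the intended change is logged and fsync'd *before* the data file is touched, so recovery can replay the log after a crash.

```python
# Toy write-ahead log (illustrative; not Postgres's WAL format).
import json
import os

def wal_write(log_path, data_path, record: dict) -> None:
    # 1. Append the intended change to the log and force it to disk.
    with open(log_path, "a") as log:
        log.write(json.dumps(record) + "\n")
        log.flush()
        os.fsync(log.fileno())
    # 2. Only now apply it to the data file; a crash between these two
    #    steps is safe, because recovery will redo the change from the log.
    with open(data_path, "a") as data:
        data.write(record["row"] + "\n")

def recover(log_path, data_path) -> None:
    # Rebuild the data file by replaying the log from the start.
    # (A real system makes replay idempotent; Postgres uses page LSNs
    # to skip records that are already applied.)
    rows = [json.loads(line)["row"] for line in open(log_path)]
    with open(data_path, "w") as data:
        for row in rows:
            data.write(row + "\n")
```

After a crash, the user-visible state is whatever the replayed log produces, which is consistent, rather than a half-applied update.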

* None of this is worth anything if you can't even tell the
difference between good data and bad data.  CSV is pretty low
on redundancy --- not as bad as some formats, sure, but it's far
from checkable.
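As a sketch of what "checkable" means, one can tag each record with a checksum so corruption is at least detectable. The record format here is invented for illustration; Postgres itself uses per-page checksums rather than per-row ones.

```python
# Sketch: CRC-32-tagged records, so corruption is detectable
# (a plain CSV file offers no such redundancy). Format is invented.
import zlib

def encode(row: str) -> str:
    return f"{zlib.crc32(row.encode()):08x},{row}"

def decode(line: str) -> str:
    crc_hex, row = line.split(",", 1)
    if int(crc_hex, 16) != zlib.crc32(row.encode()):
        raise ValueError("checksum mismatch: record is corrupt")
    return row

assert decode(encode("alice,42")) == "alice,42"
```

With plain CSV, a flipped byte just silently becomes different data; with a checksum, the reader can at least refuse to trust it.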

There's more to it than that, but if there's not any attention
to crash recovery then it's not what I'd call a database.  The
filesystem alone won't promise much here.

            regards, tom lane