Thread: TOAST, large objects and ACIDity

TOAST, large objects and ACIDity

From: Benoit Mathieu
Date:
Hi all,

I want to use Postgres to store data and large files, typically audio
files from 100 kB to 20 MB. For those files I just need to store and
retrieve them, in an ACID way. (I don't need search, substring, or
other functionality.)
I saw that Postgres offers at least two methods: a bytea column with
TOAST, or the large objects API.

I wonder what the differences between the two methods are.
* I found that large objects need a vacuum after delete to really
release space. Can that be handled by a trigger or by autovacuum?
* Large objects are used via a special API available in the libpq C
client library.
* I really care about keeping my transactions fully ACID. The
documentation on large objects doesn't explicitly say whether lo_import
and lo_export (and the other primitives) are fully ACID. Any ideas?
* I'm going to benchmark insertion and reads with these two methods.

Other advice is welcome, particularly about the integration of TOAST
or large objects with pgclusters.

Thanks,

Benoit

Re: TOAST, large objects and ACIDity

From: "Alexander Staubo"
Date:
On 7/10/07, Benoit Mathieu <benoit.mathieu@mist-technologies.com> wrote:
> I saw that Postgres offers at least two methods: a bytea column with
> TOAST, or the large objects API.

From the documentation:

> All large objects are placed in a single system table called pg_largeobject.
> PostgreSQL also supports a storage system called "TOAST" that automatically
> stores values larger than a single database page into a secondary storage area
> per table. This makes the large object facility partially obsolete. One remaining
> advantage of the large object facility is that it allows values up to 2 GB in size,
> whereas TOASTed fields can be at most 1 GB. Also, large objects can be
> randomly modified using a read/write API that is more efficient than performing
> such operations using TOAST.

My take: Stick with TOAST unless you need fast random access. TOAST is
faster, more consistently supported (e.g., in Slony), and easier to
work with.

If you need fast random access, use the file system. Will you ever be
updating parts of the contents of a single file, or will you be
writing each file just once? If the latter, writing your own robust,
ACID-compliant file storage is fairly trivial.

> * I really care about keeping my transactions fully ACID. The
> documentation on large objects doesn't explicitly say whether lo_import
> and lo_export (and the other primitives) are fully ACID. Any ideas?

Since the lo_* functions are implemented as relational operations on
the internal large object table, they're ACID-compliant.
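
A quick way to convince yourself is to drive the server-side lo_*
functions inside a transaction and roll it back. A minimal sketch (the
OID and descriptor values are illustrative; 131072 is the INV_WRITE
flag):

    begin;
    select lo_create(0);            -- allocates a new large object, say OID 16401
    select lo_open(16401, 131072);  -- open for writing; returns descriptor 0
    select lowrite(0, 'some bytes'::bytea);
    rollback;
    -- after the rollback the object is gone: lo_open(16401, 131072)
    -- fails, just like any other rolled-back relational write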

Alexander.

Re: TOAST, large objects and ACIDity

From: Tomasz Ostrowski
Date:
On Tue, 10 Jul 2007, Alexander Staubo wrote:

> My take: Stick with TOAST unless you need fast random access. TOAST
> is faster, more consistently supported (e.g., in Slony), and easier
> to work with.

Toasted bytea columns also have some other disadvantages:

1.

It is impossible to create the value in chunks, which means you'll
need to issue one very big INSERT containing the entire file contents.
If your client library does not support binary arguments to prepared
statements, you'll need to escape this data, which makes it several
times bigger (I think it can be about 4 times bigger). For parsing and
saving this, the server would need several copies of the data (I think
the server would need at least 10 times more memory than the binary
data size).

If I'm not mistaken, this means that to save a 20 MB data file the
server would need at least 200 MB of memory, which is a huge amount.
The client would also need over 100 MB.
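
As a rough illustration of that blow-up (a hypothetical one-byte
example; the exact factor depends on the byte values and the escaping
style in use):

    -- a single zero byte needs five characters of payload in an
    -- escaped string literal, yet decodes back to one byte
    select octet_length(E'\\000'::bytea);  -- returns 1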


I've worked around this with a temporary table:
    create temporary table chunks (
        chunk_nr int primary key,
        chunk_data bytea not null
    );
and an aggregate:
    create aggregate bytea_accum(
        sfunc = byteacat,   -- built-in bytea concatenation function
        basetype = bytea,
        stype = bytea,
        initcond = ''
    );
I put 1 MB chunks into this "chunks" table and then do:
    insert into attachments
        (filename, filedata)
    select
        'filename.txt', bytea_accum(chunk_data)
    from (
        select chunk_data
        from chunks
        order by chunk_nr
    ) as ordered_chunks;
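
The chunks themselves can be loaded with an ordinary parameterized
statement, one slice per execution, so no single statement has to
carry the whole file (a sketch; $1 and $2 are bound by the client,
ideally as binary parameters):

    -- executed once per ~1 MB slice of the file
    insert into chunks (chunk_nr, chunk_data) values ($1, $2);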

I've proposed that it would be easier if there were a memory-efficient
function:
    bytea_from_lo(lo_id oid) returns bytea
but there was no feedback.


2.

There's the same problem when reading this bytea data back. I'm also
doing this in chunks - I've set external storage on this table:
    alter table attachments alter column filedata set storage external;
External storage keeps the column uncompressed, so with it set the
function
    substring(filedata from [offset] for [chunk_size])
is efficient: it can fetch just the requested slice instead of
detoasting the whole value.
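
Reading then becomes a loop over fixed-size slices; a sketch with a
hypothetical 1 MB chunk size ($1 is the 0-based slice number, $2 the
filename; the client stops when a shorter-than-requested slice comes
back):

    -- fetch one 1 MB slice of the stored file
    select substring(filedata from ($1 * 1048576) + 1 for 1048576)
    from attachments
    where filename = $2;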


================

In the case of large objects you won't be able to enforce constraints
in the database, for example:
- you will be able to delete a lo which is still referenced elsewhere;
- you won't be able to limit lo size;
- I think you won't be able to limit access to a lo;
- you will be able to delete a reference to a lo without deleting
  the object itself (you can use contrib/vacuumlo for garbage
  collection, though, or a cleanup trigger like the sketch below).
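
A minimal sketch of such a cleanup trigger, assuming a hypothetical
attachments table that stores the object's OID in a column named
filedata_oid (contrib/lo ships a ready-made lo_manage trigger for the
same job):

    create function delete_lo() returns trigger as $$
    begin
        -- lo_unlink is transactional, so the object reappears if the
        -- surrounding transaction rolls back
        perform lo_unlink(old.filedata_oid);
        return old;
    end;
    $$ language plpgsql;

    create trigger attachments_lo_cleanup
        after delete on attachments
        for each row execute procedure delete_lo();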

Regards
Tometzky
--
...although Eating Honey was a very good thing to do, there was a
moment just before you began to eat it which was better than when you
were...
                                                      Winnie the Pooh