Discussion: TOAST, large objects and ACIDity
Hi all,

I want to use Postgres to store data and large files, typically audio files from 100 kB to 20 MB. For those files, I just need to store and retrieve them, in an ACID way. (I don't need search, substring, or other functionality.) I saw that Postgres offers at least two methods: a bytea column with TOAST, or the large objects API. I wonder what the differences between the two methods are.

* I found that large objects need a vacuum after delete to really release space. Can that be handled by a trigger or automatic vacuum?
* Large objects are used via a special API available in the libpq C client library.
* I really care about keeping my transactions fully ACID. The documentation on large objects doesn't explicitly say whether lo_import and lo_export (and the other primitives) are fully ACID. Any ideas?

I'm going to benchmark insertion and reads with these two methods. Other advice is welcome, particularly about integration of TOAST or large objects with pg clusters.

Thanks
Benoit
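[Editor's note: the trigger-based cleanup asked about above can be sketched roughly as follows. The table and column names are hypothetical; the idea is that lo_unlink removes the large object when the row that references it is deleted, so no orphans are left for a later vacuum pass.]

```sql
-- hypothetical schema: a table that references large objects by OID
create table audio_files (
    filename text primary key,
    file_oid oid not null
);

-- unlink the large object when its referencing row is deleted
create function audio_files_unlink_lo() returns trigger as $$
begin
    perform lo_unlink(old.file_oid);
    return old;
end;
$$ language plpgsql;

create trigger audio_files_lo_cleanup
    after delete on audio_files
    for each row execute procedure audio_files_unlink_lo();
```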
On 7/10/07, Benoit Mathieu <benoit.mathieu@mist-technologies.com> wrote:
> I saw postgres offers at least 2 method : bytea column with TOAST, or
> large objects API.

From the documentation:

> All large objects are placed in a single system table called pg_largeobject.
> PostgreSQL also supports a storage system called "TOAST" that automatically
> stores values larger than a single database page into a secondary storage area
> per table. This makes the large object facility partially obsolete. One remaining
> advantage of the large object facility is that it allows values up to 2 GB in size,
> whereas TOASTed fields can be at most 1 GB. Also, large objects can be
> randomly modified using a read/write API that is more efficient than performing
> such operations using TOAST.

My take: stick with TOAST unless you need fast random access. TOAST is faster, more consistently supported (e.g., in Slony), and easier to work with. If you need fast random access, use the file system.

Will you ever be updating parts of the contents of a single file, or will you be writing each file just once? If the latter, writing your own robust, ACID-compliant file storage is fairly trivial.

> * I really care keeping my transaction fully ACID. Documentation on
> large objects doesn't explicitly say if lo_import an lo_export (and
> other primitives) are fully ACID. Some ideas ?

Since the lo_* functions are implemented as relational operations on the internal large object table, they're ACID-compliant.

Alexander.
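[Editor's note: one way to see the ACID point in practice, assuming a hypothetical audio_files(filename, file_oid) table: the server-side lo_import function and the row insert live in the same transaction, so they commit or roll back together. Note that server-side lo_import reads a path on the database server, not on the client.]

```sql
begin;

-- lo_import creates the large object and returns its OID;
-- the row that records the OID is inserted in the same transaction
insert into audio_files (filename, file_oid)
values ('track01.ogg', lo_import('/tmp/track01.ogg'));

-- if anything fails before this point, both the row and the
-- large object creation are rolled back together
commit;
```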
On Tue, 10 Jul 2007, Alexander Staubo wrote:
> My take: Stick with TOAST unless you need fast random access. TOAST
> is faster, more consistently supported (eg., in Slony) and easier
> to work with.

Toasted bytea columns have some other disadvantages as well:

1. It is impossible to create the value in chunks, which means you'll need to issue one very big "insert" with the whole file contents. If your client library does not support binary arguments to prepared statements, you'll need to escape this data, which makes it several times bigger (I think it could be 4 times bigger). For parsing and saving it, the server needs several copies of the data (I think at least 10 times more memory than the binary data size). If I'm not mistaken, that means saving a 20 MB file would take at least 200 MB of server memory, which is a huge amount. The client would also need over 100 MB.

I've worked around this with a temporary table:

  create temporary table chunks (
    chunk_nr   int primary key,
    chunk_data bytea not null
  );

and an aggregate:

  create aggregate bytea_accum (
    sfunc    = byteacat,
    basetype = bytea,
    stype    = bytea,
    initcond = ''
  );

I put 1 MB chunks into this "chunks" table and then do:

  insert into attachments (filename, filedata)
    select 'filename.txt', bytea_accum(chunk_data)
    from (select chunk_data from chunks order by chunk_nr) as c;

I've proposed that it would be easier if there were a memory-efficient function:

  bytea_from_lo(oid lo_id) returns bytea

but there was no feedback.

2. There's the same problem when reading this bytea data. I'm also doing that in chunks; I've set external storage on this table:

  alter table attachments alter column filedata set storage external;

With this set, a substring(filedata from [offset] for [chunk_size]) call is efficient.
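[Editor's note: for illustration, chunked reading against the attachments table above might look like this. Offsets in substring are 1-based, so the second 1 MB chunk starts at byte 1048577.]

```sql
-- fetch the second 1 MB chunk of a stored file;
-- with "storage external" set, this avoids decompressing
-- and fetching the whole value
select substring(filedata from 1048577 for 1048576)
from attachments
where filename = 'filename.txt';
```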
================

In the case of large objects, you won't be able to enforce constraints in the database. For example:
- you will be able to delete an lo which is referenced elsewhere;
- you won't be able to limit lo size;
- I think you won't be able to limit access to an lo;
- you will be able to delete a reference to an lo without deleting the object itself (you can use contrib/vacuumlo for garbage collection, though).

Regards
Tometzky
--
...although Eating Honey was a very good thing to do, there was a moment just before you began to eat it which was better than when you were...
                                                    Winnie the Pooh