Обсуждение: size of bytea + performance issues
Hi everyone, my first question concerns the the size of a table with a bytea row. In the documentation it says something like 4 Bytes + 1 Byte for each escaped octet sequence per row. For example an insertion into a table storing just one column with bytea data looks like this: insert into test values (''\\003''); ---- this allows 185 inserts per page until a new one is needed insert into test values (''\\003\\123\\123\\111''); ---- this yields exactly the same nr. of rows per page! insert into test values (''\\003\\123\\123\\111\\001''); ---- this one finally needs more pages! How can that be, if the system internally allocates ONE BYTE per octet... Does it acually take 4 Bytes? My second question is more generall: My dbms (acutally it is supposed to be a decision support system, so I do not really need rollbacks and transactions etc. --- can those features be turned off to enhance performance?) needs to store 1 Billion rows in a single table (I know that I could use horizontal partitioning) and I wonder if postgres is powerfull enough to handle such large tables? If anyone has experience with tables this size and could give me a hint which system to use (DB2, Sybase, Oracle, Informix, Postgres) this would be great... THX tschak
On Tue, Jan 31, 2006 at 07:58:30AM -0800, tschak wrote: > Hi everyone, > > my first question concerns the the size of a table with a bytea row. In > the documentation it says something like 4 Bytes + 1 Byte for each > escaped octet sequence per row. For example an insertion into a table > storing just one column with bytea data looks like this: > insert into test values (''\\003''); ---- this allows 185 inserts per > page until a new one is needed > insert into test values (''\\003\\123\\123\\111''); ---- this yields > exactly the same nr. of rows per page! > insert into test values (''\\003\\123\\123\\111\\001''); ---- this one > finally needs more pages! > > How can that be, if the system internally allocates ONE BYTE per > octet... > Does it acually take 4 Bytes? Alignment. A new row needs to start on a multiple of 4 (or 8) boundary. So even though your data value might take 5 bytes, you may end up with some slack before the next tuple. > My second question is more generall: > My dbms (acutally it is supposed to be a decision support system, so I > do not really need rollbacks and transactions etc. --- can those > features be turned off to enhance performance?) needs to store 1 Well, you might need to take this up on the -performance list, but there are people running databases that large. In general such features can't be turned off, though they don't really cost much on data that doesn't change. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Вложения
Allright... do you mean an alignment with an offset of 4/8 bit or byte? If it is just bit I cannot really follow the calculation... Nevertheless it sounds like an explanation for this "effect". I have one more question concerning size. The following table create table test(val0 unsigned smallint, ... ... ... .. ..., val7 unsigned smallint, bulk bytea); needs app. 100 Bytes per row even though the user data only uses 8*2Bytes (vali) + 40Bytes (the sizye of my bytea). Does that mean, that the system needs 46 bytes for internal represantationsor is the alignment factor a again a source of this "blowup"? Thanks for your help, Tschak
On Tue, Jan 31, 2006 at 09:15:18AM -0800, tschak wrote: > Allright... do you mean an alignment with an offset of 4/8 bit or byte? > If it is just bit I cannot really follow the calculation... > Nevertheless it sounds like an explanation for this "effect". > > I have one more question concerning size. The following table > create table test(val0 unsigned smallint, > ... ... ... .. ..., val7 unsigned smallint, > bulk bytea); > > needs app. 100 Bytes per row even though the user data only uses > 8*2Bytes (vali) + 40Bytes (the sizye of my bytea). Does that mean, that > the system needs 46 bytes for internal represantationsor is the > alignment factor a again a source of this "blowup"? Well, it's in the FAQ under "How much database disk space is required to store data from a typical text file?" but the per tuple overhead is between 36 and 44 bytes. depends a bit on the version. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Вложения
> Well, it's in the FAQ under "How much database disk space is required > to store data from a typical text file?" but the per tuple overhead is > between 36 and 44 bytes. depends a bit on the version. allright... thx a lot! I did not see this point in the FAQ because I am not storing data from a textfile and did not look in that direction... What store is basically a encoding scheme of several hash bins which store distances. Thus it is just a bitvector and to the best of my knowledge a bytea or varyiing bitarray is the most space efficient method to store my information(???). Thanks again, Tschak
On Tue, Jan 31, 2006 at 08:15:44PM +0100, Jochen Schlosser wrote: > > Well, it's in the FAQ under "How much database disk space is required > > to store data from a typical text file?" but the per tuple overhead is > > between 36 and 44 bytes. depends a bit on the version. > > allright... thx a lot! > I did not see this point in the FAQ because I am not storing data from > a textfile and did not look in that direction... What store is > basically a encoding scheme of several hash bins which store > distances. Thus it is just a bitvector and to the best of my knowledge > a bytea or varyiing bitarray is the most space efficient method to > store my information(???). It is if it's truely variable in length. If you know it's limited to say 4 bytes, you'd probably be better off with an int4, which doesn't have the varlena overhead (4 bytes) that a bytea does. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461