Обсуждение: size of bytea + performance issues

Поиск
Список
Период
Сортировка

size of bytea + performance issues

От
"tschak"
Дата:
Hi everyone,

my first question concerns the the size of a table with a bytea row. In
the documentation it says something like 4 Bytes + 1 Byte for each
escaped octet sequence per row. For example an insertion into a table
storing just one column with bytea data looks like this:
insert into test values (''\\003'');  ---- this allows 185 inserts per
page until a new one is needed
insert into test values (''\\003\\123\\123\\111''); ---- this yields
exactly the same nr. of rows per page!
insert into test values (''\\003\\123\\123\\111\\001''); ---- this one
finally needs more pages!

How can that be, if the system internally allocates ONE BYTE per
octet...
Does it acually take 4 Bytes?

My second question is more generall:
My dbms (acutally it is supposed to be a decision support system, so I
do not really need rollbacks and transactions etc. --- can those
features be turned off to enhance performance?) needs to store 1
Billion rows in a single table (I know that I could use horizontal
partitioning) and I wonder if postgres is powerfull enough to handle
such large tables? If anyone has experience with tables this size and
could give me a hint which system to use (DB2, Sybase, Oracle,
Informix, Postgres) this would be great...


THX

tschak


Re: size of bytea + performance issues

От
Martijn van Oosterhout
Дата:
On Tue, Jan 31, 2006 at 07:58:30AM -0800, tschak wrote:
> Hi everyone,
>
> my first question concerns the the size of a table with a bytea row. In
> the documentation it says something like 4 Bytes + 1 Byte for each
> escaped octet sequence per row. For example an insertion into a table
> storing just one column with bytea data looks like this:
> insert into test values (''\\003'');  ---- this allows 185 inserts per
> page until a new one is needed
> insert into test values (''\\003\\123\\123\\111''); ---- this yields
> exactly the same nr. of rows per page!
> insert into test values (''\\003\\123\\123\\111\\001''); ---- this one
> finally needs more pages!
>
> How can that be, if the system internally allocates ONE BYTE per
> octet...
> Does it acually take 4 Bytes?

Alignment. A new row needs to start on a multiple of 4 (or 8) boundary.
So even though your data value might take 5 bytes, you may end up with
some slack before the next tuple.

> My second question is more generall:
> My dbms (acutally it is supposed to be a decision support system, so I
> do not really need rollbacks and transactions etc. --- can those
> features be turned off to enhance performance?) needs to store 1

Well, you might need to take this up on the -performance list, but
there are people running databases that large. In general such features
can't be turned off, though they don't really cost much on data that
doesn't change.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Вложения

Re: size of bytea + performance issues

От
"tschak"
Дата:
Allright... do you mean an alignment with an offset of 4/8 bit or byte?
If it is just bit I cannot really follow the calculation...
Nevertheless it sounds like an explanation for this "effect".

I have one more question concerning size. The following table
create table test(val0 unsigned smallint,
... ... ... .. ..., val7 unsigned smallint,
bulk bytea);

needs app. 100 Bytes per row even though the user data only uses
8*2Bytes (vali) + 40Bytes (the sizye of my bytea). Does that mean, that
the system needs 46 bytes for internal represantationsor is the
alignment factor a again a source of this "blowup"?

Thanks for your help,
Tschak


Re: size of bytea + performance issues

От
Martijn van Oosterhout
Дата:
On Tue, Jan 31, 2006 at 09:15:18AM -0800, tschak wrote:
> Allright... do you mean an alignment with an offset of 4/8 bit or byte?
> If it is just bit I cannot really follow the calculation...
> Nevertheless it sounds like an explanation for this "effect".
>
> I have one more question concerning size. The following table
> create table test(val0 unsigned smallint,
> ... ... ... .. ..., val7 unsigned smallint,
> bulk bytea);
>
> needs app. 100 Bytes per row even though the user data only uses
> 8*2Bytes (vali) + 40Bytes (the sizye of my bytea). Does that mean, that
> the system needs 46 bytes for internal represantationsor is the
> alignment factor a again a source of this "blowup"?

Well, it's in the FAQ under "How much database disk space is required
to store data from a typical text file?" but the per tuple overhead is
between 36 and 44 bytes. depends a bit on the version.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Вложения

Re: size of bytea + performance issues

От
Jochen Schlosser
Дата:
> Well, it's in the FAQ under "How much database disk space is required
> to store data from a typical text file?" but the per tuple overhead is
> between 36 and 44 bytes. depends a bit on the version.

allright... thx a lot!
I did not see this point in the FAQ because I am not storing data from
a textfile and did not look in that direction... What store is
basically a encoding scheme of several hash bins which store
distances. Thus it is just a bitvector and to the best of my knowledge
a bytea or  varyiing bitarray is the most space efficient method to
store my information(???).

Thanks again,

Tschak

Re: size of bytea + performance issues

От
"Jim C. Nasby"
Дата:
On Tue, Jan 31, 2006 at 08:15:44PM +0100, Jochen Schlosser wrote:
> > Well, it's in the FAQ under "How much database disk space is required
> > to store data from a typical text file?" but the per tuple overhead is
> > between 36 and 44 bytes. depends a bit on the version.
>
> allright... thx a lot!
> I did not see this point in the FAQ because I am not storing data from
> a textfile and did not look in that direction... What store is
> basically a encoding scheme of several hash bins which store
> distances. Thus it is just a bitvector and to the best of my knowledge
> a bytea or  varyiing bitarray is the most space efficient method to
> store my information(???).

It is if it's truely variable in length. If you know it's limited to say
4 bytes, you'd probably be better off with an int4, which doesn't have
the varlena overhead (4 bytes) that a bytea does.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461