Обсуждение: varchar/char size
Does someone want to remind me why we allocate the full size for char() and varchar(), when we really can just allocate the size of the given string? I relize char() has to be padded, but why varchar()? In my experience, char() is full size as defined by create, and varchar() is the the size of the actual data in the field, like text, but with a pre-defined limit. -- Bruce Momjian maillist@candle.pha.pa.us
On Wed, 7 Jan 1998, Bruce Momjian wrote: > In my experience, char() is full size as defined by create, and > varchar() is the the size of the actual data in the field, like text, > but with a pre-defined limit. Can you remind me what the difference is between text and varchar? Why would you use varchar over text? Marc G. Fournier Systems Administrator @ hub.org primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
> > On Wed, 7 Jan 1998, Bruce Momjian wrote: > > > In my experience, char() is full size as defined by create, and > > varchar() is the the size of the actual data in the field, like text, > > but with a pre-defined limit. > > Can you remind me what the difference is between text and varchar? Why > would you use varchar over text? Only because SQL people are used to varchar, and not text, and sometimes people want to have a maximum size if they are displaying this data in a form that is only of limited size. -- Bruce Momjian maillist@candle.pha.pa.us
> Does someone want to remind me why we allocate the full size for char() > and varchar(), when we really can just allocate the size of the given > string? > I relize char() has to be padded, but why varchar()? > In my experience, char() is full size as defined by create, and > varchar() is the the size of the actual data in the field, like text, > but with a pre-defined limit. Well, in many relational databases access can be optimized by having fixed-length tuple storage structures. Also, it allows re-use of deleted space in storage pages. It may be that neither of these points have any bearing on Postgres, and never will, but unless that clearly the case then I would be inclined to keep the storage scheme as it is currently. - Tom
> > > Does someone want to remind me why we allocate the full size for char() > > and varchar(), when we really can just allocate the size of the given > > string? > > I relize char() has to be padded, but why varchar()? > > > In my experience, char() is full size as defined by create, and > > varchar() is the the size of the actual data in the field, like text, > > but with a pre-defined limit. > > Well, in many relational databases access can be optimized by having > fixed-length tuple storage structures. Also, it allows re-use of deleted > space in storage pages. It may be that neither of these points have any > bearing on Postgres, and never will, but unless that clearly the case then > I would be inclined to keep the storage scheme as it is currently. With Ingres and Informix char() is fixed size, while varchar() is VARiable size. -- Bruce Momjian maillist@candle.pha.pa.us
Bruce Momjian wrote: > > > > > Does someone want to remind me why we allocate the full size for char() > > > and varchar(), when we really can just allocate the size of the given > > > string? > > > I relize char() has to be padded, but why varchar()? > > > > > In my experience, char() is full size as defined by create, and > > > varchar() is the the size of the actual data in the field, like text, > > > but with a pre-defined limit. > > > > Well, in many relational databases access can be optimized by having > > fixed-length tuple storage structures. Also, it allows re-use of deleted > > space in storage pages. It may be that neither of these points have any > > bearing on Postgres, and never will, but unless that clearly the case then > > I would be inclined to keep the storage scheme as it is currently. > > With Ingres and Informix char() is fixed size, while varchar() is > VARiable size. Go for it. Let me know if I can help with testing or anything... - Tom
> > Bruce Momjian wrote: > > > > > > > > Does someone want to remind me why we allocate the full size for char() > > > > and varchar(), when we really can just allocate the size of the given > > > > string? > > > > I relize char() has to be padded, but why varchar()? > > > > > > > In my experience, char() is full size as defined by create, and > > > > varchar() is the the size of the actual data in the field, like text, > > > > but with a pre-defined limit. > > > > > > Well, in many relational databases access can be optimized by having > > > fixed-length tuple storage structures. Also, it allows re-use of deleted > > > space in storage pages. It may be that neither of these points have any > > > bearing on Postgres, and never will, but unless that clearly the case then > > > I would be inclined to keep the storage scheme as it is currently. > > > > With Ingres and Informix char() is fixed size, while varchar() is > > VARiable size. > > Go for it. Let me know if I can help with testing or anything... I know we have text, and that it is better than the others, but if our varchar() were both variable sized storage, and you could place a max on it, it would be useful for certain applications. -- Bruce Momjian maillist@candle.pha.pa.us
Bruce Momjian wrote: > > Does someone want to remind me why we allocate the full size for char() > and varchar(), when we really can just allocate the size of the given > string? > > I relize char() has to be padded, but why varchar()? > > In my experience, char() is full size as defined by create, and > varchar() is the the size of the actual data in the field, like text, > but with a pre-defined limit. Is CHAR padded on disk? Of course it should be padded for representation, but for storage, couldn't it be stored just like TEXT or VARCHAR? Before storing, it could be trimmed, and when read from storage, it could be padded with spaces on the right. Btw, why is VARCHAR not simply an alias for TEXT, with maxlen added? Shouldn't these types be the same internally, but with maxlen checked for VARCHAR in the parser and maxlen set to "infinite"(-1?) for TEXT? Or perhaps CHAR could be put into the same type also? If we have a type called VARTEXT(int maxLen, bool doPaddingProcessing): VARCHAR(10) becomes VARTEXT(10, false) // 10 chars, no padding TEXT becomes VARTEXT(0, false) // infinite length, no padding CHAR(10) becomes VARTEXT(10, true) // 10 chars, padded Would not this be easier to handle than three different types? This type stuff would be handled in the parser. There would be only one storage function, which could do any kind of coding to make the VARTEXT take as little space as possible on disk. Perhaps it would (in some cases) be good to have the possibility to specify compression of the text. That could be another bool attribute to VARTEXT, used by "COMPRESSED VARCHAR()" or "COMPRESSED TEXT" so that people can squeeze the maximum out of their disk space. A related question: Is it possible to store tuples over more than one block? Would it be possible to split a big TEXT into multiple blocks? /* m */
> > Does someone want to remind me why we allocate the full size for char() > > and varchar(), when we really can just allocate the size of the given > > string? > > > > I relize char() has to be padded, but why varchar()? > > > > In my experience, char() is full size as defined by create, and > > varchar() is the the size of the actual data in the field, like text, > > but with a pre-defined limit. > > Is CHAR padded on disk? Of course it should be padded for > representation, but for storage, couldn't it be stored just like > TEXT or VARCHAR? Before storing, it could be trimmed, and when > read from storage, it could be padded with spaces on the right. My CA/Ingres Admin manual points out that there is a tradeoff between compressing tuples to save disk storage and the extra processing work required to uncompress for use. They suggest that the only case where you would consider compressing on disk is when your system is very I/O bound, and you have CPU to burn. The default for Ingres is to not compress anything, but you can specify compression on a table-by-table basis. btw, char() is a bit trickier to handle correctly if you do compress it on disk, since trailing blanks must be handled correctly all the way through. For example, you would want 'hi' = 'hi ' to be true, which is not a requirement for varchar(). - Tom
> > Bruce Momjian wrote: > > > > Does someone want to remind me why we allocate the full size for char() > > and varchar(), when we really can just allocate the size of the given > > string? > > > > I relize char() has to be padded, but why varchar()? > > > > In my experience, char() is full size as defined by create, and > > varchar() is the the size of the actual data in the field, like text, > > but with a pre-defined limit. > > Is CHAR padded on disk? Of course it should be padded for > representation, but for storage, couldn't it be stored just like > TEXT or VARCHAR? Before storing, it could be trimmed, and when > read from storage, it could be padded with spaces on the right. Well, traditionally, CHAR() is fixed length, and VARCHAR() is variable. This is how Ingres and Informix handle it. There is very little difference in the types because internally they are handled the same. The only difference is when we need to specify a max length, we do that with those types. > > Btw, why is VARCHAR not simply an alias for TEXT, with maxlen added? > Shouldn't these types be the same internally, but with maxlen checked > for VARCHAR in the parser and maxlen set to "infinite"(-1?) for TEXT? > Or perhaps CHAR could be put into the same type also? Right now we do some of the special processing using the OID of VARCHAR and BPCHAR, which is char(). We would have to generalize the length idea for each type, which is not hard to do. > > If we have a type called VARTEXT(int maxLen, bool doPaddingProcessing): > > VARCHAR(10) becomes VARTEXT(10, false) // 10 chars, no padding > TEXT becomes VARTEXT(0, false) // infinite length, no padding > CHAR(10) becomes VARTEXT(10, true) // 10 chars, padded > > Would not this be easier to handle than three different types? This > type stuff would be handled in the parser. There would be only one > storage function, which could do any kind of coding to make the VARTEXT > take as little space as possible on disk. > Perhaps it would (in some cases) be good to have the possibility to > specify compression of the text. That could be another bool attribute > to VARTEXT, used by "COMPRESSED VARCHAR()" or "COMPRESSED TEXT" so that > people can squeeze the maximum out of their disk space. > > A related question: Is it possible to store tuples over more than one > block? Would it be possible to split a big TEXT into multiple blocks? I don't know why it is not possible, but I suppose it goes to the internal workings of PostgreSQL and how rows are added and modified. -- Bruce Momjian maillist@candle.pha.pa.us
On Fri, 9 Jan 1998, Bruce Momjian wrote: > > Is CHAR padded on disk? Of course it should be padded for > > representation, but for storage, couldn't it be stored just like > > TEXT or VARCHAR? Before storing, it could be trimmed, and when > > read from storage, it could be padded with spaces on the right. > > Well, traditionally, CHAR() is fixed length, and VARCHAR() is variable. > This is how Ingres and Informix handle it. But how do we store this to the file system? If I setup a table with a char(20), and one of the records has a value of "a", does it then write 1 byte to the file system, or does it write 1 byte ("a") + 19 bytes ("")? If the second, is there a reason why, as far as writing to the file system is concerned, char() can't be treated like varchar()? I'd imagine you could save one helluva lot of "disk space" by doing that, no? Then again, thinkiing of it that way, I may as well just use varchar() instead, right? See, this is what *really* gets me lost...I use text for everything, since I really haven't got a clue as to *why* I'd want to use either char() or varchar() instead... Now, from what I *think* I recall you stating, char() and varchar() are more for backwards compatibility? Compatibility with other SQL engines? If so...as long as we have a type char(), does our backend representation have to be any different between char() and text?
> > On Fri, 9 Jan 1998, Bruce Momjian wrote: > > > > Is CHAR padded on disk? Of course it should be padded for > > > representation, but for storage, couldn't it be stored just like > > > TEXT or VARCHAR? Before storing, it could be trimmed, and when > > > read from storage, it could be padded with spaces on the right. > > > > Well, traditionally, CHAR() is fixed length, and VARCHAR() is variable. > > This is how Ingres and Informix handle it. > > But how do we store this to the file system? If I setup a table > with a char(20), and one of the records has a value of "a", does it then > write 1 byte to the file system, or does it write 1 byte ("a") + 19 bytes > ("")? 20+VARHDRSZ bytes for char(20), 1+VARHDRSZ for varchar(20) > > If the second, is there a reason why, as far as writing to the > file system is concerned, char() can't be treated like varchar()? I'd > imagine you could save one helluva lot of "disk space" by doing that, no? But then you have variable length records where char(x) forces a fixed length. Currently, the code treats all varlena structures as variable, so we readly don't take advantage of this, but we may some day. > > Then again, thinkiing of it that way, I may as well just use > varchar() instead, right? Yep. > > See, this is what *really* gets me lost...I use text for > everything, since I really haven't got a clue as to *why* I'd want to use > either char() or varchar() instead... > > Now, from what I *think* I recall you stating, char() and > varchar() are more for backwards compatibility? Compatibility with other > SQL engines? If so...as long as we have a type char(), does our backend > representation have to be any different between char() and text? We need the fixed length trim cabability of char(), and I think we need the padding of char() too. -- Bruce Momjian maillist@candle.pha.pa.us