Обсуждение: datatype advice numeric vs. varchar
I'm working on a voicemail application which involves storing phone numbers and other types that have long sequences of digits [0-9]. I need to be able to do pattern matching using LIKE and map the fields back to java String objects. I was thinking maybe it would be more efficient to use the numeric(x,0) type instead of a varchar since it could use less space on disk. My calculations for disk space based off some information i found online are ( 8 + ( 2 bytes for every four digits) ) for numeric and ( 4 + number of chars ) for a utf8 varchar datatype. Are these calculations still valid and has anyone tried using numeric for this purpose or is this really stupid? Thanks for any hints! Gene
On Thu, Jan 18, 2007 at 01:15:02AM -0500, Gene wrote: > I'm working on a voicemail application which involves storing phone > numbers and other types that have long sequences of digits [0-9]. I > need to be able to do pattern matching using LIKE and map the fields > back to java String objects. I was thinking maybe it would be more > efficient to use the numeric(x,0) type instead of a varchar since it > could use less space on disk. Your calculations are reasonable, but if you do it as numeric you're going to be fighting the system the whole way. Numerics are treated as numbers and things like LIKE don't work on them. When you select them, java is likely to convert them to numbers there too. Is this not a case of premature optimisation? Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Вложения
On Jan 18, 2007, at 15:15 , Gene wrote: > My calculations for disk space based off some information i found > online are ( 8 + ( 2 bytes for every four digits) ) for numeric and ( > 4 + number of chars ) for a utf8 varchar datatype. Are these > calculations still valid and has anyone tried using numeric for this > purpose or is this really stupid? While telephone numbers typically consist of digits, they're not numbers: they're strings of digits. For example, a telephone number in Tokyo is (typically) a string of 10 digits, beginning with "03". 0311111111 as numeric would have unexpected results when retrieved. While you may not be concerned with Japanese phone numbers, I use it as an example to show that telephone "numbers" are actually strings. In short, use strings (text/varchar). Michael Glaesemann grzm seespotcode net PS. The number of bytes used to represent characters in UTF-8 varies. I believe digits (0-9) are all 1 byte/char.
you're probably right, actually using LIKE with numeric works fine but of course i will have to contend with things like truncating zeros to the left etc... are there any advantages to using ascii encoding as far as performance of LIKE, REGEXES, INDEXES etc versus using UTF8? thanks, gene On 1/18/07, Martijn van Oosterhout <kleptog@svana.org> wrote: > On Thu, Jan 18, 2007 at 01:15:02AM -0500, Gene wrote: > > I'm working on a voicemail application which involves storing phone > > numbers and other types that have long sequences of digits [0-9]. I > > need to be able to do pattern matching using LIKE and map the fields > > back to java String objects. I was thinking maybe it would be more > > efficient to use the numeric(x,0) type instead of a varchar since it > > could use less space on disk. > > Your calculations are reasonable, but if you do it as numeric you're > going to be fighting the system the whole way. Numerics are treated as > numbers and things like LIKE don't work on them. When you select them, > java is likely to convert them to numbers there too. > > Is this not a case of premature optimisation? > > Have a nice day, > -- > Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > > From each according to his ability. To each according to his ability to litigate. > > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.1 (GNU/Linux) > > iD8DBQFFrxFpIB7bNG8LQkwRAvMbAKCJ/1OeO5U7sk0QGjl9YBia4sfRjwCfSFWy > TK5gVC8ybtnH2Xtdco6SGqA= > =pUUw > -----END PGP SIGNATURE----- > > > -- Gene Hart cell: 443-604-2679
Gene schrieb: > you're probably right, actually using LIKE with numeric works fine but > of course i will have to contend with things like truncating zeros to > the left etc... are there any advantages to using ascii encoding as > far as performance of LIKE, REGEXES, INDEXES etc versus using UTF8? Well if you are really concerned, write a custom datatype where you store 2 digits of the telephone number in one byte. A nibble for 0-9+-/ und " " should be plenty (you would even have a c-like string end marker :-) Of course you would need to write operators for everything you want to do w/ it. IIRC, there were even plans I read on the list to write such a datatype. Maybe you scan the list archives first. Regards Tino
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 01/18/07 00:22, Michael Glaesemann wrote: > > On Jan 18, 2007, at 15:15 , Gene wrote: > >> My calculations for disk space based off some information i found >> online are ( 8 + ( 2 bytes for every four digits) ) for numeric and ( >> 4 + number of chars ) for a utf8 varchar datatype. Are these >> calculations still valid and has anyone tried using numeric for this >> purpose or is this really stupid? > > While telephone numbers typically consist of digits, they're not > numbers: they're strings of digits. For example, a telephone number in > Tokyo is (typically) a string of 10 digits, beginning with "03". > 0311111111 as numeric would have unexpected results when retrieved. > While you may not be concerned with Japanese phone numbers, I use it as > an example to show that telephone "numbers" are actually strings. Not only that, but since the number pad also has "*" & "#", there /might/ be obscure times when you need to use them. > In short, use strings (text/varchar). -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFFr3KvS9HxQb37XmcRAtWLAKCAM5hbrZtTmK4aK6N1zv8rVALkGQCg2E0P KMvry9tlIyCNH9LjiV+8M78= =wY6z -----END PGP SIGNATURE-----
On Jan 18, 1:25 am, genekh...@gmail.com (Gene) wrote: > you're probably right, actually using LIKE with numeric works fine but > of course i will have to contend with things like truncating zeros to > the left etc... are there any advantages to using ascii encoding as > far as performance of LIKE, REGEXES, INDEXES etc versus using UTF8? > > thanks, gene You might want to check out the following link, particularly if your database is UTF-8, use varchar for your TNs, and you want to look for calls from certain NPA/NXX/block_id ranges, e.g. with "WHERE ani LIKE '2015551%'. As I understand it, an index created with a simple "CREATE INDEX blah ON tbl (ani)" won't get used in that situation, whereas one created with "CREATE INDEX blah2 ON tbl (ani varchar_pattern_ops)" will. http://www.postgresql.org/docs/8.2/interactive/indexes-opclass.html