Discussion: PG_RETURN_?
Hi,

I have a set of functions for a data type that return small integers (i.e. [0..12]). I can, of course, represent it as a char, short or long (CHAR, INT16 or INT32). Are there any advantages/drawbacks to choosing one particular PG_RETURN_ type over another (realizing that they are effectively just casts)?

Thanks!
--don
Don Y wrote:
> Hi,
>
> I have a set of functions for a data type that return small integers (i.e. [0..12]). I can, of course, represent it as a char, short or long (CHAR, INT16 or INT32). Are there any advantages/drawbacks to choosing one particular PG_RETURN_ type over another (realizing that they are effectively just casts)?

If they are integers then an int would be the obvious choice. If you are going to treat them as int2 outside the function then int2, otherwise just integer.

Oh, it's int2/int4 not int16/int32.

-- 
Richard Huxton
Archonet Ltd
Richard Huxton wrote:
> Don Y wrote:
>> Hi,
>>
>> I have a set of functions for a data type that return small integers (i.e. [0..12]). I can, of course, represent it as a char, short or long (CHAR, INT16 or INT32). Are there any advantages/drawbacks to choosing one particular PG_RETURN_ type over another (realizing that they are effectively just casts)?
>
> If they are integers then an int would be the obvious choice. If you are going to treat them as int2 outside the function then int2, otherwise just integer.

Yes, I was more interested in what might be going on "behind the scenes" inside the server that could bias my choice of WHICH integer type to use, e.g., whether arguments are marshalled as byte arrays or as Datum arrays (I would suspect the latter). Since I could use something as small as a char to represent the values, I'm more interested in how the choice affects OTHER things...

> Oh, it's int2/int4 not int16/int32.

The *data type* is int2/int4, but the PG_RETURN_? macro is PG_RETURN_INT16 or PG_RETURN_INT32 -- hence the reason I referred to them as "CHAR, INT16 or INT32" instead of "char, int2 or int4" :>

--don
Don Y wrote:
> Richard Huxton wrote:
>> Don Y wrote:
>>> Hi,
>>>
>>> I have a set of functions for a data type that return small integers (i.e. [0..12]). I can, of course, represent it as a char, short or long (CHAR, INT16 or INT32). Are there any advantages/drawbacks to choosing one particular PG_RETURN_ type over another (realizing that they are effectively just casts)?
>>
>> If they are integers then an int would be the obvious choice. If you are going to treat them as int2 outside the function then int2, otherwise just integer.
>
> Yes, I was more interested in what might be going on "behind the scenes" inside the server that could bias my choice of WHICH integer type to use, e.g., whether arguments are marshalled as byte arrays or as Datum arrays (I would suspect the latter). Since I could use something as small as a char to represent the values, I'm more interested in how the choice affects OTHER things...

I must admit I've never tested it, but I strongly suspect any differences will be below the level you can accurately measure. Certainly for 8/16/32-bit integers I'd guess they'd all take the same time (they should all end up as a Datum). With a 64-bit CPU I'd guess that would extend to 64 bits too.

Hmm - looking at the comments, it seems int64 is a reference type regardless of CPU (include/postgres.h).

>> Oh, it's int2/int4 not int16/int32.
>
> The *data type* is int2/int4, but the PG_RETURN_? macro is PG_RETURN_INT16 or PG_RETURN_INT32 -- hence the reason I referred to them as "CHAR, INT16 or INT32" instead of "char, int2 or int4" :>

You're quite right. I was thinking from the other side.

-- 
Richard Huxton
Archonet Ltd
On Tue, May 02, 2006 at 08:43:03AM -0700, Don Y wrote:
> Richard Huxton wrote:
>> Don Y wrote:
>>> Hi,
>>>
>>> I have a set of functions for a data type that return small integers (i.e. [0..12]). I can, of course, represent it as a char, short or long (CHAR, INT16 or INT32). Are there any advantages/drawbacks to choosing one particular PG_RETURN_ type over another (realizing that they are effectively just casts)?
>>
>> If they are integers then an int would be the obvious choice. If you are going to treat them as int2 outside the function then int2, otherwise just integer.
>
> Yes, I was more interested in what might be going on "behind the scenes" inside the server that could bias my choice of WHICH integer type to use, e.g., whether arguments are marshalled as byte arrays or as Datum arrays (I would suspect the latter). Since I could use something as small as a char to represent the values, I'm more interested in how the choice affects OTHER things...

You should always *always* match the PG_RETURN_* to the declared type you are returning. Anything else will cause problems. PG_RETURN_INT16 means "return in a format consistent with a type declared as pass-by-value, two bytes wide". PostgreSQL does not check that what you're returning actually matches what you declared.

The type as declared determines the storage required to store it. That might be a far more useful factor to consider than what is copied internally, which, as has been pointed out, is probably below what you can measure.

Have a nice day,
-- 
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout wrote:
> On Tue, May 02, 2006 at 08:43:03AM -0700, Don Y wrote:
>> Richard Huxton wrote:
>>> Don Y wrote:
>>>> Hi,
>>>>
>>>> I have a set of functions for a data type that return small integers (i.e. [0..12]). I can, of course, represent it as a char, short or long (CHAR, INT16 or INT32). Are there any advantages/drawbacks to choosing one particular PG_RETURN_ type over another (realizing that they are effectively just casts)?
>>> If they are integers then an int would be the obvious choice. If you are going to treat them as int2 outside the function then int2, otherwise just integer.
>> Yes, I was more interested in what might be going on "behind the scenes" inside the server that could bias my choice of WHICH integer type to use, e.g., whether arguments are marshalled as byte arrays or as Datum arrays (I would suspect the latter). Since I could use something as small as a char to represent the values, I'm more interested in how the choice affects OTHER things...
>
> You should always *always* match the PG_RETURN_* to the declared type you are returning. Anything else will cause problems. PG_RETURN_INT16 means "return in a format consistent with a type declared as pass-by-value, two bytes wide". PostgreSQL does not check that what you're returning actually matches what you declared.

Yes, but that wasn't the question. I can PG_RETURN_CHAR(2), PG_RETURN_INT16(2) or PG_RETURN_INT32(2) and end up with the same result (assuming the function is defined in the SQL interface to return char, int2 or int4, respectively).

> The type as declared determines the storage required to store it. That

Yes, but for a function returning a value that does not exceed sizeof(Datum), there is no *space* consequence. I would assume most modern architectures use 32-bit (and larger) registers. OTOH, some machines incur a (tiny) penalty for casting char to long. Returning INT32 *may* be better from that standpoint -- assuming there is no offsetting cost in marshalling.

> might be a far more useful factor to consider than what is copied internally, which, as has been pointed out, is probably below what you can measure.

Sure. But given that the difference ONLY amounts to whether I type "INT32", "INT16" or "CHAR" in the PG_RETURN_ macro, an understanding of what is going on "inside" can contribute an epsilon for or against performance. I'd be annoyed to have built dozens of functions ASSUMING "INT32" when a *better* assumption might have been "CHAR"... (I'm working in an embedded environment where "spare CPU cycles" mean you've wasted $$$ on hardware that you don't need :-/ )

--don
On Tue, May 02, 2006 at 10:06:19AM -0700, Don Y wrote:
>> The type as declared determines the storage required to store it. That
>
> Yes, but for a function returning a value that does not exceed sizeof(Datum), there is no *space* consequence. I would assume most modern architectures use 32-bit (and larger) registers.

When you return a Datum, it's always the same size. When you're returning a string, you're still returning a Datum, which may be 4 or 8 bytes depending on the platform.

But what I was referring to was the space to store the data in a tuple on disk, or to send the data to a client. These are affected by the choice of representation.

> OTOH, some machines incur a (tiny) penalty for casting char to long. Returning INT32 *may* be better from that standpoint -- assuming there is no offsetting cost in marshalling.

Within the backend the only representations used are Datum and tuples. I don't think either of them would have a noticeable difference between various pass-by-value formats.

> ... I'd be annoyed to have built dozens of functions ASSUMING "INT32" when a *better* assumption might have been "CHAR"... (I'm working in an embedded environment where "spare CPU cycles" mean you've wasted $$$ on hardware that you don't need :-/ )

Hmm, postgres doesn't try to save on cycles. The philosophy is to get it right first, then make it fast. The entire fmgr interface is slower than the original design (old-style functions), but this design works on all platforms whereas the old one didn't.

I'd go for INT32, it's most likely to be an "int" which should be "the most natural size for the machine".

Have a nice day,
-- 
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout wrote:
> On Tue, May 02, 2006 at 10:06:19AM -0700, Don Y wrote:
>>> The type as declared determines the storage required to store it. That
>> Yes, but for a function returning a value that does not exceed sizeof(Datum), there is no *space* consequence. I would assume most modern architectures use 32-bit (and larger) registers.
>
> When you return a Datum, it's always the same size. When you're returning a string, you're still returning a Datum, which may be 4 or 8 bytes depending on the platform.

Yes.

> But what I was referring to was the space to store the data in a tuple on disk, or to send the data to a client. These are affected by the choice of representation.

So, as I had mentioned before, you marshal as a *byte* stream and not a *Datum* stream?

>> OTOH, some machines incur a (tiny) penalty for casting char to long. Returning INT32 *may* be better from that standpoint -- assuming there is no offsetting cost in marshalling.
>
> Within the backend the only representations used are Datum and tuples. I don't think either of them would have a noticeable difference between various pass-by-value formats.
>
>> ... I'd be annoyed to have built dozens of functions ASSUMING "INT32" when a *better* assumption might have been "CHAR"... (I'm working in an embedded environment where "spare CPU cycles" mean you've wasted $$$ on hardware that you don't need :-/ )
>
> Hmm, postgres doesn't try to save on cycles.

<grin> Yes, I noticed. :> But it's hard for me to get this "attitude" out of the way I approach a problem. :-( (e.g., I wouldn't count people at a rally using a *float*! :>)

> The philosophy is to get it right first, then make it fast. The entire fmgr interface is slower than the original design (old-style functions), but this design works on all platforms whereas the old one didn't.

Exactly. I could more "efficiently" replace postgres with dedicated structures to do what I want. But that would tie me to an implementation that is less portable (and less maintainable).

> I'd go for INT32, it's most likely to be an "int" which should be "the most natural size for the machine".

(sigh) Yes, I suppose so. Though it can have a big impact on transport delays (server to client) if things really are marshalled as byte streams, etc. <shrug> I suppose I should just "do it" and let technology catch up with my inefficiencies later!

Thanks!
--don