Обсуждение: libpq support for arrays and composites
One of the areas where libpq seems to be severely lacking is in handling arrays and composites in query results. I'd like to set about rectifying that. Ideally this would mean that drivers using libpq could easily and reliably deliver such objects suitably structured in their particular languages (e.g. lists and hashes in Perl). One complicating factor I see is that there is no protocol level support for anything other than simple objects - each data value is simply a stream of bytes of a known length. We would therefore need some pretty robust processing to pick apart structured objects. We'll need a new API to handle such objects. I'm thinking of something like: PQarray * PQgetArray( const PGresult *res, int row_number, int column_number); int PQgetArrayNDims(PQarray * array); int PQgetArrayLower(PQarray * array, int dim); int PQgetArrayUpper(PQarray * array, int dim); int PQgetArrayElementLength(PQarray * array, int dim1, ...); bool PQgetArrayElementIsNull(PQarray * array, int dim1, ...); char * PQgetArrayElement(PQarray * array, int dim1, ...); PQcomposite * PQgetComposite(const PGresult *res, int row_number, int column_number); PQcomposite * PQgetArrayElementComposite(PQarray * array, int dim1, ...); int PQgetCompositeNFields(PQcomposite * composite); char * PQgetCompositeFName(PQcomposite * composite, int fnumber); int PQgetCompositeFNumber(PQcomposite * composite, char * fname); Oid PQgetCOmpositeFType(PQcomposite * composite, int fnumber); int PQgetCompositeFieldLength(PQcomposite * , int fnumber); bool PQgetCompositeFieldIsNull(PQcomposite * composite, int fnumber); char * PQgetCompositeField(PQcomposite * composite, int fnumber); Not sure if we need analogs for PQfformat, PQfmod or PQfsize - I suspect not, but right now I'm just thinking out loud. Thoughts? Is this worth doing? cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > One complicating factor I see is that there is no protocol level support > for anything other than simple objects - each data value is simply a > stream of bytes of a known length. We would therefore need some pretty > robust processing to pick apart structured objects. Well, it's not that hard, the quoting/escaping rules for array_out and record_out are well defined. Are you intending that these operations support both text and binary results? The array accessors with ... parameter lists strike me as a bit dangerous, because there is no way at all to verify that the caller is passing the expected number of dimensions. Can't that be made tighter? Also you need to spell out the error handling conventions for each of these. I think you missed some "free()" operations. It might also be useful to provide some functions that form an array or composite value from per-element strings, ie, the converse of the de-construction routines. Here I'd be happy to skip the binary case. regards, tom lane
Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: > >> One complicating factor I see is that there is no protocol level support >> for anything other than simple objects - each data value is simply a >> stream of bytes of a known length. We would therefore need some pretty >> robust processing to pick apart structured objects. >> > > Well, it's not that hard, the quoting/escaping rules for array_out and > record_out are well defined. > > Are you intending that these operations support both text and binary > results? > I'm a bit open on that. > The array accessors with ... parameter lists strike me as a bit > dangerous, because there is no way at all to verify that the caller is > passing the expected number of dimensions. Can't that be made tighter? > Well, the only alternative I can think of is to make the client walk the array one dimension at a time. Something like: PQarray * PQgetInnerArray(PQarray * array, int dim); then when we're down to the leaf level, we could have: int PQgetArrayElementLength(PQarray * array, int dim); bool PQgetArrayElementIsNull(PQarray * array, int dim); char * PQgetArrayElement(PQarray * array, int dim); That strikes me as somewhat more cumbersome, so I guess the question is whether it's worth it. It probably fits the slightly clunky feel of libpq. > Also you need to spell out the error handling conventions for each of > these. > > I think you missed some "free()" operations. > Oh, yes, both of these are certainly true. This isn't really even a proposal yet, more a sketch that would lead to a proposal. I'm hoping to get some other input too, before settling this down, especially from driver writers. > It might also be useful to provide some functions that form an array or > composite value from per-element strings, ie, the converse of the > de-construction routines. Here I'd be happy to skip the binary case. > > > Yeah, that had occurred to me. Will think about it more, although it could possibly be done as a separate project, too. cheers andrew
Andrew Dunstan wrote: > > One of the areas where libpq seems to be severely lacking is in handling > arrays and composites in query results. I'd like to set about rectifying > that. > > Ideally this would mean that drivers using libpq could easily and > reliably deliver such objects suitably structured in their particular > languages (e.g. lists and hashes in Perl). > > One complicating factor I see is that there is no protocol level support > for anything other than simple objects - each data value is simply a > stream of bytes of a known length. We would therefore need some pretty > robust processing to pick apart structured objects. > > We'll need a new API to handle such objects. I'm thinking of something > like: > > PQarray * PQgetArray( const PGresult *res, int row_number, int > column_number); > int PQgetArrayNDims(PQarray * array); > int PQgetArrayLower(PQarray * array, int dim); > int PQgetArrayUpper(PQarray * array, int dim); > int PQgetArrayElementLength(PQarray * array, int dim1, ...); > bool PQgetArrayElementIsNull(PQarray * array, int dim1, ...); > char * PQgetArrayElement(PQarray * array, int dim1, ...); > > PQcomposite * PQgetComposite(const PGresult *res, int row_number, int > column_number); > PQcomposite * PQgetArrayElementComposite(PQarray * array, int dim1, ...); > int PQgetCompositeNFields(PQcomposite * composite); > char * PQgetCompositeFName(PQcomposite * composite, int fnumber); > int PQgetCompositeFNumber(PQcomposite * composite, char * fname); > Oid PQgetCOmpositeFType(PQcomposite * composite, int fnumber); > int PQgetCompositeFieldLength(PQcomposite * , int fnumber); > bool PQgetCompositeFieldIsNull(PQcomposite * composite, int fnumber); > char * PQgetCompositeField(PQcomposite * composite, int fnumber); > > Not sure if we need analogs for PQfformat, PQfmod or PQfsize - I suspect > not, but right now I'm just thinking out loud. > > Thoughts? Is this worth doing? > > cheers > > andrew > > > libpqtypes already implemented this. It is a different approach but provides the same functionality; with the inclusion of being able to handle every data type. libpqtypes uses the PGresult API for composites and arrays, rather than adding a new set of functions. To support this, one must be able to convert all data types (unless you are only supporting text results) because composites can be made up of any data type. Simple arrays: http://libpqtypes.esilo.com/man3/pqt-specs.html#array Composite arrays: http://libpqtypes.esilo.com/man3/pqt-composites.html EXAMPLE OF GETTING A COMPOSITE: (taken from http://libpqtypes.esilo.com/ home page) /* Let's get a composite. * CREATE TYPE simple AS (a int4, t text); */ PGint4 i4; PGtext text; PGresult *res, *simple; int resultFormat = 1; /* Your composites need to be registered */ PQregisterTypeHandler(conn, "simple", NULL, NULL); /* 2nd arg, PGparam, can be NULL if there are no query params. * Composites require binary results, so we can't use PQexec().*/ res = PQparamExec(conn, NULL, "SELECT my_simple FROM t", resultFormat); if(!res) fprintf(stderr, "ERROR: %s\n", PQgeterror()); /* Get the simple composite, which is exposed as a PGresult. */ PQgetf(res, 0, "%simple", 0, &simple); PQclear(res); /* no longer needed */ /* Get the simple composite attributes from the simple result. * Reference fields by name by using a '#' rather than a '%'.* The field names are the composite attribute names. */ PQgetf(simple, 0, "#int4 #text", "a", &i4, "t", &text); PQclear(simple); -- Andrew Chernow eSilo, LLC every bit counts http://www.esilo.com/
Andrew Chernow wrote: > > libpqtypes already implemented this. It is a different approach but > provides the same functionality; with the inclusion of being able to > handle every data type. libpqtypes uses the PGresult API for > composites and arrays, rather than adding a new set of functions. > > Yes, I thought you'd say that :-) This has some significant limitations - for example (quoting from your docs) Arrays are only handled using binary format. This means that any type used as an array element must be put andgotten in binary format. If a user-defined type does not implement a send and recv function in the backend,it can not be used as an array element. That makes it quite useless for my intended purpose. I also am not particularly enamoured of the libpqtypes way of doing things, which feels rather foreign to me. Lastly, the idea is to provide extra facilities to libpq clients without requiring any extra library. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > Lastly, the idea is to provide extra facilities to libpq clients > without requiring any extra library. Or more to the point, without requiring boatloads of new code that only some libpq users would have any use for. To my mind, the point of the present proposal is to provide some client-side code that understands how to invert the data formatting/escaping rules implemented by array_out, record_out, and perhaps array_in/record_in. We can make that happen without taking a quantum jump in libpq's API complexity --- and offhand it seems that Andrew D's proposal is at about the right level of complication. libpqtypes has its place also, but I think it's addressing a different level of problem complexity. regards, tom lane
>That makes it quite useless for my intended purpose. I found no more use cases for text results after libpqtypes started to take shape, eventhough libpqtypes supports all data types in text & binary excluding arrays and composites. Because of this, adding a text parser for arrays and composites felt like a lot of work for a little gain. libpqtypes is really designed to be a binary interface. The text support offered allows existing applications to use the new interface with results generated by PQexec(), meaning you can use PQgetf w/o having to change code to use PQputf(). If you take another glance at libpqtypes, you may see that result format decisions are pretty well abstracted and there really is no need for text results anymore (okay, I'll catagorize that as an opinion). > I also am not particularly enamoured of the libpqtypes way of doing> things, which feels rather foreign to me. Not sure we can fix this issue. We made every attempt to keep things familiar ... printf/scanf style. It's a new approach for libpq but an old one for C hacks. -- Andrew Chernow eSilo, LLC every bit counts http://www.esilo.com/
On 6/8/08, Andrew Dunstan <andrew@dunslane.net> wrote: > Tom Lane wrote: > > Andrew Dunstan <andrew@dunslane.net> writes: > > > One complicating factor I see is that there is no protocol level support > for anything other than simple objects - each data value is simply a stream > of bytes of a known length. We would therefore need some pretty robust > processing to pick apart structured objects. > > > > Are you intending that these operations support both text and binary > > results? > > I'm a bit open on that. IMO, support for binary is critical. Because of the interplay of the array and composite out formats, the number of backslashes grows exponentially (!) with nesting levels. This makes text format arrays unsuitable for any non-trivial operations involving arrays of composites. > > The array accessors with ... parameter lists strike me as a bit > > dangerous, because there is no way at all to verify that the caller is > > passing the expected number of dimensions. Can't that be made tighter? > > Well, the only alternative I can think of is to make the client walk the > array one dimension at a time. Something like: > > PQarray * PQgetInnerArray(PQarray * array, int dim); Dimension walking is probably the best approach. The pure varargs version which Andrew D is suggesting has too many arguments based strictly on convention IMO (meaning, I agree with Tom). One alternative is to do a MAXDIM (6) argument 'getter' also taking the requested dimension with perhaps some wrapping macros for simplicity. One issue with this is that it seems to suggest array slicing etc. which seems more complicated than it's worth. One small style note here: the structs in libpq generally have a 'PG' prefix...PGarray, etc. > then when we're down to the leaf level, we could have: > > int PQgetArrayElementLength(PQarray * array, int dim); > bool PQgetArrayElementIsNull(PQarray * array, int dim); > char * PQgetArrayElement(PQarray * array, int dim); > > That strikes me as somewhat more cumbersome, so I guess the question is > whether it's worth it. It probably fits the slightly clunky feel of libpq. Have you considered using a PGresult to present the array/composite? The idea is to arrange it so: one column per composite field and one row per array row. I think this is a pretty elegant fit over the requirements of getting information out of an array, and you have to add a lot less convetions to libpq. This is a semantic enhancement of some of the existing functions but Tom Lane wrote: > It might also be useful to provide some functions that form an array or > composite value from per-element strings, ie, the converse of the > de-construction routines. Here I'd be happy to skip the binary case. That seems a little inconsistent to me: (we allow sending binary query parameters except when sending arrays). ISTM that arrays are one of the cases where sending in binary could be a big win in terms of performance. Andrew Dunstan wrote: > Yeah, that had occurred to me. Will think about it more, although it could > possibly be done as a separate project, too. I can't help but notice you proposing some overlapping behavior with libpqtypes when we could be making better progress working together and extending libpq in a way that: *) doesn't introduce new behavior to libpq which is an overreach of the library (particularly things which can be achieved via simple wrapping strategies) *) provides a reasonable foundation for the objectives which libpqtypes is trying to achieve. I have always thought that array and composite presentation belongs in libpq. For example, since the 'inner result' is constructed on the libpqtype side, the result has to be built with a deep copy. This is fine in most cases but a potential optimzation target if everything is done in libpq. Also we could drop some portions of our hooking patch, namely PQsetValue, PQcopyResult, and possibly PQresultAlloc. While we didn't introduce text/composite array parsing into libpqtypes, there is no reason why we couldn't (certain functions are undefined I understand that some portion of libpqtypes are unlikely to make it into libpq. Is it feasible come up with an array composite approach that is useful from libpqtypes perspective so we can layer on top of what andrew d is proposing. For example, in libpq: *) array/composite get/put (in text/binary) *) 'PGparam' argument stacker building arrays/composites for *) introduce PGarray struct. If PGresult is used to present data, AIUI the 'composite' family of functions, outside of what is needed to present the result from the composite, does not need to exist. But not: *) type serializing/deserializing from binary (except in the special case of arrays/composites). *) printf/scanf style query functions. Which we would presumably keep in the libpqtypes library. I haven't thought it completely through, but I'm guessing it's possible to adjust the libpqtype library so that it gives the interface we like but allow extending libpq to naturally allow better handling of arrays/composites which we need and libpq users obviously want. Our propsosed libpqtypes library solves all the use cases regarding arrays/composites with the exception of pulling out arrays in text (but it could be easily adjusted to do that, too). So, maybe it's worthwhile seeing if we can adapt some of the work which we have already done and is working quite well. merlin
"Merlin Moncure" <mmoncure@gmail.com> writes: > On 6/8/08, Andrew Dunstan <andrew@dunslane.net> wrote: >> Tom Lane wrote: >>> Are you intending that these operations support both text and binary >>> results? >> >> I'm a bit open on that. > IMO, support for binary is critical. Because of the interplay of the > array and composite out formats, the number of backslashes grows > exponentially (!) with nesting levels. This makes text format arrays > unsuitable for any non-trivial operations involving arrays of > composites. Um ... but who cares, as long as you've got functions to wrap and unwrap the data for you? Personally I wouldn't object if these were text-only; they'd be a whole lot more future-proof that way. > One alternative is to do a MAXDIM (6) argument 'getter' also taking > the requested dimension with perhaps some wrapping macros for > simplicity. One issue with this is that it seems to suggest array > slicing etc. which seems more complicated than it's worth. Let's not embed MAXDIM in libpq's ABI :-( regards, tom lane
On 6/10/08, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Merlin Moncure" <mmoncure@gmail.com> writes: > > IMO, support for binary is critical. Because of the interplay of the > > array and composite out formats, the number of backslashes grows > > exponentially (!) with nesting levels. This makes text format arrays > > unsuitable for any non-trivial operations involving arrays of > > composites. > > Um ... but who cares, as long as you've got functions to wrap and unwrap > the data for you? Personally I wouldn't object if these were text-only; > they'd be a whole lot more future-proof that way. Are you truly suggesting that a format where potentially hundreds of backslashes are sent per character of useful data is something you would encourage users to use in a data-centric application? This is bloated and wasteful in the extreme! I took one look at the output the database produced in one application we are using with a couple of levels of nesting and determined that the text approach was not workable. This was in fact the early genesis of libpqtypes. Granted, it doesn't impact a wide range of use cases. It's not like the array and composite container formats are particularly complicated. The 'future proofing' requirements should be fairly easily solved (as opposed to a general solution which covers the entire range of types). > > One alternative is to do a MAXDIM (6) argument 'getter' also taking > > the requested dimension with perhaps some wrapping macros for > > simplicity. One issue with this is that it seems to suggest array > > slicing etc. which seems more complicated than it's worth. > > Let's not embed MAXDIM in libpq's ABI :-( right. For this and other reasons I think recursing into the array structure is the best approach...ideally using a PGresult to present the array data. merlin