Discussion: Priorities for 6.6
Jan Wieck writes (over in pgsql-sql): > * WE STILL NEED THE GENERAL TUPLE SPLIT CAPABILITY!!! * I've been thinking about making this post for a while ... with 6.5 almost out the door, I guess now is a good time. I don't know what people have had in mind for 6.6, but I propose that there ought to be three primary objectives for our next release: 1. Eliminate arbitrary restrictions on tuple size. 2. Eliminate arbitrary restrictions on query size (textual length/complexity that is). 3. Cure within-statement memory leaks, so that processing large numbers of tuples in one statement is reliable. All of these are fairly major projects, and it might be that we get little or nothing else done if we take these on. But these are the problems we've been hearing about over and over and over. I think fixing these would do more to improve Postgres than almost any other work we might do. Comments? Does anyone have a different list of pet peeves? Is there any chance of getting everyone to subscribe to a master plan like this? regards, tom lane
Tom Lane wrote: > > I don't know what people have had in mind for 6.6, but I propose that > there ought to be three primary objectives for our next release: > > 1. Eliminate arbitrary restrictions on tuple size. This is not primary for me -:) Though, it's required by PL/pgSQL and so... I agree that this problem must be resolved in some way. Related TODO items: * Allow compression of large fields or a compressed field type * Allow large text type to use large objects(Peter) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ I like it very much, though I don't like that LO are stored in separate files. This is known as "multi-representation" feature in Illustra. > 2. Eliminate arbitrary restrictions on query size (textual > length/complexity that is). Yes, this is quite an annoying thing. > 3. Cure within-statement memory leaks, so that processing large numbers > of tuples in one statement is reliable. Quite significant! > All of these are fairly major projects, and it might be that we get > little or nothing else done if we take these on. But these are the > problems we've been hearing about over and over and over. I think > fixing these would do more to improve Postgres than almost any other > work we might do. > > Comments? Does anyone have a different list of pet peeves? Is there > any chance of getting everyone to subscribe to a master plan like this? No chance -:)) This is what I would like to see in 6.6: 1. Referential integrity. 2. Dirty reads (will be required by 1. if we decide to follow the way proposed by Jan - using rules - though there is another way I'll talk about later; dirty reads are useful anyway). 3. Savepoints (they are my primary wish-to-implement thing). 4. elog(ERROR) must return error-codes, not just messages! This is very important for non-interactive application... in conjunction with 3. -:) Vadim
On Fri, 4 Jun 1999, Vadim Mikheev wrote: > * Allow compression of large fields or a compressed field type This one looks cool... > > All of these are fairly major projects, and it might be that we get > > little or nothing else done if we take these on. But these are the > > problems we've been hearing about over and over and over. I think > > fixing these would do more to improve Postgres than almost any other > > work we might do. > > > > Comments? Does anyone have a different list of pet peeves? Is there > > any chance of getting everyone to subscribe to a master plan like this? > > No chance -:)) Have to agree with Vadim here...the point that has *always* been stressed here is that if something is important to you, fix it. Don't expect anyone else to fall into some sort of "party line" or schedule, 'cause then people lose the enjoyment in what they are doing *shrug* For instance, out of the three things you listed, the only one that I'd consider an issue is the third, as I've never hit the first two limitations ...*shrug* Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
> This is what I would like to see in 6.6: > > 1. Referential integrity. Bingo. Item #1. Period. End of story. Everything else pales in comparison. We just get too many requests for this, though I think it an insignificant feature myself. Jan, I believe you have some ideas on this. (Like an elephant, I never forget.) > 4. elog(ERROR) must return error-codes, not just messages! > This is very important for non-interactive application... > in conjunction with 3. -:) Added to TODO. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
> Jan Wieck writes (over in pgsql-sql): > > * WE STILL NEED THE GENERAL TUPLE SPLIT CAPABILITY!!! * > > I've been thinking about making this post for a while ... with 6.5 > almost out the door, I guess now is a good time. > > I don't know what people have had in mind for 6.6, but I propose that > there ought to be three primary objectives for our next release: > > 1. Eliminate arbitrary restrictions on tuple size. > > 2. Eliminate arbitrary restrictions on query size (textual > length/complexity that is). > > 3. Cure within-statement memory leaks, so that processing large numbers > of tuples in one statement is reliable. I think the other hot item for 6.6 is outer joins. -- Bruce Momjian
Bruce Momjian wrote: > > I think the other hot item for 6.6 is outer joins. I would like to have 48 hours in day -:) Vadim
> Bruce Momjian wrote: > > > > I think the other hot item for 6.6 is outer joins. > > I would like to have 48 hours in day -:) > > Vadim > You and I are off the hook. Jan volunteered for foreign keys, and Thomas for outer joins. We can relax. :-) -- Bruce Momjian
Bruce Momjian wrote: > > > Bruce Momjian wrote: > > > > > > I think the other hot item for 6.6 is outer joins. > > > > I would like to have 48 hours in day -:) > > > > Vadim > > > > You and I are off the hook. Jan volunteered for foreign keys, and > Thomas for outer joins. We can relax. :-) I volunteered for savepoints -:)) Vadim
> Bruce Momjian wrote: > > > > > Bruce Momjian wrote: > > > > > > > > I think the other hot item for 6.6 is outer joins. > > > > > > I would like to have 48 hours in day -:) > > > > > > Vadim > > > > > > > You and I are off the hook. Jan volunteered for foreign keys, and > > Thomas for outer joins. We can relax. :-) > > I volunteered for savepoints -:)) Oh. Hey, I thought you were going to sleep? -- Bruce Momjian
Bruce Momjian wrote: > > > > > > > > > > > I think the other hot item for 6.6 is outer joins. > > > > > > > > I would like to have 48 hours in day -:) > > > > > > > > Vadim > > > > > > > > > > You and I are off the hook. Jan volunteered for foreign keys, and > > > Thomas for outer joins. We can relax. :-) > > > > I volunteered for savepoints -:)) > > Oh. > > Hey, I thought you were going to sleep? I just try to have at least 25 hours in day :) Vadim
Vadim Mikheev <vadim@krs.ru> writes: > Tom Lane wrote: >> 1. Eliminate arbitrary restrictions on tuple size. > This is not primary for me -:) Fair enough; it's not something I need either. But I see complaints about it constantly on the mailing lists; a lot of people do need it. > * Allow large text type to use large objects(Peter) > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > I like it very much, though I don't like that LO are stored > in separate files. But, but ... if we fixed the tuple-size problem then people could stop using large objects at all, and instead just put their data into tuples. I hate to see work going into improving LO support when we really ought to be phasing out the whole feature --- it's got *so* many conceptual and practical problems ... >> any chance of getting everyone to subscribe to a master plan like this? > No chance -:)) Yeah, I know ;-). But I was hoping to line up enough people so that these things have some chance of getting done. I doubt that any of these projects can be implemented by just one or two people; they all affect too much of the code. (For instance, eliminating query-size restrictions will require looking at all of the interface libraries, psql, pg_dump, and probably other apps, even though the fixes in the backend should be somewhat localized.) regards, tom lane
At 05:39 PM 6/3/99 -0400, Tom Lane wrote: >But, but ... if we fixed the tuple-size problem then people could stop >using large objects at all, and instead just put their data into tuples. >I hate to see work going into improving LO support when we really ought >to be phasing out the whole feature --- it's got *so* many conceptual >and practical problems ... Making them go away would be a real blessing. Oracle folk bitch about CLOBS and BLOBS and the like, too. They're a pain. - Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, and other goodies at http://donb.photo.net
Don Baccus wrote: > > At 05:39 PM 6/3/99 -0400, Tom Lane wrote: > > > * Allow large text type to use large objects(Peter) > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > I like it very much, though I don't like that LO are stored > > in separate files. This is known as "multi-representation" feature > > in Illustra. > > > >But, but ... if we fixed the tuple-size problem then people could stop > >using large objects at all, and instead just put their data into tuples. > >I hate to see work going into improving LO support when we really ought > >to be phasing out the whole feature --- it's got *so* many conceptual > >and practical problems ... > > Making them go away would be a real blessing. Oracle folk > bitch about CLOBS and BLOBS and the like, too. They're a > pain. Note: I told about "multi-representation" feature, not just about LO/CLOBS/BLOBS support. "Multi-representation" means that server stores tuple fields sometime inside the main relation file, sometime outside of it, but this is hidden from user and so people "just put their data into tuples". I think that putting big fields outside of main relation file is very good thing. BTW, this approach also allows what you are proposing - why not put not too big field (~ 8K or so) to another block of main file? BTW, I don't like using LOs as external storage. Implementation seems easy:
    struct varlena
    {
        int32 vl_len;
        char vl_dat[1];
    };
1. make vl_len uint32;
2. use vl_len & 0x80000000 as flag that underlying data is in another place;
3. put oid of external "relation" (where data is stored), blocknumber and item position (something else?) to vl_dat.
... ... ...
Vadim
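[Editor's sketch] The three steps Vadim lists can be illustrated roughly as follows. This is only a sketch under assumptions: the macro names, the helper functions, and the `ExternalDatumRef` layout are hypothetical and are not taken from the PostgreSQL source; the thread itself leaves the exact contents of `vl_dat` open ("something else?").

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names throughout; not the actual PostgreSQL definitions. */
#define VL_EXTERNAL_FLAG 0x80000000u /* step 2: high bit marks out-of-line data */
#define VL_LENGTH_MASK   0x7FFFFFFFu /* the remaining 31 bits hold the length */

/* Step 3 (assumed layout): what vl_dat might carry when the flag is set. */
typedef struct ExternalDatumRef
{
    uint32_t rel_oid;      /* oid of the external "relation" holding the data */
    uint32_t block_number; /* block within that relation */
    uint16_t item_pos;     /* item position within the block */
} ExternalDatumRef;

/* Nonzero when the length word says the data lives in another place. */
static inline int vl_is_external(uint32_t vl_len)
{
    return (vl_len & VL_EXTERNAL_FLAG) != 0;
}

/* Length with the flag bit stripped off. */
static inline uint32_t vl_length(uint32_t vl_len)
{
    return vl_len & VL_LENGTH_MASK;
}

/* Build a length word (step 1: treated as unsigned 32-bit). */
static inline uint32_t vl_make_len(uint32_t length, int external)
{
    assert(length <= VL_LENGTH_MASK);
    return external ? (length | VL_EXTERNAL_FLAG) : length;
}
```

Note that reserving the top bit caps the representable length at 2^31 - 1 bytes, which is the concern Peter Galbavy raises later in the thread.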
> Implementation seems easy: > > struct varlena > { > int32 vl_len; > char vl_dat[1]; > }; > > 1. make vl_len uint32; > 2. use vl_len & 0x80000000 as flag that underlying data is > in another place; > 3. put oid of external "relation" (where data is stored), > blocknumber and item position (something else?) to vl_dat. > ... Yes, it would be very nice to have this. -- Bruce Momjian
At 10:56 AM 6/4/99 +0800, Vadim Mikheev wrote: >Note: I told about "multi-representation" feature, not just about >LO/CLOBS/BLOBS support. "Multi-representation" means that server >stores tuple fields sometime inside the main relation file, >sometime outside of it, but this is hidden from user and so >people "just put their data into tuples". OK, in my first response I didn't pick up on your generalization, but I did respond with a generalization that implementation details should be hidden from the user. Which is what you're saying. As a compiler writer, this is more or less what I devoted my life to 20 years ago...of course, reasonable efficiency is a pre-condition if you're going to hide details from the user... I'll back off a bit, though, and say that a lot of DB users really don't need an enterprise engine like Oracle (i.e. something that requires a suite of $100K/yr DBAs :) There's a niche for a solid reliable, rich feature set, reasonably well-performing db out there, and this niche is ever-growing with the web. With $500 web servers sitting on $29.95/mo DSL lines, as does mine (http://donb.photo.net/tweeterdom), who wants to pay $6K to Oracle? - Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, and other goodies at http://donb.photo.net
At 10:56 AM 6/4/99 +0800, Vadim Mikheev wrote: >Note: I told about "multi-representation" feature, not just about >LO/CLOBS/BLOBS support. "Multi-representation" means that server >stores tuple fields sometime inside the main relation file, >sometime outside of it, but this is hidden from user and so >people "just put their data into tuples". I think that putting >big fields outside of main relation file is very good thing. Yes, it is, though "big" is relative (as computers grow). The key is to hide the details of where things are stored from the user, so the user doesn't really have to know what is "big" (today) vs. "small" (tomorrow or today, for that matter). I don't think it's so much the efficiency hit of having big items stored outside the main relation file, as the need for the user to know what's "big" and what's "small", that's the problem. I mean, my background is as a compiler writer for high-level languages...call me a 1970's idealist if you will, but I really think such things should be hidden from the user. - Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, and other goodies at http://donb.photo.net
Tom Lane wrote: > > I don't know what people have had in mind for 6.6, but I propose that > there ought to be three primary objectives for our next release: > > 1. Eliminate arbitrary restrictions on tuple size. > > 2. Eliminate arbitrary restrictions on query size (textual > length/complexity that is). > > 3. Cure within-statement memory leaks, so that processing large numbers > of tuples in one statement is reliable. I would add a few that I think would be important:
A. Add outer joins
B. Add the possibility to prepare statements and then execute them with a set of arguments. This already exists in SPI but for many C/S apps it would be desirable to have this in the fe/be protocol as well
C. Look over the protocol and unify the _binary_ representations of datatypes on wire. In fact each type already has two sets of in/out conversion functions in its definition tuple, one for disk and another for net, it's only that until now they are the same for all types and thus probably used wrong in some parts of code.
D. After B. and C., add a possibility to insert binary data in "(small)binary" field without relying on LOs or expensive (4x the size) quoting. Allow any characters in said binary field
E. To make 2. and B., C, D. possible, some more fundamental changes in fe/be-protocol may be needed. There seems to be some effort for a new fe/be communications mechanism using CORBA. But my proposal would be to adopt the X11 protocol which is quite light but still very clean, well understood and which can transfer arbitrary data in an efficient way. There are even "low bandwidth" variants of it for using over really slow links. Also some kinds of "out of band" provisions exist, that are used by window managers. It should also be trivial to adapt crypto wrappers/proxies (such as the one in ssh). The protocol is described in a document available from http://www.x.org
F. As a lousy alternative to 1. fix the LO storage. Currently _all_ of the LO files are kept in the same directory as the tables and indexes. This can bog down the whole database quite fast if one has lots of LOs and a file system that does linear scans on open (like ext2). A scheme where LOs are kept in subdirectories based on the hex representation of their oids would avoid that (so LO with OID 0x12345678 would be stored in $PG_DATA/DBNAME/LO/12/34/56/78.lo or maybe reversed $PG_DATA/DBNAME/LO/78/56/34/12.lo to distribute them more evenly in "buckets")
> All of these are fairly major projects, and it might be that we get > little or nothing else done if we take these on. But then, the other things to do _are_ little compared to these ;) > But these are the problems we've been hearing about over and over and > over. The LO thing (and lack of decent full-text indexing) is what has kept me using hybrid solutions where I keep the LO data and home-grown full-text indexes in file system outside of the database. > I think fixing these would do more to improve Postgres than > almost any other work we might do. Amen! ---------------- Hannu
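[Editor's sketch] The directory scheme in item F could be computed along these lines. This is a minimal sketch, not existing backend code: the function name `lo_path`, its signature, and the uppercase hex formatting are assumptions made for illustration; only the path shape (`.../LO/12/34/56/78.lo` and the reversed variant) comes from Hannu's message.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write "<base>/LO/12/34/56/78.lo" for OID 0x12345678
 * into buf, or "<base>/LO/78/56/34/12.lo" when reversed is nonzero.
 * Returns 0 on success, -1 if buf is too small.
 */
int lo_path(char *buf, size_t buflen, const char *base, uint32_t oid, int reversed)
{
    unsigned b[4];
    size_t need;

    assert(buf != NULL && base != NULL);

    /* sizeof of the literal counts the terminating NUL as well. */
    need = strlen(base) + sizeof("/LO/00/00/00/00.lo");
    if (need > buflen)
        return -1;

    /* Split the OID into its four bytes, most significant first. */
    b[0] = (oid >> 24) & 0xFFu;
    b[1] = (oid >> 16) & 0xFFu;
    b[2] = (oid >> 8) & 0xFFu;
    b[3] = oid & 0xFFu;

    if (reversed)
    {
        /* Least significant byte first, so consecutive OIDs spread out. */
        unsigned t;
        t = b[0]; b[0] = b[3]; b[3] = t;
        t = b[1]; b[1] = b[2]; b[2] = t;
    }

    snprintf(buf, buflen, "%s/LO/%02X/%02X/%02X/%02X.lo",
             base, b[0], b[1], b[2], b[3]);
    return 0;
}
```

With the reversed order, OIDs allocated one after another differ in the first directory component, which is the "more even buckets" effect the message mentions. The directories themselves would still have to be created before the file is opened; that step is left out here.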
On Thu, Jun 03, 1999 at 11:27:14PM -0400, Bruce Momjian wrote: > > Implementation seems easy: > > > > struct varlena > > { > > int32 vl_len; > > char vl_dat[1]; > > }; > > > > 1. make vl_len uint32; > > 2. use vl_len & 0x80000000 as flag that underlying data is > > in another place; > > 3. put oid of external "relation" (where data is stored), > > blocknumber and item position (something else?) to vl_dat. > > ... > > Yes, it would be very nice to have this. I hate to be fussy - normally I am just watching, but could we *please* keep any flag like above in another field. That way, when the size of an object reaches 2^31 we will not have legacy problems..
    struct varlena
    {
        size_t vl_len;
        int vl_flags;
        caddr_t vl_dat[1];
    };
(Please:) Regards, -- Peter Galbavy Knowledge Matters Ltd http://www.knowledge.com/
Hannu Krosing <hannu@trust.ee> writes: > E. to make 2. and B., C, D. possible, some more fundamental changes in > fe/be-protocol may be needed. There seems to be some effort for a new > fe/be communications mechanism using CORBA. > But my proposal would be to adopt the X11 protocol which is quite > light but still very clean, well understood and which can transfer > arbitrary data in an efficient way. ... but no one uses it for database work. If we're going to go to the trouble of overhauling the fe/be protocol, I think we should adopt something fairly standard, and that seems to mean CORBA. > F. As a lousy alternative to 1. fix the LO storage. Currently _all_ of > the LO files are kept in the same directory as the tables and > indexes. this can bog down the whole database quite fast Yes. I was thinking last night that there's no good reason not to just stick all the LOs into a single relation --- or actually two relations, one having a row per LO (which would really just act to tell you what LOs exist, and perhaps store access-privileges info) and one that has a row per LO chunk, with columns LONumber, Offset, Data rather than just Offset and Data as is done now. The existing index on Offset would be replaced by a multi-index on LONumber and Offset. In this scheme the LONumbers need not be tied hard-and-fast to OIDs, but could actually be anything you wanted, which would be much nicer for dump/reload purposes. However, I am loathe to put *any* work into improving LOs, since I think the right answer is to get rid of the need for the durn things by eliminating the size restrictions on regular tuples. regards, tom lane
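[Editor's sketch] Tom's per-chunk relation, keyed by LONumber and Offset, amounts to an ordered lookup on a two-part key. The sketch below models that with an in-memory array sorted the way the proposed multi-column index would order it. The struct, the function names, and the added `length` field are hypothetical; a real implementation would go through the index access methods rather than a C array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row of the hypothetical per-chunk relation. */
typedef struct LOChunk
{
    uint32_t lo_number;        /* which large object this chunk belongs to */
    uint32_t offset;           /* byte offset of the chunk within that object */
    uint32_t length;           /* number of data bytes in the chunk */
    const unsigned char *data; /* the chunk contents */
} LOChunk;

/* Compare two (lo_number, offset) keys in index order. */
int lo_chunk_key_cmp(uint32_t lo_a, uint32_t off_a, uint32_t lo_b, uint32_t off_b)
{
    if (lo_a != lo_b)
        return lo_a < lo_b ? -1 : 1;
    if (off_a != off_b)
        return off_a < off_b ? -1 : 1;
    return 0;
}

/*
 * Find the chunk holding byte `pos` of object `lo_number` in an array sorted
 * by (lo_number, offset). Returns NULL when no chunk covers that byte.
 */
const LOChunk *lo_find_chunk(const LOChunk *chunks, size_t n,
                             uint32_t lo_number, uint32_t pos)
{
    size_t lo = 0;
    size_t hi = n;
    const LOChunk *c;

    assert(chunks != NULL || n == 0);

    /* Binary search for the first chunk whose key is greater than the target. */
    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (lo_chunk_key_cmp(chunks[mid].lo_number, chunks[mid].offset,
                             lo_number, pos) <= 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return NULL;

    /* The chunk just before it is the last one starting at or before pos. */
    c = &chunks[lo - 1];
    if (c->lo_number != lo_number)
        return NULL;
    if (pos - c->offset >= c->length)
        return NULL;
    return c;
}
```

Because the object number is just the leading key column, nothing here requires it to be an OID, which matches the point about dump/reload in the message.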
Vadim Mikheev <vadim@krs.ru> writes: > Note: I told about "multi-representation" feature, not just about > LO/CLOBS/BLOBS support. "Multi-representation" means that server > stores tuple fields sometime inside the main relation file, > sometime outside of it, but this is hidden from user and so > people "just put their data into tuples". I think that putting > big fields outside of main relation file is very good thing. Ah, I see what you mean. If you think that is easier than splitting tuples, we could go that way. We'd have a limit of about 500 fields in a tuple (maybe less if the tuple contains "small" fields that are not pushed to another place). That's annoying if the goal is to eliminate limits, but I think it would be unlikely to be a big problem in practice. Perhaps a better way is to imagine these "pointers to another place" to be just part of the tuple structure on disk, without tying them to individual fields. In other words, the tuple's data is still a string of fields, but now you can have that data either right there with the tuple header, or pointed to by a list of "indirect links" that are stored with the tuple header. (Kinda like direct vs indirect blocks in Unix filesystem.) You can chop the tuple data into blocks without regard for field boundaries if you do it that way. I think that might be better than altering the definition of varlena --- it'd be visible only to the tuple read and write mechanisms, not to everything in the executor that deals with varlena fields... regards, tom lane
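[Editor's sketch] The "indirect links" idea, where the tuple's bytes are chopped into blocks without regard for field boundaries, might look like this at the tuple read layer. Everything here is an assumption for illustration: the type names are invented, and plain memory pointers stand in for what would be on-disk block references in a real tuple header.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical link to one out-of-line piece of a tuple's data. */
typedef struct TupleChunkLink
{
    const unsigned char *data; /* stands in for a block/item reference */
    size_t len;                /* bytes stored in this piece */
} TupleChunkLink;

/* Hypothetical stored tuple: data is either inline or reached via links. */
typedef struct StoredTuple
{
    size_t total_len;                 /* length of the whole field string */
    const unsigned char *inline_data; /* non-NULL: data sits with the header */
    size_t nlinks;                    /* number of indirect links otherwise */
    const TupleChunkLink *links;
} StoredTuple;

/*
 * Reassemble the tuple's data into one contiguous buffer, so code above the
 * read layer never sees the split. Returns 0 on success, -1 if out is too
 * small or the link lengths disagree with total_len.
 */
int tuple_assemble(const StoredTuple *t, unsigned char *out, size_t outlen)
{
    size_t copied = 0;
    size_t i;

    assert(t != NULL && out != NULL);

    if (t->total_len > outlen)
        return -1;

    if (t->inline_data != NULL)
    {
        memcpy(out, t->inline_data, t->total_len);
        return 0;
    }

    for (i = 0; i < t->nlinks; i++)
    {
        if (t->links[i].len > t->total_len - copied)
            return -1; /* pieces add up to more than the header claims */
        memcpy(out + copied, t->links[i].data, t->links[i].len);
        copied += t->links[i].len;
    }
    return copied == t->total_len ? 0 : -1;
}
```

The design point from the message is visible in the signature: callers get back an ordinary string of fields, so the executor's handling of varlena values would not need to change.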
On 04-Jun-99 Tom Lane wrote: > However, I am loathe to put *any* work into improving LOs, since I think > the right answer is to get rid of the need for the durn things by > eliminating the size restrictions on regular tuples. Is this doable? I just looked at the list of datatypes and didn't see binary as one of them. Imagining a Real Estate database with pictures of homes (inside and out), etc. or an employee database with mugshots of the employees, what datatype would you use to store the pictures (short of just storing a filename of the pic)? Vince. -- ========================================================================== Vince Vielhaber -- KA8CSH email: vev@michvhf.com flame-mail: /dev/null # include <std/disclaimers.h> TEAM-OS2 Online Campground Directory http://www.camping-usa.com Online Giftshop Superstore http://www.cloudninegifts.com ==========================================================================
Vince Vielhaber <vev@michvhf.com> writes: > On 04-Jun-99 Tom Lane wrote: >> However, I am loathe to put *any* work into improving LOs, since I think >> the right answer is to get rid of the need for the durn things by >> eliminating the size restrictions on regular tuples. > Is this doable? I just looked at the list of datatypes and didn't see > binary as one of them. bytea ... even if we didn't have one, inventing it would be trivial. (Although I wonder whether pg_dump copes with arbitrary data in fields properly ... I think there are still some issues about COPY protocol not being fully 8-bit-clean...) As someone else pointed out, you'd still want an equivalent of lo_read/lo_write, but now it would mean fetch or put N bytes at an offset of M bytes within the value of field X of tuple Y in some relation. Otherwise field X is pretty much like any other item in the database. I suppose it'd only make sense to allow random data to be fetched/stored in a bytea field --- other datatypes would want to constrain the data to valid values... regards, tom lane
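[Editor's sketch] The lo_read/lo_write equivalent Tom describes, "fetch or put N bytes at an offset of M bytes within the value of field X", reduces to a bounds-checked copy once the field value is in hand. The function names `field_read` and `field_write` are hypothetical, and the sketch assumes writes stay within the existing value rather than extending it; the thread does not say which behavior is intended.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy up to nbytes from value[offset..] into out.
 * Returns the number of bytes actually copied (0 when offset is past the end).
 */
size_t field_read(const unsigned char *value, size_t value_len,
                  size_t offset, size_t nbytes, unsigned char *out)
{
    size_t avail;

    assert(value != NULL && out != NULL);

    if (offset >= value_len)
        return 0;
    avail = value_len - offset;
    if (nbytes > avail)
        nbytes = avail; /* clamp to the end of the value */
    memcpy(out, value + offset, nbytes);
    return nbytes;
}

/*
 * Overwrite up to nbytes of value starting at offset with bytes from in.
 * Assumption: the value is never extended; writes past the end are clipped.
 * Returns the number of bytes actually written.
 */
size_t field_write(unsigned char *value, size_t value_len,
                   size_t offset, size_t nbytes, const unsigned char *in)
{
    size_t avail;

    assert(value != NULL && in != NULL);

    if (offset >= value_len)
        return 0;
    avail = value_len - offset;
    if (nbytes > avail)
        nbytes = avail;
    memcpy(value + offset, in, nbytes);
    return nbytes;
}
```

As the message notes, raw byte access like this only makes sense for a type such as bytea; other datatypes would need to validate the result of a partial write.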
> > eliminating the size restrictions on regular tuples. > Is this doable? Presumably we would have to work out a "chunking" client/server protocol to allow sending very large tuples. Also, it would need to report the size of the tuple before it shows up, to allow very large rows to be caught correctly. - Thomas -- Thomas Lockhart lockhart@alumni.caltech.edu South Pasadena, California
Thomas Lockhart <lockhart@alumni.caltech.edu> writes: >>>> eliminating the size restrictions on regular tuples. >> Is this doable? > Presumably we would have to work out a "chunking" client/server > protocol to allow sending very large tuples. I don't really see a need to change the protocol. It's true that a single tuple containing a couple dozen megabytes (per someone's recent example) would stress the system unpleasantly, but that would be true in a *lot* of ways. Perhaps we should plan on keeping the LO feature to allow for really huge objects. As far as I've seen, 99% of users are not interested in storing objects that are so large that handling them as single tuples would pose serious performance problems. It's just that a hard limit at 8K (or any other particular small number) is annoying. regards, tom lane
> C. Look over the protocol and unify the _binary_ representations of > datatypes on wire. in fact each type already has two sets of > in/out conversion functions in its definition tuple, one for disk and > another for net, it's only that until now they are the same for > all types and thus probably used wrong in some parts of code. Added to TODO: * remove duplicate type in/out functions for disk and net > > D. After B. and C., add a possibility to insert binary data > in "(small)binary" field without relying on LOs or expensive > (4x the size) quoting. Allow any characters in said binary field I will add this to the TODO list if you can tell me: how does the user pass this into the backend via a query? * Add non-large-object binary field > F. As a lousy alternative to 1. fix the LO storage. Currently _all_ of > the LO files are kept in the same directory as the tables and > indexes. > this can bog down the whole database quite fast if one has lots of LOs > and > a file system that does linear scans on open (like ext2). > A scheme where LOs are kept in subdirectories based on the hex > representation of their oids would avoid that (so LO with OID > 0x12345678 > would be stored in $PG_DATA/DBNAME/LO/12/34/56/78.lo or maybe > reversed > $PG_DATA/DBNAME/LO/78/56/34/12.lo to distribute them more evenly in > "buckets" I have already added a TODO item to use hash directories for large objects. Probably single or double-level 256 directory buckets are enough:
    04/4A/file
    09/B3/file
-- Bruce Momjian
OK, question answered, TODO item added: * Add non-large-object binary field > > Is this doable? I just looked at the list of datatypes and didn't see > > binary as one of them. > > bytea ... even if we didn't have one, inventing it would be trivial. > (Although I wonder whether pg_dump copes with arbitrary data in fields > properly ... I think there are still some issues about COPY protocol > not being fully 8-bit-clean...) > > As someone else pointed out, you'd still want an equivalent of > lo_read/lo_write, but now it would mean fetch or put N bytes at an > offset of M bytes within the value of field X of tuple Y in some > relation. Otherwise field X is pretty much like any other item in the > database. I suppose it'd only make sense to allow random data to be > fetched/stored in a bytea field --- other datatypes would want to > constrain the data to valid values... > > regards, tom lane > > -- Bruce Momjian