Last thoughts about LONG
From | wieck@debis.com (Jan Wieck) |
---|---|
Subject | Last thoughts about LONG |
Date | |
Msg-id | m11wpGy-0003kGC@orion.SAPserv.Hamburg.dsh.de |
In reply to | Re: [HACKERS] LONG (wieck@debis.com (Jan Wieck)) |
Responses |
Re: [HACKERS] Last thoughts about LONG
(Peter Eisentraut <e99re41@DoCS.UU.SE>)
Re: [HACKERS] Last thoughts about LONG (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
I wrote:

> Bruce Momjian wrote:
>
> > Should we use large objects for this, and beef them up. Seems that
> > would be a good way. I have considered putting them in a hash
> > bucket/directory tree for faster access to lots of large objects.
> >
> > There is a lot to say about storing long tuples outside the tables
> > because long tuples fill cache buffers and make short fields longer
> > to access.
>
> I thought to use a regular table. Of course, it will eat
> buffers ...

    Looking at my actual implementation concept, I'm not sure
    whether it will win or lose compared against text itself!
    Amazing, but I think it could win already at relatively
    small text sizes (1-2K is IMHO small compared to what this
    type could store).

    Well, the implementation details. I really would like some
    little comments to verify it's really complete before
    starting.

    - A new field "rellongrelid" of type Oid is added to
      pg_class. It contains the Oid of the long-value relation,
      or the invalid Oid for tables that have no LONG
      attributes.

    - At CREATE TABLE, a long-value relation named
      "_LONG<tablename>" is created for those tables that need
      it, and of course dropped and truncated as appropriate.
      The schema of this table is

          rowid       Oid,    -- oid of our main data row
          rowattno    int2,   -- the attribute number in main data
          chunk_seq   int4,   -- the part number of this data chunk
          chunk       text    -- the content of this data chunk

      There is a unique index defined on (rowid, rowattno).

    - The new data type is of variable size with the following
      header:

          typedef struct LongData
          {
              int32       varsize;
              int32       datasize;
              Oid         longrelid;
              Oid         rowid;
              int16       rowattno;
          } LongData;

      The type's input function is very simple: allocate
      sizeof(LongData) + strlen(input), set varsize to that,
      datasize to strlen(input), and the rest to invalid and 0.
      Then copy the input after the struct.

      The type's output function decides, based on longrelid,
      what to do.
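The input-function behaviour described above can be sketched as a small standalone C program. The PostgreSQL-internal typedefs (int32, int16, Oid) and InvalidOid are stubbed here with stdint equivalents; this is an illustration of the layout, not the actual PostgreSQL code.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for PostgreSQL's internal types (assumptions for this sketch) */
typedef int32_t  int32;
typedef int16_t  int16;
typedef uint32_t Oid;
#define InvalidOid ((Oid) 0)

typedef struct LongData
{
    int32   varsize;    /* total size, header included */
    int32   datasize;   /* length of the value itself */
    Oid     longrelid;  /* long-value relation, or InvalidOid */
    Oid     rowid;      /* owning row, once stored externally */
    int16   rowattno;   /* owning attribute, once stored externally */
} LongData;

/* Input function as described: fill the header, set the external
 * references to invalid/0, and copy the raw bytes after the struct. */
static LongData *
long_in(const char *input)
{
    size_t    len = strlen(input);
    LongData *ld  = malloc(sizeof(LongData) + len);

    ld->varsize   = (int32) (sizeof(LongData) + len);
    ld->datasize  = (int32) len;
    ld->longrelid = InvalidOid;   /* not stored in a longrel yet */
    ld->rowid     = InvalidOid;
    ld->rowattno  = 0;
    memcpy((char *) ld + sizeof(LongData), input, len);
    return ld;
}
```

With longrelid still invalid, the output function would simply emit the bytes following the header.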
      If it's invalid, it just outputs the bytes stored after
      the struct (it must be a datum that resulted from an
      input operation). If longrelid isn't invalid, it does an
      index scan on that relation, fetching all tuples that
      match rowid and attno. Since it knows the datasize, it
      doesn't need them in the correct order; it can put them
      at the right places in the allocated return buffer by
      their chunk_seq.

    - For now (until we have enough experience to judge) I
      think it would be better to forbid ALTER TABLE when LONG
      attributes are involved. Sure, it must be implemented
      eventually, but IMHO not in the first evaluation attempt.

    Now how the data goes in and out of the longrel.

    - On heap_insert(), we look for non-NULL LONG attributes in
      the tuple. Whether there could be any can simply be seen
      by looking at the rellongrelid in rd_rel. We fetch the
      value either from the memory after LongData or by using
      the type's output function (for fetching it from the
      relation where it currently is!). Then we simply break it
      up into single chunks and store them with our tuple's
      information.

      Now we need to do something tricky - to shrink the main
      data tuple size, we form a new heap tuple with the datums
      of the original one. But we replace all LongData items we
      stored by faked ones, where the varsize is
      sizeof(LongData) and all the other information is set up
      appropriately. We append that faked tuple instead, copy
      the resulting information into the original tuple's
      header, and throw it away.

      This is a point where I'm not totally sure. Could it
      possibly be better or required to copy the entire faked
      tuple over the one we should have stored? It could never
      need more space, so that wouldn't be a problem.

    - On heap_replace(), we check all LONG attributes whether
      they are NULL, or whether the information in longrelid,
      rowid and rowattno doesn't match our rellongrelid,
      tupleid, and attno. In either case this attribute might
      have old content in the longrel, which we need to delete
      first.
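The out-of-order reassembly in the output function can be sketched like this. The Chunk struct, the fixed CHUNK_SIZE, and the function name are assumptions for illustration; in reality the chunks would come from the index scan on the long-value relation.

```c
#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE 4            /* assumed fixed chunk payload size */

/* Hypothetical chunk as returned by the index scan (sketch only) */
typedef struct Chunk
{
    int     chunk_seq;              /* part number within the value */
    int     len;                    /* bytes used in this chunk */
    char    data[CHUNK_SIZE + 1];
} Chunk;

/* Because datasize is known up front, chunks arriving in any order
 * can be placed directly at offset chunk_seq * CHUNK_SIZE. */
static char *
reassemble(const Chunk *chunks, int nchunks, int datasize)
{
    char   *buf = malloc(datasize + 1);
    int     i;

    for (i = 0; i < nchunks; i++)
        memcpy(buf + chunks[i].chunk_seq * CHUNK_SIZE,
               chunks[i].data, chunks[i].len);
    buf[datasize] = '\0';
    return buf;
}
```

This is why the scan needs no ORDER BY: every chunk's final position is computable from chunk_seq alone.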
      The rest of the operation is exactly like for
      heap_insert(), except when all the attribute's
      information did match - then it's our own OLD value that
      wasn't changed, so we can simply skip it; the existing
      data is still valid.

    - heap_delete() is so simple that I won't explain it.

    Now I hear you asking "how could this overhead be a win?"
    :-)

    That's easy to explain: as long as you don't use a LONG
    column in the WHERE clause, when will the data be fetched?
    At the time it's finally clear that it's needed. That's
    when a result tuple is sent to the client (type output) or
    when a tuple resulting from INSERT ... SELECT should be
    stored.

    Thus, all the tuples moving around in the execution tree,
    getting joined together, abused by sorts and aggregates and
    filtered out again, always contain only the small LongData
    struct, not the data itself. Weren't there recently reports
    about too expensive sorts due to huge tuple sizes?

    Another bonus would be this: what happens on an UPDATE to a
    table having LONG attributes? If the attribute is not
    modified, the OLD LongData will be found in the targetlist,
    and we won't waste any space by storing the same
    information again. IIRC that was one of the biggest
    concerns about storing huge data in tuples, but here it
    disappears without leaving a trace - funny, eh?

    It is so simple that I fear I made some mistake somewhere.
    But where?


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #
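The heap_replace() decision described above - delete old chunks unless the value is our own unchanged OLD one - reduces to a small predicate. The struct and function names here are hypothetical; only the three-way match on longrelid/rowid/rowattno comes from the text.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Oid;
typedef int16_t  int16;

/* The back-reference part of the LongData header (sketch) */
typedef struct LongRef
{
    Oid     longrelid;
    Oid     rowid;
    int16   rowattno;
} LongRef;

/* True if any old chunks in the longrel must be deleted: the new
 * value is NULL, or its back-reference doesn't point at this
 * table/row/attribute. False means it is our own unchanged OLD
 * value, so the existing chunks stay valid and we skip it. */
static bool
old_chunks_need_delete(const LongRef *newval, bool isnull,
                       Oid rellongrelid, Oid tupleid, int16 attno)
{
    if (isnull)
        return true;
    return !(newval->longrelid == rellongrelid &&
             newval->rowid == tupleid &&
             newval->rowattno == attno);
}
```

The "skip" branch of this predicate is exactly the UPDATE bonus mentioned above: an unmodified LONG attribute is never rewritten.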