Обсуждение: extensible external toast tuple support
Hi, In http://archives.postgresql.org/message-id/20130216164231.GA15069%40awork2.anarazel.de I presented the need for 'indirect' toast tuples which point into memory instead of a toast table. In the comments to that proposal, off-list and in-person talks the wish to make that a more general concept has been voiced. The previous patch used varattrib_1b_e.va_len_1be to discern between different types of external tuples. That obviously only works if the data sizes of all possibly stored datum types are distinct which isn't nice. So what the newer patch now does is to rename that field into 'va_tag' and decide based on that what kind of Datum we have. To get the actual length of that datum there now is a VARTAG_SIZE() macro which maps the tags back to size. To keep on-disk compatibility the size of an external toast tuple containing a varatt_external is used as its tag value. This should allow for fairly easy development of a new compression scheme for out-of-line toast tuples. It will *not* work for compressed inline tuples (i.e. VARATT_4B_C). I am not convinced that that is a problem or that if it is, that it cannot be solved separately. FWIW, in some quick microbenchmarks I couldn't find any performance difference due to the slightly more complex size computation which I do *not* find surprising. Opinions? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
On Thu, May 30, 2013 at 7:42 AM, Andres Freund <andres@2ndquadrant.com> wrote: > In > http://archives.postgresql.org/message-id/20130216164231.GA15069%40awork2.anarazel.de > I presented the need for 'indirect' toast tuples which point into memory > instead of a toast table. In the comments to that proposal, off-list and > in-person talks the wish to make that a more general concept has > been voiced. > > The previous patch used varattrib_1b_e.va_len_1be to discern between > different types of external tuples. That obviously only works if the > data sizes of all possibly stored datum types are distinct which isn't > nice. So what the newer patch now does is to rename that field into > 'va_tag' and decide based on that what kind of Datum we have. To get the > actual length of that datum there now is a VARTAG_SIZE() macro which > maps the tags back to size. > To keep on-disk compatibility the size of an external toast tuple > containing a varatt_external is used as its tag value. > > This should allow for fairly easy development of a new compression > scheme for out-of-line toast tuples. It will *not* work for compressed > inline tuples (i.e. VARATT_4B_C). I am not convinced that that is a > problem or that if it is, that it cannot be solved separately. > > FWIW, in some quick microbenchmarks I couldn't find any performance > difference due to the slightly more complex size computation which I do > *not* find surprising. > > Opinions? Seems pretty sensible to me. The patch is obviously WIP but the direction seems fine to me. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2013-05-31 23:42:51 -0400, Robert Haas wrote: > > This should allow for fairly easy development of a new compression > > scheme for out-of-line toast tuples. It will *not* work for compressed > > inline tuples (i.e. VARATT_4B_C). I am not convinced that that is a > > problem or that if it is, that it cannot be solved separately. > Seems pretty sensible to me. The patch is obviously WIP but the > direction seems fine to me. So, I played a bit more with this, with an eye towards getting this into a non WIP state, but: While I still think the method for providing indirect external Datum support is fine, I don't think my sketch for providing extensible compression is. As mentioned upthread we also have compressed datums inline as VARATT_4B_C datums. The way toast_insert_or_update() is that when it finds it needs to shrink a Datum it tries to compress it *inline* and only if that still is to big it gets stored out of line. Changing that doesn't sound like a good idea since it a) would make an already complicated function even more complicated and b) would likely make the whole thing slower since we would frequently compress with two different methods. So I think for compressed tuples we need an independent trick that also works for inline compressed tuples: The current way 4B_C datums work is that they are basically a normal 4B Datum (but discernible by a different bit in the non-length part of the length). Such compressed Datums store the uncompressed length of a Datum in its first 4 bytes. Since we cannot have uncompressed Datums longer than 1GB due to varlena limitations 2 bits in that lenght are free to discern different compression algorithms. So what my (absolutely prototype) patch does is to use those two bits to discern different compression algorithms. Currently it simply assumes that '00' is pglz while '01' is snappy-c. That would leave us with two other possible algorithms ('11' and '10'), but we could easily enough extend that to more algorithms if we want by not regarding the first 4 bytes as a length word but as the compression algorithm indicator if the two high bits are set. So, before we go even more into details here are some benchmark results based on playing with a partial dump (1.1GB) of the public pg mailing list archives (Thanks Alvaro!): BEGIN; SET toast_compression_algo = 0; -- pglz CREATE TABLE messages ( ... ); \i ~/tmp/messages.sane.dump Time: 43053.786 ms ALTER TABLE messages RENAME TO messages_pglz; COMMIT; BEGIN; SET toast_compression_algo = 1; -- snappy CREATE TABLE messages ( ... ); \i ~/tmp/messages.sane.dump Time: 30340.210 ms ALTER TABLE messages RENAME TO messages_snappy; COMMIT; postgres=# \dt+ List of relations Schema | Name | Type | Owner | Size | Description --------+-----------------+-------+--------+--------+------------- public | messages_pglz | table | andres | 526 MB | public | messages_snappy | table | andres | 523 MB | Ok, so while the data size didn't change all that much the compression was quite noticeably faster. With snappy the most visible bottleneck is COPY not compression although it's still in the top 3 functions... So what about data reading? postgres=# COPY messages_pglz TO '/dev/null' WITH BINARY; COPY 86953 Time: 3825.241 ms postgres=# COPY messages_snappy TO '/dev/null' WITH BINARY; COPY 86953 Time: 3674.844 ms Ok, so here the performance difference is relatively small. Turns out that's because most of the time is spent in the output routines, even though we are using BINARY mode. tsvector_send is expensive. postgres=# COPY (SELECT rawtxt FROM messages_pglz) TO '/dev/null' WITH BINARY; COPY 86953 Time: 2180.512 ms postgres=# COPY (SELECT rawtxt FROM messages_snappy) TO '/dev/null' WITH BINARY; COPY 86953 Time: 1782.810 Ok, so here the benefits are are already nicer. Imo this shows that using a different compression algorithm is quite a good idea. Important questions are: 1) Which algorithms do we want? I think snappy is a good candidate but I mostly chose it because it's BSD licenced, widely used, and has a C implementation with a useable API. LZ4 might be another interesting choice. Another slower algorithm with higher compression ratio would also be a good idea for many applications. 2) Do we want to build infrastructure for more than 3 compression algorithms? We could delay that decision till we add the 3rd. 3) Surely choosing the compression algorithm via GUC ala SET toast_compression_algo = ... isn't the way to go. I'd say a storage attribute is more appropriate? 4) The prototype removed knowledge about the internals of compression from postgres.h which imo is a good idea, but that is debatable. 5) E.g. snappy stores the uncompressed length internally as a varint, but I don't think there's a way to benefit from that on little endian machines since the highbits we use to discern from pglz are actually stored 4 bytes in... Two patches attached: 1) add snappy to src/common. The integration needs some more work. 2) Combined patch that adds indirect tuple and snappy compression. Those coul be separated, but this is an experiment so far... Comments? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
On Wed, Jun 5, 2013 at 11:01 AM, Andres Freund <andres@2ndquadrant.com> wrote: > On 2013-05-31 23:42:51 -0400, Robert Haas wrote: >> > This should allow for fairly easy development of a new compression >> > scheme for out-of-line toast tuples. It will *not* work for compressed >> > inline tuples (i.e. VARATT_4B_C). I am not convinced that that is a >> > problem or that if it is, that it cannot be solved separately. > >> Seems pretty sensible to me. The patch is obviously WIP but the >> direction seems fine to me. > > So, I played a bit more with this, with an eye towards getting this into > a non WIP state, but: While I still think the method for providing > indirect external Datum support is fine, I don't think my sketch for > providing extensible compression is. I didn't really care about doing (and don't really want to do) both things in the same patch. I just didn't want the patch to shut the door to extensible compression in the future. > Important questions are: > 1) Which algorithms do we want? I think snappy is a good candidate but I > mostly chose it because it's BSD licenced, widely used, and has a C > implementation with a useable API. LZ4 might be another interesting > choice. Another slower algorithm with higher compression ratio > would also be a good idea for many applications. I have no opinion on this. > 2) Do we want to build infrastructure for more than 3 compression > algorithms? We could delay that decision till we add the 3rd. I think we should leave the door open, but I don't have a compelling desire to get too baroque for v1. Still, maybe if the first byte has a 1 in the high-bit, the next 7 bits should be defined as specifying a compression algorithm. 3 compression algorithms would probably last us a while; but 127 should last us, in effect, forever. > 3) Surely choosing the compression algorithm via GUC ala SET > toast_compression_algo = ... isn't the way to go. I'd say a storage > attribute is more appropriate? The way we do caching right now supposes that attoptions will be needed only occasionally. It might need to be revised if we're going to need it all the time. Or else we might need to use a dedicated pg_class column. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2013-06-07 10:04:15 -0400, Robert Haas wrote: > On Wed, Jun 5, 2013 at 11:01 AM, Andres Freund <andres@2ndquadrant.com> wrote: > > On 2013-05-31 23:42:51 -0400, Robert Haas wrote: > >> > This should allow for fairly easy development of a new compression > >> > scheme for out-of-line toast tuples. It will *not* work for compressed > >> > inline tuples (i.e. VARATT_4B_C). I am not convinced that that is a > >> > problem or that if it is, that it cannot be solved separately. > > > >> Seems pretty sensible to me. The patch is obviously WIP but the > >> direction seems fine to me. > > > > So, I played a bit more with this, with an eye towards getting this into > > a non WIP state, but: While I still think the method for providing > > indirect external Datum support is fine, I don't think my sketch for > > providing extensible compression is. > > I didn't really care about doing (and don't really want to do) both > things in the same patch. I just didn't want the patch to shut the > door to extensible compression in the future. Oh. I don't want to actually commit it in the same patch either. But to keep the road for extensible compression open we kinda need to know what the way to do that is. Turns out it's an independent thing that doesn't reuse any of the respective infrastructures. I only went so far to actually implement the compression because a) my previous thoughts about how it could work were bogus b) it was fun. Turns out the benefits are imo big enough to make it worth pursuing further. > > 2) Do we want to build infrastructure for more than 3 compression > > algorithms? We could delay that decision till we add the 3rd. > > I think we should leave the door open, but I don't have a compelling > desire to get too baroque for v1. Still, maybe if the first byte has > a 1 in the high-bit, the next 7 bits should be defined as specifying a > compression algorithm. 3 compression algorithms would probably last > us a while; but 127 should last us, in effect, forever. The problem is that to discern from pglz on little endian the byte with the two high bits unset is actually the fourth byte in a toast datum. So we would need to store it in the 5th byte or invent some more complicated encoding scheme. So I think we should just define '00' as pglz, '01' as xxx, '10' as yyy and '11' as storing the schema in the next byte. > > 3) Surely choosing the compression algorithm via GUC ala SET > > toast_compression_algo = ... isn't the way to go. I'd say a storage > > attribute is more appropriate? > > The way we do caching right now supposes that attoptions will be > needed only occasionally. It might need to be revised if we're going > to need it all the time. Or else we might need to use a dedicated > pg_class column. Good point. It probably belongs right besides attstorage, seems to be the most consistent choice anyway. Alternatively, if we only add one form of compression, we can just always store in snappy/lz4/.... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Jun 7, 2013 at 10:30 AM, Andres Freund <andres@2ndquadrant.com> wrote: > Turns out the benefits are imo big enough to make it worth pursuing > further. Yeah, those were nifty numbers. > The problem is that to discern from pglz on little endian the byte with > the two high bits unset is actually the fourth byte in a toast datum. So > we would need to store it in the 5th byte or invent some more > complicated encoding scheme. > > So I think we should just define '00' as pglz, '01' as xxx, '10' as yyy > and '11' as storing the schema in the next byte. Not totally following, but I'm fine with that. >> > 3) Surely choosing the compression algorithm via GUC ala SET >> > toast_compression_algo = ... isn't the way to go. I'd say a storage >> > attribute is more appropriate? >> >> The way we do caching right now supposes that attoptions will be >> needed only occasionally. It might need to be revised if we're going >> to need it all the time. Or else we might need to use a dedicated >> pg_class column. > > Good point. It probably belongs right besides attstorage, seems to be > the most consistent choice anyway. Possibly, we could even store it in attstorage. We're really only using two bits of that byte right now, so just invent some more letters. > Alternatively, if we only add one form of compression, we can just > always store in snappy/lz4/.... Not following. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2013-06-07 10:44:24 -0400, Robert Haas wrote: > On Fri, Jun 7, 2013 at 10:30 AM, Andres Freund <andres@2ndquadrant.com> wrote: > > Turns out the benefits are imo big enough to make it worth pursuing > > further. > > Yeah, those were nifty numbers. > > > The problem is that to discern from pglz on little endian the byte with > > the two high bits unset is actually the fourth byte in a toast datum. So > > we would need to store it in the 5th byte or invent some more > > complicated encoding scheme. > > > > So I think we should just define '00' as pglz, '01' as xxx, '10' as yyy > > and '11' as storing the schema in the next byte. > > Not totally following, but I'm fine with that. Currently on a little endian system the pglz header contains the length in the first four bytes as: [dddddddd][dddddddd][dddddddd][xxdddddd] Where dd are valid length bits for pglz and xx are the two bits which are always zero since we only ever store up to 1GB. We can redefine 'xx' to mean whatever we want but we cannot change it's place. > >> > 3) Surely choosing the compression algorithm via GUC ala SET > >> > toast_compression_algo = ... isn't the way to go. I'd say a storage > >> > attribute is more appropriate? > >> > >> The way we do caching right now supposes that attoptions will be > >> needed only occasionally. It might need to be revised if we're going > >> to need it all the time. Or else we might need to use a dedicated > >> pg_class column. > > > > Good point. It probably belongs right besides attstorage, seems to be > > the most consistent choice anyway. > > Possibly, we could even store it in attstorage. We're really only > using two bits of that byte right now, so just invent some more > letters. Hm. Possible, but I don't think that's worth it. There's a padding byte before attinhcount anyway. Storing the preferred location in attstorage (plain, preferred-internal, external, preferred-external) separately from the compression seems to make sense to me. > > Alternatively, if we only add one form of compression, we can just > > always store in snappy/lz4/.... > > Not following. I mean, we don't necessarily need to make it configurable if we just add one canonical new "better" compression format. I am not sure that's sufficient since I can see usecases for 'very fast but not too well compressed' and 'very well compressed but slow', but I am personally not really interested in the second case, so ... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 06/07/2013 04:54 PM, Andres Freund wrote: > > I mean, we don't necessarily need to make it configurable if we just add > one canonical new "better" compression format. I am not sure that's > sufficient since I can see usecases for 'very fast but not too well > compressed' and 'very well compressed but slow', but I am personally not > really interested in the second case, so ... As DE-comression is often still fast for slow-but-good compression, the obvious use case for 2nd is read-mostly data > > Greetings, > > Andres Freund > -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
On 2013-06-07 17:27:28 +0200, Hannu Krosing wrote: > On 06/07/2013 04:54 PM, Andres Freund wrote: > > > > I mean, we don't necessarily need to make it configurable if we just add > > one canonical new "better" compression format. I am not sure that's > > sufficient since I can see usecases for 'very fast but not too well > > compressed' and 'very well compressed but slow', but I am personally not > > really interested in the second case, so ... > As DE-comression is often still fast for slow-but-good compression, > the obvious use case for 2nd is read-mostly data Well. Those algorithms still are up to magnitude or so slower decompressing than something like snappy, lz4 or even pglz while the compression ratio usually is only like 50-80% improved... So you really need a good bit of compressible data (so the amount of storage actually hurts) that you don't read all that often (since you then would bottleneck on compression too often). That's just not something I run across to regularly. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > I mean, we don't necessarily need to make it configurable if we just add > one canonical new "better" compression format. I am not sure that's > sufficient since I can see usecases for 'very fast but not too well > compressed' and 'very well compressed but slow', but I am personally not > really interested in the second case, so ... IME, once we've changed it once, the odds that we'll want to change it again go up drastically. I think if we're going to make a change here we should leave room for future revisions. regards, tom lane
On 2013-06-07 11:46:45 -0400, Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > I mean, we don't necessarily need to make it configurable if we just add > > one canonical new "better" compression format. I am not sure that's > > sufficient since I can see usecases for 'very fast but not too well > > compressed' and 'very well compressed but slow', but I am personally not > > really interested in the second case, so ... > > IME, once we've changed it once, the odds that we'll want to change it > again go up drastically. I think if we're going to make a change here > we should leave room for future revisions. The above bit was just about how much control we give the user over the compression algorithm used for compressing new data. If we just add one method atm which we think is just about always better than pglz there's not much need to provide the tunables already. I don't think there's any question over the fact that we should leave room on the storage level to reasonably easy add new compression algorithms without requiring on-disk changes. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund escribió: > 2) Combined patch that adds indirect tuple and snappy compression. Those > coul be separated, but this is an experiment so far... Can we have a separate header for toast definitions? (i.e. split them out of postgres.h) -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > On 2013-06-07 11:46:45 -0400, Tom Lane wrote: >> IME, once we've changed it once, the odds that we'll want to change it >> again go up drastically. I think if we're going to make a change here >> we should leave room for future revisions. > The above bit was just about how much control we give the user over the > compression algorithm used for compressing new data. If we just add one > method atm which we think is just about always better than pglz there's > not much need to provide the tunables already. Ah, ok, I thought you were talking about storage-format decisions not about whether to expose a tunable setting. regards, tom lane
On 06/07/2013 05:38 PM, Andres Freund wrote: > On 2013-06-07 17:27:28 +0200, Hannu Krosing wrote: >> On 06/07/2013 04:54 PM, Andres Freund wrote: >>> I mean, we don't necessarily need to make it configurable if we just add >>> one canonical new "better" compression format. I am not sure that's >>> sufficient since I can see usecases for 'very fast but not too well >>> compressed' and 'very well compressed but slow', but I am personally not >>> really interested in the second case, so ... >> As DE-comression is often still fast for slow-but-good compression, >> the obvious use case for 2nd is read-mostly data > Well. Those algorithms still are up to magnitude or so slower > decompressing than something like snappy, lz4 or even pglz while the > compression ratio usually is only like 50-80% improved... So you really > need a good bit of compressible data (so the amount of storage actually > hurts) that you don't read all that often (since you then would > bottleneck on compression too often). > That's just not something I run across to regularly. While the difference in compression speeds between algorithms is different, it may be more then offset in favour of better compression if there is real I/O involved as exemplified here: http://www.citusdata.com/blog/64-zfs-compression -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
* Andres Freund (andres@2ndquadrant.com) wrote: > Currently on a little endian system the pglz header contains the length > in the first four bytes as: > [dddddddd][dddddddd][dddddddd][xxdddddd] > Where dd are valid length bits for pglz and xx are the two bits which > are always zero since we only ever store up to 1GB. We can redefine 'xx' > to mean whatever we want but we cannot change it's place. I'm not thrilled with the idea of using those 2 bits from the length integer. I understand the point of it and that we'd be able to have binary compatibility from it but is it necessary to track at the per-tuple level..? What about possibly supporting >1GB objects at some point (yes, I know there's a lot of other issues there, but still). We've also got complexity around the size of the length integer already. Anyway, just not 100% sure that we really want to use these bits for this. Thanks, Stephen
On 2013-06-07 12:16:48 -0400, Stephen Frost wrote: > * Andres Freund (andres@2ndquadrant.com) wrote: > > Currently on a little endian system the pglz header contains the length > > in the first four bytes as: > > [dddddddd][dddddddd][dddddddd][xxdddddd] > > Where dd are valid length bits for pglz and xx are the two bits which > > are always zero since we only ever store up to 1GB. We can redefine 'xx' > > to mean whatever we want but we cannot change it's place. > > I'm not thrilled with the idea of using those 2 bits from the length > integer. I understand the point of it and that we'd be able to have > binary compatibility from it but is it necessary to track at the > per-tuple level..? What about possibly supporting >1GB objects at some > point (yes, I know there's a lot of other issues there, but still). > We've also got complexity around the size of the length integer already. I am open for different suggestions, but I don't know of any realistic ones. Note that the 1GB limitation is already pretty heavily baked in into varlenas itself (which is not the length we are talking about here!) since we use the two remaining bits discern between the 4 types of varlenas we have. * short (1b) * short, pointing to a ondisk tuple (1b_e) * long (4b) * long compressed (4b_c) Since long compressed ones always need to be convertible to long ones we can't ever have a 'rawsize' (which is what's proposed to be used for this) that's larger than 1GB. So, breaking the 1GB limit will not be stopped by this, but much much earlier. And it will require a break in on-disk compatibility. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 2013-05-31 23:42:51 -0400, Robert Haas wrote: > On Thu, May 30, 2013 at 7:42 AM, Andres Freund <andres@2ndquadrant.com> wrote: > > In > > http://archives.postgresql.org/message-id/20130216164231.GA15069%40awork2.anarazel.de > > I presented the need for 'indirect' toast tuples which point into memory > > instead of a toast table. In the comments to that proposal, off-list and > > in-person talks the wish to make that a more general concept has > > been voiced. > > > > The previous patch used varattrib_1b_e.va_len_1be to discern between > > different types of external tuples. That obviously only works if the > > data sizes of all possibly stored datum types are distinct which isn't > > nice. So what the newer patch now does is to rename that field into > > 'va_tag' and decide based on that what kind of Datum we have. To get the > > actual length of that datum there now is a VARTAG_SIZE() macro which > > maps the tags back to size. > > To keep on-disk compatibility the size of an external toast tuple > > containing a varatt_external is used as its tag value. > > > > This should allow for fairly easy development of a new compression > > scheme for out-of-line toast tuples. It will *not* work for compressed > > inline tuples (i.e. VARATT_4B_C). I am not convinced that that is a > > problem or that if it is, that it cannot be solved separately. > > > > FWIW, in some quick microbenchmarks I couldn't find any performance > > difference due to the slightly more complex size computation which I do > > *not* find surprising. > > > > Opinions? > > Seems pretty sensible to me. The patch is obviously WIP but the > direction seems fine to me. Here's the updated version. It shouldn't contain any obvious WIP pieces anymore, although I think it needs some more documentation. I am just not sure where to add it yet, postgres.h seems like a bad place :/ Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
Andres Freund escribió: > Here's the updated version. It shouldn't contain any obvious WIP pieces > anymore, although I think it needs some more documentation. I am just > not sure where to add it yet, postgres.h seems like a bad place :/ How about a new file, say src/include/access/toast.h? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 2013-06-14 19:14:15 -0400, Alvaro Herrera wrote: > Andres Freund escribió: > > > Here's the updated version. It shouldn't contain any obvious WIP pieces > > anymore, although I think it needs some more documentation. I am just > > not sure where to add it yet, postgres.h seems like a bad place :/ > > How about a new file, say src/include/access/toast.h? Well, the question is if that buys us all that much, we need the varlena definitions to be available pretty much everywhere. Except of section 3 - which we reduced to be pretty darn small these days - of postgres.h pretty much all of it is concerned with Datums, a good of them being varlenas. We could move section 1) into its own file and unconditionally include it... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Jun 14, 2013 at 4:06 PM, Andres Freund <andres@2ndquadrant.com> wrote:
-- Here's the updated version. It shouldn't contain any obvious WIP pieces
anymore, although I think it needs some more documentation. I am just
not sure where to add it yet, postgres.h seems like a bad place :/
I have a few comments and questions after reviewing this patch.
- heap_tuple_fetch_attr doesn't need to be updated to reflect ONDISK macro?
- heap_tuple_fetch_attr doesn't need to be updated to reflect ONDISK macro?
- I'm not sure if plural for datum is good to use. Datum values?
- -1 from me to use enum for tag types, as I don't think it needs to be. This looks more like other "kind" macros such as relkind, but I know there's pros/cons
- don't we need cast for tag value comparison in VARTAG_SIZE macro, since tag is unit8 and enum is signed int?
- Is there better way to represent ONDISK size, instead of magic number 18? I'd suggest constructing it with sizeof(varatt_external).
Other than that, the patch applies fine and make check runs, though I don't think the new code path is exercised well, as no one is creating INDIRECT datum yet.
Also, I wonder how we could add more compression in datum, as well as how we are going to add more compression options in database. I'd love to see pluggability here, as surely the core cannot support dozens of different compression algorithms, but because the data on disk is critical and we cannot do anything like user defined functions. The algorithms should be optional builtin so that once the system is initialized the the plugin should not go away. Anyway pluggable compression is off-topic here.
Thanks,
Hitoshi Harada
On 2013-06-18 00:56:17 -0700, Hitoshi Harada wrote: > On Fri, Jun 14, 2013 at 4:06 PM, Andres Freund <andres@2ndquadrant.com>wrote: > > > > > Here's the updated version. It shouldn't contain any obvious WIP pieces > > anymore, although I think it needs some more documentation. I am just > > not sure where to add it yet, postgres.h seems like a bad place :/ > > > > > I have a few comments and questions after reviewing this patch. Cool! > - heap_tuple_fetch_attr doesn't need to be updated to reflect ONDISK macro? It calls toast_fetch_datum() for any kind of external datum, so it should be fine since ONDISK is handled in there. > - I'm not sure if plural for datum is good to use. Datum values? Not sure at all either. Supposedly data is the plural for datum, but for the capitalized version Datums is used in other places, so I think it might be fine. > - -1 from me to use enum for tag types, as I don't think it needs to be. > This looks more like other "kind" macros such as relkind, but I know > there's pros/cons Well, relkind cannot easily be a enum because it needs to be stored in a char field. I like using enums because it gives you the chance of using switch()es at some point and getting warned about missed cases. Why do you dislike it? > - don't we need cast for tag value comparison in VARTAG_SIZE macro, since > tag is unit8 and enum is signed int? I think it should be fine (and the compiler doesn't warn) due to the integer promotion rules. > - Is there better way to represent ONDISK size, instead of magic number > 18? I'd suggest constructing it with sizeof(varatt_external). I explicitly did not want to do that, since the numbers really don't have anything to do with the struct size anymore. Otherwise the next person reading that part will be confused because there's a 8byte struct with the enum value 1. Or somebody will try adding another type of external tuple that also needs 18 bytes by copying the sizeof(). Which will fail in ugly, non-obvious ways. > Other than that, the patch applies fine and make check runs, though I don't > think the new code path is exercised well, as no one is creating INDIRECT > datum yet. Yea, I only use the API in the changeset extraction patch. That actually worries me to. But I am not sure where to introduce usage of it in core without making the patchset significantly larger. I was thinking of adding an option to heap_form_tuple/heap_fill_tuple to allow it to reference _4B Datums instead of copying them, but that would require quite some adjustments on the callsites. > Also, I wonder how we could add more compression in datum, as well as how > we are going to add more compression options in database. I'd love to see > pluggability here, as surely the core cannot support dozens of different > compression algorithms, but because the data on disk is critical and we > cannot do anything like user defined functions. The algorithms should be > optional builtin so that once the system is initialized the the plugin > should not go away. Anyway pluggable compression is off-topic here. Separate patchset by now, yep ;). Thanks! Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Jun 18, 2013 at 1:58 AM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-06-18 00:56:17 -0700, Hitoshi Harada wrote:Cool!
> On Fri, Jun 14, 2013 at 4:06 PM, Andres Freund <andres@2ndquadrant.com>wrote:
>
> >
> > Here's the updated version. It shouldn't contain any obvious WIP pieces
> > anymore, although I think it needs some more documentation. I am just
> > not sure where to add it yet, postgres.h seems like a bad place :/
> >
> >
> I have a few comments and questions after reviewing this patch.It calls toast_fetch_datum() for any kind of external datum, so it
> - heap_tuple_fetch_attr doesn't need to be updated to reflect ONDISK macro?
should be fine since ONDISK is handled in there.
toast_fetch_datum doesn't expect the input is INDIRECT. At least I see the code path in the same file around toast_insert_or_update() where we have a chance to (possibly accidentally) try to fetch ONDISK toasted value from non-ONDISK datum.
> - -1 from me to use enum for tag types, as I don't think it needs to be.Well, relkind cannot easily be a enum because it needs to be stored in a
> This looks more like other "kind" macros such as relkind, but I know
> there's pros/cons
char field. I like using enums because it gives you the chance of using
switch()es at some point and getting warned about missed cases.
Why do you dislike it?
I put -1 just because it doesn't have to be now. If you argue relkind is char field, tag is also uint8.
> - don't we need cast for tag value comparison in VARTAG_SIZE macro, sinceI think it should be fine (and the compiler doesn't warn) due to the
> tag is unit8 and enum is signed int?
integer promotion rules.
> - Is there better way to represent ONDISK size, instead of magic numberI explicitly did not want to do that, since the numbers really don't
> 18? I'd suggest constructing it with sizeof(varatt_external).
have anything to do with the struct size anymore. Otherwise the next
person reading that part will be confused because there's a 8byte struct
with the enum value 1. Or somebody will try adding another type of
external tuple that also needs 18 bytes by copying the sizeof(). Which
will fail in ugly, non-obvious ways.
Sounds reasonable. I was just confused when I looked at it first.
> Other than that, the patch applies fine and make check runs, though I don'tYea, I only use the API in the changeset extraction patch. That actually
> think the new code path is exercised well, as no one is creating INDIRECT
> datum yet.
worries me to. But I am not sure where to introduce usage of it in core
without making the patchset significantly larger. I was thinking of
adding an option to heap_form_tuple/heap_fill_tuple to allow it to
reference _4B Datums instead of copying them, but that would require
quite some adjustments on the callsites.
Maybe you can create a user-defined function that creates such datum for testing, just in order to demonstrate it works fine.
> Also, I wonder how we could add more compression in datum, as well as howSeparate patchset by now, yep ;).
> we are going to add more compression options in database. I'd love to see
> pluggability here, as surely the core cannot support dozens of different
> compression algorithms, but because the data on disk is critical and we
> cannot do anything like user defined functions. The algorithms should be
> optional builtin so that once the system is initialized the the plugin
> should not go away. Anyway pluggable compression is off-topic here.
I just found. Will look into it.
Thanks,
On 2013-06-18 10:13:39 -0700, Hitoshi Harada wrote: > On Tue, Jun 18, 2013 at 1:58 AM, Andres Freund <andres@2ndquadrant.com>wrote: > > > On 2013-06-18 00:56:17 -0700, Hitoshi Harada wrote: > > > On Fri, Jun 14, 2013 at 4:06 PM, Andres Freund <andres@2ndquadrant.com > > >wrote: > > > > > > > > > > > Here's the updated version. It shouldn't contain any obvious WIP pieces > > > > anymore, although I think it needs some more documentation. I am just > > > > not sure where to add it yet, postgres.h seems like a bad place :/ > > > > > > > > > > > I have a few comments and questions after reviewing this patch. > > > > Cool! > > > > > - heap_tuple_fetch_attr doesn't need to be updated to reflect ONDISK > > macro? > > > > It calls toast_fetch_datum() for any kind of external datum, so it > > should be fine since ONDISK is handled in there. > > > > > toast_fetch_datum doesn't expect the input is INDIRECT. At least I see the > code path in the same file around toast_insert_or_update() where we have a > chance to (possibly accidentally) try to fetch ONDISK toasted value from > non-ONDISK datum. Hm. Yes. I don't think that's really possible in any codepath I have thought of - or tested - but that's not a good reason not to make it robust. > > - -1 from me to use enum for tag types, as I don't think it needs to be. > > > This looks more like other "kind" macros such as relkind, but I know > > > there's pros/cons > > > > Well, relkind cannot easily be a enum because it needs to be stored in a > > char field. I like using enums because it gives you the chance of using > > switch()es at some point and getting warned about missed cases. > > > > Why do you dislike it? > > > > > > I put -1 just because it doesn't have to be now. If you argue relkind is > char field, tag is also uint8. Well, but to expose it to sql you need a numeric value that actually translates to a visible ascii character. > Maybe you can create a user-defined function that creates such datum for > testing, just in order to demonstrate it works fine. Hm. Will try whether I can think of something not completely pointless ;) Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Jun 5, 2013 at 8:01 AM, Andres Freund <andres@2ndquadrant.com> wrote:
Thanks,
Two patches attached:
1) add snappy to src/common. The integration needs some more work.
2) Combined patch that adds indirect tuple and snappy compression. Those
coul be separated, but this is an experiment so far...
I took a look at them a little. This proposal is a super set of patch #1127.
https://commitfest.postgresql.org/action/patch_view?id=1127
https://commitfest.postgresql.org/action/patch_view?id=1127
- <endian.h> is not found in my mac. Commented it out, it builds clean.
- I don't see what the added is_inline flag means in toast_compress_datum(). Obviously not used, but I wonder what was the intention.
- By this,
* compression method. We could just use the two bytes to store 3 other
* compression methods but maybe we better don't paint ourselves in a
* corner again.
* compression methods but maybe we better don't paint ourselves in a
* corner again.
you mean two bits, not two bytes?
And patch adds snappy-c in src/common. I definitely like the idea to have pluggability for different compression algorithm for datum, but I am not sure if this location is a good place to add it. Maybe we want a modern algorithm other than pglz for different components across the system in core, and it's better to let users choose which to add more. The mapping between the index number and compression algorithm should be consistent for the entire life of database, so it should be defined at initdb time. From core maintainability perspective and binary size of postgres, I don't think we want to put dozenes of different algorithms into core in the future. And maybe someone will want to try BSD-incompatible algorithm privately.
I guess it's ok to use one more byte to indicate which compression is used for the value. It is a compressed datum and we don't expect something short anyway. I don't see big problems in this patch other than how to manage the pluggability, but it is a WIP patch anyway, so I'm going to mark it as Returned with Feedback.Thanks,
--
Hitoshi Harada
On 2013-06-19 00:15:56 -0700, Hitoshi Harada wrote: > On Wed, Jun 5, 2013 at 8:01 AM, Andres Freund <andres@2ndquadrant.com>wrote: > > > > Two patches attached: > > 1) add snappy to src/common. The integration needs some more work. > > 2) Combined patch that adds indirect tuple and snappy compression. Those > > coul be separated, but this is an experiment so far... > > > > > > > I took a look at them a little. This proposal is a super set of patch > #1127. > https://commitfest.postgresql.org/action/patch_view?id=1127 > > - <endian.h> is not found in my mac. Commented it out, it builds clean. > - I don't see what the added is_inline flag means in > toast_compress_datum(). Obviously not used, but I wonder what was the > intention. Hm. I don't think you've looked at the latest version of the patch, check http://archives.postgresql.org/message-id/20130614230625.GD19641@awork2.anarazel.de - that should be linked from the commitfest. The is_inline part should be gone there. > - By this, > * compression method. We could just use the two bytes to store 3 other > * compression methods but maybe we better don't paint ourselves in a > * corner again. > you mean two bits, not two bytes? Yes, typo... The plan is to use those two bits in the following way - 00 pglz - 01 snappy/lz4/whatever - 10 another - 11 one extra byte > And patch adds snappy-c in src/common. I definitely like the idea to have > pluggability for different compression algorithm for datum, but I am not > sure if this location is a good place to add it. Maybe we want a modern > algorithm other than pglz for different components across the system in > core, and it's better to let users choose which to add more. The mapping > between the index number and compression algorithm should be consistent for > the entire life of database, so it should be defined at initdb time. From > core maintainability perspective and binary size of postgres, I don't think > we want to put dozenes of different algorithms into core in the future. > And maybe someone will want to try BSD-incompatible algorithm privately. We've argued about this in the linked thread and I still think we should add one algorithm now, and when that one is outdated - which probably will be some time - replace it. Building enough infrastructure to make this really pluggable is not likely enough to be beneficial to many. There will be a newer version of the patch coming today or tomorrow, so there's probably no point in looking at the one linked above before that... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Jun 19, 2013 at 3:27 AM, Andres Freund <andres@2ndquadrant.com> wrote: > There will be a newer version of the patch coming today or tomorrow, so > there's probably no point in looking at the one linked above before > that... This patch is marked as "Ready for Committer" in the CommitFest app. But it is not clear to me where the patch is that is being proposed for commit. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Jun 27, 2013 at 6:08 AM, Robert Haas <robertmhaas@gmail.com> wrote:
-- On Wed, Jun 19, 2013 at 3:27 AM, Andres Freund <andres@2ndquadrant.com> wrote:This patch is marked as "Ready for Committer" in the CommitFest app.
> There will be a newer version of the patch coming today or tomorrow, so
> there's probably no point in looking at the one linked above before
> that...
But it is not clear to me where the patch is that is being proposed
for commit.
No, this conversation is about patch #1153, pluggable toast compression, which is in Needs Review, and you may be confused with #1127, extensible external toast tuple support.
Hitoshi Harada
Hi, Please find attached the next version of the extensible toast support. There basically are two changes: * handle indirect toast tuples properly in heap_tuple_fetch_attr and related places * minor comment adjustments Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
On 2013-06-27 09:17:10 -0700, Hitoshi Harada wrote: > On Thu, Jun 27, 2013 at 6:08 AM, Robert Haas <robertmhaas@gmail.com> wrote: > > > On Wed, Jun 19, 2013 at 3:27 AM, Andres Freund <andres@2ndquadrant.com> > > wrote: > > > There will be a newer version of the patch coming today or tomorrow, so > > > there's probably no point in looking at the one linked above before > > > that... > > > > This patch is marked as "Ready for Committer" in the CommitFest app. > > But it is not clear to me where the patch is that is being proposed > > for commit. > > > > > > > > No, this conversation is about patch #1153, pluggable toast compression, > which is in Needs Review, and you may be confused with #1127, extensible > external toast tuple support. Well, actually this is the correct thread, and pluggable toast compression developed out of it. You had marked #1127 as ready for committer although you had noticed an omission in heap_tuple_fetch_attr... Answered with the new patch version toplevel in the thread, to make this hopefully less confusing. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Jun 27, 2013 at 12:56 PM, Andres Freund <andres@2ndquadrant.com> wrote: > Please find attached the next version of the extensible toast > support. There basically are two changes: > > * handle indirect toast tuples properly in heap_tuple_fetch_attr > and related places > * minor comment adjustments It looks to me like you need to pass true, rather than false, as the first argument to TrapMacro: +#define VARTAG_SIZE(tag) \ + ((tag) == VARTAG_INDIRECT ? sizeof(varatt_indirect) : \ + (tag) == VARTAG_ONDISK ? sizeof(varatt_external) : \ + TrapMacro(false, "unknown vartag")) Still looking at the rest of this. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Jun 28, 2013 at 9:23 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Jun 27, 2013 at 12:56 PM, Andres Freund <andres@2ndquadrant.com> wrote: >> Please find attached the next version of the extensible toast >> support. There basically are two changes: >> >> * handle indirect toast tuples properly in heap_tuple_fetch_attr >> and related places >> * minor comment adjustments > > It looks to me like you need to pass true, rather than false, as the > first argument to TrapMacro: > > +#define VARTAG_SIZE(tag) \ > + ((tag) == VARTAG_INDIRECT ? sizeof(varatt_indirect) : \ > + (tag) == VARTAG_ONDISK ? sizeof(varatt_external) : \ > + TrapMacro(false, "unknown vartag")) > > Still looking at the rest of this. Why does toast_insert_or_update() need to go through all the rigamarole in toast_datum_differs()? I would have thought that it could simply treat any external-indirect value as needing to be detoasted and retoasted, since the destination is the disk anyhow. Do you see external-indirect values getting used for anything other than logical replication? Is there code to do so anywhere? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2013-06-28 09:49:53 -0400, Robert Haas wrote: > On Fri, Jun 28, 2013 at 9:23 AM, Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Jun 27, 2013 at 12:56 PM, Andres Freund <andres@2ndquadrant.com> wrote: > >> Please find attached the next version of the extensible toast > >> support. There basically are two changes: > >> > >> * handle indirect toast tuples properly in heap_tuple_fetch_attr > >> and related places > >> * minor comment adjustments > > > > It looks to me like you need to pass true, rather than false, as the > > first argument to TrapMacro: > > > > +#define VARTAG_SIZE(tag) \ > > + ((tag) == VARTAG_INDIRECT ? sizeof(varatt_indirect) : \ > > + (tag) == VARTAG_ONDISK ? sizeof(varatt_external) : \ > > + TrapMacro(false, "unknown vartag")) Heh. I was obviously trying to be too cute ;) Good idea & thanks for committing it separately. > Do you see external-indirect values getting used for anything other > than logical replication? Is there code to do so anywhere? Yes and not really. I think we can reuse it to avoid repeated detoastings, even in relatively normal queries. What I am absolutely not sure yet is how to automatically decide when we want to keep the tuple in memory because it will be beneficial, and when not because of the obvious memory overhead. And how to keep track of the memory context used to allocate the referenced data. There are some usecases where I think it might be relatively easy to decide its a win. E.g tuples passed to triggers if there are multiple ones of them. I've played a bit with using that functionality in C functions, but nothing publishable, more to make sure things work. > Why does toast_insert_or_update() need to go through all the > rigamarole in toast_datum_differs()? I would have thought that it > could simply treat any external-indirect value as needing to be > detoasted and retoasted, since the destination is the disk anyhow. We could do that, yes. But I think it might be better not to: If we simplify the tuples used in a query to not reference ondisk tuples anymore and we then UPDATE using that new version I would rather not retoast all the unchanged columns. I can e.g. very well imagine that we decide to resolve toasted Datums to indirect Datums during an UPDATE if there are multiple BEFORE UPDATE triggers to avoid detoasting in each and every one of them. Such a tuple will then passed to heap_update... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Jun 28, 2013 at 10:53 AM, Andres Freund <andres@2ndquadrant.com> wrote: >> Why does toast_insert_or_update() need to go through all the >> rigamarole in toast_datum_differs()? I would have thought that it >> could simply treat any external-indirect value as needing to be >> detoasted and retoasted, since the destination is the disk anyhow. > > We could do that, yes. But I think it might be better not to: If we > simplify the tuples used in a query to not reference ondisk tuples > anymore and we then UPDATE using that new version I would rather not > retoast all the unchanged columns. > > I can e.g. very well imagine that we decide to resolve toasted Datums to > indirect Datums during an UPDATE if there are multiple BEFORE UPDATE > triggers to avoid detoasting in each and every one of them. Such a tuple > will then passed to heap_update... I must be missing something. At that point, yes, you'd like to avoid re-toasting unnecessarily, but ISTM you've already bought the farm. Unless I'm misunderstanding the code as written, you'd just end up writing the indirect pointer back out to disk in that scenario. That's not gonna work... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2013-06-28 11:25:50 -0400, Robert Haas wrote: > On Fri, Jun 28, 2013 at 10:53 AM, Andres Freund <andres@2ndquadrant.com> wrote: > >> Why does toast_insert_or_update() need to go through all the > >> rigamarole in toast_datum_differs()? I would have thought that it > >> could simply treat any external-indirect value as needing to be > >> detoasted and retoasted, since the destination is the disk anyhow. > > > > We could do that, yes. But I think it might be better not to: If we > > simplify the tuples used in a query to not reference ondisk tuples > > anymore and we then UPDATE using that new version I would rather not > > retoast all the unchanged columns. > > > > I can e.g. very well imagine that we decide to resolve toasted Datums to > > indirect Datums during an UPDATE if there are multiple BEFORE UPDATE > > triggers to avoid detoasting in each and every one of them. Such a tuple > > will then passed to heap_update... > > I must be missing something. At that point, yes, you'd like to avoid > re-toasting unnecessarily, but ISTM you've already bought the farm. > Unless I'm misunderstanding the code as written, you'd just end up > writing the indirect pointer back out to disk in that scenario. > That's not gonna work... You didn't misunderstand anything unfortunately. I broke that somewhere along the road, probably when factoring things out to toast_datum_differs(). Fixed... Which shows that we need tests. I've added some using a function in regress.c that makes all datums indirect. Together with a triggers that allows some testing. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
On Mon, Jul 1, 2013 at 3:49 PM, Andres Freund <andres@2ndquadrant.com> wrote: >> I must be missing something. At that point, yes, you'd like to avoid >> re-toasting unnecessarily, but ISTM you've already bought the farm. >> Unless I'm misunderstanding the code as written, you'd just end up >> writing the indirect pointer back out to disk in that scenario. >> That's not gonna work... > > You didn't misunderstand anything unfortunately. I broke that somewhere > along the road, probably when factoring things out to > toast_datum_differs(). Fixed... > Which shows that we need tests. I've added some using a function in > regress.c that makes all datums indirect. Together with a triggers that > allows some testing. OK, that's nifty. + elog(ERROR, "deleteing a tuple indirect datums doesn't make sense"); That's spelled wrong. + Assert(VARATT_IS_EXTERNAL_ONDISK(old_value)); + + /* fast path for the common case where we have the toast oid available */ + if (VARATT_IS_EXTERNAL_ONDISK(old_value) && + VARATT_IS_EXTERNAL_ONDISK(new_value)) + return memcmp((char *) old_value, (char *) new_value, + VARSIZE_EXTERNAL(old_value)) != 0; You've already asserted VARATT_IS_EXTERNAL_ONDISK(old_value), so there can't be any need to test that condition again. + changed = memcmp(VARDATA_ANY(old_data), VARDATA_ANY(new_data), + VARSIZE_ANY_EXHDR(old_data)) != 0; + if (!changed) + *needs_rewrite = true; + + /* heap_tuple_untoast_attr copied data */ + pfree(old_data); + if (new_value != new_data) + pfree(new_data); + return changed; So... basically, we always set needs_rewrite to !changed, except when VARATT_IS_EXTERNAL_ONDISK(new_value). In that latter case we return after doing memcmp(). That's kind of byzantine and could probably be simplified by pushing the first if-clause of this function up into the caller and then making the function's job to simply compare the datums on the assumption that the new datum is not external on-disk. Then you wouldn't need this funny three-way return value done as two Booleans. But backing up a minute, this is really a significant behavior change that is independent of the purpose of the rest of this patch. What you're proposing here is that every time we consider toasting a value on update, we should first check whether it's byte-for-byte equivalent to the old value. That may or may not be a good idea - it will suck if, for example, a user repeatedly updates a very long string by changing only the last character thereof, but it will win in other cases. Whether it's a good idea or not, I think it deserves to be a separate patch. For purposes of this patch, I think you should just assume that any external-indirect value needs to be retoasted, just as we currently assume for untoasted values. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2013-07-01 17:46:51 -0400, Robert Haas wrote: > But backing up a minute, this is really a significant behavior change > that is independent of the purpose of the rest of this patch. What > you're proposing here is that every time we consider toasting a value > on update, we should first check whether it's byte-for-byte equivalent > to the old value. That may or may not be a good idea - it will suck > if, for example, a user repeatedly updates a very long string by > changing only the last character thereof, but it will win in other > cases. In that case the old value will rather likely just have been read just before, so the price of rereading should be relatively low. But: > Whether it's a good idea or not, I think it deserves to be a > separate patch. For purposes of this patch, I think you should just > assume that any external-indirect value needs to be retoasted, just as > we currently assume for untoasted values. is a rather valid point. I've split the patch accordingly. The second patch is *not* supposed to be applied together with patch 1 but rather included for reference. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
On Tue, Jul 2, 2013 at 11:06 AM, Andres Freund <andres@2ndquadrant.com> wrote: > In that case the old value will rather likely just have been read just > before, so the price of rereading should be relatively low. Maybe in terms of I/O, but there's still CPU. > is a rather valid point. I've split the patch accordingly. The second > patch is *not* supposed to be applied together with patch 1 but rather > included for reference. OK, committed patch 1. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company