Обсуждение: Pre-set Hint bits/VACUUM FREEZE on data load..?
Greetings, In a discussion which came up at PgEast, I questioned if it'd be possible to set the 'all visible' hint bit and give thetuples the frozen XID when loading data into a table which was created in the same transaction. The idea being that no other transactions could see the table (in any important way anyway.. couldn't SELECT from it, forexample) since it was created in the same transaction that the data was loaded. This would avoid having to rewrite thetable to set the hint bits and to set the tuples as frozen after the data load. There's a question here is about if concurrent transactions in serializable or read isolated would be able to see the newtable too early, because catalog lookups typically use SnapshotNow, and hence might see the new tuples when it reallyshouldn't be able to. It seems odd to me that it might be able to select from this new table which was committed ina transaction which started after the current one though. Anyway, just a thought that I wanted to get out to hackers before I destroy the brain cells that it's stored in tonight...:) Thanks, Stephen
On 24.03.2011 23:08, Stephen Frost wrote: > In a discussion which came up at PgEast, I questioned if it'd be > possible to set the 'all visible' hint bit and give the tuples the > frozen XID when loading data into a table which was created in the > same transaction. > > The idea being that no other transactions could see the table (in any > important way anyway.. couldn't SELECT from it, for example) since it > was created in the same transaction that the data was loaded. This > would avoid having to rewrite the table to set the hint bits and to > set the tuples as frozen after the data load. The problem is that you still need to track which queries within the transaction can see the tuples. For example: BEGIN; CREATE TABLE foo ... INSERT INTO foo VALUES (1); DECLARE foocur CURSOR FOR SELECT * FROM foo; FETCH foocur; INSERT INTO foo VALUES (2); FETCH foocur; The cursor was opened before the 2nd tuple was inserted, so it should not be returned by the cursor. There's also corner cases like triggers that query the same table, and self-joins. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Thu, Mar 24, 2011 at 9:15 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > On 24.03.2011 23:08, Stephen Frost wrote: >> >> In a discussion which came up at PgEast, I questioned if it'd be >> possible to set the 'all visible' hint bit and give the tuples the >> frozen XID when loading data into a table which was created in the >> same transaction. Fwiw this was the original plan with Simon's patch in the 8.3 era to skip wal logging tables being loaded in the same transaction they were created. (Ironically often made futile by his own HS work.) There were problems that I don't recall but might well be the same as the problem Heikki pointed out. > The problem is that you still need to track which queries within the > transaction can see the tuples. We could conceivably deal with that by not setting the frozenxid but setting the hint bit for those tuples and having a documented special case that if the hint bit is set but it's the same xid as your own you have to treat it as not-committed. Not sure if it's worth the ugliness to solve only half the problem. I get the impression most people are complaining about hint bit setting i/o but if you're still going to have to rewrite the table at some time in the future the problem's not really resolved. -- greg
On Thu, Mar 24, 2011 at 5:39 PM, Greg Stark <gsstark@mit.edu> wrote: > On Thu, Mar 24, 2011 at 9:15 PM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >> On 24.03.2011 23:08, Stephen Frost wrote: >>> >>> In a discussion which came up at PgEast, I questioned if it'd be >>> possible to set the 'all visible' hint bit and give the tuples the >>> frozen XID when loading data into a table which was created in the >>> same transaction. > > Fwiw this was the original plan with Simon's patch in the 8.3 era to > skip wal logging tables being loaded in the same transaction they were > created. (Ironically often made futile by his own HS work.) There were > problems that I don't recall but might well be the same as the problem > Heikki pointed out. > >> The problem is that you still need to track which queries within the >> transaction can see the tuples. > > We could conceivably deal with that by not setting the frozenxid but > setting the hint bit for those tuples and having a documented special > case that if the hint bit is set but it's the same xid as your own you > have to treat it as not-committed. > > Not sure if it's worth the ugliness to solve only half the problem. I > get the impression most people are complaining about hint bit setting > i/o but if you're still going to have to rewrite the table at some > time in the future the problem's not really resolved. Also, you're not really going to get the whole benefit unless you can somehow manage to mark the pages PD_ALL_VISIBLE and set the visibility map bits. Without that, the next vacuum is going to dirty the whole heap anyway. Granted that's a bit better than having the next scan do it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Mar 24, 2011 at 9:39 PM, Greg Stark <gsstark@mit.edu> wrote: > > We could conceivably deal with that by not setting the frozenxid but > setting the hint bit for those tuples and having a documented special > case that if the hint bit is set but it's the same xid as your own you > have to treat it as not-committed. Or I suppose we could set the frozenxid but maintain a hash table of relations that we use to remember which relations we created in this transaction and are treating this way. For any table in that hash table we ignore the xmin and just look at cmin/cmax. I'm not sure this solves the cases of subtransactions -- but perhaps we just wouldn't store frozenxid if we're in a subtransaction. And I'm not sure we have access to the relation id when we're doing visibility checks. I think we do. This would involve no serious ugliness in the actual on-disk storage and it would make data loads a write-once operation which is the holy grail :) -- greg
* Greg Stark (gsstark@mit.edu) wrote: > I'm not sure this solves the cases of subtransactions -- but perhaps > we just wouldn't store frozenxid if we're in a subtransaction. And I'm > not sure we have access to the relation id when we're doing visibility > checks. I think we do. This would involve no serious ugliness in the > actual on-disk storage and it would make data loads a write-once > operation which is the holy grail :) I'd be happy with a "data loading mode" that even disallowed subtransactions if necessary to achieve the write-once (well, plus WAL if you're archiving) operation... Thanks, Stephen
* Robert Haas (robertmhaas@gmail.com) wrote: > Also, you're not really going to get the whole benefit unless you can > somehow manage to mark the pages PD_ALL_VISIBLE and set the visibility > map bits. Without that, the next vacuum is going to dirty the whole > heap anyway. Granted that's a bit better than having the next scan do > it. If the premise that we can do just-about-whatever with the relation (and any sub-components, TOAST table, visibility map, etc) since it was created in the same transaction and therefore isn't visible outside, then I don't think that setting the visibility map bits would be a problem.. Thanks, Stephen
* Heikki Linnakangas (heikki.linnakangas@enterprisedb.com) wrote: > The problem is that you still need to track which queries within the > transaction can see the tuples. For example: Wow, that's a good point wrt cursors. Sounds more and more like we'd need a special data-loading mode for this where we'd have to disallow those options. I've been thinking that's a frowned-upon approach in general, but let me ask specifically- are we uninterested in such a special 'data-load' mode? Or do we expect that the limitations would be too great to make it useful enough for users? That last I don't think I agree with.. Thanks, Stephen
* Greg Stark (gsstark@mit.edu) wrote: > Fwiw this was the original plan with Simon's patch in the 8.3 era to > skip wal logging tables being loaded in the same transaction they were > created. (Ironically often made futile by his own HS work.) Hah, knew I couldn't have been the first, my subconcious is likely pulling from a discussion that I had w/ Simon about exactly this at some pgCon.. Figures that I conveniently forget the 'hard bits'. :) Thanks, Stephen
* Heikki Linnakangas (heikki.linnakangas@enterprisedb.com) wrote: > The problem is that you still need to track which queries within the > transaction can see the tuples. For example: Alright, one final thought on this for a while.. We can check if the table was created in the current transaction. If it was, no tuples in it could possibly be frozen, except if we set them that way when we first loaded them. How about something of the form: At the start of a load, we check if the table was created in the current transaction. If so, we check if we've already done a load which used the frozen XID. If we have, then we use the normal mechanics. If we havn't, then we stuff what the XID would have been in memory somewhere, mark that we've used the frozen XID, and load the data using the frozen XID. On subsequent queries, if we come across a frozen XID in a table created in this transaction, we use the XID we've stored instead of the frozen XID. The gist of this being that we could avoid having to rewrite the table while still tracking what the XID would have been for these records for subsequent activities. I certainly like something similar to this better than the "disable thing we think might break", since that could change over time and we may implement something new that breaks and we forgot to add it to the exclude list. Additionally, having an 'include' list for the data loading mode wouldn't be ideal if we can avoid it, but if not, may be acceptable anyway to get the fast path. My concern here and what I see the counter-argument being (presuming this isn't completely insane anyway) is added cost when we're not in this mode (tho hopefully an extra compare won't be too expensive overall..) and perhaps an issue with where the code would have to go to implement this and violating abstraction layers, etc. Headed back to pgEast. :) Thanks, Stephen
On 24.03.2011 23:54, Stephen Frost wrote: > * Heikki Linnakangas (heikki.linnakangas@enterprisedb.com) wrote: >> The problem is that you still need to track which queries within the >> transaction can see the tuples. For example: > > Wow, that's a good point wrt cursors. Sounds more and more like we'd > need a special data-loading mode for this where we'd have to disallow > those options. I've been thinking that's a frowned-upon approach in > general, but let me ask specifically- are we uninterested in such a > special 'data-load' mode? Or do we expect that the limitations would be > too great to make it useful enough for users? That last I don't think I > agree with.. I don't think we should put the onus on the user to choose the right data loading mode. If we can reliably detect the cases where it's safe do these tricks, we can transparently apply them when possible. I would be cool with tricks that apply only in narrow cases, as long as we don't require the user to do anything. That said, it can be surprising if some trivial change to the schema happens to disable the optimization, and your query performance after data load suddenly plummets. That can be a pain for the DBA to debug. I'm skeptical of changes that need any extra checks in the fast-path of HeapTupleSatisfiesMVCC. That is a hot spot already, any extra cycles there would add up to hurt performance. Careful performance testing is required. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On 25.03.2011 09:51, Heikki Linnakangas wrote: > I don't think we should put the onus on the user to choose the right > data loading mode. If we can reliably detect the cases where it's safe > do these tricks, we can transparently apply them when possible. I would > be cool with tricks that apply only in narrow cases, as long as we don't > require the user to do anything. I believe the conditions for being able to set hint bits immediately at COPY are: 1. The table has been created or truncated in the same transaction 2. We are not in a subtransaction (or the table was created and truncated in the same subtransaction) 3. There are no open portals 4. Executing the COPY doesn't need to run any "unsafe" code that might access the same table. This includes triggers, check constraints and input functions. An expression is safe if it is immutable. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Fri, Mar 25, 2011 at 1:00 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > 1. The table has been created or truncated in the same transaction > 2. We are not in a subtransaction (or the table was created and truncated in > the same subtransaction) > 3. There are no open portals > 4. Executing the COPY doesn't need to run any "unsafe" code that might > access the same table. This includes triggers, check constraints and input > functions. An expression is safe if it is immutable. It sounds like these cases would, at the very least, be met by some forms of pg_restore...that's a highly desirable property, I feel. -- fdr
On 25.03.2011 00:15, Stephen Frost wrote: > At the start of a load, we check if the table was created in the current > transaction. If so, we check if we've already done a load which used > the frozen XID. If we have, then we use the normal mechanics. If we > havn't, then we stuff what the XID would have been in memory somewhere, > mark that we've used the frozen XID, and load the data using the frozen > XID. On subsequent queries, if we come across a frozen XID in a table > created in this transaction, we use the XID we've stored instead of the > frozen XID. The tricky part here is how to check if the table was created in the same transaction, within HeapTupleSatisfiesMVCC, with minimal overhead. If you do it naively, the check will be executed at every single tuple read in the system. It has to be really really fast. I don't want to discourage, maybe it's possible with some clever code arrangement. However, it needs a lot of performance testing. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Fri, Mar 25, 2011 at 8:09 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > The tricky part here is how to check if the table was created in the same > transaction, within HeapTupleSatisfiesMVCC, with minimal overhead. If you do > it naively, the check will be executed at every single tuple read in the > system. It has to be really really fast. > You could have a single global boolean variable to indicate whether any tables have been created in this transaction and inserted into using this frozenxid hack in this transaction yet. That would be branch predicted as false in any decent processor which shouldn't cost anything. If it's true then you go off and check a small hash table. The question is how often would the insert trigger with false positives. That is, how often do people create tables, insert a small number of records, then go off and do complex performance-sensitive things in the same transaction. I suppose this does happen due to the fairly common pattern of "I want temporary tables but they need to be visible to this other transaction". For example for batch jobs where you want to be able to display some sort of progress indicator. -- greg
* Greg Stark (gsstark@mit.edu) wrote: > You could have a single global boolean variable to indicate whether > any tables have been created in this transaction and inserted into > using this frozenxid hack in this transaction yet. This was exactly where I was going, and, honestly, I was wondering if there'd be a way to possibly even avoid running through HeapTupleIsVisible for those relations entirely.. Up to the point where some further action modifies the table again and forces us to have to go through that path. I'm not sure I feel it's a real 'bad' hack. :) > The question is how often would the insert trigger with false > positives. That is, how often do people create tables, insert a small > number of records, then go off and do complex performance-sensitive > things in the same transaction. I suppose this does happen due to the > fairly common pattern of "I want temporary tables but they need to be > visible to this other transaction". For example for batch jobs where > you want to be able to display some sort of progress indicator. Right, that kind of issue is why I had been thinking it might be a different mode, but it also may not be all that bad to use an approach like what you described above. We could descide to not do it for temporary tables, perhaps, or, if we could possibly avoid having to check for visibility at the tuple level for these relations, perhaps it'd be faster on those use-cases..? We would need to figure out, also, if people are doing batch operations.. eg: they load 10000 records at a time into a table. We wouldn't want to lose these optimizations in those cases, provied they're done one after another. Thanks, Stephen
On Thu, Mar 24, 2011 at 23:47, Stephen Frost <sfrost@snowman.net> wrote: > I'd be happy with a "data loading mode" that even disallowed > subtransactions if necessary to achieve the write-once (well, plus WAL > if you're archiving) operation... Note that there's already an extension on pgFoundry for a "data loading mode", have you looked at pg_bulkload? http://pgbulkload.projects.postgresql.org/ Regards, Marti
On Mar 24, 2011, at 4:42 PM, Greg Stark wrote: > On Thu, Mar 24, 2011 at 9:39 PM, Greg Stark <gsstark@mit.edu> wrote: >> >> We could conceivably deal with that by not setting the frozenxid but >> setting the hint bit for those tuples and having a documented special >> case that if the hint bit is set but it's the same xid as your own you >> have to treat it as not-committed. > > Or I suppose we could set the frozenxid but maintain a hash table of > relations that we use to remember which relations we created in this > transaction and are treating this way. For any table in that hash > table we ignore the xmin and just look at cmin/cmax. > > I'm not sure this solves the cases of subtransactions -- but perhaps > we just wouldn't store frozenxid if we're in a subtransaction. And I'm > not sure we have access to the relation id when we're doing visibility > checks. I think we do. This would involve no serious ugliness in the > actual on-disk storage and it would make data loads a write-once > operation which is the holy grail :) I have an even broader idea... If we had the ability to specify in a transaction exactly what tables it was going to read from, enforce that it only readfrom those tables, and published that information to other backends, vacuum could then ignore long running transactionsthat are guaranteed not to hit the table they're vacuuming. This would eliminate a large pain-point for anyonethat has a high-velocity table in an environment that also has to support longer running transactions. If we had that mechanism you could also make use of it for this idea; you would specify that your transaction will neverread from the new table, which means that any data inserted into that table can be pre-frozen (assuming we make surethat no one else can see that table exists yet). -- Jim C. Nasby, Database Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
On Fri, Mar 25, 2011 at 4:00 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > On 25.03.2011 09:51, Heikki Linnakangas wrote: >> >> I don't think we should put the onus on the user to choose the right >> data loading mode. If we can reliably detect the cases where it's safe >> do these tricks, we can transparently apply them when possible. I would >> be cool with tricks that apply only in narrow cases, as long as we don't >> require the user to do anything. > > I believe the conditions for being able to set hint bits immediately at COPY > are: > > 1. The table has been created or truncated in the same transaction > 2. We are not in a subtransaction (or the table was created and truncated in > the same subtransaction) > 3. There are no open portals > 4. Executing the COPY doesn't need to run any "unsafe" code that might > access the same table. This includes triggers, check constraints and input > functions. An expression is safe if it is immutable. That's not enough... some other transaction could see the data before the transaction commits. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Mar 25, 2011 at 7:06 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> 1. The table has been created or truncated in the same transaction >,,, > That's not enough... some other transaction could see the data before > the transaction commits. How? -- greg
On Fri, Mar 25, 2011 at 3:29 PM, Greg Stark <gsstark@mit.edu> wrote: > On Fri, Mar 25, 2011 at 7:06 PM, Robert Haas <robertmhaas@gmail.com> wrote: >>> 1. The table has been created or truncated in the same transaction >>,,, >> That's not enough... some other transaction could see the data before >> the transaction commits. > > How? Hmm. Maybe I'm wrong. I guess the XID would still be in the MVCC snapshot of every other running transaction, so maybe it would be OK. *scratches head* -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 25.03.2011 22:21, Robert Haas wrote: > On Fri, Mar 25, 2011 at 3:29 PM, Greg Stark<gsstark@mit.edu> wrote: >> On Fri, Mar 25, 2011 at 7:06 PM, Robert Haas<robertmhaas@gmail.com> wrote: >>>> 1. The table has been created or truncated in the same transaction >>> ,,, >>> That's not enough... some other transaction could see the data before >>> the transaction commits. >> >> How? > > Hmm. Maybe I'm wrong. I guess the XID would still be in the MVCC > snapshot of every other running transaction, so maybe it would be OK. > *scratches head* Right. You can't mark the tuples as frozen, but you can set the xmin-committed hint bits. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com