Обсуждение: don't create storage when unnecessary
Some time ago, after partitioned indexes had been pushed, I realized that even though I didn't want them to have relfilenodes, they did. And looking closer I noticed that *a lot* of relation kinds that didn't need relfilenodes, had them anyway. This patch fixes that; if no relfilenode is needed, it's not created. I didn't verify pg_upgrade behavior across this commit. Maybe something needs tweaking there. PS: I think it'd be worth following up with this ... https://postgr.es/m/CAFjFpRcfzs+yst6YBCseD_orEcDNuAr9GUTraZ5GC=AvCYh55Q@mail.gmail.com -- Álvaro Herrera
Вложения
Hi, On 2018-12-06 18:55:52 -0300, Alvaro Herrera wrote: > Some time ago, after partitioned indexes had been pushed, I realized > that even though I didn't want them to have relfilenodes, they did. And > looking closer I noticed that *a lot* of relation kinds that didn't need > relfilenodes, had them anyway. > > This patch fixes that; if no relfilenode is needed, it's not created. > > I didn't verify pg_upgrade behavior across this commit. Maybe something > needs tweaking there. Hm, that generally sounds like a good plan. Could we back this up with tests in misc_sanity.sql or such? - Andres
On Thu, Dec 06, 2018 at 06:55:52PM -0300, Alvaro Herrera wrote: > Some time ago, after partitioned indexes had been pushed, I realized > that even though I didn't want them to have relfilenodes, they did. And > looking closer I noticed that *a lot* of relation kinds that didn't need > relfilenodes, had them anyway. > > This patch fixes that; if no relfilenode is needed, it's not created. > > I didn't verify pg_upgrade behavior across this commit. Maybe something > needs tweaking there. > > PS: I think it'd be worth following up with this ... > https://postgr.es/m/CAFjFpRcfzs+yst6YBCseD_orEcDNuAr9GUTraZ5GC=AvCYh55Q@mail.gmail.com A macro makes sense to control that. Now I have to admit that I don't like your solution. Wouldn't it be cleaner to assign InvalidOid to relfilenode in such cases? The callers of heap_create would need to be made smarter when they now pass down a relfilenode (looking at you, DefineIndex!), but that seems way more consistent to me. Some tests would also be welcome. -- Michael
Вложения
On 2018-Dec-07, Michael Paquier wrote: > A macro makes sense to control that. I added Ashutosh's RELKIND_HAS_STORAGE, but renamed it to RELKIND_CAN_HAVE_STORAGE, because some of the relkinds can be mapped and thus would have relfilenode set to 0. I think this is a bit misleading either way. > Now I have to admit that I don't > like your solution. Wouldn't it be cleaner to assign InvalidOid to > relfilenode in such cases? The callers of heap_create would need to be > made smarter when they now pass down a relfilenode (looking at you, > DefineIndex!), but that seems way more consistent to me. I don't follow. When a relfilenode is passed by callers, they indicate that the storage has already been created. Contrariwise, when a relation kind that *does* have storage but is not yet created, they pass InvalidOid as relfilenode, and heap_create is in charge of creating storage. Maybe I'm not quite seeing what problem you mean. Or I could add a separate boolean, but that seems pointless. Another possible improvement is to remove the create_storage boolean > Some tests would also be welcome. Added a test in sanity_check.sql that there's no relation with the relkinds that aren't supposed to have storage. Without the code fix it fails in current regression database, but in the failure result set there isn't any relation of kinds 'p' or 'I', so this isn't a terribly comprehensive test -- the query runs too early in the regression sequence. I'm not sure it's worth bothering further. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Вложения
Rebased over today's conflicting commits. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Вложения
Hello. At Sun, 16 Dec 2018 17:47:16 -0300, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in <20181216204716.jddzijk3fb2dpdk4@alvherre.pgsql> > On 2018-Dec-07, Michael Paquier wrote: > > > A macro makes sense to control that. > > I added Ashutosh's RELKIND_HAS_STORAGE, but renamed it to > RELKIND_CAN_HAVE_STORAGE, because some of the relkinds can be mapped and > thus would have relfilenode set to 0. I think this is a bit misleading > either way. FWIW.. I RELKIND_HAS_STORAGE looks better to me. # Since it's a bit too late, I don't insist on that. Mapped relations have storage, which is signalled by relfilenode = 0 and the real file node is given by relation mapper. And it is actually created at the boostrap time. In this sense, (RELKIND_HAS_STORAGE && relfilenode == 0) in heap_craete makes sense. Assertion (..->relNode != InvalidOid) at the end of RelationInitPhysicalAddr doesn't fail. I think RELKIND_HAS_STORGE is preferable also in this sense. ATExecSetTableSpaceNoStorage assumes that the relation is actually having storage. > > Now I have to admit that I don't > > like your solution. Wouldn't it be cleaner to assign InvalidOid to > > relfilenode in such cases? The callers of heap_create would need to be > > made smarter when they now pass down a relfilenode (looking at you, > > DefineIndex!), but that seems way more consistent to me. > > I don't follow. When a relfilenode is passed by callers, they indicate > that the storage has already been created. Contrariwise, when a > relation kind that *does* have storage but is not yet created, they > pass InvalidOid as relfilenode, and heap_create is in charge of creating > storage. Maybe I'm not quite seeing what problem you mean. Or I could > add a separate boolean, but that seems pointless. > > Another possible improvement is to remove the create_storage boolean > > > Some tests would also be welcome. > > Added a test in sanity_check.sql that there's no relation with the > relkinds that aren't supposed to have storage. Without the code fix it > fails in current regression database, but in the failure result set > there isn't any relation of kinds 'p' or 'I', so this isn't a terribly > comprehensive test -- the query runs too early in the regression > sequence. I'm not sure it's worth bothering further. Actual files were not created even in the past, create_heap has been just refactored, which looks fine. pg_class and relcache entries no longer have false relfilenode, which looks fine. The test should check only relfilenode won't be given to such relation, which were given in the past, which looks fine. The patch applies cleanly and passed the regression test. regares. -- Kyotaro Horiguchi NTT Open Source Software Center
On Sun, Dec 16, 2018 at 05:47:16PM -0300, Alvaro Herrera wrote: > I don't follow. When a relfilenode is passed by callers, they indicate > that the storage has already been created. Contrariwise, when a > relation kind that *does* have storage but is not yet created, they > pass InvalidOid as relfilenode, and heap_create is in charge of creating > storage. Maybe I'm not quite seeing what problem you mean. Or I could > add a separate boolean, but that seems pointless. I have been double-checking my thoughts on the matter, and one take is to allow the reuse of relfilenodes for indexes like in TryReuseIndex(). I did not recall that case. Sorry for the noise. > Another possible improvement is to remove the create_storage boolean Yes, this could go away as this is linked with relfilenode. I let it up to you if you want to remove it or keep it. I think that I would just remove it. > Added a test in sanity_check.sql that there's no relation with the > relkinds that aren't supposed to have storage. Without the code fix it > fails in current regression database, but in the failure result set > there isn't any relation of kinds 'p' or 'I', so this isn't a terribly > comprehensive test -- the query runs too early in the regression > sequence. I'm not sure it's worth bothering further. +-- check that relations without storage don't have relfilenode It could be an idea to add a comment mentioning that the set of relkinds in RELKIND_CAN_HAVE_STORAGE are linked with the relkinds of this query. -- Michael
Вложения
Hi, On 2018/12/18 14:56, Kyotaro HORIGUCHI wrote: > Hello. > > At Sun, 16 Dec 2018 17:47:16 -0300, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in <20181216204716.jddzijk3fb2dpdk4@alvherre.pgsql> >> On 2018-Dec-07, Michael Paquier wrote: >> >>> A macro makes sense to control that. >> >> I added Ashutosh's RELKIND_HAS_STORAGE, but renamed it to >> RELKIND_CAN_HAVE_STORAGE, because some of the relkinds can be mapped and >> thus would have relfilenode set to 0. I think this is a bit misleading >> either way. > > FWIW.. I RELKIND_HAS_STORAGE looks better to me. > > # Since it's a bit too late, I don't insist on that. > > Mapped relations have storage, which is signalled by relfilenode > = 0 and the real file node is given by relation mapper. And it is > actually created at the boostrap time. In this sense, > (RELKIND_HAS_STORAGE && relfilenode == 0) in heap_craete makes > sense. > > Assertion (..->relNode != InvalidOid) at the end of > RelationInitPhysicalAddr doesn't fail. I think RELKIND_HAS_STORGE > is preferable also in this sense. Sorry to be saying it late, but I have to agree with Horiguchi-san here that RELKIND_HAS_STORAGE sounds better and is clear. Looking at what's committed: /* * Relation kinds that have physical storage. These relations normally have * relfilenode set to non-zero, but it can also be zero if the relation is * mapped. */ #define RELKIND_CAN_HAVE_STORAGE(relkind) \ ((relkind) == RELKIND_RELATION || \ (relkind) == RELKIND_INDEX || \ (relkind) == RELKIND_SEQUENCE || \ (relkind) == RELKIND_TOASTVALUE || \ (relkind) == RELKIND_MATVIEW) So, they all *do have* storage, as the comment's first sentence also says. Mixing the bit about mapped relations in the naming of this macro will make it hard to reason about using it in other parts of the backend code (although, in admittedly not too far off places), where it shouldn't matter if the relation's file is determined using relfilenode or via relation to file mapper. > ATExecSetTableSpaceNoStorage assumes that the relation is > actually having storage. Hmm, it has the following Assert: /* * Shouldn't be called on relations having storage; these are processed * in phase 3. */ Assert(!RELKIND_CAN_HAVE_STORAGE(rel->rd_rel->relkind)); So, it actually assumes that it's called for relations that don't have storage (...but do have a tablespace property that applies to its children.) Thanks, Amit
On 18/12/2018 08:18, Amit Langote wrote: > Sorry to be saying it late, but I have to agree with Horiguchi-san here > that RELKIND_HAS_STORAGE sounds better and is clear. I think I agree. Does someone want to send a patch? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2019-Jan-04, Peter Eisentraut wrote: > On 18/12/2018 08:18, Amit Langote wrote: > > Sorry to be saying it late, but I have to agree with Horiguchi-san here > > that RELKIND_HAS_STORAGE sounds better and is clear. > > I think I agree. Does someone want to send a patch? I'll do it. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Jan 04, 2019 at 09:19:25AM -0300, Alvaro Herrera wrote: > I'll do it. Thanks for taking care of it. Please note that commands/copy.c also makes use of RELKIND_CAN_HAVE_STORAGE() as of bf491a9. Could you split the renaming with a commit independent on what is being discussed on this thread? These are separate issues in my opinion. -- Michael