Обсуждение: splitting htup.h
Hi, This patch splits htup.h in two pieces -- the first one (tupbasics.h; not wedded to the name) does not include many other headers and is just enough to have other parts of the code create tuples and pass them around, to be used by most other headers. The other one (which keeps the name htup.h) contains internal tuple stuff (struct declarations etc). Before patch, htup.h is directly or indirectly included by 364 .c files in src/backend; after patch, that's reduced to 299 files (that's 65 files less to compile if you modify the header). -- Álvaro Herrera <alvherre@alvh.no-ip.org>
Вложения
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > This patch splits htup.h in two pieces -- the first one (tupbasics.h; > not wedded to the name) does not include many other headers and is just > enough to have other parts of the code create tuples and pass them > around, to be used by most other headers. The other one (which keeps > the name htup.h) contains internal tuple stuff (struct declarations > etc). > Before patch, htup.h is directly or indirectly included by 364 .c files > in src/backend; after patch, that's reduced to 299 files (that's 65 > files less to compile if you modify the header). That's kind of a disappointing result --- if we're going to split htup.h into public and private parts, I would have hoped for a much smaller inclusion footprint for the private part. Maybe you could adjust the boundary between public and private parts a bit more? If we can't cut the footprint I'm inclined to think this isn't worth the code churn. (Or perhaps I'm missing the point. Do you have a reason for doing this other than cutting the inclusion footprint?) regards, tom lane
Excerpts from Tom Lane's message of vie jun 15 21:06:21 -0400 2012: > Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > > This patch splits htup.h in two pieces -- the first one (tupbasics.h; > > not wedded to the name) does not include many other headers and is just > > enough to have other parts of the code create tuples and pass them > > around, to be used by most other headers. The other one (which keeps > > the name htup.h) contains internal tuple stuff (struct declarations > > etc). > > > Before patch, htup.h is directly or indirectly included by 364 .c files > > in src/backend; after patch, that's reduced to 299 files (that's 65 > > files less to compile if you modify the header). > > That's kind of a disappointing result --- if we're going to split htup.h > into public and private parts, I would have hoped for a much smaller > inclusion footprint for the private part. Maybe you could adjust the > boundary between public and private parts a bit more? If we can't cut > the footprint I'm inclined to think this isn't worth the code churn. Yeah, I have another version of the patch (attached) that includes HeapTupleData in the public part; this means catcache.h does not need htup.h, which was the main propagation vector in the original patch. With that, 162 .c files include htup.h -- about 130 of them do so directly, and the rest do so through either relscan.h or tuptoaster.h. Is this enough? I think the next step would require moving HeapTupleHeaderData to the public header as well; I haven't checked to see how much better it gets. It would require copying some of the other includes in htup.h into the public header, though, so there is a tradeoff. (Another idea: maybe we could split in three parts rather than two). -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Вложения
Here's a third version with which I'm much happier. This patch is mainly doing four things: 1. take some typedefs and the HeapTupleData struct definition from access/htup.h, and put them in access/tupbasics.h. This new file is used as #include in all headers instead of htup.h. 2. take out catcache.h from syscache.h; in its stead, we add a forward struct declaration of struct catclist. This was proposed by Andres Freund and Peter Geogeghan previously, and it turns out to be convenient as well as foreseen by older comments in syscache.h. It limits proliferation of other unnecessary headers in catcache.h as well. 3. split resowner.h creating resowner_private.h. Files that just want to create and use ResourceOwners can include the thinner resowner.h; those that have stuff managed within a ResOwner use the other file. This limits proliferation of lots of other header inclusion. 4. split the Xlog stuff out of heapam.h into heapam_xlog.h. The number of src/backend files that depend on some src/include/ files: access/heapam.h 216 access/heapam_xlog.h 12 access/htup.h 182 access/tupbasics.h 401 utils/resowner.h 74 utils/resowner_private.h 10 utils/catcache.h 23 utils/syscache.h 230 It seems pretty clear that all these the splits are useful. I'm unsure about the "tupbasics.h" file name. I'm open to better ideas. The other two new files seem good enough that no bikeshedding seems really necessary. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > This patch is mainly doing four things: > 1. take some typedefs and the HeapTupleData struct definition from > access/htup.h, and put them in access/tupbasics.h. This new file is > used as #include in all headers instead of htup.h. > I'm unsure about the "tupbasics.h" file name. I'm open to better ideas. I'd be inclined to keep the name "htup.h" for the more widely used file, and invent a new name for what we're splitting out of it. This should reduce the number of changes needed, not only in our code but third party code. Not sure if the new file could sanely be called "htup_private.h"; it seems a bit widely used for that. Maybe "heap.h"? Also, is there any reason to consider just moving those defs into heapam.h, instead of inventing a new header? I'm not sure if there's any principled distinction between heap.h and heapam.h, or any significant differences between their sets of consumers. The other changes all sound sane. regards, tom lane
... btw, the buildfarm says you forgot contrib/ while fixing the collateral damage from these changes. regards, tom lane
Excerpts from Tom Lane's message of mar ago 28 17:27:51 -0400 2012: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: > > This patch is mainly doing four things: > > > 1. take some typedefs and the HeapTupleData struct definition from > > access/htup.h, and put them in access/tupbasics.h. This new file is > > used as #include in all headers instead of htup.h. > > > I'm unsure about the "tupbasics.h" file name. I'm open to better ideas. > > I'd be inclined to keep the name "htup.h" for the more widely used file, > and invent a new name for what we're splitting out of it. Meh. When seen in that light, it makes a lot of sense. I was actually thinking it the other way around. > This should > reduce the number of changes needed, not only in our code but third > party code. Not sure if the new file could sanely be called > "htup_private.h"; it seems a bit widely used for that. Maybe "heap.h"? Not sure I really like heap.h, but let's put that aside for a moment. > Also, is there any reason to consider just moving those defs into > heapam.h, instead of inventing a new header? I'm not sure if there's > any principled distinction between heap.h and heapam.h, or any > significant differences between their sets of consumers. To test this idea, I created a separate "scandesc.h" file that contains struct declarations for HeapScanDesc and IndexScanDesc, and included that one into execnodes.h instead of heapam.h and genam.h. After fixing the fallout in the .c files, it turns out that heapam.h is used by 116 files (some of them through relscan.h or hio.h). Per my previous count, heap.h would be used by 182 files. So here's a list that depends on heap.h but not heapam.h (the reverse is obviously empty because heapam.h itself depends on heap.h). So I'm inclined to keep both files separate. ./access/common/heaptuple.c ./access/common/reloptions.c ./access/common/tupconvert.c ./access/common/tupdesc.c ./access/gin/gininsert.c ./access/gist/gistbuild.c ./access/heap/visibilitymap.c ./access/nbtree/nbtsort.c ./access/nbtree/nbtxlog.c ./access/spgist/spginsert.c ./access/transam/rmgr.c ./access/transam/twophase.c ./access/transam/xlog.c ./access/transam/xlogfuncs.c ./bootstrap/bootparse.c ./catalog/indexing.c ./catalog/namespace.c ./commands/variable.c ./executor/execAmi.c ./executor/execQual.c ./executor/execTuples.c ./executor/functions.c ./executor/nodeAgg.c ./executor/nodeHash.c ./executor/nodeHashjoin.c ./executor/nodeSetOp.c ./executor/nodeSubplan.c ./executor/nodeWindowAgg.c ./executor/spi.c ./executor/tstoreReceiver.c ./foreign/foreign.c ./nodes/tidbitmap.c ./optimizer/path/costsize.c ./optimizer/plan/planagg.c ./optimizer/plan/planner.c ./optimizer/plan/subselect.c ./optimizer/util/clauses.c ./parser/parse_coerce.c ./parser/parse_func.c ./parser/parse_oper.c ./parser/parse_type.c ./snowball/dict_snowball.c ./storage/freespace/freespace.c ./storage/lmgr/predicate.c ./storage/page/bufpage.c ./tcop/fastpath.c ./tcop/utility.c ./tsearch/ts_selfuncs.c ./tsearch/wparser.c ./utils/adt/acl.c ./utils/adt/arrayfuncs.c ./utils/adt/array_selfuncs.c ./utils/adt/array_typanalyze.c ./utils/adt/datetime.c ./utils/adt/format_type.c ./utils/adt/genfile.c ./utils/adt/json.c ./utils/adt/lockfuncs.c ./utils/adt/pg_locale.c ./utils/adt/pgstatfuncs.c ./utils/adt/rangetypes_selfuncs.c ./utils/adt/rowtypes.c ./utils/adt/trigfuncs.c ./utils/adt/tsgistidx.c ./utils/adt/varbit.c ./utils/adt/varchar.c ./utils/adt/varlena.c ./utils/cache/inval.c ./utils/cache/lsyscache.c ./utils/cache/syscache.c ./utils/fmgr/fmgr.c ./utils/init/miscinit.c ./utils/misc/superuser.c ./utils/sort/tuplesort.c ./utils/sort/tuplestore.c ./utils/time/combocid.c ./utils/time/tqual.c -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > Excerpts from Tom Lane's message of mar ago 28 17:27:51 -0400 2012: >> Also, is there any reason to consider just moving those defs into >> heapam.h, instead of inventing a new header? I'm not sure if there's >> any principled distinction between heap.h and heapam.h, or any >> significant differences between their sets of consumers. > [ yeah, there's quite a few files that would need heap.h but not heapam.h ] OK, scratch that thought then. So we seem to be down to choosing a new name for what we're going to take out of htup.h. If you don't like heap.h, maybe something like heap_tuple.h? I'm not terribly excited about it either way though. Any other ideas out there? regards, tom lane
Excerpts from Alvaro Herrera's message of mié ago 29 11:32:04 -0400 2012: > Excerpts from Tom Lane's message of mar ago 28 17:27:51 -0400 2012: > > This should > > reduce the number of changes needed, not only in our code but third > > party code. Not sure if the new file could sanely be called > > "htup_private.h"; it seems a bit widely used for that. Maybe "heap.h"? > > Not sure I really like heap.h, but let's put that aside for a moment. ... we already have catalog/heap.h, so that would be one reason to avoid that name. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > ... we already have catalog/heap.h, so that would be one reason to avoid > that name. Yeah, good point. We do have some duplicate file names elsewhere, but it's not a terribly good practice. regards, tom lane
On Wednesday, August 29, 2012 05:47:14 PM Tom Lane wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: > > Excerpts from Tom Lane's message of mar ago 28 17:27:51 -0400 2012: > >> Also, is there any reason to consider just moving those defs into > >> heapam.h, instead of inventing a new header? I'm not sure if there's > >> any principled distinction between heap.h and heapam.h, or any > >> significant differences between their sets of consumers. > > > > [ yeah, there's quite a few files that would need heap.h but not heapam.h > > ] > > OK, scratch that thought then. So we seem to be down to choosing a new > name for what we're going to take out of htup.h. If you don't like > heap.h, maybe something like heap_tuple.h? I'm not terribly excited > about it either way though. Any other ideas out there? htup_details.h? That doesn't have the sound of "fringe details" that htup_private.h has. Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Excerpts from Andres Freund's message of mié ago 29 12:10:17 -0400 2012: > > OK, scratch that thought then. So we seem to be down to choosing a new > > name for what we're going to take out of htup.h. If you don't like > > heap.h, maybe something like heap_tuple.h? I'm not terribly excited > > about it either way though. Any other ideas out there? > htup_details.h? That doesn't have the sound of "fringe details" that > htup_private.h has. htup_details.h seems reasonable, thanks. Here's the proposed patch (which uses tupheader.h instead of htup_details.h but other than that it's pretty much what I would commit). -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
Excerpts from Alvaro Herrera's message of mié ago 29 15:13:11 -0400 2012: > Excerpts from Andres Freund's message of mié ago 29 12:10:17 -0400 2012: > > > > OK, scratch that thought then. So we seem to be down to choosing a new > > > name for what we're going to take out of htup.h. If you don't like > > > heap.h, maybe something like heap_tuple.h? I'm not terribly excited > > > about it either way though. Any other ideas out there? > > htup_details.h? That doesn't have the sound of "fringe details" that > > htup_private.h has. > > htup_details.h seems reasonable, thanks. Committed, with a slight adjustment that I had left out of this patch but was present in a previous version of it: the function prototypes, which in the last posted patch were to remain in htup.h, are now in htup_details.h. So I removed access/tupdesc.h from htup.h. I had to add #include "access/htup_details.h" to half a dozen more .c files but that seemed a good tradeoff. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services