Обсуждение: Tarball builds in the new world order

Поиск
Список
Период
Сортировка

Tarball builds in the new world order

От
Tom Lane
Дата:
With one eye on the beta-release calendar, I thought it'd be a
good idea to test whether our tarball build script works for
the new plan where we'll use "git archive" instead of the
traditional process.

It doesn't.

It makes tarballs all right, but whatever commit ID you specify
is semi-ignored, and you get a tarball corresponding to HEAD
of master.  (The PDFs come from the right version, though!)

The reason for that is that the mk-one-release script does this
(shorn of not-relevant-here details):

    export BASE=/home/pgsql
    export GIT_DIR=$BASE/postgresql.git

    mkdir pgsql

    # Export the selected git ref
    git archive ${gitref} | tar xf - -C pgsql

    cd pgsql
    ./configure

    # Produce .tar.gz and .tar.bz2 files
    make dist

Since 619bc23a1, what "make dist" does is

    $(GIT) -C $(srcdir) -c core.autocrlf=false archive --format tar.gz -9 --prefix $(distdir)/ HEAD -o
$(abs_top_builddir)/$@
    $(GIT) -C $(srcdir) -c core.autocrlf=false -c tar.tar.bz2.command='$(BZIP2) -c' archive --format tar.bz2 --prefix
$(distdir)/HEAD -o $(abs_top_builddir)/$@ 

Since GIT_DIR is set, git consults that repo not the current working
directory, so HEAD means whatever it means in a just-fetched repo,
and mk-one-release's efforts to select the ${gitref} commit mean
nothing.  (If git had tried to consult the current working directory,
it would've failed for lack of any .git subdir therein.)

I really really don't want to put version-specific coding into
mk-one-release, but fortunately I think we don't have to.
What I suggest is doing this in mk-one-release:

-make dist
+make dist PG_COMMIT_HASH=${gitref}

and changing the "make dist" rules to write $(PG_COMMIT_HASH) not
HEAD.  The extra make variable will have no effect in the back
branches, while it should cause the right thing to happen with
the new implementation of "make dist".

This change seems like a good thing anyway for anyone who's tempted
to use "make dist" manually, since they wouldn't necessarily want
to package HEAD either.  Now, if we just do it exactly like that
then trying to "make dist" without setting PG_COMMIT_HASH will
fail, since "git archive" has no default for its <tree-ish>
argument.  I can't quite decide if that's a good thing, or if we
should hack the makefile a little further to allow PG_COMMIT_HASH
to default to HEAD.

Thoughts, better ideas?

            regards, tom lane



Re: Tarball builds in the new world order

От
Greg Sabino Mullane
Дата:
On Tue, Apr 23, 2024 at 6:06 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
This change seems like a good thing anyway for anyone who's tempted
to use "make dist" manually, since they wouldn't necessarily want
to package HEAD either.  Now, if we just do it exactly like that
then trying to "make dist" without setting PG_COMMIT_HASH will
fail, since "git archive" has no default for its <tree-ish>
argument.  I can't quite decide if that's a good thing, or if we
should hack the makefile a little further to allow PG_COMMIT_HASH
to default to HEAD.

Just having it fail seems harsh. What if we had plain "make dist" at least output a friendly hint about "please specify a hash"? That seems better than an implicit HEAD default, as they can manually set it to HEAD themselves per the hint.

Cheers,
Greg

 

Re: Tarball builds in the new world order

От
Peter Eisentraut
Дата:
On 24.04.24 00:05, Tom Lane wrote:
> It makes tarballs all right, but whatever commit ID you specify
> is semi-ignored, and you get a tarball corresponding to HEAD
> of master.  (The PDFs come from the right version, though!)
> 
> The reason for that is that the mk-one-release script does this
> (shorn of not-relevant-here details):
> 
>     export BASE=/home/pgsql
>     export GIT_DIR=$BASE/postgresql.git
> 
>     mkdir pgsql
> 
>     # Export the selected git ref
>     git archive ${gitref} | tar xf - -C pgsql

Where does ${gitref} come from?  Why doesn't this line use git archive 
HEAD | ... ?

> What I suggest is doing this in mk-one-release:
> 
> -make dist
> +make dist PG_COMMIT_HASH=${gitref}
> 
> and changing the "make dist" rules to write $(PG_COMMIT_HASH) not
> HEAD.  The extra make variable will have no effect in the back
> branches, while it should cause the right thing to happen with
> the new implementation of "make dist".

I suppose we could do something like that, but we'd also need to come up 
with a meson version.

(Let's not use "hash" though, since other ways to commit specify a 
commit can be used.)

> This change seems like a good thing anyway for anyone who's tempted
> to use "make dist" manually, since they wouldn't necessarily want
> to package HEAD either.

A tin-foil-hat argument is that we might not want to encourage that, 
because for reproducibility, we need a known git commit and also a known 
implementation of make dist.  If in the future someone uses the make 
dist implementation of PG19 to build a tarball for PG17, it might not 
come out the same way as using the make dist implementation of PG17.




Re: Tarball builds in the new world order

От
Tom Lane
Дата:
Peter Eisentraut <peter@eisentraut.org> writes:
> On 24.04.24 00:05, Tom Lane wrote:
>> # Export the selected git ref
>> git archive ${gitref} | tar xf - -C pgsql

> Where does ${gitref} come from?  Why doesn't this line use git archive 
> HEAD | ... ?

${gitref} is an argument to the script, specifying the commit
to be packaged.  HEAD would certainly not work when trying to
package a back-branch release, and in general it would hardly
ever be what you want if your goal is to make a reproducible
package.

>> What I suggest is doing this in mk-one-release:
>> -make dist
>> +make dist PG_COMMIT_HASH=${gitref}

> I suppose we could do something like that, but we'd also need to come up 
> with a meson version.

Packaging via meson is years away yet IMO, so I'm unconcerned
about that for now.  See below.

> (Let's not use "hash" though, since other ways to commit specify a 
> commit can be used.)

OK, do you have a different term in mind?

>> This change seems like a good thing anyway for anyone who's tempted
>> to use "make dist" manually, since they wouldn't necessarily want
>> to package HEAD either.

> A tin-foil-hat argument is that we might not want to encourage that, 
> because for reproducibility, we need a known git commit and also a known 
> implementation of make dist.  If in the future someone uses the make 
> dist implementation of PG19 to build a tarball for PG17, it might not 
> come out the same way as using the make dist implementation of PG17.

Of course.  The entire reason why this script invokes "make dist",
rather than implementing the behavior for itself, is so that
branch-specific behaviors can be accounted for in the branches
not here.  (To be clear, the script has no idea which branch
it's packaging --- that's implicit in the commit ID.)

Because of that, I really don't want to rely on some equivalent
meson infrastructure until it's available in all the live branches.
v15 will be EOL in 3.5 years, and that's more or less the time frame
that we've spoken of for dropping the makefile infrastructure, so
I don't think that's an unreasonable plan.

            regards, tom lane



Re: Tarball builds in the new world order

От
Tom Lane
Дата:
Greg Sabino Mullane <htamfids@gmail.com> writes:
> On Tue, Apr 23, 2024 at 6:06 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Now, if we just do it exactly like that
>> then trying to "make dist" without setting PG_COMMIT_HASH will
>> fail, since "git archive" has no default for its <tree-ish>
>> argument.  I can't quite decide if that's a good thing, or if we
>> should hack the makefile a little further to allow PG_COMMIT_HASH
>> to default to HEAD.

> Just having it fail seems harsh. What if we had plain "make dist" at least
> output a friendly hint about "please specify a hash"? That seems better
> than an implicit HEAD default, as they can manually set it to HEAD
> themselves per the hint.

Yeah, it would be easy to do something like

ifneq ($(PG_COMMIT_HASH),)
    $(GIT) ...
else
    @echo "Please specify PG_COMMIT_HASH." && exit 1
endif

I'm just debating whether that's better than inserting a default
value.

            regards, tom lane



Re: Tarball builds in the new world order

От
Tom Lane
Дата:
Concretely, I'm proposing the attached.  Peter didn't like
PG_COMMIT_HASH, so I have PG_COMMIT_REFSPEC below, but I'm not
wedded to that if a better name is proposed.

            regards, tom lane

diff --git a/GNUmakefile.in b/GNUmakefile.in
index 30553b2a95..27357e5e3b 100644
--- a/GNUmakefile.in
+++ b/GNUmakefile.in
@@ -102,10 +102,18 @@ distdir-location:
 # on, Unix machines.

 $(distdir).tar.gz:
-    $(GIT) -C $(srcdir) -c core.autocrlf=false archive --format tar.gz -9 --prefix $(distdir)/ HEAD -o
$(abs_top_builddir)/$@
+ifneq ($(PG_COMMIT_REFSPEC),)
+    $(GIT) -C $(srcdir) -c core.autocrlf=false archive --format tar.gz -9 --prefix $(distdir)/ $(PG_COMMIT_REFSPEC) -o
$(abs_top_builddir)/$@
+else
+    @echo "Please specify PG_COMMIT_REFSPEC." && exit 1
+endif

 $(distdir).tar.bz2:
-    $(GIT) -C $(srcdir) -c core.autocrlf=false -c tar.tar.bz2.command='$(BZIP2) -c' archive --format tar.bz2 --prefix
$(distdir)/HEAD -o $(abs_top_builddir)/$@ 
+ifneq ($(PG_COMMIT_REFSPEC),)
+    $(GIT) -C $(srcdir) -c core.autocrlf=false -c tar.tar.bz2.command='$(BZIP2) -c' archive --format tar.bz2 --prefix
$(distdir)/$(PG_COMMIT_REFSPEC) -o $(abs_top_builddir)/$@ 
+else
+    @echo "Please specify PG_COMMIT_REFSPEC." && exit 1
+endif

 distcheck: dist
     rm -rf $(dummy)
--- mk-one-release.orig    2024-04-23 17:30:08.983226671 -0400
+++ mk-one-release    2024-04-26 15:17:29.713669677 -0400
@@ -39,13 +39,17 @@ mkdir pgsql
 git archive ${gitref} | tar xf - -C pgsql
 
 # Include the git ref in the output tarballs
+# (This has no effect with v17 and up; instead we rely on "git archive"
+# to include the commit hash in the tar header)
 echo ${gitref} >pgsql/.gitrevision
 
 cd pgsql
 ./configure
 
 # Produce .tar.gz and .tar.bz2 files
-make dist
+# (With v17 and up, this will repeat the "git archive" call; the contents
+# of the working directory don't matter directly to the results.)
+make dist PG_COMMIT_REFSPEC=${gitref}
 
 # Generate md5 and sha256 sums, and copy files to output
 for x in *.tar.*; do

Re: Tarball builds in the new world order

От
Alvaro Herrera
Дата:
On 2024-Apr-26, Tom Lane wrote:

> --- mk-one-release.orig    2024-04-23 17:30:08.983226671 -0400
> +++ mk-one-release    2024-04-26 15:17:29.713669677 -0400
> @@ -39,13 +39,17 @@ mkdir pgsql
>  git archive ${gitref} | tar xf - -C pgsql
>  
>  # Include the git ref in the output tarballs
> +# (This has no effect with v17 and up; instead we rely on "git archive"
> +# to include the commit hash in the tar header)
>  echo ${gitref} >pgsql/.gitrevision

Why is it that the .gitrevision file is only created here, instead of
being added to the tarball that "git archive" produces?  Adding an
argument like
    --add-virtual-file $(distdir)/.gitrevision:$(GIT_REFSPEC)

to the git archive call should suffice.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"I can't go to a restaurant and order food because I keep looking at the
fonts on the menu.  Five minutes later I realize that it's also talking
about food" (Donald Knuth)



Re: Tarball builds in the new world order

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> Why is it that the .gitrevision file is only created here, instead of
> being added to the tarball that "git archive" produces?  Adding an
> argument like
>     --add-virtual-file $(distdir)/.gitrevision:$(GIT_REFSPEC)
> to the git archive call should suffice.

I think we don't want to do that.  In the first place, it's redundant
because "git archive" includes the commit hash in the tar header,
and in the second place it gets away from the concept that the tarball
contains exactly what is in our git tree.

Now admittedly, if anyone's built tooling that relies on the presence
of the .gitrevision file, they might prefer that we keep on including
it.  But I'm not sure anyone has, and in any case I think switching
to the git-approved way of incorporating the hash is the best thing
in the long run.

What I'm thinking of doing, as soon as we've sorted the tarball
creation process, is to make a test tarball available to the
packagers group so that anyone interested can start working on
updating their packaging process for the new approach.  Hopefully,
if anyone's especially unhappy about omitting .gitrevision, they'll
speak up.

            regards, tom lane



Re: Tarball builds in the new world order

От
Peter Eisentraut
Дата:
On 26.04.24 21:24, Tom Lane wrote:
> Concretely, I'm proposing the attached.  Peter didn't like
> PG_COMMIT_HASH, so I have PG_COMMIT_REFSPEC below, but I'm not
> wedded to that if a better name is proposed.

This seems ok to me, but note that we do have an equivalent 
implementation in meson.  If we don't want to update that in a similar 
way, maybe we should disable it.




Re: Tarball builds in the new world order

От
Peter Eisentraut
Дата:
On 26.04.24 21:24, Tom Lane wrote:
> Concretely, I'm proposing the attached.  Peter didn't like
> PG_COMMIT_HASH, so I have PG_COMMIT_REFSPEC below, but I'm not
> wedded to that if a better name is proposed.

Um, "refspec" leads me here 
<https://git-scm.com/book/en/v2/Git-Internals-The-Refspec>, which seems 
like the wrong concept.  I think the more correct concept is "revision" 
(https://git-scm.com/docs/gitrevisions), so something like PG_GIT_REVISION?




Re: Tarball builds in the new world order

От
Tom Lane
Дата:
Peter Eisentraut <peter@eisentraut.org> writes:
> On 26.04.24 21:24, Tom Lane wrote:
>> Concretely, I'm proposing the attached.  Peter didn't like
>> PG_COMMIT_HASH, so I have PG_COMMIT_REFSPEC below, but I'm not
>> wedded to that if a better name is proposed.

> This seems ok to me, but note that we do have an equivalent 
> implementation in meson.  If we don't want to update that in a similar 
> way, maybe we should disable it.

OK.  After poking at that for awhile, it seemed like "default to
HEAD" fits into meson a lot better than "throw an error if the
variable isn't set", so I switched to doing it like that.
One reason is that AFAICT you can only set the variable during
"meson setup" not during "ninja".  This won't matter to the
tarball build script, which does a one-off configuration run
anyway.  But for manual use, a movable target like HEAD might be
more convenient given that behavior.

I tested this by building tarballs using the makefiles on a RHEL8
box, and using meson on my MacBook (with recent MacPorts tools).
I got bit-for-bit identical files, which I found rather impressive
given the gap between the platforms.  Maybe this "reproducible builds"
wheeze will actually work.

I also changed the variable name to PG_GIT_REVISION per your
other suggestion.

            regards, tom lane

diff --git a/GNUmakefile.in b/GNUmakefile.in
index 30553b2a95..9e41794f60 100644
--- a/GNUmakefile.in
+++ b/GNUmakefile.in
@@ -87,6 +87,9 @@ update-unicode: | submake-generated-headers submake-libpgport
 distdir    = postgresql-$(VERSION)
 dummy    = =install=

+# git revision to be packaged
+PG_GIT_REVISION ?= HEAD
+
 GIT = git

 dist: $(distdir).tar.gz $(distdir).tar.bz2
@@ -102,10 +105,10 @@ distdir-location:
 # on, Unix machines.

 $(distdir).tar.gz:
-    $(GIT) -C $(srcdir) -c core.autocrlf=false archive --format tar.gz -9 --prefix $(distdir)/ HEAD -o
$(abs_top_builddir)/$@
+    $(GIT) -C $(srcdir) -c core.autocrlf=false archive --format tar.gz -9 --prefix $(distdir)/ $(PG_GIT_REVISION) -o
$(abs_top_builddir)/$@

 $(distdir).tar.bz2:
-    $(GIT) -C $(srcdir) -c core.autocrlf=false -c tar.tar.bz2.command='$(BZIP2) -c' archive --format tar.bz2 --prefix
$(distdir)/HEAD -o $(abs_top_builddir)/$@ 
+    $(GIT) -C $(srcdir) -c core.autocrlf=false -c tar.tar.bz2.command='$(BZIP2) -c' archive --format tar.bz2 --prefix
$(distdir)/$(PG_GIT_REVISION) -o $(abs_top_builddir)/$@ 

 distcheck: dist
     rm -rf $(dummy)
diff --git a/meson.build b/meson.build
index cdfd31377d..1c0579d5a6 100644
--- a/meson.build
+++ b/meson.build
@@ -3469,6 +3469,8 @@ bzip2 = find_program('bzip2', required: false, native: true)

 distdir = meson.project_name() + '-' + meson.project_version()

+pg_git_revision = get_option('PG_GIT_REVISION')
+
 # Note: core.autocrlf=false is needed to avoid line-ending conversion
 # in case the environment has a different setting.  Without this, a
 # tarball created on Windows might be different than on, and unusable
@@ -3483,7 +3485,7 @@ tar_gz = custom_target('tar.gz',
             '-9',
             '--prefix', distdir + '/',
             '-o', join_paths(meson.build_root(), '@OUTPUT@'),
-            'HEAD', '.'],
+            pg_git_revision],
   output: distdir + '.tar.gz',
 )

@@ -3497,7 +3499,7 @@ if bzip2.found()
               '--format', 'tar.bz2',
               '--prefix', distdir + '/',
               '-o', join_paths(meson.build_root(), '@OUTPUT@'),
-              'HEAD', '.'],
+              pg_git_revision],
     output: distdir + '.tar.bz2',
   )
 else
diff --git a/meson_options.txt b/meson_options.txt
index 249ecc5ffd..246cecf382 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -55,6 +55,9 @@ option('atomics', type: 'boolean', value: true,
 option('spinlocks', type: 'boolean', value: true,
   description: 'Use spinlocks')

+option('PG_GIT_REVISION', type: 'string', value: 'HEAD',
+  description: 'git revision to be packaged by pgdist target')
+

 # Compilation options

--- mk-one-release.orig    2024-04-23 17:30:08.983226671 -0400
+++ mk-one-release    2024-04-29 12:10:38.106723387 -0400
@@ -39,13 +39,17 @@ mkdir pgsql
 git archive ${gitref} | tar xf - -C pgsql
 
 # Include the git ref in the output tarballs
+# (This has no effect with v17 and up; instead we rely on "git archive"
+# to include the commit hash in the tar header)
 echo ${gitref} >pgsql/.gitrevision
 
 cd pgsql
 ./configure
 
 # Produce .tar.gz and .tar.bz2 files
-make dist
+# (With v17 and up, this will repeat the "git archive" call; the contents
+# of the working directory don't matter directly to the results.)
+make dist PG_GIT_REVISION=${gitref}
 
 # Generate md5 and sha256 sums, and copy files to output
 for x in *.tar.*; do

Re: Tarball builds in the new world order

От
Peter Eisentraut
Дата:
On 29.04.24 18:14, Tom Lane wrote:
> Peter Eisentraut <peter@eisentraut.org> writes:
>> On 26.04.24 21:24, Tom Lane wrote:
>>> Concretely, I'm proposing the attached.  Peter didn't like
>>> PG_COMMIT_HASH, so I have PG_COMMIT_REFSPEC below, but I'm not
>>> wedded to that if a better name is proposed.
> 
>> This seems ok to me, but note that we do have an equivalent
>> implementation in meson.  If we don't want to update that in a similar
>> way, maybe we should disable it.
> 
> OK.  After poking at that for awhile, it seemed like "default to
> HEAD" fits into meson a lot better than "throw an error if the
> variable isn't set", so I switched to doing it like that.
> One reason is that AFAICT you can only set the variable during
> "meson setup" not during "ninja".  This won't matter to the
> tarball build script, which does a one-off configuration run
> anyway.  But for manual use, a movable target like HEAD might be
> more convenient given that behavior.

This patch looks good to me.




Re: Tarball builds in the new world order

От
Tom Lane
Дата:
Peter Eisentraut <peter@eisentraut.org> writes:
> On 29.04.24 18:14, Tom Lane wrote:
>> OK.  After poking at that for awhile, it seemed like "default to
>> HEAD" fits into meson a lot better than "throw an error if the
>> variable isn't set", so I switched to doing it like that.

> This patch looks good to me.

Pushed, thanks.

            regards, tom lane