Discussion: Adding CI to our tree
Hi,

For several development efforts I found it to be incredibly valuable to push changes to a personal repository and see a while later whether tests succeed on a number of different platforms. This is especially useful for platforms that are quite different from one's own, like e.g. windows in my case.

Of course everybody can set this up for themselves. However, doing so well is a significant effort, particularly if windows is to be supported well. And doubly so if useful things like getting backtraces for crashes are desirable ([1]).

We do a form of pre-commit CI via cfbot. That is valuable. But it's not really comparable to having CI in tree - one needs to post to the list, and one cannot adjust the dependencies etc. installed for the CI runs. New contributors (and quite a few older ones too) IMO expect to be able to see whether their changes work as-is, without sending a patch to the list.

An obvious criticism of the effort to put CI runner infrastructure into core is that these services are effectively all proprietary technology, and that we should be hesitant to depend too much on one of them. I think that's a valid concern. However, once one CI integration is done, a good chunk (but not all!) of the work is transferable to another CI solution, which I do think reduces the dependency sufficiently.

The attached patch adds CI using cirrus-ci. The reasons for choosing cirrus were that
a) Thomas has ended up using cirrus for cfbot
b) cirrus provides a comparatively wide variety of operating systems
c) it allows custom VM images to be used
d) it does not require a login to look at results

c) is very valuable for being able to test e.g. upcoming linux versions, for pre-installing software on systems that do not support docker (freebsd), and for being faster to boot once the image is more than a trivial size. I've created a number of images for testing of the aio patchset [2].

Right now the patch attached
- runs check-world on FreeBSD, Linux, macOS - all using gcc
- freebsd, linux use a custom generated image
- macOS installs missing dependencies at runtime, with some caching
- all use ccache to make subsequent compilation faster
- runs all the tests I could find on windows, via vcregress.pl
- checks for compiler warnings on linux, with both clang and gcc
- captures all logs after a failing run
- generates backtraces from core files (the output format differs between platforms)
- allows limiting CI to certain OSes, by adding ci-os-only: (freebsd|linux|macos|windows)+ to the commit message (useful when fixing a platform dependent problem)

Example output of a
- successful run: https://cirrus-ci.com/build/4625606928236544
- github interface for the same: https://github.com/anarazel/postgres/runs/3772435617
- failed run on windows, with backtrace: https://cirrus-ci.com/task/6640561307254784?logs=cat_dumps#L150

Comments? Rotten tomatoes?

There's some polishing we should do before actually adding this to the tree. But I wanted to discuss the idea before investing even more time.

One policy discussion that we'd have to have is who should control the images used for CI. Right now that's on my personal google cloud account - which I am happy to do, but medium to long term that'd not be optimal.

Thanks to Thomas - I based this on his .cirrus.yml file. And made several contributions later. Also thanks to Andrew, who helped out with some windows issues I hit at some point...

Greetings,

Andres Freund

[1] I did get this to work on windows, but it does require a small set of changes to work reliably. I'll start a separate thread about it.
[2] https://github.com/anarazel/pg-vm-images/
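The ci-os-only: commit-message tag described here could be honored by a CI task along these lines. This is a hypothetical sketch, not the patch's actual code; the want_os helper and the parsing are assumptions:

```shell
# Sketch: decide whether a CI task for a given OS should run, based on an
# optional "ci-os-only:" line in the commit message.
commit_msg='Fix path handling on Windows

ci-os-only: windows freebsd'

want_os() {
  # Extract the OS list from a "ci-os-only:" line, if any.
  only=$(printf '%s\n' "$commit_msg" | sed -n 's/^ci-os-only:[[:space:]]*//p')
  # No tag present: every OS runs.
  [ -z "$only" ] && return 0
  # Tag present: run only if our OS is listed.
  printf '%s\n' "$only" | tr ' ' '\n' | grep -qx "$1"
}

want_os windows && echo "windows: run"
want_os linux || echo "linux: skip"
```

Running this with the example message above prints "windows: run" and "linux: skip".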
Attachments
On Fri, Oct 1, 2021 at 3:28 PM Andres Freund <andres@anarazel.de> wrote:
> For several development efforts I found it to be incredibly valuable to push
> changes to a personal repository and see a while later whether tests succeed
> on a number of different platforms. This is especially useful for platforms
> that are quite different from one's own, like e.g. windows in my case.
>
> An obvious criticism of the effort to put CI runner infrastructure into core
> is that these services are effectively all proprietary technology, and that we
> should be hesitant to depend too much on one of them. I think that's a valid
> concern. However, once one CI integration is done, a good chunk (but not all!)
> of the work is transferable to another CI solution, which I do think reduces
> the dependency sufficiently.

I agree with everything you've said, including the nuanced parts about the possible downsides. We already know what happens when one of these CI providers stops providing open source projects with free resources, because that's exactly what happened with Travis CI: projects that use their infrastructure are mildly inconvenienced. I can't see any real notable downside, as long as we just use the resources that they make available for development work. Clearly these services could never replace the buildfarm.

--
Peter Geoghegan
Hi,

On 2021-10-02 01:49:39 +0200, 0010203112132233 wrote:
> On Sat, 2 Oct 2021 at 00:28, Andres Freund <andres@anarazel.de> wrote:
> > New contributors (and quite a few older ones too) IMO expect to be able to
> > see whether their changes work as-is, without sending a patch to the list.
>
> Have they checked 'make installcheck-world'? I believe that is one of
> the first action items on the 'So, you want to become a developer?'
> wiki page, after downloading the sources. Of course, that is limited
> to only the environment of the user, but that's what they're generally
> developing for, right?

If you want to get a change into postgres, it almost always needs to actually work on all operating systems, and always needs to at least not cause build failures on all platforms.

> Furthermore, after looking it through, I think that Cirrus is an
> unfortunate choice as a CI platform of preference, as you cannot use
> it without access to Github (which is problematic for people located
> in certain localities due to USA regulations).

I agree that it's not optimal that cirrus isn't available on all git hosting platforms. Hence my saying that I think it's likely we'd end up adding a few more platforms over time. If we factor the meat of the work into a helper script, so that the CI-specific bit is just a couple of invocations of that script, it's not a lot of overhead to have 2-3 CI platforms.

> If we're going to include CI configuration for private use, I'd prefer if it
> were a CI that can be enjoyed in private without pushing code to a 3rd
> party.

FWIW, you can use cirrus locally on your machine: https://github.com/cirruslabs/cirrus-cli

It'll not be able to run all kinds of tasks though (e.g. no windows docker on a linux host; dealing with the license costs for that presumably would be nontrivial).

> Lastly, I consider CI configuration similar to IDE configuration: each
> developer has their own preferred tools which they use, but we don't
> favour one over the other. We don't include IDE-specific configuration
> files either, or at least, the policy is against that.
>
> So, I greatly appreciate the effort, but I don't think this is
> something that should be committed into core. Maybe as a dedicated
> wiki page detailing configurations for CI, similar to the Buildfarm
> page?

That doesn't scale - I've actually added CI to all my substantial development work in my private branches, and it's pretty annoying to need to do so every time. And there are many developers who won't go through the effort most of the time.

It's not like this forces you to use cirrus or anything. For people that don't want to use CI, it'll make cfbot a bit more effective (because people can adjust what it tests as appropriate for $patch), but that's it.

Greetings,

Andres Freund
On Sat, Oct 2, 2021 at 1:10 PM Andres Freund <andres@anarazel.de> wrote:
> On 2021-10-02 01:49:39 +0200, 0010203112132233 wrote:
> > Furthermore, after looking it through, I think that Cirrus is an
> > unfortunate choice as a CI platform of preference, as you cannot use
> > it without access to Github (which is problematic for people located
> > in certain localities due to USA regulations).
>
> I agree that it's not optimal that cirrus isn't available on all git hosting
> platforms. Hence my saying that I think it's likely we'd end up adding a few
> more platforms over time. If we factor the meat of the work into a helper
> script, so that the CI-specific bit is just a couple of invocations of that
> script, it's not a lot of overhead to have 2-3 CI platforms.

BTW I think they might be considering supporting other code hosting platforms (at least they ask for feedback on this at https://cirrus-ci.org/guide/quick-start/ ).

> > Lastly, I consider CI configuration similar to IDE configuration: each
> > developer has their own preferred tools which they use, but we don't
> > favour one over the other. We don't include IDE-specific configuration
> > files either, or at least, the policy is against that.

We have some files in the tree to help users of Emacs, vim, and even to make github format text the way we like. Personally, I think that if someone is willing to develop and maintain high quality CI control files that work for any public free-for-open-source CI system, then we should accept them too. It costs very little to have a few .something.yml files at top level. If at any point the file for a given provider is showing signs of being unmaintained, we can remove it.

Personally, I'm willing and able to help maintain Cirrus control files, not least because it means that cfbot will become simpler and will match exactly what you can get in your own github account. I really like Cirrus because our project has more portability concerns than most, and most other CIs are like "we got both kinds, country and western!". I wanted to add FreeBSD to cfbot, which is something they advertise as a feature, but it looks like at least 3 other OSes we target would probably work just as well given a suitable image.
Andres Freund <andres@anarazel.de> writes:
> It's not like this forces you to use cirrus or anything. For people that don't
> want to use CI, it'll make cfbot a bit more effective (because people can
> adjust what it tests as appropriate for $patch), but that's it.

Yeah. I cannot see any reason to object to Andres' 0002 patch: you can just ignore those files if you don't want to use cirrus. It does set a precedent that we'd also accept infrastructure for other CI systems, but as long as they're similarly noninvasive, why not? (Maybe there needs to be one more directory level though, ie ci/cirrus/whatever. I don't want to end up with one toplevel directory per CI platform.)

I don't know enough about Windows to evaluate 0001, but I'm a little worried about it because it looks like it's changing our *production* error handling on that platform.

As for 0003, wasn't that committed already?

			regards, tom lane
> On 2 Oct 2021, at 00:27, Andres Freund <andres@anarazel.de> wrote:
> For several development efforts I found it to be incredibly valuable to push
> changes to a personal repository and see a while later whether tests succeed
> on a number of different platforms. This is especially useful for platforms
> that are quite different from one's own, like e.g. windows in my case.

Same, and in my case I run several CI jobs to compile/test against different OpenSSL versions etc.

> Of course everybody can set this up for themselves. However, doing so well is
> a significant effort, particularly if windows is to be supported well. And
> doubly so if useful things like getting backtraces for crashes are desirable
> ([1])

+1 on adding these, rather than having everyone duplicate the effort. Those who don't want to use them can disregard them.

> Right now the patch attached
> - runs check-world on FreeBSD, Linux, macOS - all using gcc
> - freebsd, linux use a custom generated image
> - macOS installs missing dependencies at runtime, with some caching
> - all use ccache to make subsequent compilation faster
> - runs all the tests I could find on windows, via vcregress.pl
> - checks for compiler warnings on linux, with both clang and gcc

Why not compile with OpenSSL on FreeBSD and macOS? On FreeBSD all you need is --with-ssl=openssl, while on macOS you need to point to the headers and libs like:

  --with-includes=/usr/local/include:/usr/local/opt/openssl/include
  --with-libs=/usr/local/libs:/usr/local/opt/openssl/lib

One thing to note for Cirrus on macOS (I've never seen it anywhere else) is that it intermittently will fail on a too long socket path:

  Unix-domain socket path "/private/var/folders/wh/z5_y2cv53sg24tzvtw_f_y1m0000gn/T/cirrus-ci-build/src/bin/pg_upgrade/.s.PGSQL.51696" is too long (maximum 103 bytes)

Exporting PGSOCKETDIR can avoid that annoyance.
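The limit in that error comes from the fixed size of sun_path in struct sockaddr_un. A quick shell check (a sketch using the path from the error message above) shows why the default Cirrus working directory overflows it:

```shell
# Length check for the socket path from the error message; 103 bytes is
# the usable sun_path size that PostgreSQL assumes here.
sock="/private/var/folders/wh/z5_y2cv53sg24tzvtw_f_y1m0000gn/T/cirrus-ci-build/src/bin/pg_upgrade/.s.PGSQL.51696"
if [ "${#sock}" -gt 103 ]; then
  echo "too long: ${#sock} bytes"
fi
```

Any sufficiently deep build directory triggers this, which is why pointing the socket (or the working directory) somewhere short fixes it.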
> + tests_script:
> +   - su postgres -c 'ulimit -c unlimited ; ${TIMEOUT_CMD} make -s ${CHECK} ${CHECKFLAGS} -j8'

Don't you need PG_TEST_EXTRA=ssl here to ensure the src/test/ssl tests are run?

--
Daniel Gustafsson
https://vmware.com/
Hi,

On 2021-10-02 20:42:00 +0200, Daniel Gustafsson wrote:
> > On 2 Oct 2021, at 00:27, Andres Freund <andres@anarazel.de> wrote:
> > Right now the patch attached
> > - runs check-world on FreeBSD, Linux, macOS - all using gcc
> > - freebsd, linux use a custom generated image
> > - macOS installs missing dependencies at runtime, with some caching
> > - all use ccache to make subsequent compilation faster
> > - runs all the tests I could find on windows, via vcregress.pl
> > - checks for compiler warnings on linux, with both clang and gcc
>
> Why not compile with OpenSSL on FreeBSD and macOS? On FreeBSD all you need
> is --with-ssl=openssl, while on macOS you need to point to the headers and
> libs like:
>
> --with-includes=/usr/local/include:/usr/local/opt/openssl/include --with-libs=/usr/local/libs:/usr/local/opt/openssl/lib

Yea, there's several things like that that should be added. The CI files originated from development where breakage around SSL wasn't likely (AIO, shared memory stats, procarray scalability etc), so I didn't focus on that angle.

Needing to get all that stuff right on multiple platforms is one of the reasons why I think having this thing in-tree would be good. No need for everyone to discover the magic incantations themselves. Even if you e.g. might want to extend them to test multiple SSL versions or such, it's a lot easier to do that if the basics are there.

> One thing to note for Cirrus on macOS (I've never seen it anywhere else) is
> that it intermittently will fail on a too long socket path:

I've seen it somewhere else before. It wasn't even intermittent - it always failed. I worked around that by setting CIRRUS_WORKING_DIR: ${HOME}/pgsql/ - also made output including filenames easier to read ;)

> + tests_script:
> +   - su postgres -c 'ulimit -c unlimited ; ${TIMEOUT_CMD} make -s ${CHECK} ${CHECKFLAGS} -j8'
> Don't you need PG_TEST_EXTRA=ssl here to ensure the src/test/ssl tests are run?

Probably. I quickly added that stuff, we'll see how many mistakes I made: https://cirrus-ci.com/build/5846034501861376

Greetings,

Andres Freund
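Put together, the two workarounds discussed here could look roughly like this in a .cirrus.yml task. A hypothetical fragment only; the task name and exact values are assumptions, not the patch's actual configuration:

```yaml
task:
  name: macOS
  env:
    # A short working directory keeps Unix socket paths under the limit
    CIRRUS_WORKING_DIR: ${HOME}/pgsql/
    # Opt in to the test suites that are skipped by default
    PG_TEST_EXTRA: ssl
```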
Hi,

On 2021-10-02 12:41:07 -0700, Andres Freund wrote:
> On 2021-10-02 20:42:00 +0200, Daniel Gustafsson wrote:
> > + tests_script:
> > +   - su postgres -c 'ulimit -c unlimited ; ${TIMEOUT_CMD} make -s ${CHECK} ${CHECKFLAGS} -j8'
> > Don't you need PG_TEST_EXTRA=ssl here to ensure the src/test/ssl tests are run?
>
> Probably. I quickly added that stuff, we'll see how many mistakes I made:
> https://cirrus-ci.com/build/5846034501861376

I wonder if we shouldn't stop skipping the ssl / kerberos / ldap (and perhaps others) tests in the makefile, and instead do so in the tap tests themselves. Then one can see them included as skipped in the tap result output, which seems like it'd make it easier to discover them?

Greetings,

Andres Freund
> On 2 Oct 2021, at 21:45, Andres Freund <andres@anarazel.de> wrote:
> I wonder if we shouldn't stop skipping the ssl / kerberos / ldap (and perhaps
> others) tests in the makefile, and instead do so in the tap tests
> themselves. Then one can see them included as skipped in the tap result
> output, which seems like it'd make it easier to discover them?

I am definitely in favor of doing that, better to see them skipped rather than having to remember to opt in. We even do so to some extent already, like for example the SSL tests:

  if ($ENV{with_ssl} ne 'openssl')
  {
      plan skip_all => 'OpenSSL not supported by this build';
  }

--
Daniel Gustafsson
https://vmware.com/
> On 2 Oct 2021, at 21:41, Andres Freund <andres@anarazel.de> wrote:
>> One thing to note for Cirrus on macOS (I've never seen it anywhere else) is
>> that it intermittently will fail on a too long socket path:
>
> I've seen it somewhere else before. It wasn't even intermittent - it always
> failed. I worked around that by setting CIRRUS_WORKING_DIR: ${HOME}/pgsql/ -
> also made output including filenames easier to read ;)

Aha, nice trick! Didn't know about that one, but that's easier than setting specific dirs via PG* environment vars.

--
Daniel Gustafsson
https://vmware.com/
Hi,

On 2021-10-02 11:05:20 -0400, Tom Lane wrote:
> Yeah. I cannot see any reason to object to Andres' 0002 patch: you can
> just ignore those files if you don't want to use cirrus. It does set a
> precedent that we'd also accept infrastructure for other CI systems,
> but as long as they're similarly noninvasive, why not?

Exactly.

> (Maybe there needs to be one more directory level though, ie
> ci/cirrus/whatever. I don't want to end up with one toplevel directory per
> CI platform.)

Good question - it definitely shouldn't be one toplevel directory per CI platform (although some will require their own hidden toplevel directories, like .github/workflows etc). I'd hope to share a bunch of the infrastructure between them over time, so perhaps we don't need a deeper hierarchy.

> I don't know enough about Windows to evaluate 0001, but I'm a little
> worried about it because it looks like it's changing our *production*
> error handling on that platform.

Yea. It's clearly not ready as-is - it's the piece that I was planning to write a separate email about.

It's hard to understand what *precisely* SEM_NOGPFAULTERRORBOX etc do. What I do know is that without the _set_abort_behavior() stuff abort() doesn't trigger windows' "crash" paths in at least debugging builds, and that the SetErrorMode() and _CrtSetReportMode() changes are necessary to get segfaults to reach the crash paths.

The in-tree behaviour turns out to make debugging on windows a major pain, at least when compiling with msvc. Crashes never trigger core dumps or "just in time" debugging (their term for invoking a debugger upon crash), so one has to attach to processes before they crash to have any chance of debugging.

As far as I can tell this also means that, at least for debugging builds, pgwin32_install_crashdump_handler() is pretty much dead weight - crashDumpHandler() never gets invoked. I think it may get invoked for abort()s in production builds, but probably not for segfaults. And despite SEM_NOGPFAULTERRORBOX we display those annoying "popup" boxes telling us about the crash and giving the option to retry, ignore, something something. It's all a bit baffling.

> As for 0003, wasn't that committed already?

Not at the time I was writing the email, but now it is, yes.

Greetings,

Andres Freund
Hi,

On 2021-10-02 20:42:00 +0200, Daniel Gustafsson wrote:
> Same, and in my case I run several CI jobs to compile/test against different
> OpenSSL versions etc.

On that note: Did you do this for windows? If so, I'd rather not figure that out myself...

Greetings,

Andres Freund
> On 2 Oct 2021, at 22:01, Andres Freund <andres@anarazel.de> wrote:
> On 2021-10-02 20:42:00 +0200, Daniel Gustafsson wrote:
>> Same, and in my case I run several CI jobs to compile/test against different
>> OpenSSL versions etc.
>
> On that note: Did you do this for windows? If so, I'd rather not figure that
> out myself...

Not with Cirrus, I've been using Appveyor for Windows, and they provide 1.0.2 - 3.0.0, which can easily be set in config.pl with for example:

  $config->{openssl} = 'C:\OpenSSL-v111-Win64';

--
Daniel Gustafsson
https://vmware.com/
On 10/2/21 11:05 AM, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
>> It's not like this forces you to use cirrus or anything. For people that don't
>> want to use CI, it'll make cfbot a bit more effective (because people can
>> adjust what it tests as appropriate for $patch), but that's it.
> Yeah. I cannot see any reason to object to Andres' 0002 patch: you can
> just ignore those files if you don't want to use cirrus.

Yeah. I enable cirrus selectively on my github repos, which makes it close to impossible to get an unwanted effect.

One of the things I like about this is that it institutionalizes some knowledge that has hitherto been mostly private. I have a lot of this in a setup I use for spinning up test instances, but this makes a lot of that sort of knowledge more broadly available. I hope it will also encourage people to test more widely, given how easy it will make it.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Andrew Dunstan <andrew@dunslane.net> writes:
> I hope it will also encourage people to test more widely, given how easy
> it will make it.

If you'd like that, there would need to be some (ahem) documentation of how to use it.

			regards, tom lane
Hi,

On 2021-10-02 16:44:44 -0400, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
> > I hope it will also encourage people to test more widely, given how easy
> > it will make it.
>
> If you'd like that, there would need to be some (ahem) documentation
> of how to use it.

Yea, definitely necessary. Where would we want it to be? ci/README.md? That'd be viewable on the various git hosting platforms. I guess there's an argument for it to be in the sgml docs, but that doesn't seem all that useful in this case.

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> On 2021-10-02 16:44:44 -0400, Tom Lane wrote:
>> If you'd like that, there would need to be some (ahem) documentation
>> of how to use it.
> Yea, definitely necessary. Where would we want it to be? ci/README.md? That'd
> be viewable on the various git hosting platforms. I guess there's an argument
> for it to be in the sgml docs, but that doesn't seem all that useful in this
> case.

A README seems plenty good enough to me. Maybe -0.1 for making it .md rather than plain text ... plain text is our habit everywhere else AFAIR.

			regards, tom lane
Hi,

On October 2, 2021 1:18:38 PM PDT, Daniel Gustafsson <daniel@yesql.se> wrote:
>> On 2 Oct 2021, at 22:01, Andres Freund <andres@anarazel.de> wrote:
>> On 2021-10-02 20:42:00 +0200, Daniel Gustafsson wrote:
>>> Same, and in my case I run several CI jobs to compile/test against different
>>> OpenSSL versions etc.
>>
>> On that note: Did you do this for windows? If so, I'd rather not figure that
>> out myself...
>
> Not with Cirrus, I've been using Appveyor for Windows and they provide 1.0.2 -
> 3.0.0 which can easily be set in config.pl with for example:
>
>   $config->{openssl} = 'C:\OpenSSL-v111-Win64';

Got the build part working (although the state of msvc compatible openssl distribution on windows seems a bit scary). However the ssl tests don't fully succeed: https://cirrus-ci.com/task/6264790323560448?logs=ssl#L655

I didn't see code in the bf client code running the test, so perhaps that's not too surprising :/

Did you run those tests on windows?

Regards,

Andres

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Hi,

On 2021-10-02 21:05:17 -0700, Andres Freund wrote:
> Got the build part working (although the state of msvc compatible openssl
> distribution on windows seems a bit scary). However the ssl tests don't
> fully succeed:
>
> https://cirrus-ci.com/task/6264790323560448?logs=ssl#L655
>
> I didn't see code in the bf client code running the test, so perhaps that's
> not too surprising :/
>
> Did you run those tests on windows?

As you can see in the test output, every mismatch prints the whole file, despite only intending to show the tail. Which appears to be because the windows portion of 3c5b0685b921 doesn't actually work. The reason for that in turn is that afaict the SetFilePointer doesn't change the file position in a way that affects perl.

Consequently, if I force the !win32 path, the tests pass.

At first I assumed the cause of this is that while the SetFilePointer() modifies the state of the underlying handle, it doesn't actually let perl know about that. Due to buffering etc perl likely has its own bookkeeping about the position in the file. There's some pretty clear hints in https://perldoc.perl.org/functions/seek

But the problem turns out to be that it's bogus to pass $fh to SetFilePointer(). That's a perl handle, not a win32 handle. Fixing that seems to make the tests pass.

Why did 3c5b0685b921 choose to use SetFilePointer() in the first place? At this point it's a perl filehandle, so we should just use perl seek?

Leaving the concrete breakage aside, I'm somewhat unhappy that there's not a single comment explaining why TestLib.pm is trying to use native windows APIs.

Isn't the code as-is also "leaking" an open IO::Handle? There's a CloseHandle($fHandle), but nothing is done to $fh. But perhaps there's some perl magic cleaning things up? Even if so, looks like just closing $fh will close the handle as well...

Greetings,

Andres Freund
Hi,

On 2021-10-03 10:18:31 -0700, Andres Freund wrote:
> As you can see in the test output, every mismatch prints the whole file,
> despite only intending to show the tail. Which appears to be because the
> windows portion of 3c5b0685b921 doesn't actually work. The reason for that in
> turn is that afaict the SetFilePointer doesn't change the file position in a
> way that affects perl.
>
> Consequently, if I force the !win32 path, the tests pass.
>
> At first I assumed the cause of this is that while the SetFilePointer()
> modifies the state of the underlying handle, it doesn't actually let perl know
> about that. Due to buffering etc perl likely has its own bookkeeping about the
> position in the file. There's some pretty clear hints in
> https://perldoc.perl.org/functions/seek
>
> But the problem turns out to be that it's bogus to pass $fh to
> SetFilePointer(). That's a perl handle, not a win32 handle. Fixing that seems
> to make the tests pass.

It does (I only let it run to the ssl test, then pushed a newer revision): https://cirrus-ci.com/task/5345293928497152?logs=ssl#L5

> Why did 3c5b0685b921 choose to use SetFilePointer() in the first place? At
> this point it's a perl filehandle, so we should just use perl seek?
>
> Leaving the concrete breakage aside, I'm somewhat unhappy that there's not a
> single comment explaining why TestLib.pm is trying to use native windows
> APIs.
>
> Isn't the code as-is also "leaking" an open IO::Handle? There's a
> CloseHandle($fHandle), but nothing is done to $fh. But perhaps there's some
> perl magic cleaning things up? Even if so, looks like just closing $fh will
> close the handle as well...

I think something roughly like the attached might be a good idea. Runs locally on linux, and hopefully still on windows: https://cirrus-ci.com/build/4857291573821440

Greetings,

Andres Freund
Attachments
> On 3 Oct 2021, at 06:05, Andres Freund <andres@anarazel.de> wrote:
> Did you run those tests on windows?

Sorry, failed to mention I only compile it for now, I hadn't reached trying to run the tests yet. I see you started on that in this thread, so thank you for that!

--
Daniel Gustafsson
https://vmware.com/
On 10/3/21 1:30 PM, Andres Freund wrote:
>> Why did 3c5b0685b921 choose to use SetFilePointer() in the first place? At
>> this point it's a perl filehandle, so we should just use perl seek?
>>
>> Leaving the concrete breakage aside, I'm somewhat unhappy that there's not a
>> single comment explaining why TestLib.pm is trying to use native windows
>> APIs.
>>
>> Isn't the code as-is also "leaking" an open IO::Handle? There's a
>> CloseHandle($fHandle), but nothing is done to $fh. But perhaps there's some
>> perl magic cleaning things up? Even if so, looks like just closing $fh will
>> close the handle as well...
>
> I think something roughly like the attached might be a good idea. Runs locally
> on linux, and hopefully still on windows
>
> https://cirrus-ci.com/build/4857291573821440

Looks sane, thanks.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
On 2021-10-04 11:07:07 -0400, Andrew Dunstan wrote:
> Looks sane, thanks.

Thanks for looking. Pushed to all branches.
On Sat, Oct 2, 2021 at 11:27 AM Andres Freund <andres@anarazel.de> wrote:
> - runs check-world on FreeBSD, Linux, macOS - all using gcc

Small correction: on macOS and FreeBSD it's using the vendor compiler, which is some kind of clang.

BTW, on those two OSes there are some messages like this each time a submake dumps its output to the log:

  [03:36:16.591] fcntl(): Bad file descriptor

It seems worth putting up with these compared to the alternatives: not using -j; not using -Otarget and having the output of parallel tests all mashed up and unreadable (that still happens sometimes, but it's unlikely, because the submakes write() whole output chunks at infrequent intervals); or redirecting to a file so you can't see the realtime test output on the main CI page (not so fun, you have to wait until it's finished and view it as an 'artifact'). I tried to write a patch for GNU make to fix that[1], let's see if something happens.

[1] https://savannah.gnu.org/bugs/?52922
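The -Otarget behavior discussed here can be seen with a toy makefile. An illustrative sketch, unrelated to the PostgreSQL tree, assuming GNU make >= 4.0 (for -O and .RECIPEPREFIX): with --output-sync=target, make buffers each target's output and emits it as one contiguous block, so parallel runs stay readable.

```shell
# Build a tiny makefile with two independent targets that each print
# several lines. (.RECIPEPREFIX avoids fragile literal tabs.)
mk=$(mktemp)
cat > "$mk" <<'EOF'
.RECIPEPREFIX = >
all: a b
a:
>@for i in 1 2 3; do echo "a$$i"; done
b:
>@for i in 1 2 3; do echo "b$$i"; done
EOF
# With -j2 -Otarget, each target's lines come out contiguously rather
# than interleaved with the other target's output.
make -f "$mk" -j2 -Otarget
```

Without -O, the a/b lines may interleave arbitrarily under -j; with -Otarget they are grouped per target, which is the readability/immediacy trade-off described above.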
Hi,

On 2021-10-06 17:01:53 +1300, Thomas Munro wrote:
> On Sat, Oct 2, 2021 at 11:27 AM Andres Freund <andres@anarazel.de> wrote:
> > - runs check-world on FreeBSD, Linux, macOS - all using gcc
>
> Small correction: on macOS and FreeBSD it's using the vendor compiler,
> which is some kind of clang.

Oh, oops. I guess that's even better ;).

> I tried to write a patch for GNU make to fix that[1], let's see if something
> happens.
>
> [1] https://savannah.gnu.org/bugs/?52922

It'd be nice to get rid of these... They're definitely confusing.

Greetings,

Andres Freund
On 02.10.21 00:27, Andres Freund wrote:
> The attached patch adds CI using cirrus-ci.

I like this in principle. But I don't understand what the docker stuff is about. I have used Cirrus CI before, and didn't have to do anything about Docker. This could use some explanation.
Hi,

On 2021-10-10 21:48:09 +0200, Peter Eisentraut wrote:
> On 02.10.21 00:27, Andres Freund wrote:
> > The attached patch adds CI using cirrus-ci.
>
> I like this in principle. But I don't understand what the docker stuff is
> about. I have used Cirrus CI before, and didn't have to do anything about
> Docker. This could use some explanation.

You don't *have* to do anything about docker - but especially for windows it takes longer to build without your own container, because we'd need to install our dependencies every time. And that turns out to take a while.

Right now the docker containers are built as part of CI (cirrus rebuilds them when the container definition changes), but it doesn't have to be that way; we could do so independently of cirrus, so that they are usable on other platforms as well - although it's advantageous to use the cirrus containers as the base, as they're cached on the build hosts.

In principle we could also use docker for the linux tests, but I found that we can get better results using full blown virtual machines. Those I currently build from a separate repo, as mentioned upthread. There is a linux docker container, but that currently runs a separate task that compiles with -Werror for gcc, and clang with / without asserts. That's a separate task so that compile warnings don't prevent one from seeing whether tests worked etc.

One thing I was thinking of adding to the "compile warning" task was to cross-compile postgres from linux using mingw - that's a lot faster than running the windows builds, and it's not too hard to break that accidentally.

Greetings,

Andres Freund
On Sat, 2 Oct 2021 at 17:05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Andres Freund <andres@anarazel.de> writes:
> > It's not like this forces you to use cirrus or anything. For people
> > that don't want to use CI, It'll make cfbot a bit more effective
> > (because people can adjust what it tests as appropriate for $patch),
> > but that's it.

I don't disagree on that part, but I fail to see what makes the
situations of an unused CI config file in the tree and an unused
`/.idea/` or `/.vs/` specifier in the .gitignore [0][1] distinct enough
for it to be resolved differently. Both are quality-of-life additions
for those that use that tool, while non-users of that tool can ignore
those configuration entries.

> Yeah. I cannot see any reason to object to Andres' 0002 patch: you can
> just ignore those files if you don't want to use cirrus. It does set a
> precedent that we'd also accept infrastructure for other CI systems,
> but as long as they're similarly noninvasive, why not? (Maybe there
> needs to be one more directory level though, ie ci/cirrus/whatever.
> I don't want to end up with one toplevel directory per CI platform.)

With the provided arguments I won't object to the addition of these
config files, but I would appreciate it if a clear policy could be
provided on the inclusion of configurations for external tools that are
not expected to be used by all users of the repository, such as CI,
editors and IDEs.

Kind regards,

Matthias van de Meent

[0] https://www.postgresql.org/message-id/flat/OS3PR01MB71593D78DD857C2BBA9FB824F2A69%40OS3PR01MB7159.jpnprd01.prod.outlook.com
[1] https://www.postgresql.org/message-id/flat/15BFD11D-5D72-46B2-BDB1-2DF4E049C13D%40me.com
Matthias van de Meent <boekewurm+postgres@gmail.com> writes:
> I don't disagree on that part, but I fail to see what makes the
> situations of an unused CI config file in the tree and an unused
> `/.idea/` or `/.vs/` specifier in the .gitignore [0][1] distinct
> enough for it to be resolved differently. Both are quality-of-life
> additions for those that use that tool, while non-users of that tool
> can ignore those configuration entries.

Um ... I don't see a connection at all. One is talking about files we
put into the git tree, and one is talking about files that are *not* in
the tree.

We do have a policy that files that are created by a supported build
process should be .gitignore'd, so that might lead to more .gitignore
entries as this idea moves ahead. I'm not on board though with the idea
of .gitignore'ing anything that anybody anywhere thinks is junk. That's
more likely to lead to conflicts and mistakes than anything useful. We
expect developers to have personal excludesfile lists that block off
editor backup files and other cruft from the tools that they personally
use but are not required by the build.

regards, tom lane
On 10/21/21 5:55 PM, Matthias van de Meent wrote:
> On Sat, 2 Oct 2021 at 17:05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>> Andres Freund <andres@anarazel.de> writes:
>>> It's not like this forces you to use cirrus or anything. For people
>>> that don't want to use CI, It'll make cfbot a bit more effective
>>> (because people can adjust what it tests as appropriate for $patch),
>>> but that's it.
>
> I don't disagree on that part, but I fail to see what makes the
> situations of an unused CI config file in the tree and an unused
> `/.idea/` or `/.vs/` specifier in the .gitignore [0][1] distinct
> enough for it to be resolved differently. Both are quality-of-life
> additions for those that use that tool, while non-users of that tool
> can ignore those configuration entries.

There is a better solution to that. Just add those files to the global
gitignore on your machine. You will want to ignore those files in all
git repositories on your machine anyway. On the other hand the
configuration files for the CI are relevant to just the PostgreSQL repo.

Andreas
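The machine-wide ignore Andreas describes can be set up like this (a
sketch; the file name `~/.gitignore_global` is an arbitrary choice, any
path works):

```shell
# Point git at a machine-wide excludes file and add the IDE droppings
# that should never be committed from this machine.
git config --global core.excludesFile ~/.gitignore_global
printf '%s\n' '.idea/' '.vs/' '*~' >> ~/.gitignore_global
```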
> Just add those files to the global gitignore on your machine
While global gitignore is a nice feature, it won't protect users who do not know they need to create a global ignore file.
Adding explicit excludes for well-known temporary files into PostgreSQL sources makes it easier to work with the sources for everybody.
Less manual configuration is better for having a productive environment.
On top of that, there is even a use case for having the .idea folder in Git:
.idea/icon.png and .idea/icon_dark.png files are displayed in JetBrains IDEs so the list of projects becomes easier to distinguish.
AFAIK, a standard icon configuration does not yet exist: https://github.com/editorconfig/editorconfig/issues/425
Here are some samples: https://github.com/junit-team/junit5/blob/4ddc786728bc3fbc68d6a35d2eeeb63eb3e85609/.idea/icon.png, https://github.com/gradle/gradle/tree/1be71a9cd8882b08a9f8728d44eac8f65a33fbda/.idea
Vladimir
Hi,

Attached is an updated version of the CI patches. Changes:

- more optional features are enabled on various platforms, including
  building with openssl on windows

- added a somewhat minimal README explaining how CI can be enabled in a
  repository

- some cleanup to the windows crash reporting support. I also moved the
  main.c code changes to after the CI stuff themselves, as it might be a
  bit harder to get into a committable shape (at least for me)

Greetings,

Andres Freund
Attachments
Hi,

I reviewed the first patch in this set
(v2-0001-ci-Add-CI-for-FreeBSD-Linux-MacOS-and-Windows-uti.patch).

For the README, I found the instructions very clear. My only concern is
that the cirrus-ci UI will change and the instructions on how to enable
cirrus-ci on a repository will not be accessible in the same way in the
future. That being said, I found your instructions easier to follow
than those on [1]. Perhaps it is better to wait until it becomes a
problem and then, at that point, change the README to guide people to
the quickstart link.

I have attached a patch which does a small refactor using a yaml anchor
and aliases (tried it and it seems to work for me).

A few questions and thoughts:

- do you not need to change the default core resource limits for
  FreeBSD?

- Would you find it valuable to set a few more coredump_filter bits?
  Might be worth setting bits 2 and 3 (see [2]) -- in addition to the
  defaults (on Linux -- I don't know what the equivalent is on other
  platforms).

- I found this line a bit confusing, so maybe it is worth a comment
    sysinfo_script:
    - export || true

- For the docker files, I think it is recommended to run "docker build"
  only from within the specific build context (not in the top-level
  directory), so I don't think you should need the dockerignore file.
  Also, instead of putting the two docker files in the same directory,
  you could put them in dedicated directories (making those directories
  their build context). That way if you change one you don't end up
  rebuilding the other.

- In ci/docker/linux_debian_bullseye, you can make this change:
    - apt-get clean
    + apt-get clean && \
    + rm -f /var/lib/apt/lists/*
  to make that layer smaller.

- Melanie

[1] https://cirrus-ci.org/guide/quick-start/
[2] https://man7.org/linux/man-pages/man5/core.5.html
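As a sketch of the coredump_filter suggestion (bit meanings per core(5);
treat the default value of 0x33 as an assumption, since it can vary by
kernel configuration):

```shell
# The typical Linux default coredump_filter is 0x33 (anonymous private
# and shared mappings, ELF headers, hugetlb private). Bits 2 and 3 add
# file-backed private and shared mappings to the dump.
mask=$(printf '0x%x' $((0x33 | 1 << 2 | 1 << 3)))
echo "$mask"    # 0x3f
# Applying it to the current process (inherited by children such as a
# postgres started from this shell):
#   echo "$mask" > /proc/self/coredump_filter
```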
Attachments
On Sat, Nov 20, 2021 at 11:17 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> - do you not need to change the default core resource limits for
>   FreeBSD?

Unfortunately the performance really sucks on that FreeBSD CI system if
you crank it up, and I haven't had time to figure out why yet :-/
Possibly something to do with large numbers of files being created and
unlinked concurrently under -jX on slow storage is going awry...
Hi,

On 2021-11-19 17:17:44 -0500, Melanie Plageman wrote:
> For the README, I found the instructions very clear. My only concern is
> that the cirrus-ci UI will change and the instructions on how to enable
> cirrus-ci on a repository will not be accessible in the same way in the
> future.

I think we can just adjust things at that point, I'm not too worried
about past instructions not working.

> I have attached a patch which does a small refactor using a yaml anchor
> and aliases (tried it and it seems to work for me).

Oh, neat. Yaml is so weird.

> - Would you find it valuable to set a few more coredump_filter bits?
>   Might be worth setting bits 2 and 3 (see [2]) -- in addition to the
>   defaults (on Linux -- I don't know what the equivalent is on other
>   platforms).

I don't think we need 2/3 - we don't have file backed mappings. In some
situations setting bit 6 (shared huge pages) would make sense - but here
we don't configure them...

> - I found this line a bit confusing, so maybe it is worth a comment
>     sysinfo_script:
>     - export || true

We can probably just get rid of the ||. It was just so that a missing
'export' builtin didn't cause execution to abort. But that won't
randomly vanish, so it's fine.

> - For the docker files, I think it is recommended to run "docker build"
>   only from within the specific build context (not in the top-level
>   directory), so I don't think you should need the dockerignore file.

We don't have control over that in this case - it's cirrus invoking
docker, and it just uses the whole repo as the context.

> - In ci/docker/linux_debian_bullseye, you can make this change:
>     - apt-get clean
>     + apt-get clean && \
>     + rm -f /var/lib/apt/lists/*

Might not matter too much compared to the size of the whole thing, but
it definitely won't hurt...

Greetings,

Andres Freund
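For readers unfamiliar with the feature: a YAML anchor (&name) labels a
node and an alias (*name) reuses it, so shared task settings only need
to be written once. A hypothetical sketch of the idea (plain YAML, not
the actual refactor from Melanie's attached patch, and not necessarily
valid .cirrus.yml as-is):

```yaml
# Define the common environment once...
common_env: &common_env
  CCACHE_DIR: /tmp/ccache
  MAKE_JOBS: 4

freebsd_task:
  env: *common_env          # ...and reuse it verbatim here

linux_task:
  env:
    <<: *common_env         # merge key: reuse, then override selectively
    MAKE_JOBS: 8
```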
Hi,

Attached is an updated version of the CI patches. An example of a test
run with the attached version of this:
https://cirrus-ci.com/build/6501998521483264

I again included the commit allowing crash dumps to be collected on
windows, but I don't think it can be merged as-is, and should be left
for later.

Changes since v2:

- Address review comments

- Build with further optional features enabled. I think just about
  everything reasonable is now enabled on freebsd, linux and macos.
  There's quite a bit more that could be done on windows, but I think
  it's good enough for now.

- I added cross-compilation to windows from linux, to the "warnings"
  task. Occasionally there are build-system issues specific to
  cross-compilation, and the set of warnings is different.

- Docs are now built as part of the 'CompilerWarnings' task.

- I improved the CI README a bit more, in particular I added docs for
  the 'ci-os-only' tag I added to the CI logic, which lets one select
  which operating systems tests get to run on.

- Some of the 'Warnings' tasks now build with --enable-dtrace (once with
  optimizations, once without). It's pretty easy to break probes without
  seeing the problem locally.

- Switched to using PG_TEST_USE_UNIX_SOCKETS for windows. Without that I
  was seeing occasional spurious test failures due to the use of
  PROVE_FLAGS=-j10, to make the otherwise serial execution of tests on
  windows bearable.

- switch macos task to use monterey

- plenty smaller changes / cleanups

There of course is a lot more that can be done [1], but I am pretty
happy with what this covers. I'd like to commit this soon. There are two
aspects that perhaps deserve a bit more discussion before doing so
though. One I explicitly brought up before:

On 2021-10-01 15:27:52 -0700, Andres Freund wrote:
> One policy discussion that we'd have to have is who should control the
> images used for CI. Right now that's on my personal google cloud
> account - which I am happy to do, but medium - long term that'd not be
> optimal.
The proposed CI script uses custom images to run linux and freebsd
tests. They are automatically built every day from the repository
https://github.com/anarazel/pg-vm-images/

These images have all the prerequisites pre-installed. For Linux
something similar can be achieved by using dockerfiles referenced in the
.cirrus.yml file, but for FreeBSD that's not available. Installing the
necessary dependencies on every run is too time intensive. For linux,
the tests run a lot faster in full-blown VMs than in docker, and full
VMs allow a lot more control / debugging.

I think this may be OK for now, but I also could see arguments for
wanting to transfer the image specifications and the google account to
PG properties.

The second attention-worthy point is the choice of a new toplevel ci/
directory as the location for resources referenced by CI. A few other
projects also use ci/, but I can also see arguments for moving the
contents to e.g. src/tools/ci or such?

Greetings,

Andres Freund

[1] Some ideas for what could make sense to extend CI to in the future:

- also test with msys / mingw on windows

- provide more optional dependencies for windows build

- Extend the set of compiler warnings - as the compiler version is
  controlled, we could be more aggressive than we can be via
  configure.ac.

- Add further distributions / platforms. Possibly as "manual" tasks -
  the amount of resources one CI user gets is limited, so running tests
  on all platforms all the time would make tests take longer.
  Interesting things could be:
  - further linux distributions, particularly long-term supported ones
  - Some of the other BSDs. There currently are no pre-made images for
    openbsd/netbsd, but it shouldn't be too hard to script building them.
  - running some tests on ARM could be interesting, cirrus supports that
    for container based builds now

- run checks like cpluspluscheck as part of the CompilerWarnings task

- add tasks for running tests with tools like asan / ubsan (valgrind
  will be too slow).
- consider enabling compile-time debugging options like
  COPY_PARSE_PLAN_TREES, and run-time force_parallel_mode = regress on
  some platforms. They seem to catch a lot of problems during
  development and are likely affordable enough.
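For reference, the 'ci-os-only' selection mentioned above works by
matching the commit message in each task's only_if expression; the
FreeBSD task's expression in the patch has this shape:

```yaml
task:
  name: FreeBSD
  # Run if there is no ci-os-only: line in the commit message, or if
  # that line mentions freebsd.
  only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*freebsd.*'
```

A commit message containing a trailer like "ci-os-only: freebsd linux"
would then run only the FreeBSD and Linux tasks.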
Attachments
On Mon, Dec 13, 2021 at 01:12:23PM -0800, Andres Freund wrote:
> Hi,
>
> Attached is an updated version of the CI patches. An example of a test
> run with the attached version of this
> https://cirrus-ci.com/build/6501998521483264

sudo is used exactly twice; maybe it's not needed at all ?

> +task:
> +  name: FreeBSD
...
> +  sysconfig_script:
> +    - sudo sysctl kern.corefile='/tmp/%N.%P.core'

> +task:
> +  name: macOS
...
> +  core_install_script:
> +    - sudo chmod 777 /cores

typos:
binararies
dont't
Hi,

On 2021-12-13 16:02:50 -0600, Justin Pryzby wrote:
> On Mon, Dec 13, 2021 at 01:12:23PM -0800, Andres Freund wrote:
> > Hi,
> >
> > Attached is an updated version of the CI patches. An example of a
> > test run with the attached version of this
> > https://cirrus-ci.com/build/6501998521483264
>
> sudo is used exactly twice; maybe it's not needed at all ?

The macos one is needed, but the freebsd one indeed isn't.

> > +task:
> > +  name: FreeBSD
> ...
> > +  sysconfig_script:
> > +    - sudo sysctl kern.corefile='/tmp/%N.%P.core'
>
> > +task:
> > +  name: macOS
> ...
> > +  core_install_script:
> > +    - sudo chmod 777 /cores
>
> typos:
> binararies
> dont't

Oops, thanks. Typos and sudo use are fixed in the repo [1].

Greetings,

Andres Freund

[1] https://github.com/anarazel/postgres/tree/ci
Andres Freund <andres@anarazel.de> writes:
> On 2021-12-13 16:02:50 -0600, Justin Pryzby wrote:
>> sudo is used exactly twice; maybe it's not needed at all ?

> The macos one is needed, but the freebsd one indeed isn't.

I'm with Justin on this one. I would view a script trying to mess with
/cores as a hostile act. PG cores on macOS tend to be extremely large
and can fill up your disk fairly quickly if you don't know they're being
accumulated. I think it's okay to suggest in the documentation that
people might want to allow cores to be dropped, but the script has NO
business trying to force that.

regards, tom lane
Hi,

On 2021-12-13 18:14:52 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2021-12-13 16:02:50 -0600, Justin Pryzby wrote:
> >> sudo is used exactly twice; maybe it's not needed at all ?
>
> > The macos one is needed, but the freebsd one indeed isn't.
>
> I'm with Justin on this one. I would view a script trying to mess with
> /cores as a hostile act. PG cores on macOS tend to be extremely large
> and can fill up your disk fairly quickly if you don't know they're
> being accumulated. I think it's okay to suggest in the documentation
> that people might want to allow cores to be dropped, but the script
> has NO business trying to force that.

I'm not quite following. This is an ephemeral CI instance?

Greetings,

Andres Freund
On Tue, Dec 14, 2021 at 10:12 AM Andres Freund <andres@anarazel.de> wrote:
> Attached is an updated version of the CI patches. An example of a test
> run with the attached version of this
> https://cirrus-ci.com/build/6501998521483264

I've been pushing various versions of these patches into my own
development branches for a while now; they're working very nicely and
helping me a lot. This is vastly better than anything I was doing
before, especially on Windows which is a blind spot for most of us.
It'll be great to see this committed, and continue improving it
in-tree. I'd better go and figure out how to fix cfbot when this
lands...

> I think this may be OK for now, but I also could see arguments for
> wanting to transfer the image specifications and the google account to
> PG properties.

No clue on the GCP account side of it (does pginfra already have one?),
but for the repo I guess it would seem natural to have one on
git.postgresql.org infra, mirrored (just like the main repo) to a repo
on project-owned github.com/postgres, from which image building is
triggered. Then it could be maintained by the whole PostgreSQL project,
patches discussed on -hackers, a bit like pg_bsd_indent. Perhaps with
some way to trigger test image builds, so that people working on it
don't need their own GCP account to do a trial run.

+    # Test that code can be built with gcc/clang without warnings

As a TODO note, I think we should eventually run a warnings check for
MSVC too. IIUC we only aim to be warning free in assertion builds on
that platform, because it has no PG_USED_FOR_ASSERTS_ONLY (I think it
has it in C++ but not C?), but that's something.

+  # XXX: Only do this if there have been changes in doc/ since last build
+  always:
+    docs_build_script: |
+      time ./configure \
+        --cache gcc.cache CC="ccache gcc"
+      time make -s -j4 -C doc

Another TODO note: perhaps we could also make the documentation results
a browsable artefact with a short retention time, if that's a thing.
(I've been confused about how to spell "artefact" for some time now, and
finally I know why: in the US it has an i; I blame Noah Websta, whose
name I have decided to improve.)

I feel like I should apologise in advance for this level of nit-picking
about English grammar, but:

+2) For not yet merged development work, CI can be enabled for some git
+   hosting providers. This allows to test patches on a number of
+   platforms before they are merged (or even submitted).

You can "allow testing" (gerund verb), you can "allow
developers/us/one/... to test" (infinitive, but with a noun phrase to
say who's allowed to do the thing), you can "allow verification of ..."
(noun phrase), you can "be allowed to test" (passive), but you can't
"allow to test": it's not allowed![1] :-)

+# It might be nicer to switch to the openssl built as part of
+# curl-for-win, but recent releases only build openssl 3, and that
+# still seems troublesome on windows,

s/windows,/Windows./

+    name: FreeBSD

FreeBSD is missing --with-llvm. If you add package "llvm" to your image
builder you'll currently get LLVM 9, then LLVM_CONFIG="llvm-config"
CXX="ccache c++" CLANG="ccache clang". Or we could opt for something
more modern with package llvm13 and program names llvm-config13 and
clang13.

> The second attention-worthy point is the choice of a new toplevel ci/
> directory as the location for resources referenced by CI. A few other
> projects also use ci/, but I can also see arguments for moving the
> contents to e.g. src/tools/ci or such?

I'd be +0.75 for moving it to src/tools/ci.

> [1] Some ideas for what could make sense to extend CI to in the future:

To your list, I'd add:

* 32 bit
* test coverage report
* ability to capture the complete Windows install directory as an
  artefact so a Windows user without a dev environment could try out a
  proposed change/CF entry/...

I hope we can get ccache working on Windows.
[1] https://english.stackexchange.com/questions/60271/grammatical-complements-for-allow/60285#60285
Hi,

On 2021-12-14 16:51:58 +1300, Thomas Munro wrote:
> I'd better go and figure out how to fix cfbot when this lands...

I assume it'd be:
- stop adding the CI stuff
- adjust links to CI tasks, appveyor wouldn't be used anymore
- perhaps reference individual tasks from the cfbot page?

> > I think this may be OK for now, but I also could see arguments for
> > wanting to transfer the image specifications and the google account
> > to PG properties.
>
> No clue on the GCP account side of it (does pginfra already have
> one?), but for the repo I guess it would seem natural to have one on
> git.postgresql.org infra, mirrored (just like the main repo) to a repo
> on project-owned github.com/postgres, from which image building is
> triggered. Then it could be maintained by the whole PostgreSQL
> project, patches discussed on -hackers, a bit like pg_bsd_indent.
> Perhaps with some way to trigger test image builds, so that people
> working on it don't need their own GCP account to do a trial run.

I think that's a good medium-term goal, I'd not make it a prerequisite
for merging myself.

> +    # Test that code can be built with gcc/clang without warnings
>
> As a TODO note, I think we should eventually run a warnings check for
> MSVC too. IIUC we only aim to be warning free in assertion builds on
> that platform, because it has no PG_USED_FOR_ASSERTS_ONLY (I think it
> has it in C++ but not C?), but that's something.

Hm. Not entirely sure how to do that without doing a separate windows
build, which is too slow...

> Another TODO note: perhaps we could also make the documentation
> results a browsable artefact with a short retention time, if that's a
> thing.

Might be doable, but I'd guess that the volume of data it'd generate
make it not particularly attractive.
> I feel like I should apologise in advance for this level of
> nit-picking about English grammar, but:

:) Will try to fix.

> +    name: FreeBSD
>
> FreeBSD is missing --with-llvm.

That was kind of intentional, I guess I should add a comment about it.
The CI image for freebsd already starts slower due to its size, and is
on the slower side compared to anything !windows, so I'm not sure it's
worth enabling llvm there? It's probably not bad to have one platform
testing without llvm.

> > [1] Some ideas for what could make sense to extend CI to in the future:
>
> To your list, I'd add:
>
> * 32 bit

That'd be easy.

> * test coverage report

If the output size is reasonable, that should be doable as well.

> * ability to capture the complete Windows install directory as an
>   artefact so a Windows user without a dev environment could try out a
>   proposed change/CF entry/...

I think the size of these artifacts would make this not something to
enable by default. But perhaps a manually triggered task would make
sense?

> I hope we can get ccache working on Windows.

They did merge a number of the other required changes for that over the
weekend. I'll try once they've released...

Thanks!

Andres Freund
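As a sketch of the browsable-docs idea discussed in this subthread,
Cirrus can publish files from a task via an *_artifacts instruction; the
path here is an assumption about where the generated HTML would land:

```yaml
  always:
    # expose the generated documentation for browsing/download
    docs_html_artifacts:
      path: doc/src/sgml/html/*
```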
> On 14 Dec 2021, at 05:11, Andres Freund <andres@anarazel.de> wrote:
> On 2021-12-14 16:51:58 +1300, Thomas Munro wrote:
>> I'd better go and figure out how to fix cfbot when this lands...
>
> I assume it'd be:
> - stop adding the CI stuff
> - adjust links to CI tasks, appveyor wouldn't be used anymore
> - perhaps reference individual tasks from the cfbot page?

+1 on leveraging the CI tasks in the tree in the CFBot. For a patch
like the libnss TLS backend one it would be a great help to both
developer and reviewer to have that codepath actually built and tested
as part of the CFBot; being able to tweak the CI tasks used in the CFBot
per patch would be very helpful.

--
Daniel Gustafsson https://vmware.com/
On Mon, Dec 13, 2021 at 03:45:23PM -0800, Andres Freund wrote:
> Hi,
>
> On 2021-12-13 18:14:52 -0500, Tom Lane wrote:
> > Andres Freund <andres@anarazel.de> writes:
> > > On 2021-12-13 16:02:50 -0600, Justin Pryzby wrote:
> > >> sudo is used exactly twice; maybe it's not needed at all ?
> >
> > > The macos one is needed, but the freebsd one indeed isn't.
> >
> > I'm with Justin on this one. I would view a script trying to mess
> > with /cores as a hostile act. PG cores on macOS tend to be extremely
> > large and can fill up your disk fairly quickly if you don't know
> > they're being accumulated. I think it's okay to suggest in the
> > documentation that people might want to allow cores to be dropped,
> > but the script has NO business trying to force that.
>
> I'm not quite following. This is an ephemeral CI instance?

As for myself, all I meant is that it's better to write it with zero
sudos than one (for the same reason that it's better to write with one
than with two).

--
Justin
Justin Pryzby <pryzby@telsasoft.com> writes:
> On Mon, Dec 13, 2021 at 03:45:23PM -0800, Andres Freund wrote:
>> On 2021-12-13 18:14:52 -0500, Tom Lane wrote:
>>> I'm with Justin on this one. I would view a script trying to
>>> mess with /cores as a hostile act.

>> I'm not quite following. This is an ephemeral CI instance?

> As for myself, all I meant is that it's better to write it with zero
> sudos than one (for the same reason that it's better to write with one
> than with two).

What I'm concerned about is that it's unsafe to run the script in any
non-throwaway environment. That doesn't seem desirable.

regards, tom lane
> On 15 Dec 2021, at 16:21, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Justin Pryzby <pryzby@telsasoft.com> writes:
>> On Mon, Dec 13, 2021 at 03:45:23PM -0800, Andres Freund wrote:
>>> On 2021-12-13 18:14:52 -0500, Tom Lane wrote:
>>>> I'm with Justin on this one. I would view a script trying to
>>>> mess with /cores as a hostile act.
>
>>> I'm not quite following. This is an ephemeral CI instance?
>
>> As for myself, all I meant is that it's better to write it with zero
>> sudos than one (for the same reason that it's better to write with
>> one than with two).
>
> What I'm concerned about is that it's unsafe to run the script in
> any non-throwaway environment. That doesn't seem desirable.

I don't think anyone should expect any part of the .cirrus.yml script to
be safe or applicable to a local environment, so I think this is
something we can solve with documentation.

--
Daniel Gustafsson https://vmware.com/
On 13.12.21 22:12, Andres Freund wrote:
> Attached is an updated version of the CI patches. An example of a test
> run with the attached version of this
> https://cirrus-ci.com/build/6501998521483264

+  only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*freebsd.*'

I'm not in favor of this kind of thing. I don't understand how this is
useful, other than perhaps for developing *this* patch. I don't think
people would like having these tags in the mainline, and if it's for
local use, then people can adjust the file locally.

+    CC="ccache cc" CFLAGS="-O0 -ggdb"'

I don't think using -O0 is the right thing. It will miss some compiler
warnings, and it will not thoroughly test the compiler. We should test
using the configurations that are close to what users actually use.

+    - su postgres -c 'gmake -s -j3 && gmake -s -j3 -C contrib'

Why doesn't this use make world (or world-bin, if you prefer)? Why does
this use -j3 if there are two CPUs configured? (Perhaps the number of
CPUs should be put into a variable.) I don't like that the -s option is
used. I would like to see what commands are executed.

+    cpu: 4

Why does the Linux job use 4 CPUs and the FreeBSD job 2?

+    - export

I don't think that is portable to all shells.

+    - su postgres -c 'time script test.log gmake -s -j2 ${CHECK} ${CHECKFLAGS}'

+      su postgres -c '\
+        ulimit -c unlimited; \
+        make -s ${CHECK} ${CHECKFLAGS} -j8 \
+        '

Not clear why these are so different. Don't we need the test.log file
for Linux? Don't we need the ulimit call for FreeBSD? Why the -j8
option even though 4 CPUs have been configured?

+      brew install \
+        ccache \
+        coreutils \
+        icu4c \
+        krb5 \
+        llvm \
+        lz4 \
+        make \
+        openldap \
+        openssl \
+        python@3.10 \
+        tcl-tk

Curious why coreutils and make are installed? The system-supplied tools
ought to work.

+      brew cleanup -s

Seems unnecessary?
+    PKG_CONFIG_PATH="/usr/local/opt/krb5/lib/pkgconfig:$PKG_CONFIG_PATH"

AFAICT, we don't use pkg-config for the krb5 package.

+    - gmake -s -j12 && gmake -s -j12 -C contrib

These numbers should also be explained or configured somewhere.
Possibly query the number of CPUs on the instance.

+    PROVE_FLAGS: -j10

Why only on Windows?

+  # Installation on windows currently only completely works from src\tools\msvc

If that is so, let's fix that. But see that install.pl contains

    if (-e "src/tools/msvc/buildenv.pl")

etc. It seems to want to be able to be invoked from the top level.

+    - cd src\tools\msvc && perl .\install.pl %CIRRUS_WORKING_DIR%\tmp_install

Confusing mix of forward and backward slashes in the Windows section. I
think forward slashes should work everywhere.

+  test_plcheck_script:
+    - perl src/tools/msvc/vcregress.pl plcheck

etc. Couldn't we enhance vcregress.pl to take multiple arguments or
take a "check-world" argument or something? Otherwise, this will be
tedious to keep updated.

+  test_subscriptioncheck_script:
+    - perl src/tools/msvc/vcregress.pl taptest .\src\test\subscription\

This is even worse. I don't want to have to hand-register every new TAP
test.

+  always:
+    gcc_warning_script: |
+      time ./configure \
+        --cache gcc.cache CC="ccache gcc" \
+        --enable-dtrace

I don't know why we wouldn't need the full set of options here. It's
not like optional code never has compiler warnings.

+  # cross-compile to windows
+  always:
+    mingw_cross_warning_script: |

I would welcome a native mingw build with full options and test suite
run. This cross-compiling stuff is of course interesting, but I'm not
sure why it is being preferred over a full native run.

--- /dev/null
+++ b/.dockerignore
@@ -0,0 +1,7 @@
+# Ignore everything, except ci/

I wonder whether this would interfere with other uses of docker. I
suppose people might have their custom setups for building docker images
from PostgreSQL sources.
It seems weird that this file gets this prominence, saying that the
canonical use of docker inside PostgreSQL sources is for Cirrus CI.

It would be useful if the README explained the use of docker. As I
mentioned before, it's not immediately clear why docker is used at all
in this.

The docker file for Windows contains a lot of hardcoded version numbers.
This should at least be refactored a little bit so that it is clear
which version numbers should be updated and how. Or better yet, avoid
the need to constantly update version numbers. For example, the Python
patch release changes every few weeks (e.g., 3.10.0 isn't current
anymore). Also, the way OpenSSL is installed looks a bit fishy. Is this
what people actually use in practice? How can we make it match actual
practice better?

+# So that tests using the "manually" started postgres on windows can use
+# prepared statements
+max_prepared_transactions = 10

Maybe add that to the pg_ctl invocation in the Windows section instead.

+# Settings that make logs more useful
+log_autovacuum_min_duration = 0
+log_checkpoints = true
+log_connections = true
+log_disconnections = true
+log_line_prefix = '%m [%p][%b] %q[%a][%v:%x] '
+log_lock_waits = true

If we think these are useful, we should make the test suite drivers set
them for all users.

> One I explicitly brought up before:
>
> On 2021-10-01 15:27:52 -0700, Andres Freund wrote:
> > One policy discussion that we'd have to have is who should control
> > the images used for CI. Right now that's on my personal google cloud
> > account - which I am happy to do, but medium - long term that'd not
> > be optimal.
>
> The proposed CI script uses custom images to run linux and freebsd
> tests. They are automatically built every day from the repository
> https://github.com/anarazel/pg-vm-images/
>
> These images have all the prerequisites pre-installed. For Linux
> something similar can be achieved by using dockerfiles referenced in
> the .cirrus.yml file, but for FreeBSD that's not available.
> Installing the necessary dependencies on every run is too time intensive. For
> linux, the tests run a lot faster in full-blown VMs than in docker, and full
> VMs allow a lot more control / debugging.
>
> I think this may be OK for now, but I also could see arguments for wanting to
> transfer the image specifications and the google account to PG properties.

For the above reasons of lack of documentation, I still don't understand the whole docker flow here. Do you mean that the docker files included in your patch are not actually used as part of the CI run; instead you use them to build images manually, which are then pulled in by the test runs? If so, apart from the general problem of having this go through some personal account, I also have concerns about how this can be kept up to date, given how often the dependent software changes, as mentioned above.

I think it would be much easier to get this project over the initial hump if we skipped the whole docker business and just set the images up from scratch on each run.

> The second attention-worthy point is the choice of a new toplevel ci/
> directory as the location for resources referenced by CI. A few other
> projects also use ci/, but I can also see arguments for moving the contents to
> e.g. src/tools/ci or such?

Or src/tools/cirrus/? These providers come and go, and before long there might be interest in another one.

> - Extend the set of compiler warnings - as the compiler version is controlled,
>   we could be more aggressive than we can be via configure.ac.

Not sure about that. I don't want this to evolve into some separate pool of policies that yells at you because of some settings that you never heard of. If we think other warnings are useful, we should provide a way to select them, perhaps optionally, from the usual build system.

> - consider enabling compile-time debugging options like COPY_PARSE_PLAN_TREES,
>   and run-time force_parallel_mode = regress on some platforms.
> They seem to catch a lot of problems during development and are likely
> affordable enough.

That would be useful if we can think of a way to select it optionally.
On 12/13/21 16:12, Andres Freund wrote:
> Hi,
>
> Attached is an updated version of the CI patches. An example of a test run
> with the attached version of this
> https://cirrus-ci.com/build/6501998521483264

Maybe I have missed it, but why are we using ccache here? That seems a bit pointless in an ephemeral instance.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Andrew Dunstan <andrew@dunslane.net> writes:
> Maybe I have missed it, but why are we using ccache here? That seems a
> bit pointless in an ephemeral instance.

I believe Munro's cfbot tooling is able to save and re-use ccache across successive instantiations of a build instance. I've not looked at this code, but if it can do that there'd be point to it.

regards, tom lane
Hi,

On 2021-12-17 12:34:36 +0100, Peter Eisentraut wrote:
> On 13.12.21 22:12, Andres Freund wrote:
> > Attached is an updated version of the CI patches. An example of a test run
> > with the attached version of this
> > https://cirrus-ci.com/build/6501998521483264
>
> + only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' ||
>   $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*freebsd.*'
>
> I'm not in favor of this kind of thing. I don't understand how this is
> useful, other than perhaps for developing *this* patch. I don't think
> people would like having these tags in the mainline, and if it's for local
> use, then people can adjust the file locally.

Definitely not for mainline. But it's extremely useful for development. If you are iteratively trying to fix windows, running all the other tests just slows each iteration down - there's a concurrency limit on how many tests you can run for free...

> + CC="ccache cc" CFLAGS="-O0 -ggdb"'
>
> I don't think using -O0 is the right thing. It will miss some compiler
> warnings, and it will not thoroughly test the compiler. We should test
> using the configurations that are close to what users actually use.

Hm. I personally always end up using -O0 for the actual development tree, and it seems a lot of others do as well. Building with -O2 makes backtraces etc just less useful.

> + - su postgres -c 'gmake -s -j3 && gmake -s -j3 -C contrib'
>
> Why doesn't this use make world (or world-bin, if you prefer).

I started working on this well before world-bin existed. And using 'world' as the target builds the docs, which is quite expensive... I happened to actually make the change to world-bin yesterday for the next version to send :)

> Why does this use -j3 if there are two CPUs configured? (Perhaps the number
> of CPUs should be put into a variable.)

I tried a few and that worked best.

> I don't like that the -s option is used. I would like to see what commands
> are executed.

I can change it - but it makes it *much* harder to spot compiler warnings.
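[Editor's note: the ci-os-only filter discussed above can be emulated locally to see what it does. This is only a sketch in shell; the real only_if expression is evaluated by Cirrus itself and anchors the tag after a newline in the commit message. The helper name is made up for illustration.]

```shell
# Sketch: emulate the per-OS filter of the only_if expressions above.
# A task runs when no ci-os-only tag is present, or when the tag
# mentions that task's OS.
ci_runs_for() {
  msg=$1; os=$2
  if ! printf '%s\n' "$msg" | grep -q '^ci-os-only:'; then
    echo run                       # no tag: run everywhere
  elif printf '%s\n' "$msg" | grep '^ci-os-only:' | grep -q "$os"; then
    echo run                       # tag lists this OS
  else
    echo skip                      # tag present, OS not listed
  fi
}

ci_runs_for 'Fix planner bug' freebsd
# -> run
ci_runs_for 'ci-os-only: windows' freebsd
# -> skip
```

So a commit whose message ends in "ci-os-only: windows" only runs the windows task, which is what makes iterating on a platform-specific failure cheap.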
> + cpu: 4
>
> Why does the Linux job use 4 CPUs and the FreeBSD job 2?

I'll add a comment about it. Two reasons:
1) the limits on cirrus are lower for linux than freebsd: https://cirrus-ci.org/faq/
2) There are some issues on freebsd where test performance regresses *very* substantially with higher concurrency. Thomas and I looked a bunch into it without figuring out the details.

> + - export
>
> I don't think that is portable to all shells.

Doesn't really need to be?

> + - su postgres -c 'time script test.log gmake -s -j2 ${CHECK}
>   ${CHECKFLAGS}'
>
> + su postgres -c '\
> +   ulimit -c unlimited; \
> +   make -s ${CHECK} ${CHECKFLAGS} -j8 \
> + '
>
> Not clear why these are so different. Don't we need the test.log file for
> Linux?

There's a comment about the use of script:

# Use of script is to avoid make complaints about fcntl()
# https://savannah.gnu.org/bugs/?60774

that bug is specific to platforms that don't allow locking pipes. Which linux does allow, but freebsd doesn't.

> Don't we need the ulimit call for FreeBSD?

I think the default core limits were different, I will check.

> Why the -j8 option even though 4 CPUs have been configured?

That might have been an accident.

> + brew install \
> +   ccache \
> +   coreutils \
> +   icu4c \
> +   krb5 \
> +   llvm \
> +   lz4 \
> +   make \
> +   openldap \
> +   openssl \
> +   python@3.10 \
> +   tcl-tk
>
> Curious why coreutils and make are installed? The system-supplied tools
> ought to work.

make, because newer versions of make have -Otarget, which makes concurrent check-world output at least kind-of readable.

> + brew cleanup -s
>
> Seems unnecessary?

It reduces the size of the cached downloads. Not much point in keeping older versions of the package around. Populating the cache takes time.

> + PKG_CONFIG_PATH="/usr/local/opt/krb5/lib/pkgconfig:$PKG_CONFIG_PATH"
>
> AFAICT, we don't use pkg-config for the krb5 package.

I now converted this to a loop.
> + - gmake -s -j12 && gmake -s -j12 -C contrib
>
> These numbers should also be explained or configured somewhere. Possibly
> query the number of CPUs on the instance.

macOS instances have a fixed number of cores - 12. Might make sense to query it, but not sure what a good portable way there is.

> + PROVE_FLAGS: -j10
>
> Why only on Windows?

Because on windows there's no other way to run tests in parallel - prove-level concurrency is the only thing. Whereas other platforms can run tests in parallel via make. Combining both tends to not work very well in my experience.

> + # Installation on windows currently only completely works from
>   src\tools\msvc
>
> If that is so, let's fix that.

I did report the problem - just haven't gotten around to fixing it. Note this is also how the buildfarm invokes installation... The problem is that Install.pm includes config.pl from the current directory, IIRC. At some point I needed to restrict myself to dealing with the current state - there's plenty of other bugs.

> + - cd src\tools\msvc && perl .\install.pl
>   %CIRRUS_WORKING_DIR%\tmp_install
>
> Confusing mix of forward and backward slashes in the Windows section. I
> think forward slashes should work everywhere.

They would work here I think, but no, they don't work everywhere :(

> + test_plcheck_script:
> +  - perl src/tools/msvc/vcregress.pl plcheck
>
> etc. Couldn't we enhance vcregress.pl to take multiple arguments or take a
> "check-world" argument or something. Otherwise, this will be tedious to
> keep updated.
>
> + test_subscriptioncheck_script:
> +  - perl src/tools/msvc/vcregress.pl taptest .\src\test\subscription\
>
> This is even worse. I don't want to have to hand-register every new TAP
> test.

I strongly agree. There were several tests that the buildfarm on windows didn't ever run before I started working on this. And clearly no windows developer is going to manually invoke ~10 test steps.
> + always:
> +   gcc_warning_script: |
> +     time ./configure \
> +       --cache gcc.cache CC="ccache gcc" \
> +       --enable-dtrace
>
> I don't know why we wouldn't need the full set of options here. It's not
> like optional code never has compiler warnings.

I mostly didn't like the repetition of long argument lists. There's probably a decent way to deal with that.

> + # cross-compile to windows
> + always:
> +   mingw_cross_warning_script: |
>
> I would welcome a native mingw build with full options and test suite run.
> This cross-compiling stuff is of course interesting, but I'm not sure why it
> is being preferred over a full native run.

I have a new colleague working on scripting the setup of mingw on windows. Besides not being available yet, it's *much* *much* slower to build postgres. This is a useful and quicker screening.

> --- /dev/null
> +++ b/.dockerignore
> @@ -0,0 +1,7 @@
> +# Ignore everything, except ci/
>
> I wonder whether this would interfere with other uses of docker. I suppose
> people might have their custom setups for building docker images from
> PostgreSQL sources. It seems weird that this file gets this prominence,
> saying that the canonical use of docker inside PostgreSQL sources is for
> Cirrus CI.

I have a hard time seeing uses of docker where this would be a problem. If you actually use the whole tree as context you pretty much need to exclude most of it, otherwise docker (and other tools) tar the whole tree and send it to the daemon.

> It would be useful if the README explained the use of docker. As I
> mentioned before, it's not immediately clear why docker is used at all in
> this.

It boils down to this: Windows tests on cirrus are always run via docker (presumably because the licensing otherwise is more expensive). And for linux, it's considerably quicker to start up a container than a full VM - but the tests themselves then run slower than in a full VM.

> The docker file for Windows contains a lot of hardcoded version numbers.
> This should at least be refactored a little bit so that it is clear which
> version numbers should be updated and how. Or better yet, avoid the need to
> constantly update version numbers.

I don't really see a great way to avoid them in general. And e.g. with perl the issue is that plperl straight up doesn't work with a newer perl version :(.

> Also, the way OpenSSL is installed looks a bit fishy. Is this what people
> actually use in practice? How can we make it match actual practice better?

I wish I knew. I didn't see any good practice for this anywhere.

> +# So that tests using the "manually" started postgres on windows can use
> +# prepared transactions
> +max_prepared_transactions = 10
>
> Maybe add that to the pg_ctl invocation in the Windows section instead.

It doesn't hurt anything else, so I don't really think it's worth doing it in a platform-dependent way?

> +# Settings that make logs more useful
> +log_autovacuum_min_duration = 0
> +log_checkpoints = true
> +log_connections = true
> +log_disconnections = true
> +log_line_prefix = '%m [%p][%b] %q[%a][%v:%x] '
> +log_lock_waits = true
>
> If we think these are useful, we should make the test suite drivers set them
> for all users.

Some of these are set by some test drivers, but not by others. And it's quite useful for out-of-tree development to be able to quickly add options to all tests... And no test driver helps the windows case that needs the manually started postgres (the state of windows testing really is dreadful).

> For the above reasons of lack of documentation, I still don't understand the
> whole docker flow here. Do you mean, the docker files included in your
> patch are not actually used as part of the CI run; instead you use them to
> build images manually, which are then pulled in by the test runs?

As submitted this is about VM images, not docker images. Used by linux (for the main test run) and freebsd (cirrus doesn't support anything else).
Some other platforms could also be supported that way with a bit more work (openbsd, netbsd). For some other platforms cirrus only supports docker containers (windows, linux on arm).

The docker containers can be built on-demand (that's what the dockerfile: ... syntax for e.g. windows does). So yes, they currently are used. cirrus rebuilds them whenever their content (including referenced files) changes. But the building of containers happens in every repo enabling the tests. That takes quite a while and uses a lot of space (the additional windows installation is like ~6GB). So I was working over the last few days on moving the containers to be built alongside the VM images. Then it looks more similar across all the platforms.

> I think it would be much easier to get this project over the initial hump if
> we skipped the whole docker business and just set the images up from scratch
> on each run.

It's not feasible. The windows stuff takes *way* too long (as in 40min), freebsd takes quite long, and linux isn't quick either. And the amount of traffic it generates to install packages over and over again isn't great either. The containers / images are served from within google's network, and thus free - the package repos don't have that advantage.

FWIW, homebrew at some point complained about a huge fraction of their cost being from CI. I know debian had issues with that on and off as well. While I couldn't solve the homebrew issue via VM images, I made it at least cache the homebrew downloads between runs.

My current (not yet submitted version) comment about this is:

> Images used for CI
> ==================
>
> To keep CI times tolerable, most platforms use pre-generated images. Some
> platforms use containers, others use full VMs. Images for both are generated
> separately from CI runs, otherwise each git repository that is being tested
> would need to build its own set of containers, which would be wasteful (both
> in space and time).
>
> These images are built from the specifications in github.com/anarazel/pg-vm-images/

I also did the work of redoing the necessary setup and documenting all the steps ([1], although there could be a bit more handholding). It'd now not be hard to transfer the image building into different / shared ownership. I'll expand this section.

> > The second attention-worthy point is the choice of a new toplevel ci/
> > directory as the location for resources referenced by CI. A few other
> > projects also use ci/, but I can also see arguments for moving the contents to
> > e.g. src/tools/ci or such?
>
> Or src/tools/cirrus/? These providers come and go, and before long there
> might be interest in another one.

I think quite a bit of the work will be portable between CI providers. I think having all the different CI things in one directory will make it more obvious than having a bunch of provider names one might or might not know. I'm imagining that over time we'd put some of the stuff in the .cirrus.yml file into scripts in src/tools/ci/, so they can be used by different CI providers.

> > - Extend the set of compiler warnings - as the compiler version is controlled,
> >   we could be more aggressive than we can be via configure.ac.
>
> Not sure about that. I don't want this to evolve into some separate pool of
> policies that yells at you because of some settings that you never heard of.
> If we think other warnings are useful, we should provide a way to select
> them, perhaps optionally, from the usual build system.

I'd be happy with that as well. I do think that the compiler version dependency makes it a bit harder to do this usefully from configure though.

> > - consider enabling compile-time debugging options like COPY_PARSE_PLAN_TREES,
> >   and run-time force_parallel_mode = regress on some platforms. They seem to
> >   catch a lot of problems during development and are likely affordable enough.
>
> That would be useful if we can think of a way to select it optionally.
What kind of "optionally" are you thinking of? Commit message contents? A central flag somewhere?

I was thinking it might be worth enabling such options on one of the platforms, so that one gets coverage by default. People who remember e.g. COPY_PARSE_PLAN_TREES tend not to be the ones who need it...

Greetings,

Andres Freund

[1] https://github.com/anarazel/pg-vm-images/blob/main/gcp_project_setup.txt
Hi,

On 2021-12-17 09:36:05 -0500, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
> > Maybe I have missed it, but why are we using ccache here? That seems a
> > bit pointless in an ephemeral instance.
>
> I believe Munro's cfbot tooling is able to save and re-use ccache
> across successive instantiations of a build instance. I've not
> looked at this code, but if it can do that there'd be point to it.

Yes, the ccache cache is persisted across runs (see the *_cache and upload_cache instructions). It makes a quite substantial difference. One reason the windows runs are a lot slower than the others is just that visual studio isn't yet supported by ccache, and there don't seem to be good alternative tools.

The ccache maintainers merged more of the msvc support last weekend! So I have quite a bit of hope of being able to use ccache there as well.

Greetings,

Andres Freund
> On 17 Dec 2021, at 20:31, Andres Freund <andres@anarazel.de> wrote:
> On 2021-12-17 12:34:36 +0100, Peter Eisentraut wrote:
>> I don't like that the -s option is used. I would like to see what commands
>> are executed.
>
> I can change it - but it makes it *much* harder to spot compiler warnings.

Having used Cirrus et al. a fair bit I strongly agree with Andres; working with huge logs in the browser is painful, whereas -s makes it usable even on mobile devices.

--
Daniel Gustafsson		https://vmware.com/
On 2021-12-17 11:31:59 -0800, Andres Freund wrote: > > Don't we need the ulimit call for FreeBSD? > > I think the default core limits were different, I will check. Yep, freebsd has -c unlimited by default: https://cirrus-ci.com/task/6199382239346688?logs=sysinfo#L23 vs https://cirrus-ci.com/task/4792007355793408?logs=sysinfo#L32
On 12/17/21 14:34, Andres Freund wrote:
> Hi,
>
> On 2021-12-17 09:36:05 -0500, Tom Lane wrote:
>> Andrew Dunstan <andrew@dunslane.net> writes:
>>> Maybe I have missed it, but why are we using ccache here? That seems a
>>> bit pointless in an ephemeral instance.
>> I believe Munro's cfbot tooling is able to save and re-use ccache
>> across successive instantiations of a build instance. I've not
>> looked at this code, but if it can do that there'd be point to it.
> Yes, the ccache cache is persisted across runs (see the *_cache and
> upload_cache instructions). It makes a quite substantial difference. One
> reason the windows runs are a lot slower than the others is just that visual
> studio isn't yet supported by ccache, and there don't seem to be good
> alternative tools.
>
> The ccache maintainers merged more of the msvc support last weekend! So I have
> quite a bit of hope of being able to use ccache there as well.

Ok. I have had to disable ccache for fairywren (msys2) because it caused serious instability.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Hi, On 2021-12-20 11:21:05 -0800, Andres Freund wrote: > Attached is v4 of the CI patch. I'd like to push this - any objections? It's not disruptive to anything but cfbot, so we can incrementally improve it further. I'll try to sync pushing with Thomas, so that he can adjust cfbot to not add the CI changes anymore / adjust the links to the CI status URLs etc. Greetings, Andres Freund
> On 29 Dec 2021, at 21:17, Andres Freund <andres@anarazel.de> wrote: > On 2021-12-20 11:21:05 -0800, Andres Freund wrote: >> Attached is v4 of the CI patch. > > I'd like to push this - any objections? It's not disruptive to anything but > cfbot, so we can incrementally improve it further. No objection, I'm +1 on getting this in. -- Daniel Gustafsson https://vmware.com/
Hi, Attached is v5 of the CI patch. Not a lot of changes: - a bunch of copy-editing, wrote a commit message etc - use ccache for CXX/CLANG in the CompilerWarnings task, I had missed that when making the task use all --with-* flags - switch linux to use ossp-uuid. I tried to switch macos at first, but that doesn't currently seem to work. - minor other cleanups This time I've only attached the main CI patch, not the one making core dumps on windows work. That's not yet committable... I plan to push this after another cycle of tests passing (and driving over the bay bridge...) unless I hear protests. Greetings, Andres Freund
Attachments
commit message: agithub

the the the buildfarm.
=> the

access too.
=> to

# Due to that it using concurrency within prove is helpful.
=> Due to that, it's useful to run prove with multiple jobs.

further details src/tools/ci/README
=> further details, see src/tools/ci/README

script uses a pseudo-tty, which do support locking.
=> which does

To limit unneccessary work only run this once normal linux test succeeded
=> unnecessary
=> succeeds

--
Justin
Hi,

On 2021-12-30 20:28:46 -0600, Justin Pryzby wrote:
> [ language fixes]

Thanks!

> script uses a pseudo-tty, which do support locking.
> => which does

This didn't seem right either way - it's pseudo-ttys that do support locking, so plural seemed appropriate? I changed it to "script uses pseudo-ttys, which do" instead...

Regards,

Andres
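[Editor's note: the pipe-vs-pty distinction behind this wording can be seen from a tiny shell experiment. This only illustrates why make behaves differently under script(1), which interposes a pty, than under a plain pipe; it is not the actual make jobserver code.]

```shell
# Sketch: stdout is a tty when attached to a terminal or pty, but not
# when it is a pipe - the case that triggers make's fcntl() complaint
# on platforms where pipes cannot be locked.
is_tty() { if [ -t 1 ]; then echo tty; else echo not-a-tty; fi; }

is_tty | cat     # stdout is a pipe here
# -> not-a-tty
```

Running make under script(1) makes its output descriptor a pty instead of a pipe, which is lockable everywhere, sidestepping the FreeBSD issue.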
On 2021-12-30 17:46:52 -0800, Andres Freund wrote: > I plan to push this after another cycle of tests passing (and driving > over the bay bridge...) unless I hear protests. Pushed. Marked CF entry as committed.
On 2021-Dec-30, Andres Freund wrote: > On 2021-12-30 17:46:52 -0800, Andres Freund wrote: > > I plan to push this after another cycle of tests passing (and driving > > over the bay bridge...) unless I hear protests. > > Pushed. > > Marked CF entry as committed. I tried it using the column filter patch. It worked on the first attempt. Thanks! -- Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/
I noticed a patch failing in cfbot everywhere except windows: https://commitfest.postgresql.org/36/3476/ | Invalid relcache when ADD PRIMARY KEY USING INDEX It's because vcregress skips tests which have NO_INSTALLCHECK=1. Is it desirable to enable more module/contrib tests for windows CI ? This does a few, but there's a few others which would require the server to be restarted to set shared_preload_libraries for each module. diff --git a/.cirrus.yml b/.cirrus.yml index 19b3737fa11..c427b468334 100644 --- a/.cirrus.yml +++ b/.cirrus.yml @@ -390,7 +390,7 @@ task: - perl src/tools/msvc/vcregress.pl check parallel startcreate_script: # paths to binaries need backslashes - - tmp_install\bin\pg_ctl.exe initdb -D tmp_check/db -l tmp_check/initdb.log + - tmp_install\bin\pg_ctl.exe initdb -D tmp_check/db -l tmp_check/initdb.log --options=--no-sync - echo include '%TEMP_CONFIG%' >> tmp_check/db/postgresql.conf - tmp_install\bin\pg_ctl.exe start -D tmp_check/db -l tmp_check/postmaster.log test_pl_script: diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile index 9a31e0b8795..14fd847ba7f 100644 --- a/contrib/test_decoding/Makefile +++ b/contrib/test_decoding/Makefile @@ -10,7 +10,7 @@ ISOLATION = mxact delayed_startup ondisk_startup concurrent_ddl_dml \ oldest_xmin snapshot_transfer subxact_without_top concurrent_stream \ twophase_snapshot -REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf +REGRESS_OPTS = --temp-config=$(top_srcdir)/contrib/test_decoding/logical.conf ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf # Disabled because these tests require "wal_level=logical", which diff --git a/src/tools/ci/pg_ci_base.conf b/src/tools/ci/pg_ci_base.conf index d8faa9c26c1..52cdb697a57 100644 --- a/src/tools/ci/pg_ci_base.conf +++ b/src/tools/ci/pg_ci_base.conf @@ -12,3 +12,24 @@ log_connections = true log_disconnections = true log_line_prefix = '%m [%p][%b] %q[%a][%v:%x] ' log_lock_waits = true + 
+# test_decoding +wal_level = logical +max_replication_slots = 4 +logical_decoding_work_mem = 64kB + +# commit_ts +track_commit_timestamp = on + +## worker_spi +#shared_preload_libraries = worker_spi +#worker_spi.database = contrib_regression + +## pg_stat_statements +##shared_preload_libraries=pg_stat_statements + +## test_rls_hooks +#shared_preload_libraries=test_rls_hooks + +## snapshot_too_old +#old_snapshot_threshold = 60min diff --git a/src/tools/msvc/vcregress.pl b/src/tools/msvc/vcregress.pl index 8f3e3fa937b..7e2cc971a42 100644 --- a/src/tools/msvc/vcregress.pl +++ b/src/tools/msvc/vcregress.pl @@ -443,6 +443,7 @@ sub plcheck sub subdircheck { my $module = shift; + my $obey_installcheck = shift || 1; if ( !-d "$module/sql" || !-d "$module/expected" @@ -452,7 +453,7 @@ sub subdircheck } chdir $module; - my @tests = fetchTests(); + my @tests = fetchTests($obey_installcheck); # Leave if no tests are listed in the module. if (scalar @tests == 0) @@ -516,6 +517,14 @@ sub contribcheck my $status = $? >> 8; $mstat ||= $status; } + + subdircheck('test_decoding', -1); + $mstat ||= $? >> 8; + + # The DB would need to be restarted + #subdircheck('pg_stat_statements', -1); + #$mstat ||= $? >> 8; + exit $mstat if $mstat; return; } @@ -530,6 +539,19 @@ sub modulescheck my $status = $? >> 8; $mstat ||= $status; } + + subdircheck('commit_ts', -1); + $mstat ||= $? >> 8; + + subdircheck('test_rls_hooks', -1); + $mstat ||= $? >> 8; + + ## The DB would need to be restarted + #subdircheck('worker_spi', -1); + #$mstat ||= $? >> 8; + + # src/test/modules/snapshot_too_old/Makefile + exit $mstat if $mstat; return; } @@ -726,6 +748,7 @@ sub fetchTests my $m = <$handle>; close($handle); my $t = ""; + my $obey_installcheck = shift || 1; $m =~ s{\\\r?\n}{}g; @@ -733,7 +756,7 @@ sub fetchTests # so bypass its run by returning an empty set of tests. if ($m =~ /^\s*NO_INSTALLCHECK\s*=\s*\S+/m) { - return (); + return () if $obey_installcheck == 1; } if ($m =~ /^REGRESS\s*=\s*(.*)$/gm)
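[Editor's note: the NO_INSTALLCHECK marker that the patch above works around is just a line in each module's Makefile. A rough shell equivalent of what vcregress.pl's fetchTests() matches, against a made-up demo Makefile:]

```shell
# Sketch: detect the NO_INSTALLCHECK marker the way fetchTests() does,
# using a throwaway demo Makefile (contents are illustrative only).
demo=$(mktemp)
cat > "$demo" <<'EOF'
REGRESS = ddl xact rewrite
NO_INSTALLCHECK = 1
EOF

if grep -Eq '^[[:space:]]*NO_INSTALLCHECK[[:space:]]*=[[:space:]]*[^[:space:]]' "$demo"; then
  echo "module skipped under installcheck"
fi
# -> module skipped under installcheck
rm -f "$demo"
```

Modules carry this marker precisely because they need non-default server settings, which is why simply un-skipping them also requires the configuration additions above.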
Hi, On 2022-01-09 13:16:50 -0600, Justin Pryzby wrote: > I noticed a patch failing in cfbot everywhere except windows: > > https://commitfest.postgresql.org/36/3476/ > | Invalid relcache when ADD PRIMARY KEY USING INDEX > > It's because vcregress skips tests which have NO_INSTALLCHECK=1. > Is it desirable to enable more module/contrib tests for windows CI ? Yes. I think the way we run windows tests is pretty bad - it's not reasonable that each developer needs to figure out 20 magic incantations to run all tests on windows. > This does a few, but there's a few others which would require the server to > be restarted to set shared_preload_libraries for each module. > > diff --git a/.cirrus.yml b/.cirrus.yml > index 19b3737fa11..c427b468334 100644 > --- a/.cirrus.yml > +++ b/.cirrus.yml > @@ -390,7 +390,7 @@ task: > - perl src/tools/msvc/vcregress.pl check parallel > startcreate_script: > # paths to binaries need backslashes > - - tmp_install\bin\pg_ctl.exe initdb -D tmp_check/db -l tmp_check/initdb.log > + - tmp_install\bin\pg_ctl.exe initdb -D tmp_check/db -l tmp_check/initdb.log --options=--no-sync > - echo include '%TEMP_CONFIG%' >> tmp_check/db/postgresql.conf > - tmp_install\bin\pg_ctl.exe start -D tmp_check/db -l tmp_check/postmaster.log > test_pl_script: > diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile > index 9a31e0b8795..14fd847ba7f 100644 > --- a/contrib/test_decoding/Makefile > +++ b/contrib/test_decoding/Makefile > @@ -10,7 +10,7 @@ ISOLATION = mxact delayed_startup ondisk_startup concurrent_ddl_dml \ > oldest_xmin snapshot_transfer subxact_without_top concurrent_stream \ > twophase_snapshot > > -REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf > +REGRESS_OPTS = --temp-config=$(top_srcdir)/contrib/test_decoding/logical.conf > ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf > Not sure why these are part of the diff? 
> diff --git a/src/tools/ci/pg_ci_base.conf b/src/tools/ci/pg_ci_base.conf > index d8faa9c26c1..52cdb697a57 100644 > --- a/src/tools/ci/pg_ci_base.conf > +++ b/src/tools/ci/pg_ci_base.conf > @@ -12,3 +12,24 @@ log_connections = true > log_disconnections = true > log_line_prefix = '%m [%p][%b] %q[%a][%v:%x] ' > log_lock_waits = true > + > +# test_decoding > +wal_level = logical > +max_replication_slots = 4 > +logical_decoding_work_mem = 64kB > [ more ] This doesn't really seem like a scalable path forward - duplicating configuration in more places doesn't seem sane. It seems it'd make more sense to teach vcregress.pl to run NO_INSTALLCHECK targets properly? ISTM that changing the options passed to pg_regress based on fetchTests() return value wouldn't be too hard? Greetings, Andres Freund
Hi, On 2021-10-02 12:59:09 -0700, Andres Freund wrote: > On 2021-10-02 11:05:20 -0400, Tom Lane wrote: > > I don't know enough about Windows to evaluate 0001, but I'm a little > > worried about it because it looks like it's changing our *production* > > error handling on that platform. > > Yea. It's clearly not ready as-is - it's the piece that I was planning to > write a separate email about. > > It's hard to understand what *precisely* SEM_NOGPFAULTERRORBOX etc do. > > What I do know is that without the _set_abort_behavior() stuff abort() doesn't > trigger windows' "crash" paths in at least debugging builds, and that the > SetErrorMode() and _CrtSetReportMode() changes are necessary to get segfaults > to reach the crash paths. > > The in-tree behaviour turns out to make debugging on windows a major pain, at > least when compiling with msvc. Crashes never trigger core dumps or "just in > time" debugging (their term for invoking a debugger upon crash), so one has to > attach to processes before they crash, to have any chance of debugging. > > As far as I can tell this also means that at least for debugging builds, > pgwin32_install_crashdump_handler() is pretty much dead weight - > crashDumpHandler() never gets invoked. I think it may get invoked for abort()s > in production builds, but probably not for segfaults. > > And despite SEM_NOGPFAULTERRORBOX we display those annoying "popup" boxes > telling us about the crash and giving the option to retry, ignore, something > something. It's all a bit baffling. FWIW, the latest version of this patch (including an explanation why SEM_NOGPFAULTERRORBOX isn't useful for our purposes [anymore]) is at (and above) https://postgr.es/m/20220110005704.es4el6i2nxlxzwof%40alap3.anarazel.de Greetings, Andres Freund
On Sun, Jan 09, 2022 at 11:57:44AM -0800, Andres Freund wrote: > On 2022-01-09 13:16:50 -0600, Justin Pryzby wrote: > > diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile > > index 9a31e0b8795..14fd847ba7f 100644 > > --- a/contrib/test_decoding/Makefile > > +++ b/contrib/test_decoding/Makefile > > @@ -10,7 +10,7 @@ ISOLATION = mxact delayed_startup ondisk_startup concurrent_ddl_dml \ > > oldest_xmin snapshot_transfer subxact_without_top concurrent_stream \ > > twophase_snapshot > > > > -REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf > > +REGRESS_OPTS = --temp-config=$(top_srcdir)/contrib/test_decoding/logical.conf > > ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf > > Not sure why these are part of the diff? Because otherwise vcregress runs pg_regress --temp-config test1 test2 [...] ..which means test1 gets eaten as the argument to --temp-config > > diff --git a/src/tools/ci/pg_ci_base.conf b/src/tools/ci/pg_ci_base.conf > > index d8faa9c26c1..52cdb697a57 100644 > > --- a/src/tools/ci/pg_ci_base.conf > > +++ b/src/tools/ci/pg_ci_base.conf > > @@ -12,3 +12,24 @@ log_connections = true > > log_disconnections = true > > log_line_prefix = '%m [%p][%b] %q[%a][%v:%x] ' > > log_lock_waits = true > > + > > +# test_decoding > > +wal_level = logical > > +max_replication_slots = 4 > > +logical_decoding_work_mem = 64kB > > [ more ] > > This doesn't really seem like a scalable path forward - duplicating > configuration in more places doesn't seem sane. It seems it'd make more sense > to teach vcregress.pl to run NO_INSTALLCHECK targets properly? ISTM that > changing the options passed to pg_regress based on fetchTests() return value > wouldn't be too hard? It needs to run the tests with separate instance. Maybe you're suggesting to use --temp-instance. It needs to avoid running on the buildfarm, right ? -- Justin
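[Editor's note: the "test1 gets eaten" problem described above is ordinary option parsing. A hedged sketch of the failure mode - this is not pg_regress's actual parser, just the same pattern:]

```shell
# Sketch: with a space-separated --temp-config, the option parser
# consumes the next word - here the first test name - as its argument.
set -- --temp-config test1 test2
tests=''
while [ $# -gt 0 ]; do
  case $1 in
    --temp-config)   temp_config=$2; shift 2 ;;  # eats "test1"!
    --temp-config=*) temp_config=${1#--temp-config=}; shift ;;
    *)               tests="$tests $1"; shift ;;
  esac
done
echo "temp_config=$temp_config tests=$tests"
# -> temp_config=test1 tests= test2
```

Writing `--temp-config=...` keeps the argument attached to the option, so the test list survives being spliced after REGRESS_OPTS - which is what the Makefile change above does.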
Attachments
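The option-eating behavior Justin describes is ordinary greedy option parsing, not anything specific to pg_regress (whose real handling uses getopt_long in C); a minimal Python sketch with argparse standing in for it shows why the `=` form is safer when test names follow the options:

```python
import argparse

# Hypothetical stand-in for pg_regress's option handling; --temp-config
# takes a required value, just like the real flag.
parser = argparse.ArgumentParser()
parser.add_argument("--temp-config")

# Space-separated form: the first test name is consumed as the option's value.
opts, rest = parser.parse_known_args(["--temp-config", "test1", "test2"])
print(opts.temp_config, rest)  # -> test1 ['test2']

# '=' form: the value is unambiguous, so both test names survive.
opts, rest = parser.parse_known_args(["--temp-config=logical.conf", "test1", "test2"])
print(opts.temp_config, rest)  # -> logical.conf ['test1', 'test2']
```

This is why vcregress appending test names after REGRESS_OPTS silently drops a test when the space-separated spelling is used.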
Hi, On 2021-12-30 17:46:52 -0800, Andres Freund wrote: > I plan to push this after another cycle of tests passing (and driving > over the bay bridge...) unless I hear protests. I noticed that it's harder to see compile warnings on msvc in CI than it was at an earlier point. There used to be a summary of errors at the end. That turns out to be an unintended consequence of the option to reduce msbuild verbosity. > + # Use parallelism, disable file tracker, we're never going to rebuild... > + MSBFLAGS: -m -verbosity:minimal /p:TrackFileAccess=false Unless somebody protests quickly, I'm going to add "-consoleLoggerParameters:Summary;ForceNoAlign" to that. Greetings, Andres Freund
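Assuming the flag is simply appended to the existing MSBFLAGS variable, the resulting .cirrus.yml fragment would look something like this (a sketch, untested):

```yaml
env:
  # -m: parallel build; TrackFileAccess=false: no file tracker, we never rebuild;
  # Summary: restore the warning/error recap that -verbosity:minimal suppresses
  MSBFLAGS: -m -verbosity:minimal /p:TrackFileAccess=false -consoleLoggerParameters:Summary;ForceNoAlign
```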
Hi, On 2022-01-10 16:07:48 -0600, Justin Pryzby wrote: > On Sun, Jan 09, 2022 at 11:57:44AM -0800, Andres Freund wrote: > > On 2022-01-09 13:16:50 -0600, Justin Pryzby wrote: > > > diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile > > > index 9a31e0b8795..14fd847ba7f 100644 > > > --- a/contrib/test_decoding/Makefile > > > +++ b/contrib/test_decoding/Makefile > > > @@ -10,7 +10,7 @@ ISOLATION = mxact delayed_startup ondisk_startup concurrent_ddl_dml \ > > > oldest_xmin snapshot_transfer subxact_without_top concurrent_stream \ > > > twophase_snapshot > > > > > > -REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf > > > +REGRESS_OPTS = --temp-config=$(top_srcdir)/contrib/test_decoding/logical.conf > > > ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf > > > > Not sure why these are part of the diff? > > Because otherwise vcregress runs pg_regress --temp-config test1 test2 [...] > ..which means test1 gets eaten as the argument to --temp-config Ah. I see you changed that globally, good... I'll probably apply that part and 0002 separately. > > This doesn't really seem like a scalable path forward - duplicating > > configuration in more places doesn't seem sane. It seems it'd make more sense > > to teach vcregress.pl to run NO_INSTALLCHECK targets properly? ISTM that > > changing the options passed to pg_regress based on fetchTests() return value > > wouldn't be too hard? > > It needs to run the tests with separate instance. Maybe you're suggesting to > use --temp-instance. Yes. > It needs to avoid running on the buildfarm, right ? I guess so. 
It currently appears to have its own logic for finding contrib (and other) tap tests: foreach my $testdir (glob("$pgsql/contrib/*")) { next unless -d "$testdir/t"; my $testname = basename($testdir); next unless step_wanted("contrib-$testname"); print time_str(), "running contrib test $testname ...\n" if $verbose; run_tap_test("$testdir", "contrib-$testname", undef); } but does run vcregress contribcheck, modulescheck. Andrew, do you see a better way than what Justin is proposing here? Greetings, Andres Freund
On Thu, Jan 13, 2022 at 10:55:27AM -0800, Andres Freund wrote: > I'll probably apply that part and 0002 separately. I've hacked on it a bit more now. Question: are data-checksums tested at all? The only thing I can find is that some buildfarm members *might* exercise it during installcheck. I added pg_regress --initdb-opts since that seems to be a deficiency. -- Justin
Attachments
- 0001-vcregress-ci-test-modules-contrib-with-NO_INSTALLCHE.patch
- 0002-CI-run-initdb-with-no-sync-for-windows.patch
- 0003-vcregress-style.patch
- 0004-cirrus-linux-script-test.log.patch
- 0005-cirrus-mac-uname-a-and-export-at-end-of-sysinfo.patch
- 0006-cirrus-factor-out-common-configure-options.patch
- 0007-wip-pg_regress-initdb-opts.patch
Hi, On 2022-01-13 13:06:42 -0600, Justin Pryzby wrote: > Question: are data-checksums tested at all ? The only thing I can find is that > some buildfarm members *might* exercise it during installcheck. There's some coverage via src/bin/pg_basebackup/t/010_pg_basebackup.pl and src/bin/pg_checksums/t/002_actions.pl - but that's not a whole lot. Might be worth converting one of the "additional" pg_regress runs to use data-checksums? E.g. pg_upgrade's, or the one being introduced in the "replay" test? https://postgr.es/m/CA%2BhUKGK-%2Bmg6RWiDu0JudF6jWeL5%2BgPmi8EKUm1eAzmdbwiE_A%40mail.gmail.com > From b67cd05895c8fa42e13e6743db36412a68292607 Mon Sep 17 00:00:00 2001 > From: Justin Pryzby <pryzbyj@telsasoft.com> > Date: Sun, 9 Jan 2022 22:54:32 -0600 > Subject: [PATCH 2/7] CI: run initdb with --no-sync for windows Applied this already. > From 885becd19f630a69ab1de44cefcdda21ca8146d6 Mon Sep 17 00:00:00 2001 > From: Justin Pryzby <pryzbyj@telsasoft.com> > Date: Tue, 11 Jan 2022 01:30:37 -0600 > Subject: [PATCH 4/7] cirrus/linux: script test.log.. > > For consistency, and because otherwise errors can be lost (?) or hard to find. > - make -s ${CHECK} ${CHECKFLAGS} -j${TEST_JOBS} > + script --command "make -s ${CHECK} ${CHECKFLAGS} -j${TEST_JOBS}" test.log > EOF I'm not following this one? all the output is in the CI run already, you can download it already as well? The only reason to use script as a wrapper is that otherwise make on freebsd/macos warns about fcntl failures? > only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*linux.*' > @@ -183,7 +178,7 @@ task: > mkdir -p ${CCACHE_DIR} > chown -R postgres:postgres ${CCACHE_DIR} > echo '* - memlock 134217728' > /etc/security/limits.d/postgres.conf > - su postgres -c "ulimit -l -H && ulimit -l -S" > + su postgres -c "ulimit -l -H && ulimit -l -S" # XXX Hm? Greetings, Andres Freund
On 1/13/22 13:55, Andres Freund wrote: >> It needs to avoid running on the buildfarm, right ? > I guess so. It currently appears to have its own logic for finding contrib > (and other) tap tests: > > foreach my $testdir (glob("$pgsql/contrib/*")) > { > next unless -d "$testdir/t"; > my $testname = basename($testdir); > next unless step_wanted("contrib-$testname"); > print time_str(), "running contrib test $testname ...\n" if $verbose; > run_tap_test("$testdir", "contrib-$testname", undef); > } > > but does run vcregress contribcheck, modulescheck. > > > Andrew, do you see a better way than what Justin is proposing here? > I can probably adjust to whatever we decide to do. But I think we're really just tinkering at the edges here. What I think we really need is the moral equivalent of `make check-world` in one invocation of vcregress.pl. While we're on the subject of vcregress.pl, there's this recent discussion, which is on my list of things to return to: <https://www.postgresql.org/message-id/46c40cc7-db28-b684-379d-43b34daa5ffa%40dunslane.net> cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Hi, On 2022-01-13 15:27:40 -0500, Andrew Dunstan wrote: > I can probably adjust to whatever we decide to do. But I think we're > really just tinkering at the edges here. What I think we really need is > the moral equivalent of `make check-world` in one invocation of > vcregress.pl. I agree strongly that we need that. But I think a good chunk of Justin's changes are actually required to get there? Specifically, unless we want lots of duplicated logic in vcregress.pl, we need to make vcregress know how to run NO_INSTALLCHECK test. The option added was just so the buildfarm doesn't start to run tests multiple times... Greetings, Andres Freund
On Fri, Jan 14, 2022 at 03:34:11PM -0800, Andres Freund wrote: > Hi, > > On 2022-01-13 15:27:40 -0500, Andrew Dunstan wrote: > > I can probably adjust to whatever we decide to do. But I think we're > > really just tinkering at the edges here. What I think we really need is > > the moral equivalent of `make check-world` in one invocation of > > vcregress.pl. > > I agree strongly that we need that. But I think a good chunk of Justin's > changes are actually required to get there? > > Specifically, unless we want lots of duplicated logic in vcregress.pl, we > need to make vcregress know how to run NO_INSTALLCHECK test. The option added > was just so the buildfarm doesn't start to run tests multiple times... The main reason I made the INSTALLCHECK runs conditional (they only run if a new option is specified) is because of these comments: | # Disabled because these tests require "shared_preload_libraries=pg_stat_statements", | # which typical installcheck users do not have (e.g. buildfarm clients). | NO_INSTALLCHECK = 1 Also, I saw that you saw that Thomas discovered/pointed out that a bunch of TAP tests aren't being run by CI. I think vcregress should have an "alltap" target that runs everything like glob("**/t/"). CI would use that instead of the existing ssl, auth, subscription, recovery, and bin targets. The buildfarm could switch to that after it's been published. https://www.postgresql.org/message-id/20220114234947.av4kkhuj7netsy5r%40alap3.anarazel.de -- Justin
On 1/14/22 18:54, Justin Pryzby wrote: > On Fri, Jan 14, 2022 at 03:34:11PM -0800, Andres Freund wrote: >> Hi, >> >> On 2022-01-13 15:27:40 -0500, Andrew Dunstan wrote: >>> I can probably adjust to whatever we decide to do. But I think we're >>> really just tinkering at the edges here. What I think we really need is >>> the moral equivalent of `make check-world` in one invocation of >>> vcregress.pl. >> I agree strongly that we need that. But I think a good chunk of Justin's >> changes are actually required to get there? >> >> Specifically, unless we want lots of duplicated logic in vcregress.pl, we >> need to make vcregress know how to run NO_INSTALLCHECK test. The option added >> was just so the buildfarm doesn't start to run tests multiple times... > The main reason I made the INSTALLCHECK runs conditional (they only run if a > new option is specified) is because of these comments: > > | # Disabled because these tests require "shared_preload_libraries=pg_stat_statements", | # which typical installcheck users do not have (e.g. buildfarm clients). | NO_INSTALLCHECK = 1 > > Also, I saw that you saw that Thomas discovered/pointed out that a bunch of TAP > tests aren't being run by CI. I think vcregress should have an "alltap" > target that runs everything like glob("**/t/"). CI would use that instead of > the existing ssl, auth, subscription, recovery, and bin targets. The buildfarm > could switch to that after it's been published. > > https://www.postgresql.org/message-id/20220114234947.av4kkhuj7netsy5r%40alap3.anarazel.de The buildfarm is moving in the opposite direction, to disaggregate steps. There are several reasons for that, including that it makes for less log output that you need to churn through to find out what's gone wrong in a particular case, and that it makes disabling certain test sets via the buildfarm client's `skip-steps' feature possible. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Hi, On 2022-01-17 10:25:12 -0500, Andrew Dunstan wrote: > The buildfarm is moving in the opposite direction, to disaggregate > steps. I'm a bit confused as to where you want changes to vcregress.pl going. Upthread you argued against adding more granular targets to vcregress. But this seems to be arguing against that? > There are several reasons for that, including that it makes for > less log output that you need to churn through o find out what's gone > wrong in a particular case, and that it makes disabling certain test > sets via the buildfarm client's `skip-steps' feature possible. FWIW, to me this shouldn't require a lot of separate manual test invocations. And continuing to have lots of granular test invocations from the buildfarm client is *bad*, because it requires constantly syncing up the set of test targets. It also makes the buildfarm far slower than necessary, because it runs a lot of stuff serially that it could run in parallel. This is particularly a problem for things like valgrind runs, where individual tests are quite slow - but just throwing more CPUs at it would help a lot. We should set things up so that: a) The output of each test can easily be associated with the corresponding set of log files. b) The list of tests run can be determined generically by looking at the filesystems c) For each test run, it's easy to see whether it failed or succeeded As part of the meson stuff (which has its own test runner), I managed to get a bit towards this by changing the log output hierarchy so that each test gets its own directory for log files (regress_log_*, initdb.log, postmaster.log, regression.diffs, server log files). What's missing is a test.{failed,succeeded} marker or such, to make it easy to report the log files for just failed tasks. Greetings, Andres Freund
On Mon, Jan 17, 2022 at 1:19 PM Andres Freund <andres@anarazel.de> wrote: > FWIW, to me this shouldn't require a lot of separate manual test > invocations. And continuing to have lots of granular test invocations from the > buildfarm client is *bad*, because it requires constantly syncing up the set > of test targets. I have a lot of sympathy with Andrew here, actually. If you just do 'make check-world' and assume that will cover everything, you get one giant output file. That is not great at all. People who are looking through buildfarm results do not want to have to look through giant logfiles hunting for the failure; they want to look at the stuff that's just directly relevant to the failure they saw. The current BF is actually pretty bad at this. You can click on various things on a buildfarm results page, but it's not very clear where those links are taking you, and the pages at least in my browser (normally Chrome) render so slowly as to make the whole thing almost unusable. I'd like to have a thing where the buildfarm shows a list of tests in red or green and I can click links next to each test to see the various logs that test produced. That's really hard to accomplish if you just run all the tests with one invocation - any technique you use to find the boundaries between one test's output and the next will prove to be unreliable. But having said that, I also agree that it sucks to have to keep updating the BF client every time we want to do any kind of test-related changes in the main source tree. One way around that would be to put a file in the main source tree that the build farm client can read to know what to do. Another would be to have the BF client download the latest list of steps from somewhere instead of having it in the source code, so that it can be updated without everyone needing to update their machine. There might well be other approaches that are even better. 
But the "ask Andrew to adjust the BF client and then have everybody install the new version" approach upon which we have been relying up until now is not terribly scalable. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > I have a lot of sympathy with Andrew here, actually. If you just do > 'make check-world' and assume that will cover everything, you get one > giant output file. That is not great at all. Yeah. I agree with Andrew that we want output that is more modular, not less so. But we do need to find a way to have less knowledge hard-wired in the buildfarm client script. > But having said that, I also agree that it sucks to have to keep > updating the BF client every time we want to do any kind of > test-related changes in the main source tree. One way around that > would be to put a file in the main source tree that the build farm > client can read to know what to do. Another would be to have the BF > client download the latest list of steps from somewhere instead of > having it in the source code, so that it can be updated without > everyone needing to update their machine. The obvious place for "somewhere" is "the main source tree", so I doubt your second suggestion is better than your first. But your first does seem like a plausible way to proceed. Another way to think of it, maybe, is to migrate chunks of the buildfarm client script itself into the source tree. I'd rather that developers not need to become experts on the buildfarm client to make adjustments to the test process --- but I suspect that a simple script like "run make check in these directories" is not going to be flexible enough for everything. regards, tom lane
Hi, On 2022-01-17 13:50:04 -0500, Robert Haas wrote: > On Mon, Jan 17, 2022 at 1:19 PM Andres Freund <andres@anarazel.de> wrote: > > FWIW, to me this shouldn't require a lot of separate manual test > > invocations. And continuing to have lots of granular test invocations from the > > buildfarm client is *bad*, because it requires constantly syncing up the set > > of test targets. > > I have a lot of sympathy with Andrew here, actually. If you just do > 'make check-world' and assume that will cover everything, you get one > giant output file. I very much agree with check-world output being near unusable. > That's really hard to accomplish if you just run all the tests with one > invocation - any technique you use to find the boundaries between one test's > output and the next will prove to be unreliable. I think it's not actually that hard, with something like I described in the email upthread, with each test going into a prescribed location, and the on-disk status being inspectable in an automated way. check-world could invoke a command to summarize the tests at the end in a .NOTPARALLEL, to make the local case easier. pg_regress/isolation style tests have the "summary" output in regression.out, but we remove it on success. Tap tests have their output in the regress_log_* files, however these are far more verbose than the output from running the tests directly. I wonder if we should keep regression.out and also keep a copy of the "shorter" tap test output in a file? > But having said that, I also agree that it sucks to have to keep > updating the BF client every time we want to do any kind of > test-related changes in the main source tree. It also sucks locally. I *hate* having to dig through a long check-world output to find the first failure. This subthread is about the windows tests specifically, where it's even worse - there's no way to run all tests. Nor a list of test targets that's even halfway complete :/. 
One just has to know that one has to invoke a myriad of vcregress.pl taptest path/to/directory Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > I think it's not actually that hard, with something like I described in the > email upthread, with each tests going into a prescribed location, and the > on-disk status being inspectable in an automated way. check-world could invoke > a command to summarize the tests at the end in a .NOTPARALLEL, to make the > local case easier. That sounds a bit, um, make-centric. At this point it seems to me we ought to be thinking about how it'd work under meson. > This subthread is about the windows tests specifically, where it's even worse > - there's no way to run all tests. That's precisely because the windows build doesn't use make. We shouldn't be thinking about inventing two separate dead-end solutions to this problem. regards, tom lane
Hi, On 2022-01-17 14:30:53 -0500, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > I think it's not actually that hard, with something like I described in the > > email upthread, with each tests going into a prescribed location, and the > > on-disk status being inspectable in an automated way. check-world could invoke > > a command to summarize the tests at the end in a .NOTPARALLEL, to make the > > local case easier. > > That sounds a bit, um, make-centric. At this point it seems to me > we ought to be thinking about how it'd work under meson. Some of this is a lot easier with meson. It has a builtin test runner, which understands tap (thereby not requiring prove anymore). Those tests can be listed (meson test --list). Depending on the option this results in a list of all tests with just the "topline" name of passing tests and error output from failing tests, or all output all the time or ... At the end it prints a summary of test counts and a list of failed tests. Here's an example (the leading timestamps are from CI, not meson), on windows: https://api.cirrus-ci.com/v1/task/6009638771490816/logs/check.log The test naming isn't necessarily great, but that's my fault. Running all the tests with meson takes a good bit less time than running most, but far from all, tests using vcregress.pl: https://cirrus-ci.com/build/4611852939296768 meson test makes it far easier to spot which tests failed, it's consistent across operating systems, allows skipping individual tests, etc. However: It doesn't address the log collection issue in itself. For that we'd still need to collect them in a way that's easier to associate with individual tests. In the meson branch I made it so that each test (including individual tap ones) has its own log directory, which makes it easier to select all the logs for a failing test etc. > > This subthread is about the windows tests specifically, where it's even worse > > - there's no way to run all tests. 
> > That's precisely because the windows build doesn't use make. > We shouldn't be thinking about inventing two separate dead-end > solutions to this problem. Agreed. I think some improvements, e.g. around making the logs easier to associate with an individual test, are orthogonal to the buildsystem issue. I think it might still be worth adding a stopgap way of running all tap tests on windows though. Having a vcregress.pl function to find all directories with t/ and run the tests there, shouldn't be a lot of code... Greetings, Andres Freund
On 1/17/22 13:19, Andres Freund wrote: > Hi, > > On 2022-01-17 10:25:12 -0500, Andrew Dunstan wrote: >> The buildfarm is moving in the opposite direction, to disaggregate >> steps. > I'm a bit confused as to where you want changes to vcregress.pl > going. Upthread you argued against adding more granular targets to > vcregress. But this seems to be arguing against that? OK, let me clear that up. Yes I argued upthread for a 'make check-world' equivalent, because it's useful for developers on Windows. But I don't really want to use it in the buildfarm client, where I prefer to run fine-grained tests. > > >> There are several reasons for that, including that it makes for >> less log output that you need to churn through o find out what's gone >> wrong in a particular case, and that it makes disabling certain test >> sets via the buildfarm client's `skip-steps' feature possible. > FWIW, to me this shouldn't require a lot of separate manual test > invocations. And continuing to have lots of granular test invocations from the > buildfarm client is *bad*, because it requires constantly syncing up the set > of test targets. Well, the logic we use for TAP tests is we run them for the following if they have a t subdirectory, subject to configuration settings like PG_TEST_EXTRA: src/test/bin/* contrib/* src/test/modules/* src/test/* As long as any new TAP test gets placed in such a location nothing new is required in the buildfarm client. > > It also makes the buildfarm far slower than necessary, because it runs a lot > of stuff serially that it could run in parallel. That's a legitimate concern. I have made some strides towards gross parallelism in the buildfarm by providing for different branches to be run in parallel. This has proven to be fairly successful (i.e. I have not run into any complaints, and several of my own animals utilize it). I can certainly look at doing something of the sort for an individual branch run. 
> This is particularly a > problem for things like valgrind runs, where individual tests are quite slow - > but just throwing more CPUs at it would help a lot. When I ran a valgrind animal, `make check` was horribly slow, and it's already parallelized. But it was on a VM and I forget how many CPUs the VM had. > > We should set things up so that: > > a) The output of each test can easily be associated with the corresponding set > of log files. > b) The list of tests run can be determined generically by looking at the > filesystems > c) For each test run, it's easy to see whether it failed or succeeded > > As part of the meson stuff (which has its own test runner), I managed to get a > bit towards this by changing the log output hierarchy so that each test gets > its own directory for log files (regress_log_*, initdb.log, postmaster.log, > regression.diffs, server log files). What's missing is a > test.{failed,succeeded} marker or such, to make it easy to report the log > files for just failed tasks. One thing I have been working on is a way to split the log output of an individual buildfarm step, hitherto just a text blob, so that you can easily navigate to say regress_log_0003-foo-step.log without having to page through myriads of stuff. It's been on the back burner but I need to prioritize it, maybe when the current CF is done. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
On 17.01.22 22:13, Andrew Dunstan wrote: > Well, the logic we use for TAP tests is we run them for the following if > they have a t subdirectory, subject to configuration settings like > PG_TEST_EXTRA: > > src/test/bin/* > contrib/* > src/test/modules/* > src/test/* > > As long as any new TAP test gets place in such a location nothing new is > required in the buildfarm client. But if I wanted to add TAP tests to libpq, then I'd still be stuck. Why not change the above list to "anywhere"?
On 1/18/22 08:06, Peter Eisentraut wrote: > On 17.01.22 22:13, Andrew Dunstan wrote: >> Well, the logic we use for TAP tests is we run them for the following if >> they have a t subdirectory, subject to configuration settings like >> PG_TEST_EXTRA: >> >> src/test/bin/* >> contrib/* >> src/test/modules/* >> src/test/* >> >> As long as any new TAP test gets place in such a location nothing new is >> required in the buildfarm client. > > But if I wanted to add TAP tests to libpq, then I'd still be stuck. > Why not change the above list to "anywhere"? Sure, very doable, although we will still need some special logic for src/test - there are security implications of running the ssl, ldap and kerberos tests by default. See its Makefile. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Hi, On 2022-01-18 11:20:08 -0500, Andrew Dunstan wrote: > Sure, very doable, although we will still need some special logic for > src/test - there are security implication from running the ssl, ldap and > kerberos tests by default. See its Makefile. ISTM that we should move the PG_TEST_EXTRA handling into the tests. Then we'd not need to duplicate them in the make / msvc world and we'd see them as skipped tests when not enabled. Greetings, Andres Freund
On 1/18/22 12:44, Andres Freund wrote: > Hi, > > On 2022-01-18 11:20:08 -0500, Andrew Dunstan wrote: >> Sure, very doable, although we will still need some special logic for >> src/test - there are security implication from running the ssl, ldap and >> kerberos tests by default. See its Makefile. > ISTM that we should move the PG_TEST_EXTRA handling into the tests. Then we'd > not need to duplicate them in the make / msvc world and we'd see them as > skipped tests when not enabled. > Yeah, good idea. Especially if we can backpatch that. The buildfarm client would also get simpler, so it would be doubleplusgood. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
On Mon, Jan 17, 2022 at 12:16:19PM -0800, Andres Freund wrote: > I think it might still be worth adding stopgap way of running all tap tests on > windows though. Having a vcregress.pl function to find all directories with t/ > and run the tests there, shouldn't be a lot of code... I started doing that, however it makes CI/windows even slower. I think it'll be necessary to run prove with all the tap tests to parallelize them, rather than looping around directories, many of which have only a single file, and are run serially. https://github.com/justinpryzby/postgres/commits/citest-cirrus This has the link to a recent cirrus result if you'd want to look. I suppose I should start a new thread. There's a bunch of other stuff for cirrus in there (and bits and pieces of other branches). . cirrus: spell ccache_maxsize . cirrus: avoid unnecessary double star ** . vcregress/ci: test modules/contrib with NO_INSTALLCHECK=1 . vcregress: style . wip: vcsregress: add alltaptests . wip: run upgrade tests with data-checksums . pg_regress --initdb-opts . wip: pg_upgrade: show list of files copied only in vebose mode . wip: cirrus: include hints how to install OS packages.. . wip: cirrus: code coverage . cirrus: upload html docs as artifacts . wip: cirrus/windows: save compiled build as an artifact
Hi, On 2022-01-18 15:08:47 -0600, Justin Pryzby wrote: > On Mon, Jan 17, 2022 at 12:16:19PM -0800, Andres Freund wrote: > > I think it might still be worth adding stopgap way of running all tap tests on > > windows though. Having a vcregress.pl function to find all directories with t/ > > and run the tests there, shouldn't be a lot of code... > > I started doing that, however it makes CI/windows even slower. To be expected... Perhaps the caching approach I just posted in [1] would buy most of it back though... > I think it'll be necessary to run prove with all the tap tests to > parallelize them, rather than looping around directories, many of which have > only a single file, and are run serially. That's unfortunately not trivially possible. Quite a few tests currently rely on being called in a specific directory. We should fix this, but it's not a trivial amount of work. Greetings, Andres Freund [1] https://postgr.es/m/20220119010034.javla5sakeh2a4fa%40alap3.anarazel.de
I just found one thing making check-world slower than it ought to be: src/test/recovery/t/008_fsm_truncation.pl does $node_primary->append_conf( 'postgresql.conf', qq{ fsync = on wal_log_hints = on max_prepared_transactions = 5 autovacuum = off }); There is no reason for this script to be overriding Cluster.pm's fsync = off setting. This actually causes parallel check-world to fail altogether on florican's host, because the initial fsync of the recovered primary takes more than 3 minutes when there's conflicting I/O traffic, causing pg_ctl to time out. This appears to go back to 917dc7d23 of 2016, so I think it just predates our recognition that we should disable fsync in routine tests. regards, tom lane
Hi, On 2022-01-18 21:50:07 -0500, Tom Lane wrote: > I just found one thing making check-world slower than it ought to be: > src/test/recovery/t/008_fsm_truncation.pl does > > $node_primary->append_conf( > 'postgresql.conf', qq{ > fsync = on > wal_log_hints = on > max_prepared_transactions = 5 > autovacuum = off > }); > > There is no reason for this script to be overriding Cluster.pm's > fsync = off setting. > > This appears to go back to 917dc7d23 of 2016, so I think it just > predates our recognition that we should disable fsync in routine > tests. Yea, I noticed this too. I was wondering if there's a conceivable reason to actually want fsyncs, but I couldn't come up with one. On systems where IO isn't overloaded, the main problem with this test is elsewhere: It waits multiple times for VACUUMs that are blocked truncating the table. Which these days takes 5 seconds. Thus the test takes quite a while. To me VACUUM_TRUNCATE_LOCK_TIMEOUT = 5s seems awfully long. On a system with a lot of tables that's much more than vacuum will take. So this can easily lead to using up all autovacuum workers... > This actually causes parallel check-world to fail altogether on florican's > host, because the initial fsync of the recovered primary takes more than 3 > minutes when there's conflicting I/O traffic, causing pg_ctl to time out. Ugh. I noticed a few other sources of "unnecessary" fsyncs. The most frequent being the durable_rename() of backup_manifest in pg_basebackup.c. Manifests are surprisingly large, 135k for a freshly initdb'd cluster. There's an fsync in walmethods.c:tar_close() that sounds intentional, but I don't really understand what the comment means: /* Always fsync on close, so the padding gets fsynced */ if (tar_sync(f) < 0) Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2022-01-18 21:50:07 -0500, Tom Lane wrote: >> There is no reason for this script to be overriding Cluster.pm's >> fsync = off setting. >> This appears to go back to 917dc7d23 of 2016, so I think it just >> predates our recognition that we should disable fsync in routine >> tests. > Yea, I noticed this too. I was wondering if there's a conceivable reason to > actually want fsyncs, but I couldn't come up with one. On the one hand, it feels a little wrong if our test suites never reach our fsync calls at all. On the other hand, it's not clear what is meaningful about testing fsync when your test scenario doesn't include a plug pull. I'd be okay with having some exercise of the fsync code paths in a test that is (a) designated for the purpose and (b) arranged to not take an excessive amount of time, even under heavy load. 008_fsm_truncation.pl is neither of those things. It seems entirely random that it has fsync = on when we don't test that elsewhere. regards, tom lane
Andres Freund <andres@anarazel.de> writes: > On 2022-01-18 21:50:07 -0500, Tom Lane wrote: >> This actually causes parallel check-world to fail altogether on florican's >> host, because the initial fsync of the recovered primary takes more than 3 >> minutes when there's conflicting I/O traffic, causing pg_ctl to time out. > Ugh. I misspoke there: it's the standby that is performing an fsync'd checkpoint and timing out, during the test's promote-the-standby step. This test attempt revealed another problem too: the standby never shut down, and thus the calling "make" never quit, until I intervened manually. I'm not sure why. I see that Cluster::promote uses system_or_bail() to run "pg_ctl promote" ... could it be that BAIL_OUT causes the normal script END hooks to not get run? But it seems like we'd have noticed that long ago. regards, tom lane
I wrote:
> This test attempt revealed another problem too: the standby never
> shut down, and thus the calling "make" never quit, until I intervened
> manually. I'm not sure why. I see that Cluster::promote uses
> system_or_bail() to run "pg_ctl promote" ... could it be that
> BAIL_OUT causes the normal script END hooks to not get run?
> But it seems like we'd have noticed that long ago.

I failed to reproduce any failure in the promote step, and I now think I was mistaken and it happened during the standby's initial start. I can reproduce that very easily by setting PGCTLTIMEOUT to a second or two; with fsync enabled, it takes the standby more than that to reach a consistent state.

And the cause of that is obvious: Cluster::start thinks that if "pg_ctl start" failed, there couldn't possibly be a postmaster running. So it doesn't bother to update self->_pid, and then the END hook thinks there is nothing to do.

Now, leaving an idle postmaster hanging around isn't a mortal sin, since it'll go away by itself shortly after the next cycle of testing does an "rm -rf" on its data directory. But it's ugly, and conceivably it could cause problems for later testing on machines with limited shmem or semaphore space.

The attached simple fix gets rid of this problem. Any objections?

regards, tom lane

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 7af0f8db13..fd0738d12d 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -845,6 +845,11 @@ sub start
 	{
 		print "# pg_ctl start failed; logfile:\n";
 		print PostgreSQL::Test::Utils::slurp_file($self->logfile);
+
+		# pg_ctl could have timed out, so check to see if there's a pid file;
+		# without this, we fail to shut down the new postmaster later.
+		$self->_update_pid(-1);
+
 		BAIL_OUT("pg_ctl start failed") unless $params{fail_ok};
 		return 0;
 	}
@@ -1123,7 +1128,10 @@ archive_command = '$copy_command'
 	return;
 }
 
-# Internal method
+# Internal method to update $self->{_pid}
+# $is_running = 1: pid file should be there
+# $is_running = 0: pid file should NOT be there
+# $is_running = -1: we aren't sure
 sub _update_pid
 {
 	my ($self, $is_running) = @_;
@@ -1138,7 +1146,7 @@ sub _update_pid
 		close $pidfile;
 
 		# If we found a pidfile when there shouldn't be one, complain.
-		BAIL_OUT("postmaster.pid unexpectedly present") unless $is_running;
+		BAIL_OUT("postmaster.pid unexpectedly present") if $is_running == 0;
 		return;
 	}
 
@@ -1146,7 +1154,7 @@ sub _update_pid
 	print "# No postmaster PID for node \"$name\"\n";
 
 	# Complain if we expected to find a pidfile.
-	BAIL_OUT("postmaster.pid unexpectedly not present") if $is_running;
+	BAIL_OUT("postmaster.pid unexpectedly not present") if $is_running == 1;
 	return;
 }
Hi,

On 2022-01-19 15:05:44 -0500, Tom Lane wrote:
> And the cause of that is obvious: Cluster::start thinks that if "pg_ctl
> start" failed, there couldn't possibly be a postmaster running. So it
> doesn't bother to update self->_pid, and then the END hook thinks there is
> nothing to do.

Ah, right.

I'm doubtful that it's good that we use BAIL_OUT so liberally, because it prevents further tests from running (i.e. if 001 bails, 002+ doesn't run), which doesn't generally seem like the right thing to do after a single test fails. But that's really independent of the fix you make here.

> The attached simple fix gets rid of this problem. Any objections?

Nope, sounds like a plan.

> diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
> index 7af0f8db13..fd0738d12d 100644
> --- a/src/test/perl/PostgreSQL/Test/Cluster.pm
> +++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
> @@ -845,6 +845,11 @@ sub start
>  	{
>  		print "# pg_ctl start failed; logfile:\n";
>  		print PostgreSQL::Test::Utils::slurp_file($self->logfile);
> +
> +		# pg_ctl could have timed out, so check to see if there's a pid file;
> +		# without this, we fail to shut down the new postmaster later.
> +		$self->_update_pid(-1);

I'd maybe do s/later/in the END block/ or such, so that one knows where to look. Took me a minute to find it again.

Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > I'm doubtful that it's good that we use BAIL_OUT so liberally, because it > prevents further tests from running (i.e. if 001 bails, 002+ doesn't run), > which doesn't generally seem like the right thing to do after a single test > fails. But that's really independent of the fix you make here. Agreed, that's a separate question. It does seem like "stop this script and move to the next one" would be better behavior, though. > I'd maybe do s/later/in the END block/ or such, so that one knows where to > look. Took me a minute to find it again. OK. regards, tom lane
pg_basebackup fsyncs some files despite --no-sync (was: Adding CI to our tree)
From: Andres Freund
Hi,

On 2022-01-18 20:16:46 -0800, Andres Freund wrote:
> I noticed a few other sources of "unnecessary" fsyncs. The most frequent
> being the durable_rename() of backup_manifest in pg_basebackup.c. Manifests are
> surprisingly large, 135k for a freshly initdb'd cluster.

Robert, I assume the fsync for manifests isn't ignoring --no-sync for a particular reason?

The attached patch adds no-sync handling to the manifest rename, as well as one case in the directory wal method. It's a bit painful that we have to have code like

if (dir_data->sync)
	r = durable_rename(tmppath, tmppath2);
else
{
	if (rename(tmppath, tmppath2) != 0)
	{
		pg_log_error("could not rename file \"%s\" to \"%s\": %m",
					 tmppath, tmppath2);
		r = -1;
	}
}

It seems like it'd be better to set it up so that durable_rename() could decide internally whether to fsync, or have a wrapper around durable_rename()?

> There's an fsync in walmethods.c:tar_close() that sounds intentional, but I
> don't really understand what the comment:
>
> /* Always fsync on close, so the padding gets fsynced */
> if (tar_sync(f) < 0)

tar_sync() actually checks for tar_data->sync, so it doesn't do an fsync. Arguably the comment is a bit confusing, but ...

Greetings, Andres Freund
Attachments
Re: pg_basebackup fsyncs some files despite --no-sync (was: Adding CI to our tree)
From: Andres Freund
Hi, On 2022-01-21 12:00:57 -0800, Andres Freund wrote: > The attached patch adds no-sync handling to the manifest rename, as well as > one case in the directory wal method. Pushed that. Greetings, Andres Freund
On Tue, Jan 18, 2022 at 03:08:47PM -0600, Justin Pryzby wrote:
> On Mon, Jan 17, 2022 at 12:16:19PM -0800, Andres Freund wrote:
> > I think it might still be worth adding stopgap way of running all tap tests on
> > windows though. Having a vcregress.pl function to find all directories with t/
> > and run the tests there, shouldn't be a lot of code...
>
> I started doing that, however it makes CI/windows even slower. I think it'll
> be necessary to run prove with all the tap tests to parallelize them, rather
> than looping around directories, many of which have only a single file, and are
> run serially.

FYI: I've rebased these against your cirrus/windows changes. A recent cirrus result is here; this has other stuff in the branch, so you can see the artifacts with HTML docs and code coverage.

https://github.com/justinpryzby/postgres/runs/5046465342

95793675633 cirrus: spell ccache_maxsize
e5286ede1b4 cirrus: avoid unnecessary double star **
03f6de4643e cirrus: include hints how to install OS packages..
39cc2130e76 cirrus/linux: script test.log..
398b7342349 cirrus/macos: uname -a and export at end of sysinfo
9d0f03d3450 wip: cirrus: code coverage
bff64e8b998 cirrus: upload html docs as artifacts
80f52c3b172 wip: only upload changed docs
7f3b50f1847 pg_upgrade: show list of files copied only in verbose mode
ba229165fe1 wip: run pg_upgrade tests with data-checksums
97d7b039b9b vcregress/ci: test modules/contrib with NO_INSTALLCHECK=1
654b6375401 wip: vcsregress: add alltaptests

BTW, I think the double star added in f3feff825 probably doesn't need to be double, either:

path: "crashlog-**.txt"

-- Justin
Hi,

On 2022-02-02 21:58:28 -0600, Justin Pryzby wrote:
> FYI: I've rebased these against your cirrus/windows changes.

Did you put them on a dedicated branch, or only intermixed with other changes?

> A recent cirrus result is here; this has other stuff in the branch, so you can
> see the artifacts with HTML docs and code coverage.

I'm a bit worried about the increased storage and runtime overhead due to the docs changes. We probably can make it a good bit cheaper though.

> https://github.com/justinpryzby/postgres/runs/5046465342
> 95793675633 cirrus: spell ccache_maxsize

Yep, will apply with a bunch of your other changes, if you answer a question or two...

> e5286ede1b4 cirrus: avoid unnecessary double star **

Can't get excited about this, but whatever.

What I am excited about is that some of your other changes showed that we don't need separate *_artifacts for separate directories anymore. That used to be the case, but an array of paths is now supported. Putting log, diffs, and regress_log in one directory will be considerably more convenient...

> 03f6de4643e cirrus: include hints how to install OS packages..

What's the idea behind

#echo 'deb http://deb.debian.org/debian bullseye main' >>/etc/apt/sources.list

That's already in sources.list, so I'm not sure what this shows?

I think it may be a bit cleaner to have the "install additional packages" "template step" be just that, and not merge other contents into it?

> 39cc2130e76 cirrus/linux: script test.log..

I still don't understand what this commit is trying to achieve.

> 398b7342349 cirrus/macos: uname -a and export at end of sysinfo

Shrug.

> 9d0f03d3450 wip: cirrus: code coverage

I don't think it's good to just unconditionally reference the master branch here. It'll do bogus stuff once 15 is branched off. It works for cfbot, but not other uses. Perhaps we could have a cfbot special case (by checking for a repository variable indicating the base branch) and show the incremental changes to that branch?
Or we could just check which branch has the smallest distance in #commits? If cfbot weren't a thing, I'd just make a code coverage / docs generation a manual task (startable by a click in the UI). But that requires permission on the repository... Hm. I wonder if cfbot could maintain the code not as branches as such, but as pull requests? Those include information about what the base branch is ;) Is looking at the .c files in the change really a reliable predictor of where code coverage changes? I'm doubtful. Consider stuff like removing the last user of some infrastructure or such. Or adding the first. > bff64e8b998 cirrus: upload html docs as artifacts > 80f52c3b172 wip: only upload changed docs Similar to the above. > 7f3b50f1847 pg_upgrade: show list of files copied only in vebose mode I think that should be discussed on a different thread. > 97d7b039b9b vcregress/ci: test modules/contrib with NO_INSTALLCHECK=1 Probably also worth breaking out into a new thread. > 654b6375401 wip: vcsregress: add alltaptests I assume this doesn't yet work to a meaningful degree? Last time I checked there were quite a few tests that needed to be invoked in a specific directory. In the meson branch I worked around that by having a wrapper around the invocation of individual tap tests that changes CWD. Greetings, Andres Freund
On Thu, Feb 03, 2022 at 11:57:18AM -0800, Andres Freund wrote: > On 2022-02-02 21:58:28 -0600, Justin Pryzby wrote: > > FYI: I've rebased these against your cirrus/windows changes. > > Did you put then on a dedicated branch, or only intermixed with other changes? Yes it's intermixed (because I have too many branches, and because in this case it's useful to show the doc builds and coverage artifacts). > > A recent cirrus result is here; this has other stuff in the branch, so you can > > see the artifacts with HTML docs and code coverage. > > I'm a bit worried about the increased storage and runtime overhead due to the > docs changes. We probably can make it a good bit cheaper though. If you mean overhead of additional disk operations, it shouldn't be an issue, since the git clone uses shared references (not even hardlinks). If you meant storage capacity, I'm only uploading the *changed* docs as artifacts. The motivation being that it's a lot more convenient to look though a single .html, and not hundreds. > What's the idea behind > #echo 'deb http://deb.debian.org/debian bullseye main' >>/etc/apt/sources.list > That's already in sources.list, so I'm not sure what this shows? At one point I thought I needed it - maybe all I needed was "apt-get update".. > > 9d0f03d3450 wip: cirrus: code coverage > > I don't think it's good to just unconditionally reference the master branch > here. It'll do bogus stuff once 15 is branched off. It works for cfbot, but > not other uses. That's only used for filtering changed files. It uses git diff --cherry-pick postgres/master..., which would *try* to avoid showing anything which is also in master. > Is looking at the .c files in the change really a reliable predictor of where > code coverage changes? I'm doubtful. Consider stuff like removing the last > user of some infrastructure or such. Or adding the first. 
You're right that it isn't particularly accurate, but it's a step forward if lots of patches use it to check/improve coverage of new code.

In addition to the HTML generated for changed .c files, it's possible to create a coverage.gcov output for everything, which could be retrieved to generate full HTML locally. It's ~8MB (or 2MB after gzip).

> > bff64e8b998 cirrus: upload html docs as artifacts
> > 80f52c3b172 wip: only upload changed docs
>
> Similar to the above.

Do you mean it's not reliable? This is looking at which .html have changed (not sgml). Surely that's reliable?

> > 654b6375401 wip: vcsregress: add alltaptests
>
> I assume this doesn't yet work to a meaningful degree? Last time I checked
> there were quite a few tests that needed to be invoked in a specific
> directory.

It works - tap_check() does chdir(). But it's slow, and maybe should try to implement a check-world target.

It currently fails in 027_stream_regress.pl, although I keep hoping that it had been fixed... https://cirrus-ci.com/task/6116235950686208

(BTW, I just realized that that commit should also remove the recoverycheck call.)

-- Justin
Hi,

On February 3, 2022 9:04:04 PM PST, Justin Pryzby <pryzby@telsasoft.com> wrote:
>On Thu, Feb 03, 2022 at 11:57:18AM -0800, Andres Freund wrote:
>> On 2022-02-02 21:58:28 -0600, Justin Pryzby wrote:
>> > FYI: I've rebased these against your cirrus/windows changes.
>>
>> What's the idea behind
>> #echo 'deb http://deb.debian.org/debian bullseye main' >>/etc/apt/sources.list
>> That's already in sources.list, so I'm not sure what this shows?
>
>At one point I thought I needed it - maybe all I needed was "apt-get update"..

Yes, that's needed. There's no old pre-fetched package list, because it'd be outdated anyway, and would only work *sometimes*, for some packages... They're also not small (image size influences job start speed heavily).

>> > 9d0f03d3450 wip: cirrus: code coverage
>>
>> I don't think it's good to just unconditionally reference the master branch
>> here. It'll do bogus stuff once 15 is branched off. It works for cfbot, but
>> not other uses.
>
>That's only used for filtering changed files. It uses git diff --cherry-pick
>postgres/master..., which would *try* to avoid showing anything which is also
>in master.

The point is that master is not a relevant point of comparison when a commit in the 14 branch is tested.

Regards,

Andres

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Hi, On 2022-02-03 23:04:04 -0600, Justin Pryzby wrote: > > I assume this doesn't yet work to a meaningful degree? Last time I checked > > there were quite a few tests that needed to be invoked in a specific > > directory. > > It works - tap_check() does chdir(). Ah, I thought you'd implemented a target that does it all in one prove invocation... > It currently fails in 027_stream_regress.pl, although I keep hoping that it > had been fixed... That's likely because you're not setting REGRESS_OUTPUTDIR like src/test/recovery/Makefile and recoverycheck() are doing. Greetings, Andres Freund
On Tue, Jan 18, 2022 at 05:16:26PM -0800, Andres Freund wrote: > On 2022-01-18 15:08:47 -0600, Justin Pryzby wrote: > > On Mon, Jan 17, 2022 at 12:16:19PM -0800, Andres Freund wrote: > > > I think it might still be worth adding stopgap way of running all tap tests on > > > windows though. Having a vcregress.pl function to find all directories with t/ > > > and run the tests there, shouldn't be a lot of code... > > > > I started doing that, however it makes CI/windows even slower. ... > > I think it'll be necessary to run prove with all the tap tests to > > parallelize them, rather than looping around directories, many of which have > > only a single file, and are run serially. > > That's unfortunately not trivially possible. Quite a few tests currently rely > on being called in a specific directory. We should fix this, but it's not a > trivial amount of work. On Sat, Feb 05, 2022 at 07:23:39PM -0800, Andres Freund wrote: > On 2022-02-03 23:04:04 -0600, Justin Pryzby wrote: > > > I assume this doesn't yet work to a meaningful degree? Last time I checked > > > there were quite a few tests that needed to be invoked in a specific > > > directory. > > > > It works - tap_check() does chdir(). > > Ah, I thought you'd implemented a target that does it all in one prove > invocation... I had some success with that, but it doesn't seem to be significantly faster - it looks a lot like the tests are not actually running in parallel. I tried some variations like passing the list of dirs vs the list of files, and --jobs=9 vs -j9, without success. 
https://cirrus-ci.com/task/5580584675180544
https://github.com/justinpryzby/postgres/commit/a865adc5b8c

fc7b3ea8bce vcregress/ci: test modules/contrib with NO_INSTALLCHECK=1
03adb043d16 wip: vcsregress: add alltaptests
63bf0796ffd wip: vcregress: run alltaptests in parallel
9dc327f6b30 f!wip: vcregress: run alltaptests in a single prove invocation
a865adc5b8c tmp: run tap tests first

> > It currently fails in 027_stream_regress.pl, although I keep hoping that it
> > had been fixed...
>
> That's likely because you're not setting REGRESS_OUTPUTDIR like
> src/test/recovery/Makefile and recoverycheck() are doing.

Yes, thanks.

-- Justin
Hi, On 2022-02-12 16:06:40 -0600, Justin Pryzby wrote: > I had some success with that, but it doesn't seem to be significantly faster - > it looks a lot like the tests are not actually running in parallel. Note that prove unfortunately serializes the test output to be in the order it started them, even when tests run concurrently. Extremely unhelpful, but ... Isn't this kind of a good test time? I thought earlier your alltaptests target took a good bit longer? One nice bit is that the output is a *lot* easier to read. You could try improving the total time by having prove remember slow tests and use that file to run the slowest tests first next time. --state slow,save or such I believe. Of course we'd have to save that state file... To verify that tests actually run concurrently you could emit a few notes. Looks like those are output immediately... Greetings, Andres Freund
Hi, On 2022-02-03 11:57:18 -0800, Andres Freund wrote: > > 95793675633 cirrus: spell ccache_maxsize > > Yep, will apply with a bunch of your other changes, if you answer a question > or two... Pushed. > > e5286ede1b4 cirrus: avoid unnecessary double star ** > > Can't get excited about this, but whatever. > > What I am excited about is that some of your other changes showed that we > don't need separate *_artifacts for separate directories anymore. That used to > be the case, but an array of paths is now supported. Putting log, diffs, and > regress_log in one directory will be considerably more convenient... pushed together. > > 398b7342349 cirrus/macos: uname -a and export at end of sysinfo > > Shrug. Pushed. Greetings, Andres Freund
Hi,

Alvaro, adding you because of the "branch" discussion in the MERGE thread.

On 2022-02-03 23:04:04 -0600, Justin Pryzby wrote:
> > I'm a bit worried about the increased storage and runtime overhead due to the
> > docs changes. We probably can make it a good bit cheaper though.
>
> If you mean overhead of additional disk operations, it shouldn't be an issue,
> since the git clone uses shared references (not even hardlinks).

I was concerned about those and just the increased runtime of the script. Our sources are 130MB, leaving the shared .git aside. But maybe it's just fine.

We probably can just get rid of the separate clone and configure run though? Build the docs, copy the output, do a git co parent docs/, build again?

What was the reason behind moving the docs stuff from the compiler warnings task to linux? Not that either fits very well... I think it might be worth moving the docs stuff into its own task, using a 1 CPU container (docs build isn't parallel anyway). Think that'll be easier to see on the cfbot page. I think it's also good to run it independent of the linux task succeeding - a docs failure seems like a separate enough issue.

> > > 9d0f03d3450 wip: cirrus: code coverage
> >
> > I don't think it's good to just unconditionally reference the master branch
> > here. It'll do bogus stuff once 15 is branched off. It works for cfbot, but
> > not other uses.
>
> That's only used for filtering changed files. It uses git diff --cherry-pick
> postgres/master..., which would *try* to avoid showing anything which is also
> in master.

You commented in another email on this:

On 2022-02-11 12:51:50 -0600, Justin Pryzby wrote:
> Because I put your patch on top of some other branch with the CI coverage (and
> other stuff).
>
> It tries to only show files changed by the branch being checked:
> https://github.com/justinpryzby/postgres/commit/d668142040031915
>
> But it has to figure out where the branch "starts".
> Which I did by looking at
> "git diff --cherry-pick origin..."
>
> Andres thinks that does the wrong thing if CI is run manually (not by CFBOT)
> for patches against backbranches. I'm not sure git diff --cherry-pick is
> widely known/used, but I think using that relative to master may be good
> enough.

Note that I'm not concerned about "manually" running CI against other branches - I'm concerned about the point where 15 branches off and CI will automatically also run against 15. E.g. in the postgres repo https://cirrus-ci.com/github/postgres/postgres/

I can see a few ways to deal with this:

1) iterate over release branches and see which has the smallest diff
2) parse the branch name, if it's a cfbot run, we know it's master, otherwise skip
3) change cfbot so that it maintains things as pull requests, which have a base branch

> > Is looking at the .c files in the change really a reliable predictor of where
> > code coverage changes? I'm doubtful. Consider stuff like removing the last
> > user of some infrastructure or such. Or adding the first.
>
> You're right that it isn't particularly accurate, but it's a step forward if
> lots of patches use it to check/improve coverage of new code.

Maybe it's good enough... The overhead in test runtime is noticeable (~5.30m -> ~7.15m), but probably acceptable. Although I also would like to enable -fsanitize=alignment and -fsanitize=alignment, which add about 2 minutes as well (-fsanitize=address is a lot more expensive); they both work best on linux.

> In addition to the HTML generated for changed .c files, it's possible to create
> a coverage.gcov output for everything, which could be retrieved to generate
> full HTML locally. It's ~8MB (or 2MB after gzip).

Not sure that doing it locally adds much over just running tests locally.

Greetings, Andres Freund
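[For illustration only, not part of the original thread:] the first of those options - picking the branch whose merge-base is the fewest commits behind HEAD - could look roughly like the following shell sketch. It is demonstrated here on a throwaway repository with made-up branch names; in real CI the candidates would be origin/master and the REL_*_STABLE branches.

```shell
# Hypothetical sketch of "smallest distance in #commits": among candidate
# upstream branches, pick the one whose merge-base with HEAD is the fewest
# commits behind HEAD. Demonstrated on a throwaway repo; branch names are
# stand-ins for origin/master and a REL_*_STABLE back branch.
set -e
dir=$(mktemp -d); cd "$dir"
git init -q .
git config user.email ci@example.com; git config user.name ci
echo a > f; git add f; git commit -qm base
git branch master-like                 # stand-in for origin/master
echo b >> f; git commit -qam backbranch
git branch rel14-like                  # stand-in for a back branch
echo c >> f; git commit -qam patch     # the patch under test (HEAD)

best=""; best_n=999999
for branch in master-like rel14-like; do
    n=$(git rev-list --count "$(git merge-base "$branch" HEAD)..HEAD")
    if [ "$n" -lt "$best_n" ]; then best_n=$n; best=$branch; fi
done
echo "base branch: $best, $best_n commit(s) ahead"
```

In this toy repo the patch sits on top of the back branch, so the sketch picks rel14-like.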
On Sat, Feb 12, 2022 at 04:24:20PM -0800, Andres Freund wrote:
> > > e5286ede1b4 cirrus: avoid unnecessary double star **
> >
> > Can't get excited about this, but whatever.
> >
> > What I am excited about is that some of your other changes showed that we
> > don't need separate *_artifacts for separate directories anymore. That used to
> > be the case, but an array of paths is now supported. Putting log, diffs, and
> > regress_log in one directory will be considerably more convenient...
>
> pushed together.

While rebasing, I noticed an error.

You wrote **/.diffs, but should be **/*.diffs

--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -30,15 +30,11 @@ env:
 # What files to preserve in case tests fail
 on_failure: &on_failure
   log_artifacts:
-    path: "**/**.log"
+    paths:
+      - "**/*.log"
+      - "**/.diffs"
+      - "**/regress_log_*"
     type: text/plain
-  regress_diffs_artifacts:
-    path: "**/**.diffs"
-    type: text/plain
-  tap_artifacts:
-    path: "**/regress_log_*"
-    type: text/plain
-
On 2022-02-12 20:47:04 -0600, Justin Pryzby wrote:
> While rebasing, I noticed an error.
>
> You wrote **/.diffs, but should be **/*.diffs

Embarrassing. Thanks for noticing! Pushed the fix...
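[For illustration only, not part of the original thread:] the difference between the two patterns is easy to see with a **-capable glob implementation. This sketch uses bash's globstar, which is assumed here to approximate Cirrus's artifact matching; the directory layout is invented.

```shell
# "**/.diffs" only matches files literally named ".diffs"; regression
# output is named e.g. "regression.diffs", so "**/*.diffs" is required.
# bash globstar is used as a stand-in for Cirrus's glob matching.
shopt -s globstar nullglob
dir=$(mktemp -d); cd "$dir"
mkdir -p src/test/regress
touch src/test/regress/regression.diffs
bad=( **/.diffs ); good=( **/*.diffs )
echo "bad pattern matched ${#bad[@]} file(s), good matched ${#good[@]}"
```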
On Sat, Feb 12, 2022 at 05:20:08PM -0800, Andres Freund wrote:
> On 2022-02-03 23:04:04 -0600, Justin Pryzby wrote:
> > > I'm a bit worried about the increased storage and runtime overhead due to the
> > > docs changes. We probably can make it a good bit cheaper though.
> >
> > If you mean overhead of additional disk operations, it shouldn't be an issue,
> > since the git clone uses shared references (not even hardlinks).
>
> I was concerned about those and just the increased runtime of the script. Our
> sources are 130MB, leaving the shared .git aside. But maybe it's just fine.
>
> We probably can just get rid of the separate clone and configure run though?
> Build the docs, copy the output, do a git co parent docs/, build again?

Yes - works great, thanks.

> What was the reason behind moving the docs stuff from the compiler warnings
> task to linux?

I wanted to build docs even if the linux task fails. To allow CFBOT to link to them, so someone can always review the docs, in HTML (rather than reading SGML with lines prefixed with +).

> Not that either fits very well... I think it might be worth
> moving the docs stuff into its own task, using a 1 CPU container (docs build
> isn't parallel anyway). Think that'll be easier to see in the cfbot page.

Yeah. The only drawback is the duplication (including the "git parent" stuff).

BTW, docs can be built in parallel, and CI is using BUILD_JOBS: 4.

/usr/bin/xmllint --path . --noout --valid postgres.sgml
/usr/bin/xmllint --path . --noout --valid postgres.sgml
/usr/bin/xsltproc --path . --stringparam pg.version '15devel' stylesheet.xsl postgres.sgml
/usr/bin/xsltproc --path . --stringparam pg.version '15devel' stylesheet-man.xsl postgres.sgml

> 1) iterate over release branches and see which has the smallest diff

Maybe for each branch: do echo `git revlist or merge base |wc -l` $branch; done |sort -n |head -1

> > > Is looking at the .c files in the change really a reliable predictor of where
> > > code coverage changes? I'm doubtful. Consider stuff like removing the last
> > > user of some infrastructure or such. Or adding the first.
> >
> > You're right that it isn't particularly accurate, but it's a step forward if
> > lots of patches use it to check/improve coverage of new code.
>
> Maybe it's good enough... The overhead in test runtime is noticeable (~5.30m
> -> ~7.15m), but probably acceptable. Although I also would like to enable
> -fsanitize=alignment and -fsanitize=alignment, which add about 2 minutes as
> well (-fsanitize=address is a lot more expensive), they both work best on
> linux.

There's other things that it'd be nice to enable, but maybe these don't need to be on by default. Maybe just have a list of optional compiler flags (and hints when they're useful). Like WRITE_READ_PARSE_PLAN_TREES.

> > In addition to the HTML generated for changed .c files, it's possible to create
> > a coverage.gcov output for everything, which could be retrieved to generate
> > full HTML locally. It's ~8MB (or 2MB after gzip).
>
> Not sure that doing it locally adds much over just running tests
> locally.

You're right, since one needs to have the patched source files to generate the HTML.

On Thu, Feb 03, 2022 at 11:57:18AM -0800, Andres Freund wrote:
> I think it may be a bit cleaner to have the "install additional packages"
> "template step" be just that, and not merge in other contents into it?

I renamed the "cores" task since it consistently makes me think you're doing something with CPU cores. I took it as an opportunity to generalize the task.

These patches are ready for review. I'll plan to mail about TAP stuff tomorrow.
Attachments
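[For illustration only, not part of the original thread:] the "build twice, keep only changed pages" idea can be sketched as follows. Two fake build trees stand in for the real doc builds of HEAD and its parent commit; file names are invented.

```shell
# Toy version of "only upload changed docs": diff the parent commit's
# build output against HEAD's, and copy only changed pages to artifacts.
set -e
parent=$(mktemp -d); head=$(mktemp -d); artifacts=$(mktemp -d)
# pretend these are the HTML outputs of building the parent commit and HEAD
echo '<p>unchanged</p>' > "$parent/a.html"; cp "$parent/a.html" "$head/a.html"
echo '<p>before</p>'    > "$parent/b.html"; echo '<p>after</p>' > "$head/b.html"
# "diff -rq" prints "Files A and B differ"; field 4 is HEAD's copy,
# which is the version worth uploading as an artifact
diff -rq "$parent" "$head" | awk '$1 == "Files" {print $4}' | while read -r f; do
    cp "$f" "$artifacts/"
done
ls "$artifacts"    # only the changed page remains
```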
Hi,

On 2022-02-12 23:19:38 -0600, Justin Pryzby wrote:
> On Sat, Feb 12, 2022 at 05:20:08PM -0800, Andres Freund wrote:
> > What was the reason behind moving the docs stuff from the compiler warnings
> > task to linux?
>
> I wanted to build docs even if the linux task fails. To allow CFBOT to link to
> them, so someone can always review the docs, in HTML (rather than reading SGML
> with lines prefixed with +).

I'd be ok with running the compiler warnings job without the dependency, if that's the connection.

> BTW, docs can be built in parallel, and CI is using BUILD_JOBS: 4.
> /usr/bin/xmllint --path . --noout --valid postgres.sgml
> /usr/bin/xmllint --path . --noout --valid postgres.sgml
> /usr/bin/xsltproc --path . --stringparam pg.version '15devel' stylesheet.xsl postgres.sgml
> /usr/bin/xsltproc --path . --stringparam pg.version '15devel' stylesheet-man.xsl postgres.sgml

Sure, it just doesn't make a difference:

make -j48 -C doc/src/sgml/ maintainer-clean && time make -j48 -C doc/src/sgml/
real	0m34.626s
user	0m34.342s
sys	0m0.298s

make -j48 -C doc/src/sgml/ maintainer-clean && time make -C doc/src/sgml/
real	0m34.780s
user	0m34.494s
sys	0m0.285s

> > 1) iterate over release branches and see which has the smallest diff
>
> Maybe for each branch: do echo `git revlist or merge base |wc -l` $branch; done |sort -n |head -1
>
> > > > Is looking at the .c files in the change really a reliable predictor of where
> > > > code coverage changes? I'm doubtful. Consider stuff like removing the last
> > > > user of some infrastructure or such. Or adding the first.
> > >
> > > You're right that it isn't particularly accurate, but it's a step forward if
> > > lots of patches use it to check/improve coverage of new code.
> >
> > Maybe it's good enough... The overhead in test runtime is noticeable (~5.30m
> > -> ~7.15m), but probably acceptable.
> > Although I also would like to enable -fsanitize=alignment and
> > -fsanitize=alignment, which add about 2 minutes as well
> > (-fsanitize=address is a lot more expensive), they both work best on
> > linux.
>
> There's other things that it'd be nice to enable, but maybe these don't
> need to be on by default. Maybe just have a list of optional compiler flags
> (and hints about when they're useful). Like WRITE_READ_PARSE_PLAN_TREES.

I think it'd be good to enable a reasonable set by default. Particularly for
newer contributors, stuff like forgetting in/out/readfuncs, or not knowing
about some undefined behaviour, is an easy mistake to make. Probably makes
sense to use different settings on different tasks.

Greetings,

Andres Freund
On Sun, Feb 13, 2022 at 3:30 AM Andres Freund <andres@anarazel.de> wrote:
> > There's other things that it'd be nice to enable, but maybe these don't
> > need to be on by default. Maybe just have a list of optional compiler
> > flags (and hints about when they're useful). Like
> > WRITE_READ_PARSE_PLAN_TREES.
>
> I think it'd be good to enable a reasonable set by default. Particularly
> for newer contributors, stuff like forgetting in/out/readfuncs, or not
> knowing about some undefined behaviour, is an easy mistake to make.
> Probably makes sense to use different settings on different tasks.

This is exactly why I'm not a huge fan of having CI stuff in the tree. It
supposes that there's one right way to do a build, but in reality, different
people want and indeed need to use different options for all kinds of
reasons. That's the whole value of having things like configure and
pg_config_manual.h. When we start arguing about whether or not CI builds
should use -DWRITE_READ_PARSE_PLAN_TREES, we're inevitably into the realm
where no choice is objectively better, and whoever yells the loudest will get
it the way they want; then somebody else later will say "well that's dumb, I
don't like that" or even just "well that's not the right thing for testing MY
patch", at which point the previous mailing list discussion will be cited as
"precedent" for what was essentially an arbitrary decision made by 1 or 2
people.

Mind you, I'm not trying to hold back the tide. I realize that in-tree CI
stuff is very much in vogue. But I think it would be a very healthy thing if
we acknowledged that what goes in there is far more arbitrary than most of
what we put in the tree.

--
Robert Haas
EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
> This is exactly why I'm not a huge fan of having CI stuff in the tree. It
> supposes that there's one right way to do a build, but in reality,
> different people want and indeed need to use different options for all
> kinds of reasons. That's the whole value of having things like configure
> and pg_config_manual.h. When we start arguing about whether or not CI
> builds should use -DWRITE_READ_PARSE_PLAN_TREES, we're inevitably into the
> realm where no choice is objectively better,

Right. Can we set things up so that it's not too painful to inject custom
build options into a CI build? I should think that at the very least one
needs to be able to vary the configure switches and CPPFLAGS/CFLAGS.

regards, tom lane
Hi,

On 2022-02-13 12:13:17 -0500, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > This is exactly why I'm not a huge fan of having CI stuff in the tree. It
> > supposes that there's one right way to do a build, but in reality,
> > different people want and indeed need to use different options for all
> > kinds of reasons.

Sure. But why is that an argument against "having CI stuff in the tree"? All
it does is make sure that a certain base level of testing is easy to achieve
for everyone. I don't like working on windows or mac, but my patches often
have platform dependent bits. Now it's less likely that I need to manually
interact with windows.

I don't think we can (or should) replace the buildfarm with the CI stuff. The
buildfarm provides extensive and varied coverage for master/release branches,
which isn't feasible for unmerged development work, including cfbot, from a
resource usage POV alone.

> > That's the whole value of having things like configure and
> > pg_config_manual.h. When we start arguing about whether or not CI builds
> > should use -DWRITE_READ_PARSE_PLAN_TREES, we're inevitably into the realm
> > where no choice is objectively better,
>
> Right. Can we set things up so that it's not too painful to inject custom
> build options into a CI build?

What kind of injection are you thinking about? A patch author can obviously
just add options in .cirrus.yml. That's something that is possible now, and
was not possible with cfbot applying its own .cirrus.yml.

It'd be nice if there were a way to do it more easily for msvc and configure
builds together; right now it'd require modifying those tasks in different
ways. But that's not really a CI question.

I'd like to have things like -fsanitize=alignment and
-DWRITE_READ_PARSE_PLAN_TREES on by default for CI, primarily for cfbot's
benefit. Most patch authors won't know about using
-DWRITE_READ_PARSE_PLAN_TREES etc, so they won't even think about enabling
them.
We're *really* not doing well on the "timely review" side of things, so we at
least should not waste time on high-latency back-and-forth over easily
automatically detectable things.

> I should think that at the very least one needs to be able to vary the
> configure switches and CPPFLAGS/CFLAGS.

Do you mean as part of a patch tested with cfbot, CI running for pushes to
your own repository, or ...?

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> On 2022-02-13 12:13:17 -0500, Tom Lane wrote:
>> Right. Can we set things up so that it's not too painful to inject
>> custom build options into a CI build?

> What kind of injection are you thinking about?

That's exactly what needs to be decided.

> A patch author can obviously just add options in .cirrus.yml. That's
> something possible now, that was not possible with cfbot applying its own
> .cirrus.yml

Fine, but are committers supposed to keep track of the fact that they
shouldn't commit that part of a patch? I'd prefer something a bit more
out-of-band. I don't know this technology well enough to propose a way.

> I'd like to have things like -fsanitize=alignment and
> -DWRITE_READ_PARSE_PLAN_TREES on by default for CI, primarily for cfbot's
> benefit. Most patch authors won't know about using
> -DWRITE_READ_PARSE_PLAN_TREES etc, so they won't even think about enabling
> them. We're *really* not doing well on the "timely review" side of things,
> so we at least should not waste time on high-latency back-and-forth over
> easily automatically detectable things.

I don't personally have an objection to either of those; maybe Robert does.

regards, tom lane
Hi,

On 2022-02-13 15:09:20 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2022-02-13 12:13:17 -0500, Tom Lane wrote:
> >> Right. Can we set things up so that it's not too painful to inject
> >> custom build options into a CI build?
>
> > What kind of injection are you thinking about?
>
> That's exactly what needs to be decided.
> [...]
> I'd prefer something a bit more out-of-band. I don't know this technology
> well enough to propose a way.

I don't yet understand the precise use case for adjustments well enough to
propose something. Who would like to adjust something, for what purpose? The
original author, for a one-off test? A reviewer / committer, to track down a
hunch?

If it's about CI runs in a personal repository, one can set additional
environment variables from the CI management interface. We can make sure they
work (the CI stuff probably overrides CFLAGS, but COPT should work) and
document the way to do so.

> > A patch author can obviously just add options in .cirrus.yml. That's
> > something possible now, that was not possible with cfbot applying its own
> > .cirrus.yml
>
> Fine, but are committers supposed to keep track of the fact that they
> shouldn't commit that part of a patch?

I'd say it depends on the specific modification - there's some patches where
it seems to make sense to adjust/extend CI as part of it and have it merged.
But yea, in other cases committers would have to take them out.

For more one-off stuff one would presumably not want to spam the list with a
full patchset just to trigger a cfbot run, nor wait until cfbot gets round to
that patch again. When pushing to a personal repo it's of course easy to just
have a commit on top of what's submitted to play around with compile options.

Greetings,

Andres Freund
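As a concrete sketch of the COPT route mentioned above (assumptions, not from the patch: Makefile.global appends COPT to the compiler flags, and the CI task does not itself set COPT):

```shell
# Inject extra compile options without editing .cirrus.yml: COPT is appended
# by the build system, so it survives a CFLAGS set in the CI config. In a
# cirrus personal repository, COPT could be defined as an environment
# variable through the CI settings UI; locally the equivalent is:
make COPT='-DWRITE_READ_PARSE_PLAN_TREES -fsanitize=alignment'

# Alternatively, bake the options in at configure time:
./configure CPPFLAGS='-DWRITE_READ_PARSE_PLAN_TREES'
```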
On Sat, Feb 12, 2022 at 04:24:20PM -0800, Andres Freund wrote:
> > What I am excited about is that some of your other changes showed that we
> > don't need separate *_artifacts for separate directories anymore. That
> > used to be the case, but an array of paths is now supported. Putting log,
> > diffs, and regress_log in one directory will be considerably more
> > convenient...
>
> pushed together.

This change actually complicates things.

Before, there was log/src/test/recovery/tmp_check/log, with a few files for
001, a few for 002, a few for 003. That's a lot of output files, but at least
they're all related.

Now, there's a single log/tmp_check/log, which has logs for the entire set of
tap tests. There's 3 pages of 001*, 2 pages of 002*, 3 pages of 003*, etc.

--
Justin
Hi,

On 2022-02-13 15:02:50 -0600, Justin Pryzby wrote:
> On Sat, Feb 12, 2022 at 04:24:20PM -0800, Andres Freund wrote:
> > > What I am excited about is that some of your other changes showed that
> > > we don't need separate *_artifacts for separate directories anymore.
> > > That used to be the case, but an array of paths is now supported.
> > > Putting log, diffs, and regress_log in one directory will be
> > > considerably more convenient...
> >
> > pushed together.
>
> This change actually complicates things.
>
> Before, there was log/src/test/recovery/tmp_check/log, with a few files for
> 001, a few for 002, a few for 003. That's a lot of output files, but at
> least they're all related.
>
> Now, there's a single log/tmp_check/log, which has logs for the entire set
> of tap tests. There's 3 pages of 001*, 2 pages of 002*, 3 pages of 003*,
> etc.

Hm? Doesn't look like that to me, and I don't see why it would work that way?
This didn't do anything to flatten the directory hierarchy, just combine
three hierarchies into one?

What I see, and what I expect, is that logs end up in e.g.
log/src/test/recovery/tmp_check/log, and that that directory contains
regress_log_*, as well as *.log. Before, one needed to go through the
hierarchy multiple times to see both regress_log_* (i.e. tap test log) as
well as 0*.log (i.e. server logs).

A random example: https://cirrus-ci.com/task/5152523873943552 shows the logs
for the failure in log/src/bin/pg_upgrade/tmp_check/log

If you're seeing this on windows on one of your test branches, that's much
more likely to be caused by the alltaptests stuff than by the change in
artifact instruction.

Greetings,

Andres Freund
On Sun, Feb 13, 2022 at 01:23:16PM -0800, Andres Freund wrote: > If you're seeing this on windows on one of your test branches, that's much > more likely to be caused by the alltaptests stuff, than by the change in > artifact instruction. Oh - I suppose you're right. That's an unfortunate consequence of running a single prove instance without chdir. -- Justin
On Sat, Feb 12, 2022 at 02:26:25PM -0800, Andres Freund wrote:
> On 2022-02-12 16:06:40 -0600, Justin Pryzby wrote:
> > I had some success with that, but it doesn't seem to be significantly
> > faster - it looks a lot like the tests are not actually running in
> > parallel.

Note that the total test time is close to the sum of the individual test
times. But I think that may be an artifact of how prove is
showing/attributing times to each test (which, if so, is misleading).

> Note that prove unfortunately serializes the test output to be in the order
> it started them, even when tests run concurrently. Extremely unhelpful, but
> ...

Are you sure? When I run it locally, I see:

rm -fr src/test/recovery/tmp_check ; time \
  PERL5LIB=`pwd`/src/test/perl \
  TESTDIR=`pwd`/src/test/recovery \
  PATH=`pwd`/tmp_install/usr/local/pgsql/bin:$PATH \
  PG_REGRESS=`pwd`/src/test/regress/pg_regress \
  REGRESS_SHLIB=`pwd`/src/test/regress/regress.so \
  prove --time -j4 --ext '*.pl' `find src -name t`
...
[15:34:48] src/bin/scripts/t/101_vacuumdb_all.pl ....................... ok 104 ms ( 0.00 usr 0.00 sys + 2.35 cusr 0.47 csys = 2.82 CPU)
[15:34:49] src/bin/scripts/t/090_reindexdb.pl .......................... ok 8894 ms ( 0.06 usr 0.01 sys + 14.45 cusr 3.38 csys = 17.90 CPU)
[15:34:50] src/bin/pg_config/t/001_pg_config.pl ........................ ok 79 ms ( 0.00 usr 0.01 sys + 0.23 cusr 0.04 csys = 0.28 CPU)
[15:34:50] src/bin/pg_waldump/t/001_basic.pl ........................... ok 35 ms ( 0.00 usr 0.00 sys + 0.26 cusr 0.02 csys = 0.28 CPU)
[15:34:51] src/bin/pg_test_fsync/t/001_basic.pl ........................ ok 100 ms ( 0.01 usr 0.00 sys + 0.24 cusr 0.04 csys = 0.29 CPU)
[15:34:51] src/bin/pg_archivecleanup/t/010_pg_archivecleanup.pl ........ ok 177 ms ( 0.02 usr 0.00 sys + 0.26 cusr 0.03 csys = 0.31 CPU)
[15:34:55] src/bin/scripts/t/100_vacuumdb.pl ........................... 
ok 11267 ms ( 0.12 usr 0.04 sys + 13.47 cusr 3.20 csys = 16.83 CPU)
[15:34:57] src/bin/scripts/t/102_vacuumdb_stages.pl .................... ok 5802 ms ( 0.06 usr 0.01 sys + 7.70 cusr 1.37 csys = 9.14 CPU)
...

=> scripts/ stuff, followed by other stuff, followed by more, slower,
scripts/ stuff. But I never saw that in cirrus.

> Isn't this kind of a good test time? I thought earlier your alltaptests
> target took a good bit longer?

The original alltaptests runs in 16m 21s.
https://cirrus-ci.com/task/6679061752709120

2 weeks ago, it was ~14min with your patch to cache initdb.
https://cirrus-ci.com/task/5439320633901056

As I recall, that didn't seem to improve runtime when combined with my
parallel patch.

> One nice bit is that the output is a *lot* easier to read.
>
> You could try improving the total time by having prove remember slow tests
> and use that file to run the slowest tests first next time. --state
> slow,save or such I believe. Of course we'd have to save that state file...

In a test, this hurt rather than helped (13m 42s).
https://cirrus-ci.com/task/6359167186239488

I'm not surprised - it makes sense to run 10 fast tests at once, but it
usually doesn't make sense to run 10 slow tests at once (at least a couple of
which are doing something intensive). It was faster (12m16s) to do it
backwards (fastest tests first).
https://cirrus-ci.com/task/5745115443494912

BTW, does it make sense to remove test_regress_parallel_script? The
pg_upgrade run would do the same things, no? If so, it might make sense to
run that first. OTOH, you suggested to run the upgrade tests with checksums
enabled, which seems like a good idea.

Note that in the attached patches, I changed the msvc "warnings" to use
"tee". I don't know how to fix the pipeline test in a less hacky way...

You said the docs build should be a separate task, but then said that it'd be
okay to remove the dependency. So I did it both ways.
There's currently some duplication between the docs patch and code coverage patch. -- Justin
Attachments
- 0001-cirrus-include-hints-how-to-install-OS-packages.patch
- 0002-cirrus-windows-add-compiler_warnings_script.patch
- 0003-cirrus-upload-changed-html-docs-as-artifacts.patch
- 0004-s-build-docs-as-a-separate-task.patch
- 0005-vcregress-ci-test-modules-contrib-with-NO_INSTALLCHE.patch
- 0006-wip-cirrus-code-coverage.patch
- 0007-vcsregress-add-alltaptests.patch
- 0008-vcregress-run-alltaptests-in-parallel.patch
Hi,

On 2022-02-13 15:31:20 -0600, Justin Pryzby wrote:
> Oh - I suppose you're right. That's an unfortunate consequence of running a
> single prove instance without chdir.

I don't think it's chdir that's relevant (that changes into the source
directory after all). It's the TESTDIR environment variable.

I was thinking that we should make Utils.pm's INIT block responsible for
figuring out both the directory a test should run in and the log location,
instead of having that in vcregress.pl and Makefile.global.in. Mostly because
doing it in the latter means we can't start tests with different TESTDIR and
working dir at the same time.

If instead we pass the location of the top-level build and top-level source
directory from vcregress.pl / Makefile.global, the tap test infrastructure
can figure out that stuff itself, on a per-test basis.

For msvc builds we probably would need to pass in some information that
allows Utils.pm to set up PATH appropriately. I think that might just require
knowing that a) the msvc build system is used b) Release vs Debug.

Greetings,

Andres Freund
Hi,

On 2022-02-13 15:42:13 -0600, Justin Pryzby wrote:
> > Note that prove unfortunately serializes the test output to be in the
> > order it started them, even when tests run concurrently. Extremely
> > unhelpful, but ...
>
> Are you sure?

Somewhat. I think it's a question of the prove version and some autodetection
of what type of environment prove is running in (stdin/stdout/stderr). I
don't remember the details, but at some point I pinpointed the source of the
serialization, and verified that parallelization makes a significant
difference in runtime even without being easily visible :(. But this is all
vague memory, so I might be wrong.

Reminds me that somebody (ugh, me???) should fix the perl > 5.26
incompatibilities on windows, then we'd also get a newer prove...

> > One nice bit is that the output is a *lot* easier to read.
> >
> > You could try improving the total time by having prove remember slow
> > tests and use that file to run the slowest tests first next time.
> > --state slow,save or such I believe. Of course we'd have to save that
> > state file...
>
> In a test, this hurt rather than helped (13m 42s).
> https://cirrus-ci.com/task/6359167186239488
>
> I'm not surprised - it makes sense to run 10 fast tests at once, but it
> usually doesn't make sense to run 10 slow tests at once (at least a couple
> of which are doing something intensive). It was faster (12m16s) to do it
> backwards (fastest tests first).
> https://cirrus-ci.com/task/5745115443494912

Hm. I know I saw significant reductions in test times locally with meson by
starting slow tests earlier, because they're the limiting factor for the
*overall* test runtime - but I have more resources than on cirrus. Even
locally on a windows VM, with the current buildsystem, I found that moving
027 to earlier within recoverycheck reduced the test time.

But it's possible that with all tests being scheduled concurrently, starting
the slow tests early leads to sufficient resource overcommit to be
problematic.
> BTW, does it make sense to remove test_regress_parallel_script? The
> pg_upgrade run would do the same things, no? If so, it might make sense to
> run that first. OTOH, you suggested to run the upgrade tests with checksums
> enabled, which seems like a good idea.

No, I don't think so. The main regression tests are by far the most important
thing during normal development. Just relying on main regression test runs
embedded in other tests, with different output and a different config of the
main regression test, is imo just confusing.

Greetings,

Andres Freund
On Sun, Feb 13, 2022 at 01:53:19PM -0800, Andres Freund wrote:
> Hi,
>
> On 2022-02-13 15:31:20 -0600, Justin Pryzby wrote:
> > Oh - I suppose you're right. That's an unfortunate consequence of running
> > a single prove instance without chdir.
>
> I don't think it's chdir that's relevant (that changes into the source
> directory after all). It's the TESTDIR environment variable.
>
> I was thinking that we should make Utils.pm's INIT block responsible for
> figuring out both the directory a test should run in and the log location,
> instead of having that in vcregress.pl and Makefile.global.in. Mostly
> because doing it in the latter means we can't start tests with different
> TESTDIR and working dir at the same time.
>
> If instead we pass the location of the top-level build and top-level source
> directory from vcregress.pl / Makefile.global, the tap test infrastructure
> can figure out that stuff itself, on a per-test basis.
>
> For msvc builds we probably would need to pass in some information that
> allows Utils.pm to set up PATH appropriately. I think that might just
> require knowing that a) the msvc build system is used b) Release vs Debug.

I'm totally unsure if this resembles what you're thinking of, and I'm
surprised I got it working so easily. But it gets the tap test output in
separate dirs, and CI is passing for everyone (windows failed because I
injected a "false" to force it to upload artifacts).
https://github.com/justinpryzby/postgres/runs/5211673291

commit 899e562102dd7a663cb087cdf88f0f78f8302492
Author: Justin Pryzby <pryzbyj@telsasoft.com>
Date:   Tue Feb 15 20:02:36 2022 -0600

    wip: set TESTDIR from src/test/perl rather than Makefile/vcregress

diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index 05c54b27def..1e49d8c8c37 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -450,7 +450,7 @@ define prove_installcheck
 rm -rf '$(CURDIR)'/tmp_check
 $(MKDIR_P) '$(CURDIR)'/tmp_check
 cd $(srcdir) && \
-   TESTDIR='$(CURDIR)' PATH="$(bindir):$(CURDIR):$$PATH" \
+   PATH="$(bindir):$(CURDIR):$$PATH" \
    PGPORT='6$(DEF_PGPORT)' top_builddir='$(CURDIR)/$(top_builddir)' \
    PG_REGRESS='$(CURDIR)/$(top_builddir)/src/test/regress/pg_regress' \
    $(PROVE) $(PG_PROVE_FLAGS) $(PROVE_FLAGS) $(if $(PROVE_TESTS),$(PROVE_TESTS),t/*.pl)
@@ -460,7 +460,7 @@
 rm -rf '$(CURDIR)'/tmp_check
 $(MKDIR_P) '$(CURDIR)'/tmp_check
 cd $(srcdir) && \
-   TESTDIR='$(CURDIR)' PATH="$(bindir):$(CURDIR):$$PATH" \
+   PATH="$(bindir):$(CURDIR):$$PATH" \
    PGPORT='6$(DEF_PGPORT)' top_builddir='$(top_builddir)' \
    PG_REGRESS='$(top_builddir)/src/test/regress/pg_regress' \
    $(PROVE) $(PG_PROVE_FLAGS) $(PROVE_FLAGS) $(if $(PROVE_TESTS),$(PROVE_TESTS),t/*.pl)
@@ -471,7 +471,7 @@ define prove_check
 rm -rf '$(CURDIR)'/tmp_check
 $(MKDIR_P) '$(CURDIR)'/tmp_check
 cd $(srcdir) && \
-   TESTDIR='$(CURDIR)' $(with_temp_install) PGPORT='6$(DEF_PGPORT)' \
+   $(with_temp_install) PGPORT='6$(DEF_PGPORT)' \
    PG_REGRESS='$(CURDIR)/$(top_builddir)/src/test/regress/pg_regress' \
    $(PROVE) $(PG_PROVE_FLAGS) $(PROVE_FLAGS) $(if $(PROVE_TESTS),$(PROVE_TESTS),t/*.pl)
 endef
diff --git a/src/bin/psql/t/010_tab_completion.pl b/src/bin/psql/t/010_tab_completion.pl
index 005961f34d4..a86dc78a365 100644
--- a/src/bin/psql/t/010_tab_completion.pl
+++ b/src/bin/psql/t/010_tab_completion.pl
@@ -70,7 +70,7 @@ delete $ENV{LS_COLORS};
 # to run in the build directory so that we can use relative paths to
 # access the tmp_check subdirectory; otherwise the output from filename
 # completion tests is too variable.
-if ($ENV{TESTDIR})
+if ($ENV{TESTDIR} && 0)
 {
 	chdir $ENV{TESTDIR} or die "could not chdir to \"$ENV{TESTDIR}\": $!";
 }
diff --git a/src/test/modules/libpq_pipeline/t/001_libpq_pipeline.pl b/src/test/modules/libpq_pipeline/t/001_libpq_pipeline.pl
index facfec5cad4..2a0eca77440 100644
--- a/src/test/modules/libpq_pipeline/t/001_libpq_pipeline.pl
+++ b/src/test/modules/libpq_pipeline/t/001_libpq_pipeline.pl
@@ -49,9 +49,7 @@ for my $testname (@tests)
 	my $expected;
 	my $result;
 
-	# Hack to allow TESTDIR=. during parallel tap tests
-	my $inputdir = "$ENV{'TESTDIR'}/src/test/modules/libpq_pipeline";
-	$inputdir = "$ENV{'TESTDIR'}" if ! -e $inputdir;
+	my $inputdir = "$ENV{'TESTDIR'}/tmp_check";
 	$expected = slurp_file_eval("$inputdir/traces/$testname.trace");
 	next unless $expected ne "";
 	$result = slurp_file_eval($traceout);
diff --git a/src/test/perl/PostgreSQL/Test/Utils.pm b/src/test/perl/PostgreSQL/Test/Utils.pm
index 57fcb240898..5429de41ed5 100644
--- a/src/test/perl/PostgreSQL/Test/Utils.pm
+++ b/src/test/perl/PostgreSQL/Test/Utils.pm
@@ -184,19 +184,21 @@ INIT
 	# test may still fail, but it's more likely to report useful facts.
 	$SIG{PIPE} = 'IGNORE';
 
-	# Determine output directories, and create them. The base path is the
-	# TESTDIR environment variable, which is normally set by the invoking
-	# Makefile.
-	$tmp_check = $ENV{TESTDIR} ? "$ENV{TESTDIR}/tmp_check" : "tmp_check";
+	my $test_dir = File::Spec->rel2abs(dirname($0));
+	my $test_name = basename($0);
+	$test_name =~ s/\.[^.]+$//;
+
+	# Determine output directories, and create them.
+	# TODO: set TESTDIR and srcdir?
+	$tmp_check = "$test_dir/tmp_check";
 	$log_path = "$tmp_check/log";
+	$ENV{TESTDIR} = $test_dir;
 
 	mkdir $tmp_check;
 	mkdir $log_path;
 
 	# Open the test log file, whose name depends on the test name.
-	$test_logfile = basename($0);
-	$test_logfile =~ s/\.[^.]+$//;
-	$test_logfile = "$log_path/regress_log_$test_logfile";
+	$test_logfile = "$log_path/regress_log_$test_name";
 	open my $testlog, '>', $test_logfile
 	  or die "could not open STDOUT to logfile \"$test_logfile\": $!";
diff --git a/src/tools/msvc/vcregress.pl b/src/tools/msvc/vcregress.pl
index fdb6f44eded..d7794c5766a 100644
--- a/src/tools/msvc/vcregress.pl
+++ b/src/tools/msvc/vcregress.pl
@@ -261,10 +261,8 @@ sub tap_check
 	$ENV{PG_REGRESS} = "$topdir/$Config/pg_regress/pg_regress";
 	$ENV{REGRESS_SHLIB} = "$topdir/src/test/regress/regress.dll";
 
-	$ENV{TESTDIR} = "$dir";
 	my $module = basename $dir;
-	# add the module build dir as the second element in the PATH
-	$ENV{PATH} =~ s!;!;$topdir/$Config/$module;!;
+	$ENV{VCREGRESS_MODE} = $Config;
 
 	print "============================================================\n";
 	print "Checking @args\n";
Hi,

On February 15, 2022 10:12:36 PM PST, Justin Pryzby <pryzby@telsasoft.com> wrote:
> On Sun, Feb 13, 2022 at 01:53:19PM -0800, Andres Freund wrote:
> > I don't think it's chdir that's relevant (that changes into the source
> > directory after all). It's the TESTDIR environment variable.
> >
> > I was thinking that we should make Utils.pm's INIT block responsible for
> > figuring out both the directory a test should run in and the log
> > location, instead of having that in vcregress.pl and Makefile.global.in.
> > Mostly because doing it in the latter means we can't start tests with
> > different TESTDIR and working dir at the same time.
> >
> > If instead we pass the location of the top-level build and top-level
> > source directory from vcregress.pl / Makefile.global, the tap test
> > infrastructure can figure out that stuff itself, on a per-test basis.
> >
> > For msvc builds we probably would need to pass in some information that
> > allows Utils.pm to set up PATH appropriately. I think that might just
> > require knowing that a) the msvc build system is used b) Release vs
> > Debug.
>
> I'm totally unsure if this resembles what you're thinking of, and I'm
> surprised I got it working so easily. But it gets the tap test output in
> separate dirs, and CI is passing for everyone (windows failed because I
> injected a "false" to force it to upload artifacts).
>
> https://github.com/justinpryzby/postgres/runs/5211673291

Yes, that's along the lines I was thinking. I only checked it on my phone, so
it certainly isn't a careful look... I think this should be discussed in a
separate thread, for visibility.

FWIW, I'd like to additionally add marker files in INIT and remove them in
END. And create files signaling success and failure in END. That would allow
automated selection of log files of failed tests...
Regards,

Andres

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
On 13.02.22 09:30, Andres Freund wrote:
> > BTW, docs can be built in parallel, and CI is using BUILD_JOBS: 4.
> > /usr/bin/xmllint --path . --noout --valid postgres.sgml
> > /usr/bin/xmllint --path . --noout --valid postgres.sgml
> > /usr/bin/xsltproc --path . --stringparam pg.version '15devel' stylesheet.xsl postgres.sgml
> > /usr/bin/xsltproc --path . --stringparam pg.version '15devel' stylesheet-man.xsl postgres.sgml
>
> Sure, it just doesn't make a difference:
>
> make -j48 -C doc/src/sgml/ maintainer-clean && time make -j48 -C doc/src/sgml/
> real    0m34.626s
> user    0m34.342s
> sys     0m0.298s
>
> make -j48 -C doc/src/sgml/ maintainer-clean && time make -C doc/src/sgml/
> real    0m34.780s
> user    0m34.494s
> sys     0m0.285s

Note that the default target in doc/src/sgml/ is "html", not "all". If you
build "all", you build "html" plus "man", which can be run in parallel. (It's
only two jobs, of course.) If you're more ambitious, you could also run the
PDF builds.
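To illustrate the point above about the default docs target (a sketch assuming the usual PostgreSQL docs toolchain is installed):

```shell
# The default target builds only the HTML docs; "all" adds the man pages,
# which is where -j can actually overlap work (two jobs at most).
make -C doc/src/sgml maintainer-clean
make -j2 -C doc/src/sgml all
# The PDF builds (e.g. "make -C doc/src/sgml postgres-A4.pdf") are separate,
# much slower targets.
```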
Have you tried to use the yet-to-be-released ccache with MSVC?

Also, do you know about msbuild /outputResultsCache? When I tried that, it
gave a bunch of errors. https://cirrus-ci.com/task/5697497241747456

|[16:35:13.605] 1>c:\cirrus\pgsql.sln.metaproj : error : MSB4252: Project "c:\cirrus\pgsql.sln" with global properties [c:\cirrus\pgsql.sln]
|[16:35:13.615] c:\cirrus\pgsql.sln.metaproj : error : (TrackFileAccess=false; CLToolExe=clcache.exe) [c:\cirrus\pgsql.sln]
|[16:35:13.615] c:\cirrus\pgsql.sln.metaproj : error : is building project "c:\cirrus\initdb.vcxproj" with global properties [c:\cirrus\pgsql.sln]
|[16:35:13.615] c:\cirrus\pgsql.sln.metaproj : error : (TrackFileAccess=false; CLToolExe=clcache.exe; BuildingSolutionFile=true; CurrentSolutionConfigurationContents=<SolutionConfiguration> [c:\cirrus\pgsql.sln]
|[16:35:13.615] c:\cirrus\pgsql.sln.metaproj : error : <ProjectConfiguration Project="{1BD4D6DB-9B78-4A46-B2A7-04508802E281}" AbsolutePath="c:\cirrus\initdb.vcxproj" BuildProjectInSolution="True">Debug|x64</ProjectConfiguration> [c:\cirrus\pgsql.sln]
|...
|[16:35:14.518] c:\cirrus\pgsql.sln.metaproj : error : <ProjectConfiguration Project="{7E9336CA-5E94-4D99-9D34-BF65ED440A6F}" AbsolutePath="c:\cirrus\euc2004_sjis2004.vcxproj" BuildProjectInSolution="True">Debug|x64</ProjectConfiguration> [c:\cirrus\pgsql.sln]
|[16:35:14.518] c:\cirrus\pgsql.sln.metaproj : error : </SolutionConfiguration>; SolutionDir=c:\cirrus\; SolutionExt=.sln; SolutionFileName=pgsql.sln; SolutionName=pgsql; SolutionPath=c:\cirrus\pgsql.sln; Configuration=Debug; Platform=x64) [c:\cirrus\pgsql.sln]
|[16:35:14.518] c:\cirrus\pgsql.sln.metaproj : error : with the (default) target(s) but the build result for the built project is not in the engine cache.
In isolated builds this could mean one of the following: [c:\cirrus\pgsql.sln]
|[16:35:14.518] c:\cirrus\pgsql.sln.metaproj : error : - the reference was called with a target which is not specified in the ProjectReferenceTargets item in project "c:\cirrus\pgsql.sln" [c:\cirrus\pgsql.sln]
|[16:35:14.518] c:\cirrus\pgsql.sln.metaproj : error : - the reference was called with global properties that do not match the static graph inferred nodes [c:\cirrus\pgsql.sln]
|[16:35:14.518] c:\cirrus\pgsql.sln.metaproj : error : - the reference was not explicitly specified as a ProjectReference item in project "c:\cirrus\pgsql.sln" [c:\cirrus\pgsql.sln]
|[16:35:14.518] c:\cirrus\pgsql.sln.metaproj : error : [c:\cirrus\pgsql.sln]
|[16:35:14.518]
|[16:35:14.518] 0 Warning(s)
|[16:35:14.518] 149 Error(s)

Did you ever try to use clcache (or others)? When I tried, it refused to
cache because of our debug settings (DebugInformationFormat) - which seem to
be enabled even in release mode. I wonder if that'll be an issue for ccache,
too. I think that line may need to be conditional on debug mode.
https://cirrus-ci.com/task/4808554103177216

|[17:14:28.765] C:\ProgramData\chocolatey\lib\clcache\clcache\clcache.py Expanded commandline '['/c', '/Isrc/include', '/Isrc/include/port/win32', '/Isrc/include/port/win32_msvc', '/Ic:/openssl/1.1/\\include', '/Zi', '/nologo', '/W3', '/WX-', '/diagnostics:column', '/Ox', '/D', 'WIN32', '/D', '_WINDOWS', '/D', '__WINDOWS__', '/D', '__WIN32__', '/D', 'WIN32_STACK_RLIMIT=4194304', '/D', '_CRT_SECURE_NO_DEPRECATE', '/D', '_CRT_NONSTDC_NO_DEPRECATE', '/D', 'FRONTEND', '/D', '_MBCS', '/GF', '/Gm-', '/EHsc', '/MD', '/GS', '/fp:precise', '/Zc:wchar_t', '/Zc:forScope', '/Zc:inline', '/Fo.\\Release\\libpgcommon\\', '/Fd.\\Release\\libpgcommon\\libpgcommon.pdb', '/external:W3', '/Gd', '/TC', '/wd4018', '/wd4244', '/wd4273', '/wd4101', '/wd4102', '/wd4090', '/wd4267', '/FC', '/errorReport:queue', '/MP', 'src/common/archive.c', 'src/common/base64.c', 'src/common/checksum_helper.c', 'src/common/config_info.c', 'src/common/controldata_utils.c', 'src/common/cryptohash_openssl.c', 'src/common/d2s.c', 'src/common/encnames.c', 'src/common/exec.c', 'src/common/f2s.c', 'src/common/fe_memutils.c', 'src/common/file_perm.c', 'src/common/file_utils.c', 'src/common/hashfn.c', 'src/common/hmac_openssl.c', 'src/common/ip.c', 'src/common/jsonapi.c', 'src/common/keywords.c', 'src/common/kwlookup.c', 'src/common/link-canary.c', 'src/common/logging.c', 'src/common/md5_common.c', 'src/common/pg_get_line.c', 'src/common/pg_lzcompress.c', 'src/common/pg_prng.c', 'src/common/pgfnames.c', 'src/common/protocol_openssl.c', 'src/common/psprintf.c', 'src/common/relpath.c', 'src/common/restricted_token.c', 'src/common/rmtree.c', 'src/common/saslprep.c', 'src/common/scram-common.c', 'src/common/sprompt.c', 'src/common/string.c', 'src/common/stringinfo.c', 'src/common/unicode_norm.c', 'src/common/username.c', 'src/common/wait_error.c', 'src/common/wchar.c']'

|[17:14:28.765] C:\ProgramData\chocolatey\lib\clcache\clcache\clcache.py Cannot cache invocation as [the same command line as above]: external debug information (/Zi) is not supported

-- Justin
Hi, On 2022-02-20 13:36:55 -0600, Justin Pryzby wrote: > Have you tried to use the yet-to-be-released ccache with MSVC ? Yes, it doesn't work, because it requires cl.exe to be used in a specific way (only a single input file, specific output file naming). Which would require a decent amount of changes to src/tools/msvc. I think it's more realistic with meson etc. > Also, do you know about msbuild /outputResultsCache ? I don't think it's really usable for what we need. But it's hard to tell. > Did you ever try to use clcache (or others) ? > > When I tried, it refused to cache because of our debug settings > (DebugInformationFormat) - which seem to be enabled even in release mode. > I wonder if that'll be an issue for ccache, too. I think that line may need to > be conditional on debug mode. That's relatively easily solvable by using a different debug format IIRC (/Z7 or such). Greetings, Andres Freund
On Sun, Feb 20, 2022 at 12:47:31PM -0800, Andres Freund wrote: > > Did you ever try to use clcache (or others) ? > > > > When I tried, it refused to cache because of our debug settings > > (DebugInformationFormat) - which seem to be enabled even in release mode. > > > I wonder if that'll be an issue for ccache, too. I think that line may need to > > be conditional on debug mode. > > That's relatively easily solvable by using a different debug format IIRC (/Z7 > or such). Yes. I got that working for CI by overriding with a value from the environment. https://cirrus-ci.com/task/6191974075072512 This is right after rebasing, so it doesn't save anything, but normally cuts build time to 90sec, which isn't impressive, but it's something. BTW, I think it's worth compiling the windows build with optimizations (as I did here). At least with all the tap tests, this pays for itself. I suppose you don't want to use a Release build, but optimizations could be enabled by an(other) environment variable. -- Justin
This is the other half of my CI patches, which are unrelated to the TAP ones on the other thread.
Attachments
- 0001-cirrus-include-hints-how-to-install-OS-packages.patch
- 0002-cirrus-upload-changed-html-docs-as-artifacts.patch
- 0003-s-build-docs-as-a-separate-task.patch
- 0004-wip-cirrus-code-coverage.patch
- 0005-cirrus-vcregress-test-modules-contrib-with-NO_INSTAL.patch
- 0006-wip-cirrus-windows-save-tmp_install-as-an-artifact.patch
- 0007-wip-cirrus-windows-add-compiler_warnings_script.patch
Hi,

> Subject: [PATCH 2/7] cirrus: upload changed html docs as artifacts

I still think the determination of the base branch needs to be resolved before this can be considered.

> Always run doc build; to allow them to be shown in cfbot, they should not be
> skipped if the linux build fails.
>
> This could be done on the client side (cfbot). One advantage of doing it here
> is that fewer docs are uploaded - many patches won't upload docs at all.

IMO this stuff is largely independent of the commit subject...

> XXX: if this is run in the same task, the configure flags should probably be
> consistent ?

What do you mean?

> Subject: [PATCH 3/7] s!build docs as a separate task..

Could you reorder this earlier? Then we can merge it before resolving the branch issues, and we don't waffle on the CompilerWarnings dependency.

> I believe this'll automatically show up as a separate "column" on the cfbot
> page.

Yup.

> +# Verify docs can be built, and upload changed docs as artifacts
> +task:
> +  name: HTML docs
> +
> +  env:
> +    CPUS: 1
> +
> +  only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*(docs|html).*'
> +
> +  container:
> +    image: $CONTAINER_REPO/linux_debian_bullseye_ci:latest
> +    cpu: $CPUS

How about using something like (the syntax might be slightly off)

  skip: !changesInclude('doc/**')

to avoid running it for the many pushes where no docs are changed?

> +  sysinfo_script: |
> +    id
> +    uname -a
> +    cat /proc/cmdline
> +    ulimit -a -H && ulimit -a -S
> +    export
> +
> +    git remote -v
> +    git branch -a
> +    git remote add postgres https://github.com/postgres/postgres
> +    time git fetch -v postgres master
> +    git log -1 postgres/master
> +    git diff --name-only postgres/master..

Hardly "sysinfo"?
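As a rough illustration of the ci-os-only convention encoded by the only_if expression quoted above (this is not the syntax cirrus evaluates - cirrus has its own expression language - just a hypothetical shell rendering of the same rule):

```shell
# Hypothetical shell rendering of the ci-os-only filter: run the task unless
# the commit message contains a "ci-os-only:" line that omits this task's OS.
ci_os_enabled() {
    msg=$1
    os=$2
    # No filter line at all: every OS runs.
    if ! printf '%s\n' "$msg" | grep -q '^ci-os-only:'; then
        return 0
    fi
    # Filter present: run only if this OS is named on that line.
    printf '%s\n' "$msg" | grep '^ci-os-only:' | grep -q "$os"
}
```

So a commit message containing "ci-os-only: freebsd" would enable only the freebsd task, while a message without the line enables everything.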
> Subject: [PATCH 4/7] wip: cirrus: code coverage > > XXX: lcov should be installed in the OS image FWIW, you can open a PR in https://github.com/anarazel/pg-vm-images/ both the debian docker and VM have their packages installed via scripts/linux_debian_install_deps.sh > From 226699150e3e224198fc297689add21bece51c4b Mon Sep 17 00:00:00 2001 > From: Justin Pryzby <pryzbyj@telsasoft.com> > Date: Sun, 9 Jan 2022 18:25:02 -0600 > Subject: [PATCH 5/7] cirrus/vcregress: test modules/contrib with > NO_INSTALLCHECK=1 I don't want to commit the vcregress.pl part myself. But if you split off I'm happy to push the --temp-config bits. > From 08933bcd93d4f57ad73ab6df2f1627b93e61b459 Mon Sep 17 00:00:00 2001 > From: Justin Pryzby <pryzbyj@telsasoft.com> > Date: Sun, 16 Jan 2022 12:51:13 -0600 > Subject: [PATCH 6/7] wip: cirrus/windows: save tmp_install as an artifact.. > > ..to allow users to easily download compiled binaries to try a patch. > If they don't have a development environment handy or not familiar with > compilation. > > XXX: maybe this should be conditional or commented out ? Yea, I don't want to do this by default, that's a fair bit of data that very likely nobody will ever access. One can make entire tasks triggered manually, but that'd then require building again :/. > From a7d2bba6f51d816412fb645b0d4821c36ee5c400 Mon Sep 17 00:00:00 2001 > From: Justin Pryzby <pryzbyj@telsasoft.com> > Date: Sun, 20 Feb 2022 15:01:59 -0600 > Subject: [PATCH 7/7] wip: cirrus/windows: add compiler_warnings_script > > I'm not sure how to write this test in windows shell; it's also not easy to > write it in posix sh, since windows shell is somehow interpretting && and ||... You could put the script in src/tools/ci and call it from the script to avoid the quoting issues. Would be good to add a comment explaining the fileLoggerParameters1 thing and a warning that compiler_warnings_script should be the last script. Greetings, Andres Freund
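Moving the check into src/tools/ci, as suggested, could be as small as this sketch (the helper name is made up, and the warnings-only log file name is an assumption based on the fileLoggerParameters1 setting under discussion):

```shell
# Hypothetical src/tools/ci helper: fail the task if msbuild's warnings-only
# file logger wrote anything, printing the warnings for the CI log.
check_msbuild_warnings() {
    logfile=$1
    # A non-empty warnings-only log means the build produced warnings.
    if test -s "$logfile"; then
        cat "$logfile"
        return 1
    fi
    return 0
}
```

Calling a script like this from the cirrus task sidesteps the && / || quoting problems of windows shell entirely.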
On 2022-02-26 17:09:08 -0800, Andres Freund wrote: > You could put the script in src/tools/ci and call it from the script to avoid > the quoting issues. Might also be a good idea for the bulk of the docs / coverage stuff, even if there are no quoting issues.
On Sat, Feb 26, 2022 at 05:09:08PM -0800, Andres Freund wrote:
> > XXX: if this is run in the same task, the configure flags should probably be
> > consistent ?
>
> What do you mean?

I mean that the commit to run CompilerWarnings unconditionally built docs with different flags than the other stuff in that task. If it's going to be a separate task, then that doesn't matter.

> > +# Verify docs can be built, and upload changed docs as artifacts
> > +task:
> > +  name: HTML docs
> > +
> > +  env:
> > +    CPUS: 1
> > +
> > +  only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*(docs|html).*'
> > +
> > +  container:
> > +    image: $CONTAINER_REPO/linux_debian_bullseye_ci:latest
> > +    cpu: $CPUS
> > +
>
> how about using something like (the syntax might be slightly off)
> skip: !changesInclude('doc/**')
> to avoid running it for the many pushes where no docs are changed?

This doesn't do the right thing - I just tried.

https://cirrus-ci.org/guide/writing-tasks/#environment-variables
| changesInclude function can be very useful for skipping some tasks when no changes to sources have been made since the last successful Cirrus CI build.

That means it will not normally rebuild docs (and then this still requires resolving the "base branch").

-- Justin
Hi, On 2022-02-26 20:43:52 -0600, Justin Pryzby wrote: > This doesn't do the right thing - I just tried. > https://cirrus-ci.org/guide/writing-tasks/#environment-variables > | changesInclude function can be very useful for skipping some tasks when no changes to sources have been made since the last successful Cirrus CI build. > That means it will not normally rebuild docs (and then this still requires > resolving the "base branch"). Why would we want to rebuild docs if they're the same as in the last build for the same branch? For cfbot purposes each commit is independent from the prior commit, so it should rebuild it every time if the CF entry has changes to the docs. Greetings, Andres Freund
On Sat, Feb 26, 2022 at 06:50:00PM -0800, Andres Freund wrote: > Hi, > > On 2022-02-26 20:43:52 -0600, Justin Pryzby wrote: > > This doesn't do the right thing - I just tried. > > https://cirrus-ci.org/guide/writing-tasks/#environment-variables > > | changesInclude function can be very useful for skipping some tasks when no changes to sources have been made since the last successful Cirrus CI build. > > > That means it will not normally rebuild docs (and then this still requires > > resolving the "base branch"). > > Why would we want to rebuild docs if they're the same as in the last build for > the same branch? For cfbot purposes each commit is independent from the prior > commit, so it should rebuild it every time if the CF entry has changes to the > docs. I did git commit --amend --no-edit and repushed to github to trigger a new CI run, and it did this: https://github.com/justinpryzby/postgres/runs/5347878714 This is in a branch with changes to doc. I wasn't intending it to skip building docs on this branch just because the same, changed docs were previously built. Why wouldn't the docs be built following the same logic as the rest of the sources ? If someone renames or removes an xref target, shouldn't CI fail on its next run for a patch which tries to reference it ? It would fail on the buildfarm, and I think one major use for the CI is to minimize the post-push cleanup cycles. Are you sure about cfbot ? AIUI cirrus would see that docs didn't change relative to the previous run for branch: commitfest/NN/MMMM. -- Justin
Hi, On 2022-02-26 21:10:57 -0600, Justin Pryzby wrote: > I did git commit --amend --no-edit and repushed to github to trigger a new CI > run, and it did this: https://github.com/justinpryzby/postgres/runs/5347878714 > > This is in a branch with changes to doc. I wasn't intending it to skip > building docs on this branch just because the same, changed docs were > previously built. But why not? If nothing in docs/ changes, there's little point? It'd probably be good to check .cirrus.yml and docs/**, to make sure that .cirrus logic changes rerun as well. > Why wouldn't the docs be built following the same logic as the rest of the > sources? Tests have a certain rate of spurious failure, so rerunning them makes sense. But what do we gain by rebuilding the docs? And especially, what do we gain by uploading the docs e.g. in the postgres/postgres repo? > If someone renames or removes an xref target, shouldn't CI fail on its next > run for a patch which tries to reference it ? Why wouldn't it? > It would fail on the buildfarm, and I think one major use for the CI is to > minimize the post-push cleanup cycles. I personally see it more as access to a "mini buildfarm" before patches are committable, but that's of course compatible. > Are you sure about cfbot ? AIUI cirrus would see that docs didn't change > relative to the previous run for branch: commitfest/NN/MMMM. Not entirely sure, but it's what I remember observing when amending commits in a repo using changesInclude. If I were to build a feature like it I'd look at the list of files of git diff $(git merge-base last_green new_commit)..new_commit or such. cfbot doesn't commit incrementally but replaces the prior commit, so I suspect it'll always be viewed as new. But who knows, shouldn't be hard to figure out. Greetings, Andres Freund
On Sat, Feb 26, 2022 at 08:08:38PM -0800, Andres Freund wrote: > On 2022-02-26 21:10:57 -0600, Justin Pryzby wrote: > > If someone renames or removes an xref target, shouldn't CI fail on its next > > run for a patch which tries to reference it ? > > Why wouldn't it? I suppose you're right - I was thinking that cirrus was checking whether the *patch* had changed any matching files, but it probably checks (as it should) whether "the sources" have changed. Hmm, it's behaving strangely...if there's a single argument ('docs/**'), then it will skip the docs task if I resubmit it after git commit --amend --no-edit. But with multiple args ('.cirrus.yaml', 'docs/**') it reruns it... I tried it as skip: !changesInclude() and by adding it to the existing only_if: (.. || ..) && changesInclude(). > > Are you sure about cfbot ? AIUI cirrus would see that docs didn't change > > relative to the previous run for branch: commitfest/NN/MMMM. > > Not entirely sure, but it's what I remember observing when ammending commits > in a repo using changesInclues. If I were to build a feature like it I'd look > at the list of files of > git diff $(git merge-base last_green new_commit)..new_commit > > or such. cfbot doesn't commit incrementally but replaces the prior commit, so > I suspect it'll always be viewn as new. But who knows, shouldn't be hard to > figure out. Anyway... I still think that if "Build Docs" is a separate cirrus task, it should rebuild docs on every CI run, even if they haven't changed, for any patch that touches docs/. It'll be confusing if cfbot shows 5 green circles and 4 of them were built 1 day ago, and 1 was built 3 weeks ago. Docs are the task that runs quickest, so I don't think it's worth doing anything special there (especially without understanding the behavior of changesInclude()). 
Also, to allow users to view the built HTML docs, cfbot would need to 1) keep track of previous CI runs; and 2) logic to handle "skipped" CI runs, to allow showing artifacts from the previous run. If it's not already done, I think the first half is a good idea on its own. But the 2nd part doesn't seem desirable. However, I realized that we can filter on cfbot with either of these: | $CIRRUS_CHANGE_TITLE =~ '^\[CF...' | git log -1 |grep '^Author: Commitfest Bot <cfbot@cputube.org>' If we can assume that cfbot will continue submitting branches as a single patch, this resolves the question of a "base branch", for cfbot. (Actually, I'd prefer if it preserved the original patches as separate commits, but that isn't what it does). These patches implement that idea, and make "code coverage" and "HTML diffs" stuff only run for cfbot commits. This still needs another round of testing, though. -- Justin PS. I've just done this. I'm unsure whether to say that it's wonderful or terrible. This would certainly be better if each branch preserved the original set of patches. $ git remote add cfbot https://github.com/postgresql-cfbot/postgresql $ git fetch cfbot $ git branch -a |grep -c cfbot 5417
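The second cfbot filter suggested above can be wrapped in a small helper; this is a sketch only, and it assumes cfbot keeps using exactly the author string quoted in the message:

```shell
# Hypothetical check for a cfbot-generated commit, based on the author line
# quoted above ("Commitfest Bot <cfbot@cputube.org>").
is_cfbot_commit() {
    git log -1 --format='%an <%ae>' "${1:-HEAD}" |
        grep -q '^Commitfest Bot <cfbot@cputube.org>$'
}
```

A task could then branch on this to decide whether the HEAD~1 "base branch" assumption is safe.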
Attachments
On Mon, Feb 28, 2022 at 02:58:02PM -0600, Justin Pryzby wrote: > I still think that if "Build Docs" is a separate cirrus task, it should rebuild > docs on every CI run, even if they haven't changed, for any patch that touches > docs/. It'll be confusing if cfbot shows 5 green circles and 4 of them were > built 1 day ago, and 1 was built 3 weeks ago. Docs are the task that runs > quickest, so I don't think it's worth doing anything special there (especially > without understanding the behavior of changesInclude()). > > Also, to allow users to view the built HTML docs, cfbot would need to 1) keep > track of previous CI runs; and 2) logic to handle "skipped" CI runs, to allow > showing artifacts from the previous run. If it's not already done, I think the > first half is a good idea on its own. But the 2nd part doesn't seem desirable. Maybe changesInclude() could work if we use this URL (from cirrus' documentation), which uses the artifacts from the last successful build. https://api.cirrus-ci.com/v1/artifact/github/justinpryzby/postgres/Documentation/html_docs/html_docs/00-doc.html?branch=citest-cirrus2 That requires knowing the file being modified, so we'd have to generate an index of changed files - which I've started doing here. > However, I realized that we can filter on cfbot with either of these: > | $CIRRUS_CHANGE_TITLE =~ '^\[CF...' > | git log -1 |grep '^Author: Commitfest Bot <cfbot@cputube.org>' > If we can assume that cfbot will continue submitting branches as a single > patch, this resolves the question of a "base branch", for cfbot. I don't know what you think of that idea, but I think I want to amend my proposal: show HTML and coverage artifacts for HEAD~1, unless set otherwise by an environment var. Today, that'd do the right thing for cfbot, and also for any 1-patch commits. > These patches implement that idea, and make "code coverage" and "HTML diffs" > stuff only run for cfbot commits. This still needs another round of testing, > though. 
The patch was missing a file due to an issue while rebasing - oops.

BTW (regarding the last patch), I just noticed that -Og optimization can cause warnings with gcc-4.8.5-39.el7.x86_64.

be-fsstubs.c: In function 'be_lo_export':
be-fsstubs.c:522:24: warning: 'fd' may be used uninitialized in this function [-Wmaybe-uninitialized]
  if (CloseTransientFile(fd) != 0)
  ^
trigger.c: In function 'ExecCallTriggerFunc':
trigger.c:2400:2: warning: 'result' may be used uninitialized in this function [-Wmaybe-uninitialized]
  return (HeapTuple) DatumGetPointer(result);
  ^
xml.c: In function 'xml_pstrdup_and_free':
xml.c:1205:2: warning: 'result' may be used uninitialized in this function [-Wmaybe-uninitialized]
  return result;

-- Justin
Attachments
- 0001-cirrus-include-hints-how-to-install-OS-packages.patch
- 0002-cirrus-build-docs-as-a-separate-task.patch
- 0003-cirrus-upload-changed-html-docs-as-artifacts.patch
- 0004-f-html-index-file.patch
- 0005-wip-cirrus-code-coverage.patch
- 0006-wip-cirrus-windows-add-compiler_warnings_script.patch
- 0007-cirrus-compile-with-Og.patch
On Sun, Feb 20, 2022 at 12:47:31PM -0800, Andres Freund wrote: > On 2022-02-20 13:36:55 -0600, Justin Pryzby wrote: > > Have you tried to use the yet-to-be-released ccache with MSVC ? > > Yes, it doesn't work, because it requires cl.exe to be used in a specific way > (only a single input file, specific output file naming). Which would require a > decent amount of changes to src/tools/msvc. I think it's more realistic with > meson etc. Did you get to the point where that causes a problem, or did you just realize that it was a limitation that seems to preclude its use ? If so, could you send the branch/commit you had ? The error I'm getting when I try to use ccache involves .rst files, which don't exist (and which ccache doesn't know how to find or ignore). https://cirrus-ci.com/task/5441491957972992 I gather this is the difference between "compiling with MSVC" and compiling with a visual studio project.
Hi, On 2022-03-03 22:56:15 -0600, Justin Pryzby wrote: > On Sun, Feb 20, 2022 at 12:47:31PM -0800, Andres Freund wrote: > > On 2022-02-20 13:36:55 -0600, Justin Pryzby wrote: > > > Have you tried to use the yet-to-be-released ccache with MSVC ? > > > > Yes, it doesn't work, because it requires cl.exe to be used in a specific way > > (only a single input file, specific output file naming). Which would require a > > decent amount of changes to src/tools/msvc. I think it's more realistic with > > meson etc. > > Did you get to the point that that causes a problem, or did you just realize > that it was a limitation that seems to preclude its use ? I tried to use it, but saw that no caching was happening, and debugged it. Which yielded that it can't be used due to the way output files are specified (and due to multiple files, but that can be prevented with an msbuild parameter). > If so, could you send the branch/commit you had ? This was in a local VM, not cirrus. I ended up making it work with the meson build, after a bit of fiddling. Although bypassing msbuild (by building with ninja, using cl.exe) is a larger win... > The error I'm getting when I try to use ccache involves .rst files, which don't > exist (and which ccache doesn't know how to find or ignore). > https://cirrus-ci.com/task/5441491957972992 ccache has code to deal with response files. I suspect the problem here is rather that ccache expects the compiler as an argument. > I gather this is the difference between "compiling with MSVC" and compiling > with a visual studio project. I doubt it's related to that. Greetings, Andres Freund
On Fri, Mar 04, 2022 at 05:30:03PM -0800, Andres Freund wrote: > I tried to use it, but saw that no caching was happening, and debugged > it. Which yielded that it can't be used due to the way output files are > specified (and due to multiple files, but that can be prevented with an > msbuild parameter). Could you give a hint about the msbuild param to avoid processing multiple files with cl.exe? I'm not able to find it... I don't know about the issue with output filenames .. -- Justin
Hi, On 2022-03-06 10:16:54 -0600, Justin Pryzby wrote: > On Fri, Mar 04, 2022 at 05:30:03PM -0800, Andres Freund wrote: > > I tried to use it, but saw that no caching was happening, and debugged > > it. Which yielded that it can't be used due to the way output files are > > specified (and due to multiple files, but that can be prevented with an > > msbuild parameter). > > Could you give a hint about to the msbuild param to avoid processing multiple > files with cl.exe? I'm not able to find it... /p:UseMultiToolTask=true > I don't know about the issue with output filenames .. I don't remember the precise details anymore, but it boils down to ccache requiring the output filename to be specified, but we only specify the output directory. Or very similar. Greetings, Andres Freund
On Mon, Mar 07, 2022 at 11:10:54AM -0800, Andres Freund wrote: > On 2022-03-06 10:16:54 -0600, Justin Pryzby wrote: > > On Fri, Mar 04, 2022 at 05:30:03PM -0800, Andres Freund wrote: > > > I tried to use it, but saw that no caching was happening, and debugged > > > it. Which yielded that it can't be used due to the way output files are > > > specified (and due to multiple files, but that can be prevented with an > > > msbuild parameter). > > > > Could you give a hint about to the msbuild param to avoid processing multiple > > files with cl.exe? I'm not able to find it... > > /p:UseMultiToolTask=true Wow - thanks ;) > > I don't know about the issue with output filenames .. > > I don't remember the precise details anymore, but it boils down to ccache > requiring the output filename to be specified, but we only specify the output > directory. Or very similar. There's already a problem report and PR for this. I didn't test it, but I hope it'll be fixed in their next minor release. https://github.com/ccache/ccache/issues/1018 -- Justin
I'm curious what you think of this patch. It makes check-world on freebsd over 30% faster - saving 5min. Apparently gcc -Og was added in gcc 4.8 (c. 2013).

On Wed, Mar 02, 2022 at 02:50:58PM -0600, Justin Pryzby wrote:
> From d180953d273c221a30c5e9ad8d74b1b4dfc60bd1 Mon Sep 17 00:00:00 2001
> From: Justin Pryzby <pryzbyj@telsasoft.com>
> Date: Sun, 27 Feb 2022 15:17:50 -0600
> Subject: [PATCH 7/7] cirrus: compile with -Og..
>
> To improve performance of check-world, and improve debugging, without
> significantly slower builds (they're cached anyway).
>
> This makes freebsd check-world run in 8.5 minutes rather than 15 minutes.
> ---
>  .cirrus.yml                      | 12 +++++++-----
>  src/tools/msvc/MSBuildProject.pm |  4 ++--
>  2 files changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/.cirrus.yml b/.cirrus.yml
> index 6f05d420c85..8b673bf58cf 100644
> --- a/.cirrus.yml
> +++ b/.cirrus.yml
> @@ -113,7 +113,7 @@ task:
>        \
>        CC="ccache cc" \
>        CXX="ccache c++" \
> -      CFLAGS="-O0 -ggdb"
> +      CFLAGS="-Og -ggdb"
>      EOF
>    build_script: su postgres -c "gmake -s -j${BUILD_JOBS} world-bin"
>    upload_caches: ccache
> @@ -208,8 +208,8 @@ task:
>        CC="ccache gcc" \
>        CXX="ccache g++" \
>        CLANG="ccache clang" \
> -      CFLAGS="-O0 -ggdb" \
> -      CXXFLAGS="-O0 -ggdb"
> +      CFLAGS="-Og -ggdb" \
> +      CXXFLAGS="-Og -ggdb"
>      EOF
>    build_script: su postgres -c "make -s -j${BUILD_JOBS} world-bin"
>    upload_caches: ccache
> @@ -329,8 +329,8 @@ task:
>        CC="ccache cc" \
>        CXX="ccache c++" \
>        CLANG="ccache ${brewpath}/llvm/bin/ccache" \
> -      CFLAGS="-O0 -ggdb" \
> -      CXXFLAGS="-O0 -ggdb" \
> +      CFLAGS="-Og -ggdb" \
> +      CXXFLAGS="-Og -ggdb" \
>        \
>        LLVM_CONFIG=${brewpath}/llvm/bin/llvm-config \
>        PYTHON=python3
> @@ -383,6 +383,8 @@ task:
>    # -fileLoggerParameters1: write to msbuild.warn.log.
>    MSBFLAGS: -m -verbosity:minimal "-consoleLoggerParameters:Summary;ForceNoAlign" /p:TrackFileAccess=false -nologo -fileLoggerParameters1:warningsonly;logfile=msbuild.warn.log
>
> +  MSBUILD_OPTIMIZE: MaxSpeed
> +
>    # If tests hang forever, cirrus eventually times out. In that case log
>    # output etc is not uploaded, making the problem hard to debug. Of course
>    # tests internally should have shorter timeouts, but that's proven to not
> diff --git a/src/tools/msvc/MSBuildProject.pm b/src/tools/msvc/MSBuildProject.pm
> index 5e312d232e9..05e0c41eb5c 100644
> --- a/src/tools/msvc/MSBuildProject.pm
> +++ b/src/tools/msvc/MSBuildProject.pm
> @@ -85,7 +85,7 @@ EOF
>  		$f, 'Debug',
>  		{
>  			defs    => "_DEBUG;DEBUG=1",
> -			opt     => 'Disabled',
> +			opt     => $ENV{MSBUILD_OPTIMIZE} || 'Disabled',
>  			strpool => 'false',
>  			runtime => 'MultiThreadedDebugDLL'
>  		});
> @@ -94,7 +94,7 @@ EOF
>  		'Release',
>  		{
>  			defs    => "",
> -			opt     => 'Full',
> +			opt     => $ENV{MSBUILD_OPTIMIZE} || 'Full',
>  			strpool => 'true',
>  			runtime => 'MultiThreadedDLL'
>  		});
> --
> 2.17.1
>
Hi, On 2022-03-09 11:47:23 -0600, Justin Pryzby wrote: > I'm curious what you think of this patch. > > It makes check-world on freebsd over 30% faster - saving 5min. That's nice! While -Og makes interactive debugging noticeably harder IME, it's not likely to be a large enough difference just for backtraces etc. I'm far less convinced that using "MaxSpeed" for the msvc build is a good idea. I've looked at one or two backtraces of optimized msvc builds and backtraces were quite a bit worse - and they're not great to start with. What was the win there? Greetings, Andres Freund
On Wed, Mar 09, 2022 at 10:12:54AM -0800, Andres Freund wrote:
> On 2022-03-09 11:47:23 -0600, Justin Pryzby wrote:
> > I'm curious what you think of this patch.
> >
> > It makes check-world on freebsd over 30% faster - saving 5min.
>
> That's nice! While -Og makes interactive debugging noticeably harder IME, it's
> not likely to be a large enough difference just for backtraces etc.

Yeah. gcc(1) claims that -Og can improve debugging. I should've mentioned that this seems to mitigate the performance effect of --coverage on linux, too.

> I'm far less convinced that using "MaxSpeed" for the msvc build is a good
> idea. I've looked at one or two backtraces of optimized msvc builds and
> backtraces were quite a bit worse - and they're not great to start with. What
> was the win there?

Did you compare FULL optimization or "favor speed/size" or "default" optimization ? It's worth trading some build time (especially with a compiler cache) for test time (especially with all tap tests). But I didn't check backtraces, and I didn't compare the various optimization options. The argument may not be as strong for windows, since it has no build cache (and it has no -Og). We'd save a bit more when also running the other tap tests.

CI runs are probably not very consistent, but I've just run https://cirrus-ci.com/task/5236145167532032 - master is the average of 4 patches at the top of cfbot.

               master   patched   change
subscription   197s     195s      +2s
recovery       234s     212s      -22s
bin            383s     373s      -10s

-- Justin
On Thu, Mar 10, 2022 at 9:37 AM Justin Pryzby <pryzby@telsasoft.com> wrote: > On Wed, Mar 09, 2022 at 10:12:54AM -0800, Andres Freund wrote: > > On 2022-03-09 11:47:23 -0600, Justin Pryzby wrote: > > > I'm curious what you think of this patch. > > > > > > It makes check-world on freebsd over 30% faster - saving 5min. > > > > That's nice! While -Og makes interactive debugging noticeably harder IME, it's > > not likely to be a large enough difference just for backtraces etc. > > Yeah. gcc(1) claims that -Og can improve debugging. Wow, I see the effect on Cirrus -- test_world ran in 8:55 instead of 12:43 when I tried (terrible absolute times, but fantastic improvement!). Hmm, on my local FreeBSD 13 box I saw 5:07 -> 5:03 with this change. My working theory had been that there is something bad happening in the I/O stack under concurrency making FreeBSD on Cirrus/GCP very slow (ie patterns to stall on slow cloud I/O waits, something I hope to dig into when post-freeze round tuits present themselves), but that doesn't gel with this huge improvement from noodling with optimiser details, and I don't know why I don't see something similar locally. I'm confused. Just BTW it's kinda funny that we say -ggdb for macOS and then we use lldb to debug cores in cores_backtrace.sh. I suppose it would be more correct to say -glldb, but doubt it matters much...
Hi, On 2022-03-10 15:43:16 +1300, Thomas Munro wrote: > Wow, I see the effect on Cirrus -- test_world ran in 8:55 instead of > 12:43 when I tried (terrible absolute times, but fantastic > improvement!). Hmm, on my local FreeBSD 13 box I saw 5:07 -> 5:03 > with this change. My working theory had been that there is something > bad happening in the I/O stack under concurrency making FreeBSD on > Cirrus/GCP very slow (ie patterns to stall on slow cloud I/O waits, > something I hope to dig into when post-freeze round tuits present > themselves), but that doesn't gel with this huge improvement from > noodling with optimiser details, and I don't know why I don't see > something similar locally. I'm confused. The "terrible IO wait" thing was before we reduced the number of CPUs and concurrent jobs. It makes sense to me that with just two CPUs we're CPU bound, in which case -Og obviously can make a difference. > Just BTW it's kinda funny that we say -ggdb for macOS and then we use > lldb to debug cores in cores_backtrace.sh. I suppose it would be more > correct to say -glldb, but doubt it matters much... Yea. I used -ggdb because I didn't know -glldb existed :). And there's also the advantage that -ggdb works both with gcc and clang, whereas -glldb doesn't seem to be known to gcc. Greetings, Andres Freund
On Thu, Mar 10, 2022 at 4:33 PM Andres Freund <andres@anarazel.de> wrote:
> On 2022-03-10 15:43:16 +1300, Thomas Munro wrote:
> > I'm confused.
>
> The "terrible IO wait" thing was before we reduced the number of CPUs and
> concurrent jobs. It makes sense to me that with just two CPUs we're CPU bound,
> in which case -Og obviously can make a difference.

Oh, duh, yeah, that makes sense when you put it that way and considering the CPU graph:

-O0: https://cirrus-ci.com/task/4578631912521728
-Og: https://cirrus-ci.com/task/5239486182326272
Hi,

On 2022-03-02 14:50:58 -0600, Justin Pryzby wrote:
> From 883edaa653bcf7f1a2369d8edf46eaaac1ba0ba2 Mon Sep 17 00:00:00 2001
> From: Justin Pryzby <pryzbyj@telsasoft.com>
> Date: Mon, 17 Jan 2022 00:53:04 -0600
> Subject: [PATCH 1/7] cirrus: include hints how to install OS packages..
>
> This is useful for patches during development, but once a feature is merged,
> new libraries should be added to the OS image files, rather than installed
> during every CI run forever into the future.
> ---
>  .cirrus.yml | 16 +++++++++++++---
>  1 file changed, 13 insertions(+), 3 deletions(-)
>
> diff --git a/.cirrus.yml b/.cirrus.yml
> index d10b0a82f9f..1b7c36283e9 100644
> --- a/.cirrus.yml
> +++ b/.cirrus.yml
> @@ -73,10 +73,11 @@ task:
>      chown -R postgres:postgres .
>      mkdir -p ${CCACHE_DIR}
>      chown -R postgres:postgres ${CCACHE_DIR}
> -  setup_cores_script: |
> +  setup_os_script: |
>      mkdir -m 770 /tmp/cores
>      chown root:postgres /tmp/cores
>      sysctl kern.corefile='/tmp/cores/%N.%P.core'
> +    #pkg install -y ...

Would you mind if I split this into setup_core_files_script: and setup_additional_packages_script:?

> +  # The commit that this branch is rebased on. There's no easy way to find this.
> +  # This does the right thing for cfbot, which always squishes all patches into a single commit.
> +  # And does the right thing for any 1-patch commits.
> +  # Patch series manually submitted to cirrus may benefit from setting this
> +  # to the number of patches in the series (or directly to the commit the series was rebased on).
> +  BASE_COMMIT: HEAD~1

I still think that something using git merge-base $CIRRUS_LAST_GREEN_CHANGE HEAD might be better, with a bit of error handling for an unset CIRRUS_LAST_GREEN_CHANGE and for git not seeing enough history to reach CIRRUS_LAST_GREEN_CHANGE.

> +    apt-get update
> +    apt-get -y install lcov

FWIW, I just added that to the install script used for the container / VM image build. So it'll be pre-installed once that completes.
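The merge-base idea with the error handling Andres mentions might be sketched like this (a hypothetical helper, not the committed behavior; the function name is made up):

```shell
#!/bin/sh
# Hypothetical sketch: pick a base commit for "what changed on this
# branch" style checks.  Prefer the merge-base with the last green CI
# build, falling back to HEAD~1 when CIRRUS_LAST_GREEN_CHANGE is unset
# or the commit isn't present in a (possibly shallow) clone.
base_commit() {
  if [ -n "${CIRRUS_LAST_GREEN_CHANGE:-}" ] &&
     git cat-file -e "${CIRRUS_LAST_GREEN_CHANGE}^{commit}" 2>/dev/null
  then
    git merge-base "$CIRRUS_LAST_GREEN_CHANGE" HEAD
  else
    git rev-parse HEAD~1
  fi
}
```

A CI step could then diff against `$(base_commit)` instead of a hard-coded BASE_COMMIT.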
https://cirrus-ci.com/build/5818788821073920

> From feceea4413b84f478e6a0888cdfab4be1c80767a Mon Sep 17 00:00:00 2001
> From: Justin Pryzby <pryzbyj@telsasoft.com>
> Date: Sun, 20 Feb 2022 15:01:59 -0600
> Subject: [PATCH 6/7] wip: cirrus/windows: add compiler_warnings_script
>
> I'm not sure how to write this test in windows shell; it's also not easy to
> write it in posix sh, since windows shell is somehow interpreting && and ||...

That comment isn't accurate anymore now that it's in an external script, right?

Greetings,

Andres Freund
On Thu, Mar 10, 2022 at 12:50:15PM -0800, Andres Freund wrote:
> > -  setup_cores_script: |
> > +  setup_os_script: |
> >     mkdir -m 770 /tmp/cores
> >     chown root:postgres /tmp/cores
> >     sysctl kern.corefile='/tmp/cores/%N.%P.core'
> > +   #pkg install -y ...
>
> Would you mind if I split this into setup_core_files_script and
> setup_additional_packages_script:?

That's fine. FYI, I'm also planning on using choco install --no-progress. I could resend my latest patches shortly.

> > Subject: [PATCH 6/7] wip: cirrus/windows: add compiler_warnings_script
> >
> > I'm not sure how to write this test in windows shell; it's also not easy to
> > write it in posix sh, since windows shell is somehow interpreting && and ||...
>
> That comment isn't accurate anymore now that it's in an external script,
> right?

No, it is still accurate. What I mean is that it's also hard to write it as a one-liner using posix sh, since the || (and &&) seemed to be interpreted by cmd.exe and needed escaping - gross.

--
Justin
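The external-script approach being discussed could look roughly like this (a hypothetical sketch with assumed names; the actual compiler_warnings_script and log format may differ). Keeping the logic in a .sh file means cmd.exe never sees the shell operators that caused the escaping trouble:

```shell
#!/bin/sh
# Hypothetical sketch of an external compiler-warnings check: grep the
# build log for MSVC warning lines ("...: warning C1234: ...") and fail
# the CI step if any are found.  Running this from a script file avoids
# cmd.exe interpreting the '&&' / '||' a one-liner would need.
check_warnings() {
  if grep -q ': warning C' "$1"; then
    echo "compiler warnings found in $1"
    return 1
  fi
  echo "no compiler warnings in $1"
  return 0
}
```

A CI task would then invoke it as e.g. `sh check_warnings.sh msbuild.log` (file names assumed).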
See attached, or at https://github.com/justinpryzby/postgres/runs/5503079878
Attachments
- 0001-cirrus-include-hints-how-to-install-OS-packages.patch
- 0002-cirrus-compile-with-Og.patch
- 0003-cirrus-windows-add-compiler_warnings_script.patch
- 0004-cirrus-code-coverage.patch
- 0005-cirrus-build-docs-as-a-separate-task.patch
- 0006-cirrus-upload-changed-html-docs-as-artifacts.patch
- 0007-f-html-index-file.patch
Hi,

Pushed 0001, 0002. The only change I made was to add DEBIAN_FRONTEND=noninteractive to the apt-get invocations, because some packages will fail / complain verbosely if there's no interactive prompt during installation.

Greetings,

Andres Freund
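For reference, the adjusted invocation might look like this in a .cirrus.yml script block (a sketch of the idea, not the exact committed hunk):

```yaml
# Sketch (assumed layout): DEBIAN_FRONTEND=noninteractive keeps debconf
# from prompting -- or failing loudly -- when packages are installed
# with no TTY attached, as in CI.
setup_additional_packages_script: |
  export DEBIAN_FRONTEND=noninteractive
  apt-get update
  apt-get -y install lcov
```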
On Fri, Mar 18, 2022 at 03:45:03PM -0700, Andres Freund wrote:
> Pushed 0001, 0002. Only change I made was to add

Thanks - is there any reason not to do the MSVC compiler warnings one, too? I see that it'll warn about issues with at least 3 patches (including one of yours).

--
Justin
Hi,

On 2022-03-22 23:14:23 -0500, Justin Pryzby wrote:
> On Fri, Mar 18, 2022 at 03:45:03PM -0700, Andres Freund wrote:
> > Pushed 0001, 0002. Only change I made was to add
>
> Thanks - is there any reason not to do the MSVC compiler warnings one, too?

Purely a lack of round tuits. IIRC I thought there was a small aspect that still needed some polishing, but I haven't had time to tackle it yet.

Greetings,

Andres Freund
On Thu, Mar 10, 2022 at 9:37 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> -Og

Adding this to CXXFLAGS caused a torrent of warnings from g++ about LLVM headers, which I also see on my local system for LLVM 11 and LLVM 14:

[19:47:11.047] /usr/lib/llvm-11/include/llvm/ADT/Twine.h: In member function ‘llvm::CallInst* llvm::IRBuilderBase::CreateCall(llvm::FunctionType*, llvm::Value*, llvm::ArrayRef<llvm::Value*>, const llvm::Twine&, llvm::MDNode*)’:
[19:47:11.047] /usr/lib/llvm-11/include/llvm/ADT/Twine.h:229:16: warning: ‘<anonymous>.llvm::Twine::LHS.llvm::Twine::Child::twine’ may be used uninitialized in this function [-Wmaybe-uninitialized]
[19:47:11.047]   229 |           !LHS.twine->isBinary())
[19:47:11.047]       |            ~~~~^~~~~

https://cirrus-ci.com/task/5597526098182144?logs=build#L6

Not sure who to complain to about that...
On Thu, Mar 24, 2022 at 09:52:39AM +1300, Thomas Munro wrote:
> On Thu, Mar 10, 2022 at 9:37 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > -Og
>
> Adding this to CXXFLAGS caused a torrent of warnings from g++ about
> LLVM headers, which I also see on my local system for LLVM 11 and LLVM
> 14:

Yes, I mentioned seeing some other warnings here: 20220302205058.GJ15744@telsasoft.com. I think the warnings were cleaned up with -O0/-O1/-O2 but not with -Og.
Hi,

Now that zstd is used, enable it in CI. I plan to commit this shortly, unless somebody sees a reason not to do so.

Greetings,

Andres Freund
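The patch itself isn't quoted in the archive, but the shape of such a change is roughly the following (a hypothetical sketch only -- the package name, task, and option placement are assumptions, not the committed hunk):

```yaml
# Hypothetical sketch of enabling zstd in a CI task: install the
# development library in the packages step and enable it at configure
# time so the zstd code paths are built and exercised by the tests.
setup_additional_packages_script: |
  apt-get -y install libzstd-dev
configure_script: |
  ./configure --with-zstd ...
```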
Attachments
On Wed, Oct 6, 2021 at 5:01 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> BTW, on those two OSes there are some messages like this each time a
> submake dumps its output to the log:
>
> [03:36:16.591] fcntl(): Bad file descriptor
>
> It seems worth putting up with these compared to the alternatives of
> either not using -j, not using -Otarget and having the output of
> parallel tests all mashed up and unreadable (that still happens
> sometimes but it's unlikely, because the submakes write() whole output
> chunks at infrequent intervals), or redirecting to a file so you can't
> see the realtime test output on the main CI page (not so fun, you have
> to wait until it's finished and view it as an 'artifact'). I tried to
> write a patch for GNU make to fix that[1], let's see if something
> happens.
>
> [1] https://savannah.gnu.org/bugs/?52922

For the record, GNU make finally fixed this problem (though Andres found a workaround anyway), but in any case the fix won't be in the relevant package repos before we switch over to Meson/Ninja :-)