Обсуждение: Bogus reports from coverage.postgresql.org
I happened to notice that if you look at https://coverage.postgresql.org/src/backend/optimizer/util/predtest.c.gcov.html it claims that the two occurrences of elog(ERROR, "predicate_classify returned a bogus value"); at lines 492 and 803 are reached. This would be quite astonishing if true, but it isn't true, since the regression tests aren't reporting any such failure. Needless to say, this puts quite a damper on my faith in the coverage reports. Maybe that machine needs a software update? regards, tom lane
On 2018-03-09 19:36:28 -0500, Tom Lane wrote: > I happened to notice that if you look at > https://coverage.postgresql.org/src/backend/optimizer/util/predtest.c.gcov.html > it claims that the two occurrences of > elog(ERROR, "predicate_classify returned a bogus value"); > at lines 492 and 803 are reached. > > This would be quite astonishing if true, but it isn't true, since > the regression tests aren't reporting any such failure. > > Needless to say, this puts quite a damper on my faith in the coverage > reports. Maybe that machine needs a software update? Is it possible that the compiler figured that the patch is unreachable and removed the code? Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2018-03-09 19:36:28 -0500, Tom Lane wrote: >> I happened to notice that if you look at >> https://coverage.postgresql.org/src/backend/optimizer/util/predtest.c.gcov.html >> it claims that the two occurrences of >> elog(ERROR, "predicate_classify returned a bogus value"); >> at lines 492 and 803 are reached. > Is it possible that the compiler figured that the patch is unreachable > and removed the code? Maybe, but then it's even less reasonable to claim that the lines were reached. In general, lcov seems to ignore lines for which no code was generated, which is fine by me. regards, tom lane
Tom Lane wrote: > I happened to notice that if you look at > https://coverage.postgresql.org/src/backend/optimizer/util/predtest.c.gcov.html > it claims that the two occurrences of > elog(ERROR, "predicate_classify returned a bogus value"); > at lines 492 and 803 are reached. > > This would be quite astonishing if true, but it isn't true, since > the regression tests aren't reporting any such failure. > > Needless to say, this puts quite a damper on my faith in the coverage > reports. Maybe that machine needs a software update? Hmm, wow. The count for that line is the same as the count at function entrance, so maybe it is being considered as the normal output path of the function somehow. I suspect this is a bug in gcov. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 3/9/18 19:36, Tom Lane wrote: > I happened to notice that if you look at > https://coverage.postgresql.org/src/backend/optimizer/util/predtest.c.gcov.html > it claims that the two occurrences of > elog(ERROR, "predicate_classify returned a bogus value"); > at lines 492 and 803 are reached. > > This would be quite astonishing if true, but it isn't true, since > the regression tests aren't reporting any such failure. > > Needless to say, this puts quite a damper on my faith in the coverage > reports. Maybe that machine needs a software update? I can't reproduce this locally, so yeah, maybe we should check whether the toolchain could be updated. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sun, Mar 11, 2018 at 3:08 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
On 3/9/18 19:36, Tom Lane wrote:
> I happened to notice that if you look at
> https://coverage.postgresql.org/src/backend/optimizer/ util/predtest.c.gcov.html
> it claims that the two occurrences of
> elog(ERROR, "predicate_classify returned a bogus value");
> at lines 492 and 803 are reached.
>
> This would be quite astonishing if true, but it isn't true, since
> the regression tests aren't reporting any such failure.
>
> Needless to say, this puts quite a damper on my faith in the coverage
> reports. Maybe that machine needs a software update?
I can't reproduce this locally, so yeah, maybe we should check whether
the toolchain could be updated.
Is it specifically "lcov" we are considering here? If so, we are on version 1.13, and there is no newer version packaged for debian or from what I can tell even released?
Magnus Hagander <magnus@hagander.net> writes: > On Sun, Mar 11, 2018 at 3:08 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: >> On 3/9/18 19:36, Tom Lane wrote: >>> I happened to notice that if you look at >>> https://coverage.postgresql.org/src/backend/optimizer/util/predtest.c.gcov.html >>> it claims that the two occurrences of >>> elog(ERROR, "predicate_classify returned a bogus value"); >>> at lines 492 and 803 are reached. >> I can't reproduce this locally, so yeah, maybe we should check whether >> the toolchain could be updated. Now that I look, I don't see this behavior on my local machine (RHEL6, gcc 4.4.7, lcov 1.10) either. > Is it specifically "lcov" we are considering here? If so, we are on version > 1.13, and there is no newer version packaged for debian or from what I can > tell even released? It could also be a compiler issue, I believe. regards, tom lane
I wrote: > Magnus Hagander <magnus@hagander.net> writes: >> Is it specifically "lcov" we are considering here? If so, we are on version >> 1.13, and there is no newer version packaged for debian or from what I can >> tell even released? > It could also be a compiler issue, I believe. Actually, looking closer, it appears that lcov is just translating gcov's textual output into HTML, so that the error is probably to be blamed on gcc or gcov. And at least on RHEL, those come from the same package: $ rpm -qf /usr/bin/gcov gcc-4.4.7-18.el6.x86_64 So if you're up2date on gcc, we might have material for a gcc bug report here. I'll try to reproduce it on recent Fedora, and if so file something at Red Hat. regards, tom lane
On Sun, Mar 11, 2018 at 6:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> Is it specifically "lcov" we are considering here? If so, we are on version
>> 1.13, and there is no newer version packaged for debian or from what I can
>> tell even released?
> It could also be a compiler issue, I believe.
Actually, looking closer, it appears that lcov is just translating
gcov's textual output into HTML, so that the error is probably to
be blamed on gcc or gcov. And at least on RHEL, those come from
the same package:
$ rpm -qf /usr/bin/gcov
gcc-4.4.7-18.el6.x86_64
So if you're up2date on gcc, we might have material for a gcc bug report
here. I'll try to reproduce it on recent Fedora, and if so file something
at Red Hat.
The current gcc version in debin stable is:
mha@galvin:~$ gcc --version
gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
(and yes, gcov is in the gcc package on Debian as well)
Magnus Hagander <magnus@hagander.net> writes: > On Sun, Mar 11, 2018 at 6:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> So if you're up2date on gcc, we might have material for a gcc bug report >> here. I'll try to reproduce it on recent Fedora, and if so file something >> at Red Hat. > The current gcc version in debin stable is: > mha@galvin:~$ gcc --version > gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516 Hm. The closest I'm going to be able to get on Red Hat distros is one of f22/gcc.spec:%global gcc_version 5.3.1 f23/gcc.spec:%global gcc_version 5.3.1 f24/gcc.spec:%global gcc_version 6.3.1 f25/gcc.spec:%global gcc_version 6.4.1 f26/gcc.spec:%global gcc_version 7.3.1 f27/gcc.spec:%global gcc_version 7.3.1 f28/gcc.spec:%global gcc_version 8.0.1 I'll try this on 7.3.1, since I already have that on a handy laptop. But if it doesn't reproduce there I think the answer is going to be that Debian is shipping an old buggy gcov. regards, tom lane
I wrote: > I'll try this on 7.3.1, since I already have that on a handy laptop. > But if it doesn't reproduce there I think the answer is going to be > that Debian is shipping an old buggy gcov. Huh. 7.3.1 shows the bug. I'll install F28 and see if it's still there. Peter, what gcc/gcov version did you test? regards, tom lane
I wrote: > Huh. 7.3.1 shows the bug. Wait, scratch that, I misread it. The gcov output on 7.3.1 looks like 4127: 478: case CLASS_ATOM: -: 479: -: 480: /* -: 481: * atom => atom is the base case -: 482: */ -: 483: return 4127: 484: predicate_implied_by_simple_clause((Expr *\ ) predicate, call 0 returned 100% -: 485: \ clause, -: 486: \ weak); -: 487: } #####: 488: break; -: 489: } -: 490: -: 491: /* can't get here */ #####: 492: elog(ERROR, "predicate_classify returned a bogus value"); call 0 never executed call 1 never executed call 2 never executed -: 493: return false; -: 494:} -: 495: -: 496:/*---------- -: 497: * predicate_refuted_by_recurse which is correct: it's showing that the elog line is never executed. So apparently, this bug is specific to gcc 6.3.0 and maybe a few versions on either side of that. regards, tom lane
On 3/11/18 15:24, Tom Lane wrote: > So apparently, this bug is specific to gcc 6.3.0 and maybe a few > versions on either side of that. I tried it on Debian stable with gcc 6.3.0 and couldn't reproduce it. cc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516 -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sun, Mar 11, 2018 at 8:52 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
On 3/11/18 15:24, Tom Lane wrote:
> So apparently, this bug is specific to gcc 6.3.0 and maybe a few
> versions on either side of that.
I tried it on Debian stable with gcc 6.3.0 and couldn't reproduce it.
cc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
Hmm. So it works for you on exactly the same version that the server runs. That's interesting. Is there something wrong in how we run it? I wonder if its broken by ccache. What we have is:
# current ccache is useless, because 3.1 doesn't work with -fprofile-arcs.
# Bug is fixed in 3.2, so maybe someday when this is updated to a new debian
# release, this will be useful.
export PATH=/usr/lib/ccache:$PATH
./configure --enable-depend --enable-coverage --enable-tap-tests --enable-nls --with-python --with-perl --with-tcl --with-openssl --with-libxml --with-ldap --with-pam >> $LOG 2>&1
# run tests
make -j4 >> $LOG 2>&1
make -j4 -C contrib >> $LOG 2>&1
make check-world >> $LOG 2>&1
make -C src/test/ssl check >> $LOG 2>&1
make -C src/test/ldap check >> $LOG 2>&1
make coverage-html >> $LOG 2>&1
We do now have ccache 3.3.4, so that comment sounds like it's out of date. But maybe something else broke...
Does that sound likely? We could try to comment out that line and see if it helps?
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > I tried it on Debian stable with gcc 6.3.0 and couldn't reproduce it. > cc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516 Oh ... *that's* interesting. Are we sure this is the version in use on coverage.postgresql.org? Also, I just finished searching in vain for any plausibly-matching bug at https://gcc.gnu.org/bugzilla/. I did find a few reports suggesting that gcov could misbehave with nondefault build options such as LTO. And "gcov -a" has also had issues. So maybe we ought to have a look at exactly what coverage.postgresql.org's build recipe is. regards, tom lane
Magnus Hagander <magnus@hagander.net> writes: > Hmm. So it works for you on exactly the same version that the server runs. > That's interesting. Is there something wrong in how we run it? I wonder if > its broken by ccache. What we have is: It might be worth blowing away ccache's cache just to see if it's cached anything bogus. regards, tom lane
On Sun, Mar 11, 2018 at 9:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Magnus Hagander <magnus@hagander.net> writes:
> Hmm. So it works for you on exactly the same version that the server runs.
> That's interesting. Is there something wrong in how we run it? I wonder if
> its broken by ccache. What we have is:
It might be worth blowing away ccache's cache just to see if it's cached
anything bogus.
I've done that and completed a run. AFAICT it's still showing the same issue, right?
Magnus Hagander <magnus@hagander.net> writes: > On Sun, Mar 11, 2018 at 9:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> It might be worth blowing away ccache's cache just to see if it's cached >> anything bogus. > I've done that and completed a run. AFAICT it's still showing the same > issue, right? Yeah, looks like it to me. 'Tis a puzzlement. regards, tom lane
On 3/11/18 16:05, Tom Lane wrote: > It might be worth blowing away ccache's cache just to see if it's cached > anything bogus. --enable-coverage disables ccache, so that's not going to be it. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Mar 12, 2018 at 3:53 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
On 3/11/18 16:05, Tom Lane wrote:
> It might be worth blowing away ccache's cache just to see if it's cached
> anything bogus.
--enable-coverage disables ccache, so that's not going to be it.
That certainly makes that coding put into the script (which has been there for a long time) completely wrong :)
That said, it completed with that row commented out, and AFAICT the issue is still lthere :( So much for that theory.
Greetings, * Tom Lane (tgl@sss.pgh.pa.us) wrote: > Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > > I tried it on Debian stable with gcc 6.3.0 and couldn't reproduce it. > > cc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516 > > Oh ... *that's* interesting. Are we sure this is the version in use > on coverage.postgresql.org? Yes, it's the same, but this isn't a gcov issue, I don't think. > Also, I just finished searching in vain for any plausibly-matching > bug at https://gcc.gnu.org/bugzilla/. I did find a few reports > suggesting that gcov could misbehave with nondefault build options > such as LTO. And "gcov -a" has also had issues. So maybe we ought > to have a look at exactly what coverage.postgresql.org's build > recipe is. Running gcov on the box itself in the source dir shows the same results which you got: --- -: 487: } #####: 488: break; -: 489: } -: 490: -: 491: /* can't get here */ #####: 492: elog(ERROR, "predicate_classify returned a bogus value"); -: 493: return false; -: 494:} --- Which seems to indicate that this actually is some kind of lcov bug. That makes more sense too since lcov is 1.13 on coverage.p.o, but you used 1.10 and said you didn't see an issue. Not sure if you have access to 1.13 easily, but if so, could be useful to see if you're seeing the same behavior under 1.13 where the gcov output is correct but lcov output isn't. Thanks! Stephen
Вложения
Stephen Frost <sfrost@snowman.net> writes: > Running gcov on the box itself in the source dir shows the same results > which you got: Oh really! > Which seems to indicate that this actually is some kind of lcov bug. Yeah. I'd written that off as too low-probability to worry about, but maybe not. I'll poke further. Peter, which lcov did you test? regards, tom lane
Stephen Frost <sfrost@snowman.net> writes: > Which seems to indicate that this actually is some kind of lcov bug. > That makes more sense too since lcov is 1.13 on coverage.p.o, but you > used 1.10 and said you didn't see an issue. Not sure if you have access > to 1.13 easily, but if so, could be useful to see if you're seeing the > same behavior under 1.13 where the gcov output is correct but lcov > output isn't. Hpmh. So I tested Fedora 26, which has lcov 1.12, and that seems fine. Then I installed pre-release Fedora 28, which has lcov 1.13, and that does not work at all: $ make coverage-html /usr/bin/lcov --gcov-tool /usr/bin/gcov -q --no-external -c -i -d . -d . -o lcov_base.info geninfo: WARNING: /home/tgl/pgsql/src/backend/catalog/aclchk.gcno: Overlong record at end of file! geninfo: WARNING: /home/tgl/pgsql/src/backend/catalog/pg_subscription.gcno: Overlong record at end of file! geninfo: WARNING: /home/tgl/pgsql/src/backend/catalog/pg_db_role_setting.gcno: Overlong record at end of file! ... lots more ... geninfo: WARNING: /home/tgl/pgsql/src/backend/storage/lmgr/predicate.gcno: Overlong record at end of file! geninfo: ERROR: /home/tgl/pgsql/src/backend/storage/lmgr/lwlocknames.gcno: reached unexpected end of file make: *** [src/Makefile.global:911: lcov_base.info] Error 255 make: *** Deleting file 'lcov_base.info' Eyeing the commit history at https://github.com/linux-test-project/lcov/commits/master makes it appear that lcov doesn't work with gcc 8.0 yet (well, they committed support last week, but there's no release yet). So this is basically Fedora breakage ... but it means I don't have an easy way to check 1.13 except by installing handmade packages, which would probably not be definitive proof one way or the other. regards, tom lane
I wrote: > So this is basically Fedora breakage ... > but it means I don't have an easy way to check 1.13 except > by installing handmade packages, which would probably > not be definitive proof one way or the other. I overcame my trepidation after noting that the F26 and F28 lcov.spec files were nigh identical, and built lcov 1.13 on F26. And it's fine too: $ grep bogus coverage/src/backend/optimizer/util/predtest.c.* coverage/src/backend/optimizer/util/predtest.c.gcov.html:<span class="lineNum"> 492 </span><span class="lineNoCov"> 0 : elog(ERROR, "predicate_classify returned a bogus value");</span> coverage/src/backend/optimizer/util/predtest.c.gcov.html:<span class="lineNum"> 803 </span><span class="lineNoCov"> 0 : elog(ERROR, "predicate_classify returned a bogus value");</span> (span class="lineNoCov" is what we want to see here). The underlying compiler is gcc 7.3.1 in this case. So it seems we've reduced the possibilities to these: either there's something weird about Debian's packaging of lcov, or there's a bug in some underlying perl module on Debian, or there's some incompatibility between gcc 6.3.0 and lcov 1.13. The last seems the most likely. The gcov text output looks about the same on 7.3.1: $ grep bogus src/backend/optimizer/util/predtest.c.gcov #####: 492: elog(ERROR, "predicate_classify returned a bogus value"); #####: 803: elog(ERROR, "predicate_classify returned a bogus value"); but now that we know that lcov looks directly at the .gcno files, and possibly also .gcda (I've not examined the source code), I doubt that we should trust that "the gcov text output looks the same" means anything compatibility-wise. regards, tom lane
On 3/13/18 13:45, Tom Lane wrote: > Stephen Frost <sfrost@snowman.net> writes: >> Running gcov on the box itself in the source dir shows the same results >> which you got: > > Oh really! > >> Which seems to indicate that this actually is some kind of lcov bug. > > Yeah. I'd written that off as too low-probability to worry about, > but maybe not. I'll poke further. Peter, which lcov did you test? lcov: LCOV version 1.13 The one that comes with Debian stable, presumably the same that is running on coverage.p.o. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > On 3/13/18 13:45, Tom Lane wrote: >> Yeah. I'd written that off as too low-probability to worry about, >> but maybe not. I'll poke further. Peter, which lcov did you test? > lcov: LCOV version 1.13 > The one that comes with Debian stable, presumably the same that is > running on coverage.p.o. OK, now we're getting into the *really* low-probability cases. Like, maybe it's time to remove and reinstall lcov and/or gcc on that box? regards, tom lane
Tom Lane wrote: > Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > > On 3/13/18 13:45, Tom Lane wrote: > >> Yeah. I'd written that off as too low-probability to worry about, > >> but maybe not. I'll poke further. Peter, which lcov did you test? > > > lcov: LCOV version 1.13 > > The one that comes with Debian stable, presumably the same that is > > running on coverage.p.o. > > OK, now we're getting into the *really* low-probability cases. Like, > maybe it's time to remove and reinstall lcov and/or gcc on that box? I checked on my own Debian box, which shows $ gcov --version gcov (Debian 6.3.0-18+deb9u1) 6.3.0 20170516 $ lcov --version lcov: LCOV version 1.13 and what I get is: 492 0 : elog(ERROR, "predicate_classify returned a bogus value"); 803 0 : elog(ERROR, "predicate_classify returned a bogus value"); so. In the coverage.pg.org box, we have: $ cat .lcovrc lcov_branch_coverage = 1 I'm going to do a test run with that disabled and see what happens. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera wrote: > In the coverage.pg.org box, we have: > > $ cat .lcovrc > lcov_branch_coverage = 1 > > I'm going to do a test run with that disabled and see what happens. So the counters for the "bogus" line are reported as zero now, but we lost branch-based numbers. I suppose we can live with that, if we don't want to try and track down the bug. I suppose you could try to reproduce it locally with that setting. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > In the coverage.pg.org box, we have: > $ cat .lcovrc > lcov_branch_coverage = 1 Oh, that's critical information not mentioned before :-(. On my Fedora 26 box (gcc 7.3.1) I can reproduce the problem after setting up ~/.lcovrc that way: $ grep bogus coverage/src/backend/optimizer/util/predtest.c.* coverage/src/backend/optimizer/util/predtest.c.gcov.html:<span class="lineNum"> 492 </span> :<span class="lineCov"> 10208 : elog(ERROR, "predicate_classify returned a bogus value");</span> coverage/src/backend/optimizer/util/predtest.c.gcov.html:<span class="lineNum"> 803 </span> :<span class="lineCov"> 59482 : elog(ERROR, "predicate_classify returned a bogus value");</span> (note "lineCov" instead of expected "lineNoCov"). This happens with either lcov 1.12 or 1.13. I also notice an interesting warning with either version: $ make coverage-html /usr/bin/lcov --gcov-tool /usr/bin/gcov -q --no-external -c -i -d . -d . -o lcov_base.info geninfo: Note: --initial does not generate branch coverage data /usr/bin/lcov --gcov-tool /usr/bin/gcov -q --no-external -c -d . -d . -o lcov_test.info rm -rf coverage /usr/bin/genhtml -q --legend -o coverage --title='PostgreSQL 11devel' --num-spaces=4 --prefix='/home/tgl/pgsql' lcov_base.infolcov_test.info touch coverage-html-stamp No idea what "--initial" refers to, but it suggests that we're misusing the tool somehow with this configuration. regards, tom lane
I wrote: > I also notice an interesting warning with either version: > $ make coverage-html > /usr/bin/lcov --gcov-tool /usr/bin/gcov -q --no-external -c -i -d . -d . -o lcov_base.info > geninfo: Note: --initial does not generate branch coverage data > /usr/bin/lcov --gcov-tool /usr/bin/gcov -q --no-external -c -d . -d . -o lcov_test.info > rm -rf coverage > /usr/bin/genhtml -q --legend -o coverage --title='PostgreSQL 11devel' --num-spaces=4 --prefix='/home/tgl/pgsql' lcov_base.infolcov_test.info > touch coverage-html-stamp > No idea what "--initial" refers to, but it suggests that we're > misusing the tool somehow with this configuration. So I went and read the lcov man page, and according to it we're doing this entirely wrong. The correct process, saith lcov, is -i --initial Capture initial zero coverage data. Run lcov with -c and this option on the directories containing .bb, .bbg or .gcno files before running any test case. The result is a "baseline" coverage data file that contains zero coverage for every instrumented line. Combine this data file (using lcov -a) with coverage data files captured after a test run to ensure that the percentage of total lines covered is cor- rect even when not all source code files were loaded during the test. Recommended procedure when capturing data for a test case: 1. create baseline coverage data file # lcov -c -i -d appdir -o app_base.info 2. perform test # appdir/test 3. create test coverage data file # lcov -c -d appdir -o app_test.info 4. combine baseline and test coverage data # lcov -a app_base.info -a app_test.info -o app_total.info Our process swaps steps 1 and 2. What I now suspect is that we accidentally get away with that for basic mode, but it doesn't work with lcov_branch_coverage. regards, tom lane
I wrote: > Our process swaps steps 1 and 2. What I now suspect is that > we accidentally get away with that for basic mode, but it > doesn't work with lcov_branch_coverage. I tested that theory by manually doing things in the "correct" order, and it didn't change the results: $ /usr/bin/lcov --gcov-tool /usr/bin/gcov -q --no-external -c -i -d . -d . -o lcov_base.info geninfo: Note: --initial does not generate branch coverage data $ make check ... $ /usr/bin/lcov --gcov-tool /usr/bin/gcov -q --no-external -c -d . -d . -o lcov_test.info $ rm -rf coverage $ /usr/bin/genhtml -q --legend -o coverage --title='PostgreSQL 11devel' --num-spaces=4 --prefix='/home/tgl/pgsql' lcov_base.infolcov_test.info $ grep bogus coverage/src/backend/optimizer/util/predtest.c.* coverage/src/backend/optimizer/util/predtest.c.gcov.html:<span class="lineNum"> 492 </span> :<span class="lineCov"> 10186 : elog(ERROR, "predicate_classify returned a bogus value");</span> coverage/src/backend/optimizer/util/predtest.c.gcov.html:<span class="lineNum"> 803 </span> :<span class="lineCov"> 59482 : elog(ERROR, "predicate_classify returned a bogus value");</span> I'm still a bit suspicious that we may have a process problem, but it's looking like there may be grounds for an lcov bug report. regards, tom lane
I wrote: > I'm still a bit suspicious that we may have a process problem, > but it's looking like there may be grounds for an lcov bug > report. Well, this is darn interesting. I got the Fedora lcov maintainer to push an update absorbing the upstream "gcc 8" fixes. (Turned out he'd already done that for rawhide, but forgot to push it into the F28 branch.) And with that, and gcc 8.0.1, ... no bug. The lines are marked "lineNoCov" with or without lcov_branch_coverage. The commit message for said upstream patch is even more interesting: Subject: [PATCH] geninfo: Add gcc 8 support Fix errors and incorrect data when trying to collect coverage data for programs compiled with gcc 8. Covers the following gcov-related changes in gcc: .gcov-file format: - Line coverage data can appear multiple times for the same line - Line coverage count can be suffixed by '*' to indicated unexecuted basic blocks in that line .gcno-file format: - new header field 'support unexecuted blocks flag' - new function record fields 'column number', 'ending line number', and 'compiler-generated entity flag' Perhaps I'm reading too much into this, but what it sounds like is that gcc's coverage data output format was incapable of correctly representing the situation before gcc 8, and thus that it's not really lcov's fault that we get questionable output. regards, tom lane
Tom Lane wrote: > Well, this is darn interesting. I got the Fedora lcov maintainer > to push an update absorbing the upstream "gcc 8" fixes. (Turned > out he'd already done that for rawhide, but forgot to push it into > the F28 branch.) And with that, and gcc 8.0.1, ... no bug. The > lines are marked "lineNoCov" with or without lcov_branch_coverage. Hmm. I wonder if this means that the reports generated with any compiler prior to gcc 8 are unreliable. At least we know now that that is indeed the case with branch coverage, but what about without? While we're on this topic ... Some time ago, I looked into whether it would be possible to make the coverage report ignore the elog(ERROR) lines, which are --or should be-- unreachable code and thus we don't care too much about test coverage. Finding no way to implement that, I gave up (I tried adding exclusion markers in the elog definition, as documented in geninfo. Perhaps I did it wrong). But maybe it is possible with these recent improvements? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > Hmm. I wonder if this means that the reports generated with any > compiler prior to gcc 8 are unreliable. At least we know now that that > is indeed the case with branch coverage, but what about without? Well, that's probably an overly strong conclusion; if lcov were broken in general, people would've noticed before now. I have a question in to the lcov mailing list at sourceforge to see if anyone wants to offer a more informed opinion, though. In the short term it seems clear that we'd better turn off lcov_branch_coverage at coverage.postgresql.org, as I see you've already done. regards, tom lane
On 3/14/18 12:31, Tom Lane wrote: > Recommended procedure when capturing data for a test case: > > 1. create baseline coverage data file > # lcov -c -i -d appdir -o app_base.info > > 2. perform test > # appdir/test > > 3. create test coverage data file > # lcov -c -d appdir -o app_test.info > > 4. combine baseline and test coverage data > # lcov -a app_base.info -a app_test.info -o > app_total.info > > Our process swaps steps 1 and 2. That doesn't matter because 1 and 3 are independent of another. The pipeline is 1 ------\ |--> 4 2 -> 3 -/ Our makefiles are set up in exactly this way and may run 1 and 3 in parallel. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
I wrote: > ... I have a question in > to the lcov mailing list at sourceforge to see if anyone wants to offer > a more informed opinion, though. And the authoritative answer is: the bug is in gcc/gcov. The branch_coverage switch implies passing "-a" to gcov, and if you do that (with "make coverage GCOVFLAGS=-a" in our infrastructure), you get output like this (tested with gcc 7.3.1): -: 485: clause, -: 486: weak); -: 487: } #####: 488: break; $$$$$: 488-block 0 -: 489: } -: 490: -: 491: /* can't get here */ 5134: 492: elog(ERROR, "predicate_classify returned a bogus value"); $$$$$: 492-block 0 call 0 never executed call 1 never executed call 2 never executed 5134: 492-block 1 -: 493: return false; -: 494:} which is clearly claiming that the second basic block in line 492 gets executed. With gcc 8.0.1 I instead get: -: 485: clause, -: 486: weak); -: 487: } #####: 488: break; %%%%%: 488-block 0 -: 489: } -: 490: -: 491: /* can't get here */ #####: 492: elog(ERROR, "predicate_classify returned a bogus value"); %%%%%: 492-block 0 call 0 never executed call 1 never executed call 2 never executed -: 493: return false; -: 494:} The fact that they changed the gcov data format at the same time hints that it may be unfixable without such a change, though I've not tried to excavate in their changelog for more info. It could also just be a code generation change, seeing that 8.0.1 is not claiming there are 2 basic blocks anymore. (Hm, I wonder whether our "pg_unreachable" stuff confuses the earlier releases.) regards, tom lane
On 2018-Mar-12, Peter Eisentraut wrote: > On 3/11/18 16:05, Tom Lane wrote: > > It might be worth blowing away ccache's cache just to see if it's cached > > anything bogus. > > --enable-coverage disables ccache, so that's not going to be it. (Ref. src/Makefile.global.in lines 484ff setting CCACHE_DISABLE=1) Why does it do that? Now that ccache is about 5 years less buggy than it was when you wrote this, can we get rid of this behavior? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2019-May-30, Alvaro Herrera wrote: > On 2018-Mar-12, Peter Eisentraut wrote: > > > On 3/11/18 16:05, Tom Lane wrote: > > > It might be worth blowing away ccache's cache just to see if it's cached > > > anything bogus. > > > > --enable-coverage disables ccache, so that's not going to be it. > > (Ref. src/Makefile.global.in lines 484ff setting CCACHE_DISABLE=1) > > Why does it do that? Now that ccache is about 5 years less buggy than > it was when you wrote this, can we get rid of this behavior? I think I found the answer in this commit: https://github.com/ccache/ccache/commit/02d3f078bd2495b8db2264ae0b2c692b4c5ba1bd "Add support for coverage (compiling for gcov)" committed for ccache 3.2.2. So I think we could remove that stuff now ... -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > On 2019-May-30, Alvaro Herrera wrote: >> On 2018-Mar-12, Peter Eisentraut wrote: >>> --enable-coverage disables ccache, so that's not going to be it. > I think I found the answer in this commit: > https://github.com/ccache/ccache/commit/02d3f078bd2495b8db2264ae0b2c692b4c5ba1bd > "Add support for coverage (compiling for gcov)" > committed for ccache 3.2.2. > So I think we could remove that stuff now ... I'd be a bit sad if we did, because where I usually do coverage checking is my RHEL6 box, which has $ ccache --version ccache version 3.1.6 I'll probably bite the bullet and upgrade that box sometime in the next year or so, but it's not really a near-term project. Is there any big hurry for having ccache for coverage builds? regards, tom lane
On 2019-06-01 19:10, Tom Lane wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: >> On 2019-May-30, Alvaro Herrera wrote: >>> On 2018-Mar-12, Peter Eisentraut wrote: >>>> --enable-coverage disables ccache, so that's not going to be it. > >> I think I found the answer in this commit: >> https://github.com/ccache/ccache/commit/02d3f078bd2495b8db2264ae0b2c692b4c5ba1bd >> "Add support for coverage (compiling for gcov)" >> committed for ccache 3.2.2. > >> So I think we could remove that stuff now ... > > I'd be a bit sad if we did, because where I usually do coverage checking > is my RHEL6 box, which has > > $ ccache --version > ccache version 3.1.6 > > I'll probably bite the bullet and upgrade that box sometime in the > next year or so, but it's not really a near-term project. Is there > any big hurry for having ccache for coverage builds? I happened to come across this again. I think we should make the change Álvaro was proposing. The fix in ccache is 5 years old now. For older ccache versions, you can always set CCACHE_DISABLE yourself. Doing it the other way around is hard or impossible with the current setup. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > On 2019-06-01 19:10, Tom Lane wrote: >> I'll probably bite the bullet and upgrade that box sometime in the >> next year or so, but it's not really a near-term project. Is there >> any big hurry for having ccache for coverage builds? > I happened to come across this again. I think we should make the change > Álvaro was proposing. The fix in ccache is 5 years old now. Fair enough, I can just install a newer ccache version locally. But should we worry about any buildfarm members running old ccache versions? It'd probably be polite to notify the owners list about what the minimum ccache version is going to be. regards, tom lane
On 2019-09-17 22:12, Tom Lane wrote: > Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: >> On 2019-06-01 19:10, Tom Lane wrote: >>> I'll probably bite the bullet and upgrade that box sometime in the >>> next year or so, but it's not really a near-term project. Is there >>> any big hurry for having ccache for coverage builds? > >> I happened to come across this again. I think we should make the change >> Álvaro was proposing. The fix in ccache is 5 years old now. > > Fair enough, I can just install a newer ccache version locally. > > But should we worry about any buildfarm members running old ccache > versions? It'd probably be polite to notify the owners list about > what the minimum ccache version is going to be. Buildfarm members are not running coverage, are they? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > On 2019-09-17 22:12, Tom Lane wrote: >> But should we worry about any buildfarm members running old ccache >> versions? It'd probably be polite to notify the owners list about >> what the minimum ccache version is going to be. > Buildfarm members are not running coverage, are they? Ah, of course not. I'd forgotten the details of what we were concerned about. Objection withdrawn. regards, tom lane
On 2019-09-17 22:32, Tom Lane wrote: > Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: >> On 2019-09-17 22:12, Tom Lane wrote: >>> But should we worry about any buildfarm members running old ccache >>> versions? It'd probably be polite to notify the owners list about >>> what the minimum ccache version is going to be. > >> Buildfarm members are not running coverage, are they? > > Ah, of course not. I'd forgotten the details of what we were > concerned about. Objection withdrawn. committed -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services