Обсуждение: Bogus reports from coverage.postgresql.org

Поиск
Список
Период
Сортировка

Bogus reports from coverage.postgresql.org

От
Tom Lane
Дата:
I happened to notice that if you look at
https://coverage.postgresql.org/src/backend/optimizer/util/predtest.c.gcov.html
it claims that the two occurrences of
    elog(ERROR, "predicate_classify returned a bogus value");
at lines 492 and 803 are reached.

This would be quite astonishing if true, but it isn't true, since
the regression tests aren't reporting any such failure.

Needless to say, this puts quite a damper on my faith in the coverage
reports.  Maybe that machine needs a software update?

            regards, tom lane


Re: Bogus reports from coverage.postgresql.org

От
Andres Freund
Дата:
On 2018-03-09 19:36:28 -0500, Tom Lane wrote:
> I happened to notice that if you look at
> https://coverage.postgresql.org/src/backend/optimizer/util/predtest.c.gcov.html
> it claims that the two occurrences of
>     elog(ERROR, "predicate_classify returned a bogus value");
> at lines 492 and 803 are reached.
> 
> This would be quite astonishing if true, but it isn't true, since
> the regression tests aren't reporting any such failure.
> 
> Needless to say, this puts quite a damper on my faith in the coverage
> reports.  Maybe that machine needs a software update?

Is it possible that the compiler figured that the patch is unreachable
and removed the code?

Greetings,

Andres Freund


Re: Bogus reports from coverage.postgresql.org

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> On 2018-03-09 19:36:28 -0500, Tom Lane wrote:
>> I happened to notice that if you look at
>> https://coverage.postgresql.org/src/backend/optimizer/util/predtest.c.gcov.html
>> it claims that the two occurrences of
>> elog(ERROR, "predicate_classify returned a bogus value");
>> at lines 492 and 803 are reached.

> Is it possible that the compiler figured that the patch is unreachable
> and removed the code?

Maybe, but then it's even less reasonable to claim that the lines
were reached.

In general, lcov seems to ignore lines for which no code was generated,
which is fine by me.

            regards, tom lane


Re: Bogus reports from coverage.postgresql.org

От
Alvaro Herrera
Дата:
Tom Lane wrote:
> I happened to notice that if you look at
> https://coverage.postgresql.org/src/backend/optimizer/util/predtest.c.gcov.html
> it claims that the two occurrences of
>     elog(ERROR, "predicate_classify returned a bogus value");
> at lines 492 and 803 are reached.
> 
> This would be quite astonishing if true, but it isn't true, since
> the regression tests aren't reporting any such failure.
> 
> Needless to say, this puts quite a damper on my faith in the coverage
> reports.  Maybe that machine needs a software update?

Hmm, wow.  The count for that line is the same as the count at function
entrance, so maybe it is being considered as the normal output path of
the function somehow.  I suspect this is a bug in gcov.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Bogus reports from coverage.postgresql.org

От
Peter Eisentraut
Дата:
On 3/9/18 19:36, Tom Lane wrote:
> I happened to notice that if you look at
> https://coverage.postgresql.org/src/backend/optimizer/util/predtest.c.gcov.html
> it claims that the two occurrences of
>     elog(ERROR, "predicate_classify returned a bogus value");
> at lines 492 and 803 are reached.
> 
> This would be quite astonishing if true, but it isn't true, since
> the regression tests aren't reporting any such failure.
> 
> Needless to say, this puts quite a damper on my faith in the coverage
> reports.  Maybe that machine needs a software update?

I can't reproduce this locally, so yeah, maybe we should check whether
the toolchain could be updated.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Bogus reports from coverage.postgresql.org

От
Magnus Hagander
Дата:


On Sun, Mar 11, 2018 at 3:08 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
On 3/9/18 19:36, Tom Lane wrote:
> I happened to notice that if you look at
> https://coverage.postgresql.org/src/backend/optimizer/util/predtest.c.gcov.html
> it claims that the two occurrences of
>       elog(ERROR, "predicate_classify returned a bogus value");
> at lines 492 and 803 are reached.
>
> This would be quite astonishing if true, but it isn't true, since
> the regression tests aren't reporting any such failure.
>
> Needless to say, this puts quite a damper on my faith in the coverage
> reports.  Maybe that machine needs a software update?

I can't reproduce this locally, so yeah, maybe we should check whether
the toolchain could be updated.

Is it specifically "lcov" we are considering here? If so, we are on version 1.13, and there is no newer version packaged for debian or from what I can tell even released?

--

Re: Bogus reports from coverage.postgresql.org

От
Tom Lane
Дата:
Magnus Hagander <magnus@hagander.net> writes:
> On Sun, Mar 11, 2018 at 3:08 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
>> On 3/9/18 19:36, Tom Lane wrote:
>>> I happened to notice that if you look at
>>> https://coverage.postgresql.org/src/backend/optimizer/util/predtest.c.gcov.html
>>> it claims that the two occurrences of
>>> elog(ERROR, "predicate_classify returned a bogus value");
>>> at lines 492 and 803 are reached.

>> I can't reproduce this locally, so yeah, maybe we should check whether
>> the toolchain could be updated.

Now that I look, I don't see this behavior on my local machine (RHEL6,
gcc 4.4.7, lcov 1.10) either.

> Is it specifically "lcov" we are considering here? If so, we are on version
> 1.13, and there is no newer version packaged for debian or from what I can
> tell even released?

It could also be a compiler issue, I believe.

            regards, tom lane


Re: Bogus reports from coverage.postgresql.org

От
Tom Lane
Дата:
I wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> Is it specifically "lcov" we are considering here? If so, we are on version
>> 1.13, and there is no newer version packaged for debian or from what I can
>> tell even released?

> It could also be a compiler issue, I believe.

Actually, looking closer, it appears that lcov is just translating
gcov's textual output into HTML, so that the error is probably to
be blamed on gcc or gcov.  And at least on RHEL, those come from
the same package:

$ rpm -qf /usr/bin/gcov
gcc-4.4.7-18.el6.x86_64

So if you're up2date on gcc, we might have material for a gcc bug report
here.  I'll try to reproduce it on recent Fedora, and if so file something
at Red Hat.

            regards, tom lane


Re: Bogus reports from coverage.postgresql.org

От
Magnus Hagander
Дата:
On Sun, Mar 11, 2018 at 6:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> Is it specifically "lcov" we are considering here? If so, we are on version
>> 1.13, and there is no newer version packaged for debian or from what I can
>> tell even released?

> It could also be a compiler issue, I believe.

Actually, looking closer, it appears that lcov is just translating
gcov's textual output into HTML, so that the error is probably to
be blamed on gcc or gcov.  And at least on RHEL, those come from
the same package:

$ rpm -qf /usr/bin/gcov
gcc-4.4.7-18.el6.x86_64

So if you're up2date on gcc, we might have material for a gcc bug report
here.  I'll try to reproduce it on recent Fedora, and if so file something
at Red Hat.

The current gcc version in debin stable is:

mha@galvin:~$ gcc --version
gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516

(and yes, gcov is in the gcc package on Debian as well)

--

Re: Bogus reports from coverage.postgresql.org

От
Tom Lane
Дата:
Magnus Hagander <magnus@hagander.net> writes:
> On Sun, Mar 11, 2018 at 6:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> So if you're up2date on gcc, we might have material for a gcc bug report
>> here.  I'll try to reproduce it on recent Fedora, and if so file something
>> at Red Hat.

> The current gcc version in debin stable is:
> mha@galvin:~$ gcc --version
> gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516

Hm.  The closest I'm going to be able to get on Red Hat distros is one of

f22/gcc.spec:%global gcc_version 5.3.1
f23/gcc.spec:%global gcc_version 5.3.1
f24/gcc.spec:%global gcc_version 6.3.1
f25/gcc.spec:%global gcc_version 6.4.1
f26/gcc.spec:%global gcc_version 7.3.1
f27/gcc.spec:%global gcc_version 7.3.1
f28/gcc.spec:%global gcc_version 8.0.1

I'll try this on 7.3.1, since I already have that on a handy laptop.
But if it doesn't reproduce there I think the answer is going to be
that Debian is shipping an old buggy gcov.

            regards, tom lane


Re: Bogus reports from coverage.postgresql.org

От
Tom Lane
Дата:
I wrote:
> I'll try this on 7.3.1, since I already have that on a handy laptop.
> But if it doesn't reproduce there I think the answer is going to be
> that Debian is shipping an old buggy gcov.

Huh.  7.3.1 shows the bug.  I'll install F28 and see if it's still
there.  Peter, what gcc/gcov version did you test?

            regards, tom lane


Re: Bogus reports from coverage.postgresql.org

От
Tom Lane
Дата:
I wrote:
> Huh.  7.3.1 shows the bug.

Wait, scratch that, I misread it.  The gcov output on 7.3.1 looks like

     4127:  478:                                case CLASS_ATOM:
        -:  479:
        -:  480:                                        /*
        -:  481:                                         * atom => atom is the base case
        -:  482:                                         */
        -:  483:                                        return
     4127:  484:                                                predicate_implied_by_simple_clause((Expr *\
) predicate,
call    0 returned 100%
        -:  485:                                                                                          \
                         clause,
        -:  486:                                                                                          \
                         weak);
        -:  487:                        }
    #####:  488:                        break;
        -:  489:        }
        -:  490:
        -:  491:        /* can't get here */
    #####:  492:        elog(ERROR, "predicate_classify returned a bogus value");
call    0 never executed
call    1 never executed
call    2 never executed
        -:  493:        return false;
        -:  494:}
        -:  495:
        -:  496:/*----------
        -:  497: * predicate_refuted_by_recurse

which is correct: it's showing that the elog line is never executed.

So apparently, this bug is specific to gcc 6.3.0 and maybe a few
versions on either side of that.

            regards, tom lane


Re: Bogus reports from coverage.postgresql.org

От
Peter Eisentraut
Дата:
On 3/11/18 15:24, Tom Lane wrote:
> So apparently, this bug is specific to gcc 6.3.0 and maybe a few
> versions on either side of that.

I tried it on Debian stable with gcc 6.3.0 and couldn't reproduce it.

cc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Bogus reports from coverage.postgresql.org

От
Magnus Hagander
Дата:

On Sun, Mar 11, 2018 at 8:52 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
On 3/11/18 15:24, Tom Lane wrote:
> So apparently, this bug is specific to gcc 6.3.0 and maybe a few
> versions on either side of that.

I tried it on Debian stable with gcc 6.3.0 and couldn't reproduce it.

cc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516


Hmm. So it works for you on exactly the same version that the server runs. That's interesting. Is there something wrong in how we run it? I wonder if its broken by ccache. What we have is:

# current ccache is useless, because 3.1 doesn't work with -fprofile-arcs.
# Bug is fixed in 3.2, so maybe someday when this is updated to a new debian
# release, this will be useful.
export PATH=/usr/lib/ccache:$PATH

./configure --enable-depend --enable-coverage --enable-tap-tests --enable-nls --with-python --with-perl --with-tcl --with-openssl --with-libxml --with-ldap --with-pam >> $LOG 2>&1

# run tests
make -j4 >> $LOG 2>&1
make -j4 -C contrib >> $LOG 2>&1
make check-world >> $LOG 2>&1
make -C src/test/ssl check >> $LOG 2>&1
make -C src/test/ldap check >> $LOG 2>&1

make coverage-html >> $LOG 2>&1



We do now have ccache 3.3.4, so that comment sounds like it's out of date. But maybe something else broke...

Does that sound likely? We could try to comment out that line and see if it helps?


--

Re: Bogus reports from coverage.postgresql.org

От
Tom Lane
Дата:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> I tried it on Debian stable with gcc 6.3.0 and couldn't reproduce it.
> cc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516

Oh ... *that's* interesting.  Are we sure this is the version in use
on coverage.postgresql.org?

Also, I just finished searching in vain for any plausibly-matching
bug at https://gcc.gnu.org/bugzilla/.  I did find a few reports
suggesting that gcov could misbehave with nondefault build options
such as LTO.  And "gcov -a" has also had issues.  So maybe we ought
to have a look at exactly what coverage.postgresql.org's build
recipe is.

            regards, tom lane


Re: Bogus reports from coverage.postgresql.org

От
Tom Lane
Дата:
Magnus Hagander <magnus@hagander.net> writes:
> Hmm. So it works for you on exactly the same version that the server runs.
> That's interesting. Is there something wrong in how we run it? I wonder if
> its broken by ccache. What we have is:

It might be worth blowing away ccache's cache just to see if it's cached
anything bogus.

            regards, tom lane


Re: Bogus reports from coverage.postgresql.org

От
Magnus Hagander
Дата:
On Sun, Mar 11, 2018 at 9:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Magnus Hagander <magnus@hagander.net> writes:
> Hmm. So it works for you on exactly the same version that the server runs.
> That's interesting. Is there something wrong in how we run it? I wonder if
> its broken by ccache. What we have is:

It might be worth blowing away ccache's cache just to see if it's cached
anything bogus.

I've done that and completed a run. AFAICT it's still showing the same issue, right?

--

Re: Bogus reports from coverage.postgresql.org

От
Tom Lane
Дата:
Magnus Hagander <magnus@hagander.net> writes:
> On Sun, Mar 11, 2018 at 9:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> It might be worth blowing away ccache's cache just to see if it's cached
>> anything bogus.

> I've done that and completed a run. AFAICT it's still showing the same
> issue, right?

Yeah, looks like it to me.  'Tis a puzzlement.

            regards, tom lane


Re: Bogus reports from coverage.postgresql.org

От
Peter Eisentraut
Дата:
On 3/11/18 16:05, Tom Lane wrote:
> It might be worth blowing away ccache's cache just to see if it's cached
> anything bogus.

--enable-coverage disables ccache, so that's not going to be it.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Bogus reports from coverage.postgresql.org

От
Magnus Hagander
Дата:
On Mon, Mar 12, 2018 at 3:53 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
On 3/11/18 16:05, Tom Lane wrote:
> It might be worth blowing away ccache's cache just to see if it's cached
> anything bogus.

--enable-coverage disables ccache, so that's not going to be it.


That certainly makes that coding put into the script (which has been there for a long time) completely wrong :)

That said, it completed with that row commented out, and AFAICT the issue is still lthere :( So much for that theory. 

--

Re: Bogus reports from coverage.postgresql.org

От
Stephen Frost
Дата:
Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> > I tried it on Debian stable with gcc 6.3.0 and couldn't reproduce it.
> > cc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
>
> Oh ... *that's* interesting.  Are we sure this is the version in use
> on coverage.postgresql.org?

Yes, it's the same, but this isn't a gcov issue, I don't think.

> Also, I just finished searching in vain for any plausibly-matching
> bug at https://gcc.gnu.org/bugzilla/.  I did find a few reports
> suggesting that gcov could misbehave with nondefault build options
> such as LTO.  And "gcov -a" has also had issues.  So maybe we ought
> to have a look at exactly what coverage.postgresql.org's build
> recipe is.

Running gcov on the box itself in the source dir shows the same results
which you got:

---
        -:  487:                        }
    #####:  488:                        break;
        -:  489:        }
        -:  490:
        -:  491:        /* can't get here */
    #####:  492:        elog(ERROR, "predicate_classify returned a bogus value");
        -:  493:        return false;
        -:  494:}
---

Which seems to indicate that this actually is some kind of lcov bug.
That makes more sense too since lcov is 1.13 on coverage.p.o, but you
used 1.10 and said you didn't see an issue.  Not sure if you have access
to 1.13 easily, but if so, could be useful to see if you're seeing the
same behavior under 1.13 where the gcov output is correct but lcov
output isn't.

Thanks!

Stephen

Вложения

Re: Bogus reports from coverage.postgresql.org

От
Tom Lane
Дата:
Stephen Frost <sfrost@snowman.net> writes:
> Running gcov on the box itself in the source dir shows the same results
> which you got:

Oh really!

> Which seems to indicate that this actually is some kind of lcov bug.

Yeah.  I'd written that off as too low-probability to worry about,
but maybe not.  I'll poke further.  Peter, which lcov did you test?

            regards, tom lane


Re: Bogus reports from coverage.postgresql.org

От
Tom Lane
Дата:
Stephen Frost <sfrost@snowman.net> writes:
> Which seems to indicate that this actually is some kind of lcov bug.
> That makes more sense too since lcov is 1.13 on coverage.p.o, but you
> used 1.10 and said you didn't see an issue.  Not sure if you have access
> to 1.13 easily, but if so, could be useful to see if you're seeing the
> same behavior under 1.13 where the gcov output is correct but lcov
> output isn't.

Hpmh.  So I tested Fedora 26, which has lcov 1.12, and that seems fine.
Then I installed pre-release Fedora 28, which has lcov 1.13, and that
does not work at all:

$ make coverage-html
/usr/bin/lcov --gcov-tool /usr/bin/gcov -q --no-external -c -i -d . -d . -o lcov_base.info
geninfo: WARNING: /home/tgl/pgsql/src/backend/catalog/aclchk.gcno: Overlong record at end of file!
geninfo: WARNING: /home/tgl/pgsql/src/backend/catalog/pg_subscription.gcno: Overlong record at end of file!
geninfo: WARNING: /home/tgl/pgsql/src/backend/catalog/pg_db_role_setting.gcno: Overlong record at end of file!
... lots more ...
geninfo: WARNING: /home/tgl/pgsql/src/backend/storage/lmgr/predicate.gcno: Overlong record at end of file!
geninfo: ERROR: /home/tgl/pgsql/src/backend/storage/lmgr/lwlocknames.gcno: reached unexpected end of file
make: *** [src/Makefile.global:911: lcov_base.info] Error 255
make: *** Deleting file 'lcov_base.info'

Eyeing the commit history at
https://github.com/linux-test-project/lcov/commits/master
makes it appear that lcov doesn't work with gcc 8.0 yet
(well, they committed support last week, but there's no
release yet).  So this is basically Fedora breakage ...
but it means I don't have an easy way to check 1.13 except
by installing handmade packages, which would probably
not be definitive proof one way or the other.

            regards, tom lane


Re: Bogus reports from coverage.postgresql.org

От
Tom Lane
Дата:
I wrote:
> So this is basically Fedora breakage ...
> but it means I don't have an easy way to check 1.13 except
> by installing handmade packages, which would probably
> not be definitive proof one way or the other.

I overcame my trepidation after noting that the F26 and F28 lcov.spec
files were nigh identical, and built lcov 1.13 on F26.  And it's fine
too:

$ grep bogus coverage/src/backend/optimizer/util/predtest.c.*
coverage/src/backend/optimizer/util/predtest.c.gcov.html:<span class="lineNum">     492 </span><span class="lineNoCov">
        0 :     elog(ERROR, "predicate_classify returned a bogus value");</span> 
coverage/src/backend/optimizer/util/predtest.c.gcov.html:<span class="lineNum">     803 </span><span class="lineNoCov">
        0 :     elog(ERROR, "predicate_classify returned a bogus value");</span> 

(span class="lineNoCov" is what we want to see here).  The underlying
compiler is gcc 7.3.1 in this case.

So it seems we've reduced the possibilities to these: either there's
something weird about Debian's packaging of lcov, or there's a bug in some
underlying perl module on Debian, or there's some incompatibility between
gcc 6.3.0 and lcov 1.13.  The last seems the most likely.  The gcov text
output looks about the same on 7.3.1:

$ grep bogus src/backend/optimizer/util/predtest.c.gcov
    #####:  492:        elog(ERROR, "predicate_classify returned a bogus value");
    #####:  803:        elog(ERROR, "predicate_classify returned a bogus value");

but now that we know that lcov looks directly at the .gcno files,
and possibly also .gcda (I've not examined the source code), I doubt
that we should trust that "the gcov text output looks the same" means
anything compatibility-wise.

            regards, tom lane


Re: Bogus reports from coverage.postgresql.org

От
Peter Eisentraut
Дата:
On 3/13/18 13:45, Tom Lane wrote:
> Stephen Frost <sfrost@snowman.net> writes:
>> Running gcov on the box itself in the source dir shows the same results
>> which you got:
> 
> Oh really!
> 
>> Which seems to indicate that this actually is some kind of lcov bug.
> 
> Yeah.  I'd written that off as too low-probability to worry about,
> but maybe not.  I'll poke further.  Peter, which lcov did you test?

lcov: LCOV version 1.13

The one that comes with Debian stable, presumably the same that is
running on coverage.p.o.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Bogus reports from coverage.postgresql.org

От
Tom Lane
Дата:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> On 3/13/18 13:45, Tom Lane wrote:
>> Yeah.  I'd written that off as too low-probability to worry about,
>> but maybe not.  I'll poke further.  Peter, which lcov did you test?

> lcov: LCOV version 1.13
> The one that comes with Debian stable, presumably the same that is
> running on coverage.p.o.

OK, now we're getting into the *really* low-probability cases.  Like,
maybe it's time to remove and reinstall lcov and/or gcc on that box?

            regards, tom lane


Re: Bogus reports from coverage.postgresql.org

От
Alvaro Herrera
Дата:
Tom Lane wrote:
> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> > On 3/13/18 13:45, Tom Lane wrote:
> >> Yeah.  I'd written that off as too low-probability to worry about,
> >> but maybe not.  I'll poke further.  Peter, which lcov did you test?
> 
> > lcov: LCOV version 1.13
> > The one that comes with Debian stable, presumably the same that is
> > running on coverage.p.o.
> 
> OK, now we're getting into the *really* low-probability cases.  Like,
> maybe it's time to remove and reinstall lcov and/or gcc on that box?

I checked on my own Debian box, which shows

$ gcov --version
gcov (Debian 6.3.0-18+deb9u1) 6.3.0 20170516

$ lcov --version
lcov: LCOV version 1.13

and what I get is:
     492           0 :     elog(ERROR, "predicate_classify returned a bogus value");
     803           0 :     elog(ERROR, "predicate_classify returned a bogus value");

so.

In the coverage.pg.org box, we have:

$ cat .lcovrc 
lcov_branch_coverage = 1

I'm going to do a test run with that disabled and see what happens.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Bogus reports from coverage.postgresql.org

От
Alvaro Herrera
Дата:
Alvaro Herrera wrote:

> In the coverage.pg.org box, we have:
> 
> $ cat .lcovrc 
> lcov_branch_coverage = 1
> 
> I'm going to do a test run with that disabled and see what happens.

So the counters for the "bogus" line are reported as zero now, but we
lost branch-based numbers.  I suppose we can live with that, if we don't
want to try and track down the bug.  I suppose you could try to
reproduce it locally with that setting.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Bogus reports from coverage.postgresql.org

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> In the coverage.pg.org box, we have:
> $ cat .lcovrc
> lcov_branch_coverage = 1

Oh, that's critical information not mentioned before :-(.

On my Fedora 26 box (gcc 7.3.1) I can reproduce the problem after
setting up ~/.lcovrc that way:

$ grep bogus  coverage/src/backend/optimizer/util/predtest.c.*
coverage/src/backend/optimizer/util/predtest.c.gcov.html:<span class="lineNum">     492 </span>                :<span
class="lineCov">     10208 :     elog(ERROR, "predicate_classify returned a bogus value");</span> 
coverage/src/backend/optimizer/util/predtest.c.gcov.html:<span class="lineNum">     803 </span>                :<span
class="lineCov">     59482 :     elog(ERROR, "predicate_classify returned a bogus value");</span> 

(note "lineCov" instead of expected "lineNoCov").  This happens
with either lcov 1.12 or 1.13.  I also notice an interesting
warning with either version:

$ make coverage-html
/usr/bin/lcov --gcov-tool /usr/bin/gcov -q --no-external -c -i -d . -d . -o lcov_base.info
geninfo: Note: --initial does not generate branch coverage data
/usr/bin/lcov --gcov-tool /usr/bin/gcov -q --no-external -c -d . -d . -o lcov_test.info
rm -rf coverage
/usr/bin/genhtml -q --legend -o coverage --title='PostgreSQL 11devel' --num-spaces=4 --prefix='/home/tgl/pgsql'
lcov_base.infolcov_test.info 
touch coverage-html-stamp

No idea what "--initial" refers to, but it suggests that we're
misusing the tool somehow with this configuration.

            regards, tom lane


Re: Bogus reports from coverage.postgresql.org

От
Tom Lane
Дата:
I wrote:
> I also notice an interesting warning with either version:

> $ make coverage-html
> /usr/bin/lcov --gcov-tool /usr/bin/gcov -q --no-external -c -i -d . -d . -o lcov_base.info
> geninfo: Note: --initial does not generate branch coverage data
> /usr/bin/lcov --gcov-tool /usr/bin/gcov -q --no-external -c -d . -d . -o lcov_test.info
> rm -rf coverage
> /usr/bin/genhtml -q --legend -o coverage --title='PostgreSQL 11devel' --num-spaces=4 --prefix='/home/tgl/pgsql'
lcov_base.infolcov_test.info 
> touch coverage-html-stamp

> No idea what "--initial" refers to, but it suggests that we're
> misusing the tool somehow with this configuration.

So I went and read the lcov man page, and according to it we're doing
this entirely wrong.  The correct process, saith lcov, is

       -i
       --initial
              Capture initial zero coverage data.

              Run  lcov  with -c and this option on the directories containing
              .bb, .bbg or .gcno files  before  running  any  test  case.  The
              result  is  a  "baseline"  coverage data file that contains zero
              coverage for every instrumented line.  Combine  this  data  file
              (using  lcov  -a) with coverage data files captured after a test
              run to ensure that the percentage of total lines covered is cor-
              rect  even when not all source code files were loaded during the
              test.

              Recommended procedure when capturing data for a test case:

              1. create baseline coverage data file
                     # lcov -c -i -d appdir -o app_base.info

              2. perform test
                     # appdir/test

              3. create test coverage data file
                     # lcov -c -d appdir -o app_test.info

              4. combine baseline and test coverage data
                     # lcov -a app_base.info -a app_test.info -o
                     app_total.info

Our process swaps steps 1 and 2.  What I now suspect is that
we accidentally get away with that for basic mode, but it
doesn't work with lcov_branch_coverage.

            regards, tom lane


Re: Bogus reports from coverage.postgresql.org

От
Tom Lane
Дата:
I wrote:
> Our process swaps steps 1 and 2.  What I now suspect is that
> we accidentally get away with that for basic mode, but it
> doesn't work with lcov_branch_coverage.

I tested that theory by manually doing things in the "correct" order,
and it didn't change the results:

$ /usr/bin/lcov --gcov-tool /usr/bin/gcov -q --no-external -c -i -d . -d . -o lcov_base.info
geninfo: Note: --initial does not generate branch coverage data
$ make check
...
$ /usr/bin/lcov --gcov-tool /usr/bin/gcov -q --no-external -c -d . -d . -o lcov_test.info
$ rm -rf coverage
$ /usr/bin/genhtml -q --legend -o coverage --title='PostgreSQL 11devel' --num-spaces=4 --prefix='/home/tgl/pgsql'
lcov_base.infolcov_test.info 
$ grep bogus  coverage/src/backend/optimizer/util/predtest.c.*
coverage/src/backend/optimizer/util/predtest.c.gcov.html:<span class="lineNum">     492 </span>                :<span
class="lineCov">     10186 :     elog(ERROR, "predicate_classify returned a bogus value");</span> 
coverage/src/backend/optimizer/util/predtest.c.gcov.html:<span class="lineNum">     803 </span>                :<span
class="lineCov">     59482 :     elog(ERROR, "predicate_classify returned a bogus value");</span> 

I'm still a bit suspicious that we may have a process problem,
but it's looking like there may be grounds for an lcov bug
report.

            regards, tom lane


Re: Bogus reports from coverage.postgresql.org

От
Tom Lane
Дата:
I wrote:
> I'm still a bit suspicious that we may have a process problem,
> but it's looking like there may be grounds for an lcov bug
> report.

Well, this is darn interesting.  I got the Fedora lcov maintainer
to push an update absorbing the upstream "gcc 8" fixes.  (Turned
out he'd already done that for rawhide, but forgot to push it into
the F28 branch.)  And with that, and gcc 8.0.1, ... no bug.  The
lines are marked "lineNoCov" with or without lcov_branch_coverage.

The commit message for said upstream patch is even more interesting:

    Subject: [PATCH] geninfo: Add gcc 8 support
    
    Fix errors and incorrect data when trying to collect coverage data
    for programs compiled with gcc 8.
    
    Covers the following gcov-related changes in gcc:
    
    .gcov-file format:
      - Line coverage data can appear multiple times for the same line
      - Line coverage count can be suffixed by '*' to indicated unexecuted
        basic blocks in that line
    
    .gcno-file format:
      - new header field 'support unexecuted blocks flag'
      - new function record fields 'column number', 'ending line number',
        and 'compiler-generated entity flag'

Perhaps I'm reading too much into this, but what it sounds like is
that gcc's coverage data output format was incapable of correctly
representing the situation before gcc 8, and thus that it's not
really lcov's fault that we get questionable output.

            regards, tom lane


Re: Bogus reports from coverage.postgresql.org

От
Alvaro Herrera
Дата:
Tom Lane wrote:

> Well, this is darn interesting.  I got the Fedora lcov maintainer
> to push an update absorbing the upstream "gcc 8" fixes.  (Turned
> out he'd already done that for rawhide, but forgot to push it into
> the F28 branch.)  And with that, and gcc 8.0.1, ... no bug.  The
> lines are marked "lineNoCov" with or without lcov_branch_coverage.

Hmm.  I wonder if this means that the reports generated with any
compiler prior to gcc 8 are unreliable.  At least we know now that that
is indeed the case with branch coverage, but what about without?

While we're on this topic ...  Some time ago, I looked into whether it
would be possible to make the coverage report ignore the elog(ERROR)
lines, which are --or should be-- unreachable code and thus we don't
care too much about test coverage.  Finding no way to implement that, I
gave up (I tried adding exclusion markers in the elog definition, as
documented in geninfo.  Perhaps I did it wrong).  But maybe it is
possible with these recent improvements?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Bogus reports from coverage.postgresql.org

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> Hmm.  I wonder if this means that the reports generated with any
> compiler prior to gcc 8 are unreliable.  At least we know now that that
> is indeed the case with branch coverage, but what about without?

Well, that's probably an overly strong conclusion; if lcov were broken
in general, people would've noticed before now.  I have a question in
to the lcov mailing list at sourceforge to see if anyone wants to offer
a more informed opinion, though.

In the short term it seems clear that we'd better turn off
lcov_branch_coverage at coverage.postgresql.org, as I see you've
already done.

            regards, tom lane


Re: Bogus reports from coverage.postgresql.org

От
Peter Eisentraut
Дата:
On 3/14/18 12:31, Tom Lane wrote:
>               Recommended procedure when capturing data for a test case:
> 
>               1. create baseline coverage data file
>                      # lcov -c -i -d appdir -o app_base.info
> 
>               2. perform test
>                      # appdir/test
> 
>               3. create test coverage data file
>                      # lcov -c -d appdir -o app_test.info
> 
>               4. combine baseline and test coverage data
>                      # lcov -a app_base.info -a app_test.info -o
>                      app_total.info
> 
> Our process swaps steps 1 and 2.

That doesn't matter because 1 and 3 are independent of another.  The
pipeline is

1 ------\
         |--> 4
2 -> 3 -/


Our makefiles are set up in exactly this way and may run 1 and 3 in
parallel.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Bogus reports from coverage.postgresql.org

От
Tom Lane
Дата:
I wrote:
> ... I have a question in
> to the lcov mailing list at sourceforge to see if anyone wants to offer
> a more informed opinion, though.

And the authoritative answer is: the bug is in gcc/gcov.  The
branch_coverage switch implies passing "-a" to gcov, and if you do
that (with "make coverage GCOVFLAGS=-a" in our infrastructure),
you get output like this (tested with gcc 7.3.1):

        -:  485:                                                           clause,
        -:  486:                                                           weak);
        -:  487:            }
    #####:  488:            break;
    $$$$$:  488-block  0
        -:  489:    }
        -:  490:
        -:  491:    /* can't get here */
     5134:  492:    elog(ERROR, "predicate_classify returned a bogus value");
    $$$$$:  492-block  0
call    0 never executed
call    1 never executed
call    2 never executed
     5134:  492-block  1
        -:  493:    return false;
        -:  494:}

which is clearly claiming that the second basic block in line 492 gets
executed.

With gcc 8.0.1 I instead get:

        -:  485:                                                           clause,
        -:  486:                                                           weak);
        -:  487:            }
    #####:  488:            break;
    %%%%%:  488-block  0
        -:  489:    }
        -:  490:
        -:  491:    /* can't get here */
    #####:  492:    elog(ERROR, "predicate_classify returned a bogus value");
    %%%%%:  492-block  0
call    0 never executed
call    1 never executed
call    2 never executed
        -:  493:    return false;
        -:  494:}

The fact that they changed the gcov data format at the same time hints
that it may be unfixable without such a change, though I've not tried to
excavate in their changelog for more info.  It could also just be a code
generation change, seeing that 8.0.1 is not claiming there are 2 basic
blocks anymore.  (Hm, I wonder whether our "pg_unreachable" stuff confuses
the earlier releases.)

            regards, tom lane


Re: Bogus reports from coverage.postgresql.org

От
Alvaro Herrera
Дата:
On 2018-Mar-12, Peter Eisentraut wrote:

> On 3/11/18 16:05, Tom Lane wrote:
> > It might be worth blowing away ccache's cache just to see if it's cached
> > anything bogus.
> 
> --enable-coverage disables ccache, so that's not going to be it.

(Ref. src/Makefile.global.in lines 484ff setting CCACHE_DISABLE=1)

Why does it do that?  Now that ccache is about 5 years less buggy than
it was when you wrote this, can we get rid of this behavior?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Bogus reports from coverage.postgresql.org

От
Alvaro Herrera
Дата:
On 2019-May-30, Alvaro Herrera wrote:

> On 2018-Mar-12, Peter Eisentraut wrote:
> 
> > On 3/11/18 16:05, Tom Lane wrote:
> > > It might be worth blowing away ccache's cache just to see if it's cached
> > > anything bogus.
> > 
> > --enable-coverage disables ccache, so that's not going to be it.
> 
> (Ref. src/Makefile.global.in lines 484ff setting CCACHE_DISABLE=1)
> 
> Why does it do that?  Now that ccache is about 5 years less buggy than
> it was when you wrote this, can we get rid of this behavior?

I think I found the answer in this commit:
https://github.com/ccache/ccache/commit/02d3f078bd2495b8db2264ae0b2c692b4c5ba1bd
"Add support for coverage (compiling for gcov)"
committed for ccache 3.2.2.

So I think we could remove that stuff now ...

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Bogus reports from coverage.postgresql.org

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> On 2019-May-30, Alvaro Herrera wrote:
>> On 2018-Mar-12, Peter Eisentraut wrote:
>>> --enable-coverage disables ccache, so that's not going to be it.

> I think I found the answer in this commit:
> https://github.com/ccache/ccache/commit/02d3f078bd2495b8db2264ae0b2c692b4c5ba1bd
> "Add support for coverage (compiling for gcov)"
> committed for ccache 3.2.2.

> So I think we could remove that stuff now ...

I'd be a bit sad if we did, because where I usually do coverage checking
is my RHEL6 box, which has

$ ccache --version
ccache version 3.1.6

I'll probably bite the bullet and upgrade that box sometime in the
next year or so, but it's not really a near-term project.  Is there
any big hurry for having ccache for coverage builds?

            regards, tom lane



Re: Bogus reports from coverage.postgresql.org

От
Peter Eisentraut
Дата:
On 2019-06-01 19:10, Tom Lane wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> On 2019-May-30, Alvaro Herrera wrote:
>>> On 2018-Mar-12, Peter Eisentraut wrote:
>>>> --enable-coverage disables ccache, so that's not going to be it.
> 
>> I think I found the answer in this commit:
>> https://github.com/ccache/ccache/commit/02d3f078bd2495b8db2264ae0b2c692b4c5ba1bd
>> "Add support for coverage (compiling for gcov)"
>> committed for ccache 3.2.2.
> 
>> So I think we could remove that stuff now ...
> 
> I'd be a bit sad if we did, because where I usually do coverage checking
> is my RHEL6 box, which has
> 
> $ ccache --version
> ccache version 3.1.6
> 
> I'll probably bite the bullet and upgrade that box sometime in the
> next year or so, but it's not really a near-term project.  Is there
> any big hurry for having ccache for coverage builds?

I happened to come across this again.  I think we should make the change
Álvaro was proposing.  The fix in ccache is 5 years old now.

For older ccache versions, you can always set CCACHE_DISABLE yourself.
Doing it the other way around is hard or impossible with the current setup.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Bogus reports from coverage.postgresql.org

От
Tom Lane
Дата:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> On 2019-06-01 19:10, Tom Lane wrote:
>> I'll probably bite the bullet and upgrade that box sometime in the
>> next year or so, but it's not really a near-term project.  Is there
>> any big hurry for having ccache for coverage builds?

> I happened to come across this again.  I think we should make the change
> Álvaro was proposing.  The fix in ccache is 5 years old now.

Fair enough, I can just install a newer ccache version locally.

But should we worry about any buildfarm members running old ccache
versions?  It'd probably be polite to notify the owners list about
what the minimum ccache version is going to be.

            regards, tom lane



Re: Bogus reports from coverage.postgresql.org

От
Peter Eisentraut
Дата:
On 2019-09-17 22:12, Tom Lane wrote:
> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
>> On 2019-06-01 19:10, Tom Lane wrote:
>>> I'll probably bite the bullet and upgrade that box sometime in the
>>> next year or so, but it's not really a near-term project.  Is there
>>> any big hurry for having ccache for coverage builds?
> 
>> I happened to come across this again.  I think we should make the change
>> Álvaro was proposing.  The fix in ccache is 5 years old now.
> 
> Fair enough, I can just install a newer ccache version locally.
> 
> But should we worry about any buildfarm members running old ccache
> versions?  It'd probably be polite to notify the owners list about
> what the minimum ccache version is going to be.

Buildfarm members are not running coverage, are they?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Bogus reports from coverage.postgresql.org

От
Tom Lane
Дата:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> On 2019-09-17 22:12, Tom Lane wrote:
>> But should we worry about any buildfarm members running old ccache
>> versions?  It'd probably be polite to notify the owners list about
>> what the minimum ccache version is going to be.

> Buildfarm members are not running coverage, are they?

Ah, of course not.  I'd forgotten the details of what we were
concerned about.  Objection withdrawn.

            regards, tom lane



Re: Bogus reports from coverage.postgresql.org

От
Peter Eisentraut
Дата:
On 2019-09-17 22:32, Tom Lane wrote:
> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
>> On 2019-09-17 22:12, Tom Lane wrote:
>>> But should we worry about any buildfarm members running old ccache
>>> versions?  It'd probably be polite to notify the owners list about
>>> what the minimum ccache version is going to be.
> 
>> Buildfarm members are not running coverage, are they?
> 
> Ah, of course not.  I'd forgotten the details of what we were
> concerned about.  Objection withdrawn.

committed

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services