Discussion: strange valgrind failures (again)


strange valgrind failures (again)

From: Tomas Vondra
Date: 2019-01-15 03:07:10 +0100
Hi,

I've started observing funny valgrind failures on Fedora 28, possibly
after upgrading from 3.14.0-1 to 3.14.0-7 a couple of days ago. This
time it does not seem to be a platform-specific issue, though; the
failures all look like this:

==20974== Conditional jump or move depends on uninitialised value(s)
==20974==    at 0xA02088: calc_bucket (dynahash.c:870)
==20974==    by 0xA021BA: hash_search_with_hash_value (dynahash.c:963)
==20974==    by 0xA020EE: hash_search (dynahash.c:909)
==20974==    by 0x88DAB3: smgrclosenode (smgr.c:358)
==20974==    by 0x9D6C01: LocalExecuteInvalidationMessage (inval.c:607)
==20974==    by 0x86C44F: ReceiveSharedInvalidMessages (sinval.c:121)
==20974==    by 0x9D6D83: AcceptInvalidationMessages (inval.c:681)
==20974==    by 0x539B6B: AtStart_Cache (xact.c:980)
==20974==    by 0x53AA6C: StartTransaction (xact.c:1915)
==20974==    by 0x53B6F0: StartTransactionCommand (xact.c:2685)
==20974==    by 0x892EFB: start_xact_command (postgres.c:2475)
==20974==    by 0x89083E: exec_simple_query (postgres.c:923)
==20974==    by 0x894E7B: PostgresMain (postgres.c:4143)
==20974==    by 0x7F553D: BackendRun (postmaster.c:4412)
==20974==    by 0x7F4CA1: BackendStartup (postmaster.c:4084)
==20974==    by 0x7F12A0: ServerLoop (postmaster.c:1757)
==20974==    by 0x7F08CF: PostmasterMain (postmaster.c:1365)
==20974==    by 0x728E33: main (main.c:228)
==20974==  Uninitialised value was created by a stack allocation
==20974==    at 0x9D65D4: AddCatcacheInvalidationMessage (inval.c:339)
==20974==

==20974== Use of uninitialised value of size 8
==20974==    at 0xA021FD: hash_search_with_hash_value (dynahash.c:968)
==20974==    by 0xA020EE: hash_search (dynahash.c:909)
==20974==    by 0x88DAB3: smgrclosenode (smgr.c:358)
==20974==    by 0x9D6C01: LocalExecuteInvalidationMessage (inval.c:607)
==20974==    by 0x86C44F: ReceiveSharedInvalidMessages (sinval.c:121)
==20974==    by 0x9D6D83: AcceptInvalidationMessages (inval.c:681)
==20974==    by 0x539B6B: AtStart_Cache (xact.c:980)
==20974==    by 0x53AA6C: StartTransaction (xact.c:1915)
==20974==    by 0x53B6F0: StartTransactionCommand (xact.c:2685)
==20974==    by 0x892EFB: start_xact_command (postgres.c:2475)
==20974==    by 0x89083E: exec_simple_query (postgres.c:923)
==20974==    by 0x894E7B: PostgresMain (postgres.c:4143)
==20974==    by 0x7F553D: BackendRun (postmaster.c:4412)
==20974==    by 0x7F4CA1: BackendStartup (postmaster.c:4084)
==20974==    by 0x7F12A0: ServerLoop (postmaster.c:1757)
==20974==    by 0x7F08CF: PostmasterMain (postmaster.c:1365)
==20974==    by 0x728E33: main (main.c:228)
==20974==  Uninitialised value was created by a stack allocation
==20974==    at 0x9D65D4: AddCatcacheInvalidationMessage (inval.c:339)
==20974==

There are more reports in the attached log, but what they all share is
dynahash and invalidations. That might be an argument against a
possible valgrind bug, because such a bug would (probably?) affect
various other places too.

It's reproducible quite far back (at least a couple thousand commits),
so it does not seem to be caused by a recent commit either.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments

Re: strange valgrind failures (again)

From: Andres Freund
Date:
Hi,

On 2019-01-15 03:07:10 +0100, Tomas Vondra wrote:
> I've started observing funny valgrind failures on Fedora 28, possibly
> after upgrading from 3.14.0-1 to 3.14.0-7 a couple of days ago. This
> time it does not seem to be a platform-specific issue, though; the
> failures all look like this:

Any chance you're compiling without USE_VALGRIND defined? IIRC these are
precisely what the VALGRIND_MAKE_MEM_DEFINED calls in inval.c are
intended to fight:
    /*
     * Define padding bytes in SharedInvalidationMessage structs to be
     * defined. Otherwise the sinvaladt.c ringbuffer, which is accessed by
     * multiple processes, will cause spurious valgrind warnings about
     * undefined memory being used. That's because valgrind remembers the
     * undefined bytes from the last local process's store, not realizing that
     * another process has written since, filling the previously uninitialized
     * bytes
     */
    VALGRIND_MAKE_MEM_DEFINED(&msg, sizeof(msg));
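A minimal sketch of why that call matters, under stated assumptions: the `Demo*` types below are hypothetical, simplified stand-ins for the real SharedInvalidationMessage union, and the `#ifdef` block is modeled on how PostgreSQL's utils/memdebug.h turns valgrind client requests into empty statements when USE_VALGRIND is not defined. Only the catcache fields are assigned, so the padding after `id` and the tail bytes used only by the larger smgr variant are never written; without the live macro, memcheck keeps treating them as undefined.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Modeled on utils/memdebug.h: with USE_VALGRIND the real client request
 * is used; without it the macro is an empty statement, so a build that
 * forgets -DUSE_VALGRIND silently loses the annotation.
 */
#ifdef USE_VALGRIND
#include <valgrind/memcheck.h>
#else
#define VALGRIND_MAKE_MEM_DEFINED(addr, size) do {} while (0)
#endif

/* Hypothetical, simplified stand-ins for the real invalidation messages. */
typedef struct
{
    int8_t      id;             /* usually followed by 3 padding bytes */
    uint32_t    dbId;
    uint32_t    hashValue;
} DemoCatcacheMsg;

typedef struct
{
    int8_t      id;
    int8_t      backend_hi;
    uint16_t    backend_lo;
    uint32_t    spcNode;
    uint32_t    dbNode;
    uint32_t    relNode;
} DemoSmgrMsg;

typedef union
{
    int8_t          id;
    DemoCatcacheMsg cc;
    DemoSmgrMsg     sm;         /* larger member: sets the union's size */
} DemoInvalMsg;

/* Build a catcache message in a stack-allocated union, then copy it out. */
static void
build_catcache_msg(DemoInvalMsg *dest, int8_t id, uint32_t dbId,
                   uint32_t hashValue)
{
    DemoInvalMsg msg;           /* stack allocation: contents undefined */

    msg.cc.id = id;
    msg.cc.dbId = dbId;
    msg.cc.hashValue = hashValue;

    /* Mark every byte, padding and unused tail included, as defined. */
    VALGRIND_MAKE_MEM_DEFINED(&msg, sizeof(msg));

    memcpy(dest, &msg, sizeof(msg));   /* copies the undefined bytes too */
}

/* Bytes of the union that a catcache message never writes. */
static size_t
catcache_unwritten_tail(void)
{
    return sizeof(DemoInvalMsg) - sizeof(DemoCatcacheMsg);
}
```

Compiled with -DUSE_VALGRIND (and valgrind's headers installed), the empty macro is replaced by the real client request; without it, the memcpy() carries the unwritten bytes along still marked undefined.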


Greetings,

Andres Freund


Re: strange valgrind failures (again)

From: Tomas Vondra
Date: 2019-01-15 03:41:34 +0100
On 1/15/19 3:11 AM, Andres Freund wrote:
> Hi,
> 
> On 2019-01-15 03:07:10 +0100, Tomas Vondra wrote:
>> I've started observing funny valgrind failures on Fedora 28, possibly
>> after upgrading from 3.14.0-1 to 3.14.0-7 a couple of days ago. This
>> time it does not seem to be a platform-specific issue, though; the
>> failures all look like this:
> 
> Any chance you're compiling without USE_VALGRIND defined? IIRC these are
> precisely what the VALGRIND_MAKE_MEM_DEFINED calls in inval.c are
> intended to fight:
>     /*
>      * Define padding bytes in SharedInvalidationMessage structs to be
>      * defined. Otherwise the sinvaladt.c ringbuffer, which is accessed by
>      * multiple processes, will cause spurious valgrind warnings about
>      * undefined memory being used. That's because valgrind remembers the
>      * undefined bytes from the last local process's store, not realizing that
>      * another process has written since, filling the previously uninitialized
>      * bytes
>      */
>     VALGRIND_MAKE_MEM_DEFINED(&msg, sizeof(msg));
> 
> 

... the bang you might have just heard was me facepalming

Yes, I've been compiling without USE_VALGRIND, because I've been
rebuilding using a command from shell history, and the command line had
grown a bit too long for me to notice.

Anyway, I find it interesting that valgrind notices this particular
place and not the other places, and that it only starts happening after
a couple of minutes of running the regression tests (~5 minutes or so).

regards



Re: strange valgrind failures (again)

From: Andres Freund
Date:
Hi,

On 2019-01-15 03:41:34 +0100, Tomas Vondra wrote:
> On 1/15/19 3:11 AM, Andres Freund wrote:
> > On 2019-01-15 03:07:10 +0100, Tomas Vondra wrote:
> >> I've started observing funny valgrind failures on Fedora 28, possibly
> >> after upgrading from 3.14.0-1 to 3.14.0-7 a couple of days ago. This
> >> time it does not seem to be a platform-specific issue, though; the
> >> failures all look like this:
> > 
> > Any chance you're compiling without USE_VALGRIND defined? IIRC these are
> > precisely what the VALGRIND_MAKE_MEM_DEFINED calls in inval.c are
> > intended to fight:
> >     /*
> >      * Define padding bytes in SharedInvalidationMessage structs to be
> >      * defined. Otherwise the sinvaladt.c ringbuffer, which is accessed by
> >      * multiple processes, will cause spurious valgrind warnings about
> >      * undefined memory being used. That's because valgrind remembers the
> >      * undefined bytes from the last local process's store, not realizing that
> >      * another process has written since, filling the previously uninitialized
> >      * bytes
> >      */
> >     VALGRIND_MAKE_MEM_DEFINED(&msg, sizeof(msg));
> > 
> > 
> 
> ... the bang you might have just heard was me facepalming

Heh ;)


> Anyway, I find it interesting that valgrind notices this particular
> place and not the other places, and that it only starts happening after
> a couple of minutes of running the regression tests (~5 minutes or so).

IIRC you basically need to fill the shared space for sinval messages for
this to matter, and individual backends need to be old enough to have
previously used the same space. So it's not that easy to trigger. I don't
think we needed many other such tricks to make valgrind work; other
things like this have been solved via valgrind.supp, so it's not that
surprising that you didn't find anything else...
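The condition described above can be modeled with a toy simulation. Everything here is an assumption for illustration: a hypothetical 4-slot ring with 16-byte messages (the real sinval buffer is far larger), a shared segment that starts out fully defined, and a simplification that every byte of a slot feeds a hash or branch, as in the smgrclosenode() trace. Each `ShadowView` stands for one backend's private copy of memcheck's definedness bits. valgrind runs separately inside each process, so a store by another backend never updates it.

```c
#include <stdbool.h>

#define RING_SLOTS 4    /* hypothetical; the real ring is much larger */
#define MSG_BYTES  16   /* assumed size of the whole message union */
#define CC_BYTES   12   /* assumed bytes a catcache message writes */

/* One backend's private view of memcheck's "defined" bits for the ring. */
typedef struct
{
    bool        defined[RING_SLOTS][MSG_BYTES];
} ShadowView;

/* Assumption: the shared segment starts out fully defined everywhere. */
static void
shadow_init(ShadowView *view)
{
    for (int s = 0; s < RING_SLOTS; s++)
        for (int i = 0; i < MSG_BYTES; i++)
            view->defined[s][i] = true;
}

/*
 * A backend stores a message into a slot. Only the writer's own view
 * changes; other backends keep their old (possibly stale) state. With
 * make_defined (modeling VALGRIND_MAKE_MEM_DEFINED) the whole slot counts
 * as defined; otherwise only the first nbytes do.
 */
static void
sim_write(ShadowView *writer, int slot, int nbytes, bool make_defined)
{
    for (int i = 0; i < MSG_BYTES; i++)
        writer->defined[slot][i] = make_defined || i < nbytes;
}

/* Would this backend's memcheck report a use of undefined bytes? */
static bool
sim_read_warns(const ShadowView *reader, int slot)
{
    for (int i = 0; i < MSG_BYTES; i++)
        if (!reader->defined[slot][i])
            return true;
    return false;
}
```

For example, if backend A writes a catcache message into slot 0 without the annotation and, after the ring wraps, backend B overwrites slot 0 with a full smgr message, then `sim_read_warns(&a, 0)` is true while a fresh backend's view reports nothing. Passing `make_defined = true` for A's write removes the report.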

Greetings,

Andres Freund