Discussion: Cosmic ray hits integerset
Hi,

Here's a curious one-off failure in test_integerset:

+ERROR: iterate returned wrong value; got 519985430528, expected 485625692160

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros&dt=2021-04-01%2018:19:47
On 2021-Jun-22, Thomas Munro wrote:

> Hi,
>
> Here's a curious one-off failure in test_integerset:
>
> +ERROR: iterate returned wrong value; got 519985430528, expected 485625692160

Cosmic rays indeed. The base-2 representation of the expected value is
111000100010001100011000000000000000000
and that of the actual value is
111100100010001100011000000000000000000

There's a single bit of difference.

--
Álvaro Herrera Valdivia, Chile
"There is no man who does not aspire to plenitude, that is, the sum of experiences of which a man is capable"
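The single-bit claim can be checked mechanically. The following is a minimal sketch, not part of the original thread: it XORs the two values from the error message and lists the positions of the set bits in the result. The variable names are illustrative only.

```python
# Values copied from the test_integerset error message.
expected = 485625692160
actual = 519985430528

# XOR leaves a 1 exactly where the two values disagree.
diff = expected ^ actual

# Collect the (zero-based, least-significant-first) positions of differing bits.
flipped_bits = [i for i in range(diff.bit_length()) if (diff >> i) & 1]

print(f"expected = {expected:b}")
print(f"actual   = {actual:b}")
print(f"differing bit positions: {flipped_bits}")  # prints [35]
```

The two values differ by exactly 2^35 = 34359738368, so only bit 35 (counting from zero at the least significant end) is affected.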
> On 22 June 2021, at 19:21, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2021-Jun-22, Thomas Munro wrote:
>
>> Hi,
>>
>> Here's a curious one-off failure in test_integerset:
>>
>> +ERROR: iterate returned wrong value; got 519985430528, expected 485625692160
>
> Cosmic rays indeed. The base-2 representation of the expected value is
> 111000100010001100011000000000000000000
> and that of the actual value is
> 111100100010001100011000000000000000000
>
> There's a single bit of difference.

I tried to explain this as something other than a single-event upset, namely an integer overflow somewhere in the 30-bit mode of simple8b, but I have found nothing so far. The actual error is in bit 35, and the next mode up is the 60-bit mode. Looks like a cosmic ray to me too.

Best regards, Andrey Borodin.
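To illustrate why bit 35 falls outside the 30-bit mode, here is a rough sketch, not taken from integerset.c. It assumes the selector bit widths of the standard Simple-8b scheme (1 through 8, 10, 12, 15, 20, 30, 60); the names `SIMPLE8B_WIDTHS` and `smallest_width` are hypothetical. It only shows width selection for a single value and does not model how integerset actually packs its data.

```python
# Assumed bit widths of standard Simple-8b selectors (not read from integerset.c).
SIMPLE8B_WIDTHS = (1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 15, 20, 30, 60)


def smallest_width(value: int) -> int:
    """Return the narrowest assumed Simple-8b width that can hold `value`."""
    if value < 0:
        raise ValueError("only non-negative values are considered here")
    needed = max(value.bit_length(), 1)  # zero still occupies one bit slot
    for width in SIMPLE8B_WIDTHS:
        if needed <= width:
            return width
    raise ValueError("value does not fit in 60 bits")


# A value with bit 35 set needs 36 bits, so the 30-bit mode cannot hold it.
print(smallest_width((1 << 30) - 1))  # largest value that fits in 30 bits
print(smallest_width(1 << 35))        # needs the 60-bit mode
```

Under these assumptions, any value with bit 35 set can only be stored in the 60-bit mode, which is consistent with the observation that an overflow in the 30-bit mode does not obviously explain the flipped bit.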
Hi,

Asking out of pure technical curiosity about "the rhinoceros": what kind of animal is it? A physical box or a VM? How could one get dmidecode(1) / dmesg(1) / mcelog(1) output from it (e.g. does it run ECC or not)?

-J.

> -----Original Message-----
> From: Alvaro Herrera <alvherre@alvh.no-ip.org>
> Sent: Tuesday, June 22, 2021 4:21 PM
> To: Thomas Munro <thomas.munro@gmail.com>
> Cc: pgsql-hackers <pgsql-hackers@postgresql.org>
> Subject: Re: Cosmic ray hits integerset
>
> On 2021-Jun-22, Thomas Munro wrote:
>
> > Hi,
> >
> > Here's a curious one-off failure in test_integerset:
> >
> > +ERROR: iterate returned wrong value; got 519985430528, expected 485625692160
>
> Cosmic rays indeed. The base-2 representation of the expected value is
> 111000100010001100011000000000000000000
> and that of the actual value is
> 111100100010001100011000000000000000000
>
> There's a single bit of difference.
On 7/7/21 2:53 AM, Jakub Wartak wrote:

> Hi, Asking out of pure technical curiosity about "the rhinoceros" - what kind of animal is it? Physical box or VM? How one could get dmidecode(1) / dmesg(1) / mcelog(1) from what's out there (e.g. does it run ECC or not?)

Rhinoceros is just a VM on a simple desktop machine. Nothing fancy.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
FWIW, yes, it could be a cosmic ray. It could also just be marginally bad RAM. Bad RAM is notoriously hard to test for reliably. It can be very sensitive to the exact bit pattern stored in it, the timing of reads and writes, and other factors. The whole point of the Rowhammer attacks is to push some of those timing factors hard, but the same failures can happen randomly.

On Wed, 7 Jul 2021 at 08:14, Joe Conway <mail@joeconway.com> wrote:
>
> On 7/7/21 2:53 AM, Jakub Wartak wrote:
> > Hi, Asking out of pure technical curiosity about "the rhinoceros" - what kind of animal is it? Physical box or VM? How one could get dmidecode(1) / dmesg(1) / mcelog(1) from what's out there (e.g. does it run ECC or not?)
>
> Rhinoceros is just a VM on a simple desktop machine. Nothing fancy.
>
> Joe
>
> --
> Crunchy Data - http://crunchydata.com
> PostgreSQL Support for Secure Enterprises
> Consulting, Training, & Open Source Development

--
greg