Обсуждение: Uh, this is *not* a 64-bit CRC ...

Поиск
Список
Период
Сортировка

Uh, this is *not* a 64-bit CRC ...

От
Tom Lane
Дата:
I just took a close look at the COMP_CRC64 macro in xlog.c.

This isn't a 64-bit CRC.  It's two independent 32-bit CRCs, one done
on just the odd-numbered bytes and one on just the even-numbered bytes
of the datastream.  That's hardly any stronger than a single 32-bit CRC;
it's certainly not what I thought we had agreed to implement.

We can't change this algorithm without forcing an initdb, which would be
a rather unpleasant thing to do at this late stage of the release cycle.
But I'm not happy with it.  Comments?
        regards, tom lane


Re: Uh, this is *not* a 64-bit CRC ...

От
ncm@zembu.com (Nathan Myers)
Дата:
On Wed, Feb 28, 2001 at 04:53:09PM -0500, Tom Lane wrote:
> I just took a close look at the COMP_CRC64 macro in xlog.c.
> 
> This isn't a 64-bit CRC.  It's two independent 32-bit CRCs, one done
> on just the odd-numbered bytes and one on just the even-numbered bytes
> of the datastream.  That's hardly any stronger than a single 32-bit CRC;
> it's certainly not what I thought we had agreed to implement.
> 
> We can't change this algorithm without forcing an initdb, which would be
> a rather unpleasant thing to do at this late stage of the release cycle.
> But I'm not happy with it.  Comments?

This might be a good time to update:
 The CRC-64 code used in the SWISS-PROT genetic database is (now) at:
   ftp://ftp.ebi.ac.uk/pub/software/swissprot/Swissknife/old/SPcrc.tar.gz
 From the README:
 The code in this package has been derived from the BTLib package obtained from Christian Iseli
<chris@ludwig-alpha.unil.ch>.From his mail:
 
 The reference is: W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P.  Flannery, "Numerical recipes in C", 2nd
ed.,Cambridge University Press.  Pages 896ff.
 
 The generator polynomial is x64 + x4 + x3 + x1 + 1.

I would suggest that if you don't change the algorithm, at least change
the name in the sources.  Were you to #ifdef in a real crc-64, and make 
a compile-time option to select the old one, you could allow users who 
wish to avoid the initdb a way to continue with the existing pair of 
CRC-32s.

Nathan Myers
ncm@zembu.com


Re: Uh, this is *not* a 64-bit CRC ...

От
Bruce Momjian
Дата:
I will just add a TODO item and we can hit it for 7.2.


> On Wed, Feb 28, 2001 at 04:53:09PM -0500, Tom Lane wrote:
> > I just took a close look at the COMP_CRC64 macro in xlog.c.
> > 
> > This isn't a 64-bit CRC.  It's two independent 32-bit CRCs, one done
> > on just the odd-numbered bytes and one on just the even-numbered bytes
> > of the datastream.  That's hardly any stronger than a single 32-bit CRC;
> > it's certainly not what I thought we had agreed to implement.
> > 
> > We can't change this algorithm without forcing an initdb, which would be
> > a rather unpleasant thing to do at this late stage of the release cycle.
> > But I'm not happy with it.  Comments?
> 
> This might be a good time to update:
> 
>   The CRC-64 code used in the SWISS-PROT genetic database is (now) at:
> 
>     ftp://ftp.ebi.ac.uk/pub/software/swissprot/Swissknife/old/SPcrc.tar.gz
> 
>   From the README:
> 
>   The code in this package has been derived from the BTLib package
>   obtained from Christian Iseli <chris@ludwig-alpha.unil.ch>.
>   From his mail:
> 
>   The reference is: W. H. Press, S. A. Teukolsky, W. T. Vetterling, and
>   B. P.  Flannery, "Numerical recipes in C", 2nd ed., Cambridge University
>   Press.  Pages 896ff.
> 
>   The generator polynomial is x64 + x4 + x3 + x1 + 1.
> 
> I would suggest that if you don't change the algorithm, at least change
> the name in the sources.  Were you to #ifdef in a real crc-64, and make 
> a compile-time option to select the old one, you could allow users who 
> wish to avoid the initdb a way to continue with the existing pair of 
> CRC-32s.
> 
> Nathan Myers
> ncm@zembu.com
> 


--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: Uh, this is *not* a 64-bit CRC ...

От
Bruce Momjian
Дата:
Added to TODO:
* Correct CRC WAL code to be normal CRC32 algorithm 

> On Wed, Feb 28, 2001 at 04:53:09PM -0500, Tom Lane wrote:
> > I just took a close look at the COMP_CRC64 macro in xlog.c.
> > 
> > This isn't a 64-bit CRC.  It's two independent 32-bit CRCs, one done
> > on just the odd-numbered bytes and one on just the even-numbered bytes
> > of the datastream.  That's hardly any stronger than a single 32-bit CRC;
> > it's certainly not what I thought we had agreed to implement.
> > 
> > We can't change this algorithm without forcing an initdb, which would be
> > a rather unpleasant thing to do at this late stage of the release cycle.
> > But I'm not happy with it.  Comments?
> 
> This might be a good time to update:
> 
>   The CRC-64 code used in the SWISS-PROT genetic database is (now) at:
> 
>     ftp://ftp.ebi.ac.uk/pub/software/swissprot/Swissknife/old/SPcrc.tar.gz
> 
>   From the README:
> 
>   The code in this package has been derived from the BTLib package
>   obtained from Christian Iseli <chris@ludwig-alpha.unil.ch>.
>   From his mail:
> 
>   The reference is: W. H. Press, S. A. Teukolsky, W. T. Vetterling, and
>   B. P.  Flannery, "Numerical recipes in C", 2nd ed., Cambridge University
>   Press.  Pages 896ff.
> 
>   The generator polynomial is x64 + x4 + x3 + x1 + 1.
> 
> I would suggest that if you don't change the algorithm, at least change
> the name in the sources.  Were you to #ifdef in a real crc-64, and make 
> a compile-time option to select the old one, you could allow users who 
> wish to avoid the initdb a way to continue with the existing pair of 
> CRC-32s.
> 
> Nathan Myers
> ncm@zembu.com
> 


--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: Uh, this is *not* a 64-bit CRC ...

От
ncm@zembu.com (Nathan Myers)
Дата:
On Wed, Feb 28, 2001 at 09:17:19PM -0500, Bruce Momjian wrote:
> > On Wed, Feb 28, 2001 at 04:53:09PM -0500, Tom Lane wrote:
> > > I just took a close look at the COMP_CRC64 macro in xlog.c.
> > > 
> > > This isn't a 64-bit CRC.  It's two independent 32-bit CRCs, one done
> > > on just the odd-numbered bytes and one on just the even-numbered bytes
> > > of the datastream.  That's hardly any stronger than a single 32-bit CRC;
> > > it's certainly not what I thought we had agreed to implement.
> > > 
> > > We can't change this algorithm without forcing an initdb, which would be
> > > a rather unpleasant thing to do at this late stage of the release cycle.
> > > But I'm not happy with it.  Comments?
> > 
> > This might be a good time to update:
> > 
> >   The CRC-64 code used in the SWISS-PROT genetic database is (now) at:
> > 
> >     ftp://ftp.ebi.ac.uk/pub/software/swissprot/Swissknife/old/SPcrc.tar.gz
> > 
> >   From the README:
> > 
> >   The code in this package has been derived from the BTLib package
> >   obtained from Christian Iseli <chris@ludwig-alpha.unil.ch>.
> >   From his mail:
> > 
> >   The reference is: W. H. Press, S. A. Teukolsky, W. T. Vetterling, and
> >   B. P.  Flannery, "Numerical recipes in C", 2nd ed., Cambridge University
> >   Press.  Pages 896ff.
> > 
> >   The generator polynomial is x64 + x4 + x3 + x1 + 1.
> > 
> > I would suggest that if you don't change the algorithm, at least change
> > the name in the sources.  Were you to #ifdef in a real crc-64, and make 
> > a compile-time option to select the old one, you could allow users who 
> > wish to avoid the initdb a way to continue with the existing pair of 
> > CRC-32s.
>
> Added to TODO:
> 
>     * Correct CRC WAL code to be normal CRC32 algorithm 

Um, how about
     * Correct CRC WAL code to be a real CRC64 algorithm

instead?

Nathan Myers
ncm@zembu.com


Re: Uh, this is *not* a 64-bit CRC ...

От
Bruce Momjian
Дата:
> > Added to TODO:
> > 
> >     * Correct CRC WAL code to be normal CRC32 algorithm 
> 
> Um, how about
> 
>       * Correct CRC WAL code to be a real CRC64 algorithm
> 
> instead?

Done.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: Uh, this is *not* a 64-bit CRC ...

От
Tom Lane
Дата:
ncm@zembu.com (Nathan Myers) writes:
>   The CRC-64 code used in the SWISS-PROT genetic database is (now) at:
>     ftp://ftp.ebi.ac.uk/pub/software/swissprot/Swissknife/old/SPcrc.tar.gz

>   From the README:

>   The code in this package has been derived from the BTLib package
>   obtained from Christian Iseli <chris@ludwig-alpha.unil.ch>.
>   From his mail:

>   The reference is: W. H. Press, S. A. Teukolsky, W. T. Vetterling, and
>   B. P.  Flannery, "Numerical recipes in C", 2nd ed., Cambridge University
>   Press.  Pages 896ff.

>   The generator polynomial is x64 + x4 + x3 + x1 + 1.

Nathan (or anyone else with a copy of "Numerical recipes in C", which
I'm embarrassed to admit I don't own), is there any indication in there
that anyone spent any effort on choosing that particular generator
polynomial?  As far as I can see, it violates one of the standard
guidelines for choosing a polynomial, namely that it be a multiple of
(x + 1) ... which in modulo-2 land is equivalent to having an even
number of terms, which this ain't got.  See Ross Williams'
A PAINLESS GUIDE TO CRC ERROR DETECTION ALGORITHMS, available from
ftp://ftp.rocksoft.com/papers/crc_v3.txt among other places, which is
by far the most thorough and readable thing I've ever seen on CRCs.

I spent some time digging around the net for standard CRC64 polynomials,
and the only thing I could find that looked like it might have been
picked by someone who understood what they were doing is in the DLT
(digital linear tape) standard, ECMA-182 (available from
http://www.ecma.ch/ecma1/STAND/ECMA-182.HTM):

x^64 + x^62 + x^57 + x^55 + x^54 + x^53 + x^52 + x^47 + x^46 + x^45 +
x^40 + x^39 + x^38 + x^37 + x^35 + x^33 + x^32 + x^31 + x^29 + x^27 +
x^24 + x^23 + x^22 + x^21 + x^19 + x^17 + x^13 + x^12 + x^10 + x^9 +
x^7 + x^4 + x + 1

I'm probably going to go with this one unless someone can come up with
an authoritative source for another one.
        regards, tom lane


Re: Uh, this is *not* a 64-bit CRC ...

От
"Vadim Mikheev"
Дата:
> This isn't a 64-bit CRC.  It's two independent 32-bit CRCs, one done
> on just the odd-numbered bytes and one on just the even-numbered bytes
> of the datastream.  That's hardly any stronger than a single 32-bit CRC;

I believe that the longer data the more chance to get same CRC/hash
for different data sets (if data length > CRC/hash length). Or am I wrong?
Having no crc64 implementation (see below) I decided to use 2 CRC32 instead
of one - it looked better, without any additional cost (record header is
8 byte aligned anyway, on, mmm, most platform).

> it's certainly not what I thought we had agreed to implement.

I've asked if anyone can send crc64 impl to me and got only one from
Nathan Myers. Unfortunately, SWISS-PROT impl assumes that long long
is 8 bytes - is it portable?

Vadim




Re: Re: Uh, this is *not* a 64-bit CRC ...

От
Tom Lane
Дата:
"Vadim Mikheev" <vmikheev@sectorbase.com> writes:
> I've asked if anyone can send crc64 impl to me and got only one from
> Nathan Myers. Unfortunately, SWISS-PROT impl assumes that long long
> is 8 bytes - is it portable?

No, it's not.  I have written an implementation that uses uint64 if
available (per configure results) otherwise a pair of uint32 registers.
(See pg_crc.h in as-yet-uncommitted patches I posted to pgpatches.)

Interestingly, gcc -O on HP-PA generates essentially the same code
sequence for either the 64- or dual-32-bit versions of the macro...
        regards, tom lane


Re: Uh, this is *not* a 64-bit CRC ...

От
ncm@zembu.com (Nathan Myers)
Дата:
On Mon, Mar 05, 2001 at 02:00:59PM -0500, Tom Lane wrote:
> ncm@zembu.com (Nathan Myers) writes:
> >   The CRC-64 code used in the SWISS-PROT genetic database is (now) at:
> >     ftp://ftp.ebi.ac.uk/pub/software/swissprot/Swissknife/old/SPcrc.tar.gz
> 
> >   From the README:
> 
> >   The code in this package has been derived from the BTLib package
> >   obtained from Christian Iseli <chris@ludwig-alpha.unil.ch>.
> >   From his mail:
> 
> >   The reference is: W. H. Press, S. A. Teukolsky, W. T. Vetterling, and
> >   B. P.  Flannery, "Numerical recipes in C", 2nd ed., Cambridge University
> >   Press.  Pages 896ff.
> 
> >   The generator polynomial is x64 + x4 + x3 + x1 + 1.
> 
> Nathan (or anyone else with a copy of "Numerical recipes in C", which
> I'm embarrassed to admit I don't own), is there any indication in there
> that anyone spent any effort on choosing that particular generator
> polynomial?  As far as I can see, it violates one of the standard
> guidelines for choosing a polynomial, namely that it be a multiple of
> (x + 1) ... which in modulo-2 land is equivalent to having an even
> number of terms, which this ain't got.  See Ross Williams'
> A PAINLESS GUIDE TO CRC ERROR DETECTION ALGORITHMS, available from
> ftp://ftp.rocksoft.com/papers/crc_v3.txt among other places, which is
> by far the most thorough and readable thing I've ever seen on CRCs.
> 
> I spent some time digging around the net for standard CRC64 polynomials,
> and the only thing I could find that looked like it might have been
> picked by someone who understood what they were doing is in the DLT
> (digital linear tape) standard, ECMA-182 (available from
> http://www.ecma.ch/ecma1/STAND/ECMA-182.HTM):
> 
> x^64 + x^62 + x^57 + x^55 + x^54 + x^53 + x^52 + x^47 + x^46 + x^45 +
> x^40 + x^39 + x^38 + x^37 + x^35 + x^33 + x^32 + x^31 + x^29 + x^27 +
> x^24 + x^23 + x^22 + x^21 + x^19 + x^17 + x^13 + x^12 + x^10 + x^9 +
> x^7 + x^4 + x + 1

I'm sorry to have taken so long to reply.  

The polynomial chosen for SWISS-PROT turns out to be presented, in 
Numerical Recipes, just as an example of a primitive polynomial of 
that degree; no assertion is made about its desirability for error 
checking.  It is (in turn) drawn from E. J. Watson, "Mathematics of 
Computation", vol. 16, pp368-9.

Having (x + 1) as a factor guarantees to catch all errors in which
an odd number of bits have been changed.  Presumably you are then
infinitesimally less likely to catch all errors in which an even 
number of bits have been changed.

I would have posted the ECMA-182 polynomial if I had found it.  (That 
was good searching!)  One hopes that the ECMA polynomial was chosen more 
carefully than entirely at random.  High-degree codes are often chosen 
by Monte Carlo methods, by applying statistical tests to randomly-chosen 
values, because the search space is so large.

I have verified that Tom transcribed the polynomial correctly from
the PDF image.  The ECMA document doesn't say whether their polynomial
is applied "bit-reversed", but the check would be equally strong either
way.

Nathan Myers
ncm@zembu.com