Обсуждение: Cryptohash OpenSSL error queue in FIPS enabled builds

Поиск
Список
Период
Сортировка

Cryptohash OpenSSL error queue in FIPS enabled builds

От
Daniel Gustafsson
Дата:
In trying out an OpenSSL 3.1 build with FIPS enabled I realized that our
cryptohash code had a small issue.  Calling a banned cipher generated two
different error messages interleaved:

  postgres=# select md5('foo');
  ERROR:  could not compute MD5 hash: unsupported
  postgres=# select md5('foo');
  ERROR:  could not compute MD5 hash: initialization error

It turns out that OpenSSL places two errors in the queue for this operation,
and we only consume one without clearing the queue in between, so we grab an
error from the previous run.

Consuming all (both) errors and creating a concatenated string seems overkill
as it would alter the API from a const error string to something that needs
freeing etc (also, very few OpenSSL consumers actually drain the queue, OpenSSL
themselves don't).  Skimming the OpenSSL code I was unable to find another
example of two errors generated.  The attached calls ERR_clear_error() as how
we do in libpq in order to avoid consuming earlier errors.

--
Daniel Gustafsson        https://vmware.com/


Вложения

Re: Cryptohash OpenSSL error queue in FIPS enabled builds

От
Tom Lane
Дата:
Daniel Gustafsson <daniel@yesql.se> writes:
> It turns out that OpenSSL places two errors in the queue for this operation,
> and we only consume one without clearing the queue in between, so we grab an
> error from the previous run.

Ugh.

> Consuming all (both) errors and creating a concatenated string seems overkill
> as it would alter the API from a const error string to something that needs
> freeing etc (also, very few OpenSSL consumers actually drain the queue, OpenSSL
> themselves don't).  Skimming the OpenSSL code I was unable to find another
> example of two errors generated.  The attached calls ERR_clear_error() as how
> we do in libpq in order to avoid consuming earlier errors.

This seems quite messy.  How would clearing the queue *before* creating
the object improve matters?  It seems like that solution means you're
leaving an extra error in the queue to break unrelated code.  Wouldn't
it be better to clear after grabbing the error?  (Or maybe do both.)

Also, a comment seems advisable.

            regards, tom lane



Re: Cryptohash OpenSSL error queue in FIPS enabled builds

От
Daniel Gustafsson
Дата:
> On 22 Apr 2022, at 19:01, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Daniel Gustafsson <daniel@yesql.se> writes:

>> Consuming all (both) errors and creating a concatenated string seems overkill
>> as it would alter the API from a const error string to something that needs
>> freeing etc (also, very few OpenSSL consumers actually drain the queue, OpenSSL
>> themselves don't).  Skimming the OpenSSL code I was unable to find another
>> example of two errors generated.  The attached calls ERR_clear_error() as how
>> we do in libpq in order to avoid consuming earlier errors.
>
> This seems quite messy.  How would clearing the queue *before* creating
> the object improve matters?

We know there won't be any leftovers which would make us display the wrong
message.

> It seems like that solution means you're leaving an extra error in the queue to
> break unrelated code.  Wouldn't it be better to clear after grabbing the error?
> (Or maybe do both.)

That's a very good point, doing it in both ends of the operation is better
here.

> Also, a comment seems advisable.

Agreed.

--
Daniel Gustafsson        https://vmware.com/




Re: Cryptohash OpenSSL error queue in FIPS enabled builds

От
Michael Paquier
Дата:
On Sat, Apr 23, 2022 at 11:40:19PM +0200, Daniel Gustafsson wrote:
> On 22 Apr 2022, at 19:01, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Daniel Gustafsson <daniel@yesql.se> writes:
>>> Consuming all (both) errors and creating a concatenated string seems overkill
>>> as it would alter the API from a const error string to something that needs
>>> freeing etc (also, very few OpenSSL consumers actually drain the queue, OpenSSL
>>> themselves don't).  Skimming the OpenSSL code I was unable to find another
>>> example of two errors generated.  The attached calls ERR_clear_error() as how
>>> we do in libpq in order to avoid consuming earlier errors.

It looks like the initialization error would come only from
evp_md_init_internal() in digest.c.

>> This seems quite messy.  How would clearing the queue *before* creating
>> the object improve matters?
>
> We know there won't be any leftovers which would make us display the wrong
> message.

Yeah.

>> It seems like that solution means you're leaving an extra error in the queue to
>> break unrelated code.  Wouldn't it be better to clear after grabbing the error?
>> (Or maybe do both.)
>
> That's a very good point, doing it in both ends of the operation is better
> here.

Error queues are cleaned with ERR_clear_error() before specific SSL
calls in the frontend and the backend, never after the fact.  If we
assume that multiple errors can be stacked in the OpenSSL error queue,
shouldn't we worry about cleaning up the error queue in code paths
like pgtls_read/write(), be_tls_read/write() and be_tls_open_server()?
So it seems to me that SSLerrmessage() should be treated the same way
for the backend and the frontend.  Any opinions?

pgcrypto's openssl.c has the same problem under FIPS as it includes
EVP calls.  Saying that, putting a cleanup in pg_cryptohash_create()
before the fact, and one in SSLerrmessage() after consuming the error
code should be fine to keep a clean queue.

Daniel, were you planning to write a patch?  The other parts of the
code are older than the hmac and cryptohash business, but I would not
mind writing something for the whole.
--
Michael

Вложения

Re: Cryptohash OpenSSL error queue in FIPS enabled builds

От
Daniel Gustafsson
Дата:
> On 25 Apr 2022, at 02:50, Michael Paquier <michael@paquier.xyz> wrote:
> On Sat, Apr 23, 2022 at 11:40:19PM +0200, Daniel Gustafsson wrote:
>> On 22 Apr 2022, at 19:01, Tom Lane <tgl@sss.pgh.pa.us> wrote:

>>> It seems like that solution means you're leaving an extra error in the queue to
>>> break unrelated code.  Wouldn't it be better to clear after grabbing the error?
>>> (Or maybe do both.)
>>
>> That's a very good point, doing it in both ends of the operation is better
>> here.
>
> Error queues are cleaned with ERR_clear_error() before specific SSL
> calls in the frontend and the backend, never after the fact.  If we
> assume that multiple errors can be stacked in the OpenSSL error queue,
> shouldn't we worry about cleaning up the error queue in code paths
> like pgtls_read/write(), be_tls_read/write() and be_tls_open_server()?
> So it seems to me that SSLerrmessage() should be treated the same way
> for the backend and the frontend.  Any opinions?

Well, clearing the queue before calling into OpenSSL is the programming pattern
which is quite universally followed so I'm not sure we need to litter the
codepaths with calls to clearing the queue as we leave.

In this particular codepath I think we can afford clearing it on the way out,
with a comment explaining why.  It's easily reproducible and adding a call and
a comment is a good documentation for ourselves of this OpenSSL behavior.  That
being said, clearing on the way in is the important bit.

> pgcrypto's openssl.c has the same problem under FIPS as it includes
> EVP calls.  Saying that, putting a cleanup in pg_cryptohash_create()
> before the fact, and one in SSLerrmessage() after consuming the error
> code should be fine to keep a clean queue.

pgcrypto doesn't really consume or even inspect the OpenSSL errors, but pass on
a PXE error based on the context of the operation.  We could clear the queue as
we leave, but as you say we already clear it before calling in other places so
it's not clear that it's useful.  We've had EVP in pgcrypto for some time
without seeing issues from error queues, one error left isn't that different
from two when not consumed.

The attached 0002 does however correctly (IMO) report the error as an init
error instead of the non-descript generic error, which isn't really all that
helpful.  I think that also removes the last consumer of the generic error, but
I will take another look with fresh eyes to confirm that.

0003 removes what I think is a weirdly placed questionmark from the message
that make it seem strangely ambiguous.  This needs to update the test answer
files as well, but I first wanted to float the idea before doing that.

--
Daniel Gustafsson        https://vmware.com/


Вложения

Re: Cryptohash OpenSSL error queue in FIPS enabled builds

От
Tom Lane
Дата:
Daniel Gustafsson <daniel@yesql.se> writes:
> In this particular codepath I think we can afford clearing it on the way out,
> with a comment explaining why.

Yeah.  It seems out of the ordinary for an OpenSSL call to stack
two error conditions, so treating a known case of that specially
seems reasonable.  Patches seem sane from here.

            regards, tom lane



Re: Cryptohash OpenSSL error queue in FIPS enabled builds

От
Michael Paquier
Дата:
On Tue, Apr 26, 2022 at 12:07:32AM +0200, Daniel Gustafsson wrote:
> In this particular codepath I think we can afford clearing it on the way out,
> with a comment explaining why.  It's easily reproducible and adding a call and
> a comment is a good documentation for ourselves of this OpenSSL behavior.  That
> being said, clearing on the way in is the important bit.

+     * consumed an error, but cipher initialization can in FIPS enabled
It seems to me that this comment needs a hyphen, as of
"FIPS-enabled".

I am a bit annoyed to assume that having only a localized
ERR_clear_error() in the error code path of the init() call is the
only problem that would occur, only because that's the first one we'd
see in a hash computation.  So my choice would be to call
ERR_get_error() within SSLerrmessage() and clear the queue after
fetching the error code via ERR_get_error() for both
cryptohash_openssl.c and hmac_openssl.c, but I won't fight hard
against both of you on this point, either.

Perhaps this should be reported to the upstream folks?  We'd still
need this code for already released versions, but getting two errors
looks like a mistake.

> pgcrypto doesn't really consume or even inspect the OpenSSL errors, but pass on
> a PXE error based on the context of the operation.  We could clear the queue as
> we leave, but as you say we already clear it before calling in other places so
> it's not clear that it's useful.  We've had EVP in pgcrypto for some time
> without seeing issues from error queues, one error left isn't that different
> from two when not consumed.

Okay.  I did not recall the full error stack used in pgcrypto.  It is
annoying to not get from OpenSSL the details of the error, though.
With FIPS enabled, one computing a hash with pgcrypto would just know
about the initialization error, but would miss why the computation
failed.  It looks like we could use a new error code to tell
px_strerror() to look at the OpenSSL error queue instead of one of the
hardcoded strings.  Just saying.

> The attached 0002 does however correctly (IMO) report the error as an init
> error instead of the non-descript generic error, which isn't really all that
> helpful.  I think that also removes the last consumer of the generic error, but
> I will take another look with fresh eyes to confirm that.
>
> 0003 removes what I think is a weirdly placed questionmark from the message
> that make it seem strangely ambiguous.  This needs to update the test answer
> files as well, but I first wanted to float the idea before doing that.

Good catches.
--
Michael

Вложения

Re: Cryptohash OpenSSL error queue in FIPS enabled builds

От
Daniel Gustafsson
Дата:
> On 26 Apr 2022, at 03:55, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Apr 26, 2022 at 12:07:32AM +0200, Daniel Gustafsson wrote:
>> In this particular codepath I think we can afford clearing it on the way out,
>> with a comment explaining why.  It's easily reproducible and adding a call and
>> a comment is a good documentation for ourselves of this OpenSSL behavior.  That
>> being said, clearing on the way in is the important bit.
>
> +     * consumed an error, but cipher initialization can in FIPS enabled
> It seems to me that this comment needs a hyphen, as of
> "FIPS-enabled".

Will fix.

> I am a bit annoyed to assume that having only a localized
> ERR_clear_error() in the error code path of the init() call is the
> only problem that would occur, only because that's the first one we'd
> see in a hash computation.

It's also the only one in this case since the computation won't get past the
init step with the error no?  The queue will be cleared for each computation so
the risk of cross contamination is removed.

> Perhaps this should be reported to the upstream folks?  We'd still
> need this code for already released versions, but getting two errors
> looks like a mistake.

Not really, the error system in OpenSSL has been defined as a queue with
multiple errors per call possible at least since SSLeay 0.9.1.  I think this is
very much intentional, but a rare case of it.

>> pgcrypto doesn't really consume or even inspect the OpenSSL errors, but pass on
>> a PXE error based on the context of the operation.  We could clear the queue as
>> we leave, but as you say we already clear it before calling in other places so
>> it's not clear that it's useful.  We've had EVP in pgcrypto for some time
>> without seeing issues from error queues, one error left isn't that different
>> from two when not consumed.
>
> Okay.  I did not recall the full error stack used in pgcrypto.  It is
> annoying to not get from OpenSSL the details of the error, though.
> With FIPS enabled, one computing a hash with pgcrypto would just know
> about the initialization error, but would miss why the computation
> failed.  It looks like we could use a new error code to tell
> px_strerror() to look at the OpenSSL error queue instead of one of the
> hardcoded strings.  Just saying.

I looked at that briefly, and might revisit it during the 16 cycle, but it does
have a smell of diminishing returns to it.  With non-OpenSSL code ripped out
from pgcrypto it's clearly more interesting than before.

--
Daniel Gustafsson        https://vmware.com/




Re: Cryptohash OpenSSL error queue in FIPS enabled builds

От
Michael Paquier
Дата:
On Tue, Apr 26, 2022 at 03:15:24PM +0200, Daniel Gustafsson wrote:
> On 26 Apr 2022, at 03:55, Michael Paquier <michael@paquier.xyz> wrote:
>> I am a bit annoyed to assume that having only a localized
>> ERR_clear_error() in the error code path of the init() call is the
>> only problem that would occur, only because that's the first one we'd
>> see in a hash computation.
>
> It's also the only one in this case since the computation won't get past the
> init step with the error no?  The queue will be cleared for each computation so
> the risk of cross contamination is removed.

I was wondering about the case where an error is applied while
updating or finishing the cryptohash, not just the creation or the
initialization.  But cleaning up the queue when beginning a
computation is fine enough.

>> Okay.  I did not recall the full error stack used in pgcrypto.  It is
>> annoying to not get from OpenSSL the details of the error, though.
>> With FIPS enabled, one computing a hash with pgcrypto would just know
>> about the initialization error, but would miss why the computation
>> failed.  It looks like we could use a new error code to tell
>> px_strerror() to look at the OpenSSL error queue instead of one of the
>> hardcoded strings.  Just saying.
>
> I looked at that briefly, and might revisit it during the 16 cycle, but it does
> have a smell of diminishing returns to it.  With non-OpenSSL code ripped out
> from pgcrypto it's clearly more interesting than before.

Clearly.

For the sake of the archives, this patch series has been applied as
17ec5fa, 0250a16 and ee97d46.
--
Michael

Вложения