Обсуждение: Errands around AllocateDir()

Поиск
Список
Период
Сортировка

Errands around AllocateDir()

От
Michael Paquier
Дата:
Hi all,

On the recent following thread problems around the use of
AllocateDir() have been diagnosed with its use in the backend code:
https://www.postgresql.org/message-id/20171127093107.1473.70477@wrigleys.postgresql.org

I had a close look at all the callers of AllocateDir() and noticed a
couple of unwelcome things (Tom noticed some of those in the thread
mentioned above, I found others):
- Some elog() instead of ereport(), which is bad for the error code
and translation.
- Some use ereport(LOG), and could take advantage of ReadDirExtended,
which could get exposed instead of being contained in fd.c. Those
elog(LOG) can be removed when there is no further operation in the
routine where the folder read is happening.
- perform_base_backup() makes the mistake of not saving errno before
CheckXLogRemoved() when AllocateDir returns NULL, which can lead to an
incorrect error message.
- restoreTwoPhaseData() calls ReadDir() with LWLockAcquire()
in-between, which can lead to an incorrect errno used.
- Some code paths can simply rely on the error generated by ReadDir(),
so some code is useless.
- Some code paths use non-generic error messages, like
RemoveOldXlogFiles(). Here more consistent error string would I think
make sense.

Some issues are real bugs, like what's in perform_base_backup() and
restoreTwoPhaseData(), and some incorrect error codes. Missing
translation of some messages is also wrong IMO. Making error messages
more consistent is nice and cosmetic. My monitoring of all those
issues is leading me to the attached patch for HEAD. Thoughts?
-- 
Michael

Вложения

Re: Errands around AllocateDir()

От
Tom Lane
Дата:
Michael Paquier <michael.paquier@gmail.com> writes:
> I had a close look at all the callers of AllocateDir() and noticed a
> couple of unwelcome things (Tom noticed some of those in the thread
> mentioned above, I found others):

The only part of this that seems likely to be controversial is the
decision to get rid of special-case error messages, eg replace
    "could not open tablespace directory \"%s\": %m"
with the more generic
    "could not open directory \"%s\": %m"

As I said in the previous thread, I don't see anything much wrong
with that; but if anyone doesn't like it, speak now.

Also, I'm inclined to back-patch the exporting of ReadDirExtended,
so that we can rely on it being available if we use it in any
back-patched fixes.  I wouldn't back-patch anything else here that's
not a clear bug fix, but we may need public ReadDirExtended anyway
just to do that much (I didn't really check that yet).

            regards, tom lane


Re: Errands around AllocateDir()

От
Robert Haas
Дата:
On Mon, Dec 4, 2017 at 12:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> I had a close look at all the callers of AllocateDir() and noticed a
>> couple of unwelcome things (Tom noticed some of those in the thread
>> mentioned above, I found others):
>
> The only part of this that seems likely to be controversial is the
> decision to get rid of special-case error messages, eg replace
>         "could not open tablespace directory \"%s\": %m"
> with the more generic
>         "could not open directory \"%s\": %m"
>
> As I said in the previous thread, I don't see anything much wrong
> with that; but if anyone doesn't like it, speak now.

I find that change an improvement, but not something to back-patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Errands around AllocateDir()

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Dec 4, 2017 at 12:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The only part of this that seems likely to be controversial is the
>> decision to get rid of special-case error messages, eg replace
>> "could not open tablespace directory \"%s\": %m"
>> with the more generic
>> "could not open directory \"%s\": %m"
>> As I said in the previous thread, I don't see anything much wrong
>> with that; but if anyone doesn't like it, speak now.

> I find that change an improvement, but not something to back-patch.

Yeah, agreed.  The only thing I'm concerned about back-patching is
the places where a wrong errno might be reported.

            regards, tom lane


Re: Errands around AllocateDir()

От
Alvaro Herrera
Дата:
Tom Lane wrote:

> Yeah, agreed.  The only thing I'm concerned about back-patching is
> the places where a wrong errno might be reported.

If we're currently reporting "could not open dir: Success" then
backpatching such a fix is definitely an improvement.  OTOH if currently
we have opendir() trying to report a failure, then LWLockRelease replace
the errno because something completely unrelated also failed, having the
message report exactly the opendir() failure rather than the lwlock
failure is surely also an improvement.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Errands around AllocateDir()

От
Tom Lane
Дата:
Michael Paquier <michael.paquier@gmail.com> writes:
> I had a close look at all the callers of AllocateDir() and noticed a
> couple of unwelcome things (Tom noticed some of those in the thread
> mentioned above, I found others):

Pushed with some minor additional fiddling.  The most notable thing
I changed was that instead of this:

> - perform_base_backup() makes the mistake of not saving errno before
> CheckXLogRemoved() when AllocateDir returns NULL, which can lead to an
> incorrect error message.

I modified CheckXLogRemoved() to internally guarantee that it does not
change errno.  This is because there seemed to be other call sites that
were depending on that, not just this one.  Anyway, that seemed like a
more future-proof fix than relying on callers to deal with it.

            regards, tom lane


Re: Errands around AllocateDir()

От
Michael Paquier
Дата:
On Tue, Dec 5, 2017 at 7:17 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> I had a close look at all the callers of AllocateDir() and noticed a
>> couple of unwelcome things (Tom noticed some of those in the thread
>> mentioned above, I found others):
>
> Pushed with some minor additional fiddling.

Thanks.

> The most notable thing I changed was that instead of this:
>
>> - perform_base_backup() makes the mistake of not saving errno before
>> CheckXLogRemoved() when AllocateDir returns NULL, which can lead to an
>> incorrect error message.
>
> I modified CheckXLogRemoved() to internally guarantee that it does not
> change errno.  This is because there seemed to be other call sites that
> were depending on that, not just this one.  Anyway, that seemed like a
> more future-proof fix than relying on callers to deal with it.

Hm, OK. Yes I can see the point behind this way of doing instead,
CheckXLogRemoved() is much used in error code paths.
-- 
Michael


Re: Errands around AllocateDir()

От
Michael Paquier
Дата:
On Tue, Dec 5, 2017 at 4:11 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Tom Lane wrote:
>> Yeah, agreed.  The only thing I'm concerned about back-patching is
>> the places where a wrong errno might be reported.
>
> If we're currently reporting "could not open dir: Success" then
> backpatching such a fix is definitely an improvement.  OTOH if currently
> we have opendir() trying to report a failure, then LWLockRelease replace
> the errno because something completely unrelated also failed, having the
> message report exactly the opendir() failure rather than the lwlock
> failure is surely also an improvement.

Note I am +/-0 with exposing ReadDirExtended in back-branches, because
there is no use for it yet there. Only fixing the actual bugs with
errno is of course fine for me if my initial message was not clear for
back-branches.
-- 
Michael


Re: Errands around AllocateDir()

От
Tom Lane
Дата:
Michael Paquier <michael.paquier@gmail.com> writes:
> Note I am +/-0 with exposing ReadDirExtended in back-branches, because
> there is no use for it yet there.

My thought is basically that we might end up back-patching some change
that needs that.  Nothing I've done today requires it, but it seems like
a very harmless guard against future problems.

I am BTW wondering whether any of the other directory scan loops in
the backend ought to be converted from ReadDir to ReadDirExtended(LOG).
I'm just finishing up making ResetUnloggedRelations() go the other way,
because of the argument that failing to reset unlogged relations would
be a data corruption hazard.  But there may well be other cases where
the best thing on balance is to log the problem and press on.  With
the changes we've made today, it's a very easy fix to flip from one
behavior to the other.

            regards, tom lane