Discussion: BUG #7815: Upgrading PostgreSQL from 9.1 to 9.2 with pg_upgrade/postgresql-setup fails - invalid status retrieve

The following bug has been logged on the website:

Bug reference:      7815
Logged by:          George Machitidze
Email address:      giomac@gmail.com
PostgreSQL version: 9.2.2
Operating system:   Fedora 18 Linux
Description:


https://bugzilla.redhat.com/show_bug.cgi?id=896161
Upgrading PostgreSQL from 9.1 to 9.2 with pg_upgrade/postgresql-setup fails
with invalid message "There seems to be a postmaster servicing the old
cluster". Looks like pg_upgrade is checking pid file too early without
waiting for master process to exit:

open("/var/lib/pgsql/data-old/postmaster.pid", O_RDONLY) = 5
On Fri, Jan 18, 2013 at 10:19:48PM +0000, giomac@gmail.com wrote:
> The following bug has been logged on the website:
>
> Bug reference:      7815
> Logged by:          George Machitidze
> Email address:      giomac@gmail.com
> PostgreSQL version: 9.2.2
> Operating system:   Fedora 18 Linux
> Description:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=896161
> Upgrading PostgreSQL from 9.1 to 9.2 with pg_upgrade/postgresql-setup fails
> with invalid message "There seems to be a postmaster servicing the old
> cluster". Looks like pg_upgrade is checking pid file too early without
> waiting for master process to exit:
>
> open("/var/lib/pgsql/data-old/postmaster.pid", O_RDONLY) = 5

How are you shutting down the postmaster?  Are you using pg_ctl -w stop?
If not, you have to wait for the server to actually shut down before
starting pg_upgrade.  pg_upgrade is not going to do that waiting.
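pg_ctl -w stop already waits for the shutdown to finish. For a wrapper that stops the server some other way, a minimal sketch of the waiting step could poll until postmaster.pid disappears, assuming that the file's removal marks the end of a clean shutdown. The function name and the one-second polling interval are hypothetical, not taken from pg_ctl or pg_upgrade.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of pg_ctl or pg_upgrade): wait up to
 * timeout_secs seconds for the lock file at pidfile_path to disappear.
 * Returns true once the file is confirmed absent (ENOENT), false if it
 * is still present, or cannot be checked, when the timeout expires.
 */
static bool
wait_for_pidfile_removal(const char *pidfile_path, int timeout_secs)
{
    assert(pidfile_path != NULL && timeout_secs >= 0);

    for (int waited = 0;; waited++)
    {
        /* access(F_OK) only tests existence; ENOENT means it is gone. */
        if (access(pidfile_path, F_OK) != 0 && errno == ENOENT)
            return true;
        if (waited >= timeout_secs)
            return false;
        sleep(1);               /* poll once per second */
    }
}
```

For the cluster in this report, a caller might use wait_for_pidfile_removal("/var/lib/pgsql/data-old/postmaster.pid", 60), with an arbitrary 60-second limit, and run pg_upgrade only if it returns true.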

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
Bruce Momjian <bruce@momjian.us> writes:
> On Fri, Jan 18, 2013 at 10:19:48PM +0000, giomac@gmail.com wrote:
>> https://bugzilla.redhat.com/show_bug.cgi?id=896161
>> Upgrading PostgreSQL from 9.1 to 9.2 with pg_upgrade/postgresql-setup fails
>> with invalid message "There seems to be a postmaster servicing the old
>> cluster". Looks like pg_upgrade is checking pid file too early without
>> waiting for master process to exit:
>>
>> open("/var/lib/pgsql/data-old/postmaster.pid", O_RDONLY) = 5

> How are you shutting down the postmaster?  Are you using pg_ctl -w stop?
> If not, you have to wait for the server to actually shut down before
> starting pg_upgrade.  pg_upgrade is not going to do that waiting.

The backstory on this is at the cited Red Hat bug ... apparently the OP
decided I was clueless and he needed to consult some real authorities.

The existing pg_control clearly says that the cluster was shut down,
so it's not clear why there's still a postmaster.pid file there.
There's some debugging to be done yet about how that got to be that way.
(AFAICS the RPM upgrade process ought to shut down the old postmaster
before installing a new one; but somehow that went wrong, or else a
doppelganger postmaster.pid rose from the dead.  Anyway, that's not a
matter for this list because it involves Red Hat upgrade processes, not
anything supplied by the community.)
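The "shut down" state recorded in pg_control is what pg_controldata prints on its "Database cluster state:" line. A minimal sketch, assuming the caller has already captured pg_controldata's text output into a string, could test that field. The helper name is hypothetical; neither the RPM scripts nor pg_upgrade are claimed to work this way.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: given the text printed by pg_controldata, return
 * true only if the "Database cluster state:" line reads exactly
 * "shut down".  Any other state (for example "in production") or a
 * missing line returns false.
 */
static bool
controldata_says_shut_down(const char *output)
{
    static const char key[] = "Database cluster state:";
    const char *value;

    assert(output != NULL);

    value = strstr(output, key);
    if (value == NULL)
        return false;

    value += sizeof(key) - 1;           /* skip past the label */
    while (*value == ' ' || *value == '\t')
        value++;                        /* skip alignment padding */

    /* Compare the whole rest of the line, so longer states do not match. */
    size_t len = strcspn(value, "\n");
    return len == strlen("shut down") && strncmp(value, "shut down", len) == 0;
}
```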

In the meantime, I was wondering a bit why pg_upgrade looks at the
postmaster.pid file at all.  Generally we recommend that startup scripts
*not* look at the lock file but just try to start a postmaster, and
leave it to the postmaster to decide if there's a valid lockfile
present.  Is it really appropriate for pg_upgrade to do this
differently?  I think the complained-of case would have gone through
cleanly if that error check weren't there, or in any case the postmaster
would have done a better job of checking for a conflicting postmaster.

            regards, tom lane
On Sat, Jan 19, 2013 at 12:02:31AM -0500, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > On Fri, Jan 18, 2013 at 10:19:48PM +0000, giomac@gmail.com wrote:
> >> https://bugzilla.redhat.com/show_bug.cgi?id=896161
> >> Upgrading PostgreSQL from 9.1 to 9.2 with pg_upgrade/postgresql-setup fails
> >> with invalid message "There seems to be a postmaster servicing the old
> >> cluster". Looks like pg_upgrade is checking pid file too early without
> >> waiting for master process to exit:
> >>
> >> open("/var/lib/pgsql/data-old/postmaster.pid", O_RDONLY) = 5
>
> > How are you shutting down the postmaster?  Are you using pg_ctl -w stop?
> > If not, you have to wait for the server to actually shut down before
> > starting pg_upgrade.  pg_upgrade is not going to do that waiting.
>
> The backstory on this is at the cited Red Hat bug ... apparently the OP
> decided I was clueless and he needed to consult some real authorities.

Yes, it was clear there was some backstory in reading that thread.

> The existing pg_control clearly says that the cluster was shut down,
> so it's not clear why there's still a postmaster.pid file there.
> There's some debugging to be done yet about how that got to be that way.
> (AFAICS the RPM upgrade process ought to shut down the old postmaster
> before installing a new one; but somehow that went wrong, or else a
> doppelganger postmaster.pid rose from the dead.  Anyway, that's not a
> matter for this list because it involves Red Hat upgrade processes, not
> anything supplied by the community.)
>
> In the meantime, I was wondering a bit why pg_upgrade looks at the
> postmaster.pid file at all.  Generally we recommend that startup scripts
> *not* look at the lock file but just try to start a postmaster, and
> leave it to the postmaster to decide if there's a valid lockfile
> present.  Is it really appropriate for pg_upgrade to do this
> differently?  I think the complained-of case would have gone through
> cleanly if that error check weren't there, or in any case the postmaster
> would have done a better job of checking for a conflicting postmaster.

The reason we check for postmaster.pid is so we can give the user a clue
about which postmaster is running.  We want to make sure everything is
super-clean before we do anything.  What we could do is to first try to
start each cluster, and then fail if the start fails, but the start
could fail for all sorts of reasons, so it doesn't really seem like a
win.

Also, we don't want to start on a non-clean shutdown, so the missing pid
file tells us it was clean.
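The rule described here amounts to treating the mere presence of postmaster.pid as "not cleanly shut down". A minimal sketch of that rule, with hypothetical names (this is not pg_upgrade's source):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical pre-check: returns true only if <datadir>/postmaster.pid
 * is confirmed absent.  A present file, or any stat() failure other than
 * ENOENT (permissions, I/O), is treated as "not safe to proceed".
 */
static bool
cluster_has_no_lockfile(const char *datadir)
{
    char path[4096];
    struct stat st;
    int n;

    assert(datadir != NULL);
    n = snprintf(path, sizeof(path), "%s/postmaster.pid", datadir);
    if (n < 0 || (size_t) n >= sizeof(path))
        return false;               /* path too long: refuse rather than guess */

    if (stat(path, &st) == 0)
        return false;               /* lock file present: refuse to proceed */
    return errno == ENOENT;         /* absent: assume a clean shutdown */
}
```

This test cannot tell a live postmaster from a stale file left behind by a crash, which is the gap Tom raises in his reply.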

Bruce Momjian <bruce@momjian.us> writes:
> On Sat, Jan 19, 2013 at 12:02:31AM -0500, Tom Lane wrote:
>> In the meantime, I was wondering a bit why pg_upgrade looks at the
>> postmaster.pid file at all.

> The reason we check for postmaster.pid is so we can give the user a clue
> about which postmaster is running.

[ scratches head... ]  I failed to detect any such clue in the error
message it prints.  Had you printed the PID from the file, or even
better looked to see if that process was actually still alive, this
argument would be reasonable.  But pg_upgrade does neither of those,
whereas if it had started a postmaster the postmaster would have done
both of those things.
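Both suggestions, printing the PID from the file and checking whether that process is still alive, can be sketched with standard POSIX calls: the first line of postmaster.pid holds the postmaster's PID, and kill(pid, 0) tests for the process without sending a signal. The helper names are hypothetical. The backend's own lock-file logic covers more cases, and a recycled PID now used by an unrelated process would make this sketch report "alive".

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read the PID from the first line of a
 * postmaster.pid file.  Returns the PID, or -1 if the file cannot be
 * read or its first line is not a positive integer.
 */
static long
read_lockfile_pid(const char *path)
{
    char line[64];
    char *end;
    long pid;
    FILE *fp;

    assert(path != NULL);
    if ((fp = fopen(path, "r")) == NULL)
        return -1;
    if (fgets(line, sizeof(line), fp) == NULL)
    {
        fclose(fp);
        return -1;
    }
    fclose(fp);

    errno = 0;
    pid = strtol(line, &end, 10);
    if (errno != 0 || end == line || pid <= 0)
        return -1;
    return pid;
}

/*
 * Returns 1 if a process with this PID exists, 0 if not.  kill() with
 * signal 0 only performs the existence and permission checks.  EPERM
 * means the process exists but belongs to another user; this sketch
 * conservatively counts that as alive.
 */
static int
pid_is_alive(long pid)
{
    if (kill((pid_t) pid, 0) == 0)
        return 1;
    return errno == EPERM;
}
```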

> Also, we don't want to start on a non-clean shutdown, so the missing pid
> file tells us it was clean.

I agree that super paranoia is not unreasonable in pg_upgrade.  But it
would be useful to print something similar to what the backend prints,
about checking whether PID N is still there and manually removing the
lock file if not.  Or (ahem) you could let the existing backend-side
logic do that for you, rather than reimplementing that logic badly.
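A sketch of the kind of message described here, assuming the PID and its liveness have already been determined (for example from the first line of postmaster.pid and a kill(pid, 0) probe). The live-server wording imitates the backend's hint "Is another postmaster (PID N) running in data directory ...?"; the function name and the stale-file text are hypothetical and are not pg_upgrade output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch of a more informative pg_upgrade error.  pid is
 * the value read from postmaster.pid; alive says whether that process
 * was found.  Output is truncated safely if bufsize is too small.
 */
static void
describe_lockfile(char *buf, size_t bufsize, const char *datadir,
                  long pid, int alive)
{
    assert(buf != NULL && datadir != NULL && bufsize > 0);

    if (alive)
        snprintf(buf, bufsize,
                 "Is another postmaster (PID %ld) running in data directory \"%s\"?\n"
                 "Please shut that postmaster down cleanly and try again.\n",
                 pid, datadir);
    else
        snprintf(buf, bufsize,
                 "Lock file \"%s/postmaster.pid\" refers to PID %ld, which is not running;\n"
                 "the old server may not have been shut down cleanly.\n",
                 datadir, pid);
}
```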

Meanwhile I still have to figure out how come the postmaster.pid file
is still there in the OP's case ...

            regards, tom lane
On Sat, Jan 19, 2013 at 12:47:03AM -0500, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > On Sat, Jan 19, 2013 at 12:02:31AM -0500, Tom Lane wrote:
> >> In the meantime, I was wondering a bit why pg_upgrade looks at the
> >> postmaster.pid file at all.
>
> > The reason we check for postmaster.pid is so we can give the user a clue
> > about which postmaster is running.
>
> [ scratches head... ]  I failed to detect any such clue in the error
> message it prints.  Had you printed the PID from the file, or even
> better looked to see if that process was actually still alive, this
> argument would be reasonable.  But pg_upgrade does neither of those,
> whereas if it had started a postmaster the postmaster would have done
> both of those things.
>
> > Also, we don't want to start on a non-clean shutdown, so the missing pid
> > file tells us it was clean.
>
> I agree that super paranoia is not unreasonable in pg_upgrade.  But it
> would be useful to print something similar to what the backend prints,
> about checking whether PID N is still there and manually removing the
> lock file if not.  Or (ahem) you could let the existing backend-side
> logic do that for you, rather than reimplementing that logic badly.

The current output is:

    There seems to be a postmaster servicing the old cluster.
        Please shutdown that postmaster and try again.

You are right that it is inaccurate.   I should reword that to say the
server is running or was not properly shut down:

    There seems to be a postmaster servicing the old cluster, or
    it was not properly shut down.    Please cleanly shut down that
    postmaster and try again.

Why is a clean shutdown important?  If the server crashed, we would have
committed transactions in the WAL files which are not transferred to the
new server, and would be lost.

I am hesitant to even start such an old server because pg_upgrade never
modifies the old server.  Even starting it in that case would be
modifying it.

The other problem is that if the server start fails, how do we know if
the failure was due to a running postmaster?  I could later check the
postmaster.pid file, but the start might have failed before reaching
the code that removes that file.

A still-running server is a common cause of failure, so I wanted an
error that was very clear, rather than a generic
can't-start-the-server one.

I am open to ideas.

Bruce Momjian <bruce@momjian.us> writes:
> Why is a clean shutdown important?  If the server crashed, we would have
> committed transactions in the WAL files which are not transferred to the
> new server, and would be lost.

> I am hesitant to even start such an old server because pg_upgrade never
> modifies the old server.  Even starting it in that case would be
> modifying it.

I'm not really following this logic.  If the old cluster was in a
crashed state, why would we not expect that starting a postmaster would
be the best (only) way to repair the damage and make everything good
again?  Isn't that exactly what the user would have to do anyway?  What
other action would you expect him to take instead?

(But, at least with the type of packaging I'm using in Fedora, he would
first have to go through a package downgrade/reinstallation process,
because the packaging provides no simple scripted way of manually
starting the old postgres executable, only the new one.  Moreover, what
pg_upgrade is printing provides no help in figuring out whether that's
the next step.)

I do sympathize with taking a paranoid attitude here, but I'm failing
to see what advantage there is in not attempting to start the old
postmaster.  In the *only* case that pg_upgrade is successfully
protecting against with this logic, namely there's-an-active-postmaster-
already, the postmaster is equally able to protect itself.  In other
cases it would be more helpful not less to let the postmaster analyze
the situation.

> The other problem is that if the server start fails, how do we know if
> the failure was due to a running postmaster?

Because we read the postmaster's log file, or at least tell the user to
do so.  That report would be unambiguous, unlike pg_upgrade's.
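A sketch of reading the log, assuming the trial start's output was redirected to a log file whose path the caller knows: pg_upgrade could look for the backend's "lock file ... already exists" error before deciding what to tell the user. The helper name is hypothetical. Matching on message text is fragile: the wording can change between releases and is translated when lc_messages is not English.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return true if the captured server log at
 * log_path contains the backend's "lock file ... already exists"
 * complaint, i.e. the start failed because the data directory is
 * already locked.  Lines longer than the buffer are read in pieces, so
 * a match split across two pieces would be missed.
 */
static bool
log_reports_existing_lockfile(const char *log_path)
{
    char line[1024];
    bool found = false;
    FILE *fp;

    assert(log_path != NULL);
    if ((fp = fopen(log_path, "r")) == NULL)
        return false;

    while (!found && fgets(line, sizeof(line), fp) != NULL)
        found = strstr(line, "lock file") != NULL &&
                strstr(line, "already exists") != NULL;

    fclose(fp);
    return found;
}
```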

            regards, tom lane
On Sat, Jan 19, 2013 at 10:45:15PM +0400, George Machitidze wrote:
> Hi Bruce, Tom
>
> >The backstory on this is at the cited Red Hat bug ... apparently the OP
> >decided I was clueless and he needed to consult some real authorities.
> Oh come on, I'm very sure you both are good guys and know what you are doing;
> none of us is an ignorant bastard :)
> I decided to open a case here too, for a simple reason: maybe someone has had
> the same issue, or knows how pg_upgrade works (in detail) better than I do,
> because I am clueless.
> This is a test DB and I can erase it, but I'm very sure there's something
> wrong in the upgrade process, and that is what I want solved.
>
> Now, we can open a bottle of whiskey and go back to the problem:
> 1. I didn't run the postmaster before or during pg_upgrade; it was never
> invoked manually in this process
> 2. There is no pid file AFTER the application is stopped, but it looks like
> it's there while pg_upgrade is running; strace showed that, so there is no
> need to run FAM to verify it
>
> I don't know how pg_upgrade works, but it looks like it's trying to start the
> postmaster, which runs and creates postmaster.pid; then the postmaster fails
> to stop, or needs some more time before pg_upgrade checks its pid. That's
> what I see.
>
> So, is pg_upgrade starting the postmaster? If yes, then when (at which step),
> and why is the pid file check done? That's all we want to know, right?

The pid check is done before pg_upgrade starts or stops any postmaster,
to make sure both servers are down before it starts.  Tom wants that
testing improved.

On Sat, Jan 19, 2013 at 11:27:28AM -0500, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Why is a clean shutdown important?  If the server crashed, we would have
> > committed transactions in the WAL files which are not transferred to the
> > new server, and would be lost.
>
> > I am hesitant to even start such an old server because pg_upgrade never
> > modifies the old server.  Even starting it in that case would be
> > modifying it.
>
> I'm not really following this logic.  If the old cluster was in a
> crashed state, why would we not expect that starting a postmaster would
> be the best (only) way to repair the damage and make everything good
> again?  Isn't that exactly what the user would have to do anyway?  What
> other action would you expect him to take instead?
>
> (But, at least with the type of packaging I'm using in Fedora, he would
> first have to go through a package downgrade/reinstallation process,
> because the packaging provides no simple scripted way of manually
> starting the old postgres executable, only the new one.  Moreover, what
> pg_upgrade is printing provides no help in figuring out whether that's
> the next step.)
>
> I do sympathize with taking a paranoid attitude here, but I'm failing
> to see what advantage there is in not attempting to start the old
> postmaster.  In the *only* case that pg_upgrade is successfully
> protecting against with this logic, namely there's-an-active-postmaster-
> already, the postmaster is equally able to protect itself.  In other
> cases it would be more helpful not less to let the postmaster analyze
> the situation.
>
> > The other problem is that if the server start fails, how do we know if
> > the failure was due to a running postmaster?
>
> Because we read the postmaster's log file, or at least tell the user to
> do so.  That report would be unambiguous, unlike pg_upgrade's.

Attached is a WIP patch to give you an idea of how I am going to solve
this problem.  This comment says it all:

!       /*
!        *  If we have a postmaster.pid file, try to start the server.  If
!        *  it starts, the pid file was stale, so stop the server.  If it
!        *  doesn't start, assume the server is running.
!        */
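The decision in that comment can be sketched in C. The callbacks stand in for real routines that test for the lock file, start the old server, and stop it; none of them are shown in the thread, and all names here, including the outcome enum and the fake callbacks at the end, are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Possible outcomes of the pre-upgrade check (names are hypothetical). */
typedef enum
{
    CLUSTER_DOWN,               /* no postmaster.pid: safe to proceed */
    CLUSTER_WAS_STALE,          /* pid file present, but a trial start worked */
    CLUSTER_RUNNING             /* pid file present and the start failed */
} cluster_status;

/*
 * Sketch of the logic in the WIP patch comment.  Each callback receives
 * the data directory path.
 */
static cluster_status
check_cluster_down(const char *datadir,
                   bool (*pidfile_exists) (const char *),
                   bool (*try_start) (const char *),
                   void (*stop) (const char *))
{
    assert(datadir != NULL && pidfile_exists != NULL &&
           try_start != NULL && stop != NULL);

    if (!pidfile_exists(datadir))
        return CLUSTER_DOWN;

    /* If it starts, the pid file was stale, so stop the server again. */
    if (try_start(datadir))
    {
        stop(datadir);
        return CLUSTER_WAS_STALE;
    }

    /* If it doesn't start, assume the server is running. */
    return CLUSTER_RUNNING;
}

/* Illustration only: fake callbacks in place of real file checks and pg_ctl. */
static int fake_stop_calls = 0;
static bool fake_pidfile_present(const char *d) { (void) d; return true; }
static bool fake_pidfile_absent(const char *d) { (void) d; return false; }
static bool fake_start_ok(const char *d) { (void) d; return true; }
static bool fake_start_fails(const char *d) { (void) d; return false; }
static void fake_stop(const char *d) { (void) d; fake_stop_calls++; }
```

The trial start is exactly the step Bruce was hesitant about: on a crashed cluster it would run recovery and so modify the old data directory. The CLUSTER_RUNNING branch also keeps the ambiguity he raised, since a start can fail for other reasons, which is why Tom suggested reading the postmaster's log before reporting that outcome.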



Hi Bruce, Tom

>The backstory on this is at the cited Red Hat bug ... apparently the OP
>decided I was clueless and he needed to consult some real authorities.
Oh come on, I'm very sure you both are good guys and know what you are
doing; none of us is an ignorant bastard :)
I decided to open a case here too, for a simple reason: maybe someone has
had the same issue, or knows how pg_upgrade works (in detail) better than
I do, because I am clueless.
This is a test DB and I can erase it, but I'm very sure there's something
wrong in the upgrade process, and that is what I want solved.

Now, we can open a bottle of whiskey and go back to the problem:
1. I didn't run the postmaster before or during pg_upgrade; it was never
invoked manually in this process
2. There is no pid file AFTER the application is stopped, but it looks like
it's there while pg_upgrade is running; strace showed that, so there is no
need to run FAM to verify it

I don't know how pg_upgrade works, but it looks like it's trying to start
the postmaster, which runs and creates postmaster.pid; then the postmaster
fails to stop, or needs some more time before pg_upgrade checks its pid.
That's what I see.

So, is pg_upgrade starting the postmaster? If yes, then when (at which
step), and why is the pid file check done? That's all we want to know, right?


Best regards,
George Machitidze



