RE: [PoC] pg_upgrade: allow to upgrade publisher node

From: Hayato Kuroda (Fujitsu)
Subject: RE: [PoC] pg_upgrade: allow to upgrade publisher node
Msg-id: TYAPR01MB5866455A1654D621571DBD24F50CA@TYAPR01MB5866.jpnprd01.prod.outlook.com
In reply to: Re: [PoC] pg_upgrade: allow to upgrade publisher node (Amit Kapila <amit.kapila16@gmail.com>)
List: pgsql-hackers

Dear Amit, Julien,

> > > >
> > > > Unless I'm missing something I don't see what prevents something to connect
> > > > using the replication protocol and issue any query or even create new
> > > > replication slots?
> > > >
> > >
> > > I think the point is that if we have any slots where we have not
> > > consumed the pending WAL (other than the expected like
> > > SHUTDOWN_CHECKPOINT) or if there are invalid slots then the upgrade
> > > won't proceed and we will request user to remove such slots or ensure
> > > that WAL is consumed by slots. So, I think in the case you mentioned,
> > > the upgrade won't succeed.
> >
> > What if new slots are added while the old instance is started in the middle of
> > pg_upgrade, *after* the various checks are done?
> >
> 
> They won't be copied but I think that won't be any different than
> other objects like tables. Anyway, I have another idea which is to not
> allow creating slots during binary upgrade unless one specifically
> requests it by having an API like binary_upgrade_allow_slot_create()
> similar to existing APIs binary_upgrade_*.
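
As a rough illustration of the idea quoted above (refuse slot creation during binary upgrade unless it has been explicitly allowed), here is a minimal standalone sketch. It is not PostgreSQL code: every name in it is hypothetical, and only the binary_upgrade_* naming convention comes from the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag standing in for the server's binary-upgrade mode. */
static bool in_binary_upgrade = false;

/* Hypothetical switch that only pg_upgrade's own session would turn on. */
static bool slot_create_allowed = false;

/* Hypothetical helper: enter or leave the simulated binary-upgrade mode. */
void
sketch_set_binary_upgrade(bool on)
{
    in_binary_upgrade = on;
}

/* Hypothetical analogue of the proposed binary_upgrade_allow_slot_create(). */
void
sketch_allow_slot_create(bool allow)
{
    slot_create_allowed = allow;
}

/* Returns true if creating the named slot is permitted under the sketched rule. */
bool
sketch_can_create_slot(const char *slot_name)
{
    assert(slot_name != NULL);

    /* Outside binary upgrade, slot creation is not restricted by this rule. */
    if (!in_binary_upgrade)
        return true;

    /* During binary upgrade, creation requires the explicit opt-in. */
    return slot_create_allowed;
}
```

With such a rule, a session that connects during the upgrade and never calls the opt-in function would be refused, while pg_upgrade itself could still create the slots it needs.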

I tested this part and confirmed that objects created after the dump
are not copied to the new node. Please see the attached scripts, which emulate my test.

# tested steps

-1. applied the v18 patch set
0. modified the source so that pg_upgrade sleeps after the dump, which leaves time to
   create objects during the upgrade, then built and installed it:

```
@@ -188,6 +188,9 @@ check_and_dump_old_cluster(bool live_check)
        if (!user_opts.check)
                generate_old_dump();
 
+       printf("XXX: start to sleep\n");
+       sleep(35);
+
```

1. prepared a node that had a replication slot
2. ran pg_upgrade; the process sleeps for 35 seconds partway through
3. connected to the node being upgraded with the command:

```
psql "host=`pwd` user=postgres port=50432 replication=database"
```

4. created a table and a replication slot. Note that in binary upgrade mode it is very
  hard to create tables manually. In my case, table "bar" and slot "test" were created.
5. waited until the upgrade finished, then started the new node.
6. confirmed that the created table and slot were not found on the new node.

```
new_publisher=# \d
Did not find any relations.

new_publisher=# SELECT slot_name FROM pg_replication_slots WHERE slot_name = 'test';
 slot_name 
-----------
(0 rows)
```
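
For reference, the connection string from step 3 can be built programmatically. The sketch below assumes a POSIX system and that the server's Unix-domain socket lives in the current directory; build_upgrade_conninfo() is an illustrative helper, and the user name and port are simply the values from the psql command above.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Writes a libpq connection string that targets a server whose Unix-domain
 * socket is in the current working directory. Returns 0 on success and -1 if
 * the directory cannot be read or the buffer is too small.
 *
 * Assumption: the path contains no spaces or quotes; otherwise the host value
 * would need to be quoted.
 */
int
build_upgrade_conninfo(char *buf, size_t buflen)
{
    char cwd[1024];
    int  n;

    assert(buf != NULL);

    if (getcwd(cwd, sizeof(cwd)) == NULL)
        return -1;

    /* A "host" value that is an absolute path selects a socket directory. */
    n = snprintf(buf, buflen,
                 "host=%s user=postgres port=50432 replication=database",
                 cwd);

    return (n < 0 || (size_t) n >= buflen) ? -1 : 0;
}
```

The resulting string can be passed to psql as in step 3. The replication=database keyword is what makes the session use the replication protocol while still accepting SQL.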

To reproduce, run test_01.sh first, then run test_02.sh from another terminal while the first one is sleeping.


Note that such object creation can theoretically occur, but it is very rare.
Because of the following lines in start_postmaster(), TCP/IP connections are refused and
access to the Unix-domain socket is restricted (permissions 0700), so only the superuser can connect to the server.

```
#if !defined(WIN32)
    /* prevent TCP/IP connections, restrict socket access */
    strcat(socket_string,
           " -c listen_addresses='' -c unix_socket_permissions=0700");

    /* Have a sockdir?    Tell the postmaster. */
    if (cluster->sockdir)
        snprintf(socket_string + strlen(socket_string),
                 sizeof(socket_string) - strlen(socket_string),
                 " -c %s='%s'",
                 (GET_MAJOR_VERSION(cluster->major_version) <= 902) ?
                 "unix_socket_directory" : "unix_socket_directories",
                 cluster->sockdir);
#endif
```
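
To see what the server is actually started with, the same string construction can be tried in isolation. This is a minimal standalone sketch, not pg_upgrade code: build_socket_opts() and its parameters are illustrative names, and major_version is assumed to already be in the reduced form used by the comparison above (902 for 9.2).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Fills buf with the socket-related server options: no TCP/IP listener, a
 * socket file accessible to its owner only, and (optionally) the socket
 * directory. Pass sockdir = NULL when no socket directory is configured.
 */
void
build_socket_opts(char *buf, size_t buflen, const char *sockdir,
                  int major_version)
{
    assert(buf != NULL && buflen > 0);

    /* Prevent TCP/IP connections and restrict socket access. */
    snprintf(buf, buflen,
             " -c listen_addresses='' -c unix_socket_permissions=0700");

    /* Append the socket directory, using the setting name for that version. */
    if (sockdir != NULL)
    {
        size_t used = strlen(buf);

        snprintf(buf + used, buflen - used, " -c %s='%s'",
                 (major_version <= 902) ?
                 "unix_socket_directory" : "unix_socket_directories",
                 sockdir);
    }
}
```

For example, calling it with a 256-byte buffer, sockdir "/tmp/upg" and major_version 1600 yields the two restriction options followed by unix_socket_directories='/tmp/upg'.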

Moreover, the socket directory is set to the caller's current directory, and the port number
also differs from the setting written in postgresql.conf.
I think there is little chance that replication slots are accidentally created
during the upgrade.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

