Discussion: BDR: name conflict when joining a rebuilt node


BDR: name conflict when joining a rebuilt node

Florin Andrei
Let's say node pg12 in a cluster needs to be removed because it has
serious problems. I remove it by running this command on another node in
the cluster:

SELECT bdr.bdr_part_by_node_names('{pg12}');

On pg12, I run this:

SET LOCAL bdr.permit_unsafe_ddl_commands = true;
SET LOCAL bdr.skip_ddl_locking = true;
SECURITY LABEL FOR 'bdr' ON DATABASE pgmirror IS '{"bdr": false}';

I repair the broken node: drop the existing database, fix whatever is
wrong with the node, and re-create the database (empty). It's basically
a new node at that point. Then I try to re-join it to the cluster under
the same old name:

SELECT bdr.bdr_group_join(
    local_node_name := 'pg12',
    node_external_dsn := 'host=pg12 dbname=pgmirror',
    join_using_dsn := 'host=pg11 dbname=pgmirror'
);
SELECT bdr.bdr_node_join_wait_for_ready();

The problem is, bdr_node_join_wait_for_ready() never returns, it just
waits forever. If I go on pg11 and run SELECT * FROM bdr.bdr_nodes, I
see pg12 listed twice, with node_status k and i, respectively. On pg11 I
see this in the logs:

"System identification mismatch between connection and slot","Connection
for bdr (6211167104388615363,1,16387,) resulted in slot on node bdr
(6211167104388615363,1,17163,) instead of expected node",,,,,,,,"bdr
(6211167104388615363,1,17163,): perdb"
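
A narrower query can make the duplicate entry easier to read. The following is a minimal sketch, assuming the bdr.bdr_nodes catalog of BDR 0.9 exposes node_sysid, node_timeline and node_dboid alongside node_name and node_status; the name 'pg12' is the one from this report. In the log above the two identifiers differ only in the third field (16387 versus 17163). Assuming that field is the database OID, this would be consistent with the database having been dropped and re-created.

```sql
-- Run on a surviving node such as pg11.
-- Lists every catalog row recorded under the rebuilt node's name.
-- Column names are assumed from the BDR 0.9 bdr.bdr_nodes catalog.
SELECT node_name, node_status, node_sysid, node_timeline, node_dboid
FROM bdr.bdr_nodes
WHERE node_name = 'pg12'
ORDER BY node_dboid;
```

Two rows with different node_dboid values, one with status k and one with status i, would match the symptom described here.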

How can I re-join an old node to the cluster after rebuilding it from
scratch, under the old name?

Do I have to change the name every time I re-join a node?

Florin Andrei

Re: BDR: name conflict when joining a rebuilt node

Craig Ringer
On 30 October 2015 at 08:24, Florin Andrei <florin@andrei.myip.org> wrote:

> The problem is, bdr_node_join_wait_for_ready() never returns, it just waits
> forever. If I go on pg11 and run SELECT * FROM bdr.bdr_nodes, I see pg12
> listed twice, with node_status k and i, respectively. On pg11 I see this in
> the logs:
> "System identification mismatch between connection and slot","Connection for
> bdr (6211167104388615363,1,16387,) resulted in slot on node bdr
> (6211167104388615363,1,17163,) instead of expected node",,,,,,,,"bdr
> (6211167104388615363,1,17163,): perdb"

This is a bug fixed in 0.9.3.


 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: BDR: name conflict when joining a rebuilt node

Florin Andrei
I'm still having issues with this on BDR 0.9.3.

This is how I join a new node to the cluster:

su - postgres
psql pgmirror

-- fire up BDR extensions

-- join BDR group via an existing node there
SELECT bdr.bdr_group_join(
    local_node_name := 'pg12-prod-uswest2-aws',
    node_external_dsn := 'host=pg12-prod-uswest2-aws dbname=pgmirror',
    join_using_dsn := 'host=pg11-prod-uswest2-aws dbname=pgmirror'
);

SELECT bdr.bdr_node_join_wait_for_ready();

This is how I remove a node from the cluster:

# Log into any other node in the cluster (NOT the node you want to
remove) and run:

su - postgres
psql pgmirror

SELECT bdr.bdr_part_by_node_names('{pg12-prod-uswest2-aws}');

# Log into the removed node and run:

su - postgres
psql pgmirror

SET LOCAL bdr.permit_unsafe_ddl_commands = true;
SET LOCAL bdr.skip_ddl_locking = true;
SECURITY LABEL FOR 'bdr' ON DATABASE pgmirror IS '{"bdr": false}';

# Now restart PostgreSQL.

Now let's say on the removed node I've dropped the pgmirror database,
performed some maintenance, re-created the pgmirror DB (empty), and now
I want to re-join the node to the cluster under the same name. I repeat
the join new node procedure described at the top. It gets stuck in

On another node, the re-joined node is now listed twice in
bdr.bdr_nodes, once with status k, and again with status i. The logs on
the re-joined node show this:

2015-11-03 20:29:52.016 UTC,,,4916,,56391614.1334,219,,2015-11-03 20:16:20 UTC,,0,LOG,00000,"starting background worker process ""bdr db:
2015-11-03 20:29:52.047 UTC,,,7222,"",56391940.1c36,1,"",2015-11-03 20:29:52 UTC,,0,LOG,00000,"connection received: host=
2015-11-03 20:29:52.050 UTC,"postgres","pgmirror",7222,"",56391940.1c36,2,"authentication",2015-11-03 20:29:52 UTC,4/321,0,LOG,00000,"replication connection authorized: user=postgres SSL enabled (protocol=TLSv1.2,
2015-11-03 20:29:52.052 UTC,,,7221,,56391940.1c35,1,,2015-11-03 20:29:52 UTC,3/0,0,ERROR,55000,"System identification mismatch between connection and slot","Connection for bdr (6212727469166484615,1,16387,) resulted in slot on node bdr (6212727469166484615,1,17169,) instead of expected node",,,,,,,,"bdr (6212727469166484615,1,17169,): perdb"
2015-11-03 20:29:52.053 UTC,"postgres","pgmirror",7222,"",56391940.1c36,3,"idle",2015-11-03 20:29:52 UTC,4/0,0,LOG,08006,"could not receive data from client: Connection reset by peer",,,,,,,,,"bdr (6212727469166484615,1,17169,):mkslot"
2015-11-03 20:29:52.053 UTC,"postgres","pgmirror",7222,"",56391940.1c36,4,"idle",2015-11-03 20:29:52 UTC,,0,LOG,00000,"disconnection: session time: 0:00:00.006 user=postgres database=pgmirror host=
2015-11-03 20:29:52.053 UTC,,,4916,,56391614.1334,220,,2015-11-03 20:16:20 UTC,,0,LOG,00000,"worker process: bdr db: pgmirror (PID 7221) exited with exit code 1",,,,,,,,,""
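
Since the error refers to a replication slot that belongs to a different node identifier than the expected one, it may help to list the logical slots that exist on the peer. This is a diagnostic sketch, not a fix: it only reads the standard pg_replication_slots view of PostgreSQL 9.4, and the filter plugin = 'bdr' assumes that BDR registers its output plugin under that name.

```sql
-- Run on the node used for joining (pg11-prod-uswest2-aws in this report).
-- Shows which slots use the BDR output plugin and whether each is in use.
-- The plugin name 'bdr' is an assumption; drop the WHERE clause to see all slots.
SELECT slot_name, plugin, database, active
FROM pg_replication_slots
WHERE plugin = 'bdr'
ORDER BY slot_name;
```

An inactive slot left over from the node's previous incarnation would be one thing to look for in the result.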

OS is Ubuntu 14.04, with these packages installed:

ii  postgresql-bdr-9.4               9.4.4-1trusty     amd64  object-relational SQL database, version 9.4 server
ii  postgresql-bdr-9.4-bdr-plugin    0.9.3-1trusty     amd64  BDR Plugin for PostgreSQL-BDR 9.4
ii  postgresql-bdr-client-9.4        9.4.4-1trusty     amd64  front-end programs for PostgreSQL-BDR 9.4
ii  postgresql-bdr-contrib-9.4       9.4.4-1trusty     amd64  additional facilities for PostgreSQL
ii  postgresql-bdr-server-dev-9.4    9.4.4-1trusty     amd64  development files for PostgreSQL-BDR 9.4 server-side
ii  postgresql-client-common         169.pgdg14.04+1   all    manager for multiple PostgreSQL client versions
ii  postgresql-common                169.pgdg14.04+1   all    PostgreSQL database-cluster manager

Florin Andrei