[HACKERS] logical replication - still unstable after all these months
From | Erik Rijkers |
---|---|
Subject | [HACKERS] logical replication - still unstable after all these months |
Date | |
Msg-id | 3897361c7010c4ac03f358173adbcd60@xs4all.nl |
Responses |
Re: [HACKERS] logical replication - still unstable after all these months (Simon Riggs <simon@2ndquadrant.com>)
Re: [HACKERS] logical replication - still unstable after all these months (Petr Jelinek <petr.jelinek@2ndquadrant.com>)
Re: [HACKERS] logical replication - still unstable after all these months (Erik Rijkers <er@xs4all.nl>) |
List | pgsql-hackers |
If you run a 1-minute pgbench session over a logical replication connection and repeat that 100 times, this is what you get:

At clients 90, 64, 8, scale 25:

-- out_20170525_0944.txt
    100 -- pgbench -c 90 -j 8 -T 60 -P 12 -n   --  scale 25
     93 -- All is well.
      7 -- Not good.
-- out_20170525_1426.txt
    100 -- pgbench -c 64 -j 8 -T 60 -P 12 -n   --  scale 25
     82 -- All is well.
     18 -- Not good.
-- out_20170525_2049.txt
    100 -- pgbench -c 8 -j 8 -T 60 -P 12 -n   --  scale 25
     90 -- All is well.
     10 -- Not good.

At clients 90, 64, 8, scale 5:

-- out_20170526_0126.txt
    100 -- pgbench -c 90 -j 8 -T 60 -P 12 -n   --  scale 5
     98 -- All is well.
      2 -- Not good.
-- out_20170526_0352.txt
    100 -- pgbench -c 64 -j 8 -T 60 -P 12 -n   --  scale 5
     97 -- All is well.
      3 -- Not good.
-- out_20170526_0621.txt
     45 -- pgbench -c 8 -j 8 -T 60 -P 12 -n   --  scale 5
     41 -- All is well.
      3 -- Not good.

(That last one obviously has not finished.)

I think this is pretty awful, really, for a beta-level release.

The above installations (master+replica) are built with Petr Jelinek's (and Michael Paquier's) latest patches:

  0001-Fix-signal-handling-in-logical-workers.patch
  0002-Make-tablesync-worker-exit-when-apply-dies-while-it-.patch
  0003-Receive-invalidation-messages-correctly-in-tablesync.patch
  Remove-the-SKIP-REFRESH-syntax-suggar-in-ALTER-SUBSC-v2.patch

Now, it could be that there is something wrong with my test setup (as opposed to some bug in logical replication). I can post my test program, but I'll do that separately (below is the core of all my tests; it's basically still that very first test that I started out with, many months ago...).

I'd like to find out more about:

- Do you agree this number of failures is far too high?
- Am I the only one finding so many failures?
- Is anyone else testing the same way (more or less continually, finding only success)?
- Which of the Open Items could be responsible for this failure rate? (I don't see a match.)
- What tests do others do? Could we somehow concentrate results and method somewhere?
Thanks,

Erik Rijkers

PS

The core of the 'pgbench_derail' test (bash) is simply:

  echo "drop table if exists pgbench_accounts;
        drop table if exists pgbench_branches;
        drop table if exists pgbench_tellers;
        drop table if exists pgbench_history;" | psql -qXp $port1 \
  && echo "drop table if exists pgbench_accounts;
        drop table if exists pgbench_branches;
        drop table if exists pgbench_tellers;
        drop table if exists pgbench_history;" | psql -qXp $port2 \
  && pgbench -p $port1 -qis $scale \
  && echo "alter table pgbench_history add column hid serial primary key;" \
    | psql -q1Xp $port1 && pg_dump -F c -p $port1 \
       --exclude-table-data=pgbench_history  \
       --exclude-table-data=pgbench_accounts \
       --exclude-table-data=pgbench_branches \
       --exclude-table-data=pgbench_tellers  \
       -t pgbench_history -t pgbench_accounts \
       -t pgbench_branches -t pgbench_tellers \
    | pg_restore -1 -p $port2 -d testdb

  appname=derail2

  echo "create publication pub1 for all tables;" | psql -p $port1 -aqtAX

  echo "create subscription sub1 connection 'port=${port1} application_name=$appname' publication pub1 with(enabled=false);
        alter subscription sub1 enable;" | psql -p $port2 -aqtAX

  pgbench -c $clients -j $threads -T $duration -P $pseconds -n   #  scale $scale

Now compare md5's of the sorted content of each of the 4 pgbench tables on primary and replica. They should be the same.
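The final comparison step could be sketched roughly as follows. This is only a minimal illustration, not the actual pgbench_derail code: the function names table_md5 and compare_tables are hypothetical, it assumes GNU md5sum is installed, and it assumes psql reaches the right database through the environment (as the psql calls in the script do, without an explicit -d). The "All is well." / "Not good." labels simply mirror the result lines quoted in the message.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not taken from the original test program.

tables="pgbench_accounts pgbench_branches pgbench_tellers pgbench_history"

# Print the md5 of all rows of one table, sorted, as seen on the given port.
# Caveat: a psql failure is not detected here; an empty result still hashes.
table_md5() {
  local port=$1 table=$2
  psql -qtAX -p "$port" -c "select * from $table" \
    | LC_ALL=C sort | md5sum | cut -d' ' -f1
}

# Compare every pgbench table between primary (port1) and replica (port2).
# Prints one "mismatch" line per differing table, then a one-line verdict.
# Returns 0 when all four checksums match, 1 otherwise.
compare_tables() {
  local port1=$1 port2=$2 table status=0
  for table in $tables; do
    if [ "$(table_md5 "$port1" "$table")" != "$(table_md5 "$port2" "$table")" ]; then
      echo "mismatch: $table"
      status=1
    fi
  done
  if [ "$status" -eq 0 ]; then
    echo "All is well."
  else
    echo "Not good."
  fi
  return "$status"
}

# Typical call, once the replica has had time to catch up:
#   compare_tables "$port1" "$port2"
```

Sorting the psql output rather than relying on an ORDER BY keeps the checksum independent of physical row order, which can legitimately differ between primary and replica. The comparison is only meaningful after the subscriber has applied all changes, so a real test would have to wait for replication to catch up before calling it.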