Re: [HACKERS] logical replication - still unstable after all these months

From Jeff Janes
Subject Re: [HACKERS] logical replication - still unstable after all these months
Msg-id CAMkU=1zsThCJV03SvdUtYGapsm+yA_GkVBgm_e+xpb2FEcoEtQ@mail.gmail.com
In reply to Re: [HACKERS] logical replication - still unstable after all these months  (Mark Kirkwood <mark.kirkwood@catalyst.net.nz>)
Responses Re: [HACKERS] logical replication - still unstable after all these months  (Erik Rijkers <er@xs4all.nl>)
Re: [HACKERS] logical replication - still unstable after all these months  (Mark Kirkwood <mark.kirkwood@catalyst.net.nz>)
Re: [HACKERS] logical replication - still unstable after all these months  (Petr Jelinek <petr.jelinek@2ndquadrant.com>)
List pgsql-hackers
On Sun, May 28, 2017 at 3:17 PM, Mark Kirkwood <mark.kirkwood@catalyst.net.nz> wrote:
On 28/05/17 19:01, Mark Kirkwood wrote:


So running in cloud land now...so far no errors - will update.




The framework ran 600 tests last night, and I see 3 'NOK' results, i.e. 3 failed test runs (all scale 25 and 8 pgbench clients). Given the way the test decides on failure (it gets tired of waiting for the table md5s to match), it raises the question 'What if it had waited a bit longer?' However, from what I can see, in all cases:

- the rowcounts were the same in master and replica
- the md5 of pgbench_accounts was different

All four tables should be wrong if there is still a transaction it is waiting for, as all the changes happen in a single transaction.  
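
To make that reasoning concrete, here is a minimal sketch in Python of how a per-table md5 comparison can tell "still catching up" apart from a partial mismatch. It assumes each table has already been fetched as a list of row tuples; the function names and the row-hashing scheme are hypothetical and are not taken from the actual test scripts.

```python
import hashlib

# The four tables touched by pgbench's default TPC-B-like transaction.
PGBENCH_TABLES = (
    "pgbench_accounts",
    "pgbench_branches",
    "pgbench_tellers",
    "pgbench_history",
)

def table_md5(rows):
    """Return an md5 over the rows, independent of row order."""
    digest = hashlib.md5()
    for row in sorted(rows, key=repr):
        digest.update(repr(row).encode("utf-8"))
    return digest.hexdigest()

def mismatched_tables(publisher, subscriber):
    """List the tables whose md5 differs between the two sides."""
    return [
        name for name in PGBENCH_TABLES
        if table_md5(publisher[name]) != table_md5(subscriber[name])
    ]

def diagnose(publisher, subscriber):
    """Classify the state of the subscriber relative to the publisher."""
    bad = mismatched_tables(publisher, subscriber)
    if not bad:
        return "in sync"
    if len(bad) == len(PGBENCH_TABLES):
        # Every table differs: consistent with a whole transaction
        # that has not been applied yet.
        return "possibly still catching up"
    # Only some tables differ: waiting longer should not help.
    return "partial mismatch: " + ", ".join(bad)
```

A result naming only pgbench_accounts corresponds to the failures described above, where the other tables already agree.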

I also got a failure, after 87 iterations of a similar test case.  It waited for hours, as mine requires manual intervention to stop waiting.  On the subscriber, one account still had a zero balance, while the history table on the subscriber agreed with both history and accounts on the publisher, and the account should not have been zero.  So transaction atomicity definitely got busted.
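
A check of this kind can be sketched as follows. It assumes the tables were freshly initialized (so every abalance started at zero) and that history has not been truncated since, in which case each account balance should equal the sum of its deltas in history. The function name and the in-memory data shapes are hypothetical.

```python
def accounts_disagreeing_with_history(accounts, history):
    """Find accounts whose balance is not the sum of their history deltas.

    accounts: dict mapping aid -> abalance
    history:  iterable of (aid, delta) pairs
    Returns a dict mapping aid -> (actual_balance, expected_balance).
    """
    expected = {}
    for aid, delta in history:
        expected[aid] = expected.get(aid, 0) + delta

    return {
        aid: (abalance, expected.get(aid, 0))
        for aid, abalance in accounts.items()
        if abalance != expected.get(aid, 0)
    }
```

Running it on the subscriber's own accounts and history would flag an account whose update was lost while the matching history insert was applied.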

I altered the script to also save the tellers and branches tables and repeated the runs, but so far it hasn't failed again in over 800 iterations using the altered script.
   

...so it does seem possible that there is some bug being tickled here. Unfortunately the test framework blasts away the failed tables and subscription and continues on...I'm going to amend it to stop on failure so I can have a closer look at what happened.
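
As a rough illustration of "stop on failure", here is a sketch of a wait loop with a timeout plus a driver that halts at the first failed iteration instead of cleaning up. All names are hypothetical; `in_sync` and `run_one` stand in for whatever the real framework does to compare md5s and to run a single test.

```python
import time

def wait_for_match(in_sync, timeout_s, poll_s=1.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll in_sync() until it returns True or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        if in_sync():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)

def first_failed_iteration(run_one, max_iterations):
    """Run iterations 1..max_iterations and stop at the first failure.

    run_one(i) returns True on success. On failure nothing is dropped or
    reset here, so the tables and subscription stay available for inspection.
    Returns the failing iteration number, or None if every run passed.
    """
    for i in range(1, max_iterations + 1):
        if not run_one(i):
            return i
    return None
```

The clock and sleep parameters are injectable only so the loop can be exercised without real waiting.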

What would you want to look at?  Would saving the WAL from the master be helpful?

Cheers,

Jeff
