Обсуждение: Multi Master Replication

Поиск
Список
Период
Сортировка

Multi Master Replication

От
itishree sukla
Дата:
Hi all,

I need suggestion about setting up multi master replication between two postgresql server place two different geographical area. As i know using some third party tool like Bucardo,RubyRep it can be achievable, not sue which is the good one to use. If any one  can provide me some online documentation links, it will help me as well.


Thanks in advance.

Regards,
Itishree

Re: Multi Master Replication

От
John R Pierce
Дата:
On 12/18/2013 1:31 AM, itishree sukla wrote:
> I need suggestion about setting up multi master replication between
> two postgresql server place two different geographical area. As i know
> using some third party tool like Bucardo,RubyRep it can be achievable,
> not sue which is the good one to use. If any one  can provide me some
> online documentation links, it will help me as well.

that sort of replication is very problematic.   its virtually impossible
to maintain ACID (Atomicity, Consistency, Isolation, Durability) and
maintain any semblance of performance.

question for you, what do you expect to happen if the communications
link between the servers is interrupted, and updates continue to be sent
to both servers?

--
john r pierce                                      37N 122W
somewhere on the middle of the left coast



Re: Multi Master Replication

От
Scott Marlowe
Дата:
On Wed, Dec 18, 2013 at 1:16 PM, John R Pierce <pierce@hogranch.com> wrote:
> On 12/18/2013 1:31 AM, itishree sukla wrote:
>>
>> I need suggestion about setting up multi master replication between two
>> postgresql server place two different geographical area. As i know using
>> some third party tool like Bucardo,RubyRep it can be achievable, not sue
>> which is the good one to use. If any one  can provide me some online
>> documentation links, it will help me as well.
>
>
> that sort of replication is very problematic.   its virtually impossible to
> maintain ACID (Atomicity, Consistency, Isolation, Durability) and maintain
> any semblance of performance.
>
> question for you, what do you expect to happen if the communications link
> between the servers is interrupted, and updates continue to be sent to both
> servers?

When people start talking multi-master replication my first response
is to ask what problem you're trying to solve. Sometimes MM Rep IS the
answer. But quite often it's not the best one for your problem. So to
OP I'd ask what problem they're trying to solve.

--
To understand recursion, one must first understand recursion.


Re: Multi Master Replication

От
Michael Paquier
Дата:
On Thu, Dec 19, 2013 at 7:12 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Wed, Dec 18, 2013 at 1:16 PM, John R Pierce <pierce@hogranch.com> wrote:
>> that sort of replication is very problematic.   its virtually impossible to
>> maintain ACID (Atomicity, Consistency, Isolation, Durability) and maintain
>> any semblance of performance.
Yep, there's usually a trade-off between performance and data
consistency. OLTP applications can benefit from MM with a
shared-nothing architecture, more than data warehouse type of things
that need to transfer a lot of data for join operations, or SQL
operations that use non-pushable clauses (for example stable/volatile
functions).

>> question for you, what do you expect to happen if the communications link
>> between the servers is interrupted, and updates continue to be sent to both
>> servers?
Split-brain is another problem, hard to solve. Even harder if you have
several types nodes in your cluster dedicated to provide some piece
building the MM system.

> When people start talking multi-master replication my first response
> is to ask what problem you're trying to solve. Sometimes MM Rep IS the
> answer. But quite often it's not the best one for your problem. So to
> OP I'd ask what problem they're trying to solve.
Yes that's actually the right approach, multi-master replication is
often cited as a marketing term for a fantastic technology that can
solve a lot of problems, which could be solved with a couple of
Postgres servers using a single-master, multiple-slave approach, or by
simply design a system that can do data sharding among a set of
Postgres servers to achieve some kind of write scalability.

Regards,
--
Michael


Re: Multi Master Replication

От
Scott Marlowe
Дата:
On Wed, Dec 18, 2013 at 10:25 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

> On Thu, Dec 19, 2013 at 7:12 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:

>> When people start talking multi-master replication my first response
>> is to ask what problem you're trying to solve. Sometimes MM Rep IS the
>> answer. But quite often it's not the best one for your problem. So to
>> OP I'd ask what problem they're trying to solve.

> Yes that's actually the right approach, multi-master replication is
> often cited as a marketing term for a fantastic technology that can
> solve a lot of problems, which could be solved with a couple of
> Postgres servers using a single-master, multiple-slave approach, or by
> simply design a system that can do data sharding among a set of
> Postgres servers to achieve some kind of write scalability.

Sharding with plproxy is pretty easy and can scale hugely.


Re: Multi Master Replication

От
Michael Paquier
Дата:
On Thu, Dec 19, 2013 at 2:05 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> Sharding with plproxy is pretty easy and can scale hugely.
Yeah indeed, the writable postgres_fdw could also be used as a
solution, if designed carefully.
--
Michael


Re: Multi Master Replication

От
Chris Travers
Дата:


On Wed, Dec 18, 2013 at 1:31 AM, itishree sukla <itishree.sukla@gmail.com> wrote:
Hi all,

I need suggestion about setting up multi master replication between two postgresql server place two different geographical area. As i know using some third party tool like Bucardo,RubyRep it can be achievable, not sue which is the good one to use. If any one  can provide me some online documentation links, it will help me as well.

First suggestion:  don't.  Multi-master replication almost never is a win.  I tell customers this and they don't like to hear it but it is true.

Second suggestion:  If you absolutely must, go with Bucardo.  Rubyrep might work but it is db-agnostic and this raises additional complexity concerns.

The major reasons why I argue against multi-master replication are:

1.  With async replication you have the problem of conflict resolution.  In some (rare) cases this is avoidable, but in most cases it is not.  You have to deal with the fact that two different people in two different locations may update the same row, and you have to decide how to handle this.  Once you have a conflict, every option you have to resolve the conflict results in data loss.   There are rare cases where this is the right solution.

2.  With sync replication, you have coordination problems and therefore it is never (at least IME) a win compared to master-slave replication since all writes must occur in the same order in the set, or you need global sequences, or such.  So I would say that something like PGPool for multi-master replication is just to be avoided generally (however for load balancing in master/slave it can be a win).  You will never get better read or write throughput, or a more robust system than you will with master/slave replication in a synchronous environment.  Keep it simple.

As others have mentioned your best bet here is pl/proxy.  I would go a little further however and suggest that you can separate storage dbs from proxy db's and thus create the appearance of multi-master over a master-slave setup.  If you are going to go this route however, I don't know whether Postgres-XC would be worth looking into.

The key here though is that design effort is important.  If you carefully design your federated storage, then you should be good.  However this leads to lots of problems and you need to think them through. 

--
Best Wishes,
Chris Travers

Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor lock-in.

Re: Multi Master Replication

От
Wolfgang Keller
Дата:
> 2.  With sync replication, you have coordination problems and
> therefore it is never (at least IME) a win compared to master-slave
> replication since all writes must occur in the same order in the set,
> or you need global sequences, or such.

*snip*

>  You will never get better read or write throughput,

Better read throughput is trivial to achieve even with other solutions
than multi-master replication.

And for better write throughput, the developers of Postgres-XC
(supported by NTT, among others) beg to differ:

http://postgresxc.wikia.com/wiki/Postgres-XC_Wiki
http://postgres-xc.sourceforge.net/

As does Bettina Kemme (of Postgres-R fame).

Sincerely,

Wolfgang


Re: Multi Master Replication

От
Chris Travers
Дата:



On Thu, Dec 19, 2013 at 6:52 AM, Wolfgang Keller <feliphil@gmx.net> wrote:
> 2.  With sync replication, you have coordination problems and
> therefore it is never (at least IME) a win compared to master-slave
> replication since all writes must occur in the same order in the set,
> or you need global sequences, or such.

*snip*

>  You will never get better read or write throughput,

Better read throughput is trivial to achieve even with other solutions
than multi-master replication.

And for better write throughput, the developers of Postgres-XC
(supported by NTT, among others) beg to differ:

http://postgresxc.wikia.com/wiki/Postgres-XC_Wiki
http://postgres-xc.sourceforge.net/

I am not quite sure what the point is.   I am not sure you will get the same write extensibility if you list every table as replicated instead of partitioned.  What Postgres-XC gives you ideally is a no-storage and multi-master coordination layer on top of master-slave data nodes.  Some things may need to be replicated multi-master between data nodes but that's not a win write throughput-wise.

I am btw a reasonable fan of Postgres-XC within its problem domain, but it is not a synchronous multi-master replication solution as far as write scaling goes. 

My point still holds, which is that synchronous multi-master replication will never beat master-slave in write throughput.  My understanding of Postgres-XC is that you'd mark tables as replicated (instead of partitioned) when they are going to be joined against by different nodes and infrequently updated (and hence the write overhead is less of a problem than the cross-node join overhead).

Am I way off-base with my understanding here?  At any rate it isn't Postgres-XC (which is something very different than a typical "replication" setup, and I would describe it more as an advanced sharding solution).

Best Wishes,
Chris Travers


As does Bettina Kemme (of Postgres-R fame).

Sincerely,

Wolfgang


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



--
Best Wishes,
Chris Travers

Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor lock-in.

Re: Multi Master Replication

От
Michael Paquier
Дата:
On Thu, Dec 19, 2013 at 11:18 PM, Chris Travers <chris.travers@gmail.com> wrote:
>
>
>
> On Thu, Dec 19, 2013 at 6:52 AM, Wolfgang Keller <feliphil@gmx.net> wrote:
>>
>> > 2.  With sync replication, you have coordination problems and
>> > therefore it is never (at least IME) a win compared to master-slave
>> > replication since all writes must occur in the same order in the set,
>> > or you need global sequences, or such.
> I am not quite sure what the point is.   I am not sure you will get the same
> write extensibility if you list every table as replicated instead of
> partitioned.  What Postgres-XC gives you ideally is a no-storage and
> multi-master coordination layer on top of master-slave data nodes.  Some
> things may need to be replicated multi-master between data nodes but that's
> not a win write throughput-wise.
You'd kill the write scalability of the application by marking all the
tables as replicated. The communication between nodes uses SQL
strings, so a DML on a replicated table needs to occur on all the
nodes, and on top of that you need 2PC for a transaction commit if
more than 2 nodes are involved in write operations in this
transaction.

> I am btw a reasonable fan of Postgres-XC within its problem domain, but it
> is not a synchronous multi-master replication solution as far as write
> scaling goes.
OLTP applications that have a schema tunable for replication/partition
to maximize join pushdown might be a good definition of the
application range that could benefit from XC.

> My point still holds, which is that synchronous multi-master replication
> will never beat master-slave in write throughput.  My understanding of
> Postgres-XC is that you'd mark tables as replicated (instead of partitioned)
> when they are going to be joined against by different nodes and infrequently
> updated (and hence the write overhead is less of a problem than the
> cross-node join overhead).
Yep, exactly. Those tables are actually master tables and the point is
to maximize the number of join clause push down to minimize the amount
of data exchanged between the nodes because of the shared-nothing
infrastructure. The type of tables that should be marked as
partitioned is the once that keep growing and need to scale of the
type "user" tables. This is actually how DBT-1 has been tuned when
doing scaling testing with it: partition user and adress tables,
replicate stock and item tables.

Regards,
--
Michael


Re: Multi Master Replication

От
Michael Paquier
Дата:
On Fri, Dec 20, 2013 at 8:48 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Thu, Dec 19, 2013 at 11:18 PM, Chris Travers <chris.travers@gmail.com> wrote:
>>
>>
>>
>> On Thu, Dec 19, 2013 at 6:52 AM, Wolfgang Keller <feliphil@gmx.net> wrote:
>>>
>>> > 2.  With sync replication, you have coordination problems and
>>> > therefore it is never (at least IME) a win compared to master-slave
>>> > replication since all writes must occur in the same order in the set,
>>> > or you need global sequences, or such.
>> I am not quite sure what the point is.   I am not sure you will get the same
>> write extensibility if you list every table as replicated instead of
>> partitioned.  What Postgres-XC gives you ideally is a no-storage and
>> multi-master coordination layer on top of master-slave data nodes.  Some
>> things may need to be replicated multi-master between data nodes but that's
>> not a win write throughput-wise.
> You'd kill the write scalability of the application by marking all the
> tables as replicated. The communication between nodes uses SQL
> strings, so a DML on a replicated table needs to occur on all the
> nodes, and on top of that you need 2PC for a transaction commit if
> more than 2 nodes are involved in write operations in this
> transaction.
>
>> I am btw a reasonable fan of Postgres-XC within its problem domain, but it
>> is not a synchronous multi-master replication solution as far as write
>> scaling goes.
> OLTP applications that have a schema tunable for replication/partition
> to maximize join pushdown might be a good definition of the
> application range that could benefit from XC.
>
>> My point still holds, which is that synchronous multi-master replication
>> will never beat master-slave in write throughput.  My understanding of
>> Postgres-XC is that you'd mark tables as replicated (instead of partitioned)
>> when they are going to be joined against by different nodes and infrequently
>> updated (and hence the write overhead is less of a problem than the
>> cross-node join overhead).
> Yep, exactly. Those tables are actually master tables and the point is
> to maximize the number of join clause push down to minimize the amount
> of data exchanged between the nodes because of the shared-nothing
> infrastructure. The type of tables that should be marked as
> partitioned is the once that keep growing and need to scale of the
> type "user" tables. This is actually how DBT-1 has been tuned when
> doing scaling testing with it: partition user and adress tables,
> replicate stock and item tables.
I actually wrote something stupid here, stock is partitioned and it
makes sense as it faces lot of updates:
http://images.wikia.com/postgresxc/images/6/66/PG-XC_Architecture.pdf (page 23)
Thanks to Chris for pointing that out.
--
Michael


Re: Multi Master Replication

От
John R Pierce
Дата:
14 replies so far, and the OP hasn't chimed in with any feedback as to
what their presumed requirements are based on.


*meh*




Re: Multi Master Replication

От
pax
Дата:
Sometime ago i was looking for something like this and because at this time
XC was a little baby i tried installing bucardo but i gave up when stucked
fighting with perl modules. So, after testing some other solutions i decided
to make my own, just "touching" the trigger part of the pyreplica project
and building a daemon in a binary compiled program (no script) for a better
performance. It's working well by now, with some tables and few nodes (12
nodes deployed across long distances), but i was lucky then because the
database and the systems were still in design and development phases.
There are many things you have to keep in mind, as all people is warning
you. I tell you, it can be done, but you MUST redesign your existing
database preparing for multimaster paradigm where the key is avoid node
level concurrency, and your client database apps to THINK what to do when
nodes are down, but if you are not ready to prepare your database (design)
or you have not access to the source code of your client database apps then
you have problems.
I suggest you read this
http://www.dbspecialists.com/files/presentations/mm_replication.html and
this
http://scale-out-blog.blogspot.com/2012/04/if-you-must-deploy-multi-master.html.
Finally, i can mention another posible solution that i never tried but maybe
can help you, SymmetricDS.
Good luck.




--
View this message in context: http://postgresql.1045698.n5.nabble.com/Multi-Master-Replication-tp5783855p5784468.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: Multi Master Replication

От
"Anand Kumar, Karthik"
Дата:
We use symmetricDS pretty extensively, across oracle and postgres
databases. It has its flaws and its strengths. It shines when there's
multiple database platforms involved, when the volume of transactions is
not too high, and supports multi master. Its optimized for wan topologies,
so its great if you have small amounts of data over longer distances, not
as great when there's several gigabytes needing transfer within the same
datacenter. It also has inbuilt insert vs update conflict resolution.

On the flip side, as the volume of writes increases, it can quickly lead
to significant database bloat and high CPU usage. Its also not as fast as
some other tools (like slony, or ora2pg) for initial loads of tables.

It exposes all of its switches and knobs (and there are a *lot* of
tunables), which make it both very flexible, but also prone to breakage if
you fiddle with it too much. Its in active development and we've found the
community forums to be pretty helpful when we run into undocumented bugs.

Let me know if you need help with it.

Thanks,
Karthik