Re: Replication Ideas
From | Dennis Gearon |
---|---|
Subject | Re: Replication Ideas |
Date | |
Msg-id | 3F4C34B7.3050307@fireserve.net |
In reply to | Re: Replication Ideas (Jan Wieck <JanWieck@Yahoo.com>) |
List | pgsql-general |
Jan Wieck wrote:

> WARNING: This is getting long ...
>
> Postgres-R is a very interesting and inspiring idea. And I've been
> kicking that concept around for a while now. What I don't like about
> it is that it requires fundamental changes in the lock mechanism and
> that it is based on the assumption of very low lock conflict.
>
> <explain-PG-R>
> In Postgres-R a committing transaction sends its workset (WS - a list
> of all updates done in this transaction) to the group communication
> system (GC). The GC guarantees total order, meaning that all nodes
> will receive all WSs in the same order, no matter how they have been
> sent.
>
> If a node receives back its own WS before any error occurred, it goes
> ahead and finalizes the commit. If it receives a foreign WS, it has to
> apply the whole WS and commit it before it can process anything else.
> If now a local transaction, in progress or while waiting for its WS
> to come back, holds a lock that is required to process such remote WS,
> the local transaction needs to be aborted to unlock its resources ...
> it lost the total order race.
> </explain-PG-R>
>
> Postgres-R requires that all remote WSs are applied and committed
> before a local transaction can commit. Otherwise it couldn't correctly
> detect a lock conflict. So there will not be any read ahead. And since
> the total order really counts here, it cannot apply any two remote WSs
> in parallel: a race condition could possibly exist in which a later WS
> in the total order runs faster and locks up a previous one, so we have
> to squeeze all remote WSs through one single replication work process.
> And all the locally parallel executed transactions that wait for their
> WSs to come back have to wait until that poor little worker is done
> with the whole pile. Bye bye concurrency. And I don't know how the GC
> will deal with the backlog either. Could well choke on it.
>
> I do not see how this will scale well in a multi-SMP-system cluster.
> At least the serialization of WSs will become a horror if there is
> significant lock contention like in a standard TPC-C on the district
> row containing the order number counter. I don't know for sure, but I
> suspect that with this kind of bottleneck, Postgres-R will have to
> roll back more than 50% of its transactions when there are more than 4
> nodes under heavy load (like in a benchmark run). That will suck ...
>
> But ... initially I said that it is an inspiring concept ... soooo ...
>
> I am currently hacking around with some C+PL/TclU+Spread constructs
> that might form a rude kind of prototype creature.
>
> My changes to the Postgres-R concept are that there will be as many
> replicating slave processes as there are masters in total out in the
> cluster ... yes, it will try to utilize all the CPUs in the cluster!
> For failover reliability, a committing transaction will hold before
> finalizing the commit and send its "I'm ready" to the GC. Every
> replicator that reaches the same state sends "I'm ready" too. Spread
> guarantees in SAFE_MESS mode that messages are delivered to all nodes
> in a group or that at least LEAVE/DISCONNECT messages are delivered
> before. So if a node receives more than 50% of "I'm ready", there
> would be a very small gap where multiple nodes have to fail in the
> same split second so that the majority of nodes does NOT commit. A
> node that reported "I'm ready" but lost more than 50% of the cluster
> before committing has to roll back and rejoin or wait for operator
> intervention.
>
> Now the idea is to split up the communication into GC distribution
> groups per transaction. So working master backends and associated
> replication backends will join/leave a unique group for every
> transaction in the cluster. This way, the per process communication is
> reduced to the required minimum.
>
> As said, I am hacking on some code ...
>
> Jan
>
> Chris Travers wrote:
>
>> Tom Lane wrote:
>>
>>> Chris Travers <chris@travelamericas.com> writes:
>>>
>>>> Yes I have. Postgres-r is not a high-availability solution which is
>>>> capable of transparent failover,
>>>
>>> What makes you say that? My understanding is it's supposed to survive
>>> loss of individual servers.
>>>
>>> regards, tom lane
>>
>> My mistake. I must have gotten them confused with another
>> (asynchronous) replication project.
>>
>> Best Wishes,
>> Chris Travers

As my British friends would say, "Bully for you", and I applaud you playing, struggling, and learning from this for our sakes. Jeez, all I think about is me, huh?
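
For readers trying to follow the "I'm ready" rule in Jan's message, here is a minimal sketch of the decision a node might make once it has sent its own "I'm ready". It is not Jan's prototype and does not use Spread; the names `Decision` and `decide`, and the idea of tracking ready and departed nodes as plain sets, are assumptions made only for illustration.

```python
from enum import Enum
from typing import Set


class Decision(Enum):
    COMMIT = "commit"
    WAIT = "wait"
    ROLLBACK = "rollback"


def decide(group_size: int, ready: Set[str], departed: Set[str]) -> Decision:
    """Hypothetical decision rule for a node that has reported "I'm ready".

    group_size: number of nodes in the transaction's group when voting began
    ready:      nodes (including this one) whose "I'm ready" was delivered
    departed:   nodes for which a LEAVE/DISCONNECT message was delivered
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    # More than 50% of the group reported "I'm ready": finalize the commit.
    if 2 * len(ready) > group_size:
        return Decision.COMMIT
    # More than 50% of the group was lost before a majority was seen:
    # roll back, then rejoin or wait for operator intervention.
    if 2 * len(departed) > group_size:
        return Decision.ROLLBACK
    # Otherwise keep holding the commit until more messages arrive.
    return Decision.WAIT
```

With five nodes, three delivered "I'm ready" messages are enough to commit, while three delivered LEAVE/DISCONNECT messages before that point force a rollback. Exactly half is not a majority in either direction, so the node keeps waiting.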