Discussion: Statistics about Streaming Replication deployments in production
Samba wrote:
Hi all,
We at Avaya India have been using Postgres for a few years and are very happy with the stability and performance of the system. We would like to use the newly released streaming replication feature to build a geographically redundant setup with one master and multiple slaves. We ship our customers a product that stores its transactional data in Postgres; the data is expected to grow to around a couple of hundred gigabytes over time, with a heavy read load and an average write load.
One concern raised by our management team is the relative stability and 'industrial strength' of streaming replication. Considering that this feature is just one year old, doubts have been expressed about:
- data integrity -- cancelled long-running transactions on the primary must not be applied on the standby
- reliability -- what if the network link breaks, or one of the pair crashes, while log segments for a huge committed transaction are being sent from master to standby?
- guaranteed recovery (on failover) -- at any moment, one should be able to turn the standby into the active server and start using it (there should be no scenario where the master has crashed and the slave cannot be made active)
On account of these, we thought it would reassure our management team if we could cite a few existing production deployments and their success stories.
I think one year is sufficient time for any product/feature to be thoroughly tested for all its strengths and weaknesses; so would it be too much to ask the vast Postgres customer base about their experiences with streaming replication: the good, the bad, and perhaps the best and the ugly too? It would be great if customers could identify themselves (employer info), but that is not necessary.
Thanks and Regards,
Samba
On 28.7.2011 13:03, Samba wrote:
> One concern that is being coined by our management team is regarding
> the relative stability and 'industrial-strength' of streaming
> replication. Considering that this feature is just one year old, doubts
> are expressed about
>
> * data integrity -- cancelled long running transactions on Primary
>   must not be applied on the standby

I'm not quite sure what you mean by "apply on the standby." Queries that run on the primary and modify data (e.g. an INSERT) have to apply their changes to the standby. That's how streaming replication works: it maintains a binary copy of the datafiles. If a query on the primary modifies the datafiles, the change has to be applied to the standby even if the query is cancelled. But those changes won't be visible, because the transaction was not committed (just as you can't see the changes on the primary).

> * reliability -- what if the network link is broken or one of the
>   pair got crashed when log-segments for a huge committed transaction
>   are being sent from master to standby?

The standby can get the changes either from the primary or from the WAL archive. So even if the network goes down, the standby can get the data from the archive. If you care about continuous backups and PITR, you should probably enable WAL archiving anyway. See this:

http://www.postgresql.org/docs/9.0/static/continuous-archiving.html

> * guaranteed recovery (on failover) -- at any moment, one should be
>   able to turn the standby into active and start using it (there
>   should not be a scenario where master crashed and the slave could
>   not be turned active)

I'm not aware of any bug preventing a failover ...

> On account of these, we thought it would be reassuring to our management
> team if we can cite a few existing production deployments and their
> success stories.

I'd like to see that too, but I guess it's a bit too early for that. Keep in mind that SR is just one year old. That's not much, especially for large projects: it takes time to develop the system, test it, prepare the production environment, etc.

> I think one year is sufficient time for any product/feature to be
> thoroughly tested for all its strengths and weaknesses; so would it be
> too much to ask the vast postgres customer base about their experiences
> with streaming replication, the good, the bad; and perhaps the best and
> the ugly too? It would be great if customers can give their identity
> (employer info) but not necessary though.

Well, yes. I believe companies have been testing it, and bugs were reported to pgsql-bugs and fixed. That's how it works ;-)

Tomas
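
Tomas's point about cancelled transactions can be illustrated with a small toy model. This is only a sketch, not PostgreSQL's actual WAL format or visibility logic: the `Node` class, the record tuples, and the sample data are all hypothetical, chosen to show that a standby replays every change, while queries see only rows from committed transactions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for a server: row versions are tagged with the writing xid."""
    rows: list = field(default_factory=list)     # (xid, value) pairs, the "datafiles"
    committed: set = field(default_factory=set)  # xids known to have committed

    def apply(self, record):
        """Replay one WAL-like record, whether or not its transaction ever commits."""
        kind, xid, value = record
        if kind == "insert":
            self.rows.append((xid, value))
        elif kind == "commit":
            self.committed.add(xid)
        # An "abort" needs no undo in this model: the row version stays on
        # disk but never becomes visible.

    def visible(self):
        """Values a new query would see: only rows written by committed xids."""
        return [value for xid, value in self.rows if xid in self.committed]


# Hypothetical change stream: xid 1 commits, xid 2 is a long transaction
# that gets cancelled.
wal = [
    ("insert", 1, "order-1"),
    ("insert", 2, "order-2"),
    ("commit", 1, None),
    ("insert", 2, "order-3"),
    ("abort", 2, None),
]

primary, standby = Node(), Node()
for record in wal:
    primary.apply(record)  # the primary writes the change
    standby.apply(record)  # the standby replays the same record

print(len(standby.rows))   # physical row versions present on the standby
print(standby.visible())   # what a query on the standby can see
```

In this model the standby ends up holding all three row versions, yet a query returns only the row from the committed transaction, which matches what the primary shows.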
On Thu, Jul 28, 2011 at 12:03 PM, Samba <saasira@gmail.com> wrote:
> I think one year is sufficient time for any product/feature to be thoroughly
> tested for all its strengths and weaknesses; so would it be too much to ask
> the vast postgres customer base about their experiences with streaming
> replication, the good, the bad; and perhaps the best and the ugly too? It
> would be great if customers can give their identity (employer info) but not
> necessary though.

Maybe it's not clear in the documentation, but the streaming replication feature isn't just one year old. The core parts of it are actually 7 years old, and they are definitely battle tested. The slightly newer parts changed the transport logic to stream rather than ship file by file.

The features relevant here are Point in Time Recovery (8.0), Warm Standby (8.2), pg_standby (8.3), and Bgwriter during recovery (8.4).

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
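
To make the "stream rather than file-by-file" distinction concrete, here is a rough toy comparison. It is a sketch under simplifying assumptions, not a model of the real protocol: the function names and the four-record segment size are hypothetical, the streaming case assumes zero network lag, and the file-based case ignores any setting that forces an early segment switch.

```python
SEGMENT_SIZE = 4  # records per segment in this toy; real WAL segments default to 16 MB


def shipped_file_by_file(records, segment_size=SEGMENT_SIZE):
    """File-based shipping: only completely filled segments reach the standby."""
    complete = (len(records) // segment_size) * segment_size
    return records[:complete]


def shipped_streaming(records):
    """Streaming: each record is sent as soon as it is written (zero lag assumed)."""
    return list(records)


# Ten hypothetical WAL records written on the primary before it crashes.
records = [f"rec-{i}" for i in range(10)]

file_based = shipped_file_by_file(records)
streamed = shipped_streaming(records)

print(len(records) - len(file_based))  # records missing on a file-based standby
print(len(records) - len(streamed))    # records missing on a streaming standby
```

The recovery machinery underneath is the same in both cases; what changes is how much recent WAL the standby may be missing at the moment of failover.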