Discussion: postgresql clustering
<div class="Section1"><p class="MsoNormal">Dear Sirs</p>
<p class="MsoNormal">I know that PostgreSQL can be configured for high availability over a clustered environment using pgcluster. In my master's program I am currently studying clustering with MPI, OpenMP, PVM and other packages, and I have to do a project, so I was thinking of using this opportunity to start implementing clustering over PostgreSQL with any of the above packages.</p>
<p class="MsoNormal">What do you think?</p>
<p class="MsoNormal">Thanks</p>
<p class="MsoNormal">Rafik Salama</p>
<p class="MsoNormal">Systems Architect</p>
<p class="MsoNormal">CIT Global</p>
<p class="MsoNormal">CIT Building, Free Zone</p>
<p class="MsoNormal">Nasr City,</p>
<p class="MsoNormal">P.O.Box 11816, Cairo, Egypt</p>
<p class="MsoNormal">Tel : +202 271 8794 (ext. 115)</p>
<p class="MsoNormal">Fax : +202 2748335</p>
<p class="MsoNormal">Cell: +2010 5410035</p>
<p class="MsoNormal"><a href="http://www.citglobal.com">http://www.citglobal.com</a></p></div>
On Wed, Sep 21, 2005 at 08:01:08PM +0300, Rafik Salama wrote:
> Dear Sirs
>
> I know that postgresql can be configured for high availability
> over a clustered environment using pgcluster,

Do you have a case study showing this?

> I am currently studying in my masters the clustering using MPI and
> OpenMP, PVM and others packages and I have to do a project, so I was
> thinking to use this opportunity to start implementing the
> clustering over postgresql using any of the above packages.
>
> What do you think?

Let a thousand schools of thought contend. Let a hundred flowers bloom.

Cheers,
D
--
David Fetter david@fetter.org http://fetter.org/
phone: +1 510 893 6100 mobile: +1 415 235 3778

Remember to vote!
I think it's a great idea to give it a shot. Maybe you can present a proposal to the list describing how you wish to go about it. There could be some experts on the list who may give you some input and direction.

Aly.

--
Aly Dharshi
aly.dharshi@telus.net

"A good speech is like a good dress that's short enough to be interesting and long enough to cover the subject"
No, I do not have a case study; I just read that it was possible. What I am suggesting is that, if there is no cluster implementation that gives high availability for the database, I will start this project using message passing. I already have a cluster of 19 Intel Xeon machines at the university; you can see it at this URL:
http://www.cs.aucegypt.edu/~cluster

Anyway, I was just asking so as not to reinvent the wheel in case something like that already exists. Since it does not, I will give it a try. At the end of the day it is open source and I can do anything, and if it happens to work, who knows!

Thanks

Rafik Salama
Systems Architect

-----Original Message-----
From: David Fetter [mailto:david@fetter.org]
Sent: Wednesday, September 21, 2005 8:12 PM
To: Rafik Salama
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] postgresql clustering

On Wed, Sep 21, 2005 at 08:01:08PM +0300, Rafik Salama wrote:
> Dear Sirs
>
> I know that postgresql can be configured for high availability
> over a clustered environment using pgcluster,

Do you have a case study showing this?

> I am currently studying in my masters the clustering using MPI and
> OpenMP, PVM and others packages and I have to do a project, so I was
> thinking to use this opportunity to start implementing the
> clustering over postgresql using any of the above packages.
>
> What do you think?

Let a thousand schools of thought contend. Let a hundred flowers bloom.

Cheers,
D
In the past couple of years I've worked on several personal/business projects to cluster PostgreSQL and InnoDB (without MySQL). I've tested shared-nothing, shared-memory, and shared-disk models. IMHO, shared-disk is the only viable option for performance and/or large production business environments. Shared-memory and shared-nothing architectures are fine for high availability, but as a way to add performance they are expensive to justify as a business case. I'd be happy to share any of my clustering knowledge with ya offline. Have fun!
--
Respectfully,
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
http://www.enterprisedb.com/
Jonah,

I stumbled on this discussion in one of my recurring searches for an open-source database app capable of true clustering (failover, load balancing, etc) that I can pair with my PHP application. A search that, sadly, most often ends in disappointment -- there's tons and tons of database marketing BS out there.

Part of my frustration is due to my lack of a real understanding of the models you mentioned in your comment. I've been searching for meaningful text and comparisons of the different clustering models, but have yet to find anything that truly breaks it down well (and deep). Could you perhaps point me -- and anyone else that happens upon this post with the same frustrations -- in the right direction?

I've looked at PostgreSQL and EnterpriseDB, but I can't find anything definitive as far as clustering capabilities. What kinds of projects are there for clustering PgSQL, and are any of them mature enough for commercial apps?

Best,
Dan

"Jonah H. Harris" wrote:
> In the past couple years I've worked on several personal/business projects
> to cluster PostgreSQL and InnoDB (without MySQL). I've tested
> shared-nothing, shared-memory, and shared-disk models. IMHO, shared-disk is
> the only viable option for performance and/or large production business
> environments. Using shared-memory or shared-nothing architectures in a
> database are fine for high-availability, but are expensive from a
> business-case for added performance. I'd be happy to share any of my
> clustering knowledge with ya offline. Have fun!
Daniel Duvall wrote:
> I've looked at PostgreSQL and EnterpriseDB, but I can't find anything
> definitive as far as clustering capabilities. What kinds of projects
> are there for clustering PgSQL, and are any of them mature enough for
> commercial apps?

As you well know, "clustering" means all and nothing at the same time. We have a commercial failover cluster provided by Red Hat, with Postgres running on it. Postgres is installed on both nodes and the data is stored on a SAN; only one instance of Postgres runs at a time, on one of the two nodes. In the last 2 years we had a failure, and the service relocation worked as expected.

Consider also that applications have to behave well: "try" to close the current connection, then keep retrying to open a new one for a while.

Regards
Gaetano Mendola
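The client-side behaviour Gaetano describes (try to close the dead connection, then keep retrying to open a new one for a while) could be sketched roughly as follows. This is only an illustration under stated assumptions: `connect` and `work` are hypothetical callables supplied by the application, not part of any PostgreSQL driver, and retrying `work` is only safe if the interrupted transaction was rolled back or the work is idempotent.

```python
import time


def connect_with_retry(connect, attempts=5, delay=1.0, sleep=time.sleep):
    """Call connect() until it succeeds or the attempts run out.

    connect is a hypothetical zero-argument callable that returns a
    connection object or raises; a real driver call would be wrapped in it.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except Exception as exc:  # a real client would catch the driver's error type
            last_error = exc
            if attempt < attempts:
                sleep(delay)  # give the cluster time to relocate the service
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error


def run_with_failover(connect, work, attempts=5, delay=1.0, sleep=time.sleep):
    """Run work(conn); on failure close the old connection and retry on a new one."""
    conn = connect_with_retry(connect, attempts, delay, sleep)
    for attempt in range(1, attempts + 1):
        try:
            return work(conn)
        except Exception:
            if attempt == attempts:
                raise
            try:
                conn.close()  # "try" to close; the old node may already be gone
            except Exception:
                pass
            sleep(delay)
            conn = connect_with_retry(connect, attempts, delay, sleep)
```

The `sleep` parameter is injectable only so the delay can be replaced in tests; in an application the default `time.sleep` would be used.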
Daniel Duvall wrote:
> I've looked at PostgreSQL and EnterpriseDB, but I can't find anything
> definitive as far as clustering capabilities. What kinds of projects
> are there for clustering PgSQL, and are any of them mature enough for
> commercial apps?

Are you looking for clustering or replication? There are two very popular replication solutions: Slony-I and Mammoth Replicator. Slony-I is an external replication solution; Mammoth Replicator is a complete PostgreSQL + Replication solution.

Sincerely,

Joshua D. Drake

--
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/
While "clustering" in some circles may be an open-ended buzzword -- mainly the commercial DB marketing crowd -- there are concepts beneath the bull that are even inherent in the name. However, I understand your point.

From what I've researched, the concepts and practices seem to fall under one of two abstract categorizations: fail-over (ok... high-availability), and parallel execution (high-performance... sure). While some consider the implementation of only one of these to qualify a cluster, others seem to demand that a "true" cluster must implement both.

What I'm really after is a DB setup that does fail-over and parallel execution. Your setup sounds like it would gracefully handle the former, but cannot achieve the latter. Perhaps I'm simply asking too much of a free software setup.

Thanks for your response.
Daniel Duvall wrote:
> While "clustering" in some circles may be an open-ended buzzword --
> mainly the commercial DB marketing crowd -- there are concepts beneath
> the bull that are even inherent in the name. However, I understand
> your point.
>
> From what I've researched, the concepts and practices seem to fall
> under one of two abstract categorizations: fail-over (ok...
> high-availability), and parallel execution (high-performance... sure).

Well, I don't know why many people believe parallel execution automatically means high performance. Actually, most of the time the performance is much worse this way. If your dataset remains static and you do only read-only requests, you get higher performance through load balancing. If, however, you make changes to the data, each change has to be propagated to all nodes, which costs performance. How much depends heavily on the link speed between the nodes.

> While some consider the implementation of only one of these to qualify
> a cluster, others seem to demand that a "true" cluster must
> implement both.
>
> What I'm really after is a DB setup that does fail-over and parallel
> execution. Your setup sounds like it would gracefully handle the
> former, but cannot achieve the latter. Perhaps I'm simply asking too
> much of a free software setup.

Commercial vendors aren't much better here - they just don't tell you :-)

There is pgpool or SQLRelay, for example, if you want to parallelize requests. You can combine them with the various replication mechanisms also available for PG and get what you want - and, most important, get what's possible. Nobody can trick the math :-)

Greets
Tino
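Tino's point about read-only load balancing versus write propagation can be illustrated with a toy cost model. This is a sketch under strong assumptions that are not from the thread: every node has the same capacity, a read is served by one node, a write has to be applied on every node at the same cost as a read, and link latency (which Tino says matters a great deal) is ignored entirely.

```python
def cluster_throughput(nodes, node_capacity, write_fraction):
    """Estimated client requests per second for a fully replicated cluster.

    nodes          -- number of replicas
    node_capacity  -- operations per second a single node can execute
    write_fraction -- share of client requests that are writes (0.0 to 1.0)

    A read costs one node-operation; a write costs one per node, because
    the change has to be applied everywhere.
    """
    work_per_request = (1.0 - write_fraction) + write_fraction * nodes
    return nodes * node_capacity / work_per_request


if __name__ == "__main__":
    # Print how a 4-node cluster behaves as the write share grows.
    for share in (0.0, 0.1, 0.5, 1.0):
        print(share, cluster_throughput(4, 1000, share))
```

In this model a read-only workload scales linearly with the number of nodes, while a write-only workload gets no benefit from extra nodes at all. Adding network latency to the write path would only make the write-heavy cases worse.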
On 9/29/05, Tino Wildenhain <tino@wildenhain.de> wrote:
> Well, I don't know why many people believe parallel execution
> automatically means high performance. Actually most of the time
> the performance is much worse this way.
> If your dataset remains static and you do only read-only
> requests, you get higher performance through load-balancing.
> If however you do some changes to the data, the change has to
> be propagated to all nodes - which in fact costs performance.
> This highly depends on the link speed between the nodes.

I think you should clarify that the type of clustering you're discussing is the "shared-nothing" model, which is most prevalent in open-source databases. Shared-disk and shared-memory clustered systems do not have the "propagation" issue but do have others (distributed lock manager, etc). Don't make blind statements. If you want more information about "real-world" clustering, read the research for DB2 (Mainframe) and Oracle RAC.
--
Respectfully,
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
http://www.enterprisedb.com/
Daniel Duvall wrote:
> What I'm really after is a DB setup that does fail-over and parallel
> execution. Your setup sounds like it would gracefully handle the
> former, but cannot achieve the latter. Perhaps I'm simply asking too
> much of a free software setup.
>
> Thanks for your response.

Also consider PITR and some work I did last year:
http://archives.postgresql.org/pgsql-admin/2005-06/msg00013.php

With PITR you can have one or more remote machines that continuously replay the log from the main server, and if the main server crashes, the "mirrors" can come out of replay and go online. At that time it was not possible to connect to a replaying engine to perform (at least) queries; I don't know whether this has changed in 8.1.

BTW, did someone go further with that idea? If not, I'd like to rewrite that stuff in C (I do prefer C++).

Regards
Gaetano Mendola
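The "continuously replay, then come out of replay" idea can be sketched as a small restore helper that the standby calls for each WAL segment it needs. This is not Gaetano's script (the linked post has that); it is a hypothetical Python illustration. The trigger-file convention and all paths are assumptions, and it relies on PostgreSQL ending recovery when the restore command reports that a requested file is unavailable.

```python
import os
import shutil
import time


def restore_segment(archive_path, target_path, trigger_path,
                    poll_seconds=1.0, sleep=time.sleep):
    """Wait for a WAL segment to appear in the archive, then copy it.

    Returns 0 once the segment has been copied to target_path, or 1 if the
    (hypothetical) trigger file shows up first, which is the operator's
    signal to stop replaying and let the standby go online.
    """
    while True:
        if os.path.exists(archive_path):
            shutil.copyfile(archive_path, target_path)
            return 0
        if os.path.exists(trigger_path):
            return 1
        sleep(poll_seconds)  # segment not shipped yet; keep waiting
```

A real deployment would also have to make sure a segment is never picked up while it is still being copied into the archive, for example by having the main server copy to a temporary name and rename the file afterwards. A thin command-line wrapper (not shown) would pass the function's return value back as the exit status.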
Jonah H. Harris wrote:
> I think you should clarify that the type of clustering you're discussing
> is the, "shared-nothing" model which is most prevalent in open-source
> databases. Shared-disk and shared-memory clustered systems do not have
> the "propagation" issue but do have others (distributed lock manager,
> etc). Don't make blind statements. If you want more information about
> "real-world" clustering, read the research for DB2 (Mainframe) and
> Oracle RAC.

No, that's not a blind statement ;) It does not matter how the information is technically shared: shared memory must be copied or accessed over network links if you have more than one independent system. Locks are information too, so the same constraints apply.

So no matter how you label the problem, the basic constraints (read: communication and synchronisation overhead) will remain.

Custom solutions can circumvent some of the problems if you can shift the problem area (e.g. have some read-only areas, some seldom-written areas, and some data that is written often, seldom read, and not immediately propagated).
Daniel,

> From what I've researched, the concepts and practices seem to fall
> under one of two abstract categorizations: fail-over (ok...
> high-availability), and parallel execution (high-performance... sure).
> While some consider the implementation of only one of these to qualify
> a cluster, others seem to demand that a "true" cluster must
> implement both.

If you want to get a high degree of parallelism, 10s or 100s of machines are required. At that size, you must have fault tolerance to make the system usable.

> What I'm really after is a DB setup that does fail-over and parallel
> execution. Your setup sounds like it would gracefully handle the
> former, but cannot achieve the latter. Perhaps I'm simply asking too
> much of a free software setup.

We've spent the last 3 years developing a parallel database that does both, and I can tell you that it takes a huge development effort to get it right for the general audience. Bizgres MPP is capable of handling ANSI SQL, is ACID compliant and scales to tens of terabytes, but it's not free (sorry about that). It is tons cheaper than Oracle or Teradata though, and it's based on Postgres.

- Luke
Thanks for your reply, Luke. Bizgres looks like a very promising project. I'll be sure to follow it.

Thanks to everyone for their comments. I'm starting to understand the truth behind the hype and where these performance gains and hits stem from.

-Dan
What about clustered filesystems? At first blush I would think the overhead of something like GFS might kill performance. Could one potentially achieve a fail-over config using multiple nodes with GFS, each having their own instance of PostgreSQL (but only one running at any given moment)?

Best,
Dan
What is the relationship between database support for clustering and grid computing and support for distributed databases? Two-phase COMMIT is coming in 8.1. What effect will this have in promoting FOSS grid support or distribution solutions for PostgreSQL?
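For readers unfamiliar with the protocol behind that question, a coordinator for two-phase commit can be sketched as below. Everything here is a hypothetical in-memory illustration, not PostgreSQL code: the `Participant` class stands in for a database node, and the comments only note which 8.1 statement (`PREPARE TRANSACTION`, `COMMIT PREPARED`, `ROLLBACK PREPARED`) each step would correspond to. The sketch also ignores the hard part, which is recovering when the coordinator itself fails between the two phases.

```python
class Participant:
    """In-memory stand-in for one database node (hypothetical)."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare
        self.state = "active"

    def prepare(self, gid):
        # On PostgreSQL 8.1+ this step corresponds to PREPARE TRANSACTION 'gid'.
        if not self.can_prepare:
            raise RuntimeError(f"{self.name} could not prepare {gid}")
        self.state = "prepared"

    def commit_prepared(self, gid):
        # Corresponds to COMMIT PREPARED 'gid'.
        self.state = "committed"

    def rollback(self, gid):
        # ROLLBACK PREPARED 'gid' on prepared nodes, plain ROLLBACK otherwise.
        self.state = "rolled back"


def two_phase_commit(participants, gid):
    """Return True if every participant committed, False if all rolled back."""
    prepared = []
    try:
        for p in participants:  # phase 1: every node must promise it can commit
            p.prepare(gid)
            prepared.append(p)
    except Exception:
        for p in participants:  # any refusal aborts the transaction everywhere
            p.rollback(gid)
        return False
    for p in prepared:  # phase 2: the decision is final, tell every node
        p.commit_prepared(gid)
    return True
```

The value of the prepare step is that a node which has prepared guarantees it can still commit later, even after a crash, so the coordinator can safely make one decision for all nodes.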
Dan,

On 9/29/05 3:23 PM, "Daniel Duvall" <the.liberal.media@gmail.com> wrote:
> What about clustered filesystems? At first blush I would think the
> overhead of something like GFS might kill performance. Could one
> potentially achieve a fail-over config using multiple nodes with GFS,
> each having their own instance of PostgreSQL (but only one running at
> any given moment)?

Interestingly - my friend Matt O'Keefe built GFS at UMN. I was one of his first customers/sponsors of the research in 1998, when I implemented an 8-node shared-disk cluster on Alpha Linux using GFS and Fibre Channel.

Again - it depends on what you're doing. If it's OLTP, you will spend too much time in lock management for disk access, and things like Oracle RAC's CacheFusion become critical to reduce the number of times you have to hit disks. For warehousing/sequential scans, this kind of clustering is irrelevant.

- Luke
Luke Lonergan wrote:
> Again - it depends on what you're doing - if it's OLTP, you will spend too
> much time in lock management for disk access and things like Oracle RAC's
> CacheFusion becomes critical to reduce the number of times you have to hit
> disks.

Hitting the disk is really bad. However, we have seen that consulting the network for small portions of data (e.g. locks) is even more critical. You will see that the CPU on all nodes is running at 1% or so while the network is waiting for data to be exchanged (latency) - this is the real problem.

I don't know what Oracle is doing in detail, but they have a real problem when losing a node inside the cluster (syncing again is really time consuming).

> For warehousing/sequential scans, this kind of clustering is
> irrelevant.

I suggest looking at Teradata - they do really nice query partitioning on so-called AMPs (we'd simply call them nodes). It is really nice for really ugly warehousing queries (ugly in terms of amount of data).

Hans

--
Cybertec Geschwinde & Schönig GmbH
Schöngrabern 134; A-2020 Hollabrunn
Tel: +43/1/205 10 35 / 340
www.postgresql.at, www.cybertec.at
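The latency effect Hans describes can be made concrete with a little arithmetic. The sketch below assumes a single session that executes its lock round trips strictly one after another and does nothing else while waiting; the function name and all figures are invented for illustration and are not measurements from the thread.

```python
def session_profile(cpu_seconds, lock_round_trips, network_latency_seconds):
    """Per-session throughput and CPU utilisation when locks live on the network.

    cpu_seconds             -- CPU time one transaction really needs
    lock_round_trips        -- network round trips spent on locks per transaction
    network_latency_seconds -- latency of a single round trip
    """
    wait = lock_round_trips * network_latency_seconds
    total = cpu_seconds + wait
    return {"tps": 1.0 / total, "cpu_utilisation": cpu_seconds / total}


if __name__ == "__main__":
    # Illustrative figures only: 1 ms of CPU work, 99 round trips of 1 ms each.
    print(session_profile(0.001, 99, 0.001))
```

With those made-up figures the session spends about 99% of its time waiting on the network, which is the pattern of nearly idle CPUs that Hans reports. Faster CPUs do not help in that situation; only fewer or cheaper round trips do.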