Обсуждение: Master/Slave, DB separation or just spend $$$?
Hi Performance Wizards! I need advice on this. I have a db which is being constantly updated and queried by a few computers. We are doing datamining. The machine is running on a moderately powered machine and processors constantly hit 90%. At the same time, we need to present these data on a web interface. The performance for the web interface is now very sluggish as most of the power is occupied by the mining process. I have thought of a few ways out of this - 1) Buy a mega powered machine (temporal solution, quick fix) 2) Do a master-slave configuration 3) Separate the DB into 2 - One for pure mining purposes, the other purely for web serving For (2), I do not know if it will be very effective since the master will probably have many changes at any moment. I do not understand how the changes will be propagated from the master to the slave without impacting the slave's performance. Anyone with more experience here? (3) seems ideal but is a very very painful solution! We can possibly use a message queue system but again I am not familiar with MQ. Will need to do more research. If you were me, how would you solve this problem? Thanks! Kelvin Quee +65 9177 3635
On Tue, Jul 21, 2009 at 9:47 PM, Kelvin Quee<kelvinq@gmail.com> wrote: > Hi Performance Wizards! > > I need advice on this. > > I have a db which is being constantly updated and queried by a few > computers. We are doing datamining. The machine is running on a > moderately powered machine and processors constantly hit 90%. When your CPUs say 90%, is that regular user / sys %, or is it wait %? The difference is very important. What kind of hardware are you running on btw? # cpus, memory, # of drives,type, RAID controller if any? > At the same time, we need to present these data on a web interface. > The performance for the web interface is now very sluggish as most of > the power is occupied by the mining process. > > I have thought of a few ways out of this - > > 1) Buy a mega powered machine (temporal solution, quick fix) Depends very much on what your bound by, CPU or IO. If adding a couple of 15K SAS drives would double your performance then u don't need a super powerful machine. > 2) Do a master-slave configuration Often a good choice. > 3) Separate the DB into 2 - One for pure mining purposes, the other > purely for web serving > > For (2), I do not know if it will be very effective since the master > will probably have many changes at any moment. I do not understand how > the changes will be propagated from the master to the slave without > impacting the slave's performance. Anyone with more experience here? > > (3) seems ideal but is a very very painful solution! > > We can possibly use a message queue system but again I am not familiar > with MQ. Will need to do more research. That could be a very complex solution. > If you were me, how would you solve this problem? Slony, most likely.
Hi Scott, Thanks for the quick reply. I have been staring at *top* for a while and it's mostly been 40% in userspace and 30% in system. Wait is rather low and never ventures beyond 1%. My hardware is a duo core AMD Athlon64 X2 5000+, 1GB RAM and a single 160 GB SATA II hard disk drive. I will go look at Slony now. Scott, one question though - If my master is constantly changing, wouldn't the updates from the master to the slave also slow down the slave? Kelvin Quee +65 9177 3635 On Wed, Jul 22, 2009 at 1:42 PM, Scott Marlowe<scott.marlowe@gmail.com> wrote: > On Tue, Jul 21, 2009 at 9:47 PM, Kelvin Quee<kelvinq@gmail.com> wrote: >> Hi Performance Wizards! >> >> I need advice on this. >> >> I have a db which is being constantly updated and queried by a few >> computers. We are doing datamining. The machine is running on a >> moderately powered machine and processors constantly hit 90%. > > When your CPUs say 90%, is that regular user / sys %, or is it wait %? > The difference is very important. > What kind of hardware are you running on btw? # cpus, memory, # of > drives,type, RAID controller if any? > >> At the same time, we need to present these data on a web interface. >> The performance for the web interface is now very sluggish as most of >> the power is occupied by the mining process. >> >> I have thought of a few ways out of this - >> >> 1) Buy a mega powered machine (temporal solution, quick fix) > > Depends very much on what your bound by, CPU or IO. If adding a > couple of 15K SAS drives would double your performance then u don't > need a super powerful machine. > >> 2) Do a master-slave configuration > > Often a good choice. > >> 3) Separate the DB into 2 - One for pure mining purposes, the other >> purely for web serving >> >> For (2), I do not know if it will be very effective since the master >> will probably have many changes at any moment. I do not understand how >> the changes will be propagated from the master to the slave without >> impacting the slave's performance. Anyone with more experience here? >> >> (3) seems ideal but is a very very painful solution! >> >> We can possibly use a message queue system but again I am not familiar >> with MQ. Will need to do more research. > > That could be a very complex solution. > >> If you were me, how would you solve this problem? > > Slony, most likely. >
On Wed, Jul 22, 2009 at 1:52 AM, Kelvin Quee<kelvinq@gmail.com> wrote: > Hi Scott, > > Thanks for the quick reply. > > I have been staring at *top* for a while and it's mostly been 40% in > userspace and 30% in system. Wait is rather low and never ventures > beyond 1%. > > My hardware is a duo core AMD Athlon64 X2 5000+, 1GB RAM and a single > 160 GB SATA II hard disk drive. So I take it you're on a tight budget then? I'm guessing you could put a single quad core cpu and 8 Gigs of ram in place for a reasonable price. I'd highly recommend setting up at least software RAID-1 for increased reliability. > I will go look at Slony now. Might be overkill if you can get by on a single reasonably powerful machine. > Scott, one question though - If my master is constantly changing, > wouldn't the updates from the master to the slave also slow down the > slave? Yes it will, but the overhead for the slave is much less than the master.
-----BEGIN PGP SIGNED MESSAGE----- Hash: RIPEMD160 > I have a db which is being constantly updated and queried by a few > computers. We are doing datamining. The machine is running on a > moderately powered machine and processors constantly hit 90%. ... > 2) Do a master-slave configuration > 3) Separate the DB into 2 - One for pure mining purposes, the other > purely for web serving Why not combine the two (if I'm understanding correctly)? Use Bucardo or Slony to make two slaves, one for the web servers to hit (assuming they are read-only queries), and one to act as a data warehouse. Your main box gets all the updates but has no selects or complex queries to weigh it down. If the we server does read and write, have your app maintain two database handles. > For (2), I do not know if it will be very effective since the master > will probably have many changes at any moment. I do not understand how > the changes will be propagated from the master to the slave without > impacting the slave's performance. Anyone with more experience here? The slave will get the updates as well, but in a more efficient manner as there will be no WHERE clauses or other logic associated with the original update. Bucardo or Slony will simply COPY over the rows as needed. Keep in mind that both are asynchronous, so changes won't appear on the slaves at the same time as the master, but the delay is typically measured in seconds. - -- Greg Sabino Mullane greg@turnstep.com End Point Corporation PGP Key: 0x14964AC8 200907221229 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -----BEGIN PGP SIGNATURE----- iEYEAREDAAYFAkpnPpsACgkQvJuQZxSWSsggKgCfT0EbxWQdym30n7IV1J1X6dC6 HRkAoND4nCMVeffE2VW34VVmPcRtLclI =tTjn -----END PGP SIGNATURE-----
kelvinq@gmail.com (Kelvin Quee) writes: > I will go look at Slony now. It's worth looking at, but it is not always to be assumed that replication will necessarily improve scalability of applications; it's not a "magic wand" to wave such that "presto, it's all faster!" Replication is helpful from a performance standpoint if there is a lot of query load where it is permissible to look at *somewhat* out of date information. For instance, replication can be quite helpful for pushing load off for processing accounting data where you tend to be doing analysis on data from {yesterday, last week, last month, last year}, and where the data tends to be inherently temporal (e.g. - you're looking at transactions with dates on them). On the other hand, any process that anticipates *writing* to the master database will be more or less risky to try to shift over to a possibly-somewhat-behind 'slave' system, as will be anything that needs to be consistent with the "master state." -- (reverse (concatenate 'string "ofni.secnanifxunil" "@" "enworbbc")) http://www3.sympatico.ca/cbbrowne/spiritual.html "Nondeterminism means never having to say you're wrong." -- Unknown
On Wed, Jul 22, 2009 at 12:52 AM, Kelvin Quee<kelvinq@gmail.com> wrote: > I have been staring at *top* for a while and it's mostly been 40% in > userspace and 30% in system. Wait is rather low and never ventures > beyond 1%. Certainly seems like you are CPU bound. > My hardware is a duo core AMD Athlon64 X2 5000+, 1GB RAM and a single > 160 GB SATA II hard disk drive. Looks like you are on a budget as Scott also suggested - I would also mirror his recommendation to upgrade to a quad core processor and more memory. Hopefully your motherboard supports quad-cores so you don't have to replace that bit, and you should be able to get at least 4GB of RAM in there. If IO load becomes an issue, Velociraptors are fast and don't cost too much. Getting a basic RAID1 will help prevent data-loss due to disk failure - make sure you are making offline backups as well! > I will go look at Slony now. > > Scott, one question though - If my master is constantly changing, > wouldn't the updates from the master to the slave also slow down the > slave? Yes - Slony will increase the load on your source node as it does take work to do the replication, so unless you are able to offload your CPU heavy read only queries to the slave machine, it will only bog down the source node more. -Dave