Thread: CFH: Mariposa, distributed DB
This is a Call For Hackers:

Some time ago, I floated a little discussion on this list about doing some distributed database work with PostgreSQL. The project got back-burnered at work, but now has a timeline for needing a solution "this summer." Recent discussions on this list about Postgres's historical object roots got me back to the Berkeley db sites, and reminded me about Mariposa, which is Stonebraker's take on distributed DBs.

http://s2k-ftp.cs.berkeley.edu:8000/mariposa/

Stonebraker has gone on to commercialize Mariposa as Cohera, which seems to be one of those Enterprise Scale products where if you need to ask how much a license costs, you can't afford it ;-)

Sounds like now would be a good time to revisit Mariposa, and see what good ideas can be folded over into PostgreSQL. Mariposa was funded by ARPA and ARO, and was used by NASA as the database part of the Sequoia Project (which became Big Sur), looking to unify the various kinds of geophysical data collected by earth-observing missions. The code is an offshoot of Postgres95, with lots of nasty '#ifdef P95's scattered around. The split predates a lot of good work by the PostgreSQL team to clean up years of academic cruft that had accumulated, so merging is not trivial.

Anyway, anyone interested in taking a look at this with me? I think the place to start (i.e., where I'm starting) is to get the June 1996 alpha release of Mariposa to compile on a current system (I'm running Linux myself). I've been doing a compare-and-contrast, staring at source code, but I think I need a running system to decide how the parts fit together. Then, plan what features to 'fold' into pgsql, and run a proposal past this list some time later in the 7.x series, perhaps in a couple of months (you guys will probably be on 8.x by then!). Hopefully this won't take up too much of the core developers' time until we're talking integration.
Anyone else interested, I'm using the tarball from:

ftp://epoch.cs.berkeley.edu/pub/mariposa/src/alpha-1/mariposa-alpha-1.tar.gz

If this really takes off, I can host CVS of the mariposa and pgsql sources, as well as web pages, mailing list, whatever. If it's just a couple of us (or me all by myself ;-) we'll keep it simple.

Ross
-- 
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu>
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St., Houston, TX 77005
> This is a Call For Hackers:
>
> Some time ago, I floated a little discussion on this list about doing
> some distributed database work with PostgreSQL. The project got back
> burnered at work, but now has a timeline for needing a solution "this
> summer." Recent discussions on this list about Postgres's historical
> object roots got me back to the Berkeley db sites, and reminded me about
> Mariposa, which is Stonebraker's take on distributed DBs.
>
> http://s2k-ftp.cs.berkeley.edu:8000/mariposa/

I have looked at the code. I have files that show all the diffs they made to it, and they have some new files. It was hard for me to see what they were doing. Looks like they hacked up the executor and put in some translation layer to talk to some databroker. It seems like an awfully complicated way to do it. I would not bother getting it to run, but figure out what they were trying to do, and why, and see how we can implement it. My guess is that they had one central server for each table, and you went to that server to get information.

-- 
Bruce Momjian                        |  http://www.op.net/~candle
pgman@candle.pha.pa.us               |  (610) 853-3000
  + If your life is a hard drive,    |  830 Blythe Avenue
  + Christ can be your backup.       |  Drexel Hill, Pennsylvania 19026
"Ross J. Reedstrom" wrote:
>
> Anyone else interested, I'm using the tarball from:
>
> ftp://epoch.cs.berkeley.edu/pub/mariposa/src/alpha-1/mariposa-alpha-1.tar.gz

Is the Mariposa licence compatible with ours?

------------------
Hannu
On Mon, Feb 07, 2000 at 04:23:06PM -0500, Bruce Momjian wrote:
> I have looked at the code. I have files that show all the diffs they
> made to it and they have some new files. It was hard for me to see what
> they were doing. Looks like they hacked up the executor and put in some
> translation layer to talk to some databroker. It seems like an awfully
> complicated way to do it. I would not bother getting it to run, but
> figure out what they were trying to do, and why, and see how we can
> implement it. My guess is that they had one central server for each
> table, and you went to that server to get information.

Actually, this being an academic project, there are lots of design documents about how it's _supposed_ to work. Stonebraker calls it an 'agoric' distributed database, as in agora, market. The various db servers offer tables (or even specific views on tables) 'for sale', and bid against/with each other to provide the data to clients requesting it. The idea behind it is to use a micro-economic market model to do your distributed optimizations for you, rather than have the DBAs decide what tables go where, what tables need to be shadowed, etc. The win is supposedly massive scalability: the Cohera site talks about 10,000s of servers.
As I said, I've been doing the compare-existing-source-code thing, but thought working code might be more revealing, and give my project manager something to see progress on ;-)

You're right, though, that the most productive way to go, in the long run, might be to reimplement what they've described in the current pgsql tree, using the Mariposa source as an example implementation.

Ross
-- 
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu>
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St., Houston, TX 77005
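To make the 'agoric' idea concrete, here is a toy sketch of the bidding model as I read the design documents. This is my own invention, not Mariposa's actual API or Tcl bidder interface: each site quotes a (price, delay) bid for serving a fragment, and the broker takes the cheapest bid that can meet the client's deadline.

```python
# Toy model of Mariposa-style agoric bidding (all names invented):
# sites quote (price, delay) for a fragment; the broker picks the
# cheapest bid that can still answer before the client's deadline.
from dataclasses import dataclass

@dataclass
class Bid:
    site: str
    price: float   # "money" the client would pay
    delay: float   # estimated seconds to deliver the answer

def solicit_bids(sites, fragment):
    """Ask every site for a bid on the fragment; sites may decline."""
    bids = []
    for site, quotes in sites.items():
        if fragment in quotes:
            price, delay = quotes[fragment]
            bids.append(Bid(site, price, delay))
    return bids

def choose_bid(bids, deadline):
    """Broker policy: cheapest bid that answers before the deadline."""
    feasible = [b for b in bids if b.delay <= deadline]
    return min(feasible, key=lambda b: b.price) if feasible else None

# A fast-but-expensive site and a cheap-but-slow one:
sites = {
    "miami":    {"widgets": (1.0, 0.5)},
    "new_york": {"widgets": (0.2, 3.0)},
}
bids = solicit_bids(sites, "widgets")
best = choose_bid(bids, deadline=1.0)   # only "miami" is fast enough
```

A DBA never decides which site serves the query; the bid policy does, which is the whole point of the market model.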
On Mon, Feb 07, 2000 at 11:44:14PM +0200, Hannu Krosing wrote:
> Is the Mariposa licence compatible with ours?

It better be, it's the same license ;-) That is, Mariposa is a branch off the Postgres95 tree. Actually, it's a good question: the PG95 license would have let them put just about any license on Mariposa they wanted. After running both COPYRIGHT files through fmt, here's the diff output:

wallace$ diff COPYRIGHT COPYRIGHT.pgsql
1c1,2
< Mariposa Distributed Data Base Management System
---
> PostgreSQL Data Base Management System (formerly known as Postgres,
> then as Postgres95).
3c4
< Copyright (c) 1994-6 Regents of the University of California
---
> Copyright (c) 1994-7 Regents of the University of California
21d21
< 
wallace$

So, apart from the name and the copyright dates, it is word for word the PostgreSQL license.

Ross
-- 
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu>
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St., Houston, TX 77005
Bruce Momjian wrote:
>
> > http://s2k-ftp.cs.berkeley.edu:8000/mariposa/

It has a nice concept of simulating a free market for distributed query optimisation. Auctions, brokers and all ...

> I have looked at the code. I have files that show all the diffs they
> made to it and they have some new files. It was hard for me to see what
> they were doing. Looks like they hacked up the executor and put in some
> translation layer to talk to some databroker.

The broker was for determining where to get the data from - as each table could be queried from several sites, there had to be a mechanism for the planner to figure out the cheapest (or fastest, if "money" was not a problem) source.

> It seems like an awfully
> complicated way to do it. I would not bother getting it to run, but
> figure out what they were trying to do, and why, and see how we can
> implement it. My guess is that they had one central server for each
> table, and you went to that server to get information.

They would not have needed the broker for such a simple scheme. IIRC they had no central table, but they doubled the length of the oid and made it include the site id of the site that created the tuple. It could be that they restricted changing a tuple to that site?

The site to go to for information was determined by an auction where each site offered speed and cost for looking up the data. Usually they also didn't guarantee the latest data, just "best effort".

-------------------
Hannu
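The widened-oid scheme Hannu recalls can be sketched as simple bit packing. The 32/32 field split below is my assumption for illustration (the alpha sources would show the real layout): a global oid is the creating site's id concatenated with the tuple's local oid.

```python
# Sketch of a doubled-width oid carrying the creating site's id.
# The 32-bit field widths are assumed, not taken from Mariposa code.
SITE_BITS = 32
MASK = (1 << SITE_BITS) - 1

def make_global_oid(site_id: int, local_oid: int) -> int:
    """Pack (site_id, local_oid) into one 64-bit global oid."""
    assert 0 <= site_id <= MASK and 0 <= local_oid <= MASK
    return (site_id << SITE_BITS) | local_oid

def split_global_oid(goid: int):
    """Recover which site created the tuple, plus its local oid."""
    return goid >> SITE_BITS, goid & MASK
```

With the site id recoverable from any tuple's oid, an update could be routed back to (or restricted to) the creating site, as Hannu speculates.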
At 12:04 AM 2/8/00 +0200, Hannu Krosing wrote:

>The site to go to for information was determined by an auction where each
>site offered speed and cost for looking up the data. Usually they also
>didn't guarantee the latest data, just "best effort".

I just glanced at the website. They explicitly mention that they don't require global synchronization, because it would slow down response time for many things (with thousands of servers, that sounds like an understatement).

So, yes, it would appear they don't guarantee the latest data.

- Don Baccus, Portland OR <dhogaza@pacifier.com>
  Nature photos, on-line guides, Pacific Northwest
  Rare Bird Alert Service and other goodies at http://donb.photo.net.
Seems there was more than just going back to the Berkeley site that reminded me of Mariposa. A principal piece of new functionality in Mariposa is the ability to 'fragment' a class, based on a user-defined partitioning function. The example used is a widgets class, which is partitioned on the 'location' field (i.e., the warehouse the widget is stored in):

CREATE TABLE widgets (
    part_no   int4,
    location  char16,
    on_hand   int4,
    on_order  int4,
    commited  int4
) PARTITION ON LOCATION USING btchar16cmp;

Then, the table is filled with tuples, all containing locations of either 'Miami' or 'New York'.

SELECT * FROM widgets;

works as expected. Later, this table is fragmented:

SPLIT FRAGMENT widgets INTO widgets_mi, widgets_ny AT 'Miami';

Now, the original table widgets is _empty_: all the tuples with location <= 'Miami' go to widgets_mi, and those with location > 'Miami' go to widgets_ny. Yet

SELECT * FROM widgets;

still returns all the tuples! So, this works sort of the way Chris Bitmead has implemented subclasses: widgets_mi and widgets_ny are subclasses of the widgets class, so selects return everything below. They differ in that only PARTITIONed classes can be FRAGMENTed.

The distributed part comes in with the MOVE FRAGMENT command. This transfers the 'master' copy of a table to the designated host, so future access to that FRAGMENT will go over the network. There's also a COPY FRAGMENT command, which sets up a local cache of a fragment, with a periodic update time. These copies may be either READONLY or (the default) READ/WRITE. Seems updates are timed only (a simple extension would be to implement write-through behavior).

All this is coming from the Mariposa User's Manual, which is an extended version of the Postgres95 User's Manual.

As to latest vs. best effort: one defines a BidCurve, whose dimensions are Cost and Time. A flat curve should get you the latest data.
And, since the DataBroker and Bidder are both implemented as Tcl scripts, it would be possible to define a bid policy that only buys the latest data, regardless of how long it's going to take.

Oh, BTW, yes, that does put _two_ interpreted Tcl scripts on the execution path for every query. Wonder what _that'll_ do for execution time. However, it's like planning/optimization time, in that it's spent per query, rather than per tuple.

Ross
-- 
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu>
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St., Houston, TX 77005

On Mon, Feb 07, 2000 at 02:19:56PM -0800, Don Baccus wrote:
> At 12:04 AM 2/8/00 +0200, Hannu Krosing wrote:
>
> >The site to go to for information was determined by an auction where each
> >site offered speed and cost for looking up the data. Usually they also
> >didn't guarantee the latest data, just "best effort".
>
> I just glanced at the website. They explicitly mention that they don't
> require global synchronization, because it would slow down response time
> for many things (with thousands of servers, that sounds like an
> understatement).
>
> So, yes, it would appear they don't guarantee the latest data.
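The SPLIT FRAGMENT behavior described in the manual excerpt can be modeled as routing each tuple through the partition comparator. This is my own reconstruction for illustration (the function names are invented, and btcmp stands in for btchar16cmp), not Mariposa code:

```python
# Model of SPLIT FRAGMENT ... AT 'Miami': route each tuple on the
# partition key with a three-way comparator, the way btchar16cmp
# orders char16 values. Names here are invented for illustration.
def btcmp(a: str, b: str) -> int:
    """Three-way string compare: negative, zero, or positive."""
    return (a > b) - (a < b)

def split_fragment(rows, key, split_at, cmp=btcmp):
    """Tuples with key <= split_at go low, the rest go high."""
    low, high = [], []          # e.g. widgets_mi, widgets_ny
    for row in rows:
        (low if cmp(row[key], split_at) <= 0 else high).append(row)
    return low, high

widgets = [
    {"part_no": 1, "location": "Miami"},
    {"part_no": 2, "location": "New York"},
]
widgets_mi, widgets_ny = split_fragment(widgets, "location", "Miami")

# A SELECT on the parent class unions both fragments, which is why
# "SELECT * FROM widgets" still returns every tuple after the split.
all_rows = widgets_mi + widgets_ny
```

The subclass-style union at the end mirrors how selects on the parent class still see everything, even though the parent's own storage is empty.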
At 04:57 PM 2/7/00 -0600, Ross J. Reedstrom wrote:

>CREATE TABLE widgets (
>    part_no   int4,
>    location  char16,
>    on_hand   int4,
>    on_order  int4,
>    commited  int4
>) PARTITION ON LOCATION USING btchar16cmp;

Oracle's partitioning is fixed; in other words, once you choose a condition to split on, you can't change it. In other words, in your example:

>Then, the table is filled with tuples, all containing locations of either
>'Miami' or 'New York'.

after splitting the table into ">'Miami'" and "<='Miami'" fragments, I've been told that you can't (say) change it to ">'Boston'" and have the proper rows move automatically.

In practice, partitioning is often used to split tables on dates. You might want to partition off your old tax data at the 7-year mark, and each year as you do your taxes, move the oldest tax data in your "recent taxes" table off to your "older taxes" table. Apparently, Informix is smart enough to do this for you. Since a couple of the people associated with the project are Informix people, do you have any idea if Mariposa is able to do this?

>SELECT * FROM widgets;
>
>works as expected. Later, this table is fragmented:
>
>SPLIT FRAGMENT widgets INTO widgets_mi, widgets_ny AT 'Miami';

In other words, some sort of "update the two tables AT <some new criteria>". Whatever the answer to my question, Mariposa certainly looks interesting. It's functionality that folks who do data warehousing really need.

>Oh, BTW, yes, that does put _two_ interpreted Tcl scripts on the execution
>path for every query. Wonder what _that'll_ do for execution time. However,
>it's like planning/optimization time, in that it's spent per query, rather
>than per tuple.

Probably not as bad as you think, if they're simple and short.
Once someone has this up and running and integrated with PostgreSQL, robust and reliable, we can measure it and change to something else if necessary :)

- Don Baccus, Portland OR <dhogaza@pacifier.com>
  Nature photos, on-line guides, Pacific Northwest
  Rare Bird Alert Service and other goodies at http://donb.photo.net.
Hi,

the Mariposa db distribution is interesting, but it is very specific. If I understand it correctly, it is not real-time, globally synchronized DB replication. But for a lot of users (and me), on-line DB replication and synchronization is probably more interesting. How many users have 10K servers?

I explored the current PG source, and it is probably possible to create support for on-line replication. My idea is to replicate data at the heap_ layout level. The parser, planner and executor run on the local backend, and the finished tuples are replicated straight out to the other servers (nodes). It needs to synchronize PG's locks too.

In the near future I want to start a project for PG on-line replication. Or is anyone working on this now?

Comments?

					Karel
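Karel's heap-level idea can be sketched roughly as follows. This is entirely hypothetical (the class and method names are not PostgreSQL internals): every tuple written to the local heap is also shipped, unchanged, to each peer node, so the parser/planner/executor run only on the local backend.

```python
# Minimal sketch of write-through, heap-level tuple replication,
# as Karel proposes. Hypothetical API, not PostgreSQL code.
class ReplicatedHeap:
    def __init__(self, peers=()):
        self.tuples = []            # local heap storage
        self.peers = list(peers)    # heaps on the other nodes

    def heap_insert(self, tup):
        self.tuples.append(tup)     # local write first
        for peer in self.peers:     # then a synchronous copy of the
            peer.tuples.append(tup) # finished tuple to every peer

replica = ReplicatedHeap()
primary = ReplicatedHeap(peers=[replica])
primary.heap_insert(("widget", 1))
```

Because the replication happens below the executor, the peers never re-run the query; they only receive finished tuples, which is why the locks would also have to be synchronized across nodes.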