Обсуждение: Materialized views
This is the time of year when the Wisconsin Courts formalize their annual plan for where people will be spending the bulk of their time in the coming year. Two years ago at this time, managers decided that serializable transactions were a big enough issue to justify assigning about half of my 2011 time to working on PostgreSQL enhancements for that. This year our big database issue is materialized views. As we strive to create our next generation of software we find ourselves wanting to provide "dashboard" type windows with graphs of statistics which are insanely expensive to calculate on the fly. We've been creating ad hoc materialized views to deal with the performance issues, but that is labor intensive. I'm considering submitting a proposal to management that I be assigned to work on a declarative implementation in PostgreSQL to allow speedier application development of software needing materialized views. I'm posting to make sure that nobody else is already in the midst of working on this, and to check regarding something on the Wiki page for this topic: http://wiki.postgresql.org/wiki/Materialized_Views That page describes three components: creating MVs, updating MVs, and having the planner automatically detect when an MV matches some portion of a regular query and using the MV instead of the specified tables in such cases. I have high confidence that if time is approved I could do the first two for the 9.3, but that last one seems insanely complicated and not necessarily a good idea. (That's particularly true with some of the lazier strategies for maintaining the data in the materialized view.) I don't think we want to use that 3rd component in our shop, anyway. So the question is, would a patch which does the first two without the third be accepted by the community? I'm not at the point of proposing specifics yet; the first phase would be a close review of prior threads and work on the topic (including the GSoC work). Then I would discuss implementation details here before coding. The hope on our end, of course, is that the time spent on implementing this would be more than compensated by application programmer time savings as we work on our next generation of application software, which seems like a pretty safe bet to me. -Kevin
On 8 November 2011 21:23, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > This is the time of year when the Wisconsin Courts formalize their > annual plan for where people will be spending the bulk of their time > in the coming year. Two years ago at this time, managers decided > that serializable transactions were a big enough issue to justify > assigning about half of my 2011 time to working on PostgreSQL > enhancements for that. This year our big database issue is > materialized views. > > As we strive to create our next generation of software we find > ourselves wanting to provide "dashboard" type windows with graphs of > statistics which are insanely expensive to calculate on the fly. > We've been creating ad hoc materialized views to deal with the > performance issues, but that is labor intensive. I'm considering > submitting a proposal to management that I be assigned to work on > a declarative implementation in PostgreSQL to allow speedier > application development of software needing materialized views. > > I'm posting to make sure that nobody else is already in the midst of > working on this, and to check regarding something on the Wiki page > for this topic: > > http://wiki.postgresql.org/wiki/Materialized_Views > > That page describes three components: creating MVs, updating MVs, and > having the planner automatically detect when an MV matches some > portion of a regular query and using the MV instead of the specified > tables in such cases. I have high confidence that if time is > approved I could do the first two for the 9.3, but that last one > seems insanely complicated and not necessarily a good idea. (That's > particularly true with some of the lazier strategies for maintaining > the data in the materialized view.) I don't think we want to use > that 3rd component in our shop, anyway. So the question is, would a > patch which does the first two without the third be accepted by the > community? > > I'm not at the point of proposing specifics yet; the first phase > would be a close review of prior threads and work on the topic > (including the GSoC work). Then I would discuss implementation > details here before coding. > > The hope on our end, of course, is that the time spent on > implementing this would be more than compensated by application > programmer time savings as we work on our next generation of > application software, which seems like a pretty safe bet to me. +1 I was pleased to see the subject of this thread. I definitely think it's worth it, especially if you're able to make it also work for foreign tables (saving expense of seeking external data so can also act as a local cache, but that's me getting carried away). And I agree regarding the planner detection. If that ever were desired, it certainly would't need implementing in the first phase. -- Thom Brown Twitter: @darkixion IRC (freenode): dark_ixion Registered Linux user: #516935 EnterpriseDB UK: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 11/8/11 1:23 PM, Kevin Grittner wrote: > So the question is, would a > patch which does the first two without the third be accepted by the > community? AFAIC, yes. For that matter, Part 3 is useful without parts 1 and 2, if someone wanted to work on that. I recall some academic work on automated materialized view matching back in the 7.2 days. And I can help test whatever you come up with ... I do a lot of matviews. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 8 Nov 2011, at 21:23, Kevin Grittner wrote: > This is the time of year when the Wisconsin Courts formalize their > annual plan for where people will be spending the bulk of their time > in the coming year. Two years ago at this time, managers decided > that serializable transactions were a big enough issue to justify > assigning about half of my 2011 time to working on PostgreSQL > enhancements for that. This year our big database issue is > materialized views. +1 for that too. Too many hacked together MVs here, and more coming.
* Kevin Grittner (Kevin.Grittner@wicourts.gov) wrote: > So the question is, would a > patch which does the first two without the third be accepted by the > community? I'm about 99% sure the answer to that is 'yes'. Are you thinking of having a background scheduler which handles the updating of schedule-driven (instead of trigger-driven) MVs..? Not to try to feature-creep this on you, but you might consider how a new backend process which handles scheduled tasks could be generalized to go beyond handling just MV updates.. :) Thanks! Stephen
Stephen Frost <sfrost@snowman.net> wrote: > Are you thinking of having a background scheduler which handles > the updating of schedule-driven (instead of trigger-driven) MVs..? I'm trying not to get to far into discussing design in advance of reviewing previous work and any papers on the topic. That said, it seems clear that the above is required but not sufficient. > Not to try to feature-creep this on you, but you might consider > how a new backend process which handles scheduled tasks could be > generalized to go beyond handling just MV updates.. :) I'll keep that in mind. :-) -Kevin
On Nov 8, 2011, at 2:54 PM, Stephen Frost wrote: >> So the question is, would a >> patch which does the first two without the third be accepted by the >> community? +1 Definitely. > I'm about 99% sure the answer to that is 'yes'. Are you thinking of > having a background scheduler which handles the updating of > schedule-driven (instead of trigger-driven) MVs..? Not to try to > feature-creep this on you, but you might consider how a new backend > process which handles scheduled tasks could be generalized to go beyond > handling just MV updates.. :) +1 That sure would be nice. Might be some useful stuff in pgAgent to pull into this (license permitting). Best, David
> I'm about 99% sure the answer to that is 'yes'. Are you thinking of > having a background scheduler which handles the updating of > schedule-driven (instead of trigger-driven) MVs..? Not to try to > feature-creep this on you, but you might consider how a new backend > process which handles scheduled tasks could be generalized to go beyond > handling just MV updates.. :) No feature-creeping; the two features described will be hard enough. Besides, we have pg_agent for that. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Nov 8, 2011, at 3:25 PM, Josh Berkus wrote: >> I'm about 99% sure the answer to that is 'yes'. Are you thinking of >> having a background scheduler which handles the updating of >> schedule-driven (instead of trigger-driven) MVs..? Not to try to >> feature-creep this on you, but you might consider how a new backend >> process which handles scheduled tasks could be generalized to go beyond >> handling just MV updates.. :) > > No feature-creeping; the two features described will be hard enough. > Besides, we have pg_agent for that. Well, there has to be some way to refresh an MV at regular intervals, no? Best, David
On Nov 8, 2011, at 4:23 PM, Josh Berkus wrote: >> Well, there has to be some way to refresh an MV at regular intervals, no? > > For periodic update matviews, yes. However, Kevin needs only produce > the command/function call for updating a generic periodic matview. He > doesn't need to provide a scheduling utility. For simple cases, cron works. Oh, I see. I was assuming it would be automatic-ish, with the declaration of the matview including update interval information.Could certainly come later, though. Best, David
>> No feature-creeping; the two features described will be hard enough. >> Besides, we have pg_agent for that. > > Well, there has to be some way to refresh an MV at regular intervals, no? For periodic update matviews, yes. However, Kevin needs only produce the command/function call for updating a generic periodic matview. He doesn't need to provide a scheduling utility. For simple cases, cron works. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 11/08/2011 06:25 PM, Josh Berkus wrote: >> I'm about 99% sure the answer to that is 'yes'. Are you thinking of >> having a background scheduler which handles the updating of >> schedule-driven (instead of trigger-driven) MVs..? Not to try to >> feature-creep this on you, but you might consider how a new backend >> process which handles scheduled tasks could be generalized to go beyond >> handling just MV updates.. :) > No feature-creeping; the two features described will be hard enough. > Besides, we have pg_agent for that. > Yeah. I'd be plenty happy with greedy MVs plus lazy MVs that need a manual refresh. These are both patterns I use a lot (the buildfarm dashboard is in effect a greedy MV although it's not updated by a trigger). cheers andrew
* Josh Berkus (josh@agliodbs.com) wrote: > > Well, there has to be some way to refresh an MV at regular intervals, no? > > For periodic update matviews, yes. However, Kevin needs only produce > the command/function call for updating a generic periodic matview. He > doesn't need to provide a scheduling utility. For simple cases, cron works. Perhaps that would be an acceptable initial version, but I'd be pretty disappointed and I certainly don't think we should stop there. I'm quite aware of cron and as aware of the difficulties that many DBAs are going to have getting cronjobs implemented. There's permission issues (we see this already with the requests to provide an in-PG DBA interface for pg_hba.conf..), locking issues (writing decent scripts that don't destroy the box if they take a bit too long, like >5m on a 5m scheduled job), authentication issues (we don't really want these running as superuser if we can avoid it..), and probably other things I'm not thinking of. Thanks, Stephen
On Tue, Nov 8, 2011 at 8:31 PM, Stephen Frost <sfrost@snowman.net> wrote: > Perhaps that would be an acceptable initial version, but I'd be pretty > disappointed and I certainly don't think we should stop there. I agree that a built-in job scheduler would be pretty awesome, but I think it's a completely separate project from materialized views. Each of the two is a major project all by itself; making one into a dependency of the other is a recipe for failure. In view of Kevin's success with SSI, I'm very pleased to see him picking this out as his next target. If and when it gets done, this will be a great, great feature. And we shouldn't be afraid to start small and move incrementally toward what we really want. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 11/08/2011 04:23 PM, Kevin Grittner wrote: > http://wiki.postgresql.org/wiki/Materialized_Views > That page describes three components: creating MVs, updating MVs, and > having the planner automatically detect when an MV matches some > portion of a regular query and using the MV instead of the specified > tables in such cases. So far that page is just my initial notes on this subject based on some research I did, don't read too much into it. Ignoring MV substitution is certainly the right plan for an initial development target. An implementation that didn't update automatically at all is also still a useful step. It's very easy to pick too big of a chunk of this project to chew on at once. When I wrote that, it wasn't completely clear to me yet that doing the updates would involve so many of the same tricky bits that stalled progress on the MERGE command. Nowadays I think of working out the concurrency issues in both this and MERGE, in the wake of true serialization, as another step in this implementation plan. It's not strictly necessary, but there's some shared issues that might be tackled easier in the MERGE context instead. -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
On Tue, Nov 8, 2011 at 9:23 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > That page describes three components: creating MVs, updating MVs, and > having the planner automatically detect when an MV matches some > portion of a regular query and using the MV instead of the specified > tables in such cases. I have high confidence that if time is > approved I could do the first two for the 9.3, but that last one > seems insanely complicated and not necessarily a good idea. (That's > particularly true with some of the lazier strategies for maintaining > the data in the materialized view.) I don't think we want to use > that 3rd component in our shop, anyway. So the question is, would a > patch which does the first two without the third be accepted by the > community? For me, yes. I support and encourage your work. It's a big topic and we must approach it incrementally. Having said that, we should assume that #3 will be implemented and that we need to collect appropriate metadata and anything else required. So the design should foresee #3 and not in any way optimise for the case where #3 doesn't happen. It may occur that #3 is added during next cycle concurrently with this development. I would also caution that all other databases currently provide #3 as a matter of course. That is the "sauce" as far as many people are concerned. Everything else is already achievable using external application code. So I would not want people to start saying "we have MVs" when in fact all we did was add declarative syntax to support what was already possible - we could easily publicise that incorrectly at release time. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote: > I'm considering submitting a proposal to management that I be > assigned to work on a declarative implementation in PostgreSQL to > allow speedier application development of software needing > materialized views. Thanks to all who provided feedback and support in response to my post. Based on the feedback here and off-list, I did submit a proposal. It was just approved by the appropriate steering committee (consisting of our CIO, the Director of State Courts, District Court Administrators, Judges, Clerks of Court, and other stake-holders) as a low-priority project. That means that I expect I'll have the time to get a patch together in time for 9.3, but the times at which the decks will be clear of other assignments to allow work on this will not be very predictable. I'll probably be on-again, off-again throughout the year. I apologize in advance for the fact that the times when I will be able to work on it might not fit well with the release cycle or CFs, but I kinda have to take what I can get in that regard. -Kevin