Обсуждение: Materialized views proposal
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 I apologize if this post is inappropriate. Doing some development work, me and my co-worker discussed some optimizations strategies. One of the ideas that came up was materialized views. Trading disk space to summarize queries, and paying for a trigger waterfall on certain kinds of updates seems like a small price to pay right now in our situation. I searched through the archives and I couldn't seem to find anything relevant. As I've no familiarity with the internals, but I do know the front-end pretty well, what can I do to help implement materialized views? Is such an idea feasible? Is there some reason why it just isn't a good idea? (And if not, what is a better idea?) If someone is willing to guide me, I can probably contribute a fair amount of C code. I do have experience with C. The bird's eye view of the project would probably be turning a statement (is there such a statement in SQL92?) in the creation of a table, the initial population of the table, and the creation of several triggers strategically placed so that the data in the materialized view table is always in sync. Direct inserts or updates on the materialized view should be illegal, except by the triggers. However, perhaps like views work today, we can allow rules to be added to the table. Certain restrictions on the materialized views should be enforced: No mutables, in particular. - -- Jonathan Gardner jgardner@jonathangardner.net Live Free, Use Linux! -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.1 (GNU/Linux) iD8DBQE/xNzcWgwF3QvpWNwRAkbiAJ4m1zRPe+y3Tha0649QXH30y9eITwCfTjsv ow9Nwnnwrc6x9QaAB1AfHWQ= =Ofb5 -----END PGP SIGNATURE-----
Jonathan Gardner kirjutas K, 26.11.2003 kell 19:03: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > I apologize if this post is inappropriate. > > Doing some development work, me and my co-worker discussed some > optimizations strategies. One of the ideas that came up was materialized > views. Trading disk space to summarize queries, and paying for a trigger > waterfall on certain kinds of updates seems like a small price to pay right > now in our situation. > > I searched through the archives and I couldn't seem to find anything > relevant. > > As I've no familiarity with the internals, but I do know the front-end > pretty well, what can I do to help implement materialized views? First, You could start by implementing materialized views manually, using tables and triggers, to get the feel of what should be generated. Next, still working from frontend, try to make some code (probably in some RAD/scripring language) which does the transform from CREATE VIEW (or whatever) syntax (or manually constructed syntax tree) into create table + create trigger statements. If you have the above working well, talk about moving this to backend. > Is such an > idea feasible? Is there some reason why it just isn't a good idea? (And if > not, what is a better idea?) > > If someone is willing to guide me, I can probably contribute a fair amount > of C code. I do have experience with C. What is needed is good algorithms. Writing C code is secondary to that. Similar problem has kept us from implementing updatable views for quite some time. -------------------- Hannu
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Wednesday 26 November 2003 09:19, Hannu Krosing wrote: > > First, You could start by implementing materialized views manually, > using tables and triggers, to get the feel of what should be generated. > > Next, still working from frontend, try to make some code (probably in > some RAD/scripring language) which does the transform from CREATE VIEW > (or whatever) syntax (or manually constructed syntax tree) into create > table + create trigger statements. > > If you have the above working well, talk about moving this to backend. > Thanks for the suggestion. I will definitely do that. > What is needed is good algorithms. Writing C code is secondary to that. > > Similar problem has kept us from implementing updatable views for quite > some time. > You are definitely correct. - -- Jonathan M. Gardner Web Developer, Amazon.com jonagard@amazon.com - (206) 266-2906 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.1 (GNU/Linux) iD8DBQE/xPfOBFeYcclU5Q0RAhgfAKDAYmm67Jc5n7WwyUDtn5IJOhhrXwCfcRud EAR9U63FEqlHbGrdzKxGmdw= =a6RK -----END PGP SIGNATURE-----
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Wednesday 26 November 2003 10:58 am, Jonathan Gardner wrote: > On Wednesday 26 November 2003 09:19, Hannu Krosing wrote: > > What is needed is good algorithms. Writing C code is secondary to that. > > > > Similar problem has kept us from implementing updatable views for quite > > some time. > > You are definitely correct. Preliminary research has shown that: 1) We would put triggers on each table that contributes to a view. 2) We would only be interested in those inserts and deletes (counting an update as a delete and then an insert) that satisfy the "WHERE" clause of the view. 3) We would implement some sort of differential view update scheme based on the paper "Efficiently Updating Materialized Views"[1]. They classify all select queries into one of four categories and describe how to go about producing a differential update. I couldn't understand this part at all. - --- I am already too deep in database theory. While I enjoy reading a good paper, I just don't have the background in DB science like I do in Physics. For instance, my set theory is very weak (I can't tell if U is a union or an intersection, for instanct). At this point, I have a choice. Drop a couple hundred dollars on database theory textbooks and spend the next three months learning it, or hoping one of you database theorists out there take pity on me and pick this up or coach me through it. Any takers on the second one? - --- I have a proposal. Let me hear what you think. There will be two kinds of materialized view in PostgreSQL: The first is the "snapshot" approach. Provide a materialized view, but don't provide automatic updating. The user can call a refresh statement to repopulate the entire view from time to time as appropriate. The benefit of this is that you can use queries that have functions that are immutable or stable. (You could probably use volatile as well but I wouldn't recommend it). This is so easy to implement, it isn't even funny. Optimizing this would involve collecting all the inserted / updated /deleted rows since the last snapshot. We can have a logging table that accumulates all the changes since the last refresh for this purpose. (This kind of table may exist anyway for replication or other purposes.) Finally, we could examine the changed rows, ignore the irrelevant ones, ignore the redundant ones (IE, rows that have been inserted and then deleted during that time), and decide whether doing a complete refresh would be quicker than doing several differential updates to the materialized view. The second is the automatically updated materialized view. Each insert, update, or delete executed against the table that are queried for the view will trigger a function call using the algorithms proposed in [1]. Every function in the query must be stable. We could provide a mechanism to alter the materialized view between snapshot to auto-updated. It would be as simple as refreshing the snapshot and then enabling the triggers, or as simple as disabling the triggers. I already have written a set of scripts in perl that provides the first proposal without the optimization idea. I will put them up at GBorg. Footnotes: [1] ftp://ftp.research.microsoft.com/users/palarson/sigmod86.ps - -- Jonathan Gardner jgardner@jonathangardner.net Live Free, Use Linux! -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.1 (GNU/Linux) iD8DBQE/yQjZWgwF3QvpWNwRAnqkAKCasoODeV2+KcP52DMXiEyq7pfhmACcCFBV X28Nd5PvlhR8Xta/G4w2qBQ= =OLa8 -----END PGP SIGNATURE-----
Jonathan Gardner <jgardner@jonathangardner.net> writes: > 3) We would implement some sort of differential view update scheme > based on the paper "Efficiently Updating Materialized Views"[1]. One resource on this topic that appears to be quite authoritative is "Materialized Views: Techniques, Implementations, and Applications" by A. Gupta and I. Mumick (MIT Press, 1999). That might present the material in a manner that is easier to grok than the original papers themselves. I didn't realize that it was possible to spend 600 pages only discussing materialized views, but it apparently it is :-) -Neil
Neil Conway kirjutas P, 30.11.2003 kell 02:18: > Jonathan Gardner <jgardner@jonathangardner.net> writes: > > 3) We would implement some sort of differential view update scheme > > based on the paper "Efficiently Updating Materialized Views"[1]. Maybe the TelegraphCQ engine can give some ideas http://telegraph.cs.berkeley.edu/ from brief reading, it seems that all they do is kind of materialized views ;) - they call it Continuous Dataflow Processing, i.e. queries that run continuously over incoming data. ---------------- Hannu
>>>>> "Hannu" == Hannu Krosing <hannu@tm.ee> writes: Hannu> Neil Conway kirjutas P, 30.11.2003 kell 02:18: >> Jonathan Gardner <jgardner@jonathangardner.net> writes: > 3)We >> would implement some sort of differential view update scheme > >> based on the paper "Efficiently Updating Materialized >> Views"[1]. Hannu> Maybe the TelegraphCQ engine can give some ideas Hannu> http://telegraph.cs.berkeley.edu/ Hannu> from brief reading, it seems that all they do is kind of Hannu> materialized views ;) - they call it ContinuousDataflow Hannu> Processing, i.e. queries that run continuously over Hannu> incoming data. A fair portion of the community argues that everything that we do is merely materialized views :-) While materialized views are certainly related, there are however, things that are quite different in processing continuous queries over data streams. The first ideas on stream processing did come from work on scalable triggers which is intimately related to materialized views. Work on mviews isn't really concerned with things like windowed joins and aggregate processing. My advice is to take a very incremental approach to materialized view implementation. There are essentially 2 pieces - one the maintenance of the views and second the routing of queries to automagically use the materialized views. The former is done by something like triggers and the latter is through either a query-rewrite mechanism or something like an optimizer choosing a different access method. As for the former, there are again two subdivisions - what changes you propagate and when you apply 'em. What you propagate are the deltas. So you have the following choices: IPIA - Immediate Propagate, Immediate Apply IPDA - Immediate Propagate, Deferred Apply DPDA - Deferred Propagate, Deferred Apply DPIA - makes no sense .. The easiest is DPDA .. in this model you essentially recompute the view on demand. The next is IPIA.. with IPIA, you can choose to either recompute the entire view or only figure out how to translate a delta into appropriate changes in the mview. After that there is IPDA .. this involves creating a delta table that is populated based on the triggers. Every so often you process a set of deltas and recompute the view (either in its entirety or in parts> Another way to slice the problem is to limit the scope of queries that can be used to define views. I suggest that first we don't consider joins .. and only consider grouped aggregates over a single table. Joins are more hairy with changes in one table potentially requiring a wholesale recomputation of the join. Note that all this discussion is only about the first part of the problem .. maintenance of the views. The other part, routing is almost equally complicated .. there you have to solve the query subsumption problem without also eliminating choices in your access plan. As people have said there are plenty of papers on this in the literature. While I am no theorist I can certainly help with reading the papers .. not every bit of a paper is very useful. -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh