Discussion: Incrementally refreshed materialized view


Incrementally refreshed materialized view

From: Adam Brusselback
Date:
Hello all,
I am working on a plan to implement incrementally refreshed materialized "views" with the existing functionality in Postgres.

Below is the plan for doing that:

Trigger based eagerly updated materialized tables for Postgres 9.5

 

High level plan:

Have a view definition stored in the database which we can use for reference.  Create functions that read that view definition, create a materialized table with the same columns as the reference view, and create triggers on every table the view depends on to keep the materialized table fresh within a transaction.  All queries would hit the materialized table; the view is only there so we know which dependencies to track, and so we have an easy way to update the materialized table.

 

How do we actually do the refresh?

1. A refresh key is defined for the materialized table.

2. Every dependent table must roll up to that refresh key so we know which rows to refresh.

3. That key should be able to be referenced efficiently in the view's WHERE clause, so we can refresh just the rows that match the refresh key using the view.

4. The refresh will be done by deleting any existing rows with the key, and inserting new ones for that key from the view (a minimal sketch of this step follows below).
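A minimal sketch of that delete-and-insert step, assuming a hypothetical reference view invoice_summary_v, a materialized table invoice_summary_mat, and invoice_id as the refresh key (all names are illustrative, not part of the proposal):

CREATE OR REPLACE FUNCTION refresh_invoice_summary(_keys bigint[])
RETURNS void
LANGUAGE sql
AS $$
    -- Throw away the stale rows for the touched keys...
    DELETE FROM invoice_summary_mat
     WHERE invoice_id = ANY (_keys);
    -- ...and rebuild only those rows from the reference view.
    INSERT INTO invoice_summary_mat
    SELECT *
      FROM invoice_summary_v
     WHERE invoice_id = ANY (_keys);
$$;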

How do we know what to refresh?

1. A BEFORE ... FOR EACH STATEMENT trigger to create a temp table to log all changes.

2. A FOR EACH ROW trigger to log the rows modified by the DML.

   a. This should be done at the refresh key level.

      i. We need to figure out a way to generate queries that roll things up multiple levels of the dependency chain until we get to the refresh key.  Not sure at all how to do that.

3. An AFTER ... FOR EACH STATEMENT trigger to run a refresh on the materialized table, looking only at the rows touched by the DML (a skeleton of this trigger set is sketched below).
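A rough skeleton of that trigger set for one dependent table, reusing the hypothetical names from the sketch above (invoice_line as a dependent table, invoice_id as the refresh key).  This is just one possible wiring; the multi-level roll-up from 2.a.i is exactly the part it leaves open:

CREATE OR REPLACE FUNCTION invoice_line_log_start() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- 1. Temp table collecting the refresh keys touched by this statement.
    CREATE TEMP TABLE IF NOT EXISTS touched_invoice (invoice_id bigint)
        ON COMMIT DROP;
    RETURN NULL;
END;
$$;

CREATE OR REPLACE FUNCTION invoice_line_log_row() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- 2. Log the refresh key of every modified row.
    IF TG_OP IN ('INSERT', 'UPDATE') THEN
        INSERT INTO touched_invoice VALUES (NEW.invoice_id);
    END IF;
    IF TG_OP IN ('UPDATE', 'DELETE') THEN
        INSERT INTO touched_invoice VALUES (OLD.invoice_id);
    END IF;
    RETURN NULL;
END;
$$;

CREATE OR REPLACE FUNCTION invoice_line_refresh() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- 3. Refresh only the keys touched by this statement.
    PERFORM refresh_invoice_summary(
        (SELECT array_agg(DISTINCT invoice_id) FROM touched_invoice));
    TRUNCATE touched_invoice;
    RETURN NULL;
END;
$$;

CREATE TRIGGER invoice_line_log_start
    BEFORE INSERT OR UPDATE OR DELETE ON invoice_line
    FOR EACH STATEMENT EXECUTE PROCEDURE invoice_line_log_start();
CREATE TRIGGER invoice_line_log_row
    AFTER INSERT OR UPDATE OR DELETE ON invoice_line
    FOR EACH ROW EXECUTE PROCEDURE invoice_line_log_row();
CREATE TRIGGER invoice_line_refresh
    AFTER INSERT OR UPDATE OR DELETE ON invoice_line
    FOR EACH STATEMENT EXECUTE PROCEDURE invoice_line_refresh();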


I am however stuck on: How do we know what to refresh?  -> Step 2
Pretty much, I need to figure out how to follow the joins in the view back to whatever key was defined as the "refresh key" for each dependent table.  I know about the information_schema.view_column_usage, but I don't think that'll get me everything I need.

I'd really appreciate any help with this, as I'd love a better way to get eagerly refreshed materialized views in Postgres rather than doing everything manually as I have to now.

If I can provide any more info please let me know.
Thanks,
-Adam

Re: Incrementally refreshed materialized view

From: Melvin Davidson
Date:


On Mon, Sep 26, 2016 at 1:56 PM, Adam Brusselback <adambrusselback@gmail.com> wrote:
> Hello all,
> I am working on a plan to implement incrementally refreshed materialized
> "views" with the existing functionality in Postgres.
> [full proposal quoted above]

I am a bit curious. Why are you reinventing the wheel?
What is wrong with:

REFRESH MATERIALIZED VIEW [ CONCURRENTLY ] name   [ WITH [ NO ] DATA ]

https://www.postgresql.org/docs/9.4/static/sql-refreshmaterializedview.html

Can't you do that in a cron job?
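For what it's worth, a minimal sketch of that approach, assuming a matview named account_balance with a unique index (which CONCURRENTLY requires); names are illustrative only:

-- CONCURRENTLY needs at least one unique index on the matview.
CREATE UNIQUE INDEX ON account_balance (user_id);

-- Re-run the whole defining query without locking out readers.
REFRESH MATERIALIZED VIEW CONCURRENTLY account_balance;

-- e.g. scheduled from cron every 5 minutes:
-- */5 * * * *  psql -d mydb -c 'REFRESH MATERIALIZED VIEW CONCURRENTLY account_balance'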

--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.

Re: Incrementally refreshed materialized view

From: Rakesh Kumar
Date:

Does PG support INCREMENTAL MV ? Looks like not (until 9.5)

Re: Incrementally refreshed materialized view

From: Rob Sargent
Date:
Of course 9.5 is the current release so the answer is Yes, since 9.5

On 09/26/2016 12:29 PM, Rakesh Kumar wrote:

Does PG support INCREMENTAL MV ? Looks like not (until 9.5)


Re: Incrementally refreshed materialized view

From: Adam Brusselback
Date:
I require eagerly refreshed materialized views for my use case, which is something Postgres does not currently support.  I need my updates to a table the view refers to to be visible within the same transaction, and often it is a single change to one row which will only affect a single row in the view.  If I used materialized views as they're currently implemented, a refresh would run the entire query and replace what needs to change, and it has to run that entire query to do so.  For my use case, doing that for every insert / update / delete that could affect the view is totally out of the question.

For example, if I had an account balance materialized view that pretty much summed transactions for each user from two different tables and displayed a "balance" for every user, I would want the refresh to run the query only for the specific user(s) that just had data inserted/updated/deleted, not for every user in the system after every statement.
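Concretely, something like this minimal sketch, with hypothetical tables payment and refund and a materialized table account_balance_mat:

CREATE VIEW account_balance_v AS
    SELECT user_id, sum(amount) AS balance
      FROM (SELECT user_id,  amount FROM payment
            UNION ALL
            SELECT user_id, -amount FROM refund) t
     GROUP BY user_id;

-- After a statement touches rows for user 42, refresh only that user
-- instead of recomputing every balance in the system:
DELETE FROM account_balance_mat WHERE user_id = 42;
INSERT INTO account_balance_mat
    SELECT * FROM account_balance_v WHERE user_id = 42;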

I've pretty much implemented this manually for some specific views which performed horribly in Postgres (but would work fine in SQL Server, for example).  I am looking to do this in a generic way so it's easier to implement when necessary, and so it can be packaged as an extension for others who may need it.

Obviously if we had better support for statement-level triggers (so we could reference all the NEW / OLD values touched by a statement) this would be lower overhead, but that is one of the things holding up incrementally refreshed materialized views from being implemented in the first place.  I just thought up a way to do it which works around not having better statement-level triggers, and wanted to see where I could get with things as they are.

Re: Incrementally refreshed materialized view

From: Kevin Grittner
Date:
On Mon, Sep 26, 2016 at 1:22 PM, Melvin Davidson <melvin6925@gmail.com> wrote:
> On Mon, Sep 26, 2016 at 1:56 PM, Adam Brusselback <adambrusselback@gmail.com> wrote:
>>
>> I am working on a plan to implement incrementally refreshed
>> materialized "views" with the existing functionality in
>> Postgres.
>>
>> Below is the plan for doing that:

>> [design from scratch, incomplete]

>> I am however stuck on: How do we know what to refresh?

>> Pretty much, I need to figure out how to follow the joins in the
>> view back to whatever key was defined as the "refresh key" for
>> each dependent table.  I know about the
>> information_schema.view_column_usage, but I don't think that'll
>> get me everything I need.
>>
>> I'd really appreciate any help with this, as i'd love a better
>> way to get eagerly refreshed materialized views in Postgres
>> rather than doing everything manually as I have to now.
>>
>> If I can provide any more info please let me know.

> I am a bit curious. Why are you reinventing the wheel?
> What is wrong with:
>
> REFRESH MATERIALIZED VIEW [ CONCURRENTLY ] name
>     [ WITH [ NO ] DATA ]
>
> https://www.postgresql.org/docs/9.4/static/sql-refreshmaterializedview.html
>
> Can't you do that in a cron job?

Well, that is not *incremental* maintenance -- the entire query is
executed, with the resulting relation either replacing the previous
contents of the matview or "diffed" against the previous contents
(so that the difference can be applied with transactional
semantics), depending on whether CONCURRENTLY was specified.

The OP is still reinventing the wheel though.  A summary of
available techniques as of the mid-90s can be found here:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.40.2254&rep=rep1&type=pdf

With some detail for what to me look like the two most promising
techniques here:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.31.3208&rep=rep1&type=pdf

The first step in using either of those techniques (counting or
DRed) is to capture a delta relation to feed into the relational
algebra used by these techniques.  As a first step in that
direction I have been floating a patch to implement the
SQL-standard "transition tables" feature for AFTER triggers.

https://commitfest.postgresql.org/10/778/

If you want to help that effort, reading the thread and reviewing
the patch would be good.
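For anyone unfamiliar with the feature, this is roughly the shape of an
AFTER statement trigger with a transition table (base table link and
delta table link_delta are hypothetical, and the exact syntax in the
final patch may differ):

CREATE OR REPLACE FUNCTION capture_link_delta() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- new_rows is a read-only relation holding every row inserted by
    -- the triggering statement, visible to this single trigger call.
    INSERT INTO link_delta (src, dst, "count(t)")
        SELECT src, dst, 1 FROM new_rows;
    RETURN NULL;
END;
$$;

CREATE TRIGGER link_insert_delta
    AFTER INSERT ON link
    REFERENCING NEW TABLE AS new_rows
    FOR EACH STATEMENT
    EXECUTE PROCEDURE capture_link_delta();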

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Incrementally refreshed materialized view

From: Adam Brusselback
Date:
On Mon, Sep 26, 2016 at 2:35 PM, Rob Sargent <robjsargent@gmail.com> wrote:
> Of course 9.5 is the current release so the answer is Yes, since 9.5

It seems like there is some confusion about what we're talking about. I am talking about incremental updates to a sort of "fake" materialized view (implemented as a table).  This is totally different from what we currently have implemented for actual materialized views (REFRESH ... CONCURRENTLY).  A concurrent refresh just minimizes the time the view is locked, by building a second table in the background and merging the changes between the two.

Re: Incrementally refreshed materialized view

From: Rakesh Kumar
Date:

> Of course 9.5 is the current release so the answer is Yes, since 9.5


https://www.postgresql.org/docs/current/static/sql-creatematerializedview.html

What am I missing? I don't see any support for incremental refresh.

Just in case we are not talking about the same thing: when an MV is created for the first time on a base table, all further updates on the base table will be handled by applying only the delta changes to the MV. That is called incremental refresh. Check how DB2 does it:

http://www.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.sql.ref.doc/doc/r0000977.html

Or does REFRESH MATERIALIZED VIEW do only an incremental refresh?

Does PG have a concept of an MV log, from which it can detect the delta changes and apply incremental changes quickly?

Re: Incrementally refreshed materialized view

From: Kevin Grittner
Date:
On Mon, Sep 26, 2016 at 2:04 PM, Rakesh Kumar
<rakeshkumar464@outlook.com> wrote:

> Does PG have a concept of MV log, from where it can detect the
> delta changes and apply  incremental changes quickly.

That is what I am trying to work toward with the patch I cited in
an earlier post.  Once some variation of that is in, the actual
incremental maintenance can be built on top of it.  To see an
example of what would be done with such a delta relation for a
simple MV, using the count algorithm, see below:


--------------------------------------------------------------------------
--
-- This file demonstrates how incremental maintenance of materialized views
-- is intended to work using the counting algorithm.  TEMP tables are used
-- to mock up both snapshots of relations and delta relations which will
-- be created automatically "behind the scenes" as part of the algorithm.
--
-- Delta relations for both base tables and materialized views will need a
-- "count(t)" column to track the number of alternative derivations of the
-- tuple.  The column is actually named that in this example.  Where the
-- count is needed by the algorithm and a relation doesn't have it
-- (because, for example, it is a base table), 1 is used.  This is kept in
-- the example for purposes of illustration.  While it is explicitly added
-- to the materialized views for this example, the idea would be that
-- specifying these materialized views using DISTINCT would implicitly add
-- the "count(t)" column when incremental maintenance was specified.
--
-- The logic for a JOIN is that the resulting row should have the product
-- of multiplying the two input row counts.  Calculating a delta for that
-- requires two such joins which are then processed by a UNION with special
-- semantics.  Those semantics are achieved in this example by feeding
-- UNION ALL results to an outer SELECT which uses GROUP BY and HAVING
-- clauses.
--
-- This example dodges some complex and performance-sapping problems that
-- occur when duplicate rows may be present.  It does so with a combination
-- of a PRIMARY KEY declaration in the base table and GROUP BY clauses in
-- the materialized views.
--
-- The temporary relation names in the example are chosen for clarity of
-- the example.  If temporary relation names are actually needed in the
-- implementation, they would need to be different, probably based on the
-- related permanent relation OID, for length.
--
--------------------------------------------------------------------------

-- Set up the base table used for these materialized views.
CREATE TABLE link
(
  src  text not null,
  dst  text not null,
  primary key (src, dst)
);
INSERT INTO link
  VALUES ('a','b'),('b','c'),('b','e'),('a','d'),('d','c');

-- Display and capture "before" image.
SELECT * FROM link ORDER BY 1, 2;
CREATE TEMP TABLE link1 AS SELECT * FROM link;

-- Create and display the initial state of the "hop" MV.
CREATE MATERIALIZED VIEW hop AS
  SELECT t1.src, t2.dst, count(*) AS "count(t)"
    FROM link t1
    JOIN link t2 ON (t2.src = t1.dst)
    GROUP BY t1.src, t2.dst;
SELECT * FROM hop;

-- Execute a transaction that modifies the base table.
-- The assumption is that the changes will be applied shortly after commit.
BEGIN;
DELETE FROM link WHERE src = 'b' AND dst = 'e';
INSERT INTO link VALUES ('c','h'),('f','g');
COMMIT;

-- Display and capture "after" image.
SELECT * FROM link ORDER BY 1, 2;
CREATE TEMP TABLE link2 AS SELECT * FROM link;

-- Mock up the delta(link) table which will get created "behind the scenes".
CREATE TEMP TABLE "Δ(link)" (src text NOT NULL, dst text NOT NULL,
"count(t)" int NOT NULL);
INSERT INTO "Δ(link)" VALUES ('b','e',-1),('c','h',1),('f','g',1);
SELECT * FROM "Δ(link)" ORDER BY 1, 2;

-- Given link before and after images, and delta(link), calculate delta(hop).
-- This must happen while the "before" and "after" snapshots are still
-- registered, but not necessarily in the same transaction as the
-- modifications to the base table.
CREATE TEMP TABLE "Δ(hop)"
  (src text NOT NULL, dst text NOT NULL, "count(t)" int NOT NULL);
INSERT INTO "Δ(hop)"
  SELECT src, dst, sum("count(t)") AS "count(t)"
    FROM (
           SELECT delta.src, before.dst, delta."count(t)" * 1
             FROM "Δ(link)" delta
             JOIN link1 before ON (before.src = delta.dst)
           UNION ALL
           SELECT after.src, delta.dst, 1 * delta."count(t)"
             FROM link2 after
             JOIN "Δ(link)" delta ON (delta.src = after.dst)
         ) x(src, dst, "count(t)")
    GROUP BY src, dst
    HAVING sum("count(t)") <> 0;
SELECT * FROM "Δ(hop)" ORDER BY 1, 2;

-- Once the MV delta has been generated, the snapshots can be released.
-- We're using temp tables for demonstration purposes, so drop those,
-- and the base table's delta.
DROP TABLE link1, link2, "Δ(link)";

-- At some later time the MV delta is processed "behind the scenes".
-- We can't do the demonstration maintenance against the MV, so copy it.
CREATE TEMP TABLE hop2 AS SELECT * FROM hop;
BEGIN;
DELETE FROM hop2 t1
  USING "Δ(hop)" t2
  WHERE t1.src = t2.src
    AND t1.dst = t2.dst
    AND t1."count(t)" + t2."count(t)" = 0;
UPDATE hop2 t1
  SET "count(t)" = t1."count(t)" + t2."count(t)"
  FROM "Δ(hop)" t2
  WHERE t1.src = t2.src
    AND t1.dst = t2.dst;
INSERT INTO hop2
  SELECT * FROM "Δ(hop)" t1
    WHERE "count(t)" > 0
      AND NOT EXISTS (SELECT * FROM hop2 t2 WHERE t2.src = t1.src AND
t2.dst = t1.dst);
COMMIT;

-- Show that we got to the same result with incremental maintenance as
-- with REFRESH.
REFRESH MATERIALIZED VIEW hop;
SELECT * FROM hop ORDER BY 1, 2;
SELECT * FROM hop2 ORDER BY 1, 2;

-- Now we're done with the MV delta.
DROP TABLE hop2, "Δ(hop)";

--------------------------------------------------------------------------

-- Let's show how it works when an MV uses another MV.
-- Create the 2nd-level MV.
CREATE MATERIALIZED VIEW tri_hop AS
  SELECT t1.src, t2.dst, sum(t1."count(t)" * 1) AS "count(t)"
    FROM hop t1
    JOIN link t2 ON (t2.src = t1.dst)
    GROUP BY t1.src, t2.dst;
SELECT * FROM tri_hop ORDER BY 1, 2;

-- Display and capture "before" image.
-- Include hop now, because it is referenced by tri_hop.
SELECT * FROM link ORDER BY 1, 2;
CREATE TEMP TABLE link1 AS SELECT * FROM link;
SELECT * FROM hop ORDER BY 1, 2;
CREATE TEMP TABLE hop1 AS SELECT * FROM hop;

-- Execute a transaction that modifies the base table.
-- The assumption is that the changes will be applied shortly after commit.
BEGIN;
DELETE FROM link WHERE src = 'a' AND dst = 'b';
INSERT INTO link VALUES ('d','f'),('b','h');
COMMIT;

-- Display and capture "after" image.
SELECT * FROM link ORDER BY 1, 2;
CREATE TEMP TABLE link2 AS SELECT * FROM link;

-- Mock up the delta(link) table which will get created "behind the scenes".
CREATE TEMP TABLE "Δ(link)" (src text NOT NULL, dst text NOT NULL,
"count(t)" int NOT NULL);
INSERT INTO "Δ(link)" VALUES ('a','b',-1),('d','f',1),('b','h',1);
SELECT * FROM "Δ(link)" ORDER BY 1, 2;

-- Given link before and after images, and delta(link), calculate delta(hop).
-- This must happen while the "before" and "after" snapshots are still
-- registered, but not necessarily in the same transaction as the
-- modifications to the base table.
-- For a "first level" MV, this is calculated just the same as before,
-- but it will be used to calculate the 2nd level MV before we discard
-- the snapshots.
CREATE TEMP TABLE "Δ(hop)"
  (src text NOT NULL, dst text NOT NULL, "count(t)" int NOT NULL);
INSERT INTO "Δ(hop)"
  SELECT src, dst, sum("count(t)") AS "count(t)"
    FROM (
           SELECT delta.src, before.dst, delta."count(t)" * 1
             FROM "Δ(link)" delta
             JOIN link1 before ON (before.src = delta.dst)
           UNION ALL
           SELECT after.src, delta.dst, 1 * delta."count(t)"
             FROM link2 after
             JOIN "Δ(link)" delta ON (delta.src = after.dst)
         ) x(src, dst, "count(t)")
    GROUP BY src, dst
    HAVING sum("count(t)") <> 0;
SELECT * FROM "Δ(hop)" ORDER BY 1, 2;

-- Since the counting algorithm requires the link table "before" image,
-- as well as the hop MV "after" image and delta(hop) to calculate
-- delta(tri_hop), we must maintain hop before releasing the snapshots
-- used to update link.
CREATE TEMP TABLE hop2 AS SELECT * FROM hop;
BEGIN;
DELETE FROM hop2 t1
  USING "Δ(hop)" t2
  WHERE t1.src = t2.src
    AND t1.dst = t2.dst
    AND t1."count(t)" + t2."count(t)" = 0;
UPDATE hop2 t1
  SET "count(t)" = t1."count(t)" + t2."count(t)"
  FROM "Δ(hop)" t2
  WHERE t1.src = t2.src
    AND t1.dst = t2.dst;
INSERT INTO hop2
  SELECT * FROM "Δ(hop)" t1
    WHERE "count(t)" > 0
      AND NOT EXISTS (SELECT * FROM hop2 t2 WHERE t2.src = t1.src AND
t2.dst = t1.dst);
COMMIT;

-- Show that we got to the same result with incremental maintenance as
-- with REFRESH.
REFRESH MATERIALIZED VIEW hop;
SELECT * FROM hop ORDER BY 1, 2;
SELECT * FROM hop2 ORDER BY 1, 2;

-- After the 1st level MV is run, we can calculate the delta for the 2nd level.
CREATE TEMP TABLE "Δ(tri_hop)"
  (src text NOT NULL, dst text NOT NULL, "count(t)" int NOT NULL);
INSERT INTO "Δ(tri_hop)"
  SELECT src, dst, sum("count(t)") AS "count(t)"
    FROM (
           SELECT delta.src, before.dst, delta."count(t)" * 1
             FROM "Δ(hop)" delta
             JOIN link1 before ON (before.src = delta.dst)
           UNION ALL
           SELECT after.src, delta.dst, 1 * delta."count(t)"
             FROM hop2 after
             JOIN "Δ(link)" delta ON (delta.src = after.dst)
         ) x(src, dst, "count(t)")
    GROUP BY src, dst
    HAVING sum("count(t)") <> 0;
SELECT * FROM "Δ(tri_hop)" ORDER BY 1, 2;

-- Now we're done with snapshots and all but the highest-level delta.
DROP TABLE link1, link2, "Δ(link)";
DROP TABLE hop1, hop2, "Δ(hop)";

-- At some later time the MV delta is processed "behind the scenes".
-- We can't do the demonstration maintenance against the MV, so copy it.
CREATE TEMP TABLE tri_hop2 AS SELECT * FROM tri_hop;
BEGIN;
DELETE FROM tri_hop2 t1
  USING "Δ(tri_hop)" t2
  WHERE t1.src = t2.src
    AND t1.dst = t2.dst
    AND t1."count(t)" + t2."count(t)" = 0;
UPDATE tri_hop2 t1
  SET "count(t)" = t1."count(t)" + t2."count(t)"
  FROM "Δ(tri_hop)" t2
  WHERE t1.src = t2.src
    AND t1.dst = t2.dst;
INSERT INTO tri_hop2
  SELECT * FROM "Δ(tri_hop)" t1
    WHERE "count(t)" > 0
      AND NOT EXISTS (SELECT * FROM tri_hop2 t2 WHERE t2.src = t1.src
AND t2.dst = t1.dst);
COMMIT;

-- Show that we got to the same result with incremental maintenance as
-- with REFRESH.
REFRESH MATERIALIZED VIEW tri_hop;
SELECT * FROM tri_hop ORDER BY 1, 2;
SELECT * FROM tri_hop2 ORDER BY 1, 2;

-- Now we're done with the MV delta.
DROP TABLE tri_hop2, "Δ(tri_hop)";


--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Incrementally refreshed materialized view

From: Adam Brusselback
Date:


On Mon, Sep 26, 2016 at 3:21 PM, Kevin Grittner <kgrittn@gmail.com> wrote:
> [Kevin's explanation of the counting algorithm and the worked example, quoted above]

Well I feel like I've learned a ton already reading through the links you provided earlier and that example above.

I'm very interested in getting this into core. I'll look into what I need to do to review. Not crazy familiar with C, as I mainly do Java development. I'll see if I can help in any way though.

The main reason I was working on an alternative is that I need something now rather than in a couple of years, so I've been dealing with manually creating the few I do need for my database. What I proposed above was just me thinking about what could be done with things as they are. Obviously it's junk compared to a real solution in core.  Would you consider my approach even worth trying, or should I just suck it up and do things manually for now and put that effort into getting incremental refresh into core?

Re: Incrementally refreshed materialized view

From: Kevin Grittner
Date:
On Mon, Sep 26, 2016 at 3:16 PM, Adam Brusselback
<adambrusselback@gmail.com> wrote:

> Well I feel like I've learned a ton already reading through the links you
> provided earlier and that example above.

Yeah, I know that example can really help show what will happen
"under the covers" and make it more concrete.  The theory it's
based on seems brilliant to me.  The fact that it's based on
relational algebra (RA) means it is much more likely to be
implemented in a complete and correct manner than something ad hoc.
I know I started at this with an attempt (similar to yours) at
analyzing it from scratch, then went and searched the literature.
When I came back to my ad hoc notes, the RA pointed out some holes
in the logic where corner cases would have been wrong because of
missed details.  RA is well thought out and solid; it seems to me
to be the perfect thing to underlie a solution to this problem.

> I'm very interested in getting this into core. I'll look into what I need to
> do to review. Not crazy familiar with C, as I mainly do Java development.
> I'll see if I can help in any way though.

Just testing it and reviewing the documentation for errors,
omissions, or just plain hard-to-follow language would be a big
help.  Please post back about any problems getting things patched
and built.

> The main reason I was working on an alternative is because I need something
> now rather than in a couple years, but I've been dealing with manually
> creating the few I do need for my database. What I proposed above was just
> me thinking about what could be done with things as they are. Obviously it's
> junk compared to a real solution in-core.  Would you consider my approach
> even worth trying, or should I just suck it up and do things manually for
> now and put that effort into getting incremental refresh into core?

Oh, I've used plain tables and triggers many times myself.  If you
need something now, you kind of have to go that route.  The
algorithms I cited do provide an interesting potential alternative
for how to go about that, although operating a row at a time you
probably won't approach the speed of statement-level set logic for
statements that affect very many rows.  :-(

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Incrementally refreshed materialized view

From: hariprasath nallasamy
Date:
We also tried to achieve incremental refresh of materialized views, and our solution doesn't solve all of the use cases.

Players:
1) WAL
2) Logical decoding 
3) replication slots 
4) custom background worker

Two kinds of approaches:
1. Deferred refresh (Oracle-style: a log table is created for each base table, holding its PK and the old and new values of the aggregated columns)
      a) A log table has to be created for each base table, and this log table keeps track of the delta changes.
      b) A UDF is called to refresh the view incrementally - it runs the original materialized view query with the tracked delta PKs in its WHERE clause, so only rows that were modified/inserted are touched.
      c) The log table keeps track of changed rows using the data given by the replication slot, which uses logical decoding to decode the WAL (a minimal sketch of this plumbing follows below).
      d) Shared memory is used to maintain the relationship between the view and its base tables. In case of a restart they are pushed to a maintenance table.

2. Real-time refresh (update the view whenever we get any change-sets related to its base tables)
      a) Delta data from the replication slot is applied to the view by checking the relationship between the delta data and the view definition. Here too, shared memory and the maintenance table are used.
      b) Work is completed only for materialized views over a single table.
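To illustrate the plumbing behind 1(c), a sketch only: the slot and table names are made up, wal_level must be set to logical, and the real implementation consumes the stream from the background worker rather than via SQL:

-- Create a logical replication slot using the built-in test_decoding plugin.
SELECT pg_create_logical_replication_slot('mv_delta_slot', 'test_decoding');

-- A per-base-table log of which keys were touched.
CREATE TABLE invoice_mvlog (invoice_id bigint, op char(1), changed_at timestamptz DEFAULT now());

-- Peek at the decoded changes without consuming them; the worker would use
-- the get_changes variant and translate each change into mvlog rows.
SELECT * FROM pg_logical_slot_peek_changes('mv_delta_slot', NULL, NULL);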

Main disadvantages:
1) Data inconsistency on master failure; also, a slave doesn't have the replication slot as of now. The 2ndQuadrant folks are trying to create slots on the slave using the concept of failover slots, but that doesn't ship with PG :(.
2) Sum, count and avg are implemented for aggregates (single table); for other aggregates a full refresh comes into play.
3) The right join implementation requires more queries to run on top of the MVs.

So we have a long way to go, and we don't know whether this is the right path.

Only the deferred refresh was pushed to GitHub.

I wrote a post about it on Medium.


Re: Incrementally refreshed materialized view

From: Nguyễn Trần Quốc Vinh
Date:
Dear,

As recommended, we pushed our projects to GitHub: https://github.com/ntqvinh/PgMvIncrementalUpdate.

1) Synchronous incremental update
- For-each-row triggers are generated for all changing events on all underlying tables.

2) Asynchronous (deferred) incremental update
- Triggers are generated for all changing events on all underlying tables to collect all changed rows
- Other code is generated for each matview query to update the matview.

We hope that our projects may be helpful for someone!




Nguyen Tran Quoc Vinh, PhD
Dean
Faculty of Information Technology
Danang University of Education
Phone: (+84) 511.6-512-586
Mobile: (+84) 914.78-08-98

On Tue, Sep 27, 2016 at 12:23 PM, hariprasath nallasamy <hariprasathnallasamy@gmail.com> wrote:
> [hariprasath's description of the WAL / logical decoding approach, quoted above]



Re: Incrementally refreshed materialized view

From: Adam Brusselback
Date:
On Thu, Sep 29, 2016 at 8:10 AM, Nguyễn Trần Quốc Vinh <ntquocvinh@gmail.com> wrote:
> [Nguyễn's announcement of the PgMvIncrementalUpdate project, quoted above]

Very interesting. Does this support materialized views with recursive queries? What about left joins? (NOT) EXISTS? Aggregates? Window functions? In reading up on the implementations in other databases, I was surprised by some of the limitations imposed by DB2 / Oracle / SQL Server.

I'm trying to look through the code base to answer my questions, but it's large enough that it may be easier to just ask first.

Re: Incrementally refreshed materialized view

From: Nguyễn Trần Quốc Vinh
Date:
Dear Adam.

There are read-me files within each project. They support only inner joins and aggregations (sum, count, min, max, avg). The updating algorithm for matviews with min/max must differ from the one for matviews without them: when there is a min and/or max, we have to go back to the base table; otherwise, not.

About 15 years ago we implemented trigger generation in PL/pgSQL for SPJ matview queries. We extended that project to queries with aggregations about 4 years ago. Unfortunately we lost the last versions; we now have only a version with some errors when there are aggregations. For SPJ queries it works well. We don't know whether it is useful or not. If yes, we can upload it for sharing. We didn't share it before because we thought our work was not good enough for public use.

The projects that generate C code were "finished" 2 years ago.

Concerning the project that generates triggers (doing the incremental update) in PL/pgSQL, we implemented an algorithm like your suggestion in this thread, i.e. at least one key of each base table is added automatically to the matview query for further incremental updates (a small illustration follows below).
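As a small illustration of that idea (all names here are hypothetical), the generator rewrites a user-supplied matview query so a base-table key rides along:

-- The user asks for:
--   SELECT c.name, sum(o.total) FROM customer c
--     JOIN orders o ON o.customer_id = c.id GROUP BY c.name;
-- The generated matview carries the customer key so later delta rows can
-- be targeted at exactly the affected rows:
CREATE TABLE order_totals_mat AS
    SELECT c.id AS customer_id,      -- key added automatically
           c.name,
           sum(o.total) AS total
      FROM customer c
      JOIN orders o ON o.customer_id = c.id
     GROUP BY c.id, c.name;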

Nguyen Tran Quoc Vinh, PhD
Dean
Faculty of Information Technology
Danang University of Education
Phone: (+84) 511.6-512-586
Mobile: (+84) 914.78-08-98

On Thu, Sep 29, 2016 at 8:37 PM, Adam Brusselback <adambrusselback@gmail.com> wrote:
On Thu, Sep 29, 2016 at 8:10 AM, Nguyễn Trần Quốc Vinh <ntquocvinh@gmail.com> wrote:
Dear,

As it was recommended, we pushed our projects into github: https://github.com/ntqvinh/PgMvIncrementalUpdate

1) Synchronous incremental update
- For-each-row triggers are generated for all changing events on all underlying tables.

2) Asynchronous (deferred) incremental update
- Triggers are generated for all changing events on all underlying tables to collect all changed rows
- Other codes are generated for each matview-query to update the matview.

 We hope that our projects may be helpful for someone!

Very interesting. Does this support materialized views with recursive queries? What about left joins? (not) exists? Aggregates? Window functions? In reading up on the implementations in other databases, I was surprised by some of the limitations imposed by DB2 / Oracle / Sql Server.

I'm trying to look through the code base to answer my questions, but it's large enough that it may be easier to just ask first.

Re: Incrementally refreshed materialized view

From: Nguyễn Trần Quốc Vinh
Date:

We came to C and discontinued the version generating code in PL/pgSQL because:
- Our testing showed that triggers in C give better performance than the ones in PL/pgSQL. Our opinion may be wrong.
- If we can generate triggers and the other updating code in C, we may be able to integrate it into the PostgreSQL source code. We may be wrong there too.

:)

We plan to do the same for queries with outer joins next year. For recursive queries we have no plan, because we don't see any effective update algorithm. The worst part is that we have no funding for the matview and PostgreSQL work; all of it is just for fun, and we have too many things to do each day.

Nguyen Tran Quoc Vinh, PhD
Dean
Faculty of Information Technology
Danang University of Education
Phone: (+84) 511.6-512-586
Mobile: (+84) 914.78-08-98

On Thu, Sep 29, 2016 at 8:56 PM, Nguyễn Trần Quốc Vinh <ntquocvinh@gmail.com> wrote:
> [Nguyễn's previous reply and its quotes, duplicated above]