Обсуждение: Advice on using materialized views

Поиск
Список
Период
Сортировка

Advice on using materialized views

От
"Phil Endecott"
Дата:
Dear Experts,

Does anyone have any suggestions about how to use materialized 
views in combination with regular views for the most recent data? 
Example:

- I have a raw data table, indexed on a timestamp column, to which 
  new data is regularly being added.

- I have a set of views that produce hourly/daily/monthly summaries 
  of this data. Querying these views is slow, so I'd like to make 
  them materialized views.

- But I'd like the most recent data to be included in the results 
  of my queries. So I think I need a combined view that is the 
  union of the materialized view and a non-materialised view for 
  the recent values.

I can imagine how to implement this but it feels rather clunky; 
I need a parallel set of views for the "old" and "new" data, and 
I need to modify the definition of a view that filters the "new" 
values from the raw table each time the materialised view is 
refreshed.

I'm wondering if anyone has an advice or examples of how best 
to do this.


Many thanks, Phil.





Re: Advice on using materialized views

От
Nick Cleaton
Дата:
On Mon, 6 Dec 2021 at 18:48, Phil Endecott
<spam_from_pgsql_lists@chezphil.org> wrote:
>
> - I have a raw data table, indexed on a timestamp column, to which
>   new data is regularly being added.
>
> - I have a set of views that produce hourly/daily/monthly summaries
>   of this data. Querying these views is slow, so I'd like to make
>   them materialized views.
>
> - But I'd like the most recent data to be included in the results
>   of my queries. So I think I need a combined view that is the
>   union of the materialized view and a non-materialised view for
>   the recent values.

Assuming your table is insert-only:

How about instead of using a materialized view at all, you define a
table of hourly summaries which your script updates, and define a view
which merges that with an on-the-fly summary of main table rows newer
than the most recent summarised hour.



Re: Advice on using materialized views

От
Wicher
Дата:
On Mon, 06 Dec 2021 18:48:47 +0000
"Phil Endecott" <spam_from_pgsql_lists@chezphil.org> wrote:

> and 
> I need to modify the definition of a view that filters the "new" 
> values from the raw table each time the materialised view is 
> refreshed.

You won't necessarily need to rewrite the "recent data" view definitions, I think. What is
deemed "recent" depends on what's in the materialized views (it'd be anything newer than
whatever is in there). The good news is that you can simply query for that :-)
So trivially, in your "the data that is more recent than the stuff from the materialized
views" non-materialized view you'd use a definition like
SELECT .... WHERE sometimestamp > (select max(sometimestamp) from the_materialized_view)
or something along those lines.



Re: Advice on using materialized views

От
"Phil Endecott"
Дата:
Thanks both for your replies.

Wicher wrote:
> On Mon, 06 Dec 2021 18:48:47 +0000
> "Phil Endecott" <spam_from_pgsql_lists@chezphil.org> wrote:
>> and 
>> I need to modify the definition of a view that filters the "new" 
>> values from the raw table each time the materialised view is 
>> refreshed.
>
> You won't necessarily need to rewrite the "recent data" view definitions, I think. What is
> deemed "recent" depends on what's in the materialized views (it'd be anything newer than
> whatever is in there). The good news is that you can simply query for that :-)
> So trivially, in your "the data that is more recent than the stuff from the materialized
> views" non-materialized view you'd use a definition like
> SELECT .... WHERE sometimestamp > (select max(sometimestamp) from the_materialized_view)
> or something along those lines.


I guess I was hoping that someone would suggest a more "magic"
way to do this sort of thing. Actually I'm a bit surprised that
materialised views don't event have a way to either

- Refresh a materialised view whenever a source table is modified;

- Refresh a materialised view whenever it is read, if a source table
has changed since it was last refreshed.

Beyond that, I could imagine smart updates where e.g. if you
modify source table rows with primary key K, then you only need
to refresh materialised view rows derived from K.

I think this could all be done on top of triggers. I wonder, do any
other databases do things like this automagically?


Regards, Phil.







Re: Advice on using materialized views

От
Wicher
Дата:
On Thu, 09 Dec 2021 16:06:27 +0000
"Phil Endecott" <spam_from_pgsql_lists@chezphil.org> wrote:

> Thanks both for your replies.
> 
> Wicher wrote:
> > On Mon, 06 Dec 2021 18:48:47 +0000
> > "Phil Endecott" <spam_from_pgsql_lists@chezphil.org> wrote:  
> >> and 
> >> I need to modify the definition of a view that filters the "new" 
> >> values from the raw table each time the materialised view is 
> >> refreshed.  
> >
> > You won't necessarily need to rewrite the "recent data" view definitions, I think. What is
> > deemed "recent" depends on what's in the materialized views (it'd be anything newer than
> > whatever is in there). The good news is that you can simply query for that :-)
> > So trivially, in your "the data that is more recent than the stuff from the materialized
> > views" non-materialized view you'd use a definition like
> > SELECT .... WHERE sometimestamp > (select max(sometimestamp) from the_materialized_view)
> > or something along those lines.  
> 
> 
> I guess I was hoping that someone would suggest a more "magic"
> way to do this sort of thing. Actually I'm a bit surprised that
> materialised views don't event have a way to either
> 
> - Refresh a materialised view whenever a source table is modified;
> 
> - Refresh a materialised view whenever it is read, if a source table
> has changed since it was last refreshed.
> 
> Beyond that, I could imagine smart updates where e.g. if you
> modify source table rows with primary key K, then you only need
> to refresh materialised view rows derived from K.
> 
> I think this could all be done on top of triggers. I wonder, do any
> other databases do things like this automagically?
> 

Not too long ago I asking the list something similar but came up short:

https://www.postgresql.org/message-id/20211129102315.058595fe@tipi

Here's my own take on that problem area, tangent to your question. This
project aims to do (or make it easier to do) the things you mention:

https://git.sr.ht/~nullenenenen/DBSamizdat

It supports your first use case out of the box, and may make your second use case easier
to accommodate, give it a go :-)


There's a sweet spot for materialized views. But at some point (volume/computational
load/freshness requirements) it becomes necessary to use tables instead so that you can
indeed implement efficient partial recalculation. As far as I know. I too am curious
about other approaches.