Thread: How to improve performance in reporting database?


How to improve performance in reporting database?

From:
Matthew Wilson
Date:
I have a daily job that pushes data from the production database into
the reporting database, which right now is an exact copy.

I have a webapp that builds lots of reports for users.  Most of these
reports involve elaborate joins of lookup tables and lots of summations,
and they take too long to run, even after using everything I know to
tune the queries.

Since I know this is read-only data, it seems like I should be able to
speed everything up dramatically if I run the queries offline and then
save the results into new tables.  Then the web app could just grab the
cached results out of these new tables and then spit them out quickly.
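
For example, I imagine something like this -- all of the table and
column names here are made up, just to show the shape of the idea:

    -- Run offline, after the nightly import finishes:
    CREATE TABLE report_sales_by_region AS
    SELECT r.region_name,
           date_trunc('month', s.sold_at) AS month,
           sum(s.amount) AS total_amount,
           count(*) AS order_count
      FROM sales s
      JOIN regions r ON r.region_id = s.region_id
     GROUP BY 1, 2;

    -- The web app then just reads the small, pre-aggregated table:
    -- SELECT * FROM report_sales_by_region WHERE region_name = 'East';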

I've heard people talking about using "materialized views" for this, but
that was with Oracle.

What's the postgresql way here?

More generally, any advice on running reporting databases well is
welcome.


Matt

Re: How to improve performance in reporting database?

From:
Greg Smith
Date:
Matthew Wilson wrote:
> I've heard people talking about using "materialized views" for this, but
> that was with Oracle.
>

You can build those manually with PostgreSQL if you really want them:
http://wiki.postgresql.org/wiki/Materialized_Views

The fundamental architecture is sound for a lot of problems in this
area; you just have to figure out how to build them efficiently.  In
your case, you might consider whether there's a way to update the MV in
batches after each data import, rather than relying on triggers to keep
the data up to date.
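
A minimal sketch of that batch approach, with made-up table names:

    -- Run once after each daily import, instead of per-row triggers:
    BEGIN;
    TRUNCATE mv_sales_summary;
    INSERT INTO mv_sales_summary
    SELECT region_id, sum(amount) AS total_amount, count(*) AS order_count
      FROM sales
     GROUP BY region_id;
    COMMIT;

Note that TRUNCATE takes an exclusive lock, so report queries will block
briefly during the rebuild; if that's a problem, build the new version
under another name and swap it in with ALTER TABLE ... RENAME.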

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us


Re: How to improve performance in reporting database?

From:
Scott Marlowe
Date:
On Thu, Jul 22, 2010 at 8:45 AM, Matthew Wilson <matt@tplus1.com> wrote:
> I have a daily job that pushes data from the production database into
> the reporting database, which right now is an exact copy.
>
> I have a webapp that builds lots of reports for users.  Most of these
> reports involve elaborate joins of lookup tables and lots of summations,
> and they take too long to run, even after using everything I know to
> tune the queries.
>
> Since I know this is read-only data, it seems like I should be able to
> speed everything up dramatically if I run the queries offline and then
> save the results into new tables.  Then the web app could just grab the
> cached results out of these new tables and then spit them out quickly.
>
> I've heard people talking about using "materialized views" for this, but
> that was with Oracle.
>
> What's the postgresql way here?

http://tech.jonathangardner.net/wiki/PostgreSQL/Materialized_Views

They're kinda roll-your-own, but they're not that hard to do.

> More generally, any advice on running reporting databases well is
> welcome.

Throw more drives and RAM at the problem, and use materialized views.
Also, for reporting servers you're often better off with fewer, faster
CPUs than with more, slower ones (the opposite of OLTP, where the
number of cores is far more important).

Re: How to improve performance in reporting database?

From:
Vick Khera
Date:
On Thu, Jul 22, 2010 at 2:31 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> You can build those manually with PostgreSQL if you really want them:
>  http://wiki.postgresql.org/wiki/Materialized_Views
>

Another thing to consider... In our case we use a materialized view to
keep track of counts of various things that are expensive to count,
such as the number of members of a specific account, or the number of
times a particular URL was viewed.  What we end up with is many
processes trying to update the same counter row, which creates lots
and lots of lock contention.
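
The painful pattern looks roughly like this (made-up names):

    -- Every page view runs this, so every session queues up behind
    -- the row lock on the same counter row:
    UPDATE url_counts
       SET view_count = view_count + 1
     WHERE url_id = 42;

One common workaround is to insert per-event delta rows instead, so
writers never touch the same row, and roll them up periodically:

    INSERT INTO url_count_deltas (url_id, delta) VALUES (42, 1);
    -- A periodic job then sums the deltas into url_counts.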

Which reminds me... Greg, are you done with your book yet and when can
I get you back in here to help with my locking problems? ;-)