Re: aggregation question

Поиск
Список
Период
Сортировка
От Samuel Gendler
Тема Re: aggregation question
Дата
Msg-id AANLkTik3-kgfGdQx6iKTDW7te5fEFeVZtONQZLqJM-Ut@mail.gmail.com
обсуждение исходный текст
Ответ на aggregation question  (Samuel Gendler <sgendler@ideasculptor.com>)
Список pgsql-sql


On Tue, Nov 30, 2010 at 7:47 PM, Samuel Gendler <sgendler@ideasculptor.com> wrote:
I have a fact table with a number of foreign keys to dimension data and some measure columns.  Let's assume I've got dimension columns a,b,c,d,e, and f and measure columns x, and y.

I need to be able to find the value of f, x, and y for the row with min(x/y) when e in (1,2) and max(x/y) when e not in (1,2) with grouping by a,b,c, and d.

A query which would identify the row in question would look like this:

select min(x/y), a, b, c, d, e from my_table group by a,b,c,d,e order by a,b,c,d 

But I don't know how to go about also extracting the value of x, y, and f for the row which is matched - I suppose I could join to a subquery, but that strikes me as probably not being the most efficient way to accomplish it.

Actually, it occurs to me that I've potentially got more than one row for any given a,b,c,d,e,f tuple, but I want a min for all rows aggregated by f

Basically, for each (a,b,c,d,e) I want to return sum(x), sum(y), and f (as well as a,b,c,d,e) for the f with min(sum(x)/sum(y))


I solved this, but I'm not sure if my solution is ideal

    -- select distinct causes multiple rows that tie for rank = 1
    -- to condense down to a single row. Outer query also strips 
    -- any rows from windows with less than 3 entries.
    select distinct count, total_ms, time, fk_e,
           fk_a, fk_b, fk_c
    from (        -- grab count and total_ms for each fk_g, grouped by        -- a,b,c, time, and e.  Use rank() window function        -- to rank each row in the window, ordered by total_ms/count        -- and use count(*) as window function to get total rows in        -- the window. 
        select sum(f.x) as count, sum(f.y) as total_ms,
               f.time as time,
               f.fk_a, f.fk_b, f.fk_c,
               f.fk_e, f.fk_f,
               rank() over (w_agg order by sum(f.x)/sum(f.y)) as rank,
               count(*) over (w_agg) as rows
        from fact_table f
        group by 3,4,5,6,7,8
        window w_agg as (partition by time,
                         fk_a, fk_b, fk_c,
                         fk_e)
    ) q
    where q.rank = 1 and q.rows >= 3
 

I will need to run this twice - once with rank() ordered descending and the inner query actually limited by fk_e in (4,14) and once with rank() ordered ascending for fk_e not in (4,14).   It is possible I can put a case statement in the inner select in order to handle that.  I haven't tried it yet.

Is there a better solution than the outer 'select distinct' in order to only see rows that have rank = 1 and to never have duplicate rows?

В списке pgsql-sql по дате отправления:

Предыдущее
От: Jeff Bland
Дата:
Сообщение: DELETE WHERE EXISTS unexpected results
Следующее
От: Lee Hachadoorian
Дата:
Сообщение: Re: OT - load a shp file