Обсуждение: Counting # of consecutive rows with specified value(s)?

Поиск
Список
Период
Сортировка

Counting # of consecutive rows with specified value(s)?

От
Ken Tanzer
Дата:
I'm working with an attendance table, where each person gets a record for each day of class, with an attendance code (ABSENT, ATTENDED, ...).  I'm trying to figure out how to get the number of consecutive absences a person has.  I'm guessing this can be done without writing a function, but I need some help here.

I can currently test whether someone has at least a specified number of consecutive absences with the query below, but it would be  better to get the actual number.

As a second question, what about getting the number of consecutive records for a set of values?  (e.g., attendance_code IN ('ATTENDED','EXCUSED')

Any ideas or suggestions?  Thanks.

Ken

This query checks for 4 consecutive absences:

SELECT client_id,
    array( 
      SELECT attendance_code
      FROM attendance
      WHERE client_id=enrollment.client_id
      ORDER BY attended_on DESC
      LIMIT 4
    )=array_fill('ABSENT'::varchar,array[4]) AS absent_last_4 
  FROM enrollment;

--
AGENCY Software  
A data system that puts you in control
(253) 245-3801


Re: Counting # of consecutive rows with specified value(s)?

От
David Johnston
Дата:
On Jun 6, 2012, at 22:20, Ken Tanzer <ken.tanzer@gmail.com> wrote:

> I'm working with an attendance table, where each person gets a record for each day of class, with an attendance code
(ABSENT,ATTENDED, ...).  I'm trying to figure out how to get the number of consecutive absences a person has.  I'm
guessingthis can be done without writing a function, but I need some help here. 
>
> I can currently test whether someone has at least a specified number of consecutive absences with the query below,
butit would be  better to get the actual number. 
>
> As a second question, what about getting the number of consecutive records for a set of values?  (e.g.,
attendance_codeIN ('ATTENDED','EXCUSED') 
>
> Any ideas or suggestions?  Thanks.
>
> Ken
>
> This query checks for 4 consecutive absences:
>
> SELECT client_id,
>     array(
>       SELECT attendance_code
>       FROM attendance
>       WHERE client_id=enrollment.client_id
>       ORDER BY attended_on DESC
>       LIMIT 4
>     )=array_fill('ABSENT'::varchar,array[4]) AS absent_last_4
>   FROM enrollment;
>

Quick thought if you have window functions.  If you perform a rank over the appropriate subset of columns (with order
by)you might be able to get consecutive records with matching codes to have the same rank.  The in an outer query you
cansimply sum with the rank as the group.  I am unsure which function gives you equal values for ties, or whether this
approachwill even work, but it seems worth considering at least. Using a min date your result might be readable as
"startingon 9/1, 4 absences; starting on 9/5, 21 attendeds; etc...". 

Consider using a frame of 1 prior row to see whether a given date/code is different than the immediately preceding day.
If it is that day qualifies as the possible beginning of a chain otherwise it has to be added to an existing chain.
Youprobably need to do this first then use the dates of the applicable chain starts (and ends maybe...) as fields in
thepartition mentioned above. 

HTH

David J.

Re: Counting # of consecutive rows with specified value(s)?

От
François Beausoleil
Дата:

Le 2012-06-06 à 22:20, Ken Tanzer a écrit :

I can currently test whether someone has at least a specified number of consecutive absences with the query below, but it would be  better to get the actual number.

As a second question, what about getting the number of consecutive records for a set of values?  (e.g., attendance_code IN ('ATTENDED','EXCUSED')

Any ideas or suggestions?  Thanks.

This is similar to the islands and gaps problem. Search for that on StackOverflow and you'll get it.

Bye!
François

Re: Counting # of consecutive rows with specified value(s)?

От
Ken Tanzer
Дата:
Thank you both for the suggestions.  I started playing with the window functions, but found and copied an "islands and gaps" example that didn't need them, and was simpler than I thought.  This query seems to do the trick:

SELECT
  client_id,
  count(*)
FROM 
  (SELECT
    client_id,
    attendance_code
  FROM recovery_circle_attendance rca
  WHERE attended_on >
     (SELECT max(attended_on)
      FROM recovery_circle_attendance
      WHERE client_id=rca.client_id AND attendance_code != 'ABSENT')
  ) foo
GROUP BY client_id;

It's a fairly small dataset, so at least right now I'm not too worried about performance, but am curious if this is a reasonably well-optimized way to get this info, or if there are any glaring issues or room for improvement in this regard?

Cheers,
Ken





On Thu, Jun 7, 2012 at 12:35 PM, François Beausoleil <francois@teksol.info> wrote:

Le 2012-06-06 à 22:20, Ken Tanzer a écrit :

I can currently test whether someone has at least a specified number of consecutive absences with the query below, but it would be  better to get the actual number.

As a second question, what about getting the number of consecutive records for a set of values?  (e.g., attendance_code IN ('ATTENDED','EXCUSED')

Any ideas or suggestions?  Thanks.

This is similar to the islands and gaps problem. Search for that on StackOverflow and you'll get it.

Bye!
François



--
AGENCY Software  
A data system that puts you in control
(253) 245-3801