Обсуждение: partitioning a dataset + employing hysteresis condition
I've got this table:
create table phone_calls
(
start_time timestamp,
device_id integer,
term_status integer
);
It describes phone call events. A 'term_status' is a sort-of an exit status for the call, whereby a value != 0 indicates some sort of error.
Given that, I wish to retrieve data on devices with a persisting error on them, of a specific type. I.E. that their last term_status was, say 2. I'd like to employ some hysteresis on the query: only consider a device as errorring if:
1. the last "good" (0) term_status pre-dates a "bad" (2) term_status.
2. it has at least N "bad" term_status events following the last "good" one.
3. The time span between the first "bad" term_status event and the last one is >= T minutes
For instance, w/the following data set:
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '10 minutes', 1, 2, 0);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '9 minutes', 1, 2, 1);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '7 minutes', 1, 2, 1);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '6 minutes', 1, 2, 1);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '5 minutes', 1, 2, 0);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '4 minutes', 1, 2, 2);
INSERT INTO phone_calls(start_time, device_id, term_status)VALUES (now() - interval '3 minutes', 1, 2, 2);
INSERT INTO phone_calls(start_time, device_id, term_status)VALUES (now() - interval '2 minutes', 1, 2, 2);
with N=3, T=3
The query should return device_id 2 as errorring, as it registered 3 "bad" events for at least 3 minutes.
I assume some partitioning needs to be employed here, but am not very sure-footed on the subject.
Would appreciate some guidance.
10x,
Hi,
I've got this table:
create table phone_calls
(
start_time timestamp,
device_id integer,
term_status integer
);
It describes phone call events. A 'term_status' is a sort-of an exit status for the call, whereby a value != 0 indicates some sort of error.
Given that, I wish to retrieve data on devices with a persisting error on them, of a specific type. I.E. that their last term_status was, say 2. I'd like to employ some hysteresis on the query: only consider a device as errorring if:
1. the last "good" (0) term_status pre-dates a "bad" (2) term_status.
2. it has at least N "bad" term_status events following the last "good" one.
3. The time span between the first "bad" term_status event and the last one is >= T minutes
For instance, w/the following data set:
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '10 minutes', 1, 2, 0);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '9 minutes', 1, 2, 1);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '7 minutes', 1, 2, 1);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '6 minutes', 1, 2, 1);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '5 minutes', 1, 2, 0);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '4 minutes', 1, 2, 2);
INSERT INTO phone_calls(start_time, device_id, term_status)VALUES (now() - interval '3 minutes', 1, 2, 2);
INSERT INTO phone_calls(start_time, device_id, term_status)VALUES (now() - interval '2 minutes', 1, 2, 2);
with N=3, T=3
The query should return device_id 2 as errorring, as it registered 3 "bad" events for at least 3 minutes.
I assume some partitioning needs to be employed here, but am not very sure-footed on the subject.
Would appreciate some guidance.
10x,
... fixed data set:
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '10 minutes', 2, 0);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '9 minutes', 2, 1);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '7 minutes', 2, 1);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '6 minutes', 2, 1);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '5 minutes', 2, 0);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '4 minutes', 2, 2);
INSERT INTO phone_calls(start_time, device_id, term_status)VALUES (now() - interval '3 minutes', 2, 2);
INSERT INTO phone_calls(start_time, device_id, term_status)VALUES (now() - interval '2 minutes', 2, 2);
On Mon, Nov 14, 2011 at 4:29 PM, Amit Dor-Shifer <amit.dor.shifer@gmail.com> wrote:Hi,
I've got this table:
create table phone_calls
(
start_time timestamp,
device_id integer,
term_status integer
);
It describes phone call events. A 'term_status' is a sort-of an exit status for the call, whereby a value != 0 indicates some sort of error.
Given that, I wish to retrieve data on devices with a persisting error on them, of a specific type. I.E. that their last term_status was, say 2. I'd like to employ some hysteresis on the query: only consider a device as errorring if:
1. the last "good" (0) term_status pre-dates a "bad" (2) term_status.
2. it has at least N "bad" term_status events following the last "good" one.
3. The time span between the first "bad" term_status event and the last one is >= T minutes
For instance, w/the following data set:
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '10 minutes', 1, 2, 0);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '9 minutes', 1, 2, 1);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '7 minutes', 1, 2, 1);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '6 minutes', 1, 2, 1);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '5 minutes', 1, 2, 0);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '4 minutes', 1, 2, 2);
INSERT INTO phone_calls(start_time, device_id, term_status)VALUES (now() - interval '3 minutes', 1, 2, 2);
INSERT INTO phone_calls(start_time, device_id, term_status)VALUES (now() - interval '2 minutes', 1, 2, 2);
with N=3, T=3
The query should return device_id 2 as errorring, as it registered 3 "bad" events for at least 3 minutes.
I assume some partitioning needs to be employed here, but am not very sure-footed on the subject.
Would appreciate some guidance.
10x,
... fixed data set:
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '10 minutes', 2, 0);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '9 minutes', 2, 1);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '7 minutes', 2, 1);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '6 minutes', 2, 1);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '5 minutes', 2, 0);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '4 minutes', 2, 2);
INSERT INTO phone_calls(start_time, device_id, term_status)VALUES (now() - interval '3 minutes', 2, 2);
INSERT INTO phone_calls(start_time, device_id, term_status)VALUES (now() - interval '2 minutes', 2, 2);
Hi ,On Mon, Nov 14, 2011 at 4:29 PM, Amit Dor-Shifer <amit.dor.shifer@gmail.com> wrote:Hi,
I've got this table:
create table phone_calls
(
start_time timestamp,
device_id integer,
term_status integer
);
It describes phone call events. A 'term_status' is a sort-of an exit status for the call, whereby a value != 0 indicates some sort of error.
Given that, I wish to retrieve data on devices with a persisting error on them, of a specific type. I.E. that their last term_status was, say 2. I'd like to employ some hysteresis on the query: only consider a device as errorring if:
1. the last "good" (0) term_status pre-dates a "bad" (2) term_status.
2. it has at least N "bad" term_status events following the last "good" one.
3. The time span between the first "bad" term_status event and the last one is >= T minutes
For instance, w/the following data set:
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '10 minutes', 1, 2, 0);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '9 minutes', 1, 2, 1);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '7 minutes', 1, 2, 1);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '6 minutes', 1, 2, 1);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '5 minutes', 1, 2, 0);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '4 minutes', 1, 2, 2);
INSERT INTO phone_calls(start_time, device_id, term_status)VALUES (now() - interval '3 minutes', 1, 2, 2);
INSERT INTO phone_calls(start_time, device_id, term_status)VALUES (now() - interval '2 minutes', 1, 2, 2);
with N=3, T=3
The query should return device_id 2 as errorring, as it registered 3 "bad" events for at least 3 minutes.
I assume some partitioning needs to be employed here, but am not very sure-footed on the subject.
Would appreciate some guidance.
10x,
... fixed data set:
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '10 minutes', 2, 0);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '9 minutes', 2, 1);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '7 minutes', 2, 1);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '6 minutes', 2, 1);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '5 minutes', 2, 0);
INSERT INTO phone_calls(start_time, device_id, term_status) VALUES (now() - interval '4 minutes', 2, 2);
INSERT INTO phone_calls(start_time, device_id, term_status)VALUES (now() - interval '3 minutes', 2, 2);
INSERT INTO phone_calls(start_time, device_id, term_status)VALUES (now() - interval '2 minutes', 2, 2);
This is my attempt...
DROP TABLE IF EXISTS phone_call;
CREATE TABLE phone_call
(
device_id int NOT NULL,
start_time timestamptz NOT NULL,
term_status int NOT NULL,
PRIMARY KEY (device_id, start_time, term_status)
);
INSERT INTO
phone_call
(
device_id,
start_time,
term_status
)
VALUES
(10, '20100701T151433', 0),
(20, '20100701T151533', 0),
(20, '20100701T151633', 2),
(30, '20100701T151433', 0),
(30, '20100701T151533', 2),
(30, '20100701T151633', 2),
(40, '20100701T004022', 0),
(40, '20100701T004122', 2),
(40, '20100701T004622', 2),
(40, '20100701T010022', 2),
(40, '20100701T012122', 2),
(50, '20100701T120000', 0),
(50, '20100701T120100', 2),
(50, '20100701T120200', 2),
(50, '20100701T120300', 2),
(60, '20100701T090000', 0),
(60, '20100701T090200', 2),
(60, '20100701T100000', 0),
(60, '20100701T100100', 2),
(60, '20100701T100200', 2),
(60, '20100701T100300', 2),
(60, '20100701T101000', 2),
(60, '20100701T102000', 2),
(60, '20100701T104000', 2),
(60, '20100701T105000', 2),
(60, '20100701T105200', 2),
(60, '20100701T105600', 2),
(60, '20100701T500300', 0),
(60, '20100701T501400', 2);
INSERT INTO phone_call(start_time, device_id, term_status) VALUES (now() - interval '10 minutes', 2, 0);
INSERT INTO phone_call(start_time, device_id, term_status) VALUES (now() - interval '9 minutes', 2, 1);
INSERT INTO phone_call(start_time, device_id, term_status) VALUES (now() - interval '7 minutes', 2, 1);
INSERT INTO phone_call(start_time, device_id, term_status) VALUES (now() - interval '6 minutes', 2, 1);
INSERT INTO phone_call(start_time, device_id, term_status) VALUES (now() - interval '5 minutes', 2, 0);
INSERT INTO phone_call(start_time, device_id, term_status) VALUES (now() - interval '4 minutes', 2, 2);
INSERT INTO phone_call(start_time, device_id, term_status)VALUES (now() - interval '3 minutes', 2, 2);
INSERT INTO phone_call(start_time, device_id, term_status)VALUES (now() - interval '2 minutes', 2, 2);
TABLE phone_call;
SELECT DISTINCT
pc.device_id
FROM
phone_call pc
WHERE
pc.term_status = 0
AND 3 <=
(
SELECT count(*)
FROM phone_call pc1
WHERE
pc1.device_id = pc.device_id
AND pc1.term_status = 2
AND pc1.start_time > pc.start_time
AND pc1.start_time <
COALESCE
(
(
SELECT min(pc2.start_time)
FROM phone_call pc2
WHERE
pc2.term_status = 0
AND pc2.start_time > pc.start_time
)
, '9999-12-31'
)
GROUP BY
pc1.device_id
HAVING
max(pc1.start_time) >= min(pc1.start_time) + interval '2 minute'
)
ORDER BY
pc.device_id
/**/;/**/
Regards,
Gavin
On Mon, Nov 14, 2011 at 4:29 PM, Amit Dor-Shifer <amit.dor.shifer@gmail.com> wrote:Hi,
I've got this table:
create table phone_calls
(
start_time timestamp,
device_id integer,
term_status integer
);
[…]
3 points
POINT 1:
I should have given the results of my attempt...
[...]
device_id
-----------
2
40
50
60
(4 rows)
POINT 2:
I also realized I left of a condition in the HAVBING part
HAVING
max(pc1.start_time) >= min(pc1.start_time) + interval '2 minute'
I think it should be
HAVING
max(pc1.start_time) >= min(pc1.start_time) + interval '2 minute'
AND pc1.term_status = 2
POINT 3:
Timestamps should almost always be stored with a time zone (using timestamptz rather than just timestamp) – or you will have problems when Summer time ends or begins, and using timestamptz allows for date&time to be displayed currectly in different locales.
Regards,
Gavin
P.S. Since my post has not shown up yet, I had to attach to its parent!
(I won't offer my first born to get direct posting rights,
as his wife may object -
besides which, it is probably illegal in my jurisdiction!)
On 14/11/11 18:35, Amit Dor-Shifer wrote:On Mon, Nov 14, 2011 at 4:29 PM, Amit Dor-Shifer <amit.dor.shifer@gmail.com> wrote:Hi,
I've got this table:
create table phone_calls
(
start_time timestamp,
device_id integer,
term_status integer
);
It describes phone call events. A 'term_status' is a sort-of an exit status for the call, whereby a value != 0 indicates some sort of error.
Given that, I wish to retrieve data on devices with a persisting error on them, of a specific type. I.E. that their last term_status was, say 2. I'd like to employ some hysteresis on the query: only consider a device as errorring if:
1. the last "good" (0) term_status pre-dates a "bad" (2) term_status.
2. it has at least N "bad" term_status events following the last "good" one.
3. The time span between the first "bad" term_status event and the last one is >= T minutes
For instance, w/the following data set:
On 14/11/11 18:35, Amit Dor-Shifer wrote:On Mon, Nov 14, 2011 at 4:29 PM, Amit Dor-Shifer <amit.dor.shifer@gmail.com> wrote:Hi,
I've got this table:
create table phone_calls
(
start_time timestamp,
device_id integer,
term_status integer
);
It describes phone call events. A 'term_status' is a sort-of an exit status for the call, whereby a value != 0 indicates some sort of error.
Given that, I wish to retrieve data on devices with a persisting error on them, of a specific type. I.E. that their last term_status was, say 2. I'd like to employ some hysteresis on the query: only consider a device as errorring if:
1. the last "good" (0) term_status pre-dates a "bad" (2) term_status.
2. it has at least N "bad" term_status events following the last "good" one.
3. The time span between the first "bad" term_status event and the last one is >= T minutes
For instance, w/the following data set:Alternative thought,Have a Boolean field which is set to true for non-zero entries and false for zeros. Upon entering a zero into the table, for a given device, set all currently true records to false. Combine with a partial index on the true and you can quickly get a listing of all devices in error mode and all the recent error entries.David J.
Really appreciate the attention. Thanks!
Here's what I've so-far come up with:
SELECT pc.device_id, MAX(lsc.last_successful) AS last_successful, MIN(pc.start_time) AS faulting_since
FROM (
WHERE term_status IS NOT NULL AND term_status = 0
UNION
SELECT NULL AS last_successful, device_id FROM phone_calls
HAVING EVERY(term_status = 2) = TRUE
) AS lsc,
WHERE
pc.device_id=lsc.device_id
AND pc.term_status=2
AND (lsc.last_successful is NULL OR pc.start_time > lsc.last_successful)
GROUP BY pc.device_id
HAVING MIN(pc.start_time) < ?
AND COUNT(term_status) >= ?
The two parameters to the query are T & N, in order, with a slight change: T is a timestamp, so instead of specifying the a time span of 2 minutes, I pass it as NOW() - INTERVAL '2 minute'.
with T=NOW() - INTERVAL '2 minute' I get the following on the a/m dataset:
N=4:
40;"2010-07-01 00:40:22";"2010-07-01 00:41:22"
N=3:
40;"2010-07-01 00:40:22";"2010-07-01 00:41:22"
50;"2010-07-01 12:00:00";"2010-07-01 12:01:00"
2;"2011-11-16 21:56:59.52107";"2011-11-16 21:57:59.52107"
N=2:
40;"2010-07-01 00:40:22";"2010-07-01 00:41:22"
2;"2011-11-16 21:55:16.88869";"2011-11-16 21:56:16.88869"
30;"2010-07-01 15:14:33";"2010-07-01 15:15:33"
50;"2010-07-01 12:00:00";"2010-07-01 12:01:00"
* The bit with the union is to accommodate for devices which never see a single successful term_status.
Thanks a lot for the helpful hints :)