Обсуждение: Populating missing dates in postgresql data

Поиск
Список
Период
Сортировка

Populating missing dates in postgresql data

От
"Lavrenz, Steven M"
Дата:

Alright everyone, this is a doozy of a problem. I am new to Postgres so I appreciate patience/understanding. I have a database of hardware objects, each of which has several different “channels”. Once per day, these channels are supposed to check in with a central server, generating an event log table (TABLE A) like the following:

 

object_id             channel                check-in date

****************************************

990                         1                              2014-12-01

990                         1                              2014-12-02

990                         2                              2014-12-01

990                         2                              2014-12-02

286                         2                              2014-12-01

286                         2                              2014-12-02

286                         5                              2014-12-01

286                         5                              2014-12-02

4507                       1                              2014-12-01

4507                       1                              2014-12-02

4507                       2                              2014-12-01

4507                       2                              2014-12-02

 

And so on. Occasionally, communications will break down to the hardware, such that no reporting occurs. For example, let’s say that object 286 loses communications on 12/1/2014. Then the table might look like:   

 

object_id             channel                check-in date

****************************************

990                         1                              2014-12-01

990                         1                              2014-12-02

990                         2                              2014-12-01

990                         2                              2014-12-02

 

286                         2                              2014-12-02

 

286                         5                              2014-12-02

4507                       1                              2014-12-01

4507                       1                              2014-12-02

4507                       2                              2014-12-01

4507                       2                              2014-12-02

 

Or let’s say that for some reason, just channel 2 loses reporting for a day. Then we would have:

 

object_id             channel                check-in date

****************************************

990                         1                              2014-12-01

990                         1                              2014-12-02

990                         2                              2014-12-01

990                         2                              2014-12-02

 

286                         2                              2014-12-02

286                         5                              2014-12-01

286                         5                              2014-12-02

4507                       1                              2014-12-01

4507                       1                              2014-12-02

4507                       2                              2014-12-01

4507                       2                              2014-12-02

 

I have a second table (TABLE B) with all of the object_ids and channels that are supposed to be reporting in each day. For cases where a certain channel does not check in, I want to add a column that indicates the comm failure. So, for the example where all channels on object 286 do not check in, I would like to get is something like this:

 

object_id             channel                check-in date                     comm failure

**********************************************************

990                         1                              2014-12-01                          No

990                         1                              2014-12-02                          No

990                         2                              2014-12-01                          No

990                         2                              2014-12-02                          No

286                         2                              2014-12-01                          Yes

286                         2                              2014-12-02                          No

286                         5                              2014-12-01                          Yes

286                         5                              2014-12-02                          No

4507                       1                              2014-12-01                          No

4507                       1                              2014-12-02                          No

4507                       2                              2014-12-01                          No

4507                       2                              2014-12-02                          No

 

 

I have been racking my mind for the better part of a day on how to do this. The thing is that I can do a right join of TABLE B on TABLE A, and this will populate the missing object ids and channels. However, this only works for a single day, and it gives me something like:

 

object_id             channel                check-in date                     comm failure

**********************************************************

990                         1                              2014-12-01                          No

990                         1                              2014-12-02                          No

990                         2                              2014-12-01                          No

990                         2                              2014-12-02                          No

286                         2                                                                              Yes

286                         2                              2014-12-02                          No

286                         5                                                                              Yes

286                         5                              2014-12-02                          No

4507                       1                              2014-12-01                          No

4507                       1                              2014-12-02                          No

4507                       2                              2014-12-01                          No

4507                       2                              2014-12-02                          No

 

I need to do a count of comm failures by day, so I need to populate the check-in date field. Please help!

               

 

Best Regards,

Steve

 

*************************************************

Steven Lavrenz, MS, EIT

Doctoral Research Fellow, Ph.D. Candidate

Purdue University | Transportation Engineering

Hampton Hall of Civil Engineering, Room 1122

550 Stadium Mall Drive
West Lafayette, IN 47907

765-775-6423

slavrenz@purdue.edu

 

Re: Populating missing dates in postgresql data

От
Adrian Klaver
Дата:
On 03/25/2015 05:25 PM, Lavrenz, Steven M wrote:
> Alright everyone, this is a doozy of a problem. I am new to Postgres so
> I appreciate patience/understanding. I have a database of hardware
> objects, each of which has several different “channels”. Once per day,
> these channels are supposed to check in with a central server,
> generating an event log table (TABLE A) like the following:
>
> object_id             channel                check-in date
>
> ****************************************
>
> 990                         1                              2014-12-01
>
> 990                         1                              2014-12-02
>
> 990                         2                              2014-12-01
>
> 990                         2                              2014-12-02
>
> 286                         2                              2014-12-01
>
> 286                         2                              2014-12-02
>
> 286                         5                              2014-12-01
>
> 286                         5                              2014-12-02
>
> 4507                       1                              2014-12-01
>
> 4507                       1                              2014-12-02
>
> 4507                       2                              2014-12-01
>
> 4507                       2                              2014-12-02
>
> And so on. Occasionally, communications will break down to the hardware,
> such that no reporting occurs. For example, let’s say that object 286
> loses communications on 12/1/2014. Then the table might look like:
>
> object_id             channel                check-in date
>
> ****************************************
>
> 990                         1                              2014-12-01
>
> 990                         1                              2014-12-02
>
> 990                         2                              2014-12-01
>
> 990                         2                              2014-12-02
>
> 286                         2                              2014-12-02
>
> 286                         5                              2014-12-02
>
> 4507                       1                              2014-12-01
>
> 4507                       1                              2014-12-02
>
> 4507                       2                              2014-12-01
>
> 4507                       2                              2014-12-02
>
> Or let’s say that for some reason, just channel 2 loses reporting for a
> day. Then we would have:
>
> object_id             channel                check-in date
>
> ****************************************
>
> 990                         1                              2014-12-01
>
> 990                         1                              2014-12-02
>
> 990                         2                              2014-12-01
>
> 990                         2                              2014-12-02
>
> 286                         2                              2014-12-02
>
> 286                         5                              2014-12-01
>
> 286                         5                              2014-12-02
>
> 4507                       1                              2014-12-01
>
> 4507                       1                              2014-12-02
>
> 4507                       2                              2014-12-01
>
> 4507                       2                              2014-12-02
>
> I have a second table (TABLE B) with all of the object_ids and channels
> that are supposed to be reporting in each day. For cases where a certain
> channel does not check in, I want to add a column that indicates the
> comm failure. So, for the example where all channels on object 286 do
> not check in, I would like to get is something like this:
>
> object_id             channel                check-in
> date                     comm failure
>
> **********************************************************
>
> 990                         1
> 2014-12-01                          No
>
> 990                         1
> 2014-12-02                          No
>
> 990                         2
> 2014-12-01                          No
>
> 990                         2
> 2014-12-02                          No
>
> 286                         2
> 2014-12-01                          Yes
>
> 286                         2
> 2014-12-02                          No
>
> 286                         5
> 2014-12-01                          Yes
>
> 286                         5
> 2014-12-02                          No
>
> 4507                       1
> 2014-12-01                          No
>
> 4507                       1
> 2014-12-02                          No
>
> 4507                       2
> 2014-12-01                          No
>
> 4507                       2
> 2014-12-02                          No
>
> I have been racking my mind for the better part of a day on how to do
> this. The thing is that I can do a right join of TABLE B on TABLE A, and
> this will populate the missing object ids and channels. However, this
> only works for a single day, and it gives me something like:
>
> object_id             channel                check-in
> date                     comm failure
>
> **********************************************************
>
> 990                         1
> 2014-12-01                          No
>
> 990                         1
> 2014-12-02                          No
>
> 990                         2
> 2014-12-01                          No
>
> 990                         2
> 2014-12-02                          No
>
> 286                         2
>                                                  Yes
>
> 286                         2
> 2014-12-02                          No
>
> 286                         5
>                                                  Yes
>
> 286                         5
> 2014-12-02                          No
>
> 4507                       1
> 2014-12-01                          No
>
> 4507                       1
> 2014-12-02                          No
>
> 4507                       2
> 2014-12-01                          No
>
> 4507                       2
> 2014-12-02                          No
>
> I need to do a count of comm failures by day, so I need to populate the
> check-in date field. Please help!
>

Without seeing the actual query this is just a suggestion. I would say
use CASE:

http://www.postgresql.org/docs/9.3/interactive/functions-conditional.html#FUNCTIONS-CASE

Where if the date was not available from table A use the one from table B.

> Best Regards,
>
> Steve
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Populating missing dates in postgresql data

От
"David G. Johnston"
Дата:
On Wed, Mar 25, 2015 at 5:25 PM, Lavrenz, Steven M <slavrenz@purdue.edu> wrote:

Alright everyone, this is a doozy of a problem. I am new to Postgres so I appreciate patience/understanding. I have a database of hardware objects, each of which has several different “channels”. Once per day, these channels are supposed to check in with a central server, generating an event log table (TABLE A) like the following:

 

object_id             channel                check-in date

****************************************

990                         1                              2014-12-01

990                         1                              2014-12-02

990                         2                              2014-12-01

990                         2                              2014-12-02

286                         2                              2014-12-01

286                         2                              2014-12-02

286                         5                              2014-12-01

286                         5                              2014-12-02

4507                       1                              2014-12-01

4507                       1                              2014-12-02

4507                       2                              2014-12-01

4507                       2                              2014-12-02

 

And so on. Occasionally, communications will break down to the hardware, such that no reporting occurs. For example, let’s say that object 286 loses communications on 12/1/2014. Then the table might look like:   

 

object_id             channel                check-in date

****************************************

990                         1                              2014-12-01

990                         1                              2014-12-02

990                         2                              2014-12-01

990                         2                              2014-12-02

 

286                         2                              2014-12-02

 

286                         5                              2014-12-02

4507                       1                              2014-12-01

4507                       1                              2014-12-02

4507                       2                              2014-12-01

4507                       2                              2014-12-02

 

Or let’s say that for some reason, just channel 2 loses reporting for a day. Then we would have:

 

object_id             channel                check-in date

****************************************

990                         1                              2014-12-01

990                         1                              2014-12-02

990                         2                              2014-12-01

990                         2                              2014-12-02

 

286                         2                              2014-12-02

286                         5                              2014-12-01

286                         5                              2014-12-02

4507                       1                              2014-12-01

4507                       1                              2014-12-02

4507                       2                              2014-12-01

4507                       2                              2014-12-02

 

I have a second table (TABLE B) with all of the object_ids and channels that are supposed to be reporting in each day. For cases where a certain channel does not check in, I want to add a column that indicates the comm failure. So, for the example where all channels on object 286 do not check in, I would like to get is something like this:

 

object_id             channel                check-in date                     comm failure

**********************************************************

990                         1                              2014-12-01                          No

990                         1                              2014-12-02                          No

990                         2                              2014-12-01                          No

990                         2                              2014-12-02                          No

286                         2                              2014-12-01                          Yes

286                         2                              2014-12-02                          No

286                         5                              2014-12-01                          Yes

286                         5                              2014-12-02                          No

4507                       1                              2014-12-01                          No

4507                       1                              2014-12-02                          No

4507                       2                              2014-12-01                          No

4507                       2                              2014-12-02                          No

 

 

I have been racking my mind for the better part of a day on how to do this. The thing is that I can do a right join of TABLE B on TABLE A, and this will populate the missing object ids and channels. However, this only works for a single day, and it gives me something like:

 

object_id             channel                check-in date                     comm failure

**********************************************************

990                         1                              2014-12-01                          No

990                         1                              2014-12-02                          No

990                         2                              2014-12-01                          No

990                         2                              2014-12-02                          No

286                         2                                                                              Yes

286                         2                              2014-12-02                          No

286                         5                                                                              Yes

286                         5                              2014-12-02                          No

4507                       1                              2014-12-01                          No

4507                       1                              2014-12-02                          No

4507                       2                              2014-12-01                          No

4507                       2                              2014-12-02                          No

 

I need to do a count of comm failures by day, so I need to populate the check-in date field. Please help!

               


​People are much more inclined to help if you make it easy for them.

Read up on "CTE" (WITH clause) in the documentation.  And "VALUES".  Both in the "SELECT" command section.

Use those to create inline versions of your two tables and then provide the query you can get working.

The basic solution is:

WITH actual_checkins (id, channel, date, count_for_day) AS (...)
, expected_checkins (id, channel, date) AS (...)
SELECT id, channel, date, COALESCE(count_for_day, 0) AS number_of_checkins
FROM expected_checkins
LEFT JOIN actual_checkins USING (id, channel, date)

You can convert zero/non-true to "no/yes" via a CASE WHEN count = 0 THEN 'no' ELSE 'yes' END or similar.

David J.

​P.S. the function "generate_series(date, date)" may serve you well

Re: Populating missing dates in postgresql data

От
Mitu Verma
Дата:

Hi,

 

We have a customer complaining about the time taken by one of the application scripts while deleting older data from the log tables.

During the deletion, customer reported that he often sees the below error and because of which table size doesn’t reduce.

 

ERROR: canceling autovacuum task

Date: 2015-03-14 04:29:19

Context: automatic analyze of table "fm_db_Server3.mmsuper.audittraillogentry"

 

We have the following queries in this regard:

-          How often is the autovacuum task invoked by postgres

-          If the task gets cancelled (while we were deleting data from the database) would this task be re-invoked at a later time?

-          If insertion of data into a table also impact this task?

-          If we can manually schedule this task to a particular time (like off peak hours)?

 

 

Regards

Mitu

 

 

 

Re: Populating missing dates in postgresql data

От
"David G. Johnston"
Дата:
On Wed, Mar 25, 2015 at 8:57 PM, Mitu Verma <mitu.verma@ericsson.com> wrote:

Hi,

 

We have a customer complaining about the time taken by one of the application scripts while deleting older data from the log tables.

During the deletion, customer reported that he often sees the below error and because of which table size doesn’t reduce.


​You should look into partitioning (implemented via inheritance) for your log table so that you can simply issue a "DROP TABLE ..."​

 

ERROR: canceling autovacuum task

Date: 2015-03-14 04:29:19

Context: automatic analyze of table "fm_db_Server3.mmsuper.audittraillogentry"

 

We have the following queries in this regard:

-          How often is the autovacuum task invoked by postgres

​As often as it needs to - based upon changes being made to tables​

-          If the task gets cancelled (while we were deleting data from the database) would this task be re-invoked at a later time?

​Yes​

-          If insertion of data into a table also impact this task?

​Yes​
 

-          If we can manually schedule this task to a particular time (like off peak hours)?


​Yes.  You can issue "ANALYZE" (or VACUUM) manually via SQL or the PostgreSQL provided "vacuumdb" command using whatever external scheduling mechanism you have at your disposal.

David J.

Re: Populating missing dates in postgresql data

От
Alban Hertroys
Дата:
> On 26 Mar 2015, at 1:25, Lavrenz, Steven M <slavrenz@purdue.edu> wrote:
>
> Alright everyone, this is a doozy of a problem. I am new to Postgres so I appreciate patience/understanding. I have a
databaseof hardware objects, each of which has several different “channels”. Once per day, these channels are supposed 
>
>
>
> object_id             channel                check-in date
>
> ****************************************
>
> 990                         1                              2014-12-01
>
> 990                         1                              2014-12-02
>
> 990                         2                              2014-12-01
>
> 990                         2                              2014-12-02
>
> 286                         2                              2014-12-01
>
> 286                         2                              2014-12-02
>
> 286                         5                              2014-12-01
>
> 286                         5                              2014-12-02
>
> 4507                       1                              2014-12-01
>
> 4507                       1                              2014-12-02
>
> 4507                       2                              2014-12-01
>
> 4507                       2                              2014-12-02
>
>
>
> And so on. Occasionally, communications will break down to the hardware, such that no reporting occurs. object 286
losescommunications on 12/1/2014. Then the table might look like:    
>
>
>
> object_id             channel                check-in date
>
> ****************************************
>
> 990                         1                              2014-12-01
>
> 990                         1                              2014-12-02
>
> 990                         2                              2014-12-01
>
> 990                         2                              2014-12-02
>
>
>
> 286                         2                              2014-12-02
>
>
>
> 286                         5                              2014-12-02
>
> 4507                       1                              2014-12-01
>
> 4507                       1                              2014-12-02
>
> 4507                       2                              2014-12-01
>
> 4507                       2                              2014-12-02
>
>
>
> Or let’s say that for some reason, just channel 2 loses reporting for a day. Then we would have:
>
>
>
> object_id             channel                check-in date
>
> ****************************************
>
> 990                         1                              2014-12-01
>
> 990                         1                              2014-12-02
>
> 990                         2                              2014-12-01
>
> 990                         2                              2014-12-02
>
>
>
> 286                         2                              2014-12-02
>
> 286                         5                              2014-12-01
>
> 286                         5                              2014-12-02
>
> 4507                       1                              2014-12-01
>
> 4507                       1                              2014-12-02
>
> 4507                       2                              2014-12-01
>
> 4507                       2                              2014-12-02
>
>
>
> I have a second table (TABLE B) with all of the object_ids and channels that are supposed to be reporting in each
day.For cases where a certain channel does not check in, I want to add a column that indicates the comm failure. So,
for
>
>
>
> object_id             channel                check-in date                     comm failure
>
> **********************************************************
>
> 990                         1                              2014-12-01                          No
>
> 990                         1                              2014-12-02                          No
>
> 990                         2                              2014-12-01                          No
>
> 990                         2                              2014-12-02                          No
>
> 286                         2                              2014-12-01                          Yes
>
> 286                         2                              2014-12-02                          No
>
> 286                         5                              2014-12-01                          Yes
>
> 286                         5                              2014-12-02                          No
>
> 4507                       1                              2014-12-01                          No
>
> 4507                       1                              2014-12-02                          No
>
> 4507                       2                              2014-12-01                          No
>
> 4507                       2                              2014-12-02                          No
>
>
>
>
>
> I have been racking my mind for the better part of a day on how to do this. The thing is that I can do a right join
ofTABLE B on TABLE A, and this will populate the missing object ids and channels. However, this only works for a single 
>
>
>
> object_id             channel                check-in date                     comm failure
>
> **********************************************************
>
> 990                         1                              2014-12-01                          No
>
> 990                         1                              2014-12-02                          No
>
> 990                         2                              2014-12-01                          No
>
> 990                         2                              2014-12-02                          No
>
> 286                         2                                                                              Yes
>
> 286                         2                              2014-12-02                          No
>
> 286                         5                                                                              Yes
>
> 286                         5                              2014-12-02                          No
>
> 4507                       1                              2014-12-01                          No
>
> 4507                       1                              2014-12-02                          No
>
> 4507                       2                              2014-12-01                          No
>
> 4507                       2                              2014-12-02                          No
>
>
>
> I need to do a count of comm failures by day, so I need to populate the check-in date field. Please help!


Easiest would be to insert the missing values in your table, something like:

WITH RECURSIVE calendar (missing_date) AS (
    SELECT MAX(check_in_date) FROM table_a WHERE comm_failure = 'YES'
    UNION ALL
    SELECT missing_date + interval '1 day' FROM calendar
    WHERE missing_date < CURRENT_DATE
)
INSERT INTO table_a (object_id, channel, check_in_date, comm_failure)
SELECT b.object_id, b.channel. c.missing_date, 'YES'
FROM table_b b, calendar c
WHERE NOT EXISTS (
    SELECT 1
    FROM table_a a
    WHERE a.object_id = b.object_id
    AND a.channel = b.channel
    AND a.check_in_date = c.missing_date
);

That's off the top of my head, untested, etc, but I think I got that mostly right.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.



Re: Populating missing dates in postgresql data

От
Vincent Veyron
Дата:
On Thu, 26 Mar 2015 00:25:09 +0000
"Lavrenz, Steven M" <slavrenz@purdue.edu> wrote:

> I have a second table (TABLE B) with all of the object_ids and channels that are supposed to be reporting in each
day.For cases where a certain channel does not check in, I want to add a column that indicates the comm failure.  

Not sure if your context allows it, but if you can change your program's logic, it might be easier to :

-add a boolean field (e.g. 'checked') to table B, set to false
-whenever an object checks in do 'update table B set checked = true where object_id = X and channel = Y'
-run a cron job once a day that
    -runs 'select * from B where checked = false' and stores results somewhere
    -resets B with 'update B set checked = false'





--
                    Salutations, Vincent Veyron

https://libremen.com/
Legal case, contract and insurance claim management software


Re: Populating missing dates in postgresql data

От
"David G. Johnston"
Дата:
On Fri, Mar 27, 2015 at 3:41 AM, Vincent Veyron <vv.lists@wanadoo.fr> wrote:
On Thu, 26 Mar 2015 00:25:09 +0000
"Lavrenz, Steven M" <slavrenz@purdue.edu> wrote:

> I have a second table (TABLE B) with all of the object_ids and channels that are supposed to be reporting in each day. For cases where a certain channel does not check in, I want to add a column that indicates the comm failure.

Not sure if your context allows it, but if you can change your program's logic, it might be easier to :

-add a boolean field (e.g. 'checked') to table B, set to false
-whenever an object checks in do 'update table B set checked = true where object_id = X and channel = Y'
-run a cron job once a day that
        -runs 'select * from B where checked = false' and stores results somewhere
        -resets B with 'update B set checked = false'


​This is a performance optimization that I would avoid at nearly any cost, and there are likely better ways to limit the processing scope without having to trust the a cron job runs daily​
 
​in order to not lose data.  Now, there are other points of failure here that are of a similar nature already but still adding one for (pre-mature...) optimization doesn't seem like a good move.

David J.

Re: Populating missing dates in postgresql data

От
Vincent Veyron
Дата:
On Fri, 27 Mar 2015 08:33:36 -0700
"David G. Johnston" <david.g.johnston@gmail.com> wrote:

> ​This is a performance optimization that I would avoid at nearly any cost,
> and there are likely better ways to limit the processing scope without
> having to trust the a cron job runs daily​

mmmhh... it's not so much performance optimization as query simplification, but I agree it adds a maintenance problem.

I never heard of problems with cron not running though, is this really a concern?


--
                    Salutations, Vincent Veyron

https://legalcase.libremen.com/
Legal case, contract and insurance claim management software