Обсуждение: filtering based on table of start/end times
Hi, At first glance, this seemed simple to implement, but this is giving me a bit of a headache. Say we have a table as follows: CREATE TABLE voltage_series ( voltage_record_id integer NOT NULL DEFAULT nextval('voltage_series_logger_record_id_seq'::regclass), "time" timestamp withouttime zone NOT NULL, voltage numeric, CONSTRAINT voltage_series_pkey PRIMARY KEY (voltage_record_id)); So it contains a time series of voltage measurements. Now suppose we have another table of start/end times that we'd like to use to filter out (or keep) records in voltage_series: CREATE TABLE voltage_log ( record_id integer NOT NULL DEFAULT nextval('voltage_log_record_id_seq'::regclass), time_beg timestamp without time zoneNOT NULL, time_end timestamp without time zone NOT NULL, CONSTRAINT voltage_log_pkey PRIMARY KEY (record_id)); where each record represents start/end times where the voltage measurement should be removed/kept. The goal is to retrieve the records in voltage_series that are not included in any of the periods defined by the start/end times in voltage_log. I've looked at the OVERLAPS operator, but it's not evident to me whether that is the best approach. Any tips would be appreciated. Cheers, -- Seb
<br /><div class="moz-cite-prefix">On 11/7/2014 3:12 PM, Seb wrote:<br /></div><blockquote cite="mid:8761eqbwip.fsf@net82.ceos.umanitoba.ca"type="cite"><pre wrap="">Hi, At first glance, this seemed simple to implement, but this is giving me a bit of a headache. Say we have a table as follows: CREATE TABLE voltage_series ( voltage_record_id integer NOT NULL DEFAULT nextval('voltage_series_logger_record_id_seq'::regclass), "time" timestamp withouttime zone NOT NULL, voltage numeric, CONSTRAINT voltage_series_pkey PRIMARY KEY (voltage_record_id)); So it contains a time series of voltage measurements. Now suppose we have another table of start/end times that we'd like to use to filter out (or keep) records in voltage_series: CREATE TABLE voltage_log ( record_id integer NOT NULL DEFAULT nextval('voltage_log_record_id_seq'::regclass), time_beg timestamp without time zoneNOT NULL, time_end timestamp without time zone NOT NULL, CONSTRAINT voltage_log_pkey PRIMARY KEY (record_id)); where each record represents start/end times where the voltage measurement should be removed/kept. The goal is to retrieve the records in voltage_series that are not included in any of the periods defined by the start/end times in voltage_log. I've looked at the OVERLAPS operator, but it's not evident to me whether that is the best approach. Any tips would be appreciated. Cheers, </pre></blockquote><br /> Hello Seb,<br /><br /> Any reason this won't work for you?<br /><br /><blockquote><tt>SELECT *</tt><br/><tt> FROM voltage_series</tt><br /><tt> WHERE time NOT IN (SELECT DISTINCT time FROM voltage_series, voltage_logWHERE time BETWEEN time_beg AND time_end);</tt><br /></blockquote><br /> Might not be the fastest way to do it,if the tables are large. Apologies if I've not understood your question properly.<br /><br /> Regards,<br /> Bryan<br/><br /><br /><br /><pre class="moz-signature" cols="72">-- ______________ Postdoctoral Researcher GA Cooperative Fish & Wildlife Research Unit Warnell School of Forestry & Natural Resources University of Georgia Athens 30602-2152</pre>
I already sent this but used a wrong address. Sorry for the mess. On 07.11.2014 21:12, Seb wrote: > Hi, > > At first glance, this seemed simple to implement, but this is giving me > a bit of a headache. > > Say we have a table as follows: > > CREATE TABLE voltage_series > ( > voltage_record_id integer NOT NULL DEFAULT nextval('voltage_series_logger_record_id_seq'::regclass), > "time" timestamp without time zone NOT NULL, > voltage numeric, > CONSTRAINT voltage_series_pkey PRIMARY KEY (voltage_record_id)); > > So it contains a time series of voltage measurements. Now suppose we > have another table of start/end times that we'd like to use to filter > out (or keep) records in voltage_series: > > CREATE TABLE voltage_log > ( > record_id integer NOT NULL DEFAULT nextval('voltage_log_record_id_seq'::regclass), > time_beg timestamp without time zone NOT NULL, > time_end timestamp without time zone NOT NULL, > CONSTRAINT voltage_log_pkey PRIMARY KEY (record_id)); > > where each record represents start/end times where the voltage > measurement should be removed/kept. The goal is to retrieve the records > in voltage_series that are not included in any of the periods defined by > the start/end times in voltage_log. > > I've looked at the OVERLAPS operator, but it's not evident to me whether > that is the best approach. Any tips would be appreciated. > > Cheers, > Something like this should work: SELECT * FROM voltage_series AS vs LEFT JOIN voltage_log vl ON vs.time BETWEEN vl.time_beg AND vl.time_end WHERE vl.id IS NULL This is untested, but I think it should work. greetings Tim
On Sat, 08 Nov 2014 11:26:53 +0100, Tim Schumacher <tim@bandenkrieg.hacked.jp> wrote: > I already sent this but used a wrong address. Sorry for the mess. > On 07.11.2014 21:12, Seb wrote: >> Hi, >> At first glance, this seemed simple to implement, but this is giving >> me a bit of a headache. >> Say we have a table as follows: >> CREATE TABLE voltage_series ( voltage_record_id integer NOT NULL >> DEFAULT nextval('voltage_series_logger_record_id_seq'::regclass), >> "time" timestamp without time zone NOT NULL, voltage numeric, >> CONSTRAINT voltage_series_pkey PRIMARY KEY (voltage_record_id)); >> So it contains a time series of voltage measurements. Now suppose we >> have another table of start/end times that we'd like to use to filter >> out (or keep) records in voltage_series: >> CREATE TABLE voltage_log ( record_id integer NOT NULL DEFAULT >> nextval('voltage_log_record_id_seq'::regclass), time_beg timestamp >> without time zone NOT NULL, time_end timestamp without time zone NOT >> NULL, CONSTRAINT voltage_log_pkey PRIMARY KEY (record_id)); >> where each record represents start/end times where the voltage >> measurement should be removed/kept. The goal is to retrieve the >> records in voltage_series that are not included in any of the periods >> defined by the start/end times in voltage_log. >> I've looked at the OVERLAPS operator, but it's not evident to me >> whether that is the best approach. Any tips would be appreciated. >> Cheers, > Something like this should work: > SELECT * FROM voltage_series AS vs LEFT JOIN voltage_log vl ON vs.time > BETWEEN vl.time_beg AND vl.time_end WHERE vl.id IS NULL > This is untested, but I think it should work. Thank you all for your suggestions. The above proved very fast with the millions of records and several other joins involved. -- Seb
On 07.11.2014 21:12, Seb wrote: > Hi, > > At first glance, this seemed simple to implement, but this is giving me > a bit of a headache. > > Say we have a table as follows: > > CREATE TABLE voltage_series > ( > voltage_record_id integer NOT NULL DEFAULT nextval('voltage_series_logger_record_id_seq'::regclass), > "time" timestamp without time zone NOT NULL, > voltage numeric, > CONSTRAINT voltage_series_pkey PRIMARY KEY (voltage_record_id)); > > So it contains a time series of voltage measurements. Now suppose we > have another table of start/end times that we'd like to use to filter > out (or keep) records in voltage_series: > > CREATE TABLE voltage_log > ( > record_id integer NOT NULL DEFAULT nextval('voltage_log_record_id_seq'::regclass), > time_beg timestamp without time zone NOT NULL, > time_end timestamp without time zone NOT NULL, > CONSTRAINT voltage_log_pkey PRIMARY KEY (record_id)); > > where each record represents start/end times where the voltage > measurement should be removed/kept. The goal is to retrieve the records > in voltage_series that are not included in any of the periods defined by > the start/end times in voltage_log. > > I've looked at the OVERLAPS operator, but it's not evident to me whether > that is the best approach. Any tips would be appreciated. > > Cheers, > Something like this should work: SELECT * FROM voltage_series AS vs LEFT JOIN voltage_log vl ON vs.time BETWEEN vl.time_beg AND vl.time_end WHERE vl.id IS NULL This is untested, but I think it should work. greetings Tim
On Sun, 09 Nov 2014 12:43:22 -0600, Seb <spluque@gmail.com> wrote: > On Sat, 08 Nov 2014 11:26:53 +0100, > Tim Schumacher <tim@bandenkrieg.hacked.jp> wrote: [...] >> Something like this should work: >> SELECT * FROM voltage_series AS vs LEFT JOIN voltage_log vl ON >> vs.time BETWEEN vl.time_beg AND vl.time_end WHERE vl.id IS NULL >> This is untested, but I think it should work. > Thank you all for your suggestions. The above proved very fast with > the millions of records and several other joins involved. Sorry to come back with a related issue, which is proving troublesome. There's another log table, that looks just like voltage_log, but has an additional column with an integer indicating what problem occurred during the period: CREATE TABLE voltage_diagnostic_log ( record_id serial, time_beg timestamp without time zone NOT NULL, time_end timestamp without time zone NOT NULL, diagnosticinteger, CONSTRAINT voltage_log_pkey PRIMARY KEY (record_id)); So that a view can be built for the voltage_series table where its columns can be adjusted based on the diagnostic integer (there are other columns besides voltage in the actual table), if the time stamp falls within a period of the voltage_diagnostic_log table. To illustrate, the view needs to be able to have field definitions such as (pseudo-code): CASE WHEN diagnostic=1 THEN voltage * 0.88 ELSE voltage END AS voltage_corrected, CASE WHEN diagnostic=2 THEN pressure - 2.5 ELSE pressure END AS pressure_corrected, The problem is that each record in voltage_series can have several matching records in voltage_diagnostic_log. Any insights welcome! -- Seb
On Wed, 12 Nov 2014 15:49:57 -0600, Seb <spluque@gmail.com> wrote: > Sorry to come back with a related issue, which is proving troublesome. > There's another log table, that looks just like voltage_log, but has > an additional column with an integer indicating what problem occurred > during the period: > CREATE TABLE voltage_diagnostic_log ( record_id serial, time_beg > timestamp without time zone NOT NULL, time_end timestamp without time > zone NOT NULL, diagnostic integer, CONSTRAINT voltage_log_pkey PRIMARY > KEY (record_id)); [...] For posterity's sake, the only solution I was able to find was to first create a view with a separate boolean column for each diagnostic value via crosstab(). From there it was possible to use a WITH subquery to remove rows with a particular diagnostic value (as suggested previously), and then have the main SELECT statement do a second left join to the crosstab view so that it could make use of the rest of the boolean columns as sources for CASE statements for each field. Cheers, -- Seb