Обсуждение: Joining one-to-one and one-to-many tables

Поиск
Список
Период
Сортировка

Joining one-to-one and one-to-many tables

От
Jeff Ross
Дата:
I'm stumped about the best was to retrieve the most recent entry in a
one-to-many type of table and combine it with a more standard query
that joins one-to-one.

I have defined these:

jross@wykids localhost# \d trainer_dates
                                    Table "public.trainer_dates"
        Column       |  Type   |                             Modifiers

--------------------+---------+-----------------------------------------
  tr_date_id         | integer | not null default
nextval('trainer_dates_tr_date_id_seq'::regclass)
  tr_date_short_name | text    | not null
  tr_date_name       | text    | not null
  tr_date_active     | boolean | default false
Indexes:
     "trainer_dates_pkey" PRIMARY KEY, btree (tr_date_id)
Referenced by:
     TABLE "trainers_trainer_dates" CONSTRAINT
"trainers_trainer_dates_trs_tr_date_id_fkey" FOREIGN KEY (trs_tr_date_id)
REFERENCES trainer_dates(tr_date_id)

and

jross@wykids localhost# \d trainers_trainer_dates
                       Table "public.trainers_trainer_dates"
         Column        |            Type             |          Modifiers

----------------------+-----------------------------+----------------------
  trs_tr_date_pp_id    | integer                     |
  trs_tr_date_tr_id    | integer                     |
  trs_tr_date_id       | integer                     |
  trs_tr_date          | date                        | default ('now'::text)::date
  trs_tr_date_recorded | timestamp without time zone | default now()
Foreign-key constraints:
     "trainers_trainer_dates_trs_tr_date_id_fkey" FOREIGN KEY (trs_tr_date_id)
REFERENCES trainer_dates(tr_date_id)
     "trainers_trainer_dates_trs_tr_date_pp_id_fkey" FOREIGN KEY
(trs_tr_date_pp_id) REFERENCES people(pp_id)
     "trainers_trainer_dates_trs_tr_date_tr_id_fkey" FOREIGN KEY
(trs_tr_date_tr_id) REFERENCES trainers(tr_id)

Here are the records in trainer_dates:

jross@wykids localhost# select * from trainer_dates;
  tr_date_id |     tr_date_short_name     |    tr_date_name    | tr_date_active
------------+----------------------------+--------------------+----------------
           1 | tr_active_date             | Active             | t
           2 | tr_inactive_date           | Inactive           | t
           3 | tr_destroyed_date          | Destroyed          | t
           4 | tr_pending_date            | Pending            | t
           5 | tr_waiting_for_update_date | Waiting for Update | t
           6 | tr_last_updated_date       | Last Updated Date  | t
           7 | tr_application_date        | Application Date   | t
           8 | tr_denied_date             | Denied             | f
           9 | tr_approved_date           | Approved           | f
(9 rows)

Here is a sample of the data that might be in trainers_trainer_dates:

jross@wykids localhost# select trs_tr_date, trs_tr_date_recorded, tr_date_name
from trainers_trainer_dates join trainer_dates on trs_tr_date_id = tr_date_id
where trs_tr_date_tr_id = 1099 order by trs_tr_date_recorded desc;

  trs_tr_date |    trs_tr_date_recorded    |   tr_date_name
-------------+----------------------------+-------------------
  2010-03-11  | 2010-03-11 09:49:42.736914 | Pending
  2009-12-23  | 2009-12-23 01:00:00        | Inactive
  2009-12-23  | 2009-12-23 00:00:00        | Last Updated Date
  2002-03-21  | 2002-03-21 00:00:00        | Application Date
(4 rows)

(Most of the older trs_tr_date_recorded will not have a full timestamp--this
is a part of a table restructuring and the old table just kept a date.  As I
move data into the new tables I add one hour to the timestamp of the current
status to make sure everything sorts correctly.)

The problem comes when I try to offer results to a query that says something
like "show me all trainers whose current status is Pending".  This query has
to join three tables, a people table with demographics, a trainers table that
holds a trainer id and some notes and the trainers_trainer_dates table.  The
people and trainers table have a one-to-one correlation, while the
trainers_trainer_dates has a one-to-many relationship.

This query, for example, yields 2 results for the trainer referenced above,
instead of just his Pending record:

SELECT
   pp_id,
   pp_trainer_id,
   name,
   tr_status,
   max(trs_tr_date_recorded)
FROM
  (
     SELECT
     pp_id,
     pp_trainer_id,
     pp_last_name || ', ' || pp_first_name as name,
     trs_tr_date_id,
     tr_date_name as tr_status,
     trs_tr_date_recorded
     FROM people
     JOIN trainers_trainer_dates on pp_id = trs_tr_date_pp_id
     JOIN trainer_dates on tr_date_id = trs_tr_date_id WHERE
     trs_tr_date_id NOT IN (
       SELECT tr_date_id from trainer_dates WHERE
         tr_date_name in ('Last Updated Date','Application Date')
       )
     GROUP BY pp_id, pp_trainer_id, pp_last_name, pp_first_name,
       trs_tr_date_id, tr_date_name, trs_tr_date_recorded
     ) as foo2
   WHERE pp_id in (
     SELECT pp_id from people WHERE
       pp_trainer_id IS NOT NULL and pp_provisional_p = 'f'
     INTERSECT
     SELECT trs_tr_date_pp_id from (
       SELECT DISTINCT on (trs_tr_date_pp_id) trs_tr_date_pp_id,
         trs_tr_date_id from trainers_trainer_dates
         order by trs_tr_date_pp_id asc, trs_tr_date_recorded desc
       ) as foo
     JOIN trainer_dates on tr_date_id = trs_tr_date_id WHERE
     tr_date_name = 'Pending'
   )
   GROUP BY pp_id, pp_trainer_id, name, tr_status
   ORDER BY name ASC LIMIT 20 OFFSET 0;

jross@wykids localhost# \e
-[ RECORD 1 ]-+---------------------------
pp_id         | 2790
pp_trainer_id | 1099
name          | Allen, Bryan
tr_status     | Pending
max           | 2010-03-11 09:49:42.736914
-[ RECORD 2 ]-+---------------------------
pp_id         | 2790
pp_trainer_id | 1099
name          | Allen, Bryan
tr_status     | Inactive
max           | 2009-12-23 01:00:00


For my test database, this is the only "Pending" record but in the live
database there will be many trainers with Pending status.  I've tried an
initial SELECT DISTINCT ON (pp_id) pp_id but the results of the query are then
piped into a "datatable" that allows sorting on any column so I ran into a
hitch in the gitalong there.

I'm sure there's a better way to accomplish what I'm trying to get here
(especially since this really doesn't quite *work*) but I can't see it.  This
type of query has become important for me to get because more and more we are
moving to keeping things in the history type of table so we can keep a
progression of events.

Cluesticks or links welcome!

Thanks!

Jeff Ross

Re: Joining one-to-one and one-to-many tables

От
"Garrett Murphy"
Дата:
Jeff:

I may not fully understand the situation or the data you're trying to retrieve, but if you're trying to get "show me
alltrainers whose current status is Pending", I would go backwards to how you're doing it.  If you're only concerned
aboutthose records with tr_date_name = Pending, then you only need to query for those specific records and join to
trainers_trainer_dates.

SELECT
   pp_id,
   pp_trainer_id,
   name,
   tr_status,
   trs_tr_date_recorded
FROM
    trainer_dates
    JOIN trainers_trainer_dates ON tr_date_id = trs_tr_date_id
    JOIN people ON trs_tr_date_pp_id = pp_id
WHERE 
    tr_status='Pending' 





-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Jeff Ross
Sent: Thursday, March 11, 2010 11:37 AM
To: PostgreSQL
Subject: [GENERAL] Joining one-to-one and one-to-many tables

I'm stumped about the best was to retrieve the most recent entry in a 
one-to-many type of table and combine it with a more standard query
that joins one-to-one.

I have defined these:

jross@wykids localhost# \d trainer_dates
                                    Table "public.trainer_dates"
        Column       |  Type   |                             Modifiers 

--------------------+---------+-----------------------------------------
  tr_date_id         | integer | not null default 
nextval('trainer_dates_tr_date_id_seq'::regclass)
  tr_date_short_name | text    | not null
  tr_date_name       | text    | not null
  tr_date_active     | boolean | default false
Indexes:
     "trainer_dates_pkey" PRIMARY KEY, btree (tr_date_id)
Referenced by:
     TABLE "trainers_trainer_dates" CONSTRAINT 
"trainers_trainer_dates_trs_tr_date_id_fkey" FOREIGN KEY (trs_tr_date_id) 
REFERENCES trainer_dates(tr_date_id)

and

jross@wykids localhost# \d trainers_trainer_dates
                       Table "public.trainers_trainer_dates"
         Column        |            Type             |          Modifiers 

----------------------+-----------------------------+----------------------
  trs_tr_date_pp_id    | integer                     |
  trs_tr_date_tr_id    | integer                     |
  trs_tr_date_id       | integer                     |
  trs_tr_date          | date                        | default ('now'::text)::date
  trs_tr_date_recorded | timestamp without time zone | default now()
Foreign-key constraints:
     "trainers_trainer_dates_trs_tr_date_id_fkey" FOREIGN KEY (trs_tr_date_id) 
REFERENCES trainer_dates(tr_date_id)
     "trainers_trainer_dates_trs_tr_date_pp_id_fkey" FOREIGN KEY 
(trs_tr_date_pp_id) REFERENCES people(pp_id)
     "trainers_trainer_dates_trs_tr_date_tr_id_fkey" FOREIGN KEY 
(trs_tr_date_tr_id) REFERENCES trainers(tr_id)

Here are the records in trainer_dates:

jross@wykids localhost# select * from trainer_dates;
  tr_date_id |     tr_date_short_name     |    tr_date_name    | tr_date_active
------------+----------------------------+--------------------+----------------
           1 | tr_active_date             | Active             | t
           2 | tr_inactive_date           | Inactive           | t
           3 | tr_destroyed_date          | Destroyed          | t
           4 | tr_pending_date            | Pending            | t
           5 | tr_waiting_for_update_date | Waiting for Update | t
           6 | tr_last_updated_date       | Last Updated Date  | t
           7 | tr_application_date        | Application Date   | t
           8 | tr_denied_date             | Denied             | f
           9 | tr_approved_date           | Approved           | f
(9 rows)

Here is a sample of the data that might be in trainers_trainer_dates:

jross@wykids localhost# select trs_tr_date, trs_tr_date_recorded, tr_date_name 
from trainers_trainer_dates join trainer_dates on trs_tr_date_id = tr_date_id 
where trs_tr_date_tr_id = 1099 order by trs_tr_date_recorded desc;

  trs_tr_date |    trs_tr_date_recorded    |   tr_date_name
-------------+----------------------------+-------------------
  2010-03-11  | 2010-03-11 09:49:42.736914 | Pending
  2009-12-23  | 2009-12-23 01:00:00        | Inactive
  2009-12-23  | 2009-12-23 00:00:00        | Last Updated Date
  2002-03-21  | 2002-03-21 00:00:00        | Application Date
(4 rows)

(Most of the older trs_tr_date_recorded will not have a full timestamp--this 
is a part of a table restructuring and the old table just kept a date.  As I 
move data into the new tables I add one hour to the timestamp of the current 
status to make sure everything sorts correctly.)

The problem comes when I try to offer results to a query that says something 
like "show me all trainers whose current status is Pending".  This query has 
to join three tables, a people table with demographics, a trainers table that 
holds a trainer id and some notes and the trainers_trainer_dates table.  The 
people and trainers table have a one-to-one correlation, while the 
trainers_trainer_dates has a one-to-many relationship.

This query, for example, yields 2 results for the trainer referenced above, 
instead of just his Pending record:

SELECT
   pp_id,
   pp_trainer_id,
   name,
   tr_status,
   max(trs_tr_date_recorded)
FROM
  (
     SELECT
     pp_id,
     pp_trainer_id,
     pp_last_name || ', ' || pp_first_name as name,
     trs_tr_date_id,
     tr_date_name as tr_status,
     trs_tr_date_recorded
     FROM people
     JOIN trainers_trainer_dates on pp_id = trs_tr_date_pp_id
     JOIN trainer_dates on tr_date_id = trs_tr_date_id WHERE
     trs_tr_date_id NOT IN (
       SELECT tr_date_id from trainer_dates WHERE
         tr_date_name in ('Last Updated Date','Application Date')
       )
     GROUP BY pp_id, pp_trainer_id, pp_last_name, pp_first_name,
       trs_tr_date_id, tr_date_name, trs_tr_date_recorded
     ) as foo2
   WHERE pp_id in (
     SELECT pp_id from people WHERE
       pp_trainer_id IS NOT NULL and pp_provisional_p = 'f'
     INTERSECT
     SELECT trs_tr_date_pp_id from (
       SELECT DISTINCT on (trs_tr_date_pp_id) trs_tr_date_pp_id,
         trs_tr_date_id from trainers_trainer_dates
         order by trs_tr_date_pp_id asc, trs_tr_date_recorded desc
       ) as foo
     JOIN trainer_dates on tr_date_id = trs_tr_date_id WHERE
     tr_date_name = 'Pending'
   )
   GROUP BY pp_id, pp_trainer_id, name, tr_status
   ORDER BY name ASC LIMIT 20 OFFSET 0;

jross@wykids localhost# \e
-[ RECORD 1 ]-+---------------------------
pp_id         | 2790
pp_trainer_id | 1099
name          | Allen, Bryan
tr_status     | Pending
max           | 2010-03-11 09:49:42.736914
-[ RECORD 2 ]-+---------------------------
pp_id         | 2790
pp_trainer_id | 1099
name          | Allen, Bryan
tr_status     | Inactive
max           | 2009-12-23 01:00:00


For my test database, this is the only "Pending" record but in the live 
database there will be many trainers with Pending status.  I've tried an 
initial SELECT DISTINCT ON (pp_id) pp_id but the results of the query are then 
piped into a "datatable" that allows sorting on any column so I ran into a 
hitch in the gitalong there.

I'm sure there's a better way to accomplish what I'm trying to get here 
(especially since this really doesn't quite *work*) but I can't see it.  This 
type of query has become important for me to get because more and more we are 
moving to keeping things in the history type of table so we can keep a 
progression of events.

Cluesticks or links welcome!

Thanks!

Jeff Ross

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: Joining one-to-one and one-to-many tables

От
Jeff Ross
Дата:
Garrett Murphy wrote:
> Jeff:
>
> I may not fully understand the situation or the data you're trying to retrieve, but if you're trying to get "show me
alltrainers whose current status is Pending", I would go backwards to how you're doing it.  If you're only concerned
aboutthose records with tr_date_name = Pending, then you only need to query for those specific records and join to
trainers_trainer_dates.
>
> SELECT
>    pp_id,
>    pp_trainer_id,
>    name,
>    tr_status,
>    trs_tr_date_recorded
> FROM
>     trainer_dates
>     JOIN trainers_trainer_dates ON tr_date_id = trs_tr_date_id
>     JOIN people ON trs_tr_date_pp_id = pp_id
> WHERE
>     tr_status='Pending'
>
>

Thanks for the reply.  This doesn't quite do what I need but I may have tried
to oversimplify my question too much.

What I need is to be able to join the two one-to-one tables with  several
one-to-many tables and retrieve the most recent entry from each based on the
most recent timestamp for that record.

In my example, I used the sample query of "Show me all trainers who's current
status is "Pending", but that could also be "show me all trainers" and include
in that their status, their education level, their trainer level, and so on.
Each of those three are held in a separate table that contains a pointer to
the person, to their trainer record, to a level or status id, and a  timestamp.

I would have thought that using a subselect to first retrieve the latest value
  from the one-to-many tables would do it but I keep getting results returning
more than one value, as below where I got two results instead of just one.

I think I have found a solution, though, on page 324 of Joe Celko's SQL for
Smarties.  If I re-write the query pulling the latest status for all trainers
as this:

SELECT trs_tr_date_pp_id, tr_date_name FROM trainers_trainer_dates as ttd1
JOIN trainer_dates on tr_date_id = trs_tr_date_id
WHERE NOT EXISTS (
   SELECT * FROM trainers_trainer_dates as ttd2 where
     ttd1.trs_tr_date_pp_id = ttd2.trs_tr_date_pp_id and
     ttd1.trs_tr_date_recorded < ttd2.trs_tr_date_recorded;

I get a list of all trainers and their most current status and I can then
further restrict it to whatever status as needed.

I knew I was making it overly complex but I sure needed a nudge in the right
direction to figure this out.

Jeff

>
>
> -----Original Message-----
> From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Jeff Ross
> Sent: Thursday, March 11, 2010 11:37 AM
> To: PostgreSQL
> Subject: [GENERAL] Joining one-to-one and one-to-many tables
>
> I'm stumped about the best was to retrieve the most recent entry in a
> one-to-many type of table and combine it with a more standard query
> that joins one-to-one.
>
> I have defined these:
>
> jross@wykids localhost# \d trainer_dates
>                                     Table "public.trainer_dates"
>         Column       |  Type   |                             Modifiers
>
> --------------------+---------+-----------------------------------------
>   tr_date_id         | integer | not null default
> nextval('trainer_dates_tr_date_id_seq'::regclass)
>   tr_date_short_name | text    | not null
>   tr_date_name       | text    | not null
>   tr_date_active     | boolean | default false
> Indexes:
>      "trainer_dates_pkey" PRIMARY KEY, btree (tr_date_id)
> Referenced by:
>      TABLE "trainers_trainer_dates" CONSTRAINT
> "trainers_trainer_dates_trs_tr_date_id_fkey" FOREIGN KEY (trs_tr_date_id)
> REFERENCES trainer_dates(tr_date_id)
>
> and
>
> jross@wykids localhost# \d trainers_trainer_dates
>                        Table "public.trainers_trainer_dates"
>          Column        |            Type             |          Modifiers
>
> ----------------------+-----------------------------+----------------------
>   trs_tr_date_pp_id    | integer                     |
>   trs_tr_date_tr_id    | integer                     |
>   trs_tr_date_id       | integer                     |
>   trs_tr_date          | date                        | default ('now'::text)::date
>   trs_tr_date_recorded | timestamp without time zone | default now()
> Foreign-key constraints:
>      "trainers_trainer_dates_trs_tr_date_id_fkey" FOREIGN KEY (trs_tr_date_id)
> REFERENCES trainer_dates(tr_date_id)
>      "trainers_trainer_dates_trs_tr_date_pp_id_fkey" FOREIGN KEY
> (trs_tr_date_pp_id) REFERENCES people(pp_id)
>      "trainers_trainer_dates_trs_tr_date_tr_id_fkey" FOREIGN KEY
> (trs_tr_date_tr_id) REFERENCES trainers(tr_id)
>
> Here are the records in trainer_dates:
>
> jross@wykids localhost# select * from trainer_dates;
>   tr_date_id |     tr_date_short_name     |    tr_date_name    | tr_date_active
> ------------+----------------------------+--------------------+----------------
>            1 | tr_active_date             | Active             | t
>            2 | tr_inactive_date           | Inactive           | t
>            3 | tr_destroyed_date          | Destroyed          | t
>            4 | tr_pending_date            | Pending            | t
>            5 | tr_waiting_for_update_date | Waiting for Update | t
>            6 | tr_last_updated_date       | Last Updated Date  | t
>            7 | tr_application_date        | Application Date   | t
>            8 | tr_denied_date             | Denied             | f
>            9 | tr_approved_date           | Approved           | f
> (9 rows)
>
> Here is a sample of the data that might be in trainers_trainer_dates:
>
> jross@wykids localhost# select trs_tr_date, trs_tr_date_recorded, tr_date_name
> from trainers_trainer_dates join trainer_dates on trs_tr_date_id = tr_date_id
> where trs_tr_date_tr_id = 1099 order by trs_tr_date_recorded desc;
>
>   trs_tr_date |    trs_tr_date_recorded    |   tr_date_name
> -------------+----------------------------+-------------------
>   2010-03-11  | 2010-03-11 09:49:42.736914 | Pending
>   2009-12-23  | 2009-12-23 01:00:00        | Inactive
>   2009-12-23  | 2009-12-23 00:00:00        | Last Updated Date
>   2002-03-21  | 2002-03-21 00:00:00        | Application Date
> (4 rows)
>
> (Most of the older trs_tr_date_recorded will not have a full timestamp--this
> is a part of a table restructuring and the old table just kept a date.  As I
> move data into the new tables I add one hour to the timestamp of the current
> status to make sure everything sorts correctly.)
>
> The problem comes when I try to offer results to a query that says something
> like "show me all trainers whose current status is Pending".  This query has
> to join three tables, a people table with demographics, a trainers table that
> holds a trainer id and some notes and the trainers_trainer_dates table.  The
> people and trainers table have a one-to-one correlation, while the
> trainers_trainer_dates has a one-to-many relationship.
>
> This query, for example, yields 2 results for the trainer referenced above,
> instead of just his Pending record:
>
> SELECT
>    pp_id,
>    pp_trainer_id,
>    name,
>    tr_status,
>    max(trs_tr_date_recorded)
> FROM
>   (
>      SELECT
>      pp_id,
>      pp_trainer_id,
>      pp_last_name || ', ' || pp_first_name as name,
>      trs_tr_date_id,
>      tr_date_name as tr_status,
>      trs_tr_date_recorded
>      FROM people
>      JOIN trainers_trainer_dates on pp_id = trs_tr_date_pp_id
>      JOIN trainer_dates on tr_date_id = trs_tr_date_id WHERE
>      trs_tr_date_id NOT IN (
>        SELECT tr_date_id from trainer_dates WHERE
>          tr_date_name in ('Last Updated Date','Application Date')
>        )
>      GROUP BY pp_id, pp_trainer_id, pp_last_name, pp_first_name,
>        trs_tr_date_id, tr_date_name, trs_tr_date_recorded
>      ) as foo2
>    WHERE pp_id in (
>      SELECT pp_id from people WHERE
>        pp_trainer_id IS NOT NULL and pp_provisional_p = 'f'
>      INTERSECT
>      SELECT trs_tr_date_pp_id from (
>        SELECT DISTINCT on (trs_tr_date_pp_id) trs_tr_date_pp_id,
>          trs_tr_date_id from trainers_trainer_dates
>          order by trs_tr_date_pp_id asc, trs_tr_date_recorded desc
>        ) as foo
>      JOIN trainer_dates on tr_date_id = trs_tr_date_id WHERE
>      tr_date_name = 'Pending'
>    )
>    GROUP BY pp_id, pp_trainer_id, name, tr_status
>    ORDER BY name ASC LIMIT 20 OFFSET 0;
>
> jross@wykids localhost# \e
> -[ RECORD 1 ]-+---------------------------
> pp_id         | 2790
> pp_trainer_id | 1099
> name          | Allen, Bryan
> tr_status     | Pending
> max           | 2010-03-11 09:49:42.736914
> -[ RECORD 2 ]-+---------------------------
> pp_id         | 2790
> pp_trainer_id | 1099
> name          | Allen, Bryan
> tr_status     | Inactive
> max           | 2009-12-23 01:00:00
>
>
> For my test database, this is the only "Pending" record but in the live
> database there will be many trainers with Pending status.  I've tried an
> initial SELECT DISTINCT ON (pp_id) pp_id but the results of the query are then
> piped into a "datatable" that allows sorting on any column so I ran into a
> hitch in the gitalong there.
>
> I'm sure there's a better way to accomplish what I'm trying to get here
> (especially since this really doesn't quite *work*) but I can't see it.  This
> type of query has become important for me to get because more and more we are
> moving to keeping things in the history type of table so we can keep a
> progression of events.
>
> Cluesticks or links welcome!
>
> Thanks!
>
> Jeff Ross
>