Обсуждение: connecting multiple INSERT CTEs to same record?

Поиск
Список
Период
Сортировка

connecting multiple INSERT CTEs to same record?

От
Assaf Gordon
Дата:
Hello,

I'm looking for a way to insert items to multiple tables (connected with 
foreign keys) using CTEs.

I found few similar questions on stack-overflow and the mailing list,
but no solution.

Please consider this contrived example:
A table of students, and a table of classes they are in:
====
   create temp table students(
         id serial primary key,
         name varchar);

   create temp table classes(
         id serial primary key,
         student_id int not null references students(id),
         subject varchar);
====

And,
I am given a list of new students and their classes (from an external 
source, imported into a temp table):

====
create temp table new_data (
         name varchar,
         subject varchar);

insert into new_data values
      ('John','Math'),
      ('Jane','Physics'),
      ('Moe','Science'),
      ('John','English'); -- different student with same name
====


I want to first create new 'students' record, and then create a
corresponding 'classes' record. For that - I need both the newly
assigned 'id', and they corresponding source record from 'new_data'.

A trivial usage of "insert ... returning *" doesn't work - it can't
return columns that were not directly inserted (and the 'subject' string 
wasn't inserted):

===
   with
      new_students as (
        insert into students(name)
        select name from new_data
        returning *
       )

    insert into classes(student_id, subject)
    select new_students.id,
            -- how to get the subject matching to
            --- the newly inserted student?
            ??????
   from new_students ;
===

I found a work-around, but it relies on a flimsy assumption - that 
within this transaction, the CTIDs for the 'new_data' and the newly
inserted 'students' rows maintain the same order.
Using this assumption, I calculate a unique value for each row with 
'row_number() over (order by ctid)',
assuming that the order stays the same,
then join the tables to match the new student to its 'subject':

===
with
   new_data_with_order as (
        select
            row_number() over (order by ctid) as new_order,
            *
        from new_data )
   ,
   new_students as (
       insert into students(name)
       select name from new_data_with_order
       returning ctid,*
       )
   ,
   new_students_with_order as (
        select
            row_number() over (order by ctid) as new_order,
            *
        from new_students
   )
   ,

   merged_data as (
      -- Is this correct??
      -- it is based on the assumption that the order
      -- of inserted rows into 'students' (and hence, their CTID order)
      -- is the same as the order of the rows in 'new_data'
      -- and their CTID.
      select *
      from new_students_with_order
      join new_data_with_order
      on
        new_students_with_order.new_order =
          new_data_with_order.new_order
   )

   -- for troubleshooting:
   -- select * from merged_data ;

   insert into classes(student_id, subject)
   select
      id, -- merged_data.id is the newly assigned student.id primary key.
      subject
   from merged_data ;

===


So my question is:
Is this assumption about CTID maintaining order (within the transaction) 
correct? Even when considering concurrency?
Even if in strange coincidence 'VACCUUM' is running in parallel to this 
query?

Or,
Is there another way to achieve this? consider that this is a contrived
example, in my use-case I need to update several tables.
Of course it can be done on the application-level, one-by-one inserting 
new 'student' then inserting its 'classes' - but I prefer to avoid that.

Sadly, I can't assume the student name is unique, so I can't "join" on it.


Thanks!
  - Assaf Gordon




Re: connecting multiple INSERT CTEs to same record?

От
"David G. Johnston"
Дата:
On Tuesday, October 19, 2021, Assaf Gordon <assafgordon@gmail.com> wrote:

Sadly, I can't assume the student name is unique, so I can't "join" on it.


Instead of assigning a unique identifier to student after inserting it into the table, assign the identifier first.  Generally this is done by using “nextval()” explicitly in a manner similar to how it is used when computing the default for the serial column.  Then you can return/join on the identifier reliably and not worry about duplicate student names.

David J.

Re: connecting multiple INSERT CTEs to same record?

От
Assaf Gordon
Дата:
On 2021-10-19 1:13 p.m., David G. Johnston wrote:
> 
> Instead of assigning a unique identifier to student after inserting it 
> into the table, assign the identifier first.  Generally this is done by 
> using “nextval()” 

Aha! Such elegant solution!

Seems obvious in hindsight, but I just couldn't figure it out nor find
any mentions to it online.

For others who might stumble upon this thread in the future,
the solution becomes:

====

  with
    -- pre-assign unique IDs for each record,
    -- without inserting them to the table yet.
    new_data_with_ids as (
      select
        nextval(pg_get_serial_sequence('students','id'))
             as new_student_id,
        new_data.*
      from new_data )
    ,

    -- Now insert the new names, with their pre-assigned IDs
    new_students as (
        insert into students(id,name)
        select new_student_id, name
        from new_data_with_ids
        returning * -- optional
        )
    ,

    -- And use the IDs for other tables, too
    new_classes as (
       insert into classes(student_id, subject)
       select new_student_id, subject
       from new_data_with_ids
       returning * --optional
    )

  -- return the new IDs with the data
  select * from new_data_with_ids ;

===

Thank you!

Regards,
  - Assaf Gordon