Обсуждение: connecting multiple INSERT CTEs to same record?
Hello, I'm looking for a way to insert items to multiple tables (connected with foreign keys) using CTEs. I found few similar questions on stack-overflow and the mailing list, but no solution. Please consider this contrived example: A table of students, and a table of classes they are in: ==== create temp table students( id serial primary key, name varchar); create temp table classes( id serial primary key, student_id int not null references students(id), subject varchar); ==== And, I am given a list of new students and their classes (from an external source, imported into a temp table): ==== create temp table new_data ( name varchar, subject varchar); insert into new_data values ('John','Math'), ('Jane','Physics'), ('Moe','Science'), ('John','English'); -- different student with same name ==== I want to first create new 'students' record, and then create a corresponding 'classes' record. For that - I need both the newly assigned 'id', and they corresponding source record from 'new_data'. A trivial usage of "insert ... returning *" doesn't work - it can't return columns that were not directly inserted (and the 'subject' string wasn't inserted): === with new_students as ( insert into students(name) select name from new_data returning * ) insert into classes(student_id, subject) select new_students.id, -- how to get the subject matching to --- the newly inserted student? ?????? from new_students ; === I found a work-around, but it relies on a flimsy assumption - that within this transaction, the CTIDs for the 'new_data' and the newly inserted 'students' rows maintain the same order. Using this assumption, I calculate a unique value for each row with 'row_number() over (order by ctid)', assuming that the order stays the same, then join the tables to match the new student to its 'subject': === with new_data_with_order as ( select row_number() over (order by ctid) as new_order, * from new_data ) , new_students as ( insert into students(name) select name from new_data_with_order returning ctid,* ) , new_students_with_order as ( select row_number() over (order by ctid) as new_order, * from new_students ) , merged_data as ( -- Is this correct?? -- it is based on the assumption that the order -- of inserted rows into 'students' (and hence, their CTID order) -- is the same as the order of the rows in 'new_data' -- and their CTID. select * from new_students_with_order join new_data_with_order on new_students_with_order.new_order = new_data_with_order.new_order ) -- for troubleshooting: -- select * from merged_data ; insert into classes(student_id, subject) select id, -- merged_data.id is the newly assigned student.id primary key. subject from merged_data ; === So my question is: Is this assumption about CTID maintaining order (within the transaction) correct? Even when considering concurrency? Even if in strange coincidence 'VACCUUM' is running in parallel to this query? Or, Is there another way to achieve this? consider that this is a contrived example, in my use-case I need to update several tables. Of course it can be done on the application-level, one-by-one inserting new 'student' then inserting its 'classes' - but I prefer to avoid that. Sadly, I can't assume the student name is unique, so I can't "join" on it. Thanks! - Assaf Gordon
On Tuesday, October 19, 2021, Assaf Gordon <assafgordon@gmail.com> wrote:
Sadly, I can't assume the student name is unique, so I can't "join" on it.
Instead of assigning a unique identifier to student after inserting it into the table, assign the identifier first. Generally this is done by using “nextval()” explicitly in a manner similar to how it is used when computing the default for the serial column. Then you can return/join on the identifier reliably and not worry about duplicate student names.
David J.
On 2021-10-19 1:13 p.m., David G. Johnston wrote: > > Instead of assigning a unique identifier to student after inserting it > into the table, assign the identifier first. Generally this is done by > using “nextval()” Aha! Such elegant solution! Seems obvious in hindsight, but I just couldn't figure it out nor find any mentions to it online. For others who might stumble upon this thread in the future, the solution becomes: ==== with -- pre-assign unique IDs for each record, -- without inserting them to the table yet. new_data_with_ids as ( select nextval(pg_get_serial_sequence('students','id')) as new_student_id, new_data.* from new_data ) , -- Now insert the new names, with their pre-assigned IDs new_students as ( insert into students(id,name) select new_student_id, name from new_data_with_ids returning * -- optional ) , -- And use the IDs for other tables, too new_classes as ( insert into classes(student_id, subject) select new_student_id, subject from new_data_with_ids returning * --optional ) -- return the new IDs with the data select * from new_data_with_ids ; === Thank you! Regards, - Assaf Gordon