Обсуждение: Duplicates Processing
All, I've been provided a CSV file of parts that contains duplicates of properties (e.g. resistors have a wattage, tolerance, and temperature coefficient property) of those parts that differ by a manufacturer part number. What I'd like to do is to process this file and, upon encountering one of the duplicates, take that part with its new part number and move it to a part substitutes table. It seems like it should be pretty simple, but I can't seem to generate a query or a function to accomplish it. I'd greatly appreciate any insight or assistance with solving this problem. Thank you very much in advance. -- Gary Chambers
Gary Chambers <gwchamb@gmail.com> wrote: > I've been provided a CSV file of parts that contains duplicates of > properties (e.g. resistors have a wattage, tolerance, and temperature > coefficient property) of those parts that differ by a manufacturer > part number. What I'd like to do is to process this file and, upon > encountering one of the duplicates, take that part with its new part > number and move it to a part substitutes table. It seems like it > should be pretty simple, but I can't seem to generate a query or a > function to accomplish it. I'd greatly appreciate any insight or > assistance with solving this problem. Thank you very much in advance. You can - for example - create a query with a call to ROW_NUMBER() and then process the matching rows (untested): | INSERT INTO substitutes ([...]) | SELECT [...] FROM | (SELECT *, | ROW_NUMBER() OVER (PARTITION BY wattage, tolerance, temperature | ORDER BY part_number) AS RN | FROM parts) AS SubQuery | WHERE RN > 1; | DELETE FROM parts | WHERE primary_key IN | (SELECT primary_key FROM | (SELECT *, | ROW_NUMBER() OVER (PARTITION BY wattage, tolerance, temperature | ORDER BY part_number) AS RN | FROM parts) AS SubQuery | WHERE RN > 1); Tim
Tim, Thanks for taking the time to reply! > | INSERT INTO substitutes ([...]) > | SELECT [...] FROM > | (SELECT *, > | ROW_NUMBER() OVER (PARTITION BY wattage, tolerance, temperature > | ORDER BY part_number) AS RN > | FROM parts) AS SubQuery > | WHERE RN > 1; > | DELETE FROM parts > | WHERE primary_key IN > | (SELECT primary_key FROM > | (SELECT *, > | ROW_NUMBER() OVER (PARTITION BY wattage, tolerance, temperature > | ORDER BY part_number) AS RN > | FROM parts) AS SubQuery > | WHERE RN > 1); You have solved the problem precisely as I described it. In my haste to make the request for assistance, I omitted one critical piece of information that may call into question my data model. In its current state, my substitute parts table contains only the part number (the "new" one, so-to-speak), a foreign key reference to the original parts table, and some location data (which is also in the original parts table). Is there any advice you can offer in light of what I have just described? I apologize for the oversight. -- Gary Chambers
On 10/08/2010 01:42 PM, Gary Chambers wrote: > Tim, > > Thanks for taking the time to reply! > >> | INSERT INTO substitutes ([...]) >> | SELECT [...] FROM >> | (SELECT *, >> | ROW_NUMBER() OVER (PARTITION BY wattage, tolerance, temperature >> | ORDER BY part_number) AS RN >> | FROM parts) AS SubQuery >> | WHERE RN > 1; > >> | DELETE FROM parts >> | WHERE primary_key IN >> | (SELECT primary_key FROM >> | (SELECT *, >> | ROW_NUMBER() OVER (PARTITION BY wattage, tolerance, temperature >> | ORDER BY part_number) AS RN >> | FROM parts) AS SubQuery >> | WHERE RN > 1); > > You have solved the problem precisely as I described it. In my haste > to make the request for assistance, I omitted one critical piece of > information that may call into question my data model. In its current > state, my substitute parts table contains only the part number (the > "new" one, so-to-speak), a foreign key reference to the original parts > table, and some location data (which is also in the original parts > table). Is there any advice you can offer in light of what I have > just described? I apologize for the oversight. > > -- Gary Chambers > Perhaps a trade off between nullable fields and redundant types. If your original table simply had a nullable column called isReplacementFor, into which you place in the subsequent rows the id of the first instance found.
Rob, > Perhaps a trade off between nullable fields and redundant types. If > your original table simply had a nullable column called > isReplacementFor, into which you place in the subsequent rows the id of > the first instance found. Am I misunderstanding you when you're suggesting a table like: part_number INTEGER is_replacement_for INTEGER references part_number value INTEGER wattage FLOAT8 ... -- Gary Chambers
Yes. With this you can find all part numbers/supplies which match your value, wattage criteria in one table. Or exclude any which have a non-null is_replacement_for value. If you need to drop the "replaceable" variant, you choose which of the replacements to promote and update the others to match the new "replaceable". They're all instances of the same type of thing so in my view they ought to live in the same table. Also possible to maintain the replacement structure via a (self) join record with replacable/is_replacement_for tuples. You have a similar but slightly more involve maintenance issue of course. On 10/08/2010 02:42 PM, Gary Chambers wrote: > Rob, > >> Perhaps a trade off between nullable fields and redundant types. If >> your original table simply had a nullable column called >> isReplacementFor, into which you place in the subsequent rows the id of >> the first instance found. > > Am I misunderstanding you when you're suggesting a table like: > > part_number INTEGER > is_replacement_for INTEGER references part_number > value INTEGER > wattage FLOAT8 > ... > > -- Gary Chambers
Rob, > Yes. With this you can find all part numbers/supplies which match your > value, wattage criteria in one table. Or exclude any which have a > non-null is_replacement_for value. I understand -- thanks. I have received contradictory advice in a purely data modeling context. What about the null values that will be in the properties columns of the part? It would appear to be more applicable to an employee database where the columns are populated regardless and the "replacement_for" in the context of our discussion would be a self-reference to the employee's manager. No? Thanks again for your help. -- Gary Chambers
My understanding was that the values were in fact in the data of the "replacers". If not, you are correct. In this case the replacers are more like alias for the only instance you have. If the replacers are immutable by all means ship them off to some other table (where I suppose the become pointers to other suppliers of the type of thing holding the real data). Not sure I've ever met immutable data but there you go ;) And to your point of self-reference, it would be to a co-worker more than a manager. Managers are often not good replacements for workers. :) On 10/08/2010 04:12 PM, Gary Chambers wrote: > Rob, > >> Yes. With this you can find all part numbers/supplies which match your >> value, wattage criteria in one table. Or exclude any which have a >> non-null is_replacement_for value. > > I understand -- thanks. I have received contradictory advice in a > purely data modeling context. What about the null values that will be > in the properties columns of the part? It would appear to be more > applicable to an employee database where the columns are populated > regardless and the "replacement_for" in the context of our discussion > would be a self-reference to the employee's manager. No? > > Thanks again for your help. > > -- Gary Chambers
Rob, Thanks for your reply! > And to your point of self-reference, it would be to a co-worker more > than a manager. Managers are often not good replacements for workers. :) :) Absolutely! I was having a conversation over on #postgresql yesterday about this and, due to my inexperience with managing electronic components (and some of the database issues involved), I still have a few unanswered questions. I would like to create a master table of parts derived from the individual tables (e.g. resistors, capacitors, diodes, etc.).I have the rare opportunity to build this from theground, up and would like to ensure that I get it right. Thanks for any advice that you (or anyone) can offer. -- Gary Chambers
Gross generalization perhaps, but keep in mind what the over app/system needs of the components. Bounce those off you standard ER modeling instincts and vice versa and you have a chance! On 10/12/2010 08:19 AM, Gary Chambers wrote: > Rob, > > Thanks for your reply! > >> And to your point of self-reference, it would be to a co-worker more >> than a manager. Managers are often not good replacements for workers. :) > > :) Absolutely! > > I was having a conversation over on #postgresql yesterday about this > and, due to my inexperience with managing electronic components (and > some of the database issues involved), I still have a few unanswered > questions. I would like to create a master table of parts derived > from the individual tables (e.g. resistors, capacitors, diodes, etc.). > I have the rare opportunity to build this from the ground, up and > would like to ensure that I get it right. Thanks for any advice that > you (or anyone) can offer. > > -- Gary Chambers