Обсуждение: Duplicates Processing

Поиск

Список

Период

Сортировка

Duplicates Processing

От

Gary Chambers

Дата:

08 октября 2010 г., 15:12:19

All,

I've been provided a CSV file of parts that contains duplicates of
properties (e.g. resistors have a wattage, tolerance, and temperature
coefficient property) of those parts that differ by a manufacturer
part number.  What I'd like to do is to process this file and, upon
encountering one of the duplicates, take that part with its new part
number and move it to a part substitutes table.  It seems like it
should be pretty simple, but I can't seem to generate a query or a
function to accomplish it.  I'd greatly appreciate any insight or
assistance with solving this problem.  Thank you very much in advance.

-- Gary Chambers

Re: Duplicates Processing

От

Tim Landscheidt

Дата:

08 октября 2010 г., 18:42:44

Gary Chambers <gwchamb@gmail.com> wrote:

> I've been provided a CSV file of parts that contains duplicates of
> properties (e.g. resistors have a wattage, tolerance, and temperature
> coefficient property) of those parts that differ by a manufacturer
> part number.  What I'd like to do is to process this file and, upon
> encountering one of the duplicates, take that part with its new part
> number and move it to a part substitutes table.  It seems like it
> should be pretty simple, but I can't seem to generate a query or a
> function to accomplish it.  I'd greatly appreciate any insight or
> assistance with solving this problem.  Thank you very much in advance.

You can - for example - create a query with a call to
ROW_NUMBER() and then process the matching rows (untested):

| INSERT INTO substitutes ([...])
|   SELECT [...] FROM
|     (SELECT *,
|             ROW_NUMBER() OVER (PARTITION BY wattage, tolerance, temperature
|                         ORDER BY part_number) AS RN
|      FROM parts) AS SubQuery
|   WHERE RN > 1;

| DELETE FROM parts
| WHERE primary_key IN
|   (SELECT primary_key FROM
|     (SELECT *,
|             ROW_NUMBER() OVER (PARTITION BY wattage, tolerance, temperature
|                                ORDER BY part_number) AS RN
|      FROM parts) AS SubQuery
|    WHERE RN > 1);

Tim

Re: Duplicates Processing

От

Gary Chambers

Дата:

08 октября 2010 г., 19:42:33

Tim,

Thanks for taking the time to reply!

> | INSERT INTO substitutes ([...])
> |   SELECT [...] FROM
> |     (SELECT *,
> |             ROW_NUMBER() OVER (PARTITION BY wattage, tolerance, temperature
> |                         ORDER BY part_number) AS RN
> |      FROM parts) AS SubQuery
> |   WHERE RN > 1;

> | DELETE FROM parts
> | WHERE primary_key IN
> |   (SELECT primary_key FROM
> |     (SELECT *,
> |             ROW_NUMBER() OVER (PARTITION BY wattage, tolerance, temperature
> |                                ORDER BY part_number) AS RN
> |      FROM parts) AS SubQuery
> |    WHERE RN > 1);

You have solved the problem precisely as I described it.  In my haste
to make the request for assistance, I omitted one critical piece of
information that may call into question my data model.  In its current
state, my substitute parts table contains only the part number (the
"new" one, so-to-speak), a foreign key reference to the original parts
table, and some location data (which is also in the original parts
table).  Is there any advice you can offer in light of what I have
just described? I apologize for the oversight.

-- Gary Chambers

Re: Duplicates Processing

От

Rob Sargent

Дата:

08 октября 2010 г., 20:07:20

On 10/08/2010 01:42 PM, Gary Chambers wrote:
> Tim,
> 
> Thanks for taking the time to reply!
> 
>> | INSERT INTO substitutes ([...])
>> |   SELECT [...] FROM
>> |     (SELECT *,
>> |             ROW_NUMBER() OVER (PARTITION BY wattage, tolerance, temperature
>> |                         ORDER BY part_number) AS RN
>> |      FROM parts) AS SubQuery
>> |   WHERE RN > 1;
> 
>> | DELETE FROM parts
>> | WHERE primary_key IN
>> |   (SELECT primary_key FROM
>> |     (SELECT *,
>> |             ROW_NUMBER() OVER (PARTITION BY wattage, tolerance, temperature
>> |                                ORDER BY part_number) AS RN
>> |      FROM parts) AS SubQuery
>> |    WHERE RN > 1);
> 
> You have solved the problem precisely as I described it.  In my haste
> to make the request for assistance, I omitted one critical piece of
> information that may call into question my data model.  In its current
> state, my substitute parts table contains only the part number (the
> "new" one, so-to-speak), a foreign key reference to the original parts
> table, and some location data (which is also in the original parts
> table).  Is there any advice you can offer in light of what I have
> just described? I apologize for the oversight.
> 
> -- Gary Chambers
> 

Perhaps a trade off between nullable fields and redundant types.  If
your original table simply had a nullable column called
isReplacementFor, into which you place in the subsequent rows the id of
the first instance found.

Re: Duplicates Processing

От

Gary Chambers

Дата:

08 октября 2010 г., 20:42:38

Rob,

> Perhaps a trade off between nullable fields and redundant types.  If
> your original table simply had a nullable column called
> isReplacementFor, into which you place in the subsequent rows the id of
> the first instance found.

Am I misunderstanding you when you're suggesting a table like:

part_number           INTEGER
is_replacement_for INTEGER references part_number
value                       INTEGER
wattage                   FLOAT8
...

-- Gary Chambers

Re: Duplicates Processing

От

Rob Sargent

Дата:

08 октября 2010 г., 21:34:56

Yes.  With this you can find all part numbers/supplies which match your
value, wattage criteria in one table. Or exclude any which have a
non-null is_replacement_for value.

If you need to drop the "replaceable" variant, you choose which of the
replacements to promote and update the others to match the new
"replaceable".  They're all instances of the same type of thing so in my
view they ought to live in the same table.

Also possible to maintain the replacement structure via a (self) join
record with replacable/is_replacement_for tuples.  You have a similar
but slightly more involve maintenance issue of course.

On 10/08/2010 02:42 PM, Gary Chambers wrote:
> Rob,
> 
>> Perhaps a trade off between nullable fields and redundant types.  If
>> your original table simply had a nullable column called
>> isReplacementFor, into which you place in the subsequent rows the id of
>> the first instance found.
> 
> Am I misunderstanding you when you're suggesting a table like:
> 
> part_number           INTEGER
> is_replacement_for INTEGER references part_number
> value                       INTEGER
> wattage                   FLOAT8
> ...
> 
> -- Gary Chambers

Re: Duplicates Processing

От

Gary Chambers

Дата:

08 октября 2010 г., 22:12:48

Rob,

> Yes.  With this you can find all part numbers/supplies which match your
> value, wattage criteria in one table. Or exclude any which have a
> non-null is_replacement_for value.

I understand -- thanks.  I have received contradictory advice in a
purely data modeling context.  What about the null values that will be
in the properties columns of the part?  It would appear to be more
applicable to an employee database where the columns are populated
regardless and the "replacement_for" in the context of our discussion
would be a self-reference to the employee's manager.  No?

Thanks again for your help.

-- Gary Chambers

Re: Duplicates Processing

От

Rob Sargent

Дата:

08 октября 2010 г., 23:52:06

My understanding was that the values were in fact in the data of the
"replacers".  If not, you are correct.  In this case the replacers are
more like alias for the only instance you have.

If the replacers are immutable by all means ship them off to some other
table (where I suppose the become pointers to other suppliers of the
type of thing holding the real data). Not sure I've ever met immutable
data but there you go ;)

And to your point of self-reference, it would be to a co-worker more
than a manager.  Managers are often not good replacements for workers. :)

On 10/08/2010 04:12 PM, Gary Chambers wrote:
> Rob,
> 
>> Yes.  With this you can find all part numbers/supplies which match your
>> value, wattage criteria in one table. Or exclude any which have a
>> non-null is_replacement_for value.
> 
> I understand -- thanks.  I have received contradictory advice in a
> purely data modeling context.  What about the null values that will be
> in the properties columns of the part?  It would appear to be more
> applicable to an employee database where the columns are populated
> regardless and the "replacement_for" in the context of our discussion
> would be a self-reference to the employee's manager.  No?
> 
> Thanks again for your help.
> 
> -- Gary Chambers

Re: Duplicates Processing

От

Gary Chambers

Дата:

12 октября 2010 г., 14:19:37

Rob,

Thanks for your reply!

> And to your point of self-reference, it would be to a co-worker more
> than a manager.  Managers are often not good replacements for workers. :)

:)  Absolutely!

I was having a conversation over on #postgresql yesterday about this
and, due to my inexperience with managing electronic components (and
some of the database issues involved), I still have a few unanswered
questions.  I would like to create a master table of parts derived
from the individual tables (e.g. resistors, capacitors, diodes, etc.).I have the rare opportunity to build this from
theground, up and 
would like to ensure that I get it right.  Thanks for any advice that
you (or anyone) can offer.

-- Gary Chambers

Re: Duplicates Processing

От

Rob Sargent

Дата:

12 октября 2010 г., 18:41:50

Gross generalization perhaps, but keep in mind what the over app/system
needs of the components.  Bounce those off you standard ER modeling
instincts and vice versa and you have a chance!

On 10/12/2010 08:19 AM, Gary Chambers wrote:
> Rob,
> 
> Thanks for your reply!
> 
>> And to your point of self-reference, it would be to a co-worker more
>> than a manager.  Managers are often not good replacements for workers. :)
> 
> :)  Absolutely!
> 
> I was having a conversation over on #postgresql yesterday about this
> and, due to my inexperience with managing electronic components (and
> some of the database issues involved), I still have a few unanswered
> questions.  I would like to create a master table of parts derived
> from the individual tables (e.g. resistors, capacitors, diodes, etc.).
>  I have the rare opportunity to build this from the ground, up and
> would like to ensure that I get it right.  Thanks for any advice that
> you (or anyone) can offer.
> 
> -- Gary Chambers

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Duplicates Processing