Обсуждение: Detect which sum of cartesian product (+ any combination of n records in tables) exceeds a value

Поиск
Список
Период
Сортировка
Hi all,

I hope someone can helps me....


I have a problem detecting a sum of cartesian product of tables.

----------------------------
Test case:

//CREATE TABLES

create table t1 (
id  serial,
field_1 integer);

create table t2 (
id  serial,
field_1 integer);

create table t3 (
id  serial,
field_1 integer);

create table t4 (
id  serial,
field_1 integer);


//FILL TABLES

insert into t1 (field_1) select cast(random()*10 as integer) from 
generate_series(1,10);
insert into t2 (field_1) select cast(random()*10 as integer) from 
generate_series(1,10);
insert into t3 (field_1) select cast(random()*10 as integer) from 
generate_series(1,10);
insert into t4 (field_1) select cast(random()*10 as integer) from 
generate_series(1,10);

--------------------

Example: i have 4 tables with fields, i would detect which combination 
of field_1 in any table exceed a value (eg. 35).

Simple, ugly & slow but simple:

select * from t1, t2,t3,t4  where t1.field_1 + t2.field_1 + t3.field_1 + 
t4.field_1 >35

It works.



Now my question: i would determine which combination on field_1 of 
t1,t2,t3 plus a combination(any)  of 2 records on field_1 of  t4, 
exceeds a value (eg. 35)

It should be something like  t1.field_1 + t2.field_1 + t3.field_1 + ( 
any combination of  2 records of t4.field_1) > 35


Suppose i have these records in tables (field_1), for simple explain of 
my problem:

t1 = 1
t2 = 5
t3 = 4
t4 = 1,3,4

the combination of 2 record on t4.field_1 should be:

1+5+4 +  ( 1+3)
1+5+4 +  ( 1+4)
1+5+4 +  ( 3+1)
1+5+4 +  ( 3+4)
1+5+4 +  ( 4+1)
1+5+4 +  ( 4+3)


How to do it???


This is a static test case with a static (2 records) problem, in my 
production db it could be any combination (2,3,4,5+ records ) of field_1 
of any table.



Hope I was clear,

Best regards and thanks in advance,

Agharta










Re: Detect which sum of cartesian product (+ any combination of n records in tables) exceeds a value

От
"David G. Johnston"
Дата:
<div dir="ltr"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><span
style="font-family:arial,sans-serif">OnThu, Mar 19, 2015 at 9:30 AM, agharta </span><span dir="ltr"
style="font-family:arial,sans-serif"><<ahref="mailto:agharta82@gmail.com"
target="_blank">agharta82@gmail.com</a>></span><spanstyle="font-family:arial,sans-serif"> wrote:</span><br
/></div><divclass="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px#ccc solid;padding-left:1ex"><br /> It should be something like  t1.field_1 + t2.field_1 +
t3.field_1+ ( any combination of  2 records of t4.field_1) > 35<br /><span class="HOEnZb"><font color="#888888"><br
/></font></span></blockquote></div><br/></div><div class="gmail_extra"><div class="gmail_default"
style="font-family:arial,helvetica,sans-serif">​Icould probably brute-force write such a query in maybe a half-hour.  I
likelywould not be alive if I tried executing it on any non-trivial sized database though.</div><div
class="gmail_default"style="font-family:arial,helvetica,sans-serif"><br /></div><div class="gmail_default"
style="font-family:arial,helvetica,sans-serif">Asan algorithm:</div><div class="gmail_default"
style="font-family:arial,helvetica,sans-serif"><br/></div><div class="gmail_default"
style="font-family:arial,helvetica,sans-serif">Createtwo relations (temp tables/views/materialized views), one for
t1/t2/t3and one for t4/t4 each having a single row for every potential combination of rows.  Each table would
contributetwo values, the content of "field_1" and the primary key of the corresponding table.  The new PK would be a
compositeof all the contributing PKs</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br
/></div><divclass="gmail_default" style="font-family:arial,helvetica,sans-serif">For each relation, if the sum of the
valuecolumns is > 35 then every single row from the other table will provide a match.  This is your first
output.</div><divclass="gmail_default" style="font-family:arial,helvetica,sans-serif"><br /></div><div
class="gmail_default"style="font-family:arial,helvetica,sans-serif">Cross Join the two relations, after removing those
ineach that were matched above, and sum together all 5 fields.  This is your second output.</div><div
class="gmail_default"style="font-family:arial,helvetica,sans-serif"><br /></div><div class="gmail_default"
style="font-family:arial,helvetica,sans-serif">UnionAll the two outputs together and you have your result.</div><div
class="gmail_default"style="font-family:arial,helvetica,sans-serif"><br /></div><div class="gmail_default"
style="font-family:arial,helvetica,sans-serif">Itcan be done in one step but this at least gives you a prayer of
executingin reasonable time for meaningfully sized datasets.  You can just write the second part and avoid the union
untilyour data warrants the more complex, but likely faster, setup.</div><div class="gmail_default"
style="font-family:arial,helvetica,sans-serif"><br/></div><div class="gmail_default"
style="font-family:arial,helvetica,sans-serif">DavidJ.</div></div></div> 
<div class="moz-cite-prefix"><br /> On 03/19/2015 06:05 PM, David G. Johnston wrote:<br /></div><blockquote
cite="mid:CAKFQuwY27=A_Y3HjndKhAQzZtg4Y3wAHfDCXvEiHL6fMciq_ng@mail.gmail.com"type="cite"><div dir="ltr"><div
class="gmail_extra"><br/></div><div class="gmail_extra"><div class="gmail_default"
style="font-family:arial,helvetica,sans-serif">​Ilikely would not be alive if I tried executing it on any non-trivial
sizeddatabase though.</div></div></div></blockquote><br /> Me too :) !<br /><br /><blockquote
cite="mid:CAKFQuwY27=A_Y3HjndKhAQzZtg4Y3wAHfDCXvEiHL6fMciq_ng@mail.gmail.com"type="cite"><div dir="ltr"><div
class="gmail_extra"><divclass="gmail_default" style="font-family:arial,helvetica,sans-serif"><br /></div><div
class="gmail_default"style="font-family:arial,helvetica,sans-serif">As an algorithm:</div><div class="gmail_default"
style="font-family:arial,helvetica,sans-serif"><br/></div><div class="gmail_default"
style="font-family:arial,helvetica,sans-serif">Createtwo relations (temp tables/views/materialized views), one for
t1/t2/t3and one for t4/t4 each having a single row for every potential combination of rows.  Each table would
contributetwo values, the content of "field_1" and the primary key of the corresponding table.  The new PK would be a
compositeof all the contributing PKs</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br
/></div><divclass="gmail_default" style="font-family:arial,helvetica,sans-serif">For each relation, if the sum of the
valuecolumns is > 35 then every single row from the other table will provide a match.  This is your first
output.</div><divclass="gmail_default" style="font-family:arial,helvetica,sans-serif"><br /></div><div
class="gmail_default"style="font-family:arial,helvetica,sans-serif">Cross Join the two relations, after removing those
ineach that were matched above, and sum together all 5 fields.  This is your second output.</div><div
class="gmail_default"style="font-family:arial,helvetica,sans-serif"><br /></div><div class="gmail_default"
style="font-family:arial,helvetica,sans-serif">UnionAll the two outputs together and you have your result.</div><div
class="gmail_default"style="font-family:arial,helvetica,sans-serif"><br /></div><div class="gmail_default"
style="font-family:arial,helvetica,sans-serif">Itcan be done in one step but this at least gives you a prayer of
executingin reasonable time for meaningfully sized datasets.  You can just write the second part and avoid the union
untilyour data warrants the more complex, but likely faster, setup.</div><div class="gmail_default"
style="font-family:arial,helvetica,sans-serif"><br/></div><div class="gmail_default"
style="font-family:arial,helvetica,sans-serif">DavidJ.</div></div></div></blockquote><br /> You're right, this should
bethe fastest implementation possible, but cross/cartesian matching is very slow with a huge amount of data (it is
natural).<br /><br /> I think that a simple & dynamic (t4/t4/t4/t4... n times) solution is not possible, as 9.4 PG
version.Correct me if i am wrong.<br /><br /><div id="gt-src-tools"><div id="gt-src-tools-l"><div id="gt-input-tool"
style="display:inline-block;"><div id="itamenu"><span class="ita-kd-inputtools-div"></span></div></div></div></div><div
class="almost_half_cell"id="gt-res-content"><div dir="ltr" style="zoom:1"><span class="short_text" id="result_box"
lang="en"><spanclass="hps alt-edited">I hoped</span> <span class="hps">that there was</span> <span class="hps">a
magic-trick-functionthat would resolve the problem. Nope. :(<br /><br /> I need to review & rewrite my
db/applicationto solve the problem in another way. <br /><br /></span></span><br /><span class="short_text"
id="result_box"lang="en"><span class="hps"><span class="short_text" id="result_box" lang="en"><span class="hps">I owe
you</span><span class="hps">a beer</span></span>, thanks a lot for your suggestions.<br /><br /><br /> Cheers,<br /><br
/>Agharta<br /><br /><br /></span></span></div></div><br /> 
<div class="moz-text-html" lang="x-unicode"><div class="moz-cite-prefix"><br /> On 03/19/2015 06:05 PM, David G.
Johnstonwrote:<br /></div><blockquote cite="mid:CAKFQuwY27=A_Y3HjndKhAQzZtg4Y3wAHfDCXvEiHL6fMciq_ng@mail.gmail.com"
type="cite"><divdir="ltr"><div class="gmail_extra"><br /></div><div class="gmail_extra"><div class="gmail_default"
style="font-family:arial,helvetica,sans-serif">​Ilikely would not be alive if I tried executing it on any non-trivial
sizeddatabase though.</div></div></div></blockquote><br /> Me too :) !<br /><br /><blockquote
cite="mid:CAKFQuwY27=A_Y3HjndKhAQzZtg4Y3wAHfDCXvEiHL6fMciq_ng@mail.gmail.com"type="cite"><div dir="ltr"><div
class="gmail_extra"><divclass="gmail_default" style="font-family:arial,helvetica,sans-serif"><br /></div><div
class="gmail_default"style="font-family:arial,helvetica,sans-serif">As an algorithm:</div><div class="gmail_default"
style="font-family:arial,helvetica,sans-serif"><br/></div><div class="gmail_default"
style="font-family:arial,helvetica,sans-serif">Createtwo relations (temp tables/views/materialized views), one for
t1/t2/t3and one for t4/t4 each having a single row for every potential combination of rows.  Each table would
contributetwo values, the content of "field_1" and the primary key of the corresponding table.  The new PK would be a
compositeof all the contributing PKs</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br
/></div><divclass="gmail_default" style="font-family:arial,helvetica,sans-serif">For each relation, if the sum of the
valuecolumns is > 35 then every single row from the other table will provide a match.  This is your first
output.</div><divclass="gmail_default" style="font-family:arial,helvetica,sans-serif"><br /></div><div
class="gmail_default"style="font-family:arial,helvetica,sans-serif">Cross Join the two relations, after removing those
ineach that were matched above, and sum together all 5 fields.  This is your second output.</div><div
class="gmail_default"style="font-family:arial,helvetica,sans-serif"><br /></div><div class="gmail_default"
style="font-family:arial,helvetica,sans-serif">UnionAll the two outputs together and you have your result.</div><div
class="gmail_default"style="font-family:arial,helvetica,sans-serif"><br /></div><div class="gmail_default"
style="font-family:arial,helvetica,sans-serif">Itcan be done in one step but this at least gives you a prayer of
executingin reasonable time for meaningfully sized datasets.  You can just write the second part and avoid the union
untilyour data warrants the more complex, but likely faster, setup.</div><div class="gmail_default"
style="font-family:arial,helvetica,sans-serif"><br/></div><div class="gmail_default"
style="font-family:arial,helvetica,sans-serif">DavidJ.</div></div></div></blockquote><br /> You're right, this should
bethe fastest implementation possible, but cross/cartesian matching is very slow with a huge amount of data (it is
natural).<br /><br /> I think that a simple & dynamic (t4/t4/t4/t4... n times) solution is not possible, as 9.4 PG
version.Correct me if i am wrong.<br /><br /><div id="gt-src-tools"><div id="gt-src-tools-l"><div id="gt-input-tool"
style="display:inline-block;"><div id="itamenu"><span class="ita-kd-inputtools-div"></span></div></div></div></div><div
class="almost_half_cell"id="gt-res-content"><div dir="ltr" style="zoom:1"><span class="short_text" id="result_box"
lang="en"><spanclass="hps alt-edited">I hoped</span> <span class="hps">that there was</span> <span class="hps">a
magic-trick-functionthat would resolve the problem. Nope. :(<br /><br /> I need to review & rewrite my
db/applicationto solve the problem in another way. <br /><br /></span></span><br /><span class="short_text"
id="result_box"lang="en"><span class="hps"><span class="short_text" id="result_box" lang="en"><span class="hps">I owe
you</span><span class="hps">a beer</span></span>, thanks a lot for your suggestions.<br /><br /><br /> Cheers,<br /><br
/>Agharta<br /><br /><br /></span></span></div></div><br /></div>