Обсуждение: Enforce Symmetric Matrix

Поиск
Список
Период
Сортировка

Enforce Symmetric Matrix

От
Randy Westlund
Дата:
I have a table with N (over 35k) rows.  I need to compare each of
those elements to every other element in the list, which is the
handshake problem.

This results in a symmetric matrix (an adjacency matrix for an
undirected graph).  Because all relations are symmetric, I only need to
store the upper triangle of this matrix.  A ~ B and B ~ A are the same.

I need some advice on how to implement this.  At first, I thought that
I'd only store the upper triangle, resulting in N^2 / 2 rows of the
form:

id1   id2   value
---- ----   -----
A     B      1
A     C      2
A     D      3
B     C      4
B     D      5
C     D      6

Where value is the result of an expensive computation, and with the
constraint that id1 < id2.

But there are problems.  To get a list of all things compared to B, I
have to look in both columns.  Complicated queries with selects nested
inside updates become very difficult or impossible to do in one go.  I'd
also need to index both columns, which seems like a waste.

So I thought maybe I'd just store the full matrix, with A,B and B,A
having the same data

id1   id2   value
---- ----   -----
A     B       1
A     C       2
A     D       3
B     A       1
B     C       4
B     D       5
C     A       2
C     B       4
C     D       6

With the constraint that id1 != id2 because I don't need the diagonal.

The table is twice as big, but still O(n^2).  This would let me get the
list of all things compared to B with SELECT ... WHERE id1 = B, which is
super easy.

My problem here is that I'm not sure how to enforce the rule that A,B
and B,A have the same value.  I want to use a rule or trigger such that
when row A,B is updated or inserted, row B,A is updated or inserted with
the same date.  But then it would get called recursively, and that
doesn't work.

For my application, the total number of items I have (N) will be growing
over time and this table will need to be updated accordingly.

This seems like a problem that many people must have come across before,
but I've been strangely unable to find advice online.  Any
recommendations?

Вложения

Re: Enforce Symmetric Matrix

От
David G Johnston
Дата:
SELECT l.id, u.id, func(l.id, u.id)
FROM ids l CROSS JOIN ids u
WHERE l.id < u.id

Depending on whether you always update a known pair, or instead invalidate
all rows where either id is a given value, you can use various means to
manage the resultant materialized view.  Triggers or interface functions
mainly.

Without calling the value function you would also know, at any given time,
whether a given pair is present.  The usefulness of this depends on how
real-time you need the updates to be; which is a trade-off with performance
during changes.

Adding a simple limit on the two ids sub-queries, and doing the incremental
add in a loop, you can appropriately scale the updates to limit memory usage
during the bulk load phase.  Likely ongoing updates will not have the same
requirement since you only have N updates instead of N^2/2; but can be done
all the same.

SELECT LID, UID, FUNC(lid, uid) FROM
SELECT CASE WHEN c1 < c2 THEN c1 ELSE c2 END AS LID , CASE WHEN c1 < c2 THEN
c2 ELSE c1 END AS UID FROM
SELECT * FROM -> WHERE c1 <> c2
SELECT :newval AS c1, ids.id AS c2 FROM ids

David J.



--
View this message in context: http://postgresql.1045698.n5.nabble.com/Enforce-Symmetric-Matrix-tp5803064p5803126.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: Enforce Symmetric Matrix

От
Randy Westlund
Дата:
On Wed, May 07, 2014 at 04:07:22PM -0500, John McKown wrote:
> On Wed, May 7, 2014 at 3:36 PM, Randy Westlund <rwestlun@gmail.com> wrote:
>
> > I have a table with N (over 35k) rows.  I need to compare each of
> > those elements to every other element in the list, which is the
> > handshake problem.
> >
> > This results in a symmetric matrix (an adjacency matrix for an
> > undirected graph).  Because all relations are symmetric, I only need to
> > store the upper triangle of this matrix.  A ~ B and B ~ A are the same.
> >
> > I need some advice on how to implement this.
>
> Just a crazy idea, but have you considered something like the below:
>
> create view view1 as select id1 as v_id1, id2 as v_id2, value from abovetbl
> union select id2, id1, value from abovetbl;
>
> The above basically "expands" the "small matrix" table to have the same
> values that the "full matrix" table would have. because id1, id2 has the
> same "value" as id2, id1 .
>
> You can now create an index on "view1" like:
> create index view1_index on view1(v_id1);
>
> To find all the values for "B", you can now:
> select * from view1 where v_id1='B';
>
> The view does not take up much space. But the index on the view might. But
> only about as much space as an index on the "full matrix" table would.  Oh,
> I changed the column names in the view so that it would hopefully not be as
> confusing has having multiple "id1" and "id2" column names with different
> meanings.
>
> I'm not a real heavy, but this may be of some use.
> --
> There is nothing more pleasant than traveling and meeting new people!
> Genghis Khan
>
> Maranatha! <><
> John McKown

This was exactly what I needed, thank you.  Everything is working now :)


Вложения