Обсуждение: how to group by similarity ?
Hi, I'm trying to get an idea about pg_trgrm. I created a GIST index on a text column in a table. Now I can filter the table with similarity(). How would I group the table so that it shows groups that have similarity () > x ? Lets say the table looks like this: id, txt 1, aa1 2, bb1 3, cc1 4, bb2 5, bb3 6, aa2 ... How would a select look like that shows: id, txt, group_id 1, aa1, 1, 6, aa2, 1, 2, bb1, 2, 4, bb2, 2, 5, bb3, 2, 3, cc1, 3
Andreas <maps.on@gmx.net> writes: > How would I group the table so that it shows groups that have > similarity () > x ? > > Lets say the table looks like this: > > id, txt > 1, aa1 > 2, bb1 > 3, cc1 > 4, bb2 > 5, bb3 > 6, aa2 > ... > > How would a select look like that shows: > > id, txt, group_id > 1, aa1, 1, > 6, aa2, 1, > 2, bb1, 2, > 4, bb2, 2, > 5, bb3, 2, > 3, cc1, 3 I could only come up with this convoluted query: WITH grp (t1, id, t2) AS ( SELECT t1.txt, t1.id, t2.txt FROM tbl t1 LEFT JOIN tbl t2 ON t2.txt > t1.txt WHERE t2.txt IS NULL OR similarity(t1.txt, t2.txt) > 0 ) SELECT t1, min(id) FROM ( SELECT t1, id FROM grp UNION ALL SELECT t2, id FROM grp WHERE t2 IS NOT NULL ) dummy GROUP BY t1 ORDER BY t1
Andreas <maps.on@gmx.net> writes: > How would I group the table so that it shows groups that have > similarity () > x ? > > Lets say the table looks like this: > > id, txt > 1, aa1 > 2, bb1 > 3, cc1 > 4, bb2 > 5, bb3 > 6, aa2 > ... > > How would a select look like that shows: > > id, txt, group_id > 1, aa1, 1, > 6, aa2, 1, > 2, bb1, 2, > 4, bb2, 2, > 5, bb3, 2, > 3, cc1, 3 The following query will do that, but it's convoluted: WITH grp (t1, id, t2) AS ( SELECT t1.txt, t1.id, t2.txt FROM tbt t1 LEFT JOIN tbt t2 ON t2.txt > t1.txt WHERE t2.txt IS NULL OR similarity(t1.txt, t2.txt) > 0 ) SELECT t1, min(id) FROM ( SELECT t1, id FROM grp UNION ALL SELECT t2, id FROM grp WHERE t2 IS NOT NULL ) dummy GROUP BY t1 ORDER BY t1
On Wed, Apr 25, 2012 at 8:34 PM, <hari.fuchs@gmail.com> wrote: > Andreas <maps.on@gmx.net> writes: > >> How would I group the table so that it shows groups that have >> similarity () > x ? >> >> Lets say the table looks like this: >> >> id, txt >> 1, aa1 >> 2, bb1 >> 3, cc1 >> 4, bb2 >> 5, bb3 >> 6, aa2 >> ... >> >> How would a select look like that shows: >> >> id, txt, group_id >> 1, aa1, 1, >> 6, aa2, 1, >> 2, bb1, 2, >> 4, bb2, 2, >> 5, bb3, 2, >> 3, cc1, 3 > Hey guys. I have a similar problem and I tried a couple of ways to solve this including the window function described in the answer to the original poster in this thread. The problem I am having is that even with a trigam index and a table with only 80,000 records the query takes forever to run. In both cases I ended the query manually and have no idea how long it would actually take to run. I have included the two queries below and am hoping somebody can give me a pointer on how to accomplish with a query that runs. Query 1 WITH grp (t1, id, t2) AS ( SELECT t1.raw_data, t1.id, t2.raw_data FROM schema.a t1 LEFT JOIN schema.a t2 ON t2.raw_data > t1.raw_data WHERE t2.raw_data IS NULL OR similarity(t1.raw_data, t2.raw_data) > .75 ) SELECT t1, min(id) FROM ( SELECT t1, id FROM grp UNION ALL SELECT t2, id FROM grp WHERE t2 IS NOT NULL ) dummy GROUP BY t1 ORDER BY t1 query 2 select similarity(a.raw_data,b.raw_data),* from schema.a a, schema.a b where similarity(a.raw_data,b.raw_data) > .75 and a.id != b.id