Discussion: cube operations
Hi,

I have an array column that holds 12 real values. These values are the coordinates of a substance in 12 dimensions. My main need is to find substances similar to a particular compound. I can do that by computing the differences against every array in the table, but the table has millions of rows, so I need some kind of higher-dimensional index. I have read about the cube operations in PostgreSQL; can they be extended to 12 dimensions, or something like that?

Thanks,
Abhang
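The full-table comparison described in this message can be sketched as a linear scan. This is only an illustration: the Euclidean metric, the `(id, coords)` row layout, and the function names are assumptions, since the post does not say how the differences are computed.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_similar(rows, query, k=3):
    """Return the k rows closest to `query` as (id, distance) pairs.

    `rows` is an iterable of (id, coords) tuples (hypothetical layout).
    Every row is visited once, so the cost grows linearly with table size.
    """
    scored = ((euclidean(coords, query), rid) for rid, coords in rows)
    return [(rid, dist) for dist, rid in heapq.nsmallest(k, scored)]
```

Because every row is touched on every query, this is the baseline that any index-based approach would have to beat.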
2007/5/16, ABHANG RANE <arane@indiana.edu>:
> Hi,
> I have a array column which has 12 real values in it. Basically these
> values represent co-ordinates in 12 dimensions for a substance. My main
> need is to find substances similar to a particular compound. Now I can
> do by calculating differences with each array in the whole table. But
> the table has millions of rows. So I need some kinda higher dimensional
> index. I have read about the cube operation in postgre, can it be
> extended to 12 dimensions or something like that.

Don't know if this helps, but have a look at intarray:
http://developer.postgresql.org/cvsweb.cgi/pgsql/contrib/intarray/

If you feel brave, you could take this code and try to write some proximity- or similarity-checking functions in C to speed up the calculations.

Also consider representing the values as integers, since integer operations are much faster.

--
Filip Rembiałkowski
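The suggestion to store the values as integers could look roughly like the sketch below. The scale factor of 1000 (three decimal places) and both function names are assumptions made for illustration; the right scale depends on the precision the data actually needs, and whether integer arithmetic is faster depends on where the comparison runs.

```python
SCALE = 1000  # assumed fixed-point scale: keeps three decimal places


def quantize(coords, scale=SCALE):
    """Convert real-valued coordinates to scaled integers."""
    return [round(x * scale) for x in coords]


def squared_distance_int(a, b):
    """Squared Euclidean distance using integer arithmetic only.

    Skipping the square root does not change which row is closest,
    because the square root is monotonic.
    """
    return sum((x - y) ** 2 for x, y in zip(a, b))
```

Comparing squared distances is enough for ranking, so no floating-point operation is needed once the rows are quantized.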
ABHANG RANE wrote:

> I have a array column which has 12 real values in it. Basically
> these values represent co-ordinates in 12 dimensions for a
> substance. My main need is to find substances similar to a
> particular compound. Now I can do by calculating differences with
> each array in the whole table. But the table has millions of rows.
> So I need some kinda higher dimensional index.

Is there any particular reason you're using an array? If every row has all twelve values, I'd just make them columns. Then I could use a multi-column index.

> I have read about the cube operation in postgre, can it be extended
> to 12 dimensions or something like that.

I have no experience with CUBE, but I think it's just a kind of summarization aggregate.

It sounds like you want the nearest neighbor(s) of your "particular compound". You might want to read about that:

http://en.wikipedia.org/wiki/Nearest_neighbor_search

- John Burger
G63
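As one concrete instance of nearest-neighbor search, here is a minimal k-d tree sketch, a space-partitioning structure of the kind that article covers. It is not something PostgreSQL provides in this form; the `(id, coords)` layout and all names are assumptions. Note also that with 12 dimensions the pruning step tends to discard fewer branches than it does in 2 or 3 dimensions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two coordinate lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_kdtree(points, depth=0):
    """Build a k-d tree from a list of (id, coords) tuples."""
    if not points:
        return None
    axis = depth % len(points[0][1])  # cycle through the dimensions
    points = sorted(points, key=lambda p: p[1][axis])
    mid = len(points) // 2  # median point becomes this node
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }


def nearest(node, query, best=None):
    """Return (squared_distance, id) of the stored point closest to query."""
    if node is None:
        return best
    pid, coords = node["point"]
    d = sq_dist(coords, query)
    if best is None or d < best[0]:
        best = (d, pid)
    axis = node["axis"]
    diff = query[axis] - coords[axis]
    if diff < 0:
        near, far = node["left"], node["right"]
    else:
        near, far = node["right"], node["left"]
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the
    # best match so far; otherwise that whole subtree is skipped.
    if diff * diff < best[0]:
        best = nearest(far, query, best)
    return best
```

The tree is built once with `build_kdtree(rows)` and then queried with `nearest(tree, query)`; the result should match what a linear scan over the same rows returns.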
Hacking contrib/intarray could help you. You need to add a function that returns the number of overlapping elements.

Oleg

On Wed, 16 May 2007, John D. Burger wrote:

> It sounds like you want the nearest neighbor(s) of your "particular
> compound". You might want to read about that:
>
> http://en.wikipedia.org/wiki/Nearest_neighbor_search

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
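The function Oleg describes, one that counts the elements two integer arrays have in common, might behave like the sketch below. This only illustrates the intended semantics in Python; it is not code from contrib/intarray, the name `overlap_count` is hypothetical, and treating each array as a set (duplicates ignored) is an assumption.

```python
def overlap_count(a, b):
    """Number of distinct integers that appear in both arrays."""
    return len(set(a) & set(b))
```

A higher count would then mean a closer match, for example when each array holds the quantized coordinates or feature codes of one substance.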
Hi,

But with 12 columns and a multicolumn index, won't this slow down the search? I mean, in general, is retrieving 12 columns through a multicolumn index slower or faster than using an index on a 12-element array?

Thanks,
Abhang

Quoting "John D. Burger" <john@mitre.org>:

> Is there any particular reason you're using an array? If every row
> has all twelve values, I'd just make them columns. Then I could use
> a multi-column index.