Discussion: subquery returning array
Hi! I have posted this as a user comment in section 9.17:

----
= ANY(array expression) doesn't work in the obvious way when the array expression is a subselect. For example:

  select * from stat3 where stat3.id = any ('{4,5,6,7}');

works (and returns the expected tuples). However,

  select * from stat3 where stat3.id = any (select stat3 from helix_request where id=11);

DOESN'T work and fails with an error: operator does not exist: integer = integer[]. The inner select returns EXACTLY ONE value, namely the same array as the literal in the first example. The solution is:

  select * from stat3 where (select stat3.id = any (stat3) from helix_request where id=11);

I thank Chris Kings-Lynne ("KL") for helping me out with this over IRC.
----

KL suggested mailing this question to the hackers list. The problem with this solution is that PostgreSQL uses a sequential scan for it:

  Seq Scan on stat3  (cost=0.00..40018.94 rows=3321 width=32) (actual time=0.112..75.911 rows=4 loops=1)
    Filter: (subplan)
    SubPlan
      ->  Index Scan using helix_request_pkey on helix_request  (cost=0.00..6.01 rows=1 width=32) (actual time=0.007..0.008 rows=1 loops=6756)
            Index Cond: (id = 11)
  Total runtime: 76.040 ms
  (6 rows)

If I write

  select * from stat3 where id in (4,5,6,7)

an index scan is used:

  Index Scan using stat3_pkey, stat3_pkey, stat3_pkey, stat3_pkey on stat3  (cost=0.00..23.94 rows=4 width=32) (actual time=0.066..0.091 rows=4 loops=1)
    Index Cond: ((id = 4) OR (id = 5) OR (id = 6) OR (id = 7))
  Total runtime: 0.164 ms

I'm going to have MANY queries of this kind, and a sequential scan for each such query will lead to quadratic performance - unacceptable for the amount of data I will have (millions of rows).

Is there yet another way of making WHERE field = ANY (subselect returning an array) work? Or of making Postgres use an index?
OK, I know the suggestion from the manual: it's usually bad database design to search through arrays, and a separate table is better. However, converting {4,5,6,7} into a table would require a table with two columns, e.g. (1,4), (1,5), (1,6), (1,7), where the first column would be used to identify which array the element belongs to. This additional column is a waste of space.

KL mentioned a package for converting an array into a set of tuples. Where can I find it? Also, what is the limit on the number of elements in the IN (...) condition before the database resorts to a sequential scan?

PS: Please reply also via e-mail, as I'm not a regular subscriber of this list. Thanks.
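For reference, the array-to-rows conversion asked about here became a built-in in later PostgreSQL releases (8.4 and up) as the unnest() function; on older servers the contrib modules discussed below are needed. A sketch, assuming the stat3/helix_request schema from the message above:

```sql
-- Expand the int[] column from helix_request into a set of rows, then
-- filter with IN; the planner can drive this with stat3's primary-key
-- index instead of a sequential scan over the whole table.
SELECT s.*
FROM stat3 s
WHERE s.id IN (SELECT unnest(h.stat3)
               FROM helix_request h
               WHERE h.id = 11);
```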
Zeljko Vrba wrote:
> Hi! I have posted this as a user comment in section 9.17:
>
> ----
> = ANY(array expression) doesn't work in the obvious way when the array
> expression is a subselect. For example:
>
>   select * from stat3 where stat3.id = any ('{4,5,6,7}');
>
> works (and returns the expected tuples). However,
>
>   select * from stat3 where stat3.id = any (select stat3 from
>   helix_request where id=11);
>
> DOESN'T work and fails with an error: operator does not exist:
> integer = integer[]. The inner select returns EXACTLY ONE value, namely
> the same array as the literal in the first example.

No, it doesn't - hence the error. What it returns is a SET containing ONE TUPLE, which contains ONE ARRAY. Here the "any" ranges over the SET, not the array, so it tries to compare the outer integer with the inner array of integers.

> The solution is:
>
>   select * from stat3 where (select stat3.id = any (stat3) from
>   helix_request where id=11);
>
> I thank Chris Kings-Lynne ("KL") for helping me out with this over IRC.
> ----
>
> KL suggested mailing this question to the hackers list. The problem with
> this solution is that PostgreSQL uses a sequential scan for it:
[snip]
> I'm going to have MANY queries of this kind, and a sequential scan for
> each such query will lead to quadratic performance - unacceptable for
> the amount of data I will have (millions of rows).
>
> Is there yet another way of making WHERE field = ANY (subselect
> returning an array) work? Or of making Postgres use an index?
>
> OK, I know the suggestion from the manual: it's usually bad database
> design to search through arrays, and a separate table is better.
> However, converting {4,5,6,7} into a table would require a table with
> two columns, e.g. (1,4), (1,5), (1,6), (1,7), where the first column
> would be used to identify which array the element belongs to. This
> additional column is a waste of space.
You're trading space for time, and deliberately spoiling your design to do so. If you're going to want to access each value separately, design your system to reflect that fact.

Have you actually tested your example with a proper two-column table? It might be that weaknesses in PostgreSQL force you to compromise your design, but I wouldn't start from that point. Try a proper relational design, with a bit of tuning, and see if that works for you.

If you really want to use arrays, perhaps see if the contrib/intarray module can help you.

-- Richard Huxton Archonet Ltd
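A minimal sketch of the normalized design Richard describes (the link-table name and column names are hypothetical, not from the original schema):

```sql
-- One row per (request, stat) pair instead of an int[] column on
-- helix_request. The composite primary key gives an index that
-- serves lookups by request_id directly.
CREATE TABLE helix_request_stat3 (
    request_id integer NOT NULL REFERENCES helix_request(id),
    stat3_id   integer NOT NULL REFERENCES stat3(id),
    PRIMARY KEY (request_id, stat3_id)
);

-- The original query becomes a plain join, which the planner can
-- execute with index scans on both sides:
SELECT s.*
FROM stat3 s
JOIN helix_request_stat3 hrs ON hrs.stat3_id = s.id
WHERE hrs.request_id = 11;
```

The extra column is the cost of making each element individually addressable; with narrow integer rows the space overhead is usually modest compared to the scan time saved.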
Zeljko Vrba <zvrba@ifi.uio.no> writes:
> Is there yet another way of making WHERE field = ANY (subselect
> returning an array) work? Or of making Postgres use an index?

You could use the int_array_enum() function from the contrib/int_agg module.

> Also, what is the limit on the number of elements in the IN (...)
> condition before the database resorts to a sequential scan?

It depends on the table. If, for example, there's little free space in your table and the records are very narrow, then sequential scans will be especially efficient, since each I/O will read in many records. An index scan of any significant size would likely have to read nearly every page multiple times.

The problem in your case is that Postgres has no idea how large an array it's going to find when it's doing the planning. I'm not sure what the fallback logic is; apparently it's assuming a fairly low selectivity for the =ANY constraint.

Actually, I think there was some discussion a while back about making =ANY a little less pessimistic about the selectivity. Perhaps 8.1 will be better for this. If you're still in development with a long timeframe, you could try a CVS build to see if that's the case.

-- greg
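For reference, contrib/int_agg's int_array_enum() is a set-returning function that expands an int[] into a set of integers, so the query from the original message could be written roughly as (a sketch, assuming the module is installed and the schema from the thread):

```sql
-- int_array_enum(int[]) returns setof int (contrib/int_agg).
-- Expanding the array into rows lets the planner use the index
-- on stat3.id instead of a sequential scan.
SELECT s.*
FROM stat3 s
WHERE s.id IN (SELECT int_array_enum(h.stat3)
               FROM helix_request h
               WHERE h.id = 11);
```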