Some questions on the following situation related to my
recent previous thread. I'm sorry to keep pushing on this
but I'd like to use and recommend pgsql to the astronomical
community as an option for serving large databases but need
to make sure it will work for typical queries.
Say I have three float fields and I want to select records
according to combinations of ranges in each of them. I made what
I believe are appropriate indices but I don't seem to understand
when multiple index threads are used. For example, using the
three-variable btree index on float fields:
create index m_col on lmctot using btree (j_m, h_m, k_m);
and an index on each of the three individually:
create index j on lmctot using btree (j_m, h_m, k_m);
create index h on lmctot using btree (j_m, h_m, k_m);
create index k on lmctot using btree (j_m, h_m, k_m);
I would think that the multiple index would expedite a query
such as
select count(*) from lmctot where
j_m > '5.5' and h_m > '5.5' and k_m > '5.5';
but explain suggests that the index jm is used:
lmc=> explain select count(*) from lmctot where j_m > '5.5' and h_m > '5.5' and k_m > '5.5';
NOTICE: QUERY PLAN:
Aggregate (cost=187116.91 rows=262700 width=4)
-> Index Scan using j on lmctot (cost=187116.91 rows=262700
width=4)
EXPLAIN
Such as query takes about 5 minutes (450Mhz PII w/Linux).
Equalities rather than inequalities go much much faster.
Does anyone know what the best strategy is here? Bruce Momjian
pointed out that a cluster will help enormously on one of the
fields but not on multiple fields.
Will pgsql will work efficiently on queries such as these for
very large databases. E.g. 20 to 200 million records. Should I
be considering an non-pgsql database for this sort of
application?
TIA,
--Martin