Hi all,
in order to better understand this feature of 10, I've created a table
like this:
CREATE TABLE expenses(
...
day date,
year int,
CHECK( year = EXTRACT( year FROM day ) )
);
so that I can ensure 'year' and 'day' are tied together:
SELECT
count(*) FILTER( WHERE year = 2016 ) AS by_year,
count(*) FILTER( WHERE EXTRACT( year FROM day ) = 2016 ) AS by_day
FROM expenses;
-[ RECORD 1 ]-
by_year | 8784
by_day | 8784
Then I created a statistic:
CREATE STATISTICS stat_day_year ( dependencies )
ON day, year
FROM expenses;
select * from pg_statistic_ext ;
-[ RECORD 1 ]---+---------------------
stxrelid | 42833
stxname | stat_day_year
stxnamespace | 2200
stxowner | 16384
stxkeys | 3 5
stxkind | {f}
stxndistinct |
stxdependencies | {"3 => 5": 1.000000}
Now, having an index on the extract year of day as follows:
CREATE INDEX idx_year
ON expenses ( EXTRACT( year FROM day ) );
why is the planner not choosing to use such index on a 'year' raw query?
EXPLAIN SELECT * FROM expenses
WHERE year = 2016;
QUERY PLAN
-------------------------------------------------------------------------------
Gather (cost=1000.00..92782.34 rows=8465 width=32)
Workers Planned: 2
-> Parallel Seq Scan on expenses (cost=0.00..90935.84 rows=3527 width=32)
Filter: (year = 2016)
The number of rows are correct, but I was expecting it to use the same
index as a query like "WHERE EXTRACT( year FROM day) = 2016" triggers.
Even altering the year column to not null does show any change, and
this is the plan obtained turning off seq_scan (to see the costs):
EXPLAIN ANALYZE SELECT * FROM expenses
WHERE year = 2016;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Seq Scan on expenses (cost=10000000000.00..10000127402.44 rows=8451
width=32) (actual time=972.734..2189.300 rows=8784 loops=1)
Filter: (year = 2016)
Rows Removed by Filter: 4991216
Am I misunderstaing this functional dependency?
Thanks,
Luca