On Mon, Aug 29, 2016 at 11:48 PM, <chenkaijiang@gmail.com> wrote:
>
> the explain result:
>
> explain select * from renren.user_relations where parent_id=846346 order by
> user_id limit 10;
>
> QUERY PLAN
> ------------------------------------------------------------
> -------------------------------------------------------
> Limit (cost=4.57..442.35 rows=10 width=102)
> -> Merge Append (cost=4.57..496534.92 rows=11342 width=102)
> Sort Key: user_relations.user_id
>
...
>
> It uses the Index Scan using index on user_id, which is very stupid.
>
This a classic planning problem with ORDER BY...LIMIT. Probably parent_id
is correlated with user_id, and if you pick a high value of parent_id then
you are implicitly getting high values of user_id. But PostgreSQL doesn't
know that, it assumes things with parent_id=846346 are randomly dispersed
over the user_id values, and so it will gather 10 of them very quickly by
walking the indexes in order.
>
> If I explain select * from renren.user_relations where parent_id=846346
> order by user_id, then it uses the index on parent_id to get records and
> then sort it, which is very wise since the number of qualified records is
> 1725.
>
You know it is 1725, but PostgreSQL thinks it is 11342. Is autoanalyze
analyzing often enough? Is default_statistics_target high enough?
(Although if I'm right about the correlation between parent_id and
user_id, then fixing that estimate might still not be enough to fix things).
> So, I think the optimizer/planner has a performance bug with LIMIT clause.
>
Well, it has to make decisions with the information available to it. That
is not really a bug. It is constantly being improved, but will never be
perfect.
Cheers,
Jeff