Discussion: LIMIT confuses the planner
Hello,

I'm experiencing a strange issue. I have a table with around 11 million records (11471762 to be exact), storing login attempts to a web site. Thanks to the index I have created on username, looking into that table by username is very fast:

db=# EXPLAIN ANALYZE
SELECT
  *
FROM
  login_attempt
WHERE
  username='kouber'
ORDER BY
  login_attempt_sid DESC;

                                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1415.15..1434.93 rows=7914 width=38) (actual time=0.103..0.104 rows=2 loops=1)
   Sort Key: login_attempt_sid
   Sort Method:  quicksort  Memory: 25kB
   ->  Index Scan using login_attempt_username_idx on login_attempt  (cost=0.00..902.71 rows=7914 width=38) (actual time=0.090..0.091 rows=2 loops=1)
         Index Cond: ((username)::text = 'kouber'::text)
 Total runtime: 0.140 ms
(6 rows)

As you can see, there are only 2 records for that particular username. However, when I add a LIMIT clause to the same query the planner no longer uses the right index, hence the query becomes very slow:

db=# EXPLAIN ANALYZE
SELECT
  *
FROM
  login_attempt
WHERE
  username='kouber'
ORDER BY
  login_attempt_sid DESC
LIMIT 20;

                                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..770.45 rows=20 width=38) (actual time=0.064..3797.660 rows=2 loops=1)
   ->  Index Scan Backward using login_attempt_pkey on login_attempt  (cost=0.00..304866.46 rows=7914 width=38) (actual time=0.062..3797.657 rows=2 loops=1)
         Filter: ((username)::text = 'kouber'::text)
 Total runtime: 3797.691 ms
(4 rows)

Now, recently I have altered some of the default parameters in order to get as much as possible out of the hardware - 12 GB of RAM, 8 processors. So, I guess I have done something wrong, thus the planner is taking that wrong decision.
Here's what I have changed in postgresql.conf (from the default one):

max_connections = 200
shared_buffers = 256MB
work_mem = 64MB
maintenance_work_mem = 128MB
max_stack_depth = 6MB
max_fsm_pages = 100000
synchronous_commit = off
wal_buffers = 1MB
commit_delay = 100
commit_siblings = 5
checkpoint_segments = 10
checkpoint_timeout = 10min
random_page_cost = 0.1
effective_cache_size = 2048MB

Any idea what's wrong here?

Regards,
--
Kouber Saparev
http://kouber.saparev.com
Kouber Saparev wrote:
> db=# EXPLAIN ANALYZE
> SELECT
>   *
> FROM
>   login_attempt
> WHERE
>   username='kouber'
> ORDER BY
>   login_attempt_sid DESC;
>
>                                                                       QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------------------------------
>  Sort  (cost=1415.15..1434.93 rows=7914 width=38) (actual time=0.103..0.104 rows=2 loops=1)
>    Sort Key: login_attempt_sid
>    Sort Method:  quicksort  Memory: 25kB
>    ->  Index Scan using login_attempt_username_idx on login_attempt (cost=0.00..902.71 rows=7914 width=38) (actual time=0.090..0.091 rows=2 loops=1)
>          Index Cond: ((username)::text = 'kouber'::text)
>  Total runtime: 0.140 ms

It's expecting 7914 rows returned and is getting only 2. That is probably the root of the problem.

> However when I add a LIMIT clause to the same query the planner no
> longer uses the right index, hence the query becomes very slow:
>
> db=# EXPLAIN ANALYZE
> SELECT
>   *
> FROM
>   login_attempt
> WHERE
>   username='kouber'
> ORDER BY
>   login_attempt_sid DESC LIMIT 20;

Since it's expecting 7914 rows for "kouber" it thinks it will find the 20 rows you want fairly quickly by just looking backward through the login_attempt_pkey index.

Try increasing the stats on the username column:

ALTER TABLE login_attempt ALTER COLUMN username SET STATISTICS 100;
ANALYZE login_attempt;

You can try different values of statistics up to 1000, but there's no point in setting it too high.

--
  Richard Huxton
  Archonet Ltd
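For what it's worth, the cheap-looking Limit cost can be reproduced from the numbers in Kouber's slow plan. A rough sketch of the pro-rata scaling the planner applies (a simplification: the real cost model also accounts for startup cost):

```python
# Rough model of how the planner prices a Limit node on top of a scan:
# it assumes the matching rows are spread uniformly through the index,
# so fetching `limit` of the `est_rows` it expects costs a pro-rata
# share of the scan's total cost.
def limit_cost(total_cost, est_rows, limit):
    return total_cost * limit / est_rows

# Numbers from the slow plan: the backward pkey scan is costed at
# 304866.46 for an estimated 7914 matching rows, 20 of which are wanted.
print(round(limit_cost(304866.46, 7914, 20), 2))  # 770.45, as in "cost=0.00..770.45"
```

Because only 2 matching rows actually exist, the backward scan never collects 20 and ends up walking essentially the whole index, which is where the 3.8 seconds go.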
On Mon, Feb 23, 2009 at 7:26 AM, Kouber Saparev <kouber@saparev.com> wrote:
> Now, recently I have altered some of the default parameters in order to get
> as much as possible out of the hardware - 12 GB of RAM, 8 processors. So, I
> guess I have done something wrong, thus the planner is taking that wrong
> decision. Here's what I have changed in postgresql.conf (from the default
> one):
>
> max_connections = 200
> shared_buffers = 256MB
> work_mem = 64MB
> maintenance_work_mem = 128MB
> max_stack_depth = 6MB
> max_fsm_pages = 100000
> synchronous_commit = off
> wal_buffers = 1MB
> commit_delay = 100
> commit_siblings = 5
> checkpoint_segments = 10
> checkpoint_timeout = 10min
> random_page_cost = 0.1
> effective_cache_size = 2048MB
>
> Any idea what's wrong here?

If you left seq_page_cost (which isn't mentioned here) at the default value but reduced random_page_cost to 0.1, then you have random_page_cost < seq_page_cost. That's probably Bad.

...Robert
Robert Haas <robertmhaas@gmail.com> writes:
> If you left seq_page_cost (which isn't mentioned here) at the default
> value but reduced random_page_cost to 0.1, then you have
> random_page_cost < seq_page_cost. That's probably Bad.

... well, it's certainly going to push the planner to believe indexscans are cheaper than sorts no matter what.

The previously noted rowcount estimation problem might be a bigger issue in this particular case, but I agree this is a Bad Idea.

			regards, tom lane
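To see why that setting skews every plan choice, here is a toy illustration of the per-page weighting (the page counts are made up, and this is not the planner's actual costing, just the relative weights of the two cost constants):

```python
# Toy per-page weighting: sequential pages are charged seq_page_cost,
# randomly fetched pages are charged random_page_cost.
def page_cost(seq_pages, random_pages, seq_page_cost=1.0, random_page_cost=4.0):
    return seq_pages * seq_page_cost + random_pages * random_page_cost

# With the default random_page_cost = 4, an index scan touching 1000
# random pages competes plausibly with a 5000-page sequential read:
print(round(page_cost(0, 1000), 2))                        # 4000.0
print(round(page_cost(5000, 0), 2))                        # 5000.0
# With random_page_cost = 0.1 the same index scan looks 40x cheaper,
# so index scans (and plans that avoid sorts) win no matter what:
print(round(page_cost(0, 1000, random_page_cost=0.1), 2))  # 100.0
```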
Richard Huxton wrote:
> Since it's expecting 7914 rows for "kouber" it thinks it will find the
> 20 rows you want fairly quickly by just looking backward through the
> login_attempt_pkey index.
>
> Try increasing the stats on the username column.
>
> ALTER TABLE login_attempt ALTER COLUMN username SET STATISTICS 100;
> ANALYZE login_attempt;
>
> You can try different values of statistics up to 1000, but there's no
> point in setting it too high.

Hmmm, that did the trick, thank you. I updated the statistics of the column to 300, so now the query plan changed to:

 Limit  (cost=127.65..127.70 rows=20 width=38) (actual time=0.085..0.086 rows=3 loops=1)
   ->  Sort  (cost=127.65..129.93 rows=910 width=38) (actual time=0.084..0.085 rows=3 loops=1)
         Sort Key: login_attempt_sid
         Sort Method:  quicksort  Memory: 25kB
         ->  Bitmap Heap Scan on login_attempt  (cost=7.74..103.44 rows=910 width=38) (actual time=0.075..0.078 rows=3 loops=1)
               Recheck Cond: ((username)::text = 'kouber'::text)
               ->  Bitmap Index Scan on login_attempt_username_idx  (cost=0.00..7.51 rows=910 width=0) (actual time=0.069..0.069 rows=3 loops=1)
                     Index Cond: ((username)::text = 'kouber'::text)
 Total runtime: 0.114 ms

Now the planner believes there're 910 rows, which is a bit closer to the real data:

swing=# select avg(length) from (select username, count(*) as length from login_attempt group by username) as freq;
         avg
----------------------
 491.6087310427555479
(1 row)

--
Kouber Saparev
http://kouber.saparev.com
Kouber Saparev <kouber@saparev.com> writes:
> Now the planner believes there're 910 rows, which is a bit closer to the
> real data:

> swing=# select avg(length) from (select username, count(*) as length
> from login_attempt group by username) as freq;
>          avg
> ----------------------
>  491.6087310427555479
> (1 row)

Hmph, that's still not real good. Ideally it should be estimating *less* than the average frequency, because the estimate is made after excluding all the most-common-values, which evidently 'kouber' is not one of. I suppose there's quite a large number of infrequently-seen usernames and the ndistinct estimate is much less than reality? (Look at the pg_stats row for this column.)

It might be worth going all the way to stats target 1000 for this column.

			regards, tom lane
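The estimate Tom describes can be sketched as follows. The real logic in the planner is more involved, and the MCV list here is hypothetical (only the table size and the distinct counts quoted elsewhere in the thread are real):

```python
# Simplified sketch of the planner's estimate for "col = const" when the
# constant is NOT in the most-common-values list: the frequency left over
# after the MCVs is divided evenly among the remaining distinct values.
def non_mcv_rows(total_rows, mcv_freqs, n_distinct):
    remaining_freq = 1.0 - sum(mcv_freqs)
    remaining_distinct = n_distinct - len(mcv_freqs)
    return total_rows * remaining_freq / remaining_distinct

# Hypothetical stats: 100 MCVs covering 40% of an 11471762-row table.
# With the true distinct count (23391), a non-MCV username gets ~296 rows:
print(round(non_mcv_rows(11_471_762, [0.004] * 100, 23391)))
# With n_distinct underestimated at 5605, the same formula gives ~1250,
# i.e. the rowcount estimate rises as the n_distinct estimate falls:
print(round(non_mcv_rows(11_471_762, [0.004] * 100, 5605)))
```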
Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> If you left seq_page_cost (which isn't mentioned here) at the default
>> value but reduced random_page_cost to 0.1, then you have
>> random_page_cost < seq_page_cost. That's probably Bad.
>
> ... well, it's certainly going to push the planner to believe indexscans
> are cheaper than sorts no matter what.
>
> The previously noted rowcount estimation problem might be a bigger issue
> in this particular case, but I agree this is a Bad Idea.

So I've set it wrong, I guess. :-)

Now I put it to:

seq_page_cost = 1
random_page_cost = 2

Regards,
--
Kouber Saparev
http://kouber.saparev.com
Tom Lane wrote:
> Kouber Saparev <kouber@saparev.com> writes:
>> Now the planner believes there're 910 rows, which is a bit closer to the
>> real data:
>
> Hmph, that's still not real good. Ideally it should be estimating
> *less* than the average frequency, because the estimate is made after
> excluding all the most-common-values, which evidently 'kouber' is not
> one of. I suppose there's quite a large number of infrequently-seen
> usernames and the ndistinct estimate is much less than reality? (Look
> at the pg_stats row for this column.) It might be worth going all the
> way to stats target 1000 for this column.

I altered the statistics for that column to 1000, so now the planner assumes exactly 492 rows for the fore-mentioned query, which is indeed the average. It never went *less* than that value, it was always higher, i.e. for a statistics value of 600 it was 588; for 800 it became 540.

The current value of n_distinct (given statistics=1000) is:

db=# SELECT n_distinct FROM pg_stats WHERE tablename='login_attempt' AND attname='username';
 n_distinct
------------
       5605
(1 row)

db=# SELECT COUNT(DISTINCT username) FROM login_attempt;
 count
-------
 23391
(1 row)

In fact, what is n_distinct standing for, apart from the famous formula:

n*d / (n - f1 + f1*n/N)

;-)

Regards,
--
Kouber Saparev
http://kouber.saparev.com
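That formula is the Haas-Stokes "Duj1" estimator that ANALYZE applies to its sample (roughly 300 x the statistics target rows). A sketch with hypothetical sample counts (only the table size is from the thread):

```python
# Haas-Stokes "Duj1" estimator, as quoted:  n*d / (n - f1 + f1*n/N)
#   n  = sampled rows,  N = total rows in the table,
#   d  = distinct values seen in the sample,
#   f1 = values seen exactly once in the sample.
def haas_stokes(n, N, d, f1):
    return n * d / (n - f1 + f1 * n / N)

# Hypothetical sample for an 11471762-row table at stats target 1000
# (300000 sampled rows, 6000 distinct values, 2000 of them singletons):
print(round(haas_stokes(300_000, 11_471_762, 6000, 2000)))  # ~6039

# If no sampled value is a singleton (f1 = 0), the estimator returns d
# exactly -- it assumes the sample has already seen every distinct value:
print(haas_stokes(300_000, 11_471_762, 6000, 0))  # 6000.0
```

Intuitively, the more singletons the sample contains, the more unseen values the estimator infers; with few singletons it barely extrapolates beyond the sample, which is one way it can end up far below reality.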
Kouber Saparev <kouber@saparev.com> writes:
> Tom Lane wrote:
>> Hmph, that's still not real good. Ideally it should be estimating
>> *less* than the average frequency, because the estimate is made after
>> excluding all the most-common-values, which evidently 'kouber' is not
>> one of.

> I altered the statistics for that column to 1000, so now the planner
> assumes exactly 492 rows for the fore-mentioned query, which is indeed
> the average. It never went *less* than that value, it was always higher,
> i.e. for a statistics value of 600, it was 588, for 800, it became 540.

I got some time to think more about this and experiment a bit further. As far as I can tell there is no fundamental bug here --- given reasonably accurate stats, the rowcount estimate behaves as expected, ie, you get an estimate that's less than the actual average number of values if the target value is not one of the known MCVs. However, as the n_distinct estimate falls below the actual number of distinct values, that rowcount estimate necessarily rises.

What had surprised me about this report is that the estimate matched the true average number of rows so closely; I wondered if there was some property of the way we estimate n_distinct that would make that happen. But I now think that that was just chance: there doesn't seem to be any underlying behavior that would cause it. I did some experiments with random data matching a Zipfian distribution (1/k law) and did not observe that the rowcount estimate converged to the true average when the n_distinct value was too low.

So the bottom line here is just that the estimated n_distinct is too low. We've seen before that the equation we use tends to do that more often than not. I doubt that consistently erring on the high side would be better though :-(. Estimating n_distinct from a limited sample of the population is known to be a statistically hard problem, so we'll probably not ever have perfect answers, but doing better is on the to-do list.
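Tom's Zipfian experiment can be rerun in miniature. Everything below (universe size, table size, sample size, seed) is arbitrary; the point is only that the sample-based estimator lands well below the true distinct count on 1/k data:

```python
import random
from collections import Counter

# Draw a "table" from a Zipfian (1/k) distribution, take a sample, and
# apply the same n*d / (n - f1 + f1*n/N) estimator ANALYZE uses.
random.seed(42)

D = 20_000                                    # value universe
weights = [1.0 / k for k in range(1, D + 1)]  # Zipfian 1/k law
table = random.choices(range(D), weights=weights, k=200_000)
sample = random.sample(table, 30_000)

counts = Counter(sample)
n, N = len(sample), len(table)
d = len(counts)                                 # distinct in sample
f1 = sum(1 for c in counts.values() if c == 1)  # sampled exactly once

estimate = n * d / (n - f1 + f1 * n / N)
true_distinct = len(set(table))
print(f"true n_distinct: {true_distinct}, estimated: {estimate:.0f}")
```

With heavy-tailed data most rarely-seen values never make it into the sample at all, so the estimate comes out low, mirroring the underestimation seen in the thread.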
regards, tom lane
> So the bottom line here is just that the estimated n_distinct is too
> low. We've seen before that the equation we use tends to do that more
> often than not. I doubt that consistently erring on the high side would
> be better though :-(. Estimating n_distinct from a limited sample of
> the population is known to be a statistically hard problem, so we'll
> probably not ever have perfect answers, but doing better is on the
> to-do list.

I hit an interesting paper on n_distinct calculation:

http://www.pittsburgh.intel-research.net/people/gibbons/papers/distinct-values-chapter.pdf

The PCSA algorithm described there requires O(1) calculation per value. Page 22 describes what to do with update streams.

This I think (disclaimer: I know little about PG internals) means that the n_distinct estimation can be done at vacuum time (it would play well with the visibility map addon).

What do you think?

Greetings
Marcin
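For the curious, the probabilistic-counting idea behind PCSA (Flajolet-Martin with stochastic averaging) fits in a few lines. This is a bare-bones rendition, not the paper's exact algorithm, and crc32 stands in for a proper hash:

```python
import zlib

# Probabilistic counting sketch: O(1) work per value -- one hash, one
# bit set in one of m bitmaps. PCSA is this idea plus refinements.
def pcsa_estimate(values, m=64):
    bitmaps = [0] * m
    for v in values:
        h = zlib.crc32(str(v).encode())  # crc32 stands in for a real hash
        bucket, rest = h % m, h // m
        r = 0                            # position of lowest set bit of `rest`
        while rest & 1 == 0 and r < 24:
            r += 1
            rest >>= 1
        bitmaps[bucket] |= 1 << r
    # R_j = position of the lowest *unset* bit in bitmap j
    total_r = 0
    for b in bitmaps:
        r = 0
        while b & (1 << r):
            r += 1
        total_r += r
    phi = 0.77351                        # Flajolet-Martin correction constant
    return int(m / phi * 2 ** (total_r / m))

# Duplicates only re-set bits that are already set, so the sketch -- and
# hence the estimate -- is unchanged by repeated values:
stream = [f"user{i}" for i in range(10_000)]
print(pcsa_estimate(stream) == pcsa_estimate(stream * 3))  # True
```

That insensitivity to duplicates is what makes the O(1)-per-value bookkeeping attractive for maintaining an estimate incrementally, e.g. at vacuum time.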
> I hit an interesting paper on n_distinct calculation:
>
> http://www.pittsburgh.intel-research.net/people/gibbons/papers/distinct-values-chapter.pdf
>
> The PCSA algorithm described there requires O(1) calculation per
> value. Page 22 describes what to do with update streams.
>
> This I think (disclaimer: I know little about PG internals) means that
> the n_distinct estimation can be done at vacuum time (it would
> play well with the visibility map addon).
>
> What do you think?

OK, if you think that calculating a hash function of every data field for each insert or delete is prohibitive, just say so and don't bother reading the paper :]

Greetings
Marcin
marcin mank <marcin.mank@gmail.com> writes:
> I hit an interesting paper on n_distinct calculation:
> http://www.pittsburgh.intel-research.net/people/gibbons/papers/distinct-values-chapter.pdf

I don't think we're quite ready to make ANALYZE read every row of a table in order to estimate n_distinct. It is an interesting paper in that it says that you have to do that in order to get *provably* good estimates, but I've not abandoned the hope of getting *usually* good estimates without so much work.

			regards, tom lane
Now I am experiencing a similar issue with another table, called "message", for which there's a conditional index:

CREATE TABLE message (
  message_sid SERIAL PRIMARY KEY,
  from_profile_sid INT NOT NULL REFERENCES profile,
  to_profile_sid INT NOT NULL REFERENCES profile,
  sender_has_deleted BOOLEAN NOT NULL DEFAULT FALSE,
  receiver_has_deleted BOOLEAN NOT NULL DEFAULT FALSE,
  datetime TIMESTAMP NOT NULL DEFAULT NOW(),
  body TEXT
);

CREATE INDEX message_from_profile_idx ON message (from_profile_sid) WHERE NOT sender_has_deleted;

So, again... adding a LIMIT makes the planner choose the "wrong" index:

db=# EXPLAIN ANALYZE
SELECT
  message_sid
FROM
  message
WHERE
  from_profile_sid = 1296
  AND NOT sender_has_deleted
ORDER BY
  message_sid DESC;

                                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=2307.70..2310.74 rows=1215 width=4) (actual time=0.040..0.040 rows=2 loops=1)
   Sort Key: message_sid
   Sort Method:  quicksort  Memory: 25kB
   ->  Bitmap Heap Scan on message  (cost=23.59..2245.45 rows=1215 width=4) (actual time=0.029..0.033 rows=2 loops=1)
         Recheck Cond: ((from_profile_sid = 1296) AND (NOT sender_has_deleted))
         ->  Bitmap Index Scan on message_from_profile_idx  (cost=0.00..23.28 rows=1215 width=0) (actual time=0.022..0.022 rows=2 loops=1)
               Index Cond: (from_profile_sid = 1296)
 Total runtime: 0.068 ms
(8 rows)

db=# EXPLAIN ANALYZE
SELECT
  message_sid
FROM
  message
WHERE
  from_profile_sid = 1296
  AND NOT sender_has_deleted
ORDER BY
  message_sid DESC
LIMIT 20;

                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..1461.12 rows=20 width=4) (actual time=0.817..932.398 rows=2 loops=1)
   ->  Index Scan Backward using message_pkey on message  (cost=0.00..88762.80 rows=1215 width=4) (actual time=0.816..932.395 rows=2 loops=1)
         Filter: ((NOT sender_has_deleted) AND (from_profile_sid = 1296))
 Total runtime: 932.432 ms
(4 rows)

I had already increased STATISTICS to 1000 for both from_profile_sid and sender_has_deleted, and vacuum analyzed respectively (also did reindex), but still the statistical data is confusing me:

db=# SELECT n_distinct FROM pg_stats WHERE tablename='message' AND attname='from_profile_sid';
 n_distinct
------------
       4068
(1 row)

db=# select avg(length) from (select from_profile_sid, count(*) as length from message group by from_profile_sid) as freq;
         avg
----------------------
 206.5117822008693663
(1 row)

Any ideas/thoughts?

--
Kouber Saparev
http://kouber.saparev.com
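As a sanity check, the cheap-looking Limit cost in the slow plan can be reproduced from the index scan's own numbers: the planner scales the full backward-scan cost by the fraction of estimated rows it thinks it needs (a simplification that ignores startup cost):

```python
# Pro-rata Limit costing: `limit` rows out of `est_rows` expected means
# paying the same fraction of the scan's total cost.
def limit_cost(total_cost, est_rows, limit):
    return total_cost * limit / est_rows

# From the plan: backward pkey scan costed at 88762.80 for an estimated
# 1215 matching rows.
print(round(limit_cost(88762.80, 1215, 20), 2))  # 1461.12, as in the plan
```

With only 2 qualifying rows instead of the estimated 1215, the scan never reaches 20 rows and walks the whole primary-key index, the same trap as in the login_attempt case.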