Обсуждение: Efficiently searching for the most recent rows where a columnmatches any result from a different query
Hello,
I have the following schema:
CREATE TABLE users (
id BIGSERIAL PRIMARY KEY,
name TEXT NOT NULL UNIQUE
);
CREATE TABLE friends (
user_id BIGINT NOT NULL REFERENCES users,
friend_user_id BIGINT NOT NULL REFERENCES users,
UNIQUE (user_id, friend_user_id)
);
CREATE TABLE posts (
id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL REFERENCES users,
content TEXT NOT NULL
);
CREATE INDEX posts_user_id_id_index ON posts(user_id, id);
Each user can unilaterally follow any number of friends. The posts table has a large number of rows and is rapidly growing.
My goal is to retrieve the 10 most recent posts of a user's friends. This query gives the correct result, but is inefficient:
SELECT posts.id, users.name, posts.content
FROM posts JOIN users ON posts.user_id = users.id
WHERE posts.user_id IN (SELECT friend_user_id FROM friends WHERE user_id = 1)
ORDER BY posts.id DESC LIMIT 10;
If the user's friends have recently posted, the query is still reasonably fast (https://explain.depesz.com/s/6ykR). But if the user's friends haven't recently posted or the user has no friends, it quickly deteriorates (https://explain.depesz.com/s/OnoG).
If I match only a single post author (e.g. WHERE posts.user_id = 5), Postgres uses the index posts_user_id_id_index. But if I use IN, the index doesn't appear to be used at all.
How can I get these results more efficiently?
I've uploaded the schema and the queries I've tried to dbfiddle at http://dbfiddle.uk/?rdbms=postgres_9.6&fiddle=cf1489b7f6d53c3fe0b55ed7ccbad1f0. The output of "SELECT version()" is "PostgreSQL 9.6.5 on x86_64-pc-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit" for me.
Thank you in advance for any insights, pointers or suggestions you are able to give me.
Regards,
Milo
Hello:
EXPLAIN (ANALYZE, BUFFERS)
select * from (
FROM posts JOIN users ON posts.user_id = users.id
WHERE posts.user_id IN (SELECT friend_user_id FROM friends WHERE user_id = 1)
ORDER BY posts.id DESC
) as a
ORDER BY a.id DESC
LIMIT 10;
------
2018-02-13 8:28 GMT-05:00 <mkslaf@keemail.me>:
Hello,I have the following schema:CREATE TABLE users (id BIGSERIAL PRIMARY KEY,name TEXT NOT NULL UNIQUE);CREATE TABLE friends (user_id BIGINT NOT NULL REFERENCES users,friend_user_id BIGINT NOT NULL REFERENCES users,UNIQUE (user_id, friend_user_id));CREATE TABLE posts (id BIGSERIAL PRIMARY KEY,user_id BIGINT NOT NULL REFERENCES users,content TEXT NOT NULL);CREATE INDEX posts_user_id_id_index ON posts(user_id, id);Each user can unilaterally follow any number of friends. The posts table has a large number of rows and is rapidly growing.My goal is to retrieve the 10 most recent posts of a user's friends. This query gives the correct result, but is inefficient:SELECT posts.id, users.name, posts.contentFROM posts JOIN users ON posts.user_id = users.idWHERE posts.user_id IN (SELECT friend_user_id FROM friends WHERE user_id = 1)ORDER BY posts.id DESC LIMIT 10;If the user's friends have recently posted, the query is still reasonably fast (https://explain.depesz.com/s/6ykR). But if the user's friends haven't recently posted or the user has no friends, it quickly deteriorates (https://explain.depesz.com/s/ OnoG). If I match only a single post author (e.g. WHERE posts.user_id = 5), Postgres uses the index posts_user_id_id_index. But if I use IN, the index doesn't appear to be used at all.How can I get these results more efficiently?I've uploaded the schema and the queries I've tried to dbfiddle at http://dbfiddle.uk/?rdbms=postgres_9.6&fiddle= cf1489b7f6d53c3fe0b55ed7ccbad1 f0. The output of "SELECT version()" is "PostgreSQL 9.6.5 on x86_64-pc-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit" for me. Thank you in advance for any insights, pointers or suggestions you are able to give me.Regards,Milo
Cordialmente,
Ing. Hellmuth I. Vargas S.
Esp. Telemática y Negocios por Internet
Ing. Hellmuth I. Vargas S.
Esp. Telemática y Negocios por Internet
Oracle Database 10g Administrator Certified Associate
EnterpriseDB Certified PostgreSQL 9.3 Associate
Hello Hellmuth,
Thank you for your response.
I've uploaded the query plan for the first query (user_id=2) here: https://gist.github.com/anonymous/6d251b277ef71f8977b03cab91fedccd
The query plan for the second query (user_id=1) can be found here: https://gist.github.com/anonymous/32ed485b40cce2651ddc52661f3e7f7b
Just like in the original queries, posts_user_id_id_index is not used.
Kind regards,
Milo
13. Feb 2018 22:13 by hivs77@gmail.com:
Hello:EXPLAIN (ANALYZE, BUFFERS)select * from (FROM posts JOIN users ON posts.user_id = users.idWHERE posts.user_id IN (SELECT friend_user_id FROM friends WHERE user_id = 1)ORDER BY posts.id DESC) as aORDER BY a.id DESCLIMIT 10;------2018-02-13 8:28 GMT-05:00 <mkslaf@keemail.me>:Hello,I have the following schema:CREATE TABLE users (id BIGSERIAL PRIMARY KEY,name TEXT NOT NULL UNIQUE);CREATE TABLE friends (user_id BIGINT NOT NULL REFERENCES users,friend_user_id BIGINT NOT NULL REFERENCES users,UNIQUE (user_id, friend_user_id));CREATE TABLE posts (id BIGSERIAL PRIMARY KEY,user_id BIGINT NOT NULL REFERENCES users,content TEXT NOT NULL);CREATE INDEX posts_user_id_id_index ON posts(user_id, id);Each user can unilaterally follow any number of friends. The posts table has a large number of rows and is rapidly growing.My goal is to retrieve the 10 most recent posts of a user's friends. This query gives the correct result, but is inefficient:SELECT posts.id, users.name, posts.contentFROM posts JOIN users ON posts.user_id = users.idWHERE posts.user_id IN (SELECT friend_user_id FROM friends WHERE user_id = 1)ORDER BY posts.id DESC LIMIT 10;If the user's friends have recently posted, the query is still reasonably fast (https://explain.depesz.com/s/6ykR). But if the user's friends haven't recently posted or the user has no friends, it quickly deteriorates (https://explain.depesz.com/s/OnoG).If I match only a single post author (e.g. WHERE posts.user_id = 5), Postgres uses the index posts_user_id_id_index. But if I use IN, the index doesn't appear to be used at all.How can I get these results more efficiently?I've uploaded the schema and the queries I've tried to dbfiddle at http://dbfiddle.uk/?rdbms=postgres_9.6&fiddle=cf1489b7f6d53c3fe0b55ed7ccbad1f0. The output of "SELECT version()" is "PostgreSQL 9.6.5 on x86_64-pc-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit" for me.Thank you in advance for any insights, pointers or suggestions you are able to give me.Regards,Milo--Cordialmente,
Ing. Hellmuth I. Vargas S.
Esp. Telemática y Negocios por InternetOracle Database 10g Administrator Certified AssociateEnterpriseDB Certified PostgreSQL 9.3 Associate
Hi, I myself am new to performance tuning queries. But, from what you have said it looks like Postgres has to go through all the posts using the backward index scan and find out whether their author is amongst the user's friends list. Since the number of friends is arbitrary for any user, even if a user has few friends (or no friends at all), the stats will not reflect this and so the planner cannot take advantage of this to directly fetch the posts from the small set of friends. My suggestion (which involves changing the schema and query) is to have a last_post_id or last_posted_time column in user table, find the last 10 friends who have posted first and then use it to find the last 10 posts. Something like, select * from posts where posts.author_id in (select id from users where id in (select friend_id from user_friend where user_id = 1) and last_posted_time is not null order by last_posted_time desc limit 10); I am not sure if this is the best way to solve this. If there are better solutions I would be happy to learn the same. Regards Nanda On Thu, Feb 15, 2018 at 5:48 PM, <mkslaf@keemail.me> wrote: > > Hello Hellmuth, > > Thank you for your response. > > I've uploaded the query plan for the first query (user_id=2) here: > https://gist.github.com/anonymous/6d251b277ef71f8977b03cab91fedccd > The query plan for the second query (user_id=1) can be found here: > https://gist.github.com/anonymous/32ed485b40cce2651ddc52661f3e7f7b > > Just like in the original queries, posts_user_id_id_index is not used. > > Kind regards, > Milo > > 13. Feb 2018 22:13 by hivs77@gmail.com: > > Hello: > > > EXPLAIN (ANALYZE, BUFFERS) > select * from ( > SELECT posts.id, users.name, posts.content > FROM posts JOIN users ON posts.user_id = users.id > WHERE posts.user_id IN (SELECT friend_user_id FROM friends WHERE user_id = > 1) > > ORDER BY posts.id DESC > ) as a > ORDER BY a.id DESC > LIMIT 10; > > ------ > > > EXPLAIN (ANALYZE, BUFFERS) > select * from ( > SELECT posts.id, users.name, posts.content > FROM posts JOIN users ON posts.user_id = users.id > WHERE posts.user_id IN (SELECT friend_user_id FROM friends WHERE user_id = > 2) > > ORDER BY posts.id DESC > ) as a > ORDER BY a.id DESC > LIMIT 10; > > 2018-02-13 8:28 GMT-05:00 <mkslaf@keemail.me>: >> >> Hello, >> >> I have the following schema: >> >> CREATE TABLE users ( >> id BIGSERIAL PRIMARY KEY, >> name TEXT NOT NULL UNIQUE >> ); >> >> CREATE TABLE friends ( >> user_id BIGINT NOT NULL REFERENCES users, >> friend_user_id BIGINT NOT NULL REFERENCES users, >> UNIQUE (user_id, friend_user_id) >> ); >> >> CREATE TABLE posts ( >> id BIGSERIAL PRIMARY KEY, >> user_id BIGINT NOT NULL REFERENCES users, >> content TEXT NOT NULL >> ); >> CREATE INDEX posts_user_id_id_index ON posts(user_id, id); >> >> Each user can unilaterally follow any number of friends. The posts table >> has a large number of rows and is rapidly growing. >> >> My goal is to retrieve the 10 most recent posts of a user's friends. This >> query gives the correct result, but is inefficient: >> >> SELECT posts.id, users.name, posts.content >> FROM posts JOIN users ON posts.user_id = users.id >> WHERE posts.user_id IN (SELECT friend_user_id FROM friends WHERE >> user_id = 1) >> ORDER BY posts.id DESC LIMIT 10; >> >> If the user's friends have recently posted, the query is still reasonably >> fast (https://explain.depesz.com/s/6ykR). But if the user's friends haven't >> recently posted or the user has no friends, it quickly deteriorates >> (https://explain.depesz.com/s/OnoG). >> >> If I match only a single post author (e.g. WHERE posts.user_id = 5), >> Postgres uses the index posts_user_id_id_index. But if I use IN, the index >> doesn't appear to be used at all. >> >> How can I get these results more efficiently? >> >> I've uploaded the schema and the queries I've tried to dbfiddle at >> http://dbfiddle.uk/?rdbms=postgres_9.6&fiddle=cf1489b7f6d53c3fe0b55ed7ccbad1f0. >> The output of "SELECT version()" is "PostgreSQL 9.6.5 on >> x86_64-pc-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit" for >> me. >> >> Thank you in advance for any insights, pointers or suggestions you are >> able to give me. >> >> Regards, >> Milo > > > > > -- > Cordialmente, > > Ing. Hellmuth I. Vargas S. > Esp. Telemática y Negocios por Internet > Oracle Database 10g Administrator Certified Associate > EnterpriseDB Certified PostgreSQL 9.3 Associate >
Hi, Correction in the query. I missed to add limit 10 in the outer most query.. > select * from posts where posts.author_id in (select id from users > where id in (select friend_id from user_friend where user_id = 1) and > last_posted_time is not null order by last_posted_time desc limit 10); > select * from posts where posts.author_id in (select id from users where id in (select friend_id from user_friend where user_id = 1) and last_posted_time is not null order by last_posted_time desc limit 10) order by post_id desc limit 10; Regards, Nanda