Discussion: Inconsistent query performance based on relation hit frequency


Inconsistent query performance based on relation hit frequency

From:
Laura Hausmann
Date:
Heya, I hope the title is somewhat descriptive. I'm working on a decentralized social media platform and have encountered the following performance issue/quirk. I'd like to ask for input, since I'm not sure whether I've missed anything.

I'm running PostgreSQL 16.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 13.2.1 20230801, 64-bit, running on an Arch Linux box with 128GB of RAM & an 8c16t Ryzen 3700X CPU. Disk is a NVME RAID0.

Postgres configuration: https://paste.depesz.com/s/iTv

I'm using autovacuum defaults & am running a manual VACUUM ANALYZE on the entire database nightly.

The relevant database parts consist of a table with posts (note), a table with users (user), and a table with follow relationships (following). The query in question takes the most recent n (e.g. 50) posts, filtered by the user's follow relations.

The note table on my main production instance grows by about 200k entries per week.

Schema & tuple counts: https://paste.depesz.com/s/cfI

Here's the shortest query I can reproduce the issue with: https://paste.depesz.com/s/RoC
Specifically, it works well for users that follow a relatively large number of users (https://explain.depesz.com/s/tJnB), and is very slow for users that follow a low number of users / users that post infrequently (https://explain.depesz.com/s/Mtyr).

From what I can tell, this is because this query causes postgres to scan the note table from the bottom (most recent posts first), discarding anything by users that are not followed.

Curiously, rewriting the query like this (https://paste.depesz.com/s/8rN) causes the opposite problem: this query is fast for users with a low following count (https://explain.depesz.com/s/yHAz#query), and slow for users with a high following count (https://explain.depesz.com/s/1v6L, https://explain.depesz.com/s/yg3N).
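For readers without the paste links, the two shapes described above roughly correspond to the following sketch. This is a hypothetical reconstruction using the mock-schema names (objects/relationships) that appear later in this thread; the real queries live behind the links:

```sql
-- Shape 1: a single pass over the posts table with an OR filter.
-- Postgres walks the id index newest-first and discards posts by
-- unfollowed users: fast when matches are frequent, slow when rare.
SELECT o.*
FROM objects o
WHERE o."userId" = :userid
   OR o."userId" IN (SELECT r."followeeId"
                     FROM relationships r
                     WHERE r."followerId" = :userid)
ORDER BY o.id DESC
LIMIT 50;
```

The rewritten variant inverts this: it resolves the followee set first and fetches their posts through the "userId" index, which wins for small follow lists and loses for large ones.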

These numbers are even further apart (to the point of 10-30s query timeouts) in the most extreme outlier cases I've observed, and on lower-end hardware.

I've sidestepped the issue by running either of these queries based on a heuristic that checks whether there are more than 250 matching posts in the past 7 days, recomputed once per day for every user, but it feels more like a hack than a proper solution.
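The heuristic described above might be computed with a query along these lines (a sketch only; "createdAt" and the exact production table/column names are assumptions, since the real schema is behind the paste link):

```sql
-- Recomputed once per day per user: more than 250 matching posts in
-- the past 7 days suggests the newest-first scan will hit its LIMIT
-- quickly; otherwise the followee-first variant is likely cheaper.
SELECT count(*) > 250 AS prefer_newest_first_scan
FROM note n
WHERE n."createdAt" > now() - interval '7 days'
  AND n."userId" IN (SELECT f."followeeId"
                     FROM following f
                     WHERE f."followerId" = :userid);
```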

I'm able to make the planner make a sensible decision in both cases by setting enable_sort = off, but that tanks performance for the rest of my application, is even more of a hack, and doesn't seem to work in all cases.

I've been able to reproduce this issue with mock data (https://paste.depesz.com/s/CnY), though it's not generating quite the same query plans and is behaving a bit differently.

I'd appreciate any and all input on the situation. If I've left out any information that would be useful in figuring this out, please tell me.

Thanks in advance,
Laura Hausmann

Re: Inconsistent query performance based on relation hit frequency

From:
Andrei Lepikhov
Date:
On 6/27/24 07:50, Laura Hausmann wrote:
> I'd appreciate any and all input on the situation. If I've left out any 
> information that would be useful in figuring this out, please tell me.
Thanks for this curious case, I like it!
First, you can try to avoid "OR" expressions - PostgreSQL has quite a limited set of optimisation/prediction tricks for such expressions.
Second, I see that Postgres predicts the wrong number of tuples. But even using my typical tool [1] and getting more precise estimations, I don't see a significant benefit:

  Limit  (cost=10832.85..10838.69 rows=50 width=21)
    ->  Gather Merge  (cost=10832.85..10838.92 rows=52 width=21)
          Workers Planned: 2
          Workers Launched: 2
          ->  Sort  (cost=9832.83..9832.90 rows=26 width=21)
                Sort Key: objects.id DESC
                Sort Method: top-N heapsort  Memory: 32kB
                Worker 0:  Sort Method: quicksort  Memory: 32kB
                Worker 1:  Sort Method: quicksort  Memory: 32kB
                ->  Parallel Seq Scan on objects
                      Filter: ((hashed SubPlan 1) OR ("userId" = 1))
                      Rows Removed by Filter: 183372
                      SubPlan 1
                        ->  Nested Loop
                              ->  Index Only Scan using users_pkey on
                                    Index Cond: (id = 1)
                                    Heap Fetches: 0
                              ->  Index Only Scan using "relationships_followerId_followeeId_idx" on relationships
                                    Index Cond: ("followerId" = 1)
                                    Heap Fetches: 0
  Planning Time: 0.762 ms
  Execution Time: 43.816 ms

  Limit  (cost=10818.83..10819.07 rows=2 width=21)
    ->  Gather Merge  (cost=10818.83..10819.07 rows=2 width=21)
          Workers Planned: 2
          Workers Launched: 2
          ->  Sort  (cost=9818.81..9818.81 rows=1 width=21)
                Sort Key: objects.id DESC
                Sort Method: quicksort  Memory: 25kB
                Worker 0:  Sort Method: quicksort  Memory: 25kB
                Worker 1:  Sort Method: quicksort  Memory: 25kB
                ->  Parallel Seq Scan on objects
                      Filter: ((hashed SubPlan 1) OR ("userId" = 4))
                      Rows Removed by Filter: 183477
                      SubPlan 1
                        ->  Nested Loop  (cost=0.56..8.61 rows=1 width=4)
                        ->  Index Only Scan using "relationships_followerId_followeeId_idx" on relationships
                                    Index Cond: ("followerId" = 4)
                                    Heap Fetches: 0
                              ->  Index Only Scan using users_pkey
                                    Index Cond: (id = 4)
                                    Heap Fetches: 0
  Planning Time: 0.646 ms
  Execution Time: 30.824 ms

But this was achieved only because of parallel worker utilisation. Disabling them, we get:

  Limit  (cost=14635.07..14635.08 rows=2 width=21) (actual time=75.941..75.943 rows=0 loops=1)
    ->  Sort  (cost=14635.07..14635.08 rows=2 width=21) (actual time=75.939..75.940 rows=0 loops=1)
          Sort Key: objects.id DESC
          Sort Method: quicksort  Memory: 25kB
          ->  Seq Scan on objects  (cost=8.61..14635.06 rows=2 width=21) (actual time=75.931..75.932 rows=0 loops=1)
                Filter: ((hashed SubPlan 1) OR ("userId" = 4))
                Rows Removed by Filter: 550430
                SubPlan 1
                  ->  Nested Loop  (cost=0.56..8.61 rows=1 width=4) (actual time=0.039..0.040 rows=0 loops=1)
                        ->  Index Only Scan using "relationships_followerId_followeeId_idx" on relationships  (cost=0.28..4.29 rows=1 width=8) (actual time=0.038..0.038 rows=0 loops=1)
                              Index Cond: ("followerId" = 4)
                              Heap Fetches: 0
                        ->  Index Only Scan using users_pkey on users  (cost=0.29..4.31 rows=1 width=4) (never executed)
                              Index Cond: (id = 4)
                              Heap Fetches: 0
  Planning Time: 0.945 ms
  Execution Time: 76.123 ms

So, from the optimiser's point of view, it has done the best it could.
Theoretically, if you have a big table with indexes and must select a small number of tuples, the ideal query plan will include parameterised NestLoop JOINs. Unfortunately, parameterisation in PostgreSQL can't be pushed down into a subquery. This could be a reason for new development, since MSSQL can do such a trick, but that is a long way off.
You can try to rewrite your schema and query to avoid subqueries in expressions altogether.
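One rewrite in that spirit (a sketch, untested against this schema, using the mock-schema names from this thread) pushes the parameter into the per-followee scan by hand with a LATERAL join; it only pays off with an index on ("userId", id):

```sql
-- For each followee, take at most the 50 newest posts via a
-- parameterised index scan, then merge and cut to 50 overall.
-- The user's own posts would need a second branch (e.g. UNION ALL).
SELECT o.*
FROM relationships r
CROSS JOIN LATERAL (
    SELECT *
    FROM objects o
    WHERE o."userId" = r."followeeId"
    ORDER BY o.id DESC
    LIMIT 50
) o
WHERE r."followerId" = :userid
ORDER BY o.id DESC
LIMIT 50;
```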
I hope this message gave you some insights.

[1] https://github.com/postgrespro/aqo

-- 
regards, Andrei Lepikhov




Re: Inconsistent query performance based on relation hit frequency

From:
Achilleas Mantzios - cloud
Date:


On 6/27/24 03:50, Laura Hausmann wrote:

Before a deep dive into everybody's favorite topic, you may simplify your query:

select o.* from objects o where o."userId" = :userid
UNION
select o.* from objects o where o."userId" IN (SELECT r."followeeId" FROM relationships r WHERE r."followerId" = :userid)

postgres@[local]/laura=# explain (analyze, buffers) select o.* from objects o where o."userId" = 1 UNION select o.* from objects o where o."userId" IN (SELECT r."followeeId" FROM relationships r WHERE r."followerId"=1) ORDER BY id DESC ;
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
Sort  (cost=8622.04..8767.98 rows=58376 width=40) (actual time=1.041..1.053 rows=314 loops=1)
  Sort Key: o.id DESC
  Sort Method: quicksort  Memory: 39kB
  Buffers: shared hit=1265
  ->  HashAggregate  (cost=3416.92..4000.68 rows=58376 width=40) (actual time=0.900..1.006 rows=314 loops=1)
        Group Key: o.id, o."userId", o.data
        Batches: 1  Memory Usage: 1585kB
        Buffers: shared hit=1265
        ->  Append  (cost=0.42..2979.10 rows=58376 width=40) (actual time=0.024..0.816 rows=314 loops=1)
              Buffers: shared hit=1265
              ->  Index Scan using "objects_userId_idx" on objects o  (cost=0.42..3.10 rows=17 width=21) (actual time=0.003..0.003 rows=0 loops=1)
                    Index Cond: ("userId" = 1)
                    Buffers: shared hit=3
              ->  Nested Loop  (cost=0.70..2684.12 rows=58359 width=21) (actual time=0.020..0.794 rows=314 loops=1)
                    Buffers: shared hit=1262
                    ->  Index Only Scan using "relationships_followerId_followeeId_idx" on relationships r  (cost=0.28..7.99 rows=315 width=4) (actual time=0.011..0.030 rows=315 loops=1)
                          Index Cond: ("followerId" = 1)
                          Heap Fetches: 0
                          Buffers: shared hit=3
                    ->  Index Scan using "objects_userId_idx" on objects o_1  (cost=0.42..6.65 rows=185 width=21) (actual time=0.002..0.002 rows=1 loops=315)
                          Index Cond: ("userId" = r."followeeId")
                          Buffers: shared hit=1259
Planning:
  Buffers: shared hit=8
Planning Time: 0.190 ms
Execution Time: 1.184 ms
(26 rows)

Time: 1.612 ms
postgres@[local]/laura=# explain (analyze, buffers) select o.* from objects o where o."userId" = 4 UNION select o.* from objects o where o."userId" IN (SELECT r."followeeId" FROM relationships r WHERE r."followerId"=4) ORDER BY id DESC ;
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
Sort  (cost=27.53..28.03 rows=202 width=40) (actual time=0.015..0.016 rows=0 loops=1)
  Sort Key: o.id DESC
  Sort Method: quicksort  Memory: 25kB
  Buffers: shared hit=5
  ->  HashAggregate  (cost=17.77..19.79 rows=202 width=40) (actual time=0.013..0.013 rows=0 loops=1)
        Group Key: o.id, o."userId", o.data
        Batches: 1  Memory Usage: 40kB
        Buffers: shared hit=5
        ->  Append  (cost=0.42..16.26 rows=202 width=40) (actual time=0.011..0.012 rows=0 loops=1)
              Buffers: shared hit=5
              ->  Index Scan using "objects_userId_idx" on objects o  (cost=0.42..3.10 rows=17 width=21) (actual time=0.005..0.005 rows=0 loops=1)
                    Index Cond: ("userId" = 4)
                    Buffers: shared hit=3
              ->  Nested Loop  (cost=0.70..12.14 rows=185 width=21) (actual time=0.005..0.005 rows=0 loops=1)
                    Buffers: shared hit=2
                    ->  Index Only Scan using "relationships_followerId_followeeId_idx" on relationships r  (cost=0.28..1.39 rows=1 width=4) (actual time=0.005..0.005 rows=0 loops=1)
                          Index Cond: ("followerId" = 4)
                          Heap Fetches: 0
                          Buffers: shared hit=2
                    ->  Index Scan using "objects_userId_idx" on objects o_1  (cost=0.42..8.90 rows=185 width=21) (never executed)
                          Index Cond: ("userId" = r."followeeId")
Planning:
  Buffers: shared hit=8
Planning Time: 0.201 ms
Execution Time: 0.048 ms
(25 rows)

Time: 0.490 ms




Re: Inconsistent query performance based on relation hit frequency

From:
Laura Hausmann
Date:
Heya & thank you for the response!

That makes a lot of sense. I'm glad to hear it's on the radar of the team, but I understand that this is a complex task and won't happen anytime soon.

In the meantime, I've tried a couple of ways of rewriting the query, sadly none of which seem to translate to the production database:

Simply dropping the or/union clause (and adding a relationship to the user themselves) fixes the problem in the test database (both user 1 (https://explain.depesz.com/s/ZY8l) and user 4 (https://explain.depesz.com/s/Q2Wk) run in 1~15ms, which isn't perfect but good enough), but not the production one (still fast for high frequency (https://explain.depesz.com/s/DixF) and slow for low frequency (https://explain.depesz.com/s/fIKm) users).

I also tried rewriting it as a join (https://explain.depesz.com/s/36Ve), but that also didn't seem to have an effect.

It's very possible I missed one or more ways the query could be rewritten.

I'm sadly not sure how I could generate a test dataset that more closely resembles the production workload. In case that would be helpful in debugging this further, any tips on that would be greatly appreciated.
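For what it's worth, one way to bring mock data closer to a production-like skew (all numbers here are made up) is to give posting frequency a heavy tail, so a few accounts produce most posts while many post rarely:

```sql
-- Heavy-tailed mock posts: power(random(), 3) biases the picked
-- "userId" toward small ids, so a handful of users dominate the
-- table, as on a real instance. Uses the thread's mock schema.
INSERT INTO objects ("userId", data)
SELECT 1 + floor(power(random(), 3) * 10000)::int,
       'post #' || g
FROM generate_series(1, 1000000) AS g;
```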

Thanks in advance,
Laura Hausmann


On Thu, Jun 27, 2024 at 12:31 PM Andrei Lepikhov <lepihov@gmail.com> wrote:

Re: Inconsistent query performance based on relation hit frequency

From:
Achilleas Mantzios
Date:
On 27/6/24 17:58, Laura Hausmann wrote:

I am not sure my message made it through to you (I don't know if you are subscribed to the list), so here is an idea:

select o.* from objects o where o."userId" = :userid
UNION
select o.* from objects o where o."userId" IN (SELECT r."followeeId" FROM relationships r WHERE r."followerId" = :userid)
ORDER BY id DESC ;

With your test data I get <= 1ms answers with all inputs.




On Thu, Jun 27, 2024 at 12:31 PM Andrei Lepikhov <lepihov@gmail.com> wrote:

-- 
Achilleas Mantzios
IT DEV - HEAD, IT DEPT
Dynacom Tankers Mgmt (as agents only)