Re: [GENERAL] Bad planning data resulting in OOM killing of postgres

From: Jeff Janes
Subject: Re: [GENERAL] Bad planning data resulting in OOM killing of postgres
Date:
Msg-id: CAMkU=1zdL_04dNCZMgxRYs9w4-tJHumj+o+_3-uGVf584f7wWw@mail.gmail.com
In response to: Re: [GENERAL] Bad planning data resulting in OOM killing of postgres  (David Hinkle <hinkle@cipafilter.com>)
List: pgsql-general
On Thu, Feb 16, 2017 at 3:27 PM, David Hinkle <hinkle@cipafilter.com> wrote:
I guess this doesn't work; the latest test run crashed.  It still uses the
bad plan for the hostid column even after n_distinct is updated.

cipafilter=# select attname, n_distinct from pg_stats where tablename
cipafilter-# = 'log_raw' and (attname = 'urlid' or attname = 'titleid'
cipafilter(# or attname = 'hostid');
 attname | n_distinct
---------+-------------
 urlid   | 1.51625e+08
 hostid  |      304710
 titleid |      886499
(3 rows)

cipafilter=# explain DELETE FROM hosts WHERE NOT EXISTS ( SELECT
log_raw.hostid FROM log_raw WHERE log_raw.hostid = hosts.hostid );
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Delete on hosts  (cost=22249475.67..74844813.47 rows=1 width=12)
   ->  Hash Anti Join  (cost=22249475.67..74844813.47 rows=1 width=12)
         Hash Cond: (hosts.hostid = log_raw.hostid)
         ->  Seq Scan on hosts  (cost=0.00..5017.10 rows=304710 width=10)
         ->  Hash  (cost=12799395.52..12799395.52 rows=543645052 width=10)
               ->  Seq Scan on log_raw  (cost=0.00..12799395.52 rows=543645052 width=10)
(6 rows)

I guess I will also try throwing in 'set enable_hashjoin = false;' and
see if that gets these purges to go through.



Another option would be to force the de-dup to happen, with:

explain with t as (select distinct hostid from log_raw)
delete from hosts
where not exists (select 1 from t where t.hostid = hosts.hostid);

That way you can use the hash join without running out of memory, since the hash table is built over the distinct hostids rather than all 543 million log_raw rows, in case the hash join is actually faster than the merge join.  It also seems cleaner than fiddling with enable_* parameters and then having to remember to reset them when done.
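(If you do end up going the enable_* route, a transaction-scoped SET LOCAL sidesteps the "remember to reset" problem, since the setting reverts automatically at commit or rollback.  A generic sketch, not taken from the thread:)

```sql
BEGIN;
-- SET LOCAL only lasts for the current transaction,
-- so there is nothing to reset afterwards.
SET LOCAL enable_hashjoin = off;
DELETE FROM hosts
WHERE NOT EXISTS (
    SELECT 1 FROM log_raw WHERE log_raw.hostid = hosts.hostid
);
COMMIT;
```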

Cheers,

Jeff
