Обсуждение: Deleting takes days, should I add some index?

Поиск
Список
Период
Сортировка

Deleting takes days, should I add some index?

От
Alexander Farber
Дата:
Hello,

I am using PostgreSQL 10.15 on CentOS 7 with 64 GB RAM, Intel i7 6700 and I have the following 2 tables there:

words_ru=> \d words_games
                                      Table "public.words_games"
  Column  |           Type           | Collation | Nullable |                 Default
----------+--------------------------+-----------+----------+------------------------------------------
 gid      | integer                  |           | not null | nextval('words_games_gid_seq'::regclass)
 created  | timestamp with time zone |           | not null |
 finished | timestamp with time zone |           |          |
 player1  | integer                  |           | not null |
 player2  | integer                  |           |          |
 played1  | timestamp with time zone |           |          |
 played2  | timestamp with time zone |           |          |
 state1   | text                     |           |          |
 state2   | text                     |           |          |
 reason   | text                     |           |          |
 hint1    | text                     |           |          |
 hint2    | text                     |           |          |
 score1   | integer                  |           | not null |
 score2   | integer                  |           | not null |
 chat1    | integer                  |           | not null |
 chat2    | integer                  |           | not null |
 hand1    | character(1)[]           |           | not null |
 hand2    | character(1)[]           |           | not null |
 pile     | character(1)[]           |           | not null |
 letters  | character(1)[]           |           | not null |
 values   | integer[]                |           | not null |
 bid      | integer                  |           | not null |
 diff1    | integer                  |           |          |
 diff2    | integer                  |           |          |
Indexes:
    "words_games_pkey" PRIMARY KEY, btree (gid)
    "words_games_player1_coalesce_idx" btree (player1, COALESCE(finished, 'infinity'::timestamp with time zone))
    "words_games_player2_coalesce_idx" btree (player2, COALESCE(finished, 'infinity'::timestamp with time zone))
Check constraints:
    "words_games_chat1_check" CHECK (chat1 >= 0)
    "words_games_chat2_check" CHECK (chat2 >= 0)
    "words_games_check" CHECK (player1 <> player2)
    "words_games_score1_check" CHECK (score1 >= 0)
    "words_games_score2_check" CHECK (score2 >= 0)
Foreign-key constraints:
    "words_games_bid_fkey" FOREIGN KEY (bid) REFERENCES words_boards(bid) ON DELETE CASCADE
    "words_games_player1_fkey" FOREIGN KEY (player1) REFERENCES words_users(uid) ON DELETE CASCADE
    "words_games_player2_fkey" FOREIGN KEY (player2) REFERENCES words_users(uid) ON DELETE CASCADE
Referenced by:
    TABLE "words_chat" CONSTRAINT "words_chat_gid_fkey" FOREIGN KEY (gid) REFERENCES words_games(gid) ON DELETE CASCADE
    TABLE "words_moves" CONSTRAINT "words_moves_gid_fkey" FOREIGN KEY (gid) REFERENCES words_games(gid) ON DELETE CASCADE
    TABLE "words_scores" CONSTRAINT "words_scores_gid_fkey" FOREIGN KEY (gid) REFERENCES words_games(gid) ON DELETE CASCADE

words_ru=> \d words_moves
                                      Table "public.words_moves"
 Column  |           Type           | Collation | Nullable |                 Default
---------+--------------------------+-----------+----------+------------------------------------------
 mid     | bigint                   |           | not null | nextval('words_moves_mid_seq'::regclass)
 action  | text                     |           | not null |
 gid     | integer                  |           | not null |
 uid     | integer                  |           | not null |
 played  | timestamp with time zone |           | not null |
 tiles   | jsonb                    |           |          |
 score   | integer                  |           |          |
 str     | text                     |           |          |
 hand    | text                     |           |          |
 puzzle  | boolean                  |           | not null | false
 letters | character(1)[]           |           |          |
 values  | integer[]                |           |          |
Indexes:
    "words_moves_pkey" PRIMARY KEY, btree (mid)
    "words_moves_gid_played_idx" btree (gid, played DESC)
    "words_moves_puzzle_idx" btree (puzzle)
    "words_moves_uid_action_played_idx" btree (uid, action, played)
    "words_moves_uid_idx" btree (uid)
Check constraints:
    "words_moves_score_check" CHECK (score >= 0)
Foreign-key constraints:
    "words_moves_gid_fkey" FOREIGN KEY (gid) REFERENCES words_games(gid) ON DELETE CASCADE
    "words_moves_uid_fkey" FOREIGN KEY (uid) REFERENCES words_users(uid) ON DELETE CASCADE
Referenced by:
    TABLE "words_scores" CONSTRAINT "words_scores_mid_fkey" FOREIGN KEY (mid) REFERENCES words_moves(mid) ON DELETE CASCADE

My word game is published since beginning of 2018 and I have that many entries there:

words_ru=> select count(*) from words_games;
 count
--------
 155585
(1 row)

words_ru=> select count(*) from words_moves;
  count
---------
 5429162
(1 row)

However I was not saving some important data in the 1st months of 2018, so I would like to delete those old games:

words_ru=> select count(*) from words_games where finished < '2018-06-01';
 count
-------
  6223
(1 row)

words_ru=> select count(*) from words_moves where played < '2018-06-01';
 count
--------
 196319
(1 row)

My problem is - it takes days (I run my command using "screen").

So I ctrl-c (surprisingly not a single record was deleted; I was expecting at least some to be gone) and then do it one by one month ( delete from words_games where finished < '2018-01-01' and so on).

And it still takes days :-)

Since I gradually get more users in my game and I will probably have to run similar tasks in future, I would like to learn if there is some trick for faster deletion here?

Should I add some index maybe?

words_ru=> EXPLAIN delete from words_games where finished < '2018-06-01';
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Delete on words_games  (cost=0.00..39991.29 rows=7375 width=6)
   ->  Seq Scan on words_games  (cost=0.00..39991.29 rows=7375 width=6)
         Filter: (finished < '2018-06-01 00:00:00+02'::timestamp with time zone)
(3 rows)

Thank you
Alex


Re: Deleting takes days, should I add some index?

От
Alvaro Herrera
Дата:
On 2020-Nov-27, Alexander Farber wrote:

> Referenced by:
>     TABLE "words_chat" CONSTRAINT "words_chat_gid_fkey" FOREIGN KEY (gid)
> REFERENCES words_games(gid) ON DELETE CASCADE
>     TABLE "words_moves" CONSTRAINT "words_moves_gid_fkey" FOREIGN KEY (gid)
> REFERENCES words_games(gid) ON DELETE CASCADE
>     TABLE "words_scores" CONSTRAINT "words_scores_gid_fkey" FOREIGN KEY
> (gid) REFERENCES words_games(gid) ON DELETE CASCADE

Make sure you have indexes on the gid columns of these tables.  Delete
needs to scan them in order to find the rows that are cascaded to.

> So I ctrl-c (surprisingly not a single record was deleted; I was expecting
> at least some to be gone)

Ctrl-C aborts the transaction, so even though the rows are marked
deleted, they are so by an aborted transaction.  Therefore they're
alive.



Re: Deleting takes days, should I add some index?

От
Guillaume Lelarge
Дата:
Le ven. 27 nov. 2020 à 16:05, Alvaro Herrera <alvherre@alvh.no-ip.org> a écrit :
On 2020-Nov-27, Alexander Farber wrote:

> Referenced by:
>     TABLE "words_chat" CONSTRAINT "words_chat_gid_fkey" FOREIGN KEY (gid)
> REFERENCES words_games(gid) ON DELETE CASCADE
>     TABLE "words_moves" CONSTRAINT "words_moves_gid_fkey" FOREIGN KEY (gid)
> REFERENCES words_games(gid) ON DELETE CASCADE
>     TABLE "words_scores" CONSTRAINT "words_scores_gid_fkey" FOREIGN KEY
> (gid) REFERENCES words_games(gid) ON DELETE CASCADE

Make sure you have indexes on the gid columns of these tables.  Delete
needs to scan them in order to find the rows that are cascaded to.


An index on words_games(finished) and words_moves(played) would help too.


--
Guillaume.