BUG #14918: Unexpected lock on standby server
From:    fhagur@gmail.com
Subject: BUG #14918: Unexpected lock on standby server
Date:
Msg-id:  20171117160833.1463.36850@wrigleys.postgresql.org
List:    pgsql-bugs
The following bug has been logged on the website:

Bug reference:      14918
Logged by:          Flavio Gurgel
Email address:      fhagur@gmail.com
PostgreSQL version: 9.6.6
Operating system:   Debian 8.9 Linux 3.16.0-4-amd64

Description:

Hello

A lock on a relation in a standby server caused a query to stay waiting.
Our in-house application load-balances some non-time-critical read queries
to an asynchronous streaming standby server. The standby server was running
PostgreSQL 9.6.6 and the master server was running 9.6.5 (today the master
server has been updated to 9.6.6).

The query is a single-table, very fast SELECT like this (the predicate was
anonymized for non-disclosure; we're not running prepared statements here):

    SELECT search_results.*
    FROM search_results
    WHERE search_results.profile_id = ?
    ORDER BY array_length(agenda_ids, 1) DESC NULLS LAST, practice_id;

The table structure is:

                                Table "public.search_results"
                 Column              |            Type             | Collation | Nullable | Default
    ---------------------------------+-----------------------------+-----------+----------+--------------------------------------------
     id                              | integer                     |           | not null | nextval('search_results_id_seq'::regclass)
     practice_id                     | integer                     |           |          |
     speciality_id                   | integer                     |           |          |
     profile_id                      | integer                     |           | not null |
     agenda_ids                      | character varying(255)[]    |           |          |
     name                            | character varying(255)      |           |          |
     latitude                        | double precision            |           |          |
     longitude                       | double precision            |           |          |
     created_at                      | timestamp without time zone |           |          |
     updated_at                      | timestamp without time zone |           |          |
     searchable                      | boolean                     |           | not null | false
     rank                            | integer                     |           | not null | 0
     zipcode                         | character varying(255)      |           | not null |
     country                         | character varying(2)        |           |          |
     visit_motive_label_ids          | integer[]                   |           | not null | '{}'::integer[]
     external_availabilities_service | boolean                     |           | not null | false
     profile_practice_detail_id      | integer                     |           |          |
     visit_motive_ids                | integer[]                   |           |          | '{}'::integer[]
     name_search_only                | boolean                     |           | not null | false
     top_specialities                | character varying[]         |           | not null | '{}'::character varying[]
    Indexes:
        "search_results_pkey" PRIMARY KEY, btree (id)
        "index_search_results_on_practice_id" btree (practice_id)
        "index_search_results_on_profile_id" btree (profile_id)
        "index_search_results_on_profile_practice_detail_id" btree (profile_practice_detail_id)
        "index_search_results_on_speciality_id" btree (speciality_id)
        "tsvector_on_search_results" gin (to_tsvector('doctolib_name_search'::regconfig, COALESCE(name, ''::character varying)::text))
    Foreign-key constraints:
        "search_results_practice_id_fk" FOREIGN KEY (practice_id) REFERENCES practices(id)
        "search_results_profile_id_fk" FOREIGN KEY (profile_id) REFERENCES profiles(id)
        "search_results_profile_practice_detail_id_fk" FOREIGN KEY (profile_practice_detail_id) REFERENCES profile_practice_details(id)
        "search_results_speciality_id_fk" FOREIGN KEY (speciality_id) REFERENCES specialities(id)

Several similar queries stacked up until we restarted the postmaster. A
statement_timeout of 7 seconds is set for all queries on the standby.

The blocking process was probably the WAL receiver: we didn't find any
queries from the blocking PID in the logs, and we didn't run a ps during
the incident to confirm it. We lost track of this with the postmaster
restart; nothing else is logged. max_standby_streaming_delay is set to -1.

We're used to seeing some replication lag caused by master cleanups on
tables related to queries running on the standby. During this incident,
streaming lag exploded to several GB; we monitor it from the
pg_stat_replication view.
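[Editor's note: for future incidents of this kind, the blocking PID can be
captured live rather than reconstructed from logs. A minimal sketch, using
the pg_blocking_pids() function that was added in PostgreSQL 9.6; the column
selection is illustrative:]

    -- Run on the standby while queries are waiting: list each blocked
    -- backend together with the PIDs blocking it. If the blocker is the
    -- startup/WAL-replay process, its PID will not match any client backend.
    SELECT pid,
           pg_blocking_pids(pid) AS blocked_by,
           state,
           wait_event_type,
           wait_event,
           query
    FROM pg_stat_activity
    WHERE cardinality(pg_blocking_pids(pid)) > 0;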
The only thing we found on the master was an autovacuum that logged this,
exactly at the beginning of the incident:

    automatic vacuum of table "doctolib.public.search_results": index scans: 1
        pages: 7512 removed, 8138 remain, 0 skipped due to pins, 7801 skipped frozen
        tuples: 0 removed, 264186 remain, 0 are dead but not yet removable
        buffer usage: 49032 hits, 35850 misses, 18054 dirtied
        avg read rate: 4.068 MB/s, avg write rate: 2.049 MB/s
        system usage: CPU 0.62s/1.30u sec elapsed 68.84 sec

This table is fully refreshed every hour with this transaction:

    BEGIN;
    DELETE FROM search_results;
    -- loop here --
    INSERT INTO search_results ...;
    COMMIT;

The INSERT loop can take tens of minutes, around half an hour. The DELETE
runs in less than a second. This autovacuum worker seemed to appear just
after the refresh operation, as expected.

No recent changes were made to the table's structure or to the query. We
see these operations all the time in our application, and this is the first
time in months that we have seen this lock behaviour on the standby server.
Locks are also monitored; relation locks had never appeared before.

Since this lock is not expected on a standby server configured this way,
we suspect a bug in PostgreSQL is possible.

Many thanks,
Flavio Gurgel
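[Editor's note: the replication-lag monitoring mentioned above can be done
on the master with a query along these lines. A sketch for 9.6, where the
byte lag is computed with pg_xlog_location_diff(); in PostgreSQL 10 these
functions and columns were renamed from "xlog"/"location" to "wal"/"lsn":]

    -- Run on the master: byte lag of each attached standby.
    SELECT application_name,
           state,
           pg_xlog_location_diff(pg_current_xlog_location(),
                                 replay_location) AS replay_lag_bytes
    FROM pg_stat_replication;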