Обсуждение: Auto vacuum not running -- Could not bind socket for statistics collector
Dear list, I am having trouble running PostgreSQL 9.3 under OpenSuSE because auto vacuum does not seem to work. Here are the details on my version: # select version(); PostgreSQL 9.3.5 on x86_64-suse-linux-gnu, compiled by gcc (SUSE Linux) 4.8.1 20130909 [gcc-4_8-branch revision 202388], 64-bit) After starting the server with pg_ctl start, I get the following entries in the logs: 2014-12-02 15:27:36 CET LOG: could not bind socket for statistics collector: Cannot assign requested address 2014-12-02 15:27:36 CET LOG: disabling statistics collector for lack of working socket 2014-12-02 15:27:36 CET WARNING: autovacuum not started because of misconfiguration 2014-12-02 15:27:36 CET HINT: Enable the "track_counts" option. BUT: track_counts is set to on in the postgresql.conf file (and so is auto_vacuum). I found some older threads using Google, and the person was given the advice to check the listen addresses resolve to the proper IP addresses, but this is the case for me: > grep listen_address /var/lib/pgsql/data/postgresql.conf listen_addresses = '127.0.0.1, 192.168.185.41' # what IP address(es) to listen on; > /sbin/ifconfig | grep eth0 -C 2 eth0 Link encap:Ethernet HWaddr 00:25:90:5A:B0:42 inet addr:192.168.185.41 Bcast:192.168.185.255 Mask:255.255.255.0 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 > ping localhost PING localhost (127.0.0.1) 56(84) bytes of data. 64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.026 ms ... Some threads claim this was only a warning, and AV would be running, but this is not the case: # SELECT schemaname, relname, last_vacuum, last_autovacuum, vacuum_count, autovacuum_count FROM pg_stat_user_tables; schemaname | relname | last_vacuum | last_autovacuum | vacuum_count | autovacuum_count ------------+------------------------------+-------------+-----------------+--------------+------------------ public | plcc_motiftype | | | 0 | 0 public | plcc_ssecontact_complexgraph | | | 0 | 0 public | plcc_nm_ssetoproteingraph | | | 0 | 0 public | plcc_ssetypes | | | 0 | 0 public | plcc_contact | | | 0 | 0 public | plcc_complexcontacttypes | | | 0 | 0 public | plcc_protein | | | 0 | 0 public | plcc_contacttypes | | | 0 | 0 public | plcc_graphtypes | | | 0 | 0 public | plcc_sse | | | 0 | 0 public | plcc_secondat | | | 0 | 0 public | plcc_nm_ssetofoldinggraph | | | 0 | 0 public | plcc_fglinnot | | | 0 | 0 public | plcc_complex_contact | | | 0 | 0 public | plcc_foldinggraph | | | 0 | 0 public | plcc_ligand | | | 0 | 0 public | plcc_nm_ligandtochain | | | 0 | 0 public | plcc_graph | | | 0 | 0 public | plcc_graphlets | | | 0 | 0 public | plcc_motif | | | 0 | 0 public | plcc_chain | | | 0 | 0 public | plcc_complexgraph | | | 0 | 0 public | plcc_nm_chaintomotif | | | 0 | 0 public | plcc_graphletsimilarity | | | 0 | 0 Atm, 64 parallel instances of a custom Java application write a lot of of data into this database server (this is a computer cluster), so it SHOULD vaccuum. Doing simple SELECTs takes a long time after some hours (a 'SELECT count(*)' from a table with 5.5M columns takes 4 secs). When I run VACUUM manually, it works (takes very long though), and afterwards, the SELECTs are fast again (2ms). But this changes again when I re-start the processes, of course. What should I try next? How can I get more info on why auto vaccum is not running? Thanks in advance, -- Tim
=?UTF-8?Q?Tim_Sch=C3=A4fer?= <ts+ml@rcmd.org> writes: > After starting the server with pg_ctl start, I get the following entries in the > logs: > 2014-12-02 15:27:36 CET LOG: could not bind socket for statistics > collector: Cannot assign requested address > 2014-12-02 15:27:36 CET LOG: disabling statistics collector for lack of > working socket Yes, this will break autovacuum, because it won't have any way to find out what it should vacuum. The cause probably is a DNS issue: "localhost" isn't resolving to anything sensible. "dig localhost" on the command line might offer some insight. > I found some older threads using Google, and the person was given the advice to > check the listen addresses resolve to the proper IP addresses, but this is the > case for me: > grep listen_address /var/lib/pgsql/data/postgresql.conf listen_addresses is not related to this. The stats code tries to bind to whatever "localhost" resolves as, independently of that. regards, tom lane
Hi, > On December 2, 2014 at 4:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > =?UTF-8?Q?Tim_Sch=C3=A4fer?= <ts+ml@rcmd.org> writes: > > After starting the server with pg_ctl start, I get the following entries in > > the > > logs: > > > 2014-12-02 15:27:36 CET LOG: could not bind socket for statistics > > collector: Cannot assign requested address > > 2014-12-02 15:27:36 CET LOG: disabling statistics collector for lack of > > working socket > > Yes, this will break autovacuum, because it won't have any way to find out > what it should vacuum. The cause probably is a DNS issue: "localhost" > isn't resolving to anything sensible. "dig localhost" on the command line > might offer some insight. thanks for your answer. Here is my full 'dig localhost' from the database server: ~/data> dig localhost ; <<>> DiG 9.9.4-rpz2.13269.14-P2 <<>> localhost ;; global options: +cmd ;; Got answer: ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 10157 ;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 2 ;; OPT PSEUDOSECTION: ; EDNS: version: 0, flags:; udp: 4096 ;; QUESTION SECTION: ;localhost. IN A ;; ANSWER SECTION: localhost. 604800 IN A 127.0.0.1 ;; AUTHORITY SECTION: localhost. 604800 IN NS localhost. ;; ADDITIONAL SECTION: localhost. 604800 IN AAAA ::1 ;; Query time: 1 msec ;; SERVER: 192.168.185.11#53(192.168.185.11) ;; WHEN: Wed Dec 03 09:24:53 CET 2014 ;; MSG SIZE rcvd: 96 Looks fine to me. Or is there something wrong with it? And are you sure pgsql is unhappy with localhost? It would be great if I definitely knew the address it is trying to bind. Is there a way to tell? Thanks again & greetings, -- Tim
Re: Auto vacuum not running -- Could not bind socket for statistics collector
От
Adrian Klaver
Дата:
On 12/03/2014 12:36 AM, Tim Schäfer wrote: > Hi, > > >> On December 2, 2014 at 4:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> =?UTF-8?Q?Tim_Sch=C3=A4fer?= <ts+ml@rcmd.org> writes: >>> After starting the server with pg_ctl start, I get the following entries in >>> the >>> logs: >> >>> 2014-12-02 15:27:36 CET LOG: could not bind socket for statistics >>> collector: Cannot assign requested address >>> 2014-12-02 15:27:36 CET LOG: disabling statistics collector for lack of >>> working socket >> >> Yes, this will break autovacuum, because it won't have any way to find out >> what it should vacuum. The cause probably is a DNS issue: "localhost" >> isn't resolving to anything sensible. "dig localhost" on the command line >> might offer some insight. > > > thanks for your answer. Here is my full 'dig localhost' from the database > server: > > ~/data> dig localhost > > ; <<>> DiG 9.9.4-rpz2.13269.14-P2 <<>> localhost > ;; global options: +cmd > ;; Got answer: > ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 10157 > ;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 2 > > ;; OPT PSEUDOSECTION: > ; EDNS: version: 0, flags:; udp: 4096 > ;; QUESTION SECTION: > ;localhost. IN A > > ;; ANSWER SECTION: > localhost. 604800 IN A 127.0.0.1 > > ;; AUTHORITY SECTION: > localhost. 604800 IN NS localhost. > > ;; ADDITIONAL SECTION: > localhost. 604800 IN AAAA ::1 > > ;; Query time: 1 msec > ;; SERVER: 192.168.185.11#53(192.168.185.11) > ;; WHEN: Wed Dec 03 09:24:53 CET 2014 > ;; MSG SIZE rcvd: 96 > > > Looks fine to me. Or is there something wrong with it? > > And are you sure pgsql is unhappy with localhost? It would be great if I > definitely knew the address it is trying to bind. Is there a way to tell? The below after starting Postgres will show what it successfully bound to. Not a complete answer but might help narrow the possibilities. netstat -a | grep postgres netstat -a | grep <port_number> Where port_number is the one in postgresql.conf Also you say you are using pg_ctl to start the program. What user are you doing that as? Also any other changes in the Connections portion of postgresql.conf. > > Thanks again & greetings, > > -- > Tim > > -- Adrian Klaver adrian.klaver@aklaver.com
Hi, on the netstat question: it bound the default postgres port (5432) sucessfully on both interfaces (lo and eth0) it seems: postgres@ixion41:~/data> netstat -na | grep postg unix 2 [ ACC ] STREAM LISTENING 21156728 /var/run/postgresql/.s.PGSQL.5432 postgres@ixion41:~/data> netstat -na | grep 5432 tcp 0 0 192.168.185.41:5432 0.0.0.0:* LISTEN tcp 0 0 127.0.0.1:5432 0.0.0.0:* LISTEN tcp 0 0 192.168.185.41:5432 192.168.185.42:40013 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.38:34332 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.42:40036 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.38:34285 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.35:49972 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.39:40494 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.42:40035 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.42:39976 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.42:40042 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.35:49975 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.42:40041 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.42:40023 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.34:50950 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.38:34339 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.42:39995 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.34:50922 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.42:40009 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.38:34343 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.42:40048 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.34:50962 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.38:34320 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.35:49966 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.36:58217 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.35:49910 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.39:40563 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.39:40566 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.35:49937 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.34:50891 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.39:40549 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.36:58215 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.40:59597 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.40:59590 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.38:34335 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.36:58212 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.35:49963 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.35:49950 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.39:40540 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.36:58167 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.36:58188 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.42:40032 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.42:40020 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.40:59587 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.40:59575 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.42:40046 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.40:59594 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.38:34345 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.42:40027 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.39:40559 ESTABLISHED tcp 0 846 192.168.185.41:5432 192.168.185.36:58221 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.34:50947 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.42:39960 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.36:58196 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.34:50954 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.42:40024 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.35:49969 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.34:50961 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.39:40522 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.36:58191 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.38:34294 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.40:59584 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.40:59581 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.34:50958 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.40:59599 ESTABLISHED tcp 0 0 192.168.185.41:5432 192.168.185.39:40557 ESTABLISHED unix 2 [ ACC ] STREAM LISTENING 21156730 /tmp/.s.PGSQL.5432 unix 2 [ ACC ] STREAM LISTENING 21156728 /var/run/postgresql/.s.PGSQL.5432 unix 3 [ ] STREAM CONNECTED 21244990 /tmp/.s.PGSQL.5432 All the ESTABLISHED connections are the clients writing to it (slowly). Also you say you are using pg_ctl to start the program. What user are you doing that as? As user 'postgres'. If I try it as root, it gives me the following message: # pg_ctl restart pg_ctl: cannot be run as root Please log in (using, e.g., "su") as the (unprivileged) user that will own the server process. Also any other changes in the Connections portion of postgresql.conf. The only change in that section besides 'listen_addresses' is 'max_connections = 140'. (The default value on SuSE 12.3 was 100.) I also noticed the following, but it is somehow expected I guess: vplg=# show autovacuum; autovacuum ------------ on (1 row) vplg=# show track_counts; track_counts -------------- off (1 row) Here are the relevant parts from my postgresql.conf: postgres@dbsrv:~/data> grep autovac postgresql.conf autovacuum = on # Enable autovacuum subprocess? 'on' #log_autovacuum_min_duration = -1 # -1 disables, 0 logs all actions and autovacuum_max_workers = 5 # max number of autovacuum subprocesses autovacuum_naptime = 1min # time between autovacuum runs #autovacuum_vacuum_threshold = 50 # min number of row updates before #autovacuum_analyze_threshold = 50 # min number of row updates before #autovacuum_vacuum_scale_factor = 0.2 # fraction of table size before vacuum #autovacuum_analyze_scale_factor = 0.1 # fraction of table size before analyze #autovacuum_freeze_max_age = 200000000 # maximum XID age before forced vacuum #autovacuum_multixact_freeze_max_age = 400000000 # maximum Multixact age #autovacuum_vacuum_cost_delay = 20ms # default vacuum cost delay for # autovacuum, in milliseconds; #autovacuum_vacuum_cost_limit = -1 # default vacuum cost limit for # autovacuum, -1 means use postgres@dbsrv:~/data> grep track postgresql.conf track_activities = on track_counts = on #track_io_timing = off #track_functions = none # none, pl, all #track_activity_query_size = 1024 # (change requires restart) Thanks & greetings, -- Tim
=?UTF-8?Q?Tim_Sch=C3=A4fer?= <ts+ml@rcmd.org> writes: >> On December 2, 2014 at 4:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Yes, this will break autovacuum, because it won't have any way to find out >> what it should vacuum. The cause probably is a DNS issue: "localhost" >> isn't resolving to anything sensible. "dig localhost" on the command line >> might offer some insight. > thanks for your answer. Here is my full 'dig localhost' from the database > server: > ... > Looks fine to me. Or is there something wrong with it? Hmph, looks fine to me too. > And are you sure pgsql is unhappy with localhost? It would be great if I > definitely knew the address it is trying to bind. Is there a way to tell? As I mentioned, this has nothing to do with local_addresses or the port setting; the code in pgstat.c is hard-wired to bind to whatever "localhost" resolves as. One idea is to see if you can strace postmaster startup (or whatever your preferred local equivalent of strace is, perhaps truss). That will produce a great deal of output but there should only be a few bind() calls so it won't be too hard to find the section where this is happening. Hmm ... actually, when I try it here (on a RHEL6 machine) the relevant stretch of output is open("/etc/hosts", O_RDONLY|O_CLOEXEC) = 8 fcntl(8, F_GETFD) = 0x1 (flags FD_CLOEXEC) fstat(8, {st_mode=S_IFREG|0644, st_size=158, ...}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa42206a000 read(8, "127.0.0.1 localhost localhost."..., 4096) = 158 read(8, "", 4096) = 0 close(8) = 0 munmap(0x7fa42206a000, 4096) = 0 socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 8 connect(8, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.1")}, 16) = 0 getsockname(8, {sa_family=AF_INET, sin_port=htons(37185), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0 close(8) = 0 socket(PF_INET6, SOCK_DGRAM, IPPROTO_IP) = 8 connect(8, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0},28) = 0 getsockname(8, {sa_family=AF_INET6, sin6_port=htons(39774), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0},[28]) = 0 close(8) = 0 socket(PF_INET6, SOCK_DGRAM, IPPROTO_IP) = 8 bind(8, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0},28) = 0 getsockname(8, {sa_family=AF_INET6, sin6_port=htons(46928), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0},[28]) = 0 connect(8, {sa_family=AF_INET6, sin6_port=htons(46928), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0},28) = 0 sendto(8, "\307", 1, 0, NULL, 0) = 1 select(9, [8], NULL, NULL, {0, 500000}) = 1 (in [8], left {0, 499994}) recvfrom(8, "\307", 1, 0, NULL, NULL) = 1 fcntl(8, F_SETFL, O_RDONLY|O_NONBLOCK) = 0 which suggests that getaddrinfo() preferentially looks in /etc/hosts before contacting any DNS server. So perhaps that "dig" call is not telling you the real state of affairs, and what you need to do is see if there's a bogus entry for localhost in /etc/hosts. regards, tom lane
Hi, > On December 3, 2014 at 5:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > ... > which suggests that getaddrinfo() preferentially looks in /etc/hosts > before contacting any DNS server. So perhaps that "dig" call is not > telling you the real state of affairs, and what you need to do is > see if there's a bogus entry for localhost in /etc/hosts. Thanks a lot, your last hint did the trick! The /etc/hosts file looked like this on the DB server: ::1 localhost ipv6-localhost ipv6-loopback 192.168.185.41 ixion41.molbi ixion41 192.168.185.41 ixion41.molbi ixion41 I changed it to this: 127.0.0.1 localhost ::1 ipv6-localhost ipv6-loopback 192.168.185.41 ixion41.molbi ixion41 192.168.185.41 ixion41.molbi ixion41 Then I restarted the db server. Now I find this in the logs: 2014-12-03 17:17:53 CET LOG: database system was shut down at 2014-12-03 17:17:51 CET 2014-12-03 17:17:53 CET LOG: database system is ready to accept connections 2014-12-03 17:17:53 CET LOG: autovacuum launcher started So it seems to work now. Thanks a lot for your time & efforts, Tom Lane! -- Tim