Discussion: DNS vs /etc/hosts
I am changing from 7.2 to 8.0 and have both installed now on various Linux machines. When I use the psql command line interface with a -h hostname, the connection time from 7.2 is instant while the connection time from 8.0 is 15 seconds. My assumption is that 7.2 checks the /etc/hosts file first and if unable to find the specified host it reverts to a DNS lookup, and the 8.0 is just the opposite. Is this a correct assumption, and if so, can I modify 8.0 to behave as 7.2 does?
On Thursday, 04.08.2005, at 10:13 -0500, Lowell.Hought@faa.gov wrote:
> My assumption is that 7.2 checks the /etc/hosts file first and if unable to find the specified host it reverts to a DNS lookup, and the 8.0 is just the opposite. Is this a correct assumption, and if so, can I modify 8.0 to behave as 7.2 does?
No, applications don't do lookups themselves. The OS (or rather the resolver library) decides how lookups work, and therefore both 7.2 and 8.0 will behave the same.
I think the two servers have different user policies in their pg_hba.conf, and 8.0 might (by default) want to check ident. If that is firewalled, it might take a while to time out.
--
Tino Wildenhain <tino@wildenhain.de>
On Thu, Aug 04, 2005 at 10:13:43AM -0500, Lowell.Hought@faa.gov wrote:
> When I use the psql command line interface with a -h hostname, the connection time from 7.2 is instant while the connection time from 8.0 is 15 seconds.
Have you determined whether the difference is in the client (psql), in the server, or in both? What happens if you use a 7.2 client to connect to an 8.0 server, and if you use an 8.0 client to connect to a 7.2 server? Have you run a process trace or network sniffer to test your hypothesis?
Let's find out exactly what and where the problem is before looking for a solution. But if DNS is the problem, why not fix it instead of working around it?
--
Michael Fuhr
http://www.fuhr.org/~mfuhr/
On Aug 4, 2005, at 8:13 AM, Lowell.Hought@faa.gov wrote:
> When I use the psql command line interface with a -h hostname, the connection time from 7.2 is instant while the connection time from 8.0 is 15 seconds.
Is this on the same machine, or have you changed machines when you changed db versions?
(1) The lookups are usually handled by the system resolver, and assuming you are on a Unix-type system, the files /etc/host.conf and /etc/nsswitch.conf determine the order in which lookups are performed. Almost every system I have seen comes with a default configuration of using the files first and DNS second. It might be useful to make sure these are set correctly.
(2) Have you checked the 8.0 pg_hba.conf? It looks like ident is used. I am not very familiar with ident, usually only seeing it used for IRC chats, but I believe it asks your client for the ident information. Are you running an ident server, or do you possibly have a firewall that just drops packets for blocked ports (assuming ident is among the blocked ports)? I would guess that a silently dropped packet would make it time out, while a rejected connection or no server on the port would cause the ident connection to fail more quickly.
Just a couple of ideas.
Greg
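Editor's note: the distinction Greg draws in point (2), between a dropped packet that times out and a rejected connection that fails quickly, can be observed directly. The following is a minimal sketch, assuming Python is available on the client; the function name `probe_tcp_port` and the 3-second timeout are illustrative choices, not anything from PostgreSQL.

```python
import socket

def probe_tcp_port(host, port, timeout=3.0):
    """Classify a TCP connection attempt as 'open', 'refused', or 'timeout'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # A reject, or no listener on the port: the attempt fails quickly.
        return "refused"
    except socket.timeout:
        # Typical of a firewall that silently drops the packets.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. the host name could not be resolved.
        return f"error: {exc}"
```

Port 113 is the standard ident port, so a call such as `probe_tcp_port("client-host", 113)` (the host name is a placeholder) run from the database server would show which of the three cases applies.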
Machine 1 is running version 8.0
Machine 2 is running version 7.2
Machine 3 has version 7.2 and version 8.0 installed, so both versions of "psql" are available for testing.
From machine 3 to machine 2
Version 7.2 psql - /usr/bin/psql -d dbname -h machine2 ---- connection time instant
Version 8.0 psql - /usr/local/pgsql/bin/psql -d dbname -h machine2 ---- connection time 15 seconds
Version 8.0 psql - /usr/local/pgsql/bin/psql -d dbname -h ip.address ---- connection time instant
From machine 3 to machine 1
Version 7.2 psql - /usr/bin/psql -d dbname -h machine1 ---- connection time instant
Version 8.0 psql - /usr/local/pgsql/bin/psql -d dbname -h machine1 ---- connection time 15 seconds
Version 8.0 psql - /usr/local/pgsql/bin/psql -d dbname -h ip.address ---- connection time instant
---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match
On Thu, Aug 04, 2005 at 12:04:27PM -0500, Lowell.Hought@faa.gov wrote:
> Version 7.2 psql - /usr/bin/psql -d dbname -h machine1 ---- connection time instant
> Version 8.0 psql - /usr/local/pgsql/bin/psql -d dbname -h machine1 ---- connection time 15 seconds
> Version 8.0 psql - /usr/local/pgsql/bin/psql -d dbname -h ip.address ---- connection time instant
Do the 8.0 connections to a name take exactly 15 seconds every time, or does the time vary?
Have you done process traces on 7.2 vs. 8.0 to see what they're doing differently? You mentioned that you were using Linux, so something like "strace -o filename -r psql ..." should work (the -r option should add relative timestamps to the trace so you can see where the slowness is happening). As others have mentioned, name resolution is generally done by libraries that aren't part of PostgreSQL, so if two versions of PostgreSQL behave differently in that respect then we need to find out what's different about them. Have you used ldd to see what libraries each version of psql is linked against? Are there differences aside from libpq?
Have you used a tool like dig, host, or nslookup to test whether DNS indeed has a problem? That wouldn't answer why different versions of psql apparently behave differently, but it should at least tell us whether DNS is really a problem.
Have you used a sniffer like tcpdump or ethereal to watch DNS queries and PostgreSQL connections?
--
Michael Fuhr
I'd start by comparing the /etc/nsswitch.conf files on the various machines. If the second column contains "files" for passwd and hosts on the fast machines, and "dns" on the slow machine, then change the slow machine to "files" and see if it speeds up. That's an easy way to rule out or condemn DNS.
If you change a machine to "files", make sure that /etc/passwd has at least the user you intend to log in as, and that /etc/hosts has the hostnames.
Rick
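Editor's note: comparing the hosts entry across machines, as Rick suggests, can be scripted. This is a minimal sketch that reads the text of an nsswitch.conf file and returns the sources on its "hosts:" line in lookup order; the function name and the sample content are illustrative assumptions, not taken from any of the machines in this thread.

```python
import re

def hosts_lookup_order(nsswitch_text):
    """Return the sources listed on the 'hosts:' line, in lookup order."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()          # drop comments
        if line.startswith("hosts:"):
            entry = line[len("hosts:"):]
            # Remove bracketed action items such as [NOTFOUND=return].
            entry = re.sub(r"\[[^\]]*\]", " ", entry)
            return entry.split()
    return []

# Hypothetical file content, for illustration only.
sample = """\
passwd: files
hosts:  files dns   # local file first, then DNS
"""
print(hosts_lookup_order(sample))
```

On a real machine the text would come from reading /etc/nsswitch.conf; a result that starts with "files" means /etc/hosts is consulted before DNS.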
On Thu, Aug 04, 2005 at 03:01:31PM -0500, Richard_D_Levine@raytheon.com wrote:
> I'd start by comparing the /etc/nsswitch.conf files on the various machines. If the second column contains "files" for passwd and hosts on the fast machines, and "dns" on the slow machine, then change the slow machine to "files" and see if it speeds up. That's an easy way to rule out or condemn DNS.
The information we've been given suggests that the same version of psql behaves the same on different machines, and that different versions of psql behave differently on the same machine. If that's the case, then such behavior isn't easily explained by differing nsswitch.conf configurations. Even if mucking around with nsswitch.conf did appear to fix things, we'd still have the mystery of why the two versions of psql behave differently.
--
Michael Fuhr
Sorry to re-reply, but I had a much simpler idea. From the client machine that is slow to connect, type the command "nslookup hostname1" and see if it takes 15 seconds. If it does, DNS is the problem.
Rick
nslookup isn't the easiest tool for use in diagnosing dns problems as it goes through the whole messy nsswitch process, and doesn't readily isolate much of anything.
the dig command focuses on dns only, skips nsswitch altogether, and lets you rule dns problems in or out in one swell foop. if dig is fast and nslookup is slow, then you need to examine /etc/nsswitch.conf for foulups.
richard
Your assessment is correct ... the same version of
psql behaves the same on different machines, and different
versions of psql behave differently on the same machine.
The difference must have to do with the functions that differ in the different versions of psql. In looking through the code for version 8.0 in the file /interfaces/libpq/ip.c, the function that resolves hostname is "getaddrinfo". Is this the same function that was used in version 7.2, and if not, how does it differ? Is there something on my machine that I can configure?
Both dig and nslookup are fast on all machines. 'psql' is fast on all machines, as long as I am using the version compiled with version 7.2. It is only 'psql' compiled with version 8.0 that is slow. I don't think DNS is the problem, but rather the way psql in version 8.0 attempts to get the DNS info. My Linux kernel version is 2.4.18.
On Thu, Aug 04, 2005 at 04:01:43PM -0500, Lowell.Hought@faa.gov wrote:
> I also performed the trace you suggested. The results are the same until
> this point, where the time for
> version 8.0 totals 0.025960 and for
> version 7.2 totals 0.009481
Those differences probably don't matter, but what comes next does. The 7.2 trace shows a DNS query to 10.32.104.5 for a name that begins with zmpweb5.dms.ats.agl (the strace output is truncated after that). The DNS server responds with a packet of 142 bytes, after which the process makes a TCP connection to 10.32.104.110:5432, which is presumably the database server.
The 8.0 trace is different: it appears to make the same DNS query to 10.32.104.5, but the response it receives is only 98 bytes (was it in fact the same query?). The process then makes a DNS query to 10.32.104.5 for just zmpweb5, and that query times out after 5 seconds. Then the process sends a query for zmpweb5 to 172.17.46.46, which refuses the connection, possibly because no DNS server is running on that machine. We then see a query for zmpweb5 to 172.17.40.42, and that query times out after 6 seconds. Then another query for zmpweb5 to 10.32.104.5 and a 5-second timeout, a query for zmpweb5 to 172.17.46.46 and a refused connection, and a query for zmpweb5 to 172.17.40.42 and a 6-second timeout. We then see the process read /etc/hosts, but afterwards it makes another DNS query to 10.32.104.5 for zmpweb5.dms.ats.agl.<truncated>, and this time we see a 142-byte response, as 7.2 had received on its first attempt. Finally we see a TCP connection to 10.32.104.110:5432.
So why does 8.0 receive a 98-byte response to its first DNS query when 7.2 received a 142-byte response? We can tell a little something about the responses by looking at the data in the strace output, with the help of RFC 1035 Section 4.1.1. In octal, the DNS response headers are:
7.2 \260\5\205\200\0\1\0\1\0\2\0\2
8.0 \30\310\205\200\0\1\0\0\0\1\0\0
The response to 7.2 has an ANCOUNT (number of records in the answer section) of 1 and an NSCOUNT (number of records in the authority section) of 2, whereas the response to 8.0 has an ANCOUNT of 0 and an NSCOUNT of 1. That disparity is odd if the DNS queries were indeed the same.
A few DNS queries with dig might show what's happening, and some sniffer output of the DNS queries that psql makes might also be enlightening. Something like the following ought to do the trick:
tcpdump -s526 -n -vv udp and port 53
The -s526 option tells tcpdump to grab enough data for the largest possible UDP DNS packet (512 octets) plus a bit extra for the layer 2 header. It might be interesting to see the tcpdump output for psql 7.2's DNS queries and then 8.0's DNS queries (or use ethereal/tethereal or another sniffer if you prefer, as long as we can see as much of the DNS packets as possible).
BTW, some resolver libraries can be configured not to attempt DNS queries for just "hostname" when "hostname.subdomain.domain" fails. I seldom find such queries useful and I do occasionally find them problematic, so if my resolver has such an option then I usually enable it (e.g., "options no_tld_query" in /etc/resolv.conf on FreeBSD).
--
Michael Fuhr
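Editor's note: the header decoding Michael does by hand can be reproduced mechanically. The sketch below unpacks the fixed 12-byte DNS header laid out in RFC 1035 Section 4.1.1, using the two byte strings exactly as quoted from the strace output; the function name and the choice of which flag bits to report are illustrative.

```python
import struct

def parse_dns_header(header):
    """Unpack the fixed 12-byte DNS header (RFC 1035, section 4.1.1)."""
    # Six big-endian 16-bit fields: ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT.
    ident, flags, qd, an, ns, ar = struct.unpack("!6H", header[:12])
    return {
        "id": ident,
        "qr": flags >> 15,          # 1 = this is a response
        "aa": (flags >> 10) & 1,    # 1 = authoritative answer
        "rcode": flags & 0xF,       # 0 = no error
        "qdcount": qd,
        "ancount": an,
        "nscount": ns,
        "arcount": ar,
    }

# Header bytes as quoted from the strace output (octal escapes).
resp_72 = b"\260\5\205\200\0\1\0\1\0\2\0\2"
resp_80 = b"\30\310\205\200\0\1\0\0\0\1\0\0"

for label, raw in (("7.2", resp_72), ("8.0", resp_80)):
    print(label, parse_dns_header(raw))
```

Both headers carry the same flags word and a response code of 0, so the 8.0 query was answered without error; it simply came back with no answer records.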
On Thu, Aug 04, 2005 at 04:39:02PM -0500, Lowell.Hought@faa.gov wrote:
> The difference must have to do with the functions that differ in the different versions of psql. In looking through the code for version 8.0 in the file /interfaces/libpq/ip.c, the function that resolves hostname is "getaddrinfo". Is this the same function that was used in version 7.2, and if not, how does it differ? Is there something on my machine that I can configure?
Good catch -- the use of getaddrinfo() appears to have been added in 7.4. I see calls to inet_aton() and gethostbyname() in earlier versions, so maybe that explains the difference. A simple test program should be able to confirm or refute that hypothesis. The tcpdump output I suggested in another message should show exactly what queries are being made and what responses are being received.
Different systems have different resolver customizations; you'll have to check your local documentation. I'd start with "man resolv.conf". I'd especially look for options that control if and when queries for the top-level domain "hostname" are made when queries for "hostname.domain" fail. You might also want to examine your domain search list.
--
Michael Fuhr
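Editor's note: when examining the name servers, search list, and options Michael mentions, it can help to pull them out of resolv.conf programmatically. This is a minimal sketch under simplifying assumptions: it handles only the nameserver, search, and options keywords, and the sample content is hypothetical (documentation addresses and domains), not the poster's real configuration.

```python
def parse_resolv_conf(text):
    """Collect nameserver, search, and options entries from resolv.conf text."""
    conf = {"nameserver": [], "search": [], "options": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":      # blank line or comment
            continue
        keyword, *values = line.split()
        if keyword == "nameserver" and values:
            conf["nameserver"].append(values[0])
        elif keyword == "search":
            conf["search"] = values          # assumption: the last search line wins
        elif keyword == "options":
            conf["options"].extend(values)
    return conf

# Hypothetical example content, for illustration only.
sample_resolv = """\
# hypothetical example
search dept.example.org example.org
nameserver 192.0.2.53
nameserver 198.51.100.53
options timeout:2 attempts:1
"""
print(parse_resolv_conf(sample_resolv))
```

The order of the nameserver entries matters here, because the resolver tries them in sequence and waits for each one that does not answer.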
On Thu, Aug 04, 2005 at 04:30:52PM -0600, Michael Fuhr wrote:
> The response to 7.2 has an ANCOUNT (number of records in the answer section) of 1 and an NSCOUNT (number of records in the authority section) of 2, whereas the response to 8.0 has an ANCOUNT of 0 and an NSCOUNT of 1. That disparity is odd if the DNS queries were indeed the same.
I wonder if the use of getaddrinfo() in 8.0 is causing the first DNS query to be for an AAAA record instead of for an A record. The connectDBStart() function in src/interfaces/libpq/fe-connect.c sets hint.ai_family = AF_UNSPEC, which on some systems might cause the resolver to try an AAAA query first. That would explain the above disparity: the AAAA query would return a response code of NOERROR, no records in the answer section, and the zone's SOA record in the authority section (at least that's how BIND 9 responds). The resolver then makes AAAA queries for the unqualified name (i.e., the name as a top-level domain) and those queries time out; finally it makes A queries for the fully-qualified name and we get success. This is exactly what the strace output appears to show. A packet sniff should be able to confirm or refute.
Anybody know if AAAA queries can be disabled in Linux? Lowell, if nobody answers here then you might need to seek help in a different forum. Or you could just hack the code and change AF_UNSPEC to AF_INET ;-)
--
Michael Fuhr
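Editor's note: the AF_UNSPEC versus AF_INET hypothesis can be tested outside psql. Python's socket.getaddrinfo() wraps the platform's getaddrinfo(), so timing it with each address-family hint gives a rough stand-in for what libpq asks the resolver to do. This is a sketch, not libpq's code; the function names are illustrative, and the default port 5432 is only used to fill in the service argument.

```python
import socket
import time

def timed_getaddrinfo(host, port, family):
    """Resolve host with the given address-family hint; return (addresses, seconds)."""
    start = time.monotonic()
    infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    elapsed = time.monotonic() - start
    # Each entry is (family, type, proto, canonname, sockaddr); keep the address.
    addresses = sorted({info[4][0] for info in infos})
    return addresses, elapsed

def compare_families(host, port=5432):
    """Time the lookup with AF_UNSPEC and with AF_INET for comparison."""
    results = {}
    for name, family in (("AF_UNSPEC", socket.AF_UNSPEC),
                         ("AF_INET", socket.AF_INET)):
        results[name] = timed_getaddrinfo(host, port, family)
    return results
```

Running `compare_families("machine1")` on the affected client and seeing a slow AF_UNSPEC lookup next to a fast AF_INET lookup would support the idea that the extra AAAA queries account for the delay.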
On Thu, Aug 04, 2005 at 06:29:46PM -0600, Michael Fuhr wrote:
> Anybody know if AAAA queries can be disabled in Linux? Lowell, if nobody answers here then you might need to seek help in a different forum. Or you could just hack the code and change AF_UNSPEC to AF_INET ;-)
Lowell, aside from trying to disable AAAA queries altogether, you might want to investigate why those top-level domain queries are timing out. Those queries should fail fairly quickly -- is your connectivity to the root DNS servers poor or non-existent? But that's getting off-topic for this list....
--
Michael Fuhr
On Aug 4, 2005, at 2:39 PM, Lowell.Hought@faa.gov wrote:
> Both dig and nslookup are fast on all machines. 'psql' is fast on all machines, as long as I am using the version compiled with version 7.2. It is only 'psql' compiled with version 8.0 that is slow. I don't think DNS is the problem, but rather the way psql in version 8.0 attempts to get the DNS info. My Linux kernel version is 2.4.18.
Silly question.
Could the version of psql from 8.0 be linked against readline, and it's reading in and storing in memory some of the information it needs to have cached in order to provide the tab-completion feature? And, 7.2 is not?
I've seen some delays with both mysql and pgsql when readline libraries are involved on databases with lots of tables and fields in them.
Just a thought.
Greg
Hi,
On Thursday 04 August 2005 17:13, Lowell.Hought@faa.gov wrote:
| When I use the psql command line interface with a -h hostname, the connection time from 7.2 is instant while the connection time from 8.0 is 15 seconds.
I've once seen name service and connection delays caused by improperly configured IPv6 support on some Linux machines. Removing the responsible modules from the kernel fixed it. Just another guess though :-)
Ciao, Thomas
--
Dr. Thomas Pundt <thomas.pundt@rp-online.de> ---- http://rp-online.de/ ----
How might I check for that? And if it is determined to be a problem, how would I remove the guilty modules?
You are correct in that 8.0 is doing an AAAA request first. I am running Red Hat version 8.0. The difference in the way 7.2 and 8.0 resolve the host option has to be because of the change from gethostbyname to getaddrinfo. Is there some way I can force my machine to do an A search before an AAAA search?
Here is the output from the tcpdump you suggested for 7.2:
--------------------------------------------------------------------------------------------------------------------
14:50:37.679429 10.32.104.97.32777 > 10.32.104.5.domain: [udp sum ok] 9750+ A? zmpweb5.dms.ats.agl.faa.gov. [|domain] (DF) (ttl 64, id 23879, len 73)
14:50:37.680131 10.32.104.5.domain > 10.32.104.97.32777: [udp sum ok] 9750* q: A? zmpweb5.dms.ats.agl.faa.gov. 1/2/2 zmpweb5.dms.ats.agl.faa.gov. A 10.32.104.110 ns: dms.ats.agl.faa.gov. NS agldmszmps1.dms.ats.agl.faa.gov., dms.ats.agl.faa.gov. NS agldmss3.dms.ats.agl.faa.gov. ar: agldmss3.dms.ats.agl.faa.gov. A 10.32.104.3, agldmszmps1.dms.ats.agl.faa.gov. A 10.32.104.5 (142) (ttl 128, id 33877, len 170)
--------------------------------------------------------------------------------------------------------------------
Here is the output from 8.0:
--------------------------------------------------------------------------------------------------------------------
14:50:03.736903 10.32.104.97.32777 > 10.32.104.5.domain: [udp sum ok] 18412+ AAAA? zmpweb5.dms.ats.agl.faa.gov. [|domain] (DF) (ttl 64, id 6499, len 73)
14:50:03.737652 10.32.104.5.domain > 10.32.104.97.32777: [udp sum ok] 18412* q: AAAA? zmpweb5.dms.ats.agl.faa.gov. 0/1/0 ns: dms.ats.agl.faa.gov. SOA agldmszmps1.dms.ats.agl.faa.gov. root.dms.ats.agl.faa.gov. 2001145122 10800 3600 43200 7200 (98) (ttl 128, id 44115, len 126)
14:50:03.737822 10.32.104.97.32777 > 10.32.104.5.domain: [udp sum ok] 18413+ AAAA? zmpweb5. [|domain] (DF) (ttl 64, id 6500, len 53)
14:50:08.738756 10.32.104.97.32777 > 10.32.104.5.domain: [udp sum ok] 18413+ AAAA? zmpweb5. [|domain] (DF) (ttl 64, id 6501, len 53)
14:50:10.686497 10.32.104.5.domain > 10.32.104.97.32777: [udp sum ok] 21278 ServFail q: AAAA? zmpweb5. 0/0/0 (25) (ttl 128, id 7764, len 53)
14:50:10.686617 10.32.104.5.domain > 10.32.104.97.32777: [udp sum ok] 21278 ServFail q: AAAA? zmpweb5. 0/0/0 (25) (ttl 128, id 8020, len 53)
14:50:10.686622 10.32.104.5.domain > 10.32.104.97.32777: [udp sum ok] 18413 ServFail q: AAAA? zmpweb5. 0/0/0 (25) (ttl 128, id 8276, len 53)
14:50:10.686676 10.32.104.5.domain > 10.32.104.97.32777: [udp sum ok] 18413 ServFail q: AAAA? zmpweb5. 0/0/0 (25) (ttl 128, id 8532, len 53)
14:50:10.687162 10.32.104.97.32777 > 10.32.104.5.domain: [udp sum ok] 18414+ A? zmpweb5.dms.ats.agl.faa.gov. [|domain] (DF) (ttl 64, id 10058, len 73)
14:50:10.688109 10.32.104.5.domain > 10.32.104.97.32777: [udp sum ok] 18414* q: A? zmpweb5.dms.ats.agl.faa.gov. 1/2/2 zmpweb5.dms.ats.agl.faa.gov. A 10.32.104.110 ns: dms.ats.agl.faa.gov. NS agldmss3.dms.ats.agl.faa.gov., dms.ats.agl.faa.gov. NS agldmszmps1.dms.ats.agl.faa.gov. ar: agldmss3.dms.ats.agl.faa.gov. A 10.32.104.3, agldmszmps1.dms.ats.agl.faa.gov. A 10.32.104.5 (142) (ttl 128, id 8788, len 170)
-----------------------------------------------------------------------------------------------------------------------
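Editor's note: to see where the time goes in the 8.0 capture above, the gaps between consecutive packets can be computed from the tcpdump timestamps. This is a small sketch; the timestamps are copied from the capture, and the function name is illustrative.

```python
from datetime import datetime

def packet_gaps(timestamps):
    """Return the seconds elapsed between consecutive tcpdump timestamps."""
    times = [datetime.strptime(ts, "%H:%M:%S.%f") for ts in timestamps]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]

# Timestamps copied from the 8.0 capture, in order.
stamps_80 = [
    "14:50:03.736903", "14:50:03.737652", "14:50:03.737822",
    "14:50:08.738756", "14:50:10.686497", "14:50:10.686617",
    "14:50:10.686622", "14:50:10.686676", "14:50:10.687162",
    "14:50:10.688109",
]

gaps = packet_gaps(stamps_80)
for stamp, gap in zip(stamps_80[1:], gaps):
    print(f"{stamp}  +{gap:.6f} s")
print(f"total: {sum(gaps):.6f} s")
```

In this capture nearly all of the elapsed time falls between the first AAAA query for the unqualified name "zmpweb5." and the ServFail responses to it; the final A query for the fully-qualified name is answered within about a millisecond.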
Michael Fuhr <mike@fuhr.org> Sent by: pgsql-general-owner@postgresql.org 08/04/2005 05:30 PM |
|
On Thu, Aug 04, 2005 at 04:01:43PM -0500, Lowell.Hought@faa.gov wrote:
> I also performed the trace you suggested. The results are the same until
> this point, where the time for
> version 8.0 totals 0.025960 and for
> version 7.2 totals 0.009481
Those differences probably don't matter, but what comes next does.
The 7.2 trace shows a DNS query to 10.32.104.5 for a name that
begins with zmpweb5.dms.ats.agl (the strace output is truncated
after that). The DNS server responds with a packet of 142 bytes,
after which the process makes a TCP connection to 10.32.104.110:5432,
which is presumably the database server.
The 8.0 trace is different: it appears to make the same DNS query
to 10.32.104.5, but the response it receives is only 98 bytes (was
it in fact the same query?). The process then makes a DNS query
to 10.32.104.5 for just zmpweb5, and that query times out after 5
seconds. Then the process sends a query for zmpweb5 to 172.17.46.46,
which refuses the connection, possibly because no DNS server is
running on that machine. We then see a query for zmpweb5 to
172.17.40.42, and that query times out after 6 seconds. Then another
query for zmpweb5 to 10.32.104.5 and a 5-second timeout, a query
for zmpweb5 to 172.17.46.46 and a refused connection, and a query
for zmpweb5 to 172.17.40.42 and a 6-second timeout. We then see
the process read /etc/hosts, but afterwards it makes another DNS
query to 10.32.104.5 for zmpweb5.dms.ats.agl.<truncated>, and this
time we see a 142-byte response, as 7.2 had received on its first
attempt. Finally we see a TCP connection to 10.32.104.110:5432.
So why does 8.0 receive a 98-byte response to its first DNS query
when 7.2 received a 142-byte response? We can tell a little something
about the responses by looking at the data in the strace output,
with the help of RFC 1035 Section 4.1.1. In octal, the DNS response
headers are:
7.2 \260\5\205\200\0\1\0\1\0\2\0\2
8.0 \30\310\205\200\0\1\0\0\0\1\0\0
The response to 7.2 has an ANCOUNT (number of records in the answer
section) of 1 and an NSCOUNT (number of records in the authority
section) of 2, whereas the response to 8.0 has an ANCOUNT of 0 and
an NSCOUNT of 1. That disparity is odd if the DNS queries were
indeed the same.
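
The header comparison above can be reproduced mechanically. The following Python sketch unpacks the twelve header octets quoted from the strace output according to the RFC 1035 Section 4.1.1 layout. The function and constant names are made up for illustration and are not part of any tool used in this thread.

```python
import struct

# The twelve header octets quoted from the strace output (octal escapes).
RESPONSE_72 = b"\260\5\205\200\0\1\0\1\0\2\0\2"
RESPONSE_80 = b"\30\310\205\200\0\1\0\0\0\1\0\0"


def parse_dns_header(header: bytes) -> dict:
    """Decode a 12-byte DNS header (RFC 1035, Section 4.1.1)."""
    if len(header) != 12:
        raise ValueError("a DNS header is exactly 12 octets")
    # Six unsigned 16-bit fields in network byte order.
    ident, flags, qdcount, ancount, nscount, arcount = struct.unpack("!6H", header)
    return {
        "id": ident,
        "qr": flags >> 15,        # 1 means this is a response
        "aa": (flags >> 10) & 1,  # authoritative answer bit
        "rcode": flags & 0xF,     # 0 means no error
        "qdcount": qdcount,
        "ancount": ancount,
        "nscount": nscount,
        "arcount": arcount,
    }


if __name__ == "__main__":
    for label, raw in (("7.2", RESPONSE_72), ("8.0", RESPONSE_80)):
        print(label, parse_dns_header(raw))
```

Both headers decode to RCODE 0 (no error) and differ only in the ID and the section counts. A no-error response with an empty answer section is what a server sends when the name exists but has no record of the requested type, so the two queries may well have asked for different record types.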
A few DNS queries with dig might show what's happening, and some
sniffer output of the DNS queries that psql makes might also be
enlightening. Something like the following ought to do the trick:
tcpdump -s526 -n -vv udp and port 53
The -s526 option tells tcpdump to grab enough data for the largest
possible UDP DNS packet (512 octets) plus a bit extra for the layer 2
header. It might be interesting to see the tcpdump output for psql
7.2's DNS queries and then 8.0's DNS queries (or use ethereal/tethereal
or another sniffer if you prefer, as long as we can see as much of the
DNS packets as possible).
BTW, some resolver libraries can be configured not to attempt DNS
queries for just "hostname" when "hostname.subdomain.domain" fails.
I seldom find such queries useful and I do occasionally find them
problematic, so if my resolver has such an option then I usually
enable it (e.g., "options no_tld_query" in /etc/resolv.conf on
FreeBSD).
--
Michael Fuhr

Hi,

On Friday 05 August 2005 16:21 Lowell.Hought@faa.gov wrote:
| How might I check for that?

If it's a standard distribution kernel, try "lsmod | grep ipv6"; this
will show you whether the IPv6 module is loaded. Try to remove the
module by issuing "rmmod ipv6". If that fails, you probably have to
edit your /etc/modprobe.conf or /etc/modprobe.d/aliases file, comment
out the ipv6 module entry and reboot the machine. Then repeat your
test.

But again, I'm just guessing in the wild.

Ciao,
Thomas
--
Thomas Pundt ------- http://www.pundt.de/ E-Mail: thomas@pundt.de

Lowell.Hought@faa.gov writes:
> You are correct in that 8.0 is doing a AAAA request first. I am running
> Red Hat version 8.0. The difference in the way 7.2 and 8.0 resolve the
> host option has to be because of the change from gethostbyname to
> getaddrinfo. Is there some way I can force my machine to do an A search
> before a AAAA search?

On a recent RH system, "man 5 resolver" suggests that putting "options
inet6" into /etc/resolv.conf is what makes this happen ... if there is
such an entry on your system, try removing it. RH 8.0 is a good ways
back though, so read the local version of that man page before doing
anything with that config file.

I concur with Michael's previous suggestion that the best answer is to
fix the clearly-broken DNS environment you're dealing with. It is no
longer acceptable for anyone to be running nameservers that have not
heard of IPv6 --- unless it's for a network that only contains clients
that have not heard of IPv6, which yours evidently is not. Have a word
with your local network admin.

regards, tom lane

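To check whether the address family requested from getaddrinfo() accounts for the delay, the two cases can be timed side by side. This is a small Python sketch, not the code psql uses; timed_lookup and the placeholder hostname are assumptions for illustration. With AF_UNSPEC the resolver is free to ask for AAAA records as well, while AF_INET restricts the lookup to IPv4, which is closer to what gethostbyname() does by default.

```python
import socket
import time


def timed_lookup(host: str, port: int, family: int = socket.AF_UNSPEC):
    """Resolve host via getaddrinfo() and return (seconds, sorted addresses)."""
    start = time.monotonic()
    infos = socket.getaddrinfo(host, port, family=family, type=socket.SOCK_STREAM)
    elapsed = time.monotonic() - start
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the textual address for both IPv4 and IPv6.
    addresses = sorted({info[4][0] for info in infos})
    return elapsed, addresses


if __name__ == "__main__":
    host = "localhost"  # placeholder; substitute the name given to psql -h
    for label, family in (("AF_UNSPEC", socket.AF_UNSPEC),
                          ("AF_INET", socket.AF_INET)):
        try:
            elapsed, addresses = timed_lookup(host, 5432, family)
            print(f"{label}: {elapsed:.3f}s {addresses}")
        except socket.gaierror as exc:
            print(f"{label}: lookup failed: {exc}")
```

If the AF_UNSPEC lookup is slow and the AF_INET lookup is fast for the same name, the extra time is probably being spent on the AAAA queries rather than inside PostgreSQL.
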
On Sat, Aug 06, 2005 at 12:38:50AM -0400, Tom Lane wrote:
> Lowell.Hought@faa.gov writes:
> > You are correct in that 8.0 is doing a AAAA request first. I am running
> > Red Hat version 8.0. The difference in the way 7.2 and 8.0 resolve the
> > host option has to be because of the change from gethostbyname to
> > getaddrinfo. Is there some way I can force my machine to do an A search
> > before a AAAA search?
>
> On a recent RH system, "man 5 resolver" suggests that putting "options
> inet6" into /etc/resolv.conf is what makes this happen ... if there is
> such an entry on your system, try removing it. RH 8.0 is a good ways
> back though, so read the local version of that man page before doing
> anything with that config file.

Hmmm...I have unprivileged access to a RH 7.3 box and I see the
"inet6" option in its resolver(5) manual page, but /etc/resolv.conf
doesn't have that option. Yet a test program that calls getaddrinfo()
with hints.ai_family = AF_UNSPEC nevertheless tries AAAA queries
first (I can't run a sniffer on that box, so I tweaked the test
program's _res structure to send DNS queries to a server that I can
sniff). The resolver algorithm for an unqualified hostname is:

1. AAAA query for hostname.domain (for each domain in the search list).
2. AAAA query for hostname (i.e., as a top-level domain).
3. A query for hostname.domain.
4. A query for hostname.

Lowell's sniffer output shows this algorithm in action. The (1)
query returns zero answers, so we proceed to the (2) query. Here
we see a retry due to a timeout and eventually the DNS server
responds with SERVFAIL (see later comments on this). Then we proceed
to (3) and finally get an answer.

Thomas Pundt suggested running "lsmod | grep ipv6" and disabling
the ipv6 module if it's not needed. On the RH 7.3 box I have access
to, lsmod shows nothing like "ipv6", "ip6", "inet6", etc.

So, /etc/resolv.conf doesn't have an "inet6" option and the kernel
doesn't appear to have an IPv6 module, and yet getaddrinfo() still
makes AAAA queries. Does anybody know if this behavior can be
disabled on Linux if the box doesn't use IPv6?

The (2) and (4) queries above (the queries for the hostname as a
top-level domain) are also a nuisance. On FreeBSD those can be
disabled with the "no_tld_query" option in /etc/resolv.conf, but a
glance through Linux's resolver(5) manual page doesn't show any
such option. Can these queries be disabled on Linux? (This is
becoming a Linux configuration thread, so these questions might
need to be asked elsewhere.)

> I concur with Michael's previous suggestion that the best answer
> is to fix the clearly-broken DNS environment you're dealing with.
> It is no longer acceptable for anyone to be running nameservers
> that have not heard of IPv6 --- unless it's for a network that
> only contains clients that have not heard of IPv6, which yours
> evidently is not. Have a word with your local network admin.

Something Wrong does appear to be happening with this site's DNS.
The top-level domain AAAA queries should fail fairly quickly with
NXDOMAIN after the query goes to a root DNS server that responds
with "sorry, ain't no such name," yet the DNS server takes several
seconds to respond at all, and when it does it responds with
SERVFAIL. That's why I was wondering about connectivity problems
to the roots.

In summary, several things would be desirable:

1. Disable AAAA queries if the box doesn't use IPv6.
2. Disable top-level domain queries in the resolver search algorithm
   when looking up an unqualified hostname.
3. Fix the DNS servers so that if top-level domain queries for
   hostnames are made, responses are made quickly instead of taking
   so long and failing with SERVFAIL.

Lowell, you'll probably have to look elsewhere for solutions to
these problems, as they're not PostgreSQL-specific.

--
Michael Fuhr
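
For reference, the four-step lookup order described in this message can be written out as a short Python sketch. It only models the order of the queries as observed; it is not the resolver library's code. The function name and the example search domain are hypothetical, and applying the search list in step 3 the same way as in step 1 is an assumption.

```python
def query_order(hostname, search_domains):
    """Return (record type, name) pairs in the order the resolver was seen to try them."""
    queries = []
    for rrtype in ("AAAA", "A"):
        # First the name qualified with each domain in the search list ...
        for domain in search_domains:
            queries.append((rrtype, f"{hostname}.{domain}"))
        # ... then the bare name, treated as a top-level domain.
        queries.append((rrtype, hostname))
    return queries


if __name__ == "__main__":
    # "example.com" is a made-up search domain for illustration.
    for rrtype, name in query_order("zmpweb5", ["example.com"]):
        print(rrtype, name)
```

Laid out this way, it is easy to see that the only query that can succeed on an IPv4-only network is the third one, so every delay in the first two is paid before the connection starts.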