Обсуждение: AIX 4.3 getaddrinfo busted
AIX 4.3 was released in late 1999, so I thought it was worth mentioning. I only have AIX 4.3 and 6.1, so I have no ideahow AIX 5 handles this. AIX 6.1 works fine. Anyways, the service argument to getaddrinfo is busted on AIX 4.3, thus src/backend/libpq/ip.c pg_getaddrinfo_all() is bustedon this platform. It fails with EAI_NODATA "Host not found". If this argument is left NULL, everything works. I can supply a patch to fix this. My suggestion would be to always supply a NULL service to getaddrinfo. After a successful call, set the port if it was provided ... htons((unsigned short)atoi(servname)). This approach avoids a configure check. -- Andrew Chernow eSilo, LLC every bit counts http://www.esilo.com/
Andrew Chernow wrote: > AIX 4.3 was released in late 1999, so I thought it was worth mentioning. > I only have AIX 4.3 and 6.1, so I have no idea how AIX 5 handles this. > AIX 6.1 works fine. > > Anyways, the service argument to getaddrinfo is busted on AIX 4.3, thus > src/backend/libpq/ip.c pg_getaddrinfo_all() is busted on this > platform. It fails with EAI_NODATA "Host not found". If this argument > is left NULL, everything works. > > I can supply a patch to fix this. My suggestion would be to always > supply a NULL service to getaddrinfo. After a successful call, set the > port if it was provided ... htons((unsigned short)atoi(servname)). This > approach avoids a configure check. Why would we risk breaking other platforms to avoid an AIX bug? At best we can put a code comment in that section of the code. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Fri, Jan 23, 2009 at 9:18 AM, Andrew Chernow <ac@esilo.com> wrote: > AIX 4.3 was released in late 1999, so I thought it was worth mentioning. I > only have AIX 4.3 and 6.1, so I have no idea how AIX 5 handles this. AIX > 6.1 works fine. > > Anyways, the service argument to getaddrinfo is busted on AIX 4.3, thus > src/backend/libpq/ip.c pg_getaddrinfo_all() is busted on this platform. It > fails with EAI_NODATA "Host not found". If this argument is left NULL, > everything works. > > I can supply a patch to fix this. My suggestion would be to always supply a > NULL service to getaddrinfo. After a successful call, set the port if it > was provided ... htons((unsigned short)atoi(servname)). This approach > avoids a configure check. FYI: There are AIX 5.3 nodes on BuildFarm - if the change is a regression, it will be noticed :-). -- http://linuxfinances.info/info/linuxdistributions.html George Burns - "You can't help getting older, but you don't have to get old."
Bruce Momjian wrote: > Andrew Chernow wrote: >> AIX 4.3 was released in late 1999, so I thought it was worth mentioning. >> I only have AIX 4.3 and 6.1, so I have no idea how AIX 5 handles this. >> AIX 6.1 works fine. >> >> Anyways, the service argument to getaddrinfo is busted on AIX 4.3, thus >> src/backend/libpq/ip.c pg_getaddrinfo_all() is busted on this >> platform. It fails with EAI_NODATA "Host not found". If this argument >> is left NULL, everything works. >> >> I can supply a patch to fix this. My suggestion would be to always >> supply a NULL service to getaddrinfo. After a successful call, set the >> port if it was provided ... htons((unsigned short)atoi(servname)). This >> approach avoids a configure check. > > Why would we risk breaking other platforms to avoid an AIX bug? At best > we can put a code comment in that section of the code. > IMO, there is no risk. getaddrinfo allows a NULL second argument on every platform I have worked with. I can't see this breaking anything. Our internal libraries do it this way for this exactreason, and it works on a large number of platforms. But, it can be done so that it only affects AIX machines. There is already a #ifdef _AIX block in this function so we can handle all AIX versions as I have suggested and let everyone else run the same old code. #ifdef _AIX getaddrinfo(hostname, NULL /* servname */, ....); // manually set the port #else getaddrinfo(hostname, servname, ....); /* same code as now */ #endif -- Andrew Chernow eSilo, LLC every bit counts http://www.esilo.com/
Christopher Browne wrote: > FYI: There are AIX 5.3 nodes on BuildFarm - if the change is a > regression, it will be noticed :-). > This confirms that its an isolated AIX 4.3 bug. -- Andrew Chernow eSilo, LLC every bit counts http://www.esilo.com/
Andrew Chernow wrote: > Bruce Momjian wrote: > > Andrew Chernow wrote: > >> AIX 4.3 was released in late 1999, so I thought it was worth mentioning. > >> I only have AIX 4.3 and 6.1, so I have no idea how AIX 5 handles this. > >> AIX 6.1 works fine. > >> > >> Anyways, the service argument to getaddrinfo is busted on AIX 4.3, thus > >> src/backend/libpq/ip.c pg_getaddrinfo_all() is busted on this > >> platform. It fails with EAI_NODATA "Host not found". If this argument > >> is left NULL, everything works. > >> > >> I can supply a patch to fix this. My suggestion would be to always > >> supply a NULL service to getaddrinfo. After a successful call, set the > >> port if it was provided ... htons((unsigned short)atoi(servname)). This > >> approach avoids a configure check. > > > > Why would we risk breaking other platforms to avoid an AIX bug? At best > > we can put a code comment in that section of the code. > > > > IMO, there is no risk. getaddrinfo allows a NULL second argument on > every platform I have worked with. I can't see this breaking anything. > Our internal libraries do it this way for this exact reason, and it > works on a large number of platforms. > > But, it can be done so that it only affects AIX machines. There is > already a #ifdef _AIX block in this function so we can handle all AIX > versions as I have suggested and let everyone else run the same old code. > > #ifdef _AIX > getaddrinfo(hostname, NULL /* servname */, ....); > // manually set the port > #else > getaddrinfo(hostname, servname, ....); /* same code as now */ > #endif Well, the platform is from 1999 and we have never had bug reports on this, even though the code has been there for quite a few years; I just don't see the motivation for the change. If you really want this platform to work, I would submit a patch that tests for a C compiler symbol or #define that is only defined for that platform and make service = null in that case. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Andrew Chernow wrote: > Christopher Browne wrote: > > FYI: There are AIX 5.3 nodes on BuildFarm - if the change is a > > regression, it will be noticed :-). > > > > This confirms that its an isolated AIX 4.3 bug. It confirms that the bug exists in 4.3 but not on 5.3; not sure how you can make assumptions about intermediate releases. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote: > > If you really want this platform to work, I would submit a patch that > tests for a C compiler symbol or #define that is only defined for that > platform and make service = null in that case. > I am not aware of such an animal. I looked at the output of " touch x.c && gcc -v -dM -E x.c" but didn't find anything when compared to AIX 6.1. We could set one in configure (like _AIXVER43)or use _AIX + uname(). How about checking for the failure case, where getaddrinfo returns EAI_NODATA on _AIX with a non-NULL servname. If that happens, call getaddrinfo again with a NULL servname. -- Andrew Chernow eSilo, LLC every bit counts http://www.esilo.com/
Andrew Chernow <ac@esilo.com> writes: > Bruce Momjian wrote: >> Why would we risk breaking other platforms to avoid an AIX bug? At best >> we can put a code comment in that section of the code. > IMO, there is no risk. getaddrinfo allows a NULL second argument on > every platform I have worked with. The portability risk is in the "manually set the port" part. regards, tom lane
Tom Lane wrote: > Andrew Chernow <ac@esilo.com> writes: >> Bruce Momjian wrote: >>> Why would we risk breaking other platforms to avoid an AIX bug? At best >>> we can put a code comment in that section of the code. > >> IMO, there is no risk. getaddrinfo allows a NULL second argument on >> every platform I have worked with. > > The portability risk is in the "manually set the port" part. > Right. If this method is limited to _AIX and only when the failure case occurs, there are no portability issues. I've already confirmed my fix works on 4.3 and 6.1. The buildfarm will tell us if 5.3 works (very likely it will). -- Andrew Chernow eSilo, LLC every bit counts http://www.esilo.com/
Andrew Chernow <ac@esilo.com> writes: > Tom Lane wrote: >> The portability risk is in the "manually set the port" part. > Right. If this method is limited to _AIX and only when the failure case > occurs, there are no portability issues. That seems like unnecessary complexity (which carries its own risks). I thought this sketch was about right: #ifdef _AIX getaddrinfo(hostname, NULL /* servname */, ....); // manually set the port #else getaddrinfo(hostname, servname, ....); /* same code as now */ #endif although please flip it around to ifndef. The simpler and more general case should usually be first to aid the reader. regards, tom lane
BTW, what about the comments in ip.c to the effect that some versions of AIX fail when getaddrinfo's second argument *is* null? Right at the moment I'm wondering why we are going to change the code now to support a ten-year-old OS version that evidently no one has tried to use Postgres on before. regards, tom lane
Tom Lane wrote: > BTW, what about the comments in ip.c to the effect that some versions of > AIX fail when getaddrinfo's second argument *is* null? > For starters, it indicates that sin_port is not zero'd properly. That wouldn't matter here since the plan is to manually set the port in this case, since providing it to getaddrinfo is broken. Also, this is why I suggested only doing a NULL 2nd arg within the failure case. rc = getaddrinfo(hostname, service, ....); #ifdef _AIX if(rc == EAI_NODATA && servname && *servname) if(!(rc = getaddrinfo(hostname, NULL, ....))) /* try again */ /* set sin_portor sin6_port */ #endif // code continues as is from here down. Or, set a define in configure indicating aix 4.3 or indicating the 2nd arg to getaddrinfo must be NULL (the former being much easier). I don't think we have to resort to configure checks though. -- Andrew Chernow eSilo, LLC every bit counts http://www.esilo.com/
On 1/23/09, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Right at the moment I'm wondering why we are going to change the code > now to support a ten-year-old OS version that evidently no one has tried > to use Postgres on before. I'd like to address this observation. You may have noticed that eSilo has been contributing a number of patches to Postgres involving legacy systems. Understandably, there is very little overlap between modern versions of Postgres and these aging unixen. However, quite a few of these systems are still in production serving legacy applications. eSilo provides backup software. Based on customer feedback we came up with a list of systems that are under-represented by current backup solutions in the industry . For various reasons, we decided to involve libpq in our backup client and market aggressively to platforms for which there are very few backup options. libpq is quite portable, but there are a few understandable nits that have popped up here and there over time for older systems. We are providing fixes for those nits to the community. merlin
Andrew Chernow <ac@esilo.com> writes: > Tom Lane wrote: >> BTW, what about the comments in ip.c to the effect that some versions of >> AIX fail when getaddrinfo's second argument *is* null? > For starters, it indicates that sin_port is not zero'd properly. That > wouldn't matter here since the plan is to manually set the port in this > case, since providing it to getaddrinfo is broken. Hmm ... so actually we could get *rid* of that special case if we added this one. Okay, I withdraw the complaint. We should simply not rely on getaddrinfo to do anything right at all w.r.t. the port if we are running on AIX. Pass NULL for servname and set the port ourselves in all cases. regards, tom lane
Tom Lane wrote: > Andrew Chernow <ac@esilo.com> writes: >> Tom Lane wrote: >>> BTW, what about the comments in ip.c to the effect that some versions of >>> AIX fail when getaddrinfo's second argument *is* null? > >> For starters, it indicates that sin_port is not zero'd properly. That >> wouldn't matter here since the plan is to manually set the port in this >> case, since providing it to getaddrinfo is broken. > > Hmm ... so actually we could get *rid* of that special case if we added > this one. Okay, I withdraw the complaint. We should simply not rely on > getaddrinfo to do anything right at all w.r.t. the port if we are > running on AIX. Pass NULL for servname and set the port ourselves in > all cases. > > regards, tom lane > Done. -- Andrew Chernow eSilo, LLC every bit counts http://www.esilo.com/ Index: src/backend/libpq/ip.c =================================================================== RCS file: /projects/cvsroot/pgsql/src/backend/libpq/ip.c,v retrieving revision 1.43 diff -C6 -r1.43 ip.c *** src/backend/libpq/ip.c 1 Jan 2009 17:23:42 -0000 1.43 --- src/backend/libpq/ip.c 23 Jan 2009 19:14:07 -0000 *************** *** 71,112 **** #ifdef HAVE_UNIX_SOCKETS if (hintp->ai_family == AF_UNIX) return getaddrinfo_unix(servname, hintp, result); #endif /* NULL has special meaning to getaddrinfo(). */ rc = getaddrinfo((!hostname || hostname[0] == '\0') ? NULL : hostname, servname, hintp, result); ! #ifdef _AIX /* * It seems some versions of AIX's getaddrinfo don't reliably zero ! * sin_port when servname is NULL, so clean up after it. */ ! if (servname == NULL && rc == 0) { struct addrinfo *addr; for (addr = *result; addr; addr = addr->ai_next) { switch (addr->ai_family) { case AF_INET: ! ((struct sockaddr_in *) addr->ai_addr)->sin_port = htons(0); break; #ifdef HAVE_IPV6 case AF_INET6: ! ((struct sockaddr_in6 *) addr->ai_addr)->sin6_port = htons(0); break; #endif } } } ! #endif return rc; } /* --- 71,124 ---- #ifdef HAVE_UNIX_SOCKETS if (hintp->ai_family == AF_UNIX) return getaddrinfo_unix(servname, hintp, result); #endif + #ifndef _AIX /* NULL has special meaning to getaddrinfo(). */ rc = getaddrinfo((!hostname || hostname[0] == '\0') ? NULL : hostname, servname, hintp, result); ! #else ! /* NULL hostname has special meaning to getaddrinfo(). We have to ! * set servname to NULL because some AIX versions, like 4.3, always ! * fail with EAI_NODATA if not NULL. ! */ ! rc = getaddrinfo((!hostname || hostname[0] == '\0') ? NULL : hostname, ! NULL, hintp, result); /* * It seems some versions of AIX's getaddrinfo don't reliably zero ! * sin_port when servname is NULL, so clean up after it. Also, ! * manually set the port when servname is provided. */ ! if(rc == 0) { struct addrinfo *addr; + unsigned short port = 0; + + if(servname && *servname) + port = htons((unsigned short)atoi(servname)); for (addr = *result; addr; addr = addr->ai_next) { switch (addr->ai_family) { case AF_INET: ! ((struct sockaddr_in *) addr->ai_addr)->sin_port = port; break; #ifdef HAVE_IPV6 case AF_INET6: ! ((struct sockaddr_in6 *) addr->ai_addr)->sin6_port = port; break; #endif } } } ! #endif /* !_AIX */ return rc; } /*
Merlin Moncure wrote: > On 1/23/09, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Right at the moment I'm wondering why we are going to change the code >> now to support a ten-year-old OS version that evidently no one has tried >> to use Postgres on before. > > I'd like to address this observation. You may have noticed that eSilo > has been contributing a number of patches to Postgres involving legacy > systems. Understandably, there is very little overlap between modern > versions of Postgres and these aging unixen. However, quite a few of > these systems are still in production serving legacy applications. > > eSilo provides backup software. Based on customer feedback we came up > with a list of systems that are under-represented by current backup > solutions in the industry . For various reasons, we decided to > involve libpq in our backup client and market aggressively to > platforms for which there are very few backup options. libpq is quite > portable, but there are a few understandable nits that have popped up > here and there over time for older systems. We are providing fixes > for those nits to the community. > > merlin > I will add that its not our hobby to find obscure libpq and/or platform bugs on fossil machines. Actually, its a rather horrible and frustrating task. We simply have some unusual requirements for our new product. What we found amazing is how many requests we get for these old systems. We are just painfully filling that need. -- Andrew Chernow eSilo, LLC every bit counts http://www.esilo.com/
Andrew Chernow <ac@esilo.com> writes: > Tom Lane wrote: >> Hmm ... so actually we could get *rid* of that special case if we added >> this one. Okay, I withdraw the complaint. > Done. Were you hoping this would get back-patched, and if so how far? regards, tom lane
Tom Lane wrote: > Andrew Chernow <ac@esilo.com> writes: >> Tom Lane wrote: >>> Hmm ... so actually we could get *rid* of that special case if we added >>> this one. Okay, I withdraw the complaint. > >> Done. > > Were you hoping this would get back-patched, and if so how far? > > No, we don't need a back-patch. We need way too many features in 8.4 ... like the really amazing libpq-events feature :) -- Andrew Chernow eSilo, LLC every bit counts http://www.esilo.com/
Andrew Chernow <ac@esilo.com> writes: > Tom Lane wrote: >> Were you hoping this would get back-patched, and if so how far? > No, we don't need a back-patch. We need way too many features in 8.4 > ... like the really amazing libpq-events feature :) OK, applied to HEAD with some tiny editorialization. regards, tom lane
Merlin Moncure wrote: > On 1/23/09, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Right at the moment I'm wondering why we are going to change the code > > now to support a ten-year-old OS version that evidently no one has tried > > to use Postgres on before. > > I'd like to address this observation. You may have noticed that eSilo > has been contributing a number of patches to Postgres involving legacy > systems. Understandably, there is very little overlap between modern > versions of Postgres and these aging unixen. However, quite a few of > these systems are still in production serving legacy applications. > > eSilo provides backup software. Based on customer feedback we came up > with a list of systems that are under-represented by current backup > solutions in the industry . For various reasons, we decided to > involve libpq in our backup client and market aggressively to > platforms for which there are very few backup options. libpq is quite > portable, but there are a few understandable nits that have popped up > here and there over time for older systems. We are providing fixes > for those nits to the community. Well, this helps explain why were are getting these problems reports only now. How many hacks do you have that we don't support, aside from the threading one for HPUX? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote: > > Well, this helps explain why were are getting these problems reports > only now. How many hacks do you have that we don't support, aside from > the threading one for HPUX? > We submit them as we find them. We've submitted for windows, hpux, solaris and aix. Still have not tested SCO openserver and unixware yet. The only thing we have that we have not submitted, are features we don't feel are of general interest: query white list system and bandwidth throttling in the backend on a per session basis. -- Andrew Chernow eSilo, LLC every bit counts http://www.esilo.com/