Wednesday, February 6, 2013

PRVF-5637 : DNS response time could not be checked on following nodes

Running cluvfy (pre crsinst) showed the following error for /etc/resolv.conf:
Checking consistency of file "/etc/resolv.conf" across nodes
Checking the file "/etc/resolv.conf" to make sure only one of domain and search entries is defined
File "/etc/resolv.conf" does not have both domain and search entries defined
Checking if domain entry in file "/etc/resolv.conf" is consistent across the nodes...
domain entry in file "/etc/resolv.conf" is consistent across nodes
Checking file "/etc/resolv.conf" to make sure that only one domain entry is defined
All nodes have one domain entry defined in file "/etc/resolv.conf"
Checking all nodes to make sure that domain is "oracle.private" as found on node "DB-01"
All nodes of the cluster have same value for 'domain'
Checking if search entry in file "/etc/resolv.conf" is consistent across the nodes...
search entry in file "/etc/resolv.conf" is consistent across nodes
Checking DNS response time for an unreachable node
  Node Name                             Status
  ------------------------------------  ------------------------
  DB-01                             failed
  DB-02                             failed
PRVF-5637 : DNS response time could not be checked on following nodes: DB-01,DB-02
File "/etc/resolv.conf" is not consistent across nodes
Metalink notes 1480242.1 and 1356975.1 list several bugs and other reasons why this check can fail, but in this case the cause was none of those. Besides the reasons listed in those notes, another cause appears to be a missing nslookup utility. Cluvfy must be able to find nslookup; if it cannot, it erroneously reports that File "/etc/resolv.conf" is not consistent across nodes when in fact it has not run the test at all.
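Before re-running cluvfy it may be worth confirming that nslookup is actually on the PATH. The following is only a minimal sketch; the helper name check_tool is made up for illustration and is not part of cluvfy.

```shell
# Hypothetical helper: report whether a command can be found on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
        return 1
    fi
}

# cluvfy's DNS response time check relies on nslookup being present.
check_tool nslookup || echo "nslookup not found, the DNS check cannot run"

# To check the other cluster nodes (assumes passwordless ssh is already set up):
# for n in DB-01 DB-02; do ssh "$n" 'command -v nslookup' || echo "$n: missing"; done
```

The check has to pass on every node named in the cluvfy run, not only on the node where cluvfy is started.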



Install the nslookup utility (part of the bind-utils package) with
yum install bind-utils
Afterwards this step succeeds, provided none of the bugs/reasons mentioned in 1480242.1 and 1356975.1 apply:
Checking consistency of file "/etc/resolv.conf" across nodes
Checking the file "/etc/resolv.conf" to make sure only one of domain and search entries is defined
File "/etc/resolv.conf" does not have both domain and search entries defined
Checking if domain entry in file "/etc/resolv.conf" is consistent across the nodes...
domain entry in file "/etc/resolv.conf" is consistent across nodes
Checking file "/etc/resolv.conf" to make sure that only one domain entry is defined
All nodes have one domain entry defined in file "/etc/resolv.conf"
Checking all nodes to make sure that domain is "oracle.private" as found on node "DB-01"
All nodes of the cluster have same value for 'domain'
Checking if search entry in file "/etc/resolv.conf" is consistent across the nodes...
search entry in file "/etc/resolv.conf" is consistent across nodes
Checking DNS response time for an unreachable node
  Node Name                             Status
  ------------------------------------  ------------------------
  DB-01                             passed
  DB-02                             passed
The DNS response time for an unreachable node is within acceptable limit on all nodes
File "/etc/resolv.conf" is consistent across nodes
Useful metalink notes
PRVF-5637 : DNS response time could not be checked on following nodes [ID 1480242.1]
PRVF-5636 : The DNS response time for an unreachable node exceeded "15000" ms on following nodes [ID 1356975.1]


Update 12 March 2013
As mentioned in 1480242.1, on RHEL 6 nslookup for an unknown host returns 1 instead of 0. The metalink note says this was confirmed from 6.3 onwards, but that was not the case on the 6.3 system tested here, where it returned 0:
[root@rhel6m1 ~]# uname -r
2.6.32-279.el6.x86_64 <-- (rhel-server-6.3-x86_64)

[root@rhel6m1 ~]#  nslookup unknown-not-reachable-node
Server:         1.6.9.2
Address:        1.6.9.2#53

Non-authoritative answer:
Name:   unknown-not-reachable-node.code.net
Address: 23.11.26.18

[root@rhel6m1 ~]# echo $?
0
The same was observed on 6.2:
[root@rhel6m1 ~]# uname -r
2.6.32-220.el6.x86_64  <-- (rhel-server-6.2-x86_64)

[root@rhel6m1 ~]# nslookup unknown-not-reachable-node
Server:         11.7.9.2
Address:        11.7.9.2#53

Non-authoritative answer:
Name:   unknown-not-reachable-node.code.net
Address: 23.11.26.18

[root@rhel6m1 ~]# echo $?
0
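However, on 6.4 nslookup returned 1 instead of 0, matching the description in the metalink note. Note that the 6.2 and 6.3 lookups received an answer from their DNS server while the 6.4 lookup got SERVFAIL from a different server, so the DNS setup may also play a part in the differing return values.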
[grid@db1 grid]$ uname -r
2.6.32-358.0.1.el6.x86_64  <-- (rhel-server-6.4-x86_64)

[grid@db1 grid]$ nslookup unknown-not-reachable-node
Server:         13.20.9.5
Address:        13.20.9.5#53

** server can't find unknown-not-reachable-node: SERVFAIL

[grid@db1 grid]$  echo $?
1
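The probe used in the transcripts above (run a lookup, then echo $?) can be wrapped in a small helper so the same test is easy to repeat on each node. This is only a sketch; lookup_status is a hypothetical name, and the command to run is passed in as arguments rather than hard-coded.

```shell
# Hypothetical helper: run a lookup command quietly and print only its exit status.
lookup_status() {
    status=0
    "$@" >/dev/null 2>&1 || status=$?
    echo "$status"
}

# Example (needs nslookup and a configured DNS server, so it is left commented out):
# lookup_status nslookup unknown-not-reachable-node
```

On the systems shown above this would print 0 on the 6.2 and 6.3 hosts and 1 on the 6.4 host.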
Because of this change in the return value of nslookup, the pre-req check fails on RHEL 6.4 with PRVF-5637. The only way to verify that the pre-req is satisfied is manual inspection of /etc/resolv.conf.
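The manual inspection can be partly scripted. The sketch below mirrors two of the things the cluvfy output reports on (that only one of domain and search is defined, and that the entries match across nodes). It is not what cluvfy itself runs; the function names and the /tmp file names are assumptions made for illustration.

```shell
# Print the domain/search/nameserver entries of a resolv.conf-style file
# with whitespace normalised, so copies from different nodes can be compared.
resolv_entries() {
    awk '$1 == "domain" || $1 == "search" || $1 == "nameserver" { $1 = $1; print }' "$1"
}

# Succeed only if the file does not define both "domain" and "search".
only_one_of_domain_search() {
    d=$(awk '$1 == "domain" { n++ } END { print n + 0 }' "$1")
    s=$(awk '$1 == "search" { n++ } END { print n + 0 }' "$1")
    [ "$d" -eq 0 ] || [ "$s" -eq 0 ]
}

# Example usage (assumes passwordless ssh between nodes; left commented out):
# ssh DB-02 cat /etc/resolv.conf > /tmp/resolv.DB-02
# only_one_of_domain_search /etc/resolv.conf && echo "domain/search OK"
# [ "$(resolv_entries /etc/resolv.conf)" = "$(resolv_entries /tmp/resolv.DB-02)" ] \
#     && echo "entries match" || echo "entries differ"
```

Comment lines and spacing differences are ignored by the comparison, but the order of entries is not, since nameserver order matters to the resolver.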