Wednesday, January 9, 2013

Installing Oracle RAC When SSH Is Listening On A Non-Default Port

It's not uncommon to see server environments where ssh is configured to listen on a non-default port (a port other than 22). This has no consequence when installing a single instance Oracle DB. For a clusterware or RAC installation, however, it causes OUI and runcluvfy.sh to fail the user equivalence check unless a minor configuration change is made.
When ssh is running on a non-default port, the port must be specified to ssh from one server to another, for example
ssh -p 2020 remotehost
or an alias named ssh could be set up that includes the port, such as
alias ssh='ssh -p 2020'
and then simply use ssh remotehost.
It's even possible to create user equivalency manually for the grid or oracle user with /usr/bin/ssh-keygen and get passphrase-less ssh access among the nodes considered for the cluster installation by specifying the ssh port. However, the sshUserSetup.sh script will fail when it tries to create user equivalency, because it calls /usr/bin/ssh without a port, and as a result it is unable to copy the generated keys to the nodes listed.
The user may be prompted for a password here since the script would be running SSH on host DB-01.
ssh: connect to host DB-01 port 22: Connection refused
Done with creating .ssh directory and setting permissions on remote host DB-01.
Creating .ssh directory and setting permissions on remote host DB-02
THE SCRIPT WOULD ALSO BE REVOKING WRITE PERMISSIONS FOR group AND others ON THE HOME DIRECTORY FOR grid. THIS IS AN SSH REQUIREMENT.
The script would create ~grid/.ssh/config file on remote host DB-02. If a config file exists already at ~grid/.ssh/config, it would be backed up to ~grid/.ssh/config.backup.
The user may be prompted for a password here since the script would be running SSH on host DB-02.
ssh: connect to host DB-02 port 22: Connection refused
Done with creating .ssh directory and setting permissions on remote host DB-02.
Copying local host public key to the remote host DB-01
The user may be prompted for a password or passphrase here since the script would be using SCP for host DB-01.
ssh: connect to host DB-01 port 22: Connection refused
lost connection
Done copying local host public key to the remote host DB-01
Copying local host public key to the remote host DB-02
The user may be prompted for a password or passphrase here since the script would be using SCP for host DB-02.
ssh: connect to host DB-02 port 22: Connection refused
lost connection
Done copying local host public key to the remote host DB-02
ssh: connect to host DB-01 port 22: Connection refused
ssh: connect to host DB-02 port 22: Connection refused
SSH setup is complete.
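The manual route could be sketched roughly as follows. This is only an illustration and not what sshUserSetup.sh does: the function name print_equivalence_cmds is hypothetical, port 2020 and the node names come from the example in this post, and an RSA key at ~/.ssh/id_rsa with an empty passphrase is assumed. The function only prints the commands, so they can be reviewed before being run as the grid or oracle user.

```shell
#!/bin/sh
# Hypothetical helper: print the commands for passwordless ssh from this node
# to each listed node when sshd listens on a non-default port.
# Usage: print_equivalence_cmds PORT NODE [NODE...]
print_equivalence_cmds() {
    port=$1
    shift

    # Generate a key pair with an empty passphrase (assumed key path).
    echo "ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa"

    for node in "$@"; do
        # Append the public key to the node's authorized_keys over the given port.
        echo "cat ~/.ssh/id_rsa.pub | ssh -p $port $node 'mkdir -p ~/.ssh && chmod 700 ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys'"
    done
}
```

Calling print_equivalence_cmds 2020 DB-01 DB-02 prints one ssh-keygen line and one key-copy line per node. Since user equivalence is needed between all the nodes, the same steps would presumably have to be repeated on each node.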
As mentioned earlier, it's possible to create the user equivalency manually or even edit the script. But this will not make runcluvfy.sh or OUI pass the user equivalency test.
OUI failing even after creating user equivalency manually.

Running runcluvfy.sh will fail with
Checking user equivalence...
PRVF-4007 : User equivalence check failed for user "grid"
Check failed on nodes:
DB-01,DB-02
An alias on ssh as mentioned earlier is of no use, as these tools call the full path "/usr/bin/ssh" and so the alias is ignored. Another workaround tried was to create a script named ssh that calls the original ssh executable with the port and whatever parameters were passed to it.
cd /usr/bin
mv ssh sshi
and then create a file called ssh in /usr/bin with the following text, assuming ssh runs on port 2020
/usr/bin/sshi -p 2020 $*
This got runcluvfy.sh working for the local node, but it still failed checking the pre-reqs on the remote node. The reason for that failure was not investigated in detail.
Finally an SR was raised asking whether it's possible to install Oracle RAC when ssh is listening on a non-default port and, if so, how to get runcluvfy.sh to pass the user equivalence check.



Strangely, Oracle's answer was to unblock port 22 (run ssh on the default port) and change back to the non-default port once installed. SSH is only used during installation, upgrades, patching and so on, not during "normal" database activities. But according to 220970.1, user equivalency must exist even after the installation, as many assistants and scripts depend on it.
This was tested on an existing environment by changing the ssh port from 22 to 2020 to see if the cluster stack would start without an error. All "seemed to be fine" (it wasn't an extensive test, just a start and stop), but changing the ssh port every time some script fails due to missing user equivalency didn't seem like a good approach for a production system.
Oracle asked for a trace of runcluvfy.sh comp nodecon with the files uploaded, and after looking at the output said "issue is similar to 4193093 and it's not a bug but an exception". Nothing could be found on 4193093, so it must be some internal bug document. This still didn't resolve the issue at hand.
Used strace to get the system calls for the command executed by cluvfy to check user equivalency.
strace -o db01_output_ssh_p.log /usr/bin/ssh -p 2020 -o FallBackToRsh=no -o PasswordAuthentication=no -o StrictHostKeyChecking=yes -o NumberOfPasswordPrompts=0 DB-02 /bin/true
Looking at the strace output, it could be seen that ssh is using the default port even though the port change is specified in sshd_config (this is a RHEL 5 server, 2.6.18-308.24.1.el5).
connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("xxx.xxx.xx.xx")}, 16) = -1 ECONNREFUSED (Connection refused)
Oracle's answer was that the above line shows ssh still uses the default port when a port is not passed, and that this is not a bug. If a port needs to be passed to cluvfy, then it must go through the product enhancement process. Not exactly what was needed for the problem at hand, and not much help coming from the SR.
But the strace output did give an idea. Why is ssh still using port 22 even when port 2020 is defined in sshd_config? The answer is that sshd_config only configures the ssh server daemon, while the ssh client loads its configuration parameters from the ~/.ssh/config file. This file is already created as part of the clusterware pre-reqs. All that's needed to get user equivalence working when ssh is listening on a non-default port is to specify the port in the user's (oracle and grid) .ssh/config file.
grid]$cat .ssh/config
Host *
        ForwardX11      no
        Port    2020
After this there's no need to specify the port (ssh -p 2020) on the shell command line, and runcluvfy.sh runs without an error on user equivalence.
Check: User equivalence for user "grid"
  Node Name                             Status
  ------------------------------------  ------------------------
  DB-01                             passed
  DB-02                             passed
Result: User equivalence check passed for user "grid"
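The .ssh/config change could also be scripted so it can be repeated for the grid and oracle users on each node. The following is a minimal sketch under some assumptions: the function name add_ssh_port is hypothetical, and it adds a separate Host block for the named nodes rather than editing an existing Host * block. The node names and port 2020 are the ones used in this post.

```shell
#!/bin/sh
# Hypothetical helper: add a Host block with a Port setting to an ssh client
# config file, unless a block for the same host pattern is already present.
# Usage: add_ssh_port CONFIG_FILE "HOST PATTERN" PORT
add_ssh_port() {
    config_file=$1
    host_pattern=$2
    port=$3

    mkdir -p "$(dirname "$config_file")"
    touch "$config_file"
    chmod 600 "$config_file"   # ssh rejects a user config writable by others

    # Do nothing if a Host line with exactly this pattern already exists.
    if grep -qxF "Host $host_pattern" "$config_file"; then
        return 0
    fi

    printf 'Host %s\n        Port    %s\n' "$host_pattern" "$port" >> "$config_file"
}
```

For example, add_ssh_port ~/.ssh/config "DB-01 DB-02" 2020 run as the grid user. ssh uses the first value it obtains for each parameter, so an earlier block that already sets Port for these hosts would take precedence over the appended one.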
Confirmation from the strace that ssh is using the non-default port
connect(3, {sa_family=AF_INET, sin_port=htons(2020), sin_addr=inet_addr("xxx.xxx.xx.xx")}, 16) = 0
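Checking the port in a long strace log by eye gets tedious, so the connect() lines could be filtered with something like the sketch below. The function name strace_connect_ports is hypothetical, and it assumes the log has the same connect(...) format as the lines shown in this post.

```shell
#!/bin/sh
# Hypothetical helper: print the port of every AF_INET connect() call found
# in an strace log, one port per line.
# Usage: strace_connect_ports LOGFILE
strace_connect_ports() {
    # Keep only lines with connect(... sin_port=htons(N) ...) and print N.
    sed -n 's/.*connect(.*sin_port=htons(\([0-9][0-9]*\)).*/\1/p' "$1"
}
```

Run as strace_connect_ports db01_output_ssh_p.log. The output lists every network connect in the log, so ports other than the ssh one (a DNS lookup, for instance) may show up as well.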
The bottom line is that Oracle RAC can be installed when ssh is listening on a non-default port (something asked on the SR that Oracle didn't give an answer to). No changes are required on the Oracle side; simply add the ssh port to the .ssh/config file.