Monday, March 1, 2021

19c runcluvfy.sh faile with PRVF-7596 PRVG-2002

When upgrading Oracle Infrastructure 12.2 to 19c, runcluvfy.sh failed with following messages
Verifying OCR Integrity ...FAILED
host02: PRVF-7596 : CSS is probably working with a non-clustered, local-only
          configuration on node "host02"
Verifying resolv.conf Integrity ...FAILED
host02: PRVG-2002 : Encountered error in copying file "/etc/resolv.conf" from
          node "host02" to node "host01"
          protocol error: filename does not match request
Verifying DNS/NIS name service ...FAILED
host02: PRVG-2002 : Encountered error in copying file "/etc/netsvc.conf" from
          node "host02" to node "host01"
          protocol error: filename does not match request

Set log location and re-run runcluvfy.sh
# runcluvfy.sh will write log files to directory /u01/tmp
export CV_TRACELOC=/u01/tmp
# re-run runcluvfy.sh
./runcluvfy.sh stage -pre crsinst -upgrade -rolling -src_crshome /u01/app/12.2.0/grid_1 -dest_crshome /u01/app/19.0.0/grid_1 -dest_version 19.0.0.0.0 -fixup -verbose
# Log files created
cd /u01/tmp
ls -l
total 15200
-rw-r--r--    1 grid     oinstall     952467 Feb 24 16:52 cvuhelper.log.0
-rw-r--r--    1 grid     oinstall          0 Feb 24 16:48 cvuhelper.log.0.lck
-rw-r--r--    1 grid     oinstall    6824223 Feb 24 16:52 cvutrace.log.0

The log file cvutrace.log.0 shows messsage
[Worker 0] [ 2021-02-24 16:47:49.784 EST ] [UnixSystem.remoteCopyFile:848]  UnixSystem: /usr/bin/scp -p host02:'/tmp/CVU_19.0.0.0.0_grid/scratch/getFileInfo3605304.out' /tmp/host02.getFileInfo3605304.out
[Worker 0] [ 2021-02-24 16:47:49.798 EST ] [RuntimeExec.runCommand:294]  runCommand: Waiting for the process
[Thread-83] [ 2021-02-24 16:47:49.798 EST ] [StreamReader.run:62]  In StreamReader.run
[Thread-82] [ 2021-02-24 16:47:49.798 EST ] [StreamReader.run:62]  In StreamReader.run
[Thread-83] [ 2021-02-24 16:47:50.201 EST ] [StreamReader.run:66]  ERROR>protocol error: filename does not match request
...
[main] [ 2021-02-24 16:51:46.747 EST ] [UnixSystem.remoteCopyFile:848]  UnixSystem: /usr/bin/scp -p host02:'/etc/resolv.conf' /tmp/CVU_19.0.0.0.0_grid/scratch/resolv.conf_host02
[main] [ 2021-02-24 16:51:46.760 EST ] [RuntimeExec.runCommand:294]  runCommand: Waiting for the process
[Thread-1060] [ 2021-02-24 16:51:46.760 EST ] [StreamReader.run:62]  In StreamReader.run
[Thread-1059] [ 2021-02-24 16:51:46.760 EST ] [StreamReader.run:62]  In StreamReader.run
[main] [ 2021-02-24 16:51:47.148 EST ] [RuntimeExec.runCommand:296]  runCommand: process returns 1
[Thread-1060] [ 2021-02-24 16:51:47.149 EST ] [StreamReader.run:66]  ERROR>protocol error: filename does not match request
...
[main] [ 2021-02-24 16:51:47.548 EST ] [UnixSystem.remoteCopyFile:848]  UnixSystem: /usr/bin/scp -p host02:'/etc/netsvc.conf' /tmp/CVU_19.0.0.0.0_grid/scratch/nssw_conf_host02
[Thread-1066] [ 2021-02-24 16:51:47.559 EST ] [StreamReader.run:62]  In StreamReader.run
[main] [ 2021-02-24 16:51:47.559 EST ] [RuntimeExec.runCommand:294]  runCommand: Waiting for the process
[Thread-1065] [ 2021-02-24 16:51:47.560 EST ] [StreamReader.run:62]  In StreamReader.run
[main] [ 2021-02-24 16:51:47.965 EST ] [RuntimeExec.runCommand:296]  runCommand: process returns 1
[Thread-1066] [ 2021-02-24 16:51:47.973 EST ] [StreamReader.run:66]  ERROR>protocol error: filename does not match request

All three errors occurred when scp is executed to copy file from remote node host02 to local node.
Interesting thing is that all three commands are using single quotation marks to enclose remote file name, I guess Oracle programmer left them there by accident because there is no reason to use them. Let's verify if it is the culprit by manually running scp
[grid@host01]$ scp -p host02:"'/etc/resolv.conf'" /tmp/CVU_19.0.0.0.0_grid/scratch/resolv.conf_host02
protocol error: filename does not match request
[grid@host01]$
[grid@host01]$ scp -T -p host02:"'/etc/resolv.conf'" /tmp/CVU_19.0.0.0.0_grid/scratch/resolv.conf_host02
resolv.conf                                             96    51.3KB/s   00:00
[grid@host01]$

The scp fails with same error "protocol error: filename does not match request", but succeeds if extra option -T is used.

The -T option was introduced by OpenSSH 8.0 released in April 2019. In earlier version of OpenSSH, when copying files from a remote system to a local directory, scp did not verify that the file names that the server sent matched those requested by the client. This could allow a hostile server to create or clobber unexpected local files with attacker-controlled content. OpenSSH 8.0 fixed this security issue and scp, by default, verifies the file name on client side, and also introduced -T option to provide capacity to disable the verification.

Although OpenSSH officially claims that the fix is introduced in 8.0, runcluvfy.sh shipped with Grid Infrastructure 19.3 also fails with same reason on AIX with OpenSSH 7.5p1 and it is where the errors used in this article happened.

This version of runcluvfy.sh sends remote file name with single quotation marks, but remote server returns file name without quotation. Technically, they are same thing, but they are visually different. Therefore, old version scp worked because it did not verify them, but current scp fails it with "filename does not match".

It can be resolved by downloading new standalone version Cluster Verification Utility from My Oracle Support(Patch 30839369). Note: the executable is cluvfy instead of runcluvfy.sh. If download is not the option. As a temporary workaround, we can rename scp and create a new scp
# Rename the original scp
mv /usr/bin/scp /usr/bin/scp.bak
# Create a new file scp
echo "/usr/bin/scp.orig -T $*" > /usr/bin/scp
# Make the file executable
chmod a+rx /usr/bin/scp

Now, the errors are gone. After successfully installing GI, remember to restore original scp
# Delete interim scp
rm /usr/bin/scp
# Restore the original scp.
mv /usr/bin/scp.bak /usr/bin/scp

No comments: