Sunday, June 20, 2021

Script: Run datapatch against all running instances in parallel

This script can be used to run datapatch against all currently running instances in parallel to apply SQL patches after patches have been applied to the Oracle homes. It is helpful for post-patching operations. It works for instances running out of different Oracle homes, including homes owned by different OS users.

The script accepts one optional parameter, the log file path; it saves log files under /tmp if no parameter is supplied when the script is started.

In order for the script to succeed, please be aware of the following limitations:

1. The script has only been tested on Linux and AIX; it does not work on Solaris.

2. Only root or the Oracle database home owner is supposed to run this script. If the instances are running out of different Oracle homes owned by different OS users, running as root is recommended. Otherwise, instances running out of an Oracle home whose owner is different from the current user will be excluded.

3. The Oracle database home must be 12c or higher, which supports datapatch.

4. The script runs the following command as root to retrieve the Oracle home path
   /bin/ls
   Therefore, if the Oracle home owner (normally oracle) runs this script, sudo has to be configured to grant the user the privilege to run '/bin/ls' as root without being asked for a password (a minimal sudoers sketch follows).
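
A minimal sudoers sketch for this requirement, assuming the Oracle home owner is the OS user oracle (the user name is an assumption; add the entry with visudo):

   oracle ALL=(root) NOPASSWD: /bin/ls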

Friday, June 18, 2021

Install PostgreSQL on RHEL/OL/CentOS 6/7/8

There are many ways to install PostgreSQL on a Linux box. Here, I am going to use yum to install different versions of PostgreSQL on Red Hat Enterprise Linux (RHEL) 6/7/8; it also works for Oracle Linux (OL) and CentOS 6/7/8.

1. Install yum repository for PostgreSQL

Depending on the OS version, run one of the following commands:
  * RHEL/OL/CentOS 8
  yum install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-8-x86_64/pgdg-redhat-repo-latest.noarch.rpm

  * RHEL/OL/CentOS 7
  yum install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm

  * RHEL/OL/CentOS 6
  yum install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-6-x86_64/pgdg-redhat-repo-latest.noarch.rpm

2. Validate the yum repository list with the following command

  yum repolist

The command lists all configured repositories; example output on RHEL 8:
repo id                repo name
pgdg-common            PostgreSQL common RPMs for RHEL/CentOS 8 - x86_64
pgdg10                 PostgreSQL 10 for RHEL/CentOS 8 - x86_64
pgdg11                 PostgreSQL 11 for RHEL/CentOS 8 - x86_64
pgdg12                 PostgreSQL 12 for RHEL/CentOS 8 - x86_64
pgdg13                 PostgreSQL 13 for RHEL/CentOS 8 - x86_64
pgdg96                 PostgreSQL 9.6 for RHEL/CentOS 8 - x86_64

It configured the PostgreSQL 9.6, 10, 11, 12 and 13 repositories on RHEL 8. Therefore, I can install any or all of these PostgreSQL versions. Note that the available PostgreSQL versions differ between OS versions.

3. Install PostgreSQL

Depending on which version you want to install, run one of the following commands:

  * Install PostgreSQL 13
  yum install -y postgresql13-server

  * Install PostgreSQL 12
  yum install -y postgresql12-server

  * Install PostgreSQL 11
  yum install -y postgresql11-server

  * Install PostgreSQL 10
  yum install -y postgresql10-server

  * Install PostgreSQL 9.6
  yum install -y postgresql96-server

4. Install package postgresql-contrib

This package contains various extension modules that are included in the PostgreSQL distribution. Depending on the version of PostgreSQL installed, run one of the following commands to install the contrib package:

  * PostgreSQL 13
  yum install -y postgresql13-contrib

  * PostgreSQL 12
  yum install -y postgresql12-contrib

  * PostgreSQL 11
  yum install -y postgresql11-contrib

  * PostgreSQL 10
  yum install -y postgresql10-contrib

  * PostgreSQL 9.6
  yum install -y postgresql96-contrib
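
After installation, the database cluster still needs to be initialized and the service started before it can be used. A minimal sketch for PostgreSQL 13 on RHEL/OL/CentOS 8, assuming the setup script and systemd unit names follow the PGDG packaging convention (adjust the version number for other releases):

  /usr/pgsql-13/bin/postgresql-13-setup initdb
  systemctl enable --now postgresql-13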

Thursday, June 17, 2021

Java Stored Procedure failed with java.lang.OutOfMemoryError

An Oracle database Java stored procedure failed with "java.lang.OutOfMemoryError"; the error stack looks like
ORA-29532: Java call terminated by uncaught Java exception: java.lang.OutOfMemoryError
ORA-06512: at "USER01.MYJAVAPROC", line 28
ORA-06512: at line 7

This error is thrown when there is insufficient space to allocate an object in the Java heap. In this case, the Java garbage collector cannot make space available to accommodate a new object, and the heap cannot be expanded further.

As we know, a Java stored procedure is implemented in Java, so the execution of the procedure follows the same rules as a normal Java application. The only difference is that the procedure runs on the JVM built inside the database, not on a standalone Java runtime (java.exe).

When a normal Java application gets an out-of-memory error, the -Xmx parameter can be used when starting the application to configure a larger heap. However, as the Oracle Database JVM runs in the process space of the Oracle executable, there is no way to pass the -Xmx parameter. The heap size is still configurable, though, through the "Java Runtime" class "oracle.aurora.vm.OracleRuntime". The class has the following methods:

    getMaxMemorySize   - Get current setting of heap size
    setMaxMemorySize   - Set new heap size

In order to call these methods inside the database, we have to create Java stored procedures to expose them to the Oracle database; example code follows:
create or replace function get_java_heap_size return number is
    language java name 'oracle.aurora.vm.OracleRuntime.getMaxMemorySize() return long';
/

create or replace function set_java_heap_size(mem_size number) return number is
    language java name 'oracle.aurora.vm.OracleRuntime.setMaxMemorySize(long) return long';
/
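
The current setting can then be checked from SQL*Plus with the first wrapper, for example (the value is returned in bytes):

  select get_java_heap_size from dual;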

The out-of-memory error can be fixed by calling the created wrapper function set_java_heap_size before running the application Java stored procedure which triggers the error. For example:
declare
  heap_size number;
begin
  -- Set heap size to 1GB
  heap_size := set_java_heap_size(1024*1024*1024);
  -- Run application Java stored procedure
  user01.myjavaproc;
end;
/

Wednesday, June 16, 2021

Oracle 19.11 roothas.sh failed with "Out of memory" on AIX

When applying Oracle GI Release Update 19.11.0.0.210420 on AIX, the command "roothas.sh -postpatch" failed with "Out of memory" as follows:
[root@host01]# /u01/app/oracle/product/19.0.0/grid_1/crs/install/roothas.sh -postpatch
Using configuration parameter file: /u01/app/oracle/product/19.0.0/grid_1/crs/install/crsconfig_params
The log of current session can be found at:
  /u01/app/oracle/crsdata/host01/crsconfig/hapatch_2021-06-15_01-53-27PM.log
Out of memory!
Out of memory!
Out of memory!
/u01/app/oracle/product/19.0.0/grid_1/crs/install/roothas.sh[137]: 7930494 Segmentation fault(coredump)
The command '/u01/app/oracle/product/19.0.0/grid_1/perl/bin/perl -I/u01/app/oracle/product/19.0.0/grid_1/perl/lib -I/u01/app/oracle/product/19.0.0/grid_1/crs/install -I/u01/app/oracle/product/19.0.0/grid_1/xag /u01/app/oracle/product/19.0.0/grid_1/crs/install/roothas.pl -postpatch' execution failed

The error comes from the perl process (script roothas.pl) executing the root configuration scripts; the perl process does not have enough memory (especially data segments) during the script execution.

Technically, the "Out of memory" error can be seen on AIX while running any of the Oracle 19c perl scripts for root configuration. These perl scripts are usually executed/called by the DBA through the sh scripts root.sh, rootupgrade.sh, rootcrs.sh, roothas.sh, etc. Although the error was rarely seen before, it has become common since Oracle GI RU 19.11 was released.

It is caused by the OS memory allocation method. On AIX, the number of data segments that a process is allowed to use limits the process memory size. The default number of data segments is one, and the size of a data segment is 256 MB. Data segments are shared for both data and stack. The maximum number of additional data segments a process can use is eight (2 GB). The number of segments that a process can use for data is controlled by the LDR_CNTRL environment variable, which is defined in the parent process of the process that is to be affected.

Therefore, we can fix the issue by increasing the process memory size via the LDR_CNTRL environment variable. For example, the following allows eight additional data segments:

export LDR_CNTRL=MAXDATA=0x80000000
<root script>
unset LDR_CNTRL

Here, 
  <root script> is the command you have to run as root which got "Out of memory"; in my case, it is "roothas.sh -postpatch".
  The unset command removes the LDR_CNTRL environment variable so that it does not unintentionally affect other processes.
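
For example, for the failing post-patch step shown above, the full sequence would be:

export LDR_CNTRL=MAXDATA=0x80000000
/u01/app/oracle/product/19.0.0/grid_1/crs/install/roothas.sh -postpatch
unset LDR_CNTRL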

Some argumentative guy may say eight additional data segments (2 GB) is too large; OK, you can set it to four as follows:

export LDR_CNTRL=MAXDATA=0x40000000

Tuesday, June 15, 2021

Configure yum with proxy server on RHEL/OL/Fedora

When using yum to maintain packages on hosts running Red Hat Enterprise Linux (RHEL), Oracle Linux (OL) or Fedora, you may have to configure the proxy option if the host is running behind a firewall.

In order to do that, add the following lines to the yum configuration file /etc/yum.conf:

proxy=http://<proxy server>:<port>
proxy_username=<user name>
proxy_password=<password>

Here, 

  <proxy server> is the host name or IP address of the proxy server
  <user name>    is the user name, if the proxy server requires authentication
  <password>     is the password, if the proxy server requires authentication

Example /etc/yum.conf file:
gpgcheck=1
installonly_limit=3
clean_requirements_on_remove=True
best=True
skip_if_unavailable=False
proxy=http://proxy.dbaplus.ca:8080
proxy_username=user01
proxy_password=user01pwd
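
After saving the file, a quick way to confirm that yum can reach the repositories through the proxy is to list them again:

  yum repolist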

Thursday, June 3, 2021

How to Permanently Change PowerShell Prompt

The PowerShell command prompt indicates that PowerShell is ready to run a command:
PS C:\>

It is determined by the built-in Prompt function and can be customized by running the following command to redefine the Prompt function:

  function prompt {"<Script block>"}

Here, <Script block> is a script block which determines/generates the prompt value (a character string). For example, the following makes the prompt include the current date and time:
PS C:\>
PS C:\> function Prompt {"$(Get-Date)> "}
06/03/2021 11:39:23>
06/03/2021 11:39:25>

The change is only valid for the current session; the prompt of a new session is still the default value. In order to keep the new prompt for all sessions, you have to create your own Prompt function and save it in your PowerShell profile, as follows:

1. Find out your PowerShell profile file with "$profile"
PS C:\> $profile
C:\Users\admin\Documents\WindowsPowerShell\Microsoft.PowerShell_profile.ps1
PS C:\>

2. Add customized Prompt function to profile

If the file or directory does not exist, you can manually create it (see the sketch at the end of this post), then add your customized Prompt function to the file. For example, add the following line:
function Prompt { "PS [" + ${ENV:USERNAME} + "@" + ${ENV:COMPUTERNAME} + "]> "}

The prompt will be in the format "PS [user-name@computer-name]> " in all new PowerShell sessions.
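
If the profile file does not exist yet, a minimal sketch for creating it and appending the function from the PowerShell prompt (Test-Path, New-Item and Add-Content are standard cmdlets):

PS C:\> if (!(Test-Path -Path $profile)) { New-Item -ItemType File -Path $profile -Force }
PS C:\> Add-Content -Path $profile -Value 'function Prompt { "PS [" + ${ENV:USERNAME} + "@" + ${ENV:COMPUTERNAME} + "]> "}'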

Sunday, May 16, 2021

Grid Infrastructure 12.2 restore OCR, Voting File and mgmtdb when disk group corrupted

For some reason, Oracle desupported the placement of OCR and voting files directly on a shared file system starting with Grid Infrastructure 12.2, until rescinding the desupport for Standalone Clusters in 19.3. Therefore, when GI 12.2 is installed, the OCR, voting files, and OCR backup location are, by default, configured in an ASM disk group, and most likely in the same disk group. When that disk group cannot be mounted because of any issue, the cluster cannot be brought up anymore.

Technically, GI has to be re-configured like a new installation, which could be a big job. However, it may not be that bad. Although 12.2 does not allow you to have the OCR backup location outside an ASM disk group, the DBA can still copy the OCR backup file from the ASM disk group to a file system with the command 'asmcmd cp'. I am going to demonstrate how we can maximize the chances of bringing the cluster back up without rebuilding/reinstalling it.
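
For example, the automatic OCR backups can be located and copied out to a regular file system before any disk group problem occurs; a quick sketch, with a hypothetical backup file name and destination path:

  # list the automatic OCR backups kept by the cluster
  ocrconfig -showbackup
  # copy one backup file from the ASM disk group to a local file system (as the GI home owner)
  asmcmd cp +DATA/cluster01/OCRBACKUP/backup00.ocr.260.1071658047 /backup/ocr/backup00.ocr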

Sunday, May 9, 2021

Windows: How to log all console output to a text file

The Windows Command Prompt console does not have a built-in facility to log console output to a file. In order to implement the logging function, PowerShell has to be used instead of the normal command console.

To enable logging, run the following command at the PowerShell prompt:

  Start-Transcript [-Path] "<file-name>" [-Append]

Here, 

  -Path specifies "<file-name>", the full path of the log file in which output messages will be saved

  -Append indicates that the logging text will be added to the end of an existing file instead of overwriting it.

To stop logging, run the command

  Stop-Transcript

Note: PowerShell can be started on Windows by clicking the "Windows PowerShell" application or by running the command "PowerShell" in a normal "Command Prompt" console, but DO NOT start PowerShell in "Windows Terminal". If PowerShell is started from "Windows Terminal", Start-Transcript may not be able to log everything. For example, the output of non-built-in commands (e.g. sqlplus) may not be completely logged.
  
Example

PS> Start-Transcript -Path "C:\temp\test.log"
Transcript started, output file is C:\temp\test.log
PS>
PS> sqlplus system/oracle@orcl

SQL*Plus: Release 12.2.0.1.0 Production on Sun May 9 20:30:15 2021

Copyright (c) 1982, 2016, Oracle.  All rights reserved.

Last Successful login time: Tue Apr 13 2021 16:46:29 -04:00

Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production

system@orcl> select host_name from v$instance;

HOST_NAME
----------------------------------------------------------------
host01

system@orcl> exit
Disconnected from Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
PS>
PS> Stop-Transcript
Transcript stopped, output file is C:\temp\test.log

The content of file C:\temp\test.log

**********************
Windows PowerShell transcript start
Start time: 20210509202857
Username: dbaplus
RunAs User: dbaplus
Configuration Name: 
Machine: wkstn01 (Microsoft Windows NT 10.0.19042.0)
Host Application: C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe
Process ID: 15732
PSVersion: 5.1.19041.906
PSEdition: Desktop
PSCompatibleVersions: 1.0, 2.0, 3.0, 4.0, 5.0, 5.1.19041.906
BuildVersion: 10.0.19041.906
CLRVersion: 4.0.30319.42000
WSManStackVersion: 3.0
PSRemotingProtocolVersion: 2.3
SerializationVersion: 1.1.0.1
**********************
Transcript started, output file is C:\temp\test.log
PS> sqlplus system/oracle@orcl

SQL*Plus: Release 12.2.0.1.0 Production on Sun May 9 20:30:15 2021

Copyright (c) 1982, 2016, Oracle.  All rights reserved.

Last Successful login time: Tue Apr 13 2021 16:46:29 -04:00

Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production

system@orcl> select host_name from v$instance;

HOST_NAME
----------------------------------------------------------------
host01

system@orcl> exit
Disconnected from Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
PS> Stop-Transcript
**********************
Windows PowerShell transcript end
End time: 20210509203043
**********************

Wednesday, May 5, 2021

OEM 12c/13c Database Target Discovered and Named with host name as suffix

When EM discovers database targets, it generates a default target name for the database in one of the following formats:

  db_unique_name.db_domain   if both parameters db_unique_name and db_domain are set
  db_unique_name             if parameter db_unique_name is set, but db_domain is not set
  db_name.db_domain          if parameter db_unique_name is not set, but db_domain is set
  db_name                    if neither db_unique_name nor db_domain is set

The EM discovery script retrieves these parameter values from the parameter file (pfile or spfile) of the database, not from the running instance. If the discovery process has an issue locating or processing the parameter file, EM will name the database target in the following format:

  <sid>_<hostname>
  
Therefore, when you find a newly discovered database target named in this format, it means the EM agent discovery script had an issue with the parameter file. The details can be found in the agent trace file "<AGENT_INST_HOME>/sysman/log/emagent_perl.trc". The most common error messages look like

  ERROR:  initParameterFileUtl::convertSPFileToPFile: Failed to convert spfile
  
or

  ERROR:  initParameterFileUtl::getParameterFile: Cannot find any init parameter file for instance <instancename> in oracle home  <oracle_home>

For example, Oracle Restart (standalone Grid Infrastructure) is installed with ASM storage configured on server host1.dbaplus.ca, and DBCA creates a database orcl (sid & db_name) in an ASM disk group; the spfile is also saved in the ASM disk group and no parameter file (pfile/spfile) is created under the directory <ORACLE_HOME>/dbs. The database will be discovered by EM with the default name orcl_host1.dbaplus.ca, and you will see errors in the agent trace file "emagent_perl.trc":

oracledb.pl: 2021-05-04 04:03:08,923: INFO:  DB_LISTENER_DISCOVERY:  processing sid="orcl"
oracledb.pl: 2021-05-04 04:03:08,926: ERROR:  initParameterFileUtl::getParameterFile: Cannot find any init parameter file for instance orcl in oracle home /u01/app/oracle/product/19.9.0/dbhome_1/dbs
oracledb.pl: 2021-05-04 04:03:08,930: ERROR:  initParameterFileUtl::getParameterFile: Cannot find any init parameter file for instance orcl in oracle home /u01/app/oracle/product/19.9.0/dbhome_1/dbs
oracledb.pl: 2021-05-04 04:03:09,065: ERROR:  initParameterFileUtl::getParameterFile: Cannot find any init parameter file for instance orcl in oracle home /u01/app/oracle/product/19.9.0/dbhome_1/dbs
oracledb.pl: 2021-05-04 04:03:09,069: ERROR:  initParameterFileUtl::getParameterFile: Cannot find any init parameter file for instance orcl in oracle home /u01/app/oracle/product/19.9.0/dbhome_1/dbs

The reason is that the discovery script cannot find a parameter file under <ORACLE_HOME>/dbs. To fix this problem, create a pfile 'initorcl.ora' under <ORACLE_HOME>/dbs with the following content:

   spfile='<full path of spfile saved in diskgroup>'

Note: DO NOT leave any SPACE character at the beginning of the line (before the word "spfile"). If you do, you will not see any errors in the trace file, but the target name will still be <sid>_<hostname>.

Sample init file

$ srvctl config database -db orcl | grep spfile
Spfile: +DATA/orcl/PARAMETERFILE/spfile.919.1071658047
$
$ echo "spfile='+DATA/orcl/PARAMETERFILE/spfile.919.1071658047'" > $ORACLE_HOME/dbs/initorcl.ora
$
$ cat $ORACLE_HOME/dbs/initorcl.ora
spfile='+DATA/orcl/PARAMETERFILE/spfile.919.1071658047'

Sunday, May 2, 2021

Oracle 19.11 deinstall failed with "ERROR: oracle/rat/tfa/util/ManageTfa"

After Release Update 19.11.0.0.210420 is applied to an Oracle Database 19c home, the home cannot be deinstalled anymore.

The deinstall utility will fail with errors,
######################## DEINSTALL CLEAN OPERATION START ########################
## [START] Preparing for Deinstall ##
Setting LOCAL_NODE to thanos
Setting CRS_HOME to false
Setting oracle.installer.invPtrLoc to /tmp/deinstall2021-05-02_07-34-15PM/oraInst.loc
Setting oracle.installer.local to false

ERROR: oracle/rat/tfa/util/ManageTfa
Exited from program.


############# ORACLE DEINSTALL TOOL END #############

In deinstall error log,
java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at oracle.install.db.deinstall.wrapper.Deinstall.callCleanImpl(Deinstall.java:1876)
        at oracle.install.db.deinstall.wrapper.Deinstall.main(Deinstall.java:907)
Caused by: java.lang.NoClassDefFoundError: oracle/rat/tfa/util/ManageTfa
        at oracle.install.db.deinstall.core.PrepForOUIDeinstall.cleanConfig(PrepForOUIDeinstall.java:187)
        ... 6 more
Caused by: java.lang.ClassNotFoundException: oracle.rat.tfa.util.ManageTfa
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        ... 7 more

It looks like something related to TFA is missing after RU 19.11 is applied.

Workaround:

Roll back RU 19.11, then re-run the deinstall utility.
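
A sketch of the rollback, assuming the home only needs DB RU 19.11 (patch 32545013) removed and the databases running out of the home have been shut down:

  $ORACLE_HOME/OPatch/opatch rollback -id 32545013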

Oracle 19c runInstaller failed with "undefined reference to 'jox_eujs_nowait_'"

When installing Oracle 19c and applying Database RU 19.11 and OJVM RU 19.11 during the installation, runInstaller failed with the message

   Error in invoking target 'irman ioracle idrdactl idrdalsnr idrdaproc' of makefile '/u01/app/oracle/product/19.11.0/dbhome_1/rdbms/lib/ins_rdbms.mk'. See /u01/app/oraInventory/logs/InstallActions2021-05-01_08-40-24PM/installActions2021-05-01_08-40-24PM.log for details.

The error can be reproduced on Linux x86_64 (RU 19.11 has not been released for other platforms yet) as follows:

  1. Unzip the 19c base installation media 19.3 into a newly created home

  2. Under the new home, replace the original OPatch 12.2.0.1.17 with OPatch 12.2.0.1.24

  3. Unzip 19c Database Release Update 19.11.0.0.210420 (Patch 32545013 ) to a separate directory  /stage/ru

  4. Unzip 19c Oracle JavaVM Component Release Update 19.11.0.0.210420 (Patch 32399816) to a separate  directory /stage/ojvm

  5. Start runInstaller from new home with -applyRU and -applyOneOffs options

     ./runInstaller -applyRU /stage/ru/32545013 -applyOneOffs /stage/ojvm/32399816

In the installation log file (file name shown by runInstaller),

INFO:
 - Linking Oracle
INFO:
rm -f /u01/app/oracle/product/19.11.0/dbhome_1/rdbms/lib/oracle
INFO:
   ... ...
INFO:
/u01/app/oracle/product/19.11.0/dbhome_1/lib//libserver19.a(joxwtp.o): In function `jox_eujs_nowait':
joxwtp.c:(.text+0xf7b): undefined reference to `jox_eujs_nowait_'

INFO:
make: *** [/u01/app/oracle/product/19.11.0/dbhome_1/rdbms/lib/oracle] Error 1
INFO: End output from spawned process.
INFO: ----------------------------------
INFO: Exception thrown from action: make
Exception Name: MakefileException
Exception String: Error in invoking target 'irman ioracle idrdactl idrdalsnr idrdaproc' of makefile '/u01/app/oracle/product/19.11.0/dbhome_1/rdbms/lib/ins_rdbms.mk'. See '/u01/app/oraInventory/logs/InstallActions2021-05-01_08-40-24PM/installActions2021-05-01_08-40-24PM.log' for details.
Exception Severity: 1
INFO: Error in invoking target 'irman ioracle idrdactl idrdalsnr idrdaproc' of makefile '/u01/app/oracle/product/19.11.0/dbhome_1/rdbms/lib/ins_rdbms.mk'. See '/u01/app/oraInventory/logs/InstallActions2021-05-01_08-40-24PM/installActions2021-05-01_08-40-24PM.log' for details.

It has been reported as a bug. Until the bug fix is released, the workaround is to run the installer without OJVM RU 19.11 and apply it separately after installation (see the sketch below).
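
A sketch of the workaround, using the same staging directories as above: run the installer with only the DB RU, then apply the OJVM RU with OPatch after the installation (including the root scripts) completes:

  ./runInstaller -applyRU /stage/ru/32545013
  cd /stage/ojvm/32399816
  $ORACLE_HOME/OPatch/opatch apply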

Saturday, May 1, 2021

OEM 13c Software Standardization Advisor Returns Collection Errors

In the EM 13c console, access the "Software Standardization Advisor" as follows:

 Targets -> Databases -> Administration -> Software Standardization Advisor
 
You may see a number beside "Collection Errors" for database and/or Grid Infrastructure homes. The issue can be seen in EM 13.3, 13.4 and 13.5.

Click the number beside "Collection Errors"; all Oracle home targets which have "Metric Collection Errors" are listed with Path, Host and Owner information.

Click the name of an Oracle home target to show its home page; at the bottom of the "Summary" section, click the link "Found 1 Metric Collection Error(s)" beside "Reason".

The metric "Files affected by a Patch" has the following "Message":

     java.lang.UnsupportedOperationException: Collection Result Maximum Flood Control Level Exceeded

The issue can be reproduced / verified by running the following command on the host where the Oracle home exists:

  <AGENT_HOME>/bin/emctl control agent runCollection <Target name of the Oracle home>:oracle_home oracle_home_config

$ emctl control agent runCollection OraHome12_host01:oracle_home oracle_home_config
Oracle Enterprise Manager Cloud Control 13c Release 5  
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
---------------------------------------------------------------
EMD runCollection completed successfully

The following lines are found in the log file "<AGENT_INST_HOME>/sysman/log/gcagent.log":
2021-04-29 20:25:18,725 [668:D9EFD971:HTTPListener--668 (DispatchRequests emdctl@28474@amber.corp.toronto.ca=>[161974231840001])] INFO - >>> Dispatching request: RunCollectionRequest <<<
2021-04-29 20:25:22,986 [1129:A9983BB8:GC.OnDemand.8 (oracle_home:OraHome12_host01:Real-time collection oracle_home_config) (oracle_home:OraHome12_host01:oracle_home_config:PatchFixedBug)] WARN - Result set exceeding min flood control level
2021-04-29 20:25:22,992 [1129:A9983BB8:GC.OnDemand.8 (oracle_home:OraHome12_host01:Real-time collection oracle_home_config) (oracle_home:OraHome12_host01:oracle_home_config:PatchFixedBug)] WARN - Result set exceeding min flood control level
2021-04-29 20:25:24,398 [1129:A9983BB8:GC.OnDemand.8 (oracle_home:OraHome12_host01:Real-time collection oracle_home_config) (oracle_home:OraHome12_host01:oracle_home_config:PatchFixedBug)] WARN - Result set exceeding min flood control level
2021-04-29 20:25:24,406 [1129:GC.OnDemand.8 (oracle_home:OraHome12_host01:Real-time collection oracle_home_config) (oracle_home:OraHome12_host01:oracle_home_config:PatchedFile)] ERROR - Result set exceeded max flood control level
2021-04-29 20:25:24,408 [1129:GC.OnDemand.8 (oracle_home:OraHome12_host01:Real-time collection oracle_home_config) (oracle_home:OraHome12_host01:oracle_home_config:PatchedFile)] ERROR - oracle_home:OraHome12_host01:oracle_home_config:PatchedFile
java.lang.UnsupportedOperationException: Collection Result Maximum Flood Control Level Exceeded
at oracle.sysman.emSDK.agent.datacollection.CollectionResult.performFloodControl(CollectionResult.java:459)
at oracle.sysman.emSDK.agent.datacollection.CollectionResult.addCollectionRow(CollectionResult.java:662)
at oracle.sysman.gcagent.addon.fetchlet.osfetchlet.BaseOSFetchlet.getOSMetric(BaseOSFetchlet.java:1157)
at oracle.sysman.gcagent.addon.fetchlet.osfetchlet.BaseOSFetchlet.getMetric(BaseOSFetchlet.java:476)
at oracle.sysman.gcagent.target.interaction.execution.FetchletFactory.getMetric(FetchletFactory.java:437)
at oracle.sysman.gcagent.target.interaction.execution.ExecuteTask.executeQueryDescriptor(ExecuteTask.java:1284)
at oracle.sysman.gcagent.target.interaction.execution.ExecuteTask.runTask(ExecuteTask.java:3167)

From the log, we can find two exceptions:

  1. When collecting the metric PatchFixedBug (Bugs fixed by Patch), the WARN message "Result set exceeding min flood control level" is logged
  2. When collecting the metric PatchedFile (Files affected by Patch), the ERROR message "Result set exceeded max flood control level" is logged

The ERROR can also be confirmed by collecting the specific metric "Files affected by Patch" with the command

  <AGENT_HOME>/bin/emctl getmetric agent <Oracle Home target name>,oracle_home,PatchedFile

$ emctl getmetric agent OraHome12_host01,oracle_home,PatchedFile
Oracle Enterprise Manager Cloud Control 13c Release 5
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
EMD getmetric error: Collection Result Maximum Flood Control Level Exceeded


Solution

1.  Agent side settings:

  * Set Min flood control value "CollectionResults.MaximumRowsFloodControlMin" to remove WARN message

    <AGENT_HOME>/bin/emctl setproperty agent -allow_new -name CollectionResults.MaximumRowsFloodControlMin -value 5000

  * Set Max flood control value "CollectionResults.MaximumRowsFloodControlMax" to fix ERROR issue
  
    <AGENT_HOME>/bin/emctl setproperty agent -allow_new -name CollectionResults.MaximumRowsFloodControlMax -value 50000

  * Verify the new value of properties with commands
  
    <AGENT_HOME>/bin/emctl getproperty agent -name CollectionResults.MaximumRowsFloodControlMin
    <AGENT_HOME>/bin/emctl getproperty agent -name CollectionResults.MaximumRowsFloodControlMax

    Or reviewing agent property file "<AGENT_INST_HOME>/sysman/config/emd.properties"

    grep 'CollectionResults' <AGENT_INST_HOME>/sysman/config/emd.properties

  * Manually start the metric collection to reset the error status, or wait until the next scheduled metric collection job runs (by default, every 24 hours)

    <AGENT_HOME>/bin/emctl control agent runCollection <Oracle home target name>:oracle_home oracle_home_config

  * In case you change your mind (or want to see the errors), the new property values can be cleared/removed with the commands

    <AGENT_HOME>/bin/emctl clear_property agent -name CollectionResults.MaximumRowsFloodControlMin
    <AGENT_HOME>/bin/emctl clear_property agent -name CollectionResults.MaximumRowsFloodControlMax

Sample output
$ emctl getproperty agent -name CollectionResults.MaximumRowsFloodControlMin
Oracle Enterprise Manager Cloud Control 13c Release 5
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
CollectionResults.MaximumRowsFloodControlMin is not a valid configuration property

$ emctl getproperty agent -name CollectionResults.MaximumRowsFloodControlMax
Oracle Enterprise Manager Cloud Control 13c Release 5
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
CollectionResults.MaximumRowsFloodControlMax is not a valid configuration property

$ emctl setproperty agent -allow_new -name CollectionResults.MaximumRowsFloodControlMin -value 5000
Oracle Enterprise Manager Cloud Control 13c Release 5
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
EMD setproperty succeeded

$ emctl setproperty agent -allow_new -name CollectionResults.MaximumRowsFloodControlMax -value 50000
Oracle Enterprise Manager Cloud Control 13c Release 5
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
EMD setproperty succeeded

$ emctl getproperty agent -name CollectionResults.MaximumRowsFloodControlMin
Oracle Enterprise Manager Cloud Control 13c Release 5
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
CollectionResults.MaximumRowsFloodControlMin=5000

$ emctl getproperty agent -name CollectionResults.MaximumRowsFloodControlMax
Oracle Enterprise Manager Cloud Control 13c Release 5
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
CollectionResults.MaximumRowsFloodControlMax=50000

$ grep 'CollectionResults' /u01/app/oracle/em13.5/agent/agent_inst/sysman/config/emd.properties
CollectionResults.MaximumRowsFloodControlMin=5000
CollectionResults.MaximumRowsFloodControlMax=50000

2. OMS side settings:

  * Setting the min value:
    
    <OMS_HOME>/bin/emctl set property -name CollectionResults.MaximumRowsFloodControlMin -value 5000

  * Setting the max value:
    
    <OMS_HOME>/bin/emctl set property -name CollectionResults.MaximumRowsFloodControlMax -value 50000

  * Verify the new value of properties
  
    <OMS_HOME>/bin/emctl get property -name CollectionResults.MaximumRowsFloodControlMin
    <OMS_HOME>/bin/emctl get property -name CollectionResults.MaximumRowsFloodControlMax
    
  * Delete new properties if they are not needed

    <OMS_HOME>/bin/emctl delete property -name CollectionResults.MaximumRowsFloodControlMin
    <OMS_HOME>/bin/emctl delete property -name CollectionResults.MaximumRowsFloodControlMax

Sample output
[oracle@oms]$ emctl get property -name CollectionResults.MaximumRowsFloodControlMin
Oracle Enterprise Manager Cloud Control 13c Release 5
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
SYSMAN password:
Value for property CollectionResults.MaximumRowsFloodControlMin for oms All Management Servers is null

[oracle@oms]$ emctl get property -name CollectionResults.MaximumRowsFloodControlMax
Oracle Enterprise Manager Cloud Control 13c Release 5
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
SYSMAN password:
Value for property CollectionResults.MaximumRowsFloodControlMax for oms All Management Servers is null

[oracle@oms]$ emctl set property -name CollectionResults.MaximumRowsFloodControlMin -value 5000
Oracle Enterprise Manager Cloud Control 13c Release 5
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
SYSMAN password:
Property CollectionResults.MaximumRowsFloodControlMin has been set to value 5000 for all Management Servers
OMS restart is not required to reflect the new property value

[oracle@oms]$ emctl set property -name CollectionResults.MaximumRowsFloodControlMax -value 50000
Oracle Enterprise Manager Cloud Control 13c Release 5
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
SYSMAN password:
Property CollectionResults.MaximumRowsFloodControlMax has been set to value 50000 for all Management Servers
OMS restart is not required to reflect the new property value

[oracle@oms]$ emctl get property -name CollectionResults.MaximumRowsFloodControlMin
Oracle Enterprise Manager Cloud Control 13c Release 5
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
SYSMAN password:
Value for property CollectionResults.MaximumRowsFloodControlMin at Global level is 5000

[oracle@oms]$ emctl get property -name CollectionResults.MaximumRowsFloodControlMax
Oracle Enterprise Manager Cloud Control 13c Release 5
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
SYSMAN password:
Value for property CollectionResults.MaximumRowsFloodControlMax at Global level is 50000

Sunday, April 25, 2021

OEM 13c EMGC_ADMINSERVER & EMGC_OMS1 target status show DOWN when emctl secure wls with custom certificate

When a custom certificate is configured for OEM 13c, the WebLogic Servers installed as part of Enterprise Manager Cloud Control (Administration Server and Managed Servers) can be secured with the custom certificate using the following command:

   $OMS_HOME/bin/emctl secure wls

However, the WebLogic Servers and their deployments could show as down in the OEM console after being secured with the custom certificate, even though they are still running well.

The reason is that the CA involved in issuing the custom certificate for the OMS is not "well known"; at least it is not accepted by Oracle as a default trusted CA. When the agent running on the OMS server communicates with the WebLogic Servers (WLS), WLS presents the custom certificate as its own identification, but the agent cannot find the trusted certificates of the CAs involved in issuing the custom certificate in its local keystore. Therefore, the agent cannot validate the WLS certificate, and it stops the communication with WLS.

The quick fix is to import the certificate of each CA involved in issuing the certificate into the agent's local keystore with the following command:

    $AGENT_HOME/bin/emctl secure add_trust_cert_to_jks -trust_certs_loc <ca_certificate_file> -alias <certificate_alias> [-password <keystore_pwd>]

Here, <certificate_alias> is used to identify the certificate saved in the keystore and must be unique for each certificate; <keystore_pwd> is the password of the keystore, whose default value is welcome.

For example, I have installed a CA in my lab network, and the CA issued a certificate to my OMS server. The two CA certificates (root certificate & intermediate certificate) have to be imported into the agent keystore as follows:

 $AGENT_HOME/bin/emctl stop agent

 $AGENT_HOME/bin/emctl secure add_trust_cert_to_jks -password welcome -alias dbaplus-root -trust_certs_loc /home/oracle/Root_CA_Certificate.txt

 $AGENT_HOME/bin/emctl secure add_trust_cert_to_jks -password welcome -alias dbaplus-intermediate -trust_certs_loc /home/oracle/Intermediate_CA_Certificate.txt

 $AGENT_HOME/bin/emctl start agent

List the certificates imported into the agent monitoring keystore:

 $AGENT_HOME/jdk/bin/keytool -list -alias <certificate_alias> -keystore   $AGENT_INSTANCE_HOME/sysman/config/montrust/AgentTrust.jks -storepass welcome -v

If needed, the certificates can be removed from the keystore as follows:

 $AGENT_HOME/jdk/bin/keytool -delete -alias <certificate_alias> -keystore   $AGENT_INSTANCE_HOME/sysman/config/montrust/AgentTrust.jks -storepass welcome -v

Saturday, April 24, 2021

OEM 13c Target "EM Jobs Service" shown as down in EM Console while all associated targets are up

"EM Jobs Service" target status is showing down in Enterprise Manager Cloud Control(EM) console even though all associated targets are up and running. It could be an issue with the metric collection definition. Usually it is seen at post blackout of associated targets.

The status of EM Jobs Service is an aggregated target status calculated based on the status of the associated targets. The associated targets and the calculation logic are defined by default when the system is installed, and you can change them later.

The issue can be fixed by changing and then restoring the Availability Definition of the service, as follows:

1. In EM Console navigate to the following menu

     Targets > Services > Click on "EM Jobs Service" target

2. In "EM Jobs Service" home page, click on the tab "Monitoring Configuration" and then click on the link "Availability Definition"

3. Take a screenshot of the "Availability Definition" configuration, change the definition to a different option, and click OK to save it.

    For instance, if the Availability Definition is "All key components are up" (the default definition), change it to "At least one key component is up" and save the change.

4. Now revert the "Availability Definition" of the service back to the original configuration by following the same procedure.

    For instance, change and save "Availability Definition" to "All key components are up"

The target status now shows Up, as all components are up.

Wednesday, April 14, 2021

OEM 12c/13c Agent Deployment fails with "Remote Validations: Shell Path Validation Failed"

When deploying an agent from OEM 12c/13c using the 'Add Host Targets' wizard, the deployment fails with

Remote Validations:  Shell Path validation failed

Cause:  Shell path is incorrect or not defined.:/bin/bash(SH_PATH),-c(SH_ARGS) on host <host name> 

Recommendation:  Check the property values in the following files in this order, ssPaths_<plat>.properties or sPaths.properties or Paths.properties, in "/u01/app/oracle/em13.4/middleware/oui/prov/resources" directory. If the property values are correct, then ensure the login user account is enabled for remote logins.For more details, refer to the Oracle Enterprise Manager Basic Installation Guide.

The most common reason is one of the following:

1. The shell (sh, bash & ksh) location is different from the OEM-defined location
The OEM-defined shell locations can be found in the file 'ssPaths_<platform>.properties' under the directory '$OMS_HOME/oui/prov/resources'. For example, if the error happens when deploying an agent to an AIX host, check the content of the file 'ssPaths_aix.properties', which looks like the following:
SH_PATH=/bin/bash
SH_ARGS=-c
SHELL_PATH=/bin/bash
SHELL_ARGS=-c
KSH_PATH=/usr/bin/ksh
RMDIR_ARGS=
#the date should be in the format of year:month:date:hour:minute:second
DATE_ARGS=-u +%y:%m:%d:%H:%M:%S
PING_PATH=/usr/sbin/ping
SSH_KEYGEN_PATH=/usr/bin/ssh-keygen
TAR_EXCLUDE_ARGS=X
TAR_INCLUDE_ARGS=-I
DF_COL_NAME=avail
SSH_HOST_KEY_LOC=/etc/ssh

On the host where the agent is going to be installed, check whether the shells/executables exist and are located in the same place as defined in the OEM file 'ssPaths_<platform>.properties'. In the previous example file, the shells/executables are

/bin/bash
/usr/bin/ksh
/usr/bin/ssh-keygen

If one does not exist, you have to install it. If it exists but is located in a different directory, edit the OEM file and replace the shell/executable path with the directory where the shell/executable actually is.
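
For example, on an AIX target host the paths from the example file above can be verified with a single command:

  ls -l /bin/bash /usr/bin/ksh /usr/bin/ssh-keygen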

2. An incorrect user name or password is configured in the Named Credential used to deploy the agent

If an incorrect user name or password is used, the error can also happen. If you do not have the password of the user defined in the Named Credential, the issue can be confirmed by checking the following log file on the OMS server:
  
$OMS_INSTANCE_BASE/em/EMGC_OMS1/sysman/agentpush/<timestamp>/applogs/<host_name>_deploy.log

For example, the failed deployment log is

  /u01/app/oracle/em13.4/gc_inst/em/EMGC_OMS1/sysman/agentpush/2021-04-13_12-58-49-PM/applogs/host01.lab.dbaplus.ca_deploy.log

And the following message is found in the log:
2021-04-13_12-59-55:INFO:===VALIDATION===:Checking SH_PATH on target nodes
2021-04-13_12-59-55:INFO:isWrongShPath:remotePathPropertiesLoc:/u01/app/oracle/em13.4/middleware/oui/prov/resources Platform id:212
2021-04-13_12-59-55:INFO:NODES=host01.lab.dbaplus.ca
2021-04-13_12-59-55:INFO:Running cmd /bin/bash -c /bin/true on node host01.lab.dbaplus.ca
2021-04-13_12-59-55:INFO:Action description Execution of command /bin/bash -c /bin/true  on host host01.lab.dbaplus.ca
2021-04-13_12-59-55:INFO:Attempt :1 pty required false  with no inputs
2021-04-13_12-59-56:INFO:/bin/bash -c /bin/true execution failed on host host01.lab.dbaplus.ca
2021-04-13_12-59-56:INFO: OUT null
2021-04-13_12-59-56:INFO: ERR WARNING: Your password has expired.
Password change required but no TTY available.

We can see that the password has expired; ask the system administrator to reset the password, and also update the password in the Named Credential.

The easiest way to rule out a user name or password issue is to ask the system administrator to test the login manually outside of OEM.
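
For example, the same check that the wizard runs can be reproduced manually from the OMS server; a sketch assuming the Named Credential user is oracle and using the host from the log above (an expired password would be reported immediately):

  ssh oracle@host01.lab.dbaplus.ca '/bin/bash -c /bin/true'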

Sunday, April 11, 2021

OEM 13c Discovering WebLogic Domain failed to save Node Manager target with error 'This target requires a local Management Agent'

When discovering or refreshing a WebLogic Domain or Fusion Middleware Farm in Enterprise Manager (EM) 13.4 Cloud Control, the Node Manager target is not saved. The error is shown in EM:

Failed to save NM_xxx_x(Oracle WebLogic Node Manager) on host <IP/host name>. This target requires a local Management Agent, but a local Management Agent was not found.  In order to add this target, you need to install a Management Agent on the same host as the target and then perform a "Refresh WebLogic Domain" operation.

The agent has in fact been installed on the host. The error happens because of a difference between the Listen Address in the Node Manager configuration and the host name in the EM Agent URL. As a solution, the Node Manager Listen Address should be changed to the host name of the EM Agent URL.

Oracle explains it as an incorrect configuration of the Oracle WebLogic Node Manager; therefore, it could happen on all releases of EM 13c. However, I can only reproduce the problem in EM 13.1 and 13.4 when the Listen Address of the WebLogic Node Manager is configured with an IP address instead of the host name used by the EM Agent URL, and there is no problem with EM 13.2. Anyway, having both configurations use the same host name is not a bad idea.

Find out the host name in the EM agent URL with the command <AGENT_HOME>/bin/emctl status agent
$ /u01/app/oracle/em13.4/agent/agent_13.4.0.0.0/bin/emctl status agent
Oracle Enterprise Manager Cloud Control 13c Release 4
Copyright (c) 1996, 2020 Oracle Corporation.  All rights reserved.
---------------------------------------------------------------
Agent Version          : 13.4.0.0.0
OMS Version            : 13.4.0.0.0
Protocol Version       : 12.1.0.1.0
Agent Home             : /u01/app/oracle/em13.4/agent/agent_inst
Agent Log Directory    : /u01/app/oracle/em13.4/agent/agent_inst/sysman/log
Agent Binaries         : /u01/app/oracle/em13.4/agent/agent_13.4.0.0.0
Core JAR Location      : /u01/app/oracle/em13.4/agent/agent_13.4.0.0.0/jlib
Agent Process ID       : 76282
Parent Process ID      : 76240
Agent URL              : https://host01.lab.dbaplus.ca:3872/emd/main/
Local Agent URL in NAT : https://host01.lab.dbaplus.ca:3872/emd/main/
Repository URL         : https://oms.lab.dbaplus.ca:4903/empbs/upload
Started at             : 2021-04-07 17:53:56
Started by user        : oracle
Operating System       : Linux version 4.1.12-124.46.4.1.el7uek.x86_64 (amd64)
...
---------------------------------------------------------------
Agent is Running and Ready

Change the Listen Address of the Node Manager to the host name of the EM Agent URL in the WebLogic Admin Console as follows:
1. Go to the Node Manager configuration page

   Environment > Machines > [Machine Name] > Configuration > Node Manager

2. Click 'Lock & Edit' to enable edit mode

3. Set the value of the "Listen Address" property to the host name given by the earlier command 'emctl status agent'

4. Click 'Save', then click 'Activate Changes'

Refresh or rediscover the domain; the Node Manager will be discovered successfully.

Wednesday, March 31, 2021

OEM 13.4 AgentPull.sh failed with '0403-011 The specified substitution is not valid for this command'

When using Oracle Enterprise Manager 13.4 AgentPull.sh to deploy Agent 13.4 on an AIX server, the following error occurred:
./AgentPull.sh[270]: platform=${platform/\)/%29}: 0403-011 The specified substitution is not valid for this command.

Line 270 in AgentPull.sh is
platform=${platform/\)/%29}

It is a bash parameter expansion which tries to replace a right parenthesis with the code '%29'; it works in bash but not in the Bourne shell (sh). The problem is that the script's shebang line directs the OS to run it under the Bourne shell:
$ head AgentPull.sh
#!/bin/sh
#
# $Header: emcore/install/unix/AgentPull.sh.template /main/30 2019/12/19 21:29:10 vbhaagav Exp $
#
# AgentPull.sh
#
# Copyright (c) 2011, 2019, Oracle and/or its affiliates. All rights reserved.
#
#    NAME
#      AgentPull.sh - <one-line expansion of the name>

The script works on most Linux servers because /bin/sh is a symbolic link to /bin/bash on those servers. Unfortunately, AIX is honest :) sh is sh and bash is bash, and they are different.

As a workaround, replace the first line of AgentPull.sh, '#!/bin/sh', with '#!/bin/bash'.
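
Alternatively, if changing the shebang is not desirable, line 270 itself can be rewritten with a POSIX-compliant substitution; a sketch equivalent to the bash parameter expansion (it replaces the first match only):

platform=$(echo "$platform" | sed 's/)/%29/')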

Tuesday, March 30, 2021

OEM 13.4 New Imported RuleSet False Evaluation of Lifecycle Status

If a Rule Set is defined for targets based on Lifecycle Status, and the Rule Set is exported and then imported into Enterprise Manager 13.4, the Rule Set will always be applied regardless of Lifecycle Status.

For example, two Rule Sets are created in the source Enterprise Manager:

   Rule Set A for targets whose Lifecycle Status is Production
   Rule Set B for targets whose Lifecycle Status is Development

Both A and B are exported from the source EM and imported into EM 13.4. When any target is brought down (no matter what its Lifecycle Status is), both A and B will send out notifications.

It can be fixed by manually editing the Rule Set. You do not really need to change anything in the Rule Set; just select the Rule Set, click Edit, then click Save.

OEM 13.2 Exported Ruleset Failed with 'Error in parsing properties for RuleSet from the XML' when Importing into EM 13.x

When importing a ruleset into Enterprise Manager 13.x (confirmed in 13.2 and 13.4) from an XML file exported from Enterprise Manager 13.2, the following error occurred:

  Error in parsing properties for RuleSet from the XML

If the destination EM version is 13.2, the following message will be found in the trace file emoms.trc under the directory <OMS_INSTANCE_BASE>/em/EMGC_OMS1/sysman/log:
2021-03-29 22:49:12,394 [[ACTIVE] ExecuteThread: '7' for queue: 'weblogic.kernel.Default (self-tuning)'] ERROR exportimport.ImportHelper logp.251 - Error in parsing properties for RuleSet from the XML
oracle.sysman.emSDK.app.exception.EMSystemException
     at oracle.sysman.core.event.rules.uimodel.rulesfwk.RulesModelUtil.throwEMSystemException(RulesModelUtil.java:1486)
     at oracle.sysman.core.event.rules.uimodel.rulesfwk.RulesFwkEntityImpl.postChanges(RulesFwkEntityImpl.java:116)
     ...
    ... 100 more
Caused by: oracle.jbo.JboException: JBO-29000: Unexpected exception caught: oracle.jbo.DMLConstraintException, msg=JBO-26048: Constraint "EM_RULE_EXPRESSIONS_UK1" is violated during post operation "Insert" using SQL statement "INSERT INTO EM_RULE_EXPRESSIONS(SELECTION_OBJECT_ID,EXPRESSION_ID,EXPRESSION_GROUP_ID,ATTR_ID,OPERATOR_CODE,ATTR_VALUE,SEL_OBJ_TYPE,EXPRESSION_TYPE,UI_GROUP_ID,EXCLUSION_GROUP_ID) VALUES (:1,:2,:3,:4,:5,:6,:7,:8,:9,:10)".
    at oracle.sysman.core.event.rules.uimodel.rulesfwk.RulesFwkEntityImpl.doDML(RulesFwkEntityImpl.java:201)
    at oracle.jbo.server.EntityImpl.postChanges(EntityImpl.java:7271)
    at oracle.sysman.core.event.rules.uimodel.rulesfwk.RulesFwkEntityImpl.postChanges(RulesFwkEntityImpl.java:97)
    ... 104 more
Caused by: oracle.jbo.DMLConstraintException: JBO-26048: Constraint "EM_RULE_EXPRESSIONS_UK1" is violated during post operation "Insert" using SQL statement "INSERT INTO EM_RULE_EXPRESSIONS(SELECTION_OBJECT_ID,EXPRESSION_ID,EXPRESSION_GROUP_ID,ATTR_ID,OPERATOR_CODE,ATTR_VALUE,SEL_OBJ_TYPE,EXPRESSION_TYPE,UI_GROUP_ID,EXCLUSION_GROUP_ID) VALUES (:1,:2,:3,:4,:5,:6,:7,:8,:9,:10)".
    at oracle.jbo.server.OracleSQLBuilderImpl.doEntityDML(OracleSQLBuilderImpl.java:565)
    at oracle.jbo.server.EntityImpl.doDML(EntityImpl.java:9098)
    at oracle.sysman.core.event.rules.uimodel.rulesfwk.RulesFwkEntityImpl.doDML(RulesFwkEntityImpl.java:196)
    ... 106 more
Caused by: java.sql.SQLIntegrityConstraintViolationException: ORA-00001: unique constraint (SYSMAN.EM_RULE_EXPRESSIONS_UK1) violated

If the destination EM version is 13.4 (or 13.3), the following message will be found in the trace file emoms.trc:
2021-03-29 22:04:29,284 [[ACTIVE] ExecuteThread: '70' for queue: 'weblogic.kernel.Default (self-tuning)'] ERROR exportimport.ImportHelper logp.251 - Error in parsing properties for RuleSet from the XML
oracle.sysman.emSDK.app.exception.EMSystemException
    at oracle.sysman.core.event.rules.uimodel.rulesfwk.RulesModelUtil.throwEMSystemException(RulesModelUtil.java:1509)
    at oracle.sysman.core.event.rules.uimodel.rulesfwk.RulesFwkEntityImpl.postChanges(RulesFwkEntityImpl.java:106)
    ...
    at weblogic.work.ExecuteThread.execute(ExecuteThread.java:420)
    at weblogic.work.ExecuteThread.run(ExecuteThread.java:360)
Caused by: oracle.jbo.RowAlreadyDeletedException: JBO-25019: Entity row with key oracle.jbo.Key[BEB7FD94E2112377E053DA570F89CC8F ] is not found in EmRuleSetsEO.
    at oracle.jbo.server.OracleSQLBuilderImpl.doEntitySelectForAltKey(OracleSQLBuilderImpl.java:811)
    at oracle.jbo.server.BaseSQLBuilderImpl.doEntitySelect(BaseSQLBuilderImpl.java:554)
    at oracle.jbo.server.EntityImpl.doSelect(EntityImpl.java:9133)
    at oracle.jbo.server.EntityImpl.lock(EntityImpl.java:6612)
    at oracle.jbo.server.EntityImpl.beforePost(EntityImpl.java:7150)
    at oracle.jbo.server.EntityImpl.postChanges(EntityImpl.java:7384)
    at oracle.sysman.core.event.rules.uimodel.rulesfwk.RulesFwkEntityImpl.postChanges(RulesFwkEntityImpl.java:97)
    ... 95 more

It is a bug in EM 13.2; it can be fixed by applying OMS one-off patch 25986453 on the source EM 13.2. The patch is also needed on the destination EM if its version is 13.2.

Monday, March 29, 2021

OEM 13c: Oracle Coherence Cache Targets Showing Down after WebLogic Server PSU applied

After a WebLogic Server (WLS) Patch Set Update (PSU) is applied to the WLS home installed as part of Oracle Enterprise Manager (EM) 13c, two of the Oracle Coherence Cache objects in the WebLogic Domain '/EMGC_GCDomain/GCDomain' are shown as Down in the EM console.

The issue was seen, at least, after applying WLS PSU 12.2.1.3.201217 on an EM 13.4.0.9 home, and WLS PSU 12.2.1.4.210330 on an EM 13.5.0.0.0 home.

It can be fixed by removing and re-adding the targets as follows:

1.  Remove the Oracle Coherence Cache Targets showing Down from OEM

    * In the OEM Console,  navigate to 'Targets' > 'Middleware'
    * On the left panel expand 'Target Type', then expand 'Coherence' and select 'Oracle Coherence Cache'. All Oracle Coherence Cache targets will be listed in the right window.
    * Right click on the name of each target showing Down, then click the pop-up menu 'Target Setup' > 'Remove Target'
   
    The Down targets are deleted from OEM console.
   
2.  Refresh the domain '/EMGC_GCDomain/GCDomain' to add removed/missing Oracle Coherence Cache objects

    * In the OEM Console,  navigate to 'Targets' > 'Middleware'
    * On the left panel, clear the selection you made while removing the targets. The Oracle WebLogic Domain '/EMGC_GCDomain/GCDomain' will be listed in the right window
    * Click the domain name '/EMGC_GCDomain/GCDomain'
    * Select 'Refresh Weblogic Domain' from the drop-down menu on the home page of the target '/EMGC_GCDomain/GCDomain'
    * A new screen will appear; click on "Add and Update Targets..."
    * Follow the prompts to add the newly found targets
 
The removed targets will be added with new target names.

Wednesday, March 10, 2021

DBCA does not list ASM diskgroup for storage option

Oracle 19c DBCA does not list ASM disk groups in the "Select Database Storage Option" window.
Messages in the dbca trace file $ORACLE_BASE/cfgtoollogs/dbca/trace.log_<TIMESTAMP>:
[DBStorageOption.flowWorker] [ 2021-03-08 13:24:45.969 EST ] [KfodUtil.kfodOutput:375]  /u01/app/19.0.0/grid/bin/kfod
[DBStorageOption.flowWorker] [ 2021-03-08 13:24:45.969 EST ] [KfodUtil.kfodOutput:375]  nohdr=TRUE
[DBStorageOption.flowWorker] [ 2021-03-08 13:24:45.969 EST ] [KfodUtil.kfodOutput:375]  OP=GROUPS
[DBStorageOption.flowWorker] [ 2021-03-08 13:24:45.969 EST ] [KfodUtil.kfodOutput:375]  status=true
[DBStorageOption.flowWorker] [ 2021-03-08 13:24:45.969 EST ] [KfodUtil.kfodOutput:375]  asmcompatibility=true dbcompatibility=true
[DBStorageOption.flowWorker] [ 2021-03-08 13:24:46.045 EST ] [KfodUtil.kfodOutput:386]  Kfod result
Error 49802 initializing ADR
ERROR!!! could not initialize the diag context

[DBStorageOption.flowWorker] [ 2021-03-08 13:24:46.046 EST ] [ASMUtils.loadDiskGroups:1221]  Loading  the diskgroups. exception using kfodError retrieving diskgroup using kfod utility, null
INFO: Mar 08, 2021 1:24:46 PM oracle.assistants.common.lib.FileSystemInfo getSharedStoragePaths
INFO: Getting default shared storage path.

DBCA failed on the command "/u01/app/19.0.0/grid/bin/kfod"; run the command manually:
[oracle@host01]$ /u01/app/19.0.0/grid/bin/kfod nohdr=TRUE OP=GROUPS status=true asmcompatibility=true dbcompatibility=true
Error 49802 initializing ADR
ERROR!!! could not initialize the diag context

Debug the command with truss or strace as follows:
  On Linux:  strace -o /tmp/kfod.out /u01/app/19.9.0/grid/bin/kfod nohdr=TRUE OP=GROUPS status=true asmcompatibility=true dbcompatibility=true
  On Solaris/AIX: truss -o /tmp/kfod.out /u01/app/19.0.0/grid/bin/kfod nohdr=TRUE OP=GROUPS status=true asmcompatibility=true dbcompatibility=true
Example on Solaris
[oracle@host01]$ truss -o /tmp/kfod.out /u01/app/19.0.0/grid/bin/kfod nohdr=TRUE OP=GROUPS status=true asmcompatibility=true dbcompatibility=true
Error 49802 initializing ADR
ERROR!!! could not initialize the diag context
[oracle@host01]$
[oracle@host01]$ tail /tmp/kfod.out
stat("/u01/app/19.0.0/grid/log/diag/kfod/host01/kfod/log", 0xFFFFFFFF7FFFBF80) Err#2 ENOENT
stat("/u01/app/19.0.0/grid/log/diag/kfod/host01/kfod", 0xFFFFFFFF7FFFBF80) Err#2 ENOENT
stat("/u01/app/19.0.0/grid/log/diag/kfod/host01", 0xFFFFFFFF7FFFBF80) Err#2 ENOENT
stat("/u01/app/19.0.0/grid/log/diag/kfod", 0xFFFFFFFF7FFFBF80) Err#2 ENOENT
stat("/u01/app/19.0.0/grid/log/diag", 0xFFFFFFFF7FFFBF80) = 0
getuid()     = 104 [104]
getgid()     = 112 [112]
getuid()     = 104 [104]
getuid()     = 104 [104]
mmap(0x00010000, 65536, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFFFFFFFF73900000
getuid()     = 104 [104]
getuid()     = 104 [104]
open("/system/volatile/name_service_door", O_RDONLY) = 6
fcntl(6, F_SETFD, 0x00000001)   = 0
door_info(6, 0xFFFFFFFF75D8B5B0)  = 0
door_call(6, 0xFFFFFFFF7FFFBBB8)  = 0
ioctl(1, TCGETA, 0xFFFFFFFF7FFF8F7C)  = 0
fstat(1, 0xFFFFFFFF7FFF8F10)   = 0
write(1, " E r r o r   4 9 8 0 2  ".., 29) = 29
write(1, " E R R O R ! ! !   c o u".., 47) = 47
close(5)     = 0
_exit(1)

From the trace file /tmp/kfod.out, kfod failed while accessing the following directories:
    /u01/app/19.0.0/grid/log/diag/kfod/host01/kfod/log
    /u01/app/19.0.0/grid/log/diag/kfod/host01/kfod
    /u01/app/19.0.0/grid/log/diag/kfod/host01
    /u01/app/19.0.0/grid/log/diag/kfod
    /u01/app/19.0.0/grid/log/diag

Check the permission of the directory
[oracle@host01]$ ls -ld /u01/app/19.0.0/grid/log/diag
drwxr-x---   4 grid     oinstall       5 Dec 12 23:17 /u01/app/19.0.0/grid/log/diag
[oracle@host01]$ id
uid=104(oracle) gid=112(oinstall)

The user oracle, who is running kfod, is a member of group oinstall, but group oinstall does not have write permission on the directory. Grant write permission as root or grid (the owner of the GI home):
[grid@host01]$ chmod g+w /u01/app/19.0.0/grid/log/diag
[grid@host01]$ ls -ld /u01/app/19.0.0/grid/log/diag
drwxrwx---   4 grid     oinstall       5 Dec 13 02:45 /u01/app/19.0.0/grid/log/diag

Re-try kfod as oracle
[oracle@host01]$ /u01/app/19.0.0/grid/bin/kfod nohdr=TRUE OP=GROUPS status=true asmcompatibility=true dbcompatibility=true
    204528     188110 EXTERN REDO 10.1.0.0.0 12.1.0.0.0
    613584     388124 EXTERN FRA  10.1.0.0.0 12.1.0.0.0
   4091624     470712 EXTERN DATA 10.1.0.0.0 12.1.0.0.0

Now, dbca can list ASM disk groups.
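
For similar cases, a quick pre-check of the diag directory under the GI home can catch this class of problem before dbca is started. The following is a minimal sketch; the GI home path is the one used in this article and should be adjusted.
#!/bin/sh
# Sketch: verify the current OS user can write under <GI home>/log/diag,
# which kfod needs in order to create its ADR subdirectories
GI_HOME=/u01/app/19.0.0/grid          # adjust to the actual GI home
DIAG_DIR=$GI_HOME/log/diag
if [ ! -d "$DIAG_DIR" ]; then
    echo "$DIAG_DIR does not exist"
elif [ -w "$DIAG_DIR" ]; then
    echo "$(id -un) can write to $DIAG_DIR"
else
    echo "$(id -un) cannot write to $DIAG_DIR - kfod/dbca may fail with ADR error 49802"
fi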

Monday, March 1, 2021

19c runcluvfy.sh fails with PRVF-7596 PRVG-2002

When upgrading Oracle Grid Infrastructure 12.2 to 19c, runcluvfy.sh failed with the following messages
Verifying OCR Integrity ...FAILED
host02: PRVF-7596 : CSS is probably working with a non-clustered, local-only
          configuration on node "host02"
Verifying resolv.conf Integrity ...FAILED
host02: PRVG-2002 : Encountered error in copying file "/etc/resolv.conf" from
          node "host02" to node "host01"
          protocol error: filename does not match request
Verifying DNS/NIS name service ...FAILED
host02: PRVG-2002 : Encountered error in copying file "/etc/netsvc.conf" from
          node "host02" to node "host01"
          protocol error: filename does not match request

Set log location and re-run runcluvfy.sh
# runcluvfy.sh will write log files to directory /u01/tmp
export CV_TRACELOC=/u01/tmp
# re-run runcluvfy.sh
./runcluvfy.sh stage -pre crsinst -upgrade -rolling -src_crshome /u01/app/12.2.0/grid_1 -dest_crshome /u01/app/19.0.0/grid_1 -dest_version 19.0.0.0.0 -fixup -verbose
# Log files created
cd /u01/tmp
ls -l
total 15200
-rw-r--r--    1 grid     oinstall     952467 Feb 24 16:52 cvuhelper.log.0
-rw-r--r--    1 grid     oinstall          0 Feb 24 16:48 cvuhelper.log.0.lck
-rw-r--r--    1 grid     oinstall    6824223 Feb 24 16:52 cvutrace.log.0

The log file cvutrace.log.0 shows the following messages
[Worker 0] [ 2021-02-24 16:47:49.784 EST ] [UnixSystem.remoteCopyFile:848]  UnixSystem: /usr/bin/scp -p host02:'/tmp/CVU_19.0.0.0.0_grid/scratch/getFileInfo3605304.out' /tmp/host02.getFileInfo3605304.out
[Worker 0] [ 2021-02-24 16:47:49.798 EST ] [RuntimeExec.runCommand:294]  runCommand: Waiting for the process
[Thread-83] [ 2021-02-24 16:47:49.798 EST ] [StreamReader.run:62]  In StreamReader.run
[Thread-82] [ 2021-02-24 16:47:49.798 EST ] [StreamReader.run:62]  In StreamReader.run
[Thread-83] [ 2021-02-24 16:47:50.201 EST ] [StreamReader.run:66]  ERROR>protocol error: filename does not match request
...
[main] [ 2021-02-24 16:51:46.747 EST ] [UnixSystem.remoteCopyFile:848]  UnixSystem: /usr/bin/scp -p host02:'/etc/resolv.conf' /tmp/CVU_19.0.0.0.0_grid/scratch/resolv.conf_host02
[main] [ 2021-02-24 16:51:46.760 EST ] [RuntimeExec.runCommand:294]  runCommand: Waiting for the process
[Thread-1060] [ 2021-02-24 16:51:46.760 EST ] [StreamReader.run:62]  In StreamReader.run
[Thread-1059] [ 2021-02-24 16:51:46.760 EST ] [StreamReader.run:62]  In StreamReader.run
[main] [ 2021-02-24 16:51:47.148 EST ] [RuntimeExec.runCommand:296]  runCommand: process returns 1
[Thread-1060] [ 2021-02-24 16:51:47.149 EST ] [StreamReader.run:66]  ERROR>protocol error: filename does not match request
...
[main] [ 2021-02-24 16:51:47.548 EST ] [UnixSystem.remoteCopyFile:848]  UnixSystem: /usr/bin/scp -p host02:'/etc/netsvc.conf' /tmp/CVU_19.0.0.0.0_grid/scratch/nssw_conf_host02
[Thread-1066] [ 2021-02-24 16:51:47.559 EST ] [StreamReader.run:62]  In StreamReader.run
[main] [ 2021-02-24 16:51:47.559 EST ] [RuntimeExec.runCommand:294]  runCommand: Waiting for the process
[Thread-1065] [ 2021-02-24 16:51:47.560 EST ] [StreamReader.run:62]  In StreamReader.run
[main] [ 2021-02-24 16:51:47.965 EST ] [RuntimeExec.runCommand:296]  runCommand: process returns 1
[Thread-1066] [ 2021-02-24 16:51:47.973 EST ] [StreamReader.run:66]  ERROR>protocol error: filename does not match request

All three errors occurred when scp was executed to copy a file from the remote node host02 to the local node.
The interesting thing is that all three commands enclose the remote file name in single quotation marks. I guess the Oracle developers left them there by accident, because there is no reason to use them. Let's verify whether they are the culprit by running scp manually
[grid@host01]$ scp -p host02:"'/etc/resolv.conf'" /tmp/CVU_19.0.0.0.0_grid/scratch/resolv.conf_host02
protocol error: filename does not match request
[grid@host01]$
[grid@host01]$ scp -T -p host02:"'/etc/resolv.conf'" /tmp/CVU_19.0.0.0.0_grid/scratch/resolv.conf_host02
resolv.conf                                             96    51.3KB/s   00:00
[grid@host01]$

scp fails with the same error "protocol error: filename does not match request", but succeeds when the extra -T option is used.

The -T option was introduced in OpenSSH 8.0, released in April 2019. In earlier versions of OpenSSH, when copying files from a remote system to a local directory, scp did not verify that the file names the server sent matched those requested by the client, which could allow a hostile server to create or clobber unexpected local files with attacker-controlled content. OpenSSH 8.0 fixed this security issue: scp now verifies the file names on the client side by default, and the -T option was added to disable the verification when needed.

Although the fix was officially introduced in OpenSSH 8.0, runcluvfy.sh shipped with Grid Infrastructure 19.3 also fails for the same reason on AIX with OpenSSH 7.5p1, which is the environment where the errors shown in this article occurred; the client-side check has apparently been backported to that build.
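
The OpenSSH version in use can be checked on each node; whether a particular build performs the filename check still has to be confirmed by testing scp itself, as shown above.
# print the OpenSSH client version on the local node (output goes to stderr)
ssh -V
# and on the remote node used in this article
ssh host02 ssh -V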

This version of runcluvfy.sh sends the remote file name wrapped in single quotation marks, but the remote server returns the file name without the quotation marks. They refer to the same file, but the strings are different. Older scp versions worked because they did not compare the names; the patched scp compares them and fails with "filename does not match request".

It can be resolved by downloading the latest standalone Cluster Verification Utility from My Oracle Support (Patch 30839369); note that its executable is cluvfy instead of runcluvfy.sh. If downloading is not an option, as a temporary workaround we can rename scp and create a wrapper in its place
# Rename the original scp
mv /usr/bin/scp /usr/bin/scp.bak
# Create a wrapper script that calls the original scp with the -T option
printf '#!/bin/sh\nexec /usr/bin/scp.bak -T "$@"\n' > /usr/bin/scp
# Make the wrapper executable
chmod a+rx /usr/bin/scp
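
With the wrapper in place, the copy that failed earlier can be re-tested manually before re-running runcluvfy.sh; it now goes through the wrapper, which adds -T, and should succeed.
# re-run the copy that failed earlier; with the wrapper it is executed as "scp.bak -T -p ..."
scp -p host02:"'/etc/resolv.conf'" /tmp/CVU_19.0.0.0.0_grid/scratch/resolv.conf_host02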

Now the errors are gone. After successfully installing GI, remember to restore the original scp
# Delete interim scp
rm /usr/bin/scp
# Restore the original scp.
mv /usr/bin/scp.bak /usr/bin/scp

Sunday, February 28, 2021

19c gridSetup.sh failed with [INS-06006] Passwordless SSH connectivity not set up between the following nodes

When running the 19.3 gridSetup.sh to configure Grid Infrastructure, the following error occurs while configuring passwordless SSH connectivity between the cluster nodes
[INS-06006] Passwordless SSH connectivity not set up between the following node(s): [host02].

Cause - Either passwordless SSH connectivity is not setup between specified node(s) or they are not reachable. Refer to the logs for more details.

Action - Refer to the logs for more details or contact Oracle Support Services.

More Details
PRVF-5311 : File "/tmp/GridSetupActions2021-02-24_05-29-41PM/host02.getFileInfo1638946.out" either does not exist or is not accessible on node "host02". 

Manually test passwordless SSH connectivity between nodes
[grid@host01]$ ssh host02       <=  Connect to second node host02 from first node host01
[grid@host02]$                  <=  Connected to host02 successfully without password

[grid@host02]$ ssh host01       <=  Connect to first node host01 from second node host02
[grid@host01]$                  <=  Connected to host01 successfully without password

Apparently, the passwordless SSH connectivity has been configured properly. Re-run gridSetup.sh in debug mode to figure out what happened,
[grid@host01]$ /u01/app/19.0.0/grid_1/gridSetup.sh -debug

Launching Oracle Grid Infrastructure Setup Wizard...

[main] [ 2021-02-24 17:29:53.105 EST ] [Version.isPre:757]  version to be checked 19.0.0.0.0 major version to check against 10
[main] [ 2021-02-24 17:29:53.106 EST ] [Version.isPre:768]  isPre.java: Returning FALSE

  <<Message truncated>>

[Worker 2] [ 2021-02-24 17:31:26.521 EST ] [Utils.getLocalHost:487]  Hostname retrieved: host01, returned: host01
[Worker 2] [ 2021-02-24 17:31:26.521 EST ] [Utils.getLocalHost:487]  Hostname retrieved: host01, returned: host01
[Worker 2] [ 2021-02-24 17:31:26.521 EST ] [UnixSystem.remoteCopyFile:848]  UnixSystem: /usr/bin/scp -p host02:'/tmp/GridSetupActions2021-02-24_05-29-41PM/CVU_19.0.0.0.0_grid/scratch/getFileInfo1638946.out' /tmp/GridSetupActions2021-02-24_05-29-41PM/host02.getFileInfo1638946.out
[Worker 2] [ 2021-02-24 17:31:26.534 EST ] [RuntimeExec.runCommand:294]  runCommand: Waiting for the process
[Thread-442] [ 2021-02-24 17:31:26.534 EST ] [StreamReader.run:62]  In StreamReader.run 
[Thread-443] [ 2021-02-24 17:31:26.534 EST ] [StreamReader.run:62]  In StreamReader.run 
[Thread-443] [ 2021-02-24 17:31:26.855 EST ] [StreamReader.run:66]  ERROR>protocol error: filename does not match request
[Worker 2] [ 2021-02-24 17:31:26.855 EST ] [RuntimeExec.runCommand:296]  runCommand: process returns 1
[Worker 2] [ 2021-02-24 17:31:26.856 EST ] [RuntimeExec.runCommand:323]  RunTimeExec: error>
[Worker 2] [ 2021-02-24 17:31:26.856 EST ] [RuntimeExec.runCommand:326]  protocol error: filename does not match request
[Worker 2] [ 2021-02-24 17:31:26.856 EST ] [RuntimeExec.traceCmdEnv:516]  Calling Runtime.exec() with the command 
[Worker 2] [ 2021-02-24 17:31:26.856 EST ] [RuntimeExec.traceCmdEnv:518]  /usr/bin/scp 
[Worker 2] [ 2021-02-24 17:31:26.856 EST ] [RuntimeExec.traceCmdEnv:518]  -p 
[Worker 2] [ 2021-02-24 17:31:26.856 EST ] [RuntimeExec.traceCmdEnv:518]  host02:'/tmp/GridSetupActions2021-02-24_05-29-41PM/CVU_19.0.0.0.0_grid/scratch/getFileInfo1638946.out' 
[Worker 2] [ 2021-02-24 17:31:26.856 EST ] [RuntimeExec.traceCmdEnv:518]  /tmp/GridSetupActions2021-02-24_05-29-41PM/host02.getFileInfo1638946.out 
[Worker 2] [ 2021-02-24 17:31:26.856 EST ] [RuntimeExec.runCommand:349]  Returning from RunTimeExec.runCommand
[Worker 2] [ 2021-02-24 17:31:26.856 EST ] [NativeSystem.rununixcmd:1345]  NativeSystem.rununixcmd: RetString 0|protocol error: filename does not match request :failed
[Worker 2] [ 2021-02-24 17:31:26.859 EST ] [ClusterConfig$ExecuteCommand.returnCommandToClient:3324]  returnCommandToClient; fillCount=1 is full=false
[Worker 2] [ 2021-02-24 17:31:26.859 EST ] [Semaphore.release:88]  SyncBufferFull:Release called by thread Worker 2 m_count=2

When validating/configuring passwordless SSH connectivity, gridSetup.sh has to copy files between the nodes using scp. From the trace we can see that it runs scp with the -p option, and the destination file on the local server (host01) is passed to scp with the file name:

  /tmp/GridSetupActions2021-02-24_05-29-41PM/host02.getFileInfo1638946.out
  
And the source file on the remote server (host02) is passed to scp with the file name:

  host02:'/tmp/GridSetupActions2021-02-24_05-29-41PM/CVU_19.0.0.0.0_grid/scratch/getFileInfo1638946.out'

The interesting thing is that the remote file path is enclosed in single quotation marks. I guess the Oracle developers left them there by accident, because there is no reason to use them. Let's verify whether they are the culprit by running scp manually
[grid@host01]$ scp -p host02:"'/tmp/GridSetupActions2021-02-24_05-29-41PM/CVU_19.0.0.0.0_grid/scratch/getFileInfo1638946.out'" /tmp/GridSetupActions2021-02-24_05-29-41PM/host02.getFileInfo1638946.out
protocol error: filename does not match request
[grid@host01]$
[grid@host01]$ scp -T -p host02:"'/tmp/GridSetupActions2021-02-24_05-29-41PM/CVU_19.0.0.0.0_grid/scratch/getFileInfo1638946.out'" /tmp/GridSetupActions2021-02-24_05-29-41PM/host02.getFileInfo1638946.out
getFileInfo1638946.out                                             98    56.5KB/s   00:00
[grid@host01]$

scp fails with the same error "protocol error: filename does not match request", but succeeds when the extra -T option is used.

The -T option was introduced in OpenSSH 8.0, released in April 2019. In earlier versions of OpenSSH, when copying files from a remote system to a local directory, scp did not verify that the file names the server sent matched those requested by the client, which could allow a hostile server to create or clobber unexpected local files with attacker-controlled content. OpenSSH 8.0 fixed this security issue: scp now verifies the file names on the client side by default, and the -T option was added to disable the verification when needed.

Although the fix was officially introduced in OpenSSH 8.0, gridSetup.sh 19.3 also fails for the same reason on AIX with OpenSSH 7.5p1, which is the environment where the errors shown in this article occurred; the client-side check has apparently been backported to that build.

Oracle gridSetup.sh 19.3 (the base public release of 19c) sends the remote file name wrapped in single quotation marks, but the remote server returns the file name without the quotation marks. They refer to the same file, but the strings are different. Older scp versions worked because they did not compare the names; the patched scp compares them and fails with "filename does not match request".

Oracle Release Update 19.6 fixed this problem by removing the single quotation marks. Therefore, gridSetup.sh can be run successfully with the -applyRU option to apply the 19.6 or higher Release Update before installing/configuring GI
[grid@r6-dart]$ ./gridSetup.sh -applyRU /u01/stage/30501910
Preparing the home to patch...
Applying the patch /u01/stage/grid/30501910...
Successfully applied the patch.
The log can be found at: /u01/app/oraInventory/logs/GridSetupActions2021-02-24_03-25-57PM/installerPatchActions_2021-02-24_03-25-57PM.log
...

Here, Oracle Grid Infrastructure Release Update 19.6 (Patch 30501910) is unzipped under directory /u01/stage/30501910.
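
After the installation completes, the Release Update actually applied to the new GI home can be confirmed with opatch. A quick check; the home path below is an assumption and should be set to the new GI home.
# list patches applied to the new GI home
export ORACLE_HOME=/u01/app/19.0.0/grid       # adjust to the new GI home path
$ORACLE_HOME/OPatch/opatch lspatches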

If base release 19.3 is really needed for some reason, as a temporary workaround we can rename scp and create a wrapper in its place
# Rename the original scp
mv /usr/bin/scp /usr/bin/scp.bak

# Create a wrapper script that calls the original scp with the -T option
printf '#!/bin/sh\nexec /usr/bin/scp.bak -T "$@"\n' > /usr/bin/scp

# Make the wrapper executable
chmod a+rx /usr/bin/scp

After successfully installing GI 19.3, remember to restore the original scp
# Delete interim scp
rm /usr/bin/scp

# Restore the original scp.
mv /usr/bin/scp.bak /usr/bin/scp

Saturday, February 27, 2021

19c Upgrade - Fail with ORA-07445: exception encountered: core dump [joevm_invokevirtual()+1260]

When a 12.2 pluggable database is upgraded to 19c, dbupgrade fails with the following message
**********   Upgrading Java and non-Java   *********
Serial   Phase #:53   [PDB1] Files:2 
Unexpected error encountered in catctlMain; Error Stack Below; exiting
Died at /u01/app/oracle/product/19.0.0/dbhome_1/rdbms/admin/catctl.pl line 9188.
 at /u01/app/oracle/product/19.0.0/dbhome_1/rdbms/admin/catctl.pl line 9188.
main::catctlDie("\x{a}Unexpected error encountered in catconExec; exiting\x{a} No such"...) called at /u01/app/oracle/product/19.0.0/dbhome_1/rdbms/admin/catctl.pl line 5650
main::catctlExecutePhaseFiles(53, 2, "PDB1", "", undef) called at /u01/app/oracle/product/19.0.0/dbhome_1/rdbms/admin/catctl.pl line 2035
main::catctlRunPhase(53, 2, "PDB1", "", undef) called at /u01/app/oracle/product/19.0.0/dbhome_1/rdbms/admin/catctl.pl line 2197
main::catctlRunPhases(0, 108, 108, "PDB1", "", undef) called at /u01/app/oracle/product/19.0.0/dbhome_1/rdbms/admin/catctl.pl line 2816
main::catctlRunMainPhases() called at /u01/app/oracle/product/19.0.0/dbhome_1/rdbms/admin/catctl.pl line 1463
main::catctlMain() called at /u01/app/oracle/product/19.0.0/dbhome_1/rdbms/admin/catctl.pl line 1370
eval {...} called at /u01/app/oracle/product/19.0.0/dbhome_1/rdbms/admin/catctl.pl line 1368

------------------------------------------------------
CATCTL FATAL ERROR
------------------------------------------------------

 LOG FILES: (/u01/app/oracle/product/19.0.0/dbhome_1/cfgtoollogs/DB01/upgrade20210222125535/catupgrdPDB1*.log)
TRACE FILE: (/u01/app/oracle/product/19.0.0/dbhome_1/cfgtoollogs/DB01/upgrade20210222125535/catupgrdPDB1_trace.log)

In the log file catupgrdPDB1*.log
13:09:24 SQL> -- Load all the Java classes
13:09:24 SQL> begin if initjvmaux.startstep('CREATE_JAVA_SYSTEM') then
13:09:24   2  initjvmaux.rollbacksetup;
13:09:24   3  commit;
13:09:24   4  initjvmaux.rollbackset;
13:09:24   5  initjvmaux.exec('create or replace java system');
13:09:24   6  commit;
13:09:24   7  initjvmaux.rollbackcleanup;
13:09:24   8  initjvmaux.endstep;
13:09:24   9  dbms_registry.update_schema_list('JAVAVM',
13:09:24  10        dbms_registry.schema_list_t('OJVMSYS'));
13:09:24  11  end if; end;
13:09:24  12  /
begin if initjvmaux.startstep('CREATE_JAVA_SYSTEM') then
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
ERROR:
ORA-03114: not connected to ORACLE

In the database alert log file
2021-02-22T13:09:41.671656-05:00
PDB1(4):Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x4] [PC:0x118CE298C, joevm_invokevirtual()+1260] [flags: 0x0, count: 1]
Errors in file /u01/app/oracle/diag/rdbms/db01/DB01/trace/DB01_2_ora_4739.trc  (incident=28841) (PDBNAME=PDB1):
ORA-07445: exception encountered: core dump [joevm_invokevirtual()+1260] [SIGSEGV] [ADDR:0x4] [PC:0x118CE298C] [Address not mapped to object] []
PDB1(4):Incident details in: /u01/app/oracle/diag/rdbms/db01/DB01/incident/incdir_28841/DB01_ora_4739_i28841.trc
PDB1(4):Use ADRCI or Support Workbench to package the incident.

The culprit is that the PDB (PDB1) was plugged in from a non-CDB and then converted by running noncdb_to_pdb.sql. It can be fixed as follows,

1. Connect to the database as SYS and run the following SQL commands in the root container CDB$ROOT
alter session set container=CDB$ROOT;

alter session set "_ORACLE_SCRIPT"=true;

create or replace view rootobj sharing=object as
   select obj#,o.name,u.name uname,o.type#,o.flags
     from obj$ o,user$ u where owner#=user#;

2. Connect to the database as SYS and run the following SQL commands in the PDB container PDB1 (your PDB name may be different)
alter session set container=PDB1;

create or replace view rootobj sharing=object as
   select obj#,o.name,u.name uname,o.type#,o.flags
     from obj$ o,user$ u where owner#=user#;

update obj$ set flags=flags+65536
 where type# in (28,29,30,56)
       and bitand(flags,65536)=0
       and obj# in (select o.obj# from obj$ o,user$ u,rootobj r
                     where o.name=r.name and o.type#=r.type# and o.owner#=u.user# and u.name=r.uname and bitand(r.flags,65536)!=0
                    union
                    select obj# from obj$ where bitand(flags,4259840)=4194304);

delete from sys.idl_ub1$
 where obj# in (select obj# from sys.obj$ where bitand(flags, 65536)=65536 and type# in (28,29,30,56));

commit;

3. Re-start the upgrade from the failed phase (Phase #:53 in my case)
$ORACLE_HOME/bin/dbupgrade -p 53 -c 'PDB1'
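
Once dbupgrade finishes, the state of the Java components in the PDB can be sanity-checked from the registry. A minimal sketch, run as SYS; the container name PDB1 matches this article.
sqlplus -s / as sysdba <<'EOF'
-- check the JServer components in the upgraded PDB
alter session set container=PDB1;
select comp_id, version, status from dba_registry where comp_id in ('JAVAVM','CATJAVA');
exit
EOF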

Tuesday, February 23, 2021

Use UTL_FILE to Write BLOBs Stored Inside the Database Out to Files

Create a procedure as follows
create or replace procedure  save_blob_to_file (a_directory varchar2, a_filename varchar2, a_data blob)
is
  p_blob_length  integer;
  p_file         utl_file.file_type;
  p_offset       integer := 1;
  p_chunksize    pls_integer := 32767;
  p_buffer       raw(32767);
begin
  /* Determine size of BLOB */
  p_blob_length := dbms_lob.getlength(a_data);
  /* open a file handle */
  p_file := utl_file.fopen(a_directory, a_filename, 'wb', p_chunksize);
  begin
    /* write BLOB to file in chunks */
    while p_offset <= p_blob_length loop
      if p_offset + p_chunksize - 1 > p_blob_length then
         p_chunksize := p_blob_length - p_offset + 1;
      end if;
      dbms_lob.read(a_data, p_chunksize, p_offset, p_buffer);
      utl_file.put_raw(p_file, p_buffer, true);
      p_offset := p_offset + p_chunksize;
    end loop;
  exception
    when others then
      /* Close the file handle in case of any exceptions*/
      utl_file.fclose(p_file);
      raise;
  end;
  /* Close the file handle after all data is written out*/
  utl_file.fclose(p_file);
end;
/

Sample code to save a BLOB to a file with the procedure 'save_blob_to_file'
declare
  p_blob       blob;
  p_directory  varchar2(30) := 'BLOB_DIR';
  p_filename   varchar2(255):= 'first_blob_file.tst';
begin
  select blob_data into p_blob from tb_blob_file where rownum=1;
  save_blob_to_file(p_directory,p_filename,p_blob);
end;
/

Here, the directory object 'BLOB_DIR' has to be created before calling the procedure. Example of creating the directory,
system@db01> create directory BLOB_DIR as '/u01/blob_files';
Directory created.

Make sure the OS directory '/u01/blob_files' exists and the Oracle home owner (normally oracle) has read/write permission on this directory
[oracle@host01]$ ls -ld /u01/blob_files
drwxr-xr-x    2 oracle   oinstall       8192 Feb 20 20:33 /u01/blob_files
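
If the procedure will be created in and called from a schema other than the one that created the directory (for example an application schema), that schema also needs explicit privileges on the directory object. A minimal sketch; 'app_user' is a placeholder schema name.
sqlplus -s / as sysdba <<'EOF'
-- app_user is a placeholder for the schema that owns/executes save_blob_to_file
grant read, write on directory BLOB_DIR to app_user;
exit
EOF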

Sunday, February 7, 2021

Oracle Data Pump Export Failing with ORA-39244

An Oracle Database 19.9 Data Pump export job fails with ORA-39244: Event to disable dropping null bit image header during relational select, as follows,
Connected to: Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
ORA-31626: job does not exist
ORA-31637: cannot create job SYS_EXPORT_SCHEMA_01 for user SYSTEM
ORA-06512: at "SYS.KUPV$FT", line 1142
ORA-06512: at "SYS.KUPV$FT", line 1744
ORA-06512: at "SYS.DBMS_SYS_ERROR", line 95
ORA-06512: at "SYS.KUPV$FT_INT", line 1099
ORA-39244: Event to disable dropping null bit image header during relational select
ORA-06512: at "SYS.DBMS_SYS_ERROR", line 86
ORA-06512: at "SYS.KUPC$QUE_INT", line 2002
ORA-24145: evaluation context SYS.AQ$_KUPC$DATAPUMP_QUETAB_1_V already exists

The evaluation context SYS.AQ$_KUPC$DATAPUMP_QUETAB_1_V was left behind by a corrupted/failed Data Pump job. Manually dropping the context fixes the issue
sys@DB> exec DBMS_RULE_ADM.DROP_EVALUATION_CONTEXT(EVALUATION_CONTEXT_NAME=>'AQ$_KUPC$DATAPUMP_QUETAB_1_V',FORCE=>true);

PL/SQL procedure successfully completed.
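
To confirm the orphaned context was removed, or to check for other leftovers before re-running the export, the data dictionary can be queried. A minimal sketch, run as SYS.
sqlplus -s / as sysdba <<'EOF'
-- list any remaining Data Pump evaluation contexts owned by SYS
select evaluation_context_owner, evaluation_context_name
  from dba_evaluation_contexts
 where evaluation_context_name like 'AQ$_KUPC$DATAPUMP%';
exit
EOF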