Wednesday, December 1, 2021

Use oradebug To List All Internal Events Set In Oracle Database

Oracle internal events can be set at both sesson level and system level as following,

A. Session level
Session level events only affect current session. Users, who has ALTER SESSION privilege granted, can set event at session level with SQL command,

    alter session set events '<event context>' ;

For example
# -- Set sql_trace (event 10046) to level 12
user01@orcl> alter session set events '10046 trace name context forever, level 12';

# -- Unset/disable sql_trace (event 10046)
user01@orcl> alter session set events '10046 trace name context off';

# -- Set event for ORA-1652: unable to extend temp segment by %s in tablespace %s
user01@orcl> alter session set events '1652 trace name errorstack';

# -- Unset event for ORA-1652
user01@orcl> alter session set events '1652 trace name context off';
B. System level
Events set at systm level affect all database sessions. Users, who has both ALTER SYSTEM and ALTER SESSION privileges granted, can set events at system level with SQL command,

    alter system set events '<event context>';

For example
# -- Set sql_trace (event 10046) to level 12
user01@orcl> alter system set events '10046 trace name context forever, level 12';

# -- Unset/disable sql_trace (event 10046)
user01@orcl> alter system set events '10046 trace name context off';

# -- Set event for ORA-1652: unable to extend temp segment by %s in tablespace %s
user01@orcl> alter system set events '1652 trace name errorstack';

# -- Unset event for ORA-1652
user01@orcl> alter system set events '1652 trace name context off';
Note: ALTER SYSTEM SET EVENTS will set the events at system level for all sessions, as well as session level for current session. Therefore, the user has to be granted both ALTER SYSTEM and ALTER SESSION privileges to run this command.

In order to find out / list all events set at system level or session level, following steps is needed

1. Run SQL*Plus and connect to database as sysdba

The oradebug command will be run in SQL*Plus.

2. Identify the session, this step is only needed if you want to list session level events set in the sessions other than current session

Find out the session's Oracle process identifier (PID), Operating system process identifier (SPID) and Oracle process name (PNAME) with following SQL,
select p.pid, p.spid, p.pname, s.username, s.machine, s.program
  from v$process p, v$session s 
 where p.addr = s.paddr;
Sample output
PID SPID   PNAME  USERNAME   MACHINE         PROGRAM
--- ------ ------ ---------- --------------- ----------------------
 45 24432         USER01     workstation01   sqlplus.exe
The the sample output shows that USER01 is running sqlplus.exe on machine workstation01, which session PID is 45 and SPID is 24432.

3. Attached to the identified session

A. If you want to list system level events or session level events in current session, run oradebug command ,

   oradebug setmypid

For example
sys@orcl> oradebug setmypid
Statement processed.
B. If you want to list events set in the session other than current session, run one of following oradebug commands,

   oradebug setorapid      <PID>
   oradebug setospid       <SPID>
   oradebug setorapname    <PNAME>

For example, attaching the session found in step 2
sys@sys@orcl> oradebug setorapid 45
Oracle pid: 45, Unix process pid: 24432, image: oracle@dbhost01
Or
sys@sys@orcl> oradebug setospid 24432
Oracle pid: 45, Unix process pid: 24432, image: oracle@dbhost01
Here, oradebug setorapname cannot be used to attach the session because the process of the session does not have PNAME (PNAME is null). 

4. List the events

A. List events set at system level, run command

   oradebug eventdump system   -- 10g or newer
   oradebug dump events 4      -- 8i/9i

Sample output
sys@orcl> oradebug eventdump system
1652 trace name errorstack
The trace is enabled for ORA-1652 at system level.

B. List events set at session/process level, run command

   oradebug eventdump session   -- 10g or newer
   oradebug dump events 1          -- 8i/9i for session
   oradebug eventdump process  -- 10g or newer
   oradebug dump events 2          -- 8i/9i for process

Sample output
sys@orcl> oradebug eventdump process
1652 trace name errorstack
sys@orcl> oradebug eventdump session
sql_trace level=12
1652 trace name errorstack
The trace is set for ORA-1652 at both process and session level. And sql_trace is set at session level, the trace level is 12.

Tuesday, November 30, 2021

Identify SQL statements consuming temporary tablespace

You may have to find out the SQL statements which are using temporary tablespace when you get storage pressure for temporary tablespace.

Monday, November 15, 2021

Oracle 19c root.sh not working while gridSetup.sh switchGridHome to software-only installation

When the command "griddSetup.sh -switchGridHome" is used to perform Oracle database 19c out of place (OOP) patching, it will prompt to run root.sh on each cluster node. And root.sh will switch Grid Home from old one (without new patches) to new one (applied with new patches) on each node and also patch Grid Infrastructure Management Repository (GIMR) database on last node of the cluster.

Unfortunately, roo.sh could only work on first node, and not on others if you deploy the software/patches in following steps,

Step 1. Install new GI home with software only option with command,

   ./gridSetup.sh -applyRU <RU_Patch_Location> -applyOneOffs <comma_separated_one-off_patches_location>

At the end of the installation, you will be asked to run root.sh on each node. This software-only version root.sh does not touch old GI home and any running cluster processes (listener, asm, etc.). The new home is applied with new patctes (specified by command option applyRU and/or applyOneOffs).

Step 2. Switch GI from old home to new patched home with command,

   ./gridSetup.sh -switchGridHome

This installation/configuration utility will create a new version of rootconfig.sh under directory "$ORACLE_HOME/crs/config" with home-switching and patching options enabled on local node (known as fist node), the affected options as following,
   Software-only version            New version
   ---------------------------      --------------------------
   PATCH_HOME=false                 PATCH_HOME=true
   MOVE_HOME=false                  MOVE_HOME=true
   TRANSPARENT_MOVE_HOME=false      TRANSPARENT_MOVE_HOME=true
   SWONLY_MULTINODE=true            SWONLY_MULTINODE=false
The rootconfig.sh is called by root.sh. Technically, the utility gridSetup.sh should copy this new version of rootconfig.sh to all other nodes. However, it does not always happen and you will see nothing being changed by root.sh on nodes other than first node where the utility is running. This issue is experienced when new home is patches with Oracle Release Update 19.13.0.0.211019.

If root.sh does not work on other nodes, you can fix it by manually copying new version of rootconfig.sh from first node to other nodes.

Note: Always make a backup of existing rootconfig.sh before overriding it with new rootconfig.sh copied from first node and it is also a good practice to compare two versions of rootconfig.sh and verify what the differences are. 

Saturday, November 6, 2021

EM 12c/13c Agent Unreachable (Agent is running but is currently not read to accept client requests)

EM 12c/13c create Agent Unreachable alerts as follows: 
Host=host01.dbaplus.ca 
Target type=Agent 
Target name=host01.dbaplus.ca:3872 
Categories=Availability 
Message=Agent Unreachable (REASON = The agent is running but is currently not ready to accept client requests). Host is reachable. 
Severity=Critical 
Event reported time=Nov 6, 2021 12:36:48 PM EDT 
Operating System=SunOS
Platform=sparc
Event Type=Target Availability 
Event name=Status 
Availability status=Agent Unreachable
Update Details:
Agent Unreachable (REASON = The agent is running but is currently not ready to accept client requests). Host is reachable.
It usually happens when the agent is monitoring many targets or heavily loaded.  Agent cannot respond to OMS heartbeat, it cause oms to generate Agent Unreachable alerts.

Check the value of agent property MaxInComingConnections with following command
${AGNET_INSTANCE_HOME}/bin/emctl getproperty agent -name MaxInComingConnections
Sample output if the property is configured,
$ emctl getproperty agent -name MaxInComingConnections
Oracle Enterprise Manager Cloud Control 13c Release 5  
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
MaxInComingConnections=50
If the property is not configured, the output looks like following,
$ emctl getproperty agent -name MaxInComingConnections
Oracle Enterprise Manager Cloud Control 13c Release 5  
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
MaxInComingConnections is not a valid configuration property
The property MaxInComingConnections can be configured by adding following line in agent configuration file "${AGENT_INSTANCE_HOME}/sysman/config/emd.properties",
MaxInComingConnections=200
If the property already exists, you can find this line (the value may be different depending on your system) in the file, and there is no need to add new line,  just update the value to a bigger number (e.g. 200).

Note, it is a best practice to do a backup before editing the configuration file.

After saving the change, restart the agent.

Friday, November 5, 2021

Javascript/CSS: Add Copy-to-Clipboard Button in HTML Code Example

When feeling tired to select the text before copying them, I decide to add a COPY button to every code example block in my blog. The button will copy the whole example block to clipboard. I reached it through 4 steps.

Wednesday, November 3, 2021

Javascript/CSS: Add Line Number to HTML Source Code Block

 When I shared the source code in my blog before, I used to reference source code by highlighting the code lines with HTML tag. It was not an issue until I had many references to different lines of a long clode block. It could take too much of my time to highlight or mark the reference when I post an artical. Therefore, I added the line number to the source code in my posts. The line-numbered code block looks like,

Sunday, October 24, 2021

EM 12c/13c The listener is down: Failed to run lsnrctl

The listener is up and running. However, the lisener target shows as Down in the Oracle Enterprise Manager Cloud Control Console and the incident message as following,
The listener is down: Failed to run lsnrctl.

Friday, October 1, 2021

Change SYS password in Oracle Data Guard

In Oracle Data Guard configuration, redo transport uses Oracle Net sessions to transport redo data. These redo transport sessions are authenticated using either the Secure Socket Layer (SSL) protocol or a remote login password file. Most time, remote login password file is adopted because of the difficulty of meeting SSL authentication requirements.

Monday, August 30, 2021

Database startup failed with ORA-600 [dbgripmg_2: infinite init action] [ADR_CONTROL_AUX]

Oracle database startup failed with ORA-00600 as following
SQL> startup
ORACLE instance started.

Total System Global Area 8589878792 bytes
Fixed Size                 12854792 bytes
Variable Size            5402263552 bytes
Database Buffers         3154116608 bytes
Redo Buffers               20643840 bytes
ORA-00600: internal error code, arguments: [dbgripmg_2: infinite init action],
[11], [ADR_CONTROL_AUX], [], [], [], [], [], [], [], [], []

Errors ORA-00700 [dbgrmblcp_corrupt_page]/ORA-00600 [dbgrmblgp_get_page_1] can be found from alert log as following,
Dumping diagnostic data in directory=[cdmp_20210830103319], requested by (instance=1, osid=85656173 (M000)), summary=[incident=186167].
2021-08-30 10:33:19.773000 -04:00
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/db01/DB01/trace/DB01_m000_11797188.trc  (incident=186169) (PDBNAME=CDB$ROOT):
ORA-00700: soft internal error, arguments: [dbgrmblcp_corrupt_page], [/u01/app/oracle/diag/rdbms/db01/DB01/metadata/INCIDENT.ams], [11], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/db01/DB01/incident/incdir_186169/DB01_m000_11797188_i186169.trc
Errors in file /u01/app/oracle/diag/rdbms/db01/DB01/trace/DB01_ora_59376292.trc  (incident=186150) (PDBNAME=CDB$ROOT):
ORA-00700: soft internal error, arguments: [dbgrmblcp_corrupt_page], [/u01/app/oracle/diag/rdbms/db01/DB01/metadata/ADR_CONTROL.ams], [11], [], [], [], [], [], [], [], [], []
ORA-00700: soft internal error, arguments: [dbgrmblcp_corrupt_page], [/u01/app/oracle/diag/rdbms/db01/DB01/metadata/ADR_CONTROL.ams], [11], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/db01/DB01/incident/incdir_186150/DB01_ora_59376292_i186150.trc
Dumping diagnostic data in directory=[cdmp_20210830103320], requested by (instance=1, osid=85656173 (M000)), summary=[incident=186168].
2021-08-30 10:33:21.211000 -04:00
Errors in file /u01/app/oracle/diag/rdbms/db01/DB01/trace/DB01_m000_11797188.trc  (incident=186170) (PDBNAME=CDB$ROOT):
ORA-00700: soft internal error, arguments: [dbgrmblcp_corrupt_page], [/u01/app/oracle/diag/rdbms/db01/DB01/metadata/INCIDENT.ams], [11], [], [], [], [], [], [], [], [], []
ORA-00700: soft internal error, arguments: [dbgrmblcp_corrupt_page], [/u01/app/oracle/diag/rdbms/db01/DB01/metadata/INCIDENT.ams], [11], [], [], [], [], [], [], [], [], []
Errors in file /u01/app/oracle/diag/rdbms/db01/DB01/trace/DB01_m000_11797188.trc  (incident=186171) (PDBNAME=CDB$ROOT):
ORA-00600: internal error code, arguments: [dbgrmblgp_get_page_1], [11], [0], [0], [], [], [], [], [], [], [], []
ORA-00700: soft internal error, arguments: [dbgrmblcp_corrupt_page], [/u01/app/oracle/diag/rdbms/db01/DB01/metadata/INCIDENT.ams], [11], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/db01/DB01/incident/incdir_186171/DB01_m000_11797188_i186171.trc
Errors in file /u01/app/oracle/diag/rdbms/db01/DB01/trace/DB01_ora_59376292.trc  (incident=186151) (PDBNAME=CDB$ROOT):
ORA-00600: internal error code, arguments: [dbgrmblgp_get_page_1], [11], [0], [0], [], [], [], [], [], [], [], []
ORA-00700: soft internal error, arguments: [dbgrmblcp_corrupt_page], [/u01/app/oracle/diag/rdbms/db01/DB01/metadata/ADR_CONTROL.ams], [11], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/db01/DB01/incident/incdir_186151/DB01_ora_59376292_i186151.trc

ADRCI is accessing ams files which are corrupted. In the example alert, corrupted files are INCIDENT.ams and ADR_CONTROL.ams under $ADR_HOME. The corrupted ams file may be different in different scenarios.

Solution is that deleting the corrupted files and restart database.

Wednesday, July 7, 2021

AQ Queue Monitor Does Not Change Message MSG_STATE from WAIT to READY on RAC

In RAC database, the Advanced Queue (AQ) message MSG_STATE (column of view AQ$<QUEUE_TABLE>) stays with value 'WAIT' and will not be changed to 'READY' any more. It usually happens after database was shut down with immediate/abort option followed by database patching/upgrading. 

Friday, June 25, 2021

EM 12c/13c Update Oracle Home Path of Targets with SQL

 As out-of-place patching can dramatically help minimize database downtime, especially when multiple databases run out of same home, it does not require all databases shut down at same time, more and more DBA adopt this method. However, if you are patching a home which host multiple databases, you may feel frustrated to update the databases' Oracle Home property in Oracle Enterprise Manage by clicking through different windows and changing it one by one. Now, the first question you will ask is how I can update all databases at one time. The answer is to run SQL (PL/SQL) in EM repository database.

Wednesday, June 23, 2021

EM 12c/13c Change Lifecycle Status with SQL

The Lifecycle Status property of EM targets is often used to prioritize the notification of incident rule. For example, DBA will be paged while database which Lifecycle Status is 'Production' crashed, and only emailed for 'Test' database. This post is going to show how to use sql scripts to find out targets with different Lifecycle Status and how to change the Lifecycle Status.

Tuesday, June 22, 2021

EM 12c/13c How to find out new discovered targets with SQL

 When agent is deployed to new host, or new targets is installed or created on existing hosts, Oracle Enterprise Manger (EM) Cloud Control can discover the targets automatically. However, EM does not promote the new discovered targets automatically.

Sunday, June 20, 2021

Script Run datapatch against all running instance in parallel

This script can be used to run datapatch against all currently running instances in parallel to apply sql patches after patches are applied on Oracle homes. It is helpful for post patching operations. It works for different instances running out of different Oracle homes and different home with different owners.

The script accepts one optional parameter as log file path, it will save log files under /tmp if no parameter is presented while the script is started.

In order to have the script succeed, please be advised about following limitations,

1. The script is only tested on Linux and AIX, it does not work on Solaris.

2. Only root or Oracle database home owner is supposed to run this script. If the instances are running out of different Oracle homes which are owned by different OS users, root user is recommended. Otherwise, the instances running out of Oracle home which owner is different from current user will be excluded.

3. Oracle database home is 12c or higher which supports datapatch

4. The script runs following command as root to retrieve Oracle home path
   /bin/ls
   Therefore, if Oracle home owner (normally oracle) runs this script, sudo has to be configure to grant the user privilege to run '/bin/ls' as root without being asked for password.

Friday, June 18, 2021

Install PostgreSQL on RHEL/OL/CentOS 6/7/8

There are many ways to install PostgreSQL on Linux box. Here, I am going to use yum to install different version of PostgreSQL on Red Hat Enterprise Linux (RHEL) 7/8, it also works for Oracle Linux (OL) / CentOS 7/8.

Thursday, June 17, 2021

Java Stored Procedure failed with java.lang.OutOfMemoryError

Oracle database Jave stored procedure failed with "java.lang.OutOfMemoryError", the error stack looks like
ORA-29532: Java call terminated by uncaught Java exception: java.lang.OutOfMemoryError
ORA-06512: at "USER01.MYJAVAPROC", line 28
ORA-06512: at line 7

This error is thrown when there is insufficient space to allocate an object in the Java heap. In this case, The Java garbage collector cannot make space available to accommodate a new object, and the heap cannot be expanded further.  

Wednesday, June 16, 2021

Oracle 19.11 roothas.sh failed with "Out of memory" on AIX

When applying Oracle GI Release Update 19.11.0.0.210420 on AIX, command "roothas.sh -postpatch" failed with "Out of memory" as following
[root@host01]# /u01/app/oracle/product/19.0.0/grid_1/crs/install/roothas.sh -postpatch
Using configuration parameter file: /u01/app/oracle/product/19.0.0/grid_1/crs/install/crsconfig_params
The log of current session can be found at:
  /u01/app/oracle/crsdata/host01/crsconfig/hapatch_2021-06-15_01-53-27PM.log
Out of memory!
Out of memory!
Out of memory!
/u01/app/oracle/product/19.0.0/grid_1/crs/install/roothas.sh[137]: 7930494 Segmentation fault(coredump)
The command '/u01/app/oracle/product/19.0.0/grid_1/perl/bin/perl -I/u01/app/oracle/product/19.0.0/grid_1/perl/lib -I/u01/app/oracle/product/19.0.0/grid_1/crs/install -I/u01/app/oracle/product/19.0.0/grid_1/xag /u01/app/oracle/product/19.0.0/grid_1/crs/install/roothas.pl -postpatch' execution failed

Tuesday, June 15, 2021

Configure yum with proxy server on RHEL/OL/Fedora

When using yum to maintain packages on hosts running Red Hat Enterprise Linux (RHEL), Oracle Linux (OL) or Fedora, you may have to configure proxy option if the host is running behind firewall.

In order to do that, add following lines to yum configuration file /etc/yum.conf

proxy=http://<proxy server>:<port>
proxy_username=<user name>
proxy_password=<password>

Here, 

  <proxy server> is the host name or IP address of proxy server
  <user name> user name if proxy server requires
  <password>  password if proxy server requires

File /etc/yum.conf  example
gpgcheck=1
installonly_limit=3
clean_requirements_on_remove=True
best=True
skip_if_unavailable=False
proxy=http://proxy.dbaplus.ca:8080
proxy_username=user01
proxy_password=user01pwd

Thursday, June 3, 2021

How to Permanently Change PowerShell Prompt

 The PowerShell command prompt indicates that PowerShell is ready to run a command,
PS C:\>

It is determined by the built-in Prompt function and can be customized by running following command to re-define Prompt function,

  function prompt {"<Script block>"}

Here, <Script block> is script block which will determine/generate prompt value (character string). For example, following have prompt include current date and time
PS C:\>
PS C:\> function Prompt {"$(Get-Date)> "}
06/03/2021 11:39:23>
06/03/2021 11:39:25>

The change is only valid for current session, the prompt of new session is still default value. In order to keep new prompt for all sessions, you have to create your own Prompt function and saving it in your PowerShell profile as following,

1. Find out your PowerShell profile file with "$profile"
PS C:\> $profile
C:\Users\admin\Documents\WindowsPowerShell\Microsoft.PowerShell_profile.ps1
PS C:\>

2. Add customized Prompt function to profile

If the file or directory does not exist, you can manually create it, then add customized Prompt function to the file. For example, add following lines
function Prompt { "PS [" + ${ENV:USERNAME} + "@" + ${ENV:COMPUTERNAME} + "]> "}

The prompt will be in the format "PS [user-name@computer-name]> " in all new PowerShell sessions.

Sunday, May 16, 2021

Grid Infrastructure 12.2 restore OCR, Voting File and mgmtdb when disk group corrupted

For some reason, Oracle desupported the placement of OCR and voting files directly on a shared file system from Grid Infrastructure 12.2 until rescinding the desupport for Standalone Clusters from 19.3. Therefore, when GI 12.2 is installed, OCR, Voting file, and OCR backup location are, by default, configured in the ASM disk group, and mostly likely in the same disk group. When the disk group cannot be mounted because of any issues, the cluster will not be brought up anymore.

Technically, GI has to be re-configured as like a new installation, that could be a big job. However, it may not be that bad. Although 12.2 does not allow you have OCR backup location out of ASM disk group, DBA still can copy the OCR backup file from ASM disk group to file system with command 'asmcmd cp'. I am going to demonstrate how we can maximize the opportunity of bringing the cluster backup without rebuild/reinstall it.

Sunday, May 9, 2021

Windows How to log all output on the console to text file

Windows Command Prompt console does not have built-in facility to log the console output to file. In order to implement logging function, PowerShell has to be used instead of normal command console.

To enable logging, running following command under PowerShell prompt,

  Start-Transcript [-Path] "<file-name>" [-Append]

Here, 

  -Path parameter is used to specify the file "<file-name>" and "<file-name>" is full path of log file in which output messages will be saved

  -Append Indicates that the logging text will be added to the end of an existing file instead of overriding.

To stop logging, running command

  Stop-Trasnscript

Note: PowerShell can be started on Windows by clicking "Windows PowerShell" applicaiton or running command "PowerShell" in normal "Command Prompt" console, but DO NOT start PowerShell in "Windows Terminal". If the PowerShell is started from "Window Terminal", Start-Transcript may not be able to log everything. For example, non-Windows built-in command (e.g. sqlplus) output cannot be completely logged.
  
Example

PS> Start-Transcript -Path "C:\temp\test.log"
Transcript started, output file is C:\temp\test.log
PS>
PS> sqlplus system/oracle@orcl

SQL*Plus: Release 12.2.0.1.0 Production on Sun May 9 20:30:15 2021

Copyright (c) 1982, 2016, Oracle.  All rights reserved.

Last Successful login time: Tue Apr 13 2021 16:46:29 -04:00

Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production

system@orcl> select host_name from v$instance;

HOST_NAME
----------------------------------------------------------------
host01

system@orcl> exit
Disconnected from Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
PS>
PS> Stop-Transcript
Transcript stopped, output file is C:\temp\test.log

The content of file C:\temp\test.log

**********************
Windows PowerShell transcript start
Start time: 20210509202857
Username: dbaplus
RunAs User: dbaplus
Configuration Name: 
Machine: wkstn01 (Microsoft Windows NT 10.0.19042.0)
Host Application: C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe
Process ID: 15732
PSVersion: 5.1.19041.906
PSEdition: Desktop
PSCompatibleVersions: 1.0, 2.0, 3.0, 4.0, 5.0, 5.1.19041.906
BuildVersion: 10.0.19041.906
CLRVersion: 4.0.30319.42000
WSManStackVersion: 3.0
PSRemotingProtocolVersion: 2.3
SerializationVersion: 1.1.0.1
**********************
Transcript started, output file is C:\temp\test.log
PS> sqlplus system/oracle@orcl

SQL*Plus: Release 12.2.0.1.0 Production on Sun May 9 20:30:15 2021

Copyright (c) 1982, 2016, Oracle.  All rights reserved.

Last Successful login time: Tue Apr 13 2021 16:46:29 -04:00

Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production

system@orcl> select host_name from v$instance;

HOST_NAME
----------------------------------------------------------------
host01

system@orcl> exit
Disconnected from Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
PS> Stop-Transcript
**********************
Windows PowerShell transcript end
End time: 20210509203043
**********************

Wednesday, May 5, 2021

OEM 12c/13c Database Target Discovered and Named with host name as suffix

 When EM discovers database targets, it generates default target name for the database in one of following formats,

  db_unique_name.db_domain   if both parameters db_unique_name and db_domain are set
  db_unique_name             if parameter db_unique_name is set, but db_domain is not set
  db_name.db_domain          if parameter db_unique_name is not set, but db_domain is set
  db_name                    if neither db_unique_name nor db_domain is set

EM discovery script retrieves these parameters' values from parameter file (pfile or spfile) of the database, not from running instance. If the discovery process has issue to locate or process parameter file, EM will name database target in format,

  <sid>_<hostname>
  
Therefore, when you find the new discovered database target is named in this format, it means EM agent discovery script has issue with parameter file. The details can be found from agent trace file "<AGENT_INST_HOME>/sysman/log/emagent_perl.trc". Most popular error messages look like

  ERROR:  initParameterFileUtl::convertSPFileToPFile: Failed to convert spfile
  
or

  ERROR:  initParameterFileUtl::getParameterFile: Cannot find any init parameter file for instance <instancename> in oracle home  <oracle_home>

For example, when Oracle Restart (standalone Grid Infrastructure) is installed and ASM storage is configured on server host1.dbaplus.ca, DBCA creates a database orcl (sid & db_name) in ASM diskgroup, the spfile will also be saved in ASM diskgroup and no parameter file (pfile/spfile) will be created under directory <ORACLE_HOME>/dbs. The database will be discovered by EM with default name orcl_host1.dbaplus.ca and you will see errors in agent trace file "emagent_perl.trc",

oracledb.pl: 2021-05-04 04:03:08,923: INFO:  DB_LISTENER_DISCOVERY:  processing sid="orcl"
oracledb.pl: 2021-05-04 04:03:08,926: ERROR:  initParameterFileUtl::getParameterFile: Cannot find any init parameter file for instance orcl in oracle home /u01/app/oracle/product/19.9.0/dbhome_1/dbs
oracledb.pl: 2021-05-04 04:03:08,930: ERROR:  initParameterFileUtl::getParameterFile: Cannot find any init parameter file for instance orcl in oracle home /u01/app/oracle/product/19.9.0/dbhome_1/dbs
oracledb.pl: 2021-05-04 04:03:09,065: ERROR:  initParameterFileUtl::getParameterFile: Cannot find any init parameter file for instance orcl in oracle home /u01/app/oracle/product/19.9.0/dbhome_1/dbs
oracledb.pl: 2021-05-04 04:03:09,069: ERROR:  initParameterFileUtl::getParameterFile: Cannot find any init parameter file for instance orcl in oracle home /u01/app/oracle/product/19.9.0/dbhome_1/dbs

The reason is that discovery script cannot find parameter file from <ORACLE_HOME>/dbs. To fix this problem, create a pfile 'initorcl.ora' under <ORACLE_HOME>/dbs with following content,

   spfile='<full path of spfile saved in diskgroup>'

Note: DO NOT leave any SPACE character at the beginning of the line (before word "spfile"). If you did, you will not see any errors in the trace file, but the target name will be <sid>_<hostname>.

Sample init file

$ srvctl config database -db orcl | grep spfile
Spfile: +DATA/orcl/PARAMETERFILE/spfile.919.1071658047
$
$ echo "spfile='+DATA/orcl/PARAMETERFILE/spfile.919.1071658047'" > $ORACLE_HOME/dbs/initorcl.ora
$
$ cat $ORACLE_HOME/dbs/initorcl.ora
spfile='+DATA/orcl/PARAMETERFILE/spfile.919.1071658047'

Sunday, May 2, 2021

Oracle 19.11 deinstall failed with "ERROR: oracle/rat/tfa/util/ManageTfa"

After Oracle database 19c home is applied Release Update 19.11.0.0.210420, the home cannot be de-installed anymore.

The deinstall utility will fail with errors,
######################## DEINSTALL CLEAN OPERATION START ########################
## [START] Preparing for Deinstall ##
Setting LOCAL_NODE to thanos
Setting CRS_HOME to false
Setting oracle.installer.invPtrLoc to /tmp/deinstall2021-05-02_07-34-15PM/oraInst.loc
Setting oracle.installer.local to false

ERROR: oracle/rat/tfa/util/ManageTfa
Exited from program.


############# ORACLE DEINSTALL TOOL END #############

In deinstall error log,
java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at oracle.install.db.deinstall.wrapper.Deinstall.callCleanImpl(Deinstall.java:1876)
        at oracle.install.db.deinstall.wrapper.Deinstall.main(Deinstall.java:907)
Caused by: java.lang.NoClassDefFoundError: oracle/rat/tfa/util/ManageTfa
        at oracle.install.db.deinstall.core.PrepForOUIDeinstall.cleanConfig(PrepForOUIDeinstall.java:187)
        ... 6 more
Caused by: java.lang.ClassNotFoundException: oracle.rat.tfa.util.ManageTfa
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        ... 7 more

Looks like something related to TFA is missing after RU 19.11 is applied.

Workaround,

Roll back RU 19.11, then re-run deinstall utility.

Oracle 19c runInstaller failed with "undefined reference to 'jox_eujs_nowait_'"

Updated on Thursday, Oct 21, 2021 with following,

The error is also found with RU 19.13 (19.13.0.0.211019).
I believe it is same thing with 19.12, though I did not test it.

When installing Oracle 19c with patch apply of Database RU 19.11 and OJVM RU 19.11, runInstaller failed with message

   Error in invoking target 'irman ioracle idrdactl idrdalsnr idrdaproc' of makefile '/u01/app/oracle/product/19.11.0/dbhome_1/rdbms/lib/ins_rdbms.mk'. See /u01/app/oraInventory/logs/InstallActions2021-05-01_08-40-24PM/installActions2021-05-01_08-40-24PM.log for details.

Saturday, May 1, 2021

OEM 13c Software Standardization Advisor Returns Collection Errors

In EM 13c console, access "Software Standardization Advisor" as following,

 Targets -> Databases -> Administration -> Software Standardization Advisor
 
You could see a number beside "Collection Errors" for database and/or Grid infrastructure home. The issue can be found in EM 13.3, 13.4 and 13.5.

Click the number beside "Collection Errors", all Oracle home targets which have "Metric Collection Errors" are listed with Path, Host and Owner information. 

Click the name of Oracle Home Target to show home page of the target, at the bottom of "Summary" section, click link "Found 1 Metric Collection Error(s)" beside "Reason".

Metric "Files affected by a Patch" has following "Message"

     java.lang.UnsupportedOperationException: Collection Result Maximum Flood Control Level Exceeded

The issue can be reproduced / verified by run following command on the host where the Oracle home exists,

  <AGENT_HOME>/bin/emctl control agent runCollection <Target name of the Oracle home>:oracle_home oracle_home_config

$ emctl control agent runCollection OraHome12_host01:oracle_home oracle_home_config
Oracle Enterprise Manager Cloud Control 13c Release 5  
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
---------------------------------------------------------------
EMD runCollection completed successfully

Following lines are found from the log file "<AGENT_INST_HOME>/sysman/log/gcagent.log"
2021-04-29 20:25:18,725 [668:D9EFD971:HTTPListener--668 (DispatchRequests emdctl@28474@amber.corp.toronto.ca=>[161974231840001])] INFO - >>> Dispatching request: RunCollectionRequest <<<
2021-04-29 20:25:22,986 [1129:A9983BB8:GC.OnDemand.8 (oracle_home:OraHome12_host01:Real-time collection oracle_home_config) (oracle_home:OraHome12_host01:oracle_home_config:PatchFixedBug)] WARN - Result set exceeding min flood control level
2021-04-29 20:25:22,992 [1129:A9983BB8:GC.OnDemand.8 (oracle_home:OraHome12_host01:Real-time collection oracle_home_config) (oracle_home:OraHome12_host01:oracle_home_config:PatchFixedBug)] WARN - Result set exceeding min flood control level
2021-04-29 20:25:24,398 [1129:A9983BB8:GC.OnDemand.8 (oracle_home:OraHome12_host01:Real-time collection oracle_home_config) (oracle_home:OraHome12_host01:oracle_home_config:PatchFixedBug)] WARN - Result set exceeding min flood control level
2021-04-29 20:25:24,406 [1129:GC.OnDemand.8 (oracle_home:OraHome12_host01:Real-time collection oracle_home_config) (oracle_home:OraHome12_host01:oracle_home_config:PatchedFile)] ERROR - Result set exceeded max flood control level
2021-04-29 20:25:24,408 [1129:GC.OnDemand.8 (oracle_home:OraHome12_host01:Real-time collection oracle_home_config) (oracle_home:OraHome12_host01:oracle_home_config:PatchedFile)] ERROR - oracle_home:OraHome12_host01:oracle_home_config:PatchedFile
java.lang.UnsupportedOperationException: Collection Result Maximum Flood Control Level Exceeded
at oracle.sysman.emSDK.agent.datacollection.CollectionResult.performFloodControl(CollectionResult.java:459)
at oracle.sysman.emSDK.agent.datacollection.CollectionResult.addCollectionRow(CollectionResult.java:662)
at oracle.sysman.gcagent.addon.fetchlet.osfetchlet.BaseOSFetchlet.getOSMetric(BaseOSFetchlet.java:1157)
at oracle.sysman.gcagent.addon.fetchlet.osfetchlet.BaseOSFetchlet.getMetric(BaseOSFetchlet.java:476)
at oracle.sysman.gcagent.target.interaction.execution.FetchletFactory.getMetric(FetchletFactory.java:437)
at oracle.sysman.gcagent.target.interaction.execution.ExecuteTask.executeQueryDescriptor(ExecuteTask.java:1284)
at oracle.sysman.gcagent.target.interaction.execution.ExecuteTask.runTask(ExecuteTask.java:3167)

From the log, we can find two exceptions:

  1. When collectiong metric PatchFixedBug (Bugs fixed by Patch), got WARN message "Result set exceeding min flood control level"
  2. When collectiong metric PatchedFile (Files affected by Patch), got ERROR message "Result set exceeded max flood control level"

The ERROR can be also confirmed by collecting specific metric "Files affected by Patch" with command

  <AGENT_HOME>/bin/emctl getmetric agent <Oracle Home target name>,oracle_home,PatchedFile

$ emctl getmetric agent OraHome12_host01,oracle_home,PatchedFile
Oracle Enterprise Manager Cloud Control 13c Release 5
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
EMD getmetric error: Collection Result Maximum Flood Control Level Exceeded


Solution

1.  Agent side settings:

  * Set Min flood control value "CollectionResults.MaximumRowsFloodControlMin" to remove WARN message

    <AGENT_HOME>/bin/emctl setproperty agent -allow_new -name CollectionResults.MaximumRowsFloodControlMin -value 5000

  * Set Max flood control value "CollectionResults.MaximumRowsFloodControlMax" to fix ERROR issue
  
    <AGENT_HOME>/bin/emctl setproperty agent -allow_new -name CollectionResults.MaximumRowsFloodControlMax -value 50000

  * Verify the new value of properties with commands
  
    <AGENT_HOME>/bin/emctl getproperty agent -name CollectionResults.MaximumRowsFloodControlMin
    <AGENT_HOME>/bin/emctl getproperty agent -name CollectionResults.MaximumRowsFloodControlMax

    Or reviewing agent property file "<AGENT_INST_HOME>/sysman/config/emd.properties"

    grep 'CollectionResults' <AGENT_INST_HOME>/sysman/config/emd.properties

  * Manually start metric collection to reset the error status or you can wait until next scheduled metric collection job running (by default, every 24 hours)

    <AGENT_HOME>/bin/emctl control agent runCollection <Oracle home target name>:oracle_home oracle_home_config

  * In case you change your mind (or want to see the errors) the new value of properties can be cleared/removed with commands

    <AGENT_HOME>/bin/emctl clear_property agent -name CollectionResults.MaximumRowsFloodControlMin
    <AGENT_HOME>/bin/emctl clear_property agent -name CollectionResults.MaximumRowsFloodControlMax

Sample output
$ emctl getproperty agent -name CollectionResults.MaximumRowsFloodControlMin
Oracle Enterprise Manager Cloud Control 13c Release 5
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
CollectionResults.MaximumRowsFloodControlMin is not a valid configuration property

$ emctl getproperty agent -name CollectionResults.MaximumRowsFloodControlMax
Oracle Enterprise Manager Cloud Control 13c Release 5
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
CollectionResults.MaximumRowsFloodControlMax is not a valid configuration property

$ emctl setproperty agent -allow_new -name CollectionResults.MaximumRowsFloodControlMin -value 5000
Oracle Enterprise Manager Cloud Control 13c Release 5
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
EMD setproperty succeeded

$ emctl setproperty agent -allow_new -name CollectionResults.MaximumRowsFloodControlMax -value 50000
Oracle Enterprise Manager Cloud Control 13c Release 5
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
EMD setproperty succeeded

$ emctl getproperty agent -name CollectionResults.MaximumRowsFloodControlMin
Oracle Enterprise Manager Cloud Control 13c Release 5
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
CollectionResults.MaximumRowsFloodControlMin=5000

$ emctl getproperty agent -name CollectionResults.MaximumRowsFloodControlMax
Oracle Enterprise Manager Cloud Control 13c Release 5
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
CollectionResults.MaximumRowsFloodControlMax=50000

$ grep 'CollectionResults' /u01/app/oracle/em13.5/agent/agent_inst/sysman/config/emd.properties
CollectionResults.MaximumRowsFloodControlMin=5000
CollectionResults.MaximumRowsFloodControlMax=50000

2. OMS side settings:

  * Setting the min value:
    
    <OMS_HOME>/bin/emctl set property -name CollectionResults.MaximumRowsFloodControlMin -value 5000

  * Setting the max value:
    
    <OMS_HOME>/bin/emctl set property -name CollectionResults.MaximumRowsFloodControlMax -value 50000

  * Verify the new value of properties
  
    <OMS_HOME>/bin/emctl get property -name CollectionResults.MaximumRowsFloodControlMin
    <OMS_HOME>/bin/emctl get property -name CollectionResults.MaximumRowsFloodControlMax
    
  * Delete new properties if they are not needed

    <OMS_HOME>/bin/emctl delete property -name CollectionResults.MaximumRowsFloodControlMin
    <OMS_HOME>/bin/emctl delete property -name CollectionResults.MaximumRowsFloodControlMax

Sample output
[oracle@oms]$ emctl get property -name CollectionResults.MaximumRowsFloodControlMin
Oracle Enterprise Manager Cloud Control 13c Release 5
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
SYSMAN password:
Value for property CollectionResults.MaximumRowsFloodControlMin for oms All Management Servers is null

[oracle@oms]$ emctl get property -name CollectionResults.MaximumRowsFloodControlMax
Oracle Enterprise Manager Cloud Control 13c Release 5
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
SYSMAN password:
Value for property CollectionResults.MaximumRowsFloodControlMax for oms All Management Servers is null

[oracle@oms]$ emctl set property -name CollectionResults.MaximumRowsFloodControlMin -value 5000
Oracle Enterprise Manager Cloud Control 13c Release 5
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
SYSMAN password:
Property CollectionResults.MaximumRowsFloodControlMin has been set to value 5000 for all Management Servers
OMS restart is not required to reflect the new property value

[oracle@oms]$ emctl set property -name CollectionResults.MaximumRowsFloodControlMax -value 50000
Oracle Enterprise Manager Cloud Control 13c Release 5
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
SYSMAN password:
Property CollectionResults.MaximumRowsFloodControlMax has been set to value 50000 for all Management Servers
OMS restart is not required to reflect the new property value

[oracle@oms]$ emctl get property -name CollectionResults.MaximumRowsFloodControlMin
Oracle Enterprise Manager Cloud Control 13c Release 5
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
SYSMAN password:
Value for property CollectionResults.MaximumRowsFloodControlMin at Global level is 5000

[oracle@oms]$ emctl get property -name CollectionResults.MaximumRowsFloodControlMax
Oracle Enterprise Manager Cloud Control 13c Release 5
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
SYSMAN password:
Value for property CollectionResults.MaximumRowsFloodControlMax at Global level is 50000

Sunday, April 25, 2021

OEM 13c EMGC_ADMINSERVER & EMGC_OMS1 target status show DOWN when emctl secure wls with cut certificate

 When custom certificate is configured for OEM 13c, WebLogic Servers installed as part of Enterprise Manager Cloud control (Administration Server and Managed Servers) can be secured with custom certificate using following command,

   $OMS_HOME/bin/emclt secure wls

However, the WebLogic Servers and their deployments could show down in OEM console after secured with custom certificate, though they are still running well.

The reason is that the CA involved in issuing the custom certificate for OMS is not "well known", at least it is not accepted by Oracle as default trusted CA. When agent running on OMS server communicates with WebLogic Servers (WLS), WLS is using the custom certificate as his own identification, but agent cannot find trusted certificates of CA involved in issuing the custom certificate from agent's local keystore. Therefore, agent cannot verify the validation of WLS's certificate, and stops the communication with WLS.

The quick fix is to import the certificate of each CA involved in issuing the ticket into agent local keystore with following command,

    $AGENT_HOME/bin/emctl secure add_trust_cert_to_jks -trust_certs_loc <ca_certificate_file> -alias <certificate_alias> [-password <keystore_pwd>]

Here, <certificate_alias> is used to identify the certificates saved in the keystore, they must be unique for each certificate, <keystore_pwd> is the password of the keystore, the default value is welcome.

For example, I have installed a CA in my lab network, and the CA issued certificate to my OMS server. The two of my CA server certificates (root certificate & intermediate certificate) has to be imported into agent keystore as following,

 $AGENT_HOME/bin/emctl stop agent

 $AGENT_HOME/bin/emctl secure add_trust_cert_to_jks -password welcome -alias dbaplus-root -trust_certs_loc /home/oracle/Root_CA_Certificate.txt

 $AGENT_HOME/bin/emctl secure add_trust_cert_to_jks -password welcome -alias dbaplus-intermediate -trust_certs_loc /home/oracle/Intermediate_CA_Certificate.txt

 $AGENT_HOME/bin/emctl start agent

List the certificates imported into agent monitor keystore,

 $AGENT_HOME/jdk/bin/keytool -list -alias <certificate_alias> -keystore   $AGENT_INSTANCE_HOME/sysman/config/montrust/AgentTrust.jks -storepass welcome -v

If needed, the certificates can be removed from keystore as following

 $AGENT_HOME/jdk/bin/keytool -delete -alias <certificate_alias> -keystore   $AGENT_INSTANCE_HOME/sysman/config/montrust/AgentTrust.jks -storepass welcome -v

Saturday, April 24, 2021

OEM 13c Target "EM Jobs Service" shown as down in EM Console while all associated targets are up

"EM Jobs Service" target status is showing down in Enterprise Manager Cloud Control(EM) console even though all associated targets are up and running. It could be an issue with the metric collection definition. Usually it is seen at post blackout of associated targets.

The status of EM Jobs Service is aggregated target status, it is calculated based on the status of the associated targets. The associated targets and calculation logic are defined by default when the system is installed, and you can change it later.

The issue can be fixed by changing/restoring Availability Definition of the service as following,

1. In EM Console navigate to the following menu

     Targets > Services > Click on "EM Jobs Service" target

2. In "EM Jobs Service" home page, click on the tab "Monitoring Configuration" and then click on the link "Availability Definition"

3. Take a screen shot of the "Availability Definition" configurations, change the definition to a different option and click OK to save it.

    For Instance, If Availability Definition is to consider "All key components are up" (default definition), change it to "At least one key component is up" and save change.

4. Now revert "Availability Definition" of the service back to original configration by following the same procedure.

    For instance, change and save "Availability Definition" to "All key components are up"

The target status shows up as all components are up.

Wednesday, April 14, 2021

OEM 12c/13c Agent Deployment fails with "Remote Validatons: Shell Path Validation Failed"

When deploying agent on OEM 12c/13c using 'Add Host Targets' wizard, the deployment fails with

Remote Validations:  Shell Path validation failed

Cause:  Shell path is incorrect or not defined.:/bin/bash(SH_PATH),-c(SH_ARGS) on host <host name> 

Recommendation:  Check the property values in the following files in this order, ssPaths_<plat>.properties or sPaths.properties or Paths.properties, in "/u01/app/oracle/em13.4/middleware/oui/prov/resources" directory. If the property values are correct, then ensure the login user account is enabled for remote logins.For more details, refer to the Oracle Enterprise Manager Basic Installation Guide.

Most common reason why it happened could be one of following

1. Shell (sh, bash & ksh) location is different from OEM defined location
OEM defined shell location can be found from file 'ssPaths_<platform>.properties' under directory '$OMS_HOME/oui/prov/resources'. For example, if the errors happens on deploying agent to AIX host, type the content of file 'ssPaths_aix.properties' which looks like following
SH_PATH=/bin/bash
SH_ARGS=-c
SHELL_PATH=/bin/bash
SHELL_ARGS=-c
KSH_PATH=/usr/bin/ksh
RMDIR_ARGS=
#the date should be in the format of year:month:date:hour:minute:second
DATE_ARGS=-u +%y:%m:%d:%H:%M:%S
PING_PATH=/usr/sbin/ping
SSH_KEYGEN_PATH=/usr/bin/ssh-keygen
TAR_EXCLUDE_ARGS=X
TAR_INCLUDE_ARGS=-I
DF_COL_NAME=avail
SSH_HOST_KEY_LOC=/etc/ssh

On the host where agent is going to be installed, check if the executables/shell exist and are located at same place as in the OEM file 'ssPaths_<platform>.properties'. In previous example files, the executables/shell are

/bin/bash
/usr/bin/ksh
/usr/bin/ssh-keygen

If it does not exist, you have to install it. If it exists but is located at different directory, edit the OEM file and replace the shell/executable path with the directory where the shell/executable is.

2. Incorrect user name or password configured in Named Credential which is used to deploy the agent

If incorrect user name or password is used, the error could also happen. If you do not have the password of the user defined in Named Credential, the issue can be confirmed by checking following log file on oms server,
  
$OMS_INSTANCE_BASE/em/EMGC_OMS1/sysman/agentpush/<timestamp>/applogs/<host_name>_deploy.log

For example, the failed deployment log is

  /u01/app/oracle/em13.4/gc_inst/em/EMGC_OMS1/sysman/agentpush/2021-04-13_12-58-49-PM/applogs/host01.lab.dbaplus.ca_deploy.log

And following message is found in the log
2021-04-13_12-59-55:INFO:===VALIDATION===:Checking SH_PATH on target nodes
2021-04-13_12-59-55:INFO:isWrongShPath:remotePathPropertiesLoc:/u01/app/oracle/em13.4/middleware/oui/prov/resources Platform id:212
2021-04-13_12-59-55:INFO:NODES=host01.lab.dbaplus.ca
2021-04-13_12-59-55:INFO:Running cmd /bin/bash -c /bin/true on node host01.lab.dbaplus.ca
2021-04-13_12-59-55:INFO:Action description Execution of command /bin/bash -c /bin/true  on host host01.lab.dbaplus.ca
2021-04-13_12-59-55:INFO:Attempt :1 pty required false  with no inputs
2021-04-13_12-59-56:INFO:/bin/bash -c /bin/true execution failed on host host01.lab.dbaplus.ca
2021-04-13_12-59-56:INFO: OUT null
2021-04-13_12-59-56:INFO: ERR WARNING: Your password has expired.
Password change required but no TTY available.

We can see that the password has expired, ask system administrator to reset the password and also update the password for Named Credential.

The easist way to eliminate this error because of user name or password issue is to ask system administrator to test the login manually out of OEM.

Sunday, April 11, 2021

OEM 13c Discovering WebLogic Domain failed to save Node Manger target with error 'This target requires a local Management Agent'

When discovering or refreshing a WebLogic Domain or Fusion Middleware Farm in Enterprise Manager (EM) 13.4 Cloud Control, the Node Manager target is not saved. The error is shown in EM:

Failed to save NM_xxx_x(Oracle WebLogic Node Manager) on host <IP/host name>. This target requires a local Management Agent, but a local Management Agent was not found.  In order to add this target, you need to install a Management Agent on the same host as the target and then perform a "Refresh WebLogic Domain" operation.

The agent has been installed on the host. The errors happened because of difference between Listen Address of Node Manager configuration and host name of EM Agent URL. As a solution, the Listen Address Node Manager should be changed to host name of EM Agent URL.

Oracle explains it as incorrect configuration of Oracle WebLogic Node Manager. Therefore, it could happen on all release of EM 13c. However, I can only reproduce the problem in EM 13.1 and 13.4 when Listen Address of WebLogic Node Manager is configured with IP address instead of host name which is used by EM Agent URL and there is no problem with EM 13.2. Anyway, having both configuration use same host name is not bad idea.

Find out host name of EM agent URL with command <AGENT_HOME>/bin/emctl status agent
$ /u01/app/oracle/em13.4/agent/agent_13.4.0.0.0/bin/emctl status agent
Oracle Enterprise Manager Cloud Control 13c Release 4
Copyright (c) 1996, 2020 Oracle Corporation.  All rights reserved.
---------------------------------------------------------------
Agent Version          : 13.4.0.0.0
OMS Version            : 13.4.0.0.0
Protocol Version       : 12.1.0.1.0
Agent Home             : /u01/app/oracle/em13.4/agent/agent_inst
Agent Log Directory    : /u01/app/oracle/em13.4/agent/agent_inst/sysman/log
Agent Binaries         : /u01/app/oracle/em13.4/agent/agent_13.4.0.0.0
Core JAR Location      : /u01/app/oracle/em13.4/agent/agent_13.4.0.0.0/jlib
Agent Process ID       : 76282
Parent Process ID      : 76240
Agent URL              : https://host01.lab.dbaplus.ca:3872/emd/main/
Local Agent URL in NAT : https://host01.lab.dbaplus.ca:3872/emd/main/
Repository URL         : https://oms.lab.dbaplus.ca:4903/empbs/upload
Started at             : 2021-04-07 17:53:56
Started by user        : oracle
Operating System       : Linux version 4.1.12-124.46.4.1.el7uek.x86_64 (amd64)
...
---------------------------------------------------------------
Agent is Running and Ready

Change Listen Address of Node Manager to the host name of EM Agent URL in the WebLogic Admin Console as following,
1. Go to Node Manger configuraiton page

   Environment > Machines > [Machine Name] > Configuration > Node Manager

2. Click 'Lock & Edit' to enable edit mode

3. Set the value of "Listen Address" property to the host name given by previous command 'emctl status agent'

4. Click 'Save', then click 'Activate Changes'

Refresh or rediscover the domain, the Node Manger will be discovered successfully.

Wednesday, March 31, 2021

OEM 13.4 AgentPull.sh failed with '0403-011 The specified substitution is not valid for this command'

When using Oracle Enterprise Manager 13.4 AgentPull.sh to deploy Agent 13.4 on AIX server, got following error,
./AgentPull.sh[270]: platform=${platform/\)/%29}: 0403-011 The specified substitution is not valid for this command.

Line 270 in AgentPull.sh is
platform=${platform/\)/%29}

It is a bash statement which tries to replace right parenthesis with code '%29', it works for bash but not Bourne shell (sh). The problem is that the script has directive to have Unix/Linux to run the script under bourne shell
$ head AgentPull.sh
#!/bin/sh
#
# $Header: emcore/install/unix/AgentPull.sh.template /main/30 2019/12/19 21:29:10 vbhaagav Exp $
#
# AgentPull.sh
#
# Copyright (c) 2011, 2019, Oracle and/or its affiliates. All rights reserved.
#
#    NAME
#      AgentPull.sh - <one-line expansion of the name>

The script works on most Linux servers because the /bin/sh is a symbolic link file linked to /bin/bash on these servers. Unfortunately, AIX is honest :), sh is sh and bash is bash, they are different.

As workaround, replace first line of AgentPull.sh '#!/bin/sh' with '#!/bin/bash'.

Tuesday, March 30, 2021

OEM 13.4 New Imported RuleSet False Evaluation of Lifecycle Status

If Rule Set is defined for targets based on Lifecycle Status, and the Rule Set is exported and imported into Enterprise Manager 13.4, the Rule Set will be always activated regardless of Lifecycle Status.

For example, two Rule Sets are created in source Enterprise Manager,

   Rule Set A for targets which Lifecycle Status is Production
   Rule Set B for targets which Lifecycle Status is Development

Both A and B are exported from source EM and imported into EM 13.4, and one target (no matter what value of Lifecycle Status) is brought down, both A and B will send out notification.

It can be fixed by manually editing the Rule Set. You do not need really to change anything of the Rule Set, just select the Rule Set and click Edit, then click Save.

OEM 13.2 Exported Ruleset Failed with 'Error in parsing properties for RuleSet from the XML' when Importing into EM 13.x

When importing ruleset in Enterprise Manager 13.x (confirmed in 13.2 and 13.4) from xml file exported from Enterprise Manager 13.2, got following error,

  Error in parsing properties for RuleSet from the XML

If the destination EM version is 13.2, following message will be found from trace file emoms.trc under directory <OMS_INSTANCE_BASE>/em/EMGC_OMS1/sysman/log
2021-03-29 22:49:12,394 [[ACTIVE] ExecuteThread: '7' for queue: 'weblogic.kernel.Default (self-tuning)'] ERROR exportimport.ImportHelper logp.251 - Error in parsing properties for RuleSet from the XML
oracle.sysman.emSDK.app.exception.EMSystemException
     at oracle.sysman.core.event.rules.uimodel.rulesfwk.RulesModelUtil.throwEMSystemException(RulesModelUtil.java:1486)
     at oracle.sysman.core.event.rules.uimodel.rulesfwk.RulesFwkEntityImpl.postChanges(RulesFwkEntityImpl.java:116)
     ...
    ... 100 more
Caused by: oracle.jbo.JboException: JBO-29000: Unexpected exception caught: oracle.jbo.DMLConstraintException, msg=JBO-26048: Constraint "EM_RULE_EXPRESSIONS_UK1" is violated during post operation "Insert" using SQL statement "INSERT INTO EM_RULE_EXPRESSIONS(SELECTION_OBJECT_ID,EXPRESSION_ID,EXPRESSION_GROUP_ID,ATTR_ID,OPERATOR_CODE,ATTR_VALUE,SEL_OBJ_TYPE,EXPRESSION_TYPE,UI_GROUP_ID,EXCLUSION_GROUP_ID) VALUES (:1,:2,:3,:4,:5,:6,:7,:8,:9,:10)".
    at oracle.sysman.core.event.rules.uimodel.rulesfwk.RulesFwkEntityImpl.doDML(RulesFwkEntityImpl.java:201)
    at oracle.jbo.server.EntityImpl.postChanges(EntityImpl.java:7271)
    at oracle.sysman.core.event.rules.uimodel.rulesfwk.RulesFwkEntityImpl.postChanges(RulesFwkEntityImpl.java:97)
    ... 104 more
Caused by: oracle.jbo.DMLConstraintException: JBO-26048: Constraint "EM_RULE_EXPRESSIONS_UK1" is violated during post operation "Insert" using SQL statement "INSERT INTO EM_RULE_EXPRESSIONS(SELECTION_OBJECT_ID,EXPRESSION_ID,EXPRESSION_GROUP_ID,ATTR_ID,OPERATOR_CODE,ATTR_VALUE,SEL_OBJ_TYPE,EXPRESSION_TYPE,UI_GROUP_ID,EXCLUSION_GROUP_ID) VALUES (:1,:2,:3,:4,:5,:6,:7,:8,:9,:10)".
    at oracle.jbo.server.OracleSQLBuilderImpl.doEntityDML(OracleSQLBuilderImpl.java:565)
    at oracle.jbo.server.EntityImpl.doDML(EntityImpl.java:9098)
    at oracle.sysman.core.event.rules.uimodel.rulesfwk.RulesFwkEntityImpl.doDML(RulesFwkEntityImpl.java:196)
    ... 106 more
Caused by: java.sql.SQLIntegrityConstraintViolationException: ORA-00001: unique constraint (SYSMAN.EM_RULE_EXPRESSIONS_UK1) violated

If the destination EM version is 13.4 (or 13.3), following message will be found from trace file emoms.trc
2021-03-29 22:04:29,284 [[ACTIVE] ExecuteThread: '70' for queue: 'weblogic.kernel.Default (self-tuning)'] ERROR exportimport.ImportHelper logp.251 - Error in parsing properties for RuleSet from the XML
oracle.sysman.emSDK.app.exception.EMSystemException
    at oracle.sysman.core.event.rules.uimodel.rulesfwk.RulesModelUtil.throwEMSystemException(RulesModelUtil.java:1509)
    at oracle.sysman.core.event.rules.uimodel.rulesfwk.RulesFwkEntityImpl.postChanges(RulesFwkEntityImpl.java:106)
    ...
    at weblogic.work.ExecuteThread.execute(ExecuteThread.java:420)
    at weblogic.work.ExecuteThread.run(ExecuteThread.java:360)
Caused by: oracle.jbo.RowAlreadyDeletedException: JBO-25019: Entity row with key oracle.jbo.Key[BEB7FD94E2112377E053DA570F89CC8F ] is not found in EmRuleSetsEO.
    at oracle.jbo.server.OracleSQLBuilderImpl.doEntitySelectForAltKey(OracleSQLBuilderImpl.java:811)
    at oracle.jbo.server.BaseSQLBuilderImpl.doEntitySelect(BaseSQLBuilderImpl.java:554)
    at oracle.jbo.server.EntityImpl.doSelect(EntityImpl.java:9133)
    at oracle.jbo.server.EntityImpl.lock(EntityImpl.java:6612)
    at oracle.jbo.server.EntityImpl.beforePost(EntityImpl.java:7150)
    at oracle.jbo.server.EntityImpl.postChanges(EntityImpl.java:7384)
    at oracle.sysman.core.event.rules.uimodel.rulesfwk.RulesFwkEntityImpl.postChanges(RulesFwkEntityImpl.java:97)
    ... 95 more

It is a bug of EM 13.2, it can be fixed by apply OMS one-off patch 25986453 on source EM 13.2. The patch is also needed for destination EM if the version is 13.2.

Monday, March 29, 2021

OEM 13c: Oracle Coherence Cache Targets Showing Down after WebLogic Server PSU applied

 After WebLogic Server (WLS) Patch Set Update (PSU) is applied on WLS home installed as part of Oracle Enterprise Manager (EM) 13c, two of Oracle Coherence Cache objects in WebLogic Domain '/EMGC_GCDomain/GCDomain' are shown Down in EM console.

The issues, at least, was seen after applying WLS PSU 12.2.1.3.201217 on EM 13.4.0.9 home, and WLS PSU 12.2.1.4.210330 on EM 13.5.0.0.0 home.

It can be fixed by removing and re-adding the targets as following,

1.  Remove the Oracle Coherence Cache Targets showing Down from OEM

    * In the OEM Console,  navigate to 'Targets' > 'Middleware'
    * On left panel expand 'Target Type', then expand 'Coherence', Select Oracle Coherence Cache. All Oracle Coherence Cache targets will be listed on the right window.
    * Right click on the name of the targets showing Down, click pop-up menu 'Target Setup' > 'Remove Target'
   
    The Down targets are deleted from OEM console.
   
2.  Refresh the domain '/EMGC_GCDomain/GCDomain' to add removed/missing Oracle Coherence Cache objects

    * In the OEM Console,  navigate to 'Targets' > 'Middleware'
    * On left panel, clear the selection you did while removing the targets. The Oracle WebLogic Domain '/EMGC_GCDomain/GCDomain' will be listed in right window
    * Click the domain name '/EMGC_GCDomain/GCDomain'
    * Navigate to the  'Refresh Weblogic Domain' in the Drop-Down menu on the home page of target '/EMGC_GCDomain/GCDomain'
    * New Screen will appear, click on "Add and Update Targets..."
    * Following the prompt to add new found targets
 
The removed targets will be added with new target names.

Wednesday, March 10, 2021

DBCA does not list ASM diskgroup for storage option

Oracle 19c DBCA does not list ASM disk groups in "Select Database Storage Option" window.
Messages in dbca trace file $ORACLE_BASE/cfgtoollogs/dbca/trace.log_<TIMESTAMP>
[DBStorageOption.flowWorker] [ 2021-03-08 13:24:45.969 EST ] [KfodUtil.kfodOutput:375]  /u01/app/19.0.0/grid/bin/kfod
[DBStorageOption.flowWorker] [ 2021-03-08 13:24:45.969 EST ] [KfodUtil.kfodOutput:375]  nohdr=TRUE
[DBStorageOption.flowWorker] [ 2021-03-08 13:24:45.969 EST ] [KfodUtil.kfodOutput:375]  OP=GROUPS
[DBStorageOption.flowWorker] [ 2021-03-08 13:24:45.969 EST ] [KfodUtil.kfodOutput:375]  status=true
[DBStorageOption.flowWorker] [ 2021-03-08 13:24:45.969 EST ] [KfodUtil.kfodOutput:375]  asmcompatibility=true dbcompatibility=true
[DBStorageOption.flowWorker] [ 2021-03-08 13:24:46.045 EST ] [KfodUtil.kfodOutput:386]  Kfod result
Error 49802 initializing ADR
ERROR!!! could not initialize the diag context

[DBStorageOption.flowWorker] [ 2021-03-08 13:24:46.046 EST ] [ASMUtils.loadDiskGroups:1221]  Loading  the diskgroups. exception using kfodError retrieving diskgroup using kfod utility, null
INFO: Mar 08, 2021 1:24:46 PM oracle.assistants.common.lib.FileSystemInfo getSharedStoragePaths
INFO: Getting default shared storage path.

DBCA failed with command "/u01/app/19.0.0/grid/bin/kfod", manually run the command
[oracle@host01]$ /u01/app/19.0.0/grid/bin/kfod nohdr=TRUE OP=GROUPS status=true asmcompatibility=true dbcompatibility=true
Error 49802 initializing ADR
ERROR!!! could not initialize the diag context

Debug the command with truss or strace as following
  On Linux:  strace -o /tmp/kfod.out /u01/app/19.9.0/grid/bin/kfod nohdr=TRUE OP=GROUPS status=true asmcompatibility=true dbcompatibility=true
  On Solaris/AIX: truss -o /tmp/kfod.out /u01/app/19.0.0/grid/bin/kfod nohdr=TRUE OP=GROUPS status=true asmcompatibility=true dbcompatibility=true
Example on Solaris
[oracle@host01]$ truss -o /tmp/kfod.out /u01/app/19.0.0/grid/bin/kfod nohdr=TRUE OP=GROUPS status=true asmcompatibility=true dbcompatibility=true
Error 49802 initializing ADR
ERROR!!! could not initialize the diag context
[oracle@host01]$
[oracle@host01]$ tail /tmp/kfod.out
stat("/u01/app/19.0.0/grid/log/diag/kfod/host01/kfod/log", 0xFFFFFFFF7FFFBF80) Err#2 ENOENT
stat("/u01/app/19.0.0/grid/log/diag/kfod/host01/kfod", 0xFFFFFFFF7FFFBF80) Err#2 ENOENT
stat("/u01/app/19.0.0/grid/log/diag/kfod/host01", 0xFFFFFFFF7FFFBF80) Err#2 ENOENT
stat("/u01/app/19.0.0/grid/log/diag/kfod", 0xFFFFFFFF7FFFBF80) Err#2 ENOENT
stat("/u01/app/19.0.0/grid/log/diag", 0xFFFFFFFF7FFFBF80) = 0
getuid()     = 104 [104]
getgid()     = 112 [112]
getuid()     = 104 [104]
getuid()     = 104 [104]
mmap(0x00010000, 65536, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFFFFFFFF73900000
getuid()     = 104 [104]
getuid()     = 104 [104]
open("/system/volatile/name_service_door", O_RDONLY) = 6
fcntl(6, F_SETFD, 0x00000001)   = 0
door_info(6, 0xFFFFFFFF75D8B5B0)  = 0
door_call(6, 0xFFFFFFFF7FFFBBB8)  = 0
ioctl(1, TCGETA, 0xFFFFFFFF7FFF8F7C)  = 0
fstat(1, 0xFFFFFFFF7FFF8F10)   = 0
write(1, " E r r o r   4 9 8 0 2  ".., 29) = 29
write(1, " E R R O R ! ! !   c o u".., 47) = 47
close(5)     = 0
_exit(1)

From the trace file /tmp/kfod.out, kfod failed with accessing following directories
    /u01/app/19.0.0/grid/log/diag/kfod/host01/kfod/log
    /u01/app/19.0.0/grid/log/diag/kfod/host01/kfod
    /u01/app/19.0.0/grid/log/diag/kfod/host01
    /u01/app/19.0.0/grid/log/diag/kfod
    /u01/app/19.0.0/grid/log/diag

Check the permission of the directory
[oracle@host01]$ ls -ld /u01/app/19.0.0/grid/log/diag
drwxr-x---   4 grid     oinstall       5 Dec 12 23:17 /u01/app/19.0.0/grid/log/diag
[oracle@host01]$ id
uid=104(oracle) gid=112(oinstall)

oracle, who is running kfod, is member of group oinstall, but group oinstall does not have write permission on directory. Trying to grant write permission as root or grid (owner of GI home)
[grid@host01]$ chmod g+w /u01/app/19.0.0/grid/log/diag
[grid@host01]$ ls -ld /u01/app/19.0.0/grid/log/diag
drwxrwx---   4 grid     oinstall       5 Dec 13 02:45 /u01/app/19.0.0/grid/log/diag

Re-try kfod as oracle
[oracle@host01]$ /u01/app/19.0.0/grid/bin/kfod nohdr=TRUE OP=GROUPS status=true asmcompatibility=true dbcompatibility=true
    204528     188110 EXTERN REDO 10.1.0.0.0 12.1.0.0.0
    613584     388124 EXTERN FRA  10.1.0.0.0 12.1.0.0.0
   4091624     470712 EXTERN DATA 10.1.0.0.0 12.1.0.0.0

Now, dbca can list ASM disk groups.

Monday, March 1, 2021

19c runcluvfy.sh faile with PRVF-7596 PRVG-2002

When upgrading Oracle Infrastructure 12.2 to 19c, runcluvfy.sh failed with following messages
Verifying OCR Integrity ...FAILED
host02: PRVF-7596 : CSS is probably working with a non-clustered, local-only
          configuration on node "host02"
Verifying resolv.conf Integrity ...FAILED
host02: PRVG-2002 : Encountered error in copying file "/etc/resolv.conf" from
          node "host02" to node "host01"
          protocol error: filename does not match request
Verifying DNS/NIS name service ...FAILED
host02: PRVG-2002 : Encountered error in copying file "/etc/netsvc.conf" from
          node "host02" to node "host01"
          protocol error: filename does not match request

Set log location and re-run runcluvfy.sh
# runcluvfy.sh will write log files to directory /u01/tmp
export CV_TRACELOC=/u01/tmp
# re-run runcluvfy.sh
./runcluvfy.sh stage -pre crsinst -upgrade -rolling -src_crshome /u01/app/12.2.0/grid_1 -dest_crshome /u01/app/19.0.0/grid_1 -dest_version 19.0.0.0.0 -fixup -verbose
# Log files created
cd /u01/tmp
ls -l
total 15200
-rw-r--r--    1 grid     oinstall     952467 Feb 24 16:52 cvuhelper.log.0
-rw-r--r--    1 grid     oinstall          0 Feb 24 16:48 cvuhelper.log.0.lck
-rw-r--r--    1 grid     oinstall    6824223 Feb 24 16:52 cvutrace.log.0

The log file cvutrace.log.0 shows messsage
[Worker 0] [ 2021-02-24 16:47:49.784 EST ] [UnixSystem.remoteCopyFile:848]  UnixSystem: /usr/bin/scp -p host02:'/tmp/CVU_19.0.0.0.0_grid/scratch/getFileInfo3605304.out' /tmp/host02.getFileInfo3605304.out
[Worker 0] [ 2021-02-24 16:47:49.798 EST ] [RuntimeExec.runCommand:294]  runCommand: Waiting for the process
[Thread-83] [ 2021-02-24 16:47:49.798 EST ] [StreamReader.run:62]  In StreamReader.run
[Thread-82] [ 2021-02-24 16:47:49.798 EST ] [StreamReader.run:62]  In StreamReader.run
[Thread-83] [ 2021-02-24 16:47:50.201 EST ] [StreamReader.run:66]  ERROR>protocol error: filename does not match request
...
[main] [ 2021-02-24 16:51:46.747 EST ] [UnixSystem.remoteCopyFile:848]  UnixSystem: /usr/bin/scp -p host02:'/etc/resolv.conf' /tmp/CVU_19.0.0.0.0_grid/scratch/resolv.conf_host02
[main] [ 2021-02-24 16:51:46.760 EST ] [RuntimeExec.runCommand:294]  runCommand: Waiting for the process
[Thread-1060] [ 2021-02-24 16:51:46.760 EST ] [StreamReader.run:62]  In StreamReader.run
[Thread-1059] [ 2021-02-24 16:51:46.760 EST ] [StreamReader.run:62]  In StreamReader.run
[main] [ 2021-02-24 16:51:47.148 EST ] [RuntimeExec.runCommand:296]  runCommand: process returns 1
[Thread-1060] [ 2021-02-24 16:51:47.149 EST ] [StreamReader.run:66]  ERROR>protocol error: filename does not match request
...
[main] [ 2021-02-24 16:51:47.548 EST ] [UnixSystem.remoteCopyFile:848]  UnixSystem: /usr/bin/scp -p host02:'/etc/netsvc.conf' /tmp/CVU_19.0.0.0.0_grid/scratch/nssw_conf_host02
[Thread-1066] [ 2021-02-24 16:51:47.559 EST ] [StreamReader.run:62]  In StreamReader.run
[main] [ 2021-02-24 16:51:47.559 EST ] [RuntimeExec.runCommand:294]  runCommand: Waiting for the process
[Thread-1065] [ 2021-02-24 16:51:47.560 EST ] [StreamReader.run:62]  In StreamReader.run
[main] [ 2021-02-24 16:51:47.965 EST ] [RuntimeExec.runCommand:296]  runCommand: process returns 1
[Thread-1066] [ 2021-02-24 16:51:47.973 EST ] [StreamReader.run:66]  ERROR>protocol error: filename does not match request

All three errors occurred when scp is executed to copy file from remote node host02 to local node.
Interesting thing is that all three commands are using single quotation marks to enclose remote file name, I guess Oracle programmer left them there by accident because there is no reason to use them. Let's verify if it is the culprit by manually running scp
[grid@host01]$ scp -p host02:"'/etc/resolv.conf'" /tmp/CVU_19.0.0.0.0_grid/scratch/resolv.conf_host02
protocol error: filename does not match request
[grid@host01]$
[grid@host01]$ scp -T -p host02:"'/etc/resolv.conf'" /tmp/CVU_19.0.0.0.0_grid/scratch/resolv.conf_host02
resolv.conf                                             96    51.3KB/s   00:00
[grid@host01]$

The scp fails with same error "protocol error: filename does not match request", but succeeds if extra option -T is used.

The -T option was introduced by OpenSSH 8.0 released in April 2019. In earlier version of OpenSSH, when copying files from a remote system to a local directory, scp did not verify that the file names that the server sent matched those requested by the client. This could allow a hostile server to create or clobber unexpected local files with attacker-controlled content. OpenSSH 8.0 fixed this security issue and scp, by default, verifies the file name on client side, and also introduced -T option to provide capacity to disable the verification.

Although OpenSSH officially claims that the fix is introduced in 8.0, runcluvfy.sh shipped with Grid Infrastructure 19.3 also fails with same reason on AIX with OpenSSH 7.5p1 and it is where the errors used in this article happened.

This version of runcluvfy.sh sends remote file name with single quotation marks, but remote server returns file name without quotation. Technically, they are same thing, but they are visually different. Therefore, old version scp worked because it did not verify them, but current scp fails it with "filename does not match".

It can be resolved by downloading new standalone version Cluster Verification Utility from My Oracle Support(Patch 30839369). Note: the executable is cluvfy instead of runcluvfy.sh. If download is not the option. As a temporary workaround, we can rename scp and create a new scp
# Rename the original scp
mv /usr/bin/scp /usr/bin/scp.bak
# Create a new file scp
echo "/usr/bin/scp.orig -T $*" > /usr/bin/scp
# Make the file executable
chmod a+rx /usr/bin/scp

Now, the errors are gone. After successfully installing GI, remember to restore original scp
# Delete interim scp
rm /usr/bin/scp
# Restore the original scp.
mv /usr/bin/scp.bak /usr/bin/scp

Sunday, February 28, 2021

19c gridSetup.sh failed with [INS-06006] Passwordless SSH connectivity not set up between the following nodes

When running 19.3 gridSetup.sh to configure Grid Infrastructure, following error occurs at configuring passwordless SSH connectivity between cluster nodes
[INS-06006] Passwordless SSH connectivity not set up between the following node(s): [host02].

Cause - Either passwordless SSH connectivity is not setup between specified node(s) or they are not reachable. Refer to the logs for more details.

Action - Refer to the logs for more details or contact Oracle Support Services.

More Details
PRVF-5311 : File "/tmp/GridSetupActions2021-02-24_05-29-41PM/host02.getFileInfo1638946.out" either does not exist or is not accessible on node "host02". 

Manually test passwordless SSH connectivity between nodes
[grid@host01]$ ssh host02       <=  Connect to second node host02 from first node host01
[grid@host02]$                  <=  Connected to host02 successfully without password

[grid@host02]$ ssh host02       <=  Connect to first node host01 from second node host02
[grid@host01]$                  <=  Connected to host01 successfully without password

Apparently, the passwordless SSH connectivity has been configured properly. Re-run gridSetup.sh in debug mode to figure out what happened,
grid@host01]$ /u01/app/19.0.0/grid _1/gridSetup.sh -debug

Launching Oracle Grid Infrastructure Setup Wizard...

[main] [ 2021-02-24 17:29:53.105 EST ] [Version.isPre:757]  version to be checked 19.0.0.0.0 major version to check against 10
[main] [ 2021-02-24 17:29:53.106 EST ] [Version.isPre:768]  isPre.java: Returning FALSE

  <<Message truncated>>

[Worker 2] [ 2021-02-24 17:31:26.521 EST ] [Utils.getLocalHost:487]  Hostname retrieved: host01, returned: host01
[Worker 2] [ 2021-02-24 17:31:26.521 EST ] [Utils.getLocalHost:487]  Hostname retrieved: host01, returned: host01
[Worker 2] [ 2021-02-24 17:31:26.521 EST ] [UnixSystem.remoteCopyFile:848]  UnixSystem: /usr/bin/scp -p host02:'/tmp/GridSetupActions2021-02-24_05-29-41PM/CVU_19.0.0.0.0_grid/scratch/getFileInfo1638946.out' /tmp/GridSetupActions2021-02-24_05-29-41PM/host02.getFileInfo1638946.out
[Worker 2] [ 2021-02-24 17:31:26.534 EST ] [RuntimeExec.runCommand:294]  runCommand: Waiting for the process
[Thread-442] [ 2021-02-24 17:31:26.534 EST ] [StreamReader.run:62]  In StreamReader.run 
[Thread-443] [ 2021-02-24 17:31:26.534 EST ] [StreamReader.run:62]  In StreamReader.run 
[Thread-443] [ 2021-02-24 17:31:26.855 EST ] [StreamReader.run:66]  ERROR>protocol error: filename does not match request
[Worker 2] [ 2021-02-24 17:31:26.855 EST ] [RuntimeExec.runCommand:296]  runCommand: process returns 1
[Worker 2] [ 2021-02-24 17:31:26.856 EST ] [RuntimeExec.runCommand:323]  RunTimeExec: error>
[Worker 2] [ 2021-02-24 17:31:26.856 EST ] [RuntimeExec.runCommand:326]  protocol error: filename does not match request
[Worker 2] [ 2021-02-24 17:31:26.856 EST ] [RuntimeExec.traceCmdEnv:516]  Calling Runtime.exec() with the command 
[Worker 2] [ 2021-02-24 17:31:26.856 EST ] [RuntimeExec.traceCmdEnv:518]  /usr/bin/scp 
[Worker 2] [ 2021-02-24 17:31:26.856 EST ] [RuntimeExec.traceCmdEnv:518]  -p 
[Worker 2] [ 2021-02-24 17:31:26.856 EST ] [RuntimeExec.traceCmdEnv:518]  host02:'/tmp/GridSetupActions2021-02-24_05-29-41PM/CVU_19.0.0.0.0_grid/scratch/getFileInfo1638946.out' 
[Worker 2] [ 2021-02-24 17:31:26.856 EST ] [RuntimeExec.traceCmdEnv:518]  /tmp/GridSetupActions2021-02-24_05-29-41PM/host02.getFileInfo1638946.out 
[Worker 2] [ 2021-02-24 17:31:26.856 EST ] [RuntimeExec.runCommand:349]  Returning from RunTimeExec.runCommand
[Worker 2] [ 2021-02-24 17:31:26.856 EST ] [NativeSystem.rununixcmd:1345]  NativeSystem.rununixcmd: RetString 0|protocol error: filename does not match request :failed
[Worker 2] [ 2021-02-24 17:31:26.859 EST ] [ClusterConfig$ExecuteCommand.returnCommandToClient:3324]  returnCommandToClient; fillCount=1 is full=false
[Worker 2] [ 2021-02-24 17:31:26.859 EST ] [Semaphore.release:88]  SyncBufferFull:Release called by thread Worker 2 m_count=2

When validating/configuring passwordless SSH connectivity, it has to copy SSH key information between nodes using scp. From the trace we can find that, it uses scp with -p option to copy the files, and destination file on local server (host01) is sent to scp with file name:

  /tmp/GridSetupActions2021-02-24_05-29-41PM/host02.getFileInfo1638946.out
  
And source file on remote server(host02) is sent to scp with file name:

  host02:'/tmp/GridSetupActions2021-02-24_05-29-41PM/CVU_19.0.0.0.0_grid/scratch/getFileInfo1638946.out'

Interesting thing is that the whole file path includes single quotation marks, I guess Oracle programmer left them there by accident because there is no reason to use them. Let's verify if it is the culprit by manually running scp
[grid@host01]$ scp -p host02:"'/tmp/GridSetupActions2021-02-24_05-29-41PM/CVU_19.0.0.0.0_grid/scratch/getFileInfo1638946.out'" /tmp/GridSetupActions2021-02-24_05-29-41PM/host02.getFileInfo1638946.out
protocol error: filename does not match request
[grid@host01]$
[grid@host01]$ scp -T -p host02:"'/tmp/GridSetupActions2021-02-24_05-29-41PM/CVU_19.0.0.0.0_grid/scratch/getFileInfo1638946.out'" /tmp/GridSetupActions2021-02-24_05-29-41PM/host02.getFileInfo1638946.out
getFileInfo1638946.out                                             98    56.5KB/s   00:00
[grid@host01]$

The scp fails with same error "protocol error: filename does not match request", but succeeds if extra option -T is used.

The -T option was introduced by OpenSSH 8.0 released in April 2019. In earlier version of OpenSSH, when copying files from a remote system to a local directory, scp did not verify that the file names that the server sent matched those requested by the client. This could allow a hostile server to create or clobber unexpected local files with attacker-controlled content. OpenSSH 8.0 fixed this security issue and scp, by default, verifies the file name on client side, and also introduced -T option to provide capacity to disable the verification.

Although OpenSSH officially claims that the fix is introduced in 8.0, gridSetup.sh 19.3 also fails with same reason on AIX with OpenSSH 7.5p1 and it is where the errors used in this article happened.

Oracle gridSetup.sh 19.3 (base pubic release of 19c) sends remote file name with single quotation marks, but remote server returns file name without quotation. Techinally, they are same thing, but they are visually different. Therefore, old version scp worked because it did not verify them, but current scp fails it with "filename does not match".

Oracle Release Update 19.6 fixed this problem by removing the single quotation marks. Therefore, gridSetup.sh can be run successfully with -applyRU option to apply 19.6 or higher Realease Update before installing/configuring GI
[grid@r6-dart]$ ./gridSetup.sh -applyRU /u01/stage/30501910
Preparing the home to patch...
Applying the patch /u01/stage/grid/30501910...
Successfully applied the patch.
The log can be found at: /u01/app/oraInventory/logs/GridSetupActions2021-02-24_03-25-57PM/installerPatchActions_2021-02-24_03-25-57PM.log
...

Here, Oracle Grid Infrastructure Release Update 19.6 (Patch 30501910) is unzipped under directory /u01/stage/30501910.

If 19.3 is really needed for some reason, as a temporary workaround, we can rename scp and create a new scp
# Rename the original scp
mv /usr/bin/scp /usr/bin/scp.bak

# Create a new file scp
echo "/usr/bin/scp.orig -T $*" > /usr/bin/scp

# Make the file executable
chmod a+rx /usr/bin/scp

After successfully installing GI 19.3, remember restore original scp
# Delete interim scp
rm /usr/bin/scp

# Restore the original scp.
mv /usr/bin/scp.bak /usr/bin/scp

Saturday, February 27, 2021

19c Upgrade - Fail with ORA-07445: exception encountered: core dump [joevm_invokevirtual()+1260]

When 12.2 pluggable database is upgraded to 19c, dbupgrade fails with message
**********   Upgrading Java and non-Java   *********
Serial   Phase #:53   [PDB1] Files:2 
Unexpected error encountered in catctlMain; Error Stack Below; exiting
Died at /u01/app/oracle/product/19.0.0/dbhome_1/rdbms/admin/catctl.pl line 9188.
 at /u01/app/oracle/product/19.0.0/dbhome_1/rdbms/admin/catctl.pl line 9188.
main::catctlDie("\x{a}Unexpected error encountered in catconExec; exiting\x{a} No such"...) called at /u01/app/oracle/product/19.0.0/dbhome_1/rdbms/admin/catctl.pl line 5650
main::catctlExecutePhaseFiles(53, 2, "PDB1", "", undef) called at /u01/app/oracle/product/19.0.0/dbhome_1/rdbms/admin/catctl.pl line 2035
main::catctlRunPhase(53, 2, "PDB1", "", undef) called at /u01/app/oracle/product/19.0.0/dbhome_1/rdbms/admin/catctl.pl line 2197
main::catctlRunPhases(0, 108, 108, "PDB1", "", undef) called at /u01/app/oracle/product/19.0.0/dbhome_1/rdbms/admin/catctl.pl line 2816
main::catctlRunMainPhases() called at /u01/app/oracle/product/19.0.0/dbhome_1/rdbms/admin/catctl.pl line 1463
main::catctlMain() called at /u01/app/oracle/product/19.0.0/dbhome_1/rdbms/admin/catctl.pl line 1370
eval {...} called at /u01/app/oracle/product/19.0.0/dbhome_1/rdbms/admin/catctl.pl line 1368

------------------------------------------------------
CATCTL FATAL ERROR
------------------------------------------------------

 LOG FILES: (/u01/app/oracle/product/19.0.0/dbhome_1/cfgtoollogs/DB01/upgrade20210222125535/catupgrdPDB1*.log)
TRACE FILE: (/u01/app/oracle/product/19.0.0/dbhome_1/cfgtoollogs/DB01/upgrade20210222125535/catupgrdPDB1_trace.log)

In log file catupgrdPDB1*.log
13:09:24 SQL> -- Load all the Java classes
13:09:24 SQL> begin if initjvmaux.startstep('CREATE_JAVA_SYSTEM') then
13:09:24   2  initjvmaux.rollbacksetup;
13:09:24   3  commit;
13:09:24   4  initjvmaux.rollbackset;
13:09:24   5  initjvmaux.exec('create or replace java system');
13:09:24   6  commit;
13:09:24   7  initjvmaux.rollbackcleanup;
13:09:24   8  initjvmaux.endstep;
13:09:24   9  dbms_registry.update_schema_list('JAVAVM',
13:09:24  10        dbms_registry.schema_list_t('OJVMSYS'));
13:09:24  11  end if; end;
13:09:24  12  /
begin if initjvmaux.startstep('CREATE_JAVA_SYSTEM') then
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
ERROR:
ORA-03114: not connected to ORACLE

In database alert log file
2021-02-22T13:09:41.671656-05:00
PDB1(4):Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x4] [PC:0x118CE298C, joevm_invokevirtual()+1260] [flags: 0x0, count: 1]
Errors in file /u01/app/oracle/diag/rdbms/db01/DB01/trace/DB01_2_ora_4739.trc  (incident=28841) (PDBNAME=PDB1):
ORA-07445: exception encountered: core dump [joevm_invokevirtual()+1260] [SIGSEGV] [ADDR:0x4] [PC:0x118CE298C] [Address not mapped to object] []
PDB1(4):Incident details in: /u01/app/oracle/diag/rdbms/db01/DB01/incident/incdir_28841/DB01_ora_4739_i28841.trc
PDB1(4):Use ADRCI or Support Workbench to package the incident.

The culprit is that the PDB (PDB1) was plugged from NONCDB which was followed by running noncdb_to_pdb.sql.  It can be fixed as following,

1. Connect to the database as SYS and run following sql commands in root container CDB$ROOT
alter session set container=CDB$ROOT;

alter session set "_ORACLE_SCRIPT"=true;

create or replace view rootobj sharing=object as
   select obj#,o.name,u.name uname,o.type#,o.flags
     from obj$ o,user$ u where owner#=user#;

2. Connect to the database as SYS and run following sql commands in PDB container PDB1 (your PDB name may be different)
alter session set container=PDB1;

create or replace view rootobj sharing=object as
   select obj#,o.name,u.name uname,o.type#,o.flags
     from obj$ o,user$ u where owner#=user#;

update obj$ set flags=flags+65536
 where type# in (28,29,30,56)
       and bitand(flags,65536)=0
       and obj# in (select o.obj# from obj$ o,user$ u,rootobj r
                     where o.name=r.name and o.type#=r.type# and o.owner#=u.user# and u.name=r.uname and bitand(r.flags,65536)!=0
                    union
                    select obj# from obj$ where bitand(flags,4259840)=4194304);

delete from sys.idl_ub1$
 where obj# in (select obj# from sys.obj$ where bitand(flags, 65536)=65536 and type# in (28,29,30,56));

commit;

3. Re-start upgrade from failed step (Phase #:53 in my case)
$ORACLE_HOME/bin/dbupgrade -p 53 -c 'PDB1'