Friday, October 1, 2021

Change SYS password in Oracle Data Guard

In an Oracle Data Guard configuration, redo transport uses Oracle Net sessions to transport redo data. These redo transport sessions are authenticated using either the Secure Sockets Layer (SSL) protocol or a remote login password file. Most of the time, the remote login password file is used because the SSL authentication requirements are harder to meet.

When redo transport authentication uses a password file, all physical and snapshot standby databases must use a copy of the password file from the primary database. Technically, the password file has to be copied to the standbys every time it changes on the primary database.

Note: the password file should not be manually changed on physical and snapshot standby databases.

The password file is updated whenever an administrative privilege (SYSDBA, SYSOPER, SYSDG and so on) is granted or revoked, and whenever the password of any user with administrative privileges is changed. Changing the SYS password therefore updates the password file, and should be done with the following steps.

1. Change the SYS password on the primary database using one of the following commands

   * SQL command:  alter user SYS identified by <new password>
   * SQL*Plus command:  password SYS

Note: Do not use the orapwd command to recreate the password file unless the file is corrupted or you want to change the file format.

2. Find the password file location on the primary database

If the database version is 12c or higher and Grid Infrastructure (standalone or cluster) is configured, run the command

   srvctl config database -db <db_unique_name>

Sample output
$ srvctl config database -db DBPRIMAY
Database unique name: DBPRIMAY
Database name: DBPRIMAY
Oracle home: /u01/app/oracle/product/19.0.0/dbhome_1
Oracle user: oracle
Spfile: +DATA/DBPRIMAY/PARAMETERFILE/spfile.437.1064771063
Password file: +DATA/DBPRIMAY/PASSWORD/pwdDBPRIMAY.367.1084460285
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Disk Groups: DATA,FRA1,REDO1,REDO2
Services: CNY1,HRVSTR13
OSDBA group:
OSOPER group:
Database instance: DBPRIMAY

The sample output shows that the password file of database DBPRIMAY is "+DATA/DBPRIMAY/PASSWORD/pwdDBPRIMAY.367.1084460285", which is stored in ASM.

If the srvctl command does not show a password file, Oracle uses the default password file, "$ORACLE_HOME/dbs/orapw<ORACLE_SID>".
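The lookup in step 2 can be scripted. The following is a minimal sketch, assuming the srvctl output format shown in the sample above; the function name pwfile_location is hypothetical and not part of any Oracle tool.

```shell
#!/bin/sh
# pwfile_location (hypothetical helper): read `srvctl config database` output
# on stdin and print the password file path. When the "Password file:" line is
# missing or empty, fall back to the default location described above.
pwfile_location() {
    pwfile=$(sed -n 's/^Password file: *//p')
    if [ -n "$pwfile" ]; then
        echo "$pwfile"
    else
        echo "${ORACLE_HOME}/dbs/orapw${ORACLE_SID}"
    fi
}

# Usage on a host with Grid Infrastructure (not executed here):
#   srvctl config database -db <db_unique_name> | pwfile_location
```

The function only parses text, so it can be tried against saved srvctl output before being used on a live system.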

3. Copy the password file from the primary database to all physical and snapshot standby databases

Note: The password file does not need to be copied if the database version is 12.2 or higher; Oracle automatically refreshes the password files on the standby databases.

If the password file is in an ASM disk group, use the ASMCMD command "pwcopy" to copy it from ASM to the OS file system, then use an OS network copy utility to copy the file to the standby database hosts. The command "scp -p" is recommended.
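The copy step can be sketched as a small dry-run script. This is only an illustration: the standby host name, the staging file and the target path are hypothetical, and the helper functions run and copy_pwfile are not Oracle utilities. By default it only prints the commands it would run.

```shell
#!/bin/sh
# run: print the command when DRY_RUN=1 (the default here), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# copy_pwfile <asm_password_file> <staging_file> <standby_target>...
# Each <standby_target> is an scp destination such as user@host:/path/orapwSID.
copy_pwfile() {
    asm_file=$1
    staging=$2
    shift 2
    # Copy from the ASM disk group to the local file system
    run asmcmd pwcopy "$asm_file" "$staging"
    for target in "$@"; do
        # -p preserves modification times and file modes
        run scp -p "$staging" "$target"
    done
}

# Hypothetical standby host and path, shown as a dry run
copy_pwfile "+DATA/DBPRIMAY/PASSWORD/pwdDBPRIMAY.367.1084460285" \
    /tmp/orapwDBPRIMAY \
    "oracle@stbyhost01:/u01/app/oracle/product/19.0.0/dbhome_1/dbs/orapwDBSTBY"
```

Setting DRY_RUN=0 would execute the commands instead of printing them; review the printed commands first.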

If the password file is not copied correctly, you may get an error like the following,

  ORA-46952: standby database format mismatch for password file '+DATA/DBSTBY/PASSWORD/pwdSTBY.251.3084560285'

It can be fixed by copying the password file again.

Monday, August 30, 2021

Database startup failed with ORA-600 [dbgripmg_2: infinite init action] [ADR_CONTROL_AUX]

Oracle database startup failed with ORA-00600 as follows
SQL> startup
ORACLE instance started.

Total System Global Area 8589878792 bytes
Fixed Size                 12854792 bytes
Variable Size            5402263552 bytes
Database Buffers         3154116608 bytes
Redo Buffers               20643840 bytes
ORA-00600: internal error code, arguments: [dbgripmg_2: infinite init action],
[11], [ADR_CONTROL_AUX], [], [], [], [], [], [], [], [], []

Errors ORA-00700 [dbgrmblcp_corrupt_page] and ORA-00600 [dbgrmblgp_get_page_1] can be found in the alert log as follows,
Dumping diagnostic data in directory=[cdmp_20210830103319], requested by (instance=1, osid=85656173 (M000)), summary=[incident=186167].
2021-08-30 10:33:19.773000 -04:00
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/db01/DB01/trace/DB01_m000_11797188.trc  (incident=186169) (PDBNAME=CDB$ROOT):
ORA-00700: soft internal error, arguments: [dbgrmblcp_corrupt_page], [/u01/app/oracle/diag/rdbms/db01/DB01/metadata/INCIDENT.ams], [11], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/db01/DB01/incident/incdir_186169/DB01_m000_11797188_i186169.trc
Errors in file /u01/app/oracle/diag/rdbms/db01/DB01/trace/DB01_ora_59376292.trc  (incident=186150) (PDBNAME=CDB$ROOT):
ORA-00700: soft internal error, arguments: [dbgrmblcp_corrupt_page], [/u01/app/oracle/diag/rdbms/db01/DB01/metadata/ADR_CONTROL.ams], [11], [], [], [], [], [], [], [], [], []
ORA-00700: soft internal error, arguments: [dbgrmblcp_corrupt_page], [/u01/app/oracle/diag/rdbms/db01/DB01/metadata/ADR_CONTROL.ams], [11], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/db01/DB01/incident/incdir_186150/DB01_ora_59376292_i186150.trc
Dumping diagnostic data in directory=[cdmp_20210830103320], requested by (instance=1, osid=85656173 (M000)), summary=[incident=186168].
2021-08-30 10:33:21.211000 -04:00
Errors in file /u01/app/oracle/diag/rdbms/db01/DB01/trace/DB01_m000_11797188.trc  (incident=186170) (PDBNAME=CDB$ROOT):
ORA-00700: soft internal error, arguments: [dbgrmblcp_corrupt_page], [/u01/app/oracle/diag/rdbms/db01/DB01/metadata/INCIDENT.ams], [11], [], [], [], [], [], [], [], [], []
ORA-00700: soft internal error, arguments: [dbgrmblcp_corrupt_page], [/u01/app/oracle/diag/rdbms/db01/DB01/metadata/INCIDENT.ams], [11], [], [], [], [], [], [], [], [], []
Errors in file /u01/app/oracle/diag/rdbms/db01/DB01/trace/DB01_m000_11797188.trc  (incident=186171) (PDBNAME=CDB$ROOT):
ORA-00600: internal error code, arguments: [dbgrmblgp_get_page_1], [11], [0], [0], [], [], [], [], [], [], [], []
ORA-00700: soft internal error, arguments: [dbgrmblcp_corrupt_page], [/u01/app/oracle/diag/rdbms/db01/DB01/metadata/INCIDENT.ams], [11], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/db01/DB01/incident/incdir_186171/DB01_m000_11797188_i186171.trc
Errors in file /u01/app/oracle/diag/rdbms/db01/DB01/trace/DB01_ora_59376292.trc  (incident=186151) (PDBNAME=CDB$ROOT):
ORA-00600: internal error code, arguments: [dbgrmblgp_get_page_1], [11], [0], [0], [], [], [], [], [], [], [], []
ORA-00700: soft internal error, arguments: [dbgrmblcp_corrupt_page], [/u01/app/oracle/diag/rdbms/db01/DB01/metadata/ADR_CONTROL.ams], [11], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/db01/DB01/incident/incdir_186151/DB01_ora_59376292_i186151.trc

The ADR metadata (.ams) files being accessed are corrupted. In the example alert log, the corrupted files are INCIDENT.ams and ADR_CONTROL.ams under $ADR_HOME/metadata. The corrupted .ams files may be different in other scenarios.

The solution is to delete the corrupted files and restart the database.
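Because the affected files differ from case to case, it helps to pull their names out of the alert log. The following is a minimal sketch, assuming the ORA-00700 message layout shown above; the function name list_corrupt_ams is hypothetical.

```shell
#!/bin/sh
# list_corrupt_ams <alert_log> (hypothetical helper): print each distinct file
# reported in an ORA-00700 [dbgrmblcp_corrupt_page] message.
list_corrupt_ams() {
    # The file path is the bracketed argument right after [dbgrmblcp_corrupt_page]
    sed -n 's/.*\[dbgrmblcp_corrupt_page\], \[\([^]]*\)\].*/\1/p' "$1" | sort -u
}

# Usage (the alert log normally sits under $ADR_HOME/trace):
#   list_corrupt_ams /u01/app/oracle/diag/rdbms/db01/DB01/trace/alert_DB01.log
```

The function only reads the log; deleting the listed files is left as a manual step.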

Wednesday, July 7, 2021

AQ Queue Monitor Does Not Change Message MSG_STATE from WAIT to READY on RAC

In a RAC database, the Advanced Queuing (AQ) message MSG_STATE (a column of view AQ$<QUEUE_TABLE>) stays at 'WAIT' and never changes to 'READY'. It usually happens after the database was shut down with the immediate/abort option and then patched or upgraded.

Friday, June 25, 2021

EM 12c/13c Update Oracle Home Path of Targets with SQL

Out-of-place patching can dramatically reduce database downtime, especially when multiple databases run out of the same home, because it does not require all databases to be shut down at the same time. More and more DBAs are adopting this method. However, if you are patching a home which hosts multiple databases, it is frustrating to update each database's Oracle Home property in Oracle Enterprise Manager by clicking through different windows and changing them one by one. The first question you will ask is how to update all databases at once. The answer is to run SQL (PL/SQL) in the EM repository database.

Wednesday, June 23, 2021

EM 12c/13c Change Lifecycle Status with SQL

The Lifecycle Status property of EM targets is often used to prioritize the notifications of incident rules. For example, the DBA is paged when a database whose Lifecycle Status is 'Production' crashes, and only emailed for a 'Test' database. This post shows how to use SQL scripts to find targets by Lifecycle Status and how to change the Lifecycle Status.

Before you can run these scripts, you have to connect to the EM repository database as sysman.

List all databases whose Lifecycle Status is not set yet with the following SQL,
select t.target_name, t.target_type, t.host_name
  from mgmt_targets t, mgmt_target_properties p
 where t.target_guid=p.target_guid(+)
       and t.target_type='oracle_database' -- comment out this line if going to widen the query
       and p.property_name(+)='orcl_gtp_lifecycle_status' and p.property_value is null;

To widen the query result, remove the 'target_type' predicate from the where clause, or assign a different value to the predicate to find other types of targets (e.g. listener, ASM instance, etc.). Common values of 'target_type' include

    oracle_database    Single instance database or RAC instance
    oracle_pdb         Pluggable database
    rac_database       RAC database
    osm_instance       ASM instance
    oracle_listener    Oracle database listener

Sample query
sysman@OEMR> select t.target_name, t.target_type, t.host_name
  2    from mgmt_targets t, mgmt_target_properties p
  3   where t.target_guid=p.target_guid(+)
  4         and t.target_type in ('oracle_database','oracle_pdb','rac_database','osm_instance','oracle_listener')
  5         and p.property_name(+)='orcl_gtp_lifecycle_status' and p.property_value is null;

TARGET_NAME                    TARGET_TYPE                                        HOST_NAME
------------------------------ -------------------------------------------------- ------------------------------
+ASM_host01.dbaplus.ca         osm_instance                                       host01.dbaplus.ca
ORCL                           oracle_database                                    host01.dbaplus.ca
DB01                           oracle_database                                    host01.dbaplus.ca
DB01_CDBROOT                   oracle_pdb                                         host01.dbaplus.ca
DB01_PDB1                      oracle_pdb                                         host01.dbaplus.ca
DB01_PDB2                      oracle_pdb                                         host01.dbaplus.ca
DB02                           rac_database                                       rac01.dbaplus.ca
DB02_CDBROOT                   oracle_pdb                                         rac01.dbaplus.ca
DB02_DB02_1                    oracle_database                                    rac02.dbaplus.ca
DB02_DB02_2                    oracle_database                                    rac01.dbaplus.ca
DB02_PDB1                      oracle_pdb                                         rac01.dbaplus.ca
LISTENER_host01.dbaplus.ca     oracle_listener                                    host01.dbaplus.ca

The Lifecycle Status can be changed with the following SQL (PL/SQL),
exec mgmt_target.set_target_property('<target_name>','<target_type>','orcl_gtp_lifecycle_status','INSTANCE','<Lifecycle Status>');

A valid 'Lifecycle Status' is one of the following values,

   Development, Test, Stage, Production

The following command changes the Lifecycle Status of database DB01 to 'Test',
# Before changing
sysman@OEMR> select t.target_name, t.target_type, t.host_name, p.property_value lifecycle_status
  2    from mgmt_targets t, mgmt_target_properties p
  3   where t.target_guid=p.target_guid(+)
  4         and t.target_name='DB01'
  5         and t.target_type='oracle_database'
  6         and p.property_name(+)='orcl_gtp_lifecycle_status';

TARGET_NAME   TARGET_TYPE        HOST_NAME          LIFECYCLE_STATUS
------------- ------------------ ------------------ ----------------
DB01          oracle_database    host01.dbaplus.ca

# Change
sysman@OEMR> exec mgmt_target.set_target_property('DB01','oracle_database','orcl_gtp_lifecycle_status','INSTANCE','Test');

PL/SQL procedure successfully completed.

sysman@OEMR> commit;

Commit complete.

# After changing
sysman@OEMR> select t.target_name, t.target_type, t.host_name, p.property_value lifecycle_status
  2    from mgmt_targets t, mgmt_target_properties p
  3   where t.target_guid=p.target_guid(+)
  4         and t.target_name='DB01'
  5         and t.target_type='oracle_database'
  6         and p.property_name(+)='orcl_gtp_lifecycle_status';

TARGET_NAME   TARGET_TYPE        HOST_NAME            LIFECYCLE_STATUS
------------- ------------------ -------------------- ----------------
DB01          oracle_database    host01.dbaplus.ca    Test

Change multiple targets with a PL/SQL block like the following,
declare
  -- Targets of the listed types whose Lifecycle Status is not set yet
  cursor c1 is
   select t.target_name, t.target_type, t.host_name
     from mgmt_targets t, mgmt_target_properties p
    where t.target_guid=p.target_guid(+)
          and t.target_type in ('oracle_database','oracle_pdb','rac_database','osm_instance','oracle_listener')
          and p.property_name(+)='orcl_gtp_lifecycle_status' and p.property_value is null;
begin
  for cc in c1 loop
    mgmt_target.set_target_property(cc.target_name,cc.target_type,'orcl_gtp_lifecycle_status','INSTANCE','Test');
  end loop;
end;
/


For example, change all new databases (Lifecycle Status not set) to 'Test',
sysman@OEMR> declare
  2    cursor c1 is
  3     select t.target_name, t.target_type, t.host_name
  4       from mgmt_targets t, mgmt_target_properties p
  5      where t.target_guid=p.target_guid(+)
  6            and t.target_type='oracle_database'
  7            and p.property_name(+)='orcl_gtp_lifecycle_status' and p.property_value is null;
  8  begin
  9    for cc in c1 loop
 10        mgmt_target.SET_TARGET_PROPERTY(cc.target_name,cc.target_type,'orcl_gtp_lifecycle_status','INSTANCE','Test');
 11    end loop;
 12  end;
 13  /

PL/SQL procedure successfully completed.

sysman@OEMR> commit;

Commit complete.

Tuesday, June 22, 2021

EM 12c/13c How to find out new discovered targets with SQL

When an agent is deployed to a new host, or new targets are installed or created on existing hosts, Oracle Enterprise Manager (EM) Cloud Control can discover the targets automatically. However, EM does not promote the newly discovered targets automatically.

Although the newly discovered targets can be found in the EM console under Setup -> Add Target -> Auto Discovery Results, many DBAs prefer the command line, which is more efficient and flexible.

Log into the EM repository database as sysman and run the following SQL,
col entity_name for a20
col entity_type for a20
col host_name for a50
select entity_name,entity_type,host_name,
       decode (manage_status, 0, 'Ignored', 
                              1, 'Not managed yet', 
                              2, 'Managed', 
                              3, 'Managed target component',
                                 'Unknown') "Manage Status", 
       decode (promote_status, 0, 'Cannot promote (existence only entity)', 
                              1, 'Eligible for promotion', 
                              2, 'Promotion in progress', 
                              3, 'Promoted',
                                 'Unknown') "Promote Status"
from mgmt$manageable_entities
where promote_status=1;

Example output,
sysman@OEMR> col entity_name for a20
sysman@OEMR> col entity_type for a20
sysman@OEMR> col host_name for a50
sysman@OEMR>
sysman@OEMR> select entity_name,entity_type,host_name,
  2         decode (manage_status, 0, 'Ignored',
  3                                1, 'Not managed yet',
  4                                2, 'Managed',
  5                                3, 'Managed target component',
  6                                   'Unknown') "Manage Status",
  7         decode (promote_status, 0, 'Cannot promote (existence only entity)',
  8                                1, 'Eligible for promotion',
  9                                2, 'Promotion in progress',
 10                                3, 'Promoted',
 11                                   'Unknown') "Promote Status"
 12  from mgmt$manageable_entities
 13  where promote_status=1;

ENTITY_NAME   ENTITY_TYPE          HOST_NAME                Manage Status     Promote Status
------------- -------------------- ------------------------ ----------------- -------------
dbtest        oracle_database      host01.lab.dbaplus.ca    Ignored           Eligible for promotion
db02          oracle_database      host01.lab.dbaplus.ca    Not managed yet   Eligible for promotion

Two databases are discovered. dbtest is ignored because it was created for temporary testing and does not need to be promoted; db02 is a candidate for promotion.

Sunday, June 20, 2021

Script Run datapatch against all running instance in parallel

This script can be used to run datapatch against all currently running instances in parallel to apply SQL patches after patches are applied to the Oracle homes. It is helpful for post-patching operations. It works for instances running out of different Oracle homes, including homes with different owners.

The script accepts one optional parameter, the log file path. If no parameter is given when the script is started, it saves log files under /tmp.

For the script to succeed, be aware of the following limitations,

1. The script is only tested on Linux and AIX; it does not work on Solaris.

2. Only root or the Oracle database home owner should run this script. If the instances are running out of different Oracle homes which are owned by different OS users, root is recommended. Otherwise, instances running out of an Oracle home whose owner is different from the current user will be excluded.

3. The Oracle database home must be 12c or higher, which supports datapatch.

4. The script runs the following command as root to retrieve the Oracle home path
   /bin/ls
   Therefore, if the Oracle home owner (normally oracle) runs this script, sudo has to be configured to let that user run '/bin/ls' as root without being asked for a password.
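As an illustration of the first thing such a script has to do, the sketch below lists the SIDs of running database instances from the process list. It is an assumption about one common approach (matching the ora_pmon_<SID> background process), not necessarily how the script described here does it; the function name running_sids is hypothetical.

```shell
#!/bin/sh
# running_sids (hypothetical helper): read process command lines on stdin and
# print the SID of every database instance, identified by its ora_pmon_<SID>
# background process. ASM instances (asm_pmon_...) are not matched.
running_sids() {
    sed -n 's/^ *ora_pmon_\([^ ]*\) *$/\1/p' | sort -u
}

# Usage on a live host (not executed here):
#   ps -eo args | running_sids
```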

Friday, June 18, 2021

Install PostgreSQL on RHEL/OL/CentOS 6/7/8

There are many ways to install PostgreSQL on a Linux box. Here, I am going to use yum to install different versions of PostgreSQL on Red Hat Enterprise Linux (RHEL) 7/8; the same steps also work for Oracle Linux (OL) / CentOS 7/8.

1. Install the yum repository for PostgreSQL

Depending on the OS version, run one of the following commands
  * RHEL/OL/CentOS 8
  yum install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-8-x86_64/pgdg-redhat-repo-latest.noarch.rpm

  * RHEL/OL/CentOS 7
  yum install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm

  * RHEL/OL/CentOS 6
  yum install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-6-x86_64/pgdg-redhat-repo-latest.noarch.rpm

2. Validate the yum repository list with the following command

  yum repolist

The command lists all configured repositories. Example output on RHEL 8
repo id                repo name
pgdg-common            PostgreSQL common RPMs for RHEL/CentOS 8 - x86_64
pgdg10                 PostgreSQL 10 for RHEL/CentOS 8 - x86_64
pgdg11                 PostgreSQL 11 for RHEL/CentOS 8 - x86_64
pgdg12                 PostgreSQL 12 for RHEL/CentOS 8 - x86_64
pgdg13                 PostgreSQL 13 for RHEL/CentOS 8 - x86_64
pgdg96                 PostgreSQL 9.6 for RHEL/CentOS 8 - x86_64

This configured the PostgreSQL 9.6, 10, 11, 12 and 13 repositories on RHEL 8, so I can install any or all of these PostgreSQL versions. Note that the available PostgreSQL versions differ between OS versions.

3. Install PostgreSQL

Depending on which version you want to install, run one of the following commands,

  * Install PostgreSQL 13
  yum install -y postgresql13-server

  * Install PostgreSQL 12
  yum install -y postgresql12-server

  * Install PostgreSQL 11
  yum install -y postgresql11-server

  * Install PostgreSQL 10
  yum install -y postgresql10-server

  * Install PostgreSQL 9.6
  yum install -y postgresql96-server

4. Install package postgresql-contrib

This package contains various extension modules that are included in the PostgreSQL distribution. Depending on the version of PostgreSQL installed, run one of the following commands to install the contrib package,

  * PostgreSQL 13
  yum install -y postgresql13-contrib

  * PostgreSQL 12
  yum install -y postgresql12-contrib

  * PostgreSQL 11
  yum install -y postgresql11-contrib

  * PostgreSQL 10
  yum install -y postgresql10-contrib

  * PostgreSQL 9.6
  yum install -y postgresql96-contrib
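The repository URLs and package names above follow a regular pattern, so they can be generated instead of typed. This is a minimal sketch based only on the names listed in this post; the function names pgdg_repo_rpm_url and pg_packages are hypothetical.

```shell
#!/bin/sh
# pgdg_repo_rpm_url <el_major> (hypothetical helper): print the PGDG repository
# RPM URL for an Enterprise Linux major release (6, 7 or 8 in this post).
pgdg_repo_rpm_url() {
    echo "https://download.postgresql.org/pub/repos/yum/reporpms/EL-$1-x86_64/pgdg-redhat-repo-latest.noarch.rpm"
}

# pg_packages <version> (hypothetical helper): print the server and contrib
# package names for a PostgreSQL version; the dot is dropped, so 9.6 becomes 96.
pg_packages() {
    v=$(printf '%s' "$1" | tr -d '.')
    echo "postgresql${v}-server postgresql${v}-contrib"
}

# Usage as root (not executed here):
#   yum install -y "$(pgdg_repo_rpm_url 8)"
#   yum install -y $(pg_packages 13)
```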

Thursday, June 17, 2021

Java Stored Procedure failed with java.lang.OutOfMemoryError

An Oracle database Java stored procedure failed with "java.lang.OutOfMemoryError"; the error stack looks like
ORA-29532: Java call terminated by uncaught Java exception: java.lang.OutOfMemoryError
ORA-06512: at "USER01.MYJAVAPROC", line 28
ORA-06512: at line 7

This error is thrown when there is insufficient space to allocate an object in the Java heap. In this case, the Java garbage collector cannot make space available to accommodate a new object, and the heap cannot be expanded further.

As we know, a Java stored procedure is implemented in Java, so its execution follows the same rules as a normal Java application. The only difference is that the procedure runs on the JVM built into the database, not on a normal Java runtime (java.exe).

When a normal Java application gets an out of memory error, the -Xmx parameter can be used when starting the application to configure a larger heap. However, as the Oracle Database JVM runs in the process space of the Oracle executable, there is no way to use the -Xmx parameter. The heap size is still configurable through methods of the "Java Runtime" class "oracle.aurora.vm.OracleRuntime". The class has the following methods,

    getMaxMemorySize   - Get current setting of heap size
    setMaxMemorySize   - Set new heap size

To call these methods inside the database, we have to create Java stored functions that expose them to the Oracle database; example code follows
create or replace function get_java_heap_size return number is
    language java name 'oracle.aurora.vm.OracleRuntime.getMaxMemorySize() returns long';
/

create or replace function set_java_heap_size(mem_size number) return number is
    language java name 'oracle.aurora.vm.OracleRuntime.setMaxMemorySize(long) returns long';
/

The out of memory error can be fixed by calling the Java stored function set_java_heap_size before running the application Java stored procedure which triggers the error. For example
declare
  heap_size number;
begin
  -- Set heap size to 1GB
  heap_size := set_java_heap_size(1024*1024*1024);
  -- Run application Java stored procedure
  user01.myjavaproc;
end;
/

Wednesday, June 16, 2021

Oracle 19.11 roothas.sh failed with "Out of memory" on AIX

When applying Oracle GI Release Update 19.11.0.0.210420 on AIX, the command "roothas.sh -postpatch" failed with "Out of memory" as follows
[root@host01]# /u01/app/oracle/product/19.0.0/grid_1/crs/install/roothas.sh -postpatch
Using configuration parameter file: /u01/app/oracle/product/19.0.0/grid_1/crs/install/crsconfig_params
The log of current session can be found at:
  /u01/app/oracle/crsdata/host01/crsconfig/hapatch_2021-06-15_01-53-27PM.log
Out of memory!
Out of memory!
Out of memory!
/u01/app/oracle/product/19.0.0/grid_1/crs/install/roothas.sh[137]: 7930494 Segmentation fault(coredump)
The command '/u01/app/oracle/product/19.0.0/grid_1/perl/bin/perl -I/u01/app/oracle/product/19.0.0/grid_1/perl/lib -I/u01/app/oracle/product/19.0.0/grid_1/crs/install -I/u01/app/oracle/product/19.0.0/grid_1/xag /u01/app/oracle/product/19.0.0/grid_1/crs/install/roothas.pl -postpatch' execution failed

The error comes from the perl process (script roothas.pl) that executes the root configuration scripts; the perl process does not have enough memory (especially data segments) during script execution.

Technically, the "Out of memory" error can be seen on AIX when running any of the Oracle 19c perl scripts for root configuration. These perl scripts are usually called by the DBA through sh scripts such as root.sh, rootupgrade.sh, rootcrs.sh and roothas.sh. Although the error was rarely seen before, it has become common since Oracle GI RU 19.11 was released.

It is caused by the OS memory allocation method. On AIX, the number of data segments that a process is allowed to use limits the process memory size. The default number of data segments is one, and the size of a data segment is 256 MB. Data segments are shared by both data and stack. The maximum number of additional data segments a process can use is eight (2 GB). The number of segments that a process can use for data is controlled by the LDR_CNTRL environment variable, which is defined in the parent process of the process that is to be affected.

Therefore, we can fix the issue by increasing the process memory size through the LDR_CNTRL environment variable. For example, the following defines eight additional data segments

export LDR_CNTRL=MAXDATA=0x80000000
<root script>
unset LDR_CNTRL

Here,
  <root script> is the script you have to run as root which got "Out of memory". In my case, it is "roothas.sh -postpatch".
  The unset command removes the LDR_CNTRL environment variable, so that it does not unintentionally affect other processes.

Some may argue that eight additional data segments (2 GB) is too large; in that case you can set four (1 GB) as follows

export LDR_CNTRL=MAXDATA=0x40000000
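The MAXDATA value is simply the number of additional segments multiplied by the 256 MB segment size, written in hexadecimal. The following sketch does that arithmetic; the function name maxdata_for_segments is hypothetical.

```shell
#!/bin/sh
# maxdata_for_segments <n> (hypothetical helper): print the LDR_CNTRL value
# for n additional data segments, where n is between 1 and 8.
maxdata_for_segments() {
    n=$1
    if [ "$n" -lt 1 ] || [ "$n" -gt 8 ]; then
        echo "segment count must be between 1 and 8" >&2
        return 1
    fi
    # One data segment is 256 MB = 268435456 bytes (0x10000000)
    printf 'MAXDATA=0x%X\n' $(( n * 268435456 ))
}

# Usage (not executed here):
#   export LDR_CNTRL=$(maxdata_for_segments 4)
#   <root script>
#   unset LDR_CNTRL
```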

Tuesday, June 15, 2021

Configure yum with proxy server on RHEL/OL/Fedora

When using yum to maintain packages on hosts running Red Hat Enterprise Linux (RHEL), Oracle Linux (OL) or Fedora, you may have to configure the proxy options if the host is behind a firewall.

To do that, add the following lines to the yum configuration file /etc/yum.conf

proxy=http://<proxy server>:<port>
proxy_username=<user name>
proxy_password=<password>

Here, 

  <proxy server> is the host name or IP address of the proxy server
  <port>         is the port the proxy server listens on
  <user name>    is the user name, if the proxy server requires one
  <password>     is the password, if the proxy server requires one

Example of file /etc/yum.conf
gpgcheck=1
installonly_limit=3
clean_requirements_on_remove=True
best=True
skip_if_unavailable=False
proxy=http://proxy.dbaplus.ca:8080
proxy_username=user01
proxy_password=user01pwd
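The edit can also be scripted. The sketch below replaces any existing proxy settings in a configuration file and appends the new ones at the end. It assumes the file contains only the main section, as in the example above, and the function name set_yum_proxy is hypothetical. Try it on a copy of the file first.

```shell
#!/bin/sh
# set_yum_proxy <conf_file> <proxy_url> [user] [password] (hypothetical helper)
# Assumes <conf_file> holds only the main section, so appended lines land in it.
set_yum_proxy() {
    conf=$1
    url=$2
    user=${3:-}
    pass=${4:-}
    tmp=$(mktemp) || return 1
    # Keep every line except existing proxy settings
    grep -v -E '^proxy(_username|_password)?=' "$conf" > "$tmp" || true
    echo "proxy=$url" >> "$tmp"
    if [ -n "$user" ]; then echo "proxy_username=$user" >> "$tmp"; fi
    if [ -n "$pass" ]; then echo "proxy_password=$pass" >> "$tmp"; fi
    # Overwrite in place so the owner and mode of the original file are kept
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}

# Usage as root (not executed here):
#   cp -p /etc/yum.conf /etc/yum.conf.bak
#   set_yum_proxy /etc/yum.conf http://proxy.dbaplus.ca:8080 user01 user01pwd
```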

Thursday, June 3, 2021

How to Permanently Change PowerShell Prompt

The PowerShell command prompt indicates that PowerShell is ready to run a command,
PS C:\>

It is determined by the built-in Prompt function and can be customized by running the following command to redefine the Prompt function,

  function prompt {"<Script block>"}

Here, <Script block> is a script block which generates the prompt value (a character string). For example, the following makes the prompt include the current date and time
PS C:\>
PS C:\> function Prompt {"$(Get-Date)> "}
06/03/2021 11:39:23>
06/03/2021 11:39:25>

The change is only valid for the current session; the prompt of a new session is still the default. To keep the new prompt for all sessions, you have to create your own Prompt function and save it in your PowerShell profile as follows,

1. Find your PowerShell profile file with "$profile"
PS C:\> $profile
C:\Users\admin\Documents\WindowsPowerShell\Microsoft.PowerShell_profile.ps1
PS C:\>

2. Add the customized Prompt function to the profile

If the file or directory does not exist, you can create it manually, then add the customized Prompt function to the file. For example, add the following line
function Prompt { "PS [" + ${ENV:USERNAME} + "@" + ${ENV:COMPUTERNAME} + "]> "}

The prompt will be in the format "PS [user-name@computer-name]> " in all new PowerShell sessions.

Sunday, May 16, 2021

Grid Infrastructure 12.2 restore OCR, Voting File and mgmtdb when disk group corrupted

For some reason, Oracle desupported the placement of OCR and voting files directly on a shared file system starting with Grid Infrastructure 12.2, and only rescinded the desupport for Standalone Clusters in 19.3. Therefore, when GI 12.2 is installed, the OCR, voting files and OCR backup location are, by default, configured in an ASM disk group, most likely the same disk group. When that disk group cannot be mounted for any reason, the cluster cannot be brought up anymore.

Technically, GI has to be reconfigured like a new installation, which could be a big job. However, it may not be that bad. Although 12.2 does not allow the OCR backup location outside an ASM disk group, the DBA can still copy the OCR backup file from the ASM disk group to a file system with the command 'asmcmd cp'. I am going to demonstrate how to maximize the chance of bringing the cluster back up without rebuilding or reinstalling it.

Sunday, May 9, 2021

Windows How to log all output on the console to text file

The Windows Command Prompt console does not have a built-in facility to log console output to a file. To get logging, PowerShell has to be used instead of the normal command console.

To enable logging, run the following command at the PowerShell prompt,

  Start-Transcript [-Path] "<file-name>" [-Append]

Here, 

  -Path specifies "<file-name>", the full path of the log file in which output messages will be saved

  -Append indicates that the logging text will be added to the end of an existing file instead of overwriting it.

To stop logging, run the command

  Stop-Transcript

Note: PowerShell can be started on Windows by clicking the "Windows PowerShell" application or by running the command "PowerShell" in a normal "Command Prompt" console, but DO NOT start PowerShell in "Windows Terminal". If PowerShell is started from "Windows Terminal", Start-Transcript may not be able to log everything. For example, the output of commands that are not built into Windows (e.g. sqlplus) cannot be completely logged.
  
Example

PS> Start-Transcript -Path "C:\temp\test.log"
Transcript started, output file is C:\temp\test.log
PS>
PS> sqlplus system/oracle@orcl

SQL*Plus: Release 12.2.0.1.0 Production on Sun May 9 20:30:15 2021

Copyright (c) 1982, 2016, Oracle.  All rights reserved.

Last Successful login time: Tue Apr 13 2021 16:46:29 -04:00

Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production

system@orcl> select host_name from v$instance;

HOST_NAME
----------------------------------------------------------------
host01

system@orcl> exit
Disconnected from Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
PS>
PS> Stop-Transcript
Transcript stopped, output file is C:\temp\test.log

The content of file C:\temp\test.log

**********************
Windows PowerShell transcript start
Start time: 20210509202857
Username: dbaplus
RunAs User: dbaplus
Configuration Name: 
Machine: wkstn01 (Microsoft Windows NT 10.0.19042.0)
Host Application: C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe
Process ID: 15732
PSVersion: 5.1.19041.906
PSEdition: Desktop
PSCompatibleVersions: 1.0, 2.0, 3.0, 4.0, 5.0, 5.1.19041.906
BuildVersion: 10.0.19041.906
CLRVersion: 4.0.30319.42000
WSManStackVersion: 3.0
PSRemotingProtocolVersion: 2.3
SerializationVersion: 1.1.0.1
**********************
Transcript started, output file is C:\temp\test.log
PS> sqlplus system/oracle@orcl

SQL*Plus: Release 12.2.0.1.0 Production on Sun May 9 20:30:15 2021

Copyright (c) 1982, 2016, Oracle.  All rights reserved.

Last Successful login time: Tue Apr 13 2021 16:46:29 -04:00

Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production

system@orcl> select host_name from v$instance;

HOST_NAME
----------------------------------------------------------------
host01

system@orcl> exit
Disconnected from Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
PS> Stop-Transcript
**********************
Windows PowerShell transcript end
End time: 20210509203043
**********************

Wednesday, May 5, 2021

OEM 12c/13c Database Target Discovered and Named with host name as suffix

When EM discovers database targets, it generates a default target name for each database in one of the following formats,

  db_unique_name.db_domain   if both parameters db_unique_name and db_domain are set
  db_unique_name             if parameter db_unique_name is set, but db_domain is not set
  db_name.db_domain          if parameter db_unique_name is not set, but db_domain is set
  db_name                    if neither db_unique_name nor db_domain is set
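The precedence in the list above can be sketched as a small shell function. This is only an illustration of the rules described here, not the actual EM discovery script; the function name em_default_target_name is hypothetical, and an empty string stands for a parameter that is not set.

```shell
# Hypothetical helper: mimics the default-name precedence described above.
# Usage: em_default_target_name <db_name> <db_unique_name> <db_domain>
# Pass "" for a parameter that is not set.
em_default_target_name() {
    name="${2:-$1}"              # prefer db_unique_name, fall back to db_name
    if [ -n "$3" ]; then
        printf '%s.%s\n' "$name" "$3"
    else
        printf '%s\n' "$name"
    fi
}

em_default_target_name orcl orcl_tor dbaplus.ca   # prints orcl_tor.dbaplus.ca
em_default_target_name orcl "" ""                 # prints orcl
```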

The EM discovery script retrieves the values of these parameters from the parameter file (pfile or spfile) of the database, not from the running instance. If the discovery process fails to locate or process the parameter file, EM names the database target in the format,

  <sid>_<hostname>
  
Therefore, when a newly discovered database target is named in this format, it means the EM agent discovery script had a problem with the parameter file. The details can be found in the agent trace file "<AGENT_INST_HOME>/sysman/log/emagent_perl.trc". The most common error messages look like

  ERROR:  initParameterFileUtl::convertSPFileToPFile: Failed to convert spfile
  
or

  ERROR:  initParameterFileUtl::getParameterFile: Cannot find any init parameter file for instance <instancename> in oracle home  <oracle_home>

For example, when Oracle Restart (standalone Grid Infrastructure) is installed and ASM storage is configured on server host1.dbaplus.ca, and DBCA creates a database orcl (sid & db_name) in an ASM diskgroup, the spfile is also saved in the ASM diskgroup and no parameter file (pfile/spfile) is created under directory <ORACLE_HOME>/dbs. The database will be discovered by EM with the default name orcl_host1.dbaplus.ca, and you will see errors in the agent trace file "emagent_perl.trc",

oracledb.pl: 2021-05-04 04:03:08,923: INFO:  DB_LISTENER_DISCOVERY:  processing sid="orcl"
oracledb.pl: 2021-05-04 04:03:08,926: ERROR:  initParameterFileUtl::getParameterFile: Cannot find any init parameter file for instance orcl in oracle home /u01/app/oracle/product/19.9.0/dbhome_1/dbs
oracledb.pl: 2021-05-04 04:03:08,930: ERROR:  initParameterFileUtl::getParameterFile: Cannot find any init parameter file for instance orcl in oracle home /u01/app/oracle/product/19.9.0/dbhome_1/dbs
oracledb.pl: 2021-05-04 04:03:09,065: ERROR:  initParameterFileUtl::getParameterFile: Cannot find any init parameter file for instance orcl in oracle home /u01/app/oracle/product/19.9.0/dbhome_1/dbs
oracledb.pl: 2021-05-04 04:03:09,069: ERROR:  initParameterFileUtl::getParameterFile: Cannot find any init parameter file for instance orcl in oracle home /u01/app/oracle/product/19.9.0/dbhome_1/dbs

The reason is that the discovery script cannot find a parameter file in <ORACLE_HOME>/dbs. To fix this problem, create a pfile 'initorcl.ora' under <ORACLE_HOME>/dbs with the following content,

   spfile='<full path of spfile saved in diskgroup>'

Note: DO NOT leave any SPACE character at the beginning of the line (before the word "spfile"). If you do, you will not see any errors in the trace file, but the target name will still be <sid>_<hostname>.

Sample init file

$ srvctl config database -db orcl | grep spfile
Spfile: +DATA/orcl/PARAMETERFILE/spfile.919.1071658047
$
$ echo "spfile='+DATA/orcl/PARAMETERFILE/spfile.919.1071658047'" > $ORACLE_HOME/dbs/initorcl.ora
$
$ cat $ORACLE_HOME/dbs/initorcl.ora
spfile='+DATA/orcl/PARAMETERFILE/spfile.919.1071658047'
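Because a leading space fails silently, it may be worth checking the file after creating it. The sketch below assumes it is enough to confirm that a line starts with "spfile" in the first column; the function name check_init_ora is hypothetical, and the demo runs on throwaway files rather than on the real pfile.

```shell
# Hypothetical check: succeed only if the file has an spfile entry
# that starts in the first column (no leading spaces or tabs).
check_init_ora() {
    grep -qi '^spfile' "$1"
}

# Demo on throwaway files, so nothing under $ORACLE_HOME is touched
good=$(mktemp); bad=$(mktemp)
echo "spfile='+DATA/orcl/PARAMETERFILE/spfile.919.1071658047'"  > "$good"
echo " spfile='+DATA/orcl/PARAMETERFILE/spfile.919.1071658047'" > "$bad"
check_init_ora "$good" && echo "good file: OK"
check_init_ora "$bad"  || echo "bad file: whitespace before spfile"
rm -f "$good" "$bad"
```

On a real host the call would be check_init_ora $ORACLE_HOME/dbs/initorcl.ora.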

Sunday, May 2, 2021

Oracle 19.11 deinstall failed with "ERROR: oracle/rat/tfa/util/ManageTfa"

After Release Update 19.11.0.0.210420 is applied to an Oracle Database 19c home, the home can no longer be deinstalled.

The deinstall utility fails with the following errors,
######################## DEINSTALL CLEAN OPERATION START ########################
## [START] Preparing for Deinstall ##
Setting LOCAL_NODE to thanos
Setting CRS_HOME to false
Setting oracle.installer.invPtrLoc to /tmp/deinstall2021-05-02_07-34-15PM/oraInst.loc
Setting oracle.installer.local to false

ERROR: oracle/rat/tfa/util/ManageTfa
Exited from program.


############# ORACLE DEINSTALL TOOL END #############

In deinstall error log,
java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at oracle.install.db.deinstall.wrapper.Deinstall.callCleanImpl(Deinstall.java:1876)
        at oracle.install.db.deinstall.wrapper.Deinstall.main(Deinstall.java:907)
Caused by: java.lang.NoClassDefFoundError: oracle/rat/tfa/util/ManageTfa
        at oracle.install.db.deinstall.core.PrepForOUIDeinstall.cleanConfig(PrepForOUIDeinstall.java:187)
        ... 6 more
Caused by: java.lang.ClassNotFoundException: oracle.rat.tfa.util.ManageTfa
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        ... 7 more

The stack trace shows that class oracle.rat.tfa.util.ManageTfa cannot be found, so it looks like something related to TFA is missing after RU 19.11 is applied.

Workaround,

Roll back RU 19.11, then re-run deinstall utility.

Oracle 19c runInstaller failed with "undefined reference to 'jox_eujs_nowait_'"

When installing Oracle 19c while applying Database RU 19.11 and OJVM RU 19.11 at the same time, runInstaller failed with the message

   Error in invoking target 'irman ioracle idrdactl idrdalsnr idrdaproc' of makefile '/u01/app/oracle/product/19.11.0/dbhome_1/rdbms/lib/ins_rdbms.mk'. See /u01/app/oraInventory/logs/InstallActions2021-05-01_08-40-24PM/installActions2021-05-01_08-40-24PM.log for details.

The error can be reproduced on Linux x86_64 (RU 19.11 is not released for other platforms yet) as follows,

  1. Unzip 19c base installation media 19.3 to a newly created home

  2. Under the new home, replace the original OPatch 12.2.0.1.17 with OPatch 12.2.0.1.24

  3. Unzip 19c Database Release Update 19.11.0.0.210420 (Patch 32545013) to a separate directory /stage/ru

  4. Unzip 19c Oracle JavaVM Component Release Update 19.11.0.0.210420 (Patch 32399816) to a separate directory /stage/ojvm

  5. Start runInstaller from the new home with the -applyRU and -applyOneOffs options

     ./runInstaller -applyRU /stage/ru/32545013 -applyOneOffs /stage/ojvm/32399816

In the installation log file (file name shown by runInstaller),

INFO:
 - Linking Oracle
INFO:
rm -f /u01/app/oracle/product/19.11.0/dbhome_1/rdbms/lib/oracle
INFO:
   ... ...
INFO:
/u01/app/oracle/product/19.11.0/dbhome_1/lib//libserver19.a(joxwtp.o): In function `jox_eujs_nowait':
joxwtp.c:(.text+0xf7b): undefined reference to `jox_eujs_nowait_'

INFO:
make: *** [/u01/app/oracle/product/19.11.0/dbhome_1/rdbms/lib/oracle] Error 1
INFO: End output from spawned process.
INFO: ----------------------------------
INFO: Exception thrown from action: make
Exception Name: MakefileException
Exception String: Error in invoking target 'irman ioracle idrdactl idrdalsnr idrdaproc' of makefile '/u01/app/oracle/product/19.11.0/dbhome_1/rdbms/lib/ins_rdbms.mk'. See '/u01/app/oraInventory/logs/InstallActions2021-05-01_08-40-24PM/installActions2021-05-01_08-40-24PM.log' for details.
Exception Severity: 1
INFO: Error in invoking target 'irman ioracle idrdactl idrdalsnr idrdaproc' of makefile '/u01/app/oracle/product/19.11.0/dbhome_1/rdbms/lib/ins_rdbms.mk'. See '/u01/app/oraInventory/logs/InstallActions2021-05-01_08-40-24PM/installActions2021-05-01_08-40-24PM.log' for details.

It has been reported as a bug. Until the bug fix is released, the workaround is to run the installer without OJVM RU 19.11 and apply OJVM RU 19.11 separately after installation.
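To confirm quickly that a failed installation hit this particular problem and not another link error, the installation log can be searched for the unresolved symbol. The helper below is a sketch; the function name has_jox_link_error is hypothetical, and the only thing it relies on is the linker message shown in the log above.

```shell
# Hypothetical helper: succeed if an installer log contains the
# unresolved jox_eujs_nowait_ symbol reported by the linker.
# "." stands in for the quote character in front of the symbol name.
has_jox_link_error() {
    grep -q 'undefined reference to .jox_eujs_nowait_' "$1"
}

# Usage (log file name as printed by runInstaller):
#   has_jox_link_error <installActions log> && echo "OJVM link error found"
```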

Saturday, May 1, 2021

OEM 13c Software Standardization Advisor Returns Collection Errors

In the EM 13c console, access "Software Standardization Advisor" as follows,

 Targets -> Databases -> Administration -> Software Standardization Advisor
 
You may see a number beside "Collection Errors" for database and/or Grid Infrastructure homes. The issue can be found in EM 13.3, 13.4 and 13.5.

Click the number beside "Collection Errors" to list all Oracle home targets which have "Metric Collection Errors", with their Path, Host and Owner information.

Click the name of an Oracle Home target to show the home page of the target. At the bottom of the "Summary" section, click the link "Found 1 Metric Collection Error(s)" beside "Reason".

The metric "Files affected by a Patch" has the following "Message"

     java.lang.UnsupportedOperationException: Collection Result Maximum Flood Control Level Exceeded

The issue can be reproduced / verified by running the following command on the host where the Oracle home exists,

  <AGENT_HOME>/bin/emctl control agent runCollection <Target name of the Oracle home>:oracle_home oracle_home_config

$ emctl control agent runCollection OraHome12_host01:oracle_home oracle_home_config
Oracle Enterprise Manager Cloud Control 13c Release 5  
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
---------------------------------------------------------------
EMD runCollection completed successfully

The following lines are found in the log file "<AGENT_INST_HOME>/sysman/log/gcagent.log"
2021-04-29 20:25:18,725 [668:D9EFD971:HTTPListener--668 (DispatchRequests emdctl@28474@amber.corp.toronto.ca=>[161974231840001])] INFO - >>> Dispatching request: RunCollectionRequest <<<
2021-04-29 20:25:22,986 [1129:A9983BB8:GC.OnDemand.8 (oracle_home:OraHome12_host01:Real-time collection oracle_home_config) (oracle_home:OraHome12_host01:oracle_home_config:PatchFixedBug)] WARN - Result set exceeding min flood control level
2021-04-29 20:25:22,992 [1129:A9983BB8:GC.OnDemand.8 (oracle_home:OraHome12_host01:Real-time collection oracle_home_config) (oracle_home:OraHome12_host01:oracle_home_config:PatchFixedBug)] WARN - Result set exceeding min flood control level
2021-04-29 20:25:24,398 [1129:A9983BB8:GC.OnDemand.8 (oracle_home:OraHome12_host01:Real-time collection oracle_home_config) (oracle_home:OraHome12_host01:oracle_home_config:PatchFixedBug)] WARN - Result set exceeding min flood control level
2021-04-29 20:25:24,406 [1129:GC.OnDemand.8 (oracle_home:OraHome12_host01:Real-time collection oracle_home_config) (oracle_home:OraHome12_host01:oracle_home_config:PatchedFile)] ERROR - Result set exceeded max flood control level
2021-04-29 20:25:24,408 [1129:GC.OnDemand.8 (oracle_home:OraHome12_host01:Real-time collection oracle_home_config) (oracle_home:OraHome12_host01:oracle_home_config:PatchedFile)] ERROR - oracle_home:OraHome12_host01:oracle_home_config:PatchedFile
java.lang.UnsupportedOperationException: Collection Result Maximum Flood Control Level Exceeded
at oracle.sysman.emSDK.agent.datacollection.CollectionResult.performFloodControl(CollectionResult.java:459)
at oracle.sysman.emSDK.agent.datacollection.CollectionResult.addCollectionRow(CollectionResult.java:662)
at oracle.sysman.gcagent.addon.fetchlet.osfetchlet.BaseOSFetchlet.getOSMetric(BaseOSFetchlet.java:1157)
at oracle.sysman.gcagent.addon.fetchlet.osfetchlet.BaseOSFetchlet.getMetric(BaseOSFetchlet.java:476)
at oracle.sysman.gcagent.target.interaction.execution.FetchletFactory.getMetric(FetchletFactory.java:437)
at oracle.sysman.gcagent.target.interaction.execution.ExecuteTask.executeQueryDescriptor(ExecuteTask.java:1284)
at oracle.sysman.gcagent.target.interaction.execution.ExecuteTask.runTask(ExecuteTask.java:3167)

From the log, we can find two problems:

  1. When collecting metric PatchFixedBug (Bugs fixed by Patch), the agent logged the WARN message "Result set exceeding min flood control level"
  2. When collecting metric PatchedFile (Files affected by Patch), the agent logged the ERROR message "Result set exceeded max flood control level"
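On a busy agent, the same two findings can be pulled out of gcagent.log with a short filter. This is a sketch that assumes the message layout shown in the log excerpt above; the function name flood_control_summary is hypothetical, and the sample lines in the demo are shortened copies of the real ones.

```shell
# Hypothetical helper: count flood control messages per severity and metric.
# Output columns: severity, metric, limit (min/max), number of messages.
flood_control_summary() {
    sed -n 's/.*:\([A-Za-z0-9_]*\))] \([A-Z]*\) - Result set exceed[a-z]* \([a-z]*\) flood control level.*/\2 \1 \3/p' "$1" |
        sort | uniq -c | awk '{print $2, $3, $4, $1}'
}

# Demo on shortened sample lines; on a real agent, pass
# <AGENT_INST_HOME>/sysman/log/gcagent.log instead.
sample=$(mktemp)
cat > "$sample" <<'EOF'
2021-04-29 20:25:22,986 [1129:GC.OnDemand.8 (oracle_home:OraHome12_host01:oracle_home_config:PatchFixedBug)] WARN - Result set exceeding min flood control level
2021-04-29 20:25:24,406 [1129:GC.OnDemand.8 (oracle_home:OraHome12_host01:oracle_home_config:PatchedFile)] ERROR - Result set exceeded max flood control level
EOF
flood_control_summary "$sample"
rm -f "$sample"
```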

The ERROR can also be confirmed by collecting the specific metric "Files affected by Patch" with the command

  <AGENT_HOME>/bin/emctl getmetric agent <Oracle Home target name>,oracle_home,PatchedFile

$ emctl getmetric agent OraHome12_host01,oracle_home,PatchedFile
Oracle Enterprise Manager Cloud Control 13c Release 5
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
EMD getmetric error: Collection Result Maximum Flood Control Level Exceeded


Solution

1.  Agent side settings:

  * Set Min flood control value "CollectionResults.MaximumRowsFloodControlMin" to remove WARN message

    <AGENT_HOME>/bin/emctl setproperty agent -allow_new -name CollectionResults.MaximumRowsFloodControlMin -value 5000

  * Set Max flood control value "CollectionResults.MaximumRowsFloodControlMax" to fix ERROR issue
  
    <AGENT_HOME>/bin/emctl setproperty agent -allow_new -name CollectionResults.MaximumRowsFloodControlMax -value 50000

  * Verify the new value of properties with commands
  
    <AGENT_HOME>/bin/emctl getproperty agent -name CollectionResults.MaximumRowsFloodControlMin
    <AGENT_HOME>/bin/emctl getproperty agent -name CollectionResults.MaximumRowsFloodControlMax

    Or review the agent property file "<AGENT_INST_HOME>/sysman/config/emd.properties"

    grep 'CollectionResults' <AGENT_INST_HOME>/sysman/config/emd.properties

  * Manually start metric collection to reset the error status, or wait until the next scheduled metric collection job runs (by default, every 24 hours)

    <AGENT_HOME>/bin/emctl control agent runCollection <Oracle home target name>:oracle_home oracle_home_config

  * In case you change your mind (or want to see the errors again), the new property values can be cleared/removed with the commands

    <AGENT_HOME>/bin/emctl clear_property agent -name CollectionResults.MaximumRowsFloodControlMin
    <AGENT_HOME>/bin/emctl clear_property agent -name CollectionResults.MaximumRowsFloodControlMax

Sample output
$ emctl getproperty agent -name CollectionResults.MaximumRowsFloodControlMin
Oracle Enterprise Manager Cloud Control 13c Release 5
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
CollectionResults.MaximumRowsFloodControlMin is not a valid configuration property

$ emctl getproperty agent -name CollectionResults.MaximumRowsFloodControlMax
Oracle Enterprise Manager Cloud Control 13c Release 5
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
CollectionResults.MaximumRowsFloodControlMax is not a valid configuration property

$ emctl setproperty agent -allow_new -name CollectionResults.MaximumRowsFloodControlMin -value 5000
Oracle Enterprise Manager Cloud Control 13c Release 5
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
EMD setproperty succeeded

$ emctl setproperty agent -allow_new -name CollectionResults.MaximumRowsFloodControlMax -value 50000
Oracle Enterprise Manager Cloud Control 13c Release 5
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
EMD setproperty succeeded

$ emctl getproperty agent -name CollectionResults.MaximumRowsFloodControlMin
Oracle Enterprise Manager Cloud Control 13c Release 5
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
CollectionResults.MaximumRowsFloodControlMin=5000

$ emctl getproperty agent -name CollectionResults.MaximumRowsFloodControlMax
Oracle Enterprise Manager Cloud Control 13c Release 5
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
CollectionResults.MaximumRowsFloodControlMax=50000

$ grep 'CollectionResults' /u01/app/oracle/em13.5/agent/agent_inst/sysman/config/emd.properties
CollectionResults.MaximumRowsFloodControlMin=5000
CollectionResults.MaximumRowsFloodControlMax=50000
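When several agents need the same check, reading the two values straight from the property file can be scripted. The sketch below is one way to do it; the function name get_emd_property is hypothetical, and the demo reads a throwaway file that copies the two lines shown above.

```shell
# Hypothetical helper: print the value of one property from an agent
# property file, or nothing if the property is not set.
# $1 = property file, $2 = property name
get_emd_property() {
    awk -F= -v key="$2" '$1 == key { sub(/^[^=]*=/, ""); print; exit }' "$1"
}

# Demo on a throwaway copy; on a real agent, point it at
# <AGENT_INST_HOME>/sysman/config/emd.properties instead.
props=$(mktemp)
cat > "$props" <<'EOF'
CollectionResults.MaximumRowsFloodControlMin=5000
CollectionResults.MaximumRowsFloodControlMax=50000
EOF
for p in Min Max; do
    name="CollectionResults.MaximumRowsFloodControl$p"
    value=$(get_emd_property "$props" "$name")
    echo "$name=${value:-<not set>}"
done
rm -f "$props"
```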

2. OMS side settings:

  * Setting the min value:
    
    <OMS_HOME>/bin/emctl set property -name CollectionResults.MaximumRowsFloodControlMin -value 5000

  * Setting the max value:
    
    <OMS_HOME>/bin/emctl set property -name CollectionResults.MaximumRowsFloodControlMax -value 50000

  * Verify the new value of properties
  
    <OMS_HOME>/bin/emctl get property -name CollectionResults.MaximumRowsFloodControlMin
    <OMS_HOME>/bin/emctl get property -name CollectionResults.MaximumRowsFloodControlMax
    
  * Delete new properties if they are not needed

    <OMS_HOME>/bin/emctl delete property -name CollectionResults.MaximumRowsFloodControlMin
    <OMS_HOME>/bin/emctl delete property -name CollectionResults.MaximumRowsFloodControlMax

Sample output
[oracle@oms]$ emctl get property -name CollectionResults.MaximumRowsFloodControlMin
Oracle Enterprise Manager Cloud Control 13c Release 5
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
SYSMAN password:
Value for property CollectionResults.MaximumRowsFloodControlMin for oms All Management Servers is null

[oracle@oms]$ emctl get property -name CollectionResults.MaximumRowsFloodControlMax
Oracle Enterprise Manager Cloud Control 13c Release 5
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
SYSMAN password:
Value for property CollectionResults.MaximumRowsFloodControlMax for oms All Management Servers is null

[oracle@oms]$ emctl set property -name CollectionResults.MaximumRowsFloodControlMin -value 5000
Oracle Enterprise Manager Cloud Control 13c Release 5
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
SYSMAN password:
Property CollectionResults.MaximumRowsFloodControlMin has been set to value 5000 for all Management Servers
OMS restart is not required to reflect the new property value

[oracle@oms]$ emctl set property -name CollectionResults.MaximumRowsFloodControlMax -value 50000
Oracle Enterprise Manager Cloud Control 13c Release 5
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
SYSMAN password:
Property CollectionResults.MaximumRowsFloodControlMax has been set to value 50000 for all Management Servers
OMS restart is not required to reflect the new property value

[oracle@oms]$ emctl get property -name CollectionResults.MaximumRowsFloodControlMin
Oracle Enterprise Manager Cloud Control 13c Release 5
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
SYSMAN password:
Value for property CollectionResults.MaximumRowsFloodControlMin at Global level is 5000

[oracle@oms]$ emctl get property -name CollectionResults.MaximumRowsFloodControlMax
Oracle Enterprise Manager Cloud Control 13c Release 5
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
SYSMAN password:
Value for property CollectionResults.MaximumRowsFloodControlMax at Global level is 50000

Sunday, April 25, 2021

OEM 13c EMGC_ADMINSERVER & EMGC_OMS1 target status show DOWN when emctl secure wls with custom certificate

When a custom certificate is configured for OEM 13c, the WebLogic Servers installed as part of Enterprise Manager Cloud Control (Administration Server and Managed Servers) can be secured with the custom certificate using the following command,

   $OMS_HOME/bin/emctl secure wls

However, the WebLogic Servers and their deployments could show down in the OEM console after being secured with the custom certificate, though they are still running well.

The reason is that the CA which issued the custom certificate for OMS is not "well known"; at least, it is not accepted by Oracle as a default trusted CA. When the agent running on the OMS server communicates with the WebLogic Servers (WLS), WLS presents the custom certificate as its own identity, but the agent cannot find the trusted certificates of the issuing CAs in its local keystore. Therefore, the agent cannot validate the WLS certificate and stops communicating with WLS.

The quick fix is to import the certificate of each CA involved in issuing the certificate into the agent's local keystore with the following command,

    $AGENT_HOME/bin/emctl secure add_trust_cert_to_jks -trust_certs_loc <ca_certificate_file> -alias <certificate_alias> [-password <keystore_pwd>]

Here, <certificate_alias> identifies the certificate saved in the keystore and must be unique for each certificate; <keystore_pwd> is the password of the keystore, and its default value is welcome.

For example, I have installed a CA in my lab network, and the CA issued a certificate to my OMS server. Both of my CA certificates (root certificate & intermediate certificate) have to be imported into the agent keystore as follows,

 $AGENT_HOME/bin/emctl stop agent

 $AGENT_HOME/bin/emctl secure add_trust_cert_to_jks -password welcome -alias dbaplus-root -trust_certs_loc /home/oracle/Root_CA_Certificate.txt

 $AGENT_HOME/bin/emctl secure add_trust_cert_to_jks -password welcome -alias dbaplus-intermediate -trust_certs_loc /home/oracle/Intermediate_CA_Certificate.txt

 $AGENT_HOME/bin/emctl start agent

List the certificates imported into agent monitor keystore,

 $AGENT_HOME/jdk/bin/keytool -list -alias <certificate_alias> -keystore   $AGENT_INSTANCE_HOME/sysman/config/montrust/AgentTrust.jks -storepass welcome -v

If needed, a certificate can be removed from the keystore as follows,

 $AGENT_HOME/jdk/bin/keytool -delete -alias <certificate_alias> -keystore   $AGENT_INSTANCE_HOME/sysman/config/montrust/AgentTrust.jks -storepass welcome -v

Saturday, April 24, 2021

OEM 13c Target "EM Jobs Service" shown as down in EM Console while all associated targets are up

"EM Jobs Service" target status shows down in the Enterprise Manager Cloud Control (EM) console even though all associated targets are up and running. It could be an issue with the metric collection definition. It is usually seen after a blackout of the associated targets ends.

The status of EM Jobs Service is an aggregated target status, calculated from the status of the associated targets. The associated targets and the calculation logic are defined by default when the system is installed, and you can change them later.

The issue can be fixed by changing and then restoring the Availability Definition of the service as follows,

1. In EM Console navigate to the following menu

     Targets > Services > Click on "EM Jobs Service" target

2. In "EM Jobs Service" home page, click on the tab "Monitoring Configuration" and then click on the link "Availability Definition"

3. Take a screenshot of the "Availability Definition" configuration, change the definition to a different option and click OK to save it.

    For instance, if the Availability Definition is "All key components are up" (default definition), change it to "At least one key component is up" and save the change.

4. Now revert the "Availability Definition" of the service to the original configuration by following the same procedure.

    For instance, change "Availability Definition" back to "All key components are up" and save it.

The target status now shows up, since all components are up.

Wednesday, April 14, 2021

OEM 12c/13c Agent Deployment fails with "Remote Validations: Shell Path Validation Failed"

When deploying an agent on OEM 12c/13c using the 'Add Host Targets' wizard, the deployment fails with

Remote Validations:  Shell Path validation failed

Cause:  Shell path is incorrect or not defined.:/bin/bash(SH_PATH),-c(SH_ARGS) on host <host name> 

Recommendation:  Check the property values in the following files in this order, ssPaths_<plat>.properties or sPaths.properties or Paths.properties, in "/u01/app/oracle/em13.4/middleware/oui/prov/resources" directory. If the property values are correct, then ensure the login user account is enabled for remote logins.For more details, refer to the Oracle Enterprise Manager Basic Installation Guide.

The most common causes are the following

1. Shell (sh, bash & ksh) location is different from OEM defined location
The OEM defined shell location can be found in file 'ssPaths_<platform>.properties' under directory '$OMS_HOME/oui/prov/resources'. For example, if the error happens while deploying an agent to an AIX host, display the content of file 'ssPaths_aix.properties', which looks like the following
SH_PATH=/bin/bash
SH_ARGS=-c
SHELL_PATH=/bin/bash
SHELL_ARGS=-c
KSH_PATH=/usr/bin/ksh
RMDIR_ARGS=
#the date should be in the format of year:month:date:hour:minute:second
DATE_ARGS=-u +%y:%m:%d:%H:%M:%S
PING_PATH=/usr/sbin/ping
SSH_KEYGEN_PATH=/usr/bin/ssh-keygen
TAR_EXCLUDE_ARGS=X
TAR_INCLUDE_ARGS=-I
DF_COL_NAME=avail
SSH_HOST_KEY_LOC=/etc/ssh

On the host where the agent is going to be installed, check that the executables/shells exist and are located at the same place as in the OEM file 'ssPaths_<platform>.properties'. In the previous example file, the executables/shells are

/bin/bash
/usr/bin/ksh
/usr/bin/ssh-keygen

If an executable does not exist, you have to install it. If it exists but is located in a different directory, edit the OEM file and replace the shell/executable path with the actual location of the shell/executable.
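The existence check can be scripted when the property file lists many paths. The sketch below assumes a copy of 'ssPaths_<platform>.properties' has been transferred to the host where the agent is going to be installed, and that every key ending in _PATH names an executable; the function name check_sspaths is hypothetical.

```shell
# Hypothetical helper: for every *_PATH entry in a properties file,
# report whether an executable exists at that location on this host.
check_sspaths() {
    grep '^[A-Z_]*_PATH=' "$1" | while IFS='=' read -r key path; do
        if [ -x "$path" ]; then
            echo "OK       $key=$path"
        else
            echo "MISSING  $key=$path"
        fi
    done
}

# Demo on a throwaway file; the second path is deliberately wrong
props=$(mktemp)
printf 'SH_PATH=/bin/sh\nSH_ARGS=-c\nKSH_PATH=/no/such/dir/ksh\n' > "$props"
check_sspaths "$props"
rm -f "$props"
```

Entries such as SH_ARGS are skipped because they are not paths.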

2. Incorrect user name or password configured in Named Credential which is used to deploy the agent

The error can also happen if an incorrect user name or password is used. If you do not have the password of the user defined in the Named Credential, the issue can be confirmed by checking the following log file on the OMS server,
  
$OMS_INSTANCE_BASE/em/EMGC_OMS1/sysman/agentpush/<timestamp>/applogs/<host_name>_deploy.log

For example, the failed deployment log is

  /u01/app/oracle/em13.4/gc_inst/em/EMGC_OMS1/sysman/agentpush/2021-04-13_12-58-49-PM/applogs/host01.lab.dbaplus.ca_deploy.log

And the following messages are found in the log
2021-04-13_12-59-55:INFO:===VALIDATION===:Checking SH_PATH on target nodes
2021-04-13_12-59-55:INFO:isWrongShPath:remotePathPropertiesLoc:/u01/app/oracle/em13.4/middleware/oui/prov/resources Platform id:212
2021-04-13_12-59-55:INFO:NODES=host01.lab.dbaplus.ca
2021-04-13_12-59-55:INFO:Running cmd /bin/bash -c /bin/true on node host01.lab.dbaplus.ca
2021-04-13_12-59-55:INFO:Action description Execution of command /bin/bash -c /bin/true  on host host01.lab.dbaplus.ca
2021-04-13_12-59-55:INFO:Attempt :1 pty required false  with no inputs
2021-04-13_12-59-56:INFO:/bin/bash -c /bin/true execution failed on host host01.lab.dbaplus.ca
2021-04-13_12-59-56:INFO: OUT null
2021-04-13_12-59-56:INFO: ERR WARNING: Your password has expired.
Password change required but no TTY available.

We can see that the password has expired. Ask the system administrator to reset the password, then update the password in the Named Credential.
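The deploy log is long, so a filter that prints only the failed command and the error text that follows it can save some scrolling. This is a sketch based on the message layout shown above; the function name show_deploy_failures is hypothetical.

```shell
# Hypothetical helper: print "execution failed" lines, the ERR line,
# and any continuation lines that do not start with a timestamp.
show_deploy_failures() {
    awk '
        /execution failed on host/ { print; next }
        /:INFO: ERR /              { print; in_err = 1; next }
        in_err && !/^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]_/ { print; next }
        { in_err = 0 }
    ' "$1"
}

# Usage on the OMS server:
#   show_deploy_failures $OMS_INSTANCE_BASE/em/EMGC_OMS1/sysman/agentpush/<timestamp>/applogs/<host_name>_deploy.log
```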

The easiest way to rule out a user name or password issue is to ask the system administrator to test the login manually, outside of OEM.

Sunday, April 11, 2021

OEM 13c Discovering WebLogic Domain failed to save Node Manager target with error 'This target requires a local Management Agent'

When discovering or refreshing a WebLogic Domain or Fusion Middleware Farm in Enterprise Manager (EM) 13.4 Cloud Control, the Node Manager target is not saved. The error is shown in EM:

Failed to save NM_xxx_x(Oracle WebLogic Node Manager) on host <IP/host name>. This target requires a local Management Agent, but a local Management Agent was not found.  In order to add this target, you need to install a Management Agent on the same host as the target and then perform a "Refresh WebLogic Domain" operation.

The agent has been installed on the host. The error happens because the Listen Address in the Node Manager configuration differs from the host name in the EM Agent URL. As a solution, the Listen Address of Node Manager should be changed to the host name of the EM Agent URL.

Oracle explains it as an incorrect configuration of Oracle WebLogic Node Manager, so it could happen on all releases of EM 13c. However, I can only reproduce the problem in EM 13.1 and 13.4, when the Listen Address of WebLogic Node Manager is configured with an IP address instead of the host name used by the EM Agent URL; there is no problem with EM 13.2. Anyway, having both configurations use the same host name is not a bad idea.

Find out the host name of the EM Agent URL with the command <AGENT_HOME>/bin/emctl status agent
$ /u01/app/oracle/em13.4/agent/agent_13.4.0.0.0/bin/emctl status agent
Oracle Enterprise Manager Cloud Control 13c Release 4
Copyright (c) 1996, 2020 Oracle Corporation.  All rights reserved.
---------------------------------------------------------------
Agent Version          : 13.4.0.0.0
OMS Version            : 13.4.0.0.0
Protocol Version       : 12.1.0.1.0
Agent Home             : /u01/app/oracle/em13.4/agent/agent_inst
Agent Log Directory    : /u01/app/oracle/em13.4/agent/agent_inst/sysman/log
Agent Binaries         : /u01/app/oracle/em13.4/agent/agent_13.4.0.0.0
Core JAR Location      : /u01/app/oracle/em13.4/agent/agent_13.4.0.0.0/jlib
Agent Process ID       : 76282
Parent Process ID      : 76240
Agent URL              : https://host01.lab.dbaplus.ca:3872/emd/main/
Local Agent URL in NAT : https://host01.lab.dbaplus.ca:3872/emd/main/
Repository URL         : https://oms.lab.dbaplus.ca:4903/empbs/upload
Started at             : 2021-04-07 17:53:56
Started by user        : oracle
Operating System       : Linux version 4.1.12-124.46.4.1.el7uek.x86_64 (amd64)
...
---------------------------------------------------------------
Agent is Running and Ready
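The host name can also be cut out of the "Agent URL" line instead of being read by eye. The filter below is a sketch that assumes the output layout shown above; the function name agent_url_host is hypothetical, and the demo feeds it two sample lines rather than a live agent.

```shell
# Hypothetical filter: read "emctl status agent" output on stdin and
# print only the host name from the "Agent URL" line.
agent_url_host() {
    sed -n 's|^Agent URL *: *https\{0,1\}://\([^:/]*\).*|\1|p'
}

# Demo on sample lines; on a real host use:
#   <AGENT_HOME>/bin/emctl status agent | agent_url_host
printf '%s\n' \
    'Agent URL              : https://host01.lab.dbaplus.ca:3872/emd/main/' \
    'Repository URL         : https://oms.lab.dbaplus.ca:4903/empbs/upload' |
    agent_url_host          # prints host01.lab.dbaplus.ca
```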

Change the Listen Address of Node Manager to the host name of the EM Agent URL in the WebLogic Admin Console as follows,
1. Go to the Node Manager configuration page

   Environment > Machines > [Machine Name] > Configuration > Node Manager

2. Click 'Lock & Edit' to enable edit mode

3. Set the value of the "Listen Address" property to the host name given by the previous command 'emctl status agent'

4. Click 'Save', then click 'Activate Changes'

Refresh or rediscover the domain, and the Node Manager will be discovered successfully.

Wednesday, March 31, 2021

OEM 13.4 AgentPull.sh failed with '0403-011 The specified substitution is not valid for this command'

When using Oracle Enterprise Manager 13.4 AgentPull.sh to deploy Agent 13.4 on an AIX server, the following error is returned,
./AgentPull.sh[270]: platform=${platform/\)/%29}: 0403-011 The specified substitution is not valid for this command.

Line 270 in AgentPull.sh is
platform=${platform/\)/%29}

It is a bash statement which tries to replace a right parenthesis with the code '%29'. It works in bash but not in Bourne shell (sh). The problem is that the first line of the script directs Unix/Linux to run it under Bourne shell,
$ head AgentPull.sh
#!/bin/sh
#
# $Header: emcore/install/unix/AgentPull.sh.template /main/30 2019/12/19 21:29:10 vbhaagav Exp $
#
# AgentPull.sh
#
# Copyright (c) 2011, 2019, Oracle and/or its affiliates. All rights reserved.
#
#    NAME
#      AgentPull.sh - <one-line expansion of the name>

The script works on most Linux servers because /bin/sh is a symbolic link to /bin/bash on these servers. Unfortunately, AIX is honest :), sh is sh and bash is bash, they are different.

As a workaround, replace the first line of AgentPull.sh '#!/bin/sh' with '#!/bin/bash'.
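For comparison, the same replacement can be written so that it also runs under plain sh. This is only a sketch of a portable equivalent, not Oracle's fix, and the script may contain other bash-only statements; the function name encode_rparen and the sample value are hypothetical.

```shell
# Hypothetical POSIX-sh equivalent of the bash-only substitution:
# replace the first ")" in the value with "%29", as ${platform/\)/%29} does in bash.
encode_rparen() {
    printf '%s\n' "$1" | sed 's/)/%29/'
}

platform='AIX (64-bit)'     # sample value, not taken from AgentPull.sh
platform=$(encode_rparen "$platform")
echo "$platform"            # prints AIX (64-bit%29
```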

Tuesday, March 30, 2021

OEM 13.4 New Imported RuleSet False Evaluation of Lifecycle Status

If a Rule Set is defined for targets based on Lifecycle Status, and the Rule Set is exported and then imported into Enterprise Manager 13.4, the imported Rule Set is always activated regardless of Lifecycle Status.

For example, two Rule Sets are created in the source Enterprise Manager,

   Rule Set A for targets whose Lifecycle Status is Production
   Rule Set B for targets whose Lifecycle Status is Development

If both A and B are exported from the source EM and imported into EM 13.4, and one target (whatever its Lifecycle Status) is brought down, both A and B will send out notifications.

It can be fixed by manually editing the Rule Set. You do not really need to change anything in the Rule Set; just select the Rule Set, click Edit, then click Save.

OEM 13.2 Exported Ruleset Failed with 'Error in parsing properties for RuleSet from the XML' when Importing into EM 13.x

When importing a rule set into Enterprise Manager 13.x (confirmed in 13.2 and 13.4) from an XML file exported from Enterprise Manager 13.2, the following error occurred:

  Error in parsing properties for RuleSet from the XML

If the destination EM version is 13.2, the following messages are found in the trace file emoms.trc under directory <OMS_INSTANCE_BASE>/em/EMGC_OMS1/sysman/log
2021-03-29 22:49:12,394 [[ACTIVE] ExecuteThread: '7' for queue: 'weblogic.kernel.Default (self-tuning)'] ERROR exportimport.ImportHelper logp.251 - Error in parsing properties for RuleSet from the XML
oracle.sysman.emSDK.app.exception.EMSystemException
     at oracle.sysman.core.event.rules.uimodel.rulesfwk.RulesModelUtil.throwEMSystemException(RulesModelUtil.java:1486)
     at oracle.sysman.core.event.rules.uimodel.rulesfwk.RulesFwkEntityImpl.postChanges(RulesFwkEntityImpl.java:116)
     ...
    ... 100 more
Caused by: oracle.jbo.JboException: JBO-29000: Unexpected exception caught: oracle.jbo.DMLConstraintException, msg=JBO-26048: Constraint "EM_RULE_EXPRESSIONS_UK1" is violated during post operation "Insert" using SQL statement "INSERT INTO EM_RULE_EXPRESSIONS(SELECTION_OBJECT_ID,EXPRESSION_ID,EXPRESSION_GROUP_ID,ATTR_ID,OPERATOR_CODE,ATTR_VALUE,SEL_OBJ_TYPE,EXPRESSION_TYPE,UI_GROUP_ID,EXCLUSION_GROUP_ID) VALUES (:1,:2,:3,:4,:5,:6,:7,:8,:9,:10)".
    at oracle.sysman.core.event.rules.uimodel.rulesfwk.RulesFwkEntityImpl.doDML(RulesFwkEntityImpl.java:201)
    at oracle.jbo.server.EntityImpl.postChanges(EntityImpl.java:7271)
    at oracle.sysman.core.event.rules.uimodel.rulesfwk.RulesFwkEntityImpl.postChanges(RulesFwkEntityImpl.java:97)
    ... 104 more
Caused by: oracle.jbo.DMLConstraintException: JBO-26048: Constraint "EM_RULE_EXPRESSIONS_UK1" is violated during post operation "Insert" using SQL statement "INSERT INTO EM_RULE_EXPRESSIONS(SELECTION_OBJECT_ID,EXPRESSION_ID,EXPRESSION_GROUP_ID,ATTR_ID,OPERATOR_CODE,ATTR_VALUE,SEL_OBJ_TYPE,EXPRESSION_TYPE,UI_GROUP_ID,EXCLUSION_GROUP_ID) VALUES (:1,:2,:3,:4,:5,:6,:7,:8,:9,:10)".
    at oracle.jbo.server.OracleSQLBuilderImpl.doEntityDML(OracleSQLBuilderImpl.java:565)
    at oracle.jbo.server.EntityImpl.doDML(EntityImpl.java:9098)
    at oracle.sysman.core.event.rules.uimodel.rulesfwk.RulesFwkEntityImpl.doDML(RulesFwkEntityImpl.java:196)
    ... 106 more
Caused by: java.sql.SQLIntegrityConstraintViolationException: ORA-00001: unique constraint (SYSMAN.EM_RULE_EXPRESSIONS_UK1) violated

If the destination EM version is 13.4 (or 13.3), the following messages are found in the trace file emoms.trc
2021-03-29 22:04:29,284 [[ACTIVE] ExecuteThread: '70' for queue: 'weblogic.kernel.Default (self-tuning)'] ERROR exportimport.ImportHelper logp.251 - Error in parsing properties for RuleSet from the XML
oracle.sysman.emSDK.app.exception.EMSystemException
    at oracle.sysman.core.event.rules.uimodel.rulesfwk.RulesModelUtil.throwEMSystemException(RulesModelUtil.java:1509)
    at oracle.sysman.core.event.rules.uimodel.rulesfwk.RulesFwkEntityImpl.postChanges(RulesFwkEntityImpl.java:106)
    ...
    at weblogic.work.ExecuteThread.execute(ExecuteThread.java:420)
    at weblogic.work.ExecuteThread.run(ExecuteThread.java:360)
Caused by: oracle.jbo.RowAlreadyDeletedException: JBO-25019: Entity row with key oracle.jbo.Key[BEB7FD94E2112377E053DA570F89CC8F ] is not found in EmRuleSetsEO.
    at oracle.jbo.server.OracleSQLBuilderImpl.doEntitySelectForAltKey(OracleSQLBuilderImpl.java:811)
    at oracle.jbo.server.BaseSQLBuilderImpl.doEntitySelect(BaseSQLBuilderImpl.java:554)
    at oracle.jbo.server.EntityImpl.doSelect(EntityImpl.java:9133)
    at oracle.jbo.server.EntityImpl.lock(EntityImpl.java:6612)
    at oracle.jbo.server.EntityImpl.beforePost(EntityImpl.java:7150)
    at oracle.jbo.server.EntityImpl.postChanges(EntityImpl.java:7384)
    at oracle.sysman.core.event.rules.uimodel.rulesfwk.RulesFwkEntityImpl.postChanges(RulesFwkEntityImpl.java:97)
    ... 95 more

This is a bug in EM 13.2. It can be fixed by applying OMS one-off patch 25986453 on the source EM 13.2. The patch is also needed on the destination EM if its version is 13.2.
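
To tell quickly which of the two stack traces a given emoms.trc contains, the file can be searched for the two distinguishing strings shown above. The following is a minimal sketch; the function name classify_import_error and the TRC variable are hypothetical, and the path of emoms.trc has to be adjusted to the local OMS instance.

```shell
#!/bin/sh
# classify_import_error is a hypothetical helper for illustration only.
# It looks for the two signatures quoted in the traces above.
classify_import_error() {
    trc="$1"
    if grep -q 'EM_RULE_EXPRESSIONS_UK1' "$trc"; then
        echo "unique constraint EM_RULE_EXPRESSIONS_UK1 violated"
    elif grep -q 'JBO-25019' "$trc"; then
        echo "JBO-25019 entity row not found"
    else
        echo "no known rule set import signature"
    fi
}

# TRC is an assumed variable; point it at the real emoms.trc.
TRC="${TRC:-emoms.trc}"
if [ -f "$TRC" ]; then
    classify_import_error "$TRC"
fi
```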

Monday, March 29, 2021

OEM 13c: Oracle Coherence Cache Targets Showing Down after WebLogic Server PSU applied

After a WebLogic Server (WLS) Patch Set Update (PSU) is applied to the WLS home installed as part of Oracle Enterprise Manager (EM) 13c, two of the Oracle Coherence Cache targets in WebLogic Domain '/EMGC_GCDomain/GCDomain' are shown as Down in the EM console.

The issue was seen, at least, after applying WLS PSU 12.2.1.3.201217 on an EM 13.4.0.9 home, and WLS PSU 12.2.1.4.210330 on an EM 13.5.0.0.0 home.

It can be fixed by removing and re-adding the targets as follows,

1.  Remove the Oracle Coherence Cache Targets showing Down from OEM

    * In the OEM Console, navigate to 'Targets' > 'Middleware'
    * On the left panel, expand 'Target Type', then expand 'Coherence', and select 'Oracle Coherence Cache'. All Oracle Coherence Cache targets will be listed in the right window.
    * Right click on the name of each target showing Down, then click pop-up menu 'Target Setup' > 'Remove Target'
   
    The Down targets are now deleted from the OEM console.
   
2.  Refresh the domain '/EMGC_GCDomain/GCDomain' to add removed/missing Oracle Coherence Cache objects

    * In the OEM Console, navigate to 'Targets' > 'Middleware'
    * On the left panel, clear the selection you made while removing the targets. The Oracle WebLogic Domain '/EMGC_GCDomain/GCDomain' will be listed in the right window
    * Click the domain name '/EMGC_GCDomain/GCDomain'
    * Select 'Refresh WebLogic Domain' in the drop-down menu on the home page of target '/EMGC_GCDomain/GCDomain'
    * A new screen will appear, click on "Add and Update Targets..."
    * Follow the prompts to add the newly found targets
 
The removed targets will be added with new target names.

Wednesday, March 10, 2021

DBCA does not list ASM diskgroup for storage option

Oracle 19c DBCA does not list ASM disk groups in the "Select Database Storage Option" window.
The dbca trace file $ORACLE_BASE/cfgtoollogs/dbca/trace.log_<TIMESTAMP> shows the following messages
[DBStorageOption.flowWorker] [ 2021-03-08 13:24:45.969 EST ] [KfodUtil.kfodOutput:375]  /u01/app/19.0.0/grid/bin/kfod
[DBStorageOption.flowWorker] [ 2021-03-08 13:24:45.969 EST ] [KfodUtil.kfodOutput:375]  nohdr=TRUE
[DBStorageOption.flowWorker] [ 2021-03-08 13:24:45.969 EST ] [KfodUtil.kfodOutput:375]  OP=GROUPS
[DBStorageOption.flowWorker] [ 2021-03-08 13:24:45.969 EST ] [KfodUtil.kfodOutput:375]  status=true
[DBStorageOption.flowWorker] [ 2021-03-08 13:24:45.969 EST ] [KfodUtil.kfodOutput:375]  asmcompatibility=true dbcompatibility=true
[DBStorageOption.flowWorker] [ 2021-03-08 13:24:46.045 EST ] [KfodUtil.kfodOutput:386]  Kfod result
Error 49802 initializing ADR
ERROR!!! could not initialize the diag context

[DBStorageOption.flowWorker] [ 2021-03-08 13:24:46.046 EST ] [ASMUtils.loadDiskGroups:1221]  Loading  the diskgroups. exception using kfodError retrieving diskgroup using kfod utility, null
INFO: Mar 08, 2021 1:24:46 PM oracle.assistants.common.lib.FileSystemInfo getSharedStoragePaths
INFO: Getting default shared storage path.

DBCA failed while running the command "/u01/app/19.0.0/grid/bin/kfod". Running the command manually reproduces the error
[oracle@host01]$ /u01/app/19.0.0/grid/bin/kfod nohdr=TRUE OP=GROUPS status=true asmcompatibility=true dbcompatibility=true
Error 49802 initializing ADR
ERROR!!! could not initialize the diag context

Debug the command with strace or truss as follows
  On Linux:  strace -o /tmp/kfod.out /u01/app/19.0.0/grid/bin/kfod nohdr=TRUE OP=GROUPS status=true asmcompatibility=true dbcompatibility=true
  On Solaris/AIX: truss -o /tmp/kfod.out /u01/app/19.0.0/grid/bin/kfod nohdr=TRUE OP=GROUPS status=true asmcompatibility=true dbcompatibility=true
Example on Solaris
[oracle@host01]$ truss -o /tmp/kfod.out /u01/app/19.0.0/grid/bin/kfod nohdr=TRUE OP=GROUPS status=true asmcompatibility=true dbcompatibility=true
Error 49802 initializing ADR
ERROR!!! could not initialize the diag context
[oracle@host01]$
[oracle@host01]$ tail /tmp/kfod.out
stat("/u01/app/19.0.0/grid/log/diag/kfod/host01/kfod/log", 0xFFFFFFFF7FFFBF80) Err#2 ENOENT
stat("/u01/app/19.0.0/grid/log/diag/kfod/host01/kfod", 0xFFFFFFFF7FFFBF80) Err#2 ENOENT
stat("/u01/app/19.0.0/grid/log/diag/kfod/host01", 0xFFFFFFFF7FFFBF80) Err#2 ENOENT
stat("/u01/app/19.0.0/grid/log/diag/kfod", 0xFFFFFFFF7FFFBF80) Err#2 ENOENT
stat("/u01/app/19.0.0/grid/log/diag", 0xFFFFFFFF7FFFBF80) = 0
getuid()     = 104 [104]
getgid()     = 112 [112]
getuid()     = 104 [104]
getuid()     = 104 [104]
mmap(0x00010000, 65536, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFFFFFFFF73900000
getuid()     = 104 [104]
getuid()     = 104 [104]
open("/system/volatile/name_service_door", O_RDONLY) = 6
fcntl(6, F_SETFD, 0x00000001)   = 0
door_info(6, 0xFFFFFFFF75D8B5B0)  = 0
door_call(6, 0xFFFFFFFF7FFFBBB8)  = 0
ioctl(1, TCGETA, 0xFFFFFFFF7FFF8F7C)  = 0
fstat(1, 0xFFFFFFFF7FFF8F10)   = 0
write(1, " E r r o r   4 9 8 0 2  ".., 29) = 29
write(1, " E R R O R ! ! !   c o u".., 47) = 47
close(5)     = 0
_exit(1)
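
A full trace is usually much longer than the tail shown here, so it helps to filter it down to the system calls that returned an error. The sketch below assumes the two common output formats: truss marks failures with "Err#<n>", while strace prints "= -1 E...". The function name failed_calls is made up for illustration.

```shell
#!/bin/sh
# failed_calls is a hypothetical helper: it prints only the lines of a
# truss or strace output file that record a failed system call.
failed_calls() {
    grep -E 'Err#[0-9]+|= -1 E[A-Z]+' "$1"
}

# /tmp/kfod.out is the trace file written by the commands above.
if [ -f /tmp/kfod.out ]; then
    failed_calls /tmp/kfod.out
fi
```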

According to the trace file /tmp/kfod.out, kfod looked for the following directories, and only the last one exists
    /u01/app/19.0.0/grid/log/diag/kfod/host01/kfod/log
    /u01/app/19.0.0/grid/log/diag/kfod/host01/kfod
    /u01/app/19.0.0/grid/log/diag/kfod/host01
    /u01/app/19.0.0/grid/log/diag/kfod
    /u01/app/19.0.0/grid/log/diag
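
The same finding can be cross-checked from the shell by walking up from the deepest path until an existing directory is found, then testing whether the current user can write to it. This is a minimal sketch; nearest_existing_dir is a hypothetical helper name, and the path is the one from the trace above.

```shell
#!/bin/sh
# nearest_existing_dir is a hypothetical helper: it prints the closest
# ancestor of the given path that exists as a directory.
nearest_existing_dir() {
    p="$1"
    while [ ! -d "$p" ] && [ "$p" != "/" ] && [ "$p" != "." ]; do
        p=$(dirname "$p")
    done
    echo "$p"
}

dir=$(nearest_existing_dir /u01/app/19.0.0/grid/log/diag/kfod/host01/kfod/log)
if [ -w "$dir" ]; then
    echo "$dir is writable by $(id -un)"
else
    echo "$dir is NOT writable by $(id -un)"
fi
```

Run it as the user who starts dbca (oracle in this case), because the result depends on who is asking.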

Check the permissions of /u01/app/19.0.0/grid/log/diag, the only one of these directories that exists
[oracle@host01]$ ls -ld /u01/app/19.0.0/grid/log/diag
drwxr-x---   4 grid     oinstall       5 Dec 12 23:17 /u01/app/19.0.0/grid/log/diag
[oracle@host01]$ id
uid=104(oracle) gid=112(oinstall)

oracle, the user running kfod, is a member of group oinstall, but group oinstall does not have write permission on the directory, so kfod cannot create its subdirectories there. Grant write permission as root or grid (the owner of the GI home)
[grid@host01]$ chmod g+w /u01/app/19.0.0/grid/log/diag
[grid@host01]$ ls -ld /u01/app/19.0.0/grid/log/diag
drwxrwx---   4 grid     oinstall       5 Dec 13 02:45 /u01/app/19.0.0/grid/log/diag

Re-try kfod as oracle
[oracle@host01]$ /u01/app/19.0.0/grid/bin/kfod nohdr=TRUE OP=GROUPS status=true asmcompatibility=true dbcompatibility=true
    204528     188110 EXTERN REDO 10.1.0.0.0 12.1.0.0.0
    613584     388124 EXTERN FRA  10.1.0.0.0 12.1.0.0.0
   4091624     470712 EXTERN DATA 10.1.0.0.0 12.1.0.0.0

Now, dbca can list ASM disk groups.

Monday, March 1, 2021

19c runcluvfy.sh fails with PRVF-7596 PRVG-2002

When upgrading Oracle Grid Infrastructure 12.2 to 19c, runcluvfy.sh failed with the following messages
Verifying OCR Integrity ...FAILED
host02: PRVF-7596 : CSS is probably working with a non-clustered, local-only
          configuration on node "host02"
Verifying resolv.conf Integrity ...FAILED
host02: PRVG-2002 : Encountered error in copying file "/etc/resolv.conf" from
          node "host02" to node "host01"
          protocol error: filename does not match request
Verifying DNS/NIS name service ...FAILED
host02: PRVG-2002 : Encountered error in copying file "/etc/netsvc.conf" from
          node "host02" to node "host01"
          protocol error: filename does not match request

Set the log location and re-run runcluvfy.sh
# runcluvfy.sh will write log files to directory /u01/tmp
export CV_TRACELOC=/u01/tmp
# re-run runcluvfy.sh
./runcluvfy.sh stage -pre crsinst -upgrade -rolling -src_crshome /u01/app/12.2.0/grid_1 -dest_crshome /u01/app/19.0.0/grid_1 -dest_version 19.0.0.0.0 -fixup -verbose
# Log files created
cd /u01/tmp
ls -l
total 15200
-rw-r--r--    1 grid     oinstall     952467 Feb 24 16:52 cvuhelper.log.0
-rw-r--r--    1 grid     oinstall          0 Feb 24 16:48 cvuhelper.log.0.lck
-rw-r--r--    1 grid     oinstall    6824223 Feb 24 16:52 cvutrace.log.0

The log file cvutrace.log.0 shows the following messages
[Worker 0] [ 2021-02-24 16:47:49.784 EST ] [UnixSystem.remoteCopyFile:848]  UnixSystem: /usr/bin/scp -p host02:'/tmp/CVU_19.0.0.0.0_grid/scratch/getFileInfo3605304.out' /tmp/host02.getFileInfo3605304.out
[Worker 0] [ 2021-02-24 16:47:49.798 EST ] [RuntimeExec.runCommand:294]  runCommand: Waiting for the process
[Thread-83] [ 2021-02-24 16:47:49.798 EST ] [StreamReader.run:62]  In StreamReader.run
[Thread-82] [ 2021-02-24 16:47:49.798 EST ] [StreamReader.run:62]  In StreamReader.run
[Thread-83] [ 2021-02-24 16:47:50.201 EST ] [StreamReader.run:66]  ERROR>protocol error: filename does not match request
...
[main] [ 2021-02-24 16:51:46.747 EST ] [UnixSystem.remoteCopyFile:848]  UnixSystem: /usr/bin/scp -p host02:'/etc/resolv.conf' /tmp/CVU_19.0.0.0.0_grid/scratch/resolv.conf_host02
[main] [ 2021-02-24 16:51:46.760 EST ] [RuntimeExec.runCommand:294]  runCommand: Waiting for the process
[Thread-1060] [ 2021-02-24 16:51:46.760 EST ] [StreamReader.run:62]  In StreamReader.run
[Thread-1059] [ 2021-02-24 16:51:46.760 EST ] [StreamReader.run:62]  In StreamReader.run
[main] [ 2021-02-24 16:51:47.148 EST ] [RuntimeExec.runCommand:296]  runCommand: process returns 1
[Thread-1060] [ 2021-02-24 16:51:47.149 EST ] [StreamReader.run:66]  ERROR>protocol error: filename does not match request
...
[main] [ 2021-02-24 16:51:47.548 EST ] [UnixSystem.remoteCopyFile:848]  UnixSystem: /usr/bin/scp -p host02:'/etc/netsvc.conf' /tmp/CVU_19.0.0.0.0_grid/scratch/nssw_conf_host02
[Thread-1066] [ 2021-02-24 16:51:47.559 EST ] [StreamReader.run:62]  In StreamReader.run
[main] [ 2021-02-24 16:51:47.559 EST ] [RuntimeExec.runCommand:294]  runCommand: Waiting for the process
[Thread-1065] [ 2021-02-24 16:51:47.560 EST ] [StreamReader.run:62]  In StreamReader.run
[main] [ 2021-02-24 16:51:47.965 EST ] [RuntimeExec.runCommand:296]  runCommand: process returns 1
[Thread-1066] [ 2021-02-24 16:51:47.973 EST ] [StreamReader.run:66]  ERROR>protocol error: filename does not match request
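
cvutrace.log.0 is several megabytes, so it is easier to let awk pair each "ERROR>" line with the most recent scp command logged before it. This is a rough sketch: failed_scp_commands is a hypothetical name, and because the log is written by several threads, the pairing is a best guess based on line order only.

```shell
#!/bin/sh
# failed_scp_commands is a hypothetical helper: for every ERROR> line in a
# cvutrace log, it prints the last scp command seen before it and the error.
failed_scp_commands() {
    awk '
        /UnixSystem: .*scp / { cmd = $0 }
        /ERROR>/             { if (cmd != "") print cmd; print $0 }
    ' "$1"
}

# /u01/tmp is the CV_TRACELOC directory set above.
if [ -f /u01/tmp/cvutrace.log.0 ]; then
    failed_scp_commands /u01/tmp/cvutrace.log.0
fi
```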

All three errors occurred when scp was executed to copy a file from remote node host02 to the local node.
Interestingly, all three commands enclose the remote file name in single quotation marks. I guess the Oracle programmer left them there by accident, because there is no obvious reason to use them. Let's verify whether they are the culprit by manually running scp
[grid@host01]$ scp -p host02:"'/etc/resolv.conf'" /tmp/CVU_19.0.0.0.0_grid/scratch/resolv.conf_host02
protocol error: filename does not match request
[grid@host01]$
[grid@host01]$ scp -T -p host02:"'/etc/resolv.conf'" /tmp/CVU_19.0.0.0.0_grid/scratch/resolv.conf_host02
resolv.conf                                             96    51.3KB/s   00:00
[grid@host01]$

The scp fails with the same error, "protocol error: filename does not match request", but succeeds if the extra option -T is used.

The -T option was introduced in OpenSSH 8.0, released in April 2019. In earlier versions of OpenSSH, when copying files from a remote system to a local directory, scp did not verify that the file names sent by the server matched those requested by the client. This could allow a hostile server to create or clobber unexpected local files with attacker-controlled content. OpenSSH 8.0 fixed this security issue: scp now verifies the file name on the client side by default, and the -T option was introduced to disable the verification.

Although OpenSSH officially states that the fix was introduced in 8.0, runcluvfy.sh shipped with Grid Infrastructure 19.3 also fails for the same reason on AIX with OpenSSH 7.5p1, which is where the errors shown in this article occurred. Presumably the fix was backported to that build.

This version of runcluvfy.sh sends the remote file name enclosed in single quotation marks, but the remote server returns the file name without them, because the remote shell strips the quotes before the remote scp sees the path. Both names refer to the same file, but as strings they differ. Therefore, older versions of scp worked because they did not compare the names, while the current scp rejects the transfer with "filename does not match request".
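
The mismatch can be imitated in a few lines of shell. This is a simplified model of the client-side check, not the actual OpenSSH code: it takes the last path component of the requested name as a pattern and matches the name returned by the server against it. The function name check_name is made up for illustration.

```shell
#!/bin/sh
# check_name is a hypothetical, simplified model of the scp client check:
# $1 is the remote path as typed by the client, $2 the name sent by the server.
check_name() {
    requested="$1"
    received="$2"
    pattern=${requested##*/}     # keep only the part after the last "/"
    case "$received" in
        $pattern) echo "match" ;;
        *)        echo "filename does not match request" ;;
    esac
}

# With the quotes, the pattern becomes resolv.conf' (note the trailing quote).
check_name "'/etc/resolv.conf'" "resolv.conf"
# Without the quotes, the pattern is resolv.conf and the check passes.
check_name "/etc/resolv.conf" "resolv.conf"
```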

It can be resolved by downloading the new standalone version of the Cluster Verification Utility from My Oracle Support (Patch 30839369). Note: the executable is cluvfy instead of runcluvfy.sh. If downloading is not an option, then as a temporary workaround we can rename scp and create a wrapper script in its place
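# Rename the original scp
mv /usr/bin/scp /usr/bin/scp.bak
# Create a wrapper scp that calls the renamed binary with -T and passes all arguments through
# (single quotes keep "$@" from being expanded at the time the file is written)
printf '#!/bin/sh\nexec /usr/bin/scp.bak -T "$@"\n' > /usr/bin/scp
# Make the file executable
chmod a+rx /usr/bin/scp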

Now the errors are gone. After successfully installing GI, remember to restore the original scp
# Delete interim scp
rm /usr/bin/scp
# Restore the original scp.
mv /usr/bin/scp.bak /usr/bin/scp