Tuesday, December 24, 2013

SCAN (Single Client Access Name) Set Up Using DNS

Setting up a SCAN allows clients connecting to databases (or instances) in a RAC to use a single name to connect to any database (or instance). It can be considered an alias for all the databases in the cluster. Unlike TNS entries that use VIPs, which must be modified each time an instance is added or removed, TNS entries that use the SCAN do not need to be modified. SCAN was first introduced with 11gR2 and gained additional features with 12c. For more, read the SCAN white paper.
As a prerequisite for RAC, SCAN can be set up using either DNS or GNS (Grid Naming Service). For a non-production test environment, a single IP with the same hostname can be added to the /etc/hosts files of all the RAC nodes to get the installation going, but this is not recommended (see 887522.1 Instead of DNS or GNS, Can we use '/etc/hosts' to resolve SCAN?). The prerequisite check for SCAN will fail (887471.1), but the installation will complete.
Setting up the SCAN falls under the network administrator's job role. It is for the DBA to request "at least one single name that resolves to three IP addresses using a round robin algorithm". This post lists the steps to set up a SCAN using DNS configuration. It sets up a single client access name that resolves to three IPs in a round robin fashion. Nevertheless, it is recommended that a network administrator be consulted when setting up SCAN for a production system.
1. Verify that the DNS related rpms are installed. The following three rpms are required
# rpm -qa | grep bind
bind-libs-9.8.2-0.17.rc1.el6.x86_64
bind-9.8.2-0.17.rc1.el6.x86_64
bind-utils-9.8.2-0.17.rc1.el6.x86_64
2. Note down the hostname and the IP of the server where the SCAN is set up. Also find out the IP of the DNS server that is already configured
# hostname
hpc6.mydomain.net

# ifconfig
em1       Link encap:Ethernet  HWaddr 00:26:B9:FE:7D:E0
          inet addr:192.168.0.104  Bcast:192.168.0.255  Mask:255.255.255.0

# cat /etc/resolv.conf
search mydomain.net
nameserver 11.6.9.2
These values will be referred to in subsequent steps.
3. Setting up the DNS involves adding the IP and port the DNS service listens on to the /etc/named.conf file. The default /etc/named.conf file looks as follows
options {
        listen-on port 53 { 127.0.0.1; };
        listen-on-v6 port 53 { ::1; };
        directory       "/var/named";
        dump-file       "/var/named/data/cache_dump.db";
        statistics-file "/var/named/data/named_stats.txt";
        memstatistics-file "/var/named/data/named_mem_stats.txt";
        allow-query     { localhost; };
        recursion yes;

        dnssec-enable yes;
        dnssec-validation yes;
        dnssec-lookaside auto;

        /* Path to ISC DLV key */
        bindkeys-file "/etc/named.iscdlv.key";

        managed-keys-directory "/var/named/dynamic";
};

logging {
        channel default_debug {
                file "data/named.run";
                severity dynamic;
        };
};

zone "." IN {
        type hint;
        file "named.ca";
};

include "/etc/named.rfc1912.zones";
include "/etc/named.root.key";
Add new entries to configure the DNS and the SCAN
options {
        listen-on port 53 { 192.168.0.104; 127.0.0.1; }; #IP of the DNS server (noted on step 2)
#       listen-on-v6 port 53 { ::1; }; # IPv6 disabled
        directory       "/var/named";
        dump-file       "/var/named/data/cache_dump.db";
        statistics-file "/var/named/data/named_stats.txt";
        memstatistics-file "/var/named/data/named_mem_stats.txt";
        allow-query     { localhost; 192.168.0.0/24; };
        recursion yes;
        allow-transfer {"none";};

        forwarders { 11.6.9.2; };  # main DNS IP, anything that cannot be resolved will be forwarded to this IP
};

logging {
        channel default_debug {
                file "data/named.run";
                severity dynamic;
        };
};

zone "mydomain.net" IN {
        type master;
        file "mydomain.net.zone"; # forward lookups entry file
        allow-update { none; };
};


zone "0.168.192.in-addr.arpa" IN {
        type master;
        file "rev.mydomain.net.zone"; # reverse lookup entry file
        allow-update { none; };
};
4. Create the forward lookup and reverse lookup files in /var/named. The forward lookup file was named "mydomain.net.zone" for the zone "mydomain.net" in step 3.
# cat /var/named/mydomain.net.zone

$TTL 86400
@          IN     SOA    hpc6.mydomain.net.  root.hpc6.mydomain.net. (
                         42 ; serial (d. adams)
                         3H ; refresh
                        15M ; retry
                         1W ; expiry
                         1D ) ; minimum
          IN   NS     hpc6.mydomain.net.
hpc6       IN   A      192.168.0.104
rac-scan        IN      A       192.168.0.86
rac-scan        IN      A       192.168.0.93
rac-scan        IN      A       192.168.0.94
Note the places where the hostname of the server on which SCAN is set up is used. Create the reverse lookup file as follows
# cat /var/named/rev.mydomain.net.zone
$ORIGIN 0.168.192.in-addr.arpa.
$TTL 1H
@       IN      SOA     hpc6.mydomain.net.     root.hpc6.mydomain.net. (      2
                                                3H
                                                1H
                                                1W
                                                1H )
0.168.192.in-addr.arpa.         IN NS      hpc6.mydomain.net.

104     IN PTR  hpc6.mydomain.net.
86     IN PTR  rac-scan.mydomain.net.
94     IN PTR  rac-scan.mydomain.net.
93     IN PTR  rac-scan.mydomain.net.
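The PTR entries in the reverse zone have to mirror the A records of the forward zone. As a cross-check, the short Python sketch below derives the PTR lines from the forward entries used in this post. The helper name ptr_lines and its tab-separated output layout are made up for illustration and are not part of BIND.

```python
import ipaddress

# Forward (A) records taken from mydomain.net.zone in this post.
FORWARD = [
    ("hpc6", "192.168.0.104"),
    ("rac-scan", "192.168.0.86"),
    ("rac-scan", "192.168.0.93"),
    ("rac-scan", "192.168.0.94"),
]


def ptr_lines(records, domain="mydomain.net", zone="0.168.192.in-addr.arpa"):
    """Return PTR lines (relative to `zone`) matching the given A records."""
    lines = []
    suffix = "." + zone
    for host, ip in records:
        # reverse_pointer gives e.g. '86.0.168.192.in-addr.arpa'
        pointer = ipaddress.ip_address(ip).reverse_pointer
        if not pointer.endswith(suffix):
            raise ValueError(f"{ip} is outside zone {zone}")
        label = pointer[: -len(suffix)]  # e.g. '86'
        lines.append(f"{label}\tIN PTR\t{host}.{domain}.")
    return lines


for line in ptr_lines(FORWARD):
    print(line)
```

Comparing this output with the hand-written reverse file is a quick way to spot a missing or mistyped PTR entry before running named-checkzone.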
5. The validity of these configuration files can be checked with named-checkconf and named-checkzone.
# named-checkconf /etc/named.conf

# named-checkzone mydomain.net /var/named/mydomain.net.zone
zone mydomain.net/IN: loaded serial 42
OK

# named-checkzone 0.168.192.in-addr.arpa  /var/named/rev.mydomain.net.zone
zone 0.168.192.in-addr.arpa/IN: loaded serial 2
OK
However, the absence of errors in this step is no guarantee that SCAN will work as expected. That can only be verified with a dig command (shown in a later step).
6. Edit the resolv.conf file and add the new DNS server IP (the server's own IP)
# cat /etc/resolv.conf

search mydomain.net
nameserver 192.168.0.104
Also add an entry to /etc/hosts as below
192.168.0.104   hpc6.mydomain.net        hpc6
7. Start the DNS service
# /etc/init.d/named start
If the first start of the service gets stuck at the following step
# /etc/init.d/named start
Generating /etc/rndc.key:
run the following command and start the service again
# rndc-confgen -a -r /dev/urandom
wrote key file "/etc/rndc.key"

# /etc/init.d/named start
Once started, verify that the DNS service is listening on the port (53 in this case) defined in the named.conf file
# netstat -ntl
tcp        0      0 192.168.0.104:53            0.0.0.0:*                   LISTEN
tcp        0      0 127.0.0.1:53                0.0.0.0:*                   LISTEN


8. Using the dig command, verify that forward lookup and reverse lookup are functioning properly.
# dig hpc6.mydomain.net

; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.17.rc1.el6 <<>> hpc6.mydomain.net
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 52260
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;hpc6.mydomain.net.              IN      A

;; ANSWER SECTION:
hpc6.mydomain.net.       86400   IN      A       192.168.0.104

;; AUTHORITY SECTION:
mydomain.net.            86400   IN      NS      hpc6.mydomain.net.

;; Query time: 0 msec
;; SERVER: 192.168.0.104#53(192.168.0.104)
;; WHEN: Wed Aug 28 16:12:44 2013
;; MSG SIZE  rcvd: 64
Check that the query status is NOERROR and the ANSWER count is 1. This confirms that the query was answered without any errors. Check the reverse lookup as well
# dig -x 192.168.0.104

; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.17.rc1.el6 <<>> -x 192.168.0.104
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 30913
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 1

;; QUESTION SECTION:
;104.0.168.192.in-addr.arpa.    IN      PTR

;; ANSWER SECTION:
104.0.168.192.in-addr.arpa. 3600 IN     PTR     hpc6.mydomain.net.

;; AUTHORITY SECTION:
0.168.192.in-addr.arpa. 3600    IN      NS      hpc6.mydomain.net.

;; ADDITIONAL SECTION:
hpc6.mydomain.net.       86400   IN      A       192.168.0.104

;; Query time: 0 msec
;; SERVER: 192.168.0.104#53(192.168.0.104)
;; WHEN: Wed Aug 28 16:13:09 2013
;; MSG SIZE  rcvd: 104
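The two things to look for in the dig output, the NOERROR status and the ANSWER count, can also be checked by a script. The following Python sketch parses the header lines of captured dig output. The function name dig_ok is hypothetical, and the sample text is copied from the forward lookup in this post.

```python
import re

STATUS_RE = re.compile(r"status:\s*(\w+)")
ANSWER_RE = re.compile(r"ANSWER:\s*(\d+)")


def dig_ok(output, expected_answers=1):
    """True if dig output reports NOERROR and the expected ANSWER count."""
    status = STATUS_RE.search(output)
    answers = ANSWER_RE.search(output)
    if status is None or answers is None:
        return False
    return (status.group(1) == "NOERROR"
            and int(answers.group(1)) == expected_answers)


# Header lines copied from the forward lookup output in this post.
sample = """\
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 52260
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 0
"""
print(dig_ok(sample))
```

For the SCAN name itself the same check would be called with expected_answers=3, since the name is meant to resolve to three addresses.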
9. If there are errors, recheck the zone files and correct any mistakes. Once the dig command returns no errors, run the nslookup command to see the name resolve to 3 IPs in a round robin fashion.
# nslookup rac-scan # lookup 1
Server:         192.168.0.104
Address:        192.168.0.104#53

Name:   rac-scan.mydomain.net
Address: 192.168.0.94
Name:   rac-scan.mydomain.net
Address: 192.168.0.86
Name:   rac-scan.mydomain.net
Address: 192.168.0.93

# nslookup rac-scan # lookup 2
Server:         192.168.0.104
Address:        192.168.0.104#53

Name:   rac-scan.mydomain.net
Address: 192.168.0.86
Name:   rac-scan.mydomain.net
Address: 192.168.0.93
Name:   rac-scan.mydomain.net
Address: 192.168.0.94

# nslookup rac-scan # lookup 3
Server:         192.168.0.104
Address:        192.168.0.104#53

Name:   rac-scan.mydomain.net
Address: 192.168.0.93
Name:   rac-scan.mydomain.net
Address: 192.168.0.94
Name:   rac-scan.mydomain.net
Address: 192.168.0.86
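The three lookups show the same set of addresses with the starting address shifting by one each time. The small Python sketch below is a toy model of that cyclic ordering, not the DNS server's actual implementation. The real order depends on how the name server is configured, and the function name round_robin is made up for illustration.

```python
from collections import deque


def round_robin(addresses, lookups):
    """Yield the address order a cyclic round robin would return per lookup."""
    ring = deque(addresses)
    for _ in range(lookups):
        yield list(ring)
        ring.rotate(-1)  # the first address moves to the end


# Order of the first lookup shown in this post.
scan_ips = ["192.168.0.94", "192.168.0.86", "192.168.0.93"]
for n, answer in enumerate(round_robin(scan_ips, 3), start=1):
    print(f"lookup {n}: {answer}")
```

Starting from the order seen in lookup 1, the model reproduces the orders seen in lookups 2 and 3.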
10. To use it in a RAC configuration, edit the /etc/resolv.conf file of the RAC nodes and add the IP of the server where SCAN is set up
# cat /etc/resolv.conf

search mydomain.net
nameserver 192.168.0.104
Related Post
GNS Setup for RAC

Wednesday, December 4, 2013

Excessive Audit File (*.aud) Generation

An excessive number of audit files was being generated in an 11gR2 RAC environment. The audit_trail parameter was set to OS and the adump directory was being populated with *.aud files at a rate of 60 per second. Audit files were cleaned out using cron (1298957.1), but the schedule frequency wasn't enough to reduce the number of *.aud files to an acceptable level. With 11gR2 (also available in 10.2.0.5 and 11.1.0.7) it is possible to use DBMS_AUDIT_MGMT to purge audit records, including OS audit files (731908.1).
Starting with 11gR1, audit file generation changed to create a new audit file for each new session instead of appending to an existing audit file (1474823.1). This generates a lot of small audit files, and a 3 tier application could see bursts of audit file generation when connection pools are initiated. Also, by default, auditing is enabled for several actions. These can be found with the scripts in 1019552.6 or 287436.1. Below is the output from running the script (1019552.6) on a new 11.2.0.4 RAC DB (only the audit trail was changed to OS).
SQL> @tstaudit
Press return to see the audit related parameters...

NAME                 DISPLAY_VALUE
-------------------- --------------------
audit_file_dest      /opt/app/oracle/admin/std11g2/adump

audit_sys_operations FALSE
audit_syslog_level
audit_trail          OS

System auditing options across the system and by user

User name    Proxy name   Audit Option                   SUCCESS    FAILURE
------------ ------------ ------------------------------ ---------- ----------
                          ALTER ANY PROCEDURE            BY ACCESS  BY ACCESS
                          ALTER ANY TABLE                BY ACCESS  BY ACCESS
                          ALTER DATABASE                 BY ACCESS  BY ACCESS
                          ALTER PROFILE                  BY ACCESS  BY ACCESS
                          ALTER SYSTEM                   BY ACCESS  BY ACCESS
                          ALTER USER                     BY ACCESS  BY ACCESS
                          CREATE ANY JOB                 BY ACCESS  BY ACCESS
                          CREATE ANY LIBRARY             BY ACCESS  BY ACCESS
                          CREATE ANY PROCEDURE           BY ACCESS  BY ACCESS
                          CREATE ANY TABLE               BY ACCESS  BY ACCESS
                          CREATE EXTERNAL JOB            BY ACCESS  BY ACCESS
                          CREATE PUBLIC DATABASE LINK    BY ACCESS  BY ACCESS
                          CREATE SESSION                 BY ACCESS  BY ACCESS
                          CREATE USER                    BY ACCESS  BY ACCESS
                          DATABASE LINK                  BY ACCESS  BY ACCESS
                          DIRECTORY                      BY ACCESS  BY ACCESS
                          DROP ANY PROCEDURE             BY ACCESS  BY ACCESS
                          DROP ANY TABLE                 BY ACCESS  BY ACCESS
                          DROP PROFILE                   BY ACCESS  BY ACCESS
                          DROP USER                      BY ACCESS  BY ACCESS
                          EXEMPT ACCESS POLICY           BY ACCESS  BY ACCESS
                          GRANT ANY OBJECT PRIVILEGE     BY ACCESS  BY ACCESS
                          GRANT ANY PRIVILEGE            BY ACCESS  BY ACCESS
                          GRANT ANY ROLE                 BY ACCESS  BY ACCESS
                          PROFILE                        BY ACCESS  BY ACCESS
                          PUBLIC SYNONYM                 BY ACCESS  BY ACCESS
                          ROLE                           BY ACCESS  BY ACCESS
                          SYSTEM AUDIT                   BY ACCESS  BY ACCESS
                          SYSTEM GRANT                   BY ACCESS  BY ACCESS

29 rows selected.

Press return to see auditing options on all objects...

no rows selected
Press return to see audit trail... Note that the query returns the audit data for the last day only

no rows selected
Press return to see system privileges audited across the system and by user...
As seen from the above output, one of the audit options is "CREATE SESSION". As changing the audit trail requires a restart of the database (RAC allows rolling restarts), it was decided to remove the audit on create session. This reduced the number of audit files generated, but bursts of audit files were still being generated every 5 seconds. Examining the audit files made it clear that no audit files were generated for non-sys users after removing the audit on create session. The only audit files now generated were for sys users, and they had only the following content
Thu Nov 28 14:32:12 2013 +01:00
LENGTH : '155'
ACTION :[7] 'CONNECT'
DATABASE USER:[1] '/'
PRIVILEGE :[6] 'SYSDBA'
CLIENT USER:[6] 'oracle'
CLIENT TERMINAL:[0] ''
STATUS:[1] '0'
DBID:[10] '2765679112'
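When there are thousands of such files, it helps to classify them by script rather than by eye. The Python sketch below parses one record of the layout shown in this post into a dictionary and flags SYSDBA connects. The function names are hypothetical, and audit files from other versions or with other actions may use a different layout, so treat this as a sketch under that assumption.

```python
import re

# Matches lines such as: ACTION :[7] 'CONNECT'   or   LENGTH : '155'
FIELD_RE = re.compile(r"^([A-Z ]+?)\s*:(?:\[\d+\])?\s*'(.*)'$")


def parse_audit_record(text):
    """Return the KEY: 'value' fields of one OS audit record as a dict."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line.strip())
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


def is_sysdba_connect(fields):
    """True for a SYSDBA CONNECT record like the one shown in this post."""
    return (fields.get("ACTION") == "CONNECT"
            and fields.get("PRIVILEGE") == "SYSDBA")


# Record copied from this post.
record = """\
Thu Nov 28 14:32:12 2013 +01:00
LENGTH : '155'
ACTION :[7] 'CONNECT'
DATABASE USER:[1] '/'
PRIVILEGE :[6] 'SYSDBA'
CLIENT USER:[6] 'oracle'
CLIENT TERMINAL:[0] ''
STATUS:[1] '0'
DBID:[10] '2765679112'
"""
fields = parse_audit_record(record)
print(fields["CLIENT USER"], is_sysdba_connect(fields))
```

Grouping the parsed records by CLIENT USER and timestamp would show whether the bursts come from a single OS user at a fixed interval.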



Audit records for various sys user operations are created by default irrespective of the configuration setting (308066.1). Audit files can also be generated by the grid control agent (1196323.1), which wasn't applicable in this case as there was no grid control or agent running. Another possible cause is the resource health check by the grid infrastructure (1378881.1). The resource check interval was set to 1 second, and if this had been the cause the audit files would have been generated every 1 second instead of every 5 seconds. So this wasn't the cause either. Metalink note 1171314.1 lists possible causes for excessive audit file generation and classifies them as expected behavior, due to audit settings or due to bugs, but it didn't help in diagnosing the cause of this audit file generation.
There were no applications or cron jobs executed on the OS as the sys user. So it was decided to check where the sys connections were originating from.
SQL>  select inst_id,sid,serial#,program,machine,USERNAME ,sql_id from gv$session where username='SYS' order by 1;

   INST_ID        SID    SERIAL# PROGRAM                               MACHINE      USERN   SQL_ID
---------- ---------- ---------- -----------------------------------   ---------    ------- -------------
         1        569          5 oraagent.bin@dm-db1.hps (TNS V1-V3)   dm-db1.hps   SYS
         1       2566       6317 oracle@dm-db1.hps (O001)              dm-db1.hps   SYS
         1       3683        561 oraagent.bin@dm-db1.hps (TNS V1-V3)   dm-db1.hps   SYS    4qm8a3w6a1rfd
         1       3967          1 oraagent.bin@dm-db1.hps (TNS V1-V3)   dm-db1.hps   SYS
         2        569          3 oraagent.bin@dm-db2.hps (TNS V1-V3)   dm-db2.hps   SYS
         2       3682          1 oraagent.bin@dm-db2.hps (TNS V1-V3)   dm-db2.hps   SYS

6 rows selected.
Sys sessions with program oraagent.bin@hostname are expected in a RAC environment (except for a bug in 11.2.0.2, refer 1307139.1) and do not cause excessive audit file generation. The only thing out of place was the O001 process. The logon time for this session showed it had been logged on for a while, and similar systems (11.2.0.3 RAC) didn't have this process logged on for long periods as in this case. The Oracle reference document states "Onnn slave processes are spawned on demand. These processes communicate with the ASM instance. Maintains a connection to the ASM instance for metadata operations". According to 1556564.1, Onnn processes are spawned and terminated automatically and can be killed if required; the system will re-spawn them when needed, and killing them will not affect database operations.
However, it was decided to do a cluster stack restart on node 1 during system maintenance. After the restart the Onnn process was gone and audit file generation went back to normal; the bursts of audit files every 5 seconds no longer occurred.

Useful metalink notes
How to Setup Auditing [ID 1020945.6]
11g: Possible reasons for many OS audit trail (.aud) files, <1KB in size [ID 1474823.1]
Large Number of Audit Files Generated by Oracle Restart or Grid Infrastructure [ID 1378881.1]
Large Number Of Audit Files Generated During Rman Backups [ID 1211396.1]
AUDIT_SYS_OPERATIONS Set To FALSE Yet Audit Files Are Generated [ID 308066.1]
Huge/Large/Excessive Number Of Audit Records Are Being Generated In The Database [ID 1171314.1]
SCRIPT: Generate AUDIT and NOAUDIT Statements for Current Audit Settings [ID 287436.1]
How does the NOAUDIT option work (Disabling AUDIT commands)[ID 1068714.6]
A Lot of Audit Files in ASM Home [ID 813416.1]
Many OS Audit Files Produced By The Grid Control Agent Connections [ID 1196323.1]
New Feature DBMS_AUDIT_MGMT To Manage And Purge Audit Information [ID 731908.1]
Manage Audit File Directory Growth with cron [ID 1298957.1]
Script to Show Audit Options/Audit Trail [ID 1019552.6]
AUDIT_TRAIL Set to DB yet Some Audited Entries for non-Sysdba Users Are Created in the OS Trail. [ID 1279934.1]
The Column DBA_PRIV_AUDIT_OPTS Has Rows With USER_NAME 'ANY CLIENT' and PROXY_NAME NULL [ID 455565.1]
High Load On Server from Process Ora_onnn [ID 1556564.1]

Thursday, November 14, 2013

ORA-1691: unable to extend lobsegment Message Only on Alert Log When Inserting to a Table with SecureFile

Similar to the Basic LOB situation mentioned in the earlier post, a server side (alert log only) ora-1691 message appears when inserting into a table with a securefile. The test was created using the same infrastructure as before, so the tablespace names, table names and the java code are identical to the ones mentioned in the earlier post. Therefore the test case used for the previous post can also be used here. The only difference is that the lob segment is now a securefile.
CREATE TABLE lobtest ( ID number, "OBJECT" BLOB ) SEGMENT CREATION IMMEDIATE TABLESPACE datatbs
  LOB ("OBJECT")
  STORE AS securefile object_lob_seg (
    TABLESPACE lobtbs
    DISABLE STORAGE IN ROW
    CACHE
    RETENTION NONE
    STORAGE (MAXEXTENTS UNLIMITED)
    INDEX object_lob_idx (
      TABLESPACE lobtbs
      STORAGE (MAXEXTENTS UNLIMITED)
    )
  )
/
In the previous case, ora-1691 was logged in the alert log during subsequent inserts. But with the securefile table, ora-1691 is raised during the initial inserts and well before the tablespace is exhausted.
Using the tablespaces from the previous test case (each with a maximum of 10M), 143 rows could be inserted before the client side ora-1691 is shown (this differs from the maximum in the earlier case, possibly due to the use of securefile). But from the 133rd row onward ora-1691 is logged in the alert log even though all rows are inserted successfully. Use of retention none has no effect in removing or reducing this logging. Similar to the previous case, unless the number of rows inserted is known by other means, it would be difficult to tell whether any row insertion failed due to a space issue.
A few SR updates later, the issue is being investigated as a possible bug. The post will be updated with the outcome.

Related Post
ORA-1691: unable to extend lobsegment is expected behavior?




Update 06 December 2013
The SR is being inactivated and the issue is to be tracked with
Bug 17463217 : ORA-1691: UNABLE TO EXTEND LOBSEGMENT SHOWN ON ALERT LOG WHEN INSERTING TO TABLE

Update 02 January 2014
Oracle's reply to the SR was that this is not a bug but expected behavior. The explanation is that when inserting around the 133rd row mark, the oracle process realizes there's not enough space in the table, so it allocates some extents to the table and the insert succeeds; as such there is no error on the client side. However, at this time the oracle process also identifies that space pressure exists for this table and starts a background process to preallocate further extents to the table. This is done by the Space Management Slave Process (Wnnn), which "performs various background space management tasks, including proactive space allocation and space reclamation". When this slave process tries to preallocate, it gets ora-1691 as there's not enough free space in the data file, because the pre-allocation of extents goes beyond the maximum size of the data file. Therefore the slave process logs ora-1691 in the alert log. End of explanation.
So if there's any ora-1691 in the alert log, it's worthwhile to check the application logs as well to see if any row insertion failed, since as shown here it is possible to get ora-1691 in the alert log while all rows are inserted successfully.

Tuesday, November 5, 2013

CALIBRATE_IO and Non-Default Block Size

Oracle IO calibration using the CALIBRATE_IO procedure in the dbms_resource_manager package consists of two steps: the first step finds the IOPS and the second finds the MBPS.

According to Oracle documentation "In the first step of I/O calibration with the DBMS_RESOURCE_MANAGER.CALIBRATE_IO procedure, the procedure issues random database-block-sized reads, by default, 8 KB, to all data files from all database instances. This step provides the maximum IOPS, in the output parameter max_iops, that the database can sustain. The value max_iops is an important metric for OLTP databases. The output parameter actual_latency provides the average latency for this workload. When you need a specific target latency, you can specify the target latency with the input parameter max_latency (specifies the maximum tolerable latency in milliseconds for database-block-sized IO requests)."

The second step of calibration using the DBMS_RESOURCE_MANAGER.CALIBRATE_IO procedure issues random, 1 MB reads to all data files from all database instances. The second step yields the output parameter max_mbps, which specifies the maximum MBPS of I/O that the database can sustain. This step provides an important metric for data warehouses.

Though the documentation says that the calibrate_IO procedure issues database-block-sized reads to determine the IOPS, this is not the case when the database has non-default block size tablespaces. When there are non-default block size tablespaces, the block size used in the first step is the maximum block size in the system. For example, if there are only 8K and 2K block size tablespaces, then the first step (IOPS phase) will use 8K, which may seem like it's using the default block size. However, if there are 8K and 32K block size tablespaces, then the first step will use 32K as the block size for read operations. Even with a mixture of block sizes, for example 8K, 16K and 32K, the first step still chooses 32K as the block size for the reads.
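The three cases described here reduce to taking the largest block size present. The Python lines below only restate that observation and are not Oracle's implementation; the function name iops_read_size is made up for illustration.

```python
def iops_read_size(tablespace_block_sizes):
    """Read size (bytes) the IOPS phase was observed to use: the largest block size."""
    return max(tablespace_block_sizes)


K = 1024
for sizes in ([8 * K, 2 * K], [8 * K, 32 * K], [8 * K, 16 * K, 32 * K]):
    print(sizes, "->", iops_read_size(sizes))
```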




This behavior can be observed by tracing the IO calibration operations. To enable the tracing, set the following events before running the IO calibration
alter system set events '56724 trace name context forever, level 2';
alter system set events '10298 trace name context forever, level 1';
To turn off tracing, set the following after the IO calibration has finished
alter system set events '56724 trace name context off';
alter system set events '10298 trace name context off';
Tracing does have some overhead, and the final values seen while tracing is enabled may be lower than the values seen from calibration without tracing. When tracing is enabled, several trace files are created in the trace file directory. They are of the form
SID_ora_pid.trc (eg: std11g21_ora_2307.trc)
SID_cs##_pid.trc (eg: std11g21_cs00_2314.trc, std11g21_cs01_2316.trc)
Files named SID_cs##_pid.trc contain the block sizes used by the random reads. These can be identified by the nbyt field in lines such as
ksfd_osdrqfil:fob=0x8e4ffe38 bufp=0x793f7e00 blkno=4750400 nbyt=32768 flags=0x4
ksfd_osdrqfil:fob=0x8e4fff70 bufp=0x76452000 blkno=1144832 nbyt=1048576 flags=0x4
ksfd_osdrqfil:fob=0x8e4fec08 bufp=0x878b7600 blkno=4967936 nbyt=16384 flags=0x4
The total number of read requests issued with each block size can be counted with the grep commands shown after the two queries below. The database used here has only 8K and 32K block size tablespaces.
SQL> select block_size,count(*) from v$datafile group by block_size;

BLOCK_SIZE   COUNT(*)
---------- ----------
     32768          1
      8192          9

SQL> select name,value from v$parameter where name='db_block_size';

NAME            VALUE
--------------- -----
db_block_size   8192

$ grep nbyt=8192 *cs*trc | wc -l
0

$ grep nbyt=32768 *cs*trc | wc -l
1394478

$ grep nbyt=1048576 *cs*trc | wc -l
348904
1048576 is the 1MB request issued during the second step. From the above output it can be seen that even though the database had 9 data files with the default block size of 8K and only one data file with a non-default block size of 32K, all the read requests issued during the first step were of 32K block size and not a single 8K block size read request was issued. Because of this, the values seen for IOPS differ depending on whether the database has tablespaces with non-default block sizes.
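The per-size grep counts can also be collected in a single pass over the trace lines. The Python sketch below counts ksfd_osdrqfil requests per nbyt value. The function name count_read_sizes is hypothetical, and the sample lines are the three trace lines quoted earlier in this post.

```python
import re
from collections import Counter

NBYT_RE = re.compile(r"\bnbyt=(\d+)")


def count_read_sizes(lines):
    """Count ksfd_osdrqfil read requests per nbyt value."""
    counts = Counter()
    for line in lines:
        if "ksfd_osdrqfil" not in line:
            continue
        match = NBYT_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# The three trace lines quoted in this post.
sample = [
    "ksfd_osdrqfil:fob=0x8e4ffe38 bufp=0x793f7e00 blkno=4750400 nbyt=32768 flags=0x4",
    "ksfd_osdrqfil:fob=0x8e4fff70 bufp=0x76452000 blkno=1144832 nbyt=1048576 flags=0x4",
    "ksfd_osdrqfil:fob=0x8e4fec08 bufp=0x878b7600 blkno=4967936 nbyt=16384 flags=0x4",
]
for size, count in sorted(count_read_sizes(sample).items()):
    print(size, count)
```

To run it against real trace files, the lines of each SID_cs##_pid.trc file would be passed in instead of the sample list. Unlike a plain grep for nbyt=8192, the regular expression captures the whole number, so a longer value that merely starts with 8192 is not counted as 8192.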
Tested on 11.2.0.3, 11.2.0.4 and 12.1.0.1. It's not clear at this point whether this is a bug or expected behavior, even though the documentation says IOPS is determined by default block size reads. An SR is ongoing.

Update 18 April 2017
The SR has been closed. The issue can be tracked with bug 17434257 : DISCREPANCIES IN THE VALUES OBSERVED WHEN IO CALIBRATION IS RUN

Related Post
I/O Elevator Comparision Using CALIBRATE_IO

Thursday, October 17, 2013

Patching 12c (12.1.0.1) RAC with October 2013 PSU

The first critical patch update for 12c was released on Oct 15 2013. This post looks at the differences in patching a 12c RAC environment (with role separation) compared to an 11.2 environment. The environment used for patching is the one that was upgraded from 11.2 to 12c.
The first thing to notice is the name of the patch. In the readme.html included in the patch it is referred to as "Oracle Grid Infrastructure System Patch" instead of "Oracle Grid Infrastructure Patch Set Update" (more jargon to converse with!). However, in the PSU and CPU availability document (1571391.1) it is still referred to as a PSU (GI 12.1.0.1.1 PSU Patch 17272829). "GI System Patch" is used throughout the readme.html, so it's pretty safe to assume that's how the 12c patches are going to be referred to from now on.
The opatch auto option has been merged into a single command called "opatchauto".
However, it is still possible to apply the patch manually. But at the time of this post (16/10/2013) the document with instructions for manual patch apply/rollback (1591616.1) is not available on MOS, though the readme.html mentions it (shouldn't this be available before patches are released?). When it becomes available, follow it for manual patch apply. In the meantime, as a workaround, the generateSteps option can be used to list the steps used by opatchauto
/opt/app/12.1.0/grid/OPatch/opatchauto apply  /usr/local/patch/17272829  -ocmrf ocm.rsp  -generateSteps
OPatch 12.1.0.1.2 or later is needed to apply this patch. Installing the new OPatch on GI_HOME causes the following error
unzip p6880880_121010_Linux-x86-64.zip
  ..
  inflating: OPatch/operr
error:  cannot create PatchSearch.xml
        Permission denied
The file PatchSearch.xml is to be unzipped into GI_HOME, outside the OPatch directory, and since GI_HOME has restrictive permissions, unzipping as the grid user causes the above error. The file can be copied manually into GI_HOME as the root user, or the error can be ignored (this caused no issue when installing the patch). Looking inside PatchSearch.xml, it seems it might be used to get OPatch from MOS (it has URLs of MOS and OPatch, including the CSI number). There is no such issue when installing the new OPatch on ORACLE_HOME.
The next issue is related to the patch location. The readme.html says to use "PATH_TO_PATCH_DIRECTORY" in the opatchauto command. PATH_TO_PATCH_DIRECTORY is the location where the patch was unzipped. This is the same as in 11.2. However, this location is not recognized by the opatchauto command, which complains of a missing bundle.xml file.
[grid@rhel6m2 patches]$ pwd
/usr/local/patches  <<-- this becomes the PATH_TO_PATCH_DIRECTORY (same as 11.2 as shown here)
[grid@rhel6m2 patches]$ ls
p17027533_121010_Linux-x86-64.zip
[grid@rhel6m2 patches]$ unzip p17027533_121010_Linux-x86-64.zip
[grid@rhel6m2 patches]$ su <-- preparing to run opatchauto as root user
[root@rhel6m2 patches]# /opt/app/12.1.0/grid/OPatch/opatchauto apply /usr/local/patches -ocmrf ocm.rsp

Parameter Validation: Successful

Patch Collection failed: Invalid patch location "/usr/local/patches" as there is no bundle.xml file in it or its parent directory.

opatchauto failed with error code 2.
So, unlike 11.2, using the location where the patch was unzipped doesn't work. Give the full path to the patch directory instead
[root@rhel6m2 patches]# /opt/app/12.1.0/grid/OPatch/opatchauto apply /usr/local/patches/17272829 -ocmrf ocm.rsp

OPatchauto version : 12.1.0.1.2
OUI version        : 12.1.0.1.0
Running from       : /opt/app/12.1.0/grid

opatchauto log file: /opt/app/12.1.0/grid/cfgtoollogs/opatchauto/17272829/opatch_gi_2013-10-16_15-39-34_deploy.log

Parameter Validation: Successful
...
The patch apply then proceeds.
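Since opatchauto looks for bundle.xml, the directory to pass can be found by searching the unzip location for that file. The Python sketch below does this one level deep. It assumes, based on the error message and the path that worked in this post, that bundle.xml sits directly inside the numbered patch directory; the function name find_patch_dirs is hypothetical and the demo runs on a throwaway directory rather than a real patch.

```python
import tempfile
from pathlib import Path


def find_patch_dirs(unzip_dir):
    """Return immediate subdirectories of unzip_dir that contain bundle.xml."""
    root = Path(unzip_dir)
    return sorted(p.parent for p in root.glob("*/bundle.xml"))


# Demo on a temporary directory that mimics the layout assumed above.
with tempfile.TemporaryDirectory() as tmp:
    patch_dir = Path(tmp) / "17272829"
    patch_dir.mkdir()
    (patch_dir / "bundle.xml").write_text("<bundle/>")
    print([p.name for p in find_patch_dirs(tmp)])
```

On the system in this post the equivalent call would be find_patch_dirs("/usr/local/patches"), and the directory it returns is the one to give to opatchauto apply.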
Also worth noting is that the apply keyword must be given along with opatchauto; without it a syntax error occurs
[root@rhel6m1 patches]# /opt/app/12.1.0/grid/OPatch/opatchauto /usr/local/patches/17272829 -ocmrf ocm.rsp
OPatch Automation Tool
Copyright (c) 2013, Oracle Corporation.  All rights reserved.

Syntax Error... Unrecognized Command or Option (/usr/local/patches/17272829): 1st argument must be one of the following:
   apply
   rollback
   version
   ..
Section 2.3 of the readme.html does mention the apply keyword in the commands, but in section 2.4 (Patch installation) the apply keyword is missing. This is another difference compared to 11.2, where there was no apply keyword when the opatch auto option was used. The rollback commands in section 2.7 are also incorrectly listed. The correct rollback commands are listed in section 2.3.
The readme.html for the GI system patch doesn't list any post installation tasks such as loading modified SQLs. This is run automatically as part of the patch apply. Once the patch is applied on the last node of the RAC, the registry history is updated
SQL> select * from dba_registry_history;

ACTION_TIME                    ACTION     NAMESPACE  VERSION            ID BUNDLE_SER COMMENTS
------------------------------ ---------- ---------- ---------- ---------- ---------- ------------------------------
12-AUG-13 04.28.26.378432 PM   UPGRADE    SERVER     12.1.0.1.0                       Upgraded from 11.2.0.3.0
12-AUG-13 04.34.09.496894 PM   APPLY      SERVER     12.1.0.1            0 PSU        Patchset 12.1.0.0.0
16-OCT-13 04.05.54.514261 PM   APPLY      SERVER     12.1.0.1            1 PSU        PSU 12.1.0.1.1
The SQL apply is logged in the dba_registry_sqlpatch view.
SQL> show con_name

CON_NAME
------------------------------
CDB$ROOT

SQL> select * from dba_registry_sqlpatch;

  PATCH_ID ACTION     STATUS          ACTION_TIME                    DESCRIPTIO LOGFILE
---------- ---------- --------------- ------------------------------ ---------- --------------------------------------------------------------------------------
  17027533 APPLY      SUCCESS         16-OCT-13 05.54.42.295071 PM   sqlpatch   /opt/app/oracle/product/12.1.0/dbhome_1/sqlpatch/17027533/17027533_apply_CDB12C_
                                                                                CDBROOT_2013Oct16_17_51_30.log
Each PDB will also have its own log file entry in the dba_registry_sqlpatch view
SQL> alter session set container=pdb12c;

Session altered.

SQL> show con_name

CON_NAME
------------------------------
PDB12C

SQL> select * from dba_registry_sqlpatch;

  PATCH_ID ACTION     STATUS          ACTION_TIME                    DESCRIPTIO LOGFILE
---------- ---------- --------------- ------------------------------ ---------- --------------------------------------------------------------------------------
  17027533 APPLY      END             16-OCT-13 05.54.44.488402 PM   sqlpatch   /opt/app/oracle/product/12.1.0/dbhome_1/sqlpatch/17027533/17027533_apply_CDB12C_
                                                                                PDB12C_2013Oct16_17_51_49.log
Even the pdb$seed database could be queried this way to confirm that it is also updated with the SQL changes made by the patch. Any new PDB created using the seed PDB also gets these modifications, so no post-installation patch work is necessary.




Full output of running opatchauto is given below.
[root@rhel6m1 17272829]# /opt/app/12.1.0/grid/OPatch/opatchauto apply `pwd` -ocmrf ../ocm.rsp
OPatch Automation Tool
Copyright (c) 2013, Oracle Corporation.  All rights reserved.

OPatchauto version : 12.1.0.1.2
OUI version        : 12.1.0.1.0
Running from       : /opt/app/12.1.0/grid

opatchauto log file: /opt/app/12.1.0/grid/cfgtoollogs/opatchauto/17272829/opatch_gi_2013-10-16_14-23-50_deploy.log

Parameter Validation: Successful


Grid Infrastructure home:
/opt/app/12.1.0/grid
RAC home(s):
/opt/app/oracle/product/12.1.0/dbhome_1

Configuration Validation: Successful

Patch Location: /usr/local/patches/17272829
Grid Infrastructure Patch(es): 17027533 17077442 17303297
RAC Patch(es): 17027533 17077442

Patch Validation: Successful

Stopping RAC (/opt/app/oracle/product/12.1.0/dbhome_1) ... Successful
Following database(s) were stopped and will be restarted later during the session: std11g2

Applying patch(es) to "/opt/app/oracle/product/12.1.0/dbhome_1" ...
Patch "/usr/local/patches/17272829/17027533" successfully applied to "/opt/app/oracle/product/12.1.0/dbhome_1".
Patch "/usr/local/patches/17272829/17077442" successfully applied to "/opt/app/oracle/product/12.1.0/dbhome_1".

Stopping CRS ... Successful

Applying patch(es) to "/opt/app/12.1.0/grid" ...
Patch "/usr/local/patches/17272829/17027533" successfully applied to "/opt/app/12.1.0/grid".
Patch "/usr/local/patches/17272829/17077442" successfully applied to "/opt/app/12.1.0/grid".
Patch "/usr/local/patches/17272829/17303297" successfully applied to "/opt/app/12.1.0/grid".

Starting CRS ... Successful

Starting RAC (/opt/app/oracle/product/12.1.0/dbhome_1) ... Successful

SQL changes, if any, are applied successfully on the following database(s): std11g2

Apply Summary:
Following patch(es) are successfully installed:
GI Home: /opt/app/12.1.0/grid: 17027533, 17077442, 17303297
RAC Home: /opt/app/oracle/product/12.1.0/dbhome_1: 17027533, 17077442

On a system with PDBs that have dynamic services created for them, stopping RAC step will have the following output listing the service
Stopping RAC (/opt/app/oracle/product/12.1.0/dbhome_1) ... Successful
Following database(s) were stopped and will be restarted later during the session: -pdbsvc,cdb12c
If there are no services created for the PDBs then only the CDB is mentioned in the output
Stopping RAC (/opt/app/oracle/product/12.1.0/dbhome_1) ... Successful
Following database(s) were stopped and will be restarted later during the session: cdb12c

The apply command has an -analyze option, which the help text describes as follows:
-analyze
              This option runs all the required prerequisite checks to confirm
              the patchability of the system without actually patching or
              affecting the system in any way.
Even though it says "runs all the required prerequisite checks to confirm the patchability", this seems not to be the case. Analyze could succeed and the actual patch apply could still fail, as the following run shows.
[root@rhel12c2 patch]# /opt/app/12.1.0/grid/OPatch/opatchauto apply /usr/local/patch/17272829 -ocmrf ocm.rsp -analyze
OPatch Automation Tool
Copyright (c) 2013, Oracle Corporation.  All rights reserved.

OPatchauto version : 12.1.0.1.2
OUI version        : 12.1.0.1.0
Running from       : /opt/app/12.1.0/grid

opatchauto log file: /opt/app/12.1.0/grid/cfgtoollogs/opatchauto/17272829/opatch_gi_2013-10-17_11-28-37_analyze.log

NOTE: opatchauto is running in ANALYZE mode. There will be no change to your system.

Parameter Validation: Successful

Grid Infrastructure home:
/opt/app/12.1.0/grid
RAC home(s):
/opt/app/oracle/product/12.1.0/dbhome_1

Configuration Validation: Successful

Patch Location: /usr/local/patch/17272829
Grid Infrastructure Patch(es): 17027533 17077442 17303297
RAC Patch(es): 17027533 17077442

Patch Validation: Successful

Analyzing patch(es) on "/opt/app/oracle/product/12.1.0/dbhome_1" ...
Patch "/usr/local/patch/17272829/17027533" successfully analyzed on "/opt/app/oracle/product/12.1.0/dbhome_1" for apply.
Patch "/usr/local/patch/17272829/17077442" successfully analyzed on "/opt/app/oracle/product/12.1.0/dbhome_1" for apply.

Analyzing patch(es) on "/opt/app/12.1.0/grid" ...
Patch "/usr/local/patch/17272829/17027533" successfully analyzed on "/opt/app/12.1.0/grid" for apply.
Patch "/usr/local/patch/17272829/17077442" successfully analyzed on "/opt/app/12.1.0/grid" for apply.
Patch "/usr/local/patch/17272829/17303297" successfully analyzed on "/opt/app/12.1.0/grid" for apply.

SQL changes, if any, are analyzed successfully on the following database(s): cdb12c

Apply Summary:
Following patch(es) are successfully analyzed:
GI Home: /opt/app/12.1.0/grid: 17027533, 17077442, 17303297
RAC Home: /opt/app/oracle/product/12.1.0/dbhome_1: 17027533, 17077442

opatchauto succeeded.

<<------ Running of actual patch command ----------->>
[root@rhel12c2 patch]# /opt/app/12.1.0/grid/OPatch/opatchauto apply /usr/local/patch/17272829 -ocmrf ocm.rsp
OPatch Automation Tool
Copyright (c) 2013, Oracle Corporation.  All rights reserved.

OPatchauto version : 12.1.0.1.2
OUI version        : 12.1.0.1.0
Running from       : /opt/app/12.1.0/grid

opatchauto log file: /opt/app/12.1.0/grid/cfgtoollogs/opatchauto/17272829/opatch_gi_2013-10-17_11-32-12_deploy.log

Parameter Validation: Successful

Grid Infrastructure home:
/opt/app/12.1.0/grid
RAC home(s):
/opt/app/oracle/product/12.1.0/dbhome_1

Configuration Validation: Successful

Patch Location: /usr/local/patch/17272829
Grid Infrastructure Patch(es): 17027533 17077442 17303297
RAC Patch(es): 17027533 17077442

Patch Validation: Successful

Stopping RAC (/opt/app/oracle/product/12.1.0/dbhome_1) ... Successful
Following database(s) were stopped and will be restarted later during the session: -pdbsvc,cdb12c

Applying patch(es) to "/opt/app/oracle/product/12.1.0/dbhome_1" ...
Patch "/usr/local/patch/17272829/17027533" successfully applied to "/opt/app/oracle/product/12.1.0/dbhome_1".
Patch "/usr/local/patch/17272829/17077442" successfully applied to "/opt/app/oracle/product/12.1.0/dbhome_1".

Stopping CRS ... Successful

Applying patch(es) to "/opt/app/12.1.0/grid" ...
Command "/opt/app/12.1.0/grid/OPatch/opatch napply -phBaseFile /tmp/OraGI12Home1_patchList -local  -invPtrLoc /opt/app/12.1.0/grid/oraInst.loc -oh /opt/app/12.1.0/grid -silent -ocmrf /usr/local/patch/ocm.rsp" execution failed:
UtilSession failed:
Prerequisite check "CheckSystemSpace" failed.

Log file Location for the failed command: /opt/app/12.1.0/grid/cfgtoollogs/opatch/opatch2013-10-17_11-39-04AM_1.log

[WARNING] The local database instance 'cdb12c2' from '/opt/app/oracle/product/12.1.0/dbhome_1' is not running. SQL changes, if any,  will not be applied. Please refer to the log file for more details.
For more details, please refer to the log file "/opt/app/12.1.0/grid/cfgtoollogs/opatchauto/17272829/opatch_gi_2013-10-17_11-32-12_deploy.debug.log".

Apply Summary:
Following patch(es) are successfully installed:
RAC Home: /opt/app/oracle/product/12.1.0/dbhome_1: 17027533, 17077442

Following patch(es) failed to be installed:
GI Home: /opt/app/12.1.0/grid: 17027533, 17077442, 17303297

opatchauto failed with error code 2.
The log file lists the failed steps along with commands that could be executed manually.
-------------------Following steps still need to be executed-------------------

/opt/app/12.1.0/grid/OPatch/opatch napply -phBaseFile /tmp/OraGI12Home1_patchList -local  -invPtrLoc /opt/app/12.1.0/grid/oraInst.loc -oh /opt/app/12.1.0/grid -silent -ocmrf /usr/local/patch/ocm.rsp (TRIED BUT FAILED)

/opt/app/12.1.0/grid/rdbms/install/rootadd_rdbms.sh

/usr/bin/perl /opt/app/12.1.0/grid/crs/install/rootcrs.pl -postpatch
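The remaining steps could be pulled out of the log with a small filter, sketched below. The function name remaining_steps is hypothetical. It assumes the list runs from the "Following steps still need to be executed" marker to the end of the file or to the next line starting with dashes, which is a guess based only on the excerpt above. It only prints the commands, so they can be reviewed before anything is run.

```shell
# Hypothetical helper: print the commands listed after the
# "Following steps still need to be executed" marker in an opatchauto log.
# Blank lines are skipped and the "(TRIED BUT FAILED)" note is stripped.
remaining_steps() {
    awk '
        /Following steps still need to be executed/ { found = 1; next }
        found && /^-+/ { exit }
        found && NF {
            sub(/[ \t]*\(TRIED BUT FAILED\)[ \t]*$/, "")
            print
        }
    ' "$1"
}

# Example use:
#   remaining_steps /path/to/opatchauto_deploy.log
```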
Executing the first command shows how much free disk space must be available before the patch apply
[grid@rhel12c2 patch]$ /opt/app/12.1.0/grid/OPatch/opatch napply -phBaseFile /tmp/OraGI12Home1_patchList -local  -invPtrLoc /opt/app/12.1.0/grid/oraInst.loc -oh /opt/app/12.1.0/grid -silent -ocmrf /usr/local/patch/ocm.rsp
Oracle Interim Patch Installer version 12.1.0.1.2
Copyright (c) 2013, Oracle Corporation.  All rights reserved.


Oracle Home       : /opt/app/12.1.0/grid
Central Inventory : /opt/app/oraInventory
   from           : /opt/app/12.1.0/grid/oraInst.loc
OPatch version    : 12.1.0.1.2
OUI version       : 12.1.0.1.0
Log file location : /opt/app/12.1.0/grid/cfgtoollogs/opatch/opatch2013-10-17_11-42-52AM_1.log

Verifying environment and performing prerequisite checks...
Prerequisite check "CheckSystemSpace" failed.
The details are:
Required amount of space(10578.277MB) is not available.
UtilSession failed:
Prerequisite check "CheckSystemSpace" failed.
Log file location: /opt/app/12.1.0/grid/cfgtoollogs/opatch/opatch2013-10-17_11-42-52AM_1.log

OPatch failed with error code 73
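Since analyze did not catch the space shortage, a manual free space check before the apply may save a failed run. The sketch below is one way to do it. The function name has_free_mb is hypothetical, and it assumes the space requirement applies to the filesystem holding the grid home, which the OPatch output does not state explicitly.

```shell
# Hypothetical helper: succeeds only if the filesystem holding the given
# path has at least the given number of megabytes available.
has_free_mb() {
    # df -P prints one POSIX-format line per filesystem, -k reports 1K
    # blocks; column 4 of the second line is the available space.
    avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ "$((avail_kb / 1024))" -ge "$2" ]
}

# Example use, rounding the 10578.277MB reported above up to 10579:
#   has_free_mb /opt/app/12.1.0/grid 10579 || echo "not enough free space"
```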

Unlike in a RAC environment, a single instance database requires the "loading modified SQLs" step to be run manually. 12c provides the datapatch tool for this purpose, whereas in 11.2 the catbundle script was run. All containers (CDB root and PDBs) are updated.
[oracle@rhel6m1 OPatch]$ ./datapatch -verbose
SQL Patching tool version 12.1.0.1.0 on Mon Oct 21 16:58:15 2013
Copyright (c) 2013, Oracle.  All rights reserved.

Connecting to database...OK
Determining current state...
Currently installed SQL Patches:
  PDB CDB$ROOT:
  PDB PDB$SEED:
  PDB PDB12C:
  PDB PDB12CDI:
Currently installed C Patches: 17027533
For the following PDBs: CDB$ROOT
  Nothing to roll back
  The following patches will be applied: 17027533
For the following PDBs: PDB$SEED
  Nothing to roll back
  The following patches will be applied: 17027533
For the following PDBs: PDB12C
  Nothing to roll back
  The following patches will be applied: 17027533
For the following PDBs: PDB12CDI
  Nothing to roll back
  The following patches will be applied: 17027533
Adding patches to installation queue...
Installing patches...
Validating logfiles...
Patch 17027533 apply (pdb CDB$ROOT): SUCCESS
  logfile: /opt/app/oracle/product/12.1.0/dbhome_1/sqlpatch/17027533/17027533_apply_ENT12C_CDBROOT_2013Oct21_16_58_30.log (no errors)
Patch 17027533 apply (pdb PDB$SEED): SUCCESS
  logfile: /opt/app/oracle/product/12.1.0/dbhome_1/sqlpatch/17027533/17027533_apply_ENT12C_PDBSEED_2013Oct21_16_59_06.log (no errors)
Patch 17027533 apply (pdb PDB12C): SUCCESS
  logfile: /opt/app/oracle/product/12.1.0/dbhome_1/sqlpatch/17027533/17027533_apply_ENT12C_PDB12C_2013Oct21_16_59_32.log (no errors)
Patch 17027533 apply (pdb PDB12CDI): SUCCESS
  logfile: /opt/app/oracle/product/12.1.0/dbhome_1/sqlpatch/17027533/17027533_apply_ENT12C_PDB12CDI_2013Oct21_16_59_55.log (no errors)
SQL Patching tool complete on Mon Oct 21 17:00:30 2013
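When there are many PDBs, the result lines in the datapatch output could be scanned rather than read by eye. The sketch below assumes the result lines always have the form shown above ("Patch N apply (pdb NAME): STATUS"). The function name failed_sqlpatches is hypothetical.

```shell
# Hypothetical helper: reads datapatch output on stdin and prints every
# "Patch ... apply (pdb ...)" result line whose last word is not SUCCESS.
# Exit status is 1 if any such line was found, 0 otherwise.
failed_sqlpatches() {
    awk '
        /^Patch [0-9]+ apply \(pdb / && $NF != "SUCCESS" { print; bad = 1 }
        END { exit bad + 0 }
    '
}

# Example use:
#   ./datapatch -verbose | failed_sqlpatches || echo "check the log files"
```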
Each container could be queried to check the status of the apply.
SQL> show con_name

CON_NAME
------------------------------
CDB$ROOT
SQL> select * from dba_registry_sqlpatch;

  PATCH_ID ACTION          STATUS          ACTION_TIME                  DESCRIPTIO LOGFILE
---------- --------------- --------------- ---------------------------- ---------- ----------------------------------------------------------------------
  17027533 APPLY           SUCCESS         21-OCT-13 05.00.27.856979 PM sqlpatch   /opt/app/oracle/product/12.1.0/dbhome_1/sqlpatch/17027533/17027533_app
                                                                                   ly_ENT12C_CDBROOT_2013Oct21_16_58_30.log

SQL> ALTER SESSION SET container = pdb$seed;
Session altered.

SQL> show con_name

CON_NAME
------------------------------
PDB$SEED
SQL>  select * from dba_registry_sqlpatch;

  PATCH_ID ACTION          STATUS          ACTION_TIME                  DESCRIPTIO LOGFILE
---------- --------------- --------------- ---------------------------- ---------- ----------------------------------------------------------------------
  17027533 APPLY           SUCCESS         21-OCT-13 05.00.29.488402 PM sqlpatch   /opt/app/oracle/product/12.1.0/dbhome_1/sqlpatch/17027533/17027533_app
                                                                                   ly_ENT12C_PDBSEED_2013Oct21_16_59_06.log                       

SQL> ALTER SESSION SET container = pdb12c;
Session altered.
                       
SQL> show con_name

CON_NAME
------------------------------
PDB12C
SQL> select * from dba_registry_sqlpatch;

  PATCH_ID ACTION          STATUS          ACTION_TIME                  DESCRIPTIO LOGFILE
---------- --------------- --------------- ---------------------------- ---------- ----------------------------------------------------------------------
  17027533 APPLY           SUCCESS         21-OCT-13 05.00.30.823562 PM sqlpatch   /opt/app/oracle/product/12.1.0/dbhome_1/sqlpatch/17027533/17027533_app
                                                                                   ly_ENT12C_PDB12C_2013Oct21_16_59_32.log
                      
SQL> ALTER SESSION SET container = pdb12cdi;
Session altered.

SQL> show con_name

CON_NAME
------------------------------
PDB12CDI
SQL> select * from dba_registry_sqlpatch;

  PATCH_ID ACTION          STATUS          ACTION_TIME                  DESCRIPTIO LOGFILE
---------- --------------- --------------- ---------------------------- ---------- ----------------------------------------------------------------------
  17027533 APPLY           SUCCESS         21-OCT-13 05.00.30.996406 PM sqlpatch   /opt/app/oracle/product/12.1.0/dbhome_1/sqlpatch/17027533/17027533_app
                                                                                   ly_ENT12C_PDB12CDI_2013Oct21_16_59_55.log
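Switching containers by hand gets repetitive as the number of PDBs grows. The sketch below generates the same statements for a list of container names so they can be piped into SQL*Plus. The function name sqlpatch_check_sql is hypothetical, and the column list is limited to columns visible in the output above.

```shell
# Hypothetical helper: print an "alter session" and a query against
# dba_registry_sqlpatch for each container name given as an argument.
sqlpatch_check_sql() {
    for c in "$@"; do
        printf 'alter session set container = %s;\n' "$c"
        printf 'select patch_id, action, status, action_time from dba_registry_sqlpatch;\n'
    done
}

# Example use (single quotes keep the shell from expanding the $ signs):
#   sqlpatch_check_sql 'CDB$ROOT' 'PDB$SEED' PDB12C PDB12CDI | sqlplus -s "/ as sysdba"
```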

Useful metalink notes
Known Patching Issues for the Oct 15 PSU, Oracle Database 12c R1 using opatchauto and EM [ID 1592252.1]

Update 17 January 2014
More Useful metalink notes
Supplemental Readme - Patch Installation and Deinstallation For 12.1.0.1.x GI PSU [ID 1591616.1]
Example: Manually Apply a 12c GI PSU in Cluster Environment [ID 1594184.1]
Example: Manually Apply a 12c GI PSU in Standalone Environment [ID 1595408.1]
Example: Applying a 12c GI PSU With opatchauto in GI Cluster or Standalone Environment [ID 1594183.1]
What's the sub-patches in 12c GI PSU [ID 1595371.1]

Wednesday, October 16, 2013

Upgrade from 11.1.0.7 to 11.2.0.4 (Clusterware, ASM & RAC)

This is not a step-by-step guide to upgrading from 11.1.0.7 to 11.2.0.4. There's an earlier post which shows upgrading from 11.1.0.7 to 11.2.0.3, and this post is a follow-up to that, mainly highlighting the differences. For Oracle documentation and useful metalink notes refer to the previous post. The 11.1 environment used is a clone of the one used for the previous post.
The first difference encountered during the upgrade to 11.2.0.4 is with cluvfy.
./runcluvfy.sh stage -pre crsinst -upgrade -n rac1,rac2 -rolling -src_crshome /opt/crs/oracle/product/11.1.0/crs -dest_crshome /opt/app/11.2.0/grid -dest_version 11.2.0.4.0 -fixup  -verbose
Running the cluvfy that came with the installation media flagged several prerequisites as failed. This seems to be an issue/bug in the 11.2.0.4 installation's cluvfy, as the prerequisites flagged as failed were successful when evaluated with an 11.2.0.3 cluvfy, and the checked values haven't changed from 11.2.0.3 to 11.2.0.4. For the most part the failures occurred when evaluating the remote node. If the node running cluvfy was changed (the earlier remote node becomes the local node running cluvfy), the prerequisites that were flagged as failed are now successful and the same prerequisites are flagged as failed on the new remote node. In short, the runcluvfy.sh that comes with the 11.2.0.4 installation media (in file p13390677_112040_Linux-x86-64_3of7.zip) is not useful in evaluating whether the prerequisites for the upgrade are met. Following is the list of prerequisites that had issues. cluvfy was run from the node called rac1 (local node), and in this case rac2 is the remote node.
Check: Free disk space for "rac2:/opt/app/11.2.0/grid,rac2:/tmp"
  Path              Node Name     Mount point   Available     Required      Status
  ----------------  ------------  ------------  ------------  ------------  ------------
  /opt/app/11.2.0/grid  rac2          UNKNOWN       NOTAVAIL      7.5GB         failed
  /tmp              rac2          UNKNOWN       NOTAVAIL      7.5GB         failed
Result: Free disk space check failed for "rac2:/opt/app/11.2.0/grid,rac2:/tmp"
cluvfy seems unable to get the space usage from the remote node. When cluvfy was run from rac2, the space check on rac2 passed and the space check on rac1 failed.
Checking for Oracle patch "11724953" in home "/opt/crs/oracle/product/11.1.0/crs".
  Node Name     Applied                   Required                  Comment
  ------------  ------------------------  ------------------------  ----------
  rac2          missing                   11724953                  failed
  rac1          11724953                  11724953                  passed
Result: Check for Oracle patch "11724953" in home "/opt/crs/oracle/product/11.1.0/crs" failed
Patch 11724953 (2011 April CRS PSU) is required to be present in the 11.1 environment before the upgrade to 11.2.0.4 and cluvfy is unable to verify this on the remote node. This could be manually checked with OPatch.
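A manual check could filter the output of opatch lsinventory for the patch number, as sketched below. The function name patch_listed is hypothetical, and the "Patch <number> : applied on ..." line format it matches is an assumption about the lsinventory output, which is not shown in this post.

```shell
# Hypothetical helper: reads "opatch lsinventory" output on stdin and
# succeeds if the given patch number appears on a "Patch <n> :" line.
# The line format is an assumption, adjust the pattern if it differs.
patch_listed() {
    grep -Eq "^Patch +$1 +:"
}

# Example use, run on each node against the 11.1 clusterware home:
#   $ORACLE_HOME/OPatch/opatch lsinventory -oh /opt/crs/oracle/product/11.1.0/crs | patch_listed 11724953
```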
Check: TCP connectivity of subnet "192.168.0.0"
  Source                          Destination                     Connected?
  ------------------------------  ------------------------------  ----------------
  rac1:192.168.0.85               rac2:192.168.0.85               failed

ERROR:
PRVF-7617 : Node connectivity between "rac1 : 192.168.0.85" and "rac2 : 192.168.0.85" failed
  rac1:192.168.0.85               rac2:192.168.0.89               failed

ERROR:
PRVF-7617 : Node connectivity between "rac1 : 192.168.0.85" and "rac2 : 192.168.0.89" failed
  rac1:192.168.0.85               rac1:192.168.0.89               failed

ERROR:
PRVF-7617 : Node connectivity between "rac1 : 192.168.0.85" and "rac1 : 192.168.0.89" failed
Result: TCP connectivity check failed for subnet "192.168.0.0"
Some of the node connectivity checks also fail. Oddly enough, cluvfy's own nodereach and nodecon checks pass.
ERROR:
PRVF-5449 : Check of Voting Disk location "/dev/sdb2(/dev/sdb2)" failed on the following nodes:
        rac2
        rac2:GetFileInfo command failed.

PRVF-5431 : Oracle Cluster Voting Disk configuration check failed
Even though cvuqdisk-1.0.9-1.rpm is installed, the sharedness check for the vote disk fails on the remote node. (Update 2015/02/20: a workaround for this error is given in 1599025.1)
Apart from cluvfy, raccheck could also be used to evaluate the upgrade readiness.
raccheck -u -o pre
Even though cluvfy fails to evaluate certain prerequisites, OUI is able to evaluate all of them without any issue. Below is the output from the OUI

Create additional user groups for ASM administration (refer to the previous post) and begin the clusterware upgrade. It is possible to upgrade ASM after the clusterware upgrade, but in this case ASM is upgraded at the same time as the clusterware. This is an out-of-place rolling upgrade. The clusterware stack will be up until rootupgrade.sh is run. Versions before the upgrade
[oracle@rac1 ~]$ crsctl query crs activeversion
Oracle Clusterware active version on the cluster is [11.1.0.7.0]

[oracle@rac1 ~]$ crsctl query crs softwareversion rac1
Oracle Clusterware version on node [rac1] is [11.1.0.7.0]

[oracle@rac1 ~]$ crsctl query crs softwareversion rac2
Oracle Clusterware version on node [rac2] is [11.1.0.7.0]

[oracle@rac1 ~]$ crsctl query crs releaseversion
11.1.0.7.0
Summary page
Rootupgrade execution output from rac1 node
[root@rac1 ~]# /opt/app/11.2.0/grid/rootupgrade.sh
Performing root user operation for Oracle 11g

The following environment variables are set as:
    ORACLE_OWNER= oracle
    ORACLE_HOME=  /opt/app/11.2.0/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The file "oraenv" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]: y
   Copying oraenv to /usr/local/bin ...
The file "coraenv" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]: y
   Copying coraenv to /usr/local/bin ...

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /opt/app/11.2.0/grid/crs/install/crsconfig_params
Creating trace directory
Installing Trace File Analyzer
OLR initialization - successful
  root wallet
  root wallet cert
  root cert export
  peer wallet
  profile reader wallet
  pa wallet
  peer wallet keys
  pa wallet keys
  peer cert request
  pa cert request
  peer cert
  pa cert
  peer root cert TP
  profile reader root cert TP
  pa root cert TP
  peer pa cert TP
  pa peer cert TP
  profile reader pa cert TP
  profile reader peer cert TP
  peer user cert
  pa user cert
Replacing Clusterware entries in inittab
clscfg: EXISTING configuration version 4 detected.
clscfg: version 4 is 11 Release 1.
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Configure Oracle Grid Infrastructure for a Cluster ... succeeded
This will update the software version, but the active version will remain at the lower version of 11.1 until all nodes are upgraded.
[oracle@rac1 ~]$ crsctl query crs activeversion
Oracle Clusterware active version on the cluster is [11.1.0.7.0]

[oracle@rac1 ~]$ crsctl query crs softwareversion
Oracle Clusterware version on node [rac1] is [11.2.0.4.0]
Rootupgrade output from node rac2 (last node)
[root@rac2 ~]# /opt/app/11.2.0/grid/rootupgrade.sh
Performing root user operation for Oracle 11g

The following environment variables are set as:
    ORACLE_OWNER= oracle
    ORACLE_HOME=  /opt/app/11.2.0/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The file "oraenv" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]: y
   Copying oraenv to /usr/local/bin ...
The file "coraenv" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]: y
   Copying coraenv to /usr/local/bin ...

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /opt/app/11.2.0/grid/crs/install/crsconfig_params
Creating trace directory
Installing Trace File Analyzer
OLR initialization - successful
Replacing Clusterware entries in inittab
clscfg: EXISTING configuration version 5 detected.
clscfg: version 5 is 11g Release 2.
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Start upgrade invoked..
Started to upgrade the Oracle Clusterware. This operation may take a few minutes.
Started to upgrade the OCR.
Started to upgrade the CSS.
Started to upgrade the CRS.
The CRS was successfully upgraded.
Successfully upgraded the Oracle Clusterware.
Oracle Clusterware operating version was successfully set to 11.2.0.4.0
Configure Oracle Grid Infrastructure for a Cluster ... succeeded
With the last node upgraded, the active version is updated to 11.2.0.4
[oracle@rac2 ~]$ crsctl query crs activeversion
Oracle Clusterware active version on the cluster is [11.2.0.4.0]
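The difference between the active and software versions could be checked with a short script during a rolling upgrade. The sketch below extracts the bracketed version number from the crsctl output lines shown above. The function names bracket_version and upgrade_complete are hypothetical, and comparing only the local node's software version is a simplification.

```shell
# Hypothetical helper: reads a crsctl version line on stdin and prints
# the last [bracketed] value made up of digits and dots.
bracket_version() {
    sed -n 's/.*\[\([0-9.][0-9.]*\)\].*/\1/p'
}

# Hypothetical helper: succeeds once the cluster active version matches
# the software version of the local node.
upgrade_complete() {
    active=$(crsctl query crs activeversion | bracket_version)
    software=$(crsctl query crs softwareversion | bracket_version)
    [ "$active" = "$software" ]
}
```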
Once the OK button on the "Execute Configuration Script" dialog is clicked, a set of configuration assistants is executed, ASMCA among them. The ASM upgrade is done in a rolling fashion and the following could be seen in the ASM alert log
Tue Oct 15 12:52:37 2013
ALTER SYSTEM START ROLLING MIGRATION TO 11.2.0.4.0
Once the configuration assistants have run, the clusterware upgrade is complete. It must also be noted that the upgraded environment had the OCR and vote disk on block devices
[oracle@rac1 ~]$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   9687be081c784f98bf9166d561d875b0 (/dev/sdb2) []
Located 1 voting disk(s).
If it's planned to upgrade to 12c (from 11.2) these must be moved to an ASM diskgroup.
Check and change auto start status of certain resources so they are always brought up
[oracle@rac1 ~]$ awk \
>   'BEGIN {printf "%-35s %-25s %-18s\n", "Resource Name", "Type", "Auto Start State";
>           printf "%-35s %-25s %-18s\n", "-----------", "------", "----------------";}'
Resource Name                       Type                      Auto Start State
-----------                         ------                    ----------------
[oracle@rac1 ~]$ crsctl stat res -p | egrep -w "NAME|TYPE|AUTO_START" | grep -v DEFAULT_TEMPLATE | awk \
>  'BEGIN { FS="="; state = 0; }
>   $1~/NAME/ {appname = $2; state=1};
>   state == 0 {next;}
>   $1~/TYPE/ && state == 1 {apptarget = $2; state=2;}
>   $1~/AUTO_START/ && state == 2 {appstate = $2; state=3;}
>   state == 3 {printf "%-35s %-25s %-18s\n", appname, apptarget, appstate; state=0;}'
ora.DATA.dg                         ora.diskgroup.type        never
ora.FLASH.dg                        ora.diskgroup.type        never
ora.LISTENER.lsnr                   ora.listener.type         restore
ora.LISTENER_SCAN1.lsnr             ora.scan_listener.type    restore
ora.asm                             ora.asm.type              never
ora.cvu                             ora.cvu.type              restore
ora.gsd                             ora.gsd.type              always
ora.net1.network                    ora.network.type          restore
ora.oc4j                            ora.oc4j.type             restore
ora.ons                             ora.ons.type              always
ora.rac1.vip                        ora.cluster_vip_net1.type restore
ora.rac11g1.db                      application               1
ora.rac11g1.rac11g11.inst           application               1
ora.rac11g1.rac11g12.inst           application               1
ora.rac11g1.bx.cs                   application               1
ora.rac11g1.bx.rac11g11.srv         application               restore
ora.rac11g1.bx.rac11g12.srv         application               restore
ora.rac2.vip                        ora.cluster_vip_net1.type restore
ora.registry.acfs                   ora.registry.acfs.type    restore
ora.scan1.vip                       ora.scan_vip.type         restore
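To change the auto start state, the table above could be turned into crsctl modify commands. The sketch below only prints a command for each resource whose state is "never", so the list can be reviewed before anything is run. The function name autostart_fix_commands is hypothetical, and treating every "never" resource as one that should become "always" is an assumption that may not suit every environment.

```shell
# Hypothetical helper: reads the three-column table (resource name, type,
# auto start state) on stdin and prints a crsctl command for every
# resource whose auto start state is "never". Nothing is executed.
autostart_fix_commands() {
    awk '$3 == "never" {
        printf "crsctl modify resource %s -attr \"AUTO_START=always\"\n", $1
    }'
}

# Example use: save the table to a file, then
#   autostart_fix_commands < autostart_table.txt
```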
Remove the old 11.1 clusterware installation.
During the database shutdown (and before the DB was upgraded to the 11.2 version) the following was seen during the cluster resource stop.
CRS-5809: Failed to execute 'ACTION_SCRIPT' value of '/opt/crs/oracle/product/11.1.0/crs/bin/racgwrap' for 'ora.rac11g1.db'. Error information 'cmd /opt/crs/oracle/product/11.1.0/crs/bin/racgwrap not found', Category : -2, OS error : 2
CRS-2678: 'ora.rac11g1.db' on 'rac1' has experienced an unrecoverable failure
CRS-0267: Human intervention required to resume its availability.
This was due to the action_script attribute not being updated to reflect the new clusterware location. To fix this, start the clusterware stack and run the following
[root@rac1 oracle]# crsctl modify resource ora.rac11g1.db -attr "ACTION_SCRIPT=/opt/app/11.2.0/grid/bin/racgwrap"
"/opt/app/11.2.0/grid" is the new clusterware home. More on this is available in Pre 11.2 Database Issues in 11gR2 Grid Infrastructure Environment [948456.1]
The post-upgrade clusterware state could be checked with cluvfy and raccheck
cluvfy stage -post crsinst -n rac1,rac2
./raccheck -u -o post
This concludes the upgrade of clusterware and ASM. The next step is the upgrade of the database.



The database software upgrade is an out-of-place upgrade. Though in-place upgrades are possible for database software, they are not recommended. Unlike the previous post, in this upgrade the database software is installed first and then the database is upgraded. Check the database installation readiness with
cluvfy stage -pre dbinst -upgrade -src_dbhome /opt/app/oracle/product/11.1.0/db_1 -dbname racse11g1 -dest_dbhome /opt/app/oracle/product/11.2.0/dbhome_1 -dest_version 11.2.0.4.0 -verbose -fixup
Unset the ORACLE_HOME variable and run the installation.

Give a location different from the current ORACLE_HOME to proceed with the out-of-place upgrade.

Once the database software is installed, the next step is to upgrade the database. Copy utlu112i.sql from the 11.2 ORACLE_HOME/rdbms/admin to a location outside ORACLE_HOME and run it. The pre-upgrade information tool utlu112i.sql will list any work needed before the upgrade.
SQL> @utlu112i.sql
SQL> SET SERVEROUTPUT ON FORMAT WRAPPED;
SQL> -- Linesize 100 for 'i' version 1000 for 'x' version
SQL> SET ECHO OFF FEEDBACK OFF PAGESIZE 0 LINESIZE 100;
Oracle Database 11.2 Pre-Upgrade Information Tool 10-15-2013 15:22:44
Script Version: 11.2.0.4.0 Build: 001
.
**********************************************************************
Database:
**********************************************************************
--> name:          RAC11G1
--> version:       11.1.0.7.0
--> compatible:    11.1.0.0.0
--> blocksize:     8192
--> platform:      Linux x86 64-bit
--> timezone file: V4
.
**********************************************************************
Tablespaces: [make adjustments in the current environment]
**********************************************************************
--> SYSTEM tablespace is adequate for the upgrade.
.... minimum required size: 1100 MB
--> SYSAUX tablespace is adequate for the upgrade.
.... minimum required size: 1445 MB
--> UNDOTBS1 tablespace is adequate for the upgrade.
.... minimum required size: 400 MB
--> TEMP tablespace is adequate for the upgrade.
.... minimum required size: 60 MB
.
**********************************************************************
Flashback: OFF
**********************************************************************
**********************************************************************
Update Parameters: [Update Oracle Database 11.2 init.ora or spfile]
Note: Pre-upgrade tool was run on a lower version 64-bit database.
**********************************************************************
--> If Target Oracle is 32-Bit, refer here for Update Parameters:
-- No update parameter changes are required.
.

--> If Target Oracle is 64-Bit, refer here for Update Parameters:
-- No update parameter changes are required.
.
**********************************************************************
Renamed Parameters: [Update Oracle Database 11.2 init.ora or spfile]
**********************************************************************
-- No renamed parameters found. No changes are required.
.
**********************************************************************
Obsolete/Deprecated Parameters: [Update Oracle Database 11.2 init.ora or spfile]
**********************************************************************
-- No obsolete parameters found. No changes are required
.

**********************************************************************
Components: [The following database components will be upgraded or installed]
**********************************************************************
--> Oracle Catalog Views         [upgrade]  VALID
--> Oracle Packages and Types    [upgrade]  VALID
--> JServer JAVA Virtual Machine [upgrade]  VALID
--> Oracle XDK for Java          [upgrade]  VALID
--> Real Application Clusters    [upgrade]  VALID
--> Oracle Workspace Manager     [upgrade]  VALID
--> OLAP Analytic Workspace      [upgrade]  VALID
--> OLAP Catalog                 [upgrade]  VALID
--> EM Repository                [upgrade]  VALID
--> Oracle Text                  [upgrade]  VALID
--> Oracle XML Database          [upgrade]  VALID
--> Oracle Java Packages         [upgrade]  VALID
--> Oracle interMedia            [upgrade]  VALID
--> Spatial                      [upgrade]  VALID
--> Oracle Ultra Search          [upgrade]  VALID
--> Expression Filter            [upgrade]  VALID
--> Rule Manager                 [upgrade]  VALID
--> Oracle Application Express   [upgrade]  VALID
... APEX will only be upgraded if the version of APEX in
... the target Oracle home is higher than the current one.
--> Oracle OLAP API              [upgrade]  VALID
.
**********************************************************************
Miscellaneous Warnings
**********************************************************************
WARNING: --> The "cluster_database" parameter is currently "TRUE"
.... and must be set to "FALSE" prior to running a manual upgrade.
WARNING: --> Database is using a timezone file older than version 14.
.... After the release migration, it is recommended that DBMS_DST package
.... be used to upgrade the 11.1.0.7.0 database timezone version
.... to the latest version which comes with the new release.
WARNING: --> Database contains INVALID objects prior to upgrade.
.... The list of invalid SYS/SYSTEM objects was written to
.... registry$sys_inv_objs.
.... The list of non-SYS/SYSTEM objects was written to
.... registry$nonsys_inv_objs.
.... Use utluiobj.sql after the upgrade to identify any new invalid
.... objects due to the upgrade.
.... USER ASANGA has 1 INVALID objects.
WARNING: --> EM Database Control Repository exists in the database.
.... Direct downgrade of EM Database Control is not supported. Refer to the
.... Upgrade Guide for instructions to save the EM data prior to upgrade.
WARNING: --> Ultra Search is not supported in 11.2 and must be removed
.... prior to upgrading by running rdbms/admin/wkremov.sql.
.... If you need to preserve Ultra Search data
.... please perform a manual cold backup prior to upgrade.
WARNING: --> Your recycle bin contains 4 object(s).
.... It is REQUIRED that the recycle bin is empty prior to upgrading
.... your database.  The command:
        PURGE DBA_RECYCLEBIN
.... must be executed immediately prior to executing your upgrade.
WARNING: --> Database contains schemas with objects dependent on DBMS_LDAP package.
.... Refer to the 11g Upgrade Guide for instructions to configure Network ACLs.
.... USER WKSYS has dependent objects.
.... USER FLOWS_030000 has dependent objects.
.
**********************************************************************
Recommendations
**********************************************************************
Oracle recommends gathering dictionary statistics prior to
upgrading the database.
To gather dictionary statistics execute the following command
while connected as SYSDBA:

    EXECUTE dbms_stats.gather_dictionary_stats;

**********************************************************************
Oracle recommends removing all hidden parameters prior to upgrading.

To view existing hidden parameters execute the following command
while connected AS SYSDBA:

    SELECT name,description from SYS.V$PARAMETER WHERE name
        LIKE '\_%' ESCAPE '\'

Changes will need to be made in the init.ora or spfile.

**********************************************************************
Oracle recommends reviewing any defined events prior to upgrading.

To view existing non-default events execute the following commands
while connected AS SYSDBA:
  Events:
    SELECT (translate(value,chr(13)||chr(10),' ')) FROM sys.v$parameter2
      WHERE  UPPER(name) ='EVENT' AND  isdefault='FALSE'

  Trace Events:
    SELECT (translate(value,chr(13)||chr(10),' ')) from sys.v$parameter2
      WHERE UPPER(name) = '_TRACE_EVENTS' AND isdefault='FALSE'

Changes will need to be made in the init.ora or spfile.

**********************************************************************
Elapsed: 00:00:01.26
For 11.1.0.7 to 11.2.0.4 upgrades, if the database timezone file version is lower than 14, no additional patches are needed before the upgrade. However, it is recommended to upgrade the database to the 11.2.0.4 timezone version once the upgrade is done. The timezone upgrade could also be done at the same time as the database upgrade using DBUA. More on 1562142.1
If there is a large number of records in the AUD$ and FGA_LOG$ tables, pre-processing these tables could speed up the database upgrade. More on 1329590.1
A large number of files in the $ORACLE_HOME/`hostname -s`_$ORACLE_SID/sysman/emd/upload location could also lengthen the upgrade time. Refer to 870814.1 and 837570.1 for more information.
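A quick way to judge whether the upload location is likely to slow the upgrade is to count the files in it. The following is a minimal sketch; the function name count_upload_files is illustrative and not part of any Oracle tooling, and it assumes ORACLE_HOME and ORACLE_SID are set when it is called against the real location.

```shell
#!/bin/sh
# count_upload_files DIR: print the number of regular files under DIR.
# Prints 0 if DIR does not exist. The function name is hypothetical.
count_upload_files() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo 0
        return 0
    fi
    # wc -l may pad its output with spaces on some platforms, so strip them
    find "$dir" -type f | wc -l | tr -d ' '
}
```

It could be called as count_upload_files "$ORACLE_HOME/$(hostname -s)_$ORACLE_SID/sysman/emd/upload". What counts as "too many" files is a judgment call; the MOS notes above are the reference for that.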
Upgrade summary before DBUA is executed

Upgrade summary after the upgrade

Since in 11.2 versions _external_scn_rejection_threshold_hours is set to 24 by default, commenting out this parameter after the upgrade is not a problem.
Check the timezone values of the upgraded database.
SQL> SELECT PROPERTY_NAME, SUBSTR(property_value, 1, 30) value
  2          FROM DATABASE_PROPERTIES
  3          WHERE PROPERTY_NAME LIKE 'DST_%'
  4          ORDER BY PROPERTY_NAME;

PROPERTY_NAME                  VALUE
------------------------------ ----------
DST_PRIMARY_TT_VERSION         14
DST_SECONDARY_TT_VERSION       0
DST_UPGRADE_STATE              NONE


SQL>  SELECT VERSION FROM v$timezone_file;

   VERSION
----------
        14

SQL> select TZ_VERSION from registry$database;

TZ_VERSION
----------
         4
If the timezone_file value and the value shown in the database registry differ, the registry could be updated as per 1509653.1
SQL> update registry$database set TZ_VERSION = (select version FROM v$timezone_file);

1 row updated.

SQL> commit;

Commit complete.

SQL> select TZ_VERSION from registry$database;

TZ_VERSION
----------
        14
The remote listener parameter will contain both the SCAN VIP and the pre-11.2 value. This could be reset to have only the SCAN IPs.
SQL> select name,value from v$parameter where name='remote_listener';

NAME            VALUE
--------------- ----------------------------------------
remote_listener LISTENERS_RAC11G1, rac-scan-vip:1521
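One possible way to reset the parameter is sketched below. The helper remote_listener_sql is a hypothetical name; it only prints an ALTER SYSTEM statement built from the SCAN name and port shown in the output above. The choice of scope=both and sid='*' is an assumption suited to a RAC database using an spfile, so adjust it to the environment before running.

```shell
#!/bin/sh
# remote_listener_sql SCAN_NAME PORT: print an ALTER SYSTEM statement that
# sets remote_listener to the SCAN only. The helper name is hypothetical.
remote_listener_sql() {
    printf "alter system set remote_listener='%s:%s' scope=both sid='*';\n" "$1" "$2"
}

# Print the statement for review before applying it.
remote_listener_sql rac-scan-vip 1521

# To apply it, the statement could be piped to SQL*Plus as SYSDBA:
#   remote_listener_sql rac-scan-vip 1521 | sqlplus -s / as sysdba
```

Printing the statement first, rather than running it directly, leaves room to check it against the current value returned by the query above.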
Finally update the compatible parameter to 11.2.0.4 once the upgrade is deemed satisfactory. This concludes the upgrade from 11.1.0.7 to 11.2.0.4

Useful metalink notes
RACcheck Upgrade Readiness Assessment [ID 1457357.1]
Complete Checklist for Manual Upgrades to 11gR2 [ID 837570.1]
Complete Checklist to Upgrade the Database to 11gR2 using DBUA [ID 870814.1]
Pre 11.2 Database Issues in 11gR2 Grid Infrastructure Environment [ID 948456.1]
Things to Consider Before Upgrading to 11.2.0.3/11.2.0.4 Grid Infrastructure/ASM [ID 1363369.1]
Actions For DST Updates When Upgrading To Or Applying The 11.2.0.4 Patchset [ID 1579838.1]
How to Pre-Process SYS.AUD$ Records Pre-Upgrade From 10.1 or later to 11gR1 or later. [ID 1329590.1]
Things to Consider Before Upgrading to 11.2.0.3 to Avoid Poor Performance or Wrong Results [ID 1392633.1]
Things to Consider Before Upgrading to 11.2.0.4 to Avoid Poor Performance or Wrong Results [ID 1645862.1]
Things to Consider Before Upgrading to Avoid Poor Performance or Wrong Results (11.2.0.X) [ID 1904820.1]

Related Posts
Upgrading from 10.2.0.4 to 10.2.0.5 (Clusterware, RAC, ASM)
Upgrade from 10.2.0.5 to 11.2.0.3 (Clusterware, RAC, ASM)
Upgrade from 11.1.0.7 to 11.2.0.3 (Clusterware, ASM & RAC)
Upgrading from 11.1.0.7 to 11.2.0.3 with Transient Logical Standby
Upgrading from 11.2.0.1 to 11.2.0.3 with in-place upgrade for RAC
In-place upgrade from 11.2.0.2 to 11.2.0.3
Upgrading from 11.2.0.2 to 11.2.0.3 with Physical Standby - 1
Upgrading from 11.2.0.2 to 11.2.0.3 with Physical Standby - 2
Upgrading from 11gR2 (11.2.0.3) to 12c (12.1.0.1) Grid Infrastructure
Upgrading RAC from 11.2.0.4 to 12.1.0.2 - Grid Infrastructure

Update on 2016-01-05

On a recent upgrade two pre-req checks failed which were successful on some of the earlier environments.
First one was related to ASMLib check.
Checking ASMLib configuration.
  Node Name                             Status
  ------------------------------------  ------------------------
  abx-db2                               (failed) ASMLib configuration is incorrect.
  abx-db1                               (failed) ASMLib configuration is incorrect.
Result: Check for ASMLib configuration failed.
However according to Linux: cluvfy reports "ASMLib configuration is incorrect" (Doc ID 1541309.1) this is ignorable if the manual check of the ASMLib status returns OK.
Second one was OCR sharedness.
Checking OCR integrity...
Check for compatible storage device for OCR location "/dev/mapper/mpath1p1"...

Checking OCR device "/dev/mapper/mpath1p1" for sharedness...
ERROR:
PRVF-4172 : Check of OCR device "/dev/mapper/mpath1p1" for sharedness failed
Could not find the storage
Check for compatible storage device for OCR location "/dev/mapper/mpath2p1"...

Checking OCR device "/dev/mapper/mpath2p1" for sharedness...
ERROR:
PRVF-4172 : Check of OCR device "/dev/mapper/mpath2p1" for sharedness failed
Could not find the storage

OCR integrity check failed
Again this is ignorable according to PRVF-4172 Check Of OCR Device For Sharedness Failed (Doc ID 1600719.1) and INS-20802 PRVF-4172 Reported after Successful Upgrade to 11gR2 Grid Infrastructure (Doc ID 1051763.1). These MOS notes refer mostly to Solaris, but the environment in this case was Linux 64-bit (RHEL 5).

Update on 2016-01-15

During the ASM upgrade the following error could be seen in the alert log of the ASM instance being upgraded.
ALTER SYSTEM STOP ROLLING MIGRATION
KSXP:RM:       ->
KSXP:RM:       ->arg:[hgTo6 (111070->112040) @ inrm3, pay 30203]
KSXP:RM:       ->rm:[cver 30203 nver 30204 cifv 0 swtch/ing 1/0 flux 3 lastunrdy 0/put4]
KSXP:RM:       ->pages:[cur 2 refs 25 tot 25]
KSXP:RM:       ->oob:[ia changed 0 sync 1 lw 0 sg 0 sg_a 0 ssg 0 parm 0]
KSXP:RM:       ->ia:[changed 0 compat 1 sg1 {[0/1]=192.168.1.87} sg2 {[0/1]=192.168.1.87}]
KSXP:RM:  RET hgTo [SKGXP] incompat3 [not-native] at 112040
KSXP:RM:       ->
Errors in file /opt/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_5838.trc:
ORA-15160: rolling migration internal fatal error in module SKGXP,hgTo:not-native
Private Interface 'eth1:1' configured from GPnP for use as a private interconnect.
  [name='eth1:1', type=1, ip=169.254.239.174, mac=08-00-27-79-49-de, net=169.254.0.0/16, mask=255.255.0.0, use=haip:cluster_interconnect/62]
  [name='eth0', type=1, ip=192.168.0.85, mac=08-00-27-c4-55-af, net=192.168.0.0/24, mask=255.255.255.0, use=public/1]
  [name='eth0:1', type=1, ip=192.168.0.89, mac=08-00-27-c4-55-af, net=192.168.0.0/24, mask=255.255.255.0, use=public/1]
  [name='eth0:2', type=1, ip=192.168.0.92, mac=08-00-27-c4-55-af, net=192.168.0.0/24, mask=255.255.255.0, use=public/1]
Cluster communication is configured to use the following interface(s) for this instance
  169.254.239.174
There are two MOS notes regarding this error.
ORA-15160: rolling migration internal fatal error in module SKGXP,valNorm:not-native (Doc ID 1682591.1)
Oracle Clusterware and RAC Support for RDS Over Infiniband (Doc ID 751343.1)
However, the solutions mentioned in the MOS notes are not needed if the ASM instances were using the UDP protocol before the upgrade. This could be verified by looking at the ASM alert log. Before the upgrade the ASM alert log shows
Starting up ORACLE RDBMS Version: 11.1.0.7.0.
Using parameter settings in server-side pfile /opt/app/oracle/product/11.1.0/asm/dbs/init+ASM1.ora
Cluster communication is configured to use the following interface(s) for this instance
  10.0.3.2
cluster interconnect IPC version:Oracle UDP/IP (generic)
After the upgrade the ASM alert log shows
ORACLE_HOME = /opt/app/11.2.0/grid
Using parameter settings in server-side spfile /dev/mapper/mpath6p1
Cluster communication is configured to use the following interface(s) for this instance
  169.254.213.90
cluster interconnect IPC version:Oracle UDP/IP (generic)
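The alert log check described above can be scripted. This is a minimal sketch, assuming the alert log path is passed in by the caller; the function name ipc_protocol is illustrative. It prints the protocol from the most recent "cluster interconnect IPC version" line, so it can be run against the log before and after the upgrade and the two values compared.

```shell
#!/bin/sh
# ipc_protocol ALERT_LOG: print the protocol named on the last
# "cluster interconnect IPC version:" line of the given alert log.
# Prints nothing if no such line exists. The function name is hypothetical.
ipc_protocol() {
    grep 'cluster interconnect IPC version:' "$1" | tail -n 1 | sed 's/.*IPC version://'
}
```

For the alert log excerpts shown above, both the 11.1 and the 11.2 logs would give "Oracle UDP/IP (generic)".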
If the upgraded database edition is standard then refer to the following posts to rectify expdp/impdp related issues.
DBMS_AW_EXP: SYS.AW$EXPRESS: OLAP not enabled After Upgrading to 11.2.0.4 Standard Edition
ORA-39127: unexpected error from call to export_string :=SYS.DBMS_CUBE_EXP.SCHEMA_INFO_EXP while Exporting

Wednesday, October 9, 2013

Role Separation and External Tables

The success of a select query on an external table created in a system with role separation, whether single instance or RAC, depends on the type of connection (local or remote) and on the permissions of the directories/files used for the external table. This seems to be odd behavior, as the output of a query shouldn't depend on the nature of the connection: if it fails or returns data for a local connection, the same should happen for a remote connection. If one set of permissions returns results for a locally connected session, then the same permissions should also return data for a remote connection.
The most likely explanation is that under role separation the listener runs as the grid user, and it appears that certain commands are executed as the grid user. This muddies the "role separation" when the grid user is involved with database objects. Whether this is a bug or not is not confirmed yet; the SR is ongoing.
Below is the test case to recreate the behavior.
1. Create a directory to hold the external table related files in a location that could be made common for both grid and oracle user. In this case /usr/local is chosen as the location.
[root@rhel6m1 local]# cd /usr/local/
[root@rhel6m1 local]# mkdir exdata
[root@rhel6m1 local]# chmod 770 exdata
[root@rhel6m1 local]# chown oracle:oinstall exdata
Directory permissions are set to 770 for the first set of test cases.
2. The external table is created using a shell script and using the preprocessor clause (this external table example is available in Oracle Magazine 2012 Nov, if memory is correct!). Create the shell script file that will be the basis for the external table with following content
[oracle@rhel6m1 exdata]$ more run_df.sh
#!/bin/sh
/bin/df -Pl
3. Create a directory object and grant read,write permission on the directory object to the user that will create the external table
SQL> create or replace directory exec_dir as '/usr/local/exdata';

SQL> grant read,write on directory exec_dir to asanga;
Check the user has the privileges on the directory
SQL> select * from dba_tab_privs where table_name='EXEC_DIR';

GRANTEE  OWNER TABLE_NAME GRANTOR  PRIVILEGE  GRA HIE
-------- ----- ---------- -------- ---------- --- ---
ASANGA   SYS   EXEC_DIR   SYS      READ       NO  NO
ASANGA   SYS   EXEC_DIR   SYS      WRITE      NO  NO
4. Create the external table
SQL> conn asanga/asa

CREATE TABLE "DF"
    ( "FSNAME" VARCHAR2(100),
    "BLOCKS" NUMBER,
    "USED" NUMBER,
    "AVAIL" NUMBER,
    "CAPACITY" VARCHAR2(10),
    "MOUNT" VARCHAR2(100)
    )
    ORGANIZATION EXTERNAL
   ( TYPE ORACLE_LOADER
   DEFAULT DIRECTORY "EXEC_DIR"
   ACCESS PARAMETERS
   ( records delimited by newline
     preprocessor exec_dir:'run_df.sh'
     skip 1 fields terminated by whitespace ldrtrim
   )
   LOCATION
   ( "EXEC_DIR":'run_df.sh' ));
5. Case 1. Only oracle user having permission on the external table file (run_df.sh).
Set permission on the external file so that only oracle user has read and execute permission and no other permissions set on the file
[oracle@rhel6m1 exdata]$ chmod 500 run_df.sh
[oracle@rhel6m1 exdata]$ ls -l run_df.sh
-r-x------. 1 oracle oinstall 22 Oct  9 14:01 run_df.sh
Test 1.1 Run select query on the external table connecting with a local connection
[oracle@rhel6m1 exdata]$ sqlplus  asanga/asa

SQL> select  * from df;

FSNAME         BLOCKS       USED      AVAIL CAPACITY   MOUNT
---------- ---------- ---------- ---------- ---------- ----------
/dev/sda3    37054144   32706324    2465564 93%        /
tmpfs         1961580     228652    1732928 12%        /dev/shm
/dev/sda1       99150      27725      66305 30%        /boot
Test 1.2 Run select query on the external table connecting with a remote connection. In this case an SQLPlus remote connection is used. Output is the same even if a JDBC connection is used.
[oracle@rhel6m1 exdata]$ sqlplus  asanga/asa@std11g21

SQL> select * from df;
select * from df
*
ERROR at line 1:
ORA-29913: error in executing ODCIEXTTABLEFETCH callout
ORA-29400: data cartridge error
KUP-04095: preprocessor command /usr/local/exdata/run_df.sh encountered error
"/bin/sh: /usr/local/exdata/run_df.sh: Permission denied
As seen from the above output, no rows are returned and an error occurs. The interesting line is "/usr/local/exdata/run_df.sh: Permission denied"; there is no such permission issue when executing with a local connection! Same table, same query and different outputs depending on whether the connection is local or remote.
6. Case 2 Oracle user having execute and group having read permission.
All else remaining the same change the permission of the external file to 140 so oracle user has execute permission and group (oinstall) has read permission
[oracle@rhel6m1 exdata]$ chmod 140 run_df.sh
[oracle@rhel6m1 exdata]$ ls -l run_df.sh
---xr-----. 1 oracle oinstall 22 Oct  9 14:01 run_df.sh
Test 2.1 Run select query with a local connection
[oracle@rhel6m1 exdata]$ sqlplus  asanga/asa

SQL> select * from df;
select * from df
*
ERROR at line 1:
ORA-29913: error in executing ODCIEXTTABLEFETCH callout
ORA-29400: data cartridge error
KUP-04095: preprocessor command /usr/local/exdata/run_df.sh encountered error
"/bin/sh: /usr/local/exdata/run_df.sh: Permission denied
This time running with a local connection fails. Same error as before.
Test 2.2 Run select query with a remote connection
[oracle@rhel6m1 exdata]$ sqlplus  asanga/asa@std11g21

SQL> select  * from df;

FSNAME         BLOCKS       USED      AVAIL CAPACITY   MOUNT
---------- ---------- ---------- ---------- ---------- ----------
/dev/sda3    37054144   32697924    2473964 93%        /
tmpfs         1961580     228652    1732928 12%        /dev/shm
/dev/sda1       99150      27725      66305 30%        /boot
Running with a remote connection succeeds. Again same table, same query and different outputs depending on whether the connection is local or remote. Looking at the file permissions it is clear that when the connection is remote the file is read by a user that belongs to the oinstall group and executed by the oracle user. The only other user in the oinstall group besides oracle is grid. As remote connections come in via the listener, which is running as the grid user, remote connections are able to read the external file.
7. Case 3 Oracle user having read and execute permission and group having read permission.
This is an amalgamation of the permissions from the above cases. Change the external file permission so the oracle user has read and execute permission and the group has read permission.
[oracle@rhel6m1 exdata]$ chmod 540 run_df.sh
[oracle@rhel6m1 exdata]$ ls -l run_df.sh
-r-xr-----. 1 oracle dba 22 Oct  9 14:01 run_df.sh
Test 3.1 Run select query with a local connection
[oracle@rhel6m1 exdata]$ sqlplus  asanga/asa

SQL> select  * from df;

FSNAME         BLOCKS       USED      AVAIL CAPACITY   MOUNT
---------- ---------- ---------- ---------- ---------- ----------
/dev/sda3    37054144   32706324    2465564 93%        /
tmpfs         1961580     228652    1732928 12%        /dev/shm
/dev/sda1       99150      27725      66305 30%        /boot
Test 3.2 Run select query with a remote connection
[oracle@rhel6m1 exdata]$ sqlplus  asanga/asa@std11g21

SQL> select  * from df;

FSNAME         BLOCKS       USED      AVAIL CAPACITY   MOUNT
---------- ---------- ---------- ---------- ---------- ----------
/dev/sda3    37054144   32697924    2473964 93%        /
tmpfs         1961580     228652    1732928 12%        /dev/shm
/dev/sda1       99150      27725      66305 30%        /boot
This time both local and remote connections return rows. So when an external table is created with a preprocessor, to get the same behavior for both local and remote connections, permissions must be set for both the oracle user and the group.
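The three cases above can be condensed into a small check on the preprocessor script. This is a sketch based only on the behavior observed in this post (owner read and execute plus group read worked for both connection types); the function name preproc_perms_ok is illustrative, and stat -c is the GNU coreutils form available on Linux.

```shell
#!/bin/sh
# preproc_perms_ok FILE: succeed only if the owner has read+execute and the
# group has read on FILE (the combination that worked in Case 3).
# Returns 2 if FILE cannot be examined. The function name is hypothetical.
preproc_perms_ok() {
    mode=$(stat -c '%a' "$1") || return 2
    mode=$(printf '%03d' "$mode")      # pad e.g. "40" to "040"
    perms=${mode#"${mode%???}"}        # keep the last three octal digits
    owner=${perms%??}                  # first digit: owner bits
    rest=${perms#?}
    group=${rest%?}                    # second digit: group bits
    [ $(( owner & 5 )) -eq 5 ] && [ $(( group & 4 )) -eq 4 ]
}
```

For example, preproc_perms_ok /usr/local/exdata/run_df.sh would fail for the 500 and 140 modes used in Cases 1 and 2 and succeed for the 540 mode used in Case 3.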




If the external table is created without a preprocessor but using a static data file, then the permission of the file makes no difference to the output whether the connection is local or remote. However, in this case it's the permission of the directory that matters, and this directory permission is also applicable to the preprocessor cases mentioned above.
8. A static file containing a comma separated list of values is created with the following command, which extracts information about file permission and ownership. Execute it in a folder with several files
for i in `ls`; do ls -l $i | awk '{print $1","$3","$4","$9}' >> permission.txt ; done

[oracle@rhel6m1 exdata]$ more permission.txt
-rw-r--r--.,oracle,asmadmin,DF_27539.log
-rw-r--r--.,oracle,asmadmin,DF_27545.log
-rw-r--r--.,oracle,asmadmin,FILE_PERMS_27818.log
-rw-r--r--.,oracle,asmadmin,FILE_PERMS_27842.log
-r-xr-----.,oracle,dba,run_df.sh
-rwxr-xr-x.,root,root,status.sh
-rwxr-xr-x.,root,root,t.sh
Copy the generated file to exdata directory created in the previous cases. Set the permission of the permission.txt file such that only oracle user has read permission and no other permissions set.
[oracle@rhel6m1 exdata]$ chmod 400 permission.txt
[oracle@rhel6m1 exdata]$ ls -l permission.txt
-r--------. 1 oracle oinstall 272 Oct  9 15:38 permission.txt
9. Create the external table using this csv file
create table file_perms (
permission varchar2(12),
owner varchar2(15),
groups varchar2(15),
file_name varchar2(40))
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY exec_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
    MISSING FIELD VALUES ARE NULL
  ) LOCATION ('permission.txt'))
   PARALLEL 5
   REJECT LIMIT UNLIMITED;
10. Case 4 CSV file having only read permission for Oracle user
Test 4.1 Run select query with a local connection
[oracle@rhel6m1 exdata]$ sqlplus  asanga/asa

SQL> select * from file_perms;

PERMISSION   OWNER           GROUPS          FILE_NAME
------------ --------------- --------------- -------------------------
-rw-r--r--.  oracle          asmadmin        DF_27539.log
-rw-r--r--.  oracle          asmadmin        DF_27545.log
-rw-r--r--.  oracle          asmadmin        FILE_PERMS_27818.log
-rw-r--r--.  oracle          asmadmin        FILE_PERMS_27842.log
-r-xr-----.  oracle          dba             run_df.sh
-rwxr-xr-x.  root            root            status.sh
-rwxr-xr-x.  root            root            t.sh
Local connection returns rows.
Test 4.2 Run select with a remote connection
[oracle@rhel6m1 exdata]$ sqlplus  asanga/asa@std11g21

SQL> select * from file_perms;

PERMISSION   OWNER           GROUPS          FILE_NAME
------------ --------------- --------------- -------------------------
-rw-r--r--.  oracle          asmadmin        DF_27539.log
-rw-r--r--.  oracle          asmadmin        DF_27545.log
-rw-r--r--.  oracle          asmadmin        FILE_PERMS_27818.log
-rw-r--r--.  oracle          asmadmin        FILE_PERMS_27842.log
-r-xr-----.  oracle          dba             run_df.sh
-rwxr-xr-x.  root            root            status.sh
-rwxr-xr-x.  root            root            t.sh
Remote connection returns rows as well, and the fact that the oinstall group does not have any permission on the csv file makes no difference to the output.
11. Case 5 Change the permission on the directory containing the files such that only oracle has the full set of permissions and no other permissions are set. Only the directory's permissions are changed; the permissions of the files inside the directory are unchanged.
[oracle@rhel6m1 local]$ chmod 700 exdata
drwx------. 2 oracle oinstall 4096 Oct  9 15:48 exdata
Test 5.1 Run select query with a local connection
[oracle@rhel6m1 exdata]$ sqlplus  asanga/asa

SQL> select * from file_perms;

PERMISSION   OWNER           GROUPS          FILE_NAME
------------ --------------- --------------- -------------------------
-rw-r--r--.  oracle          asmadmin        DF_27539.log
-rw-r--r--.  oracle          asmadmin        DF_27545.log
-rw-r--r--.  oracle          asmadmin        FILE_PERMS_27818.log
-rw-r--r--.  oracle          asmadmin        FILE_PERMS_27842.log
-r-xr-----.  oracle          dba             run_df.sh
-rwxr-xr-x.  root            root            status.sh
-rwxr-xr-x.  root            root            t.sh
Local connection returns rows as before.
Test 5.2 Run select with a remote connection
[oracle@rhel6m1 exdata]$ sqlplus  asanga/asa@std11g21

SQL> select * from file_perms;
select * from file_perms
*
ERROR at line 1:
ORA-29913: error in executing ODCIEXTTABLEOPEN callout
ORA-29400: data cartridge error
KUP-04040: file permission.txt in EXEC_DIR not found
Remote connection fails to return any rows. The error says file permission.txt is not found in the directory, which is not true. Again same table, same query and two different outputs based on whether the connection is local or remote. The same behavior could be seen when querying the table that uses the shell script file as well.
Test 5.3 Run select query with a local connection
[oracle@rhel6m1 exdata]$ sqlplus  asanga/asa

SQL> select  * from df;

FSNAME         BLOCKS       USED      AVAIL CAPACITY   MOUNT
---------- ---------- ---------- ---------- ---------- ----------
/dev/sda3    37054144   32706324    2465564 93%        /
tmpfs         1961580     228652    1732928 12%        /dev/shm
/dev/sda1       99150      27725      66305 30%        /boot
Test 5.4 Run select query with a remote connection
[oracle@rhel6m1 exdata]$ sqlplus  asanga/asa@std11g21

SQL> select * from df;
select * from df
*
ERROR at line 1:
ORA-29913: error in executing ODCIEXTTABLEOPEN callout
ORA-29400: data cartridge error
KUP-04040: file run_df.sh in /usr/local/exdata not found
12. Case 6 Oracle user having full permission and group having execute permission on the directory.
Change the permission of the directory containing the csv file as shown below
[oracle@rhel6m1 local]$ chmod 710 exdata
drwx--x---. 2 oracle oinstall 4096 Oct  9 15:59 exdata
Test 6.1 Run select query with a local connection
[oracle@rhel6m1 exdata]$ sqlplus  asanga/asa

SQL> select * from file_perms;

PERMISSION   OWNER           GROUPS          FILE_NAME
------------ --------------- --------------- -------------------------
-rw-r--r--.  oracle          asmadmin        DF_27539.log
-rw-r--r--.  oracle          asmadmin        DF_27545.log
-rw-r--r--.  oracle          asmadmin        FILE_PERMS_27818.log
-rw-r--r--.  oracle          asmadmin        FILE_PERMS_27842.log
-r-xr-----.  oracle          dba             run_df.sh
-rwxr-xr-x.  root            root            status.sh
-rwxr-xr-x.  root            root            t.sh
Local connection returns rows.
Test 6.2 Run select with a remote connection
[oracle@rhel6m1 exdata]$ sqlplus  asanga/asa@std11g21

SQL> select * from file_perms;

PERMISSION   OWNER           GROUPS          FILE_NAME
------------ --------------- --------------- -------------------------
-rw-r--r--.  oracle          asmadmin        DF_27539.log
-rw-r--r--.  oracle          asmadmin        DF_27545.log
-rw-r--r--.  oracle          asmadmin        FILE_PERMS_27818.log
-rw-r--r--.  oracle          asmadmin        FILE_PERMS_27842.log
-r-xr-----.  oracle          dba             run_df.sh
-rwxr-xr-x.  root            root            status.sh
-rwxr-xr-x.  root            root            t.sh
Remote connection returns rows as well. Unlike before output is same for both remote and local connection. This permission (710) also works for the external table based on the shell script
[oracle@rhel6m1 exdata]$ sqlplus  asanga/asa@std11g21

SQL> select  * from df;

FSNAME         BLOCKS       USED      AVAIL CAPACITY   MOUNT
---------- ---------- ---------- ---------- ---------- ----------
/dev/sda3    37054144   32697924    2473964 93%        /
tmpfs         1961580     228652    1732928 12%        /dev/shm
/dev/sda1       99150      27725      66305 30%        /boot
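Cases 5 and 6 suggest that the group needs at least execute (search) permission on the directory. A minimal sketch of a check for that is given below; the function name dir_group_can_search is illustrative, and the rule it encodes comes only from the tests in this post.

```shell
#!/bin/sh
# dir_group_can_search DIR: succeed if DIR is a directory whose group
# execute (search) bit is set, as with the 710 mode used in Case 6.
# The function name is hypothetical.
dir_group_can_search() {
    # -prune stops find from descending, so only DIR itself is tested;
    # -perm -010 matches when the group execute bit is set.
    [ -n "$(find "$1" -prune -type d -perm -010 2>/dev/null)" ]
}
```

Called as dir_group_can_search /usr/local/exdata, it would fail for the 700 mode of Case 5 and succeed for the 710 and 770 modes.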
In summary, when external tables are created in a role separated environment, group permissions must be set on the files/directories used for the external tables. If not, the output may vary depending on whether the connection is local or remote.
Tested on 11.2.0.3, 11.2.0.4 and 12.1.0.1 (non-CDB)