Friday, January 15, 2016

For input string: SP2-0640: Not connected

Upgrade of a 11.1.0.7 instance to 11.2.0.4 fails with the following error message on DBUA.
Following errors could be seen when DBUA trace.log is examined.
[Thread-12] [ 2016-01-15 11:43:16.187 GMT ] [OracleHome.initOptions:1256]  ORA-27102: out of memory

oracle.sysman.assistants.util.sqlEngine.SQLFatalErrorException: ORA-27102: out of memory

        at oracle.sysman.assistants.util.sqlEngine.SQLEngine.executeImpl(SQLEngine.java:1658)
        at oracle.sysman.assistants.util.sqlEngine.SQLEngine.executeSql(SQLEngine.java:1925)
        at oracle.sysman.assistants.util.OracleHome.initOptions(OracleHome.java:1249)
        at oracle.sysman.assistants.dbma.backend.CompManager.doPreMigrationChecks(CompManager.java:2955)
        at oracle.sysman.assistants.dbma.backend.CompManager.obtainDatabaseInformation(CompManager.java:4269)
        at oracle.sysman.assistants.dbma.ui.DatabasesPage.doProcessing(DatabasesPage.java:811)
        at oracle.sysman.assistants.util.WaitDialog.run(WaitDialog.java:187)
        at java.lang.Thread.run(Thread.java:637)
..
[Thread-12] [ 2016-01-15 11:43:16.260 GMT ] [OracleHome.initOptions:1324]  Database Options queried: 1
java.lang.NumberFormatException: For input string: "SP2-0640: Not connected"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
        at java.lang.Integer.parseInt(Integer.java:447)
        at java.lang.Integer.parseInt(Integer.java:497)
        at oracle.sysman.assistants.dbma.backend.CompManager.doPreMigrationChecks(CompManager.java:2956)
        at oracle.sysman.assistants.dbma.backend.CompManager.obtainDatabaseInformation(CompManager.java:4269)
        at oracle.sysman.assistants.dbma.ui.DatabasesPage.doProcessing(DatabasesPage.java:811)
        at oracle.sysman.assistants.util.WaitDialog.run(WaitDialog.java:187)
        at java.lang.Thread.run(Thread.java:637)
Even though the latter stack trace matches the error output (For input string: "SP2-0640: Not connected") seen on DBUA it's not the root cause but the ORA-27102: out of memory. Key to solving the issue is to identify which instance runs out of the memory.
The current shm* parameters are
kernel.shmmax = 5370806272
kernel.shmall = 1311236
kernel.shmmni = 4096

ipcs  -lm

------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 5244928
max total shared memory (kbytes) = 5244944
min seg size (bytes) = 1
And the database is using a one segment that's equal to the SGA size
ipcs -m

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x12982ecc 3112962    oracle    660        5370806272 36

SQL> show parameter sga

NAME                                 TYPE            VALUE
------------------------------------ --------------- ----------
sga_max_size                         big integer     5G
sga_target                           big integer     5G
As it is configured now there's sufficient shared segment to run the database. However it is not the database that's been upgraded that runs out of memory but a dummy database instance created by DBUA for a short period during information gathering phase. Information on this dummy instance is also logged on the trace.log
[OracleHome.initOptions:1248]  executing: startup nomount pfile='/opt/app/oracle/product/11.2.0/std3/dbs/initDBUA4304056.ora'
When the DBUA succeed all the files created for this dummy instance are deleted and there's no trace of this dummy instance whatsoever but if it fails the init file used will remain and could be examined.
cd /opt/app/oracle/product/11.2.0/std3/dbs/  
-rw-r----- 1 oracle oinstall     1536 Jan 15 11:43 orapwDBUA4304056
-rw-r----- 1 oracle oinstall      161 Jan 15 11:43 initDBUA4304056.ora
Looking at the init file the log directory could be identified.
more initDBUA4304056.ora
db_name=DBUA4304
db_unique_name=DBUA4304056
shared_pool_size=128m
cpu_count=1
diagnostic_dest=/opt/app/oracle/product/11.2.0/std3/log
_enable_NUMA_support=false
Within the log directory will be a diagnostic directory structure which will have the alert log and the trace file of the failed dummy instance
cd /opt/app/oracle/product/11.2.0/std3/log
cd diag/rdbms/dbua4304056/DBUA4304056/trace/
ls 
alert_DBUA4304056.log  DBUA4304056_ora_6420.trc
These files will have content similar to following
more DBUA4304056_ora_6420.trc

Switching to regular size pages for segment size 8388608
skgm warning: ENOSPC creating segment of size 0000000000800000
fix shm parameters in /etc/system or equivalent

more alert_DBUA4304056.log
Fri Jan 15 11:43:15 2016
The value of parameter shared_pool_size is below the required minimum
It has been reset to the minimum value
Starting ORACLE instance (normal)


As seen on the trace file, the dummy segment is trying acquire a segment of size 8388608. But for the DBUA to be successful it need to create at least 2 other segments. Following segment list captured during a successful DBUA run
ipcs -m

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x12982ecc 3112962    oracle    660        5370806272 37                  <-- DB for upgrade
0x00000000 4423684    oracle    777        13160      2          dest     <-- DBUA itself
0x00000000 4882437    oracle    640        8388608    1                   <-- Dummy instance
0x00000000 4915206    oracle    640        230686720  1                   <-- Dummy instance
0xbcf2a6a8 4947975    oracle    640        2097152    1                   <-- Dummy instance
Dummy instance trace file show this information
*** 2016-01-15 12:10:27.997
Switching to regular size pages for segment size 8388608
Switching to regular size pages for segment size 230686720
Switching to regular size pages for segment size 2097152
CELL communication is configured to use 0 interface(s):
kcrfwy: minimum sleep is 1922 usecs (overhead is 922 usecs)
Running with 1 strand for Non-Enterprise Edition
Running without dynamic strands for Non-Enterprise Edition
kgfmIsAppliance=0 (due to init)
The dummy instance's alert log shows the SGA size of it
****************** Large Pages Information *****************

Total Shared Global Region in Large Pages = 0 KB (0%)

Large Pages used by this instance: 0 (0 KB)
Large Pages unused system wide = 0 (0 KB) (alloc incr 4096 KB)
Large Pages configured system wide = 0 (0 KB)
Large Page size = 2048 KB

RECOMMENDATION:
  Total Shared Global Region size is 230 MB. For optimal performance,
  prior to the next instance restart increase the number
  of unused Large Pages by atleast 115 2048 KB Large Pages (230 MB)
  system wide to get 100% of the Shared
  Global Region allocated with Large pages
***********************************************************
...
Starting up:
Oracle Database 11g Release 11.2.0.3.0 - 64bit Production.
ORACLE_HOME = /opt/app/oracle/product/11.2.0/std3
Using parameter settings in client-side pfile /opt/app/oracle/product/11.2.0/std3/dbs/initDBUA0520596.ora on machine hpc1.domain.net
System parameters with non-default values:
  cpu_count                = 1
  _enable_NUMA_support     = FALSE
  shared_pool_size         = 148M
  db_name                  = "DBUA0520"
  db_unique_name           = "DBUA0520596"
  diagnostic_dest          = "/opt/app/oracle/product/11.2.0/std3/log"
With the increase demand for the shared memory segments the minimum kernel.shmall value must be at least 1370116 (theoretical value. The sum of all the segments size listed above / PAGE_SIZE). However it should be slightly larger than that in practice, in this case kernel.shmall had to be set for 1370123 before DBUA worked.
Point to remember is that if DBUA fails with above error then it's not the database being upgraded that ran out of memory. It is also good idea to set kernel.shmall to a higher value than SGA size.

Useful metalink notes
11.2.0.3 DBUA Fails : ORA-27102: Out Of Memory, "SP2-0640: Not Connected" [ID 1526424.1]
DBUA Fails With Error "SP2-0640 Not Connected" [ID 1384814.1]
Upon startup of Linux database get ORA-27102: out of memory Linux-X86_64 Error: 28: No space left on device [ID 301830.1]