Sunday, December 16, 2012

java.sql.SQLRecoverableException: IO Error: Connection reset and /dev/urandom

Following stack trace could be seen when the application server was starting.
java.sql.SQLRecoverableException: IO Error: Connection reset
        at oracle.jdbc.driver.T4CStatement.executeForDescribe(T4CStatement.java:886)
        at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1167)
        at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1289)
        at oracle.jdbc.driver.OracleStatement.executeQuery(OracleStatement.java:1491)
        at oracle.jdbc.driver.OracleStatementWrapper.executeQuery(OracleStatementWrapper.java:406)
        
Caused by: java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(SocketInputStream.java:168)
        at oracle.net.ns.Packet.receive(Packet.java:300)
        at oracle.net.ns.DataPacket.receive(DataPacket.java:106)
        at oracle.net.ns.NetInputStream.getNextPacket(NetInputStream.java:315)
        at oracle.net.ns.NetInputStream.read(NetInputStream.java:260)
        at oracle.net.ns.NetInputStream.read(NetInputStream.java:185)
        at oracle.net.ns.NetInputStream.read(NetInputStream.java:102)
        at oracle.jdbc.driver.T4CSocketInputStreamWrapper.readNextPacket(T4CSocketInputStreamWrapper.java:124)
        at oracle.jdbc.driver.T4CSocketInputStreamWrapper.read(T4CSocketInputStreamWrapper.java:80)
        at oracle.jdbc.driver.T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1137)
        at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:290)
        at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:192)
        at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:531)
        at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:193)
        at oracle.jdbc.driver.T4CStatement.executeForDescribe(T4CStatement.java:873)
        ... 6 more
The application servers (6 of them) were running on RHEL 5 (x86_64) and this is a new deployment so there's less activity on the servers. The JDBC drivers were 11.2.0.3.
It seems the JDBC thin clients running on Linux system uses /dev/random for security purposes and the amount of activity (entropy - disorder in the system to be precise) determine how fast /dev/random can deliver random numbers. A good explanation of this is available here. By default JDBC uses /dev/random and could cause JDBC connection creation to hang when the /dev/random cannot generate random numbers fast enough.


One of the solutions (more available on the above mentioned link) is to make JDBC use /dev/urandom which can deliver random numbers with less activity than /dev/random. (According link above these values are "theoretically vulnerable to a cryptographic attack").
JDBC could be made to use /dev/urandom by specifying it as a system property (triple "/" is needed)
-Djava.security.egd=file:///dev/urandom
For some systems alternate formats of the URL may be needed and these are available in the metalink note given below.

Useful metalink notes
ODI Receives Timeout Messages On Linux x86 With Java 1.5 Onwards [ID 1277360.1]
Oracle Service Registry 11.1 Fails To Install with 'java.sql.SQLRecoverableException: IO Error: Connection reset' Error [ID 1343776.1]
JDBC 11.2.0.3 Application Fails With java.sql.SQLRecoverableException: IO Error: Connection reset [ID 1575238.1]
Oracle Service Registry 11.1 Fails To Install with 'java.sql.SQLRecoverableException: IO Error: Connection reset' Error [ID 1343776.1]

Wednesday, December 12, 2012

HAIP for Private Interconnect in 11gR2

Multiple NICs would be defined of private interconnect during installation (or after) on 11gR2. When there are multiple NICs defined for private interconnect oracle creates highly available IP (HAIP) for interconnect traffic.
From 11.2.0.2 onwards even if single NICs is used for private interconnect Oracle will use HAIP by default. Because of this v$cluster_interconnects view will show HAIP on 11gR2 (11.2.0.2 or later) compared to earlier versions such as 10g or 11gR1.
For example on 11gR1 IPs used for private interconnect could be identified using v$cluster_interconnects view
SQL> select * from gv$cluster_interconnects order by 1;

   INST_ID NAME            IP_ADDRESS       IS_ SOURCE
---------- --------------- ---------------- --- -------------------------------
         1 eth1            192.168.1.87     NO  Oracle Cluster Repository
         2 eth1            192.168.1.88     NO  Oracle Cluster Repository
Or on the database alert log will have following entries
Cluster communication is configured to use the following interface(s) for this instance
  192.168.1.88
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
However due to HAIP this is not the same on 11gR2, and what you specified during installation is not what you see on v$cluster_interconnects.
SQL> select * from gv$cluster_interconnects order by 1;

   INST_ID NAME            IP_ADDRESS       IS_ SOURCE
---------- --------------- ---------------- --- -------------------------------
         1 eth1:1          169.254.73.209   NO
         2 eth1:1          169.254.180.78   NO


Grid automatically creates the 169.254.*.* range IPs from the link local addresses that will be used for HAIP. These IPs are visible when ifconfig is executed and will appear on the same NIC assigned to private interconnect IP.
eth1      Link encap:Ethernet  HWaddr 08:00:27:7C:F4:94
          inet addr:192.168.1.88  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::a00:27ff:fe7c:f494/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1997903 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1680302 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1349158236 (1.2 GiB)  TX bytes:965427458 (920.7 MiB)

eth1:1    Link encap:Ethernet  HWaddr 08:00:27:7C:F4:94
          inet addr:169.254.73.209  Bcast:169.254.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
Instead of getif option iflist could be used to list the HAIP associated interface.
oifcfg getif
eth0  192.168.0.0  global  public
eth1  192.168.1.0  global  cluster_interconnect

oifcfg iflist -p -n
eth0  192.168.0.0  PRIVATE  255.255.255.0
eth1  192.168.1.0  PRIVATE  255.255.255.0
eth1  169.254.0.0  UNKNOWN  255.255.0.0
Both ASM and DB alert log will also list the HAIP instead of the actual IP used during installation. From 11gR2 DB alert log
Cluster communication is configured to use the following interface(s) for this instance
  169.254.73.209
cluster interconnect IPC version:Oracle UDP/IP (generic)
From ASM alert log
Private Interface 'eth1:1' configured from GPnP for use as a private interconnect.
  [name='eth1:1', type=1, ip=169.254.73.209, mac=08-00-27-7c-f4-94, net=169.254.0.0/16, mask=255.255.0.0, use=haip:cluster_interconnect/62]
Public Interface 'eth0' configured from GPnP for use as a public interface.
  [name='eth0', type=1, ip=192.168.0.86, mac=08-00-27-c1-59-a3, net=192.168.0.0/24, mask=255.255.255.0, use=public/1]

Useful metalink notes
11gR2 Grid Infrastructure Redundant Interconnect and ora.cluster_interconnect.haip [ID 1210883.1]

Saturday, December 1, 2012

Using OracleConnectionWrapper to Clear Session Context in JDBC Connection Pools

Using application context to limit to the data visibility is one way VPDs are implemented. However in application with JDBC session pools this creates an additional challenge since session context is not cleared automatically when the connection close method is closed. It's only a logical close and connection return to the pool with session context values intact. Following java code illustrate this behavior. The JDBC libraries used are
Driver Name Oracle JDBC driver
Driver Version 11.2.0.3.0
Driver Major Version 11
Driver Minor Version 2
Database Major Version 11
Database Minor Version 2
The DBConnection class which is the connection pool class is the same class used earlier and not listed below.
public class Test {

    public static void main(String[] args) {

        try {

            DBConnection pool = new DBConnection("sbx", "sbx", "192.168.0.99", "1521", "fgacdb");
            
            System.out.println("setting context");
            testSet(pool);
            System.out.println("context set done connection closed");
            Thread.sleep(1 * 5 * 1000);
            System.out.println("getting from pool after close");
            testGet(pool);
            System.out.println("explicit clearing");
            testClear(pool);
            System.out.println("explicit clearing done");
            Thread.sleep(1 * 5 * 1000);
            System.out.println("getting context value after clear");
            testGet(pool);

        } catch (Exception ex) {

            ex.printStackTrace();
        }

    }

    public static void testSet(DBConnection pool) throws SQLException {

        Connection[] cons = new Connection[10];

        for (int i = 0; i < 10; i++) {

            cons[i] = pool.getConnection();
        }

        for (int i = 0; i < 10; i++) {
            try {
                Connection con = cons[i];

                CallableStatement cls = con.prepareCall("{call fgac_ctx_pkg.fgac_set_context(?,?)}");
                cls.setString(1, "user_id");
                cls.setString(2, "user " + i);
                cls.executeUpdate();
                cls.close();
                con.close();

            } catch (SQLException ex) {
                ex.printStackTrace();
            }
        }
    }

    public static void testGet(DBConnection pool) throws SQLException {

        Connection[] cons = new Connection[10];

        for (int i = 0; i < 10; i++) {

            cons[i] = pool.getConnection();
        }

        for (int i = 0; i < 10; i++) {
            try {
                Connection con = cons[i];

                PreparedStatement pr = con.prepareStatement("select sys_context ('fgac_ctx','user_id') from dual");
                ResultSet rs = pr.executeQuery();

                while (rs.next()) {

                    System.out.println(rs.getString(1));
                }

                rs.close();
                pr.close();
                con.close();

            } catch (SQLException ex) {
                ex.printStackTrace();
            }
        }
    }

    public static void testClear(DBConnection pool) throws SQLException {

        Connection[] cons = new Connection[10];

        for (int i = 0; i < 10; i++) {

            cons[i] = pool.getConnection();
        }

        for (int i = 0; i < 10; i++) {
            try {
                Connection con = cons[i];

                CallableStatement cls = con.prepareCall("{call fgac_ctx_pkg.fgac_clear_context(?)}");
                cls.setString(1, "user_id");

                cls.executeUpdate();

                cls.close();
                con.close();

            } catch (SQLException ex) {
                ex.printStackTrace();
            }
        }
    }
}
Code uses following PL/SQL package
create or replace package fgac_ctx_pkg is

procedure fgac_set_context( i_attribute in varchar2, value in varchar2 );
procedure fgac_clear_context (i_attribute in varchar2 );

end fgac_ctx_pkg;
/

create or replace package body fgac_ctx_pkg is

procedure fgac_set_context
( i_attribute in varchar2, value in varchar2 )
is
begin
dbms_Session.set_context('fgac_ctx',i_attribute, value);
end;

procedure fgac_clear_context
( i_attribute in varchar2 )
is
begin
dbms_Session.clear_context('fgac_ctx',attribute => i_attribute);
end;

end fgac_ctx_pkg;
/

create or replace context fgac_ctx using FGAC_CTX_PKG;
Running the code would give the following output
setting context
context set done connection closed
getting from pool after close
user 2
user 1
user 0
user 3
user 4
user 5
user 7
user 6
user 8
user 9
explicit clearing
explicit clearing done
getting context value after clear
null
null
null
null
null
null
null
null
null
null
It could be seen from the output that after setting session context and calling the connection close method, it is still possible to get the session context values when connections are borrowed from the connection pool. The solution is to explicitly call the session clear method. After the explicit clear the session context values are no longer there hence the null output.


Explicitly setting and clearing maybe feasible for a small program but when application is already developed and trying to add these additional lines may require considerable amount of work and testing. This is where the OracleConnectionWrapper comes in handy which allows to wrap the connection close method and allows to call the explicit clear method from one place. This requires minimum work for an existing application. Similarly setting the context value would be done by overriding the getConnection method of the connection pool class.
Extend OracleConnectionWrapper to create a customized wrapper.
public class MyOracleConnectionWrapper extends OracleConnectionWrapper {

    public MyOracleConnectionWrapper(Connection con) {

        super((OracleConnection) con);
    }

    @Override
    public void close() throws SQLException {


        System.out.println("clearing");

        CallableStatement cls = super.prepareCall("{call fgac_ctx_pkg.fgac_clear_context(?)}");
        cls.setString(1, "user_id");
        cls.executeUpdate();
        cls.close();
        super.close();
    }
}
Modify Connection Pool class by overriding getConnection method and populating session context with user id using application specific methods.
public class DBConnection {

    private PoolDataSource pds = null;

    public DBConnection(String username, String password, String host, String port, String sid) {
        try {
            pds = PoolDataSourceFactory.getPoolDataSource();
            pds.setUser(username);
            pds.setPassword(password);
            pds.setURL("jdbc:oracle:thin:@" + host + ":" + port + ":" + sid);
            pds.setConnectionFactoryClassName("oracle.jdbc.pool.OracleDataSource");
            pds.setInitialPoolSize(10);
            pds.setMinPoolSize(10);
            pds.setMaxPoolSize(15);

        } catch (SQLException ex) {
            ex.printStackTrace();
        }
    }

    public Connection getConnection() throws SQLException {

        MyOracleConnectionWrapper con = new MyOracleConnectionWrapper((OracleConnection) pds.getConnection());

        CallableStatement cls = con.prepareCall("{call fgac_ctx_pkg.fgac_set_context(?,?)}");
        cls.setString(1, "user_id");
        cls.setString(2, "user " + applicationSessionObj.getUserID());
        cls.executeUpdate();
        cls.close();

        return con;

    }
}
With this code when a connection is borrowed from the connection pool it will have session context values associated with that user id and when connection close method is called session context will be cleared.
Similar to session context auto commit set at connection level is also not reset when returned to pool. To examine this behavior change for loop code where connections are borrowed from pool as follows on testSet method
for (int i = 0; i < 10; i++) {

            cons[i] = pool.getConnection();
            System.out.println(cons[i].getAutoCommit());
            cons[i].setAutoCommit(false);

        }
and on testGet method
 for (int i = 0; i < 10; i++) {

            cons[i] = pool.getConnection();
            System.out.println(cons[i].getAutoCommit());
        }
This will show the status of auto commit when connections are initially borrowed from the connection pool versus status after auto commit is set to false, connection closed and returned to pool and borrowed from pool again.