Friday, February 1, 2013

Parsing KFOD-00307: asmlib error!! function = [asm_open], error = [0], mesg = [Operation not permitted]

This was a new installation and storage vendor had setup the block devices as below.
ls -l /dev/eql/*
lrwxrwxrwx 1 root root 8 Jan 15 12:00 /dev/eql/clus-01 -> ../dm-16
lrwxrwxrwx 1 root root 8 Jan 15 12:00 /dev/eql/clus-02 -> ../dm-18
lrwxrwxrwx 1 root root 8 Jan 15 12:00 /dev/eql/clus-03 -> ../dm-20
(Showing only the block devices relevant to the post). This was using Dell EqualLogic PS4100x iSCSI SAN and was told to use /dev/eql/* for the database. Though asked "why the softlink and /dev/dm*" was told that's how it setup and have to use the /dev/eql/* softlink for "my work". At this point decided to use ASMLib since it was not possible to change the ownership of the softlink and ASMLib was able to use root owned disks without a problem. The Oracle documents that have read (605828.1,470913.1,602952.1) said not to use /dev/dm* so didn't want to directly use them and conveyed this to storage vendor before proceeding.
Installed and configured ASMLib
/usr/sbin/oracleasm configure
ORACLEASM_ENABLED=true
ORACLEASM_UID=grid
ORACLEASM_GID=asmadmin
ORACLEASM_SCANBOOT=true
ORACLEASM_SCANORDER="dm"
ORACLEASM_SCANEXCLUDE="sd"
and created the ASM disk that will be used in theASM diskgroup used for storing clusterware files using oracleasm
/etc/init.d/oracleasm listdisks
CLUS01
CLUS02
CLUS03
Went ahead with the installation and when at the ASM creation page, not ASM disks were visible when the discovery string ORCL:* was set. On the installer log following error message could be seen
INFO: ... discoveryString = ORCL:*
INFO: Executing [/tmp/OraInstall2013-01-10_02-28-18PM/ext/bin/kfod, nohdr=true, verbose=true, disks=all, status=true, op=disks, asm_diskstring='ORCL:*']
 with environment variables {XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt, LANG=en_GB.UTF-8, DISPLAY=:1, PWD=/home/grid/grid, _=/home/grid/grid/install/.oui, GS_LIB=/home/grid/.fonts, KDEDIR=/usr, KDE_MULTIHEAD=false, NLSPATH=/usr/dt/lib/nls/msg/%L/%N.cat, USER=grid, CLASSPATH=, XMODIFIERS=@im=none, G_BROKEN_FILENAMES=1, SSH_AGENT_PID=8475, COLORTERM=, HOME=/home/grid, HOSTNAME=RCL-DB-02, SESSION_MANAGER=local/RCL-DB-02:/tmp/.ICE-unix/8600, LESSOPEN=|/usr/bin/lesspipe.sh %s, LD_LIBRARY_PATH=/tmp/OraInstall2013-01-10_02-28-18PM/ext/bin/../lib, WINDOWID=4194311, ORACLE_HOME=/tmp/OraInstall2013-01-10_02-28-18PM/ext/bin, LOGNAME=grid, SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass, SHELL=/bin/bash, SSH_AUTH_SOCK=/tmp/ssh-GSFGtX8450/agent.8450, CVS_RSH=ssh, DBUS_SESSION_BUS_ADDRESS=unix:abstract=/tmp/dbus-KbHCyuMD9K,guid=1f484b25202997262dd3e40050eecda6, INPUTRC=/etc/inputrc, SHLVL=5, KDE_IS_PRELINKED=1, KDE_FULL_SESSION=true, HISTTIMEFORMAT=%b %e - %H:%M:%S , MAIL=/var/spool/mail/grid, KDE_NO_IPV6=1, VNCDESKTOP=RCL-DB-02:1 (grid), KONSOLE_DCOP=DCOPRef(konsole-8643,konsole), JAVA_HOME=, LS_COLORS=no=00:fi=00:di=00;34:ln=00;36:pi=40;33:so=00;35:bd=40;33;01:cd=40;33;01:or=01;05;37;41:mi=01;05;37;41:ex=00;32:*.cmd=00;32:*.exe=00;32:*.com=00;32:*.btm=00;32:*.bat=00;32:*.sh=00;32:*.csh=00;32:*.tar=00;31:*.tgz=00;31:*.arj=00;31:*.taz=00;31:*.lzh=00;31:*.zip=00;31:*.z=00;31:*.Z=00;31:*.gz=00;31:*.bz2=00;31:*.bz=00;31:*.tz=00;31:*.rpm=00;31:*.cpio=00;31:*.jpg=00;35:*.gif=00;35:*.bmp=00;35:*.xbm=00;35:*.xpm=00;35:*.png=00;35:*.tif=00;35:, KONSOLE_DCOP_SESSION=DCOPRef(konsole-8643,session-1), LC_ALL=, HISTSIZE=1000, XDG_CONFIG_DIRS=/etc/kde/xdg:/etc/xdg, TERM=xterm, PATH=/usr/bin:/usr/ccs/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/home/grid/bin:/home/grid/grid/install}
INFO: Starting Output Reader Threads for process /tmp/OraInstall2013-01-10_02-28-18PM/ext/bin/kfod
INFO: Parsing KFOD-00307: asmlib error!! function = [asm_close], error = [0], mesg = [Invalid argument]
INFO: Parsing KFOD-00307: asmlib error!! function = [asm_open], error = [0], mesg = [Operation not permitted]
INFO: The process /tmp/OraInstall2013-01-10_02-28-18PM/ext/bin/kfod exited with code 0
INFO: Parsing KFOD-00307: asmlib error!! function = [asm_close], error = [0], mesg = [Invalid argument]
INFO: Waiting for output processor threads to exit.
INFO: Parsing KFOD-00307: asmlib error!! function = [asm_open], error = [0], mesg = [Operation not permitted]
INFO: Parsing KFOD-00307: asmlib error!! function = [asm_close], error = [0], mesg = [Invalid argument]
INFO: Parsing KFOD-00307: asmlib error!! function = [asm_open], error = [0], mesg = [Operation not permitted]
INFO: Output processor threads exited.
Closest metalink note that could be found was ORA-15186: ASMLIB error function = [asm_open], error = [1], mesg = [Operation not permitted] [ID 398622.1] but even that was not an exact match in terms of the error code and of no help as this was a new installation.
However it was confirmed that ASM disk are there by listing the /dev/oracleasm/disk and using that path in the discovery string.




It's at this point the tinkering work began. First place was the scan order. Try different combinations including eql/clus-*, eql/*, dm-* , dm* but still didn't discover any ASM disk using ORCL:*.
But when the exact device was specified in the scan order it was possible to get the ASM disks using the ORCL:*. It took a while to get the head around this
ORACLEASM_SCANORDER="dm" <---- DOES NOT WORK NO ASM DISK DISCOVERED
ORACLEASM_SCANORDER="dm-16 dm-18 dm-20" <--- WORK FINE SEE THE ASM DISK
At this point raise an SR. Understandably Oracle's initial effort revolved around checking if ASMLib setup is correct and it was confirmed to be setup correct and correct value for ORACLEASM_SCANORDER was dm. But still couldn't explain why dm doesn't work and dm-16 work (work = ASM disk visible in the installer).
After sometime an Oracle engineer spot the issue which was believed to do with the multipath setup used by the storage vendor and might even explain the oddity mentioned above. Storage vendor wasn't using the Linux multipathing software but a proprietary one that came with the SAN. Similar to SCANORDER and SCANEXCLUDE it had eql and ignore-eql.
ls -l /dev/ignore_eql/
lrwxrwxrwx 1 root root 8 Jan 18 19:32 0-1cb196-32b50460e-993001e8c7650e17_a -> ../dm-15
lrwxrwxrwx 1 root root 8 Jan 18 19:32 0-1cb196-33f50460e-197001e8c7950e17_a -> ../dm-17
lrwxrwxrwx 1 root root 8 Jan 18 19:32 0-1cb196-34f50460e-8a7001e8c7c50e17_a -> ../dm-19
lrwxrwxrwx 1 root root 7 Jan 18 19:32 0-1cb196-3a950460e-f17001e8c8550e17_a -> ../dm-9
lrwxrwxrwx 1 root root 8 Jan 18 19:32 0-1cb196-3ba50460e-ffb001e8c8850e17_a -> ../dm-13
lrwxrwxrwx 1 root root 8 Jan 18 19:32 0-1cb196-3c950460e-ce7001e8c8b50e17_a -> ../dm-11
lrwxrwxrwx 1 root root 7 Jan 18 19:32 0-1cb196-3e050460e-0a1001e8c8e50e17_a -> ../dm-5
lrwxrwxrwx 1 root root 7 Jan 18 19:32 0-1cb196-3f050460e-833001e8c9150e17_a -> ../dm-7
lrwxrwxrwx 1 root root 8 Jan 18 19:32 0-1cb196-3ff50460e-a3b001e8c9450e17_a -> ../dm-21

ls -l /dev/eql/
lrwxrwxrwx 1 root root 8 Jan 18 19:32 clus-01 -> ../dm-16
lrwxrwxrwx 1 root root 8 Jan 18 19:32 clus-02 -> ../dm-18
lrwxrwxrwx 1 root root 8 Jan 18 19:32 clus-03 -> ../dm-20
However looking at the /dev/mapper could see that there are multiple labels for the same device(only one set highlighted)
ls -l /dev/mapper/
brw-rw---- 1 root disk 253, 15 Jan 18 19:32 eql-0-1cb196-32b50460e-993001e8c7650e17_a <-- dm-15
brw-rw---- 1 root disk 253, 16 Jan 18 19:32 eql-0-1cb196-32b50460e-993001e8c7650e17-clus-01 <-- dm-16
brw-rw---- 1 root disk 253, 17 Jan 18 19:32 eql-0-1cb196-33f50460e-197001e8c7950e17_a
brw-rw---- 1 root disk 253, 18 Jan 18 19:32 eql-0-1cb196-33f50460e-197001e8c7950e17-clus-02
brw-rw---- 1 root disk 253, 19 Jan 18 19:32 eql-0-1cb196-34f50460e-8a7001e8c7c50e17_a
brw-rw---- 1 root disk 253, 20 Jan 18 19:32 eql-0-1cb196-34f50460e-8a7001e8c7c50e17-clus-03
and on the multipath list only one of these labels from each set appeared (only one highlighted)
 multipath -ll
eql-0-1cb196-34f50460e-8a7001e8c7c50e17_a () dm-19 EQLOGIC,100E-00
[size=1.0G][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=2][active]
 \_ 6:0:0:0 sdf 8:80  [active][ready]
 \_ 14:0:0:0 sdm 8:192 [active][ready]
eql-0-1cb196-33f50460e-197001e8c7950e17_a () dm-17 EQLOGIC,100E-00
[size=1.0G][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=2][active]
 \_ 1:0:0:0 sdj 8:144 [active][ready]
 \_ 15:0:0:0 sdn 8:208 [active][ready]
eql-0-1cb196-32b50460e-993001e8c7650e17_a () dm-15 EQLOGIC,100E-00
[size=1.0G][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=2][active]
 \_ 16:0:0:0 sdo 8:224 [active][ready]
 \_ 17:0:0:0 sdp 8:240 [active][ready]
But the ASM is seeing all these paths.(only one set highlighted)
/etc/init.d/oracleasm querydisk -p CLUS01
Disk "CLUS01" is a valid ASM disk
/dev/sdj: LABEL="CLUS01" TYPE="oracleasm"
/dev/mapper/eql-0-1cb196-32b50460e-993001e8c7650e17_a: LABEL="CLUS01" TYPE="oracleasm" <-- dm-15
/dev/mapper/eql-0-1cb196-32b50460e-993001e8c7650e17-clus-01: LABEL="CLUS01" TYPE="oracleasm" <-- dm-16
/dev/sdp: LABEL="CLUS01" TYPE="oracleasm"

/etc/init.d/oracleasm querydisk -p CLUS02
Disk "CLUS02" is a valid ASM disk
/dev/sdf: LABEL="CLUS02" TYPE="oracleasm"
/dev/mapper/eql-0-1cb196-33f50460e-197001e8c7950e17_a: LABEL="CLUS02" TYPE="oracleasm"
/dev/mapper/eql-0-1cb196-33f50460e-197001e8c7950e17-clus-02: LABEL="CLUS02" TYPE="oracleasm"
/dev/sdo: LABEL="CLUS02" TYPE="oracleasm"

/etc/init.d/oracleasm querydisk -p CLUS03
Disk "CLUS03" is a valid ASM disk
/dev/mapper/eql-0-1cb196-34f50460e-8a7001e8c7c50e17_a: LABEL="CLUS03" TYPE="oracleasm"
/dev/mapper/eql-0-1cb196-34f50460e-8a7001e8c7c50e17-clus-03: LABEL="CLUS03" TYPE="oracleasm"
/dev/sdm: LABEL="CLUS03" TYPE="oracleasm"
/dev/sdn: LABEL="CLUS03" TYPE="oracleasm"
It was thought that this could explain why dm in the scan order would not discover any ASM disks while dm-16 (and others) could discover them.
Unfortunately that's where this post ends. Going to the storage vendor with this information didn't yield anything. "That's how it is setup" was repeated and due the time critical nature of the work decided to drop using ASMLib altogether and decided to go for block devices (events on this post happened before the post on the link) after letting the storage vendor know what Oracle says about using /dev/dm*.

Useful metalink notes
New ASMLib / oracleasm Disk Gets "header_status=Unknown" - Cannot be Added to ASM Diskgroup [ID 391136.1]
ORA-15186: ASMLIB error function = [asm_open], error = [1], mesg = [Operation not permitted] [ID 398622.1]