Friday, October 9, 2009

Oracle: Troubleshoot Database Control startup



Common startup issues:
  1. Environment variables
  2. SYSMAN/DBSNMP issues
  3. Timezone
  4. Network
  5. Configuration (wrong port assignment, wrong connection string)
Troubleshooting steps:

1. Check that the environment variables are set correctly, in particular ORACLE_HOME, PATH, LD_LIBRARY_PATH and LANG.
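The check in step 1 can be scripted. The following is a minimal Python sketch, not an Oracle tool: the function names are made up for illustration, and the extra test that ORACLE_HOME/bin is on PATH is an assumption about what "set correctly" usually means.

```python
import os

# Variables named in step 1 of this article.
REQUIRED_VARS = ("ORACLE_HOME", "PATH", "LD_LIBRARY_PATH", "LANG")


def missing_env_vars(env, required=REQUIRED_VARS):
    """Return the names in `required` that are unset or empty in `env`."""
    return [name for name in required if not env.get(name, "").strip()]


def oracle_home_on_path(env):
    """Return True if ORACLE_HOME/bin appears as an entry in PATH."""
    home = env.get("ORACLE_HOME", "")
    if not home:
        return False
    wanted = os.path.normpath(os.path.join(home, "bin"))
    entries = env.get("PATH", "").split(os.pathsep)
    return any(os.path.normpath(entry) == wanted for entry in entries if entry)


if __name__ == "__main__":
    print("missing or empty:", missing_env_vars(os.environ))
    print("ORACLE_HOME/bin on PATH:", oracle_home_on_path(os.environ))
```

Run it as the OS user that starts dbconsole, so that it sees the same environment as emctl.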

2. Check that the SYSMAN and DBSNMP accounts are open. Connect to the database as SYS and run:

SQL> select username, account_status from dba_users where username in ('SYSMAN','DBSNMP');

The expected output is:

USERNAME       ACCOUNT_STATUS
-------------- ------------------------
DBSNMP         OPEN
SYSMAN         OPEN

emagent.trc errors

2008-01-19 11:20:21,231 [HttpRequestHandler-28730188] ERROR conn.ConnectionService verifyRepositoryEx.433 - Invalid Connection Pool. ERROR = ORA-28000: the account is locked
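If the query output from step 2 is captured to a file (for example by a monitoring script), it can be checked automatically. This is a minimal Python sketch with hypothetical function names. It assumes the two-column layout shown above and ignores every line that does not start with one of the two account names.

```python
def parse_account_status(sqlplus_output, wanted=("SYSMAN", "DBSNMP")):
    """Map USERNAME to ACCOUNT_STATUS for the accounts in `wanted`.

    The status is everything after the first column, so multi-word
    values such as "EXPIRED & LOCKED" are kept whole.
    """
    statuses = {}
    for line in sqlplus_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] in wanted:
            statuses[parts[0]] = parts[1].strip()
    return statuses


def accounts_not_open(sqlplus_output):
    """Return the accounts whose status is anything other than OPEN."""
    statuses = parse_account_status(sqlplus_output)
    return {user: status for user, status in statuses.items() if status != "OPEN"}


if __name__ == "__main__":
    sample = (
        "USERNAME       ACCOUNT_STATUS\n"
        "-------------- ------------------------\n"
        "DBSNMP         OPEN\n"
        "SYSMAN         LOCKED(TIMED)\n"
    )
    print(accounts_not_open(sample))
```

Any account reported by accounts_not_open() is a candidate for the ORA-28000 error shown above.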


3. Check the timezone set in the environment. If the timezone does not match one of the values in ORACLE_HOME/sysman/admin/supportedtzs.lst, the dbcontrol agent will not start, and checking the dbcontrol status returns "EM Daemon is not running".

emdb.nohup errors

----- Wed Jul 25 22:31:53 2007::property 'agentTZregion' in '/usr/pkg/oracle/product/10.2.0/db//sysman/config/emd.properties' contains an invalid value of 'TZ set to '.Agent start up can not proceed.This value might have been manually modified to be an incorrect value.This value needs to be set to one of the values listed in '/usr/pkg/oracle/product/10.2.0/db/sysman/admin/supportedtzs.lst'. Execute 'emctl config agent getTZ' and see if this is an appropriate value. -----
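The timezone check from step 3 can be done before starting the agent. The sketch below is Python and makes one assumption about the file format: that supportedtzs.lst holds one timezone name per line, with '#' marking comment lines. The function names are hypothetical.

```python
import os


def parse_supported_timezones(text):
    """Return the set of timezone names, skipping blanks and '#' comments."""
    zones = set()
    for line in text.splitlines():
        name = line.strip()
        if name and not name.startswith("#"):
            zones.add(name)
    return zones


def tz_is_supported(tz_value, supported):
    """Return True if the TZ value is non-empty and listed in `supported`."""
    return bool(tz_value) and tz_value in supported


if __name__ == "__main__":
    home = os.environ.get("ORACLE_HOME", "")
    lst = os.path.join(home, "sysman", "admin", "supportedtzs.lst")
    if os.path.isfile(lst):
        with open(lst, encoding="utf-8", errors="replace") as handle:
            supported = parse_supported_timezones(handle.read())
        tz = os.environ.get("TZ", "")
        print(repr(tz), "supported:", tz_is_supported(tz, supported))
    else:
        print("not found:", lst)
```

An empty TZ is reported as unsupported, which matches the 'TZ set to ' case in the log excerpt above.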

4. Check the OS network configuration:
  • use a static IP address (not one assigned by DHCP);
  • the hostname must not contain "_" (the underscore character);
  • nslookup and ping must resolve the fully qualified name;
  • "hosts" file entries pattern:
  • lookup and reverse lookup must work;
  • IPv6 is not supported.
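Two of the checks in step 4 (the underscore rule and the forward/reverse lookup) can be scripted with the Python standard library. This is a rough sketch with hypothetical function names. It uses the resolver of the machine it runs on, so it does not replace nslookup against a specific DNS server.

```python
import socket


def has_underscore(hostname):
    """Return True if the host name contains the forbidden '_' character."""
    return "_" in hostname


def lookup_round_trip(hostname):
    """Forward-resolve `hostname`, then reverse-resolve the address.

    Returns (ip_address, reverse_name). Raises OSError if either lookup
    fails. Only IPv4 is used, in line with the IPv6 remark above.
    """
    ip_address = socket.gethostbyname(hostname)         # forward lookup
    reverse_name = socket.gethostbyaddr(ip_address)[0]  # reverse lookup
    return ip_address, reverse_name


if __name__ == "__main__":
    fqdn = socket.getfqdn()
    print("host name:", fqdn, "| underscore:", has_underscore(fqdn))
    try:
        print("round trip:", lookup_round_trip(fqdn))
    except OSError as exc:
        print("lookup failed:", exc)
```

If the reverse name differs from the name Database Control was created with, compare it with the directory names described under Known issue 5.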
5. Check the database network configuration:
  • check that "lsnrctl status" shows the same listener details as ORACLE_HOME/network/admin/listener.ora;
  • check the TNS status with the tnsping utility.
emoms.trc errors

2005-07-04 12:23:08,120 [XMLLoader0] ERROR conn.ConnectionService verifyRepositoryEx.418 - Invalid Connection Pool. ERROR = Listener refused the connection with the following error: ORA-12514, TNS:listener does not currently know of service requested in connect descriptor The Connection descriptor used by the client was: (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=)(PORT=)))(CONNECT_DATA=(SERVICE_NAME=)))
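The trace files quoted in this article (emagent.trc, emoms.trc) can get long, so a quick tally of ORA- codes helps to see which of the above steps to revisit. This is a small Python sketch that assumes the codes appear as "ORA-" followed by five digits, as in the excerpts here. The function name is made up for illustration.

```python
import re
import sys
from collections import Counter

# "ORA-" followed by exactly five digits, e.g. ORA-28000 or ORA-12514.
ORA_CODE = re.compile(r"ORA-\d{5}")


def count_ora_errors(lines):
    """Count every ORA-nnnnn code found in an iterable of text lines."""
    counts = Counter()
    for line in lines:
        counts.update(ORA_CODE.findall(line))
    return counts


if __name__ == "__main__":
    # Usage: python count_ora.py <trace file> [<trace file> ...]
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as handle:
            for code, hits in count_ora_errors(handle).most_common():
                print(f"{path}: {code} x{hits}")
```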

6. Check that the configuration files are correct.
  • ORACLE_HOME//sysman/emd.properties. Check properties:
REPOSITORY_URL=http://:/em/upload/

EMD_URL=http://:/emd/main
  • ORACLE_HOME//sysman/emoms.properties. Check properties:
oracle.sysman.eml.mntr.emdRepConnectDescriptor - must have a valid connection string
oracle.sysman.eml.mntr.emdRepPort=
oracle.sysman.eml.mntr.emdRepDBName=
oracle.sysman.emSDK.svlt.ConsoleServerPort=
oracle.sysman.emSDK.svlt.ConsoleServerHost=
oracle.sysman.emSDK.svlt.ConsoleServerHTTPSPort=
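The property checks in step 6 can be automated. The Python sketch below uses a deliberately simplified key=value parser (no line continuations, no escape handling) and hypothetical function names. The key list is taken from this article, and the host name in the usage example is a placeholder.

```python
import re

EMD_KEYS = ("REPOSITORY_URL", "EMD_URL")
EMOMS_KEYS = (
    "oracle.sysman.eml.mntr.emdRepConnectDescriptor",
    "oracle.sysman.eml.mntr.emdRepPort",
    "oracle.sysman.eml.mntr.emdRepDBName",
    "oracle.sysman.emSDK.svlt.ConsoleServerPort",
    "oracle.sysman.emSDK.svlt.ConsoleServerHost",
    "oracle.sysman.emSDK.svlt.ConsoleServerHTTPSPort",
)

# http(s)://<host>:<port>/... with a non-empty host and a numeric port.
URL_WITH_HOST_AND_PORT = re.compile(r"^https?://[^:/\s]+:\d+/")


def parse_properties(text):
    """Parse simple key=value lines; lines starting with '#' or '!' are comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_or_empty(props, keys):
    """Return the keys that are absent from `props` or have an empty value."""
    return [key for key in keys if not props.get(key)]


def url_has_host_and_port(url):
    """Return True if the URL carries both a host name and a port number."""
    return bool(URL_WITH_HOST_AND_PORT.match(url))


if __name__ == "__main__":
    text = "REPOSITORY_URL=http://myhost.mydomain.com:1158/em/upload/\nEMD_URL=http://:/emd/main\n"
    props = parse_properties(text)
    print("missing:", missing_or_empty(props, EMD_KEYS))
    print({key: url_has_host_and_port(props[key]) for key in EMD_KEYS if key in props})
```

An EMD_URL of the form http://:/emd/main fails the host-and-port test. That is the same symptom as in Known issue 4.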

If none of the above helps, check the Known Issues list below.

Known issues

Generic platform

1. EM Daemon is not running. Database Control starts successfully, but checking the status shows "EM Daemon is not running". Checking the agent status also shows that the agent is not running.
//sysman/log/emdctl.trc shows:

2005-11-06 18:04:40 Thread-3840 ERROR main: nmectl.c: nmectl_validateTZRegion, agentTZoffset=120, and testTZoffset for GMT:0 do not match

Solution:

a) Set the desired time zone at the OS level:
Windows: Control Panel->Date&Time->Time Zone
Linux/Unix: export TZ=
The selected timezone must correspond to one of the timezones in ORACLE_HOME/sysman/admin/supportedtzs.lst

b) Stop the dbconsole
ORACLE_HOME/bin/emctl stop dbconsole

c) Run:
ORACLE_HOME\bin\emctl config agent getTZ
This may return a different timezone than the one set in step a).

ORACLE_HOME\bin\emctl config agent updateTZ
This will update the ORACLE_HOME\\sysman\config\emd.properties file with the correct timezone.

d) Start dbconsole
ORACLE_HOME/bin/emctl start dbconsole
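Steps b) to d) can be wrapped in a small script. This Python sketch only prints the commands by default. The helper names and the dry_run switch are illustrative, and on Windows the emctl executable name will differ (emctl.bat), so treat it as a template and not as a finished tool.

```python
import os
import subprocess


def tz_reset_commands(oracle_home):
    """Return the emctl invocations from steps b) to d), in order."""
    emctl = os.path.join(oracle_home, "bin", "emctl")
    return [
        [emctl, "stop", "dbconsole"],
        [emctl, "config", "agent", "getTZ"],
        [emctl, "config", "agent", "updateTZ"],
        [emctl, "start", "dbconsole"],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # check=True aborts the sequence on the first failing command.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run_all(tz_reset_commands(os.environ.get("ORACLE_HOME", "")))
```

Set TZ at the OS level (step a) before running it with dry_run=False, because the commands inherit the environment of the script.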

2. GIM-00104: Health check failed to connect to instance.
ORACLE_HOME//sysman/log/emagent.trc shows:

2006-05-04 13:17:29 Thread-2206875655 ERROR fetchlets.healthCheck: GIM-00104:Health check failed to connect to instance.
GIM-00090: OS-dependent operation:open failed with status: 24
GIM-00091: OS failure message: Too many open files
GIM-00092: OS failure occurred at: sskgmsmr_7
2006-05-04 13:17:29 Thread-2206875655 ERROR engine: [oracle_database,tmprod_tmprod2,health_check] : nmeegd_GetMetricData failed : Instance HealthCheck initialization failed due to one of the following causes: the owner of the EM agent process is not same as the owner of the Oracle instance processes; the owner of the EM agent process is not part of the dba group; or the database version is not 10g (10.1.0.2) and above.
2006-05-04 13:17:30 Thread-2206892039 ERROR http: snmehl_connect: failed to create socket: Too many open files (error = 24)

Solution:

Check Note 368612.1

3. Error starting ORMI-Server. Unable to bind socket: Address already in use. Trying to start db control fails without an obvious reason.
ORACLE_HOME//sysman/log/emdb.nohup shows:

----- Mon Nov 6 10:34:13 2006::Console Launched with PID 3441 at time Mon Nov 6 10:34:13 2006
06/11/06 10:34:16 Error starting ORMI-Server. Unable to bind socket: Address already in use


Solution:

Check Note 398499.1, Note 419586.1, Note 438504.1, Note 358961.1

4. Unable to determine local host from URL.
emctl start dbconsole shows:

EMD_URL=http://:/emd/main

Solution:

Check Note 266027.1, Note 343748.1

5. OC4J Configuration Issue: $ORACLE_HOME/oc4j/j2ee/OC4J_DBConsole_host_sid not found

emctl start dbconsole fails with the following error:
OC4J Configuration Issue: $ORACLE_HOME/oc4j/j2ee/OC4J_DBConsole_host_sid not found

There are three cases for this issue:

a) The ORACLE_HOME variable needed to run emctl is set to the wrong database home. Set the right value and retry.
b) Network changes. If the hostname under which Database Control was originally created is no longer resolvable, startup fails with the above error.

Example:

Database configuration folders:
ORACLE_HOME/oc4j/j2ee/OC4J_DBConsole_myhost_sid
ORACLE_HOME/myhost_sid

Network changes made the hostname "myhost" unresolvable. The host now resolves to "myhost.mydomain.com" instead, so emctl looks for the following directories and does not find them:
ORACLE_HOME/oc4j/j2ee/OC4J_DBConsole_myhost.mydomain.com_sid
ORACLE_HOME/myhost.mydomain.com_sid

To resolve the issue, Database Control needs to be recreated using the correct hostname.
Check Note 278100.1 for steps to create db control.

c) Database Control was not configured. Check Note 278100.1 for steps to create db control.
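To see which of the three cases applies, it helps to list the directories emctl would expect for a given host name and SID. This Python sketch builds the two paths from the example above. The function names are hypothetical, and reading the SID from ORACLE_SID is an assumption about the local environment.

```python
import os
import socket


def expected_dbconsole_dirs(oracle_home, host, sid):
    """Return the two configuration directories named in the example above."""
    return [
        os.path.join(oracle_home, "oc4j", "j2ee", f"OC4J_DBConsole_{host}_{sid}"),
        os.path.join(oracle_home, f"{host}_{sid}"),
    ]


def missing_dbconsole_dirs(oracle_home, host, sid):
    """Return the expected directories that do not exist on disk."""
    return [d for d in expected_dbconsole_dirs(oracle_home, host, sid) if not os.path.isdir(d)]


if __name__ == "__main__":
    home = os.environ.get("ORACLE_HOME", "")
    sid = os.environ.get("ORACLE_SID", "")
    # Try both the short name and the fully qualified name of this host.
    for host in sorted({socket.gethostname(), socket.getfqdn()}):
        print(host, "missing:", missing_dbconsole_dirs(home, host, sid))
```

If the directories exist for the short name but not for the fully qualified name (or the other way round), case b) is the likely cause.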

6. Perl errors. The following errors appear when starting dbcontrol:

perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LC_ALL = (unset),
LC__FASTMSG = "true",
LANG = "En_US"
are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C")


Solution

This is a label view environment problem. In the local environment do the following:

Unset the LANG variable.
Stop and restart the database.
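Before unsetting LANG, it can be useful to confirm that the configured locale really is unknown to the system. This Python sketch asks the C library whether it accepts a locale name. The function name is made up, and the result depends on the C library in use, so read it as a hint only.

```python
import locale
import os


def locale_is_installed(name):
    """Return True if the C library accepts `name` for LC_CTYPE.

    The current LC_CTYPE setting is restored before returning.
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # query only, no change
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        return False
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)


if __name__ == "__main__":
    lang = os.environ.get("LANG", "")
    if lang:
        print("LANG =", repr(lang), "installed:", locale_is_installed(lang))
    else:
        print("LANG is not set")
```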

7. Start Dbconsole Shows Errors With Wrong Ps Option on Unix AIX, HP, SOLARIS

> emctl start dbconsole
Oracle Enterprise Manager 10g Database Control Release 10.2.0.1.0
Copyright (c) 1996, 2005 Oracle Corporation. All rights reserved.
http://:1158/em/console/aboutApplication
ps: unknown output format: -o cmd
ps: illegal option -- -
ps: unknown output format: -o ls
usage: ps [ -aAdeflcjLPy ] [ -o format ] [ -t termlist ]
[ -u userlist ] [ -U userlist ] [ -G grouplist ]
[ -p proclist ] [ -g pgrplist ] [ -s sidlist ]
'format' is one or more of:
user ruser group rgroup uid ruid gid rgid pid ppid pgid sid taskid pri opri pcpu pmem vsz rss osz nice class time etime stime f s c lwp nlwp psr tty addr wchan fname comm args projid project pset
Starting Oracle Enterprise Manager 10g Database Control ..................... started.


Solution

Check Note 358479.1

8. Starting dbcontrol fails. emdb.nohup shows:

----- ::Console Launched with PID 12031 at time -----
Exception in thread "main" java.util.zip.ZipException: No such file or directory
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.<init>(ZipFile.java:112)
at java.util.jar.JarFile.<init>(JarFile.java:127)

Solution

Check Note 312652.1

9. Starting dbcontrol creates too many "__JDBC__" entries in the LISTENER log:

26-SEP-2005 12:09:00 * (CONNECT_DATA=(CID=(PROGRAM=)(HOST=__jdbc__)(USER=))(SERVICE_NAME=beta.query))* (ADDRESS=(PROTO =tcp)(HOST=hostname)(PORT=52163)) * establish * beta.query * 0
26-SEP-2005 12:09:00 * (CONNECT_DATA=(CID=(PROGRAM=)(HOST=__jdbc__)(USER=))(SERVICE_NAME=beta.query))* (ADDRESS=(PROTO =tcp)(HOST=hostname)(PORT=52164)) * establish * beta.query * 0

Solution

Check Note 336177.1

10. 'emctl start dbconsole' takes a very long time to start. The emagent process is actually running, but dbcontrol cannot be accessed from the browser. emagent.trc shows:

2007-09-16 10:48:16 Thread-1290 WARN vpxoci: OCI Error -- ErrorCode(6550): ORA-06550: line 1, column 65: PLS-00201: identifier 'DBMS_AQADM' must be declared ORA-06550: line 1, column 65:
PL/SQL: Statement ignored
SQL = "/* OracleOEM */ BEGIN :succ_sub := 0; dbms_aqadm.creat"...
LOGIN = dbsnmp/@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=)(PORT=1521))(CONNECT_DATA=(SID=FSSYS)))

Solution

Check Note 458834.1

11. dbconsole fails when the hostname starts with the letter "u". This is caused by the bug "DBCONSOLE DOES NOT WORK HAVING A HOSTNAME STARTING WITH "U"", which is fixed in the 10.2.0.2 patchset.

Apply patchset 10.2.0.2 or, if that is not possible:

a) Save the file $ORACLE_HOME\host_SID\sysman\config\emd.properties to emd.properties.orig
b) Update the file $ORACLE_HOME\host_SID\sysman\config\emd.properties, replacing \ with / in the omsRecvDir line.

For example, change:
omsRecvDir=d:\oracle\product\10.2.0\db_1\ukp001_db0\sysman\recv
to
omsRecvDir=d:/oracle/product/10.2.0/db_1/ukp001_db0/sysman/recv

c) Bounce DB Control
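Steps a) and b) of this workaround are easy to get wrong by hand, so here is a Python sketch that makes the backup and rewrites only the omsRecvDir line. The function names are made up for illustration. The path to emd.properties has to be supplied by the caller.

```python
import shutil
import sys


def fix_oms_recv_dir(properties_text):
    """Replace backslashes with forward slashes in the omsRecvDir value only."""
    fixed = []
    for line in properties_text.splitlines(keepends=True):
        key, sep, value = line.partition("=")
        if sep and key.strip() == "omsRecvDir":
            line = key + sep + value.replace("\\", "/")
        fixed.append(line)
    return "".join(fixed)


def patch_file(path):
    """Back up `path` to `path`.orig (step a), then rewrite it (step b)."""
    shutil.copy2(path, path + ".orig")
    with open(path, encoding="utf-8") as handle:
        text = handle.read()
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(fix_oms_recv_dir(text))


if __name__ == "__main__":
    # Usage: python fix_recvdir.py <path to emd.properties>
    for target in sys.argv[1:]:
        patch_file(target)
```

All other lines are written back unchanged, so the file can be compared with emd.properties.orig to verify the single edit.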

12. SEVERE: Cannot start Database Control. The following ports are already in use: [EM agent port:3938]

a) Check which ports are available for this dbcontrol.
b) Recreate the dbcontrol using these port numbers:

ORACLE_HOME\bin\emca -config all db -repos recreate -AGENT_PORT -DBCONTROL_HTTP_PORT -RMI_PORT -JMS_PORT

The default port numbers are:

AGENT_PORT: 3938
DBCONTROL_HTTP_PORT: 5500 or 1158
RMI_PORT: 5520
JMS_PORT: 5540
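For step a), a quick way to test whether the default ports are free is to try binding to them. This Python sketch uses the port numbers listed above. The function name is hypothetical, and a port that is free now can of course be taken a moment later.

```python
import socket

# Default ports as listed in this article.
DEFAULT_PORTS = {
    "AGENT_PORT": [3938],
    "DBCONTROL_HTTP_PORT": [5500, 1158],
    "RMI_PORT": [5520],
    "JMS_PORT": [5540],
}


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


if __name__ == "__main__":
    for name, ports in DEFAULT_PORTS.items():
        for port in ports:
            state = "free" if port_is_free(port) else "IN USE"
            print(f"{name} {port}: {state}")
```

Stop dbconsole before running it, otherwise its own ports are reported as in use.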

13. The dbconsole cannot be initialized correctly and the logfile $ORACLE_HOME/_/sysman/log/emoms.trc shows the following error:

ORA-12516, TNS:listener could not find available handler with matching protocol stack
The Connection descriptor used by the client was:
<>_/sysman/config/emoms.properties>

Solution

Check Note 458308.1

14. Generic Time zone issues.
Check Note 338556.1, Note 304585.1, Note 332123.1, Note 461918.1.

Platform specific

AIX5L

1. dbcontrol start fails with:

./emctl start dbconsole
bin/emctl[336]: unlimited: 0403-009 The specified number is not valid for this command.

This is caused by the AIX 5L operating system itself, as documented on page 195 of the AIX Version 4.3 to 5L Migration Guide:
http://www.redbooks.ibm.com/redbooks/pdfs/sg246924.pdf
When the emctl script reads the value of ulimit and finds it set to unlimited, it prints this error and continues starting the DBConsole.

Solution

a) You can simply ignore the warning.
b) Set the ulimit to a value other than unlimited.

2. Unable to start dbcontrol. Main error in emdb.nohup:

[ Unable to alloc heap of requested size, perhaps the maxdata value is too small - see README.HTML for more information. ]
[ **Out of memory, aborting** ]
[ ]
[ ]
[ *** panic: JVMST017: Cannot allocate memory in initializeMarkAndAllocBits(markbits1) ]
/u01/app/oracle/CPSS/10.2.0/jdk/bin/java[3]: 1253762 IOT/Abort trap(coredump)
----- Thu Jul 19 18:27:17 2007::DBConsole exited at Thu Jul 19 18:27:17 2007 with return value 134. -----
----- Thu Jul 19 18:27:17 2007::DBConsole has exited due to an internal error -----
----- Thu Jul 19 18:27:17 2007:: - checking for corefile at /u01/app/oracle/CPSS/10.2.0/abc.xyz.com_sid/sysman/emd -----
----- Thu Jul 19 18:27:17 2007::Restarting DBConsole. -----
----- Thu Jul 19 18:27:17 2007::Console Launched with PID 1015900 at time Thu Jul 19 18:27:17 2007 -----
[ Unable to alloc heap of requested size, perhaps the maxdata value is too small - see README.HTML for more information. ]
[ **Out of memory, aborting** ]


Solution

Increase ulimit resources for the user starting the dbcontrol.
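To see which limits the dbcontrol user currently has, the values can be read from Python's resource module (Unix only). This sketch shows the soft and hard limits for three resources. The choice of resources is an assumption, since the article does not say which limit the JVM ran into.

```python
import resource


def describe_limit(value):
    """Render a limit value the way the ulimit command does."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return {label: (soft, hard)} for a few limits relevant to dbcontrol."""
    names = {
        "open files": resource.RLIMIT_NOFILE,
        "data segment": resource.RLIMIT_DATA,
        "stack": resource.RLIMIT_STACK,
    }
    return {
        label: tuple(describe_limit(v) for v in resource.getrlimit(res))
        for label, res in names.items()
    }


if __name__ == "__main__":
    for label, (soft, hard) in current_limits().items():
        print(f"{label}: soft={soft} hard={hard}")
```

The "open files" line is also the one to look at for the "Too many open files" error in Known issue 2 of the generic list. Run the script as the user that starts dbcontrol, because limits are per user session.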

3. dbcontrol fails to start. emagent.trc shows:

2006-06-14 14:06:01 Thread-1872 ERROR engine: [oracle_database,,health_check] : nmeegd_GetMetricData failed :
2006-06-14 14:06:11 Thread-1562 ERROR pingManager: nmepm_pingReposURL: Error in request response. code = 400. text = 2006-06-14 14:06:16 Thread-1634 ERROR fetchlets: Could not load library '/u01/app/oracle/product/10.2.0/db/lib32/libnmcfhc.so' for reason 'rtld: 0712-001 Symbol main was referenced from module /u01/app/oracle/product/10.2.0/db/lib32/libnmcfhc.so(), but a runtime definition of the symbol was not found.
rtld: 0712-001 Symbol nmeusb_StringBuffer_new was referenced from module /u01/app/oracle/product/10.2.0/db/lib32/libnmcfhc.so(), but a runtime definition of the symbol was not found.


Solution

Check Note 378104.1

4. The Refresh time on the Database Control home page is two hours behind the standard time for Europe/Copenhagen on the AIX platform. The workaround that corrects the time can also be applied to other timezone regions, as long as the corresponding AIX timezone is known.

Check Note 860955.1

Windows Server 2003

1. Starting the dbcontrol fails with:

The OracleDBConsoleCIMISYU service terminated with service-specific error 1 (0x1)
An error occured while trying to initialize the service.

Solution

a) Apply the latest Patch 6012744 - 10.2.0.3.0 Patch6 for Microsoft Windows (x64).
b) All bugs included in Patch 5846378 are also included in Patch 6012744, since these are cumulative patches.

2. dbcontrol fails to start. emagent.trc shows:

2005-08-26 11:53:56 Thread-544 ERROR pingManager: nmepm_pingReposURL: Cannot connect to
http://:5501/em/upload/: retStatus=-1
2005-08-26 11:53:57 Thread-544 WARN http: snmehl_connect: connect failed to (:5501): No connection could be made because the target machine actively refused it.

Solution

This is an installation issue: some files were not copied during the installation.

The files
oc4j\j2ee\oc4j_applications\applications\em\em\WEB-INF\lib\uix2.jar
oc4j\j2ee\oc4j_applications\applications\em\em\WEB-INF\lib\ohw.jar
oc4j\j2ee\oc4j_applications\applications\em\em\WEB-INF\lib\share.jar

are missing. In most cases you need to create the directory WEB-INF\lib manually.

As a workaround, copy the above 3 files to '...\WEB-INF\lib' and restart dbconsole. The files are located in $ORACLE_HOME\jlib.
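The workaround can be scripted as follows. This Python sketch checks for the three jar files named above, creates WEB-INF\lib if needed and copies whatever it finds in ORACLE_HOME\jlib. The function names are hypothetical, and the Oracle home is passed in by the caller.

```python
import os
import shutil

JARS = ("uix2.jar", "ohw.jar", "share.jar")
WEB_INF_LIB = os.path.join(
    "oc4j", "j2ee", "oc4j_applications", "applications", "em", "em", "WEB-INF", "lib"
)


def missing_jars(oracle_home):
    """Return the jar names that are absent from WEB-INF/lib."""
    lib_dir = os.path.join(oracle_home, WEB_INF_LIB)
    return [jar for jar in JARS if not os.path.isfile(os.path.join(lib_dir, jar))]


def restore_jars(oracle_home):
    """Copy each missing jar from ORACLE_HOME/jlib; return the names copied."""
    lib_dir = os.path.join(oracle_home, WEB_INF_LIB)
    os.makedirs(lib_dir, exist_ok=True)  # the directory often has to be created
    copied = []
    for jar in missing_jars(oracle_home):
        source = os.path.join(oracle_home, "jlib", jar)
        if os.path.isfile(source):
            shutil.copy2(source, os.path.join(lib_dir, jar))
            copied.append(jar)
    return copied


if __name__ == "__main__":
    home = os.environ.get("ORACLE_HOME", "")
    if home:
        print("copied:", restore_jars(home))
        print("still missing:", missing_jars(home))
    else:
        print("ORACLE_HOME is not set")
```

Restart dbconsole afterwards, as described in the workaround.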

References

Note 266027.1 - Problem: Startup: Emctl Start Dbconsole Fails with Agent port missing in EMD_URL
Note 278100.1 - How To Drop, Create And Recreate DB Control In A 10g Database
Note 343748.1 - Problem: Startup: Error starting Database Control, dbconsole - Unable to determine local host from URL
Note 358961.1 - Problem: Startup OMS: Oms Startup Fails With Integration Class Not Found
Note 368612.1 - Problem: Startup: DB Control Agent Crashes: Gim-00091 OS failure Message: Too Many Open Files
Note 398499.1 - Problem: Startup: EM Database Control Has Stopped Working and Unable to Start Again
Note 403928.1 - How to cycle the DB Control emdb.nohup file in $ORACLE_HOME/host_sid/sysman/log
Note 419586.1 - Problem: Startup: Cannot Start dbconsole and log Shows 'ORMI-Server address is already being used'
Note 438504.1 - EMCA or DB Control (DBConsole) Fails with Error starting ORMI-Server
Note 452284.1 - How to manage DB Console Log and Trace files