Kevin Kempf's Blog

July 13, 2009

There’s a reason EM is free…

Filed under: Enterprise Manager, Oracle — kkempf @ 7:46 pm

I’m mostly kidding about the title of this entry; on the whole, I really like Enterprise Manager Grid Control. It simplifies management of my Oracle databases, backups, and Blackberry notifications when something is amiss. I don’t use the “pay” 11i pack; I rely on custom-written SQL (User-Defined Metrics, or UDMs) to keep me informed if there is something wrong in the apps. It’s not that I’m opposed to the management packs (we use, and pay for, diagnostics & tuning, and they’ve saved me a lot of time); I just don’t see what the 11i pack brings to the table for me besides another annual maintenance fee.

Well, it turns out that this past weekend, for an as-yet-undetermined reason, EM stopped collecting information from all agents. This is really obnoxious, as I didn’t even know I was “blind” to my Oracle databases. It just silently stopped collecting. Like a union on strike without the picket line. It happens that I did some minor maintenance Sunday, which required me to bounce my ERP PROD database, and I was curious how quickly my buffer cache recovered, and how it was behaving. Well, surprise, surprise: all my data was stale as of 11pm Saturday night.

A bit of background: I run EM “Grid Control” on RH5 Linux x86_64, with about 4GB devoted to the SGA and 2 processors, all in a virtual machine. There’s nothing unusual about this; it has always performed fine.

Alright, back to the problem at hand. I did a little bit of checking, bounced some agents, even bounced the whole EM application server and database, just to be sure it was running normally. Everything checked out. The agents uploaded fine (or at least thought they did, as far as I could tell) and I had no problem doing real-time monitoring of any of my systems. This was puzzling, so I opened the perfunctory SR to see if there was any intelligent life at home that day at Oracle support. Turns out, no. The analyst asked me for reasonable files, such as logs from the agent. But it was going nowhere fast. So I reinstalled the agent on my PROD system, thinking it might be messed up somehow; this has happened from time to time. No dice.

Then the analyst started asking me to do downright dumb things. She noticed that there was an error message in the log about one of my custom metrics complaining about a trailing semicolon at the end of my SQL. Well, this had never hurt in the past, and although I will admit that there was an error, and definitely a problem with one of my custom metrics, I failed to understand how this could have run for months and suddenly caused catastrophic failure Saturday night at 11pm (nobody had done anything to EM all day Saturday). About this time, I gave up on the analyst and started doing some real digging on my own.

I looked in the EM application home, and noticed that there were a ton of .xml files in $ORACLE_HOME/sysman/recv/errors. This didn’t look right; at that point I didn’t really care about the “stuck” metrics, so I deleted everything in that directory. Then, based on some odd “unavailable partition” errors I had seen in the logs, I found a note which mentioned running this code as sysman against my EM repository:

SQL> exec emd_maintenance.analyze_emd_schema('SYSMAN')
SQL> exec emd_maintenance.partition_maintenance

Wouldn’t you know it, agents started reporting in and things returned to normal.
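The pile-up of .xml files in $ORACLE_HOME/sysman/recv/errors was the one visible symptom of the silent outage, so it is the obvious thing to watch. Below is a minimal sketch of such a check, assuming Python is available on the EM host. The function names and the threshold of 100 files are hypothetical, chosen only for illustration; nothing here is part of EM itself.

```python
import os
from pathlib import Path


def count_stuck_uploads(errors_dir):
    """Count .xml files directly inside errors_dir (0 if the directory is missing)."""
    path = Path(errors_dir)
    if not path.is_dir():
        return 0
    return sum(
        1 for entry in path.iterdir()
        if entry.is_file() and entry.suffix == ".xml"
    )


def backlog_message(count, threshold=100):
    """Build a one-line status string; the threshold is an arbitrary example value."""
    if count > threshold:
        return f"WARNING: {count} .xml files in recv/errors (threshold {threshold})"
    return f"OK: {count} .xml files in recv/errors"


if __name__ == "__main__":
    oracle_home = os.environ.get("ORACLE_HOME")
    if not oracle_home:
        print("ORACLE_HOME is not set; nothing to check")
    else:
        # Directory taken from the post; adjust if your EM home differs.
        errors_dir = Path(oracle_home) / "sysman" / "recv" / "errors"
        print(backlog_message(count_stuck_uploads(errors_dir)))
```

Run from cron on the EM host, with the output mailed somewhere, a check like this would at least announce the strike. It only reports the backlog; it does not clear the files or touch the repository.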

It’s sad, because when I was a rookie, my Oracle mentor taught me to open a TAR/SR on every issue I couldn’t solve in short order, believing that two heads were better than one, and that this was the analysts’ specialty. It turns out, as of late, I’m about 0 for 5 on analysts solving my problems. I don’t know if they’re overworked, underqualified, or just plain incompetent, but I just don’t have any faith anymore in support analysts. I do much better searching on my own. Don’t get me wrong, in most cases without Metalink I couldn’t have figured out the solution, but it pains me to see my company shell out so much money for support when all I really need is Metalink access. I guess that’s why I get paid.
