Kevin Kempf's Blog

October 11, 2013

Agents with security issues

Filed under: Enterprise Manager — kkempf @ 7:52 am

agent

So we’re undergoing a massive push to virtualization at a co-location facility (rented datacenter, if you will), and my first smallish production Oracle database got migrated. As a result, the application server (Windoze) and the 11g Oracle database (OL) got new IP addresses. Always scary, but in the end I knew Oracle would be less problematic than the application server because Oracle doesn’t really have strong ties to IP addresses.

The move went great except for one thing: when I brought up my database on the new subnet, the agent was acting all crazy. It reported everything was up (meaning the listener, host and database) but I couldn’t drill into anything via grid control. It gave me a succinct, useless error:

connect error

I’ll type it here so there’s some chance Google can index it: Database Error The Network Adapter could not establish the connection

Well this didn’t mean anything, so I figured I’d leverage my support and open a ticket with Oracle. The analyst didn’t exactly nail the issue, but as they were asking for logs I took a peek and noticed this in the $AGENT_HOME/sysman/log/emctl.log file:

28319 :: Fri Oct 11 06:15:12 2013::AgentStatus.pm:Processing status agent
28319 :: Fri Oct 11 06:15:12 2013::AgentStatus.pm:emdctl status returned 3
30263 :: Fri Oct 11 07:15:35 2013::AgentLifeCycle.pm: Processing status agent
30263 :: Fri Oct 11 07:15:35 2013::AgentStatus.pm:Processing status agent
30263 :: Fri Oct 11 07:15:35 2013::AgentStatus.pm:emdctl status returned 3

The fix? emctl secure agent

After I secured it, I logged into EM 11g (yeah, I’m still not on EM 12c, it’s complicated) and got right into the tabs and pieces I wanted to see. My best guess is that the agent became unsecured as a result of the IP address change, and I merely had to re-secure it to make everything work again on the new IP.
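A rough sketch of the check-then-re-secure sequence, for anyone hitting the same thing. The log path and the "emdctl status returned N" line format come from the excerpt above; the helper name last_emdctl_status is made up, and treating any non-zero code as "agent unhealthy" is an assumption.

```shell
#!/bin/sh
# Print the most recent "emdctl status returned N" code found in an emctl.log.
# last_emdctl_status is a hypothetical helper name, not part of emctl.
last_emdctl_status() {
    log_file="$1"
    # Keep only the trailing number of matching lines, then take the last one.
    sed -n 's/.*emdctl status returned \([0-9][0-9]*\).*/\1/p' "$log_file" | tail -n 1
}

# Usage (commented out so that sourcing this file runs nothing):
# code=$(last_emdctl_status "$AGENT_HOME/sysman/log/emctl.log")
# if [ -n "$code" ] && [ "$code" -ne 0 ]; then
#     emctl secure agent    # asks for the agent registration password
#     emctl status agent
# fi
```

With the log lines shown earlier, the helper would print 3, which is what sends you down the re-secure path.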

July 17, 2012

Agent Autostart process on Linux

Filed under: Enterprise Manager, Oracle — kkempf @ 2:40 pm

Ever wonder how EM agents automagically start up during host restarts on Linux?

I recently had cause to dig into this, and figured I’d share the architecture for general knowledge’s sake.

For purposes of this entry, I’m making the rather obvious assumption that the agent has been successfully installed on the Linux host.

Relevant scripts:

These scripts should be landed by the Agent install.

/etc/rc.d/init.d/gcstartup

  • call the $AGENT_HOME/install/unix/scripts/agentstup script with an argument of start or stop

$AGENT_HOME/install/unix/scripts/agentstup

  • start or stop the agent (based on argument $1) via $AGENT_HOME/bin/emctl start|stop agent

$AGENT_HOME/install/unix/scripts/lockgcstartup (/etc/rc.d/init.d/lockgcstartup)

  • touch /var/lock/subsys/gcstartup
  • touch /var/lock/subsys/unlockgcstartup

$AGENT_HOME/install/unix/scripts/unlockgcstartup (/etc/rc.d/init.d/unlockgcstartup)

  • rm -f /var/lock/subsys/gcstartup
  • rm -rf /var/lock/subsys/unlockgcstartup

Relevant Symbolic Links by Run Level:

These sym links should be created during the Agent install.

Run Level 2

/etc/rc2.d/S98gcstartup -> /etc/rc.d/init.d/gcstartup

/etc/rc2.d/K98gcstartup -> /etc/rc.d/init.d/gcstartup

/etc/rc2.d/S99lockgcstartup -> /etc/rc.d/init.d/lockgcstartup

/etc/rc2.d/K99unlockgcstartup -> /etc/rc.d/init.d/unlockgcstartup

Run Level 3

/etc/rc3.d/S98gcstartup -> /etc/rc.d/init.d/gcstartup

/etc/rc3.d/K98gcstartup -> /etc/rc.d/init.d/gcstartup

/etc/rc3.d/S99lockgcstartup -> /etc/rc.d/init.d/lockgcstartup

/etc/rc3.d/K99unlockgcstartup -> /etc/rc.d/init.d/unlockgcstartup

Run Level 5

/etc/rc5.d/S98gcstartup -> /etc/rc.d/init.d/gcstartup

/etc/rc5.d/K98gcstartup -> /etc/rc.d/init.d/gcstartup

/etc/rc5.d/S99lockgcstartup -> /etc/rc.d/init.d/lockgcstartup

/etc/rc5.d/K99unlockgcstartup -> /etc/rc.d/init.d/unlockgcstartup
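A quick way to confirm the links listed above are in place is to loop over the three run levels. This is only a sketch: the check_gc_links name and its optional root-prefix argument are invented here (the prefix exists so it can be tried against a scratch directory first).

```shell
#!/bin/sh
# Report any of the expected agent startup links that are missing.
# check_gc_links is a hypothetical helper; $1 is an optional root prefix
# (pass an empty string for the real filesystem).
check_gc_links() {
    root="$1"
    missing=0
    for level in 2 3 5; do
        for link in S98gcstartup K98gcstartup S99lockgcstartup K99unlockgcstartup; do
            path="$root/etc/rc$level.d/$link"
            if [ ! -L "$path" ]; then
                echo "missing: $path"
                missing=1
            fi
        done
    done
    return "$missing"
}

# Usage: check_gc_links ""    # checks /etc/rc2.d, /etc/rc3.d and /etc/rc5.d
```

It prints one line per missing link and returns non-zero if anything is absent, so it can sit in a post-install check.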

Walking through the bootup sequence for Runlevel 5:

  • /etc/rc calls the kill scripts (for i in /etc/rc$runlevel.d/K* ; do ) with the command “stop”
    • /etc/rc5.d/K98gcstartup stop {awesome oracle syntax there!}
    • /etc/rc5.d/K99unlockgcstartup stop {takes no arguments though one is passed}
  • /etc/rc calls the startup scripts (for i in /etc/rc$runlevel.d/S* ; do ) with the command “start”
    • /etc/rc5.d/S98gcstartup start {startup, start!}
    • /etc/rc5.d/S99lockgcstartup start {takes no arguments though one is passed}
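The ordering above boils down to two loops. What follows is a simplified stand-in for what /etc/rc does, not the real script (which, as far as I know, also consults /var/lock/subsys before deciding what to run); run_rc_dir is an invented name.

```shell
#!/bin/sh
# Simplified model of one rc pass: kill scripts first, then start scripts.
# run_rc_dir is a hypothetical name; $1 is the run-level directory.
run_rc_dir() {
    rc_dir="$1"
    # The glob expands in sorted order, so K98... runs before K99...
    for script in "$rc_dir"/K*; do
        if [ -x "$script" ]; then "$script" stop; fi
    done
    for script in "$rc_dir"/S*; do
        if [ -x "$script" ]; then "$script" start; fi
    done
    return 0
}

# Usage: run_rc_dir /path/to/scratch/rc5.d    (not something to run on a live host)
```

The two-digit number in each link name is what fixes the order: 98 before 99 in both the kill and the start pass.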

Incidentally:

  • Note ID 374068.1 covers some of this information.
  • I’m testing EM 12c and the agent install hooks are exactly the same (now gcstartup calls $AGENT_HOME/core/12.1.0.1.0/install/unix/scripts/agentstup)

Another Oddity

  • For some reason, a Linux x86_64 agent of mine was failing to start on boot
  • The console error was <AgentHome>/install/unix/scripts/agentstup: line 19: [: =: unary operator expected
  • The “fix” from support was to use quotes to protect shell variables in their startup script
    • vi $AGENT_HOME/install/unix/scripts/agentstup
    • change both occurrences of $executingUser to "$executingUser"
    • change both occurrences of $installUser to "$installUser"
  • This is detailed in 789363.1
  • I’m floored that the fix, from support, is to manually edit an Oracle provided script, but at least I got a solution!
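For the curious, the error is easy to reproduce. This is a stand-alone illustration, not the contents of agentstup; it assumes the variable was empty on the failing host, which is the usual cause of that message, and the user name is made up.

```shell
#!/bin/bash
# Illustration only: an unquoted versus a quoted test with an empty variable.
executingUser=""      # assumption: the variable ended up empty on the failing host
installUser="oracle"  # hypothetical value

# Unquoted: the empty variable vanishes, the shell sees `[ = oracle ]`,
# and bash complains "[: =: unary operator expected".
unquoted_check() {
    [ $executingUser = $installUser ]
}

# Quoted: the shell sees `[ "" = "oracle" ]`, a valid comparison that is simply false.
quoted_check() {
    [ "$executingUser" = "$installUser" ]
}
```

The quoted version returns a plain "false" instead of a syntax error, which is all the support fix really changes.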

 

February 14, 2012

Annoying Agent Problems

Filed under: 11g, Enterprise Manager — kkempf @ 9:30 am

New PROD RDBMS host

We run the main ERP database on a physical machine; I’d love to virtualize, and probably will one day soon, but we couldn’t get to vSphere 5 (required because of CPU count) before the hardware refresh. So we migrated Oracle to a brand new spiffy Dell R610 and it’s smokin’ fast. The process was what is known as physical to physical (P to P) server migration, and it went as well as can be expected. There was a bit of LVM manipulation required at the OS level, but for the most part we managed to bumble our way through it.

In the process of migrating to the new physical machine (from a rather reliable but ancient IBM blade server, incidentally), I took the plunge and cut over our production database hosts from RedHat 5.7 to Oracle Linux 5.7. I say take the plunge, but in truth the risk was a known entity: it’s a RedHat compatible kernel. What sparked this decision was 2 miserable, unresponsive tickets with RedHat support about high system CPU on my application server. Not to be funny about it, but if I can pay about half as much to get bad support, perhaps better, from Oracle, why wouldn’t I? Incidentally, the process of migrating from RH5 to OL5 (formerly OEL5, now they just call it OL5) is something which I will put in a detailed post shortly.

Angry Agents

After bringing up the database on new hardware, the agent would not communicate with the OMS:

The Oracle Management Server (OMS) has blocked this agent because it has either been reinstalled or restored from a filesystem backup.  Please click on the Agent Resynchronization button to resync the agent.

Your agent is hopelessly confused

When I “clicked on the agent resynchronization button to resync the agent” it failed with an error. You can bet your last dime, however, the first thing Oracle Support asked in my ticket? “Did you try clicking the agent resynchronization button?” This is the subsequent message (see below as well):

Agent Operation completed with errors.  For those targets that could not be saved, please go to the target’s monitoring configuration page to save them.  All other targets have been saved successfully.  Agent has not been unblocked.

Error communicating with agent.  Exception message – oracle.sysman.emSDK.emd.comm.CommException: IOException in reading Response :: Connection reset

Your agent has double crossed you

Blocked Agents

If there’s one thing I hate, it’s blocked agents.  You bet I tried to unblock it, then resync it.  I tried command line updates like emctl status agent, emctl upload agent, emctl unsecure agent, emctl secure agent.  You name it.  Nada.

The Fix

I stumbled across Document ID 1307816.1 while my analyst was busy asking me things like “can you upload your log files”. In the end, as the sysman user, I ran this against the EM database:

exec mgmt_admin.cleanup_agent('problemhost.domainname.com:3872');

After that my agent was happy, could talk to the OMS, and life was good.
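If there is more than one confused agent, generating the statement beats retyping it. A small sketch: cleanup_agent_sql is an invented helper, and the only real piece is the mgmt_admin.cleanup_agent call from the note above. Defaulting the port to 3872 is an assumption based on the example.

```shell
#!/bin/sh
# Build the cleanup statement for one agent target ("host:port").
# cleanup_agent_sql is a hypothetical helper, not an Oracle utility.
cleanup_agent_sql() {
    agent_host="$1"
    agent_port="${2:-3872}"   # 3872 is the port used in the example above
    printf "exec mgmt_admin.cleanup_agent('%s:%s');\n" "$agent_host" "$agent_port"
}

# Usage: cleanup_agent_sql problemhost.domainname.com
# Paste the output into a sqlplus session connected as sysman to the EM repository.
```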

January 6, 2012

EM Grid Control Base 12c now with extra useless!

Filed under: Enterprise Manager — kkempf @ 12:06 pm

C is for Cloud

I noticed EM 12c was available for download late last year, but didn’t look into it in earnest until recently.  Then I saw it prominently advertised in the January/February 2012 Oracle Magazine.  Who wouldn’t want EM 12c?

A is for Agent Upgrades

For those of you who are considering an upgrade to EM 12c, two things to know:

1. Per their own “Things to Know” section of Oracle® Enterprise Manager Cloud Control Upgrade Guide 12c Release 1 (12.1.0.1) Part Number E22625-05 : “Oracle Management Service 12c communicates only with Oracle Management Agent 12c. Therefore, it is important to upgrade your Management Agents before upgrading your OMS.”

2. There is no 12c agent available for Windows, confirmed by an SR today.

U is for Useless

Thanks 12c, guess I’ll be waiting. While I don’t like having Windows targets to monitor, my guess is they’re a reality in many, many datacenters.  What is this, like a beta release?  Seriously Oracle, you’re calling it enterprise manager, not database manager.

November 10, 2011

Security Patching EM 11g : I don’t have all day!

Filed under: Enterprise Manager, Security — kkempf @ 3:35 pm

What’s in a Name?

I should begin by saying Enterprise Manager is now Enterprise Manager Base Platform. See ID 1361443.1 if you’re curious as to why they would take a good name and turn it into a terrible one. If they thought they’d do this to avoid confusion, they failed. I’ll continue to refer to it as EM 11g, except when necessary for greater clarity.

Oracle Critical Patch Update October 2011

Okay, so I didn’t get around to looking at the October security patch update until a few weeks back.  I still figure that’s better than those who don’t look at it at all.  I decided to start with a non-critical system, my EM 11g setup.  In the olden days, I seem to recall Enterprise Manager had its own category from the main page; maybe I’m mistaken.  Regardless, now you have to click through “Oracle Fusion Middleware 11g Release 1, versions 11.1.1.3.0, 11.1.1.4.0, 11.1.1.5.0” to find EM.  From there, scroll down for days until you get to section 3.3 and there’s Enterprise Manager.  Since I’m running 11g, I proceed to section 3.3.3 “Patch Availability for Oracle Enterprise Manager Base Platform 11.1.0.1“.

This consists of 5 distinct pieces, by Oracle classification

  • Database home (CPU, DB PSU, GI PSU, or Exadata BP12)
    • CPU = Critical Patch Update, incremental security patch
    • PSU = Patch Set Update, cumulative patch which includes recommended + security
    • GI PSU = Grid Infrastructure Patch Set Update, cumulative patch which includes recommended + security for rich people (Grid/Rac users)
    • Exadata BP12 = Oracle Exadata Database Recommended Patch, cumulative patch which appears to include recommended + security for super rich people (Exadata/Rac users)
  • 11.1.0.1 Enterprise Manager Base Platform – OMS home: (OMS)
  • 11.1.0.1 Enterprise Manager Base Platform – OMS Fusion Middleware home (Weblogic Home)
  • 11.1.0.1 Enterprise Manager Base Platform – OMS Fusion Middleware Oracle HTTP Server home (OMS, I think)
  • 11.1.0.1 Enterprise Manager Base Platform – Agent home (Agent Home)

Getting on my soapbox

Dear Oracle,

I know you have lots of products, you buy new companies every week, and you are the 800 pound gorilla of the business software world.  Could you please simplify patching?  It’s gotten worse, not better, in the past couple of years.  Why isn’t there one place I can go within each application stack/entity, to see what patches have been applied?  Why are there 5 different methods of security patching (SQL Plus, opatch, adpatch, shell scripts and the wacky Weblogic GUI or CLI) for Oracle apps and Oracle EM11g (sorry, Enterprise Manager Base Platform)?  Also, I don’t have spare weeks to apply quarterly patches to all my systems.  Believe it or not, there’s other things I’m responsible for.  Thanks.

PS: If you want to see a good patch management architecture, check out the RedHat Network.  Systems, once configured, check in every couple of hours and see if there’s something to apply.  If there is, the patch can be released from the website, and either downloaded or applied.  It quite literally runs circles around Oracle’s Configuration Manager (OCM).

I can’t get that day back

There’s basically 4 environments in an EM 11g home: The RDBMS, the OMS, the Agent, and the WLS home.

  • RDBMS
    • Thankfully, patching the database hasn’t changed in years (Linux, non-RAC)
      1. Pull the database PSU to your desktop (in my case, 12827726 PSU for 11.2.0.2)
      2. sftp/scp the file to the RDBMS server/staging area
      3. Shut down the database and listener
      4. opatch apply
      5. sqlplus / as sysdba and run catbundle.sql psu apply
      6. Start the database and listener
  • OMS (Enterprise Manager Base Platform – OMS home for those who prefer maximum verbosity)
      1. opatch apply (12833678)
      2. I think I hit a weird java exception applying this; you may need to apply patch 12620174 first.  I don’t mean to sound vague; I simply don’t remember.
  • Agent Home
      • opatch apply (9345921)
  • WLS (Enterprise Manager Base Platform – OMS Fusion Middleware home for those who prefer maximum verbosity or want to impress their friends)
    • I gotta tell you, this is where it got wacky: Oracle Smart Update (aka bsu.sh).  I never patched a WLS home before.  Shame on me.  Apparently, there are two choices: run their GUI or run their CLI
    • I chose the GUI
      • You might reference ID 1072763.1 regarding how to patch WLS… I thought I had a better example but that one will suffice.  It also covers command line patching.
      • cd $ORACLE_OMS_HOME/../utils/bsu
      • Land the following patches to my desktop.  sftp/scp them to the server under $ORACLE_OMS_HOME/../utils/bsu/cache_dir
        • 12875001
        • 12875006
        • 12874981
        • 10625613
        • 10625676
      • unzip the 5 patches above.  remove the .zip file, and the README file included with them all.
      • ./bsu.sh
        • Using the GUI, the patches appear at the bottom.
        • Ensure you have the right Middleware home selected on the left (in my case, WLS runs on this server as the Discoverer server in a separate Oracle Home)
        • Hit the arrow or some such nonsense to make them go to the top
        • Here’s some screenshots to show you the general flow

Obviously, how you'd launch a patching utility

BSU Main Screen

Select from the list of patches in your cache directory, and hit the green up arrow

After clicking the green arrow, the patch is validated against... something

End state. Everything is installed. I think.

To Summarize

Just to patch EM 11g, the discrete steps involved for me were

  • Read the CPU to determine applicability
  • Determine which patches need to be applied
  • Pull the patches from the world’s slowest support site
  • Stage the patches to the EM server
  • Apply the patches to the database using opatch and sqlplus
  • Apply the patches to the OMS using opatch
  • Apply the patches to the Agent Home using opatch
  • Apply the patches to WLS using the wacky GUI
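The database-home portion of those steps can be sketched as a script with a dry-run switch, so it can be read and rehearsed before it touches anything. The names run, patch_rdbms_home and DRY_RUN are invented for illustration, and the stage path is a placeholder. The startup is issued before catbundle.sql here on the assumption that the database has to be open for the catalog script to run.

```shell
#!/bin/sh
# Sketch of the database-home patching steps, with a dry-run switch.
# run, patch_rdbms_home and DRY_RUN are hypothetical names.
DRY_RUN="${DRY_RUN:-1}"   # default: print commands instead of running them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

patch_rdbms_home() {
    patch_dir="$1"    # unzipped PSU directory, e.g. a stage area holding 12827726
    run lsnrctl stop
    run sh -c 'echo "shutdown immediate" | sqlplus -s / as sysdba'
    run sh -c "cd '$patch_dir' && opatch apply"
    # Assumption: the database must be open before the catalog script runs.
    run sh -c 'printf "startup\n@?/rdbms/admin/catbundle.sql psu apply\n" | sqlplus -s / as sysdba'
    run lsnrctl start
}

# Usage: patch_rdbms_home /stage/12827726              (prints the plan)
#        DRY_RUN=0; patch_rdbms_home /stage/12827726   (actually runs it)
```

The OMS and Agent homes would follow the same pattern with their own opatch apply; the WLS home still needs bsu.sh.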

While this is somewhat of a detailed overview of how to apply the CPU to EM11g last month, I wanted to make two points.

  • First, there are too many disparate ways of patching Oracle, in my opinion.  They range from the simplicity of a GUI for WebLogic patching to literally issuing unzip and cp commands on a Linux host to apply a patch to the 11i techstack home (don’t believe me?  check out patch 10410398).
    • As a result of the above, patching (especially security patching) takes too long
    • As a result of it taking too long, it’s very easy to see how one would choose to ignore security patching
  • I wanted to show the “new” WLS patching method on the blog, as I hadn’t seen it before. It’s surprisingly simple, yet it felt like Oracle took the ball to the opponent’s 4-yard line and fumbled. Why not just automatically pull patches to the cache directory based on a checkin like RedHat (RHN)? Apply them and report back in a web GUI somewhere?

October 16, 2011

Enterprise Manager 12c

Filed under: Enterprise Manager — kkempf @ 7:31 am

Just a Teaser..

For those of you (like me?) who didn’t get to Open World this year, you may not be aware that Enterprise Manager 12c has been released.  It promises to support cloud-based deployments of Oracle software in addition to all things which were previously managed by EM.  I admit I’m curious about it, but because I’m going to some training next week and will be out of town, I wasn’t in a hurry to upgrade a known entity (EM11g) with an unknown.  I did, however, begin the upgrade process by applying patch 1044087 to my EM 11g.  This makes the upgrade link show up under the deployments tab:

New content

When you click through, you get a whole screen full of upgrade steps to follow

More than an afternoon's work...

I’m running EM 11g; I’m not sure why it suggests I’m upgrading from 10.2.0.5 (underlined in red in the screenshot above), but it’s likely just a typo.

Sorry for the teaser, that’s as far as my “upgrade” is going for a few weeks as I need EM to be reliable and stable…

September 30, 2011

Geek Arcana

Filed under: 11g, 11i, Enterprise Manager, iPhone, Oracle — kkempf @ 1:54 pm

My former colleague in Chicago Richard complains that I haven’t updated this blog in a while.  To help him steal ideas from me and tell his boss he thought them up, I figured I’d post a few miscellaneous new things.

Broken Record

I’ve been sounding like a broken record, in that whenever I went into My Oracle Support (you know, the slow version of what used to be Metalink?) I’d have to log out and log back into their non-flash site to upload files because it didn’t work with Chrome. Well I don’t know whether it was an update to Chrome or MOS, but you can finally upload all the inane logs and output files your analyst asks you for while in Chrome!

Chrome is not a crime!

While we’re on that subject

Some evidence all my whining about Oracle’s lack of support for Chrome may have merit!  I noticed this on Slashdot today, and, in a nutshell, it says Chrome will overtake Firefox as the #2 browser within the next few months, and IE is taking huge losses to Chrome as well.  I despise IE, avoid it like the plague (virus?) that it is, and will not shed a tear when it finally falls from the top spot.  It all goes back to the way Micro$oft forced IE down our throats, and I will not browse with a known monopolist.

EM 11g from an iPhone

I found another reason my iPhone runs circles around my old Blackberry. With its built-in VPN capabilities, I can actually get to my EM 11g web server and, well, work! The only exception is, of course, the well publicized lack of flash support for iOS, noted in the screenshot of the performance tab in the database. If anyone from Oracle is watching… any chance we can get off flash (HTML5?) in some future version so there’s full mobile functionality? If not, how about an EM Grid Control App?

These are unedited screenshots from the iPhone, except where I had to hide IP addresses and the like.

Login Page

EM Starting Screen

Database Targets Screen

A specific database

The Performance Tab

The one thing which won't work - Flash for the performance graphs!

Scheduled Jobs

July 7, 2011

More fun and games with EM 11g (Collection Errors)

Filed under: EM to monitor 11i, Enterprise Manager — kkempf @ 2:38 pm

Rarely Resilient

The other day I went into my custom metrics and to my surprise, saw most of the metrics were broken with Collection Error status as shown here:

Damage Control!

Obviously, the problem here is that aside from catastrophic database or general host problems, I’ve lost alerts on all the things Oracle didn’t bother to put in EM11g which are relevant to 11i. Don’t get me started on the apps management pack. You can try it for yourself.

Technical Fix

Finding nothing on the world’s slowest support site, I restarted agents, restarted the entire EM tier, but nothing helped.  Finally I went into each metric, typed in the apps password, and hit test:

 

Amazing!  It is working, except it’s not working.  Not sure why it just stopped trying. Regardless, hit OK and you can do the next one, and the next one, and the next one.

End State

About 15 minutes after I did this to all custom metrics, they were all working except for two, which I redid as described above, and eventually they snapped into shape also.

As normal as it gets

March 29, 2011

Stuck in status pending

Filed under: 11g, Enterprise Manager — kkempf @ 4:00 pm

How annoying is this

After a brief network outage, 3 of my EM database targets were showing as status pending.  The databases were fine.  The agents were fine.

The quickfix

Go to the host where the agent for the stuck database resides

  • emctl stop agent
  • cd $AGENT_HOME/sysman/emd
  • rm -rf state/*
  • rm -rf upload/*
  • rm -rf collection/*
  • rm lastupld.xml
  • rm agntstmp.txt
  • rm blackouts.xml
  • emctl clearstate agent
  • emctl start agent
  • emctl upload agent
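The steps above, rolled into one function as a sketch. reset_agent_state and the EMCTL variable are invented here; EMCTL just lets you point at a specific emctl binary. One deliberate difference from the list: the three single files are removed with rm -f so a missing file doesn't stop the run.

```shell
#!/bin/sh
# Sketch of the agent state reset above. reset_agent_state and EMCTL are
# hypothetical names; EMCTL defaults to whatever emctl is on the PATH.
EMCTL="${EMCTL:-emctl}"

reset_agent_state() {
    emd_dir="$1"    # normally $AGENT_HOME/sysman/emd
    [ -d "$emd_dir" ] || { echo "no such directory: $emd_dir" >&2; return 1; }

    "$EMCTL" stop agent
    # ${emd_dir:?} aborts rather than expanding to "/state/*" if the variable is empty.
    rm -rf "${emd_dir:?}"/state/* "${emd_dir:?}"/upload/* "${emd_dir:?}"/collection/*
    rm -f "$emd_dir/lastupld.xml" "$emd_dir/agntstmp.txt" "$emd_dir/blackouts.xml"
    "$EMCTL" clearstate agent
    "$EMCTL" start agent
    "$EMCTL" upload agent
}

# Usage: reset_agent_state "$AGENT_HOME/sysman/emd"
```

Setting EMCTL=echo first gives a harmless rehearsal of the emctl calls, though the rm lines still run, so point it at a copy of the directory when testing.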

From EM go to Setup->Agents

  • click on the now jacked up agent you just messed with
  • click on “Start Resynchronization”
  • after that, you may need to unblock the agent, as well, from the Setup->Agents screen

End result

After a minute or two, the agent unscrewed itself, and realized that the database was up as well.

January 13, 2011

No more OUI/GUI Agent install for 11.1 on Windows

Filed under: 11g, Enterprise Manager, Windows 2008 — kkempf @ 9:57 am

Oil and Water

Looks like a great Enterprise Solution!

Windows Server and Oracle databases go together like oil and water.  Everything about administering an Oracle database on Windows is annoying.  From the command line interface to starting services before I can start a database it fails in many ways.  Yet all of this is just my opinion.  What I found out yesterday is a new, substantial fact and a good reason to hate Windows even more.

Server 2008

We have a non-Oracle application which has an Oracle back end database.  It happens to be certified (only) on Windows.  It’s the only Windows RDBMS server I have to administer, so I suppose I should be grateful.  Still, as a result of an upgrade, we were able to move it to 64-bit Oracle 11.1 for Windows Server 2008.  All in all, it was a nice refresh/update of the technology stack as it had previously been running 32-bit Oracle 10.2 on Windows Server 2003.

Agent Woes

So I go to pull the 11.1 agent from OTN/Metalink/MOS/Oracle/World’s Slowest Support Site and am humored to find it’s now 500+ MB! Seriously? It’s an agent! Back in the dark ages, under 9i, I swear they were like 40 MB. I put the thing in my $OH/sysman/agent_download/11.1.0.1.0 directory, and sure enough, it shows up as an option under deployments in EM. I go through the process outlined here to push the install, and it fails because SSH isn’t running on the Windows host. Who runs SSH on Windows? I know it’s technically possible, but seriously, who expects that? Needless to say, I’m annoyed, but I’m not about to go try to get SSH running on a Windows server I don’t want to administer. So I push the agent download .zip file to the host and run the installer (tried both setup.exe and installer.exe) only to get this error:

Obviously

Time to contact support

I actually did try to create a response file and run it from the Win 2008 CLI.  It failed for an unknown reason, telling me to check the logs.  Of course, the logs weren’t in the directory I was in, and I was beyond annoyed at this point. Reluctantly, I opened an SR to see what I was doing wrong with the GUI install.  It turns out, nothing.  The analyst confirmed that in 11.1, the OUI/GUI installer has been removed.

One step forward, two steps back

Step back and ask yourself, is this a step forward?  Honestly, how many people run SSH on a Windows server?  My only other recourse is to mock up some cryptic response file (in Windows, no less, with notepad!) and then use a command line interface to manually install the agent (silently!).  Seriously, Oracle, this is just plain stupid.  There’s like 4 parameters required in the old GUI: where do you want to install it, what host and port is Grid Control installed on, and what’s the dbsnmp password?  Why not just leave this in the GUI?  Whoever made this call has obviously never worked in the real world.

My Solution

After berating the analyst, I installed the 10.2.0.5 agent (via the OUI GUI) to monitor my 11.1 RDBMS.  Makes more sense than Oracle’s stance.
