Kevin Kempf's Blog

April 16, 2014

Password Reset Error

Filed under: 11i, Cloning, R12 — kkempf @ 12:31 pm

Just minding my business

I received an email saying the following error occurs in an 11i cloned instance.  Mind you, this is because we added users to the environment, and by rule Oracle requires them to change their password the first time they log in, so the password is different from what the system administrator assigned.

Internet Explorer error message:

[Screenshot: JSP error page]

 

I’ll spell out a few keywords so the search engines can index this one: AppsChangePassword.jsp java.lang.NoClassDefFoundError JSP Error

If you’re running Chrome, here’s your error message

 

 

 

 

That Was Supposed to be a Joke

From Chrome, I just got a white page, no error, no nothing.  I had to debug this from IE, which just pains me.  Oh yeah, I’m not supposed to use Chrome for E-Business Suite, even though it works fine and I’ve been using it from a Linux desktop for 8 years now.  Incidentally, there’s a plug-in for Chrome to make R12 work.  It turns out it does work fine, though I have some privacy concerns… you can find it here.

The Fix

First I tried to bounce apache, but that did nothing, so I opened an SR.  I think I found this fix on MOS before the analyst gave it to me, but it didn’t seem like a great fit based on the description in the doc so I didn’t do it until he told me to.  Hat tip to MOS and the ATG team, they identified a fix for an issue quickly and without asking me 10 irrelevant questions first.  Anyways, run this script

$JTF_TOP/admin/scripts/ojspCompile.pl --compile -s 'AppsChangePassword.jsp' --flush

March 11, 2014

A Fun Diversion?

Filed under: Uncategorized — kkempf @ 11:01 am

What do others see on your resume?

I ran across a suggestion today to see what a recruiter sees on your resume. It’s kind of a fun exercise; just copy your resume to the clipboard and paste it in. It looks a LOT prettier on their website…

www.tagcrowd.com

Here’s where mine ended up:

February 26, 2014

Controlling Data Guard Replication Lag

Filed under: Uncategorized — kkempf @ 12:35 pm


Tuning Standby Lag with Oracle Active Data Guard

We bought active data guard because of increasing reporting demands on our primary database in our ERP environment. While active data guard doesn’t play nicely with Apps 11i out of the box (it’s read-only, and just to establish a forms session you need to be able to write), it can fill a nice role offloading CPU load by serving up near-real time reports for specific (in our case bolt on) applications which only need to read tables.

What is Near-Real Time?

Once an SLA is established for the “oldest” data the report can return, you can tweak Oracle to honor it. In my case, the example below shows me going from no lag target to 300 seconds and finally to 600 seconds. Note that once you settle on a number, you’d be wise to add a “scope=both” to the end of the alter system.

$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Wed Feb 26 10:26:49 2014

Copyright (c) 1982, 2013, Oracle. All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> show parameter archive_lag_target;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
archive_lag_target                   integer     0
SQL> alter system set archive_lag_target = 300;

System altered.

SQL> alter system set archive_lag_target = 600;

System altered.
 

Lag Behavior

I found it interesting to note that, left to “its own devices”, the pattern of archivelog shipping (and therefore of apply on the other end) makes your lag an inverse function of how active your database is. In other words, the recency of your data at your standby is related to how fast you fill your online redo logs (which is also a function of how big they are), plus the odd twist of system-driven logfile switches. Let’s say you had 200MB redo logs with a nearly idle system. Your lag can get huge if not tuned!
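To put a rough number on that, here is a back-of-the-envelope sketch in shell. The 10 KB/s redo rate is a made-up figure for a nearly idle system, the function name is hypothetical, and the math ignores system-driven log switches, so treat the result as an order of magnitude only.

```shell
# Rough estimate of how long the standby waits for the next archived log.
# Arguments: redo log size (MB), average redo rate (KB/s, an assumed figure),
# and optionally archive_lag_target in seconds (0 or omitted = not set).
expected_switch_interval() {
  size_mb=$1
  rate_kbs=$2
  target=${3:-0}
  fill=$(( size_mb * 1024 / rate_kbs ))   # seconds needed to fill one log
  if [ "$target" -gt 0 ] && [ "$target" -lt "$fill" ]; then
    echo "$target"    # the lag target forces a switch before the log fills
  else
    echo "$fill"      # the log fills first, or no target is set
  fi
}

expected_switch_interval 200 10        # prints 20480 (about 5.7 hours)
expected_switch_interval 200 10 300    # prints 300
```

With no lag target, a 200MB log on that hypothetical idle system would take hours to ship; with a target of 300 the wait is capped at roughly five minutes.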

The graphic below captures the result of SQL:
select to_number(substr(value,instr(value,':',1,2)+1,length(value))) + 60 * to_number(substr(value,instr(value,':',1,1)+1,2)) seconds from v$dataguard_stats@apps_to_dataguard where name = 'apply lag';

[Graph: apply lag in seconds over time]

The left part of the graphic (before 10:21) shows data transport left untuned. From 10:21 to 11:21 you can see where I had it set to 300 seconds.
From 11:21 onward it’s set to 600 seconds.
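One caveat on the query: it only adds up the minutes and seconds fields of the lag value, which is fine as long as the lag stays under an hour. Below is a sketch of the same conversion in shell that also counts hours and days. It assumes the value looks like +00 00:05:12 (sign and days, then HH:MI:SS), and the function name is hypothetical.

```shell
# Convert a lag string such as "+00 00:05:12" into whole seconds.
# Assumed layout: "+DD HH:MI:SS" (days, then hours:minutes:seconds).
lag_to_seconds() {
  printf '%s\n' "$1" | awk '{
    split($2, t, ":")                                  # t[1]=HH t[2]=MI t[3]=SS
    print ($1 * 86400) + (t[1] * 3600) + (t[2] * 60) + t[3]
  }'
}

lag_to_seconds '+00 00:05:12'    # prints 312
```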

Near-Real Time on Steroids: Real Time Apply

Check your licensing, your mileage may vary. The easiest way to keep the standby up to date is to use real-time apply and standby logs. To create standby logs, you go to your standby and cancel recovery:

alter database recover managed standby database cancel;

Next, add your standby logs. They need to be the same size as the online redo logs on the primary. Make N+1 of them on the standby, where N is the number of online redo log groups on the primary. The syntax looks like this:

alter database add standby logfile group 41 ('/usr/local/oracle/redo/log41b.dbf','/u04/appprod/proddata/log41a.dbf') size 100M;
alter database add standby logfile group 42 ('/usr/local/oracle/redo/log42b.dbf','/u04/appprod/proddata/log42a.dbf') size 100M;
alter database add standby logfile group 43 ('/usr/local/oracle/redo/log43b.dbf','/u04/appprod/proddata/log43a.dbf') size 100M;
alter database add standby logfile group 44 ('/usr/local/oracle/redo/log44b.dbf','/u04/appprod/proddata/log44a.dbf') size 100M;

Finally, restart your recovery with the real-time apply:

alter database recover managed standby database disconnect using current logfile;
Now your apply lag and transport lag drop to zero:

[Graph: apply lag on the standby over the full day]

In the graph above, you can see the final version of data apply rates to the standby.

8:58a-10:21a: unmanaged apply rate (mostly happening when a log on the primary got full)

10:21a-11:21a: honoring the 300 second alter system set archive_lag_target=300;

11:21a-4:30p: honoring the 600 second alter system set archive_lag_target=600;

4:30p-end: real time apply started with standby logfile groups (effectively 0).

Thanks Roth!


February 24, 2014

Oracle RDBMS tier is Virtualized

Filed under: Uncategorized — kkempf @ 2:24 pm


Background

So we’ve been running non-Production versions of our Oracle 11i E-Business suite environment on VMWare since about 2006, and my Linux x86 PAE 11i front end on VMWare since at least 2008.  I never opened a ticket or had any issue with VMWare affecting a guest OS in any way, whether the kernel was Red Hat, Oracle Linux “Red Hat compatible” or even now the UEK.  Oracle protects themselves with this (Doc ID 249212.1):

 Oracle has not certified any of its products on VMware virtualized environments. Oracle Support will assist customers running Oracle products on VMware in the following manner: Oracle will only provide support for issues that either are known to occur on the native OS, or can be demonstrated not to be as a result of running on VMware.

The truth is, they try to scare you away from it, but it runs just fine, in my experience.  The big unknown was always the core database, running on 64-bit Linux.  It really does work hard, and has lots of moving parts.  As we began the project to convert it, we realized we had a lot of questions without firm answers, so we engaged a prominent 3rd party Oracle to VMWare integrator.  They did a great job managing the project, but to be honest, unbeknownst to us we’d already figured out most of the technical detail.

Project Flow

There was an extra wrinkle for our production go-live.  The server was physically moving from one data center to another one about 50 miles up the road.  We had good WAN links, but it still added some time.  While we considered leveraging dataguard to accomplish this, in the end we landed on an RMAN restore and Data Domain replication.

  • Create a VM with lots of CPUs and the same memory footprint as production.  Put in all the patches, updates, kernel parameters, and set up huge pages for the 64gb SGA.
  • Add disk from the SAN leveraging LVM and ext3.   We could have used ext4, but the Linux OS was Oracle Linux 5.10 (UEK) and ext3 felt more “tried and true”
  • Create a VMWare template at this point (this was actually done several times, after many, many OS tweaks)
  • Install the Oracle RDBMS software (11.2.0.4 in my case) as well as deploying the EM agent
  • Use RMAN to bring the database to the new VM from the physical box.  In the case of test runs, this was an RMAN duplicate.  For the final, live production run it was a little trickier:
    • RMAN full backup of the RDBMS the night prior, so that full can get across the WAN and my incremental difference will be smaller
    • Shut down the 11i front end so users can’t get in
    • RMAN (hot) backup the RDBMS, shut it down
    • Wait for the Data Domain to replicate the RMAN backup and archivelogs to where the new server was
    • RMAN restore and recover the database
    • IP changes
      • Perform internal DNS changes to the new vlan
      • Bring up the 11i front end at the new location with a VEEAM restore
      • Re-IP the two machines, via /etc/hosts and /etc/sysconfig/network, as well as FND_NODES changes (see 751328.1)

Results & Advantages

Since the move to VMWare, we’ve had no issues whatsoever.  The database runs mostly in memory, and the users are none the wiser.  VMWare does bring some advantages to the table:

  • Unlike a physical machine, where I have to be in the data center at a console, or possibly over a DRAC card or the like, when I reboot my machine now, I can watch it in VCenter.
  • I don’t have to worry about drivers.  I had a serious issue when our oddball 10gb NICs decided to stop working after a yum update on the physical box during a reboot.
  • If the physical server (which in the case of the RDBMS is effectively the same thing as the ESX host, as the host is entirely devoted to running Oracle) running the RDBMS breaks, overheats, has a bad memory chip, burns up a CPU, or gets struck by lightning, we can shut it down cold and storage v-motion it to another host in minutes.
  • It’s a mainstream, mature product.  I did a lot of homework on this, and the general consensus was that Oracle VM wasn’t ready yet, but there are LOTS of people running Oracle on VMWare.

December 31, 2013

2013 in review

Filed under: Uncategorized — kkempf @ 7:55 am

The WordPress.com stats helper monkeys prepared a 2013 annual report for this blog.   I guess I need to blog more.  Haven’t had as much time this year, nor relevant content.  Next year I have a few heavy hitters on the docket: leveraging active data guard, tearing down Discoverer 11g and replacing it with Oracle Apex (nearly done, by the way), virtualizing the main ERP Production database on VMWare, and (finally) an upgrade to R12.2.  While continuing to work on my MBA.  Should be fun, see you there!

Here’s an excerpt:

The concert hall at the Sydney Opera House holds 2,700 people. This blog was viewed about 49,000 times in 2013. If it were a concert at Sydney Opera House, it would take about 18 sold-out performances for that many people to see it.

Click here to see the complete report.

December 5, 2013

Cleaning up after Autoconfig

Filed under: 11i, Linux — kkempf @ 12:01 pm


This post could also be called “writing a tiny shell script to notify me of a difference in a config file”.  Short and sweet 2nd post today, I love shell scripting!

Whenever I run autoconfig, it reverts back a value for my Mobile Supply Chain dispatcher port to a value stored in the database somewhere which is wrong.  If I dig into the log, it’s complaining that the port I want it to change to is in use.  Duh, it’s running.  Except it complains when it’s not running also.  The dispatcher is downright critical to getting into Mobile Supply Chain, and Oracle Support is being their usual helpful self.  Except now as of Dec 1, new for 11i: extra ignore!

Let’s assume you have a “good” version of a config file sitting next to one autoconfig wants to keep changing.  This script will email you the differences if they exist.  It assumes you have sendmail setup on your linux host.

cat checkmwacfg.sh

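# Note: cron does not load the applications environment, so make sure MWA_TOP is set before this point
# "diff --brief" prints "Files A and B differ" only when the two files differ
MWACHECK=`diff --brief $MWA_TOP/secure/mwa.cfg $MWA_TOP/secure/mwa.cfg.good|grep -woF differ`
if [ -n "$MWACHECK" ]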
then
  diff $MWA_TOP/secure/mwa.cfg $MWA_TOP/secure/mwa.cfg.good | mail -s "$MWA_TOP/secure/mwa.cfg file is different" kempf@myemailaddress.com
fi

Crontab it up and you’re done:

# email me bad MWA config file in $MWA_TOP/secure once a day
0 16 * * * /(path to script)/checkmwacfg.sh

Good old ADI

Filed under: 11i — kkempf @ 9:42 am

Applications Desktop Integrator

I can’t really defend the fact that we’re still using ADI, running on 11i.  Some lines of business are resistant to change.  I recently ran into an issue after upgrading to RDBMS 11.2.0.4 and pushing the techstack to Autoconfig U.  It was a bugger, because accounting couldn’t publish their Financial Statements (FSGs), and to be honest, while the truth resided on Oracle’s Support Site (or MOS), it wasn’t easy to find.

What is ADI?

For those who are blissfully unaware of this tool, it’s essentially a desktop Excel “plug-in” which connects to the database and the 8.0.6 techstack via the RDBMS listener or 8.0.6 listener on the applications tier (depending on what they’re doing).  In a word, it allows accountants to do their business in Excel, where they prefer to work, and then push data to the EBS.  It’s technically only certified for Windows XP, and Oracle has said it’s dead, to be replaced by Web ADI for uploading journal entries and Oracle Report Manager for publishing.  It seems accountants tend to like things just like they were in 1970, so that’s where we’re at.

New Security

As part of moving to 11.2.0.4, one of the applications tier patches (I heavily suspect Autoconfig U, but there were others) implemented a “security feature” which in effect just tightened down the 8.0.6 listener.  Generically, you can find the problem description in MOS note 291897.1.  What really happens is that after autoconfig, a protocol.ora file lands in your 8.0.6 $TNS_ADMIN directory.  This protocol.ora file has 2 lines (or at least mine did) generated by autoconfig:

tcp.validnode_checking = yes
tcp.invited_nodes = (<database node IP>, <apps tier IP>, <apps tier IP>)

Behavior

As a result of this change, and a regression test “miss”, accounting could log into ADI and post journals, but when they went to publish reports they received this awesome Oracle message:

[Screenshot: ADI error dialog]

I’m typing it out so google can crawl it: An error occurred while attempting to establish an Applications File Server connection.  There may be a network configuration problem, or the TNS listener may not be running.

The Quick Fix

Comment out (with a #) the 2 lines in protocol.ora, and restart the 8.0.6 listener with adalnctl.sh stop/start.

The Better Fix

Log into 11i as system administrator and change profile option value SQLNet Access to value: ALLOW_ALL at the site level.  In theory this lets you survive an autoconfig run.

The Most Correct Fix (Grudgingly Admitted)

Figure out who needs access, and add them via OAM.  See MOS 281758.1 for details, look under Managed SQL*Net Access from Hosts

Soapbox

Why does Oracle wait 10 years and then change something like this in the terminal release?  Because they can.  While I admit it would be ideal to name every PC from which an accountant could log in to publish a report, the truth is I don’t know when they might get a new machine, or who might be doing it from home over VPN, etc.  It’s a good idea, but somewhat impossible to administer, and I’d venture to guess irrelevant for most installations.  Why not add the feature, but by default DISABLE it?  Just saying.

October 11, 2013

Agents with security issues

Filed under: Enterprise Manager — kkempf @ 7:52 am


So we’re undergoing a massive push to virtualization at a co-location facility (rented datacenter, if you will), and my first smallish production Oracle database got migrated. As a result, the application server (Windoze) and the 11g Oracle database (OL) got new IP addresses. Always scary, but in the end I knew Oracle would be less problematic than the application server because Oracle doesn’t really have strong ties to IP addresses.

The move went great except for one thing: when I brought up my database on the new subnet, the agent was acting all crazy. It reported everything was up (meaning the listener, host and database) but I couldn’t drill into anything via grid control. It gave me a succinct, useless error:

[Screenshot: Grid Control connect error]

I’ll type it here so there’s some chance Google can index it: Database Error The Network Adapter could not establish the connection

Well this didn’t mean anything, so I figured I’d leverage my support and open a ticket with Oracle. The analyst didn’t exactly nail the issue, but as they were asking for logs I took a peek and noticed this in the $AGENT_HOME/sysman/log/emctl.log file:

28319 :: Fri Oct 11 06:15:12 2013::AgentStatus.pm:Processing status agent
28319 :: Fri Oct 11 06:15:12 2013::AgentStatus.pm:emdctl status returned 3
30263 :: Fri Oct 11 07:15:35 2013::AgentLifeCycle.pm: Processing status agent
30263 :: Fri Oct 11 07:15:35 2013::AgentStatus.pm:Processing status agent
30263 :: Fri Oct 11 07:15:35 2013::AgentStatus.pm:emdctl status returned 3

The fix? emctl secure agent

After I secured it, I logged into EM 11g (yeah, I’m still not on EM 12c, it’s complicated) and got right into the tabs and pieces I wanted to see. My best guess is that the agent became unsecured as a result of the IP address change, and I merely had to re-secure it to make everything work again on the new IP.

July 15, 2013

Before flashback DB was cool

Filed under: 11g, Oracle, RMAN — kkempf @ 3:50 pm

There was RMAN to flash your database back in time:

run
{
set until time "to_date('07/04/13 04:00:00','mm/dd/yy hh24:mi:ss')";
allocate auxiliary channel d1 type disk;
allocate auxiliary channel d2 type disk;
restore database;
recover database;
alter database open resetlogs;
}

July 9, 2013

Oracle Mobile Supply Chain Applications (MSCA) : One month after go-live

Filed under: 11i — kkempf @ 10:25 am


MSCA vs. Highjump

We replaced a rather over-customized and pretty unstable 3rd party software called (formerly 3M) Highjump with Oracle’s own Mobile supply chain.  In the end, I think the only gain was that now we’re not running the implementation of mobile on a Windows server.  MSCA sits on the 11i application server, and runs* there.  Out of the box, MSCA is rather useless, most transactions require so much irrelevant or derivable (is that a word?) input that by the time you get through the mobile form you feel like you ran a half-marathon.  That said, it does work, it’s faster and more native than a 3rd party bolt-on, and we can lean on Oracle support when we have problems.

* depending upon your definition of run

Failures in Unexpected Places (aka AUTHENTICATION FAILURE)

So I went out to Wikipedia, to confirm that telnet is (slightly) older than I am.  What amazes me is that for a protocol that old, Oracle has managed to completely botch the implementation, and yet still charge us for the pleasure of having to deal with it.  Here’s where MSCA is really frustrating.  They can’t keep 3 telnet threads and a round-robin dispatcher (or listener) running right.  Yes, we’re on 11i and thus we’re behind, but it runs off of Java 6 and as I mentioned telnet has been around awhile.  I opened an SR with Oracle on this issue.  The analyst confirmed I was on the latest version of code, then proceeded to tell me most customers bounce the MWA dispatcher and the telnet threads every shift.  Really?  That’s the support answer?  The tech even agreed that it was a bad answer, but the best they could do:

Hi Kevin

I agree with your comment about 198543.1 . When I saw this first time I had same impression but over the years now I have seen almost every customer do start once a day so it is not strange 
any more for me . Base on information you gave you already restarting which is good only other suggestion is increase ports (MWA servers ) if you look at 567214.1 suggestion would be 4 servers 
for 60 users

Add this to the flakiness of the Java MSCA GUI client, and you have a product that’s well implemented and cemented into Oracle behind the scenes, but is an unmitigated disaster to the users.

For whatever reason, every X hours, one of the telnet ports decides to stop working.  When you happen to get that port via round robin (or go to it directly) you get AUTHENTICATION FAILURE messages on the client.  When they get a hold of me to fix it, I first try to issue a graceful stop of the port via $MWA_TOP/bin/mwactl.sh -login apps/password -stop PORT#.  This always returns AUTHENTICATION FAILURE.  So graceful just went out the window.  Now I have to ps -ef |grep PORT# to find the PID on the command line and kill it, then use $MWA_TOP/bin/mwactl.sh start PORT# to restart it.  Really, really lame.

This has happened often enough that I’m going to have to script around it in a rather elaborate and spectacular fashion.  It’s better to script by day, than answer the phone at night, I always say.   In essence, the script will have to do the following on some regular basis:

  • try to stop each port gracefully and see if it returns AUTHENTICATION FAILURE
  • if it does get AUTHENTICATION FAILURE, grep out the PID for that particular telnet server and kill -9 it
  • restart that telnet server I just killed via mwactl.sh
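The three steps above might be sketched roughly like this. It is untested against a real MWA install: the function names and the APPS_LOGIN variable are hypothetical, the mwactl.sh invocations are simply the ones quoted earlier in this post, and the PID lookup assumes Linux-style ps -ef output where the command starts in column 8 and the port number shows up somewhere on the command line.

```shell
# Sketch only; helper names and APPS_LOGIN (e.g. apps/password) are hypothetical.
MWACTL="$MWA_TOP/bin/mwactl.sh"

# True when the output of a stop attempt shows the telnet server is wedged.
is_wedged() {
  case "$1" in
    *"AUTHENTICATION FAILURE"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Reads "ps -ef" output on stdin and prints the PID of every process whose
# command line (column 8 onward) mentions the port as a standalone number.
# The port is passed through the environment so this awk never matches itself.
pids_for_port() {
  PORT="$1" SELF="$$" awk '
    $2 == ENVIRON["SELF"] { next }    # never report the calling script
    { for (i = 8; i <= NF; i++)
        if ($i ~ ("(^|[^0-9])" ENVIRON["PORT"] "([^0-9]|$)")) { print $2; break } }'
}

# Graceful stop first; kill -9 only if that fails; restart either way,
# because a successful graceful stop also leaves the port down.
check_port() {
  port=$1
  out=$("$MWACTL" -login "$APPS_LOGIN" -stop "$port" 2>&1)
  if is_wedged "$out"; then
    ps -ef | pids_for_port "$port" | while read -r pid; do
      kill -9 "$pid"
    done
  fi
  "$MWACTL" start "$port"
}
```

A cron job could then call check_port once per telnet port. Note that the check itself bounces every port it touches, so it belongs in a quiet window rather than the middle of a shift.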

The Dispatcher

The one thing this doesn’t do is account for the dispatcher.  When that goes South, it’s game over.  Basically you can find the dispatcher PID by ps -ef|grep’ing for MWADIS.  I have, however, had cases where the MWADIS process is not running but when I use mwactl.sh to start the dispatcher (only) it tells me it’s already running.  netstat -anp doesn’t see it listening on the assigned port.  It’s really a mess I hope doesn’t pop up in production.
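A small sketch of those two checks side by side, so a script can at least report when the process and the socket disagree. The function names are hypothetical, and the parsing assumes Linux netstat -an output (local address in column 4, state in column 6).

```shell
# Reads "netstat -an" style lines on stdin; succeeds if any socket is in
# LISTEN state on the given local port.
port_is_listening() {
  awk -v port="$1" '
    $6 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
    END { exit (found ? 0 : 1) }'
}

# Usage: dispatcher_state PORT
# Prints whether an MWADIS process exists and whether the port is listening.
dispatcher_state() {
  # The [M] trick keeps grep from matching its own command line.
  if ps -ef | grep '[M]WADIS' >/dev/null; then proc=up; else proc=down; fi
  if netstat -an 2>/dev/null | port_is_listening "$1"; then sock=up; else sock=down; fi
  echo "process=$proc socket=$sock"
}
```

Anything other than process=up socket=up is worth an email; what to do about it is still a manual call.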

Impressions

While I’m aware that 11i is showing its age, Oracle has still failed to provide a functional incentive for my organization to migrate to R12.  There are financial reasons (“We’re gonna charge you more!”), there are techstack reasons (I’d really like to be running on a web server released in the past decade, and I’d like to be running the front end on 64-bit linux), but functionally, we’re not getting anything.  After implementing MSCA, and the 6 months of headache preceding it, I’m in NO hurry to get on the R12 bandwagon.  Why do I mention this?  Because, of course, the answer to my complaints about the stability of the telnet server will be “get on the latest release”.  In my opinion, Oracle has had 40 years to figure out how to make telnet stable, and you’re trying to tell me they just got it in R12?

 
