Kevin Kempf's Blog

August 24, 2009

11i Stub Library: libc-2.1.3-stub.so (aka Concurrent Managers Won’t Start with GSM=Y) Finally Fixed!

Filed under: 11i, Cloning, Concurrent Managers, Linux, Oracle, Utilities — kkempf @ 10:06 am

I’ve been working this week on setting up my regression TEST environment for an upcoming cycle; among the patches I’m testing are ATG_PF_H RUP7, AD.I.7 and the latest autoconfig templates.  It never fails when I apply core framework technology: my 8.0.6 stub libraries get hosed.  By “getting hosed” I’m referring to various Signal 11 errors (or variants of them) which cause the concurrent managers to fail to start until you either set GSM=N or fix the library as I’ll detail below.  But I got sick of applying 3830807 to fix it, and decided to dig deeper into the 8.0.6 homes…. here’s what I found….

As review, Developer 6i (Forms, Reports, CMs) in 11i is generally what people are talking about when they say $ORACLE_HOME or 8.0.6 Home on the application server.   The iAS_ORACLE_HOME is Application Server 9iR1 (technically 1.0.2.2.2 in my case) and runs Apache.

3830807 delivers two binaries and a shell script.  Basically, the script fixes make files if necessary, puts a few symbolic links in place, and lands the binaries in the 8.0.6 home under lib/stubs.

  • libc-2.1.3-stub.so ($8.0.6 Home/lib/stubs) needs to be 261328 bytes in size, and libc.so 71 bytes.  In my case, libc-2.1.3-stub.so was 131951 bytes and needed to be fixed.
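
If you just want a quick way to see whether an environment is affected, the byte counts tell the story; a minimal check, assuming the apps tier environment file is sourced so $ORACLE_HOME points at the 8.0.6 home:

# good: libc-2.1.3-stub.so = 261328 bytes, libc.so = 71 bytes
# bad:  libc-2.1.3-stub.so = 131951 bytes
ls -l $ORACLE_HOME/lib/stubs/libc-2.1.3-stub.so $ORACLE_HOME/lib/stubs/libc.so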

First I had to understand what I was dealing with, and what caused it.  A quick scan of my environments showed that all non-production environments had the wrong version, and they were all date-stamped the date of the last clone.  It turns out that adcfgclone (adclone, autoclone, whichever you prefer) is the culprit.  But I’m on the latest everything as far as I know: ATG_PF_H RUP6, AD.I.7, TXK Autoconfig Templates T….

With help, I found that the setup_stubs.sh script in the iAS Home (the Apache home, generally $ORACLE_HOME/../iAS on the apps tier) was actually corrupting the libc-2.1.3-stub.so file in $8.0.6 Home/lib/stubs.  I could not figure out where the 131951-byte version of libc-2.1.3-stub.so was even coming from, but there it was, sitting in the iAS Home’s lib/stubs, dated June of 2001.  Wow!  That’s old!

Another piece I was able to confirm: unless adadmin executable relinks are forced, or a patch specifically calls for a relink, this “bad” libc-2.1.3-stub.so can sit there silently, waiting to affect you later.  When I looked at all my cloned environments, they all had the “bad” version of the file, but because I hadn’t happened to manually force an adadmin executable relink (and why would I, under normal circumstances?) my concurrent managers and all other FND binaries still worked.
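
If you’d rather find out on your own terms than let the next framework patch find it for you, forcing a relink of a single FND executable will surface the problem.  A sketch, assuming the APPS environment is sourced, using FNDLIBR (the concurrent manager binary) as the test case:

# force a relink of FNDLIBR; with the "bad" stub library in play, the relink
# (or the subsequent concurrent manager startup) fails just as it does after a framework patch
$AD_TOP/bin/adrelink.sh force=y "fnd FNDLIBR"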

Now we’re getting somewhere… but there’s still one fundamental question I can’t answer: can I safely patch the iAS Home with a newer version of libc-2.1.3-stub.so?

At this point, I turned to Oracle Support; it’s been there since…  I’ll update this entry as I make progress.

Relevant Notes: 465629.1, 847775.1

Related Patch: 3830807

*edit* 9/2/09: still waiting on word from support…

*edit* 9/23/09: Support confirmed I can overwrite the $IAS_ORACLE_HOME/lib/stubs version (131951 bytes) of libc-2.1.3-stub.so with the newer version (261328 bytes) from $8.0.6 Home/lib/stubs:

UPDATE
=======
Development confirms that you can safely copy the newer file over the older file.

QUESTION
=========
Can the older (131951 byte) version of libc-2.1.3-stub.so in the iAS home be overwritten by the one delivered with patch:3830807 (261328 bytes)?

ANSWER
========
Yes, based on research this appears to be the intended situation. So please copy the newer file over the older file. This should then be a permanent fix for this problem.
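
For the record, the fix itself is about as simple as it gets; a sketch, assuming the standard apps-tier environment where $ORACLE_HOME is the 8.0.6 home and $IAS_ORACLE_HOME is the iAS (Apache) home:

# back up the old 131951-byte copy in the iAS home, then overwrite it with
# the 261328-byte version that patch 3830807 delivered to the 8.0.6 home
cd $IAS_ORACLE_HOME/lib/stubs
cp -p libc-2.1.3-stub.so libc-2.1.3-stub.so.bak
cp -p $ORACLE_HOME/lib/stubs/libc-2.1.3-stub.so .
ls -l libc-2.1.3-stub.so    # should now show 261328 bytes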

July 8, 2009

RDBMS 11g Cloning “Gotcha”

Filed under: 11g, Oracle — kkempf @ 3:08 pm

As you might be able to tell, I’ve been doing some cloning (and cloning streamlining) lately and hit another obscure bug which required some attention. Toward the end of my cloning process I used the FNDCPASS utility to change the apps password and the related non-apps passwords (GL, AR, AP, etc.) so they differ from production. I’m going about my normal business of running the script which does this for me, and next thing I know, I can’t log in as apps with either the old or the new password. Bugger.
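
For anyone who hasn’t scripted this before, the FNDCPASS calls underneath look roughly like this (a sketch with placeholder passwords; SYSTEM mode changes APPS/APPLSYS, ORACLE mode changes an individual product schema):

# change the APPS/APPLSYS password
FNDCPASS apps/<apps_pwd> 0 Y system/<system_pwd> SYSTEM APPLSYS <new_apps_pwd>
# change a product schema password (one call per schema: GL, AR, AP, ...)
FNDCPASS apps/<apps_pwd> 0 Y system/<system_pwd> ORACLE GL <new_gl_pwd>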

After making this mistake once, I re-cloned the environment (which is a VM) and took a snapshot before I started messing with the offending script. Two or three tries (and snapshot reverts) later, while following 159244.1, I’m out of ideas. I’m doing it verbatim from their doc. I even opened an SR with Oracle. I told the analyst I was running this in a VM. Thus it was of no value to me to try to troubleshoot the “broken” post-FNDCPASS condition, since I could just revert the snapshot. I think her head exploded when she read this. She didn’t know what a VM was, or if FNDCPASS could possibly even work in one. At this point, I (once again) gave up on support.

After trying to relink the binary without any change in behavior, I came to the sudden realization that I could not cite a case where I had successfully used this utility since going to RDBMS 11g. All my non-production environments were upgraded in place, and thus had never had this utility run against them. That’s when I found a note on Metalink which ultimately fixed the issue (751868.1). Curiously, it is worded to imply that this condition only came up when you had migrated from 11i to R12 and from 10.2 to 11.1 RDBMS. Upon reading it in detail, however, you can see that R12 is irrelevant, and it did fix my issue.

Reverted my snapshots, started RDBMS with sec_case_sensitive_logon=false in my init file, and voila, suddenly FNDCPASS correctly changes the password.
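
If you’d rather not edit the init file and bounce, the parameter is dynamic; a sketch, assuming you’re running on an spfile:

sqlplus -s / as sysdba <<EOF
alter system set sec_case_sensitive_logon = FALSE scope=both;
show parameter sec_case_sensitive_logon
EOF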

July 7, 2009

Java Color Scheme (POV) change after a clone

Filed under: Oracle, Utilities — kkempf @ 12:50 pm

After a clone, I like to change the Java Color Scheme (Profile Option) of the new environment. This is mostly a visual cue for myself and anyone using the environment. First and foremost, it helps you confirm you’re not in PROD. Since I consistently use the same colors for each environment each time, it also helps confirm you’re in the right flavor of DEV/TRAIN/TEST, etc.

I got tired of always having to log in to the new environment after each clone and update this in the GUI, so I thought this script might be helpful to others.

declare
  cursor c2 is
    select
        fu2.user_id   
       ,fpo.profile_option_name pon
       ,fpot.user_profile_option_name upon
       ,fu2.user_name lov
       ,fpov.profile_option_value pov
    from
        fnd_profile_options_tl fpot
       ,fnd_profile_options fpo
       ,fnd_profile_option_values fpov
       ,fnd_user fu
       ,fnd_user fu2
    where
        fpot.user_profile_option_name = 'Java Color Scheme' 
    and
        fpot.profile_option_name = fpo.profile_option_name
    and
        fpo.profile_option_id = fpov.profile_option_id
    and
        fpo.created_by = fu.user_id
    and
        fpov.level_id = 10001  /* site (10004=user, 10001=site, 10002=Appl, 10003=Resp) */
    and
        fpov.level_value = fu2.user_id
    and
        fpot.language = Userenv('Lang')
    ;
  status boolean;  -- return value from fnd_profile.save
begin
  dbms_output.disable;
  dbms_output.enable(100000)
  ;
  for c2_rec in c2 loop
    status := fnd_profile.save(c2_rec.pon,'&New_Color_Scheme_lower','SITE');  -- sqlplus prompts once for the new color
    if status then
      dbms_output.put_line('Java Color Scheme Updated');
    else
      dbms_output.put_line('Java Color Scheme FAILED');
    end if
    ;
    commit
    ;
  end loop
  ;
end
;
/
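
One usage note: the dbms_output messages only show up if serveroutput is on in your sqlplus session, so running it looks something like this (the script name is whatever you saved it as, and it will prompt for the new color):

sqlplus apps/<apps_pwd>
SQL> set serveroutput on size 100000
SQL> @java_color_scheme.sql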

Incidentally, valid values to enter are as follows:

blaf
blue
khaki
olive
purple
red
teal
titanium

As you can see, if you use your imagination, you can modify this script to update virtually any system profile option value.  If you have a user-level value which needs adjusting, it changes slightly in that an additional parameter needs to be passed to fnd_profile.save: the user_id (available in the c2_rec loop for your convenience).  For example, it might look like this:

status := fnd_profile.save(c2_rec.pon,null,'USER',c2_rec.user_id);

Well anyways, enjoy, let me know what you think.

June 18, 2009

11g Dataguard/Advanced Compression Bug

Filed under: 11g, Bugs, Oracle — kkempf @ 3:50 pm

Ah, it was inevitable. I spoke too kindly of RDBMS 11g. Now I’m stuck waiting on Oracle Development to fix a major problem. The gist of it is that after I began compressing tables with 11g Advanced Compression, the Dataguard instance would crash.

First, I saw odd ORA-07445 [__INTEL_NEW_MEMCPY()+44] SIGSEGV and ORA-00600 errors in my standby alert log. Eventually the instance crashed. After much research, I enabled the init parameter db_block_checking (highly recommended!) and the error became much clearer; with db_block_checking on, it was no longer writing garbage, it was failing the check:

Errors in file /u01/appprod/oracle/proddb/11.1.0/log/diag/rdbms/proddg/PROD/trace/PROD_pr0e_22666.trc:
ORA-10562: Error occurred while applying redo to data block (file# 19, block# 445469)
ORA-10564: tablespace APPS_TS_TX_DATA
ORA-01110: data file 19: '/u05/appprod/proddata/apps_ts_tx_data07.dbf'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 1186389
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [kddummy_blkchk], [19], [445469], [6110], [], [], [], [], [], [], [], []

Datafile 19, block 445469 tied back to the APPS_TS_TX_DATA tablespace and the object was BOM.CST_ITEM_COST_DETAILS. Yep, it was one of the handful of tables which I’d compressed so far.
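
For anyone who hasn’t had to chase a file#/block# back to an object before, it’s a straightforward dba_extents lookup on the primary; a sketch using the numbers from my ORA-600:

sqlplus -s / as sysdba <<EOF
select owner, segment_name, segment_type
from   dba_extents
where  file_id = 19
and    445469 between block_id and block_id + blocks - 1;
EOF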

The analyst tied it to bug 8277580, and tells me there are unpublished parts of the bug which mention compression:
Bug 8277580 ORA-7445: [__INTEL_NEW_MEMCPY()+44]
RDBMS Ver: 11.1.0.7
O/S: 226 Linux x86-64

What scares me is that the bug was opened 4 months ago. If they drag their feet too long, I guess I uncompress everything, and they send me a refund for Advanced Compression and the year of support I already paid, right?

One note about db_block_checking: it defaults to FALSE, and the docs say it incurs a 1-10% overhead. With the alternative prospect of silently corrupting my standby (Gah!), I can’t help but think this is a no-brainer to activate. With db_block_checking enabled, the behavior was what I would consider appropriate: the standby database’s managed recovery process stops, and the RDBMS stays up. If you restart the managed recovery process, it dies on exactly the same log, with exactly the same error.
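
Turning it on is a one-liner if you’re on an spfile (a sketch; valid values run from OFF through LOW and MEDIUM to FULL, with the higher levels costing more CPU):

sqlplus -s / as sysdba <<EOF
alter system set db_block_checking = FULL scope=both;
EOF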

One follow-up thought: at this point, I have to ask myself, do I have a recoverable database? In other words, my assumption so far is that Dataguard and the log shipping process are somehow corrupting logs due to Advanced Compression. Interesting trivia note: if you MD5 checksum the same archivelog on the primary and the standby, it will NOT match! I only thought to do this under 11g… anyone know if they match under 10g? Is this my core problem?
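
The checksum test itself is trivial if you want to try it in your own environment; a sketch with a made-up log name, run against the same archived log sequence on each side:

# run on the primary, then on the standby, against the same archived log
md5sum /u03/appprod/prodarch/1_12345_654321.dbf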

I will have to prove this by restoring a backup of my database and recovering until cancel to roll through a few hundred logs. If, on the other hand, the local archivelogs are being written “wrong”… I’m fooked.
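
The recovery test would look roughly like this once the datafiles are restored from backup (a sketch; an RMAN restore and recover to a point in time would prove the same thing):

SQL> startup mount
SQL> recover database until cancel;
     (supply archived logs as prompted; type CANCEL once you have rolled far enough)
SQL> alter database open resetlogs;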

*edit* I’m not getting far with Oracle on this bug. It turns out, when I look at the bug status codes for this issue, it’s essentially unchanged since it was opened. I sent this email to my sales rep in hopes of “lighting a fire” as we used to say in the Army:

After working further with support, I’m extremely unhappy with this situation. When we spoke last week, I mentioned that this bug had existed since late February 2009. Since then, there have been no significant updates to the bug, and when I looked up the bug status code, I was even more dismayed. According to Doc 16660.1, this bug has been in one of two statuses since inception:

10: Used to file a bug without all the needed information (for instance, trace files from the customer), to inform Development that a bug will soon become Status 11.

16: Used by BDE to pre-sort well-formed bugs before development starts fixing them.

This isn’t exactly encouraging! The way I read this is that for 4 months, development has either been trying to gather needed information or pre-sorting the bug before actually working on it. This means that our (certified) configuration has been broken since we bought it.

The only option the analyst gave me was to uncompress my tables, and I’m afraid I may have to go this route, since I have absolutely no sense from support of whether this bug is even being worked on or when it may be fixed. While I agree this is likely to fix the Dataguard bug, does it come with a refund for Advanced Compression?

*edit* It’s been one month since support positively identified my bug as existing bug 8277580. While there is still no resolution, I am being asked for a bunch of trace files and logs this week, which hopefully at least means someone is looking at it. While they don’t always tie out to the same object or block, the Dataguard apply always fails with an ORA-00600 and a block reference to a compressed table. Meanwhile, our admin is keeping a live snapshot at our standby site via the SAN, meaning at worst, if our primary failed, we could start up the database on the remote end without losing much (if any) information. It requires a huge amount of storage and bandwidth, however, which Dataguard does not.

*edit* Now support is asking about the possibility of uploading/providing all the components necessary to reproduce the bug on their end.  Odd, since they acknowledge the bug, but I’ll be curious to see what comes of this.  I can’t very well upload the entire database to them (at least not easily), and even sending them one offending datafile with logs to reproduce the issue seems difficult.  I break my “data” and “index” datafiles into 8GB apiece, and currently have about 9 of them (last I looked), so at a minimum we’re probably talking about 8-10GB to reproduce this at their end.
