Kevin Kempf's Blog

July 22, 2010

A really nasty 11i bug

Filed under: 11i, Bugs — kkempf @ 3:32 pm

When is a user not a user anymore?

In 11i, we tend to end-date users’ responsibilities, and assume that’s the end of their ability to run jobs in the Apps.  Not always the case, it turns out.

So there I was, minding my own business

I’d actually seen this bug before, but at the time, when I opened an SR, the analyst “stumbled over” the solution (stop concurrent managers, run cmclean.sql, kill all pending/no manager jobs, start concurrent manager), without identifying the root cause.  So when my 11i front end became near-unresponsive, and I saw this query running at 100% on every CPU, I at least recognized it:

.
SELECT  R.Conc_Login_Id, R.Request_Id, R.Phase_Code, R.Status_Code, P.Application_ID, P.Concurrent_Program_ID, P.Concurrent_Program_Name, R.Enable_Trace, R.Restart, DECODE(R.Increment_Dates, ‘Y’, ‘Y’, ‘N’), R.NLS_Compliant, R.OUTPUT_FILE_TYPE, E.Executable_Name, E.Execution_File_Name, A2.Basepath, DECODE(R.Stale, ‘Y’, ‘C’, P.Execution_Method_Code), P.Print_Flag, P.Execution_Options, DECODE(P.Srs_Flag, ‘Y’, ‘Y’, ‘Q’, ‘Y’, ‘N’), P.Argument_Method_Code, R.Print_Style, R.Argument_Input_Method_Code, R.Queue_Method_Code, R.Responsibility_ID, R.Responsibility_Application_ID, R.Requested_By, R.Number_Of_Copies, R.Save_Output_Flag, R.Printer, R.Print_Group, R.Priority, U.User_Name, O.Oracle_Username, O.Encrypted_Oracle_Password, R.Cd_Id, A.Basepath, A.Application_Short_Name, TO_CHAR(R.Requested_Start_Date,’YYYY/MM/DD HH24:MI:SS’), R.Nls_Language, R.Nls_Territory, DECODE(R.Parent_Request_ID, NULL, 0, R.Parent_Request_ID), R.Priority_Request_ID, R.Single_Thread_Flag, R.Has_Sub_Request, R.Is_Sub_Request, R.Req_Information, R.Description, R.Resubmit_Time, TO_CHAR(R.Resubmit_Interval), R.Resubmit_Interval_Type_Code, R.Resubmit_Interval_Unit_Code, TO_CHAR(R.Resubmit_End_Date,’YYYY/MM/DD HH24:MI:SS’), Decode(E.Execution_File_Name, NULL, ‘N’, Decode(E.Subroutine_Name, NULL, Decode(E.Execution_Method_Code, ‘I’, ‘Y’, ‘J’, ‘Y’, ‘N’), ‘Y’)), R.Argument1, R.Argument2, R.Argument3, R.Argument4, R.Argument5, R.Argument6, R.Argument7, R.Argument8, R.Argument9, R.Argument10, R.Argument11, R.Argument12, R.Argument13, R.Argument14, R.Argument15, R.Argument16, R.Argument17, R.Argument18, R.Argument19, R.Argument20, R.Argument21, R.Argument22, R.Argument23, R.Argument24, R.Argument25, X.Argument26, X.Argument27, X.Argument28, X.Argument29, X.Argument30, X.Argument31, X.Argument32, X.Argument33, X.Argument34, X.Argument35, X.Argument36, X.Argument37, X.Argument38, X.Argument39, X.Argument40, X.Argument41, X.Argument42, X.Argument43, X.Argument44, X.Argument45, X.Argument46, X.Argument47, X.Argument48, X.Argument49, X.Argument50, X.Argument51, X.Argument52, X.Argument53, X.Argument54, X.Argument55, X.Argument56, X.Argument57, X.Argument58, X.Argument59, X.Argument60, X.Argument61, X.Argument62, X.Argument63, X.Argument64, X.Argument65, X.Argument66, X.Argument67, X.Argument68, X.Argument69, X.Argument70, X.Argument71, X.Argument72, X.Argument73, X.Argument74, X.Argument75, X.Argument76, X.Argument77, X.Argument78, X.Argument79, X.Argument80, X.Argument81, X.Argument82, X.Argument83, X.Argument84, X.Argument85, X.Argument86, X.Argument87, X.Argument88, X.Argument89, X.Argument90, X.Argument91, X.Argument92, X.Argument93, X.Argument94, X.Argument95, X.Argument96, X.Argument97, X.Argument98, X.Argument99, X.Argument100, R.number_of_arguments, C.CD_Name, NVL(R.Security_Group_ID, 0)
FROM fnd_concurrent_requests R, fnd_concurrent_programs P, fnd_application A, fnd_user U, fnd_oracle_userid O, fnd_conflicts_domain C, fnd_concurrent_queues Q, fnd_application A2, fnd_executables E, fnd_conc_request_arguments X
WHERE R.Status_code = ‘I’ And ((R.OPS_INSTANCE is null) or (R.OPS_INSTANCE = -1) or (R.OPS_INSTANCE = decode(:dcp_on,1,FND_CONC_GLOBAL.OPS_INST_NUM,R.OPS_INSTANCE))) And R.Request_ID = X.Request_ID(+) And R.Program_Application_Id = P.Application_Id(+) And R.Concurrent_Program_Id = P.Concurrent_Program_Id(+) And R.Program_Application_Id = A.Application_Id(+) And P.Executable_Application_Id = E.Application_Id(+) And P.Executable_Id = E.Executable_Id(+) And P.Executable_Application_Id = A2.Application_Id(+) And R.Requested_By = U.User_Id(+) And R.Cd_Id = C.Cd_Id(+) And R.Oracle_Id = O.Oracle_Id(+) And Q.Application_Id = :q_applid And Q.Concurrent_Queue_Id = :queue_id And (P.Enabled_Flag is NULL OR P.Enabled_Flag = ‘Y’) And R.Hold_Flag = ‘N’ And R.Requested_Start_Date <= Sysdate And ( R.Enforce_Seriality_Flag = ‘N’ OR ( C.RunAlone_Flag = P.Run_Alone_Flag And (P.Run_Alone_Flag = ‘N’ OR Not Exists
(Select Null
From Fnd_Concurrent_Requests Sr
Where Sr.Status_Code In (‘R’, ‘T’) And Sr.Enforce_Seriality_Flag = ‘Y’ And Sr.CD_id = C.CD_Id)))) And Q.Running_Processes <= Q.Max_Processes And R.Rowid = :reqname And ((P.Execution_Method_Code != ‘S’ OR (R.PROGRAM_APPLICATION_ID,R.CONCURRENT_PROGRAM_ID) IN ((0,98),(0,100),(0,31721),(0,31722),(0,31757))) AND ((R.PROGRAM_APPLICATION_ID,R.CONCURRENT_PROGRAM_ID) NOT IN ((510,40112),(510,40113),(510,40910),(510,40911),(530,42260),(530,42261),(535,40893),(535,40894),(535,40895),(660,44069),(660,44100),(660,44808),(660,44849),(660,46421),(660,46422),(702,33152),(20004,44868),(20009,45000)))) FOR UPDATE OF R.status_code NoWait
SELECT  R.Conc_Login_Id, R.Request_Id, R.Phase_Code, R.Status_Code, P.Application_ID, P.Concurrent_Program_ID, P.Concurrent_Program_Name, R.Enable_Trace, R.Restart, DECODE(R.Increment_Dates, ‘Y’, ‘Y’, ‘N’), R.NLS_Compliant, R.OUTPUT_FILE_TYPE, E.Executable_Name, E.Execution_File_Name, A2.Basepath, DECODE(R.Stale, ‘Y’, ‘C’, P.Execution_Method_Code), P.Print_Flag, P.Execution_Options, DECODE(P.Srs_Flag, ‘Y’, ‘Y’, ‘Q’, ‘Y’, ‘N’), P.Argument_Method_Code, R.Print_Style, R.Argument_Input_Method_Code, R.Queue_Method_Code, R.Responsibility_ID, R.Responsibility_Application_ID, R.Requested_By, R.Number_Of_Copies, R.Save_Output_Flag, R.Printer, R.Print_Group, R.Priority, U.User_Name, O.Oracle_Username, O.Encrypted_Oracle_Password, R.Cd_Id, A.Basepath, A.Application_Short_Name, TO_CHAR(R.Requested_Start_Date,’YYYY/MM/DD HH24:MI:SS’), R.Nls_Language, R.Nls_Territory, DECODE(R.Parent_Request_ID, NULL, 0, R.Parent_Request_ID), R.Priority_Request_ID, R.Single_Thread_Flag, R.Has_Sub_Request, R.Is_Sub_Request, R.Req_Information, R.Description, R.Resubmit_Time, TO_CHAR(R.Resubmit_Interval), R.Resubmit_Interval_Type_Code, R.Resubmit_Interval_Unit_Code, TO_CHAR(R.Resubmit_End_Date,’YYYY/MM/DD HH24:MI:SS’), Decode(E.Execution_File_Name, NULL, ‘N’, Decode(E.Subroutine_Name, NULL, Decode(E.Execution_Method_Code, ‘I’, ‘Y’, ‘J’, ‘Y’, ‘N’), ‘Y’)), R.Argument1, R.Argument2, R.Argument3, R.Argument4, R.Argument5, R.Argument6, R.Argument7, R.Argument8, R.Argument9, R.Argument10, R.Argument11, R.Argument12, R.Argument13, R.Argument14, R.Argument15, R.Argument16, R.Argument17, R.Argument18, R.Argument19, R.Argument20, R.Argument21, R.Argument22, R.Argument23, R.Argument24, R.Argument25, X.Argument26, X.Argument27, X.Argument28, X.Argument29, X.Argument30, X.Argument31, X.Argument32, X.Argument33, X.Argument34, X.Argument35, X.Argument36, X.Argument37, X.Argument38, X.Argument39, X.Argument40, X.Argument41, X.Argument42, X.Argument43, X.Argument44, X.Argument45, X.Argument46, X.Argument47, X.Argument48, X.Argument49, X.Argument50, X.Argument51, X.Argument52, X.Argument53, X.Argument54, X.Argument55, X.Argument56, X.Argument57, X.Argument58, X.Argument59, X.Argument60, X.Argument61, X.Argument62, X.Argument63, X.Argument64, X.Argument65, X.Argument66, X.Argument67, X.Argument68, X.Argument69, X.Argument70, X.Argument71, X.Argument72, X.Argument73, X.Argument74, X.Argument75, X.Argument76, X.Argument77, X.Argument78, X.Argument79, X.Argument80, X.Argument81, X.Argument82, X.Argument83, X.Argument84, X.Argument85, X.Argument86, X.Argument87, X.Argument88, X.Argument89, X.Argument90, X.Argument91, X.Argument92, X.Argument93, X.Argument94, X.Argument95, X.Argument96, X.Argument97, X.Argument98, X.Argument99, X.Argument100, R.number_of_arguments, C.CD_Name, NVL(R.Security_Group_ID, 0)FROM fnd_concurrent_requests R, fnd_concurrent_programs P, fnd_application A, fnd_user U, fnd_oracle_userid O, fnd_conflicts_domain C, fnd_concurrent_queues Q, fnd_application A2, fnd_executables E, fnd_conc_request_arguments XWHERE R.Status_code = ‘I’ And ((R.OPS_INSTANCE is null) or (R.OPS_INSTANCE = -1) or (R.OPS_INSTANCE = decode(:dcp_on,1,FND_CONC_GLOBAL.OPS_INST_NUM,R.OPS_INSTANCE))) And R.Request_ID = X.Request_ID(+) And R.Program_Application_Id = P.Application_Id(+) And R.Concurrent_Program_Id = P.Concurrent_Program_Id(+) And R.Program_Application_Id = A.Application_Id(+) And P.Executable_Application_Id = E.Application_Id(+) And P.Executable_Id = E.Executable_Id(+) And P.Executable_Application_Id = A2.Application_Id(+) And R.Requested_By = U.User_Id(+) And R.Cd_Id = C.Cd_Id(+) And R.Oracle_Id = O.Oracle_Id(+) And Q.Application_Id = :q_applid And Q.Concurrent_Queue_Id = :queue_id And (P.Enabled_Flag is NULL OR P.Enabled_Flag = ‘Y’) And R.Hold_Flag = ‘N’ And R.Requested_Start_Date <= Sysdate And ( R.Enforce_Seriality_Flag = ‘N’ OR ( C.RunAlone_Flag = P.Run_Alone_Flag And (P.Run_Alone_Flag = ‘N’ OR Not Exists(Select NullFrom Fnd_Concurrent_Requests SrWhere Sr.Status_Code In (‘R’, ‘T’) And Sr.Enforce_Seriality_Flag = ‘Y’ And Sr.CD_id = C.CD_Id)))) And Q.Running_Processes <= Q.Max_Processes And R.Rowid = :reqname And ((P.Execution_Method_Code != ‘S’ OR (R.PROGRAM_APPLICATION_ID,R.CONCURRENT_PROGRAM_ID) IN ((0,98),(0,100),(0,31721),(0,31722),(0,31757))) AND ((R.PROGRAM_APPLICATION_ID,R.CONCURRENT_PROGRAM_ID) NOT IN ((510,40112),(510,40113),(510,40910),(510,40911),(530,42260),(530,42261),(535,40893),(535,40894),(535,40895),(660,44069),(660,44100),(660,44808),(660,44849),(660,46421),(660,46422),(702,33152),(20004,44868),(20009,45000)))) FOR UPDATE OF R.status_code NoWait

What Enterprise Manager was telling me

The database itself was reasonably happy.  I mean it was running a stupid query over and over, but it was doing its job.  It was the application server that was getting killed

As you can see, the application server got slammed about 11:15am.  It was running out of memory, the load was in the 20’s (8 CPUs available) and Linux was starting to kill processes in order to preserver itself.

Root Cause and Fix

If you ever see this query, or possibly this one:

update  fnd_concurrent_requests set phase_code = :phase ,status_code = :status, completion_text = :text where request_id = :reqid

you are likely encountering bug 9109247: SCHEDULED REQUESTS STILL RUN FOR END_DATED USERS 100% CPU.

The immediate fix is to shut down concurrent managers (adcmctl.sh) and look in the concurrent request queue to see who is trying to run what (start with the lowest request ID which is trying to run).  In my case, one user (HFAULLING) had submitted about 20 similar requests.  I cancelled these requests (in fact, all of this users’ requests), can cmclean.sql and restarted the managers.

Turns out, this user was end dated the day this happened.  In other words, they weren’t even a valid Oracle apps user anymore.  But this bug allowed their scheduled concurrent requests to run, get stuck in some crazy endless loop, and kill my system (since he had submitted more requests than I had standard managers and they all queued up at the same time).

In closing

I decided to post this information for two reasons.  One, I think it’s rather random and impossible to predict when you may hit this bug.  It’s my opinion that this should have received a lot more press from Oracle as a recommended patch!  Wouldn’t it be rather embarrassing if it got out that Oracle 11i not only allowed terminated employees to run concurrent requests, but by doing so they could effectively sabotage the system?  Second, it’s a bug which can have catastrophic consequences to the whole ERP.  The first time I hit this bug I extracted the (long) query (above) and submitted it to the analyst asking them what the h*ll it was.  At the time, they had no idea.  Seems like it should be in the various notes related to the bug as well.


Advertisements

Create a free website or blog at WordPress.com.