Tuning Standby Lag with Oracle Active Data Guard
We bought active data guard because of increasing reporting demands on our primary database in our ERP environment. While active data guard doesn’t play nicely with Apps 11i out of the box (it’s read-only, and just to establish a forms session you need to be able to write), it can fill a nice role offloading CPU load by serving up near-real time reports for specific (in our case bolt on) applications which only need to read tables.
What is Near-Real Time?
Once an SLA is established for the “oldest” data the report can return, you can tweak Oracle to honor it. In my case, the example below shows me going from no lag target to 300 seconds and finally to 600 seconds. Note that once you settle on a number, you’d be wise to add a “scope=both” to the end of the alter system.
$ sqlplus / as sysdba SQL*Plus: Release 22.214.171.124.0 Production on Wed Feb 26 10:26:49 2014
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Oracle Database 11g Enterprise Edition Release 126.96.36.199.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> show parameter archive_lag_target;
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
archive_lag_target integer 0
SQL> alter system set archive_lag_target = 300;
SQL> alter system set archive_lag_target = 600;
I found it interesting to note that left to “its own devices” the pattern of archivelog ships (and therefore application on the other end)becomes an inverse function of how active your database is. In other words, the recency of your data at your standby is related to how fast you fill your online redo logs (which is also a function of how big they are), plus the odd twist of system-driven logfile switches. Lets say you had 200MB redo logs with a nearly idle system. Your lag can get huge if not tuned!
The graphic below captures the result of SQL:
select to_number(substr(value,instr(value,':',1,2)+1,length(value))) + 60 * to_number(substr(value,instr(value,':',1,1)+1,2)) seconds from v$dataguard_stats@apps_to_dataguard where name = 'apply lag';
The left part of the graphic (before 10:21) shows data transport left untuned. From 10:21 to 11:21 you can see where I had it set to 300 seconds.
From 11:21 onward it’s set to 600 seconds.
Near-Real Time on Steroids: Real Time Apply
Check your licensing, your mileage may vary. The easiest way to keep the standby up to date is to use real-time apply and standby logs. To create standby logs, you go to your standby and cancel recovery:
alter database recover managed standby database cancel;
Next, add your standby logs. They need to be the same size as the online redo logs on the primary. Make N+1 of them on the standby. The syntax looks like this:
alter database add standby logfile group 41 ('/usr/local/oracle/redo/log41b.dbf','/u04/appprod/proddata/log41a.dbf') size 100M;
alter database add standby logfile group 42 ('/usr/local/oracle/redo/log42b.dbf','/u04/appprod/proddata/log42a.dbf') size 100M;
alter database add standby logfile group 43 ('/usr/local/oracle/redo/log43b.dbf','/u04/appprod/proddata/log43a.dbf') size 100M;
alter database add standby logfile group 44 ('/usr/local/oracle/redo/log44b.dbf','/u04/appprod/proddata/log44a.dbf') size 100M;
Finally, restart your recovery with the real-time apply:
alter database recover managed standby database disconnect using current logfile;
Now your apply lag and transport lag drop to zero:
In the graph above, you can see the final version of data apply rates to the standby.
8:58a-10:21a: unmanaged apply rate (mostly happening when a log on the primary got full)
10:21a-11:21a: honoring the 300 second alter system set archive_lag_target=300;
11:21a-4:30p: honoring the 600 second alter system set archive_lag_target=600;
4:30p-end: real time apply started with standby logfile groups (effectively 0).