适用范围:Oracle Data Guard / Active Data Guard 日常巡检、故障判断、恢复后验证。
适用版本:Oracle 11g / 12c / 18c / 19c 物理备库。
一、文档目标#
本文档用于检查 Oracle ADG 是否同步正常、是否存在传输延迟、应用延迟、归档缺口,以及主库到备库的日志传输目标是否异常。
巡检最终应输出以下结论之一:
| 结论 | 含义 |
|---|---|
| 正常 | 无 archive gap,transport lag=0 或秒级,apply lag=0 或秒级,MRP/RFS 正常 |
| 有传输延迟 | 主库 redo 未及时传到备库,transport lag 较大 |
| 有应用延迟 | redo 已到备库,但 MRP 应用慢,apply lag 较大 |
| 有归档缺口 | v$archive_gap 有记录,备库缺少中间归档 |
| MRP 未启动 | 没有 MRP0,或备库只读但未应用 |
| 主库传输异常 | 主库 v$archive_dest / v$archive_dest_status 有 ERROR |
二、总体判断逻辑#
ADG 巡检不能只看一个视图,要从以下几个维度综合判断:
| 检查维度 | 关键视图 | 正常状态 |
|---|---|---|
| 备库角色 | v$database | DATABASE_ROLE=PHYSICAL STANDBY |
| 备库打开模式 | v$database | MOUNTED 或 READ ONLY WITH APPLY |
| 日志接收 | v$managed_standby / v$dataguard_process | 有 RFS,且对应主库 LGWR/ARCH 连接 |
| 日志应用 | v$managed_standby / v$dataguard_process | 有 MRP0,状态为 APPLYING_LOG 或 WAIT_FOR_LOG |
| 传输延迟 | v$dataguard_stats | transport lag 为 0 或秒级 |
| 应用延迟 | v$dataguard_stats | apply lag 为 0 或秒级 |
| 归档缺口 | v$archive_gap | no rows selected |
| sequence 差距 | v$archived_log | 每个 thread# 的 seq_lag=0 或短时间可追平 |
| 主库传输目标 | v$archive_dest / v$archive_dest_status | STATUS=VALID,ERROR 为空 |
注意:Active Data Guard 实时应用场景下,
v$archived_log.applied可能略滞后于 standby redo log 的实时应用位置,因此需要结合v$managed_standby、v$dataguard_stats、v$archive_gap一起判断。
三、备库侧巡检 SQL#
巡检前 SQL*Plus 格式设置,建议每次巡检前先执行:
set lines 300 pages 1000
set long 20000
set trimspool on
set tab off
col database_role for a20
col open_mode for a25
col protection_mode for a25
col switchover_status for a25
col process for a20
col status for a20
col client_process for a20
col thread# for 999
col sequence# for 999999999
col block# for 999999999
col name for a30
col value for a30
col unit for a25
col time_computed for a30
col datum_time for a30
col dest_name for a20
col type for a10
col database_mode for a20
col recovery_mode for a20
col synchronization_status for a20
col synchronized for a10
col gap_status for a20
col db_unique_name for a20
col dest_name for a30
col destination for a80
col error for a100以下 SQL 优先在备库执行。
3.1 检查备库角色和打开状态#
select database_role,
open_mode,
protection_mode,
switchover_status
from v$database;正常参考:
DATABASE_ROLE OPEN_MODE PROTECTION_MODE SWITCHOVER_STATUS
-------------------- ------------------------- ------------------------- -------------------------
PHYSICAL STANDBY MOUNTED MAXIMUM PERFORMANCE NOT ALLOWED或者:
PHYSICAL STANDBY READ ONLY WITH APPLY MAXIMUM PERFORMANCE NOT ALLOWED判断标准:
| 字段 | 正常/异常判断 |
|---|---|
DATABASE_ROLE | 正常应为 PHYSICAL STANDBY |
OPEN_MODE=MOUNTED | 备库挂载状态,可以正常应用日志 |
OPEN_MODE=READ ONLY WITH APPLY | ADG 可读且实时应用,正常 |
OPEN_MODE=READ ONLY | 只读打开但未应用,需要检查 MRP 是否启动 |
OPEN_MODE=READ WRITE | 不是正常物理备库状态,需要确认是否已 failover/switchover |
3.2 检查 RFS / MRP 进程状态#
select process,
status,
thread#,
sequence#,
block#,
client_process
from v$managed_standby
order by process, thread#;正常参考:
PROCESS STATUS THREAD# SEQUENCE# BLOCK# CLIENT_PROCESS
------------------ ------------------------ ------- ---------- ---------- ----------------
MRP0 APPLYING_LOG 2 62398 27749 N/A
RFS IDLE 1 96120 239800 LGWR
RFS IDLE 2 62398 27754 LGWR判断标准:
| 进程/状态 | 含义 | 判断 |
|---|---|---|
RFS 存在 | 备库正在接收主库 redo | 正常 |
CLIENT_PROCESS=LGWR | 主库 LGWR 直接传输 redo | 正常 |
CLIENT_PROCESS=Archival | 归档方式传输 | 正常 |
MRP0 APPLYING_LOG | 备库正在应用日志 | 正常 |
MRP0 WAIT_FOR_LOG | 已追平,等待新日志 | 正常 |
没有 MRP0 | 日志应用未启动 | 异常 |
MRP0 WAIT_FOR_GAP | 缺少归档日志 | 异常 |
MRP0 的 sequence# / block# 长时间不变 | 可能卡住或 I/O 慢 | 需排查 |
注意:
RFS IDLE不一定异常,可能表示当前没有新的 redo 或已经接收完当前 redo。
3.3 检查 ADG 传输延迟和应用延迟#
select name,
value,
unit,
time_computed,
datum_time
from v$dataguard_stats
where name in ('transport lag',
'apply lag',
'apply finish time',
'estimated startup time');正常参考:
NAME VALUE TIME_COMPUTED DATUM_TIME
------------------------------ ------------------------------ ------------------------------ ------------------------------
transport lag +00 00:00:00 05/15/2026 21:10:06 05/15/2026 21:10:06
apply lag +00 00:00:00 05/15/2026 21:10:06 05/15/2026 21:10:06
apply finish time +00 00:00:00.000 05/15/2026 21:10:06判断标准:
| 指标 | 含义 | 正常判断 |
|---|---|---|
transport lag | 主库 redo 传到备库的延迟 | 0 或秒级 |
apply lag | 备库应用 redo 的延迟 | 0 或秒级 |
apply finish time | 预计追平还需要多久 | 0 或很短 |
datum_time | 统计数据采样时间 | 应接近当前时间 |
异常示例:
apply lag +92 07:52:39说明备库应用延迟已经超过 92 天,属于严重不同步。
注意:如果存在 archive gap,
transport lag和apply finish time可能不准确,应优先以v$archive_gap为准。
3.4 检查是否存在归档缺口#
select * from v$archive_gap;正常结果:
no rows selected异常示例:
THREAD# LOW_SEQUENCE# HIGH_SEQUENCE# CON_ID
------- ------------- -------------- ----------
1 88228 94296 1
2 56221 60831 1含义:
| Thread | 缺失范围 | 说明 |
|---|---|---|
| thread 1 | 88228 ~ 94296 | 缺少 thread 1 的中间归档 |
| thread 2 | 56221 ~ 60831 | 缺少 thread 2 的中间归档 |
只要 v$archive_gap 有记录,就不能认为 ADG 同步正常。
3.5 检查每个 thread 的接收和应用差距#
select thread#,
max(sequence#) as max_received_seq,
max(case when applied in ('YES','IN-MEMORY') then sequence# end) as max_applied_seq,
max(sequence#) -
max(case when applied in ('YES','IN-MEMORY') then sequence# end) as seq_lag
from v$archived_log
where resetlogs_change# = (select resetlogs_change# from v$database)
group by thread#
order by thread#;正常参考:
THREAD# MAX_RECEIVED_SEQ MAX_APPLIED_SEQ SEQ_LAG
------- ---------------- --------------- ----------
1 96119 96119 0
2 62397 62397 0判断标准:
seq_lag | 判断 |
|---|---|
| 0 | 已追平 |
| 1~2 | 可能是实时应用/归档标记滞后,需要结合 v$managed_standby 判断 |
| 持续增长 | 应用延迟或归档未连续应用 |
| 某个 thread 差距大 | RAC 某个线程存在传输或应用问题 |
3.6 检查最近应用到的时间点#
select thread#,
max(sequence#) as last_applied_seq,
to_char(max(next_time), 'yyyy-mm-dd hh24:mi:ss') as last_applied_time
from v$archived_log
where applied in ('YES','IN-MEMORY')
and resetlogs_change# = (select resetlogs_change# from v$database)
group by thread#
order by thread#;输出示例:
THREAD# LAST_APPLIED_SEQ LAST_APPLIED_TIME
------- ---------------- --------------------
1 96119 2026-05-15 21:09:44
2 62397 2026-05-15 21:09:44用途:
- 确认备库数据大致应用到哪个业务时间点;
- 当
apply lag异常时,用于判断落后时间范围; - RAC 环境下分别确认每个 thread 的应用时间。
3.7 查看最近归档日志应用明细#
select thread#,
sequence#,
applied,
to_char(first_time, 'yyyy-mm-dd hh24:mi:ss') as first_time,
to_char(next_time, 'yyyy-mm-dd hh24:mi:ss') as next_time,
to_char(completion_time, 'yyyy-mm-dd hh24:mi:ss') as completion_time
from (
select a.*,
row_number() over(partition by thread# order by sequence# desc) rn
from v$archived_log a
where resetlogs_change# = (select resetlogs_change# from v$database)
)
where rn <= 10
order by thread#, sequence# desc;用途:
- 查看每个 thread 最近 10 个归档日志是否已应用;
- 检查
APPLIED是否为YES或IN-MEMORY; - 辅助判断某个 thread 是否落后。
四、主库侧巡检 SQL#
巡检前 SQL*Plus 格式设置,建议每次巡检前先执行:
set lines 300 pages 1000
set long 20000
set trimspool on
set tab off
col database_role for a20
col open_mode for a25
col protection_mode for a25
col switchover_status for a25
col process for a20
col status for a20
col client_process for a20
col thread# for 999
col sequence# for 999999999
col block# for 999999999
col name for a30
col value for a30
col unit for a25
col time_computed for a30
col datum_time for a30
col dest_name for a20
col type for a10
col database_mode for a20
col recovery_mode for a20
col synchronization_status for a20
col synchronized for a10
col gap_status for a20
col db_unique_name for a20
col dest_name for a30
col destination for a80
col error for a100以下 SQL 在主库执行,用于确认主库到备库的日志传输目标是否正常。
4.1 检查主库归档目标状态#
select dest_id,
dest_name,
status,
type,
database_mode,
recovery_mode,
db_unique_name,
synchronization_status,
synchronized,
gap_status,
error
from v$archive_dest_status
where status <> 'INACTIVE'
and type = 'PHYSICAL'
order by dest_id;正常参考:
DEST_ID STATUS TARGET DATABASE_MODE RECOVERY_MODE ERROR
------- ------- ------- ---------------- ------------------------- -----
2 VALID STANDBY OPEN_READ-ONLY MANAGED REAL TIME APPLY判断标准:
| 字段 | 正常判断 |
|---|---|
STATUS | VALID |
TARGET | STANDBY |
RECOVERY_MODE | MANAGED REAL TIME APPLY 或 MANAGED |
ERROR | 为空 |
5.2 检查主库归档目标配置和错误信息#
select dest_id,
dest_name,
status,
destination,
error
from v$archive_dest
where status <> 'INACTIVE'
and destination is not null
order by dest_id;正常参考:
DEST_ID DEST_NAME STATUS DESTINATION ERROR
------- --------------------- -------- ----------------- -----
1 LOG_ARCHIVE_DEST_1 VALID USE_DB_RECOVERY_FILE_DEST
2 LOG_ARCHIVE_DEST_2 VALID standby_tns如果 ERROR 不为空,需要优先处理主库日志传输问题。
5.3 主库查看当前归档 sequence#
select thread#,
max(sequence#) as current_archived_seq
from v$archived_log
where resetlogs_change# = (select resetlogs_change# from v$database)
group by thread#
order by thread#;用途:
- 和备库
max_received_seq对比; - 判断主库已经生成到哪个归档 sequence;
- RAC 环境下必须分别看每个
thread#。
5.4 主库查看当前 online redo sequence#
select thread#,
group#,
sequence#,
status,
archived
from v$log
order by thread#, group#;用途:
- 查看主库当前正在写哪个 online redo;
- 和备库 RFS/MRP 当前 sequence 对比;
- 判断实时应用是否接近当前日志。
五、一键汇总巡检 SQL#
以下 SQL 建议保存为 adg_check.sql,在备库执行。
set lines 300 pages 1000
set trimspool on
set tab off
prompt ======================================================================
prompt 1. DATABASE ROLE / OPEN MODE
prompt ======================================================================
col database_role for a20
col open_mode for a25
col protection_mode for a25
col switchover_status for a25
select database_role,
open_mode,
protection_mode,
switchover_status
from v$database;
prompt ======================================================================
prompt 2. MANAGED STANDBY PROCESS
prompt ======================================================================
col process for a18
col status for a24
col client_process for a16
select process,
status,
thread#,
sequence#,
block#,
client_process
from v$managed_standby
order by process, thread#;
prompt ======================================================================
prompt 3. DATAGUARD LAG
prompt ======================================================================
col name for a30
col value for a30
col unit for a25
col time_computed for a30
col datum_time for a30
select name,
value,
unit,
time_computed,
datum_time
from v$dataguard_stats
where name in ('transport lag',
'apply lag',
'apply finish time',
'estimated startup time');
prompt ======================================================================
prompt 4. ARCHIVE GAP
prompt ======================================================================
select * from v$archive_gap;
prompt ======================================================================
prompt 5. RECEIVED/APPLIED SEQUENCE LAG BY THREAD
prompt ======================================================================
col max_received_seq for 999999999
col max_applied_seq for 999999999
col seq_lag for 999999999
select thread#,
max(sequence#) as max_received_seq,
max(case when applied in ('YES','IN-MEMORY') then sequence# end) as max_applied_seq,
max(sequence#) -
max(case when applied in ('YES','IN-MEMORY') then sequence# end) as seq_lag
from v$archived_log
where resetlogs_change# = (select resetlogs_change# from v$database)
group by thread#
order by thread#;
prompt ======================================================================
prompt 6. LAST APPLIED TIME BY THREAD
prompt ======================================================================
col last_applied_time for a20
select thread#,
max(sequence#) as last_applied_seq,
to_char(max(next_time), 'yyyy-mm-dd hh24:mi:ss') as last_applied_time
from v$archived_log
where applied in ('YES','IN-MEMORY')
and resetlogs_change# = (select resetlogs_change# from v$database)
group by thread#
order by thread#;执行方式:
sqlplus / as sysdba @adg_check.sql如需保存日志:
spool /tmp/adg_check_&&_DATE.log
@adg_check.sql
spool off六、异常场景判断与处理 SOP#
6.1 场景一:v$archive_gap 有记录#
异常表现:
select * from v$archive_gap;输出类似:
THREAD# LOW_SEQUENCE# HIGH_SEQUENCE#
------- ------------- --------------
1 88228 94296
2 56221 60831判断:
备库缺少中间归档,MRP 无法连续恢复到最新位置。处理路径:
- 在主库确认缺失归档是否还存在;
- 如果归档存在,复制到备库并注册;
- 如果归档不存在或缺口太大,使用 RMAN 增量推进备库;
- 重启 MRP;
- 重新验证
v$archive_gap、apply lag、seq_lag。
主库查询缺失归档是否存在:
select thread#,
sequence#,
name,
deleted,
status,
to_char(first_time, 'yyyy-mm-dd hh24:mi:ss') first_time,
to_char(next_time, 'yyyy-mm-dd hh24:mi:ss') next_time
from v$archived_log
where resetlogs_change# = (select resetlogs_change# from v$database)
and (
(thread# = 1 and sequence# between 88228 and 94296)
or (thread# = 2 and sequence# between 56221 and 60831)
)
order by thread#, sequence#;备库注册归档:
alter database recover managed standby database cancel;
alter database register physical logfile '/u01/arch_gap/thread1/1_88228_xxx.arc';
alter database register physical logfile '/u01/arch_gap/thread2/2_56221_xxx.arc';
alter database recover managed standby database using current logfile disconnect from session;批量生成注册脚本:
find /u01/arch_gap -type f | sort | awk '{print "alter database register physical logfile '\''" $0 "'\'';"}' > /tmp/register_arch.sql
sqlplus / as sysdba @/tmp/register_arch.sql6.2 场景二:归档缺口太大或归档已丢失#
适用情况:
- 缺失归档数量很多;
- 主库 FRA 已清理;
- RMAN 备份中恢复归档成本过高;
- 手工补归档时间太长。
推荐处理:使用 RMAN 增量推进备库。
方式 A:18c/19c 从主库服务恢复备库#
备库执行:
rman target /recover standby database from service '主库TNS服务名';执行前确认备库能连接主库:
tnsping 主库TNS服务名
sqlplus sys/密码@主库TNS服务名 as sysdba方式 B:传统 RMAN incremental from SCN#
备库查询最低 SCN:
select min(checkpoint_change#) as standby_min_scn
from v$datafile_header;主库基于该 SCN 做增量备份:
run {
allocate channel c1 device type disk;
allocate channel c2 device type disk;
backup incremental from scn <standby_min_scn> database
format '/u01/standby_incr/standby_incr_%U.bkp';
release channel c1;
release channel c2;
}
backup current controlfile for standby format '/u01/standby_incr/standby_control_%U.bkp';拷贝到备库:
scp /u01/standby_incr/* oracle@备库主机:/u01/standby_incr/备库恢复:
rman target /startup force nomount;
restore standby controlfile from '/u01/standby_incr/standby_control_xxx.bkp';
alter database mount;
catalog start with '/u01/standby_incr/';
recover database noredo;恢复完成后启动 MRP:
alter database recover managed standby database using current logfile disconnect from session;6.3 场景三:没有 MRP0#
异常表现:
v$managed_standby 中没有 MRP0处理:
alter database recover managed standby database using current logfile disconnect from session;如果不使用实时应用:
alter database recover managed standby database disconnect from session;再次检查:
select process,
status,
thread#,
sequence#,
block#,
client_process
from v$managed_standby
order by process, thread#;正常应看到:
MRP0 APPLYING_LOG或:
MRP0 WAIT_FOR_LOG6.4 场景四:主库传输目标报错#
主库检查:
select dest_id,
dest_name,
status,
destination,
error
from v$archive_dest
where target = 'STANDBY'
order by dest_id;如果 ERROR 不为空,按以下方向排查:
| 问题方向 | 检查内容 |
|---|---|
| 网络 | 主备监听、端口、主机名解析、防火墙 |
| TNS | tnsping、tnsnames.ora、service_name |
| 密码文件 | 主备 SYS 密码、orapw 文件是否一致 |
| 归档目标 | log_archive_dest_n 配置是否正确 |
| 备库状态 | 备库实例是否启动到 mount/open 状态 |
| FRA/磁盘空间 | 主库或备库归档目录是否满 |
常用测试:
tnsping standby_tns
sqlplus sys/密码@standby_tns as sysdba6.5 场景五:传输正常但应用慢#
现象:
RFS正常;transport lag不大;apply lag持续增长;max_received_seq > max_applied_seq。
处理方向:
- 检查备库 I/O 性能;
- 检查备库 CPU / 内存压力;
- 检查是否有大量 redo 需要应用;
- 检查 standby redo log 是否合理;
- 检查 alert 日志是否有 MRP 报错。
操作系统检查:
top
vmstat 1 10
iostat -x 1 10重点关注:
| 指标 | 异常判断 |
|---|---|
%util | 接近 100%,磁盘忙 |
await | 明显升高,I/O 响应慢 |
%iowait | 长时间较高,系统等待 I/O |
| MRP 进程 CPU | 长时间高,说明正在追日志;长时间无变化则可能卡住 |
七、恢复后验证 SOP#
ADG 异常处理完成后,必须连续验证以下项目。
7.1 验证无归档缺口#
select * from v$archive_gap;正常:
no rows selected7.2 验证延迟归零#
select name,
value,
time_computed,
datum_time
from v$dataguard_stats
where name in ('transport lag','apply lag','apply finish time');正常:
transport lag +00 00:00:00
apply lag +00 00:00:00
apply finish time +00 00:00:00.0007.3 验证每个 thread 追平#
select thread#,
max(sequence#) as max_received_seq,
max(case when applied in ('YES','IN-MEMORY') then sequence# end) as max_applied_seq,
max(sequence#) -
max(case when applied in ('YES','IN-MEMORY') then sequence# end) as seq_lag
from v$archived_log
where resetlogs_change# = (select resetlogs_change# from v$database)
group by thread#
order by thread#;正常:
THREAD# MAX_RECEIVED_SEQ MAX_APPLIED_SEQ SEQ_LAG
------- ---------------- --------------- -------
1 96119 96119 0
2 62397 62397 07.4 验证 MRP/RFS 状态#
select process,
status,
thread#,
sequence#,
block#,
client_process
from v$managed_standby
order by process, thread#;正常:
MRP0 APPLYING_LOG 或 WAIT_FOR_LOG
RFS IDLE / RECEIVING,CLIENT_PROCESS 为 LGWR 或 Archival7.5 建议观察周期#
只有持续满足以下条件,才能认为恢复稳定:
- v$archive_gap = no rows selected
- transport lag = 0 或秒级
- apply lag = 0 或秒级
- 每个 thread# 的 seq_lag = 0 或短时间可追平
- MRP0/RFS 状态正常
- 主库传输目标无 ERROR
八、常用命令速查#
8.1 启动实时应用#
alter database recover managed standby database using current logfile disconnect from session;8.2 停止日志应用#
alter database recover managed standby database cancel;8.3 启动非实时应用#
alter database recover managed standby database disconnect from session;8.4 主库强制归档当前日志#
alter system archive log current;8.5 主库切换日志#
alter system switch logfile;8.6 备库注册单个归档#
alter database register physical logfile '/path/to/archive.arc';8.7 查看 alert 日志#
adrci
show homes
set home diag/rdbms/<db_unique_name>/<instance_name>
show alert -tail 200或:
tail -200f $ORACLE_BASE/diag/rdbms/*/*/trace/alert_*.log