ogg ckpt表过大导致sysaux表空间满问题分析-利来国际app

杨卓 2021-11-21



2.1 估算sysaux表空间数据增长的情况

sql> select ts# from v$tablespace where name='sysaux' and rownum=1;   
select a."used_g",b."used_g",a."used_g"-b."used_g" as "7_day" from
 (select round(sum(max(tablespace_usedsize)*8192/1024/1024/1024),4) as "used_g",
1 id
       from dba_hist_tbspc_space_usage where 
          trunc(to_date(rtime,'mm/dd/yyyy hh24:mi:ss'))=trunc(sysdate)
		   and tablespace_id=1
      group by tablespace_id) a,
  (select round(sum(max(tablespace_usedsize)*8192/1024/1024/1024),4) as "used_g",
1 id
       from dba_hist_tbspc_space_usage where 
          trunc(to_date(rtime,'mm/dd/yyyy hh24:mi:ss'))=trunc(sysdate)-5
		   and tablespace_id=1
      group by tablespace_id) b where a.id=b.id;
    used_g     used_g      7_day
---------- ---------- ----------
   51.8245    11.9203    39.9042

2.2 找到top segments

select * from (
select owner,segment_name,segment_type,round(bytes/1024/1024/1024) as "bytes_g"
 from dba_segments where tablespace_name='sysaux'
 and bytes>1*1024*1024*100
 ) a  order by a.bytes_g;
owner                segment_name                   segment_type          bytes_g
-------------------- ------------------------------ ------------------ ----------
system               logmnr_col$                    table partition             0
sys                  sys_lob0000011147c00038$$      lobsegment                  0
system               logmnr_i2col$                  index partition             0
mdsys                sys_lob0000064076c00006$$      lobsegment                  0
system               logmnr_col$                    table partition             0
system               logmnr_col$                    table partition             0
system               logmnr_col$                    table partition             0
system               logmnr_col$                    table partition             0
system               logmnr_col$                    table partition             0
system               logmnr_col$                    table partition             0
system               logmnr_i2col$                  index partition             0
system               logmnr_i2col$                  index partition             0
system               logmnr_i2col$                  index partition             0
system               logmnr_i2col$                  index partition             0
system               logmnr_i2col$                  index partition             0
system               logmnr_i2col$                  index partition             0
system               sys_lob0000001462c00009$$      lobsegment                  1
system               logmnr_restart_ckpt$_pk        index                       3
system               logmnr_restart_ckpt$           table                      45
19 rows selected.

2.3 查询一下top表信息

sql> select table_name,partitioned from dba_tables
 where table_name='logmnr_restart_ckpt$';
table_name                     par
------------------------------ ---
logmnr_restart_ckpt$           no
select * from system.logmnr_restart_ckpt$ where rownum<=20;
---- ---------- ---------- ----------
         1          1 8.9662e 11        131  
       33       9421        6648      47838
0000022000000000000001000000      2          1
其实基本上了解到这个是非分区表! 并且这个表有45g的大小占用!

2.4 检索相关资料

sysaux tablespace getting heavily utilized when extracts are running (doc id 2802325.1)

oracle goldengate - version and later
sysaux gets filled up when extract is started and continuously grows. 
stopping the extracts stops the growth and the tablespace releases
 the space after sometime but as soon as the extracts are started again
 the tablespace again gets filled up.
excessive checkpointing by the extracts
please check the logminer checkpoint interval using alert log.
for example here, the alert log indicates the logminer checkpoint is 10m. 
this is far too small and will cause excessive checkpointing.
logminer: memory size = 999m, checkpoint interval = 10m
this value can be influenced by setting
tranlogoptions integratedparams (max_sga_size 1000)
in which case it will be set to 5 times this value
e.g. logminer: memory size = 999m, checkpoint interval = 5000m
increasing this value will decrease the checkpointing.

sysaux 空间使用问题故障排除(文档 id 1399365.1)

system.logmnr_restart_ckpt$ grows quite fast (doc id 735071.1)
这篇文章描述,oracle db版本是10.2~11.1的情况下,ckpt基表数据增长很快,默认检查点也是10m写入,可以修改_checkpoint_frequency 到1000,从而降低检查点的写入频率,其实和上述ogg 18.1原理是一样的,只是具体操作不一样。

oracle database - enterprise edition 
- version to [release 10.2 to 11.1]
streams capture setup is defined and two symptoms have been observed:
1) system.logmnr_restart_ckpt$ table grows daily and the number
 of rows in the table just grow
2) excessive archive log generated in this database,
 when starting capture
observe if the value of capture parameter _checkpoint_frequency is below 10,
 which is the default value for this parameter.
this parameter control how often the logminer session associated
 to the capture is going to do a logminer checkpoint,
 so having a value of 10 means that after 10mb of redo activity
 we do a logminer checkpoint.
this causes lots of entries on table system.logmnr_restart_ckpt$
 and causes that the streams tables grows, also this causes delays
 on the streams performance.
in order to conduct this database to a normal situation, please do:
1) increase capture parameter _checkpoint_frequency to 1000 , by doing:
exec dbms_capture_adm.stop_capture ('capture');
exec dbms_capture_adm.start_capture ('capture');
note that the capture may take some time to start as there are lot of
 checkpoints to manage.
the aim of this is to reduce the number of logminer checkpoint and
 therefore reduce the number of entries on logmnr_checkpoint_entries
 table and also reduce the consume and increase performance.
value 1000 for _checkpoint_frequency is a recommended value for
 releases 10gr2 and 11g, according to metalink note 418755.1
2) during next 10 days shrink table system.logmnr_restart_ckpt$
 and associated indexes, to reduce
the amount of space consumed by this table.
execute at least once in a day, although recommended is three times in a day, 
the steps documented on metalink note 429599.1 to do the shrink:

how to reduce the highwater of logmnr_restart_ckpt$ (doc id 429599.1)
那这一篇文章,就是讲述如何清理这个基表,从10.2的版本开始,oracle 有参数checkpoint_retention_time 定期进行清理,默认是60天,当然每个版本都存在差异,不要被这个60天吓到了。

oracle database - enterprise edition 
- version to [release 10.2 to 11.2]
oracle streams enables the sharing of data and events in a data stream,
 either within a database or from one database to another.
 this article is intended to provide information regarding
 the management of logmnr_restart_ckpt$ table.
periodically, the mining process checkpoints itself for quicker restart. 
these checkpoint information is maintained in the sysaux tablespace by default. 
from oracle 10.2 onwards, the purging of logmnr_restart_ckpt$ is done
 automatically by oracle. there is a capture parameter
 checkpoint_retention_time that determines how 
frequently the purge occurs. 
checkpoint_retention_time, controls the amount of checkpoint data that is
 retained by moving the first_scn of the capture process forward.
 when the checkpoint_retention_time 
is exceeded (default = 60 days), the first_scn is moved and the streams metadata
 tables previous to this scn(first_scn) can be purged.
 space in the sysaux tablespace should be reclaimed at this time. 
you can alter checkpoint_retention_time to lesser value to purge 
the metatdata tables more frequently using the following syntax :
exec dbms_capture_adm.alter_capture
(capture_name =>'strmadmin_capture ',checkpoint_retention_time=>7);
it is also useful to tune _checkpoint_frequency appropriately in order
 to minimise the checkpoint information stored and this should be mentioned.
alter table system.logmnr_restart_ckpt$ enable row movement;
alter table system.logmnr_restart_ckpt$ shrink space ;
alter table system.logmnr_restart_ckpt$ modify lob (ckpt_info) (shrink space);
alter table system.logmnr_restart_ckpt$ disable row movement;
alter index  shrink space;

2.5 问题处理

1)降低ogg ckpt写入频率

修改前,查询db alert,发现很不规律,有写入频率低的,也有默认的10m

[[email protected] trace]$ cat alert_cdbprod1.log |grep memory
logminer: memory size = 8195m, checkpoint interval = 40980m
logminer: memory size = 999m, checkpoint interval = 10m
备注:写入间隔高的设置了参数max_sga_size 4098


tranlogoptions integratedparams(max_sga_size 1000, parallelism 2)

观察db alert

logminer: memory size = 999m, checkpoint interval = 5000m


tranlogoptions integratedparams(max_sga_size 2000, parallelism 12)

2)减少ogg ckpt保留时间

本次ogg19.1 for db 19.3 ogg有7个抽取进程,默认保留时间是7天,调整为5天!

select /*  parallel(12,a) */ count(*) from system.logmnr_restart_ckpt$ a;


【利来手机国际的版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:[email protected]进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。