
Common Database Alert Items

Time: 2020-02-12 10:38:33

Original article by Hehuyi_In, last published 2019-01-04 10:14:08

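A summary of the error numbers encountered, with their causes, fixes, and reference documents
Database type     Error code     Cause     Reference     Recommendation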
Oracle     ORA-609     

The ORA-609 error is thrown when a client connection of any kind failed to complete or aborted the connection process before the connection/authentication process was complete.


Very often, this connection abort is due to a timeout.  Beginning with 10gR2, a default value for inbound connect timeout has been set at 60 seconds.  This time limit is often inadequate for the entire connection process to complete.
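    Doc ID 1116960.1     

1. Setting SQLNET.INBOUND_CONNECT_TIMEOUT to 120 seconds (the default is 60) should resolve most ORA-609 errors.

2. If the error occurs while the instance is shutting down, it can be ignored.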
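Oracle     ORA-46268: Conflicting operation on audit table(s)     Session was interrupted by CTRL+C     https://nazim-dba.blogspot.com/2018/06/error-sql-begin-dbmsauditmgmt.html     Can be ignored
Oracle     ORA-00700[kskvmstatact]     Swapping occurred     Doc ID 1919850.1     Can be ignored
Oracle     ORA-00600[1433], [60]     May involve the archiver, MMON, or DBWR processes, or RAC. In the case encountered it was the MMON process, hitting Bug 13541842     Doc ID 13541842.8     

After the database goes down, a plain startup usually brings it back.

This bug has no workaround; the corresponding patch must be applied.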
Oracle     ORA-03170: deadlocked on readable physical standby (undo segment 401)     

Issue has been reported in   Bug 25883955 - QUERIES FAIL WITH ORA-3170 ON ACTIVE DATA GUARD  --> Closed as Duplicate of  Unpublished Bug 24578056


This undo segment does not exist anymore on primary database.
    Doc ID 2311894.1     

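Workaround

1. Restart the standby and check whether the error recurs.

2. If TEMP_UNDO_ENABLED is enabled on the primary, set it to false.

3. Ask the application team to run the failing SQL on the primary instead.

 

Solution

Apply patch 24578056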
Oracle     ORA-235     

Concurrent update activity on a control file caused a process to read inconsistent information from the control file without a lock.
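    Doc ID 2312580.1     Can be ignored
Oracle     ORA-07286: sksagdi: cannot obtain device information     *Cause:  Stat on the log archiving device failed.
Archiving cannot proceed, or archived logs cannot be shipped to the standby     Doc ID 316281.1     Find out why archiving fails; the alert log usually shows the detailed error. Possible causes include a wrong LOG_ARCHIVE_DEST setting or a full archive directory (or FRA) on the primary or the standby. Handle it according to the cause found.
Oracle     ORA-48913     MAX_DUMP_FILE_SIZE is set too low and the trace file has filled up     Doc ID 1153040.1     Depends on the situation: increase MAX_DUMP_FILE_SIZE or set it to unlimited; if the value is already large enough, the error can be ignored.
Oracle     ORA-3136     Starting with 10.2.0.1, the parameter SQLNET.INBOUND_CONNECT_TIMEOUT defaults to 60 seconds. If a client cannot authenticate within 60 seconds, a warning appears in the alert log and the client connection is terminated.     Doc ID 2331569.1     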

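Usually the INBOUND CONNECT TIMEOUT value needs to be raised for both the listener and the database. The usual advice is to set the database value (sqlnet.ora) slightly higher than the listener value (listener.ora).

 

A large number of these errors can also appear when Oracle is under heavy load.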
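As a rough illustration of the sqlnet.ora / listener.ora advice, the Python sketch below reads the text of both files and checks that the database-side timeout is higher than the listener-side one. SQLNET.INBOUND_CONNECT_TIMEOUT comes from the table; the listener parameter is assumed to follow the INBOUND_CONNECT_TIMEOUT_<listener_name> naming, and the helper names and the sample values (120 and 110) are made up for the example.

```python
import re


def read_timeout(config_text, pattern):
    """Return the first integer assigned to a parameter matching `pattern`, or None."""
    for line in config_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        match = re.match(rf"({pattern})\s*=\s*(\d+)\s*$", line, re.IGNORECASE)
        if match:
            return int(match.group(2))
    return None


def check_timeouts(sqlnet_text, listener_text):
    """Compare the database-side and listener-side inbound connect timeouts."""
    db_value = read_timeout(sqlnet_text, r"SQLNET\.INBOUND_CONNECT_TIMEOUT")
    listener_value = read_timeout(listener_text, r"INBOUND_CONNECT_TIMEOUT_\w+")
    if db_value is None or listener_value is None:
        return "missing"
    if db_value > listener_value:
        return "ok"
    return "database value should exceed listener value"


# Hypothetical file contents, for illustration only.
sqlnet_sample = "SQLNET.INBOUND_CONNECT_TIMEOUT = 120\n"
listener_sample = "INBOUND_CONNECT_TIMEOUT_LISTENER = 110\n"
print(check_timeouts(sqlnet_sample, listener_sample))
```

A parameter that is not set explicitly is reported as "missing" here, so the script only flags what is written in the files.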
Oracle     ORA-01555     Snapshot too old           

Check whether the undo size is reasonable; if it is, the usual fix is to find and tune the offending SQL statement.
Oracle     ORA-00028     Session was killed     Doc ID 1230858.1     Can be ignored
Oracle     ORA-12012: error on auto execute of job "SYS"."ORA$AT_OS_OPT_SY_38"     The seed database was most likely not created correctly because the package procedure dbms_stats.init_package was not run.      Doc ID 2127675.1     

sqlplus / as sysdba

EXEC dbms_stats.init_package();
oracle     ORA-00600[pmuocon2#1:invalid magic number]     Bug in user-defined aggregate functions     Bug 21519686 - ORA-600 [pmuocon2#1: invalid magic num] from SQL using UDAG (user defined aggregate) (Doc ID 21519686.8)     

Choose one of the following:

1. Apply patch 21519686

2. Upgrade to 12.2

3. Set the hidden parameter _odci_aggregate_save_space = true
oracle     

ORA-00600:[qernsRowP], [1], [], [], [], [], [], [], [], [], [], []
    Possibly GROUP BY NOSORT hitting an Oracle bug, although the SQL encountered does not match the bug description in the MOS document     Doc ID 285913.1     

 Alter session set events '10119 trace name context forever, level 1';

(Note that setting this event at the system level may affect database performance.)
oracle     ORA-1652     

Failed to allocate an extent of the required number of blocks for a temporary segment in the tablespace indicated.
    

Doc ID 1267351.1
NOTE:364417.1
    

Add a tempfile, or find the statements that use too much temp tablespace and tune them

 
oracle     CRS-10001 ACFS-9203:true     

ADVM/ACFS device drivers were installed/loaded.
    

oerr acfs 9203
    Can be ignored
 oracle     ORA-12751      

    There is a database issue that prevents the slave action from completing on time
    There is an issue related to how long the action being performed is taking such that it violates an internal run time policy (i.e. a run time policy violation). For example: A specific SQL or set of SQL statements in a particular thread of operation is taking too long to complete
    A thread of operation exceeds pre-determined CPU usage levels.

    

Doc ID 761298.1
     

When encountering this issue, check for other database issues at the time and investigate those as they could be slowing the actions such that limits are exceeded.

If the database is generally performing slowly then investigate the cause of the slowness, if the database hangs then investigate the hang
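oracle     ORA-03137: TTC protocol internal error : [12333] [254] [64] [49] [] [] [] []     A bind variable in a MERGE statement is too long     

One bind variable is too long

ORA-3137[12333] on a MERGE Statement using a Bind Variable Larger than 1000 Bytes (Doc ID 2307683.1)

 

Two bind variables are too long

Merge or Insert is Failing with ORA-3137 [12333] (Doc ID 2039740.1)
    

When one bind variable is too long, choose any of the following fixes:

    Upgrade to 18.1
    Apply patch 21616967
    Rewrite the MERGE statement as an INSERT or UPDATE

When two bind variables are too long:

Change the MERGE statement so that it does not use two overly long bind variables in the same SQL statement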
 oracle      ORA-01274: cannot add data file that was originally created as '/path/data01.dbf'     

 Automated standby file management was disabled, so an added file could not automatically be created on the standby.

 The error from the creation attempt is displayed in another message. The control file file entry for the file is 'UNNAMEDnnnnn'.
     Doc ID 739618.1     

 alter database create datafile '/oracle/product/GSIPRDGB/dbs/UNNAMED00210' as new;

ALTER SYSTEM SET STANDBY_FILE_MANAGEMENT=AUTO scope=both;

alter database recover managed standby database using current logfile disconnect;
oracle     ora-600[KCBGTCR_13]     On the standby, an exception occurs in the internal function kcbgcur() while tablespace metadata is being checked. The System Change Number (SCN) is inconsistent: the standby's own SCN and the SCN synchronized from the primary do not agree.     

SR 3-17107383911

Doc ID 18899974.8
    DBA intervention required: restart the standby. If it is hung and sqlplus cannot log in, see "(2) Abnormal restart: case 2" in "Oracle restart steps"
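Oracle     ORA-04030(koh-kghu sessi,pmuccst: adt/record)     A single process used more than 4 GB of memory (the default limit)     Doc ID 1325100.1     

DBA intervention required: find the process whose memory use exceeds the limit.

The following parameters can be changed to raise the per-process memory limit to 16 GB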

For versions 11.2.0.4 and lower:
_use_realfree_heap=TRUE
_realfree_heap_pagesize_hint = 262144

For 12.1 and higher:
_use_realfree_heap=TRUE
_realfree_heap_pagesize = 262144
Oracle     ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired     

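1. While DML is running, the target data is already locked. When another session then tries to acquire a lock on the same object, the new session either hangs or raises ORA-00054.


2. A DDL statement is run on an object while DML on the same object is still in progress. For example, if an UPDATE transaction has not yet committed and another session starts SQL that changes the table structure or alters an index, ORA-00054 is often raised.
    Doc ID 1945579.1     

Find the statement being run by the session that holds the resource, and check with the developers whether the session can be killed to release it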
Oracle     

ORA-27300: OS system dependent operation:fork failed with status: 11


ORA-27301: OS failure message: Resource temporarily unavailable


ORA-27302: failure occurred at: skgpspawn5
    The error messages indicate that Oracle has a problem forking more processes; the maximum number of processes allowed per user could be too low     Doc ID 392006.1     

① Adjust the nproc limit for the ora user in /etc/security/limits.conf

② Modify /etc/security/limits.d/90-nproc.conf. This file is new in Linux 6 and does not exist in Linux 5.

RHEL 6 introduced the configuration file /etc/security/limits.d/90-nproc.conf.
The nproc value in /etc/security/limits.d/90-nproc.conf only constrains the effective nproc value when the limit is set for all users with the * wildcard; a limit set for one specific user is not affected by the value in that file.
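To see which limit a process actually ends up with, a small Python sketch like the following can help. It lists the nproc lines found in limits-style text and reads the running process's own limit through the standard resource module (Unix only). The user name oracle and the values in the sample text are illustrative assumptions, and the parser only lists entries; it does not reproduce the precedence rules between the two files.

```python
import resource


def nproc_entries(limits_text):
    """Return (domain, limit_type, value) for every nproc line in limits-style text."""
    entries = []
    for line in limits_text.splitlines():
        fields = line.split("#", 1)[0].split()  # strip comments, split columns
        if len(fields) == 4 and fields[2] == "nproc":
            entries.append((fields[0], fields[1], fields[3]))
    return entries


def current_nproc_limit():
    """Return the (soft, hard) process-count limits of the running process."""
    return resource.getrlimit(resource.RLIMIT_NPROC)


if __name__ == "__main__":
    # Illustrative content only; real files live under /etc/security/.
    sample = "*       soft  nproc  1024\noracle  soft  nproc  16384\n"
    print(nproc_entries(sample))
    print(current_nproc_limit())
```

Running it as the database owner shows the limit that new processes of that user inherit, which is the number that matters when the fork failure appears.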
Oracle     ORA-07445: exception encountered: core dump [qecinisub()+60] [SIGSEGV][ADDR:0x8A8] [PC:0xCCDF30C] [Address not mapped to object] []     Hit Oracle bug 21522416; the problem does not necessarily recur     

SR 3-17358806271

Bug 21522416 (internal document, not accessible)
    

 alter system set optimizer_dynamic_sampling=0;

If the problem persists, contact Oracle Support
Oracle     

ORA-16401: archive log rejected by Remote File Server (RFS)

ORA-16055: FAL request rejected
    The primary switches logs too frequently     Doc ID 1243177.1     

1. Ignore these Messages as long as the Standby Database keeps synchronized with the Primary Database

2. Increase the Size of the Online Redologs to reduce Redolog Switch Frequency

3. Increase Network Bandwidth between the Primary and Standby Database
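Before resizing the redo logs it helps to know how often switches really happen. The Python sketch below buckets switch timestamps by hour. It assumes the timestamps have already been fetched from somewhere such as V$LOG_HISTORY.FIRST_TIME, and the sample times are invented.

```python
from collections import Counter
from datetime import datetime


def switches_per_hour(switch_times):
    """Count log switches per clock hour, returned in chronological order."""
    counts = Counter(
        t.replace(minute=0, second=0, microsecond=0) for t in switch_times
    )
    return sorted(counts.items())


# Invented sample data standing in for real switch times.
sample = [
    datetime(2019, 1, 4, 10, 1),
    datetime(2019, 1, 4, 10, 9),
    datetime(2019, 1, 4, 10, 40),
    datetime(2019, 1, 4, 11, 15),
]
for hour, count in switches_per_hour(sample):
    print(hour.strftime("%Y-%m-%d %H:00"), count)
```

Hours with many switches are the ones to look at when deciding how much larger the online redo logs should be.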
Oracle     PRVG-1101 : SCAN name "cluscan.us.oracle.com" failed to resolve
PRVF-4664 : Found inconsistent name resolution entries for SCAN name "cluscan.us.oracle.com"
PRVF-4657 : Name resolution setup check for "scanclunm" (IP address: 10.4.0.202) failed     

Cause 1. SCAN name is expected to be resolved by local hosts file instead of DNS or GNS

Cause 2. nslookup fails to find record for SCAN name

Cause 3. SCAN name is canonical name(CNAME record) in DNS
    Doc ID 887471.1     If the SCAN name was resolved through the hosts file rather than DNS or GNS when RAC was installed, this error can be ignored
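Whether the SCAN name really is served from the hosts file can be checked with a few lines of Python. The sketch parses hosts-format text and returns the addresses mapped to a name. The sample line pairs the SCAN name and the IP address from the messages in this row purely for illustration.

```python
def hosts_addresses(hosts_text, name):
    """Return the IP addresses that hosts-format text maps to `name`."""
    wanted = name.lower()
    addresses = []
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # strip comments, split on whitespace
        if len(fields) >= 2 and wanted in (alias.lower() for alias in fields[1:]):
            addresses.append(fields[0])
    return addresses


# Illustrative hosts content; on a real node read /etc/hosts instead.
sample_hosts = """
127.0.0.1    localhost
# SCAN entry (illustrative)
10.4.0.202   cluscan.us.oracle.com cluscan
"""
print(hosts_addresses(sample_hosts, "cluscan.us.oracle.com"))
```

An empty result means the name is not in the hosts file, in which case the verification errors should not be dismissed as a hosts-file setup.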
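Oracle     ORA-04031: unable to allocate 65576 bytes of shared memory ("shared pool","unknown object","ktli log buf s","ktli log bufs")     

ORA-04031 generally has two possible causes:

1. Memory is heavily fragmented, so no contiguous chunk is available when an allocation is requested

2. The shared pool is too small
          

The first cause usually has to be addressed on the development side, for example by using more bind variables, shortening SQL statements, and reducing hard parses

The second requires adjusting the SGA and shared_pool parameters, and possibly adding memory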
Oracle     ORA-00600: internal error code, arguments: [ktecgsc:objdchk_kcbgcur_3]     

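The likely cause is that truncates ran concurrently or too fast, so a CR (consistent read) check was performed before the segment header information had been updated. The check is actually unnecessary: 11.2.0.3 performs it, and 11.2.0.4 removed it.
    

SR 3-17769074971

Doc ID 2101512.1

Doc ID 2230425.1
    

Choose one of the following:

    Upgrade to 11.2.0.4 or later
    Apply patch 15974138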

Oracle     ORA-02063: preceding 2 lines from MES           

https://blog.csdn.net/haiross/article/details/47275965

http://blog.itpub.net/27042095/viewspace-751953/
    
oracle     Ocssd.Bin Process Consumes 100% CPU in only one node of the RAC OneNode environment     Bug 22986384 - OCSSD threads 7,8 and 13 are using large amount of cpu     

Doc ID 22986384.8

SR 3-16564214191
    Apply Patch 22986384: THREADS 7, 8, AND 13 OF OCSSD.BIN ARE USING LARGE AMOUNT OF CPU
Oracle     

ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:sendmsg failed with status: 105
ORA-27301: OS failure message: No buffer space available
ORA-27302: failure occurred at: sskgxpsnd2
    This happens because too little space is available for network buffer reservation.     

Doc ID 2041723.1

SR 3-17893167421

 
    
Oracle     ORA-00600: internal error code, arguments: [ktecgsc:objdchk_kcbgcur_3], [1654385], [4], [0], [0], [], [], [], [], [], [], []     

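Judging from the error reports, most occurrences were preceded by a truncate statement. When a table is truncated, its object id stays the same but its data object id changes.

The information recorded in the segment header and the information recorded in the buffer cache became inconsistent for some reason, so the check raised an error.
The likely cause is that truncates ran concurrently or too fast, so a CR (consistent read) check was performed before the segment header information had been updated. The check is actually unnecessary: 11.2.0.3 performs it, and 11.2.0.4 removed it.     

SR 3-17769074971
    Apply Patch 15974138: UNNECESSARY FORCE CR SEGMENT HEADER REQUEST
Oracle     ORA-00904: "WM_CONCAT": invalid identifier     WM_CONCAT was removed in 12c; use LISTAGG instead     Doc ID 2215183.1     Use LISTAGG instead
oracle     ORA-07445: exception encountered: core dump [kghalp()+51] [SIGSEGV] [ADDR:0x7FC0FF4671DB] [PC:0xCD890C3] [Address not mapped to object] []      The error seems to be hit when allocating memory from the SQL Costing code as we have kkeutlCompHistActVals() -> qksshMemAllocPerm()      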

Bug 21522416 - ORA-7445 [KGHALP()+51] [SIGSEGV]
SR 3-17358806271 : What caused ORA-07445:[kghalp()+51] [SIGSEGV] error
    

alter system set optimizer_dynamic_sampling=0;

 

The parameter OPTIMIZER_DYNAMIC_SAMPLING controls the level of sampling performed by the optimizer. The only impact of setting it to 0 will be that at the run time (while an SQL is running), if there is a better plan available, it won't switch to the better plan, it will rather stick to the present plan.
Oracle     ORA-28040: No matching authentication protocol     9i clients are not supported with Oracle Database 12     

Doc ID 2111118.1

https://blog.csdn.net/ddd306/article/details/42805959
    

Choose one of the following:

1. Upgrade JDBC to 11g or 12c (recommended by the official documentation)

2. Add SQLNET.ALLOWED_LOGON_VERSION=8 at the end of the sqlnet.ora file
Oracle     ORA-08102:index key not found,obj#57848,file 6, block 6324(2)     

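ORA-08102 indicates that an index and its table are inconsistent: the key values recorded in the index do not match the data in the table, so access fails.

 

In the case encountered, an online index creation happened to coincide with a run of the CLEANUP_ONLINE_IND_BUILD job. Possibly the online build creates some temporary extents that CLEANUP_ONLINE_IND_BUILD then cleans up, so although create index online reports success, the resulting index is actually broken.
Later queries go looking for the extents the job removed and therefore cannot find the index entries.

 

According to the official documentation, this error may also be related to bug 21532755
    

Doc ID 8102.1

https://www.linuxidc.com/Linux/2014-11/109648.htm

Doc ID 21532755.8

https://chandlerdba.com/2017/05/12/online-index-rebuild-problem-in-12c/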
    

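Method 1:

SELECT owner, object_name, object_type
FROM Dba_Objects WHERE object_id IN (57848);

This returns the object_name; then rebuild the index

alter index PK_TB_WARE rebuild online; (be sure to use rebuild online, because it re-reads the table to build the index, whereas a plain rebuild may read the existing index segment instead of the table)

Note: do not interrupt the rebuild manually at this point, or you will hit ORA-08104

 

If that still does not fix it, drop the index and create it again

drop index PK_TB_WARE;

create index PK_TB_WARE on tb_ware(id);

 

Method 2:

analyze table t validate structure cascade

This checks the integrity of the row data in the table and the structure of the table or index, and writes the analysis results to the INDEX_STATS data dictionary view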
Oracle     

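Error encountered during a control file backup

ORA-00230: operation disallowed: snapshot control file enqueue unavailable
    A control file backup needs to hold the CF enqueue, but it is currently held by another session (usually another RMAN process or the CKPT process)     http://ju.outofmemory.cn/entry/179736     

Find the session holding the CF enqueue

SELECT s.SID, USERNAME AS "User", PROGRAM, MODULE, ACTION, LOGON_TIME "Logon"
FROM V$SESSION s, V$ENQUEUE_LOCK l
WHERE l.SID = s.SID AND l.TYPE = 'CF'
-- AND l.ID1 = 0 AND l.ID2 = 2
;

 

Based on the session information returned, decide whether to kill it or wait for the current holder to finish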
Oracle     

ORA-00600 [KTSLU_PUA_REMCHK-1]

Breaks primary/standby synchronization
    

The assert is raised when trying to apply Redo for Secure LOB Segment. Redo was generated with changes introduced by fix for Bug:22905136. This fix was included in 12.1.0.2.170418DBBP. The error is generated because redo generated is not compatible with environments running on a release lower than  12.1.0.2.170418 DBBP or without patch:22905136 installed.

 

In the case encountered, the primary had 12.1.0.2.170418 DBBP applied and the standby did not, so the two sides were at different patch levels
    ORA-00600:[KTSLU_PUA_REMCHK-1] Could be generated after Applying April 2017 Database Bundle Patch (12.1.0.2.170418 DBBP) (Doc ID 2267842.1)     

Both sides must be at the same patch level: either patch the unpatched database or roll back the patched one. The first option is generally preferred

 
Oracle     ORA-07445[pesld10_Undo_XREF_Instance()+23]     

This issue is caused by a product defect.

It was investigated in:

     Bug 13554646 - ORA-7445 [PESLD10_UNDO_XREF_INSTANCE()+60]

which was ultimately closed as a duplicate of unpublished Bug 13429702.
    Error in the Alert Log: ORA-7445[pesld10_Undo_XREF_Instance()+4] (Doc ID 1456810.1)     

Method 1: upgrade to a release that contains the fix

 

Method 2: apply Patch 13429702
Oracle     ORA-00600: internal error code, arguments: [qcscbndv1], [65535]     The number of bind variables in the SQL exceeds Oracle's limit of 65535; before 11.2 this error was reported as ORA-7445[opiaba]     ORA-600[qcscbndv1], [65535, ORA-600[Kghssgfr2], ORA-600[17112] Instance Failure (Doc ID 1311230.1)     The application team needs to change the SQL
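One way the application side can change the SQL is to send the values in batches, so that no single statement comes near the 65535 bind limit. A minimal Python sketch, with a hypothetical batch size of 1000 and made-up helper names:

```python
MAX_BINDS = 65535  # limit quoted in the row above


def batches(values, size=1000):
    """Split `values` into lists of at most `size` items."""
    if not 0 < size <= MAX_BINDS:
        raise ValueError("size must be between 1 and MAX_BINDS")
    return [values[i:i + size] for i in range(0, len(values), size)]


def placeholders(count):
    """Build a named-bind list such as ':b0, :b1, :b2' for one batch."""
    return ", ".join(f":b{i}" for i in range(count))


ids = list(range(70000))  # more values than one statement may bind
parts = batches(ids)
print(len(parts), len(parts[-1]))
print(placeholders(3))
```

Each batch then goes into its own statement, and the caller merges the results; whether that fits depends on how the original statement used its binds.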
                        
                        
sqlserver     eventlog:BackupIoRequest::ReportIoError: write failure on backup device 'VNBU0-10424-14500-1538441467'. Operating system error 995(The I/O operation has been aborted because of either a thread exit or an application request.).     Insufficient memory on the NBU backup server caused the backup to fail           The next backup usually succeeds; if it does not, start one manually
sqlserver     

Error 8623

The query processor ran out of internal resources and could not produce a query plan. This is a rare event and only expected for extremely complex queries or queries that reference a very large number of tables or partitions. Please simplify the query. If you believe you have received this message in error, contact Customer Support Services for more information.
    

The SQL statement is too complex, for example too many values in an IN list or too many levels of nesting

 

Explicitly including an extremely large number of values (many thousands of values separated by commas) within the parentheses, in an IN clause can consume resources and return errors 8623 or 8632.
    https://docs.microsoft.com/en-us/sql/t-sql/language-elements/in-transact-sql?view=sql-server-2017     

Extended events can be used to find the offending statement

https://www.cnblogs.com/kerrycode/p/9860653.html

 

To work around this problem, store the items in the IN list in a table, and use a SELECT subquery within an IN clause.
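The workaround can be tried out without a SQL Server instance. The sketch below uses SQLite from the Python standard library as a stand-in, so the syntax is not T-SQL; the table names orders and wanted_ids are hypothetical. The shape is the same as described above: load the values into a table, then use a SELECT subquery inside IN instead of a literal list.

```python
import sqlite3


def fetch_orders_by_ids(conn, ids):
    """Look up orders whose id is in `ids` without building a literal IN list."""
    cur = conn.cursor()
    cur.execute("CREATE TEMP TABLE IF NOT EXISTS wanted_ids (id INTEGER PRIMARY KEY)")
    cur.execute("DELETE FROM wanted_ids")  # start from an empty lookup table
    cur.executemany(
        "INSERT OR IGNORE INTO wanted_ids (id) VALUES (?)", ((i,) for i in ids)
    )
    cur.execute(
        "SELECT id, name FROM orders "
        "WHERE id IN (SELECT id FROM wanted_ids) ORDER BY id"
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(i, f"order-{i}") for i in range(1, 50001)],
)
rows = fetch_orders_by_ids(conn, range(1, 50001, 5))
print(len(rows))
```

On SQL Server the lookup table would typically be a temporary table or a table-valued parameter, and the statement text stays the same size no matter how many values are loaded.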
————————————————
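Copyright notice: this is an original article by CSDN blogger "Hehuyi_In", licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.
Original link: https://blog.csdn.net/Hehuyi_In/article/details/85759200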


Source: https://www.cnblogs.com/yaoyangding/p/12297790.html
