一、逐步实现自动化的基本步骤

1. 手动巡检

最初阶段,多数DBA习惯于将常用的SQL语句总结记录下来,在日常巡检的时候,会在数据库中执行常用的SQL语句,然后以抓屏方式将结果复制出来,然后逐条检查分析。这种方法只适合于管理少量数据库,一旦数据库数量增加,工作量会非常大。

2. 脚本巡检

升级阶段,具备脚本语言开发功底的DBA会逐步将常用的SQL语句编制为固定SQL脚本,然后通过操作系统可执行的脚本语言(例如:KSH、BASH),通过Shell脚本去调用SQL脚本,并把执行结果写入日志文件,后期通过日志文件去进行检查分析。这种方法可管理较多数据库,但是需要频繁切换到不同的数据库服务器,容易误操作,尤其是操作系统平台不统一的场景。

3. 自动化巡检

最终阶段,具备Python、Expect等语言开发功底的DBA会逐步将常前面的成果物积淀下来,然后通过Python脚本的方式去自动调用各个系统的Shell脚本,从而批量完成所有数据库的巡检过程,并将所有数据库的巡检日志传输至集中位置。再通过文本过滤工具(例如:AWK)或者VBA对日志内容进行过滤筛选,最后对过滤后的内容进行最终分析。这种方法可以通过集中管理服务器对所有数据库服务器进行批量操作,并且消除了操作系统平台不一致带来的问题。

二、需要获取的数据库诊断信息

1. 基本信息

基本信息根据不同的数据库环境会有所差异,但是有些最基本的信息是必须的。如数据库名称、实例名称、唯一标识信息、系统版本、地址信息、数据库创建时间、数据库启动时间、数据库版本信息、数据库补丁信息、字符集、实例状态、数据库归档模式等。以下脚本(for Oracle)供参考:

SELECT name AS db_name,

dbid,

created,

platform_name,

db_unique_name,

log_mode,

guard_status,

force_logging,

flashback_on

FROM v$database;

SELECT instance_name,

inst_id,

host_name,

TO_CHAR(startup_time, 'YYYY/MM/DD HH24:MI:SS') AS startup_time,

status,

SYSDATE - STARTUP_TIME AS Running_Time

FROM gv$instance

ORDER BY inst_id;

2. 存储信息

存储信息包括逻辑存储信息和物理存储信息,具体包含数据库的表空间、数据文件、磁盘三个维度。表空间需要收集表空间名称,类型,区管理方式,区初始大小,段的管理方式,总大小,剩余大小,剩余百分比;数据文件需要收集对应的表空间,是否自动扩展,当前值,最大值等;磁盘信息需要收集磁盘组的序号、名称、冗余类型、总大小、剩余大小、使用频繁的数据大小、镜像需要的空间、可用于镜像的大小、offline的磁盘数量等。以下脚本(for Oracle)供参考:

SELECT c.tablespace_name,

c.contents,

c.extent_management ext_mgmt,

c.allocation_type alloc_type,

c.initial_extent / 1024 ext_kb,

c.segment_space_management SSM,

nvl(a.total_gb, 0) total_gb,

decode(c.contents, 'TEMPORARY', nvl(a.total_gb, 0) - c.block_size * b.used_blocks / 1073741824, nvl(b.free_gb, 0)) free_gb,

100 * decode(c.contents, 'TEMPORARY', nvl(a.total_gb, 0) - c.block_size * b.used_blocks / 1073741824, nvl(b.free_gb, 0)) / nvl(a.total_gb, 1) free_pct

FROM (SELECT tablespace_name, nvl(sum(bytes) / 1073741824, 0) total_gb

FROM dba_data_files

GROUP BY tablespace_name

UNION all

SELECT tablespace_name, nvl(sum(bytes) / 1073741824, 0) total_gb

FROM dba_temp_files

GROUP BY tablespace_name) a,

(SELECT tablespace_name,

nvl(sum(bytes) / 1073741824, 0) free_gb,

0 used_blocks

FROM dba_free_space

GROUP BY tablespace_name

UNION all

SELECT tablespace_name,

0 free_gb,

nvl(sum(used_blocks), 0) used_blocks

FROM gv$sort_segment

GROUP BY tablespace_name) b,

dba_tablespaces c

WHERE c.tablespace_name = b.tablespace_name(+)

AND c.tablespace_name = a.tablespace_name(+)

ORDER BY c.contents, free_pct, c.tablespace_name;

SELECT file_id,

file_name,

tablespace_name,

autoextensible,

bytes / 1073741824 as current_gb,

maxbytes / 1073741824 as max_gb

FROM dba_data_files

UNION

SELECT file_id,

file_name,

tablespace_name as ts_name,

autoextensible,

bytes / 1073741824 as cur_gb,

maxbytes / 1073741824 as max_gb

FROM dba_temp_files

ORDER BY tablespace_name, file_id, file_name;

SELECT group_number,

name,

type,

total_mb,

free_mb,

hot_used_mb,

required_mirror_free_mb,

usable_file_mb,

offline_disks

FROM v$asm_diskgroup_stat;

3. 日志信息

日志信息主要是对联机日志、快速恢复日志相关动态信息的统计分析。具体包括联机日志的组号、线程号、序列号、文件路径、大小、状态、归档状态、组内成员数量;归档日志产生日期,产生的大小,和文件数等;每天日志切换的量以及切换的频率。以下脚本(for Oracle)供参考:

SELECT t2.member,

t1.group#,

t1.thread#,

t1.sequence#,

t1.bytes / 1024 / 1024 AS SIZE_MB,

t1.status,

t1.archived,

t1.members

FROM v$log t1, v$logfile t2

WHERE t1.group# = t2.group#

ORDER BY thread#, group#;

SELECT lpad(to_char(first_time, 'yyyymmdd'), 12) "DATE",

trunc(sum(blocks * block_size) / 1024 / 1024) "SIZE(MB)",

count(*)

FROM v$archived_log

WHERE first_time > sysdate - 6

AND creator = 'ARCH'

GROUP BY lpad(to_char(first_time, 'yyyymmdd'), 12)

ORDER BY 1;

SELECT SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH:MI:SS'),1,5) Day,

SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'00',1,0)) H00,

SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'01',1,0)) H01,

SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'02',1,0)) H02,

SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'03',1,0)) H03,

SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'04',1,0)) H04,

SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'05',1,0)) H05,

SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'06',1,0)) H06,

SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'07',1,0)) H07,

SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'08',1,0)) H08,

SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'09',1,0)) H09,

SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'10',1,0)) H10,

SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'11',1,0)) H11,

SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'12',1,0)) H12,

SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'13',1,0)) H13,

SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'14',1,0)) H14,

SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'15',1,0)) H15,

SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'16',1,0)) H16,

SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'17',1,0)) H17,

SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'18',1,0)) H18,

SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'19',1,0)) H19,

SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'20',1,0)) H20,

SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'21',1,0)) H21,

SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'22',1,0)) H22 ,

SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'23',1,0)) H23,

COUNT(*) TOTAL

FROM v$log_history a

WHERE first_time>=to_char(sysdate-7)

GROUP BY SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH:MI:SS'),1,5)

ORDER BY SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH:MI:SS'),1,5) DESC;

SELECT thread#,

sequence#,

to_char(first_time, 'MM/DD/RR HH24:MI:SS')

FROM v$log_history

WHERE thread# = 1

AND first_time > sysdate - 1

ORDER BY first_time DESC;

SELECT thread#,

sequence#,

to_char(first_time, 'MM/DD/RR HH24:MI:SS')

FROM v$log_history

WHERE thread# = 2

AND first_time > sysdate - 1

ORDER BY first_time DESC;

4. 备份信息

备份信息主要是查看数据库备份任务的执行情况。主要包括备份任务的会话信息标识,备份内容,开始时间,结束时间,消耗时间。以下脚本(for Oracle)供参考:

SELECT session_key,

start_time,

end_time,

status,

time_taken_display tt

FROM v$rman_backup_job_details

WHERE start_time > sysdate -1

ORDER BY session_key;

5. 性能信息

数据库性能诊断分析的基本目标分两个方面,一方面要看数据库本身跟性能有关的资源使用情况是否正常,另外一方面就是要抓取异常的会话、事件、SQL等。因此,这部分内容首先要收集数据库缓存使用情况,然后要收集异常会话以及相关进程的资源参数,24小时内CPU等待最长的事件,执行解析最多的SQL语句等。以下脚本(for Oracle)供参考:

SELECT free_space,

avg_free_size,

used_space,

avg_used_size,

request_failures,

last_failure_size

FROM v$shared_pool_reserved;

SELECT *

FROM (SELECT t.sid,

t.serial#,

trunc(sysdate - logon_time) AS online_time,

t.PROGRAM,

t.status,

t.LOGON_TIME,

t.sql_id,

t.prev_sql_id,

t.event

FROM gv$session t

WHERE t.type <> 'BACKGROUND' AND program is not null

ORDER BY logon_time)

WHERE rownum <= 30;

SELECT *

FROM gv$resource_limit

WHERE trim(limit_value) != 'UNLIMITED';

SELECT t.*, s.sid, s.serial#, s.machine, s.program, s.osuser

FROM (SELECT c.USERNAME, a.event, to_char(a.cnt) as seconds, a.sql_id, dbms_lob.substr(b.sql_fulltext, 100, 1) sqltext

FROM (SELECT rownum rn, t.*

FROM (SELECT decode(s.session_state, 'WAITING', s.event, 'Cpu + Wait For Cpu') Event, s.sql_id, s.user_id, count(*) cnt

FROM v$active_session_history s

WHERE sample_time > sysdate - 1

GROUP BY s.user_id, decode(s.session_state, 'WAITING', s.event, 'Cpu + Wait For Cpu'), s.sql_id

ORDER BY cnt DESC) t

WHERE rownum < 20) a, v$sqlarea b, dba_users c

WHERE a.sql_id = b.sql_id AND a.user_id = c.user_id

ORDER BY cnt DESC) t, v$session s

WHERE t.sql_id = s.sql_id(+);

SELECT *

FROM (SELECT sql_id, sql_text, s.executions, s.last_load_time, s.first_load_time, s.disk_reads, s.buffer_gets, s.parse_calls

FROM v$sql s

ORDER BY s.executions DESC)

WHERE rownum <= 10;

SELECT *

FROM (SELECT sql_id, sql_text, s.EXECUTIONS, s.LAST_LOAD_TIME, s.FIRST_LOAD_TIME, s.DISK_READS, s.BUFFER_GETS, s.PARSE_CALLS

FROM v$sql s

ORDER BY s.PARSE_CALLS DESC)

WHERE rownum <= 10;

6. 容灾信息

所谓容灾信息就是指数据库有容灾配置模式的场景,比如Oracle的Data Guard。如果有相关的配置,则需要检查主备库的同步是否异常,主要通过归档日志的同步信息来判断分析。这个时候需要获取归档目标名称、状态、数据库当前模式、目的地路径等静态信息,需要获取每个节点应用日志和归档日志的最大号,SCN最大值和最小值等相关信息。以下脚本(for Oracle)供参考:

SELECT dest_name,

status,

database_mode,

destination

FROM v$archive_dest_status

WHERE dest_id in ('1','2');

SELECT m.thread#,

m.sequence#,

first_change#,

next_change#

FROM v$log_history m,

(SELECT thread#, max(sequence#) as sequence#

FROM v$log_history

GROUP BY thread#) t

WHERE m.thread# = t.thread#

AND m.sequence# = t.sequence#;

SELECT UNIQUE thread# AS thread,

MAX(sequence#) OVER (PARTITION BY thread#) AS last

FROM v$archived_log;

三、实现自动化脚本集成

1. 将基本SQL文转换成可用SQL脚本

通过手动执行SQL命令的方式,查询结果可以展现在屏幕上。但是以脚本后台模式执行的时候,就需要将查询结果格式化之后输入到结果文件当中。可以采用文本文件或者HTML文件。采用文件文件的时候,需要对输出结果进行美观易读方面的格式化,想再升级为HTML文件时,SQL脚本中要加HTML头。例如以下是格式化之前和之后的对比:

PROMPT

XX公司数据库巡检报告

PROMPT

数据库基本信息

COLUMN log_mode FOR a12

COLUMN guard_status FOR a10

COLUMN force_logging FOR a15

COLUMN flashback_on FOR a15

COLUMN db_unique_name FOR a10

COLUMN platform_name FOR a20

SELECT name AS db_name,

dbid,

created,

platform_name,

db_unique_name,

log_mode,

guard_status,

force_logging,

flashback_on

FROM v$database;

2. 通过Shell脚本调用SQL脚本

如果单纯通过Shell脚本完成对SQL脚本的调用还是有些单薄。因此Shell脚本的设计需要完成两方面的功能,一方面需要收集操作系统相关的诊断信息,另外一方面要定义SQL脚本执行的初始化参数,完成对SQL脚本的调用执行。对于操作系统级别的诊断信息收集,无非是在Shell当中调用操作系统命令,如:uptime、vmstat、free、df、sar、iostat;对于SQL脚本的调用,需要完善相关初始化信息,如位置信息(脚本位置、输入文件位置、数据库安装目录等相关信息),时间信息(执行日期、时间),连接必要信息(用户、数据库名称、标识等)。以下调用过程脚本片段供参考:

function _dbinfo(){

cd $PWDDIRECTORY

su - oracle -c "cd $LOGDIR/db;export ORACLE_SID=$1;sqlplus -s /nolog <

conn / as sysdba;

@$LOGDIR/../dailyhealthycheck_withadg.sql

exit;

EOF"

}

3. 通过Python脚本实现集中自动化调用

对于这个步骤实现的功能有两个重点,一个是完成集中化的优势,一个是完成自动化交互的功能。集中化的优势需要在循环语句当中完成。自动化交互的功能本来就是Python之类脚本语言的优势。只是我们在调用的时候需要考虑的初始化参数的定义和确认。以下调用过程脚本片段供参考:

def ssh_cmd(ip, user, passwd, cmd):

ret = -1

ssh = pexpect.spawn('ssh %s@%s "%s"' % (user,ip, cmd),timeout=120)

try:

i = ssh.expect(['password:', 'continue connecting (yes/no)?'], timeout=5)

if i == 0 :

ssh.sendline(passwd)

elif i == 1:

ssh.sendline('yes\n')

ssh.expect('password: ')

ssh.sendline(passwd)

ssh.sendline(cmd)

r = ssh.read()

print (r)

ret = r

except pexpect.EOF:

print ("EOF")

ssh.close()

ret = -1

except pexpect.TIMEOUT:

print ("TIMEOUT")

ssh.close()

ret = -2

return ret

4. 工具化产品化打造过程

大部分数据库管理员设计脚本工具都是为了便于自己的日常管理工作,基本上不会考虑到工具的可维护性、健壮性、规范化以及未来的扩展性。如果想让自己辛苦设计出来的脚本最终可以转化为自动化运维工具,那么还要考虑到以下三个方面的问题:

(1). 遵循开发者规范,将脚本的设计趋向于标准化、模块化。例如脚本注释标准化,利用函数模块化设计,变量定义规范化。

(2). 每一个执行步骤要考虑到执行前的确认步骤,执行后的反馈步骤,执行中的异常处理场景。

(3). 所有参数输入尽量采用变量化设计,所有常量以文件方式隔离出脚本本身。

四、参考脚本

-- -----------------------------------------------------------------------------------

-- File Name : dailyhealthycheck_withadg_pdb.sql

-- Author : haizdl@126.com

-- Description : Daily Healthy Checking for Oracle Rac Database.

-- Requirements : Access to the V$ views.

-- Call Syntax : by script "healthyCheck_forLinux.sh" or "healthyCheck1_forAIX.sh"

-- Last Modified: 23/05/2017

-- -----------------------------------------------------------------------------------

SET MARKUP HTML ON SPOOL ON ENTMAP OFF PREFORMAT OFF

SET TERM OFF

SET HEADING ON

SET VERIFY OFF

SET FEEDBACK OFF

SET LINE 55555

SET PAGES 999999

SET LONG 999999999

SET LONGCHUNKSIZE 999999

-- -----------------------------------------------------------------------------------

-- SECTION: 巡检脚本初始化

-- -----------------------------------------------------------------------------------

COLUMN dbid new_value spool_dbid

COLUMN inst_num new_value spool_inst_num

SELECT dbid

FROM v$database

WHERE rownum = 1;

SELECT instance_number AS inst_num

FROM v$instance

WHERE rownum = 1;

COLUMN spoolfile_name new_value spoolfile

SELECT 'spool_'||(SELECT instance_name FROM v$instance WHERE rownum=1)||'_'||TO_CHAR(SYSDATE,'yy-mm-dd_hh24.mi')||'_daily' AS spoolfile_name

FROM dual;

spool &&spoolfile..html

PROMPT

XXX数据库日巡检报告

-- -----------------------------------------------------------------------------------

-- SECTION: 数据库基本信息

-- -----------------------------------------------------------------------------------

PROMPT

数据库基本信息汇总

/*

数据库标识、数据库名、创建日期、平台名称、唯一名称、归档模式等信息。

*/

PROMPT

数据库状态

COLUMN log_mode FOR a12

COLUMN guard_status FOR a10

COLUMN force_logging FOR a15

COLUMN flashback_on FOR a15

COLUMN db_unique_name FOR a10

COLUMN platform_name FOR a20

SELECT name AS db_name,

dbid,

created,

platform_name,

db_unique_name,

log_mode,

guard_status,

force_logging,

flashback_on

FROM v$database;

/*

实例的序号、名称、主机名、启动时间、状态、运行时间等。

*/

PROMPT

数据库实例基本信息

CLEAR COLUMNS

SET LINE 200

COLUMN host_name FOR A50

SELECT instance_name,

inst_id,

host_name,

TO_CHAR(startup_time, 'YYYY/MM/DD HH24:MI:SS') AS startup_time,

status,

SYSDATE - STARTUP_TIME AS Running_Time

FROM gv$instance

ORDER BY inst_id;

-- -----------------------------------------------------------------------------------

-- SECTION: 联机重做日志信息

-- -----------------------------------------------------------------------------------

PROMPT

redo信息

/*

联机日志的组号、线程号、序列号、文件路径、大小、状态、归档状态、组内成员数量。

*/

PROMPT

数据库联机日志

CLEAR COLUMNS

SET LINE 200

SET PAGES 1000

COL status FOR a30

COL member FOR a45

SELECT t2.member,

t1.group#,

t1.thread#,

t1.sequence#,

t1.bytes / 1024 / 1024 AS SIZE_MB,

t1.status,

t1.archived,

t1.members

FROM v$log t1, v$logfile t2

WHERE t1.group# = t2.group#

ORDER BY thread#, group#;

/*

(即可分析6天的波度,又可分析24小时内,可很容易看出异常情况)。

*/

PROMPT

最近7天中每天日志切换的量

SELECT SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH:MI:SS'),1,5) Day,

SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'00',1,0)) H00,

SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'01',1,0)) H01,

SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'02',1,0)) H02,

SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'03',1,0)) H03,

SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'04',1,0)) H04,

SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'05',1,0)) H05,

SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'06',1,0)) H06,

SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'07',1,0)) H07,

SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'08',1,0)) H08,

SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'09',1,0)) H09,

SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'10',1,0)) H10,

SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'11',1,0)) H11,

SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'12',1,0)) H12,

SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'13',1,0)) H13,

SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'14',1,0)) H14,

SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'15',1,0)) H15,

SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'16',1,0)) H16,

SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'17',1,0)) H17,

SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'18',1,0)) H18,

SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'19',1,0)) H19,

SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'20',1,0)) H20,

SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'21',1,0)) H21,

SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'22',1,0)) H22 ,

SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'23',1,0)) H23,

COUNT(*) TOTAL

FROM v$log_history a

WHERE first_time>=to_char(sysdate-7)

GROUP BY SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH:MI:SS'),1,5)

ORDER BY SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH:MI:SS'),1,5) DESC;

/*

日志切换频率分析(注意观察各行里first_time之间的时间差异会不会很短,很短就是切换过频繁)。

*/

PROMPT

日志切换频率分析

SET LINE 200

SET PAGES 1000

SELECT thread#,

sequence#,

to_char(first_time, 'MM/DD/RR HH24:MI:SS')

FROM v$log_history

WHERE thread# = 1

AND first_time > sysdate - 1

ORDER BY first_time DESC;

SELECT thread#,

sequence#,

to_char(first_time, 'MM/DD/RR HH24:MI:SS')

FROM v$log_history

WHERE thread# = 2

AND first_time > sysdate - 1

ORDER BY first_time DESC;

-- -----------------------------------------------------------------------------------

-- SECTION: 归档日志信息

-- -----------------------------------------------------------------------------------

PROMPT

归档日志信息

/*

归档日志产生日期,产生的大小,和文件数等。

*/

PROMPT

查询归档产生状况

SELECT lpad(to_char(first_time, 'yyyymmdd'), 12) "DATE",

trunc(sum(blocks * block_size) / 1024 / 1024) "SIZE(MB)",

count(*)

FROM v$archived_log

WHERE first_time > sysdate - 6

AND creator = 'ARCH'

GROUP BY lpad(to_char(first_time, 'yyyymmdd'), 12)

ORDER BY 1;

-- -----------------------------------------------------------------------------------

-- SECTION: 空间使用

-- -----------------------------------------------------------------------------------

PROMPT

空间信息

/*

表空间名称,类型,区管理方式,区初始大小,段的管理方式,总大小,剩余大小,剩余百分比。

*/

PROMPT

数据库表空间信息

CLEAR COLUMNS

COLUMN tablespace_name FOR a20

COLUMN contents FOR a9

COLUMN ext_mgmt FOR a12

COLUMN alloc_type FOR a9

COLUMN ext_kb FOR 9999999

COLUMN ssm FOR a10

COLUMN total_gb FOR 99999999.99

COLUMN free_gb FOR 99999999.99

COLUMN free_pct FOR 999.99

SET PAGES 100

SELECT c.tablespace_name,

c.contents,

c.extent_management ext_mgmt,

c.allocation_type alloc_type,

c.initial_extent / 1024 ext_kb,

c.segment_space_management SSM,

nvl(a.total_gb, 0) total_gb,

decode(c.contents, 'TEMPORARY', nvl(a.total_gb, 0) - c.block_size * b.used_blocks / 1073741824, nvl(b.free_gb, 0)) free_gb,

100 * decode(c.contents, 'TEMPORARY', nvl(a.total_gb, 0) - c.block_size * b.used_blocks / 1073741824, nvl(b.free_gb, 0)) / nvl(a.total_gb, 1) free_pct

FROM (SELECT tablespace_name, nvl(sum(bytes) / 1073741824, 0) total_gb

FROM dba_data_files

GROUP BY tablespace_name

UNION all

SELECT tablespace_name, nvl(sum(bytes) / 1073741824, 0) total_gb

FROM dba_temp_files

GROUP BY tablespace_name) a,

(SELECT tablespace_name,

nvl(sum(bytes) / 1073741824, 0) free_gb,

0 used_blocks

FROM dba_free_space

GROUP BY tablespace_name

UNION all

SELECT tablespace_name,

0 free_gb,

nvl(sum(used_blocks), 0) used_blocks

FROM gv$sort_segment

GROUP BY tablespace_name) b,

dba_tablespaces c

WHERE c.tablespace_name = b.tablespace_name(+)

AND c.tablespace_name = a.tablespace_name(+)

ORDER BY c.contents, free_pct, c.tablespace_name;

PROMPT

数据库表空间与数据文件

/*

表空间文件对应的表空间,是否自动扩展,当前值,最大值。

*/

SET LINE 200

SET PAGES 100

COLUMN file_name FOR A65

COLUMN tablespace_name FOR A30

SELECT file_id,

file_name,

tablespace_name,

autoextensible,

bytes / 1073741824 as current_gb,

maxbytes / 1073741824 as max_gb

FROM dba_data_files

UNION

SELECT file_id,

file_name,

tablespace_name as ts_name,

autoextensible,

bytes / 1073741824 as cur_gb,

maxbytes / 1073741824 as max_gb

FROM dba_temp_files

ORDER BY tablespace_name, file_id, file_name;

-- -----------------------------------------------------------------------------------

-- SECTION: 存储信息

-- -----------------------------------------------------------------------------------

PROMPT

ASM信息

/*

磁盘组的序号、名称、冗余类型、总大小、剩余大小、使用频繁的数据大小、镜像需要的空间、可用于镜像的大小、offline的磁盘数量。

*/

PROMPT

查询ASM磁盘组信息

CLEAR COLUMNS

COLUMN group_number FOR 9999999999

COLUMN name FOR A20

COLUMN type FOR A20

COLUMN total_mb FOR 9999999999

COLUMN free_mb FOR 9999999999

COLUMN hot_used_mb FOR 9999999999

COLUMN required_mirror_free_mb FOR 9999999999

COLUMN usable_file_mb FOR 9999999999

COLUMN offline_disks FOR 9999999999

SELECT group_number,

name,

type,

total_mb,

free_mb,

hot_used_mb,

required_mirror_free_mb,

usable_file_mb,

offline_disks

FROM v$asm_diskgroup_stat;

/*

磁盘组对应磁盘的信息。

*/

PROMPT

查询ASM磁盘组存储信息

CLEAR COLUMNS

COLUMN g_number FOR 9999

COLUMN d_number FOR 9999

COLUMN g_name FOR A8

COLUMN d_name FOR A15

COLUMN g_total_mb FOR 99999999

COLUMN d_total_mb FOR 99999999

COLUMN path FOR A30

COLUMN h_status FOR A8

COLUMN g_free_mb FOR 99999999

COLUMN d_free_mb FOR 99999999

SELECT g.group_number g_number,

g.name g_name,

g.total_mb g_total_mb,

g.free_mb g_free_mb,

g.state g_state,

d.disk_number d_number,

d.name d_name,

d.path,

d.total_mb d_total_mb,

d.free_mb d_free_mb,

d.header_status h_status,

d.mode_status m_status

FROM v$asm_diskgroup_stat g, v$asm_disk_stat d

WHERE g.group_number = d.group_number

ORDER BY g.group_number, d.disk_number;

-- -----------------------------------------------------------------------------------

-- SECTION: 闪回区信息

-- -----------------------------------------------------------------------------------

PROMPT

闪回信息

/*

文件类型、空间使用百分比、可回收百分比、文件的数量。

*/

PROMPT

闪回区空间使用状况

CLEAR COLUMNS

SET LINE 100

SELECT *

FROM v$flash_recovery_area_usage;

-- -----------------------------------------------------------------------------------

-- SECTION: 备份信息

-- -----------------------------------------------------------------------------------

PROMPT

数据库备份信息

/*

会话信息标识,备份内容,开始时间,结束时间,消耗时间。

*/

PROMPT

查询备份信息

CLEAR COLUMNS

SET PAGES 1000

COLUMN time_taken_display FOR a10

COLUMN start_time FOR a20

COLUMN end_time FOR a20

COLUMN status FOR a15

COLUMN tt FOR a10

SELECT session_key,

start_time,

end_time,

status,

time_taken_display tt

FROM v$rman_backup_job_details

WHERE start_time > sysdate -1

ORDER BY session_key;

-- -----------------------------------------------------------------------------------

-- SECTION: 会话信息

-- -----------------------------------------------------------------------------------

PROMPT

session情况

/*

查询登录异常的会话。

*/

PROMPT

登录时间最长的会话

SET LINE 500

CLEAR COLUMNS

COLUMN spid for a15

COLUMN sql_id for a20

COLUMN program for a40

COLUMN event for a35

SELECT *

FROM (SELECT t.sid,

t.serial#,

trunc(sysdate - logon_time) AS online_time,

t.PROGRAM,

t.status,

t.LOGON_TIME,

t.sql_id,

t.prev_sql_id,

t.event

FROM gv$session t

WHERE t.type <> 'BACKGROUND' AND program is not null

ORDER BY logon_time)

WHERE rownum <= 30;

PROMPT

查看进程资源限制

/*

资源所在的实例,资源的名称,当前大小,上次启动以来的最大值,初始大小

*/

CLEAR COLUMNS

SET PAGES 100

SET LINE 200

COLUMN resource_name FOR a30

COLUMN current_utilization FOR 9999999

COLUMN max_utilization FOR 9999999

COLUMN initial_allocation FOR a18

COLUMN limit_value FOR a15

SELECT *

FROM gv$resource_limit

WHERE trim(limit_value) != 'UNLIMITED';

-- -----------------------------------------------------------------------------------

-- SECTION: 等待事件

-- -----------------------------------------------------------------------------------

prompt 24小时内CPU等待最长的事件

CLEAR COLUMNS

SET PAGES 100

SET LINE 800

COLUMN user_id for a45

COLUMN SQL_ID for a30

COLUMN Event for a45

COLUMN sql_id for a45

COLUMN user_id for a45

COLUMN cnt for a45

COLUMN machine for a45

COLUMN program for a45

SELECT t.*, s.sid, s.serial#, s.machine, s.program, s.osuser

FROM (SELECT c.USERNAME, a.event, to_char(a.cnt) as seconds, a.sql_id, dbms_lob.substr(b.sql_fulltext, 100, 1) sqltext

FROM (SELECT rownum rn, t.*

FROM (SELECT decode(s.session_state, 'WAITING', s.event, 'Cpu + Wait For Cpu') Event, s.sql_id, s.user_id, count(*) cnt

FROM v$active_session_history s

WHERE sample_time > sysdate - 1

GROUP BY s.user_id, decode(s.session_state, 'WAITING', s.event, 'Cpu + Wait For Cpu'), s.sql_id

ORDER BY cnt DESC) t

WHERE rownum < 20) a, v$sqlarea b, dba_users c

WHERE a.sql_id = b.sql_id AND a.user_id = c.user_id

ORDER BY cnt DESC) t, v$session s

WHERE t.sql_id = s.sql_id(+);

-- -----------------------------------------------------------------------------------

-- SECTION: 内存信息

-- -----------------------------------------------------------------------------------

PROMPT

04031报错

PROMPT

04031报错(出错时,检查shared_pool_reserved池是否有报错)

SELECT free_space,

avg_free_size,

used_space,

avg_used_size,

request_failures,

last_failure_size

FROM v$shared_pool_reserved;

-- -----------------------------------------------------------------------------------

-- SECTION: SQL信息

-- -----------------------------------------------------------------------------------

PROMPT

当前SQL

PROMPT

执行次数最多

SET LINE 300

SELECT *

FROM (SELECT sql_id, sql_text, s.executions, s.last_load_time, s.first_load_time, s.disk_reads, s.buffer_gets, s.parse_calls

FROM v$sql s

ORDER BY s.executions DESC)

WHERE rownum <= 10;

PROMPT

解析次数最多

SET LINE 300

SELECT *

FROM (SELECT sql_id, sql_text, s.EXECUTIONS, s.LAST_LOAD_TIME, s.FIRST_LOAD_TIME, s.DISK_READS, s.BUFFER_GETS, s.PARSE_CALLS

FROM v$sql s

ORDER BY s.PARSE_CALLS DESC)

WHERE rownum <= 10;

-- -----------------------------------------------------------------------------------

-- SECTION: ADG

-- -----------------------------------------------------------------------------------

PROMPT

ADG信息

/*

归档目标名称、状态、数据库当前模式、目的地路径

*/

PROMPT

归档目的地状态

column dest_name FOR a20

column status FOR a10

column database_mode FOR a20

column destination FOR a20

SELECT dest_name,

status,

database_mode,

destination

FROM v$archive_dest_status

WHERE dest_id in ('1','2');

/*

每个节点,应用日志的最大号,和scn最大值和最小值

*/

PROMPT

查询最大的应用日志号

CLEAR COLUMNS

SET LINE 150

SELECT m.thread#,

m.sequence#,

first_change#,

next_change#

FROM v$log_history m,

(SELECT thread#, max(sequence#) as sequence#

FROM v$log_history

GROUP BY thread#) t

WHERE m.thread# = t.thread#

AND m.sequence# = t.sequence#;

/*

查询每个节点最大的归档日志号

*/

PROMPT

查询最大的归档日志号

SET LINE 100

SELECT UNIQUE thread# AS thread,

MAX(sequence#) OVER (PARTITION BY thread#) AS last

FROM v$archived_log;

/*

PROMPT

查询GAP

备库缺少日志的起始号码

clear columns

set line 200

SELECT THREAD#,LOW_SEQUENCE#,HIGH_SEQUENCE# FROM V$ARCHIVE_GAP;

*/

PROMPT
Back to Top

PROMPT

End of Report

PROMPT

SPOOL OFF

exit;

#!/bin/bash

#----------------------------------------------------------------------------------------------------+

# HAIZDL@126.COM |

#----------------------------------------------------------------------------------------------------|

# COPYRIGHT (C) 1998-2015 HAIZDL ALL RIGHTS RESERVED. |

#----------------------------------------------------------------------------------------------------|

# DATABASE : ORACLE |

# FILE : healthyCheck1_forLinux.sh |

# CLASS : DATABASE ADMINISTRATION |

# PURPOSE : COLLECTING LOGS FOR ORACLE DATABASE DAILY CHECKING. |

# NOTE : AS WITH ANY CODE, ENSURE TO TEST THIS SCRIPT IN A DEVELOPMENT |

# ENVIRONMENT BEFORE ATTEMPTING TO RUN IT IN PRODUCTION. |

#----------------------------------------------------------------------------------------------------+

#----------------------------------------------

#CHECK JOB OPERATOR USER INFO.

#----------------------------------------------

function _userVerify(){

USER=`whoami`

if [ "$USER" != "root" ]; then

echo "PLS USE ROOT USER FOR THIS JOB."

echo "USAGE: $0"

exit 1

fi

}

#----------------------------------------------

#BASIC VARIABLE.

#----------------------------------------------

HOST=`hostname`

PWDDIRECTORY=/tmp/ora

DATESTRING=`date '+%Y%m%d'`

LOGDIR=$PWDDIRECTORY/$DATESTRING$HOST

ASMSID=+ASM1

#LOG DIRECTORY PARAMETER

OINSTALLDIRECTORY=/home/u01/app

RDBMSDIRECTORY=$OINSTALLDIRECTORY/oracle/diag/rdbms

GRIDLOGDIRECTORY=$OINSTALLDIRECTORY/11.2.0.4/grid/log

GRIDDIRECTORY=$OINSTALLDIRECTORY/grid

ALERTOFCLUSTER=$GRIDLOGDIRECTORY/$HOST/alert$HOST.log

ALERTOFLISTENER=$GRIDDIRECTORY/diag/tnslsnr/$HOST/listener/trace/listener.log

ALERTOFLISTENERSCAN=$GRIDLOGDIRECTORY/diag/tnslsnr/$HOST/listener_scan1/trace/listener_scan1.log

ALERTOFASM=$GRIDDIRECTORY/diag/asm/+asm/$ASMSID/trace/alert_$ASMSID.log

OPATCHORACLEDIR=$OINSTALLDIRECTORY/oracle/product/11.2.0.4/dbhome_1/OPatch

OPATCHGRIDDIR=$OINSTALLDIRECTORY/11.2.0.4/grid/OPatch

#DIAGNOSTIC FILES

LOGOFOSINFO=$LOGDIR/os/info_os_`date '+%y%m%d%H%M'`

LOGOFCLUSTER=$LOGDIR/cluster/info_cluster_`date '+%y%m%d%H%M'`

DBFILE=$LOGDIR/dbs.tmp

#----------------------------------------------

#CREATE LOG INFO DIRECTORY.

#----------------------------------------------

function _initialization(){

if [ -d $LOGDIR ]; then

echo "INIT: LOG DIRECTORY EXISTS, DELETING."

rm -rf $LOGDIR

fi

echo "INIT: LOG DIRECTORY CREATING."

mkdir $LOGDIR

mkdir $LOGDIR/os

mkdir $LOGDIR/cluster

mkdir $LOGDIR/db

chmod -R 777 $LOGDIR

if [ $? -ne 0 ]; then

echo "INIT: LOG DIRECTORY CREATING ERROR!"

exit 1

fi

}

#----------------------------------------------------------------------------------------------------|

# 1. CLUSTER LOG. |

#----------------------------------------------------------------------------------------------------|

function _clusterlog(){

cd $PWDDIRECTORY

if [ -e $ALERTOFCLUSTER ]; then

ALERTOFCLUSTERNUM=$(($(awk '{print NR}' $ALERTOFCLUSTER |tail -n1) - $(cat -n $ALERTOFCLUSTER| grep "`date -d last-day +%Y-%m-%d|cut -c 1-10`" | awk 'BEGIN {min=99999999; max=-1;} {if(min>$1) min=$1; if(max<$1) max=$1;} END {printf("%d", min);}') +1))

if [ $ALERTOFCLUSTERNUM -gt 0 ]; then

tail -$ALERTOFCLUSTERNUM $ALERTOFCLUSTER >>$LOGDIR/cluster/alert$HOST_`date '+%y%m%d'`.log

if [ $? -ne 0 ]; then

echo " JOB-1-1: CLUSTER ALERT LOG COPY ERROR!"

exit 1

fi

fi

fi

if [ -e $ALERTOFLISTENER ]; then

ALERTOFLISTENERNUM=$(($(awk '{print NR}' $ALERTOFLISTENER |tail -n1) - $(cat -n $ALERTOFLISTENER| grep "`date -d last-day "+%d-%^b-%Y"`" | awk 'BEGIN {min=9999999; max=-1;} {if(min>$1) min=$1; if(max<$1) max=$1;} END {printf("%d", min);}') +1))

if [ $ALERTOFLISTENERNUM -gt 0 ]; then

tail -$ALERTOFLISTENERNUM $ALERTOFLISTENER >>$LOGDIR/cluster/alert_listener_$HOST_`date '+%y%m%d'`.log

if [ $? -ne 0 ]; then

echo " JOB-1-2: CLUSTER ALERT LOG COPY ERROR!"

exit 1

fi

fi

fi

if [ -e $ALERTOFLISTENERSCAN ]; then

ALERTOFLISTENERSCANNUM=$(($(awk '{print NR}' $ALERTOFLISTENERSCAN |tail -n1) - $(cat -n $ALERTOFLISTENERSCAN| grep "`date -d last-day "+%d-%^b-%Y"`" | awk 'BEGIN {min=9999999; max=-1;} {if(min>$1) min=$1; if(max<$1) max=$1;} END {printf("%d", min);}') +1))

if [ $ALERTOFLISTENERSCANNUM -gt 0 ]; then

tail -$ALERTOFLISTENERSCANNUM $ALERTOFLISTENERSCAN >>$LOGDIR/cluster/alert_listener_scan1_`date '+%y%m%d'`.log

if [ $? -ne 0 ]; then

echo " JOB-1-3: CLUSTER ALERT LOG COPY ERROR!"

exit 1

fi

fi

fi

if [ -e $ALERTOFASM ]; then

ALERTOFASMNUM=$(($(awk '{print NR}' $ALERTOFASM |tail -n1) - $(cat -n $ALERTOFASM| grep "`date -d last-day|cut -c 1-8``date -d last-day +%d`" | awk 'BEGIN {min=9999999; max=-1;} {if(min>$1) min=$1; if(max<$1) max=$1;} END {printf("%d", min);}') +1))

if [ $ALERTOFASMNUM -gt 0 ]; then

tail -$ALERTOFASMNUM $ALERTOFASM >>$LOGDIR/cluster/alert_$ASMSID_`date '+%y%m%d'`.log

if [ $? -ne 0 ]; then

echo " JOB-1-4: CLUSTER ALERT LOG COPY ERROR!"

exit 1

fi

fi

fi

echo " " >>$LOGOFCLUSTER

echo " " >>$LOGOFCLUSTER

}

#----------------------------------------------------------------------------------------------------|

# 2. OS INFO. |

#----------------------------------------------------------------------------------------------------|

function _osinfo(){

echo "---------------------------------------------------------------" >>$LOGOFOSINFO

echo "- UPTIME:" >>$LOGOFOSINFO

echo "---------------------------------------------------------------" >>$LOGOFOSINFO

uptime >>$LOGOFOSINFO

echo " " >>$LOGOFOSINFO

echo "---------------------------------------------------------------" >>$LOGOFOSINFO

echo "- VMSTAT:" >>$LOGOFOSINFO

echo "---------------------------------------------------------------" >>$LOGOFOSINFO

vmstat 1 5 >>$LOGOFOSINFO

echo " " >>$LOGOFOSINFO

echo "---------------------------------------------------------------" >>$LOGOFOSINFO

echo "- POWERMT:" >>$LOGOFOSINFO

echo "---------------------------------------------------------------" >>$LOGOFOSINFO

powermt display dev=all >>$LOGOFOSINFO

echo " " >>$LOGOFOSINFO

echo "---------------------------------------------------------------" >>$LOGOFOSINFO

echo "- FREE:" >>$LOGOFOSINFO

echo "---------------------------------------------------------------" >>$LOGOFOSINFO

free -m >>$LOGOFOSINFO

echo " " >>$LOGOFOSINFO

echo "---------------------------------------------------------------" >>$LOGOFOSINFO

echo "- DF:" >>$LOGOFOSINFO

echo "---------------------------------------------------------------" >>$LOGOFOSINFO

df -h >>$LOGOFOSINFO

echo " " >>$LOGOFOSINFO

echo "---------------------------------------------------------------" >>$LOGOFOSINFO

echo "- SAR-PAGE:" >>$LOGOFOSINFO

echo "---------------------------------------------------------------" >>$LOGOFOSINFO

sar -B 1 5 >>$LOGOFOSINFO

echo " " >>$LOGOFOSINFO

echo "---------------------------------------------------------------" >>$LOGOFOSINFO

echo "- SAR-CPU:" >>$LOGOFOSINFO

echo "---------------------------------------------------------------" >>$LOGOFOSINFO

sar -u 1 5 >>$LOGOFOSINFO

echo " " >>$LOGOFOSINFO

echo "---------------------------------------------------------------" >>$LOGOFOSINFO

echo "- IOSTAT:" >>$LOGOFOSINFO

echo "---------------------------------------------------------------" >>$LOGOFOSINFO

iostat -mx 1 5 | grep power >>$LOGOFOSINFO

echo " " >>$LOGOFOSINFO

echo "---------------------------------------------------------------" >>$LOGOFOSINFO

echo "- SAR-NETWORK:" >>$LOGOFOSINFO

echo "---------------------------------------------------------------" >>$LOGOFOSINFO

sar -n DEV 1 5 | grep bond >>$LOGOFOSINFO

echo " " >>$LOGOFOSINFO

if [ $? -ne 0 ]; then

echo " JOB-2-1: OS INFO ERROR!"

exit 1

fi

LOGOFSYS=/var/log/messages

if [ -e $LOGOFSYS ]; then

cp $LOGOFSYS $LOGDIR/os

if [ $? -ne 0 ]; then

echo " JOB-2-2: OS INFO ERROR!"

exit 1

fi

fi

}

#----------------------------------------------------------------------------------------------------|

# 3. CLUSTER INFO. |

#----------------------------------------------------------------------------------------------------|

function _clusterinfo(){

echo "---------------------------------------------------------------" >>$LOGOFCLUSTER

echo "- 查看集群资源状态:" >>$LOGOFCLUSTER

echo "---------------------------------------------------------------" >>$LOGOFCLUSTER

echo " " >>$LOGOFCLUSTER

su - grid -c "crsctl status res -t " >>$LOGOFCLUSTER

echo " " >>$LOGOFCLUSTER

echo "---------------------------------------------------------------" >>$LOGOFCLUSTER

echo "- 查看集群服务状态:" >>$LOGOFCLUSTER

echo "---------------------------------------------------------------" >>$LOGOFCLUSTER

echo " " >>$LOGOFCLUSTER

su - grid -c "crsctl check cluster -all" >>$LOGOFCLUSTER

echo " " >>$LOGOFCLUSTER

echo "---------------------------------------------------------------" >>$LOGOFCLUSTER

echo "- 查看集群监听状态:" >>$LOGOFCLUSTER

echo "---------------------------------------------------------------" >>$LOGOFCLUSTER

echo " " >>$LOGOFCLUSTER

echo "#1 本地监听 ">>$LOGOFCLUSTER

su - grid -c "lsnrctl status" >>$LOGOFCLUSTER

echo "#2 SCAN监听 " >>$LOGOFCLUSTER

su - grid -c "lsnrctl status LISTENER_SCAN1" >>$LOGOFCLUSTER

echo " " >>$LOGOFCLUSTER

echo "---------------------------------------------------------------" >>$LOGOFCLUSTER

echo "- 查看磁盘组信息及状态 ">>$LOGOFCLUSTER

echo "---------------------------------------------------------------" >>$LOGOFCLUSTER

echo "- #1 ">>$LOGOFCLUSTER

su - grid -c "asmcmd lsdg" >>$LOGOFCLUSTER

echo "---------------------------------------------------------------" >>$LOGOFCLUSTER

echo "- 查看OCR信息 ">>$LOGOFCLUSTER

echo "---------------------------------------------------------------" >>$LOGOFCLUSTER

echo "- #1 心跳盘 ">>$LOGOFCLUSTER

su - grid -c "crsctl query css votedisk" >>$LOGOFCLUSTER

echo "- #2 一致性检查 " >>$LOGOFCLUSTER

su - grid -c "ocrcheck" >>$LOGOFCLUSTER

echo " " >>$LOGOFCLUSTER

}

#----------------------------------------------------------------------------------------------------|

# 4. DB INFO. |

#----------------------------------------------------------------------------------------------------|

function _dbinfo(){

cd $PWDDIRECTORY

su - oracle -c "cd $LOGDIR/db;export ORACLE_SID=$1;sqlplus -s /nolog <

conn / as sysdba;

@$LOGDIR/../dailyhealthycheck_withadg.sql

exit;

EOF"

}

#----------------------------------------------------------------------------------------------------|

# 5. DB LOG. |

#----------------------------------------------------------------------------------------------------|

function _dblog(){

cd $PWDDIRECTORY

DBNAME2=$1

DBNAME2=${DBNAME2%?}

ALERTOFDB=$RDBMSDIRECTORY/$DBNAME2/$1/trace/alert_$1.log

if [ -e $ALERTOFDB ]; then

ALERTOFDBNUM=$(($(awk '{print NR}' $ALERTOFDB |tail -n1) - $(cat -n $ALERTOFDB| grep "`date -d last-day|cut -c 1-8``date -d last-day +%d`" | awk 'BEGIN {min=9999999; max=-1;} {if(min>$1) min=$1; if(max<$1) max=$1;} END {printf("%d", min);}') +1))

if [ $ALERTOFDBNUM -gt 0 ]; then

tail -$ALERTOFDBNUM $ALERTOFDB >>$LOGDIR/db/alert_$1_`date '+%y%m%d_%H:%M:%S'`.log

if [ $? -ne 0 ]; then

echo " JOB-5: DB LOG ERROR!"

exit 1

fi

fi

fi

}

#----------------------------------------------------------------------------------------------------|

# 6. RMAN INFO. |

#----------------------------------------------------------------------------------------------------|

function _rmaninfo(){

cd $PWDDIRECTORY

LOGOFRMAN=$LOGDIR/rman_$1-`date '+%y%m%d_%H:%M:%S'`.log

su - oracle -c "export ORACLE_SID=$1;rman target / <$LOGOFRMAN

show all;

list backup of database summary;

exit;

EOF"

if [ $? -ne 0 ]; then

echo "JOB-6: RMAN INFO ERROR!"

exit 1

fi

}

#----------------------------------------------------------------------------------------------------|

# 7. AWR. |

#----------------------------------------------------------------------------------------------------|

function _awrinfo(){

cd $PWDDIRECTORY

su - oracle -c "cd $LOGDIR/db;mkdir ./$1AWR;cd $1AWR;export ORACLE_SID=$1;sqlplus -silent /nolog <

conn /as sysdba

@$LOGDIR/../makeawr.sql;

exit;

EOF"

sed -i "s/PL\/SQL procedure successfully completed.//g" $LOGDIR/$1AWR/awrsql.sql

su - oracle -c "cd $LOGDIR;cd ./$1AWR;export ORACLE_SID=$1;sqlplus -silent /nolog <

conn /as sysdba

set echo off

@awrsql.sql;

exit;

EOF"

if [ $? -ne 0 ]; then

echo "JOB7-5: AWR ERROR!"

exit 1

fi

}

#----------------------------------------------------------------------------------------------------|

# 8. TAR. |

#----------------------------------------------------------------------------------------------------|

function _logtar(){

cd $PWDDIRECTORY

ls $LOGDIR/db/*.html > $LOGDIR/html.tmp

cat $LOGDIR/html.tmp | while read FNAME

do

sed -i 's/GBK/UTF8/g' $FNAME

done

rm -rf $LOGDIR/html.tmp

rm -f *$HOST.tar.gz

if [ -d $DATESTRING$HOST ]; then

tar -czvf $DATESTRING-$HOST.tar.gz ./$DATESTRING$HOST

rm -rf ./$DATESTRING$HOST

else

echo " JOB-8: TAR ERROR."

fi

}

#----------------------------------------------------------------------------------------------------|

# MAIN(). |

#----------------------------------------------------------------------------------------------------|

echo "========================================================="

echo "= JOB-NAME: $0"

echo "========================================================="

_userVerify

_initialization

echo "----------------------------------------------------"

echo "- JOB-1: CLUSTER LOG."

echo "----------------------------------------------------"

_clusterlog

echo "----------------------------------------------------"

echo "- JOB-2: OS INFO."

echo "----------------------------------------------------"

_osinfo

echo "----------------------------------------------------"

echo "- JOB-3: CLUSTER INFO."

echo "----------------------------------------------------"

_clusterinfo

echo "----------------------------------------------------"

echo "- JOB-(4-7): DB JOBs."

echo "----------------------------------------------------"

ps -ef | grep ora_pmon | grep -v grep | awk '{print $8}' | awk -F "_" '{print $3}' >$DBFILE

if [ -e $DBFILE ]; then

cat $DBFILE | while read DBSID

do

echo "- # JOB-4: DB INFO."

_dbinfo $DBSID

echo "- # JOB-5: DB LOG."

_dblog $DBSID

done

else

echo " JOB-4-7: NO DBSID FILES ERROR."

fi

rm -rf $DBFILE

echo "----------------------------------------------------"

echo "- JOB-8: LOG TAR."

echo "----------------------------------------------------"

_logtar

#!/bin/bash#----------------------------------------------------------------------------------------------------+# HAIZDL@126.COM |#----------------------------------------------------------------------------------------------------|# COPYRIGHT (C) 1998-2015 HAIZDL ALL RIGHTS RESERVED. |#----------------------------------------------------------------------------------------------------|# DATABASE : ORACLE |# FILE : healthyCheck1_forLinux.sh |# CLASS : DATABASE ADMINISTRATION |# PURPOSE : COLLECTING LOGS FOR ORACLE DATABASE DAILY CHECKING. |# NOTE : AS WITH ANY CODE, ENSURE TO TEST THIS SCRIPT IN A DEVELOPMENT |# ENVIRONMENT BEFORE ATTEMPTING TO RUN IT IN PRODUCTION. |#----------------------------------------------------------------------------------------------------+#----------------------------------------------#CHECK JOB OPERATOR USER INFO.#----------------------------------------------function _userVerify(){ USER=`whoami` if [ "$USER" != "root" ]; then echo "PLS USE ROOT USER FOR THIS JOB." echo "USAGE: $0" exit 1 fi}#----------------------------------------------#BASIC VARIABLE.#----------------------------------------------HOST=`hostname`PWDDIRECTORY=/tmp/oraDATESTRING=`date '+%Y%m%d'`LOGDIR=$PWDDIRECTORY/$DATESTRING$HOSTASMSID=+ASM1#LOG DIRECTORY PARAMETEROINSTALLDIRECTORY=/home/u01/appRDBMSDIRECTORY=$OINSTALLDIRECTORY/oracle/diag/rdbmsGRIDLOGDIRECTORY=$OINSTALLDIRECTORY/11.2.0.4/grid/logGRIDDIRECTORY=$OINSTALLDIRECTORY/gridALERTOFCLUSTER=$GRIDLOGDIRECTORY/$HOST/alert$HOST.logALERTOFLISTENER=$GRIDDIRECTORY/diag/tnslsnr/$HOST/listener/trace/listener.logALERTOFLISTENERSCAN=$GRIDLOGDIRECTORY/diag/tnslsnr/$HOST/listener_scan1/trace/listener_scan1.logALERTOFASM=$GRIDDIRECTORY/diag/asm/+asm/$ASMSID/trace/alert_$ASMSID.logOPATCHORACLEDIR=$OINSTALLDIRECTORY/oracle/product/11.2.0.4/dbhome_1/OPatchOPATCHGRIDDIR=$OINSTALLDIRECTORY/11.2.0.4/grid/OPatch#DIAGNOSTIC FILESLOGOFOSINFO=$LOGDIR/os/info_os_`date '+%y%m%d%H%M'`LOGOFCLUSTER=$LOGDIR/cluster/info_cluster_`date '+%y%m%d%H%M'`DBFILE=$LOGDIR/dbs.tmp#----------------------------------------------#CREATE LOG INFO DIRECTORY.#----------------------------------------------function _initialization(){ if [ -d $LOGDIR ]; then echo "INIT: LOG DIRECTORY EXISTS, DELETING." rm -rf $LOGDIR fi echo "INIT: LOG DIRECTORY CREATING." mkdir $LOGDIR mkdir $LOGDIR/os mkdir $LOGDIR/cluster mkdir $LOGDIR/db chmod -R 777 $LOGDIR if [ $? -ne 0 ]; then echo "INIT: LOG DIRECTORY CREATING ERROR!" exit 1 fi}#----------------------------------------------------------------------------------------------------|# 1. CLUSTER LOG. |#----------------------------------------------------------------------------------------------------|function _clusterlog(){ cd $PWDDIRECTORY if [ -e $ALERTOFCLUSTER ]; then ALERTOFCLUSTERNUM=$(($(awk '{print NR}' $ALERTOFCLUSTER |tail -n1) - $(cat -n $ALERTOFCLUSTER| grep "`date -d last-day +%Y-%m-%d|cut -c 1-10`" | awk 'BEGIN {min=99999999; max=-1;} {if(min>$1) min=$1; if(max<$1) max=$1;} END {printf("%d", min);}') +1)) if [ $ALERTOFCLUSTERNUM -gt 0 ]; then tail -$ALERTOFCLUSTERNUM $ALERTOFCLUSTER >>$LOGDIR/cluster/alert$HOST_`date '+%y%m%d'`.log if [ $? -ne 0 ]; then echo " JOB-1-1: CLUSTER ALERT LOG COPY ERROR!" exit 1 fi fi fi if [ -e $ALERTOFLISTENER ]; then ALERTOFLISTENERNUM=$(($(awk '{print NR}' $ALERTOFLISTENER |tail -n1) - $(cat -n $ALERTOFLISTENER| grep "`date -d last-day "+%d-%^b-%Y"`" | awk 'BEGIN {min=9999999; max=-1;} {if(min>$1) min=$1; if(max<$1) max=$1;} END {printf("%d", min);}') +1)) if [ $ALERTOFLISTENERNUM -gt 0 ]; then tail -$ALERTOFLISTENERNUM $ALERTOFLISTENER >>$LOGDIR/cluster/alert_listener_$HOST_`date '+%y%m%d'`.log if [ $? -ne 0 ]; then echo " JOB-1-2: CLUSTER ALERT LOG COPY ERROR!" exit 1 fi fi fi if [ -e $ALERTOFLISTENERSCAN ]; then ALERTOFLISTENERSCANNUM=$(($(awk '{print NR}' $ALERTOFLISTENERSCAN |tail -n1) - $(cat -n $ALERTOFLISTENERSCAN| grep "`date -d last-day "+%d-%^b-%Y"`" | awk 'BEGIN {min=9999999; max=-1;} {if(min>$1) min=$1; if(max<$1) max=$1;} END {printf("%d", min);}') +1)) if [ $ALERTOFLISTENERSCANNUM -gt 0 ]; then tail -$ALERTOFLISTENERSCANNUM $ALERTOFLISTENERSCAN >>$LOGDIR/cluster/alert_listener_scan1_`date '+%y%m%d'`.log if [ $? -ne 0 ]; then echo " JOB-1-3: CLUSTER ALERT LOG COPY ERROR!" exit 1 fi fi fi if [ -e $ALERTOFASM ]; then ALERTOFASMNUM=$(($(awk '{print NR}' $ALERTOFASM |tail -n1) - $(cat -n $ALERTOFASM| grep "`date -d last-day|cut -c 1-8``date -d last-day +%d`" | awk 'BEGIN {min=9999999; max=-1;} {if(min>$1) min=$1; if(max<$1) max=$1;} END {printf("%d", min);}') +1)) if [ $ALERTOFASMNUM -gt 0 ]; then tail -$ALERTOFASMNUM $ALERTOFASM >>$LOGDIR/cluster/alert_$ASMSID_`date '+%y%m%d'`.log if [ $? -ne 0 ]; then echo " JOB-1-4: CLUSTER ALERT LOG COPY ERROR!" exit 1 fi fi fi echo " " >>$LOGOFCLUSTER echo " " >>$LOGOFCLUSTER}#----------------------------------------------------------------------------------------------------|# 2. OS INFO. |#----------------------------------------------------------------------------------------------------|function _osinfo(){ echo "---------------------------------------------------------------" >>$LOGOFOSINFO echo "- UPTIME:" >>$LOGOFOSINFO echo "---------------------------------------------------------------" >>$LOGOFOSINFO uptime >>$LOGOFOSINFO echo " " >>$LOGOFOSINFO echo "---------------------------------------------------------------" >>$LOGOFOSINFO echo "- VMSTAT:" >>$LOGOFOSINFO echo "---------------------------------------------------------------" >>$LOGOFOSINFO vmstat 1 5 >>$LOGOFOSINFO echo " " >>$LOGOFOSINFO echo "---------------------------------------------------------------" >>$LOGOFOSINFO echo "- POWERMT:" >>$LOGOFOSINFO echo "---------------------------------------------------------------" >>$LOGOFOSINFO powermt display dev=all >>$LOGOFOSINFO echo " " >>$LOGOFOSINFO echo "---------------------------------------------------------------" >>$LOGOFOSINFO echo "- FREE:" >>$LOGOFOSINFO echo "---------------------------------------------------------------" >>$LOGOFOSINFO free -m >>$LOGOFOSINFO echo " " >>$LOGOFOSINFO echo "---------------------------------------------------------------" >>$LOGOFOSINFO echo "- DF:" >>$LOGOFOSINFO echo "---------------------------------------------------------------" >>$LOGOFOSINFO df -h >>$LOGOFOSINFO echo " " >>$LOGOFOSINFO echo "---------------------------------------------------------------" >>$LOGOFOSINFO echo "- SAR-PAGE:" >>$LOGOFOSINFO echo "---------------------------------------------------------------" >>$LOGOFOSINFO sar -B 1 5 >>$LOGOFOSINFO echo " " >>$LOGOFOSINFO echo "---------------------------------------------------------------" >>$LOGOFOSINFO echo "- SAR-CPU:" >>$LOGOFOSINFO echo "---------------------------------------------------------------" >>$LOGOFOSINFO sar -u 1 5 >>$LOGOFOSINFO echo " " >>$LOGOFOSINFO echo "---------------------------------------------------------------" >>$LOGOFOSINFO echo "- IOSTAT:" >>$LOGOFOSINFO echo "---------------------------------------------------------------" >>$LOGOFOSINFO iostat -mx 1 5 | grep power >>$LOGOFOSINFO echo " " >>$LOGOFOSINFO echo "---------------------------------------------------------------" >>$LOGOFOSINFO echo "- SAR-NETWORK:" >>$LOGOFOSINFO echo "---------------------------------------------------------------" >>$LOGOFOSINFO sar -n DEV 1 5 | grep bond >>$LOGOFOSINFO echo " " >>$LOGOFOSINFO if [ $? -ne 0 ]; then echo " JOB-2-1: OS INFO ERROR!" exit 1 fi LOGOFSYS=/var/log/messages if [ -e $LOGOFSYS ]; then cp $LOGOFSYS $LOGDIR/os if [ $? -ne 0 ]; then echo " JOB-2-2: OS INFO ERROR!" exit 1 fi fi}#----------------------------------------------------------------------------------------------------|# 3. CLUSTER INFO. |#----------------------------------------------------------------------------------------------------|function _clusterinfo(){ echo "---------------------------------------------------------------" >>$LOGOFCLUSTER echo "- 查看集群资源状态:" >>$LOGOFCLUSTER echo "---------------------------------------------------------------" >>$LOGOFCLUSTER echo " " >>$LOGOFCLUSTER su - grid -c "crsctl status res -t " >>$LOGOFCLUSTER echo " " >>$LOGOFCLUSTER echo "---------------------------------------------------------------" >>$LOGOFCLUSTER echo "- 查看集群服务状态:" >>$LOGOFCLUSTER echo "---------------------------------------------------------------" >>$LOGOFCLUSTER echo " " >>$LOGOFCLUSTER su - grid -c "crsctl check cluster -all" >>$LOGOFCLUSTER echo " " >>$LOGOFCLUSTER echo "---------------------------------------------------------------" >>$LOGOFCLUSTER echo "- 查看集群监听状态:" >>$LOGOFCLUSTER echo "---------------------------------------------------------------" >>$LOGOFCLUSTER echo " " >>$LOGOFCLUSTER echo "#1 本地监听 ">>$LOGOFCLUSTER su - grid -c "lsnrctl status" >>$LOGOFCLUSTER echo "#2 SCAN监听 " >>$LOGOFCLUSTER su - grid -c "lsnrctl status LISTENER_SCAN1" >>$LOGOFCLUSTER echo " " >>$LOGOFCLUSTER echo "---------------------------------------------------------------" >>$LOGOFCLUSTER echo "- 查看磁盘组信息及状态 ">>$LOGOFCLUSTER echo "---------------------------------------------------------------" >>$LOGOFCLUSTER echo "- #1 ">>$LOGOFCLUSTER su - grid -c "asmcmd lsdg" >>$LOGOFCLUSTER echo "---------------------------------------------------------------" >>$LOGOFCLUSTER echo "- 查看OCR信息 ">>$LOGOFCLUSTER echo "---------------------------------------------------------------" >>$LOGOFCLUSTER echo "- #1 心跳盘 ">>$LOGOFCLUSTER su - grid -c "crsctl query css votedisk" >>$LOGOFCLUSTER echo "- #2 一致性检查 " >>$LOGOFCLUSTER su - grid -c "ocrcheck" >>$LOGOFCLUSTER echo " " >>$LOGOFCLUSTER}#----------------------------------------------------------------------------------------------------|# 4. DB INFO. |#----------------------------------------------------------------------------------------------------|function _dbinfo(){cd $PWDDIRECTORYsu - oracle -c "cd $LOGDIR/db;export ORACLE_SID=$1;sqlplus -s /nolog <$1) min=$1; if(max<$1) max=$1;} END {printf("%d", min);}') +1)) if [ $ALERTOFDBNUM -gt 0 ]; then tail -$ALERTOFDBNUM $ALERTOFDB >>$LOGDIR/db/alert_$1_`date '+%y%m%d_%H:%M:%S'`.log if [ $? -ne 0 ]; then echo " JOB-5: DB LOG ERROR!" exit 1 fi fi fi}#----------------------------------------------------------------------------------------------------|# 6. RMAN INFO. |#----------------------------------------------------------------------------------------------------|function _rmaninfo(){cd $PWDDIRECTORYLOGOFRMAN=$LOGDIR/rman_$1-`date '+%y%m%d_%H:%M:%S'`.logsu - oracle -c "export ORACLE_SID=$1;rman target / <$LOGOFRMANshow all;list backup of database summary;exit;EOF" if [ $? -ne 0 ]; then echo "JOB-6: RMAN INFO ERROR!" exit 1 fi}#----------------------------------------------------------------------------------------------------|# 7. AWR. |#----------------------------------------------------------------------------------------------------|function _awrinfo(){cd $PWDDIRECTORYsu - oracle -c "cd $LOGDIR/db;mkdir ./$1AWR;cd $1AWR;export ORACLE_SID=$1;sqlplus -silent /nolog < $LOGDIR/html.tmp cat $LOGDIR/html.tmp | while read FNAMEdo sed -i 's/GBK/UTF8/g' $FNAMEdonerm -rf $LOGDIR/html.tmprm -f *$HOST.tar.gz if [ -d $DATESTRING$HOST ]; then tar -czvf $DATESTRING-$HOST.tar.gz ./$DATESTRING$HOST rm -rf ./$DATESTRING$HOST else echo " JOB-8: TAR ERROR." fi}#----------------------------------------------------------------------------------------------------|# MAIN(). |#----------------------------------------------------------------------------------------------------|echo "========================================================="echo "= JOB-NAME: $0"echo "========================================================="_userVerify_initializationecho "----------------------------------------------------"echo "- JOB-1: CLUSTER LOG."echo "----------------------------------------------------"_clusterlogecho "----------------------------------------------------"echo "- JOB-2: OS INFO."echo "----------------------------------------------------"_osinfoecho "----------------------------------------------------"echo "- JOB-3: CLUSTER INFO."echo "----------------------------------------------------"_clusterinfoecho "----------------------------------------------------"echo "- JOB-(4-7): DB JOBs."echo "----------------------------------------------------"ps -ef | grep ora_pmon | grep -v grep | awk '{print $8}' | awk -F "_" '{print $3}' >$DBFILEif [ -e $DBFILE ]; then cat $DBFILE | while read DBSID do echo "- # JOB-4: DB INFO." _dbinfo $DBSID echo "- # JOB-5: DB LOG." _dblog $DBSID doneelse echo " JOB-4-7: NO DBSID FILES ERROR."firm -rf $DBFILEecho "----------------------------------------------------"echo "- JOB-8: LOG TAR."echo "----------------------------------------------------"_logtar

推荐链接

评论可见,请评论后查看内容,谢谢!!!
 您阅读本篇文章共花了: