Oracle is hanging? Don’t forget hanganalyze and systemstate!

sqlplus / as sysdba
 oradebug setmypid
 oradebug unlimit
 oradebug hanganalyze 3
 oradebug dump ashdumpseconds 30
 oradebug dump systemstate 266
 oradebug tracefile_name

Your Oracle database – production DB, of course – is hanging. All users are blocked. You quickly check the obvious suspects (archivelog destination full, system swapping, etc.) but it’s something else. Even you, the Oracle DBA, cannot do anything: any select hangs. And you may not even be able to connect with a simple ‘sqlplus / as sysdba’.

What do you do? There may be several ways to investigate deeper (strace or truss, for example) but they take time. And your boss is clear: the only important thing is to get production running again as soon as possible. No time to investigate. SHUTDOWN ABORT and restart.

OK, but now that everything is back to normal, your boss’s rules have changed: the system was down for 15 minutes, and we have to provide an explanation. Root Cause Analysis.

But how will you investigate now? You have restarted everything, so all V$ information is gone. You have Diagnostic Pack? But the system was hung: no ASH information went to disk. You can open an SR, but what information will you give?

Hang Analyze

The next time it happens, you need a way to get some information that can be analyzed post mortem. But you need to be able to do that very quickly, just before your boss shouts ‘shutdown abort now’. This is why I’ve put the commands at the beginning of the post, so that you can find them quickly if you need them…

It takes only a few seconds to generate all the information needed for a post-mortem analysis. If you can take one more minute, you can even read the first lines of the hanganalyze output, identify whether it is a true hanging situation, and maybe just kill the root of the blocking sessions instead of doing a merciless restart.
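
Because these commands have to be typed under pressure, it can help to keep them in a small wrapper prepared in advance. The following is only a sketch in Python, assuming that sqlplus is on the PATH, that ORACLE_HOME and ORACLE_SID are set, and that OS authentication as SYSDBA works. The function names build_oradebug_script and run_diagnostics are illustrative and not part of any Oracle tool.

```python
import subprocess


def build_oradebug_script(hang_level=3, ash_seconds=30, systemstate_level=266):
    """Return the oradebug commands from the top of this post as one SQL*Plus script."""
    commands = [
        "oradebug setmypid",
        "oradebug unlimit",
        f"oradebug hanganalyze {hang_level}",
        f"oradebug dump ashdumpseconds {ash_seconds}",
        f"oradebug dump systemstate {systemstate_level}",
        "oradebug tracefile_name",
        "exit",
    ]
    return "\n".join(commands) + "\n"


def run_diagnostics(timeout_seconds=120):
    """Feed the script to 'sqlplus / as sysdba' and return what it printed.

    Assumption: sqlplus is on the PATH and the Oracle environment is set.
    The timeout keeps the wrapper itself from hanging along with the instance.
    """
    result = subprocess.run(
        ["sqlplus", "-S", "/", "as", "sysdba"],
        input=build_oradebug_script(),
        capture_output=True,
        text=True,
        timeout=timeout_seconds,
    )
    return result.stdout


if __name__ == "__main__":
    # Only print the script here; call run_diagnostics() on the database server.
    print(build_oradebug_script(), end="")
```

Keeping the script text in its own function makes it possible to review exactly what will be sent to the instance before the day you need it.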

In order to show you the kind of output you get, I’ve run a few jobs locking the same resources (TM locks) – which is not a true hanging situation because the blocking session can resolve the situation. Here are the first lines from the oradebug hanganalyze:

Chains most likely to have caused the hang:
 [a] Chain 1 Signature: 'PL/SQL lock timer'

Systemstate has all information about State Objects (sessions, processes, ...) but you have to navigate into it in order to understand the wait chain. In my example:

SO: 0x914ada70, type: 4, owner: 0x91990478, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
 proc=0x91990478, name=session, file=ksu.h LINE:13580, pg=0 conuid=0
(session) sid: 23 ser: 7 trans: 0x8ea8e3e8, creator: 0x91990478
...
service name: SYS$USERS
client details:
O/S info: user: oracle, term: UNKNOWN, ospid: 7929
machine: vboxora12c program: oracle@vboxora12c (J002)
Current Wait Stack:
 0: waiting for 'enq: TM - contention'
    name|mode=0x544d0003, object #=0x1737c, table/partition=0x0
    wait_id=10 seq_num=11 snap_id=1
    wait times: snap=15.991474 sec, exc=15.991474 sec, total=15.991474 sec
    wait times: max=40.000000 sec, heur=15.991474 sec
    wait counts: calls=6 os=6
    in_wait=1 iflags=0x15a0
There is at least one session blocking this session.
Dumping 1 direct blocker(s):
  inst: 1, sid: 254, ser: 5
Dumping final blocker: inst: 1, sid: 256, ser: 5

This is a session that is waiting, and we have the final blocker: inst: 1, sid: 256, ser: 5
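
Finding these lines by eye in a large trace file is tedious, so a small script can list the final blocker reported for each waiting session. This is a minimal sketch in Python: the regular expressions are based only on the excerpt shown here, the trace format may differ between versions, and the function name final_blockers is illustrative.

```python
import re

# Patterns taken from the systemstate excerpt in this post (an assumption:
# other Oracle versions may format these lines differently).
SESSION_RE = re.compile(r"\(session\) sid: (\d+) ser: (\d+)")
WAIT_RE = re.compile(r"waiting for '([^']+)'")
FINAL_BLOCKER_RE = re.compile(r"Dumping final blocker: inst: (\d+), sid: (\d+), ser: (\d+)")


def final_blockers(trace_lines):
    """Map each waiting session's sid to (wait event, final blocker sid)."""
    found = {}
    current_sid = None
    current_event = None
    for line in trace_lines:
        match = SESSION_RE.search(line)
        if match:
            # A new session state object starts here.
            current_sid = int(match.group(1))
            current_event = None
            continue
        match = WAIT_RE.search(line)
        if match and current_sid is not None and current_event is None:
            # Keep only the first (current) entry of the wait stack.
            current_event = match.group(1)
            continue
        match = FINAL_BLOCKER_RE.search(line)
        if match and current_sid is not None:
            found[current_sid] = (current_event, int(match.group(2)))
    return found


sample = [
    "(session) sid: 23 ser: 7 trans: 0x8ea8e3e8, creator: 0x91990478",
    "Current Wait Stack:",
    " 0: waiting for 'enq: TM - contention'",
    "There is at least one session blocking this session.",
    "Dumping final blocker: inst: 1, sid: 256, ser: 5",
]
print(final_blockers(sample))  # {23: ('enq: TM - contention', 256)}
```

On a real trace you would pass the open file object instead of the sample list, since the function only iterates over lines.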

Then we get to the final blocker by searching for sid: 256:

SO: 0x9168a408, type: 4, owner: 0x9198d058, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
 proc=0x9198d058, name=session, file=ksu.h LINE:13580, pg=0 conuid=0
(session) sid: 256 ser: 5 trans: 0x8ea6b618, creator: 0x9198d058
...
service name: SYS$USERS
client details:
O/S info: user: oracle, term: UNKNOWN, ospid: 7925
machine: vboxora12c program: oracle@vboxora12c (J000)
Current Wait Stack:
 0: waiting for 'PL/SQL lock timer'
    duration=0x0, =0x0, =0x0
    wait_id=0 seq_num=1 snap_id=1
    wait times: snap=25.936165 sec, exc=25.936165 sec, total=25.936165 sec
    wait times: max=50.000000 sec, heur=25.936165 sec
    wait counts: calls=1 os=9
    in_wait=1 iflags=0x5a0
There are 5 sessions blocked by this session.
Dumping one waiter:
  inst: 1, sid: 254, ser: 5
  wait event: 'enq: TM - contention'
    p1: 'name|mode'=0x544d0004
    p2: 'object #'=0x1737c
    p3: 'table/partition'=0x0
  row_wait_obj#: 95100, block#: 0, row#: 0, file# 0
  min_blocked_time: 19 secs, waiter_cache_ver: 44

Analysing the System State takes much longer than the hanganalyze, but it has more information.
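
The hexadecimal wait parameters in these dumps are easier to read once decoded. As a worked example, the sketch below relies on the usual encoding of enqueue waits, where the two high-order bytes of 'name|mode' hold the lock type in ASCII and the low-order 16 bits hold the requested mode. The function name decode_enqueue and the mode labels in the dictionary are my own additions for illustration.

```python
# Lock mode numbers as used by Oracle enqueues (labels added for readability).
LOCK_MODES = {
    0: "none",
    1: "null",
    2: "row share (SS)",
    3: "row exclusive (SX)",
    4: "share (S)",
    5: "share row exclusive (SSX)",
    6: "exclusive (X)",
}


def decode_enqueue(name_mode):
    """Split the 'name|mode' value of an enqueue wait into (lock type, mode)."""
    lock_type = chr((name_mode >> 24) & 0xFF) + chr((name_mode >> 16) & 0xFF)
    mode = name_mode & 0xFFFF
    return lock_type, mode


for value in (0x544D0003, 0x544D0004):
    lock_type, mode = decode_enqueue(value)
    print(hex(value), lock_type, mode, LOCK_MODES.get(mode, "unknown"))

# The 'object #' parameter is simply the object id in hexadecimal.
print(int("1737c", 16))  # 95100
```

So sid 23 requests the TM lock in mode 3 and sid 254 requests it in mode 4, both on object 95100, which is the same value as the row_wait_obj# printed in the final blocker's dump.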

V$WAIT_CHAINS

When the blocking situation is not so desperate and you just want to see what is blocking, the hanganalyze information is also available online in V$WAIT_CHAINS. The advantage over ASH is that you see all processes (not only foreground, not only active ones).

Here is an example:

CHAIN_ID	CHAIN	CHAIN_SIGNATURE	INSTANCE	OSID	PID	SID	BLOCK
1	FALSE	'PL/SQL lock timer' <='enq: TM - contention' <='enq: TM - contention'	1	7929	42	23	TRUE
1	FALSE	'PL/SQL lock timer' <='enq: TM - contention' <='enq: TM - contention'	1	7927	41	254	TRUE
1	FALSE	'PL/SQL lock timer' <='enq: TM - contention' <='enq: TM - contention'	1	7925	39	256	FALSE
2	FALSE	'PL/SQL lock timer' <='enq: TM - contention' <='enq: TM - contention'	1	7933	46	25	TRUE
3	FALSE	'PL/SQL lock timer' <='enq: TM - contention' <='enq: TM - contention'	1	7931	45	260	TRUE
4	FALSE	'PL/SQL lock timer' <='enq: TM - contention' <='enq: TM - contention'	1	7935	47	262	TRUE
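
Reading a chain means following each session to its blocker until you reach a session that is not blocked itself. The sketch below shows that walk in Python, assuming you have fetched pairs of SID and BLOCKER_SID from V$WAIT_CHAINS into a dictionary. The links used here (23 waits for 254, which waits for 256) are the ones shown in the systemstate excerpt above, and the function name root_blocker is illustrative.

```python
def root_blocker(sid, blocked_by):
    """Follow blocker links from sid until a session with no blocker is found.

    blocked_by maps a sid to the sid blocking it, or to None when it is not blocked.
    Returns None if the links form a cycle, which would be a real deadlock.
    """
    seen = set()
    while blocked_by.get(sid) is not None:
        if sid in seen:
            return None
        seen.add(sid)
        sid = blocked_by[sid]
    return sid


# Links taken from the dumps in this post: 23 -> 254 -> 256.
blocked_by = {23: 254, 254: 256, 256: None}
for sid in sorted(blocked_by):
    print(sid, "is ultimately blocked by", root_blocker(sid, blocked_by))
```

A session that is its own root (256 here) is the one to look at first, and the one to kill if its wait will never end.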

ASH Dump

There is something else that you can get if you have Diagnostic Pack. The ASH information can be dumped to a trace file even if it cannot be collected in the database.

oradebug dump ashdumpseconds 30

That will gather ASH data from the last 30 seconds, and the trace file will even include the sqlldr ctl file to load it into an ASH-like table.

sqlplus -prelim

But what can you do if you can’t even connect / as sysdba? There is the ‘preliminary connection’ that does not create a session:

sqlplus -prelim / as sysdba

With that you will be able to get a systemstate. You will be able to get an ashdump. But unfortunately, since 11.2.0.2 you cannot get a hanganalyze:

ERROR: Can not perform hang analysis dump without a process state object and a session state object.

But there is a workaround for that (from Tanel Poder’s blog): try to use a session that is already connected.

For example, I use the DIAG background process (it’s better not to use vital processes for that):

SQL> oradebug setorapname diag
Oracle pid: 8, Unix process pid: 7805, image: oracle@vboxora12c (DIAG)

Core message

Even in a hurry:

Always check a hanganalyze to understand the problem.
Always get a systemstate before a shutdown abort.

Then you will have information to investigate later, or to provide to Oracle Support.

8 Comments

JC Dauchy says:

June 24, 2015 at 15 h 53 min

Learnt a new oradebug command, really useful this one:

oradebug dump ashdumpseconds 30

Tom Robbins says:

July 13, 2016 at 2 h 04 min

Very nice. Thanks!

Charles Schultz says:

August 17, 2016 at 15 h 26 min

I believe "oradebug systemstate 266" should be "oradebug dump systemstate 266"

- Franck Pachot says:
  
  August 17, 2016 at 15 h 47 min
  
  Hi Charles, Thanks, I fixed it. Regards, Franck.
  
Marat says:

October 19, 2016 at 19 h 27 min

Also useful one: oradebug pdump interval=60 ndumps=3 hanganalyze 3;

Dump hanganalyze 3 times with 60 seconds interval

PDUMP command

To perform a dump periodically use:

ORADEBUG PDUMP [interval=<interval>] [ndumps=<count>] <dump_name> <level> [address]

Rebecca says:

July 12, 2017 at 17 h 56 min

Very informative and clear. Thanks
