This was first published on https://blog.dbi-services.com/linux-how-to-monitor-the-nproc-limit-1 (2014-06-10)
Republishing here for new followers. The content relates to the versions available at the publication date.
You probably know about the ‘nproc’ limits in Linux, which are set in /etc/security/limits.conf and checked with ‘ulimit -u’. But do you know how to monitor them and be alerted when you are close to the configured limit?
Nproc is defined at OS level to limit the number of processes per user. Oracle 11.2.0.4 documentation recommends the following:
    oracle soft nproc 2047
    oracle hard nproc 16384
But that is often too low, especially when you have the Enterprise Manager agent or other java programs running.
Do you want to check that you are far from the limit? Then you can use ‘ps’. But beware: by default, ‘ps’ does not show all processes. In Linux, each thread of a multithreaded program is implemented as a light-weight process (LWP), and you must use the ‘-L’ option to see all of them.
Let’s take an example. I have a system where ‘ps -u oracle’ returns 243 lines. But including LWPs shows many more processes, a number that is close to the limit:
    $ ps h -Led -o user | sort | uniq -c | sort -n
          1 dbus
          1 ntp
          1 rpc
          1 rpcuser
          2 avahi
          2 haldaemon
          2 postfix
        166 grid
        400 root
       1370 oracle
So the ‘oracle’ user has 1370 processes. That’s high. And this is the actual number where the nproc limit applies.
‘ps -Lf’ can show the detail. And even without ‘-L’ we can display the NLWP which is the number of threads per process:
    ps -o nlwp,pid,lwp,args -u oracle | sort -n
    NLWP   PID   LWP COMMAND
       1  8444  8444 oracleOPRODP3 (LOCAL=NO)
       1  9397  9397 oracleOPRODP3 (LOCAL=NO)
       1  9542  9542 oracleOPRODP3 (LOCAL=NO)
       1  9803  9803 /u00/app/oracle/product/agent12c/core/12.1.0.3.0/perl/bin/perl /u00/app/oracle/product/agent12c/core/12.1.0.3.0/bin/emwd.pl agent /u00/app/oracle/product/agent12c/agent_inst/sysman/log/emagent.nohup
      19 11966 11966 /u00/app/11.2.0/grid/bin/oraagent.bin
    1114  9963  9963 /u00/app/oracle/product/agent12c/core/12.1.0.3.0/jdk/bin/java ... emagentSDK.jar oracle.sysman.gcagent.tmmain.TMMain
The Oracle 12c EM agent has started 1114 threads and the grid infrastructure ‘oraagent.bin’ has 19 threads. In addition to that I’ve a lot of other monothreaded processes. This is how we reach 1370 which is the exact value to compare to the nproc limit.
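The addition above (one heavily threaded agent plus many single-threaded processes) can be checked by summing the NLWP column. What follows is only a minimal sketch and not part of the original article: the helper names sum_nlwp and top_nlwp are hypothetical, and both just parse ‘ps’ output given on standard input.

```shell
# Hypothetical helpers (not from the original article); they only parse text on stdin.

# sum_nlwp: add up the first column (NLWP), skipping the header line.
# Feed it one line per process (no -L), otherwise each thread's NLWP is counted again.
sum_nlwp() {
    awk '$1 ~ /^[0-9]+$/ { total += $1 } END { print total + 0 }'
}

# top_nlwp N: keep the N lines with the highest NLWP (default 5).
top_nlwp() {
    grep -E '^ *[0-9]+ ' | sort -k1,1nr | head -n "${1:-5}"
}

# Example invocations:
#   ps -o nlwp,pid,args -u oracle | sum_nlwp
#   ps -o nlwp,pid,args -u oracle | top_nlwp 3
```

The total printed by sum_nlwp should be comparable to the line count of ‘ps h -Lu oracle’, and top_nlwp points at the processes responsible for most of it.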
So what are good values to set? Regarding the high number of threads for the EM agent 12c, there are a few bugs. I suspect that 1000 threads is too many, especially since ‘jstack’ shows that they are “CRSeOns” threads, which should not be used in 11.2.0.2 and higher. But that’s another problem, which I’m currently investigating.
When you reach the nproc limit, the user is not able to create new processes. clone() calls return EAGAIN, and Oracle reports that as:
    ORA-27300: OS system dependent operation:fork failed with status: 11
    ORA-27301: OS failure message: Resource temporarily unavailable
And that is clearly bad when it concerns an +ASM instance or archiver processes.
The only goal of the nproc limit is to prevent ‘fork bombs’, where a process forks forever and exhausts all resources. So there is no harm in increasing this limit. However, if you set it high for some users (usually ‘oracle’ and ‘grid’), it is a good idea to monitor the number of processes with the ‘ps h -L’ command above, because having too many processes is suspect: increasing the limit just hides a process leak and defers the failure.
In ‘ps h -L -o’, the ‘h’ argument removes the header line and ‘-L’ shows all processes including LWPs. You can then count the lines with ‘wc -l’.
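As an illustration of how that count could feed an alert, here is a minimal sketch. It is not from the original article: the function name nproc_status and the default 80% threshold are assumptions, and the limit passed in must be a number (not ‘unlimited’).

```shell
# Hypothetical helper (not from the original article).
# nproc_status COUNT LIMIT [WARN_PCT]
#   COUNT    = current number of threads for the user
#   LIMIT    = numeric nproc limit to compare against
#   WARN_PCT = warning threshold in percent (assumed default: 80)
# Prints one status line; returns 1 when the threshold is reached, 0 otherwise.
nproc_status() {
    used_pct=$(( $1 * 100 / $2 ))
    if [ "$used_pct" -ge "${3:-80}" ]; then
        echo "WARNING: $1 of $2 threads used (${used_pct}%)"
        return 1
    fi
    echo "OK: $1 of $2 threads used (${used_pct}%)"
}

# Example invocation, run from another account such as root:
#   nproc_status "$(ps h -Lu oracle | wc -l)" 16384 80
```

The return code makes it easy to call this from cron or from a monitoring agent that treats a non-zero exit as an alert.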
In order to be sure that ‘ps h -L’ gives the exact number, I have tested it. In case you want to check this on your system, here is how to do it. And please report any difference.
First, set your limit to 1024 processes. This is a limit for my user, and the limit is set for my shell and all its child processes:
    [oracle@VM211 ocm]$ ulimit -u 1024
Now you can check it:
    [oracle@VM211 ocm]$ ulimit -a
    core file size          (blocks, -c) 0
    data seg size           (kbytes, -d) unlimited
    scheduling priority             (-e) 0
    file size               (blocks, -f) unlimited
    pending signals                 (-i) 15919
    max locked memory       (kbytes, -l) 64
    max memory size         (kbytes, -m) unlimited
    open files                      (-n) 1024
    pipe size            (512 bytes, -p) 8
    POSIX message queues     (bytes, -q) 819200
    real-time priority              (-r) 0
    stack size              (kbytes, -s) 10240
    cpu time               (seconds, -t) unlimited
    max user processes              (-u) 1024
    virtual memory          (kbytes, -v) unlimited
    file locks                      (-x) unlimited
Then you can run a small C program (testnproc.zip) that calls fork() in a loop until it fails with EAGAIN:
    [oracle@VM211 ocm]$ ./testnproc
    ...
    parent says fork number 871 sucessful
    child says fork number 872 pid 1518
    parent says fork number 872 sucessful
    child says fork number 873 pid 1519
    parent says fork number 873 sucessful
    child says fork number 874 pid 1520
    parent says fork number 874 sucessful
    parent says fork number 875 failed (nproc: soft=1024 hard=1024) with errno=11
And finally, because the processes sleep for a while, you can check how many processes you have. I do that from another user account for the simple reason that I need to create 2 more processes (‘ps’ and ‘wc’) for that:
    [root@VM211 ocm]# ps h -Lu oracle | wc -l
    1023
Currently, this is what the preinstall package sets on Oracle Linux 6 for 11gR2 (in /etc/security/limits.conf):
    oracle soft nproc 16384
    oracle hard nproc 16384
For 12c, these are set in /etc/security/limits.d/oracle-rdbms-server-12cR1-preinstall.conf which overrides /etc/security/limits.conf:
    oracle soft nproc 16384
    oracle hard nproc 16384
And just for your information, here is what is set in the ODA X4-2:
    oracle soft nproc 131072
So what do you want to set? You probably don’t want it so low that you experience ‘resource temporarily unavailable’. But you don’t want 100000 processes on your server either. So my recommendation is to set it high, but to monitor the number of processes and be alerted when it reaches a value that is not sensible. That way a process leak does not bring the system down, and you can still detect it and ask for a patch.
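Because files in /etc/security/limits.d can override /etc/security/limits.conf, the value that matters is the one actually in effect for a running process. Linux exposes it in /proc/<pid>/limits. The sketch below is not part of the original article and the helper name max_processes is hypothetical; it only extracts the soft and hard ‘Max processes’ values from that file.

```shell
# Hypothetical helper (not from the original article).
# max_processes: read the content of a /proc/<pid>/limits file on stdin
# and print the soft and hard "Max processes" values.
max_processes() {
    awk '/^Max processes/ { print "soft=" $3 " hard=" $4 }'
}

# Example invocations:
#   max_processes < /proc/$$/limits       # current shell
#   max_processes < /proc/<pid>/limits    # another process, if you are allowed to read it
```

Comparing this output with the thread count from ‘ps h -L’ gives the real margin for an instance that is already running, whatever the configuration files say now.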
Thanks Franck , you are a savior! Great article indeed.
Very useful article. Thanks Franck !
Great stuff. Saved us from production headache Thanks
Franck, thank you for the thorough explanation on diagnosing these types of issues. It showed me exactly where my problem was.
Great !! Thank you very much.
Hi Franck Pachot, my DBA-Village friend
When I try to login to grid I am getting the error su: cannot set user id: Resource temporarily unavailable
limits.conf shows soft nproc 2047 for the grid user. ps h -Led -o user | sort | uniq -c | sort -n shows grid at only 503. Is there anything else that could have caused Linux to think that the user account had exceeded the nproc limit?
Hi Newbie,
It’s 2017 now, but I hit the same issue.
While checking further, I found that nproc actually limits the threads created by the real user ID (ps -U) rather than the effective user ID (ps -u).
You may confirm this from man page of getrlimit:
    # man getrlimit | grep -A2 RLIMIT_NPROC
    RLIMIT_NPROC
        The maximum number of processes (or, more precisely on Linux, threads) that can be created for the real user ID of the calling process. Upon encountering this limit, fork(2) fails with the error EAGAIN.
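Following the man page excerpt above, a count that matches what the kernel enforces would group threads by real user rather than by effective user. Here is a minimal sketch, assuming the procps ‘ps’ with its ruser and euser output fields; the helper name count_by_real_user is hypothetical.

```shell
# Hypothetical helper (illustration only).
# count_by_real_user: read "RUSER EUSER" pairs on stdin, one line per thread,
# and print the number of threads per real user, smallest count first.
count_by_real_user() {
    awk 'NF >= 1 { n[$1]++ } END { for (u in n) print n[u], u }' | sort -k1,1n -k2,2
}

# Example invocation:
#   ps h -eL -o ruser,euser | count_by_real_user
```

If the numbers differ from those of ‘ps h -Led -o user’, some processes are running with an effective user that is not their real user, which is typical of setuid executables.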
Hi ‘Newbie’, I’m glad to see you there. The limits.conf settings may be overridden by files in /etc/security/limits.d. You can check with ulimit -Hu and ulimit -Su. Maybe you can strace/truss the su command and see which system call returns EAGAIN (which is 11). Regards, Franck.
It helped me so quickly. awesome and Thanks a lot!!!
Hi Franck, Doing some research on nproc and came across this excellent article.
Thought you might find these results interesting:
Oracle VirtualBox 4.something
    [oracle@ora12102a test-nproc]$ cat /etc/issue
    Oracle Linux Server release 6.5
    [oracle@ora12102a test-nproc]$ ulimit -u 1024
    [oracle@ora12102a test-nproc]$ grep 'Max processes' /proc/$$/limits
    Max processes 1024 1024 processes
    [root@ora12102a ~]# ps h -Lu oracle | wc -l
    47
    [oracle@ora12102a test-nproc]$ ./testnproc
    …
    child says fork number 1018 pid 31680
    parent says fork number 1018 sucessful
    parent says fork number 1019 failed (nproc: soft=1024 hard=1024) with errno=11
    [root@ora12102a ~]# ps h -Lu oracle | wc -l
    1066
Hi, Franck. First, thank you for the great explanation of this requirement. It’s the first one I’ve seen that immediately explains the light-weight process (LWP) component in a straightforward way. In my environment, Enterprise Manager agents are usually the reason why oracle’s overall thread count exceeds the soft limit (2047) value. Once it’s over 2047, any attempt to su - to oracle returns the message “resource temporarily unavailable”. My question is: why does this happen at the soft limit, instead of the hard limit? The hard limit is 16384. If only the soft limit matters, then what is the purpose of setting a hard limit? Thank you, Laura Sallwasser
Hi Laura, Thanks for your feedback. The soft limit is the one that is enforced, and it can be raised (up to the hard limit) with the ulimit command. Of course, in your case, when you get the error at login you can’t run the ulimit command. You can think of them this way: the hard limit is your admin limiting you, the soft limit is you limiting yourself. Regards, Franck.
Perfect thanks for your wonderful information.