再续《》,看运行态进程数量之多:
crash> mach MACHINE TYPE: x86_64 MEMORY SIZE: 90 GB CPUS: 24 HYPERVISOR: KVM PROCESSOR SPEED: 2394 Mhz HZ: 1000 PAGE SIZE: 4096 KERNEL VIRTUAL BASE: ffff880000000000 KERNEL VMALLOC BASE: ffffc90000000000 KERNEL VMEMMAP BASE: ffffea0000000000 KERNEL START MAP: ffffffff80000000 KERNEL MODULES BASE: ffffffffc0000000 KERNEL STACK SIZE: 16384 IRQ STACK SIZE: 16384 IRQ STACKS: CPU 0: ffff880b52a00000 CPU 1: ffff880b52a40000 CPU 2: ffff880b52a80000 CPU 3: ffff880b52ac0000 CPU 4: ffff880b52b00000 CPU 5: ffff880b52b40000 CPU 6: ffff880b52b80000 CPU 7: ffff880b52bc0000 CPU 8: ffff880b52c00000 CPU 9: ffff880b52c40000 CPU 10: ffff880b52c80000 CPU 11: ffff880b52cc0000 CPU 12: ffff881692200000 CPU 13: ffff881692240000 CPU 14: ffff881692280000 CPU 15: ffff8816922c0000 CPU 16: ffff881692300000 CPU 17: ffff881692340000 CPU 18: ffff881692380000 CPU 19: ffff8816923c0000 CPU 20: ffff881692400000 CPU 21: ffff881692440000 CPU 22: ffff881692480000 CPU 23: ffff8816924c0000DOUBLEFAULT STACK SIZE: 4096
但是,我们来看,有多少个running进程呢?
ps |grep -i RU> 0 0 0 ffffffff81a09480 RU 0.0 0 0 [swapper/0] 0 0 1 ffff88016e798fd0 RU 0.0 0 0 [swapper/1] 0 0 2 ffff88016e799fa0 RU 0.0 0 0 [swapper/2] 0 0 3 ffff88016e79af70 RU 0.0 0 0 [swapper/3] 0 0 4 ffff88016e79bf40 RU 0.0 0 0 [swapper/4] 0 0 5 ffff88016e79cf10 RU 0.0 0 0 [swapper/5] 0 0 6 ffff88016e79dee0 RU 0.0 0 0 [swapper/6] 0 0 7 ffff88016e79eeb0 RU 0.0 0 0 [swapper/7]> 0 0 8 ffff88016e008000 RU 0.0 0 0 [swapper/8] 0 0 9 ffff88016e008fd0 RU 0.0 0 0 [swapper/9] 0 0 10 ffff88016e009fa0 RU 0.0 0 0 [swapper/10]> 0 0 11 ffff88016e00af70 RU 0.0 0 0 [swapper/11]> 0 0 12 ffff880beeaceeb0 RU 0.0 0 0 [swapper/12] 0 0 13 ffff880beeacdee0 RU 0.0 0 0 [swapper/13] 0 0 14 ffff880beeaccf10 RU 0.0 0 0 [swapper/14]> 0 0 15 ffff880beeacbf40 RU 0.0 0 0 [swapper/15] 0 0 16 ffff880beeacaf70 RU 0.0 0 0 [swapper/16] 0 0 17 ffff880beeac9fa0 RU 0.0 0 0 [swapper/17] 0 0 18 ffff880beeb10000 RU 0.0 0 0 [swapper/18] 0 0 19 ffff880beeb10fd0 RU 0.0 0 0 [swapper/19] 0 0 20 ffff880beeb11fa0 RU 0.0 0 0 [swapper/20] 0 0 21 ffff880beeb12f70 RU 0.0 0 0 [swapper/21] 0 0 22 ffff880beeb13f40 RU 0.0 0 0 [swapper/22] 0 0 23 ffff880beeb14f10 RU 0.0 0 0 [swapper/23] 1 0 21 ffff880beeac8000 RU 0.0 201332 14472 systemd 10 2 4 ffff88016e726eb0 RU 0.0 0 0 [rcu_sched] 12 2 1 ffff88016e00eeb0 RU 0.0 0 0 [watchdog/1] 14 2 1 ffff88016e00cf10 RU 0.0 0 0 [ksoftirqd/1] 17 2 2 ffff88016e040fd0 RU 0.0 0 0 [watchdog/2] 22 2 3 ffff88016e045ee0 RU 0.0 0 0 [watchdog/3] 24 2 3 ffff88016e098000 RU 0.0 0 0 [ksoftirqd/3] 27 2 4 ffff88016e09af70 RU 0.0 0 0 [watchdog/4] 42 2 7 ffff88016e111fa0 RU 0.0 0 0 [watchdog/7] 52 2 9 ffff88016e163f40 RU 0.0 0 0 [watchdog/9] 57 2 10 ffff88016e198fd0 RU 0.0 0 0 [watchdog/10] 73 2 13 ffff880beeb3bf40 RU 0.0 0 0 [watchdog/13] 83 2 15 ffff880beeb7dee0 RU 0.0 0 0 [watchdog/15] 88 2 16 ffff880beebaaf70 RU 0.0 0 0 [watchdog/16] 93 2 17 ffff880beebf0000 RU 0.0 0 0 [watchdog/17] 98 2 18 ffff880beebf4f10 RU 0.0 0 0 [watchdog/18] 103 2 19 ffff880bee021fa0 RU 0.0 0 0 [watchdog/19] 108 2 20 ffff880bee026eb0 RU 0.0 0 0 [watchdog/20] 110 2 20 ffff880bee050fd0 RU 0.0 0 0 [ksoftirqd/20] 113 2 21 ffff880bee053f40 RU 0.0 0 0 [watchdog/21] 118 2 22 ffff880bee078fd0 RU 0.0 0 0 [watchdog/22] 409 22481 10 ffff880b06e15ee0 RU 0.5 13597708 511816 zte_uep1 410 22481 21 ffff88077fe08000 RU 0.5 13597708 511816 zte_uep1 413 22481 1 ffff8814af3fbf40 RU 0.5 13597708 511816 zte_uep1 425 17770 10 ffff8807967d6eb0 RU 0.5 2267176 439500 java 636 1 15 ffff880b4f5b5ee0 RU 0.0 45452 1740 systemd-journal 865 22481 23 ffff8809a4aacf10 RU 0.5 13597708 511816 zte_uep1 924 22481 17 ffff880036a6cf10 RU 0.5 13597708 511816 zte_uep1 932 2 13 ffff880036022f70 IN 0.0 0 0 [xfs_mru_cache] 。。。。。。。。。。。。。。。。。。。。列不完
看看个数:
crash> ps |grep -w RU |wc -l1654
经常敲top的人肯定知道,running的进程数,一般是小于等于cpu的核数的,那为啥这个机器上这么多的running呢?为了排除是crash工具的问题,在从runqueue的角度来看下:
crash> p runqueuesPER-CPU DATA TYPE: struct rq runqueues;PER-CPU ADDRESSES: [0]: ffff880b52a17c00 [1]: ffff880b52a57c00 [2]: ffff880b52a97c00 [3]: ffff880b52ad7c00 [4]: ffff880b52b17c00 [5]: ffff880b52b57c00 [6]: ffff880b52b97c00 [7]: ffff880b52bd7c00 [8]: ffff880b52c17c00 [9]: ffff880b52c57c00 [10]: ffff880b52c97c00 [11]: ffff880b52cd7c00 [12]: ffff881692217c00 [13]: ffff881692257c00 [14]: ffff881692297c00 [15]: ffff8816922d7c00 [16]: ffff881692317c00 [17]: ffff881692357c00 [18]: ffff881692397c00 [19]: ffff8816923d7c00 [20]: ffff881692417c00 [21]: ffff881692457c00 [22]: ffff881692497c00 [23]: ffff8816924d7c00 crash> struct rq.nr_running ffff880b52a17c00 nr_running = 7 crash> struct rq.nr_running ffff880b52a57c00 nr_running = 152
挨个查看一下nr_running个数,然后脚本统计如下:
grep 'nr_running =' 1.txt |awk '{print $3}BEGIN{sum=0;num=0}{sum+=$3;num+=1}END{printf "%d\n",sum}'715211313865443841484311141443107102911101081059831630-------------总数
为什么这个总数和我们ps看到的RU状态的总数不一致呢?一开始还以为是percpu统计的瞬时值问题,但是这两个总数的相差:1654-1630=24,刚好和cpu的核数一致。
ps |grep -w 'RU'|grep 'swapper'> 0 0 0 ffffffff81a09480 RU 0.0 0 0 [swapper/0] 0 0 1 ffff88016e798fd0 RU 0.0 0 0 [swapper/1] 0 0 2 ffff88016e799fa0 RU 0.0 0 0 [swapper/2] 0 0 3 ffff88016e79af70 RU 0.0 0 0 [swapper/3] 0 0 4 ffff88016e79bf40 RU 0.0 0 0 [swapper/4] 0 0 5 ffff88016e79cf10 RU 0.0 0 0 [swapper/5] 0 0 6 ffff88016e79dee0 RU 0.0 0 0 [swapper/6] 0 0 7 ffff88016e79eeb0 RU 0.0 0 0 [swapper/7]> 0 0 8 ffff88016e008000 RU 0.0 0 0 [swapper/8] 0 0 9 ffff88016e008fd0 RU 0.0 0 0 [swapper/9] 0 0 10 ffff88016e009fa0 RU 0.0 0 0 [swapper/10]> 0 0 11 ffff88016e00af70 RU 0.0 0 0 [swapper/11]> 0 0 12 ffff880beeaceeb0 RU 0.0 0 0 [swapper/12] 0 0 13 ffff880beeacdee0 RU 0.0 0 0 [swapper/13] 0 0 14 ffff880beeaccf10 RU 0.0 0 0 [swapper/14]> 0 0 15 ffff880beeacbf40 RU 0.0 0 0 [swapper/15] 0 0 16 ffff880beeacaf70 RU 0.0 0 0 [swapper/16] 0 0 17 ffff880beeac9fa0 RU 0.0 0 0 [swapper/17] 0 0 18 ffff880beeb10000 RU 0.0 0 0 [swapper/18] 0 0 19 ffff880beeb10fd0 RU 0.0 0 0 [swapper/19] 0 0 20 ffff880beeb11fa0 RU 0.0 0 0 [swapper/20] 0 0 21 ffff880beeb12f70 RU 0.0 0 0 [swapper/21] 0 0 22 ffff880beeb13f40 RU 0.0 0 0 [swapper/22] 0 0 23 ffff880beeb14f10 RU 0.0 0 0 [swapper/23]
原来是因为idle进程默认在running态,但是并不算到nr_running的计数中:
crash> ps |grep -w 'RU' |grep -w '12'> 0 0 12 ffff880beeaceeb0 RU 0.0 0 0 [swapper/12] 12 2 1 ffff88016e00eeb0 RU 0.0 0 0 [watchdog/1]-----------这个不是12cpu的进程 22237 1 12 ffff881254b5eeb0 RU 1.0 14321692 912420 1_schedulercrash> struct rq.nr_running ffff881692217c00 nr_running = 1
为啥running这么多呢?查看处于running状态的task,堆栈都是 __schedule ,也就是就绪队列很长,
来挑一些看看堆栈:
crash> bt 30507PID: 30507 TASK: ffff8809d42daf70 CPU: 3 COMMAND: "java" #0 [ffff88099eb93d58] __schedule at ffffffff816b6165 #1 [ffff88099eb93dc0] schedule at ffffffff816b6ae9 #2 [ffff88099eb93dd0] schedule_hrtimeout_range_clock at ffffffff816b5852 #3 [ffff88099eb93e68] schedule_hrtimeout_range at ffffffff816b5903 #4 [ffff88099eb93e78] ep_poll at ffffffff812526de #5 [ffff88099eb93f30] sys_epoll_wait at ffffffff81253b9d #6 [ffff88099eb93f80] system_call_fastpath at ffffffff816c2789 RIP: 00007ffa43a652c3 RSP: 00007ffa2595b818 RFLAGS: 00010202 RAX: 00000000000000e8 RBX: ffffffff816c2789 RCX: 00000000e14ef930 RDX: 0000000000002000 RSI: 00000000047a5000 RDI: 0000000000000067 RBP: 00007ffa2595b630 R8: 0000000000000000 R9: 0000000000000003 R10: 0000000000007530 R11: 0000000000000293 R12: 0000016547341bca R13: 0000000000000067 R14: 0000000000007530 R15: 00000000047a5000 ORIG_RAX: 00000000000000e8 CS: 0033 SS: 002bcrash> bt 31606PID: 31606 TASK: ffff8809cbe03f40 CPU: 9 COMMAND: "java" #0 [ffff880a4eedbd58] __schedule at ffffffff816b6165 #1 [ffff880a4eedbdc0] schedule at ffffffff816b6ae9 #2 [ffff880a4eedbdd0] schedule_hrtimeout_range_clock at ffffffff816b5852 #3 [ffff880a4eedbe68] schedule_hrtimeout_range at ffffffff816b5903 #4 [ffff880a4eedbe78] ep_poll at ffffffff812526de #5 [ffff880a4eedbf30] sys_epoll_wait at ffffffff81253b9d #6 [ffff880a4eedbf80] system_call_fastpath at ffffffff816c2789 RIP: 00007f7bb9bb72c3 RSP: 00007f7b8f060510 RFLAGS: 00000246 RAX: 00000000000000e8 RBX: ffffffff816c2789 RCX: 00000000ffffffff RDX: 0000000000002000 RSI: 0000000005be3000 RDI: 0000000000000151 RBP: 00007f7b8f060310 R8: 0000000000000000 R9: 0000000000000003 R10: 00000000000003e8 R11: 0000000000000293 R12: 0000016547349167 R13: 0000000000000151 R14: 00000000000003e8 R15: 0000000005be3000 ORIG_RAX: 00000000000000e8 CS: 0033 SS: 002bcrash> task_struct.state ffff8809cbe03f40 state = 0
看看状态值的宏:
#define TASK_RUNNING 0#define TASK_INTERRUPTIBLE 1#define TASK_UNINTERRUPTIBLE 2#define __TASK_STOPPED 4#define __TASK_TRACED 8
再看看占有cpu的进程的堆栈,发现基本都在等锁,原本大家很快都能执行完,然后放到等待队列里面去,但是由于前面某个进程执行占着cpu不放,那么因为等待资源而得到唤醒的,处于就绪队列里面的进程自然得不到调度。当然这个是非抢占式内核的特点,高优先级的进程,也得等低优先级的进程主动放弃cpu,哪怕你已经在就绪队列,哪怕你的状态已经被改成了running,但就得等着。