Linux: no matter how many RUNNING tasks you have, they still can't get past a lock
Published: 2019-06-21


A further follow-up to 《》. Look at how many tasks this machine has in the running state:

crash> mach
          MACHINE TYPE: x86_64
           MEMORY SIZE: 90 GB
                  CPUS: 24
            HYPERVISOR: KVM
       PROCESSOR SPEED: 2394 Mhz
                    HZ: 1000
             PAGE SIZE: 4096
   KERNEL VIRTUAL BASE: ffff880000000000
   KERNEL VMALLOC BASE: ffffc90000000000
   KERNEL VMEMMAP BASE: ffffea0000000000
      KERNEL START MAP: ffffffff80000000
   KERNEL MODULES BASE: ffffffffc0000000
     KERNEL STACK SIZE: 16384
        IRQ STACK SIZE: 16384
            IRQ STACKS:
                 CPU 0: ffff880b52a00000
                 CPU 1: ffff880b52a40000
                 CPU 2: ffff880b52a80000
                 CPU 3: ffff880b52ac0000
                 CPU 4: ffff880b52b00000
                 CPU 5: ffff880b52b40000
                 CPU 6: ffff880b52b80000
                 CPU 7: ffff880b52bc0000
                 CPU 8: ffff880b52c00000
                 CPU 9: ffff880b52c40000
                CPU 10: ffff880b52c80000
                CPU 11: ffff880b52cc0000
                CPU 12: ffff881692200000
                CPU 13: ffff881692240000
                CPU 14: ffff881692280000
                CPU 15: ffff8816922c0000
                CPU 16: ffff881692300000
                CPU 17: ffff881692340000
                CPU 18: ffff881692380000
                CPU 19: ffff8816923c0000
                CPU 20: ffff881692400000
                CPU 21: ffff881692440000
                CPU 22: ffff881692480000
                CPU 23: ffff8816924c0000
DOUBLEFAULT STACK SIZE: 4096

But how many running processes are there? Let's take a look:

ps |grep -i RU
>     0      0   0  ffffffff81a09480  RU   0.0       0      0  [swapper/0]
      0      0   1  ffff88016e798fd0  RU   0.0       0      0  [swapper/1]
      0      0   2  ffff88016e799fa0  RU   0.0       0      0  [swapper/2]
      0      0   3  ffff88016e79af70  RU   0.0       0      0  [swapper/3]
      0      0   4  ffff88016e79bf40  RU   0.0       0      0  [swapper/4]
      0      0   5  ffff88016e79cf10  RU   0.0       0      0  [swapper/5]
      0      0   6  ffff88016e79dee0  RU   0.0       0      0  [swapper/6]
      0      0   7  ffff88016e79eeb0  RU   0.0       0      0  [swapper/7]
>     0      0   8  ffff88016e008000  RU   0.0       0      0  [swapper/8]
      0      0   9  ffff88016e008fd0  RU   0.0       0      0  [swapper/9]
      0      0  10  ffff88016e009fa0  RU   0.0       0      0  [swapper/10]
>     0      0  11  ffff88016e00af70  RU   0.0       0      0  [swapper/11]
>     0      0  12  ffff880beeaceeb0  RU   0.0       0      0  [swapper/12]
      0      0  13  ffff880beeacdee0  RU   0.0       0      0  [swapper/13]
      0      0  14  ffff880beeaccf10  RU   0.0       0      0  [swapper/14]
>     0      0  15  ffff880beeacbf40  RU   0.0       0      0  [swapper/15]
      0      0  16  ffff880beeacaf70  RU   0.0       0      0  [swapper/16]
      0      0  17  ffff880beeac9fa0  RU   0.0       0      0  [swapper/17]
      0      0  18  ffff880beeb10000  RU   0.0       0      0  [swapper/18]
      0      0  19  ffff880beeb10fd0  RU   0.0       0      0  [swapper/19]
      0      0  20  ffff880beeb11fa0  RU   0.0       0      0  [swapper/20]
      0      0  21  ffff880beeb12f70  RU   0.0       0      0  [swapper/21]
      0      0  22  ffff880beeb13f40  RU   0.0       0      0  [swapper/22]
      0      0  23  ffff880beeb14f10  RU   0.0       0      0  [swapper/23]
      1      0  21  ffff880beeac8000  RU   0.0  201332  14472  systemd
     10      2   4  ffff88016e726eb0  RU   0.0       0      0  [rcu_sched]
     12      2   1  ffff88016e00eeb0  RU   0.0       0      0  [watchdog/1]
     14      2   1  ffff88016e00cf10  RU   0.0       0      0  [ksoftirqd/1]
     17      2   2  ffff88016e040fd0  RU   0.0       0      0  [watchdog/2]
     22      2   3  ffff88016e045ee0  RU   0.0       0      0  [watchdog/3]
     24      2   3  ffff88016e098000  RU   0.0       0      0  [ksoftirqd/3]
     27      2   4  ffff88016e09af70  RU   0.0       0      0  [watchdog/4]
     42      2   7  ffff88016e111fa0  RU   0.0       0      0  [watchdog/7]
     52      2   9  ffff88016e163f40  RU   0.0       0      0  [watchdog/9]
     57      2  10  ffff88016e198fd0  RU   0.0       0      0  [watchdog/10]
     73      2  13  ffff880beeb3bf40  RU   0.0       0      0  [watchdog/13]
     83      2  15  ffff880beeb7dee0  RU   0.0       0      0  [watchdog/15]
     88      2  16  ffff880beebaaf70  RU   0.0       0      0  [watchdog/16]
     93      2  17  ffff880beebf0000  RU   0.0       0      0  [watchdog/17]
     98      2  18  ffff880beebf4f10  RU   0.0       0      0  [watchdog/18]
    103      2  19  ffff880bee021fa0  RU   0.0       0      0  [watchdog/19]
    108      2  20  ffff880bee026eb0  RU   0.0       0      0  [watchdog/20]
    110      2  20  ffff880bee050fd0  RU   0.0       0      0  [ksoftirqd/20]
    113      2  21  ffff880bee053f40  RU   0.0       0      0  [watchdog/21]
    118      2  22  ffff880bee078fd0  RU   0.0       0      0  [watchdog/22]
    409  22481  10  ffff880b06e15ee0  RU   0.5 13597708 511816  zte_uep1
    410  22481  21  ffff88077fe08000  RU   0.5 13597708 511816  zte_uep1
    413  22481   1  ffff8814af3fbf40  RU   0.5 13597708 511816  zte_uep1
    425  17770  10  ffff8807967d6eb0  RU   0.5 2267176 439500  java
    636      1  15  ffff880b4f5b5ee0  RU   0.0   45452   1740  systemd-journal
    865  22481  23  ffff8809a4aacf10  RU   0.5 13597708 511816  zte_uep1
    924  22481  17  ffff880036a6cf10  RU   0.5 13597708 511816  zte_uep1
    932      2  13  ffff880036022f70  IN   0.0       0      0  [xfs_mru_cache]
...... (too many to list)

Count them:

crash> ps |grep -w RU |wc -l
1654

Anyone who regularly runs top knows that the number of running processes is normally at most the number of CPU cores, so why does this machine have so many tasks in the running state? To rule out a problem with the crash tool itself, let's look at it again from the runqueue angle:

crash> p runqueues
PER-CPU DATA TYPE:
  struct rq runqueues;
PER-CPU ADDRESSES:
  [0]: ffff880b52a17c00
  [1]: ffff880b52a57c00
  [2]: ffff880b52a97c00
  [3]: ffff880b52ad7c00
  [4]: ffff880b52b17c00
  [5]: ffff880b52b57c00
  [6]: ffff880b52b97c00
  [7]: ffff880b52bd7c00
  [8]: ffff880b52c17c00
  [9]: ffff880b52c57c00
  [10]: ffff880b52c97c00
  [11]: ffff880b52cd7c00
  [12]: ffff881692217c00
  [13]: ffff881692257c00
  [14]: ffff881692297c00
  [15]: ffff8816922d7c00
  [16]: ffff881692317c00
  [17]: ffff881692357c00
  [18]: ffff881692397c00
  [19]: ffff8816923d7c00
  [20]: ffff881692417c00
  [21]: ffff881692457c00
  [22]: ffff881692497c00
  [23]: ffff8816924d7c00
crash> struct rq.nr_running ffff880b52a17c00
  nr_running = 7
crash> struct rq.nr_running ffff880b52a57c00
  nr_running = 152

Check nr_running for each CPU one by one, save the output, then sum it with a script:

grep 'nr_running =' 1.txt | awk 'BEGIN{sum=0}{sum+=$3}END{printf "%d\n",sum}'
1630    -------------- total

Why doesn't this total match the number of RU-state tasks that ps reported? At first I suspected a transient-value problem with the per-cpu counters, but the difference between the two totals is 1654 - 1630 = 24, exactly the number of CPU cores.

ps |grep -w 'RU'|grep 'swapper'
>     0      0   0  ffffffff81a09480  RU   0.0       0      0  [swapper/0]
      0      0   1  ffff88016e798fd0  RU   0.0       0      0  [swapper/1]
      0      0   2  ffff88016e799fa0  RU   0.0       0      0  [swapper/2]
      0      0   3  ffff88016e79af70  RU   0.0       0      0  [swapper/3]
      0      0   4  ffff88016e79bf40  RU   0.0       0      0  [swapper/4]
      0      0   5  ffff88016e79cf10  RU   0.0       0      0  [swapper/5]
      0      0   6  ffff88016e79dee0  RU   0.0       0      0  [swapper/6]
      0      0   7  ffff88016e79eeb0  RU   0.0       0      0  [swapper/7]
>     0      0   8  ffff88016e008000  RU   0.0       0      0  [swapper/8]
      0      0   9  ffff88016e008fd0  RU   0.0       0      0  [swapper/9]
      0      0  10  ffff88016e009fa0  RU   0.0       0      0  [swapper/10]
>     0      0  11  ffff88016e00af70  RU   0.0       0      0  [swapper/11]
>     0      0  12  ffff880beeaceeb0  RU   0.0       0      0  [swapper/12]
      0      0  13  ffff880beeacdee0  RU   0.0       0      0  [swapper/13]
      0      0  14  ffff880beeaccf10  RU   0.0       0      0  [swapper/14]
>     0      0  15  ffff880beeacbf40  RU   0.0       0      0  [swapper/15]
      0      0  16  ffff880beeacaf70  RU   0.0       0      0  [swapper/16]
      0      0  17  ffff880beeac9fa0  RU   0.0       0      0  [swapper/17]
      0      0  18  ffff880beeb10000  RU   0.0       0      0  [swapper/18]
      0      0  19  ffff880beeb10fd0  RU   0.0       0      0  [swapper/19]
      0      0  20  ffff880beeb11fa0  RU   0.0       0      0  [swapper/20]
      0      0  21  ffff880beeb12f70  RU   0.0       0      0  [swapper/21]
      0      0  22  ffff880beeb13f40  RU   0.0       0      0  [swapper/22]
      0      0  23  ffff880beeb14f10  RU   0.0       0      0  [swapper/23]

 

It turns out the idle task is in the running state by default, yet it is not included in the nr_running count:

crash> ps |grep -w 'RU' |grep -w '12'
>     0      0  12  ffff880beeaceeb0  RU   0.0       0      0  [swapper/12]
     12      2   1  ffff88016e00eeb0  RU   0.0       0      0  [watchdog/1]   <----------- not a CPU-12 task (grep matched its PID)
  22237      1  12  ffff881254b5eeb0  RU   1.0 14321692 912420  1_scheduler
crash> struct rq.nr_running ffff881692217c00
  nr_running = 1
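To make that bookkeeping concrete, here is a minimal userspace sketch of the idea (simplified names, not actual kernel source): nr_running only changes when a task is enqueued on or dequeued from a runqueue, and the per-CPU idle (swapper) task is never enqueued, so it shows up as RU in ps while contributing nothing to nr_running.

#include <stdio.h>

/* Minimal userspace model of the accounting -- NOT kernel source, names simplified. */
#define TASK_RUNNING 0

struct task {
    const char *comm;
    long state;
};

struct rq {
    unsigned int nr_running;   /* only counts tasks enqueued on this runqueue */
    struct task *idle;         /* the per-CPU idle (swapper) task, never enqueued */
};

static void enqueue_task(struct rq *rq, struct task *t)
{
    (void)t;
    rq->nr_running++;          /* the only way nr_running grows in this model */
}

int main(void)
{
    struct task idle  = { "swapper/12",  TASK_RUNNING };   /* state 0 -> shows up as RU */
    struct task woken = { "1_scheduler", TASK_RUNNING };

    struct rq cpu12 = { .nr_running = 0, .idle = &idle };   /* idle attached, not enqueued */
    enqueue_task(&cpu12, &woken);                           /* a woken task does get counted */

    /* A ps-style walk counts every task whose state is TASK_RUNNING (idle included),
     * while rq.nr_running only sees enqueued tasks -- hence the gap of one per CPU. */
    printf("RU tasks on this CPU: 2, rq.nr_running = %u\n", cpu12.nr_running);
    return 0;
}

That is exactly the arithmetic in the dump above: 24 swapper tasks in RU plus 1630 enqueued tasks gives the 1654 that ps reported.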

So why are there so many running tasks? Looking at the tasks in the running state, their stacks all sit in __schedule; in other words, the ready queue is very long.

Let's pick a few and look at their stacks:

crash> bt 30507
PID: 30507  TASK: ffff8809d42daf70  CPU: 3   COMMAND: "java"
 #0 [ffff88099eb93d58] __schedule at ffffffff816b6165
 #1 [ffff88099eb93dc0] schedule at ffffffff816b6ae9
 #2 [ffff88099eb93dd0] schedule_hrtimeout_range_clock at ffffffff816b5852
 #3 [ffff88099eb93e68] schedule_hrtimeout_range at ffffffff816b5903
 #4 [ffff88099eb93e78] ep_poll at ffffffff812526de
 #5 [ffff88099eb93f30] sys_epoll_wait at ffffffff81253b9d
 #6 [ffff88099eb93f80] system_call_fastpath at ffffffff816c2789
    RIP: 00007ffa43a652c3  RSP: 00007ffa2595b818  RFLAGS: 00010202
    RAX: 00000000000000e8  RBX: ffffffff816c2789  RCX: 00000000e14ef930
    RDX: 0000000000002000  RSI: 00000000047a5000  RDI: 0000000000000067
    RBP: 00007ffa2595b630   R8: 0000000000000000   R9: 0000000000000003
    R10: 0000000000007530  R11: 0000000000000293  R12: 0000016547341bca
    R13: 0000000000000067  R14: 0000000000007530  R15: 00000000047a5000
    ORIG_RAX: 00000000000000e8  CS: 0033  SS: 002b
crash> bt 31606
PID: 31606  TASK: ffff8809cbe03f40  CPU: 9   COMMAND: "java"
 #0 [ffff880a4eedbd58] __schedule at ffffffff816b6165
 #1 [ffff880a4eedbdc0] schedule at ffffffff816b6ae9
 #2 [ffff880a4eedbdd0] schedule_hrtimeout_range_clock at ffffffff816b5852
 #3 [ffff880a4eedbe68] schedule_hrtimeout_range at ffffffff816b5903
 #4 [ffff880a4eedbe78] ep_poll at ffffffff812526de
 #5 [ffff880a4eedbf30] sys_epoll_wait at ffffffff81253b9d
 #6 [ffff880a4eedbf80] system_call_fastpath at ffffffff816c2789
    RIP: 00007f7bb9bb72c3  RSP: 00007f7b8f060510  RFLAGS: 00000246
    RAX: 00000000000000e8  RBX: ffffffff816c2789  RCX: 00000000ffffffff
    RDX: 0000000000002000  RSI: 0000000005be3000  RDI: 0000000000000151
    RBP: 00007f7b8f060310   R8: 0000000000000000   R9: 0000000000000003
    R10: 00000000000003e8  R11: 0000000000000293  R12: 0000016547349167
    R13: 0000000000000151  R14: 00000000000003e8  R15: 0000000005be3000
    ORIG_RAX: 00000000000000e8  CS: 0033  SS: 002b
crash> task_struct.state ffff8809cbe03f40
  state = 0

Here are the macros for the state values:

#define TASK_RUNNING            0
#define TASK_INTERRUPTIBLE      1
#define TASK_UNINTERRUPTIBLE    2
#define __TASK_STOPPED          4
#define __TASK_TRACED           8
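For reference, the java tasks above had task_struct.state = 0, which is exactly why crash's ps prints them as RU even though they are parked on the ready queue rather than executing. A tiny sketch of that mapping, built on the macros above (illustrative only, not crash's actual source):

#include <stdio.h>

/* Task state values as in the 3.10-era kernel headers quoted above. */
#define TASK_RUNNING            0
#define TASK_INTERRUPTIBLE      1
#define TASK_UNINTERRUPTIBLE    2
#define __TASK_STOPPED          4
#define __TASK_TRACED           8

/* Rough mapping to the two-letter labels crash's ps prints
 * (illustrative only, not crash's actual source). */
static const char *state_label(long state)
{
    if (state == TASK_RUNNING)        return "RU";  /* runnable, not necessarily on a CPU */
    if (state & TASK_INTERRUPTIBLE)   return "IN";
    if (state & TASK_UNINTERRUPTIBLE) return "UN";
    if (state & __TASK_STOPPED)       return "ST";
    if (state & __TASK_TRACED)        return "TR";
    return "??";
}

int main(void)
{
    printf("state 0 -> %s\n", state_label(TASK_RUNNING));         /* the java tasks above */
    printf("state 2 -> %s\n", state_label(TASK_UNINTERRUPTIBLE));
    return 0;
}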

Now look at the stacks of the tasks that actually hold the CPUs: they are almost all waiting on locks. Normally each of them would finish quickly and move onto a wait queue, but because some task up front keeps holding the CPU, the tasks that were woken up when their resource became available, and are now sitting on the ready queue, simply never get scheduled. That is the nature of a non-preemptive kernel: even a high-priority task has to wait for a low-priority one to give up the CPU voluntarily. You may already be on the ready queue, your state may already have been set to running, but you still have to wait.
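To illustrate that last point, here is a toy, purely illustrative sketch (not kernel code): every task in it is runnable, but none of the woken ones runs, whatever its priority, until the task currently on the CPU yields of its own accord.

#include <stdio.h>

/* Toy cooperative run queue -- illustrative only, not kernel code. */
struct task {
    const char *comm;
    int prio;            /* smaller = higher priority */
    int on_cpu;          /* 1 if currently executing */
};

int main(void)
{
    /* Every task below is runnable ("RU"); only ready[0] owns the CPU. */
    struct task ready[] = {
        { "lock_holder", 120, 1 },  /* busy with a lock, never yields */
        { "woken_high",  100, 0 },  /* just woken, higher priority, still waits */
        { "woken_low",   139, 0 },
    };
    int n = (int)(sizeof(ready) / sizeof(ready[0]));
    int tick, i;

    for (tick = 0; tick < 3; tick++) {
        /* Without preemption nothing forces a switch: the woken tasks
         * stay queued no matter their priority. */
        for (i = 0; i < n; i++)
            if (!ready[i].on_cpu)
                printf("tick %d: %s (prio %d) is runnable but waiting\n",
                       tick, ready[i].comm, ready[i].prio);
    }

    /* Only when the current task gives up the CPU of its own accord... */
    ready[0].on_cpu = 0;
    ready[1].on_cpu = 1;
    printf("%s yields -> %s finally gets the CPU\n", ready[0].comm, ready[1].comm);
    return 0;
}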

 

Reposted from: https://www.cnblogs.com/10087622blog/p/9582606.html
