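Many people know that JDK 8, with its default Parallel collector, enables UseAdaptiveSizePolicy out of the box. It is a feature that adaptively resizes the Eden and Survivor spaces: based on how GC is behaving, it automatically computes the sizes of Eden, From and To. It sounds like a great feature, but a full GC problem I ran into recently turned out to be caused by it.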
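Many articles online say that UseAdaptiveSizePolicy has the following three goals (in decreasing priority):
- Pause goal: the application meets its expected GC pause time.
- Throughput goal: the application meets its expected throughput, i.e. normal running time / (normal running time + GC time).
- Minimum footprint: memory usage is kept as small as possible.
In other words, the adaptive size policy tries to reduce GC time and raise throughput, and, once those two goals are met, to reduce memory footprint.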
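To make the throughput definition above concrete, here is a minimal sketch that estimates it for the running JVM from the standard `java.lang.management` beans. The class name `GcThroughput` and the helper `throughput` are made up for illustration, and the accumulated collection time reported by the beans is only an approximation of pause time (for concurrent collectors it is not the same thing).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcThroughput {
    // Throughput as defined above: app time / (app time + GC time).
    static double throughput(long appMillis, long gcMillis) {
        long total = appMillis + gcMillis;
        return total == 0 ? 1.0 : (double) appMillis / total;
    }

    public static void main(String[] args) {
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                gcMillis += t;
            }
        }
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        long appMillis = Math.max(0, uptime - gcMillis);
        System.out.printf("GC time: %d ms, uptime: %d ms, throughput: %.4f%n",
                gcMillis, uptime, throughput(appMillis, gcMillis));
    }
}
```

Running it inside a busy application (or calling the same beans from a monitoring thread) gives a rough idea of how far the process is from its throughput goal.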
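Now for the problem I ran into:
I had an application consuming a high-traffic Kafka topic and noticed that it could not quite keep up. Since adding more partitions was not an option, I checked the application's GC behavior (jstat -gcutil) and found full GCs happening abnormally often, roughly once every 20 seconds, which is clearly a serious problem. I then looked at the JVM with jmap -heap and found the Survivor spaces were pitifully small, and the GC log showed that objects were being promoted to the old generation at an age of just 1 (ouch).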
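The space sizes that jmap -heap reports can also be read from inside the process. The following is a small sketch using the standard `MemoryPoolMXBean` API; the class name `HeapPools` and the helper `toMb` are illustrative. With the Parallel collector the pools are typically named "PS Eden Space", "PS Survivor Space" and "PS Old Gen" (other collectors use different names), and only one Survivor space is reported.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class HeapPools {
    // Convert a byte count to megabytes for display.
    static double toMb(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // pool is no longer valid
            }
            System.out.printf("%-28s committed=%8.1f MB  used=%8.1f MB%n",
                    pool.getName(), toMb(usage.getCommitted()), toMb(usage.getUsed()));
        }
    }
}
```

Logging these values periodically makes it easy to see the Survivor capacity drift over time without attaching an external tool.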
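My guess was this: in a high-traffic application, Eden fills up quickly and triggers young GCs, so young GCs become more frequent and throughput drops. To reduce GC time, the policy then shrinks the Survivor spaces (after all, a smaller Survivor space means shorter GCs). And because the Survivor spaces are smaller, too many Eden objects go straight to the old generation, which in turn triggers full GCs!
So I turned off the adaptive size policy (-XX:-UseAdaptiveSizePolicy) and manually set the young generation size, the old generation size and SurvivorRatio, after which the full GC situation improved noticeably. But in the spirit of getting to the truth, I still wanted to look at the VM source code to check whether my guess was right.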
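When changing flags like this, it helps to confirm what the running JVM actually ended up with. Below is a minimal sketch that reads the flag through `com.sun.management.HotSpotDiagnosticMXBean`, which is HotSpot-specific; the class name `FlagCheck` and the helper `flagValue` are made up for illustration, and the sketch assumes the JVM is HotSpot and knows the flag (otherwise `getVMOption` throws `IllegalArgumentException`).

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

public class FlagCheck {
    // Returns the current value of a HotSpot VM flag as a string, e.g. "true".
    static String flagValue(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        VMOption option = bean.getVMOption(name);
        return option.getValue();
    }

    public static void main(String[] args) {
        System.out.println("UseAdaptiveSizePolicy = " + flagValue("UseAdaptiveSizePolicy"));
        System.out.println("SurvivorRatio         = " + flagValue("SurvivorRatio"));
    }
}
```

Starting the program once with and once without `-XX:-UseAdaptiveSizePolicy` should show the first line flipping between `true` and `false`.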
Code path (source version openjdk:jdk8-b120 https://github.com/openjdk/jdk/tree/jdk8-b120)
hotspot/src/share/vm/gc_implementation/parallelScavenge/psAdaptiveSizePolicy.cpp
The source is long, so only the key parts are excerpted below and the rest is omitted.
// This method clearly computes the Survivor size and the threshold (the threshold here is
// tenuring_threshold, i.e. the age at which objects are promoted to the old generation)
uint PSAdaptiveSizePolicy::compute_survivor_space_size_and_threshold(
                                             bool is_survivor_overflow,
                                             uint tenuring_threshold,
                                             size_t survivor_limit) {
  // ... (code omitted)
  if (!is_survivor_overflow) {
    // Keep running averages on how much survived

    // We use the tenuring threshold to equalize the cost of major
    // and minor collections.
    // ThresholdTolerance is used to indicate how sensitive the
    // tenuring threshold is to differences in cost between the
    // collection types.

    // Get the times of interest. This involves a little work, so
    // we cache the values here.
    const double major_cost = major_gc_cost();
    const double minor_cost = minor_gc_cost();

    if (minor_cost > major_cost * _threshold_tolerance_percent) {
      // Minor times are getting too long; lower the threshold so
      // less survives and more is promoted.
      // This is the key point: young GC is taking too long, so the tenuring threshold starts to drop
      decr_tenuring_threshold = true;
      set_decrement_tenuring_threshold_for_gc_cost(true);
    } else if (major_cost > minor_cost * _threshold_tolerance_percent) {
      // Major times are too long, so we want less promotion.
      incr_tenuring_threshold = true;
      set_increment_tenuring_threshold_for_gc_cost(true);
    }
  } else {
    // The Survivor space overflowed: lower the tenuring threshold directly
    decr_tenuring_threshold = true;
  }

  // The Survivor size is taken here from an average kept in the GC stats (in fact it is an
  // estimate produced by a weighted algorithm; it looks fairly complex and I have not dug into it yet)
  size_t target_size = align_size_up((size_t)_avg_survived->padded_average(),
                                     _space_alignment);
  target_size = MAX2(target_size, _space_alignment);

  if (target_size > survivor_limit) {
    // Target size is bigger than we can handle. Let's also reduce
    // the tenuring threshold.
    target_size = survivor_limit;
    decr_tenuring_threshold = true;
    set_decrement_tenuring_threshold_for_survivor_limit(true);
  }
  // ... (code omitted)
  set_survivor_size(target_size);

  return tenuring_threshold;
}
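From the code above we can see that when young GCs take too long or the Survivor space overflows, the tenuring threshold starts to drop (which matches the promotion age of just 1 that I saw in the GC log). Once large numbers of objects are promoted to the old generation after a single young GC, the average size of what survives in the Survivor space goes down, so the algorithm shrinks the Survivor space. A smaller Survivor space in turn makes overflow far more likely, and the result is a vicious cycle.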
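The sizing step in the excerpt can be restated as a small Java model to see how a low survived average pulls the Survivor size down. This is a simplified sketch, not HotSpot code: the class `SurvivorSizingModel`, its method names and the 1 MB alignment in `main` are assumptions for illustration, the GC cost comparison is left out, and a single byte count stands in for HotSpot's padded weighted average.

```java
public class SurvivorSizingModel {
    /** Outcome of one sizing step: the new Survivor size and whether to lower the threshold. */
    static final class Result {
        final long targetSize;
        final boolean decrThreshold;

        Result(long targetSize, boolean decrThreshold) {
            this.targetSize = targetSize;
            this.decrThreshold = decrThreshold;
        }
    }

    // Round size up to the next multiple of alignment.
    static long alignUp(long size, long alignment) {
        return ((size + alignment - 1) / alignment) * alignment;
    }

    // Mirrors the excerpt: align the survived average, keep at least one
    // alignment unit, and cap at the limit (lowering the threshold if capped).
    static Result computeTarget(long avgSurvived, long alignment,
                                long survivorLimit, boolean survivorOverflow) {
        boolean decr = survivorOverflow; // overflow lowers the threshold directly
        long target = Math.max(alignUp(avgSurvived, alignment), alignment);
        if (target > survivorLimit) {
            target = survivorLimit;
            decr = true;
        }
        return new Result(target, decr);
    }

    public static void main(String[] args) {
        long alignment = 1L << 20;  // hypothetical 1 MB space alignment
        long limit = 10L << 20;     // hypothetical 10 MB survivor limit
        Result r = computeTarget(393_248L, alignment, limit, false);
        // prints: target = 1048576 bytes, lower threshold = false
        System.out.println("target = " + r.targetSize
                + " bytes, lower threshold = " + r.decrThreshold);
    }
}
```

With only a few hundred kilobytes surviving on average, the target collapses to a single alignment unit, which is the shrinking half of the cycle described above; the other half (early promotion keeping that average low) happens in the collector itself.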
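So UseAdaptiveSizePolicy, used carelessly, can cause real trouble!
At this point the cause of the problem is reasonably clear. My knowledge is limited, so if anything here is wrong, corrections are welcome!
Finally, I want to rebut a claim made in a few articles online: that explicitly setting SurvivorRatio fixes the Survivor proportion, so UseAdaptiveSizePolicy does not need to be turned off (at least on the version I tested, this claim does not hold).
First, the Java version: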
openjdk version "1.8.0_302"
OpenJDK Runtime Environment (build 1.8.0_302-b08)
OpenJDK 64-Bit Server VM (build 25.302-b08, mixed mode)
Code:
package cn.mysens.demo;

public class App {
    public static void main(String[] args) {
        while (true) {
            try {
                // 512 arrays of 1024 bytes each, about 512 KB of short-lived garbage per iteration
                byte[][] byteArr = new byte[512][1024];
                Thread.sleep(50);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }
}
Very simple code: every 50 ms it allocates about 512 KB of arrays.
Startup command:
java -XX:SurvivorRatio=8 -Xmn100m -cp app/build/libs/app.jar cn.mysens.demo.App
Next, check with jmap -heap.
First result: Eden is 80 MB and S0/S1 are 10 MB each, which is as expected:
Heap Configuration:
MinHeapFreeRatio = 0
MaxHeapFreeRatio = 100
MaxHeapSize = 3336568832 (3182.0MB)
NewSize = 104857600 (100.0MB)
MaxNewSize = 104857600 (100.0MB)
OldSize = 104857600 (100.0MB)
NewRatio = 2
SurvivorRatio = 8
MetaspaceSize = 21807104 (20.796875MB)
CompressedClassSpaceSize = 1073741824 (1024.0MB)
MaxMetaspaceSize = 17592186044415 MB
G1HeapRegionSize = 0 (0.0MB)
Heap Usage:
PS Young Generation
Eden Space:
capacity = 83886080 (80.0MB)
used = 9996896 (9.533782958984375MB)
free = 73889184 (70.46621704101562MB)
11.917228698730469% used
From Space:
capacity = 10485760 (10.0MB)
used = 573456 (0.5468902587890625MB)
free = 9912304 (9.453109741210938MB)
5.468902587890625% used
To Space:
capacity = 10485760 (10.0MB)
used = 0 (0.0MB)
free = 10485760 (10.0MB)
0.0% used
PS Old Generation
capacity = 104857600 (100.0MB)
used = 16384 (0.015625MB)
free = 104841216 (99.984375MB)
0.015625% used
A few seconds later, checking again:
Heap Configuration:
MinHeapFreeRatio = 0
MaxHeapFreeRatio = 100
MaxHeapSize = 3336568832 (3182.0MB)
NewSize = 104857600 (100.0MB)
MaxNewSize = 104857600 (100.0MB)
OldSize = 104857600 (100.0MB)
NewRatio = 2
SurvivorRatio = 8
MetaspaceSize = 21807104 (20.796875MB)
CompressedClassSpaceSize = 1073741824 (1024.0MB)
MaxMetaspaceSize = 17592186044415 MB
G1HeapRegionSize = 0 (0.0MB)
Heap Usage:
PS Young Generation
Eden Space:
capacity = 102760448 (98.0MB)
used = 94252464 (89.88615417480469MB)
free = 8507984 (8.113845825195312MB)
91.72056548449459% used
From Space:
capacity = 1048576 (1.0MB)
used = 393248 (0.375030517578125MB)
free = 655328 (0.624969482421875MB)
37.5030517578125% used
To Space:
capacity = 1048576 (1.0MB)
used = 0 (0.0MB)
free = 1048576 (1.0MB)
0.0% used
PS Old Generation
capacity = 104857600 (100.0MB)
used = 16384 (0.015625MB)
free = 104841216 (99.984375MB)
0.015625% used
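Clearly the Eden-to-Survivor ratio is no longer 8 (Eden is now 98 MB and each Survivor space 1 MB). jstat -gcutil shows an FGC count of 0, so presumably too little was surviving in the Survivor space and the averaging algorithm shrank it. This shows that UseAdaptiveSizePolicy is still in effect by default even when SurvivorRatio is set explicitly!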
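The arithmetic behind that conclusion can be checked with a few lines of Java. This is only a worked calculation using the capacities from the two jmap readings above; the class `SurvivorRatioCheck` and its helpers are illustrative, and the `young / (SurvivorRatio + 2)` split is the usual reading of SurvivorRatio (Eden is SurvivorRatio times the size of one Survivor space, and there are two of them).

```java
public class SurvivorRatioCheck {
    // Size of one Survivor space if the young generation is split strictly by SurvivorRatio.
    static long expectedSurvivor(long youngBytes, int survivorRatio) {
        return youngBytes / (survivorRatio + 2);
    }

    // Ratio actually observed between Eden and one Survivor space.
    static double effectiveRatio(long edenCapacity, long survivorCapacity) {
        return (double) edenCapacity / survivorCapacity;
    }

    public static void main(String[] args) {
        long young = 104_857_600L; // -Xmn100m

        // prints: 10485760
        System.out.println(expectedSurvivor(young, 8));

        // First jmap reading: Eden 83886080, Survivor 10485760. prints: 8.0
        System.out.println(effectiveRatio(83_886_080L, 10_485_760L));

        // Second jmap reading: Eden 102760448, Survivor 1048576. prints: 98.0
        System.out.println(effectiveRatio(102_760_448L, 1_048_576L));
    }
}
```

The first reading matches the configured ratio of 8, while the second works out to 98, even though jmap still reports SurvivorRatio = 8 in the configuration section.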