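Normally, once a Eureka Instance has registered with the Eureka Server, it sends heartbeats at a fixed interval. The Eureka Server uses these heartbeats to decide whether the Instance is healthy. It also periodically removes Instances that have not sent a heartbeat for longer than a configured period.

The Eureka Server usually stops receiving an Instance's heartbeats for one of two reasons. The first is a problem with the Instance itself, such as a crash or shutdown. The second is a network failure between the Instance and the Server. The former typically affects individual Instances rather than large numbers at once. The latter typically causes the Eureka Server to miss a large batch of heartbeats within a short time.

To account for this difference, Eureka sets a threshold. When the number of heartbeats received in the last minute falls to or below it, meaning too many Instances appear to be down at once, the Eureka Server assumes a network failure is the likely cause. It then stops removing Instances whose heartbeats have expired. This is called the Eureka Server's self-preservation mode.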
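The decision described above can be sketched as a single predicate. This is a minimal illustration, not Eureka's code. The class and method names are hypothetical, loosely modeled on `isLeaseExpirationEnabled()`. The sketch assumes the per-minute threshold has already been computed, here from 10 Instances at 2 heartbeats per minute and an assumed renewal percentage of 0.85.

```java
public class SelfPreservationCheck {

    /**
     * Hypothetical helper: returns true when the server may evict Instances whose
     * heartbeats have expired, false when it is in self-preservation mode.
     */
    static boolean leaseExpirationEnabled(boolean selfPreservationEnabled,
                                          int renewsThreshold,
                                          int renewsLastMin) {
        if (!selfPreservationEnabled) {
            // Protection switched off: expired Instances are always evicted.
            return true;
        }
        // Evict only while the heartbeats received in the last minute exceed the
        // threshold; otherwise assume a network fault and keep every Instance.
        return renewsThreshold > 0 && renewsLastMin > renewsThreshold;
    }

    public static void main(String[] args) {
        // Assumed numbers: 10 Instances * 2 renews/min = 20 expected; (int) (20 * 0.85) = 17.
        System.out.println(leaseExpirationEnabled(true, 17, 20));  // healthy: eviction allowed
        System.out.println(leaseExpirationEnabled(true, 17, 6));   // mass loss: self-preservation
        System.out.println(leaseExpirationEnabled(false, 17, 6));  // protection disabled
    }
}
```

The comparison is strict, so receiving exactly the threshold number of heartbeats already counts as "too few".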
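Note that self-preservation mode only stops the removal of Instances whose heartbeats have expired. Normal registration and deregistration continue to work as usual.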
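Pseudocode for the eviction flow:
if (!isLeaseExpirationEnabled()) {
    return; // self-preservation mode: evict nothing
} else {
    // get the number of Instances that may be evicted
    // randomly evict that many Instances whose heartbeats have expired
}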
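The two commented steps in the pseudocode above can be fleshed out roughly as follows. This is a sketch under stated assumptions, not Eureka's implementation:

- The class and method names are hypothetical.
- Instances are plain ID strings.
- The per-pass cap (registry size minus its renewal-percentage share, with 0.85 assumed) is modeled on how `AbstractInstanceRegistry.evict()` appears to size each pass.

Random selection uses a partial Fisher-Yates shuffle.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class EvictionSketch {

    /** Assumed cap: never evict more than the share of the registry above the renewal percentage. */
    static int evictionCount(int registrySize, int expiredCount, double renewalPercentThreshold) {
        int keep = (int) (registrySize * renewalPercentThreshold);
        int evictionLimit = registrySize - keep;
        return Math.min(expiredCount, evictionLimit);
    }

    /** Picks toEvict distinct entries uniformly at random (partial Fisher-Yates shuffle). */
    static List<String> pickVictims(List<String> expired, int toEvict, Random random) {
        List<String> pool = new ArrayList<>(expired);
        for (int i = 0; i < toEvict; i++) {
            int next = i + random.nextInt(pool.size() - i); // index in [i, size)
            Collections.swap(pool, i, next);
        }
        return new ArrayList<>(pool.subList(0, toEvict));
    }

    public static void main(String[] args) {
        List<String> expired = List.of("a", "b", "c", "d", "e");
        // 20 registered Instances, 5 expired, 0.85 -> keep 17, so at most 3 evicted this pass
        int n = evictionCount(20, expired.size(), 0.85);
        System.out.println(n + " victims: " + pickVictims(expired, n, new Random(42)));
    }
}
```

Capping each pass keeps a single eviction run from emptying a large share of the registry. Picking victims at random presumably avoids always evicting the same applications first.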
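Notes:
1. When computing the expected number of heartbeats, the server assumes the default heartbeat interval of 30 s, i.e. 2 heartbeats per minute per Instance. If an Instance changes its heartbeat interval, this calculation becomes wrong.
AbstractInstanceRegistry, registration path (the "cancel" comment is copied verbatim from the source, even though this code increases the expected count):
// Since the client wants to cancel it, reduce the threshold
// (1 for 30 seconds, 2 for a minute)
this.expectedNumberOfRenewsPerMin = this.expectedNumberOfRenewsPerMin + 2;
this.numberOfRenewsPerMinThreshold =
        (int) (this.expectedNumberOfRenewsPerMin * serverConfig.getRenewalPercentThreshold());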
2. Because of a bug in renewal handling, the effective heartbeat expiry time used in the calculation is twice the configured value.
/**
 * Checks if the lease of a given {@link com.netflix.appinfo.InstanceInfo} has expired or not.
 *
 * Note that due to renew() doing the 'wrong" thing and setting lastUpdateTimestamp to +duration more than
 * what it should be, the expiry will actually be 2 * duration. This is a minor bug and should only affect
 * instances that ungracefully shutdown. Due to possible wide ranging impact to existing usage, this will
 * not be fixed.
 *
 * @param additionalLeaseMs any additional lease time to add to the lease evaluation in ms.
 */
public boolean isExpired(long additionalLeaseMs) {
    return (evictionTimestamp > 0
            || System.currentTimeMillis() > (lastUpdateTimestamp + duration + additionalLeaseMs));
}

/**
 * Renew the lease, use renewal duration if it was specified by the
 * associated {@link T} during registration, otherwise default duration is
 * {@link #DEFAULT_DURATION_IN_SECS}.
 */
public void renew() {
    lastUpdateTimestamp = System.currentTimeMillis() + duration;
}
3. Each additional registered Instance adds 2 heartbeats per minute to the expected count.
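Note 2 can be reproduced with a stripped-down copy of the `renew()` and `isExpired()` methods quoted above. The sketch makes three simplifications:

- Time is passed in explicitly instead of reading `System.currentTimeMillis()`.
- The `evictionTimestamp` and `additionalLeaseMs` terms are dropped.
- The 90 s lease duration is an assumed value.

```java
public class LeaseExpirySketch {

    /** Simplified stand-in for the lease fields used by renew() and isExpired(). */
    static final class Lease {
        final long duration;        // lease duration in ms
        long lastUpdateTimestamp;   // as set by renew()

        Lease(long durationMs) { this.duration = durationMs; }

        // Same logic as the quoted renew(): stores "now + duration" rather than "now".
        void renew(long nowMs) { lastUpdateTimestamp = nowMs + duration; }

        // Same comparison as the quoted isExpired(), with additionalLeaseMs = 0.
        boolean isExpired(long nowMs) { return nowMs > lastUpdateTimestamp + duration; }
    }

    public static void main(String[] args) {
        Lease lease = new Lease(90_000); // assumed 90 s duration
        lease.renew(0);                  // last heartbeat at t = 0
        System.out.println(lease.isExpired(90_001));  // false: one duration has passed
        System.out.println(lease.isExpired(180_001)); // true: expiry only after 2 * duration
    }
}
```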
The Eureka dashboard shows whether the server is in self-preservation mode, as in the figure below.
In the figure:
Lease expiration enabled   false   // false means self-preservation mode is on
Renews threshold           1       // heartbeat threshold (per minute)
Renews (last min)          0       // heartbeats received in the last minute
When the server is in self-preservation mode, the dashboard may also show a line of red warning text, as in the figure above.
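Notes 1 and 3, and the "Renews threshold" value listed above, can be checked with a small simulation. The sketch rests on these assumptions:

- Each registration adds 2 expected renews per minute, mirroring the AbstractInstanceRegistry snippet.
- The renewal percentage is 0.85.
- The class and method names are hypothetical.
- Heartbeat intervals divide 60 s evenly.

It compares the threshold, which always assumes a 30 s interval, with what Instances on other intervals actually send.

```java
public class RenewThresholdSketch {

    // 30 s interval hard-wired into the expected-renews calculation (note 1)
    static final int ASSUMED_RENEWS_PER_MIN = 2;

    private final double renewalPercentThreshold;
    private int expectedNumberOfRenewsPerMin;
    private int numberOfRenewsPerMinThreshold;

    RenewThresholdSketch(double renewalPercentThreshold) {
        this.renewalPercentThreshold = renewalPercentThreshold;
    }

    /** Simplified registration bookkeeping (note 3): each new Instance adds 2 expected renews/min. */
    void register() {
        expectedNumberOfRenewsPerMin += ASSUMED_RENEWS_PER_MIN;
        numberOfRenewsPerMinThreshold =
                (int) (expectedNumberOfRenewsPerMin * renewalPercentThreshold);
    }

    int threshold() { return numberOfRenewsPerMinThreshold; }

    /** Renews actually received per minute if each of n Instances renews every intervalSecs. */
    static int actualRenewsPerMin(int instances, int intervalSecs) {
        return instances * (60 / intervalSecs);
    }

    public static void main(String[] args) {
        RenewThresholdSketch one = new RenewThresholdSketch(0.85);
        one.register();
        System.out.println("1 instance -> threshold " + one.threshold()); // (int) 1.7 = 1

        RenewThresholdSketch ten = new RenewThresholdSketch(0.85);
        for (int i = 0; i < 10; i++) {
            ten.register();
        }
        for (int interval : new int[] {10, 30, 60}) {
            int actual = actualRenewsPerMin(10, interval);
            System.out.println(interval + " s interval: " + actual + " renews/min vs threshold "
                    + ten.threshold() + ", expiration enabled = " + (actual > ten.threshold()));
        }
    }
}
```

A single registered Instance gives a threshold of 1, consistent with the "Renews threshold 1" shown on the dashboard. With ten Instances the threshold is 17. At a 60 s interval they deliver only 10 renews per minute even when all are healthy, so the server would stay in self-preservation. At a 10 s interval they deliver 60, so most Instances would have to go silent before protection kicks in.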