文章目录
- 初始化
- 创建集群
- 任务处理模型
- 服务剔除任务
初始化
现在开始看eureka服务端是怎么启动的。首先从主类上标记的注解@EnableEurekaServer开始分析,其实没啥,就是@Import导入了EurekaServerMarkerConfiguration组件
@Target(ElementType.TYPE)
@Retention(RetentionPolicy.RUNTIME)
@Documented
@Import(EurekaServerMarkerConfiguration.class)
public @interface EnableEurekaServer {
}
EurekaServerMarkerConfiguration组件创建了一个Marker对象,顾名思义就是用来标记的,其实就用标记当前应用为eureka服务端应用。
@Configuration
public class EurekaServerMarkerConfiguration {
@Bean
public Marker eurekaServerMarkerBean() {
return new Marker();
}
class Marker {
}
}
不要忘了,springboot可以通过spi机制进行定制,所以看下spring.factories文件,这里我们看到定义了自动配置类EurekaServerAutoConfiguration,接下来分析这个类即可。
org.springframework.boot.autoconfigure.EnableAutoConfiguration=\
org.springframework.cloud.netflix.eureka.server.EurekaServerAutoConfiguration
通过类的声明,我们可以看到导入了EurekaServerInitializerConfiguration组件,刚才创建的Marker Bean也派上用场了,这里就是判断当容器中的Marker Bean存在时,才会对EurekaServerAutoConfiguration进行解析。
@Configuration
@Import(EurekaServerInitializerConfiguration.class)
@ConditionalOnBean(EurekaServerMarkerConfiguration.Marker.class)
@EnableConfigurationProperties({ EurekaDashboardProperties.class,
InstanceRegistryProperties.class })
@PropertySource("classpath:/eureka/server.properties")
public class EurekaServerAutoConfiguration extends WebMvcConfigurerAdapter {
......
@Bean
@ConditionalOnProperty(prefix = "eureka.dashboard", name = "enabled", matchIfMissing = true)
public EurekaController eurekaController() {
return new EurekaController(this.applicationInfoManager);
}
@Bean
public PeerAwareInstanceRegistry peerAwareInstanceRegistry(
ServerCodecs serverCodecs) {
this.eurekaClient.getApplications(); // force initialization
return new InstanceRegistry(this.eurekaServerConfig, this.eurekaClientConfig,
serverCodecs, this.eurekaClient,
this.instanceRegistryProperties.getExpectedNumberOfClientsSendingRenews(),
this.instanceRegistryProperties.getDefaultOpenForTrafficCount());
}
@Bean
@ConditionalOnMissingBean
public PeerEurekaNodes peerEurekaNodes(PeerAwareInstanceRegistry registry,
ServerCodecs serverCodecs) {
return new RefreshablePeerEurekaNodes(registry, this.eurekaServerConfig,
this.eurekaClientConfig, serverCodecs, this.applicationInfoManager);
}
@Bean
public EurekaServerContext eurekaServerContext(ServerCodecs serverCodecs,
PeerAwareInstanceRegistry registry, PeerEurekaNodes peerEurekaNodes) {
return new DefaultEurekaServerContext(this.eurekaServerConfig, serverCodecs,
registry, peerEurekaNodes, this.applicationInfoManager);
}
@Bean
public EurekaServerBootstrap eurekaServerBootstrap(PeerAwareInstanceRegistry registry,
EurekaServerContext serverContext) {
return new EurekaServerBootstrap(this.applicationInfoManager,
this.eurekaClientConfig, this.eurekaServerConfig, registry,
serverContext);
}
......
}
看下这个配置类创建了哪些Bean吧,EurekaController是用来处理仪表盘的,PeerEurekaNodes和PeerAwareInstanceRegistry这是两个集群相关的Bean,EurekaServerBootstrap待会分析,在初始化上下文的时候会用到,现在分析EurekaServerContext,由于initialize()方法存在@PostConstruct注解,因此在实例化以后会被调用。
@PostConstruct
@Override
public void initialize() {
logger.info("Initializing ...");
peerEurekaNodes.start();
try {
registry.init(peerEurekaNodes);
} catch (Exception e) {
throw new RuntimeException(e);
}
logger.info("Initialized");
}
创建集群
这里调用peerEurekaNodes.start()
开启线程周期性地调用updatePeerEurekaNodes(resolvePeerUrls());
。在resolvePeerUrls()方法中,首先通过内部属性applicationInfoManager的getInfo()方法获取InstanceInfo实例对象。
protected List<String> resolvePeerUrls() {
InstanceInfo myInfo = applicationInfoManager.getInfo();
// clientConfig(EurekaClientConfigBean)中的属性会跟"eureka.client"开头的配置项绑定
// 这里其实是通过EurekaClientConfigBean的region到availabilityZones(HashMap)中获取对应的value,记作zone
// zone的默认值是defaultZone
String zone = InstanceInfo.getZone(clientConfig.getAvailabilityZones(clientConfig.getRegion()), myInfo);
// 通过EurekaClientConfigBean的useDnsForFetchingServiceUrls判断副本来源,默认都是从配置中获取
// 从属性serviceUrl(HashMap->service-url配置项)中根据zone的值(默认为defaultZone)获取urls
// 将这些分区副本的url存到ArrayList<String>中
List<String> replicaUrls = EndpointUtils
.getDiscoveryServiceUrls(clientConfig, zone, new EndpointUtils.InstanceInfoBasedUrlRandomizer(myInfo));
// 将本节点从现有的分区中移除
// 做法isThisMyUrl是从url中获取hostname,再从InstanceInfo中获取本地hostname,把两者进行比较,
// 相同返回true,进行remove
int idx = 0;
while (idx < replicaUrls.size()) {
if (isThisMyUrl(replicaUrls.get(idx))) {
replicaUrls.remove(idx);
} else {
idx++;
}
}
return replicaUrls;
}
看下InstanceInfo是怎么被实例化的,springboot的spi定制化太强大了,可以轻而易举地整合其他框架,这里也是通过这个实现的,还是EurekaClientAutoConfiguration,这里通过eurekaApplicationInfoManager()创建InstanceInfo的实例对象,然后注入到ApplicationInfoManager里面去的,再来看InstanceInfo怎么被实例化的。
public class EurekaClientAutoConfiguration {
@Bean
@ConditionalOnMissingBean(value = ApplicationInfoManager.class, search = SearchStrategy.CURRENT)
public ApplicationInfoManager eurekaApplicationInfoManager(
EurekaInstanceConfig config) {
InstanceInfo instanceInfo = new InstanceInfoFactory().create(config);
return new ApplicationInfoManager(config, instanceInfo);
}
}
注入eurekaApplicationInfoManager()方法的EurekaInstanceConfig默认是EurekaInstanceConfigBean实现的。到这里可以把流程梳理一下。首先创建EurekaInstanceConfigBean实例Bean,而这个实例Bean中的属性跟"eureka.instance"开头的配置项绑定的。接着使用这个配置创建InstanceInfo实例对象,表示当前节点的实例信息,再注入到ApplicationInfoManager中。那我们现在就知道resolvePeerUrls()方法的InstanceInfo怎么来得了。再继续往下面看,接下来就是通过配置项获取集群中所有节点的url,作为updatePeerEurekaNodes()方法参数。接下来总算可以分析updatePeerEurekaNodes()方法,前面经过一系列的Bean注入和配置项获取,总算到updatePeerEurekaNodes()方法调用。
protected void updatePeerEurekaNodes(List<String> newPeerUrls) {
if (newPeerUrls.isEmpty()) {
logger.warn("The replica size seems to be empty. Check the route 53 DNS Registry");
return;
}
Set<String> toShutdown = new HashSet<>(peerEurekaNodeUrls);
toShutdown.removeAll(newPeerUrls);
Set<String> toAdd = new HashSet<>(newPeerUrls);
toAdd.removeAll(peerEurekaNodeUrls);
if (toShutdown.isEmpty() && toAdd.isEmpty()) { // No change
return;
}
// Remove peers no long available
List<PeerEurekaNode> newNodeList = new ArrayList<>(peerEurekaNodes);
if (!toShutdown.isEmpty()) {
logger.info("Removing no longer available peer nodes {}", toShutdown);
int i = 0;
while (i < newNodeList.size()) {
PeerEurekaNode eurekaNode = newNodeList.get(i);
if (toShutdown.contains(eurekaNode.getServiceUrl())) {
newNodeList.remove(i);
eurekaNode.shutDown();
} else {
i++;
}
}
}
// 依次添加集群中其他节点
if (!toAdd.isEmpty()) {
logger.info("Adding new peer nodes {}", toAdd);
for (String peerUrl : toAdd) {
newNodeList.add(createPeerEurekaNode(peerUrl));
}
}
this.peerEurekaNodes = newNodeList;
this.peerEurekaNodeUrls = new HashSet<>(newPeerUrls);
}
核心就是createPeerEurekaNode(peerUrl),根据集群中url创建节点PeerEurekaNode,都看到这里了,死磕到底吧,看下createPeerEurekaNode(peerUrl)方法。
protected PeerEurekaNode createPeerEurekaNode(String peerEurekaNodeUrl) {
HttpReplicationClient replicationClient = JerseyReplicationClient.createReplicationClient(serverConfig, serverCodecs, peerEurekaNodeUrl);
String targetHost = hostFromUrl(peerEurekaNodeUrl);
if (targetHost == null) {
targetHost = "host";
}
return new PeerEurekaNode(registry, targetHost, peerEurekaNodeUrl, replicationClient, serverConfig);
}
peerEurekaNodeUrl就是前面获取的集群中其他节点的url,这里先创建一个副本客户端HttpReplicationClient实例对象,再对PeerEurekaNode进行实例化,此时会创建任务,在任务里通过Jersey框架给集群中其他节点发送http请求,总算获取到集群中其他节点信息了。
任务处理模型
实例化之后可以通过batchingDispatcher.process();
对任务进行处理,本质是将任务添加到队列acceptorQueue(LinkedBlockingQueue)。
void process(ID id, T task, long expiryTime) {
acceptorQueue.add(new TaskHolder<ID, T>(id, task, expiryTime));
acceptedTasks++;
}
然后在AcceptorExecutor的AcceptorRunner线程run()中周期性处理这些任务
public void run() {
long scheduleTime = 0;
while (!isShutdown.get()) {
try {
// 从队列中取出任务
drainInputQueues();
int totalItems = processingOrder.size();
// 如果没有过期就进行处理
long now = System.currentTimeMillis();
if (scheduleTime < now) {
scheduleTime = now + trafficShaper.transmissionDelay();
}
if (scheduleTime <= now) {
assignBatchWork();
assignSingleItemWork();
}
// If no worker is requesting data or there is a delay injected by the traffic shaper,
// sleep for some time to avoid tight loop.
if (totalItems == processingOrder.size()) {
Thread.sleep(10);
}
} catch (InterruptedException ex) {
// Ignore
} catch (Throwable e) {
// Safe-guard, so we never exit this loop in an uncontrolled way.
logger.warn("Discovery AcceptorThread error", e);
}
}
}
其实就是从acceptorQueue中poll()出任务,添加到pendingTasks(HashMap)中,key是任务id,value是任务。
private void drainInputQueues() throws InterruptedException {
do {
drainReprocessQueue();
drainAcceptorQueue();
if (!isShutdown.get()) {
// 队列全空时延时等待任务加入
if (reprocessQueue.isEmpty() && acceptorQueue.isEmpty() && pendingTasks.isEmpty()) {
TaskHolder<ID, T> taskHolder = acceptorQueue.poll(10, TimeUnit.MILLISECONDS);
if (taskHolder != null) {
appendTaskHolder(taskHolder);
}
}
}
} while (!reprocessQueue.isEmpty() || !acceptorQueue.isEmpty() || pendingTasks.isEmpty());
}
private void drainAcceptorQueue() {
while (!acceptorQueue.isEmpty()) {
appendTaskHolder(acceptorQueue.poll());
}
}
private void appendTaskHolder(TaskHolder<ID, T> taskHolder) {
if (isFull()) {
pendingTasks.remove(processingOrder.poll());
queueOverflows++;
}
TaskHolder<ID, T> previousTask = pendingTasks.put(taskHolder.getId(), taskHolder);
if (previousTask == null) {
processingOrder.add(taskHolder.getId());
} else {
overriddenTasks++;
}
}
还是在本线程中收集任务,将刚才加入队列的任务都添加batchWorkQueue中,这些所有要处理的任务都在batchWorkQueue里了。
void assignBatchWork() {
if (hasEnoughTasksForNextBatch()) {
if (batchWorkRequests.tryAcquire(1)) {
long now = System.currentTimeMillis();
int len = Math.min(maxBatchingSize, processingOrder.size());
List<TaskHolder<ID, T>> holders = new ArrayList<>(len);
while (holders.size() < len && !processingOrder.isEmpty()) {
ID id = processingOrder.poll();
TaskHolder<ID, T> holder = pendingTasks.remove(id);
if (holder.getExpiryTime() > now) {
holders.add(holder);
} else {
expiredTasks++;
}
}
if (holders.isEmpty()) {
batchWorkRequests.release();
} else {
batchSizeMetric.record(holders.size(), TimeUnit.MILLISECONDS);
batchWorkQueue.add(holders);
}
}
}
}
接下是就是真正的处理这些任务processor.process(tasks)
了
public void run() {
try {
while (!isShutdown.get()) {
// 将保存批量任务的holders从队列batchWorkQueue中取出来
List<TaskHolder<ID, T>> holders = getWork();
metrics.registerExpiryTimes(holders);
List<T> tasks = getTasksOf(holders);
// 真正的处理在这里
ProcessingResult result = processor.process(tasks);
switch (result) {
case Success:
break;
case Congestion:
case TransientError:
taskDispatcher.reprocess(holders, result);
break;
case PermanentError:
logger.warn("Discarding {} tasks of {} due to permanent error", holders.size(), workerName);
}
metrics.registerTaskResult(result, tasks.size());
}
} catch (InterruptedException e) {
// Ignore
} catch (Throwable e) {
// Safe-guard, so we never exit this loop in an uncontrolled way.
logger.warn("Discovery WorkerThread error", e);
}
}
到这里对eureka任务调度模型总结下:首先batchingDispatcher将要处理的任务添加到acceptorQueue队列,然后线程从acceptorQueue将要处理的任务取出来缓存到pendingTasks,最后如果可以分发任务,就将任务合并到holders(ArrayList),然后在另外一个线程中批量处理processor.process(tasks)
,其实这个模型设计得很好的,工作中可以参考下。
现在我们还在EurekaServerContext的initialize()方法,看看注册初始化做的事情registry.init(peerEurekaNodes);
public void init(PeerEurekaNodes peerEurekaNodes) throws Exception {
// 定时刷新lastBucket值
this.numberOfReplicationsLastMin.start();
this.peerEurekaNodes = peerEurekaNodes;
// 实例化响应缓存
initializedResponseCache();
// 开启自动续约的任务
scheduleRenewalThresholdUpdateTask();
initRemoteRegionRegistry();
try {
Monitors.registerObject(this);
} catch (Throwable e) {
logger.warn("Cannot register the JMX monitor for the InstanceRegistry :", e);
}
}
到这里为止EurekaServerContext的实例化彻底完成了,意味着eureka服务上下文创建好了。
再看下服务端启动的最后一步,由于EurekaServerInitializerConfiguration.java实现了SmartLifecycle接口,所在start()会被spring调用,此时EurekaServerBootstrap.java的contextInitialized()会被调用。其实主要就做了两件事,先把集群中其他节点的实例信息同步过来this.registry.syncUp()
,再更新每分钟续约数,开个自动续约的任务。
服务剔除任务
服务剔除任务是在AbstractInstanceRegistry.java的postInit()方法中开启的,本质是定时执行evict()方法。
public void evict(long additionalLeaseMs) {
logger.debug("Running the evict task");
// 自我保护机制,如果开启了自我保护机制,并且每分钟续约数小于期望数numberOfRenewsPerMinThreshold
// 就不会剔除服务了
if (!isLeaseExpirationEnabled()) {
logger.debug("DS: lease expiration is currently disabled.");
return;
}
// 找到过期的实例
List<Lease<InstanceInfo>> expiredLeases = new ArrayList<>();
for (Entry<String, Map<String, Lease<InstanceInfo>>> groupEntry : registry.entrySet()) {
Map<String, Lease<InstanceInfo>> leaseMap = groupEntry.getValue();
if (leaseMap != null) {
for (Entry<String, Lease<InstanceInfo>> leaseEntry : leaseMap.entrySet()) {
Lease<InstanceInfo> lease = leaseEntry.getValue();
if (lease.isExpired(additionalLeaseMs) && lease.getHolder() != null) {
expiredLeases.add(lease);
}
}
}
}
// To compensate for GC pauses or drifting local time, we need to use current registry size as a base for
// triggering self-preservation. Without that we would wipe out full registry.
int registrySize = (int) getLocalRegistrySize();
int registrySizeThreshold = (int) (registrySize * serverConfig.getRenewalPercentThreshold());
int evictionLimit = registrySize - registrySizeThreshold;
// 随机清除,如果按顺序清除得话可能会清除掉整个应用
int toEvict = Math.min(expiredLeases.size(), evictionLimit);
if (toEvict > 0) {
logger.info("Evicting {} items (expired={}, evictionLimit={})", toEvict, expiredLeases.size(), evictionLimit);
Random random = new Random(System.currentTimeMillis());
for (int i = 0; i < toEvict; i++) {
// Pick a random item (Knuth shuffle algorithm)
int next = i + random.nextInt(expiredLeases.size() - i);
Collections.swap(expiredLeases, i, next);
Lease<InstanceInfo> lease = expiredLeases.get(i);
String appName = lease.getHolder().getAppName();
String id = lease.getHolder().getId();
EXPIRED.increment();
logger.warn("DS: Registry: expired lease for {}/{}", appName, id);
internalCancel(appName, id, false);
}
}
}
这就是eureka的自我保护机制,如果开启了自我保护机制,判断最近一分钟的心跳数是否大于期望收到的心跳数,如果小于期望值得话,说明有服务不在线,可能是重启或者宕机就不会剔除了,等他上线再说。
@Override
public boolean isLeaseExpirationEnabled() {
if (!isSelfPreservationModeEnabled()) {
// The self preservation mode is disabled, hence allowing the instances to expire.
return true;
}
return numberOfRenewsPerMinThreshold > 0 && getNumOfRenewsInLastMin() > numberOfRenewsPerMinThreshold;
}
evictionTimestamp是服务过期得时间戳,如果有值说明服务过期;lastUpdateTimestamp是最近一次得续约时间(心跳包发送得时间),duration为续约时间间隔,如果System.currentTimeMillis() > (lastUpdateTimestamp + duration + additionalLeaseMs))
,说明到当前时间为止还没有收到续约心跳,则任务过期。
public boolean isExpired(long additionalLeaseMs) {
return (evictionTimestamp > 0 || System.currentTimeMillis() > (lastUpdateTimestamp + duration + additionalLeaseMs));
}
剔除服务是在internalCancel()方法中,将要剔除得服务从registry的Map中移除remove(id)掉,然后清除缓存。
protected boolean internalCancel(String appName, String id, boolean isReplication) {
try {
read.lock();
CANCEL.increment(isReplication);
Map<String, Lease<InstanceInfo>> gMap = registry.get(appName);
Lease<InstanceInfo> leaseToCancel = null;
if (gMap != null) {
leaseToCancel = gMap.remove(id);
}
synchronized (recentCanceledQueue) {
recentCanceledQueue.add(new Pair<Long, String>(System.currentTimeMillis(), appName + "(" + id + ")"));
}
InstanceStatus instanceStatus = overriddenInstanceStatusMap.remove(id);
if (instanceStatus != null) {
logger.debug("Removed instance id {} from the overridden map which has value {}", id, instanceStatus.name());
}
if (leaseToCancel == null) {
CANCEL_NOT_FOUND.increment(isReplication);
logger.warn("DS: Registry: cancel failed because Lease is not registered for: {}/{}", appName, id);
return false;
} else {
leaseToCancel.cancel();
InstanceInfo instanceInfo = leaseToCancel.getHolder();
String vip = null;
String svip = null;
if (instanceInfo != null) {
instanceInfo.setActionType(ActionType.DELETED);
recentlyChangedQueue.add(new RecentlyChangedItem(leaseToCancel));
instanceInfo.setLastUpdatedTimestamp();
vip = instanceInfo.getVIPAddress();
svip = instanceInfo.getSecureVipAddress();
}
invalidateCache(appName, vip, svip);
logger.info("Cancelled instance {}/{} (replication={})", appName, id, isReplication);
return true;
}
} finally {
read.unlock();
}
}
现在总结下eureka服务端启动的过程:首先从配置文件中获取集群中每个节点的信息,然后创建定时任务周期性发心跳包给其他节点,然后从其他节点将实例信息同步过来。