Integrating Zipkin with Spring Boot for Distributed Tracing

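1. What we are going to do

When our servers number in the thousands, our modules number just as many, and our call chains are as tangled as a spider web, we suddenly find that even a small performance problem cannot be pinned down quickly. Never assume you are a god: the programmer who once thought an ELK log analysis system was unnecessary has already been sacrificed to the heavens by the boss!

Enough chatter. What we want to do today is very simple: for an interface that makes multi-level calls, quickly view its network topology and get the monitoring data.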

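2. What we need to watch out for

A competent support never steals the carry's gold, or the carry's DPS output suffers: low performance overhead.

A competent support follows the carry's lead instead of bossing the carry around: no intrusion into business code.

A competent support keeps things as simple and lightly loaded as possible: a lightweight architecture.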

3. How Zipkin works

Flow diagram:

[Figure: Zipkin tracing flow diagram]

The flow diagram involves four key concepts:

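  • Trace
    Represents one call chain. It is the unique identifier that ties the whole chain together and aggregates all of its Spans.
  • Span
    Put simply, a span is the record of a single request. It is the basic unit of work in tracing: each call in the chain (RPC, DB and so on, with no particular restriction) creates one span.
  • Annotation
    Marks the start and end of a request. The cs/sr/ss/cr annotations carry extra information such as timestamps; once cr has been recorded, the RPC is considered complete.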
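    cs: Client Send. The client issues the request; this is the start of a span.
    sr: Server Receive. The server has received the request.
    ss: Server Send. The server has finished processing and sends the result back to the client.
    cr: Client Receive. The client has received the server's response; this is the end of the span.

    sr - cs: network latency
    ss - sr: server-side processing time
    cr - cs: total time for the whole call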
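  • Collector
    Receives or collects the data reported by each application.

The workflow for tracing one HTTP request:
  1. Add the current call chain's trace information to the HTTP headers
  2. Record the timestamp of the current call
  3. Send the HTTP request, carrying the trace-related headers
  4. When the call finishes, record how long it took
  5. Assemble the information produced by the steps above into a span and upload that span to Zipkin's Collector module
  6. The next HTTP request starts again from step 1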
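
The six steps above can be sketched in plain Java. This is only an illustration under stated assumptions: the class `ClientSpanSketch` and its methods are hypothetical and are not part of Sleuth or Brave, and the header names assume B3-style propagation (`X-B3-TraceId`, `X-B3-SpanId`, `X-B3-ParentSpanId`, `X-B3-Sampled`). In a real application the tracing library does all of this for you.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

/** Illustrative only: a hypothetical class, not part of Sleuth or Brave. */
final class ClientSpanSketch {
    final String traceId;   // shared by every span in the call chain
    final String spanId;    // unique to this call
    final String parentId;  // null for the first span in the chain
    long startMicros;       // step 2: when the call started
    long durationMicros;    // step 4: how long the call took

    private ClientSpanSketch(String traceId, String spanId, String parentId) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.parentId = parentId;
    }

    /** Random 64-bit ID rendered as 16 lowercase hex characters. */
    static String newId() {
        return String.format("%016x", ThreadLocalRandom.current().nextLong());
    }

    static ClientSpanSketch newRoot() {
        return new ClientSpanSketch(newId(), newId(), null);
    }

    /** A downstream call reuses the trace ID and points back at this span. */
    ClientSpanSketch newChild() {
        return new ClientSpanSketch(traceId, newId(), spanId);
    }

    /** Steps 1 and 3: the trace context travels as HTTP headers. */
    Map<String, String> toHeaders() {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("X-B3-TraceId", traceId);
        headers.put("X-B3-SpanId", spanId);
        if (parentId != null) {
            headers.put("X-B3-ParentSpanId", parentId);
        }
        headers.put("X-B3-Sampled", "1");
        return headers;
    }

    /** Steps 2 and 4: timestamp the call and measure its duration. */
    void record(Runnable call) {
        startMicros = System.currentTimeMillis() * 1000;
        long startNanos = System.nanoTime();
        try {
            call.run();
        } finally {
            durationMicros = (System.nanoTime() - startNanos) / 1000;
        }
    }

    public static void main(String[] args) {
        ClientSpanSketch root = newRoot();
        ClientSpanSketch child = root.newChild();
        child.record(() -> System.out.println("sending request with headers " + child.toHeaders()));
        // Step 5: a real tracer would now report the finished span to the Zipkin collector.
        System.out.println("span " + child.spanId + " took " + child.durationMicros + " us");
    }
}
```

The point to notice is that the child span keeps the parent's trace ID and carries the parent's span ID as its parent ID; that is what lets Zipkin stitch separate HTTP calls back into one tree.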

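4. Implementation

4.1 Setting up the Zipkin server

Persistence

Create a database for Zipkin to persist its data. You could of course skip persistence and keep everything in memory, but I trust you won't do that unless you want to be sacrificed by the boss!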

SET NAMES utf8mb4;
SET FOREIGN_KEY_CHECKS = 0;

-- ----------------------------
-- Table structure for zipkin_annotations
-- ----------------------------
DROP TABLE IF EXISTS `zipkin_annotations`;
CREATE TABLE `zipkin_annotations`  (
  `trace_id_high` bigint(20) NOT NULL DEFAULT 0 COMMENT 'If non zero, this means the\r\ntrace uses 128 bit traceIds instead of 64 bit',
  `trace_id` bigint(20) NOT NULL COMMENT 'coincides with zipkin_spans.trace_id',
  `span_id` bigint(20) NOT NULL COMMENT 'coincides with zipkin_spans.id',
  `a_key` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL COMMENT 'BinaryAnnotation.key or\r\nAnnotation.value if type == -1',
  `a_value` blob NULL COMMENT 'BinaryAnnotation.value(), which must be smaller than\r\n64KB',
  `a_type` int(11) NOT NULL COMMENT 'BinaryAnnotation.type() or -1 if Annotation',
  `a_timestamp` bigint(20) NULL DEFAULT NULL COMMENT 'Used to implement TTL; Annotation.timestamp or\r\nzipkin_spans.timestamp',
  `endpoint_ipv4` int(11) NULL DEFAULT NULL COMMENT 'Null when Binary/Annotation.endpoint is null',
  `endpoint_ipv6` binary(16) NULL DEFAULT NULL COMMENT 'Null when Binary/Annotation.endpoint is\r\nnull, or no IPv6 address',
  `endpoint_port` smallint(6) NULL DEFAULT NULL COMMENT 'Null when Binary/Annotation.endpoint is\r\nnull',
  `endpoint_service_name` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL COMMENT 'Null when\r\nBinary/Annotation.endpoint is null',
  UNIQUE INDEX `trace_id_high`(`trace_id_high`, `trace_id`, `span_id`, `a_key`, `a_timestamp`) USING BTREE COMMENT 'Ignore insert on duplicate',
  INDEX `trace_id_high_2`(`trace_id_high`, `trace_id`, `span_id`) USING BTREE COMMENT 'for joining with zipkin_spans',
  INDEX `trace_id_high_3`(`trace_id_high`, `trace_id`) USING BTREE COMMENT 'for getTraces/ByIds',
  INDEX `endpoint_service_name`(`endpoint_service_name`) USING BTREE COMMENT 'for\r\ngetTraces and getServiceNames',
  INDEX `a_type`(`a_type`) USING BTREE COMMENT 'for getTraces and\r\nautocomplete values',
  INDEX `a_key`(`a_key`) USING BTREE COMMENT 'for getTraces and\r\nautocomplete values',
  INDEX `trace_id`(`trace_id`, `span_id`, `a_key`) USING BTREE COMMENT 'for dependencies job'
) ENGINE = InnoDB CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = COMPRESSED;

-- ----------------------------
-- Table structure for zipkin_spans
-- ----------------------------
DROP TABLE IF EXISTS `zipkin_spans`;
CREATE TABLE `zipkin_spans`  (
  `trace_id_high` bigint(20) NOT NULL DEFAULT 0 COMMENT 'If non zero, this means the\r\ntrace uses 128 bit traceIds instead of 64 bit',
  `trace_id` bigint(20) NOT NULL,
  `id` bigint(20) NOT NULL,
  `name` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  `remote_service_name` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  `parent_id` bigint(20) NULL DEFAULT NULL,
  `debug` bit(1) NULL DEFAULT NULL,
  `start_ts` bigint(20) NULL DEFAULT NULL COMMENT 'Span.timestamp(): epoch micros used for endTs query\r\nand to implement TTL',
  `duration` bigint(20) NULL DEFAULT NULL COMMENT 'Span.duration(): micros used for minDuration and\r\nmaxDuration query',
  PRIMARY KEY (`trace_id_high`, `trace_id`, `id`) USING BTREE,
  INDEX `trace_id_high`(`trace_id_high`, `trace_id`) USING BTREE COMMENT 'for\r\ngetTracesByIds',
  INDEX `name`(`name`) USING BTREE COMMENT 'for getTraces and\r\ngetSpanNames',
  INDEX `remote_service_name`(`remote_service_name`) USING BTREE COMMENT 'for getTraces\r\nand getRemoteServiceNames',
  INDEX `start_ts`(`start_ts`) USING BTREE COMMENT 'for getTraces ordering\r\nand range'
) ENGINE = InnoDB CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = COMPRESSED;

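-- ----------------------------
-- InnoDB settings needed for the long utf8mb4 keys of zipkin_dependencies below.
-- MySQL 5.6/5.7 only: both variables were removed in MySQL 8.0, so skip them there.
-- ----------------------------
set global innodb_large_prefix=1;
set global innodb_file_format=BARRACUDA;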
-- ----------------------------
-- Table structure for zipkin_dependencies
-- ----------------------------
DROP TABLE IF EXISTS `zipkin_dependencies`;
CREATE TABLE `zipkin_dependencies`  (
  `day` date NOT NULL,
  `parent` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL,
  `child` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL,
  `call_count` bigint(20) NULL DEFAULT NULL,
  `error_count` bigint(20) NULL DEFAULT NULL,
  PRIMARY KEY (`day`, `parent`, `child`) USING BTREE
) ENGINE = InnoDB CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci ROW_FORMAT = COMPRESSED;

SET FOREIGN_KEY_CHECKS = 1;
Deploying the Zipkin server

Deploy the Zipkin server with Docker Compose:

zipkin:
    image: openzipkin/zipkin
    container_name: zipkin
    environment:
      - STORAGE_TYPE=mysql
      # Point the zipkin at the storage backend
      - MYSQL_DB=zipkin
      - MYSQL_USER=root
      - MYSQL_PASS=root
      - MYSQL_HOST=192.168.137.129
      - MYSQL_TCP_PORT=3306
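    # Host networking shares the host's network stack, so the Zipkin UI and HTTP API
    # are reachable on port 9411 directly; a ports mapping is not needed in this mode.
    network_mode: host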

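Once the database is configured, start the Docker container.

[Figure: starting the Zipkin container]


Opening the server in a web browser shows the following page, which means the setup succeeded:

[Figure: Zipkin web UI]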
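
Besides checking in the browser, the server can be probed from code. The sketch below assumes the Zipkin server exposes its `/health` endpoint on the same port as the UI and that Java 11 or later is available; the class name `ZipkinHealthCheck` is made up for illustration, and the default address is simply the one used in this article.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Illustrative helper (hypothetical name) that probes a Zipkin server's /health endpoint. */
final class ZipkinHealthCheck {

    /** Builds the health URL from a base URL, tolerating a trailing slash. */
    static URI healthUri(String baseUrl) {
        String trimmed = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        return URI.create(trimmed + "/health");
    }

    /** Returns true only if the server answers the health request with HTTP 200. */
    static boolean isUp(String baseUrl) {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))
                .build();
        HttpRequest request = HttpRequest.newBuilder(healthUri(baseUrl))
                .timeout(Duration.ofSeconds(2))
                .GET()
                .build();
        try {
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            return response.statusCode() == 200;
        } catch (IOException e) {
            return false; // unreachable, refused, or timed out
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        String baseUrl = args.length > 0 ? args[0] : "http://192.168.137.129:9411";
        System.out.println(baseUrl + " is " + (isUp(baseUrl) ? "UP" : "DOWN"));
    }
}
```

A check like this is handy in a deployment script, where opening a browser is not an option.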

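4.2 Embedding Zipkin tracing in the clients

Test call chain

Very simple: four application services, with a call depth of three!

[Figure: call chain of the four test services]

Adding dependencies

Add the following dependencies to every Zipkin client service: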

<!-- No versions are given here; they are assumed to be managed by the Spring Boot parent
     and the spring-cloud-dependencies BOM of the project. -->
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-sleuth-zipkin</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-sleuth</artifactId>
    </dependency>
</dependencies>
Service call code
  • service1
package com.paratera.console.linktracking.controller;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.client.RestTemplate;

@RestController
@RequestMapping("/service1")
public class ZipkinBraveController {

    @Autowired
    private RestTemplate restTemplate;

    @RequestMapping("/test")
    public String service1() throws Exception {
        // Sleep 100 ms to simulate business processing time
        Thread.sleep(100);
        // Call service2
        ResponseEntity<String> res = restTemplate.getForEntity("http://localhost:8082/service2/test", String.class);
        return res.getBody();
    }
}
  • service2
package com.paratera.console.linktracking.controller;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.client.RestTemplate;

@RestController
@RequestMapping("/service2")
public class ZipkinBraveController {

    @Autowired
    private RestTemplate restTemplate;

    @RequestMapping("/test")
    public String service2() throws Exception {
        // Sleep 200 ms to simulate business processing time
        Thread.sleep(200);
        // Call service3
        ResponseEntity<String> res1 = restTemplate.getForEntity("http://localhost:8083/service3/test", String.class);
        // Call service4
        ResponseEntity<String> res2 = restTemplate.getForEntity("http://localhost:8084/service4/test", String.class);
        return res1.getBody()+":"+res2.getBody();
    }
}
  • service3
package com.paratera.console.linktracking.controller;

import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/service3")
public class ZipkinBraveController {

    @RequestMapping("/test")
    public String service3() throws Exception {
        // Sleep 3 s to simulate a performance hotspot
        Thread.sleep(3000);
        return "service3";
    }
}
  • service4
package com.paratera.console.linktracking.controller;

import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/service4")
public class ZipkinBraveController {

    @RequestMapping("/test")
    public String service4() throws Exception {
        // Sleep 100 ms to simulate business processing time
        Thread.sleep(100);
        return "service4";
    }

}
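
Before looking at the traces, it helps to know roughly what to expect. The sketch below just adds up the `Thread.sleep` values from the four controllers above, on the assumption that service2 calls service3 and service4 one after the other (as the code does) and ignoring network and framework overhead. The class name `CallChainBudget` is made up for illustration; the 5000 ms limit is the read timeout set on the `RestTemplate` in this article.

```java
/** Illustrative arithmetic only: expected span durations for the demo call chain. */
final class CallChainBudget {
    // Sleep durations taken from the four controllers, in milliseconds.
    static final long SERVICE1_SLEEP_MS = 100;
    static final long SERVICE2_SLEEP_MS = 200;
    static final long SERVICE3_SLEEP_MS = 3000;
    static final long SERVICE4_SLEEP_MS = 100;

    // Read timeout configured on the RestTemplate in this article.
    static final long READ_TIMEOUT_MS = 5000;

    /** service2 sleeps, then calls service3 and service4 sequentially. */
    static long service2TotalMs() {
        return SERVICE2_SLEEP_MS + SERVICE3_SLEEP_MS + SERVICE4_SLEEP_MS;
    }

    /** service1 sleeps, then waits for the whole of service2. */
    static long service1TotalMs() {
        return SERVICE1_SLEEP_MS + service2TotalMs();
    }

    /** A downstream call must finish before the caller's read timeout fires. */
    static boolean fitsReadTimeout(long downstreamMs) {
        return downstreamMs < READ_TIMEOUT_MS;
    }

    public static void main(String[] args) {
        System.out.println("service2 span: about " + service2TotalMs() + " ms");
        System.out.println("service1 span: about " + service1TotalMs() + " ms");
        System.out.println("service1 -> service2 within read timeout: "
                + fitsReadTimeout(service2TotalMs()));
    }
}
```

So the root span should come out at a little over 3400 ms, with service3 accounting for most of it, and the slowest downstream call (about 3300 ms) still fits inside the 5-second read timeout.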
application.yml configuration
spring:
  application:
    name: service1
  zipkin:
    base-url: http://192.168.137.129:9411    # address of the Zipkin server
    sender:
      type: web    # web is the default when neither Kafka nor ActiveMQ is on the classpath
  sleuth:
    sampler:
      probability: 1.0  # sample 100% of requests; use a lower value in production, there is no need to capture everything
server:
  port: 8081

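The four service projects share the same configuration apart from the port (each should also use its own spring.application.name, service1 to service4, so that the services can be told apart in Zipkin). Zipkin's service name is not configured separately; by default it takes the Spring application name.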
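
To make the sampler probability setting in application.yml more concrete, here is one way a probability-based sampling decision can be made. This is a sketch of the general idea only and is not Sleuth's actual implementation: the class `ProbabilitySamplerSketch` is hypothetical, and deriving the decision from the trace ID is an assumption chosen so that the same trace always gets the same answer.

```java
/** Illustrative only: a hypothetical sampler, not Sleuth's or Brave's implementation. */
final class ProbabilitySamplerSketch {
    private static final long BUCKETS = 10_000L;
    private final long threshold; // number of buckets (out of 10,000) that are sampled

    ProbabilitySamplerSketch(double probability) {
        if (probability < 0.0 || probability > 1.0) {
            throw new IllegalArgumentException("probability must be between 0.0 and 1.0");
        }
        this.threshold = Math.round(probability * BUCKETS);
    }

    /** Maps the trace ID to a bucket in [0, 9999] and samples the lowest buckets. */
    boolean isSampled(long traceId) {
        long bucket = Math.floorMod(traceId, BUCKETS);
        return bucket < threshold;
    }

    public static void main(String[] args) {
        ProbabilitySamplerSketch everything = new ProbabilitySamplerSketch(1.0);
        ProbabilitySamplerSketch tenPercent = new ProbabilitySamplerSketch(0.1);
        long traceId = 123_456_789L; // arbitrary example value
        System.out.println("probability 1.0 samples it: " + everything.isSampled(traceId));
        System.out.println("probability 0.1 samples it: " + tenPercent.isSampled(traceId));
    }
}
```

With a probability of 1.0 every trace is kept, which is convenient for a demo like this one; at 0.1 only about one trace in ten would be reported to Zipkin.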

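Appendix: RestTemplate bean configuration
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.http.client.ClientHttpRequestFactory;
import org.springframework.http.client.SimpleClientHttpRequestFactory;
import org.springframework.web.client.RestTemplate;

// The class name is illustrative. RestTemplate must be a Spring bean so that Sleuth can instrument it.
@Configuration
public class RestTemplateConfig {

    @Bean
    public RestTemplate restTemplate(ClientHttpRequestFactory factory) {
        return new RestTemplate(factory);
    }

    @Bean
    public ClientHttpRequestFactory clientHttpRequestFactory() {
        SimpleClientHttpRequestFactory factory = new SimpleClientHttpRequestFactory();
        factory.setConnectTimeout(5000); // milliseconds
        factory.setReadTimeout(5000);    // milliseconds
        return factory;
    }
}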

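4.3 Test results

When we access http://localhost:8081/service1/test, a complete call chain is recorded as JSON data. Its details can be viewed in the UI of the Zipkin server set up in 4.1:

[Figure: trace list in the Zipkin UI]

Click Show to view the request details for each service:

[Figure: per-service request details]

Open the service3 details:

[Figure: service3 span details]

From the cs, sr, ss and cr timestamps you can work out for yourself whether a performance problem comes from network latency or from the processing logic!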
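
That calculation can be written down as a small helper. The sketch uses the standard Zipkin annotation names: cs (client send), sr (server receive), ss (server send) and cr (client receive). The class `SpanTimings` is hypothetical, the timestamps in `main` are invented values in microseconds rather than data from a real trace, and the one-way figure `sr - cs` is only meaningful when the client and server clocks are reasonably well synchronised.

```java
/** Illustrative only: derives latency figures from the four core span annotations. */
final class SpanTimings {
    final long cs; // client send, epoch microseconds
    final long sr; // server receive
    final long ss; // server send
    final long cr; // client receive

    SpanTimings(long cs, long sr, long ss, long cr) {
        this.cs = cs;
        this.sr = sr;
        this.ss = ss;
        this.cr = cr;
    }

    /** Time the request spent on the wire before the server saw it. */
    long requestLatency() { return sr - cs; }

    /** Time the server spent handling the request. */
    long serverProcessing() { return ss - sr; }

    /** Total time as seen by the client. */
    long total() { return cr - cs; }

    /** Everything that was not server processing: request plus response transit. */
    long networkTotal() { return total() - serverProcessing(); }

    String bottleneck() {
        return serverProcessing() >= networkTotal() ? "server processing" : "network";
    }

    public static void main(String[] args) {
        // Invented example: 2 ms out, 3 s of processing, 2 ms back.
        SpanTimings span = new SpanTimings(0L, 2_000L, 3_002_000L, 3_004_000L);
        System.out.println("request latency:   " + span.requestLatency() + " us");
        System.out.println("server processing: " + span.serverProcessing() + " us");
        System.out.println("total:             " + span.total() + " us");
        System.out.println("bottleneck:        " + span.bottleneck());
    }
}
```

For a span shaped like the service3 call in this demo, almost all of the total sits between sr and ss, which points at the processing logic rather than the network.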