通过web sql 直接我们可以了解一些dremio 查询执行的处理,对于web sql 执行来说如果从实际来说应该包含了两类: 通过rest api 执行sql 的以及通过web ui 操作的

rest api sql 执行

sql 查询流程为: 先提交任务,然后通过任务id 或者sql 执行的状态,然后获取数据,这个官方包含了比较完整的说明
代码处理(api 目录下的sql api 服务)
job 任务提交

 

public QueryDetails runQuery(CreateFromSQL sql) {
  final SqlQuery sqlQuery = JobRequestUtil.createSqlQuery(sql.getSql(), sql.getContext(),securityContext.getUserPrincipal().getName(), sql.getEngineName(), null);
  final JobSubmittedListener listener = new JobSubmittedListener();
  // 基于jobservice 处理
  final JobId jobId = jobs.submitJob(SubmitJobRequest.newBuilder()
    .setSqlQuery(sqlQuery)
    .setQueryType(QueryType.REST)
    .build(), listener)
    .getJobId();
 
  // if async disabled, wait until job has been submitted then return
  if (!projectOptionManager.getOption(ExecConstants.REST_API_RUN_QUERY_ASYNC)) {
    listener.await();
  }
 
  return new QueryDetails(jobId.getId());
}
job 状态查询

public JobStatus getJobStatus(@PathParam("id") String id) {
    int retryCount = 4;
    int i = 0;
 
    while (i++ < retryCount) {
      try {
        JobDetailsRequest request = JobDetailsRequest.newBuilder()
          .setJobId(com.dremio.service.job.proto.JobProtobuf.JobId.newBuilder().setId(id).build())
          .setUserName(securityContext.getUserPrincipal().getName())
          .build();
       // 也是基于jobservice 处理的
        JobDetails jobDetails = jobs.getJobDetails(request);
        return JobStatus.fromJob(jobDetails);
      } catch (JobNotFoundException e) {
        throw new NotFoundException(String.format("Could not find a job with id [%s]", id));
      } catch (io.grpc.StatusRuntimeException runtimeException) {
        // Retry if we hit runTimeException
        if (i == retryCount) {
          // Throw Exception, if retry count is exceeded.
          throw new javax.ws.rs.InternalServerErrorException(
            String.format("Getting job Status for job [%s] failed with an internal exception, please retry", id));
        }
        try {
          // hardcoding the sleep here, 3*50 = > 150ms is enough sleep time.
          Thread.sleep(50);
        } catch (InterruptedException interruptedException) {
          //ignore.
        }
      }
    }
    // We will never reach here.
    return null;
  }
job 结果查询

// 可以看出有一个500 的硬编码限时,对于结果的处理也是通过的jobservice 处理的,数据处理上基于了dremio 自己包装的一个,内部会基于FlightTicket 进行内部数据处理,实际上还是arrow flight 处理机制
public JobResourceData getQueryResults(@PathParam("id") String id, @QueryParam("offset") @DefaultValue("0") Integer offset, @Valid @QueryParam("limit") @DefaultValue("100") Integer limit) {
  Preconditions.checkArgument(limit <= 500,"limit can not exceed 500 rows");
  try {
    JobSummaryRequest request = JobSummaryRequest.newBuilder()
      .setJobId(JobProtobuf.JobId.newBuilder().setId(id).build())
      .setUserName(securityContext.getUserPrincipal().getName())
      .build();
    JobSummary jobSummary = jobs.getJobSummary(request);
 
    if (jobSummary.getJobState() != JobState.COMPLETED) {
      throw new BadRequestException(String.format("Can not fetch details for a job that is in [%s] state.", jobSummary.getJobState()));
    }
    // Additional wait not necessary since we check for job completion via JobState
    return new JobResourceData(jobs, jobSummary, securityContext.getUserPrincipal().getName(),
      getOrCreateAllocator("getQueryResults"),  offset, limit);
  } catch (JobNotFoundException e) {
    throw new NotFoundException(String.format("Could not find a job with id [%s]", id));
  }
}

web ui sql 执行

参考代码处理

// 可以看出是直接就处理返回了,不像rest api 需要经过多阶段处理
// 可以看出是直接就处理返回了,不像rest api 需要经过多阶段处理
public JobDataFragment query(CreateFromSQL sql) {
  final SqlQuery query = JobRequestUtil.createSqlQuery(sql.getSql(), sql.getContext(), securityContext.getUserPrincipal().getName());
  // Pagination is not supported in this API, so we need to truncate the results to 500 records
  final CompletionListener listener = new CompletionListener();
  final JobSubmission jobSubmission = jobs.submitJob(SubmitJobRequest.newBuilder().setSqlQuery(query).setQueryType(QueryType.REST).build(), listener);
  listener.awaitUnchecked();
  return new JobDataWrapper(jobs, jobSubmission.getJobId(), jobSubmission.getSessionId(), securityContext.getUserPrincipal().getName())
    .truncate(getOrCreateAllocator("query"), 500);
}

同时还有一个另外的执行地方,也是一个多阶段处理,实际数据的处理也是通过job服务处理的,只是基于的bff api 处理的
完整的api 是在dac backend 模块的explore 目录中,包含了数据集处理的api ,整体处理上是先于获取一个jobid,然后都会基于此
jobid 进行操作,同时是支持多版本操作的,因为处理上与ui 有很大的关联性,详细的后边我单独介绍下

dremio 支持的查询类型

 

// 此类型在dremio 中还是很重要的,比如企业版的查询路由就会基于此进行判断处理,dremio 部分代码可以看出来
 
// 此类型在dremio 中还是很重要的,比如企业版的查询路由就会基于此进行判断处理,dremio 部分代码可以看出来
enum QueryType {
  UNKNOWN = 0;
  UI_INITIAL_PREVIEW = 100; // create initial preview response
  UI_INTERNAL_PREVIEW = 101; // a preview query internal to the operation of the dac such as format settings preview
  UI_PREVIEW = 102; // run in preview mode
  PREPARE_INTERNAL = 103;
  UI_INTERNAL_RUN = 104; // a run query internal to operation of the dac such as histograms, transformation previews, card generation, etc.
  UI_RUN = 105; //actual run on the data
  UI_EXPORT = 106; // UI downloaded query
  ACCELERATOR_CREATE = 200; //accelerated dataset creation
  ACCELERATOR_DROP = 201; //accelerated  dataset drop
  ACCELERATOR_EXPLAIN = 202; //dependency graph construction
  ODBC = 300; //query submitted by odbc client
  JDBC = 400; //query submitted by jdbc client
  REST = 500; //query submitted by public rest interfaces
  FLIGHT = 600; //query submitted by FlightClient
  METADATA_REFRESH = 700; // Queries related to metadata refresh
  INTERNAL_ICEBERG_METADATA_DROP = 800; // Queries related to internal iceberg metadata drop
}

说明

以上是一个简单的web sql 执行介绍,实际上也是rest api 以及web ui sql 操作比较重要的部分,代码上还是值得学习参考的,尤其是对于web sql 数据查询部分的

参考资料

dac/backend/src/main/java/com/dremio/dac/explore/model/CreateFromSQL.java
dac/backend/src/main/java/com/dremio/dac/api/SQLResource.java
services/jobs/src/main/java/com/dremio/service/jobs/HybridJobsService.java
services/jobs/src/main/java/com/dremio/service/jobs/JobsClient.java
services/jobs/src/main/java/com/dremio/service/jobs/JobsService.java
dac/backend/src/test/java/com/dremio/dac/api/TestSQLResource.java
dac/backend/src/main/java/com/dremio/dac/api/JobResource.java
dac/backend/src/main/java/com/dremio/dac/resource/SQLResource.java
dac/backend/src/main/java/com/dremio/dac/server/RestServerV2.java
dac/backend/src/main/java/com/dremio/dac/server/APIServer.java
dac/backend/src/main/java/com/dremio/dac/server/WebServer.java
dac/backend/src/main/java/com/dremio/dac/api/JobStatus.java
dac/backend/src/main/java/com/dremio/dac/model/job/JobDataWrapper.java
dac/backend/src/main/java/com/dremio/dac/model/job/JobDataFragmentWrapper.java
dac/backend/src/main/java/com/dremio/dac/explore
dac/backend/src/main/java/com/dremio/dac/api
dac/backend/src/main/java/com/dremio/dac/explore/DatasetsResource.java
dac/backend/src/main/java/com/dremio/dac/resource
https://docs.dremio.com/software/rest-api/sql/post-sql/