通过web sql 直接我们可以了解一些dremio 查询执行的处理,对于web sql 执行来说如果从实际来说应该包含了两类: 通过rest api 执行sql 的以及通过web ui 操作的
rest api sql 执行
sql 查询流程为: 先提交任务,然后通过任务id 或者sql 执行的状态,然后获取数据,这个官方包含了比较完整的说明
代码处理(api 目录下的sql api 服务)
job 任务提交
public QueryDetails runQuery(CreateFromSQL sql) {
final SqlQuery sqlQuery = JobRequestUtil.createSqlQuery(sql.getSql(), sql.getContext(),securityContext.getUserPrincipal().getName(), sql.getEngineName(), null);
final JobSubmittedListener listener = new JobSubmittedListener();
// 基于jobservice 处理
final JobId jobId = jobs.submitJob(SubmitJobRequest.newBuilder()
.setSqlQuery(sqlQuery)
.setQueryType(QueryType.REST)
.build(), listener)
.getJobId();
// if async disabled, wait until job has been submitted then return
if (!projectOptionManager.getOption(ExecConstants.REST_API_RUN_QUERY_ASYNC)) {
listener.await();
}
return new QueryDetails(jobId.getId());
}
job 状态查询
public JobStatus getJobStatus(@PathParam("id") String id) {
int retryCount = 4;
int i = 0;
while (i++ < retryCount) {
try {
JobDetailsRequest request = JobDetailsRequest.newBuilder()
.setJobId(com.dremio.service.job.proto.JobProtobuf.JobId.newBuilder().setId(id).build())
.setUserName(securityContext.getUserPrincipal().getName())
.build();
// 也是基于jobservice 处理的
JobDetails jobDetails = jobs.getJobDetails(request);
return JobStatus.fromJob(jobDetails);
} catch (JobNotFoundException e) {
throw new NotFoundException(String.format("Could not find a job with id [%s]", id));
} catch (io.grpc.StatusRuntimeException runtimeException) {
// Retry if we hit runTimeException
if (i == retryCount) {
// Throw Exception, if retry count is exceeded.
throw new javax.ws.rs.InternalServerErrorException(
String.format("Getting job Status for job [%s] failed with an internal exception, please retry", id));
}
try {
// hardcoding the sleep here, 3*50 = > 150ms is enough sleep time.
Thread.sleep(50);
} catch (InterruptedException interruptedException) {
//ignore.
}
}
}
// We will never reach here.
return null;
}
job 结果查询
// 可以看出有一个500 的硬编码限时,对于结果的处理也是通过的jobservice 处理的,数据处理上基于了dremio 自己包装的一个,内部会基于FlightTicket 进行内部数据处理,实际上还是arrow flight 处理机制
public JobResourceData getQueryResults(@PathParam("id") String id, @QueryParam("offset") @DefaultValue("0") Integer offset, @Valid @QueryParam("limit") @DefaultValue("100") Integer limit) {
Preconditions.checkArgument(limit <= 500,"limit can not exceed 500 rows");
try {
JobSummaryRequest request = JobSummaryRequest.newBuilder()
.setJobId(JobProtobuf.JobId.newBuilder().setId(id).build())
.setUserName(securityContext.getUserPrincipal().getName())
.build();
JobSummary jobSummary = jobs.getJobSummary(request);
if (jobSummary.getJobState() != JobState.COMPLETED) {
throw new BadRequestException(String.format("Can not fetch details for a job that is in [%s] state.", jobSummary.getJobState()));
}
// Additional wait not necessary since we check for job completion via JobState
return new JobResourceData(jobs, jobSummary, securityContext.getUserPrincipal().getName(),
getOrCreateAllocator("getQueryResults"), offset, limit);
} catch (JobNotFoundException e) {
throw new NotFoundException(String.format("Could not find a job with id [%s]", id));
}
}
web ui sql 执行
参考代码处理
// 可以看出是直接就处理返回了,不像rest api 需要经过多阶段处理
// 可以看出是直接就处理返回了,不像rest api 需要经过多阶段处理
public JobDataFragment query(CreateFromSQL sql) {
final SqlQuery query = JobRequestUtil.createSqlQuery(sql.getSql(), sql.getContext(), securityContext.getUserPrincipal().getName());
// Pagination is not supported in this API, so we need to truncate the results to 500 records
final CompletionListener listener = new CompletionListener();
final JobSubmission jobSubmission = jobs.submitJob(SubmitJobRequest.newBuilder().setSqlQuery(query).setQueryType(QueryType.REST).build(), listener);
listener.awaitUnchecked();
return new JobDataWrapper(jobs, jobSubmission.getJobId(), jobSubmission.getSessionId(), securityContext.getUserPrincipal().getName())
.truncate(getOrCreateAllocator("query"), 500);
}
同时还有一个另外的执行地方,也是一个多阶段处理,实际数据的处理也是通过job服务处理的,只是基于的bff api 处理的
完整的api 是在dac backend 模块的explore 目录中,包含了数据集处理的api ,整体处理上是先于获取一个jobid,然后都会基于此
jobid 进行操作,同时是支持多版本操作的,因为处理上与ui 有很大的关联性,详细的后边我单独介绍下
dremio 支持的查询类型
// 此类型在dremio 中还是很重要的,比如企业版的查询路由就会基于此进行判断处理,dremio 部分代码可以看出来
// 此类型在dremio 中还是很重要的,比如企业版的查询路由就会基于此进行判断处理,dremio 部分代码可以看出来
enum QueryType {
UNKNOWN = 0;
UI_INITIAL_PREVIEW = 100; // create initial preview response
UI_INTERNAL_PREVIEW = 101; // a preview query internal to the operation of the dac such as format settings preview
UI_PREVIEW = 102; // run in preview mode
PREPARE_INTERNAL = 103;
UI_INTERNAL_RUN = 104; // a run query internal to operation of the dac such as histograms, transformation previews, card generation, etc.
UI_RUN = 105; //actual run on the data
UI_EXPORT = 106; // UI downloaded query
ACCELERATOR_CREATE = 200; //accelerated dataset creation
ACCELERATOR_DROP = 201; //accelerated dataset drop
ACCELERATOR_EXPLAIN = 202; //dependency graph construction
ODBC = 300; //query submitted by odbc client
JDBC = 400; //query submitted by jdbc client
REST = 500; //query submitted by public rest interfaces
FLIGHT = 600; //query submitted by FlightClient
METADATA_REFRESH = 700; // Queries related to metadata refresh
INTERNAL_ICEBERG_METADATA_DROP = 800; // Queries related to internal iceberg metadata drop
}
说明
以上是一个简单的web sql 执行介绍,实际上也是rest api 以及web ui sql 操作比较重要的部分,代码上还是值得学习参考的,尤其是对于web sql 数据查询部分的
参考资料
dac/backend/src/main/java/com/dremio/dac/explore/model/CreateFromSQL.java
dac/backend/src/main/java/com/dremio/dac/api/SQLResource.java
services/jobs/src/main/java/com/dremio/service/jobs/HybridJobsService.java
services/jobs/src/main/java/com/dremio/service/jobs/JobsClient.java
services/jobs/src/main/java/com/dremio/service/jobs/JobsService.java
dac/backend/src/test/java/com/dremio/dac/api/TestSQLResource.java
dac/backend/src/main/java/com/dremio/dac/api/JobResource.java
dac/backend/src/main/java/com/dremio/dac/resource/SQLResource.java
dac/backend/src/main/java/com/dremio/dac/server/RestServerV2.java
dac/backend/src/main/java/com/dremio/dac/server/APIServer.java
dac/backend/src/main/java/com/dremio/dac/server/WebServer.java
dac/backend/src/main/java/com/dremio/dac/api/JobStatus.java
dac/backend/src/main/java/com/dremio/dac/model/job/JobDataWrapper.java
dac/backend/src/main/java/com/dremio/dac/model/job/JobDataFragmentWrapper.java
dac/backend/src/main/java/com/dremio/dac/explore
dac/backend/src/main/java/com/dremio/dac/api
dac/backend/src/main/java/com/dremio/dac/explore/DatasetsResource.java
dac/backend/src/main/java/com/dremio/dac/resource
https://docs.dremio.com/software/rest-api/sql/post-sql/