Conclusion

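Cause 1: configuration problem. The recordDataTTL and otherMetricsDataTTL settings in the configuration file do not take effect, which can be regarded as a bug.
Solutions: Method 1: set minuteMetricsDataTTL, hourMetricsDataTTL and dayMetricsDataTTL manually; recordData is then deleted according to the value of dayMetricsDataTTL. Method 2: modify the source code.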

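Cause 2: SkyWalking bug. In skywalking-6.2.0, deleting an index is buggy when nameSpace is set; fixing it requires changing the source code and recompiling.
Solutions: Method 1: set namespace to empty. Method 2: modify the source code.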

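Environment

SkyWalking version: 6.2.0
ES instances: three instances, each with 4 cores and 14 GB, running on Docker
OAP Server: a single instance, 1500 MB
Agent nodes (that is, JVM instances): about 50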

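Troubleshooting process

1. Configuration problem

Reading the source code turns up the core code that deletes historical ES data, shown below. For each model (such as Segment or the various Metrics), it first calculates a cutoff time from the model's Downsampling setting and the DataTTLConfig. Any index older than that cutoff has to be deleted.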

[Figure: screenshot of the deleteHistory source code]


Downsampling is an enum:

public enum Downsampling {
    None(0, ""), Second(1, "second"), Minute(2, "minute"), Hour(3, "hour"), Day(4, "day"), Month(5, "month");

    private final int value;
    private final String name;

    Downsampling(int value, String name) {
        this.value = value;
        this.name = name;
    }

    public int getValue() {
        return value;
    }

    public String getName() {
        return name;
    }
}

DataTTLConfig holds the expiry settings for each data type, record and metrics:

@Setter
@Getter
public class DataTTLConfig {
    private int recordDataTTL;
    private int minuteMetricsDataTTL;
    private int hourMetricsDataTTL;
    private int dayMetricsDataTTL;
    private int monthMetricsDataTTL;
}

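Returning to the deleteHistory logic, the part to look at is how the cutoff time timeBefore is calculated. The cutoff depends only on the model's Downsampling and the DataTTLConfig.
The implementation of StorageTTL is ElasticsearchStorageTTL, whose only job is to return the TTLCalculator that matches a given Downsampling. Take EsMinuteTTLCalculator, one of the TTLCalculator implementations, as an example: it calculates the time from the current time and the minuteMetricsDataTTL setting of DataTTLConfig, in days, whereas EsHourTTLCalculator uses hourMetricsDataTTL. Each TTLCalculator corresponds to one DataTTLConfig setting.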

public class ElasticsearchStorageTTL implements StorageTTL {

    @Override public TTLCalculator calculator(Downsampling downsampling) {
        switch (downsampling) {
            case Month:
                return new MonthTTLCalculator();
            case Hour:
                return new EsHourTTLCalculator();
            case Minute:
                return new EsMinuteTTLCalculator();
            default:
                return new DayTTLCalculator();
        }
    }
}

public class EsMinuteTTLCalculator implements TTLCalculator {
    @Override public long timeBefore(DateTime currentTime, DataTTLConfig dataTTLConfig) {
        return Long.valueOf(currentTime.plusDays(0 - dataTTLConfig.getMinuteMetricsDataTTL()).toString("yyyyMMdd"));
    }
}
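The cutoff arithmetic can be tried in isolation. The following is a minimal sketch that uses java.time instead of Joda-Time; the class TtlCutoffSketch, the helper isExpired and the sample dates are illustrative assumptions, not SkyWalking code. It assumes an index carries a yyyyMMdd day bucket that is compared numerically against the cutoff.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical stand-alone sketch; not part of SkyWalking.
class TtlCutoffSketch {
    private static final DateTimeFormatter DAY_BUCKET = DateTimeFormatter.ofPattern("yyyyMMdd");

    // Same arithmetic as EsMinuteTTLCalculator.timeBefore: today minus ttlDays,
    // rendered as a yyyyMMdd number.
    static long timeBefore(LocalDate today, int ttlDays) {
        return Long.parseLong(today.minusDays(ttlDays).format(DAY_BUCKET));
    }

    // Assumption: an index is expired when its day bucket is smaller than the cutoff.
    static boolean isExpired(long indexDayBucket, long cutoff) {
        return indexDayBucket < cutoff;
    }

    public static void main(String[] args) {
        long cutoff = timeBefore(LocalDate.of(2019, 8, 10), 2);
        System.out.println(cutoff);                       // 20190808
        System.out.println(isExpired(20190807L, cutoff)); // true
        System.out.println(isExpired(20190808L, cutoff)); // false
    }
}
```

With a TTL of 2 days, only buckets strictly before the cutoff day count as expired in this sketch.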

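This also explains why the recordDataTTL setting does not take effect. The Downsampling of the Record type is Second, but as shown above ElasticsearchStorageTTL has no case Second, so Second falls through to the default branch and gets a DayTTLCalculator. DayTTLCalculator uses dayMetricsDataTTL from dataTTLConfig, so recordDataTTL is never used.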
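The fall-through can be reproduced with a cut-down version of the switch. This is only a sketch: TtlDispatchSketch and ttlSettingFor are hypothetical names, the enum is a simplified copy, and the mapping of Month to monthMetricsDataTTL is an assumption based on the naming pattern rather than something shown in the code above.

```java
// Hypothetical, simplified re-creation of the dispatch; not the SkyWalking classes.
class TtlDispatchSketch {
    enum Downsampling { None, Second, Minute, Hour, Day, Month }

    // Returns the name of the TTL setting that ends up governing each downsampling,
    // following the same switch shape as ElasticsearchStorageTTL.
    static String ttlSettingFor(Downsampling downsampling) {
        switch (downsampling) {
            case Month:
                return "monthMetricsDataTTL"; // assumed mapping
            case Hour:
                return "hourMetricsDataTTL";
            case Minute:
                return "minuteMetricsDataTTL";
            default:
                // Second (used by records) lands here, so recordDataTTL is never consulted.
                return "dayMetricsDataTTL";
        }
    }

    public static void main(String[] args) {
        for (Downsampling d : Downsampling.values()) {
            System.out.println(d + " -> " + ttlSettingFor(d));
        }
    }
}
```

No branch returns anything tied to recordDataTTL, which matches the behaviour described above.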



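Next we only need to see how DataTTLConfig is obtained. The deleteHistory code above shows that DataTTLConfig is read from the ConfigService of CoreModule, which in effect means the core module section of application.yml. At startup, however, StorageModuleElasticsearchProvider overwrites the DataTTLConfig in CoreModule with the values from StorageModuleElasticsearchConfig.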

org.apache.skywalking.oap.server.storage.plugin.elasticsearch.StorageModuleElasticsearchProvider

private void overrideCoreModuleTTLConfig() {
    ConfigService configService = getManager().find(CoreModule.NAME).provider().getService(ConfigService.class);

    configService.getDataTTLConfig().setRecordDataTTL(config.getRecordDataTTL());
    configService.getDataTTLConfig().setMinuteMetricsDataTTL(config.getMinuteMetricsDataTTL());
    configService.getDataTTLConfig().setHourMetricsDataTTL(config.getHourMetricsDataTTL());
    configService.getDataTTLConfig().setDayMetricsDataTTL(config.getDayMetricsDataTTL());
    configService.getDataTTLConfig().setMonthMetricsDataTTL(config.getMonthMetricsDataTTL());
}

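Now look at StorageModuleElasticsearchConfig, mainly the settings related to otherMetricsDataTTL. The code below shows that the author intended minuteMetricsDataTTL, hourMetricsDataTTL and dayMetricsDataTTL to be assigned automatically whenever otherMetricsDataTTL is set. At startup, however, the system modifies the Field directly through reflection, so setOtherMetricsDataTTL is never called. That is why configuring otherMetricsDataTTL in the configuration file has no effect: the system simply uses the default of 2.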

@Getter
public class StorageModuleElasticsearchConfig extends ModuleConfig {
    @Setter private int recordDataTTL = 7;
    @Setter private int minuteMetricsDataTTL = 2;
    @Setter private int hourMetricsDataTTL = 2;
    @Setter private int dayMetricsDataTTL = 2;
    private int otherMetricsDataTTL = 0;
    @Setter private int monthMetricsDataTTL = 18;

    public void setOtherMetricsDataTTL(int otherMetricsDataTTL) {
        if (otherMetricsDataTTL > 0) {
            minuteMetricsDataTTL = otherMetricsDataTTL;
            hourMetricsDataTTL = otherMetricsDataTTL;
            dayMetricsDataTTL = otherMetricsDataTTL;
        }
    }
}

The code that assigns the Config values through reflection at startup:

org.apache.skywalking.oap.server.library.module.ModuleDefine

private void copyProperties(ModuleConfig dest, Properties src, String moduleName,
    String providerName) throws IllegalAccessException {
    if (dest == null) {
        return;
    }
    Enumeration<?> propertyNames = src.propertyNames();
    while (propertyNames.hasMoreElements()) {
        String propertyName = (String)propertyNames.nextElement();
        Class<? extends ModuleConfig> destClass = dest.getClass();

        try {
            Field field = getDeclaredField(destClass, propertyName);
            field.setAccessible(true);
            field.set(dest, src.get(propertyName));
        } catch (NoSuchFieldException e) {
            logger.warn(propertyName + " setting is not supported in " + providerName + " provider of " + moduleName + " module");
        }
    }
}
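
The effect of writing the Field directly can be shown with a small stand-alone sketch. ReflectionBypassSketch, its nested Config class and the value 45 are hypothetical; Config is only a cut-down stand-in for StorageModuleElasticsearchConfig.

```java
import java.lang.reflect.Field;

// Hypothetical demo; not SkyWalking code.
class ReflectionBypassSketch {
    // Cut-down stand-in for StorageModuleElasticsearchConfig.
    static class Config {
        int dayMetricsDataTTL = 2;
        int otherMetricsDataTTL = 0;

        void setOtherMetricsDataTTL(int value) {
            if (value > 0) {
                dayMetricsDataTTL = value;
            }
        }
    }

    // Writes a field directly, the way copyProperties does; no setter runs.
    static void setField(Object target, String name, Object value) {
        try {
            Field field = target.getClass().getDeclaredField(name);
            field.setAccessible(true);
            field.set(target, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Config viaField = new Config();
        setField(viaField, "otherMetricsDataTTL", 45);
        System.out.println(viaField.dayMetricsDataTTL);  // 2, the setter logic never ran

        Config viaSetter = new Config();
        viaSetter.setOtherMetricsDataTTL(45);
        System.out.println(viaSetter.dayMetricsDataTTL); // 45
    }
}
```

Only the setter path propagates the value, so a configuration loader that writes fields directly leaves the dependent TTLs at their defaults.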

Solutions to the configuration problem

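Method 1: set minuteMetricsDataTTL, hourMetricsDataTTL and dayMetricsDataTTL directly in the configuration file instead of relying on the otherMetricsDataTTL entry that ships in the default configuration. recordData is deleted according to the value of dayMetricsDataTTL.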

Method 2: modify the source code so that setOtherMetricsDataTTL is called explicitly.
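
One way Method 2 could look is to re-run the setter with the value that reflection has already stored, before the TTLs are copied into the core module. This is a sketch under that assumption only; ApplyOtherTtlSketch, EsConfig and applyOtherTtl are hypothetical names and this is not the actual patch.

```java
// Hypothetical stand-in classes; a sketch of the idea, not the actual SkyWalking patch.
class ApplyOtherTtlSketch {
    static class EsConfig {
        int minuteMetricsDataTTL = 2;
        int hourMetricsDataTTL = 2;
        int dayMetricsDataTTL = 2;
        int otherMetricsDataTTL = 0; // written directly by reflection at startup

        void setOtherMetricsDataTTL(int value) {
            if (value > 0) {
                minuteMetricsDataTTL = value;
                hourMetricsDataTTL = value;
                dayMetricsDataTTL = value;
            }
        }
    }

    // Re-runs the setter with the stored value so the minute/hour/day TTLs pick it up.
    static void applyOtherTtl(EsConfig config) {
        config.setOtherMetricsDataTTL(config.otherMetricsDataTTL);
    }

    public static void main(String[] args) {
        EsConfig config = new EsConfig();
        config.otherMetricsDataTTL = 45; // simulates the direct field write
        applyOtherTtl(config);
        System.out.println(config.dayMetricsDataTTL); // 45
    }
}
```

Because the setter ignores non-positive values, leaving otherMetricsDataTTL unset keeps the individual defaults intact.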


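2. The index deletion bug

This problem is more obvious. In deleteHistory above, the indexes are looked up by alias, and every index judged to be expired is passed to the delete logic. The problem lies in deleteIndex. As shown below, the namespace is prepended to the incoming indexName before deletion, but that indexName already contains the namespace, because it was queried directly from ES by alias. Adding the namespace a second time means the index cannot be found, so the deletion fails.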

public boolean deleteIndex(String indexName) throws IOException {
    indexName = formatIndexName(indexName);
    DeleteIndexRequest request = new DeleteIndexRequest(indexName);
    DeleteIndexResponse response;
    response = client.indices().delete(request);
    logger.debug("delete {} index finished, isAcknowledged: {}", indexName, response.isAcknowledged());
    return response.isAcknowledged();
}

public String formatIndexName(String indexName) {
    if (StringUtils.isNotEmpty(namespace)) {
        return namespace + "_" + indexName;
    }
    return indexName;
}
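
The double prefix is easy to see with plain strings. In this sketch the namespace "prod" and the index name "segment_20190808" are made-up values, and NamespacePrefixSketch only copies the string logic of formatIndexName.

```java
// Hypothetical demo of the prefixing logic only; no Elasticsearch client involved.
class NamespacePrefixSketch {
    // Same rule as formatIndexName, with the namespace passed in explicitly.
    static String formatIndexName(String namespace, String indexName) {
        if (namespace != null && !namespace.isEmpty()) {
            return namespace + "_" + indexName;
        }
        return indexName;
    }

    public static void main(String[] args) {
        String namespace = "prod"; // made-up namespace

        // Name as it exists in ES, which is what the alias lookup returns.
        String fromEs = formatIndexName(namespace, "segment_20190808");
        System.out.println(fromEs);       // prod_segment_20190808

        // deleteIndex formats the already-prefixed name again.
        String deleteTarget = formatIndexName(namespace, fromEs);
        System.out.println(deleteTarget); // prod_prod_segment_20190808

        // With an empty namespace the name is unchanged, so the bug does not show up.
        System.out.println(formatIndexName("", "segment_20190808")); // segment_20190808
    }
}
```

The last line shows why setting the namespace to empty works as a workaround.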

The fix is simple: add a deleteIndexWithFullIndexName method and have the history-deletion call site use deleteIndexWithFullIndexName directly.

public boolean deleteIndex(String indexName) throws IOException {
    String fullIndexName = formatIndexName(indexName);
    return deleteIndexWithFullIndexName(fullIndexName);
}

public boolean deleteIndexWithFullIndexName(String fullIndexName) throws IOException {
    DeleteIndexRequest request = new DeleteIndexRequest(fullIndexName);
    DeleteIndexResponse response;
    response = client.indices().delete(request);
    logger.debug("delete {} index finished, isAcknowledged: {}", fullIndexName, response.isAcknowledged());
    return response.isAcknowledged();
}
