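Contents

ES components

Integrating ES with Spring Boot

Details

@Field

Calling the methods

Versions

ES index client

Theory

Match queries

Practice

Goal

Article index

Article DAO

Creating the index

Inserting fake data

Querying the data

Field weights

Search results when the label field has the higher weight

Search results when the title field has the higher weight

Exact queries

GitHub

Next post: configuring stop words and synonyms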


ES components

index -> type (similar to a table) -> document -> field

Integrating ES with Spring Boot

I covered this in an earlier blog post.

Details

ItemRepository

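import java.util.List;
import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;

public interface ItemRepository extends ElasticsearchRepository<Item, Long> {

    // Derived query: items whose weights value lies between low and high
    List<Item> findByWeightsBetween(double low, double high);

    // Derived query: items in the given category, ordered by weights
    List<Item> findByCategoryOrderByWeights(String category);

}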

This works like a JPA Repository: Spring Data derives the queries from the method names.

@Field

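Chinese word segmentation

analyzer = "ik_max_word"
FieldType.Text supports full-text search, so the value is analyzed (split into terms).
FieldType.Keyword supports exact matching; the value is not analyzed, and it can be used for aggregations and similar operations.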
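
The difference between the two field types can be illustrated with a small plain-Java sketch. This is only a toy model, not Elasticsearch code: the class and method names are made up for illustration, and the stand-in analyzer simply lower-cases and splits on whitespace, whereas ik_max_word segments Chinese text in a very different way.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class TextVsKeywordSketch {

    // Stand-in analyzer (an assumption for this sketch): lower-case, then split on whitespace.
    static Set<String> analyze(String text) {
        return new HashSet<>(Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+")));
    }

    // Text-like matching: the field matches if it shares at least one term with the query.
    static boolean textMatch(String fieldValue, String query) {
        Set<String> fieldTerms = analyze(fieldValue);
        for (String queryTerm : analyze(query)) {
            if (fieldTerms.contains(queryTerm)) {
                return true;
            }
        }
        return false;
    }

    // Keyword-like matching: the whole value is one term and is compared as-is.
    static boolean keywordMatch(String fieldValue, String query) {
        return fieldValue.equals(query);
    }

    public static void main(String[] args) {
        String value = "Chicken breast salad";
        System.out.println(textMatch(value, "salad"));                   // true
        System.out.println(keywordMatch(value, "salad"));                // false
        System.out.println(keywordMatch(value, "Chicken breast salad")); // true
    }
}
```

A single word finds the document through the text-like field, while the keyword-like field only matches the complete original value.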

Calling the methods

See this post for a fairly simple implementation.

Versions

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.0.4.RELEASE</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>
    <groupId>com.example</groupId>
    <artifactId>demo</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>demo</name>
    <description>Demo project for Spring Boot</description>

    <properties>
        <java.version>1.8</java.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.springframework.boot/spring-boot-starter-data-elasticsearch -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
        </dependency>

        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
        </dependency>

        <!-- https://mvnrepository.com/artifact/com.alibaba/fastjson -->
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.54</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>

</project>

The versions here also have to match the version of the Elasticsearch instance you connect to.

ES index client

Elasticsearch-head https://github.com/mobz/elasticsearch-head

For installation, see this post.

Take a look at the ES configuration file, elasticsearch.yml.

I added the following at the bottom:

http.cors.enabled: true 
http.cors.allow-origin: "*"
node.master: true
node.data: true

After that, install Elasticsearch-head by following the installation post.

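Theory

The requirement was personalized article search. It later turned out that ES alone cannot solve this; it still takes a dedicated algorithm.

For making smarter use of labels, see https://www.elastic.co/guide/en/elasticsearch/guide/current/stopwords-relavance.html

Match queries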

@Test
public void a() {
    QueryBuilder queryBuilder = QueryBuilders.boolQuery()
            //.must(QueryBuilders.matchQuery("title","鸡胸").minimumShouldMatch("100%"));
            .must(QueryBuilders.matchQuery("category", "吃货"));
    // To tighten the match: QueryBuilders.matchQuery("category","吃").minimumShouldMatch()

    SearchQuery searchQuery = new NativeSearchQueryBuilder()
            .withQuery(queryBuilder)
            .build();

    // Run the search and get the results
    Page<Item> items = this.itemRepository.search(searchQuery);
    // Total number of hits
    long total = items.getTotalElements();
    System.out.println("total hits = " + total);
    // Total number of pages
    System.out.println("total pages = " + items.getTotalPages());
    // Current page
    System.out.println("current page: " + items.getNumber());
    // Page size
    System.out.println("page size: " + items.getSize());

    for (Item item : items) {
        System.out.println(JSON.toJSON(item));
    }
}

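ES supports many kinds of queries. The one above is a match query, which analyzes (tokenizes) the search text before matching.

minimumShouldMatch controls how many of the terms produced from the search text must be present. At 100% every term has to match. Lower percentages are rounded down to a whole number of terms, so for a short query that yields only a couple of terms they usually behave exactly as if the option were not set.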
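
The rounding can be sketched in plain Java. This is a minimal illustration, not Elasticsearch code: requiredTerms is a hypothetical helper, it covers only positive percentages, and it assumes that the computed count is rounded down and that a match query still needs at least one term to match.

```java
final class MinimumShouldMatchSketch {

    // How many of termCount query terms must match for a positive percentage.
    // Assumption: the percentage is rounded down, and at least one term is always required.
    static int requiredTerms(int termCount, int percent) {
        if (termCount < 1 || percent < 0 || percent > 100) {
            throw new IllegalArgumentException("termCount must be >= 1 and percent within 0..100");
        }
        int roundedDown = termCount * percent / 100; // integer division rounds down
        return Math.max(1, roundedDown);
    }

    public static void main(String[] args) {
        // A query analyzed into two terms: anything below 100% still needs only one term.
        System.out.println(requiredTerms(2, 50));  // 1
        System.out.println(requiredTerms(2, 75));  // 1
        System.out.println(requiredTerms(2, 100)); // 2
        // With four terms, 75% starts to make a difference.
        System.out.println(requiredTerms(4, 75));  // 3
    }
}
```

For a search text that the analyzer splits into two terms, every value below 100% therefore requires just one term, the same as the default.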

QueryBuilders.boolQuery().must -> works like AND: the clause has to match
QueryBuilders.boolQuery().should -> works like OR: the clause may match
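
The two clause types can be modeled with a toy matcher in plain Java. This sketch is an illustration under simplifying assumptions, not the Elasticsearch implementation: the names are hypothetical, documents are plain strings, and it ignores scoring, filter, must_not, and minimum_should_match. It does model one detail worth knowing: when a bool query has no must clause, at least one should clause has to match.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

final class BoolQuerySketch {

    // Toy bool query: every must clause has to match; should clauses are optional
    // unless there is no must clause, in which case at least one should clause has to match.
    static boolean matches(String doc, List<Predicate<String>> must, List<Predicate<String>> should) {
        for (Predicate<String> clause : must) {
            if (!clause.test(doc)) {
                return false;
            }
        }
        if (!must.isEmpty() || should.isEmpty()) {
            return true;
        }
        for (Predicate<String> clause : should) {
            if (clause.test(doc)) {
                return true;
            }
        }
        return false;
    }

    // Hypothetical clause: the document text contains the given term.
    static Predicate<String> contains(String term) {
        return doc -> doc.contains(term);
    }

    public static void main(String[] args) {
        List<Predicate<String>> clauses = Arrays.asList(contains("red"), contains("lipstick"));
        List<Predicate<String>> none = Collections.emptyList();
        System.out.println(matches("red shoes", clauses, none));    // false: must needs both
        System.out.println(matches("red shoes", none, clauses));    // true: should needs one
        System.out.println(matches("red lipstick", clauses, none)); // true
    }
}
```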

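The boost method changes the weight of a field.

Practice

Goal

The goal is to match user searches against article labels so that the closest articles come back first. This relies on ES relevance scoring.

Article index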

@Data
@Document(indexName = "article", type = "docs", shards = 1, replicas = 0)
public class Article {

    @Id
    private Long id;

    /**
     * Title
     */
    @Field(type = FieldType.Text, analyzer = "ik_max_word")
    private String title;

    /**
     * Label
     */
    @Field(type = FieldType.Text, analyzer = "ik_max_word")
    private String label;

    /**
     * Article id
     */
    @Field(type = FieldType.Long)
    private Long articleId;

}

Article DAO

public interface ArticleRepository extends ElasticsearchRepository<Article, Long> {
}

Creating the index

@Resource
ElasticsearchTemplate esTemplate;

@Test
public void createIndex() {
    esTemplate.createIndex(Article.class);
}

Inserting fake data

@Resource
    ArticleRepository articleRepository;

    @Test
    public void insertxx() {
        List<Article> list = new ArrayList<>();
        String[] titles = {"美妆", "护肤", "口红", "眼影", "粉底液", "遮瑕液", "眼霜", "亮白", "清洁", "肿眼泡", "发型", "大小眼","底妆","懂彩妆","爱种草","单眼皮",
                            "礼物","好物分享","上新","内双"};
        String[] labels = {"脸基尼、DIY面膜…当代护肤迷惑行为大赏,笑skr人", "完子护肤Q&A:根据肤质选择适合自己的卸妆产品", "超全眼型大解析 | 眼妆一直画不好,原来是因为这个!",
                "分情况选择适合自己的面膜", "如何选择适合自己的卸妆产品", "学会堪比微整,化妆师最怕被偷师的7个技巧", "变美不用挨针!如何用最高性价比的单品,get水光肌?",
                "涨姿势 | 斩男必备的眼唇配色小心机,让你妆容美到犯规!", "全民沉迷哪吒仿妆?我却被这些超实用的清透眼妆圈粉了!", "敏感星人最爱的“全能”成分,补水修复舒缓一步到位!"
                ,"十分钟出门妆 | 直男无法分辨的超自然伪素颜!","如何用最高性价比的单品,get水光肌","气味贩卖 | 哪一种味道,能让ta立刻注意到你"
                ,"人无完人","拯救黄皮 | 什么妆容适合黄皮?看这篇教科书级示范~","6月减肉难题 | 带妆运动是作死还是精致,终于有了答案","当代忙碌女青年的夏天,请一切从简"
                ,"美到报警的12色动物盘,全4盘眼妆笔记"};
        for (int i = 0; i < 100; i++) {
            Article article = new Article();
            article.setId(IdUtils.getId());
            article.setTitle(titles[getRandom(titles.length)]+","+titles[getRandom(titles.length)]);
            article.setArticleId(IdUtils.getId());
            article.setLabel(labels[getRandom(labels.length)]);
            list.add(article);
        }
        // saveAll takes a collection of objects and inserts them in bulk
        articleRepository.saveAll(list);
    }

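Querying the data

To simulate a user search we first need to define a rule, for example that one of the two fields, title or label, counts for more than the other.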

@Test
public void a() {
    QueryBuilder queryBuilder = QueryBuilders.boolQuery()
            //.must(QueryBuilders.matchQuery("title","鸡胸").minimumShouldMatch("100%"));
            .should(QueryBuilders.matchQuery("title", "斩男"))
            .should(QueryBuilders.matchQuery("label", "斩男").boost(5));
    // To tighten the match: QueryBuilders.matchQuery("category","吃").minimumShouldMatch()

    SearchQuery searchQuery = new NativeSearchQueryBuilder()
            .withQuery(queryBuilder)
            .build();

    // Run the search and get the results
    Page<Article> articles = this.articleRepository.search(searchQuery);
    // Total number of hits
    long total = articles.getTotalElements();
    System.out.println("total hits = " + total);
    // Total number of pages
    System.out.println("total pages = " + articles.getTotalPages());
    // Current page
    System.out.println("current page: " + articles.getNumber());
    // Page size
    System.out.println("page size: " + articles.getSize());

    for (Article article : articles) {
        System.out.println(JSON.toJSON(article));
    }
}

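Because the fields are analyzed, minimumShouldMatch() can be used to tighten the match: at 100% every term of the search text must be present, which comes close to an exact match, while lower values behave like an ordinary full-text search.

To query two fields, build the query with QueryBuilders.boolQuery().should. With must, every clause is mandatory, so only documents that match both fields are returned and the per-field weights no longer help to rank documents that match just one of them.

Field weights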

Set the field weight with boost. The default boost is 1, so a value of 5 makes matches on that field count for much more.
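
How a boost changes the order of results can be sketched in plain Java. This is a toy model under stated assumptions, not how Elasticsearch computes relevance: every matching clause is given a base score of 1.0 (real scores vary per document), the clause scores are multiplied by their boost and summed, and the Doc class and sample documents are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class BoostRankingSketch {

    // Hypothetical document: records only whether each field matched the search term.
    static final class Doc {
        final String name;
        final boolean titleMatches;
        final boolean labelMatches;

        Doc(String name, boolean titleMatches, boolean labelMatches) {
            this.name = name;
            this.titleMatches = titleMatches;
            this.labelMatches = labelMatches;
        }
    }

    // Assumption: each matching clause contributes 1.0 times its boost; contributions are summed.
    static double score(Doc doc, double titleBoost, double labelBoost) {
        double score = 0.0;
        if (doc.titleMatches) {
            score += 1.0 * titleBoost;
        }
        if (doc.labelMatches) {
            score += 1.0 * labelBoost;
        }
        return score;
    }

    // Returns the document names ordered by descending score.
    static List<String> rank(List<Doc> docs, double titleBoost, double labelBoost) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparingDouble((Doc d) -> score(d, titleBoost, labelBoost)).reversed());
        List<String> names = new ArrayList<>();
        for (Doc doc : sorted) {
            names.add(doc.name);
        }
        return names;
    }

    static List<Doc> sampleDocs() {
        return Arrays.asList(new Doc("title-only", true, false), new Doc("label-only", false, true));
    }

    public static void main(String[] args) {
        System.out.println(rank(sampleDocs(), 1.0, 5.0)); // [label-only, title-only]
        System.out.println(rank(sampleDocs(), 5.0, 1.0)); // [title-only, label-only]
    }
}
```

Swapping which field carries the boost of 5 flips the order of the two sample documents, which is the same effect the two result lists in this post show.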

Search results when the label field has the higher weight

{"articleId":6568011881606291459,"id":6568011881606291458,"label":"涨姿势 | 斩男必备的眼唇配色小心机,让你妆容美到犯规!","title":"美妆,眼影"}
{"articleId":6568011881606291513,"id":6568011881606291512,"label":"涨姿势 | 斩男必备的眼唇配色小心机,让你妆容美到犯规!","title":"懂彩妆,粉底液"}
{"articleId":6568011881937641490,"id":6568011881937641489,"label":"涨姿势 | 斩男必备的眼唇配色小心机,让你妆容美到犯规!","title":"大小眼,发型"}
{"articleId":6568011881606291493,"id":6568011881606291492,"label":"十分钟出门妆 | 直男无法分辨的超自然伪素颜!","title":"发型,粉底液"}
{"articleId":6568011881610485773,"id":6568011881610485772,"label":"十分钟出门妆 | 直男无法分辨的超自然伪素颜!","title":"粉底液,懂彩妆"}
{"articleId":6568011881610485809,"id":6568011881610485808,"label":"十分钟出门妆 | 直男无法分辨的超自然伪素颜!","title":"亮白,口红"}
{"articleId":6568011881610485821,"id":6568011881610485820,"label":"十分钟出门妆 | 直男无法分辨的超自然伪素颜!","title":"上新,内双"}
{"articleId":6568045942815072257,"id":6568045942815072256,"label":"按司农爱上你才送","title":"斩男"}

Search results when the title field has the higher weight

{"articleId":6568045942815072257,"id":6568045942815072256,"label":"按司农爱上你才送","title":"斩男"}
{"articleId":6568011881606291459,"id":6568011881606291458,"label":"涨姿势 | 斩男必备的眼唇配色小心机,让你妆容美到犯规!","title":"美妆,眼影"}
{"articleId":6568011881606291513,"id":6568011881606291512,"label":"涨姿势 | 斩男必备的眼唇配色小心机,让你妆容美到犯规!","title":"懂彩妆,粉底液"}
{"articleId":6568011881937641490,"id":6568011881937641489,"label":"涨姿势 | 斩男必备的眼唇配色小心机,让你妆容美到犯规!","title":"大小眼,发型"}
{"articleId":6568011881606291493,"id":6568011881606291492,"label":"十分钟出门妆 | 直男无法分辨的超自然伪素颜!","title":"发型,粉底液"}
{"articleId":6568011881610485773,"id":6568011881610485772,"label":"十分钟出门妆 | 直男无法分辨的超自然伪素颜!","title":"粉底液,懂彩妆"}
{"articleId":6568011881610485809,"id":6568011881610485808,"label":"十分钟出门妆 | 直男无法分辨的超自然伪素颜!","title":"亮白,口红"}
{"articleId":6568011881610485821,"id":6568011881610485820,"label":"十分钟出门妆 | 直男无法分辨的超自然伪素颜!","title":"上新,内双"}

The two result lists are ordered differently because the field weights are set differently.

Exact queries

QueryBuilder queryBuilder = QueryBuilders.boolQuery()
                .should(QueryBuilders.matchQuery("title", "斩男"))
                .should(QueryBuilders.matchQuery("label", "斩男").minimumShouldMatch("100%"));

Results

{"articleId":6568045942815072257,"id":6568045942815072256,"label":"按司农爱上你才送","title":"斩男"}
{"articleId":6568011881606291459,"id":6568011881606291458,"label":"涨姿势 | 斩男必备的眼唇配色小心机,让你妆容美到犯规!","title":"美妆,眼影"}
{"articleId":6568011881606291513,"id":6568011881606291512,"label":"涨姿势 | 斩男必备的眼唇配色小心机,让你妆容美到犯规!","title":"懂彩妆,粉底液"}
{"articleId":6568011881937641490,"id":6568011881937641489,"label":"涨姿势 | 斩男必备的眼唇配色小心机,让你妆容美到犯规!","title":"大小眼,发型"}

With that, we have a weighted search driven by the terms the user enters.