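Contents
ES components
Integrating ES with Spring Boot
Details
@Field
Calling the methods
Versions
ES index client
Theory
Match queries
Practice
Goal
Article index
Article DAO
Creating the index
Inserting fake data
Querying data
Field weights
Results when the label field has the higher weight
Results when the title field has the higher weight
Exact query
GitHub
Next post: configuring stop words and synonyms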
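ES components
index -> type (similar to a table) -> document -> field
(Indices created in Elasticsearch 6.x can hold only one mapping type, and later versions removed types altogether.)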
Integrating ES with Spring Boot
I covered this in an earlier blog post.
Details
TestRepository
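import java.util.List;

import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;

// Item is the indexed entity, Long is the type of its id
public interface ItemRepository extends ElasticsearchRepository<Item, Long> {
    // documents whose weights lie between minWeight and maxWeight
    List<Item> findByWeightsBetween(double minWeight, double maxWeight);

    // documents in the given category, sorted by weights
    List<Item> findByCategoryOrderByWeights(String category);
}
It works like a JPA Repository: Spring Data builds the query from the method name, so no implementation class is needed.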
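To make the method-name convention concrete, here is a plain-Java sketch of what the two derived queries are meant to return, run against an in-memory list instead of Elasticsearch. The `Item` fields (`category`, `weights`) are inferred from the method names. Treating `Between` as inclusive on both ends and `OrderBy...Weights` as ascending is my assumption about how Spring Data maps these keywords, and the category check is a plain string equality for simplicity (on an analyzed Text field Elasticsearch matches tokens instead).

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for the Item entity: only the fields the method names refer to.
class Item {
    final Long id;
    final String category;
    final double weights;

    Item(Long id, String category, double weights) {
        this.id = id;
        this.category = category;
        this.weights = weights;
    }
}

// In-memory imitation of the two derived queries, for illustration only.
class DerivedQueryDemo {
    // findByWeightsBetween: minWeight <= weights <= maxWeight (bounds assumed inclusive)
    static List<Item> findByWeightsBetween(List<Item> items, double minWeight, double maxWeight) {
        return items.stream()
                .filter(i -> i.weights >= minWeight && i.weights <= maxWeight)
                .collect(Collectors.toList());
    }

    // findByCategoryOrderByWeights: category equals the argument, sorted by weights ascending
    static List<Item> findByCategoryOrderByWeights(List<Item> items, String category) {
        return items.stream()
                .filter(i -> i.category.equals(category))
                .sorted(Comparator.comparingDouble(i -> i.weights))
                .collect(Collectors.toList());
    }
}
```

Appending `Desc` to the method name (for example `findByCategoryOrderByWeightsDesc`) is the usual way to ask Spring Data for the reverse order.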
@Field
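Chinese word segmentation
analyzer = "ik_max_word" (provided by the IK analysis plugin, which has to be installed in Elasticsearch)
FieldType.Text supports full-text search, so its values are analyzed (split into terms).
FieldType.Keyword supports exact-value queries: the value is not split into terms, and the field can be used for aggregations, sorting and so on.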
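The practical difference between the two field types is what ends up as searchable terms. The sketch below imitates it in plain Java: a Text field is split into tokens and a query only has to share one token with it (like a match query with the default OR operator), while a Keyword field keeps the whole value as a single term, so only an identical value matches. The lowercase-and-split-on-whitespace analyzer is an assumption standing in for `ik_max_word`, whose Chinese segmentation is far more involved.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class FieldTypeDemo {
    // Stand-in analyzer (assumption): lowercase, then split on whitespace.
    static Set<String> analyze(String text) {
        return new HashSet<>(Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+")));
    }

    // Text field: value and query are both analyzed; any shared token counts as a hit.
    static boolean textMatches(String fieldValue, String query) {
        Set<String> tokens = analyze(fieldValue);
        for (String q : analyze(query)) {
            if (tokens.contains(q)) {
                return true;
            }
        }
        return false;
    }

    // Keyword field: the whole value is one term, so only an identical value matches.
    static boolean keywordMatches(String fieldValue, String query) {
        return fieldValue.equals(query);
    }
}
```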
Calling the methods
Refer to this post for a fairly simple implementation.
Versions
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.0.4.RELEASE</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>
    <groupId>com.example</groupId>
    <artifactId>demo</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>demo</name>
    <description>Demo project for Spring Boot</description>
    <properties>
        <java.version>1.8</java.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.springframework.boot/spring-boot-starter-data-elasticsearch -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
        </dependency>
        <!-- https://mvnrepository.com/artifact/com.alibaba/fastjson -->
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.54</version>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>
</project>
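The versions also have to match the Elasticsearch server you connect to: the Spring Data Elasticsearch version pulled in by the Spring Boot parent must be compatible with your Elasticsearch version.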
ES index client
Elasticsearch-head https://github.com/mobz/elasticsearch-head
For installation, see this post.
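Then check the ES configuration file elasticsearch.yml.
I added the following at the bottom:
# allow cross-origin requests so elasticsearch-head (served from another origin) can call the REST API
http.cors.enabled: true
# "*" accepts any origin: fine for local development, too open for production
http.cors.allow-origin: "*"
# master-eligible node that also stores data (both settings already default to true)
node.master: true
node.data: true
After that, just follow the Elasticsearch-head installation post.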
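Theory
I wanted to support personalized article search, but found that ES alone cannot do that; it needs a dedicated recommendation algorithm.
What ES can do is make tag matching smarter, see https://www.elastic.co/guide/en/elasticsearch/guide/current/stopwords-relavance.html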
Match queries
// Inside a Spring Boot test class with an injected ItemRepository itemRepository
@Test
public void a() {
    QueryBuilder queryBuilder = QueryBuilders.boolQuery()
            //.must(QueryBuilders.matchQuery("title", "鸡胸").minimumShouldMatch("100%"));
            .must(QueryBuilders.matchQuery("category", "吃货"));
    // match precision: QueryBuilders.matchQuery("category", "吃").minimumShouldMatch(...)
    SearchQuery searchQuery = new NativeSearchQueryBuilder()
            .withQuery(queryBuilder)
            .build();
    // run the search and get one page of results
    Page<Item> items = this.itemRepository.search(searchQuery);
    // total number of hits
    long total = items.getTotalElements();
    System.out.println("Total hits = " + total);
    // total number of pages
    System.out.println("Total pages = " + items.getTotalPages());
    // current page (0-based)
    System.out.println("Current page: " + items.getNumber());
    // page size
    System.out.println("Page size: " + items.getSize());
    for (Item item : items) {
        System.out.println(JSON.toJSON(item));
    }
}
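ES supports many kinds of full-text queries; the one above is a match query, which analyzes (tokenizes) the search text and matches on the resulting terms.
minimumShouldMatch raises how many of those terms must match. "100%" requires every term, which makes the match strict. Lower percentages are rounded down to a whole number of terms, so for a short query that splits into only one or two terms they usually behave exactly like not setting it at all; on longer queries they do change the results.
QueryBuilders.boolQuery().must -> AND: the document must match this clause
QueryBuilders.boolQuery().should -> OR: the document may match this clause, and a match raises its score
boost() changes the weight of a clause (field) in the score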
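A rough sketch of how a percentage `minimumShouldMatch` turns into a required number of matching terms: Elasticsearch takes that percentage of the optional clauses (for a match query, the terms the analyzer produced) and rounds down. The `Math.max(1, ...)` floor reflects that a query made only of optional clauses still needs at least one hit. This is my simplification covering positive percentages only; negative percentages and combinations such as `3<90%` follow extra rules described in the Elasticsearch documentation.

```java
class MinimumShouldMatchDemo {
    // Number of query terms that must match for a positive percentage such as "75%".
    static int requiredTerms(int termCount, int percent) {
        int computed = termCount * percent / 100;  // integer division rounds down
        int atLeastOne = Math.max(1, computed);    // a pure OR query still needs one hit
        return Math.min(termCount, atLeastOne);    // never more terms than the query has
    }
}
```

With a two-term query, `requiredTerms(2, 50)` and the default both come out as one required term, which is why lower percentages can look like they do nothing on short queries.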
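Practice
Goal
The goal is to match a user's search against article tags and return the closest articles, relying on ES's built-in relevance scoring (BM25 by default since Elasticsearch 5.0).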
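Article index
import lombok.Data;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;

@Data
@Document(indexName = "article", type = "docs", shards = 1, replicas = 0)
public class Article {
    @Id
    private Long id;

    /**
     * Title (full-text, analyzed with ik_max_word)
     */
    @Field(type = FieldType.Text, analyzer = "ik_max_word")
    private String title;

    /**
     * Label (full-text, analyzed with ik_max_word)
     */
    @Field(type = FieldType.Text, analyzer = "ik_max_word")
    private String label;

    /**
     * Article id
     */
    @Field(type = FieldType.Long)
    private Long articleId;
}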
Article DAO
import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;

public interface ArticleRepository extends ElasticsearchRepository<Article, Long> {
}
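Creating the index
@Resource
ElasticsearchTemplate esTemplate;

@Test
public void insert() {
    // creates the "article" index with the settings from @Document (1 shard, 0 replicas)
    esTemplate.createIndex(Article.class);
    // applies the @Field mappings (Text + ik_max_word); createIndex alone does not put them
    esTemplate.putMapping(Article.class);
}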
Inserting fake data
@Resource
ArticleRepository articleRepository;

@Test
public void insertxx() {
    List<Article> list = new ArrayList<>();
    String[] titles = {"美妆", "护肤", "口红", "眼影", "粉底液", "遮瑕液", "眼霜", "亮白", "清洁", "肿眼泡", "发型", "大小眼","底妆","懂彩妆","爱种草","单眼皮",
            "礼物","好物分享","上新","内双"};
    String[] labels = {"脸基尼、DIY面膜…当代护肤迷惑行为大赏,笑skr人", "完子护肤Q&A:根据肤质选择适合自己的卸妆产品", "超全眼型大解析 | 眼妆一直画不好,原来是因为这个!",
            "分情况选择适合自己的面膜", "如何选择适合自己的卸妆产品", "学会堪比微整,化妆师最怕被偷师的7个技巧", "变美不用挨针!如何用最高性价比的单品,get水光肌?",
            "涨姿势 | 斩男必备的眼唇配色小心机,让你妆容美到犯规!", "全民沉迷哪吒仿妆?我却被这些超实用的清透眼妆圈粉了!", "敏感星人最爱的“全能”成分,补水修复舒缓一步到位!"
            ,"十分钟出门妆 | 直男无法分辨的超自然伪素颜!","如何用最高性价比的单品,get水光肌","气味贩卖 | 哪一种味道,能让ta立刻注意到你"
            ,"人无完人","拯救黄皮 | 什么妆容适合黄皮?看这篇教科书级示范~","6月减肉难题 | 带妆运动是作死还是精致,终于有了答案","当代忙碌女青年的夏天,请一切从简"
            ,"美到报警的12色动物盘,全4盘眼妆笔记"};
    // IdUtils.getId() and getRandom(n) are helper methods not shown in this post
    for (int i = 0; i < 100; i++) {
        Article article = new Article();
        article.setId(IdUtils.getId());
        // two random tag words joined by a comma
        article.setTitle(titles[getRandom(titles.length)] + "," + titles[getRandom(titles.length)]);
        article.setArticleId(IdUtils.getId());
        article.setLabel(labels[getRandom(labels.length)]);
        list.add(article);
    }
    // saveAll takes the whole collection and indexes it in one batch
    articleRepository.saveAll(list);
}
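The test above calls two helpers the post does not show, `getRandom` and `IdUtils.getId()`. Below is a minimal hypothetical version of each so the snippet can be filled in: `getRandom(n)` returns an index in `[0, n)`, and `getId()` hands out unique, increasing `long` ids. The class name `RandomHelper` is made up here, and although the ids in the results look like snowflake-style values, that is only a guess; this sketch simply seeds an atomic counter from the clock.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical implementation of the IdUtils helper referenced above.
class IdUtils {
    // Seeded from the clock so ids from separate runs are unlikely to collide.
    private static final AtomicLong SEQUENCE = new AtomicLong(System.currentTimeMillis() << 20);

    // Unique and increasing within one JVM only; not safe across several processes.
    static long getId() {
        return SEQUENCE.incrementAndGet();
    }
}

// Hypothetical home for the getRandom helper used by insertxx().
class RandomHelper {
    // Random index in [0, bound), suitable for picking an element from an array.
    static int getRandom(int bound) {
        return ThreadLocalRandom.current().nextInt(bound);
    }
}
```

For ids that must stay unique across several application instances, a real distributed id generator would be the safer choice.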
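Querying data
Let's simulate a user search. First we need a ranking rule, for example deciding whether the title or the label should count more. The query below gives label matches a boost of 5.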
@Test
public void a() {
    QueryBuilder queryBuilder = QueryBuilders.boolQuery()
            //.must(QueryBuilders.matchQuery("title", "鸡胸").minimumShouldMatch("100%"));
            .should(QueryBuilders.matchQuery("title", "斩男"))
            // label matches weigh 5 times as much as title matches (default boost is 1)
            .should(QueryBuilders.matchQuery("label", "斩男").boost(5));
    // match precision: QueryBuilders.matchQuery("category", "吃").minimumShouldMatch(...)
    SearchQuery searchQuery = new NativeSearchQueryBuilder()
            .withQuery(queryBuilder)
            .build();
    // run the search and get one page of results
    Page<Article> articles = this.articleRepository.search(searchQuery);
    // total number of hits
    long total = articles.getTotalElements();
    System.out.println("Total hits = " + total);
    // total number of pages
    System.out.println("Total pages = " + articles.getTotalPages());
    // current page (0-based)
    System.out.println("Current page: " + articles.getNumber());
    // page size
    System.out.println("Page size: " + articles.getSize());
    for (Article article : articles) {
        System.out.println(JSON.toJSON(article));
    }
}
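Because the fields are analyzed, minimumShouldMatch() controls how many of the query's terms must match. At "100%" every term has to appear in the field, which is the strictest form of a match query (though still not a phrase or exact-value match); lower values keep it a looser full-text search.
To search two fields and let their weights decide the ranking, combine them with QueryBuilders.boolQuery().should. If a clause uses must instead, every result has to match that clause, so documents that match only the other field are dropped and the weight on the other field loses most of its effect.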
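To see why `should` is needed here, a plain-Java sketch of which documents each clause type lets through when both fields use the same type: with `should` (and no `must`) a document needs to match at least one field, with `must` it has to match every field. Matching is reduced to a substring check on each field and the sample values are English placeholders, purely for illustration; Elasticsearch actually matches analyzed terms.

```java
import java.util.List;
import java.util.stream.Collectors;

class BoolClauseDemo {
    static class Doc {
        final String title;
        final String label;

        Doc(String title, String label) {
            this.title = title;
            this.label = label;
        }
    }

    // should-only bool query: a document qualifies if ANY clause matches (OR).
    static List<Doc> should(List<Doc> docs, String word) {
        return docs.stream()
                .filter(d -> d.title.contains(word) || d.label.contains(word))
                .collect(Collectors.toList());
    }

    // must bool query: a document qualifies only if EVERY clause matches (AND).
    static List<Doc> must(List<Doc> docs, String word) {
        return docs.stream()
                .filter(d -> d.title.contains(word) && d.label.contains(word))
                .collect(Collectors.toList());
    }
}
```

In the article data, the search word appears either in the title or in the label but never in both, so `must` on both fields would return nothing while `should` returns every article that mentions it somewhere.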
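Field weights
boost sets the weight of a clause (field). The default boost is 1, so boost(5) multiplies the score of label matches by 5 and makes them count far more than title matches.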
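The effect of `boost` can be sketched as a weighted sum: for a `should`-only bool query Elasticsearch adds up the scores of the clauses a document matched, and `boost` multiplies a clause's score. The per-field base scores in this sketch are made-up constants standing in for real BM25 scores, which vary with term frequency and field length, so it only shows the direction of the effect.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class BoostDemo {
    static class Hit {
        final String name;
        final double titleScore;  // hypothetical relevance of the title clause (0 = no match)
        final double labelScore;  // hypothetical relevance of the label clause (0 = no match)

        Hit(String name, double titleScore, double labelScore) {
            this.name = name;
            this.titleScore = titleScore;
            this.labelScore = labelScore;
        }

        // bool/should: sum of the matched clauses, each multiplied by its boost
        double score(double titleBoost, double labelBoost) {
            return titleBoost * titleScore + labelBoost * labelScore;
        }
    }

    // Document names ordered from highest to lowest combined score.
    static List<String> rank(List<Hit> hits, double titleBoost, double labelBoost) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> h.score(titleBoost, labelBoost)).reversed());
        List<String> names = new ArrayList<>();
        for (Hit h : sorted) {
            names.add(h.name);
        }
        return names;
    }
}
```

A title-only hit with base score 2.0 beats a label-only hit with base score 1.0 at equal boosts, but with `labelBoost = 5` the label-only hit scores 5.0 and moves to the top, which mirrors how the two result lists differ.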
Results when the label field has the higher weight
{"articleId":6568011881606291459,"id":6568011881606291458,"label":"涨姿势 | 斩男必备的眼唇配色小心机,让你妆容美到犯规!","title":"美妆,眼影"}
{"articleId":6568011881606291513,"id":6568011881606291512,"label":"涨姿势 | 斩男必备的眼唇配色小心机,让你妆容美到犯规!","title":"懂彩妆,粉底液"}
{"articleId":6568011881937641490,"id":6568011881937641489,"label":"涨姿势 | 斩男必备的眼唇配色小心机,让你妆容美到犯规!","title":"大小眼,发型"}
{"articleId":6568011881606291493,"id":6568011881606291492,"label":"十分钟出门妆 | 直男无法分辨的超自然伪素颜!","title":"发型,粉底液"}
{"articleId":6568011881610485773,"id":6568011881610485772,"label":"十分钟出门妆 | 直男无法分辨的超自然伪素颜!","title":"粉底液,懂彩妆"}
{"articleId":6568011881610485809,"id":6568011881610485808,"label":"十分钟出门妆 | 直男无法分辨的超自然伪素颜!","title":"亮白,口红"}
{"articleId":6568011881610485821,"id":6568011881610485820,"label":"十分钟出门妆 | 直男无法分辨的超自然伪素颜!","title":"上新,内双"}
{"articleId":6568045942815072257,"id":6568045942815072256,"label":"按司农爱上你才送","title":"斩男"}
Results when the title field has the higher weight
{"articleId":6568045942815072257,"id":6568045942815072256,"label":"按司农爱上你才送","title":"斩男"}
{"articleId":6568011881606291459,"id":6568011881606291458,"label":"涨姿势 | 斩男必备的眼唇配色小心机,让你妆容美到犯规!","title":"美妆,眼影"}
{"articleId":6568011881606291513,"id":6568011881606291512,"label":"涨姿势 | 斩男必备的眼唇配色小心机,让你妆容美到犯规!","title":"懂彩妆,粉底液"}
{"articleId":6568011881937641490,"id":6568011881937641489,"label":"涨姿势 | 斩男必备的眼唇配色小心机,让你妆容美到犯规!","title":"大小眼,发型"}
{"articleId":6568011881606291493,"id":6568011881606291492,"label":"十分钟出门妆 | 直男无法分辨的超自然伪素颜!","title":"发型,粉底液"}
{"articleId":6568011881610485773,"id":6568011881610485772,"label":"十分钟出门妆 | 直男无法分辨的超自然伪素颜!","title":"粉底液,懂彩妆"}
{"articleId":6568011881610485809,"id":6568011881610485808,"label":"十分钟出门妆 | 直男无法分辨的超自然伪素颜!","title":"亮白,口红"}
{"articleId":6568011881610485821,"id":6568011881610485820,"label":"十分钟出门妆 | 直男无法分辨的超自然伪素颜!","title":"上新,内双"}
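The order differs between the two runs because the fields are weighted differently: with the label boosted, the article whose only match is in the title comes last; with the title weighted higher, it comes first.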
Exact query
QueryBuilder queryBuilder = QueryBuilders.boolQuery()
        .should(QueryBuilders.matchQuery("title", "斩男"))
        // every analyzed term of "斩男" must appear in label for this clause to match
        .should(QueryBuilders.matchQuery("label", "斩男").minimumShouldMatch("100%"));
Results
{"articleId":6568045942815072257,"id":6568045942815072256,"label":"按司农爱上你才送","title":"斩男"}
{"articleId":6568011881606291459,"id":6568011881606291458,"label":"涨姿势 | 斩男必备的眼唇配色小心机,让你妆容美到犯规!","title":"美妆,眼影"}
{"articleId":6568011881606291513,"id":6568011881606291512,"label":"涨姿势 | 斩男必备的眼唇配色小心机,让你妆容美到犯规!","title":"懂彩妆,粉底液"}
{"articleId":6568011881937641490,"id":6568011881937641489,"label":"涨姿势 | 斩男必备的眼唇配色小心机,让你妆容美到犯规!","title":"大小眼,发型"}
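Note that `minimumShouldMatch("100%")` makes every analyzed term of the query required, but it does not care about their order or whether they are adjacent; that is what separates it from a phrase query (`QueryBuilders.matchPhraseQuery`). A small plain-Java sketch of the two checks, assuming the field and query have already been split into token lists in place of a real analyzer:

```java
import java.util.Collections;
import java.util.List;

class AllTermsVsPhraseDemo {
    // minimum_should_match 100%: every query term must appear somewhere in the field.
    static boolean allTermsMatch(List<String> fieldTokens, List<String> queryTokens) {
        return fieldTokens.containsAll(queryTokens);
    }

    // match_phrase (slop 0): the query terms must appear next to each other, in order.
    static boolean phraseMatches(List<String> fieldTokens, List<String> queryTokens) {
        return Collections.indexOfSubList(fieldTokens, queryTokens) >= 0;
    }
}
```

For a field tokenized as `[eye, and, lip, colors]`, the query `[lip, eye]` passes the all-terms check but fails the phrase check.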
With this, we have a weighted search driven by the words the user types.