SpringBoot --- Integrating Elasticsearch
- 1. Elasticsearch installation
- 1.1 Download and unzip the Windows release
- 1.2 Start
- 1.3 Access
- 1.4 Key concepts
- 2. Elasticsearch UIs
- 2.1 elasticsearch-head
- 2.2 ElasticHD
- 2.3 Kibana
- 3. Integrating Elasticsearch with Spring Boot
- 3.1 Official documentation
- 3.2 Dependency
- 3.3 properties
- 3.4 Code
- 3.5 Common errors
- 4. Building a cluster
- 4.1 Edit the configuration file
- 4.2 Changing the passwords (this step turned out not to be very useful)
- 4.3 Start the three nodes and check the status
- 4.4 Start Kibana for monitoring
- 5. Elasticsearch API
- 5.1 CRUD: Create
- 5.2 CRUD: Update
- 5.3 CRUD: Delete
- 5.4 CRUD: Retrieve
- 5.5 match queries
- 5.6 term queries
- 5.7 Sorting
- 5.8 Pagination
- 6. Boolean (bool) queries
- 6.1 must queries
- 6.2 should queries
- 6.3 must_not queries
- 6.4 filter queries
- 7. Filtering query results
- 8. Highlighting
- 9. Aggregation queries
- 9.1 avg
- 9.2 max
- 9.3 min
- 9.4 sum
- 9.5 range bucketing
- 10. Mapping & Dynamic Mapping
- 10.1 mapping
- 10.2 dynamic mapping
- 1 dynamic mapping
- 2 explicit mapping
- 3 strict mapping
- 4 Summary
- 10.3 Object fields
- 10.4 Controlling whether a field is indexed
- 10.5 Searching for null values
- 11. Index settings
- 12. Field data types
- 13. Cluster nodes
- 14. Analyzers and tokenization
- 14.1 Analyzers
- 1. Standard analyzer
- 2. Simple analyzer
- 3. Whitespace analyzer
- 4. Stop analyzer
- 5. Keyword analyzer
- 6. Pattern analyzer
- 7. Language and multi-language analyzers: chinese
- 8. Snowball analyzer
- 14.2 Character filters
- 1. HTML strip character filter
- 2. Mapping character filter
- 3. Pattern replace character filter
- 14.3 Tokenizers
- 1. Standard tokenizer
- 2. Keyword tokenizer
- 3. Letter tokenizer
- 4. Lowercase tokenizer
- 5. Whitespace tokenizer
- 6. Pattern tokenizer
- 7. UAX URL email tokenizer
- 8. Path hierarchy tokenizer
- 14.4 Token filters
- 1. A custom token filter
- 2. A custom lowercase token filter
- 3. Multiple token filters
- 14.5 The IK analyzer
- 1. Download
- 2. Introduction
- 3. Testing
- 15. Forward and inverted indexes
- 16. Data modeling
- 17. Securing intra-cluster communication
1. Elasticsearch installation
Item | Elasticsearch | Solr |
Real-time indexing | No thread blocking; performs better than Solr | Suffers from IO blocking |
Adding data dynamically | No impact on performance | Performance degrades |
Distribution | Distributed out of the box | Relies on ZooKeeper for cluster management |
Data formats | JSON only | XML, JSON, CSV, etc. |
Positioning | Better suited to modern real-time search applications | A solid solution for traditional applications |
Download page: the official Elasticsearch site. Prefer a slightly older release; newer versions tend to produce errors when integrating with Spring Boot.
- https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.16.2-windows-x86_64.zip.
- 9200 is the port for ES's external RESTful API
- 9300 is the port ES uses internally for node-to-node transport
1.1 Download and unzip the Windows release
1.2 Start
1.3 Access
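Once the node is up, you can check that it is reachable on the REST port (a minimal check, assuming the default address localhost:9200; the exact values in the response will differ):
GET http://localhost:9200
// abridged response
{
  "name" : "your-node-name",
  "cluster_name" : "elasticsearch",
  "version" : { "number" : "7.16.2" },
  "tagline" : "You Know, for Search"
}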
1.4 Key concepts
Name | Desc |
index | roughly a database in MySQL |
document | roughly a row in MySQL |
field | roughly a column in MySQL |
shards | sharded storage |
replicas | replicated copies for redundancy |
1. Document metadata
{
  "_index": "lsp",
  "_id": "1",
  "_score": 1,
  "_source": {
    "age": 30,
    "desc": "皮肤黑、武器长、性格直",
    "from": "gu",
    "name": "顾老二",
    "tags": [
      "黑",
      "长",
      "直"
    ]
  }
}
What each part means:
- _index: the name of the index the document belongs to
- _type: the name of the type the document belongs to
- _id: the unique identifier of the document
- _source: the original JSON of the document
- @version: the document version (useful for resolving conflicts under concurrent writes)
- _score: the relevance score computed for the search result
2. Indexes
Each index has its own mapping definition, which holds the field names and field types of all its documents.
* Shards represent the physical layout
* The data of an index is spread across its shards
* The mapping defines the types of the document fields
* The settings define how the data is distributed
3. Types
- In 5.x and earlier an index could have one or more types
- In 6.x an index can have only one type
- 7.x removed types; everything type-related became deprecated, and for compatibility during the transition all documents written in 7.x get the type _doc by default
- 8.x abandons types completely
2. Elasticsearch UIs
2.1 elasticsearch-head
Download
- Install Node.js (see its installation guide); verify with npm -v and node -v
- Install grunt: npm install -g grunt-cli, verify with grunt -v
- Download elasticsearch-head from https://github.com/mobz/elasticsearch-head, then cd into the folder and run npm install
Start it with npm run start or grunt server. The UI looks rather dated.
2.2 ElasticHD
1. Download
Download: https://github.com/360EntSecGroup-Skylar/ElasticHD/releases.
2. Start: either double-click the executable, or cd into the install directory and run ElasticHD -p 127.0.0.1:9800
2.3 Kibana
Download: https://www.elastic.co/start.
Important: the Kibana version must match the Elasticsearch version above.
Hover over the Windows link to see the download URL, then just edit the version in it.
- https://artifacts.elastic.co/downloads/kibana/kibana-7.15.2-linux-x86_64.tar.gz
- https://artifacts.elastic.co/downloads/kibana/kibana-7.16.2-windows-x86_64.zip
1. Double-click kibana.bat to start; by default it connects to elasticsearch:9200
2. Open http://localhost:5601 and log in with the password set earlier
For monitoring, see: using Kibana to monitor an ES cluster.
3. Integrating Elasticsearch with Spring Boot
3.1 Official documentation
Link: Spring Data Elasticsearch - Reference Documentation.
3.2 Dependency
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>
ElasticsearchRestTemplate wraps a RestHighLevelClient; the relevant source looks like this:
public class ElasticsearchRestTemplate extends AbstractElasticsearchTemplate {
private static final Logger LOGGER = LoggerFactory.getLogger(ElasticsearchRestTemplate.class);
private final RestHighLevelClient client;
private final ElasticsearchExceptionTranslator exceptionTranslator;
public ElasticsearchRestTemplate(RestHighLevelClient client) {
Assert.notNull(client, "Client must not be null!");
this.client = client;
this.exceptionTranslator = new ElasticsearchExceptionTranslator();
this.initialize(this.createElasticsearchConverter());
}
public ElasticsearchRestTemplate(RestHighLevelClient client, ElasticsearchConverter elasticsearchConverter) {
Assert.notNull(client, "Client must not be null!");
this.client = client;
this.exceptionTranslator = new ElasticsearchExceptionTranslator();
this.initialize(elasticsearchConverter);
}
}
3.3 properties
spring.data.elasticsearch.client.reactive.endpoints=127.0.0.1:9200
# create the index automatically if it does not exist
spring.data.elasticsearch.repositories.enabled=true
spring.data.elasticsearch.cluster-nodes=127.0.0.1:9300
3.4 Code
- @Document marks the class as an Elasticsearch entity
- indexName maps to the Elasticsearch index
- type maps to the Elasticsearch type (only present in older Spring Data Elasticsearch versions)
@Document(indexName = "product",type = "article")
@Data
public class Article {
@Id
private String id;
private String title;
@Field(type = FieldType.Nested, includeInParent = true)
private List<Author> authors;
}
@Data
@NoArgsConstructor
@AllArgsConstructor // the tests below construct authors via new Author("...")
public class Author {
    private String name;
}
@Repository
public interface ArticleRepository extends ElasticsearchRepository<Article,String> {
//The two queries below are equivalent: one uses a derived query, the other a hand-written query
Page<Article> findByAuthorsName(String name, Pageable pageable);
@Query("{\"bool\": {\"must\": [{\"match\": {\"authors.name\": \"?0\"}}]}}")
Page<Article> findByAuthorsNameUsingCustomQuery(String name, Pageable pageable);
//search on the title field
Page<Article> findByTitleIsContaining(String word,Pageable pageable);
Page<Article> findByTitle(String title,Pageable pageable);
}
@Autowired
private ArticleRepository articleRepository;
@Autowired
private ElasticsearchRestTemplate elasticsearchRestTemplate;
//Check whether the corresponding index exists; with spring.data.elasticsearch.repositories.enabled=true the index is created automatically
private boolean checkIndexExists(Class<?> cls){
boolean isExist = elasticsearchRestTemplate.indexOps(cls).exists();
//get the index name
String indexName = cls.getAnnotation(Document.class).indexName();
System.out.printf("index %s is %s\n", indexName, isExist ? "exist" : "not exist");
return isExist;
}
@Test
void test() {
checkIndexExists(Article.class);
}
@Test
void save(){
Article article = new Article();
article.setTitle("Spring Data Elasticsearch");
article.setAuthors(asList(new Author("LaoAlex"),new Author("John")));
articleRepository.save(article);
article = new Article();
article.setTitle("Spring Data Elasticsearch2");
article.setAuthors(asList(new Author("LaoAlex"),new Author("King")));
articleRepository.save(article);
article = new Article();
article.setTitle("Spring Data Elasticsearch3");
article.setAuthors(asList(new Author("LaoAlex"),new Author("Bill")));
articleRepository.save(article);
}
@Test
void queryAuthorName() throws JsonProcessingException {
Page<Article> articles = articleRepository.findByAuthorsName("LaoAlex", PageRequest.of(0,10));
//serialize the result to a JSON string
ObjectWriter objectWriter = new ObjectMapper().writer().withDefaultPrettyPrinter();
String json = objectWriter.writeValueAsString(articles);
System.out.println(json);
}
//use the custom @Query
@Test
void queryAuthorNameByCustom() throws JsonProcessingException {
Page<Article> articles = articleRepository.findByAuthorsNameUsingCustomQuery("John",PageRequest.of(0,10));
//serialize the result to a JSON string
ObjectWriter objectWriter = new ObjectMapper().writer().withDefaultPrettyPrinter();
String json = objectWriter.writeValueAsString(articles);
System.out.println(json);
}
//Keyword (regexp) query via ElasticsearchRestTemplate;
//regexpQuery is statically imported from org.elasticsearch.index.query.QueryBuilders
@Test
void queryTileContainByTemplate() throws JsonProcessingException {
Query query = new NativeSearchQueryBuilder().withFilter(regexpQuery("title",".*elasticsearch2.*")).build();
SearchHits<Article> articles = elasticsearchRestTemplate.search(query, Article.class, IndexCoordinates.of("product"));
//serialize the result to a JSON string
ObjectWriter objectWriter = new ObjectMapper().writer().withDefaultPrettyPrinter();
String json = objectWriter.writeValueAsString(articles);
System.out.println(json);
}
@Test
void update() throws JsonProcessingException {
Page<Article> articles = articleRepository.findByTitle("Spring Data Elasticsearch",PageRequest.of(0,10));
//serialize the result to a JSON string
ObjectWriter objectWriter = new ObjectMapper().writer().withDefaultPrettyPrinter();
String json = objectWriter.writeValueAsString(articles);
System.out.println(json);
Article article = articles.getContent().get(0);
System.out.println(article);
article.setAuthors(null);
articleRepository.save(article);
}
@Test
void delete(){
Page<Article> articles = articleRepository.findByTitle("Spring Data Elasticsearch",PageRequest.of(0,10));
Article article = articles.getContent().get(0);
articleRepository.delete(article);
}
3.5 Common errors
1. no id property found for class
//The error means the entity class used for ES mapping has a problem; there are two ways to fix it
//1. Rename the primary-key field to id
@Data
@Document(indexName = "spring.student")
public class Student {
private int id;
private String stuName;
private String stuAddress;
private String gender;
}
//2. If the primary-key field is not named id, annotate it with @Id
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
@Data
@Document(indexName = "spring.student")
public class Student {
@Id
private int stuId;
private String stuName;
private String stuAddress;
private String gender;
}
4. Building a cluster
4.1 Edit the configuration file
# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
# Before you set out to tweak and tune the configuration, make sure you
# understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
# the same cluster name on all three nodes
cluster.name: my-application
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
# node name within the cluster (the three nodes are node-1, node-2, node-3)
node.name: node-1
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#path.data: /path/to/data
#
# Path to log files:
#
#path.logs: /path/to/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# By default Elasticsearch is only accessible on localhost. Set a different
# address here to expose this node on the network:
#
network.host: 127.0.0.1
#
# By default Elasticsearch listens for HTTP traffic on the first free port it
# finds starting at 9200. Set a specific HTTP port here:
# change the ports per node: (9201 9301), (9202 9302), (9203 9303)
http.port: 9201
transport.tcp.port: 9301
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#discovery.seed_hosts: ["host1", "host2"]
discovery.zen.ping.unicast.hosts: ["127.0.0.1:9301","127.0.0.1:9302","127.0.0.1:9303"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
# identical on all three nodes; node-1 is the initial master
cluster.initial_master_nodes: ["node-1"]
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true
#
# ---------------------------------- Security ----------------------------------
#
# *** WARNING ***
#
# Elasticsearch security features are not enabled by default.
# These features are free, but require configuration changes to enable them.
# This means that users don’t have to provide credentials and can get full access
# to the cluster. Network connections are also not encrypted.
#
# To protect your data, we strongly encourage you to enable the Elasticsearch security features.
# Refer to the following documentation for instructions.
#
# https://www.elastic.co/guide/en/elasticsearch/reference/7.16/configuring-stack-security.html
#allow origin
http.cors.enabled: true
http.cors.allow-origin: "*"
http.cors.allow-headers: Authorization
4.2 Changing the passwords (this step turned out not to be very useful)
The default credentials are said to be the following, although they did not work for me:
- username=elastic
- password=changeme
cd into the bin directory and run elasticsearch-setup-passwords (or open cmd straight from the address bar): D:\Tools\es_cluster\elasticsearch-7.16.2\bin>elasticsearch-setup-passwords interactive
You will then be prompted to set the password for every built-in user; all of them must be set.
- elas123
4.3 Start the three nodes and check the status
1. A node starting successfully does not mean the cluster formed successfully.
2. Call the API below to check the cluster health:
- http://localhost:9203/_cat/health?v
3. Cluster status endpoints:
- http://ip:port/_cluster/health
- http://ip:port/_cat/nodes
- http://ip:port/_cat/shards
GET http://localhost:9203/_cluster/health
{
"cluster_name": "my-application",
"status": "green",
"timed_out": false,
"number_of_nodes": 3,
"number_of_data_nodes": 3,
"active_primary_shards": 8,
"active_shards": 16,
"relocating_shards": 0,
"initializing_shards": 0,
"unassigned_shards": 0,
"delayed_unassigned_shards": 0,
"number_of_pending_tasks": 0,
"number_of_in_flight_fetch": 0,
"task_max_waiting_in_queue_millis": 0,
"active_shards_percent_as_number": 100.0
}
What the colors mean (if the status is not green, see the sketch below):
- green: all primary and replica shards are allocated
- yellow: all primary shards are allocated, but one or more replica shards are not
- red: one or more primary shards are unallocated
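When the status is yellow or red, the allocation explain API is a quick way to find out why a shard is unassigned (a minimal sketch; called without a body it reports on the first unassigned shard it finds):
GET _cluster/allocation/explain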
4.4 Start Kibana for monitoring
1. Edit the configuration file
# Kibana is served by a back end server. This setting specifies the port to use.
server.port: 5601
# Specifies the address to which the Kibana server will bind. IP addresses and host names are both valid values.
# The default is 'localhost', which usually means remote machines will not be able to connect.
# To allow connections from remote users, set this parameter to a non-loopback address.
server.host: "127.0.0.1"
# Enables you to specify a path to mount Kibana at if you are running behind a proxy.
# Use the `server.rewriteBasePath` setting to tell Kibana if it should remove the basePath
# from requests it receives, and to prevent a deprecation warning at startup.
# This setting cannot end in a slash.
#server.basePath: ""
# Specifies whether Kibana should rewrite requests that are prefixed with
# `server.basePath` or require that they are rewritten by your reverse proxy.
# This setting was effectively always `false` before Kibana 6.3 and will
# default to `true` starting in Kibana 7.0.
#server.rewriteBasePath: false
# Specifies the public URL at which Kibana is available for end users. If
# `server.basePath` is configured this URL should end with the same basePath.
#server.publicBaseUrl: ""
# The maximum payload size in bytes for incoming server requests.
#server.maxPayload: 1048576
# The Kibana server's name. This is used for display purposes.
#server.name: "your-hostname"
# The URLs of the Elasticsearch instances to use for all your queries.
#elasticsearch.hosts: ["http://localhost:9200"]
elasticsearch.hosts: ["http://localhost:9201","http://localhost:9202","http://localhost:9203"]
2. Start
(A Kibana screenshot is missing here.)
5. Elasticsearch API
Seed some test data:
PUT users/_doc/1
{
"name":"张飞",
"age":30,
"from": "China",
"desc": "皮肤黑、武器重、性格直",
"tags": ["黑", "重", "直"]
}
PUT users/_doc/2
{
"name":"赵云",
"age":18,
"from":"China",
"desc":"帅气逼人,一身白袍",
"tags":["帅", "白"]
}
PUT users/_doc/3
{
"name":"关羽",
"age":22,
"from":"England",
"desc":"大刀重,骑赤兔马,胡子长",
"tags":["重", "马","长"]
}
PUT users/_doc/4
{
"name":"刘备",
"age":29,
"from":"Child",
"desc":"大耳贼,持双剑,懂谋略",
"tags":["剑", "大"]
}
PUT users/_doc/5
{
"name":"貂蝉",
"age":25,
"from":"England",
"desc":"闭月羞花,沉鱼落雁",
"tags":["闭月","羞花"]
}
5.1 CRUD: Create
Notice: with a PUT, if the document does not exist it is created; if it already exists it is overwritten. Two ways of creating documents are shown below, plus a create-only variant after them.
//POST to the _doc endpoint auto-generates the document id
POST users/_doc
{
"user": "Mike",
"post_date": "2020-10-24T14:39:30",
"message": "trying out kibana"
}
POST assigns a random id, so the PUT form below with an explicit id is usually preferable.
PUT users/_doc/1
{
"user": "Jack",
"post_date": "2020-10-24T14:39:30",
"message": "trying out Elasticsearch"
}
PUT users/_doc/2
{
"user": "Ludy",
"post_date": "2020-10-24T14:39:30",
"message": "trying out Elasticsearch"
}
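If you want create-only semantics, i.e. the request should fail instead of overwriting an existing document, the _create endpoint can be used (a small sketch; the id 3 is arbitrary):
PUT users/_create/3
{
  "user": "Tom",
  "post_date": "2020-10-24T14:39:30",
  "message": "only stored if no document with id 3 exists yet"
}
// if id 3 already exists, this returns a 409 version_conflict_engine_exception instead of overwriting it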
Retrieve a document by id x:
GET users/_doc/x
5.2 CRUD: Update
POST users/_doc/3/_update
{
"doc": {
"post_date": "2020-10-24T14:39:30",
"message": "trying out Elasticsearch"
}
}
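The typed form above still works on 7.x but is deprecated; the untyped equivalent of the same partial update looks like this (a sketch):
POST users/_update/3
{
  "doc": {
    "post_date": "2020-10-24T14:39:30",
    "message": "trying out Elasticsearch"
  }
}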
5.3 CRUD: Delete
DELETE users/_doc/4
5.4 CRUD: Retrieve
Querying with ElasticHD; demo:
- index spring.student
- index spring.test
- the type defaults to _doc
- GET /spring.student/_search (query everything in the index)
- GET /spring.student/_search?q=id:1 (query within a single index)
- GET /spring.student,spring.test/_search?q=id:1 (query across multiple indexes)
5.5 match queries
GET users/_doc/_search
{
"query": {
"match": {
"post_date": "2020-10-24T14:39:30"
}
}
}
5.6 term queries
GET users/_doc/_search
{
"query": {
"term": {
"t1": "Beautiful girl!"
}
}
}
5.7 Sorting
Sortable field types:
- numbers
- dates
GET users/_doc/_search
{
"query": {
"match": {
"post_date": "2020-10-24T14:39:30"
}
},
"sort": [
{
"id": {
"order": "desc"
}
}
]
}
5.8 Pagination
GET users/_doc/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"age": {
"order": "desc"
}
}
],
"from": 2,
"size": 1
}
6. Boolean (bool) queries
Keyword | Meaning |
must | and |
should | or |
must_not | not |
filter | used together with must; filters without affecting the score |
range | a range condition |
gt | greater than, like > in a relational database |
gte | greater than or equal, like >= |
lt | less than, like < |
lte | less than or equal, like <= |
6.1 must queries
//a single condition
GET lqz/_doc/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"from": "gu"
}
}
]
}
}
}
//two conditions
GET lqz/_doc/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"from": "gu"
}
},
{
"match": {
"age": 30
}
}
]
}
}
}
6.2 should queries
GET lqz/_doc/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"from": "gu"
}
},
{
"match": {
"tags": "闭月"
}
}
]
}
}
}
6.3 must_not queries
//exclude documents that match any of the three conditions
GET lqz/_doc/_search
{
"query": {
"bool": {
"must_not": [
{
"match": {
"from": "gu"
}
},
{
"match": {
"tags": "可爱"
}
},
{
"match": {
"age": 18
}
}
]
}
}
}
6.4 filter queries
//how to find documents where from is gu and age is greater than 25
GET lqz/_doc/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"from": "gu"
}
}
],
"filter": {
"range": {
"age": {
"gt": 25
}
}
}
}
}
}
7. Filtering query results
//of all the fields, I only want name and age in the results
GET lqz/_doc/_search
{
"query": {
"match": {
"name": "顾老二"
}
},
"_source": ["name", "age"]
}
8. Highlighting
GET lqz/_doc/_search
{
"query": {
"match": {
"name": "石头"
}
},
"highlight": {
"fields": {
"name": {}
}
}
}
Custom highlighting with a <b> tag:
GET lqz/chengyuan/_search
{
"query": {
"match": {
"from": "gu"
}
},
"highlight": {
"pre_tags": "<b class='key' style='color:red'>",
"post_tags": "</b>",
"fields": {
"from": {}
}
}
}
9. Aggregation queries
Aggregation functions:
- avg
- max
- min
- sum
Aggregations are written inside aggs; my_xxx is just an alias you give the result, and the computed value is returned under that name.
"aggregations" : {
"my_avg" : {
"value" : 27.0
}
}
"aggregations" : {
"my_max" : {
"value" : 30.0
}
}
.......
9.1 avg
GET users/_doc/_search
{
"query": {
"match": {
"from": "gu"
}
},
"aggs": {
"my_avg": {
"avg": {
"field": "age"
}
}
},
"_source": ["name", "age"]
}
GET lqz/_doc/_search
{
"query": {
"match": {
"from": "gu"
}
},
"aggs": {
"my_avg": {
"avg": {
"field": "age"
}
}
},
"size": 0,
"_source": ["name", "age"]
}
9.2 max
GET lqz/_doc/_search
{
"query": {
"match": {
"from": "gu"
}
},
"aggs": {
"my_max": {
"max": {
"field": "age"
}
}
},
"size": 0
}
9.3 min
GET lqz/_doc/_search
{
"query": {
"match": {
"from": "gu"
}
},
"aggs": {
"my_min": {
"min": {
"field": "age"
}
}
},
"size": 0
}
9.4 sum
GET lqz/_doc/_search
{
"query": {
"match": {
"from": "gu"
}
},
"aggs": {
"my_sum": {
"sum": {
"field": "age"
}
}
},
"size": 0
}
9.5 range bucketing
GET lqz/_doc/_search
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"age_group": {
"range": {
"field": "age",
"ranges": [
{
"from": 15,
"to": 20
},
{
"from": 20,
"to": 25
},
{
"from": 25,
"to": 30
}
]
}
}
}
}
Combining the two: bucket by age range and compute the average age within each bucket.
GET lqz/_doc/_search
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"age_group": {
"range": {
"field": "age",
"ranges": [
{
"from": 15,
"to": 20
},
{
"from": 20,
"to": 25
},
{
"from": 25,
"to": 30
}
]
},
"aggs": {
"my_avg": {
"avg": {
"field": "age"
}
}
}
}
}
}
10. Mapping & Dynamic Mapping
10.1 mapping
GET index_name/_mapping
//a quick first look
PUT users/_doc/1
{
"user": "Jack",
"post_date": "2020-10-24T14:39:30",
"message": "trying out Elasticsearch"
}
GET users/_mapping
{
"users" : {
"mappings" : {
"properties" : {
"message" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"post_date" : {
"type" : "date"
},
"user" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
Every index has a single mapping type (strictly speaking, this only holds from Elasticsearch 6.x onwards). A mapping type consists of:
- meta-fields: customize how the metadata associated with a document is handled, for example _index, _type, _id and _source.
- fields or properties: the list of fields relevant to the documents.
Field mappings support many parameters, including the following (a small example follows this list):
- analyzer: the analyzer to use; only text fields support it.
- enabled: if false, the data is only stored (in _source) and cannot be searched or aggregated. Defaults to true.
- index: whether to build an inverted index for the field. If false, no inverted index is built (saving space) and the field cannot be searched, but it can still be aggregated and still appears in _source. Defaults to true.
- norms: whether the field supports relevance scoring. If the field is only used for filtering and aggregations and never needs scoring, set it to false to save space. Defaults to true.
- doc_values: if you are sure you never need to sort or aggregate on the field, nor access it from scripts, set it to false to save disk space. Defaults to true.
- fielddata: set it to true if you need to sort or aggregate on a text field. Defaults to false.
- store: defaults to false, with the data kept in _source. By default field values are indexed so they can be searched, but they are not stored separately, so they cannot be retrieved on their own. Storing a field makes sense in some cases, for example a document with a title, a date and a very large content field, where you only want to retrieve the title and date without pulling them out of a huge _source.
- boost: boosts the field's score.
- coerce: whether to enable automatic type conversion, e.g. string to number. Enabled by default.
- dynamic: controls automatic mapping updates; true, false or strict.
- eager_global_ordinals
- fields: the multi-field feature; lets one field have several sub-fields so it can be indexed in several different ways.
- copy_to
- format
- ignore_above
- ignore_malformed
- index_options
- index_phrases
- index_prefixes
- meta
- normalizer
- null_value: the value to use for null.
- position_increment_gap
- properties
- search_analyzer
- similarity
- term_vector
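As a small illustration of a few of these parameters, here is a hedged sketch of an explicit mapping (the index name params_demo and its fields are invented for this example):
PUT params_demo
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "standard",
        "fields": {
          "raw": { "type": "keyword", "ignore_above": 256 }
        }
      },
      "internal_code": {
        "type": "keyword",
        "index": false
      },
      "mobile": {
        "type": "keyword",
        "null_value": "NULL"
      }
    }
  }
}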
10.2 dynamic mapping
How dynamic mapping behaves:
- You do not need to define mappings by hand; ES infers the field types from the documents it receives.
- The inference is sometimes wrong, for example for geo-location data.
- A wrongly inferred type breaks some features, for example range queries.
There are three modes for handling new fields:
- dynamic mapping
- explicit mapping
- strict mapping
1 dynamic mapping
//create the index nolan
PUT nolan
{
"mappings": {
"properties": {
"name": {
"type": "text"
},
"age": {
"type": "long"
}
}
}
}
//inspect the index nolan
GET nolan
{
"nolan" : {
"aliases" : { },
"mappings" : {
"properties" : {
"age" : {
"type" : "long"
},
"name" : {
"type" : "text"
}
}
},
"settings" : {
"index" : {
"routing" : {
"allocation" : {
"include" : {
"_tier_preference" : "data_content"
}
}
},
"number_of_shards" : "1",
"provided_name" : "nolan",
"creation_date" : "1650261517356",
"number_of_replicas" : "1",
"uuid" : "dXUnwua2TDCI2K9hcSL98A",
"version" : {
"created" : "7160299"
}
}
}
}
}
//insert a document
PUT nolan/_doc/1
{
"name": "小黑",
"age": 18,
"sex": "不详"
}
//inspect the index again
{
"nolan" : {
"aliases" : { },
"mappings" : {
"properties" : {
"age" : {
"type" : "long"
},
"name" : {
"type" : "text"
},
"sex" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
},
"settings" : {
"index" : {
"routing" : {
"allocation" : {
"include" : {
"_tier_preference" : "data_content"
}
}
},
"number_of_shards" : "1",
"provided_name" : "nolan",
"creation_date" : "1650261517356",
"number_of_replicas" : "1",
"uuid" : "dXUnwua2TDCI2K9hcSL98A",
"version" : {
"created" : "7160299"
}
}
}
}
}
From the example above you can see that:
Elasticsearch dynamically added a mapping for the new sex field.
By default Elasticsearch allows new fields to be added, i.e. dynamic: true.
//creating the index above is in fact equivalent to this
PUT nolan
{
"mappings": {
"dynamic":true,
"properties": {
"name": {
"type": "text"
},
"age": {
"type": "long"
}
}
}
}
2 explicit mapping
Set dynamic to false:
//create the index nolan1
PUT nolan1
{
"mappings": {
"dynamic": "false",
"properties": {
"name": {
"type": "text"
},
"age": {
"type": "long"
}
}
}
}
//insert documents
PUT nolan1/_doc/1
{
"name": "小黑",
"age":18
}
PUT nolan1/_doc/2
{
"name": "小白",
"age": 16,
"sex": "不详"
}
//inspect the mapping
GET nolan1
{
"nolan1" : {
"aliases" : { },
"mappings" : {
"dynamic" : "false",
"properties" : {
"age" : {
"type" : "long"
},
"name" : {
"type" : "text"
}
}
},
"settings" : {
"index" : {
"routing" : {
"allocation" : {
"include" : {
"_tier_preference" : "data_content"
}
}
},
"number_of_shards" : "1",
"provided_name" : "nolan1",
"creation_date" : "1650262553126",
"number_of_replicas" : "1",
"uuid" : "OuUWSsQ-SUaGr2ged3FRbQ",
"version" : {
"created" : "7160299"
}
}
}
}
}
You can see that Elasticsearch did not create a mapping for the newly added sex field.
Because dynamic is false, when Elasticsearch encounters a new field it
ignores it for indexing purposes, but the field is still stored in _source.
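You can confirm this with a search on the unmapped field: because sex has no mapping, the query below should return no hits, even though the value is still visible in the _source of document 2 (a quick check, not part of the original example):
GET nolan1/_search
{
  "query": {
    "match": {
      "sex": "不详"
    }
  }
}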
3 strict mapping
Set dynamic to strict:
//create the index nolan2
PUT nolan2
{
"mappings": {
"dynamic": "strict",
"properties": {
"name": {
"type": "text"
},
"age": {
"type": "long"
}
}
}
}
//insert documents; the second request will fail with an error
PUT nolan2/_doc/1
{
"name": "小黑",
"age":18
}
PUT nolan2/_doc/2
{
"name": "小白",
"age": 16,
"sex": "不详"
}
As soon as an unknown field is encountered, an exception is thrown.
4 Summary
Name | Setting | Behaviour |
Dynamic mapping | dynamic: true | New fields are added to the mapping automatically (the default) |
Explicit mapping | dynamic: false | New fields are ignored: no mapping is created for them and they cannot be searched, but they still appear in the results via _source |
Strict mapping | dynamic: strict | Any new field causes an exception |
Explicit mapping (dynamic: false) is the most commonly used in practice, much like an HTML img tag where you add an id or class attribute only when you need it.
10.3 Object fields
//documents with a nested object field (create the nolan2 mapping below before running these inserts)
PUT nolan2/_doc/1
{
"name":"tom",
"age":18,
"info":{
"addr":"北京",
"tel":"10010"
}
}
PUT nolan2/_doc/21
{
"name":"jim",
"age":21,
"info":{
"addr":"东莞",
"tel":"10086"
}
}
//create the index nolan2 with an object field (delete any existing nolan2 first)
PUT nolan2
{
"mappings": {
"dynamic": false,
"properties": {
"name": {
"type": "text"
},
"age": {
"type": "text"
},
"info": {
"properties": {
"addr": {
"type": "text"
},
"tel": {
"type" : "text"
}
}
}
}
}
}
GET nolan2/_doc/_search
{
"query": {
"match": {
"info.tel": "10086"
}
}
}
10.4 Controlling whether a field is indexed
The index parameter:
- the age field will not be indexed (a query against it then fails, as sketched after the mapping)
PUT nolan3
{
"mappings": {
"dynamic": false,
"properties": {
"name": {
"type": "text",
"index": true
},
"age": {
"type": "long",
"index": false
}
}
}
}
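With index set to false, age can no longer be used as a search condition; a query like the sketch below is expected to fail with an error along the lines of "Cannot search on field [age] since it is not indexed" (the exact wording depends on the version):
GET nolan3/_search
{
  "query": {
    "term": {
      "age": 18
    }
  }
}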
10.5 Searching for null values
1. keyword fields support null_value (usage is sketched after the mapping)
PUT users
{
"mappings" : {
"properties" : {
"firstName" : {
"type" : "text"
},
"lastName" : {
"type" : "text"
},
"mobile" : {
"type" : "keyword",
"null_value": "NULL"
}
}
}
}
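With null_value configured, an explicit null is indexed as the placeholder string NULL and therefore becomes searchable; a minimal sketch of how that would be used (assuming the users index was just created with the mapping above):
PUT users/_doc/1
{
  "firstName": "Zhang",
  "lastName": "San",
  "mobile": null
}
GET users/_search
{
  "query": {
    "term": {
      "mobile": "NULL"
    }
  }
}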
2. ignore_above
//create the index
PUT nolan
{
"mappings": {
"properties":{
"t1":{
"type":"keyword",
"ignore_above": 5
},
"t2":{
"type":"keyword",
"ignore_above": 10 ①
}
}
}
}
//insert a document
PUT nolan/_doc/1
{
"t1":"elk", ②
"t2":"elasticsearch" ③
}
//query ④
GET nolan/_doc/_search
{
"query":{
"term": {
"t1": "elk"
}
}
}
//query ⑤
GET nolan/_doc/_search
{
"query": {
"term": {
"t2": "elasticsearch"
}
}
}
- ① the field will ignore any string longer than 10 characters
- ② the document is indexed successfully, i.e. it can be searched and results are returned
- ③ this value is not indexed; using this field as a search condition returns no results
- ④ returns results
- ⑤ returns no results, because the value of t2 is longer than the ignore_above limit
11. Index settings
Set the number of primary and replica shards:
PUT nolan
{
"mappings": {
"properties": {
"name": {
"type": "text"
}
}
},
"settings": {
"number_of_replicas": 1,
"number_of_shards": 5
}
}
- number_of_shards is the number of primary shards (older versions defaulted to 5 per index; since 7.0 the default is 1)
- number_of_replicas is the number of replica shards; by default each primary shard gets one replica. The replica count can be changed later, as sketched below.
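The primary shard count is fixed once the index exists, but the replica count can be changed at any time through the settings API (a sketch, reusing the nolan index from above):
PUT nolan/_settings
{
  "number_of_replicas": 2
}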
12. Field data types
Simple types:
- Numeric
- Boolean
- Date
- Text
- Keyword
- Binary
- etc.
Complex types:
- Object
- Arrays
- Nested: a kind of object data type.
- Join: defines parent/child relationships between documents in the same index.
Special types:
- Geo-point
- Geo-shape
- Percolator
A small mapping sketch that combines several of these types follows.
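A hedged sketch of such a mapping (the index name types_demo and all field names are invented for illustration):
PUT types_demo
{
  "mappings": {
    "properties": {
      "title":    { "type": "text" },
      "sku":      { "type": "keyword" },
      "price":    { "type": "double" },
      "in_stock": { "type": "boolean" },
      "created":  { "type": "date" },
      "location": { "type": "geo_point" },
      "variants": {
        "type": "nested",
        "properties": {
          "color": { "type": "keyword" },
          "size":  { "type": "keyword" }
        }
      }
    }
  }
}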
13. Cluster nodes
1. Master-eligible nodes and the master node. Every node is master-eligible by default when it starts. Master-eligible nodes can take part in the master election and become the master node. When the first node starts, it elects itself as master.
Only the master node may modify the cluster state. The cluster state holds the information the cluster needs (a quick way to inspect it is sketched below):
- information about all nodes
- all indexes together with their mappings and settings
- the shard routing table
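The cluster state can be inspected through the REST API; restricting the metrics keeps the response manageable (a quick sketch):
GET _cluster/state/nodes,metadata,routing_table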
2. Data nodes & coordinating nodes
- Data node: a node that can hold data; it is responsible for storing shard data and is key to scaling data horizontally.
- Coordinating node: receives client requests, forwards them to the appropriate nodes and merges the partial results into the final response.
3. Shards (primary shard & replica shard)
- Primary shards solve horizontal scaling: through them an index's data is distributed across all nodes of the cluster. Each shard is a running Lucene instance. The number of primary shards is fixed when the index is created and cannot be changed afterwards, short of a reindex.
- Replica shards solve high availability: they are copies of primary shards. The replica count can be adjusted dynamically, and increasing it can also improve availability and read throughput.
14. Analyzers and tokenization
After data is sent to Elasticsearch, it goes through a series of steps:
- Character filtering: character filters transform the raw characters.
- Tokenization: the text is split into one or more tokens.
- Token filtering: token filters transform each token.
- Indexing: finally the tokens are stored in Lucene's inverted index.
14.1 Analyzers
In Elasticsearch an analyzer consists of (a combined example follows this list):
- optional character filters
- exactly one tokenizer
- zero or more token filters
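The three building blocks can be combined into a custom analyzer. A minimal sketch (the index name analyzer_demo is invented) with one character filter, one tokenizer and one token filter:
PUT analyzer_demo
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip"],
          "tokenizer": "standard",
          "filter": ["lowercase"]
        }
      }
    }
  }
}
POST analyzer_demo/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": "<p>To BE or NOT to be</p>"
}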
1. Standard analyzer
The standard analyzer is Elasticsearch's default. It combines defaults that are reasonable for most European languages: the standard tokenizer, the standard token filter, the lowercase token filter and a stop-word token filter (whose stop-word list is empty by default).
POST _analyze
{
"analyzer": "standard",
"text":"To be or not to be, That is a question ———— 莎士比亚"
}
// tokenization result
{
"tokens" : [
{
"token" : "to",
"start_offset" : 0,
"end_offset" : 2,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "be",
"start_offset" : 3,
"end_offset" : 5,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "or",
"start_offset" : 6,
"end_offset" : 8,
"type" : "<ALPHANUM>",
"position" : 2
},
{
"token" : "not",
"start_offset" : 9,
"end_offset" : 12,
"type" : "<ALPHANUM>",
"position" : 3
},
{
"token" : "to",
"start_offset" : 13,
"end_offset" : 15,
"type" : "<ALPHANUM>",
"position" : 4
},
{
"token" : "be",
"start_offset" : 16,
"end_offset" : 18,
"type" : "<ALPHANUM>",
"position" : 5
},
{
"token" : "that",
"start_offset" : 21,
"end_offset" : 25,
"type" : "<ALPHANUM>",
"position" : 6
},
{
"token" : "is",
"start_offset" : 26,
"end_offset" : 28,
"type" : "<ALPHANUM>",
"position" : 7
},
{
"token" : "a",
"start_offset" : 29,
"end_offset" : 30,
"type" : "<ALPHANUM>",
"position" : 8
},
{
"token" : "question",
"start_offset" : 31,
"end_offset" : 39,
"type" : "<ALPHANUM>",
"position" : 9
},
{
"token" : "莎",
"start_offset" : 45,
"end_offset" : 46,
"type" : "<IDEOGRAPHIC>",
"position" : 10
},
{
"token" : "士",
"start_offset" : 46,
"end_offset" : 47,
"type" : "<IDEOGRAPHIC>",
"position" : 11
},
{
"token" : "比",
"start_offset" : 47,
"end_offset" : 48,
"type" : "<IDEOGRAPHIC>",
"position" : 12
},
{
"token" : "亚",
"start_offset" : 48,
"end_offset" : 49,
"type" : "<IDEOGRAPHIC>",
"position" : 13
}
]
}
2. Simple analyzer
The simple analyzer only applies lowercase tokenization, i.e. it splits on non-letter characters and lowercases each token. It works poorly for Asian languages, which are not whitespace-delimited, so it is generally used for European languages.
POST _analyze
{
"analyzer": "simple",
"text":"To be or not to be, That is a question ———— 莎士比亚"
}
// tokenization result
{
"tokens" : [
{
"token" : "to",
"start_offset" : 0,
"end_offset" : 2,
"type" : "word",
"position" : 0
},
{
"token" : "be",
"start_offset" : 3,
"end_offset" : 5,
"type" : "word",
"position" : 1
},
{
"token" : "or",
"start_offset" : 6,
"end_offset" : 8,
"type" : "word",
"position" : 2
},
{
"token" : "not",
"start_offset" : 9,
"end_offset" : 12,
"type" : "word",
"position" : 3
},
{
"token" : "to",
"start_offset" : 13,
"end_offset" : 15,
"type" : "word",
"position" : 4
},
{
"token" : "be",
"start_offset" : 16,
"end_offset" : 18,
"type" : "word",
"position" : 5
},
{
"token" : "that",
"start_offset" : 21,
"end_offset" : 25,
"type" : "word",
"position" : 6
},
{
"token" : "is",
"start_offset" : 26,
"end_offset" : 28,
"type" : "word",
"position" : 7
},
{
"token" : "a",
"start_offset" : 29,
"end_offset" : 30,
"type" : "word",
"position" : 8
},
{
"token" : "question",
"start_offset" : 31,
"end_offset" : 39,
"type" : "word",
"position" : 9
},
{
"token" : "莎士比亚",
"start_offset" : 45,
"end_offset" : 49,
"type" : "word",
"position" : 10
}
]
}
3. Whitespace analyzer
The whitespace analyzer simply splits the text into tokens on whitespace. As lazy as it sounds!
POST _analyze
{
"analyzer": "whitespace",
"text":"To be or not to be, That is a question ———— 莎士比亚"
}
// tokenization result
{
"tokens" : [
{
"token" : "To",
"start_offset" : 0,
"end_offset" : 2,
"type" : "word",
"position" : 0
},
{
"token" : "be",
"start_offset" : 3,
"end_offset" : 5,
"type" : "word",
"position" : 1
},
{
"token" : "or",
"start_offset" : 6,
"end_offset" : 8,
"type" : "word",
"position" : 2
},
{
"token" : "not",
"start_offset" : 9,
"end_offset" : 12,
"type" : "word",
"position" : 3
},
{
"token" : "to",
"start_offset" : 13,
"end_offset" : 15,
"type" : "word",
"position" : 4
},
{
"token" : "be,",
"start_offset" : 16,
"end_offset" : 19,
"type" : "word",
"position" : 5
},
{
"token" : "That",
"start_offset" : 21,
"end_offset" : 25,
"type" : "word",
"position" : 6
},
{
"token" : "is",
"start_offset" : 26,
"end_offset" : 28,
"type" : "word",
"position" : 7
},
{
"token" : "a",
"start_offset" : 29,
"end_offset" : 30,
"type" : "word",
"position" : 8
},
{
"token" : "question",
"start_offset" : 31,
"end_offset" : 39,
"type" : "word",
"position" : 9
},
{
"token" : "————",
"start_offset" : 40,
"end_offset" : 44,
"type" : "word",
"position" : 10
},
{
"token" : "莎士比亚",
"start_offset" : 45,
"end_offset" : 49,
"type" : "word",
"position" : 11
}
]
}
4. Stop analyzer
The stop analyzer behaves much like the simple analyzer, except that it additionally filters stop words out of the token stream.
POST _analyze
{
"analyzer": "stop",
"text":"To be or not to be, That is a question ———— 莎士比亚"
}
{
"tokens" : [
{
"token" : "question",
"start_offset" : 31,
"end_offset" : 39,
"type" : "word",
"position" : 9
},
{
"token" : "莎士比亚",
"start_offset" : 45,
"end_offset" : 49,
"type" : "word",
"position" : 10
}
]
}
5. Keyword analyzer
The keyword analyzer treats the whole field as a single token. Unless there is a good reason, we do not use the keyword analyzer in mappings.
POST _analyze
{
"analyzer": "keyword",
"text":"To be or not to be, That is a question ———— 莎士比亚"
}
// tokenization result
{
"tokens" : [
{
"token" : "To be or not to be, That is a question ———— 莎士比亚",
"start_offset" : 0,
"end_offset" : 49,
"type" : "word",
"position" : 0
}
]
}
6. Pattern analyzer
The pattern analyzer lets you specify a pattern on which the text is split into tokens. Usually, though, the better approach is a custom analyzer that combines the existing pattern tokenizer with whatever token filters you need.
PUT pattern_test
{
"settings": {
"analysis": {
"analyzer": {
"my_email_analyzer":{
"type":"pattern",
"pattern":"\\W|_",
"lowercase":true
}
}
}
}
}
POST pattern_test/_analyze
{
"analyzer": "my_email_analyzer",
"text": "John_Smith@foo-bar.com"
}
// tokenization result
{
"tokens" : [
{
"token" : "john",
"start_offset" : 0,
"end_offset" : 4,
"type" : "word",
"position" : 0
},
{
"token" : "smith",
"start_offset" : 5,
"end_offset" : 10,
"type" : "word",
"position" : 1
},
{
"token" : "foo",
"start_offset" : 11,
"end_offset" : 14,
"type" : "word",
"position" : 2
},
{
"token" : "bar",
"start_offset" : 15,
"end_offset" : 18,
"type" : "word",
"position" : 3
},
{
"token" : "com",
"start_offset" : 19,
"end_offset" : 22,
"type" : "word",
"position" : 4
}
]
}
7. Language and multi-language analyzers: chinese
Elasticsearch ships with good, simple, out-of-the-box analyzers for many of the world's common languages: Arabic, Armenian, Basque, Brazilian Portuguese, Bulgarian, Catalan, Chinese, Czech, Danish, Dutch, English, Finnish, French, Galician, German, Greek, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Korean, Kurdish, Norwegian, Persian, Portuguese, Romanian, Russian, Spanish, Swedish, Turkish and Thai.
POST _analyze
{
"analyzer": "chinese",
"text":"To be or not to be, That is a question ———— 莎士比亚"
}
{
"tokens" : [
{
"token" : "question",
"start_offset" : 31,
"end_offset" : 39,
"type" : "<ALPHANUM>",
"position" : 9
},
{
"token" : "莎",
"start_offset" : 45,
"end_offset" : 46,
"type" : "<IDEOGRAPHIC>",
"position" : 10
},
{
"token" : "士",
"start_offset" : 46,
"end_offset" : 47,
"type" : "<IDEOGRAPHIC>",
"position" : 11
},
{
"token" : "比",
"start_offset" : 47,
"end_offset" : 48,
"type" : "<IDEOGRAPHIC>",
"position" : 12
},
{
"token" : "亚",
"start_offset" : 48,
"end_offset" : 49,
"type" : "<IDEOGRAPHIC>",
"position" : 13
}
]
}
Other languages work the same way:
POST _analyze
{
"analyzer": "french",
"text":"Je suis ton père"
}
POST _analyze
{
"analyzer": "german",
"text":"Ich bin dein vater"
}
8. Snowball analyzer
The snowball analyzer uses the standard tokenizer and token filter (like the standard analyzer), plus the lowercase token filter and the stop-word filter, and in addition it runs the Snowball stemmer over the text.
POST _analyze
{
"analyzer": "snowball",
"text":"To be or not to be, That is a question ———— 莎士比亚"
}
// tokenization result
{
"tokens" : [
{
"token" : "question",
"start_offset" : 31,
"end_offset" : 39,
"type" : "<ALPHANUM>",
"position" : 9
},
{
"token" : "莎",
"start_offset" : 45,
"end_offset" : 46,
"type" : "<IDEOGRAPHIC>",
"position" : 10
},
{
"token" : "士",
"start_offset" : 46,
"end_offset" : 47,
"type" : "<IDEOGRAPHIC>",
"position" : 11
},
{
"token" : "比",
"start_offset" : 47,
"end_offset" : 48,
"type" : "<IDEOGRAPHIC>",
"position" : 12
},
{
"token" : "亚",
"start_offset" : 48,
"end_offset" : 49,
"type" : "<IDEOGRAPHIC>",
"position" : 13
}
]
}
14.2 Character filters
Name | Value |
HTML strip filter | HTML Strip Char Filter |
Mapping filter | Mapping Char Filter |
Pattern replace filter | Pattern Replace Char Filter |
1. HTML strip character filter
The HTML Strip Char Filter removes HTML elements from the text.
POST _analyze
{
"tokenizer": "keyword",
"char_filter": ["html_strip"],
"text":"<p>I'm so <b>happy</b>!</p>"
}
//result
{
"tokens" : [
{
"token" : """
I'm so happy!
""",
"start_offset" : 0,
"end_offset" : 32,
"type" : "word",
"position" : 0
}
]
}
2. Mapping character filter
The Mapping Char Filter takes a map of keys to values; whenever it meets a string equal to one of the keys, it replaces it with the value associated with that key.
PUT pattern_test4
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer":{
"tokenizer":"keyword",
"char_filter":["my_char_filter"]
}
},
"char_filter":{
"my_char_filter":{
"type":"mapping",
"mappings":["刘备 => 666","关羽 => 888"]
}
}
}
}
}
POST pattern_test4/_analyze
{
"analyzer": "my_analyzer",
"text": "刘备爱惜关羽,可是后来关羽大意失荆州"
}
//result
{
"tokens" : [
{
"token" : "666爱惜888,可是后来888大意失荆州",
"start_offset" : 0,
"end_offset" : 19,
"type" : "word",
"position" : 0
}
]
}
3. Pattern replace character filter
The Pattern Replace Char Filter uses a regular expression to match and replace characters in the string. Be careful with badly written regular expressions, they can make things slow!
PUT pattern_test5
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "standard",
"char_filter": [
"my_char_filter"
]
}
},
"char_filter": {
"my_char_filter": {
"type": "pattern_replace",
"pattern": "(\\d+)-(?=\\d)",
"replacement": "$1_"
}
}
}
}
}
POST pattern_test5/_analyze
{
"analyzer": "my_analyzer",
"text": "My credit card is 123-456-789"
}
//result
{
"tokens" : [
{
"token" : "My",
"start_offset" : 0,
"end_offset" : 2,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "credit",
"start_offset" : 3,
"end_offset" : 9,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "card",
"start_offset" : 10,
"end_offset" : 14,
"type" : "<ALPHANUM>",
"position" : 2
},
{
"token" : "is",
"start_offset" : 15,
"end_offset" : 17,
"type" : "<ALPHANUM>",
"position" : 3
},
{
"token" : "123_456_789",
"start_offset" : 18,
"end_offset" : 29,
"type" : "<NUM>",
"position" : 4
}
]
}
14.3 Tokenizers
Since Elasticsearch has built-in analyzers, it also has built-in tokenizers. A tokenizer, as the name suggests, breaks a text string into small chunks called tokens.
1. Standard tokenizer
The standard tokenizer is grammar-based and works well for most European languages. It also handles Unicode text, with a default maximum token length of 255 bytes, and it removes punctuation such as commas and periods.
POST _analyze
{
"tokenizer": "standard",
"text":"To be or not to be, That is a question ———— 莎士比亚"
}
{
"tokens" : [
{
"token" : "To",
"start_offset" : 0,
"end_offset" : 2,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "be",
"start_offset" : 3,
"end_offset" : 5,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "or",
"start_offset" : 6,
"end_offset" : 8,
"type" : "<ALPHANUM>",
"position" : 2
},
{
"token" : "not",
"start_offset" : 9,
"end_offset" : 12,
"type" : "<ALPHANUM>",
"position" : 3
},
{
"token" : "to",
"start_offset" : 13,
"end_offset" : 15,
"type" : "<ALPHANUM>",
"position" : 4
},
{
"token" : "be",
"start_offset" : 16,
"end_offset" : 18,
"type" : "<ALPHANUM>",
"position" : 5
},
{
"token" : "That",
"start_offset" : 21,
"end_offset" : 25,
"type" : "<ALPHANUM>",
"position" : 6
},
{
"token" : "is",
"start_offset" : 26,
"end_offset" : 28,
"type" : "<ALPHANUM>",
"position" : 7
},
{
"token" : "a",
"start_offset" : 29,
"end_offset" : 30,
"type" : "<ALPHANUM>",
"position" : 8
},
{
"token" : "question",
"start_offset" : 31,
"end_offset" : 39,
"type" : "<ALPHANUM>",
"position" : 9
},
{
"token" : "莎",
"start_offset" : 45,
"end_offset" : 46,
"type" : "<IDEOGRAPHIC>",
"position" : 10
},
{
"token" : "士",
"start_offset" : 46,
"end_offset" : 47,
"type" : "<IDEOGRAPHIC>",
"position" : 11
},
{
"token" : "比",
"start_offset" : 47,
"end_offset" : 48,
"type" : "<IDEOGRAPHIC>",
"position" : 12
},
{
"token" : "亚",
"start_offset" : 48,
"end_offset" : 49,
"type" : "<IDEOGRAPHIC>",
"position" : 13
}
]
}
2. Keyword tokenizer
The keyword tokenizer is a trivial tokenizer that emits the entire text as a single token, which is then handed to the token filters. It is a good choice when you only want token filters and no real tokenization.
POST _analyze
{
"tokenizer": "keyword",
"text":"To be or not to be, That is a question ———— 莎士比亚"
}
{
"tokens" : [
{
"token" : "To be or not to be, That is a question ———— 莎士比亚",
"start_offset" : 0,
"end_offset" : 49,
"type" : "word",
"position" : 0
}
]
}
3. Letter tokenizer
The letter tokenizer splits the text into tokens on non-letter characters.
POST _analyze
{
"tokenizer": "letter",
"text":"To be or not to be, That is a question ———— 莎士比亚"
}
{
"tokens" : [
{
"token" : "To",
"start_offset" : 0,
"end_offset" : 2,
"type" : "word",
"position" : 0
},
{
"token" : "be",
"start_offset" : 3,
"end_offset" : 5,
"type" : "word",
"position" : 1
},
{
"token" : "or",
"start_offset" : 6,
"end_offset" : 8,
"type" : "word",
"position" : 2
},
{
"token" : "not",
"start_offset" : 9,
"end_offset" : 12,
"type" : "word",
"position" : 3
},
{
"token" : "to",
"start_offset" : 13,
"end_offset" : 15,
"type" : "word",
"position" : 4
},
{
"token" : "be",
"start_offset" : 16,
"end_offset" : 18,
"type" : "word",
"position" : 5
},
{
"token" : "That",
"start_offset" : 21,
"end_offset" : 25,
"type" : "word",
"position" : 6
},
{
"token" : "is",
"start_offset" : 26,
"end_offset" : 28,
"type" : "word",
"position" : 7
},
{
"token" : "a",
"start_offset" : 29,
"end_offset" : 30,
"type" : "word",
"position" : 8
},
{
"token" : "question",
"start_offset" : 31,
"end_offset" : 39,
"type" : "word",
"position" : 9
},
{
"token" : "莎士比亚",
"start_offset" : 45,
"end_offset" : 49,
"type" : "word",
"position" : 10
}
]
}
4. Lowercase tokenizer
The lowercase tokenizer combines the behaviour of the letter tokenizer with the lowercase token filter (it lowercases every token, as you would expect). The main reason this exists as a single tokenizer is that doing both in one pass performs better.
POST _analyze
{
"tokenizer": "lowercase",
"text":"To be or not to be, That is a question ———— 莎士比亚"
}
{
"tokens" : [
{
"token" : "to",
"start_offset" : 0,
"end_offset" : 2,
"type" : "word",
"position" : 0
},
{
"token" : "be",
"start_offset" : 3,
"end_offset" : 5,
"type" : "word",
"position" : 1
},
{
"token" : "or",
"start_offset" : 6,
"end_offset" : 8,
"type" : "word",
"position" : 2
},
{
"token" : "not",
"start_offset" : 9,
"end_offset" : 12,
"type" : "word",
"position" : 3
},
{
"token" : "to",
"start_offset" : 13,
"end_offset" : 15,
"type" : "word",
"position" : 4
},
{
"token" : "be",
"start_offset" : 16,
"end_offset" : 18,
"type" : "word",
"position" : 5
},
{
"token" : "that",
"start_offset" : 21,
"end_offset" : 25,
"type" : "word",
"position" : 6
},
{
"token" : "is",
"start_offset" : 26,
"end_offset" : 28,
"type" : "word",
"position" : 7
},
{
"token" : "a",
"start_offset" : 29,
"end_offset" : 30,
"type" : "word",
"position" : 8
},
{
"token" : "question",
"start_offset" : 31,
"end_offset" : 39,
"type" : "word",
"position" : 9
},
{
"token" : "莎士比亚",
"start_offset" : 45,
"end_offset" : 49,
"type" : "word",
"position" : 10
}
]
}
5. Whitespace tokenizer
The whitespace tokenizer splits tokens on whitespace: spaces, tabs, newlines and so on. Note that it does not remove any punctuation.
POST _analyze
{
"tokenizer": "whitespace",
"text":"To be or not to be, That is a question ———— 莎士比亚"
}
{
"tokens" : [
{
"token" : "To",
"start_offset" : 0,
"end_offset" : 2,
"type" : "word",
"position" : 0
},
{
"token" : "be",
"start_offset" : 3,
"end_offset" : 5,
"type" : "word",
"position" : 1
},
{
"token" : "or",
"start_offset" : 6,
"end_offset" : 8,
"type" : "word",
"position" : 2
},
{
"token" : "not",
"start_offset" : 9,
"end_offset" : 12,
"type" : "word",
"position" : 3
},
{
"token" : "to",
"start_offset" : 13,
"end_offset" : 15,
"type" : "word",
"position" : 4
},
{
"token" : "be,",
"start_offset" : 16,
"end_offset" : 19,
"type" : "word",
"position" : 5
},
{
"token" : "That",
"start_offset" : 21,
"end_offset" : 25,
"type" : "word",
"position" : 6
},
{
"token" : "is",
"start_offset" : 26,
"end_offset" : 28,
"type" : "word",
"position" : 7
},
{
"token" : "a",
"start_offset" : 29,
"end_offset" : 30,
"type" : "word",
"position" : 8
},
{
"token" : "question",
"start_offset" : 31,
"end_offset" : 39,
"type" : "word",
"position" : 9
},
{
"token" : "————",
"start_offset" : 40,
"end_offset" : 44,
"type" : "word",
"position" : 10
},
{
"token" : "莎士比亚",
"start_offset" : 45,
"end_offset" : 49,
"type" : "word",
"position" : 11
}
]
}
6. Pattern tokenizer
The pattern tokenizer splits the text into tokens on an arbitrary pattern you specify.
PUT pattern_test2
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer":{
          "tokenizer":"my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer":{
          "type":"pattern",
          "pattern":","
        }
      }
    }
  }
}
POST pattern_test2/_analyze
{
  "tokenizer": "my_tokenizer",
  "text":"To be or not to be, That is a question ———— 莎士比亚"
}
{
"tokens" : [
{
"token" : "To be or not to be",
"start_offset" : 0,
"end_offset" : 18,
"type" : "word",
"position" : 0
},
{
"token" : " That is a question ———— 莎士比亚",
"start_offset" : 19,
"end_offset" : 49,
"type" : "word",
"position" : 1
}
]
}
7. UAX URL email tokenizer
POST _analyze
{
"tokenizer": "uax_url_email",
"text":"作者:张开来源:未知原文:邮箱:xxxxxxx@xx.com版权声明:本文为博主原创文章,转载请附上博文链接!"
}
{
"tokens" : [
{
"token" : "作",
"start_offset" : 0,
"end_offset" : 1,
"type" : "<IDEOGRAPHIC>",
"position" : 0
},
{
"token" : "者",
"start_offset" : 1,
"end_offset" : 2,
"type" : "<IDEOGRAPHIC>",
"position" : 1
},
{
"token" : "张",
"start_offset" : 3,
"end_offset" : 4,
"type" : "<IDEOGRAPHIC>",
"position" : 2
},
{
"token" : "开",
"start_offset" : 4,
"end_offset" : 5,
"type" : "<IDEOGRAPHIC>",
"position" : 3
},
{
"token" : "来",
"start_offset" : 5,
"end_offset" : 6,
"type" : "<IDEOGRAPHIC>",
"position" : 4
},
{
"token" : "源",
"start_offset" : 6,
"end_offset" : 7,
"type" : "<IDEOGRAPHIC>",
"position" : 5
},
{
"token" : "未",
"start_offset" : 8,
"end_offset" : 9,
"type" : "<IDEOGRAPHIC>",
"position" : 6
},
{
"token" : "知",
"start_offset" : 9,
"end_offset" : 10,
"type" : "<IDEOGRAPHIC>",
"position" : 7
},
{
"token" : "原",
"start_offset" : 10,
"end_offset" : 11,
"type" : "<IDEOGRAPHIC>",
"position" : 8
},
{
"token" : "文",
"start_offset" : 11,
"end_offset" : 12,
"type" : "<IDEOGRAPHIC>",
"position" : 9
},
{
"token" : "",
"start_offset" : 13,
"end_offset" : 64,
"type" : "<URL>",
"position" : 10
},
{
"token" : "邮",
"start_offset" : 64,
"end_offset" : 65,
"type" : "<IDEOGRAPHIC>",
"position" : 11
},
{
"token" : "箱",
"start_offset" : 65,
"end_offset" : 66,
"type" : "<IDEOGRAPHIC>",
"position" : 12
},
{
"token" : "xxxxxxx@xx.com",
"start_offset" : 67,
"end_offset" : 81,
"type" : "<EMAIL>",
"position" : 13
},
{
"token" : "版",
"start_offset" : 81,
"end_offset" : 82,
"type" : "<IDEOGRAPHIC>",
"position" : 14
},
{
"token" : "权",
"start_offset" : 82,
"end_offset" : 83,
"type" : "<IDEOGRAPHIC>",
"position" : 15
},
{
"token" : "声",
"start_offset" : 83,
"end_offset" : 84,
"type" : "<IDEOGRAPHIC>",
"position" : 16
},
{
"token" : "明",
"start_offset" : 84,
"end_offset" : 85,
"type" : "<IDEOGRAPHIC>",
"position" : 17
},
{
"token" : "本",
"start_offset" : 86,
"end_offset" : 87,
"type" : "<IDEOGRAPHIC>",
"position" : 18
},
{
"token" : "文",
"start_offset" : 87,
"end_offset" : 88,
"type" : "<IDEOGRAPHIC>",
"position" : 19
},
{
"token" : "为",
"start_offset" : 88,
"end_offset" : 89,
"type" : "<IDEOGRAPHIC>",
"position" : 20
},
{
"token" : "博",
"start_offset" : 89,
"end_offset" : 90,
"type" : "<IDEOGRAPHIC>",
"position" : 21
},
{
"token" : "主",
"start_offset" : 90,
"end_offset" : 91,
"type" : "<IDEOGRAPHIC>",
"position" : 22
},
{
"token" : "原",
"start_offset" : 91,
"end_offset" : 92,
"type" : "<IDEOGRAPHIC>",
"position" : 23
},
{
"token" : "创",
"start_offset" : 92,
"end_offset" : 93,
"type" : "<IDEOGRAPHIC>",
"position" : 24
},
{
"token" : "文",
"start_offset" : 93,
"end_offset" : 94,
"type" : "<IDEOGRAPHIC>",
"position" : 25
},
{
"token" : "章",
"start_offset" : 94,
"end_offset" : 95,
"type" : "<IDEOGRAPHIC>",
"position" : 26
},
{
"token" : "转",
"start_offset" : 96,
"end_offset" : 97,
"type" : "<IDEOGRAPHIC>",
"position" : 27
},
{
"token" : "载",
"start_offset" : 97,
"end_offset" : 98,
"type" : "<IDEOGRAPHIC>",
"position" : 28
},
{
"token" : "请",
"start_offset" : 98,
"end_offset" : 99,
"type" : "<IDEOGRAPHIC>",
"position" : 29
},
{
"token" : "附",
"start_offset" : 99,
"end_offset" : 100,
"type" : "<IDEOGRAPHIC>",
"position" : 30
},
{
"token" : "上",
"start_offset" : 100,
"end_offset" : 101,
"type" : "<IDEOGRAPHIC>",
"position" : 31
},
{
"token" : "博",
"start_offset" : 101,
"end_offset" : 102,
"type" : "<IDEOGRAPHIC>",
"position" : 32
},
{
"token" : "文",
"start_offset" : 102,
"end_offset" : 103,
"type" : "<IDEOGRAPHIC>",
"position" : 33
},
{
"token" : "链",
"start_offset" : 103,
"end_offset" : 104,
"type" : "<IDEOGRAPHIC>",
"position" : 34
},
{
"token" : "接",
"start_offset" : 104,
"end_offset" : 105,
"type" : "<IDEOGRAPHIC>",
"position" : 35
}
]
}
8. Path hierarchy tokenizer
The path hierarchy tokenizer indexes file-system paths in a way that lets files sharing the same path prefix be returned together at search time.
POST _analyze
{
"tokenizer": "path_hierarchy",
"text":"/usr/local/python/python2.7"
}
{
"tokens" : [
{
"token" : "/usr",
"start_offset" : 0,
"end_offset" : 4,
"type" : "word",
"position" : 0
},
{
"token" : "/usr/local",
"start_offset" : 0,
"end_offset" : 10,
"type" : "word",
"position" : 0
},
{
"token" : "/usr/local/python",
"start_offset" : 0,
"end_offset" : 17,
"type" : "word",
"position" : 0
},
{
"token" : "/usr/local/python/python2.7",
"start_offset" : 0,
"end_offset" : 27,
"type" : "word",
"position" : 0
}
]
}
14.4 Token filters
1. A custom token filter
PUT pattern_test3
{
"settings": {
"analysis": {
"filter": {
"my_test_length":{
"type":"length",
"max":8,
"min":2
}
}
}
}
}
POST pattern_test3/_analyze
{
"tokenizer": "standard",
"filter": ["my_test_length"],
"text":"a Small word and a longerword"
}
//result:
{
"tokens" : [
{
"token" : "Small",
"start_offset" : 2,
"end_offset" : 7,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "word",
"start_offset" : 8,
"end_offset" : 12,
"type" : "<ALPHANUM>",
"position" : 2
},
{
"token" : "and",
"start_offset" : 13,
"end_offset" : 16,
"type" : "<ALPHANUM>",
"position" : 3
}
]
}
2. A custom lowercase token filter
PUT lowercase_example
{
"settings": {
"analysis": {
"analyzer": {
"standard_lowercase_example": {
"type": "custom",
"tokenizer": "standard",
"filter": ["lowercase"]
},
"greek_lowercase_example": {
"type": "custom",
"tokenizer": "standard",
"filter": ["greek_lowercase"]
}
},
"filter": {
"greek_lowercase": {
"type": "lowercase",
"language": "greek"
}
}
}
}
}
POST lowercase_example/_analyze
{
"tokenizer": "standard",
"filter": ["greek_lowercase"],
"text":"Ένα φίλτρο διακριτικού τύπου πεζά s ομαλοποιεί το κείμενο διακριτικού σε χαμηλότερη θήκη"
}
{
"tokens" : [
{
"token" : "ενα",
"start_offset" : 0,
"end_offset" : 3,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "φιλτρο",
"start_offset" : 4,
"end_offset" : 10,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "διακριτικου",
"start_offset" : 11,
"end_offset" : 22,
"type" : "<ALPHANUM>",
"position" : 2
},
{
"token" : "τυπου",
"start_offset" : 23,
"end_offset" : 28,
"type" : "<ALPHANUM>",
"position" : 3
},
{
"token" : "πεζα",
"start_offset" : 29,
"end_offset" : 33,
"type" : "<ALPHANUM>",
"position" : 4
},
{
"token" : "s",
"start_offset" : 34,
"end_offset" : 35,
"type" : "<ALPHANUM>",
"position" : 5
},
{
"token" : "ομαλοποιει",
"start_offset" : 36,
"end_offset" : 46,
"type" : "<ALPHANUM>",
"position" : 6
},
{
"token" : "το",
"start_offset" : 47,
"end_offset" : 49,
"type" : "<ALPHANUM>",
"position" : 7
},
{
"token" : "κειμενο",
"start_offset" : 50,
"end_offset" : 57,
"type" : "<ALPHANUM>",
"position" : 8
},
{
"token" : "διακριτικου",
"start_offset" : 58,
"end_offset" : 69,
"type" : "<ALPHANUM>",
"position" : 9
},
{
"token" : "σε",
"start_offset" : 70,
"end_offset" : 72,
"type" : "<ALPHANUM>",
"position" : 10
},
{
"token" : "χαμηλοτερη",
"start_offset" : 73,
"end_offset" : 83,
"type" : "<ALPHANUM>",
"position" : 11
},
{
"token" : "θηκη",
"start_offset" : 84,
"end_offset" : 88,
"type" : "<ALPHANUM>",
"position" : 12
}
]
}
3. Multiple token filters
POST _analyze
{
"tokenizer": "standard",
"filter": ["length","lowercase"],
"text":"a Small word and a longerword"
}
{
"tokens" : [
{
"token" : "a",
"start_offset" : 0,
"end_offset" : 1,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "small",
"start_offset" : 2,
"end_offset" : 7,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "word",
"start_offset" : 8,
"end_offset" : 12,
"type" : "<ALPHANUM>",
"position" : 2
},
{
"token" : "and",
"start_offset" : 13,
"end_offset" : 16,
"type" : "<ALPHANUM>",
"position" : 3
},
{
"token" : "a",
"start_offset" : 17,
"end_offset" : 18,
"type" : "<ALPHANUM>",
"position" : 4
},
{
"token" : "longerword",
"start_offset" : 19,
"end_offset" : 29,
"type" : "<ALPHANUM>",
"position" : 5
}
]
}
14.5 The IK analyzer
1. Download
1. On GitHub, search for elasticsearch-analysis-ik and open medcl/elasticsearch-analysis-ik:
https://github.com/medcl/elasticsearch-analysis-ik.
2. The IK version must match your ES version.
3. In the ES installation directory, create an ik subdirectory under plugins and unzip the plugin into it.
4. Restart ES and Kibana (a quick check that the plugin was loaded is sketched below).
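To confirm the plugin was picked up after the restart, you can list the installed plugins (a quick sanity check, not part of the original steps); every node should report analysis-ik:
GET _cat/plugins?v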
2. Introduction
Name | Function |
IKAnalyzer.cfg.xml | configures custom dictionaries |
main.dic | IK's built-in Chinese dictionary, roughly 270,000 entries; words found in it are kept together as single tokens |
surname.dic | Chinese surnames |
suffix.dic | special suffix nouns, e.g. 乡, 江, 所, 省 |
preposition.dic | Chinese function words, e.g. 不, 也, 了, 仍 |
stopword.dic | English stop words, e.g. a, an, and, the |
quantifier.dic | measure words, e.g. 厘米, 件, 倍, 像素 |
3. Testing
Analysis:
GET _analyze
{
"analyzer": "ik_max_word",
"text": "上海自来水来自海上"
}
GET _analyze
{
"analyzer": "ik_smart",
"text": "今天是个好日子"
}
Queries:
GET ik1/_search
{
"query": {
"match_phrase": {
"content": "今天"
}
}
}
GET ik1/_search
{
"query": {
"match_phrase_prefix": {
"content": {
"query": "今天好日子",
"slop": 2
}
}
}
}
15. Forward and inverted indexes
16. Data modeling
17. Securing intra-cluster communication
Why encrypt cluster traffic:
- prevent packet sniffing and leaks of sensitive data
- authenticate nodes and keep impostor nodes out
- Data/Cluster state
Create certificates for the nodes. TLS requires X.509 certificates issued by a trusted Certificate Authority (CA). Verification modes:
- Certificate: a joining node must present a certificate signed by the same CA
- Full verification: same CA, plus verification of the host name or IP address
- No verification: any node may join; only for diagnostic purposes in development
# Generate certificates
# Create a certificate authority for your Elasticsearch cluster, for example with the elasticsearch-certutil ca command:
bin/elasticsearch-certutil ca
# Generate a certificate and private key for each node in the cluster, for example with elasticsearch-certutil cert:
bin/elasticsearch-certutil cert --ca elastic-stack-ca.p12
# Copy the certificate into the config/certs directory
elastic-certificates.p12
bin/elasticsearch -E node.name=node0 -E cluster.name=es -E path.data=node0_data -E http.port=9200 -E xpack.security.enabled=true -E xpack.security.transport.ssl.enabled=true -E xpack.security.transport.ssl.verification_mode=certificate -E xpack.security.transport.ssl.keystore.path=certs/elastic-certificates.p12 -E xpack.security.transport.ssl.truststore.path=certs/elastic-certificates.p12
bin/elasticsearch -E node.name=node1 -E cluster.name=es -E path.data=node1_data -E http.port=9201 -E xpack.security.enabled=true -E xpack.security.transport.ssl.enabled=true -E xpack.security.transport.ssl.verification_mode=certificate -E xpack.security.transport.ssl.keystore.path=certs/elastic-certificates.p12 -E xpack.security.transport.ssl.truststore.path=certs/elastic-certificates.p12
# A node that does not provide the certificate cannot join
bin/elasticsearch -E node.name=node2 -E cluster.name=es -E path.data=node2_data -E http.port=9202 -E xpack.security.enabled=true -E xpack.security.transport.ssl.enabled=true -E xpack.security.transport.ssl.verification_mode=certificate
The corresponding elasticsearch.yml settings:
#xpack.security.transport.ssl.enabled: true
#xpack.security.transport.ssl.verification_mode: certificate
#xpack.security.transport.ssl.keystore.path: certs/elastic-certificates.p12
#xpack.security.transport.ssl.truststore.path: certs/elastic-certificates.p12