SpringBoot --- Integrating Elasticsearch
- 1. Elasticsearch installation
- 1.1 Download and unzip the Windows release
- 1.2 Start
- 1.3 Access
- 1.4 Key concepts
- 2. Elasticsearch UIs
- 2.1 elasticsearch-head
- 2.2 ElasticHD
- 2.3 Kibana
- 3. Integrating Elasticsearch with Spring Boot
- 3.1 Official documentation
- 3.2 Dependency
- 3.3 properties
- 3.4 Code
- 3.5 Common errors
- 4. Building a cluster
- 4.1 Edit the configuration file
- 4.2 Changing the passwords (this step turned out not to be very useful)
- 4.3 Start the three nodes and check the status
- 4.4 Start Kibana for monitoring
- 5. Elasticsearch API
- 5.1 CRUD: Create
- 5.2 CRUD: Update
- 5.3 CRUD: Delete
- 5.4 CRUD: Retrieve
- 5.5 match queries
- 5.6 term queries
- 5.7 Sorting
- 5.8 Pagination
- 6. Boolean (bool) queries
- 6.1 must queries
- 6.2 should queries
- 6.3 must_not queries
- 6.4 filter queries
- 7. Filtering query results
- 8. Highlighting
- 9. Aggregation queries
- 9.1 avg
- 9.2 max
- 9.3 min
- 9.4 sum
- 9.5 range bucketing
- 10. Mapping & Dynamic Mapping
- 10.1 mapping
- 10.2 dynamic mapping
- 1 dynamic mapping
- 2 explicit mapping
- 3 strict mapping
- 4 Summary
- 10.3 Object fields
- 10.4 Controlling whether a field is indexed
- 10.5 Searching for null values
- 11. Index settings
- 12. Field data types
- 13. Cluster nodes
- 14. Analyzers and tokenization
- 14.1 Analyzers
- 1. Standard analyzer
- 2. Simple analyzer
- 3. Whitespace analyzer
- 4. Stop analyzer
- 5. Keyword analyzer
- 6. Pattern analyzer
- 7. Language and multi-language analyzers: chinese
- 8. Snowball analyzer
- 14.2 Character filters
- 1. HTML strip character filter
- 2. Mapping character filter
- 3. Pattern replace character filter
- 14.3 Tokenizers
- 1. Standard tokenizer
- 2. Keyword tokenizer
- 3. Letter tokenizer
- 4. Lowercase tokenizer
- 5. Whitespace tokenizer
- 6. Pattern tokenizer
- 7. UAX URL email tokenizer
- 8. Path hierarchy tokenizer
- 14.4 Token filters
- 1. A custom token filter
- 2. A custom lowercase token filter
- 3. Multiple token filters
- 14.5 The IK analyzer
- 1. Download
- 2. Introduction
- 3. Testing
- 15. Forward and inverted indexes
- 16. Data modeling
- 17. Securing intra-cluster communication
1. Elasticsearch installation
Item | Elasticsearch | Solr |
Real-time indexing | No thread blocking; performs better than Solr | Suffers from IO blocking |
Adding data dynamically | No impact on performance | Performance degrades |
Distribution | Distributed out of the box | Relies on ZooKeeper for cluster management |
Data formats | JSON only | XML, JSON, CSV, etc. |
Positioning | Better suited to modern real-time search applications | A solid solution for traditional applications |
Download page: the official Elasticsearch site. Prefer a slightly older release; newer versions tend to produce errors when integrating with Spring Boot.
- https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.16.2-windows-x86_64.zip.
- 9200 is the port for ES's external RESTful API
- 9300 is the port ES uses internally for node-to-node transport
1.1 Download and unzip the Windows release
1.2 Start
1.3 Access
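Once the node is up, you can check that it is reachable on the REST port (a minimal check, assuming the default address localhost:9200; the exact values in the response will differ):
GET http://localhost:9200
// abridged response
{
  "name" : "your-node-name",
  "cluster_name" : "elasticsearch",
  "version" : { "number" : "7.16.2" },
  "tagline" : "You Know, for Search"
}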
1.4 Key concepts
Name | Desc |
index | roughly a database in MySQL |
document | roughly a row in MySQL |
field | roughly a column in MySQL |
shards | sharded storage |
replicas | replicated copies for redundancy |
1. Document metadata
{
  "_index": "lsp",
  "_id": "1",
  "_score": 1,
  "_source": {
    "age": 30,
    "desc": "皮肤黑、武器长、性格直",
    "from": "gu",
    "name": "顾老二",
    "tags": [
      "黑",
      "长",
      "直"
    ]
  }
}
What each part means:
- _index: the name of the index the document belongs to
- _type: the name of the type the document belongs to
- _id: the unique identifier of the document
- _source: the original JSON of the document
- @version: the document version (useful for resolving conflicts under concurrent writes)
- _score: the relevance score computed for the search result
2. Indexes
Each index has its own mapping definition, which holds the field names and field types of all its documents.
* Shards represent the physical layout
* The data of an index is spread across its shards
* The mapping defines the types of the document fields
* The settings define how the data is distributed
3. Types
- In 5.x and earlier an index could have one or more types
- In 6.x an index can have only one type
- 7.x removed types; everything type-related became deprecated, and for compatibility during the transition all documents written in 7.x get the type _doc by default
- 8.x abandons types completely
2. Elasticsearch UIs
2.1 elasticsearch-head
Download
- Install Node.js (see its installation guide); verify with npm -v and node -v
- Install grunt: npm install -g grunt-cli, verify with grunt -v
- Download elasticsearch-head from https://github.com/mobz/elasticsearch-head, then cd into the folder and run npm install
Start it with npm run start or grunt server. The UI looks rather dated.
2.2 ElasticHD
1. Download
Download: https://github.com/360EntSecGroup-Skylar/ElasticHD/releases.
2. Start: either double-click the executable, or cd into the install directory and run ElasticHD -p 127.0.0.1:9800
2.3 Kibana
Download: https://www.elastic.co/start.
Important: the Kibana version must match the Elasticsearch version above.
Hover over the Windows link to see the download URL, then just edit the version in it.
- https://artifacts.elastic.co/downloads/kibana/kibana-7.15.2-linux-x86_64.tar.gz
- https://artifacts.elastic.co/downloads/kibana/kibana-7.16.2-windows-x86_64.zip
1. Double-click kibana.bat to start; by default it connects to elasticsearch:9200
2. Open http://localhost:5601 and log in with the password set earlier
For monitoring, see: using Kibana to monitor an ES cluster.
3. Integrating Elasticsearch with Spring Boot
3.1 Official documentation
Link: Spring Data Elasticsearch - Reference Documentation.
3.2 Dependency
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>
ElasticsearchRestTemplate wraps a RestHighLevelClient; the relevant source looks like this:
public class ElasticsearchRestTemplate extends AbstractElasticsearchTemplate {
private static final Logger LOGGER = LoggerFactory.getLogger(ElasticsearchRestTemplate.class);
private final RestHighLevelClient client;
private final ElasticsearchExceptionTranslator exceptionTranslator;
public ElasticsearchRestTemplate(RestHighLevelClient client) {
Assert.notNull(client, "Client must not be null!");
this.client = client;
this.exceptionTranslator = new ElasticsearchExceptionTranslator();
this.initialize(this.createElasticsearchConverter());
}
public ElasticsearchRestTemplate(RestHighLevelClient client, ElasticsearchConverter elasticsearchConverter) {
Assert.notNull(client, "Client must not be null!");
this.client = client;
this.exceptionTranslator = new ElasticsearchExceptionTranslator();
this.initialize(elasticsearchConverter);
}
}
3.3 properties
spring.data.elasticsearch.client.reactive.endpoints=127.0.0.1:9200
# create the index automatically if it does not exist
spring.data.elasticsearch.repositories.enabled=true
spring.data.elasticsearch.cluster-nodes=127.0.0.1:9300
3.4 Code
- @Document marks the class as an Elasticsearch entity
- indexName maps to the Elasticsearch index
- type maps to the Elasticsearch type (only present in older Spring Data Elasticsearch versions)
@Document(indexName = "product",type = "article")
@Data
public class Article {
@Id
private String id;
private String title;
@Field(type = FieldType.Nested, includeInParent = true)
private List<Author> authors;
}
@Data
@NoArgsConstructor
@AllArgsConstructor // the tests below construct authors via new Author("...")
public class Author {
    private String name;
}
@Repository
public interface ArticleRepository extends ElasticsearchRepository<Article,String> {
//The two queries below are equivalent: one uses a derived query, the other a hand-written query
Page<Article> findByAuthorsName(String name, Pageable pageable);
@Query("{\"bool\": {\"must\": [{\"match\": {\"authors.name\": \"?0\"}}]}}")
Page<Article> findByAuthorsNameUsingCustomQuery(String name, Pageable pageable);
//search on the title field
Page<Article> findByTitleIsContaining(String word,Pageable pageable);
Page<Article> findByTitle(String title,Pageable pageable);
}
@Autowired
private ArticleRepository articleRepository;
@Autowired
private ElasticsearchRestTemplate elasticsearchRestTemplate;
//Check whether the corresponding index exists; with spring.data.elasticsearch.repositories.enabled=true the index is created automatically
private boolean checkIndexExists(Class<?> cls){
boolean isExist = elasticsearchRestTemplate.indexOps(cls).exists();
//get the index name
String indexName = cls.getAnnotation(Document.class).indexName();
System.out.printf("index %s is %s\n", indexName, isExist ? "exist" : "not exist");
return isExist;
}
@Test
void test() {
checkIndexExists(Article.class);
}
@Test
void save(){
Article article = new Article();
article.setTitle("Spring Data Elasticsearch");
article.setAuthors(asList(new Author("LaoAlex"),new Author("John")));
articleRepository.save(article);
article = new Article();
article.setTitle("Spring Data Elasticsearch2");
article.setAuthors(asList(new Author("LaoAlex"),new Author("King")));
articleRepository.save(article);
article = new Article();
article.setTitle("Spring Data Elasticsearch3");
article.setAuthors(asList(new Author("LaoAlex"),new Author("Bill")));
articleRepository.save(article);
}
@Test
void queryAuthorName() throws JsonProcessingException {
Page<Article> articles = articleRepository.findByAuthorsName("LaoAlex", PageRequest.of(0,10));
//serialize the result to a JSON string
ObjectWriter objectWriter = new ObjectMapper().writer().withDefaultPrettyPrinter();
String json = objectWriter.writeValueAsString(articles);
System.out.println(json);
}
//use the custom @Query
@Test
void queryAuthorNameByCustom() throws JsonProcessingException {
Page<Article> articles = articleRepository.findByAuthorsNameUsingCustomQuery("John",PageRequest.of(0,10));
//serialize the result to a JSON string
ObjectWriter objectWriter = new ObjectMapper().writer().withDefaultPrettyPrinter();
String json = objectWriter.writeValueAsString(articles);
System.out.println(json);
}
//Keyword (regexp) query via ElasticsearchRestTemplate;
//regexpQuery is statically imported from org.elasticsearch.index.query.QueryBuilders
@Test
void queryTileContainByTemplate() throws JsonProcessingException {
Query query = new NativeSearchQueryBuilder().withFilter(regexpQuery("title",".*elasticsearch2.*")).build();
SearchHits<Article> articles = elasticsearchRestTemplate.search(query, Article.class, IndexCoordinates.of("product"));
//serialize the result to a JSON string
ObjectWriter objectWriter = new ObjectMapper().writer().withDefaultPrettyPrinter();
String json = objectWriter.writeValueAsString(articles);
System.out.println(json);
}
@Test
void update() throws JsonProcessingException {
Page<Article> articles = articleRepository.findByTitle("Spring Data Elasticsearch",PageRequest.of(0,10));
//serialize the result to a JSON string
ObjectWriter objectWriter = new ObjectMapper().writer().withDefaultPrettyPrinter();
String json = objectWriter.writeValueAsString(articles);
System.out.println(json);
Article article = articles.getContent().get(0);
System.out.println(article);
article.setAuthors(null);
articleRepository.save(article);
}
@Test
void delete(){
Page<Article> articles = articleRepository.findByTitle("Spring Data Elasticsearch",PageRequest.of(0,10));
Article article = articles.getContent().get(0);
articleRepository.delete(article);
}
3.5 Common errors
1. no id property found for class
//The error means the entity class used for ES mapping has a problem; there are two ways to fix it
//1. Rename the primary-key field to id
@Data
@Document(indexName = "spring.student")
public class Student {
private int id;
private String stuName;
private String stuAddress;
private String gender;
}
//2. If the primary-key field is not named id, annotate it with @Id
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
@Data
@Document(indexName = "spring.student")
public class Student {
@Id
private int stuId;
private String stuName;
private String stuAddress;
private String gender;
}
4. Building a cluster
4.1 Edit the configuration file
# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
# Before you set out to tweak and tune the configuration, make sure you
# understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
# the same cluster name on all three nodes
cluster.name: my-application
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
# node name within the cluster (the three nodes are node-1, node-2, node-3)
node.name: node-1
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#path.data: /path/to/data
#
# Path to log files:
#
#path.logs: /path/to/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# By default Elasticsearch is only accessible on localhost. Set a different
# address here to expose this node on the network:
#
network.host: 127.0.0.1
#
# By default Elasticsearch listens for HTTP traffic on the first free port it
# finds starting at 9200. Set a specific HTTP port here:
# change the ports per node: (9201 9301), (9202 9302), (9203 9303)
http.port: 9201
transport.tcp.port: 9301
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#discovery.seed_hosts: ["host1", "host2"]
discovery.zen.ping.unicast.hosts: ["127.0.0.1:9301","127.0.0.1:9302","127.0.0.1:9303"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
# identical on all three nodes; node-1 is the initial master
cluster.initial_master_nodes: ["node-1"]
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true
#
# ---------------------------------- Security ----------------------------------
#
# *** WARNING ***
#
# Elasticsearch security features are not enabled by default.
# These features are free, but require configuration changes to enable them.
# This means that users don’t have to provide credentials and can get full access
# to the cluster. Network connections are also not encrypted.
#
# To protect your data, we strongly encourage you to enable the Elasticsearch security features.
# Refer to the following documentation for instructions.
#
# https://www.elastic.co/guide/en/elasticsearch/reference/7.16/configuring-stack-security.html
#allow origin
http.cors.enabled: true
http.cors.allow-origin: "*"
http.cors.allow-headers: Authorization
4.2 Changing the passwords (this step turned out not to be very useful)
The default credentials are said to be the following, although they did not work for me:
- username=elastic
- password=changeme
cd into the bin directory and run elasticsearch-setup-passwords (or open cmd straight from the address bar): D:\Tools\es_cluster\elasticsearch-7.16.2\bin>elasticsearch-setup-passwords interactive
You will then be prompted to set the password for every built-in user; all of them must be set.
- elas123
4.3 Start the three nodes and check the status
1. A node starting successfully does not mean the cluster formed successfully.
2. Call the API below to check the cluster health:
- http://localhost:9203/_cat/health?v
3. Cluster status endpoints:
- http://ip:port/_cluster/health
- http://ip:port/_cat/nodes
- http://ip:port/_cat/shards
GET http://localhost:9203/_cluster/health
{
"cluster_name": "my-application",
"status": "green",
"timed_out": false,
"number_of_nodes": 3,
"number_of_data_nodes": 3,
"active_primary_shards": 8,
"active_shards": 16,
"relocating_shards": 0,
"initializing_shards": 0,
"unassigned_shards": 0,
"delayed_unassigned_shards": 0,
"number_of_pending_tasks": 0,
"number_of_in_flight_fetch": 0,
"task_max_waiting_in_queue_millis": 0,
"active_shards_percent_as_number": 100.0
}
What the colors mean (if the status is not green, see the sketch below):
- green: all primary and replica shards are allocated
- yellow: all primary shards are allocated, but one or more replica shards are not
- red: one or more primary shards are unallocated
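When the status is yellow or red, the allocation explain API is a quick way to find out why a shard is unassigned (a minimal sketch; called without a body it reports on the first unassigned shard it finds):
GET _cluster/allocation/explain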
4.4 Start Kibana for monitoring
1. Edit the configuration file
# Kibana is served by a back end server. This setting specifies the port to use.
server.port: 5601
# Specifies the address to which the Kibana server will bind. IP addresses and host names are both valid values.
# The default is 'localhost', which usually means remote machines will not be able to connect.
# To allow connections from remote users, set this parameter to a non-loopback address.
server.host: "127.0.0.1"
# Enables you to specify a path to mount Kibana at if you are running behind a proxy.
# Use the `server.rewriteBasePath` setting to tell Kibana if it should remove the basePath
# from requests it receives, and to prevent a deprecation warning at startup.
# This setting cannot end in a slash.
#server.basePath: ""
# Specifies whether Kibana should rewrite requests that are prefixed with
# `server.basePath` or require that they are rewritten by your reverse proxy.
# This setting was effectively always `false` before Kibana 6.3 and will
# default to `true` starting in Kibana 7.0.
#server.rewriteBasePath: false
# Specifies the public URL at which Kibana is available for end users. If
# `server.basePath` is configured this URL should end with the same basePath.
#server.publicBaseUrl: ""
# The maximum payload size in bytes for incoming server requests.
#server.maxPayload: 1048576
# The Kibana server's name. This is used for display purposes.
#server.name: "your-hostname"
# The URLs of the Elasticsearch instances to use for all your queries.
#elasticsearch.hosts: ["http://localhost:9200"]
elasticsearch.hosts: ["http://localhost:9201","http://localhost:9202","http://localhost:9203"]
2. Start
(A Kibana screenshot is missing here.)
5. Elasticsearch API
Seed some test data:
PUT users/_doc/1
{
"name":"张飞",
"age":30,
"from": "China",
"desc": "皮肤黑、武器重、性格直",
"tags": ["黑", "重", "直"]
}
PUT users/_doc/2
{
"name":"赵云",
"age":18,
"from":"China",
"desc":"帅气逼人,一身白袍",
"tags":["帅", "白"]
}
PUT users/_doc/3
{
"name":"关羽",
"age":22,
"from":"England",
"desc":"大刀重,骑赤兔马,胡子长",
"tags":["重", "马","长"]
}
PUT users/_doc/4
{
"name":"刘备",
"age":29,
"from":"Child",
"desc":"大耳贼,持双剑,懂谋略",
"tags":["剑", "大"]
}
PUT users/_doc/5
{
"name":"貂蝉",
"age":25,
"from":"England",
"desc":"闭月羞花,沉鱼落雁",
"tags":["闭月","羞花"]
}
5.1 CRUD: Create
Notice: with a PUT, if the document does not exist it is created; if it already exists it is overwritten. Two ways of creating documents are shown below, plus a create-only variant after them.
//POST to the _doc endpoint auto-generates the document id
POST users/_doc
{
"user": "Mike",
"post_date": "2020-10-24T14:39:30",
"message": "trying out kibana"
}
POST assigns a random id, so the PUT form below with an explicit id is usually preferable.
PUT users/_doc/1
{
"user": "Jack",
"post_date": "2020-10-24T14:39:30",
"message": "trying out Elasticsearch"
}
PUT users/_doc/2
{
"user": "Ludy",
"post_date": "2020-10-24T14:39:30",
"message": "trying out Elasticsearch"
}
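If you want create-only semantics, i.e. the request should fail instead of overwriting an existing document, the _create endpoint can be used (a small sketch; the id 3 is arbitrary):
PUT users/_create/3
{
  "user": "Tom",
  "post_date": "2020-10-24T14:39:30",
  "message": "only stored if no document with id 3 exists yet"
}
// if id 3 already exists, this returns a 409 version_conflict_engine_exception instead of overwriting it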
Retrieve a document by id x:
GET users/_doc/x
5.2 CRUD: Update
POST users/_doc/3/_update
{
"doc": {
"post_date": "2020-10-24T14:39:30",
"message": "trying out Elasticsearch"
}
}
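The typed form above still works on 7.x but is deprecated; the untyped equivalent of the same partial update looks like this (a sketch):
POST users/_update/3
{
  "doc": {
    "post_date": "2020-10-24T14:39:30",
    "message": "trying out Elasticsearch"
  }
}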
5.3 CRUD: Delete
DELETE users/_doc/4
5.4 CRUD: Retrieve
Querying with ElasticHD; demo:
- index spring.student
- index spring.test
- the type defaults to _doc
- GET /spring.student/_search (query everything in the index)
- GET /spring.student/_search?q=id:1 (query within a single index)
- GET /spring.student,spring.test/_search?q=id:1 (query across multiple indexes)
5.5 match queries
GET users/_doc/_search
{
"query": {
"match": {
"post_date": "2020-10-24T14:39:30"
}
}
}
5.6 term queries
GET users/_doc/_search
{
"query": {
"term": {
"t1": "Beautiful girl!"
}
}
}
5.7 Sorting
Sortable field types:
- numbers
- dates
GET users/_doc/_search
{
"query": {
"match": {
"post_date": "2020-10-24T14:39:30"
}
},
"sort": [
{
"id": {
"order": "desc"
}
}
]
}
5.8 Pagination
GET users/_doc/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"age": {
"order": "desc"
}
}
],
"from": 2,
"size": 1
}
6. Boolean (bool) queries
Keyword | Meaning |
must | and |
should | or |
must_not | not |
filter | used together with must; filters without affecting the score |
range | a range condition |
gt | greater than, like > in a relational database |
gte | greater than or equal, like >= |
lt | less than, like < |
lte | less than or equal, like <= |
6.1 must queries
//a single condition
GET lqz/_doc/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"from": "gu"
}
}
]
}
}
}
//two conditions
GET lqz/_doc/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"from": "gu"
}
},
{
"match": {
"age": 30
}
}
]
}
}
}
6.2 should queries
GET lqz/_doc/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"from": "gu"
}
},
{
"match": {
"tags": "闭月"
}
}
]
}
}
}
6.3 must_not queries
//exclude documents that match any of the three conditions
GET lqz/_doc/_search
{
"query": {
"bool": {
"must_not": [
{
"match": {
"from": "gu"
}
},
{
"match": {
"tags": "可爱"
}
},
{
"match": {
"age": 18
}
}
]
}
}
}
6.4 filter queries
//how to find documents where from is gu and age is greater than 25
GET lqz/_doc/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"from": "gu"
}
}
],
"filter": {
"range": {
"age": {
"gt": 25
}
}
}
}
}
}
7. Filtering query results
//of all the fields, I only want name and age in the results
GET lqz/_doc/_search
{
"query": {
"match": {
"name": "顾老二"
}
},
"_source": ["name", "age"]
}
8. Highlighting
GET lqz/_doc/_search
{
"query": {
"match": {
"name": "石头"
}
},
"highlight": {
"fields": {
"name": {}
}
}
}
Custom highlighting with a <b> tag:
GET lqz/chengyuan/_search
{
"query": {
"match": {
"from": "gu"
}
},
"highlight": {
"pre_tags": "<b class='key' style='color:red'>",
"post_tags": "</b>",
"fields": {
"from": {}
}
}
}
9. Aggregation queries
Aggregation functions:
- avg
- max
- min
- sum
Aggregations are written inside aggs; my_xxx is just an alias you give the result, and the computed value is returned under that name.
"aggregations" : {
"my_avg" : {
"value" : 27.0
}
}
"aggregations" : {
"my_max" : {
"value" : 30.0
}
}
.......
9.1 avg
GET users/_doc/_search
{
"query": {
"match": {
"from": "gu"
}
},
"aggs": {
"my_avg": {
"avg": {
"field": "age"
}
}
},
"_source": ["name", "age"]
}
GET lqz/_doc/_search
{
"query": {
"match": {
"from": "gu"
}
},
"aggs": {
"my_avg": {
"avg": {
"field": "age"
}
}
},
"size": 0,
"_source": ["name", "age"]
}
9.2 max
GET lqz/_doc/_search
{
"query": {
"match": {
"from": "gu"
}
},
"aggs": {
"my_max": {
"max": {
"field": "age"
}
}
},
"size": 0
}
9.3 min
GET lqz/_doc/_search
{
"query": {
"match": {
"from": "gu"
}
},
"aggs": {
"my_min": {
"min": {
"field": "age"
}
}
},
"size": 0
}
9.4 sum
GET lqz/_doc/_search
{
"query": {
"match": {
"from": "gu"
}
},
"aggs": {
"my_sum": {
"sum": {
"field": "age"
}
}
},
"size": 0
}
9.5 range bucketing
GET lqz/_doc/_search
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"age_group": {
"range": {
"field": "age",
"ranges": [
{
"from": 15,
"to": 20
},
{
"from": 20,
"to": 25
},
{
"from": 25,
"to": 30
}
]
}
}
}
}
Combining the two: bucket by age range and compute the average age within each bucket.
GET lqz/_doc/_search
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"age_group": {
"range": {
"field": "age",
"ranges": [
{
"from": 15,
"to": 20
},
{
"from": 20,
"to": 25
},
{
"from": 25,
"to": 30
}
]
},
"aggs": {
"my_avg": {
"avg": {
"field": "age"
}
}
}
}
}
}
10. Mapping & Dynamic Mapping
10.1 mapping
GET index_name/_mapping
//a quick first look
PUT users/_doc/1
{
"user": "Jack",
"post_date": "2020-10-24T14:39:30",
"message": "trying out Elasticsearch"
}
GET users/_mapping
{
"users" : {
"mappings" : {
"properties" : {
"message" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"post_date" : {
"type" : "date"
},
"user" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
Every index has a single mapping type (strictly speaking, this only holds from Elasticsearch 6.x onwards). A mapping type consists of:
- meta-fields: customize how the metadata associated with a document is handled, for example _index, _type, _id and _source.
- fields or properties: the list of fields relevant to the documents.
Field mappings support many parameters, including the following (a small example follows this list):
- analyzer: the analyzer to use; only text fields support it.
- enabled: if false, the data is only stored (in _source) and cannot be searched or aggregated. Defaults to true.
- index: whether to build an inverted index for the field. If false, no inverted index is built (saving space) and the field cannot be searched, but it can still be aggregated and still appears in _source. Defaults to true.
- norms: whether the field supports relevance scoring. If the field is only used for filtering and aggregations and never needs scoring, set it to false to save space. Defaults to true.
- doc_values: if you are sure you never need to sort or aggregate on the field, nor access it from scripts, set it to false to save disk space. Defaults to true.
- fielddata: set it to true if you need to sort or aggregate on a text field. Defaults to false.
- store: defaults to false, with the data kept in _source. By default field values are indexed so they can be searched, but they are not stored separately, so they cannot be retrieved on their own. Storing a field makes sense in some cases, for example a document with a title, a date and a very large content field, where you only want to retrieve the title and date without pulling them out of a huge _source.
- boost: boosts the field's score.
- coerce: whether to enable automatic type conversion, e.g. string to number. Enabled by default.
- dynamic: controls automatic mapping updates; true, false or strict.
- eager_global_ordinals
- fields: the multi-field feature; lets one field have several sub-fields so it can be indexed in several different ways.
- copy_to
- format
- ignore_above
- ignore_malformed
- index_options
- index_phrases
- index_prefixes
- meta
- normalizer
- null_value: the value to use for null.
- position_increment_gap
- properties
- search_analyzer
- similarity
- term_vector
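As a small illustration of a few of these parameters, here is a hedged sketch of an explicit mapping (the index name params_demo and its fields are invented for this example):
PUT params_demo
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "standard",
        "fields": {
          "raw": { "type": "keyword", "ignore_above": 256 }
        }
      },
      "internal_code": {
        "type": "keyword",
        "index": false
      },
      "mobile": {
        "type": "keyword",
        "null_value": "NULL"
      }
    }
  }
}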
10.2 dynamic mapping
How dynamic mapping behaves:
- You do not need to define mappings by hand; ES infers the field types from the documents it receives.
- The inference is sometimes wrong, for example for geo-location data.
- A wrongly inferred type breaks some features, for example range queries.
There are three modes for handling new fields:
- dynamic mapping
- explicit mapping
- strict mapping
1 dynamic mapping
//create the index nolan
PUT nolan
{
"mappings": {
"properties": {
"name": {
"type": "text"
},
"age": {
"type": "long"
}
}
}
}
//inspect the index nolan
GET nolan
{
"nolan" : {
"aliases" : { },
"mappings" : {
"properties" : {
"age" : {
"type" : "long"
},
"name" : {
"type" : "text"
}
}
},
"settings" : {
"index" : {
"routing" : {
"allocation" : {
"include" : {
"_tier_preference" : "data_content"
}
}
},
"number_of_shards" : "1",
"provided_name" : "nolan",
"creation_date" : "1650261517356",
"number_of_replicas" : "1",
"uuid" : "dXUnwua2TDCI2K9hcSL98A",
"version" : {
"created" : "7160299"
}
}
}
}
}
//insert a document
PUT nolan/_doc/1
{
"name": "小黑",
"age": 18,
"sex": "不详"
}
//inspect the index again
{
"nolan" : {
"aliases" : { },
"mappings" : {
"properties" : {
"age" : {
"type" : "long"
},
"name" : {
"type" : "text"
},
"sex" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
},
"settings" : {
"index" : {
"routing" : {
"allocation" : {
"include" : {
"_tier_preference" : "data_content"
}
}
},
"number_of_shards" : "1",
"provided_name" : "nolan",
"creation_date" : "1650261517356",
"number_of_replicas" : "1",
"uuid" : "dXUnwua2TDCI2K9hcSL98A",
"version" : {
"created" : "7160299"
}
}
}
}
}
From the example above you can see that:
Elasticsearch dynamically added a mapping for the new sex field.
By default Elasticsearch allows new fields to be added, i.e. dynamic: true.
//creating the index above is in fact equivalent to this
PUT nolan
{
"mappings": {
"dynamic":true,
"properties": {
"name": {
"type": "text"
},
"age": {
"type": "long"
}
}
}
}
2 explicit mapping
Set dynamic to false:
//create the index nolan1
PUT nolan1
{
"mappings": {
"dynamic": "false",
"properties": {
"name": {
"type": "text"
},
"age": {
"type": "long"
}
}
}
}
//insert documents
PUT nolan1/_doc/1
{
"name": "小黑",
"age":18
}
PUT nolan1/_doc/2
{
"name": "小白",
"age": 16,
"sex": "不详"
}
//inspect the mapping
GET nolan1
{
"nolan1" : {
"aliases" : { },
"mappings" : {
"dynamic" : "false",
"properties" : {
"age" : {
"type" : "long"
},
"name" : {
"type" : "text"
}
}
},
"settings" : {
"index" : {
"routing" : {
"allocation" : {
"include" : {
"_tier_preference" : "data_content"
}
}
},
"number_of_shards" : "1",
"provided_name" : "nolan1",
"creation_date" : "1650262553126",
"number_of_replicas" : "1",
"uuid" : "OuUWSsQ-SUaGr2ged3FRbQ",
"version" : {
"created" : "7160299"
}
}
}
}
}
You can see that Elasticsearch did not create a mapping for the newly added sex field.
Because dynamic is false, when Elasticsearch encounters a new field it
ignores it for indexing purposes, but the field is still stored in _source.
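You can confirm this with a search on the unmapped field: because sex has no mapping, the query below should return no hits, even though the value is still visible in the _source of document 2 (a quick check, not part of the original example):
GET nolan1/_search
{
  "query": {
    "match": {
      "sex": "不详"
    }
  }
}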
3 strict mapping
Set dynamic to strict:
//create the index nolan2
PUT nolan2
{
"mappings": {
"dynamic": "strict",
"properties": {
"name": {
"type": "text"
},
"age": {
"type": "long"
}
}
}
}
//insert documents; the second request will fail with an error
PUT nolan2/_doc/1
{
"name": "小黑",
"age":18
}
PUT nolan2/_doc/2
{
"name": "小白",
"age": 16,
"sex": "不详"
}
As soon as an unknown field is encountered, an exception is thrown.
4 Summary
Name | Setting | Behaviour |
Dynamic mapping | dynamic: true | New fields are added to the mapping automatically (the default) |
Explicit mapping | dynamic: false | New fields are ignored: no mapping is created for them and they cannot be searched, but they still appear in the results via _source |
Strict mapping | dynamic: strict | Any new field causes an exception |
Explicit mapping (dynamic: false) is the most commonly used in practice, much like an HTML img tag where you add an id or class attribute only when you need it.
10.3 Object fields
//documents with a nested object field (create the nolan2 mapping below before running these inserts)
PUT nolan2/_doc/1
{
"name":"tom",
"age":18,
"info":{
"addr":"北京",
"tel":"10010"
}
}
PUT nolan2/_doc/21
{
"name":"jim",
"age":21,
"info":{
"addr":"东莞",
"tel":"10086"
}
}
//create the index nolan2 with an object field (delete any existing nolan2 first)
PUT nolan2
{
"mappings": {
"dynamic": false,
"properties": {
"name": {
"type": "text"
},
"age": {
"type": "text"
},
"info": {
"properties": {
"addr": {
"type": "text"
},
"tel": {
"type" : "text"
}
}
}
}
}
}
GET nolan2/_doc/_search
{
"query": {
"match": {
"info.tel": "10086"
}
}
}
10.4 Controlling whether a field is indexed
The index parameter:
- the age field will not be indexed (a query against it then fails, as sketched after the mapping)
PUT nolan3
{
"mappings": {
"dynamic": false,
"properties": {
"name": {
"type": "text",
"index": true
},
"age": {
"type": "long",
"index": false
}
}
}
}
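With index set to false, age can no longer be used as a search condition; a query like the sketch below is expected to fail with an error along the lines of "Cannot search on field [age] since it is not indexed" (the exact wording depends on the version):
GET nolan3/_search
{
  "query": {
    "term": {
      "age": 18
    }
  }
}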
10.5 Searching for null values
1. keyword fields support null_value (usage is sketched after the mapping)
PUT users
{
"mappings" : {
"properties" : {
"firstName" : {
"type" : "text"
},
"lastName" : {
"type" : "text"
},
"mobile" : {
"type" : "keyword",
"null_value": "NULL"
}
}
}
}
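With null_value configured, an explicit null is indexed as the placeholder string NULL and therefore becomes searchable; a minimal sketch of how that would be used (assuming the users index was just created with the mapping above):
PUT users/_doc/1
{
  "firstName": "Zhang",
  "lastName": "San",
  "mobile": null
}
GET users/_search
{
  "query": {
    "term": {
      "mobile": "NULL"
    }
  }
}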
2. ignore_above
//create the index
PUT nolan
{
"mappings": {
"properties":{
"t1":{
"type":"keyword",
"ignore_above": 5
},
"t2":{
"type":"keyword",
"ignore_above": 10 ①
}
}
}
}
//insert a document
PUT nolan/_doc/1
{
"t1":"elk", ②
"t2":"elasticsearch" ③
}
//query ④
GET nolan/_doc/_search
{
"query":{
"term": {
"t1": "elk"
}
}
}
//query ⑤
GET nolan/_doc/_search
{
"query": {
"term": {
"t2": "elasticsearch"
}
}
}
- ① the field will ignore any string longer than 10 characters
- ② the document is indexed successfully, i.e. it can be searched and results are returned
- ③ this value is not indexed; using this field as a search condition returns no results
- ④ returns results
- ⑤ returns no results, because the value of t2 is longer than the ignore_above limit
11. Index settings
Set the number of primary and replica shards:
PUT nolan
{
"mappings": {
"properties": {
"name": {
"type": "text"
}
}
},
"settings": {
"number_of_replicas": 1,
"number_of_shards": 5
}
}
- number_of_shards is the number of primary shards (older versions defaulted to 5 per index; since 7.0 the default is 1)
- number_of_replicas is the number of replica shards; by default each primary shard gets one replica. The replica count can be changed later, as sketched below.
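The primary shard count is fixed once the index exists, but the replica count can be changed at any time through the settings API (a sketch, reusing the nolan index from above):
PUT nolan/_settings
{
  "number_of_replicas": 2
}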
12. Field data types
Simple types:
- Numeric
- Boolean
- Date
- Text
- Keyword
- Binary
- etc.
Complex types:
- Object
- Arrays
- Nested: a kind of object data type.
- Join: defines parent/child relationships between documents in the same index.
Special types:
- Geo-point
- Geo-shape
- Percolator
A small mapping sketch that combines several of these types follows.
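A hedged sketch of such a mapping (the index name types_demo and all field names are invented for illustration):
PUT types_demo
{
  "mappings": {
    "properties": {
      "title":    { "type": "text" },
      "sku":      { "type": "keyword" },
      "price":    { "type": "double" },
      "in_stock": { "type": "boolean" },
      "created":  { "type": "date" },
      "location": { "type": "geo_point" },
      "variants": {
        "type": "nested",
        "properties": {
          "color": { "type": "keyword" },
          "size":  { "type": "keyword" }
        }
      }
    }
  }
}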
13. Cluster nodes
1. Master-eligible nodes and the master node. Every node is master-eligible by default when it starts. Master-eligible nodes can take part in the master election and become the master node. When the first node starts, it elects itself as master.
Only the master node may modify the cluster state. The cluster state holds the information the cluster needs (a quick way to inspect it is sketched below):
- information about all nodes
- all indexes together with their mappings and settings
- the shard routing table
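The cluster state can be inspected through the REST API; restricting the metrics keeps the response manageable (a quick sketch):
GET _cluster/state/nodes,metadata,routing_table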
2. Data nodes & coordinating nodes
- Data node: a node that can hold data; it is responsible for storing shard data and is key to scaling data horizontally.
- Coordinating node: receives client requests, forwards them to the appropriate nodes and merges the partial results into the final response.
3. Shards (primary shard & replica shard)
- Primary shards solve horizontal scaling: through them an index's data is distributed across all nodes of the cluster. Each shard is a running Lucene instance. The number of primary shards is fixed when the index is created and cannot be changed afterwards, short of a reindex.
- Replica shards solve high availability: they are copies of primary shards. The replica count can be adjusted dynamically, and increasing it can also improve availability and read throughput.
14. Analyzers and tokenization
After data is sent to Elasticsearch, it goes through a series of steps:
- Character filtering: character filters transform the raw characters.
- Tokenization: the text is split into one or more tokens.
- Token filtering: token filters transform each token.
- Indexing: finally the tokens are stored in Lucene's inverted index.
14.1 Analyzers
In Elasticsearch an analyzer consists of (a combined example follows this list):
- optional character filters
- exactly one tokenizer
- zero or more token filters
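The three building blocks can be combined into a custom analyzer. A minimal sketch (the index name analyzer_demo is invented) with one character filter, one tokenizer and one token filter:
PUT analyzer_demo
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip"],
          "tokenizer": "standard",
          "filter": ["lowercase"]
        }
      }
    }
  }
}
POST analyzer_demo/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": "<p>To BE or NOT to be</p>"
}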
1. Standard analyzer
The standard analyzer is Elasticsearch's default. It combines defaults that are reasonable for most European languages: the standard tokenizer, the standard token filter, the lowercase token filter and a stop-word token filter (whose stop-word list is empty by default).
POST _analyze
{
"analyzer": "standard",
"text":"To be or not to be, That is a question ———— 莎士比亚"
}
// tokenization result
{
"tokens" : [
{
"token" : "to",
"start_offset" : 0,
"end_offset" : 2,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "be",
"start_offset" : 3,
"end_offset" : 5,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "or",
"start_offset" : 6,
"end_offset" : 8,
"type" : "<ALPHANUM>",
"position" : 2
},
{
"token" : "not",
"start_offset" : 9,
"end_offset" : 12,
"type" : "<ALPHANUM>",
"position" : 3
},
{
"token" : "to",
"start_offset" : 13,
"end_offset" : 15,
"type" : "<ALPHANUM>",
"position" : 4
},
{
"token" : "be",
"start_offset" : 16,
"end_offset" : 18,
"type" : "<ALPHANUM>",
"position" : 5
},
{
"token" : "that",
"start_offset" : 21,
"end_offset" : 25,
"type" : "<ALPHANUM>",
"position" : 6
},
{
"token" : "is",
"start_offset" : 26,
"end_offset" : 28,
"type" : "<ALPHANUM>",
"position" : 7
},
{
"token" : "a",
"start_offset" : 29,
"end_offset" : 30,
"type" : "<ALPHANUM>",
"position" : 8
},
{
"token" : "question",
"start_offset" : 31,
"end_offset" : 39,
"type" : "<ALPHANUM>",
"position" : 9
},
{
"token" : "莎",
"start_offset" : 45,
"end_offset" : 46,
"type" : "<IDEOGRAPHIC>",
"position" : 10
},
{
"token" : "士",
"start_offset" : 46,
"end_offset" : 47,
"type" : "<IDEOGRAPHIC>",
"position" : 11
},
{
"token" : "比",
"start_offset" : 47,
"end_offset" : 48,
"type" : "<IDEOGRAPHIC>",
"position" : 12
},
{
"token" : "亚",
"start_offset" : 48,
"end_offset" : 49,
"type" : "<IDEOGRAPHIC>",
"position" : 13
}
]
}
2. Simple analyzer
The simple analyzer only applies lowercase tokenization, i.e. it splits on non-letter characters and lowercases each token. It works poorly for Asian languages, which are not whitespace-delimited, so it is generally used for European languages.
POST _analyze
{
"analyzer": "simple",
"text":"To be or not to be, That is a question ———— 莎士比亚"
}
// tokenization result
{
"tokens" : [
{
"token" : "to",
"start_offset" : 0,
"end_offset" : 2,
"type" : "word",
"position" : 0
},
{
"token" : "be",
"start_offset" : 3,
"end_offset" : 5,
"type" : "word",
"position" : 1
},
{
"token" : "or",
"start_offset" : 6,
"end_offset" : 8,
"type" : "word",
"position" : 2
},
{
"token" : "not",
"start_offset" : 9,
"end_offset" : 12,
"type" : "word",
"position" : 3
},
{
"token" : "to",
"start_offset" : 13,
"end_offset" : 15,
"type" : "word",
"position" : 4
},
{
"token" : "be",
"start_offset" : 16,
"end_offset" : 18,
"type" : "word",
"position" : 5
},
{
"token" : "that",
"start_offset" : 21,
"end_offset" : 25,
"type" : "word",
"position" : 6
},
{
"token" : "is",
"start_offset" : 26,
"end_offset" : 28,
"type" : "word",
"position" : 7
},
{
"token" : "a",
"start_offset" : 29,
"end_offset" : 30,
"type" : "word",
"position" : 8
},
{
"token" : "question",
"start_offset" : 31,
"end_offset" : 39,
"type" : "word",
"position" : 9
},
{
"token" : "莎士比亚",
"start_offset" : 45,
"end_offset" : 49,
"type" : "word",
"position" : 10
}
]
}
3. Whitespace analyzer
The whitespace analyzer simply splits the text into tokens on whitespace. As lazy as it sounds!
POST _analyze
{
"analyzer": "whitespace",
"text":"To be or not to be, That is a question ———— 莎士比亚"
}
// tokenization result
{
"tokens" : [
{
"token" : "To",
"start_offset" : 0,
"end_offset" : 2,
"type" : "word",
"position" : 0
},
{
"token" : "be",
"start_offset" : 3,
"end_offset" : 5,
"type" : "word",
"position" : 1
},
{
"token" : "or",
"start_offset" : 6,
"end_offset" : 8,
"type" : "word",
"position" : 2
},
{
"token" : "not",
"start_offset" : 9,
"end_offset" : 12,
"type" : "word",
"position" : 3
},
{
"token" : "to",
"start_offset" : 13,
"end_offset" : 15,
"type" : "word",
"position" : 4
},
{
"token" : "be,",
"start_offset" : 16,
"end_offset" : 19,
"type" : "word",
"position" : 5
},
{
"token" : "That",
"start_offset" : 21,
"end_offset" : 25,
"type" : "word",
"position" : 6
},
{
"token" : "is",
"start_offset" : 26,
"end_offset" : 28,
"type" : "word",
"position" : 7
},
{
"token" : "a",
"start_offset" : 29,
"end_offset" : 30,
"type" : "word",
"position" : 8
},
{
"token" : "question",
"start_offset" : 31,
"end_offset" : 39,
"type" : "word",
"position" : 9
},
{
"token" : "————",
"start_offset" : 40,
"end_offset" : 44,
"type" : "word",
"position" : 10
},
{
"token" : "莎士比亚",
"start_offset" : 45,
"end_offset" : 49,
"type" : "word",
"position" : 11
}
]
}
4. Stop analyzer
The stop analyzer behaves much like the simple analyzer, except that it additionally filters stop words out of the token stream.
POST _analyze
{
"analyzer": "stop",
"text":"To be or not to be, That is a question ———— 莎士比亚"
}
{
"tokens" : [
{
"token" : "question",
"start_offset" : 31,
"end_offset" : 39,
"type" : "word",
"position" : 9
},
{
"token" : "莎士比亚",
"start_offset" : 45,
"end_offset" : 49,
"type" : "word",
"position" : 10
}
]
}
5. Keyword analyzer
The keyword analyzer treats the whole field as a single token. Unless there is a good reason, we do not use the keyword analyzer in mappings.
POST _analyze
{
"analyzer": "keyword",
"text":"To be or not to be, That is a question ———— 莎士比亚"
}
// tokenization result
{
"tokens" : [
{
"token" : "To be or not to be, That is a question ———— 莎士比亚",
"start_offset" : 0,
"end_offset" : 49,
"type" : "word",
"position" : 0
}
]
}
6. Pattern analyzer
The pattern analyzer lets you specify a pattern on which the text is split into tokens. Usually, though, the better approach is a custom analyzer that combines the existing pattern tokenizer with whatever token filters you need.
PUT pattern_test
{
"settings": {
"analysis": {
"analyzer": {
"my_email_analyzer":{
"type":"pattern",
"pattern":"\\W|_",
"lowercase":true
}
}
}
}
}
POST pattern_test/_analyze
{
"analyzer": "my_email_analyzer",
"text": "John_Smith@foo-bar.com"
}
// tokenization result
{
"tokens" : [
{
"token" : "john",
"start_offset" : 0,
"end_offset" : 4,
"type" : "word",
"position" : 0
},
{
"token" : "smith",
"start_offset" : 5,
"end_offset" : 10,
"type" : "word",
"position" : 1
},
{
"token" : "foo",
"start_offset" : 11,
"end_offset" : 14,
"type" : "word",
"position" : 2
},
{
"token" : "bar",
"start_offset" : 15,
"end_offset" : 18,
"type" : "word",
"position" : 3
},
{
"token" : "com",
"start_offset" : 19,
"end_offset" : 22,
"type" : "word",
"position" : 4
}
]
}
7. Language and multi-language analyzers: chinese
Elasticsearch ships with good, simple, out-of-the-box analyzers for many of the world's common languages: Arabic, Armenian, Basque, Brazilian Portuguese, Bulgarian, Catalan, Chinese, Czech, Danish, Dutch, English, Finnish, French, Galician, German, Greek, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Korean, Kurdish, Norwegian, Persian, Portuguese, Romanian, Russian, Spanish, Swedish, Turkish and Thai.
POST _analyze
{
"analyzer": "chinese",
"text":"To be or not to be, That is a question ———— 莎士比亚"
}
{
"tokens" : [
{
"token" : "question",
"start_offset" : 31,
"end_offset" : 39,
"type" : "<ALPHANUM>",
"position" : 9
},
{
"token" : "莎",
"start_offset" : 45,
"end_offset" : 46,
"type" : "<IDEOGRAPHIC>",
"position" : 10
},
{
"token" : "士",
"start_offset" : 46,
"end_offset" : 47,
"type" : "<IDEOGRAPHIC>",
"position" : 11
},
{
"token" : "比",
"start_offset" : 47,
"end_offset" : 48,
"type" : "<IDEOGRAPHIC>",
"position" : 12
},
{
"token" : "亚",
"start_offset" : 48,
"end_offset" : 49,
"type" : "<IDEOGRAPHIC>",
"position" : 13
}
]
}
Other languages work the same way:
POST _analyze
{
"analyzer": "french",
"text":"Je suis ton père"
}
POST _analyze
{
"analyzer": "german",
"text":"Ich bin dein vater"
}
8. Snowball analyzer
The snowball analyzer uses the standard tokenizer and token filter (like the standard analyzer), plus the lowercase token filter and the stop-word filter, and in addition it runs the Snowball stemmer over the text.
POST _analyze
{
"analyzer": "snowball",
"text":"To be or not to be, That is a question ———— 莎士比亚"
}
// tokenization result
{
"tokens" : [
{
"token" : "question",
"start_offset" : 31,
"end_offset" : 39,
"type" : "<ALPHANUM>",
"position" : 9
},
{
"token" : "莎",
"start_offset" : 45,
"end_offset" : 46,
"type" : "<IDEOGRAPHIC>",
"position" : 10
},
{
"token" : "士",
"start_offset" : 46,
"end_offset" : 47,
"type" : "<IDEOGRAPHIC>",
"position" : 11
},
{
"token" : "比",
"start_offset" : 47,
"end_offset" : 48,
"type" : "<IDEOGRAPHIC>",
"position" : 12
},
{
"token" : "亚",
"start_offset" : 48,
"end_offset" : 49,
"type" : "<IDEOGRAPHIC>",
"position" : 13
}
]
}
14.2 Character filters
Name | Value |
HTML strip filter | HTML Strip Char Filter |
Mapping filter | Mapping Char Filter |
Pattern replace filter | Pattern Replace Char Filter |
1. HTML strip character filter
The HTML Strip Char Filter removes HTML elements from the text.
POST _analyze
{
"tokenizer": "keyword",
"char_filter": ["html_strip"],
"text":"<p>I'm so <b>happy</b>!</p>"
}
//result
{
"tokens" : [
{
"token" : """
I'm so happy!
""",
"start_offset" : 0,
"end_offset" : 32,
"type" : "word",
"position" : 0
}
]
}
2. Mapping character filter
The Mapping Char Filter takes a map of keys to values; whenever it meets a string equal to one of the keys, it replaces it with the value associated with that key.
PUT pattern_test4
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer":{
"tokenizer":"keyword",
"char_filter":["my_char_filter"]
}
},
"char_filter":{
"my_char_filter":{
"type":"mapping",
"mappings":["刘备 => 666","关羽 => 888"]
}
}
}
}
}
POST pattern_test4/_analyze
{
"analyzer": "my_analyzer",
"text": "刘备爱惜关羽,可是后来关羽大意失荆州"
}
//result
{
"tokens" : [
{
"token" : "666爱惜888,可是后来888大意失荆州",
"start_offset" : 0,
"end_offset" : 19,
"type" : "word",
"position" : 0
}
]
}
3. Pattern replace character filter
The Pattern Replace Char Filter uses a regular expression to match and replace characters in the string. Be careful with badly written regular expressions, they can make things slow!
PUT pattern_test5
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "standard",
"char_filter": [
"my_char_filter"
]
}
},
"char_filter": {
"my_char_filter": {
"type": "pattern_replace",
"pattern": "(\\d+)-(?=\\d)",
"replacement": "$1_"
}
}
}
}
}
POST pattern_test5/_analyze
{
"analyzer": "my_analyzer",
"text": "My credit card is 123-456-789"
}
//result
{
"tokens" : [
{
"token" : "My",
"start_offset" : 0,
"end_offset" : 2,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "credit",
"start_offset" : 3,
"end_offset" : 9,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "card",
"start_offset" : 10,
"end_offset" : 14,
"type" : "<ALPHANUM>",
"position" : 2
},
{
"token" : "is",
"start_offset" : 15,
"end_offset" : 17,
"type" : "<ALPHANUM>",
"position" : 3
},
{
"token" : "123_456_789",
"start_offset" : 18,
"end_offset" : 29,
"type" : "<NUM>",
"position" : 4
}
]
}
14.3 Tokenizers
Since Elasticsearch has built-in analyzers, it also has built-in tokenizers. A tokenizer, as the name suggests, breaks a text string into small chunks called tokens.
1. Standard tokenizer
The standard tokenizer is grammar-based and works well for most European languages. It also handles Unicode text, with a default maximum token length of 255 bytes, and it removes punctuation such as commas and periods.
POST _analyze
{
"tokenizer": "standard",
"text":"To be or not to be, That is a question ———— 莎士比亚"
}
{
"tokens" : [
{
"token" : "To",
"start_offset" : 0,
"end_offset" : 2,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "be",
"start_offset" : 3,
"end_offset" : 5,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "or",
"start_offset" : 6,
"end_offset" : 8,
"type" : "<ALPHANUM>",
"position" : 2
},
{
"token" : "not",
"start_offset" : 9,
"end_offset" : 12,
"type" : "<ALPHANUM>",
"position" : 3
},
{
"token" : "to",
"start_offset" : 13,
"end_offset" : 15,
"type" : "<ALPHANUM>",
"position" : 4
},
{
"token" : "be",
"start_offset" : 16,
"end_offset" : 18,
"type" : "<ALPHANUM>",
"position" : 5
},
{
"token" : "That",
"start_offset" : 21,
"end_offset" : 25,
"type" : "<ALPHANUM>",
"position" : 6
},
{
"token" : "is",
"start_offset" : 26,
"end_offset" : 28,
"type" : "<ALPHANUM>",
"position" : 7
},
{
"token" : "a",
"start_offset" : 29,
"end_offset" : 30,
"type" : "<ALPHANUM>",
"position" : 8
},
{
"token" : "question",
"start_offset" : 31,
"end_offset" : 39,
"type" : "<ALPHANUM>",
"position" : 9
},
{
"token" : "莎",
"start_offset" : 45,
"end_offset" : 46,
"type" : "<IDEOGRAPHIC>",
"position" : 10
},
{
"token" : "士",
"start_offset" : 46,
"end_offset" : 47,
"type" : "<IDEOGRAPHIC>",
"position" : 11
},
{
"token" : "比",
"start_offset" : 47,
"end_offset" : 48,
"type" : "<IDEOGRAPHIC>",
"position" : 12
},
{
"token" : "亚",
"start_offset" : 48,
"end_offset" : 49,
"type" : "<IDEOGRAPHIC>",
"position" : 13
}
]
}
2. Keyword tokenizer
The keyword tokenizer is a trivial tokenizer that emits the entire text as a single token, which is then handed to the token filters. It is a good choice when you only want token filters and no real tokenization.
POST _analyze
{
"tokenizer": "keyword",
"text":"To be or not to be, That is a question ———— 莎士比亚"
}
{
"tokens" : [
{
"token" : "To be or not to be, That is a question ———— 莎士比亚",
"start_offset" : 0,
"end_offset" : 49,
"type" : "word",
"position" : 0
}
]
}
3. Letter tokenizer
The letter tokenizer splits the text into tokens on non-letter characters.
POST _analyze
{
"tokenizer": "letter",
"text":"To be or not to be, That is a question ———— 莎士比亚"
}
{
"tokens" : [
{
"token" : "To",
"start_offset" : 0,
"end_offset" : 2,
"type" : "word",
"position" : 0
},
{
"token" : "be",
"start_offset" : 3,
"end_offset" : 5,
"type" : "word",
"position" : 1
},
{
"token" : "or",
"start_offset" : 6,
"end_offset" : 8,
"type" : "word",
"position" : 2
},
{
"token" : "not",
"start_offset" : 9,
"end_offset" : 12,
"type" : "word",
"position" : 3
},
{
"token" : "to",
"start_offset" : 13,
"end_offset" : 15,
"type" : "word",
"position" : 4
},
{
"token" : "be",
"start_offset" : 16,
"end_offset" : 18,
"type" : "word",
"position" : 5
},
{
"token" : "That",
"start_offset" : 21,
"end_offset" : 25,
"type" : "word",
"position" : 6
},
{
"token" : "is",
"start_offset" : 26,
"end_offset" : 28,
"type" : "word",
"position" : 7
},
{
"token" : "a",
"start_offset" : 29,
"end_offset" : 30,
"type" : "word",
"position" : 8
},
{
"token" : "question",
"start_offset" : 31,
"end_offset" : 39,
"type" : "word",
"position" : 9
},
{
"token" : "莎士比亚",
"start_offset" : 45,
"end_offset" : 49,
"type" : "word",
"position" : 10
}
]
}
4. Lowercase tokenizer
The lowercase tokenizer combines the behaviour of the letter tokenizer with the lowercase token filter (it lowercases every token, as you would expect). The main reason this exists as a single tokenizer is that doing both in one pass performs better.
POST _analyze
{
"tokenizer": "lowercase",
"text":"To be or not to be, That is a question ———— 莎士比亚"
}
{
"tokens" : [
{
"token" : "to",
"start_offset" : 0,
"end_offset" : 2,
"type" : "word",
"position" : 0
},
{
"token" : "be",
"start_offset" : 3,
"end_offset" : 5,
"type" : "word",
"position" : 1
},
{
"token" : "or",
"start_offset" : 6,
"end_offset" : 8,
"type" : "word",
"position" : 2
},
{
"token" : "not",
"start_offset" : 9,
"end_offset" : 12,
"type" : "word",
"position" : 3
},
{
"token" : "to",
"start_offset" : 13,
"end_offset" : 15,
"type" : "word",
"position" : 4
},
{
"token" : "be",
"start_offset" : 16,
"end_offset" : 18,
"type" : "word",
"position" : 5
},
{
"token" : "that",
"start_offset" : 21,
"end_offset" : 25,
"type" : "word",
"position" : 6
},
{
"token" : "is",
"start_offset" : 26,
"end_offset" : 28,
"type" : "word",
"position" : 7
},
{
"token" : "a",
"start_offset" : 29,
"end_offset" : 30,
"type" : "word",
"position" : 8
},
{
"token" : "question",
"start_offset" : 31,
"end_offset" : 39,
"type" : "word",
"position" : 9
},
{
"token" : "莎士比亚",
"start_offset" : 45,
"end_offset" : 49,
"type" : "word",
"position" : 10
}
]
}
5. Whitespace tokenizer
The whitespace tokenizer splits tokens on whitespace: spaces, tabs, newlines and so on. Note that it does not remove any punctuation.
POST _analyze
{
"tokenizer": "whitespace",
"text":"To be or not to be, That is a question ———— 莎士比亚"
}
{
"tokens" : [
{
"token" : "To",
"start_offset" : 0,
"end_offset" : 2,
"type" : "word",
"position" : 0
},
{
"token" : "be",
"start_offset" : 3,
"end_offset" : 5,
"type" : "word",
"position" : 1
},
{
"token" : "or",
"start_offset" : 6,
"end_offset" : 8,
"type" : "word",
"position" : 2
},
{
"token" : "not",
"start_offset" : 9,
"end_offset" : 12,
"type" : "word",
"position" : 3
},
{
"token" : "to",
"start_offset" : 13,
"end_offset" : 15,
"type" : "word",
"position" : 4
},
{
"token" : "be,",
"start_offset" : 16,
"end_offset" : 19,
"type" : "word",
"position" : 5
},
{
"token" : "That",
"start_offset" : 21,
"end_offset" : 25,
"type" : "word",
"position" : 6
},
{
"token" : "is",
"start_offset" : 26,
"end_offset" : 28,
"type" : "word",
"position" : 7
},
{
"token" : "a",
"start_offset" : 29,
"end_offset" : 30,
"type" : "word",
"position" : 8
},
{
"token" : "question",
"start_offset" : 31,
"end_offset" : 39,
"type" : "word",
"position" : 9
},
{
"token" : "————",
"start_offset" : 40,
"end_offset" : 44,
"type" : "word",
"position" : 10
},
{
"token" : "莎士比亚",
"start_offset" : 45,
"end_offset" : 49,
"type" : "word",
"position" : 11
}
]
}
6. Pattern tokenizer
The pattern tokenizer splits the text into tokens on an arbitrary pattern you specify.
PUT pattern_test2
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer":{
          "tokenizer":"my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer":{
          "type":"pattern",
          "pattern":","
        }
      }
    }
  }
}
POST pattern_test2/_analyze
{
  "tokenizer": "my_tokenizer",
  "text":"To be or not to be, That is a question ———— 莎士比亚"
}
{
"tokens" : [
{
"token" : "To be or not to be",
"start_offset" : 0,
"end_offset" : 18,
"type" : "word",
"position" : 0
},
{
"token" : " That is a question ———— 莎士比亚",
"start_offset" : 19,
"end_offset" : 49,
"type" : "word",
"position" : 1
}
]
}
7. UAX URL email tokenizer
POST _analyze
{
"tokenizer": "uax_url_email",
"text":"作者:张开来源:未知原文:邮箱:xxxxxxx@xx.com版权声明:本文为博主原创文章,转载请附上博文链接!"
}
{
"tokens" : [
{
"token" : "作",
"start_offset" : 0,
"end_offset" : 1,
"type" : "<IDEOGRAPHIC>",
"position" : 0
},
{
"token" : "者",
"start_offset" : 1,
"end_offset" : 2,
"type" : "<IDEOGRAPHIC>",
"position" : 1
},
{
"token" : "张",
"start_offset" : 3,
"end_offset" : 4,
"type" : "<IDEOGRAPHIC>",
"position" : 2
},
{
"token" : "开",
"start_offset" : 4,
"end_offset" : 5,
"type" : "<IDEOGRAPHIC>",
"position" : 3
},
{
"token" : "来",
"start_offset" : 5,
"end_offset" : 6,
"type" : "<IDEOGRAPHIC>",
"position" : 4
},
{
"token" : "源",
"start_offset" : 6,
"end_offset" : 7,
"type" : "<IDEOGRAPHIC>",
"position" : 5
},
{
"token" : "未",
"start_offset" : 8,
"end_offset" : 9,
"type" : "<IDEOGRAPHIC>",
"position" : 6
},
{
"token" : "知",
"start_offset" : 9,
"end_offset" : 10,
"type" : "<IDEOGRAPHIC>",
"position" : 7
},
{
"token" : "原",
"start_offset" : 10,
"end_offset" : 11,
"type" : "<IDEOGRAPHIC>",
"position" : 8
},
{
"token" : "文",
"start_offset" : 11,
"end_offset" : 12,
"type" : "<IDEOGRAPHIC>",
"position" : 9
},
{
"token" : "",
"start_offset" : 13,
"end_offset" : 64,
"type" : "<URL>",
"position" : 10
},
{
"token" : "邮",
"start_offset" : 64,
"end_offset" : 65,
"type" : "<IDEOGRAPHIC>",
"position" : 11
},
{
"token" : "箱",
"start_offset" : 65,
"end_offset" : 66,
"type" : "<IDEOGRAPHIC>",
"position" : 12
},
{
"token" : "xxxxxxx@xx.com",
"start_offset" : 67,
"end_offset" : 81,
"type" : "<EMAIL>",
"position" : 13
},
{
"token" : "版",
"start_offset" : 81,
"end_offset" : 82,
"type" : "<IDEOGRAPHIC>",
"position" : 14
},
{
"token" : "权",
"start_offset" : 82,
"end_offset" : 83,
"type" : "<IDEOGRAPHIC>",
"position" : 15
},
{
"token" : "声",
"start_offset" : 83,
"end_offset" : 84,
"type" : "<IDEOGRAPHIC>",
"position" : 16
},
{
"token" : "明",
"start_offset" : 84,
"end_offset" : 85,
"type" : "<IDEOGRAPHIC>",
"position" : 17
},
{
"token" : "本",
"start_offset" : 86,
"end_offset" : 87,
"type" : "<IDEOGRAPHIC>",
"position" : 18
},
{
"token" : "文",
"start_offset" : 87,
"end_offset" : 88,
"type" : "<IDEOGRAPHIC>",
"position" : 19
},
{
"token" : "为",
"start_offset" : 88,
"end_offset" : 89,
"type" : "<IDEOGRAPHIC>",
"position" : 20
},
{
"token" : "博",
"start_offset" : 89,
"end_offset" : 90,
"type" : "<IDEOGRAPHIC>",
"position" : 21
},
{
"token" : "主",
"start_offset" : 90,
"end_offset" : 91,
"type" : "<IDEOGRAPHIC>",
"position" : 22
},
{
"token" : "原",
"start_offset" : 91,
"end_offset" : 92,
"type" : "<IDEOGRAPHIC>",
"position" : 23
},
{
"token" : "创",
"start_offset" : 92,
"end_offset" : 93,
"type" : "<IDEOGRAPHIC>",
"position" : 24
},
{
"token" : "文",
"start_offset" : 93,
"end_offset" : 94,
"type" : "<IDEOGRAPHIC>",
"position" : 25
},
{
"token" : "章",
"start_offset" : 94,
"end_offset" : 95,
"type" : "<IDEOGRAPHIC>",
"position" : 26
},
{
"token" : "转",
"start_offset" : 96,
"end_offset" : 97,
"type" : "<IDEOGRAPHIC>",
"position" : 27
},
{
"token" : "载",
"start_offset" : 97,
"end_offset" : 98,
"type" : "<IDEOGRAPHIC>",
"position" : 28
},
{
"token" : "请",
"start_offset" : 98,
"end_offset" : 99,
"type" : "<IDEOGRAPHIC>",
"position" : 29
},
{
"token" : "附",
"start_offset" : 99,
"end_offset" : 100,
"type" : "<IDEOGRAPHIC>",
"position" : 30
},
{
"token" : "上",
"start_offset" : 100,
"end_offset" : 101,
"type" : "<IDEOGRAPHIC>",
"position" : 31
},
{
"token" : "博",
"start_offset" : 101,
"end_offset" : 102,
"type" : "<IDEOGRAPHIC>",
"position" : 32
},
{
"token" : "文",
"start_offset" : 102,
"end_offset" : 103,
"type" : "<IDEOGRAPHIC>",
"position" : 33
},
{
"token" : "链",
"start_offset" : 103,
"end_offset" : 104,
"type" : "<IDEOGRAPHIC>",
"position" : 34
},
{
"token" : "接",
"start_offset" : 104,
"end_offset" : 105,
"type" : "<IDEOGRAPHIC>",
"position" : 35
}
]
}
8. Path hierarchy tokenizer
The path hierarchy tokenizer indexes file-system paths in a way that lets files sharing the same path prefix be returned together at search time.
POST _analyze
{
"tokenizer": "path_hierarchy",
"text":"/usr/local/python/python2.7"
}
{
"tokens" : [
{
"token" : "/usr",
"start_offset" : 0,
"end_offset" : 4,
"type" : "word",
"position" : 0
},
{
"token" : "/usr/local",
"start_offset" : 0,
"end_offset" : 10,
"type" : "word",
"position" : 0
},
{
"token" : "/usr/local/python",
"start_offset" : 0,
"end_offset" : 17,
"type" : "word",
"position" : 0
},
{
"token" : "/usr/local/python/python2.7",
"start_offset" : 0,
"end_offset" : 27,
"type" : "word",
"position" : 0
}
]
}
14.4 Token filters
1. A custom token filter
PUT pattern_test3
{
"settings": {
"analysis": {
"filter": {
"my_test_length":{
"type":"length",
"max":8,
"min":2
}
}
}
}
}
POST pattern_test3/_analyze
{
"tokenizer": "standard",
"filter": ["my_test_length"],
"text":"a Small word and a longerword"
}
//result:
{
"tokens" : [
{
"token" : "Small",
"start_offset" : 2,
"end_offset" : 7,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "word",
"start_offset" : 8,
"end_offset" : 12,
"type" : "<ALPHANUM>",
"position" : 2
},
{
"token" : "and",
"start_offset" : 13,
"end_offset" : 16,
"type" : "<ALPHANUM>",
"position" : 3
}
]
}
2. A custom lowercase token filter
PUT lowercase_example
{
"settings": {
"analysis": {
"analyzer": {
"standard_lowercase_example": {
"type": "custom",
"tokenizer": "standard",
"filter": ["lowercase"]
},
"greek_lowercase_example": {
"type": "custom",
"tokenizer": "standard",
"filter": ["greek_lowercase"]
}
},
"filter": {
"greek_lowercase": {
"type": "lowercase",
"language": "greek"
}
}
}
}
}
POST lowercase_example/_analyze
{
"tokenizer": "standard",
"filter": ["greek_lowercase"],
"text":"Ένα φίλτρο διακριτικού τύπου πεζά s ομαλοποιεί το κείμενο διακριτικού σε χαμηλότερη θήκη"
}
{
"tokens" : [
{
"token" : "ενα",
"start_offset" : 0,
"end_offset" : 3,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "φιλτρο",
"start_offset" : 4,
"end_offset" : 10,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "διακριτικου",
"start_offset" : 11,
"end_offset" : 22,
"type" : "<ALPHANUM>",
"position" : 2
},
{
"token" : "τυπου",
"start_offset" : 23,
"end_offset" : 28,
"type" : "<ALPHANUM>",
"position" : 3
},
{
"token" : "πεζα",
"start_offset" : 29,
"end_offset" : 33,
"type" : "<ALPHANUM>",
"position" : 4
},
{
"token" : "s",
"start_offset" : 34,
"end_offset" : 35,
"type" : "<ALPHANUM>",
"position" : 5
},
{
"token" : "ομαλοποιει",
"start_offset" : 36,
"end_offset" : 46,
"type" : "<ALPHANUM>",
"position" : 6
},
{
"token" : "το",
"start_offset" : 47,
"end_offset" : 49,
"type" : "<ALPHANUM>",
"position" : 7
},
{
"token" : "κειμενο",
"start_offset" : 50,
"end_offset" : 57,
"type" : "<ALPHANUM>",
"position" : 8
},
{
"token" : "διακριτικου",
"start_offset" : 58,
"end_offset" : 69,
"type" : "<ALPHANUM>",
"position" : 9
},
{
"token" : "σε",
"start_offset" : 70,
"end_offset" : 72,
"type" : "<ALPHANUM>",
"position" : 10
},
{
"token" : "χαμηλοτερη",
"start_offset" : 73,
"end_offset" : 83,
"type" : "<ALPHANUM>",
"position" : 11
},
{
"token" : "θηκη",
"start_offset" : 84,
"end_offset" : 88,
"type" : "<ALPHANUM>",
"position" : 12
}
]
}
3. Multiple token filters
POST _analyze
{
"tokenizer": "standard",
"filter": ["length","lowercase"],
"text":"a Small word and a longerword"
}
{
"tokens" : [
{
"token" : "a",
"start_offset" : 0,
"end_offset" : 1,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "small",
"start_offset" : 2,
"end_offset" : 7,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "word",
"start_offset" : 8,
"end_offset" : 12,
"type" : "<ALPHANUM>",
"position" : 2
},
{
"token" : "and",
"start_offset" : 13,
"end_offset" : 16,
"type" : "<ALPHANUM>",
"position" : 3
},
{
"token" : "a",
"start_offset" : 17,
"end_offset" : 18,
"type" : "<ALPHANUM>",
"position" : 4
},
{
"token" : "longerword",
"start_offset" : 19,
"end_offset" : 29,
"type" : "<ALPHANUM>",
"position" : 5
}
]
}
14.5 The IK analyzer
1. Download
1. On GitHub, search for elasticsearch-analysis-ik and open medcl/elasticsearch-analysis-ik:
https://github.com/medcl/elasticsearch-analysis-ik.
2. The IK version must match your ES version.
3. In the ES installation directory, create an ik subdirectory under plugins and unzip the plugin into it.
4. Restart ES and Kibana (a quick check that the plugin was loaded is sketched below).
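To confirm the plugin was picked up after the restart, you can list the installed plugins (a quick sanity check, not part of the original steps); every node should report analysis-ik:
GET _cat/plugins?v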
2. Introduction
Name | Function |
IKAnalyzer.cfg.xml | configures custom dictionaries |
main.dic | IK's built-in Chinese dictionary, roughly 270,000 entries; words found in it are kept together as single tokens |
surname.dic | Chinese surnames |
suffix.dic | special suffix nouns, e.g. 乡, 江, 所, 省 |
preposition.dic | Chinese function words, e.g. 不, 也, 了, 仍 |
stopword.dic | English stop words, e.g. a, an, and, the |
quantifier.dic | measure words, e.g. 厘米, 件, 倍, 像素 |
3. Testing
Analysis:
GET _analyze
{
"analyzer": "ik_max_word",
"text": "上海自来水来自海上"
}
GET _analyze
{
"analyzer": "ik_smart",
"text": "今天是个好日子"
}
Queries:
GET ik1/_search
{
"query": {
"match_phrase": {
"content": "今天"
}
}
}
GET ik1/_search
{
"query": {
"match_phrase_prefix": {
"content": {
"query": "今天好日子",
"slop": 2
}
}
}
}
15. Forward and inverted indexes
16. Data modeling
17. Securing intra-cluster communication
Why encrypt cluster traffic:
- prevent packet sniffing and leaks of sensitive data
- authenticate nodes and keep impostor nodes out
- Data/Cluster state
Create certificates for the nodes. TLS requires X.509 certificates issued by a trusted Certificate Authority (CA). Verification modes:
- Certificate: a joining node must present a certificate signed by the same CA
- Full verification: same CA, plus verification of the host name or IP address
- No verification: any node may join; only for diagnostic purposes in development
# Generate certificates
# Create a certificate authority for your Elasticsearch cluster, for example with the elasticsearch-certutil ca command:
bin/elasticsearch-certutil ca
# Generate a certificate and private key for each node in the cluster, for example with elasticsearch-certutil cert:
bin/elasticsearch-certutil cert --ca elastic-stack-ca.p12
# Copy the certificate into the config/certs directory
elastic-certificates.p12
bin/elasticsearch -E node.name=node0 -E cluster.name=es -E path.data=node0_data -E http.port=9200 -E xpack.security.enabled=true -E xpack.security.transport.ssl.enabled=true -E xpack.security.transport.ssl.verification_mode=certificate -E xpack.security.transport.ssl.keystore.path=certs/elastic-certificates.p12 -E xpack.security.transport.ssl.truststore.path=certs/elastic-certificates.p12
bin/elasticsearch -E node.name=node1 -E cluster.name=es -E path.data=node1_data -E http.port=9201 -E xpack.security.enabled=true -E xpack.security.transport.ssl.enabled=true -E xpack.security.transport.ssl.verification_mode=certificate -E xpack.security.transport.ssl.keystore.path=certs/elastic-certificates.p12 -E xpack.security.transport.ssl.truststore.path=certs/elastic-certificates.p12
# A node that does not provide the certificate cannot join
bin/elasticsearch -E node.name=node2 -E cluster.name=es -E path.data=node2_data -E http.port=9202 -E xpack.security.enabled=true -E xpack.security.transport.ssl.enabled=true -E xpack.security.transport.ssl.verification_mode=certificate
The corresponding elasticsearch.yml settings:
#xpack.security.transport.ssl.enabled: true
#xpack.security.transport.ssl.verification_mode: certificate
#xpack.security.transport.ssl.keystore.path: certs/elastic-certificates.p12
#xpack.security.transport.ssl.truststore.path: certs/elastic-certificates.p12