Connecting to Elasticsearch from Python with a username and password

Reposted

IT剑客风云 2024-11-23 17:51:41

ElasticSearch for Python

Installing the IK analyzer

IK download: https://github.com/medcl/elasticsearch-analysis-ik/releases/tag/v7.6.2

The IK analyzer must be installed in the plugins directory under elasticsearch-7.6.2.

Note: the Elasticsearch installation path must not contain spaces. If it does, Elasticsearch will fail to start.

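Origins of the IK analyzer

IK Analyzer is an open-source, lightweight Chinese word-segmentation toolkit written in Java. Since version 1.0 was released in December 2006, IK Analyzer has gone through four major versions. It began as a Chinese segmentation component built around the open-source Lucene project, combining dictionary-based segmentation with grammar-analysis algorithms. Starting with version 3.0, IK became a general-purpose segmentation component for Java, independent of Lucene, while still providing a default optimized implementation for Lucene. The 2012 version added a simple algorithm for resolving segmentation ambiguity, marking IK's move from pure dictionary segmentation toward simulated semantic segmentation.
Features of IK Analyzer 2012:

Uses a distinctive forward-iterating, finest-granularity segmentation algorithm and supports two modes, fine-grained and smart. In a test on an ordinary PC (Core2 i7 3.4G dual-core, 4 GB RAM, Windows 7 64-bit, Sun JDK 1.6_29 64-bit), IK2012 processed 1.6 million characters per second (3000 KB/s).
The smart mode in the 2012 version supports simple disambiguation and merges numerals with measure words in its output.
Uses a multi-sub-processor analysis mode that handles English letters, digits, and Chinese words, and is compatible with Korean and Japanese characters.
Optimized dictionary storage with a smaller memory footprint. User dictionary extensions are supported; notably, in the 2012 version dictionaries can contain words that mix Chinese, English, and digits.

Later, medcl (曾勇, an Elastic engineer and evangelist, head of the Elasticsearch open-source community, who joined Elastic in 2015) integrated it into Elasticsearch and added support for custom dictionaries…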

Testing

After a successful installation, restart Elasticsearch and Kibana.

GET _analyze
{
  "analyzer": "ik_max_word",
  "text": "学习使我快乐"
}

Result:

{
  "tokens" : [
    {
      "token" : "学习",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "使",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "我",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "CN_CHAR",
      "position" : 2
    },
    {
      "token" : "快乐",
      "start_offset" : 4,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 3
    }
  ]
}

ik_max_word: splits the text at the finest granularity, exhausting as many combinations as possible.

GET _analyze
{
  "analyzer": "ik_smart",
  "text": "今天天气不错"
}

Result:

{
  "tokens" : [
    {
      "token" : "今天天气",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "不错",
      "start_offset" : 4,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 1
    }
  ]
}

ik_smart: splits the text at the coarsest granularity.

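Connecting to Elasticsearch from Python

Install the dependency. The examples below pass doc_type, which version 8 of the client no longer accepts, so install a client older than 8 to run them as written:

pip install "elasticsearch<8"

Connect to Elasticsearch: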

from elasticsearch import Elasticsearch
# Connect to the local Elasticsearch by default
es = Elasticsearch()
# Connect to a specific IP and port
es = Elasticsearch(['127.0.0.1:9200'])
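
The title mentions a username and password, but the snippets above connect anonymously. The following is a minimal sketch of an authenticated connection, assuming a 7.x client, where the keyword is http_auth (8.x clients call it basic_auth). The environment variable names ES_HOST, ES_USER and ES_PASSWORD are hypothetical, chosen only for this example.

```python
import os


def client_kwargs_from_env(env=os.environ):
    """Build keyword arguments for Elasticsearch() from environment variables.

    ES_HOST, ES_USER and ES_PASSWORD are hypothetical names used by this sketch.
    """
    kwargs = {"hosts": [env.get("ES_HOST", "http://127.0.0.1:9200")]}
    user = env.get("ES_USER")
    password = env.get("ES_PASSWORD")
    if user and password:
        # 7.x clients take the credentials as a (username, password) tuple.
        kwargs["http_auth"] = (user, password)
    return kwargs


def make_client(env=os.environ):
    """Create a client from the environment (assumes a 7.x elasticsearch package)."""
    # Imported here so the helper above can be used without the client installed.
    from elasticsearch import Elasticsearch

    return Elasticsearch(**client_kwargs_from_env(env))
```

Reading the credentials from the environment keeps the password out of the source file; es = make_client() can then be used in place of Elasticsearch() in the examples that follow.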

Test whether the connection succeeded:

from elasticsearch import Elasticsearch
es = Elasticsearch()    # connects to the local Elasticsearch by default

print(es.ping())  # True means the connection succeeded

Operating Elasticsearch from Python

A first simple example

from elasticsearch import Elasticsearch

es = Elasticsearch()

print(es.index(index='index01', doc_type='doc', id=1, body={'name': "zhangsan", "age": 20}))

Result:

{'_index': 'index01', '_type': 'doc', '_id': '1', '_version': 1, 'result': 'created', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 0, '_primary_term': 1}

Fetching data

from elasticsearch import Elasticsearch

es = Elasticsearch()

print(es.get(index='index01', doc_type='doc', id=1))

Result:

{'_index': 'index01', '_type': 'doc', '_id': '1', '_version': 1, '_seq_no': 0, '_primary_term': 1, 'found': True, '_source': {'name': 'zhangsan', 'age': 20}}

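Elasticsearch operations in Python fall mainly into the following areas:

Result filtering: filters the returned results, mainly to trim the response content.
Elasticsearch (es for short): works directly on the Elasticsearch object to handle simple index information. The areas below all build on the es object.
Indices: detailed index operations, such as creating custom mappings.
Cluster: cluster-related operations.
Nodes: node-related operations.
Cat API: an alternative way to query. Ordinary responses are JSON, whereas cat returns compact results.
Snapshot: snapshot-related operations. A snapshot is a backup taken from a running Elasticsearch cluster. You can snapshot a single index or the whole cluster and store it in a repository on a shared file system; plugins also support remote repositories on S3, HDFS, Azure, Google Cloud Storage, and others.
Task Management API: the task management API is new and should still be considered a beta feature. The API may change in ways that are not backward compatible.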
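
Most of the areas in this list are reachable as attributes of the es object (es.indices, es.cluster, es.nodes, es.cat and so on). The sketch below assumes a 7.x client and an already connected es; the function name cluster_overview and the shape of its return value are choices made for this example only.

```python
def cluster_overview(es):
    """Collect a few facts through the client's namespaced sub-APIs."""
    health = es.cluster.health()             # Cluster API
    nodes = es.nodes.info()                  # Nodes API
    indices = es.cat.indices(format="json")  # Cat API, asked to return JSON rows
    return {
        "status": health["status"],
        "node_count": len(nodes["nodes"]),
        "index_names": sorted(row["index"] for row in indices),
    }
```

Calling print(cluster_overview(es)) against a running cluster prints the health status, the number of nodes, and the index names.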

Result filtering

from elasticsearch import Elasticsearch

es = Elasticsearch()

print(es.search(index='index01', filter_path=['hits.total', 'hits.hits._source']))

Result:

{'hits': {'total': {'value': 1, 'relation': 'eq'}, 'hits': [{'_source': {'name': 'zhangsan', 'age': 20}}]}}

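The filter_path parameter reduces the response that Elasticsearch returns, for example to only hits.total and hits.hits._source.
filter_path also supports the * wildcard to match a field name, any field, or part of a field name: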

from elasticsearch import Elasticsearch

es = Elasticsearch()

print(es.search(index='index01', filter_path=['hits.hits._*']))
# print(es.search(index='index01', filter_path=['hits.to*']))  # returns only the total from the response

Result:

{'hits': {'hits': [{'_index': 'index01', '_type': 'doc', '_id': '1', '_score': 1.0, '_source': {'name': 'zhangsan', 'age': 20}}]}}
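
A filtered response is still nested, so a small helper is handy for pulling out the documents themselves. This is a minimal sketch; the name extract_sources is hypothetical, and it uses .get() on the assumption that keys excluded by filter_path (or an empty hit list) may simply be missing from the response.

```python
def extract_sources(response):
    """Return the _source dict of every hit, tolerating keys that filter_path left out."""
    hits = response.get("hits", {}).get("hits", [])
    return [hit["_source"] for hit in hits if "_source" in hit]
```

For example, extract_sources(es.search(index='index01', filter_path=['hits.hits._source'])) turns the filtered response shown earlier into a plain list of documents.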

Filtering results by type

from elasticsearch import Elasticsearch

es = Elasticsearch()

print(es.search(index='index01', doc_type='doc'))

Result:

{'took': 4, 'timed_out': False, '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 1, 'relation': 'eq'}, 'max_score': 1.0, 'hits': [{'_index': 'index01', '_type': 'doc', '_id': '1', '_score': 1.0, '_source': {'name': 'zhangsan', 'age': 20}}]}}

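The Elasticsearch object

index: adds or updates a document in the specified index. If the index does not exist, it is created first, and the add or update is then performed.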

from elasticsearch import Elasticsearch

es = Elasticsearch()

print(es.index(index='index02', doc_type='doc', id='1', body={"name": "lisi", "age": 22}))

# On client versions where doc_type is a required argument, omitting it raises TypeError: index() missing 1 required positional argument: 'doc_type'
# print(es.index(index='index02', id='2', body={"name": "wangwu", "age": 23}))

# id may be omitted, in which case one is generated automatically
# print(es.index(index='index02', doc_type='doc', body={"name": "zhaoliu", "age": 20}))

Result:

{'_index': 'index02', '_type': 'doc', '_id': '1', '_version': 1, 'result': 'created', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 0, '_primary_term': 1}

get: fetches the specified document from an index.

from elasticsearch import Elasticsearch

es = Elasticsearch()

print(es.get(index='index02', doc_type='doc', id=1))

# Omitting id raises TypeError: get() missing 1 required positional argument: 'id'
# print(es.get(index='index02', doc_type='doc'))

# On client versions where doc_type is a required argument, omitting it raises TypeError: get() missing 1 required positional argument: 'doc_type'
# print(es.get(index='index02', id=1))

Result:

{'_index': 'index02', '_type': 'doc', '_id': '1', '_version': 1, '_seq_no': 0, '_primary_term': 1, 'found': True, '_source': {'name': 'lisi', 'age': 22}}

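search: runs a search query and returns the hits that match it. This is the most frequently used method and accepts complex query conditions.

index: comma-separated list of index names to search; use _all or an empty string to operate on all indices.
doc_type: comma-separated list of document types to search; leave empty to operate on all types.
body: the search definition, written in the Query DSL (Query Domain Specific Language).
_source: true or false to return the _source field or not, or a list of the fields to return.
_source_excludes: list of fields to exclude from the returned _source. Elasticsearch 7.0 removed the singular spelling _source_exclude.
_source_includes: list of fields to extract and return from _source, much like passing a list to _source. Elasticsearch 7.0 removed the singular spelling _source_include.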

from elasticsearch import Elasticsearch

es = Elasticsearch()

print(es.search(index='index01', doc_type='doc', body={"query": {"match":{"age": 20}}}))

# Filter the returned fields
# print(es.search(index='index02', doc_type='doc', body={"query": {"match":{"age": 19}}}, _source=['name', 'age']))

# Filter the returned fields, excluding the age field
# print(es.search(index='index01', doc_type='doc', body={"query": {"match":{"age": 19}}}, _source_excludes=['age']))

# Filter the returned fields, returning only the age field
print(es.search(index='index02', doc_type='doc', body={"query": {"match":{"age": 19}}}, _source_includes=['age']))

Result:

{'took': 2, 'timed_out': False, '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 1, 'relation': 'eq'}, 'max_score': 1.0, 'hits': [{'_index': 'index01', '_type': 'doc', '_id': '1', '_score': 1.0, '_source': {'name': 'zhangsan', 'age': 20}}]}}
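
The body argument can carry more than a single match clause. As a hedged illustration of a slightly richer Query DSL body, the sketch below combines a range clause with an optional match clause inside a bool query. The helper names age_range_query and search_names are hypothetical, and the fields name and age follow the sample documents used in this article.

```python
def age_range_query(min_age, max_age, name=None):
    """Build a Query DSL body: age within [min_age, max_age], optionally matching name."""
    must = [{"range": {"age": {"gte": min_age, "lte": max_age}}}]
    if name is not None:
        must.append({"match": {"name": name}})
    return {"query": {"bool": {"must": must}}}


def search_names(es, index, min_age, max_age):
    """Run the query and return only the names of the matching documents."""
    resp = es.search(
        index=index,
        body=age_range_query(min_age, max_age),
        _source=["name"],  # ask the server to return only the name field
    )
    return [hit["_source"]["name"] for hit in resp["hits"]["hits"]]
```

Building the body in a separate function keeps the query easy to inspect and test without a running cluster.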

get_source: fetches a document's _source by index, type, and ID. In effect, it returns just the dict you want.

from elasticsearch import Elasticsearch

es = Elasticsearch()

print(es.get_source(index='index01', doc_type='doc', id='1'))

Result:

{'name': 'zhangsan', 'age': 20}

count: runs a query and returns the number of matches, for example the number of documents whose age is 20.

from elasticsearch import Elasticsearch

es = Elasticsearch()

body = {
    "query": {
        "match": {
            "age": 20
        }
    }
}
print(es.count(index='index01', doc_type='doc', body=body))

# Other ways to use count
print(es.count(index='index01', doc_type='doc', body=body)['count'])  # returns 1

print(es.count(index='index01'))  # returns {'count': 1, '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0}}

print(es.count(index='index01', doc_type='doc'))  # returns {'count': 1, '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0}}

Result:

{'count': 1, '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0}}

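create: creates the index (if it does not exist) and adds one document; if the index already exists, it only adds the document. It can only add: running it again with the same id raises an error.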

from elasticsearch import Elasticsearch

es = Elasticsearch()

print(es.create(index='index03', doc_type='doc', id='1', body={"name": 'laozhang', "age": 20}))

Result:

{'_index': 'index03', '_type': 'doc', '_id': '1', '_version': 1, 'result': 'created', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 0, '_primary_term': 1}
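
Because a repeated create fails, callers often want a "create only if absent" helper. The sketch below assumes a 7.x client, whose per-request ignore argument makes the call return the error body instead of raising for the listed HTTP status codes (409 is the conflict status). The function name create_if_absent is hypothetical, and doc_type='doc' simply follows this article's examples.

```python
def create_if_absent(es, index, doc_id, doc, doc_type="doc"):
    """Create a document; return True if it was created, False if the id already exists."""
    # ignore=409 asks the client not to raise on a version conflict.
    resp = es.create(index=index, doc_type=doc_type, id=doc_id, body=doc, ignore=409)
    return resp.get("result") == "created"
```

An alternative is to let the call raise and catch the client's conflict exception; the ignore form is used here only because it keeps the sketch short.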

delete: deletes the specified document (the example below deletes the document with id 1). It cannot delete an index; to delete an index, use es.indices.delete.

from elasticsearch import Elasticsearch

es = Elasticsearch()

print(es.delete(index='index03', doc_type='doc', id='1'))

Result:

{'_index': 'index03', '_type': 'doc', '_id': '1', '_version': 2, 'result': 'deleted', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 1, '_primary_term': 1}

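delete_by_query: deletes all documents that match a query.

index: comma-separated list of index names to search; use _all or an empty string to operate on all indices.
doc_type: comma-separated list of document types to search; leave empty to operate on all types.
body: the search definition, written in the Query DSL.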
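
delete_by_query is the only method in this walkthrough without an example, so here is a minimal sketch. It assumes a 7.x client and reuses the age field from the sample documents; the function name delete_by_age is hypothetical.

```python
def delete_by_age(es, index, age):
    """Delete every document in `index` whose age matches `age`; return how many were deleted."""
    body = {"query": {"match": {"age": age}}}
    resp = es.delete_by_query(index=index, body=body)
    # The response reports the number of removed documents under "deleted".
    return resp["deleted"]
```

For example, delete_by_age(es, 'index01', 20) would remove the zhangsan document indexed earlier.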

exists: checks whether the specified document exists in Elasticsearch and returns a boolean.

print(es.exists(index='index04', doc_type='doc', id='1'))

Result:

False

info: returns basic information about the current cluster.

print(es.info())

Result:

{'name': 'MING', 'cluster_name': 'elasticsearch', 'cluster_uuid': 'ggy3MupOSIqmnigmXroxHQ', 'version': {'number': '7.6.2', 'build_flavor': 'default', 'build_type': 'zip', 'build_hash': 'ef48eb35cf30adf4db14086e8aabd07ef6fb113f', 'build_date': '2020-03-26T06:34:37.794943Z', 'build_snapshot': False, 'lucene_version': '8.4.0', 'minimum_wire_compatibility_version': '6.8.0', 'minimum_index_compatibility_version': '6.0.0-beta1'}, 'tagline': 'You Know, for Search'}

ping: returns True if the cluster is up, otherwise False.

print(es.ping())

Result:

True

indices.analyze: returns the result of analyzing (tokenizing) a piece of text.

from elasticsearch import Elasticsearch

es = Elasticsearch()

print(es.indices.analyze(body={'analyzer': "ik_max_word", "text": "今天天气不错，风和日丽的。"}))

Result:

{'tokens': [{'token': '今天天气', 'start_offset': 0, 'end_offset': 4, 'type': 'CN_WORD', 'position': 0}, {'token': '今天', 'start_offset': 0, 'end_offset': 2, 'type': 'CN_WORD', 'position': 1}, {'token': '天天', 'start_offset': 1, 'end_offset': 3, 'type': 'CN_WORD', 'position': 2}, {'token': '天气', 'start_offset': 2, 'end_offset': 4, 'type': 'CN_WORD', 'position': 3}, {'token': '不错', 'start_offset': 4, 'end_offset': 6, 'type': 'CN_WORD', 'position': 4}, {'token': '风和日丽', 'start_offset': 7, 'end_offset': 11, 'type': 'CN_WORD', 'position': 5}, {'token': '风和', 'start_offset': 7, 'end_offset': 9, 'type': 'CN_WORD', 'position': 6}, {'token': '日', 'start_offset': 9, 'end_offset': 10, 'type': 'CN_CHAR', 'position': 7}, {'token': '丽', 'start_offset': 10, 'end_offset': 11, 'type': 'CN_CHAR', 'position': 8}, {'token': '的', 'start_offset': 11, 'end_offset': 12, 'type': 'CN_CHAR', 'position': 9}]}
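
The raw analyze response is verbose. To compare what ik_max_word and ik_smart produce for the same text, it helps to reduce each response to its token strings. This is a sketch under the assumption that the IK plugin is installed and a 7.x client is used; token_texts and compare_analyzers are hypothetical helper names.

```python
def token_texts(analyze_response):
    """Return the token strings from an _analyze response, ordered by position."""
    tokens = sorted(analyze_response["tokens"], key=lambda t: t["position"])
    return [t["token"] for t in tokens]


def compare_analyzers(es, text, analyzers=("ik_max_word", "ik_smart")):
    """Analyze the same text with each analyzer and map analyzer name to its tokens."""
    return {
        name: token_texts(es.indices.analyze(body={"analyzer": name, "text": text}))
        for name in analyzers
    }
```

Applied to the ik_smart response shown in the Kibana test earlier, token_texts returns ['今天天气', '不错'].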

indices.delete: deletes an index in Elasticsearch.

print(es.indices.delete(index='index03'))

Result:

{'acknowledged': True}
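
The indices namespace also covers index creation with custom mappings, which is where the IK analyzers installed at the start become useful. The sketch below assumes Elasticsearch 7.x (typeless mappings) with the IK plugin installed. Indexing with ik_max_word and searching with ik_smart is a commonly used pairing rather than something this article prescribes, and the field name content and the helper names are hypothetical.

```python
def ik_text_mapping(field):
    """Index body whose text field is indexed with ik_max_word and searched with ik_smart."""
    return {
        "mappings": {
            "properties": {
                field: {
                    "type": "text",
                    "analyzer": "ik_max_word",
                    "search_analyzer": "ik_smart",
                }
            }
        }
    }


def create_ik_index(es, index, field="content"):
    """Create the index unless it already exists; return True if it was created."""
    if es.indices.exists(index=index):
        return False
    es.indices.create(index=index, body=ik_text_mapping(field))
    return True
```

Checking es.indices.exists first avoids the error that indices.create raises when the index is already present.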
