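I. Introduction
Elasticsearch is a full-text search and analytics engine built on Lucene. Lucene Core is a high-performance, full-featured search engine library written entirely in Java.
Elasticsearch can scale out to hundreds of servers and handle petabytes of data. 1 PB = 2^50 bytes, which is 1024 TB (roughly 1000 TB). Taking a figure of about 1.25 TB for the functional capacity of human memory, 1 PB corresponds to roughly 800 human memories.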
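II. Background
1. Lucene
Lucene packages the algorithms for building inverted indexes and for matching against them during a search. An application can pull in Lucene and develop directly against its API.
Elasticsearch is built on top of Lucene: it wraps Lucene cleanly, simplifies development, and adds many higher-level features.
Elasticsearch was designed for fast retrieval and analysis of large data sets, and a rich ecosystem has grown around it.
The mature ELK stack:
- Elasticsearch: the distributed search and analytics engine at the core of the Elastic Stack
- Logstash + Beats: collect, aggregate, and enrich data, then store it in Elasticsearch
- Kibana: explore, visualize, and share insights into the data interactively, and manage and monitor the stack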
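2. Inverted index
Elasticsearch uses a structure called an inverted index, which is well suited to fast full-text search. An inverted index consists of a list of all the distinct terms that appear in the documents and, for each term, a list of the documents that contain it.
To build the index, the engine runs the content of each document through an analyzer that splits it into individual terms, collects the terms into a sorted list without duplicates, and then records which documents each term appears in: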
term | Doc 1 | Doc 2 | Doc 3 |
---|---|---|---|
no | ✅ | ||
po | ✅ | ||
work | ✅ | ✅ | |
enjoy | ✅ | ✅ | |
your | ✅ | ✅ | ✅ |
team | ✅ | ✅ | |
With | ✅ | ||
challenge | ✅ |
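The structure is made up of the list of all distinct terms in the documents, and every term is associated with at least one document. A structure that locates records by attribute value in this way is an inverted index, and a file that carries an inverted index is called an inverted file.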
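The construction described above can be sketched in a few lines of Go. This is only an illustration under simplifying assumptions: the three sample documents are made up, and "analysis" is reduced to lowercasing and splitting on whitespace, which is far simpler than a real Elasticsearch analyzer.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// buildInvertedIndex maps each lowercased, whitespace-separated term to the
// sorted list of IDs of the documents that contain it.
func buildInvertedIndex(docs map[int]string) map[string][]int {
	index := make(map[string][]int)

	// Visit documents in ascending ID order so every posting list ends up sorted.
	ids := make([]int, 0, len(docs))
	for id := range docs {
		ids = append(ids, id)
	}
	sort.Ints(ids)

	for _, id := range ids {
		seen := make(map[string]bool) // record each term once per document
		for _, term := range strings.Fields(strings.ToLower(docs[id])) {
			if !seen[term] {
				seen[term] = true
				index[term] = append(index[term], id)
			}
		}
	}
	return index
}

func main() {
	// Hypothetical sample documents, not the ones from the table above.
	docs := map[int]string{
		1: "enjoy your work",
		2: "enjoy your team",
		3: "your challenge",
	}
	index := buildInvertedIndex(docs)

	// Print the terms in sorted order with their posting lists.
	terms := make([]string, 0, len(index))
	for term := range index {
		terms = append(terms, term)
	}
	sort.Strings(terms)
	for _, term := range terms {
		fmt.Println(term, index[term])
	}
}
```

Looking up a term is then a single map access, which is why this layout suits full-text search.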
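3. Basic concepts
Elasticsearch term | Meaning | Relational database analogue |
---|---|---|
Index | index | database |
Type | type | table |
Document | document | row |
Field | field | column |
Document
A JSON object made up of fields.
Each document is identified by a unique ID.
Document metadata describes the document itself:
- _index: name of the index the document belongs to
- _type: name of the type the document belongs to (in 7.x an index holds a single type, normally _doc)
- _id: unique ID of the document
- _uid: composite ID built from the type and the ID
- _source: the original JSON of the document; the value of every field can be read from here
- _all: combines the content of all fields into a single field; disabled by default
Data types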
III. Installing Elasticsearch and Kibana
version: '3.1'
services:
  elasticsearch:
    image: elasticsearch:7.13.3
    container_name: elasticsearch
    privileged: true
    environment:
      - "cluster.name=elasticsearch" # cluster name
      - "discovery.type=single-node" # start as a single node
      - "ES_JAVA_OPTS=-Xms512m -Xmx1096m" # JVM heap size
      - bootstrap.memory_lock=true
    volumes:
      # Note: the official image keeps its files under /usr/share/elasticsearch;
      # adjust the container-side paths below if the mounts have no effect.
      - ./es/plugins:/usr/local/dockercompose/elasticsearch/plugins # plugin directory
      - ./es/data:/usr/local/dockercompose/elasticsearch/data:rw # data directory
      - ./es/logs:/usr/local/dockercompose/elasticsearch/logs:rw # log directory
    ports:
      - 9200:9200
      - 9300:9300
    deploy:
      resources:
        limits:
          cpus: "2"
          memory: 1000M
        reservations:
          memory: 200M
  kibana:
    image: kibana:7.13.3
    container_name: kibana
    depends_on:
      - elasticsearch # start Kibana after Elasticsearch
    environment:
      ELASTICSEARCH_HOSTS: http://elasticsearch:9200 # address Kibana uses to reach Elasticsearch
      I18N_LOCALE: zh-CN
    ports:
      - 5601:5601
Save the configuration above as docker-compose.yml, then bring up the Elasticsearch and Kibana services with:
docker-compose up -d
Open the Kibana home page: http://localhost:5601/app/home#/
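IV. Managing Elasticsearch from Kibana
The Dev Tools console in Kibana can run CRUD operations against the data in Elasticsearch.
Method | URL | Description |
---|---|---|
PUT | /index/type/document-id | create a document (explicit ID) |
POST | /index/type | create a document (auto-generated ID) |
POST | /index/type/document-id/_update | update a document |
DELETE | /index/type/document-id | delete a document |
GET | /index/type/document-id | fetch a document by ID |
POST | /index/type/_search | return all documents |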
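# Create the index with 10 primary shards and no replicas
PUT /tool_index/
{
  "settings": {
    "index": {
      "number_of_shards": 10,
      "number_of_replicas": 0
    }
  }
}
# Read back the index settings
GET /tool_index/_settings
# Create a document with an explicit ID
PUT /tool_index/tools/20230326214500
{
  "name": "ijie",
  "age": 22,
  "grade": 1,
  "hobby": "coding"
}
# Create a document with an auto-generated ID
POST /tool_index/_doc
{
  "name": "erran_new",
  "age": 24,
  "grade": 2,
  "hobby": "code"
}
# Return every document
GET /tool_index/tools/_search
# URI search on the name field
GET /tool_index/tools/_search?q=name:erran
# Request-body search that returns only the name and grade fields
GET /tool_index/tools/_search
{
  "query": {
    "match": {
      "name.keyword": "erran"
    }
  },
  "_source": ["name", "grade"]
}
# Partial update of a single field
POST /tool_index/tools/20230326213100/_update
{
  "doc": {
    "grade": 3
  }
}
# PUT to an existing ID replaces the whole document
PUT /tool_index/tools/20230326214500
{
  "people": "ijie"
}
# Delete one document
DELETE /tool_index/tools/20230326214500
# Delete the index
DELETE tool_index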
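⚠️ Notes:
- After an index is created, the type name used by the first document inserted matters: it becomes the only type of the index. Inserting a later document under a different type name fails with: Rejecting mapping update to [tool_index] as the final mapping would have more than 1 type: [tools, demo]
- When updating
- When matching on a field, either 'field' or 'field.keyword' can be given. With the default dynamic mapping, the former is matched against the analyzed text and the latter against the exact value
- When deploying on devcloud, the service gets flagged as a high-risk service exposed on the internal network. The fix is to add access control, for example:
  - restrict the allowed source addresses with iptables
  - put nginx in front of Elasticsearch and inject Basic authentication:
server {
    listen 9200 default_server;
    server_name _;
    location / {
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_pass http://localhost:9200;
        # The Basic string is the base64 encoding of "username:password",
        # for example admin:123456. Note the colon between the two parts.
        proxy_set_header Authorization "Basic <base64 of username:password>";
    }
}
  - configure HTTP Basic authentication for the API with the Search Guard plugin for Elasticsearch:
https://docs.search-guard.com/latest/http-basic-authorization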
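V. Query scenarios
Test data:
- Index: person
- Type: _doc
name | age | sex | grade | hobby | weight |
---|---|---|---|---|---|
John | 24 | 男 | 2 | code | |
Alis | 24 | 女 | 1 | movie | |
Jack | 22 | 男 | 1 | code | |
Rookie | 23 | 男 | 4 | read | sixty kilo grams |
Jam | 25 | 女 | 3 | eat | fifty-kilo-grams |
(In the sex column, 男 means male and 女 means female.)
The Go examples below use the olivere/elastic client from GitHub, which wraps the query types in builder methods and is easy to pick up.
Query statements
Term-level queries
Single-value query: term
SQL:
select * from person where name = 'Rookie';
ES: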
GET /person/_search
{
"query": {
"term": {
"name": "Rookie"
}
}
}
{
"_index" : "person",
"_type" : "_doc",
"_id" : "rIpjQYcBtP47ROwdblh9",
"_score" : 0.2876821,
"_source" : {
"name" : "Rookie",
"age" : 23,
"sex" : "男",
"grade" : 4,
"hobby" : "read"
}
}
Golang:
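// Needs: context, encoding/json, fmt, and the olivere/elastic package
// (for Elasticsearch 7.x: github.com/olivere/elastic/v7).
// Sniffing and health checks are switched off, which suits a single node
// reached through a fixed URL.
client, err := elastic.NewClient(elastic.SetURL("http://xxxx:9200"),
	elastic.SetSniff(false),
	elastic.SetHealthcheck(false),
	elastic.SetBasicAuth("username", "password"))
if err != nil {
	fmt.Println(err)
	return
}
// Exact match on the keyword field "name"
q := elastic.NewTermQuery("name", "Rookie")
req, err := client.Search().
	Index("person").
	Query(q).
	Do(context.Background())
if err != nil {
	fmt.Println(err)
	return
}
// Decode and print the _source of every hit
for _, v := range req.Hits.Hits {
	var tmp interface{}
	if err := json.Unmarshal(v.Source, &tmp); err != nil {
		fmt.Println(err)
		continue
	}
	fmt.Println(tmp)
}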
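To see how the Dev Tools request and the Go call relate, it can help to assemble the request body by hand. The sketch below uses only the standard library; `termQueryBody` is a hypothetical helper written for this illustration and is not part of olivere/elastic.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// termQueryBody returns the JSON search body for a single-value term query,
// in the same shape as the Dev Tools request shown earlier.
func termQueryBody(field string, value interface{}) (string, error) {
	body := map[string]interface{}{
		"query": map[string]interface{}{
			"term": map[string]interface{}{field: value},
		},
	}
	b, err := json.Marshal(body)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	body, err := termQueryBody("name", "Rookie")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(body) // {"query":{"term":{"name":"Rookie"}}}
}
```

Printing the body this way is a cheap check that a query built in code matches the one that was tried out in Kibana.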
Multi-value query: terms
Comparable to an IN query in SQL.
SQL:
select * from person where age in (23, 24);
ES:
GET /person/_search
{
"query": {
"terms": {
"age": [23,24]
}
}
}
ES result:
{
"_index" : "person",
"_type" : "_doc",
"_id" : "rIpjQYcBtP47ROwdblh9",
"_score" : 1.0,
"_source" : {
"name" : "Rookie",
"age" : 23,
"sex" : "男",
"grade" : 4,
"hobby" : "read"
}
},
{
"_index" : "person",
"_type" : "_doc",
"_id" : "n4phQYcBtP47ROwd1Vha",
"_score" : 1.0,
"_source" : {
"name" : "Alis",
"age" : 24,
"sex" : "女",
"grade" : 1,
"hobby" : "movie"
}
},
{
"_index" : "person",
"_type" : "_doc",
"_id" : "vIlfQYcBzqYw9eGTVRry",
"_score" : 1.0,
"_source" : {
"name" : "John",
"age" : 24,
"sex" : "男",
"grade" : 2,
"hobby" : "code"
}
}
Golang:
q := elastic.NewTermsQuery("age", 23, 24)
req, err := client.Search().
	Index("person").
	Query(q).
	Do(context.Background())
Range query: range
Returns the records whose field value falls inside a given interval.
SQL:
select * from person where age between 21 and 23;
ES:
GET /person/_search
{
"query": {
"range": {
"age": {
"gte": 21,
"lte": 23
}
}
}
}
ES result:
{
"_index" : "person",
"_type" : "_doc",
"_id" : "rIpjQYcBtP47ROwdblh9",
"_score" : 1.0,
"_source" : {
"name" : "Rookie",
"age" : 23,
"sex" : "男",
"grade" : 4,
"hobby" : "read"
}
},
{
"_index" : "person",
"_type" : "_doc",
"_id" : "DYliQYcBzqYw9eGTqht2",
"_score" : 1.0,
"_source" : {
"name" : "Jack",
"age" : 22,
"sex" : "男",
"grade" : 1,
"hobby" : "code"
}
}
Golang:
q := elastic.NewRangeQuery("age").Gte(21).Lte(23)
req, err := client.Search().
	Index("person").
	Query(q).
	Do(context.Background())
Wildcard query: wildcard
SQL:
select * from person where hobby like '%o%';
ES:
GET /person/_search
{
"query": {
"wildcard": {
"hobby": "*o*"
}
}
}
ES result:
{
"_index" : "person",
"_type" : "_doc",
"_id" : "n4phQYcBtP47ROwd1Vha",
"_score" : 1.0,
"_source" : {
"name" : "Alis",
"age" : 24,
"sex" : "女",
"grade" : 1,
"hobby" : "movie"
}
},
{
"_index" : "person",
"_type" : "_doc",
"_id" : "vIlfQYcBzqYw9eGTVRry",
"_score" : 1.0,
"_source" : {
"name" : "John",
"age" : 24,
"sex" : "男",
"grade" : 2,
"hobby" : "code"
}
},
{
"_index" : "person",
"_type" : "_doc",
"_id" : "DYliQYcBzqYw9eGTqht2",
"_score" : 1.0,
"_source" : {
"name" : "Jack",
"age" : 22,
"sex" : "男",
"grade" : 1,
"hobby" : "code"
}
}
Golang:
q := elastic.NewWildcardQuery("hobby", "*o*")
req, err := client.Search().
	Index("person").
	Query(q).
	Do(context.Background())
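Match queries
A match query returns documents that match the supplied text, number, date, or boolean value. The supplied text is analyzed before matching.
The match query is the standard query for full-text search and includes options for fuzzy matching.
Taking weight = "fifty-kilo-grams" as the example: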
GET /person/_search
{
"query": {
"match": {
"weight": {
"query": "fifty-kilo-grams"
}
}
}
}
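- query (required)
- analyzer (optional, string)
- auto_generate_synonyms_phrase_query (optional, boolean)
- fuzziness (optional, string)
- max_expansions (optional, integer)
- prefix_length (optional, integer)
- fuzzy_transpositions (optional, boolean)
- fuzzy_rewrite (optional, string)
- lenient (optional, boolean)
- operator (optional, string)
- minimum_should_match (optional, string)
- zero_terms_query (optional, string)
For a simple query the match syntax can be shortened: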
GET /person/_search
{
"query": {
"match": {
"weight": "fifty-kilo-grams"
}
}
}
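Fuzzy matching:
fuzziness enables fuzzy matching on the field. With "AUTO" the allowed edit distance is derived from the length of the term. AUTO is short for AUTO:3,6, which means:
- 0 to 2 characters: the term must match exactly
- 3 to 5 characters: one edit is allowed
- more than 5 characters: two edits are allowed
prefix_length and max_expansions control the fuzzy expansion.
fuzzy_rewrite controls how the query is rewritten.
fuzzy_transpositions controls whether swapping two adjacent characters (ab -> ba) counts as a single fuzzy edit.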
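The length rule and the notion of edit distance can be made concrete with a small Go sketch. It is a simplified model, not Elasticsearch's implementation: `autoFuzziness` hard-codes the AUTO:3,6 thresholds, and `editDistance` is a textbook optimal-string-alignment distance in which the `allowTranspositions` flag stands in for fuzzy_transpositions.

```go
package main

import "fmt"

// autoFuzziness returns the number of edits allowed for a term of the given
// length under the AUTO:3,6 thresholds.
func autoFuzziness(termLen int) int {
	switch {
	case termLen < 3:
		return 0
	case termLen < 6:
		return 1
	default:
		return 2
	}
}

func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// editDistance counts insertions, deletions and substitutions between a and b.
// When allowTranspositions is true, swapping two adjacent characters also
// counts as a single edit.
func editDistance(a, b string, allowTranspositions bool) int {
	ra, rb := []rune(a), []rune(b)
	d := make([][]int, len(ra)+1)
	for i := range d {
		d[i] = make([]int, len(rb)+1)
		d[i][0] = i // delete i characters to reach the empty string
	}
	for j := 0; j <= len(rb); j++ {
		d[0][j] = j // insert j characters starting from the empty string
	}
	for i := 1; i <= len(ra); i++ {
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			best := minInt(d[i-1][j]+1, d[i][j-1]+1) // deletion, insertion
			best = minInt(best, d[i-1][j-1]+cost)    // substitution or match
			if allowTranspositions && i > 1 && j > 1 &&
				ra[i-1] == rb[j-2] && ra[i-2] == rb[j-1] {
				best = minInt(best, d[i-2][j-2]+1) // adjacent swap
			}
			d[i][j] = best
		}
	}
	return d[len(ra)][len(rb)]
}

func main() {
	query, indexed := "2fifty-kilo-grams1", "fifty-kilo-grams"
	dist := editDistance(query, indexed, true)
	allowed := autoFuzziness(len([]rune(query)))
	fmt.Println(dist, allowed, dist <= allowed)
}
```

With the query string used in the next request, the distance to "fifty-kilo-grams" is 2 (one extra character at each end), which is exactly what AUTO allows for a term longer than five characters.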
GET /person/_search
{
"query": {
"match": {
"weight": {
"query": "2fifty-kilo-grams1",
"fuzziness": "AUTO"
}
}
}
}
GET /person/_mapping
"properties" : {
"age" : {
"type" : "long"
},
"grade" : {
"type" : "long"
},
"hobby" : {
"type" : "keyword"
},
"name" : {
"type" : "keyword"
},
"sex" : {
"type" : "keyword"
},
"weight" : {
"type" : "keyword"
}
}
Add a new field new_field of type text, with a sub-field new_field.keyword of type keyword:
PUT /person/_mapping
{
"properties": {
"new_field": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
POST /person/_doc/rIpjQYcBtP47ROwdblh9
{
"name": "Rookie",
"age": 23,
"sex": "男",
"grade": 4,
"hobby": "read",
"weight": "sixty kilogram",
"new_field": "sixty kilo grams, sixty kilogram, 60 千克"
}
POST /person/_doc/vYplQYcBtP47ROwdc1gz
{
"name": "Jam",
"age": 25,
"sex": "女",
"grade": 3,
"hobby": "eat",
"weight": "fifty-kilo-grams",
"new_field": "60"
}
term vs. match vs. match_phrase
match_all (always written as "match_all": {}) returns every document.
GET /person/_analyze
{
"text":"sixty kilo grams, sixty kilogram, 60 千克"
}
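keyword: only the complete value matches
Input on a text field | term | match | match_phrase |
---|---|---|---|
a single token produced by the analyzer | ✅ | ✅ | ✅ |
several tokens produced by the analyzer | ❌ | ✅ | ✅ |
the complete value | ❌ | ✅ | ✅ |
a Chinese phrase | ❌ | ✅ | ✅ |
Difference between match and match_phrase:
For instance
{ "id" : 1,"content":"关注我,系统学编程" }
{ "id" : 2,"content":"系统学编程,关注我" }
{ "id" : 3,"content":"系统编程,关注我" }
{ "id" : 4,"content":"关注我,间隔系统学编程" }
Query: 关注我,系统学
- match returns all four documents: the "ik_smart" analyzer splits the query into the tokens [关注, 我, 系统学], and a document is returned as soon as it contains any of them
- match_phrase returns only id=1: the document must contain the tokens, in the same order
- match_phrase with the slop parameter, the tolerance on the positional distance between tokens: the tokens of id=4 are [关注, 我, 间隔, 系统学], so with slop=1 documents 1 and 4 are returned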
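The behaviour in the list above can be reproduced on plain token slices. The Go sketch below is a deliberately simplified model: the token lists are assumed analyzer output (only the tokens for id=4 are given in the text), `phraseMatch` only considers tokens that appear in query order, and real slop handling in Lucene is more permissive because it also allows reordering at a higher cost.

```go
package main

import "fmt"

// matchAny mimics a plain match query with the default "or" operator:
// the document qualifies if it contains at least one query token.
func matchAny(doc, query []string) bool {
	for _, q := range query {
		for _, d := range doc {
			if d == q {
				return true
			}
		}
	}
	return false
}

// phraseMatch reports whether the query tokens occur in doc in the same order
// with at most slop extra positions between the first and the last token.
func phraseMatch(doc, query []string, slop int) bool {
	if len(query) == 0 {
		return false
	}
	for start, tok := range doc {
		if tok != query[0] {
			continue
		}
		pos, matched := start, true
		for _, want := range query[1:] {
			next := -1
			for k := pos + 1; k < len(doc); k++ { // nearest later occurrence
				if doc[k] == want {
					next = k
					break
				}
			}
			if next < 0 {
				matched = false
				break
			}
			pos = next
		}
		// Extra positions = actual span minus the span of an exact phrase.
		if matched && (pos-start)-(len(query)-1) <= slop {
			return true
		}
	}
	return false
}

func main() {
	query := []string{"关注", "我", "系统学"}
	// Assumed tokenization of the four sample documents.
	docs := map[int][]string{
		1: {"关注", "我", "系统学", "编程"},
		2: {"系统学", "编程", "关注", "我"},
		3: {"系统", "编程", "关注", "我"},
		4: {"关注", "我", "间隔", "系统学", "编程"},
	}
	for id := 1; id <= 4; id++ {
		fmt.Println(id,
			matchAny(docs[id], query),
			phraseMatch(docs[id], query, 0),
			phraseMatch(docs[id], query, 1))
	}
}
```

Under these assumptions every document passes `matchAny`, only document 1 passes `phraseMatch` with slop 0, and documents 1 and 4 pass with slop 1.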
q := elastic.NewMatchQuery("new_field", "60").Operator("and")
//map[age:23 grade:4 hobby:read name:Rookie new_field:sixty kilo grams, sixty kilogram, 60 千克 sex:男 weight:sixty kilogram, 60kg,千克]
q2 := elastic.NewMatchPhraseQuery("new_field", "千克")
//map[age:25 grade:3 hobby:eat name:Jam new_field:60 sex:女 weight:fifty-kilo-grams]
//map[age:23 grade:4 hobby:read name:Rookie new_field:sixty kilo grams, sixty kilogram, 60 千克 sex:男 weight:sixty kilogram, 60kg,千克]
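type MatchQuery struct {
	name                string      // field name (key)
	text                interface{} // query text (value)
	operator            string      // "or" (default) or "and"; with "and" the field must contain every token of the query
	analyzer            string
	boost               *float64 // weight
	fuzziness           string   // maximum edit distance, e.g. "AUTO"
	prefixLength        *int     // number of leading characters left unchanged by fuzzy matching
	maxExpansions       *int     // maximum number of terms the query expands to
	minimumShouldMatch  string
	fuzzyRewrite        string
	lenient             *bool // ignore data type mismatches
	fuzzyTranspositions *bool
	zeroTermsQuery      string   // "none" (default) or "all"; with "all", a query whose tokens are all removed by the analyzer behaves like match_all
	cutoffFrequency     *float64 // a fraction (0.02) is a relative frequency, a positive integer (3) an absolute count
	queryName           string
}
type MatchPhraseQuery struct {
	name           string      // field name (key)
	value          interface{} // query text (value)
	analyzer       string
	slop           *int // maximum number of position moves allowed between the tokens
	boost          *float64
	queryName      string
	zeroTermsQuery string
}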
GET /person/_search
{
"query": {
"multi_match": {
"query": "sixty 60",
"fields": ["weight", "new_field"]
}
}
}
GET /person/_search
{
"query": {
"dis_max": {
"queries": [
{ "match": { "weight": "sixty 60" }},
{ "match": { "new_field": "sixty 60" }}
]
}
}
}
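Multi-match query
The multi_match query builds on the match query to allow multi-field queries.
Start with the type parameter of multi_match, which decides how the query is executed internally:
- best_fields: (default) finds documents that match any field and uses the _score of the best field
- most_fields: finds documents that match any field and combines the _score of each field
- cross_fields: treats fields that share an analyzer as one big field and looks for each word in any of them
- phrase: runs a match_phrase query on each field and uses the _score of the best field
- phrase_prefix: runs a match_phrase_prefix query on each field and uses the _score of the best field
- bool_prefix: creates a match_bool_prefix query on each field and combines the _score of each field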
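The difference between "use the best field" and "combine every field" is easiest to see with numbers. The Go sketch below models the dis_max rule that best_fields relies on (best score plus tie_breaker times the other scores) and, as a simplifying assumption, treats most_fields as a plain sum. The per-field scores are made-up values, not output from Elasticsearch.

```go
package main

import "fmt"

// disMaxScore returns the best per-field score plus tieBreaker times the sum
// of the remaining scores, which is how a dis_max query combines its clauses.
func disMaxScore(fieldScores []float64, tieBreaker float64) float64 {
	if len(fieldScores) == 0 {
		return 0
	}
	best, sum := fieldScores[0], 0.0
	for _, s := range fieldScores {
		if s > best {
			best = s
		}
		sum += s
	}
	return best + tieBreaker*(sum-best)
}

// sumScore adds up every per-field score; used here as a simplified stand-in
// for the way most_fields combines fields.
func sumScore(fieldScores []float64) float64 {
	total := 0.0
	for _, s := range fieldScores {
		total += s
	}
	return total
}

func main() {
	// Hypothetical scores of one document on the weight and new_field fields.
	scores := []float64{2.0, 1.0}
	fmt.Println(disMaxScore(scores, 0))   // best field only
	fmt.Println(disMaxScore(scores, 0.5)) // best field plus half of the rest
	fmt.Println(sumScore(scores))         // every field counts fully
}
```

A tie_breaker of 0 ignores all but the best field, while a value of 1 makes dis_max equal to the plain sum.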
q := elastic.NewMultiMatchQuery("sixty 60", "weight", "new_field").Operator("and")
// map[age:23 grade:4 hobby:read name:Rookie new_field:sixty kilo grams, sixty kilogram, 60 千克 sex:男 weight:sixty kilogram, 60kg,千克]
q := elastic.NewMultiMatchQuery("sixty 60", "weight", "new_field")
// map[age:23 grade:4 hobby:read name:Rookie new_field:sixty kilo grams, sixty kilogram, 60 千克 sex:男 weight:sixty kilogram, 60kg,千克]
// map[age:25 grade:3 hobby:eat name:Jam new_field:60 sex:女 weight:fifty-kilo-grams]
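Compound queries
Real applications usually filter on several values or fields at once. Multi-condition queries of this kind need a compound filter.
SELECT * FROM person WHERE sex='女' AND age>=23;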
GET /person/_search
{
"query": {
"bool": {
"must": [
{
"term": {
"sex": {
"value": "女"
}
}
},
{
"range": {
"age": {
"gte": 23
}
}
}
]
}
}
}
query1 := elastic.NewTermQuery("sex", "女")
query2 := elastic.NewRangeQuery("age").Gte(23)
q := elastic.NewBoolQuery().Must(query1, query2)
// map[age:25 grade:3 hobby:eat name:Jam new_field:60 sex:女 weight:fifty-kilo-grams]
// map[age:24 grade:1 hobby:movie name:Alis sex:女]
The bool filter is one kind of compound filter: it accepts several other filters as arguments and combines them into arbitrary boolean logic.
Bool query
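// Creates a new bool query.
func NewBoolQuery() *BoolQuery {
	return &BoolQuery{
		mustClauses:    make([]Query, 0),
		mustNotClauses: make([]Query, 0),
		filterClauses:  make([]Query, 0),
		shouldClauses:  make([]Query, 0),
	}
}
- must: every clause must match; comparable to AND with '='
- must_not: no clause may match; comparable to '!=' or NOT IN
- should: at least n clauses must match, where n is set by minimum_should_match. If it is not set, n defaults to 0 when the bool query also has must or filter clauses and to 1 otherwise. In olivere/elastic it is set with MinimumShouldMatch or MinimumNumberShouldMatch
- filter: the clauses must match, but scoring is skipped and the result can be cached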
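How the four clause types interact can be checked without a cluster. The Go sketch below evaluates must, must_not and should predicates over the five test records in memory. It is an illustration only: the `boolQuery` type is hypothetical, scoring is ignored (so filter would behave exactly like must), and the weight field is left out.

```go
package main

import "fmt"

type person struct {
	Name  string
	Age   int
	Sex   string
	Grade int
	Hobby string
}

type clause func(person) bool

// boolQuery is a hypothetical in-memory model of the bool query's clauses.
type boolQuery struct {
	must, mustNot, should []clause
	minShould             int // plays the role of minimum_should_match
}

// matches applies the clause rules: all must, no must_not, and at least
// minShould of the should clauses.
func (q boolQuery) matches(p person) bool {
	for _, c := range q.must {
		if !c(p) {
			return false
		}
	}
	for _, c := range q.mustNot {
		if c(p) {
			return false
		}
	}
	hits := 0
	for _, c := range q.should {
		if c(p) {
			hits++
		}
	}
	return hits >= q.minShould
}

// search returns the names of the people that match q, in input order.
func search(people []person, q boolQuery) []string {
	var names []string
	for _, p := range people {
		if q.matches(p) {
			names = append(names, p.Name)
		}
	}
	return names
}

func main() {
	people := []person{
		{"John", 24, "男", 2, "code"},
		{"Alis", 24, "女", 1, "movie"},
		{"Jack", 22, "男", 1, "code"},
		{"Rookie", 23, "男", 4, "read"},
		{"Jam", 25, "女", 3, "eat"},
	}
	q := boolQuery{
		must: []clause{
			func(p person) bool { return p.Sex == "男" },
			func(p person) bool { return p.Age >= 22 && p.Age <= 25 },
		},
		mustNot: []clause{func(p person) bool { return p.Grade == 1 }},
		should:  []clause{func(p person) bool { return p.Hobby == "code" }},
	}
	fmt.Println(search(people, q)) // should is optional: minShould is 0
	q.minShould = 1
	fmt.Println(search(people, q)) // should is now required
}
```

With minShould left at 0 the should clause does not exclude anyone, so both John and Rookie are returned; raising it to 1 leaves only John. This mirrors why an OR group written as should needs minimum_should_match when must clauses are present.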
select * from person
where sex = '男'
  and age between 22 and 25
  and grade != 1
  and (hobby = 'code' OR weight = 'sixty kilogram, 60kg,千克')
GET /person/_search
{
"query": {
"bool": {
"must": [
{
"term": {
"sex": {
"value": "男"
}
}
},
{
"range": {
"age": {
"gte": 22,
"lte": 25
}
}
}
],
"must_not": [
{
"match": {
"grade": 1
}
}
],
"should": [
{
"match": {
"hobby": "code"
}
},
{
"match_phrase": {
"weight": "sixty kilogram"
}
},
{
"match_phrase": {
"weight": "60kg"
}
},
{
"match_phrase": {
"weight": "千克"
}
}
],
"minimum_should_match": 1
}
}
}
Speeding it up with filter
query1 := elastic.NewTermQuery("sex", "男")
query2 := elastic.NewRangeQuery("age").Gte(22).Lte(25)
query3 := elastic.NewMatchQuery("grade", 1)
query4 := elastic.NewMatchQuery("hobby", "code")
query5 := elastic.NewMatchPhraseQuery("weight", "sixty kilogram")
query6 := elastic.NewMatchPhraseQuery("weight", "60kg")
query7 := elastic.NewMatchPhraseQuery("weight", "千克")
query := elastic.NewBoolQuery().
	Must(query1, query2).
	MustNot(query3).
	Should(query4, query5, query6, query7).
	MinimumNumberShouldMatch(1) // require at least one should clause, like the OR group in the SQL
// Wrapping the whole query in filter skips scoring and allows caching
q := elastic.NewBoolQuery().Filter(query)
Aggregation queries
Statistics
GET /person/_search
{
"aggs": {
"max_age": {
"max": {
"field": "age"
}
},
"min_age": {
"min": {
"field": "age"
}
},
"avg_age": {
"avg": {
"field": "age"
}
}
}
}
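aggs := elastic.NewMaxAggregation().Field("age")
q := elastic.NewMatchAllQuery()
req, err := client.Search().
	Index("person").
	Query(q).
	Aggregation("max_age", aggs).
	Do(context.Background())
if err != nil {
	fmt.Println(err)
	return
}
// The results are in req.Aggregations; call the method that matches
// the aggregation type to fetch the one you want.
aggResult, found := req.Aggregations.Max("max_age")
if !found || aggResult.Value == nil {
	fmt.Println("max_age aggregation not found")
	return
}
maxAge := *aggResult.Value
fmt.Printf("Max age: %d\n", int(maxAge))
// Max age: 25
Distinct count
How many different hobbies are there?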
GET /person/_search
{
"aggs": {
"hobby_count": {
"cardinality": {
"field": "hobby"
}
}
}
}
aggs := elastic.NewCardinalityAggregation().Field("hobby")
q := elastic.NewMatchAllQuery()
req, err := client.Search().
	Index("person").
	Query(q).
	Aggregation("hobby_count", aggs).
	Do(context.Background())
// hobby count: 4
Grouping
Count the people of each sex
GET /person/_search
{
"aggs": {
"count": {
"terms": {
"field": "sex",
"size": 10
}
}
}
}
// "aggregations" : {
// "count" : {
// "doc_count_error_upper_bound" : 0,
// "sum_other_doc_count" : 0,
// "buckets" : [
// {
// "key" : "男",
// "doc_count" : 3
// },
// {
// "key" : "女",
// "doc_count" : 2
// }
// ]
// }
// }
aggs := elastic.NewTermsAggregation().Field("sex")
q := elastic.NewMatchAllQuery()
req, err := client.Search().
	Index("person").
	Query(q).
	Aggregation("count", aggs).
	Do(context.Background())
if err != nil {
	fmt.Println(err)
	return
}
aggResult, _ := req.Aggregations.Terms("count")
for _, v := range aggResult.Buckets {
	fmt.Printf("%v--%d\n", v.Key, v.DocCount)
}
// 男--3
// 女--2
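For reference, the three aggregations used in this section (max, cardinality, terms) correspond to very ordinary computations. The Go sketch below runs them in memory over the five test records. It only illustrates what each aggregation returns; the helper names are made up for this example, and Elasticsearch's cardinality aggregation is approximate on large data sets, whereas this count is exact.

```go
package main

import "fmt"

type person struct {
	Name  string
	Age   int
	Sex   string
	Hobby string
}

// maxAge mirrors the max aggregation on the age field.
func maxAge(people []person) int {
	oldest := 0
	for _, p := range people {
		if p.Age > oldest {
			oldest = p.Age
		}
	}
	return oldest
}

// distinctHobbies mirrors the cardinality aggregation on the hobby field.
func distinctHobbies(people []person) int {
	seen := make(map[string]bool)
	for _, p := range people {
		seen[p.Hobby] = true
	}
	return len(seen)
}

// countBySex mirrors the terms aggregation on the sex field:
// one bucket per value, holding the number of documents.
func countBySex(people []person) map[string]int {
	buckets := make(map[string]int)
	for _, p := range people {
		buckets[p.Sex]++
	}
	return buckets
}

func main() {
	people := []person{
		{"John", 24, "男", "code"},
		{"Alis", 24, "女", "movie"},
		{"Jack", 22, "男", "code"},
		{"Rookie", 23, "男", "read"},
		{"Jam", 25, "女", "eat"},
	}
	fmt.Println("max age:", maxAge(people))
	fmt.Println("distinct hobbies:", distinctHobbies(people))
	buckets := countBySex(people)
	fmt.Println("男:", buckets["男"], "女:", buckets["女"])
}
```

The values agree with the results shown above: a maximum age of 25, four distinct hobbies, and buckets of 3 and 2.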
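Warning ⚠️: by default Elasticsearch does not support aggregations on fields of type text. Aggregate on a keyword field (or a keyword sub-field) instead.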
Original article: https://blog.csdn.net/qq_45366447/article/details/130646674