Elasticsearch in Practice: Settings and Mappings

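About this article: Elasticsearch has no dedicated array data type. By default, however, any field can hold zero or more values, so every field is effectively an array, with the restriction that all elements must share the same data type. The numeric types supported are long, integer, short, byte, double, float, half_float, and scaled_float. Dates use the common ISO format, in which the date part is required and the time part is optional. For the integer types (byte, short, integer, and long), choose the smallest type that is sufficient for the use case.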

A mapping is similar to a table schema definition in a database. Its main purposes are as follows:

A complete mapping can be divided into four parts:

If no mapping is defined manually, Elasticsearch infers the field types automatically, and each field's type is fixed by the first value indexed for it.

Let's first look at the mapping Elasticsearch creates by default.

First, create an index:

PUT /user/

Retrieve the index information:

GET /user

{
  "user": {
    "aliases": {},
    "mappings": {},
    "settings": {
      "index": {
        "creation_date": "1540044686190",
        "number_of_shards": "5",
        "number_of_replicas": "1",
        "uuid": "_K5b8w7jRiuthf7QeQZhdw",
        "version": {
          "created": "5060299"
        },
        "provided_name": "user"
      }
    }
  }
}

PUT /user/doc/1
{
  "name":"Allen Yer",
  "job":"php",
  "age":22
}

PUT /user/doc/2
{
  "name":"Allen Yer",
  "job":0,
  "age":22
}

GET /user/doc/_count

{
  "count": 2,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  }
}

{
  "user": {
    "mappings": {
      "doc": {
        "properties": {
          "age": {
            "type": "long"
          },
          "job": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}

DELETE /user

PUT /user/doc/2
{
  "name":"Allen Yer",
  "job":0,
  "age":22
}

PUT /user/doc/1
{
  "name":"Allen Yer",
  "job":"php",
  "age":22
}

{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "failed to parse [job]"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "failed to parse [job]",
    "caused_by": {
      "type": "number_format_exception",
      "reason": "For input string: \"php\""
    }
  },
  "status": 400
}
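The two experiments above (indexing the string-valued job first and the numeric one second, then the reverse order after deleting the index) can be mimicked with a toy model. This is only a sketch of the "first occurrence wins" behaviour; the names ToyIndex, infer_type and accepts are hypothetical and the rules are far simpler than what Elasticsearch actually does.

```python
# Toy model of dynamic mapping: the first value seen for a field fixes its type.
# A simplification for illustration, not Elasticsearch's real inference logic.

def infer_type(value):
    """Guess a field type from the first value indexed for a field."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    return "text"


def accepts(field_type, value):
    """Return True if the value can be indexed into a field of this type."""
    if field_type == "text":
        return True  # a number is simply indexed as its string form
    if field_type == "boolean":
        return isinstance(value, bool)
    parser = int if field_type == "long" else float
    try:
        parser(value)
        return True
    except (TypeError, ValueError):
        return False  # e.g. "php" cannot be parsed as a number


class ToyIndex:
    def __init__(self):
        self.mapping = {}

    def index(self, doc):
        for field, value in doc.items():
            # The type is decided once, by the first document that has the field.
            field_type = self.mapping.setdefault(field, infer_type(value))
            if not accepts(field_type, value):
                raise ValueError(f"failed to parse [{field}]")


if __name__ == "__main__":
    idx = ToyIndex()
    idx.index({"name": "Allen Yer", "job": 0, "age": 22})
    try:
        idx.index({"name": "Allen Yer", "job": "php", "age": 22})
    except ValueError as err:
        print(err)
```

In this model, as in the console output above, the order of the two documents decides whether the second one is accepted.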

DELETE /user

PUT /user/
{
  "mappings": {
    "doc": {
      "properties": {
        "name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "age": {
          "type": "long",
          "index": false
        },
        "job": {
          "type": "keyword"
        },
        "intro": {
          "type": "text"
        },
        "create_time": {
          "type": "date",
          "format": "epoch_second"
        }
      }
    }
  }
}

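There are two string types: text and keyword. A text field is analyzed (tokenized) and is used for full-text search; a keyword field is not analyzed and is used for exact matching, aggregations, and sorting. In older versions of Elasticsearch both were represented by the single string type.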
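The difference between the two types can be pictured with a small sketch. The analyzer here is a rough stand-in (lowercase, split on non-word characters), not the real standard analyzer, and the function names are hypothetical.

```python
import re


def analyze_text(value):
    """Rough stand-in for a text analyzer: lowercase, split on non-word chars."""
    return [token for token in re.split(r"\W+", value.lower()) if token]


def analyze_keyword(value):
    """A keyword field keeps the whole value as one term, unchanged."""
    return [value]


def term_matches(terms, query_term):
    """A term query matches only if the exact term is present."""
    return query_term in terms


if __name__ == "__main__":
    print(analyze_text("Allen Yer"))
    print(analyze_keyword("Allen Yer"))
```

Under this model a search for the single word allen finds the text field but not the keyword field, while the full value "Allen Yer" matches only the keyword field.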

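If the IK analysis plugin is installed, an IK analyzer can be specified for text fields. In general, the right mapping for a string field depends on how the field will be queried, as the following four snippets illustrate: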

"name": {
        "type": "text",
        "analyzer": "ik_smart",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }

"content": {
        "type": "text",
        "analyzer": "ik_smart"
      }

"name": {
        "type": "keyword"
      }

"url": {
        "type": "keyword",
        "index": false
      }
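One way to read the four snippets above is as the four combinations of "needs full-text search" and "needs exact matching, sorting, or aggregation". That reading is an assumption, and the helper below (string_field_mapping, a hypothetical name) is only a sketch that reproduces the same four shapes.

```python
def string_field_mapping(full_text, exact, analyzer=None):
    """Build a string field mapping from two yes/no requirements.

    full_text: the field must support analyzed full-text search.
    exact:     the field must support exact match, sorting, or aggregation.
    analyzer:  optional analyzer name for the text field, e.g. "ik_smart".
    """
    if full_text:
        mapping = {"type": "text"}
        if analyzer:
            mapping["analyzer"] = analyzer
        if exact:
            # Multi-field: the same value is also indexed as a keyword.
            mapping["fields"] = {
                "keyword": {"type": "keyword", "ignore_above": 256}
            }
        return mapping
    mapping = {"type": "keyword"}
    if not exact:
        # Neither kind of search is needed, so skip indexing altogether.
        mapping["index"] = False
    return mapping


if __name__ == "__main__":
    print(string_field_mapping(True, True, "ik_smart"))
    print(string_field_mapping(False, False))
```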

long: a signed 64-bit integer with a minimum value of -2^63 and a maximum value of 2^63-1.

integer: a signed 32-bit integer with a minimum value of -2^31 and a maximum value of 2^31-1.
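The advice to pick the smallest sufficient integer type can be turned into a small lookup. This is a sketch; the helper name smallest_integer_type is hypothetical, and the byte and short ranges are the usual signed 8-bit and 16-bit ranges.

```python
# (type name, minimum, maximum) for the signed integer types, smallest first.
INTEGER_TYPES = [
    ("byte", -2**7, 2**7 - 1),
    ("short", -2**15, 2**15 - 1),
    ("integer", -2**31, 2**31 - 1),
    ("long", -2**63, 2**63 - 1),
]


def smallest_integer_type(lowest, highest):
    """Return the smallest integer type that can hold every value in the range."""
    for name, type_min, type_max in INTEGER_TYPES:
        if type_min <= lowest and highest <= type_max:
            return name
    raise ValueError("range does not fit in a signed 64-bit integer")


if __name__ == "__main__":
    print(smallest_integer_type(0, 1))        # e.g. a status flag
    print(smallest_integer_type(1900, 2100))  # e.g. a year
```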

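For the floating-point types (double, float, and half_float), -0.0 and +0.0 are different values. A term query for -0.0 does not match +0.0; likewise, a range query whose upper bound is -0.0 does not match +0.0, and one whose lower bound is +0.0 does not match -0.0.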
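As a side illustration of why the two zeros can be told apart at all: in IEEE 754 they have different bit patterns, even though most languages compare them as equal. The sketch below shows only that; it says nothing about how Elasticsearch encodes the values internally, and double_bits is a hypothetical helper name.

```python
import struct


def double_bits(value):
    """Return the 64-bit IEEE 754 pattern of a float as a hex string."""
    return struct.pack(">d", value).hex()


if __name__ == "__main__":
    print(0.0 == -0.0)         # numeric comparison treats them as equal
    print(double_bits(0.0))    # all bits clear
    print(double_bits(-0.0))   # only the sign bit set
```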

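A scaled_float is stored as an integer after multiplying by a scaling factor. For example, if a price only needs to be accurate to the cent, a scaling factor of 100 stores a price of 57.34 as 5734. Where it fits the data, prefer scaled_float over the other floating-point types.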
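The arithmetic behind the scaling factor is short enough to write out. This is a sketch of the idea (multiply, round to the nearest integer, divide on the way back); to_scaled and from_scaled are hypothetical names.

```python
import math


def to_scaled(value, scaling_factor):
    """Multiply by the scaling factor and round to the nearest integer."""
    return math.floor(value * scaling_factor + 0.5)


def from_scaled(stored, scaling_factor):
    """Recover the approximate original value from the stored integer."""
    return stored / scaling_factor


if __name__ == "__main__":
    stored = to_scaled(57.34, 100)
    print(stored, from_scaled(stored, 100))
```

Anything finer than 1/scaling_factor is lost, which is the trade-off for storing an integer instead of a float.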

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "status": {
          "type": "byte"
        },
        "year": {
          "type": "short"
        },
        "id": {
          "type": "long"
        },
        "price": {
          "type": "scaled_float",
          "scaling_factor": 100
        }
      }
    }
  }
}

The type is date.

The accepted formats of a date field can be customized with the format parameter. The default is "strict_date_optional_time||epoch_millis":

"postdate": {
      "type": "date",
      "format": "strict_date_optional_time||epoch_millis"
    }

format has many built-in values; some of them are described here:

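Formats whose names begin with strict_ are strict date formats, which means the year, month, and day parts must be written with leading zeros.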
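The leading-zero rule can be pictured with two regular expressions. This is a sketch that checks digit counts only (it does not validate the calendar and covers just a plain year-month-day date); both pattern names and helper names are hypothetical.

```python
import re

# Strict: exactly 4-digit year, 2-digit month, 2-digit day.
STRICT_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")
# Lenient: the same shape, but leading zeros may be omitted.
LENIENT_DATE = re.compile(r"\d{1,4}-\d{1,2}-\d{1,2}")


def is_strict_date(text):
    return STRICT_DATE.fullmatch(text) is not None


def is_lenient_date(text):
    return LENIENT_DATE.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ("2018-11-05", "2018-11-5"):
        print(candidate, is_strict_date(candidate), is_lenient_date(candidate))
```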

"postdate":{
      "type":"date",
      "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd"
    }
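The || in the format above separates alternatives that are tried in turn. A rough Python equivalent, assuming the two patterns translate to the strptime patterns shown, looks like this; parse_postdate is a hypothetical name, and strptime is more lenient about leading zeros than a strict Elasticsearch format would be.

```python
from datetime import datetime

# Assumed strptime equivalents of "yyyy-MM-dd HH:mm:ss" and "yyyy-MM-dd".
FORMATS = ["%Y-%m-%d %H:%M:%S", "%Y-%m-%d"]


def parse_postdate(text):
    """Try each format in order and return the first successful parse."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"no format matched: {text!r}")


if __name__ == "__main__":
    print(parse_postdate("2018-11-25 21:10:43"))
    print(parse_postdate("2018-11-25"))
```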

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "date": {
          "type": "date",
          "format":"epoch_millis"
        }
      }
    }
  }
}

PUT my_index/_doc/1
{
  "date":1543151405000
}
PUT my_index/_doc/2
{
  "date":1543151405
}
PUT my_index/_doc/3
{
  "date":"2018-11-25 21:10:43"
}
GET my_index/_doc/_search

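The third document fails to index: the field only accepts timestamps as long integers, and a date string does not match that format. The second document's value has only 10 digits, so it is a timestamp in seconds rather than milliseconds. The resulting date is wrong (it falls in January 1970), but the value is still within the valid range of epoch_millis, so the document is indexed successfully.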
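The three cases can be reproduced in a few lines. This is a sketch of how a millisecond timestamp is interpreted, not of Elasticsearch's parser; parse_epoch_millis is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_epoch_millis(value):
    """Interpret a value as milliseconds since the Unix epoch (UTC)."""
    millis = int(value)  # raises ValueError for a string such as a formatted date
    return EPOCH + timedelta(milliseconds=millis)


if __name__ == "__main__":
    print(parse_epoch_millis(1543151405000))  # 13 digits: the intended instant
    print(parse_epoch_millis(1543151405))     # 10 digits: accepted, but in 1970
    try:
        parse_epoch_millis("2018-11-25 21:10:43")
    except ValueError:
        print("rejected: not a timestamp")
```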

The type is boolean.

The type is binary.

Array of strings: [ "one", "two" ]
Array of integers: [ 1, 2 ]
Array of arrays: [ 1, [ 2, 3 ]], which is equivalent to [ 1, 2, 3 ]
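The equivalence in the last example amounts to flattening nested lists, and the same-type rule for array elements can be checked on the flattened result. A minimal sketch, with hypothetical helper names:

```python
def flatten(values):
    """Flatten arbitrarily nested lists into a single flat list."""
    flat = []
    for item in values:
        if isinstance(item, list):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


def same_type(values):
    """Return True if every element, after flattening, has the same Python type."""
    return len({type(v) for v in flatten(values)}) <= 1


if __name__ == "__main__":
    print(flatten([1, [2, 3]]))
    print(same_type(["one", "two"]), same_type([1, "two"]))
```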

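The _all field works by treating the content of every field as a string and concatenating it all into a single _all field. By default this field is not stored, but it is searchable. The query_string and simple_query_string queries (the kind used by the Kibana search box) also query the _all field by default.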
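The concatenation idea can be sketched in one function. This is a toy model: it ignores analysis and nested objects, and build_all is a hypothetical name.

```python
def build_all(doc):
    """Concatenate every field value, as a string, into one searchable blob."""
    return " ".join(str(value) for value in doc.values())


if __name__ == "__main__":
    doc = {"name": "Allen Yer", "job": "php", "age": 22}
    print(build_all(doc))
```

A query against the combined string finds the document no matter which field the word came from, which is what makes a search without a field name possible.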

In 6.x it is disabled by default.

PUT my_index
{
  "mappings": {
    "my_type": {
      "_all": {
        "enabled": true,
        "store": false
      },
      "properties": {}
    }
  },
  "settings": {
    "index.query.default_field": "_all" 
  }
}

The configuration above is the default in 5.x.

If CPU usage and disk space are a concern, consider disabling the _all field entirely or customizing it on a per-field basis.

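If the _all field is disabled, URI search requests and the query_string and simple_query_string queries can no longer use it. They can be pointed at a different field instead by setting the index.query.default_field property.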
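As a sketch of what such a configuration might look like, the snippet below builds an index-creation body with _all disabled and index.query.default_field pointing at another field. The field name, the type name my_type, and the helper name are assumptions made for illustration; the body follows the 5.x style of mapping used elsewhere in this article.

```python
import json


def index_body_without_all(default_field, doc_type="my_type"):
    """Build an index body that disables _all and sets a default query field."""
    return {
        "mappings": {
            doc_type: {
                "_all": {"enabled": False},
                # Assumed: the default field is a plain text field.
                "properties": {default_field: {"type": "text"}},
            }
        },
        "settings": {"index.query.default_field": default_field},
    }


if __name__ == "__main__":
    print(json.dumps(index_body_without_all("name"), indent=2))
```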

GET /user/doc/_search?q=name

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "user",
        "_type": "doc",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "name": "this is test name",
          "age": 22,
          "job": "java",
          "intro": "the intro can not be searched by singal",
          "intro2": "去朝阳公园",
          "create_time": 1540047542
        }
      }
    ]
  }
}

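The search results include the _source field, which holds the original document content. If _source is disabled, you only learn that a document matched and cannot retrieve what it contains, so disable this field with caution.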

{
  "mappings": {
    "_doc": {
      "_source": {
        "enabled": false
      }
    }
  }
}
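The consequence of disabling _source can be pictured with a toy in-memory search. In Elasticsearch the setting decides whether the original document is kept at index time; the sketch below models it as a flag on the search function purely for brevity, and every name in it is hypothetical.

```python
def search(index, term, source_enabled=True):
    """Return hits for documents in which any field contains the given word."""
    hits = []
    for doc_id, doc in index.items():
        if any(term in str(value).lower().split() for value in doc.values()):
            hit = {"_id": doc_id}
            if source_enabled:
                hit["_source"] = doc  # the original document comes back
            hits.append(hit)
    return hits


if __name__ == "__main__":
    docs = {"1": {"name": "this is test name", "age": 22}}
    print(search(docs, "name"))
    print(search(docs, "name", source_enabled=False))
```

Without _source the hit still identifies the matching document, but there is nothing to display beyond its id.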

