Elasticsearch: Basic CRUD Operations – Python

In my previous article, “Elasticsearch: Everything you need to know about using Elasticsearch in Python – 8.x”, I described in detail how to set up an Elasticsearch client connection, and how to write data and perform some basic operations. In this article, we take a closer look at CRUD (create, read, update, and delete) operations on data.

First, we need to install the Elasticsearch client package:

pip3 install elasticsearch

$ pip3 install elasticsearch
Looking in indexes: http://mirrors.aliyun.com/pypi/simple/
Requirement already satisfied: elasticsearch in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (8.12.0)
Requirement already satisfied: elastic-transport<9,>=8 in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (from elasticsearch) (8.10.0)
Requirement already satisfied: urllib3<3,>=1.26.2 in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (from elastic-transport<9,>=8->elasticsearch) (2.1.0)
Requirement already satisfied: certifi in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (from elastic-transport<9,>=8->elasticsearch) (2023.11.17)
$ pip3 list | grep elasticsearch
elasticsearch                            8.12.0
rag-elasticsearch                        0.0.1        /Users/liuxg/python/rag-elasticsearch/my-app/packages/rag-elasticsearch

We use the following code to create a client connection:

from elasticsearch import Elasticsearch

elastic_user = "elastic"
elastic_password = "xnLj56lTrH98Lf_6n76y"


url = f"https://{elastic_user}:{elastic_password}@localhost:9200"
es = Elasticsearch(url, ca_certs = "./http_ca.crt", verify_certs = True)
 
print(es.info())

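In the code above, replace the user credentials and the certificate with those of your own Elasticsearch cluster. For more details, see the article “Elasticsearch: Everything you need to know about using Elasticsearch in Python – 8.x”.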
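The connection code above embeds the password in the URL. As a minimal sketch of one alternative, the helper below reads the credentials from environment variables and builds keyword arguments for the client. The function name `client_kwargs` and the variable names `ES_URL`, `ES_USER`, `ES_PASSWORD`, and `ES_CA_CERTS` are assumptions made for this illustration; `hosts`, `basic_auth`, `ca_certs`, and `verify_certs` are parameters of the 8.x Python client.

```python
import os


def client_kwargs(env=None):
    """Build keyword arguments for Elasticsearch(...) from environment variables.

    The environment variable names are hypothetical; adapt them to your setup.
    """
    env = os.environ if env is None else env
    return {
        "hosts": env.get("ES_URL", "https://localhost:9200"),
        # ES_PASSWORD has no default on purpose: a missing value raises KeyError.
        "basic_auth": (env.get("ES_USER", "elastic"), env["ES_PASSWORD"]),
        "ca_certs": env.get("ES_CA_CERTS", "./http_ca.crt"),
        "verify_certs": True,
    }


# Usage (requires the elasticsearch package and a running cluster):
# from elasticsearch import Elasticsearch
# es = Elasticsearch(**client_kwargs())
# print(es.info())
```

Keeping the password out of the source file makes it easier to share the script without leaking credentials.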

# Data to be indexed
document = {
  "emp_id": 1,
  "age": 30,
  "email": "example@example.com",
  "name": "John Doe",
  "role": "Developer",
  "dob": "1992-01-01",
  "mobile_no": "1234567890",
  "educational": {
    "10": 87.5,
    "12": 90.0,
    "graduation": 8.4,
    "post_graduation": 9.1
  },
  "stack": ["Python", "Elasticsearch", "React"]
}

# Indexing the document
response = es.index(index="emp_db", document=document)

We can check the indexed document in the Kibana Dev Tools console:

GET emp_db/_search

actions = [ 
  {"_index": "emp_db", "_op_type": "create", "_source": {"field1": "value1"}}, 
  {"_index": "emp_db", "_op_type": "create", "_source": {"field2": "value2"}} 
  # Add more actions as needed 
]

# List of data to be indexed, this could be in thousands.
documents = [
  {
      "emp_id": 250349,
      "age": 26,
      "email": "abc@xyz.com",
      "name": "abc",
      "role": "Developer",
      "dob": "1997-01-01",
      "mobile_no": "12345678",
      "educational": {
        "10": 87.5,
        "12": 90.0,
        "graduation": 8.4,
        "post_graduation": 9.1
      },
      "stack": ["Python", "PySpark", "AWS"]
  },
  {
      "emp_id": 10789,
      "name": "abc",
      "age": 27,
      "email": "abc@xyz.com",
      "role": "linux admin",
      "dob": "1996-12-10",
      "mobile_no": "12345678",
      "educational": {
        "10": 87.5,
        "12": 90.0,
        "graduation": 8.4,
        "post_graduation": 9.1
      },
      "stack": ["Linux", "AWS"]
  },
  {
      "emp_id": 350648,
      "name": "Sandeep",
      "age": 27,
      "email": "def@xyz.com",
      "role": "seller support"
  }
]

# Define your actions
actions = [
    {
        "_index": "emp_db",
        "_op_type": "create",
        "_id": str(item["emp_id"]),
        "_source": item,
    }
    for item in documents
]

# Import helpers for using bulk API
from elasticsearch import helpers

# Use the bulk helper to perform the actions
bulk_response = helpers.bulk(es, actions)
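When the list of documents runs into the thousands, the actions can also be produced lazily, since `helpers.bulk` accepts any iterable. The following is a minimal sketch under that assumption; `generate_actions` and its `id_field` parameter are hypothetical names introduced here. Note that the `create` operation type is rejected for an `_id` that already exists, whereas `index` overwrites the existing document.

```python
def generate_actions(documents, index, op_type="create", id_field="emp_id"):
    """Yield one bulk action per document, using id_field as the document _id."""
    for doc in documents:
        yield {
            "_index": index,
            "_op_type": op_type,
            "_id": str(doc[id_field]),
            "_source": doc,
        }


# Small hand-written sample, for illustration only.
sample = [
    {"emp_id": 250349, "name": "abc"},
    {"emp_id": 10789, "name": "abc"},
]
sample_actions = list(generate_actions(sample, "emp_db"))
print(sample_actions[0]["_id"])  # prints 250349

# Usage (requires the elasticsearch package and a running cluster):
# from elasticsearch import helpers
# success_count, errors = helpers.bulk(es, generate_actions(documents, "emp_db"))
```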

response = es.get(index="emp_db", id=250349)

doc_ids = [
  {"emp_id":250349},
  {"emp_id":350648}
]

# Define your actions
docs = [{"_index": "emp_db", "_id": str(item["emp_id"])} for item in doc_ids]

# Retrieve the documents
response = es.mget(docs=docs)
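An mget response contains a `docs` list with one entry per requested id, and each entry carries a `found` flag. As a rough sketch of how the result might be processed, the helper below separates the documents that were found from the ids that were not. `split_mget_docs` is a hypothetical name, and the sample response is hand-written in the mget response shape rather than taken from a real cluster.

```python
def split_mget_docs(response):
    """Separate an mget response into found sources (keyed by _id) and missing ids."""
    found, missing = {}, []
    for doc in response["docs"]:
        if doc.get("found"):
            found[doc["_id"]] = doc["_source"]
        else:
            missing.append(doc["_id"])
    return found, missing


# Hand-written response for illustration only.
sample_response = {
    "docs": [
        {"_index": "emp_db", "_id": "250349", "found": True,
         "_source": {"emp_id": 250349, "role": "Developer"}},
        {"_index": "emp_db", "_id": "999999", "found": False},
    ]
}

found, missing = split_mget_docs(sample_response)
print(sorted(found))  # prints ['250349']
print(missing)        # prints ['999999']
```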

document = {
    "emp_id": 250349,
    "role": "sr software engineer"
}

response = es.update(index="emp_db", id=document["emp_id"], doc=document)
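Passing `doc` to `update` performs a partial update: the supplied fields are merged into the stored document, and fields that are not supplied stay unchanged. Nested objects are merged recursively, while plain values and arrays are replaced. The sketch below imitates that merge locally in plain Python to show the effect. It is an illustration only, not the server's implementation, and `merge_partial` is a hypothetical name.

```python
import copy


def merge_partial(source, partial):
    """Return a copy of source with partial merged in (objects merged, other values replaced)."""
    merged = copy.deepcopy(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_partial(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hand-written stand-in for the stored document.
stored = {
    "emp_id": 250349,
    "role": "Developer",
    "educational": {"10": 87.5, "12": 90.0},
    "stack": ["Python", "PySpark", "AWS"],
}

updated = merge_partial(stored, {"emp_id": 250349, "role": "sr software engineer"})
print(updated["role"])   # prints sr software engineer
print(updated["stack"])  # prints ['Python', 'PySpark', 'AWS']
```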

es.delete(index="emp_db", id=250349)

# List of ids to be deleted, this could be in thousands.
documents = [
  {
      "emp_id": 10789,
  },
  {
      "emp_id": 350648,
  }
]

# Define your actions
actions = [
    {"_index": "emp_db", "_op_type": "delete", "_id": str(item["emp_id"])}
    for item in documents
]

# Import helpers for using bulk API
from elasticsearch import helpers

# Use the bulk helper to perform the actions
response = helpers.bulk(es, actions)
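By default, `helpers.bulk` raises an exception as soon as an action fails, for example when a delete targets an id that no longer exists. With `raise_on_error=False` the helper instead returns the failed items in a list. The sketch below assumes that each entry in that list has the shape `{op_type: item}` with a `status` field, and collects the ids that were reported as not found. `not_found_ids` is a hypothetical name and the sample errors are hand-written.

```python
def not_found_ids(errors):
    """Collect the _id of every failed bulk item whose status is 404."""
    ids = []
    for error in errors:
        # Each entry is assumed to hold a single {op_type: item} pair.
        for item in error.values():
            if item.get("status") == 404:
                ids.append(item["_id"])
    return ids


# Hand-written error entries for illustration only.
sample_errors = [
    {"delete": {"_index": "emp_db", "_id": "10789", "status": 404, "result": "not_found"}},
    {"delete": {"_index": "emp_db", "_id": "350648", "status": 409}},
]
print(not_found_ids(sample_errors))  # prints ['10789']

# Usage (requires the elasticsearch package and a running cluster):
# from elasticsearch import helpers
# success_count, errors = helpers.bulk(es, actions, raise_on_error=False)
# print(success_count, not_found_ids(errors))
```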