Working with Elasticsearch from Python

news 2025/4/3 14:47:24

1. Common operations

### Create an index
```bash
curl -u 'elastic:123' -X PUT -H "Content-Type: application/json" -d @mapping.json "http://0.0.0.0:9200/ai_kg_extraction_new_lower_tag_index"
```

### Delete an index
```bash
curl -u 'elastic:123' -X DELETE "http://0.0.0.0:9200/ai_kg_extraction_new_lower_tag_index"
```

### Count the documents in an index
```bash
curl -u 'elastic:123' -X GET "http://0.0.0.0:9200/ai_kg_extraction_new_lower_tag_index/_count"
```

### Check the status of an index
```bash
curl -u 'elastic:123' -X GET "http://0.0.0.0:9200/ai_kg_extraction_new_lower_tag_index/_stats?pretty"
```

### Query all documents in an index
```bash
curl -u 'elastic:123' -X GET "http://0.0.0.0:9200/ai_kg_extraction_new_lower_tag_index/_search" -H 'Content-Type: application/json' -d'
{
    "query": {
        "match_all": {}
    }
}'
```
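Since this article is about driving ES from Python, the curl calls above can also be expressed with the standard library alone. The following is only a minimal sketch, not the official client: `build_request` and `send` are hypothetical helper names, and the address, credentials and index name are simply copied from the curl commands.

```python
import base64
import json
import urllib.request

ES_URL = "http://0.0.0.0:9200"  # same address as in the curl commands
INDEX = "ai_kg_extraction_new_lower_tag_index"
USER, PASSWORD = "elastic", "123"  # same credentials as curl -u


def build_request(method, path, body=None):
    """Build (but do not send) an authenticated request to ES.

    Hypothetical helper for illustration; not part of any ES library.
    """
    token = base64.b64encode(f"{USER}:{PASSWORD}".encode()).decode()
    headers = {"Authorization": f"Basic {token}"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        f"{ES_URL}/{path}", data=data, headers=headers, method=method
    )


def send(request):
    """Send the request and decode the JSON response (needs a running ES)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Equivalent of the _count and match_all _search calls above.
count_request = build_request("GET", f"{INDEX}/_count")
search_request = build_request(
    "GET", f"{INDEX}/_search", {"query": {"match_all": {}}}
)

# With a reachable cluster, the calls would be sent like this:
# print(send(count_request))
# print(send(search_request))
```

Building the request separately from sending it keeps the example inspectable without a running cluster.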

mapping.json

{
  "mappings": {
    "properties": {
      "target_term": {"type": "keyword"},
      "definitions": {
        "type": "nested",
        "properties": {
          "definition": {"type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_smart"},
          "source": {"type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_smart"},
          "confidence": {"type": "float"},
          "context_snippet": {"type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_smart"}
        }
      },
      "related_terms": {"type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_smart"},
      "ambiguity_notes": {"type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_smart"},
      "doc_ts": {"type": "date", "format": "epoch_millis"},
      "type": {"type": "integer"},
      "type_name": {"type": "text"},
      "id": {"type": "text"}
    }
  }
}

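In an Elasticsearch mapping, type defines the data type of a field. The common types are:

Core data types

text: stores full-text data such as article bodies or comments. The input is analyzed (split into tokens) and an inverted index is built so that it can be searched as full text.
keyword: for strings that are matched exactly, such as identifiers, tags or status codes. The value is not analyzed; the whole string is indexed as is.
date: stores dates and times. A format such as "yyyy-MM-dd HH:mm:ss" can be specified, which makes date range queries and sorting convenient.
long, integer, short, byte: store integers of different ranges. long is a 64-bit signed integer, integer is 32-bit, short is 16-bit and byte is 8-bit.
double, float: store floating-point numbers. double is a 64-bit double-precision float, float is a 32-bit single-precision float.
boolean: stores a boolean value, true or false.
nested: a special type for arrays of objects. Each object in the array is indexed separately, so its fields can be queried independently of the other objects in the array.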
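To make the types concrete, here is a sketch of a document shaped like mapping.json, written as a Python dict. Every value is made up for illustration; only the field names and types come from the mapping.

```python
import json
import time

# Hypothetical sample document; all values are invented for illustration.
doc = {
    "target_term": "Elasticsearch",  # keyword: indexed as one exact value
    "definitions": [  # nested: an array of objects
        {
            "definition": "A distributed search engine.",  # text: analyzed
            "source": "example source",
            "confidence": 0.9,  # float
            "context_snippet": "example snippet",
        }
    ],
    "related_terms": "Lucene",
    "ambiguity_notes": "",
    "doc_ts": int(time.time() * 1000),  # date with format epoch_millis
    "type": 1,  # integer
    "type_name": "term",
    "id": "1",
}

# JSON body as it would be sent when indexing the document.
body = json.dumps(doc, ensure_ascii=False)
print(body)
```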

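2. Searching text fields

The keyword-field searches covered earlier succeed only when the search value is exactly identical to the stored value. A text field is different: its content is analyzed (split into tokens) when it is written to ES, so searching it does not behave the same way.

Here the address field is of type text, and we continue with the sample data from the earlier examples.

term

A term query does not analyze the search string; it looks up the given string as a single whole token. Take the document we inserted with id=4, whose address is read a book. At index time it was split into three tokens, read, a and book, so each of the following three term queries finds this document: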

GET /exam/_search
{"query": {"term": {"address": "read"}}}

GET /exam/_search
{"query": {"term": {"address": "a"}}}

GET /exam/_search
{"query": {"term": {"address": "book"}}}

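However, none of the following values finds the document, because term does not analyze the search string. The string is looked up as one token, and the index contains no token such as read a: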

GET /exam/_search
{"query": {"term": {"address": "read a"}}}

GET /exam/_search
{"query": {"term": {"address": "a book"}}}

GET /exam/_search
{"query": {"term": {"address": "read a book"}}}
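The behaviour above can be reproduced with a toy inverted index. This is only a sketch: `analyze` is a crude stand-in for a real analyzer (lowercase, split on non-word characters), `term_search` is a hypothetical helper, and the two documents are the id=4 and id=5 samples from this article.

```python
import re
from collections import defaultdict

# Sample data taken from the article: id -> address.
DOCS = {4: "read a book", 5: "you can get a good job"}


def analyze(text):
    """Toy analyzer: lowercase, then split on non-word characters."""
    return [token for token in re.split(r"\W+", text.lower()) if token]


# Inverted index: token -> set of ids whose address contains that token.
INDEX = defaultdict(set)
for doc_id, address in DOCS.items():
    for token in analyze(address):
        INDEX[token].add(doc_id)


def term_search(value):
    """Look up the value as ONE token, without analyzing it (like term)."""
    return sorted(INDEX.get(value, set()))


print(term_search("read"))         # [4]
print(term_search("a"))            # [4, 5]
print(term_search("read a"))       # []
print(term_search("read a book"))  # []
```

The index only ever contains single tokens, so a multi-word string can never be found by a lookup that skips analysis.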

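There is one more case: appending .keyword to the name of the text field. This searches address without analysis, treating it as a keyword field, so it amounts to running term against a keyword field, as in the previous part. (This relies on address having a keyword sub-field, which the default dynamic mapping of Elasticsearch creates for string fields.)

So the following query does find the document with address='read a book':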

GET /exam/_search
{"query": {"term": {"address.keyword": "read a book"}}}

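match

match is a loose, full-text match: the search string is analyzed first, and the matching documents are returned in descending order of relevance (ES reports the relevance in the _score field).

For example, searching the address field for a returns two documents, id 4 and id 5, because the analyzed address of both contains that token.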

GET /exam/_search
{"query": {"match": {"address": "a"}}}

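Or we search for read a. match first analyzes it into read and a, then matches every document whose tokens contain one or both of them. This again returns two documents, read a book and you can get a good job, because both contain the token a. The first one satisfies both search terms, so its score is higher and it is returned first: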

GET /exam/_search
{"query": {"match": {"address": "read a"}}}
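The OR behaviour and the ordering can be sketched in a few lines of Python. This is a toy model under loud assumptions: `analyze` is a simplified analyzer, `match_search` is a hypothetical helper, and the number of matched terms stands in for _score, whereas ES computes _score with BM25.

```python
import re

# Sample data taken from the article: id -> address.
DOCS = {4: "read a book", 5: "you can get a good job"}


def analyze(text):
    """Toy analyzer: lowercase, then split on non-word characters."""
    return [token for token in re.split(r"\W+", text.lower()) if token]


def match_search(query):
    """Return (id, matched_term_count) pairs, best match first.

    The count is a crude stand-in for _score, not the real BM25 score.
    """
    query_terms = set(analyze(query))
    hits = []
    for doc_id, address in DOCS.items():
        matched = query_terms & set(analyze(address))
        if matched:  # default operator is OR: one matching term is enough
            hits.append((doc_id, len(matched)))
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


print(match_search("a"))       # [(4, 1), (5, 1)]
print(match_search("read a"))  # [(4, 2), (5, 1)]
```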

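match_phrase

Phrase matching. Without extra parameters, it matches documents that contain the terms of the phrase consecutively and in the same order.

For example, for the document with address="read a book", searching for read a, a book or read a book all find it.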

GET /exam/_search
{"query": {"match_phrase": {"address": "read a"}}}

GET /exam/_search
{"query": {"match_phrase": {"address": "a book"}}}

GET /exam/_search
{"query": {"match_phrase": {"address": "read a book"}}}

But if we search for book a, the order differs, so the following query does not find the document:

GET /exam/_search
{"query": {"match_phrase": {"address": "book a"}}}

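However, match_phrase has a slop parameter that relaxes the ordering: it is the number of position moves the search terms are allowed to make. Take 'book a': if its tokens 'book' and 'a' may be shifted by two positions in total (a moves one forward, book moves one back; this is how I understand slop), the query matches our document. For example: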

GET /exam/_search
{"query": {"match_phrase": {"address": {"query": "book a", "slop": 2}}}}
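Why swapping two adjacent words needs a slop of 2 can be seen in a small position-based model. This is a simplified sketch, not Lucene's actual implementation: `analyze` and `phrase_match` are hypothetical helpers, and the "shift" bookkeeping is one way to express how far the terms have to move.

```python
import re
from itertools import product


def analyze(text):
    """Toy analyzer: lowercase, then split on non-word characters."""
    return [token for token in re.split(r"\W+", text.lower()) if token]


def phrase_match(text, phrase, slop=0):
    """Simplified model of match_phrase with slop (not Lucene's code)."""
    doc_tokens = analyze(text)
    query_tokens = analyze(phrase)
    # For every query term, the positions where it occurs in the document.
    positions = [
        [i for i, token in enumerate(doc_tokens) if token == term]
        for term in query_tokens
    ]
    if not query_tokens or any(not found for found in positions):
        return False  # empty phrase, or a term is missing entirely
    for combo in product(*positions):
        if len(set(combo)) < len(combo):
            continue  # two query terms may not share one document position
        # Shift of each term: document position minus position in the phrase.
        shifts = [doc_pos - query_pos for query_pos, doc_pos in enumerate(combo)]
        # Equal shifts mean the terms are consecutive and in phrase order.
        if max(shifts) - min(shifts) <= slop:
            return True
    return False


print(phrase_match("read a book", "a book"))          # True
print(phrase_match("read a book", "book a"))          # False
print(phrase_match("read a book", "book a", slop=1))  # False
print(phrase_match("read a book", "book a", slop=2))  # True
```

In 'book a' the token book sits two positions later in the document than in the phrase, while a sits exactly where expected, so the spread is 2.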

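match_phrase_prefix

Prefix matching. Say the address is 'read a book' but we only know 'read a bo' and want to find the complete document from that; this is what match_phrase_prefix is for.

It works like this: the search string is analyzed, and the last token is handled separately as a prefix. So the search first matches documents on the tokens of 'read a' and then keeps those where the remaining 'bo' is a prefix of the next token: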

GET /exam/_search
{"query": {"match_phrase_prefix": {"address": "read a bo"}}}
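The two-step process described above can be sketched as follows. It is a toy model with the default slop of 0; `analyze` and `phrase_prefix_match` are hypothetical helpers, not ES code.

```python
import re


def analyze(text):
    """Toy analyzer: lowercase, then split on non-word characters."""
    return [token for token in re.split(r"\W+", text.lower()) if token]


def phrase_prefix_match(text, query):
    """Simplified model of match_phrase_prefix (slop 0)."""
    doc_tokens = analyze(text)
    query_tokens = analyze(query)
    if not query_tokens:
        return False
    # All tokens but the last form the phrase; the last one is a prefix.
    phrase, prefix = query_tokens[:-1], query_tokens[-1]
    for start in range(len(doc_tokens) - len(phrase)):
        window = doc_tokens[start:start + len(phrase)]
        following = doc_tokens[start + len(phrase)]
        if window == phrase and following.startswith(prefix):
            return True
    return False


print(phrase_prefix_match("read a book", "read a bo"))             # True
print(phrase_prefix_match("you can get a good job", "read a bo"))  # False
```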

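3. More ways to use match

Matching all analyzed terms

As described earlier, match first analyzes the search string and then selects the documents that contain one or more of the resulting tokens.

For example, the earlier search for 'read a' returns both 'read a book' and 'you can get a good job', because both contain the token 'a'. This is similar to combining the individual tokens with should.

If we want to be stricter and require that a document contains all the tokens, 'read' and 'a', we can add the operator parameter: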

GET /exam/_search
{"query": {"match": {"address": {"query": "read a", "operator": "and"}}}}

The result now contains only the documents that include every token of the search string.
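The comparison with should can be made explicit by rewriting a match query as a bool query over the individual tokens. This is a sketch of the idea only: `match_as_bool` is a hypothetical helper, `analyze` is a toy analyzer, and the resulting bool query is roughly, not exactly, what ES does internally.

```python
import re


def analyze(text):
    """Toy analyzer: lowercase, then split on non-word characters."""
    return [token for token in re.split(r"\W+", text.lower()) if token]


def match_as_bool(field, query, operator="or"):
    """Rewrite a match query as a roughly equivalent bool query of term clauses."""
    clauses = [{"term": {field: token}} for token in analyze(query)]
    # operator "or" -> any clause may match (should);
    # operator "and" -> every clause has to match (must).
    occur = "must" if operator == "and" else "should"
    return {"query": {"bool": {occur: clauses}}}


print(match_as_bool("address", "read a"))
print(match_as_bool("address", "read a", operator="and"))
```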

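Fuzzy matching

The fuzziness parameter switches on character-level fuzzy matching by setting the maximum edit distance allowed for a term. The simplest example: we mean to search for 'read' but accidentally type 'raed'. Fuzzy matching still finds the document: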

GET /exam/_search
{"query": {"match": {"address": {"query": "raed a", "operator": "and", "fuzziness": 1}}}}
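The reason a fuzziness of 1 is enough for 'raed' is that ES, by default, counts a swap of two adjacent characters as a single edit. The following sketch computes such an edit distance; `edit_distance` is a hypothetical helper written for illustration and is not how ES implements fuzzy matching internally.

```python
def edit_distance(a, b, transpositions=True):
    """Edit distance between a and b.

    With transpositions=True, swapping two adjacent characters
    counts as one edit instead of two substitutions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete all characters of a[:i]
    for j in range(cols):
        dist[0][j] = j  # insert all characters of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                # adjacent characters swapped
                dist[i][j] = min(dist[i][j], dist[i - 2][j - 2] + 1)
    return dist[-1][-1]


print(edit_distance("raed", "read"))                        # 1
print(edit_distance("raed", "read", transpositions=False))  # 2
```

Without transpositions the same typo would cost two edits and would need a fuzziness of 2.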

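4. multi_match search

The match queries so far all targeted a single field. multi_match runs a match against several fields at once; a document is returned when the search keyword matches in at least one of the listed fields. For example: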

GET /exam/_search
{"query": {"multi_match": {"query": "python", "fields": ["name", "address"]}}}

Here fields is an array listing the fields to search.
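A toy model shows which documents such a query selects. Everything here is an assumption made for illustration: the three documents and their name values are invented, and `analyze` and `multi_match_search` are hypothetical helpers that ignore scoring entirely.

```python
import re

# Hypothetical sample documents; the values are made up for illustration.
DOCS = {
    1: {"name": "python tutorial", "address": "read a book"},
    2: {"name": "java guide", "address": "learn python online"},
    3: {"name": "go notes", "address": "you can get a good job"},
}


def analyze(text):
    """Toy analyzer: lowercase, then split on non-word characters."""
    return [token for token in re.split(r"\W+", text.lower()) if token]


def multi_match_search(query, fields):
    """Return ids of documents where any listed field contains a query term."""
    terms = set(analyze(query))
    hits = []
    for doc_id, doc in DOCS.items():
        if any(terms & set(analyze(doc.get(field, ""))) for field in fields):
            hits.append(doc_id)
    return hits


print(multi_match_search("python", ["name", "address"]))  # [1, 2]
print(multi_match_search("python", ["name"]))             # [1]
```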
