Elasticsearch (Part 3)

Author: PrinceLei
Published: July 16, 2020
Categories: Tech Sharing, elasticsearch

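#  Elasticsearch (Part 3)
##  Elasticsearch Query Syntax
###  Search Timeout
Search requests have no timeout by default. If a timeout is set, the timeout mechanism applies.
Timeout mechanism:
Suppose a query matches 10,000 documents and needs 10 seconds to finish, but the user sets a 1-second timeout. No matter how much has been collected so far, the search stops after 1 second and returns the data gathered up to that point. The response then carries `"timed_out": true`.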
```
GET product/_search?timeout=1ms
```
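To make the "return whatever has been collected" behaviour concrete, here is a minimal Python sketch. It is a toy model, not how Elasticsearch implements timeouts: `search_with_timeout` and `FakeClock` are hypothetical names, and the clock is faked so the example is deterministic.

```python
from typing import Callable, Iterable


def search_with_timeout(hits: Iterable[str], timeout_s: float,
                        clock: Callable[[], float]) -> dict:
    """Collect hits until the time budget is spent, then return what we have."""
    deadline = clock() + timeout_s
    collected = []
    timed_out = False
    for hit in hits:
        if clock() >= deadline:
            timed_out = True  # budget exhausted: stop and return partial results
            break
        collected.append(hit)
    return {"timed_out": timed_out, "hits": collected}


class FakeClock:
    """Hypothetical clock that advances 0.4 s every time it is read."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        self.now += 0.4
        return self.now


# Four documents are available, but only two fit into the 1-second budget.
result = search_with_timeout(["doc1", "doc2", "doc3", "doc4"], 1.0, FakeClock())
print(result)
```

The caller has to check the `timed_out` flag, because a partial result otherwise looks exactly like a complete one.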
###  Common ES Queries
####  Query_String
Match all documents:
```
GET /product/_search
```
With a query parameter:
```
GET product/_search?q=name:xiaomi
```
Pagination and sorting:
```
GET product/_search?from=0&size=2&sort=price:asc
```
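The `from` parameter is a 0-based offset, not a page number. As a small sketch of the conversion, the hypothetical helper `uri_search` below builds the request path for a 1-based page using only the Python standard library; it does not talk to a cluster.

```python
from urllib.parse import urlencode


def uri_search(index: str, page: int, size: int, sort: str, q: str = "") -> str:
    """Build a URI search path; pages are 1-based, `from` is a 0-based offset."""
    params = {"from": (page - 1) * size, "size": size, "sort": sort}
    if q:
        params["q"] = q
    # safe=":" keeps "price:asc" readable instead of encoding the colon as %3A
    return f"{index}/_search?{urlencode(params, safe=':')}"


first_page = uri_search("product", page=1, size=2, sort="price:asc")
third_page = uri_search("product", page=3, size=2, sort="price:asc", q="name:xiaomi")
print(first_page)  # product/_search?from=0&size=2&sort=price:asc
print(third_page)  # product/_search?from=4&size=2&sort=price:asc&q=name:xiaomi
```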
####  Query DSL:
match_all: matches all documents
```
GET /product/_search
{
  "query":{
    "match_all": {}
  }
}
```
match: full-text query against a single field
```
GET product/_search
{
  "query": {
    "match": {
      "name": "nfc"
    }
  }
}
```
sort: sorts the results
```
GET product/_search
{
  "query": {
    "match": {
      "name": "nfc"
    }
  },
  "sort": [
    {
      "price": {
        "order": "desc"
      }
    }
  ]
}
```
multi_match: runs the same query against several fields
```
GET product/_search
{
  "query": {
    "multi_match": {
      "query": "nfc",
      "fields": ["name","desc"]
    }
  }
}
```
_source: the source document; selects which fields are returned
```
GET product/_search
{
  "query": {
    "match_all": {}
  },
  "_source": ["name","price"]
}
```
Pagination:
```
GET product/_search
{
  "query": {
    "match_all": {}
  }, 
  "from": 0,
  "size": 2
}
```
####  Full-text queries
term query: looks up an exact term in a field; the search text is not analyzed (tokenized)
```
GET product/_search
{
  "query": {
    "term": {
      "name": {
        "value": "nfc"
      }
    }
  }
}
```
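Difference between match and term:
match analyzes the search text first (with the standard analyzer it is split into words and lowercased) and then looks up each resulting term in the inverted index. term does not analyze the search text; it looks it up in the inverted index as is.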
```
GET product/_search
{
  "query": {
    "match": {
      "name": "nfc phone"
    }
  }
}
```
For a text field using the standard analyzer, this is equivalent to:
```
GET product/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "name": {
              "value": "nfc"
            }
          }
        },
        {
          "term": {
            "name": {
              "value": "phone"
            }
          }
        }
      ]
    }
  }
}
```
which matches the same documents as the following terms query (the relevance scores can differ):
```
GET product/_search
{
  "query": {
    "terms": {
      "name": [
        "nfc",
        "phone"
      ]
    }
  }
}
```
To check how a piece of text is tokenized:
```
# Uses the standard analyzer, which is the default
GET _analyze
{
  "analyzer": "standard",
  "text": ["xiaomi nfc zhineng phone"]
}
```
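The match/term difference is easier to see on a tiny in-memory inverted index. The sketch below is a rough stand-in only: `analyze` merely lowercases and splits on non-word characters (the real standard analyzer does more), and the three documents are made up for illustration.

```python
import re
from collections import defaultdict


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-word chars."""
    return [token for token in re.split(r"\W+", text.lower()) if token]


# Hypothetical documents: id -> name field
docs = {1: "xiaomi nfc phone", 2: "xiaomi erji", 3: "NFC zhineng phone"}

# Build the inverted index: token -> set of document ids
index = defaultdict(set)
for doc_id, name in docs.items():
    for token in analyze(name):
        index[token].add(doc_id)


def term_query(value):
    """Look the value up as is; the search text is not analyzed."""
    return sorted(index.get(value, set()))


def match_query(text):
    """Analyze the search text, then OR together the hits of each token."""
    hits = set()
    for token in analyze(text):
        hits |= index.get(token, set())
    return sorted(hits)


print(term_query("nfc"))          # [1, 3]
print(term_query("NFC"))          # []  (the index only holds lowercased tokens)
print(term_query("nfc phone"))    # []  (no single token equals "nfc phone")
print(match_query("NFC phone"))   # [1, 3]
print(match_query("erji phone"))  # [1, 2, 3]
```

This is why a term query on an analyzed text field often returns nothing when the search value contains uppercase letters or spaces.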
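####  Phrase search
Phrase search: unlike a regular full-text match, where any one of the terms is enough, the search text is treated as a single phrase. Its terms must all appear in the field, adjacent to each other and in the same order.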
```
GET product/_search
{
  "query": {
    "match_phrase": {
      "name": "nfc phone"
    }
  }
}
```
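As a sketch of what "adjacent and in the same order" means, the hypothetical function below checks a phrase against a field value by comparing token windows. It assumes simple whitespace tokenization and no slop, and it is not the way Elasticsearch evaluates phrases internally (which relies on term positions stored in the index).

```python
def match_phrase(field_text, phrase):
    """True if the phrase tokens occur consecutively, in order, in the field."""
    tokens = field_text.lower().split()
    wanted = phrase.lower().split()
    n = len(wanted)
    if n == 0:
        return False
    # Slide a window of n tokens over the field and compare it to the phrase.
    return any(tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1))


print(match_phrase("xiaomi nfc phone", "nfc phone"))   # True
print(match_phrase("nfc zhineng phone", "nfc phone"))  # False: a word sits in between
print(match_phrase("phone nfc", "nfc phone"))          # False: wrong order
```

A plain match query for "nfc phone" would accept all three field values, since each contains at least one of the terms.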
####  Query and Filter
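bool: combines multiple query clauses. A bool query follows the more_matches_is_better approach, so the scores of the matching must and should clauses are added together to produce each document's score.
- must: Must match. The clause (query) has to appear in matching documents and contributes to the score.
- filter: A filter, with no relevance scoring. The clause (query) has to appear in matching documents, but unlike must, its score is ignored and the clause is considered for caching.
- should: May match. The clause (query) should appear in matching documents.
- must_not: Must not match, with no relevance scoring. The clause (query) must not appear in matching documents. It runs in filter context, which means scoring is ignored and the clause is considered for caching.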

```
GET product/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": {
              "query": "xiaomi",
              "minimum_should_match": "80%"
            }
          }
        }
      ],
      "must_not": [
        {
          "match": {
            "name": "erji"
          }
        }
      ],
      "should": [
        {
          "match": {
            "desc": "nfc"
          }
        }
      ],
      "filter": [
        {
          "range": {
            "price": {
              "gt": 1999
            }
          }
        }
      ]
    }
  }
}
```
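Because a filter skips scoring and can use the cache, it is fast. It is also applied ahead of the query, discarding large numbers of non-matching documents first and speeding up the search.
Nested query:
minimum_should_match: specifies the number or percentage of should clauses a returned document must match. If the bool query contains at least one should clause and no must or filter clause, the default is 1. Otherwise the default is 0.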
```
GET product/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "nfc"
          }
        }
      ],
      "should": [
        {
          "range": {
            "price": {
              "gt": 1999
            }
          }
        },
        {
          "range": {
            "price": {
              "gt": 3999
            }
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}
```
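The interplay of must, should and minimum_should_match can be sketched with plain Python predicates. This is a toy model under stated assumptions: `bool_query`, `name_has` and `price_gt` are hypothetical helpers, and the "score" is just the count of matching scoring clauses, which is far simpler than real relevance scoring.

```python
def name_has(term):
    """Clause factory: does the name field contain this exact token?"""
    return lambda doc: term in doc["name"].split()


def price_gt(limit):
    """Clause factory: is the price above the limit?"""
    return lambda doc: doc["price"] > limit


def bool_query(doc, must=(), filter_=(), should=(), must_not=(),
               minimum_should_match=None):
    """Return a toy score for a matching doc, or None if it does not match."""
    if minimum_should_match is None:
        # Default: 1 when there are only should clauses, otherwise 0.
        minimum_should_match = 1 if should and not (must or filter_) else 0
    if not all(clause(doc) for clause in must):
        return None
    if not all(clause(doc) for clause in filter_):
        return None
    if any(clause(doc) for clause in must_not):
        return None
    should_hits = sum(1 for clause in should if clause(doc))
    if should_hits < minimum_should_match:
        return None
    # filter and must_not never add to the score; must and should do.
    return len(must) + should_hits


clauses = dict(must=[name_has("nfc")],
               should=[price_gt(1999), price_gt(3999)],
               minimum_should_match=1)
cheap = {"name": "nfc phone", "price": 999}
mid = {"name": "xiaomi nfc phone", "price": 2999}
pricey = {"name": "xiaomi nfc phone", "price": 4999}
for doc in (cheap, mid, pricey):
    print(doc["price"], bool_query(doc, **clauses))
```

The cheapest document satisfies must but no should clause, so minimum_should_match of 1 excludes it; the most expensive one matches both should clauses and gets the highest toy score.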
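####  Compound queries
Compound query: because filters are fast, when no relevance score is needed you can run the whole query as a filter by wrapping it in constant_score. The syntax is: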
```
GET product/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must":[
            {
              "match": {
                "name": "xiaomi nfc"
              }
            }  
          ],
          "must_not":[
            {
              "term":{
                "name":"erji"
              }
            }  
          ]
        }
      }
    }
  }
}
```
####  Highlight search
Highlights the matching terms in the results
```
GET product/_search
{
  "query": {
    "match_phrase": {
      "name": "nfc phone"
    }
  },
  "highlight": {
    "fields": {
      "name": {}
    }
  }
}
```
The response then contains:
```
"highlight" : {
          "name" : [
            "<em>nfc</em> <em>phone</em>"
          ]
        }
```
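To show where the `<em>` tags come from, here is a small sketch that wraps each matched term in tags. It assumes whole-word, case-insensitive matching via a regular expression; the hypothetical `highlight` function is only an approximation of the idea, not Elasticsearch's highlighter.

```python
import re


def highlight(text, terms, pre="<em>", post="</em>"):
    """Wrap every whole-word occurrence of the given terms in pre/post tags."""
    if not terms:
        return text
    alternatives = "|".join(re.escape(term) for term in terms)
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: pre + m.group(0) + post, text)


print(highlight("xiaomi nfc phone", ["nfc", "phone"]))
# xiaomi <em>nfc</em> <em>phone</em>
```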
###  Deep paging
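Deep paging: suppose we want records 5000 to 5050, sorted by price from low to high. Our ES index has 5 shards with 10,000 documents each. Since no single shard holds the globally sorted order, ES takes the first 5,050 sorted records from each shard, puts them all together, compares them again, and takes the top 5,050. That means 5 × 5,050 = 25,250 records have to be compared.
1. When the paging depth exceeds 10,000, that is when from + size is greater than 10,000, do not use from/size paging; by default the request fails with an error.
2. Do not return more than 1,000 results per request; 500 or fewer is better, so keep size ≤ 500.
3. Workarounds:
    1. Avoid deep paging wherever possible. Users rarely look at results on later pages; as with Baidu, most people only care about the first page. If you really must page deeper, raise the limit: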
        ```
        PUT product/_settings
        {
            "index":{
                "max_result_window":1000000
            }
        }
        ```
    2. Use Scroll Search (forward only: you can fetch the next page but not go back, and it is not suited to real-time queries)
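
The 25,250 figure from the deep paging example can be reproduced with a short simulation. This is a sketch under assumptions: each shard is a plain list of random prices, and the hypothetical `deep_page` function plays the role of the node that merges the per-shard results.

```python
import heapq
import random


def deep_page(shards, from_, size):
    """Merge the top (from + size) of every shard, then cut out the page."""
    window = from_ + size
    candidates = []
    for shard in shards:
        # Every shard must return its own top `window`, not just `size` records.
        candidates.extend(heapq.nsmallest(window, shard))
    merged = sorted(candidates)
    return merged[from_:window], len(candidates)


random.seed(0)
# 5 shards with 10,000 hypothetical prices each
shards = [[random.randint(1, 100000) for _ in range(10000)] for _ in range(5)]

page, compared = deep_page(shards, from_=5000, size=50)
print(len(page), compared)  # 50 25250
```

Only 50 records are returned, but 25,250 are fetched and sorted to produce them, and the cost grows with both the offset and the number of shards.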

###  Scroll Search
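Scroll Search addresses the deep paging problem. The first request returns a scroll_id cursor that records the position of the search; the next request continues directly from the cursor position.
First request: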
```
# scroll=1m keeps the cursor for 1 minute; once it expires, the scroll cannot continue
# "size" is the number of hits returned by each request
GET product/_search?scroll=1m
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "price": {
        "order": "asc"
      }
    }
  ],
  "size": 2
}
```
Subsequent requests:
```
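# No index in the path: the scroll_id already contains the index information, and forcing one causes an error
# "scroll": "1m" renews the cursor for another minute on every request so it does not expire
GET _search/scroll
{
  "scroll":"1m",
  "scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAGkYWSFFWVFNPWHVRRVdfaHIxenZjbWE4QQ=="
}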
```
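The cursor idea can be sketched in a few lines of Python. `ToyScroll` is a hypothetical class that keeps a sorted snapshot and a position; it ignores the expiry time and everything else a real scroll context does.

```python
class ToyScroll:
    """Forward-only cursor over a sorted snapshot of the data."""

    def __init__(self, prices, size):
        self._snapshot = sorted(prices)  # frozen copy taken at the first request
        self._pos = 0                    # the cursor: where the next batch starts
        self._size = size

    def next_batch(self):
        batch = self._snapshot[self._pos:self._pos + self._size]
        self._pos += len(batch)          # only ever moves forward
        return batch


def scroll_all(cursor):
    """Keep requesting batches until an empty one signals the end."""
    while True:
        batch = cursor.next_batch()
        if not batch:
            break
        yield from batch


prices = [3999, 999, 1999, 2999, 4999]
cursor = ToyScroll(prices, size=2)
prices.append(1)  # a later write is not visible to the already-open scroll
print(list(scroll_all(cursor)))  # [999, 1999, 2999, 3999, 4999]
```

The appended price never shows up, which illustrates why scrolling is a poor fit for real-time queries, and there is no way to step back to an earlier batch.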
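###  Filter cache
1. A filter is not cached on every execution. It is cached only after it has been executed a certain number of times, and that number is not fixed. The cache stores a bitset: 1 means the document matches, 0 means it does not.
2. Filters on sparse data are applied first, since they rule out the most documents, and the bitset of matches is kept in the cache.
3. The filter cache stores the match result itself, so there is no need to look up and compare against the inverted index again, which greatly speeds up the query.
4. A filter generally runs before the query and removes part of the data, which speeds up the query.
5. A filter does not compute relevance scores, so it executes more efficiently than a query.
6. When the source data changes, the cache is updated as well.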
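
The caching rules above can be sketched as a small bitset cache. Everything here is an assumption made for illustration: `ToyFilterCache` is a hypothetical class, the fixed threshold `cache_after=2` stands in for the "not fixed" number of executions, and a data change simply discards the cache rather than updating it in place.

```python
class ToyFilterCache:
    """Caches a filter's bitset once the filter has been used often enough."""

    def __init__(self, docs, cache_after=2):
        self.docs = docs
        self.cache_after = cache_after  # hypothetical fixed threshold
        self.uses = {}                  # filter key -> number of executions
        self.bitsets = {}               # filter key -> cached bitset

    def run(self, key, predicate):
        if key in self.bitsets:
            return self.bitsets[key]    # cache hit: no documents are re-checked
        bits = [1 if predicate(doc) else 0 for doc in self.docs]
        self.uses[key] = self.uses.get(key, 0) + 1
        if self.uses[key] >= self.cache_after:
            self.bitsets[key] = bits    # used often enough: keep the bitset
        return bits

    def update_docs(self, docs):
        """Source data changed: drop stale bitsets so they get rebuilt."""
        self.docs = docs
        self.uses.clear()
        self.bitsets.clear()


prices = [999, 1999, 2999, 3999]
cache = ToyFilterCache(prices)
first = cache.run("price>1999", lambda p: p > 1999)
cached_after_first = "price>1999" in cache.bitsets
second = cache.run("price>1999", lambda p: p > 1999)
cached_after_second = "price>1999" in cache.bitsets
print(first, cached_after_first, cached_after_second)
# [0, 0, 1, 1] False True
```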

Last modification：July 16th, 2020 at 07:38 pm