Getting Started with Elasticsearch, Part 2: Indexing and Retrieving User Information

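Let's implement a simple example with Elasticsearch:

Create a user directory that must:

  • Hold data containing multi-value tags, numbers, and full text
  • Retrieve the complete details of any user
  • Allow structured search, such as finding users over the age of 30
  • Allow simple full-text search as well as more complex phrase search
  • Highlight the matching snippets in the content of matched documents
  • Support building and managing analytics dashboards on top of the data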
  1. Store and index the user records:

    PUT /aviraer/user/1
    {
      "name": "jack",
      "age": 18,
      "interests": [ "sports", "music" ],
      "about": "I am very happy"
    }

    PUT /aviraer/user/2
    {
      "name": "tom",
      "age": 28,
      "interests": [ "tv", "music" ],
      "about": "I am very angry"
    }

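    In the example above, Elasticsearch does the following:

    • Indexes one document per user, containing all of that user's information.

    • Gives every document the type user. (Mapping types were deprecated in Elasticsearch 7 and removed in 8; on those versions the equivalent path is /aviraer/_doc/1.)

    • Places that type inside the index aviraer.

    • Stores that index in our Elasticsearch cluster.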
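The request path encodes the index, the type, and the document ID, in that order. The following is a minimal Python sketch of how the path and JSON body fit together, using only the standard library. The helper name `build_index_request` is hypothetical, and nothing is sent over the network; it only assembles the request.

```python
import json


def build_index_request(index, doc_type, doc_id, document):
    """Assemble (method, path, body) for indexing one document.

    Hypothetical helper for illustration only; it does not contact a cluster.
    """
    path = f"/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document)
    return "PUT", path, body


jack = {
    "name": "jack",
    "age": 18,
    "interests": ["sports", "music"],  # a multi-value field is just a JSON array
    "about": "I am very happy",
}

method, path, body = build_index_request("aviraer", "user", 1, jack)
print(method, path)
print(body)
```

The same helper with `doc_id=2` and tom's data would produce the second request shown above.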

  2. Retrieve the user that was just added:

    GET /aviraer/user/1
  3. Lightweight search

    The request above only fetches one user by ID. Next, search using the users' own data.

  • Search for all users

    GET /aviraer/user/_search
  • Search for users named tom

    GET /aviraer/user/_search?q=name:tom
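When the lightweight search is issued from a program rather than typed by hand, the `q` parameter should be URL-encoded. A small sketch with Python's standard library follows; the helper name `build_uri_search` is an assumption made for illustration.

```python
from urllib.parse import urlencode


def build_uri_search(index, doc_type, field, value):
    """Build the path for a query-string search (hypothetical helper)."""
    # urlencode percent-encodes the colon and turns spaces into '+'.
    query = urlencode({"q": f"{field}:{value}"})
    return f"/{index}/{doc_type}/_search?{query}"


print(build_uri_search("aviraer", "user", "name", "tom"))
```

The encoded form `q=name%3Atom` is equivalent to `q=name:tom` once the server decodes it.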
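  4. Query DSL search

    Lightweight search is convenient but not very flexible; the query DSL, sent as a JSON request body, supports much richer searches.

    • Search for users older than 15 and younger than 20 (gt and lt exclude both bounds)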
    GET /aviraer/user/_search
    {
      "query" : {
        "bool": {
          "filter": {
            "range" : {
              "age" : { "gt" : 15, "lt" : 20 }
            }
          }
        }
      }
    }
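To make the bound semantics concrete, here is a rough Python sketch that builds the same request body and re-implements the range check locally against the two sample users. The function names `range_filter_query` and `in_range` are hypothetical, and the local check only imitates the operators `gt`, `gte`, `lt`, and `lte`; it is not how Elasticsearch evaluates the filter internally.

```python
import operator


def range_filter_query(field, **bounds):
    """Build a bool/filter/range body; bounds may use gt, gte, lt, lte."""
    return {"query": {"bool": {"filter": {"range": {field: bounds}}}}}


# Local imitation of the four range operators, for illustration only.
_OPS = {"gt": operator.gt, "gte": operator.ge, "lt": operator.lt, "lte": operator.le}


def in_range(value, bounds):
    """Return True when value satisfies every bound."""
    return all(_OPS[name](value, bound) for name, bound in bounds.items())


users = [{"name": "jack", "age": 18}, {"name": "tom", "age": 28}]
body = range_filter_query("age", gt=15, lt=20)
bounds = body["query"]["bool"]["filter"]["range"]["age"]

matched = [u["name"] for u in users if in_range(u["age"], bounds)]
print(matched)
```

With `gt`/`lt`, a user aged exactly 15 or 20 would be excluded; switching to `gte`/`lte` would include them.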
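  5. Full-text search

    Traditional relational databases struggle with full-text search over large volumes of data, which is exactly where Elasticsearch is strong.

    • Search for users whose about field matches "I happy"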
    GET /aviraer/user/_search
    {
      "query" : {
        "match": {
          "about": "I happy"
        }
      }
    }

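    The results are shown below. Unlike a traditional database, Elasticsearch does not require an exact match: documents that match only part of the query are also returned, ranked by relevance.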

    {
      "took": 1,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": 2,
        "max_score": 0.5753642,
        "hits": [
          {
            "_index": "aviraer",
            "_type": "user",
            "_id": "1",
            "_score": 0.5753642, // relevance score
            "_source": {
              "name": "jack",
              "age": 18,
              "interests": [
                "sports",
                "music"
              ],
              "about": "I am very happy"
            }
          },
          {
            "_index": "aviraer",
            "_type": "user",
            "_id": "2",
            "_score": 0.2876821,
            "_source": {
              "name": "tom",
              "age": 28,
              "interests": [
                "tv",
                "music"
              ],
              "about": "I am very angry"
            }
          }
        ]
      }
    }
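Why both documents come back, and why jack ranks first, can be illustrated with a toy model. The sketch below assumes a simplified analyzer that lowercases and splits on whitespace, and it ranks by the number of shared terms. Real Elasticsearch scoring is based on BM25 by default, so the numbers differ; only the ordering idea carries over. The names `analyze` and `match_terms` are hypothetical.

```python
def analyze(text):
    """Toy analyzer: lowercase and split on whitespace (a simplification)."""
    return text.lower().split()


def match_terms(query, field_value):
    """Return the query terms found in the field (OR semantics)."""
    return sorted(set(analyze(query)) & set(analyze(field_value)))


docs = {
    "jack": "I am very happy",
    "tom": "I am very angry",
}

# Keep every document sharing at least one term, most shared terms first.
hits = {name: match_terms("I happy", about) for name, about in docs.items()}
ranked = sorted((n for n in hits if hits[n]), key=lambda n: len(hits[n]), reverse=True)

print(hits)
print(ranked)
```

Tom's document still matches because it contains the term "i", which is why it appears in the response with a lower score.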
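  6. Phrase search

    The example above searched for "I happy". Elasticsearch splits the query into the terms I and happy and searches for each separately, so a document matches even when the two words are not adjacent. To require "I happy" to match as a single unit, use a phrase search: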

    GET /aviraer/user/_search
    {
      "query" : {
        "match_phrase": {
          "about": "I happy"
        }
      }
    }

    This query returns no results, because neither document contains "I happy" as consecutive words.
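The difference from the plain match query is that term positions now matter. A minimal sketch of that idea, again assuming a toy analyzer that lowercases and splits on whitespace; `phrase_match` is a hypothetical helper, not the algorithm Elasticsearch actually runs.

```python
def analyze(text):
    """Toy analyzer: lowercase and split on whitespace (a simplification)."""
    return text.lower().split()


def phrase_match(query, field_value):
    """True when the query terms occur consecutively, in order, in the field."""
    q = analyze(query)
    tokens = analyze(field_value)
    n = len(q)
    return any(tokens[i:i + n] == q for i in range(len(tokens) - n + 1))


about = "I am very happy"
print(phrase_match("I happy", about))
print(phrase_match("very happy", about))
```

Under this model "I happy" fails because "am very" sits between the two terms, while "very happy" succeeds.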

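  7. Highlighted search

    Search engines usually highlight the matched text on the results page. Elasticsearch can return such highlighted snippets as well: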

    GET /aviraer/user/_search
    {
      "query" : {
        "match": {
          "about": "I happy"
        }
      },
      "highlight": {
        "fields": {
          "about": {}
        }
      }
    }

    In the results, the highlight section wraps each matched term in <em> tags:

    {
      "took": 93,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": 2,
        "max_score": 0.5753642,
        "hits": [
          {
            "_index": "aviraer",
            "_type": "user",
            "_id": "1",
            "_score": 0.5753642,
            "_source": {
              "name": "jack",
              "age": 18,
              "interests": [
                "sports",
                "music"
              ],
              "about": "I am very happy"
            },
            "highlight": {
              "about": [
                "<em>I</em> am very <em>happy</em>"
              ]
            }
          },
          {
            "_index": "aviraer",
            "_type": "user",
            "_id": "2",
            "_score": 0.2876821,
            "_source": {
              "name": "tom",
              "age": 28,
              "interests": [
                "tv",
                "music"
              ],
              "about": "I am very angry"
            },
            "highlight": {
              "about": [
                "<em>I</em> am very angry"
              ]
            }
          }
        ]
      }
    }
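The highlighted fragments in this response can be reproduced with a few lines of Python, which may help when reading the output. This is only a sketch: it matches whole whitespace-separated words case-insensitively, and the function name `highlight` and its `pre`/`post` parameters are hypothetical, not part of any Elasticsearch client.

```python
def highlight(query, field_value, pre="<em>", post="</em>"):
    """Wrap every word of field_value that matches a query term."""
    terms = set(query.lower().split())
    words = field_value.split()
    return " ".join(f"{pre}{w}{post}" if w.lower() in terms else w for w in words)


print(highlight("I happy", "I am very happy"))
print(highlight("I happy", "I am very angry"))
```

The original casing of each word is preserved in the output, as it is in the response above.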
  8. Analytics

    Compute the average age of the users:

    GET /aviraer/user/_search
    {
      "aggs" : {
        "avg_age" : {
          "avg" : { "field" : "age" }
        }
      }
    }
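As a sanity check on what this aggregation should report for the two sample users, the average can be computed by hand. The sketch below also builds the request body in Python; the `"size": 0` entry is an optional addition not present in the original request, included on the assumption that only the aggregation result is wanted and the matching documents themselves can be omitted.

```python
import json

# Ages of the two users indexed earlier in this walkthrough.
ages = [18, 28]
expected_avg = sum(ages) / len(ages)

# Request body mirroring the aggregation above; "size": 0 is an optional extra.
body = {
    "size": 0,
    "aggs": {
        "avg_age": {"avg": {"field": "age"}},
    },
}

print(expected_avg)
print(json.dumps(body, indent=2))
```

In the search response, the computed value appears under `aggregations.avg_age.value`.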