
Setting up and playing with Elasticsearch for Development

What is Elasticsearch (ES)?

The best description of Elasticsearch comes from the creators themselves.

Elasticsearch is a distributed, RESTful search and analytics engine capable of addressing a growing number of use cases. As the heart of the Elastic Stack, it centrally stores your data so you can discover the expected and uncover the unexpected.

Elasticsearch is very useful for running queries and for retrieving and aggregating results from the documents stored in it, including geo-point and proximity searches. It is one of the best solutions on the market for (comparatively) quickly building a niche search engine.

Elasticsearch lets you perform and combine many types of searches — structured, unstructured, geo, metric — any way you want. Start simple with one question and see where it takes you. It’s one thing to find the 10 best documents to match your query. But how do you make sense of, say, a billion log lines? Elasticsearch aggregations let you zoom out to explore trends and patterns in your data.

Prerequisites

  • Docker should be installed on your system. If it is not already installed, you can find the instructions in the Docker documentation. For Mac workstations, it is suggested to install Docker for Mac.
  • Access to a command-line terminal (since the examples below show cURL requests). On a Mac, you can use Terminal or iTerm.

Step-by-step guide

Setting up the Elasticsearch Instance

1. To quickly bring up an Elasticsearch instance as a Docker container, execute the one-line command below. Note that images from the Elastic registry are pulled by an explicit version tag (the registry does not publish a latest tag); this article uses 7.3.1 throughout.

docker run --name estestserver -p 9200:9200 -p 9300:9300 -e discovery.type=single-node -d docker.elastic.co/elasticsearch/elasticsearch:7.3.1

2. Now if you run the docker ps command, you should see the ES container running. It takes approx. 10 seconds for the ES container to start up (a simple wait loop is sketched after the output below). The ES server is mapped on the host to its default port 9200.

docker ps
CONTAINER ID        IMAGE                                                 COMMAND                  CREATED             STATUS              PORTS                                            NAMES
1949008f40ae        docker.elastic.co/elasticsearch/elasticsearch:7.3.1   "/usr/local/bin/dock..."   3 seconds ago       Up 2 seconds        0.0.0.0:9200->9200/tcp, 0.0.0.0:9300->9300/tcp   estestserver
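
Since ES takes a few seconds to become ready, a small wait loop can save you from firing requests too early; the snippet below is just a convenience sketch using plain shell and cURL, not part of the official setup.

# Poll port 9200 every 2 seconds until ES responds.
until curl -s http://localhost:9200 > /dev/null; do
  echo "Waiting for Elasticsearch..."
  sleep 2
done
echo "Elasticsearch is up."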

3. You can start off by firing a simple cURL request at the new ES instance. It will return metadata about the instance.

curl -X GET http://localhost:9200 -vvv
* Rebuilt URL to: http://localhost:9200/
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 9200 (#0)
> GET / HTTP/1.1
> Host: localhost:9200
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 200 OK
< content-type: application/json; charset=UTF-8
< content-length: 508
<
{
  "name" : "1949008f40ae",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "3gtS8aCVRK2V3nsDfV0r4Q",
  "version" : {
    "number" : "7.3.1",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "4749ba6",
    "build_date" : "2019-08-19T20:19:25.651794Z",
    "build_snapshot" : false,
    "lucene_version" : "8.1.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
} 

From the above response, you can see that we are running version 7.3.1 of ES as expected. You can also see the Lucene version that is used internally and the name of the cluster. The cluster name is important for clustered deployments of ES; however, that is beyond the scope of this article and will be covered in a different one.

4. Now you can start ingesting data into ES and execute different queries as per the ES API documentation. Some examples with sample data are shown later in this article.

Setting up the Elasticsearch and Kibana Combined Instance

Kibana is a visualisation platform that uses ES as its backend. Though Kibana itself is not used for development as such, it provides an easy-to-use interface, with auto-complete, for interacting with ES. You can either bring up a separate Kibana instance and connect it to the ES instance shown previously, or use the Docker Compose file below, which is convenient for quick make-and-break setups.

version: '3'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.3.1
    environment:
      - discovery.type=single-node
      - cluster.name=docker-cluster
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    ports:
      - "9200:9200"
  kibana:
    image: docker.elastic.co/kibana/kibana:7.3.1
    # The Kibana image connects to http://elasticsearch:9200 by default,
    # which resolves to the elasticsearch service above.
    depends_on:
      - elasticsearch
    ports:
      - "5601:5601"

1. To bring up a combined instance of ES and Kibana, save the above as docker-compose.yml and execute the command below from the same directory. It will bring up ES listening on the default port 9200, start Kibana on port 5601, and bind both ports to the host.

docker-compose up -d
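
If you want to confirm that both services came up, docker-compose ps (run from the same directory) should list the elasticsearch and kibana containers as Up:

docker-compose ps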

2. Now you can execute the cURL request shown earlier to get the ES metadata. In addition, you can open http://localhost:5601 to view the Kibana UI.

3. The primary Dev and Test work using Kibana happens in the Console, which can be used to prototype API calls against Elasticsearch. The URL http://localhost:5601/app/kibana#/dev_tools/console?_g=() should take you to the console. It is the spanner (wrench) symbol in the left pane, just above the settings cog wheel.
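
For example, typing the request below into the console and running it returns the cluster health; the console uses a shorthand METHOD /path format (with an optional JSON body) instead of full cURL syntax.

GET _cluster/health

You can also list the indices on the instance with GET _cat/indices?v.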

Playing with Data in ES

Elasticsearch accepts data in JSON format. Each entry is called a document; it is inserted into an index and should have an ID.

A sample document can look as follows.

{
  "type": "base_rate",
  "key": "flatiron",
  "name": "The Flatiron Hotel",
  "city": "New York",
  "country": "USA",
  "latlong": "40.744072,-73.989258"
}
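
As an illustration, a single document like the one above can be indexed directly with the Document API; a minimal sketch (the index name index1 and the ID 0 are chosen here to match the bulk examples below):

curl -X PUT "http://localhost:9200/index1/_doc/0?pretty" -H 'Content-Type: application/json' -d '
{
  "type": "base_rate",
  "key": "flatiron",
  "name": "The Flatiron Hotel",
  "city": "New York",
  "country": "USA",
  "latlong": "40.744072,-73.989258"
}'

For more than a handful of documents, though, the Bulk API described next is far more efficient, since it ingests a whole corpus in one request.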

To bulk-load a sample corpus (collection of documents) into ES, the Bulk API can be used. Create a file (named corpus.ingest.json here, to match the command below) as follows. Note that the bulk format is newline-delimited JSON: an action line precedes each document, and the file must end with a newline.

{"index":{"_index":"index1","_id":0}}
{"type": "base_rate","key":"flatiron","name": "The Flatiron Hotel","city" : "New York","country": "USA","latlong" : "40.744072,-73.989258"}
{"index":{"_index":"index1","_id":1}}
{"type": "corporate_rate","key":"flatiron","name": "ABC | XYZ | The Flatiron Hotel","city" : "New York","country": "USA","latlong" : "40.744072,-73.989258","company_id": "ibm","agency_id": "gbt"}
{"index":{"_index":"index1","_id":2}}
{"type": "agency_rate","key":"flatiron","name": "XYZ | The Flatiron Hotel","city" : "New York","country": "USA","latlong" : "40.744072,-73.989258","agency_id": "xyz"}

1. The cURL command below can be used with the above file to load the data. Make sure to adjust the path of the corpus file for your system.

curl -X POST -H 'Content-Type: application/json' 'http://localhost:9200/index1/_bulk?pretty' --data-binary @corpus.ingest.json
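
To verify the ingest, the _count API can be used; with the three-document corpus above it should report "count" : 3 shortly after the load (ES refreshes the index about once a second by default).

curl -X GET "http://localhost:9200/index1/_count?pretty"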

2. Once the ingest has completed, an Index Pattern has to be created in Kibana so that you can explore the ingested indices. To create an index pattern, click the settings symbol (cog wheel) in the left pane of the Kibana UI and click on Index Patterns.

3. As shown in the sample corpus data, the index is created with the name index1, and we are free to use whatever name is required. The Kibana index pattern accepts wildcards, so you can enter the pattern as index* and click the Next Step button. In the next screen of the wizard, click the Create Index Pattern button.

4. Now, back in the console, you can execute ES API calls and Kibana will show the results.

5. It is important to update the mapping of the fields to get the right indexing and search behaviour. In this case, the latlong values have to be represented as a geo_point. You can read more about mappings on the Elasticsearch site. A sample mapping looks like the contents of the query below; it can be executed from a file with cURL or pasted directly into the Kibana console. Keep in mind that the mapping must be in place when the index is created (see the recreate sequence after the mapping).

curl -X PUT "http://localhost:9200/index1?pretty" -H 'Content-Type: application/json' --data-binary @mapping.json
PUT /index1
{
    "settings": {
        "number_of_shards": 1
    },
    "mappings": {
        "properties": {
            "type" : {
                "type" : "keyword",
                "fields" : {
                    "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                    }
                }
            },
            "key" : {
                "type" : "keyword",
                "fields" : {
                    "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                    }
                }
            },
            "name" : {
                "type" : "text",
                "fields" : {
                    "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                    }
                }
            },
            "city" : {
                "type" : "text",
                "fields" : {
                    "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                    }
                }
            },
            "country" : {
                "type" : "keyword",
                "fields" : {
                    "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                    }
                }
            },
            "company_id" : {
                "type" : "keyword",
                "fields" : {
                    "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                    }
                }
            },
            "agency_id" : {
                "type" : "keyword",
                "fields" : {
                    "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                    }
                }
            },
            "latlong" : {
                "type" : "geo_point"
            }
        }
    }
}
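
Note that the type of an existing field cannot be changed, so the PUT above will fail with a resource_already_exists_exception if index1 was already created by the bulk ingest. One way to sequence it, sketched below, is to delete the index, create it with the mapping from a file, and re-run the bulk load:

curl -X DELETE "http://localhost:9200/index1?pretty"
curl -X PUT "http://localhost:9200/index1?pretty" -H 'Content-Type: application/json' --data-binary @mapping.json
curl -X POST 'http://localhost:9200/index1/_bulk?pretty' -H 'Content-Type: application/json' --data-binary @corpus.ingest.json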

6. Once the mapping is applied, executing GET index1/ in the console will return the mapping, in which you can see that the datatype of the latlong field is now geo_point.

Some Sample Queries that can be executed

A simple search using geo-points is shown below. All queries to Elasticsearch follow the Query DSL.

GET /index1/_search
{
    "query": {
        "bool" : {
            "must" : {
                "match_all" : {}
            },
            "filter" : {
                "geo_distance" : {
                    "distance" : "2km",
                    "latlong" : {
                        "lat" : 40.74,
                        "lon" : -73.98
                    }
                }
            }
        }
    }
}
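
Similarly, a term query on one of the keyword fields from the mapping can filter by rate type; with the sample corpus, the query below should match only the corporate_rate document.

GET /index1/_search
{
    "query": {
        "term": {
            "type": "corporate_rate"
        }
    }
}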
