Arun Thundyill Saseendran

Arun is a software engineering leader. He loves architecting enterprise software and guiding its development. He is an active researcher in the field of Artificial Intelligence and specializes in Computational Linguistics as a visiting researcher at Trinity College Dublin. Arun is also fond of reading, exploring the vastness of technology and periodically writing about it. He is fond of machine learning, cloud computing and web science. On the personal side, he enjoys spending time with family and loves photography, travel and reading, among other things.

Setting up and playing with Elasticsearch for Development

What is Elasticsearch (ES)?

The best description of Elasticsearch comes from its creators:

Elasticsearch is a distributed, RESTful search and analytics engine capable of addressing a growing number of use cases. As the heart of the Elastic Stack, it centrally stores your data so you can discover the expected and uncover the unexpected.

Elasticsearch is very useful for running queries and retrieving and aggregating results over the documents stored in it, including geo point and proximity searches. It is one of the best solutions on the market for building a niche search engine (comparatively) quickly.

Elasticsearch lets you perform and combine many types of searches — structured, unstructured, geo, metric — any way you want. Start simple with one question and see where it takes you. It’s one thing to find the 10 best documents to match your query. But how do you make sense of, say, a billion log lines? Elasticsearch aggregations let you zoom out to explore trends and patterns in your data.

Pre-requisites

  • Docker should be installed on your system. If it is not already installed, you can find the instructions in the Docker documentation. On Mac workstations, installing Docker for Mac is recommended.
  • Access to a command line terminal (since the examples below use cURL). On a Mac, you can use Terminal or iTerm.

Step-by-step guide

Setting up the Elasticsearch Instance

1. To quickly bring up an Elasticsearch instance as a Docker container, execute the one-line command below. Elastic's registry expects an explicit version tag rather than latest, so the command pins 7.10.2.

docker run --name estestserver -p 9200:9200 -p 9300:9300 -e discovery.type=single-node -d docker.elastic.co/elasticsearch/elasticsearch:7.10.2

2. Now, if you run the docker ps command, you should see the ES container running. It takes approximately 10 seconds for the ES container to start up. The ES server is mapped to its default port, 9200, on the host.

docker ps
CONTAINER ID        IMAGE                                                 COMMAND                  CREATED             STATUS              PORTS                                            NAMES
1949008f40ae        docker.elastic.co/elasticsearch/elasticsearch:7.10.2   "/usr/local/bin/dock..."   3 seconds ago       Up 2 seconds        0.0.0.0:9200->9200/tcp, 0.0.0.0:9300->9300/tcp   estestserver
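Because the container needs a few seconds before it answers, scripts that talk to ES often poll it first. The following is a minimal Python sketch using only the standard library; the function name wait_for_es, the default URL and the timeout values are illustrative assumptions, not part of Elasticsearch or Docker.

```python
import json
import time
import urllib.request


def wait_for_es(url="http://localhost:9200", timeout=30.0, interval=1.0):
    """Poll the ES root endpoint until it answers with HTTP 200.

    Returns the parsed metadata JSON on success, or None once `timeout`
    seconds have passed without a successful response.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return json.load(resp)
        except OSError:
            # URLError, connection refused/reset and socket timeouts are all
            # OSError subclasses: the container is most likely still starting.
            pass
        time.sleep(interval)
    return None
```

Calling wait_for_es() right after docker run returns the same metadata as the cURL request in the next step, or None if ES did not answer within 30 seconds.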

3. You can start off by firing a simple cURL request at the new ES instance. It will return metadata about the ES instance.

curl -X GET http://localhost:9200 -vvv
* Rebuilt URL to: http://localhost:9200/
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 9200 (#0)
> GET / HTTP/1.1
> Host: localhost:9200
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 200 OK
< content-type: application/json; charset=UTF-8
< content-length: 508
<
{
  "name" : "1949008f40ae",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "3gtS8aCVRK2V3nsDfV0r4Q",
  "version" : {
    "number" : "7.3.1",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "4749ba6",
    "build_date" : "2019-08-19T20:19:25.651794Z",
    "build_snapshot" : false,
    "lucene_version" : "8.1.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
} 

From the above response, you can see the version of ES that is running (this sample response was captured from a 7.3.1 instance, so the number you see will match the image tag you started), the Lucene version that is used internally, and the name of the cluster. The cluster name is important for clustered deployments of ES; however, that is beyond the scope of this article and will be covered in a different one.
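If you want to check these fields from a script rather than by eye, a small Python sketch like the following extracts them from the JSON body. The helper name summarize_es_info is hypothetical, and the sample string is a trimmed copy of the response above.

```python
import json


def summarize_es_info(body: str) -> dict:
    """Pick the cluster name, ES version and Lucene version out of GET / output."""
    info = json.loads(body)
    return {
        "cluster_name": info["cluster_name"],
        "es_version": info["version"]["number"],
        "lucene_version": info["version"]["lucene_version"],
    }


# A trimmed copy of the sample response shown above.
SAMPLE = '''{
  "name" : "1949008f40ae",
  "cluster_name" : "docker-cluster",
  "version" : {"number" : "7.3.1", "lucene_version" : "8.1.0"},
  "tagline" : "You Know, for Search"
}'''

print(summarize_es_info(SAMPLE))
```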

4. Now you can start ingesting data into ES and executing different queries as per the ES API documentation. Some examples with sample data are shown later in this article.

Setting up the Elasticsearch and Kibana Combined Instance

Kibana is a visualisation platform that uses ES as its backend. Although Kibana is not a development tool as such, it provides an easy-to-use interface, with auto-complete, for interacting with ES. You can either bring up a separate Kibana instance and connect it to the ES instance started previously, or use the compose file below, which brings up both together and is convenient for quickly creating and tearing down a test setup.

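version: '3'
services:
  elasticsearch:
    # Use an explicit version tag, and keep ES and Kibana on the same version
    image: docker.elastic.co/elasticsearch/elasticsearch:7.10.2
    environment:
      - discovery.type=single-node
      - cluster.name=docker-cluster
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    ports:
      - "9200:9200"
  kibana:
    image: docker.elastic.co/kibana/kibana:7.10.2
    # Kibana's Docker image connects to http://elasticsearch:9200 by default,
    # which resolves to the service above
    depends_on:
      - elasticsearch
    ports:
      - "5601:5601"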

1. To bring up a combined instance of ES and Kibana, save the compose file above as docker-compose.yml and execute the command below from the same directory. It brings up ES listening on the default port 9200, bound to the host, and starts Kibana bound to port 5601.

docker-compose up -d

2. Now you can run the cURL request above to get the ES metadata. In addition, you can open http://localhost:5601 to view the Kibana UI.

3. Most of the development and testing work in Kibana happens in the Console, which can be used to prototype API calls to Elasticsearch. The URL http://localhost:5601/app/kibana#/dev_tools/console?_g=() should take you to the console. It is the spanner icon in the left pane, just above the settings cog wheel.

Playing with Data in ES

Elasticsearch accepts data in JSON format. Each entry is called a document; it is stored in an index and identified by an ID (if you do not supply one, Elasticsearch generates it).

A sample document can look as follows.

{
  "type": "base_rate",
  "key": "flatiron",
  "name": "The Flatiron Hotel",
  "city": "New York",
  "country": "USA",
  "latlong": "40.744072,-73.989258"
}
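The bulk file in the next step is the usual way to load several documents, but a single document can also be stored with the document API (PUT /<index>/_doc/<id>). The Python sketch below only builds such a request with the standard library; the helper build_index_request and the ES_URL value are assumptions for illustration. If you intend to apply the geo_point mapping described later, create the index with that mapping before indexing anything, because ES otherwise maps latlong dynamically as text.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: the local container started earlier


def build_index_request(index: str, doc_id: str, doc: dict) -> urllib.request.Request:
    """Build a PUT /<index>/_doc/<id> request that stores `doc` under the given ID."""
    return urllib.request.Request(
        f"{ES_URL}/{index}/_doc/{doc_id}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


doc = {
    "type": "base_rate",
    "key": "flatiron",
    "name": "The Flatiron Hotel",
    "city": "New York",
    "country": "USA",
    "latlong": "40.744072,-73.989258",
}
req = build_index_request("index1", "0", doc)
print(req.get_method(), req.full_url)
# With ES running, urllib.request.urlopen(req) sends it and returns the result as JSON.
```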

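To bulk load a sample corpus (collection of data) into ES, create a file named corpus.ingest.json with the contents below. The bulk API expects newline-delimited JSON: each action line is followed by its document on the next line, and the file must end with a newline.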

{"index":{"_index":"index1","_id":0}}
{"type": "base_rate","key":"flatiron","name": "The Flatiron Hotel","city" : "New York","country": "USA","latlong" : "40.744072,-73.989258"}
{"index":{"_index":"index1","_id":1}}
{"type": "corporate_rate","key":"flatiron","name": "ABC | XYZ | The Flatiron Hotel","city" : "New York","country": "USA","latlong" : "40.744072,-73.989258","company_id": "ibm","agency_id": "gbt"}
{"index":{"_index":"index1","_id":2}}
{"type": "agency_rate","key":"flatiron","name": "XYZ | The Flatiron Hotel","city" : "New York","country": "USA","latlong" : "40.744072,-73.989258","agency_id": "xyz"}
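Writing the bulk file by hand is error-prone; a stray comma after an action line is enough to break it. A minimal Python sketch that generates the same newline-delimited format from a dictionary of documents is shown below (the helper name to_bulk_ndjson is hypothetical, and the IDs are written as strings). Writing body to corpus.ingest.json produces a file usable with the cURL command in the next step.

```python
import json


def to_bulk_ndjson(index: str, docs: dict) -> str:
    """Turn {id: document} into a _bulk body: an action line, then a source line, per doc.

    The body must end with a newline, or Elasticsearch rejects the request.
    """
    lines = []
    for doc_id, doc in docs.items():
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


common = {"key": "flatiron", "city": "New York", "country": "USA",
          "latlong": "40.744072,-73.989258"}
corpus = {
    "0": {"type": "base_rate", "name": "The Flatiron Hotel", **common},
    "1": {"type": "corporate_rate", "name": "ABC | XYZ | The Flatiron Hotel",
          "company_id": "ibm", "agency_id": "gbt", **common},
    "2": {"type": "agency_rate", "name": "XYZ | The Flatiron Hotel",
          "agency_id": "xyz", **common},
}

body = to_bulk_ndjson("index1", corpus)
print(body, end="")
```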

1. The cURL command below loads the data from that file. Make sure to adjust the path of the corpus file for your system.

curl -X POST -H 'Content-Type: application/x-ndjson' 'http://localhost:9200/index1/_bulk?pretty' --data-binary @corpus.ingest.json
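The same bulk request can be sent from Python. Note that _bulk returns HTTP 200 even when individual documents fail, so the response's top-level errors flag and the per-item error objects are worth checking. This is a sketch with hypothetical helper names (post_bulk, failed_items), using only the standard library.

```python
import json
import urllib.request


def post_bulk(body: str, url: str = "http://localhost:9200/_bulk") -> dict:
    """POST an NDJSON body to the _bulk endpoint and return the parsed response."""
    req = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def failed_items(bulk_response: dict) -> list:
    """Return (_id, error reason) for every action in the response that failed."""
    if not bulk_response.get("errors"):
        return []
    failures = []
    for item in bulk_response["items"]:
        (result,) = item.values()  # each item is keyed by its action, e.g. "index"
        if "error" in result:
            failures.append((result["_id"], result["error"].get("reason")))
    return failures
```

With ES running, failed_items(post_bulk(open("corpus.ingest.json").read())) should return an empty list.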

2. Once the ingest is complete, create an index pattern so that you can explore the indices in Kibana (the Console itself can query any index directly, but views such as Discover rely on index patterns). To create an index pattern, click the settings (cog wheel) icon in the left pane of the Kibana UI and click Index Patterns.

3. In the sample corpus, the index is named index1, but you are free to choose whatever name you need. Kibana index patterns accept wildcards, so you can enter the pattern index* and click the Next Step button. On the next screen of the wizard, click the Create Index Pattern button.

4. Now, in the console, you can execute ES API calls and Kibana will show the results.

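5. For correct indexing and search, the mapping of the fields has to be defined explicitly. In this case, the lat long values have to be represented as geo_point. You can read more about mapping on the Elasticsearch site. Note that the type of an existing field cannot be changed, so the mapping should be applied before the bulk load in step 1; if you have already loaded the corpus, delete the index first (DELETE /index1 in the console), apply the mapping, and then run the bulk load again. The mapping can be applied with cURL, with the JSON body (everything after the PUT line below) saved as mapping.json:

curl -X PUT "http://localhost:9200/index1?pretty" -H 'Content-Type: application/json' --data-binary @mapping.json

or executed directly from the Kibana console as follows.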
PUT /index1
{
    "settings": {
        "number_of_shards": 1
    },
    "mappings": {
        "properties": {
            "type" : {
                "type" : "keyword",
                "fields" : {
                    "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                    }
                }
            },
            "key" : {
                "type" : "keyword",
                "fields" : {
                    "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                    }
                }
            },
            "name" : {
                "type" : "text",
                "fields" : {
                    "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                    }
                }
            },
            "city" : {
                "type" : "text",
                "fields" : {
                    "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                    }
                }
            },
            "country" : {
                "type" : "keyword",
                "fields" : {
                    "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                    }
                }
            },
            "company_id" : {
                "type" : "keyword",
                "fields" : {
                    "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                    }
                }
            },
            "agency_id" : {
                "type" : "keyword",
                "fields" : {
                    "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                    }
                }
            },
            "latlong" : {
                "type" : "geo_point"
            }
        }
    }
}
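Since most fields share the same "field plus .keyword sub-field" pattern, the mapping body can also be generated instead of written by hand. The Python sketch below (all helper names are hypothetical) builds the same body as above, and field_type reads a field's datatype back from the response of GET index1, which is what the next step checks.

```python
import json


def keyword_field() -> dict:
    # keyword field with a .keyword sub-field, mirroring the mapping above
    return {"type": "keyword",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}}


def text_field() -> dict:
    # analysed text field with a .keyword sub-field for exact matches and aggregations
    return {"type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}}


def build_index1_body() -> dict:
    properties = {name: keyword_field()
                  for name in ("type", "key", "country", "company_id", "agency_id")}
    properties.update({name: text_field() for name in ("name", "city")})
    properties["latlong"] = {"type": "geo_point"}
    return {"settings": {"number_of_shards": 1},
            "mappings": {"properties": properties}}


def field_type(get_index_response: dict, index: str, field: str) -> str:
    """Read a field's datatype from the JSON returned by GET /<index>."""
    return get_index_response[index]["mappings"]["properties"][field]["type"]


mapping_json = json.dumps(build_index1_body(), indent=4)  # contents for mapping.json
```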

6. Once the mapping has been applied, running GET index1 in the console returns the index definition, in which you can see that the datatype of the latlong field is geo_point.

Some Sample Queries that can be executed

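Below is a simple search using geo points: it matches all documents and then filters them to those within 2 km of the given coordinates. All queries to Elasticsearch are written in the Query DSL.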

GET /index1/_search
{
    "query": {
        "bool" : {
            "must" : {
                "match_all" : {}
            },
            "filter" : {
                "geo_distance" : {
                    "distance" : "2km",
                    "latlong" : {
                        "lat" : 40.74,
                        "lon" : -73.98
                    }
                }
            }
        }
    }
}
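To sanity-check what the geo_distance filter should return, you can compute the distance yourself. The Python sketch below uses the haversine formula with a mean Earth radius of 6371 km, which is an approximation; Elasticsearch's own distance calculation may differ slightly. The helper names are hypothetical. For the Flatiron Hotel coordinates in the corpus, the distance to the query point is about 0.9 km, so all three documents should pass the 2 km filter.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def geo_distance_query(lat, lon, distance="2km", field="latlong"):
    """Build the same bool + geo_distance query body as above."""
    return {"query": {"bool": {
        "must": {"match_all": {}},
        "filter": {"geo_distance": {"distance": distance, field: {"lat": lat, "lon": lon}}},
    }}}


hotel = (40.744072, -73.989258)  # latlong of the Flatiron documents in the corpus
centre = (40.74, -73.98)         # point used in the query above
print(round(haversine_km(*hotel, *centre), 2))
```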

Posted by Arun Thundyill Saseendran in Database, How-To, Technology, 0 comments

Freelance Work a.k.a virtual company

What is it like?

When you work as a freelancer, there is actually no company. Even if the client is a large company, you are not entitled to any company benefits, nor are you protected by labour laws. In most cases, your employer will be an individual or a small startup. The other part that will sink your heart is that the pay will be meager compared to the industry.

Why then?

Be that as it may, despite all the (so-called) downsides, freelance work is one of the best types of work for learning. There are no horizontal or vertical departments as in a traditional organization, and no software management systems for defect and change tracking, version control, file storage, or pretty much anything else. You have to manage everything end to end: taking requirements, tracking them, and delivering them to production after testing.

I still remember the time when I had to migrate 6000 live domains from a proprietary DNS server to a FOSS DNS server (PowerDNS), after writing a migration tool, a Java batch processor for syncing from the hosting servers to the PDNS servers, and shell scripts to wire it all into cron. All this without any kind of tracking or version control system, not even a staging environment to test what I had written. For anyone working in an established company, this is utter nonsense and a nightmare. However, this is how it happens in the freelance world: it is nerve-racking, but it works, and you get paid peanuts.

So…

However, the learning you get from such experiences is priceless. The confidence that you gain from managing things end to end is unimaginable, and that is the prime advantage of working as a freelancer. The feeling of freedom is amazing as well: more often than not, the person giving you the work will be a non-techie, which means you get to decide what to do and how to do it, though the "when" will almost always be very tight. That also means you will have to explore, learn, use, develop, implement, test, deploy, write documentation, and support. Uff, a lot of learning and a lot of experience!

Personally, I have gained a lot working as a freelancer. All the confidence that I have today, the ability to jump into things with the conviction that I can turn them around, the knowledge that has given me a push start at each stage of my career, and my appreciation of the tools, technologies, and benefits that established companies provide to ease everyday life, all come from my learnings as a freelancer. I do not claim that everyone who works as a freelancer will have a great experience; however, if you choose the right job and have the passion to work and the eagerness to learn, you will surely reap the fruit.

Being a freelancer you can learn a lot and gain confidence for a lifetime – though you’ll get paid peanuts and experience a lot of stress!

Posted by Arun Thundyill Saseendran in Flavours of Companies, Thoughts, Worklife, Workplace, 0 comments

Flavors of Software Companies – The good, bad and ugly!

Since the world is driven by software, the number of software companies, and of their types, is vast. Hence, in this article, I am going to categorize the different flavors at a high level and from the perspective of a software development engineer, which I am and will be! Folks focused on other areas of the software industry, such as marketing, sales, finance, and management, may not find these flavors meaningful.

Each company is different and your choice decides your career!

What do I mean by flavors of companies?

By a flavour I mean a combination of

  • the culture the organisation has
  • the work environment
  • the technology stack and approach towards technology
  • the learning opportunity (very important isn’t it)
  • the compensation (a reason to wake up and go to work)

The types of companies in my initial list

  • Freelance Work a.k.a virtual company
  • Service / Consultancy company
  • An Established Product company
  • A startup
  • Pseudo Startup (Small/Medium established companies with a claim of being a startup)

I am going to jot down what I know, mostly from my personal experience of working in all of the above types of companies, and partly from people I trust.

Posted by Arun Thundyill Saseendran in Thoughts, Worklife, Workplace, 0 comments

Understanding Time Complexity – For you and me

Understanding time complexity and the use of asymptotic notation is an important part of a software engineer's learning curve. When you are comparing two algorithms for the same purpose, or designing an algorithm and trying to optimize it, comparing them in terms of asymptotic notation is the simplest and most effective way available. In fact, asymptotic notation is the common language software engineers use to talk about the efficiency of an algorithm.

Of course, you can use a microsecond- or nanosecond-precision clock to time your algorithm and compare it. But what if we change the system, or the precision of the clock varies from system to system? And how do we gauge the effect of different input lengths?

Asymptotic analysis offers a simple way to represent the time complexity (execution complexity) of an algorithm in terms of the size of its input. There is a bit of math behind it; in this post, however, we are going to skip the math and cover just enough asymptotic analysis (the must-know part) for a software engineer.

With that context, I thought I would quickly write down some basics of asymptotic analysis, since I was doing a quick refresher on the concept with some friends. Hence, this is not authoritative content, but it aims to be simple and meaningful.
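A simple way to see the idea of expressing cost in terms of input size is to count basic steps instead of measuring time: the count depends only on the input size, not on the machine or its clock. A minimal Python illustration (the function names are made up for this example):

```python
def count_single_loop(n):
    """Basic steps done by one loop over n items: grows like n."""
    steps = 0
    for _ in range(n):
        steps += 1
    return steps


def count_nested_loops(n):
    """Basic steps done by two nested loops over n items: grows like n^2."""
    steps = 0
    for _ in range(n):
        for _ in range(n):
            steps += 1
    return steps


for n in (10, 100, 1000):
    print(n, count_single_loop(n), count_nested_loops(n))
```

Multiplying the input size by 10 multiplies the single loop's steps by 10 and the nested loops' steps by 100, on any machine.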

When we talk about asymptotic notation, three major symbols and what they represent are important. 

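  • Big Oh: The upper bound, commonly used to describe the worst-case scenario.
  • Big Omega: The lower bound, often associated with the best-case scenario.
  • Theta: Both an upper and a lower bound (a tight bound). It is often loosely called the average case, but strictly it means the growth rate is pinned down from both sides.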
The three important symbols of asymptotic analysis

Of the three, in most cases, we worry only about the "Big Oh" notation. As an algorithm designer, you are worried about the worst-case scenario. In simple words, it tells you how much time your algorithm will take, as a function of the input size, given the worst possible input.
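Best and worst case are easy to see with a linear search that counts its comparisons. In this illustrative Python sketch, the best case needs one comparison, while the worst case (target absent) needs n comparisons, which is why linear search is O(n).

```python
def linear_search(items, target):
    """Return (index of target or -1, number of comparisons made)."""
    comparisons = 0
    for i, value in enumerate(items):
        comparisons += 1
        if value == target:
            return i, comparisons
    return -1, comparisons


data = list(range(1, 1001))       # 1000 items
print(linear_search(data, 1))     # best case: target is the first element
print(linear_search(data, 5000))  # worst case: target is not present
```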

Some of the common scales of asymptotic notation are as follows. Do not worry if you do not understand them right away; we will get there by the end of this post.

Term | Notation | Example
Constant | O(1) | Not affected by input size
Logarithmic | O(log n) | Binary Search
Linear | O(n) | Single Loop
N log N (Linearithmic) | O(n log n) | Merge Sort – Divide and Conquer
Quadratic | O(n^2) | Bubble Sort
Cubic | O(n^3) | Three Nested Loops
Exponential | 2^O(n) | Brute Force
Factorial | O(n!) | Travelling Salesman
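As an example of the logarithmic row, here is a standard iterative binary search in Python that also counts loop iterations (the counter is added purely for illustration). Each iteration halves the remaining range, so on a sorted list of one million items it needs at most 20 iterations (floor(log2 n) + 1), compared with up to one million comparisons for a linear scan.

```python
def binary_search(sorted_items, target):
    """Return (index of target or -1, number of loop iterations)."""
    lo, hi = 0, len(sorted_items) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid, steps
        if sorted_items[mid] < target:
            lo = mid + 1   # target can only be in the right half
        else:
            hi = mid - 1   # target can only be in the left half
    return -1, steps


million = list(range(1_000_000))
print(binary_search(million, -1))  # not present: the worst case
```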

The table above lists some of the common terms used with "Big Oh". They are in increasing order of complexity. Simply put, for large enough inputs, an algorithm with O(1) complexity is better than one with O(log n), which in turn is better than one with O(n), and so on. The next graph shows an easy-to-remember illustration of the various complexities.

Time Complexity Comparison Chart

The illustration is taken from an external cheat sheet that I feel is a good resource for quick reference on the time complexities of various algorithms and data structures.
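To make the ordering concrete, the sketch below evaluates each growth function at n = 16, using 2^n as a representative of the exponential row. It is only an illustration: constants are ignored, and for very small n the ordering can differ (for example, at n = 2, 2^n = 4 is smaller than n^3 = 8).

```python
import math

GROWTH = [
    ("O(1)", lambda n: 1),
    ("O(log n)", lambda n: math.log2(n)),
    ("O(n)", lambda n: n),
    ("O(n log n)", lambda n: n * math.log2(n)),
    ("O(n^2)", lambda n: n ** 2),
    ("O(n^3)", lambda n: n ** 3),
    ("O(2^n)", lambda n: 2 ** n),
    ("O(n!)", lambda n: math.factorial(n)),
]

n = 16
for name, f in GROWTH:
    print(f"{name:>10}: {f(n):,.0f}")
```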

A quick look at the time complexities of various operations on different data structures, shown below, gives a clear idea of why some data structures are chosen for certain operations. For example, searching in a hash table or set is efficient because its average complexity is O(1) (in the worst case, with many collisions, it degrades to O(n)). Searching a linked list is less efficient than a hash table since it has a complexity of O(n).

Time Complexities of Data Structures
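The hash table versus linked list difference can be illustrated in Python, whose built-in set is a hash table. The minimal singly linked list below is written only for this example; a search through it visits nodes one by one, whereas a set membership test hashes the value and, on average, goes straight to it.

```python
class Node:
    """A minimal singly linked list node (illustrative only)."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


def build_linked_list(values):
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head


def linked_list_contains(head, target):
    """Walk node by node, O(n) in the worst case. Returns (found, nodes_visited)."""
    visited = 0
    node = head
    while node is not None:
        visited += 1
        if node.value == target:
            return True, visited
        node = node.next
    return False, visited


values = list(range(10_000))
head = build_linked_list(values)
lookup = set(values)  # hash-based: average O(1) membership test

print(linked_list_contains(head, 9_999))  # had to visit every node
print(9_999 in lookup)                    # found by hashing, no scan
```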
Posted by Arun Thundyill Saseendran in Basics of Programming, 1 comment

Why Trinity College Dublin is the Best in Ireland for Studying Computer Science?

Trinity College Dublin Pano

To put a blunt fact out first: hardly 10% of undergraduates go on to pursue a Master's degree. That is one of the main reasons why so little advice is available when it comes to choosing a country and university for a Master's degree. I, too, was quizzed with all these questions: about the country, about the university, and about the rationale for choosing them. The worst part is when people who would never even have thought about doing a Master's, without any research or experience, start suggesting countries like the USA, Canada, Australia and what not. Familiar names they have heard come in as serious advice, and whenever a person chooses something else, they are offended. Often, it is left to the student to provide the rationale and justify the choice of country. For example, I chose Ireland to pursue a Master's degree in Computer Science. Ireland is a wonderful country to study in and gain work experience. I had done my research and the potential was very good. However, convincing others was very difficult, and hence I wrote a post on the reasons for choosing Ireland to study Computer Science, so that people like me who face such questions can justify their choice!

Choosing a good university is as important as choosing a good country to pursue a Master's degree in Computer Science. In this post, I pen out the rationale that convinced me to choose Trinity College Dublin, in County Dublin, Ireland, to pursue a Master of Science in Computer Science. It may help anchor your decision to study at Trinity College Dublin, just as it did mine.

The above facts were based on focused reading and research before I joined Trinity. However, after four months of studying at Trinity College Dublin, I can surely vouch for them. Some of my very personal experiences, after studying here and talking to a lot of people, are as follows.

  • It is a joy and pride to walk around in this heritage campus along the paths, corridors, and hallways where some of the greats have walked.
  • The course structure and the topics covered are much more elaborate, industry-oriented and demanding than at other universities such as UCD and NUI Galway.
  • A close competitor to Trinity College Dublin is UCD (University College Dublin), which does a pretty decent job in terms of employability. Trinity's college and course are reputed and respected, and the university organizes many career fairs. However, UCD has a placement cell that takes special care in getting students placed, which, as far as I know, TCD does not have.
  • Since Trinity is located in the heart of the city, studying there gives you access to all the tech meetups in the city. Whatever the meetup, it is just a short stroll away.
  • Tradition is the norm. Professors in robes and traditional ceremonies are a usual sight.

To add to all this, here are some popular articles from the press.

Trinity College Dublin finishes top of the league table for the 16th successive year in the #SundayTimes Irish Good University Guide 2019

https://www.thetimes.co.uk

The Trinity Grand Canal Dock Innovation District is underway and with it, the fame and honor of Trinity College Dublin will further increase.

GC Innovation District EZine

If you have questions, post them in the comments section and I will try to answer them. If you have a difference of opinion or want to add anything, let me know through comments and I will add them in.

Posted by Arun Thundyill Saseendran in Education, 2 comments

Why study Computer Science in Ireland?

I came to Ireland in 2018 to study at Trinity College Dublin, Co. Dublin, Ireland. One question that almost everyone asked me when they learned that I was in Ireland studying for my Master's degree in Computer Science was:

Why Ireland? Why not US, Canada or Australia… Why Ireland of all other countries?

Hence, I thought I would pen down why I chose Ireland. It may help those wanting to study in Ireland convince others, and it may be an eye-opener for those planning their higher studies.

Some of the most important things that one should consider when choosing a country and University for higher studies are as follows

  • Quality of education.
  • Research Output of the University.
  • Favorability for gaining experience in the field of study with a work VISA.
  • Availability of top companies in the related field.
  • The possibility of Employment and good enough salary to sustain you.

Rationale for Ireland as the country of Study

Considering the above-mentioned prospects and carefully inspecting the facts, I chose Ireland as the country in which to pursue a Master's degree in Computer Science. The reasons that led me to choose Ireland as the country of study are as follows.

A small word of warning, though it can be tackled quite well:

Rent for housing in Dublin is very high. You could get a shared double bedroom 30 minutes away from the city center for around 500 EUR. If you want to stay in the city center, a single en-suite room will cost no less than 900 EUR. However, all these places are legal, safe and well maintained.

You can also find many places where you could stay as a paying guest. This has also been quite a good experience, and it works out a lot cheaper. It is a great option if you are happy to live with a host family. Remember, the Irish are very friendly and their hospitality is awesome.

I currently study at Trinity College Dublin. I will shortly write a blog post about the rationale for choosing Trinity to study computer science.

Posted by Arun Thundyill Saseendran in Education, 0 comments