#StackBounty: #javascript #elasticsearch #mapper Multi-language elastic search mapping setup

Bounty: 50

I have documents stored in MongoDB like so:

const demoArticle = {
  created: new Date(),
  title: [{
    language: 'english',
    value: 'This is the english title'
  }, {
    language: 'dutch',
    value: 'Dit is de nederlandse titel'
  }]
};

I want to add analyzers for specific languages, which is normally specified like so:

"mappings": {
   "article": {
      "properties": {
         "created": {
            "type": "date"
         },
         "title.value": {
           "type": "text",
           "analyzer": "english"
         }
      }
   }
}

The problem, however, is that the analyzer should depend on the language set at the child level: each title value should be analyzed with the analyzer for its own language.

I've stumbled upon Dynamic Templates in Elasticsearch, but I am not convinced they are suited to this use case.

Any suggestions?
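
One possible direction, offered as a sketch under assumptions rather than a definitive answer: an analyzer is bound to a field, not to a value, so a common workaround is to give each language its own field before indexing. The sketch below assumes the document can be reshaped so that title becomes an object keyed by language (title.english, title.dutch). The helper names toIndexable and buildTitleMapping are hypothetical; english and dutch are built-in Elasticsearch language analyzers.

```javascript
// Hypothetical helper: turn [{language, value}, ...] into
// {english: '...', dutch: '...'} so each language lands in its own field.
function toIndexable(article) {
  const title = {};
  for (const { language, value } of article.title) {
    title[language] = value;
  }
  return { ...article, title };
}

// Hypothetical helper: build the "title" part of the mapping, one text
// field per language, each using the built-in analyzer of the same name.
function buildTitleMapping(languages) {
  const properties = {};
  for (const language of languages) {
    properties[language] = { type: 'text', analyzer: language };
  }
  return { properties };
}

const demoArticle = {
  created: new Date(),
  title: [
    { language: 'english', value: 'This is the english title' },
    { language: 'dutch', value: 'Dit is de nederlandse titel' }
  ]
};

const doc = toIndexable(demoArticle);
const titleMapping = buildTitleMapping(['english', 'dutch']);
const mappings = {
  article: {
    properties: {
      created: { type: 'date' },
      title: titleMapping
    }
  }
};

console.log(JSON.stringify(doc.title));
console.log(JSON.stringify(mappings, null, 2));
```

The trade-off is that queries then have to target title.english or title.dutch explicitly (or both), instead of a single title.value field.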

Get this bounty!!!

#StackBounty: #sql-server #sql #database-design #elasticsearch Combine relational DB and Elastic search

Bounty: 100

We have a large quantity of text files that we want to full-text search, combined with relational, structured metadata about each file.
So, a search could be: "Give me all files that belong to group X (or subgroups of X), have authors (Ari and Bari and Mari), belong to organization Y, and contain the text 'synthetic'." The last part is a full-text search; the rest is already stored as relational data in our existing DB.

In our database (which is rather complex), there is a way to identify the files, plus a ton of metadata about each file, spread across tens of tables, ranging from simple 1-1 relationships, to 1-many sets per file, and even tree-structured relationships (things like "this file is type X, type X is a subgroup of type Y", etc.).
This metadata may change over time, from all over the application (which is huge).

Now I, as a database admin, thought this could be solved by using SQL Server to search the structured metadata already in the DB, narrowing the search to candidate files, and then passing the candidate file IDs to Elasticsearch for full-text searching. (Re-indexing a file in Elasticsearch when it is added or committed is trivial in our code.)
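
To make the first option concrete, here is a minimal sketch of the Elasticsearch side, assuming the candidate IDs from SQL Server double as the Elasticsearch document IDs. The function name buildCandidateQuery and the field name content are hypothetical; the ids and match queries are standard Elasticsearch query types.

```javascript
// Build a query body that full-text searches only the candidate files
// already selected by the relational database.
// "content" is a hypothetical field holding the indexed file text.
function buildCandidateQuery(candidateIds, text) {
  return {
    query: {
      bool: {
        // Scored full-text part.
        must: [{ match: { content: text } }],
        // Non-scoring restriction to the candidate documents.
        filter: [{ ids: { values: candidateIds.map(String) } }]
      }
    }
  };
}

const body = buildCandidateQuery([101, 205, 309], 'synthetic');
console.log(JSON.stringify(body, null, 2));
```

One caveat with this shape: the candidate list travels with every request, so it only stays cheap while the relational filter keeps that list reasonably small.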

However, the elastic-guys in our project naturally had a different idea:
They want to extract all the metadata, as well as the full-text content of the files, into Elasticsearch and run the search exclusively there.

This allows them to run full-powered Lucene queries easily, and it takes load off the database, which is nice. However, to me this also introduces a nightmare in keeping the structured metadata in sync, and blindly re-indexing/syncing everything periodically is not possible at our scale of data.

I can see merits/concerns to both options. Is there a best practice for this kind of thing?


#StackBounty: #elasticsearch #elasticsearch-aggregation Can ElasticSearch aggregations do what SQL can do?

Bounty: 50

In Elasticsearch, I need to get each frequency and the number of colors that occur with that frequency, ordered from the highest frequency to the lowest.
If I have data like this:


I need to get the count of each color and then group by that count. So in the end, I would like all the colors that occur the same number of times to be inside one group.
This is how I would do it in SQL.

select
  count(*) as 'Number of Colors',
  i.c as 'Seen times'
from (
      select
        name as 'n',
        count(*) as 'c'
      from colors -- table name assumed
      group by name
   ) i
group by i.c
order by i.c desc;

This would return:

Number of Colors | Seen times
2                | 3
1                | 2
1                | 1

How would I write this as an Elasticsearch query? I am using version 5.5.
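
A hedged sketch of one way to approach this: a terms aggregation gives the inner "count per color", and the outer "group by count" can then be done on the client from the returned buckets. The request body assumes name is a keyword field, the function name groupByCount is hypothetical, and the sample buckets are invented to match the result table above.

```javascript
// Inner step: ask Elasticsearch for the count per color.
// Assumes "name" is a keyword field; size is an arbitrary upper bound.
const request = {
  size: 0,
  aggs: { colors: { terms: { field: 'name', size: 1000 } } }
};

// Outer step: group the returned buckets by their doc_count.
function groupByCount(buckets) {
  const colorsPerCount = new Map();
  for (const { doc_count } of buckets) {
    colorsPerCount.set(doc_count, (colorsPerCount.get(doc_count) || 0) + 1);
  }
  return [...colorsPerCount.entries()]
    .map(([seenTimes, numberOfColors]) => ({ numberOfColors, seenTimes }))
    .sort((a, b) => b.seenTimes - a.seenTimes);
}

// Invented buckets in the shape a terms aggregation returns.
const sampleBuckets = [
  { key: 'red', doc_count: 3 },
  { key: 'blue', doc_count: 3 },
  { key: 'green', doc_count: 2 },
  { key: 'yellow', doc_count: 1 }
];

const grouped = groupByCount(sampleBuckets);
console.log(grouped);
```

This keeps the Elasticsearch request simple at the cost of pulling one bucket per distinct color to the client, which is only reasonable while the number of distinct colors is modest.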


#StackBounty: #java #elasticsearch #rest-client Elasticsearch Rest Client Still Giving IOException : Too Many Open Files

Bounty: 100

This is a follow-up to the solution that was provided to me on this previous post:

How to Properly Close Raw RestClient When Using Elastic Search 5.5.0 for Optimal Performance?

This same exact error message came back!

2017-09-29 18:50:22.497 ERROR 11099 --- [8080-Acceptor-0] org.apache.tomcat.util.net.NioEndpoint   : Socket accept failed

java.io.IOException: Too many open files
    at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method) ~[na:1.8.0_141]
    at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422) ~[na:1.8.0_141]
    at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250) ~[na:1.8.0_141]
    at org.apache.tomcat.util.net.NioEndpoint$Acceptor.run(NioEndpoint.java:453) ~[tomcat-embed-core-8.5.15.jar!/:8.5.15]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_141]

2017-09-29 18:50:23.885  INFO 11099 --- [Thread-3] ationConfigEmbeddedWebApplicationContext : Closing org.springframework.boot.context.embedded.AnnotationConfigEmbeddedWebApplicationContext@5387f9e0: startup date [Wed Sep 27 03:14:35 UTC 2017]; root of context hierarchy
2017-09-29 18:50:23.890  INFO 11099 --- [Thread-3] o.s.c.support.DefaultLifecycleProcessor  : Stopping beans in phase 2147483647
2017-09-29 18:50:23.891  WARN 11099 --- [Thread-3] o.s.c.support.DefaultLifecycleProcessor  : Failed to stop bean 'documentationPluginsBootstrapper'

    ... 7 common frames omitted

2017-09-29 18:50:53.891  WARN 11099 --- [Thread-3] o.s.c.support.DefaultLifecycleProcessor  : Failed to shut down 1 bean with phase value 2147483647 within timeout of 30000: [documentationPluginsBootstrapper]
2017-09-29 18:50:53.891  INFO 11099 --- [Thread-3] o.s.j.e.a.AnnotationMBeanExporter        : Unregistering JMX-exposed beans on shutdown
2017-09-29 18:50:53.894  INFO 11099 --- [Thread-3] com.app.controller.SearchController  : Closing the ES REST client

I tried using the solution from the previous post.


public class ElasticsearchConfig {

    private String host;

    private int port;

    public RestClient restClient() {
        return RestClient.builder(new HttpHost(host, port))
            .setRequestConfigCallback(new RestClientBuilder.RequestConfigCallback() {
                @Override
                public RequestConfig.Builder customizeRequestConfig(RequestConfig.Builder requestConfigBuilder) {
                    // 5 s to connect, 60 s socket timeout
                    return requestConfigBuilder.setConnectTimeout(5000).setSocketTimeout(60000);
                }
            })
            .build();
    }
}

public class SearchController {

    private RestClient restClient;

    @RequestMapping(value = "/search", method = RequestMethod.GET, produces="application/json" )
    public ResponseEntity<Object> getSearchQueryResults(@RequestParam(value = "criteria") String criteria) throws IOException {

        // Setup HTTP Headers
        HttpHeaders headers = new HttpHeaders();
        headers.add("Content-Type", "application/json");

        // Setup query and send and return ResponseEntity...

        Response response = this.restClient.performRequest(...);
    }

    @PreDestroy
    public void cleanup() {
        try {
            logger.info("Closing the ES REST client");
            this.restClient.close();
        } catch (IOException ioe) {
            logger.error("Problem occurred when closing the ES REST client", ioe);
        }
    }
}

Dependencies: Spring, Elasticsearch, Apache Commons, Log4j.

This makes me think that the RestClient was never explicitly closing the connection in the first place…

And this is surprising since my Elasticsearch Spring Boot based Microservice is load balanced on two different AWS EC-2 Servers.

According to the log file, that exception occurred about 2000 times, and only at the very end did the @PreDestroy method close the client. See the INFO message from the @PreDestroy cleanup() method logged at the end of the stack trace.

Do I need to explicitly put a finally clause inside the SearchController and close the RestClient connection explicitly?

It’s really critical that this IOException doesn’t happen again, because a lot of different mobile clients (iOS & Android) depend on this search microservice.

Need this to be fault tolerant and scalable… Or, at the very least, not to break.

The only reason this line is at the bottom of the log file:

2017-09-29 18:50:53.894  INFO 11099 --- [Thread-3] com.app.controller.SearchController : Closing the ES REST client

is that I did this:

kill -3 jvm_pid

Should I keep the @PreDestroy cleanup() method but change the contents of my SearchController.getSearchQueryResults() method to something like this:

@RequestMapping(value = "/search", method = RequestMethod.GET, produces="application/json" )
public ResponseEntity<Object> getSearchQueryResults(@RequestParam(value = "criteria") String criteria) throws IOException {
    // Setup HTTP Headers
    HttpHeaders headers = new HttpHeaders();
    headers.add("Content-Type", "application/json");

    // Setup query and send and return ResponseEntity...
    Response response = null;

    try {
        // Submit Query and Obtain Response
        response = this.restClient.performRequest("POST", endPoint, Collections.singletonMap("pretty", "true"), entity);
    } catch (IOException ioe) {
        logger.error("Exception when performing POST request " + ioe);
    } finally {
        this.restClient.close();
    }
    // return response as EsResponse();
}

This way the RestClient connection is always closed…

I would appreciate it if someone could help me with this.


#StackBounty: #elasticsearch elasticsearch: Get all documents that share the same top-most ancestor

Bounty: 50

I am trying to get all documents that share the same top-most ancestor, where one child can be the parent, grandparent, great-grandparent, etc. of multiple docs.

So let’s say I have a structure like this (borrowed from https://www.elastic.co/guide/en/elasticsearch/reference/5.6/parent-join.html):

   question
    /    \
   /      \
comment  answer
(child)  (child)

In code:

PUT my_index
{
  "settings": {
    "mapping.single_type": true
  },
  "mappings": {
    "doc": {
      "properties": {
        "my_join_field": {
          "type": "join",
          "relations": {
            "question": ["answer", "comment"]
          }
        }
      }
    }
  }
}

However, in theory one can keep answering comments and commenting on answers forever. So say I have a question that is structured like this:

                               question
                               (id: 1)
                        answer          answer
                       (id: 5)          (id: 8)
                       /                   |
                      /                    |
                   comment  answer        answer
                  (id: 15) (id: 12)      (id: 9)
                  /            |          /     
                 /             |         /      
              answer  answer  comment  answer  answer 
             (id: 16)(id: 17) (id: 19) (id: 10)(id: 11)

How do I get all the documents (ids 1, 5, 8, 9, 10, 11, 12, 15, 16, 17, 19), only knowing id 9?
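
One option that avoids walking the tree at query time, sketched here under assumptions: denormalize the top-most ancestor onto every document at index time, then fetch the whole thread with a single term query. The field name root_id and the helper names findRoot and buildThreadQuery are hypothetical, and the parentOf table is only one reading of the diagram above.

```javascript
// child id -> parent id, following one reading of the diagram above.
const parentOf = {
  5: 1, 8: 1,
  15: 5, 12: 5, 9: 8,
  16: 15, 17: 15, 19: 12, 10: 9, 11: 9
};

// Walk up the parent links until a document without a parent is reached.
function findRoot(id, parents) {
  let current = id;
  while (parents[current] !== undefined) {
    current = parents[current];
  }
  return current;
}

// Query for every document that carries the same (hypothetical) root_id.
function buildThreadQuery(rootId) {
  return { query: { term: { root_id: rootId } } };
}

// At index time each document would be stored with root_id = findRoot(id).
const rootOfNine = findRoot(9, parentOf);
const allIds = [1, ...Object.keys(parentOf).map(Number)];
const sameThread = allIds
  .filter(id => findRoot(id, parentOf) === rootOfNine)
  .sort((a, b) => a - b);

console.log(JSON.stringify(buildThreadQuery(rootOfNine)));
console.log(sameThread.join(', '));
```

Knowing only id 9, this becomes two lookups: read root_id from document 9, then run the term query on that value.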


#StackBounty: #performance #date #elasticsearch #filter Do additional date range filters increase the performance?

Bounty: 50

I want to add a date field to an existing, large Elasticsearch 5 index, containing the date on which each document was indexed. Afterwards I want to query this index to return all documents created in the last minute.

In Elasticsearch: The Definitive Guide for version 1, it is mentioned that adding additional filters for day, month and/or year can improve performance drastically. Newer versions of the guide no longer say so.

Can I gain performance in Elasticsearch 5 by adding additional date filters?
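
For reference, the baseline query in question can be sketched as a single range filter using Elasticsearch date math. The field name indexed_at and the helper name lastMinuteQuery are hypothetical; the sketch shows only the query shape and makes no claim about its performance.

```javascript
// Build a filter-only query for documents indexed within the last minute.
// "now-1m" is Elasticsearch date math for "one minute ago".
function lastMinuteQuery(field) {
  return {
    query: {
      bool: {
        filter: [{ range: { [field]: { gte: 'now-1m' } } }]
      }
    }
  };
}

const recent = lastMinuteQuery('indexed_at');
console.log(JSON.stringify(recent, null, 2));
```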


#StackBounty: #elasticsearch #spring-boot #spring-data-elasticsearch ElasticSearch – Boosting based on depth in a recursive structure

Bounty: 50

I am using Elasticsearch 2.4.4 (compatible with Spring Boot 1.5.2).

I have a document object which has the following structure:

{
    id : 1,
    title : "Doc title",
    //some more metadata
    sections : [
        {
           id : 2,
           title : "Sec title 1"
           //subsections, nested the same way
        },
        {
           id : 3,
           title : "Sec title 2"
        }
    ]
}


Basically, I want to make the titles in the document searchable (the document title, section titles and subsection titles at any level), and I want to be able to score the documents based on the level at which they match in the tree hierarchy.

My initial thought was to use a structure like this:

[
    { title : "doc title",   depth : 0 },
    { title : "sec title 1", depth : 1 },
    { title : "sec title 2", depth : 1 }
]

I would like to rank the documents based on the depth at which there is a match (the greater the depth, the lower the score).

I know basic field-based boosting, but is there a way to do this in Elasticsearch?
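
The flattening step behind that structure can be sketched as a small recursive walk. This is only an illustration of the reshaping, assuming sections nest under a sections key at every level; the function name flattenTitles and the sample subsection are hypothetical, and how the depth is then turned into a score is left open.

```javascript
// Walk the recursive document and emit one {title, depth} entry per node.
function flattenTitles(node, depth = 0) {
  const entries = [{ title: node.title, depth }];
  for (const section of node.sections || []) {
    entries.push(...flattenTitles(section, depth + 1));
  }
  return entries;
}

const sampleDoc = {
  id: 1,
  title: 'Doc title',
  sections: [
    {
      id: 2,
      title: 'Sec title 1',
      // Hypothetical subsection to show a deeper level.
      sections: [{ id: 4, title: 'Subsec title 1' }]
    },
    { id: 3, title: 'Sec title 2' }
  ]
};

const flatTitles = flattenTitles(sampleDoc);
console.log(flatTitles);
```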


Is it possible to do it by changing the structure?


#StackBounty: #javascript #elasticsearch #browser #kibana Kibana stuck on loading screen

Bounty: 50

Kibana is not starting up properly. When I open the browser console, it appears to be a JavaScript resource issue. When I open the JS files directly (by clicking their links in the console), they appear to be incomplete, as if they had been abruptly cut off. I am not sure whether this is a browser file limit or whether my files have somehow been truncated. Please see the images below for what I'm seeing.


File as seen in Chrome. This is the very bottom of the file as Chrome loads it.


I have restarted Kibana to see if that would resolve it, with no luck.

I think browsers have a maximum line limit for JS files. I am not sure why Kibana hasn’t minified the JS files. Has it started up in some dev mode?

question summary

I guess I have discovered that Kibana is not loading because the JS is not fully loading. That changes my question to: how can I get all of my JavaScript to load?


I have located the JS files in the Kibana bundles folder and found that the file is fully intact. So it is indeed an issue with the browser loading the complete file. I’m confused about why those files are suddenly too long to be loaded by the browser; it was working fine a fortnight ago. I am still trying to work out how I can get Chrome to load the files.

As suggested by @asettouf, I removed (after backing up) the bundles folder in the /opt/kibana/optimize directory and started Kibana up again. This did regenerate the bundles folder, but the files are identical, meaning I still have the same issue. How come Kibana is not minifying the JS when it bundles the files for caching?

My kibana.yml. I think it is cleaner to paste a link to it:


I went back and turned on verbose logging, and this is my output from deleting the optimize folder and restarting. Nothing stands out as an error message to me.


(I replaced the hostname with localhost for privacy and security reasons.)


I think this is an error with the webpack module not compiling the JS correctly. However, I don't know enough about the module to debug it.

The files in question in the optimize folder are:

commons.bundle.js, which is 65723 lines

kibana.bundle.js, at 108950 lines

These are far from optimized, and the content inside the files is not minified.

Result of curl -v localhost:5601



Installing Apache Usergrid on Linux

About the Project

Apache Usergrid is an open-source Backend-as-a-Service (BaaS or mBaaS) composed of an integrated distributed NoSQL database, application layer and client tier with SDKs for developers looking to rapidly build web and/or mobile applications. It provides elementary services and retrieval features like:

  • User Registration & Management
  • Data Storage
  • File Storage
  • Queues
  • Full Text Search
  • Geolocation Search
  • Joins

It is a multi-tenant system designed for deployment to public cloud environments (such as Amazon Web Services, Rackspace, etc.) or to run on traditional server infrastructures so that anyone can run their own private BaaS deployment.

For architects and back-end teams, it aims to provide a distributed, easily extendable, operationally predictable and highly scalable solution. For front-end developers, it aims to simplify the development process by enabling them to rapidly build and operate mobile and web applications without requiring backend expertise.

Usergrid 2.1.0 Deployment Guide

Though the Usergrid deployment guide seems simple enough, I hit a few hiccups, and it took me about 4 days to figure out what I was doing wrong.

This document explains how to deploy the Usergrid v2.1.0 Backend-as-a-Service (BaaS), which comprises the Usergrid Stack, a Java web application, and the Usergrid Portal, which is an HTML5/JavaScript application.


Below are the software requirements for Usergrid 2.1.0 Stack and Portal. You can install them all on one computer for development purposes, and for deployment you can deploy them separately using clustering.

Linux or a UNIX-like system (Usergrid may run on Windows, but we haven’t tried it)

Download the Apache Usergrid 2.1.0 binary release from the official Usergrid releases page:

After untarring, the files that you need for deploying the Usergrid Stack and Portal are ROOT.war and usergrid-portal.tar.

Stack STEP #1: Setup Cassandra

As mentioned in prerequisites, follow the installation guide given in link

Usergrid uses Cassandra’s Thrift protocol.
On Cassandra 2.x releases, before starting Cassandra you MUST enable Thrift by setting start_rpc in your cassandra.yaml file:

    #Whether to start the thrift rpc server.
    start_rpc: true

Note: DataStax no longer supports the DataStax Community version of Apache Cassandra or the DataStax Distribution of Apache Cassandra. It is best to follow the Apache documentation.

Once you are up and running make a note of these things:

  • The name of the Cassandra cluster
  • Hostname or IP address of each Cassandra node
    • If Cassandra runs on the same machine as Usergrid, use localhost; Usergrid then runs in single-machine mode.
  • Port number used for Cassandra RPC (the default is 9160)
  • Replication factor of Cassandra cluster

Stack STEP #2: Setup ElasticSearch

Usergrid also needs access to at least one Elasticsearch node. As with Cassandra, you can set up a single Elasticsearch node on your computer for development, and you should run a cluster in production.


  • Download and unzip Elasticsearch
  • Run bin/elasticsearch (or bin/elasticsearch -d on Linux to run it as a background process) (or bin\elasticsearch.bat on Windows)
  • Run curl http://localhost:9200/

Once you are up and running make a note of these things:

  • The name of the ElasticSearch cluster
  • Hostname or IP address of each ElasticSearch node
    • If Elasticsearch runs on the same machine as Usergrid, use localhost; Usergrid then runs in single-machine mode.
  • Port number used for ElasticSearch protocol (the default is 9200)

Stack STEP #3: Setup Tomcat

The Usergrid Stack is contained in a file named ROOT.war, a standard Java EE WAR ready for deployment to Tomcat. On each machine that will run the Usergrid Stack you must install the Java SE 8 JDK and Tomcat 7+.

Stack STEP #4: Configure Usergrid Stack

You must create a Usergrid properties file called usergrid-deployment.properties. The properties in this file tell Usergrid how to communicate with Cassandra and ElasticSearch, and how to form URLs using the hostname you wish to use for Usergrid. There are many properties that you can set to configure Usergrid.

Once you have created your Usergrid properties file, place it in the Tomcat lib directory. On a Linux system, that directory is probably located at /path/to/tomcat7/lib/

The Default Usergrid Properties File

You should review the defaults in the above file. To get you started, let’s look at a minimal example properties file that you can edit and use as your own.

Please note that if you are installing Usergrid on the same machine as Cassandra Server, then set the following property to true

   #Tell Usergrid that Cassandra is not embedded.

Stack STEP #5: Deploy ROOT.war to Tomcat

The next step is to deploy the Usergrid Stack software to Tomcat. There are a variety of ways of doing this and the simplest is probably to place the Usergrid Stack ROOT.war file into the Tomcat webapps directory, then restart Tomcat.

Look for messages like this, which indicate that the ROOT.war file was deployed:

INFO: Starting service Catalina
Jan 29, 2016 1:00:32 PM org.apache.catalina.core.StandardEngine startInternal
INFO: Starting Servlet Engine: Apache Tomcat/7.0.59
Jan 29, 2016 1:00:32 PM org.apache.catalina.startup.HostConfig deployWAR
INFO: Deploying web application archive /usr/share/tomcat7/webapps/ROOT.war

Does it work?

You can use curl to check:

curl http://localhost:8080/status

If you get a JSON file of status data, then you’re ready to move to the next step. You should see a response that begins like this:

{
  "timestamp" : 1454090178953,
  "duration" : 10,
  "status" : {
    "started" : 1453957327516,
    "uptime" : 132851437,
    "version" : "201601240200-595955dff9ee4a706de9d97b86c5f0636fe24b43",
    "cassandraAvailable" : true,
    "cassandraStatus" : "GREEN",
    "managementAppIndexStatus" : "GREEN",
    "queueDepth" : 0,
    "org.apache.usergrid.count.AbstractBatcher" : {
      "add_invocation" : {
        "type" : "timer",
        "unit" : "microseconds",
        ... etc. ...
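
If you prefer to script this check rather than read the JSON by eye, a minimal sketch could look like the following. The function name isStackReady is hypothetical, and treating "Cassandra available plus both statuses GREEN" as "ready" is my own assumption, not an official Usergrid criterion.

```javascript
// Decide whether a parsed /status response looks healthy.
// The readiness rule here is an assumption, not an official definition.
function isStackReady(response) {
  const status = response.status || {};
  return status.cassandraAvailable === true &&
    status.cassandraStatus === 'GREEN' &&
    status.managementAppIndexStatus === 'GREEN';
}

// Trimmed copy of the sample response shown above.
const sampleStatus = {
  timestamp: 1454090178953,
  duration: 10,
  status: {
    started: 1453957327516,
    uptime: 132851437,
    cassandraAvailable: true,
    cassandraStatus: 'GREEN',
    managementAppIndexStatus: 'GREEN',
    queueDepth: 0
  }
};

console.log(isStackReady(sampleStatus));
```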

Initialize the Usergrid Database

Next, you must initialize the Usergrid database, index and query systems.

To do this you must issue a series of HTTP operations using the superuser credentials. You can only do this if Usergrid is configured to allow superuser login via the property usergrid.sysadmin.login.allowed=true; if you used the above example properties file, it is allowed.

The three operations you must perform are expressed by the curl commands below. Of course, you will have to change the password test to match the superuser password that you set in your Usergrid properties file.

curl -X PUT http://localhost:8080/system/database/setup -u superuser:test
curl -X PUT http://localhost:8080/system/database/bootstrap -u superuser:test
curl -X GET http://localhost:8080/system/superuser/setup -u superuser:test

When you issue each of those curl commands, you should see a success message like this:

{
  "action" : "cassandra setup",
  "status" : "ok",
  "timestamp" : 1454100922067,
  "duration" : 374
}
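
The same three calls can be scripted if you need to repeat the setup. The sketch below mirrors the curl commands above; the function names buildSetupRequests and runSetup are hypothetical, and fetchImpl is an injected fetch-compatible function so the sketch does not assume a particular HTTP client.

```javascript
// Build the three initialization requests, in the order they must run.
function buildSetupRequests(baseUrl, user, password) {
  const credentials = Buffer.from(`${user}:${password}`).toString('base64');
  const headers = { Authorization: `Basic ${credentials}` };
  return [
    { method: 'PUT', url: `${baseUrl}/system/database/setup`, headers },
    { method: 'PUT', url: `${baseUrl}/system/database/bootstrap`, headers },
    { method: 'GET', url: `${baseUrl}/system/superuser/setup`, headers }
  ];
}

// Run the requests one after another and stop at the first response
// whose "status" field is not "ok".
async function runSetup(requests, fetchImpl) {
  for (const { method, url, headers } of requests) {
    const response = await fetchImpl(url, { method, headers });
    const body = await response.json();
    if (body.status !== 'ok') {
      throw new Error(`${method} ${url} returned status ${body.status}`);
    }
  }
}

const setupRequests = buildSetupRequests('http://localhost:8080', 'superuser', 'test');
console.log(setupRequests.map(r => `${r.method} ${r.url}`).join('\n'));
```

On a recent Node.js with a global fetch, runSetup(setupRequests, fetch) would execute the sequence against a running stack.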

Now that you’ve gotten Usergrid up and running, you’re ready to deploy the Usergrid Portal.

Deploying the Usergrid Portal

The Usergrid Portal is an HTML5/JavaScript application, a bunch of static files that can be deployed to any web server, e.g. Apache HTTPD or Tomcat.

To deploy the Portal to a web server, un-tar the usergrid-portal.tar file into the directory that serves as the root directory of your web pages.

Once you have done that, there is one more step. You need to configure the portal so that it can find the Usergrid stack. You do that by editing portal/config.js and changing this line:

Usergrid.overrideUrl = 'http://localhost:8080/';

Set it to the hostname that you will be using for your Usergrid installation.

I have deployed and tested a sample instance. You can find ready-to-use configurations in the TechUtils repository.