#StackBounty: #java #elasticsearch #rest-client Elasticsearch Rest Client Still Giving IOException : Too Many Open Files

Bounty: 100

This is a follow-up to the solution that was provided to me in this previous post:

How to Properly Close Raw RestClient When Using Elastic Search 5.5.0 for Optimal Performance?

This same exact error message came back!

2017-09-29 18:50:22.497 ERROR 11099 --- [8080-Acceptor-0] org.apache.tomcat.util.net.NioEndpoint   : Socket accept failed

java.io.IOException: Too many open files
    at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method) ~[na:1.8.0_141]
    at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422) ~[na:1.8.0_141]
    at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250) ~[na:1.8.0_141]
    at org.apache.tomcat.util.net.NioEndpoint$Acceptor.run(NioEndpoint.java:453) ~[tomcat-embed-core-8.5.15.jar!/:8.5.15]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_141]

2017-09-29 18:50:23.885  INFO 11099 --- [Thread-3] ationConfigEmbeddedWebApplicationContext : Closing org.springframework.boot.context.embedded.AnnotationConfigEmbeddedWebApplicationContext@5387f9e0: startup date [Wed Sep 27 03:14:35 UTC 2017]; root of context hierarchy
2017-09-29 18:50:23.890  INFO 11099 --- [Thread-3] o.s.c.support.DefaultLifecycleProcessor  : Stopping beans in phase 2147483647
2017-09-29 18:50:23.891  WARN 11099 --- [Thread-3] o.s.c.support.DefaultLifecycleProcessor  : Failed to stop bean 'documentationPluginsBootstrapper'

    ... 7 common frames omitted

2017-09-29 18:50:53.891  WARN 11099 --- [Thread-3] o.s.c.support.DefaultLifecycleProcessor  : Failed to shut down 1 bean with phase value 2147483647 within timeout of 30000: [documentationPluginsBootstrapper]
2017-09-29 18:50:53.891  INFO 11099 --- [Thread-3] o.s.j.e.a.AnnotationMBeanExporter        : Unregistering JMX-exposed beans on shutdown
2017-09-29 18:50:53.894  INFO 11099 --- [Thread-3] com.app.controller.SearchController  : Closing the ES REST client

I tried using the solution from the previous post.

ElasticsearchConfig:

@Configuration
public class ElasticsearchConfig {

    @Value("${elasticsearch.host}")
    private String host;

    @Value("${elasticsearch.port}")
    private int port;

    @Bean
    public RestClient restClient() {
        return RestClient.builder(new HttpHost(host, port))
            .setRequestConfigCallback(new RestClientBuilder.RequestConfigCallback() {
                @Override
                public RequestConfig.Builder customizeRequestConfig(RequestConfig.Builder requestConfigBuilder) {
                    return requestConfigBuilder.setConnectTimeout(5000).setSocketTimeout(60000);
                }
            })
            .setMaxRetryTimeoutMillis(60000)
            .build();
    }
}

SearchController:

@RestController
@RequestMapping("/api/v1")
public class SearchController {

    // Note: the logger was not declared in the original snippet; an SLF4J logger is assumed here
    private static final Logger logger = LoggerFactory.getLogger(SearchController.class);

    @Autowired
    private RestClient restClient;

    @RequestMapping(value = "/search", method = RequestMethod.GET, produces = "application/json")
    public ResponseEntity<Object> getSearchQueryResults(@RequestParam(value = "criteria") String criteria) throws IOException {

        // Setup HTTP Headers
        HttpHeaders headers = new HttpHeaders();
        headers.add("Content-Type", "application/json");

        // Setup query and send and return ResponseEntity...

        Response response = this.restClient.performRequest(...);
    }

    @PreDestroy
    public void cleanup() {
        try {
            logger.info("Closing the ES REST client");
            this.restClient.close();
        }
        catch (IOException ioe) {
            logger.error("Problem occurred when closing the ES REST client", ioe);
        }
    }
}

pom.xml:

<parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>1.5.4.RELEASE</version>
</parent>

<dependencies>
    <!-- Spring -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>

    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-test</artifactId>
        <scope>test</scope>
    </dependency>

    <!-- Elasticsearch -->
    <dependency>
        <groupId>org.elasticsearch</groupId>
        <artifactId>elasticsearch</artifactId>
        <version>5.5.0</version>
    </dependency>

    <dependency>
        <groupId>org.elasticsearch.client</groupId>
        <artifactId>transport</artifactId>
        <version>5.5.0</version>
    </dependency>

    <!-- Apache Commons -->
    <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-lang3</artifactId>
        <version>3.6</version>
    </dependency>

    <!-- Log4j -->
    <dependency>
        <groupId>log4j</groupId>
        <artifactId>log4j</artifactId>
        <version>1.2.17</version>
    </dependency>
</dependencies>

This makes me think that the RestClient was never explicitly closing its connections in the first place…

And this is surprising, since my Elasticsearch Spring Boot based microservice is load balanced across two different AWS EC2 servers.

That exception occurred about 2,000 times in the log file, and only at the very end did the @PreDestroy cleanup() method close the client; see the INFO line from the cleanup() method being logged at the end of the stack trace.

Do I need to explicitly put a finally clause inside the SearchController and close the RestClient connection explicitly?

It's really critical that this IOException doesn't happen again, because a lot of different mobile clients (iOS & Android) depend on this Search Microservice.

I need this to be fault-tolerant and scalable… or, at the very least, not to break.

The only reason this is at the bottom of the log file:

2017-09-29 18:50:53.894  INFO 11099 --- [Thread-3] com.app.controller.SearchController : Closing the ES REST client

Is because I did this:

kill -3 jvm_pid
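For what it's worth, one way to confirm that file descriptors are actually leaking while the service runs (assuming a Linux host with lsof available; 11099 is the JVM pid from the log above) is:

lsof -p 11099 | wc -l                    # count the open file descriptors held by the JVM

grep 'open files' /proc/11099/limits     # the soft/hard limits the process runs under

If that count climbs steadily under load, sockets or files are being opened without being released.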

Should I keep the @PreDestroy cleanup() method but change the contents of my SearchController.getSearchQueryResults() method to reflect something like this:

@RequestMapping(value = "/search", method = RequestMethod.GET, produces = "application/json")
public ResponseEntity<Object> getSearchQueryResults(@RequestParam(value = "criteria") String criteria) throws IOException {
    // Setup HTTP Headers
    HttpHeaders headers = new HttpHeaders();
    headers.add("Content-Type", "application/json");

    // Setup query and send and return ResponseEntity...
    Response response = null;

    try {
        // Submit Query and Obtain Response
        response = this.restClient.performRequest("POST", endPoint, Collections.singletonMap("pretty", "true"), entity);
    }
    catch (IOException ioe) {
        logger.error("Exception when performing POST request", ioe);
    }
    finally {
        this.restClient.close();
    }
    // return response as EsResponse();
}

This way the RestClient connection is always closing…

I would appreciate it if someone could help me with this.



#StackBounty: #elasticsearch elasticsearch: Get all documents that share the same top-most ancestor

Bounty: 50

I am trying to get all documents that share the same top-most ancestor, where one child can be the parent, grandparent, grand grandparent etc. of multiple docs.

So let's say I have a structure like this (borrowed from https://www.elastic.co/guide/en/elasticsearch/reference/5.6/parent-join.html):

       (parent)
       question
        /    \
       /      \
  comment    answer
  (child)    (child)

In code:

PUT my_index
{
  "settings": {
    "mapping.single_type": true
  },
  "mappings": {
    "doc": {
      "properties": {
        "my_join_field": {
          "type": "join",
          "relations": {
            "question": ["answer", "comment"]
          }
        }
      }
    }
  }
}
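For reference, this is how documents are indexed with that mapping: the question is a plain parent, and every answer or comment carries the id of its direct parent and is routed so the whole tree lives on one shard (a sketch following the 5.6 parent-join docs; the ids match the tree below):

PUT my_index/doc/1?refresh
{
  "text": "This is a question",
  "my_join_field": "question"
}

PUT my_index/doc/5?routing=1&refresh
{
  "text": "This is an answer",
  "my_join_field": {
    "name": "answer",
    "parent": "1"
  }
}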

However, one can answer comments and comment on answers, theoretically forever. So say I have a question that is structured like this:

                        (id: 1)
                        question
                       /        \
                      /          \
                 answer          answer
                (id: 5)          (id: 8)
                /     \             |
               /       \            |
          comment    answer       answer
         (id: 15)   (id: 12)     (id: 9)
          /    \        |         /    \
         /      \       |        /      \
     answer  answer  comment  answer  answer
    (id: 16)(id: 17) (id: 19)(id: 10)(id: 11)

How do I get all the documents (ids 1, 5, 8, 9, 10, 11, 12, 15, 16, 17, 19), only knowing id 9?



#StackBounty: #performance #date #elasticsearch #filter Do additional date range filters increase the performance?

Bounty: 50

To an existing, large Elasticsearch 5 index I want to add a date field containing the date each document was indexed. Afterwards I want to query this index to return all documents created in the last minute.

In the Elasticsearch Definitive Guide for version 1 it is mentioned that adding additional filters for day, month and/or year can improve performance drastically. Newer versions of the guide no longer say so.

Can I gain performance in Elasticsearch 5 by adding additional date filters?
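For illustration, the pattern the old guide described would look roughly like this in my case (a sketch with a hypothetical created_at field; the first range is rounded to the hour, so its result is highly cacheable, while the second applies the precise last-minute cutoff):

GET my_index/_search
{
  "query": {
    "bool": {
      "filter": [
        { "range": { "created_at": { "gte": "now-1h/h" } } },
        { "range": { "created_at": { "gte": "now-1m" } } }
      ]
    }
  }
}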



#StackBounty: #elasticsearch #spring-boot #spring-data-elasticsearch ElasticSearch – Boosting based on depth in a recursive structure

Bounty: 50

I am using Elasticsearch 2.4.4 (compatible with Spring Boot 1.5.2).

I have a document object with the following structure:

{
    "id": 1,
    "title": "Doc title",
    // some more metadata
    "sections": [
        {
            "id": 2,
            "title": "Sec title 1",
            "sections": [...]
        },
        {
            "id": 3,
            "title": "Sec title 2",
            "sections": [...]
        }
    ]
}

Basically I want to make the titles in the document searchable (the document title, section titles and subsection titles at any level), and I want to be able to score the documents based on the level at which they match in the tree hierarchy.

My initial thought was to use some structure like this:

{
    "titles": [
        {
            "title": "doc title",
            "depth": 0
        },
        {
            "title": "sec title 1",
            "depth": 1
        },
        {
            "title": "sec title 2",
            "depth": 1
        },
        ......
    ]
}

I would like to rank the documents based on the depth at which there is a match (the higher the depth, the lower the score).
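For concreteness, the kind of query I have in mind would look roughly like this (just a sketch, assuming titles is mapped as a nested field and that inline scripting is enabled; untested):

GET my_index/_search
{
  "query": {
    "nested": {
      "path": "titles",
      "score_mode": "max",
      "query": {
        "function_score": {
          "query": { "match": { "titles.title": "search terms" } },
          "script_score": {
            "script": "_score / (1 + doc['titles.depth'].value)"
          }
        }
      }
    }
  }
}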

I know the basic boosting based on a field, but is there a way to do this in Elasticsearch?

OR

Is it possible to do it by changing the structure?



#StackBounty: #javascript #elasticsearch #browser #kibana Kibana stuck on loading screen

Bounty: 50

Kibana is not starting up properly. When I open up the console, it appears to be a JavaScript resource issue. When I open the JS files directly (clicking on their link in the console), it appears they are incomplete and have been abruptly cut off. I am not sure if this is a browser file limit or if my files have somehow been truncated. Please see the images below to show you what I am seeing.

Kibana stuck on loading screen

File as seen in Chrome. This is the very bottom of the file, as Chrome loads it.

commons.bundle.js

I have restarted Kibana to see if that would resolve it; no luck.

I think browsers have a maximum line limit in JS files. I am not sure why Kibana hasn't minified the JS files. Has it started up in some dev mode?

Question summary

I guess I have discovered that the reason for Kibana not loading is the JS not fully loading; this changes my question to: how can I get all of my JavaScript to load?

Update

I have located the JS files in the Kibana bundles folder and found that the file is fully intact. It is indeed an issue with the browser loading the complete file. I'm confused as to why those files are suddenly too long to be loaded by the browser; it was working fine a fortnight ago. I am still trying to work out how I can get Chrome to load the files.

As suggested by @asettouf, I removed (backed up) the bundles folder in the /opt/kibana/optimize directory and started Kibana up again. This did regenerate the bundles folder, but the files are identical, meaning I still have the same issue. Why is Kibana not minifying the JS when it bundles the files for caching?

My kibana.yml (I think it is cleaner to paste a link to it):

http://www.heypasteit.com/clip/O8HUN

I went back, turned on verbose logging, and this is my output from deleting the optimize folder and restarting. Nothing stands out as an error message to me.

/var/log/kibana/kibana.log

(I replaced the hostname with localhost for privacy and security reasons.)

http://www.heypasteit.com/clip/OA4OR

I think this is an error with the webpack module not compiling the JS correctly; however, I don't know enough about the module to debug it.

The files in question in the optimize folder are:

commons.bundle.js, which is 65723 lines

kibana.bundle.js, at 108950 lines

These are far from optimized, and the content inside the files is not minified.

Result of curl -v localhost:5601

http://www.heypasteit.com/clip/OEKEX



Installing Apache UserGrid on Linux

About the Project

Apache Usergrid is an open-source Backend-as-a-Service (BaaS or mBaaS) composed of an integrated distributed NoSQL database, an application layer and a client tier with SDKs, aimed at developers looking to rapidly build web and/or mobile applications. It provides elementary services and retrieval features like:

  • User Registration & Management
  • Data Storage
  • File Storage
  • Queues
  • Full Text Search
  • Geolocation Search
  • Joins

It is a multi-tenant system designed for deployment to public cloud environments (such as Amazon Web Services, Rackspace, etc.) or to run on traditional server infrastructures so that anyone can run their own private BaaS deployment.

For architects and back-end teams, it aims to provide a distributed, easily extendable, operationally predictable and highly scalable solution. For front-end developers, it aims to simplify the development process by enabling them to rapidly build and operate mobile and web applications without requiring backend expertise.

Usergrid 2.1.0 Deployment Guide

Though the Usergrid deployment guide seems simple enough, I faced certain hiccups, and it took me about 4 days to figure out what I was doing wrong.

This document explains how to deploy the Usergrid v2.1.0 Backend-as-a-Service (BaaS), which comprises the Usergrid Stack, a Java web application, and the Usergrid Portal, an HTML5/JavaScript application.

Prerequisites

Below are the software requirements for Usergrid 2.1.0 Stack and Portal. You can install them all on one computer for development purposes, and for deployment you can deploy them separately using clustering.

Linux or a UNIX-like system (Usergrid may run on Windows, but we haven’t tried it)

Download the Apache Usergrid 2.1.0 binary release from the official Usergrid releases page.

After untarring, the files that you need for deploying the Usergrid Stack and Portal are ROOT.war and usergrid-portal.tar.

Stack STEP #1: Setup Cassandra

As mentioned in the prerequisites, follow the installation guide given in the link.

Usergrid uses Cassandra's Thrift protocol. Before starting Cassandra on a 2.x release, you MUST enable Thrift by setting start_rpc in your cassandra.yaml file:

    # Whether to start the Thrift RPC server.
    start_rpc: true

Note: DataStax no longer supports the DataStax Community version of Apache Cassandra or the DataStax Distribution of Apache Cassandra. It is best to follow the Apache documentation.

Once you are up and running make a note of these things:

  • The name of the Cassandra cluster
  • Hostname or IP address of each Cassandra node
    • if Cassandra runs on the same machine as Usergrid, this is localhost; Usergrid will then run in single-machine embedded mode
  • Port number used for Cassandra RPC (the default is 9160)
  • Replication factor of Cassandra cluster
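A quick way to confirm most of these values (assuming the standard Cassandra tooling is on your PATH) is to ask the node itself:

    # reports the cluster name, snitch and schema versions
    nodetool describecluster

    # confirms whether the Thrift RPC server is running
    nodetool statusthrift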

Stack STEP #2: Setup ElasticSearch

Usergrid also needs access to at least one ElasticSearch node. As with Cassandra, you can set up a single ElasticSearch node on your computer, but you should run a cluster in production.

Steps:

  • Download and unzip Elasticsearch
  • Run bin/elasticsearch (or bin/elasticsearch -d on Linux to run it as a background process, or bin\elasticsearch.bat on Windows)
  • Run curl http://localhost:9200/
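If the node is up, that curl command returns a small JSON document; the exact values vary by version and node (the ones below are illustrative), and cluster_name is the value you need for the checklist below:

    {
      "name" : "node-1",
      "cluster_name" : "elasticsearch",
      "version" : {
        "number" : "..."
      },
      "tagline" : "You Know, for Search"
    }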

Once you are up and running make a note of these things:

  • The name of the ElasticSearch cluster
  • Hostname or IP address of each ElasticSearch node
    • if ElasticSearch runs on the same machine as Usergrid, this is localhost; Usergrid will then run in single-machine embedded mode
  • Port number used for ElasticSearch protocol (the default is 9200)

Stack STEP #3: Setup Tomcat

The Usergrid Stack is contained in a file named ROOT.war, a standard Java EE WAR ready for deployment to Tomcat. On each machine that will run the Usergrid Stack, you must install the Java SE 8 JDK and Tomcat 7+.
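To double-check the Tomcat and JDK versions that are in place (assuming CATALINA_HOME points at your Tomcat installation), you can run:

    # prints the Tomcat version together with the JVM it will use
    $CATALINA_HOME/bin/version.sh

    # confirms the JDK on the PATH
    java -version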

Stack STEP #4: Configure Usergrid Stack

You must create a Usergrid properties file called usergrid-deployment.properties. The properties in this file tell Usergrid how to communicate with Cassandra and ElasticSearch, and how to form URLs using the hostname you wish to use for Usergrid. There are many properties that you can set to configure Usergrid.

Once you have created your Usergrid properties file, place it in the Tomcat lib directory. On a Linux system, that directory is probably located at /path/to/tomcat7/lib/

The Default Usergrid Properties File

You should review the defaults in that file. To get you started, let's look at a minimal example properties file that you can edit and use as your own.
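Here is a minimal sketch; the property names follow the Usergrid 2.1 example file, but every value is an illustrative assumption that you must adjust to your own cluster:

    # Cassandra
    cassandra.url=localhost:9160
    cassandra.cluster=Test Cluster

    # ElasticSearch
    elasticsearch.cluster_name=elasticsearch
    elasticsearch.hosts=localhost
    elasticsearch.port=9300

    # Base URL that Usergrid uses to form its own URLs
    usergrid.api.url.base=http://localhost:8080

    # Superuser login, needed for the database setup calls later on
    usergrid.sysadmin.login.allowed=true
    usergrid.sysadmin.login.name=superuser
    usergrid.sysadmin.login.password=test
    usergrid.sysadmin.login.email=superuser@example.com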

Please note that if you are installing Usergrid on the same machine as the Cassandra server, then set the following property to true:

    # Tell Usergrid that Cassandra is embedded (running on this machine).
    cassandra.embedded=true

Stack STEP #5: Deploy ROOT.war to Tomcat

The next step is to deploy the Usergrid Stack software to Tomcat. There are a variety of ways of doing this; the simplest is probably to place the Usergrid Stack ROOT.war file into the Tomcat webapps directory and then restart Tomcat.
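For example (paths and service name are illustrative; adjust them to your layout):

    cp ROOT.war /path/to/tomcat7/webapps/
    sudo service tomcat7 restart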

Look for messages like this, which indicate that the ROOT.war file was deployed:

INFO: Starting service Catalina
Jan 29, 2016 1:00:32 PM org.apache.catalina.core.StandardEngine startInternal
INFO: Starting Servlet Engine: Apache Tomcat/7.0.59
Jan 29, 2016 1:00:32 PM org.apache.catalina.startup.HostConfig deployWAR
INFO: Deploying web application archive /usr/share/tomcat7/webapps/ROOT.war

Does it work?

You can use curl:

curl http://localhost:8080/status

If you get a JSON file of status data, then you’re ready to move to the next step. You should see a response that begins like this:

{
  "timestamp" : 1454090178953,
  "duration" : 10,
  "status" : {
    "started" : 1453957327516,
    "uptime" : 132851437,
    "version" : "201601240200-595955dff9ee4a706de9d97b86c5f0636fe24b43",
    "cassandraAvailable" : true,
    "cassandraStatus" : "GREEN",
    "managementAppIndexStatus" : "GREEN",
    "queueDepth" : 0,
    "org.apache.usergrid.count.AbstractBatcher" : {
      "add_invocation" : {
        "type" : "timer",
        "unit" : "microseconds",
        … etc. …

Initialize the Usergrid Database

Next, you must initialize the Usergrid database, index and query systems.

To do this you must issue a series of HTTP operations using the superuser credentials. You can only do this if Usergrid is configured to allow superuser login via the property usergrid.sysadmin.login.allowed=true; if you used the example properties file above, it is allowed.

The three operations you must perform are expressed by the curl commands below; of course, you will have to change the password test to match the superuser password that you set in your Usergrid properties file.

curl -X PUT http://localhost:8080/system/database/setup -u superuser:test
curl -X PUT http://localhost:8080/system/database/bootstrap -u superuser:test
curl -X GET http://localhost:8080/system/superuser/setup -u superuser:test

When you issue each of those curl commands, you should see a success message like this:

{
  "action" : "cassandra setup",
  "status" : "ok",
  "timestamp" : 1454100922067,
  "duration" : 374
}

Now that you’ve gotten Usergrid up and running, you’re ready to deploy the Usergrid Portal.

Deploying the Usergrid Portal

The Usergrid Portal is an HTML5/JavaScript application: a bunch of static files that can be deployed to any web server, e.g. Apache HTTPD or Tomcat.

To deploy the Portal to a web server, untar the usergrid-portal.tar file into a directory that serves as the root directory of your web pages.
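For example, assuming Apache HTTPD with its default document root (adjust the path for your server):

    tar -xvf usergrid-portal.tar -C /var/www/html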

Once you have done that, there is one more step. You need to configure the portal so that it can find the Usergrid Stack. You do that by editing portal/config.js and changing this line:

Usergrid.overrideUrl = 'http://localhost:8080/';

so that it points to the hostname that you will be using for your Usergrid installation.

I have deployed a sample instance and tested it. You can find the system-ready configurations in the TechUtils repository.

Apache Solr vs Elasticsearch: The Feature Smackdown

API

Solr 5.3.0 vs ElasticSearch 2.0:

  • Format: Solr = XML, CSV, JSON; ES = JSON
  • HTTP REST API
  • Binary API: Solr = SolrJ; ES = TransportClient, Thrift (through a plugin)
  • JMX support: Solr = yes; ES = ES-specific stats are exposed through the REST API instead
  • Official client libraries: Solr = Java; ES = Java, Groovy, PHP, Ruby, Perl, Python, .NET, JavaScript (official list of clients)
  • Community client libraries: Solr = PHP, Ruby, Perl, Scala, Python, .NET, JavaScript, Go, Erlang, Clojure; ES = Clojure, Cold Fusion, Erlang, Go, Groovy, Haskell, Java, JavaScript, .NET, OCaml, Perl, PHP, Python, R, Ruby, Scala, Smalltalk, Vert.x (complete list)
  • 3rd-party product integration (open-source): Solr = Drupal, Magento, Django, ColdFusion, WordPress, OpenCMS, Plone, Typo3, eZ Publish, Symfony2, Riak (via Yokozuna); ES = Drupal, Django, Symfony2, WordPress, CouchBase
  • 3rd-party product integration (commercial): Solr = DataStax Enterprise Search, Cloudera Search, Hortonworks Data Platform, MapR; ES = SearchBlox, Hortonworks Data Platform, MapR etc. (complete list)
  • Output: Solr = JSON, XML, PHP, Python, Ruby, CSV, Velocity, XSLT, native Java; ES = JSON, XML/HTML (via plugin)

Infrastructure

Solr 5.3.0 vs ElasticSearch 2.0:

  • Master-slave replication: Solr = only in non-SolrCloud (in SolrCloud it behaves identically to ES); ES = not an issue, because shards are replicated across nodes
  • Integrated snapshot and restore: Solr = filesystem; ES = filesystem, AWS Cloud Plugin for S3 repositories, HDFS Plugin for Hadoop environments, Azure Cloud Plugin for Azure storage repositories

Indexing

Solr 5.3.0 vs ElasticSearch 2.0:

  • Data Import: Solr = DataImportHandler (JDBC, CSV, XML, Tika, URL, Flat File); ES = rivers modules [DEPRECATED in 2.x] (ActiveMQ, Amazon SQS, CouchDB, Dropbox, DynamoDB, FileSystem, Git, GitHub, Hazelcast, JDBC, JMS, Kafka, LDAP, MongoDB, neo4j, OAI, RabbitMQ, Redis, RSS, Sofa, Solr, St9, Subversion, Twitter, Wikipedia)
  • ID field for updates and deduplication
  • DocValues
  • Partial Doc Updates: Solr = with stored fields; ES = with the _source field
  • Custom Analyzers and Tokenizers
  • Per-field analyzer chain
  • Per-doc/query analyzer chain
  • Synonyms: ES = supports Solr and WordNet synonym formats
  • Multiple indexes
  • Near-Realtime Search/Indexing
  • Complex documents
  • Schemaless: Solr = 4.4+; ES = yes
  • Multiple document types per schema: Solr = one set of fields per schema, one schema per core; ES = yes
  • Online schema changes: Solr = schemaless mode or via dynamic fields; ES = only backward-compatible changes
  • Apache Tika integration
  • Dynamic fields
  • Field copying: ES = via multi-fields
  • Hash-based deduplication: ES = via the Murmur plugin or ER plugin

Searching

Solr 5.3.0 vs ElasticSearch 2.0:

  • Lucene Query parsing
  • Structured Query DSL: Solr = need to programmatically create queries if going beyond the Lucene query syntax; ES = yes
  • Span queries: Solr = via SOLR-2703; ES = yes
  • Spatial/geo search
  • Multi-point spatial search
  • Faceting: ES = top N term accuracy can be controlled with shard_size
  • Advanced Faceting: Solr = new JSON faceting API; ES = yes (blog post)
  • Geo-distance Faceting
  • Pivot Facets
  • More Like This
  • Boosting by functions
  • Boosting using scripting languages
  • Push Queries: Solr = JIRA issue; ES = percolation (distributed percolation supported in 1.0)
  • Field collapsing/Results grouping
  • Spellcheck: ES = via the Suggest API
  • Autocomplete
  • Query elevation: ES = workaround
  • Joins: Solr = the joined index has to be single-shard and replicated across all nodes; ES = via has_children and top_children queries
  • Resultset Scrolling: Solr = new to 4.7.0; ES = via the scan search type
  • Filter queries: ES = also supports filtering by native scripts
  • Filter execution order: Solr = local params and cache property
  • Alternative QueryParsers: Solr = DisMax, eDisMax; ES = query_string, dis_max, match, multi_match etc.
  • Negative boosting: possible, but awkward; involves positively boosting the inverse set of negatively-boosted documents
  • Search across multiple indexes: Solr = can search across multiple compatible collections; ES = yes
  • Result highlighting
  • Custom Similarity
  • Searcher warming on index reload: ES = Warmers API
  • Term Vectors API

Customizability

Solr 5.3.0 vs ElasticSearch 2.0:

  • Pluggable API endpoints
  • Pluggable search workflow: Solr = via SearchComponents
  • Pluggable update workflow
  • Pluggable Analyzers/Tokenizers
  • Pluggable Field Types
  • Pluggable Function queries
  • Pluggable scoring scripts
  • Pluggable hashing
  • Pluggable webapps: ES = site plugins
  • Automated plugin installation: ES = installable from GitHub, maven, sonatype or elasticsearch.org

Full article