#StackBounty: #performance #date #elasticsearch #filter Do additional date range filters increase the performance?

Bounty: 50

To an existing, large ElasticSearch 5 index, I want to add a date field, containing the date of the indexation of each document. Afterwards I want to query this index, to return all documents, created in the last minute.

In the ElasticSearch Ultimative Guide for version 1 it is mentioned, that adding additional filters for day, month and/or year can improve the performance drastically. Newer versions of the guide do not say so anymore.

Can I gain performance in ElasticSearch 5 with adding additional date filters?


Get this bounty!!!

#StackBounty: #elasticsearch #spring-boot #spring-data-elasticsearch ElasticSearch – Boosting based on depth in a recursive structure

Bounty: 50

I am using Elastic search 2.4.4(compatible with spring boot 1.5.2).

I have a document object which has the following structure :

{
    id : 1,
    title : Doc title
    //some more metadata
    sections :[
        {
           "id" : 2,
           "title: Sec title 1,
           sections:[...]
        },{
           id : 3,
           title: Sec title 2,
           sections:[...]
        }

     ]
}

Basically I want to make the titles in the document searchable(all document title, section titles and subsection titles at any level) and I want to be able to score the documents based on the level at which they match in the tree hierarchy.

My initial thought was using some strcture like this :

 {
    titles:[
          {
           title : doc title,
           depth : 0
          },
          {
           title : sec title 1,
           depth : 1
          },
          {
           title : sec title 2,
           depth : 1
          },
          ......
   ] 
 }

I would like to rank the documents based on the depth at which there is match(higher the depth, lower is the score).

I know the basic boosting based on the field but,

is there a way can do this in elastic search?

OR

Is it possible to do it by changing the structure?


Get this bounty!!!

#StackBounty: #javascript #elasticsearch #browser #kibana Kibana stuck on loading screen

Bounty: 50

Kibana is not starting up properly. When I open up the console it appears to be a javascript resource issue. When I open the js files directly (clicking on their link in the console) it appears they are incomplete and have been abruptly been cut off. Not sure if this is a browser file limit or somehow my files have been cut off? Please see the images below to show you what Im seeing.

Kibana stuck on loading screen

File as seen in chrome. This is the very bottom of the file as per how chrome loads it.

commons.bundle.js

I have restarted kibana to see if that would resolve it, no luck.

I think browsers have a max line limit in js files. I am not sure why kibana hasn’t minified the js files? has it started up in some dev mode?

question summary

I guess I have discovered the reason for kibana not loading is because of the js not fully loading, this would change my question to how can I get all of my javascript to load?

Update

I have located the JS files in the kibana bundles folder and found that the file is fully intact. It is indeed a browser loading complete file issue. I’m confused why suddenly those files are too long to be loaded by the browser? Was working fine a fortnight ago. Still trying to work out how I can get chrome to load the files.

As suggested by @asettouf I have removed(backed up) bundles folder in the /opt/kibana/optimize directory and started kibana up again. This did re-generate the bundles folder but the files are identical, meaning I still have the same issue. How come Kibana is not minifying the js when it bundles the files for caching?

My kibana.yml. I think it is cleaner to paste a link to it:

http://www.heypasteit.com/clip/O8HUN

went back turned on verbose logging and this is my output from deleting optimize folder and restarting. nothing stands out as an error message to me.

/var/log/kibana/kibana.log

replaced hostname with localhost for privacy and security reasons

http://www.heypasteit.com/clip/OA4OR

I think this is an error with the webpack module not compiling the JS correctly. however i dont know enough about the module to debug it.

the files in question in the optimize folder are:

commons.bundle.js which is 65723 lines

kibana.bundle.js at 108950 lines

These are far from optimized and the content inside the files are not minified.

Result of curl -v localhost:5601

http://www.heypasteit.com/clip/OEKEX


Get this bounty!!!

Installing Apache UserGrid on linux

About the Project

Apache Usergrid is an open-source Backend-as-a-Service (BaaS or mBaaS) composed of an integrated distributed NoSQL database, application layer and client tier with SDKs for developers looking to rapidly build web and/or mobile applications. It provides elementary services and retrieval features like:

  • User Registration & Management
  • Data Storage
  • File Storage
  • Queues
  • Full Text Search
  • Geolocation Search
  • Joins

It is a multi-tenant system designed for deployment to public cloud environments (such as Amazon Web Services, Rackspace, etc.) or to run on traditional server infrastructures so that anyone can run their own private BaaS deployment.

For architects and back-end teams, it aims to provide a distributed, easily extendable, operationally predictable and highly scalable solution. For front-end developers, it aims to simplify the development process by enabling them to rapidly build and operate mobile and web applications without requiring backend expertise.

Usergrid 2.1.0 Deployment Guide

Though the Usergrid Deployment guide seems to be simple enough, I faced certain hiccups and it took me about 4 days to figure out what I was doing wrong.

This document explains how to deploy the Usergrid v2.1.0 Backend-as-a-Service (BaaS), which comprises the Usergrid Stack, a Java web application, and the Usergrid Portal, which is an HTML5/JavaScript application.

Prerequsites

Below are the software requirements for Usergrid 2.1.0 Stack and Portal. You can install them all on one computer for development purposes, and for deployment you can deploy them separately using clustering.

Linux or a UNIX-like system (Usergrid may run on Windows, but we haven’t tried it)

Download the Apache Usergrid 2.1.0 binary release from the official Usergrid releases page:

After untarring the files that you need for deploying Usergrid Stack and Portal are ROOT.war and usergrid-portal.tar.

Stack STEP #1: Setup Cassandra

As mentioned in prerequisites, follow the installation guide given in link

Usergrid uses Cassandra’s Thrift protocol
Before starting cassandra, on Cassandra 2.x releases you MUST enable Thrift by setting start_rpc in your cassandra.yaml file:

    #Whether to start the thrift rpc server.
    start_rpc: true

Note:DataStax no longer supports the DataStax Community version of Apache Cassandra or the DataStax Distribution of Apache Cassandra. It is best to follow the Apache Documentation

Once you are up and running make a note of these things:

  • The name of the Cassandra cluster
  • Hostname or IP address of each Cassandra node
    • in case of same machine as Usergrid, then localhost. Usergrid would then be running on single machine embedded mode.
  • Port number used for Cassandra RPC (the default is 9160)
  • Replication factor of Cassandra cluster

Stack STEP #2: Setup ElasticSearch

Usergrid also needs access to at least one ElasticSearch node. As with Cassandra, you can setup single ElasticSearch node on your computer, and you should run a cluster in production.

Steps:

  • Download and unzip Elasticsearch
  • Run bin/elasticsearch (or bin\elasticsearch -d on Linux as Background Process) (or bin\elasticsearch.bat on Windows)
  • Run curl http://localhost:9200/

Once you are up and running make a note of these things:

  • The name of the ElasticSearch cluster
  • Hostname or IP address of each ElasticSearch node
    • in case of same machine as Usergrid, then localhost. Usergrid would then be running on single machine embedded mode.
  • Port number used for ElasticSearch protocol (the default is 9200)

Stack STEP #3: Setup Tomcat

The Usergrid Stack is contained in a file named ROOT.war, a standard Java EE WAR ready for deployment to Tomcat. On each machine that will run the Usergrid Stack you must install the Java SE 8 JDK and Tomcat 7+.

Stack STEP #4: Configure Usergrid Stack

You must create a Usergrid properties file called usergrid-deployment.properties. The properties in this file tell Usergrid how to communicate with Cassandra and ElasticSearch, and how to form URLs using the hostname you wish to use for Usegrid. There are many properties that you can set to configure Usergrid.

Once you have created your Usergrid properties file, place it in the Tomcat lib directory. On a Linux system, that directory is probably located at /path/to/tomcat7/lib/

The Default Usergrid Properties File

You should review the defaults in the above file. To get you started, let’s look at a minimal example properties file that you can edit and use as your own.

Please note that if you are installing Usergrid on the same machine as Cassandra Server, then set the following property to true

   #Tell Usergrid that Cassandra is not embedded.
   cassandra.embedded=true

Stack STEP #5: Deploy ROOT.war to Tomcat

The next step is to deploy the Usergrid Stack software to Tomcat. There are a variety of ways of doing this and the simplest is probably to place the Usergrid Stack ROOT.war file into the Tomcat webapps directory, then restart Tomcat.

Look for messages like this, which indicate that the ROOT.war file was deployed:

INFO: Starting service Catalina
Jan 29, 2016 1:00:32 PM org.apache.catalina.core.StandardEngine startInternal
INFO: Starting Servlet Engine: Apache Tomcat/7.0.59
Jan 29, 2016 1:00:32 PM org.apache.catalina.startup.HostConfig deployWAR
INFO: Deploying web application archive /usr/share/tomcat7/webapps/ROOT.war

Does it work?

you can use curl:

curl http://localhost:8080/status

If you get a JSON file of status data, then you’re ready to move to the next step. You should see a response that begins like this:

{
“timestamp” : 1454090178953,
“duration” : 10,
“status” : {
“started” : 1453957327516,
“uptime” : 132851437,
“version” : “201601240200-595955dff9ee4a706de9d97b86c5f0636fe24b43”,
“cassandraAvailable” : true,
“cassandraStatus” : “GREEN”,
“managementAppIndexStatus” : “GREEN”,
“queueDepth” : 0,
“org.apache.usergrid.count.AbstractBatcher” : {
“add_invocation” : {
“type” : “timer”,
“unit” : “microseconds”,
… etc. …

Initialize the Usergrid Database

Next, you must initialize the Usergrid database, index and query systems.

To do this you must issue a series of HTTP operations using the superuser credentials. You can only do this if Usergrid is configured to allow superused login via this property usergrid.sysadmin.login.allowed=true and if you used the above example properties file, it is allowed.

The three operation you must perform are expressed by the curl commands below and, of course, you will have ot change the password test to match the superuser password that you set in your Usergrid properties file.

curl -X PUT http://localhost:8080/system/database/setup -u superuser:test
curl -X PUT http://localhost:8080/system/database/bootstrap -u superuser:test
curl -X GET http://localhost:8080/system/superuser/setup -u superuser:test

When you issue each of those curl commands, you should see a success message like this:

{
“action” : “cassandra setup”,
“status” : “ok”,
“timestamp” : 1454100922067,
“duration” : 374
}

Now that you’ve gotten Usergrid up and running, you’re ready to deploy the Usergrid Portal.

Deploying the Usergrid Portal

The Usergrid Portal is an HTML5/JavaScript application, a bunch of static files that can be deployed to any web server, e.g. Apache HTTPD or Tomcat.

To deploy the Portal to a web server, you will un-tar the usergrid-portal.tar file into directory that serves as the root directory of your web pages.

Once you have done that there is one more step. You need to configure the portal so that it can find the Usergrid stack. You do that by editing the portal/config.js and changing this line:

Usergrid.overrideUrl = ’http://localhost:8080/‘;

To set the hostname that you will be using for your Usergrid installation.

I have deployed a sample instance and tested the same. You can find the system ready configurations in TechUtils repository

Apache Solr vs Elasticsearch: The Feature Smackdown

API

Feature Solr 5.3.0 ElasticSearch 2.0
Format XML,CSV,JSON JSON
HTTP REST API
Binary API SolrJ TransportClient, Thrift (through a plugin)
JMX support ES specific stats are exposed through the REST API
Official client libraries Java Java, Groovy, PHP, Ruby, Perl, Python, .NET, JavascriptOfficial list of clients
Community client libraries PHP, Ruby, Perl, Scala, Python, .NET, Javascript, Go, Erlang, Clojure Clojure, Cold Fusion, Erlang, Go, Groovy, Haskell, Java, JavaScript, .NET, OCaml, Perl, PHP, Python, R, Ruby, Scala, Smalltalk, Vert.x Complete list
3rd-party product integration (open-source) Drupal, Magento, Django, ColdFusion, WordPress, OpenCMS, Plone, Typo3, ez Publish, Symfony2, Riak (via Yokozuna) Drupal, Django, Symfony2, WordPress, CouchBase
3rd-party product integration (commercial) DataStax Enterprise Search, Cloudera Search, Hortonworks Data Platform, MapR SearchBlox, Hortonworks Data Platform, MapR etcComplete list
Output JSON, XML, PHP, Python, Ruby, CSV, Velocity, XSLT, native Java JSON, XML/HTML (via plugin)

Infrastructure

Feature Solr 5.3.0 ElasticSearch 2.0
Master-slave replication Only in non-SolrCloud. In SolrCloud, behaves identically to ES. Not an issue because shards are replicated across nodes.
Integrated snapshot and restore Filesystem Filesystem, AWS Cloud Plugin for S3 repositories, HDFS Plugin for Hadoop environments, Azure Cloud Plugin for Azure storage repositories

Indexing

Feature Solr 5.3.0 ElasticSearch 2.0
Data Import DataImportHandler – JDBC, CSV, XML, Tika, URL, Flat File [DEPRECATED in 2.x] Rivers modules – ActiveMQ, Amazon SQS, CouchDB, Dropbox, DynamoDB, FileSystem, Git, GitHub, Hazelcast, JDBC, JMS, Kafka, LDAP, MongoDB, neo4j, OAI, RabbitMQ, Redis, RSS, Sofa, Solr, St9, Subversion, Twitter, Wikipedia
ID field for updates and deduplication
DocValues
Partial Doc Updates with stored fields with _source field
Custom Analyzers and Tokenizers
Per-field analyzer chain
Per-doc/query analyzer chain
Synonyms Supports Solr and Wordnet synonym format
Multiple indexes
Near-Realtime Search/Indexing
Complex documents
Schemaless 4.4+
Multiple document types per schema One set of fields per schema, one schema per core
Online schema changes Schemaless mode or via dynamic fields. Only backward-compatible changes.
Apache Tika integration
Dynamic fields
Field copying via multi-fields
Hash-based deduplication Murmur plugin or ER plugin

Searching

Feature Solr 5.3.0 ElasticSearch 2.0
Lucene Query parsing
Structured Query DSL Need to programmatically create queries if going beyond Lucene query syntax.
Span queries via SOLR-2703
Spatial/geo search
Multi-point spatial search
Faceting Top N term accuracy can be controlled with shard_size
Advanced Faceting New JSON faceting API blog post
Geo-distance Faceting
Pivot Facets
More Like This
Boosting by functions
Boosting using scripting languages
Push Queries JIRA issue Percolation. Distributed percolation supported in 1.0
Field collapsing/Results grouping
Spellcheck Suggest API
Autocomplete
Query elevation workaround
Joins Joined index has to be single-shard and replicated across all nodes. via has_children and top_children queries
Resultset Scrolling New to 4.7.0 via scan search type
Filter queries also supports filtering by native scripts
Filter execution order local params and cache property
Alternative QueryParsers DisMax, eDisMax query_string, dis_max, match, multi_match etc
Negative boosting but awkward. Involves positively boosting the inverse set of negatively-boosted documents.
Search across multiple indexes it can search across multiple compatible collections
Result highlighting
Custom Similarity
Searcher warming on index reload Warmers API
Term Vectors API

Customizability

Feature Solr 5.3.0 ElasticSearch 2.0
Pluggable API endpoints
Pluggable search workflow via SearchComponents
Pluggable update workflow
Pluggable Analyzers/Tokenizers
Pluggable Field Types
Pluggable Function queries
Pluggable scoring scripts
Pluggable hashing
Pluggable webapps site plugin
Automated plugin installation Installable from GitHub, maven, sonatype or elasticsearch.org

 

Full article