Apache Solr vs Elasticsearch: The Feature Smackdown

API

Feature Solr 5.3.0 ElasticSearch 2.0
Format XML,CSV,JSON JSON
HTTP REST API
Binary API SolrJ TransportClient, Thrift (through a plugin)
JMX support ES specific stats are exposed through the REST API
Official client libraries Java Java, Groovy, PHP, Ruby, Perl, Python, .NET, JavascriptOfficial list of clients
Community client libraries PHP, Ruby, Perl, Scala, Python, .NET, Javascript, Go, Erlang, Clojure Clojure, Cold Fusion, Erlang, Go, Groovy, Haskell, Java, JavaScript, .NET, OCaml, Perl, PHP, Python, R, Ruby, Scala, Smalltalk, Vert.x Complete list
3rd-party product integration (open-source) Drupal, Magento, Django, ColdFusion, WordPress, OpenCMS, Plone, Typo3, ez Publish, Symfony2, Riak (via Yokozuna) Drupal, Django, Symfony2, WordPress, CouchBase
3rd-party product integration (commercial) DataStax Enterprise Search, Cloudera Search, Hortonworks Data Platform, MapR SearchBlox, Hortonworks Data Platform, MapR etcComplete list
Output JSON, XML, PHP, Python, Ruby, CSV, Velocity, XSLT, native Java JSON, XML/HTML (via plugin)

Infrastructure

Feature Solr 5.3.0 ElasticSearch 2.0
Master-slave replication Only in non-SolrCloud. In SolrCloud, behaves identically to ES. Not an issue because shards are replicated across nodes.
Integrated snapshot and restore Filesystem Filesystem, AWS Cloud Plugin for S3 repositories, HDFS Plugin for Hadoop environments, Azure Cloud Plugin for Azure storage repositories

Indexing

Feature Solr 5.3.0 ElasticSearch 2.0
Data Import DataImportHandler – JDBC, CSV, XML, Tika, URL, Flat File [DEPRECATED in 2.x] Rivers modules – ActiveMQ, Amazon SQS, CouchDB, Dropbox, DynamoDB, FileSystem, Git, GitHub, Hazelcast, JDBC, JMS, Kafka, LDAP, MongoDB, neo4j, OAI, RabbitMQ, Redis, RSS, Sofa, Solr, St9, Subversion, Twitter, Wikipedia
ID field for updates and deduplication
DocValues
Partial Doc Updates with stored fields with _source field
Custom Analyzers and Tokenizers
Per-field analyzer chain
Per-doc/query analyzer chain
Synonyms Supports Solr and Wordnet synonym format
Multiple indexes
Near-Realtime Search/Indexing
Complex documents
Schemaless 4.4+
Multiple document types per schema One set of fields per schema, one schema per core
Online schema changes Schemaless mode or via dynamic fields. Only backward-compatible changes.
Apache Tika integration
Dynamic fields
Field copying via multi-fields
Hash-based deduplication Murmur plugin or ER plugin

Searching

Feature Solr 5.3.0 ElasticSearch 2.0
Lucene Query parsing
Structured Query DSL Need to programmatically create queries if going beyond Lucene query syntax.
Span queries via SOLR-2703
Spatial/geo search
Multi-point spatial search
Faceting Top N term accuracy can be controlled with shard_size
Advanced Faceting New JSON faceting API blog post
Geo-distance Faceting
Pivot Facets
More Like This
Boosting by functions
Boosting using scripting languages
Push Queries JIRA issue Percolation. Distributed percolation supported in 1.0
Field collapsing/Results grouping
Spellcheck Suggest API
Autocomplete
Query elevation workaround
Joins Joined index has to be single-shard and replicated across all nodes. via has_children and top_children queries
Resultset Scrolling New to 4.7.0 via scan search type
Filter queries also supports filtering by native scripts
Filter execution order local params and cache property
Alternative QueryParsers DisMax, eDisMax query_string, dis_max, match, multi_match etc
Negative boosting but awkward. Involves positively boosting the inverse set of negatively-boosted documents.
Search across multiple indexes it can search across multiple compatible collections
Result highlighting
Custom Similarity
Searcher warming on index reload Warmers API
Term Vectors API

Customizability

Feature Solr 5.3.0 ElasticSearch 2.0
Pluggable API endpoints
Pluggable search workflow via SearchComponents
Pluggable update workflow
Pluggable Analyzers/Tokenizers
Pluggable Field Types
Pluggable Function queries
Pluggable scoring scripts
Pluggable hashing
Pluggable webapps site plugin
Automated plugin installation Installable from GitHub, maven, sonatype or elasticsearch.org

 

Full article



How to find e-mail address source

Some times we need to find the source location of the sender of an email, or even the IP Address of the sender. This may not be a general case requirement, but even for the fun of it, or just out of curiosity, the recipient would want to search for such information.

One of our twitter followers asked us the same question on twitter

I decided to try this on one of the spam advertisers:

In the following steps you’ll learn how to find and copy an email header and paste it into the Trace Email Analyzer to get the sender’s IP address and track the source.

Would you like to track down (or trace) where an email that you received came from?

This Trace Email tool can help you do precisely that. It works by examining the header that is a part of the emails you receive to find the IP address. If you read the IP Lookup page, you’ll get a clear idea of what information an IP address can reveal.

(A header is the unseen part of every sent and received email. To learn a little bit more on headers, click here. You can see an example of a header at the end of this article.)

What email provider do you use?

To find the IP address of a received email you’re curious about, open the email and look for the header details. How you find that email’s header depends on the email program you use. Do you use Gmail or Yahoo? Hotmail or Outlook?

For example, if you’re a Gmail user, here are the steps you’d take:

  1. Open the message you want to view
  2. Click the down arrow next to the “Reply” link
  3. Select “Show Original” to open a new window with the full headers

STEPS TO TRACING AN EMAIL:

  1. Get instructions for locating a header for your email provider here
  2. Open the email you want to trace and find its header
  3. Copy the header, then paste it into the Trace Email Analyzer below
  4. Press the “Get Source” button
  5. Scroll down below the box for the Trace Email results!

You should know that in some instances people send emails with false or “forged” headers, which are common in spam and unwanted or even malicious e-mail. Our Trace Email tool does not and cannot detect forged e-mail. That’s why that person forged the header to begin with!

Results:


source: http://whatismyipaddress.com/





What is the dumbest code ever made that has become famous?

BogoSort is a moderately well-known algorithm to sort a list. Here’s how it works:

  1. Put the elements of the list in a random order.
  2. Check if the list is sorted. If not, start over.

BogoSort has an average running time of O((n+1)!), which is not very good. It is also the rare algorithm which has NO worst-case running time; if the input has at least two elements, it is possible for the algorithm to run for any amount of time.

Source: Quora


CodeEval: Penultimate Word

Challenge Description:

Write a program which finds the next-to-last word in a string.

Input Sample:

Your program should accept as its first argument a path to a filename. Input example is the following

some line with text
another line

Each line has more than one word.

Output Sample:

Print the next-to-last word in the following way.

with
another

Solution: