Installing Apache Usergrid on Linux

About the Project

Apache Usergrid is an open-source Backend-as-a-Service (BaaS or mBaaS) composed of an integrated distributed NoSQL database, an application layer and a client tier with SDKs, aimed at developers looking to rapidly build web and/or mobile applications. It provides elementary services and retrieval features such as:

  • User Registration & Management
  • Data Storage
  • File Storage
  • Queues
  • Full Text Search
  • Geolocation Search
  • Joins

It is a multi-tenant system designed for deployment to public cloud environments (such as Amazon Web Services, Rackspace, etc.) or to run on traditional server infrastructures so that anyone can run their own private BaaS deployment.

For architects and back-end teams, it aims to provide a distributed, easily extendable, operationally predictable and highly scalable solution. For front-end developers, it aims to simplify the development process by enabling them to rapidly build and operate mobile and web applications without requiring backend expertise.

Usergrid 2.1.0 Deployment Guide

Though the Usergrid deployment guide seems simple enough, I hit several hiccups, and it took me about four days to figure out what I was doing wrong.

This document explains how to deploy the Usergrid v2.1.0 Backend-as-a-Service (BaaS), which comprises the Usergrid Stack, a Java web application, and the Usergrid Portal, which is an HTML5/JavaScript application.

Prerequisites

Below are the software requirements for the Usergrid 2.1.0 Stack and Portal. You can install them all on one computer for development purposes; for production you can deploy them on separate, clustered machines.

Linux or a UNIX-like system (Usergrid may run on Windows, but we haven’t tried it)

Download the Apache Usergrid 2.1.0 binary release from the official Usergrid releases page.

After untarring, the files that you need for deploying the Usergrid Stack and Portal are ROOT.war and usergrid-portal.tar.

Stack STEP #1: Setup Cassandra

As mentioned in the prerequisites, follow the Cassandra installation guide given in the link.

Usergrid uses Cassandra's Thrift protocol. On Cassandra 2.x releases you MUST enable Thrift before starting Cassandra, by setting start_rpc in your cassandra.yaml file:

    # Whether to start the Thrift RPC server.
    start_rpc: true

Note: DataStax no longer supports the DataStax Community version of Apache Cassandra or the DataStax Distribution of Apache Cassandra, so it is best to follow the Apache documentation.

Once you are up and running make a note of these things:

  • The name of the Cassandra cluster
  • Hostname or IP address of each Cassandra node
    • if Cassandra runs on the same machine as Usergrid, this is localhost; Usergrid will then run in single-machine (embedded) mode.
  • Port number used for Cassandra RPC (the default is 9160)
  • Replication factor of Cassandra cluster
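Before moving on, you can confirm that Thrift is actually listening (assuming default ports, and that nodetool from the Cassandra distribution is on your PATH):

    nodetool statusthrift        # should print: running
    nc -zv localhost 9160        # should report the RPC port as open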

Stack STEP #2: Setup ElasticSearch

Usergrid also needs access to at least one ElasticSearch node. As with Cassandra, you can set up a single ElasticSearch node on your computer, but you should run a cluster in production.

Steps:

  • Download and unzip Elasticsearch
  • Run bin/elasticsearch (use bin/elasticsearch -d to run it as a background process on Linux, or bin\elasticsearch.bat on Windows)
  • Run curl http://localhost:9200/

Once you are up and running make a note of these things:

  • The name of the ElasticSearch cluster
  • Hostname or IP address of each ElasticSearch node
    • if ElasticSearch runs on the same machine as Usergrid, this is localhost; Usergrid will then run in single-machine (embedded) mode.
  • Port number used for ElasticSearch protocol (the default is 9200)
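If the node is up, the curl check returns a JSON document along these lines (illustrative output from an ElasticSearch 1.x node); cluster_name is the value you will need later:

    {
      "status" : 200,
      "name" : "Node-1",
      "cluster_name" : "elasticsearch",
      "version" : {
        "number" : "1.7.5",
        ...
      },
      "tagline" : "You Know, for Search"
    }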

Stack STEP #3: Setup Tomcat

The Usergrid Stack is contained in a file named ROOT.war, a standard Java EE WAR ready for deployment to Tomcat. On each machine that will run the Usergrid Stack you must install the Java SE 8 JDK and Tomcat 7+.

Stack STEP #4: Configure Usergrid Stack

You must create a Usergrid properties file called usergrid-deployment.properties. The properties in this file tell Usergrid how to communicate with Cassandra and ElasticSearch, and how to form URLs using the hostname you wish to use for Usergrid. There are many properties that you can set to configure Usergrid.

Once you have created your Usergrid properties file, place it in the Tomcat lib directory. On a Linux system, that directory is probably located at /path/to/tomcat7/lib/

The Default Usergrid Properties File

You should review the defaults in the above file. To get you started, let’s look at a minimal example properties file that you can edit and use as your own.
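A sketch of such a minimal file is shown below. The property names follow the Usergrid 2.1 deployment guide's example file, but treat the values as placeholders and verify them against the default properties file shipped with your release:

    # Cassandra
    cassandra.url=localhost:9160
    cassandra.cluster=Test Cluster

    # ElasticSearch
    elasticsearch.cluster_name=elasticsearch
    elasticsearch.hosts=localhost
    elasticsearch.port=9300

    # Base URL that Usergrid uses to form URLs
    usergrid.api.url.base=http://localhost:8080

    # Superuser login, needed for the setup calls later in this guide
    usergrid.sysadmin.login.allowed=true
    usergrid.sysadmin.login.name=superuser
    usergrid.sysadmin.login.password=test
    usergrid.sysadmin.login.email=superuser@example.com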

Please note that if you are installing Usergrid on the same machine as the Cassandra server, then set the following property to true:

   # Tell Usergrid that Cassandra is embedded (running on the same machine).
   cassandra.embedded=true

Stack STEP #5: Deploy ROOT.war to Tomcat

The next step is to deploy the Usergrid Stack software to Tomcat. There are a variety of ways of doing this and the simplest is probably to place the Usergrid Stack ROOT.war file into the Tomcat webapps directory, then restart Tomcat.
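On a typical Linux install that amounts to something like the following (paths and the service name vary by distribution and are illustrative):

    sudo cp ROOT.war /usr/share/tomcat7/webapps/
    sudo service tomcat7 restart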

Look for messages like this, which indicate that the ROOT.war file was deployed:

INFO: Starting service Catalina
Jan 29, 2016 1:00:32 PM org.apache.catalina.core.StandardEngine startInternal
INFO: Starting Servlet Engine: Apache Tomcat/7.0.59
Jan 29, 2016 1:00:32 PM org.apache.catalina.startup.HostConfig deployWAR
INFO: Deploying web application archive /usr/share/tomcat7/webapps/ROOT.war

Does it work?

You can use curl:

curl http://localhost:8080/status

If you get a JSON file of status data, then you’re ready to move to the next step. You should see a response that begins like this:

{
  "timestamp" : 1454090178953,
  "duration" : 10,
  "status" : {
    "started" : 1453957327516,
    "uptime" : 132851437,
    "version" : "201601240200-595955dff9ee4a706de9d97b86c5f0636fe24b43",
    "cassandraAvailable" : true,
    "cassandraStatus" : "GREEN",
    "managementAppIndexStatus" : "GREEN",
    "queueDepth" : 0,
    "org.apache.usergrid.count.AbstractBatcher" : {
      "add_invocation" : {
        "type" : "timer",
        "unit" : "microseconds",
        ... etc. ...

Initialize the Usergrid Database

Next, you must initialize the Usergrid database, index and query systems.

To do this you must issue a series of HTTP operations using the superuser credentials. You can only do this if Usergrid is configured to allow superuser login via the property usergrid.sysadmin.login.allowed=true; if you used the example properties file above, it is allowed.

The three operations you must perform are expressed by the curl commands below and, of course, you will have to change the password test to match the superuser password that you set in your Usergrid properties file.

curl -X PUT http://localhost:8080/system/database/setup -u superuser:test
curl -X PUT http://localhost:8080/system/database/bootstrap -u superuser:test
curl -X GET http://localhost:8080/system/superuser/setup -u superuser:test

When you issue each of those curl commands, you should see a success message like this:

{
  "action" : "cassandra setup",
  "status" : "ok",
  "timestamp" : 1454100922067,
  "duration" : 374
}

Now that you’ve gotten Usergrid up and running, you’re ready to deploy the Usergrid Portal.

Deploying the Usergrid Portal

The Usergrid Portal is an HTML5/JavaScript application, a bunch of static files that can be deployed to any web server, e.g. Apache HTTPD or Tomcat.

To deploy the Portal to a web server, un-tar the usergrid-portal.tar file into a directory that serves as the root directory of your web pages.

Once you have done that, there is one more step. You need to configure the portal so that it can find the Usergrid Stack. You do that by editing portal/config.js and changing this line:

Usergrid.overrideUrl = 'http://localhost:8080/';

to point at the hostname that you will be using for your Usergrid installation.

I have deployed and tested a sample instance. You can find ready-to-use configurations in the TechUtils repository.

Apache Ignite: What is Ignite?

Apache Ignite(TM) In-Memory Data Fabric is a high-performance, integrated and distributed in-memory platform for computing and transacting on large-scale data sets in real-time, orders of magnitude faster than possible with traditional disk-based or flash-based technologies.


FEATURES

You can view Ignite as a collection of independent, well-integrated, in-memory components geared to improve performance and scalability of your application. Some of these components include the Data Grid (a distributed key-value store), the Compute Grid, the Service Grid, and Streaming.


Apache Ignite APIs

Apache Ignite has a rich set of APIs that are covered throughout the documentation. The APIs are implemented in the form of native libraries for major languages and technologies such as Java, .NET and C++, and by supporting a variety of protocols like REST, Memcached or Redis.
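As a small taste of the Java API, here is a minimal sketch that starts an Ignite node and works with a distributed cache (the cache name and values are illustrative; assumes ignite-core is on the classpath):

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;

public class IgniteHello {
    public static void main(String[] args) {
        // Start (or join) an Ignite node in this JVM.
        try (Ignite ignite = Ignition.start()) {
            // Create or look up a distributed key-value cache.
            IgniteCache<Integer, String> cache = ignite.getOrCreateCache("myCache");
            cache.put(1, "Hello");
            cache.put(2, "Ignite");
            System.out.println(cache.get(1) + " " + cache.get(2));
        }
    }
}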

The documentation located under this domain is mostly related to Java. Refer to the following documentation sections and domains to learn more about alternative technologies and protocols you can use to connect to and work with an Apache Ignite cluster:

Fork it on GitHub

Apache Commons DbUtils Mini Wrapper

This is a very small database connector written in Java, as a wrapper class around Apache DbUtils.

The Commons DbUtils library is a small set of classes designed to make working with JDBC easier. JDBC resource cleanup code is mundane, error prone work so these classes abstract out all of the cleanup tasks from your code leaving you with what you really wanted to do with JDBC in the first place: query and update data.

Some of the advantages of using DbUtils are:

  • No possibility for resource leaks. Correct JDBC coding isn’t difficult but it is time-consuming and tedious. This often leads to connection leaks that may be difficult to track down.
  • Cleaner, clearer persistence code. The amount of code needed to persist data in a database is drastically reduced. The remaining code clearly expresses your intention without being cluttered with resource cleanup.
  • Automatically populate Java Bean properties from Result Sets. You don’t need to manually copy column values into bean instances by calling setter methods. Each row of the Result Set can be represented by one fully populated bean instance.

DbUtils is designed to be:

  • Small – you should be able to understand the whole package in a short amount of time.
  • Transparent – DbUtils doesn’t do any magic behind the scenes. You give it a query, it executes it and cleans up for you.
  • Fast – You don’t need to create a million temporary objects to work with DbUtils.

DbUtils is not:

  • An Object/Relational bridge – there are plenty of good O/R tools already. DbUtils is for developers looking to use JDBC without all the mundane pieces.
  • A Data Access Object (DAO) framework – DbUtils can be used to build a DAO framework though.
  • An object oriented abstraction of general database objects like a Table, Column, or Primary Key.
  • A heavyweight framework of any kind – the goal here is to be a straightforward and easy to use JDBC helper library.

Wrapper:
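A minimal sketch of such a wrapper, built on DbUtils' QueryRunner with a BeanListHandler (the class name and methods here are illustrative; you supply your own DataSource):

import java.sql.SQLException;
import java.util.List;
import javax.sql.DataSource;

import org.apache.commons.dbutils.QueryRunner;
import org.apache.commons.dbutils.handlers.BeanListHandler;

public class DbHelper {
    private final QueryRunner runner;

    public DbHelper(DataSource dataSource) {
        // QueryRunner takes care of statements, parameter binding and cleanup.
        this.runner = new QueryRunner(dataSource);
    }

    // Run a SELECT and map each row onto a bean of the given type.
    public <T> List<T> query(String sql, Class<T> type, Object... params) throws SQLException {
        return runner.query(sql, new BeanListHandler<T>(type), params);
    }

    // Run an INSERT/UPDATE/DELETE; returns the number of affected rows.
    public int update(String sql, Object... params) throws SQLException {
        return runner.update(sql, params);
    }
}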

Apache Solr vs Elasticsearch: The Feature Smackdown

API

Feature | Solr 5.3.0 | ElasticSearch 2.0
Format | XML, CSV, JSON | JSON
HTTP REST API | Yes | Yes
Binary API | SolrJ | TransportClient, Thrift (through a plugin)
JMX support | Yes | ES-specific stats are exposed through the REST API
Official client libraries | Java | Java, Groovy, PHP, Ruby, Perl, Python, .NET, Javascript (official list of clients)
Community client libraries | PHP, Ruby, Perl, Scala, Python, .NET, Javascript, Go, Erlang, Clojure | Clojure, Cold Fusion, Erlang, Go, Groovy, Haskell, Java, JavaScript, .NET, OCaml, Perl, PHP, Python, R, Ruby, Scala, Smalltalk, Vert.x (complete list)
3rd-party product integration (open-source) | Drupal, Magento, Django, ColdFusion, WordPress, OpenCMS, Plone, Typo3, ez Publish, Symfony2, Riak (via Yokozuna) | Drupal, Django, Symfony2, WordPress, CouchBase
3rd-party product integration (commercial) | DataStax Enterprise Search, Cloudera Search, Hortonworks Data Platform, MapR | SearchBlox, Hortonworks Data Platform, MapR, etc. (complete list)
Output | JSON, XML, PHP, Python, Ruby, CSV, Velocity, XSLT, native Java | JSON, XML/HTML (via plugin)

Infrastructure

Feature | Solr 5.3.0 | ElasticSearch 2.0
Master-slave replication | Only in non-SolrCloud. In SolrCloud, behaves identically to ES. | Not an issue because shards are replicated across nodes.
Integrated snapshot and restore | Filesystem | Filesystem, AWS Cloud Plugin for S3 repositories, HDFS Plugin for Hadoop environments, Azure Cloud Plugin for Azure storage repositories

Indexing

Feature | Solr 5.3.0 | ElasticSearch 2.0
Data Import | DataImportHandler – JDBC, CSV, XML, Tika, URL, Flat File | [DEPRECATED in 2.x] Rivers modules – ActiveMQ, Amazon SQS, CouchDB, Dropbox, DynamoDB, FileSystem, Git, GitHub, Hazelcast, JDBC, JMS, Kafka, LDAP, MongoDB, neo4j, OAI, RabbitMQ, Redis, RSS, Sofa, Solr, St9, Subversion, Twitter, Wikipedia
ID field for updates and deduplication | Yes | Yes
DocValues | Yes | Yes
Partial Doc Updates | with stored fields | with _source field
Custom Analyzers and Tokenizers | Yes | Yes
Per-field analyzer chain | Yes | Yes
Per-doc/query analyzer chain | |
Synonyms | Yes | Supports Solr and Wordnet synonym format
Multiple indexes | Yes | Yes
Near-Realtime Search/Indexing | Yes | Yes
Complex documents | Yes | Yes
Schemaless | 4.4+ | Yes
Multiple document types per schema | One set of fields per schema, one schema per core | Yes
Online schema changes | Schemaless mode or via dynamic fields. | Only backward-compatible changes.
Apache Tika integration | Yes |
Dynamic fields | Yes |
Field copying | Yes | via multi-fields
Hash-based deduplication | Yes | Murmur plugin or ER plugin

Searching

Feature | Solr 5.3.0 | ElasticSearch 2.0
Lucene Query parsing | Yes | Yes
Structured Query DSL | Need to programmatically create queries if going beyond Lucene query syntax. | Yes
Span queries | via SOLR-2703 | Yes
Spatial/geo search | Yes | Yes
Multi-point spatial search | Yes | Yes
Faceting | Yes | Top N term accuracy can be controlled with shard_size
Advanced Faceting | New JSON faceting API (blog post) | Yes
Geo-distance Faceting | Yes | Yes
Pivot Facets | Yes |
More Like This | Yes | Yes
Boosting by functions | Yes | Yes
Boosting using scripting languages | Yes | Yes
Push Queries | JIRA issue | Percolation. Distributed percolation supported in 1.0
Field collapsing/Results grouping | Yes |
Spellcheck | Yes | Suggest API
Autocomplete | Yes | Yes
Query elevation | Yes | workaround
Joins | Joined index has to be single-shard and replicated across all nodes | via has_children and top_children queries
Resultset Scrolling | New to 4.7.0 | via scan search type
Filter queries | Yes | also supports filtering by native scripts
Filter execution order | local params and cache property |
Alternative QueryParsers | DisMax, eDisMax | query_string, dis_max, match, multi_match etc.
Negative boosting | Yes, but awkward. Involves positively boosting the inverse set of negatively-boosted documents. | Yes
Search across multiple indexes | it can search across multiple compatible collections | Yes
Result highlighting | Yes | Yes
Custom Similarity | Yes | Yes
Searcher warming on index reload | Yes | Warmers API
Term Vectors API | Yes | Yes

Customizability

Feature | Solr 5.3.0 | ElasticSearch 2.0
Pluggable API endpoints | Yes |
Pluggable search workflow | via SearchComponents |
Pluggable update workflow | Yes |
Pluggable Analyzers/Tokenizers | Yes | Yes
Pluggable Field Types | Yes |
Pluggable Function queries | Yes |
Pluggable scoring scripts | | Yes
Pluggable hashing | |
Pluggable webapps | | site plugin
Automated plugin installation | | Installable from GitHub, maven, sonatype or elasticsearch.org

Full article

How to find e-mail address source

Sometimes we need to find the source location of the sender of an email, or even the IP address of the sender. This may not be a common requirement, but even for the fun of it, or just out of curiosity, a recipient may want to look up such information.

One of our Twitter followers asked us exactly this question:

I decided to try this on one of the spam advertisers:

In the following steps you’ll learn how to find and copy an email header and paste it into the Trace Email Analyzer to get the sender’s IP address and track the source.

Would you like to track down (or trace) where an email that you received came from?

This Trace Email tool can help you do precisely that. It works by examining the header that is a part of the emails you receive to find the IP address. If you read the IP Lookup page, you’ll get a clear idea of what information an IP address can reveal.

(A header is the unseen part of every sent and received email. To learn a little bit more on headers, click here. You can see an example of a header at the end of this article.)

What email provider do you use?

To find the IP address of a received email you’re curious about, open the email and look for the header details. How you find that email’s header depends on the email program you use. Do you use Gmail or Yahoo? Hotmail or Outlook?

For example, if you’re a Gmail user, here are the steps you’d take:

  1. Open the message you want to view
  2. Click the down arrow next to the “Reply” link
  3. Select “Show Original” to open a new window with the full headers
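The window that opens shows the raw message, including the Received: lines added by each mail server that handled it; the bottom-most Received: line is usually the one closest to the true origin. An illustrative (entirely fake) excerpt:

    Received: from mail.example.com (mail.example.com [203.0.113.25])
            by mx.google.com with ESMTPS id x123abc
            for <you@gmail.com>; Fri, 29 Jan 2016 13:00:32 -0800 (PST)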

STEPS TO TRACING AN EMAIL:

  1. Get instructions for locating a header for your email provider here
  2. Open the email you want to trace and find its header
  3. Copy the header, then paste it into the Trace Email Analyzer below
  4. Press the “Get Source” button
  5. Scroll down below the box for the Trace Email results!

You should know that in some instances people send emails with false or “forged” headers, which are common in spam and unwanted or even malicious e-mail. Our Trace Email tool does not and cannot detect forged e-mail. That’s why that person forged the header to begin with!

Results:


source: http://whatismyipaddress.com/

Unirest: Lightweight HTTP Request Client Libraries


Unirest is a set of lightweight HTTP libraries available in multiple languages, built and maintained by the Mashape team.

Unirest.post("http://httpbin.org/post")
  .queryString("name", "Mark")
  .field("last", "Polo")
  .asJson()

Features

  • Make GET, POST, PUT, PATCH, DELETE, HEAD, OPTIONS requests
  • Both synchronous and asynchronous (non-blocking) requests
  • Supports form parameters, file uploads and custom body entities
  • Easily add route parameters without ugly string concatenations
  • Supports gzip
  • Supports Basic Authentication natively
  • Customizable timeout, concurrency levels and proxy settings
  • Customizable default headers for every request (DRY)
  • Customizable HttpClient and HttpAsyncClient implementation
  • Automatic JSON parsing into a native object for JSON responses
  • Customizable binding, with mapping from response body to java Object

Installing

Is easy as pie. Kidding. It’s about as easy as doing these little steps:

With Maven

You can use Maven by including the library:

<dependency>
    <groupId>com.mashape.unirest</groupId>
    <artifactId>unirest-java</artifactId>
    <version>1.4.7</version>
</dependency>

Unirest-Java has a few dependencies of its own; Maven installs these automatically, and they are as follows:

<dependency>
  <groupId>org.apache.httpcomponents</groupId>
  <artifactId>httpclient</artifactId>
  <version>4.3.6</version>
</dependency>
<dependency>
  <groupId>org.apache.httpcomponents</groupId>
  <artifactId>httpasyncclient</artifactId>
  <version>4.0.2</version>
</dependency>
<dependency>
  <groupId>org.apache.httpcomponents</groupId>
  <artifactId>httpmime</artifactId>
  <version>4.3.6</version>
</dependency>
<dependency>
  <groupId>org.json</groupId>
  <artifactId>json</artifactId>
  <version>20140107</version>
</dependency>

If you would like to run tests, also add the following dependencies along with the others:

<dependency>
  <groupId>junit</groupId>
  <artifactId>junit</artifactId>
  <version>4.11</version>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>commons-io</groupId>
  <artifactId>commons-io</artifactId>
  <version>2.4</version>
  <scope>test</scope>
</dependency>

Without Maven

Alternatively, if you don't use Maven, you can directly include the JAR file in the classpath: http://oss.sonatype.org/content/repositories/releases/com/mashape/unirest/unirest-java/1.4.7/unirest-java-1.4.7.jar

Don't forget to install the dependencies (org.json, httpclient 4.3.6, httpmime 4.3.6, httpasyncclient 4.0.2) in the classpath too.

There is also a way to generate a Unirest-Java JAR file that already includes the required dependencies, but you will need Maven to generate it. Follow the instructions at http://blog.mashape.com/post/69117323931/installing-unirest-java-with-the-maven-assembly-plugin

Creating Request

So you're probably wondering how using Unirest makes creating requests in Java easier. Here is a basic POST request that will explain everything:

HttpResponse<JsonNode> jsonResponse = Unirest.post("http://httpbin.org/post")
  .header("accept", "application/json")
  .queryString("apiKey", "123")
  .field("parameter", "value")
  .field("foo", "bar")
  .asJson();

Requests are made when as[Type]() is invoked; possible types include Json, Binary, String, Object.

If the request is of type HttpRequestWithBody, a body can be passed along with .body(String|JsonNode|Object). For using .body(Object) some pre-configuration is needed (see below).

If you already have a map of parameters, or do not wish to use separate field methods for each one, there is a .fields(Map<String, Object> fields) method that will serialize each key-value pair to form parameters on your request.

.headers(Map<String, String> headers) is also supported as a replacement for multiple header methods.
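For the asynchronous variant mentioned in the features list, the same kind of request can be made with a callback; a short sketch using unirest-java's Callback interface (from com.mashape.unirest.http.async):

Unirest.post("http://httpbin.org/post")
  .header("accept", "application/json")
  .field("param1", "value1")
  .asJsonAsync(new Callback<JsonNode>() {
      public void completed(HttpResponse<JsonNode> response) {
          // Invoked on the async client's thread once the response arrives.
          System.out.println("Status: " + response.getStatus());
      }
      public void failed(UnirestException e) {
          System.out.println("The request has failed");
      }
      public void cancelled() {
          System.out.println("The request has been cancelled");
      }
  });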

Full Documentation @ unirest.io

jSoup: Java HTML Parser

jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jQuery-like methods.

jsoup implements the WHATWG HTML5 specification, and parses HTML to the same DOM as modern browsers do.

  • scrape and parse HTML from a URL, file, or string
  • find and extract data, using DOM traversal or CSS selectors
  • manipulate the HTML elements, attributes, and text
  • clean user-submitted content against a safe white-list, to prevent XSS attacks
  • output tidy HTML

jsoup is designed to deal with all varieties of HTML found in the wild; from pristine and validating, to invalid tag-soup; jsoup will create a sensible parse tree.

Example

Fetch the Wikipedia homepage, parse it to a DOM, and select the headlines from the In the news section into a list of Elements (online sample):

Document doc = Jsoup.connect("http://en.wikipedia.org/").get();
Elements newsHeadlines = doc.select("#mp-itn b a");
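To do something with the selection, iterate the returned Elements; a small illustrative continuation of the example above:

for (Element headline : newsHeadlines) {
    // attr("abs:href") resolves each link against the page's base URL.
    System.out.println(headline.text() + " -> " + headline.attr("abs:href"));
}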

Open source

jsoup is an open source project distributed under the liberal MIT license. The source code is available at GitHub.

Getting started

  1. Download the jsoup jar (version 1.8.3)
  2. Read the cookbook introduction
  3. Enjoy!


HackerRank: Sherlock and The Beast

Problem Statement

Sherlock Holmes suspects his archenemy, Professor Moriarty, is once again plotting something diabolical. Sherlock’s companion, Dr. Watson, suggests Moriarty may be responsible for MI6’s recent issues with their supercomputer, The Beast.

Shortly after resolving to investigate, Sherlock receives a note from Moriarty boasting about infecting The Beast with a virus; however, he also gives him a clue: a number, N. Sherlock determines the key to removing the virus is to find the largest Decent Number having N digits.

A Decent Number has the following properties:

  1. Its digits can only be 3's and/or 5's.
  2. The number of 3's it contains is divisible by 5.
  3. The number of 5's it contains is divisible by 3.
  4. If there is more than one such number, we pick the largest one.

Moriarty's virus shows a clock counting down to The Beast's destruction, and time is running out fast. Your task is to help Sherlock find the key before The Beast is destroyed!

Constraints
1 ≤ T ≤ 20
1 ≤ N ≤ 100000

Input Format

The first line is an integer, T, denoting the number of test cases.

The T subsequent lines each contain an integer, N, detailing the number of digits in the number.

Output Format

Print the largest Decent Number having N digits; if no such number exists, tell Sherlock by printing -1.

Sample Input

4
1
3
5
11

Sample Output

-1
555
33333
55555533333

Explanation

For N = 1, there is no decent number having 1 digit (so we print -1).
For N = 3, 555 is the only possible number. The digit 5 appears three times, so our count of 5's is evenly divisible by 3 (Decent Number Property 3).
For N = 5, 33333 is the only possible number. The digit 3 appears five times, so our count of 3's is evenly divisible by 5 (Decent Number Property 2).
For N = 11, 55555533333 and all permutations of these digits are valid numbers; among them, the given number is the largest one.

Solution:
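The greedy observation: to be as large as possible, the number should place all 5's before all 3's and maximize the count of 5's. So try the largest multiple of 3 as the count of 5's such that the remaining digit count (all 3's) is a multiple of 5. One possible Java solution:

import java.util.Scanner;

public class Solution {
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        int t = in.nextInt();
        while (t-- > 0) {
            int n = in.nextInt();
            // Largest multiple of 3 (count of 5's) leaving a multiple of 5 (count of 3's).
            int fives = -1;
            for (int f = n - (n % 3); f >= 0; f -= 3) {
                if ((n - f) % 5 == 0) {
                    fives = f;
                    break;
                }
            }
            if (fives < 0) {
                System.out.println(-1);
            } else {
                StringBuilder sb = new StringBuilder(n);
                for (int i = 0; i < fives; i++) sb.append('5');
                for (int i = 0; i < n - fives; i++) sb.append('3');
                System.out.println(sb);
            }
        }
        in.close();
    }
}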

Code to merge a list of PDFs using the iText library

Here is the code to merge a set of PDFs in a folder and output a single PDF.

To run it and get the output, just pass the folder path to the main method; the output will be saved in the same folder as the input files.
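A minimal sketch of such a merger using iText 5's PdfCopy (assumes itextpdf 5.5.x on the classpath; the merged.pdf output name and the .pdf filename filter are illustrative):

import java.io.File;
import java.io.FileOutputStream;
import java.util.Arrays;

import com.itextpdf.text.Document;
import com.itextpdf.text.pdf.PdfCopy;
import com.itextpdf.text.pdf.PdfReader;

public class PdfMerger {
    public static void main(String[] args) throws Exception {
        File folder = new File(args[0]);  // folder containing the input PDFs
        File[] files = folder.listFiles((dir, name) -> name.toLowerCase().endsWith(".pdf"));
        Arrays.sort(files);               // merge in a predictable (alphabetical) order

        Document document = new Document();
        PdfCopy copy = new PdfCopy(document,
                new FileOutputStream(new File(folder, "merged.pdf")));
        document.open();

        for (File file : files) {
            PdfReader reader = new PdfReader(file.getAbsolutePath());
            copy.addDocument(reader);     // append every page of this PDF
            reader.close();
        }
        document.close();                 // finishes writing merged.pdf
    }
}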

Please share comments…:)

The Fuck

Magnificent app which corrects your previous console command, inspired by a @liamosaur tweet.

A few more examples:

➜ apt-get install vim
E: Could not open lock file /var/lib/dpkg/lock - open (13: Permission denied)
E: Unable to lock the administration directory (/var/lib/dpkg/), are you root?

➜ fuck
sudo apt-get install vim [enter/↑/↓/ctrl+c]
[sudo] password for nvbn:
Reading package lists... Done
...
➜ git push
fatal: The current branch master has no upstream branch.
To push the current branch and set the remote as upstream, use

    git push --set-upstream origin master


➜ fuck
git push --set-upstream origin master [enter/↑/↓/ctrl+c]
Counting objects: 9, done.
...
➜ puthon
No command 'puthon' found, did you mean:
 Command 'python' from package 'python-minimal' (main)
 Command 'python' from package 'python3' (main)
zsh: command not found: puthon

➜ fuck
python [enter/↑/↓/ctrl+c]
Python 3.4.2 (default, Oct  8 2014, 13:08:17)
...
➜ git brnch
git: 'brnch' is not a git command. See 'git --help'.

Did you mean this?
    branch

➜ fuck
git branch [enter/↑/↓/ctrl+c]
* master
➜ lein rpl
'rpl' is not a task. See 'lein help'.

Did you mean this?
         repl

➜ fuck
lein repl [enter/↑/↓/ctrl+c]
nREPL server started on port 54848 on host 127.0.0.1 - nrepl://127.0.0.1:54848
REPL-y 0.3.1
...

Source