### Distributed Evolutionary Algorithms in Python

DEAP is a novel evolutionary computation framework for rapid prototyping and testing of ideas. It seeks to make algorithms explicit and data structures transparent. It works in perfect harmony with parallelization mechanisms such as multiprocessing and SCOOP.

DEAP includes the following features:

- Genetic algorithm using any imaginable representation
  - List, Array, Set, Dictionary, Tree, Numpy Array, etc.
- Genetic programming using prefix trees
  - Loosely typed, Strongly typed
  - Automatically defined functions

- Evolution strategies (including CMA-ES)
- Multi-objective optimisation (NSGA-II, SPEA2, MO-CMA-ES)
- Co-evolution (cooperative and competitive) of multiple populations
- Parallelization of the evaluations (and more)
- Hall of Fame of the best individuals that lived in the population
- Checkpoints that take snapshots of a system regularly
- Benchmarks module containing most common test functions
- Genealogy of an evolution (that is compatible with NetworkX)
- Examples of alternative algorithms: Particle Swarm Optimization, Differential Evolution, Estimation of Distribution Algorithm

See the DEAP User’s Guide for DEAP documentation.

## Installation

We encourage you to use easy_install or pip to install DEAP on your system. Other installation procedures such as apt-get or yum usually provide an outdated version.

`pip install deap`

The latest version can be installed with

`pip install git+https://github.com/DEAP/deap@master`

If you wish to build from sources, download or clone the repository and type

`python setup.py install`

### #Tree #Traversals

Below is the Python code for traversing trees in various recursive modes like In order, Preorder, Post Order and their reverse orders…

The code is provided in Python, but can be easily translated to Java, JavaScript, PHP, etc.
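The original listing is not reproduced here, so the following is only a minimal sketch of what such recursive traversals might look like. The `Node` class, the function names and the `reverse` flag are illustrative assumptions; "reverse order" is taken to mean visiting the right subtree before the left one.

```python
class Node:
    """A hypothetical binary tree node used only for this illustration."""

    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def _children(node, reverse):
    # In the reverse variants the right subtree is visited first.
    return (node.right, node.left) if reverse else (node.left, node.right)


def inorder(node, reverse=False):
    if node is None:
        return []
    first, second = _children(node, reverse)
    return inorder(first, reverse) + [node.value] + inorder(second, reverse)


def preorder(node, reverse=False):
    if node is None:
        return []
    first, second = _children(node, reverse)
    return [node.value] + preorder(first, reverse) + preorder(second, reverse)


def postorder(node, reverse=False):
    if node is None:
        return []
    first, second = _children(node, reverse)
    return postorder(first, reverse) + postorder(second, reverse) + [node.value]


# Example tree:      1
#                   / \
#                  2   3
#                 / \
#                4   5
root = Node(1, Node(2, Node(4), Node(5)), Node(3))
print(inorder(root))                  # [4, 2, 5, 1, 3]
print(preorder(root))                 # [1, 2, 4, 5, 3]
print(postorder(root))                # [4, 5, 2, 3, 1]
print(inorder(root, reverse=True))    # [3, 1, 5, 2, 4]
```

Returning lists keeps the functions easy to test; for very deep trees an iterative version with an explicit stack would avoid Python's recursion limit.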

### #Machine #Learning: The Basics, with Ron Bekkerman

### #HackerRank: Computing the Correlation

**Problem**

You are given the scores of **N** students in three different subjects – **Mathematics**, **Physics** and **Chemistry**; all of which have been graded on a scale of 0 to 100. Your task is to compute the Pearson product-moment correlation coefficient between the scores of different pairs of subjects (Mathematics and Physics, Physics and Chemistry, Mathematics and Chemistry) based on this data.

*This data is based on the records of the CBSE K-12 Examination – a national school leaving examination in India, for the year 2013.*

**Pearson product-moment correlation coefficient**

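This is a measure of linear correlation described well on this Wikipedia page. The formula, in brief, is given by:

$$ r = \frac{N\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{N\sum x_i^2 - \left(\sum x_i\right)^2}\;\sqrt{N\sum y_i^2 - \left(\sum y_i\right)^2}} $$

where **x** and **y** denote the two vectors between which the correlation is to be measured, and **N** is their common length.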

**Input Format**

The first row contains an integer **N**.

This is followed by **N** rows containing three tab-space (*‘\t’*) separated integers, **M P C** corresponding to a candidate’s scores in Mathematics, Physics and Chemistry respectively.

Each row corresponds to the scores attained by a unique candidate in these three subjects.

**Input Constraints**

1 <= N <= 5 x 10^{5}

0 <= **M, P, C** <= 100

**Output Format**

The output should contain three lines, with correlation coefficients computed and rounded off **correct to exactly 2 decimal places**.

The first line should contain the correlation coefficient between Mathematics and Physics scores.

The second line should contain the correlation coefficient between Physics and Chemistry scores.

The third line should contain the correlation coefficient between Chemistry and Mathematics scores.

So, your output should look like this (these values are only for explanatory purposes):

```
0.12
0.13
0.95
```

**Test Cases**

There is one sample test case with scores obtained in Mathematics, Physics and Chemistry by 20 students. The hidden test case contains the scores obtained by all the candidates who appeared for the examination and took all three tests (Mathematics, Physics and Chemistry).

**Think:** *How can you efficiently compute the correlation coefficients within the given time constraints, while handling the scores of nearly 400k students?*

**Sample Input**

```
20
73 72 76
48 67 76
95 92 95
95 95 96
33 59 79
47 58 74
98 95 97
91 94 97
95 84 90
93 83 90
70 70 78
85 79 91
33 67 76
47 73 90
95 87 95
84 86 95
43 63 75
95 92 100
54 80 87
72 76 90
```

**Sample Output**

```
0.89
0.92
0.81
```

There is **no special library support** available for this challenge.

**Solution**(Source)
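The linked solution is not reproduced here. As a rough sketch of one possible approach that needs no special library, the coefficient can be computed from running sums, which keeps the work linear in **N**. The function names `pearson` and `solve` are illustrative assumptions, and the sketch assumes each score column has non-zero variance (otherwise the denominator is zero).

```python
import math


def pearson(xs, ys):
    """Pearson correlation from running sums (one pass over each pair)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_xx - sum_x ** 2)
                   * math.sqrt(n * sum_yy - sum_y ** 2))
    return numerator / denominator  # raises ZeroDivisionError for constant data


def solve(lines):
    """Take the input as a list of lines and return the three formatted answers."""
    n = int(lines[0])
    rows = [tuple(int(v) for v in line.split()) for line in lines[1:n + 1]]
    maths, physics, chemistry = zip(*rows)
    pairs = [(maths, physics), (physics, chemistry), (chemistry, maths)]
    return ["%.2f" % pearson(a, b) for a, b in pairs]


# Tiny made-up input (not the sample above), tab-separated as in the problem.
print("\n".join(solve(["3", "1\t2\t3", "2\t4\t1", "3\t6\t2"])))
```

In a submission, `solve` would be fed `sys.stdin.read().splitlines()`. Because the scores are small integers, the sums stay exact in Python even for several hundred thousand rows.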

### #HackerRank: Correlation and Regression Lines solutions

```
import numpy as np
import scipy as sp
from scipy.stats import norm
```

### Correlation and Regression Lines – A Quick Recap #1

Here are the test scores of 10 students in physics and history:

| Physics Scores | 15 | 12 | 8 | 8 | 7 | 7 | 7 | 6 | 5 | 3 |
|---|---|---|---|---|---|---|---|---|---|---|
| History Scores | 10 | 25 | 17 | 11 | 13 | 17 | 20 | 13 | 9 | 15 |

Compute Karl Pearson’s **coefficient of correlation** between these scores. Compute the answer correct to three decimal places.

**Output Format**

In the text box, enter the floating point/decimal value required. Do not leave any leading or trailing spaces. Your answer may look like: `0.255`

This is NOT the actual answer – just the format in which you should provide your answer.

```
physicsScores = [15, 12, 8, 8, 7, 7, 7, 6, 5, 3]
historyScores = [10, 25, 17, 11, 13, 17, 20, 13, 9, 15]
# corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is r
print(np.corrcoef(historyScores, physicsScores)[0][1])
```

```
0.144998154581
```

### Correlation and Regression Lines – A Quick Recap #2

Here are the test scores of 10 students in physics and history:

| Physics Scores | 15 | 12 | 8 | 8 | 7 | 7 | 7 | 6 | 5 | 3 |
|---|---|---|---|---|---|---|---|---|---|---|
| History Scores | 10 | 25 | 17 | 11 | 13 | 17 | 20 | 13 | 9 | 15 |

Compute the **slope of the line of regression** obtained while treating Physics as the independent variable. Compute the answer correct to three decimal places.

**Output Format**

In the text box, enter the floating point/decimal value required. Do not leave any leading or trailing spaces. Your answer may look like: `0.255`

This is NOT the actual answer – just the format in which you should provide your answer.

`sp.stats.linregress(physicsScores,historyScores).slope`

```
0.20833333333333331
```

### Correlation and Regression Lines – A Quick Recap #3

Here are the test scores of 10 students in physics and history:

| Physics Scores | 15 | 12 | 8 | 8 | 7 | 7 | 7 | 6 | 5 | 3 |
|---|---|---|---|---|---|---|---|---|---|---|
| History Scores | 10 | 25 | 17 | 11 | 13 | 17 | 20 | 13 | 9 | 15 |

When a student scores 10 in Physics, what is his probable score in History? Compute the answer correct to one decimal place.

**Output Format**

In the text box, enter the floating point/decimal value required. Do not leave any leading or trailing spaces. Your answer may look like: `0.255`

This is NOT the actual answer – just the format in which you should provide your answer.

```
def predict(pi, x, y):
    # Fit y = slope * x + intercept, then evaluate the line at x = pi
    slope, intercept, rvalue, pvalue, stderr = sp.stats.linregress(x, y)
    return slope * pi + intercept

predict(10, physicsScores, historyScores)
```

```
15.458333333333332
```
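For readers without SciPy, the same slope, intercept and prediction can be reproduced with plain Python. This is a minimal sketch of ordinary least squares for a single predictor; the helper name `fit_line` is an assumption made for this example.

```python
def fit_line(xs, ys):
    """Least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


physics = [15, 12, 8, 8, 7, 7, 7, 6, 5, 3]
history = [10, 25, 17, 11, 13, 17, 20, 13, 9, 15]

slope, intercept = fit_line(physics, history)
print(round(slope, 3))                   # 0.208
print(round(slope * 10 + intercept, 1))  # 15.5
```

The values agree with the `linregress` results shown in recaps #2 and #3.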

### Correlation and Regression Lines – A Quick Recap #4

The two regression lines of a bivariate distribution are:

- `4x - 5y + 33 = 0` (line of y on x)
- `20x - 9y - 107 = 0` (line of x on y)

Estimate the value of `x` when `y = 7`. Compute the answer correct to one decimal place.

**Output Format**

In the text box, enter the floating point/decimal value required. Do not leave any leading or trailing spaces. Your answer may look like: `7.2`

This is NOT the actual answer – just the format in which you should provide your answer.

```
# Line of y on x: 4x - 5y + 33 = 0    ->  y = (4x + 33) / 5
# Line of x on y: 20x - 9y - 107 = 0  ->  x = (9y + 107) / 20
# To estimate x from a known y, use the line of x on y.
y = 7
print((9 * y + 107) / 20)
```

```
8.5
```

### Correlation and Regression Lines – A Quick Recap #5

The two regression lines of a bivariate distribution are:

- `4x - 5y + 33 = 0` (line of y on x)
- `20x - 9y - 107 = 0` (line of x on y)

Find the variance of y when σx = 3. Compute the answer correct to one decimal place.

**Output Format**

In the text box, enter the floating point/decimal value required. Do not leave any leading or trailing spaces. Your answer may look like: `7.2`

This is NOT the actual answer – just the format in which you should provide your answer.

#### http://www.mpkeshari.com/2011/01/19/lines-of-regression/

#### Q.3. If the two regression lines of a bivariate distribution are 4x – 5y + 33 = 0 and 20x – 9y – 107 = 0,

- calculate the arithmetic means of x and y respectively.
- estimate the value of x when y = 7.
- find the variance of y when σx = 3.

##### Solution : –

We have,

4x – 5y + 33 = 0 => y = 4x/5 + 33/5 ————— (i)

And

20x – 9y – 107 = 0 => x = 9y/20 + 107/20 ————- (ii)

(i) Solving (i) and (ii) we get, mean of x = 13 and mean of y = 17.[Ans.]

(ii) Second line is line of x on y

x = (9/20) × 7 + (107/20) = 170/20 = 8.5 [Ans.]

(iii) r = √(byx · bxy) = √((4/5)(9/20)) = √0.36 = 0.6. Since byx = r(σy/σx), we have 4/5 = 0.6 × σy/3, so σy = (4/5)(3/0.6) = 4. [Ans.]

Variance of y = σy² = 16.
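The arithmetic in the worked solution can be double-checked with a few lines of Python. This is only a sanity-check sketch; the variable names are chosen for this example, with `b_yx` and `b_xy` standing for the two regression coefficients read off the lines.

```python
import math

# y on x: 4x - 5y + 33 = 0    ->  y = (4/5) x + 33/5
# x on y: 20x - 9y - 107 = 0  ->  x = (9/20) y + 107/20
b_yx, a_yx = 4 / 5, 33 / 5
b_xy, a_xy = 9 / 20, 107 / 20

# Both regression lines pass through (mean_x, mean_y), so the means are
# the intersection point: substitute x = b_xy * y + a_xy into the first line.
mean_y = (b_yx * a_xy + a_yx) / (1 - b_yx * b_xy)
mean_x = b_xy * mean_y + a_xy

# Estimating x from y uses the line of x on y.
x_when_y_is_7 = b_xy * 7 + a_xy

# r^2 is the product of the two regression coefficients; r takes their sign.
r = math.sqrt(b_yx * b_xy)

# b_yx = r * sigma_y / sigma_x  ->  sigma_y = b_yx * sigma_x / r
sigma_x = 3
sigma_y = b_yx * sigma_x / r
variance_y = sigma_y ** 2

print(round(mean_x, 6), round(mean_y, 6))
print(round(x_when_y_is_7, 6), round(r, 6), round(variance_y, 6))
```

Up to floating-point rounding this gives means of 13 and 17, an estimate of 8.5, r = 0.6 and a variance of 16, matching the hand calculation.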

### What is the difference between linear regression on y with x and x with y?

The Pearson correlation coefficient of x and y is the same, whether you compute pearson(x, y) or pearson(y, x). This suggests that doing a linear regression of y given x or x given y should be the same, but that’s not the case.

The best way to think about this is to imagine a scatter plot of points with **y** on the vertical axis and **x** represented by the horizontal axis. Given this framework, you see a cloud of points, which may be vaguely circular, or may be elongated into an ellipse. What you are trying to do in regression is find what might be called the ‘line of best fit’. However, while this seems straightforward, we need to figure out what we mean by ‘best’, and that means we must define what it would be for a line to be good, or for one line to be better than another, etc. Specifically, we must stipulate a loss function. A loss function gives us a way to say how ‘bad’ something is, and thus, when we minimize that, we make our line as ‘good’ as possible, or find the ‘best’ line.

Traditionally, when we conduct a regression analysis, we find estimates of the slope and intercept so as to minimize the **sum of squared errors**. These are defined as follows:

$$ SSE = \sum_{i=1}^{N} \left( y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i) \right)^2 $$

In terms of our scatter plot, this means we are minimizing the sum of the squared *vertical distances* between the observed data points and the line.

On the other hand, it is perfectly reasonable to regress **x** onto **y**, but in that case, we would put **x** on the vertical axis, and so on. If we kept our plot as is (with **x** on the horizontal axis), regressing **x** onto **y** (again, using a slightly adapted version of the above equation with **x** and **y** switched) means that we would be minimizing the sum of the *horizontal distances* between the observed data points and the line. This sounds very similar, but is not quite the same thing. (The way to recognize this is to do it both ways, and then algebraically convert one set of parameter estimates into the terms of the other. Comparing the first model with the rearranged version of the second model, it becomes easy to see that they are not the same.)

Note that neither way would produce the same line we would intuitively draw if someone handed us a piece of graph paper with points plotted on it. In that case, we would draw a line straight through the center, but minimizing the vertical distance yields a line that is slightly *flatter* (i.e., with a shallower slope), whereas minimizing the horizontal distance yields a line that is slightly *steeper*.

A correlation is symmetrical: **x** is as correlated with **y** as **y** is with **x**. The Pearson product-moment correlation can be understood within a regression context, however. The correlation coefficient, **r**, is the slope of the regression line when both variables have been *standardized* first. That is, you first subtract the mean from each observation, and then divide the differences by the standard deviation. The cloud of data points will now be centered on the origin, and the slope will be the same whether you regress **y** onto **x**, or **x** onto **y**.

Now, why does this matter? Using our traditional loss function, we are saying that all of the error is in only *one* of the variables (viz., **y**). That is, we are saying that **x** is measured without error and constitutes the set of values we care about, but that **y** has *sampling error*. This is very different from saying the converse. This was important in an interesting historical episode: In the late 1970s and early 1980s in the US, the case was made that there was discrimination against women in the workplace, and this was backed up with regression analyses showing that women with equal backgrounds (e.g., qualifications, experience, etc.) were paid, on average, less than men. Critics (or just people who were extra thorough) reasoned that if this was true, women who were paid equally with men would have to be more highly qualified, but when this was checked, it was found that although the results were ‘significant’ when assessed the one way, they were not ‘significant’ when checked the other way, which threw everyone involved into a tizzy. See here for a famous paper that tried to clear the issue up.

Here’s another way to think about this that approaches the topic through the formulas instead of visually:

The formula for the slope of a simple regression line is a consequence of the loss function that has been adopted. If you are using the standard Ordinary Least Squares loss function (noted above), you can derive the formula for the slope that you see in every intro textbook. This formula can be presented in various forms; one of which I call the ‘intuitive’ formula for the slope. Consider this form for both the situation where you are regressing **y** on **x**, and where you are regressing **x** on **y**:

$$ \hat{\beta}_1^{(y \text{ on } x)} = \frac{\text{Cov}(x, y)}{\text{Var}(x)} \qquad\qquad \hat{\beta}_1^{(x \text{ on } y)} = \frac{\text{Cov}(y, x)}{\text{Var}(y)} $$

Now, I hope it’s obvious that these would not be the same unless **Var(x)** equals **Var(y)**. If the variances *are* equal (e.g., because you standardized the variables first), then so are the standard deviations, and thus the variances would both also equal **SD(x)SD(y)**. In this case, $\hat{\beta}_1$ would equal Pearson’s **r**, which is the same either way by virtue of the principle of commutativity:

$$ r = \frac{\text{Cov}(x, y)}{\text{SD}(x)\,\text{SD}(y)} = \frac{\text{Cov}(y, x)}{\text{SD}(y)\,\text{SD}(x)} $$
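A small numeric experiment may make this concrete. The sketch below uses a made-up five-point data set and the textbook OLS slope, covariance divided by the variance of the predictor; the helper names are assumptions for this illustration. It computes the slope of **y** on **x**, the slope of **x** on **y**, and both slopes again after standardizing.

```python
import math


def mean(values):
    return sum(values) / len(values)


def covariance(a, b):
    """Population covariance; covariance(a, a) is the variance of a."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / len(a)


def ols_slope(predictor, response):
    """Slope obtained when regressing `response` on `predictor`."""
    return covariance(predictor, response) / covariance(predictor, predictor)


def standardize(values):
    m = mean(values)
    sd = math.sqrt(covariance(values, values))
    return [(v - m) / sd for v in values]


# Illustrative data only
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 6]

b_yx = ols_slope(x, y)  # regress y on x
b_xy = ols_slope(y, x)  # regress x on y
r = covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

zx, zy = standardize(x), standardize(y)
z_yx = ols_slope(zx, zy)
z_xy = ols_slope(zy, zx)

print("raw slopes:         ", b_yx, b_xy)
print("standardized slopes:", z_yx, z_xy)
print("Pearson r:          ", r)
```

On the raw data the two slopes differ, and their product equals r². After standardizing, both slopes coincide with r, up to floating-point rounding.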

### #HackerEarth: #BattleOfBots 9: Taunt

Taunt is a two-player board game played on a 10X4 grid of cells, with the players starting on opposite sides of the game board. Each player has an allocated color, Orange (First Player) or Green (Second Player) being conventional. Each player has nine pieces in total. The players move their pieces strategically towards the opponent’s area. There are 3 types of pieces, and each type moves differently.

**Piece 1:** It can move to a horizontally or vertically adjacent cell, if the cell doesn’t contain a piece of the same color.

**Piece 2:** It can move to a horizontally adjacent cell or can move two steps forward, if the cell doesn’t contain a piece of the same color (except the piece itself).

This type of piece can move to its own position if it is in the second-last row of the grid and going downward, or if it is in the second row of the grid and going upward.

**Piece 3:** It can move two steps diagonally in the forward direction, if the cell doesn’t contain a piece of the same color (except the piece itself).

This type of piece can move to its own position if it is in the second-last row of the grid and going downward, or if it is in the second row of the grid and going upward.

Players take turns moving pieces as described above and can capture an opponent’s piece by jumping on or over it.

**Note:** Forward direction for first player is downward and for second player is upward.

If a piece (except piece 1) is moving downward and touches the last row, its direction will change, i.e. it will now move upward. Similarly, if a piece (except piece 1) is moving upward and touches the first row, its direction will change, i.e. it will now move downward.

Rules:

- A player can only move according to the moves mentioned above.
- A player may not move an opponent’s piece.
- A player can capture an opponent’s piece by jumping on or over it.

The game will end after 100 moves (50 moves for each player) or when either player has no move left. At the end of the game, the player with the majority of pieces wins.

We will play it on a 10X4 grid. The top left of the grid is [0,0] and the bottom right is [9,3].

**Input:**

The input will be a 10X4 matrix consisting only of 0, 1 or 2. The next line will contain an integer denoting the total number of moves made up to the current state of the board. The line after that contains an integer, 1 or 2, which is your player id.

In the given matrix, top-left is [0,0] and bottom-right is [9,3]. The y-coordinate increases from left to right, and x-coordinate increases from top to bottom.

A cell is represented by **3** integers.

First integer denotes the player id (1 or 2).

Second integer denotes the type of piece (1, 2 or 3).

Third integer denotes the direction of the piece (0 (upward) or 1 (downward)). When the piece is of first type, direction doesn’t matter as the piece is free to move to horizontally or vertically adjacent cell, if the cell doesn’t contain a piece of same color.

Empty cell is represented by 000.
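As an illustration of this encoding, here is a rough sketch of how a bot might parse its input. It assumes the cells arrive as 40 whitespace-separated three-digit tokens in row-major order, followed by the move count and the player id; `parse_state` is a hypothetical helper, not part of the official default bot.

```python
ROWS, COLS = 10, 4


def parse_state(text):
    """Split raw input into (board, moves_so_far, player_id).

    board[x][y] is a (player, piece_type, direction) tuple,
    and (0, 0, 0) marks an empty cell.
    """
    tokens = text.split()
    cells = tokens[:ROWS * COLS]
    board = []
    for x in range(ROWS):
        row = []
        for y in range(COLS):
            cell = cells[x * COLS + y]
            row.append((int(cell[0]), int(cell[1]), int(cell[2])))
        board.append(row)
    moves_so_far = int(tokens[ROWS * COLS])
    player_id = int(tokens[ROWS * COLS + 1])
    return board, moves_so_far, player_id


# A small hand-built state: one piece for each player, everything else empty.
cells = ["000"] * (ROWS * COLS)
cells[0] = "131"               # player 1, type 3, moving downward, at [0,0]
cells[9 * COLS + 3] = "230"    # player 2, type 3, moving upward, at [9,3]
board, moves_so_far, player_id = parse_state(" ".join(cells) + "\n0\n1")
print(board[0][0], board[9][3], moves_so_far, player_id)
```

Splitting on arbitrary whitespace means the parser does not care whether the matrix arrives as ten lines of four cells or as one long line.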

**Output:**

In the first line, print the space-separated coordinates of the cell containing the piece you want to move.

In second line print the coordinates of the cell in which the piece will jump.

You must take care that you don’t print invalid coordinates. For example, [1,1] might be a valid coordinate in the game play if the piece is able to jump to [1,1], but [9,10] will never be. Also if you play an invalid move or your code exceeds the time/memory limit while determining the move, you lose the game.

**Starting state**

The starting state of the game is the state of the board before the game starts.

131 131 131 121 121 121 111 111 111 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 210 210 210 220 220 220 230 230 230

**First Input**

This is the input given to the first player at the start of the game.

131 131 131 121 121 121 111 111 111 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 210 210 210 220 220 220 230 230 230 0 1

**Sample Input**

000 000 000 000 000 000 000 111 000 000 111 130 000 000 000 000 000 000 000 000 000 220 000 000 131 000 000 000 121 000 210 000 000 210 131 000 000 210 000 000 58 1

**Sample Output**

8 2

8 0

**Explanation**

This is player 1’s turn. The player moves the piece at [8,2] two steps diagonally in the downward direction, and it ends up at [8,0].

After his/her move the state of game becomes:

000 000 000 000 000 000 000 111 000 000 111 130 000 000 000 000 000 000 000 000 000 220 000 000 131 000 000 000 121 000 210 000 130 210 000 000 000 000 000 000 59 2

**Note:** Direction of the piece is also changed from 1 to 0 as the piece was moving downward and touches the last row. This state will be fed as input to program of player 2.

Here is the code of the default bot.

### Installing Apache Usergrid on Linux

## About the Project

Apache Usergrid is an open-source `Backend-as-a-Service` (`BaaS` or `mBaaS`) composed of an integrated distributed NoSQL database, application layer and client tier with SDKs for developers looking to rapidly build web and/or mobile applications. It provides elementary services and retrieval features like:

- User Registration & Management
- Data Storage
- File Storage
- Queues
- Full Text Search
- Geolocation Search
- Joins

It is a `multi-tenant` system designed for deployment to public cloud environments (such as Amazon Web Services, Rackspace, etc.) or to run on traditional server infrastructures so that anyone can run their own private BaaS deployment.

For architects and back-end teams, it aims to provide a distributed, easily extendable, operationally predictable and highly scalable solution. For front-end developers, it aims to simplify the development process by enabling them to rapidly build and operate mobile and web applications without requiring backend expertise.

## Usergrid 2.1.0 Deployment Guide

Though the Usergrid Deployment guide seems to be simple enough, I faced certain hiccups and it took me about 4 days to figure out what I was doing wrong.

This document explains how to deploy the Usergrid v2.1.0 Backend-as-a-Service (BaaS), which comprises the Usergrid Stack, a Java web application, and the Usergrid Portal, which is an HTML5/JavaScript application.

### Prerequisites

Below are the software requirements for Usergrid 2.1.0 Stack and Portal. You can install them all on one computer for development purposes, and for deployment you can deploy them separately using clustering.

Linux or a UNIX-like system (Usergrid may run on Windows, but we haven’t tried it)

Download the Apache Usergrid 2.1.0 binary release from the official Usergrid releases page:

After untarring, the files that you need for deploying the Usergrid Stack and Portal are `ROOT.war` and `usergrid-portal.tar`.

#### Stack STEP #1: Setup Cassandra

As mentioned in prerequisites, follow the installation guide given in link

Usergrid uses Cassandra’s Thrift protocol.

Before starting Cassandra, on Cassandra 2.x releases you MUST enable Thrift by setting start_rpc in your cassandra.yaml file:

```
#Whether to start the thrift rpc server.
start_rpc: true
```

Note: `DataStax` no longer supports the `DataStax Community` version of `Apache Cassandra` or the `DataStax Distribution of Apache Cassandra`. It is best to follow the Apache documentation.

Once you are up and running make a note of these things:

- The name of the Cassandra cluster
- Hostname or IP address of each Cassandra node
  - If Cassandra runs on the same machine as Usergrid, this is **localhost**. Usergrid would then be running in single-machine embedded mode.
- Port number used for Cassandra RPC (the default is 9160)
- Replication factor of the Cassandra cluster

#### Stack STEP #2: Setup ElasticSearch

Usergrid also needs access to at least one ElasticSearch node. As with Cassandra, you can set up a single ElasticSearch node on your computer, but you should run a cluster in production.

Steps:

- Download and unzip `Elasticsearch`
- Run `bin/elasticsearch` (or `bin/elasticsearch -d` on Linux to run it as a background process, or `bin\elasticsearch.bat` on Windows)
- Run `curl http://localhost:9200/`

Once you are up and running make a note of these things:

- The name of the ElasticSearch cluster
- Hostname or IP address of each ElasticSearch node
  - If ElasticSearch runs on the same machine as Usergrid, this is **localhost**. Usergrid would then be running in single-machine embedded mode.
- Port number used for ElasticSearch protocol (the default is 9200)

#### Stack STEP #3: Setup Tomcat

The Usergrid Stack is contained in a file named ROOT.war, a standard Java EE WAR ready for deployment to Tomcat. On each machine that will run the Usergrid Stack you must install the Java SE 8 JDK and Tomcat 7+.

#### Stack STEP #4: Configure Usergrid Stack

You must create a Usergrid properties file called usergrid-deployment.properties. The properties in this file tell Usergrid how to communicate with Cassandra and ElasticSearch, and how to form URLs using the hostname you wish to use for Usergrid. There are many properties that you can set to configure Usergrid.

Once you have created your Usergrid properties file, place it in the Tomcat lib directory. On a Linux system, that directory is probably located at `/path/to/tomcat7/lib/`.

You should review the defaults in the above file. To get you started, let’s look at a minimal example properties file that you can edit and use as your own.

Please note that if you are installing Usergrid on the same machine as the Cassandra server, you should set the following property to true:

`cassandra.embedded=true`

(In the properties file this line sits under the comment `# Tell Usergrid that Cassandra is not embedded.`)

#### Stack STEP #5: Deploy ROOT.war to Tomcat

The next step is to deploy the Usergrid Stack software to Tomcat. There are a variety of ways of doing this and the simplest is probably to place the Usergrid Stack `ROOT.war` file into the Tomcat webapps directory, then restart Tomcat.

Look for messages like this, which indicate that the ROOT.war file was deployed:

INFO: Starting service Catalina

Jan 29, 2016 1:00:32 PM org.apache.catalina.core.StandardEngine startInternal

INFO: Starting Servlet Engine: Apache Tomcat/7.0.59

Jan 29, 2016 1:00:32 PM org.apache.catalina.startup.HostConfig deployWAR

INFO: Deploying web application archive /usr/share/tomcat7/webapps/ROOT.war

#### Does it work?

You can use curl:

If you get a JSON file of status data, then you’re ready to move to the next step. You should see a response that begins like this:

{

"timestamp" : 1454090178953,

"duration" : 10,

"status" : {

"started" : 1453957327516,

"uptime" : 132851437,

"version" : "201601240200-595955dff9ee4a706de9d97b86c5f0636fe24b43",

"cassandraAvailable" : true,

"cassandraStatus" : "GREEN",

"managementAppIndexStatus" : "GREEN",

"queueDepth" : 0,

"org.apache.usergrid.count.AbstractBatcher" : {

"add_invocation" : {

"type" : "timer",

"unit" : "microseconds",

… etc. …
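If you would rather script this check than read the JSON by eye, the sketch below shows one way to do it. It relies only on the `status`, `cassandraAvailable` and `cassandraStatus` fields visible in the response above. The helper names are assumptions, and the status URL is left as a caller-supplied argument because the exact endpoint is not shown here.

```python
import json
import urllib.request


def stack_is_healthy(status_json):
    """Return True when the status document reports Cassandra as available and GREEN."""
    status = json.loads(status_json).get("status", {})
    return (status.get("cassandraAvailable") is True
            and status.get("cassandraStatus") == "GREEN")


def fetch_status(url):
    # `url` is supplied by the caller: use the same status URL you pass to curl.
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


# Offline example with a trimmed-down copy of the response shown above.
sample = '{"status": {"cassandraAvailable": true, "cassandraStatus": "GREEN"}}'
print(stack_is_healthy(sample))  # True
```

Against a live stack, `stack_is_healthy(fetch_status(url))` combines the two steps.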

#### Initialize the Usergrid Database

Next, you must initialize the Usergrid database, index and query systems.

To do this you must issue a series of HTTP operations using the superuser credentials. You can only do this if Usergrid is configured to allow superuser login via the property `usergrid.sysadmin.login.allowed=true`; if you used the above example properties file, it is allowed.

The three operations you must perform are expressed by the curl commands below. Of course, you will have to change the password `test` to match the `superuser` password that you set in your Usergrid properties file.

curl -X PUT http://localhost:8080/system/database/setup -u superuser:test

curl -X PUT http://localhost:8080/system/database/bootstrap -u superuser:test

curl -X GET http://localhost:8080/system/superuser/setup -u superuser:test

When you issue each of those curl commands, you should see a success message like this:

{

"action" : "cassandra setup",

"status" : "ok",

"timestamp" : 1454100922067,

"duration" : 374

}
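The same three calls can be scripted in Python with the standard library. This is only a sketch: the paths and HTTP methods are the ones from the curl commands above, while `build_setup_requests` and `run_setup` are hypothetical helper names, and the credentials are whatever you configured.

```python
import base64
import urllib.request

# (method, path) pairs taken from the curl commands above, in order
SETUP_STEPS = [
    ("PUT", "/system/database/setup"),
    ("PUT", "/system/database/bootstrap"),
    ("GET", "/system/superuser/setup"),
]


def build_setup_requests(base_url, user, password):
    """Prepare one request per setup step, with HTTP Basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    requests = []
    for method, path in SETUP_STEPS:
        request = urllib.request.Request(base_url.rstrip("/") + path, method=method)
        request.add_header("Authorization", "Basic " + token)
        requests.append(request)
    return requests


def run_setup(base_url, user, password):
    """Send the requests in order; urlopen raises on an HTTP error status."""
    for request in build_setup_requests(base_url, user, password):
        with urllib.request.urlopen(request) as response:
            print(request.get_method(), request.full_url, response.status)
            print(response.read().decode("utf-8"))
```

Calling `run_setup("http://localhost:8080", "superuser", "test")` would then mirror the three curl commands, stopping at the first step that fails.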

Now that you’ve gotten Usergrid up and running, you’re ready to deploy the Usergrid Portal.

#### Deploying the Usergrid Portal

The Usergrid Portal is an HTML5/JavaScript application, a bunch of static files that can be deployed to any web server, e.g. Apache HTTPD or Tomcat.

To deploy the Portal to a web server, you will un-tar the `usergrid-portal.tar` file into the directory that serves as the root directory of your web pages.

Once you have done that there is one more step. You need to configure the portal so that it can find the Usergrid stack. You do that by editing `portal/config.js` and changing this line:

`Usergrid.overrideUrl = 'http://localhost:8080/';`

Set it to the hostname that you will be using for your Usergrid installation.

I have deployed a sample instance and tested the same. You can find the system ready configurations in TechUtils repository

### Launch HTML code in browser from Python

Let’s say you have generated some HTML content for a web page dynamically and have it in a variable in Python.

In order to view and test this content, you would need a Python program that prints out the HTML code, which then would have to be copied and pasted into an HTML file; from there, you can test it in a browser.

In Python, there is a way to launch such HTML code in a web browser so that you don’t have to go through the copy-and-paste method every time.

Using `webbrowser.open`:
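The original snippet is not reproduced here, so the following is a minimal sketch of the idea: write the HTML string to a temporary file and hand its `file://` URL to `webbrowser.open`. The helper names `write_html_to_temp` and `show_html` are assumptions made for this example.

```python
import os
import pathlib
import tempfile
import webbrowser


def write_html_to_temp(html):
    """Write the HTML string to a new temporary .html file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".html")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(html)
    return path


def show_html(html):
    """Open the HTML in the default browser via a file:// URL."""
    path = write_html_to_temp(html)
    webbrowser.open(pathlib.Path(path).as_uri())
    return path
```

Calling `show_html("<h1>Hello</h1>")` should open a new browser tab or window showing the heading. The temporary file is left on disk so that the browser has time to load it; delete it afterwards if that matters.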