#StackBounty: #mongodb #mongodb-query #nosql #aggregation-framework #nosql-aggregation How to create complicated mongodb queries?

Bounty: 50

I am a novice when it comes to mongo as I have traditionally only worked with Oracle database.
I have a mongo database that’s storing bitbucket data in columns like so:

_id | _class | collectorItemId| firstEverCommit | scmUrl | scmBranch | scmAuthor | scmCommitTimestamp

There are a few more columns in there that I’ve omitted for the sake of time. The scmBranch column is populated with one of two strings: "master" or "develop".
Here is a sample of what the data looks like:
[screenshot of sample rows omitted]

Here is the document view of one of the rows:

{
"_id" : ObjectId("5e39d6a0330c130006a042c6"),
"collectorItemId" : ObjectId("5e33a6b9887ef5000620a0c0"),
"firstEverCommit" : false,
"scmUrl" : "sampleRepo1",
"scmBranch" : "master",
"scmRevisionNumber" : "a2ad6842468eb55bffcbe7d700b6addd3eb11629",
"scmAuthor" : "son123",
"scmCommitTimestamp" : NumberLong(1580841662000)
}

I am now trying to formulate mongo queries that will get me the following data:

 1. For each scmUrl: if max(scmCommitTimestamp) where scmBranch = "develop" is greater than max(scmCommitTimestamp) where scmBranch = "master", then count the number of rows (i.e. commits) where scmBranch = "develop" AND scmCommitTimestamp > max(scmCommitTimestamp) where scmBranch = "master".

 2. For the results found in #1, find the oldest and newest commit.

So far, the best mongo query I’ve been able to come up with is the following:

db.bitbucket.aggregate([{
    "$group": {
        "_id": {
            "scmUrl": "$scmUrl",
            "scmBranch": "$scmBranch"
        },
        "MostRecentCommit": {
            "$max": {"$toDate":"$scmCommitTimestamp"}
        }
    }
},{
    "$project": {
        "RepoName": {"$substr": ["$_id.scmUrl",39,-1]},
        "Branch": "$_id.scmBranch",
        "MostRecentCommit": "$MostRecentCommit"
    }
},{
   "$sort":{
       "RepoName":1,
       "Branch":1
       }

}
])

But this only gets me back the most recent commit for the develop branch and the master branch of each scmUrl (i.e repo), like so:
[screenshot of query results omitted]

Ideally, I’d like to get back a table of results with the following columns:

scmUrl/RepoName | Number of commits on develop branch that are not on master branch| oldest commit in develop branch that's not in master branch | newest commit in develop branch that's not in master branch

How can I modify my mongo query to extract the data that I want?
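One way to pin down the required logic before translating it into aggregation stages is to prototype it in plain JavaScript over an in-memory array. Here is a sketch (field names are taken from the sample document above; the grouping and comparison mirror requirements #1 and #2):

```javascript
// Sketch: for each repo, count the develop commits newer than the latest
// master commit, and report the oldest/newest of those commits.
function commitsAheadOfMaster(docs) {
  // Group documents by repository (scmUrl).
  const byRepo = new Map();
  for (const d of docs) {
    if (!byRepo.has(d.scmUrl)) byRepo.set(d.scmUrl, []);
    byRepo.get(d.scmUrl).push(d);
  }

  const results = [];
  for (const [repo, commits] of byRepo) {
    // Latest master commit; -Infinity if the repo has no master commits,
    // in which case every develop commit counts as "ahead".
    const masterMax = Math.max(...commits
      .filter(c => c.scmBranch === 'master')
      .map(c => c.scmCommitTimestamp));

    // Develop commits newer than the latest master commit.
    const ahead = commits.filter(c =>
      c.scmBranch === 'develop' && c.scmCommitTimestamp > masterMax);

    if (ahead.length > 0) {
      const ts = ahead.map(c => c.scmCommitTimestamp);
      results.push({
        repo,
        count: ahead.length,
        oldest: Math.min(...ts),
        newest: Math.max(...ts),
      });
    }
  }
  return results;
}
```

In pipeline terms this roughly corresponds to a $group per scmUrl collecting both branch maxima, followed by a $match on the comparison and a second pass for count/min/max, but the exact stage layout would need to be verified against a real collection.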


Get this bounty!!!

#StackBounty: #mongodb #nosql #dimensional-modeling #node.js Cascading One-to-Many relationships modeling

Bounty: 50

Are there any drawbacks to, or better alternatives for, this non-relational model? It remains easy to understand, but my concern is with the code that interacts with it.

First, as I was introduced to the NoSQL world, I confronted one-to-many relationships between entities on many occasions. Today I have a relatively simple cascading example that might grow in the future.

Based on assumptions about functionality, I came up with a simple snowflake model: specifically, cascading one-to-many relationships with some descriptive data.

[User] 1 — * [Session] 1 — * [Execution] 1 — * [Report]

The data model seems easy to deal with at first, but I found that acting on the data using Mongoose (a Node.js library) can become complex and less performant, especially in a web application context (the request/response cycle). The first way of thinking is simply to have children refer to their parents, in a normalized fashion. Another way to implement this data model is the document-embedding approach (https://docs.mongodb.com/manual/tutorial/model-embedded-one-to-many-relationships-between-documents/), which is easier to interact with if you just model everything in one entity. However, this comes at the expense of performance, because whenever you load a user, you also load all of its sessions, executions and reports.

I found a compromise between a normalized model and one using embedded documents, modeled here:

Normalize and embed

The compromise consists of embedding a minimal variant of the child entity (e.g. Executions as ExecutionsMini inside Sessions) while keeping the full child entity Executions separate.

The concern grows because between Users and Loggings other entities might be added, in a one-to-many fashion or not, and this could further complicate the solution (not the data model).
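The compromise above can be illustrated with plain JavaScript object shapes (a sketch; all field names are illustrative, not taken from the post):

```javascript
// Full Execution document, stored in its own collection (normalized).
// Heavy fields (logs, report references) live only here.
const execution = {
  _id: 'exec-1',
  session: 'sess-1',          // reference back to the parent Session
  startedAt: 1580841662000,
  logs: ['...'],
  reports: ['report-1'],      // references to child Report documents
};

// Parent Session embedding a minimal variant (ExecutionMini):
// just enough to render a session overview without a second query.
const session = {
  _id: 'sess-1',
  user: 'user-1',
  executions: [
    { _id: 'exec-1', startedAt: 1580841662000, status: 'done' },
  ],
};

// Loading a session gives the overview for free; the full Execution
// is fetched by _id only when its detail view is actually needed.
function executionIdsForSession(s) {
  return s.executions.map(e => e._id);
}
```

The trade-off is that any field duplicated into the mini variant must be kept in sync when the full document changes, which is why the embedded copy is kept minimal.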



#StackBounty: #windows #database #synchronization #nosql NoSQL database for Windows with Offline sync capability

Bounty: 200

I develop a Windows 7+ desktop application that currently uses Microsoft SQL Server to hold the data and a local DB to work offline.

Now I want to switch to a NoSQL database. That DB should also work on Windows and have offline and auto-sync capabilities built in.

I could run a dedicated server to hold the data, as I currently run a server with Microsoft SQL Server on it, since I don’t want to pay for storing my data online in the cloud.



#StackBounty: #mongodb #database-design #nosql #mongoengine MongoDB : How to design schema based on application access patterns?

Bounty: 100

As someone who comes from DynamoDB, modeling a MongoDB schema to really fit deeply into my application is kind of confusing, especially since it has the concept of references, and from what I read it is not recommended to keep duplicated data to accommodate your queries.

Take the following example (modeled in mongoengine, but shouldn’t matter) :

    #User
    class User(Document):
        email = EmailField(primary_key=True)
        pswd_hash = StringField()
        #This also makes it easier to find the Projects the user has a Role in
        roles = ListField(ReferenceField('Role'))

    #Project
    class Project(Document):
        name = StringField()
        #This is probably unnecessary as the Role id is already the project id
        roles = ListField(ReferenceField('Role'))

    #Roles in project
    class Role(Document):
        project = ReferenceField('Project', primary_key=True)
        #List of permissions
        permissions = ListField(StringField())
        users = ListField(ReferenceField('User'))

There are Projects and Users.

Each Project can have many Roles in it.

Each User can have one Role in a Project.


So, it’s a Many-Many between Users and Projects

A Many-One between Users and Roles

A Many-One between Roles and Projects


The problem is when I try to accommodate the schema to the access pattern, because on every page refresh of the application I need:

  1. Project (the id is in the url)
  2. User (email is in session)
  3. User permissions in that project (server-side security checks)

So, considering this is the most common query, how should I model my schema to accommodate it?

Or is the way I’m doing at the moment okay already?
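One way to sanity-check the schema is to confirm that the per-request lookup stays a single filtered query. A sketch in plain JavaScript, with the collection simulated as an array (field names follow the mongoengine classes above; the data values are illustrative):

```javascript
// Simulated 'roles' collection: each document lists the users holding
// that role in a project, mirroring the Role document in the question.
const roles = [
  { project: 'proj-1', name: 'admin',
    permissions: ['read', 'write'], users: ['a@x.com'] },
  { project: 'proj-1', name: 'viewer',
    permissions: ['read'], users: ['b@x.com'] },
];

// Per page load: project id comes from the URL, email from the session.
// One query filtered on (project, users) returns the user's permissions.
function permissionsFor(projectId, email) {
  const role = roles.find(r =>
    r.project === projectId && r.users.includes(email));
  return role ? role.permissions : [];
}
```

If the Role collection is indexed on (project, users), this single lookup covers items 1–3 of the access pattern, which suggests the schema in the question is already close to fitting the most common query.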



#StackBounty: #cms #wiki #nosql Wiki or Cms to store cards (CCG/Sports/…etc)

Bounty: 50

A friend wants to create an online database that contains every card that exists in the world. This includes everything from Magic: The Gathering and sports cards to normal 52-card playing cards, Risk cards, etc. I want to help, and although it seems impossible to me to have every card, we could have a lot.

I’m now wondering what would be the best way to store the cards. Is there something like a wiki, but with more structured data? From one viewpoint a wiki seems suitable, since we don’t have to worry about different data models for different games (we just enter the cards as free text); from another viewpoint something like MongoDB seems more suitable, as it’s better to have the data structured so that visitors can filter cards on the properties of a game. I think there are already too many different games to manually program a View (as in MVC) for each game. Is there something in between MongoDB and a wiki?

Another idea would be to have a “CMS” with a NoSQL database where all cards are in the same table, and where you can manually, without programming, create views (i.e. frontend layouts) that are selected based on certain properties of each item. For example, you could create an “MTG Creature View”. And when the card table contains something like

{ "card-type": "ccg",
  "game": "Magic the gathering",
  "expansion": "Ice Age",
  "type": "Creature",
  "Power": "7",
  "Toughness": "6" }

then that view will be applied because the “game” and the “type” are a match. Does something like that exist or is it possible to modify an existing framework for this?



#StackBounty: #c# #azure #nosql #azure-cosmosdb #scalability Is it possible to model a scalable Follower/Following relationship in Azur…

Bounty: 100

Suppose I have a model class that looks like this:

public class Relationship
{
  public Guid PartitionKey { get; set; }

  public Guid Id { get; set; }

  public DateTime CreatedOn { get; set; }
}
  • PartitionKey is the partition key of the Relationship container, and it is represented by the user id of the person who is being followed (the receiver).

  • Id is the id of the document, and it is represented by the user id of the person who follows the other user (the sender).

This model ensures that the same Id cannot be added to the same PartitionKey so that the follower/following relationship can only be created once between two users. It also allows me to easily look up the list of all followers for a specific person, which is crucial.

The problem is that each logical partition is limited to 10 GB of data. Considering that the actual Relationship model is likely to have more properties, that automatic indexing happens behind the scenes, and that some users have millions of followers, this limit will be hit, making it impossible to create new relationships for the same partition key.

How would one design this model on Cosmos DB so that it is truly scalable?
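One commonly discussed workaround for the logical-partition size limit (an assumption here, not something the post itself proposes) is bucketing: the synthetic partition key combines the receiver id with a bucket number derived from the sender id, so one user's followers spread across several logical partitions while any given (receiver, sender) pair still maps to exactly one partition. A sketch:

```javascript
// Hypothetical bucketing scheme: 16 buckets per receiver.
const BUCKETS = 16;

// Derive a stable bucket from the sender id so the same pair always
// lands in the same logical partition.
function bucketOf(senderId) {
  // Simple deterministic string hash (illustrative, not cryptographic).
  let h = 0;
  for (const ch of senderId) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h % BUCKETS;
}

// Synthetic partition key: receiver id plus the sender's bucket.
function partitionKeyFor(receiverId, senderId) {
  return `${receiverId}:${bucketOf(senderId)}`;
}

// Uniqueness of a relationship is still enforced per partition, because
// a given (receiver, sender) pair always resolves to one partition key.
// Listing all followers of a receiver becomes a fan-out over the
// receiver's BUCKETS bucket keys instead of a single-partition read.
```

The trade-off is exactly that fan-out on reads, so the bucket count would need tuning against follower counts and the 10 GB per-partition limit.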



#StackBounty: #javascript #node.js #amazon-web-services #nosql AWS Lambda function to update newly added DynamoDB records

Bounty: 100

Approach

All data is submitted into my DynamoDB table from another Lambda > API Integration function, where the lastUpdated attribute gets inserted as null. The function below then polls my database every 1 minute looking for new rows that have a null value and performs actions on them until lastUpdated can be updated (once an action is performed elsewhere).

I have the following Node.JS (runtime v8.10) executing on AWS Lambda:

const AWS = require("aws-sdk");
const game = require('game-api');
const uuid = require("uuid");

AWS.config.update({
  region: "us-east-1"
});

exports.handler = async (event, context) => {

    //set db
    var documentClient = new AWS.DynamoDB.DocumentClient();

    //find matches
    var params = {
        TableName: 'matches',
        FilterExpression:'updated_at = :updated_at',
        ExpressionAttributeValues: {
            ":updated_at": 0,
        }
    };
    var rows = await documentClient.scan(params).promise();

    //game object
    let gameAPI = new game(
        [
            "admin@game.com",
            "password"
        ]
    );

    await gameAPI.login();

    for (let match of rows.Items) {

        var player_params = {
            TableName: 'players',
            Key: { "match_id": match.id }
        };

        let player_row = await documentClient.get(player_params).promise();

        //grab stats and compare
        var stats = await gameAPI.getStatsBR(
            player_row.Item.player_1_name,
            player_row.Item.player_1_network
        );
        var new_data = compareModiified(
            match.match_type,
            player_row.Item.player_1_last_updated,
            stats
        );

        if(new_data === true) {

                //we have new data
                let kills;
                let matches;
                let kills_completed;
                let matches_completed;
                let modified;

                switch(match.match_type) {
                    case 'myself':
                        kills = stats.group.solo.kills;
                        kills_completed = (kills - player_row.Item.player_1_kills);
                        matches = stats.group.solo.matches;
                        matches_completed = (matches - player_row.Item.player_1_matches);
                        modified = stats.group.solo.lastModified;
                        break;
                    case 'solo':
                        kills = stats.group.duo.kills;
                        kills_completed = (kills - player_row.Item.player_1_kills);
                        matches = stats.group.duo.matches;
                        matches_completed = (matches - player_row.Item.player_1_matches);
                        modified = stats.group.duo.lastModified;
                        break;
                    case 'duo':
                        kills = stats.group.squad.kills;
                        kills_completed = (kills - player_row.Item.player_1_kills);
                        matches = stats.group.squad.matches;
                        matches_completed = (matches - player_row.Item.player_1_matches);
                        modified = stats.group.squad.lastModified;
                        break;
                }

                var update_params = {
                    TableName:"matches",
                    Key: { "id": match.id },
                    UpdateExpression: "SET #status = :status, updated_at = :modified",
                    ExpressionAttributeNames:{
                        "#status":"status"
                    },
                    ExpressionAttributeValues:{
                        ":status": 1,
                        ":modified": modified
                    }
                };
                await documentClient.update(update_params).promise();

                var report_params = {
                    Item: {
                        'match_id': match.id,
                        'kills': kills_completed,
                        'matches': matches_completed,
                        'completed_at': new Date().getTime()
                    },
                    TableName : 'reports'
                };
                await documentClient.put(report_params).promise();

            } else {

                //we don't have new data.
                console.log("We don't have new data, let's not do anything..");

            }

    }

    return {
        statusCode: 200
    };

};

function compareModiified(match_type, db_modified, stats) {
    var stats_modified;
    switch(match_type) {
        case 'myself':
            stats_modified = stats.group.solo.lastModified;
            break;
        case 'solo':
            stats_modified = stats.group.duo.lastModified;
            break;
        case 'duo':
            stats_modified = stats.group.squad.lastModified;
            break;
    }
    return (stats_modified > db_modified);
}



Installing Apache Usergrid on Linux

About the Project

Apache Usergrid is an open-source Backend-as-a-Service (BaaS or mBaaS) composed of an integrated distributed NoSQL database, application layer and client tier with SDKs for developers looking to rapidly build web and/or mobile applications. It provides elementary services and retrieval features like:

  • User Registration & Management
  • Data Storage
  • File Storage
  • Queues
  • Full Text Search
  • Geolocation Search
  • Joins

It is a multi-tenant system designed for deployment to public cloud environments (such as Amazon Web Services, Rackspace, etc.) or to run on traditional server infrastructures so that anyone can run their own private BaaS deployment.

For architects and back-end teams, it aims to provide a distributed, easily extendable, operationally predictable and highly scalable solution. For front-end developers, it aims to simplify the development process by enabling them to rapidly build and operate mobile and web applications without requiring backend expertise.

Usergrid 2.1.0 Deployment Guide

Though the Usergrid Deployment guide seems to be simple enough, I faced certain hiccups and it took me about 4 days to figure out what I was doing wrong.

This document explains how to deploy the Usergrid v2.1.0 Backend-as-a-Service (BaaS), which comprises the Usergrid Stack, a Java web application, and the Usergrid Portal, which is an HTML5/JavaScript application.

Prerequisites

Below are the software requirements for the Usergrid 2.1.0 Stack and Portal. You can install them all on one computer for development purposes; for production deployment you can deploy them separately and use clustering.

Linux or a UNIX-like system (Usergrid may run on Windows, but we haven’t tried it)

Download the Apache Usergrid 2.1.0 binary release from the official Usergrid releases page.

After untarring it, the files that you need for deploying the Usergrid Stack and Portal are ROOT.war and usergrid-portal.tar.

Stack STEP #1: Setup Cassandra

As mentioned in the prerequisites, follow the installation guide given in the link.

Usergrid uses Cassandra’s Thrift protocol.
Before starting Cassandra, on Cassandra 2.x releases you MUST enable Thrift by setting start_rpc in your cassandra.yaml file:

    #Whether to start the thrift rpc server.
    start_rpc: true

Note: DataStax no longer supports the DataStax Community version of Apache Cassandra or the DataStax Distribution of Apache Cassandra. It is best to follow the Apache documentation.

Once you are up and running make a note of these things:

  • The name of the Cassandra cluster
  • Hostname or IP address of each Cassandra node
    • in case of same machine as Usergrid, then localhost. Usergrid would then be running on single machine embedded mode.
  • Port number used for Cassandra RPC (the default is 9160)
  • Replication factor of Cassandra cluster

Stack STEP #2: Setup ElasticSearch

Usergrid also needs access to at least one ElasticSearch node. As with Cassandra, you can set up a single ElasticSearch node on your computer, but you should run a cluster in production.

Steps:

  • Download and unzip Elasticsearch
  • Run bin/elasticsearch (use bin/elasticsearch -d to run it as a background process on Linux, or bin\elasticsearch.bat on Windows)
  • Run curl http://localhost:9200/

Once you are up and running make a note of these things:

  • The name of the ElasticSearch cluster
  • Hostname or IP address of each ElasticSearch node
    • in case of same machine as Usergrid, then localhost. Usergrid would then be running on single machine embedded mode.
  • Port number used for ElasticSearch protocol (the default is 9200)

Stack STEP #3: Setup Tomcat

The Usergrid Stack is contained in a file named ROOT.war, a standard Java EE WAR ready for deployment to Tomcat. On each machine that will run the Usergrid Stack you must install the Java SE 8 JDK and Tomcat 7+.

Stack STEP #4: Configure Usergrid Stack

You must create a Usergrid properties file called usergrid-deployment.properties. The properties in this file tell Usergrid how to communicate with Cassandra and ElasticSearch, and how to form URLs using the hostname you wish to use for Usergrid. There are many properties that you can set to configure Usergrid.

Once you have created your Usergrid properties file, place it in the Tomcat lib directory. On a Linux system, that directory is probably located at /path/to/tomcat7/lib/

The Default Usergrid Properties File

You should review the defaults in the above file. To get you started, let’s look at a minimal example properties file that you can edit and use as your own.
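As a sketch of what such a minimal file might contain (the property names below follow the stack defaults but should be verified against the default properties file shipped with your Usergrid version; all values are placeholders):

```properties
# Cassandra connection: host:port of each node, and the cluster name
cassandra.url=localhost:9160
cassandra.cluster=Test Cluster

# ElasticSearch connection
elasticsearch.hosts=localhost
elasticsearch.cluster_name=elasticsearch

# Base URL that Usergrid uses when building links to itself
usergrid.api.url.base=http://localhost:8080

# Superuser login used to initialize the database (see the
# "Initialize the Usergrid Database" section below)
usergrid.sysadmin.login.allowed=true
usergrid.sysadmin.login.name=superuser
usergrid.sysadmin.login.password=test
usergrid.sysadmin.login.email=superuser@example.com
```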

Please note that if you are installing Usergrid on the same machine as Cassandra Server, then set the following property to true

   #Tell Usergrid that Cassandra is embedded (i.e. running on the same machine).
   cassandra.embedded=true

Stack STEP #5: Deploy ROOT.war to Tomcat

The next step is to deploy the Usergrid Stack software to Tomcat. There are a variety of ways of doing this and the simplest is probably to place the Usergrid Stack ROOT.war file into the Tomcat webapps directory, then restart Tomcat.

Look for messages like this, which indicate that the ROOT.war file was deployed:

INFO: Starting service Catalina
Jan 29, 2016 1:00:32 PM org.apache.catalina.core.StandardEngine startInternal
INFO: Starting Servlet Engine: Apache Tomcat/7.0.59
Jan 29, 2016 1:00:32 PM org.apache.catalina.startup.HostConfig deployWAR
INFO: Deploying web application archive /usr/share/tomcat7/webapps/ROOT.war

Does it work?

You can use curl:

curl http://localhost:8080/status

If you get a JSON file of status data, then you’re ready to move to the next step. You should see a response that begins like this:

{
  "timestamp" : 1454090178953,
  "duration" : 10,
  "status" : {
    "started" : 1453957327516,
    "uptime" : 132851437,
    "version" : "201601240200-595955dff9ee4a706de9d97b86c5f0636fe24b43",
    "cassandraAvailable" : true,
    "cassandraStatus" : "GREEN",
    "managementAppIndexStatus" : "GREEN",
    "queueDepth" : 0,
    "org.apache.usergrid.count.AbstractBatcher" : {
      "add_invocation" : {
        "type" : "timer",
        "unit" : "microseconds",
        ... etc. ...

Initialize the Usergrid Database

Next, you must initialize the Usergrid database, index and query systems.

To do this you must issue a series of HTTP operations using the superuser credentials. You can only do this if Usergrid is configured to allow superuser login via the property usergrid.sysadmin.login.allowed=true; if you used the above example properties file, it is allowed.

The three operations you must perform are expressed by the curl commands below; of course, you will have to change the password test to match the superuser password that you set in your Usergrid properties file.

curl -X PUT http://localhost:8080/system/database/setup -u superuser:test
curl -X PUT http://localhost:8080/system/database/bootstrap -u superuser:test
curl -X GET http://localhost:8080/system/superuser/setup -u superuser:test

When you issue each of those curl commands, you should see a success message like this:

{
  "action" : "cassandra setup",
  "status" : "ok",
  "timestamp" : 1454100922067,
  "duration" : 374
}

Now that you’ve gotten Usergrid up and running, you’re ready to deploy the Usergrid Portal.

Deploying the Usergrid Portal

The Usergrid Portal is an HTML5/JavaScript application, a bunch of static files that can be deployed to any web server, e.g. Apache HTTPD or Tomcat.

To deploy the Portal to a web server, un-tar the usergrid-portal.tar file into a directory that serves as the root directory of your web pages.

Once you have done that there is one more step. You need to configure the portal so that it can find the Usergrid stack. You do that by editing the portal/config.js and changing this line:

Usergrid.overrideUrl = 'http://localhost:8080/';

Change it to the hostname that you will be using for your Usergrid installation.

I have deployed a sample instance and tested it. You can find ready-to-use configurations in the TechUtils repository.