#StackBounty: #java #scala #jmeter #jsr223 #scriptengine Scala JSR223 script using JMeter/Java context

Bounty: 50

Scala has supported JSR223 scripting since 2.11, for example:

e.eval("""s"a is $a, s is $s"""")

I added the Scala 2.13 jars and tried to execute a script; it can display constants in the response.

But I can’t use JMeter’s bound variables, such as log. I tried:

log.info(a);
$log.info(a);

I also can’t print values to the log; I tried:

var a:Int =  10
println(a)

JMeter bindings code:

Bindings bindings = engine.createBindings();
final Logger logger = LoggerFactory.getLogger(JSR223_INIT_FILE);
bindings.put("log", logger); // $NON-NLS-1$ (this name is fixed)
engine.eval(reader, bindings);

I also tried using bindings directly, but it isn’t in the script’s context:

bindings.get("log").info("aa");

Exception

ERROR o.a.j.p.j.s.JSR223Sampler: Problem in JSR223 script JSR223 Sampler, message: javax.script.ScriptException: not found: value bindings

How can I write a Scala JSR223 script that uses the JMeter/Java binding variables?
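
For reference, here is a minimal, self-contained sketch (my own illustration, outside JMeter) of passing a binding named log to the Scala engine and using it from the script. Bound names are visible inside the script by name (as in the s"a is $a" example above), but they typically arrive untyped, so an explicit cast is usually needed before calling methods on them:

import javax.script.{ScriptEngineManager, SimpleBindings}
import org.slf4j.LoggerFactory

object ScalaJsr223Demo extends App {
  // Requires the Scala compiler/library jars on the classpath,
  // which also provide the "scala" JSR223 engine.
  val engine = new ScriptEngineManager().getEngineByName("scala")

  val bindings = new SimpleBindings()
  bindings.put("log", LoggerFactory.getLogger("demo")) // same name JMeter uses

  val script =
    """
      |val typedLog = log.asInstanceOf[org.slf4j.Logger]
      |typedLog.info("hello from a Scala JSR223 script")
    """.stripMargin

  engine.eval(script, bindings)
}

Inside JMeter the same idea applies to the names JMeter binds (log, ctx, vars, props, ...); bindings itself is not one of the bound names, which is why bindings.get("log") fails with "not found: value bindings".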


Get this bounty!!!

#StackBounty: #scala #amazon-web-services #apache-spark #hadoop #amazon-s3 Can't connect from Spark to S3 – AmazonS3Exception Statu…

Bounty: 50

I am trying to connect from Spark (running on my PC) to my S3 bucket:

val spark = SparkSession
  .builder
  .appName("S3Client")
  .config("spark.master", "local")
  .getOrCreate()

val sc = spark.sparkContext
sc.hadoopConfiguration.set("fs.s3a.access.key", ACCESS_KEY)
sc.hadoopConfiguration.set("fs.s3a.secret.key", SECRET_KEY)

val txtFile = sc.textFile("s3a://bucket-name/folder/file.txt")
val contents = txtFile.collect()

But getting the following exception:

Exception in thread “main”
com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 400,
AWS Service: Amazon S3, AWS Request ID: 07A7BDC9135BCC84, AWS Error
Code: null, AWS Error Message: Bad Request, S3 Extended Request ID:
6ly2vhZ2mAJdQl5UZ/QUdilFFN1hKhRzirw6h441oosGz+PLIvLW2fXsZ9xmd8cuBrNHCdh8UPE=

I have seen this question but it didn’t help me.

Edit:

As Zack suggested, I added:

sc.hadoopConfiguration.set("fs.s3a.endpoint", "s3.eu-central-1.amazonaws.com")

But I still get the same exception.
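
This is not part of the original post, but a commonly suggested direction for V4-only regions such as eu-central-1 is to enable Signature Version 4 signing in addition to setting the regional endpoint. A sketch, assuming the s3a connector backed by the AWS SDK v1 (hadoop-aws):

// Not in the original post: eu-central-1 only accepts Signature V4, so the
// AWS SDK used by s3a has to be told to sign requests with V4 explicitly.
System.setProperty("com.amazonaws.services.s3.enableV4", "true")

sc.hadoopConfiguration.set("fs.s3a.endpoint", "s3.eu-central-1.amazonaws.com")
sc.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")

// On a real cluster (not local mode) the property must also reach the executors,
// e.g. via spark.executor.extraJavaOptions=-Dcom.amazonaws.services.s3.enableV4=true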


Get this bounty!!!

#StackBounty: #scala #apache-spark #user-defined-functions How to add null columns to complex array struct in Spark with a udf

Bounty: 50

I am trying to add null columns to an embedded array[struct] column; this way I will be able to transform a similar complex column:

  case class Additional(id: String, item_value: String)
  case class Element(income:String,currency:String,additional: Additional)
  case class Additional2(id: String, item_value: String, extra2: String)
  case class Element2(income:String,currency:String,additional: Additional2)

  val  my_uDF = fx.udf((data: Seq[Element]) => {
    data.map(x=>new Element2(x.income,x.currency,new Additional2(x.additional.id,x.additional.item_value,null))).seq
  })
  sparkSession.sqlContext.udf.register("transformElements",my_uDF)
  val result=sparkSession.sqlContext.sql("select transformElements(myElements),line_number,country,idate from entity where line_number='1'")

The goal is to add an extra field called extra2 to Element.Additional, so I map this field with a UDF, but it fails with:

org.apache.spark.SparkException: Failed to execute user defined function(anonfun$1: (array<struct<income:string,currency:string,additional:struct<id:string,item_value:string>>>) => array<struct<income:string,currency:string,additional:struct<id:string,item_value:string,extra2:string>>>)

If I print the schema for the ‘myElements’ field, it shows:

 |-- myElements: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- income: string (nullable = true)
 |    |    |-- currency: string (nullable = true)
 |    |    |-- additional: struct (nullable = true)
 |    |    |    |-- id: string (nullable = true)
 |    |    |    |-- item_value: string (nullable = true)

And I am trying to convert it into this schema:

 |-- myElements: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- income: string (nullable = true)
 |    |    |-- currency: string (nullable = true)
 |    |    |-- additional: struct (nullable = true)
 |    |    |    |-- id: string (nullable = true)
 |    |    |    |-- item_value: string (nullable = true)
 |    |    |    |-- extra2: string (nullable = true)
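
A possible cause (my note, not stated in the post): Spark passes the elements of an array<struct> column into a Scala UDF as Row objects, not as the case class Element, so treating the input as Seq[Element] fails at runtime. A sketch of a Row-based variant, reusing the question's case classes and assuming Spark 2.2+:

import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.udf

// Receives each struct as a Row and rebuilds it as Element2 with extra2 = null.
val transformElements = udf { elements: Seq[Row] =>
  elements.map { e =>
    val add = e.getAs[Row]("additional")
    Element2(
      e.getAs[String]("income"),
      e.getAs[String]("currency"),
      Additional2(add.getAs[String]("id"), add.getAs[String]("item_value"), null)
    )
  }
}

sparkSession.udf.register("transformElements", transformElements)

Returning a Seq of case classes is fine (Spark derives the array<struct> schema from Element2/Additional2); only the input side needs to be handled as Rows.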


Get this bounty!!!

#StackBounty: #scala #function-composition #finagle Is it possible to have a generic logging filter in finagle that can be "insert…

Bounty: 50

In our code we create many “finagle pipelines” like so:

val f1 = new Filter[A,B,C,D](...)
val f2 = new SimpleFilter[C,D](...)
val f3 = new Filter[C,D,E,F](...)
val s = new Service[E,F](...)

val pipeline: Service[A,B] = f1 andThen f2 andThen f3 andThen s

I would now like the ability to “insert” loggers anywhere in such a chain. The logger would only log the fact that a request came in and a response was received. Something like this:

class LoggerFilter[Req, Resp](customLog: String) extends SimpleFilter[Req, Resp] with LazyLogging{
  override def apply(request: Req, service: Service[Req, Resp]): Future[Resp] = {
    logger.info(s"$customLog => Request: ${request.getClass.getName} -> ${service.toString}")
    service(request).map{resp =>
      logger.info(s"$customLog => Response: ${resp.getClass.getName} -> ${request.getClass.getName}")
      resp
    }
  }
}

With this approach we have to keep declaring multiple loggers so that the types can align correctly and then we insert at the “right location”.

val logger1 = new LoggerFilter[A,B]("A->B Logger")
val logger2 = new LoggerFilter[C,D]("C->D Logger")
val logger3 = new LoggerFilter[E,F]("E->F Logger")

val pipeline = logger1 andThen f1 andThen f2 andThen logger2 andThen f3 andThen logger3 andThen s

Is there a way this can be avoided? Is it possible to just have a single logger that can infer the Req/Resp types automagically and be “insertable anywhere” in the chain?

E.g.:

val logger = getTypeAgnosticLogger // What's the implementation?

val pipeline = logger andThen f1 andThen f2 andThen logger andThen f3 andThen logger andThen s

// Is this possible - params for logger to print?
val pipeline = logger("f1") andThen f1 andThen f2 andThen logger("f3") andThen f3 andThen logger("s") andThen s
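
A sketch, reusing the LoggerFilter and the f1/f2/f3/s pipeline above (my illustration, not a complete answer): because a filter's Req/Resp cannot be inferred from what follows it in an andThen chain, a single generic factory still needs explicit type arguments at each insertion point, but it does remove the separate val declarations:

import com.twitter.finagle.{Service, SimpleFilter}

// Hypothetical helper: the type parameters must still be written out,
// Scala will not infer them from the rest of the chain.
def logStep[Req, Resp](label: String): SimpleFilter[Req, Resp] =
  new LoggerFilter[Req, Resp](label)

val pipeline: Service[A, B] =
  logStep[A, B]("f1") andThen f1 andThen f2 andThen
  logStep[C, D]("f3") andThen f3 andThen
  logStep[E, F]("s") andThen s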


Get this bounty!!!

#StackBounty: #reinventing-the-wheel #scala #generics #dsl Typeclass-oriented example project with implicit classes

Bounty: 50

The following code is just an example of a project that, due to (nested) higher kinds, relies heavily on typeclasses for its DSL.

When reviewing the code, keep in mind that this is an example, so I am not relying on libraries like cats, even though they might already provide solutions for what I am doing.

That aside, I have some specific questions about aspects of my code, apart from an overall review:

  • What’s the correct package structuring e.g. /algebra, /syntax, /ops, /dsl, /implicits, … and where in that structure do simple case classes (like M or N) go, where typeclasses (like Invertable), where typeclass-instances (like mnInvertable), where implicit classes (like InvertableOps)?
  • Is my use of an implicit class good here, or is there a better way to define .invert on types like M[N[A]]?
  • Should the implicit parameter in the implicit class be moved to def invert instead? (A sketch of that variant is shown after the code below.)
  • What’s the naming convention for typeclasses (I used ...able), typeclass instances (I used type + typeclass) and implicit classes (I used typeclass + Ops)?
  • Bonus: Is it possible to define Invertable as a context bound instead of an implicit parameter? (I think not, but who knows!)
// src/main/scala/myproject/algebra/M.scala
case class M[A](value: A)

// src/main/scala/myproject/algebra/N.scala
case class N[A](value: A)

// src/main/scala/myproject/syntax/Invertable.scala
trait Invertable[F[_], G[_]] {
  def invert[A](fga: F[G[A]]): G[F[A]]
}

// src/main/scala/myproject/implicits/package.scala
// (the implicit definitions need an enclosing (package) object to compile)
package object implicits {

  implicit val mnInvertable: Invertable[M, N] = new Invertable[M, N] {
    def invert[A](fga: M[N[A]]): N[M[A]] = N(M(fga.value.value))
  }

  implicit class InvertableOps[F[_], G[_], A](fga: F[G[A]])(implicit i: Invertable[F, G]) {
    def invert: G[F[A]] = i.invert(fga)
  }
}

// Somewhere in the project
M(N(1)).invert // It works!


Get this bounty!!!

#StackBounty: #scala #join #activerecord #group-by Write join query with groupby in Scala ActiveRecord

Bounty: 150

I am trying to write a specific query in Scala ActiveRecord, but it always returns nothing. I have read the wiki on the GitHub page, but it does not contain much information. The query I am trying to write is:

SELECT e.name, e.id, COUNT(pt.pass_id) as pass_count, e.start_date, e.total_passes_to_offer
FROM events e inner join passes p on e.id = p.event_id inner join pass_tickets pt on p.id = pt.pass_id where e.partner_id = 198 group by e.name, e.id

What I have tried is

Event.joins[Pass, PassTicket](
                (event, pass, passTicket) => (event.id === pass.eventId, pass.id === passTicket.passId)
            ).where(
                (event, _, _) => event.partnerId === partnerId
            ).select(
                (event, pass, _) => (event.name, event.id, PassTicket.where(_.passId === pass.id).count, event.startDate, event.totalPassesToOffer)
            ).groupBy( data => data._2)

But first, the return type becomes a map, not a list. And second, when executed it doesn’t return anything, even though the data exists and the SQL executes fine.


Get this bounty!!!

#StackBounty: #scala #apache-spark #rdd #k-means Spark iterative Kmeans not get expected results?

Bounty: 50

In my homework I am asked to write a naive implementation of K-means in Spark:

import breeze.linalg.{ Vector, DenseVector, squaredDistance }
import scala.math 
def parse(line: String): Vector[Double] = {
    DenseVector(line.split(' ').map(_.toDouble))
  }
def closest_assign(p: Vector[Double], centres: Array[Vector[Double]]): Int = {
    var bestIndex = 1
    var closest = Double.PositiveInfinity

    for (i <- 0 until centres.length) {
      val tempDist = squaredDistance(p, centres(i))

      if (tempDist < closest) {
        closest = tempDist
        bestIndex = i
      }
    }

    bestIndex
 }

val fileroot:String="/FileStore/tables/"
val file=sc.textFile(fileroot+"data.txt")
           .map(parse _)
           .cache()
val c1=sc.textFile(fileroot+"c1.txt")
         .map(parse _)
         .collect()

val c2=sc.textFile(fileroot+"c2.txt")
         .map(parse _)
         .collect()
val K=10
val MAX_ITER=20
var kPoints=c2

for(i<-0 until MAX_ITER){
    val closest = file.map(p => (closest_assign(p, kPoints), (p, 1)))

    val pointStats = closest.reduceByKey { case ((x1, y1), (x2, y2)) => (x1 + x2, y1 + y2) }

    val newPoints = pointStats.map { pair =>
        (pair._1, pair._2._1 * (1.0 / pair._2._2))
      }.collectAsMap()

     for (newP <- newPoints) {
        kPoints(newP._1) = newP._2
      }

  val tempDist = closest
    .map { x => squaredDistance(x._2._1, newPoints(x._1)) }
    .fold(0) { _ + _ }

     println(i+" time finished iteration (cost = " + tempDist + ")") 
}

In theory tempDist should become smaller and smaller as the program runs, but in reality it goes the other way around. I also found that c1 and c2 change value after the for(i<-0 until MAX_ITER) loop, yet c1 and c2 are vals! Is the way I load c1 and c2 wrong? c1 and c2 are two different sets of initial cluster centres for the data.
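
A likely explanation for the c1/c2 observation (my note, not from the original post): val only fixes the reference, not the contents of the array. var kPoints = c2 makes kPoints an alias for the same mutable Array, so the in-place update kPoints(newP._1) = newP._2 inside the loop also rewrites c2. Copying the array keeps the initial centres intact, e.g.:

// A shallow copy is enough here, because the loop replaces whole elements
// of the array rather than mutating the vectors in place.
var kPoints = c2.clone()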


Get this bounty!!!

#StackBounty: #scala #build #sbt Circular project dependencies with a testkit sbt

Bounty: 200

I maintain an open source bitcoin library called bitcoin-s. If you look at the build.sbt file you will see that the testkit project depends on the rpc project, and the rpc project depends on the testkit project as a published dependency inside our Deps.scala file.

This is unfortunate because if we change the API in the rpc project at all, we have to publish a new testkit snapshot to reflect the changes in the rpc API, and then run the tests in the rpc project. You can see a more detailed guide to the build process here.

I would like to make it so that we can just have each project depend on each other in build.sbt with something like this:

lazy val rpc = project
  .in(file("rpc"))
  .enablePlugins()
  .settings(commonSettings: _*)
  .dependsOn(
    core,
    testkit % "test->test"
  )
  .settings(
    testOptions in Test += Tests.Argument("-oF")
  )

lazy val bench = project
  .in(file("bench"))
  .enablePlugins()
  .settings(assemblyOption in assembly := (assemblyOption in assembly).value
    .copy(includeScala = true))
  .settings(commonSettings: _*)
  .settings(
    libraryDependencies ++= Deps.bench,
    name := "bitcoin-s-bench"
  )
  .dependsOn(core)

lazy val eclairRpc = project
  .in(file("eclair-rpc"))
  .enablePlugins()
  .settings(commonSettings: _*)
  .dependsOn(
    core,
    rpc,
    testkit % "test->test"
  )

lazy val testkit = project
  .in(file("testkit"))
  .enablePlugins()
  .settings(commonSettings: _*)
  .dependsOn(
    core,
    rpc,
    eclairRpc
  )

However, this creates a circular dependency between the projects, which leads to a StackOverflowError when loading build.sbt.

Is there any way to avoid this? We currently have a very complicated process for publishing the dependency, which ends up depending on SNAPSHOTs of the project (not full releases) as the bitcoinsV
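
Not from the original post, but one direction that is often suggested for this kind of cycle is to split the testkit: keep the rpc-independent test helpers in a small module that rpc can use in its test scope, and let the rpc-dependent fixtures live in a module that sits above rpc. A rough sketch with hypothetical module names:

// Hypothetical restructuring sketch, not the actual bitcoin-s build:
lazy val testkitCore = project                  // helpers that only need `core`
  .in(file("testkit-core"))
  .settings(commonSettings: _*)
  .dependsOn(core)

lazy val rpc = project
  .in(file("rpc"))
  .settings(commonSettings: _*)
  .dependsOn(core, testkitCore % "test->test")  // no dependency on the full testkit

lazy val testkit = project                      // rpc-aware fixtures live here
  .in(file("testkit"))
  .settings(commonSettings: _*)
  .dependsOn(core, testkitCore, rpc, eclairRpc)

Tests that genuinely need both the rpc client and the rpc-aware fixtures would then move into testkit (or a dedicated tests project), which keeps the project graph acyclic and avoids publishing SNAPSHOT artifacts just to run the build.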


Get this bounty!!!

#StackBounty: #scala #validation #serialization #version #scala-cats How to separate out parsing from validation in case of versioned c…

Bounty: 150

Background

I have a set of configuration JSON files that look like the following:

{
  "version" : 1.0,
  "startDate": 1548419535,
  "endDate": 1558419535,
  "sourceData" : [...],  // nested JSON inside the list
  "destData" : [...],    // nested JSON inside the list
  "extra" : ["business_type"]
}

There are several such config files. They are fixed and reside in my code directory only. The internal representation of each config file is given by my case class Config:

case class Attribute(name: String, mappedTo: String)

case class Data(location: String, mappings:List[Attribute])

case class Config(version: Double, startDate: Long, endDate: Long, sourceData: List[Data],
                  destData: List[Data], extra: List[String])

I have three classes Provider, Parser and Validator.

  1. Provider has a method getConfig(date: Long): Config. It has to return the config satisfying startDate <= date <= endDate (ideally exactly one such config should be present, as startDate to endDate defines the version of config to be returned).
  2. getConfig calls a method inside Parser called parseList(jsonConfigs: List[String]): Try[List[Config]]. parseList tries to deserialize every config in the list, each into an instance of the case class Config. If even one JSON fails to deserialize, parseList returns a scala.util.Failure; otherwise it returns a scala.util.Success[List[Config]].
  3. If scala.util.Success[List[Config]] is returned from the previous step, getConfig finally calls a method inside Validator called def validate(List[Config], Date): ValidationResult[Config], and returns its result. As I want all errors to be accumulated, I am using Cats Validated for validation. I have even asked a question about its correct usage here.
  4. validate does the following: it checks whether exactly one Config in the list is applicable for the given date (startDate <= date <= endDate) and then performs some validations on that Config (otherwise it returns an invalidNel). I perform some basic validations on the Config, like checking that various Lists and Strings are non-empty, and some semantic validations, like checking that each String in the extra field is present in the mappings of each source/dest Data.

Question

  1. The question that has troubled me for the last couple of days is this: my purpose for using Cats Validated was solely to collect all errors (rather than failing fast on the first validation error). But by the time I reach the validate method I have already done some validation in the parseList method. That is, I have already validated in parseList that my JSON structure conforms to my case class Config. However, parseList doesn't accumulate errors the way my validate method does, so if there are many incompatibilities between my JSON structure and my case class Config, I only learn about the first one. I would like to know them all at once.
  2. It gets worse if I start adding require clauses like nonEmpty inside the case class itself (they are invoked during construction of the case class, i.e. during parsing), e.g.
    case class Data(location: String, mappings: List[Attribute]) {
      require(location.nonEmpty)
      require(mappings.nonEmpty)
    }
    

So I am not able to draw a line between my parsing and my validation functionality properly.

  1. One solution I thought of was to abandon the JSON library I am currently using (lift-json) and use play-json instead. It has functionality for accumulating errors, similar to Cats Validated (I got to know about it here; it goes really well with Cats invalidNel). I would first parse the JSON into play-json's AST (JsValue), check structural compatibility between the JsValue and my Config using play-json's validate method (which accumulates errors), and, if that succeeds, read the Config case class from the JsValue and perform the later validations I gave examples of above, using Cats. (A sketch of this approach is shown right after this list.)
  2. But I need to parse all the configs to see which one is applicable for a given date, and I don't proceed if even one config fails to deserialize. If all deserialize successfully, I pick the one whose (startDate, endDate) encloses the given date. So if I follow the solution above, I have pushed the conversion of List[JsValue] to List[Config] into the validation phase. Now, if each JsValue in the list deserializes successfully to a Config instance, I can choose the applicable one, perform more validations on it and return the result. But if some JsValue fails to deserialize, what do I do? Should I return its errors? That doesn't seem intuitive. The problem is that I need to parse every config to find the applicable one, and that makes it harder to mark a separation between the parsing and validation phases.
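
To make the play-json idea in point 1 concrete, here is a rough sketch (my illustration, assuming play-json and cats as dependencies, reusing the case classes above) of a parse step that accumulates structural errors both within a single config and across all configs, leaving the date selection and semantic checks to the existing validate step:

import cats.data.{Validated, ValidatedNel}
import cats.instances.list._
import cats.syntax.traverse._   // on Scala 2.12 this also needs -Ypartial-unification
import play.api.libs.json._

object ConfigParsing {
  implicit val attributeReads: Reads[Attribute] = Json.reads[Attribute]
  implicit val dataReads: Reads[Data]           = Json.reads[Data]
  implicit val configReads: Reads[Config]       = Json.reads[Config]

  // Parse one raw config; JsError already carries every structural mismatch,
  // so nothing is reduced to "only the first error" before validation.
  // (Assumes the raw string is syntactically valid JSON; Json.parse throws otherwise.)
  def parseOne(raw: String): ValidatedNel[String, Config] =
    Json.parse(raw).validate[Config] match {
      case JsSuccess(config, _) => Validated.validNel(config)
      case JsError(errors)      => Validated.invalidNel(JsError.toJson(errors).toString)
    }

  // Parse every config, accumulating the failures of all of them.
  def parseAll(raws: List[String]): ValidatedNel[String, List[Config]] =
    raws.traverse(parseOne)
}

With this shape, parseAll also speaks to point 2: either every config deserializes and the applicable one can be selected and validated further, or the accumulated deserialization errors of all offending configs are returned in one go.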

How do I draw a line between parsing and validating a config in my scenario? Do I change the way I maintain versions (a version being valid from start to end date)?

PS: I am an extremely novice programmer in general. Forgive me if my question is weird. I myself never thought I would spend so much time on validation while learning Scala.


Get this bounty!!!