#StackBounty: #nginx #linux-networking #load-balancing #tcp #debian-buster Nginx fails on high load with Debian10 and not with Debian9

Bounty: 100

We have never had any problems with nginx. We use five nginx servers as load balancers in front of many Spring Boot application servers.

We ran them for years on Debian 9 with the default nginx package, 1.10.3. We have now switched three of our load balancers to Debian 10 with nginx 1.14.2. At first everything ran smoothly; then, under high load, we encountered some problems. It starts with

2020/02/01 17:10:55 [crit] 5901#5901: *3325390 SSL_write() failed while sending to client, client: ...
2020/02/01 17:10:55 [crit] 5901#5901: *3306981 SSL_write() failed while sending to client, client: ...

In between we get lots of

2020/02/01 17:11:04 [error] 5902#5902: *3318748 upstream timed out (110: Connection timed out) while connecting to upstream, ...
2020/02/01 17:11:04 [crit] 5902#5902: *3305656 SSL_write() failed while sending response to client, client: ...
2020/02/01 17:11:30 [error] 5911#5911: unexpected response for ocsp.int-x3.letsencrypt.org

It ends with

2020/02/01 17:11:33 [error] 5952#5952: unexpected response for ocsp.int-x3.letsencrypt.org

The problem only exists for 30-120 seconds during high load and disappears afterwards.

Sometimes we see this in the kernel log:

Feb 1 17:11:04 kt104 kernel: [1033003.285044] TCP: request_sock_TCP: Possible SYN flooding on port 443. Sending cookies. Check SNMP counters.

On other occasions, however, we don't see any kernel log messages at all.
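
The counters that kernel message points at can be checked with standard tools while the problem is happening; a minimal sketch (exact counter names vary slightly between kernel versions, and none of this is specific to our setup):

# cumulative SNMP/netstat counters for SYN drops, listen queue overflows and SYN cookies
netstat -s | grep -iE 'listen|syn|cookie'

# or with nstat, which prints deltas since its last invocation
nstat -az | grep -iE 'ListenDrops|ListenOverflows|Syncookies'

# for listening sockets, ss shows the current accept-queue length (Recv-Q)
# and the configured backlog (Send-Q)
ss -lnt | grep ':443'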

On both the Debian 9 and Debian 10 servers we use an identical setup and have some TCP tuning in place:

# Kernel tuning settings
# https://www.nginx.com/blog/tuning-nginx/
net.core.rmem_max=26214400
net.core.wmem_max=26214400
net.ipv4.tcp_rmem=4096 524288 26214400
net.ipv4.tcp_wmem=4096 524288 26214400
net.core.somaxconn=1000
net.core.netdev_max_backlog=5000
net.ipv4.tcp_max_syn_backlog=10000
net.ipv4.ip_local_port_range=16000 61000
net.ipv4.tcp_max_tw_buckets=2000000
net.ipv4.tcp_fin_timeout=30
net.core.optmem_max=20480
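
One related detail worth noting (an assumption on my part, not something visible above): on Linux nginx uses a listen backlog of 511 unless the listen directive sets its own backlog parameter, and the effective queue is the smaller of that value and net.core.somaxconn, so raising somaxconn alone does not raise nginx's accept backlog. A hedged sketch of what that could look like in one of the included site configs:

server {
    listen 443 ssl backlog=4096;   # without backlog=..., nginx stays at 511 on Linux
    # ... rest of the existing server block unchanged ...
}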

The nginx config is exactly the same, so I just show the main file:

user www-data;
worker_processes auto;
worker_rlimit_nofile 50000;
pid /run/nginx.pid;

events {
    worker_connections 5000;
    multi_accept on;
    use epoll;
}

http {
    root /var/www/loadbalancer;
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    types_hash_max_size 2048;
    server_tokens off;
    client_max_body_size 5m;
    client_header_timeout 20s; # default 60s
    client_body_timeout 20s; # default 60s
    send_timeout 20s; # default 60s

    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    ssl_protocols TLSv1 TLSv1.1 TLSv1.2; # Dropping SSLv3, ref: POODLE
    ssl_session_timeout 1d;
    ssl_session_cache shared:SSL:100m;
    ssl_buffer_size 4k;
    ssl_dhparam /etc/nginx/dhparam.pem;
    ssl_prefer_server_ciphers on;
    ssl_ciphers 'ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA:ECDHE-RSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-RSA-AES256-SHA256:DHE-RSA-AES256-SHA:ECDHE-ECDSA-DES-CBC3-SHA:ECDHE-RSA-DES-CBC3-SHA:EDH-RSA-DES-CBC3-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:DES-CBC3-SHA:!DSS';

    ssl_session_tickets on;
    ssl_session_ticket_key /etc/nginx/ssl_session_ticket.key;
    ssl_session_ticket_key /etc/nginx/ssl_session_ticket_old.key;

    ssl_stapling on;
    ssl_stapling_verify on;
    ssl_trusted_certificate /etc/ssl/rapidssl/intermediate-root.pem;

    resolver 8.8.8.8;

    log_format custom '$host $server_port $request_time $upstream_response_time $remote_addr "$ssl_session_reused" $upstream_addr $time_iso8601 "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"';

    access_log /var/log/nginx/access.log custom;
    error_log /var/log/nginx/error.log;

    proxy_set_header Host $http_host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_cache_path /var/cache/nginx/ levels=1:2 keys_zone=imagecache:10m inactive=7d use_temp_path=off;
    proxy_connect_timeout 10s;
    proxy_read_timeout 20s;
    proxy_send_timeout 20s;
    proxy_next_upstream off;

    map $http_user_agent $outdated {
        default 0;
        "~MSIE [1-6]." 1;
        "~Mozilla.*Firefox/[1-9]." 1;
        "~Opera.*Version/[0-9]." 1;
        "~Chrome/[0-9]." 1;
    }

    include sites/*.conf;
}
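
Given the "unexpected response for ocsp.int-x3.letsencrypt.org" messages, it may also be worth pinning down the resolver behaviour used for OCSP stapling. Purely as a syntax sketch (not a claim that it fixes anything), the resolver directive accepts additional parameters:

    resolver 8.8.8.8 valid=300s ipv6=off;
    resolver_timeout 5s;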

The upstream timeouts suggest some problem with our Java machines. But at the same time the Debian 9 nginx load balancer is running fine and has no problem connecting to any of the upstream servers.
The problems with Let's Encrypt OCSP and SSL_write() point me towards a problem with nginx, TCP, or something at that layer.
I really don't know how to debug this situation, but we can reliably reproduce it most of the times we hit high load on the Debian 10 servers, and we have never seen it on Debian 9.

Then I installed nginx 1.16 from the stable branch on Debian 10 to see whether this is a bug in nginx that has already been fixed:

nginx version: nginx/1.16.1
built by gcc 8.3.0 (Debian 8.3.0-6)
built with OpenSSL 1.1.1c 28 May 2019 (running with OpenSSL 1.1.1d 10 Sep 2019)
TLS SNI support enabled
configure arguments: ...

But it didn’t help.

It seems to be a network-related problem, but we do not encounter it on the application servers. Their load is of course lower, since the load balancer/nginx machines have to handle both external and internal traffic.

It is very difficult to debug because it only happens under high load. We tried to load test the servers with ab, but we could not reproduce the problem.

Can somebody give me some hints on how to start debugging this further?


Get this bounty!!!

#StackBounty: #postgresql #docker #ssl #tcp #traefik How to connect to Traefik TCP Services with TLS configuration enabled?

Bounty: 50

I am trying to configure Traefik so that I can reach services via domain names without having to assign different ports: for example, two MongoDB services, both on the default port but under different domains, example.localhost and example2.localhost. Only this example works. I mean, the other cases probably work too, but I can't connect to them, and I don't understand what the problem is. This is probably not even a problem with Traefik.

I have prepared a repository with an example that works. You just need to generate your own certificate with mkcert. The page at example.localhost returns a 403 Forbidden error, but don't worry about that; the purpose of this configuration is only to show that SSL is working (padlock, green status). So don't focus on the 403.

Only the SSL connection to the mongo service works. I tested it with Robo 3T: after selecting the SSL option, entering example.localhost as the host, and choosing the certificate for a self-signed (or own) connection, it works. And that is the only thing that works this way. Connections to redis (Redis Desktop Manager) and to pgsql (PhpStorm, DBeaver, DbVisualizer) do not work, regardless of whether I provide certificates or not. I do not forward SSL to the services; I only connect to Traefik with it. I have spent long hours on this and searched the internet, but I haven't found the answer yet. Has anyone solved this?

PS. I work on Linux Mint, so my configuration should work in that environment without any problem. Please suggest solutions for Linux.


If you do not want to browse the repository, I attach the most important files:

docker-compose.yml

version: "3.7"

services:
    traefik:
        image: traefik:v2.0
        ports:
            - 80:80
            - 443:443
            - 8080:8080
            - 6379:6379
            - 5432:5432
            - 27017:27017
        volumes:
            - /var/run/docker.sock:/var/run/docker.sock:ro
            - ./config.toml:/etc/traefik/traefik.config.toml:ro
            - ./certs:/etc/certs:ro
        command:
            - --api.insecure
            - --accesslog
            - --log.level=INFO
            - --entrypoints.http.address=:80
            - --entrypoints.https.address=:443
            - --entrypoints.traefik.address=:8080
            - --entrypoints.mongo.address=:27017
            - --entrypoints.postgres.address=:5432
            - --entrypoints.redis.address=:6379
            - --providers.file.filename=/etc/traefik/traefik.config.toml
            - --providers.docker
            - --providers.docker.exposedByDefault=false
            - --providers.docker.useBindPortIP=false

    apache:
        image: php:7.2-apache
        labels:
            - traefik.enable=true
            - traefik.http.routers.http-dev.entrypoints=http
            - traefik.http.routers.http-dev.rule=Host(`example.localhost`)
            - traefik.http.routers.https-dev.entrypoints=https
            - traefik.http.routers.https-dev.rule=Host(`example.localhost`)
            - traefik.http.routers.https-dev.tls=true
            - traefik.http.services.dev.loadbalancer.server.port=80
    pgsql:
        image: postgres:10
        environment:
            POSTGRES_DB: postgres
            POSTGRES_USER: postgres
            POSTGRES_PASSWORD: password
        labels:
            - traefik.enable=true
            - traefik.tcp.routers.pgsql.rule=HostSNI(`example.localhost`)
            - traefik.tcp.routers.pgsql.tls=true
            - traefik.tcp.routers.pgsql.service=pgsql
            - traefik.tcp.routers.pgsql.entrypoints=postgres
            - traefik.tcp.services.pgsql.loadbalancer.server.port=5432
    mongo:
        image: mongo:3
        labels:
            - traefik.enable=true
            - traefik.tcp.routers.mongo.rule=HostSNI(`example.localhost`)
            - traefik.tcp.routers.mongo.tls=true
            - traefik.tcp.routers.mongo.service=mongo
            - traefik.tcp.routers.mongo.entrypoints=mongo
            - traefik.tcp.services.mongo.loadbalancer.server.port=27017
    redis:
        image: redis:3
        labels:
            - traefik.enable=true
            - traefik.tcp.routers.redis.rule=HostSNI(`example.localhost`)
            - traefik.tcp.routers.redis.tls=true
            - traefik.tcp.routers.redis.service=redis
            - traefik.tcp.routers.redis.entrypoints=redis
            - traefik.tcp.services.redis.loadbalancer.server.port=6379

config.toml

[tls]
[[tls.certificates]]
certFile = "/etc/certs/example.localhost.pem"
keyFile = "/etc/certs/example.localhost-key.pem"

Build & Run

mkcert example.localhost # in ./certs/
docker-compose up -d

Step by step

  1. Install mkcert
  2. Clone my code
  3. In certs folder run mkcert example.localhost
  4. Start container by docker-compose up -d
  5. Open page https://example.localhost/ and check if it is secure connection
  6. If address http://example.localhost/ is not reachable, add 127.0.0.1 example.localhost to /etc/hosts
  7. Download and run Robo 3T
  8. Create new connection:
    • Address: example.localhost
    • Use SSL protocol
    • Self-signed Certificate
  9. Test tool (image below)
  10. Try to connect the same way to postgres or redis (with the appropriate clients, of course); a psql example is sketched below the screenshot

(screenshot: Robo 3T connection test)
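
For reference, forcing TLS from a command-line client when testing the postgres route could look like the line below (psql used only as an example client; the credentials are the ones from the compose file above, and sslmode=require merely forces TLS without verifying the certificate):

psql "postgresql://postgres:password@example.localhost:5432/postgres?sslmode=require"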


Get this bounty!!!

#StackBounty: #linux #sockets #go #tcp Why accepted two same 5-tuple socket when concurrent connect to the server?

Bounty: 50

server.go

package main

import (
    "fmt"
    "io"
    "io/ioutil"
    "log"
    "net"
    "net/http"
    _ "net/http/pprof"
    "sync"
    "syscall"
)

type ConnSet struct {
    data  map[int]net.Conn
    mutex sync.Mutex
}

func (m *ConnSet) Update(id int, conn net.Conn) error {
    m.mutex.Lock()
    defer m.mutex.Unlock()
    if _, ok := m.data[id]; ok {
        fmt.Printf("add: key %d existed\n", id)
        return fmt.Errorf("add: key %d existed\n", id)
    }
    m.data[id] = conn
    return nil
}

var connSet = &ConnSet{
    data: make(map[int]net.Conn),
}

func main() {
    setLimit()

    ln, err := net.Listen("tcp", ":12345")
    if err != nil {
        panic(err)
    }

    go func() {
        if err := http.ListenAndServe(":6060", nil); err != nil {
            log.Fatalf("pprof failed: %v", err)
        }
    }()

    var connections []net.Conn
    defer func() {
        for _, conn := range connections {
            conn.Close()
        }
    }()

    for {
        conn, e := ln.Accept()
        if e != nil {
            if ne, ok := e.(net.Error); ok && ne.Temporary() {
                log.Printf("accept temp err: %v", ne)
                continue
            }

            log.Printf("accept err: %v", e)
            return
        }
        port := conn.RemoteAddr().(*net.TCPAddr).Port
        connSet.Update(port, conn)
        go handleConn(conn)
        connections = append(connections, conn)
        if len(connections)%100 == 0 {
            log.Printf("total number of connections: %v", len(connections))
        }
    }
}

func handleConn(conn net.Conn) {
    io.Copy(ioutil.Discard, conn)
}

func setLimit() {
    var rLimit syscall.Rlimit
    if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rLimit); err != nil {
        panic(err)
    }
    rLimit.Cur = rLimit.Max
    if err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &rLimit); err != nil {
        panic(err)
    }

    log.Printf("set cur limit: %d", rLimit.Cur)
}

client.go

package main

import (
    "bytes"
    "flag"
    "fmt"
    "io"
    "log"
    "net"
    "os"
    "strconv"
    "sync"
    "syscall"
    "time"
)

var portFlag = flag.Int("port", 12345, "port")

type ConnSet struct {
    data  map[int]net.Conn
    mutex sync.Mutex
}

func (m *ConnSet) Update(id int, conn net.Conn) error {
    m.mutex.Lock()
    defer m.mutex.Unlock()
    if _, ok := m.data[id]; ok {
        fmt.Printf("add: key %d existed\n", id)
        return fmt.Errorf("add: key %d existed\n", id)
    }
    m.data[id] = conn
    return nil
}

var connSet = &ConnSet{
    data: make(map[int]net.Conn),
}

func echoClient() {
    addr := fmt.Sprintf("127.0.0.1:%d", *portFlag)
    dialer := net.Dialer{}
    conn, err := dialer.Dial("tcp", addr)
    if err != nil {
        fmt.Println("ERROR", err)
        os.Exit(1)
    }
    port := conn.LocalAddr().(*net.TCPAddr).Port
    connSet.Update(port, conn)
    defer conn.Close()

    for i := 0; i < 10; i++ {
        s := fmt.Sprintf("%s", strconv.Itoa(i))
        _, err := conn.Write([]byte(s))
        if err != nil {
            log.Println("write error: ", err)
        }
        b := make([]byte, 1024)
        _, err = conn.Read(b)
        switch err {
        case nil:
            if string(bytes.Trim(b, "\x00")) != s {
                log.Printf("resp req not equal, req: %d, res: %s", i, string(bytes.Trim(b, "\x00")))
            }
        case io.EOF:
            fmt.Println("eof")
            break
        default:
            fmt.Println("ERROR", err)
            break
        }
    }
    time.Sleep(time.Hour)
    if err := conn.Close(); err != nil {
        log.Printf("client conn close err: %s", err)
    }
}

func main() {
    flag.Parse()
    setLimit()
    before := time.Now()
    var wg sync.WaitGroup
    for i := 0; i < 20000; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            echoClient()
        }()
    }
    wg.Wait()
    fmt.Println(time.Now().Sub(before))
}

func setLimit() {
    var rLimit syscall.Rlimit
    if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rLimit); err != nil {
        panic(err)
    }
    rLimit.Cur = rLimit.Max
    if err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &rLimit); err != nil {
        panic(err)
    }

    log.Printf("set cur limit: %d", rLimit.Cur)
}

running command

go run server.go
---
go run client.go

Server running screenshot:

(screenshot)

The client simultaneously initiates 20,000 connections to the server, and the server accepts two connections whose remote port is exactly the same (within an extremely short period of time).

I tried tcpconnect.py from bcc (patched to also print skc_num) and tcpaccept.py to trace the connections (screenshots omitted), and they likewise show that the remote port is duplicated on the server side while there is no duplicate on the client side.

In my understanding, a socket's 5-tuple cannot be duplicated, so why does the server accept two sockets with exactly the same remote port?

My test environment:

kernel version 5.3.15-300.fc31.x86_64 and 4.19.1

go version go1.13.5 linux/amd64
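
One way to narrow this down (my own sketch, not part of the code above) would be to log the complete local/remote address pair on accept instead of keying only on the remote port, so that an apparent duplicate can be checked against the full 4-tuple:

package main

import (
    "io"
    "io/ioutil"
    "log"
    "net"
)

// Minimal standalone sketch: accept connections on the same port as server.go
// and log the full 4-tuple (local IP:port and remote IP:port) of each one.
func main() {
    ln, err := net.Listen("tcp", ":12345")
    if err != nil {
        log.Fatal(err)
    }
    for {
        conn, err := ln.Accept()
        if err != nil {
            log.Printf("accept err: %v", err)
            continue
        }
        l := conn.LocalAddr().(*net.TCPAddr)
        r := conn.RemoteAddr().(*net.TCPAddr)
        log.Printf("accepted local %v:%d remote %v:%d", l.IP, l.Port, r.IP, r.Port)
        // keep the connection open and discard incoming data, like server.go does
        go io.Copy(ioutil.Discard, conn)
    }
}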


Get this bounty!!!

#StackBounty: #c# #networking #async-await #task-parallel-library #tcp Socket application using TPL

Bounty: 50

This is an application I wrote that allows multiple TCP clients to share a single TCP connection to a remote server (hosted project, and a demo). Traffic generated by the server is forwarded to all clients; traffic generated by any client is forwarded to the server. I am not concerned with the case where client traffic is interleaved, because in my domain traffic is spontaneous and occasionally interleaved, invalid data is acceptable.

The Goal

The goal of the code review is to examine whether my .NET socket programming and TPL usage are idiomatic, whether I handle task execution and cancellation correctly, and whether there are any performance concerns, e.g. unnecessarily blocking the socket communication.

High Level Design

The project is hosted here; feel free to read the README. Implementation-wise, I designed it to be as responsive as possible: no socket connection is unnecessarily blocked.

  • There is an outbound queue. Each outbound packet from each client is put into this queue.
  • There is an inbound queue for each client. Each inbound packet is put into each of these queues.

For each client there are two async tasks: one that dequeues from the client's inbound queue (this can block when the queue is empty, which is desirable) and writes the data to the client socket, and one that reads data from the client socket and enqueues it to the outbound queue.

Similarly, there are two async tasks for the remote connection: one that dequeues from the outbound queue and writes to the remote socket, and one that reads from the remote socket and enqueues to each client's inbound queue.

There is another async task that accepts commands from stdin for connecting to/disconnecting from the server, dumping diagnostics info, etc. But that is not the main focus.


The entire code base is hosted here, but the most relevant classes are listed below.

Program.cs

namespace Multiplexer
{
    using System.Threading.Tasks;

    class Program
    {
        static void Main(string[] args)
        {
            var glob = new Global();
            var ctrl = new ControlChannel(glob);
            var clientServer = new ClientServer(3333, glob);

            Task.WaitAny(
                Task.Run(() => ctrl.Run(), glob.CancellationToken),
                Task.Run(() => clientServer.Run(), glob.CancellationToken));
        }
    }
}

ClientServer.cs

namespace Multiplexer
{
    using System;
    using System.Net;
    using System.Net.Sockets;
    using System.Threading.Tasks;
    /// <summary>
    /// Listens to connection requests and manages client connections
    /// </summary>
    class ClientServer
    {
        /// <summary>
        /// Port to listen on for clients connections
        /// </summary>
        readonly int port;
        Global glob;

        public ClientServer(
            int port, 
            Global glob)
        {
            this.port = port;
            this.glob = glob;
        }

        /// <summary>
        /// Continuously listening for client connection requests
        /// </summary>
        public async Task Run()
        {
            var localserver = new TcpListener(IPAddress.Parse("127.0.0.1"), port);
            localserver.Start();

            while (true)
            {
                Console.WriteLine("Waiting for clients to connect...");

                // AcceptTcpClientAsync() does not accept a cancellation token. But it's OK since
                // in no case would I want the client listener loop to stop running during the entire
                // multiplexer lifecycle. For the sake of completeness, if it is necessary to cancel
                // this operation, one could use CancellationToken.Register(localserver.Stop).
                // See: http://stackoverflow.com/a/30856169/695964
                var client = await localserver.AcceptTcpClientAsync();

                var clientWrapper = new Client(client, glob.CancellationToken, Upload);
                Console.WriteLine($"Client connected: {clientWrapper}");

                // Register client
                glob.Clients[clientWrapper] = 0;

                // Unregister client when it is terminating
                clientWrapper.OnClose = () =>
                {
                    Console.WriteLine($"Removing client from clients list: {clientWrapper}");
                    byte c;
                    glob.Clients.TryRemove(clientWrapper, out c);
                };

                // Start the client. This is fire-and-forget; we don't want to await it. That is OK
                // because Start() has the necessary logic to handle client termination and disposal.
                var tsk = clientWrapper.Start();
            }
        }

        /// <summary>
        /// Implementation of upload delegate to be called when there's data to upload to remote server
        /// </summary>
        /// <param name="data">the outbound data</param>
        void Upload(byte[] data)
        {
            // Do not enqueue data if remote is not connected (drop it)
            if (glob.Remote.Connected)
            {
                glob.UploadQueue.TryAdd(data);
            }
        }
    }
}

ControlChannel.cs

namespace Multiplexer
{
    using Newtonsoft.Json;
    using System;
    using System.Linq;
    using System.Net.Sockets;
    using System.Threading.Tasks;

    /// <summary>
    /// Class to handle multiplexer management commands.
    /// </summary>
    class ControlChannel
    {
        Global glob;
        public ControlChannel(Global glob)
        {
            this.glob = glob;
        }

        public void Run()
        {
            while (true)
            {
                var line = Console.ReadLine();

                var toks = line.Split();
                switch (toks[0])
                {
                    // Connecting to remote server
                    // connect <remote_server> <port>
                    case "connect":
                        if (glob.Remote.Connected)
                        {
                            Console.WriteLine("Error: Remote is connected. Try disconnect first");
                        }
                        else
                        {
                            // Reset the upload queue so stale outbound data is not uploaded to the new
                            // connection
                            glob.ResetUploadQueue();

                            // This is a fire-and-forget task. It is the responsibility of
                            // the task to properly handle resource clean up.
                            Task.Run(() => StartServer(toks[1], int.Parse(toks[2])));
                        }
                        break;

                    // Disconnecting:
                    // disconnect
                    case "disconnect":
                        if (glob.Remote.Connected)
                        {
                            Console.WriteLine("Disconnecting");

                            // cancel the global cancellation token. This would disconnect the server and all 
                            // clients. It is desirable to disconnect the clients to maintain equivalence with
                            // the case where a client is directly connected to the server.
                            glob.Cancel();
                        }
                        else
                        {
                            Console.WriteLine("Not connected");
                        }
                        break;

                    // Dump multiplexer status
                    // info|stats
                    case "info":
                    case "stats":
                        DumpStats();
                        break;

                    // Quit the multiplexer application
                    case "quit":
                        Console.WriteLine("Exiting...");
                        glob.Cancel();
                        return; // terminate the control channel loop

                    default:
                        Console.WriteLine("Unknown command: " + line);
                        break;
                }
            }
        }

        /// <summary>
        /// Dump multiplexer info
        /// </summary>
        private void DumpStats()
        {
            var info = new
            {
                Remote = glob.Remote,
                Clients = glob.Clients.Select(c => c.ToString()).ToArray(),
                UploadQueue = glob.UploadQueue.Select(msg => System.Text.Encoding.UTF8.GetString(msg)).ToArray(),
            };
            Console.WriteLine(JsonConvert.SerializeObject(info, Formatting.Indented));
        }

        /// <summary>
        /// Start the remote connection
        /// </summary>
        /// <param name="hostname">remote hostname</param>
        /// <param name="port">remote port</param>
        async Task StartServer(string hostname, int port)
        {
            Console.WriteLine($"Connecting to {hostname}:{port}");
            var remote = new TcpClient(hostname, port);
            var server = new Remote(remote, glob.UploadQueue, glob.CancellationToken, /* receive: */ data =>
            {
                // Implementation of receive() is to put inbound data to each of the client queues.
                // Note that this is non-blocking. If any queue is full, the data is dropped from that
                // queue.
                foreach (var client in glob.Clients)
                {
                    client.Key.DownlinkQueue.TryAdd(data);
                }
            });

            // Register the remote connection globally so everyone is aware of the connection status. This is 
            // essential for several requirements, e.g., client data should not be added to outbound queue if
            // there's no connection.
            glob.RegisterRemote(server);

            try
            {
                // Start and wait for the remote connection to terminate
                await server.Start();
            }
            catch (Exception e)
            {
                Console.WriteLine(e);
            }
            finally
            {
                Console.WriteLine($"Disposing remote connection: {server}");
                server.Dispose();
                server = null;

                // When remote connection is terminated. Also disconnects all the clients.
                glob.Cancel();
            }
        }
    }
}

Remote.cs

namespace Multiplexer
{
    using System;
    using System.Collections.Concurrent;
    using System.Linq;
    using System.Net.Sockets;
    using System.Threading;
    using System.Threading.Tasks;

    /// <summary>
    /// An interface to expose read-only remote connection information
    /// </summary>
    public interface IRemoteInfo
    {
        /// <summary>
        /// Whether connected to the remote service
        /// </summary>
        bool Connected { get; }

        /// <summary>
        /// Remote address
        /// </summary>
        string RemoteAddress { get; }
    }

    /// <summary>
    /// Class to manage connection to the remote server
    /// </summary>
    class Remote : IDisposable, IRemoteInfo
    {
        readonly TcpClient client;
        readonly NetworkStream stream;

        /// <summary>
        /// A delegate called on receiving a package. Implementation could be submitting the package to the queue.
        /// </summary>
        readonly Action<byte[]> receive;

        /// <summary>
        /// Queue for data to be uploaded to remote server
        /// </summary>
        readonly BlockingCollection<byte[]> uplinkQueue;

        /// <summary>
        /// a linked cancellation token that is cancelled when:
        ///   - external cancellation is requested, or
        ///   - the token of the linked CTS is cancelled
        /// Note that the cancellation of the linked source won't propagate to the external token
        /// </summary>
        readonly CancellationTokenSource linkedCTS;

        public bool Connected => client.Connected;
        public string RemoteAddress => $"{client.Client.RemoteEndPoint}";

        public Remote(
            TcpClient client,
            BlockingCollection<byte[]> uplinkQueue,
            CancellationToken externalCancellationToken, 
            Action<byte[]> receive)
        {
            this.client = client;
            this.uplinkQueue = uplinkQueue;
            this.receive = receive;
            linkedCTS = CancellationTokenSource.CreateLinkedTokenSource(externalCancellationToken);
            stream = client.GetStream();
        }

        /// <summary>
        /// Async task to handle downlink (remote -> multiplexer) traffic
        /// 
        /// This is to read from the socket and put data into the downlink queue (via receive())
        /// </summary>
        async Task HandleDownlink()
        {
            linkedCTS.Token.ThrowIfCancellationRequested();
            int c;
            byte[] buffer = new byte[256];
            while ((c = await stream.ReadAsync(buffer, 0, buffer.Length, linkedCTS.Token)) > 0)
            {
                // Receive is non-blocking
                receive(buffer.Take(c).ToArray());
            }
        }

        /// <summary>
        /// Async task to handle uplink (multiplexer -> remote) traffic
        /// 
        /// This is to take data from the uplink queue and write into the socket.
        /// </summary>
        async Task HandleUplink()
        {
            linkedCTS.Token.ThrowIfCancellationRequested();
            byte[] data;

            // Taking from the queue can be blocked if there's nothing in the queue for consumption
            while (null != (data = uplinkQueue.Take(linkedCTS.Token)))
            {
                await stream.WriteAsync(data, 0, data.Length, linkedCTS.Token);
            }
        }

        /// <summary>
        /// Async task to start and wait for the uplink and downlink handlers
        /// </summary>
        public async Task Start()
        {
            try
            {
                var downlinkTask = Task.Run(HandleDownlink, linkedCTS.Token);
                var uplinkTask = Task.Run(HandleUplink, linkedCTS.Token);

                // If either task returns, the connection is considered to be terminated.
                await await Task.WhenAny(downlinkTask, uplinkTask);
            }
            catch (Exception e)
            {
                Console.WriteLine(e);
            }
            finally
            {
                // Cancel the other task (uplink or downlink)
                linkedCTS.Cancel();
                Console.WriteLine("Remote connection exited.");
                this.Dispose();
            }
        }

        public void Dispose()
        {
            Console.WriteLine("Disposing of remote connection");
            linkedCTS.Dispose();
            stream.Dispose();
            client.Close();
        }
    }
}

Client.cs

namespace Multiplexer
{
    using System;
    using System.Collections.Concurrent;
    using System.Linq;
    using System.Net.Sockets;
    using System.Threading;
    using System.Threading.Tasks;

    /// <summary>
    /// A class to handle local client connections (client - multiplexer)
    /// </summary>
    class Client : IDisposable
    {
        readonly TcpClient client;

        /// <summary>
        /// A delegate to invoke when receiving data from the client socket that should be uploaded.
        /// 
        /// Implementation should put this to the outbound queue. This should be non-blocking.
        /// </summary>
        readonly Action<byte[]> upload;
        readonly NetworkStream stream;

        /// <summary>
        /// A queue containing data from remote server that should be delivered to this client
        /// </summary>
        readonly BlockingCollection<byte[]> downlinkQueue = new BlockingCollection<byte[]>();

        /// <summary>
        /// A cancellation token source linked with an external token
        /// </summary>
        readonly CancellationTokenSource cts;

        public BlockingCollection<byte[]> DownlinkQueue
        {
            get
            {
                return this.downlinkQueue;
            }
        }

        /// <summary>
        /// A delegate to be called when the client is closed. The <see cref="ClientServer"/> uses this to 
        /// properly remove the client from the clients list.
        /// </summary>
        public Action OnClose { get; set; }

        public Client(TcpClient client, CancellationToken externalCancellationToken, Action<byte[]> upload)
        {
            this.client = client;
            this.upload = upload;
            this.cts = CancellationTokenSource.CreateLinkedTokenSource(externalCancellationToken);
            this.stream = client.GetStream();
        }

        /// <summary>
        /// Start the client traffic
        /// </summary>
        public async Task Start()
        {
            var uplinkTask = Task.Run(HandleUplink, cts.Token);
            var downlinkTask = Task.Run(HandleDownlink, cts.Token);

            try
            {
                // Await for either of the downlink or uplink task to finish
                await await Task.WhenAny(uplinkTask, downlinkTask);
            }
            catch (Exception e)
            {
                Console.WriteLine(e);
            }
            finally
            {
                // Cancel the other task (uplink or downlink)
                cts.Cancel();
                Console.WriteLine("Client closing");
                Dispose();
            }
        }

        /// <summary>
        /// Handle uplink traffic (client -> multiplexer -> remote)
        /// </summary>
        async Task HandleUplink()
        {
            cts.Token.ThrowIfCancellationRequested();
            int c;
            byte[] buffer = new byte[256];
            while ((c = await stream.ReadAsync(buffer, 0, buffer.Length, cts.Token)) > 0)
            {
                upload(buffer.Take(c).ToArray());
            }
        }

        /// <summary>
        /// Handle downlink traffic (remote -> multiplexer -> client)
        /// </summary>
        async Task HandleDownlink()
        {
            cts.Token.ThrowIfCancellationRequested();
            byte[] data;

            // This would block if the downlink queue is empty
            while (null != (data = downlinkQueue.Take(cts.Token)))
            {
                await stream.WriteAsync(data, 0, data.Length, cts.Token);
            }
        }

        public override string ToString()
        {
            return client.Client.RemoteEndPoint.ToString();
        }

        public void Dispose()
        {
            Console.WriteLine($"Disposing of client: {this}");
            OnClose();
            cts.Dispose();
            stream.Dispose();
            client.Close();
        }
    }
}

Global.cs

namespace Multiplexer
{
    using System;
    using System.Collections.Concurrent;
    using System.Threading;

    /// <summary>
    /// A class to hold common dependencies to other classes
    /// </summary>
    /// <remarks>
    /// This used to be a singleton and referenced directly by other code, hence the name "Global".
    /// I changed it to be dependencies passed as constructor parameters of other classes so it's 
    /// easier to write tests.
    /// </remarks>
    class Global
    {
        /// <summary>
        /// Queue for data to be uploaded to remote server (client -> remote)
        /// </summary>
        public BlockingCollection<byte[]> UploadQueue => uploadQueue;

        /// <summary>
        /// Set of connected clients. This is used as a set (only keys are used), but there's no ConcurrentSet.
        /// </summary>
        public ConcurrentDictionary<Client, byte> Clients => clients;

        /// <summary>
        /// A cancellation token for disconnection. This is used to cancel the remote connection and client connections.
        /// </summary>
        public CancellationToken CancellationToken => cts.Token;

        /// <summary>
        /// A readonly object holding status for the remote connection
        /// </summary>
        public IRemoteInfo Remote => remote ?? new DummyRemote();

        private BlockingCollection<byte[]> uploadQueue = new BlockingCollection<byte[]>();
        private ConcurrentDictionary<Client, byte> clients = new ConcurrentDictionary<Client, byte>();
        private CancellationTokenSource cts = new CancellationTokenSource();
        private IRemoteInfo remote;

        /// <summary>
        /// Register remote connection
        /// </summary>
        public IRemoteInfo RegisterRemote(IRemoteInfo remote)
        {
            return Interlocked.Exchange(ref this.remote, remote);
        }

        /// <summary>
        /// Cancel the global cancellation token. Used for disconnecting remote and clients
        /// </summary>
        public void Cancel()
        {
            if (!cts.Token.IsCancellationRequested)
            {
                lock (cts)
                {
                    if (!cts.Token.IsCancellationRequested)
                    {
                        cts.Cancel();
                        cts.Dispose();
                        cts = new CancellationTokenSource();
                    }
                }
            }
        }

        /// <summary>
        /// Clear the upload queue. Used at each new remote connection.
        /// </summary>
        public void ResetUploadQueue()
        {
            Console.WriteLine($"Resetting upload queue ({uploadQueue.Count})");
            byte[] b;
            while (uploadQueue.TryTake(out b)) { }
        }

        /// <summary>
        /// A dummy remote info to avoid null referencing when no remote server is connected
        /// </summary>
        class DummyRemote : IRemoteInfo
        {
            public bool Connected => false;
            public string RemoteAddress => "";
        }
    }
}


Get this bounty!!!

#StackBounty: #ubuntu #security #tcp Windows .NET application can't connect after Ubuntu 16.04 upgrade of linux-image to 4.4.0-151

Bounty: 200

After upgrading the Ubuntu 16.04 kernel to linux-image-4.4.0-151-generic, some of our clients stopped being able to connect over TCP; specifically, Windows servers using the SSH.NET library connecting to an SFTP service provided by CrushFTP.

We had to roll back the upgrade, but the issues fixed in this kernel version look very serious (CVE-2019-11477, CVE-2019-11478 {SACK Panic}, CVE-2019-11479):

Version: 4.4.0-151.178  2019-06-19 13:11:04 UTC

  linux (4.4.0-151.178) xenial; urgency=medium

  * Remote denial of service (system crash) caused by integer overflow in TCP
    SACK handling (LP: #1831637)
    - SAUCE: tcp: limit payload size of sacked skbs
    - SAUCE: tcp: fix fack_count accounting on tcp_shift_skb_data()

  * Remote denial of service (resource exhaustion) caused by TCP SACK scoreboard
    manipulation (LP: #1831638)
    - SAUCE: tcp: tcp_fragment() should apply sane memory limits

 -- Stefan Bader <email address hidden> Tue, 11 Jun 2019 09:36:19 +0200

Do you know of, and can you share, any links with more information about similar problems experienced after upgrading to this specific Ubuntu kernel?
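
As a purely diagnostic experiment (my assumption, not a recommended permanent setting, since SACK is exactly what these CVEs concern), one way to check whether the new SACK handling is involved would be to temporarily disable SACK on an affected host and retry the SSH.NET/SFTP connection:

# show the current setting
sysctl net.ipv4.tcp_sack

# temporarily disable SACK for testing; revert with =1 afterwards
sudo sysctl -w net.ipv4.tcp_sack=0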


Get this bounty!!!
