#StackBounty: #linux #networking #apache-2.4 #tcp Apparent random slowniness on apache 2.4 on HTTPS and HTTP

Bounty: 50

Hello currently I experience random slowniness in apache 2.4.18 on HTTPS on Ubuntu server 16.04. Here follow observations, symptoms and info:

  • it doesn’t happen with HTTP
  • it doesn’t happen with about 2k used slots in apache’s /server-status. It happens from 2.7k upwards (I’ve seen the status reach even 4.2k used slots)
  • time curl -vI <url> is stuck in Trying <ip address> and responds in 1 to 30 seconds (the average processing time, when not slow, is 0.2s). While stuck, tcpdump -n | grep '<ip address of request>' shows no incoming packets in the server: they appear once curl goes on
  • apache is using mpm_prefork with the following settings
StartServers                     10
MinSpareServers           50
MaxSpareServers          100
ServerLimit     20000
MaxRequestWorkers         20000
MaxConnectionsPerChild   0
  • apache’s /server-status page shows many (hundreds) free slots
  • server resources average usage: cpu 30%, load average 20, ram used 35%, disk io r/w very low
  • some tuning attempts on the TCP queues (https://blog.cloudflare.com/syn-packet-handling-in-the-wild/)
    • ss -n state syn-recv sport = :443 | wc -l shows a maximum number of 130 when launched repeatedly. It seems that curl hangs when this number is close to 130
    • net.core.somaxconn was 128, I did set it to 4096 with sysctl -w net.core.somaxconn=4096, then even reloaded apache
    • I’ve launched sysctl -w net.ipv4.tcp_max_syn_backlog=4096
    • ss -plnt sport = :443 shows something like
State      Recv-Q Send-Q Local Address:Port               Peer Address:Port
LISTEN     0      128         :::443                     :::*                   users:(("apache2",pid=184902,fd=8),<4000 more apache processes>

Honestly I don’t know if the queues tuning has been effective.

I’d like to first find out how to troubleshoot the problem to find out the part responsible of the slowniness, so any hint on troubleshooting is appreciated. Thank you

Edit 1

OK there’s 1 thing that maybe I didn’t try (for avoiding a downtime in production): the restart of apache. I thought that the TCP queue of the port 443 was handled by the system alone, i.e. without asking apache which was its limit: wrong. After the restart the queue was finally free to go over 130: in fact, in the first seconds after the apache restart the queue hit the apache limit of 511.

Now that the queue seems ok curl isn’t stuck anymore on Trying to connect; now the problem seems on the apache side, which seems slow even on HTTP.

More information:

  • the request url I’m using is being handled by an apache VirtualHost, which just issues a Redirect
  • here follows the information on the SSL cache
SSL/TLS Session Cache Status:
cache type: SHMCB, shared memory: 512000 bytes, current entries: 2176
subcaches: 32, indexes per subcache: 88
time left on oldest entries' objects: avg: 229 seconds, (range: 209...247)
index usage: 77%, cache usage: 99%
total entries stored since starting: 43916
total entries replaced since starting: 0
total entries expired since starting: 0
total (pre-expiry) entries scrolled out of the cache: 41740
total retrieves since starting: 586 hit, 83648 miss
total removes since starting: 0 hit, 0 miss

So please share any thoughts on how to troubleshoot the slowniness. Thank you

Get this bounty!!!

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.