#StackBounty: #tensorflow #computer-vision #object-detection #yolo #implementation Yolo issue with detecting positives

Bounty: 50

I’ve recently tried to implement a YOLO detector for traffic light detection, based on the YOLO v1 architecture, in TensorFlow/Keras. My model really struggles with detecting small objects. The loss components do drop during training, but seemingly all this does is push the confidence values toward very small numbers (since there are many more cells that do not contain objects, one way the model can minimize the loss is to push all confidences to 0).

It usually detects objects where traffic lights tend to appear in the dataset, so in some sense it is learning the distribution of plausible positions and size ratios, but it fails to predict correct bounding boxes even on concrete examples from the training set, like in the following image:

[image: a training-set frame with incorrectly predicted bounding boxes]

I’ve used the network proposed in the original paper with 448 x 448 input images, but without pretraining. I’ve also tried using a VGG-16 network pretrained on ImageNet as a feature extractor and adding a few convolutional and FC layers on top, but with similar results :(
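
For reference, this is roughly how I wired up the VGG-16 variant (just a sketch; the grid size S, the number of boxes per cell B, and the head sizes are placeholders rather than my exact configuration):

import tensorflow as tf

S, B = 7, 2  # placeholder grid size and boxes per cell (not my exact values)

# ImageNet-pretrained VGG-16 as a frozen feature extractor.
backbone = tf.keras.applications.VGG16(include_top=False,
                                       weights="imagenet",
                                       input_shape=(448, 448, 3))
backbone.trainable = False

x = backbone.output                                    # [batch, 14, 14, 512]
x = tf.keras.layers.Conv2D(512, 3, padding="same", activation="relu")(x)
x = tf.keras.layers.Flatten()(x)
x = tf.keras.layers.Dense(1024, activation="relu")(x)
x = tf.keras.layers.Dense(S * S * B * 5)(x)            # raw box + confidence outputs
outputs = tf.keras.layers.Reshape((S * S, B, 5))(x)    # [batch, S*S, B, 5]

model = tf.keras.Model(backbone.input, outputs)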

My loss function chooses a predictor for each object in a grid cell based on the highest IoU with that object.
If no object was assigned to a predictor, it adds that predictor's squared confidence difference multiplied by a 0.1 factor; if an object was assigned to that predictor, it adds only the plain squared difference. Also, a predictor can be assigned to multiple objects (as per this answer).
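
To make the assignment concrete, here is a tiny self-contained example of the double-argsort trick I use further down in the loss code to turn the IoU matrix into a responsibility mask (toy numbers, a single grid cell with two objects and two predictors):

import tensorflow as tf

# Toy IoU values: rows = objects in the cell, columns = box predictors.
iou = tf.constant([[0.1, 0.7],
                   [0.6, 0.2]])

# Double argsort turns raw IoUs into per-object ranks; rank 0 marks the best predictor.
ranks = tf.argsort(tf.argsort(iou, axis=-1, direction='DESCENDING'), axis=-1)
responsibility = tf.cast(tf.equal(ranks, 0), tf.float32)

print(responsibility.numpy())
# [[0. 1.]
#  [1. 0.]]  -> predictor 1 is responsible for object 0, predictor 0 for object 1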

So I’m at a loss here (pun intended) and have a few questions.

a) Is pretraining really necessary, and if so, can I use a net that was pretrained as a classifier on a different dataset with different objects (other than traffic lights) as a feature extractor?

b) Could I improve performance by running the net on negative examples first (images with no traffic lights), then adding positive examples?

I used the Bosch Small Traffic Light Dataset. Here is my entire loss function:

class YoloLoss():

  def __init__(self, step=0):
    self.step = step
    

  def call(self, y_true, y_preds):
    """
    Args:
        ground_truth: np.array [batch_size, s, s , b, (4 + 1)]
        y_preds: tf.Tensor [batch_size, ss, b, (4 + 1)] 

    Returns:
        loss for each element of batch 
    """
    batch,s,s,b,_ = y_true.shape
    ss = s * s 
    size1 = [batch, ss, b, 5]
   
    cy = tf.tile(tf.range(s, dtype=tf.float32)[...,None], [1, s])
    cx = tf.tile(tf.range(s, dtype=tf.float32)[None,...], [s, 1])
    
    cell_xy = tf.reshape(tf.stack([cx,cy], axis=-1), [1, ss, 1, 2])  # [1, ss, 1, 2]
    cell_xy = tf.tile(cell_xy, [batch, 1, b, 1]) # [batch, ss, b, 2]

    # ==== PREDICTIONS ====
    #y_preds = tf.reshape(y_preds, size1) # [batch, SS, B, 5]

    # Transform net outputs
    net_confs = y_preds[..., 4] # [batch, SS, B]
    net_xy = y_preds[..., 0:2] # [batch, SS, B, 2]
    net_wh = tf.exp(y_preds[..., 2:4]) # [batch, SS, B, 2]

    """
    net_confs = tf.sigmoid(y_preds[..., 4]) # [batch, SS, B, 2]
    net_xy = tf.sigmoid(y_preds[..., 0:2]) # [batch, SS, B, 2]
    net_wh = tf.exp(y_preds[..., 2:4]) # [batch, SS, B, 2]
    """
    
    pred_confs = tf.expand_dims(net_confs, axis=2) #[batch, SS, 1, B]
    pred_wh = tf.expand_dims(net_wh, axis=2) # [batch, SS, 1, B, 2]
    pred_centers = tf.expand_dims(net_xy + cell_xy, axis=2)  # [batch, ss, 1, b, 2]
    pred_floor = pred_centers - (0.5 * pred_wh)  # [batch, SS, 1, B, 2]
    pred_ceil  = pred_centers + (0.5 * pred_wh)  # [batch, SS, 1, B, 2]
    pred_area = pred_wh[..., 0] * pred_wh[..., 1] # [batch, SS, 1, B]

    
    # ==== GROUND TRUTH ==== 
    y_true = tf.reshape(y_true, size1)

    p_obj = tf.expand_dims(y_true[..., 4], axis=3) #[batch, ss, B, 1]
    true_floor = tf.expand_dims(y_true[..., 0:2], axis=3)  # [batch, ss, B, 1, 2]
    true_ceil  = tf.expand_dims(y_true[..., 2:4], axis=3)  # [batch, ss, B, 1, 2]
    true_wh = true_ceil - true_floor # [batch, ss, B, 1, 2]
    true_area = true_wh[..., 0] * true_wh[..., 1] # [batch, ss, B, 1]
    true_centers = 0.5 * (true_floor + true_ceil) # [batch, ss, B, 1, 2] 


    # ==== CALCULATE IOU (TRUTH, PREDS) ==== 

    xy_floor = tf.math.maximum(true_floor, pred_floor) # [batch, ss, B, B, 2]
    xy_ceil  = tf.math.minimum(true_ceil, pred_ceil) # [batch, ss, B, B, 2]
    
    z = tf.math.maximum(0.0, xy_ceil - xy_floor) #[batch, ss, B, B, 2]
    inter_area = z[..., 0] * z[..., 1] #[batch, ss, B, B]

    union_area = true_area + pred_area - inter_area # [batch, ss, B, B]

    iou = tf.math.truediv(inter_area, union_area) # [batch, ss, b, b]


    # ==== PREDICTOR RESPONSIBILITY ==== 

    # iou_mask[:,:,i,j] = 1.0 if object predictor j is assigned to object i
    responsibility_mask = tf.cast(tf.equal(tf.argsort(tf.argsort(iou, 3, direction='DESCENDING'), 3), 0), tf.float32) # [batch, ss, b, b]
    cobj = responsibility_mask * p_obj   # [batch, ss, b, b]
    cnoobj = responsibility_mask * (1. - p_obj) # [batch, ss, b, b]
    
    # ==== LOSS COMPONENTS ==== 
    scoord = tf.constant(5.0, dtype=tf.float32)
    snoobj = tf.constant(0.1, dtype=tf.float32)
    sconf  = tf.constant(5.0, dtype=tf.float32)

    xy_diff = tf.math.square(pred_centers - true_centers) * cobj[..., None] # [batch, ss, b, b, 2]
    xy_loss = tf.math.reduce_sum(xy_diff, axis=[1,2,3,4]) # [batch]

    wh_diff = tf.math.square(tf.sqrt(pred_wh) - tf.sqrt(true_wh)) * cobj[..., None] # [batch, ss, b, b, 2]
    wh_loss = tf.math.reduce_sum(wh_diff, axis=[1,2,3,4]) # [batch]

    iou_diff = tf.math.square(pred_confs - iou) # [batch, ss, b, b]

    conf_diff = iou_diff * cobj # [batch, ss, b, b]
    conf_loss = tf.math.reduce_sum(conf_diff, axis=[1,2,3])

    noobj_diff = iou_diff * cnoobj #[batch, ss, b, b]
    noobj_loss = tf.math.reduce_sum(noobj_diff, axis=[1,2,3])

   
    loss = scoord * (xy_loss + wh_loss) + sconf * conf_loss + snoobj * noobj_loss 
    loss = tf.math.reduce_sum(loss)

    tf.summary.scalar("xy_loss", tf.math.reduce_mean(xy_loss), step=self.step)
    tf.summary.scalar("wh_loss", tf.math.reduce_mean(wh_loss), step=self.step)
    tf.summary.scalar("conf_loss", tf.math.reduce_mean(conf_loss), step=self.step)
    tf.summary.scalar("noobj_loss", tf.math.reduce_mean(noobj_loss), step=self.step)

    self.step += 1

    return loss
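
For completeness, this is roughly how I call it from a custom training loop (the model and optimizer here are placeholders, not my actual training code; only the loss call mirrors the class above):

import tensorflow as tf

loss_fn = YoloLoss()
optimizer = tf.keras.optimizers.Adam(1e-4)  # placeholder optimizer and learning rate

def train_step(model, images, y_true):
    # y_true: [batch, S, S, B, 5], model output: [batch, S*S, B, 5]
    with tf.GradientTape() as tape:
        y_preds = model(images, training=True)
        loss = loss_fn.call(y_true, y_preds)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss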


Get this bounty!!!

#StackBounty: #ssh #pygame #framebuffer #computer-vision #xrdp Python pygame program won't run through SSH / Remote Desktop

Bounty: 50

I’ve been working on my own object recognition program based on the rpi-vision test program pitft_labeled_output.py (from this webpage). It’s basically a custom neural network model plus slightly modified code to make the new model work. However, I’m having problems running my program through Remote Desktop. This is how I run my program (using the venv from the Graphic Labeling Demo):

pi@raspberrypi:~ $ sudo bash
root@raspberrypi:/home/pi# cd signcap && . ../rpi-vision/.venv/bin/activate
(.venv) root@raspberrypi:/home/pi/signcap# python3 src/dar_ts.py
pygame 1.9.6
Hello from the pygame community. https://www.pygame.org/contribute.html
No protocol specified
No protocol specified
No protocol specified
xcb_connection_has_error() returned true
No protocol specified
xcb_connection_has_error() returned true
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
No protocol specified
Unable to init server: Could not connect: Connection refused

(Mask:1359): Gtk-WARNING **: 20:00:31.012: cannot open display: :10.0

As you can see, I’m getting several "No protocol specified" errors and the display error. When I run echo $DISPLAY in a console while connected through Remote Desktop, it prints :10.0.

When I run pitft_labeled_output.py through Remote Desktop like this:

sudo bash
root@raspberrypi:/home/pi# cd rpi-vision && . .venv/bin/activate
(.venv) root@raspberrypi:/home/pi/rpi-vision# python3 tests/pitft_labeled_output.py

The display is successfully opened and everything works as it should.

However, my program works fine locally or through VNC. When I’m connected through VNC and run echo $DISPLAY, I get :1.0.
What could make my program fail through Remote Desktop when pitft_labeled_output.py works there?
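
For reference, a tiny helper like the following (only standard-library os calls, nothing specific to my program) can be run in each session, locally, through VNC, and through Remote Desktop, to compare what the program actually sees:

import os

# Print the display-related environment variables that X11 and SDL look at in this session.
for var in ("DISPLAY", "XAUTHORITY", "SDL_VIDEODRIVER", "SDL_FBDEV"):
    print(var, "=", os.environ.get(var))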

Here is the code of my program so you can compare it with pitft_labeled_output.py from rpi-vision:

import time
import logging
import argparse
import pygame
import os
import sys
import numpy as np
import subprocess
import signal

# Environment variables for Braincraft HAT.
os.environ['SDL_FBDEV'] = "/dev/fb1"
os.environ['SDL_VIDEODRIVER'] = "fbcon"

def dont_quit(signal, frame):
   print('Caught signal: {}'.format(signal))
signal.signal(signal.SIGHUP, dont_quit)

from capture import PiCameraStream
from tsdar import TrafficSignDetectorAndRecognizer

# Initialize the logger.
logging.basicConfig()
logging.getLogger().setLevel(logging.INFO)

# Initialize the display.
pygame.init()
# Create a Surface object which is shown on the display.
# If size is set to (0,0), the created Surface will have the same size as the
# current screen resolution (240x240 for Braincraft HAT).
screen = pygame.display.set_mode((0,0), pygame.FULLSCREEN)
# Declare the capture manager for Pi Camera.
capture_manager = None

# Function for parsing program arguments.
def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--rotation', type=int, choices=[0, 90, 180, 270],
                        dest='rotation', action='store', default=0,
                        help='Rotate everything on the display by this angle')
    args = parser.parse_args()
    return args

last_seen = []
already_seen = []

def main(args):
    global capture_manager, last_seen, already_seen

    # Initialize the capture manager to get stream from Pi Camera.
    if screen.get_width() == screen.get_height() or args.rotation in (0, 180):
        capture_manager = PiCameraStream(resolution=(max(320, screen.get_width()), max(240, screen.get_height())), rotation=180, preview=False, format='rgb')
    else:
        capture_manager = PiCameraStream(resolution=(max(240, screen.get_height()), max(320, screen.get_width())), rotation=180, preview=False, format='rgb')

    # Initialize the buffer size to screen size.
    if args.rotation in (0, 180):
        buffer = pygame.Surface((screen.get_width(), screen.get_height()))
    else:
        buffer = pygame.Surface((screen.get_height(), screen.get_width()))

    # Hide the mouse from the screen.
    pygame.mouse.set_visible(False)
    # Initialize the screen to black.
    screen.fill((0,0,0))
    # Try to show the splash image on the screen (if the image exists), otherwise, leave screen black.
    try:
        splash = pygame.image.load(os.path.dirname(sys.argv[0])+'/bchatsplash.bmp')
        splash = pygame.transform.rotate(splash, args.rotation)
        screen.blit(splash, ((screen.get_width() / 2) - (splash.get_width() / 2),
                    (screen.get_height() / 2) - (splash.get_height() / 2)))
    except pygame.error:
        pass
    pygame.display.update()

    # Use the default font.
    smallfont = pygame.font.Font(None, 24)
    medfont = pygame.font.Font(None, 36)

    # Initialize the traffic sign detector and recognizer object with the path
    # to the TensorFlow Lite (tflite) neural network model.
    tsdar0 = TrafficSignDetectorAndRecognizer(os.path.dirname(sys.argv[0])+'/models/uw_tsdar_model_no_aug_w_opts.tflite')

    # Start getting capture from Pi Camera.
    capture_manager.start()

    while not capture_manager.stopped:
        # If the frame wasn't captured successfully, go to the next while iteration
        if capture_manager.frame is None:
            continue

        # Fill the buffer with black color
        buffer.fill((0,0,0))

        # Update the frame.
        rgb_frame = capture_manager.frame

        # Make predictions. If traffic signs were detected, a bounding rectangle
        # will be drawn around them.
        timestamp = time.monotonic()
        predictions, out_frame = tsdar0.predict(rgb_frame)
        delta = time.monotonic() - timestamp
        logging.info(predictions)
        logging.info("TFLite inference took %d ms, %0.1f FPS" % (delta * 1000, 1 / delta))

        # Make an image from a frame.
        previewframe = np.ascontiguousarray(out_frame)
        img = pygame.image.frombuffer(previewframe, capture_manager.camera.resolution, 'RGB')

        # Put the image into buffer.
        buffer.blit(img, (0, 0))

        # Add FPS and temperature on the top corner of the buffer.
        fpstext = "%0.1f FPS" % (1/delta,)
        fpstext_surface = smallfont.render(fpstext, True, (255, 0, 0))
        fpstext_position = (buffer.get_width()-10, 10) # Near the top right corner
        buffer.blit(fpstext_surface, fpstext_surface.get_rect(topright=fpstext_position))
        try:
            temp = int(open("/sys/class/thermal/thermal_zone0/temp").read()) / 1000
            temptext = "%dN{DEGREE SIGN}C" % temp
            temptext_surface = smallfont.render(temptext, True, (255, 0, 0))
            temptext_position = (buffer.get_width()-10, 30) # near the top right corner
            buffer.blit(temptext_surface, temptext_surface.get_rect(topright=temptext_position))
        except OSError:
            pass

        # Reset the detecttext vertical position.
        dtvp = 0

        # For each traffic sign that is recognized in the current frame (up to 3 signs),
        # its name will be printed on the screen and it will be announced if it already wasn't.
        for i in range(len(predictions)):
            p = predictions[i]
            name = tsdar0.CLASS_NAMES[p]
            print("Detected", name)

            last_seen.append(name)

            # Render sign name on the bottom of the buffer (if multiple signs detected,
            # current sign name is written above the previous sign name).
            detecttext = name
            detecttext_font = medfont
            detecttext_color = (255, 0, 0)
            detecttext_surface = detecttext_font.render(detecttext, True, detecttext_color)
            dtvp = buffer.get_height() - (i+1)*(detecttext_font.size(detecttext)[1]) - i*detecttext_font.size(detecttext)[1]//2
            detecttext_position = (buffer.get_width()//2, dtvp)
            buffer.blit(detecttext_surface, detecttext_surface.get_rect(center=detecttext_position))

            # Make an announcement for the traffic sign if it's new (not detected in previous consecutive frames).
            if detecttext not in already_seen:
                os.system('echo %s | festival --tts & ' % detecttext)

        # If new traffic signs were detected in the current frame, add them to already_seen list
        for ts in last_seen:
            if ts not in already_seen:
                already_seen.append(ts)

        # If the traffic sign disappeared from the frame (a car passed it), remove it from already_seen
        diff = list(set(already_seen)-set(last_seen))
        already_seen = [ts for ts in already_seen if ts not in diff]

        # Reset last_seen.
        last_seen = []

        # Show the buffer image on the screen.
        screen.blit(pygame.transform.rotate(buffer, args.rotation), (0,0))
        pygame.display.update()

# Run the program until it's interrupted by key press.
if __name__ == "__main__":
    args = parse_args()
    try:
        main(args)
    except KeyboardInterrupt:
        capture_manager.stop()

Edit:
To clarify a bit more: I first followed this tutorial to install my Braincraft HAT and then followed this one to try out the object recognition test example (pitft_labeled_output.py) from rpi-vision. Everything worked great through SSH: I saw the logging info in the SSH console, and the camera feed and recognized objects on the Braincraft HAT display. Then I tried it from Windows Remote Desktop (after installing xrdp on the RPi) and it also worked great; I saw the logging info in the terminal and the camera feed on the Braincraft display. But when I ran my program instead of pitft_labeled_output.py, I got the errors mentioned above. I even went further and replaced the pitft_labeled_output.py code with my code (dar_ts.py) and ran it as if I were running pitft_labeled_output.py (I thought there might be some dependencies inside the rpi-vision folder), but it didn’t work and I received the same error. What could be the issue with my code?

P.S. What also confused me is that pitft_labeled_output.py has a typo in line 56 and runs fine anyway, but when I ran my code for the first time, I had to correct that error before it would run.
[screenshot: the typo in line 56 of pitft_labeled_output.py]


Get this bounty!!!

#StackBounty: #ssh #pygame #framebuffer #computer-vision #xrdp Python pygame program won't run through SSH / Remote Desktop

Bounty: 50

I’ve been working on my own object recognition program based on the rpi-vision test program pitft_labeled_output.py (from this webpage). It’s basically a custom neural network model and a bit modified code for the new model to work. However, I’m having problems running my program through Remote Desktop. This is how I run my program (using venv from Graphic Labeling Demo):

pi@raspberrypi:~ $ sudo bash
root@raspberrypi:/home/pi# cd signcap && . ../rpi-vision/.venv/bin/activate
(.venv) root@raspberrypi:/home/pi/signcap# python3 src/dar_ts.py
pygame 1.9.6
Hello from the pygame community. https://www.pygame.org/contribute.html
No protocol specified
No protocol specified
No protocol specified
xcb_connection_has_error() returned true
No protocol specified
xcb_connection_has_error() returned true
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
No protocol specified
Unable to init server: Could not connect: Connection refused

(Mask:1359): Gtk-WARNING **: 20:00:31.012: cannot open display: :10.0

As you can see, I’m getting several "No protocol specified" errors and the display error. When I run echo $DISPLAY in a console when I’m connected through Remote Desktop, it prints :10.0.

When I run the pitft_labeled_output.py through Remote Desktop like:

sudo bash
root@raspberrypi:/home/pi# cd rpi-vision && . .venv/bin/activate
(.venv) root@raspberrypi:/home/pi/rpi-vision# python3 tests/pitft_labeled_output.py

The display is successfully opened and everything works as it should.

However, my program works fine locally or through VNC. When I’m connected through VNC and run echo $DISPLAY, I get :1.0.
What could be the issue that it doesn’t work through Remote Desktop, but pitft_labeled_output.py does?

Here is the code of my program so you can compare it with pitft_labeled_output.py from rpi-vision:

import time
import logging
import argparse
import pygame
import os
import sys
import numpy as np
import subprocess
import signal

# Environment variables for Braincraft HAT.
os.environ['SDL_FBDEV'] = "/dev/fb1"
os.environ['SDL_VIDEODRIVER'] = "fbcon"

def dont_quit(signal, frame):
   print('Caught signal: {}'.format(signal))
signal.signal(signal.SIGHUP, dont_quit)

from capture import PiCameraStream
from tsdar import TrafficSignDetectorAndRecognizer

# Initialize the logger.
logging.basicConfig()
logging.getLogger().setLevel(logging.INFO)

# Initialize the display.
pygame.init()
# Create a Surface object which is shown on the display.
# If size is set to (0,0), the created Surface will have the same size as the
# current screen resolution (240x240 for Braincraft HAT).
screen = pygame.display.set_mode((0,0), pygame.FULLSCREEN)
# Declare the capture manager for Pi Camera.
capture_manager = None

# Function for parsing program arguments.
def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--rotation', type=int, choices=[0, 90, 180, 270],
                        dest='rotation', action='store', default=0,
                        help='Rotate everything on the display by this angle')
    args = parser.parse_args()
    return args

last_seen = []
already_seen = []

def main(args):
    global capture_manager, last_seen, already_seen

    # Initialize the capture manager to get stream from Pi Camera.
    if screen.get_width() == screen.get_height() or args.rotation in (0, 180):
        capture_manager = PiCameraStream(resolution=(max(320, screen.get_width()), max(240, screen.get_height())), rotation=180, preview=False, format='rgb')
    else:
        capture_manager = PiCameraStream(resolution=(max(240, screen.get_height()), max(320, screen.get_width())), rotation=180, preview=False, format='rgb')

    # Initialize the buffer size to screen size.
    if args.rotation in (0, 180):
        buffer = pygame.Surface((screen.get_width(), screen.get_height()))
    else:
        buffer = pygame.Surface((screen.get_height(), screen.get_width()))

    # Hide the mouse from the screen.
    pygame.mouse.set_visible(False)
    # Initialize the screen to black.
    screen.fill((0,0,0))
    # Try to show the splash image on the screen (if the image exists), otherwise, leave screen black.
    try:
        splash = pygame.image.load(os.path.dirname(sys.argv[0])+'/bchatsplash.bmp')
        splash = pygame.transform.rotate(splash, args.rotation)
        screen.blit(splash, ((screen.get_width() / 2) - (splash.get_width() / 2),
                    (screen.get_height() / 2) - (splash.get_height() / 2)))
    except pygame.error:
        pass
    pygame.display.update()

    # Use the default font.
    smallfont = pygame.font.Font(None, 24)
    medfont = pygame.font.Font(None, 36)

    # Initialize the traffic sign detector and recognizer object with the path
    # to the TensorFlow Lite (tflite) neural network model.
    tsdar0 = TrafficSignDetectorAndRecognizer(os.path.dirname(sys.argv[0])+'/models/uw_tsdar_model_no_aug_w_opts.tflite')

    # Start getting capture from Pi Camera.
    capture_manager.start()

    while not capture_manager.stopped:
        # If the frame wasn't captured successfully, go to the next while iteration
        if capture_manager.frame is None:
            continue

        # Fill the buffer with black color
        buffer.fill((0,0,0))

        # Update the frame.
        rgb_frame = capture_manager.frame

        # Make predictions. If traffic signs were detected, a bounding rectangle
        # will be drawn around them.
        timestamp = time.monotonic()
        predictions, out_frame = tsdar0.predict(rgb_frame)
        delta = time.monotonic() - timestamp
        logging.info(predictions)
        logging.info("TFLite inference took %d ms, %0.1f FPS" % (delta * 1000, 1 / delta))

        # Make an image from a frame.
        previewframe = np.ascontiguousarray(out_frame)
        img = pygame.image.frombuffer(previewframe, capture_manager.camera.resolution, 'RGB')

        # Put the image into buffer.
        buffer.blit(img, (0, 0))

        # Add FPS and temperature on the top corner of the buffer.
        fpstext = "%0.1f FPS" % (1/delta,)
        fpstext_surface = smallfont.render(fpstext, True, (255, 0, 0))
        fpstext_position = (buffer.get_width()-10, 10) # Near the top right corner
        buffer.blit(fpstext_surface, fpstext_surface.get_rect(topright=fpstext_position))
        try:
            temp = int(open("/sys/class/thermal/thermal_zone0/temp").read()) / 1000
            temptext = "%dN{DEGREE SIGN}C" % temp
            temptext_surface = smallfont.render(temptext, True, (255, 0, 0))
            temptext_position = (buffer.get_width()-10, 30) # near the top right corner
            buffer.blit(temptext_surface, temptext_surface.get_rect(topright=temptext_position))
        except OSError:
            pass

        # Reset the detecttext vertical position.
        dtvp = 0

        # For each traffic sign that is recognized in the current frame (up to 3 signs),
        # its name will be printed on the screen and it will be announced if it already wasn't.
        for i in range(len(predictions)):
            p = predictions[i]
            name = tsdar0.CLASS_NAMES[p]
            print("Detected", name)

            last_seen.append(name)

            # Render the sign name near the bottom of the buffer (if multiple signs are
            # detected, the current sign name is drawn above the previous one).
            detecttext = name
            detecttext_font = medfont
            detecttext_color = (255, 0, 0)
            detecttext_surface = detecttext_font.render(detecttext, True, detecttext_color)
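            # Stack each additional name 1.5 line-heights above the previous one, measured up from the bottom edge.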
            dtvp = buffer.get_height() - (i+1)*(detecttext_font.size(detecttext)[1]) - i*detecttext_font.size(detecttext)[1]//2
            detecttext_position = (buffer.get_width()//2, dtvp)
            buffer.blit(detecttext_surface, detecttext_surface.get_rect(center=detecttext_position))

            # Make an announcement for the traffic sign if it's new (not detected in previous consecutive frames).
            if detecttext not in already_seen:
                os.system('echo %s | festival --tts & ' % detecttext)

        # If new traffic signs were detected in the current frame, add them to already_seen list
        for ts in last_seen:
            if ts not in already_seen:
                already_seen.append(ts)

        # If the traffic sign disappeared from the frame (a car passed it), remove it from already_seen
        diff = list(set(already_seen)-set(last_seen))
        already_seen = [ts for ts in already_seen if ts not in diff]

        # Reset last_seen.
        last_seen = []

        # Show the buffer image on the screen.
        screen.blit(pygame.transform.rotate(buffer, args.rotation), (0,0))
        pygame.display.update()

# Run the program until it is interrupted (e.g. Ctrl+C).
if __name__ == "__main__":
    args = parse_args()
    try:
        main(args)
    except KeyboardInterrupt:
        if capture_manager is not None:
            capture_manager.stop()
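
As a side note, here is a minimal, separate sketch (not part of the program above, only a guess at what is relevant) to check which SDL video driver pygame actually ends up with in a given session; 'fbcon' would mean it is drawing to the HAT framebuffer, while 'x11' would mean it went through X:

import os

# Same environment the program sets for the Braincraft HAT display.
os.environ['SDL_FBDEV'] = "/dev/fb1"
os.environ['SDL_VIDEODRIVER'] = "fbcon"

import pygame

try:
    pygame.display.init()
    print("SDL video driver:", pygame.display.get_driver())
except pygame.error as e:
    print("Display init failed:", e)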

Edit:
To clarify a bit more: I first followed this tutorial to install my Braincraft HAT and then followed this one to try out the object recognition test example (pitft_labeled_output.py) from rpi-vision. Everything worked great through SSH; I saw the logging info in the SSH console and the camera feed with recognized objects on the Braincraft HAT display. Then I tried it from Windows Remote Desktop (after installing xrdp on the RPi) and it also worked great: logging info in the terminal and the camera feed on the Braincraft display. But when I tried to run my program instead of pitft_labeled_output.py, I received the errors mentioned above. I even went further and replaced the pitft_labeled_output.py code with my code (dar_ts.py) and ran it as if I were running pitft_labeled_output.py (I thought there might be some dependencies inside the rpi-vision folder), but it didn’t work and I received the same error. What could be the issue with my code?

P.S. What confused me further is that pitft_labeled_output.py has a typo in line 56 and runs fine anyway, but when I ran my code for the first time, I was asked to correct that error.


Get this bounty!!!
