#StackBounty: #ruby #garbage-collection #temporary-files Ruby TempFile behaviour among different classes

Bounty: 50

Our processing server works mainly with Tempfiles because they make things easier on our side: we don't need to delete them ourselves, since they are removed when they get garbage collected, and we don't need to handle name collisions, etc.
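The automatic cleanup described above comes from the finalizer that `Tempfile.new` registers: the file is deleted only when the Tempfile object is garbage collected, which happens at an unpredictable time. The minimal sketch below uses only the standard library to show the two deterministic alternatives, `Tempfile#close!` (close plus unlink) and `Tempfile.create` with a block. `tempfile_lifecycle_demo` is a hypothetical name used for illustration.

```ruby
require 'tempfile'

# Tempfile.new registers a finalizer that deletes the file only when the
# object is garbage collected. close! (close + unlink) deletes it immediately.
def tempfile_lifecycle_demo
  tmp = Tempfile.new(['demo-', '.foo'])
  tmp.write('hello')
  tmp.flush
  path = tmp.path

  existed_before = File.exist?(path)
  tmp.close! # explicit cleanup instead of waiting for GC
  [existed_before, File.exist?(path)]
end

# Tempfile.create with a block closes and deletes the file when the block ends,
# and returns the block's value (here, the path the file had).
created_path = Tempfile.create(['demo-', '.foo']) { |f| f.path }

p tempfile_lifecycle_demo   # => [true, false]
p File.exist?(created_path) # => false
```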

Lately, we have been having problems with Tempfiles getting GCed too early in the process, especially with one of our services, which converts a Foo file from a URL into a Bar file and uploads it to our servers.

For the sake of clarity, I have added a scenario below to make the discussion easier and to have an example at hand.

This workflow does the following:

  1. Get a URL as a parameter
  2. Download the Foo file as a Tempfile
  3. Duplicate it to a new Tempfile
  4. Download the related assets to Tempfiles
  5. Link the related assets into the local duplicate Tempfile
  6. Convert the Foo to Bar format
  7. Upload it to our server
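The steps above can be sketched with explicit ownership: one object holds every Tempfile created for a job and deletes them when the job is over, so no file can disappear in the middle of the conversion. This is a minimal sketch under the assumption that all steps run inside one block; `TempfileScope`, `create`, `cleanup!` and `run` are hypothetical names, not part of Ruby's standard library.

```ruby
require 'tempfile'

# Hypothetical helper: owns every Tempfile created during one job and deletes
# them in one place, so their lifetime is tied to the job instead of to GC.
class TempfileScope
  def initialize
    @files = []
  end

  # Create a binary-mode Tempfile and keep a reference to it until cleanup!.
  def create(basename)
    file = Tempfile.new(basename)
    file.binmode
    @files << file
    file
  end

  # Close and delete every tracked file.
  def cleanup!
    @files.each(&:close!)
    @files.clear
  end

  # Yield a fresh scope, return the block's value, and always clean up,
  # even when the block raises.
  def self.run
    scope = new
    yield scope
  ensure
    scope&.cleanup!
  end
end

# Usage mirroring steps 2 to 7: every file exists until the block finishes.
paths = nil
result = TempfileScope.run do |scope|
  asset = scope.create(['asset-', '.bin'])
  asset.write('binary data')
  asset.flush

  foo = scope.create(['edited-', '.foo'])
  foo.write(%({"buffers":[{"uri":"#{asset.path}"}]}))
  foo.flush

  paths = [foo.path, asset.path]
  # ... run the conversion and upload here ...
  paths.all? { |p| File.exist?(p) }
end
```

The trade-off is that cleanup no longer happens on its own schedule: anything created through the scope lives exactly as long as the block, which is what the conversion and upload steps need.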

At times the conversion fails, and everything points to our local Foo file referencing related assets that were created and then GCed before the conversion ran.
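The suspected failure can be reproduced deterministically. In `download_and_replace_uri`, only the path String of each asset Tempfile is stored in the JSON; holding that String does not keep the Tempfile object alive, so its finalizer may delete the file at any later point. In the minimal sketch below, an explicit `unlink` stands in for that finalizer; `dangling_uri_demo` is a hypothetical name.

```ruby
require 'tempfile'

# Only the path String of the asset ends up in the JSON, not the Tempfile
# object. `unlink` stands in for what the GC finalizer would eventually do
# once nothing references the Tempfile any more.
def dangling_uri_demo
  asset = Tempfile.new(['asset-', '.bin'])
  asset.write('data')
  asset.flush

  json = { buffers: [{ uri: asset.path }] } # only the String path is stored
  asset.unlink                               # what GC would do at some unknown point

  uri = json[:buffers].first[:uri]
  { uri: uri, file_exists: File.exist?(uri), tempfile_path: asset.path }
ensure
  asset&.close
end

p dangling_uri_demo
```

The JSON still holds a valid-looking path, but nothing exists on disk behind it, which matches the symptom of the conversion failing only some of the time.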

My two questions:

  1. Is it possible that my Tempfiles get GCed too early? I read that Ruby's GC is very conservative in order to avoid this kind of scenario.
  2. How can I prevent this from happening? I could try to keep all the related assets created by download_and_replace_uri(node) and return them, so they stay alive while the ConvertFooService instance still exists. But I'm not sure this would solve it.
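The approach from question 2 can be sketched as follows: `download_and_replace_uri` returns the Tempfile, and the object stores it in an array, so every asset stays reachable for as long as that object is. This is a minimal sketch; `AssetRewriter` and the injected `downloader` callable are hypothetical (the callable plays the role of `Helper::DownloadRemoteFileService.call`) and are used so the example runs without network access.

```ruby
require 'tempfile'

# Hypothetical variant of download_and_replace_uri that keeps the Tempfile
# object in @assets, not just its path, so the files are not finalized while
# this object is reachable.
class AssetRewriter
  attr_reader :assets

  # downloader: any callable that takes a URL and returns a Tempfile
  # (assumed stand-in for Helper::DownloadRemoteFileService.call).
  def initialize(downloader)
    @downloader = downloader
    @assets = []
  end

  def rewrite!(json)
    json[:buffers]&.each { |node| download_and_replace_uri(node) }
    json[:images]&.each { |node| download_and_replace_uri(node) }
    json
  end

  private

  def download_and_replace_uri(node)
    return unless http?(node[:uri])

    file = @downloader.call(node[:uri])
    @assets << file # keep the Tempfile object alive
    node[:uri] = file.path
    file
  end

  def http?(url)
    url.to_s.start_with?('http://', 'https://')
  end
end

# Usage with a fake downloader that writes a local Tempfile instead of fetching.
fake_download = lambda do |url|
  file = Tempfile.new(['asset-', File.extname(url)])
  file.write("contents of #{url}")
  file.flush
  file
end

rewriter = AssetRewriter.new(fake_download)
doc = { buffers: [{ uri: 'https://example.com/a.bin' }], images: [{ uri: 'local.png' }] }
json = rewriter.rewrite!(doc)
```

Keeping the references only helps while the object holding them is itself reachable. In the question's code, ImportFooService.call returns only edited_file, so the asset array would also have to be handed to (or owned by) ConvertFooService until the upload finishes, and then cleaned up with `close!`.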

ConvertFooService

class ConvertFooService < ApplicationService
  def initialize(url)
    @url = url
  end

  def call
    import_foo
    generate_bar
    upload_bar
    @bar_url
  end

  private

  def import_foo
    @foo_file = Helper::ImportFooService.call(@url) # <- TempFile
  end

  def generate_bar
    `create-bar "#{@foo_file.path}" "#{bar_file.path}"`
  end

  def upload_bar
    @bar_url = Helper::UploadBarService.call(bar_file)
  end

  def bar_file
    @bar_file ||= Tempfile.new(['new-file-', '.bar']) # <- TempFile
  end
end
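Since the conversion is the step that fails, it may help to make sure a failing external command is actually detected: backticks return the command's output and only record the exit status in `$?`, which the code above never checks. A minimal sketch using `Open3.capture2e` from the standard library; `run_converter` is a hypothetical name, and `create-bar` is the command from the question.

```ruby
require 'open3'

# Hypothetical replacement for the backtick call: the command is passed as an
# argument list (no shell, so no quoting issues with paths), stdout and stderr
# are captured together, and a non-zero exit status raises instead of being
# silently ignored.
def run_converter(*command)
  output, status = Open3.capture2e(*command)
  unless status.success?
    raise "#{command.first} exited with status #{status.exitstatus}: #{output}"
  end

  output
end

# In ConvertFooService this would be, for example:
#   run_converter('create-bar', @foo_file.path, bar_file.path)
```

With this in place, a conversion that fails because an asset file is missing surfaces as an exception carrying the converter's own error output, rather than as a broken or empty Bar file that gets uploaded anyway.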

ImportFooService

module Helper
  class ImportFooService < ApplicationHelperService
    def initialize(url)
      @url = url
    end

    def call
      download_if_needed
      duplicate_remote_file
      download_and_replace_embedded_assets
      edited_file
    end

    private

    def download_if_needed
      @original =
        if http?(@url) 
          DownloadRemoteFileService.call(@url) # <- TempFile
        else
          File.open(@url)
        end
    end

    def duplicate_remote_file
      FileUtils.cp(@original.path, edited_file.path)
    end

    def download_and_replace_embedded_assets
      file = File.read(edited_file.path)
      json = JSON.parse(file, symbolize_names: true)
      json[:buffers]&.each { |node| download_and_replace_uri(node) }
      json[:images]&.each { |node| download_and_replace_uri(node) }
      write_to_disk(edited_file.path, json.to_json)
    end

    def download_and_replace_uri(node)
      return unless http?(node[:uri])

      node[:uri] = DownloadRemoteFileService.call(node[:uri]).path # <- TempFile
    end

    def edited_file
      @edited_file ||= Tempfile.new(['edited-', '.foo'])
    end

    def http?(url)
      url.to_s.start_with?('http://', 'https://')
    end
  end
end

DownloadRemoteFileService

module Helper
  class DownloadRemoteFileService < ApplicationHelperService
    def initialize(url)
      @url = url
    end

    def call
      create_file
      download_file
      @file # <- Tempfile
    end

    private

    def create_file
      @file = Tempfile.new(['new-file-', File.extname(uri.path)])
    end

    def download_file
      use_ssl = uri.scheme == 'https'
      Net::HTTP.start(uri.host, uri.port, use_ssl: use_ssl) do |http|
        request = Net::HTTP::Get.new(uri.request_uri)
        response = http.request(request)
        write_file(response)
      end
    end

    def write_file(response)
      @file.binmode
      @file.write(response.body)
      @file.flush
    end

    def uri
      @uri ||= URI.parse(@url)
    end
  end
end

UploadBarService

module Helper
  class UploadBarService < ApplicationHelperService
    def initialize(file)
      @file = file
      @s3_presigned_url_payload = s3_presigned_url_payload
    end

    def call
      upload # NOTE: Returns the url for the uploaded file
    end

    private

    def upload
      HTTParty.post(ENV['upload_url'], body: { file: @file })
    end
  end
end

