#StackBounty: #sql #postgresql #query-performance Query optimization with multi-column variant matching

Bounty: 100

TL;DR – I’m looking for advice on how to better write the query below.

Below is a pared down version of my table structure with some sample data. I don’t have control over the data
structure at all so recommendations on schema changes unfortunately won’t help me.

Problem

Given a building_level_key and a faction_key I need to return a record from building_levels joined to its
closest match from the building_culture_variants table.

For example, if I used goblin_walls & fact_blue I would expect the goblin_walls record that joins
building_culture_variant_key record 2.

An example structure of tables can be seen below:

Sample data with db structure

  • factions – is a compacted version of the real table as cultures/subculture records are stored in different
    tables but it gets the point across. This table is only really needed in the query so that the appropriate
    culture/subculture can be referenced in relation to a given faction_key.

  • building_levels – acts as a base record for every building in the system. There is only one record per
    building.

  • building_culture_variants – acts as its name implies; there can be more than one record for each building_level_key and each variant record is matched against a building level using the building_level_key and a combination of faction_key, culture_key and subculture_key.

How matching works

Matching starts with finding the given building_level_key in the culture variants table. This is a hard match and is needed to join any two building level and culture variant.

Each building level record will have at least one culture variant. Often there are several culture variants per building level but on average no more that 4. The most common culture variants is a “generic” one which means that the faction_key, culture_key and subculture_key columns are all null so the building will match against any faction. However any combination of the faction columns could have a key so I need to match a given faction against each of the faction columns in the culture variant.

Side note: the culture variant keys are always consistent, meaning I’ll never have a scenario where a faction_key and subculture_key in the culture variants table don’t match a corresponding faction_key and subculture_key from the factions table (and subculture table, which has been omitted for clarity).

What I’ve tried

I have provided a sql fiddle to play around with and included my version of the query below:

SELECT 
  "building_culture_variants"."building_culture_variant_key" AS qualified_key, 
  "building_levels"."building_level_key" AS building_key, 
  "building_levels"."create_time", 
  "building_levels"."create_cost", 
  "building_culture_variants"."name",
  'fact_blue'::text AS faction_key
FROM 
  "building_levels" 
  INNER JOIN "building_culture_variants" ON (
    "building_culture_variants"."building_culture_variant_key" IN (
      SELECT 
        "building_culture_variant_key" 
      FROM 
        (
          SELECT 
            "building_culture_variants"."building_culture_variant_key", 
            (
                CASE WHEN "building_culture_variants"."faction_key" = "building_factions"."faction_key" THEN 1 WHEN "building_culture_variants"."faction_key" IS NULL THEN 0 ELSE NULL END + 
                CASE WHEN "building_culture_variants"."culture_key" = "building_factions"."culture_key" THEN 1 WHEN "building_culture_variants"."culture_key" IS NULL THEN 0 ELSE NULL END + 
                CASE WHEN "building_culture_variants"."subculture_key" = "building_factions"."subculture_key" THEN 1 WHEN "building_culture_variants"."subculture_key" IS NULL THEN 0 ELSE NULL END
            ) AS match_count 
          FROM 
            "building_culture_variants" 
            INNER JOIN (
              -- This is a subquery because here I would join a couple more tables
              -- to collect all of the faction info
              SELECT 
                "factions"."faction_key", 
                "factions"."culture_key", 
                "factions"."subculture_key"
              FROM 
                "factions" 
            ) AS "building_factions" ON ("building_factions"."faction_key" = 'fact_blue')
          WHERE ("building_levels"."building_level_key" = "building_culture_variants"."building_level_key") 
          GROUP BY 
            match_count, 
            building_culture_variant_key 
          ORDER BY 
            match_count DESC NULLS LAST 
          LIMIT 
            1
        ) AS "culture_variant_match"
    )
  ) 
WHERE "building_levels"."building_level_key" = 'goblin_walls'
ORDER BY 
  "building_levels"."building_level_key"

Question

The query I provided above works and gets the job done but I feel like I’m trying to brute force the problem by just nesting a bunch of queries. I get this feeling like I’m not taking advantage of some sql construct that would streamline the performance of, or greatly simplify the query.

So what I’m really asking, is there a better way I could rewrite the query to be more efficient?


Get this bounty!!!

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.