#StackBounty: #sql #data-warehouse #postgresql-9.3 Extrapolate daily historical values from a table that only records when a value chan…

Bounty: 400

I have a table that records a row for each time a score for a location has changed.

score_history:

  • id int PK (uuid auto incrementing int)
  • happened_at timestamp (when the score changed)
  • location_id int FK (the location that the value is for)
  • score float (the new score)

This was done with an eye on efficiency and being able to simply retrieve a list of changes for a given location and serves that purpose nicely.

I’m trying to output the data in a very redundant format to help load it into a rigid external system. The external system expects a row for each location * every date. The goal is to represent the last score value for each location for each date. So if the score changed 3 times in a given date only the score closest to midnight would be considered that locations closing score for the day. I imagine this is similar to the challenge of creating a close of business inventory level fact table.

I have a handy star schema style date dimension table which has a row for every date fully covering this sample period and well into the future.

That table looks like

dw_dim_date:

  • date date PK
  • a bunch of other columns like week number, is_us_holiday etc.

So, if I had only 3 records in the score_history table…

1, 2019-01-01:10:13:01, 100, 5.0
2, 2019-01-05:20:00:01, 100, 5.8
3, 2019-01-05:23:01:22, 100, 6.2

The desired output would be:

2019-01-01, 100, 5.0 
2019-01-02, 100, 5.0 
2019-01-03, 100, 5.0
2019-01-04, 100, 5.0 
2019-01-05, 100, 6.2

3 Requirements:

  1. One row per day per location even if there are no score records
    for that day.
  2. If there are score records for that day the last
    one before midnight should be the score value for the row. In the event of a tie the greater of the two should “win”.
  3. If there are zero score records for that day the score should be the most recent previous score.

I’ve been chasing my tail through subqueries and window functions.

Because I’m hesitant to post something without something I tried I’ll share this trainwreck which produces output but of no meaning…

    SELECT dw_dim_date.date,
                 (SELECT
                    score 
                  FROM score_history 
                    WHERE 
                     score_history.happened_at::DATE < dw_dim_date.date 
                    OR
                     score_history.happened_at::DATE = dw_dim_date.date 
                    ORDER BY score_history.id desc limit 1) as last_score
    FROM dw_dim_date
    WHERE dw_dim_date.date > '2019-06-01'

Grateful for guidance or pointers to other questions to read.


Get this bounty!!!

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.