This week's assignment involves working with the actor_films dataset. Your task is to construct a series of SQL queries and table definitions that will allow us to model the actor_films dataset in a way that facilitates efficient analysis. This involves creating new tables, defining data types, and writing queries to populate these tables with data from the actor_films dataset
The actor_films dataset contains the following fields:
actor: The name of the actor.actorid: A unique identifier for each actor.film: The name of the film.year: The year the film was released.votes: The number of votes the film received.rating: The rating of the film.filmid: A unique identifier for each film.
The primary key for this dataset is (actor_id, film_id).
-
DDL for
actorstable: Create a DDL for anactorstable with the following fields:-
films: An array ofstructwith the following fields:- film: The name of the film.
- votes: The number of votes the film received.
- rating: The rating of the film.
- filmid: A unique identifier for each film.
-
quality_class: This field represents an actor's performance quality, determined by the average rating of movies of their most recent year. It's categorized as follows:star: Average rating > 8.good: Average rating > 7 and ≤ 8.average: Average rating > 6 and ≤ 7.bad: Average rating ≤ 6.
-
is_active: A BOOLEAN field that indicates whether an actor is currently active in the film industry (i.e., making films this year).
-
-
Cumulative table generation query: Write a query that populates the
actorstable one year at a time. -
DDL for
actors_history_scdtable: Create a DDL for anactors_history_scdtable with the following features:- Implements type 2 dimension modeling (i.e., includes
start_dateandend_datefields). - Tracks
quality_classandis_activestatus for each actor in theactorstable.
- Implements type 2 dimension modeling (i.e., includes
-
Backfill query for
actors_history_scd: Write a "backfill" query that can populate the entireactors_history_scdtable in a single query. -
Incremental query for
actors_history_scd: Write an "incremental" query that combines the previous year's SCD data with new incoming data from theactorstable.