The homework this week will be using the devices and events dataset
Construct the following eight queries:
-
A query to deduplicate
game_detailsfrom Day 1 so there's no duplicates -
A DDL for an
user_devices_cumulatedtable that has:- a
device_activity_datelistwhich tracks a users active days bybrowser_type - data type here should look similar to
MAP<STRING, ARRAY[DATE]>- or you could have
browser_typeas a column with multiple rows for each user (either way works, just be consistent!)
- or you could have
- a
-
A cumulative query to generate
device_activity_datelistfromevents -
A
datelist_intgeneration query. Convert thedevice_activity_datelistcolumn into adatelist_intcolumn -
A DDL for
hosts_cumulatedtable- a
host_activity_datelistwhich logs to see which dates each host is experiencing any activity
- a
-
The incremental query to generate
host_activity_datelist -
A monthly, reduced fact table DDL
host_activity_reduced- month
- host
- hit_array - think COUNT(1)
- unique_visitors array - think COUNT(DISTINCT user_id)
-
An incremental query that loads
host_activity_reduced- day-by-day
Please add these queries into a folder, zip them up and submit here