diff --git a/vignettes/rfaR-Realization-Conceptual.Rmd b/vignettes/rfaR-Realization-Conceptual.Rmd index 724a89c..07c572c 100644 --- a/vignettes/rfaR-Realization-Conceptual.Rmd +++ b/vignettes/rfaR-Realization-Conceptual.Rmd @@ -15,16 +15,17 @@ vignette: > knitr::opts_chunk$set( collapse = TRUE, fig.align = 'center', - comment = "#>") + comment = "#>" +) ``` ## Executive Summary -RMC-RFA produces a reservoir stage-frequency curve with uncertainty bounds by utilizing a deterministic flood routing model while treating the inflow volume, flood hydrograph shape, seasonal occurrence, and antecedent reservoir stage as uncertain variables rather than fixed values. To quantify both the natural variability and epistemic uncertainty in reservoir stage-frequency estimates, RMC-RFA employs a two-looped, nested Monte Carlo methodology. +RMC-RFA produces a reservoir stage-frequency curve with uncertainty bounds by utilizing a deterministic flood routing model within a nested Monte Carlo simulation. The nested Monte Carlo simulation quantifies natural variability and epistemic uncertainty by treating inflow volume, flood hydrograph shape, seasonal occurrence, and antecedent reservoir stage as uncertain variables. -The inner loop — defined here as a realization — simulates natural variability by routing 10,000 individual flood events through the reservoir, each with randomly sampled starting stage, hydrograph shape, and stratified-sampled inflow volume drawn from the volume-frequency distribution. Together these routed events produce a single stage-frequency curve estimate. The outer loop propagates epistemic uncertainty by repeating this process across many realizations, each conditioned on a different set of volume-frequency distribution parameters drawn from the posterior. +The inner loop — defined here as a realization — simulates natural variability by routing 10,000 individual flood events through the reservoir, each with a randomly sampled starting stage, hydrograph shape, and inflow volume. Together the 10,000 routed events produce a single stage-frequency curve estimate. The outer loop propagates epistemic uncertainty by repeating this process across many realizations, each conditioned on a different set of volume-frequency distribution parameters drawn from the posterior. -This document walks through a single realization in detail. Understanding one realization is the key to understanding the full simulation: the outer loop simply repeats this process with a resampled parameter set each time (in `rfaR`, this is the *next* parameter set from the input). The walkthrough covers the stratified sampling procedure, random sampling of starting stage and hydrograph shape, and calculation of stage exceedance probabilities — steps that are common to all simulation types. The full uncertainty simulation conducts 10,000 realizations, each comprising 10,000 flood events (100 million routing operations total), while the expected value simulation conducts 10,000 realizations, each using a one of the 10,000 BestFit parameter sets. +This document walks through a single realization in detail. Understanding one realization is the key to understanding the full simulation: the outer loop repeats realizations with a resampled parameter sets (in `rfaR`, this is the *next* parameter set from the input). The walkthrough covers the stratified sampling procedure, random sampling of starting stage and hydrograph shape, and calculation of stage exceedance probabilities. These steps are common to all simulation types. The full uncertainty simulation conducts 10,000 realizations, each comprising 10,000 flood events (100 million routing operations total), while the expected value simulation conducts 10,000 realizations, each using a one of the 10,000 BestFit parameter sets. ***Note:*** Users with coding experience may find the loops in this script "inefficient". This purpose of this step-by-step guide is to provide a replicable guide to the realization process. @@ -38,30 +39,30 @@ library(ggsci) ## `rfaR` Analysis Summary -A full uncertainty run in `rfa_simulate()` is comprised of 10,000 realizations. Each realization is compromised of 10,000 reservoir routing computations (simulations) using the following: +A full uncertainty run in `rfa_simulate()` is the result of 10,000 realizations. An individual realization is comprised of 10,000 deterministic routing simulations. Each simulation contains: -- Stratified samples (10,000 samples comprised of 20 bins with 500 events each) from an inflow volume-frequency curve (VFC). The VFC is estimated using one parameter set +- Stratified sample from an inflow volume-frequency curve (VFC)[^1] -- 10,000 Seasonality samples represented by the month +- Seasonality sample represented by the month -- 10,000 Starting stage samples, dependent on the sampled month +- Starting stage sample, dependent on the sampled month - Hydrograph Shape, scaled to sampled inflow volume -- Modified-Puls routings to obtain peak stages and/or peak discharge +- Modified-Puls routing to obtain peak stages and/or peak discharge These module emulate the workflow and methodology from RMC-RFA. The subsections below will step through an rfaR realization. These sections represent modules and functions contained in `rfa_simulate()`. +[^1]: The VFC is estimated using one parameter set. The stratification is 20 bins with 500 events each. + ## Flow-Frequency Stratified Sampling ### Stratified Bins Definition & Set-Up -Stratified sampling is used instead of simple random sampling because reliable estimates of exceedance probabilities far into the tail of the distribution — events with AEPs of 1/10,000 or rarer — are required. Simple random sampling would require an impractically large number of events to adequately populate the tail. Instead, the frequency axis is divided into `Nbins` equal-probability bins, and `Mevents` are sampled uniformly within each bin. This ensures the tail is well-represented without artificially inflating the total number of routing operations. +Stratified sampling is used instead of simple random sampling because reliable estimates of exceedance probabilities far into the tail of the distribution (events with AEPs of 1/10,000 or rarer) are required. Simple random sampling would require an impractically large number of events to adequately populate the tail. Instead, the frequency axis is divided into equal-probability bins (`Nbins`), and events are sampled uniformly within each bin (`Mevents`). This ensures the tail is well-represented without artificially inflating the total number of routing operations. ```{r} -ords <- stratified_sampler(Nbins = 20, - Mevents = 500, - dist = "ev1") +ords <- stratified_sampler(Nbins = 20, Mevents = 500, dist = "ev1") ``` The bin weights (`ords$Weights`) reflect the true probability mass that each bin represents. These weights are applied later to reconstruct the true exceedance probability from the stratified sample, correcting for the fact that rare-event bins were intentionally oversampled relative to their actual frequency of occurrence. @@ -74,15 +75,16 @@ The z-variate matrix translates each stratified sample into a standard normal de z_matrix <- matrix(ncol = ords$Nbins, nrow = ords$Mevents) # Using EV1 -for (i in 1:ords$Nbins){ +for (i in 1:ords$Nbins) { # Lower Bin Boundary (prior bin's upper boundary) bin_lower <- ords$Zlower[i] - + # Upper Bin Boundary bin_upper <- ords$Zupper[i] # Vector of random values - z_matrix[,i] <- (bin_lower + (runif(500, min = 0, max = 1))*(bin_upper - bin_lower)) + z_matrix[, i] <- (bin_lower + + (runif(500, min = 0, max = 1)) * (bin_upper - bin_lower)) } ``` @@ -107,8 +109,19 @@ aep_bounds <- 1 - pnorm(unique(c(ords$Zlower, ords$Zupper))) # Format nicely in scientific notation aep_labels <- formatC(aep_bounds, format = "E", digits = 1) -aep_major <- c(9.9e-01, 9.0e-01, 5.0e-01, 1.0e-01, 1.0e-02, - 1.0e-03, 1.0e-04, 1.0e-05, 1.0e-06, 1.0e-07, 1.0e-08) +aep_major <- c( + 9.9e-01, + 9.0e-01, + 5.0e-01, + 1.0e-01, + 1.0e-02, + 1.0e-03, + 1.0e-04, + 1.0e-05, + 1.0e-06, + 1.0e-07, + 1.0e-08 +) as.data.frame(Q_matrix) |> setNames(1:20) |> @@ -118,16 +131,20 @@ as.data.frame(Q_matrix) |> geom_jitter(aes(color = bin), size = 0.5, alpha = 0.7) + scale_color_viridis_d(option = "turbo") + scale_x_discrete(breaks = 1:20, labels = aep_labels[1:20]) + - scale_y_log10(breaks = scales::log_breaks(n = 5, base = 10), - minor_breaks = scales::minor_breaks_log(detail = 1)) + - theme_rfar_conceptual()+ + scale_y_log10( + breaks = scales::log_breaks(n = 5, base = 10), + minor_breaks = scales::minor_breaks_log(detail = 1) + ) + + theme_rfar_conceptual() + #theme_bw() + theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 7)) + - labs(x = "Bin Midpoint AEP", - y = "Inflow Volume (cfs)", - color = NULL, - title = "Inflow Volume-Frequency Sampling", - subtitle = "LP3: (3.5504, 0.3718, 0.7555)") + labs( + x = "Bin Midpoint AEP", + y = "Inflow Volume (cfs)", + color = NULL, + title = "Inflow Volume-Frequency Sampling", + subtitle = "LP3: (3.5504, 0.3718, 0.7555)" + ) ``` @@ -146,25 +163,37 @@ The month of flood occurrence is sampled randomly using historical relative freq The sample is a pre-allocated vector/sequence of months that is `Nsims` long. ```{r} -InitMonths <- sample(1:12, size = Nsims, replace = TRUE, prob = jmd_seasonality$relative_frequency) +InitMonths <- sample( + 1:12, + size = Nsims, + replace = TRUE, + prob = jmd_seasonality$relative_frequency +) UniqMonths <- sort(unique(InitMonths)) ``` ```{r init_month-seq-plot, fig.width = 7, fig.height = 5, echo = FALSE, warning = FALSE} -InitMonths_df <- data.frame(Sim_n = seq(1,length(InitMonths),1), - InitMonths = InitMonths) -ggplot(InitMonths_df) + - geom_bar(aes(factor(as.integer(InitMonths)), - fill = factor(as.integer(InitMonths))), - color = "grey80") + +InitMonths_df <- data.frame( + Sim_n = seq(1, length(InitMonths), 1), + InitMonths = InitMonths +) +ggplot(InitMonths_df) + + geom_bar( + aes(factor(as.integer(InitMonths)), fill = factor(as.integer(InitMonths))), + color = "grey80" + ) + ggsci::scale_fill_aaas() + - scale_x_discrete() + - scale_y_continuous(breaks = scales::pretty_breaks(n = 10)) + - labs(x = "Sampled Month", y = "Count within Sample", fill = NULL, - title = "Seasonality Sampling", - subtitle = "Total Count of Monthly Samples") + + scale_x_discrete() + + scale_y_continuous(breaks = scales::pretty_breaks(n = 10)) + + labs( + x = "Sampled Month", + y = "Count within Sample", + fill = NULL, + title = "Seasonality Sampling", + subtitle = "Total Count of Monthly Samples" + ) + theme_rfar_conceptual() ``` @@ -183,28 +212,42 @@ stage_ts$months <- lubridate::month(lubridate::mdy(stage_ts$date)) for (i in 1:length(UniqMonths)) { sampleID <- which(InitMonths == UniqMonths[i]) - InitStages[sampleID] <- sample(stage_ts$stage[stage_ts$months %in% UniqMonths[i]], - size = sum(InitMonths == UniqMonths[i]), replace = TRUE) + InitStages[sampleID] <- sample( + stage_ts$stage[stage_ts$months %in% UniqMonths[i]], + size = sum(InitMonths == UniqMonths[i]), + replace = TRUE + ) } ``` ```{r init_stage-seq-plot, fig.width = 7, fig.height = 5, echo = FALSE, warning = FALSE} -InitStages_df <- data.frame(Sim_n = seq(1,length(InitMonths),1), - InitMonths = InitMonths, - InitStages = InitStages) +InitStages_df <- data.frame( + Sim_n = seq(1, length(InitMonths), 1), + InitMonths = InitMonths, + InitStages = InitStages +) ggplot(InitStages_df) + - geom_jitter(aes(x = InitMonths, InitStages, color = factor(as.integer(InitMonths))), size = 1, alpha = 0.7) + + geom_jitter( + aes(x = InitMonths, InitStages, color = factor(as.integer(InitMonths))), + size = 1, + alpha = 0.7 + ) + ggsci::scale_color_aaas() + - scale_y_continuous(breaks = seq(3790,3900,10), - labels = scales::label_comma()) + - scale_x_continuous(breaks = seq(4,9,1)) + - coord_cartesian(ylim = c(3790,3870)) + - labs(x = "Sampled Month", y = "Sampled Starting Stage", - color = NULL, - title = "Samples of Starting Stage Elevations", - subtitle = "Samples are taken from observed stage timeseries") + + scale_y_continuous( + breaks = seq(3790, 3900, 10), + labels = scales::label_comma() + ) + + scale_x_continuous(breaks = seq(4, 9, 1)) + + coord_cartesian(ylim = c(3790, 3870)) + + labs( + x = "Sampled Month", + y = "Sampled Starting Stage", + color = NULL, + title = "Samples of Starting Stage Elevations", + subtitle = "Samples are taken from observed stage timeseries" + ) + theme_rfar_conceptual() ``` @@ -214,18 +257,24 @@ ggplot(InitStages_df) + The shape of the flood hydrograph is sampled randomly from a selection of historical and synthetic hydrographs. Each hydrograph in the library is assigned a weight (default weights are uniform); the sampled shape is then scaled to match the flood volume drawn from the LP3 distribution. Note that the defined critical duration is 2-days (`critical_duration`) and the routing window has been set to 10-days (`routing_days`). ```{r} -hydrographs <- hydrograph_setup(jmd_hydro_apr1999, - jmd_hydro_jun1921, - jmd_hydro_jun1965, - jmd_hydro_jun1965_15min, - jmd_hydro_may1955, - jmd_hydro_pmf, - jmd_hydro_sdf, - critical_duration = 2, - routing_days = 10) - -hydroSamps <- sample(1:length(hydrographs), size = Nsims, replace = TRUE, - prob = attr(hydrographs, "probs")) +hydrographs <- hydrograph_setup( + jmd_hydro_apr1999, + jmd_hydro_jun1921, + jmd_hydro_jun1965, + jmd_hydro_jun1965_15min, + jmd_hydro_may1955, + jmd_hydro_pmf, + jmd_hydro_sdf, + critical_duration = 2, + routing_days = 10 +) + +hydroSamps <- sample( + 1:length(hydrographs), + size = Nsims, + replace = TRUE, + prob = attr(hydrographs, "probs") +) ``` ## Conduct Reservoir Routing Simulations @@ -245,21 +294,25 @@ for (i in 1:nrow(Q_matrix)) { realiz <- (i - 1) * ncol(Q_matrix) + j # Hydrograph shape - hydrograph_shape <- hydrographs[[hydroSamps[realiz]]][,2:3] + hydrograph_shape <- hydrographs[[hydroSamps[realiz]]][, 2:3] # Hydrograph observed volume - obs_hydrograph_vol <- attr(hydrographs[[hydroSamps[realiz]]],"obs_vol") + obs_hydrograph_vol <- attr(hydrographs[[hydroSamps[realiz]]], "obs_vol") # Scale hydrograph to sample volume - scaled_hydrograph <- scale_hydrograph(hydrograph_shape, - obs_hydrograph_vol, - Q_matrix[i,j]) + scaled_hydrograph <- scale_hydrograph( + hydrograph_shape, + obs_hydrograph_vol, + Q_matrix[i, j] + ) # Route scaled hydrograph - tmpResults <- mod_puls_routing(resmodel_df = jmd_resmodel, - inflow_df = scaled_hydrograph, - initial_elev = InitStages[realiz], - full_results = FALSE) + tmpResults <- mod_puls_routing( + resmodel_df = jmd_resmodel, + inflow_df = scaled_hydrograph, + initial_elev = InitStages[realiz], + full_results = FALSE + ) # Record results peakStage[i, j] <- tmpResults[1] @@ -288,14 +341,14 @@ For consistency with code chunks, the candidate stages in `stage_vect` can be re n_exceedance_stages <- ords$Mevents stage_vect <- rep(NA, n_exceedance_stages) -for (b in 1:(n_exceedance_stages)){ +for (b in 1:(n_exceedance_stages)) { # Min Stage - if(b < 2){ + if (b < 2) { stage <- min_stage - - # Incrementally calculate the next exceedance stage - }else{ - stage <- stage_vect[b-1] + (max_stage - min_stage)/(n_exceedance_stages) + + # Incrementally calculate the next exceedance stage + } else { + stage <- stage_vect[b - 1] + (max_stage - min_stage) / (n_exceedance_stages) } stage_vect[b] <- stage } @@ -308,13 +361,16 @@ For each bin (column) of `peakStage` and each exceedance stage (`exceedance_stag The accompanying graphic illustrates this process. In each bin, the number of stages exceeding `stage_vect[1]` is summed by `stage_exceed_count`. This is then divided by the number of routed events in the bin (`mstages`) to represent a probability of exceedance (`stage_exceed_prob`). This processes is repeated for each exceedance stage in `stage_vect`, within each bin. ```{r} -stage_exceedance_matrix <- matrix(nrow = length(stage_vect), ncol = ncol(Q_matrix)) +stage_exceedance_matrix <- matrix( + nrow = length(stage_vect), + ncol = ncol(Q_matrix) +) -for(m in 1:ncol(peakStage)){ +for (m in 1:ncol(peakStage)) { # Get Bin of peak stages, each represented by m event - mstages <- peakStage[,m] + mstages <- peakStage[, m] - for(n in 1:length(stage_vect)){ + for (n in 1:length(stage_vect)) { # Get Stage exceedance_stage <- stage_vect[n] @@ -322,10 +378,10 @@ for(m in 1:ncol(peakStage)){ stage_exceed_count <- sum(mstages > exceedance_stage) # Exceedance Prob in Bin - stage_exceed_prob <- stage_exceed_count/length(mstages) + stage_exceed_prob <- stage_exceed_count / length(mstages) - # Save to exceedance matrix - stage_exceedance_matrix[n,m] <- stage_exceed_prob + # Save to exceedance matrix + stage_exceedance_matrix[n, m] <- stage_exceed_prob } } ``` @@ -339,15 +395,19 @@ as.data.frame(peakStage) |> geom_jitter(aes(color = bin), size = 0.9, alpha = 0.7) + scale_color_viridis_d(option = "turbo") + scale_x_discrete(breaks = 1:20, labels = aep_labels[1:20]) + - scale_y_continuous(breaks = scales::pretty_breaks(n = 10), - labels = scales::label_comma()) + - labs(x = "Bin Midpoint AEP", - y = "Peak Stage (ft-NAVD88)", - color = NULL, - title = "Peak Stage Results by Stratified Bin", - subtitle = "Routing results from 10,000 events") + + scale_y_continuous( + breaks = scales::pretty_breaks(n = 10), + labels = scales::label_comma() + ) + + labs( + x = "Bin Midpoint AEP", + y = "Peak Stage (ft-NAVD88)", + color = NULL, + title = "Peak Stage Results by Stratified Bin", + subtitle = "Routing results from 10,000 events" + ) + theme_rfar_conceptual() + - theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 7)) + theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 7)) ``` @@ -358,14 +418,14 @@ The stage exceedance probabilities from each bin are combined using the stratifi The result is `stage_result` which is the stages from `stage_vect` and their corresponding AEP estimates. This is the realization stage-frequency curve. This curve can be interpolated using stages or AEPs. ```{r} -stage_aep_vect <- rep(NA,length(stage_vect)) +stage_aep_vect <- rep(NA, length(stage_vect)) -for(m in 1:nrow(stage_exceedance_matrix)){ +for (m in 1:nrow(stage_exceedance_matrix)) { # Grab row of exceedances - stage_exceedance_probs <- stage_exceedance_matrix[m,] + stage_exceedance_probs <- stage_exceedance_matrix[m, ] # Sum the dot product of exceedances and weights - stage_aep <- sum(stage_exceedance_probs*ords$Weights) + stage_aep <- sum(stage_exceedance_probs * ords$Weights) #save to vector stage_aep_vect[m] <- stage_aep @@ -373,9 +433,10 @@ for(m in 1:nrow(stage_exceedance_matrix)){ stage_result <- tibble( AEP = stage_aep_vect, - Z_var = qnorm(1-stage_aep_vect), + Z_var = qnorm(1 - stage_aep_vect), Gumb = -log(-log(1 - AEP)), - Stage = stage_vect) + Stage = stage_vect +) ``` @@ -388,27 +449,52 @@ stage_exc_long <- as.data.frame(stage_exceedance_matrix) |> mutate(bin = as.integer(bin)) near_1 <- qnorm(1 - 0.999999999999) -near_0 <- qnorm(1-1e-16) +near_0 <- qnorm(1 - 1e-16) ggplot() + - geom_line(data = stage_exc_long, - aes(x = qnorm(1 - AEP), y = Stage, color = factor(bin)), - linewidth = 0.75, - alpha = 0.6) + - geom_line(data = stage_result, - aes(x = Z_var, y = Stage), - linewidth = 1.25, color = "black") + - scale_x_continuous(breaks = qnorm(1 - aep_major), - labels = formatC(aep_major, format = "E", digits = 0)) + - scale_y_continuous(breaks = scales::pretty_breaks(n = 10), - labels = scales::label_comma()) + + geom_line( + data = stage_exc_long, + aes(x = qnorm(1 - AEP), y = Stage, color = factor(bin)), + linewidth = 0.75, + alpha = 0.6 + ) + + geom_line( + data = stage_result, + aes(x = Z_var, y = Stage), + linewidth = 1.25, + color = "black" + ) + + scale_x_continuous( + breaks = qnorm(1 - aep_major), + labels = formatC(aep_major, format = "E", digits = 0) + ) + + scale_y_continuous( + breaks = scales::pretty_breaks(n = 10), + labels = scales::label_comma() + ) + scale_color_viridis_d(option = "turbo") + coord_cartesian(xlim = c(near_1, near_0)) + - theme_rfar_conceptual()+ - theme(axis.text.x = element_text(angle = 45, hjust = 1))+ - labs(x = "AEP", y = "Peak Stage (ft-NAVD88)", - title = "Stage-Frequency Curve", - subtitle = "Colored lines = per-bin exceedance curves; black = weighted results (sum of weighted bin exceedance probabilities)") - + theme_rfar_conceptual() + + theme(axis.text.x = element_text(angle = 45, hjust = 1)) + + labs( + x = "AEP", + y = "Peak Stage (ft-NAVD88)", + title = "Stage-Frequency Curve", + subtitle = "Colored lines = per-bin exceedance curves; black = weighted results (sum of weighted bin exceedance probabilities)" + ) ``` + +## Concluding Remarks + +### Understanding the Role of a Single Realization + +Understanding the inputs and calculation procedures of a single realization is crucial to understanding the nested Monte Carlo simulation process. Realization computations are used in both expected-only and full uncertainty stage-frequency estimates. The expected-only estimation consists of a single realization using 10,000 stratified volume-frequency samples, each drawn from an individual volume-frequency distribution parameter set. Full uncertainty computations perform one realization for each volume-frequency distribution parameter set to satisfy the law of total probability. + +### Deterministic Routing Within a Probabilistic Framework + +RMC-RFA produces a reservoir stage-frequency curve with uncertainty bounds by employing a deterministic flood routing model within a nested Monte Carlo simulation. The inputs to each routing simulation are sampled probabilistically, while the reservoir model routes each simulated flood event using deterministic stage-storage-discharge functions. + +### The Law of Total Probability is the Theoretical Backbone + +The nested Monte Carlo process provides reliable stage-frequency estimates with uncertainty bounds through the law of total probability. Complete integration over the sample space is achieved through stratified inflow-volume sampling within each volume-frequency parameter set, ensuring the full range of flood outcomes is represented across all parameter uncertainty.