--- title: "Parallel FMA with future" output: rmarkdown::html_vignette: toc: true vignette: > %\VignetteIndexEntry{Parallel FMA with future} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", message = FALSE ) future_available <- requireNamespace("future", quietly = TRUE) && requireNamespace("future.apply", quietly = TRUE) run_parallel_examples <- future_available ``` ```{r setup} library(ameras) ``` ```{r data} data(data, package = "ameras") ``` # Introduction Frequentist model averaging (`FMA`) fits the specified model once for each dose realization, computes AIC weights across those realization-specific fits, and then samples from the fitted normal approximations. The realization-specific fits are independent of one another, so they are a natural place to use parallel computation. `ameras()` supports optional parallel FMA through the [`future`](https://future.futureverse.org/reference/plan.html) framework and [`future.apply`](https://future.apply.futureverse.org/reference/future_lapply.html). The package does not set a future plan internally. Instead, the user decides whether FMA should run sequentially or in parallel by setting `future::plan()` before calling `ameras()`. If `future.apply` is not installed, `ameras()` falls back to the ordinary sequential `lapply()` path. For more detail on future plans and future-backed apply loops, see `?future::plan` and `?future.apply::future_lapply`, or the linked Futureverse documentation. # Choosing a future plan The default future plan is sequential, so this call uses the ordinary single-process FMA path. ```{r sequential-plan, eval = FALSE} future::plan(future::sequential) fit <- ameras(Y.gaussian ~ dose(V1:V10) + X1 + X2, data = data, family = "gaussian", methods = "FMA") ``` For parallel FMA on a local machine, `multisession` is usually the most portable choice. It starts separate background R sessions and works across the main platforms supported by R, including Windows, macOS, and Linux. ```{r multisession-plan, eval = FALSE} future::plan(future::multisession, workers = 2) fit <- ameras(Y.gaussian ~ dose(V1:V10) + X1 + X2, data = data, family = "gaussian", methods = "FMA") future::plan(future::sequential) ``` The `multicore` backend can be efficient on systems that support forking, but it is not available on Windows. For code intended to run on multiple platforms, prefer `multisession`. For more specialized settings, the future ecosystem also supports cluster and HPC-oriented backends. See `?future::plan` or the future package documentation for details. # Controlling chunk size By default, `future.apply` chooses how to split dose realizations into futures. For FMA, this can be adjusted with `future.chunk.size.FMA`, the average number of dose realizations handled by each future. Let `K` be the number of dose realizations, `W` the number of workers, and `C` the value of `future.chunk.size.FMA`. `K` and `C` determine the approximate number of future tasks, `ceiling(K / C)`, and at most `W` of those tasks can run at the same time. For example, with `K = 100` dose realizations and `W = 4` workers: | `future.chunk.size.FMA` | Approximate chunks | Interpretation | |---:|---:|---| | 1 | 100 | many small tasks; more load balancing, more overhead | | 5 | 20 | moderate load balancing and overhead | | 25 | 4 | about one task per worker | | 100 | 1 | one task; only one worker is used | ```{r chunk-size, eval = FALSE} future::plan(future::multisession, workers = 2) fit <- ameras(Y.gaussian ~ dose(V1:V10) + X1 + X2, data = data, family = "gaussian", methods = "FMA", future.chunk.size.FMA = 2) future::plan(future::sequential) ``` Smaller chunks can improve load balancing when some realizations take longer to fit than others. Larger chunks can reduce overhead and may be preferable when each realization-specific fit is fast. If the realization-specific fits have similar runtimes, a reasonable starting point is a chunk size near `ceiling(K / W)` or somewhat smaller. If runtimes vary substantially across realizations, smaller chunks may keep workers busier. # Example: sequential and parallel FMA The example below compares the same Gaussian FMA fit with a sequential plan and a two-worker multisession plan. The seed is reset before each call so that the final FMA sampling step is comparable. ```{r compare-plans, eval = run_parallel_examples} old_plan <- future::plan() on.exit(future::plan(old_plan), add = TRUE) future::plan(future::sequential) set.seed(2024) sequential_elapsed <- system.time( sequential_fit <- suppressWarnings( ameras(Y.gaussian ~ dose(V1:V10) + X1 + X2, data = data, family = "gaussian", methods = "FMA", MFMA = 10000) ) )[["elapsed"]] future::plan(future::multisession, workers = 2) set.seed(2024) parallel_elapsed <- system.time( parallel_fit <- suppressWarnings( ameras(Y.gaussian ~ dose(V1:V10) + X1 + X2, data = data, family = "gaussian", methods = "FMA", MFMA = 10000, future.chunk.size.FMA = 2) ) )[["elapsed"]] future::plan(old_plan) ``` Compare the coefficient estimates and elapsed runtimes. ```{r compare-results, eval = run_parallel_examples} coefficient_comparison <- data.frame( term = names(sequential_fit$FMA$coefficients), sequential = unname(sequential_fit$FMA$coefficients), parallel = unname(parallel_fit$FMA$coefficients), row.names = NULL ) coefficient_comparison[-1] <- round(coefficient_comparison[-1], 4) coefficient_comparison data.frame( plan = c("sequential", "multisession"), elapsed_seconds = round(c(sequential_elapsed, parallel_elapsed), 2) ) ``` The estimates should agree up to ordinary numerical precision. Parallel execution is not guaranteed to be faster for small examples because starting workers and transferring data have overhead. It is most useful when the realization-specific fits are slow enough for that overhead to matter less. # Reproducibility The parallelized part of FMA is the realization-specific optimization step, which is deterministic for ordinary model fits. The random FMA sampling step is performed afterward in the main R process. For reproducible FMA samples, set the seed before calling `ameras()`, just as in the sequential case. ```{r reproducibility, eval = FALSE} future::plan(future::multisession, workers = 2) set.seed(2024) fit <- ameras(Y.gaussian ~ dose(V1:V10) + X1 + X2, data = data, family = "gaussian", methods = "FMA") future::plan(future::sequential) ``` User-supplied functions passed through `transform`, `transform.jacobian`, or other runtime arguments should also be deterministic. Random number generation inside those functions is not recommended. # Practical recommendations Use parallel FMA when a direct `ameras(..., methods = "FMA")` call is feasible in one R session, but the realization-specific fits are slow enough that using multiple workers is worthwhile. Start conservatively: - use `future::plan(future::multisession, workers = 2)` for a portable local parallel plan; - increase `workers` only after checking memory use; - try `future.chunk.size.FMA = 2` or larger when each realization-specific fit is fast; - reset the plan to `future::sequential` after the fit if the rest of the session should not use parallel futures. With `multisession`, each worker is a separate R process. Large datasets may therefore be copied to multiple workers. If memory is tight, use fewer workers, increase the chunk size, or consider the manual FMA workflow. # Parallel FMA or manual FMA? The parallel option shown here is best when the built-in FMA workflow fits in memory and you want a regular `amerasfit` object returned by `ameras()`. The [manual FMA vignette](manualfma.html) is intended for larger or more operationally complicated analyses where it is useful to split the realization-specific fits across separate jobs, save intermediate fit summaries, or resume work after partial completion. Manual FMA is more flexible, but it requires more bookkeeping and does not return a regular `amerasfit` object. In short: - use built-in parallel FMA for ordinary analyses that benefit from local parallel workers; - use manual FMA when the analysis must be broken into independent chunks or run outside a single R session.