Species Habitat Suitability Modelling with biomod2

Author

Dr Tahir Ali | AG de Meaux

Introduction

Welcome to the Species Habitat Suitability Modelling Workshop! We will explore how to use biomod2 to fit, evaluate, and project multi-algorithm SDMs under climate change.

We’ll explore multi-algorithm modelling, model evaluation, variable importance, ensemble modelling, and future projections under climate change.

Learning Goals

Prepare occurrence & environmental data
Fit multiple algorithms (GLM, GAM, RF, GBM, CTA)
Evaluate models and interpret variable importance
Create ensemble predictions and project under future climates

Species Occurrence Data

Code

demo <- get_demo_data()
species <- demo$species
preds   <- demo$preds
data_source <- demo$source

head(species)

Simple feature collection with 6 features and 1 field
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 155174 ymin: 4048000 xmax: 509277 ymax: 4207836
Projected CRS: WGS 84 / UTM zone 30N
  Occurrence               geometry
1          1 POINT (304000 4048000)
2          0 POINT (433009 4197679)
3          1 POINT (261300 4200000)
4          1 POINT (383560 4170853)
5          1 POINT (155174 4207836)
6          0 POINT (509277 4144975)

Code

plot(species["Occurrence"], main = "Species Occurrences")

Interpretation:

Presence (1) and absence (0) points show the distribution of the species in the study area.

Interactive Map

Code

species_ll <- tryCatch(sf::st_transform(species, 4326), error = function(e) species)
coords_ll  <- sf::st_coordinates(species_ll)

leaflet(species_ll) |>
  addProviderTiles("CartoDB.Positron") |>
  addCircleMarkers(~coords_ll[,1], ~coords_ll[,2],
                   color = ~ifelse(species_ll$Occurrence == 1, "darkgreen", "red"),
                   popup = ~paste("Occurrence:", species_ll$Occurrence),
                   radius = 5, stroke = FALSE, fillOpacity = 0.8)

Interpretation: Green = presence, Red = absence. Interactive maps help visualize sampling bias or clustering.

Environmental Predictors

Code

Code

plot(preds, main = "Environmental Predictors")

Interpretation: These environmental layers (precipitation, temperature, etc.) are predictors for species distribution.

Output

Code

names(preds)

[1] "elevation"     "precipitation" "temperature"   "vegetation"

Inspect predictor correlation before modeling! Highly correlated predictors can inflate variable importance.

Data Formatting for biomod2

Code

myBiomodData <- BIOMOD_FormatingData(
  resp.var  = species$Occurrence,
  expl.var  = preds,                          # SpatRaster
  resp.xy   = sf::st_coordinates(species),    # in same CRS as preds
  resp.name = "MySpecies",
  PA.strategy = "none"
)

Run SDMs

Code

# Create modeling options
myBiomodOptions <- BIOMOD_ModelingOptions()

myBiomodModelOut <- BIOMOD_Modeling(
  bm.format   = myBiomodData,
  models      = c("GLM", "GAM", "RF", "GBM", "CTA"),
  bm.options  = myBiomodOptions,
  CV.strategy = "random",
  CV.nb.rep   = 3,
  CV.perc     = 0.7,
  metric.eval = c("TSS", "ROC"),
  var.import  = 5,
  seed.val    = 123
)

Each algorithm captures a different relationship between environment and occurrence — ensemble methods help combine their strengths.

Code

mods <- get_built_models(myBiomodModelOut, run = "RUN1")
invisible(
bm_PlotResponseCurves(
bm.out = myBiomodModelOut,
models.chosen = mods,
fixed.var = "median"
)
)

Further Deepen Your Understanding:

How does the predicted probability of presence change with each environmental variable?
Are there thresholds or nonlinearities in the response?

Model Evaluation

Code

Code

evals <- get_evaluations(myBiomodModelOut)
print(evals)

Plot

Code

invisible(
  bm_PlotEvalBoxplot(myBiomodModelOut))

Think about:

Which model performed best (by TSS or ROC)?
Are there any models that performed poorly? What could be the reason?
What does a high TSS or ROC value mean in ecological terms?
What does that imply about nonlinearity in the data?

Further Deepen Your Understanding:

Compare the evaluation metrics across models. Discuss possible reasons for differences (e.g., overfitting, data structure, algorithm assumptions).

Variable Importance

Code

Code

varimp <- get_variables_importance(myBiomodModelOut)
print(varimp)

Plot

Code

invisible(
  bm_PlotVarImpBoxplot(myBiomodModelOut))

Think about:

Which environmental variable is most important for predicting the species’ distribution?
Are there variables that seem unimportant? Why might that be?
How might variable importance change if you used a different species or study area?

Further Deepen Your Understanding:

Discuss how variable importance can inform conservation or management decisions.

Project to Current Environment

Code

Code

myBiomodProj <- BIOMOD_Projection(
  bm.mod = myBiomodModelOut,
  new.env = preds,
  proj.name = "current",
  selected.models = "all",
  binary.meth = "TSS"
)

Map

Code

invisible(plot(myBiomodProj))

Ensemble Modeling

Code

Code

myBiomodEnsemble <- BIOMOD_EnsembleModeling(
  bm.mod = myBiomodModelOut,
  models.chosen = "all",
  em.by = "all",
  em.algo = c("EMmean", "EMwmean", "EMcv"),
  metric.select = "TSS",
  metric.select.thresh = 0.5,
  metric.eval = c("TSS", "ROC")
)

myBiomodEnsembleProj <- BIOMOD_EnsembleForecasting(
  bm.em  = myBiomodEnsemble,
  bm.proj = myBiomodProj
)

# biomod2's internal plot (can look squished for multiple panels)
invisible(plot(myBiomodEnsembleProj))

The weighted mean ensemble (EMwmean) gives higher weight to better-performing algorithms.

Why it looks compressed - biomod2’s internal plotting code does not preserve the raster’s coordinate ratio. - It sets up a multi-panel layout (mfrow) to plot multiple ensemble models side by side. - Each panel is drawn with equal x/y units, not the spatial extent ratio, so the map looks “squished”.

Solution - Extract predictions and plot with terra::plot()

This gives you correct aspect ratio and prettier maps:

Code

# Balanced aspect using terra
ens_rast <- get_predictions(myBiomodEnsembleProj)
terra::plot(ens_rast, nc = 2, main = "Ensemble Forecasts (Balanced Plot)")

Using plotRGB() — composite visualization

plotRGB() displays three raster layers as R–G–B channels (e.g., temperature in red, precipitation in blue, elevation in green). It’s ideal for visualizing covariate contrasts or model uncertainty (mean, sd, cv).

Code

# Extract RGB layers safely
layer_ids <- grep("EM(mean|wmean|cv)ByTSS", names(ens_rast))
if (length(layer_ids) < 3) {
  message("⚠️ Expected 3 ensemble layers (mean, wmean, cv), found ", length(layer_ids), ".")
  message("Available layers are: ", paste(names(ens_rast), collapse = ", "))
} else {
  ens_rgb <- ens_rast[[layer_ids]]

  # Rescale values to 0–255
  stretch <- function(x, minv=0, maxv=255){
    vals <- values(x)
    vals_scaled <- round((vals - min(vals, na.rm=TRUE)) /
                         (max(vals, na.rm=TRUE)-min(vals, na.rm=TRUE)) * (maxv-minv) + minv)
    values(x) <- vals_scaled
    return(x)
  }

  ens_rgb <- stretch(ens_rgb)

  # Convert to RasterStack for plotRGB
  ens_rgb_r <- raster::stack(ens_rgb)

  # Plot RGB composite
  par(mfrow=c(1,1), mar=c(5,5,5,5))
  plotRGB(ens_rgb_r, r=1, g=2, b=3, scale=255,
          axes=TRUE, xlab="Easting (m)", ylab="Northing (m)",
          main="Ensemble (Mean=R, Weighted Mean=G, CV=B)")
}

Interpretation:

Red → areas with high mean suitability
Green → strong weighted mean influence
Blue → high variability among models

Optional Enhancement: View each layer separately with consistent color palette

Code

cols <- hcl.colors(100, "YlOrRd", rev = TRUE)
plot(ens_rast, col = cols)

Code

library(terra)
library(raster)
library(RColorBrewer)
library(sp)

# Convert terra SpatRaster to raster (Raster* object)
ens_raster <- raster::stack(ens_rast)

# Inspect names to select the right layer
names(ens_raster)
# Example: "MySpecies_EMwmeanByTSS_mergedData_mergedRun_mergedAlgo"

# Subset the TSS-weighted layer
ens_layer <- ens_raster[["MySpecies_EMwmeanByTSS_mergedData_mergedRun_mergedAlgo"]]

# Check for values
summary(ens_layer)
# If it shows only NAs, projection/resampling step might have failed.

# Optional: replace NAs with 0 or mask to study area
# ens_layer[is.na(ens_layer[])] <- 0

# Define a nice color palette
cols <- colorRampPalette(RColorBrewer::brewer.pal(11, "BrBG"))(100)
plot(ens_layer,
     col = cols,
     main = "TSS-weighted ensemble prediction",
     axes = TRUE,  # shows coordinates
     box = TRUE)

You can also produces a color-tuned visualization of all three ensemble layers, scaled and plotted like an RGB composite (or single-color maps), with axes and proper legends using terra.

Code

library(terra)
library(RColorBrewer)

# Extract ensemble raster stack
ens_rast <- get_predictions(myBiomodEnsembleProj)

# Optional: check names
names(ens_rast)
# [1] "MySpecies_EMmeanByTSS_mergedData_mergedRun_mergedAlgo"
# [2] "MySpecies_EMwmeanByTSS_mergedData_mergedRun_mergedAlgo"
# [3] "MySpecies_EMcvByTSS_mergedData_mergedRun_mergedAlgo"

# Select the three layers
layers <- ens_rast[[c(1,2,3)]]

# Scale each layer to 0-255 (RGB scale)
scale_to_255 <- function(x) {
  vals <- x[]
  vals_scaled <- round( (vals - min(vals, na.rm=TRUE)) / (max(vals, na.rm=TRUE) - min(vals, na.rm=TRUE)) * 255 )
  x[] <- vals_scaled
  return(x)
}

layers_scaled <- lapply(layers, scale_to_255)
layers_scaled <- rast(layers_scaled)
names(layers_scaled) <- c("EMmean","EMwmean","EMcv")

# Plot single layers with custom color palettes
cols <- colorRampPalette(brewer.pal(11, "BrBG"))(100)

for (i in 1:3) {
  plot(layers_scaled[[i]],
       col = cols,
       main = paste0(names(layers_scaled)[i], " (scaled)"),
       axes = TRUE, box = TRUE)
}

Code

# Optional: RGB composite (assign layers to R, G, B channels)
# Only works if all layers are scaled 0-255
plotRGB(layers_scaled, r=1, g=2, b=3, scale=255, stretch="lin",
        main="Ensemble RGB composite (EMmean=R, EMwmean=G, EMcv=B)")

Think about:

How does the ensemble prediction compare to individual models?
What are the advantages of using an ensemble approach?
How does model uncertainty (e.g., prob.cv) inform your confidence in predictions?

Further Deepen Your Understanding:

Where are the highest predicted suitability values?
Are there areas of high uncertainty (compare with EMcv or EMci projections)?
How does the ensemble map compare to individual model projections?

Future Climate Projection

Code

if (data_source == "sdm") {
  # 19 bioclim variables (place your file in the project folder)
  # Example file name; change if needed
  future_2070 <- try(rast("future_2070_bioclim.tif"), silent = TRUE)
  if (inherits(future_2070, "SpatRaster")) {
    if (is.na(crs(future_2070))) crs(future_2070) <- "EPSG:4326"
    
    temp_f   <- future_2070[[1]]   # bio1
    precip_f <- future_2070[[12]]  # bio12
    
    elev_c <- preds[["elevation"]]
    veg_c  <- preds[["vegetation"]]
    
    # Align current to WGS84 of future
    elev_wgs <- project(elev_c, temp_f)
    veg_wgs  <- project(veg_c,  temp_f, method = "near")
    
    ext_study   <- ext(elev_wgs)
    temp_crop   <- crop(temp_f,   ext_study)
    precip_crop <- crop(precip_f, ext_study)
    
    elev_res <- resample(elev_wgs, temp_crop)
    veg_res  <- resample(veg_wgs,  temp_crop, method = "near")
    
    # Stack with same names & order as training
    preds_future_sel <- c(elev_res, precip_crop, temp_crop, veg_res)
    names(preds_future_sel) <- c("elevation","precipitation","temperature","vegetation")
    
    # Project
    myBiomodProjFuture <- BIOMOD_Projection(
      bm.mod = myBiomodModelOut,
      new.env = preds_future_sel[[c("elevation","precipitation","temperature","vegetation")]],
      proj.name = "future_2070_fix",
      selected.models = "all",
      binary.meth = "TSS",
      compress = FALSE
    )
    
    # Crop projection back to study area (preds footprint)
    r  <- get_predictions(myBiomodProjFuture)
    r_zoom <- zoom_to_preds(r, preds)
  } else {
    message("future_2070_bioclim.tif not found; skipping future projection.")
    r_zoom <- NULL
  }
} else {
  message("Using biomod2 fallback data; future-projection demo (elev/veg + bio) is skipped.")
  r_zoom <- NULL
}

Think about: - Discussion: How do predicted suitable regions shift?

Are expansions or contractions consistent with ecological expectations?
How could this information be used in conservation planning?

Code

if (!is.null(r_zoom)) {
  terra::plot(r_zoom, nc = 3, mar = c(2,2,2,3), axes = FALSE)
}

Export Outputs

Code

# Export ensemble weighted mean (if built)
if (exists("myBiomodEnsembleProj")) {
  ensemble_rast <- get_predictions(myBiomodEnsembleProj)
  wmean_name <- grep("EMwmeanByTSS", names(ensemble_rast), value = TRUE)
  if (length(wmean_name)) {
    ens_wmean <- ensemble_rast[[wmean_name]]
    writeRaster(ens_wmean, "ensemble_weighted_mean.tif", overwrite = TRUE)
  }
}

# Optional: export cropped future projections
if (!is.null(r_zoom)) {
  writeRaster(r_zoom, "future_2070_projection_cropped.tif", overwrite = TRUE)
}

Now you can import your results into QGIS or ArcGIS for spatial overlay with land-use or protected areas.

Try It Yourself

Modify one of the following:

Use different algorithms (e.g., Maxent, XGBoost).
Add pseudo-absence generation.
Use k-fold spatial cross-validation.
Compare projections for two SSP scenarios (245 vs. 585).

Summary

✅ Prepared occurrence and environmental data ✅ Fitted multi-algorithm SDMs ✅ Evaluated models and explored variable importance ✅ Built ensembles and projected under climate change

Curious Questions:

How could you use these outputs in QGIS or ArcGIS?
What are the limitations of SDM outputs for real-world decision making?

Discussion and Reflection

Group Discussion Prompts:

What are the main sources of uncertainty in SDMs?
How would you improve the data or modeling process?
What ethical considerations arise when using SDMs for conservation or management?

Ideas for Student Mini-Projects

Try modeling a different species (change the resp.var). Add or remove environmental predictors and see how results change.
Compare results using different cross-validation strategies (e.g., k-fold, block).
Explore the effect of sample size by subsetting the data.

References

Thuiller W., Georges D., Engler R., Breiner F. (2023). biomod2: Ensemble platform for species distribution modeling. R package version 4.5.4. Hijmans, R. J. (2023). geodata: Download Geographic Data. R package version 0.6-3.

--- title: "Species Habitat Suitability Modelling with biomod2" author: "Dr Tahir Ali | AG de Meaux" format: html: toc: true toc-depth: 2 code-copy: true code-fold: true code-tools: true code-overflow: wrap execute: warning: false message: false echo: false --- ```{r, message=FALSE, warning=FALSE, results='hide'} # Global setup: robust figure handling for Quarto the_dev <- if (requireNamespace("ragg", quietly = TRUE)) "ragg_png" else "png" knitr::opts_chunk$set( fig.path = "figs/", fig.width = 8, fig.height = 6, dpi = 150, dev = the_dev, fig.align = "center", out.width = "90%", echo = TRUE, message = FALSE, warning = FALSE ) dir.create("figs", showWarnings = FALSE, recursive = TRUE) # macOS Quartz quirks options(bitmapType = "cairo") # Load packages library(sf) library(terra) library(biomod2) library(randomForest) library(gbm) library(mgcv) library(maxnet) library(xgboost) library(leaflet) set.seed(123) # Robust data loader: use 'sdm' demo if present; otherwise fallback to biomod2 demo get_demo_data <- function() { species_file <- system.file("external/species.shp", package = "sdm") if (nzchar(species_file)) { # sdm demos (UTM 30N) species <- sf::st_read(species_file, quiet = TRUE) path <- system.file("external", package = "sdm") preds <- terra::rast(list.files(path, pattern = "asc$", full.names = TRUE)) terra::crs(preds) <- "EPSG:32630" # UTM zone 30N list(species = species, preds = preds, source = "sdm") } else { # biomod2 demos (WGS84) message("Package 'sdm' not available; using biomod2 demo data.") data(DataSpecies, package = "biomod2") df <- DataSpecies[, c("X_WGS84", "Y_WGS84", "GuloGulo")] names(df) <- c("X", "Y", "Occurrence") species <- sf::st_as_sf(df, coords = c("X", "Y"), crs = 4326) data(bioclim_current, package = "biomod2") preds <- terra::rast(bioclim_current) list(species = species, preds = preds, source = "biomod2") } } # Helper for cropping/zooming projections to preds (study area) zoom_to_preds <- function(x, preds, method = "bilinear") { x_proj <- terra::project(x, preds[[1]], method = method) x_crop <- terra::crop(x_proj, preds) x_mask <- terra::mask(x_crop, preds) x_mask } ``` # Introduction Welcome to the **Species Habitat Suitability Modelling Workshop**! We will explore how to use **`biomod2`** to fit, evaluate, and project multi-algorithm SDMs under climate change.\ We’ll explore **multi-algorithm modelling**, **model evaluation**, **variable importance**, **ensemble modelling**, and **future projections** under climate change. ### Learning Goals - Prepare occurrence & environmental data - Fit multiple algorithms (`GLM`, `GAM`, `RF`, `GBM`, `CTA`) - Evaluate models and interpret variable importance - Create ensemble predictions and project under future climates ------------------------------------------------------------------------ # Species Occurrence Data ```{r} demo <- get_demo_data() species <- demo$species preds <- demo$preds data_source <- demo$source head(species) plot(species["Occurrence"], main = "Species Occurrences") ``` **Interpretation:** - Presence (1) and absence (0) points show the distribution of the species in the study area. **Interactive Map** ```{r, message=FALSE, warning=FALSE} species_ll <- tryCatch(sf::st_transform(species, 4326), error = function(e) species) coords_ll <- sf::st_coordinates(species_ll) leaflet(species_ll) |> addProviderTiles("CartoDB.Positron") |> addCircleMarkers(~coords_ll[,1], ~coords_ll[,2], color = ~ifelse(species_ll$Occurrence == 1, "darkgreen", "red"), popup = ~paste("Occurrence:", species_ll$Occurrence), radius = 5, stroke = FALSE, fillOpacity = 0.8) ``` **Interpretation:** Green = presence, Red = absence. Interactive maps help visualize sampling bias or clustering. ------------------------------------------------------------------------ # Environmental Predictors **Code** ```{r, message=FALSE, warning=FALSE} plot(preds, main = "Environmental Predictors") ``` **Interpretation:** These environmental layers (precipitation, temperature, etc.) are predictors for species distribution. **Output** ```{r, message=FALSE, warning=FALSE} names(preds) ``` Inspect predictor correlation before modeling! Highly correlated predictors can inflate variable importance. ------------------------------------------------------------------------ # Data Formatting for biomod2 ```{r, message=FALSE, warning=FALSE, results='hide'} myBiomodData <- BIOMOD_FormatingData( resp.var = species$Occurrence, expl.var = preds, # SpatRaster resp.xy = sf::st_coordinates(species), # in same CRS as preds resp.name = "MySpecies", PA.strategy = "none" ) ``` ------------------------------------------------------------------------ # Run SDMs ```{r, message=FALSE, warning=FALSE, results='hide'} # Create modeling options myBiomodOptions <- BIOMOD_ModelingOptions() myBiomodModelOut <- BIOMOD_Modeling( bm.format = myBiomodData, models = c("GLM", "GAM", "RF", "GBM", "CTA"), bm.options = myBiomodOptions, CV.strategy = "random", CV.nb.rep = 3, CV.perc = 0.7, metric.eval = c("TSS", "ROC"), var.import = 5, seed.val = 123 ) ``` Each algorithm captures a different relationship between environment and occurrence — ensemble methods help combine their strengths. ```{r, message=FALSE, warning=FALSE, results='hide'} mods <- get_built_models(myBiomodModelOut, run = "RUN1") invisible( bm_PlotResponseCurves( bm.out = myBiomodModelOut, models.chosen = mods, fixed.var = "median" ) ) ``` **Further Deepen Your Understanding:** - How does the predicted probability of presence change with each environmental variable? - Are there thresholds or nonlinearities in the response? ------------------------------------------------------------------------ # Model Evaluation **Code** ```{r, message=FALSE, warning=FALSE, results='hide'} evals <- get_evaluations(myBiomodModelOut) print(evals) ``` **Plot** ```{r, message=FALSE, warning=FALSE, results='hide'} invisible( bm_PlotEvalBoxplot(myBiomodModelOut)) ``` **Think about:** - Which model performed best (by TSS or ROC)? - Are there any models that performed poorly? What could be the reason? - What does a high TSS or ROC value mean in ecological terms? - What does that imply about nonlinearity in the data? **Further Deepen Your Understanding:** - Compare the evaluation metrics across models. Discuss possible reasons for differences (e.g., overfitting, data structure, algorithm assumptions). ------------------------------------------------------------------------ # Variable Importance **Code** ```{r, message=FALSE, warning=FALSE, results='hide'} varimp <- get_variables_importance(myBiomodModelOut) print(varimp) ``` **Plot** ```{r, message=FALSE, warning=FALSE, results='hide'} invisible( bm_PlotVarImpBoxplot(myBiomodModelOut)) ``` **Think about:** - Which environmental variable is most important for predicting the species’ distribution? - Are there variables that seem unimportant? Why might that be? - How might variable importance change if you used a different species or study area? **Further Deepen Your Understanding:** - Discuss how variable importance can inform conservation or management decisions. ------------------------------------------------------------------------ # Project to Current Environment **Code** ```{r, message=FALSE, warning=FALSE, results='hide'} myBiomodProj <- BIOMOD_Projection( bm.mod = myBiomodModelOut, new.env = preds, proj.name = "current", selected.models = "all", binary.meth = "TSS" ) ``` **Map** ```{r, message=FALSE, warning=FALSE, results='hide'} invisible(plot(myBiomodProj)) ``` ------------------------------------------------------------------------ # Ensemble Modeling **Code** ```{r, message=FALSE, warning=FALSE, results='hide'} myBiomodEnsemble <- BIOMOD_EnsembleModeling( bm.mod = myBiomodModelOut, models.chosen = "all", em.by = "all", em.algo = c("EMmean", "EMwmean", "EMcv"), metric.select = "TSS", metric.select.thresh = 0.5, metric.eval = c("TSS", "ROC") ) myBiomodEnsembleProj <- BIOMOD_EnsembleForecasting( bm.em = myBiomodEnsemble, bm.proj = myBiomodProj ) # biomod2's internal plot (can look squished for multiple panels) invisible(plot(myBiomodEnsembleProj)) ``` The weighted mean ensemble (EMwmean) gives higher weight to better-performing algorithms. **Why it looks compressed** - biomod2's internal plotting code does not preserve the raster’s coordinate ratio. - It sets up a multi-panel layout (mfrow) to plot multiple ensemble models side by side. - Each panel is drawn with equal x/y units, not the spatial extent ratio, so the map looks "squished". **Solution** - Extract predictions and plot with terra::plot() This gives you correct aspect ratio and prettier maps: ```{r, message=FALSE, warning=FALSE, results='hide'} # Balanced aspect using terra ens_rast <- get_predictions(myBiomodEnsembleProj) terra::plot(ens_rast, nc = 2, main = "Ensemble Forecasts (Balanced Plot)") ``` **Using plotRGB() — composite visualization** plotRGB() displays three raster layers as R–G–B channels (e.g., temperature in red, precipitation in blue, elevation in green). It’s ideal for visualizing covariate contrasts or model uncertainty (mean, sd, cv). ```{r, message=FALSE, warning=FALSE, results='hide'} # Extract RGB layers safely layer_ids <- grep("EM(mean|wmean|cv)ByTSS", names(ens_rast)) if (length(layer_ids) < 3) { message("⚠️ Expected 3 ensemble layers (mean, wmean, cv), found ", length(layer_ids), ".") message("Available layers are: ", paste(names(ens_rast), collapse = ", ")) } else { ens_rgb <- ens_rast[[layer_ids]] # Rescale values to 0–255 stretch <- function(x, minv=0, maxv=255){ vals <- values(x) vals_scaled <- round((vals - min(vals, na.rm=TRUE)) / (max(vals, na.rm=TRUE)-min(vals, na.rm=TRUE)) * (maxv-minv) + minv) values(x) <- vals_scaled return(x) } ens_rgb <- stretch(ens_rgb) # Convert to RasterStack for plotRGB ens_rgb_r <- raster::stack(ens_rgb) # Plot RGB composite par(mfrow=c(1,1), mar=c(5,5,5,5)) plotRGB(ens_rgb_r, r=1, g=2, b=3, scale=255, axes=TRUE, xlab="Easting (m)", ylab="Northing (m)", main="Ensemble (Mean=R, Weighted Mean=G, CV=B)") } ``` **Interpretation:** - **Red** → areas with high mean suitability - **Green** → strong weighted mean influence - **Blue** → high variability among models **Optional Enhancement:** View each layer separately with consistent color palette ```{r, message=FALSE, warning=FALSE, results='hide'} cols <- hcl.colors(100, "YlOrRd", rev = TRUE) plot(ens_rast, col = cols) ``` ```{r, message=FALSE, warning=FALSE, results='hide'} library(terra) library(raster) library(RColorBrewer) library(sp) # Convert terra SpatRaster to raster (Raster* object) ens_raster <- raster::stack(ens_rast) # Inspect names to select the right layer names(ens_raster) # Example: "MySpecies_EMwmeanByTSS_mergedData_mergedRun_mergedAlgo" # Subset the TSS-weighted layer ens_layer <- ens_raster[["MySpecies_EMwmeanByTSS_mergedData_mergedRun_mergedAlgo"]] # Check for values summary(ens_layer) # If it shows only NAs, projection/resampling step might have failed. # Optional: replace NAs with 0 or mask to study area # ens_layer[is.na(ens_layer[])] <- 0 # Define a nice color palette cols <- colorRampPalette(RColorBrewer::brewer.pal(11, "BrBG"))(100) plot(ens_layer, col = cols, main = "TSS-weighted ensemble prediction", axes = TRUE, # shows coordinates box = TRUE) ``` You can also produces a color-tuned visualization of all three ensemble layers, scaled and plotted like an RGB composite (or single-color maps), with axes and proper legends using terra. ```{r, message=FALSE, warning=FALSE, results='hide'} library(terra) library(RColorBrewer) # Extract ensemble raster stack ens_rast <- get_predictions(myBiomodEnsembleProj) # Optional: check names names(ens_rast) # [1] "MySpecies_EMmeanByTSS_mergedData_mergedRun_mergedAlgo" # [2] "MySpecies_EMwmeanByTSS_mergedData_mergedRun_mergedAlgo" # [3] "MySpecies_EMcvByTSS_mergedData_mergedRun_mergedAlgo" # Select the three layers layers <- ens_rast[[c(1,2,3)]] # Scale each layer to 0-255 (RGB scale) scale_to_255 <- function(x) { vals <- x[] vals_scaled <- round( (vals - min(vals, na.rm=TRUE)) / (max(vals, na.rm=TRUE) - min(vals, na.rm=TRUE)) * 255 ) x[] <- vals_scaled return(x) } layers_scaled <- lapply(layers, scale_to_255) layers_scaled <- rast(layers_scaled) names(layers_scaled) <- c("EMmean","EMwmean","EMcv") # Plot single layers with custom color palettes cols <- colorRampPalette(brewer.pal(11, "BrBG"))(100) for (i in 1:3) { plot(layers_scaled[[i]], col = cols, main = paste0(names(layers_scaled)[i], " (scaled)"), axes = TRUE, box = TRUE) } # Optional: RGB composite (assign layers to R, G, B channels) # Only works if all layers are scaled 0-255 plotRGB(layers_scaled, r=1, g=2, b=3, scale=255, stretch="lin", main="Ensemble RGB composite (EMmean=R, EMwmean=G, EMcv=B)") ``` **Think about:** - How does the ensemble prediction compare to individual models? - What are the advantages of using an ensemble approach? - How does model uncertainty (e.g., prob.cv) inform your confidence in predictions? **Further Deepen Your Understanding:** - Where are the highest predicted suitability values? - Are there areas of high uncertainty (compare with EMcv or EMci projections)? - How does the ensemble map compare to individual model projections? ------------------------------------------------------------------------ # Future Climate Projection ```{r, message=FALSE, warning=FALSE, results='hide'} if (data_source == "sdm") { # 19 bioclim variables (place your file in the project folder) # Example file name; change if needed future_2070 <- try(rast("future_2070_bioclim.tif"), silent = TRUE) if (inherits(future_2070, "SpatRaster")) { if (is.na(crs(future_2070))) crs(future_2070) <- "EPSG:4326" temp_f <- future_2070[[1]] # bio1 precip_f <- future_2070[[12]] # bio12 elev_c <- preds[["elevation"]] veg_c <- preds[["vegetation"]] # Align current to WGS84 of future elev_wgs <- project(elev_c, temp_f) veg_wgs <- project(veg_c, temp_f, method = "near") ext_study <- ext(elev_wgs) temp_crop <- crop(temp_f, ext_study) precip_crop <- crop(precip_f, ext_study) elev_res <- resample(elev_wgs, temp_crop) veg_res <- resample(veg_wgs, temp_crop, method = "near") # Stack with same names & order as training preds_future_sel <- c(elev_res, precip_crop, temp_crop, veg_res) names(preds_future_sel) <- c("elevation","precipitation","temperature","vegetation") # Project myBiomodProjFuture <- BIOMOD_Projection( bm.mod = myBiomodModelOut, new.env = preds_future_sel[[c("elevation","precipitation","temperature","vegetation")]], proj.name = "future_2070_fix", selected.models = "all", binary.meth = "TSS", compress = FALSE ) # Crop projection back to study area (preds footprint) r <- get_predictions(myBiomodProjFuture) r_zoom <- zoom_to_preds(r, preds) } else { message("future_2070_bioclim.tif not found; skipping future projection.") r_zoom <- NULL } } else { message("Using biomod2 fallback data; future-projection demo (elev/veg + bio) is skipped.") r_zoom <- NULL } ``` **Think about:** - Discussion: How do predicted suitable regions shift? - Are expansions or contractions consistent with ecological expectations? - How could this information be used in conservation planning? ------------------------------------------------------------------------ ```{r, message=FALSE, warning=FALSE, results='hide'} if (!is.null(r_zoom)) { terra::plot(r_zoom, nc = 3, mar = c(2,2,2,3), axes = FALSE) } ``` # Export Outputs ```{r, message=FALSE, warning=FALSE, results='hide'} # Export ensemble weighted mean (if built) if (exists("myBiomodEnsembleProj")) { ensemble_rast <- get_predictions(myBiomodEnsembleProj) wmean_name <- grep("EMwmeanByTSS", names(ensemble_rast), value = TRUE) if (length(wmean_name)) { ens_wmean <- ensemble_rast[[wmean_name]] writeRaster(ens_wmean, "ensemble_weighted_mean.tif", overwrite = TRUE) } } # Optional: export cropped future projections if (!is.null(r_zoom)) { writeRaster(r_zoom, "future_2070_projection_cropped.tif", overwrite = TRUE) } ``` Now you can import your results into QGIS or ArcGIS for spatial overlay with land-use or protected areas. ------------------------------------------------------------------------ # Try It Yourself Modify one of the following: - Use **different algorithms** (e.g., Maxent, XGBoost). - Add **pseudo-absence generation**. - Use **k-fold spatial cross-validation**. - Compare projections for **two SSP scenarios (245 vs. 585)**. ------------------------------------------------------------------------ # Summary ✅ Prepared occurrence and environmental data ✅ Fitted multi-algorithm SDMs ✅ Evaluated models and explored variable importance ✅ Built ensembles and projected under climate change ------------------------------------------------------------------------ **Curious Questions:** - How could you use these outputs in QGIS or ArcGIS? - What are the limitations of SDM outputs for real-world decision making? # Discussion and Reflection **Group Discussion Prompts:** - What are the main sources of uncertainty in SDMs? - How would you improve the data or modeling process? - What ethical considerations arise when using SDMs for conservation or management? # Ideas for Student Mini-Projects - Try modeling a different species (change the resp.var). Add or remove environmental predictors and see how results change. - Compare results using different cross-validation strategies (e.g., k-fold, block). - Explore the effect of sample size by subsetting the data. # References Thuiller W., Georges D., Engler R., Breiner F. (2023). **biomod2: Ensemble platform for species distribution modeling**. R package version 4.5.4. Hijmans, R. J. (2023). **geodata: Download Geographic Data**. R package version 0.6-3.