Species Habitat Suitability Modelling with biomod2

Author

Dr Tahir Ali | AG de Meaux

Introduction

Welcome to the Species Habitat Suitability Modelling Workshop! We will explore how to use biomod2 to fit, evaluate, and project multi-algorithm SDMs under climate change.

We’ll explore multi-algorithm modelling, model evaluation, variable importance, ensemble modelling, and future projections under climate change.

Learning Goals

  • Prepare occurrence & environmental data

  • Fit multiple algorithms (GLM, GAM, RF, GBM, CTA)

  • Evaluate models and interpret variable importance

  • Create ensemble predictions and project under future climates


Species Occurrence Data

Code
demo <- get_demo_data()
species <- demo$species
preds   <- demo$preds
data_source <- demo$source

head(species)
Simple feature collection with 6 features and 1 field
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 155174 ymin: 4048000 xmax: 509277 ymax: 4207836
Projected CRS: WGS 84 / UTM zone 30N
  Occurrence               geometry
1          1 POINT (304000 4048000)
2          0 POINT (433009 4197679)
3          1 POINT (261300 4200000)
4          1 POINT (383560 4170853)
5          1 POINT (155174 4207836)
6          0 POINT (509277 4144975)
Code
plot(species["Occurrence"], main = "Species Occurrences")

Interpretation:

  • Presence (1) and absence (0) points show the distribution of the species in the study area.

Interactive Map

Code
species_ll <- tryCatch(sf::st_transform(species, 4326), error = function(e) species)
coords_ll  <- sf::st_coordinates(species_ll)

leaflet(species_ll) |>
  addProviderTiles("CartoDB.Positron") |>
  addCircleMarkers(~coords_ll[,1], ~coords_ll[,2],
                   color = ~ifelse(species_ll$Occurrence == 1, "darkgreen", "red"),
                   popup = ~paste("Occurrence:", species_ll$Occurrence),
                   radius = 5, stroke = FALSE, fillOpacity = 0.8)

Interpretation: Green = presence, Red = absence. Interactive maps help visualize sampling bias or clustering.


Environmental Predictors

Code

Code
plot(preds, main = "Environmental Predictors")

Interpretation: These environmental layers (precipitation, temperature, etc.) are predictors for species distribution.

Output

Code
names(preds)
[1] "elevation"     "precipitation" "temperature"   "vegetation"   

Inspect predictor correlation before modeling! Highly correlated predictors can inflate variable importance.


Data Formatting for biomod2

Code
myBiomodData <- BIOMOD_FormatingData(
  resp.var  = species$Occurrence,
  expl.var  = preds,                          # SpatRaster
  resp.xy   = sf::st_coordinates(species),    # in same CRS as preds
  resp.name = "MySpecies",
  PA.strategy = "none"
)

Run SDMs

Code
# Create modeling options
myBiomodOptions <- BIOMOD_ModelingOptions()

myBiomodModelOut <- BIOMOD_Modeling(
  bm.format   = myBiomodData,
  models      = c("GLM", "GAM", "RF", "GBM", "CTA"),
  bm.options  = myBiomodOptions,
  CV.strategy = "random",
  CV.nb.rep   = 3,
  CV.perc     = 0.7,
  metric.eval = c("TSS", "ROC"),
  var.import  = 5,
  seed.val    = 123
)

Each algorithm captures a different relationship between environment and occurrence — ensemble methods help combine their strengths.

Code
mods <- get_built_models(myBiomodModelOut, run = "RUN1")
invisible(
bm_PlotResponseCurves(
bm.out = myBiomodModelOut,
models.chosen = mods,
fixed.var = "median"
)
)

Further Deepen Your Understanding:

  • How does the predicted probability of presence change with each environmental variable?

  • Are there thresholds or nonlinearities in the response?


Model Evaluation

Code

Code
evals <- get_evaluations(myBiomodModelOut)
print(evals)

Plot

Code
invisible(
  bm_PlotEvalBoxplot(myBiomodModelOut))

Think about:

  • Which model performed best (by TSS or ROC)?

  • Are there any models that performed poorly? What could be the reason?

  • What does a high TSS or ROC value mean in ecological terms?

  • What does that imply about nonlinearity in the data?

Further Deepen Your Understanding:

  • Compare the evaluation metrics across models. Discuss possible reasons for differences (e.g., overfitting, data structure, algorithm assumptions).

Variable Importance

Code

Code
varimp <- get_variables_importance(myBiomodModelOut)
print(varimp)

Plot

Code
invisible(
  bm_PlotVarImpBoxplot(myBiomodModelOut))

Think about:

  • Which environmental variable is most important for predicting the species’ distribution?

  • Are there variables that seem unimportant? Why might that be?

  • How might variable importance change if you used a different species or study area?

Further Deepen Your Understanding:

  • Discuss how variable importance can inform conservation or management decisions.

Project to Current Environment

Code

Code
myBiomodProj <- BIOMOD_Projection(
  bm.mod = myBiomodModelOut,
  new.env = preds,
  proj.name = "current",
  selected.models = "all",
  binary.meth = "TSS"
)

Map

Code
invisible(plot(myBiomodProj))


Ensemble Modeling

Code

Code
myBiomodEnsemble <- BIOMOD_EnsembleModeling(
  bm.mod = myBiomodModelOut,
  models.chosen = "all",
  em.by = "all",
  em.algo = c("EMmean", "EMwmean", "EMcv"),
  metric.select = "TSS",
  metric.select.thresh = 0.5,
  metric.eval = c("TSS", "ROC")
)

myBiomodEnsembleProj <- BIOMOD_EnsembleForecasting(
  bm.em  = myBiomodEnsemble,
  bm.proj = myBiomodProj
)

# biomod2's internal plot (can look squished for multiple panels)
invisible(plot(myBiomodEnsembleProj))

The weighted mean ensemble (EMwmean) gives higher weight to better-performing algorithms.

Why it looks compressed - biomod2’s internal plotting code does not preserve the raster’s coordinate ratio. - It sets up a multi-panel layout (mfrow) to plot multiple ensemble models side by side. - Each panel is drawn with equal x/y units, not the spatial extent ratio, so the map looks “squished”.

Solution - Extract predictions and plot with terra::plot()

This gives you correct aspect ratio and prettier maps:

Code
# Balanced aspect using terra
ens_rast <- get_predictions(myBiomodEnsembleProj)
terra::plot(ens_rast, nc = 2, main = "Ensemble Forecasts (Balanced Plot)")

Using plotRGB() — composite visualization

plotRGB() displays three raster layers as R–G–B channels (e.g., temperature in red, precipitation in blue, elevation in green). It’s ideal for visualizing covariate contrasts or model uncertainty (mean, sd, cv).

Code
# Extract RGB layers safely
layer_ids <- grep("EM(mean|wmean|cv)ByTSS", names(ens_rast))
if (length(layer_ids) < 3) {
  message("⚠️ Expected 3 ensemble layers (mean, wmean, cv), found ", length(layer_ids), ".")
  message("Available layers are: ", paste(names(ens_rast), collapse = ", "))
} else {
  ens_rgb <- ens_rast[[layer_ids]]

  # Rescale values to 0–255
  stretch <- function(x, minv=0, maxv=255){
    vals <- values(x)
    vals_scaled <- round((vals - min(vals, na.rm=TRUE)) /
                         (max(vals, na.rm=TRUE)-min(vals, na.rm=TRUE)) * (maxv-minv) + minv)
    values(x) <- vals_scaled
    return(x)
  }

  ens_rgb <- stretch(ens_rgb)

  # Convert to RasterStack for plotRGB
  ens_rgb_r <- raster::stack(ens_rgb)

  # Plot RGB composite
  par(mfrow=c(1,1), mar=c(5,5,5,5))
  plotRGB(ens_rgb_r, r=1, g=2, b=3, scale=255,
          axes=TRUE, xlab="Easting (m)", ylab="Northing (m)",
          main="Ensemble (Mean=R, Weighted Mean=G, CV=B)")
}

Interpretation:

  • Red → areas with high mean suitability
  • Green → strong weighted mean influence
  • Blue → high variability among models

Optional Enhancement: View each layer separately with consistent color palette

Code
cols <- hcl.colors(100, "YlOrRd", rev = TRUE)
plot(ens_rast, col = cols)

Code
library(terra)
library(raster)
library(RColorBrewer)
library(sp)

# Convert terra SpatRaster to raster (Raster* object)
ens_raster <- raster::stack(ens_rast)

# Inspect names to select the right layer
names(ens_raster)
# Example: "MySpecies_EMwmeanByTSS_mergedData_mergedRun_mergedAlgo"

# Subset the TSS-weighted layer
ens_layer <- ens_raster[["MySpecies_EMwmeanByTSS_mergedData_mergedRun_mergedAlgo"]]

# Check for values
summary(ens_layer)
# If it shows only NAs, projection/resampling step might have failed.

# Optional: replace NAs with 0 or mask to study area
# ens_layer[is.na(ens_layer[])] <- 0

# Define a nice color palette
cols <- colorRampPalette(RColorBrewer::brewer.pal(11, "BrBG"))(100)
plot(ens_layer,
     col = cols,
     main = "TSS-weighted ensemble prediction",
     axes = TRUE,  # shows coordinates
     box = TRUE)

You can also produces a color-tuned visualization of all three ensemble layers, scaled and plotted like an RGB composite (or single-color maps), with axes and proper legends using terra.

Code
library(terra)
library(RColorBrewer)

# Extract ensemble raster stack
ens_rast <- get_predictions(myBiomodEnsembleProj)

# Optional: check names
names(ens_rast)
# [1] "MySpecies_EMmeanByTSS_mergedData_mergedRun_mergedAlgo"
# [2] "MySpecies_EMwmeanByTSS_mergedData_mergedRun_mergedAlgo"
# [3] "MySpecies_EMcvByTSS_mergedData_mergedRun_mergedAlgo"

# Select the three layers
layers <- ens_rast[[c(1,2,3)]]

# Scale each layer to 0-255 (RGB scale)
scale_to_255 <- function(x) {
  vals <- x[]
  vals_scaled <- round( (vals - min(vals, na.rm=TRUE)) / (max(vals, na.rm=TRUE) - min(vals, na.rm=TRUE)) * 255 )
  x[] <- vals_scaled
  return(x)
}

layers_scaled <- lapply(layers, scale_to_255)
layers_scaled <- rast(layers_scaled)
names(layers_scaled) <- c("EMmean","EMwmean","EMcv")

# Plot single layers with custom color palettes
cols <- colorRampPalette(brewer.pal(11, "BrBG"))(100)

for (i in 1:3) {
  plot(layers_scaled[[i]],
       col = cols,
       main = paste0(names(layers_scaled)[i], " (scaled)"),
       axes = TRUE, box = TRUE)
}

Code
# Optional: RGB composite (assign layers to R, G, B channels)
# Only works if all layers are scaled 0-255
plotRGB(layers_scaled, r=1, g=2, b=3, scale=255, stretch="lin",
        main="Ensemble RGB composite (EMmean=R, EMwmean=G, EMcv=B)")

Think about:

  • How does the ensemble prediction compare to individual models?

  • What are the advantages of using an ensemble approach?

  • How does model uncertainty (e.g., prob.cv) inform your confidence in predictions?

Further Deepen Your Understanding:

  • Where are the highest predicted suitability values?

  • Are there areas of high uncertainty (compare with EMcv or EMci projections)?

  • How does the ensemble map compare to individual model projections?


Future Climate Projection

Code
if (data_source == "sdm") {
  # 19 bioclim variables (place your file in the project folder)
  # Example file name; change if needed
  future_2070 <- try(rast("future_2070_bioclim.tif"), silent = TRUE)
  if (inherits(future_2070, "SpatRaster")) {
    if (is.na(crs(future_2070))) crs(future_2070) <- "EPSG:4326"
    
    temp_f   <- future_2070[[1]]   # bio1
    precip_f <- future_2070[[12]]  # bio12
    
    elev_c <- preds[["elevation"]]
    veg_c  <- preds[["vegetation"]]
    
    # Align current to WGS84 of future
    elev_wgs <- project(elev_c, temp_f)
    veg_wgs  <- project(veg_c,  temp_f, method = "near")
    
    ext_study   <- ext(elev_wgs)
    temp_crop   <- crop(temp_f,   ext_study)
    precip_crop <- crop(precip_f, ext_study)
    
    elev_res <- resample(elev_wgs, temp_crop)
    veg_res  <- resample(veg_wgs,  temp_crop, method = "near")
    
    # Stack with same names & order as training
    preds_future_sel <- c(elev_res, precip_crop, temp_crop, veg_res)
    names(preds_future_sel) <- c("elevation","precipitation","temperature","vegetation")
    
    # Project
    myBiomodProjFuture <- BIOMOD_Projection(
      bm.mod = myBiomodModelOut,
      new.env = preds_future_sel[[c("elevation","precipitation","temperature","vegetation")]],
      proj.name = "future_2070_fix",
      selected.models = "all",
      binary.meth = "TSS",
      compress = FALSE
    )
    
    # Crop projection back to study area (preds footprint)
    r  <- get_predictions(myBiomodProjFuture)
    r_zoom <- zoom_to_preds(r, preds)
  } else {
    message("future_2070_bioclim.tif not found; skipping future projection.")
    r_zoom <- NULL
  }
} else {
  message("Using biomod2 fallback data; future-projection demo (elev/veg + bio) is skipped.")
  r_zoom <- NULL
}

Think about: - Discussion: How do predicted suitable regions shift?

  • Are expansions or contractions consistent with ecological expectations?

  • How could this information be used in conservation planning?


Code
if (!is.null(r_zoom)) {
  terra::plot(r_zoom, nc = 3, mar = c(2,2,2,3), axes = FALSE)
}

Export Outputs

Code
# Export ensemble weighted mean (if built)
if (exists("myBiomodEnsembleProj")) {
  ensemble_rast <- get_predictions(myBiomodEnsembleProj)
  wmean_name <- grep("EMwmeanByTSS", names(ensemble_rast), value = TRUE)
  if (length(wmean_name)) {
    ens_wmean <- ensemble_rast[[wmean_name]]
    writeRaster(ens_wmean, "ensemble_weighted_mean.tif", overwrite = TRUE)
  }
}

# Optional: export cropped future projections
if (!is.null(r_zoom)) {
  writeRaster(r_zoom, "future_2070_projection_cropped.tif", overwrite = TRUE)
}

Now you can import your results into QGIS or ArcGIS for spatial overlay with land-use or protected areas.


Try It Yourself

Modify one of the following:

  • Use different algorithms (e.g., Maxent, XGBoost).
  • Add pseudo-absence generation.
  • Use k-fold spatial cross-validation.
  • Compare projections for two SSP scenarios (245 vs. 585).

Summary

✅ Prepared occurrence and environmental data ✅ Fitted multi-algorithm SDMs ✅ Evaluated models and explored variable importance ✅ Built ensembles and projected under climate change


Curious Questions:

  • How could you use these outputs in QGIS or ArcGIS?

  • What are the limitations of SDM outputs for real-world decision making?

Discussion and Reflection

Group Discussion Prompts:

  • What are the main sources of uncertainty in SDMs?
  • How would you improve the data or modeling process?
  • What ethical considerations arise when using SDMs for conservation or management?

Ideas for Student Mini-Projects

  • Try modeling a different species (change the resp.var). Add or remove environmental predictors and see how results change.
  • Compare results using different cross-validation strategies (e.g., k-fold, block).
  • Explore the effect of sample size by subsetting the data.

References

Thuiller W., Georges D., Engler R., Breiner F. (2023). biomod2: Ensemble platform for species distribution modeling. R package version 4.5.4. Hijmans, R. J. (2023). geodata: Download Geographic Data. R package version 0.6-3.