Kathe Todd-Brown

Curious about the spatial distribution of soil organic carbon across the United States? Of course you are! Wang et al. 2024 agupubs.onlinelibrary.wiley.co used a combined clustering and random forest approach to upscale point estimates of soil carbon stocks.

I'm a co-author here! Join me on a read through of this paper 1/n

This paper has been a while coming so I'm reminding myself of the details here. Broadly we were hoping to cluster the spatial distribution of the drivers of soil carbon formation (climate and geology) into similar groups, then model the distribution of soil carbon within these groups to get a more accurate estimate of carbon stocks.

Let's dive into the abstract and see if I'm remembering right. 2/n

Let's start with the baseline: soil organic carbon (SOC) maps are hard, and there is high variation among current maps. Our estimates fell within the range of total SOC that other folks have estimated, but we got better fits at NEON sites than other maps did (NEON was not included in the training data). Our method also delivers an uncertainty map, which is important for benchmarking Earth system models. 3/n

Soil carbon stocks are a huge pool for carbon that might otherwise be CO2. It's important to know if the net flux is going down into the ground or up into the air if we are going to predict how CO2 will drive climate in the future. (Just to cover the standard justifications.) These fluxes are often predicted by Earth system models that require the current distribution of stocks so that the right soils are interacting with the right climate. 4/n

Mapping exercises have historically focused on expert-drawn mapping units, which were then matched to representative soil profiles. This idea of a mapping unit was further refined as machine learning methods developed, focusing on remote sensing outputs. Random forest and quantile regression forests are particularly popular for these upscaled maps. 5/n

Scale, however, matters. Globally fitted RF and QRF algorithms have shown significant shortcomings in describing smaller regional scale variations.

We address this gap by using multivariate geographic clustering to create regional climate-geology groups. Borrowing from the ecological design of NEON (the National Ecological Observatory Network), this allowed us to balance regional connection with soil formation drivers. 6/n

**Methods** For driving variables, we used existing maps that had better agreement with validation data than the soil carbon maps. Based on classic soil formation theory, these driving variables include: bioclimate (beyond just MAT and MAP), physiographic (elevation, slope, aspect), non-organic-carbon soil (CaCO3, CEC, K, texture, hydro), and vegetation (NPP, LAI, NDVI). 7/n

Complementary response variables included soil organic carbon stocks from site-level surveys from the International Soil Carbon Network, Alaska Soil Profile Data, and the National Ecological Observatory Network. (Hmmm, hope that NEON was used for validation, based on the abstract.) 8/n

We then used Principal Component Analysis to remove collinearity from the 36 environmental driving variables. The 12 most explanatory components were passed to the Multivariate Geographic Clustering (MGC). 9/n

MGC then used k-means clustering to create our mapping units. Here we did a bit of tuning, since the number of clusters selected for k-means clustering is critical. Instead of allowing a traditional cut-off dictated by centroid distance patterns, we balanced sample size (wanting n>100) with known heterogeneity (don't make the East coast one blob) to arrive at a cluster count of 20. This also aligned with the NEON ecoclimate domains, though our regions were different. 10/n
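To make the sample-size side of that trade-off concrete, here is a minimal sketch, not the paper's code. It runs a plain Lloyd's k-means over synthetic 2-D points and reports the largest candidate k for which every cluster still has at least `min_n` members. The function names, the synthetic data, and the "largest k that passes" rule are all assumptions for illustration; in the thread the final count of 20 also drew on judgment about known heterogeneity, which this sketch does not capture. It also clusters and counts the same points, whereas the real workflow clusters gridded driver maps and counts soil profiles.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    labels = None
    for _ in range(n_iter):
        # Assign every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Move each centroid to the mean of its current members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def cluster_sizes(labels, k):
    """Number of points in each cluster (empty clusters count as 0)."""
    return [labels.count(j) for j in range(k)]


def largest_k_with_enough_samples(points, candidates, min_n):
    """Largest candidate k whose smallest cluster has at least min_n points."""
    best = None
    for k in sorted(candidates):
        if min(cluster_sizes(kmeans(points, k), k)) >= min_n:
            best = k
    return best


# Synthetic stand-in for "sites in driver space": three blobs of 150 points.
rng = random.Random(42)
points = [
    (rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
    for cx, cy in [(0, 0), (5, 5), (0, 5)]
    for _ in range(150)
]
candidates = [2, 3, 4, 6]
best_k = largest_k_with_enough_samples(points, candidates, min_n=100)
print("largest k with every cluster >= 100 points:", best_k)
```

Raising `min_n` pushes the answer toward fewer, larger clusters, which is the tension with "don't make the East coast one blob".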

Now that we had the clusters, we went back to the original ecoclimate variables (RF models are insensitive to collinearity) and fit a series of 20 random forest models (n-trees = 1000), one per cluster, with an 80/20 training/validation split. Models were then validated with HWSD 1.2, NCSCD, and SoilGrids 2.0 as well as NEON (which was withheld from the calibration data). 11/n
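The bookkeeping for that step might look like the sketch below. It only shows the grouping and the 80/20 split, under the assumption that the split was made within each cluster (the thread does not say whether it was per cluster or pooled). The record layout and function name are hypothetical, and the random forest fit itself is left as a comment, since the thread does not name the software used.

```python
import random
from collections import defaultdict


def split_by_cluster(records, train_frac=0.8, seed=0):
    """Group (cluster_id, features, soc) records and split each group.

    Returns {cluster_id: (train_records, validation_records)}.
    """
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for rec in records:
        by_cluster[rec[0]].append(rec)

    splits = {}
    for cid, recs in by_cluster.items():
        shuffled = recs[:]  # copy so the caller's list is left untouched
        rng.shuffle(shuffled)
        cut = round(train_frac * len(shuffled))
        splits[cid] = (shuffled[:cut], shuffled[cut:])
    return splits


# Hypothetical profiles: 3 clusters with 10 records each.
records = [(cid, (float(i),), 1.0 + i) for cid in range(3) for i in range(10)]
splits = split_by_cluster(records)
for cid, (train, valid) in sorted(splits.items()):
    # A random forest for this cluster would be fit on `train`
    # and checked against `valid` here.
    print(cid, len(train), len(valid))
```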

We saw fits of ~0.4 to 0.5 R2 with less bias than SoilGrids 2.0 or the HWSD-NCSCD maps. In most other fields these would be horrible R2 values, but for a soil map they are pretty darn good.
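For readers who want the two fit metrics spelled out, here is a small sketch. It assumes R2 means the coefficient of determination (1 minus residual over total sum of squares) and that bias means the mean of predicted minus observed; the thread does not state which definitions the paper used, and the numbers below are made up.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def mean_bias(observed, predicted):
    """Average of (predicted - observed); positive means overprediction."""
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)


# Made-up SOC stocks at four validation sites (units are hypothetical).
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 5.0, 7.0, 9.0]  # every site overpredicted by the same amount

print("R2:  ", r_squared(observed, predicted))
print("bias:", mean_bias(observed, predicted))
```

The example is built so that a map can track the spatial pattern well and still carry a systematic offset, which is why the two numbers are reported side by side.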

Alaska, the Great Lakes, the Nebraska Sand Hills, the far northeast of the US, the Southwest deserts, the Rocky Mountains, and southeast Florida all had a high coefficient of variation, reflecting low sample size.
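The coefficient of variation is just the standard deviation divided by the mean. A minimal sketch, assuming that each map cell has a set of repeated predictions to summarize (the thread does not say how the spread was generated, and the values below are invented):

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.fmean(values)


# Invented predictions for two grid cells with the same mean stock.
well_sampled_cell = [10.0, 11.0, 9.0, 10.0, 10.0]
sparse_cell = [2.0, 18.0, 5.0, 15.0, 10.0]

print("well sampled:", coefficient_of_variation(well_sampled_cell))
print("sparse:      ", coefficient_of_variation(sparse_cell))
```

Because it is scaled by the mean, the same absolute spread looks larger in low-carbon regions such as deserts than in carbon-rich ones.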

(I wonder if some of this might have been resolved with the Forest Service FIA database?) 12/n

As expected, we saw that different environmental drivers were explanatory in different regions. Spatial patterns were broadly consistent with prior work. 13/n

Key limitations in this study: ISCN profiles were collected over several decades and thus experienced different climates and climate histories; bulk density was frequently missing, so we were forced to use a pedotransfer function to gap-fill; and finally, downscaling the driving maps could also have introduced bias.

There are always limitations. 14/n

Overall this study moved beyond mean maps to capture regional patterns of SOC and their variation. 15/n