最近在做随机森林回归实验时,遇到了一个问题:

在做回归时,用的是excel点数据做的回归,像这样:

但是如何把这个训练好的模型应用到整幅含有地理信息的遥感影像上去,并保存这个预测图像呢?对这个问题,展开了一些思考,废话不多说,直接上代码。

1 随机森林回归模型构建

导入数据,这里把Merge_Data替换成自己的数据;

import pandas as pd

import numpy as np

from sklearn.ensemble import RandomForestRegressor

from sklearn.model_selection import train_test_split

from sklearn.metrics import mean_squared_error, r2_score,mean_absolute_error

from sklearn.preprocessing import StandardScaler

import matplotlib.pyplot as plt

file_path = r"Merge_Data.xlsx"

data = pd.read_excel(file_path)

# Separating features and label

X = data.iloc[:, :-1] # Assuming the last column is the label

y = data.iloc[:, -1]

# Splitting the dataset into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Feature scaling

scaler = StandardScaler()

X_train_scaled = scaler.fit_transform(X_train)

X_test_scaled = scaler.transform(X_test)

训练模型:

# 初始化模型

rf_model = RandomForestRegressor(n_estimators=100, random_state=42, n_jobs=-1)

# 训练模型

rf_model.fit(X_train_scaled, y_train)

# 预测

y_pred = rf_model.predict(X_test_scaled)

# 计算评估指标

mse = mean_squared_error(y_test, y_pred)

r2 = r2_score(y_test, y_pred)

mae = mean_absolute_error(y_test, y_pred)

results = {'Random Forest': {'MSE': mse, 'R2': r2, 'MAE': mae}}

# 绘图

plt.rcParams["font.family"] = "Times New Roman"

plt.figure(figsize=(5, 4.5))

plt.plot(range(0, 30), range(0, 30), color='black', alpha=0.7)

plt.scatter(y_test, y_pred, marker='.', s=1)

plt.title("Random Forest - Actual vs Predicted", fontsize=15)

plt.xlabel("Actual Values", fontsize=15)

plt.ylabel("Predicted Values", fontsize=15)

plt.tick_params(axis='both', which='major', labelsize=15)

filename = f"{file_path}Random_Forest_plot.png"

plt.tight_layout()

# plt.savefig(filename, dpi=600)

plt.show()

2 应用该模型到遥感影像上

这里首先利用rasterio打开图像;因为我这里是用了特定波段进行的预测,所以设置了一个selected_bands,其实是不需要这个的,全波段的话selected_bands设置为None;

之后创建一个空数组来存放预测结果,后利用rf_model来预测,最后重塑;

predicted_image就是预测结果,即完成了预测部分。

import rasterio

# 打开影像

selected_bands = None

with rasterio.open(r"\img.tif") as src:

# 读取指定的波段

img = src.read(selected_bands)

# 将影像数据转换为二维数组(像素数 x 波段数)

reshaped_img = img.transpose(1, 2, 0).reshape(-1, img.shape[0]))

# 创建一个空数组来存放预测结果,初始化为NaN

predictions = np.full(reshaped_img.shape[0], np.nan)

# 归一化

reshaped_img = scaler.transform(reshaped_img)

# 找出没有NaN值的行索引

valid_data_mask = ~np.isnan(reshaped_img).any(axis=1)

valid_data = reshaped_img[valid_data_mask]

# 批量预测,对所有有效数据一次性进行

predictions[valid_data_mask] = rf_model.predict(valid_data)

# 重塑预测结果以匹配原始影像的形状

predicted_image = predictions.reshape(src.height, src.width)

3 为预测影像添加地理信息

首先设置一些基本的输入和输出路径,这里有地理信息的栅格就是原始的遥感影像,需要读取原始图像的地理信息并将其应用到预测影像上,注释比较清楚,不多解释了。

from rasterio.transform import from_origin

# 路径设置

source_raster = r"\img3_CASK.tif" # 有地理信息的栅格

target_data = predicted_image # 没有地理信息的栅格

output_raster = r'output.tif' # 输出文件路径

# 读取有地理信息的栅格数据

with rasterio.open(source_raster) as src:

transform = src.transform

crs = src.crs

data = src.read(1)

# 创建一个新的栅格文件,应用地理信息

with rasterio.open(

output_raster, 'w', driver='GTiff', height=target_data.shape[0],

width=target_data.shape[1], count=1, dtype=target_data.dtype,

crs=crs, transform=transform

) as dst:

dst.write(target_data, 1)

参考阅读

评论可见,请评论后查看内容,谢谢!!!
 您阅读本篇文章共花了: