第一部分第二部分

本文是对学习笔记之——3D Gaussian Splatting源码解读的补充,并订正了一些错误。

目录

五、反向传播1. `rasterizer_impl.cu`: `CudaRasterizer::Rasterizer::backward`2. `backward.cu`: `renderCUDA`3. `backward.cu`: `preprocess`

五、反向传播

原文第6节:

During the backward pass, we must therefore recover the full sequence of blended points per-pixel in the forward pass. One solution would be to store arbitrarily long lists of blended points per-pixel in global memory [Kopanas et al. 2021]. To avoid the implied dynamic memory management overhead, we instead choose to traverse the per-tile lists again; we can reuse the sorted array of Gaussians and tile ranges from the forward pass. To facilitate gradient computation, we now traverse them back-to-front.

The traversal starts from the last point that affected any pixel in the tile, and loading of points into shared memory again happens collaboratively. Additionally, each pixel will only start (expensive) overlap testing and processing of points if their depth is lower than or equal to the depth of the last point that contributed to its color during the forward pass. Computation of the gradients described in Sec. 4 requires the accumulated opacity values at each step during the original blending process. Rather than trasversing an explicit list of progressively shrinking opacities in the backward pass, we can recover these intermediate opacities by storing only the total accumulated opacity at the end of the forward pass. Specifically, each point stores the final accumulated opacity 훼 in the forward process; we divide this by each point’s 훼 in our back-to-front traversal to obtain the required coefficients for gradient computation.

反向传播是根据loss对每个像素颜色的导求出loss对Gaussian的2维中心坐标、3维中心坐标、椭圆二次型矩阵、不透明度、相机看到的Gaussian颜色、球谐系数、三维协方差矩阵、缩放和旋转的导数。

代码中出现的类似于dL_dx的变量的意思是loss对参数x的导数,即

L

x

\frac{\partial L}{\partial x}

∂x∂L​。

1. rasterizer_impl.cu: CudaRasterizer::Rasterizer::backward

// Produce necessary gradients for optimization, corresponding

// to forward render pass

void CudaRasterizer::Rasterizer::backward(

const int P, // Gaussian个数

int D, // 球谐度数

int M, // 三通道球谐系数个数

int R,

// R对应CudaRasterizer::Rasterizer::forward函数中的num_rendered

// 即排序数组的个数(等于每个Gaussian覆盖的tile的个数之和)

const float* background, // 背景颜色

const int width, int height,

const float* means3D,

const float* shs,

const float* colors_precomp,

const float* scales,

const float scale_modifier,

const float* rotations,

const float* cov3D_precomp,

const float* viewmatrix,

const float* projmatrix,

const float* campos,

const float tan_fovx, float tan_fovy,

const int* radii,

char* geom_buffer,

char* binning_buffer,

char* img_buffer,

// 上面的参数不解释

const float* dL_dpix, // loss对每个像素颜色的导数

float* dL_dmean2D, // loss对Gaussian二维中心坐标的导数

float* dL_dconic, // loss对椭圆二次型矩阵的导数

float* dL_dopacity, // loss对不透明度的导数

float* dL_dcolor, // loss对Gaussian颜色的导数(颜色是从相机中心看向Gaussian的颜色)

float* dL_dmean3D, // loss对Gaussian三维中心坐标的导数

float* dL_dcov3D, // loss对Gaussian三维协方差矩阵的导数

float* dL_dsh, // loss对Gaussian的球谐系数的导数

float* dL_dscale, // loss对Gaussian的缩放参数的导数

float* dL_drot, // loss对Gaussian旋转四元数的导数

bool debug)

{

GeometryState geomState = GeometryState::fromChunk(geom_buffer, P);

BinningState binningState = BinningState::fromChunk(binning_buffer, R);

ImageState imgState = ImageState::fromChunk(img_buffer, width * height);

// 上面这些缓冲区都是在前向传播的时候存下来的,现在拿出来用

if (radii == nullptr)

{

radii = geomState.internal_radii;

}

const float focal_y = height / (2.0f * tan_fovy);

const float focal_x = width / (2.0f * tan_fovx);

const dim3 tile_grid((width + BLOCK_X - 1) / BLOCK_X, (height + BLOCK_Y - 1) / BLOCK_Y, 1);

const dim3 block(BLOCK_X, BLOCK_Y, 1);

// Compute loss gradients w.r.t. 2D mean position, conic matrix,

// opacity and RGB of Gaussians from per-pixel loss gradients.

// If we were given precomputed colors and not SHs, use them.

const float* color_ptr = (colors_precomp != nullptr) ? colors_precomp : geomState.rgb;

CHECK_CUDA(BACKWARD::render(

tile_grid,

block,

imgState.ranges,

binningState.point_list,

width, height,

background,

geomState.means2D,

geomState.conic_opacity,

color_ptr,

imgState.accum_alpha,

imgState.n_contrib,

dL_dpix,

(float3*)dL_dmean2D,

(float4*)dL_dconic,

dL_dopacity,

dL_dcolor), debug)

// Take care of the rest of preprocessing. Was the precomputed covariance

// given to us or a scales/rot pair? If precomputed, pass that. If not,

// use the one we computed ourselves.

const float* cov3D_ptr = (cov3D_precomp != nullptr) ? cov3D_precomp : geomState.cov3D;

// 因为是反向传播,所以preprocess放在后面了(❁´◡`❁)

CHECK_CUDA(BACKWARD::preprocess(P, D, M,

(float3*)means3D,

radii,

shs,

geomState.clamped,

(glm::vec3*)scales,

(glm::vec4*)rotations,

scale_modifier,

cov3D_ptr,

viewmatrix,

projmatrix,

focal_x, focal_y,

tan_fovx, tan_fovy,

(glm::vec3*)campos,

(float3*)dL_dmean2D,

dL_dconic,

(glm::vec3*)dL_dmean3D,

dL_dcolor,

dL_dcov3D,

dL_dsh,

(glm::vec3*)dL_dscale,

(glm::vec4*)dL_drot), debug)

}

2. backward.cu: renderCUDA

这个函数中每个线程负责一个像素,计算loss对Gaussian的二维中心坐标、椭圆二次型矩阵、不透明度和Gaussian颜色的导。

这里求导利用的公式是颜色的递推公式,与前向传播有所不同。设

b

i

\boldsymbol{b}_i

bi​是第

i

i

i个即其后的Gaussians渲染出来的颜色(三个通道),

g

i

\boldsymbol{g}_i

gi​是第

i

i

i个Gaussian的颜色,则有递推公式:

b

i

=

α

i

g

i

+

(

1

α

i

)

b

i

+

1

\boldsymbol{b}_i=\alpha_i \boldsymbol{g}_i+(1-\alpha_i)\boldsymbol{b}_{i+1}

bi​=αi​gi​+(1−αi​)bi+1​令

T

i

=

(

1

α

1

)

(

1

α

2

)

(

1

α

i

1

)

T_i=(1-\alpha_1)(1-\alpha_2)\cdots(1-\alpha_{i-1})

Ti​=(1−α1​)(1−α2​)⋯(1−αi−1​)为第

i

i

i个Gaussian对像素点的透光率,

C

\boldsymbol{C}

C是像素点的颜色,则有

L

b

i

=

L

C

b

1

b

2

b

2

b

3

b

i

1

b

i

=

L

C

(

1

α

1

)

I

(

1

α

2

)

I

(

1

α

i

1

)

I

=

T

i

L

C

\begin{aligned} \cfrac{\partial L}{\partial\boldsymbol{b}_i}&=\cfrac{\partial L}{\partial\boldsymbol{C}}\cdot\cfrac{\partial \boldsymbol{b}_1}{\partial\boldsymbol{b}_2}\cdot\cfrac{\partial \boldsymbol{b}_2}{\partial\boldsymbol{b}_3}\cdots\cfrac{\partial \boldsymbol{b}_{i-1}}{\partial\boldsymbol{b}_i}\\ &=\cfrac{\partial L}{\partial\boldsymbol{C}}\cdot(1-\alpha_1)I\cdot(1-\alpha_2)I\cdots(1-\alpha_{i-1})I\\ &=T_i\cfrac{\partial L}{\partial\boldsymbol{C}}\\ \end{aligned}

∂bi​∂L​​=∂C∂L​⋅∂b2​∂b1​​⋅∂b3​∂b2​​⋯∂bi​∂bi−1​​=∂C∂L​⋅(1−α1​)I⋅(1−α2​)I⋯(1−αi−1​)I=Ti​∂C∂L​​故

L

α

i

=

T

i

L

C

(

g

i

b

i

+

1

)

L

g

i

=

T

i

α

i

L

C

\cfrac{\partial L}{\partial \alpha_i}=T_i\cfrac{\partial L}{\partial\boldsymbol{C}}(\boldsymbol{g}_i-\boldsymbol{b}_{i+1})\\ \cfrac{\partial L}{\partial \boldsymbol{g}_i}=T_i\alpha_i\cfrac{\partial L}{\partial\boldsymbol{C}}

∂αi​∂L​=Ti​∂C∂L​(gi​−bi+1​)∂gi​∂L​=Ti​αi​∂C∂L​令

α

=

min

(

0.99

,

σ

G

)

\alpha=\min(0.99,\sigma G)

α=min(0.99,σG),其中

σ

\sigma

σ是“opacity”,

G

G

G是正态分布给出的“exponential falloff”,则

L

σ

=

G

L

α

 

L

G

=

σ

L

α

\frac{\partial L}{\partial \sigma}=G\frac{\partial L}{\partial \alpha}\\\ \\ \frac{\partial L}{\partial G}=\sigma\frac{\partial L}{\partial \alpha}

∂σ∂L​=G∂α∂L​ ∂G∂L​=σ∂α∂L​(注:代码中似乎不考虑

α

>

0.99

\alpha>0.99

α>0.99的情况;也许是因为想让

σ

\sigma

σ过大时也有梯度从而变小?) 而

G

=

exp

{

1

2

d

T

A

d

}

G=\exp\{-\frac{1}{2}\boldsymbol{d}^T A\boldsymbol{d}\}

G=exp{−21​dTAd},

d

=

p

μ

\boldsymbol{d}=\boldsymbol{p}-\boldsymbol\mu

d=p−μ,其中

p

\boldsymbol p

p为像素坐标,

μ

\boldsymbol\mu

μ为Gausian中心的2D坐标,

d

\boldsymbol d

d为它们之间的位移向量,

A

A

A为椭圆二次型的矩阵,因此

L

d

=

1

2

G

L

G

[

d

T

(

A

T

+

A

)

]

L

μ

=

L

d

L

A

00

=

1

2

G

L

G

d

x

2

L

A

11

=

1

2

G

L

G

d

y

2

L

A

01

=

1

2

G

L

G

d

x

d

y

\frac{\partial L}{\partial \boldsymbol d}=-\cfrac{1}{2}G\frac{\partial L}{\partial G}[\boldsymbol d^T(A^T+A)]\\ \cfrac{\partial L}{\partial \boldsymbol \mu}=-\cfrac{\partial L}{\partial \boldsymbol d}\\ \frac{\partial L}{\partial \boldsymbol A_{00}}=-\cfrac{1}{2}G\frac{\partial L}{\partial G}d_x^2\\ \frac{\partial L}{\partial \boldsymbol A_{11}}=-\cfrac{1}{2}G\frac{\partial L}{\partial G}d_y^2\\ \frac{\partial L}{\partial \boldsymbol A_{01}}=-\cfrac{1}{2}G\frac{\partial L}{\partial G}d_x d_y

∂d∂L​=−21​G∂G∂L​[dT(AT+A)]∂μ∂L​=−∂d∂L​∂A00​∂L​=−21​G∂G∂L​dx2​∂A11​∂L​=−21​G∂G∂L​dy2​∂A01​∂L​=−21​G∂G∂L​dx​dy​注意计算

L

μ

\frac{\partial L}{\partial \boldsymbol \mu}

∂μ∂L​时要乘以像素坐标对像平面坐标的导。

公式中出现的变量和代码中变量的关系如下表:

公式变量代码变量

b

i

+

1

\boldsymbol{b}_{i+1}

bi+1​accum_rec

g

i

\boldsymbol{g}_i

gi​c

T

i

T_i

Ti​T

L

C

\cfrac{\partial L}{\partial\boldsymbol{C}}

∂C∂L​dL_dpixel

σ

i

\sigma_i

σi​con_o.w

d

\boldsymbol{d}

dd

μ

\boldsymbol\mu

μxy

p

\boldsymbol p

ppixf

A

00

,

A

01

,

A

11

A_{00},A_{01},A_{11}

A00​,A01​,A11​con_o.x,con_o.y,con_o.z

// Backward version of the rendering procedure.

template

__global__ void __launch_bounds__(BLOCK_X * BLOCK_Y)

renderCUDA(

const uint2* __restrict__ ranges,

const uint32_t* __restrict__ point_list,

int W, int H,

const float* __restrict__ bg_color,

const float2* __restrict__ points_xy_image,

const float4* __restrict__ conic_opacity,

const float* __restrict__ colors,

const float* __restrict__ final_Ts,

const uint32_t* __restrict__ n_contrib,

const float* __restrict__ dL_dpixels,

float3* __restrict__ dL_dmean2D,

float4* __restrict__ dL_dconic2D,

float* __restrict__ dL_dopacity,

float* __restrict__ dL_dcolors)

{

// We rasterize again. Compute necessary block info.

auto block = cg::this_thread_block();

const uint32_t horizontal_blocks = (W + BLOCK_X - 1) / BLOCK_X;

const uint2 pix_min = { block.group_index().x * BLOCK_X, block.group_index().y * BLOCK_Y };

const uint2 pix_max = { min(pix_min.x + BLOCK_X, W), min(pix_min.y + BLOCK_Y , H) };

const uint2 pix = { pix_min.x + block.thread_index().x, pix_min.y + block.thread_index().y };

const uint32_t pix_id = W * pix.y + pix.x;

const float2 pixf = { (float)pix.x, (float)pix.y };

// 上面这些就是当前线程负责的像素点的信息

const bool inside = pix.x < W&& pix.y < H;

const uint2 range = ranges[block.group_index().y * horizontal_blocks + block.group_index().x];

const int rounds = ((range.y - range.x + BLOCK_SIZE - 1) / BLOCK_SIZE);

// 轮数,和前向传播一样

bool done = !inside;

int toDo = range.y - range.x;

__shared__ int collected_id[BLOCK_SIZE];

__shared__ float2 collected_xy[BLOCK_SIZE];

__shared__ float4 collected_conic_opacity[BLOCK_SIZE];

__shared__ float collected_colors[C * BLOCK_SIZE];

// In the forward, we stored the final value for T, the

// product of all (1 - alpha) factors.

const float T_final = inside ? final_Ts[pix_id] : 0;

float T = T_final; // 最后的透光度

// We start from the back. The ID of the last contributing

// Gaussian is known from each pixel from the forward.

uint32_t contributor = toDo;

const int last_contributor = inside ? n_contrib[pix_id] : 0;

float accum_rec[C] = { 0 };

float dL_dpixel[C]; // loss对该像素RGB通道的导数

if (inside)

for (int i = 0; i < C; i++)

dL_dpixel[i] = dL_dpixels[i * H * W + pix_id];

float last_alpha = 0;

float last_color[C] = { 0 };

// Gradient of pixel coordinate w.r.t. normalized

// screen-space viewport corrdinates (-1 to 1)

// 像素坐标对像平面坐标的导(参见ndc2Pix函数)

const float ddelx_dx = 0.5 * W;

const float ddely_dy = 0.5 * H;

// Traverse all Gaussians

for (int i = 0; i < rounds; i++, toDo -= BLOCK_SIZE)

{

// Load auxiliary data into shared memory, start in the BACK

// and load them in revers order.

block.sync();

const int progress = i * BLOCK_SIZE + block.thread_rank();

if (range.x + progress < range.y)

{

const int coll_id = point_list[range.y - progress - 1];

collected_id[block.thread_rank()] = coll_id;

collected_xy[block.thread_rank()] = points_xy_image[coll_id];

collected_conic_opacity[block.thread_rank()] = conic_opacity[coll_id];

for (int i = 0; i < C; i++)

collected_colors[i * BLOCK_SIZE + block.thread_rank()] = colors[coll_id * C + i];

}

block.sync();

// Iterate over Gaussians

for (int j = 0; !done && j < min(BLOCK_SIZE, toDo); j++)

{

// Keep track of current Gaussian ID. Skip, if this one

// is behind the last contributor for this pixel.

contributor--;

if (contributor >= last_contributor)

continue;

// 之所以不直接把contributor设置成last_contributor,

// 是因为排在我的last_contributor后面的Gaussian可能

// 对于其他像素来说是可见的

// Compute blending values, as before.

const float2 xy = collected_xy[j];

const float2 d = { xy.x - pixf.x, xy.y - pixf.y };

const float4 con_o = collected_conic_opacity[j];

const float power = -0.5f * (con_o.x * d.x * d.x + con_o.z * d.y * d.y) - con_o.y * d.x * d.y;

if (power > 0.0f)

continue;

const float G = exp(power);

const float alpha = min(0.99f, con_o.w * G); // 先把alpha算出来

if (alpha < 1.0f / 255.0f)

continue;

T = T / (1.f - alpha); // 一点一点把T除掉(T原本是累乘)

const float dchannel_dcolor = alpha * T;

// Propagate gradients to per-Gaussian colors and keep

// gradients w.r.t. alpha (blending factor for a Gaussian/pixel

// pair).

float dL_dalpha = 0.0f;

const int global_id = collected_id[j];

for (int ch = 0; ch < C; ch++)

{

const float c = collected_colors[ch * BLOCK_SIZE + j];

// Gaussian的color

// Update last color (to be used in the next iteration)

accum_rec[ch] = last_alpha * last_color[ch] + (1.f - last_alpha) * accum_rec[ch];

// 更新我以及后面所有Gaussian的颜色贡献

last_color[ch] = c; // 我后面的Gaussian的颜色

const float dL_dchannel = dL_dpixel[ch];

dL_dalpha += (c - accum_rec[ch]) * dL_dchannel;

// Update the gradients w.r.t. color of the Gaussian.

// Atomic, since this pixel is just one of potentially

// many that were affected by this Gaussian.

atomicAdd(&(dL_dcolors[global_id * C + ch]), dchannel_dcolor * dL_dchannel);

}

dL_dalpha *= T;

// Update last alpha (to be used in the next iteration)

last_alpha = alpha;

// Account for fact that alpha also influences how much of

// the background color is added if nothing left to blend

float bg_dot_dpixel = 0;

for (int i = 0; i < C; i++)

bg_dot_dpixel += bg_color[i] * dL_dpixel[i];

dL_dalpha += (-T_final / (1.f - alpha)) * bg_dot_dpixel;

// Helpful reusable temporary variables

const float dL_dG = con_o.w * dL_dalpha;

const float gdx = G * d.x;

const float gdy = G * d.y;

const float dG_ddelx = -gdx * con_o.x - gdy * con_o.y;

const float dG_ddely = -gdy * con_o.z - gdx * con_o.y;

// 下面累积这些参数的梯度

// Update gradients w.r.t. 2D mean position of the Gaussian

atomicAdd(&dL_dmean2D[global_id].x, dL_dG * dG_ddelx * ddelx_dx);

atomicAdd(&dL_dmean2D[global_id].y, dL_dG * dG_ddely * ddely_dy);

// Update gradients w.r.t. 2D covariance (2x2 matrix, symmetric)

atomicAdd(&dL_dconic2D[global_id].x, -0.5f * gdx * d.x * dL_dG);

atomicAdd(&dL_dconic2D[global_id].y, -0.5f * gdx * d.y * dL_dG);

atomicAdd(&dL_dconic2D[global_id].w, -0.5f * gdy * d.y * dL_dG);

// Update gradients w.r.t. opacity of the Gaussian

atomicAdd(&(dL_dopacity[global_id]), G * dL_dalpha);

}

}

}

3. backward.cu: preprocess

这个函数主要干了四件事:

从Gaussian中心在像平面上的二维坐标的梯度计算其三维坐标的梯度;从Gaussian颜色的梯度计算球谐系数的梯度(利用computeColorFromSH函数);从Gaussian在像平面上投影的2D椭圆二次型矩阵的梯度计算其3D协方差矩阵梯度;从协方差矩阵梯度计算缩放和旋转参数的梯度。

该函数及其调用的函数的代码我就不贴了,都是一些纷繁复杂的数学运算。

至此,我们大体上读完了Gaussian Splatting的代码。

完结撒花!!​​

相关文章

评论可见,请评论后查看内容,谢谢!!!
 您阅读本篇文章共花了: