$$\int \frac{1}{\sqrt{2\pi}\,\sigma_1} e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}} \cdot \frac{(x-\mu_2)^2}{2\sigma_2^2}\,dx = \frac{\sigma_1^2 + \mu_1^2 - 2\mu_1\mu_2 + \mu_2^2}{2\sigma_2^2} = \frac{\sigma_1^2 + (\mu_1-\mu_2)^2}{2\sigma_2^2}$$
Collecting the expressions above:
$$\begin{aligned} \mathrm{KL}\left(\mathcal{N}\left(\mu_{1}, \sigma_{1}^{2}\right) \,\middle\|\, \mathcal{N}\left(\mu_{2}, \sigma_{2}^{2}\right)\right) &= \log\frac{\sigma_2}{\sigma_1} - \frac{1}{2} + \frac{\sigma_1^2 + (\mu_1-\mu_2)^2}{2\sigma_2^2} \\ &= \frac{1}{2}\left(\sigma_1^2 + \mu_1^2 - \log\sigma_1^2 - 1\right) \end{aligned}$$

where the second line specializes to the standard-normal prior, i.e. $\mu_2 = 0$ and $\sigma_2^2 = 1$.
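As a quick sanity check, the general closed form and its standard-normal special case can be compared numerically. This is a minimal pure-Python sketch; the values μ1 = 1 and σ1 = 0.5 are arbitrary.

```python
import math

def kl_gauss(mu1, sigma1, mu2, sigma2):
    # General closed form: log(s2/s1) - 1/2 + (s1^2 + (mu1 - mu2)^2) / (2 s2^2)
    return math.log(sigma2 / sigma1) - 0.5 + (sigma1**2 + (mu1 - mu2)**2) / (2 * sigma2**2)

def kl_vs_std_normal(mu1, sigma1):
    # Simplified form used in the VAE loss (prior = N(0, 1))
    return 0.5 * (sigma1**2 + mu1**2 - math.log(sigma1**2) - 1)

mu1, sigma1 = 1.0, 0.5
print(kl_gauss(mu1, sigma1, 0.0, 1.0))  # ≈ 0.8181
print(kl_vs_std_normal(mu1, sigma1))    # same value
```

The two functions agree exactly when μ2 = 0 and σ2 = 1, which is why the VAE loss can use the shorter form.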
Code
Loss function
From the derivation above, the model must minimize the KL divergence while maximizing the expected reconstruction log-likelihood, which yields the following loss function.
```python
import torch

def loss_fn(recon_x, x, mean, log_var):
    # Reconstruction term: pixel-wise binary cross-entropy, summed over the batch
    BCE = torch.nn.functional.binary_cross_entropy(
        recon_x.view(-1, 28*28), x.view(-1, 28*28), reduction='sum')
    # Analytic KL(N(mean, var) || N(0, 1)) = -1/2 * sum(1 + log_var - mean^2 - var)
    KLD = -0.5 * torch.sum(1 + log_var - mean.pow(2) - log_var.exp())
    return (BCE + KLD) / x.size(0)
```
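A quick smoke test of the loss on dummy data (a sketch, with `loss_fn` reproduced from above so the snippet runs on its own; the random tensors stand in for MNIST batches and must lie in [0, 1] for binary cross-entropy):

```python
import torch

def loss_fn(recon_x, x, mean, log_var):
    BCE = torch.nn.functional.binary_cross_entropy(
        recon_x.view(-1, 28*28), x.view(-1, 28*28), reduction='sum')
    KLD = -0.5 * torch.sum(1 + log_var - mean.pow(2) - log_var.exp())
    return (BCE + KLD) / x.size(0)

torch.manual_seed(0)
x = torch.rand(4, 28 * 28)        # fake "images" in [0, 1]
recon_x = torch.rand(4, 28 * 28)  # fake reconstructions in [0, 1]
mean = torch.zeros(4, 2)
log_var = torch.zeros(4, 2)       # mean 0, variance 1 => KL term is exactly 0
loss = loss_fn(recon_x, x, mean, log_var)
print(loss.item())                # a positive scalar (pure BCE here)
```

With mean = 0 and log_var = 0 the posterior already equals the prior, so only the reconstruction term contributes.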
Encoder
```python
class Encoder(nn.Module):
    def __init__(self, layer_sizes, latent_size):
        super(Encoder, self).__init__()
        self.MLP = nn.Sequential()
        for i, (in_size, out_size) in enumerate(zip(layer_sizes[:-1], layer_sizes[1:])):
            self.MLP.add_module(name="L{:d}".format(i), module=nn.Linear(in_size, out_size))
            self.MLP.add_module(name="A{:d}".format(i), module=nn.ReLU())
        # After the MLP transforms the flattened image features, two fully
        # connected heads produce the mean and log-variance of q(z|x)
        self.linear_means = nn.Linear(layer_sizes[-1], latent_size)
        self.linear_log_var = nn.Linear(layer_sizes[-1], latent_size)

    def forward(self, x):
        x = self.MLP(x)
        means = self.linear_means(x)
        log_vars = self.linear_log_var(x)
        return means, log_vars
```
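A quick shape check, assuming hypothetical sizes layer_sizes=[784, 256] and latent_size=2 (the Encoder class is repeated from above so the snippet is self-contained): a batch of flattened 28×28 images maps to two [batch, 2] tensors.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, layer_sizes, latent_size):
        super(Encoder, self).__init__()
        self.MLP = nn.Sequential()
        for i, (in_size, out_size) in enumerate(zip(layer_sizes[:-1], layer_sizes[1:])):
            self.MLP.add_module(name="L{:d}".format(i), module=nn.Linear(in_size, out_size))
            self.MLP.add_module(name="A{:d}".format(i), module=nn.ReLU())
        self.linear_means = nn.Linear(layer_sizes[-1], latent_size)
        self.linear_log_var = nn.Linear(layer_sizes[-1], latent_size)

    def forward(self, x):
        x = self.MLP(x)
        return self.linear_means(x), self.linear_log_var(x)

enc = Encoder([784, 256], latent_size=2)
means, log_vars = enc(torch.rand(4, 784))
print(means.shape, log_vars.shape)  # torch.Size([4, 2]) torch.Size([4, 2])
```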
Decoder
```python
class Decoder(nn.Module):
    def __init__(self, layer_sizes, latent_size):
        super(Decoder, self).__init__()
        self.MLP = nn.Sequential()
        input_size = latent_size
        for i, (in_size, out_size) in enumerate(zip([input_size] + layer_sizes[:-1], layer_sizes)):
            self.MLP.add_module(
                name="L{:d}".format(i), module=nn.Linear(in_size, out_size))
            if i + 1 < len(layer_sizes):
                self.MLP.add_module(name="A{:d}".format(i), module=nn.ReLU())
            else:
                # Final Sigmoid keeps pixel values in (0, 1) for the BCE loss
                self.MLP.add_module(name="sigmoid", module=nn.Sigmoid())

    def forward(self, z):
        # Pass the latent code z through the fully connected layers to
        # produce the reconstructed x
        x = self.MLP(z)
        return x
```
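Because the last activation is a Sigmoid, every decoder output lands in [0, 1], matching the range binary cross-entropy expects. A self-contained check (the Decoder class is repeated from above; layer_sizes=[256, 784] and latent_size=2 are illustrative):

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    def __init__(self, layer_sizes, latent_size):
        super(Decoder, self).__init__()
        self.MLP = nn.Sequential()
        for i, (in_size, out_size) in enumerate(zip([latent_size] + layer_sizes[:-1], layer_sizes)):
            self.MLP.add_module(name="L{:d}".format(i), module=nn.Linear(in_size, out_size))
            if i + 1 < len(layer_sizes):
                self.MLP.add_module(name="A{:d}".format(i), module=nn.ReLU())
            else:
                self.MLP.add_module(name="sigmoid", module=nn.Sigmoid())

    def forward(self, z):
        return self.MLP(z)

dec = Decoder([256, 784], latent_size=2)
x = dec(torch.randn(4, 2))
print(tuple(x.shape))  # (4, 784), all values within [0, 1]
```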
Overall VAE architecture
```python
class VAE(nn.Module):
    def __init__(self, encoder_layer_sizes, latent_size, decoder_layer_sizes):
        super(VAE, self).__init__()
        self.latent_size = latent_size
        self.encoder = Encoder(encoder_layer_sizes, latent_size)
        self.decoder = Decoder(decoder_layer_sizes, latent_size)

    def forward(self, x):
        if x.dim() > 2:
            x = x.view(-1, 28 * 28)
        means, log_var = self.encoder(x)
        z = self.reparameterize(means, log_var)
        recon_x = self.decoder(z)
        return recon_x, means, log_var, z

    def reparameterize(self, mu, log_var):
        """
        Reparameterize the mean and log-variance output by the encoder and
        sample the latent representation z.
        :param mu: mean of q(z|x)
        :param log_var: log-variance of q(z|x)
        :return: sampled latent code z
        """
        std = torch.exp(0.5 * log_var)
        eps = torch.randn_like(std)
        return mu + eps * std

    def inference(self, z):
        # Generate from a given latent code without running the encoder
        recon_x = self.decoder(z)
        return recon_x
```
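The reparameterization trick can be checked in isolation: for fixed μ and log σ², samples μ + ε·σ with ε ~ N(0, 1) should have roughly the requested mean and standard deviation. A minimal sketch (μ = 3 and σ² = 4 are arbitrary test values):

```python
import math
import torch

def reparameterize(mu, log_var):
    std = torch.exp(0.5 * log_var)  # sigma = exp(log(sigma^2) / 2)
    eps = torch.randn_like(std)     # the noise is the only stochastic part
    return mu + eps * std           # gradients can flow through mu and std

torch.manual_seed(0)
mu = torch.full((100000,), 3.0)
log_var = torch.full((100000,), math.log(4.0))  # sigma^2 = 4, so sigma = 2
z = reparameterize(mu, log_var)
print(float(z.mean()), float(z.std()))  # ≈ 3.0 and ≈ 2.0
```

Moving the randomness into ε is what makes z differentiable with respect to the encoder outputs, so the whole model can be trained end to end.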
Problems with VAE
When generating images, a VAE is judged only by pixel-wise differences. An image that is wrong at a perceptually critical pixel and one that is wrong at a negligible pixel look identical to the VAE, even though only one of them is an acceptable generation. This weakness is one motivation for GANs.
References
ML Lecture 18: Unsupervised Learning – Deep Generative Model (Part II)
Original article: https://blog.csdn.net/qq_49729636/article/details/134671076