CS231n Regularization
Regularization lecture notes from the Stanford course
Observe how L1/L2 regularization prevents overfitting: increasing the polynomial degree causes overfitting, while increasing λ suppresses it.
💡 Purple dots = training data; dashed line = the true function sin(πx)
One-sentence definition: regularization is a family of techniques that constrain model complexity, to keep a model from fitting the training data too well (overfitting) and then performing poorly on new data (failing to generalize).
Common regularization methods (L1/L2 weight penalties, Dropout, early stopping, and data augmentation) are each covered below. First, the fitting regimes they guard against:
| | Underfitting | Just right | Overfitting |
|---|---|---|---|
| Bias | High | Low | Low |
| Variance | Low | Low | High |
| Training error | High | Low | Very low |
| Test error | High | Low | High |
The goal of regularization: find the balance point between bias and variance that minimizes test error.
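To make the table concrete, here is a minimal sketch (pure numpy; the data, degree, and λ values are illustrative assumptions, not from the demo above) that fits a high-degree polynomial to noisy samples of sin(πx) with a closed-form L2 penalty. With λ = 0 the fit chases the noise; a moderate λ typically pulls the test error back down:

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = rng.uniform(-1, 1, 12)
y_train = np.sin(np.pi * x_train) + 0.1 * rng.standard_normal(12)
x_test = np.linspace(-1, 1, 200)
y_test = np.sin(np.pi * x_test)

def ridge_polyfit(x, y, degree, lam):
    # Polynomial design matrix, then the closed-form ridge solution:
    # w = (X^T X + lam * I)^(-1) X^T y
    X = np.vander(x, degree + 1)
    return np.linalg.solve(X.T @ X + lam * np.eye(degree + 1), X.T @ y)

for lam in [0.0, 1e-4, 1e-1]:
    w = ridge_polyfit(x_train, y_train, degree=9, lam=lam)
    pred = np.vander(x_test, 10) @ w
    print(f"lambda={lam:g}  test MSE={np.mean((pred - y_test) ** 2):.4f}")
```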
L2 regularization (Ridge) adds the sum of squared weights to the loss function, where $L_0$ is the unregularized data loss:

$$L = L_0 + \frac{\lambda}{2} \sum_i w_i^2$$
L1 regularization (Lasso) uses the sum of absolute values of the weights instead:

$$L = L_0 + \lambda \sum_i \lvert w_i \rvert$$
Dropout randomly sets each neuron's output to 0 with probability $p$ during training.
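The mechanism fits in a few lines of numpy. This is a sketch of the "inverted dropout" formulation (not a drop-in layer implementation): survivors are scaled by 1/(1−p) at training time, so the expected activation is unchanged and no rescaling is needed at test time:

```python
import numpy as np

def dropout_forward(x, p=0.5, train=True):
    # Identity at test time: no mask, no rescaling
    if not train:
        return x
    # Zero each activation with probability p, scale survivors by 1/(1-p)
    mask = (np.random.rand(*x.shape) >= p) / (1.0 - p)
    return x * mask
```

Scaling at training time keeps the test-time path a plain identity, which is why `model.eval()` needs no extra work in the PyTorch example later.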
Comparing the two weight penalties:

| Property | L1 (Lasso) | L2 (Ridge) |
|---|---|---|
| Penalty term | $\lambda \sum_i \lvert w_i \rvert$ | $\lambda \sum_i w_i^2$ |
| Weight distribution | Sparse (many exact zeros) | Small and spread out |
| Feature selection | ✅ automatic | ❌ |
| Computation | Non-differentiable at 0, needs care | Differentiable everywhere |
| Typical use | High-dimensional sparse data | General purpose |
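The "sparse vs. small and spread out" row can be checked empirically. Below is a hedged sketch (synthetic data; λ, learning rate, and the zero threshold are illustrative assumptions) that trains the same linear model under each penalty by subgradient descent and counts near-zero weights; the L1 run typically zeros out the 17 uninformative features, while L2 merely shrinks them:

```python
import torch

torch.manual_seed(0)
X = torch.randn(200, 20)
true_w = torch.zeros(20)
true_w[:3] = torch.tensor([2.0, -1.0, 0.5])   # only 3 informative features
y = X @ true_w + 0.5 * torch.randn(200)

def fit(penalty, lam=0.05, steps=2000):
    w = torch.zeros(20, requires_grad=True)
    opt = torch.optim.SGD([w], lr=0.05)
    for _ in range(steps):
        opt.zero_grad()
        mse = ((X @ w - y) ** 2).mean()
        reg = w.abs().sum() if penalty == "l1" else (w ** 2).sum()
        (mse + lam * reg).backward()
        opt.step()
    return w.detach()

for penalty in ("l1", "l2"):
    w = fit(penalty)
    print(penalty, "near-zero weights:", int((w.abs() < 1e-2).sum()))
```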
PyTorch implementations of each method:

```python
import torch
import torch.nn as nn

# ===== L2 regularization (weight decay) =====
# Built into PyTorch optimizers
model = nn.Linear(10, 1)
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=0.001,
    weight_decay=0.01,  # L2 regularization strength
)
```
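Design note: with `torch.optim.Adam`, `weight_decay` is applied as a classic L2 penalty folded into the gradient, so it interacts with Adam's adaptive step sizes. `torch.optim.AdamW` decouples the decay from the gradient update and is often the better default; the swap is one line (same illustrative hyperparameters as above):

```python
optimizer = torch.optim.AdamW(model.parameters(), lr=0.001, weight_decay=0.01)
```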
```python
# ===== L1 regularization (added manually) =====
def l1_regularization(model, lambda_l1=0.001):
    l1_loss = 0
    for param in model.parameters():
        l1_loss += torch.abs(param).sum()
    return lambda_l1 * l1_loss

# In the training loop
loss = criterion(output, target)
loss += l1_regularization(model)  # add the L1 term
loss.backward()
```
```python
# ===== Dropout =====
class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.dropout = nn.Dropout(p=0.5)  # 50% drop probability
        self.fc2 = nn.Linear(256, 10)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.dropout(x)  # active in training, disabled in eval mode
        x = self.fc2(x)
        return x

model = MyModel()
model.train()  # Dropout active
model.eval()   # Dropout disabled
```
```python
# ===== Early stopping =====
class EarlyStopping:
    def __init__(self, patience=5, min_delta=0):
        self.patience = patience
        self.min_delta = min_delta
        self.counter = 0
        self.best_loss = float('inf')
        self.should_stop = False

    def __call__(self, val_loss):
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.counter = 0
        else:
            self.counter += 1
            if self.counter >= self.patience:
                self.should_stop = True
```
```python
# Usage
early_stop = EarlyStopping(patience=5)
for epoch in range(100):
    train_loss = train_one_epoch()
    val_loss = validate()
    early_stop(val_loss)
    if early_stop.should_stop:
        print(f"Early stopping at epoch {epoch}")
        break
```
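Early stopping is usually paired with checkpointing, so training can roll back to the best epoch rather than the last one. A minimal sketch reusing the `EarlyStopping` class above (`train_one_epoch`, `validate`, and `model` are assumed as in the previous snippet):

```python
import copy

early_stop = EarlyStopping(patience=5)
best_state = None
for epoch in range(100):
    train_one_epoch()
    val_loss = validate()
    # Snapshot before calling early_stop, which updates best_loss
    if val_loss < early_stop.best_loss - early_stop.min_delta:
        best_state = copy.deepcopy(model.state_dict())
    early_stop(val_loss)
    if early_stop.should_stop:
        break
if best_state is not None:
    model.load_state_dict(best_state)  # roll back to the best epoch
```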
```python
import torchvision.transforms as T

# Image data augmentation
train_transform = T.Compose([
    T.RandomHorizontalFlip(p=0.5),                # horizontal flip
    T.RandomRotation(degrees=15),                 # random rotation
    T.RandomResizedCrop(224, scale=(0.8, 1.0)),   # random resized crop
    T.ColorJitter(brightness=0.2, contrast=0.2),  # color jitter
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225]),
])

# No augmentation at test time
test_transform = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225]),
])
```
```python
# NLP data augmentation example
import random

def text_augment(text):
    words = text.split()
    # Random deletion (guard against empty input)
    if random.random() < 0.1 and words:
        idx = random.randint(0, len(words) - 1)
        words.pop(idx)
    # Random swap
    if random.random() < 0.1 and len(words) > 1:
        i, j = random.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return ' '.join(words)
```
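A quick usage check (the sample sentence is an arbitrary assumption, and output varies per run since both operations are random):

```python
sample = "the quick brown fox jumps over the lazy dog"
for _ in range(3):
    print(text_augment(sample))
```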