Getting Started with HuggingFace

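📌 Core Definition (What)

In one sentence: HuggingFace is the npm of the AI world. Its Hub hosts hundreds of thousands of pretrained models, and many of them can be run with a single call.

Why HuggingFace?

Training a model from scratch takes massive amounts of data, GPUs, and weeks of time.
Pretrained models have already learned general-purpose knowledge.
You only need to load one, then either fine-tune it or use it directly.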

How a Pipeline Works Internally

📝 Text → 🔤 Tokenizer → 🧠 Model → 📊 Output

💡 pipeline() wraps the tokenizer, the model, and post-processing, so inference takes a single call.
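
To make the three stages concrete, here is a minimal toy sketch of the Text → Tokenizer → Model → Output flow in plain Python. Nothing in it is transformers code: the vocabulary, the weights, and the names `toy_tokenize`, `toy_model`, `toy_postprocess`, and `ToyPipeline` are all hypothetical, chosen only to show how the stages compose.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List

# Toy stand-ins for a real vocabulary and real model weights (assumptions).
TOY_VOCAB: Dict[str, int] = {"<unk>": 0, "i": 1, "love": 2, "hate": 3, "ai": 4}
TOY_WEIGHTS: Dict[int, float] = {2: 2.0, 3: -2.0}  # token id -> sentiment weight


def toy_tokenize(text: str) -> List[int]:
    """Stage 1: turn raw text into integer token ids."""
    words = text.lower().replace("!", " ").replace(".", " ").split()
    return [TOY_VOCAB.get(word, TOY_VOCAB["<unk>"]) for word in words]


def toy_model(token_ids: List[int]) -> float:
    """Stage 2: map token ids to a single raw score (a logit)."""
    return sum(TOY_WEIGHTS.get(token_id, 0.0) for token_id in token_ids)


def toy_postprocess(logit: float) -> Dict[str, object]:
    """Stage 3: turn the raw score into a label and a probability."""
    p_positive = 1.0 / (1.0 + math.exp(-logit))  # sigmoid
    if p_positive >= 0.5:
        return {"label": "POSITIVE", "score": p_positive}
    return {"label": "NEGATIVE", "score": 1.0 - p_positive}


@dataclass
class ToyPipeline:
    """Chains the three stages behind one call, like pipeline() does."""
    tokenize: Callable[[str], List[int]]
    model: Callable[[List[int]], float]
    postprocess: Callable[[float], Dict[str, object]]

    def __call__(self, text: str) -> Dict[str, object]:
        return self.postprocess(self.model(self.tokenize(text)))


classifier = ToyPipeline(toy_tokenize, toy_model, toy_postprocess)
print(classifier("I love AI!"))
```

A real pipeline swaps each stage for a learned component (a subword tokenizer and a neural network), but the caller still sees one function that takes text and returns a label with a score.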

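🏠 Everyday Analogy (Analogy)

📦 "Ready-made meals vs. cooking from scratch"

Scenario	Corresponding technique
Cooking from scratch	Training a model from scratch (slow and laborious)
Ready-made meal	Pretrained model (ready to eat straight from the bag)
Food-delivery app (Meituan)	HuggingFace Hub (a wide menu of models to choose from)
Reheating	Fine-tuning (adapting the model to your use case)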

💻 Analogy for JS Developers

// The npm workflow you already know (JavaScript)
// $ npm install lodash
import _ from 'lodash';
_.chunk([1, 2, 3, 4], 2);  // [[1, 2], [3, 4]]

# The HuggingFace workflow is nearly the same (Python)
# $ pip install transformers torch
from transformers import pipeline
classifier = pipeline("sentiment-analysis")
classifier("I love AI!")  # e.g. [{'label': 'POSITIVE', 'score': 0.99}]

🔥 Quick Start

1. Install

pip install transformers torch

2. Use a Pipeline (recommended)

from transformers import pipeline

# Sentiment analysis (with no model given, a default model is downloaded)
classifier = pipeline("sentiment-analysis")
result = classifier("I love this product!")
print(result)  # e.g. [{'label': 'POSITIVE', 'score': 0.9998}]

# Text generation
generator = pipeline("text-generation", model="gpt2")
result = generator("Once upon a time", max_length=50)
print(result[0]['generated_text'])

# Question answering
qa = pipeline("question-answering")
result = qa(
    question="What is the capital of France?",
    context="France is a country in Europe. Paris is its capital."
)
print(result)  # a dict with 'score', 'start', 'end', and 'answer' keys; the answer here is 'Paris'
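
A text-classification pipeline also accepts a list of strings and returns one `{'label', 'score'}` dict per input, which makes it easy to filter predictions by confidence. The sketch below assumes that output shape. The helper names `confident_predictions` and `classify_batch` and the 0.9 threshold are hypothetical, and the sample results are hand-written rather than real model output.

```python
from typing import Dict, List, Tuple


def confident_predictions(
    texts: List[str], results: List[Dict], threshold: float = 0.9
) -> List[Tuple[str, str, float]]:
    """Pair each text with its prediction and keep scores >= threshold."""
    kept = []
    for text, prediction in zip(texts, results):
        if prediction["score"] >= threshold:
            kept.append((text, prediction["label"], prediction["score"]))
    return kept


def classify_batch(texts: List[str]) -> List[Dict]:
    """Run a real sentiment pipeline over a batch (downloads the model)."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import pipeline

    classifier = pipeline(
        "sentiment-analysis",
        model="distilbert-base-uncased-finetuned-sst-2-english",
    )
    return classifier(texts)


# Hand-written sample in the pipeline's output shape; scores are illustrative.
sample_texts = ["I love this product!", "It is okay, I guess."]
sample_results = [
    {"label": "POSITIVE", "score": 0.9998},
    {"label": "NEGATIVE", "score": 0.62},
]
print(confident_predictions(sample_texts, sample_results))
# [('I love this product!', 'POSITIVE', 0.9998)]
```

With transformers installed, `confident_predictions(texts, classify_batch(texts))` applies the same filter to real predictions.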

3. Load the Model Manually (more flexible)

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the model and its matching tokenizer
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Tokenize the text into PyTorch tensors
text = "I love this!"
inputs = tokenizer(text, return_tensors="pt")

# Inference (no_grad skips gradient tracking, which is not needed here)
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.softmax(outputs.logits, dim=-1)

print(predictions)  # e.g. tensor([[0.0002, 0.9998]]), i.e. [P(NEGATIVE), P(POSITIVE)]
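
The manual path stops at a tensor of probabilities, while pipeline() goes one step further and returns a label with a score. A minimal sketch of that last post-processing step follows, written in plain Python so it runs without a model. The helper names `softmax` and `to_prediction` are hypothetical, and the logits are illustrative values rather than real model output.

```python
import math
from typing import Dict, List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a flat list of logits."""
    peak = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def to_prediction(logits: List[float], id2label: Dict[int, str]) -> Dict[str, object]:
    """Pick the most probable class and report it as a label with a score."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}


# Illustrative logits for a two-class sentiment model (an assumption).
example = to_prediction([-4.2, 4.4], {0: "NEGATIVE", 1: "POSITIVE"})
print(example["label"], round(example["score"], 4))
```

With the model loaded as above, the same helper can be called as `to_prediction(outputs.logits[0].tolist(), model.config.id2label)`, since `model.config.id2label` maps class indices to label names.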

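📊 Common Pipeline Tasks

Task	Identifier	Purpose
Sentiment analysis	`sentiment-analysis`	Judge whether text is positive or negative (an alias of `text-classification`)
Text classification	`text-classification`	Assign text to the labels the chosen model was trained on
Named entity recognition	`ner`	Extract names of people, places, and so on
Question answering	`question-answering`	Answer a question from a given context
Text generation	`text-generation`	Continue a text prompt
Translation	`translation`	Machine translation
Summarization	`summarization`	Condense long text
Zero-shot classification	`zero-shot-classification`	Classify against labels supplied at call time, with no training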
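
Zero-shot classification is the least obvious entry in the table, because the labels are supplied at call time instead of being fixed by training. The sketch below assumes the `facebook/bart-large-mnli` checkpoint and the result shape of a dict with `sequence`, `labels`, and `scores` keys. The helper names `zero_shot` and `ranked_labels` are hypothetical, and the sample result is hand-written, not real model output.

```python
from typing import Dict, List, Tuple


def zero_shot(text: str, candidate_labels: List[str]) -> Dict:
    """Classify text against caller-supplied labels (downloads the model)."""
    # Imported lazily so ranked_labels() works without transformers installed.
    from transformers import pipeline

    classifier = pipeline(
        "zero-shot-classification", model="facebook/bart-large-mnli"
    )
    return classifier(text, candidate_labels=candidate_labels)


def ranked_labels(result: Dict) -> List[Tuple[str, float]]:
    """Pair each label with its score, highest score first."""
    pairs = zip(result["labels"], result["scores"])
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Hand-written sample in the assumed output shape; scores are illustrative.
sample = {
    "sequence": "The new GPU cut our training time in half.",
    "labels": ["technology", "sports", "cooking"],
    "scores": [0.96, 0.03, 0.01],
}
print(ranked_labels(sample)[0])  # ('technology', 0.96)
```

With transformers installed, `ranked_labels(zero_shot("some text", ["technology", "sports"]))` ranks real predictions the same way.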

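🚀 Next Steps

Browse the HuggingFace Hub - explore the available models
RAG in practice - combine models with retrieval-augmented generation
Transformer fundamentals - understand the underlying architecture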