feat: dockerize app and unify query management UI
@@ -0,0 +1,372 @@
# MySQL Query Design

## Goal

Let third parties query MySQL directly while keeping queries simple, stable, and fast enough.

Current data scale:

- `device_record`: 6219
- `lookup key`: 23463

At this scale there is no need for database or table sharding, and no need for Elasticsearch (ES). MySQL 8 with suitable indexes is enough.

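## Overall Design

Split MySQL's role into two layers:

1. Offline build layer
   - Keep `brands/*.md` as the raw source data.
   - Generate normalized records with `tools/device_mapper.py`.
   - Export the MySQL seed SQL with `tools/export_mysql_seed.py`.
2. Query service layer
   - Third parties query MySQL read-only.
   - Expose only a read-only account that is allowed `SELECT` and nothing else.
   - Preferably serve it from a read-only instance or read replica instead of connecting to the primary directly.
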
## Tables

### `mm_device_record`

Compatibility view with one row per device, aggregated from `mm_device_catalog`.

Main fields:

- `record_id`
- `device_name`
- `brand`
- `manufacturer_brand`
- `parent_brand`
- `market_brand`
- `device_type`
- `source_file`
- `section`
- `source_rank`
- `source_weight`
- `aliases_json`

Purpose:

- Trace back to the aggregated device information
- Stay compatible with historical troubleshooting SQL
- No longer maintained as a separate physical table

### `mm_device_catalog`

Unified main table for device queries, with one row per device alias.

Main fields:

- `model`
- `alias_norm`
- `record_id`
- `device_name`
- `brand`
- `manufacturer_brand`
- `parent_brand`
- `market_brand`
- `device_type`
- `source_rank`
- `source_weight`
- `code`
- `code_alias`
- `ver_name`

This is currently the only physical device table.

Reasons:

- Only equality lookups on `alias_norm` are needed
- It backs both `mm_device_lookup` and `models` for compatibility
- It is the best fit for high-frequency read-only workloads

### `mm_device_lookup`

Compatibility view built on `mm_device_catalog`.

### `mm_brand_lookup`

Brand normalization table, used to map the brand value passed by a third party to:

- `manufacturer_brand`
- `parent_brand`
- `market_brand`

For example:

- `荣耀` -> `HONOR`
- `redmi` -> `market_brand=Redmi`
- `xiaomi` -> `manufacturer_brand=Xiaomi`

### `models` / `python_services_test.models`

To stay compatible with the old query path, a legacy-structure adapter layer is kept:

- `mobilemodels.models` (compatibility view)
- `python_services_test.models` (compatibility view)

Field mapping:

- `model`
  - The raw device identifier that the current row can be queried by directly
  - A device with multiple aliases expands into multiple rows
- `dtype`
  - Maps to the current `device_type`
- `brand`
  - Prefers `market_brand`
- `brand_title`
  - Uses `manufacturer_brand`
- `code`
  - The primary model code identified for the device
- `code_alias`
  - Other model codes, joined with ` | `
- `model_name`
  - Maps to the current `device_name`
- `ver_name`
  - Human-readable aliases of the device, joined with ` | `

This layer exists mainly for compatibility with historical SQL and with third parties that query tables directly. It is not recommended as the primary data model for new capabilities.

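The field mapping above can be sketched in Python as follows. This is an illustration only, not the project's view definition: `to_legacy_row` and `split_joined` are hypothetical names, the input is assumed to be a dict keyed by `mm_device_catalog` column names, and falling back to `brand` when `market_brand` is empty is an assumption, since the text only says that `market_brand` is preferred.

```python
from typing import Any, Dict, List, Optional

SEPARATOR = " | "  # joiner the document specifies for code_alias and ver_name


def to_legacy_row(row: Dict[str, Any]) -> Dict[str, Any]:
    """Map one mm_device_catalog-style row to the legacy `models` column names."""
    return {
        "model": row["model"],
        "dtype": row["device_type"],
        # Assumption: use `brand` when market_brand is missing or empty.
        "brand": row.get("market_brand") or row.get("brand"),
        "brand_title": row["manufacturer_brand"],
        "code": row["code"],
        "code_alias": row["code_alias"],
        "model_name": row["device_name"],
        "ver_name": row["ver_name"],
    }


def split_joined(value: Optional[str]) -> List[str]:
    """Split a ' | '-joined legacy field back into its individual values."""
    if not value:
        return []
    return [part for part in value.split(SEPARATOR) if part]
```

A consumer of the legacy view could call `split_joined` on `code_alias` or `ver_name` to recover the individual values instead of parsing the joined string ad hoc.
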
## Query Contract

Third parties should not query by raw strings. Always query by the normalized `alias_norm`.

The normalization rules match the current project:

- Convert everything to lowercase
- Keep only `[0-9a-z\u4e00-\u9fff]`
- Strip spaces, hyphens, underscores, and all other punctuation

Examples:

- `SM-G9980` -> `smg9980`
- `iPhone14,2` -> `iphone142`
- `NOH-AL00` -> `nohal00`

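A minimal Python sketch of these rules, for integrators who need to compute `alias_norm` on their side. The function name `normalize_alias` is hypothetical, and the project's own implementation may differ in details such as Unicode pre-processing; only the three rules listed above are encoded here.

```python
import re

# Drop anything outside ASCII digits, lowercase ASCII letters, and the CJK
# range U+4E00..U+9FFF, which is the allowed set given in the rules above.
_DROP = re.compile(r"[^0-9a-z\u4e00-\u9fff]")


def normalize_alias(raw: str) -> str:
    """Lowercase the input, then remove every character not in the allowed set."""
    return _DROP.sub("", raw.lower())


if __name__ == "__main__":
    for sample in ("SM-G9980", "iPhone14,2", "NOH-AL00"):
        print(sample, "->", normalize_alias(sample))
```

Lowercasing happens before filtering so that uppercase letters are kept rather than stripped. Spaces, hyphens, underscores, and commas all fall outside the allowed set, so no separate removal step is needed.
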
## Recommended SQL

Three tiers are recommended:

1. New integrations
   - Query `mobilemodels.mm_device_catalog` directly
2. Compatibility with existing queries
   - Keep querying `mobilemodels.mm_device_lookup`
3. Compatibility with the legacy structure
   - Keep querying `python_services_test.models`

### 1. Recommended query: look up the main table by device identifier

```sql
SELECT
  model,
  record_id,
  alias_norm,
  device_name,
  brand,
  manufacturer_brand,
  parent_brand,
  market_brand,
  device_type,
  source_file,
  section,
  source_rank,
  source_weight,
  code,
  code_alias,
  ver_name
FROM mobilemodels.mm_device_catalog
WHERE alias_norm = ?
ORDER BY source_rank ASC, record_id ASC
LIMIT 20;
```

Notes:

- `alias_norm` is the primary query key
- `model` is the raw device identifier that matched
- `code` / `code_alias` / `ver_name` exist for compatibility with legacy fields and as display aids

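The sketch below shows how a client might combine normalization with this lookup. It is an illustration under stated assumptions: `lookup_device` and `demo_connection` are hypothetical names, the query selects only a subset of the columns, the schema prefix is dropped, and `sqlite3` stands in for MySQL so the example runs without a server. MySQL drivers such as PyMySQL and mysql-connector-python use `%s` placeholders instead of `?`.

```python
import re
import sqlite3
from typing import List, Tuple

# Trimmed version of the recommended query (subset of columns, no schema prefix).
LOOKUP_SQL = """
SELECT model, record_id, alias_norm, device_name, market_brand
FROM mm_device_catalog
WHERE alias_norm = ?
ORDER BY source_rank ASC, record_id ASC
LIMIT 20
"""


def normalize_alias(raw: str) -> str:
    """Lowercase, then keep only [0-9a-z] and CJK U+4E00..U+9FFF."""
    return re.sub(r"[^0-9a-z\u4e00-\u9fff]", "", raw.lower())


def lookup_device(conn: sqlite3.Connection, raw_identifier: str) -> List[Tuple]:
    """Normalize the raw identifier and run an equality lookup on alias_norm."""
    cur = conn.cursor()
    cur.execute(LOOKUP_SQL, (normalize_alias(raw_identifier),))
    return cur.fetchall()


def demo_connection() -> sqlite3.Connection:
    """Build an in-memory stand-in table with made-up rows (not real data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mm_device_catalog ("
        "model TEXT, record_id TEXT, alias_norm TEXT, "
        "device_name TEXT, market_brand TEXT, source_rank INTEGER)"
    )
    conn.execute("CREATE INDEX idx_alias_norm ON mm_device_catalog (alias_norm)")
    conn.executemany(
        "INSERT INTO mm_device_catalog VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("EX-1000", "rec-2", "ex1000", "Example Device B", "ExampleBrand", 2),
            ("EX-1000", "rec-1", "ex1000", "Example Device A", "ExampleBrand", 1),
        ],
    )
    return conn
```

Because the caller passes the raw identifier and the helper normalizes it, the bound parameter always matches the form stored in `alias_norm`, and the query stays a plain equality lookup.
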
### 1.1 Compatibility query: keep using `mm_device_lookup`

If an integrator already depends on `mm_device_lookup`, the SQL does not need to change:

```sql
SELECT
  alias_norm,
  record_id,
  device_name,
  brand,
  manufacturer_brand,
  parent_brand,
  market_brand,
  device_type,
  source_file,
  section,
  source_rank,
  source_weight
FROM mobilemodels.mm_device_lookup
WHERE alias_norm = ?
ORDER BY source_rank ASC, record_id ASC
LIMIT 20;
```

### 1.2 Legacy-structure query: keep using `python_services_test.models`

If an integrator still relies on the old table structure, it can keep querying:

```sql
SELECT
  model,
  dtype,
  brand,
  brand_title,
  code,
  code_alias,
  model_name,
  ver_name
FROM python_services_test.models
WHERE model = ?
LIMIT 20;
```

### 2. Recommended query: with a brand constraint

First resolve the brand to its normalized form, then filter. The first `?` binds the brand key matched against `mm_brand_lookup.alias_norm`; the second binds the device `alias_norm`.

```sql
SELECT
  l.model,
  l.record_id,
  l.alias_norm,
  l.device_name,
  l.brand,
  l.manufacturer_brand,
  l.parent_brand,
  l.market_brand,
  l.device_type,
  l.source_file,
  l.section,
  l.source_rank,
  l.source_weight,
  l.code,
  l.code_alias,
  l.ver_name
FROM mobilemodels.mm_device_catalog AS l
LEFT JOIN mobilemodels.mm_brand_lookup AS b
  ON b.alias_norm = ?
WHERE l.alias_norm = ?
  AND (
    b.alias_norm IS NULL
    OR (b.market_brand IS NOT NULL AND l.market_brand = b.market_brand)
    OR (b.manufacturer_brand IS NOT NULL AND l.manufacturer_brand = b.manufacturer_brand)
    OR (b.parent_brand IS NOT NULL AND l.parent_brand = b.parent_brand)
  )
ORDER BY l.source_rank ASC, l.record_id ASC
LIMIT 20;
```

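The brand filter in the `WHERE` clause can be read as the following predicate. This is a sketch for reasoning about the SQL, not code from the project: `brand_matches` is a hypothetical name, rows are modeled as plain mappings, and `None` plays the role of SQL `NULL`.

```python
from typing import Mapping, Optional

BRAND_FIELDS = ("market_brand", "manufacturer_brand", "parent_brand")

Row = Mapping[str, Optional[str]]


def brand_matches(device: Row, brand_row: Optional[Row]) -> bool:
    """Mirror the WHERE-clause brand filter for one device row.

    brand_row is None when the supplied brand has no entry in mm_brand_lookup;
    the LEFT JOIN then yields NULLs, `b.alias_norm IS NULL` holds, and the
    device is kept without any brand filtering.
    """
    if brand_row is None:
        return True
    # Otherwise the device must agree with the lookup row on at least one
    # brand level that the lookup row actually defines.
    return any(
        brand_row.get(field) is not None
        and device.get(field) == brand_row.get(field)
        for field in BRAND_FIELDS
    )
```

Two consequences follow from this reading. An unrecognized brand does not exclude any device, so the query degrades to the plain `alias_norm` lookup. A device row whose brand field is `NULL` never matches on that field. Python's `==` is case-sensitive, whereas the SQL comparison follows the column collation, so the two can differ on letter case.
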
### 3. Compatibility query: aggregated view per device record

If historical troubleshooting logic needs one row per device, keep querying:

```sql
SELECT
  record_id,
  device_name,
  brand,
  manufacturer_brand,
  parent_brand,
  market_brand,
  device_type,
  source_file,
  section,
  source_rank,
  source_weight,
  aliases_json
FROM mobilemodels.mm_device_record
WHERE record_id = ?
LIMIT 1;
```

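## Migration Advice

Suggested migration order:

1. New integrations use `mobilemodels.mm_device_catalog` directly
2. Existing queries keep using `mm_device_lookup` / `python_services_test.models` unchanged for now
3. Once the business side has finished adapting fields, move over to the main table gradually

This achieves the following:

- The data layer maintains only one physical device table
- Existing query paths do not all have to be reworked at the same time
- New capabilities are built around the main table
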
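## Performance Strategy

There are only four key points:

1. Allow only equality queries on `alias_norm`
   - Forbid fuzzy matching such as `%xxx%`
2. Use `mm_device_catalog` as the main query table
   - Compatibility paths go through the `mm_device_lookup` / `models` views
3. Give third parties a read-only account
   - Grant `SELECT` only
4. Preferably use a read-only instance
   - Keep third-party traffic from affecting sync and administrative operations

At the current data scale, an `alias_norm` hit is a very lightweight query.
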
## Update Flow

A full refresh once a day is recommended, with no incremental updates.

Reasons:

- The data volume is small
- Full replacement is more stable
- Dirty data and missed deletions are less likely

Suggested flow:

1. Pull the upstream raw data
2. Generate the latest `device_index.json`
3. Export the MySQL seed SQL
4. Run in MySQL:
   - `DELETE FROM mm_device_catalog`
   - `DELETE FROM mm_brand_lookup`
   - Bulk-insert the new data
5. Switch read traffic over once the load completes

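Step 4 can be sketched as a single transaction, so that a failed load leaves the previous data in place. Wrapping the refresh in a transaction is a suggestion, not something the flow above prescribes. `full_refresh` is a hypothetical name, the column lists are trimmed to two columns per table for brevity, and `sqlite3` stands in for MySQL so the sketch runs without a server. With a MySQL driver the placeholders would be `%s`, and the commit and rollback calls would be made explicitly on the connection.

```python
import sqlite3
from typing import Iterable, Optional, Tuple

CatalogRow = Tuple[Optional[str], str]  # (alias_norm, record_id)
BrandRow = Tuple[Optional[str], Optional[str]]  # (alias_norm, market_brand)


def full_refresh(
    conn: sqlite3.Connection,
    catalog_rows: Iterable[CatalogRow],
    brand_rows: Iterable[BrandRow],
) -> None:
    """Replace the contents of both tables in one transaction."""
    # For sqlite3, `with conn` commits if the block succeeds and rolls back
    # if any statement raises, which also undoes the DELETEs.
    with conn:
        conn.execute("DELETE FROM mm_device_catalog")
        conn.execute("DELETE FROM mm_brand_lookup")
        conn.executemany(
            "INSERT INTO mm_device_catalog (alias_norm, record_id) VALUES (?, ?)",
            catalog_rows,
        )
        conn.executemany(
            "INSERT INTO mm_brand_lookup (alias_norm, market_brand) VALUES (?, ?)",
            brand_rows,
        )
```

The flow uses `DELETE` rather than `TRUNCATE TABLE`, which suits this pattern on MySQL: `DELETE` on InnoDB tables takes part in the transaction, while `TRUNCATE TABLE` causes an implicit commit and cannot be rolled back.
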
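## Security

If third parties must connect to MySQL directly, apply at least these restrictions:

- A dedicated read-only account
- An IP allowlist
- Grants limited to the `mobilemodels` schema and, where compatibility requires it, the `python_services_test` schema
- No DDL / DML
- Connection limits
- Query timeout limits

The safer approach is still:

- Third parties query your own read-only gateway
- The gateway queries MySQL internally

If direct database access is unavoidable at this stage, however, the tables and views described above are sufficient.
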
## Files

- Schema: `sql/mobilemodels_mysql_schema.sql`
- Seed exporter: `tools/export_mysql_seed.py`

Generate the seed:

```bash
python3 tools/export_mysql_seed.py
```

Default output:

- `dist/mobilemodels_mysql_seed.sql`