Post

类o1长推理数据集

整理了一些长推理数据集

STILL

math, physics, chemistry, biology, code, puzzle

NuminaMath CoT

数学题,从中国高中数学练习到美国和国际数学奥林匹克竞赛问题

MetaMathQA

GSM8K and MATH

CoT_zh

FLAN 发布的 9 个 CoT 数据集组合,谷歌翻译成中文

sharegpt_cot_dataset

kaist-ai/CoT-Collection

1060 tasks chosen from the Flan Collection

gsm8k_synthetic_cot

Math-reasoning-10k

为 NuminaMath CoT 生成推理思路,但不实际解决问题

Model used: Qwen2.5-7B-Instruct (in my testing, it showed the same performance as the bigger Qwen 2.5 72b, 32b instruct versions)

Sky-T1_data_17k

This post is licensed under CC BY 4.0 by the author.