
ChainThought

Chain-of-Thought Collection

Dataset Overview

The CoT Collection is an instruction-tuning dataset containing 1.84 million Chain-of-Thought (CoT) rationales across 1,060 tasks. It is designed to improve the reasoning capabilities of smaller language models (fewer than 100B parameters) by training them to reason step by step.

Data Origin

The CoT Collection was developed to address the challenge of enabling smaller language models to perform chain-of-thought reasoning. The dataset is publicly available and provides fine-tuning examples drawn from many tasks, which improve performance in both zero-shot and few-shot settings.

CoT for Language Models

The dataset focuses on improving the Chain-of-Thought (CoT) reasoning capability of smaller language models, such as Flan-T5 (3B and 11B), enabling them to better tackle unseen tasks. Fine-tuning these models with CoT rationales results in notable improvements in both zero-shot and few-shot performance across a variety of benchmarks.
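As a rough illustration of what fine-tuning on CoT rationales involves, the sketch below turns one record into an input/output text pair for an encoder-decoder model such as Flan-T5. The field names (`source`, `rationale`, `target`), the trigger phrase, and the `[ANSWER]` separator are assumptions made for this example and are not taken from the text above; the released code may format examples differently.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class CoTExample:
    # Hypothetical field names, chosen for illustration only.
    source: str     # instruction plus task input
    rationale: str  # step-by-step reasoning
    target: str     # final answer


# Assumed formatting constants; not confirmed by the dataset description.
COT_TRIGGER = "Let's think step by step."
ANSWER_PREFIX = "[ANSWER]"


def to_seq2seq_pair(example: CoTExample) -> Tuple[str, str]:
    """Build (input_text, output_text) so the model learns to emit
    the rationale first and the final answer after it."""
    input_text = f"{example.source.strip()}\n{COT_TRIGGER}"
    output_text = (
        f"{example.rationale.strip()} {ANSWER_PREFIX} {example.target.strip()}"
    )
    return input_text, output_text


if __name__ == "__main__":
    demo = CoTExample(
        source="Q: Tom has 2 apples and buys 3 more. How many does he have?",
        rationale="Tom starts with 2 apples. Buying 3 more gives 2 + 3 = 5.",
        target="5",
    )
    model_input, model_output = to_seq2seq_pair(demo)
    print(model_input)
    print(model_output)
```

Placing the rationale before the answer in the output text means the model is trained to generate its reasoning before committing to an answer, which is the behavior CoT fine-tuning aims for.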

Performance Improvement

CoT fine-tuning improved average zero-shot accuracy on the BIG-Bench-Hard (BBH) benchmark by +4.34% for Flan-T5 3B and +2.60% for Flan-T5 11B. In few-shot settings on domain-specific tasks, the gains were +2.24% (3B) and +2.37% (11B), and the fine-tuned models outperformed larger models such as ChatGPT on several tasks.

Model and Data Access

The CoT Collection dataset, as well as the fine-tuned models and code, are publicly available for further research and development. This allows the broader community to leverage these resources for advancing smaller LMs in reasoning tasks.

Released under the MIT License.
