[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-73311":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":15,"forks30d":15,"starsTrendScore":19,"compositeScore":20,"rankGlobal":9,"rankLanguage":9,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":22,"topics":25,"createdAt":9,"pushedAt":9,"updatedAt":26,"readmeContent":27,"aiSummary":28,"trendingCount":15,"starSnapshotCount":15,"syncStatus":29,"lastSyncTime":30,"discoverSource":31},73311,"RustGPT","tekaratzas\u002FRustGPT","tekaratzas","An transformer based LLM. Written completely in Rust",null,"Rust",3134,264,34,5,0,4,11,16,12,74.37,"MIT License",false,"main",true,[],"2026-06-12 04:01:08","# 🦀 Rust LLM from Scratch\n\n[![Check](https:\u002F\u002Fgithub.com\u002Ftekaratzas\u002FRustGPT\u002Factions\u002Fworkflows\u002Fcheck.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Ftekaratzas\u002FRustGPT\u002Factions\u002Fworkflows\u002Fcheck.yml) [![Test](https:\u002F\u002Fgithub.com\u002Ftekaratzas\u002FRustGPT\u002Factions\u002Fworkflows\u002Ftest.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Ftekaratzas\u002FRustGPT\u002Factions\u002Fworkflows\u002Ftest.yml)\n\n\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fec4a4100-b03a-4b3c-a7d6-806ea54ed4ed\n\nA complete **Large Language Model implementation in pure Rust** with no external ML frameworks. 
Built from the ground up using only `ndarray` for matrix operations.\n\n## 🚀 What This Is\n\nThis project demonstrates how to build a transformer-based language model from scratch in Rust, including:\n- **Pre-training** on factual text completion\n- **Instruction tuning** for conversational AI\n- **Interactive chat mode** for testing\n- **Full backpropagation** with gradient clipping\n- **Modular architecture** with clean separation of concerns\n\n## ❌ What This Isn't\n\nThis is not a production grade LLM. It is so far away from the larger models.\n\nThis is just a toy project that demonstrates how these models work under the hood.\n\n## 🔍 Key Files to Explore\n\nStart with these two core files to understand the implementation:\n\n- **[`src\u002Fmain.rs`](src\u002Fmain.rs)** - Training pipeline, data preparation, and interactive mode\n- **[`src\u002Fllm.rs`](src\u002Fllm.rs)** - Core LLM implementation with forward\u002Fbackward passes and training logic\n\n## 🏗️ Architecture\n\nThe model uses a **transformer-based architecture** with the following components:\n\n```\nInput Text → Tokenization → Embeddings → Transformer Blocks → Output Projection → Predictions\n```\n\n### Project Structure\n\n```\nsrc\u002F\n├── main.rs              # 🎯 Training pipeline and interactive mode\n├── llm.rs               # 🧠 Core LLM implementation and training logic\n├── lib.rs               # 📚 Library exports and constants\n├── transformer.rs       # 🔄 Transformer block (attention + feed-forward)\n├── self_attention.rs    # 👀 Multi-head self-attention mechanism\n├── feed_forward.rs      # ⚡ Position-wise feed-forward networks\n├── embeddings.rs        # 📊 Token embedding layer\n├── output_projection.rs # 🎰 Final linear layer for vocabulary predictions\n├── vocab.rs            # 📝 Vocabulary management and tokenization\n├── layer_norm.rs       # 🧮 Layer normalization\n└── adam.rs             # 🏃 Adam optimizer implementation\n\ntests\u002F\n├── llm_test.rs         # Tests for core 
LLM functionality\n├── transformer_test.rs # Tests for transformer blocks\n├── self_attention_test.rs # Tests for attention mechanisms\n├── feed_forward_test.rs # Tests for feed-forward layers\n├── embeddings_test.rs  # Tests for embedding layers\n├── vocab_test.rs       # Tests for vocabulary handling\n├── adam_test.rs        # Tests for optimizer\n└── output_projection_test.rs # Tests for output layer\n```\n\n## 🧪 What The Model Learns\n\nThe implementation includes two training phases:\n\n1. **Pre-training**: Learns basic world knowledge from factual statements\n   - \"The sun rises in the east and sets in the west\"\n   - \"Water flows downhill due to gravity\"\n   - \"Mountains are tall and rocky formations\"\n\n2. **Instruction Tuning**: Learns conversational patterns\n   - \"User: How do mountains form? Assistant: Mountains are formed through tectonic forces...\"\n   - Handles greetings, explanations, and follow-up questions\n\n## 🚀 Quick Start\n\n```bash\n# Clone and run\ngit clone https:\u002F\u002Fgithub.com\u002Ftekaratzas\u002FRustGPT.git\ncd RustGPT\ncargo run\n\n# The model will:\n# 1. Build vocabulary from training data\n# 2. Pre-train on factual statements (100 epochs)\n# 3. Instruction-tune on conversational data (100 epochs)\n# 4. 
Enter interactive mode for testing\n```\n\n## 🎮 Interactive Mode\n\nAfter training, test the model interactively:\n\n```\nEnter prompt: How do mountains form?\nModel output: Mountains are formed through tectonic forces or volcanism over long geological time periods\n\nEnter prompt: What causes rain?\nModel output: Rain is caused by water vapor in clouds condensing into droplets that become too heavy to remain airborne\n```\n\n## 🧮 Technical Implementation\n\n### Model Configuration\n- **Vocabulary Size**: Dynamic (built from training data)\n- **Embedding Dimension**: 128 (defined by `EMBEDDING_DIM` in `src\u002Flib.rs`)\n- **Hidden Dimension**: 256 (defined by `HIDDEN_DIM` in `src\u002Flib.rs`)\n- **Max Sequence Length**: 80 tokens (defined by `MAX_SEQ_LEN` in `src\u002Flib.rs`)\n- **Architecture**: 3 Transformer blocks + embeddings + output projection\n\n### Training Details\n- **Optimizer**: Adam with gradient clipping\n- **Pre-training LR**: 0.0005 (100 epochs)\n- **Instruction Tuning LR**: 0.0001 (100 epochs)\n- **Loss Function**: Cross-entropy loss\n- **Gradient Clipping**: L2 norm capped at 5.0\n\n### Key Features\n- **Custom tokenization** with punctuation handling\n- **Greedy decoding** for text generation\n- **Gradient clipping** for training stability\n- **Modular layer system** with clean interfaces\n- **Comprehensive test coverage** for all components\n\n## 🔧 Development\n\n```bash\n# Run all tests\ncargo test\n\n# Test specific components\ncargo test --test llm_test\ncargo test --test transformer_test\ncargo test --test self_attention_test\n\n# Build optimized version\ncargo build --release\n\n# Run with verbose output\ncargo test -- --nocapture\n```\n\n## 🧠 Learning Resources\n\nThis implementation demonstrates key ML concepts:\n- **Transformer architecture** (attention, feed-forward, layer norm)\n- **Backpropagation** through neural networks\n- **Language model training** (pre-training + fine-tuning)\n- **Tokenization** and vocabulary management\n- 
**Gradient-based optimization** with Adam\n\nPerfect for understanding how modern LLMs work under the hood!\n\n## 📊 Dependencies\n\n- `ndarray` - N-dimensional arrays for matrix operations\n- `rand` + `rand_distr` - Random number generation for initialization\n\nNo PyTorch, TensorFlow, or Candle - just pure Rust and linear algebra!\n\n## 🤝 Contributing\n\nContributions are welcome! This project is perfect for learning and experimentation.\n\n### High Priority Features Needed\n- **🏪 Model Persistence** - Save\u002Fload trained parameters to disk (currently all in-memory)\n- **⚡ Performance optimizations** - SIMD, parallel training, memory efficiency\n- **🎯 Better sampling** - Beam search, top-k\u002Ftop-p, temperature scaling\n- **📊 Evaluation metrics** - Perplexity, benchmarks, training visualizations\n\n### Areas for Improvement\n- **Advanced architectures** (multi-head attention, positional encoding, RoPE)\n- **Training improvements** (different optimizers, learning rate schedules, regularization)\n- **Data handling** (larger datasets, tokenizer improvements, streaming)\n- **Model analysis** (attention visualization, gradient analysis, interpretability)\n\n### Getting Started\n1. Fork the repository\n2. Create a feature branch: `git checkout -b feature\u002Fmodel-persistence`\n3. Make your changes and add tests\n4. Run the test suite: `cargo test`\n5. Submit a pull request with a clear description\n\n### Code Style\n- Follow standard Rust conventions (`cargo fmt`)\n- Add comprehensive tests for new features\n- Update documentation and README as needed\n- Keep the \"from scratch\" philosophy - avoid heavy ML dependencies\n\n### Ideas for Contributions\n- 🚀 **Beginner**: Model save\u002Fload, more training data, config files\n- 🔥 **Intermediate**: Beam search, positional encodings, training checkpoints\n- ⚡ **Advanced**: Multi-head attention, layer parallelization, custom optimizations\n\nQuestions? 
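Open an issue or start a discussion!\n","RustGPT is a Transformer-based large language model written entirely in Rust. The project depends on no external machine learning frameworks and uses only `ndarray` for matrix operations. It implements the full pipeline from pre-training to instruction tuning, including an interactive chat mode and full backpropagation. Its modular design keeps the code structure clear and easy to understand and extend. Although it is not yet a production-grade solution, it is well suited to researchers and developers who want to understand the inner workings of language models, especially those interested in Rust programming. Through this project, you can learn how to build a complex deep learning system from scratch.",2,"2026-06-11 03:44:56","high_star"]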