Quantized Llama models with increased speed and a reduced memory footprint

📅 2024-10-24 ⚓ Hacker News 🌐 Source