What are GLM-5.2 and Kimi K2.7?
Both are open-weight, coding-first AI models released in June 2026. GLM-5.2, built by Zhipu AI (Z.ai), is a sparse Mixture-of-Experts model (~750B parameters, ~40B active) with a context window of up to 1 million tokens, released under the MIT license. Kimi K2.7 Code, built by Moonshot AI, is a ~1-trillion-parameter Mixture-of-Experts model (~32B active) with a 256K context window, released under a Modified MIT license. Both are tuned for software engineering, long-horizon agent workflows, and tool use rather than pure chat.
GLM vs Kimi — which is better for coding?
They win at different things. Kimi K2.7 Code is strongest at long-horizon autonomous coding and agent swarms — it always reasons (chain-of-thought is enforced) and supports vision input, so it can turn a screenshot into code. GLM-5.2 shines on repository-scale work thanks to its up-to-1M-token context and its tunable reasoning depth (high vs max modes), making it a strong pick for auditing or refactoring an entire codebase in one pass. For quick agentic builds and vision tasks, reach for Kimi; for whole-repo reasoning and deep architecture planning, reach for GLM.
Are GLM-5.2 and Kimi K2.7 free and open source?
Both are open-weight, meaning the model weights are downloadable from Hugging Face and you can self-host them. GLM-5.2 uses the MIT license and Kimi K2.7 uses a Modified MIT license, both of which permit commercial use. Running either model still costs compute, whether on your own hardware or through an API provider (Z.ai's GLM Coding Plan, Moonshot's Kimi platform, or third parties like Cloudflare Workers AI). The prompts on this page are free to copy.
How large is each model's context window?
Kimi K2.7 Code has a 256K-token context window across all variants. GLM-5.2 supports up to a 1-million-token context (commonly deployed around 262K, with full 1M available on specialized runtimes) and can output up to ~131K tokens in a single response. GLM's larger context is why it's well-suited to feeding an entire repository at once.
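To make the size difference concrete, here is a minimal sketch that estimates whether a blob of text fits each window. It assumes roughly 4 characters per token, which is only a heuristic (a real tokenizer will give different counts), and the dictionary keys and the 16,000-token output reserve are hypothetical choices for illustration, not official model IDs or recommended values.

```python
# Approximate limits taken from the FAQ text; the dictionary keys are
# hypothetical labels, not official model IDs.
CONTEXT_LIMITS = {
    "kimi-k2.7-code": 256_000,
    "glm-5.2 (common deployment)": 262_000,
    "glm-5.2 (full 1M runtime)": 1_000_000,
}

CHARS_PER_TOKEN = 4  # rough heuristic; a real tokenizer will differ


def estimate_tokens(text: str) -> int:
    """Estimate the token count from character length (ceiling division)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def deployments_that_fit(text: str, reserve_for_output: int = 16_000) -> list[str]:
    """Return the labels whose window holds the prompt plus an output reserve."""
    needed = estimate_tokens(text) + reserve_for_output
    return [name for name, limit in CONTEXT_LIMITS.items() if needed <= limit]


if __name__ == "__main__":
    repo_dump = "x" * 1_200_000  # stand-in for a concatenated repository
    print(estimate_tokens(repo_dump))
    print(deployments_that_fit(repo_dump))
```

For anything close to a limit, count tokens with the provider's own tokenizer rather than relying on this estimate.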
Do GLM and Kimi support image or vision input?
Kimi K2.7 Code is multimodal — it includes a MoonViT vision encoder and accepts image (and video) input, so you can hand it a UI screenshot and ask for matching code. GLM-5.2 is text- and code-only; it has no vision input. If your task depends on interpreting an image, use Kimi.
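The vision and context rules above can be combined into a small selection helper. This is only a sketch: the model labels are hypothetical strings, the limits are the approximate figures quoted in this FAQ, and the function ignores cost, latency, and task style.

```python
KIMI_CONTEXT = 256_000    # tokens, per the FAQ
GLM_CONTEXT = 1_000_000   # tokens, upper bound on specialized runtimes


def candidate_models(needs_vision: bool, prompt_tokens: int) -> list[str]:
    """List the models that can accept a request with these requirements."""
    candidates = []
    # Kimi accepts images but is limited to its 256K window.
    if prompt_tokens <= KIMI_CONTEXT:
        candidates.append("kimi-k2.7-code")
    # GLM is text- and code-only, so it is ruled out when vision is needed.
    if not needs_vision and prompt_tokens <= GLM_CONTEXT:
        candidates.append("glm-5.2")
    return candidates


if __name__ == "__main__":
    print(candidate_models(needs_vision=True, prompt_tokens=10_000))
    print(candidate_models(needs_vision=False, prompt_tokens=500_000))
```

An empty list means neither model fits as described, for example a screenshot task whose prompt exceeds 256K tokens; in that case the prompt has to be trimmed or split.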
What sampling and reasoning settings work best?
Both models are tuned around temperature = 1.0 and top_p = 0.95. Kimi K2.7 enforces these values (the API returns an error if you change them), and chain-of-thought 'thinking' is always on. GLM-5.2 makes reasoning tunable: set reasoning_effort to 'high' for most coding and Q&A, and to 'max' for the hardest multi-step planning; you can also disable thinking for quick factual answers. Set max_tokens generously, because GLM's 'max' mode can produce very long reasoning chains.
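As a rough illustration, the settings above could be assembled into request parameters like this. The names temperature, top_p, max_tokens, and reasoning_effort come from the text; the "thinking" field used to switch reasoning off, the family labels "glm" and "kimi", and the default max_tokens value are assumptions made for this sketch, so check each provider's API reference for the real field names.

```python
FIXED_SAMPLING = {"temperature": 1.0, "top_p": 0.95}


def build_params(family: str, effort: str = "high",
                 thinking: bool = True, max_tokens: int = 32_000) -> dict:
    """Build a parameter dict for a chat request (illustrative only)."""
    params = dict(FIXED_SAMPLING, max_tokens=max_tokens)

    if family == "kimi":
        # Kimi always reasons and rejects changed sampling values,
        # so nothing beyond the fixed settings is added.
        if not thinking:
            raise ValueError("Kimi K2.7 always reasons; thinking cannot be disabled")
        return params

    if family == "glm":
        if not thinking:
            # Hypothetical field name for turning reasoning off.
            params["thinking"] = {"type": "disabled"}
            return params
        if effort not in ("high", "max"):
            raise ValueError("effort must be 'high' or 'max'")
        params["reasoning_effort"] = effort
        return params

    raise ValueError(f"unknown model family: {family!r}")


if __name__ == "__main__":
    print(build_params("glm", effort="max", max_tokens=100_000))
    print(build_params("kimi"))
```

Keeping the fixed sampling values in one constant makes it harder to send Kimi a modified temperature by accident.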
How do I write a good prompt for these coding models?
Four things move quality the most: (1) set a clear role and state the exact task in an imperative sentence, (2) let the model reason: both models do chain-of-thought, so ask for a plan before any code, (3) specify the output format precisely (a JSON schema, a file-by-file structure, a test suite), and (4) give constraints: what NOT to do, which dependencies are allowed, and a requirement to self-verify by running tests before claiming success. Every prompt in this library follows that structure.
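The four-part structure can be captured in a small template function. This is a sketch of one possible layout, not the exact format used by the prompts in this library; the function name, wording, and example values are all illustrative.

```python
def build_prompt(role: str, task: str, output_format: str,
                 constraints: list[str]) -> str:
    """Assemble a prompt from role, task, reasoning request, format, and constraints."""
    lines = [
        f"You are {role}.",                                       # (1) role
        f"Task: {task}",                                          # (1) imperative task
        "Before writing any code, outline your plan step by step.",  # (2) reasoning
        f"Output format: {output_format}",                        # (3) format
        "Constraints:",                                           # (4) constraints
    ]
    lines += [f"- {item}" for item in constraints]
    # Self-verification is always appended as the final constraint.
    lines.append("- Run the tests and confirm they pass before claiming success.")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt(
        role="a senior Python engineer",
        task="Refactor the payment module to remove the global state.",
        output_format="one fenced code block per changed file, then a test suite",
        constraints=[
            "Do not change the public function signatures.",
            "Use only the standard library.",
        ],
    ))
```

Keeping the constraints as a list makes it easy to add project-specific rules without touching the rest of the template.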