Reading List

What we're reading

Weekly Gen AI headlines for builders, plus the papers that define the field. Curated by Koobo, refreshed weekly by an AI agent.

Last updated today by Koobo Content Agent

Weekly Headlines

Week of February 8

Anthropic · Feb 5

Opus 4.6 lets you assemble teams of agents that coordinate in parallel. API users also get compaction for longer-running agentic workflows.

OpenAI · Feb 5

The new Codex model handles end-to-end agentic workflows — tool use, computer operation, and multi-step tasks. Available in Cursor and VS Code.

OpenAI · Feb 5

Frontier treats agents like employees — build, deploy, and manage them at org scale. Targets the gap between model intelligence and production agent ops.

Google Blog · Feb 4

Gemini 3 Flash combines Gemini 3 Pro's reasoning with Flash efficiency. Available now via Gemini API, Vertex AI, and Gemini CLI.

GitHub · Feb 5

Claude Code's source is public on GitHub but uses a custom license, not MIT. Developers debate the distinction as agent tooling forks emerge.

Business Insider · Feb 3

DeepSeek's new architecture could shape its next major model. Analysts are split on whether a standalone R2 is coming or whether the work folds into a larger release.


Groundbreaking

Recent breakthroughs that changed the landscape.

2022 · 5,000 citations

ReAct: Synergizing Reasoning and Acting in Language Models

Yao et al.

Showed that interleaving reasoning traces with actions lets language models solve complex tasks by thinking and acting in alternation. ReAct is the conceptual foundation for most modern AI agent architectures — reason about what to do, then do it, then reason again.

agents · reasoning · tool-use
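The think-act-observe loop the paper describes can be sketched in a few lines. This is a minimal sketch only: `llm` and `tools` are hypothetical callables standing in for a model call and a tool registry, not the paper's implementation.

```python
# Minimal ReAct-style loop. `llm` is assumed to return a dict with a
# "thought", an "action" name, and an "input" for that action.
def react_loop(question, llm, tools, max_steps=5):
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        # 1. Reason: ask the model for a thought and a proposed action.
        step = llm(transcript + "Thought:")
        transcript += f"Thought:{step['thought']}\n"
        if step["action"] == "finish":
            return step["input"]  # the model's final answer
        # 2. Act: run the chosen tool, then feed the observation back in
        #    so the next round of reasoning can use it.
        observation = tools[step["action"]](step["input"])
        transcript += (f"Action: {step['action']}[{step['input']}]\n"
                       f"Observation: {observation}\n")
    return None  # no answer within the step budget
```

The alternation is the whole idea: each tool result lands back in the transcript, so the next "Thought:" is conditioned on everything observed so far.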
2023 · 3,000 citations

Toolformer: Language Models Can Teach Themselves to Use Tools

Schick et al.

Demonstrated that language models can learn to use external tools (calculators, search engines, APIs) through self-supervised learning. Established that tool use is a learnable skill, not just a prompting trick — a key insight for building capable AI agents.

tool-use · agents · self-supervised
2023 · 12,000 citations

Llama 2: Open Foundation and Fine-Tuned Chat Models

Touvron et al.

Meta's release of high-quality open-weight models with permissive licensing catalyzed the open-source AI ecosystem. Llama 2 proved that open models could approach proprietary performance, launching a wave of community fine-tuning and derivative models.

open-source · fine-tuning · Meta
2024 · 2,500 citations

Mixtral of Experts

Jiang et al.

Demonstrated that mixture-of-experts architectures can match models 6x their active parameter count. By activating only a subset of parameters per token, MoE models achieve large-model quality at small-model inference cost — a key efficiency breakthrough.

mixture-of-experts · efficiency · Mistral
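The routing idea is easy to sketch. Below is a toy top-k gate, assuming per-expert scores for a token are already computed; real routers learn these scores from the hidden state, and this is illustrative only.

```python
import math

def route(token_scores, k=2):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(range(len(token_scores)),
                 key=lambda i: token_scores[i], reverse=True)[:k]
    exp = [math.exp(token_scores[i]) for i in top]
    total = sum(exp)
    return [(i, e / total) for i, e in zip(top, exp)]

def moe_forward(x, experts, token_scores, k=2):
    # Only k of len(experts) experts run for this token: the model keeps
    # large total capacity while paying small per-token compute.
    return sum(w * experts[i](x) for i, w in route(token_scores, k))
```

With 8 experts and k=2 (Mixtral's configuration), each token touches a quarter of the expert parameters, which is where the "large-model quality at small-model inference cost" claim comes from.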
2025 · 1,500 citations

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

DeepSeek AI

Trained for an estimated $6 million, DeepSeek-R1 matched OpenAI o1's reasoning capabilities and was released under the MIT license. Validated that frontier-level reasoning can be achieved through RL without expensive supervised fine-tuning, fundamentally altering the economics of AI development.

reasoning · reinforcement-learning · efficiency · open-source
2024 · 800 citations

Qwen2.5 Technical Report

Qwen Team

Alibaba's Qwen2.5 series demonstrated that open-source models trained on 18 trillion tokens across 29 languages could match or exceed proprietary models on coding, math, and reasoning benchmarks. The subsequent Qwen3 variants outperformed OpenAI o3 on advanced mathematics.

open-source · multilingual · Alibaba
2023 · 3,000 citations

Gemini: A Family of Highly Capable Multimodal Models

Gemini Team, Google

Google's natively multimodal model family demonstrated that training on interleaved text, image, audio, and video from the start produces stronger cross-modal reasoning than bolting modalities onto a text model. Set new benchmarks for multimodal understanding.

multimodal · Google · frontier
2024 · 500 citations

The Claude Model Family: Claude 3.5 System Card

Anthropic

Anthropic's detailed system card for Claude 3.5 set a new standard for AI transparency, documenting model capabilities, safety evaluations, and known limitations. Demonstrated how responsible AI development can coexist with frontier capabilities.

safety · alignment · Anthropic

Foundational

The canonical papers that define the field.

2017 · 130,000 citations

Attention Is All You Need

Vaswani et al.

Introduced the Transformer architecture, replacing recurrence with self-attention for sequence modeling. This paper is the foundation of every modern large language model — GPT, BERT, Llama, Claude, and Gemini all descend from this architecture.

transformers · attention · architecture
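The paper's core operation, scaled dot-product attention, has a one-line standard form:

```latex
% Queries Q, keys K, values V; d_k is the key dimension.
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V
```

Every token attends to every other token via the softmax over query-key similarities, which is what replaces recurrence.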
2018 · 95,000 citations

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Devlin et al.

Demonstrated that pre-training a bidirectional transformer on unlabeled text, then fine-tuning on specific tasks, dramatically outperforms training from scratch. Established the pre-train/fine-tune paradigm that defines modern NLP.

pre-training · bidirectional · NLP
2020 · 40,000 citations

Language Models are Few-Shot Learners

Brown et al.

Showed that scaling language models to 175 billion parameters enables few-shot learning — performing tasks from just a few examples without fine-tuning. Proved that scale itself is a path to general capability.

scaling · few-shot · GPT
2022 · 12,000 citations

Training language models to follow instructions with human feedback

Ouyang et al.

Introduced RLHF (Reinforcement Learning from Human Feedback) to align language models with human intent. This technique transformed raw language models into useful assistants — the key innovation behind ChatGPT and every instruction-tuned model since.

RLHF · alignment · instruction-following
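The reward-modeling step at the heart of RLHF can be sketched as a Bradley-Terry-style preference loss: given a human judgment that one response beats another, train the reward model to score the winner higher. This is a simplified sketch; the paper's full pipeline also includes supervised fine-tuning and PPO against the learned reward.

```python
import math

def preference_loss(reward_chosen, reward_rejected):
    """-log sigmoid(r_chosen - r_rejected): small when the chosen
    response already outscores the rejected one, large otherwise."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

Minimizing this over a dataset of human preference pairs pushes the reward model to rank outputs the way annotators do, and that reward then steers the policy model.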
2020 · 8,000 citations

Scaling Laws for Neural Language Models

Kaplan et al.

Established precise mathematical relationships between model size, dataset size, compute budget, and performance. These scaling laws became the strategic blueprint for training larger and more capable models — directly informing investment decisions across the industry.

scaling · compute · power-laws
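The headline relationships are simple power laws in parameters and data (constants shown are the approximate exponents reported in the paper):

```latex
% Test loss as a power law in non-embedding parameters N and dataset size D,
% each measured with the other factors unconstrained.
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \quad \alpha_N \approx 0.076
L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}, \quad \alpha_D \approx 0.095
```

The practical consequence: loss falls predictably on a log-log plot as you scale, so teams could budget compute for a target loss before training.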
2022 · 9,000 citations

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

Wei et al.

Demonstrated that prompting models to show their reasoning step-by-step dramatically improves performance on math, logic, and multi-step tasks. Chain-of-thought is now a standard technique in both prompting and model training.

reasoning · prompting · chain-of-thought
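The technique is purely a prompting pattern: include worked examples whose answers show intermediate steps. The sketch below uses the paper's well-known tennis-ball demonstration; any chat-capable model API could consume a string like this.

```python
# A few-shot chain-of-thought prompt: the exemplar answer spells out its
# reasoning, so the model is nudged to reason step-by-step on the new question.
prompt = """Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls.
Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 balls.
5 + 6 = 11. The answer is 11.

Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought
6 more, how many apples do they have?
A:"""
```

Without the worked reasoning in the exemplar, models of that era tended to emit an answer directly and get multi-step arithmetic wrong far more often.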
2022 · 3,500 citations

Constitutional AI: Harmlessness from AI Feedback

Bai et al.

Introduced a method for training AI systems to be helpful and harmless using a set of principles (a 'constitution') rather than extensive human labeling. Pioneered AI-to-AI feedback for alignment, reducing dependence on human annotation.

safety · alignment · RLAIF
2020 · 7,000 citations

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

Lewis et al.

Combined retrieval systems with generative models, allowing language models to access external knowledge at inference time. RAG is now the standard architecture for building AI systems that need to work with specific, up-to-date, or proprietary information.

RAG · retrieval · knowledge
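The retrieve-then-generate pattern reduces to two steps: rank documents against the query, then prompt the model with the winners as context. A minimal sketch, where `embed` and `llm` are hypothetical callables; production systems use a vector index rather than this linear scan.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def retrieve(query, corpus, embed, k=2):
    """Rank documents by dot-product similarity to the query embedding."""
    q = embed(query)
    return sorted(corpus, key=lambda doc: -dot(q, embed(doc)))[:k]

def rag_answer(query, corpus, embed, llm, k=2):
    context = "\n".join(retrieve(query, corpus, embed, k))
    # The model answers grounded in retrieved passages rather than
    # relying only on what it memorized during training.
    return llm(f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")
```

Because the corpus lives outside the model, swapping in fresh or proprietary documents updates the system's knowledge with no retraining.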

Want to see AI analysis in action?

Try our AI Strategy Analyzer — describe a work or business scenario and get an instant agentic AI assessment.

Try the AI Strategy Analyzer