Bibliography: LLM Applications and Safety
This bibliography complements Chapter 8 by focusing on applied large language models: how we steer them, evaluate them, and deploy them safely.
Prompting and tool use
- Lilian Weng (OpenAI). Prompt Engineering.
- Anthropic. Constitutional AI: Harmlessness from AI Feedback (Bai et al., 2022).
Post-training: instruction tuning and alignment
- Ouyang et al. (2022). Training language models to follow instructions with human feedback (InstructGPT).
- Bai et al. (2022). Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback.
Evaluation, reliability, and safety
- OpenAI. Evals.
- Ribeiro et al. (2020). Beyond Accuracy: Behavioral Testing of NLP Models with CheckList.
Privacy and deployment considerations
- NIST. AI Risk Management Framework (AI RMF 1.0).