📚 Bibliography: Transformers and Attention Mechanisms

Foundational Papers

The Original Paper

Essential Reading

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017)
"Attention Is All You Need"
📄 arXiv:1706.03762 | Direct PDF

The paper that revolutionized natural language processing. It introduces the Transformer architecture and its self-attention mechanism, dispensing entirely with recurrence (RNNs) and convolutions for sequence tasks.

Key concepts: multi-head attention, positional encoding, encoder-decoder architecture
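
A minimal sketch of two core formulas from the paper, written in plain Python so it runs without extra libraries: scaled dot-product attention, softmax(QKᵀ/√d_k)·V, and the sinusoidal positional encoding. The helper names are illustrative and do not come from any library. Multi-head attention (not shown) runs several of these attention operations in parallel on learned linear projections of Q, K and V and concatenates the results.

```python
import math


def softmax(xs):
    # Subtract the max before exponentiating for numerical stability
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V; Q, K, V are lists of row vectors."""
    d_k = len(K[0])
    output = []
    for q in Q:
        # One score per key: dot(q, k) / sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Each output row is the attention-weighted sum of the value rows
        output.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return output


def sinusoidal_positional_encoding(num_positions, d_model):
    """PE(pos, 2i) = sin(pos / 10000^(2i/d_model)); PE(pos, 2i+1) = cos(same angle)."""
    table = []
    for pos in range(num_positions):
        row = []
        for j in range(d_model):
            angle = pos / (10000 ** (2 * (j // 2) / d_model))
            row.append(math.sin(angle) if j % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


# Toy example: the query matches the first key, so the first value row gets more weight
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
print(scaled_dot_product_attention(Q, K, V))
print(sinusoidal_positional_encoding(num_positions=3, d_model=4))
```

Dividing by √d_k keeps the dot products from growing with the vector dimension, which would otherwise push the softmax into regions with very small gradients.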

Transformer-Based Models

BERT (Bidirectional Encoder Representations)

Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018)
"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding"
📄 arXiv:1810.04805 | PDF
🔗 Official GitHub | Google AI Blog

BERT introduces bidirectional pretraining: through a masked language modeling objective, the model learns the full context of a word by looking at the tokens both to its left and to its right.
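
The difference between BERT's bidirectional context and a left-to-right model can be pictured as an attention mask. In this illustrative sketch (hypothetical helper names, toy sentence), a bidirectional mask lets the [MASK] position see the whole sentence, while a causal mask, as used by GPT-style decoders, only exposes the current and earlier tokens.

```python
def bidirectional_mask(n):
    # Every position may attend to every other position (BERT-style encoder)
    return [[1] * n for _ in range(n)]


def causal_mask(n):
    # Position i may attend only to positions j <= i (GPT-style decoder)
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def visible_context(mask, tokens, i):
    # Tokens that position i is allowed to attend to under the given mask
    return [tok for tok, allowed in zip(tokens, mask[i]) if allowed]


tokens = ["the", "patient", "[MASK]", "a", "fever"]
print(visible_context(bidirectional_mask(5), tokens, 2))  # all five tokens
print(visible_context(causal_mask(5), tokens, 2))         # ['the', 'patient', '[MASK]']
```

Only the bidirectional mask lets the prediction for "[MASK]" use the right-hand context ("a fever"), which is the point of BERT's pretraining objective.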

GPT (Generative Pre-trained Transformer)

Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018)
"Improving Language Understanding by Generative Pre-Training" (GPT-1)
📄 OpenAI Paper
🔗 OpenAI Blog

Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019)
"Language Models are Unsupervised Multitask Learners" (GPT-2)
📄 OpenAI Paper
🔗 OpenAI Blog

Brown, T., Mann, B., Ryder, N., et al. (2020)
"Language Models are Few-Shot Learners" (GPT-3)
📄 arXiv:2005.14165 | PDF

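The GPT series demonstrates the power of autoregressive (next-token prediction) pretraining and of scaling up language models, culminating in the few-shot abilities of GPT-3 (175 billion parameters).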
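
Autoregressive generation can be sketched as a loop that repeatedly picks the next token given the prefix and appends it. The probability table below is a hypothetical stand-in for a trained GPT model (a real model conditions on the entire prefix, not only the last token), and greedy selection is just one of several decoding strategies.

```python
# Hypothetical toy "model": next-token probabilities keyed by the last token only.
TOY_NEXT_TOKEN_PROBS = {
    "<bos>": {"attention": 0.6, "language": 0.4},
    "attention": {"is": 0.9, "<eos>": 0.1},
    "is": {"all": 0.8, "<eos>": 0.2},
    "all": {"you": 0.7, "<eos>": 0.3},
    "you": {"need": 0.95, "<eos>": 0.05},
    "need": {"<eos>": 1.0},
    "language": {"<eos>": 1.0},
}


def toy_model(prefix):
    # Receives the whole prefix, like a real model, but only looks at the last token
    return TOY_NEXT_TOKEN_PROBS[prefix[-1]]


def greedy_generate(next_token_probs, max_new_tokens=10):
    """Append the most likely next token one step at a time until <eos> or the limit."""
    sequence = ["<bos>"]
    for _ in range(max_new_tokens):
        probs = next_token_probs(sequence)
        token = max(probs, key=probs.get)
        if token == "<eos>":
            break
        sequence.append(token)
    return sequence[1:]  # drop the <bos> marker


print(greedy_generate(toy_model))  # ['attention', 'is', 'all', 'you', 'need']
```

Each generated token becomes part of the input for the next step, which is why generation is sequential even though training on known text can be parallelized with a causal mask.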

T5 (Text-to-Text Transfer Transformer)

Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W., & Liu, P. J. (2020)
"Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"
📄 arXiv:1910.10683 | PDF
🔗 GitHub

T5 recasts every NLP task as a text-to-text problem (text in, text out), demonstrating the versatility of Transformers.
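
As a rough illustration of the text-to-text framing, the sketch below casts different tasks as (input text, target text) pairs with a task prefix. The prefixes follow the style of the examples in the T5 paper; the task keys and field names are hypothetical, and the actual preprocessing lives in the official T5 codebase.

```python
def to_text_to_text(task, **fields):
    """Cast one NLP example as an (input text, target text) pair with a task prefix.

    Task keys and field names are hypothetical illustrations."""
    if task == "translate_en_de":
        return f"translate English to German: {fields['text']}", fields["translation"]
    if task == "summarize":
        return f"summarize: {fields['document']}", fields["summary"]
    if task == "stsb":
        # Even a regression score becomes a string that the decoder must generate
        source = f"stsb sentence1: {fields['sentence1']} sentence2: {fields['sentence2']}"
        return source, f"{fields['score']:.1f}"
    raise ValueError(f"unknown task: {task}")


src, tgt = to_text_to_text("translate_en_de", text="That is good.", translation="Das ist gut.")
print(src)  # translate English to German: That is good.
print(tgt)  # Das ist gut.
```

Because inputs and outputs are always strings, one model, loss function and decoding procedure can serve translation, summarization, classification and regression alike.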

Vision Transformer (ViT)

Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al. (2021)
"An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale"
📄 arXiv:2010.11929 | PDF
🔗 Google AI Blog

Shows that the Transformer architecture can be applied successfully beyond text: by splitting an image into 16x16 patches and treating each patch as a token, ViT brought Transformers into computer vision.
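
A minimal pure-Python sketch of the patch extraction step behind the paper's title, assuming the image is stored as nested lists of shape H x W x C. In ViT each flattened patch is then linearly projected and combined with a position embedding; those steps are not shown here.

```python
def image_to_patches(image, patch_size=16):
    """Split an H x W x C image (nested lists) into flattened, non-overlapping patches.

    Each patch becomes one 'token' of length patch_size * patch_size * C,
    and patches are returned in raster order (row by row)."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = []
            for r in range(top, top + patch_size):
                for c in range(left, left + patch_size):
                    patch.extend(image[r][c])  # append the C channel values of this pixel
            patches.append(patch)
    return patches


# A blank 224 x 224 RGB image yields (224 / 16)^2 = 196 patches of 16 * 16 * 3 = 768 values
image = [[[0.0, 0.0, 0.0] for _ in range(224)] for _ in range(224)]
patches = image_to_patches(image)
print(len(patches), len(patches[0]))  # 196 768
```

The sequence length the Transformer sees is therefore fixed by image size and patch size, not by the number of pixels.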

Verified Educational Resources

Visual and Interactive Tutorials

Resource	Author/Source	Description	Link
The Illustrated Transformer	Jay Alammar	Step-by-step visual explanation of the Transformer architecture with intuitive diagrams	jalammar.github.io
The Illustrated GPT-2	Jay Alammar	Visual walkthrough of how GPT-2 works, from tokens to predictions	jalammar.github.io
Visualizing A Neural Machine Translation Model	Jay Alammar	Mechanics of attention in machine translation	jalammar.github.io
LLM Visualization	Brendan Bycroft	Interactive 3D visualization of a GPT architecture	bbycroft.net/llm
Transformer Explainer	Polo Club of Data Science (Georgia Tech)	Interactive in-browser explorer of Transformers	poloclub.github.io

Courses and Official Documentation

Resource	Institution	Level	Link
Hugging Face NLP Course	Hugging Face	Beginner to Advanced	huggingface.co/learn
CS224N: Natural Language Processing with Deep Learning	Stanford University	Advanced	web.stanford.edu
Deep Learning Specialization	DeepLearning.AI (Coursera)	Intermediate	coursera.org
Transformer Models Documentation	Hugging Face	All levels	huggingface.co/docs
Attention and Transformers	MIT 6.S191	Intermediate	YouTube

Official Blogs and Research Articles

Google AI Blog

"Transformer: A Novel Neural Network Architecture for Language Understanding" (2017)
blog.research.google
"Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing" (2018)
blog.research.google
"REALM: Retrieval-Augmented Language Model Pre-Training" (2020)
blog.research.google

OpenAI Blog

"Improving Language Understanding with Unsupervised Learning" - GPT-1 introduction (2018)
openai.com
"Better Language Models and Their Implications" - GPT-2 (2019)
openai.com
"GPT-3: Language Models are Few-Shot Learners" (2020)
openai.com
"ChatGPT: Optimizing Language Models for Dialogue" (2022)
openai.com

Meta AI Blog

"RoBERTa: A Robustly Optimized BERT Pretraining Approach" (2019)
ai.meta.com
"LLaMA: Open and Efficient Foundation Language Models" (2023)
ai.meta.com

Microsoft Research

"Turing-NLG: A 17-billion-parameter language model" (2020)
microsoft.com

Papers on Attention Mechanisms

Bahdanau, D., Cho, K., & Bengio, Y. (2014)
"Neural Machine Translation by Jointly Learning to Align and Translate"
📄 arXiv:1409.0473
Introduces a soft-alignment attention mechanism for neural machine translation, predating the Transformer

Luong, M. T., Pham, H., & Manning, C. D. (2015)
"Effective Approaches to Attention-based Neural Machine Translation"
📄 arXiv:1508.04025
Global and local attention variants, with multiplicative ("dot", "general") scoring functions
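
To make the difference between these scoring functions concrete, here is a minimal sketch of Bahdanau-style additive scoring, vᵀ tanh(W s + U h), next to Luong's multiplicative "dot" and "general" scores, followed by the softmax-weighted context vector. The matrices W, U and the vector v stand in for learned parameters and are hypothetical; in Bahdanau et al. the query s is the previous decoder state, while Luong et al. score with the current one.

```python
import math


def matvec(M, x):
    # Multiply matrix M (list of rows) by vector x
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def additive_score(s, h, W, U, v):
    """Bahdanau-style additive score: v^T tanh(W s + U h)."""
    hidden = [math.tanh(a + b) for a, b in zip(matvec(W, s), matvec(U, h))]
    return sum(vi * hi for vi, hi in zip(v, hidden))


def dot_score(s, h):
    """Luong 'dot' score: s^T h."""
    return sum(a * b for a, b in zip(s, h))


def general_score(s, h, W):
    """Luong 'general' score: s^T W h."""
    return dot_score(s, matvec(W, h))


def context_vector(s, encoder_states, score_fn):
    """Softmax-normalize one score per encoder state, then take the weighted sum."""
    scores = [score_fn(s, h) for h in encoder_states]
    m = max(scores)
    exps = [math.exp(x - m) for x in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(encoder_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, encoder_states)) for d in range(dim)]
    return context, weights


# Hypothetical parameters and states (identity matrices stand in for learned weights)
I = [[1.0, 0.0], [0.0, 1.0]]
encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
decoder_state = [1.0, 0.0]
print(context_vector(decoder_state, encoder_states, dot_score))
print(context_vector(decoder_state, encoder_states,
                     lambda s, h: additive_score(s, h, I, I, [1.0, 1.0])))
```

The additive form needs its own small feed-forward layer per score, while the dot form is a single inner product; the Transformer later adopted a scaled version of the dot form.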

Cheng, J., Dong, L., & Lapata, M. (2016)
"Long Short-Term Memory-Networks for Machine Reading"
📄 arXiv:1601.06733
Self-attention (intra-attention) inside LSTM networks

Transformer Optimizations and Variants

Kitaev, N., Kaiser, Ł., & Levskaya, A. (2020)
"Reformer: The Efficient Transformer"
📄 arXiv:2001.04451
Uses locality-sensitive hashing (LSH) attention and reversible layers to reduce memory and compute on long contexts

Beltagy, I., Peters, M. E., & Cohan, A. (2020)
"Longformer: The Long-Document Transformer"
📄 arXiv:2004.05150
Combines sliding-window local attention with task-specific global attention, scaling linearly with document length

Wang, S., Li, B. Z., Khabsa, M., Fang, H., & Ma, H. (2020)
"Linformer: Self-Attention with Linear Complexity"
📄 arXiv:2006.04768
Projects keys and values to a fixed lower dimension, reducing self-attention complexity from O(n²) to O(n)

Zaheer, M., Guruganesh, G., Dubey, A., et al. (2020)
"Big Bird: Transformers for Longer Sequences"
📄 arXiv:2007.14062
Sparse attention combining random, sliding-window, and global tokens to handle longer sequences efficiently
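
A back-of-the-envelope way to compare these approaches is to count how many attention scores each one computes for a sequence of n tokens. This sketch covers only two of the ideas above, in simplified form: a pure sliding window (Longformer also adds global tokens, and BigBird adds random and global ones) and a Linformer-style projection to length k. The window and projection sizes are hypothetical, and score counts are only a proxy for real memory and compute.

```python
def full_attention_scores(n):
    """Standard self-attention: every token scores every token, n * n entries."""
    return n * n


def sliding_window_scores(n, one_side):
    """Local window: token i scores the tokens within `one_side` positions of it,
    clipped at the sequence boundaries."""
    return sum(min(n - 1, i + one_side) - max(0, i - one_side) + 1 for i in range(n))


def low_rank_projection_scores(n, k):
    """Linformer-style: keys and values projected from length n to k, n * k entries."""
    return n * k


n = 4096  # hypothetical sequence length
print("full:", full_attention_scores(n))
print("sliding window:", sliding_window_scores(n, one_side=256))
print("low-rank (k=256):", low_rank_projection_scores(n, k=256))
```

For n = 4096, the full score matrix has 16,777,216 entries, versus 2,035,456 for a window of 256 positions on each side and 1,048,576 for k = 256; doubling n quadruples the first count but only roughly doubles the other two.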

Medical Applications of Transformers

Lee, J., Yoon, W., Kim, S., et al. (2020)
"BioBERT: a pre-trained biomedical language representation model"
📄 Bioinformatics | arXiv:1901.08746
🔗 GitHub

Alsentzer, E., Murphy, J., Boag, W., et al. (2019)
"Publicly Available Clinical BERT Embeddings"
📄 arXiv:1904.03323
BERT models further pretrained on clinical notes

Gu, Y., Tinn, R., Cheng, H., et al. (2021)
"Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing"
📄 ACM Transactions on Computing for Healthcare
PubMedBERT: pretrained from scratch on biomedical literature (PubMed)

Singhal, K., Azizi, S., Tu, T., et al. (2023)
"Large language models encode clinical knowledge"
📄 Nature
Med-PaLM: an LLM adapted to the medical domain

Thirunavukarasu, A. J., Ting, D. S. J., Elangovan, K., et al. (2023)
"Large language models in medicine"
📄 Nature Medicine
Comprehensive review of LLMs in medicine

Interviews and Talks

Foundational Technical Talks

"Attention is All You Need" - Author Discussion
📹 NeurIPS 2017 Talk
Original presentation of the paper by its authors

Ilya Sutskever: OpenAI and AGI
📹 Lex Fridman Podcast #94
Co-founder of OpenAI and co-author of the GPT-1 and GPT-2 papers (not an author of "Attention Is All You Need")

Ashish Vaswani: Transformers
📹 Lex Fridman Podcast #208
First author of "Attention Is All You Need"

Andrej Karpathy: Neural Networks and Transformers
📹 YouTube Channel
Former Director of AI at Tesla; in-depth explanations of neural network architectures

Major Conferences

NeurIPS (Conference on Neural Information Processing Systems)
ICLR (International Conference on Learning Representations)
ACL (Association for Computational Linguistics)
EMNLP (Empirical Methods in Natural Language Processing)
NAACL (North American Chapter of ACL)

Tools and Libraries

Library	Organization	Description	Link
Transformers	Hugging Face	Main library for Transformer models	GitHub
JAX	Google	High-performance framework for ML	GitHub
PyTorch	Meta AI	Deep learning framework	pytorch.org
TensorFlow	Google	End-to-end ML platform	tensorflow.org
Fairseq	Meta AI	Sequence modeling toolkit	GitHub
AllenNLP	AI2	NLP library built on PyTorch	allennlp.org

Recommended Books

"Speech and Language Processing" (3rd ed. draft)
Jurafsky, D., & Martin, J. H.
📚 Free Online Draft

"Natural Language Processing with Transformers"
Tunstall, L., von Werra, L., & Wolf, T. (2022)
📚 O'Reilly

"Deep Learning"
Goodfellow, I., Bengio, Y., & Courville, A. (2016)
📚 deeplearningbook.org

Communities and Forums

Hugging Face Forums: discuss.huggingface.co
r/MachineLearning: reddit.com/r/MachineLearning
Papers with Code: paperswithcode.com
AI Alignment Forum: alignmentforum.org

Note on the References

All references were verified and were active as of October 2025. The arXiv links, official blogs, and documentation were checked to ensure they are accessible.