Since their introduction in 2017, transformers have quickly become the dominant architecture for achieving state-of-the-art results on a variety of natural language processing tasks. If you're a data scientist or coder, this practical book (now revised in full color) shows you how to train and scale these large models using Hugging Face Transformers, a Python-based deep learning library.
Transformers have been used to write realistic news stories, improve Google Search queries, and even create chatbots that tell corny jokes. In this guide, authors Lewis Tunstall, Leandro von Werra, and Thomas Wolf, among the creators of Hugging Face Transformers, use a hands-on approach to teach you how transformers work and how to integrate them in your applications. You'll quickly learn a variety of tasks they can help you solve.
- Build, debug, and optimize transformer models for core NLP tasks, such as text classification, named entity recognition, and question answering
- Learn how transformers can be used for cross-lingual transfer learning
- Apply transformers in real-world scenarios where labeled data is scarce
- Make transformer models efficient for deployment using techniques such as distillation, pruning, and quantization
- Train transformers from scratch and learn how to scale to multiple GPUs and distributed environments
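To give a flavor of the hands-on approach the blurb describes, here is a minimal sketch using the library's pipeline API for the first task in the list, text classification. The specific checkpoint named below is an illustrative assumption, not one prescribed by the book.

```python
from transformers import pipeline

# Downloads a pretrained checkpoint from the Hugging Face Hub on first use.
# The checkpoint choice here is an assumption for illustration only.
classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

result = classifier("Transformers have quickly become the dominant NLP architecture.")
print(result)  # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```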
I was afraid this book would be redundant given how much information one can find online about transformers and the Hugging Face platform. Still, it turned out to be a very concise and pragmatic introduction to the topic and a valuable reference book with dozens of tips for training and tailoring your data tasks to the transformers paradigm.
This is a super cool book on NLP using the HuggingFace 🤗 ecosystem. It's well-written, and you can read it quite quickly (except for two very technical but important chapters). I would recommend it to anyone who has basic experience with deep learning and wants to dive into NLP.
Both a great primer on the subject and a nice collection of more advanced 'gotchas'. Would have loved to see a bit more on the 'in production' side of things, but a great read nonetheless :-)
As a programmer I found this book initially interesting for laying the groundwork for a more fundamental understanding of transformers and why they are so hyped. I really hoped, naively, that I would read something groundbreaking in terms of computation, and also hoped to come away able to use transformers in future programming undertakings.
I was wrong. But that is really not the book's fault. The technology around transformers and LLMs strikes me as a black box where the underlying abstractions are not only difficult to understand but also difficult to put to use without tremendous effort and data. I truly do not understand where this is applicable outside of text-heavy domains, and therefore I am a bit disappointed in the technology.
At the same time, the book has put a bit of calmness in me when I read stories of so-called AI products and software, because now I know at least the fundamentals and how hard they really are to put to use. All those abstractions remind me of the inner workings of an RDBMS, and I know that RDBMSs are a career path in their own right. Hence I am also starting to believe that, at this stage, transformers have a long and steep learning curve before you can really put them to use, and the underlying technology must not change so much that it becomes difficult to follow over time.
I liked the book; it introduced me to a lot of things that are outside my daily work, and I will definitely glance through the fundamentals many times again.
Accessibly written, with useful code examples and lots of directly actionable information on how to use HuggingFace tools. The chapters on making models efficient for production and on dealing with situations in which few labels are available are especially illuminating. It seems as if HuggingFace has developed a number of useful abstraction layers to make ML engineers more productive, especially around storing and accessing both models and data in a straightforward manner. Training your own neural network and deploying it has never been as easy as it is today, and this book is a useful introduction to the ecosystem. The authors state that as of writing, 20,000 models had been shared on the model hub, but by the time I checked, the number was already 10 times as high. The explanations of model architectures and of technical details such as self-attention are also well written, though given the current pace of technological change, they are sure to need another revision in only a couple of years.
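As a small illustration of the abstraction layers this reviewer mentions, any checkpoint shared on the Hub can be pulled down by name with from_pretrained. The checkpoint and label count below are illustrative assumptions; note the classification head is freshly initialized and would still need fine-tuning.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# An assumed checkpoint name, used here only to show the loading pattern
checkpoint = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# Attaches a new (randomly initialized) two-label classification head
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

inputs = tokenizer("Loading pretrained weights takes one line.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)  # torch.Size([1, 2])
```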
This book does a good job of introducing and explaining the concepts behind transformers. I especially like the Named Entity Recognition section, which also explains how to do model debugging.
I wish that this book focused a bit more on the Hugging Face ecosystem rather than only on the transformers part. When tackling your own custom problem, you will often have to deal with Hugging Face tokenizers and datasets. In my opinion they are vital to solving an NLP problem with Hugging Face, and this book sadly does not give them enough attention.
Overall I did very much enjoy this book; however, I am still left a bit hungry with respect to solving real-world problems using Hugging Face.
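For readers wondering what the tokenizers and datasets side this reviewer refers to looks like, here is a minimal sketch: load a corpus with the datasets library and tokenize it in bulk with Dataset.map. The imdb dataset and DistilBERT checkpoint are assumptions chosen for illustration, not examples singled out by the review.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Assumed corpus and checkpoint, used only to demonstrate the pattern
dataset = load_dataset("imdb", split="train")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    # Truncate long reviews to the model's maximum input length
    return tokenizer(batch["text"], truncation=True)

# batched=True passes many examples per call, which is much faster
tokenized = dataset.map(tokenize, batched=True)
print(tokenized.column_names)  # ['text', 'label', 'input_ids', 'attention_mask']
```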
Very good overview of the capabilities of the transformer architecture. Great examples. However, more complicated math topics are glossed over in a very inelegant way. The math typography is really bad. Some of the examples are really contrived.
Really nice overview of all things Transformers. Sometimes very vendor-centric, but they tried their best to be as neutral and inclusive as possible. One complaint I have is that it kinda starts midway, without any context about Transformers, but maybe that's fine given the target audience of the book.
A lot of code and technical details. I read half of the book and scanned through the other half. I wish there were more details on architecture design choices instead. Nevertheless, a very well-written book.
A highly recommended book for anyone looking to understand how Transformer architecture works, combining theoretical concepts with practical code examples to ensure a thorough grasp of these models.
I’m impressed by how clearly the author explains the architectures and their applications. The clarity definitely makes complex concepts approachable to both technical and non-technical people.