Using Gemma LLM: A Step-by-Step Guide

Introduction 

Large language models (LLMs) are increasingly becoming powerful tools for understanding and generating human language. These models have achieved state-of-the-art results on different natural language processing tasks, including text summarization, machine translation, question answering, and dialogue generation. LLMs have even shown promise in more specialized domains, like healthcare, finance, and law.

Google has been at the forefront of LLM research and development, releasing a series of open models that have pushed the boundaries of what is possible with this technology. These models include BERT, T5, and T5X, which have been widely adopted by researchers and practitioners alike. In this Guide, we introduce Gemma, a new family of open LLMs developed by Google. 

What is Gemma? 

Gemma is a family of open language models based on Google’s Gemini models, trained on up to 6T tokens of text. These are considered to be the lighter versions of Gemini models. The Gemma family consists of two sizes: a 7 billion parameter model for efficient deployment on GPU and TPU, and a 2 billion parameter model for CPU and on-device applications. Gemma exhibits strong generalist capabilities in text domains and state-of-the-art understanding and reasoning skills at scale. It achieves better performance compared to other open models of similar or larger scales across different domains, including question answering, commonsense reasoning, mathematics and science, and coding. For both the models, the pre-trained, finetune checkpoints and open-source codebase for inference and serving are released by the Google Team.

Gemma

Gemma builds upon recent advancements in sequence models, transformers, deep learning, and large-scale training in a distributed manner. It continues Google’s history of releasing open models and ecosystems, following Word2Vec, Transformer, BERT, T5, and T5X. The responsible release of Gemma aims to improve the safety of frontier models, provide equitable access to this technology, give the path to rigorous evaluation and analysis of current techniques, and foster the development of future innovations. However, thorough safety testing specific to each Use Case is crucial before deploying or using Gemma.

Gemma – Model Architecture 

Gemma follows the architecture of a decoder-only transformer that was introduced way back in 2017. Both the Gamma 2B and the 7B models have a vocabulary size of 256k. Both models even have a context length of 8192 tokens. The Gemma even includes the recent advancements made in the transformers’ architecture including: 

How was Gemma Trained?

Gemma 2B and 7B models were trained on 2T and 6T tokens, respectively, of primarily-English data sourced from Web Docs, mathematics, and code. Unlike Gemini models, which include multimodal elements and are optimized for multilingual tasks, Gemma models focus is on processing English text. The training data underwent a careful filtering process to remove Unwanted or Unsafe Content, including personal information and sensitive data. This filtering involved both heuristic methods and model-based classifiers to ensure the quality and safety of the dataset.

Gemma 2B and 7B models underwent supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to further refine their performance. The supervised fine-tuning involved a mix of text-only, English-only synthetic, and human-generated prompt-response pairs. Data mixtures for fine-tuning were carefully selected based on LM-based side-by-side evaluations, with different Prompt sets designed to highlight specific capabilities like the instruction following, factuality, creativity, and safety.

Even, synthetic data underwent several stages of filtering to remove examples containing personal information or toxic outputs, following the approach established by Gemini for improving model performance without compromising safety. Finally, reinforcement learning from human feedback involved collecting pairs of preferences from human raters and training a reward function under the Bradley-Terry model. This function was then optimized using a type of REINFORCE to further refine the models’ performance and mitigate potential issues like reward hacking.

Benchmarks and Performance Metrics 

Looking at the results, Gemma outperforms Mistral on five out of six benchmarks, with the sole exception being HellaSwag, where they get similar accuracy. This dominance is clearly evident in tasks like ARC-c and TruthfulQA, where Gemma surpasses Mistral by nearly 2% and 2.5% in accuracy and F1 score, respectively. Even on MMLU, where Perplexity scores are lower is better, Gemma achieves a prominently lower Perplexity, indicating a better grip of language patterns. These results solidify Gemma’s position in being a powerful language model, capable of handling complex NLP tasks with good accuracy and efficiency.

Gemma 

Getting Started with Gemma 

In this section, we will get started with Gemma. We will be working with Google Colab because it comes with a free GPU. Before we get started, we need to accept Google’s Terms and Conditions to download the model.

Step 1: Opening Gemma

Click on this link to go to Gemma on HuggingFace. You will be presented with something like the below:

Gemma 

Step 2: Click on Acknowledge License

If you click on Acknowledge License , then you will see a page as below.

Gemma 

Click on Authorize. Done we are now ready to download the model. Before, let’s generate a new HuggingFace Token. For this, you can go to the HuggingFace Settings and Generate a new Token, this token will be useful because we need it to authorize inside Google Colab to download the Google Gemma Large Language Model.

Step 3: Installing Libraries

To get started, we first need to install the following libraries.

Step 4: Typing Important Command

Now, type the below command

Step 5: Inferencing the model

Now let’s try inferencing the model.

Step 6: Response Generation

Running the code has generated the following response

Conclusion

Gemma, Google’s latest addition to its suite of open language models, presents advancement in the field of natural language processing. With its strong generalist capabilities and state-of-the-art understanding and reasoning skills, Gemma outperforms other open models across different domains including question answering, commonsense reasoning, mathematics and science, and coding tasks. Built upon recent advancements in sequence models, transformers, and large-scale training techniques, Gemma provides improved performance and efficiency, making it a powerful tool for researchers and practitioners alike. However, responsible deployment and thorough safety testing specific to each problem are compulsory before integrating Gemma into production systems.

Frequently Asked Questions

Q1. What is Gemma?

A. Gemma is a family of open language models developed by Google, providing strong generalist capabilities and state-of-the-art understanding and reasoning skills in different domains.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

Latest articles

Related articles