Meta Llama: Everything you need to know about the open generative AI model


Like every Big Tech company these days, Meta has its own flagship generative AI model, called Llama. Llama is somewhat unique among major models in that it's "open," meaning developers can download and use it however they please (with certain limitations). That's in contrast to models like Anthropic's Claude, Google's Gemini, xAI's Grok, and most of OpenAI's ChatGPT models, which can only be accessed via APIs.

In the interest of giving developers choice, however, Meta has also partnered with vendors, including AWS, Google Cloud, and Microsoft Azure, to make cloud-hosted versions of Llama available. In addition, the company publishes tools, libraries, and recipes in its Llama cookbook to help developers fine-tune, evaluate, and adapt the models to their domain. With newer generations like Llama 3 and Llama 4, these capabilities have expanded to include native multimodal support and broader cloud rollouts.

Here's everything you need to know about Meta's Llama, from its capabilities and editions to where you can use it. We'll keep this post updated as Meta releases upgrades and introduces new dev tools to support the model's use.

What is Llama?

Llama is a family of models, not just one. The latest version is Llama 4; it was released in April 2025 and includes three models:

  • Scout: 17 billion active parameters, 109 billion total parameters, and a context window of 10 million tokens. 
  • Maverick: 17 billion active parameters, 400 billion total parameters, and a context window of 1 million tokens. 
  • Behemoth: Not yet released, but it will have 288 billion active parameters and 2 trillion total parameters.  

(In data science, tokens are subdivided bits of raw data, like the syllables "fan," "tas," and "tic" in the word "fantastic.")  

A model's context, or context window, refers to the input data (e.g., text) that the model considers before generating output (e.g., additional text). Long context can prevent models from "forgetting" the content of recent docs and data, and from veering off topic and extrapolating wrongly. However, longer context windows can also result in the model "forgetting" certain safety guardrails and being more prone to produce content that is in line with the conversation, which has led some users toward delusional thinking.  

For reference, the 10-million-token context window that Llama 4 Scout promises roughly equals the text of about 80 average novels. Llama 4 Maverick's 1-million-token context window equals about eight novels.  
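As a back-of-the-envelope check of that comparison (the tokens-per-novel figure below is our assumption, not Meta's):

```python
# Back-of-the-envelope check of the "novels per context window" comparison.
# Assumption (not Meta's figure): an average novel is roughly 125,000 tokens,
# i.e. about 90,000 words at ~1.4 tokens per English word.
TOKENS_PER_NOVEL = 125_000

def novels_that_fit(context_window_tokens: int) -> float:
    """How many average-length novels fit in a given context window."""
    return context_window_tokens / TOKENS_PER_NOVEL

print(novels_that_fit(10_000_000))  # Scout's 10M-token window -> 80.0 novels
print(novels_that_fit(1_000_000))   # Maverick's 1M-token window -> 8.0 novels
```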


All of the Llama 4 models were trained on "large amounts of unlabeled text, image, and video data" to give them "broad visual understanding," as well as on 200 languages, according to Meta.  

Llama 4 Scout and Maverick are Meta's first open-weight natively multimodal models. They're built using a "mixture-of-experts" (MoE) architecture, which reduces computational load and improves efficiency in training and inference. Scout, for example, has 16 experts, and Maverick has 128 experts.   

Llama 4 Behemoth includes 16 experts, and Meta is referring to it as a teacher for the smaller models. 
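To make the active-versus-total parameter distinction concrete, here is a minimal, illustrative sketch of MoE routing in plain Python (not Meta's implementation): a small router scores every expert for each token, but only the top-k experts actually run, which is why "active" parameters are far fewer than "total" parameters.

```python
import math

def softmax(scores):
    """Normalize a list of scores into weights that sum to 1."""
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route(router_scores, k=1):
    """Return (expert index, weight) pairs for the top-k scoring experts.

    Only these k experts would run on this token; the rest stay idle.
    """
    top = sorted(range(len(router_scores)),
                 key=lambda i: router_scores[i], reverse=True)[:k]
    weights = softmax([router_scores[i] for i in top])
    return list(zip(top, weights))

# 16 experts (as in Scout), but this token activates only one of them:
scores = [0.1, 2.3, -0.5, 1.1] + [0.0] * 12
print(route(scores, k=1))  # -> [(1, 1.0)]
```

In a real MoE layer the router is learned and each expert is a full feed-forward network; the sketch only shows the selection step.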

Llama 4 builds on the Llama 3 series, which included the 3.1 and 3.2 models widely used for instruction-tuned applications and cloud deployment. 

What can Llama do?

Like other generative AI models, Llama can perform a range of different assistive tasks, like coding and answering basic math questions, as well as summarizing documents in at least 12 languages (Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese). Most text-based workloads, such as analyzing large files like PDFs and spreadsheets, are within its purview, and all Llama 4 models support text, image, and video input. 

Llama 4 Scout is designed for longer workflows and massive data analysis. Maverick is a generalist model that is better at balancing reasoning power and response speed, and is suitable for coding, chatbots, and technical assistants. And Behemoth is designed for advanced research, model distillation, and STEM tasks.  

Llama models, including Llama 3.1, can be configured to leverage third-party applications, tools, and APIs to perform tasks. They are trained to use Brave Search for answering questions about recent events, the Wolfram Alpha API for math- and science-related queries, and a Python interpreter for validating code. However, these tools require proper configuration and are not automatically enabled out of the box. 
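The configuration details vary by serving stack, but the application-side plumbing usually amounts to routing a structured tool call from the model to the real tool. Here is a hedged sketch; the JSON shape and the stub functions are illustrative, not Llama's exact wire format:

```python
import json

# Illustrative sketch: dispatching a model-emitted tool call to host-side code.
# A real deployment would call the actual Brave Search API and run model code
# in a sandbox; these stubs just show the routing logic.

def brave_search(query: str) -> str:
    return f"[search results for {query!r}]"  # stub for the real API call

def python_interpreter(code: str) -> str:
    return repr(eval(code))  # toy stand-in; never eval untrusted code unsandboxed

TOOLS = {"brave_search": brave_search, "python_interpreter": python_interpreter}

def dispatch(model_output: str) -> str:
    """Parse a call like {"tool": ..., "arguments": {...}} and run it."""
    call = json.loads(model_output)
    fn = TOOLS.get(call["tool"])
    if fn is None:
        raise ValueError(f"model requested unknown tool: {call['tool']}")
    return fn(**call["arguments"])

print(dispatch('{"tool": "python_interpreter", "arguments": {"code": "2 + 2"}}'))  # -> 4
```

The tool result would then be fed back into the model's context so it can compose a final answer.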

Where can I use Llama?

If you're looking to simply chat with Llama, it's powering the Meta AI chatbot experience on Facebook Messenger, WhatsApp, Instagram, Oculus, and Meta.ai in 40 countries. Fine-tuned versions of Llama are used in Meta AI experiences in over 200 countries and territories.  

The Llama 4 models Scout and Maverick are available on Llama.com and from Meta's partners, including the AI developer platform Hugging Face. Behemoth is still in training. Developers building with Llama can download, use, or fine-tune the model across most of the popular cloud platforms. Meta claims it has more than 25 partners hosting Llama, including Nvidia, Databricks, Groq, Dell, and Snowflake. And while selling access to Meta's openly available models isn't Meta's business model, the company makes some money through revenue-sharing agreements with model hosts. 

Some of these partners have built additional tools and services on top of Llama, including tools that let the models reference proprietary data and enable them to run at lower latencies. 

Importantly, the Llama license constrains how developers can deploy the model: App developers with more than 700 million monthly users must request a special license from Meta, which the company will grant at its discretion. 

In May 2025, Meta launched a new program to incentivize startups to adopt its Llama models. Llama for Startups gives companies support from Meta's Llama team and access to potential funding.  

Alongside Llama, Meta provides tools intended to make the model "safer" to use:  

  • Llama Guard, a moderation framework. 
  • CyberSecEval, a cybersecurity risk assessment suite. 
  • Llama Firewall, a security guardrail designed to enable building secure AI systems. 
  • Code Shield, which provides support for inference-time filtering of insecure code produced by LLMs.  

Llama Guard tries to detect potentially problematic content either fed into or generated by a Llama model, including content relating to criminal activity, child exploitation, copyright violations, hate, self-harm, and sexual abuse. That said, it's clearly not a silver bullet, since Meta's own previous guidelines allowed the chatbot to engage in sensual and romantic chats with minors, and some reports show those turned into sexual conversations. Developers can customize the categories of blocked content and apply the blocks to all the languages Llama supports. 
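In practice, Llama Guard models respond with a short verdict ("safe," or "unsafe" followed by category codes) that the host application interprets. Here is a hedged sketch of that interpretation step; the category codes and names below are examples, since the exact taxonomy varies by Llama Guard version:

```python
# Illustrative sketch: interpreting a Llama Guard-style verdict.
# The guard model replies "safe", or "unsafe" with category codes on the next
# line. The codes/names here are examples, not an exhaustive or exact taxonomy.

CATEGORIES = {"S1": "violent crimes", "S2": "non-violent crimes", "S10": "hate"}

def moderate(verdict: str) -> tuple[bool, list[str]]:
    """Return (allowed, triggered category names) for a raw guard response."""
    lines = verdict.strip().splitlines()
    if lines[0] == "safe":
        return True, []
    codes = lines[1].split(",") if len(lines) > 1 else []
    return False, [CATEGORIES.get(c.strip(), c.strip()) for c in codes]

print(moderate("safe"))            # -> (True, [])
print(moderate("unsafe\nS1,S10"))  # -> (False, ['violent crimes', 'hate'])
```

Customizing blocked categories then amounts to editing the category list the guard model is prompted with and mapping its codes accordingly.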

Like Llama Guard, Prompt Guard can block text intended for Llama, but only text meant to "attack" the model and get it to behave in undesirable ways. Meta claims that Prompt Guard can protect against explicitly malicious prompts (i.e., jailbreaks that attempt to get around Llama's built-in safety filters) in addition to prompts that contain "injected inputs." Llama Firewall works to detect and prevent risks like prompt injection, insecure code, and risky tool interactions. And Code Shield helps mitigate insecure code suggestions and offers secure command execution for seven programming languages. 

As for CyberSecEval, it's less a tool than a collection of benchmarks to measure model security. CyberSecEval can assess the risk a Llama model poses (at least according to Meta's criteria) to app developers and end users in areas like "automated social engineering" and "scaling offensive cyber operations." 

Llama’s limitations

Image Credits: Artificial Analysis

Llama comes with certain risks and limitations, like all generative AI models. For example, while its most recent model has multimodal features, those are mainly limited to the English language for now. 

Zooming out, Meta used a dataset of pirated e-books and articles to train its Llama models. A federal judge recently sided with Meta in a copyright suit brought against the company by 13 book authors, ruling that the use of copyrighted works for training fell under "fair use." However, if Llama regurgitates a copyrighted snippet and someone uses it in a product, they could potentially be infringing on copyright and be liable.  

Meta also controversially trains its AI on Instagram and Facebook posts, photos, and captions, and makes it hard for users to opt out.  

Programming is another area where it's wise to tread lightly when using Llama. That's because Llama might, perhaps more so than its generative AI counterparts, produce buggy or insecure code. On LiveCodeBench, a benchmark that tests AI models on competitive coding problems, Meta's Llama 4 Maverick model achieved a score of 40%. That's compared to 85% for OpenAI's GPT-5 (high) and 83% for xAI's Grok 4 Fast. 

As always, it's best to have a human expert review any AI-generated code before incorporating it into a service or piece of software. 

Finally, as with other AI models, Llama models are still guilty of generating plausible-sounding but false or misleading information, whether that's in coding, legal guidance, or emotional conversations with AI personas.  

This was originally published on September 8, 2024, and is updated regularly with new information.
