113
u/ohwut 1d ago
133
u/Tobio-Star 1d ago
10M tokens context window is insane
64
u/Fruit_loops_jesus 1d ago
Thinking the same. Llama is the only model approved at my job. This might actually make my life easier.
5
u/Ok_Kale_1377 1d ago
Why llama in particular is approved?
55
u/PM_ME_A_STEAM_GIFT 1d ago
Not OP, but I assume because it's self-hostable, i.e. company data stays in-house.
15
u/Exciting-Look-8317 1d ago
He works at meta probably
5
u/Thoughtulism 22h ago
Zuck is sitting there looking over his shoulder right now smoking that huge bong
4
u/MalTasker 1d ago
So are qwen and deepseek and theyre much better
16
u/ohwut 1d ago
Many companies won’t allow models developed outside the US to be used on critical work even when they’re hosted locally.
8
u/Pyros-SD-Models 1d ago
Which makes zero sense. But that’s how the suits are. Wonder what their reasoning is against models like gemma, phi and mistral then.
18
u/ohwut 1d ago
It absolutely makes sense.
You have to work from two assumptions: people are stupid and won’t review the AI’s work, and people are malicious.
It’s absolutely trivial to taint AI output with the right training. A Chinese model could easily be trained to output malicious code in certain situations, or to output specifically misleading data in critical situations.
Obviously any model has the same risks, but there’s an inherent trust toward models made by yourself or your geopolitical allies.
-3
u/rushedone ▪️ AGI whenever Q* is 1d ago
Chinese models can be run uncensored
(the open source ones at least)
2
u/Lonely-Internet-601 1d ago
It’s impractical to approve and host every single model. Similar things happen with suppliers at big companies: they have a few approved suppliers, as it’s time-consuming to vet everyone.
•
u/Perfect-Campaign9551 4m ago
Might be nice if I could use that! We are stuck on default Copilot with a crappy 64k context. It barfs all the time now because it updated itself with some sort of search function that seems to search the codebase, which of course will fill the context window pretty quickly....
15
u/ezjakes 1d ago
While it may not be better than Gemini 2.5 in most ways, I am glad they are pushing the envelope in certain respects.
6
u/Proof_Cartoonist5276 1d ago
Llama 4 is a non reasoning model
17
u/mxforest 1d ago
A reasoning model is coming. There are four in total: two released today, with Behemoth and the reasoning model still in training.
1
u/RipleyVanDalen We must not allow AGI without UBI 1d ago
Wrong. Llama 4 is a series of models. One of which is a reasoning model.
5
1
1
u/IllegitimatePopeKid 1d ago
For those not so in the loop, why is it insane?
22
9
u/mxforest 1d ago
128k context has been a limiting factor in many applications. I frequently deal with data in the 500-600k token range, so I have to run multiple passes to first condense and then rerun on the combination of the condensed outputs. This makes my life easier.
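The multi-pass workflow described above can be sketched roughly like this. `call_llm` is a hypothetical stand-in for whatever completion API is actually in use, and the 100k chunk size is an illustrative budget under a 128k window:

```python
# Rough sketch of the condense-then-combine workflow for inputs that
# exceed a 128k-token window. `call_llm` is a hypothetical stand-in
# for a real completion API.

def call_llm(prompt: str) -> str:
    # Hypothetical; replace with a real API call.
    return prompt[:1000]

def chunk(tokens: list[str], window: int = 100_000) -> list[list[str]]:
    """Split a token list into window-sized pieces."""
    return [tokens[i:i + window] for i in range(0, len(tokens), window)]

def condense(tokens: list[str], window: int = 100_000) -> str:
    # Pass 1: summarize each chunk independently.
    partials = [call_llm("Summarize:\n" + " ".join(c)) for c in chunk(tokens, window)]
    # Pass 2: rerun on the combined summaries.
    return call_llm("Combine these summaries:\n" + "\n".join(partials))

doc = ["word"] * 550_000          # ~550k tokens, as in the comment above
print(len(chunk(doc)))            # 6 chunks at a 100k window
```

A 10M window makes both passes unnecessary for inputs this size: the whole document fits in one call.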
3
u/SilverAcanthaceae463 1d ago
Many SOTA models were already much more than 128k, namely 1M, but 10M is really good
3
u/Iamreason 1d ago
Outside of 2.5 Pro's recent release none of the 1M context models have been particularly good. This hopefully changes that.
Lots of codebases bigger than 1M tokens too.
1
u/Purusha120 22h ago
Many SOTA models were already much more than 128k, namely 1M
Literally the only definitive SOTA model with 1M+ context is 2.5 pro. 2.0 thinking and 2.0 pro weren’t SOTA, and outside of that, the implication that there have been other major players in long context is mostly wrong. Claude’s had 200k for a second with significant performance drop off, and OpenAI’s were limited to 128k. So where is “many” coming from?
But yes, 10M is very good… if it works well. So far we only have needle in a haystack benchmarks which aren’t very useful for most real life performance.
0
163
u/xRolocker 1d ago
Oh hello!
Edit: 10 million context window???? What the f-
45
u/Proud_Fox_684 1d ago
Only the smallest model will have 10 million tokens context window.
25
2
u/Duckpoke 1d ago
Seems especially useful for something where model size doesn’t matter. Like a virtual personal assistant
154
u/Busy-Awareness420 1d ago
23
u/Sir-Thugnificent 1d ago edited 1d ago
Somebody please explain to me what « context window » means and why should I be hyped about it
Edit : thank y’all for the answers !
62
u/ChooChoo_Mofo 1d ago
basically it’s how many tokens (letters or groups of letters) the LLM can use as “context” in its response. 10M tokens is like 7M words.
so, you could give Llama 4 a 7M word book and ask about it and it could summarize it, talk about it, etc. or you could have an extremely long conversation with it and it could remember things said at the beginning (as long as the entire chat is within the 10M token limit).
10M context is just absolutely massive - even the 2M context from Gemini 2.5 is crazy. Think huge code bases, an entire library of books, etc.
60
u/Tkins 1d ago
The Lord of the rings trilogy has 550k words for instance.
-1
u/chrisonetime 1d ago
True, but don’t tokens count as characters and spaces, not words? And the entire context window is a blend of input (your prompts) and output (AI response) tokens?
8
u/Rain_On 1d ago
Tokens are words, fragments of words, individual characters or punctuation.
You can see examples here:
https://platform.openai.com/tokenizer
3
8
u/Jolly-Ground-3722 ▪️competent AGI - Google def. - by 2030 1d ago
Or you can feed an entire codebase of a big software project into it, at once, so it understands it in its entirety.
1
1
u/Majinvegito123 1d ago
This is great, but how much of that context is usable? Gemini 2.5 stands out because it can effectively handle context >500k tokens.
6
u/PwanaZana ▪️AGI 2077 1d ago
It's how many tokens (letters/words) the model can keep in its short term memory. When you go above that number in a conversation (or if you feed a pdf or code to a model that's too long), the model goes crazy.
(If I'm wrong on this, I'm sure reddit will let me know)
2
u/iruscant 1d ago
"Goes crazy" is a bit much, it just starts forgetting the earlier parts of the conversation.
The frustrating thing has always been that most online chatbot sites don't just tell you when it's happening, so you just have to guess and you might not realize the AI is forgetting old stuff until many messages later. Google's AI Studio site has a token count on the right and it's great, but having a colossal 10M context is also one way to get rid of the problem.
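The silent truncation described above is typically a sliding window over the message history: when the conversation exceeds the token budget, the oldest messages are dropped first. A minimal sketch, assuming per-message token counts are already known (a real client would run a tokenizer):

```python
# Sketch of silent context truncation: keep only the most recent
# messages that fit within the token budget. Oldest messages are
# forgotten first, without any warning to the user.

def fit_to_window(messages: list[tuple[str, int]], budget: int) -> list[tuple[str, int]]:
    """Keep the most recent (text, token_count) messages that fit in `budget`."""
    kept, used = [], 0
    for text, tokens in reversed(messages):
        if used + tokens > budget:
            break                      # everything older is dropped
        kept.append((text, tokens))
        used += tokens
    return list(reversed(kept))

history = [("msg0", 60_000), ("msg1", 50_000), ("msg2", 40_000)]
print(fit_to_window(history, budget=100_000))  # msg0 silently dropped
```

A visible token counter, like the one in AI Studio, just surfaces `used` against `budget` so you can see when this starts happening.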
1
4
u/PrimitiveIterator 1d ago
The context window is just the size of the input the model can accept. So if 1 word = 1 token (which is not true but gets the idea across), 10m context means the model could handle 10 million words of input at once. So if you wanted it to summarize many books, a few pdfs and have a long conversation about it, it could do that without missing any of that information in its input for each token it generates.
Why you should be hyped though? Idk be hyped about what you want to be hyped about. 10m context is good for some people, but not others. It depends on your use case.
3
u/Own-Refrigerator7804 1d ago
When you start a chat with a model, it knows a lot but doesn't remember anything you said in other chats. Context is "memory": it remembers the things you asked and the things the AI answered. With this much context you can upload a book or a paper and the model will know everything in it.
3
1
u/mxforest 1d ago
Complete message history size. You can load up more data or have conversation for longer while still maintaining knowledge of old conversations.
1
36
u/CMDR_Crook 1d ago
But can it code?
11
16
u/jazir5 1d ago
The benchmarks put Llama Scout above Gemma 3 and 2.0 Flash-Lite, and below 4o and 2.0 Flash. So not really. Models that are o1-tier running locally look a couple of months further out than I thought, hopefully by August. The mid-tier and high-tier models sound legit, but ain't no one running those on home systems.
-5
u/ninjasaid13 Not now. 1d ago
Who says they won't release an RL-tuned version as Llama 4.5?
2
u/jazir5 1d ago edited 1d ago
I didn't say that, I meant these are not ready to use for coding on local personal computers yet, that's probably 4-6 months out for it to be o1 tier and actually usable.
4o is terrible at coding, and the current mid-tier Llama 4 model has roughly that accuracy, which requires a multi-H100 server to run. And Llama 4 Scout (which is around Gemini 2.0 Flash-Lite level, which is a joke capability-wise) requires a single H100 to run the 4-bit quant.
We're still a ways off from high powered local models, but I think we should easily be there by September, latest by October.
2
u/ninjasaid13 Not now. 1d ago
I don't think the o1 or 4.5 tier model is supposed to be the ones currently released, it is supposed to be the behemoth tier.
65
u/BreadCrustSucks 1d ago
And I thought 1 mil context was massive 🤯
56
u/mxforest 1d ago
Gemini was boasting 1M and soon to be available 2M. Then mr Zuck walks in and slaps his massive size 10 bong on the table.
108
u/Jolly-Ground-3722 ▪️competent AGI - Google def. - by 2030 1d ago
13
u/Gratitude15 1d ago
This is the world we live in now. I mean...
This should be the new bar for memes
1
27
u/Pyros-SD-Models 1d ago
The 10m bong has to prove first that it’s actually 10m of usable context and is not shitting the bed after 8k tokens.
Until now it’s just a number.
5
4
u/Thinklikeachef 1d ago
Yeah it's great news. I'll certainly be interested in testing it. How much is actually usable?
3
u/analtelescope 1d ago
I mean, we don't actually know how it performs at beyond 1 million tokens context. Like, theoretically, every model is infinite context if you don't account for performance past a certain point.
45
70
u/Halpaviitta Virtuoso AGI 2029 1d ago
10m??? Is this the exponential curve everyone's hyped about?
46
u/Informal_Warning_703 1d ago
Very amusing to see the contrast in opinions in this subreddit vs the local llama subreddit:
Most people here: "Wow, this is so revolutionary!"
Most people there: "This makes no fucking sense and it's barely better than 3.3 70b"
21
u/BlueSwordM 1d ago
I mean, it is a valid opinion.
HOWEVER, considering the model was natively trained on 256k context, it'll likely perform quite a bit better.
I'll still wait for proper benchmarks though.
1
u/johnkapolos 1d ago
Link for the 256k claim? Or perhaps it's on the release page and I missed it?
6
u/BlueSwordM 1d ago
"Llama 4 Scout is both pre-trained and post-trained with a 256K context length, which empowers the base model with advanced length generalization capability."
2
23
4
u/Bitter-Good-2540 1d ago
Don't get your hopes up. Doesn't help if the model forgets everything after 1 million tokens
5
u/hopelesslysarcastic 1d ago
No, but what it does mean is that we can expect all new foundation models from every lab to now be at or near that benchmark going forward.
Basically, this latest generation was trained on an OOM more compute… Llama 4 is one of the first of that generation coming to market at this new foundational context level; others will follow suit.
1
10
35
u/calashi 1d ago
10M context window basically means you can throw a big codebase there and have an oracle/architect/lead at your disposal 24/7
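Whether a given codebase actually fits is easy to rough out with the common ~4-characters-per-token heuristic. A sketch, with the extension list and ratio as assumptions rather than measurements against a real tokenizer:

```python
import os

# Rough check of whether a codebase fits in a 10M-token window, using
# the ~4-characters-per-token heuristic. Extensions and ratio are
# illustrative assumptions, not real tokenizer output.

CHARS_PER_TOKEN = 4
CODE_EXTS = {".py", ".ts", ".go", ".rs", ".java", ".c", ".cpp", ".h"}

def estimate_repo_tokens(root: str) -> int:
    total_chars = 0
    for dirpath, _, files in os.walk(root):
        for name in files:
            if os.path.splitext(name)[1] in CODE_EXTS:
                path = os.path.join(dirpath, name)
                try:
                    with open(path, encoding="utf-8", errors="ignore") as f:
                        total_chars += len(f.read())
                except OSError:
                    continue
    return total_chars // CHARS_PER_TOKEN

# tokens = estimate_repo_tokens("path/to/repo")
# print(tokens <= 10_000_000)
```

By this heuristic, 10M tokens is roughly 40 MB of source text, which covers most codebases but not the largest monorepos.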
30
u/Bitter-Good-2540 1d ago
The big question will be: how good will it be with this context? Sonnet 1,2 or 3 level?
5
u/jazir5 1d ago
Given Gemini's performance until 2.5 pro, almost certainly garbage above 100k tokens, and likely leaning into gibberish territory after 50k. Gemini's 1M context window was entirely on paper, this will likely play out the same, but hoo boy do I want to be wrong.
3
u/OddPermission3239 1d ago
Gemini accuracy is still around 128k which is great if you think about it.
2
u/thecanonicalmg 1d ago
I’m wondering how many h100s you’d need to effectively hold the 10M context window. Like $50/hour if renting from a cloud provider maybe?
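The hardware question above is dominated by the KV cache, which grows linearly with context length. A back-of-the-envelope sketch; the layer count, KV-head count, and head dimension below are illustrative guesses, not Scout's published configuration:

```python
# Back-of-the-envelope KV-cache sizing for long context. The
# architecture numbers are illustrative, not Scout's actual config.

def kv_cache_bytes(tokens: int, layers: int = 48, kv_heads: int = 8,
                   head_dim: int = 128, dtype_bytes: int = 2) -> int:
    # 2x for the separate key and value tensors at each layer.
    return 2 * layers * kv_heads * head_dim * dtype_bytes * tokens

cache = kv_cache_bytes(10_000_000)
print(f"{cache / 1e9:.0f} GB")          # ~1966 GB at 10M tokens
print(f"~{cache / 80e9:.1f}x H100")     # KV cache alone, ignoring weights
```

Under these assumptions the cache alone spans dozens of 80 GB GPUs at full 10M context, which is why renting-by-the-hour math gets painful fast. Grouped-query attention and cache quantization shrink this considerably, but the linear growth remains.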
7
14
u/ChooChoo_Mofo 1d ago
Wonder if this could beat Pokémon since it has such a huge context window - isn’t that the issue with Claude? Like, it couldn’t remember enough so it couldn’t get unstuck?
1
u/Purusha120 22h ago
There haven’t been many benchmarks on actual recall and summary/application with this extended context length besides a needle in a haystack evaluation which can be a good preliminary (very basic) metric but not usually representative of many real world tasks. So we’ll have to see how well it holds up. Also, it’ll likely not be as smart as 3.5-3.7 Claude models. I’m excited to see how Gemini 2.5 pro does with this.
3
u/revistabr 1d ago
10m context, but can't answer a simple prompt asking for a react diagram because it hits an output limit.
Not good for real use, at least not the free version (not sure if there are other versions)
18
u/snoee 1d ago
The focus on reducing "political bias" is concerning. Lobotomised models built to appease politicians are not what I want from AGI/ASI.
3
u/MidSolo 1d ago edited 1d ago
I couldn't find anything about reducing political bias on the Llama site. Where did you get that from? Or what do you mean?
Edit: Found it here, scroll to section called "Addressing bias in LLMs".
Addressing bias in LLMs
It’s well-known that all leading LLMs have had issues with bias—specifically, they historically have leaned left when it comes to debated political and social topics. This is due to the types of training data available on the internet.
Our goal is to remove bias from our AI models and to make sure that Llama can understand and articulate both sides of a contentious issue. As part of this work, we’re continuing to make Llama more responsive so that it answers questions, can respond to a variety of different viewpoints without passing judgment, and doesn't favor some views over others.
We have made improvements on these efforts with this release—Llama 4 performs significantly better than Llama 3 and is comparable to Grok:
- Llama 4 refuses less on debated political and social topics overall (from 7% in Llama 3.3 to below 2%).
- Llama 4 is dramatically more balanced with which prompts it refuses to respond to (the proportion of unequal response refusals is now less than 1% on a set of debated topical questions).
- Our testing shows that Llama 4 responds with strong political lean at a rate comparable to Grok (and at half of the rate of Llama 3.3) on a contentious set of political or social topics. While we are making progress, we know we have more work to do and will continue to drive this rate further down.
We’re proud of this progress to date and remain committed to our goal of eliminating overall bias in our models.
19
u/Informal_Warning_703 1d ago
What the fuck are you talking about? Studies have shown that base/foundation models exhibit less political bias than fine-tuned ones. The political bias is the actual lobotomizing that's occurring, as corporations fine-tune the models to exhibit more bias.
[2402.01789] The Political Preferences of LLMs
Measuring Political Preferences in AI Systems: An Integrative Approach | Manhattan Institute
In other words, introducing less bias during the fine-tuning stage will give a more accurate representation of the model (not to mention a more accurate reflection of the human population).
20
u/SgathTriallair ▪️ AGI 2025 ▪️ ASI 2030 1d ago
The question is always: what do the builders consider to be true, and what do they consider to be biased?
Some will say that recognizing transgender people is biased and some will say it is true. Given Zuck's hard turn to the right, I'm concerned about what his definition of unbiased is.
2
u/Tax__Player ▪️AGI 2025 1d ago
What do the builders consider to be true what do they consider to be biased?
Who cares? That's why you don't impose ANY bias in the training. Let the LLM figure out what's true and what's not purely on the broad training data.
8
u/MidSolo 1d ago
This is literally what the chain-leading post was complaining about; Meta focusing on reducing political bias for Llama 4 is a problem.
1
u/Tax__Player ▪️AGI 2025 1d ago
I'm assuming by reducing political bias they mean bias not in the training data but their fine tuning which removes "problematic content".
3
u/SgathTriallair ▪️ AGI 2025 ▪️ ASI 2030 1d ago
In order to turn an LLM into a chat bot you have to do reinforcement learning. This means you give the AI a set of prompts and answers then you give it prompts and rate its answers.
A human does this work, and the human has a perspective on what is true and false and on what is good or bad. If the AI says the earth is flat, they'll mark that down, and if it gets angry and yells at the user, they'll mark that down. An "unbiased response" is merely one that agrees with your own biases. The people doing reinforcement learning don't have access to universal truth, and neither does anything else in the universe. So both the users and the trainers are going off their own concept of truth.
So a "less biased" AI is one that is biased towards its user base. So the question is: who is the user base the builder was imagining when determining whether specific training responses were biased or not?
2
u/oldjar747 1d ago
Almost every model has some sort of corporate neoliberal bias that has pervaded Western culture. I'm not a fan of corporatism or neoliberalism; in fact, I'd probably prefer a Chinese model over that.
-15
u/Informal_Warning_703 1d ago
If you think Zuckerberg took a "hard turn to the right" then you're one of those fringe nutjobs who is part of the problem. People should be concerned about AI that is aligned to any such fringe ideology.
5
3
u/Daedes 1d ago
Are you one of those gamers that took the bait that DEI is ruining everything. I feel bad for you, gullible people :/
-1
u/Informal_Warning_703 1d ago
Yeah, moron, I must be an anti-DEI gamer because I don’t believe Zuckerberg is a hard right-winger. The level of sheer stupidity among Reddit leftists is truly astonishing.
3
u/Daedes 1d ago edited 16h ago
How humorous that you assume I'm a leftist. The Reddit gaymers have truly shallow and tribalistic political views.
Edit: Oh wait, I just had to browse your comment history :P. Don't get mad that people can call you out for being predictable NPCs.
"A coup of what? He’s already the head of the executive branch, including the military. One could also say it’s unprecedented that the military push modern DEI initiatives (those started under Obama) and many of those fired were known for pushing it. You’re just going to be definitively exposed as a nutcase when there’s no “coup”
.
0
u/Informal_Warning_703 9h ago
Only an extreme leftist nutjob would think “This person doesn’t believe Zuckerberg is a hard right winger, therefore they must be a gamer who thinks DEI has ruined everything!”
And, of course, in true nutjob fashion, you dig through months of my comments to try to find any instance where I mentioned DEI. And notice that I actually gave no evaluation of DEI! I didn’t say it was good or bad, I simply said it was recent and the motivation for Trump’s actions in a specific context… and I was right!
So, thanks for demonstrating that you’re another reddit nutjob who is bad at logic. For your own health, you probably shouldn’t spend so much time and effort investigating a random person just to try to draw more tenuous connections. Go outside, my friend.
0
u/Daedes 9h ago
It's just for the record, for a comment string where the sentiment is clear to see. If I were to ask you about the context of the comment thread the quote is from, it would go like this:
Me: Hey, do you think Trump attempted a coup on January 6th?
You: Define coup. From what we know there is no definitive legal statement that defines a coup...
Me: .....
0
u/Informal_Warning_703 9h ago
Unsurprisingly, the nutjob who thinks anyone who believes Zuckerberg is not hard right wing plays games and hates DEI, and who dug through months of comments of a random person to find any mention of DEI, also believes “the sentiment is clear to see” even though no sentiment can be derived from the words themselves.
6
u/MidSolo 1d ago
Llama is made by Meta, which is a corporation owned by Zuckerberg. You're both talking about the same thing. Calm down.
Meta has announced that they are attempting to address bias in LLMs so that the model, instead of adhering to the training data, is forced into an unnatural neutrality:
It’s well-known that all leading LLMs have had issues with bias—specifically, they historically have leaned left when it comes to debated political and social topics. This is due to the types of training data available on the internet.
2
0
u/Awkward_Research1573 1d ago edited 1d ago
That is extremely wrong; you should read up on digital colonialism and the "WEIRD" (Western, educated, industrialised, rich, and democratic) bias most if not all LLMs show, due to their data sets being predominantly Americanised, anglophone content. Right now, LLMs don't show an unbiased view of the human population, and although they are multilingual, they are monocultural.
0
u/Informal_Warning_703 9h ago
How about you demonstrate your claims instead of asking me to do your work for you.
0
u/Awkward_Research1573 8h ago
Sure I can give you something to read. At the end you have to put the work in if you want.
Just to add: I was just rejecting your use of "more accurate reflection of the human population". That more than 50% of the training data is English content is already a dead giveaway as to why LLMs are biased towards American (Western) culture…
1
u/Informal_Warning_703 4h ago
Yes, dumb ass, an LLM that is less biased towards the far left or right of the American political parties *is* a more accurate reflection of the human population. And if you knew anything about logic, instead of just how to do a quick google search for the link you share, you would know that isn't inconsistent with the idea that LLMs are biased toward American culture generally.
1
u/H9ejFGzpN2 1d ago
I don't think it's meant to appease so much as to not take sides and influence elections.
This is possibly the biggest propaganda tool ever made if the model leans to one side instead of sharing facts only.
2
u/Dear-One-6884 ▪️ Narrow ASI 2026|AGI in the coming weeks 1d ago
I need to see the fiction.livebench scores..... But holy fuck 10M context
2
u/Widerrufsdurchgriff 1d ago edited 1d ago
10 million Tokens? What a time to be alive. And people dont see the exponential growth. Im getting hyped every day about new benchmarks and increasing graphs
2
u/Hour_Cry3520 1d ago
10m context window does not necessarily mean accuracy in retrieving all information available in that huge range right?
2
u/ponieslovekittens 1d ago
Correct, it absolutely does not.
Companies are playing sleight of hand with what they even mean by "model" these days, but the TL;DR here is that the context length they're advertising is only possible because they're generating summaries of it and throwing most of the information away.
They may have trained it with longer sequences, but that doesn't mean that the AI will ever even see all the information in an especially large context you give it. They're doing gymnastics to trim it down, hoping you won't notice the degradation.
4
u/Cosmic__Guy 1d ago
Days of flexing 1M context window are gone... Cough cough... google....
1
u/Purusha120 22h ago
We’ll have to see if llama 4 benchmarks past a simple needle in a haystack test back up 10m first but hopefully that’s the case!
4
u/Setsuiii 1d ago
Everyone is getting excited over the context limit but we don’t know how good it actually works. There is usually massive degradation after like 32k context.
4
u/itorcs 1d ago
Looks like it's still behind the new deepseek v3.1 in coding. Which means deepseek r2 is going to be absolutely insane. That's the model I'm waiting for. Maybe this is foolish but if I was forced to bet I'd go all in on r2 overtaking gemini 2.5. Openai better pray full o3 and o4-mini are good but I'm sure they are sweating.
4
u/iDoAiStuffFr 1d ago
A LiveCodeBench score of 49 is decent for a non-thinking model. It also becomes apparent they're spending very high amounts just for another iteration of a huge teacher model, like GPT-4.5. It seems to be worth it in their circles; maybe we completely underestimate good base models. Alternative explanation: they're all gambling the same game and we're stagnating. Maybe they just have this kind of money... while I still work my ass off to pay rent.
2
u/name_is_unimportant 1d ago
Not allowed to use it in the European Union
2
u/Feisty-River-929 1d ago
Conclusion : Stagnation
2
u/etzel1200 1d ago
Oh my god, it doesn’t wipe all benchmarks. Stagnation!
Last summer this would have been insane. Today it's still the biggest context window out there, with some good numbers.
2
u/Feisty-River-929 1d ago
The models are being trained on 6-month cycles. Every 1-3% increment will take exponentially more compute. Hence, LLMs have stagnated. See the o1 training-time accuracy plot for reference.
1
u/dervu ▪️AI, AI, Captain! 1d ago
Can you run any of it on single 4090?
1
1
1
1
u/BriefImplement9843 1d ago edited 1d ago
Whichever model is being used on meta.ai definitely sucks at writing. Hopefully it's Scout. It feels like 3.1 or 3.3; I'm noticing no difference. It says it's Llama 4, but hopefully that's a hallucination.
Context is also horrific. Twenty prompts in, it completely forgot the start of the session, telling me it can't read context from a previous session, lmao. The web version is total garbage and nerfed.
1
u/Bacon44444 1d ago
Holy shit. That cost-to-performance ratio is crazy, and then there's 10M tokens. Is this a reasoning model? I got so excited I forgot to check.
1
u/Curious-Adagio8595 1d ago
Any word on the reliability of that context window. Really skeptical on how much of that 10M context the model is able to actually recall.
1
1
1
u/VisualLibrarian7593 1d ago
Wild to see how fast small models are catching up. Llama 4 Scout is just 17B active params, runs on a single GPU, and still crushes benchmarks. Model size used to mean everything—now it’s all about smarter architectures and better efficiency
1
u/Blankeye434 1d ago
Ignore me, novice here trying to understand transformers: the context window needn't be fixed, right?
What's stopping us from feeding a model an input larger than its context window? Does performance drop, or does it throw an error?
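Both, depending on the setup: most hosted APIs reject over-length inputs outright, while models with relative position schemes (like RoPE) can technically accept longer sequences but degrade past their trained length. The other practical barrier is that naive self-attention cost grows quadratically with sequence length, a point worth seeing numerically:

```python
# Why "just feed it a longer input" isn't free: naive self-attention
# materializes an n x n score matrix per head, so memory grows
# quadratically with sequence length. Numbers are illustrative (fp16).

def attn_matrix_bytes(seq_len: int, dtype_bytes: int = 2) -> int:
    return seq_len * seq_len * dtype_bytes

for n in (8_192, 131_072, 1_000_000):
    print(n, f"{attn_matrix_bytes(n) / 1e9:.2f} GB per head")
```

Techniques like FlashAttention avoid materializing the full matrix, but compute still scales with the square of the sequence length, which is why long-context support has to be designed in rather than bolted on.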
1
u/Darkstar_111 ▪️AGI will be A(ge)I. Artificial Good Enough Intelligence. 22h ago
17B, 16E, 10M context, and 109B params...
Exactly how much vram do I need to run this thing, does anyone know??
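A rough lower bound comes from the weights alone: all 109B parameters must be resident even though only 17B are active per token, so multiply total parameter count by bytes per parameter at your quantization level. A sketch, ignoring KV cache and activation overhead:

```python
# Rough VRAM floor for a 109B-parameter MoE: every expert's weights
# must be loaded even though only ~17B params are active per token.
# KV cache and activations are extra on top of these numbers.

def weight_gb(params_b: float, bits: int) -> float:
    return params_b * bits / 8   # params_b is in billions, so result is GB

for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{weight_gb(109, bits):.1f} GB of weights")
```

This matches the claim elsewhere in the thread that the 4-bit quant (roughly 55 GB of weights) fits on a single 80 GB H100, with headroom left for the cache.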
1
1
1
1
u/FriendlyRope 1d ago
Anything to make the Meta stock go up again. Or at least slow its descent.
3
u/New_World_2050 1d ago
It's weird how AI releases have no effect on the stock, tbh.
Like, one would think having one of the best AI teams in the world would be worth something. Investors are tweaking.
2
1
0
u/Mrleibniz 1d ago
No image generation
3
u/FrermitTheKog 1d ago
No big western company has the balls to open source one. China on the other hand...
266
u/ExoticCard 1d ago
10M context??? OOOHHHH FUCK
Now we're really cooking