GLM-5.2 is the new leading open weights model on Artificial Analysis

by himata4113 | 901 points | 442 comments | 2026-06-17 04:12:00 Central

Comments

Tiberium
It seems to really be a nice step-up and is getting quite close to the frontier. I wish they'd start focusing on the reasoning efficiency now, though. I have a simple (relatively) test task to evaluate LLMs: writing a simple math evaluator library in Nim (it's about 400-600 lines total max), and GLM 5.2 (xhigh which maps to max effort) spent over 15 minutes (!) reasoning, spending about 45k tokens, before it finally wrote the first file.
I know it's hard to improve on that, but now that their models are good enough at raw intelligence, I think this should become a higher priority task.
Currently on https://artificialanalysis.ai/#output-tokens GPT 5.5 xhigh spends 16k tokens total on average, GPT 5.5 high is 10k, Fable 5 33k, Opus 4.8 41k, GLM 5.2 is 42k. GPT 5.5 is extremely reasoning efficient.
Of course if you convert those values to actual request cost, GLM 5.2 will probably beat GPT 5.5/Opus 4.8, but speed matters for a lot of people, I think.

> benjiro29
GLM 5.2 Max = Opus 4.8 Max in thinking behavior. The thinking chain is so similar, and so is the amount of token usage on the output.
If you want reasonable token usage, you need to run GLM 5.2 at High. There is little drop in quality from Max to High (for most tasks). And it cuts token usage by 2 to 2.5x. GLM 5.2 Max is really something you only need for complex tasks.
In essence, GLM 5.2 is Opus 4.8's little brother, at a way, WAY cheaper price.
There has been really no training on Opus models going on, really, none I tell you! /sarcasm

> > matheusmoreira
> GLM 5.2 Max = Opus 4.8 Max in thinking behavior
This is insane! I can't wait until technology progresses to the point we can run these things on consumer hardware!

> > > chartpath
Are there any indications that this will be possible? Consumer hardware will continue getting better but I can't see 512GB RAM in a MacBook Pro any time soon. I'm hoping linear attention techniques plus MoE will make breakthroughs in size/compression and throughput.

> > > > nijave
Well, we're probably not going to be running frontier models anytime soon, but I think the general assumption is smaller models will continue to improve until they're sufficiently good that frontier models aren't needed.
There's potentially also augmentation through tools, harnesses and RAG to help boost how well they work without tons of parameters.

> > > > carter2099
> but I can't see 512GB RAM in a MacBook Pro any time soon
Could totally see this being a comment from a forum in like 1994 but swap out GB for MB and MacBook Pro for whatever the popular consumer PC was at the time

> > > > > r-w
Yeah but the price of RAM wasn't increasing at that point.

> > > > deadbabe
There will be a 1024GB unified memory MacBook Pro.

> > > > matheusmoreira
Certainly not any time soon, but I have faith it'll happen one day.

> > > > majormajor
In the last ten years laptop memory footprints have, what, doubled at the low end? Smallest MacBook Pro in 2016 was 8GB, smallest is 16GB today? Max I think has gone up 8x meanwhile, 16 to 128?
I wonder if there's a bit of a chicken-and-egg issue where there wasn't much that demanded 10x the RAM, so there wasn't much pressure to develop more or increase production to support it at consumer prices.
There's wayyyyyyy more demand for memory generally now, so assuming it's not a demand bubble that pops rapidly, I'd expect the new normal to end up at a much higher baseline. 512GB would be 4x greater than today's max, so even with the relatively slow last 10 years development pace, give it five years max?
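
The extrapolation in this comment can be made explicit with a quick calculation. This is a minimal sketch, assuming the figures quoted above (maximum MacBook Pro memory going from 16GB to 128GB over ten years) and a constant exponential growth rate, which is an assumption and not a forecast. The function name `years_to_reach` is hypothetical.

```python
import math

def years_to_reach(target_gb, current_gb, past_gb, past_years):
    """Years until target_gb, assuming the past exponential growth rate continues."""
    # Doublings observed over the past period, e.g. 16 -> 128 is 3 doublings.
    past_doublings = math.log2(current_gb / past_gb)
    years_per_doubling = past_years / past_doublings
    # Doublings still needed, e.g. 128 -> 512 is 2 doublings.
    needed_doublings = math.log2(target_gb / current_gb)
    return needed_doublings * years_per_doubling

# Figures from the comment: 16GB -> 128GB in 10 years, target 512GB.
print(round(years_to_reach(512, 128, 16, 10), 1))
```

At the past decade's pace this works out to a bit under seven years, so a "five years max" estimate depends on the pace picking up.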

> > > > > regularfry
The problem is that the situation in the RAM market might just... not go away. It's locked in for the next couple of years unless the AI market goes pop. Which it might! But if it doesn't, there's no particular reason to think that the incentives for cornering the market like OpenAI have would go away.
We might see that new normal in five years or so. We will see a new normal sooner than that if there's a run on AI because of the sudden availability of DDR fab capacity, but also we'll probably see the level of local models freeze at whatever state they've got to at that point. But an equally likely outcome is that any new DDR capacity that comes online is just immediately absorbed by frontier AI, and consumer devices stay at "just good enough" for a decade.

> > > > > mikestorrent
The new Macbook Neo is 8GB. I think that if we are lucky, the huge RAM demand right now means new factory buildouts which eventually means more supply and prices go back down, and capacity begins to go up. This level of demand was just not anticipated by anyone.

> > > muyuu
you need 8 x 96GB Blackwell or equivalent
so around US$150k which is Small/Medium-Enterprise territory already, but who knows when it will hit "reasonable" home consumer territory
I think there's hope future generations of unified memory machines may get this sort of memory availability when new fabs open in the next couple of years and then ramp up production for a few years afterwards - that makes ~2030s credible at this point, but nobody can really predict the market that far ahead

> > > > matheusmoreira
> I think there's hope future generations of unified memory machines may get this sort of memory availability
I hope you're right. This is a very exciting idea. The weights are out there. The demand is astronomical. The manufacturers just need to make it happen.

> > > > sterlind
there are cheaper ways to do it. not like, consumer-cheap, but I'm setting up a rig for 80% cheaper than that.
I'm a tad worried about triggering a run on the particular hardware I'm buying though so I'll leave it vague here, but hit me up on Discord if you're curious.

> > > > > sankalpmukim
Hey, very intrigued about how it can be done for cheaper. Sent a friend request to sterlind on Discord, interested if you do a write up

> > > > > muyuu
But at what kind of speed? We're aiming at some speed that would negate the point of even using an off-site provider.

> > > harshit119
This is quite evident for personal AI, but for general intelligence, with current scaling laws and how models keep getting better with more parameters, the path certainly does not converge. Personal AI today is more deprived of context than of token quality. Having an on-system knowledge base paired with Gemma works well to a large extent.

> > FooBarWidget
With such ridiculously long thinking traces I'm surprised max outperforms high. After all, performance falls off a hill after a certain amount of context, and long thinking traces can fill that up really quickly.

> > maxdo
looking at the score this is rather a gemini 3.5 flash competitor, yes, for cheaper, but distance to opus and fable is as big as their price diff.

> > vitalyan123
distillation of thinking models is not particularly effective - both "Open"AI and Misanthropic don't show you the real chain of thought, only its severely downscaled version. both do everything in their power to combat such outrageous copyright infringement, so the bulk of unethically scraped data the Chinese have is from several generations ago.

> > > nyrikki
It is quite likely that the intermediate tokens don't have 'semantic import'[0].
There are methods like Habitual Reasoning Distillation or Inverted Reasoning Traces [1] that can help.
While there are reasons to hide the intermediate tokens from an IP protection standpoint, there is also a need to hide more effective and efficient generation that doesn't fit the R1 claims of an aha moment that has been debunked, but is a consumer expectation.
While hidden intermediate tokens do increase the difficulty, it is not a barrier in itself, especially as they are billed, which gives information about their length.
[0] https://arxiv.org/abs/2504.09762v4
[1] https://arxiv.org/abs/2603.07267

> > > kmeisthax
Chinese distillation attacks are about as unethical as Robin Hood stealing from the rich to give to the poor. The real unethical scraping was done by Anthropic to train Claude.
To be clear, if Anthropic was using totally licensed data, I'd be sympathetic to these claims. But if you're going to pirate the world's creativity you'd better be willing to gimme dat shit for free[0].
[0] As said by Hungry Santa.

> > > duskdozer
>such outrageous copyright infringement
Sarcasm, considering the source of their own training data?

> > > > margalabargala
Considering they called the company "Misanthropic", sarcasm is a safe bet.

> > > > > duskdozer
Somehow, I completely overlooked that.

> > > > orphea
Narrator: it was sarcasm, indeed.

> > > > baron3dl
IP for me, not thee.

> > > Bolwin
For Claude models at least, you can tell it to just manually think in the output and it works fine. I do it regularly because for creative writing and summarization, they seem to believe they don't need to think at all, and get way worse results.

> > > > carterschonwald
this helps so much. i do it too. with some of the newer frontier models it's unclear if you can even turn it off in the first party chat apps. haven't compared api semantics yet.

> > > overfeed
FYI: model outputs are not protected by copyright.

> > > mannanj
The companies that did copyright infringement and unethically scraped data think that copyright infringement and unethically scraping data is wrong and needs to be stopped.
Though only in particular situations, like when it's done to them and not when they do it. Cause they have the power and are morally right and know better than you. And if you question this at all, well you're a threat to American values and a supporter of the Chinese and leading to the break down of Democracy.
This isn't a type of reasoning argument or manipulation tactic used by the rich throughout history to trick the naive and gullible masses or anything like that. Trust me, I'm rich and I'm morally right. /sarcasm

> > > > brookst
It's been amazing to see the arc of tech people going from "evil Disney, copyright is an abomination, information wants to be free" to "OMG copyright is inviolable and AI is taking money out of Plato's descendants' pockets!"

> > > > > solid_fuel
> taking money out of Plato's descendants' pockets
Yeah, remind me - is it Plato's descendants that people are concerned about here, or is it every single author who had any work in Anna's Archive, any work published online, any work published on github, etc?
I think that people are probably upset about the harm to living people who had their work stolen by Meta and other LLM companies - regardless of license, terms of use, or any other attempted protection.

> > > > > > brookst
Sure, that's the motte / bailey. Easy to point to living, starving writers who suffer grievous harm, in defense of perpetual copyright. Disney and others use literally this exact argument year after year.
I'm not even disagreeing. I'm just saying the shift in attitude about copyright in the tech space has been sudden, dramatic, and really funny. Remember "you wouldn't steal a car"? Today's anti-AI tech contingent are enthusiastically embracing that false equivalence that we all laughed at 20 years ago.

> > > > > > toraway
Having a static, immovable belief system about something like copyright that is unaffected by seismic shifts in the real world also doesn't seem very logical.
If, like, Disney did a 180 overnight and bought rights from Google to scan every writer's saved work in Docs with some flimsy legal argument, then a person saying "wait, doesn't copyright actually protect that" would make sense. Even if you were previously upset about them suing schools for using 80-year-old art.

> > > > > > brookst
Sure. So you're saying MPAA was right and you've come around?
Creative works have always been accretive. There has never been a creative work made out of whole cloth, with no debt to any previous work.
The fact your opinions about creative works change based on who's profiting does not change that.

> > > BoorishBears
Reasoning models can be coaxed to reason like they do in dedicated reasoning blocks, outside of those blocks: in normal parts of the response.
But Anthropic at least has openly admitted they try to detect that and interfere

> > > ComputerGuru
Supposedly there are "jailbreaks" that expose considerably more of the thinking traces.

> > > > woctordho
Simple trick: Use an agentic tool like Pi or OpenCode that allows you to switch models. First do some chats with DeepSeek or GLM, which show full thinking traces, then switch to Claude or GPT and it's more likely to show full thinking traces.

> > > mirekrusin
I don't understand why there isn't a public dataset for reasoning that can be improved by humans/LLMs like Wikipedia (i.e. with auto-judging of contributions etc).

> > > > woctordho
There is already a lot of effort to collect agent traces including reasoning, e.g. see the recent discussion: https://old.reddit.com/r/LocalLLaMA/comments/1u795pb/donate_...
We've been developing DataClaw for this: https://github.com/peteromallet/dataclaw

> > > > > mirekrusin
Did I get it wrong, or does the first link have a dataset with only 30 entries?

> > > > logicchains
For reasoning a manually-curated dataset is too small; you need to be able to automatically generate vast volumes of synthetic reasoning data with provably correct answers. That's presumably why Claude and GPT are so good at using Lean (the theorem prover), because they get fed a bunch of synthetic, verifiably correct training data.

> > > > > mirekrusin
Wikipedia is a lot of data as well but we manage to do it, no?

> > > orbital-decay
You can trivially leak the CoT of any current model, it's not a problem.
>outrageous copyright infringement
>unethically scraped data
Hahahahaha

> alexjplant
> It seems to really be a nice step-up and is getting quite close to the frontier.
IMHO it's already surpassed them. I vastly prefer my personal GLM and OpenCode setup to the Claude Code and Opus one that I have to use at work. The former makes way fewer StackOverflow brogrammer-tier mistakes and is considerably better at following instructions. The harness UX is also vastly superior as it doesn't ignore, randomly change, or incorrectly report settings.
Maybe it's the harness and I'd have even greater success with OpenCode and Anthropic, but I think it's safe to say that Anthropic's moat is evaporating.

> > carter2099
You would be surprised at how much of an impact the harness has. I switched to Pi and Chinese open source models, and models that _I know_ are less capable than Sonnet outperform my Sonnet + Claude Code stack at work.

> vorticalbox
A problem I find with Opus is that it will spend so long thinking, then go "but wait, what if", to the point where I stop it and simply tell it to "start writing code, you can work it out as you go along".
Seems writer's block also affects LLMs.

> > robertkarl
https://arxiv.org/abs/2606.00206
In this paper they nerf an LLM's ability to emit waffling thinking tokens like "wait", "but", "alternatively", and the models (they're old, small models in the paper) terminate reasoning faster and perform better. I bet Anthropic is tuning this on their backend.

> > > addandsubtract
Didn't they originally introduce those tokens to make the models smarter by second guessing their "thoughts"?

> > > meatmanek
This is super cool. Do you know if any of the inference backends (llama.cpp, vllm, etc) support this technique?

> > > > iaw
vLLM supports "banning" certain tokens but I don't know if it can dynamically reduce them.
To my knowledge you can also "ban" with llama.cpp but it is passed in the API call rather than to the server at initialization.
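
The general idea behind "banning" tokens can be sketched independently of any backend: before sampling, the logits of the banned token ids are forced to negative infinity so that their probability becomes zero. The following is a minimal sketch in plain Python. The toy vocabulary, token ids and function names are hypothetical, and this is not the actual vLLM or llama.cpp API.

```python
import math

def ban_tokens(logits, banned_ids):
    """Return a copy of logits with the banned token ids set to -inf."""
    banned = set(banned_ids)
    return [float("-inf") if i in banned else x for i, x in enumerate(logits)]

def softmax(logits):
    """Convert logits to probabilities; -inf entries end up with probability 0."""
    # Assumes at least one token is left unbanned.
    m = max(x for x in logits if x != float("-inf"))
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy vocabulary: ids 1 and 2 stand in for "wait" and "but".
vocab = ["the", "wait", "but", "code"]
logits = [1.0, 3.0, 2.5, 0.5]

probs = softmax(ban_tokens(logits, banned_ids=[1, 2]))
print(dict(zip(vocab, probs)))
```

A hard ban like this is the bluntest version of the technique. A softer variant would subtract a finite penalty from those logits instead of using negative infinity, which reduces the tokens without removing them.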

> > > orbital-decay
I imagine Anthropic would rather train a small control model instead of resorting to sampling hacks

> > giancarlostoro
I usually have Claude build a plan first, then I put it into an XML file it updates with phases, usually we talk about some of those tasks, and then once it's good and I like it, I have Claude implement the plan.
Another thing I tell Claude to do is to not guess, but look at documentation, it messes up a lot less, might use some tokens reading docs, but at least it has a higher success rate code wise.

> > > > giancarlostoro
Apparently because of how Claude is trained, even the system level prompts go through as XML, it works better with XML "prompting" so I figured I could have it write plans in XML. I need to update my ticketing tool to output XML maybe by default.
https://www.reddit.com/r/ClaudeAI/comments/1psxuv7/anthropic...

> > > > > saltsucker
Comments later in the thread say markdown works just as well and that it's more important to organize your plan into sections.
Also just think about it, why would a model trained on the world's corpus of text (that isn't formatted in XML) perform better with XML? It would be a better study if that post tested markdown, org, XML, JSON, etc. 10 times to see if there is a difference

> > > > > > swingboy
Anthropic's best practices still include the use of XML: https://platform.claude.com/docs/en/build-with-claude/prompt...

> > > > > > adastra22
A year or so ago XML worked more reliably for long-lived prompt instructions. Now it is cargo culting.

> > > > > > orbital-decay
XML consistently performed better than markdown and JSON in all evals I've ever seen on any model, except for a couple very specific ones.

> > > > aesthesia
One reason to use XML-like formatting is that it makes the beginning and end of sections explicit. This is less of an issue when the model is generating text but can still be helpful when using templated prompts.

> > > > root-parent
XML stands for Xtra ML....

> > > > > noworriesnate
I'd like to switch to a sales career--can you give me any pointers?

> > mikeocool
Seriously. Whenever I read the thinking output I get mad and turn down effort to medium or low.
Just output the code and we'll work through it!
I feel similarly about having codex review claude's plans. I don't think I've ever seen it catch a major issue. It just points out things that would have inevitably been addressed during implementation anyway.

> > > SubiculumCode
A lot of times this is how humans work. Just start 'putting words on paper', 'think by doing', etc. sometimes it's more efficient to see why something won't work after writing a bit of it, and sometimes you get lucky and it works right off the bat

> > epolanski
Fable was 20 times worse on that.
It's clear it was the vibe coding model, as it, like no other model before, fully turned you into its assistant instead of the other way around.

> > > RyanHamilton
Could it be possible, these firms are optimizing for two things: a) Better performance. b) Gathering data from you to further improve performance later. I've also found the huge amount of planning rather than iteration frustrating. I've felt like I'm teaching a junior!

> > > > epolanski
I think they simply optimize around E2E benchmarks, none of those benchmarks is designed as multi turn assistance to the user, but going from a prompt straight to the final solution.

> > > > > celrod
Exactly. How can "we" develop and encourage benchmarks for multi-turn user assistance? That is what I want. I feel like the models and harnesses push much too hard against this workflow -- that they push you towards letting go and vibe coding, with only your discipline (and desire for a quality and maintainable product) holding it back.

> > > > happyPersonR
more thinking == more tokens === more money LOLL

> > > > > overfeed
Is there a cost benchmark out there? I wonder how frontier models are doing over time for cost per problem solved.

> > > > > drob518
I think they are optimizing for one-shot performance because that will drive usage. They can't afford to look bad in the benchmarks. And if that means consuming an order of magnitude more tokens, well, that's good for business, too.

> > drob518
Qwen is notorious for this, too. It'll sometimes spin in a long loop of "But wait..." paragraphs.

> > thinkingtoilet
I've been having success with Opus but you REALLY have to tame it. Long prompts that list what files to look at, relationships between entities, etc... I went from regularly hitting my daily limit to almost never hitting it. Oh, and also I was being lazy with small changes and stopping that helped a lot too. As you said, it gets in these loops where it's just churning and if you don't stop it it can go on for way too long.

> h14h
Hopefully the recent work Moonshot did with Kimi K2.7 Code trickles in to the other open-model labs.
Per AA, while K2.7 Code is roughly on par w/ K2.6 in terms of intelligence, it uses half the output tokens to get there.

> > h14h
I've been doing some testing with GLM 5.2 on Fireworks and it looks like the "High" reasoning level uses fewer tokens than even K2.7 Code by a considerable margin (roughly half).
Don't have any evals indicating how it compares on upper-bound quality, but for a well-defined task it seems like GLM 5.2 on "High" is remarkably token efficient. Looking forward to seeing where it lands on the AA index.

> bertili
This is GLM 5.2 Max. GLM 5.2 High uses less than half[1] the tokens.
[1] https://z.ai/blog/glm-5.2

> > Tiberium
Yes, but the Artificial Analysis result is also from GLM 5.2 (max), not high.

> > > andai
They have this with a lot of models, measuring only the max setting, while the one you'd actually want to use for most tasks is much lower.

> > > > epolanski
For the brief period we had Fable, I never had to use it above medium.
Low nailed the overwhelming majority of mundane tasks on its own, medium was good for more complex stuff.

> cmrdporcupine
> Of course if you convert those values to actual request cost, GLM 5.2 will probably beat GPT 5.5/Opus 4.8, but speed matters for a lot of people, I think.
GLM5.2 ends up being far more expensive than I thought it would be when I tried it on openrouter. I ground through $5 USD worth of tokens quite quickly.
And this was high, not max.

> > guelo
Using these open models really makes you realize how subsidized Anthropic and OpenAI's subscription plans are.

> > > nijave
Absolutely. You can also run codeburn or ccusage and they'll scan the session files and tell you how much you burnt in API token pricing equivalent.

> esafak
I agree. I've noticed that it is quite smart but it has a tendency to doubt itself and overthink. I monitor its internal dialogue and prod it when it does this. They need to optimize the chain of thought early stopping.

> abgruszecki
Agreed that models should get better at working with rare programming languages like Nim! Using them tends to confuse agents a lot in general. We're working on a paper right now where we compare how token-efficient models are when trying to implement the exact same program in different programming languages, and that's one of the trends we're seeing.

> robmccoll
That's interesting. I gave nearly the same task to Gemma4 31b as a test yesterday. Write a symbolic math engine in Typescript that can perform evaluation and simple expression reductions over +-/*(). It performed the task correctly with minimal reasoning - much fewer reasoning tokens than output tokens.

> > gbingles
Tbh, so what? I googled "symbolic math engine in Typescript that can perform evaluation and simple expression reductions over +-/*()" and got what looks to be viable answers without using any AI model at all. Reciting well established things from memory isn't terribly interesting. Show it a novel codebase and have it implement something within it.

> > > SubiculumCode
TBH, while your point is a fair one, your attitude is off-putting and needlessly condescending.

> > > drob518
So, a natural question would be why a model would ever get it wrong?

> xyzsparetimexyz
Reminiscent of https://en.wikipedia.org/wiki/Portia_(spider)

> rdsubhas
As per stats in other comments, it is frontier, not close to frontier.

> HWR_14
I thought you could not compare tokens across models because their cost and speed was so different between models.

> nurumaik
You asked for maximum effort, you got maximum effort

kristopolous
I have a script that ranks these based on codingindex from Artificial Analysis.
All it does is pull a json from their main table page and parses it with the fields I care about (coding).
There used to be a mailing list associated with it but eh ... there wasn't much interest. I use the script every day though.
Current partial output:
score  age  size   name
47.1    58  large  Kimi K2.6
47.5    54  large  DeepSeek V4 Pro (Reasoning, Max Effort)
47.5    70  -      Muse Spark
47.6   132  -      Claude Opus 4.6 (Non-reasoning, High Effort)
47.8   205  -      Claude Opus 4.5 (Reasoning)
48.1   132  -      Claude Opus 4.6 (Adaptive Reasoning, Max Effort)
48.6    55  -      GPT-5.5 (Non-reasoning)
48.7   188  -      GPT-5.2 (xhigh)
50.1    29  -      Qwen3.7 Max
50.7     1  large  GLM-5.2 (max)
50.9   120  -      Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort)
51.5    92  -      GPT-5.4 mini (xhigh)
52.1    55  -      GPT-5.5 (low)
52.5    62  -      Claude Opus 4.7 (Adaptive Reasoning, Max Effort)
53.1   132  -      GPT-5.3 Codex (xhigh)
53.1    62  -      Claude Opus 4.7 (Non-reasoning, High Effort)
55.5   118  -      Gemini 3.1 Pro Preview
56.2    55  -      GPT-5.5 (medium)
56.7    20  -      Claude Opus 4.8 (Adaptive Reasoning, Max Effort)
57.2   104  -      GPT-5.4 (xhigh)
58.5    55  -      GPT-5.5 (high)
59.1    55  -      GPT-5.5 (xhigh)
62       8  -      Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback)
To see everything, run it like so:
$ curl day50.dev/art-analysis.sh | bash
The repo: https://github.com/day50-dev/aa-eval-email
some key takeaways:
* open models are on about a 4-7 month lag right now depending on how you want to measure it
* if this keeps up, you might see an open-weights model doing claude fable 5 level work before the new year.
if people sign up for the free mailing list (that just does this) I'll go and put it back on ... emails when new model evals drop - it was pretty useful.
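
One way to put a number on the "lag" mentioned above is to compare an open-weights model's age with the age of the oldest closed model that scores at least as high. This is a rough sketch on a few rows copied from the table above, not the author's script. It assumes the age column is in days and that a size other than "-" marks an open-weights model, neither of which is stated in the comment. The function name `lag_days` is hypothetical.

```python
# Rows copied from the table above: (score, age, size, name).
# Assumptions: "age" is days since release; size "-" means a closed model.
rows = [
    (47.8, 205, "-", "Claude Opus 4.5 (Reasoning)"),
    (48.7, 188, "-", "GPT-5.2 (xhigh)"),
    (50.7, 1, "large", "GLM-5.2 (max)"),
    (50.9, 120, "-", "Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort)"),
    (53.1, 132, "-", "GPT-5.3 Codex (xhigh)"),
    (57.2, 104, "-", "GPT-5.4 (xhigh)"),
]

def lag_days(open_row, rows):
    """Age gap to the oldest closed model scoring at least as high as open_row."""
    score, age, _, _ = open_row
    closed_ages = [a for s, a, size, _ in rows if size == "-" and s >= score]
    return max(closed_ages) - age if closed_ages else None

# Rank by score, highest first, and print in the same column order as the table.
ranked = sorted(rows, key=lambda r: r[0], reverse=True)
for score, age, size, name in ranked:
    print(f"{score:5.1f} {age:4d} {size:6s} {name}")

glm = next(r for r in rows if r[3] == "GLM-5.2 (max)")
print(lag_days(glm, rows))
```

On these rows the gap for GLM-5.2 comes out to 131, which would be a bit over four months if the age column is indeed in days. Other ways of measuring would give different numbers, which fits the "depending on how you want to measure it" caveat.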

> papersail
score  age  size   name
62.0     8  -      Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback)
59.1    55  -      GPT-5.5 (xhigh)
58.5    55  -      GPT-5.5 (high)
57.2   104  -      GPT-5.4 (xhigh)
56.7    20  -      Claude Opus 4.8 (Adaptive Reasoning, Max Effort)
56.2    55  -      GPT-5.5 (medium)
55.5   118  -      Gemini 3.1 Pro Preview
53.1   132  -      GPT-5.3 Codex (xhigh)
53.1    62  -      Claude Opus 4.7 (Non-reasoning, High Effort)
52.5    62  -      Claude Opus 4.7 (Adaptive Reasoning, Max Effort)
52.1    55  -      GPT-5.5 (low)
51.5    92  -      GPT-5.4 mini (xhigh)
50.9   120  -      Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort)
50.7     1  large  GLM-5.2 (max)
50.1    29  -      Qwen3.7 Max
48.7   188  -      GPT-5.2 (xhigh)
48.6    55  -      GPT-5.5 (Non-reasoning)
48.1   132  -      Claude Opus 4.6 (Adaptive Reasoning, Max Effort)
47.8   205  -      Claude Opus 4.5 (Reasoning)

> > tcp_handshaker
Short comments...
- GPT 5.5 is consistently the best, an opinion that gets me constant downvotes here from the Anthropic Marketeer strike force...
- China is going to eat the US lunch on AI
- What have European universities and companies been doing? It's as if, in a parallel past/future, Nikola Tesla and Edison had created flying Cyberpunk machines, while European researchers were getting together to request EU funds for investigating how to breed faster horses.
- If Zuckerberg could be fired, after spending a total of $235 billion on AI and having NOTHING to show for...should he be fired?

> > > Certhas
None of these models come from universities, European or otherwise.
Mistral is clearly not currently competing for Frontier Model. Whether this is due to a lack of VC funds or a lack of technical ability, or the former arising from the latter, would be interesting to know.
The top models are from startups. Among the FAANG only Google managed to get a Frontier model, and they literally invented the architecture and have more money than they can possibly spend to throw at the problem. Facebook shows that even ungodly amounts of money don't get you there though.
So why did no EU-based startups succeed while two US startups succeeded? I agree that that's a very important question the EU should ask. The Internet revolution was driven by US companies, and now AI will be as well, with Chinese Open Weights mixed in. The EU consistently cannot turn its considerable economic output into fast-moving tech firms.

> > > > Quarrel
Mistral have moved to actually trying to make money, and been relatively successful; at least if we lived in a normal world.
They've got a heap of contractors working to help industry adopt LLMs. It is just classic consulting work, and they'd look like a really great company if we weren't comparing them to literal $2T+ companies losing money hand-over-fist...

> > > > sschueller
Apertus was built by universities in Switzerland. Although not frontier it is fully open.[1]
[1] https://apertvs.ai/pages/about/

> > > > kristopolous
I'm actually more curious about IBM. Their Granite series appears to be nowhere close to competitive.
They had Watson, remember, it won on Jeopardy like 15 years ago? They've been at this for a long time.
Maybe it's good at something else?

> > > > > tekchip
IBM doesn't do technology, they do contracts. Any "technology" is marketing stunts. They hire a bunch of "fellows", outside contractors, to make a thing they can be first at or whatever, do the stunt, then get a bunch of 5-10 year contracts with customers off the stunt. They then fuck it up for that length of time but still get paid due to those contracts. After that space of time the folks they've burned have moved on, rinse, repeat. Pretty easy to look back at the timeline of "firsts" they have and see the pattern.

> > > > > > JSR_FDED
Don't forget the marketing for the new $1B "initiative" (fill in: mobile, cloud, blockchain, AI,...)
Upon closer inspection the $1B is (a) over 10 years, (b) mostly internal cross-billing between departments.

> > > > > > drob518
Yes, but the key point is that nobody got fired for buying it from IBM.

> > > > > > tanseydavid
"HAL, I want you to train a frontier-level large language model for me.""I'm sorry Dave, I can't do that"

> > > > > root-parent
Agree that IBM has no excuse, especially given how long they have been trying to do AI. Although Watson was a completely different technology.
They had to start from scratch, but don't seem to have management smart enough to stop doing it in house. They could have just acquired a startup that could build a frontier model.
Which is also very ironic, since their whole business for the last 15 years has been buying companies a la CA Associates...
Their previous Watson branding and the collapse of Watson expectations cost them one CEO, but the current CEO was part of the same team. They just don't learn....

> > > > > vunderba
I view Watson in the same light as Deep Blue, one-offs that brought more prestige and potential share value to IBM than necessarily "moving the needle" in the respective technology.

> > > > > greenavocado
Granite is OK for speech to text (ASR)

> > > marcus_cemes
To be honest, living in Switzerland and speaking with peers, we're just exhausted by the constant AI hype. For a lot of us, the fact that Europe isn't frantically trying to scrape the entire internet and every book in existence for the next massive model isn't a bad thing. The big players are doing their thing, like with the nuclear arms race. We regulate a lot, too much a lot of the time, but sometimes that trickles down to other places too. A lot was done right, imo.
ETH Zurich and EPFL universities recently put out an open model called Apertus (was on the HN front page a few months back), it's not a frontier model, but they built it properly regarding copyright and data transparency.
It might look a bit slow or old-fashioned, but focusing on doing things ethically and legally feels like a much better path than just joining the race to scrape everything.

> > > > dr_dshiv
Sir, I would suggest that if Europe fails to be economically competitive, the downstream implications on European society will produce much worse outcomes than (for instance) data transparency...
Doing things with ethical intentions does not necessarily produce outcomes that are beneficial for society at large.

> > > > > marcus_cemes
I'm inclined to agree with you, but you could make the same argument for exploiting natural resources and the environment. I don't think it's being done right at the moment, and it does not seem to be benefiting people as much as certain companies.

> > > > > muvlon
Well, is this mad dash for AI producing "outcomes that are beneficial for society at large" yet? So far it looks like it's mostly producing a ton of negative externalities and wealth transfer to corrupt elites.
Also, no, abandoning ethics is not an option, what a ridiculous suggestion.

> > > > > > dr_dshiv
Data transparency and copyright do not constitute "ethics."

> > > > _zoltan_
also living in Switzerland and I disagree. Hard.
it's horrible that Europe is so backwards in AI. too much regulation and nothing to show for it. we should be way faster.
there is no money. the culture in both Europe and Switzerland is that you don't fail, while in the US it's perfectly fine to be on your 4th startup because the first 3 failed.
it's not that it LOOKS slow and old fashioned, it IS slow and old fashioned. it's horrible.

> > > > tsss
If these models ever reach the point where they are as good a programmer as a human is (and thus can self-improve completely independently), then there won't be an independent Switzerland much longer. AI race is a race for first place.
> like with the nuclear arms race
MacArthur was about to nuke the Chinese in the Korean war. China knows that nuclear weapons, AI and robotics are a matter of survival and not a nice-to-have.

> > > wunderlotus
> - If Zuckerberg could be fired, after spending a total of $235 billion on AI and having NOTHING to show for...should he be fired?
Yes, if the premise was true, but it's not.
https://opper.ai/ai-roundtable/questions/bbf5a4e9-204

> > > > tcp_handshaker
Interesting...but this shows how dumb these AI are.
And they misunderstood "nothing to show for" as...literally nothing to show for. Yes, not factually, but he has effectively nothing competitive to show for it, so it's literally true.
And had they been given this clarification, they would have suddenly said: "Oh yes of course, you are absolutely right, you are correct on challenging me on that...."

> > > ricardobayes
Well Europe is famously a laggard when it comes to new tech - in parts of Switzerland, two horses were required to be harnessed in front to pull cars up until 1925. The UK required a person to walk in front of a car and wave a red flag.

> > > kristopolous
They did muse spark ... it's not garbage.
Also what are they building it for? I'd think it's to serve ads better or something like that. Maybe Muse Spark fits facebook's needs perfectly...

> > > > jansan
Mo Bitar said something like "Meta's LLM is the one you use if you accidentally hit the wrong button in WhatsApp. Its user base is fat-finger phone users."

> > > > > tcp_handshaker
As comparison the WHOLE NASA budget is 24 billion. Meta burned 10x that on AI...