trouve_search OK, I'm 100% rooting for both Mistral and task focused
small models. But Mistral has fallen really far behind since
2025Q3. It seems they can't get good reasoning models
working at even medium context sizes, which is necessary
to be at the table right now. Gemma4 and Qwen3.6 are
currently best in the small size; Mistral's "small" model
has ~4x the parameter count at 120B and isn't even
competing with models a quarter its size. Back one year ago
with Mistral Small 3.1 they were keeping up, but they've
fallen into irrelevancy right now. If Mistral seriously
wants to play the on-prem and small task-specific model
game, a decent proxy would be to build models that get the
r/LocalLLaMA crowd excited.
|
> ar0 I agree. I am a paying Le Chat Pro user, really
rooting for a European alternative. But the quality
difference between Mistral and the frontier labs is
growing too big to ignore. It's worrying to me that
they didn't talk much about new models at the
conference, because that is really where their focus
should be IMHO. I am wondering what is keeping them
back, though: Money? Compute? Skills? Training data?
My fear is that you are really only getting really
good models by training on very dubious data (outputs
from the frontier models etc) and that Mistral is too
European and too enterprisey to take those risks.
|
> > mattnewton My theory with no insider information: it's a
little of all of the above, but mostly money. To
some extent, you can dig yourself out of a data
hole with RL and a lot of compute. And you can buy
a lot of compute and some data with a lot of
money. Big labs have been operating in this regime
for a while and it's one of the drivers behind
their costs beyond just scaling the weights and
doing the actual training. Mistral just doesn't
have access to this level of compute or the money
to try and muscle their way in.
|
> > > MichaelZuo Don't they supposedly have a huge amount of EU
support? Or at least there's been a lot of
noise about that.
|
> > > > mattnewton I wouldn't be surprised if each of the
frontier American labs and individually
has compute access similar to the entire
EU. Chinese firms are a more interesting
comparison since there are a fair amount
of great models there, and it's estimated
about 15% of the AI-relevant compute is in
China versus maybe 5% in the EU under
European companies (and 70% ish in the US
is the most common ballpark I see)
|
> > > > > cowpig I think you are underestimating the
amount of compute the US frontier labs
have access to.
|
> > > > > > rolymath So, more than 70% of the compute
on earth?
|
> > > > > > wongarsu More than 5%, I assume. From the
combination of "5% in the EU under
European companies" and "each of
the frontier American labs and
individually has compute access
similar to the entire EU". I don't
think that was meant to be
implied: the EU actually has
access to more GPUs than those
hosted by European companies in
Europe, just as US labs have
access to GPUs hosted outside the
US
|
> > > > baq They can get what, 1B euros? 10B when
everyone loses their mind? This doesn't
buy nearly enough compute
nowadays. Meanwhile, Anthropic and OpenAI
have investors practically begging them to
let them buy this much equity at
mind-boggling valuations.
|
> > > > > Tepix The Chinese labs manage to do it.
Mistral should have enough money.
|
> > > > > > wongarsu The EU has intentional structural
hurdles to pouring money into a
predetermined single company. Both
hurdles meant to fight corruption
and nepotism, and hurdles meant to
ensure fairness between the member
states. After all, money to
Mistral is money to France too,
and you don't want countries to
abuse such mechanisms. It's not
impossible, but China is just much
better set up for the necessary
level of government support.
|
> > > > > > snowpid China is a way more corrupt
country but this might be a
benefit of less rules.
|
> > > > > > codebolt China has cheap coal-powered
electricity and leaders that make
things happen. Europe has
bureaucrats that only love
talking, high taxes and expensive
energy.
|
> > > > tormeh It's a bit strange, but a huge handout
from the EU/France and a huge AI lab
investment round are different orders of
magnitude. The necessary sums are just not
politically possible. How do you sell
spending the equivalent of ten USS Gerald
Fords on a start-up? You don't.
|
> > > > > teiferer And a lot of the "funding" is through
mutual deals with MSFT, Nvidia, etc.
The Europeans have none of that and
would need to pay in actual cash.
|
> > teiferer > I am wondering what is keeping them back,
though: Money? Compute? Skills? Training data?
Not ruthless enough and no backing by a corrupt govt
administration that has no morals but focuses on
self-enrichment instead. Might sound drastic but I
think that's actually closer to the truth than
everybody likes to admit. > My fear is that you are
really only getting really good models by training
on very dubious data (outputs from the frontier
models etc) and that Mistral is too European and
too enterprisey to take those risks.
Exactly.
|
> > > steve_adams_86 My hope is that people working with less
resources will need to push on ingenuity to do
more with less, leading to innovations. But
there's certainly no guarantee of it
|
> > pembrook > what is keeping them back, though: Money?
Compute? Skills? Training data?
All of the above and more. Everything holding Mistral
back is the same thing that has held Europe back from
competing in the entire digital revolution. See
this 1991 article lamenting the loss of any viable
European PC manufacturer:
https://www.nytimes.com/1991/04/22/business/europe-stumbles-...
Mistral being in Europe is disadvantaged with:
1. Money: less diverse private pension fund
environment = less LPs to invest in VC funds =
less VC dollars to invest in new ventures.
European money is vacuumed out of the
private sector into state pension funds and dumped
into low yielding government bonds. This starves
the private sector of capital while inflating the
% of GDP driven by government spending every year
(government pension funds buying government bonds
in circular fashion enable runaway deficit
spending...just like circular AI infrastructure
spending).
2. Talent & compute: due to #1, Silicon
Valley can outbid Europe for the best talent and
hardware. Watch an OpenAI launch video and listen
to all the European accents.
3. Local market fragmentation: Europe is a
collection of countries that pretend to work
together while not even having a unified capital
market. The average EU citizen can barely
communicate with their neighbor in a common
language beyond the level of a toddler
(English fluency is massively overstated by
Americans who only experience tourist capitals).
4. Regulatory disadvantages: In everything from
company regs, employee regs, unions, privacy regs,
data portability regs, etc.
It's not "culture" or Europeans being "lazy" as
most people would claim.
There's currently thousands of young French people
working 80 hour weeks creating dumb consulting
powerpoints or legacy investment banking deal
memos as we speak. Ambitious people exist
everywhere in equal proportion, they're just
working on the wrong things. Europe can't compete
in the digital revolution the same way they could
compete in the industrial revolution due to
various system design choices. Culture is simply
the aesthetically observed byproducts of system
design.
|
> > > dash2 >The average EU citizen can barely communicate
with their neighbor in a common language
beyond the level of a toddler (English fluency
is massively overstated by Americans who only
experience tourist capitals).
Not true in my experience: even German waiters
in small towns tend to have pretty fluent English.
|
> > > > CalRobert It varies a lot. Germany is pretty strong
in English, and the Netherlands next door
is exceptional, but as you go south to
Italy, etc., English proficiency
weakens. Edit: more broadly, there's just
more friction when people aren't in their
first language. I know I hesitate to bring
up some things, say hi to strangers, try
making a joke, etc because the cost of
talking is just... higher.
|
> > > > pembrook Was just driving around medium and
small-ish towns in Bavaria. This was not
my experience at all. The German-speaking
members of our group had to order food for
us in most restaurants. And most locals
aren't waiters in restaurants.
|
> > > cj00 > 4. Regulatory disadvantages: In everything
from company regs, employee regs, unions,
privacy regs, data portability regs,
etc.
Agreed. My own anecdote: my company is
global and for the past 6 months, we've been
working on getting regulatory and legal
approval for an LLM-based feature. The initial
proposals of going live in all of our markets
have been pared back to exclude Europe
altogether due to the regulatory
environment. When I took part in company-wide
gen AI councils that reviewed new product
rollouts, it seemed like there was a definite
hesitation from higher ups from pushing out
any leading edge features to European markets.
And it's not that the regulations would
necessarily block these features from going
live but that they'd increase implementation
costs to the point where it wouldn't be worth
it.
|
> > > PeterStuer 1 and 2 are the same. Infinite money with
barely any consequence because of 'reserve
currency' privilege. To compete with that, the
EU can't nuke the dollar because it would be
suicide given the Eurodollar realities, and
they can't anchor EU IP and talent because our
politicians are too intertwined with globalist
ideology and capital.
|
> > > > snowpid "they can't anchor EU IP and talent
because our politicians are too
intertwined with globalist ideology and
capital." You want to force staying in the
EU?
|
> > > > > PeterStuer You think other regions do not anchor
strategic industries? The EU is
extremely lax in this regard.
|
> > > > > > snowpid so why do you use the term
globalist? Most talent is
globalist anyway.
|
> > > _fizz_buzz_ > 2. Talent & compute: due to #1, Silicon
Valley can outbid Europe for the best talent
and hardware. Watch an OpenAI launch video and
listen to all the European accents.
There is definitely a lot of truth to that. Maybe
a bit of an arbitrary measure, but these are the
nationalities of the people that wrote the
"Attention is all you need" paper. Pretty
revealing I find:
Ashish Vaswani: India
Niki Parmar: India
Jakob Uszkoreit: Germany
Llion Jones: Wales (UK)
Aidan Gomez: Canada
Łukasz Kaiser: Poland
Illia Polosukhin: Ukraine
Noam Shazeer: USA
|
> > > > snowpid Yes that was 2018. Things vastly
deteriorated in the US.
|
> > > Shitty-kitty You say that as if the American version of
maximalist Capitalism is good or desirable to
most people. Personally, I would much rather
have good public pensions and health-care,
than AI agents.
|
> > > > pembrook This has nothing to do with it. The US also
has public pensions (social security
payouts rival or beat many EU countries)
with dramatically better tax free private
options on top. Also, the US has free
healthcare (Medicare and Medicaid) for
roughly 50% of its population. Expanding
that to 100% doesn't suddenly make them a
bad country to do business in. You think
OpenAI is going to close up shop and move
to Mexico if the US expands single payer
healthcare? That would actually make it
even easier for businesses to operate in
the US!
|
> > > > > Shitty-kitty Social Security and Medicare are
vastly inferior to their European
counterparts. Medicaid is an absolute
disaster and a large number of doctors
and health facilities will not even
accept it.
|
> > > > > > pembrook Not true in the case of average
social security payouts. But
again, this argument is a total
derailing of this thread and
addresses none of my
points. Explain to me how expanding
US single payer healthcare
suddenly makes the US a worse
place to do business in than
Europe? Companies would love not
having to deal with the
complexities of 401ks and employer
health plans.
|
> > > > > > Shitty-kitty You must have misunderstood me. I
never argued that US single payer
healthcare would be bad, just that
one based on expanding Medicaid
bottom-rung insurance would not
be adequate.
|
> > > > moffkalast Yeah and data protections. GDPR, data
frugality laws, etc. may be the end of
Mistral but it's a small price to pay for
corporations to not have free range over
every minute detail of our lives.
Americans just accept it because they have
already lost. We haven't, in fact we've
just won recently with chat control being
struck down. Meta can no longer train on
and monitor every WhatsApp chat without
being criminally liable.
|
> > > > manmal Maybe we will have only agents, soon.
|
> > > Fnoord > European money is vacuumed out of the
private sector into state pension funds and
dumped into low yielding government bonds.
Which countries do that? The ones in NL
actually invest in US big tech. Once Europe
stops investing in USA, Europe will be better
able to compete. > Talent & compute: due to #1,
Silicon Valley can outbid Europe for the best
talent and hardware. Watch an OpenAI launch
video and listen to all the European accents.
That just denotes European students
are high quality. Brain drain is happening due
to bullying and fascism. The extent of
long-term damage from the current administration is
unclear. > Local market fragmentation: Europe
is a collection of countries that pretend to
work together while not even having a unified
capital market. The average EU citizen can
barely communicate with their neighbor in a
common language beyond the level of a toddler
(English fluency is massively overstated by
Americans who only experience tourist capitals).
Bollocks. I have been in Berlin and
Munich various times past decades, and people
there speak English very well. Nowadays,
translation is a profession which got hit by
the AI club. The people in the rural areas
don't have to work together with other people
from rural areas. They just need websites and
tooling in their native language, or a major
language. Case in point: the French company
Mistral has Dutch company ASML as one of
their major investors. If you go to Eindhoven
area (Netherlands mini SV called Brainport
Eindhoven), you get away with English
perfectly fine, and there too you will hear
all kinds of accents.
|
> > > > pembrook > Which countries do that?
Every single one
invests in government bonds with a large
allocation. Aside from the pure ponzi
scheme ones with no actual fund where it's
money in -> money out. Also, if Europe
stopped investing in US equities their
pension insolvency problem would get about
2-3X worse given US equities have far
exceeded EU equity returns over the past
20 years. > Brain drain is happening due to
bullying and fascism. The extent of
long-term damage from the current administration
is unclear.
Huh? Did you even see the
headline of the 1991 article I linked?
Brain drain has been happening because of
everything I listed which has been true
for decades. Europe couldn't come up with
a relevant company in PCs, Operating
Systems, Internet 1.0, Social media,
Mobile, AI, etc. None of it is due to the
current administration. > Bollocks. I have
been in Berlin and Munich various times
past decades, and people there speak
English very well.
Yes, and while you were
traveling to tourist capitals occasionally
I've been actually living in Europe. Your
perception is not the reality the average
German person lives. The problem isn't that
the smartest Engineers in Europe don't
speak English. The problem is that the
average person in the markets they would
sell into don't speak a common
language. Kind of hard to cut deals and
build a brand among 27 different insular,
hyper nationalist markets in a bunch of
languages you don't speak with completely
different regulations.
|
> > > WhyComboNadir re: #4 Maybe it's easier if you grow up in the
system and know how to navigate the written
and unwritten rules, but as a dual
Canadian-American who recently gained Austrian
citizenship, the regulatory friction is
absolutely real. I decided to launch a new
venture through an Austrian GmbH. There are
supposedly streamlined paths for local
residents, but I had to go through the
standard corporate pipeline. I spent three
months fighting a bizarre catch-22 between my
notary (who cost €3k+) and the bank. To open
the account, I had to prove I deposited €10k
in capital. But I couldn't make the deposit
without an active bank account. On top of
that, the bank's compliance team kept
arbitrarily canceling my application due to
"incorrect answers"... refusing to tell me
what the errors actually were and forcing me
to restart the entire process ab initio. I
finally just gave up. I wrote off the €3,000
notary fee and €1,000 in registered office
costs as a sunk cost, and incorporated a US
LLC instead. It took under 10 minutes, no
notary, fees of $25 since I did it myself,
plus another 20 minutes to open the business
bank account. There was no commercial reason to
choose Austria; it was purely sentimental. My
ancestors were entrepreneurs in Linz and
Vienna, and I loved the idea of renewing that
legacy. But the sheer weight of the
bureaucracy managed to kill about 99% of the
early-stage startup enthusiasm you normally
rely on to get a new project off the ground.
|
> > > > mike_hearn That catch-22 is supposed to be broken by
the bank. It's a two-phase commit where
you open the account in a special state
where you can only deposit the capital.
Then the bank gives you evidence you've
done so, you take that to the notary and
open the company, then send the evidence
you've done that back to the bank to
convert it into a full account. It's a
bizarre system that Switzerland uses too.
I've done it twice. Unfortunately the
German-speaking world has a lot of rules
that are trying to eliminate all risk for
investors and employees. The GmbH/AG
capital requirements are just the
start. The next fun thing you might have
encountered, at least in Switzerland, are
rules that literally say your company's
assets can't fall below 50% of your
initial capitalization. If it does you're
supposed to raise funds or make more
investment of your own private capital and
this rule pierces the usual liability
requirements. Even more fun: it turns out
that this law isn't actually enforced and
locals regularly ignore it. But bad
accountants won't tell you that. They'll
just inform you of the law when you do
your yearly accounts. Then you have wealth
taxes that cover the valuation of a
startup as if it were a cash position. So
if you raise $100M in investor funding
then whatever shares you have left over
are considered to be liquid assets you can
offload at will, and are wealth taxed as
such. The fact that the shares don't trade
in a liquid market is irrelevant to the
tax authorities. In Zürich at least that
got patched by the local tax office
deciding that startup shares aren't
counted for the wealth tax, but this just
means you have to be able to convince the
tax authority that your company is a
startup. The way they determine this is
more or less just the opinion of whoever
at the tax office assesses your case. Does
it sound "startuppy" enough? Fixing this
stuff isn't hard, but it never gets fixed
because European politics is both quite
stagnant and dominated by people who view
hostility to business as a virtue signal.
They don't want to fix it because they
think businesses are sort of like oil
fields. They just exist, lying around
naturally, and the only question is how to
maximally exploit them.
|
> > > > > WhyComboNadir re: catch-22. I was surprised to learn
of it, but then even more surprised to
get caught up in an endless loop even
when I followed the process. For
European countries it seems like a
left-hand-right-hand problem: I go to
a party at the Los Angeles Austrian
consulate, or a "Start Up Austria"
event in SF and listen to them pitch
bringing businesses to Austria (the
left hand). But back home the "right
hand" and culture seems to look down
at people who have accumulated wealth,
and be hostile to the idea of the most
simple reforms that would actually
make someone want to move their
business to Austria/EU. For example,
the entrenched interests of needing a
notary paid a few thousand Euros, are
screwing the whole country so one set
of people can fill an archaic role. Do
you know how much a notary costs in
the US? If you go to a UPS or FedEx
store, it is typically about $10-15.
If you go to your own bank they do it
for free. And get this, when you
incorporate a company in a US state,
guess how many things need to be
notarized? None. As you fill out the
forms you are swearing to their truth,
and accepting a notice that you are
committing perjury if you are
untruthful. Do they check your
statements right then to make sure
you're not lying - absolutely not! Why
bother, you haven't started anything
yet. But by God, if 5 years later when
you're making money, the IRS finds out
that you lied, then you're going to
wish you were living in another
country. In other words, the timeframe
for the risk concern is completely
backwards in Europe: the risk
management basically stops 95% of
things from happening in the first
place, including the 0.001% that might
be fraudulent. Wouldn't it be better
to let 1,000,000+ companies get
formed, then 50,000 of them naturally
become a meaningful success, then take
a closer look at their compliance once
they are truly a real business? I had
some ignorant biases against doing
business in Europe before starting
this process - and I was hoping that
experience would change my mind - but
the funny thing is that my biases
weren't strong enough! (my bank and
incorporating story above is just one
example of many). No thank you EU! (My
first businesses were started in
Canada, and I thought Canada was so
backwards compared to the US in being
business friendly, but now I realize
Canada is maybe 90% as business
friendly as the USA. EU, in my
experience is like 5% as business
friendly as USA)
|
> > sofixa > I am wondering what is keeping them back,
though: Money? Compute? Skills? Training
data?
Considering all their talk about new DCs and
compute, and a few offhand comments, it sounded to
me that compute is a big limitation.
|
> > miki123211 Should it, though? I think a European company,
taking Chinese models, perhaps doing its own
post-training on them and training the
Chinese-ness out, with a great chat service,
enterprise API and coding agent, could be pretty
valuable in itself.
|
> > > pyvpx What does "training the Chinese-ness out" even
mean?!
|
> > > > Fnoord The Chinese censorship. The Chinese use
open weight models, Europeans too. US big
three don't.
|
> greyskull > task focused small models
This is tangential, and
forgive my ignorance here, but is there an inherent
reason why there aren't smaller, focused models from
the frontier model providers? I'm thinking something
like a software-specific subset of Opus that is the
default for use in Claude Code. Smaller, cheaper to
deploy and consume, maybe faster.
|
> > pavpanchekha OpenAI used to make Codex-specific models, but
they stopped. What I've gathered from interviews
and similar is that training two models isn't
worth the (small) lift from having a
coding-specific model. You're pre-training on
everything anyway, and coding RL is reasonably
useful for general-purpose models too.
|
> > > greyskull Interesting. I'd have guessed there would be
meaningful opex benefits to serving smaller
models.
|
> > > > mediaman What I've heard is that much of the model
"intelligence" is a commingled bucket:
although you can specialize specific
knowledge somewhat, it's hard to
specialize advanced reasoning to specific
domains because so much of reasoning is a
generalized capability that is not unique
to, say, coding. It turns out coding has to
do with a lot of the same reasoning needed
in math or in legal analysis, even if the
grammatical expression is different. This
is less true of lower intelligence tasks.
Classification requires a lot less
reasoning capacity and so can be much
smaller and more specialized.
|
> baq agreed, the next price increase from frontier labs
(and the inevitable limits decrease in subscription
tiers) will have people thinking real hard about their
model providers and that's when mistral should be
ready. however, given their recent performance, I
realistically don't have my hopes high up.
|
> > amunozo DeepSeek is both cheaper and better than Mistral.
|
> > > barrell Not in many tasks. I use deepseek as a
fallback in https://phrasing.app and it's
always very apparent when it happens (due to
mistakes/clear performance drop off)
|
> > > > efromvt Interesting - which models specifically?
I'd be interested in using mistral over
deepseek if it was competitive (guess I
need to go benchmark)
|
> > > > > barrell I use small, large, and medium-3.5
depending on the task
|
> > > gregorygoc Because they distill
|
> > > > losvedir I feel like there's an implication here
that distillation is a problem but I don't
understand what you mean. I thought
distillation was generating text from a
model and then training another model on
it. Is there something unethical in that?
You're paying the API costs to generate
the tokens, right? Or I guess more to the
point: is this something frontier labs
have said is (or tried to paint at any
rate) problematic? This feels like an "out
of the loop" situation because I've only
ever heard "distillation" with a positive
connotation before.
|
> > > > > taneq Whether it's a 'problem' or not is
viewpoint-dependent but it's against
the OpenAI ToU: > You may not use our
Services for any illegal, harmful, or
abusive activity. For example, you may
not: > [...] > * Use Output to develop
models that compete with OpenAI.
Source:
https://openai.com/policies/row-terms-of-use/
(I'm also curious whether they
consider developing a competing model
to be illegal, or harmful, or
abusive...?)
|
> > > > > > Matl > it's against the OpenAI ToU
Given that OpenAI doesn't care about
training on copyrighted data, why
is suddenly their ToU something
anyone should care about?
|
> > > > > > jononor That OpenAI was in the wrong when
they ignored everyone's copyright,
does not make it right to ignore
their ToU. If one wants IP and
rule of law (incl contracts) to be
respected, one should not violate
others' rights when it is
convenient. On a more risk-strategy
level there is the size of their
legal team, general endowment, and
supplier and political connections
to consider. Everyone is free to
ignore their ToU, but I can
understand why a company would
avoid it...
|
> > > > > > nibbleyou > If one wants IP and rule of
law (incl contracts) to be
respected, one should not violate
others' rights when it is
convenient.
Yes that's what should
be said to OpenAI. Now they should
not cry about their T&Cs not being
respected when they never cared
about others' copyrights.
|
> > > > > > amunozo As we say in Spanish, "quien roba
a un ladrón tiene cien años de
perdón" (whoever robs a thief earns
a hundred years of pardon).
|
> > > > > > zhivota Feels like this should be some
kind of anti-competitive violation
even if it's not actually.
Probably moot under this admin but
still. It's like saying you can't
use Windows to develop an OS, or
drive a Ford on the way to your
job at Hyundai.
|
> > > > > > CamperBob2 In heavily-unionized areas, you'd
best not drive that Hyundai to
your job at the Ford plant.
|
> > > > ctrlkctrls it doesn't matter the reason. This is a
race and nobody will care or remember how
the winners got there. Mistral looks like
it's fading away to irrelevance unless
they can play alongside the similar sized
models, or have some unique advantage
other than being in Europe, for Europe. I
was really excited for them back when they
were a startup that had the biggest European
venture round ever. This space will have a
few winners, and many losers. Google, plus
either Anthropic or OpenAI most likely.
Big models will see breakthroughs in
inference performance/cost fall
precipitously and small models will only
exist on devices (Pixels and iPhones,
cars, watches, bluetooth speakers, etc)
|
> > > > > gregorygoc It's not that I don't agree with you,
I am just pointing out why it's hard
to catch up to scaling laws given the
European economic (capital) and
political (US would be upset if they
found out Europeans distill)
constraints. China is only bound by
economic constraints.
|
> > > > > > KronisLV With the insight in your comment
and this bit from the above one: >
This is a race and nobody will
care or remember how the winners
got there.
It seems like the EU
should have paid China for the
distillation datasets, esp. since
Mistral isn't even a governmental
org.
|
> > > > > sofixa > This is a race and nobody will care
or remember how the winners got there.
For consumer AI, yes. For coding
assistants, probably. For specific
application "business" AI like the
things Airbus announced the other day?
Not at all. What matters for an Airbus
using Mistral to build compliance
documentation based on AI generated
physics simulations is the enterprise
relationship, reliability, compliance,
forward deployed engineers helping
with the fine tuning, quality,
predictability, support. A Chinese lab
having a better-at-benchmarks model
that is cheaper is just irrelevant for
that. And IMO, the real money in AI is
this type of "business AI" deployment.
Developer tooling tends to converge on
becoming commoditised. Once you're a
core supplier for a big bank and
embedded in their processes, you're
there until you screw up with the
pricing (like Broadcom), and even
then.
|
> > > > k__ Why doesn't Mistral distill?
|
> > > > > gregorygoc Good question, given that American
companies basically threw copyright
law into the trash, I think they
should.
|
> > > > > > Saline9515 American companies can't sue
Chinese ones, but they can do it
with European ones.
|
> > > > > > Matl So then the European ones should
join with European copyright
holders to sue OpenAI/Anthropic
and watch them trying to BS their
way around what they train on.
|
> > > > > > Saline9515 Training a model on copyrighted
material is fair use and copyright
holders already operate a
mafia-style extortion ring in
Europe, so I don't think that's a
good idea.
|
> > > > > > Matl > Training a model on copyrighted
material is fair use
I am not sure
this has been actually
established, but I don't see how
distilling a model is not fair use
then.
|
> > > > > > cleaning American companies won't even sue
other American companies (xAI
distilling Opus), in fact they
instead form partnerships ;)
|
> > > > > > xienze Well if they did, they wouldn't be able
to feel superior to Americans
about that particular thing.
Perish the thought!
|
> > > > > > __m It's really a pity, why can't they
feel superior while breaking ToS
and copyrights just like Americans
can feel superior over Deepseek
while breaking ToS and copyrights?
|
> > > > opsnooperfax I suppose losing with dignity is a
consolation.
|
> > djvdq Also, new Medium 3.5 is far more expensive than
previous Mistral models, and much more expensive
than e.g. Deepseek
|
> > > KronisLV I tried it out on some dev tasks with their
Mistral Vibe subscription, and the performance
was pretty okay (okay, not great), both in
regards to development and speed. Worse than
Anthropic's models I'm used to but at 20 EUR
per month it wasn't a bad deal - except that
the 200k context size would more or less be a
deal breaker in many cases.
|
> > > > eisa01 Where do you sign up for that
subscription? I wanted to try out Mistral,
but I fail to find anything like that even
after creating an account
|
> > > > > KronisLV The other comment already mentioned
that you get their subscription:
https://mistral.ai/pricing/ they do
say that you can try out their coding
agent for free, but personally the Pro
tier is pretty affordable too to try
out for a month. Then you can install
their coding harness, I personally
used the Python + uv option:
https://mistral.ai/products/vibe/code/
if you don't have uv yet, you might
have to install it too:
https://docs.astral.sh/uv/ though I
already use it for other projects. Oh
and if on Windows, you probably want
to do all of the installation inside
of WSL, just so that file paths are
the *nix variety, I've had issues
otherwise with pretty much every
coding harness, like OpenCode as well
(across multiple models). After that,
you need an API key for your
subscription, you can generate and
copy it here:
https://console.mistral.ai/codestral/cli
that's also where you see the
quota, though it seems to NOT refresh
instantly, but more or less a few
times a day. Either way, happy coding!
|
> > > > > djvdq Maybe on their pricing
page? https://mistral.ai/pricing/
|
> > > bermudi Everything is more expensive than deepseek.
They aren't frontier in intelligence but they
are the frontier in cost per intelligence
|
> echelon Nobody trying to compete with Google, OpenAI, and
Anthropic should be playing the small models / local
models game. Foundation model labs should be building
very large reasoning models, then leaving it to the
community to distill them down. You can't scale a small
model up, but you can scale a large model down. I'm
convinced the only way we'll have a seat at the table
in the future and avoid total runaway takeoff is if
there are very large models within 80% of the
capabilities of the frontier models. Tiny RTX models
do diddly squat to remain competitive. Build open
weights models for running on H200s. I'll spin them up
on RunPod or Lambda.
|
> > farley13 I do think there's a chance open weight models
have a bit of a moment with the costs of frontier
models growing on business balance sheets. It's
unfortunate from my "privacy loving" PoV that it's
mostly Chinese models filling the gap (the top
models on openrouter, for instance). I have used
Mistral models out of pure ideology for web agents
and the like which aren't doing a lot of heavy
lifting.
|
> > > theturtletalks Antirez's Deepseek 4 Flash implementation that
can run on MacBooks also was a revelation. It
runs decently on M5 Max 128GB and it's
pointing out other bottlenecks like prefill
speed which will improve.
|
> > ahnick I thought distillation meant small models don't
have to compete with the big models and can always
eventually achieve close parity, but it's just a
matter of time to do the distillation? (i.e. how
much lag do you want to live with) Am I
oversimplifying?
|
> > > gertlabs There is likely a theoretical limit to how
much intelligence you can pack into a model of
a given size (especially when stretching that
over a large input context size). Our evals are
pretty complex so we only recently started
testing ~30B class models, which are now
becoming quite smart (on par with the frontier
from 1 year ago). Mistral is far behind, but
I'm rooting for them. Data at
https://gertlabs.com/rankings
|
> raincole > they've fallen into irrelevancy right now
It's a very charitable take, as Mistral has never
really left the realm of irrelevancy. It's only a
matter of time before the EU falls back to hosting
Chinese models in EU datacenters.
|
> lettergram We actually found that Mistral Small 4, quantized
to 4-bit, was comparable to Qwen 3.6 27B and is roughly
the same size. At least from our experience on our use
cases, the quantization of the Mistral model worked
far better than trying to quantize the Qwen
family. Fully agree with your point though, Mistral in
general is far behind where I'd expect and Qwen in
particular is crushing it at the smaller
sizes. Personally, I'd consider anything 20B params and
above a "medium" model. Small being <20B and large
>100B. I think obviously we can get to the huge 1-2T
param models, but frankly the margin of accuracy
improvement for the speed hit is kinda insane (1-2%
for many metrics).
|
> > rhdunn It's all relative. For local use I'd classify it
by hardware (VRAM size) using FP8 or Q6
quantization:
1. tiny <2-3B -- easily runnable on lower-spec
hardware
2. small 4-8B -- runnable on 8GB GPUs
3. medium 9-12B -- runnable on 12GB GPUs
4. large 13-24B -- runnable on 16GB (for the lower
end models) and 24GB GPUs
5. very large 25-32GB -- runnable on 32GB GPUs
6. huge >32GB -- not easily runnable on consumer GPUs
without compromising performance (offloading layers
to the CPU/RAM), quality (heavy quantization, esp.
at <= Q4), or price (investing in multi-GPU setups
and/or server-grade hardware).
You could possibly split huge down further, as 70GB
models (e.g. llama 3) are easier to get working than
>120GB models and 1TB models are completely
intractable.
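A rough sketch of the tiers above as code. The thresholds are copied from the list; the function names are hypothetical, the "25-32GB" and ">32GB" tiers are read as parameter counts in billions (an assumption: at FP8 one parameter is about one byte, so B and GB roughly coincide), and sizes that fall in a gap between tiers (e.g. 3-4B) are assigned to the next tier up. The size estimate covers weights only and ignores KV cache and runtime overhead.

```python
# Hypothetical helpers; thresholds taken from the list above.
TIERS = [
    (3, "tiny"),         # <2-3B
    (8, "small"),        # 4-8B
    (12, "medium"),      # 9-12B
    (24, "large"),       # 13-24B
    (32, "very large"),  # 25-32 (read here as billions of parameters)
]


def tier(params_billion: float) -> str:
    """Return the tier name for a model with the given parameter count."""
    for upper_bound, name in TIERS:
        if params_billion <= upper_bound:
            return name
    return "huge"


def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes), weights only."""
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    for size in (2, 7, 12, 24, 27, 70):
        print(f"{size}B -> {tier(size)}, ~{weight_gb(size, 6):.1f} GB at 6 bits")
```

Under these assumptions a 27B model lands in "very large" and needs about 20 GB for weights at 6 bits, which leaves some headroom for context on a 32GB card.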
|
> > > sroussey As a Mac user:
1. tiny <2-3B -- could run in a browser even, mac neo
2. small 4-8B -- last of browser options, MacBook
Air base
3. medium 9-24B -- 32GB machine, air or pro notebook
or mini
4. large 25-48B -- 64GB, pro notebook or mini
5. x-large 49-100B -- 128GB MacBook Pro or Studio
6. Huge > 100B -- 256/512GB Mac Studio
|
> > > > ElFitz > tiny <2-3B -- could run in a browser
even, mac neo
Or a phone. I'm running Gemma
4 E2B in one of my apps on my 14 pro
(which may or may not be killing my
display through overheating. It might just
be a coincidence).
|
> barrell I think it really depends on what you're doing. I use
mistral for many tasks in https://phrasing.app and
they blow models many times their size out of the
water. None of my tasks use reasoning though (reasoning
actually kills the performance) so perhaps that's why.
Still, I just had to rewrite my pipeline, and mistral
was both faster, cheaper, and substantially better
than any alternative
|
> rhdunn Yeah. I run LLM models locally and for me 22B-32B is
the largest I'm willing to invest in trying out. Even
though Mistral 4 has 6B active parameters per token
(allowing 3-3.5 per token parameters to be loaded on a
4090), the ~240GB download + storage is pushing the
limits of being able to try this out locally,
especially if you are downloading and evaluating
multiple models. It also makes it harder for other
people to make downstream finetunes, like what
happened with the older Mistral/Magistral models.
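For a rough sense of these numbers, here is a back-of-the-envelope sketch. It assumes the ~120B total parameter count quoted upthread applies to this model and that weight size is simply parameters times bits per weight; both are assumptions, and the names are hypothetical. It ignores KV cache, activations, and quantization metadata.

```python
# Back-of-the-envelope weight sizes. 1 GB = 1e9 bytes.
TOTAL_PARAMS_B = 120  # assumption: total parameter count quoted upthread
ACTIVE_PARAMS_B = 6   # active parameters per token, from the comment above


def size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Weights-only size estimate in GB."""
    return params_billion * bits_per_weight / 8


sizes = {bits: size_gb(TOTAL_PARAMS_B, bits) for bits in (16, 8, 4)}
print(sizes)                        # {16: 240.0, 8: 120.0, 4: 60.0}
print(size_gb(ACTIVE_PARAMS_B, 8))  # 6.0
```

If the weights ship at 16 bits, the estimate lines up with the ~240GB download. The active parameters used per token are small, but the full set of experts still has to be downloaded and stored.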
|
> > wolttam I think machines like the DGX Spark are about to
become a lot more common/popular. It's big enough
to run sparse 150-250B MoEs with enough throughput
for a single user. Deepseek v4 Flash is #1 (in
terms of usage) on OpenRouter because it's good
enough to be useful. You can run it on a Spark
(though it runs better across 2, which is getting
up there in cost)
|
> chartpath I find Mistral Medium 3.5 with OpenCode is perfectly
fine if you're willing to talk to it in a more
fine-grained way about actual code. For me that's fine
because even with huge frontier models I don't like
trying to vibe prompt like a product manager.
|
> coredev_ I don't agree that they are falling behind. Using both
chat and cli I get what I need and it's comparable to
"sota" when I compare.
|
> arkh Mistral is entering the "let's extract as much money
from EU taxpayers as we can" phase of a European tech
company that did not get bought by a US one. They'll
end up like Dailymotion, just a zombie company.
|
> thatsadude Nawh, they trained on test since Llama 2, no wonder.
|
> dyauspitr Mistral is bad bad. For its use cases I feel like
India's Sarvam is doing better.
|
> > ctrlkctrls channeling Rocky (extraterrestrial) there I see :)
|
> kergonath > a decent proxy would be to build models that get the
r/localLlama crowd excited
I don't really disagree with
your post, but this is not exactly right. That
subreddit seems to go from hype train to hype train
every week, I haven't found anything really insightful
in it for quite a while now.
|
antirez I really want Europe to be part of the AI development and
research. And I strongly cheered for Mistral. But they are
accumulating too much technological delay. This needs to
be fixed, otherwise it will turn into yet another proof we
are not able to run large tech with good results.
Basically any Chinese lab is doing much better. It's not
Mistral that created, I don't want to say DeepSeek, but
MiMo 2.5, Minimax 2.7, and so forth. There are only weaker
and/or larger and slower (no MoE) models. Not good.
|
> sbinnee When it comes to MoE, I remember the Mixtral model
that showed the viability of MoE for the first time. I
was impressed by their technical report. To be clear,
the MoE idea was already out there, if I am not mistaken.
If they had pushed the Mixtral model family further, who
knows, they might have achieved the reputation that
the current Qwen family has. A missed opportunity.
|
> GaProgMan Compared to the UK Government, which recently announced
10 million GBP for AI research that will likely be
scooped up by consultants, I think Europe is doing
fine, considering.
|
> > antirez The first step would indeed be to join forces with
the UK, in order not to be two entities, which is
very unnatural to me.
|
> > > kergonath That Brexit ship sailed. It's very difficult
to do anything with the UK currently.
|
> > > gregorygoc No, we don't need US's Trojan horse in the EU
|
> > > > foo42 Interesting. Could you elaborate? As a pro
Europe Brit I'm interested to understand
this viewpoint. Is it a widely held
perspective, do you know?
|
> > > > > vrganj I think that while y'all were
appreciated members and definitely had
a lot to offer, you also had a lot of
annoying carve-outs and kept stalling
needed measures to federalize and
strengthen the EU more so we can be a
proper superpower in our own
right. Maybe it's good you left for
now, maybe we can finally get these
things done. And once that's
accomplished and enough of the gammon
has died off, you can always rejoin
:-)
|
> > > > > > disgruntledphd2 The UK was a useful stalking horse
for lots of smaller countries to
push back against federalisation,
it's not just them. They also have
the obvious place for any common
EU market (which is desperately
needed). Brexit has been bad for
everyone involved.
|
> > > > > > vrganj Sure, but the smaller countries
don't have the political capital
to resist federalization long term
like the Brits did.
|
> > > > > snowpid Jumping in: most people in Germany
wouldn't see the UK as an American Trojan
horse. I don't think anti-American
countries like France and Denmark
have a problem with the UK being in the EU
per se. I can see most people wanting the
UK to no longer get special treatment.
|
> kubb > But they are accumulating too much technological
delay.
How so? Catching up is easier and cheaper than
spearheading the lead.
|
> b65e8bee43c2ed0 https://en.wikipedia.org/wiki/Artificial_Intelligence_Act#Pe...
Europe shot itself in the dick with this
hastily implemented at the height of mass hysteria
bullshit and now no sane company will build anything
there. an AI startup in the US or China can be a boy
and his computer. in Europe, the boy needs a dozen
lawyers. Mistral's sinking into irrelevancy despite the
head start they had, the very promising early models
they released, and the funding they receive, might
very well be the consequence of trying to comply with
all that crap.
|
> > mhitza So let me get this straight. You think that Europe
"shot itself in the dick" by making it harder to
deploy AI that:
- manipulates, including subliminally (hope you'll
like your subliminal Ads mixed into your LLM output)
- profiling for social scoring
- automated threat labeling as an individual, with no
human supervision
- facetracking databases
- emotional and "well-being" monitoring at work or in
schools
- + many other kinds of surveillance tools.
I hope you are joking.
edit: For context this was a snippet of prohibited
use, which the fines listed on Wikipedia
(theoretically) apply to:
https://artificialintelligenceact.eu/article/5/
> > > disgruntledphd2 No, all of that stuff is fine (I have read the
AI act). It's all the compliance stuff that
will cause issues, particularly in non
financial services businesses. Each EU country
will end up badly implementing a national
regulator, and these will mostly be bad and
stop good stuff from happening.
|
> > Epa095 You don't compete with Anthropic from the
basement. For that you need either a shitload of
money, or a government that is not afraid of
getting very very involved. There are a lot of
Europeans working on AI, it's just that a lot of
them work for American companies. Because of
money.
|
> > > alecco I think both of you are correct.
|
> > antirez Possibly yes, but let me remind you that France,
Italy, and Germany were against the AI act, so
something very odd is happening here: the EU
founding nations are getting marginalized by the
countries they welcomed on key topics for our
future, and I believe corruption could be a big
part of what is happening, both internal to those
three countries and at an even more alarming rate
in other countries.
|