yu3zhou4 The README is, in my opinion (author here), the most
interesting part. I wrote it to help others build a useful
mental model, so that you can recreate the project yourself
without even needing to read my code.
|
> lukemerrick I am not super familiar with C and CUDA, so I read
it solely for the README and enjoyed it supremely. The
blend of cheerful walking through instructive examples
and your philosophical takes on how to approach the
exercise to get the most out of it put me in a great
mood. You captured that special upbeat attitude that
comes about when you're doing something as well as you
can just because it's so legitimately interesting to
you.
|
> janalsncm Really practical teaching approach. I clicked in to
see how safetensors are loaded and just kept reading.
Thanks for sharing.
|
> quanglee Love the detail you put into explaining the different
techniques. It's a bit dense though; some diagrams would
help, I think.
|
cookiengineer Wanted to add that the author has an amazing blog with
lots of interesting papers: https://jedrzej.maczan.pl/
|
samhoss93 Great README. Genuinely one of the clearest walkthroughs
of inference internals. The KV cache section is worth
lingering on, as most OOM and throughput issues trace back
to it and are normally difficult to reason about. Sequence
length and batch size fill the cache in a way that shows up
under real traffic. Looking forward to going over the
completed course.
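
To make the point about sequence length and batch size concrete, here is a minimal sketch of the usual back-of-the-envelope KV cache size estimate. The struct, the function name, and the model dimensions in the note below are hypothetical illustrations, not taken from the project or its README.

```c
#include <stdint.h>

/* Hypothetical helper (not from the project): the model dimensions
 * that determine how many bytes one cached token costs. */
typedef struct {
    uint64_t n_layers;       /* number of transformer layers */
    uint64_t n_kv_heads;     /* key/value heads per layer */
    uint64_t head_dim;       /* dimension of each head */
    uint64_t bytes_per_elem; /* 2 for fp16/bf16, 4 for fp32 */
} kv_cache_shape;

/* Estimated KV cache size in bytes for a batch of equal-length sequences. */
uint64_t kv_cache_bytes(kv_cache_shape s, uint64_t batch_size, uint64_t seq_len) {
    /* The leading 2 accounts for one K and one V tensor per layer. */
    uint64_t per_token = 2 * s.n_layers * s.n_kv_heads * s.head_dim * s.bytes_per_elem;
    /* The cache grows linearly in both batch size and sequence length. */
    return per_token * batch_size * seq_len;
}
```

With assumed values of 32 layers, 8 KV heads, a head dimension of 128, and fp16 storage, each token costs 128 KiB, so one 4096-token sequence needs 512 MiB and a batch of 16 such sequences needs 8 GiB.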
|
dwa3592 Very nice job on the README.
>> Physically, LLM is a file which contains a lot of float numbers.
Aka the atoms of the LLM.
|
> cyanydeez the universe is just atomic if statements
|
xuanlin314 The lesson-style README is a great approach. Breaking down
LLM inference into digestible steps makes the codebase
approachable even for people who haven't touched CUDA
before.
|
GoldenJade Thanks for sharing this. As someone currently researching
LLMs, I'm sure I'll be referencing this quite a bit going
forward.
|
tom-wal I feel like I learned twice as much in 10 minutes reading
this than I did reading LLM for Dummies. Thank you
|
nazgulsenpai I love the documentation formatted in lessons. I can't
wait to read through it.
|
juancn Looks interesting, it reminds me of the first llama.cpp,
but better documented.
|
sylware I am looking for a plain and simple LLM inference
implementation in C, and/or x86_64 assembly, and/or AMD
GPU RDNA assembly. Anybody?
|
> irishcoffee I heard once that C++ can become assembly at some
point if you type the right things in. :)
|
> > sylware Well, the whole purpose is to be independent of
invisible backdoor injectors...^W I mean compilers,
or to be more accurate, those compilers which deal
with computer languages with an absurd and
grotesque syntax complexity.
|
einpoklum It seems the author believes checking the return values of
CUDA API calls is not "tiny" enough :-(
|