Llama2 70B (offline AI) says stuff about the CPU scheduler, cores, and ISA, and I don't know how else to verify it (is a CPU core's ISA determined by microcode, and how to hack it?)

j4k3@lemmy.world · 11 months ago

luchs@feddit.de · 11 months ago

Similarly, Linux kernel developers have added support for asymmetric instruction availability in their schedulers through features like “CPU microcode updates”, which allow the OS to query a processor’s available instructions sets at runtime and make more informed decisions about thread placement as a result.

This is bullshit. Microcode is not in any way a mechanism for querying information. It’s an opaque blob provided by Intel and the system firmware. What’s happening with Alder Lake is that AVX-512 is disabled through microcode, but if you’re not Intel or a motherboard manufacturer, there is nothing you can do about that.
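For what it's worth, the runtime query the model is gesturing at is the CPUID instruction, not microcode. On Linux the kernel exposes the resulting feature flags on the `flags` line of `/proc/cpuinfo`. A minimal sketch for checking which logical CPUs report AVX-512, assuming a Linux system and that `avx512f` (the AVX-512 Foundation flag) is the one of interest; the function name `avx512_cpus` is made up for illustration:

```python
from pathlib import Path


def avx512_cpus(cpuinfo_text: str) -> list[int]:
    """Return the logical CPU numbers whose flags line contains avx512f."""
    cpus = []
    current = None  # number from the most recent "processor" line
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator line between CPU entries
        key = key.strip()
        if key == "processor":
            current = int(value)
        elif key == "flags" and current is not None:
            if "avx512f" in value.split():
                cpus.append(current)
    return cpus


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")  # Linux only
    if path.exists():
        found = avx512_cpus(path.read_text())
        print("CPUs reporting avx512f:", found or "none")
```

An empty result means no CPU advertises the feature to the OS. It does not say whether the feature is fused off or only disabled by microcode or firmware.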

If your goal is to make AVX-512 usable on your computer, don’t bother with it. It will not magically make things go faster.

j4k3@lemmy.world · edit-2 11 months ago

The code running this model is specifically designed to use these instructions. I can run larger versions than this 4-bit one, but they are impractically slow. This is the exact bottleneck the AVX-512 instructions were designed to help alleviate. I have no problem setting up the cgroups isolation and CPU affinity required to use the instructions safely, with all other threads kept on the E-cores. I think my processor is likely fused, but I can’t confirm this yet.
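The affinity part of that setup can be sketched in Python with `os.sched_setaffinity`. This is only a rough illustration, assuming Linux and assuming the kernel lists the P-core CPUs in `/sys/devices/cpu_core/cpus` on this hybrid chip (that path is an assumption here, so check it on your machine). The helper names `parse_cpu_list` and `pin_current_process` are made up for the example, and it does not cover the cgroups side:

```python
import os
from pathlib import Path


def parse_cpu_list(text: str) -> set[int]:
    """Parse a kernel-style CPU list such as '0-3,8,10-11' into a set."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty input or trailing comma
        lo, sep, hi = part.partition("-")
        last = int(hi) if sep else int(lo)
        cpus.update(range(int(lo), last + 1))
    return cpus


def pin_current_process(cpus: set[int]) -> set[int]:
    """Restrict the calling process to the given CPUs (Linux only)."""
    os.sched_setaffinity(0, cpus)  # pid 0 means the calling process
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    # Assumed sysfs location of the P-core list on hybrid Intel CPUs.
    p_core_file = Path("/sys/devices/cpu_core/cpus")
    if p_core_file.exists():
        p_cores = parse_cpu_list(p_core_file.read_text())
        print("P-core CPUs:", sorted(p_cores))
```

Calling `pin_current_process(p_cores)` before starting the inference threads would keep them on the P-cores. Child processes inherit the mask.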

https://github.com/zingaburga/alderlake_avx512/wiki#enabling-avx-512-on-alder-lake

∟⊔⊤∦∣≶@lemmy.nz · 11 months ago

What are you running this on?

The max I can get on a 24GB 3090 is a 30B model.

j4k3@lemmy.world · 11 months ago

The key is the 64GB of system memory and the 12th-gen i7. Details are in the posts below. The top link is with the GPU and the bottom one is CPU only. The model itself is very important. I tried several before getting this one working.

https://lemmy.world/post/2952867

https://lemmy.world/post/2892804

GL