• 1 Post
  • 105 Comments
Joined 1 year ago
Cake day: July 20th, 2023


  • My PC with 4GB of RAM and an HDD is barely holding up with Linux Mint. To be fair, Mint isn't the problem: it only uses 32% of the RAM, compared to the 60+% of a debloated Windows 10. It's the other apps. Running a browser alongside anything else makes Linux Mint struggle, and even built-in apps like the file manager and the text editor feel like they're going to crash the computer at any moment because of the random freezes and delays.

    My advice would be to try upgrading your RAM to 4GB and installing an SSD. Your old computer will likely only support SATA SSDs, which top out at around 500MB/s, but that's still far better than the roughly 30MB/s an old HDD tends to manage.
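To put those throughput figures in perspective, here is a back-of-the-envelope sketch of how long each drive would take to read a 4GB file (say, a quantized model). It assumes a constant throughput at the speeds quoted above and ignores seek times and caching, so real numbers will differ.

```python
def read_time_seconds(size_mb: float, speed_mb_per_s: float) -> float:
    """Idealized time to read size_mb megabytes at a constant throughput."""
    return size_mb / speed_mb_per_s

# Speeds quoted above: ~500MB/s for a SATA SSD, ~30MB/s for an old HDD.
FILE_MB = 4000  # hypothetical example: a file of roughly 4GB

ssd = read_time_seconds(FILE_MB, 500)
hdd = read_time_seconds(FILE_MB, 30)
print(f"SSD: {ssd:.0f}s, HDD: {hdd:.0f}s")  # SSD: 8s, HDD: 133s
```

Even under these idealized assumptions, the HDD is more than 16 times slower for the same file.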








  • Completely forgot to tell you to only use quantized models. Your PC can run 4-bit quantized versions of the models I mentioned. That's the key to running LLMs on consumer-level hardware. Later on you can read more about the different quantization formats and experiment with others like Q5_K_M.

    I just read that Phi-3 was released, and apparently it's a 4B model that reaches GPT-3.5 level. Keep an eye on the news and wait for it to be added to ollama/llama.cpp.

    Thank you so much for taking the time to help me with that! I'm very new to the whole LLM thing and sort of figuring it out as I go.

    I became fascinated with LLMs after the first AI boom, but all this knowledge is basically useless where I live, so I might as well make it useful by teaching people what I know.
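As a rough illustration of why 4-bit quantization is the key, the sketch below estimates how much memory the weights alone need at 16 bits versus 4 bits per weight. It's an approximation: it ignores the context (KV) cache and runtime overhead, and it treats 4-bit as exactly 4 bits per weight, while formats like Q4_K_M actually spend a little more.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights only, in GB (1 GB = 1e9 bytes)."""
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

for name, params in [("7B", 7), ("8B", 8), ("13B", 13)]:
    fp16 = weight_memory_gb(params, 16)  # unquantized half-precision weights
    q4 = weight_memory_gb(params, 4)     # idealized 4-bit quantized weights
    print(f"{name}: fp16 ≈ {fp16:.1f} GB, 4-bit ≈ {q4:.1f} GB")
```

For a 7B model that works out to about 14 GB at 16 bits versus about 3.5 GB at 4 bits, which is the difference between "won't fit" and "fits" on a typical consumer machine.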



  • Yeah, it's not a potato, but it's not that powerful either. Nonetheless, it should easily run 7B/8B/9B models and maybe 13B ones.

    running them in Python with Huggingface’s Transformers library (from local models

    That's your problem right here. Python is great for developing LLMs but horrible at running them. With a computer as weak as yours, every bit of performance counts.

    Just try ollama or llama.cpp. Their GitHub repos are also a goldmine of other projects you could try.

    llama.cpp can offload part of the model to the GPU for much faster inference.

    Piper is a pretty decent, very lightweight TTS engine that can run directly on your CPU if you want to add text-to-speech to your setup.

    Good luck and happy tinkering!
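If you go the ollama route, you can still drive it from Python: the heavy lifting happens in ollama's native backend, and Python only sends HTTP requests. This is a minimal sketch assuming ollama is running on its default local port (11434) and that a model named `llama3` has already been pulled. It uses only the standard library and ollama's `/api/generate` endpoint with streaming turned off. As far as I know, ollama decides on its own how many layers to offload to the GPU.

```python
import json
import urllib.request

# ollama's default local endpoint (assumes the server runs on this machine)
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def generate(model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send the prompt to a running ollama server and return the reply text."""
    req = build_generate_request(model, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # With "stream": False, the whole answer arrives in one JSON object
        return json.loads(resp.read().decode("utf-8"))["response"]
```

With the server up and the model pulled (e.g. `ollama pull llama3`), `generate("llama3", "Explain quantization in one sentence.")` returns the reply as a plain string. Keeping the request construction in its own function makes it easy to inspect or tweak the payload without a running server.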




  • Sadly, I can't really help you much. I have a potato PC, and the biggest model I've run on it was Microsoft Phi-2 using the Candle framework. I used to tinker with llama.cpp on Colab, but it seems it doesn't handle Llama 3 yet. Ollama says it does, but I've never tried it. As for the speed, it's expected for a 70B model to be really slow on the CPU. How slow is too slow? I don't really know…

    You can always try the 8B model. People say it's really great and has even replaced the 70B models they'd been using.
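Why is a 70B model so much slower than an 8B one on the CPU? A common rule of thumb (an approximation, not a benchmark) is that generating each token requires reading every weight from RAM once, so memory bandwidth caps the speed. The sketch below assumes 4-bit weights and a hypothetical 20GB/s of memory bandwidth; plug in your own machine's figure.

```python
def max_tokens_per_second(model_size_gb: float, mem_bandwidth_gb_s: float) -> float:
    """Rough upper bound: every generated token streams all weights from RAM once."""
    return mem_bandwidth_gb_s / model_size_gb

BANDWIDTH_GB_S = 20.0  # hypothetical bandwidth for an older desktop's RAM

for name, params_b in [("70B", 70), ("8B", 8)]:
    size_gb = params_b * 4 / 8  # 4-bit weights, ignoring quantization overhead
    tps = max_tokens_per_second(size_gb, BANDWIDTH_GB_S)
    print(f"{name}: ~{size_gb:.0f} GB of weights, at most ~{tps:.1f} tokens/s")
```

Under these assumptions the 70B model tops out at well under one token per second, while the 8B model could reach several, which lines up with people switching to the 8B for everyday use.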