• 3 Posts
  • 29 Comments
Joined 1 year ago
Cake day: June 15th, 2023

  • Universal Paperclips is one of the best clicker games.

    In particular: because it isn’t really a clicker game. It only starts off as one. There are only about two sections, IIRC, that are “clicker”: the start (before auto-clippers kick in), and then the quantum computer.

    I guess you have to launch your first 20 or 30 probes at the space stage and that’s done one-click-at-a-time… but I don’t think that counts as a “clicker” game since it’s so few clicks in the great scheme of things. At no other point is rapid-clicking that useful.


  • I had a pretty standard linear-list scan initially. Each time the program started, I’d check the list for some values. The list of course grew each time the program started. I capped the list size at like 2MB or something (I forget), but it was in the millions of entries and therefore in the MBs range. I figured it was too small for me to care about optimization.

    I was somewhat correct: even when I simulated a full-sized list, the program booted faster than I could react, so I didn’t care.


    Later, I wrote some test code that exhaustively tested startup conditions. Instead of just running the startup once, I was running it millions of times. Suddenly I cared about startup speed, so I replaced it with a Hash Table so that my test-code would finish within 10 minutes (instead of taking a projected 3 days to exhaustively test all startup conditions).


    Honestly, I’m more impressed at the opposite. This is perhaps one of the few times I’ve actually taken the linear list and optimized it into a hash table. Almost all other linear lists I’ve used in the last 10 years of my professional coding life remain just that: a linear scan, with no one caring about performance. I’ve got linear lists doing some crazy things, even with MBs of data, and no one has ever come back to me and said they need optimization.

    Do not underestimate the power of std::vector. It’s probably faster than you expect, even with O(n^2) algorithms all over the place. std::map and std::unordered_map certainly have their uses, but there are a lot of situations where std::vector is far, far, far easier to think about, so it’s my preferred solution rather than pre-optimizing to std::map ahead of time.
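
    To make the trade-off concrete, here’s a minimal C++ sketch (not my original program; the sizes and values are made up): one linear scan over a std::vector of a million entries finishes in microseconds, and the hash table only pays off once the lookup is repeated millions of times.

    ```cpp
    // Hypothetical comparison: linear scan over std::vector vs. hash lookup.
    // A single scan of ~1 million entries is near-instant; the hash table wins
    // only when the query repeats millions of times (e.g. exhaustive testing).
    #include <algorithm>
    #include <chrono>
    #include <cstdint>
    #include <iostream>
    #include <unordered_set>
    #include <vector>

    int main() {
        constexpr std::size_t kCount = 1'000'000;  // made-up list size
        std::vector<std::uint32_t> values(kCount);
        for (std::size_t i = 0; i < kCount; ++i)
            values[i] = static_cast<std::uint32_t>(i * 7);

        const std::unordered_set<std::uint32_t> lookup(values.begin(), values.end());
        constexpr std::uint32_t needle = 999'983;  // arbitrary probe value

        const auto t0 = std::chrono::steady_clock::now();
        const bool found_linear =
            std::find(values.begin(), values.end(), needle) != values.end();
        const auto t1 = std::chrono::steady_clock::now();
        const bool found_hash = lookup.count(needle) != 0;
        const auto t2 = std::chrono::steady_clock::now();

        using us = std::chrono::microseconds;
        std::cout << "linear scan: " << found_linear << " in "
                  << std::chrono::duration_cast<us>(t1 - t0).count() << " us\n";
        std::cout << "hash lookup: " << found_hash << " in "
                  << std::chrono::duration_cast<us>(t2 - t1).count() << " us\n";
    }
    ```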



  • How many layers does the Orange Pi Zero PCB have?

    Answer: Good luck finding out. That’s not documented. But based on the layout and what I can see in screenshots, far more than 4 layers.


    A schematic alone is kind of worthless. Knowing whether a BGA is designed for 6, 8, or 10 layers makes a big difference. Seeing a reference PCB implementation with exactly that layer count, so the EE knows how to modify the design for themselves, is key to customization. There’s all sorts of EMI and trace-length matching that needs to happen to get that CPU-to-DDR connection up and running.

    Proving that a 4-layer layout like this exists is a big deal. It means that a relative beginner can work with the SAM9x60’s DDR interface on cheap 4-layer PCBs (though as I said earlier: 6 layers offer more room and are available at OSH Park, so I’d recommend a beginner work with 6 instead).


    With regards to the SAM9x60D1G-I/LZB SOM vs the Orange Pi Zero, the SAM9x60D1G-I/LZB SOM provides you with access to all remaining pins… 152 pins… of the SAM9x60, meaning a full development board with full access to every feature. It’s a fundamentally different purpose. The SOM is a learning tool and development tool for customization.


  • Well, my self-deprecating humor aside, I’ve of course thought about it more deeply over the course of my research. So I don’t want to sell it too short.

    The SAM9x60 has a proper GPU (albeit a 2D one), full-scale Linux, and DDR2 support (easily reaching 64MB, 128MB, or beyond of RAM). At $3 for DDR2 chips the cost efficiency is absurd (https://www.digikey.com/en/products/detail/issi-integrated-silicon-solution-inc/IS43TR16640C-125JBL/11568766): a QSPI 8Mbit (1MB) SRAM chip basically costs the same as 1Gbit (128MB) of DDR2 RAM.

    Newhaven Displays offers various 16-bit TFT/LCD screens (https://newhavendisplay.com/tft-displays/standard-displays/) at a variety of price points. Let’s take, say… a 400x300-pixel 16-bit screen, for instance. How much RAM do you need for the framebuffer? (I dunno: this one https://newhavendisplay.com/4-3-inch-ips-480x272px-eve2-resistive-tft/ or something close).


    Oh right: 400 x 300 x 2 bytes per pixel and we’re already at 240kB, meaning the entire field of MSP430, ATmega328, ARM Cortex-M0, and even ARM Cortex-M4 parts is dead on the framebuffer alone. Now let’s say we have 10 frames of animation we’d want to play, and bam, we’re already well beyond what a $3 QSPI SRAM chip will offer us.
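
    A quick back-of-the-envelope sketch of that framebuffer math (the display size and frame count are just the hypothetical numbers from above, not a real design):

    ```cpp
    #include <cstdint>
    #include <iostream>

    int main() {
        constexpr std::uint32_t width = 400;
        constexpr std::uint32_t height = 300;
        constexpr std::uint32_t bytes_per_pixel = 2;  // 16-bit color (e.g. RGB565)
        constexpr std::uint32_t frame_bytes = width * height * bytes_per_pixel;  // 240,000 bytes
        constexpr std::uint32_t animation_frames = 10;

        std::cout << "one frame:          " << frame_bytes << " bytes (~"
                  << frame_bytes / 1024 << " KiB)\n";
        std::cout << "10-frame animation: " << frame_bytes * animation_frames
                  << " bytes, well past a 1MB QSPI SRAM\n";
    }
    ```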

    But let’s look at one of the brother chips really quick: Microchip’s SAMA5D4. Though more difficult to boot up, this one comes with an H.264 decoder. Forget “frames of animation”: this baby straight up supports MP4 video on a full-scale Linux platform.

    Well, maybe you want a Raspberry Pi to run that, but a Raspberry Pi 4 can hit 6000mW of power consumption, far beyond the means of typical battery packs of the ~3-inch variety. Drop the power consumption to 300mW (SAMA5D4 + DDR2 RAM) + 300mW (LCD screen) and suddenly we’re in the realm of AAA batteries.


    So we get to the point where I can say: I can build you a 3" scale device powered by AAA batteries that runs full Linux and supports H.264-decoded animations on a touch-screen interface, fully custom with whatever chips/whatever you want on it. Do I know what it does yet? No. Lol, I haven’t been able to figure that out yet. But… surely this is a useful base to start thinking of ideas.



  • That’s not what storage engineers mean when they say “bitrot”.

    “Bitrot”, in the scope of ZFS and BTRFS, means the situation where a hard drive’s “0” gets randomly flipped to a “1” (or vice versa) during storage. It is a well-known problem and can happen within months. Especially as a 20TB drive these days is a collection of 160 trillion bits, there’s a high chance that at least some of those bits malfunction over a period of ~double-digit months.

    Each problem has a solution. In this case, Bitrot is “solved” by the above procedure because:

    1. Bitrot usually doesn’t happen within single-digit months, so regular scrubs every ~6 months nearly guarantee that any bitrot problems you find will be limited in scope: just a few bits at most.

    2. Filesystems like ZFS or BTRFS are designed to handle many, many bits of bitrot safely.

    3. Scrubbing is a process where you read, and if necessary restore, any files where bitrot has been detected.

    Of course, if hard drives are of noticeably worse quality than expected (ex: if you do have a large number of failures in a shorter time frame), or if you’re not using the right filesystem, or if you go too long between your checks (ex: taking 25 months to scrub for bitrot instead of just 6 months), then you might lose data. But we can only plan for the “expected” kinds of bitrot. The kinds that happen within 25 months, or 50 months, or so.

    If you’ve gotten screwed by a hard drive (or SSD) that bitrots away in like 5 days or something awful (maybe someone dropped the hard drive and the head scratched a ton of the data away), then there’s nothing you can really do about that.


  • If you have a NAS, then just put iSCSI disks on the NAS, and network-share those iSCSI fake-disks to your mini-PCs.

    iSCSI is “pretend to be a hard drive over the network”. iSCSI can exist “after” ZFS or BTRFS, meaning your scrubs / scans will fix any issues. So your mini-PC can have a small C: drive, but then be configured so that most of the storage lives on the D: iSCSI / network drive.

    iSCSI is very low-level. Windows literally thinks it’s dealing with a (slow) hard drive over the network. As such, it works even in complex situations like Steam installations, albeit at slower network speeds (it has to talk to the NAS before the data comes in) rather than the faster speeds of a directly connected hard drive (or SSD).


    Bitrot is a solved problem. It is solved by using bitrot-resilient filesystems with regular scans / scrubs. You build everything on top of solved problems, so that you never have to worry about the problem ever again.



  • Wait, what’s wrong with issuing “ZFS Scan” every 3 to 6 months or so? If it detects bitrot, it immediately fixes it. As long as the bitrot wasn’t too much, most of your data should be fixed. EDIT: I’m a dumb-dumb. The term was “ZFS scrub”, not scan.

    If you’re playing with multiple computers, “choosing” one to be a NAS and being extremely careful with the data that it’s storing makes sense. Regularly scanning all files and attempting repairs (which is just a few clicks with most NAS software) is incredibly easy, and probably could be automated.


  • Honestly, Docker is solving these problems in a lot of cases in practice.

    It’s kinda stupid that so many dependencies need to be kept track of that it’s easier to spin up a (VM-like) environment just to run Linux binaries properly. But… it does work. With a bit more spit-shine / polish, Docker is probably the way forward on this issue.

    But Docker is just not there yet. Because Linux is open source, there are no license penalties for just carrying an entire Linux distro around with your binaries. And who cares if a binary is like 4GB? Does it work? Yes. Does it work across many versions of Linux? Yes (for… the right versions of Linux with the right versions of Docker, but yes!! It works).

    Give Docker a bit more long-term stability and compatibility, and I think we’re going somewhere with that. Hard drives these days are $150 for 12TB and SSDs are $80 for 2TB; we can afford to store those fat binaries, as inefficient as it all feels.


    I did have a throwaway line about musl problems, but honestly, we already have incredibly fat Docker images lying around everywhere. Why are the OSS guys trying to save like 100MB here and there when no one actually cares? Just run glibc and stop adding incompatibilities for what are, honestly, tiny amounts of space savings.


  • Because it isn’t inferior.

    Ubuntu can barely run programs from 5 years ago; backwards compatibility is terrible. Red Hat was doing well, but it just shit the bed. To have any degree of consistency, you need to wrap all apps inside of Docker containers and carry all the dependencies with you (but this leads to obscure musl bugs in practice, because musl has different bugs than glibc).

    For better or worse, Windows programs that depend on kernel32.dll (at the C++ level) have remained consistently deployable since the early 1990s and rarely break. C# programs have had a good measure of stability as well. DirectX9, DirectX10, DirectX11, and DirectX12 all made major changes to how the hardware works, and yet all the hardware automatically functions on Windows. You can play StarCraft from 1998 without any problems despite it being a DirectX6 game.

    Switch back over to Ubuntu land, and Wayland is… maybe working? Eventually? Good luck reaching back to programs using X.org dependencies or systemd.


    Windows is definitely a better experience than Ubuntu. I think Red Hat has the right idea, but IBM is seemingly killing all the goodwill built up around Red Hat and CentOS. SUSE Linux is probably our best bet moving forward as a platform that cares about binary stability.

    The Windows networking stack is also far superior for organizations. Samba on Linux works best if you have… a Windows Server instance holding the group policies and ACLs on a centralized server. Yes, $1000 software works better than $0 software. Windows Server is expensive, but it’s what organizations need to handle ~50 to ~1000 computers inside a typical office building.

    Good luck deploying basic security measures in an IT department with Linux. The only hope, in my experience, is to buy Windows Server and then run Samba (and deal with Samba bugs as appropriate). I’m not sure I ever got Linux-as-a-Windows-server working well. It’s not like the Linux development community understands what an ACL is in practice.


  • Professor Lemire, btw, is a high-performance-computing professor who has been writing a lot of AVX512 techniques / articles for the past few years. His blog posts are very popular on Hacker News (news.ycombinator.com). Pretty cool guy; I think it’s well worth following his blog if you’re into low-level assembly, low-level memory optimizations, and the like.


    pext (and its reverse, pdep) are basically a 64-bit bitwise gather and a 64-bit bitwise scatter instruction. On Intel, they execute in 1 tick, but on AMD they took 19 ticks (at least, as of a few years ago). Rumor is that the newest AMD chips are faster at it.
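
    A minimal sketch of what that bitwise gather/scatter looks like with the BMI2 intrinsics (the values here are illustrative only; compile with -mbmi2 on a BMI2-capable x86-64 CPU):

    ```cpp
    #include <cstdint>
    #include <cstdio>
    #include <immintrin.h>  // _pext_u64 / _pdep_u64 (BMI2)

    int main() {
        const std::uint64_t data = 0b1011'0110;
        const std::uint64_t mask = 0b1100'1010;  // the bit positions we care about

        // pext ("gather"): pull the bits of `data` selected by `mask`
        // and pack them into the low bits of the result: here 0b1001.
        const std::uint64_t gathered = _pext_u64(data, mask);

        // pdep ("scatter"): spread the low bits of a value back out
        // into the positions selected by `mask`: here 0b1000'0010.
        const std::uint64_t scattered = _pdep_u64(gathered, mask);

        std::printf("gathered  = 0x%llx\n", static_cast<unsigned long long>(gathered));
        std::printf("scattered = 0x%llx\n", static_cast<unsigned long long>(scattered));
    }
    ```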

    pdep and pext are some of my favorite functions, because gather/scatter is an important supercomputer / parallelism concept, and Intel invented an extremely elegant way to describe bit movement within 64-bit registers. Given how important gather/scatter has been to supercomputer algorithms over the past 40 years, I expect many, many more applications of pdep/pext.

    My own experiment with pdep and pext was to create a small, bit-scale relational database for solving 4-coloring-theorem-like problems. I was able to implement “select” with a pext and “join” with a pdep. (4 bits is a single-column table, 16 bits a dual-column table, 64 bits a triple-column table.)


  • It’s not so easy.

    GPU programmers are the experts in AoS vs SoA formats. And when you look at how RGB values are stored, it’s… incredibly complex. Sometimes you’ve got RRRRGGGGBBBB, sometimes it’s RGBARGBARGBA, sometimes it’s YYYYUUVV. What’s best for performance changes dramatically from system to system, requiring lots of benchmarking and ultimately… a massive slew of processor-specific / ARM NEON instructions that convert between every format imaginable.
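
    For anyone unfamiliar with the jargon, here’s a tiny C++ sketch of the two layouts (the struct names and image size are made up, just to illustrate the idea):

    ```cpp
    #include <cstdint>
    #include <vector>

    // AoS ("array of structures"): RGBARGBARGBA... — all channels of one pixel
    // sit together, which is convenient for per-pixel work.
    struct PixelAoS {
        std::uint8_t r, g, b, a;
    };

    // SoA ("structure of arrays"): RRRR...GGGG...BBBB...AAAA... — each channel
    // is contiguous, which is convenient for SIMD over a single channel.
    struct ImageSoA {
        std::vector<std::uint8_t> r, g, b, a;
    };

    int main() {
        constexpr std::size_t kPixels = 640 * 480;  // made-up image size

        std::vector<PixelAoS> aos(kPixels);
        ImageSoA soa{std::vector<std::uint8_t>(kPixels), std::vector<std::uint8_t>(kPixels),
                     std::vector<std::uint8_t>(kPixels), std::vector<std::uint8_t>(kPixels)};

        // Per-pixel pass: AoS touches one small struct at a time.
        for (auto& p : aos) p.a = 255;

        // Per-channel pass: SoA streams through one contiguous array.
        for (auto& g : soa.g) g = static_cast<std::uint8_t>(g / 2);
    }
    ```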

    Oh right, GPUs don’t need those processor-specific instructions because permute and bpermute instructions exist (a 32-way crossbar: permute pushes any lane’s data to any lane, and bpermute lets any lane pull from any lane). CPUs do need them though.