• 2 Posts
  • 28 Comments
Joined 1 year ago
cake
Cake day: June 15th, 2023

help-circle

  • This is about thew new starter cost.

    When a developer joins a team, they will not be as productive as they have to learn the code, frameworks, libraries, the project purpose, the tooling, etc… Often this impacts other members of the team lowering the entire teams productivity.

    When you use productivity tracking (e.g. things like capacity planning) you will see the teams performance drop and it will take time for it to exceed the previous measured performance. This is the cost of adding a new starter.

    So if it takes 6 weeks for a new starter to increase overall team producitivty then planning someone on a project for 4 weeks is pointless since the team will have a higher delivery rate without the extra person. This is typically why an organsation loses its ability to migrate staff between projects.

    Code formating affects the layout of the code and our brains do all sorts of tricks around pattern recognition, so if your code formatting rules are too different a someone migrating between projects has to spend time looking for code and retraining their brain.

    Its an additional barrier and a one within an organisations skills to remove (by forcing a common code standard).



  • Python is unique in formatting forms part of the syntax, every language has linters but its far more common for orgs to tweak the default rules .

    For example Java has Checkstyle. The default rules ‘sun checks’ give a line length of 80, tabs are 4 spaces and everything is placed on a new line.

    Junior devs inevitably want to trash the line length (honestly on 1080p monitors, 120 makes sense,).

    There is always a new line/same line discussion (everyone perfers same line but there is always one die hard new line person).

    The tab width discussion always has one junior dev complain that “tabs are better”, as someone who started development on Visual Studio 6 where half the team double spaced, the other half used tabs. Those people get a lecture from me on how we can convert tabs to spaces but not the inverse so it will always be spaces if I am near.

    With Checkstyle you upload the rule file as an artifact into your M2 repository. Then you can pull it down as a dependency when the checkstyle plugin runs.


  • I avoid any company that requires a software test before the interview.

    I worked for a company that introduced them after I joined, I collected evidence all of the companies top performers wouldn’t have joined since we all had multiple offers and having to do the test would put people off applying. The scores from it didn’t correlate with interview results so it was being ignored by everyone. Still took 2 years to get rid of it.

    The best place used STAR (Situation Task Action Result) based interviews. The goal was to ask questions until you got 2 stars.

    I thought these were great because it was more varied and conversational but there was a comparable consistency accross interviewers.

    You would inevitably get references to past work and you switch to asking a few questions about that. Since it was around a situation you would get more complete technical explanations (e.g. on that project I wrote an X and Y was really challenging because of Z).

    I loved asking “Tell me about something your really proud off”. Even a nervous junior would start opening up after that question.

    After an hour interview you would end up with enough information you could compare them against the company gradings (junior, senior, etc…).

    This was important because it changed the attitude of the interview. It wasn’t a case of if the candidate would be a good senior dev for project X, but an assessment of the candidate. If they came out as a lead and we had a lead role, lets offer them that.



  • Nvidia drivers don’t tend to be as performant under linux.

    With AMD instead of using the AMD VLK driver, you would use the RADV (developed largely by valve). Which petforms better.

    Every AMD card under linux supports OpenCL (the driver is more based on graphics card architecture) and you install it very easily. Googling it with windows found pages of errors and missing support.

    Blender supports OpenCL. I bet the 2x improvement is Blender being able to ofload rendering to the AMD graphics card.

    Also this represents the biggest headache in Linux, lots of gamers insist they can only use Nvidia cards. Nvidia treats linux as an afterthought as best or deliberately sabotages things at worse.

    AMD embraced open source and so Linux land is much nicer on AMD (and to a less extent Intel).

    The results here will probably be a DxVK quirk, lots of “Nvidia optimised” games have game engines doing weird things and the Nvidia driver compensates. DxVK has been identifying that to produce “good” vulkan calls.


  • Python’s public API changes subtly, so minor changes in Python version can lead to massive changes in the version of dependencies you use.

    A few years ago we developed a script to update Cassandra on Python 2.7.Y. Production environment used Python 2.7.X (it was 5 patch releases earlier).

    This completely changed the cassandra library version. We had to go back 15 patch releases which annoying resulting in a breaking change in the Cassandra libraries API and wouldn’t work on the dev environments Python.

    Python 3 hasn’t solved this, 2 years ago I was asked to look at a number of Machine Learning projects running in docker. Upgrading Python from 3.4 to 3.8 had a huge effect on dependencies and figuring out the right combination was a huge pain.

    This is a solved problem in Java, Node.js has the same weakness but their changes to language spec are additive so old code runs on new releases (just not the inverse). Ruby has exactly the same issues as Python


  • This advice isn’t grounded in reality.

    Management normally defines ways to track and judge itself, these are typically called Key Performance Indicators.

    KPI’s are normally things like contract value growth, new contracts signed, profit margin, etc…

    So if the project manager is meeting or exceeding their KPI’s and you walk up to their boss telling them the PM is failing as basic job functions, the boss won’t care.

    This is because the boss might have set the KPI’s or the boss might also be judged on them. In either situation its to the bosses advantage to ignore you.

    The boss will only care if there is a KPI you can demonstrate the PM failing to meet.

    Every person/group will have various incentives and motivations. To affect change you have to understand what they are.


  • A project manager has responsibility for delivery of a project but they typically lack domain specific knowledge. As a result they can’t directly deliver something, merely ask subject matter experts for advice and facilitate a team to deliver.

    Most PM’s cope with the stress of this position poorly.

    This cartoon is an example of micro management (a common coping mechanisim), the manager has involved themselves in the low level decisions because that gives a sense of control. If a technical team then tell them its a bad decison the team are effectively attacking their coping mechanisim.

    The solution isn’t to tell them their technical idea is terrible, when you’ve fallen down this rabbit hole you have to treat the PM as a stakeholder. They are someone you have to manage, so a common solution is to give them confidence there is a path to delivery, a way to track and understand it.


  • Do not mix tabs and spaces.

    Its impossible to automate checking that tabs were only used for indentation and spacing for precise alignment. So you then take on a burden of manually checking

    You end up with the issue where someone didn’t realise and space idented or anouther person used tabs for precise alignment and people forget to check the whitespace characters in review and it ends up going inconsistent and becoming a huge pile of technical debt to fix.

    Use only one, you can automate enforcement and ensure the code renders consistency.



  • Years ago there was no way to share IDE settings between developers.

    You ended up with some developers choosing a tab width of 2 spaces, some choosing 4 spaces and as there was no linting enforcement some people using 2-4 spaces depending on their IDE settings.

    This resulted in an unreadable mess as stuff was idented to all sorts of random levels.

    It doesn’t matter if you use tabs or spaces as long as only one type is consistently used within a project.

    Spaces tends to win because inevitably there are times you need to use spaces and so its difficult to ensure a project only uses tabs for identation.

    IDE’s support converting tabs into spaces based on tab width and code formatting will ensure correct indentation. You can now have centralised IDE settings so everyone gets the same setup.

    Honestly 99% of people don’t care about formatting (they only care when consistency isn’t enforced and code is hard to read), there is always one person who wants a 60 charracter line width or only tabs or double new lined parathensis. Who then sucks up huge amounts of the team time arguing their thing is a must while they code in emacs, unlike the rest of the team using an actual ide.


  • I am actually arguing for a stable ABI.

    The few times I have had to compile out of tree drivers for the linux kernel its usually failed because the ABI has changed.

    Each time I have looked into it, I found code churn, e.g. changing an enum to a char (or the other way) or messing with the parameter order.

    If I was empire of the world, the linux kernel would be built using conan.io, with device trees pulling down drivers as dependencies.

    The Linux ABI Headers would move out into their own seperately managed project. Which is released and managed at its own rate. Subsystem maintainers would have to raise pull requests to change the ABI and changing a parameter from enum to char because you prefer chars wouldn’t be good enough.

    Each subsystem would be its own “project” and with a logical repository structure (e.g. intel and amd gpu drivers don’t share code so why would they be in the same repo?) And built against the appropriate ABI version with each repository released at its own rate.

    Unsupported drivers would then be forked into their own repositories. This simplifies depreciation since its external to the supported drivers and doesn’t need to be refactored or maintained. If distributions can build them and want to include the driver they can.

    Linus job would be to maintain the core kernel, device trees and ABI projects and provide a bill of materials for a selection of linux kernel/abi/drivers version which are supported.

    Lastly since every driver is a descrete buildable component, it would make it far easier for distributions to check if the driver is compatible (e.g. change a dependency version and build) with the kernel ABI they are using and provide new drivers with the build.

    None of this will ever happen. C/C++ developers loath dependency management and people can ve stringly attached to mono repos for some reason.


  • The linux kernel is very old school in how it is run and originally a big part of the DevSecOps movement was removing a lot of manual overhead.

    Moving on to something like Gitea (codeberg) would give you a better diff view and is quicker/easier than posting a patch to a mailing list.

    The branching model of the kernel is something people write up on paper that looks great (much like Gitflow) but is really time consuming to manage. Moving to feature branch workflow and creating a release branches as part of the release process allows a ton of things to be automated and simplified.

    Similarly file systems aren’t really device specific, so you could build system tests for them for benchmarking and standard use cases.

    Setting up a CI to perform smoke testing and linting, is fairly standard.

    Its really easy to setup a CI to trigger when a new branch/pr is created/updated, this means review becomes reduced to checking business logic which makes reviews really quick and easy.

    Similarly moving on to a decent issue tracker, Jira’s support for Epic’s/stories/tasks/capabilities and its linking ability is a huge simplifier for long term planning.

    You can do things like define OKR’s and then attach Epics to them and Stories/tasks to epics which lets you track progress to goals.

    You can use issues the way the linux community currently uses mailing lists.

    Combined with a Kanban board for tracking, progress of tickets. You remove a ton of pain.

    Although open source issue trackers are missing the key productivity enablers of Jira, which makes these improvements hard to realise.

    The issue is people, the linux kernel maintainers have been working one way for decades. Getting them to adopt new tools will be heavily resisted, same with changing how they work.

    Its like everyone outside, knows a breaking the ABI definition from the sub system implementation would create a far more stable ABI which would solve a bunch of issues and allow change when needed, except no one in the kernel will entertain the idea.


  • During the pandemic I had some unoccupied python graduates I wanted to teach data engineering to.

    Initially I had them implement REST wrappers around Apache OpenNLP and SpaCy and then compare the results of random data sets (project Gutenberg, sharepoint, etc…).

    I ended up stealing a grad data scientist because we couldn’t find a difference (while there was a difference in confidence, the actual matches were identical).

    SpaCy required 1vCPU and 12GiB of RAM to produce the same result as OpenNLP that was running on 0.5 vCPU and 4.5 GiB of RAM.

    2 grads were assigned a Spring Boot/Camel/OpenNLP stack and 2 a Spacy/Flask application. It took both groups 4 weeks to get a working result.

    The team slowly acquired lockdown staff so I introduced Minio/RabbitMQ/Nifi/Hadoop/Express/React and then different file types (not raw UTF-8, but what about doc, pdf, etc…) for NLP pipelines. They built a fairly complex NLP processing system with a data exploration UI.

    I figured I had a group to help me figure out Python best approach in the space, but Python limitations just lead to stuff like needing a Kubernetes volume to host data.

    Conversely none of the data scientists we acquired were willing to code in anything but Python.

    I tried arguing in my company of the time there was a huge unsolved bit of market there (e.g. MLOP’s)

    Alas unless you can show profit on the first customer no business would invest. Which is why I am trying to start a business.



  • Natural scrolling is the first thing I disable when forced to use a Mac, windows, gnome, kde, xfce, etc… all scroll in one direction.

    Macos has a unique keyboard and a lot of unique non obvious and non discoverable behaviour. For example I use a lot of windows laptops, left and right click involve pushing the trackpad downon the left or right. Someone had to show me right click on a Macbook was a two finger touch. These deliberate non standard behaviours make switching devices really annoying.

    I would argue KDE defaults should follow the most common behaviour across multiple platforms, with the option to implement specific quirks.

    The move to default double click brings the KDE default into alignment with other platforms (single click isn’t the default anywhere else).

    I would suggest a bigsur global theme that implements macos keyboard shortcuts, mouse actions, etc… would be a better compromise.


  • Maven has unit and integration test phases and there are a multitude of plugins designed to hook into those phases but there are constraints by design.

    Trying to hook everything into the build management system is a source of technical debt, your using a tool for something it wasn’t designed.

    I would look at what makes sense within the build management system and what makes sense in a CI pipeline.

    CI tools have different DSL and usually provide a means to manage environments. Certain integration and system level tests are best performed there.

    For instance I keep system tests as a seperate managed project. The project can be executed from developer machines for local builds but I also create a small build pipeline to build the project, deploy it and run the system tests against it triggered by pull requests.

    This is why I say the build management system doesn’t really change, because you should treat everything as descrete standalone components.

    The Parent POM gets updates once every six months, the basic build verification CI pipeline only changes to the latest language release, etc…

    Projects which try to embed gitflow into a pom or integrate CD into the gradle file are the unbuildable messes I get asked to fix.


  • Maven has a high learning curve, but once learned it is incredibly simple to use.

    That high bar is created by the tool configuration. You can change and hack everything, but you have to understand how Maven works to do so. This generally blocks people from doing really stupid things, because you have to learn how maven works to successfully modify it and in doing so you learn why you shouldn’t.

    This is the exact weakness of Gradle, the barrier for modification is far lower and the tool is far less rigid. So you get lots of people who are still learning implement all sorts of weird and terrible practice.

    The end result is I can usually dust off someone elses old maven project and it will build immediately using “mvn clean install”, about half the gradle projects I have been brought in on won’t without reverse engineering effort because they have things hard coded all over them. A not small percentage are so mangled they can’t be built without the dev who wrote it’s machine.

    Also you really shouldn’t be tinkering with your build pipelines that much. Initial constraints determine the initial solution, then periodically you review them to improve. DevSecOps exists to speed development and ease support it isn’t a goal in of itself