• 0 Posts
  • 45 Comments
Joined 1 year ago
Cake day: June 20th, 2023

  • I manage a stack like this: we have dedicated hardware running a steady state of backend processing, but we scale into AWS if there’s a surge in real-time processing and we don’t have the hardware for it. We also had an outage in our on-prem datacenter once, which was expensive for us (I assume an insurance claim was made), but scaling to AWS was almost automatic, and the impact was minimal for a full datacenter outage.

    If we wanted to optimize even further, I’m sure we could scale into Azure instead whenever AWS spot pricing is higher. The moral of the story is to not get too locked into any one provider: use abstraction layers so that AWS, Azure, etc. are just targets you can shop around between by default, without having to scramble.
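    A rough sketch of what that abstraction layer might look like, purely as an illustration: the provider names, prices, and launch hooks below are made-up stand-ins, not real API calls, but the shape is the point: burst capacity goes to whichever target is cheapest right now instead of being hard-coded to one cloud.

    ```python
    # Hypothetical sketch of a thin multi-cloud abstraction. Prices and launch
    # hooks are placeholders; in practice they'd call your pricing/spot APIs
    # and your IaC/automation layer.
    from dataclasses import dataclass
    from typing import Callable


    @dataclass
    class BurstTarget:
        name: str
        get_spot_price: Callable[[], float]  # $/hour for the instance type we burst onto
        launch: Callable[[int], None]        # request N extra workers


    def burst(targets: list[BurstTarget], workers_needed: int) -> str:
        """Pick the cheapest target right now and scale into it."""
        cheapest = min(targets, key=lambda t: t.get_spot_price())
        cheapest.launch(workers_needed)
        return cheapest.name


    # Example wiring with placeholder price functions.
    aws = BurstTarget("aws", get_spot_price=lambda: 0.12, launch=lambda n: print(f"AWS +{n}"))
    azure = BurstTarget("azure", get_spot_price=lambda: 0.09, launch=lambda n: print(f"Azure +{n}"))

    chosen = burst([aws, azure], workers_needed=20)
    print(f"burst capacity went to {chosen}")
    ```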






  • It’s my understanding that FreeIPA can federate with Active Directory, but I haven’t tried that myself. As for Authentik, it looks interesting, but it’s the first I’ve heard of it. I also rely on FreeIPA’s certmonger integration, so I wonder if Authentik could replace that?

    Just to understand your use case: you have users in Active Directory whose SSH keys you want to manage, and they should be able to log in to Linux machines via SSH?


  • If your services are not stateless, work to make them such so you can learn about scaling in the cloud, which can even be done w/ VM-based services. […] how much more agility using cloud vs a DC gives you

    This can’t be overstated. Embracing an elastic ideology, removing single points of failure, and decoupling the stateful parts of applications has been the biggest takeaway from being part of several migrations of services to AWS. Building these into your practices as you grow is a huge benefit that may well be worth the cost; a rough sketch of the idea is below.
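    As a toy illustration of what “decoupling stateful aspects” can mean in practice: session state lives in a shared store instead of process memory, so any instance (including a freshly scaled-up one) can serve any request. The SharedStore here is an in-memory stand-in; in a real deployment it would be something external like Redis, DynamoDB, or a database.

    ```python
    # Toy sketch: state lives outside the app process, so instances are interchangeable.
    import uuid


    class SharedStore:
        """Stand-in for an external session store shared by every instance."""
        def __init__(self):
            self._data = {}

        def put(self, key: str, value: dict) -> None:
            self._data[key] = value

        def get(self, key: str) -> dict | None:
            return self._data.get(key)


    def handle_login(store: SharedStore, username: str) -> str:
        # The session lives in the shared store, not in this process, so the next
        # request can land on any instance and still find it.
        session_id = str(uuid.uuid4())
        store.put(session_id, {"user": username})
        return session_id


    def handle_request(store: SharedStore, session_id: str) -> str:
        session = store.get(session_id)
        return f"hello {session['user']}" if session else "please log in"


    store = SharedStore()
    sid = handle_login(store, "alice")
    print(handle_request(store, sid))
    ```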

    Over time, if the scale you’re operating at grows, applying the experience and knowledge gained in AWS to running services in a datacenter can be beneficial. In my experience, if you have a large, consistent, asynchronous workload and you’ve already maxed out reserved instances or savings plans, it is likely cheaper to operate on your own hardware than in the cloud (or to take migration credits from GCP or Azure to reduce costs). This is where avoiding vendor lock-in is key; a back-of-the-envelope comparison is sketched below.
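    The kind of math I mean, with completely made-up placeholder numbers (plug in your own committed-use pricing and fully loaded hardware costs; this also ignores staff time, which matters a lot):

    ```python
    # Back-of-the-envelope break-even sketch. Every number here is a hypothetical
    # placeholder, not real pricing, and engineering/staff time is excluded.
    cloud_monthly = 18_000.0          # hypothetical: steady-state compute on 3-yr commitments
    hardware_capex = 250_000.0        # hypothetical: servers + network gear
    hardware_lifespan_months = 48
    onprem_monthly_opex = 6_000.0     # hypothetical: power, cooling, colo space, remote hands

    onprem_monthly = hardware_capex / hardware_lifespan_months + onprem_monthly_opex
    print(f"cloud:   ${cloud_monthly:,.0f}/mo")
    print(f"on-prem: ${onprem_monthly:,.0f}/mo (capex amortized)")

    if onprem_monthly < cloud_monthly:
        # Months until the upfront hardware spend is recovered by monthly savings.
        payback_months = hardware_capex / (cloud_monthly - onprem_monthly_opex)
        print(f"hardware pays for itself in ~{payback_months:.0f} months")
    ```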

    have y’all factored in all the time/money spent on maintaining the server hardware, power, DC cooling, etc. too?

    For sure, this isn’t 2007, where you needed to purchase servers and network equipment to start a website. For most startups and small businesses, operating in the cloud will be less expensive upfront and likely over the first 3 years. It isn’t a one-size-fits-all approach though, and it’d be prudent to evaluate the cloud spend periodically and compare it with what it’d cost to manage everything yourselves. Obviously you’d need a team competent enough to manage this without it going to shit.





  • In my experience, I prefer to review or contribute commits that are logical changes, compartmentalized enough that, if needed, they could be reverted without impacting something completely unrelated. This doesn’t mean 1 commit is always the right number of commits in a PR.

    For example, if you have a feature addition which requires you to update the version of a dependency in the project, and that dependency update breaks existing code, I would have two commits:

    • Update dependency and fix issues because of the upgrade
    • Add new feature using new dependency

    When stepping through the commits in the PR or looking at a git blame, it’s clear which changes were needed because of the new dependency, and which were feature additions.

    Obviously this isn’t one-size-fits-all, but if someone submitted a PR with 12 commits of trial and error, and the overall change is something like +2 lines / -3 lines, I’d ask them to clean that up before it gets merged.



  • This. For example, if you have a DNS entry for your DB and the TTL is set to 1 hour, lower the TTL of the record to a minute at least an hour before you intend to make the change. That tells all clients to cache the record for only a minute and to re-resolve it every minute. Then, once the hour has passed, make the necessary change to the record. Within a minute of the change, the clients should all be using the new record. Once you’ve confirmed that everything is good, you can raise the TTL back to 1 hour.

    This approach does require some more planning and two or three DNS updates, but it minimizes downtime. The reason you may want to keep the TTL high is if you have thousands of clients and you know the record won’t be updated often: since most providers charge per thousand or million lookups, unnecessary lookups add up quickly at that scale. A larger TTL also minimizes the impact of a DNS server outage. A rough sketch of the TTL dance is below.
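    Here is what the three steps could look like in code, assuming Route 53 via boto3 (the hosted zone ID and record names are placeholders, and other DNS providers have equivalent APIs). Note that UPSERT replaces the whole record set, so the current value has to be supplied even when you only want to change the TTL.

    ```python
    # Sketch of the lower-TTL / cutover / raise-TTL sequence against Route 53.
    # ZONE_ID and record names are hypothetical placeholders.
    import boto3

    route53 = boto3.client("route53")
    ZONE_ID = "Z0000000EXAMPLE"


    def upsert(name: str, value: str, ttl: int) -> None:
        route53.change_resource_record_sets(
            HostedZoneId=ZONE_ID,
            ChangeBatch={
                "Changes": [{
                    "Action": "UPSERT",
                    "ResourceRecordSet": {
                        "Name": name,
                        "Type": "CNAME",
                        "TTL": ttl,
                        "ResourceRecords": [{"Value": value}],
                    },
                }]
            },
        )


    # 1. At least one old-TTL interval before the move: drop the TTL, same value.
    upsert("db.example.internal.", "db-old.example.internal.", ttl=60)

    # 2. At cutover time: point the record at the new host; clients re-resolve within ~a minute.
    upsert("db.example.internal.", "db-new.example.internal.", ttl=60)

    # 3. Once everything looks healthy: raise the TTL back up.
    upsert("db.example.internal.", "db-new.example.internal.", ttl=3600)
    ```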



  • I have a coworker who always forgets TTL is a thing and never plans ahead. On multiple occasions they’ve moved a database, updated DNS to reflect the change, and then been confused about why everything is broken for 10-20 minutes.

    I really wish they’d learned the first time, but every once in a while they come to me to troubleshoot the same issue.




  • I think adjustments probably could have been made with more research, which at the time was unrealistic, since we were in the middle of it. For the alerts to be meaningful, they should be actionable: receiving an alert should tell the person to go take a test. If 9 times out of 10 an alert turns up a negative COVID test, then it’s not really reliable, and people won’t see the point in taking the alerts seriously (a.k.a. alarm fatigue). Fine-tuning the parameters of the alert, for example requiring longer durations in close proximity and/or shorter distances between devices, can improve this rate.

    Of course, dialing things too far in the other direction would lead to false negatives, which would be even harder to test for and validate because of the nature of the recorded history on the device. Ideally there’d be 0 false negatives and 0 false positives: every alert would result in a positive COVID test, and no positive test would arrive without having been preceded by an alert. That’s obviously unrealistic, but finding a good balance would make the alerts reliable and useful. Since this system is going away, it doesn’t really matter now, but the principles of alerting are still important to consider in any system, especially where health & safety are involved. The trade-off looks roughly like the toy example below.
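    To make the trade-off concrete, here is a hypothetical illustration: a stricter threshold on duration and distance cuts false positives at the risk of missing real exposures. The event fields and thresholds are invented for the example and are not the actual Exposure Notification scoring model.

    ```python
    # Hypothetical alert-threshold sketch: alert only on sustained, close-range encounters.
    from dataclasses import dataclass


    @dataclass
    class Encounter:
        minutes: float          # cumulative time near the other device
        est_distance_m: float   # rough distance inferred from Bluetooth signal strength


    def should_alert(e: Encounter, min_minutes: float = 15, max_distance_m: float = 2.0) -> bool:
        """Looser thresholds mean more false positives; stricter ones risk false negatives."""
        return e.minutes >= min_minutes and e.est_distance_m <= max_distance_m


    encounters = [
        Encounter(minutes=3, est_distance_m=1.0),    # brief pass in a hallway
        Encounter(minutes=40, est_distance_m=1.5),   # sat near someone through a meeting
        Encounter(minutes=25, est_distance_m=6.0),   # long, but across a large room
    ]
    for e in encounters:
        print(e, "->", "alert" if should_alert(e) else "no alert")
    ```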


  • I ended up getting a few alerts, and each time I tested negative. Then later, during Omicron, I ended up getting COVID and was contacted by a contact tracer for the city. I explained that if they gave me the code for the app, I could signal that I had COVID, and they said it wasn’t worth it.

    Overall I think it was an interesting idea, and the approach was pretty clever while also maintaining privacy. Really the failure was the municipalities being out of the loop. I’m not sure if there were studies done, but I do wonder how accurate the exposure determination was, since for me it was always false positives.


  • I’m always suspicious of apps that set up a local web server to accomplish some basic task. When Zoom did this, it was a security nightmare.

    Just based on the screenshots, DroidCamX sets up a local webserver on the phone, and the video is then accessible on the local network (for example: http://192.168.0.17:4747/video). This means anyone on the local network can access the webcam, which, in an office or school setting, could be disastrous: if a coworker were in a conference room using this app, a malicious coworker could use it to spy on the meeting surreptitiously.
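    To show why that matters, here is roughly all it would take for any other device on the LAN to start pulling the stream, assuming the endpoint really is unauthenticated as the screenshots suggest (the address is the example one above):

    ```python
    # Illustration: fetching the first chunk of an unauthenticated video stream on the LAN.
    import requests

    STREAM_URL = "http://192.168.0.17:4747/video"  # example address from the screenshots

    with requests.get(STREAM_URL, stream=True, timeout=5) as resp:
        resp.raise_for_status()
        # Read a small chunk just to prove the video is being served with no auth at all.
        first_chunk = next(resp.iter_content(chunk_size=4096))
        print(f"received {len(first_chunk)} bytes of video from an unauthenticated endpoint")
    ```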

    However it ends up being implemented in the OS, a basic requirement is that there is some authentication to link the phone’s camera to the computer, and that the video is encrypted in transit to avoid man-in-the-middle attacks. A toy sketch of that kind of pairing check is below.
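    Something along these lines, purely as a sketch of the idea (all names here are hypothetical, not how any OS actually does it): the phone shows a one-time code, the computer presents it, and nothing is served until the check passes, with the stream itself wrapped in TLS rather than plain HTTP.

    ```python
    # Toy pairing-check sketch: a one-time code gates access to the camera stream.
    import hmac
    import secrets

    # Generated on the phone and displayed to the user when pairing starts.
    pairing_code = secrets.token_hex(4)
    print(f"enter this code on the computer: {pairing_code}")


    def is_paired(presented_code: str) -> bool:
        # Constant-time comparison so the code can't be guessed byte-by-byte.
        return hmac.compare_digest(presented_code, pairing_code)


    # The video endpoint would refuse to serve anything until is_paired() passes,
    # and the stream itself would run over TLS instead of plain HTTP.
    print(is_paired("deadbeef"))      # a guess from elsewhere on the LAN -> almost certainly False
    print(is_paired(pairing_code))    # the legitimately paired computer -> True
    ```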