So it turns out that the cause was indeed a rogue change they couldn’t roll back, as we had been speculating.

Weird that whatever this issue is didn’t show up in their test environment before they deployed to Production. I wonder why that is.

  • DavidDoesLemmy@aussie.zone · 11 months ago

    All companies have a test environment. Some companies are lucky enough to have a separate environment for production.

  • No1@aussie.zone · 11 months ago

    Change Manager who approved this is gonna be sweating bullets lol

    “Let’s take a look at the change request. Now, see here, this section for Contingencies and Rollback process? Why is it blank?”

    • pntha@lemmy.world · 11 months ago

      how else do you explain to the layman “catastrophic failure in the configuration update of core network infrastructure and its preceding, meant-to-be-foolproof, processes”?

  • ji88aja88a@lemmy.world · 11 months ago

    This happens in my business all the time… the test FTP IP address gets left in the code, shit falls apart, and it costs us millions… They hold a PIR and then it happens again. (A rough sketch of pulling that address from the environment instead follows the thread.)

  • SituationCake@aussie.zone · 11 months ago

    If this is how they do their routine updates, they have had an extremely lucky run so far. Inadequate understanding of what the update would or could do, inadequate testing before deployment, no rollback capability, no disaster recovery plan. Yeah nah, you can’t get that lucky for that long. Maybe they’ve cut the budget or sacked the people who knew what they were doing? Let’s hope they learn from this.
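
The hardcoded test endpoint ji88aja88a mentions is easier to keep out of production when the code never contains an address at all and the deployment environment supplies it. Here is a minimal sketch in Python, assuming a hypothetical FTP_HOST environment variable and invented test-host names; it is not anyone’s actual setup:

```python
import os
import sys
from ftplib import FTP

# Hypothetical placeholders for illustration only.
TEST_HOSTS = {"10.0.0.5", "ftp.test.internal"}

def connect() -> FTP:
    # The endpoint comes from the deployment environment, never from the code.
    host = os.environ.get("FTP_HOST")
    if not host:
        sys.exit("FTP_HOST is not set; refusing to guess an endpoint.")
    if host in TEST_HOSTS:
        sys.exit(f"FTP_HOST={host} looks like a test endpoint; aborting.")
    ftp = FTP(host)   # connect to the configured host
    ftp.login()       # anonymous login here; real credentials would also come from the environment
    return ftp

if __name__ == "__main__":
    connect()
    print("Connected to", os.environ["FTP_HOST"])
```

The point of the guard is that a forgotten test value fails loudly at deploy time instead of quietly sending production traffic at the wrong box.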