The KGMN550 machine has suffered a catastrophic outage. Neither of the controllers will transmit audio, and we have completely lost internet connectivity and all remote access to the machine. We are working with our site manager to coordinate a trip to the tower site, and we will keep this post updated as we learn more.
Update 1: [Dec 10, 2024, 5:32 PM] Wizard (WQXC747) has a repeater on 462.650 MHz that is open to licensed GMRS users. If you would like information on how to set it up in your radio, I will be happy to assist. Feel free to use the contact form on the website, or send a DM to Cody (WRVY789). Another option is 462.725 MHz simplex, the local Kingman Krowd simplex channel.
Update 2: [Dec 11, 2024, 10:05 AM] The repeater was confirmed to be transmitting audio again. WRVY789 disabled the damaged controller and swapped to the backup controller. The internet connection is still unavailable, so we will still be making a trip to the site to investigate the issue. For now the repeater is online and operational. NOTE: You may hear an operator on the air asking to clear traffic. Please do so until you hear the all clear to resume normal traffic.
Update 3: [Dec 12, 2024, 7:16 AM] We have scheduled a trip to the tower site for diagnostics and repair on Sunday, December 15th. We do not have a planned maintenance time, but an announcement will be made over the air and on the website top banner before we shut down the hardware. We appreciate your patience and understanding.
Update 4: [Dec 15, 2024, 12:06 PM] We are heading to the repeater site for maintenance and will make an announcement before powering down. The estimated maintenance window is 3-5 hours. We appreciate your patience and understanding.
Update 5: [Dec 16, 2024, 3:24 PM] All systems are now operational. We are still digging through the logs to get a better understanding of the issue that occurred, but we believe it was caused by a power fault. Not all systems went fully offline, which gives us logs to compare across the states of the devices.
It appears the power fault occurred at 2:58 AM on December 10th, knocking our live stream node offline completely and causing memory issues with our primary repeater controller. We are unsure whether a second incident occurred or whether the modem suffered the same issue as the primary controller at a much later time. The modem logs showed the device stayed in a "connecting to network" state from December 10th at 9:13 AM until it was rebooted on the 15th.
The controller was able to fully recover on its own 24 hours later but was disabled in case the same error recurred. The memory issue caused all audio into the controller to exit through port 2 only, and since the live stream server and remote access server were offline, we were unable to determine the issue at the time of the outage. We have added a reboot routine to the controller to ensure that it can recover on its own after an error.
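For readers curious what a self-recovery routine can look like, here is a minimal sketch of one common pattern: reboot only after several consecutive failed health checks, so a single transient glitch does not trigger a restart. This is not the routine running on our controller. The `Watchdog` class, its `record` method, and the `max_failures` threshold are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Watchdog:
    """Counts consecutive failed health checks and signals when to reboot."""

    max_failures: int = 3  # hypothetical threshold, not a real controller setting
    failures: int = 0

    def record(self, healthy: bool) -> bool:
        """Record one health check; return True when a reboot is due."""
        if healthy:
            # Any successful check clears the failure streak.
            self.failures = 0
            return False
        self.failures += 1
        if self.failures >= self.max_failures:
            # Reset the counter so the streak starts fresh after the reboot.
            self.failures = 0
            return True
        return False


wd = Watchdog(max_failures=3)
results = [wd.record(ok) for ok in [True, False, False, False, True]]
print(results)  # [False, False, False, True, False]
```

In this sketch the caller decides what counts as "healthy" (for example, audio present on the expected port) and what a reboot actually does. The class only tracks the failure streak.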
Proposed Resolution
We are working on a plan to increase the reliability of our repeater. Although 24 hours of downtime in 5 months is fairly good (99.33% uptime), we would ideally like to increase that to 99.9% uptime or better. Listed below are the changes we are discussing for the repeater setup.
- Replace the primary controller with one that does not rely on an operating system
  - We are looking at a few options that have a feature set similar to the controller we currently use. We are also looking for a controller that can activate relays via a remote base. This would allow us to cycle power on multiple devices, saving us a trip to the repeater site.
- Add a 12 V/24 V power system in addition to the 120 V UPS power system
  - This should eliminate any problems from surges or sags that our UPS is not able to handle.
- Upgrade to a more robust LTE modem for remote access
  - We are currently using a modem provided by the carrier. Although it is a business-class modem, a more robust model from Cisco or Ubiquiti may increase reliability and ensure we can continue to control and troubleshoot the repeater remotely.
  - We will upgrade this down the road to a P2P link between the tower site and a business in Kingman, giving us a connection that does not require internet to function. For now, a more robust LTE modem will suit our needs.
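The uptime figures quoted in this section can be checked with a little arithmetic. The sketch below assumes "5 months" means roughly 150 days (30-day months), which is our approximation rather than an exact service period. The function names are illustrative only.

```python
HOURS_PER_DAY = 24


def uptime_percent(downtime_hours: float, period_hours: float) -> float:
    """Return availability as a percentage of the period."""
    return 100.0 * (1.0 - downtime_hours / period_hours)


def downtime_budget_hours(target_percent: float, period_hours: float) -> float:
    """Return the downtime allowed in the period for a target uptime."""
    return period_hours * (1.0 - target_percent / 100.0)


# Assumption: "5 months" is approximated as 150 days.
period = 150 * HOURS_PER_DAY  # 3600 hours

print(round(uptime_percent(24, period), 2))           # 99.33
print(round(downtime_budget_hours(99.9, period), 1))  # 3.6
```

Under that assumption, a 99.9% target leaves about 3.6 hours of downtime over the same 5-month span, compared with the 24 hours lost in this outage.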
Outage Timeline
Dec 10, 2024, 02:58 AM | LIVE STREAM DISCONNECTED |
Dec 10, 2024, 09:13 AM | INTERNET CONNECTION DROPPED |
Dec 10, 2024, 01:27 PM | WSAX507 ALERTED WRVY789 OPERATORS TO THE ISSUE |
Dec 11, 2024, 09:30 AM (Approx.) | CONFIRMED AUDIO FROM THE REPEATER, CONTROLLER BACK ONLINE |
Dec 11, 2024, 09:49 AM | BROKEN CONTROLLER DISABLED/BACKUP CONTROLLER ONLINE |
Dec 15, 2024, 02:00 PM | SITE MANAGER AND OPERATOR ARRIVED AT THE REPEATER SITE |
Dec 15, 2024, 02:32 PM | LOGS WERE SAVED FROM ALL DEVICES BEFORE THE SYSTEMS WERE REBOOTED |
Dec 15, 2024, 03:35 PM | ALL SYSTEMS ONLINE |