At 11:08am on 19 December, we identified that some calls received by end users continued to ring after pickup. On immediate investigation, we found that a series of transient faults in the data store cache had triggered a crash in the connected proxies in one of our three zones. Reinitialising the proxies enabled service to resume at approximately 11:30am (service restoration).
Further troubleshooting indicated that the data store cache was not able to handle long log messages (fault cause).
To resolve the proxy crashes caused by the data store cache’s inability to process long log messages, we reconfigured it by increasing the output buffer and client limits, and we implemented stricter memory monitoring (service resolution).
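
For illustration only, the two kinds of check implied by the resolution (whether a message fits within an output buffer limit, and whether memory use has crossed a monitoring threshold) can be sketched as below. This is a minimal sketch, not our production configuration: the names CacheLimits, fits_output_buffer and memory_status, and every numeric value, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheLimits:
    """Hypothetical limits; the values used below are placeholders only."""
    output_buffer_bytes: int      # largest message the output buffer accepts
    memory_warn_ratio: float      # fraction of memory that raises a warning
    memory_critical_ratio: float  # fraction of memory that raises an alert


def fits_output_buffer(message: bytes, limits: CacheLimits) -> bool:
    """Return True if the message is no larger than the output buffer limit."""
    return len(message) <= limits.output_buffer_bytes


def memory_status(used_bytes: int, total_bytes: int, limits: CacheLimits) -> str:
    """Classify memory use as 'ok', 'warn' or 'critical' against the thresholds."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    ratio = used_bytes / total_bytes
    if ratio >= limits.memory_critical_ratio:
        return "critical"
    if ratio >= limits.memory_warn_ratio:
        return "warn"
    return "ok"


# Example with placeholder values (assumed, not taken from the incident).
limits = CacheLimits(
    output_buffer_bytes=1024,
    memory_warn_ratio=0.7,
    memory_critical_ratio=0.9,
)
long_log_message = b"x" * 4096
print(fits_output_buffer(long_log_message, limits))
print(memory_status(used_bytes=80, total_bytes=100, limits=limits))
```

In a sketch like this, a long log message would be flagged before it reaches the cache, and the lower warning threshold is the "stricter" part: it raises a signal well before memory is exhausted.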