I have a cluster of microservices. The UI calls API1 (I'm assuming this goes through the ingress gateway; correct me if I'm wrong), and API1 calls API2 via RestTemplate.
The API2 process is bulky and takes roughly 1.5 minutes to complete, but there are no errors or exceptions in the process itself. For testing purposes, I called API2 directly via Bruno (which is configured with a sufficiently large timeout), and it gives a socket hang up at around 1 minute. That is expected, since the AWS LB connection idle timeout is 1 minute. But when the call goes through the UI, Chrome's network tab shows it completing successfully, with a "waiting for server response" time of 1.5 minutes.
I understand Istio has a default of 2 retries for connection failures, and its request timeout is disabled by default. My question is: why does the call from the UI succeed after 1.5 minutes, rather than waiting 3 minutes and then failing? Are the pods behaving in a way I don't understand? My understanding is that after 1 minute the socket gets closed and a retry kicks off, but that retry should also fail after another minute and kick off the 2nd retry, which should in turn fail after the next minute. Is the socket somehow reopened within the total of 3 × 1 = 3 minutes, so that the call succeeds because it takes less than 3 minutes?
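To make my expectation concrete, here is a minimal sketch of the timeline I had in mind. It only encodes my own (unverified) assumptions, not how Istio or the AWS LB actually behave: every attempt is cut at exactly the idle timeout if API2 has not responded yet, each cut triggers a retry (up to 2), and each retry restarts API2's processing from scratch. `expected_outcome` and `Outcome` are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    succeeded: bool
    elapsed_s: float
    attempts: int


def expected_outcome(processing_s: float, idle_timeout_s: float, max_retries: int = 2) -> Outcome:
    """Hypothetical model of MY assumptions only: the LB drops a connection that
    stays idle longer than idle_timeout_s, and every drop triggers a retry that
    restarts the processing from zero."""
    elapsed = 0.0
    attempts = 0
    for _ in range(1 + max_retries):  # original attempt + retries
        attempts += 1
        if processing_s <= idle_timeout_s:
            # The response arrives before the idle timeout fires.
            return Outcome(True, elapsed + processing_s, attempts)
        # Connection is cut at the idle timeout; the next retry starts over.
        elapsed += idle_timeout_s
    return Outcome(False, elapsed, attempts)


print(expected_outcome(90, 60))   # my expectation for the UI path
print(expected_outcome(90, 120))  # after raising the LB idle timeout
```

Under these assumptions, a 90 s process behind a 60 s idle timeout should fail after 3 attempts and 180 s, while raising the idle timeout above 90 s should succeed on the first attempt. The second case matches what I see in Bruno; the first does not match what I see in Chrome.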
P.S. I am a junior developer just getting into the DevOps world of cluster orchestration, service mesh, etc. Any clarification is deeply appreciated.
I changed the LB connection idle timeout to a higher value, and the call succeeded in Bruno as expected. I also put arbitrarily large sleep times in the process and adjusted the idle timeout values, and got the expected results. But I don't understand the difference I am seeing between Chrome (UI -> API1 -> API2) and calling API2 directly (Bruno -> API2). I have read the docs and scoured Google with no satisfactory answer.