fix: allow capacity http errors to requeue#1326
Conversation
Signed-off-by: Britania Rodriguez Reyes <britaniar@microsoft.com>
Signed-off-by: Britania Rodriguez Reyes <britaniar@microsoft.com>
…detect transient HTTP error Signed-off-by: Britania Rodriguez Reyes <britaniar@microsoft.com>
Signed-off-by: Britania Rodriguez Reyes <britaniar@microsoft.com>
|
For the open question, I did a brief check on the flow and it might require some changes to do so as the loop would end as it is right now given the error before it gets to update the (scheduling policy snapshot) status (i.e., if the plugins fail, the scheduling cycle ends right away with an error). We either have to update the status as part of the error handling logic, or refactor the flow so that status update would always happen despite errors at the end of each scheduling cycle. An alternative would be to emit an event, which is slightly easier to do. I agree 100% that it is better UX to let user know about the errors of course -> Britania, if you don't mind me asking, which way do you think works best? |
Signed-off-by: Britania Rodriguez Reyes <britaniar@microsoft.com>
Signed-off-by: Britania Rodriguez Reyes <britaniar@microsoft.com>
|
I think at the moment emitting an event is good. I will add that. If we decide to update the status, we can do that at a later time probably as we get more usage. |
Signed-off-by: Britania Rodriguez Reyes <britaniar@microsoft.com>
Signed-off-by: Britania Rodriguez Reyes <britaniar@microsoft.com>
michaelawyu
left a comment
There was a problem hiding this comment.
Just a naming nit; otherwise LGTM
Signed-off-by: Britania Rodriguez Reyes <britaniar@microsoft.com>
Signed-off-by: Britania Rodriguez Reyes <britaniar@microsoft.com>
Description of your changes
When a querying the vm size recommender client, there may be times the client can run into transient client errors that will resolve on retry. At the moment the scheduler only runs once and sees all errors as unexpected. Now the possibility of transient http errors is possible with the addition of using azure property checker as we query an azure client. Therefore, these should not be considered unexpected, and the scheduler should be able to requeue on these errors.
Fixes #
I have:
Updated the scheduler to requeue on http errors.
Simplified http error logging to be one line for easier debugging.
Run
make reviewableto ensure this PR is ready for review.How has this code been tested
Special notes for your reviewer
Simplified logging:
With this change the scheduler would be pending until the errors return other errors/status codes that we don't deem transient while it retries.
Open Question: Do we want to keep the status as not complete/pending?