Block invalid language tags#980
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #980 +/- ##
==========================================
- Coverage 65.47% 65.45% -0.02%
==========================================
Files 387 387
Lines 21704 21718 +14
Branches 2764 2769 +5
==========================================
+ Hits 14210 14215 +5
- Misses 6458 6466 +8
- Partials 1036 1037 +1 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
ddaspit
left a comment
There was a problem hiding this comment.
@ddaspit reviewed 5 files and all commit messages, and made 1 comment.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on Enkidu93 and pmachapman).
src/Serval/src/Serval.Translation/Features/EngineTypes/GetLanguageInfo.cs line 85 at r1 (raw file):
return NotFound(); } catch (ArgumentException)
This is a very generic exception. I don't want to accidentally mask bugs in the code. Could we use a more specific exception or catch the exception closer to where the LanguageTagService is called and indicate that it is invalid in the response?
|
I wonder how restrictive we want to make this. I'm worried about long codes we get out of Paratext projects settings files like |
Enkidu93
left a comment
There was a problem hiding this comment.
@Enkidu93 reviewed all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on pmachapman).
dfc4d6c to
553a981
Compare
pmachapman
left a comment
There was a problem hiding this comment.
@pmachapman made 2 comments.
Reviewable status: 1 of 5 files reviewed, 2 unresolved discussions (waiting on ddaspit and Enkidu93).
src/Machine/src/Serval.Machine.Shared/Services/LanguageTagService.cs line 5 at r1 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
I wonder how restrictive we want to make this. I'm worried about long codes we get out of Paratext projects settings files like
cod-Scrip-RG-dialect. We have a number of these engines already in production. I assume if they are a true language code, we won't hit this regex in the code, but if they aren't (maybe a made-up code for a under documented language or new dialect) and they use a dialect-differentiator code (not sure what the official name is?) that has characters other than these, we could be preventing them from running a build. Do you know if you're able to only use a-z in these codes? None of the ones on production have something other than a-z. I'm thinking here of the final segment of a<LanguageIsoCode\>tag in a Settings.xml. I can't seem to find documentation for this.
I tried to make this broadly support BCP-47 (https://en.wikipedia.org/wiki/IETF_language_tag), with the primary purpose of not allowing injection, rather than ensuring a valid language code (I think the TryParse above will take care of known validity). As far as I am aware (based on the Wikipedia article) language codes are alphanumeric with underscores and dashes.
src/Serval/src/Serval.Translation/Features/EngineTypes/GetLanguageInfo.cs line 85 at r1 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
This is a very generic exception. I don't want to accidentally mask bugs in the code. Could we use a more specific exception or catch the exception closer to where the
LanguageTagServiceis called and indicate that it is invalid in the response?
Done. I have changed this to an InvalidOperationException, as there is already a BadRequestExceptionFilter which will catch this and output a 400 HTTP status code.
|
Previously, pmachapman (Peter Chapman) wrote…
Right, I think it would be an odd case, but I'm just thinking of folks who might supply a yet-unregisterd/made-up language code. Then it would get past the |
Enkidu93
left a comment
There was a problem hiding this comment.
@Enkidu93 reviewed 5 files and all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on ddaspit and pmachapman).
ddaspit
left a comment
There was a problem hiding this comment.
@ddaspit reviewed 4 files and all commit messages, made 2 comments, and resolved 1 discussion.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on Enkidu93).
src/Machine/src/Serval.Machine.Shared/Services/LanguageTagService.cs line 5 at r1 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
Right, I think it would be an odd case, but I'm just thinking of folks who might supply a yet-unregisterd/made-up language code. Then it would get past the
TryParse. If it also had a variant specified with non-ASCII letters/numbers, we would throw an error. From the Wikipedia article, it's not clear than the variant names are completely standardized. I guess I was mainly wondering if Paratext itself allows you to enter non-ASCII letters/numbers in these settings because you just pass this tag from the Settings.xml on to Serval from SF, correct?
Paratext does not allow you to enter non-ASCII characters in a private-use subtag. All other subtags must be valid and known.
Enkidu93
left a comment
There was a problem hiding this comment.
@Enkidu93 made 1 comment and resolved 1 discussion.
Reviewable status:complete! all files reviewed, all discussions resolved (waiting on pmachapman).
src/Machine/src/Serval.Machine.Shared/Services/LanguageTagService.cs line 5 at r1 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
Paratext does not allow you to enter non-ASCII characters in a private-use subtag. All other subtags must be valid and known.
OK, perfect, thank you!
Fixes #967.
I also took the opportunity to improve unit testing and API documentation around this area.
This change is