Refactor parseArtSci function and add tests for newly introduced parseDepartmentList#1725
Refactor parseArtSci function and add tests for newly introduced parseDepartmentList#1725r-weng wants to merge 15 commits into
Conversation
Coverage Report for CI Build 696Coverage increased (+0.4%) to 58.964%Details
Uncovered Changes
Coverage RegressionsNo coverage regressions found. Coverage Stats
💛 - Coveralls |
david-yz-liu
left a comment
There was a problem hiding this comment.
@r-weng This is good, but please write tests for this new function
| ### 🔧 Internal changes | ||
|
|
||
| - Refactor `parseArtSci` function in `app/WebParsing/ArtSciParser.hs` by introducing `parseDepartmentList` | ||
| - Add test cases for the `parseDepartmentList` function in `backend-test/WebParsing/ArtSciParserTests.hs` |
There was a problem hiding this comment.
It isn't necessary to add a second entry describing test cases added (this is implied as part of the first change)
| "Research Opportunity/Research Excursions (299/398/399)"] | ||
| bodyTags <- httpBodyTags url | ||
| let deptList = getDeptList bodyTags | ||
| let cleaned = map (BF.second (T.replace "\160" " ")) deptList |
There was a problem hiding this comment.
First, move this into getDeptList (i.e., we don't need to separate out the "cleaning" step). That function currently only calls T.strip, so it can be simplified by using getDeptList anyways.
Second, rather than do this replacing, please use the existing cleanText. That said, replacing "\160" with " " is better than replacing it with "", so you can also go ahead and modify cleanText itself. (See here for a reference for the '\160' char.)
| bodyTags <- httpBodyTags url | ||
| let deptList = getDeptList bodyTags | ||
| let cleaned = map (BF.second (T.replace "\160" " ")) deptList | ||
| return $ filter (\(deptPage, deptName) -> "/" `T.isPrefixOf` deptPage && deptName `notElem` ignoredDepts && not (" College)" `T.isSuffixOf` deptName)) cleaned |
There was a problem hiding this comment.
As was mentioned at our meeting yesterday, this filtering is a great candidate for a local helper function. The idiomatic way to define this is to use a where block in this function:
return filterDepartments deptList
where
filterDepartments :: [(T.Text, T.Text)] -> [(T.Text, T.Text)]
...| @@ -1,11 +1,12 @@ | |||
| module WebParsing.ArtSciParser | |||
| (parseCalendar, getDeptList) where | |||
| (parseCalendar, getDeptList, parseDepartmentList) where | |||
There was a problem hiding this comment.
While parseDepartmentList is good to be exported for testing purposes, getDeptList isn't used anywhere else in the codebase so we can remove it from this export list.
… refactor-parseArtSci-function
Proposed Changes
This pull request refactors the
parseArtScifunction inapp/WebParsing/ArtSciParser.hsby introducing a new function,parseDepartmentList, to handle parsing of department lists.parseDepartmentListnow ignores department pages with no courses, duplicate courses, and program areas belonging to a college. Additionally, this pull request adds test for the newly introducedparseDepartmentListfunction.Type of Change
(Write an
Xor a brief description next to the type or types that best describe your changes.)Checklist
(Complete each of the following items for your pull request. Indicate that you have completed an item by changing the
[ ]into a[x]in the raw text, or by clicking on the checkbox in the rendered description on GitHub.)Before opening your pull request:
After opening your pull request:
Questions and Comments
(Include any questions or comments you have regarding your changes.)