Commit 2f92c38

Add a recognition rule for numeral + English letter combinations, e.g. 四a级景区 -> 4a级景区
1 parent 64761cb commit 2f92c38

3 files changed

Lines changed: 10 additions & 0 deletions

itn/chinese/rules/cardinal.py

Lines changed: 10 additions & 0 deletions
@@ -167,5 +167,15 @@ def build_tagger(self):
             cardinal |= add_weight(number, 0.1)
         else:
             cardinal |= add_weight(number_exclude_0_to_9, 0.1)
+
+
+        # 5. Add a "Chinese numeral + English letters" rule, e.g. "四a" -> "4a"
+        # Match English letters (upper and lower case); .plus below makes it one or more
+        from pynini import union
+        english_letters = union(*[accep(c) for c in "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"])
+        # Numeral followed by letters, e.g. "四a" -> "4a"
+        number_with_letter = number + english_letters.plus
+        cardinal |= add_weight(number_with_letter, 0.05)  # lower weight than 0.1, so this path takes priority
+
         tagger = insert('value: "') + cardinal + (insert(" ") + cardinal).star + insert('"')
         self.tagger = self.add_tokens(tagger)
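For readers who do not have pynini at hand, the intended effect of the new rule can be approximated in plain Python. This is only an illustrative sketch, not the FST implementation in the commit: it swaps in a regular expression, handles single-digit numerals only (the real `number` transducer presumably covers multi-digit numerals as well), and the names `CN_DIGITS`, `PATTERN` and `normalize_digit_letter` are hypothetical.

```python
import re

# Hypothetical single-digit table; assumed for illustration only.
CN_DIGITS = {
    "零": "0", "一": "1", "二": "2", "三": "3", "四": "4",
    "五": "5", "六": "6", "七": "7", "八": "8", "九": "9",
}

# One Chinese digit followed by one or more ASCII letters.
# The "+" quantifier plays the role of `english_letters.plus` in the diff.
PATTERN = re.compile("([" + "".join(CN_DIGITS) + "])([A-Za-z]+)")


def normalize_digit_letter(text: str) -> str:
    """Rewrite a Chinese digit that is directly followed by letters, e.g. '四a' -> '4a'."""
    return PATTERN.sub(lambda m: CN_DIGITS[m.group(1)] + m.group(2), text)


print(normalize_digit_letter("四a级景区"))
```

In this sketch a numeral that is not followed by a letter is left unchanged. In the actual tagger such input would still be handled by the other `cardinal` branches, and the 0.05 weight only decides which branch wins when several match.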

itn/zh_itn_tagger.fst

1.24 MB
Binary file not shown.

itn/zh_itn_verbalizer.fst

116 KB
Binary file not shown.
