Figure - available via license: Creative Commons Attribution 4.0 International
Source publication
The construction of a character dataset is an important part of research on the analysis and recognition of historical Tibetan documents. The character segmentation results from the previous stage are presented by rendering the characters in different color values. On this basis, the characters are annotated, and the character i...
Context in source publication
Context 1
... Tibetan characters are stacked only from top to bottom, with at most 7 layers. In the Unicode encoding scheme for Tibetan [18], the same character can be encoded with sequences of different lengths (Table 1). Therefore, the annotated character text needs to be processed with a unified coding method: all annotated text is converted to the form that uses the fewest coding units. ...
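The unification step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that "the least number of coding units" means preferring a precomposed Tibetan code point wherever Unicode defines a canonical decomposition for it (for example U+0F43 versus the pair U+0F42 U+0FB7). The names `build_compose_table` and `unify_tibetan` are hypothetical. A custom table is used because standard NFC normalization, to my understanding, lists several of these Tibetan characters as composition exclusions and so keeps the longer form.

```python
import unicodedata


def build_compose_table():
    """Map each canonical decomposition in the Tibetan block to its
    precomposed character (assumption: shortest form is preferred)."""
    table = {}
    for cp in range(0x0F00, 0x1000):
        ch = chr(cp)
        decomp = unicodedata.decomposition(ch)
        # Skip characters without a decomposition and compatibility
        # mappings, which are tagged like "<compat> ...".
        if not decomp or decomp.startswith("<"):
            continue
        seq = "".join(chr(int(h, 16)) for h in decomp.split())
        table[seq] = ch
    return table


_COMPOSE = build_compose_table()


def unify_tibetan(text):
    """Return `text` with every decomposed sequence replaced by its
    precomposed form, so equal characters get equal encodings."""
    # NFD first, so all inputs start from the same fully decomposed,
    # canonically ordered form.
    out = unicodedata.normalize("NFD", text)
    # Sorted iteration keeps the result deterministic.
    for seq, ch in sorted(_COMPOSE.items()):
        out = out.replace(seq, ch)
    return out


if __name__ == "__main__":
    for sample in ("\u0f42\u0fb7", "\u0f43"):
        unified = unify_tibetan(sample)
        print([f"U+{ord(c):04X}" for c in unified])
```

Both spellings in the usage lines map to the same single code point, so annotation labels can be compared by simple string equality. Whether the shortest form is the right target depends on the downstream tooling, since the Unicode Standard discourages some of the precomposed Tibetan vowel signs; the opposite convention (always decomposing with NFD) would give the same consistency at the cost of longer strings.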