Incompatible types in benchmarks.word_tokenization #1030

Open
bact opened this issue Dec 15, 2024 · 0 comments
Labels
bug bugs in the library help wanted no contributor yet

Description

mypy reports several type errors in pythainlp/benchmarks/word_tokenization.py.

Expected results

  • All functions have explicit type hints
  • mypy reports no incompatible-type errors

Current results

For example, in these two lines ref_sample is inferred as str, which has no shape attribute:

c_pos_pred = c_pos_pred[c_pos_pred < ref_sample.shape[0]]
c_neg_pred = c_neg_pred[c_neg_pred < ref_sample.shape[0]]

But judging from the _binary_representation function, ref_sample may actually be a NumPy ndarray.

However, the docstring of _binary_representation declares its return type as str, and the signature has no return annotation:

def _binary_representation(txt: str, verbose: bool = False):
    """
    Transform text into {0, 1} sequence.

    where (1) indicates that the corresponding character is the beginning of
    a word. For example, ผม|ไม่|ชอบ|กิน|ผัก -> 10100...

    :param str txt: input text that we want to transform
    :param bool verbose: for debugging purposes
    :return: {0, 1} sequence
    :rtype: str
    """
    chars = np.array(list(txt))

So the declared types and the actual types disagree, and this needs to be fixed.
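
To illustrate one possible direction, here is a minimal sketch of a function annotated to return an ndarray, so that the .shape access type-checks. It is not the actual body of _binary_representation (only its first line is quoted above); the name binary_representation_sketch, the sep parameter, and the uint8 dtype are assumptions made for this example.

```python
import numpy as np
import numpy.typing as npt


def binary_representation_sketch(
    txt: str, sep: str = "|"
) -> npt.NDArray[np.uint8]:
    """Return a {0, 1} array with 1 at the first character of each word.

    Hypothetical stand-in for _binary_representation: separators are
    dropped, and the character right after one (or at the very start)
    is flagged with 1.
    """
    flags: list[int] = []
    at_word_start = True
    for ch in txt:
        if ch == sep:
            # The next real character begins a new word.
            at_word_start = True
            continue
        flags.append(1 if at_word_start else 0)
        at_word_start = False
    return np.array(flags, dtype=np.uint8)


# With an ndarray return annotation, the filtering lines from the issue
# type-check, because ref_sample.shape is a valid attribute.
ref_sample = binary_representation_sketch("ab|cde|f")
c_pos_pred = np.array([0, 2, 5, 9])
c_pos_pred = c_pos_pred[c_pos_pred < ref_sample.shape[0]]
```

Whether the real fix is to change the annotation and docstring to ndarray, or to change the return value to str, depends on how callers use the result; the sketch only shows the first option.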

Steps to reproduce

Run mypy on pythainlp/benchmarks/word_tokenization.py

PyThaiNLP version

5

Python version

any

Operating system and version

any

More info

No response

Possible solution

No response

Files

No response

@bact bact added this to PyThaiNLP Dec 15, 2024
@bact bact added bug bugs in the library help wanted no contributor yet labels Dec 15, 2024