New Line Break Strategy for Languages Without Spaces #496

Open
sjm1327605995 opened this issue Nov 20, 2024 · 3 comments
Labels
discussion, hard to solve (Not food for newcomers), in analysis (Analyzing if should be implemented), v2 (will be solved in v2)

Comments

@sjm1327605995

Feature Request: New Line Break Strategy for Languages Without Spaces

Summary

Currently, the library provides two line break strategies:

  1. EmptySpaceStrategy: This strategy counts the length of words to create a new line and is designed for languages that use spaces to divide words.
  2. DashStrategy: This strategy counts the length for a set of characters and applies a dash at the end of the line. It is useful for languages that do not use spaces between words.

However, in languages such as Chinese, text is written without spaces or any other special characters between words. As a result, the existing strategies may not handle line breaks effectively for these languages.

Proposal

I propose the addition of a new line break strategy that specifically caters to languages like Chinese, where:

  • Characters are standalone and do not require special characters for separation.
  • The line break logic should be based on character count or other relevant metrics that respect the linguistic structure of such languages.
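A minimal sketch of what a character-count strategy could look like, assuming plain Go with no dependency on the library. The names `wrapByRunes` and `noLineStart` are hypothetical and are not part of the existing API. The sketch counts runes rather than bytes and keeps one closing punctuation mark on the current line so that no line starts with it.

```go
package main

import (
	"fmt"
	"strings"
)

// noLineStart lists punctuation that should not begin a line. This is a
// small illustrative subset, not a complete rule set for any language.
const noLineStart = ",。、:;!?”’)》」,.:;!?)"

// wrapByRunes is a hypothetical helper: it splits text into lines of at most
// maxRunes characters, counting runes rather than bytes. When the character
// that would start the next line is in noLineStart, it is pulled onto the
// current line, so that line holds maxRunes+1 characters.
func wrapByRunes(text string, maxRunes int) []string {
	if maxRunes < 1 {
		return nil
	}
	runes := []rune(text)
	var lines []string
	for start := 0; start < len(runes); {
		end := start + maxRunes
		if end >= len(runes) {
			// Remaining text fits on one line.
			lines = append(lines, string(runes[start:]))
			break
		}
		if strings.ContainsRune(noLineStart, runes[end]) {
			end++ // keep the punctuation mark with the preceding text
		}
		lines = append(lines, string(runes[start:end]))
		start = end
	}
	return lines
}

func main() {
	for _, line := range wrapByRunes("鱼一方面养活我,一方面要弄死我。", 7) {
		fmt.Println(line)
	}
}
```

In a PDF layout the limit would more likely be a measured glyph width than a rune count, since full-width and half-width characters occupy different amounts of space. The rune count is used here only to keep the sketch short.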

Example

For instance, with the proposed strategy, the following Chinese text could be broken into lines like this:

老头儿不忍心朝船边的死鱼多看一眼,它已经给咬得残缺不全了。他说:“一个人并不是生来要给打败的,你尽可能把他消灭掉,可就
是打不败他。”他想:“自己把鱼弄死不仅仅是为了养活自己,是为了光荣,因为你是个打鱼的。说到底,这个总要杀死那个。鱼一方面
养活我,一方面要弄死我。”这时,又有两条鲨鱼向他和死鱼袭来。他拿起绑着刀子的船桨向鲨鱼的头刺去。鲨鱼死的时候还吞着它咬
下的鱼肉。另一条鲨鱼在船底蹂躏着死鱼,老人设法使鲨鱼露出来,把刀子朝鲨鱼身上扎去。一次,两次,最后终于扎死了鲨鱼。现在
那条死鱼已经成了所有鲨鱼追踪的对象。鲨鱼每一次袭击,都从死鱼身上扯去很多肉。他想:“这一回它们可把我打惨了,可是我只要
有桨,有短棍,有舵把,就一定要揍死它们。”鲨鱼一次又一次冲来,老人用棍子揍。晚上,鲨鱼又成群窜来,老人只见它们身上的磷
光,他不顾一切地用棍棒劈去。棍棒丢了,就拽下舵把,两手抱住,一次又一次劈下去,但是鲨鱼还是从棍棒、舵把下撕咬下一块块死
鱼肉。
@Fernando-hub527 Fernando-hub527 added the in analysis Analyzing if should be implemented label Nov 28, 2024
@johnfercher
Owner

You're 100% right about this. The problem is that applying a custom rule to each language based on its structure seems very complicated. I think we need a dedicated project to handle this. It is probably a common problem that text editors have already solved, and there may be an open source project we can use.

@johnfercher johnfercher added hard to solve Not food for newcomers discussion v2 will be solved in v2 labels Dec 10, 2024
@sjm1327605995
Author

I completely understand your point that implementing multiple line break strategies can be very challenging. Instead of implementing numerous strategies, you might consider the Knuth-Plass line-breaking algorithm (the one used in TeX). It chooses the break points for a whole paragraph at once by minimizing a total penalty for uneven spacing across all lines. The idea is to balance the spaces between words so that each line is visually uniform, rather than simply cutting off sentences at a certain number of characters.

I discovered this line wrapping implementation while using Go to draw a canvas picture. Here are the relevant parts:

If possible, I would like to think about it and add support. Thank you very much.

@ZoeKyHein

need this feature, too


4 participants