Add Squamish (squ) + research documentation #172
Comments
Super, thanks @justinpenner, this is very valuable! Both to have your approach documented, and to include such a very local language. All in all this looks already very good. You can also open a PR and we refine in the PR; it's often easier to comment or amend code in the PR interface. A few pointers:
|
@kontur thanks, I've made a couple edits and submitted a PR (#173). Sadly the speaker count was indeed only 1 person in 2014, but happily, I found a new source (Canada's 2021 census) stating there are now 25 native speakers! I updated Wikipedia, too. Can we already include digraphs and trigraphs in |
Sorry, yes, include away! I misremembered: we have already changed the input so that it no longer makes those "vanish" on saving! |
Great, I'll add them to the PR. This language has a lot of them. I agree with comments in #116 that they're not too useful or interesting for a type designer, but they're part of the orthography in this case, which is what we're documenting, and I already have the research. |
It would be useful to have the graphemes using combining marks, like m̓ n̓ l̓ x̱, no? Designers may not be aware those are used and should be handled to support Squamish. |
@moyogo Yes, I will add those to the PR as well. Earlier I thought that the Hyperglot database was only cataloguing individual characters, but apparently base+mark pairs and multigraphs are allowed, and should be included. |
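To illustrate why base+mark pairs are worth listing explicitly, here is a minimal sketch using Python's standard `unicodedata` module. It checks whether a grapheme still needs a combining mark after NFC normalization, which means no precomposed code point exists for it. The helper name `needs_combining_mark` is made up for this example, and the comma above is assumed to be U+0313 (the line below is U+0331, as noted for Tlingit elsewhere in this thread).

```python
import unicodedata


def needs_combining_mark(grapheme: str) -> bool:
    """Return True if the grapheme still contains a combining mark after NFC."""
    composed = unicodedata.normalize("NFC", grapheme)
    # unicodedata.combining() returns 0 for characters that are not combining marks.
    return any(unicodedata.combining(ch) != 0 for ch in composed)


# Escapes are used so the exact code points are unambiguous.
# Assumption: the comma above is U+0313; the line below is U+0331.
samples = {
    "a + U+0301": "a\u0301",
    "k + U+0331": "k\u0331",
    "m + U+0313": "m\u0313",
    "x + U+0331": "x\u0331",
}

for label, grapheme in samples.items():
    if needs_combining_mark(grapheme):
        print(f"{label}: no precomposed form, the font must handle the mark")
    else:
        print(f"{label}: a precomposed code point exists")
```

Under these assumptions, the pairs with a comma above and the x with line below have no precomposed form, so a font has to position the combining marks itself.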
The pull request #173 now includes base+mark pairs and multigraphs:
base: A AA AW AW̓ AY AY̓ Á CH CHʼ E EY EY̓ EW EW̓ É H I II IW IW̓ Í K Kʼ KW KWʼ Ḵ Ḵʼ ḴW ḴWʼ L L̓ LH M M̓ N N̓ P Pʼ S SH T Tʼ TLʼ TS TSʼ U UU UY UY̓ Ú W W̓ XW X̱ X̱W Y Y̓ a aa aw aw̓ ay ay̓ á ch chʼ e ey ey̓ ew ew̓ é h i ii iw iw̓ í k kʼ kw kwʼ ḵ ḵʼ ḵw ḵwʼ l l̓ lh m m̓ n n̓ p pʼ s sh t tʼ tlʼ ts tsʼ u uu uy uy̓ ú w w̓ xw x̱ x̱w y y̓ 7 ʼ ’ ''
marks: ◌̓ ◌̱
punctuation: . , - ? !
These are all listed in a pronunciation guide section at the beginning of the Squamish–English dictionary, which seems to be more complete than the orthography listed on Wikipedia.
I used ʼ U+02BC MODIFIER LETTER APOSTROPHE rather than ’ U+2019 RIGHT SINGLE QUOTATION MARK in the digraphs. It isn't standardized, so either is acceptable in everyday use of the language, but the modifier letter apostrophe is more semantically correct, I think. |
Justin, have you considered mixed-case digraphs such as Aa Aw Ay Ch…?
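On the choice between U+02BC and U+2019 discussed above, a small sketch with Python's standard `unicodedata` module shows what "more semantically correct" can mean in practice: Unicode classifies the modifier letter apostrophe as a letter, while the right single quotation mark and the ASCII apostrophe are punctuation. The helper `describe` is a name invented for this example.

```python
import unicodedata

# The three apostrophe-like characters mentioned in this thread.
CANDIDATES = ["\u02BC", "\u2019", "\u0027"]


def describe(ch: str) -> tuple[str, str, str]:
    """Return the code point label, Unicode name and general category of ch."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch)


for ch in CANDIDATES:
    print(describe(ch))

# A practical consequence: only the modifier letter keeps a digraph
# such as k + apostrophe classified as purely alphabetic.
print("k\u02BC".isalpha())
print("k\u2019".isalpha())
```

Software that segments text into words by character category may therefore split a word at U+2019 but not at U+02BC.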
|
I did think of it, but is there any usefulness in including them? I think mixed-case digraphs would only be useful to include when they have their own unique codepoint, like Dz U+01F2 LATIN CAPITAL LETTER D WITH SMALL LETTER Z. Otherwise mixed-case digraphs aren't adding anything semantically unique, nor are they adding any new codepoints. |
I do not have the answer at the moment. I would expect them there for completeness' sake.
It would make most sense to only include one case of everything (incl. single characters), but then German and ß. I will ponder.
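The point above about Dz U+01F2 can be checked with a short sketch that relies only on Python's built-in string case mappings. It reports code points that appear in the titlecase form of a grapheme but in neither its lowercase nor its uppercase form. The function name `titlecase_only_codepoints` is hypothetical and not part of Hyperglot.

```python
def titlecase_only_codepoints(grapheme: str) -> set[str]:
    """Code points found only in the titlecase form of a grapheme."""
    lower = grapheme.lower()
    return set(lower.title()) - set(lower) - set(lower.upper())


# A two-letter digraph: titlecase "Ch" reuses C and h, so nothing new appears.
print(titlecase_only_codepoints("ch"))

# The single-codepoint digraph U+01F3 has a dedicated titlecase form, U+01F2.
print(titlecase_only_codepoints("\u01F3"))
```

For digraphs typed as separate letters the result is empty, which matches the argument that listing their mixed-case forms adds no new codepoints.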
|
Besides exceptions like eszett, adding uppercase or titlecase is redundant data. It can be useful for verbosity but it can also just be automatically derived from Unicode data. Then only exceptions need to be added. For example base="a b c", special_casing={"c": "X"}. For caseless orthographies, there could be a flag caseless=true. |
Agreed, I was thinking of something along those lines. It will make the database smaller too.
@kontur is working on better inheritance, so we could bundle that with it.
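As a rough sketch of the proposal above (not how Hyperglot works today), uppercase forms could be derived from a lowercase-only `base` string, with `special_casing` overriding individual mappings and `caseless` switching the derivation off. The parameter names follow the example in the comment; the function `expand_case` itself is hypothetical.

```python
from typing import Optional


def expand_case(
    base: str,
    special_casing: Optional[dict[str, str]] = None,
    caseless: bool = False,
) -> list[str]:
    """Derive uppercase forms for a space-separated list of lowercase graphemes."""
    graphemes = base.split()
    if caseless:
        # Caseless orthographies are returned unchanged.
        return graphemes
    overrides = special_casing or {}
    # Use the explicit exception when given, otherwise the default Unicode mapping.
    upper = [overrides.get(g, g.upper()) for g in graphemes]
    # dict.fromkeys removes duplicates (e.g. "7" has no case) and keeps order.
    return list(dict.fromkeys(upper + graphemes))


print(expand_case("a b c", special_casing={"c": "X"}))
print(expand_case("k kw 7"))
# Without an override, Python maps U+00DF to "SS"; the override selects U+1E9E.
print(expand_case("\u00DF", special_casing={"\u00DF": "\u1E9E"}))
```

The default mapping here is Python's `str.upper()`, which covers the common cases; anything it gets wrong for a given orthography would go into `special_casing`.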
|
Regarding the double capital digraphs, I think it helps to think of it not as a "caps lock" typing thing, but how an "own word" (omg, my terminology fails me here... city, name, etc.) might need it. Official orthographies seem to list them as such, e.g. in Hungarian and Czech I believe it is the first letter only that is capitalized. Of course, there could be orthographic differences, so I am not categorically opposing double uppercase. From a font validation point of view these don't matter, so I am leaning towards leniency on how those are noted, but of course it would be nice to have consistency.
Regarding the overall uppercase: Yes, it is redundant, but I think we explicitly included them at one point, since the yaml files as such should resemble a "full" orthography, not just codepoints which by capitalizing render the full orthography, and font checking should consider the uppercase as well. "Size" doesn't matter, I'd say. We may implement convenience automation that adds missing upper/lower case or warns about it, if consistency is an issue. Technically it would be trivial to have only lowercase and expand the yamls with uppercase variants when parsed. |
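The "warn about missing upper/lower case" idea could look roughly like the sketch below. It is only an illustration under simple assumptions: `base` is a plain list of graphemes, case is derived with Python's built-in mappings, and mixed-case entries such as "Ch" are ignored. The function name `missing_case_counterparts` is invented for this example.

```python
def missing_case_counterparts(base: list[str]) -> dict[str, list[str]]:
    """List uppercase and lowercase forms that have no counterpart in base."""
    present = set(base)
    missing_upper = [
        g.upper()
        for g in base
        # Entry is all lowercase, has a distinct uppercase form, and that form is absent.
        if g == g.lower() and g != g.upper() and g.upper() not in present
    ]
    missing_lower = [
        g.lower()
        for g in base
        # Entry is all uppercase, has a distinct lowercase form, and that form is absent.
        if g == g.upper() and g != g.lower() and g.lower() not in present
    ]
    return {"upper": missing_upper, "lower": missing_lower}


# U+1E35 (k with line below) is listed without its uppercase U+1E34.
sample = ["A", "a", "CH", "ch", "\u1E35", "7"]
print(missing_case_counterparts(sample))
```

Caseless entries such as 7 are skipped because their uppercase and lowercase forms are identical.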
Thank you @justinpenner for the contribution and clarifying your approach, it helps us improve the instructions for new language additions and hopefully serves as a good reference to other future contributors. If more discussion regarding uppercase/digraphs is needed a new issue is better suited. |
FYI, the Tlingit entry tli.yaml uses the same U+0331 on K/k and X/x as well as on G/g so the notes there may be helpful. The use of U+0331 with these letters is relatively common among Northwest Coast language orthographies (Haida, Coast Tsimshian, Nisg̱aʼa, Gitksan, Kwakʼwala, Sechelt, etc.). |
Here's a preliminary entry for the Squamish language. This is my first language submission to Hyperglot, and I thought it would be helpful, for myself and others, to document my process in researching it.
For some background, I don't speak Squamish, but I live in the region this indigenous language is from. Prior to my research, I already had some familiarity with the language due to it being used prominently in place names and signage. I also frequently do graphic design work for the Squamish Nation and other local clients, which often involves typesetting in this language.
Research
I was able to find several sources: Wikipedia, FirstVoices iOS keyboard app, Typotheque's book Indigenous North American Type, and a Squamish–English dictionary from the indigenous collection at my local public library.
There were a number of differences in the orthographies documented by each of these sources, but each source gave me a fuller picture which helped me to decide what to do about the inconsistencies. The character sets I found were as follows:
23 common among all sources:
6 additional characters in Wikipedia:
3 additional characters in Typotheque:
7 additional characters in Squamish–English dictionary:
7 additional characters in FirstVoices:
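The comparison behind these counts can be expressed with plain set operations. The sketch below is only an illustration: the source names and characters are placeholders, not the actual character sets from the four sources, and `compare_sources` is a name made up for this example.

```python
def compare_sources(
    sources: dict[str, set[str]],
) -> tuple[set[str], dict[str, set[str]]]:
    """Return the characters common to all sources and each source's extras."""
    common = set.intersection(*sources.values())
    extras = {name: chars - common for name, chars in sources.items()}
    return common, extras


# Placeholder data only; NOT the real per-source Squamish character sets.
placeholder_sources = {
    "source_a": {"a", "b", "c"},
    "source_b": {"a", "b", "d"},
    "source_c": {"a", "b"},
}

common, extras = compare_sources(placeholder_sources)
print(sorted(common))
for name, chars in extras.items():
    print(name, sorted(chars))
```

Each "additional characters" list corresponds to one entry of `extras`, which then has to be judged case by case.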
From the above, I made the following decisions:
Include c because it is very commonly used in Squamish, so its omission from two sources appears to be an oversight.
Include ʼ U+02BC MODIFIER LETTER APOSTROPHE in addition to the common ’ U+2019 RIGHT SINGLE QUOTATION MARK, because the right quotation mark is more commonly used due to keyboard settings, but the modifier letter apostrophe is more semantically correct, and some Apple systems map both to the same key, inserting one or the other based on context. Also keep ' U+0027 APOSTROPHE as it is used instead of the right quote in some keyboard layouts.
Omit o z as I have yet to see any loanwords or uncommon orthographical preferences that make use of these letters, so it seems like Wikipedia may have included them erroneously, and Wikipedia was likely Typotheque's source for including them. These letters cannot even be typed with the FirstVoices keyboard, which is further evidence that they are not required. If new evidence arises in the future they could be added to auxiliary or even an alternate orthography.
Omit ʔ as no evidence was found for its usage at all. The Squamish glottal stop was standardized as 7 in the typewriter era.
Omit . , - ? ! as punctuation is not in scope for Hyperglot (yet).
Add a design_requirement based on a preference I've observed (but not confirmed). I suspect this preference is linked to the norms seen in fonts that support IPA, and it therefore might be a preference among many indigenous North American languages.
Add a note linking back to this research.
Result (squ.yaml)
Have I missed anything or made any errors in relation to Hyperglot or the Squamish language? I’ll leave this issue open for a bit in case anyone has feedback, then I will submit a pull request.