
Is it possible to apply BART to categorical variables? #50

Open
fatihbozdag opened this issue Jan 24, 2023 · 2 comments

@fatihbozdag

Greetings all,

I know BART is relatively new, but I wonder if it is possible to apply BART to categorical (string) variables. Would simply factorizing the variables do the trick? How should I proceed with such data?
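For context, this is roughly what I mean by "factorizing" — a minimal sketch with pandas, using made-up column names and values for illustration:

```python
import pandas as pd

# Hypothetical linguistic data with string-valued (categorical) predictors.
df = pd.DataFrame({
    "Subject_Pos": ["NOUN", "PRON", "NOUN", "PRON"],
    "Verb_C": ["Aspectual", "Communication", "Aspectual", "Occurrence"],
})

# Option 1: integer codes via factorization.
codes, levels = pd.factorize(df["Subject_Pos"])
print(list(codes))    # [0, 1, 0, 1]
print(list(levels))   # ['NOUN', 'PRON']

# Option 2: one-hot (dummy) encoding, one 0/1 column per level,
# which is what I would pass to a tree-based model as the design matrix.
X = pd.get_dummies(df, columns=["Subject_Pos", "Verb_C"])
print(X.columns.tolist())
```

Is either encoding appropriate as input to BART, or does the integer coding impose a spurious ordering on the levels?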

@aloctavodia
Member

Sorry, I missed this. Could you provide a minimal example of what you want to do?

@fatihbozdag
Author

fatihbozdag commented Feb 3, 2023

Sure thing.

I mostly work with linguistic data, so I wonder if it is possible to apply decision trees to grammatical constructions.

For instance, the current dataset I work on is something like this:

{'Subject_Pos': CommonTerm(  
   name: Subject_Pos,
   prior: StudentT(mu: 0, nu: 30, lam: 1),
   shape: (14330, 2),
   categorical: True,
   levels: ['NOUN', 'PRON'],
   coords: {'Subject_Pos_dim': ['NOUN', 'PRON']}
 ),
 'Verb_C': CommonTerm(  
   name: Verb_C,
   prior: StudentT(mu: 0, nu: 30, lam: 1),
   shape: (14330, 6),
   categorical: True,
   levels: ['Aspectual', 'Communication', 'Existence or relationship', 'Facilitation or causation', 'Mental Verbs', 'Occurrence'],
   coords: {'Verb_C_dim': ['Aspectual', 'Communication', 'Existence or relationship', 'Facilitation or causation', 'Mental Verbs', 'Occurrence']}
 ),
 'Modal_C': CommonTerm(  
   name: Modal_C,
   prior: StudentT(mu: 0, nu: 30, lam: 1),
   shape: (14330, 2),
   categorical: True,
   levels: ['per_pos_abi', 'vol_pre'],
   coords: {'Modal_C_dim': ['per_pos_abi', 'vol_pre']}
 ),
 '1|Native_language': GroupSpecificTerm(  
   name: 1|Native_language,
   prior: Normal(mu: 0, sigma: HalfNormal(sigma: 2.5)),
   shape: (14330, 2),
   categorical: False,
   groups: ['Chinese', 'Turkish']
 )}

and the model is as follows:

Formula: Modal_C ~ 0 + Subject_Pos + Verb_C + Pattern_Type + (1|Native_language)
Family name: Categorical
Link: softmax
Observations: 14330
Priors:
  Common-level effects
    Subject_Pos ~ StudentT(mu: 0, nu: 30, lam: 1)
    Verb_C ~ StudentT(mu: 0, nu: 30, lam: 1)
    Pattern_Type ~ Normal(mu: [[0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]], sigma: [[29.0401 29.0401]
 [ 9.3792  9.3792]
 [22.2055 22.2055]
 [94.6705 94.6705]])

  Group-level effects
    1|Native_language ~ Normal(mu: 0, sigma: HalfNormal(sigma: 2.6275))

Since each sentence is a construction in which each constituent influences and is influenced by the others, I thought decision trees would be a tool to observe saliency among items. I guess what I want to do is an alternative analysis to the Beam Search algorithm, through BART.

(attached image: image_2023-02-03_222907909)

We assume, particularly in learner languages, that lexical choices are governed by certain restrictions such as learner proficiency, lexical knowledge, and grammatical awareness. So, investigating the probabilities of observing certain lexical items together with certain others, in a tree-based fashion, would be a feasible way to learn more about learner constructions.
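To make the goal concrete, here is a crude non-tree baseline for the kind of co-occurrence probabilities I mean — a row-normalized cross-tabulation, with invented toy data standing in for the real corpus columns:

```python
import pandas as pd

# Toy stand-ins for the corpus columns; values are invented for illustration.
df = pd.DataFrame({
    "Verb_C":  ["Aspectual", "Aspectual", "Mental Verbs", "Mental Verbs"],
    "Modal_C": ["per_pos_abi", "vol_pre", "per_pos_abi", "per_pos_abi"],
})

# Estimate P(Modal_C | Verb_C) as a row-normalized co-occurrence table.
cond = pd.crosstab(df["Verb_C"], df["Modal_C"], normalize="index")
print(cond)
```

What I would hope BART adds over such a table is the ability to condition on several constituents at once and let the trees pick up the salient interactions.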
