README

The cluster module defines functions that allow you to cluster phrases, names, and titles by similarity or equivalence. It's functions can either accept a set of cluster categories and then group a set of names, phrases, or titles, into those categories, or it can accept a set of names, phrases, and titles, and partition them into a set of inferred clusters. This library is useful whenever you have data that you want to categorize.

Initializing the Build Environment

Note: the following are a hack solution to fix the fact that brew and opam are out of sync and opam's owl-plplot library requires library versions that can no longer be installed with brew.

opam switch create . ocaml-variants.4.10.0+flambda --no-install
opam update
opam install --deps-only .
dune build
dune exec analysis/main.exe

Usage in Other Projects

To use this PPX library in another project include the cluster library in your Dune file's libraries section. For example:

(executable
 (name main)
 (libraries
  core_kernel
  cluster
  )
 (preprocess
  (pps ppx_jane))
  (modes exe)
)

You must also update your OPAM package configuration file. Add the following line to your "pin-depends" section: ["cluster.1.0.0" "git+https://github.com/llee454/cluster.git#main"]. Add the following to your "depends" section: "cluster" { = "1.0.0"}.

Example

Imagine that you have a set of conceptual categories distinguished by the following keywords:

let clusters = [
  {
    label = "mental health issue";
    word_set = String.Set.of_list ["anxiety"; "depression"; "ptsd"; "adhd"; "attention"; "deficit"];
    entries = []
  };
  {
    label = "physical health issue";
    word_set = String.Set.of_list ["nutrition"; "dental"; "diabetes"];
    entries = []
  }
]

and you have a set of diagnoses such as:

let diagnoses = [
  "patient with anxiety disorder and panic attacks";
  "patient with dental carries and plaque";
  "attention deficit disorder with behavioral misconduct"
]

and you want to group the diagnoses into the two categories: "mental health issues," and "physical health issues." Then you can use the cluster function provided by this module to do so:

module Diagnoses_clusters = Cluster.Make ((
  struct
    type t = string [@@deriving sexp, compare]
    let get_name = Fn.id
  end : Cluster.Make_arg with type t = string
))

let categorize_diagnoses diagnoses =
  let open Diagnoses_clusters in
  cluster ~k_shared:0.5 ~k_not_shared:0.4 ~ignore:String.Set.empty
    ~clusters:(Diagnoses_clusters.Set.of_list clusters)
    diagnoses

Note: the cluster algorithm accepts two arguments: k_shared and k_not_shared. k_shared specifies how heavily the algorithm will weight words that appear in both the cluster's word set and the phrase being analyzed, while k_not_shared specifies how heavily the algorithm will weight words that are "missing." In general k_shared should be greater than k_not_shared.

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
src		src
.gitignore		.gitignore
.ocamlformat		.ocamlformat
LICENSE.md		LICENSE.md
README.md		README.md
cluster.opam		cluster.opam
dune-project		dune-project

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

README

Initializing the Build Environment

Usage in Other Projects

Example

About

Releases

Packages

Languages

License

llee454/cluster

Folders and files

Latest commit

History

Repository files navigation

README

Initializing the Build Environment

Usage in Other Projects

Example

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages