docs: Add changelog entry for 0.2.0
jpmckinney committed Feb 26, 2024
1 parent b1d7eda commit 49ee9e9
Showing 2 changed files with 32 additions and 0 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -5,3 +5,4 @@ Gemfile.lock
doc/*
pkg/*
coverage/*
vendor/*
31 changes: 31 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,31 @@
# Changelog

## v0.2.0

### Added

- Add `tokenizer` option to `Document` class

The value is an object with a `tokenize` method that accepts a string and returns an array of `Token` instances.

For example, to use [natto](https://rubygems.org/gems/natto) instead of [unicode_utils](https://rubygems.org/gems/unicode_utils) for Japanese, install MeCab (`brew install mecab`), and then:

```ruby
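require 'natto'
require 'tf-idf-similarity'

class Tokenizer
  def initialize
    @nm = Natto::MeCab.new
  end

  # Returns one Token per MeCab node, built from the node's surface string.
  def tokenize(text)
    @nm.enum_parse(text).map do |node|
      TfIdfSimilarity::Token.new(node.surface)
    end
  end
end

# Pass an instance of the tokenizer, not an undefined local variable.
document = TfIdfSimilarity::Document.new("こんにちは世界", tokenizer: Tokenizer.new)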
```
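
The tokenizer contract described above (an object with a `tokenize` method that takes a string and returns an array of tokens) can also be sketched without MeCab. This is a minimal illustration, not part of the gem: `WhitespaceTokenizer` is a hypothetical name, and the `Token` class here is a stand-in defined only so that the snippet runs without the gem installed; it assumes the real `Token` behaves like a string.

```ruby
# Stand-in for the gem's Token class (assumption: the real class is string-like).
# Defined here only so the sketch runs without the gem installed.
class Token < String; end

# Hypothetical tokenizer that splits text on runs of whitespace.
class WhitespaceTokenizer
  # Accepts a string and returns an array of Token instances.
  def tokenize(text)
    text.split.map { |word| Token.new(word) }
  end
end

tokens = WhitespaceTokenizer.new.tokenize("hello tf idf world")
puts tokens.inspect
```

With the gem loaded, an instance of such a class would be passed through the same `tokenizer:` option used in the example above.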

- Add `to_s` method to `Token` class, to use less memory than chaining `lowercase_filter` with `classic_filter`
