Merge pull request #52 from takahiroanno2024/quickstart

Add Quickstart
takahiroanno2024 · Jan 28, 2025 · 5449fb9
2 parents 73f9bcf + 7ad6857
Showing 7 changed files with 164 additions and 11 deletions.
diff --git a/.env.example b/.env.example
@@ -0,0 +1 @@
+OPENAI_API_KEY=
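
`.env.example` defines a single `KEY=value` line for `OPENAI_API_KEY`. This diff does not show how the pipeline reads `.env`, so the following is only a minimal sketch of how such a file can be parsed. `load_env_file` is a hypothetical helper, and libraries such as python-dotenv handle quoting and other cases that this sketch ignores.

```python
from pathlib import Path
from typing import Dict


def load_env_file(path: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env-style file.

    Hypothetical helper for illustration only: blank lines and lines
    starting with '#' are skipped, and only the first '=' separates the
    key from the value. Quotes, escapes, and 'export' prefixes are NOT handled.
    """
    env: Dict[str, str] = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # ignore malformed lines that have no '='
        env[key.strip()] = value.strip()
    return env
```

For example, `load_env_file(".env")["OPENAI_API_KEY"]` returns the key you pasted in. For an unedited copy of `.env.example`, it returns an empty string.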
diff --git a/Dockerfile b/Dockerfile
@@ -0,0 +1,30 @@
+# Use the slim Python image as the base; Node.js is installed separately below
+FROM python:3.10-slim
+
+# Set the working directory
+WORKDIR /app
+
+# Install required system packages
+RUN apt-get update && apt-get install -y curl gcc build-essential && rm -rf /var/lib/apt/lists/*
+
+# Install Node.js
+RUN curl -fsSL https://deb.nodesource.com/setup_16.x | bash - \
+    && apt-get install -y nodejs
+
+# Copy the project files
+COPY . .
+
+# Install Python dependencies
+RUN pip install --no-cache-dir -r scatter/requirements.txt
+
+# Install JavaScript dependencies
+RUN cd scatter/next-app && npm install
+
+# Download NLTK data
+RUN python -c "import nltk; nltk.download('stopwords')"
+
+# Placeholder only; supply the real key at run time (docker run --env-file .env or -e)
+ENV OPENAI_API_KEY=your_openai_api_key_here
+
+# Run the pipeline to generate the report, then serve it on port 8000
+CMD ["bash", "-c", "cd scatter/pipeline && python main.py configs/example-polis.json --skip-interaction && cd outputs/example-polis/report && python -m http.server 8000"]
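
The `ENV` line bakes a placeholder key into the image, and the `CMD` starts the pipeline as soon as the container runs. A check like the hypothetical `check_openai_key` below is one way to fail fast when the key was never overridden. It is not part of this repository. It assumes the key arrives through the process environment, for example via `docker run --env-file .env`.

```python
import os
from typing import Mapping, Optional

# Placeholder value set by the ENV line in the Dockerfile above.
DOCKERFILE_PLACEHOLDER = "your_openai_api_key_here"


def check_openai_key(environ: Optional[Mapping[str, str]] = None) -> str:
    """Return OPENAI_API_KEY, or raise if it is missing or still the placeholder.

    Hypothetical pre-flight check; defaults to the real process environment.
    """
    env = os.environ if environ is None else environ
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    if key == DOCKERFILE_PLACEHOLDER:
        raise RuntimeError(
            "OPENAI_API_KEY still has the Dockerfile placeholder; "
            "pass a real key, e.g. docker run --env-file .env ..."
        )
    return key
```

Calling `check_openai_key()` before `python main.py ...` would stop the run with a clear message instead of a failed API request.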
diff --git a/docs/for_windows_user.md b/docs/for_windows_user.md
@@ -0,0 +1,34 @@
+## Docker Setup for Windows Users
+(Note: this document is a work in progress.)
+
+
+Setting up the environment tends to be harder on Windows than on macOS or Linux. Docker makes the setup comparatively painless. Follow the steps below.
+
+### 1. **Install Docker**
+Install Docker Desktop and start it.
+
+### 2. **Set Environment Variables**
+Create a `.env` file in the project root directory. The easiest way is to copy `.env.example` and edit it.
+It should look like this:
+
+```
+OPENAI_API_KEY=<your_openai_api_key_here>
+```
+
+### 3. **Build the Docker Image**
+Run the following command in the project root directory to build the Docker image.
+
+```bash
+docker build -t broadlistening .
+```
+
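+### 4. **Start the Docker Container**
+From the project root, run the following command to start the container and generate the report. `--env-file .env` passes your API key into the container, replacing the placeholder set in the Dockerfile.
+
+```bash
+docker run --env-file .env -p 8000:8000 broadlistening
+```
+
+
+### 5. **Check the Results**
+The report is served once the pipeline finishes. Open `http://localhost:8000` in your browser to view it.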
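
The container's `CMD` runs the pipeline first and starts `http.server` only after the pipeline succeeds. As a result, `http://localhost:8000` is unreachable for a while after `docker run`. The hypothetical helper below polls the URL with the standard library until it answers or a timeout expires. It is not part of the repository, and the 600-second default is an arbitrary assumption, not a measured pipeline runtime.

```python
import time
import urllib.request


def wait_for_report(url: str = "http://localhost:8000",
                    timeout: float = 600.0, interval: float = 5.0) -> bool:
    """Poll url until it returns HTTP 200; give up after timeout seconds.

    Hypothetical helper, not part of the repository.
    """
    # Talk to the local server directly, ignoring any HTTP(S)_PROXY settings.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener.open(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, refused connections, and socket timeouts are all OSError.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Running `wait_for_report()` in a second terminal returns `True` as soon as the report page is being served.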
diff --git a/docs/images/usage.png b/docs/images/usage.png
diff --git a/docs/quickstart.md b/docs/quickstart.md
@@ -0,0 +1,87 @@
+# A Beginner's Guide to Broadlistening
+
+This guide is for people trying broadlistening for the first time.
+
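+## Overview
+
+This tool (*) is an AI pipeline that takes a CSV file of comments as input and generates an HTML report. It:
+
+- Extracts the main arguments from the original comments
+- Clusters the arguments by semantic similarity
+- Generates a label and a summary for each cluster
+- Provides an interactive map for exploring the arguments in each cluster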
+
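+(* "This tool" is derived from Talk to the City, developed by the AI Objectives Institute. However, there are two versions of Talk to the City, and this tool has been substantially rewritten, so we plan to give it a clearer name soon.)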
+
+## Requirements
+
+- An OpenAI API key
+
+
+## Docker Setup for Windows Users
+
+Setting up the environment tends to be harder on Windows than on macOS or Linux. Docker makes the setup comparatively painless.
+See [Docker Setup for Windows Users](for_windows_user.md).
+
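+## Setup for Other Environments
+
+### **Set Up the Python Environment**
+
+Python 3.10 or later is required. We recommend pyenv for managing Python versions.
+
+Install Python 3.10, then create and activate a virtual environment.
+
+Then install the required dependencies. The commands below assume you start in the repository's `scatter` directory, which contains `requirements.txt`, `next-app`, and `pipeline`.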
+
+```bash
+pyenv install 3.10.15
+pyenv local 3.10.15
+python -m venv venv
+source venv/bin/activate
+pip install -r requirements.txt
+python -c "import nltk; nltk.download('stopwords')"
+```
+
+### **Install JavaScript Dependencies**
+Install the JavaScript dependencies with npm, then return to the `scatter` directory.
+
+```bash
+cd next-app && npm install
+cd ..
+```
+
+### **Set Environment Variables**
+Create a `.env` file in the project root directory. The easiest way is to copy `.env.example` and edit it.
+It should look like this:
+
+```
+OPENAI_API_KEY=<your_openai_api_key_here>
+```
+
+### **Generate the Report**
+From the `scatter` directory, try the pipeline on the sample `example-polis` data.
+
+```bash
+cd pipeline
+python main.py configs/example-polis.json
+```
+
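+This command reads `pipeline/inputs/example-polis.csv` and writes the report to `pipeline/outputs/example-polis/report`. The pipeline asks for confirmation before it starts; add `--skip-interaction` to skip the prompt.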
+
+### **View the Report**
+
+From the `pipeline` directory, serve the report with Python's built-in HTTP server.
+
+```bash
+cd outputs/example-polis/report
+python -m http.server 8000
+```
+
+Then open `http://localhost:8000` in your browser.
+
+### **Check API Costs**
+OpenAI's Usage page shows how much the run cost.
+
+![OpenAI Usage page](images/usage.png)
+
+In this example, the run cost about 1 to 2 yen.
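
The quickstart serves the report by changing into the output directory before running `python -m http.server 8000`. Since Python 3.7, the same result is possible without `cd`. One option is `python -m http.server 8000 --directory pipeline/outputs/example-polis/report`. Another is the programmatic sketch below. `make_report_server` is a hypothetical name, and the default path assumes you start in `scatter`.

```python
import functools
import http.server
from pathlib import Path

# Report location from the quickstart, relative to the scatter directory (assumption).
REPORT_DIR = Path("pipeline/outputs/example-polis/report")


def make_report_server(report_dir: Path, port: int = 8000) -> http.server.ThreadingHTTPServer:
    """Build a server that serves report_dir, similar to running
    `python -m http.server <port>` inside that directory."""
    if not report_dir.is_dir():
        raise FileNotFoundError(f"report directory not found: {report_dir}")
    # SimpleHTTPRequestHandler accepts a `directory` argument (Python 3.7+).
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(report_dir)
    )
    # "" binds all IPv4 interfaces; port 0 would pick a free port.
    return http.server.ThreadingHTTPServer(("", port), handler)
```

`make_report_server(REPORT_DIR).serve_forever()` then serves the report until you stop it with Ctrl+C.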
diff --git a/scatter/pipeline/main.py b/scatter/pipeline/main.py
@@ -21,39 +21,40 @@ def parse_arguments():
         description="Run the annotation pipeline with optional flags."
     )
     parser.add_argument(
-        "config",
-        help="Path to config JSON file that defines the pipeline execution."
+        "config", help="Path to config JSON file that defines the pipeline execution."
     )
     parser.add_argument(
-        "-f", "--force",
+        "-f",
+        "--force",
         action="store_true",
-        help="Force re-run all steps regardless of previous execution."
+        help="Force re-run all steps regardless of previous execution.",
     )
     parser.add_argument(
-        "-o", "--only",
+        "-o",
+        "--only",
         type=str,
-        help="Run only the specified step (e.g., extraction, embedding, clustering, etc.)."
+        help="Run only the specified step (e.g., extraction, embedding, clustering, etc.).",
     )
     parser.add_argument(
         "--skip-interaction",
         action="store_true",
-        help="Skip the interactive confirmation prompt and run pipeline immediately."
+        help="Skip the interactive confirmation prompt and run pipeline immediately.",
     )
     return parser.parse_args()
 
 
 def main():
     args = parse_arguments()
-    
+
     # Convert argparse namespace to sys.argv format for compatibility
     new_argv = [sys.argv[0], args.config]
     if args.force:
         new_argv.append("-f")
     if args.only:
         new_argv.extend(["-o", args.only])
     if args.skip_interaction:
-        new_argv.append("-skip-interaction")
-    
+        new_argv.append("--skip-interaction")
+
     config = initialization(new_argv)
 
     try:
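
`parse_arguments()` declares `--skip-interaction`, and argparse exposes it as `args.skip_interaction`, because dashes in long option names become underscores. The sketch below restates that parser under a hypothetical `build_parser()` name. This lets it be called with an explicit argument list instead of reading `sys.argv`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Restatement of the parser in parse_arguments() (hypothetical name)."""
    parser = argparse.ArgumentParser(
        description="Run the annotation pipeline with optional flags."
    )
    parser.add_argument(
        "config", help="Path to config JSON file that defines the pipeline execution."
    )
    parser.add_argument(
        "-f", "--force", action="store_true",
        help="Force re-run all steps regardless of previous execution.",
    )
    parser.add_argument(
        "-o", "--only", type=str,
        help="Run only the specified step (e.g., extraction, embedding, clustering, etc.).",
    )
    parser.add_argument(
        "--skip-interaction", action="store_true",
        help="Skip the interactive confirmation prompt and run pipeline immediately.",
    )
    return parser


# argparse stores "--skip-interaction" under the attribute "skip_interaction".
args = build_parser().parse_args(
    ["configs/example-polis.json", "--skip-interaction", "-o", "clustering"]
)
print(args.config, args.force, args.only, args.skip_interaction)
# prints: configs/example-polis.json False clustering True
```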

diff --git a/scatter/pipeline/utils.py b/scatter/pipeline/utils.py
@@ -140,7 +140,7 @@ def initialization(sysargv):
             config["force"] = True
         if option == "-o":
             config["only"] = sysargv[i + 1]
-        if option == "-skip-interaction":
+        if option == "--skip-interaction":
             config["skip-interaction"] = True
 
     output_dir = config["output_dir"]
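
`main()` rebuilds a `sys.argv`-style list, and `initialization()` then scans that list option by option. The string that `main.py` emits and the string that `utils.py` compares against must therefore be identical. Before this commit, both sides used `-skip-interaction`; the commit changes both to `--skip-interaction`, which matches the real CLI flag. The sketch below is a simplified restatement of the two halves. The function names are hypothetical, and `read_flags` covers only the options visible in this diff.

```python
from typing import Dict, List, Optional, Union


def to_legacy_argv(prog: str, config: str, force: bool = False,
                   only: Optional[str] = None,
                   skip_interaction: bool = False) -> List[str]:
    """Mirror main(): rebuild a sys.argv-style list from parsed options."""
    argv = [prog, config]
    if force:
        argv.append("-f")
    if only:
        argv.extend(["-o", only])
    if skip_interaction:
        argv.append("--skip-interaction")  # must match the check in read_flags
    return argv


def read_flags(sysargv: List[str]) -> Dict[str, Union[bool, str]]:
    """Simplified stand-in for the option loop in utils.initialization()."""
    config: Dict[str, Union[bool, str]] = {}
    for i, option in enumerate(sysargv):
        if option == "-f":
            config["force"] = True
        if option == "-o":
            config["only"] = sysargv[i + 1]
        if option == "--skip-interaction":
            config["skip-interaction"] = True
    return config


flags = read_flags(to_legacy_argv("main.py", "configs/example-polis.json",
                                  only="embedding", skip_interaction=True))
print(flags)  # {'only': 'embedding', 'skip-interaction': True}
```

If only one side were changed, the flag would be silently ignored: `read_flags(["main.py", "c.json", "-skip-interaction"])` returns an empty dict.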