Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use gpt-4o-mini to generate doc metadata #867

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

johndmulhausen
Copy link
Contributor

@johndmulhausen johndmulhausen commented Oct 12, 2024

Uses OpenAI's gpt-4o-mini model to generate:

  • A description suitable for use in the front-matter of the page (which can also populate <meta name="description" content="{description}">)
  • A list of keywords suitable for use in the front-matter of the page (which can also populate <meta name="keywords" content="{keywords}"/>)
  • A TL;DR of the page in three bullet points, which can be added to the top of the page

To call, run this from the root of the docs repo, after getting an OpenAI API key at https://platform.openai.com/api-keys

pip install openai
pip install python-frontmatter
OPENAI_API_KEY={project API key} python scripts/generate_metadata.py docs/guides/sweeps/initialize-sweeps.md

Example output:

% OPENAI_API_KEY={redacted}  python scripts/generate_metadata.py docs/guides/sweeps/initialize-sweeps.md

title: Initialize a sweep
description: Initialize a W&B Sweep

NEW DESCRIPTION: Learn how to initialize W&B sweeps using the SDK in Python or the CLI, ensuring you have the appropriate configuration and project settings for optimal results.

NEW KEYWORDS: wandb sweep instructions, sweep configuration, machine learning, Jupyter Notebook, CLI

Summary bullet-points:
 - W&B utilizes a Sweep Controller to manage experimental runs across machines, issuing instructions for new runs to agents on user machines after each sweep completion.
- Users must define a sweep configuration in a YAML file or Python dictionary and ensure that both the W&B Sweep and the W&B Run are in the same project.
- Sweeps can be initialized using either the W&B SDK in Python scripts/Jupyter Notebooks or the W&B CLI, both returning a sweep ID that includes the entity and project names.

@johndmulhausen johndmulhausen requested a review from a team as a code owner October 12, 2024 16:48
Copy link

cloudflare-workers-and-pages bot commented Oct 12, 2024

Deploying docodile with  Cloudflare Pages  Cloudflare Pages

Latest commit: 7c51095
Status: ✅  Deploy successful!
Preview URL: https://6cfb1d0b.docodile.pages.dev
Branch Preview URL: https://ai-generated-metadata.docodile.pages.dev

View logs

@ngrayluna
Copy link
Contributor

Nice. How should we interpret the returned summaries? These two points could be improved, imo

  • Users must define a sweep configuration in a YAML file or Python dictionary and ensure that both the W&B Sweep and the W&B Run are in the same project.

    • This part, "...both the W&B Sweep and the W&B Run are in the same project." this is true, but is not really helpful. It'd be cool if ChatGPT used the sentence following that statement " Therefore, the name you provide when you initialize W&B (wandb.init) must match the name of the project you provide when you initialize a W&B Sweep (wandb.sweep)." (See caution blurb on this page).
    • Even better, instead of taking a sentence verbatim, it'd be cool if it could summarize nuances like logging data. The main things to call out w/ the sweep configuration are: defining the method (i.e. search method), metric, and parameter keys, taking note of the key-value pairs w/in the sweep config since those are the values you can create a hyperparameter search w/ later on, sweep search criteria, etc.
  • Sweeps can be initialized using either the W&B SDK in Python scripts/Jupyter Notebooks or the W&B CLI, both returning a sweep ID that includes the entity and project names.

    • This is also giving only semi useful/accurate info. The point that ChatGPT is trying to make is (I think) is that there is a different workflow between using the Python SDK within a notebook/Python script vs using a Python script + CLI.

@johndmulhausen johndmulhausen added the DO-NOT-MERGE For PRs that should not be merged yet label Mar 10, 2025
Copy link

cloudflare-workers-and-pages bot commented Mar 10, 2025

Deploying docs with  Cloudflare Pages  Cloudflare Pages

Latest commit: 9247365
Status: ✅  Deploy successful!
Preview URL: https://a6724656.docodile.pages.dev
Branch Preview URL: https://ai-generated-metadata.docodile.pages.dev

View logs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
DO-NOT-MERGE For PRs that should not be merged yet WIP
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants