This guide demonstrates how to use our read-only vector database powered by Weaviate with OpenAI integration for AI-enabled knowledge management. You'll need to provide your own OpenAI API key for generating embeddings and using generative AI features.
For complete documentation, visit the Weaviate Documentation.
REST Endpoint: https://larnd9rle1zwapsdcqw.c0.us-east1.gcp.weaviate.cloud
gRPC Endpoint: https://grpc-larnd9rle1zwapsdcqw.c0.us-east1.gcp.weaviate.cloud
ReadOnly API Key: Z0lQNXFDVnFnRmV6OTZRdF9KRVUwVzFEdjgxaHIvVFB3dkFyV0JqSnFxWFJYMWhqM0FTQ0NncU14S2hrPV92MjAw
See How to Connect for more details.
import weaviate
from weaviate.classes.init import Auth
import os
# Connect with read-only API key
client = weaviate.connect_to_weaviate_cloud(
cluster_url=os.environ["WEAVIATE_URL"],
auth_credentials=Auth.api_key(os.environ["WEAVIATE_API_KEY"]),
headers={
"X-OpenAI-Api-Key": os.environ["OPENAI_APIKEY"]
}
)See Search Guide for complete search documentation.
content = client.collections.get("Content")
response = content.query.near_text(
query="factory farming investigation techniques",
limit=3
)
for result in response.objects:
print(result.properties)response = content.query.hybrid(
query="effective animal advocacy strategies",
alpha=0.5, # Balance between vector and keyword search
limit=3
)from weaviate.classes.query import Filter
response = content.query.near_text(
query="sanctuary best practices",
filters=Filter.by_property("content_type").equal("report"),
limit=3
)See RAG Documentation for more details.
response = content.generate.near_text(
query="factory farming investigation techniques",
single_prompt="Summarize this content: {main_text}",
limit=3
)
print(response.generated)response = content.generate.near_text(
query="successful campaign strategies",
grouped_task="What patterns emerge from these successful campaigns?",
limit=5
)
print(response.generated)See Aggregations Documentation for more options.
response = content.aggregate.over_all(
group_by=weaviate.classes.aggregate.GroupByAggregate(
prop="content_type"
)
)
for group in response.groups:
print(f"Type: {group.grouped_by.value} Count: {group.total_count}")This schema defines the structure of our unified, open-access database designed to support animal advocacy. By integrating data from diverse sources, it supports AI applications such as retrieval-augmented generation (RAG) and model training, as well as general strategic planning and referencing.
The Content class functions as the primary centralized repository for materials related to veganism and animal advocacy, such as articles, reports, blogs, and transcripts. It supports AI applications like semantic search, knowledge graphs, and retrieval-augmented generation (RAG).
Structure:
- summary: text
- main_text: text
- content_type: text
- source_url: array of texts
- date: date
- scores: object
- good_for_animals: number
- cultural_sensitivity: number
- relevance: number
- insight: number
- trustworthiness: number
- emotional_impact: number
- rationality: number
- influence: number
- alignment: number
- predicted_performance: number
- average_score: number
- tags: array of text
- Use hybrid search for optimal retrieval
- Leverage filters to narrow results
- Consider RAG for dynamic content generation
- Use aggregations for movement insights
- Start with small result limits and increase as needed
- Test queries with different alpha values in hybrid search to find the optimal balance
- Retrieve and Re-rank Pattern: When quality matters more than speed, retrieve a larger set of results (e.g., 20-30 items) and then apply post-processing to select the best ones:
# Retrieve more results initially
response = content.query.near_text(
query="effective outreach strategies",
limit=20
)
# Sort by relevant scores and select top performers
sorted_results = sorted(
response.objects,
key=lambda x: x.properties['scores']['predicted_performance'],
reverse=True
)
# Pass only top 5 to your application
top_results = sorted_results[:5]- Multi-Score Ranking: Combine multiple scores for more nuanced ranking:
def calculate_composite_score(item):
scores = item.properties['scores']
# Weight different scores based on your use case
return (
scores['relevance'] * 0.3 +
scores['trustworthiness'] * 0.3 +
scores['insight'] * 0.2 +
scores['predicted_performance'] * 0.2
)
sorted_results = sorted(
response.objects,
key=calculate_composite_score,
reverse=True
)- Score-Based Filtering: Use score thresholds to ensure quality before RAG generation:
from weaviate.classes.query import Filter
# Only retrieve highly trustworthy and relevant content
response = content.generate.near_text(
query="campaign evaluation methods",
filters=(
Filter.by_property("scores.trustworthiness").greater_than(0.7) &
Filter.by_property("scores.relevance").greater_than(0.6)
),
grouped_task="Synthesize the most reliable evaluation methods",
limit=10
)- Time-Aware Retrieval: Combine date filters with scores for recency-weighted results:
from datetime import datetime, timedelta
# Get high-quality content from the last 6 months
recent_date = datetime.now() - timedelta(days=180)
response = content.generate.near_text(
query="emerging advocacy trends",
filters=(
Filter.by_property("date").greater_than(recent_date) &
Filter.by_property("scores.insight").greater_than(0.5)
),
grouped_task="What are the most promising recent trends?",
limit=15
)- Progressive Filtering: Start broad, then narrow based on initial results:
# First pass: broad retrieval
initial_response = content.query.near_text(
query="corporate campaigns",
limit=30
)
# Analyze score distribution
avg_trustworthiness = sum(
obj.properties['scores']['trustworthiness']
for obj in initial_response.objects
) / len(initial_response.objects)
# Second pass: refined search with dynamic threshold
refined_response = content.generate.near_text(
query="corporate campaigns",
filters=Filter.by_property("scores.trustworthiness").greater_than(avg_trustworthiness),
grouped_task="What makes these campaigns particularly effective?",
limit=10
)Note: Always refer to OpenAI's documentation and Weaviate's documentation for the most up-to-date information.