-
Notifications
You must be signed in to change notification settings - Fork 3.1k
Bug Fixes: Error Handling and Stability Improvements #108 #117
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Bug Fixes: Error Handling and Stability Improvements #108 #117
Conversation
…handling, added new custom exceptions for validation and processing errors, improved session cleanup logic, and refined repository name extraction. Updated logging format for better traceability.
…G application: Introduced dedicated functions for initializing session state and handling repository loading with improved error handling. Enhanced chat message processing and updated logging for better traceability.
WalkthroughThe code refactors the Streamlit application in Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant StreamlitApp
participant RepoProcessor
participant QueryEngine
User->>StreamlitApp: Enter GitHub URL & click Load
StreamlitApp->>RepoProcessor: validate_github_url(url)
RepoProcessor-->>StreamlitApp: ValidationError? (if invalid)
StreamlitApp->>RepoProcessor: process_with_gitingets(url) (with retry)
RepoProcessor-->>StreamlitApp: ProcessingError? (on failure)
RepoProcessor->>QueryEngine: create_query_engine(content_path, repo_name)
QueryEngine-->>StreamlitApp: QueryEngineError? (on failure)
StreamlitApp->>SessionState: Update with repo & engine
User->>StreamlitApp: Enter chat prompt
StreamlitApp->>QueryEngine: Query with prompt
QueryEngine-->>StreamlitApp: Return response
StreamlitApp->>SessionState: Append chat history
User->>StreamlitApp: Click Reset
StreamlitApp->>SessionState: cleanup_session()
Assessment against linked issues
Possibly related PRs
Suggested reviewers
Poem
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
🧹 Nitpick comments (3)
github-rag/app.py (3)
52-69
: Propagate the root cause when exhausting retries
raise last_exception
drops the causal chain. Wrap withfrom last_exception
(Ruff B904) to preserve traceback context:- raise last_exception + raise last_exception from NoneThis small change greatly improves debuggability.
118-129
: Re-raise with cause insideprocess_with_gitingets
Same Ruff B904 concern – chain the original exception so callers can differentiate between ingest-failures and decorator retries.
- except Exception as e: - logger.error(f"Error processing repository: {str(e)}") - raise ProcessingError(f"Failed to process repository: {str(e)}") + except Exception as e: + logger.error("Error processing repository: %s", e) + raise ProcessingError("Failed to process repository") from e🧰 Tools
🪛 Ruff (0.8.2)
128-128: Within an
except
clause, raise exceptions withraise ... from err
orraise ... from None
to distinguish them from errors in exception handling(B904)
170-186
: Session init: preserve cause on failureMinor, but consistent with earlier comments:
- except Exception as e: - logger.error(f"Error initializing session state: {str(e)}") - raise SessionError("Failed to initialize session state") + except Exception as e: + logger.error("Error initializing session state: %s", e) + raise SessionError("Failed to initialize session state") from e🧰 Tools
🪛 Ruff (0.8.2)
186-186: Within an
except
clause, raise exceptions withraise ... from err
orraise ... from None
to distinguish them from errors in exception handling(B904)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
github-rag/app.py
(3 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
github-rag/app.py (1)
github-rag/app_local.py (2)
reset_chat
(31-34)process_with_gitingets
(36-39)
🪛 Ruff (0.8.2)
github-rag/app.py
89-89: Within an except
clause, raise exceptions with raise ... from err
or raise ... from None
to distinguish them from errors in exception handling
(B904)
105-105: Within an except
clause, raise exceptions with raise ... from err
or raise ... from None
to distinguish them from errors in exception handling
(B904)
116-116: Within an except
clause, raise exceptions with raise ... from err
or raise ... from None
to distinguish them from errors in exception handling
(B904)
128-128: Within an except
clause, raise exceptions with raise ... from err
or raise ... from None
to distinguish them from errors in exception handling
(B904)
168-168: Within an except
clause, raise exceptions with raise ... from err
or raise ... from None
to distinguish them from errors in exception handling
(B904)
186-186: Within an except
clause, raise exceptions with raise ... from err
or raise ... from None
to distinguish them from errors in exception handling
(B904)
🔇 Additional comments (2)
github-rag/app.py (2)
130-168
: Streaming responses may not stringify cleanly
streaming=True
returns a generator /StreamingResponse
object.
Downstream (handle_chat_message
) concatenates it directly into a string which can yield"<generator object …>"
.Consider:
- query_engine = index.as_query_engine(streaming=True) + query_engine = index.as_query_engine(streaming=False)or consume the stream before storing.
🧰 Tools
🪛 Ruff (0.8.2)
168-168: Within an
except
clause, raise exceptions withraise ... from err
orraise ... from None
to distinguish them from errors in exception handling(B904)
252-257
: Ensure response is serialisableIf
query_engine.query()
returns an object, wrap withstr()
to avoid “Object of type … is not JSON serialisable” errors when Streamlit serialises session state.- full_response = f"Repository: {st.session_state.current_repo}\n\n{response}" + full_response = f"Repository: {st.session_state.current_repo}\n\n{str(response)}"
def reset_chat(): | ||
"""Reset chat session and clean up resources""" | ||
try: | ||
st.session_state.messages = [] | ||
st.session_state.context = None | ||
gc.collect() | ||
cleanup_session() | ||
logger.info("Chat session reset successfully") | ||
except Exception as e: | ||
logger.error(f"Error resetting chat: {str(e)}") | ||
raise GitHubRAGError("Failed to reset chat session") | ||
raise SessionError("Failed to reset chat session") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reset_chat
should also reset the loaded engine
After a reset the previous query_engine
lingers in memory, allowing questions against a supposedly “cleared” session.
- st.session_state.context = None
+ st.session_state.context = None
+ st.session_state.query_engine = None
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def reset_chat(): | |
"""Reset chat session and clean up resources""" | |
try: | |
st.session_state.messages = [] | |
st.session_state.context = None | |
gc.collect() | |
cleanup_session() | |
logger.info("Chat session reset successfully") | |
except Exception as e: | |
logger.error(f"Error resetting chat: {str(e)}") | |
raise GitHubRAGError("Failed to reset chat session") | |
raise SessionError("Failed to reset chat session") | |
def reset_chat(): | |
"""Reset chat session and clean up resources""" | |
try: | |
st.session_state.messages = [] | |
st.session_state.context = None | |
st.session_state.query_engine = None | |
cleanup_session() | |
logger.info("Chat session reset successfully") | |
except Exception as e: | |
logger.error(f"Error resetting chat: {str(e)}") | |
raise SessionError("Failed to reset chat session") |
🧰 Tools
🪛 Ruff (0.8.2)
116-116: Within an except
clause, raise exceptions with raise ... from err
or raise ... from None
to distinguish them from errors in exception handling
(B904)
def get_repo_name(url: str) -> str: | ||
"""Extract repository name from URL""" | ||
try: | ||
return url.split('/')[-1].replace('.git', '') | ||
parts = url.split('/') | ||
if len(parts) < 5: | ||
raise ValidationError("Invalid repository URL format") | ||
repo_name = parts[-1].replace('.git', '') | ||
if not repo_name: | ||
raise ValidationError("Could not extract repository name from URL") | ||
return repo_name | ||
except Exception as e: | ||
raise ValidationError(f"Failed to extract repository name: {str(e)}") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handle trailing slashes and preserve exception context in get_repo_name
- URLs ending with a trailing slash (
…/repo/
) trigger a false “invalid format” error becauseparts[-1]
is empty. except Exception as e:
re-raises withoutfrom e
, losing the stack.
- repo_name = parts[-1].replace('.git', '')
+ repo_segment = parts[-1] or parts[-2] # supports trailing “/”
+ repo_name = repo_segment.replace('.git', '')
@@
- except Exception as e:
- raise ValidationError(f"Failed to extract repository name: {str(e)}")
+ except Exception as e:
+ raise ValidationError(
+ f"Failed to extract repository name: {e}"
+ ) from e
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def get_repo_name(url: str) -> str: | |
"""Extract repository name from URL""" | |
try: | |
return url.split('/')[-1].replace('.git', '') | |
parts = url.split('/') | |
if len(parts) < 5: | |
raise ValidationError("Invalid repository URL format") | |
repo_name = parts[-1].replace('.git', '') | |
if not repo_name: | |
raise ValidationError("Could not extract repository name from URL") | |
return repo_name | |
except Exception as e: | |
raise ValidationError(f"Failed to extract repository name: {str(e)}") | |
def get_repo_name(url: str) -> str: | |
"""Extract repository name from URL""" | |
try: | |
parts = url.split('/') | |
if len(parts) < 5: | |
raise ValidationError("Invalid repository URL format") | |
- repo_name = parts[-1].replace('.git', '') | |
+ repo_segment = parts[-1] or parts[-2] # supports trailing “/” | |
+ repo_name = repo_segment.replace('.git', '') | |
if not repo_name: | |
raise ValidationError("Could not extract repository name from URL") | |
return repo_name | |
- except Exception as e: | |
- raise ValidationError(f"Failed to extract repository name: {str(e)}") | |
+ except Exception as e: | |
+ raise ValidationError( | |
+ f"Failed to extract repository name: {e}" | |
+ ) from e |
🧰 Tools
🪛 Ruff (0.8.2)
89-89: Within an except
clause, raise exceptions with raise ... from err
or raise ... from None
to distinguish them from errors in exception handling
(B904)
logging.basicConfig( | ||
level=logging.INFO, | ||
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s' | ||
) | ||
logger = logging.getLogger(__name__) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Avoid duplicating log handlers on every Streamlit rerun
logging.basicConfig()
gets executed on every Streamlit script rerun, which may attach multiple handlers and cause duplicated log lines.
Guard the configuration or add handlers only once:
-logging.basicConfig(
- level=logging.INFO,
- format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
-)
-if not logger.handlers:
- logger = logging.getLogger(__name__)
+logger = logging.getLogger(__name__)
+if not logger.handlers:
+ logging.basicConfig(
+ level=logging.INFO,
+ format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+ )
Committable suggestion skipped: line range outside the PR's diff.
with st.spinner(f"Processing repository {repo_name}..."): | ||
summary, tree, content = process_with_gitingets(github_url) | ||
|
||
try: | ||
repo_name = get_repo_name(github_url) | ||
file_key = f"{session_id}-{repo_name}" | ||
query_engine = st.session_state.file_cache.get(file_key) | ||
# Create temporary directory for repository content | ||
with tempfile.TemporaryDirectory() as temp_dir: | ||
content_path = os.path.join(temp_dir, repo_name) | ||
os.makedirs(content_path, exist_ok=True) | ||
|
||
if query_engine is None: | ||
raise GitHubRAGError("Please load a repository first!") | ||
# Save repository content | ||
for file_path, file_content in content.items(): | ||
file_dir = os.path.dirname(os.path.join(content_path, file_path)) | ||
os.makedirs(file_dir, exist_ok=True) | ||
with open(os.path.join(content_path, file_path), 'w', encoding='utf-8') as f: | ||
f.write(file_content) | ||
|
||
response = query_engine.query(prompt) | ||
# Create query engine | ||
query_engine = create_query_engine(content_path, repo_name) | ||
|
||
if hasattr(response, 'response_gen'): | ||
for chunk in response.response_gen: | ||
if isinstance(chunk, str): | ||
full_response += chunk | ||
message_placeholder.markdown(full_response + "▌") | ||
else: | ||
full_response = str(response) | ||
message_placeholder.markdown(full_response) | ||
|
||
message_placeholder.markdown(full_response) | ||
st.session_state.messages.append({"role": "assistant", "content": full_response}) | ||
# Update session state | ||
st.session_state.query_engine = query_engine | ||
st.session_state.current_repo = repo_name | ||
st.session_state.context = { | ||
'summary': summary, | ||
'tree': tree, | ||
'content': content | ||
} | ||
|
||
except GitHubRAGError as e: | ||
st.error(str(e)) | ||
logger.error(f"Error in chat processing: {str(e)}") | ||
except Exception as e: | ||
st.error("An unexpected error occurred while processing your query") | ||
logger.error(f"Unexpected error in chat: {str(e)}") | ||
st.success(f"Successfully loaded repository: {repo_name}") | ||
logger.info(f"Repository {repo_name} loaded successfully") | ||
|
||
except ValidationError as e: | ||
st.error(f"Validation error: {str(e)}") | ||
logger.warning(f"Validation error for URL {github_url}: {str(e)}") | ||
except ProcessingError as e: | ||
st.error(f"Processing error: {str(e)}") | ||
logger.error(f"Error processing repository {github_url}: {str(e)}") | ||
except QueryEngineError as e: | ||
st.error(f"Query engine error: {str(e)}") | ||
logger.error(f"Error creating query engine for {github_url}: {str(e)}") | ||
except Exception as e: | ||
st.error(f"Unexpected error: {str(e)}") | ||
logger.error(f"Unexpected error processing {github_url}: {str(e)}") | ||
finally: | ||
cleanup_session() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Path-traversal & binary-file issues when writing repo content
- A malicious repo file path such as
../../etc/passwd
escapestemp_dir
. - Opening files with
'w'
fails on binary blobs.
- for file_path, file_content in content.items():
- file_dir = os.path.dirname(os.path.join(content_path, file_path))
+ for file_path, file_content in content.items():
+ safe_path = os.path.normpath(os.path.join(content_path, file_path))
+ if not safe_path.startswith(content_path):
+ logger.warning("Skipping suspicious path: %s", file_path)
+ continue
+ file_dir = os.path.dirname(safe_path)
@@
- with open(os.path.join(content_path, file_path), 'w', encoding='utf-8') as f:
- f.write(file_content)
+ mode, kwargs = ('wb', {}) if isinstance(file_content, (bytes, bytearray)) else ('w', {'encoding': 'utf-8'})
+ with open(safe_path, mode, **kwargs) as f:
+ f.write(file_content)
This mitigates a critical security flaw and handles binary assets gracefully.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
with st.spinner(f"Processing repository {repo_name}..."): | |
summary, tree, content = process_with_gitingets(github_url) | |
try: | |
repo_name = get_repo_name(github_url) | |
file_key = f"{session_id}-{repo_name}" | |
query_engine = st.session_state.file_cache.get(file_key) | |
# Create temporary directory for repository content | |
with tempfile.TemporaryDirectory() as temp_dir: | |
content_path = os.path.join(temp_dir, repo_name) | |
os.makedirs(content_path, exist_ok=True) | |
if query_engine is None: | |
raise GitHubRAGError("Please load a repository first!") | |
# Save repository content | |
for file_path, file_content in content.items(): | |
file_dir = os.path.dirname(os.path.join(content_path, file_path)) | |
os.makedirs(file_dir, exist_ok=True) | |
with open(os.path.join(content_path, file_path), 'w', encoding='utf-8') as f: | |
f.write(file_content) | |
response = query_engine.query(prompt) | |
# Create query engine | |
query_engine = create_query_engine(content_path, repo_name) | |
if hasattr(response, 'response_gen'): | |
for chunk in response.response_gen: | |
if isinstance(chunk, str): | |
full_response += chunk | |
message_placeholder.markdown(full_response + "▌") | |
else: | |
full_response = str(response) | |
message_placeholder.markdown(full_response) | |
message_placeholder.markdown(full_response) | |
st.session_state.messages.append({"role": "assistant", "content": full_response}) | |
# Update session state | |
st.session_state.query_engine = query_engine | |
st.session_state.current_repo = repo_name | |
st.session_state.context = { | |
'summary': summary, | |
'tree': tree, | |
'content': content | |
} | |
except GitHubRAGError as e: | |
st.error(str(e)) | |
logger.error(f"Error in chat processing: {str(e)}") | |
except Exception as e: | |
st.error("An unexpected error occurred while processing your query") | |
logger.error(f"Unexpected error in chat: {str(e)}") | |
st.success(f"Successfully loaded repository: {repo_name}") | |
logger.info(f"Repository {repo_name} loaded successfully") | |
except ValidationError as e: | |
st.error(f"Validation error: {str(e)}") | |
logger.warning(f"Validation error for URL {github_url}: {str(e)}") | |
except ProcessingError as e: | |
st.error(f"Processing error: {str(e)}") | |
logger.error(f"Error processing repository {github_url}: {str(e)}") | |
except QueryEngineError as e: | |
st.error(f"Query engine error: {str(e)}") | |
logger.error(f"Error creating query engine for {github_url}: {str(e)}") | |
except Exception as e: | |
st.error(f"Unexpected error: {str(e)}") | |
logger.error(f"Unexpected error processing {github_url}: {str(e)}") | |
finally: | |
cleanup_session() | |
with st.spinner(f"Processing repository {repo_name}..."): | |
summary, tree, content = process_with_gitingets(github_url) | |
# Create temporary directory for repository content | |
with tempfile.TemporaryDirectory() as temp_dir: | |
content_path = os.path.join(temp_dir, repo_name) | |
os.makedirs(content_path, exist_ok=True) | |
# Save repository content | |
for file_path, file_content in content.items(): | |
safe_path = os.path.normpath(os.path.join(content_path, file_path)) | |
if not safe_path.startswith(content_path): | |
logger.warning("Skipping suspicious path: %s", file_path) | |
continue | |
file_dir = os.path.dirname(safe_path) | |
os.makedirs(file_dir, exist_ok=True) | |
mode, kwargs = ( | |
("wb", {}) | |
if isinstance(file_content, (bytes, bytearray)) | |
else ("w", {"encoding": "utf-8"}) | |
) | |
with open(safe_path, mode, **kwargs) as f: | |
f.write(file_content) | |
# Create query engine | |
query_engine = create_query_engine(content_path, repo_name) | |
# Update session state | |
st.session_state.query_engine = query_engine | |
st.session_state.current_repo = repo_name | |
st.session_state.context = { | |
"summary": summary, | |
"tree": tree, | |
"content": content, | |
} | |
st.success(f"Successfully loaded repository: {repo_name}") | |
logger.info(f"Repository {repo_name} loaded successfully") | |
except ValidationError as e: | |
… | |
finally: | |
cleanup_session() |
def cleanup_session(): | ||
"""Clean up session resources""" | ||
try: | ||
if hasattr(st.session_state, 'file_cache'): | ||
for key, value in st.session_state.file_cache.items(): | ||
try: | ||
del value | ||
except Exception as e: | ||
logger.warning(f"Failed to cleanup cache entry {key}: {str(e)}") | ||
st.session_state.file_cache.clear() | ||
gc.collect() | ||
logger.info("Session cleanup completed successfully") | ||
except Exception as e: | ||
raise GitHubRAGError(f"Invalid repository URL: {str(e)}") | ||
logger.error(f"Error during session cleanup: {str(e)}") | ||
raise SessionError(f"Failed to cleanup session: {str(e)}") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
cleanup_session
does not actually remove cached objects
del value
only deletes the local reference, not the entry stored in st.session_state.file_cache
.
Also adopt from e
per Ruff B904.
- if hasattr(st.session_state, 'file_cache'):
- for key, value in st.session_state.file_cache.items():
- try:
- del value
- except Exception as e:
- logger.warning(f"Failed to cleanup cache entry {key}: {str(e)}")
- st.session_state.file_cache.clear()
+ cache = getattr(st.session_state, "file_cache", {})
+ for key in list(cache.keys()):
+ try:
+ del cache[key]
+ except Exception as e:
+ logger.warning("Failed to cleanup cache entry %s: %s", key, e)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def cleanup_session(): | |
"""Clean up session resources""" | |
try: | |
if hasattr(st.session_state, 'file_cache'): | |
for key, value in st.session_state.file_cache.items(): | |
try: | |
del value | |
except Exception as e: | |
logger.warning(f"Failed to cleanup cache entry {key}: {str(e)}") | |
st.session_state.file_cache.clear() | |
gc.collect() | |
logger.info("Session cleanup completed successfully") | |
except Exception as e: | |
raise GitHubRAGError(f"Invalid repository URL: {str(e)}") | |
logger.error(f"Error during session cleanup: {str(e)}") | |
raise SessionError(f"Failed to cleanup session: {str(e)}") | |
def cleanup_session(): | |
"""Clean up session resources""" | |
try: | |
- if hasattr(st.session_state, 'file_cache'): | |
- for key, value in st.session_state.file_cache.items(): | |
- try: | |
- del value | |
- except Exception as e: | |
- logger.warning(f"Failed to cleanup cache entry {key}: {str(e)}") | |
- st.session_state.file_cache.clear() | |
+ cache = getattr(st.session_state, "file_cache", {}) | |
+ for key in list(cache.keys()): | |
+ try: | |
+ del cache[key] | |
+ except Exception as e: | |
+ logger.warning("Failed to cleanup cache entry %s: %s", key, e) | |
gc.collect() | |
logger.info("Session cleanup completed successfully") | |
except Exception as e: | |
- logger.error(f"Error during session cleanup: {str(e)}") | |
- raise SessionError(f"Failed to cleanup session: {str(e)}") | |
+ logger.error(f"Error during session cleanup: {str(e)}") | |
+ raise SessionError(f"Failed to cleanup session: {str(e)}") from e |
🧰 Tools
🪛 Ruff (0.8.2)
105-105: Within an except
clause, raise exceptions with raise ... from err
or raise ... from None
to distinguish them from errors in exception handling
(B904)
This PR addresses issue #108 by improving the application's error handling and session state management with the following changes:
Error Handling Improvements
Session State Management
Code Organization
These changes make the application more robust, easier to maintain, and provide better feedback to users when errors occur.
Closes #108
Summary by CodeRabbit
New Features
Refactor
Bug Fixes