feat: add NotepadComponent #5845
base: main
…rsion
- Added support for converting DataFrame outputs to a dictionary format when processing tool events.
- Updated the output assignment logic to handle both DataFrame and non-DataFrame outputs, improving flexibility in data handling.
These changes enhance the functionality of the tool event handling by accommodating DataFrame outputs, ensuring better integration with data processing workflows.

- Introduced NotepadComponent to store and manage values with operations to add, remove, and edit entries.
- Implemented input handling for value, operation type, and position, enhancing user interaction.
- Utilized DataFrame for structured data management, ensuring robust context handling and data integrity.
This addition improves the functionality of the application by providing a dedicated component for value management, facilitating better data organization and manipulation.

- Introduced a new test suite for NotepadComponent, covering various operations including adding, removing, and editing values.
- Implemented tests for default and specific positions when adding values, ensuring correct behavior in different scenarios.
- Validated persistence of values between operations and confirmed correct handling of empty notepad states.
- Enhanced test coverage for edge cases, improving reliability and robustness of the NotepadComponent functionality.

…gement
- Added a new Protocol `DfOperation` to define the interface for notepad operations, ensuring consistency across add, remove, and edit functionalities.
- Implemented `add_value`, `remove_value`, and `edit_value` functions to manage values in the notepad, allowing for insertion, deletion, and modification at specified positions or by value.
- Enhanced the `NotepadComponent` to utilize these operations, improving the management of notepad data within the component's context.
- Introduced methods for initializing and retrieving the current notepad, ensuring robust context handling and data integrity.
- Updated the `process_and_get_notepad` method to streamline operation execution and error handling, enhancing overall reliability.
These changes significantly improve the functionality and usability of the NotepadComponent, providing a more structured approach to managing notepad entries.

…unction
- Enhanced the `remove_value` function to validate the `position` parameter, ensuring it is an integer and within the bounds of the notepad DataFrame.
- Added specific `ValueError` messages for invalid position inputs, improving user feedback.
- Updated error handling in the `NotepadComponent` to provide clearer context in exception messages when operations fail.
These changes enhance the robustness and usability of the notepad operations, ensuring better error management and user experience.

- Added comprehensive tests for adding, removing, and editing values in the NotepadComponent, covering edge cases such as out-of-range positions and invalid operations.
- Implemented tests to ensure correct behavior when handling negative positions and positions beyond the notepad length, confirming that values are appended correctly.
- Validated error handling for invalid removal operations, ensuring appropriate exceptions are raised and the notepad state remains unchanged when necessary.
- Enhanced test coverage for multiple notepad instances, verifying that each notepad maintains its own content independently.
These changes improve the reliability and robustness of the NotepadComponent by ensuring thorough testing of its functionalities.
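The operation interface described in the commits above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: the `DfOperation` signature, the single `"value"` column, the `OPERATIONS` table, and the `apply_operation` helper are all assumptions made for the example, and the fallback behavior of `edit_value` (last row when the position is missing or out of range, no-op on an empty notepad) is guessed from the test descriptions rather than confirmed. Removal is left out here to keep the sketch short.

```python
from __future__ import annotations

from typing import Protocol

import pandas as pd


class DfOperation(Protocol):
    """Shape shared by the notepad operations (this signature is an assumption)."""

    def __call__(self, notepad: pd.DataFrame, value: str, position: int | None = None) -> pd.DataFrame: ...


def add_value(notepad: pd.DataFrame, value: str, position: int | None = None) -> pd.DataFrame:
    """Insert at `position`; append when it is missing, negative, or past the end."""
    values = notepad["value"].tolist()
    if position is None or not (0 <= position < len(values)):
        values.append(value)
    else:
        values.insert(position, value)
    return pd.DataFrame({"value": values})


def edit_value(notepad: pd.DataFrame, value: str, position: int | None = None) -> pd.DataFrame:
    """Overwrite the row at `position`, falling back to the last row (assumed behavior)."""
    values = notepad["value"].tolist()
    if not values:
        # Assumption: editing an empty notepad leaves it unchanged.
        return notepad.copy()
    in_range = position is not None and 0 <= position < len(values)
    index = position if in_range else len(values) - 1
    values[index] = value
    return pd.DataFrame({"value": values})


# Hypothetical dispatch table; every entry satisfies the DfOperation protocol.
OPERATIONS: dict[str, DfOperation] = {"add": add_value, "edit": edit_value}


def apply_operation(notepad: pd.DataFrame, operation: str, value: str, position: int | None = None) -> pd.DataFrame:
    """Look up the operation by name and fail loudly on unknown names."""
    try:
        handler = OPERATIONS[operation]
    except KeyError:
        msg = f"Unsupported operation: {operation!r}"
        raise ValueError(msg) from None
    return handler(notepad, value, position)
```

Because each function returns a new DataFrame instead of mutating its input, the caller decides when to store the result back into the component's context.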
notepad_length = notepad.shape[0]

# If position is provided, validate it's within bounds
if position is not None:
    if not isinstance(position, int):
        msg = f"Position must be an integer, got {type(position)}"
        raise ValueError(msg)
    if position < 0 or position >= notepad_length:
        msg = f"Position {position} is out of bounds for notepad of length {notepad_length}"
        raise ValueError(msg)
    # Remove at valid position
    return notepad.drop(notepad.index[position]).reset_index(drop=True)
Suggested change:
notepad_length = len(notepad)
# If position is provided, remove by position after validation
if not (0 <= position < notepad_length):
    raise ValueError(f"Position {position} is out of bounds for notepad of length {notepad_length}")
return notepad[notepad["value"] != value].reset_index(drop=True)
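To see how the position check and the by-value fallback fit together, here is a self-contained sketch that combines the validation from the original snippet with the chained comparison from the suggestion. The function name `remove_value_sketch` and its signature are hypothetical, and the notepad is assumed to be a DataFrame with a single `"value"` column; the exact code merged in the PR may differ.

```python
from __future__ import annotations

import pandas as pd


def remove_value_sketch(notepad: pd.DataFrame, value: str, position: int | None = None) -> pd.DataFrame:
    """Hypothetical stand-in for remove_value, written only for illustration."""
    notepad_length = len(notepad)
    if position is not None:
        if not isinstance(position, int):
            msg = f"Position must be an integer, got {type(position)}"
            raise ValueError(msg)
        # Chained comparison covers both the negative and the too-large case.
        if not (0 <= position < notepad_length):
            msg = f"Position {position} is out of bounds for notepad of length {notepad_length}"
            raise ValueError(msg)
        # Drop the row by its index label, then renumber the index from 0.
        return notepad.drop(notepad.index[position]).reset_index(drop=True)
    # No position given: remove every row whose "value" equals `value`.
    return notepad[notepad["value"] != value].reset_index(drop=True)


df = pd.DataFrame({"value": ["a", "b", "c"]})
by_position = remove_value_sketch(df, value="", position=1)
by_value = remove_value_sketch(df, value="b")
```

Note that `drop` works on index labels, so this approach relies on the labels being unique; with the default integer index that `reset_index(drop=True)` restores, that holds.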
⚡️ Codeflash found optimizations for this PR
📄 39% (0.39x) speedup for

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 4 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | undefined |
🌀 Generated Regression Tests Details
import pandas as pd
# imports
import pytest  # used for our unit tests
from langflow.components.helpers.notepad import remove_value


# function to test
class DataFrame(pd.DataFrame):
    """A pandas DataFrame subclass specialized for handling collections of Data objects.

    This class extends pandas.DataFrame to provide seamless integration between
    Langflow's Data objects and pandas' powerful data manipulation capabilities.

    Args:
        data: Input data in various formats:
            - List[Data]: List of Data objects
            - List[Dict]: List of dictionaries
            - Dict: Dictionary of arrays/lists
            - pandas.DataFrame: Existing DataFrame
            - Any format supported by pandas.DataFrame
        **kwargs: Additional arguments passed to pandas.DataFrame constructor

    Examples:
        >>> # From Data objects
        >>> dataset = DataFrame([Data(data={"name": "John"}), Data(data={"name": "Jane"})])
        >>> # From dictionaries
        >>> dataset = DataFrame([{"name": "John"}, {"name": "Jane"}])
        >>> # From dictionary of lists
        >>> dataset = DataFrame({"name": ["John", "Jane"], "age": [30, 25]})
    """

    def __init__(self, data: list[dict] | list["Data"] | pd.DataFrame | None = None, **kwargs):
        if data is None:
            super().__init__(**kwargs)
            return
        if isinstance(data, list):
            if all(isinstance(x, Data) for x in data):
                data = [d.data for d in data if hasattr(d, "data")]
            elif not all(isinstance(x, dict) for x in data):
                msg = "List items must be either all Data objects or all dictionaries"
                raise ValueError(msg)
            kwargs["data"] = data
        elif isinstance(data, dict | pd.DataFrame):
            kwargs["data"] = data
        super().__init__(**kwargs)

    def to_data_list(self) -> list["Data"]:
        """Converts the DataFrame back to a list of Data objects."""
        list_of_dicts = self.to_dict(orient="records")
        return [Data(data=row) for row in list_of_dicts]

    def add_row(self, data: dict | "Data") -> "DataFrame":
        """Adds a single row to the dataset.

        Args:
            data: Either a Data object or a dictionary to add as a new row

        Returns:
            DataFrame: A new DataFrame with the added row

        Example:
            >>> dataset = DataFrame([{"name": "John"}])
            >>> dataset = dataset.add_row({"name": "Jane"})
        """
        if isinstance(data, Data):
            data = data.data
        new_df = self._constructor([data])
        return pd.concat([self, new_df], ignore_index=True).pipe(self._constructor)

    def add_rows(self, data: list[dict | "Data"]) -> "DataFrame":
        """Adds multiple rows to the dataset.

        Args:
            data: List of Data objects or dictionaries to add as new rows

        Returns:
            DataFrame: A new DataFrame with the added rows
        """
        processed_data = []
        for item in data:
            if isinstance(item, Data):
                processed_data.append(item.data)
            else:
                processed_data.append(item)
        new_df = self._constructor(processed_data)
        return pd.concat([self, new_df], ignore_index=True).pipe(self._constructor)

    @property
    def _constructor(self):
        def _c(*args, **kwargs):
            return DataFrame(*args, **kwargs).__finalize__(self)

        return _c

    def __bool__(self):
        """Truth value testing for the DataFrame.

        Returns True if the DataFrame has at least one row, False otherwise.
        """
        return not self.empty


from langflow.components.helpers.notepad import remove_value


# unit tests
def test_edge_invalid_position():
    # Test removing by an invalid position (negative)
    df = DataFrame({"value": ["a", "b", "c"]})
    with pytest.raises(ValueError):
        remove_value(df, value="", position=-1)
    # Test removing by an invalid position (out of bounds)
    with pytest.raises(ValueError):
        remove_value(df, value="", position=3)

import pandas as pd
# imports
import pytest  # used for our unit tests
from langflow.components.helpers.notepad import remove_value


# function to test
class DataFrame(pd.DataFrame):
    """A pandas DataFrame subclass specialized for handling collections of Data objects.

    This class extends pandas.DataFrame to provide seamless integration between
    Langflow's Data objects and pandas' powerful data manipulation capabilities.

    Args:
        data: Input data in various formats:
            - List[Data]: List of Data objects
            - List[Dict]: List of dictionaries
            - Dict: Dictionary of arrays/lists
            - pandas.DataFrame: Existing DataFrame
            - Any format supported by pandas.DataFrame
        **kwargs: Additional arguments passed to pandas.DataFrame constructor

    Examples:
        >>> # From Data objects
        >>> dataset = DataFrame([Data(data={"name": "John"}), Data(data={"name": "Jane"})])
        >>> # From dictionaries
        >>> dataset = DataFrame([{"name": "John"}, {"name": "Jane"}])
        >>> # From dictionary of lists
        >>> dataset = DataFrame({"name": ["John", "Jane"], "age": [30, 25]})
    """

    def __init__(self, data: list[dict] | pd.DataFrame | None = None, **kwargs):
        if data is None:
            super().__init__(**kwargs)
            return
        if isinstance(data, list):
            if all(isinstance(x, dict) for x in data):
                kwargs["data"] = data
            else:
                msg = "List items must be all dictionaries"
                raise ValueError(msg)
        elif isinstance(data, (dict, pd.DataFrame)):
            kwargs["data"] = data
        super().__init__(**kwargs)

    def to_data_list(self) -> list:
        """Converts the DataFrame back to a list of Data objects."""
        list_of_dicts = self.to_dict(orient="records")
        return [Data(data=row) for row in list_of_dicts]

    def add_row(self, data: dict) -> "DataFrame":
        """Adds a single row to the dataset.

        Args:
            data: Either a Data object or a dictionary to add as a new row

        Returns:
            DataFrame: A new DataFrame with the added row

        Example:
            >>> dataset = DataFrame([{"name": "John"}])
            >>> dataset = dataset.add_row({"name": "Jane"})
        """
        new_df = self._constructor([data])
        return pd.concat([self, new_df], ignore_index=True)

    def add_rows(self, data: list[dict]) -> "DataFrame":
        """Adds multiple rows to the dataset.

        Args:
            data: List of Data objects or dictionaries to add as new rows

        Returns:
            DataFrame: A new DataFrame with the added rows
        """
        new_df = self._constructor(data)
        return pd.concat([self, new_df], ignore_index=True)

    @property
    def _constructor(self):
        def _c(*args, **kwargs):
            return DataFrame(*args, **kwargs).__finalize__(self)

        return _c

    def __bool__(self):
        """Truth value testing for the DataFrame.

        Returns True if the DataFrame has at least one row, False otherwise.
        """
        return not self.empty


from langflow.components.helpers.notepad import remove_value


# unit tests
def test_invalid_position():
    # Edge case: Invalid position
    df = DataFrame({"value": ["a", "b", "c"]})
    with pytest.raises(ValueError):
        remove_value(df, value="", position=-1)
    with pytest.raises(ValueError):
        remove_value(df, value="", position=3)
CodSpeed Performance Report
Merging #5845 will not alter performance
⚡️ Codeflash found optimizations for this PR
📄 240% (2.40x) speedup for

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 21 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | undefined |
🌀 Generated Regression Tests Details
import pandas as pd
# imports
import pytest  # used for our unit tests
from langflow.components.helpers.notepad import edit_value


# function to test
class DataFrame(pd.DataFrame):
    """A pandas DataFrame subclass specialized for handling collections of Data objects.

    This class extends pandas.DataFrame to provide seamless integration between
    Langflow's Data objects and pandas' powerful data manipulation capabilities.

    Args:
        data: Input data in various formats:
            - List[Data]: List of Data objects
            - List[Dict]: List of dictionaries
            - Dict: Dictionary of arrays/lists
            - pandas.DataFrame: Existing DataFrame
            - Any format supported by pandas.DataFrame
        **kwargs: Additional arguments passed to pandas.DataFrame constructor

    Examples:
        >>> # From Data objects
        >>> dataset = DataFrame([Data(data={"name": "John"}), Data(data={"name": "Jane"})])
        >>> # From dictionaries
        >>> dataset = DataFrame([{"name": "John"}, {"name": "Jane"}])
        >>> # From dictionary of lists
        >>> dataset = DataFrame({"name": ["John", "Jane"], "age": [30, 25]})
    """

    def __init__(self, data: list[dict] | pd.DataFrame | None = None, **kwargs):
        if data is None:
            super().__init__(**kwargs)
            return
        if isinstance(data, list):
            if all(isinstance(x, dict) for x in data):
                kwargs["data"] = data
            else:
                msg = "List items must be all dictionaries"
                raise ValueError(msg)
        elif isinstance(data, dict | pd.DataFrame):
            kwargs["data"] = data
        super().__init__(**kwargs)

    def to_data_list(self) -> list:
        """Converts the DataFrame back to a list of dictionaries."""
        return self.to_dict(orient="records")

    def add_row(self, data: dict) -> "DataFrame":
        """Adds a single row to the dataset.

        Args:
            data: A dictionary to add as a new row

        Returns:
            DataFrame: A new DataFrame with the added row

        Example:
            >>> dataset = DataFrame([{"name": "John"}])
            >>> dataset = dataset.add_row({"name": "Jane"})
        """
        new_df = self._constructor([data])
        return pd.concat([self, new_df], ignore_index=True)

    def add_rows(self, data: list[dict]) -> "DataFrame":
        """Adds multiple rows to the dataset.

        Args:
            data: List of dictionaries to add as new rows

        Returns:
            DataFrame: A new DataFrame with the added rows
        """
        new_df = self._constructor(data)
        return pd.concat([self, new_df], ignore_index=True)

    @property
    def _constructor(self):
        def _c(*args, **kwargs):
            return DataFrame(*args, **kwargs).__finalize__(self)

        return _c

    def __bool__(self):
        """Truth value testing for the DataFrame.

        Returns True if the DataFrame has at least one row, False otherwise.
        """
        return not self.empty


from langflow.components.helpers.notepad import edit_value


# unit tests
def test_edit_value_basic():
    # Test editing the first row
    df = DataFrame([{"value": "old1"}, {"value": "old2"}])
    codeflash_output = edit_value(df, "new", 0)
    # Test editing a middle row
    df = DataFrame([{"value": "old1"}, {"value": "old2"}, {"value": "old3"}])
    codeflash_output = edit_value(df, "new", 1)
    # Test editing the last row
    df = DataFrame([{"value": "old1"}, {"value": "old2"}])
    codeflash_output = edit_value(df, "new", 1)
    # Test editing the last row when no position is specified
    df = DataFrame([{"value": "old1"}, {"value": "old2"}])
    codeflash_output = edit_value(df, "new")
    # Test editing the only row in a single-row DataFrame
    df = DataFrame([{"value": "old"}])
    codeflash_output = edit_value(df, "new")


def test_edit_value_edge_cases():
    # Test editing a value in an empty DataFrame
    df = DataFrame([])
    codeflash_output = edit_value(df, "new")
    # Test editing with a negative position
    df = DataFrame([{"value": "old"}])
    codeflash_output = edit_value(df, "new", -1)
    # Test editing with a position equal to the length of the DataFrame
    df = DataFrame([{"value": "old"}])
    codeflash_output = edit_value(df, "new", 1)
    # Test editing with a position greater than the length of the DataFrame
    df = DataFrame([{"value": "old"}])
    codeflash_output = edit_value(df, "new", 2)


def test_edit_value_data_types():
    # Test editing with a string value
    df = DataFrame([{"value": "old"}])
    codeflash_output = edit_value(df, "new")
    # Test editing with an integer value
    df = DataFrame([{"value": "old"}])
    codeflash_output = edit_value(df, 123)
    # Test editing with a float value
    df = DataFrame([{"value": "old"}])
    codeflash_output = edit_value(df, 123.45)
    # Test editing with a boolean value
    df = DataFrame([{"value": "old"}])
    codeflash_output = edit_value(df, True)


def test_edit_value_large_scale():
    # Test editing a value in a DataFrame with a large number of rows
    df = DataFrame([{"value": "old"}] * 1000000)
    codeflash_output = edit_value(df, "new", 999999)
    # Test editing a value in a DataFrame with a large number of columns
    df = DataFrame([{f"col{i}": "old" for i in range(1000)}])
    codeflash_output = edit_value(df, "new")


def test_edit_value_miscellaneous():
    # Test editing a value in a DataFrame with a custom index
    df = DataFrame([{"value": "old"}], index=pd.date_range("20210101", periods=1))
    codeflash_output = edit_value(df, "new")
    # Test editing a value in a DataFrame with a multi-index
    arrays = [["A", "A", "B", "B"], ["one", "two", "one", "two"]]
    index = pd.MultiIndex.from_arrays(arrays, names=("first", "second"))
    df = DataFrame([{"value": "old"}] * 4, index=index)
    codeflash_output = edit_value(df, "new", 2)
    # Test editing a value in a DataFrame that contains NaN values
    df = DataFrame([{"value": "old", "name": None}])
    codeflash_output = edit_value(df, "new")
    # codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

from typing import cast

import pandas as pd
# imports
import pytest  # used for our unit tests
from langflow.components.helpers.notepad import edit_value


# function to test
class DataFrame(pd.DataFrame):
    """A pandas DataFrame subclass specialized for handling collections of Data objects.

    This class extends pandas.DataFrame to provide seamless integration between
    Langflow's Data objects and pandas' powerful data manipulation capabilities.

    Args:
        data: Input data in various formats:
            - List[Data]: List of Data objects
            - List[Dict]: List of dictionaries
            - Dict: Dictionary of arrays/lists
            - pandas.DataFrame: Existing DataFrame
            - Any format supported by pandas.DataFrame
        **kwargs: Additional arguments passed to pandas.DataFrame constructor

    Examples:
        >>> # From Data objects
        >>> dataset = DataFrame([Data(data={"name": "John"}), Data(data={"name": "Jane"})])
        >>> # From dictionaries
        >>> dataset = DataFrame([{"name": "John"}, {"name": "Jane"}])
        >>> # From dictionary of lists
        >>> dataset = DataFrame({"name": ["John", "Jane"], "age": [30, 25]})
    """

    def __init__(self, data: list[dict] | list["Data"] | pd.DataFrame | None = None, **kwargs):
        if data is None:
            super().__init__(**kwargs)
            return
        if isinstance(data, list):
            if all(isinstance(x, Data) for x in data):
                data = [d.data for d in data if hasattr(d, "data")]
            elif not all(isinstance(x, dict) for x in data):
                msg = "List items must be either all Data objects or all dictionaries"
                raise ValueError(msg)
            kwargs["data"] = data
        elif isinstance(data, dict | pd.DataFrame):
            kwargs["data"] = data
        super().__init__(**kwargs)

    def to_data_list(self) -> list["Data"]:
        """Converts the DataFrame back to a list of Data objects."""
        list_of_dicts = self.to_dict(orient="records")
        return [Data(data=row) for row in list_of_dicts]

    def add_row(self, data: dict | "Data") -> "DataFrame":
        """Adds a single row to the dataset.

        Args:
            data: Either a Data object or a dictionary to add as a new row

        Returns:
            DataFrame: A new DataFrame with the added row

        Example:
            >>> dataset = DataFrame([{"name": "John"}])
            >>> dataset = dataset.add_row({"name": "Jane"})
        """
        if isinstance(data, Data):
            data = data.data
        new_df = self._constructor([data])
        return cast("DataFrame", pd.concat([self, new_df], ignore_index=True))

    def add_rows(self, data: list[dict | "Data"]) -> "DataFrame":
        """Adds multiple rows to the dataset.

        Args:
            data: List of Data objects or dictionaries to add as new rows

        Returns:
            DataFrame: A new DataFrame with the added rows
        """
        processed_data = []
        for item in data:
            if isinstance(item, Data):
                processed_data.append(item.data)
            else:
                processed_data.append(item)
        new_df = self._constructor(processed_data)
        return cast("DataFrame", pd.concat([self, new_df], ignore_index=True))

    @property
    def _constructor(self):
        def _c(*args, **kwargs):
            return DataFrame(*args, **kwargs).__finalize__(self)

        return _c

    def __bool__(self):
        """Truth value testing for the DataFrame.

        Returns True if the DataFrame has at least one row, False otherwise.
        """
        return not self.empty


from langflow.components.helpers.notepad import edit_value


# unit tests
def test_edit_value_empty_dataframe():
    # Test editing the value in an empty DataFrame
    df = DataFrame([])
    codeflash_output = edit_value(df, "new_value")


def test_edit_value_invalid_dataframe():
    # Test editing the value when the input is not a DataFrame
    with pytest.raises(AttributeError):
        edit_value("not_a_dataframe", "new_value")


def test_edit_value_multi_index_dataframe():
    # Test editing the value in a multi-index DataFrame
    arrays = [['bar', 'bar', 'baz', 'baz'],
              ['one', 'two', 'one', 'two']]
    index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
    df = DataFrame({"value": ["old_value1", "old_value2", "old_value3", "old_value4"]}, index=index)
    codeflash_output = edit_value(df, "new_value", 2)
    # codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>
This PR is now faster! 🚀 Gabriel Luiz Freitas Almeida accepted my code suggestion above.
This pull request introduces significant improvements to the NotepadComponent, including enhanced operations for adding, removing, and editing values, as well as robust context management using DataFrames. It also adds support for converting DataFrame outputs to a dictionary format when processing tool events. Comprehensive unit tests have been implemented to ensure reliability and robustness of the NotepadComponent functionalities, including error handling and validation improvements. These changes enhance the overall functionality and usability of the application, facilitating better data organization and manipulation.
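
The DataFrame-to-dictionary conversion mentioned in the summary might look something like the sketch below. The helper name `normalize_tool_output` is hypothetical, and the choice of `orient="records"` (a list of one dictionary per row) is an assumption borrowed from the `to_data_list` method shown earlier; the PR itself may use a different layout.

```python
from typing import Any

import pandas as pd


def normalize_tool_output(output: Any) -> Any:
    """Hypothetical helper: convert DataFrame outputs, pass everything else through."""
    if isinstance(output, pd.DataFrame):
        # One dictionary per row, keyed by column name.
        return output.to_dict(orient="records")
    return output


rows = normalize_tool_output(pd.DataFrame({"value": ["a", "b"]}))
text = normalize_tool_output("plain text output")
```

Handling both branches in one place keeps the event-processing code free of type checks at each call site.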