Skip to content
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
52 changes: 52 additions & 0 deletions include/tvm/ffi/string.h
Original file line number Diff line number Diff line change
Expand Up @@ -611,6 +611,58 @@ class String {
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For consistency with std::string, it's good practice to define a static npos member to represent 'not found'. This improves readability and makes the interface more familiar to C++ developers.

  static constexpr size_t npos = static_cast<size_t>(-1);

/*!
* \brief Find the first occurrence of a substring
* \param str The substring to search for
* \param pos The position at which to start the search
* \return The position of the first character of the first match, or size_t(-1) if not found
*/
size_t find(const String& str, size_t pos = 0) const { return find(str.data(), pos, str.size()); }

/*!
* \brief Find the first occurrence of a substring
* \param str The substring to search for
* \param pos The position at which to start the search
* \return The position of the first character of the first match, or size_t(-1) if not found
*/
size_t find(const char* str, size_t pos = 0) const { return find(str, pos, std::strlen(str)); }

/*!
* \brief Find the first occurrence of a substring
* \param str The substring to search for
* \param pos The position at which to start the search
* \param count The length of the substring
* \return The position of the first character of the first match, or size_t(-1) if not found
*/
size_t find(const char* str, size_t pos, size_t count) const {
if (count == 0) return pos <= size() ? pos : size_t(-1);
if (pos >= size() || count > size() - pos) return size_t(-1);

const char* this_data = data();
size_t this_size = size();

for (size_t i = pos; i <= this_size - count; ++i) {
if (std::memcmp(this_data + i, str, count) == 0) {
return i;
}
}
return size_t(-1);
}
Comment on lines 640 to 642
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The manual implementation of find is complex and has a subtle inconsistency with std::string::find's behavior for searching for empty strings. It's also likely less performant than the standard library's implementation.

It's better to delegate to std::string_view::find, which is highly optimized and guaranteed to be correct. This simplifies the code, improves maintainability, and ensures standard-compliant behavior (including edge cases like searching for an empty string past the end).

Note that std::string_view::find returns std::string_view::npos on failure, which is equivalent to static_cast<size_t>(-1).

  size_t find(const char* str, size_t pos, size_t count) const {
    return std::string_view(data(), size()).find(std::string_view(str, count), pos);
  }


/*!
* \brief Returns a substring [pos, pos+count)
* \param pos The position of the first character to include
* \param count The length of the substring (default: until end of string)
* \return A string containing the substring
*/
String substr(size_t pos = 0, size_t count = size_t(-1)) const {
if (pos > size()) {
throw std::out_of_range("tvm::String substr index out of bounds");
}
size_t rcount = std::min(count, size() - pos);
return String(data() + pos, rcount);
}

/*!
* \brief Convert String to an std::string object
*
Expand Down
36 changes: 36 additions & 0 deletions tests/cpp/test_string.cc
Original file line number Diff line number Diff line change
Expand Up @@ -445,4 +445,40 @@ TEST(String, StdHash) {
EXPECT_EQ(std::hash<Bytes>()(s3), std::hash<Bytes>()(s4));
}

TEST(String, Find) {
String s{"hello world"};
EXPECT_EQ(s.find("world"), 6);
EXPECT_EQ(s.find("hello"), 0);
EXPECT_EQ(s.find("o"), 4);
EXPECT_EQ(s.find("o", 5), 7);
EXPECT_EQ(s.find("notfound"), size_t(-1));
EXPECT_EQ(s.find(""), 0);
EXPECT_EQ(s.find("", 5), 5);
EXPECT_EQ(s.find("", 20), size_t(-1));

String pattern{"world"};
EXPECT_EQ(s.find(pattern), 6);

String empty{""};
EXPECT_EQ(empty.find("x"), size_t(-1));
EXPECT_EQ(empty.find(""), 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

To improve test clarity and align with the suggested changes for String::find:

  1. Use String::npos (which should be added to the String class) instead of size_t(-1) for better readability.
  2. The test for finding an empty string with a starting position beyond the string's length should expect s.size() to be returned, which is the standard behavior for std::string::find.
  EXPECT_EQ(s.find("notfound"), String::npos);
  EXPECT_EQ(s.find(""), 0);
  EXPECT_EQ(s.find("", 5), 5);
  EXPECT_EQ(s.find("", 20), s.size());

  String pattern{"world"};
  EXPECT_EQ(s.find(pattern), 6);

  String empty{"

}

TEST(String, Substr) {
String s{"hello world"};
EXPECT_EQ(s.substr(0, 5), "hello");
EXPECT_EQ(s.substr(6, 5), "world");
EXPECT_EQ(s.substr(6), "world");
EXPECT_EQ(s.substr(0), "hello world");
EXPECT_EQ(s.substr(11), "");
EXPECT_EQ(s.substr(0, 0), "");

EXPECT_THROW(s.substr(12), std::out_of_range);
EXPECT_THROW(s.substr(100), std::out_of_range);

String empty{""};
EXPECT_EQ(empty.substr(0), "");
EXPECT_THROW(empty.substr(1), std::out_of_range);
}

} // namespace
15 changes: 15 additions & 0 deletions tests/python/test_string.py
Original file line number Diff line number Diff line change
Expand Up @@ -53,3 +53,18 @@ def test_bytes() -> None:
b5 = pickle.loads(pickle.dumps(b))
assert b5 == b"hello"
assert isinstance(b5, tvm_ffi.core.Bytes)


def test_string_find_substr() -> None:
s = tvm_ffi.core.String("hello world")
assert s.find("world") == 6
assert s.find("hello") == 0
assert s.find("o") == 4
assert s.find("o", 5) == 7
assert s.find("notfound") == -1
assert s.find("") == 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The Python tests for find could be more comprehensive. It would be good to add tests for searching for an empty string with a starting position, especially one that is out of bounds, to ensure it matches Python's str.find behavior.

For example, Python's "abc".find("", 5) returns 3 (the length of the string).

Suggested change
assert s.find("") == 0
assert s.find("") == 0
assert s.find("", 5) == 5
assert s.find("", 11) == 11
assert s.find("", 20) == 11


assert s[6:11] == "world"
assert s[0:5] == "hello"
assert s[6:] == "world"
assert s[:5] == "hello"