Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove more regex #2897

Merged
merged 1 commit into from
Jan 25, 2024
Merged

remove more regex #2897

merged 1 commit into from
Jan 25, 2024

Conversation

neheb
Copy link
Collaborator

@neheb neheb commented Jan 19, 2024

Mitght be worth it. A lot of extra code though...

ping @Wormnest

@ghost
Copy link

ghost commented Jan 19, 2024

👇 Click on the image for a new way to code review

Review these changes using an interactive CodeSee Map

Legend

CodeSee Map legend

Copy link

codecov bot commented Jan 19, 2024

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison is base (57d9d2f) 63.90% compared to head (1000540) 63.93%.

Files Patch % Lines
src/value.cpp 86.36% 0 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2897      +/-   ##
==========================================
+ Coverage   63.90%   63.93%   +0.03%     
==========================================
  Files         104      104              
  Lines       22389    22400      +11     
  Branches    10876    10877       +1     
==========================================
+ Hits        14308    14322      +14     
+ Misses       5857     5854       -3     
  Partials     2224     2224              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Wormnest
Copy link

@neheb Thanks for working on this. Tested the patch and still seeing the slowdown. TimeValue::read in the same file is affected in the same way.

@neheb
Copy link
Collaborator Author

neheb commented Jan 19, 2024

On a cursory glance, that regex is quite long. It would be difficult to remove.

@Wormnest
Copy link

Yes, understandable. GIMP may have to wait switching to UCRT64 until this is fixed upstream, or go for CLANG64 instead.

@neheb
Copy link
Collaborator Author

neheb commented Jan 19, 2024

alternatively ask the GIMP developers to patch exiv2 to get rid of regex usage. I may take a crack at it at some point. Doubtful though.

It's very good that there are tests for this. Original version was horribly broken.

@Wormnest
Copy link

I am one of the GIMP devs 😃 However, I personally have little C++ experience and we are already spread thin with what we can handle and needing to release 3.0 in a reasonable time. Since MINGW64 is still working ok for us, I think it best we stay with that for now, even if it means not being able to use the latest version of exiv2.

I appreciate you having taken the time to look at, thanks.

@neheb
Copy link
Collaborator Author

neheb commented Jan 19, 2024

Regex is a recent addition?

oh I see. regex.h was replaced with C++ regex.

@kmilos
Copy link
Collaborator

kmilos commented Jan 20, 2024

regex.h was replaced with C++ regex

Yes, and there is unfortunately a bug in libstdc++ interaction w/ UCRT on Windows. CLANG64 is not affected as it uses libc++, and MINGW64 is not affected as it links to legacy MSVCRT instead.

Edit: Note that some say that moving away from a custom regex implementation was maybe not the best choice...

@neheb
Copy link
Collaborator Author

neheb commented Jan 20, 2024

It might make sense to switch to https://github.com/hanickadot/compile-time-regular-expressions

Or at least reg rid of regex in src.

@neheb
Copy link
Collaborator Author

neheb commented Jan 21, 2024

last regex removal in value.cpp

--- a/src/value.cpp
+++ b/src/value.cpp
@@ -9,7 +9,6 @@
 #include "types.hpp"
 
 // + standard includes
-#include <regex>
 #include <sstream>
 
 // *****************************************************************************
@@ -909,39 +908,66 @@ int TimeValue::read(const std::string& buf) {
   // https://web.archive.org/web/20171020084445/https://www.loc.gov/standards/datetime/ISO_DIS%208601-1.pdf
   // Not supported formats:
   // 4.2.2.4 Representations with decimal fraction: 232050,5
-  static const std::regex re(R"(^(2[0-3]|[01][0-9]):?([0-5][0-9])?:?([0-5][0-9])?$)");
-  static const std::regex reExt(
-      R"(^(2[0-3]|[01][0-9]):?([0-5][0-9]):?([0-5][0-9])(Z|[+-](?:2[0-3]|[01][0-9])(?::?(?:[0-5][0-9]))?)$)");
-
-  if (std::smatch sm; std::regex_match(buf, sm, re) || std::regex_match(buf, sm, reExt)) {
-    time_.hour = sm.length(1) ? std::stoi(sm[1].str()) : 0;
-    time_.minute = sm.length(2) ? std::stoi(sm[2].str()) : 0;
-    time_.second = sm.length(3) ? std::stoi(sm[3].str()) : 0;
-    if (sm.size() > 4) {
-      std::string str = sm[4].str();
-      const auto strSize = str.size();
-      auto posColon = str.find(':');
-
-      if (posColon == std::string::npos) {
-        // Extended format
-        time_.tzHour = std::stoi(str.substr(0, 3));
-        if (strSize > 3) {
-          int minute = std::stoi(str.substr(3));
-          time_.tzMinute = time_.tzHour < 0 ? -minute : minute;
-        }
-      } else {
-        // Basic format
-        time_.tzHour = std::stoi(str.substr(0, posColon));
-        int minute = std::stoi(str.substr(posColon + 1));
-        time_.tzMinute = time_.tzHour < 0 ? -minute : minute;
-      }
+  size_t minutePos = 2;
+  size_t secondPos = 4;
+
+  if (buf.size() < 2)
+    return 1;
+
+  if (buf.size() > 8 && buf.find('+') == std::string::npos)
+    return 1;
+
+  for (auto c : buf)
+    if (c != ':' && c != '+' && c != '-' && !std::isdigit(c))
+      return 1;
+
+  if (buf[2] == ':') {
+    minutePos = 3;
+  }
+
+  if (buf.size() > 4 && buf[5] == ':') {
+    secondPos = 6;
+  }
+
+  auto tz = buf.find('+');
+  if (tz == std::string::npos) {
+    auto tzn = buf.find('-');
+    if (tzn == std::string::npos) {
+      time_.tzHour = 0;
+      time_.tzMinute = 0;
+    } else {
+      auto tzs = buf.substr(tzn, buf.size() - tzn);
+      time_.tzHour = std::stoul(tzs.substr(1, 2));
+      if (tzs.size() > 3)
+        time_.tzMinute = -std::stoul(tzs.substr(4, 2));
+      else if (tzs.size() > 4)
+        time_.tzMinute = -std::stoul(tzs.substr(3, 2));
     }
-    return 0;
+  } else {
+    auto tzs = buf.substr(tz, buf.size() - tz);
+    time_.tzHour = std::stoul(tzs.substr(1, 2));
+    if (tzs.size() > 3)
+      time_.tzMinute = std::stoul(tzs.substr(4, 2));
+    else if (tzs.size() > 4)
+      time_.tzMinute = std::stoul(tzs.substr(3, 2));
   }
+
+  auto h = std::stoi(buf.substr(0, 2));
+  if (h >= 24)
+    return 1;
+
+  time_.hour = h;
+  time_.minute = buf.size() > 3 ? std::stoul(buf.substr(minutePos, 2)) : 0;
+  time_.second = buf.size() > 5 ? std::stoul(buf.substr(secondPos, 2)) : 0;
+
 #ifndef SUPPRESS_WARNINGS
   EXV_WARNING << Error(ErrorCode::kerUnsupportedTimeFormat) << "\n";
 #endif
-  return 1;
+
+  if (time_.hour >= 24 || time_.minute >= 60 || time_.second >= 60 || time_.tzHour >= 24 || time_.tzMinute >= 60)
+    return 1;
+
+  return 0;
 }
 
 /// \todo not used internally. At least we should test it

still fails ATimeValue.canBeReadFromExtendedStringWithTimeZoneDesignatorNegative

Not sure what's wrong.

Signed-off-by: Rosen Penev <[email protected]>
@neheb neheb merged commit 2d7814b into Exiv2:main Jan 25, 2024
60 checks passed
@neheb neheb deleted the regex branch January 25, 2024 08:26
@kmilos kmilos added the refactoring Cleanup / Simplify code -> more readable / robust label Jan 25, 2024
@kmilos
Copy link
Collaborator

kmilos commented Jan 25, 2024

@mergify backport 0.28.x

Copy link
Contributor

mergify bot commented Jan 25, 2024

backport 0.28.x

✅ Backports have been created

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
refactoring Cleanup / Simplify code -> more readable / robust
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants