-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue#18: internal_co: Always use the internal logical lines of RCSStream instead of the flat plain text to reproduce the next revision. #22
base: master
Are you sure you want to change the base?
Conversation
* run-test.py (internal_co_broken_rcsfile): New test. (test_list): Add it.
…t_db. As the return value of RCSStream.invert_diff() is not to be applied for flat text content but for internal logical lines in RCSStream, which may contain some unterminated logical lines at any position, so we should use the content of internal logical lines in RCSStream as base text which is used to get the content of the next newer revision. With this commit, it is implemented by splitting the checkout() method in TextRecord, checkout() method for external use for as in the past and checkout_as_lines() method for internal use. * cvs2svn_lib/checkout_internal.py (TextRecord.checout_as_lines): New method. Replacement of checkout() method but returns internal lines in RCSStream instead of a plain text. (TextRecord.checkout): Use checkout_as_lines() for default implementation. (FullTextRecord.checkout, DeltaTextRecord.checkout): Removed to use default implementation. (FullTextRecord.checkout_as_lines): New method. Just same logic as the past checkout() method. (DeltaTextRecord.checkout_as_lines): New method. Just same logic as the past checkout() method but uses internal lines in rcs_stream instead of its text. (_Sink.set_revision_info): Record the internal lines in rcs_stream instead of its text at revision 1.1. * cvs2svn_lib/rcs_stream.py (RCSStream.__init__): Allow to set lines directly in addition to a text. (RCSStream.get_lines): New method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks like a clean solution.
you're now serializing revision contents as lines rather than text (which, btw, means that you need to update some classes' comment). i'd expect this to have a somewhat detrimental effect on database size, and probably also on runtime (more i/o, and serialization with higher granularity, partially offset by saving the repeated splits and joins). did you benchmark this variant? the commit message should document the result.
more fundamentally, i wonder whether this way of dealing with the problem is correct.
did you verify that rcs produces the same output, incl. on intermediate revision, after repeated corruptions of this kind?
the alternative would be dealing with the problem right at the source: if an add not at the end of the text doesn't end with a newline, just add one.
cvs2svn_lib/checkout_internal.py
Outdated
def checkout(self, text_record_db): | ||
"""Return the text for this revision. | ||
|
||
Just same as checkout_as_lines() but returns text as flat text string.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"just same as" is a questionable construct. "same as" or "just as".
you're making the same mistake in the commit message.
also, add comma before "but".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for pointing it out. I corrected it and looked over comments in this file and make a new commit eb5aab7.
* test-data/gh-issue-18-irregular-rcsfile-cvsrepos/irregular.txt,v - Insert a broken delta in r1.5 - Add a branch revision r1.2.1.1, branched from corrupted revision r1.2.
Thank you for the review.
I think amount of the increase of the resouce consumption is limited by the number of branches, FullTextRecord for rev 1.1 in trunk and CheckedOutTextRecord for each branching point revision in other branches. Unfortunately I don't have suitable example of such a repo that have many branches compared with total revision numbers and each files in the repos are large enough. So I didn't any benchmark it.
Yes, I compared with RCS output in some examples including
The problem is that we don't know how it was corrupted and what it should be. So I tried to imitate how RCS/CVS deal it. |
This was not true... It contains only 1 corrupt delta. One more corrupt delta was lost, perhaps by wrong git stash operation. (So the commit log of 5d1a8df is incorrect). |
Previous commit did not contain new delta r1.5. This is a correction.
Then I've added the lost corrupted delta. |
I'm sorry, I missunderstood. On intermediate revison nodes also use CheckedOutTextRecord after checked it out, because there still exist atleast one newer revision to refer it. So it does not need so special example to measure its effect. (although I still don't have a repos example which has suitable size...) |
As the return value of RCSStream.invert_diff() is not to be applied for flat text content but for internal logical lines in RCSStream, which may contain some unterminated logical lines at any position, so we should use the content of internal logical lines in RCSStream as base text which is used to get the content of the next newer revision.
With this patch, it is implemented by splitting the checkout() method in TextRecord, checkout() method for external use for as in the past and checkout_as_lines() method for internal use.
See also the comments in PR #19, which will be closed as this PR is a replacement of it.