Add target-diff #664

Open: wants to merge 10 commits into main

JSCU-CNI (Contributor) commented Apr 4, 2024

This PR adds the command target-diff, which can be used to compare two or more targets against one another:

$ target-diff --help

target-diff

positional arguments:
  {shell,fs,query}      Mode for differentiating targets
    shell               Open an interactive shell to compare two or more targets.
    fs                  Yield records about differences between target filesystems.
    query               Differentiate plugin outputs between two or more targets.

options:
  -d, --deep            Compare file contents even if metadata suggests they have been left unchanged (default: False)
  -l LIMIT, --limit LIMIT
                        How many bytes to compare before assuming a file is left unchanged (0 for no limit) (default:
                        32768)
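
To illustrate how the `--limit` option reads, here is a minimal sketch of a bounded content comparison. The function name `contents_differ` and the `chunk_size` parameter are hypothetical and not taken from the PR; only the default of 32768 bytes and the "0 for no limit" convention come from the help text above.

```python
import io
from typing import BinaryIO


def contents_differ(src: BinaryIO, dst: BinaryIO, limit: int = 32768, chunk_size: int = 4096) -> bool:
    """Return True if the first `limit` bytes of both streams differ.

    A limit of 0 means "compare everything", mirroring the help text.
    Hypothetical helper for illustration, not the PR's implementation.
    """
    remaining = limit if limit > 0 else None
    while remaining is None or remaining > 0:
        size = chunk_size if remaining is None else min(chunk_size, remaining)
        a = src.read(size)
        b = dst.read(size)
        if a != b:
            return True
        if not a:
            # Both streams ended at the same point without a difference.
            return False
        if remaining is not None:
            remaining -= len(a)
    # Limit reached without seeing a difference: assume unchanged.
    return False


print(contents_differ(io.BytesIO(b"SRC"), io.BytesIO(b"DST")))
```

With a small limit, a difference beyond the limit goes unnoticed, which is the trade-off the option exposes.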

fs mode outputs records denoting filesystem changes from one target to the other:

$ target-diff --deep fs src.tar dst.tar

<differential/file/created hostname=None domain=None src_target='src.tar' dst_target='dst.tar' path='/changes/only_on_dst'>
<differential/file/deleted hostname=None domain=None src_target='src.tar' dst_target='dst.tar' path='/changes/only_on_src'>
<differential/file/modified hostname=None domain=None src_target='src.tar' dst_target='dst.tar' path='/changes/changed' diff=[b'--- \n', b'+++ \n', b'@@ -1 +1 @@\n', b'-SRC', b'+DST']>
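
The `diff` field in the modified record above has the shape of a unified diff over byte lines. As a sketch, Python's standard `difflib` can produce a list like that; whether target-diff builds the field this way is an assumption, and `unified_bytes_diff` is a hypothetical name.

```python
import difflib


def unified_bytes_diff(src_lines: list, dst_lines: list) -> list:
    """Return a unified diff of two lists of bytes lines.

    Hypothetical helper; uses only the standard library.
    """
    return list(difflib.diff_bytes(difflib.unified_diff, src_lines, dst_lines))


for line in unified_bytes_diff([b"SRC"], [b"DST"]):
    print(line)
```

For the single-line inputs `b"SRC"` and `b"DST"` this yields the same five entries as the record shown above.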

Using query mode, you can compare plugin outputs from one target to the other:

$ target-diff query -f users src.tar dst.tar

<differential/record/unchanged hostname=None domain=None src_target='src.tar' dst_target='dst.tar' record=<unix/user hostname='dst_target' domain=None name='root' passwd='x' uid=0 gid=0 gecos='root' home='/root' shell='/bin/bash' source='/etc/passwd'>>
<differential/record/unchanged hostname=None domain=None src_target='src.tar' dst_target='dst.tar' record=<unix/user hostname='dst_target' domain=None name='user' passwd='x' uid=1000 gid=1000 gecos='user' home='/home/user' shell='/bin/bash' source='/etc/passwd'>>
<differential/record/created hostname=None domain=None src_target='src.tar' dst_target='dst.tar' record=<unix/user hostname='dst_target' domain=None name='dst_user' passwd='x' uid=1001 gid=1001 gecos='dst_user' home='/home/dst_user' shell='/bin/bash' source='/etc/passwd'>>
<differential/record/deleted hostname=None domain=None src_target='src.tar' dst_target='dst.tar' record=<unix/user hostname='src_target' domain=None name='src_user' passwd='x' uid=1001 gid=1001 gecos='src_user' home='/home/src_user' shell='/bin/bash' source='/etc/passwd'>>

In shell mode, you can browse the target filesystems like in target-shell, where directory listings will show which files / directories have been changed, added or deleted. Using the plugin command, plugin outputs can be compared from within the shell context.

$ target-diff shell src.tar dst.tar

(dst_target/src_target)/diff />help

Target Diff
==========


Documented commands (type help <topic>):
=================================================================
cat  clear  diff   exit  help  ls    plugin  previous  set
cd   cyber  enter  find  list  next  prev    python  

(dst_target/src_target)/diff />cd changes
(dst_target/src_target)/diff /changes>ls
changed
only_on_dst (deleted)
only_on_src (created)
subdirectory_both
subdirectory_dst (deleted)
subdirectory_src (created)
unchanged

target-diff depends on fox-it/flow.record#107. To allow tests to run for this PR, we've temporarily bumped flow.record to 3.15.dev10 in pyproject.toml.

When three or more targets are provided, you can choose between treating every target as a 'delta' and comparing every target against one 'absolute' target. Treating targets as 'deltas' is useful if you have multiple snapshots of the same target from different points in time. Treating targets as 'absolutes' can be useful in situations where you have a 'golden image' that you want to compare different targets against.
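
The difference between the two modes can be sketched as a choice of which (source, destination) pairs get compared. The function `pair_targets` and its `absolute` flag are hypothetical names for illustration, and taking the first target as the 'absolute' one is an assumption.

```python
from typing import Iterator, Sequence, Tuple


def pair_targets(targets: Sequence[str], absolute: bool = False) -> Iterator[Tuple[str, str]]:
    """Yield (src, dst) pairs to compare. Hypothetical helper."""
    if len(targets) < 2:
        raise ValueError("Provide two or more targets to differentiate between.")
    if absolute:
        # 'Absolute' mode: compare every other target against the first one,
        # e.g. a golden image (assumed here to be the first target given).
        for dst in targets[1:]:
            yield targets[0], dst
    else:
        # 'Delta' mode: compare each target against its predecessor,
        # e.g. consecutive snapshots of the same machine.
        yield from zip(targets, targets[1:])


print(list(pair_targets(["t0.tar", "t1.tar", "t2.tar"])))
print(list(pair_targets(["golden.tar", "a.tar", "b.tar"], absolute=True)))
```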

To keep code duplication low between tools/diff.py and tools/shell.py, this PR adds a superclass ExtendedCmd to shell.py that contains most of the functionality that is shared between the two. Both TargetCmd and DifferentialCli inherit from this class.
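
As a rough picture of that layout, the following sketch puts shared commands in a base class built on the standard library's `cmd.Cmd`. Only the three class names come from the description above; the base class choice, the `do_exit` command, and the prompts are assumptions made for illustration.

```python
import cmd
import io


class ExtendedCmd(cmd.Cmd):
    """Simplified stand-in for the shared superclass (assumed to build on cmd.Cmd)."""

    def do_exit(self, line: str) -> bool:
        """Leave the shell."""
        return True  # a truthy return value stops cmd.Cmd's command loop


class TargetCmd(ExtendedCmd):
    prompt = "target> "  # hypothetical prompt


class DifferentialCli(ExtendedCmd):
    prompt = "diff> "  # hypothetical prompt


# Both shells get the shared command without duplicating it.
print(TargetCmd(stdout=io.StringIO()).onecmd("exit"))
print(DifferentialCli(stdout=io.StringIO()).onecmd("exit"))
```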

yunzheng (Member) left a comment:

I did not review the whole PR, just added some small flow.record quality of life suggestions since the merge of fox-it/flow.record#115

Available in flow.record==3.15.

JSCU-CNI force-pushed the feature/add-target-diff branch 2 times, most recently from 131e6b5 to aac31af, on June 12, 2024 09:02
JSCU-CNI (Contributor, Author) commented:

Thanks for your suggestions on the flow.record context manager addition. We were wondering if there is an ETA on a review of the PR as a whole, considering it's been over 2 months.

JSCU-CNI (Contributor, Author) commented Aug 1, 2024

We understand this PR will take some time to review, but we do think it might be worthwhile to incorporate the changes to shell.py (the ExtendedCmd class, some fixes to target-shell, moving some functions outside of Cmd classes so they can be re-used) into main earlier. The longer this PR stays open in its current state, the further it diverges from target-shell.

What we could do is move the shell.py changes into a separate PR that you can then review first. We don't mind making this PR, and while we're at it we can also pick up some open issues along the way, such as #623, #624, #625 and #585. These improvements can be used for target-fs as well, as it only makes sense to re-use as much as possible between target-shell and target-fs when it comes to outputting target filesystem information. This will likely create a merge conflict with #716 if that is not yet merged, but if that ends up happening we'll incorporate those changes as well.

Do you agree with this approach?

Schamper (Member) commented Aug 1, 2024

I am working slowly through the backlog of long-outstanding PRs but had not gotten to this one yet. Your proposal of splitting it up sounds like a good idea though; it's always nice if we can split large PRs into separate smaller ones. I won't complain either if you pick up some of the mentioned improvements along the way 😄.

I've just merged #716 for your convenience so feel free to do with that as you please 😉.

@JSCU-CNI JSCU-CNI mentioned this pull request Aug 5, 2024
@EinatFox EinatFox linked an issue Aug 6, 2024 that may be closed by this pull request
@JSCU-CNI JSCU-CNI mentioned this pull request Aug 8, 2024
* `--hex` can be used to diff binary files in a readable way.
* `--only-changed` can be used to omit unchanged records when comparing plugin outputs
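
One way to make a binary diff readable is to diff hex dump lines instead of raw bytes. The sketch below shows that idea with the standard library only; the helper names `hex_lines` and `hex_diff` and the dump layout are hypothetical and do not describe how `--hex` is implemented in this PR.

```python
import difflib
from typing import List


def hex_lines(data: bytes, width: int = 16) -> List[str]:
    """Render bytes as 'offset  hex bytes' lines, `width` bytes per line."""
    return [
        f"{offset:08x}  {data[offset:offset + width].hex(' ')}"
        for offset in range(0, len(data), width)
    ]


def hex_diff(src: bytes, dst: bytes) -> List[str]:
    """Unified diff over the hex dump lines of two byte strings."""
    return list(difflib.unified_diff(hex_lines(src), hex_lines(dst), lineterm=""))


for line in hex_diff(b"SRC", b"DST"):
    print(line)
```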
JSCU-CNI (Contributor, Author) commented:

target-diff should be more or less on par with the changes made in #812 and should be ready for review, @Schamper.

@JSCU-CNI JSCU-CNI requested a review from yunzheng October 7, 2024 09:56
@yunzheng yunzheng removed their request for review October 7, 2024 10:38
yunzheng (Member) commented Oct 7, 2024

I will check if someone from the dissect team can review this.

type=int,
help="How many bytes to compare before assuming a file is left unchanged (0 for no limit)",
)
subparsers = parser.add_subparsers(help="Mode for differentiating targets", dest="mode")
Contributor comment, suggested change:
- subparsers = parser.add_subparsers(help="Mode for differentiating targets", dest="mode")
+ subparsers = parser.add_subparsers(help="Mode for differentiating targets", dest="mode", required=True)

shell_mode.add_argument("targets", metavar="TARGETS", nargs="*", help="Targets to differentiate between")

fs_mode = subparsers.add_parser("fs", help="Yield records about differences between target filesystems.")
fs_mode.add_argument("targets", metavar="TARGETS", nargs="*", help="Targets to differentiate between")
Contributor comment, suggested change:
- fs_mode.add_argument("targets", metavar="TARGETS", nargs="*", help="Targets to differentiate between")
+ fs_mode.add_argument("targets", metavar="TARGETS", nargs="+", help="Targets to differentiate between")

subparsers = parser.add_subparsers(help="Mode for differentiating targets", dest="mode")

shell_mode = subparsers.add_parser("shell", help="Open an interactive shell to compare two or more targets.")
shell_mode.add_argument("targets", metavar="TARGETS", nargs="*", help="Targets to differentiate between")
Contributor comment, suggested change:
- shell_mode.add_argument("targets", metavar="TARGETS", nargs="*", help="Targets to differentiate between")
+ shell_mode.add_argument("targets", metavar="TARGETS", nargs="+", help="Targets to differentiate between")

)

query_mode = subparsers.add_parser("query", help="Differentiate plugin outputs between two or more targets.")
query_mode.add_argument("targets", metavar="TARGETS", nargs="*", help="Targets to differentiate between")
Contributor comment, suggested change:
- query_mode.add_argument("targets", metavar="TARGETS", nargs="*", help="Targets to differentiate between")
+ query_mode.add_argument("targets", metavar="TARGETS", nargs="+", help="Targets to differentiate between")

Contributor comment:

These suggestions avoid errors when invoking target-diff without targets or a mode.
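
The effect of `required=True` and `nargs="+"` can be checked with a stripped-down parser. This is a sketch that reuses only the option strings quoted in the suggestions; `build_parser` is a hypothetical helper and the real parser has more options.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="target-diff")
    # required=True makes argparse reject an invocation without a mode.
    subparsers = parser.add_subparsers(help="Mode for differentiating targets", dest="mode", required=True)
    for mode in ("shell", "fs", "query"):
        sub = subparsers.add_parser(mode)
        # nargs="+" requires at least one target instead of silently accepting none.
        sub.add_argument("targets", metavar="TARGETS", nargs="+", help="Targets to differentiate between")
    return parser


args = build_parser().parse_args(["fs", "src.tar", "dst.tar"])
print(args.mode, args.targets)
```

With these settings argparse itself prints a usage error and exits when the mode or the targets are missing, instead of failing later with an attribute or index error. Note that `nargs="+"` still accepts a single target, so a separate check for "at least two" remains necessary.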


args = parser.parse_args()
process_generic_arguments(args)

Contributor comment, suggested change:
+ if len(args.targets) < 2:
+     print("At least two targets are required for diff.")
+     exit(1)
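
A variation on this check, sketched here as an alternative rather than as what the PR does, is to report the problem through `ArgumentParser.error`, which prints the usage line and exits with status 2 (not 1 as in the suggestion). The function `parse_targets` is a hypothetical name.

```python
import argparse
from typing import List, Sequence


def parse_targets(argv: Sequence[str]) -> List[str]:
    """Parse targets and enforce the two-target minimum in one place."""
    parser = argparse.ArgumentParser(prog="target-diff")
    parser.add_argument("targets", metavar="TARGETS", nargs="+")
    args = parser.parse_args(argv)
    if len(args.targets) < 2:
        # parser.error() writes usage plus the message to stderr and exits.
        parser.error("At least two targets are required for diff.")
    return args.targets


print(parse_targets(["src.tar", "dst.tar"]))
```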

codecov bot commented Oct 28, 2024

Codecov Report

Attention: Patch coverage is 70.28986% with 164 lines in your changes missing coverage. Please review.

Project coverage is 76.63%. Comparing base (6c672fa) to head (5436273).

Files with missing lines | Patch % | Lines
dissect/target/tools/diff.py | 70.28% | 164 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #664      +/-   ##
==========================================
- Coverage   76.75%   76.63%   -0.13%     
==========================================
  Files         315      316       +1     
  Lines       27126    27678     +552     
==========================================
+ Hits        20820    21210     +390     
- Misses       6306     6468     +162     
Flag | Coverage Δ
unittests | 76.63% <70.28%> (-0.13%) ⬇️


exclude=args.exclude,
)
elif args.mode == "query":
    iterator = differentiate_target_plugin_outputs(
Contributor comment, suggested change:
- iterator = differentiate_target_plugin_outputs(
+ if args.deep:
+     log.warning("--deep parameter ignored for query mode.")
+ iterator = differentiate_target_plugin_outputs(

self._select_source_and_dest(0, 1)
if len(self.targets) > 2:
    # Some help may be nice if you are diffing more than 2 targets at once
    self.do_help(arg=None)
Contributor comment:

I think this line should be moved after the super().__init__() call; otherwise you get an error:
AttributeError: 'DifferentialCli' object has no attribute 'stdout'
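
The error can be reproduced with the standard library's `cmd.Cmd`, which only sets `self.stdout` inside its `__init__`. The sketch assumes the shell classes ultimately derive from `cmd.Cmd`; `BrokenCli` and `FixedCli` are hypothetical stand-ins, not the PR's classes.

```python
import cmd
import io


class BrokenCli(cmd.Cmd):
    def __init__(self, targets, stdout=None):
        self.targets = targets
        if len(self.targets) > 2:
            # do_help() writes to self.stdout, which does not exist yet.
            self.do_help(arg=None)
        super().__init__(stdout=stdout)


class FixedCli(cmd.Cmd):
    def __init__(self, targets, stdout=None):
        # Initialise the base class first so self.stdout is available.
        super().__init__(stdout=stdout)
        self.targets = targets
        if len(self.targets) > 2:
            self.do_help(arg=None)


try:
    BrokenCli(["a", "b", "c"])
except AttributeError as exc:
    print(f"broken: {exc}")

buf = io.StringIO()
FixedCli(["a", "b", "c"], stdout=buf)
print(buf.getvalue())
```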

return self._write_entry_contents_to_stdout(entry.dst_target_entry, stdout)
print(f"File {name} not found.")
return False

Contributor comment, suggested change:
+ @arg("path", nargs="?")
+ @alias("xxd")
+ def cmd_hexdump(self, args: argparse.Namespace, stdout: TextIO) -> bool:
+     setattr(args, "hex", True)
+     return self.cmd_diff(args, stdout)

"""Given a list of targets, compare targets against one another and yield File[Created|Modified|Deleted]Records
indicating the differences between them."""
if len(targets) < 2:
    raise ValueError("Provide two or more targets to differentiate between.")
Contributor comment:

This is true for all modes, right? Maybe move the check to a more generic place; see my corresponding suggestion.

type=int,
help="How many bytes to compare before assuming a file is left unchanged (0 for no limit)",
)
subparsers = parser.add_subparsers(help="Mode for differentiating targets", dest="mode")
Contributor comment:

It might be worth splitting the subparser setup into separate functions for readability.


Successfully merging this pull request may close these issues.

Add target-diff PR#664
4 participants