Merge pull request #228 from TeamMsgExtractor/next-release
Version 0.30.0
TheElementalOfDestruction authored Jan 20, 2022
2 parents 125ad02 + 0939ae0 commit 4aabd61
Showing 24 changed files with 612 additions and 582 deletions.
3 changes: 1 addition & 2 deletions .travis.yml
@@ -1,7 +1,6 @@
language: python
python:
- "2.7"
- "3.5"
- "3.6"
install:
- python setup.py install
script:
26 changes: 26 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,29 @@
**v0.30.0**
* Removed all support for Python 2. This caused a lot of things to be moved around and changed from indirect references to direct references, so it's possible something fell through the cracks. I'm doing my best to test it, but let me know if you have an issue.
* Changed classes to now prefer super() over direct superclass initialization.
* Removed explicit object subclassing (it's implicit in Python 3 so we don't need it anymore).
* Converted most `.format` calls into f-strings.
* Improved consistency of docstrings. It's not perfect, but it should at least be better.
* Started the addition of type hints to functions and methods.
* Updated `utils.bytesToGuid` to make it faster and more efficient.
* Renamed `utils.msgEpoch` to `utils.filetimeToUtc` to be more descriptive.
* Updated internal variable names to be more consistent.
* Improvements to the way `__main__` works. This does not affect the output it will generate, only the efficiency and readability.
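The rename of `utils.msgEpoch` to `utils.filetimeToUtc` suggests the function converts a Windows FILETIME (a count of 100-nanosecond ticks since 1601-01-01 UTC) into a UTC time. The sketch below shows that general conversion; it is an illustration under that assumption, not the library's actual implementation, and both function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Number of 100-ns FILETIME ticks between 1601-01-01 and 1970-01-01 (UTC).
EPOCH_DIFF_TICKS = 116444736000000000


def filetime_to_unix(filetime: int) -> float:
    """Convert a FILETIME tick count to seconds since the Unix epoch."""
    # 10,000,000 ticks of 100 ns make one second.
    return (filetime - EPOCH_DIFF_TICKS) / 10_000_000


def filetime_to_datetime(filetime: int) -> datetime:
    """Convert a FILETIME tick count to a timezone-aware UTC datetime.

    Uses integer microseconds via timedelta to avoid float rounding.
    """
    # 10 ticks of 100 ns make one microsecond.
    return datetime(1601, 1, 1, tzinfo=timezone.utc) + timedelta(microseconds=filetime // 10)
```

For example, `filetime_to_datetime(116444736000000000)` returns midnight on 1970-01-01 UTC, and `filetime_to_unix` of the same value is `0.0`.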

**v0.29.3**
* [[TeamMsgExtractor #226](https://github.com/TeamMsgExtractor/msg-extractor/issues/226)] Fixed a typo in command parsing that prevented the usage of `allowFallback`.
* Fixed `__main__` still manually changing to a new directory with `os.chdir` instead of using `customPath`.
* Fixed an issue in `__main__` where the `--html` option was being used for both HTML *and* RTF. This meant that if you wanted RTF it would not have been used, and if you wanted HTML it would have thrown an error.
* Fixed `--out-name` having no effect.
* Fixed `--out` having no effect.
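Several of these fixes replace changing the working directory with an explicit output folder (`customPath`). The stdlib-only sketch below illustrates why that is safer; `save_text` is a hypothetical helper, not part of extract_msg. Joining paths leaves the process's working directory untouched, so relative paths for any later `.msg` files on the command line still resolve.

```python
import os
from typing import Optional


def save_text(text: str, filename: str, custom_path: Optional[str] = None) -> str:
    """Hypothetical helper: write text into custom_path (or the CWD) without calling os.chdir."""
    # Default to the current directory, like the CLI's --out default.
    folder = custom_path or os.getcwd()
    # Create the output folder if needed; exist_ok avoids a separate exists() check.
    os.makedirs(folder, exist_ok=True)
    # Build the full target path instead of changing the working directory.
    target = os.path.join(folder, filename)
    with open(target, 'w', encoding='utf-8') as f:
        f.write(text)
    return target
```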

**v0.29.2**
* Fixed issue where the RTF injection was accidentally doing HTML escapes for non-encapsulated streams and *not* doing escapes for encapsulated streams.
* Fixed name error in `Message.save` causing bad logic. For context, the internal variable `zip` was renamed to `_zip` to avoid a name conflict with the built-in function. Some instances of it were missed.

**v0.29.1**
* [[TeamMsgExtractor #198](https://github.com/TeamMsgExtractor/msg-extractor/issues/198)] Added a feature to save the header in its own file (it prefers the full raw header if it can find it, otherwise it puts in a generated one). This was actually supposed to be in v0.29.0 but I forgot, lol.

**v0.29.0**
* [[TeamMsgExtractor #207](https://github.com/TeamMsgExtractor/msg-extractor/issues/207)] Made it so that unspecified dates are handled properly. For clarification, an unspecified date is a custom value in MSG files for dates that means that the date is unspecified. It is distinctly different from a property not existing, which will still return None. For unspecified dates, `datetime.datetime.max` is returned. While perhaps not the best solution, it will have to do for now.
* Fixed an issue where `utils.parseType` was returning a string for the date when it makes more sense to return an actual datetime instance.
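Because an unspecified date comes back as `datetime.datetime.max` while a missing property returns `None`, calling code may want to tell the three cases apart. A minimal sketch, assuming the date has already been read from a message object; `describe_date` is a hypothetical helper.

```python
import datetime
from typing import Optional


def describe_date(value: Optional[datetime.datetime]) -> str:
    """Classify a date value as returned by extract_msg (hypothetical helper)."""
    if value is None:
        # The property does not exist in the MSG file.
        return 'missing'
    if value == datetime.datetime.max:
        # Sentinel used for an "unspecified" date, per the v0.29.0 notes.
        return 'unspecified'
    # An ordinary date: render it in ISO 8601 form.
    return value.isoformat()
```

Equality (`==`) between a naive and an aware datetime returns `False` rather than raising, so the sentinel check is safe even if real dates carry a timezone.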
94 changes: 50 additions & 44 deletions README.rst
@@ -1,4 +1,4 @@
|License: GPL v3| |PyPI3| |PyPI1| |PyPI2|
|License: GPL v3| |PyPI3| |PyPI2|

msg-extractor
=============
@@ -10,10 +10,8 @@ data (from, to, cc, date, subject, body) and the email's attachments.

NOTICE
======
0.29.* will be the last versions that will support Python 2. While we want to
continue to support it, it would just be too much work to do so. We are
providing notice ahead of time of this change so that you may sort out your
Python environments ahead of time.
0.29.* is the branch that supports both Python 2 and Python 3. It is now only
receiving bug fixes and will not be receiving feature updates.

This module has a Discord server for general discussion. You can find it here:
`Discord`_
@@ -40,13 +38,13 @@ attachments.
The script uses Philippe Lagadec's Python module that reads Microsoft
OLE2 files (also called Structured Storage, Compound File Binary Format
or Compound Document File Format). This is the underlying format of
Outlook's .msg files. This library currently supports up to Python 2.7
and 3.4.
Outlook's .msg files. This library currently supports Python 3.6 and above.
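Since `.msg` files are OLE2 (Compound File Binary) containers, a cheap sanity check before handing a file to the parser is to look for the 8-byte CFB header signature. This stdlib-only sketch uses a hypothetical `looks_like_ole2` name; actual parsing is done by Philippe Lagadec's `olefile` module, which offers its own `isOleFile` check. Matching the signature only shows the file claims to be OLE2, not that it is a valid `.msg`.

```python
import os

# Magic bytes at offset 0 of every Compound File Binary (OLE2) file.
CFB_SIGNATURE = bytes.fromhex('D0CF11E0A1B11AE1')


def looks_like_ole2(path: str) -> bool:
    """Return True if the file at path starts with the CFB/OLE2 signature."""
    if not os.path.isfile(path):
        return False
    with open(path, 'rb') as f:
        # Read just the signature-sized prefix and compare.
        return f.read(len(CFB_SIGNATURE)) == CFB_SIGNATURE
```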

The script was built using Peter Fiskerstrand's documentation of the
.msg format. Redemption's discussion of the different property types
used within Extended MAPI was also useful. For future reference, I note
that Microsoft have opened up their documentation of the file format.
The script was originally built using Peter Fiskerstrand's documentation of the
.msg format. Redemption's discussion of the different property types used within
Extended MAPI was also useful. For future reference, I note that Microsoft have
opened up their documentation of the file format, which is what is currently
being used for development.


#########REWRITE COMMAND LINE USAGE#############
@@ -55,35 +53,48 @@ refer to the usage information provided from the program's help dialog:
::

usage: extract_msg [-h] [--use-content-id] [--dev] [--validate] [--json]
[--file-logging] [--verbose] [--log LOG]
[--config CONFIG_PATH] [--out OUT_PATH] [--use-filename]
msg [msg ...]
[--file-logging] [--verbose] [--log LOG]
[--config CONFIG_PATH] [--out OUT_PATH] [--use-filename]
[--dump-stdout] [--html] [--raw] [--rtf]
[--allow-fallback] [--out-name OUT_NAME] msg [msg ...]

extract_msg: Extracts emails and attachments saved in Microsoft Outlook's .msg
files. https://github.com/mattgwwalker/msg-extractor
extract_msg: Extracts emails and attachments saved in Microsoft Outlook's
.msg files. https://github.com/TeamMsgExtractor/msg-extractor

positional arguments:
msg An msg file to be parsed
msg An msg file to be parsed

optional arguments:
-h, --help show this help message and exit
--use-content-id, --cid
Save attachments by their Content ID, if they have
one. Useful when working with the HTML body.
--dev Changes to use developer mode. Automatically enables
the --verbose flag. Takes precedence over the
--validate flag.
--validate Turns on file validation mode. Turns off regular file
output.
--json Changes to write output files as json.
--file-logging Enables file logging. Implies --verbose
--verbose Turns on console logging.
--log LOG Set the path to write the file log to.
--config CONFIG_PATH Set the path to load the logging config from.
--out OUT_PATH Set the folder to use for the program output.
(Default: Current directory)
--use-filename Sets whether the name of each output is based on the
msg filename.
-h, --help show this help message and exit
--use-content-id, --cid
Save attachments by their Content ID, if they have
one. Useful when working with the HTML body.
--dev Changes to use developer mode. Automatically
enables the --verbose flag. Takes precedence over
the --validate flag.
--validate Turns on file validation mode. Turns off regular
file output.
--json Changes to write output files as json.
--file-logging Enables file logging. Implies --verbose.
--verbose Turns on console logging.
--log LOG Set the path to write the file log to.
--config CONFIG_PATH Set the path to load the logging config from.
--out OUT_PATH Set the folder to use for the program output.
(Default: Current directory)
--use-filename Sets whether the name of each output is based on
the msg filename.
--dump-stdout Tells the program to dump the message body (plain
text) to stdout. Overrides saving arguments.
--html Sets whether the output should be html. If this is
not possible, will error.
--raw Sets whether the output should be raw. If this is
not possible, will error.
--rtf Sets whether the output should be rtf. If this is
not possible, will error.
--allow-fallback Tells the program to fallback to a different save
type if the selected one is not possible.
--out-name OUT_NAME Name to be used with saving the file output.
Should come immediately after the file name.

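The same options can be passed when driving the command-line tool from another Python script. This sketch assumes the package is installed so that `python -m extract_msg` runs its `__main__.py`; the flag names are taken from the help text above, while `build_cli_args` and `run_cli` are hypothetical helpers.

```python
import subprocess
import sys
from typing import List, Optional


def build_cli_args(msg_path: str, out_path: Optional[str] = None,
                   html: bool = False, allow_fallback: bool = False) -> List[str]:
    """Assemble an argv list for `python -m extract_msg` (hypothetical helper)."""
    args = [sys.executable, '-m', 'extract_msg']
    if out_path:
        args += ['--out', out_path]
    if html:
        args.append('--html')
    if allow_fallback:
        # Fall back to another save type if HTML output is not possible.
        args.append('--allow-fallback')
    # The positional msg argument goes after the options.
    args.append(msg_path)
    return args


def run_cli(msg_path: str, **options) -> int:
    """Run the CLI on one file and return its exit code."""
    return subprocess.run(build_cli_args(msg_path, **options)).returncode
```

For example, `run_cli('example.msg', out_path='output', html=True, allow_fallback=True)` runs the equivalent of `extract_msg --out output --html --allow-fallback example.msg`.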
**To use this in your own script**, start by using:

@@ -116,11 +127,8 @@ where ``CustomAttachmentClass`` is your custom class.

#TODO: Finish this section

If you have any questions feel free to contact me, Matthew Walker, at
mattgwwalker at gmail.com. NOTE: Due to time constraints, The Elemental
of Destruction has been added as a contributor to help manage the project.
As such, it may be helpful to send emails to [email protected] as
well.
If you have any questions, feel free to contact me, Destiny, at arceusthe [at]
gmail [dot] com. I am the co-owner and main developer of the project.

If you have issues, it would be best to get help for them by opening a
new github issue.
@@ -197,11 +205,9 @@ And thank you to everyone who has opened an issue and helped us track down those
.. |License: GPL v3| image:: https://img.shields.io/badge/License-GPLv3-blue.svg
:target: LICENSE.txt

.. |PyPI3| image:: https://img.shields.io/badge/pypi-0.29.0-blue.svg
:target: https://pypi.org/project/extract-msg/0.29.0/
.. |PyPI3| image:: https://img.shields.io/badge/pypi-0.30.0-blue.svg
:target: https://pypi.org/project/extract-msg/0.30.0/

.. |PyPI1| image:: https://img.shields.io/badge/python-2.7+-brightgreen.svg
:target: https://www.python.org/downloads/release/python-2715/
.. |PyPI2| image:: https://img.shields.io/badge/python-3.6+-brightgreen.svg
:target: https://www.python.org/downloads/release/python-367/
.. _Matthew Walker: https://github.com/mattgwwalker
4 changes: 2 additions & 2 deletions extract_msg/__init__.py
@@ -27,8 +27,8 @@
# along with this program. If not, see <http://www.gnu.org/licenses/>.

__author__ = 'Destiny Peterson & Matthew Walker'
__date__ = '2022-01-13'
__version__ = '0.29.0'
__date__ = '2022-01-19'
__version__ = '0.30.0'

import logging

45 changes: 29 additions & 16 deletions extract_msg/__main__.py
@@ -1,24 +1,27 @@
import logging
import os
import sys
import traceback

from extract_msg import __doc__, utils
from extract_msg.compat import os_ as os
from extract_msg.message import Message

def main():

def main() -> None:
# Setup logging to stdout, indicate running from cli
CLI_LOGGING = 'extract_msg_cli'

args = utils.getCommandArgs(sys.argv[1:])
level = logging.INFO if args.verbose else logging.WARNING
currentdir = os.getcwdu() # Store this just in case the paths that have been given are relative

# Determine where to save the files to.
currentDir = os.getcwd() # Store this in case the path changes.
if args.out_path:
if not os.path.exists(args.out_path):
os.makedirs(args.out_path)
out = args.out_path
else:
out = currentdir
out = currentDir

if args.dev:
import extract_msg.dev
extract_msg.dev.main(args, sys.argv[1:])
@@ -30,29 +33,39 @@ def main():
from extract_msg import validation

valResults = {x[0]: validation.validate(x[0]) for x in args.msgs}
filename = 'validation {}.json'.format(int(time.time()))
filename = f'validation {int(time.time())}.json'
print('Validation Results:')
pprint.pprint(valResults)
print('These results have been saved to {}'.format(filename))
print(f'These results have been saved to {filename}')
with open(filename, 'w') as fil:
fil.write(json.dumps(valResults))
utils.getInput('Press enter to exit...')
json.dump(valResults, fil)
input('Press enter to exit...')
else:
if not args.dump_stdout:
utils.setupLogging(args.config_path, level, args.log, args.file_logging)

# Quickly make a dictionary for the keyword arguments.
kwargs = {
'customPath': out,
'customFilename': args.out_name,
'json': args.json,
'useMsgFilename': args.use_filename,
'contentId': args.cid,
'html': args.html,
'rtf': args.rtf,
'allowFallback': args.allowFallback,
}

for x in args.msgs:
try:
with Message(x[0]) as msg:
# Right here we should still be in the path in currentdir
with utils.openMsg(x[0]) as msg:
if args.dump_stdout:
print(msg.body)
else:
os.chdir(out)
msg.save(json = args.json, useMsgFilename = args.use_filename, contentId = args.cid, html = args.html, rtf = args.html, allowFallback = args.allowFallback)
msg.save(**kwargs)
except Exception as e:
print("Error with file '" + x[0] + "': " +
traceback.format_exc())
os.chdir(currentdir)
print(f'Error with file "{x[0]}": {traceback.format_exc()}')


if __name__ == '__main__':
main()
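Collecting the save options into one dictionary and unpacking it with `**kwargs` keeps the mapping from CLI flags to keyword arguments in one place, which makes slips like the old `rtf = args.html` easier to spot. A self-contained sketch: the `argparse.Namespace` fields mirror the ones used in `main()` above, but `fake_save` is a hypothetical stand-in, not extract_msg's `Message.save`.

```python
import argparse


def fake_save(**kwargs) -> dict:
    """Stand-in for Message.save: echo back the keyword arguments it received."""
    return kwargs


def build_save_kwargs(args: argparse.Namespace, out: str) -> dict:
    """Map parsed CLI arguments onto save() keyword arguments."""
    return {
        'customPath': out,
        'customFilename': args.out_name,
        'json': args.json,
        'useMsgFilename': args.use_filename,
        'contentId': args.cid,
        'html': args.html,
        'rtf': args.rtf,  # each flag maps to its own option
        'allowFallback': args.allowFallback,
    }
```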
2 changes: 1 addition & 1 deletion extract_msg/appointment.py
@@ -9,7 +9,7 @@ class Appointment(MessageBase):
"""

def __init__(self, path, prefix = '', attachmentClass = Attachment, filename = None, delayAttachments = False, overrideEncoding = None, attachmentErrorBehavior = constants.ATTACHMENT_ERROR_THROW, recipientSeparator = ';'):
MessageBase.__init__(self, path, prefix, attachmentClass, filename, delayAttachments, overrideEncoding, attachmentErrorBehavior, recipientSeparator)
super().__init__(path, prefix, attachmentClass, filename, delayAttachments, overrideEncoding, attachmentErrorBehavior, recipientSeparator)

@property
def appointmentClassType(self):
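The `Appointment` change follows the v0.30.0 note about preferring `super()` over direct superclass initialization. A minimal sketch with hypothetical classes (not extract_msg's) showing the practical difference: the explicit `Base.__init__(self, ...)` call skips any other class in the method resolution order, while `super()` passes `self` implicitly and lets a cooperative class in the hierarchy run too.

```python
class Base:
    def __init__(self, path: str, prefix: str = ''):
        self.path = path
        self.prefix = prefix


class OldStyle(Base):
    def __init__(self, path: str, prefix: str = ''):
        # Explicit superclass call: self is passed by hand and the MRO is bypassed.
        Base.__init__(self, path, prefix)


class NewStyle(Base):
    def __init__(self, path: str, prefix: str = ''):
        # Zero-argument super(): follows the MRO of the actual instance's class.
        super().__init__(path, prefix)


class Tracking(Base):
    """A cooperative class that also wants its __init__ to run."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.tracked = True


# MRO: NewWithMixin -> NewStyle -> Tracking -> Base, so Tracking.__init__ runs.
class NewWithMixin(NewStyle, Tracking):
    pass


# MRO: OldWithMixin -> OldStyle -> Tracking -> Base, but OldStyle jumps straight to Base.
class OldWithMixin(OldStyle, Tracking):
    pass
```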