Version 0.29.0 #224

Merged
merged 42 commits into from
Jan 14, 2022
08759f5 progress report on 0.29.0 (TheElementalOfDestruction, May 9, 2021)
3236e00 Progress report on 0.29.0 (TheElementalOfDestruction, May 11, 2021)
9ea78c4 Progress Report (TheElementalOfDestruction, May 12, 2021)
8597f8b progress report (TheElementalOfDestruction, May 12, 2021)
9b8be08 progress report (TheElementalOfDestruction, May 12, 2021)
cf01a3c update (TheElementalOfDestruction, Jun 10, 2021)
711b83d Update the imports (TheElementalOfDestruction, Jun 10, 2021)
b8bcfd2 Progress Report (TheElementalOfDestruction, Jun 10, 2021)
b1172cc Fix for TeamMsgExtractor#201 (TheElementalOfDestruction, Jul 13, 2021)
91da283 Addition of the recipientSeparator field (TheElementalOfDestruction, Jul 14, 2021)
bae9b92 Progress report (TheElementalOfDestruction, Jul 14, 2021)
2084cde Progress report (TheElementalOfDestruction, Jul 14, 2021)
53915ee Fix (thanks for noticing) (TheElementalOfDestruction, Jul 14, 2021)
fecbd0f update (TheElementalOfDestruction, Aug 7, 2021)
144072e update (TheElementalOfDestruction, Aug 7, 2021)
4a9e47f Update (TheElementalOfDestruction, Aug 19, 2021)
90cb7d9 Progress report (TheElementalOfDestruction, Aug 23, 2021)
8d04913 Changelog fix (TheElementalOfDestruction, Aug 23, 2021)
0bc2306 Progress report. (TheElementalOfDestruction, Aug 29, 2021)
c51c67b Merge pull request #209 from TheElementalOfDestruction/master (TheElementalOfDestruction, Sep 2, 2021)
7abf67c Progress report: Finished implementation of filename limiting. (TheElementalOfDestruction, Sep 3, 2021)
bbcec85 Progress Report: Fixed unspecified date. (TheElementalOfDestruction, Sep 3, 2021)
ac7148c Progress report: Fixed more issues and oversights. (TheElementalOfDestruction, Sep 20, 2021)
34d49be Progress report: Added details to a comment in named about an issue. (TheElementalOfDestruction, Sep 27, 2021)
ceb6e3e Progress Report: Started working on extending the contact class furth… (TheElementalOfDestruction, Sep 27, 2021)
d05c378 Merge branch 'next-release' of https://github.com/TeamMsgExtractor/ms… (TheElementalOfDestruction, Sep 27, 2021)
9f812c4 Progress Report: Added more contact properties. (TheElementalOfDestruction, Sep 28, 2021)
f7740c5 Progress report: added ability to ensure set without type knowledge. (TheElementalOfDestruction, Sep 29, 2021)
d8ca718 Progress report: Minor changes. (TheElementalOfDestruction, Oct 13, 2021)
d49be4a Progress report: updated pull request template (TheElementalOfDestruction, Oct 22, 2021)
160f21a Update to some of the named property code. (TheElementalOfDestruction, Nov 1, 2021)
8224f1e Normalized encodings and fixed errors in changelog. (TheElementalOfDestruction, Nov 1, 2021)
4d57cb6 *Finally* added saving for HTML but haven't had time to extensively test (TheElementalOfDestruction, Nov 1, 2021)
c80f982 Fixed an issue where saving would save attachments before checking body (TheElementalOfDestruction, Nov 1, 2021)
d6e982a Fixed the extension being wrong after last update. (TheElementalOfDestruction, Nov 1, 2021)
ed08c9b Fixed the style of the injected header so it no longer has a bar on top. (TheElementalOfDestruction, Nov 1, 2021)
6b06bca Reorganized the html code so that it can be used outside of Message. (TheElementalOfDestruction, Nov 4, 2021)
997cec9 Started some of the work for injecting into RTF body. (TheElementalOfDestruction, Nov 25, 2021)
936f174 Fixed the command line documentation to reflect new repo (TheElementalOfDestruction, Dec 1, 2021)
ce8e28e *Finally* got the rtf implementation. Going to be doing some more tests. (TheElementalOfDestruction, Jan 13, 2022)
278616e Fixed the injectable header for plain rtf (TheElementalOfDestruction, Jan 13, 2022)
b2372f9 Ready for release. (TheElementalOfDestruction, Jan 14, 2022)
1 change: 1 addition & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -5,6 +5,7 @@
- [ ] If necessary, have you bumped the version number? We will usually do this for you.
- [ ] Have you included py.test tests with your pull request? (Not yet necessary)
- [ ] Ensured your code is as close to PEP 8 compliant as possible?
- [ ] Ensured your pull request is to the next-release branch?

If you haven't completed the above items, please wait to create a PR until you have done so. We will try to review and reply to PRs as quickly as possible.

58 changes: 58 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,61 @@
**v0.29.0**
* [[TeamMsgExtractor #207](https://github.com/TeamMsgExtractor/msg-extractor/issues/207)] Made it so that unspecified dates are handled properly. For clarification, an unspecified date is a custom value in MSG files for dates that means that the date is unspecified. It is distinctly different from a property not existing, which will still return None. For unspecified dates, `datetime.datetime.max` is returned. While perhaps not the best solution, it will have to do for now.
* Fixed an issue where `utils.parseType` was returning a string for the date when it makes more sense to return an actual datetime instance.
* [[TeamMsgExtractor #165](https://github.com/TeamMsgExtractor/msg-extractor/issues/165)] [[TeamMsgExtractor #191](https://github.com/TeamMsgExtractor/msg-extractor/issues/191)] Completely redesigned all existing save functions. You can now properly save to custom locations under custom file names. This change may break existing code for several reasons. First, all arguments have been changed to keyword arguments. Second, a few keyword arguments have been renamed to better fit the naming conventions.
* [[TeamMsgExtractor #200](https://github.com/TeamMsgExtractor/msg-extractor/issues/200)] Changed imports to use relative imports instead of hard imports where applicable.
* Updated the save functions to no longer rely on the current working directory to save things. The module now does what it can to use hard pathing so that if you spontaneously change working directory it will not cause problems. This should also allow for saving to be threaded, if I am correct.
* [[TeamMsgExtractor #197](https://github.com/TeamMsgExtractor/msg-extractor/issues/197)] Added new property `Message.defaultFolderName`. This property returns the default name to be used for a Message if none of the options change the name.
* [[TeamMsgExtractor #201](https://github.com/TeamMsgExtractor/msg-extractor/issues/201)] Fixed an issue where if the class type was all caps it would not be recognized. According to the documentation the comparisons should have been case insensitive, but I must have misread it at some point.
* [[TeamMsgExtractor #202](https://github.com/TeamMsgExtractor/msg-extractor/issues/202)] Module will now handle path lengths in a semi-intelligent way to determine how best to save the MSG files. Default path length max is 255.
* [[TeamMsgExtractor #203](https://github.com/TeamMsgExtractor/msg-extractor/issues/203)] Fixed an issue where having multiple "." characters in your file name would cause the directories to be incorrectly named when using the `useFileName` (now `useMsgFilename`) argument in the save function.
* [[TeamMsgExtractor #204](https://github.com/TeamMsgExtractor/msg-extractor/issues/204)] Fixed an issue where the failsafe name used by attachments wasn't being encoded beforehand, causing encoding errors.
* MSG files with a type of simply `IPM` will now be returned as `MSGFile` by `openMsg`, as this specifies that no format has been specified.
* [[TeamMsgExtractor #214](https://github.com/TeamMsgExtractor/msg-extractor/issues/214)] Attachments that error because the MSG class type wasn't recognized or isn't supported will now correctly be `UnsupportedAttachment` instead of `BrokenAttachment`.
* Improved internal code in many functions to make them faster and more efficient.
* `openMsg` will now tell you if a class type is simply unsupported rather than unrecognized. If the class type is found in the list of known but unsupported types, the function will raise `UnsupportedMSGTypeError`.
* Added caching to `MSGFile.listDir`. I found that if you have larger files this single function might be taking up over half of the processing time because of how many times it is used in the module.
* Fully implemented raw saving.
* Extended the `Contact` class to have more properties.
* Added new function `MSGFile._ensureSetTyped` which acts like the other ensure set functions but doesn't require you to know the type. Prefer the other ensure set functions when you know exactly what type it will be.
* Changed `Message.saveRaw` to `MSGFile.saveRaw`.
* Changed `MSGFile.saveRaw` to take a path and save the contents to a zip file.
* Corrected the help doc to reflect the current repository (was still on mattgwwalker).
* Fixed a bug that would cause an exception on trying to access the RTF body on a file that didn't have one. This is now correctly returning `None`.
* The `raw` keyword of `Message.save` now actually works.
* Added property `Attachment.randomFilename` which allows you to get the randomly generated name for attachments that don't have a usable one otherwise.
* Added function `Attachment.regenerateRandomName` for creating a new random name if necessary.
* Added function `Attachment.getFilename`. This function is used to get the name an attachment will be saved with given the specified arguments. Arguments are identical to `Attachment.save`.
* Changed the pull request template to reflect the new style.
* Added additional properties for defined MSG file fields.
* Added zip file support for `Attachment.save` and `Message.save`. Simply pass a path for the `zip` keyword argument and it will create a new `ZipFile` instance and save all of its data inside there. Alternatively, you can pass an instance of a class that is either a `ZipFile` or `ZipFile`-like and it will simply use that. When this argument is defined, the `customPath` argument refers to the path inside the zip file.
* Added the `html` and `rtf` keywords to `Message.save`. These will attempt to save the body in the html or rtf format, respectively. If the program cannot save in those formats, it will raise an exception unless the `allowFallback` keyword argument is `True`.
* Changed `utils.hasLen` to use `hasattr` instead of the try-except method it was using.
* Added new option `recipientSeparator` to `MessageBase` allowing you to specify a custom recipient separator (default is ";" to match Microsoft Outlook).
* Changed the `openMsg` function in `Attachment` to not be strict. This allows you to actually open the MSG file even if we don't recognize the type of embedded MSG that is being used.
* Attempted to normalize encoding names throughout the module so that a certain encoding will only show up using one name and not multiple.
* Finally figured out what CRC32 algorithm is used in named properties after directly asking in a Microsoft forum (see the thread [here](https://docs.microsoft.com/en-us/answers/questions/574894/ms-oxmsg-specifies-the-use-of-crc-32-checksums-wit.html)). Fortunately the algorithm is already defined in the `compressed-rtf` module so we can take advantage of that.
* Reworked `MessageBase._genRecipient` to improve it (because what on earth was that code it was using before?). Variables in the function are now more descriptive. Added comments in several places.
* Many renames to better fit naming convention:
    * `dev.setup_dev_logger` to `dev.setupDevLogger`.
    * `MSGFile.fix_path` to `MSGFile.fixPath`.
    * `MessageBase.save_attachments` to `MessageBase.saveAttachments`.
    * `*.Exists` to `*.exists`.
    * `*.ExistsTypedProperty` to `*.existsTypedProperty`.
    * `prop.create_prop` to `prop.createProp`.
    * `Properties.attachment_count` to `Properties.attachmentCount`.
    * `Properties.next_attachment_id` to `Properties.nextAttachmentId`.
    * `Properties.next_recipient_id` to `Properties.nextRecipientId`.
    * `Properties.recipient_count` to `Properties.recipientCount`.
    * `utils.get_command_args` to `utils.getCommandArgs`.
    * `utils.get_full_class_name` to `utils.getFullClassName`.
    * `utils.get_input` to `utils.getInput`.
    * `utils.has_len` to `utils.hasLen`.
    * `utils.setup_logging` to `utils.setupLogging`.
    * `constants.int_to_data_type` to `constants.intToDataType`.
    * `constants.int_to_intelligence` to `constants.intToIntelligence`.
    * `constants.int_to_recipient_type` to `constants.intToRecipientType`.
    * Misc internal function variables.
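
The first v0.29.0 entry distinguishes a date property that does not exist (`None`) from one that exists but is unspecified (`datetime.datetime.max`), so calling code may want to check for both. A minimal sketch of such a check follows; `describeDate` is a hypothetical helper for illustration and is not part of the module.

```python
import datetime


def describeDate(value):
    """
    Classify a value as it might come back from a date property.

    Hypothetical helper for illustration only; not part of extract_msg.
    """
    if value is None:
        # The property does not exist in the MSG file at all.
        return 'missing'
    if value == datetime.datetime.max:
        # The property exists but holds the "unspecified" marker.
        return 'unspecified'
    # Anything else is treated as a real date.
    return value.isoformat()
```

Checking for `None` first matters here, because comparing `None` against `datetime.datetime.max` would simply be `False` and the value would fall through to `isoformat()`.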

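Several v0.29.0 entries concern how `openMsg` reacts to the class type: comparisons are case insensitive, a bare `IPM` falls back to `MSGFile`, and a known but unsupported type raises `UnsupportedMSGTypeError` instead of being reported as unrecognized. A rough sketch of that decision order is below. It assumes prefix matching, and the tables `SUPPORTED` and `KNOWN_UNSUPPORTED`, the function `pickClass`, and the locally defined exceptions are all hypothetical stand-ins rather than the module's own code.

```python
class UnrecognizedMSGTypeError(Exception):
    """Local stand-in for the module's exception of the same name."""


class UnsupportedMSGTypeError(Exception):
    """Local stand-in for the module's exception of the same name."""


# Hypothetical tables for illustration; the real lists may differ.
SUPPORTED = {
    'ipm.note': 'Message',
    'ipm.contact': 'Contact',
    'ipm.appointment': 'Appointment',
}
KNOWN_UNSUPPORTED = ('ipm.task',)


def pickClass(classType):
    """Return the name of the class that would handle `classType`."""
    # Normalize once so every comparison below is case insensitive.
    ct = classType.lower()
    if ct == 'ipm':
        # No specific format was given, so use the generic class.
        return 'MSGFile'
    for prefix, name in SUPPORTED.items():
        if ct.startswith(prefix):
            return name
    # str.startswith accepts a tuple of prefixes.
    if ct.startswith(KNOWN_UNSUPPORTED):
        raise UnsupportedMSGTypeError(classType)
    raise UnrecognizedMSGTypeError(classType)
```

With these assumed tables, `pickClass('IPM.NOTE')` and `pickClass('ipm.note')` resolve to the same class, which is the behavior the #201 fix describes.
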
**v0.28.7**
* Added hex versions of the `MULTIPLE_X_BYTES` constants.
* Added `1048` to `constants.MULTIPLE_16_BYTES`
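
The #202 and #203 entries under v0.29.0 both deal with deriving safe names on disk. The sketch below shows one way such logic could look. It assumes that only the final extension is stripped from the MSG file name and that over-long names are shortened from the end of the stem; `folderNameFromMsg` and `limitName` are hypothetical helpers, and the module's actual strategy may differ.

```python
import os


def folderNameFromMsg(msgPath):
    """Derive a folder name from an MSG path, keeping inner dots."""
    base = os.path.basename(msgPath)
    # splitext removes only the last extension, so "a.b.msg" gives "a.b".
    return os.path.splitext(base)[0]


def limitName(directory, name, maxPathLength=255):
    """Shorten `name` so that directory + separator + name fits the limit."""
    # Room left for the name once the directory and one separator are counted.
    available = maxPathLength - len(directory) - 1
    if available < 1:
        raise ValueError('Directory path leaves no room for a file name.')
    if len(name) <= available:
        return name
    stem, ext = os.path.splitext(name)
    keep = available - len(ext)
    if keep < 1:
        raise ValueError('Not enough room to keep the file extension.')
    # Trim the stem and keep the extension intact.
    return stem[:keep] + ext
```

Keeping the extension intact when truncating is a design choice in this sketch, made so that shortened files still open with the right program.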
6 changes: 3 additions & 3 deletions README.rst
@@ -180,7 +180,7 @@ Credits

`Matthew Walker`_ - Original developer and owner

`Destiny Peterson (The Elemental of Destruction)`_ - Principle programmer, manager, and msg file "expert"
`Destiny Peterson (The Elemental of Destruction)`_ - Co-owner, principal programmer, knows more about msg files than anyone probably should

`JP Bourget`_ - Senior programmer, readability and organization expert, secondary manager

@@ -197,8 +197,8 @@ And thank you to everyone who has opened an issue and helped us track down those
.. |License: GPL v3| image:: https://img.shields.io/badge/License-GPLv3-blue.svg
:target: LICENSE.txt

.. |PyPI3| image:: https://img.shields.io/badge/pypi-0.28.7-blue.svg
:target: https://pypi.org/project/extract-msg/0.28.7/
.. |PyPI3| image:: https://img.shields.io/badge/pypi-0.29.0-blue.svg
:target: https://pypi.org/project/extract-msg/0.29.0/

.. |PyPI1| image:: https://img.shields.io/badge/python-2.7+-brightgreen.svg
:target: https://www.python.org/downloads/release/python-2715/
28 changes: 14 additions & 14 deletions extract_msg/__init__.py
@@ -27,20 +27,20 @@
# along with this program. If not, see <http://www.gnu.org/licenses/>.

__author__ = 'Destiny Peterson & Matthew Walker'
__date__ = '2021-03-02'
__version__ = '0.28.7'
__date__ = '2022-01-13'
__version__ = '0.29.0'

import logging

from extract_msg import constants
from extract_msg.appointment import Appointment
from extract_msg.attachment import Attachment
from extract_msg.contact import Contact
from extract_msg.exceptions import UnrecognizedMSGTypeError
from extract_msg.message import Message
from extract_msg.message_base import MessageBase
from extract_msg.msg import MSGFile
from extract_msg.prop import create_prop
from extract_msg.properties import Properties
from extract_msg.recipient import Recipient
from extract_msg.utils import openMsg, properHex
from . import constants
from .appointment import Appointment
from .attachment import Attachment
from .contact import Contact
from .exceptions import UnrecognizedMSGTypeError
from .message import Message
from .message_base import MessageBase
from .msg import MSGFile
from .prop import createProp
from .properties import Properties
from .recipient import Recipient
from .utils import openMsg, properHex
14 changes: 7 additions & 7 deletions extract_msg/__main__.py
@@ -10,7 +10,7 @@ def main():
# Setup logging to stdout, indicate running from cli
CLI_LOGGING = 'extract_msg_cli'

args = utils.get_command_args(sys.argv[1:])
args = utils.getCommandArgs(sys.argv[1:])
level = logging.INFO if args.verbose else logging.WARNING
currentdir = os.getcwdu() # Store this just in case the paths that have been given are relative
if args.out_path:
@@ -29,17 +29,17 @@ def main():

from extract_msg import validation

val_results = {x[0]: validation.validate(x[0]) for x in args.msgs}
valResults = {x[0]: validation.validate(x[0]) for x in args.msgs}
filename = 'validation {}.json'.format(int(time.time()))
print('Validation Results:')
pprint.pprint(val_results)
pprint.pprint(valResults)
print('These results have been saved to {}'.format(filename))
with open(filename, 'w') as fil:
fil.write(json.dumps(val_results))
utils.get_input('Press enter to exit...')
fil.write(json.dumps(valResults))
utils.getInput('Press enter to exit...')
else:
if not args.dump_stdout:
utils.setup_logging(args.config_path, level, args.log, args.file_logging)
utils.setupLogging(args.config_path, level, args.log, args.file_logging)
for x in args.msgs:
try:
with Message(x[0]) as msg:
@@ -48,7 +48,7 @@ def main():
print(msg.body)
else:
os.chdir(out)
msg.save(toJson = args.json, useFileName = args.use_filename, ContentId = args.cid)#, html = args.html, rtf = args.html, args.allowFallback)
msg.save(json = args.json, useMsgFilename = args.use_filename, contentId = args.cid, html = args.html, rtf = args.html, allowFallback = args.allowFallback)
except Exception as e:
print("Error with file '" + x[0] + "': " +
traceback.format_exc())
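
The `save` call above now forwards `html`, `rtf`, and `allowFallback`. According to the changelog, a requested format that cannot be produced raises an exception unless `allowFallback` is `True`. The sketch below illustrates that rule in isolation. The fallback order (HTML, then RTF, then plain text), the function `chooseBodyFormat`, and the use of `ValueError` are assumptions made for illustration, not the module's implementation.

```python
def chooseBodyFormat(available, html=False, rtf=False, allowFallback=False):
    """
    Pick which body format to save.

    `available` is a set of formats the message actually has,
    for example {'plain', 'rtf'}. Hypothetical helper.
    """
    if html:
        wanted = 'html'
    elif rtf:
        wanted = 'rtf'
    else:
        wanted = 'plain'

    if wanted in available:
        return wanted
    if not allowFallback:
        # Without fallback, a missing format is an error.
        raise ValueError('Body is not available as {}.'.format(wanted))
    # Assumed fallback order; try each remaining format in turn.
    for candidate in ('html', 'rtf', 'plain'):
        if candidate in available:
            return candidate
    raise ValueError('Message has no body to save.')
```

For example, `chooseBodyFormat({'plain', 'rtf'}, html=True, allowFallback=True)` falls back to `'rtf'` under the assumed order, while the same call without `allowFallback` raises.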
11 changes: 6 additions & 5 deletions extract_msg/appointment.py
@@ -1,14 +1,15 @@
from extract_msg import constants
from extract_msg.attachment import Attachment
from extract_msg.message_base import MessageBase
from . import constants
from .attachment import Attachment
from .message_base import MessageBase


class Appointment(MessageBase):
"""
Parser for Microsoft Outlook Appointment files.
"""

def __init__(self, path, prefix = '', attachmentClass = Attachment, filename = None, delayAttachments = False, overrideEncoding = None, attachmentErrorBehavior = constants.ATTACHMENT_ERROR_THROW):
MessageBase.__init__(self, path, prefix, attachmentClass, filename, delayAttachments, overrideEncoding, attachmentErrorBehavior)
def __init__(self, path, prefix = '', attachmentClass = Attachment, filename = None, delayAttachments = False, overrideEncoding = None, attachmentErrorBehavior = constants.ATTACHMENT_ERROR_THROW, recipientSeparator = ';'):
MessageBase.__init__(self, path, prefix, attachmentClass, filename, delayAttachments, overrideEncoding, attachmentErrorBehavior, recipientSeparator)

@property
def appointmentClassType(self):
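
The new `recipientSeparator` argument in the constructor above lets callers choose how recipient lists are joined. A minimal sketch of the effect, assuming the recipients are already formatted strings: `joinRecipients` is a hypothetical helper, the `';'` default follows the signature above, and the space added after the separator is an assumption.

```python
def joinRecipients(recipients, recipientSeparator=';'):
    """Join formatted recipient strings using the chosen separator."""
    # A space follows the separator for readability (assumed behavior).
    return (recipientSeparator + ' ').join(recipients)
```

Calling `joinRecipients(['a@example.org', 'b@example.org'], recipientSeparator=',')` would give a comma-separated list instead of the Outlook-style default.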