This repository has been archived by the owner on Feb 19, 2021. It is now read-only.
Guesswork: File naming for attributes #686
Thanks @FrankBrandel for reporting.

Analysis of the problem

The guesswork is defined as follows here:

import re
from collections import OrderedDict

formats = "pdf|jpe?g|png|gif|tiff?|te?xt|md|csv"
REGEXES = OrderedDict([
    ("created-correspondent-title-tags", re.compile(
        r"^(?P<created>\d\d\d\d\d\d\d\d(\d\d\d\d\d\d)?Z) - "
        r"(?P<correspondent>.*) - "
        r"(?P<title>.*) - "
        r"(?P<tags>[a-z0-9\-,]*)"
        r"\.(?P<extension>{})$".format(formats),
        flags=re.IGNORECASE
    )),
    ("created-title-tags", re.compile(
        r"^(?P<created>\d\d\d\d\d\d\d\d(\d\d\d\d\d\d)?Z) - "
        r"(?P<title>.*) - "
        r"(?P<tags>[a-z0-9\-,]*)"
        r"\.(?P<extension>{})$".format(formats),
        flags=re.IGNORECASE
    )),
    ("created-correspondent-title", re.compile(
        r"^(?P<created>\d\d\d\d\d\d\d\d(\d\d\d\d\d\d)?Z) - "
        r"(?P<correspondent>.*) - "
        r"(?P<title>.*)"
        r"\.(?P<extension>{})$".format(formats),
        flags=re.IGNORECASE
    )),
    ("created-title", re.compile(
        r"^(?P<created>\d\d\d\d\d\d\d\d(\d\d\d\d\d\d)?Z) - "
        r"(?P<title>.*)"
        r"\.(?P<extension>{})$".format(formats),
        flags=re.IGNORECASE
    )),
    ("correspondent-title-tags", re.compile(
        r"(?P<correspondent>.*) - "
        r"(?P<title>.*) - "
        r"(?P<tags>[a-z0-9\-,]*)"
        r"\.(?P<extension>{})$".format(formats),
        flags=re.IGNORECASE
    )),
    ("correspondent-title", re.compile(
        r"(?P<correspondent>.*) - "
        r"(?P<title>.*)?"
        r"\.(?P<extension>{})$".format(formats),
        flags=re.IGNORECASE
    )),
    ("title", re.compile(
        r"(?P<title>.*)"
        r"\.(?P<extension>{})$".format(formats),
        flags=re.IGNORECASE
    ))
])

# Parse filename components: print every pattern that matches, in order.
for key, regex in REGEXES.items():
    # For this name, the first pattern that matches is created-title-tags.
    m = regex.match("20150314Z - Some Company Name - Invoice.pdf")
    # For this name, the first pattern that matches is created-correspondent-title.
    # m = regex.match("20150314Z - Some Company Name - Invoice whenever.pdf")
    if m:
        properties = m.groupdict()
        print(key)
        print(properties)

Possible Solutions

I'm not sure how the mechanism is intended to work. Can we assume that tags are always lowercase? Then we simply need to remove the

What do you guys think @pitkley @bauerj @MasterofJOKers?
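
To make the lowercase-tags question concrete, here is a minimal sketch of one way it could be tried. It is an assumption on my part, not the project's code: `FORMATS`, `CREATED_TITLE_TAGS_LOWER`, `CREATED_CORRESPONDENT_TITLE` and `guess` are hypothetical names, and only two of the patterns are reproduced. The tags pattern is compiled without the global `re.IGNORECASE`, so its `[a-z0-9\-,]` class accepts lowercase only, while the extension stays case-insensitive through a scoped `(?i:...)` group (Python 3.6+).

```python
import re

FORMATS = "pdf|jpe?g|png|gif|tiff?|te?xt|md|csv"

# Hypothetical variant of "created-title-tags": no global IGNORECASE, so the
# tags class only accepts lowercase letters, digits, "-" and ",".
# The extension is still matched case-insensitively via (?i:...).
CREATED_TITLE_TAGS_LOWER = re.compile(
    r"^(?P<created>\d\d\d\d\d\d\d\d(\d\d\d\d\d\d)?Z) - "
    r"(?P<title>.*) - "
    r"(?P<tags>[a-z0-9\-,]*)"
    r"\.(?P<extension>(?i:{}))$".format(FORMATS)
)

# Same shape as the "created-correspondent-title" pattern quoted above.
CREATED_CORRESPONDENT_TITLE = re.compile(
    r"^(?P<created>\d\d\d\d\d\d\d\d(\d\d\d\d\d\d)?Z) - "
    r"(?P<correspondent>.*) - "
    r"(?P<title>.*)"
    r"\.(?P<extension>{})$".format(FORMATS),
    flags=re.IGNORECASE,
)

# Order matters: the first pattern that matches wins.
PATTERNS = [
    ("created-title-tags", CREATED_TITLE_TAGS_LOWER),
    ("created-correspondent-title", CREATED_CORRESPONDENT_TITLE),
]


def guess(filename):
    """Return (pattern name, captured groups) for the first match, else None."""
    for key, regex in PATTERNS:
        m = regex.match(filename)
        if m:
            return key, m.groupdict()
    return None


print(guess("20150314Z - Some Company Name - Invoice.pdf"))
print(guess("20150314Z - Some Company Name - invoice.pdf"))
```

With this variant the reported filename falls through to `created-correspondent-title`, because "Invoice" starts with an uppercase letter. The trade-off is visible in the second call: an all-lowercase title such as "invoice" is still captured as a tag, and tags containing uppercase letters would no longer be recognised as tags.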
The consumer seems to have an issue detecting values from the file name when a certain pattern occurs.
Using the example from the documentation:
20150314Z - Some Company Name - Invoice 2016-01-01 - money,invoices.pdf
Works great, as expected.
Now try this:
20150314Z - Some Company Name - Invoice.pdf
In my instance it detects the date correctly, but "Some Company Name" as the title (should be correspondent) and "Invoice" as a tag (should be title).
The following pattern works, though. It fills in the correspondent and the title ("Invoice whenever") correctly and assigns no tags:
20150314Z - Some Company Name - Invoice whenever.pdf
Without the date there is also no problem:
Some Company Name - Invoice.pdf
My setup does not use PAPERLESS_FILENAME_DATE_ORDER or PAPERLESS_FILENAME_PARSE_TRANSFORMS.
From the documentation and my logic I don't see why no. 2 should not work.
Can somebody replicate this?
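
A small sketch that may help explain the difference between the cases above. It assumes the tags segment is matched by the character class `[a-z0-9\-,]*` under `re.IGNORECASE`, as in the patterns quoted in this thread; `TAGS` and `looks_like_tags` are hypothetical helper names used only for illustration.

```python
import re

# Character class taken from the "tags" group of the quoted patterns.
# re.IGNORECASE makes it accept uppercase letters as well.
TAGS = re.compile(r"[a-z0-9\-,]*", flags=re.IGNORECASE)


def looks_like_tags(segment):
    """Return True if the whole segment could be consumed by the tags group."""
    return TAGS.fullmatch(segment) is not None


for segment in ("money,invoices", "Invoice", "Invoice whenever"):
    print(segment, looks_like_tags(segment))
```

Under that assumption, "Invoice" is a valid tags segment because case is ignored, so a pattern that ends in tags can claim it before a correspondent/title pattern is tried. "Invoice whenever" contains a space, which the class does not allow, so it cannot be taken as tags.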