amazon module fails to download digital invoices #65

Open
dppdppd opened this issue Jan 16, 2022 · 13 comments

Comments

@dppdppd

dppdppd commented Jan 16, 2022

finance-dl formats the invoice URL as
https://www.amazon.com/gp/css/summary/print.html?ie=UTF8&orderID=D01-1380792-3469006

which results in an error.

This one, which matches the pattern I see when I manually visit digital order invoices, works:
https://www.amazon.com/gp/digital/your-account/order-summary.html/ref=ppx_yo_dt_b_dpi_o00?ie=UTF8&orderID=D01-1380792-3469006&print=1

Dunno if the URL has changed since the code was written or if I'm hitting a unique issue.
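
To make the difference between the two URL shapes concrete, here is a minimal sketch that picks a URL pattern per order. Both base URLs are copied from the report above. The rule "order IDs starting with `D` are digital", the function names, and dropping the `/ref=...` path segment (assumed to be only a tracking tag) are assumptions for illustration, not confirmed finance-dl behavior.

```python
from urllib.parse import urlencode

# Both base URLs come from the report above.
REGULAR_INVOICE_URL = "https://www.amazon.com/gp/css/summary/print.html"
DIGITAL_INVOICE_URL = (
    "https://www.amazon.com/gp/digital/your-account/order-summary.html"
)


def is_digital_order(order_id: str) -> bool:
    """Guess whether an order is digital from its ID (assumed heuristic)."""
    return order_id.startswith("D")


def invoice_url(order_id: str) -> str:
    """Build a printable-invoice URL for the given order ID (hypothetical helper)."""
    if is_digital_order(order_id):
        # Digital invoices use a different page and need print=1.
        query = urlencode({"ie": "UTF8", "orderID": order_id, "print": "1"})
        return f"{DIGITAL_INVOICE_URL}?{query}"
    query = urlencode({"ie": "UTF8", "orderID": order_id})
    return f"{REGULAR_INVOICE_URL}?{query}"


if __name__ == "__main__":
    print(invoice_url("D01-1380792-3469006"))
```

Whether the digital URL still works without the `/ref=...` segment would need to be checked in a logged-in browser session.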

@Zburatorul
Collaborator

Can you paste the trace? What's the exact line where the exception is thrown?

@dppdppd
Author

dppdppd commented Jan 17, 2022

2022-01-16 20:44:57,318 amazon.py:255 [INFO] Downloading invoice for order 'D01-1380792-3469006'
...
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: Waiting to match conditions

It times out because the browser is sitting on a "We're sorry!" page

[Screenshot 2022-01-17 134015]
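
Since the timeout comes from waiting on a page that will never contain an invoice, one option is to check the page source for the error text and fail fast. This is only a sketch: the marker strings are taken from the "We're sorry!" wording above, the real markup of Amazon's error page is not known here, and `InvoiceUnavailable` and `check_invoice_page` are hypothetical names, not part of finance-dl.

```python
# Text from the error page described above; the exact HTML is an assumption,
# so both the straight and the curly apostrophe are checked.
ERROR_MARKERS = ("We're sorry", "We\u2019re sorry")


class InvoiceUnavailable(Exception):
    """Raised when the browser lands on an error page instead of an invoice."""


def check_invoice_page(page_source: str, order_id: str) -> str:
    """Return page_source unchanged, or raise if it looks like the error page."""
    if any(marker in page_source for marker in ERROR_MARKERS):
        raise InvoiceUnavailable(
            f"Order {order_id!r}: got an error page instead of an invoice"
        )
    return page_source
```

Raising a specific exception here would surface the bad URL immediately instead of after the full wait timeout.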

@Zburatorul
Collaborator

It looks like I might have introduced this issue in #46, where I did not consider the case of digital orders.
Can you check out the commit before my PR (70e9385) and see if the issue is present for you?

@dppdppd
Author

dppdppd commented Jan 18, 2022

It downloads digital orders just fine when I switch to that commit.

@Zburatorul
Collaborator

Can you confirm if your terminal shows the line "Found likely Amazon Fresh order. Falling back to direct invoice URL." before the script crashes?

@dppdppd
Author

dppdppd commented Jan 19, 2022

Doesn't look like it. This is the entirety of the output:

2022-01-19 10:50:00,350 amazon.py:223 [INFO] Skipping order group: '1998'
2022-01-19 10:50:00,398 amazon.py:223 [INFO] Skipping order group: '1997'
2022-01-19 10:50:00,444 amazon.py:223 [INFO] Skipping order group: '1996'
2022-01-19 10:50:00,494 amazon.py:223 [INFO] Skipping order group: '1995'
2022-01-19 10:50:00,525 amazon.py:255 [INFO] Downloading invoice for order 'D01-5823337-6656218'
Traceback (most recent call last):
  File "/home/ido/.local/lib/python3.9/site-packages/finance_dl/scrape_lib.py", line 408, in retry
    return func()
  File "/home/ido/.local/lib/python3.9/site-packages/finance_dl/scrape_lib.py", line 428, in fetch
    scraper.run()
  File "/home/ido/.local/lib/python3.9/site-packages/finance_dl/amazon.py", line 282, in run
    self.get_orders(regular=self.regular, digital=self.digital)
  File "/home/ido/.local/lib/python3.9/site-packages/finance_dl/amazon.py", line 249, in get_orders
    self.retrieve_invoices(invoice_hrefs)
  File "/home/ido/.local/lib/python3.9/site-packages/finance_dl/amazon.py", line 267, in retrieve_invoices
    page_source, = self.wait_and_return(get_source)
  File "/home/ido/.local/lib/python3.9/site-packages/finance_dl/scrape_lib.py", line 244, in wait_and_return
    WebDriverWait(self.driver, timeout).until(predicate, message=message)
  File "/home/ido/.local/lib/python3.9/site-packages/selenium/webdriver/support/wait.py", line 80, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: Waiting to match conditions

Waiting 0 seconds before retrying
 --connect=http://127.0.0.1:59833 --session-id=1f390d5a8651a03a26497cdc7e766b17
2022-01-19 10:50:33,871 amazon.py:109 [INFO] Initiating log in
2022-01-19 10:50:36,273 amazon.py:116 [INFO] You must be already logged in!
2022-01-19 10:50:41,160 amazon.py:223 [INFO] Skipping order group: 'last 30 days'
2022-01-19 10:50:41,224 amazon.py:223 [INFO] Skipping order group: 'past 3 months'
2022-01-19 10:50:41,284 amazon.py:223 [INFO] Skipping order group: '2022'

@Zburatorul
Collaborator

Something doesn't add up.
If the Fresh log message is not present, then the code is not creating the URL but is extracting it from the page, which should result in a correct URL.

I suggest you sprinkle some logger.info calls in various places to see what's happening.
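
As a rough sketch of that kind of debugging, the helper below logs every invoice link before it is fetched. It assumes `invoice_hrefs` is an iterable of `(href, order_id)` pairs, which is a guess based on the traceback above; the logger name and `log_invoice_links` are hypothetical.

```python
import logging

# Hypothetical logger name; any name works with the standard logging module.
logger = logging.getLogger("amazon_debug")


def log_invoice_links(invoice_hrefs):
    """Log each (href, order_id) pair and return the pairs unchanged."""
    pairs = list(invoice_hrefs)
    for href, order_id in pairs:
        logger.info("Order %s -> invoice URL %s", order_id, href)
    return pairs


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    log_invoice_links([
        ("https://www.amazon.com/gp/css/summary/print.html"
         "?ie=UTF8&orderID=D01-1380792-3469006", "D01-1380792-3469006"),
    ])
```

Calling something like this just before the invoices are retrieved would show whether the URL was extracted from the page or constructed by the code.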

@dppdppd
Author

dppdppd commented Jan 20, 2022

Dunno if it's related, but HEAD scrapes the wrong group. If I set it to 2022, I can watch it scrape 2021.

Commit 70e9385 scrapes the correct group.

@Zburatorul
Collaborator

That's strange, because my PR did not touch any of the group logic.

We had an off-by-one issue with the drop-down menu in the Amazon downloader a while ago, but I think that got fixed.
I can't reproduce either of your two issues with master. Digital orders download just fine for me.

@dppdppd
Author

dppdppd commented Jan 21, 2022

Odd. I wiped the 2022 dir and repeated it. I get both 2022 and 2021 invoices in there. This is my config:

def CONFIG_amazon_2022():
    return dict(
        order_groups=[
            "2022",
        ],
        module='finance_dl.amazon',
        digital=True,
        credentials={
            'username': XXXX,  # redacted
            'password': XXXX,  # redacted
        },
        output_directory=os.path.join(data_dir, 'amazon', "amazon_2022"),
        profile_dir=os.path.join(profile_dir, 'amazon'),
    )

Anyway, using that older commit I was able to download 3048 invoices going back to the year 2000, 1133 of which were digital invoices that HEAD would not fetch. Amazon legitimately can't produce 4 of them, so I had to stub those files so the script would pass over them.

Feel free to close this out. I have a version of the code that works for me, and unless anyone else is having issues, I would not prioritize an issue you can't reproduce.
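
For anyone who needs the same stubbing workaround, a minimal sketch follows. It assumes the downloader skips an order when a file named `<order_id>.html` already exists in the output directory; that naming scheme and the `stub_missing_invoices` helper are assumptions, so check how your installed version names its files first.

```python
import os


def stub_missing_invoices(output_directory, order_ids, suffix=".html"):
    """Create an empty placeholder per order ID unless a file already exists.

    Returns the list of paths that were newly created.
    """
    os.makedirs(output_directory, exist_ok=True)
    created = []
    for order_id in order_ids:
        path = os.path.join(output_directory, order_id + suffix)
        if not os.path.exists(path):
            # Opening in write mode and closing creates an empty stub file.
            with open(path, "w", encoding="utf-8"):
                pass
            created.append(path)
    return created
```

Existing files are never overwritten, so running it twice is harmless.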

@mjjohnson

I'm hitting both of these issues; I only get orders from 2020 when I specify order_groups=['2021'], and it fails while trying to download digital orders, with the same error page @dppdppd described.

I installed finance-dl using pip install finance-dl, which got me finance-dl-1.3.3.

@mjjohnson

Comparing the v1.3.3 tag to master, I see several commit messages that mention various fixes for Amazon. Maybe it would be worth it to cut a new release and push it up to PyPI?

@arnold-c

arnold-c commented Sep 6, 2022

Adding some more information: the order groups don't download correctly for me either. Setting 2022 downloads 2021 invoices, and so on, and not specifying an order group results in a timeout error. Interestingly, setting the order group to "past 3 months" downloads 2022 invoices, so it seems like everything is being shifted 'down' the order group hierarchy by one level.
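
The shift described above is what an off-by-one index into the drop-down would produce. The sketch below is an illustration only, not a diagnosis of finance-dl's code: the first three menu labels are taken from the log output earlier in this thread, the remaining entries and the `select_by_label` helper are hypothetical.

```python
def select_by_label(labels, wanted):
    """Return the index of the menu entry whose text matches the wanted group."""
    normalized = [label.strip().lower() for label in labels]
    try:
        return normalized.index(wanted.strip().lower())
    except ValueError:
        raise LookupError(
            f"order group {wanted!r} not in menu: {labels}"
        ) from None


# Assumed menu order, newest first.
menu = ["last 30 days", "past 3 months", "2022", "2021", "2020"]

correct = select_by_label(menu, "2022")
shifted = correct + 1  # simulates an off-by-one when picking the entry

if __name__ == "__main__":
    print("requested 2022, correct entry:", menu[correct])
    print("requested 2022, shifted entry:", menu[shifted])
    print("requested 'past 3 months', shifted entry:",
          menu[select_by_label(menu, "past 3 months") + 1])
```

Matching on the visible label rather than on a position avoids this class of bug if the menu gains or loses an entry.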
