Dependencies: Upgrade python-docx to 1.1.0
BLKSerene committed Dec 18, 2023
1 parent 940b186 commit 6dae1bb
Showing 8 changed files with 31 additions and 60 deletions.
2 changes: 1 addition & 1 deletion ACKS.md
@@ -41,7 +41,7 @@ As Wordless stands on the shoulders of giants, I hereby extend my sincere gratit
17|[Pyphen](https://pyphen.org/)|0.14.0|Guillaume Ayoub|[GPL-2.0-or-later/LGPL-2.1-or-later/MPL-1.1](https://github.com/Kozea/Pyphen/blob/master/LICENSE)
18|[PyQt](https://riverbankcomputing.com/software/pyqt/)|5.15.10|Riverbank Computing|[Commercial-License/GPL-3.0-only](https://www.riverbankcomputing.com/static/Docs/PyQt5/introduction.html#license)
19|[PyThaiNLP](https://github.com/PyThaiNLP/pythainlp)|4.0.2|Wannaphong Phatthiyaphaibun (วรรณพงษ์ ภัททิยไพบูลย์)|[Apache-2.0](https://github.com/PyThaiNLP/pythainlp/blob/dev/LICENSE)
-20|[python-docx](https://github.com/python-openxml/python-docx)|0.8.11|Steve Canny|[MIT](https://github.com/python-openxml/python-docx/blob/master/LICENSE)
+20|[python-docx](https://github.com/python-openxml/python-docx)|1.1.0|Steve Canny|[MIT](https://github.com/python-openxml/python-docx/blob/master/LICENSE)
21|[python-mecab-ko](https://github.com/jonghwanhyeon/python-mecab-ko)|1.3.3|Jonghwan Hyeon|[BSD-3-Clause](https://github.com/jonghwanhyeon/python-mecab-ko/blob/main/LICENSE)
22|[Requests](https://github.com/psf/requests)|2.31.0|Kenneth Reitz|[Apache-2.0](https://github.com/psf/requests/blob/main/LICENSE)
23|[Sacremoses](https://github.com/alvations/sacremoses)|0.0.53|Liling Tan|[MIT](https://github.com/alvations/sacremoses/blob/master/LICENSE)
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -30,6 +30,7 @@
- Dependencies: Upgrade Lingua to 2.0.2
- Dependencies: Upgrade pymorphy3 to 1.3.1
- Dependencies: Upgrade PyQt to 5.15.10
+- Dependencies: Upgrade python-docx to 1.1.0
- Dependencies: Upgrade spaCy to 3.7.2
- Dependencies: Upgrade spacy-pkuseg to 0.0.33

2 changes: 1 addition & 1 deletion doc/trs/zho_cn/ACKS.md
@@ -41,7 +41,7 @@
17|[Pyphen](https://pyphen.org/)|0.14.0|Guillaume Ayoub|[GPL-2.0-or-later/LGPL-2.1-or-later/MPL-1.1](https://github.com/Kozea/Pyphen/blob/master/LICENSE)
18|[PyQt](https://riverbankcomputing.com/software/pyqt/)|5.15.10|Riverbank Computing|[Commercial-License/GPL-3.0-only](https://www.riverbankcomputing.com/static/Docs/PyQt5/introduction.html#license)
19|[PyThaiNLP](https://github.com/PyThaiNLP/pythainlp)|4.0.2|Wannaphong Phatthiyaphaibun (วรรณพงษ์ ภัททิยไพบูลย์)|[Apache-2.0](https://github.com/PyThaiNLP/pythainlp/blob/dev/LICENSE)
-20|[python-docx](https://github.com/python-openxml/python-docx)|0.8.11|Steve Canny|[MIT](https://github.com/python-openxml/python-docx/blob/master/LICENSE)
+20|[python-docx](https://github.com/python-openxml/python-docx)|1.1.0|Steve Canny|[MIT](https://github.com/python-openxml/python-docx/blob/master/LICENSE)
21|[python-mecab-ko](https://github.com/jonghwanhyeon/python-mecab-ko)|1.3.3|Jonghwan Hyeon|[BSD-3-Clause](https://github.com/jonghwanhyeon/python-mecab-ko/blob/main/LICENSE)
22|[Requests](https://github.com/psf/requests)|2.31.0|Kenneth Reitz|[Apache-2.0](https://github.com/psf/requests/blob/main/LICENSE)
23|[Sacremoses](https://github.com/alvations/sacremoses)|0.0.53|Liling Tan|[MIT](https://github.com/alvations/sacremoses/blob/master/LICENSE)
2 changes: 1 addition & 1 deletion doc/trs/zho_tw/ACKS.md
@@ -41,7 +41,7 @@
17|[Pyphen](https://pyphen.org/)|0.14.0|Guillaume Ayoub|[GPL-2.0-or-later/LGPL-2.1-or-later/MPL-1.1](https://github.com/Kozea/Pyphen/blob/master/LICENSE)
18|[PyQt](https://riverbankcomputing.com/software/pyqt/)|5.15.10|Riverbank Computing|[Commercial-License/GPL-3.0-only](https://www.riverbankcomputing.com/static/Docs/PyQt5/introduction.html#license)
19|[PyThaiNLP](https://github.com/PyThaiNLP/pythainlp)|4.0.2|Wannaphong Phatthiyaphaibun (วรรณพงษ์ ภัททิยไพบูลย์)|[Apache-2.0](https://github.com/PyThaiNLP/pythainlp/blob/dev/LICENSE)
-20|[python-docx](https://github.com/python-openxml/python-docx)|0.8.11|Steve Canny|[MIT](https://github.com/python-openxml/python-docx/blob/master/LICENSE)
+20|[python-docx](https://github.com/python-openxml/python-docx)|1.1.0|Steve Canny|[MIT](https://github.com/python-openxml/python-docx/blob/master/LICENSE)
21|[python-mecab-ko](https://github.com/jonghwanhyeon/python-mecab-ko)|1.3.3|Jonghwan Hyeon|[BSD-3-Clause](https://github.com/jonghwanhyeon/python-mecab-ko/blob/main/LICENSE)
22|[Requests](https://github.com/psf/requests)|2.31.0|Kenneth Reitz|[Apache-2.0](https://github.com/psf/requests/blob/main/LICENSE)
23|[Sacremoses](https://github.com/alvations/sacremoses)|0.0.53|Liling Tan|[MIT](https://github.com/alvations/sacremoses/blob/master/LICENSE)
Binary file modified tests/files/wl_file_area/file_types/docx.docx
Binary file not shown.
2 changes: 1 addition & 1 deletion tests/tests_file_area/test_file_area_file_types.py
@@ -169,7 +169,7 @@ def update_gui_file_types(err_msg, new_files):
assert file_text.tokens_multilevel == [[[['Lorem', 'ipsum', 'dolor', 'sit', 'amet', ','], ['consetetur', 'sadipscing', 'elitr', ','], ['sed', 'diam', 'nonumy', 'eirmod']]], [[['tempor', 'invidunt', 'ut', 'labore', 'et', 'dolore', 'magna', 'aliquyam', 'erat', ','], ['sed', 'diam', 'voluptua', '.']], [['At', 'vero']]], [[['eos', 'et', 'accusam', 'et', 'justo', 'duo', 'dolores', 'et', 'ea', 'rebum', '.']], [['Stet', 'clita', 'kasd', 'gubergren', ','], ['no', 'sea', 'taki-']]], [[['mata', 'sanctus', 'est', 'Lorem', 'ipsum', 'dolor', 'sit', 'amet', '.']], [['Lorem', 'ipsum', 'dolor', 'sit', 'amet', ','], ['consetetur']]], [[['sadipscing', 'elitr', ','], ['sed', 'diam', 'nonumy', 'eirmod', 'tempor', 'invidunt', 'ut', 'labore', 'et', 'dolore', 'magna']]], [[['aliquyam', 'erat', ','], ['sed', 'diam', 'voluptua', '.']], [['At', 'vero', 'eos', 'et', 'accusam', 'et', 'justo', 'duo', 'dolores', 'et', 'ea']]], [[['rebum', '.']], [['Stet', 'clita', 'kasd', 'gubergren', ','], ['no', 'sea', 'takimata', 'sanctus', 'est', 'Lorem', 'ipsum', 'dolor', 'sit']]], [[['amet', '.']]], [[['1']]]]
# Word documents
elif file_name == 'docx.txt':
-assert file_text.tokens_multilevel == [[], [], [[['Heading']]], [], [], [[['This', 'is', 'the', 'first', 'sentence', '.']], [['This', 'is', 'the', 'second', 'sentence', '.']]], [], [], [[['This', 'is', 'the', 'third', 'sentence', '.']]], [], [[['2', '-', '2', '&', '2', '-', '3']], [['2', '-', '4']]], [[['3', '-', '2', '&', '4', '-', '2']], [['3', '-', '3']], [['3', '-', '4']]], [[['4', '-', '3']], [['4', '-', '4']]], [[['5', '-', '2', '5', '-', '3', '5', '-', '4', '5', '-', '4', '-', '1', '5', '-', '4', '-', '2', '5', '-', '4', '-', '3', '5', '-', '4', '-', '4']]], [], [], []]
+assert file_text.tokens_multilevel == [[], [[['Heading']]], [], [], [[['This', 'is', 'the', 'first', 'sentence', '.']], [['This', 'is', 'the', 'second', 'sentence', '.']]], [[['This', 'is', 'the', 'third', 'sentence', '.']]], [], [], [[['2', '-', '2/3']], [['2', '-', '4']]], [[['3/4', '-', '2', '3', '-', '3', '3', '-', '4']]], [[['4', '-', '3', '4', '-', '4', '4', '-', '4', '-', '1/2', '4', '-', '4', '-', '3/5', '4', '-', '4', '-', '4', '4', '-', '4', '-', '6']]], [], []]
# XML files
elif file_name == 'xml.xml':
assert file_text.tokens_multilevel == [[[['FACTSHEET', 'WHAT', 'IS', 'AIDS', '?']]], [[['AIDS', '(', 'Acquired', 'Immune', 'Deficiency', 'Syndrome', ')', 'is', 'a', 'condition', 'caused', 'by', 'a', 'virus', 'called', 'HIV', '(', 'Human', 'Immuno', 'Deficiency', 'Virus', ')', '.']], [['This', 'virus', 'affects', 'the', 'body', "'s", 'defence', 'system', 'so', 'that', 'it', 'can', 'not', 'fight', 'infection', '.']]]]
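The expected values in this test are deeply nested lists. Judging only from the data shown here, the levels appear to be paragraph, sentence, sentence segment, and token; that reading is an assumption, not something the diff states. A minimal sketch of how such a structure could be flattened or summarized, using the hypothetical helpers `flatten_tokens` and `count_sentences`:

```python
# Sample taken from the start of the updated expected value for Word documents.
# Assumed nesting (inferred from the data): paragraphs > sentences > segments > tokens.
tokens_multilevel = [
    [],
    [[['Heading']]],
    [],
    [],
    [
        [['This', 'is', 'the', 'first', 'sentence', '.']],
        [['This', 'is', 'the', 'second', 'sentence', '.']],
    ],
]


def flatten_tokens(paragraphs):
    """Return every token in document order as one flat list."""
    return [
        token
        for paragraph in paragraphs
        for sentence in paragraph
        for segment in sentence
        for token in segment
    ]


def count_sentences(paragraphs):
    """Count the second-level items; empty paragraphs contribute nothing."""
    return sum(len(paragraph) for paragraph in paragraphs)


flat = flatten_tokens(tokens_multilevel)
num_sentences = count_sentences(tokens_multilevel)
```

Empty inner lists correspond to blank lines in the extracted text, which is why the number and position of `[]` entries changes when the extraction logic changes.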
2 changes: 1 addition & 1 deletion utils/wl_generate_acks.py
@@ -55,7 +55,7 @@
['Pyphen', 'https://pyphen.org/', '0.14.0', 'Guillaume Ayoub', 'GPL-2.0-or-later/LGPL-2.1-or-later/MPL-1.1', 'https://github.com/Kozea/Pyphen/blob/master/LICENSE'],
['PyQt', 'https://riverbankcomputing.com/software/pyqt/', '5.15.10', 'Riverbank Computing', 'Commercial-License/GPL-3.0-only', 'https://www.riverbankcomputing.com/static/Docs/PyQt5/introduction.html#license'],
['PyThaiNLP', 'https://github.com/PyThaiNLP/pythainlp', '4.0.2', 'Wannaphong Phatthiyaphaibun (วรรณพงษ์ ภัททิยไพบูลย์)', 'Apache-2.0', 'https://github.com/PyThaiNLP/pythainlp/blob/dev/LICENSE'],
-['python-docx', 'https://github.com/python-openxml/python-docx', '0.8.11', 'Steve Canny', 'MIT', 'https://github.com/python-openxml/python-docx/blob/master/LICENSE'],
+['python-docx', 'https://github.com/python-openxml/python-docx', '1.1.0', 'Steve Canny', 'MIT', 'https://github.com/python-openxml/python-docx/blob/master/LICENSE'],
['python-mecab-ko', 'https://github.com/jonghwanhyeon/python-mecab-ko', '1.3.3', 'Jonghwan Hyeon', 'BSD-3-Clause', 'https://github.com/jonghwanhyeon/python-mecab-ko/blob/main/LICENSE'],
['Requests', 'https://github.com/psf/requests', '2.31.0', 'Kenneth Reitz', 'Apache-2.0', 'https://github.com/psf/requests/blob/main/LICENSE'],
['Sacremoses', 'https://github.com/alvations/sacremoses', '0.0.53', 'Liling Tan', 'MIT', 'https://github.com/alvations/sacremoses/blob/master/LICENSE'],
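The entries in this list line up field by field with the rows of the `ACKS.md` tables changed earlier in the commit. The script's actual rendering code is not part of this diff, so the following is only a sketch of how one such entry could be turned into a table row; `render_rows` and its `start` parameter are hypothetical names.

```python
# Entry layout as it appears in the diff:
# [name, home page, version, authors, license, license URL]
acks = [
    ['python-docx', 'https://github.com/python-openxml/python-docx', '1.1.0',
     'Steve Canny', 'MIT',
     'https://github.com/python-openxml/python-docx/blob/master/LICENSE'],
    ['Requests', 'https://github.com/psf/requests', '2.31.0',
     'Kenneth Reitz', 'Apache-2.0',
     'https://github.com/psf/requests/blob/main/LICENSE'],
]


def render_rows(entries, start=1):
    """Render entries as pipe-separated rows, numbered from `start` (hypothetical helper)."""
    rows = []

    for i, (name, home, version, authors, license_name, license_url) in enumerate(entries, start=start):
        rows.append(f'{i}|[{name}]({home})|{version}|{authors}|[{license_name}]({license_url})')

    return rows


rows = render_rows(acks, start=20)
```

Keeping the version in a single list like this means a dependency upgrade touches one source line, with the three `ACKS.md` files regenerated from it.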
80 changes: 25 additions & 55 deletions wordless/wl_file_area.py
@@ -26,11 +26,6 @@

import bs4
import docx
-from docx.document import Document
-from docx.oxml.table import CT_Tbl
-from docx.oxml.text.paragraph import CT_P
-from docx.table import _Cell, Table
-from docx.text.paragraph import Paragraph
import openpyxl
import pypdf
from PyQt5.QtCore import pyqtSignal, QCoreApplication, QItemSelection, QRect, Qt
@@ -839,19 +834,20 @@ def run(self):

     new_file['text'] = soup.get_text()
 # Word documents
+# Reference: https://github.com/python-openxml/python-docx/issues/40#issuecomment-1793226714
 elif file_ext == '.docx':
     lines = []
     doc = docx.Document(file_path)

-    for block in self.iter_block_items(doc):
-        if isinstance(block, docx.text.paragraph.Paragraph):
-            lines.append(block.text)
-        elif isinstance(block, docx.table.Table):
-            for row in self.iter_visual_cells(block):
-                cells = []
-
-                for cell in row:
-                    cells.append(' '.join([item.text for item in self.iter_cell_items(cell)]))
+    for item in doc.iter_inner_content():
+        if isinstance(item, docx.text.paragraph.Paragraph):
+            lines.append(item.text)
+        elif isinstance(item, docx.table.Table):
+            for row in self.iter_visual_cells(item):
+                cells = [
+                    ' '.join([cell_item.text for cell_item in self.iter_block_items(cell)])
+                    for cell in row
+                ]

                 lines.append('\t'.join(cells))
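The hunk above turns each top-level paragraph into one line and each table row into one tab-separated line, with the text of a cell (including any nested tables) joined by spaces. The sketch below reproduces that flow without python-docx so it can run anywhere: `Para`, `Cell`, and `Table` are hypothetical stand-ins for the library's objects, and the merged-cell handling done by `iter_visual_cells` is deliberately left out.

```python
class Para:
    """Hypothetical stand-in for a python-docx paragraph."""
    def __init__(self, text):
        self.text = text


class Cell:
    """Hypothetical stand-in for a block container (document body or table cell)."""
    def __init__(self, *items):
        self.items = items

    def iter_inner_content(self):
        # Mirrors the method name used in the diff: yields paragraphs and tables in order.
        return iter(self.items)


class Table:
    """Hypothetical stand-in for a table, stored as a list of rows of cells."""
    def __init__(self, rows):
        self.rows = rows


def iter_paragraphs(container):
    """Yield paragraphs in document order, descending into nested tables."""
    for item in container.iter_inner_content():
        if isinstance(item, Para):
            yield item
        elif isinstance(item, Table):
            for row in item.rows:
                for cell in row:
                    yield from iter_paragraphs(cell)


def extract_lines(body):
    """One line per top-level paragraph, one tab-separated line per table row."""
    lines = []

    for item in body.iter_inner_content():
        if isinstance(item, Para):
            lines.append(item.text)
        elif isinstance(item, Table):
            for row in item.rows:
                cells = [' '.join(p.text for p in iter_paragraphs(cell)) for cell in row]
                lines.append('\t'.join(cells))

    return lines


nested = Table([[Cell(Para('4-4-1')), Cell(Para('4-4-2'))]])
body = Cell(
    Para('Heading'),
    Table([
        [Cell(Para('2-2')), Cell(Para('2-3'))],
        [Cell(Para('4-3')), Cell(Para('4-4'), nested)],
    ]),
    Para('Last'),
)
lines = extract_lines(body)
```

A nested table therefore never produces its own line; its text is folded into the line of the row that contains it.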

@@ -935,58 +931,32 @@ def run(self):
self.progress_updated.emit(self.tr('Updating table...'))
self.worker_done.emit(err_msg, new_files)

-# Reference: https://github.com/python-openxml/python-docx/issues/276
-def iter_block_items(self, parent):
-    """
-    Yield each paragraph and table child within *parent*, in document order.
-    Each returned value is an instance of either Table or Paragraph. *parent*
-    would most commonly be a reference to a main Document object, but
-    also works for a _Cell object, which itself can contain paragraphs and tables.
-    """
-    if isinstance(parent, Document):
-        parent_elm = parent.element.body
-    elif isinstance(parent, _Cell):
-        parent_elm = parent._tc
-    else:
-        raise ValueError("something's not right")
-
-    for child in parent_elm.iterchildren():
-        if isinstance(child, CT_P):
-            yield Paragraph(child, parent)
-        elif isinstance(child, CT_Tbl):
-            yield Table(child, parent)
-
-def iter_cell_items(self, parent):
-    parent_elm = parent._tc
-
-    for child in parent_elm.iterchildren():
-        if isinstance(child, CT_P):
-            yield Paragraph(child, parent)
-        elif isinstance(child, CT_Tbl):
-            table = Table(child, parent)
-
-            for row in table.rows:
-                for cell in row.cells:
-                    yield from self.iter_cell_items(cell)
-
-# Reference: https://github.com/python-openxml/python-docx/issues/40
+# Reference: https://github.com/python-openxml/python-docx/issues/40#issuecomment-1793226714
+def iter_block_items(self, blkcntnr):
+    for item in blkcntnr.iter_inner_content():
+        if isinstance(item, docx.text.paragraph.Paragraph):
+            yield item
+        elif isinstance(item, docx.table.Table):
+            for row in self.iter_visual_cells(item):
+                for cell in row:
+                    yield from self.iter_block_items(cell)

 # Reference: https://github.com/python-openxml/python-docx/issues/344#issuecomment-271390490
 def iter_visual_cells(self, table):
-    prior_tcs = []
     visual_cells = []
+    prior_tcs = set()

     for row in table.rows:
         visual_cells.append([])

         for cell in row.cells:
-            this_tc = cell._tc
-
-            if this_tc in prior_tcs: # skip cells pointing to same `<w:tc>` element
+            if cell._tc in prior_tcs: # skip cells pointing to same `<w:tc>` element
                 continue
-            else:
-                prior_tcs.append(this_tc)

             visual_cells[-1].append(cell)

+            prior_tcs.add(cell._tc)

     return visual_cells

class Wl_Worker_Open_Files(wl_threading.Wl_Worker):
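The change to `iter_visual_cells` swaps the list of already-seen `<w:tc>` elements for a set, so each membership check no longer scans everything seen so far. The sketch below shows the same de-duplication with a hypothetical `StubCell` class standing in for python-docx cells; it assumes, as the comment in the diff suggests, that merged cells are reported more than once and share the same `_tc` object.

```python
class StubCell:
    """Hypothetical stand-in for a python-docx cell; merged cells share one `_tc`."""
    def __init__(self, tc, text):
        self._tc = tc
        self.text = text


def iter_visual_cells(rows):
    """Return rows of cells with repeated references to a merged cell dropped."""
    visual_cells = []
    prior_tcs = set()  # set membership is O(1) on average, unlike a list scan

    for row in rows:
        visual_cells.append([])

        for cell in row:
            if cell._tc in prior_tcs:  # already emitted via an earlier grid position
                continue

            visual_cells[-1].append(cell)
            prior_tcs.add(cell._tc)

    return visual_cells


# Plain objects act as the shared underlying elements in this sketch.
tc_a, tc_b, tc_c, tc_d = object(), object(), object(), object()

rows = [
    # '1-1' spans two columns, so it appears twice in the first row.
    [StubCell(tc_a, '1-1'), StubCell(tc_a, '1-1'), StubCell(tc_b, '1-3')],
    # '1-3' spans two rows, so it appears again in the second row.
    [StubCell(tc_c, '2-1'), StubCell(tc_d, '2-2'), StubCell(tc_b, '1-3')],
]

texts = [[cell.text for cell in row] for row in iter_visual_cells(rows)]
```

Because the set is shared across rows, a vertically merged cell is kept only in the first row where it appears, which is consistent with the shorter table rows in the updated test expectations.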