Enhancement suggestion - direct matplotlib figure save from within fpdf2 #789

Open
LandyQuack opened this issue May 21, 2023 · 18 comments
Labels
image, performance, research needed (too complicated to implement without careful study of official specifications)

Comments

@LandyQuack

Firstly - excellent library / thank you for all your hard work. I used it over the alternatives because of its vector graphics support, but was really surprised by how slow it was on some matplotlib images (savefig to BytesIO as SVG, then pdf.image), and wondered if MatPlotLib's direct conversion was much faster - it is.

As an example (attached script reproduces), 3 matplotlib plots (1 of blood pressure, 1 the anatomy example and 1 an xkcd example) have timings like:

Generate figures: 102.45670797303319 ms <-- all 3
#----------------------------------------------------------------------------------------
MatPlotLib PdfPages - fig 0: 33.42100000008941 ms <-- Blood Pressure
MatPlotLib PdfPages - fig 1: 89.3275830312632 ms <-- Anatomy
MatPlotLib PdfPages - fig 2: 38.10716699808836 ms <-- xkcd
MatPlotLib PdfPages - overall: 160.90095799881965 ms
#----------------------------------------------------------------------------------------
Fpdf - fig 0: 276.9010409829207 ms <-- Blood Pressure
Fpdf - fig 1: 4885.383540997282 ms <-- Anatomy
Fpdf - fig 2: 646.598165971227 ms <-- xkcd
Fpdf - overall: 5808.975291030947 ms
#----------------------------------------------------------------------------------------

So nearly 6,000 ms for Fpdf2 for 3 plots versus 160 ms for MatPlotLib to produce essentially the same PDF. Size wise they're within 1k of each other.
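For anyone wanting to reproduce this kind of comparison, a minimal timing harness might look like the sketch below. The helper names `time_ms` and `compare` are hypothetical, and the two lambdas are stand-in workloads; in the real script they would be the PdfPages code path and the figure -> SVG -> pdf.image code path.

```python
import time
from typing import Callable, Dict


def time_ms(func: Callable[[], object]) -> float:
    """Run func once and return the elapsed wall-clock time in milliseconds."""
    start = time.perf_counter()
    func()
    return (time.perf_counter() - start) * 1000.0


def compare(backends: Dict[str, Callable[[], object]]) -> Dict[str, float]:
    """Time each named callable and return {name: milliseconds}."""
    return {name: time_ms(func) for name, func in backends.items()}


if __name__ == "__main__":
    # Stand-in workloads (assumptions): swap in the real
    # "figure -> PdfPages" and "figure -> SVG -> pdf.image" calls.
    results = compare({
        "fast path": lambda: sum(range(1_000)),
        "slow path": lambda: sum(range(1_000_000)),
    })
    for name, ms in results.items():
        print(f"{name}: {ms:.2f} ms")
```

A single run per backend is noisy; repeating each call a few times and taking the minimum would give steadier numbers.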

Not sure how much of this is figure -> svg -> pdf vs figure -> pdf and how much is C vs Python but I started looking because a document with ~ 20 plots in Fpdf2 was taking a surprisingly long time to generate.

My question is about whether or not a feature might be considered to implement fpdf.savefig() or similar - perhaps by nabbing images direct from figure -> pdf -> fpdf2?

test1.txt

@Lucas-C
Member

Lucas-C commented May 21, 2023

Welcome @LandyQuack 🙂

Firstly - excellent library / thank you for all your hard work.

Thank you!

My question is about whether or not a feature might be considered to implement fpdf.savefig() or similar - perhaps by nabbing images direct from figure -> pdf -> fpdf2?

I have adapted your script using the FigureCanvas approach to embed figures, as described in our documentation:
https://pyfpdf.github.io/fpdf2/Maths.html#using-matplotlib

issue_789.py.txt

The results are a lot better, performance-wise:

$ ./issue_789.py
Generate / append figures: 116.06030003167689 ms
#----------------------------------------------------------------------------------------
PdfPages - fig 0: 92.47630002209917 ms
PdfPages - fig 1: 175.60830002184957 ms
PdfPages - fig 2: 35.08909995434806 ms
PdfPages - overall: 303.32190002081916 ms
#----------------------------------------------------------------------------------------
Fpdf - fig 0: 101.3886000146158 ms
Fpdf - fig 1: 199.06949996948242 ms
Fpdf - fig 2: 48.978100006934255 ms
Fpdf - overall: 349.7093000332825 ms
#----------------------------------------------------------------------------------------

To me, there does not seem to be a need for much enhancement.

What do you think?

@LandyQuack
Author

I may be misreading your link but doesn't that create an image rather than anything vector based?

@Lucas-C
Member

Lucas-C commented May 21, 2023

I may be misreading your link but doesn't that create an image rather than anything vector based?

Ah yes, sorry, I did not realize that you wanted vector graphics and not raster graphics 😅

@LandyQuack
Author

I spent a little bit of time this evening trying to see if I could snaffle the relevant bits from matplotlib/PDFPages/savefig and... I think what it's doing is translating the figure into PDF paths and then wrapping that up with the rest of the PDF essentials like fonts and metadata.

I guess what I was wondering is if there might be a way to use some of that existing code to turn a figure into whatever it looks like in a PDF and then put that in the right place in the pdf using fpdf2?

@Lucas-C
Member

Lucas-C commented May 21, 2023

I had a look myself at https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1939

I think we could subclass matplotlib.backends.backend_pdf.RendererPdf in order to render figures directly to a fpdf2.FPDF instance.

I won't have the time to tackle this interesting challenge myself, but this sure looks like a fun exercise,
and I would welcome a Pull Request that provides that!

@LandyQuack
Author

Played around with that and got a little lost in the function calls but have something very simple (attached) which spits out entries like:

b'/DeviceRGB CS'
b'/DeviceRGB cs'
b'1 j'
b'1 g 0 j 0 w 1 G 1 g'
b'0 0 m\n460.8 0 l\n460.8 345.6 l\n0 345.6 l\nh\n'
b'f'
b'/A1 gs 0.9176470588 0.9176470588 0.9490196078 rg 0 G 0.9176470588\n0.9176470588 0.9490196078 rg'
b'57.6 38.016 m\n414.72 38.016 l\n414.72 304.128 l\n57.6 304.128 l\nh\n'
b'f'
b'q 57.6 38.016 357.12 266.112 re W n /A2 gs 1 J 1 j 0.8 w 1 G /DeviceRGB cs'
b'89.594157 38.016 m\n89.594157 304.128 l\n'
b'S'
b'Q q /A2 gs 0.15 g 1 j 1 w 0.15 G 0.15 g'
b'q'
b'1 0 -0 1 78.469156895 23.85975 cm'
b'BT'
b'/F1 10 Tf'
b'0 0 Td'
b'[ (2006) ] TJ'
b'ET'

which, looking at https://github.com/gendx/pdf-cheat-sheets/blob/master/pdf-graphics.clean.pdf, seem to be PDF drawing commands and there are recognisable year names and strings like

b'[ (Blood Pressure) ] TJ'

which are clearly from my test image.
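To make the dump a little easier to read, here is a small sketch that splits a path fragment like the ones above into (operator, operands) pairs. It only understands numeric operands followed by a handful of path operators (`m` move to, `l` line to, `c` curve to, `h` close path, `re` rectangle, `f` fill, `S` stroke, `W` clip, `n` end path); text operators, names and strings are deliberately out of scope, and the function name `parse_path_ops` is made up for this example.

```python
from typing import List, Tuple

# Path-construction and painting operators handled by this sketch.
PATH_OPERATORS = {b"m", b"l", b"c", b"h", b"re", b"f", b"S", b"n", b"W"}


def parse_path_ops(fragment: bytes) -> List[Tuple[str, List[float]]]:
    """Split a simple content-stream fragment into (operator, operands) pairs.

    PDF uses postfix notation: operands come first, then the operator.
    """
    ops: List[Tuple[str, List[float]]] = []
    operands: List[float] = []
    for token in fragment.split():
        if token in PATH_OPERATORS:
            ops.append((token.decode("ascii"), operands))
            operands = []
        else:
            # Anything that is not a known operator is treated as a number.
            operands.append(float(token.decode("ascii")))
    return ops


if __name__ == "__main__":
    frame = b"0 0 m\n460.8 0 l\n460.8 345.6 l\n0 345.6 l\nh\n"
    for op, args in parse_path_ops(frame):
        print(op, args)
```

Run on the first path in the dump, this shows a move to the origin, three line segments around a 460.8 x 345.6 rectangle and a close-path, which is consistent with the figure background.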

The code is trivial - basically two subclasses overriding init and 1 print statement in the output function of PdfFile.

Now... since PDF innards are a black art... does any of this look like it might move things towards a goal of taking a MatPlotLib figure and (quickly) turning it into FPDF2 usable content without the (relatively) slow SVG intermediate parse?

If it does, can anyone point me in the right direction for finding the start and end of the converted figure? If I know those, I can work to finding what's generating everything in between!

mpl1.txt

@Lucas-C
Member

Lucas-C commented May 22, 2023

Hi @LandyQuack!

This looks promising 👍

I'll try to give a closer look at your code whenever I have some free time this week.

@Lucas-C
Member

Lucas-C commented May 23, 2023

A quick analysis of the stuff in matplotlib.backends.backend_pdf:

  • PDFPages uses FigureCanvasPdf (in its savefig() method)
  • FigureCanvasPdf uses RendererPdf (in its print_pdf() method)
  • RendererPdf uses PdfFile

Hence, the crux of the processing lies in those two last classes.

Here is how you can use subclasses of them:

from io import BytesIO

import matplotlib as mpl
from matplotlib.backends.backend_pdf import PdfFile, RendererPdf

class CustomPdfFile(PdfFile):
    pass

class CustomRendererPdf(RendererPdf):
    pass

mpl.rcParams['pdf.compression'] = False
mpl.rcParams['pdf.use14corefonts'] = True

# ... obtain a fig and then:
data = BytesIO()
width, height = fig.get_size_inches()
pdf_file = CustomPdfFile(data)
pdf_file.newPage(width, height)
renderer = CustomRendererPdf(pdf_file, fig.dpi, height, width)
print("PDF file initial content:")
for line in data.getvalue().split(b"\n"):
    print(line)
fig.draw(renderer)
pdf_file.finalize()
with open("issue-789-PdfFile.pdf", "wb") as out_file:
    out_file.write(data.getvalue())

This should help you to figure out where the figure rendering starts! 😊

@LandyQuack
Author

LandyQuack commented May 24, 2023

Lucas - that's been super helpful especially the compression and the font bits. I didn't quite use your code but used something like

class Pdf_Object (PdfFile):
	"""
		The theory goes... everything about how the PDF is constructed happens in PdfFile so... if we can decipher it...
		then we can capture what we'll need for FPDF2 e.g. fonts and drawing instructions etc... and if we can do that
		then we should be able to do FPDF2.fonts.append (blah) and FPDF2.add_mpl_figure (PDF_Obj.blah) or whatever the
		function calls might be. Haven't looked yet but presumably the SVG parser must wrap up similar drawing primitives
		so that might be the way to test a proof of concept.
		
		Subclass https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L660 so we can
		log member function calls within pdf output. There are a couple of functions we can't log with PdfFile.output
		because it triggers a recursion level limit fault. We also skip what look like non output utility functions.
	"""
	def __init__ (self, filename, metadata=None):
		super().__init__(filename, metadata=metadata)

	def newPage(self, width, height):
		""" https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L769 """
		self.output ('PdfFile.newPage')
		super().newPage(width, height)

	def newTextnote(self, text, positionRect=[-100, -100, 0, 0]):
		""" https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L798 """
		self.output ('PdfFile.newTextnote')
		super().newTextnote(text, positionRect)

and that's giving me output like

python3 mpl1.py
b'%PDF-1.4'
b'%\xac\xdc \xab\xba'
┌─────────────────────┐
│ PdfFile.writeObject │
└─────────────────────┘
b'1 0 obj'
b'<< /Type /Catalog /Pages 2 0 R >>'
b'endobj'
┌─────────────────────┐
│ PdfFile.writeObject │
└─────────────────────┘
b'8 0 obj'
b'<< /Font 3 0 R /XObject 7 0 R /ExtGState 4 0 R /Pattern 5 0 R'
b'/Shading 6 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ] >>'
b'endobj'
┌─────────────────┐
│ PdfFile.newPage │
└─────────────────┘
┌───────────────────┐
│ PdfFile.endStream │
└───────────────────┘
┌─────────────────────┐
│ PdfFile.writeObject │
└─────────────────────┘
b'11 0 obj'
b'<< /Type /Page /Parent 2 0 R /Resources 8 0 R'
b'/MediaBox [ 0 0 460.8 345.6 ] /Contents 9 0 R /Annots 10 0 R >>'
b'endobj'
┌─────────────────────┐
│ PdfFile.beginStream │
└─────────────────────┘
b'9 0 obj'
b'<< /Length 12 0 R >>'
b'stream'
b'/DeviceRGB CS'
b'/DeviceRGB cs'
b'1 j'
b'1 g 0 j 0 w 1 G 1 g'
┌───────────────────┐
│ PdfFile.writePath │
└───────────────────┘
b'0 0 m'
b'460.8 0 l'
b'460.8 345.6 l'
b'0 345.6 l'
b'h'
b''
b'f'
b'/A1 gs 0.9176470588 0.9176470588 0.9490196078 rg 0 G 0.9176470588'
b'0.9176470588 0.9490196078 rg'
┌───────────────────┐
│ PdfFile.writePath │
└───────────────────┘
b'57.6 38.016 m'
b'414.72 38.016 l'
b'414.72 304.128 l'
b'57.6 304.128 l'
b'h'
b''
b'f'
b'q 57.6 38.016 357.12 266.112 re W n /A2 gs 1 J 1 j 0.8 w 1 G /DeviceRGB cs'

and I can start to see where the figure is represented in the PDF.

I think I need to look at the SVG code next because I presume that the vector lines etc in the SVG become pdf drawing commands in the same way so... if I can see what that code does to say "insert these drawing commands here and magically FPDF2 shall find and incorporate them" (paths?) then I should be a bit further to something that says

pdf = FPDF()
pdf.add_mpl_figure (fig, w,h)

so it behaves like an svg or a png or whatever and can be put in table cells etc.

I'm thinking that FPDF will need / want some sort of PDF object (basically PdfFile without the file generation) that can be queried to say - give me your images and your font usage and your paths a bit like

for paths in pdf_obj.paths(): add in some clever fashion.

Current code attached.

Iain
mpl1.txt

@LandyQuack
Author

Got this working to proof of concept level at least. After playing around with trying to reconstruct the pdf from the innards of the renderer (and at least getting something on screen), decided that the matplotlib pdf backend is perfectly capable of generating pdf content so...

subclassed PdfFile, captured output to a BytesIO and nabbed everything between stream and endstream and put it into FPDF using _out().
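The "everything between stream and endstream" step can be sketched with the standard library alone. This is only an illustration under stated assumptions: the PDF is uncompressed (pdf.compression switched off), the payload does not itself contain the word `endstream`, and the function name `extract_streams` is invented here. A real PDF can hold several streams (page content, fonts, images), so the sketch returns all of them rather than just the first.

```python
import re
from typing import List

# "stream" keyword (but not the tail of "endstream"), end-of-line,
# the payload, an optional end-of-line, then "endstream".
_STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)\r?\n?endstream", re.DOTALL)


def extract_streams(pdf_bytes: bytes) -> List[bytes]:
    """Return the raw payload of every stream object, in file order."""
    return _STREAM_RE.findall(pdf_bytes)


if __name__ == "__main__":
    sample = (
        b"9 0 obj\n<< /Length 12 0 R >>\nstream\n"
        b"0 0 m\n460.8 0 l\nS\n"
        b"endstream\nendobj\n"
    )
    for payload in extract_streams(sample):
        print(payload)
```

Relying on the keywords rather than the /Length entry keeps the sketch short, at the cost of breaking on binary payloads that happen to contain `endstream`.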

Fonts were a bit harder as the reference in the stream has to match what FPDF is adding so replaced fontname.
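One way to picture the font-name replacement is as a rewrite of the `Tf` (set font and size) operators in the captured stream. The sketch below is an assumption about how that could be done, not the code used here: `remap_fonts` is a hypothetical helper, and the target name `F3` is just a placeholder for whatever resource name the host document assigns.

```python
import re
from typing import Dict

# Matches "/<name> <size> Tf", e.g. b"/F1 10 Tf".
_TF_RE = re.compile(rb"/([A-Za-z0-9_.+-]+)(\s+[-+]?[0-9]*\.?[0-9]+\s+Tf)\b")


def remap_fonts(stream: bytes, mapping: Dict[bytes, bytes]) -> bytes:
    """Rename font resources in Tf operators; unknown names are left alone."""

    def _swap(match):
        name = match.group(1)
        return b"/" + mapping.get(name, name) + match.group(2)

    return _TF_RE.sub(_swap, stream)


if __name__ == "__main__":
    text_block = b"BT\n/F1 10 Tf\n0 0 Td\n[ (2006) ] TJ\nET"
    # Placeholder mapping: matplotlib's /F1 becomes the host document's /F3.
    print(remap_fonts(text_block, {b"F1": b"F3"}))
```

Only the resource name changes; glyph encoding differences between the two font setups are not addressed by this kind of substitution.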

It works insofar as I get my test MatPlotLib figure onto an FPDF page at standard zoom, embedded as a vector graphic.

Needs work on scaling and positioning (to use in something like a cell) and Truetype fonts but, as a proof of concept, I'm happy with it so far.

Iain

import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
#from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
from matplotlib.figure import Figure
from matplotlib.patches import Circle
from matplotlib.patheffects import withStroke
from matplotlib.ticker import AutoMinorLocator, MultipleLocator
import matplotlib.ticker as ticker
import matplotlib.dates as mdates
import seaborn as sns
from io import BytesIO
from fpdf import FPDF, drawing
import logging

# For PDF export
from matplotlib.backends.backend_pdf import PdfPages, PdfFile, pdfRepr, _fill,FigureCanvasPdf, RendererPdf, Op
from matplotlib import cbook, _path
from matplotlib._pylab_helpers import Gcf
from matplotlib.backends.backend_mixed import MixedModeRenderer
from matplotlib.font_manager import fontManager as _fontManager, FontProperties
from pathlib import Path

#----------------------------------------------------------------------------------------
def W (txt):
	""" Wrap a string in a box using Unicode box-drawing characters - easier to see """
	s = '\u2500' * (len (txt) + 2)
	print (f"\u250c{s}\u2510\n\u2502 {txt} \u2502\n\u2514{s}\u2518")
#----------------------------------------------------------------------------------------

#----------------------------------------------------------------------------------------
def Draw_BP_Graph ():
	""" Draw simple floating bar graph of Blood Pressure """
	
	BP_data = [
	('21/3/2005',142, 86),('13/2/2010', 131, 87),('2/6/2011', 141, 83),('27/2/2013', 180, 93),
	('1/5/2017', 137, 65),('12/11/2018',151,68),('14/5/2022',155, 86)
	]
	
	# Create the dataframe
	BP = pd.DataFrame (BP_data, columns=['When', 'Systolic', 'Diastolic'])
	
	# Convert dates in the When column - lose the time component
	BP['When'] = pd.to_datetime(BP['When'], dayfirst=True).dt.date
	
	# For a floating bar graph we need a height (systolic - diastolic) as the bar starts at diastolic and has a height
	BP['Height'] = BP['Systolic'] - BP['Diastolic']
	
	# Graph Blood Pressure - label things
	plt.title('Blood Pressure', fontsize=10)
	# plt.xlabel('Year', fontsize=14)
	# plt.ylabel('mm Hg', fontsize=14)

	# Plot bars from diastolic up to systolic in blue
	plt.bar (BP['When'], BP['Height'], bottom=BP['Diastolic'], width=40, color='blue')
	plt.grid(True)
	
	# Add lines at 140 & 90 in red - styles as per https://matplotlib.org/3.5.0/api/_as_gen/matplotlib.pyplot.axhline.html (: is subtle)
	ax = plt.gca()
	for y in (140,90): ax.axhline(y, color='red', linestyle=':')
	
	# Shift the y-axis down by 15 (looks prettier) and up by the same
	bottom, top = plt.ylim()  # return the current ylim
	plt.ylim((bottom-15, top+15))   # set the ylim to bottom, top
	
	# Return the figure
	return plt.gcf()


#----------------------------------------------------------------------------------------
class Custom_FPDF(FPDF):
    
	def MPL_Figure (self, fig):
		""" Try and save a MatPlotLib figure to a FPDF instance """
		
		fig.dpi = 72  # there are 72 pdf points to an inch
		width, height = fig.get_size_inches()
	
		# pdf_file is our in memory PDF generated'ish by MatPlotLib
		data = BytesIO()
		pdf_file = Pdf_Object(data,parent=self)
		
		# Have to figure out how to alter both position and size
		pdf_file.newPage(width,height)
		renderer = RendererPdf(pdf_file, fig.dpi, height, width)
		
		#renderer = MixedModeRenderer(fig, width, height, fig.dpi,renderer,bbox_inches_restore=bbox_inches_restore)
		renderer = MixedModeRenderer(fig, width, height, fig.dpi, renderer)
		
		fig.draw(renderer)
		renderer.finalize()
		
		pdf_file.finalize()

		# And the same for the XRef table - we may want to grab things from here
		#for i,x in enumerate(pdf_file.XRef()): print (f'Xref[{i}]: {x}')
		
		# Get the in memory PDF
		dv = data.getvalue()
		
		# Debug
		#for line in dv.split(b"\n"): print (line)

		# Look for output between b'stream' and b'endstream'
		idx1 = dv.find(b'stream')
		idx2 = dv.find(b'endstream')
		
		# and write that wholesale and unmodified into a FPDF page
		self._out (dv[idx1+7:idx2])

#----------------------------------------------------------------------------------------
# class RendererPdf2(RendererPdf):
# 	_afm_font_dir = cbook._get_data_path("fonts/pdfcorefonts")
# 	_use_afm_rc_name = "pdf.use14corefonts"
	
# 	def __init__(self, file, image_dpi, height, width):
# 		super().__init__(file, image_dpi, height, width)
# 		self.file = file
# 		self.gc = self.new_gc()
# 		self.image_dpi = image_dpi

# 	def draw_text(self, gc, x, y, s, prop, angle, ismath=False, mtext=None):
# 		print (f'draw_text: {s} @ {x},{y} - {prop}')
# 		super().draw_text(gc, x, y, s, prop, angle, ismath, mtext)

#----------------------------------------------------------------------------------------
class Pdf_Object (PdfFile):
	"""
		For now, we generate a PDF in memory and re-use anything between stream and endstream labels
		and can see a MatPlotLib figure rendered in an FPDF page. We need to sort font references
		next and if that works we can remove PDF building blocks we will never use. 
	"""
	
	def __init__ (self, filename, metadata=None, parent=None ):
		super().__init__(filename, metadata=metadata)
		self.parent = parent

	def XRef(self):
		return self.xrefTable
		
	def fontName(self, fontprop):
		"""
			Font names used in the rendered MatPlotLib Figure are references to a font table (key in a dictionary)
			e.g. sans\-serif:style=normal:variant=normal:weight=normal:stretch=normal:size=10.0 is "/F1"
			----
			The generated figure->pdf has to reference the font name used internal to FPDF rather than the one
			from the MatPlotLib pdf rendering backend
		"""

		print (f'FontProp: {fontprop}')

		# TTF? Needs work
		if isinstance(fontprop, str):
			self.parent.add_font(fname=fontprop)
			for k,v in self.parent.fonts.items():
				if str(v['ttffile']) == fontprop:
					self.parent.set_font ('arial', size=10.0)
					return (v['fontkey'])
		# Built in
		elif isinstance(fontprop, FontProperties):
			self.parent.set_font(fontprop.get_name(), size=fontprop.get_size())
			return self.parent.current_font['i']

#	def newPage(self, width, height):
#		""" https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L769 """
#		self.output ('PdfFile.newPage')
#		super().newPage(width, height)

#	def newTextnote(self, text, positionRect=[-100, -100, 0, 0]):
#		""" https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L798 """
#		self.output ('PdfFile.newTextnote')
#		super().newTextnote(text, positionRect)

#	def finalize(self):
#		""" https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L823 """
#		self.output ('PdfFile.finalize')
#		super().finalize ()

#	def close(self):
#		""" https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L856 """
#		self.output ('PdfFile.close')
#		super().close()

#	def beginStream(self, id, len, extra=None, png=None):
#		""" https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L877 """
#		self.output ('PdfFile.beginStream')
#		super().beginStream (id, len, extra=None, png=None)
		
#	def endStream(self):
#		""" https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L881 """
#		self.output ('PdfFile.endStream')
#		super().endStream()
		
#	def fontName(self, fontprop):
#		""" https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L895 """
#		self.output ('PdfFile.fontName')
#		super().fontName (fontprop)
		
#	def dviFontName(self, dvifont):
#		""" https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L926 """
#		self.output ('PdfFile.dviFontName')
#		super().dviFontName (dvifont)
		
#	def writeFonts(self):
#		""" https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L956 """
#		self.output ('PdfFile.writeFonts')
#		super().writeFonts ()
		
#	def _write_afm_font(self, filename):
#		""" https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L977 """
#		self.output ('PdfFile._write_afm_font')
#		super()._write_afm_font (filename)
		
#	def _embedTeXFont(self, fontinfo):
#		""" https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L989 """
#		self.output ('PdfFile._embedTeXFont')
#		super()._embedTeXFont (fontinfo)
		
#	def createType1Descriptor(self, t1font, fontfile):
#		""" https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1047 """
#		self.output ('PdfFile.createType1Descriptor')
#		super().createType1Descriptor (t1font, fontfile)
		
#	def embedTTF(self, filename, characters):
#		""" https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1138 """
#		self.output ('PdfFile.embedTTF')
#		super().embedTTF (filename, characters)
#
#	def writeExtGSTates(self):
#		""" https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1520 """
#		self.output ('PdfFile.writeExtGSTates')
#		super().writeExtGSTates ()
		
#	def _write_soft_mask_groups(self):
#		""" https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1529 """
#		self.output ('PdfFile._write_soft_mask_groups')
#		super()._write_soft_mask_groups ()
		
#	def writeHatches(self):
#		""" https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1553 """
#		self.output ('PdfFile.writeHatches')
#		super().writeHatches ()
		
#	def writeGouraudTriangles(self):
#		""" https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1614 """
#		self.output ('PdfFile.writeGouraudTriangles')
#		super().writeGouraudTriangles ()
#		
#	def _writePng(self, img):
#		""" https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1693 """
#		self.output ('PdfFile._writePng')
#		super()._writePng (img)

#	def _writeImg(self, data, id, smask=None):
#		""" https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1722 """
#		self.output ('PdfFile._writeImg')
#		super()._writeImg (data, id, smask)

#	def writeImages(self):
#		""" https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1781 """
#		self.output ('PdfFile.writeImages')
#		super().writeImages ()

#	def writeMarkers(self):
#		""" https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1820 """
#		self.output ('PdfFile.writeMarkers')
#		super().writeMarkers ()

#	def writePathCollectionTemplates(self):
#		""" https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1850 """
#		self.output ('PdfFile.writePathCollectionTemplates')
#		super().writePathCollectionTemplates ()

#	def writePath(self, path, transform, clip=False, sketch=None):
#		""" https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1880 """
#		self.output ('PdfFile.writePath')
#		if clip:
#			#print ('Clip')
#			clip = (0.0, 0.0, self.width * 72, self.height * 72)
#			simplify = path.should_simplify
#		else:
#			#print ('No Clip')
#			clip = None
#			simplify = False
#
#		cmds = self.pathOperations(path, transform, clip, simplify=simplify, sketch=sketch)
#		self.output(*cmds)
#
#		# Return the pdf draw command
#		return (cmds)
#		super().writePath (path, transform, clip, sketch)

#	def writeObject(self, object, contents):
#		""" https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1905 """
#		self.output ('PdfFile.writeObject')
#		super().writeObject (object, contents)

#	def writeXref(self):
#		""" https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1909 """
#		self.output ('PdfFile.writeXref')
#		super().writeXref ()
#		
#	def writeInfoDict(self):
#		""" https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1922 """
#		self.output ('PdfFile.writeInfoDict')
#		super().writeInfoDict ()
		
#	def writeTrailer(self):
#		""" https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1928 """
#		self.output ('PdfFile.writeTrailer')
#		super().writeTrailer ()
		
#	def savefig(self, figure=None, **kwargs):
#		""" Based on https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#LL2724C1-L2745C57 """
#		if not isinstance(figure, Figure):
#			if figure is None: manager = Gcf.get_active()
#			else: manager = Gcf.get_fig_manager(figure)
#		
#			if manager is None: raise ValueError(f"No figure {figure}")
#		
#			figure = manager.canvas.figure
#
#		# Force use of pdf backend, as PdfPages is tightly coupled with it.
#		with cbook._setattr_cm(figure, canvas=FigureCanvasPdf2(figure)): figure.savefig(self, format="pdf", **kwargs)
#		
#	def finalize(self, pdf):
#		self.output ('PdfFile.finalize')
#		super().finalize()

#----------------------------------------------------------------------------------------
def main():
 
	# Set Seaborn plot style
	sns.set_style("dark")

	# Hide a bunch of missing font messages (xkcd graph)
	logging.getLogger('matplotlib.font_manager').setLevel(logging.ERROR)

	# Switch off compression and simplify fonts
	mpl.rcParams['pdf.compression'] = False
	mpl.rcParams['pdf.use14corefonts'] = True

	# Simple 1 page PDF
	pdf = Custom_FPDF()
	pdf.add_page()
	pdf.set_draw_color (0,0,0)
	#pdf.set_line_width(20)

	# Crudely hacked out of  MatPlotLib multipage PDF
	fig = Draw_BP_Graph()

	# Output the figure using MatPlotLib
	with PdfPages('MatPlotLib_Output.pdf') as mpdf: mpdf.savefig ()
		
	# Prototype FPDF extension
	pdf.MPL_Figure (fig)

	# Output what we've got into FPDF2 so far
	pdf.output ('FPDF_Output.pdf')

#----------------------------------------------------------------------------------------
# Main runtime entry point
if __name__ == "__main__": main()

@Lucas-C
Member

Lucas-C commented Jun 5, 2023

Hi @LandyQuack!

Sorry for the delay, I have been a bit busy over the last 2 weeks.

Currently, when trying to run your latest script, I get this error:

  File "./issue_789c.py", line 163, in fontName
    self.parent.set_font(fontprop.get_name(), size=fontprop.get_size())
...
fpdf.errors.FPDFException: Undefined font: dejavu sans - Use built-in fonts or FPDF.add_font() beforehand

But I was able to solve this error by simply adding pdf.add_font("dejavu sans", fname="test/fonts/DejaVuSans.ttf") in main()

The resulting PDF is promising, but I see zero visible text. There might still be something wrong regarding font management.

Apart from that, I looked at the Custom_FPDF.MPL_Figure() method & Pdf_Object class you wrote.
Dumping the whole content stream to FPDF._out() is very "raw"...
Providing another implementation of the matplotlib.backends.backend_pdf.GraphicsContextPdf.commands could be a cleaner approach... There are only 9 commands there, which could all be implemented with calls to FPDF methods.
Have you considered this option?
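To illustrate the idea, here is a rough sketch of a "only emit what changed" dispatch table. Everything in it is hypothetical: `Recorder` is a stand-in that merely records calls (it borrows the names `set_line_width` and `set_draw_color`, which do exist on FPDF), the attribute keys `linewidth` and `rgb` are illustrative rather than matplotlib's internal names, and unit conversion between matplotlib points and the document's units is left out.

```python
from typing import Any, Callable, Dict, List, Tuple


class Recorder:
    """Hypothetical stand-in for an FPDF instance: it only records calls."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Any]] = []

    def set_line_width(self, width: float) -> None:
        self.calls.append(("set_line_width", width))

    def set_draw_color(self, r: int, g: int, b: int) -> None:
        self.calls.append(("set_draw_color", (r, g, b)))


def _to_255(rgb) -> Tuple[int, ...]:
    """Convert 0..1 float colour components to 0..255 integers."""
    return tuple(round(c * 255) for c in rgb[:3])


# attribute name -> handler(target, new_value)
HANDLERS: Dict[str, Callable[[Any, Any], None]] = {
    "linewidth": lambda pdf, w: pdf.set_line_width(w),
    "rgb": lambda pdf, rgb: pdf.set_draw_color(*_to_255(rgb)),
}


def apply_delta(pdf: Any, old: Dict[str, Any], new: Dict[str, Any]) -> None:
    """Call a handler only for attributes whose value differs from old."""
    for name, handler in HANDLERS.items():
        if name in new and old.get(name) != new[name]:
            handler(pdf, new[name])


if __name__ == "__main__":
    rec = Recorder()
    apply_delta(rec, {"linewidth": 1.0}, {"linewidth": 0.8, "rgb": (1.0, 0.0, 0.0)})
    print(rec.calls)
```

The remaining graphics-state attributes (clipping, alpha, cap and join styles, dashes, fill colour, hatching) would each need their own handler in the same table.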

Also, what is your end goal?
Would you like to contribute code to fpdf2?
If so, I will be relatively strict on the code quality if you want to add public methods to the fpdf package, but this can be a very good learning exercise 😊
On the other hand, an autonomous script could be provided as part of our docs/ (maybe in https://pyfpdf.github.io/fpdf2/Maths.html?), and I would be less strict on the code quality then, as long as it's relatively short.
And finally, you can of course choose not to share your code in fpdf2, which is totally fine 😅. In that case I'm still available to answer your questions, and just hope the solution you found solved your initial need!

@LandyQuack
Author

Hi Lucas - no worries at all at the delay.

Agree re "raw"ness of that approach - was more to get a handle on what was happening where in the code. Have done much as you suggest but subclassed PdfFile because it seemed easier to start with something which worked and then add diagnostics as and where I needed.

So... where is the code up to?

	# Our FPDF version
	fpdf =  MPL_FPDF()
	print ('FPDF')

	fpdf.add_font(fname='/Library/Fonts/Microsoft/Times New Roman.ttf')
	fpdf.add_font(fname='/Library/Fonts/Microsoft/Arial.ttf')
	fpdf.add_font(family='dejavu sans mono', fname='/Users/iain/Library/Fonts/DejaVuSansMono.ttf')
	fpdf.add_font(fname='/System/Library/Fonts/Supplemental/Courier New.ttf')
	#fpdf.set_font("Arial", size=10)
	
	for fig in figs:
		f = fig()
		fpdf.add_page()
		fpdf.savefig (figure=f, bbox_inches='tight')
		plt.close(f)

	# Output what we've got into FPDF2 so far
	fpdf.output ('Output_FPDF.pdf')

ends up in

class MPL_FPDF(FPDF):
	#----------------------------------------------------------------------------------------
	def savefig(self, figure=None, **kwargs):
		if not isinstance(figure, Figure):
			if figure is None: manager = Gcf.get_active()
			else: manager = Gcf.get_fig_manager(figure)
			if manager is None: raise ValueError(f"No figure {figure}")
			figure = manager.canvas.figure

		# Fpdf uses top left origin, matplotlib bottom left so... fix Y axis
		ax = figure.gca()
		ax.set_ylim(ax.get_ylim()[::-1])
		ax.xaxis.tick_top()  
		
		# Fix title position
		mpl.rcParams['axes.titley'] = -0.1

		# Force use of pdf backend, as PdfPages is tightly coupled with it.
		with cbook._setattr_cm(figure, canvas=FigureCanvasPdf2(figure, parent=self)):
			figure.savefig(self, format="pdf", **kwargs)

and taking draw_text as an example

	def draw_text(self, gc, x, y, s, prop, angle, ismath=False, mtext=None):

		#print (f'draw_text: {s} @ {x},{y} - {prop} @ {angle} degrees')

		if isinstance(prop, str):
			self.parent.add_font(fname=prop)
			for k,v in self.parent.fonts.items():
				if str(v['ttffile']) == prop:
					print (f'Font: {prop}')
					self.parent.set_font('Arial', size=10.0)
		# Built in
		elif isinstance(prop, FontProperties):
			self.parent.set_font(prop.get_name(), size=prop.get_size())

		x,y = self._trans.transform ((x,y))
		self.parent.text(x,y,s)

with self.parent.text being fpdf.text

So... I can draw a number of basic / standard matplotlib figures directly into fpdf :-) Fonts work but need to sort rotated text yet.

I need to (a) make it not subclass the existing pdf renderer from MatPlotLib because I don't think it needs to, (b) figure out how to fit the resulting output into an FPDF container (say a table cell) - more below, (c) figure out why the anatomy path with the markers on doesn't draw in MPL but does in my code and (d) do proper circles (think I just need to tell the renderer that we speak bezier).

This is a screenshot of what my output (non "raw" drawing direct into fpdf using the existing drawing commands) looks like. I'm pleased with progress so far.

[Image: Screenshot 2023-06-05 at 21 22 43]

and for simpler plots it works out of the box and looks like MPL.

As above, need to figure out how to get what I'm generating into the right place / size on the screen. I'm currently doing this:

self._scale = scale        # scale = self._parent.epw / (width*self.figure.dpi)
self._origin = (2,2)

# Setup our transform
self._trans = Affine2D().scale(self._scale).translate(*self._origin)

so can size and position where needed but need to see what fpdf actually needs me to do.
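For readers unfamiliar with affine transforms, the scale-then-translate step can be written out by hand. The sketch below mirrors the order of `Affine2D().scale(scale).translate(*origin)` (scale first, then shift). The second function, `place_flipped`, is an extra assumption of mine: one possible way to handle a top-left page origin by flipping y in the transform instead of reversing the axes in matplotlib. Both function names are invented for this example.

```python
from typing import Tuple

Point = Tuple[float, float]


def place(point: Point, scale: float, origin: Point) -> Point:
    """Scale first, then translate."""
    x, y = point
    return (x * scale + origin[0], y * scale + origin[1])


def place_flipped(point: Point, scale: float, origin: Point, fig_height: float) -> Point:
    """Same as place(), but flips y so that a bottom-left figure origin
    maps onto a page whose origin is top-left."""
    x, y = point
    return (x * scale + origin[0], (fig_height - y) * scale + origin[1])


if __name__ == "__main__":
    # A point at (100, 50) in figure space, drawn at half size with a (2, 2) offset.
    print(place((100.0, 50.0), 0.5, (2.0, 2.0)))
    # The figure's bottom-left corner ends up at the bottom of the placed box.
    print(place_flipped((0.0, 0.0), 0.5, (2.0, 2.0), 200.0))
```

Fitting a figure into a container such as a table cell would then come down to choosing `scale` from the cell width and `origin` from the cell's top-left corner.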

End goal... hmm, I'm a medic rather than a coder so for what I need/want it's tediously simple vector graphs in amongst text in a PDF (kinda what, being lazy, I'd have done with Word). Raster graphics would probably have been fine but the purist in me much prefers nice crisp vectors. I just thought I'd see what I could do in code because I enjoy it. Learned about affine transformations along the way.

If I can make something others can get use out of - even better. I get from the community so if I can give back, seems fair.

There will be 20 more optimal ways of doing some of what I've done so think I'll offer the final working version for someone who knows what they're doing to look at / use in whatever way they see fit :-) This is hobby stuff for me and the rest of life keeps me busy enough to not want to maintain code / debug esoteric corner cases.

Current code base attached.

Iain

mpl5.zip

@LandyQuack
Author

Interesting discovery last night - mpl.use

	pdf =  FPDF()
	pdf.set_font('Times')

	# Use our custom renderer
	mpl.use("module://fpdf_renderer")
	
	pdf.add_font(family='dejavu sans mono', fname='/Users/iain/Library/Fonts/DejaVuSansMono.ttf')
	
	for fig in figs:
		f = fig()
		origin = (20,100)
		scale = 0.3

		pdf.add_page()
		f.savefig (fname=None, fpdf=pdf, origin=origin, scale=scale, bbox_inches='tight')
		plt.close(f)

	# Output what we've got into FPDF 
	pdf.output ('Output_FPDF.pdf')

where fpdf_renderer.py looks like the code below. Needs quite a bit of work yet but got text and a grid in a pdf.

"""
	Based on https://github.com/matplotlib/matplotlib/blob/v3.7.1/lib/matplotlib/backends/backend_template.py
	
	Just need to tell MatPlotLib to use this renderer and then do fig.savefig.
"""

from matplotlib import _api
from matplotlib._pylab_helpers import Gcf
from matplotlib.backend_bases import (FigureCanvasBase, FigureManagerBase, GraphicsContextBase, RendererBase)
from matplotlib.figure import Figure
from matplotlib.transforms import Affine2D
import matplotlib as mpl

class RendererTemplate(RendererBase):
	""" Removed draw_markers, draw_path_collection and draw_quad_mesh - all optional, we can add later """

	def __init__(self, dpi, fpdf, transform):
		super().__init__()
		self.dpi = dpi
		print (f'FPDF: {fpdf}')
		self._fpdf = fpdf
		self._trans = transform

		# some safe defaults
		if fpdf:
			fpdf.set_draw_color(0,0,0)
			fpdf.set_fill_color(255,0,0)

			#		
	def draw_path(self, gc, path, transform, rgbFace=None):

		#self.check_gc(gc, rgbFace)
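		# gc.paint() belongs to the PDF backend's GraphicsContextPdf; the plain
		# GraphicsContextBase subclass returned by new_gc() does not provide it.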
		
		# Unzip the path segments into 2 arrays - commands and vertices, the transform sorts scaling and positioning
		tran = transform + self._trans
		c,v = zip(*[(c,v.tolist()) for v,c in path.iter_segments(transform=tran)])

		p = self._fpdf
		
		with p.local_context():
			
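			# rgbFace is the face (fill) colour as floats in 0..1; fpdf expects 0..255
			if rgbFace is not None:
				p.set_fill_color(*(int(round(255 * ch)) for ch in rgbFace[:3]))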
			
			#p.set_line_width (gc._linewidth*self._scale)
			
			match c:
				# Polygon - starts with moveto, end with closepoly - DF means draw and fill
				case [path.MOVETO, *_, path.CLOSEPOLY]:
					p.polygon(v[:-1],style="DF")
	
				# Simple line
				case [path.MOVETO, path.LINETO]:
					p.polyline(v)
	
				# Polyline - move then a set of lines
				case [path.MOVETO, *mid, path.LINETO] if all(e == path.LINETO for e in mid):
					p.polyline (v)
	
				case _:
					print (f'draw_path: Unmatched {c}')
		
	def draw_image(self, gc, x, y, im):
		pass

	def draw_text(self, gc, x, y, s, prop, angle, ismath=False, mtext=None):
		print (f'[{x},{y}] {s}')
		x,y = self._trans.transform ((x,y))
		self._fpdf.text(x,y,s)

	def flipy(self):
		return True

	def get_canvas_width_height(self):
		return 100, 100

	def get_text_width_height_descent(self, s, prop, ismath):
		return 1, 1, 1

	def new_gc(self):
		return GraphicsContextTemplate()

	def points_to_pixels(self, points):
		# if backend doesn't have dpi, e.g., postscript or svg
		return points
		# elif backend assumes a value for pixels_per_inch
		# return points/72.0 * self.dpi.get() * pixels_per_inch/72.0
		# else
		# return points/72.0 * self.dpi.get()


class GraphicsContextTemplate(GraphicsContextBase):
	"""
	The graphics context provides the color, line styles, etc.  See the cairo
	and postscript backends for examples of mapping the graphics context
	attributes (cap styles, join styles, line widths, colors) to a particular
	backend.  In cairo this is done by wrapping a cairo.Context object and
	forwarding the appropriate calls to it using a dictionary mapping styles
	to gdk constants.  In Postscript, all the work is done by the renderer,
	mapping line styles to postscript calls.
	
	If it's more appropriate to do the mapping at the renderer level (as in
	the postscript backend), you don't need to override any of the GC methods.
	If it's more appropriate to wrap an instance (as in the cairo backend) and
	do the mapping here, you'll need to override several of the setter
	methods.
	
	The base GraphicsContext stores colors as an RGB tuple on the unit
	interval, e.g., (0.5, 0.0, 1.0). You may need to map this to colors
	appropriate for your backend.
	"""

########################################################################
#
# The following functions and classes are for pyplot and implement
# window/figure managers, etc.
#
########################################################################


class FigureManagerTemplate(FigureManagerBase):
	"""
	Helper class for pyplot mode, wraps everything up into a neat bundle.
	
	For non-interactive backends, the base class is sufficient.  For
	interactive backends, see the documentation of the `.FigureManagerBase`
	class for the list of methods that can/should be overridden.
	"""


class FigureCanvasTemplate(FigureCanvasBase):
	"""
	The canvas the figure renders into.  Calls the draw and print fig
	methods, creates the renderers, etc.
	
	Note: GUI templates will want to connect events for button presses,
	mouse movements and key presses to functions that call the base
	class methods button_press_event, button_release_event,
	motion_notify_event, key_press_event, and key_release_event.  See the
	implementations of the interactive backends for examples.
	
	Attributes
	----------
	figure : `matplotlib.figure.Figure`
	    A high-level Figure instance
	"""
	
	# The instantiated manager class.  For further customization,
	# ``FigureManager.create_with_canvas`` can also be overridden; see the
	# wx-based backends for an example.
	manager_class = FigureManagerTemplate

	def draw(self):
		"""
		Draw the figure using the renderer.
		
		It is important that this method actually walk the artist tree
		even if no output is produced because this will trigger
		deferred work (like computing auto-limits and tick
		values) that users may want access to before saving to disk.
		"""
		print (f'Draw: {self._fpdf}')

		renderer = RendererTemplate(self.figure.dpi, self._fpdf, self._trans)
		self.figure.draw(renderer)

	# You should provide a print_xxx function for every file format
	# you can write.
	
	# If the file type is not in the base set of filetypes,
	# you should add it to the class-scope filetypes dictionary as follows:
	filetypes = {**FigureCanvasBase.filetypes, 'fpdf': 'My magic FPDF format'}

	def print_fpdf(self, filename, **kwargs):
		self._fpdf = self._trans = origin = scale = None

		# if not isinstance(self.figure, Figure):
		# 	if self.figure is None: manager = Gcf.get_active()
		# 	else: manager = Gcf.get_fig_manager(figure)
		# 	if manager is None: raise ValueError(f"No figure {self.figure}")
		# 	figure = manager.canvas.figure

		# Fpdf uses top left origin, matplotlib bottom left so... fix Y axis
		ax = self.figure.gca()
		ax.set_ylim(ax.get_ylim()[::-1])

		# We pass scale, origin and a handle to the fpdf instance through here
		for k,v in kwargs.items():
			match (k):
				case 'fpdf': self._fpdf = v
				case 'origin': origin = v
				case 'scale': scale = v
				case _:
					print (f'Unrecognised keyword {k} -> {v}')
		
		# Build our transformation to scale and offset the whole figure
		if origin and scale:
			print ('Transform')
			self._trans = Affine2D().scale(scale).translate(*origin)

		self.draw()
		
	def get_default_filetype(self):
		return 'fpdf'


########################################################################
#
# Now just provide the standard names that backend.__init__ is expecting
#
########################################################################

FigureCanvas = FigureCanvasTemplate
FigureManager = FigureManagerTemplate
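The `match` in `draw_path` handles a path that is one polygon or one polyline. A path can also hold several sub-paths, each starting with its own `MOVETO`. One way to cope, sketched here in plain Python, is to split the code and vertex lists at every `MOVETO` and then classify each piece. The integers mirror matplotlib's `Path` codes (MOVETO=1, LINETO=2, CLOSEPOLY=79); `split_subpaths` and `classify` are hypothetical helpers, and `classify` is stricter than the original polygon case because it also insists on `LINETO` in the middle.

```python
MOVETO, LINETO, CLOSEPOLY = 1, 2, 79  # same values as matplotlib.path.Path

def split_subpaths(codes, verts):
    """Split parallel code/vertex sequences into one (codes, verts) pair per sub-path."""
    pieces = []
    for code, vert in zip(codes, verts):
        if code == MOVETO or not pieces:
            pieces.append(([], []))
        pieces[-1][0].append(code)
        pieces[-1][1].append(vert)
    return pieces

def classify(codes):
    """Name the drawing primitive a sub-path corresponds to."""
    if len(codes) < 2 or codes[0] != MOVETO:
        return "other"
    if codes[-1] == CLOSEPOLY and all(c == LINETO for c in codes[1:-1]):
        return "polygon"
    if all(c == LINETO for c in codes[1:]):
        return "polyline"
    return "other"

# A closed triangle followed by a separate straight line
codes = [MOVETO, LINETO, LINETO, CLOSEPOLY, MOVETO, LINETO]
verts = [(0, 0), (1, 0), (1, 1), (0, 0), (5, 5), (6, 6)]
kinds = [classify(c) for c, _ in split_subpaths(codes, verts)]
print(kinds)
```

Each piece could then go to `polygon` or `polyline` as before, with "other" pieces (curves, for instance) still reported as unmatched.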

@Lucas-C
Member

Lucas-C commented Jun 14, 2023

Interesting!

You are really performing an in-depth research 😊

@LandyQuack
Author

Afternoon - 1 thing which is giving me a little difficulty is the text positioning. It looks like fpdf uses x,y as the origin (bottom left I think) for the text whereas matplotlib is using x,y as the centre of the text I think.

	def draw_text(self, gc, x, y, s, prop, angle, ismath=False, mtext=None):
		print (f'RendererTemplate.draw_text - {s} at {x:.0f},{y:.0f} at angle {angle:.1f} with prop {prop} - {mtext}')
		#print (f'RendererTemplate.draw_text - {s} at {x:.0f},{y:.0f} - {mtext}')
		
		if isinstance(prop, str):
			raise ValueError (f'draw_text.prop is a string ({prop}) - add code to add font')

		# We're expecting a FontProperties instance
		elif isinstance(prop, FontProperties):
			g_fpdf.set_font(prop.get_name(), size=prop.get_size())


		# Transform our data point
		x,y = g_ttrans.transform ((x,y))
		#print (f'[{x:.0f},{y:.0f}] {s}')

		# Get text width to sort positioning - MPL centers on co-ordinate
		tw = g_fpdf.get_string_width(s)

		match angle:
			case 0:
				x -= (tw/2)
				g_fpdf.text(x,y,s)
			case 90 | 90.0:
				print (f'Rotate1 to "{angle}" {type(angle)}') 
				y += (tw/2)
				with g_fpdf.rotation(angle=angle, x=x, y=y):
					g_fpdf.text(x,y,s)
			case _:
				print (f'Rotate to "{angle}" {type(angle)}') 
				with g_fpdf.rotation(angle=angle, x=x, y=y):
					g_fpdf.text(x,y,s)

works reasonably but I couldn't see an equivalent to fpdf.get_string_width to give either a height or a bounding box. Am I just missing it or is it something obvious like font size 14 is a standard measurement tall?
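The 0 and 90 degree cases in that `match` are two instances of one rule: step back half the string width along the text direction. A minimal sketch in plain Python, assuming a y axis that grows downwards (as on an fpdf page) and counter-clockwise-positive angles, which is what the 90 degree case implies; `left_anchor` is a hypothetical helper.

```python
import math

def left_anchor(x, y, text_width, angle_deg=0.0):
    """Start point for left-anchored text so that it ends up centred on (x, y)."""
    a = math.radians(angle_deg)
    half = text_width / 2.0
    # y grows downwards, so moving "back" along upward-running text increases y
    return (x - half * math.cos(a), y + half * math.sin(a))

print(left_anchor(100, 50, 20))        # horizontal: shifted left by half the width
print(left_anchor(100, 50, 20, 90))    # vertical: shifted down by half the width
```

Separately, the renderer shown earlier returns `(1, 1, 1)` from `get_text_width_height_descent`. Matplotlib calls that method while laying out text, so returning real measurements there might remove the need for manual centring; that has not been checked against this code.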

@Lucas-C
Member

Lucas-C commented Aug 2, 2023

Hi @LandyQuack!

Are you still playing with this? 😊

works reasonably but I couldn't see an equivalent to fpdf.get_string_width to give either a height or a bounding box. Am I just missing it or is it something obvious like font size 14 is a standard measurement tall?

fpdf2 does not have a get_string_height function, but it's usually the opposite:
when users call FPDF.cell() / FPDF.multi_cell() / FPDF.write(), they provide an h= parameter defining the line height.
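On the "how tall is size 14" part of the question: a font size is given in points, and one point is 1/72 inch, so the nominal size converts directly to page units. The sketch below converts to millimetres (fpdf2's default unit) and applies a line-spacing factor to get a value that could be passed as `h=`. The factor 1.2 is a common typographic default and an assumption here, and the nominal size is not a tight bounding box of the glyphs.

```python
MM_PER_INCH = 25.4
POINTS_PER_INCH = 72.0

def points_to_mm(size_pt):
    """Nominal font size in millimetres."""
    return size_pt * MM_PER_INCH / POINTS_PER_INCH

def line_height_mm(size_pt, spacing=1.2):
    """Suggested line height: nominal size times a spacing factor (assumed 1.2)."""
    return points_to_mm(size_pt) * spacing

print(round(points_to_mm(14), 2))    # nominal height of 14 pt text in mm
print(round(line_height_mm(14), 2))  # the same with 1.2 line spacing
```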

@Lucas-C Lucas-C added research needed too complicated to implement without careful study of official specifications pending-answer performance image labels Aug 2, 2023
@LandyQuack
Author

LandyQuack commented Aug 2, 2023 via email

@Lucas-C
Member

Lucas-C commented Aug 2, 2023

Thank you for the update Iain

Take your time, and enjoy the summer / your holidays 😊

I'll be happy to give you some feedback if at some point you want to submit a PR
