-
-
Notifications
You must be signed in to change notification settings - Fork 17
This page gathers R&D ideas to improve both performance and quality:
Marlin stores each edges using 2 float (edge slope and X position) and 3 integer values (next edge index, Y max, orientation +/- 1). It represents 5 x 4 ie 20 bytes.
Marlin uses 2 arrays: int[] and float[] wasting 2 values per edges (index alignment). Marlin Unsafe uses a single off-heap buffer storing edges (like c-struct) into a single byte array.
TODO:
- Try storing orientation (single bit) into Ymax integer using bit mask / shift operations => 16 bytes only per edge !
- Edges are left unsorted but active edge list is sorted at each scanline pass. Having an edge ID (int) that simply increments following the polygon edge iteration could help keeping edge continuity and certainly may help sorting / adding / removing edges in the active edge list.
STATUS: IN PROGRESS
These arrays store respectively the first edge index and edge count per subpixel Y value. The edge count uses a bit mask to store both new edge count and a removed flag indicating that at least one edge must be removed.
It helps a lot finding which edges to add (simply iterate edges using the first edge index and the next edge index in every edge) or remove edges.
To remove edges, it is not optimal as it needs to traverse all active edges (always small) as the removal count is not known.
TODO:
- using bit masks (to be defined maybe only 4 bits), it can be possible to replace the removed flag stored in the bucketCount array by a removed edge count. However, it may reduce the possible number of edge addition / removal (int overflow) !
- optimize active edge removal in the scanline processing
Marlin uses two sort algorithms depending on the AEL size:
- heap sort (crossing array)
- binary search then insertion sort
These are not optimal but it is quite complex to sort active edge list (edge pointers and X crossing arrays) according to their current X crossing and orientation. In fact, only newly added edges need deeply sort as these are totally unsorted whereas "old" edges in AEL are almost sorted (few inversions may be possible) ...
TODO:
- restore sort on orientation as it was done in pisces (bit mask)
- to improve cache locality, copy active edges into a new activeEdges 1D array (small) to maximize the cache coherence (spatial and time locality) and fix edgePtrs to point into the small activeEdges instead of the (sometimes) large complete edges array
- use quick sort or dual pivot sort to optimize sort of newly added edges
- skip sort whenever possible ! however, as polygons can have self intersections and even/odd filling rules can happen, it is quite dangerous to avoid sorting AEL at each scanline processing...
Try using DDA as former pisces implementation instead of floating point operations to update X crossing for each edge and each scanline. See https://code.google.com/p/pisces-graphics/
As it can lead to increased precision errors and openjdk pisces was modified to use floating point operations instead of DDA, this change is postponed for now.
As shown in [1], supersampled images should be filtered before rescaling to avoid artefacts (moire patterns or jaggies). Pisces / marlin (march 2013) only counts subpixels to compute the pixel coverage ie all subpixels (8x8) have the same weight (=1). To improve visual quality better resampling, filters can be implemented (tent or gaussian filters).
TODO:
- modify PiscesCache to store byte[] alpha values ie convert coverage to alpha earlier to avoid storing coverages as int[] (large arrays)
- allow infinite precision ie more than 16x16 subpixels ie adjust alpha map and normalization depending on the number of subpixels: 8x8 ~ 64 levels vs 256x256 ~ 65536 levels but only 256 possible alpha values; 8x8 ~ multiply by 4 vs 256x256 ~ divide by 256 => use appropriate bit shifts
STATUS: IN PROGRESS
TODO:
- precompute filter weights according to the subpixel grid (2x2 up to 16x16) but also their integration onto the subpixel grid for performance reasons to improve scaline processing (sp_x, sp_y)
- instead of counting covered subpixels during scanline processing, use filter weights per covered subpixels (use integer weights by scaling up by a scaling factor) before summation and store weighted coverage into alpha line (as usual). Warning: for each edge only 1 subpixel corresponds to the edge intersection; the pixel coverage corresponds to all subpixels between 2 edges: take care of correct filter integral between these subpixels: same several pixel(s) !
- normalize pixel coverages once all scanlines have been processed per pixel line according to the filter scaling factor
- convert then the pixel coverage (int) to an alpha value (byte) [0..1] ie [0.255] using the maximal precision ie 1/256 uncertainty
Several solutions have been implemented in GPU to compute more accurately the pixel coverage of a primitive (line):
- Analytical line coverage:
Using mathematical area coverage for a trapezoid, it is possible to compute the exact pixel or subpixel area covered by an edge: see openjdk's MaskFill.c
- Edge distance function:
See [2] for proper line rendering using 4 distance functions and filtering
See also Wu line algorithm: http://en.wikipedia.org/wiki/Xiaolin_Wu's_line_algorithm http://freespace.virgin.net/hugo.elias/graphics/x_wuline.htm
- Coverage mask:
The pixel coverage can be computed from the edge slope and pixel distance: see [3] and [4] ...
For very small polygons or details (intersection, corners or small polygons), the scanline step controls the detail level ie using 2x2 or 16x16 will refine the pixel coverage accuracy ie take into account the small polygon features.
Depending on the edge slope / edge intersections or edge length, it may be possible to refine the supersampling using 4x4 (by default) up to 16x16 or more.
Another issue concerns small polygons / shape details (corner or extrema) that are smaller than 1 pixel or few subpixels: again their coverage is too badly estimated due to rounded starting / ending edge coordinates: refining scanline increments should improve that but it can become very costly in terms of performance.
Adaptive algo is needed but rather more complex to do but I have the feeling it is the good direction: only refine scanlines when needed ~ error estimation / propagation in physics.
[1] Study of Supersampling Methods for Computer Graphics Hardware Antialiasing Goss, Michael E.; Wu, Kevin 12/05/2000 http://www.hpl.hp.com/techreports/1999/HPL-1999-121R1.html
[2] Prefiltered Antialiased Lines Using Half-Plane Distance Functions McNamara, Robert ; McCormack, Joel ; Jouppi, Norman P. 1998 http://www.hpl.hp.com/techreports/Compaq-DEC/WRL-98-2.html
[3] Efficient Coverage Mask Generation for Antialiasing using Linear Edge Functions M.D. Walker, J.P. Ewins, M White, P.F. Lister http://www.researchgate.net/profile/martin_white6/publication/228573021_efficient_coverage_mask_generation_for_antialiasing_using_linear_edge_functions%2ffile%2f72e7e526ccbe92a4f2.pdf
[4] A New Simple and Efficient Antialiasing with Subpixel Masks Andreas Schilling 1991 http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.59.3971&rep=rep1&type=pdf