Incorrect reading of two openly available test datasets in .ang file format #413

hakonanes · 2022-12-04T13:10:47Z

See the issues with .ang files being read incorrectly into a CrystalMap in #411:

- Scan units should be "um", not "nm"
- Phase IDs of the AF96 dataset are incorrectly read


argerlt · 2022-12-05T17:47:22Z

Here's the code snippet for loading the AF96 datasets correctly. I don't know enough about the TSL OIM software used to collect this data to know whether all EBSD scans from TSL can be read like this, or whether there are user choices that change the ordering of columns. I'm also not sure how best to make orix determine the correct phase data, or whether that should be left to orix users to add.

# -*- coding: utf-8 -*-
"""
Created on Thu Nov 10 10:26:48 2022

@author: agerlt
"""

import numpy as np
import glob
from orix.quaternion import Rotation
from diffpy.structure import Atom, Lattice, Structure
from orix.crystal_map import CrystalMap, PhaseList
from orix import io
import os


# Create the output directory if needed, then delete the files from the
# last run
os.makedirs("AF96", exist_ok=True)
for old in glob.glob("AF96/AF96_Large*.h5"):
    os.remove(old)

# Sort so that the numbering of the output files is reproducible
angs = sorted(glob.glob("AF96_Large*.ang"))
xmaps = []

for ang in angs:

    # Load the data
    e1, e2, e3, x, y, image_quality, confidence_index, phase, indexed, \
        fit_parameter = np.loadtxt(ang, unpack=True)

    # Make an orix .h5 file.
    eu = np.column_stack((e1, e2, e3))
    rots = Rotation.from_euler(eu)
    properties = dict(iq=image_quality.astype(np.float32),
                      ci=confidence_index.astype(np.float32),
                      fit_parameter=fit_parameter.astype(np.float32))
    # Create unit cells of the phases
    structures = [
        Structure(
            title="austenite",
            atoms=[Atom("fe", [0] * 3)],
            lattice=Lattice(0.360, 0.360, 0.360, 90, 90, 90),
        ),
        Structure(
            title="ferrite",
            atoms=[Atom("fe", [0] * 3)],
            lattice=Lattice(0.287, 0.287, 0.287, 90, 90, 90),
        ),
    ]

    phase_list = PhaseList(
        names=["austenite", "ferrite"],
        point_groups=["m-3m", "m-3m"],
        space_groups=[225, 229],
        structures=structures,
    )
    # Create a CrystalMap instance
    xmap = CrystalMap(
        rotations=rots,
        phase_id=phase.astype(np.int32),
        x=x.astype(np.float32),
        y=y.astype(np.float32),
        phase_list=phase_list,
        prop=properties,
    )
    xmaps.append(xmap)

print("Saving everything as orix .h5 files...")
# Save every map that was read, instead of assuming there are exactly five
for i, xmap in enumerate(xmaps):
    io.save("AF96/AF96_Large_{}.h5".format(i + 1), xmap)
print("Done!")

argerlt · 2022-12-05T17:49:04Z

Also, attached is all the information on the collection software, taken from the following paper https://doi.org/10.1016/j.matchar.2019.109835

hakonanes · 2022-12-07T20:42:37Z

Thanks for pointing me to these test datasets, @argerlt. I've fixed the identified issues in #416, and hope to release it in a 0.10.3 patch next week.

In your code snippet above, you assume the following column names for the .ang file data:

e1, e2, e3, x, y, image_quality, confidence_index, phase, indexed, fit_parameter

I'm not sure about the ninth column, "indexed". What do you base this name on? In the file I've tested, Field of view 1_EBSD data_Raw.ang, this column contains only ones. However, in all other .ang files I've read before, un-indexed points are identified as having a confidence index (CI) of -1 and a pattern fit of 180 degrees. 29 points in the mentioned .ang file have a CI of -1, i.e. are identified as un-indexed. Thus, I believe the ninth column contains some other data. But I have no idea what, so I've named the data "unknown1" in the returned CrystalMap.prop dictionary.
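
A minimal sketch of the check described above, run on a made-up three-row stand-in for a ten-column .ang file rather than the real dataset. The data values and the variable names (`ninth`, `is_unindexed`) are assumptions for illustration only:

```python
from io import StringIO

import numpy as np

# Made-up stand-in for a ten-column .ang file (not the real data).
# Columns: phi1, Phi, phi2, x, y, iq, ci, phase, ninth column, fit
ang_text = """# header lines start with '#' and are skipped by np.loadtxt
0.1 0.2 0.3 0.0 0.0 120.5  0.85 1 1   1.2
0.4 0.5 0.6 0.5 0.0  10.0 -1.00 0 1 180.0
0.7 0.8 0.9 1.0 0.0  98.3  0.40 2 1   0.9
"""

data = np.loadtxt(StringIO(ang_text))
ci, ninth, fit = data[:, 6], data[:, 8], data[:, 9]

# Un-indexed points are flagged by a CI of -1
is_unindexed = np.isclose(ci, -1)
n_unindexed = int(is_unindexed.sum())

# An "indexed" flag would differ between indexed and un-indexed points;
# a column that is constant cannot carry that information
ninth_is_constant = bool(np.all(ninth == ninth[0]))

print(n_unindexed, ninth_is_constant)
```

If the ninth column is constant while some points have a CI of -1, it cannot be an "indexed" flag.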

argerlt · 2022-12-07T22:29:58Z

You're right, that was a mistake on my part. The correct name is either "SEM signal" or "detector signal" or just "sem", which is left as 1 if there is no corresponding SEM data included.

Looking inside MTEX's .ang reader found here, I believe this case lines up with their description of "version 5" (line 113):

  % we need to guess one of the following conventions
  % Euler 1 Euler 2 Euler 3 X Y IQ CI Phase SEM_signal Fit
  % Euler 1 Euler 2 Euler 3 X Y IQ CI Fit phase
  % Euler 1 Euler 2 Euler 3 X Y IQ CI Fit unknown1 unknown2 phase
  % most important is the position of the phase
  
  % for future reference:
  % the following is taken from a recent .ang file - some new files might 
  % actually state the version in the header
  %
  % # NOTES: Start
  % # Version 1: phi1, PHI, phi2, x, y, iq (x*=0.1 & y*=0.1)
  % # Version 2: phi1, PHI, phi2, x, y, iq, ci
  % # Version 3: phi1, PHI, phi2, x, y, iq, ci, phase
  % # Version 4: phi1, PHI, phi2, x, y, iq, ci, phase, sem
  % # Version 5: phi1, PHI, phi2, x, y, iq, ci, phase, sem, fit
  % # Version 6: phi1, PHI, phi2, x, y, iq, ci, phase, sem, fit, PRIAS Bottom Strip, PRIAS Center Square, PRIAS Top Strip, Custom Value
  % # Version 7: phi1, PHI, phi2, x, y, iq, ci, phase, sem, fit. PRIAS, Custom, EDS and CMV values included if valid
  % # Phase index: 0 for single phase, starting at 1 for multiphase
  % # CMV = Correlative Microscopy value
  % # EDS = cumulative counts over a specific range of energies
  % # SEM = any external detector signal but usually the secondary electron detector signal
  % # NOTES: End
  %

My two cents:
Asking around in my lab, it seems the TSL .ang file format has changed some over the years, as has Oxford's. Thus, when trying to write a generic EBSD_loader, it seems the best practice would be to make a list of all possible formats, then pare it down based on the number of columns, whether columns contain integer or float data, etc.

That said, creating a comprehensive "if/then/else" tree for every oddball format sounds exhausting, and so far for me, saying "if 10 columns, assume phi1, Phi, phi2, x, y, iq, ci, phase_id, detector_signal, pattern_fit" has yet to fail, so such a function might only need to be included if and when a test case is found that orix mishandles.
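
To make the "pick the layout by column count" idea concrete, here is a rough sketch based only on versions 1 to 5 in the NOTES block quoted above. The names `TSL_COLUMNS_BY_COUNT` and `guess_tsl_columns` are made up for this example and are not part of orix or MTEX:

```python
# Column layouts for versions 1-5 from the quoted NOTES block: each
# version appends one column to the previous one.
_BASE = ["phi1", "Phi", "phi2", "x", "y", "iq"]
_EXTRA = ["ci", "phase", "sem", "fit"]
TSL_COLUMNS_BY_COUNT = {
    len(_BASE) + k: _BASE + _EXTRA[:k] for k in range(len(_EXTRA) + 1)
}


def guess_tsl_columns(first_data_row: str) -> list[str]:
    """Guess column names from the number of values in a data row.

    Hypothetical helper; it raises instead of guessing when the column
    count is not one of the layouts listed above.
    """
    n_columns = len(first_data_row.split())
    if n_columns not in TSL_COLUMNS_BY_COUNT:
        raise ValueError(
            f"No known TSL column layout with {n_columns} columns"
        )
    return TSL_COLUMNS_BY_COUNT[n_columns]


print(guess_tsl_columns("0.1 0.2 0.3 0.0 0.0 120.5 0.85 1 1 1.2"))
```

This does not cover the alternative conventions MTEX mentions (phase in the last column), which would need the extra integer/float check on the candidate phase column.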

hakonanes · 2022-12-08T08:43:34Z

[...] it seems the best practice would be to make a list of all possible formats, then pare it down based on column number

This is what the current reader does. See the relevant lines with the updated possible columns in #416. Since ASTAR, EMsoft and orix have unique footprints in their .ang file headers, if none of these footprints are found, we assume the file was written by EDAX TSL. Then, we determine the column names based on the number of columns available. Reading EDAX TSL .ang files with 10 or 15 columns should now work. The reader will fail as in #411 if a file with another number of columns is read. But it should not fail silently (as demonstrated), so we can improve it further when that happens.
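
As an illustration only, the logic described above could be sketched like this. The footprint strings below are placeholders picked for the example, not the strings the orix reader actually searches for, and `guess_vendor` and `check_tsl_column_count` are hypothetical names:

```python
# Placeholder footprints (assumptions for this sketch, NOT orix's own)
FOOTPRINTS = {
    "astar": "acom",
    "emsoft": "emsoft",
    "orix": "orix",
}
# Column counts stated above as supported for EDAX TSL files
TSL_SUPPORTED_COLUMN_COUNTS = (10, 15)


def guess_vendor(header_lines):
    """Return the first vendor whose footprint is in the header, else "tsl"."""
    header = "\n".join(header_lines).lower()
    for vendor, footprint in FOOTPRINTS.items():
        if footprint in header:
            return vendor
    # No footprint found: assume the file was written by EDAX TSL
    return "tsl"


def check_tsl_column_count(n_columns):
    """Fail loudly rather than silently mislabel the columns."""
    if n_columns not in TSL_SUPPORTED_COLUMN_COUNTS:
        raise ValueError(
            f"Cannot read EDAX TSL .ang file with {n_columns} columns, "
            f"expected one of {TSL_SUPPORTED_COLUMN_COUNTS}"
        )


print(guess_vendor(["# TEM_PIXperUM 1.000000", "# x-star 0.5"]))
```

Raising on an unexpected column count is what turns a silent mislabelling into a bug report.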

I consider this fixed once #416 is merged.

argerlt · 2022-12-08T16:59:19Z

Ah, you are right.

In that case, my feedback is just "I believe unknown1 should be changed to sem". Apologies for the long walk to a short answer.

"If I had had more time I would have written a shorter letter"
- Mark Twain

hakonanes added the "bug" label Dec 4, 2022

hakonanes added this to the v0.10.3 milestone Dec 4, 2022

hakonanes mentioned this issue Dec 4, 2022

Adding open source EBSD datasets #411

Open

This was referenced Dec 7, 2022

Prepare minor release 0.11.0 #415

Closed

Allow reading of EDAX TSL .ang files with ten columns #416

Merged

hakonanes closed this as completed in #416 Jan 31, 2023

hakonanes modified the milestones: v0.10.3, v0.11.0 Feb 1, 2023
