-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Suggest cleanup of geometry handling in the CPU version #198
Comments
I failed to reproduce what I thought was the most severe issue that motivated this proposal for a thorough cleanup with regards to how the geometry is being handled. I was under the impression that sometimes, the grid/lattice parameters read from |
I began reviewing the code and now it seems to me that what I will advocate with respect to the geometry will need not be any big revolution after all. I have looked into Line 70 in 2344a7b
Do we even need it? I mean, it seems to be used only in some obscure examples in examples/_bak/ . Do we want to have these as a part of the ppafm package? If not, I suggest we remove those examples along with the function setFF . Otherwise, if we want to keep it, I suggest the following small cleanup: setFF seems to reach for the global grid parameters twice if cell is None ; first directly in its body and second time (but this never happens because by then cell has been already defined by the first case) through setFF_shape which in turn calls setGridCell . I would remove the test of cell is None and the subsequent assignment of gridA , gridB , gridC from setFF , relying on this being done in setGridCell .Moreover, if we use gridA , gridB , and gridC as the default for the lattice vectors in core.py , I believe we should also use gridN as the default for the grid size there (in particular, probably in the setGridN function). This enhancement should in fact not change the behavior of our code at this moment, because the flow of the code is now such that the core.py functions are always called for some pre-established FF array and learn the grid from the shape of this array. That is fine and should remain that way, but I still think the setGridN should check the n argument for being None and use gridN as the fallback in such a case, just to make the code more robust.
|
Just to clarify how I expect the job related to this issue would be done: I will do it, of course, but only after hearing the opinion of others. |
Another candidate for deletion: |
yes, it is true I was aiming for such compromise. But it is also true some structures in the code are just for historical reasons (to minimize effor of rewritining parts of code which are already there)
I'm very much pro. If you have ideas pleas go ahead with proposing them. I'm very open to it. I'm just not enough systematic myself, and also currently it is not my priority. ad what you meanit is not used? OK, maybe So yes, this is also perhaps more historical. Firts I wrote ad |
@ProkopHapala, thanks for the reply. As for the
|
Now I see that the |
I agree that it is not nice from the point of code structure (modularity etc.). But the point is that But yes, it would makes sense to factor out these duplicite things into singe header file (.h). But I'm not sure how it works if the function is in |
OK, to be honest I don't remember exactly the code structure and the rational nor the historical path which lead to this. If you find it nicer that way, feel free to change it as you say |
Good. Concerning the duplicity of |
I'm also very much in support for packing things into a class object. And I already did something in this direction in the OCL API with the Lines 102 to 112 in 4e04d47
It could be nice to somehow interface this with the CPU part, maybe make the DataGrid a subclass of the whatever class we create for the core code, to avoid repeated code and maybe help somehow unify the two parts.
As for ASE, I think it would be a pretty good option for handling things. I get that too many dependencies is not desirable, but ASE is a pretty well established and stable package, and probably many people who use ppafm also already have ASE in their environment. Also, I already used ASE as an optional dependency in the GUI for the geometry viewer. Making our own solution is fine too, but it's a lot more work. |
Yes, I like your
I'm not sure which is best approach. Because python in very dynamic unlike C++, It woudl maybe slightly prefer (1) i.e. to have optional dynamic fields. The disadvantage is that it would require to put everywhere checks if some field is initialized, like I have to say, that one thing I hate about python is that due to dynamic initialization of class properties, it is often not clear what properties the class is supposed to have when reading the code (at least comparing to C++). This was actually main reason why I considered classes in python poinless for quite long time. Do you know some good practicies how to make the declaration of class properties in python more explicit?
Don't understand me wrong. I'm very much pro integration with ASE as optional dependency. I also use ASE and I like it. But I don't like the idea to depend on it in such core functionality as loading and manegind atomic coordinates (i.e. not being able to run ppafm at all without ASE) |
I would avoid deep class hierarchies (2). They become very rigid and difficult to change when new needs arise, and it's often not clear at which level of the hierarchy some piece of data should exist. So I would opt for composition instead (1, 3). I see 1 and 3 being structurally the same. E.g. either we store the lvec and grid as
One way to do this is to make all of the internal variables private (prefix with underscore Line 118 in 4e04d47
but it can also sometimes be None because it exists on the GPU instead. And then I create a property method that returns the array after checking that it's fine: Lines 152 to 158 in 4e04d47
The array = datagrid_instance.array With the doc string in the method, it also makes for nice documentation, so the API is clear: https://ppafm.readthedocs.io/en/latest/ppafm.ocl.html#ppafm.ocl.field.DataGrid.array |
We can probably make this simpler for ourselves by insisting that some of the fields always exist. The I think this consideration also encourages splitting the functionality into multiple classes (option 3.), because then if you only need the grid, for example, then you can just use the |
|
Not sure I understand.
|
@ProkopHapala: Both, sort of.
By now, I have started working on the problem of properly defining the grid origin, as needed to solve #223. On that occasion, I noticed there are parameters called |
I see, make sens
I agree, I was also considering .xsf files as intermediate result which allows easy debugging and eventually some way to analyze the results. ... Maybe I just don't understand the proper format of .xsf files. I did not saw any problem there. If there is such problem it make sense to correct it. |
@ProkopHapala: As for the problems with the XSF format:
Let me explain here how (to my understanding) the logic with the extra points on the grid cell boundary is supposed to work in the XSF files. First, if the length of a grid vector is a=(ax, ay, az) and there are Na points along that vector, the elementary grid step along that vector is da=(ax, ay, az)/(Na-1). This is always true. It is part of the definition of the XSF format and it is non-obvious, you can imagine a different format that would have Na instead of Na-1 in the denominator of that definition. For a general grid the dimensions of which have nothing to do with the periodic unit cell, this is end of the story. However, there is a little problem when we want the grid to cover exactly the unit cell. Imagine that a=(ax, ay, az) is now the lattice vector. We want to sample the cell by Na points along this vector with the grid points being (of course) equally spaced and, for starters, they are supposed to be unique, they do not repeat themselves because of the periodicity. The spacing between the points should of course be da=(ax, ay, az)/Na under this condition. So what should the corresponding grid vector be? Right, considering the formula that defines the relation between the grid vector and the grid spacing, it has to be a' = a*(Na-1)/Na But this is inelegant, we would prefer to have a' = a and this is what most codes (but not FHI-AIMS!) generating XSF files for periodic structures chose to enforce. The price to pay is adding a last point along every axis that is an identical copy of the first point by the lattice periodicity. So now, we have Na +1 points instead of Na and the last point is kind of superfluous in terms of information content, but the lattice vectors and grid vectors are the same, which makes us happy, because we can immediately see (looking at the XSF file header) that the grid indeed covers the whole unit cell. So, to sum up
|
@ProkopHapala Is there any good reason why the order of indices for the grid array is FF[iz,iy,ix] rather than FF[ix,iy,iz] in the CLI version? I used to think this is needed for the CPP parts of the code to be efficient but now as I look deeper into it, that does not seem to be the case. |
I think efficiency-wise it is rather opposite (we compute each z-stroke independently => better if z is the fastest axes in the array ) I think the reason for FF[iz,iy,ix] ordering was perhaps plotting ? (z-slices), but it is not very good motivation (matplotlib can plot FF[:,:,iz] as well as FF[iz,:,:] ). Maybe it is somehow useful in Fz->df conversion (but perhaps not), or for saving the data to .xsf ? I don't know. Eventually we may change the order ... but it would require changes on many places in the code ... co unless there is good reason .... ? But |
@ProkopHapala, thanks for the explanation with respect to the array ordering. As for the Fz->df conversion and saving |
@mondracek As far as I understood, the zyx ordering in the IO functions was mostly because that is how the XSF file format is defined. I specifically added those It should not be a big change. I think the only places in the OCL code that do IO related things are these two methods: Lines 192 to 195 in 9f009ae
Lines 227 to 230 in 9f009ae
The to_file method in particular transposes the array before saving.
|
@NikoOinonen, thanks for the explanation. So @ProkopHapala was right after all that the likely original motivation of the zyx order in the CLI code was reading and saving XSF files. The native ordering in the XSF file is the opposite of what we'd need, either in CLI or OCL. Seems that historically, the approach in CLI was to accept that ordering and interpret the array indices as ZYX while |
As for the |
I think it is woth it.
if you implement it, I'm pro :-)
Copy that |
As I mentioned in my comment to #52, the way the CLI version of our code handles the geometry parameters (
params['gridA'], ... params['gridN']
,lvec
,nDim
etc.) could use some cleanup. For starters, I propose we discuss this issue:gridA
,gridB
,gridC
, andgridN
parameters in the code? Do we want them to be just an optional input (in case the geometry is not specified by other means, in particular in a*.xsf
or other input file)? In such a case, they can be safely forgotten (I do not mean literally deleted, just ignored) after the initialization stage. The actual lattice vectors and grid size should be consistently carried in variables likelvec
andnDim
throughout the actual computation. Alternatively, do we wantparms['gridA'], ... params'['gridN']
to serve as a global variable? Then, we should consistently work with those. In the most extreme version of this choice, we would not even need to introducelvec
andnDim
variables at all. I can somehow understand that a compromise might be desirable: Perhaps we want some functions to be also usable as stand-alone entities, without being called from the main scripts. A moderately experienced user might want to do something in an interactive Python session. TheloadGeometry
function is one of those we may want to be able to call this way. Than, it makes sense for such a function to takelvec
, 'nDim` etc. as arguments and store them as its local variables. On the other hand, some other functions may be intended as mere subprograms of the main executable scripts and need access to global variables, since passing everything they need through arguments to them would be inefficient and would generate too long argument lists. I suspect this compromise was what @ProkopHapala had in mind when originally writing the code. Prokop, is that so? I am actually in favor of the compromise solution, given that we decide very carefully which functions should be the independent entities, taking geometry parameters as their arguments and perhaps returning them as their values, which functions should in contrast access the global variables, and why the chosen option is needed in every case. Because if we are not clear about this, we inevitably end up with the mess we currently have.*.xsf
and*.cube
files every time, stores it and handles accordingly. That is, the code should not rely on the origin being zero. We might even introduce an optional parameterparams['gridOrigin']
toparams.ini
, which would be used with*.xyz
files. That should solve the problem with negative coordinates of atoms in non-periodic systems, which is currently effectively forbidden.The text was updated successfully, but these errors were encountered: