Skip to content

Commit

Permalink
LOOP-OPTIMIZATION: restarting and parfors
Browse files Browse the repository at this point in the history
- Restarting protocol for both U-driven and T-driven lines

- Hybrid Distributed/Restarting protocol for (U,T) scans

  > the external T-driven loop is parallelized in independent processes through the native parfor construct

  > there is no restart connecting each initial T value

  > this assumes a sufficiently low/high interaction, so that the noninteracting/atomic limit seeds are sufficient to achieve a good and fast convergence, and proceed from there with the sequential, restarting-aided U-driven lines

  > this *requires* the Parallel Computing Toolbox to be actually performed, but runs just fine if such plug-in is missing: the parfor just fall back to traditional sequential loops.

A test with the current input state gives a full (U,T) diagram in less then five minutes, on a local machine with 4 physical i5 cores. Sounds good.
  • Loading branch information
beddalumia committed Feb 26, 2022
1 parent bb4fe75 commit 0ae3305
Show file tree
Hide file tree
Showing 4 changed files with 41 additions and 20 deletions.
4 changes: 2 additions & 2 deletions ROADMAP.md
Original file line number Diff line number Diff line change
Expand Up @@ -20,9 +20,9 @@
- [x] Compute the Luttinger integral, as defined in `PRB 90 075150 (2014)`. Since it appears to be quantized _at very low temperatures_, it could become the definitive **flag** for _quantum_ phase diagrams; much better than Z or S for it is an **integer**. [→ easier phase-boundary recognition!]
> Luttinger Theorem currently works for very low temperatures only. It might well be an inherent limitation, restricting its domain to the quantum Mott transition. Also note that to have a sharp first order step-up at the transition a very highly frequency resolution is needed, so to make the IPT solver performance-critical! [solved brilliantly with fast convolutions, see the `solver-optimization` section below]
- [x] `SOLVER-OPTIMIZATION`: make the SOPT runs faster, by optimizing the needed convolutions. [implemented a pow2-optimized FFTW-based custom algorithm that actually greatly improves the cpu-time for the `wres=2^15` calculation: almost a x10 overall speedup!]
- [x] `SOLVER-OPTIMIZATION`: make the SOPT run faster, by optimizing the needed convolutions. [implemented a pow2-optimized FFTW-based custom algorithm that actually greatly improves the cpu-time for the `wres=2^15` calculation: almost a x10 overall speedup!]

- [ ] `LOOP-OPTIMIZATION`: insert a "restarting" protocol for lines and full phase diagram spans. The gloc0=0 condition appers to be too unstable to obtain accurate UC1 values. Furthermore this would most probably speed up a lot the convergence, by lowering the required number of iterations.
- [x] `LOOP-OPTIMIZATION`: insert a "restarting" protocol for lines and full phase diagram spans. The gloc0=0 condition appers to be too unstable to obtain accurate UC1 values. Furthermore this would most probably speed up a lot the convergence, by lowering the required number of iterations.

- [x] `HPC-OPTIMIZATION`: configure an interface to cluster facilities and define the scheduling resources for optimal running [no distributed computing, just built-in handling of shared-memory parallelization]

Expand Down
2 changes: 1 addition & 1 deletion code/+phys/sopt.m
Original file line number Diff line number Diff line change
Expand Up @@ -40,7 +40,7 @@

end

if FAST && DEBUG
if FAST & DEBUG % Avoiding && due to parfors...
%% Cross-Check
P_test = conv(A,A,'same');
err1 = norm(P_test-P)/norm(P_test);
Expand Down
51 changes: 35 additions & 16 deletions code/main.m
Original file line number Diff line number Diff line change
Expand Up @@ -11,21 +11,22 @@
end

%% INPUT: Physical Parameters
D = 4.0; % Bandwidth
U = 4.0; % On-site Repulsion
D = 1.0; % Bandwidth
U = 3.0; % On-site Repulsion
beta = inf; % Inverse Temperature

%% INPUT: Boolean Flags
MottBIAS = 0; % Changes initial guess of gloc (strongly favours Mott phase)
ULINE = 1; % Takes and fixes the given beta value and performs a U-driven line
ULINE = 0; % Takes and fixes the given beta value and performs a U-driven line
TLINE = 0; % Takes and fixes the given U value and performs a T-driven line
UTSCAN = 0; % Ignores both given U and beta values and builds a full phase diagram
SPECTRAL = 1; % Controls plotting of spectral functions
UTSCAN = 1; % Ignores both given U and beta values and builds a full phase diagram
SPECTRAL = 0; % Controls plotting of spectral functions
PLOT = 1; % Controls plotting of *all static* figures
GIF = 0; % Controls plotting of *animated* figures
UARRAY = 0; % Activates SLURM scaling of interaction values
TARRAY = 0; % Activates SLURM scaling of temperature values
DEBUG = 1; % Activates debug prints / plots / operations
TARRAY = 0; % Activates SLURM scaling of temperature values
RESTART = 1; % Activates the restarting strategies for lines
DEBUG = 0; % Activates debug prints / plots / operations
FAST = 1; % Activates fast FFTW-based convolutions

%% INPUT: Control Parameters
Expand All @@ -38,7 +39,7 @@
Ustep = 0.10 * D; % Hubbard U incremental step for phase diagrams
Umax = 6.00 * D; % Hubbard U maximum value for phase diagrams
Tmin = 1e-3; % Temperature U minimum value for phase diagrams
Tstep = 1e-3; % Temperature incremental step for phase diagrams
Tstep = 1e-2; % Temperature incremental step for phase diagrams
Tmax = 5e-2; % Temperature U maximum value for phase diagrams
dt = 0.05; % Frame duration in seconds (for GIF plotting)

Expand All @@ -63,17 +64,17 @@

% Initial guess for the local Green's function
if MottBIAS
gloc_0 = 0; % no bath -> no Kondo resonance -> strong Mott bias :)
seed = 0; % no bath -> no Kondo resonance -> strong Mott bias :)
else
gloc_0 = phys.bethe(w + 10^(-3)*1i,D); % D is the DOS "radius"
seed = phys.bethe(w + 10^(-3)*1i,D);
end

%% Workflows

if not( ULINE || TLINE || UTSCAN )
%% Single (U,T) point
fprintf('Single point evaluation @ U = %f, T = %f\n\n',U,1/beta); tic
[gloc,sloc] = dmft_loop(gloc_0,w,D,U,beta,mloop,mix,err);
[gloc,sloc] = dmft_loop(seed,w,D,U,beta,mloop,mix,err);
Z = phys.zetaweight(w,sloc);
I = phys.luttinger(w,sloc,gloc);
S = phys.strcorrel(w,sloc);
Expand All @@ -87,12 +88,17 @@
if ULINE
%% U-driven MIT line [given T]
fprintf('U-driven span @ T = %f\n\n',1/beta); tic
clear('gloc','sloc','Z','I','S')
clear('gloc','sloc','Z','I','S');
Uvec = Umin:Ustep:Umax; NU = length(Uvec);
gloc_0 = seed; gloc = cell(NU,1); sloc = gloc;
Z = zeros(NU,1); I = zeros(NU,1); S = zeros(NU,1);
for i = 1:NU
U = Uvec(i);
fprintf('< U = %f\n',U);
[gloc{i},sloc{i}] = dmft_loop(gloc_0,w,D,U,beta,mloop,mix,err,'quiet');
if(RESTART)
gloc_0 = gloc{i};
end
Z(i) = phys.zetaweight(w,sloc{i});
I(i) = phys.luttinger(w,sloc{i},gloc{i});
S(i) = phys.strcorrel(w,sloc{i});
Expand All @@ -112,10 +118,15 @@
fprintf('T-driven span @ U = %f\n\n',U); tic
clear('gloc','sloc','Z','I','S')
Tvec = Tmin:Tstep:Tmax; NT = length(Tvec);
gloc_0 = seed; gloc = cell(NT,1); sloc = gloc;
Z = zeros(NT,1); I = zeros(NT,1); S = zeros(NT,1);
for i = 1:NT
T = Tvec(i); beta = 1/T;
fprintf('< T = %f\n',T);
[gloc{i},sloc{i}] = dmft_loop(gloc_0,w,D,U,beta,mloop,mix,err,'quiet');
if(RESTART)
gloc_0 = gloc{i};
end
Z(i) = phys.zetaweight(w,sloc{i});
I(i) = phys.luttinger(w,sloc{i},gloc{i});
S(i) = phys.strcorrel(w,sloc{i});
Expand All @@ -134,23 +145,31 @@
%% Full Phase-Diagram [U-driven]
fprintf('Full phase diagram\n\n'); tic;
clear('gloc','sloc','Z','I','S')
%restart_gloc = gloc_0;
Tvec = Tmin:Tstep:Tmax; NT = length(Tvec);
Uvec = Umin:Ustep:Umax; NU = length(Uvec);
for i = 1:NT
gloc = cell(NT,NU); sloc = gloc;
Z = zeros(NT,NU); I = Z; S = Z;
feature('numcores');
parfor i = 1:NT
T = Tvec(i); beta = 1/T;
gloc_0 = seed;
for j = 1:NU
U = Uvec(j);
fprintf('< U = %f, T = %f\n',U, T);
[gloc{i,j},sloc{i,j}] = dmft_loop(gloc_0,w,D,U,beta,mloop,mix,err,'quiet');
%restart_gloc = gloc{i,j};
if(RESTART)
gloc_0 = gloc{i,j};
end
Z(i,j) = phys.zetaweight(w,sloc{i,j});
I(i,j) = phys.luttinger(w,sloc{i,j},gloc{i,j});
S(i,j) = phys.strcorrel(w,sloc{i,j});
end
end
if(PLOT)
phasemap = plot.phase_diagram(S,Umin,Ustep,Umax,Tmin,Tstep,Tmax);
S = S/max(max(S));
zetamap = plot.phase_diagram(Z,Umin,Ustep,Umax,Tmin,Tstep,Tmax);
luttmap = plot.phase_diagram(I,Umin,Ustep,Umax,Tmin,Tstep,Tmax);
strcmap = plot.phase_diagram(S,Umin,Ustep,Umax,Tmin,Tstep,Tmax);
end
ET = [0,0,toc]; fmt = 'hh:mm:ss.SSS';
fprintf('> %s < elapsed time\n\n',duration(ET,'format',fmt));
Expand Down
4 changes: 3 additions & 1 deletion docs/hpc_scaling/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -58,4 +58,6 @@ In conclusion I would suggest to perform single runs just on your personal works

## Distributed computing

At the moment no distributed computing support is provided for the package. Since the convergence of the scans (both lines and phase diagrams) is highly aided by a restarting protocol (making each single-point run dependent on the previous one) and even more the DMFT-loop is inherently all-entangled, there is no simple target for a MPI-like parallelization. Furthermore the overall performance of the current workflows seems quite acceptable to just discard the idea.
Some form of distrubuted computing is implemented for only the two-dimensional phase diagrams: the external (temperature) loops are parallelized through the native `parfor` construct, which autodetects the number of available physical cores and activates a corresponding number of independent 'workers'. This is reasonable within the assumption that each internal U-driven line starts at a sufficiently low interaction, so to make unuseful a restarting protocol that connects lines at different temperatures. No profiling has been done so far on this procedure.

> NB. If the Parallel Computing Toolbox is not installed (or if you are running GNU Octave) the `parfor` construct acts as an alias of the traditional sequential `for` loop.

0 comments on commit 0ae3305

Please sign in to comment.