
GNN-based Meta-Learning for Sparse Portfolio Optimization #52

Open
kayuksel opened this issue Jan 6, 2023 · 5 comments

@kayuksel

kayuksel commented Jan 6, 2023

Hello,

Let me start by saying that I am a fan of your work here. I have recently open-sourced my GNN-based meta-learning method for optimization. I have applied it to the real-world sparse index-tracking problem (after initial benchmarking on the Schwefel function), and it seems to outperform Fast CMA-ES significantly, both in terms of producing robust solutions on the blind test set and in terms of time (total duration and iterations) and space complexity. I include the link to my repository here, in case you would consider adding the method or the benchmarking problem to your repository. Note: the GNN, which learns how to generate populations of solutions at each iteration, is trained using gradients of the loss function, as opposed to the black-box algorithms here.
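
For context, the sparse index-tracking objective I mean is roughly the following. This is a simplified sketch; the top-k sparsification and the plain squared tracking error are illustrative choices, not necessarily the exact loss in the repository:

import torch

def index_tracking_loss(logits, asset_returns, index_returns, k = 30):
    # logits: (batch, n_assets) candidate portfolios from the generator
    # asset_returns: (T, n_assets) historical asset returns
    # index_returns: (T,) historical index returns
    weights = torch.softmax(logits, dim = 1)
    # Keep only the k largest weights per candidate and renormalize them.
    topk = torch.topk(weights, k, dim = 1)
    sparse = torch.zeros_like(weights).scatter(1, topk.indices, topk.values)
    sparse = sparse / sparse.sum(dim = 1, keepdim = True)
    # Mean squared tracking error of each candidate portfolio vs. the index.
    portfolio_returns = sparse @ asset_returns.T
    return ((portfolio_returns - index_returns) ** 2).mean(dim = 1)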


Sincerely, K

@kayuksel kayuksel changed the title A GNN-based Meta-Learning Method for Sparse Portfolio Optimization GNN-based Meta-Learning for Sparse Portfolio Optimization Jan 6, 2023
@NaturalGradient
Collaborator

Hi @kayuksel! I think I saw your post on Reddit, very cool stuff :) We'd always love to integrate new techniques into EvoTorch, so if you have time, maybe you'd consider reformatting your code to fit the EvoTorch SearchAlgorithm API and opening a pull request? Otherwise I'll see if I can get to doing that myself when I find time.

@kayuksel
Author

Thank you very much, Timothy. Would it help you if I provided a highly simplified version (~75 lines), as below? FYI, I have also created a game-theoretic (adversarial) version of the generative model for global optimization that utilizes a positive-surprise mechanism obtained through a surrogate model (critic) trained simultaneously with Adaptive Gradient Clipping (a sketch of that clipping rule follows the code below). It is quite competitive against the best optimizers in Nevergrad on large-scale non-convex optimization problems, especially with noisy rewards. Note: the only bottleneck seems to be random-seed sensitivity, for which GradInit (arxiv.org/abs/2102.08098) seems to be a solution.

import torch
import torch.nn as nn
from types import SimpleNamespace

# Hyper-parameters; the values below are illustrative placeholders,
# not the settings used in the original repository.
args = SimpleNamespace(
    noise = 16,    # dimensionality of the latent noise fed to the generator
    batch = 64,    # population size generated per iteration
    cnndim = 32,   # latent dimension of the convolutional feature extractor
    funcd = 30,    # dimensionality of the objective function
    iter = 1000)   # number of training iterations

def schwefel(x):
    # Schwefel benchmark; inputs in [-1, 1] are rescaled to [-500, 500].
    x = x * 500
    return 418.9829 * x.shape[1] - (x * x.abs().sqrt().sin()).sum(dim=1)

def init_weights(model):
    # Xavier initialization for linear layers, unit scale for batch norm.
    for m in model.modules():
        if isinstance(m, nn.BatchNorm1d):
            m.weight.data.fill_(1)
        elif isinstance(m, nn.Linear):
            nn.init.xavier_uniform_(m.weight, gain = 5/3)
        if hasattr(m, 'bias') and m.bias is not None: m.bias.data.zero_()

class LSTMModule(nn.Module):
    # Recurrent summary of the population features; the hidden state is
    # carried across iterations through .data updates.
    def __init__(self, input_size = 1, hidden_size = 1, num_layers = 2):
        super().__init__()
        self.rnn = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.h = torch.zeros(num_layers, 1, hidden_size, requires_grad=True).cuda()
        self.c = torch.zeros(num_layers, 1, hidden_size, requires_grad=True).cuda()
    def forward(self, x):
        self.rnn.flatten_parameters()
        out, (h_end, c_end) = self.rnn(x, (self.h, self.c))
        self.h.data = h_end.data
        self.c.data = c_end.data
        return out[:, -1, :].flatten()

class Extractor(nn.Module):
    # Convolution + pooling + LSTM over the noise batch; its output is
    # concatenated to the noise as a shared population context.
    def __init__(self, latent_dim, ks = 5):
        super().__init__()
        self.conv = nn.Conv1d(args.noise, latent_dim,
            bias = False, kernel_size = ks, padding = (ks // 2) + 1)
        self.conv.weight.data.normal_(0, 0.01)
        self.activation = nn.Sequential(nn.BatchNorm1d(
            latent_dim, track_running_stats = False), nn.Mish())
        self.gap = nn.AvgPool1d(kernel_size = args.batch, padding = 1)
        self.rnn = LSTMModule(hidden_size = latent_dim)
    def forward(self, x):
        y = x.unsqueeze(0).permute(0, 2, 1)
        y = self.rnn(self.gap(self.activation(self.conv(y))))
        return torch.cat([x, y.repeat(args.batch, 1)], dim = 1)

class Generator(nn.Module):
    # Maps noise (plus the extracted population context) to a population of
    # candidate solutions, with a learned per-dimension exploration noise.
    def __init__(self, noise_dim = 0):
        super().__init__()
        def block(in_feat, out_feat):
            return [nn.Linear(in_feat, out_feat), nn.Tanh()]
        self.model = nn.Sequential(
            *block(noise_dim + args.cnndim, 480), *block(480, 1103), nn.Linear(1103, args.funcd))
        init_weights(self)
        self.extract = Extractor(args.cnndim)
        self.std_weight = nn.Parameter(torch.zeros(args.funcd).cuda())
    def forward(self, x):
        mu = self.model(self.extract(x))
        return mu + (self.std_weight * torch.randn_like(mu))

actor = Generator(args.noise).cuda()
opt = torch.optim.AdamW(filter(lambda p: p.requires_grad, actor.parameters()), lr=1e-3)
best_reward = None
for epoch in range(args.iter):
    torch.cuda.empty_cache()
    opt.zero_grad()
    # Sample a fresh noise batch and generate a population of candidate solutions.
    z = torch.randn((args.batch, args.noise)).cuda().requires_grad_()
    rewards = schwefel(actor(z).tanh())
    min_index = rewards.argmin()
    if best_reward is None: best_reward = rewards[min_index].detach()
    # Train the generator on the mean population loss (gradient-based meta-learning).
    actor_loss = rewards.mean()
    actor_loss.backward()
    nn.utils.clip_grad_norm_(actor.parameters(), 1.0)
    opt.step()
    with torch.no_grad():
        if rewards[min_index] > best_reward: continue
        best_reward = rewards[min_index]
        print('epoch: %i loss: %f' % (epoch, best_reward.item()))
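
For reference, the Adaptive Gradient Clipping mentioned above rescales a gradient whenever its norm exceeds a chosen fraction of the corresponding parameter norm. A minimal per-tensor sketch (the original rule in Brock et al., 2021 is unit-wise, and the clip factor below is just a placeholder, not the repository's setting):

def adaptive_gradient_clip(parameters, clip_factor = 0.01, eps = 1e-3):
    # Rescale each gradient so that ||g|| <= clip_factor * max(||p||, eps).
    for p in parameters:
        if p.grad is None: continue
        p_norm = p.detach().norm().clamp(min = eps)
        g_norm = p.grad.detach().norm()
        max_norm = clip_factor * p_norm
        if g_norm > max_norm:
            p.grad.mul_(max_norm / (g_norm + 1e-6))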

@kayuksel
Author

It is also applicable to combinatorial optimization, simply by sampling with the sigmoid of the logits from the generator.
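
Concretely, something along these lines; the straight-through trick is one way to keep the discrete sample usable with gradient-based training (a simplified sketch, not the repository's exact code):

def sample_binary(logits):
    # Sample a 0/1 solution from the generator's logits; the straight-through
    # estimator lets gradients flow back through the sampling step.
    probs = torch.sigmoid(logits)
    hard = torch.bernoulli(probs)
    return hard + probs - probs.detach()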

@kayuksel
Author

Hello again,

I've made a quick comparison against Nevergrad on 30-dimensional Schwefel (arguably the hardest of the standard math benchmark functions); see the repository below.
In my experiments, it easily scales to 100K+ dimensions on most other optimization benchmark functions.
https://github.com/kayuksel/genmeta-vs-nevergrad
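
For anyone who wants a quick baseline on the same function, a minimal Nevergrad run looks roughly like this (the optimizer choice and budget below are placeholders, not the exact settings used in the comparison repository):

import numpy as np
import nevergrad as ng

def schwefel_np(x):
    # 30-dim Schwefel function on [-500, 500].
    return 418.9829 * x.size - np.sum(x * np.sin(np.sqrt(np.abs(x))))

param = ng.p.Array(shape = (30,)).set_bounds(-500, 500)
optimizer = ng.optimizers.NGOpt(parametrization = param, budget = 10000)
recommendation = optimizer.minimize(schwefel_np)
print(schwefel_np(recommendation.value))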

Sincerely, K

@kayuksel
Author

FYI, I have also applied it to the MovieLens 1M matrix-factorization problem (with 500K parameters); the code is in the repository.
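
For readers unfamiliar with that setup, the objective would be along these lines: the generator's output vector is reshaped into the two low-rank factor matrices and scored by the masked reconstruction error. This is a rough sketch only; the rank and the absence of regularization are simplifying assumptions, not the repository's settings:

def factorization_loss(flat_params, ratings, mask, n_users, n_items, rank = 50):
    # flat_params: (batch, n_users * rank + n_items * rank) candidate solutions
    # ratings, mask: (n_users, n_items) observed ratings and observation mask
    U = flat_params[:, :n_users * rank].view(-1, n_users, rank)
    V = flat_params[:, n_users * rank:].view(-1, n_items, rank)
    pred = U @ V.transpose(1, 2)
    err = ((pred - ratings) * mask) ** 2
    return err.sum(dim = (1, 2)) / mask.sum()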
