
The accuracy of my MDNN is too low, is there something wrong? #156

Open
zzzqqw opened this issue Dec 26, 2023 · 16 comments

zzzqqw commented Dec 26, 2023

Hello, I trained a simple network to recognize the MNIST dataset with an accuracy of 0.97 before converting the network to an MDNN, but the accuracy of the MDNN is only around 0.10. May I ask what the reason is?
The code is as follows:

from torchvision.datasets import MNIST
from torch.utils.data import DataLoader
import torch
import torch.nn as nn
import numpy as np
from torchvision import datasets, transforms
import memtorch
import pandas as pd
import copy
from memtorch.mn.Module import patch_model
from memtorch.map.Parameter import naive_map
import os

class Model(nn.Module):
    def __init__(self):
        super(Model,self).__init__()
        self.linear1 = nn.Linear(784,256)
        self.linear2 = nn.Linear(256,64)
        self.linear3 = nn.Linear(64,10)

    def forward(self, x):
        x = x.view(-1, 784)
        x = torch.relu(self.linear1(x))
        x = torch.relu(self.linear2(x))
        # Note: a ReLU on the output layer before CrossEntropyLoss is unusual;
        # it zeroes negative logits and is worth removing.
        x = torch.relu(self.linear3(x))
        return x


model = Model()
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
model = model.to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), 0.8)


if not os.path.exists('saved_model'):
    os.mkdir('saved_model')

def train():
    for index,data in enumerate(train_loader):
        input,target = data
        input, target = input.to(device), target.to(device)
        optimizer.zero_grad()
        y_predict = model(input)
        loss = criterion(y_predict,target)
        loss.backward()
        optimizer.step()
        if index % 100 == 0:
            torch.save(model.state_dict(),"saved_model/simpleCNN_model_MNIST.pth")
            print("LOSS:%.2f" % loss.item())

def test(model1):
    correct = 0
    total = 0
    acc = 0
    model1.eval()
    with torch.no_grad():
        for data in test_loader:
            input,target = data
            input, target = input.to(device), target.to(device)
            output=model1(input)
            output = output.to(device)
            probability,predict=torch.max(output.data,dim=1)
            total += target.size(0)
            correct += (predict == target).sum().item()
        print("Accuracy:%.2f" % (correct / total))
        acc = correct / total
    return acc


transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])
train_data = MNIST(root='./data', train=True, download=True, transform=transform)
train_loader = DataLoader(train_data, shuffle=True, batch_size=64)
test_data = MNIST(root='./data', train=False, download=True, transform=transform)
test_loader = DataLoader(test_data, shuffle=False, batch_size=64)

"""
    Training
"""
for epoch in range(5):
    train()
    test(model)

"""
    DNN TO MDNN
"""
r_on = 1.4e4
r_off = 5e7

model.load_state_dict(torch.load("saved_model/simpleCNN_model_MNIST.pth"), strict=True)

_ = test(model) # print the best accuracy

def trial(r_on, r_off, tile_shape, ADC_resolution, sigma):
    model_ = copy.deepcopy(model)
    reference_memristor = memtorch.bh.memristor.VTEAM
    if sigma == 0.:
        reference_memristor_params = {'time_series_resolution': 1e-10, 'r_off': r_off, 'r_on': r_on}
    else:
        reference_memristor_params = {'time_series_resolution': 1e-10,
                                      'r_off': memtorch.bh.StochasticParameter(loc=r_off, scale=sigma * 2, min=1),
                                      'r_on': memtorch.bh.StochasticParameter(loc=r_on, scale=sigma, min=1)}

    patched_model = patch_model(copy.deepcopy(model_),
                                memristor_model=reference_memristor,
                                memristor_model_params=reference_memristor_params,
                                module_parameters_to_patch=[torch.nn.Linear],
                                mapping_routine=naive_map,
                                transistor=True,
                                programming_routine=memtorch.bh.crossbar.Program.naive_program,
                                scheme=memtorch.bh.Scheme.DoubleColumn,
                                tile_shape=tile_shape,
                                max_input_voltage=0.3,
                                ADC_resolution=int(ADC_resolution),
                                ADC_overflow_rate=0,
                                quant_method='linear')

    patched_model.tune_()
    return test(patched_model)

df = pd.DataFrame(columns=['tile_shape', 'ADC_resolution', 'sigma', 'test_set_accuracy'])
tile_shape = [(256, 64)]
ADC_resolution = np.linspace(2, 10, num=5, endpoint=True, dtype=int)
sigma = np.logspace(6, 7, endpoint=True, num=5)
for tile_shape_ in tile_shape:
    for ADC_resolution_ in ADC_resolution:
        for sigma_ in sigma:
            print('tile_shape: %s; ADC_resolution: %d; sigma: %.2e' % (tile_shape_, ADC_resolution_, sigma_))
            row = {'tile_shape': tile_shape_,
                   'ADC_resolution': ADC_resolution_,
                   'sigma': sigma_,
                   'test_set_accuracy': trial(r_on, r_off, tile_shape_, ADC_resolution_, sigma_)}
            # DataFrame.append was removed in pandas 2.x; concatenate instead
            df = pd.concat([df, pd.DataFrame([row])], ignore_index=True)
            df.to_csv('simpleCNN_MDNN.csv', index=False)
@RTCartist

@zzzqqw Have you solved it? I think this problem is caused by the linear-layer transformation from DNN to MDNN. I am working on a CNN regression network, and I consistently get failures or poor accuracy when I patch the linear layers.
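
One way to test that hypothesis is to patch only the convolutional layers and leave the linear layers in software; if accuracy recovers, the linear-layer conversion is the culprit. A minimal sketch under that assumption (cnn_model and test() are illustrative names; the other arguments follow the code above):

# Hypothetical diagnostic: patch only the Conv2d layers and keep nn.Linear
# in software to isolate where the accuracy loss comes from.
patched_conv_only = patch_model(copy.deepcopy(cnn_model),
                                memristor_model=reference_memristor,
                                memristor_model_params=reference_memristor_params,
                                module_parameters_to_patch=[torch.nn.Conv2d],  # nn.Linear omitted
                                mapping_routine=naive_map,
                                transistor=True,
                                programming_routine=None)
patched_conv_only.tune_()
test(patched_conv_only)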

@RTCartist

@zzzqqw You can try removing the tile_shape=(256, 64) and max_input_voltage=0.3 arguments when patching the model; the results may improve.
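
For reference, a sketch of the suggested call with those two arguments left at their defaults (everything else as in the trial() function above):

patched_model = patch_model(copy.deepcopy(model_),
                            memristor_model=reference_memristor,
                            memristor_model_params=reference_memristor_params,
                            module_parameters_to_patch=[torch.nn.Linear],
                            mapping_routine=naive_map,
                            transistor=True,
                            programming_routine=memtorch.bh.crossbar.Program.naive_program,
                            scheme=memtorch.bh.Scheme.DoubleColumn,
                            # tile_shape and max_input_voltage omitted
                            ADC_resolution=int(ADC_resolution),
                            ADC_overflow_rate=0,
                            quant_method='linear')
patched_model.tune_()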

@charanecer12

Hi, I have tried installing the memtorch package by all the given methods, but it throws an error during the process. I have attached a screenshot of it; please help me out, or guide me through the process you followed to install it and proceed further.

[screenshot: err_mem]

@spectacles9468

In the setup file, one of the requirements is sklearn, but its name has changed to scikit-learn; change it.
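
A sketch of the fix, assuming the dependency is declared in setup.py (the exact line may differ):

# setup.py (hypothetical excerpt): the PyPI package "sklearn" is a deprecated
# alias, so the requirement should name "scikit-learn" instead.
install_requires = [
    "scikit-learn",  # was: "sklearn"
    # ... remaining requirements unchanged ...
]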


stale bot commented Jun 11, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jun 11, 2024

24367452 commented Jul 9, 2024

Hello, I also encountered the problem of low accuracy. Is it because I am using MemTorch incorrectly? If you could help me, I would be very grateful.

import copy
import numpy as np
import torch
import memtorch
from memtorch.mn.Module import patch_model
from memtorch.map.Parameter import naive_map
from sklearn.metrics import mean_squared_error

# SimpleCNN, train_loader, test_loader, label_mean and label_std are defined
# earlier in the script.
memristor_model = SimpleCNN()
# Note: strict=False silently skips mismatched keys, which can leave layers
# randomly initialized and make the patched model output constants.
memristor_model.load_state_dict(torch.load(r"D:\MemTorch_CNN\memristor_cnn\model.ckpt"), strict=False)
# Create new reference memristor
reference_memristor = memtorch.bh.memristor.VTEAM
reference_memristor_params = {"time_series_resolution": 1e-10}
memristor = reference_memristor(**reference_memristor_params)
memristor.plot_hysteresis_loop()

# Convert the trained DNN to an MDNN
patched_model = patch_model(copy.deepcopy(memristor_model),
                            memristor_model=reference_memristor,
                            memristor_model_params=reference_memristor_params,
                            module_parameters_to_patch=[torch.nn.Conv1d, torch.nn.Linear],
                            mapping_routine=naive_map,
                            transistor=True,
                            programming_routine=None,
                            tile_shape=(128, 128),
                            max_input_voltage=1.0,
                            ADC_resolution=8,
                            ADC_overflow_rate=0.,
                            quant_method='linear')


patched_model.eval()
with torch.no_grad():
    Y_train_pre = []
    all_Y_train = []
    for X_batch, Y_batch in train_loader:
        X_batch = X_batch.view(X_batch.size(0), 1, 4)
        outputs = patched_model(X_batch)
        Y_train_pre.append(outputs)
        all_Y_train.extend(Y_batch.numpy())
    Y_train_pre = torch.cat(Y_train_pre, dim=0).numpy()

    Y_test_pre = []
    all_Y_test = []
    for X_batch, Y_batch in test_loader:
        X_batch = X_batch.view(X_batch.size(0), 1, 4)
        outputs = patched_model(X_batch)
        Y_test_pre.append(outputs)
        all_Y_test.extend(Y_batch.numpy())
    Y_test_pre = torch.cat(Y_test_pre, dim=0).numpy()

Y_train_pre = Y_train_pre * label_std + label_mean
Y_test_pre = Y_test_pre * label_std + label_mean
Y_train = np.array(all_Y_train)
Y_test = np.array(all_Y_test)


Y_train = Y_train * label_std + label_mean
Y_test = Y_test * label_std + label_mean

Y_train_error = np.sqrt(np.sum((Y_train - Y_train_pre) ** 2, axis=1))
Y_test_error = np.sqrt(np.sum((Y_test - Y_test_pre) ** 2, axis=1))

print('Y_train_error_max=', np.max(Y_train_error))
print('Y_train_error_min=', np.min(Y_train_error))
print('Y_train_error_mean=', np.mean(Y_train_error))

print('Y_test_error_max=', np.max(Y_test_error))
print('Y_test_error_min=', np.min(Y_test_error))
print('Y_test_error_mean=', np.mean(Y_test_error))

RMSE = np.sqrt(mean_squared_error(Y_test, Y_test_pre))
print('RMSE=', RMSE)

The output of the above code is shown below; the predictions appear to collapse to a fixed value:

[-0.1142537  -0.07591234]
 [-0.1142537  -0.07591234]
 [-0.1142537  -0.07591234]
 [-0.1142537  -0.07591234]
 [-0.1142537  -0.07591234]
 [-0.1142537  -0.07591234]
 [-0.1142537  -0.07591234]
 [-0.1142537  -0.07591234]

@stale stale bot removed the stale label Jul 9, 2024

24367452 commented Jul 9, 2024

Hello, can you help me check where the problem lies @RTCartist

@RTCartist

Hello, can you help me check where the problem lies @RTCartist
Please try adjusting the tile_shape and max_input_voltage parameters when patching the model. From where I am sitting, the mechanism of MemTorch is to add extra matrix computations, governed by the patching parameters, on top of the original ideal software network. These parameters influence the results a lot, and under some settings the conversion seems to run into problems or faults. So please try modifying the parameters; I hope it works well.
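
A minimal sketch of such a sweep, assuming the patch_model arguments from the earlier comments and a test() routine that returns accuracy (the grid values are illustrative):

# Hypothetical sweep over tile_shape and max_input_voltage; all other
# arguments are kept as in the earlier patch_model call.
for tile_shape in [(64, 64), (128, 128), (256, 256)]:
    for max_input_voltage in [0.3, 0.5, 1.0]:
        patched = patch_model(copy.deepcopy(memristor_model),
                              memristor_model=reference_memristor,
                              memristor_model_params=reference_memristor_params,
                              module_parameters_to_patch=[torch.nn.Conv1d, torch.nn.Linear],
                              mapping_routine=naive_map,
                              transistor=True,
                              programming_routine=None,
                              tile_shape=tile_shape,
                              max_input_voltage=max_input_voltage,
                              ADC_resolution=8,
                              ADC_overflow_rate=0.,
                              quant_method='linear')
        patched.tune_()
        print(tile_shape, max_input_voltage, test(patched))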

@24367452

Hello, can you help me check where the problem lies? Please try adjusting the tile_shape and max_input_voltage parameters when patching the model. From where I am sitting, the mechanism of MemTorch is to add extra matrix computations, governed by the patching parameters, on top of the original ideal software network. These parameters influence the results a lot, and under some settings the conversion seems to run into problems or faults. So please try modifying the parameters; I hope it works well.

Thank you for your answer. As you mentioned, the parameters have a significant impact

@24367452

Hello, I'm sorry to bother you again, but I really can't find a solution to the problem. When I change the finite-conductance-states non-ideality, the accuracy of the model does not change; even when the number of conductance states is set to 0, nothing changes. I think it is because the actual quantization step, memtorch_bindings.quantize(tensor, n_quant_levels=quant, min=min, max=max), is never executed, so it has no effect. Do you have any solution? @RTCartist
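
One way to check whether quantization is actually applied is to patch the same model at two very different ADC resolutions and compare outputs on a fixed batch; identical outputs would suggest the quantization path is being skipped. A rough sketch under that assumption, reusing the names from the code above:

# Hypothetical check: if quantization runs, a 2-bit and an 8-bit ADC should
# produce visibly different outputs on the same input batch.
x, _ = next(iter(test_loader))
x = x.view(x.size(0), 1, 4)
outputs = []
for bits in (2, 8):
    p = patch_model(copy.deepcopy(memristor_model),
                    memristor_model=reference_memristor,
                    memristor_model_params=reference_memristor_params,
                    module_parameters_to_patch=[torch.nn.Conv1d, torch.nn.Linear],
                    mapping_routine=naive_map,
                    transistor=True,
                    programming_routine=None,
                    ADC_resolution=bits,
                    ADC_overflow_rate=0.,
                    quant_method='linear')
    with torch.no_grad():
        outputs.append(p(x))
print(torch.allclose(outputs[0], outputs[1]))  # True suggests quantization never ran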

@24367452

@RTCartist Hello, could you take a look and tell me why this happens?

@RTCartist

@RTCartist Hello, could you take a look and tell me why this happens?

I am not familiar with this problem; sorry, I can't help you.

@24367452

@RTCartist Hello, could you take a look and tell me why this happens?

I am not familiar with this problem; sorry, I can't help you.

Thank you for your attention and your answer! This question is indeed difficult to solve.

@RTCartist

RTCartist commented Jul 17, 2024 via email

@24367452

This repo indeed needs maintenance. Why not use another architecture to simulate memristor arrays for AI applications? For example, MNSIM.


I did try another architecture a few days ago, IBM's AIHWKIT, but after installation there was also a problem I couldn't solve when running it, so I came back to try MemTorch.


stale bot commented Sep 21, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Sep 21, 2024