Is it normal for the loss to be about 98000 when training on the candle class of the VisA dataset?