Yokogawa WT310E ranging failures at under 0.2 A #253
Ten hours later, I connected a different system, which draws more power, to the meter. The PTDaemon log continued:
Note the negative readings. Similar positive and sensible readings continued for 10 minutes, until they became negative again as the system was winding down the execution:
Nevertheless, unlike the example above, the testing phase continued with positive and sensible readings:
right until the very end:
In the director logs, the short negative section at the end of the ranging run manifested itself as follows:
The subsequent testing run was error-free:
I've launched another workload, which draws just above 24 Watts. The PTDaemon log oscillates between positive and negative values:
But on the meter itself, the Watt display shows:
Note that we cannot simply take the absolute values:
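To spell out why (a minimal sketch with made-up sample values, not part of the workflow): a negative reading from PTDaemon is an invalid-sample marker rather than a sign-flipped measurement, so rectifying it just manufactures data; the only safe option is to discard it.

```python
# Minimal sketch with made-up values: why abs() cannot recover the trace.
# PTDaemon uses negative values to flag invalid samples, not to record a
# real measurement with the sign flipped.

def usable_watts(samples):
    """Keep only the samples PTDaemon reported as valid (non-negative)."""
    return [w for w in samples if w >= 0.0]

# Hypothetical trace: valid readings near 24 W interleaved with invalid markers,
# some of which happen to have a plausible magnitude and some of which do not.
trace = [23.9, 24.1, -23.8, -0.4, 24.0]

print(usable_watts(trace))        # [23.9, 24.1, 24.0]
print([abs(w) for w in trace])    # abs() would keep a bogus 0.4 W point as data
```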
I guess this oscillation happens when a system has at least some readings above the 0.2 A threshold. @s-idgunji @araghun At the moment I suggest we revert to the old behaviour, that is, perform the ranging run in Auto ranging mode. I will look into preparing a patch for our Wednesday meeting.
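For context on the two strategies being discussed, here is a rough sketch of the logic; the range table and helper name are illustrative assumptions, not the actual PTDaemon or power-dev code.

```python
# Illustrative only: the range table below is an assumption, not the WT310E's
# exact list of current ranges.
CURRENT_RANGES_A = [0.5, 1.0, 2.0, 5.0, 10.0, 20.0]

def pick_testing_range(max_observed_amps, ranges=CURRENT_RANGES_A):
    """r1.1-style behaviour as described in this thread: run the ranging phase
    on the meter's maximum range, then lock the testing phase to the smallest
    range that covers the highest observed current."""
    for r in sorted(ranges):
        if max_observed_amps <= r:
            return r
    return max(ranges)

# The failure mode: on the 20 A range the effective input window starts at
# 1% of range (0.2 A), so a device drawing ~0.1 A never yields a valid sample
# during the ranging run, and there is no sensible maximum current to lock to.
# Reverting to auto ranging lets the meter drop to a low range where such a
# current sits well inside the valid window.
```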
We have almost all of the efficientnet-lite family producing negative readings. Most of the ranging trace is under the radar (0.2 A ≈ 23.4 W); fewer than 2.5% of the samples are valid:
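As a rough way to quantify this, something like the sketch below could count the fraction of valid samples in a ranging log; it assumes a comma-separated key/value layout for the PTDaemon sample log, which may not match the actual file exactly.

```python
# Sketch only: assumes each log line contains "...,Watts,<value>,..." fields.
def valid_fraction(log_path):
    total = valid = 0
    with open(log_path) as f:
        for line in f:
            fields = line.strip().split(",")
            if "Watts" not in fields:
                continue
            watts = float(fields[fields.index("Watts") + 1])
            total += 1
            if watts >= 0.0:
                valid += 1
    return valid / total if total else 0.0

# e.g. valid_fraction("ranging_spl.txt") -> 0.024 for a trace where fewer than
# 2.5% of the samples are valid.
```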
I'm not sure if it's related to the power workflow update, but the overheads when measuring power seem to have increased dramatically, at least for this workload.
I get the same high execution time (~113 ms) in the ranging and testing power runs, and the same low execution time (~87 ms) in the TEST01 and TEST05 compliance tests. That's a 30% overhead ((113 - 87) / 87 ≈ 0.30), not the 10% previously observed.
@psyhtest - Let's discuss this tomorrow in the WG with @araghun and the rest of the Power WG. There has been no change to the base flow, other than accepting changes that were submitted as improvements during v1.0. What we should check is whether other teams see similar issues, or whether it is tied to the other issue you had mentioned (CPU-only submissions with a low power envelope getting loaded by the power infra flow).
@s-idgunji We will certainly rerun the same experiment (~113 ms) to see if it's just a blip. But changing the ranging phase to use the maximum range of the meter instead of auto ranging has proved to be the wrong decision.
@psyhtest - I don't think you can say that. The approach is right, and it eliminates issues. Perhaps for very low power system submissions there may be a cost. But then we need to understand how to eliminate ranging-related errors in our flow. Why is using the max range, and then setting the range from the highest observed load for the performance run, causing an issue? Can you explain that?
@s-idgunji As I explained above, for systems that draw less than 0.2 A, the ranging log is full of negative and non-sensible values. The workflow doesn't even allow one to proceed to the testing phase, failing with:
If we don't fix it, this will affect hundreds of submissions, including customer-critical ones.
I can offer an analogy with floating-point calculations: we have an "underflow" here. But instead of zeros, we get all negative values.
@s-idgunji, @psyhtest: let us discuss this tomorrow in the WG.
@psyhtest - Thanks for the analogy, but that was not my ask. It was: what is causing it, and is it specific to a certain class of systems (e.g. low power)? It seems the change has been in for a while, but we had not seen any validation-driven errors reported before now. I will add Dejan to this issue to help track. Update: I cannot add Dejan, so I have forwarded the meeting instead; next best thing.
Also checking: how can you measure the overhead when, per your other comments, the flow itself does not allow you to proceed?
As I'm the only submitter experimenting with low-power systems at the moment, I'm not surprised this hasn't been reported before. Until last week, I'd been validating the r1.1 power workflow only with systems drawing more than 25 Watts, where it is not an issue.
Oh, but come to think of it, most other models we use do indeed draw under 22 Watts even on that system. I should confirm that by the WG meeting tomorrow. Thanks.
@psyhtest - Given this is just your reporting, and given that the systems > 25 W (in your case) show no issues, I do not think we can say this method is wrong. The question is whether we can resolve this in time, and how. We can discuss tomorrow. This is one of the reasons we've been asking submitters to test early and report: the changes have been incremental, and the code has been ready to test for a while.
Oh, I have been testing it for 2-3 months now, just not across the full range of systems and scenarios! :) Looking at submissions 1.0-633,634 in Edge - Closed - Power, I expect NVIDIA to run into exactly the same issue.
I hope so. It's easier to revert incremental changes.
That defeats the purpose of the change if it is made in the right direction and works for most submitters. Let's figure out the solution tomorrow.
@s-idgunji The previous version of the workflow worked for ALL submitters. The current version of the workflow would exclude 80-90% of the power results from the previous round. Would we carry on regardless?
@psyhtest - I do not understand this point. Are you saying that the % of power results submitted should dictate the decision? So if a submitter was contributing 1% of the results, maybe we would not consider them? To me, that's what the statement appears to suggest. Rather, let's look at the solutions and options live. As far as I know, other submitters with systems under 25 W have not seen this issue, but I can check with Ashwin/Dilip about NVIDIA's testing.
I've prepared a new branch which reverts two commits. On restarting the server service, I no longer see negative, non-sensible values:
except during the quick change to the next range on the way up and then back down. This produces only two bad samples:
which I have seen happening for v1.0 as well, e.g. here. Tomorrow we are going to test it on a complete family of networks crossing the 0.2 A boundary.
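To make the distinction between the two failure modes concrete (a couple of invalid samples around an auto-range transition versus a trace that is almost entirely invalid), here is a hypothetical post-processing sketch; the helper and thresholds are illustrative, not part of the workflow.

```python
# Hypothetical sketch: tolerate brief invalid runs caused by auto-range
# transitions, but reject a trace that is mostly invalid.
def classify_trace(watts, max_transient=2, min_valid_fraction=0.95):
    if not watts:
        return "empty trace"
    longest_bad_run = bad = 0
    for w in watts:
        bad = bad + 1 if w < 0 else 0
        longest_bad_run = max(longest_bad_run, bad)
    valid_fraction = sum(w >= 0 for w in watts) / len(watts)
    if valid_fraction >= min_valid_fraction and longest_bad_run <= max_transient:
        return "ok: only range-change transients"
    return "invalid: readings outside the meter's effective range?"
```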
That's categorically not what I'm saying. We should consider all submitters, whether they are in the minority or in the majority.
Come to think of it, the WT333E can behave differently from the WT310E in this regard. Let's discuss today.
Thanks for clarifying, because it was not clear why we referred to the % of submissions. There is no question that we have to help resolve issues for every submitter. I think Dejan has reached out to understand the issue and can explain in the meeting. But we'll have to look for a solution that works both for submitters who have already prepared results with the current version and for any reverted version proposed (i.e. back to auto ranging), so that we allow for the least impact.
Just a quick update. I've been collecting test data to verify a potential solution: 1) measurements taken with the top of r1.1; 2) measurements taken with the auto-ranging patch. Once I validate the solution works for both, I'll make a PR to r1.1.
Thanks. @psyhtest - Has the main Inference v1.1 been pointed to Power v1.1 instead of power-dev? Do I need to follow up?
I'm not sure if you are looking for an explanation for this behavior, but I can explain. The WT310 specification says that its effective input range is 1% to 130% of range for voltage and current. If you're in the 20 A range and drawing less than 0.2 A, you are outside the analyzer's specifications and the data is invalid. PTDaemon marks all invalid data by using negative values.
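Spelling that out with the numbers quoted above (a small sketch; the 0.5 A range in the last line is just an illustrative value):

```python
# Effective input window per the WT310 spec quoted above: 1% to 130% of the
# selected range. On the 20 A range, anything under 0.2 A is out of spec.
def effective_window(range_amps, low_pct=0.01, high_pct=1.30):
    return range_amps * low_pct, range_amps * high_pct

def in_spec(reading_amps, range_amps):
    low, high = effective_window(range_amps)
    return low <= reading_amps <= high

print(effective_window(20.0))   # (0.2, 26.0)
print(in_spec(0.1, 20.0))       # False -> PTDaemon flags the sample as invalid
print(in_spec(0.1, 0.5))        # True on a 0.5 A range (illustrative value)
```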
I'm investigating a very weird issue with the updated r1.1 workflow. It started with testing workflows on an Edge system. In the Offline scenario, the system power is over 24 Watts, and everything works as expected. In the SingleStream scenario, however, the system power is under 20 Watts, and I'm unable to take the measurements. Here's a typical failure log on the director side:
Here's how the PTDaemon log looks on the director side:
The power figures are negative but otherwise sensible. The Amps figures are negative and non-sensible.