This page documents the simulation module. The simulator allows control of a single agent while replaying the trajectories of all other vehicles from the dataset. To do so, the simulator requires the definition of a dataset, a dynamics model, a suite of sensors, an observer that maps sensor measurements to agent observations, and a reward model.
We convert the raw track data, stored in a pandas DataFrame, into a stack of frames, where each frame contains the state variables of all vehicles present in that frame. The state variables are: [x, y, vx, vy, ax, ay, psi_rad, length, width, track_id]. We also collect the ego track id and the start and end frame of each episode. This information is stored in the `VehicleTrajectories` object.
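As an illustration, a minimal sketch of this conversion, assuming the raw DataFrame has one row per vehicle per frame with a `frame_id` column (the column name and the function itself are assumptions, not the module's actual API):

```python
import pandas as pd

STATE_COLS = ["x", "y", "vx", "vy", "ax", "ay",
              "psi_rad", "length", "width", "track_id"]

def stack_frames(tracks: pd.DataFrame) -> list:
    """Convert per-row track data into a stack of per-frame state arrays.

    Assumes one row per (vehicle, frame) and a `frame_id` column
    alongside the state columns listed above.
    """
    frames = []
    for _, group in tracks.groupby("frame_id", sort=True):
        # One (num_vehicles_in_frame, num_state_vars) array per frame.
        frames.append(group[STATE_COLS].to_numpy())
    return frames
```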
We use the constant acceleration model to step the simulator forward in time. Let $x$, $v$, and $a$ denote position, velocity, and acceleration, and let $dt$ be the simulation time step. Then

$x' = x + v \, dt + \frac{1}{2} a \, dt^2$, $\quad v' = v + a \, dt$
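As an illustration, a minimal Python version of this update (the function name and state layout are assumptions, not the module's actual API):

```python
import numpy as np

def constant_acceleration_step(pos, vel, acc, dt):
    """Advance position and velocity one step under constant acceleration.

    pos, vel, acc are 2D arrays [x, y], [vx, vy], [ax, ay]; dt is in seconds.
    """
    new_pos = pos + vel * dt + 0.5 * acc * dt ** 2
    new_vel = vel + acc * dt
    return new_pos, new_vel

# Example: starting at the origin at 10 m/s with 1 m/s^2 acceleration.
pos, vel = constant_acceleration_step(
    np.array([0.0, 0.0]), np.array([10.0, 0.0]), np.array([1.0, 0.0]), dt=0.1
)
```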
We have currently implemented three sensors. All sensors require the `MapReader` object as input. We define left of the ego vehicle and counterclockwise as positive.
- `EgoSensor`: Tracks the ego vehicle's lateral lane offset, longitudinal and lateral speed, heading error w.r.t. the lane centerline, lane centerline curvature, and lane id.
- `LeadVehicleSensor`: Tracks the ego vehicle's relative distance, speed, and inverse tau w.r.t. the lead vehicle. These values are positive when the lead vehicle is in front of the ego vehicle. It also tracks the lead vehicle's lateral lane offset, lane speed, and track id.
- `LidarSensor`: Simulates a lidar scanner. It divides the ego vehicle's surrounding area into `num_beams` equally sized fan-shaped bins. For each bin, it finds the closest other vehicle and computes the range and range rate (rate of change of range) to that vehicle. If there is no other vehicle within a maximum distance of `max_range`, it returns `max_range` as the range and `0` as the range rate. A sketch of this binning logic follows the list below.
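A minimal sketch of the fan-shaped binning, assuming bin 0 starts at the ego heading and angles increase counterclockwise (all names, shapes, and conventions here are assumptions):

```python
import numpy as np

def lidar_scan(ego_pos, ego_vel, ego_heading, other_pos, other_vel,
               num_beams=20, max_range=60.0):
    """Per-beam range and range rate to the closest other vehicle.

    Beams are equally sized angular bins around the ego vehicle; angles
    are measured counterclockwise from the ego heading (counterclockwise
    positive, matching the sign convention above).
    """
    ranges = np.full(num_beams, max_range)
    range_rates = np.zeros(num_beams)
    hit_positions = np.full((num_beams, 2), np.nan)  # for animation
    rel_pos = other_pos - ego_pos                    # (N, 2)
    dists = np.linalg.norm(rel_pos, axis=1)
    angles = np.arctan2(rel_pos[:, 1], rel_pos[:, 0]) - ego_heading
    bin_width = 2 * np.pi / num_beams
    bins = np.floor((angles % (2 * np.pi)) / bin_width).astype(int) % num_beams
    for i in range(len(dists)):
        if 0.0 < dists[i] < ranges[bins[i]]:
            ranges[bins[i]] = dists[i]
            los = rel_pos[i] / dists[i]              # unit line-of-sight vector
            # Range rate: relative velocity projected onto the line of sight.
            range_rates[bins[i]] = (other_vel[i] - ego_vel) @ los
            hit_positions[bins[i]] = other_pos[i]
    return ranges, range_rates, hit_positions
```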
All sensors have a `get_obs` method, which takes in the state of the ego and all other vehicles and returns the measurements and the positions of the vehicles hit by the sensor beams. Hit positions are used to create animations.
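Continuing the lidar sketch above, a `get_obs`-style call might look like the following (runnable with the `lidar_scan` function just defined; all shapes and values are illustrative):

```python
import numpy as np

ego_pos, ego_vel = np.array([0.0, 0.0]), np.array([10.0, 0.0])
other_pos = np.array([[20.0, 0.5], [-15.0, 3.0]])
other_vel = np.array([[8.0, 0.0], [12.0, 0.0]])

ranges, range_rates, hits = lidar_scan(ego_pos, ego_vel, 0.0,
                                       other_pos, other_vel)
# `hits` holds the position of the vehicle each beam hit (NaN for no hit),
# which is the kind of information used to create animations.
```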
The observer object interfaces the simulator with the agent. It takes all sensor measurements, stored in a dictionary, and flattens them into a single observation vector.
We have implemented a special `CarfollowObserver`. In this observer, we expect the agent to only emit the longitudinal control. Lateral control is handled by the observer itself using a simple feedback control rule. These two control signals are concatenated before being passed to the simulator.
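A minimal sketch of both pieces, assuming a proportional feedback rule on lane offset and heading error for the lateral channel (the gains, field names, and flattening order are all assumptions):

```python
import numpy as np

def flatten_measurements(meas: dict) -> np.ndarray:
    """Flatten a {sensor_name: array} dictionary into one observation vector."""
    return np.concatenate([np.asarray(meas[k]).ravel() for k in sorted(meas)])

class CarfollowObserverSketch:
    """Agent emits longitudinal control; lateral control is a feedback rule."""

    def __init__(self, k_offset=0.5, k_heading=1.0):
        self.k_offset = k_offset      # assumed gain on lateral lane offset
        self.k_heading = k_heading    # assumed gain on heading error

    def control(self, longitudinal_action, lane_offset, heading_error):
        # Simple proportional feedback steers back toward the lane centerline.
        lateral_action = -(self.k_offset * lane_offset
                           + self.k_heading * heading_error)
        # Concatenate both signals before passing to the simulator.
        return np.array([float(longitudinal_action), lateral_action])
```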
The observer also computes the `info` output field of the simulator. Currently, the `CarfollowObserver` sets `info["terminate"] = True` if the ego vehicle is ahead of the lead vehicle by more than a certain distance and the ego vehicle's lane offset exceeds a certain value.
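Sketched as a predicate (the threshold values, names, and sign convention for the lead distance are assumptions):

```python
def compute_info(rel_dist_to_lead, lane_offset,
                 ahead_dist_limit=5.0, lane_offset_limit=1.5):
    """rel_dist_to_lead is positive when the lead vehicle is in front,
    matching the LeadVehicleSensor convention above."""
    terminate = (rel_dist_to_lead < -ahead_dist_limit
                 and abs(lane_offset) > lane_offset_limit)
    return {"terminate": terminate}
```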
The reward module monitors the quality of a simulated trajectory. Currently, the reward module calculates the absolute distance between the simulated ego vehicle's position and the ego vehicle's position in the actual track.
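For example (a minimal sketch; the source does not specify whether this distance is negated or otherwise shaped before being returned as the reward):

```python
import numpy as np

def tracking_distance(sim_pos, track_pos):
    """Absolute (Euclidean) distance between the simulated ego position
    and the recorded ego position at the same frame."""
    return float(np.linalg.norm(np.asarray(sim_pos) - np.asarray(track_pos)))
```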
The `InteractionSimulator` follows the OpenAI Gym framework with the following methods:

- `reset`: This method requires an episode id argument and initializes the state of the simulator to the first frame of that episode. It then computes and returns sensor measurements.
- `step`: This method takes in the agent's action, converts it to the global Cartesian coordinate frame, steps the ego state forward using the dynamics model, and then computes sensor measurements. This method outputs `obs, rwd, done, info`, where `done = True` if the simulator has reached the last time step of the episode.
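A typical rollout under this interface might look like the following (the `agent` object and its `choose_action` method are placeholders, not part of the module):

```python
# Hypothetical rollout; `sim` is an InteractionSimulator, `agent` any policy.
obs = sim.reset(0)                        # initialize to episode id 0
done = False
while not done:
    action = agent.choose_action(obs)     # placeholder policy call
    obs, rwd, done, info = sim.step(action)
    if info.get("terminate", False):      # observer-requested early stop
        break
```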
The simulator also stores the simulated states, actions, observations, and rewards internally in the `self._state` dictionary. For the specific fields in the dictionary, please see the code.