W2101 Data Model for World State

World State

Introduction

In a problem like this where we are trying to simulate some behaviour it is easy to confuse things. We need a representation of the world that is akin to objective reality, all the things in the world that we want to keep track of and provide a 'place' for in our little simulated world. We don't want to overly complicate things because the greater the fidelity of the simulation the more expensive in time, energy and complexity it becomes.

And while the world state is a big part of it, it is not the 'mind' of the Adventurer. It will be a part of the solution but it is not the entire solution (at least as I conceptualize it now, hey this is iterative and who knows where we will end up :-) ) So from the work done by others the old n x 4 x 4 matrix is looking good as a starting point. The 4 x 4 part obviously represents location, the 'n' part is the domain of things in the world we want to keep track of. This is a little counter intuitive, normally you would think of a database with tables representing the key concepts but location information would be merely an attribute, not a key organizing principle. I'd like to thank Brandon for his blog post explaining this.

Welcome to the unintuitive world of AI programming. By an large this is a problem that is going to be solved by performing vast amounts of matrix algebra, multiplies, divides, addition and subtraction. So how this data can be formalized that reducing the cognitive and procedural gap between it and a matrix algebra library and a GPU processor the better. This is why, by and large, it matters far less today than in years previously, how performative your language choice is. How well it works with matrix algebra is far more important than how fast it runs because ultimately the hard work will be done by some very optimized library routines running on specialized silicon. This is why Googles Tensor flow language is attractive to many people because a Tensor is just a big fancy matrix algebra object.

The Data Structure

In Brandon approach he only modeled four concepts, player, wall, pit and goal positions. I want to model eight things, the locations for the Wumpus, Adventurer, Pits, gold, breezes, stenches and walls. I would also like to model if the Adventurer has or has not visited a location before so later we can do some experiments comparing exploration versus exploitation search strategies. So our array will be an 8 x 4 x 4 array. A zero represents an empty location for that thing and a one represents it is present. With the 'have I been here before' I'm clear that a one will represent I've been here before but I'm not sure about how to represent 'never been here before', a zero is an obvious choice but taking the derivative of zero is zero so using something like minus one would actually provide a signal that could be used. I think I will try it first using minus one if the Adventurer has never been to a location and then use zero and compare the results.

Note that -1 means not present, +1 means present, I won't list it for each entry as it will apply to all entries. All movement is along the four cardinal directions and never diagonally.

Things in the word	Array Location	Constraints	Rewards
The Wumpus	(0,4,4)	Only one Wumpus per Cave system. The Wumpus doesn't move (at least for now) during the simulation. Emits a stench which occupies the adjacent grid cells.	-100 Adventurer becomes Wumpus poo. Ends that simulation run.
Stenches	(1,4,4)	Doesn't move unless we enable the Wumpus to move. They occupy the cells adjacent to the Wumpus.	0
Pits	(2,4,4)	Don't move. They emit a breeze which is detectable in the adjacent cells of the grid by the presence of +1 in the breezes array.	-100 Adventurer falls into bottomless pit.
Breezes	(3,4,4)	Are emitted by Pits and occupy the cells adjacent to pits.	0
Gold	(4,4,4)	Doesn't move unless the Adventurer moves it. The glittering light of the gold can be seen from the adjacent cells.	+100 Adventurer finds the gold.
Glittering	(5,4,4)	Doesn't move unless the gold moves.	0
The Adventurer	(6,4,4)	Can move in each cardinal direction but not diagonally. Can detect breezes, stenches and the glittering of the gold. Dies if she walks into a Pit or a cell in which the Wumpus lives. In this project I will not be modelling the arrow and the ability to kill the Wumpus.	0
Walls	(7,4,4)	Don't move and emit no signals to any adjacent cell.	-10 Adventurer bumps into wall and hurts them self. Learns that Walls hurt.
Visited	(8,4,4)	Changes constantly during the simulation.	0/+2 For some experiments we may reward exploration.

Other rewards

Action	Reward
Each step taken	-1 The longer you stay in the cave the worse it is for you.
Exiting alive	+100 I may run experiments that include getting out or as others have done just up to when the Adventurer finds the goal or dies.
After 100 steps	-100 Like others this penalizes endless looping behavior. After 100 steps the Adventurer dies of starvation after getting lost. The cumulative penalty would be -200, 100 steps at -1 per step then an additional -100 on the hundredth step.

After thoughts

I just realized with this approach I can't explicitly model the two goal scenario where the adventurer finds the goal and then exits the caves. I would need to add an addition 4 x 4 matrix to represent the exit goal. I will skip this for now and consider it for a later iteration.

Provide feedback

Saved searches