-
Notifications
You must be signed in to change notification settings - Fork 157
The BESS Scheduler
Let's take a look at simple pipeline from a sample script (bess/bessctl/conf/samples/acl.bess
):
localhost:10514 $ show pipeline
+---------+ +----------+ +-----+ +-------+
| source0 | :0 200285248 0: | rewrite0 | :0 200257248 0: | fw | :0 100117232 0: | sink0 |
| Source | -----------------> | Rewrite | -----------------> | ACL | -----------------> | Sink |
+---------+ +----------+ +-----+ +-------+
The modules classes above can be divided into two categories, based on their behavior:
- The
Rewrite
,ACL
andSink
classes are only called by their left neighbor when there are packets to process. - The
Source
module class is called periodically and generates packets on its own.
The Source
module class behaves differently because its instances create a task.
The QueueInc
class behaves a lot like Source
: it registers a task that periodically gets called to read packets from a Port rxq. Remember: BESS ports operate in polling mode, to avoid interrupt overhead.
The PortInc
module is very similar to the QueueInc
module, except that it may register more tasks (one per each rxq of the Port).
A module class is not forced to choose between receiving packets and generating them: there are classes that mix the two behaviors. Let's take a look at another pipeline (bess/bessctl/conf/samples/queue.bess
):
localhost:10514 $ show pipeline
+--------+ +----------+ +-----------+ +-------------------+ +-------+
| src | | rewrite0 | | queue | | vlan_push0 | | sink0 |
| Source | :0 26043968 0: | Rewrite | :0 26040608 0: | Queue | :0 2897536 0: | VLANPush | :0 2898400 0: | Sink |
| | ----------------> | | ----------------> | 1023/1024 | ---------------> | PCP=0 DEI=0 VID=2 | ---------------> | |
+--------+ +----------+ +-----------+ +-------------------+ +-------+
The queue
module above (instance of Queue
, very different from QueueInc
!), receives packets generated from a task registered by src
, but also registers its own task. It doesn't immediately forwards the packets received, but it stores them in a ring buffer. The task created by queue
will later read packets from the ring buffer and forwards them to its right neighbor.
There are two different tasks in the pipeline. How often does the src
task get called? How often does the queue
task get called?
BESS implements a fully hierarchical task scheduler that supports different policies. The job of the scheduler is to decide which task needs to be executed next. The tasks in the scheduler are organized in a tree-like data structure, where the leaf nodes are the tasks itself, and the other nodes represents particular policies.
When the user or the author of the BESS script doesn't configure the scheduler, the execution of all the tasks is interleaved in a round robin fashion. The scheduler tree can be examined using the show tc
command:
localhost:10514 $ show tc
<worker 0>
+-- !default_rr_0 round_robin
+-- !leaf_src:0 leaf
+-- !leaf_queue:0 leaf
The above command shows that we have only one thread (worker 0
) with a very simple tree: there's a root node (called !default_rr_0
) with type round_robin
and two children (!leaf_src:0
) and (!leaf_queue:0
), which are the two tasks registered by the src
and queue
modules. In this case the scheduler behavior is very simple: it will simply alternate the execution of src
and queue
over and over.
The rate of the execution of a task can be throttled with a 'rate_limit'
node in the scheduler tree. We can create the node with a line in the BESS configuration script:
bess.add_tc('fast', policy='rate_limit', resource='packet', limit={'packet': 9000000})
If we inspect the tree now we see:
localhost:10514 $ show tc
<worker 0>
+-- !default_rr_0 round_robin
+-- !leaf_src:0 leaf
+-- !leaf_queue:0 leaf
+-- fast rate_limit 9.000 Mpps
The newly created note doesn't have any effect on the src
or queue
tasks, because they're still under the round_robin
policy. To actually enforce the limit on a task, we have to make it a child of the rate_limit
node, using this code:
src.attach_task('fast')
With the above line we tell the module src
to attach its task under the 'fast'
policy. Now the scheduler tree looks like:
localhost:10514 $ show tc
<worker 0>
+-- !default_rr_0 round_robin
+-- !leaf_queue:0 leaf
+-- fast rate_limit 9.000 Mpps
+-- !leaf_src:0 leaf
Similarly, we can also limit the execution of the task registered by the queue module to a slower rate with these two lines:
bess.add_tc('slow', policy='rate_limit', resource='packet', limit={'packet': 1000000})
queue.attach_task('slow')
The final tree looks like this:
localhost:10514 $ show tc
<worker 0>
+-- !default_rr_0 round_robin
+-- fast rate_limit 9.000 Mpps
| +-- !leaf_src:0 leaf
+-- slow rate_limit 1.000 Mpps
+-- !leaf_queue:0 leaf
The following functions are used to interact with the scheduler from a BESS script:
-
bess.add_tc(name, policy, wid=-1, parent='', resource=None, priority=None, share=None, limit=None, max_burst=None)
Create a new node in the scheduler tree called
name
of typepolicy
.name
must be a unique string that identifies the node (it cannot start with'!'
).policy
can be one of the following:-
'round_robin'
: Each time this node is visited by the scheduler, a child is picked in a round robin fashion. -
'weighted_fair'
: The children of this nodes are executed in proportion to theirshare
. -
'rate_limit'
: The node can have at most one child. When the child execution exceeds the limits imposed bylimit
andmax_burst
the node will be put in a blocked state (it will be unblocked after an appropriate amount of time). -
'priority'
The node always schedules the child with the highest priority that's not blocked (in the sense described by therate_limit
node.). If a node has no children, or if all its children are temporarily blocked, it is considered blocked itself. -
'leaf'
Nodes with this policy represent a task. They cannot be created withadd_tc
, they're added by the modules when a task is registered.
The
wid
andparent
arguments control where to place the new node in the tree. They're mutually exclusive, i.e. only one of them can have a non default value. Ifparent
is specified, its value must be the name of an existing node in the tree: the newly added node will become one of its children. Ifwid
is specified, the newly added node will become the root of the tree on workerwid
; if there's a root already, the roots will be placed under a round robin node named'default_rr_<wid>'
. If alsowid
is unspecified (i.e. -1), the worker will be chosen in a round robin fashion.The
weighted_fair
orrate_limit
policies can (respectively) share among children or limit different types of resources. Theresource
parameter must be used when creating one of them to choose. It can be:-
count
: The new node will share fairly or limit the number of times a child is scheduled. -
cycle
: The new node will share fairly or limit the number of cycles (as measured by the TSC) a child's execution takes. -
packet
: The new node will schedule the children nodes to try to share fairly or to limit the number of packets generated. -
bit
: The new node will schedule the children nodes to try to share fairly or to limit the number of bits generated.
The next two parameters are only used when attaching to certain types of parent nodes. The
priority
parameter control the priority that this node has among the children of apriority
parent (a lower number means higher priority). Theshare
parameter controls the relative share of the resource among the children of aweighted_fair
parent.limit
andmax_burst
are only used when creatingrate_limit
nodes: they control the rate of the resource used and the excess allowed. They must be in the form of an object with the resource as key (which must be the same as theresource
parameter) and an integer as value (e.g.{'packet': 1000}
). The node will schedule its children to consume no more thanlimit
units per second. -
-
<module>.attach_task(parent='', wid=-1, module_taskid=0, priority=None, share=None)
bess.attach_task(module_name, parent='', wid=-1, module_taskid=0, priority=None, share=None)
Moves a task in the scheduler tree. The two forms are equivalent.
The task to move is the task numbered
module_taskid
(it is usually 0), registered by a module. The module is identified by the<module>
object in the first form, or bymodule_name
in the second form.parent
andwid
behave like inbess.add_tc()
.priority
andshare
behave like inbess.add_tc()
. -
update_tc_params(self, name, resource=None, limit=None, max_burst=None)
Update the parameters of the node named
name
in the scheduler tree. Onlyweighted_fair
andrate_limit
nodes have parameters that can be updated.resource
,limit
andmax_burst
behave like inbess.add_tc()