1. Design Documentation
1.1. Overview
1.1.1. Basic Components
The goal of this module is to support the integration of reinforcement
learning (RL) components into network scenarios to simulate their
deployment and the communication between them. Typical RL tasks include
agents, actions, observations and rewards as their main components. In a
network, these components are often placed on different nodes. For example,
collecting observations and training an agent often happen at different
locations in the network. To associate these RL components with Nodes,
the abstraction of user applications is used. The following
applications inherit from a general RlApplication:
ObservationApplication: observes part of the network state and communicates the collected data (i.e. observations or data used to calculate observations) to one or more agents
RewardApplication: collects data to calculate a reward and communicates it to one or more agents
AgentApplication: represents the training and/or inference agent in the network
ActionApplication: executes an action that was inferred by an agent and thereby changes a part of the network state
A commonly used standard for implementing RL environments is the Gymnasium standard [Gymnasium], which is based on Python. RLLib (Ray) [RLLib] is an extensive Python library for RL that uses this standard as its interface for single-agent training. As ns-3 is implemented in C++, a connection to the mainly Python-based RL frameworks needs to be established. This module uses ns3-ai [ns3-ai] for the inter-process communication.
1.1.2. Design Criteria
Possible use cases this module is designed for are the following:
Simulation of communication overhead between RL components
Simulating how configurable calculation and/or communication delays influence the performance of an RL approach
Testing and evaluating tradeoffs between different RL deployments, e.g., distributed deployment on several nodes vs. centralized deployment on a single node
To make these generalized use cases possible, the following main requirements have been considered:
Support integration with existing ns-3 scenarios with as few assumptions about the scenario as possible (even complex scenarios such as the one shown in Example scenario setup should be supported)
Support single-agent and multi-agent reinforcement learning (MARL)
Support communication between RL components via simulated network traffic
1.1.3. Class diagram
The following class diagram includes all classes provided by DEFIANCE, together with the member variables and class methods that are particularly important.
1.2. Customization
This module provides a framework to simulate different RL components via different
RlApplications. To make the framework easy to use, it performs the following main
tasks for the user:
provide frameworks for prototypical RlApplications,
provide helper functionality to support the creation of RlApplications and their installation on Nodes,
enable typical communication between RlApplications, and
handle the interaction between RlApplications and the Python-based training/inference processes in compliance with the typical RL workflow.
In addition to these tasks performed by the framework, some aspects of the
RlApplications strongly depend on the specific RL task and solution
approach that is to be implemented. Therefore, custom code provided by the user
of the framework has to be integrated into the RlApplications. Typically,
this mainly concerns the following aspects of RlApplications:
Data collection: How are observations and rewards collected/calculated exactly?
Communication between RlApplications: When and to whom are messages sent?
Behavior of agents: At what frequency does the agent step? What triggers a step?
Execution of actions: What happens exactly when a specific action occurs?
A typical example of necessary customization is an ObservationApplication
that should be registered at a specific ns-3 trace source to provide it with the
necessary data. The corresponding trace source and its signature have to be
configurable as they depend on the specific scenario. Additionally, it should
be configurable to which AgentApplications the collected data is sent.
One option to solve this task is to use callbacks: the user could create
functions outside the corresponding RlApplication with a distinct interface.
Those could then be registered as callbacks in the corresponding RlApplication.
Whenever user-specific code is required, the RlApplication would then
call these callbacks. Similarly, the RlApplication could provide a method
with a distinct interface. The user then has to register this method at a trace
source to provide the RlApplication with data. This option is not very
flexible as all function signatures have to be fixed and known already when the
RlApplication class is designed. Another drawback
of this approach is that there is no defined location for the custom
code of an RlApplication.
Therefore, an approach using inheritance was chosen: the RlApplications
are designed as abstract classes from which the user has to inherit in
order to add the scenario-specific code. This has the advantage that all
code connected to an RlApplication is collected in a single class.
Additionally, it guarantees that all necessary methods are implemented,
and usable defaults can be provided for methods that may be
customized.
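The following minimal sketch illustrates this inheritance approach for the ObservationApplication example above. The class and method names (ObservationApplication, Observe, RegisterCallbacks, Send) are taken from this documentation, but their exact signatures are assumptions and may differ from the actual DEFIANCE API; the trace source and the container layout are likewise only examples.

    // Hypothetical custom ObservationApplication observing a TCP congestion window.
    class CwndObservationApp : public ObservationApplication
    {
      public:
        // Signature matches the traced value; here the CongestionWindow trace source.
        void Observe(uint32_t oldCwnd, uint32_t newCwnd)
        {
            Ptr<OpenGymDictContainer> data = CreateObject<OpenGymDictContainer>();
            Ptr<OpenGymBoxContainer<uint32_t>> box =
                CreateObject<OpenGymBoxContainer<uint32_t>>(std::vector<uint32_t>{1});
            box->AddValue(newCwnd);
            data->Add("cwnd", box);
            Send(data); // assumed helper that forwards the data to the connected AgentApplications
        }

        // Called by the framework before the simulation starts; connect Observe to a trace source here.
        void RegisterCallbacks() override
        {
            Config::ConnectWithoutContext(
                "/NodeList/0/$ns3::TcpL4Protocol/SocketList/0/CongestionWindow",
                MakeCallback(&CwndObservationApp::Observe, this));
        }
    };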
1.3. ChannelInterface
This framework is supposed to allow communication between RlApplications in a
custom scenario. Therefore, it is the task of the framework user to set
up the scenario and the communication channels between Nodes. This implies
that the user has to provide the framework with an abstraction of a
pre-configured channel over which data can be sent. Intuitively, this
would be sockets. Nevertheless, the framework should spare the user
the overhead of creating sockets. That is why the framework only requires
the user to provide IP addresses and the protocol type. Using this data,
sockets can be created and connected to each other.
RlApplications should handle the interfaces of their communication channels
transparently, e.g. independently of the protocol type. Additionally,
direct communication without simulated network traffic should be possible.
To this end, the ChannelInterface class was introduced as a generalized
interface used in RlApplications. It is subclassed by the
SocketChannelInterface class, which is responsible for creating sockets
when provided with the necessary information (IP addresses and protocol
type). The SimpleChannelInterface provides the RlApplications with the
same interface while maintaining a direct reference to another
SimpleChannelInterface to allow communication with a fixed delay (which
might also be 0).
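To make this concrete, here is a rough, hypothetical sketch of how the two interface types might be set up by hand. Only the class names come from this documentation; the constructors and methods used below (Connect, SetPropagationDelay, the SocketChannelInterface arguments) are assumptions, and in practice the CommunicationHelper described in Helper creates and connects these interfaces.

    // Hypothetical: direct communication with a fixed delay, no simulated network traffic.
    Ptr<SimpleChannelInterface> obsSide = CreateObject<SimpleChannelInterface>();
    Ptr<SimpleChannelInterface> agentSide = CreateObject<SimpleChannelInterface>();
    obsSide->Connect(agentSide);                    // assumed: store a reference to the peer
    obsSide->SetPropagationDelay(MilliSeconds(2));  // assumed: fixed delay, may also be 0

    // Hypothetical: communication over simulated network traffic; sockets are created
    // internally from the provided IP address and protocol type.
    Ptr<SocketChannelInterface> socketSide = CreateObject<SocketChannelInterface>(
        observationNode, agentIpAddress, TcpSocketFactory::GetTypeId());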
It should be noted that the framework should support multiple connections
over ChannelInterfaces between a single pair of RlApplications to allow
using different communication channels.
Simulating communication between RlApplications over simulated network
channels entails the possibility that a channel breaks and that therefore
no communication is possible. This has to be handled by the underlying
protocols or the user of the framework, since the user is responsible for the
whole setup and configuration of the concrete network scenario.
1.4. Design of RlApplications
1.4.1. RlApplication
The RlApplication generalizes functionality that is shared by all
applications provided by this module. This includes IDs to identify specific
RlApplications as well as functionality to send data and to handle ChannelInterfaces.
In this way, a generalized interface for all possible RL applications is
established, which can be used by all classes that handle arbitrary RL applications,
such as the CommunicationHelper introduced in Helper.
In theory, multiple RlApplications of the same type can be installed on
the same Node. Nevertheless, this has not been tested yet since in most cases
tasks of the same type (e.g. collecting observations) do not have to be
separated into different applications when performed on the same Node.
1.4.2. AgentApplication
1.4.2.1. Basic Concept
The AgentApplication represents an RL agent (which is trained with e.g.
RLLib) within the network. It has a scenario-specific
observation and action space. Currently, the framework has only been tested with
fixed observation and action spaces (and not with parametric action
spaces).
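As an illustration, such fixed spaces could be defined with the space classes of ns3-gym/ns3-ai, for example in the constructor or Setup method of a custom AgentApplication. Where exactly DEFIANCE expects the spaces to be provided is an assumption here; only the space classes themselves are standard ns3-gym/ns3-ai types.

    // Sketch: an observation space of 4 floats in [0, 1] and a discrete action space with 3 actions.
    std::vector<uint32_t> shape = {4};
    Ptr<OpenGymBoxSpace> observationSpace =
        CreateObject<OpenGymBoxSpace>(0.0, 1.0, shape, TypeNameGet<float>());
    Ptr<OpenGymDiscreteSpace> actionSpace = CreateObject<OpenGymDiscreteSpace>(3);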
1.4.2.2. Interaction with other RlApplications
The AgentApplication may receive observations and rewards from one or
multiple ObservationApplications and RewardApplications, respectively.
To support as many use cases as possible, it may also receive
arbitrary data from ObservationApplications and RewardApplications that is
not immediately used as observations or rewards but from which observations
and rewards are derived by custom calculations. Therefore, the data
transmitted from ObservationApplications to AgentApplications (which is
called observation in the following) does not necessarily fit into the
observation space of the agent.
Likewise, an AgentApplication can send actions (or any data derived
from its actions) to one or multiple ActionApplications.
In addition to the common RL interactions, this framework also supports
transmitting arbitrary messages between AgentApplications. This gives
users of this framework the chance to implement a protocol for agent
communication. Furthermore, it is the basis for exchanging model updates
or policies between agents.
1.4.2.3. Interaction with Python-based learning process
The AgentApplication is intended to interact with the Python-based
training/inference processes over the OpenGymMultiAgentInterface.
This is primarily done by the AgentApplication::InferAction method(s), which call(s)
OpenGymMultiAgentInterface::NotifyCurrentState. This interaction
can happen timer-based (i.e. in fixed time intervals) or event-based
(e.g. depending on how many observations were received). To always have
access to the current observation and reward that shall be sent to
the Python side, the AgentApplication stores an m_observation and an
m_reward object.
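A hedged sketch of timer-based stepping is given below. It assumes that InferAction can simply be called from a scheduled member function, as described above; the exact InferAction signature and the helpers that fill m_observation and m_reward are assumptions.

    // Hypothetical: step the agent every 100 ms, independent of incoming messages.
    void
    MyAgentApp::ScheduleNextStep()
    {
        Simulator::Schedule(MilliSeconds(100), &MyAgentApp::Step, this);
    }

    void
    MyAgentApp::Step()
    {
        m_observation = CalculateCurrentObservation(); // hypothetical helper
        m_reward = CalculateCurrentReward();           // hypothetical helper
        InferAction();                                 // internally calls NotifyCurrentState
        ScheduleNextStep();
    }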
1.4.2.4. Receiving, storing and calculating observations and rewards
To allow the AgentApplication to arbitrarily calculate observations and
rewards based on the messages received from ObservationApplications and
RewardApplications, these received messages have to be stored in the
AgentApplication. For this purpose a new data structure, called
HistoryContainer, was designed. Each AgentApplication maintains one
HistoryContainer for observations (m_obsDataStruct) and one for rewards
(m_rewardDataStruct). m_obsDataStruct stores one deque for each connected
ObservationApplication, in which the newest m_maxObservationHistoryLength
observations received from this ObservationApplication are stored.
Additionally, m_obsDataStruct contains another deque, which stores the
newest observations received independently of the ObservationApplication
they came from. m_rewardDataStruct is used equivalently. In this way, the user can
specify how much observation and reward data is stored in the
AgentApplication and use it arbitrarily.
Besides storing the received data, it is necessary to inform the
AgentApplication when an observation or a reward is received. The user
can then specify the behavior of the AgentApplication in response to such
a message. For example, the AgentApplication could wait for 10
observations before inferring the next action. This is done by
registering the abstract methods AgentApplication::OnRecvObs and
AgentApplication::OnRecvReward at the corresponding ChannelInterfaces.
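The 10-observation example above might look roughly as follows. The OnRecvObs signature, the counter member, and the aggregation helper are assumptions for illustration and may differ from the actual API.

    // Hypothetical event-based stepping: infer an action after every 10th observation.
    void
    MyAgentApp::OnRecvObs(uint32_t remoteAppId) // assumed signature
    {
        m_obsSinceLastStep++;
        if (m_obsSinceLastStep < 10)
        {
            return;
        }
        m_obsSinceLastStep = 0;
        // Combine the stored history (m_obsDataStruct) into the current observation
        // and trigger the next environment step.
        m_observation = AggregateLatestObservations(m_obsDataStruct); // hypothetical helper
        InferAction();
    }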
This framework is intended to make the communication between RL components
more realistic. Nevertheless, it shall still support using global
knowledge (e.g. knowledge available on other Nodes) to calculate rewards
and observations. In particular, global knowledge can be helpful to
calculate rewards during offline training. If such global knowledge (i.e.
data available without delay or communication overhead) shall be used,
it can simply be accessed when rewards and/or observations are calculated
within the AgentApplication, or the data can be transmitted via
SimpleChannelInterfaces.
1.4.2.5. Execution of actions
After the AgentApplication has called
OpenGymMultiAgentInterface::NotifyCurrentState, it receives an
action from the Python side via AgentApplication::InitiateAction.
To simulate the computation delay of the agent, an actionDelay
can be configured in OpenGymMultiAgentInterface::NotifyCurrentState.
The OpenGymMultiAgentInterface then delays calling
AgentApplication::InitiateAction by the configured actionDelay. Per default,
AgentApplication::InitiateAction sends the received action to all connected
ActionApplications. Because data is transmitted between RlApplications via
OpenGymDictContainers, the received action is
wrapped into such a container under the key "default". This method is
intended to be overridden if different behaviour is needed. In this way,
the action can, for example, be divided into partial actions that are
sent to different ActionApplications. Alternatively, one could also
specify in a part of the action to which ActionApplications the action
shall be sent.
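An override along these lines could look like the following sketch. The parameter type, the member m_numActionApps, and the SendAction helper are assumptions; the actual DEFIANCE signatures may differ.

    // Hypothetical: split a Box action into one partial action per connected ActionApplication.
    void
    MyAgentApp::InitiateAction(Ptr<OpenGymDataContainer> action) // assumed signature
    {
        Ptr<OpenGymBoxContainer<float>> box = DynamicCast<OpenGymBoxContainer<float>>(action);
        for (uint32_t i = 0; i < m_numActionApps; i++) // hypothetical count of connected ActionApplications
        {
            Ptr<OpenGymDictContainer> msg = CreateObject<OpenGymDictContainer>();
            Ptr<OpenGymBoxContainer<float>> part =
                CreateObject<OpenGymBoxContainer<float>>(std::vector<uint32_t>{1});
            part->AddValue(box->GetValue(i));
            msg->Add("default", part);
            SendAction(msg, i); // hypothetical helper sending to ActionApplication i
        }
    }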
1.4.2.6. Inference agents vs. training agents
In many RL tasks, different agents perform inference and training.
Therefore, one could provide different AgentApplication classes for these
two purposes. Nevertheless, a general AgentApplication class, which can
perform both inference and training, is also necessary to support e.g.
online training. Consequently, the AgentApplications used for inference
and training would only be specializations of this class that provide
less functionality. That is why it was decided to leave it to the user
to use only the functionality which is needed in the current use case.
When it is necessary to differentiate between inference and training
agents, this can be done e.g. by a flag introduced in a user-defined
inherited RlApplication.
1.4.3. DataCollectorApplication
The DataCollectorApplication is the base class which is inherited by
ObservationApplication and RewardApplication since both provide similar
functionality: they collect scenario-specific data, maintain
ChannelInterfaces connected to AgentApplications, and provide
functionality to send over these interfaces. To register the applications
at scenario-specific trace sources, the user has to define a custom
ObservationApplication::Observe or RewardApplication::Reward
method with a custom signature within the custom ObservationApplication or
RewardApplication. To provide a place to connect this custom method with
an existing trace source, the abstract DataCollectorApplication::RegisterCallbacks method
was created. If necessary, the user may also register multiple custom
ObservationApplication::Observe or RewardApplication::Reward methods within
DataCollectorApplication::RegisterCallbacks. To ensure that the callbacks are registered before
the simulation starts, DataCollectorApplication::RegisterCallbacks is called in the
DataCollectorApplication::Setup method.
Each ObservationApplication or RewardApplication can send observations
or rewards, respectively, to one or multiple AgentApplications in order not to limit
possible scenarios.
1.4.4. ActionApplication
The ActionApplication provides functionality to maintain ChannelInterfaces
which are connected to AgentApplications and to receive actions (in the
form of OpenGymDictContainers). The abstract method ActionApplication::ExecuteAction is
designed to provide a place for the user-specific code that handles the
different actions. This method is automatically called when data is
received on the registered ChannelInterfaces. Therefore, it is connected
to the corresponding callbacks within the ActionApplication::AddAgentInterface
method.
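A minimal sketch of a custom ActionApplication is given below; the ExecuteAction signature, the m_wifiPhy member, and the interpretation of the action as a transmit power are assumptions for illustration.

    // Hypothetical: interpret the received action as a new transmit power and apply it.
    void
    MyActionApp::ExecuteAction(uint32_t remoteAppId, Ptr<OpenGymDictContainer> action) // assumed signature
    {
        Ptr<OpenGymBoxContainer<float>> box =
            DynamicCast<OpenGymBoxContainer<float>>(action->Get("default"));
        double txPowerDbm = box->GetValue(0);
        // m_wifiPhy is a hypothetical member referencing the PHY this application controls.
        m_wifiPhy->SetTxPowerStart(txPowerDbm);
        m_wifiPhy->SetTxPowerEnd(txPowerDbm);
    }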
1.4.5. General Decisions
All RlApplications have to store multiple ChannelInterfaces that connect
them to other RlApplications. Typically, all ChannelInterfaces connected
to a specific remote RlApplication are used together. Furthermore,
multiple ChannelInterfaces between a pair of RlApplications have to be
supported to enable communication over different channels. Therefore,
InterfaceMaps were introduced, which are essentially two-dimensional maps.
The outer map is unordered and maps applicationIds to a second, ordered map.
This second map maps an ID to the ChannelInterface. This ID is unique
within this map of ChannelInterfaces connected to a specific RlApplication.
To ensure this uniqueness, the entries are stored in ascending order of
their IDs. In this way, one can simply use the last entry to generate a new
unique ID. Connecting two RlApplications over multiple ChannelInterfaces
is an edge case. Therefore, all RlApplication::Send methods are implemented with
signatures that allow sending to a specific RlApplication. Nevertheless,
storing ChannelInterfaces with IDs makes it possible to also provide
methods to send over a certain ChannelInterface.
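A rough sketch of such an InterfaceMap, built from standard library containers, could look as follows; the concrete type aliases used by DEFIANCE may differ.

    // Sketch of the two-dimensional InterfaceMap described above; type names are assumptions.
    using ApplicationId = uint32_t;
    using InterfaceId = uint32_t;
    // outer map: remote applicationId -> ordered map of interface IDs to ChannelInterfaces
    using InterfaceMap =
        std::unordered_map<ApplicationId, std::map<InterfaceId, Ptr<ChannelInterface>>>;

    // Because the inner map is ordered, a new unique ID can be derived from its last entry.
    InterfaceId
    NextInterfaceId(const std::map<InterfaceId, Ptr<ChannelInterface>>& interfaces)
    {
        return interfaces.empty() ? 0 : interfaces.rbegin()->first + 1;
    }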
We did not consider that during inference the agent might not be able to compute another action while one is already being computed. In reality, the computation either needs to be queued (“single-threaded”) or processed in parallel (“multi-threaded”). The latter case differs from the current implementation because the individual inference times increase with increased parallelism. For a detailed discussion of how to extend the framework with this feature, see Framework expansion options.
In complex scenarios with many ObservationApplications and
AgentApplications, each ObservationApplication should possibly be able to
communicate with each AgentApplication. In this case, it is not
practical to configure all communication connections before the
simulation starts. Therefore, it is necessary to support dynamically
adding and removing ChannelInterfaces during simulation time, which is
done by the RlApplication::AddInterface and RlApplication::DeleteInterface methods.
In some cases, one has to configure something within an RlApplication
based on the attributes which were set, but before the application is
started. One example for this is the initialization of data structures
with a scenario-dependent length. To provide a central place for such initialization
functionality which cannot be placed in the constructor, the RlApplication::Setup
method was created.
1.5. Interface for Multi-Agent RL
Gymnasium is a commonly used environment interface for single-agent training, which is also supported by ns3-ai [ns3-ai]. For multi-agent training, Ray implemented the MultiAgentEnv API [MultiAgentEnv]. Besides this API, there is also the PettingZoo API [Pettingzoo] proposed by the Farama Foundation. In addition to the Agent Environment Cycle (AEC) API, which is the main API of PettingZoo, there is also a Parallel API. For both APIs, RLLib provides a wrapper to make them compatible with the MultiAgentEnv [PettingzooWrapper].
Since this framework is intended to support multi-agent RL, it had to be decided which API to use. For the chosen API, the ns3-ai interface then had to be extended to support multi-agent RL.
The basic idea of the AEC [AEC] is that agents step sequentially and not in parallel. This restriction is intended to create a more understandable and less error-prone model, protecting developers from, for example, race conditions.
To decide on an API, the following aspects were considered:
The AEC API is a subset of the MultiAgentEnv API, meaning that everything implemented with the AEC API is representable with MultiAgentEnv. Using the AEC API would therefore add no functionality, but could be less error-prone because of its restrictions.
For every step of an agent, observations and rewards have to be transferred from C++ to Python and an action back from Python to C++. To avoid difficulties with synchronizing agents, the simplest model is sequentially stepping agents. If agents are supposed to step simultaneously, this can be simulated by not advancing the simulation time between their steps.
Including the AEC API when training with RLLib means adding a further dependency, and the environment would have to be wrapped into a MultiAgentEnv.
According to [PettingzooWrapper], AEC expects agents to work in a cooperative manner. However, this framework should also support conflicting agents.
In some places, the documentation of RLLib is not as comprehensive as it should be. Nevertheless, there are many code examples for RLLib online to look up.
For these reasons, it was decided to use the MultiAgentEnv API instead of the PettingZoo API, but to apply the restriction of sequentially stepping agents when expanding ns3-ai.
This framework should support both single-agent and multi-agent RL. To provide a uniform interface without code duplication, this framework handles single-agent RL as a special case of multi-agent RL.
Communication between the Python-based training process and the simulation
in C++ works over the Ns3MultiAgentEnv (in Python) and the
OpenGymMultiAgentInterface (in C++), which were added to ns3-ai. The
training/inference process is then initiated by the Python side using
Ns3MultiAgentEnv. The Python process starts the ns-3 simulation process
(implemented in C++) as a subprocess and waits to receive observations
and rewards from the C++ process. Whenever an agent decides to step (via
the AgentApplication::InferAction method), the C++ process running the ns-3 simulation
switches back to the Python process via the OpenGymMultiAgentInterface::NotifyCurrentState method,
passing the observation and the reward of the corresponding agent. The Python
process answers with an action for this agent. Only then is the simulation resumed
and the callback registered in OpenGymMultiAgentInterface::NotifyCurrentState called with the
action. Note the one-to-one relation between environment steps and calls
to AgentApplication::InferAction. If the simulation does not call AgentApplication::InferAction, the
environment won’t step.
1.6. Helper
In a typical use case this framework has to be integrated into an existing ns-3 scenario. In ns-3, the concept of helpers is commonly used to simplify the configuration and setup tasks the user has to perform.
In ns-3.42 an ApplicationHelper was introduced, which is used to create
and install applications of a specified type on Nodes. To avoid repeated
casts, which would lead to very cluttered code, this framework introduces
an RlApplicationHelper, which returns RlApplicationContainers
instead of ApplicationContainers.
The main configuration task of this framework is the setup of all
communication connections between RlApplications, e.g. the connection of
all ObservationApplications to their corresponding AgentApplications. For this
purpose, the CommunicationHelper was created. The framework should allow
all possible connections between pairs of RlApplications without making
any restricting assumptions. This is done by letting the user configure
the communication relationships via an adjacency list. Thereby, it is even
possible to configure multiple different connections, e.g. over different
channels, between two RlApplications.
To allow the user to identify RlApplications, e.g. when passing them to
this adjacency list, RlApplicationIds were introduced. They consist of a
part identifying the applicationType (e.g. ObservationApplication) and an
applicationId which is unique among all RlApplications of this type. In
this way, the applicationType can be identified when necessary, and
whenever the applicationType is clear, only the applicationId is used for
identification. The CommunicationHelper is also used for creating these
unique IDs.
To do this, it needs to have access to all RlApplications existing in a
scenario. One option for this is to create all RlApplications within the
CommunicationHelper. This requires the user to provide the
CommunicationHelper with all Nodes and the corresponding applicationTypes to
install on them. However, this would just move the identification problem
to the level of the Nodes. Additionally, this approach would conform less
with the general idea that the user defines the location of applications
by installing them on Nodes. That is why the tasks of creating/installing
RlApplications and of configuring them and their communication relationships
were split between the RlApplicationHelper and the CommunicationHelper.
Consequently, the user is required to pass all RlApplications to
the CommunicationHelper. The RlApplicationIds can then be set by the
CommunicationHelper via the CommunicationHelper::SetIds method.
Besides a pair of RlApplicationIds, the user has to specify in the
adjacency list all attributes that are necessary to configure the
connection between these RlApplications. This is done via
CommunicationAttributes as a compact format for all possible configuration
data. If no information (i.e. {}) is provided by the user, the framework will
establish SimpleChannelInterfaces, so that as little configuration as possible is
required. If SocketCommunicationAttributes are provided, the
CommunicationHelper is responsible for creating the corresponding
ChannelInterfaces and connecting them. The main goal when designing this
configuration interface was to enable as many configurations as possible
while requiring as few of them as possible. That is why, e.g.,
a default protocol for SocketCommunicationAttributes and a default IP address
for each RlApplication (derived from the list of network interfaces
of its Node) were implemented.
The CommunicationHelper::Configure method was introduced to make it possible to
call the RlApplication::Setup method on all RlApplications at once, at a point in
time independent of e.g. the constructors, so that it can be done after
setting the RlApplicationIds but before setting up the communication
relationships. The methods CommunicationHelper::Configure and CommunicationHelper::SetIds
could be combined into a single method, so that the user does not have to
call two methods. However, this has not been done so far because both methods
perform very different tasks.
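Putting these pieces together, a usage sketch might look like the following. The helper class names come from this documentation, but every method name, the adjacency-list format, and the node containers below are assumptions that may differ from the actual DEFIANCE API.

    // Hypothetical setup flow for one ObservationApplication and one AgentApplication.
    RlApplicationHelper obsHelper("ns3::MyObservationApp");   // hypothetical application type
    RlApplicationHelper agentHelper("ns3::MyAgentApp");       // hypothetical application type
    RlApplicationContainer observationApps = obsHelper.Install(sensorNodes);
    RlApplicationContainer agentApps = agentHelper.Install(agentNode);

    CommunicationHelper commHelper;
    commHelper.SetObservationApps(observationApps);           // assumed registration methods
    commHelper.SetAgentApps(agentApps);
    commHelper.SetIds();        // assign unique RlApplicationIds
    commHelper.Configure();     // calls RlApplication::Setup on all registered applications

    // Adjacency list: empty attributes ({}) yield a SimpleChannelInterface,
    // SocketCommunicationAttributes yield a SocketChannelInterface.
    commHelper.AddCommunication({{observationApps.GetId(0), agentApps.GetId(0), {}}}); // assumed format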
1.7. Framework expansion options
Create interface for sharing model updates or policies between agents. (already implemented to a large extent)
In some network infrastructures it is necessary to outsource training to a remote server, to share learned model updates, or to share policies between participants. To simulate the resulting constraints and to research possible opportunities, it is required to realistically simulate the performance of shared updates and policies as well as their size. This feature addresses questions like:
How is performance affected when learning in a distributed manner?
What burden does the resulting communication pose on a network, and can it be reduced?
The required communication functionality is already implemented to a large extent: on the ns-3 side, the logic with which the agent handles model weights, experience, and model update messages is located in AgentApplication::OnRecvFromAgent. The message flow is depicted in Interaction of inference agents, training server, and the ns3-ai message interface. This message flow is fully implemented; only the ns3-ai message handling on the Python side, alongside the interaction with Ray, is still missing.
Support moving agents (and other RlApplications) to another Node. (not started)
In complex scenarios it might be required to change the Node from which the agent receives its observations or where it performs its actions. Currently, this would require installing ObservationApplications and ActionApplications on every possible Node and then switching between them when sending. Since this is prone to bugs at runtime and difficult to track, especially for bigger scenarios, it would be handier to move an existing application to a different Node. The same applies if agents shall switch the Node during simulation time. This would be possible via model updates if an AgentApplication was installed on every possible Node. However, it would be much easier if it were possible to move an application to another Node.
Checkpointing (almost done)
To simulate inference without training or continue training of promising policies, it is required to implement Ray’s checkpointing. We have already implemented inference runs. However, continuing training hasn’t been tested yet.
Multithreading vs. Singlethreading (not started)
What happens if multiple observations arrive while the agent is already inferring an action? In a realistic scenario with limited resources, the agent might only be capable of starting a limited number of threads for inference. Then, increased parallelism increases the inference times per job. Maybe the node is even single-threaded. To provide inference for all observations, it would be required to buffer some of the observations. This feature would allow simulating the latency introduced thereby, as well as additional limitations with regard to the buffer size. Scenarios could explore questions like: Which buffer strategies are sensible for overall performance if the buffer is full? How beneficial is it to provide more resources for the agent in order to allow multithreading? This would lead to quantifiable answers to complex optimization problems.