This is the beginning of what I hope we can present as a formal proposal for a multimaster framework in ROS, starting with the most fundamental building block - the gateway. It should also help us clarify and crystallise exactly the design goals we wish to meet.
Definitions
So we're all on the same page:
ROS API : a single topic/service/action, or a collection of such, for a node or launched system.
Gateway : the object connecting local and remote ros systems.
Public Interface : the set of ros api offered by a gateway for others to use.
Flipping : the process of pushing a local ros api out to a remote system.
Pulling : the process of fetching a ros api from a remote gateway’s public interface into the local system.
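To make the last two concrete, here is one hypothetical way a gateway might represent flip/pull requests as rules. The Rule structure and its fields below are illustrative only, not a fixed part of this proposal:

```python
from collections import namedtuple

# type is one of 'topic', 'service' or 'action'; name is the ros name
# being shared; node can optionally pin the rule to a single node
Rule = namedtuple('Rule', ['type', 'name', 'node'])

# flipping: push a local ros api out to a named remote gateway
flip_request = ('remote_gateway', Rule('topic', '/chatter', None))

# pulling: fetch a ros api from a remote gateway's public interface
pull_request = ('remote_gateway', Rule('service', '/add_two_ints', None))
```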
Concept
The gateway will be the public frontend for a ros master and is intended to act much like a gateway on a local area network, controlling what is exposed and what is forwarded between the local ros master and the outside world (other ros systems). The intention is to generalise this kind of interface beyond the tools that previously existed (foreign_relay, master_sync and fkie_multimaster) and also to make their configuration/usage as simple as possible.
Design Goals
- Gate_G01 : do not burden non-multimaster systems with unnecessary overhead
- Gate_G02 : do not interface directly with foreign ros masters (put the gateway in between)
- Gate_G03 : support all ros api types (topics, services and actions).
- Gate_G04 : auto-discover other gateways (pre-configured, zeroconf or name server)
- Gate_G05 : provide quality of network connection statistics between systems
- Gate_G06 : control what ros api should be put on a public interface (simplicity and security)
- Gate_G07 : support flipping of local ros api to a remote gateway (control what and where you share)
- Gate_G08 : support pulling of publicly exposed remote ros api from a remote gateway (open sharing)
- Gate_G09 : configure/manage the type of connections created between systems (unreliable, compressed, ...)
- Gate_G10 : peer-to-peer gateway interactions (true multimaster, not just two-master setups)
- Gate_G11 : access control to permit/block requests for flipping/pulling
- Gate_G12 : decouple from higher level components such as the app_manager (re-usable building block).
Design Decisions
- Gate_D01a : sit alongside, do not extend rosmaster itself
- Gate_D01b : modular components called when necessary (discovery, sync, zeroconf, multicast)
- Gate_D02 : adapter-like interface that accepts requests on one side and interacts with the ros master api on the other (see the sketch after this list)
- Gate_D03 : leave concepts like bundling of ros api in capabilities (or similar) to higher level components
- Gate_D04 : gateway discovery mechanisms should be optional and varied
- By hand (yaml), centralised server (redis), zeroconf or multicast.
- Gate_D05 : assume each gateway represents a single system interface that needs to be monitored; this keeps things simple and is typical for robots, even if there are multiple machines connected internally
- Gate_D06a : default settings for a gateway should not expose anything.
- Gate_D06b : a convenience option to dump all local topics on the public interface.
- Gate_D07a : gateways should have the option to block flip requests.
- Gate_D07b : flip requests should carry enough detail for the receiving gateway to make the corresponding registrations via the local ros master api.
- Gate_D07c : a convenience option to flip all local interfaces to a remote.
- Gate_D07d : a convenience option to flip all public interfaces to a remote.
- Gate_D09a : specify transport type and hints at the system level (e.g. reliable/unreliable configuration via roslaunch).
- Gate_D09b : more complete transport types (multi-language unreliable, compression etc).
- Gate_D10 : do not make decisions about what to expose, where and how - the focus should be on being just a tool that can be controlled from elsewhere.
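As a rough illustration of Gate_D02, the side facing the local system need only speak the standard ros master xmlrpc api. In the sketch below, getSystemState and its return format are the real master api; the class, method names and caller id are hypothetical:

```python
import os

try:
    import xmlrpc.client as xmlrpc_client  # python3
except ImportError:
    import xmlrpclib as xmlrpc_client      # python2

class MasterAdapter(object):
    """One side accepts gateway requests (e.g. 'what topics could go on the
    public interface?'); the other only ever talks to the local ros master."""

    def __init__(self, master_uri=None, caller_id='/gateway'):
        self._master = xmlrpc_client.ServerProxy(
            master_uri or os.environ['ROS_MASTER_URI'])
        self._caller_id = caller_id

    def local_publications(self):
        # getSystemState returns (code, status message,
        # [publishers, subscribers, services]); code 1 means success
        code, msg, state = self._master.getSystemState(self._caller_id)
        if code != 1:
            raise IOError('ros master api error: %s' % msg)
        publishers, _subscribers, _services = state
        return [topic for topic, node_names in publishers]

if __name__ == '__main__':
    print(MasterAdapter().local_publications())
```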
Other Notes
Gateway Model - some more details here.
First Implementation - some technical steps outlining progress towards a first implementation.
Comments/Issues
Gateways as relays only?
In the initial implementation, will gateways act as relays? That is, will every topic in the public API be subscribed to by the gateway and republished to the foreign gateway?
I had thought about this as well. One problem is that there is currently no way to guarantee unreliable connections. Unreliable transport hints go in at subscriber creation - there is nothing at the publisher end (if an unreliable subscriber requests a connection with a publisher, it will be an unreliable connection if the transport types can do it - i.e. roscpp can do it since it has a udpros implementation, but rospy connections can't). This has some interesting consequences.
- Flipping roscpp publishers out to a remote ros system doesn't require anything to be done, so these don't need relays.
- Flipping rospy publishers wouldn't create any unreliable connections, so these could use a relay to convert them to a roscpp publisher (a relay sketch follows below).
- The resulting connection to a publisher on a remote ros system is at the mercy of the subscriber (don't have any control over whether the user is connecting a reliable or unreliable subscriber).
- Similarly, a relayed unreliable subscriber connection to a remote system is at the mercy of the kind of publisher (roscpp ok, rospy fails).
Still, at least it would ensure that the gateway has done its part in making sure the connections would be unreliable. I'd like to raise some of these issues on the ros-ng sig [DS].
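For reference, a type-agnostic relay can be written in rospy with AnyMsg, which hands over the raw serialized bytes plus the connection header, so the relay never needs message classes compiled in. The node and topic names below are hypothetical, and note this only covers the reliable (tcpros) case - converting a rospy publisher to a roscpp one for udpros would mean writing the relay against roscpp instead:

```python
import rospy
import roslib.message

class Relay(object):
    def __init__(self, input_topic, output_topic):
        self._pub = None
        self._output_topic = output_topic
        self._sub = rospy.Subscriber(input_topic, rospy.AnyMsg, self._callback)

    def _callback(self, anymsg):
        if self._pub is None:
            # the incoming connection header carries the real type name
            # (e.g. 'std_msgs/String'), so the publisher is created lazily
            self._msg_class = roslib.message.get_message_class(
                anymsg._connection_header['type'])
            self._pub = rospy.Publisher(self._output_topic, self._msg_class,
                                        queue_size=10)
        # deserialize the raw buffer into a typed message and republish
        self._pub.publish(self._msg_class().deserialize(anymsg._buff))

if __name__ == '__main__':
    rospy.init_node('gateway_relay')
    relay = Relay('chatter', 'remote/chatter')
    rospy.spin()
```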
Transport Types
Gate_G09 will be a difficult problem. Currently you can't configure a connection's type or transport hints from the system level, e.g. from roslaunch, the way remaps are done. You can't introspect on them either (introspection would allow you to interpose relays if desired). See the discussion on Transports.
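To make the asymmetry concrete: in roscpp the hint is supplied at subscriber construction (e.g. passing ros::TransportHints().unreliable() into NodeHandle::subscribe), while rospy exposes only a single hint, tcp_nodelay, likewise fixed at creation time. A minimal rospy illustration (topic name hypothetical):

```python
import rospy
from std_msgs.msg import String

def callback(msg):
    rospy.loginfo(msg.data)

rospy.init_node('listener')
# tcp_nodelay is the only transport hint rospy exposes; it is baked in here
# at creation time - there is no roslaunch/remap style way to set it from
# outside, and no unreliable (udpros) option at all in rospy
rospy.Subscriber('chatter', String, callback, tcp_nodelay=True)
rospy.spin()
```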
The TF Tree
Exposing TF is important as it allows out-of-the-box usage of a number of existing ROS applications. Ideally, only a subset of the TF tree should be exposed as part of the public ROS API. This is necessary for ensuring some privacy for the local machine's data, as well as not burdening foreign TF trees with unnecessary transforms.
Good point - Nick said they compressed TF trees. Also we should see what has changed with tf2 and whether it is easier now [DS].
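One way to expose only a subset, assuming tf2-style tf2_msgs/TFMessage traffic on /tf: a small filter node that republishes only whitelisted frames onto a public topic. The whitelist and topic names here are hypothetical:

```python
import rospy
from tf2_msgs.msg import TFMessage

# hypothetical whitelist - only transforms entirely between these frames
# are considered public
PUBLIC_FRAMES = {'map', 'odom', 'base_link'}

def callback(msg):
    public = [t for t in msg.transforms
              if t.header.frame_id.lstrip('/') in PUBLIC_FRAMES and
                 t.child_frame_id.lstrip('/') in PUBLIC_FRAMES]
    if public:
        pub.publish(TFMessage(transforms=public))

rospy.init_node('tf_filter')
pub = rospy.Publisher('tf_public', TFMessage, queue_size=10)
rospy.Subscriber('tf', TFMessage, callback)
rospy.spin()
```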
Clock Synchronization
This might be important for things like tf, but I am not sure how crucial it is.
Other
There are a lot of different ideas on exactly how to implement the higher level components (linking, co-ordination etc). Quite likely they will require different variations. Since the target is a long way off, we're not worrying about it just yet.