node_monitoring: nodemon_cpp | nodemon_lua | nodemon_msgs | nodemon_py | nodemon_tui | nodemon_webview

Package Summary

node_monitoring

The node_monitoring package

  • Maintainer status: developed
  • Maintainer: Andre Luis França <andreluisfranca98 AT gmail DOT com>
  • Author: Andre Luis França <andreluisfranca98 AT gmail DOT com>
  • License: MIT
  • Source: git https://github.com/alf767443/node_monitoring.git (branch: main)

Stack Summary

This stack provides messages and client libraries to monitor the states of nodes in general, and critical failure situations in particular.

Robot systems often comprise a variety of components. Each can fail for many reasons, such as hardware failures, communication errors, or implausible requests and configuration. These errors can go unnoticed, particularly in systems with many nodes and a high volume of log messages. The goal of the node monitoring stack is to provide frequent heartbeat information that asserts the aliveness and proper operation of nodes, and to announce critical system-level errors. The stack contains a text-based interface to view this information, and the skillgui was recently extended to display it as well.
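The heartbeat idea can be illustrated with a minimal sketch. It does not use the stack's own libraries or messages; the class `HeartbeatMonitor`, its methods, and the 2-second timeout are hypothetical names and values chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HeartbeatMonitor:
    """Hypothetical helper (not part of node_monitoring): tracks heartbeats per node."""

    timeout: float  # assumed: seconds without a heartbeat before a node counts as stale
    last_seen: Dict[str, float] = field(default_factory=dict)

    def beat(self, node: str, now: float) -> None:
        # Record the time of the most recent heartbeat received from this node.
        self.last_seen[node] = now

    def stale_nodes(self, now: float) -> List[str]:
        # A node is stale if its last heartbeat is older than the timeout.
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)


monitor = HeartbeatMonitor(timeout=2.0)
monitor.beat("camera", now=0.0)
monitor.beat("detector", now=0.0)
monitor.beat("detector", now=3.0)  # only the detector keeps sending heartbeats
stale = monitor.stale_nodes(now=3.5)
print(stale)
```

Passing the timestamp explicitly keeps the sketch deterministic; a real node would presumably use the system or ROS clock instead.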

We distinguish two classes of errors. The first are anticipated failures at the execution and behavior level; the second are system-level failures, which often cascade into a condition where fault diagnosis is required. As an example, consider a vision system that detects an object on a table on request. An error that has to be expected is that the object is not visible, for example because it is simply not there or because the gathered data is inconclusive. However, if the object is not visible because the camera cannot produce new images due to a pulled cable, this constitutes a system-level failure that is announced via the node monitoring library (possibly by both the image gathering and the image processing node). In a way, this compares to exception handling as known from C++: during normal operation, called functions return a value such as true or false, but in case of an error an exception is raised, interrupting the normal program flow.
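The distinction between the two error classes might be sketched as follows. This is an illustration under stated assumptions, not the API of nodemon_py: `VisionNode`, `NodeState`, `announce_fatal`, and the `reports` list (standing in for published monitoring messages) are all hypothetical.

```python
from enum import Enum
from typing import List, Optional


class NodeState(Enum):
    # Hypothetical states; the real message definitions may differ.
    RUNNING = "running"
    FATAL = "fatal"


class VisionNode:
    """Hypothetical vision node used only to illustrate the two error classes."""

    def __init__(self) -> None:
        self.state = NodeState.RUNNING
        self.reports: List[str] = []  # stands in for published monitoring messages

    def announce_fatal(self, reason: str) -> None:
        # System-level failure: change the node state and announce the reason.
        self.state = NodeState.FATAL
        self.reports.append(reason)

    def detect_object(self, image: Optional[List[str]], target: str) -> bool:
        if image is None:
            # No image at all (e.g. pulled cable): not an ordinary "not found".
            self.announce_fatal("camera delivers no images")
            return False
        # Anticipated outcome: the object is either in the image or it is not.
        return target in image


node = VisionNode()
found = node.detect_object(["plate", "fork"], target="cup")  # expected failure
state_after_miss = node.state
node.detect_object(None, target="cup")  # system-level failure
print(found, state_after_miss, node.state, node.reports)
```

The anticipated failure is reported through the ordinary return value and leaves the node state untouched, while the missing image changes the state and produces a report that a monitoring tool could display.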

The stack is in active use and considered stable, though not immutable. We currently provide libraries for C++, Python, and Lua to make integration as easy as possible. For all other languages the messages can be used as-is. Documentation is provided in the libraries and examples come with each library.

The source code is available at http://github.com/timn/ros-node_monitoring.

Report a Bug

Report bugs at https://github.com/timn/ros-behavior_engine/issues


2024-11-09 14:41