bb_estimator

Overview

Brief description: ROS service performing rough bounding box estimation from specified 2D region of interest using the Kinect depth data. Beside that, this node also offers a service for calculation of 2D convex hull of a 3D mesh or a point cloud. It was created within SRS project.

Actual version: 7.0 (20.8.2012)

Author: Tomáš Hodaň (xhodan04@stud.fit.vutbr.cz) and Michal Španěl (spanel@fit.vutbr.cz) from dcgm-robotics@FIT group

Contents

Overview
BB Estimator - Server
1. BB Estimation Modes
2. Cache
BB Estimator - Client
ROS API
Manual
1. How to Compile
2. How to Run
Source files

BB Estimator - Server

The task is to estimate a 3D bounding box (BB) from a 2D region of interest (ROI). Since there is missing one dimension, there can be found different approaches to interpret the ROI. Three different approaches were identified, implemented and described below as three estimation modes.

BB Estimation Modes

In all modes, zeros are ignored in depth data as unknown values. Thus, if there isn't any non-zero value in the specified ROI, BB cannot be calculated and user is informed about this fact through a warning in console.

A situation when a BB is going under the floor is handled is such way that the BB is shrinked (its bottom face is raised) so the lowest corner lies right on the floor.

Notes about images below:

At each mode, there is included a visualization of the resulting BB. All of them are calculated from the same ROI, so you can compare them and clearly see characteristics of the different approaches.
There are also images explaining the geometry. They depicts the scene from the top view in direction of -Y axis. The situation from the perpendicular view (in direction of -X axis) is the same, thus it is not considered in the further explanation for simplicity.
Legend for the geometry images: 0 - origin of camera (world) coordinate system, Z - camera (world) Z axis, X - image X axis positioned at Z = focal length, ROI - region of interest specified on the image plane, R1 - X-coordinate of ROI left edge, R2 - X-coordinate of ROI right edge, Zmean - mean of depth in a space corresponding to ROI, ZstdDev - standard deviation of depth in a space corresponding to ROI, Dmean - mean of distance from origin in a space corresponding to ROI, DstdDev - standard deviation of distance from origin in a space corresponding to ROI, t1 - perpendicular projection of viewing frustum left side on XZ-plane, t2 - perpendicular projection of viewing frustum right side on XZ-plane, T1/T3 - intersection of t1 and some depth/distance level (according to given mode), T2/T4 - intersection of t2 and some depth/distance level (according to given mode)

Estimation Mode #1

(This is the default estimation mode.) ROI corresponds to perspective projection of BB front face. BB is rotated to fit the viewing frustum (representing back-projection of ROI) in such way, that BB front face is perpendicular to the frustum's center axis. BB can be non-parallel with all axis, thus each coordinate of each BB corner can be different. This mode doesn't directly work with depth given by Z-coordinate (as it is in the other estimation modes). Instead, this depth is converted to distance from origin. Corners of the BB front face are then given as points on viewing frustum edges with distance from origin equal to Dmean-DstdDev. Corners of the BB back face have distance from origin equal to Dmean+DstdDev and are positioned appropriately (translated along the center axis of the viewing frustum) so all corners form a perpendicular BB.

Note: On the geometry picture, the corners of the bottom face have the same X- and Z-coordinate as the corresponding corners of the top face. This is however not necessarily the case. Here it is considered for simplicity of explanation. However, the explanation holds also for other cases.

Geometry of BB estimation mode #1

Estimation Mode #2

ROI contains the whole perspective projection of BB, which is parallel with all axis. The BB front face is in depth Zmean-ZstdDev and the back face is in depth Zmean+ZstdDev. When R1 is negative and R2 positive, the calculation is identical with estimation mode #3. Otherwise (if both are negative or positive), we can transform the situation to the one depicted on the geometry picture (to unify the calculation). The X- and Z-coordinates of left-back BB corners are given by T1, which is a point on t1 in depth of Zmean+ZstdDev, and the X- and Z- coordinates of right-front BB corners are given by T2, which is a point on t2 in depth of Zmean-ZstdDev.

Geometry of BB estimation mode #2

Degenerate cases:

As can be seen on the lower geometry picture, T1 is on the right of T2 (it has a bigger X-coordinate). Hence, it is not possible to construct a BB with required dimensions, because its projection wouldn't fit ROI any more. In this case, the X-coordinate of all BB corners is set to a mean value of T1 and T2 X-coordinates (therefore the visualized BB is only a line from the top view). There is of course equivalent problem for Y-coordinate.

Geometry of BB estimation mode #2 - Degenerate Case

Estimation Mode #3

ROI corresponds to perspective projection of the BB front face. BB is parallel with all axis. The BB front face is in depth Zmean-ZstdDev and the back face is in depth Zmean+ZstdDev.

Geometry of BB estimation mode #3

Cache

An already existing class message_filters::Cache was used to implement a cache memory. There is one cache for depth data (or point cloud data if the subscription variant #2 is used) and one for camera information. Both have default size of 10. Messages are at first synchronized and then added to the appropriate cache.

When a request arrives, from both cache memories are retrieved messages whose timestamp is the latest one before the request timestamp. These messages are then used for bounding box estimation.

BB Estimator - Client

This client mainly intends to demonstrate the function of service /bb_estimator/estimate_bb, to allow user to understand different estimation modes and examine the functionality on a test data.

For visualization of the resulting BB on a 2D image is used standard perspective projection (using the intrinsic matrix obtained from a CameraInfo message). Edges of the BB front face are drawn with red color, the other edges are blue.

The client can also call services of srs_interaction_primitives package by which the resulting bounding box can be visualized using interactive markers in rviz (see the examples below).

Visualization of BB in rviz - fridge

ROS API

In this section, you can find details about the provided service and a summary of ROS nodes and parameters.

Services

All services starts with prefix /bb_estimator

Service Name	Input	Output	Description
/estimate_bb	Header header int16[2] p1 int16[2] p2 int8 mode	float32[3] p1 float32[3] p2 float32[3] p3 float32[3] p4 float32[3] p5 float32[3] p6 float32[3] p7 float32[3] p8 geometry_msgs/Pose pose geometry_msgs/Vector3 scale	Estimates the bounding box
/estimate_bb_alt	Header header int16[2] p1 int16[2] p2 int8 mode	geometry_msgs/Pose pose geometry_msgs/Point bounding_box_lwh	Estimates the bounding box
/estimate_rect	Header header geometry_msgs/Pose pose geometry_msgs/Vector3 scale	int16[2] p1 int16[2] p2	Estimates the image rectangle
/estimate_rect_alt	Header header geometry_msgs/Pose pose geometry_msgs/Point bounding_box_lwh	int16[2] p1 int16[2] p2	Estimates the image rectangle
/estimate_2D_hull_mesh	Header header arm_navigation_msgs/Shape mesh	geometry_msgs/Polygon convexHull	Calculates the 2D convex hull of a 3D mesh
/estimate_2D_hull_point_cloud	Header header sensor_msgs/PointCloud2 pointCloud	geometry_msgs/Polygon convexHull	Calculates the 2D convex hull of a 3D point cloud

Service Details

The server provides a service called /bb_estimator/estimate_bb, which is performing the bounding box estimation. Request and response of this service are defined as follows.

Request

Attribute Name	Type	Description
header	Header	Contains (among others) a timestamp, which is necessary for time synchronization.
p1 and p2	int16[2]	2D points representing two diagonally opposite corners of ROI (the first element of array = X-coordinate, the second = Y-coordinate).
mode	int8	BB estimation mode. If it isn't specified, the default value is 1 (= estimation mode #1).

Response

Attribute Name	Type	Description
p1 - p8	float32[3]	Eight 3D points representing the corners of the resulting BB (the first element of array = X-coordinate, the second = Y-coordinate, the third = Z-coordinate).
pose	geometry_msgs/Pose	Includes a position and an orientation of the resulting bounding box.
scale	geometry_msgs/Vector3	Includes a size of the bounding box in direction of X-, Y- and Z-coordinates (before rotation given by orientation - included in pose).

Notes:

The bounding box can be fully represented by the eight points or by the pose and scale. Both ways are offered in response now (mainly for testing reasons). However, the eight point representation can be removed in the future versions, due to its poor efficiency.
There are provided all eight corners in the response, because estimation mode #1 can produce a BB which is non-parallel with all axis. In fact, this BB could be uniquely represented by only four corners, but there are included all of them to avoid troubles with calculation of the remaining four corners.

Nodes

Node Name	Description
bb_estimator_server	BB estimator - server
bb_estimator_client	BB estimator - client

Parameters

The parameters mentioned below are influencing the behavior of the server and client and can be set on the parameter server.

There are two variants of subscription to topics. Subscription variant #2 is meant to be used with Care-o-Bot simulation, which does not produce a depth map and so it must be created from a point cloud. Also the names of other topics are different when using a real robot or its simulation.

Parameters for subscription variant #1:

Parameter Name	Description	Default Value
bb_sv1_camInfo_topic	Name of topic with CameraInfo messages.	/cam3d/depth/camera_info
bb_sv1_depth_topic	Name of topic with Image messages containing depth information.	/cam3d/depth/image_raw
bb_sv1_rgb_topic	Name of topic with Image messages containing RGB information (it is used only by the client).	/cam3d/rgb/image_raw

Parameters for subscription variant #2:

Parameter Name	Description	Default Value
bb_sv2_camInfo_topic	Name of topic with CameraInfo messages.	/cam3d/camera_info
bb_sv2_pointCloud_topic	Name of topic with PointCloud2 messages containing Point Cloud.	/cam3d/depth/points
bb_sv2_rgb_topic	Name of topic with Image messages containing RGB information.	/cam3d/rgb/image_raw

Other parameters:

Parameter Name	Description	Default Value
outliers_percent	Percentage of furthest points from the mean depth/distance considered as outliers when calculating statistics of ROI.	10
sampling_percent	Percentage of rows and columns considered for sampling (to speed up the calculation of statistics (mean, standard deviation, etc.), only data on a sampling grid is taken into account - density of the grid is given by the sampling percentage).	30
global_frame	Frame ID in which the BB coordinates are returned.	/map
sides_ratio	The required maximum ratio of sides length (the longer side is at maximum sidesRatio times longer than the shorter one).	5

Manual

How to Compile

To be able to use both, the BB Estimator server and client, just download the srs_env_model_percp package and compile it.

 rosmake srs_env_model_percp

How to Run

1) Start the main node of ROS system:

roscore

2) If desired, set parameters on the parameter server (Server Parameters, Client Parameters).

3) Start the nodes performing the function of server and client (you can, of course, start up just the server and send your own request to the /bb_estimator/estimate_bb service).

rosrun srs_env_model_percp bb_estimator_server
rosrun srs_env_model_percp bb_estimator_client

4a) Connect to a Care-O-bot or run its simulation.

4b) Alternatively, you can test the BB Estimator using a bag file. In this case, you can skip a few seconds at the beginning, since there are usually some missing data (e.g. messages on cam3d/* topics) - it is often convenient to set the value of the argument START_TIME to something around 6 seconds (6 seconds is also its default value).

. utils/play_bag.sh -p PATH-TO-BAG-FILE [-t START-TIME]

The replay is paused at the beginning, so press key SPACE to start it (or later to pause it again). It goes in the loop, so you can just keep it running.

5) If the client was started, use your mouse to specify a ROI (in the Input Video window) for which you want to calculate a BB. After specifying a ROI, you will get a new window called Bounding Box with a visualization of the resulting BB.

6) By pressing key M you can test the service for calculation of 2D convex hull of a 3D mesh.

Notes:

While testing using a bag file, it is convenient to pause the replay, try to specify different ROI within the same frame and switch between different visualization and BB estimation modes.
To swith between different BB estimation modes, activate one of the windows created by the client and press key E.
It is possible to switch between a visualization of image or depth data in the windows created by client (Input Video or Bounding Box). To do that, activate one of these windows and press key D.

Visualization of image data Visualization of depth data

Source files

The relevant source files are included in but_srs stack in srs_env_model_percp package. In particular, there are these files:

src/bb_estimator_server.cpp - source file of BB Estimator server.
src/bb_estimator_client.cpp - source file of BB Estimator client.
include/bb_estimator/services_list.h - definition of service names (there is only one service right now).
srv/EstimateBB.srv - definition of a request and a response of the service /bb_estimator/estimate_bb.
utils/play_bag.sh - script replaying a bag file and setting simulated time publishing (this simulated time corresponds to time in the given bag file). All nodes are then using this simulated time.

Note: All these source files are intensively commented, so if you don't find any information in this documentation, you can have a look into them.