An Introduction to Testing Robot Code
Ted Kern
on 14 September 2020
The myriad of fields that make up robotics makes QA practices difficult to settle on. Field testing is the go-to, since a functioning robot is often proof enough that a system is working. But online tests are slow. The physical environment must be set up. The entire system has to be in a workable state. Multiple runs are needed to build confidence. This grinds development to a halt. To those who took courses in computer science, this is akin to using the auto-grader “as the compiler.” It’s slow, it makes it hard to narrow down what isn’t working, and it wastes time watching runs fail for simple errors.
Many roboticists don’t come from a background in software engineering, and they haven’t seen firsthand the benefits of having a solid test setup.
Having a good test system:
- makes development significantly easier and faster.
- allows one to make changes in one part of the codebase without worrying about understanding the rest.
- lets one verify integration with the whole system without having to worry about how errors will look at higher or lower levels of abstraction.
- allows newcomers to be confident in their ability to contribute to a project when they can get feedback from the test suite.
This post will hopefully help get those unfamiliar with the basics of writing tests up and running with an offline test suite for their robot.
Terminology
We’ll start with some definitions and descriptions.
Assertion
An “assertion” is a programming expression containing a statement that should be true when run, and that raises a descriptive error when it is false. Assertions are often a language feature, and test or debug environments typically produce more descriptive output when they fail.
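For instance, in Python an assert statement does the job. This is a minimal sketch; the normalize function is a made-up example, not from any library:

def normalize(vector):
    length = sum(x * x for x in vector) ** 0.5
    # the assertion states what must be true here; it raises AssertionError
    # with this message when it is not
    assert length > 0, "cannot normalize a zero-length vector"
    return [x / length for x in vector]

def test_normalize():
    # under pytest, a failing assert reports both sides of the comparison
    assert normalize([3.0, 4.0]) == [0.6, 0.8]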
Test Case
A “test case” is a basic unit of testing. A block of code, often in a function/method, containing one or more assertions.
Test Runner
A “test runner” is a software tool that manages invoking the code that contains test cases. Runners often look in specific locations, or for specially annotated code, to find tests to run. They can also provide functionality for mocking code, setting up fixtures (defined below) or fabricating program state, and tearing down and restoring state between tests.
Dependency Injection
When code is designed such that external dependencies are passed as arguments to the functions that will be accessing them, tests can “inject” special test objects as those dependencies instead of the real deal. This practice is called “dependency injection”, and it can lead to dramatically simpler tests with a narrower scope. Structuring the codebase this way also makes it more modular and flexible, and lets developers respond to change more easily.
For example, imagine a function in a robot’s path planner that accepts a series of waypoints and attempts to generate a path between them. This function relies on an external path library, which has a PathPlanner class with children AStar, WeightedAStar, and Dijkstra. Rather than have the function body refer to these concrete classes directly, dependency injection means adding an argument of type PathPlanner to the function, where PathPlanner defines the interface the different implementations implement.
This first allows an easy way to change how the path is generated without changing the function itself, and second lets us easily pass in test objects (defined below) to handle generating the path when testing.
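A minimal sketch of that shape in Python might look like the following. The abstract base class and the stubbed-out bodies here are illustrative assumptions, not code from a real path library:

from abc import ABC, abstractmethod

class PathPlanner(ABC):
    """The interface every planning algorithm implements."""
    @abstractmethod
    def getPath(self, start, goal):
        ...

class AStar(PathPlanner):
    def getPath(self, start, goal):
        ...  # run A* between the two waypoints

class Dijkstra(PathPlanner):
    def getPath(self, start, goal):
        ...  # run Dijkstra between the two waypoints

def combinePathSegments(waypoints, pathAlgorithm: PathPlanner):
    # the caller chooses the implementation: AStar(), Dijkstra(),
    # or a test double that matches the same interface
    ...

The full body of combinePathSegments appears later in this post; the point here is only where the dependency enters.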
Monkey Patching
“Monkey patching” is the process of dynamically replacing parts of the code. For example, changing the definition of a function at runtime, or overwriting a constant in an imported library. Monkey patching is natively supported in dynamic languages like Python, where assignment is rarely restricted. Various libraries extend support and add functionality to limit the scope of the patch and restore behavior after use.
Languages like C++ are more difficult to monkey patch, and are more reliant on good design principles like utilizing dependency injection to provide a foundation.
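As a quick sketch, pytest’s monkeypatch fixture (covered again below) replaces an attribute for the duration of a single test. The sensor_driver module and read_temperature function here are hypothetical:

import sensor_driver  # hypothetical module that normally reads real hardware

def test_reads_patched_temperature(monkeypatch):
    # replace the real hardware read with a canned value for this test only;
    # pytest restores the original function when the test finishes
    monkeypatch.setattr(sensor_driver, "read_temperature", lambda: 21.5)
    assert sensor_driver.read_temperature() == 21.5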
Fakes, Stubs, and Mocks
These are each different types of objects that are coded to match a specific interface, but are used exclusively for testing. Often used interchangeably, they refer to specific ways of substituting test logic in for the actual logic.
When a program has a complex data flow, with many objects that carry information through multiple functions, these tools let you quickly isolate a step in the call chain and test it without needing to record and manually rebuild objects from earlier in the data flow. With good use of dependency injection, you can use these objects to cordon off your test code, preventing the test from failing because of calls to external functions despite their preconditions being met.
Stub
A “stub” provides fixed data instead of relying on normal logic, and is used to isolate specific parts of code under test. Stubs are often used in place of systems that rely on network, disk, or hardware reads.
For example, to test a function that takes an action based on a sensor reading, stubbing the sensor object allows you to slot in arbitrary sensor readings without relying on actual hardware reads.
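A sketch of that idea, with a made-up sensor interface and a made-up overheated function standing in for the code under test:

class SensorStub:
    """Stands in for a hardware temperature sensor; returns fixed readings."""
    def __init__(self, readings):
        self._readings = iter(readings)

    def read(self):
        return next(self._readings)

def overheated(sensor, limit):
    # hypothetical code under test: acts on whatever the sensor reports
    return sensor.read() > limit

def test_overheat_detected_without_hardware():
    sensor = SensorStub([120.0])
    assert overheated(sensor, limit=100.0)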
Fake
A “fake” is an object that circumvents normal logic to provide an easier or more performant result when under test conditions.
For example, a function that relies on a path planner might not need an optimal path, or to spin up an anytime or dynamic planning system, while under test. When the test data is a simple, low dimensional occupancy grid, a fake object could instead just run A*.
Similarly, a fake object can replace a network stream object so that when a test attempts to make a network connection, it instead automatically succeeds and subsequent calls to read from the stream feed out a scripted sequence instead.
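For instance, a fake stream might look like the sketch below. The stream interface and the receive_positions function are assumptions for illustration:

class FakeStream:
    """Pretends to be a network stream: connecting always succeeds and
    reads come from a scripted sequence instead of a socket."""
    def __init__(self, scripted_messages):
        self._messages = list(scripted_messages)

    def connect(self):
        return True  # no real connection is made

    def read(self):
        return self._messages.pop(0) if self._messages else ""

def receive_positions(stream):
    # hypothetical code under test: parses "x y" messages until the stream is empty
    stream.connect()
    return [tuple(float(v) for v in msg.split()) for msg in iter(stream.read, "")]

def test_parses_scripted_telemetry():
    stream = FakeStream(["1.0 2.0", "1.5 2.5"])
    assert receive_positions(stream) == [(1.0, 2.0), (1.5, 2.5)]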
Mock
Fakes and stubs are generally used to allow for testing the behavior of a given piece of code, where they’re passed in and never touched by the test again. A “mock”, on the other hand, is dynamically modified by the code performing the test, and it provides tools for introspection on how it was called. Mocks allow for verifying an interface between the code under test and the object being mocked. For example, how many times was a particular method of this mock object called, and with what arguments?
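With Python’s unittest.mock, that introspection looks roughly like this. EmergencyStop and trigger_estop are hypothetical names used for illustration:

from unittest import mock

class EmergencyStop:
    """Hypothetical hardware interface."""
    def engage(self, reason):
        ...

def trigger_estop(estop, reason):
    # hypothetical code under test: should command the e-stop exactly once
    estop.engage(reason=reason)

def test_estop_commanded_once():
    estop = mock.Mock(spec=EmergencyStop)
    trigger_estop(estop, reason="obstacle")

    # the mock records every call made to it, so the test can verify the interface
    estop.engage.assert_called_once_with(reason="obstacle")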
Aside: Overmocking
Test objects like mocks, fakes, and stubs are great tools for setting boundaries for your tests and creating test states. However, test design is a skill, and knowing when and how to apply these tools is a part of that skill. Test objects don’t necessarily have to behave like what they’re replacing, which means you can easily create an “ideal” case that never matches up to how your code actually behaves, or have behavior drift as your implementation changes over time.
It’s important to keep in mind that mocks work best when you’ve correctly targeted the behavior, and not the implementation, of your code. They can hide errors or real world cases that only well designed test data, or real world data, can catch. Be careful of overmocking!
Fixture
A “fixture” is a convenience tool provided by test libraries. Fixtures are objects cleanly rebuilt before each test case and disposed of immediately afterward. Good testing design should not have state shared between tests.
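In pytest, a fixture is a decorated function whose result is rebuilt for every test case that names it as an argument. The OccupancyGrid class here is a made-up stand-in:

import pytest

class OccupancyGrid:
    """Minimal hypothetical grid used to illustrate fixtures."""
    def __init__(self, width, height):
        self.width, self.height = width, height
        self._occupied = set()

    def mark(self, x, y):
        self._occupied.add((x, y))

    def occupied_cells(self):
        return sorted(self._occupied)

@pytest.fixture
def empty_grid():
    # constructed from scratch before each test below, so no state is shared
    return OccupancyGrid(width=10, height=10)

def test_grid_starts_empty(empty_grid):
    assert empty_grid.occupied_cells() == []

def test_marking_a_cell(empty_grid):
    empty_grid.mark(2, 3)
    assert empty_grid.occupied_cells() == [(2, 3)]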
Unit Test
A “unit test” is a test case covering the smallest unit of logic possible. Unit tests should generally be limited to one function, and not reliant on the behavior of other functions. Test objects or manually crafted variables should be used to simulate input and any state variables.
A unit test should prove that a function, when all preconditions are met, performs what it is expected to do. In design by contract parlance, all “requires” and “ensures” are met.
Integration Test
An “integration test” is a test case that includes dependent logic. Where a unit test makes sure that each function does what it’s supposed to, an integration test makes sure that the logic that could involve many such function calls is sound. In addition, integration tests make sure that the data flow is correctly set up. For instance, after a refactor, integration tests will make sure that the pieces of code are still interacting with each other properly.
Unit Test Example
Integration tests are simple enough: create data, pass it into a function, and check that it performs correctly. Every call that function makes should also perform correctly, and those parts are implicitly under test too.
Unit tests are where things differ. We need to understand how to break apart the test to validate only a single unit, so that we know WHY a specific part breaks. Let’s create an example scenario to see how to unit test.
Consider the path planner mentioned in the Dependency Injection section. Let’s make it in pseudocode:
def combinePathSegments(waypoints: Sequence[Waypoint],
                        pathAlgorithm: PathPlanner) -> Path:
    path = Path.empty()
    # reachability check, however it is defined for the robot
    if any(not waypoint.reachable for waypoint in waypoints):
        raise InvalidWaypointError()
    try:
        # plan a segment between each consecutive pair of waypoints
        for i in range(len(waypoints) - 1):
            path.addSegment(pathAlgorithm.getPath(waypoints[i], waypoints[i + 1]))
    except NoPathFoundError as e:
        # record which segment failed before re-raising
        e.segment = i
        raise
    return path
Some unit tests immediately spring to mind:
- test_handle_empty_list: test that an empty waypoint list returns an empty path (sketched just after this list).
- test_handle_singleton: test that a single waypoint returns an empty path.
- test_handle_valid: test that a valid path causes no problems.
- test_fail_on_invalid_waypoint: a simple case; we want to verify that when an invalid waypoint is passed in to combinePathSegments, it raises InvalidWaypointError. Pass a waypoint that fails the reachability test, however it is defined, and verify the error is raised.
- test_fail_on_no_path_found: here we want to verify that when a path planner fails, which is signalled with NoPathFoundError, the error is forwarded out with the additional information of which segment failed.
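The first case is short enough to sketch here, treating the pseudocode above as real Python:

import unittest.mock

def test_handle_empty_list():
    # with no waypoints there are no segments, and the planner is never consulted
    pathPlannerStub = unittest.mock.Mock(spec=PathPlanner)

    result = combinePathSegments([], pathPlannerStub)

    assert result == Path.empty()
    pathPlannerStub.getPath.assert_not_called()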
Using Test Objects
Let’s look at two of these test cases and see how test objects like mocks, stubs, and fakes can be used.
test_handle_valid
Since the function relies on external behavior in PathPlanner, we don’t want to test whether a specific PathPlanner implementation is valid. That’s an integration test, and it doesn’t tell us whether the logic in our code specifically is OK.
Rather than create data for which we should get a valid path, we create a PathPlanner stub that returns a valid path. Our test is that when the PathPlanner returns a path, we handle it correctly, after all.
Using Python’s unittest.mock library, generate a stub matching the PathPlanner interface and set it to return a valid path.
In Python’s syntax:
import unittest.mock

def test_handle_valid():
    # creates a stub matching the PathPlanner interface
    pathPlannerStub = unittest.mock.Mock(spec=PathPlanner)
    # pathPlannerStub.getPath is also a stub; `side_effect` sets its return
    # value on each successive call
    pathPlannerStub.getPath.side_effect = [Path([(0, 0), (0, 1)]),
                                           Path([(0, 1), (1, 1)])]

    result = combinePathSegments([(0, 0), (0, 1), (1, 1)], pathPlannerStub)

    expected = Path.empty()
    expected.addSegment(Path([(0, 0), (0, 1)]))
    expected.addSegment(Path([(0, 1), (1, 1)]))
    assert result == expected
Note that we used the mock library here to implement a stub. We could have written a class for this as well, but Python mocks are versatile and cover almost every use case for a test object, since they duck type and match the interface as expected. Either option is valid.
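For comparison, a handwritten stub for the same test could be as small as the sketch below, built against the same PathPlanner interface:

class PathPlannerStub(PathPlanner):
    """Returns pre-scripted path segments instead of doing any planning."""
    def __init__(self, segments):
        self._segments = list(segments)

    def getPath(self, start, goal):
        return self._segments.pop(0)

The Mock-based version is shorter to write, but an explicit class like this can be easier to read once the stubbed behavior grows.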
test_fail_on_no_path_found
This is another great case for a stub. Generate a stub from the PathPlanner interface, and set its behavior to fail at a specific, known point.
In Python’s syntax:
import pytest
import unittest.mock

def test_fail_on_no_path_found():
    # creates a mock of the PathPlanner interface
    pathPlannerMock = unittest.mock.Mock(spec=PathPlanner)
    # pathPlannerMock.getPath is also a mock; `side_effect` sets its behavior
    # on each successive call, raising the exception when it is reached
    pathPlannerMock.getPath.side_effect = [Path.empty(),
                                           Path.empty(),
                                           NoPathFoundError()]
    # the path planner will fail on the 3rd call, aka index 2
    test_waypoints = [Waypoint()] * 4

    # assert that the call in this with block raises the given exception type
    with pytest.raises(NoPathFoundError) as e:
        combinePathSegments(test_waypoints, pathPlannerMock)

    # `e` holds the captured exception info; verify that segment 2 was reported
    assert e.value.segment == 2, 'Reported incorrect segment as cause of exception'
Hopefully this gives you a good start on seeing where unit tests differ from the more natural-feeling integration tests. We want to isolate the logic of a specific unit and verify it, without relying on dependencies or carefully constructed data wherever possible.
Test Runners for ROS
Pytest
Pytest is the standard test runner for Python, expanding on and improving the standard unittest library.
- Test cases are written as Python functions or methods, grouped in files that usually follow the naming convention test_xxx.py. These suites are handled by pytest itself, which calls the individual cases, silently adds fixtures and information from the conftest.py configuration file, records test outcomes, and captures the standard output/error streams.
- Monkey patching is handled using the monkeypatch fixture, which provides methods for patching code. Once an individual test case finishes, the original behavior is restored so there is no side effect on other cases.
- Mock support is provided by the unittest.mock library. Mocks created using this library can automatically match the interface of existing objects, record attribute access and call arguments, and can have members that are mocked further (like we used in our examples above).
- In addition, mocks can be monkey patched into existing code using the various forms of the patch feature. In fact, most monkey patching will probably be done solely through the mock library. Patches can be used as decorators, as a context manager in a with block, or as a function that can also accept a mock.
- Fixtures are provided via pytest in the fixtures module. A fixture can be defined using the @pytest.fixture() decorator within a test suite, or in a separate conftest.py file that is common to all test files in its directory. Additionally, pytest provides a number of fixtures by default for things like capturing stdout/stderr and handling temporary file and directory creation; the monkeypatch fixture above is an example. Fixtures are included in a test case simply by adding the name of the fixture as an argument to the test case, and if the fixture provides an object for additional functionality, it can be accessed through that argument (a combined example follows this list).
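Putting a few of those pieces together, a small test file might look like the sketch below. The battery_monitor module, BatteryState class, and check function are hypothetical names:

# test_battery.py
from unittest import mock

import pytest

import battery_monitor  # hypothetical module under test

@pytest.fixture
def low_battery():
    # rebuilt before every test case that requests it
    return battery_monitor.BatteryState(voltage=10.2, charging=False)

def test_warns_when_low(low_battery, monkeypatch):
    # patch the module's logger for this test only; pytest undoes the patch afterwards
    fake_logger = mock.Mock()
    monkeypatch.setattr(battery_monitor, "logger", fake_logger)

    battery_monitor.check(low_battery)

    fake_logger.warning.assert_called_once()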
Test Layout
A typical ROS package can be laid out as follows:
example_package
    example_package
        __init__.py
        [code files]
    resource
        example_package
    test
        [test files go here]
    CHANGELOG.rst
    LICENSE
    package.xml
    README.md
    setup.py
setup.py has two notable changes to make it test-aware:
setup(
    ...
    packages=find_packages(exclude=['test']),
    tests_require=['pytest'],
)
Googletest/GTest
C++ tests for ROS are generally written using the Googletest framework (GTest for short).
- Test cases for GTest are written using macros imported from gtest/gtest.h. The basic form is the TEST() macro, written as follows:
TEST(TestSuiteName, TestCaseName) {
// asserts and test body
}
Macros like EXPECT_EQ and ASSERT_STREQ provide assertions, and error messages can be streamed to them with << like a normal output stream. EXPECT_* macros record failures but don’t exit the test case, whereas ASSERT_* macros exit on failure.
- Monkey Patching is not easily done in C++, and is generally bad form. Using dependency injection, tests can replace behavior by injecting test specific code.
- Mocks are handled using the gMock framework. To effectively use them, your code must again utilize dependency injection. You can then create a mock as a derived class of the interface you inject. gMock provides the MOCK_METHOD() macro that is used to define the mocked methods of the interface. In tests, expectations on the mocks can be set using the EXPECT_CALL() macro, which has functionality for checking arguments, number of calls, and behavior.
- Fixtures are provided by deriving a class from ::testing::Test. They can be included in tests by using the TEST_F macro instead of TEST and including the fixture name as the first argument.
Test Layout
A typical C++ package would be laid out as follows:
example_package
    include
        [header files]
    src
        [source files]
    test
        [test files]
    CMakeLists.txt
    package.xml
    README.md
CMakeLists.txt will require a separate block to handle registering the tests, explained in the ament_cmake user docs. Each test suite (aka each .cpp file in the test dir) will need to be added with the ament_add_gtest macro, along with instructions to link libraries or declare dependencies on external targets. An example can be found in the tf2 library.
Invoking Tests
Once tests are set up for a package, colcon test can handle running all tests for you.
To use colcon test, invoke it from the workspace after running colcon build.
ubuntu@ros2-dev ~/workspace $ colcon test
All tests in the workspace will be run, and results will be printed and logged.
To run only specific packages, use the --packages-select argument.
If the workspace was built with --merge-install, include that flag in the colcon test invocation as well.
For additional output, use the --event-handlers argument; --event-handlers=console_cohesion+ is a good default.
See the documentation for more specifics on using colcon test.
Hopefully this gives you a good starting place on understanding how to test your code. Solid test frameworks make iteration on code much more enjoyable for a developer.