Torque Limited Simple Pendulum
Introduction
This project is an open-source, low-cost kit for getting started with underactuated robotics. The kit aims to lower the entry barrier for studying underactuation in real systems, a topic that is often overlooked in conventional robotics courses. It implements a torque-limited simple pendulum built around a quasi-direct drive motor, which allows for a low-friction, torque-limited setup. The project describes the offline and online control methods that can be studied with the kit, lists its components, discusses best practices for implementation, and presents results from experiments with the simulator and the real system. This repository describes the hardware (CAD, Bill of Materials (BOM), etc.) required to build the physical system and provides the software (URDF models, simulation, and controllers) to control it.
See a video of the simple pendulum in action:
Documentation
The hardware setup and the motor configuration are described in their respective readme files.
The dynamics of the pendulum are explained here.
To work with this repository, get started here and read the usage instructions here for a description of how to use it on a real system. The instructions for testing the code can be found here.
 Hardware & Testbench Description
 Motor Configuration
 Software Installation Guide
 Usage Instructions
 Code Testing
Overview of Methods
Trajectory Optimization tries to find a trajectory of control inputs and states that is feasible for the system while minimizing a cost function. The cost function can for example include terms which drive the system to a desired goal state and penalize the usage of high torques. The following trajectory optimization algorithms are implemented:
 Direct Collocation: A collocation method that transforms the optimal control problem into a mathematical programming problem, which is then solved by sequential quadratic programming. For more information, click here.
 Iterative Linear Quadratic Regulator (iLQR): An optimization algorithm that iteratively linearizes the system dynamics and applies LQR to find an optimal trajectory. For more information, click here.
 Feasibility-driven Differential Dynamic Programming (FDDP): Trajectory optimization using locally quadratic dynamics and cost models. For more information about DDP, click here, and for FDDP, click here.
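To make the notion of a cost function concrete, here is a minimal sketch of how a candidate torque sequence could be scored. It is not the repository's implementation: the function names, the explicit Euler integration, the parameter values (mass, length, damping, torque limit), and the quadratic weights are all illustrative assumptions, as is the convention that an angle of 0 means hanging down and pi means upright.

```python
import math


def pendulum_step(theta, omega, tau, dt=0.01, m=0.5, l=0.5, b=0.1, g=9.81, tau_max=2.0):
    """Advance the pendulum by one explicit Euler step.

    Assumed model: m*l^2*alpha + b*omega + m*g*l*sin(theta) = tau,
    with theta = 0 hanging down. All parameter values are hypothetical.
    """
    tau = max(-tau_max, min(tau_max, tau))  # enforce the torque limit
    alpha = (tau - b * omega - m * g * l * math.sin(theta)) / (m * l * l)
    return theta + dt * omega, omega + dt * alpha


def trajectory_cost(x0, controls, goal=(math.pi, 0.0), dt=0.01,
                    q_pos=10.0, q_vel=1.0, r=0.1):
    """Roll out a torque sequence and sum a quadratic cost.

    The running cost penalizes distance to the goal state and torque usage;
    a terminal cost penalizes the final distance to the goal.
    """
    theta, omega = x0
    cost = 0.0
    for tau in controls:
        state_cost = q_pos * (theta - goal[0]) ** 2 + q_vel * (omega - goal[1]) ** 2
        cost += dt * (state_cost + r * tau ** 2)
        theta, omega = pendulum_step(theta, omega, tau, dt)
    # Terminal cost on the final state
    cost += q_pos * (theta - goal[0]) ** 2 + q_vel * (omega - goal[1]) ** 2
    return cost


if __name__ == "__main__":
    # Doing nothing while hanging down is expensive because the goal is never reached.
    print(trajectory_cost((0.0, 0.0), [0.0] * 10))
```

A trajectory optimizer searches over `controls` (and, in direct collocation, over the states as well) to make this kind of cost as small as possible while respecting the dynamics and the torque limit.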
Trajectory Following controllers act on a precomputed trajectory and ensure that the system follows it. Because the PID and tvLQR controllers react to the actual state of the pendulum, they can also be understood as closed-loop controllers. The trajectory following controllers implemented in this project are:
 Feedforward torque Controller: Simple forwarding of a control signal from a precomputed trajectory.
 Proportional-Integral-Derivative (PID): A controller reacting to the position error, the integrated error, and the error derivative with respect to a precomputed trajectory.
 Time-varying Linear Quadratic Regulator (tvLQR): A controller which linearizes the system dynamics at every timestep around the precomputed trajectory and uses LQR to drive the system towards this nominal trajectory.
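As an illustration of trajectory following, the sketch below shows a PID controller that tracks a precomputed position trajectory. The class name, the zero-order-hold lookup of the desired position, and the gains are hypothetical and chosen for readability; the repository's controller may be structured differently.

```python
import bisect


class PIDTrajectoryController:
    """Track a precomputed position trajectory with a PID law (illustrative sketch)."""

    def __init__(self, times, positions, kp=10.0, ki=0.5, kd=1.0, tau_max=2.0):
        self.times = list(times)          # sample times of the trajectory, ascending
        self.positions = list(positions)  # desired angle at each sample time
        self.kp, self.ki, self.kd = kp, ki, kd
        self.tau_max = tau_max
        self.integral = 0.0
        self.prev_error = None
        self.prev_time = None

    def desired_position(self, t):
        # Zero-order hold: use the latest trajectory sample at or before t.
        i = bisect.bisect_right(self.times, t) - 1
        i = max(0, min(i, len(self.positions) - 1))
        return self.positions[i]

    def get_control(self, t, theta):
        error = self.desired_position(t) - theta
        derivative = 0.0
        if self.prev_time is not None:
            dt = t - self.prev_time
            if dt > 0.0:
                self.integral += error * dt
                derivative = (error - self.prev_error) / dt
        self.prev_error, self.prev_time = error, t
        tau = self.kp * error + self.ki * self.integral + self.kd * derivative
        # Respect the torque limit of the motor
        return max(-self.tau_max, min(self.tau_max, tau))
```

A feedforward torque controller would instead look up the precomputed torque at time `t` and return it unchanged, which is why it cannot correct for deviations from the trajectory.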
Closed-loop or feedback controllers take the state of the system as input and output a control signal. Because they are able to react to the current state, they can cope with perturbations during execution. The following feedback controllers are implemented:
 Gravity Compensation: A controller compensating the gravitational force acting on the pendulum. The pendulum can be moved as if it were in zero-g.
 Energy Shaping: A controller regulating the energy of the pendulum. Drives the pendulum towards a desired energy level.
 Linear Quadratic Regulator (LQR): Linearizes the dynamics around a fixed point and drives the pendulum towards that fixed point with a quadratic cost function. Only usable in a state space region around the fixed point.
 Model predictive control with iLQR: A controller which performs an iLQR optimization at every timestep and executes the first control signal of the computed optimal trajectory.
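The first two feedback controllers in the list are simple enough to sketch in a few lines. The code below is an illustration under stated assumptions, not the repository's code: it assumes a point-mass pendulum with the angle measured from the hanging-down position, and the function names, parameter values, and gain `k` are hypothetical.

```python
import math


def clamp(tau, tau_max):
    """Limit a torque command to the interval [-tau_max, tau_max]."""
    return max(-tau_max, min(tau_max, tau))


def gravity_compensation(theta, m=0.5, l=0.5, g=9.81):
    """Torque that cancels the gravitational torque m*g*l*sin(theta)."""
    return m * g * l * math.sin(theta)


def total_energy(theta, omega, m=0.5, l=0.5, g=9.81):
    """Kinetic plus potential energy; potential energy is zero at horizontal."""
    return 0.5 * m * l * l * omega ** 2 - m * g * l * math.cos(theta)


def energy_shaping(theta, omega, m=0.5, l=0.5, b=0.1, g=9.81, k=1.0, tau_max=2.0):
    """Drive the total energy towards that of the upright position.

    The term -k*omega*(E - E_des) adds energy when E is too low and removes it
    when E is too high; b*omega compensates the assumed viscous damping.
    """
    e_des = m * g * l  # energy of the pendulum at rest in the upright position
    e_err = total_energy(theta, omega, m, l, g) - e_des
    return clamp(-k * omega * e_err + b * omega, tau_max)
```

Because the energy shaping torque is proportional to the velocity, it pushes in the direction of motion while the energy is too low. This lets a torque-limited motor swing the pendulum up over several oscillations, after which a local controller such as LQR can stabilize the upright position.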
Reinforcement Learning (RL) can be used to learn a policy on the state space of the robot. The policy, which has to be trained beforehand, receives a state and outputs a control signal like a feedback controller. The simple pendulum can be formulated as an RL problem with two continuous inputs and one continuous output. Similar to the cost function in trajectory optimization, the policy is trained with a reward function. The controllers acting on the policies are closed-loop controllers. The following RL algorithms are implemented:
 Soft Actor Critic (SAC): An off-policy, model-free reinforcement learning algorithm. Maximizes a trade-off between the expected return of a reward function and entropy, a measure of randomness in the policy. reference
 Deep Deterministic Policy Gradient (DDPG): An off-policy reinforcement learning algorithm which concurrently learns a Q-function and uses this Q-function to train a policy in the state space. reference
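The RL formulation described above can be sketched without any RL library. In the example below, the two continuous inputs are assumed to be the angle and the angular velocity, and the single continuous output is a normalized action in [-1, 1] that is scaled to a torque. The reward shape, its weights, the function names, and the physical parameters are illustrative assumptions and not the rewards used in the repository.

```python
import math


def wrap_angle(theta):
    """Map an angle to the interval [-pi, pi)."""
    return (theta + math.pi) % (2.0 * math.pi) - math.pi


def reward(theta, omega, tau, goal=math.pi):
    """Negative quadratic penalty on angle error, velocity, and torque."""
    angle_error = wrap_angle(theta - goal)
    return -(angle_error ** 2 + 0.1 * omega ** 2 + 0.001 * tau ** 2)


def env_step(state, action, dt=0.01, m=0.5, l=0.5, b=0.1, g=9.81, tau_max=2.0):
    """Apply one normalized action and return (next_state, reward).

    state  : (theta, omega), the two continuous policy inputs
    action : scalar in [-1, 1], the single continuous policy output
    """
    theta, omega = state
    tau = tau_max * max(-1.0, min(1.0, action))  # scale to the torque limit
    alpha = (tau - b * omega - m * g * l * math.sin(theta)) / (m * l * l)
    next_state = (theta + dt * omega, omega + dt * alpha)
    return next_state, reward(next_state[0], next_state[1], tau)


if __name__ == "__main__":
    state = (0.0, 0.0)
    state, r = env_step(state, 1.0)
    print(state, r)
```

An algorithm such as SAC or DDPG would call a step function like this repeatedly during training and adjust the policy to maximize the accumulated reward.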
The implementations of direct collocation and tvLQR make use of Drake, iLQR only makes use of the symbolic library of Drake, FDDP makes use of Crocoddyl, SAC uses the Stable-Baselines3 implementation, and DDPG is implemented in TensorFlow. The other methods use only standard libraries.
The controllers can be benchmarked in simulation with a set of predefined criteria.
Authors
 Shivesh Kumar (Project Supervisor)
 Felix Wiebe (Software Maintainer)
 Jonathan Babel (Hardware Maintainer)
 Daniel Harnack
 Heiner Peters
 Shubham Vyas
 Melya Boukheddimi
 Mihaela Popescu
Feel free to contact us if you have questions about the test bench. Enjoy!
Contributing
 Fork it (https://github.com/yourname/yourproject/fork)
 Create your feature branch (git checkout -b feature/fooBar)
 Commit your changes (git commit -am 'Add some fooBar')
 Push to the branch (git push origin feature/fooBar)
 Create a new Pull Request
See Contributing for more details.
Safety Notes
When working with a real system be careful and mind the following safety measures:

Brushless motors can be very powerful, moving with tremendous force and speed. Always limit the range of motion, power, force and speed using configurable parameters, current limited supplies, and mechanical design.

Stay away from the plane in which the pendulum is swinging. It is recommended to have a safety net surrounding the pendulum in case it flies away.

Make sure you have access to an emergency stop while doing experiments. Be extra careful while operating in a pure torque control loop.
Acknowledgements
This work has been performed in the VeryHuman project funded by the German Aerospace Center (DLR) with federal funds (Grant Number: FKZ 01IW20004) from the Federal Ministry of Education and Research (BMBF) and is additionally supported with project funds from the federal state of Bremen for setting up the Underactuated Robotics Lab (Grant Number: 201001103/202132).
License
This work has been released under the BSD 3-Clause License. Details and terms of use are specified in the LICENSE file within this repository. Note that we do not publish third-party software, hence software packages from other developers are released under their own terms and conditions, e.g. Stable Baselines (MIT License) and TensorFlow (Apache License v2.0). If you install third-party software packages along with this repo, ensure that you follow each individual license agreement.