Control of Neural Prostheses III: Self-Adaptive Neuro-Fuzzy Control Using Reinforcement Learning
Adam Thrasher, Feng Wang, Brian Andrews, Richard Williamson
Department of Biomedical Engineering, University of Alberta, Edmonton, Canada
ABSTRACT
Preliminary results are presented for a new functional electrical stimulation (FES) control methodology based on an adaptive fuzzy network (AFN) trained using supervised and reinforcement learning techniques. The controllers were tested on a computer model of the swing phase of gait assisted by a powered hybrid FES orthosis. An open-loop controller was optimized and used as a baseline against which the new AFN controllers were compared. The supervised learning controller converged to a control strategy similar to that of the optimized open-loop controller, typically in ten trials. Using very simple reinforcement signals, the reinforcement learning AFN controller converged to the optimized open-loop strategy after several hundred trials. The reinforcement learning controller has the additional ability to continually re-adapt to changes in system parameters that caused the other controllers to fail.
BACKGROUND
The powered hybrid system for paraplegic gait shown in Figure 1 is based on the modular hybrid orthosis proposed in [1]. It consists of floor reaction orthoses (FRO) [2] worn on each leg plus a hip brace which restricts leg movement to the sagittal plane. An actuator positioned at each hip joint produces variable external torque. In addition, a pair of surface electrodes is positioned on each thigh to provide functional electrical stimulation (FES) of the knee extensor muscles. This system requires a controller that sets both the torque output of the hip brace actuators and the FES pulse width.
This hybrid system was modeled as the simple compound pendulum shown in Figure 2. A 3-factor knee extensor muscle model [3] was used to simulate FES. The model dynamics were solved with a second/third-order (2,3) adaptive step-size Runge-Kutta integration algorithm.
figure: FIG1.TIF
Figure 1. Hybrid orthoses with floor reaction orthosis, powered hip brace, and FES electrodes positioned for quadriceps stimulation.
figure: FIG2.TIF
Figure 2. Two-segment compound pendulum model of swing leg.
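The integration scheme mentioned above can be illustrated with a minimal sketch. It assumes the Bogacki-Shampine (2,3) Runge-Kutta pair, which is one common second/third-order adaptive method; the paper does not state which pair was used. For brevity the plant here is a single rigid pendulum rather than the two-segment model, and the names pendulum, rk23_step and integrate, together with all numerical values, are hypothetical.

```python
import math

def pendulum(t, y, g=9.81, length=0.5):
    """Single rigid pendulum (a stand-in plant, not the paper's model).
    y = [angle in rad, angular velocity in rad/s]."""
    theta, omega = y
    return [omega, -(g / length) * math.sin(theta)]

def rk23_step(f, t, y, h):
    """One Bogacki-Shampine step: 3rd-order result plus an error estimate."""
    k1 = f(t, y)
    k2 = f(t + h / 2, [yi + h / 2 * a for yi, a in zip(y, k1)])
    k3 = f(t + 3 * h / 4, [yi + 3 * h / 4 * b for yi, b in zip(y, k2)])
    # 3rd-order solution
    y3 = [yi + h * (2 * a + 3 * b + 4 * c) / 9
          for yi, a, b, c in zip(y, k1, k2, k3)]
    k4 = f(t + h, y3)
    # embedded 2nd-order solution, used only to estimate the local error
    y2 = [yi + h * (7 * a / 24 + b / 4 + c / 3 + d / 8)
          for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
    err = max(abs(u - v) for u, v in zip(y3, y2))
    return y3, err

def integrate(f, y0, t_end, tol=1e-8, h=1e-2):
    """Integrate from t = 0 to t_end, adapting the step size to keep err <= tol."""
    t, y = 0.0, list(y0)
    while t < t_end:
        h = min(h, t_end - t)
        y_new, err = rk23_step(f, t, y, h)
        if err <= tol:               # accept the step
            t, y = t + h, y_new
        # grow or shrink the step; the estimate scales with h**3
        h *= min(5.0, max(0.2, 0.9 * (tol / max(err, 1e-16)) ** (1 / 3)))
    return y
```

A call such as integrate(pendulum, [0.2, 0.0], 1.0) returns the angle and angular velocity after one second. The two-segment model would replace pendulum with a four-state derivative function driven by the hip torque and the muscle model.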
RESEARCH QUESTION
What are the advantages of reinforcement learning over supervised learning in adaptive fuzzy networks (AFN)? How can reinforcement learning be used to continually re-adapt and cope with significant changes in the system parameters?
METHODS
First, a simple open-loop controller which provided two outputs, hip torque and knee extensor stimulation, was designed and optimized. The muscle stimulation was a rectangular waveform with adjustable mark-space timing. The hip torque was an exponential function with adjustable amplitude and on/off timing. The open-loop controller was optimized using a least-squares objective function to achieve sufficient foot clearance with minimal energy output.
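The two open-loop signals and the kind of objective described above might be sketched as follows. The paper gives neither the exact exponential form nor the terms of the objective, so the decaying exponential, the time constant tau, the clearance target and the effort weight are all assumptions made for illustration, and the function names are hypothetical.

```python
import math

def stimulation(t, t_on, t_off, pulse_width):
    """Rectangular stimulation: constant pulse width between t_on and t_off."""
    return pulse_width if t_on <= t < t_off else 0.0

def hip_torque(t, t_on, t_off, amplitude, tau):
    """Assumed form: exponential decay from `amplitude`, gated by on/off times."""
    if not (t_on <= t < t_off):
        return 0.0
    return amplitude * math.exp(-(t - t_on) / tau)

def objective(min_clearance, torques, stims, dt,
              target_clearance=0.02, effort_weight=1e-3):
    """Hypothetical least-squares cost: squared clearance shortfall
    plus a weighted integral of squared control effort."""
    shortfall = max(0.0, target_clearance - min_clearance) ** 2
    effort = sum(u * u for u in torques) + sum(s * s for s in stims)
    return shortfall + effort_weight * effort * dt
```

An optimizer would search over the timing and amplitude parameters, simulate one swing for each candidate, and keep the set with the lowest objective value.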
Two independent supervised learning AFN controllers were developed, one for hip torque control and another for quadriceps stimulation control. Each controller used two inputs, the hip and knee angles (assuming that these values were available); together they provided two outputs, hip torque and quadriceps stimulation pulse width. Both AFN controllers used five Gaussian membership functions for each input and defined 25 fuzzy rules using the Sugeno-style inference method with fuzzy singletons [4, 5]. Both were trained using a gradient learning algorithm with the optimized open-loop controller as the training set.
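A minimal sketch of one such network is given below: two inputs, five Gaussian membership functions per input, 25 product rules and singleton consequents, with a gradient step on the squared output error. It assumes that only the singletons are adapted, that the membership centers are evenly spaced, and that each width equals the center spacing; the paper does not specify these details. The class name SugenoAFN and its parameters are hypothetical.

```python
import math

class SugenoAFN:
    def __init__(self, ranges, n_mf=5):
        # ranges: [(lo, hi), (lo, hi)] for the two inputs (e.g. hip and knee angle)
        self.centers = [[lo + i * (hi - lo) / (n_mf - 1) for i in range(n_mf)]
                        for lo, hi in ranges]
        self.sigmas = [(hi - lo) / (n_mf - 1) for lo, hi in ranges]  # assumed width
        self.singletons = [[0.0] * n_mf for _ in range(n_mf)]        # one per rule

    def firing(self, x):
        """Normalized firing strength of every rule (product of memberships)."""
        mu = [[math.exp(-0.5 * ((xi - c) / s) ** 2) for c in cs]
              for xi, cs, s in zip(x, self.centers, self.sigmas)]
        raw = [[a * b for b in mu[1]] for a in mu[0]]
        total = sum(map(sum, raw))
        return [[r / total for r in row] for row in raw]

    def output(self, x, phi=None):
        """Weighted average of the singletons (zero-order Sugeno inference)."""
        if phi is None:
            phi = self.firing(x)
        return sum(p * w for prow, wrow in zip(phi, self.singletons)
                   for p, w in zip(prow, wrow))

    def train_step(self, x, target, lr=0.5):
        """Gradient step on 0.5 * (target - output)**2 w.r.t. the singletons."""
        phi = self.firing(x)
        err = target - self.output(x, phi)
        for i, prow in enumerate(phi):
            for j, p in enumerate(prow):
                self.singletons[i][j] += lr * err * p
        return err
```

In the supervised setting each training pair would consist of the joint angles at one instant of a simulated swing and the corresponding output of the optimized open-loop controller.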
Two separate reinforcement learning AFN controllers were also developed, one for hip torque control and the other for quadriceps stimulation. Like the supervised learning AFN controllers, the reinforcement learning controllers used hip and knee angle as inputs. However, nine Gaussian membership functions were used for each input, resulting in a larger rule base of 81 rules. The larger rule base was intended to ensure accurate reinforcement prediction.
Both reinforcement learning AFN controllers were trained using Williams' REINFORCE algorithm [6, 7] in conjunction with Sutton's Temporal Differences algorithm [8, 9]. The controllers were initialized with no knowledge of the plant. The reinforcement signal was switched on whenever the foot collided with the ground during mid-swing.
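One common way to combine the two algorithms is an actor-critic arrangement, sketched below under several assumptions: the action is drawn from a Gaussian whose mean is a weighted sum of rule firing strengths, a second set of weights predicts the reinforcement, and the temporal-difference error of that prediction scales the REINFORCE update. The paper does not describe its exact combination, so this is an illustration rather than the authors' implementation; the class name ReinforceTD and all step sizes are hypothetical.

```python
import random

class ReinforceTD:
    def __init__(self, n_features, sigma=0.1, alpha=0.05, beta=0.1,
                 gamma=0.95, seed=0):
        self.w = [0.0] * n_features   # actor: mean action per rule
        self.v = [0.0] * n_features   # critic: predicted reinforcement per rule
        self.sigma, self.alpha, self.beta, self.gamma = sigma, alpha, beta, gamma
        self.rng = random.Random(seed)

    def mean(self, phi):
        return sum(wi * p for wi, p in zip(self.w, phi))

    def value(self, phi):
        return sum(vi * p for vi, p in zip(self.v, phi))

    def act(self, phi):
        """Sample a stochastic action around the current mean (exploration)."""
        return self.rng.gauss(self.mean(phi), self.sigma)

    def update(self, phi, action, r, phi_next=None):
        """phi: rule firing strengths when the action was taken;
        r: reinforcement received (e.g. -1.0 on foot collision, assumed sign);
        phi_next: firing strengths at the next step, None at the end of a trial."""
        v_next = 0.0 if phi_next is None else self.value(phi_next)
        delta = r + self.gamma * v_next - self.value(phi)    # TD error
        score = (action - self.mean(phi)) / self.sigma ** 2  # d log-prob / d mean
        for i, p in enumerate(phi):
            self.v[i] += self.beta * delta * p               # TD(0) critic step
            self.w[i] += self.alpha * delta * score * p      # REINFORCE actor step
        return delta
```

With this sign convention, an action that is followed by a collision and was larger than the mean pushes the mean down, and one that was smaller pushes it up. The firing strengths phi would come from the 81-rule fuzzy layer.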
The effect of changing model parameters was examined for each AFN controller. Muscle output forces were reduced and increased to simulate fatigue and potentiation, respectively. Body mass and geometric parameters were altered to simulate a change of patient.
RESULTS
After 15 trials, the supervised learning AFN controllers converged to a control strategy very similar to that of the optimized open-loop controller. Figure 3 shows the result of the learning.
figure: FIG3.TIF
Figure 3. Supervised learning AFN controller outputs. The optimized open-loop controller output is shown with dashed lines.
When the quadriceps muscle output was decreased by 15%, the open-loop controller failed, while the supervised learning AFN was able to cope with the change and overcome the loss by using more hip torque.
The reinforcement learning AFN controllers required approximately 150 trials to converge to a solution. The resulting control strategy was similar to the optimized open-loop and supervised learning AFN controllers.
When fatigue was simulated by reducing quadriceps muscle output by 40%, the reinforcement learning controllers initially failed. However, after 20 trials, they converged to a new solution which provided higher hip torque during initial swing and produced successful gait. When given significant variations in body mass and geometric parameters, the reinforcement learning controllers usually converged to new solutions in fewer than 30 trials.
DISCUSSION
The supervised learning AFN controllers are similar to the open-loop controller, but have the advantage of closed-loop control. This advantage was evident when fatigue was simulated: the supervised learning AFN compensated for the reduced quadriceps output, whereas the open-loop controller failed.
The feasibility of reinforcement learning with an AFN was demonstrated. Although a specific biomechanical model representing one particular implementation of a hybrid FES orthosis was used, the control method is general purpose. The method requires no knowledge of the neuromuscular plant and can automatically adapt to biomechanical differences between patients and to time-varying changes such as muscle fatigue. Because the reinforcement learning controllers need only partial retraining to re-adapt to a new patient, they are practical for clinical application. In contrast to non-adaptive fuzzy logic networks [10], training does not require a human expert to handcraft the fuzzy rules.
Further work is now underway to demonstrate the feasibility of the reinforcement learning AFN controllers in clinical application.
REFERENCES
[1] Andrews, B.J. (1992) Hybrid FES orthosis for paraplegic locomotion. Studies in Health Technology and Informatics, IOS Press, vol. 5:50-56
[2] Saltiel, J. (1969) A one piece laminated knee locking short leg brace. Orthotics & Prosthetics. June: 68-72.
[3] Veltink, P.H., Chizeck, H.J., Crago, P.E., and el Bialy, A. (1992) Nonlinear joint angle control for artificially stimulated muscle. IEEE Trans. on Biomedical Engineering, 39(4):368-380.
[4] Sugeno, M. (1985) Industrial applications of fuzzy control. Elsevier Science Pub. Co.
[5] Wang, F. and Andrews, B.J. (1994) Adaptive fuzzy logic controller for FES - computer simulation study. Proc. of Ann. Int. Conf. of IEEE/EMBS, 16:406-407.
[6] Williams, R.J. (1987) A class of gradient-estimating algorithms for reinforcement learning in neural networks. IEEE Int. Conf. on Neural Networks, II:601-608.
[7] Williams, R.J. (1992) Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8:229-256.
[8] Sutton, R.S. (1984) Temporal credit assignment in reinforcement learning. PhD thesis, University of Massachusetts.
[9] Sutton, R.S. (1988) Learning to predict by methods of temporal differences. Machine Learning, 3:9-44.
[10] Ng, S.K. and Chizeck, H.J. (1993) A fuzzy logic gait event detector for FES using Firmware Transitional Logic. Proc. IEEE (EMBS) 10th Ann. Internat. Conf., 1562-1563.
ACKNOWLEDGEMENTS
The authors Adam Thrasher and Richard Williamson are recipients of awards from the Natural Sciences and Engineering Research Council (NSERC). This research was funded in part by the Medical Research Council (MRC), and the Alberta Heritage Foundation for Medical Research (AHFMR).
ADDRESS
Adam Thrasher Dept. Biomedical Engineering, University of Alberta, Edmonton, Canada T6G 2G3
email: adam.thrasher@ualberta.ca
