Skip to content

Commit 75d030f

Browse files
author
Your Name
committed
LQR
0 parents  commit 75d030f

17 files changed

+952
-0
lines changed

G.txt

+4
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,4 @@
1+
1 0.0049978 9.4206e-06 1.5702e-08
2+
0 0.9991 0.0037678 9.4206e-06
3+
0 -3.2043e-06 1.0002 0.0050003
4+
0 -0.0012816 0.075387 1.0002

H.txt

+4
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,4 @@
1+
2.2429e-05
2+
0.0089704
3+
3.2043e-05
4+
0.012816

LQR.py

+50
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,50 @@
1+
import numpy as np
2+
3+
class LQR:
4+
def __init__(self, x_n, u_n, F_t, f_t, C_t, c_t):
5+
self.x_n = x_n
6+
self.u_n = u_n
7+
assert F_t.shape == (x_n, x_n+u_n)
8+
assert f_t.shape == (x_n, 1)
9+
assert C_t.shape == (x_n+u_n, x_n+u_n)
10+
assert c_t.shape == (x_n+u_n, 1)
11+
self.F_t = F_t
12+
self.f_t = f_t
13+
self.C_t = C_t
14+
self.c_t = c_t
15+
16+
self.C_xtxt = C_t[:x_n, :x_n]
17+
self.C_xtut = C_t[:x_n, x_n:]
18+
self.C_utxt = C_t[x_n:, :x_n]
19+
self.C_utut = C_t[x_n:, x_n:]
20+
21+
self.c_xt = c_t[:x_n, :]
22+
self.c_ut = c_t[x_n:, :]
23+
24+
def __step(self, x_t, u_t):
25+
return self.F_t@np.concatenate([x_t, u_t]) + self.f_t
26+
27+
def __call__(self, x_0, T):
28+
V_t = np.zeros((self.x_n, self.x_n))
29+
v_t = np.zeros((self.x_n, 1))
30+
K = [np.zeros((self.u_n, self.x_n)) for _ in range(T)]
31+
k = [np.zeros((self.u_n, 1)) for _ in range(T)]
32+
for i in range(T-1, -1, -1):
33+
Q_t = self.C_t + self.F_t.T@V_t@self.F_t
34+
q_t = self.c_t + self.F_t.T@V_t@self.f_t + self.F_t.T@v_t
35+
Q_xtxt = Q_t[:self.x_n, :self.x_n]
36+
Q_xtut = Q_t[:self.x_n, self.x_n:]
37+
Q_utxt = Q_t[self.x_n:, :self.x_n]
38+
Q_utut = Q_t[self.x_n:, self.x_n:]
39+
q_xt = q_t[:self.x_n, :]
40+
q_ut = q_t[self.x_n:, :]
41+
K[i] = -np.linalg.inv(Q_utut)@Q_utxt
42+
k[i] = -np.linalg.inv(Q_utut)@q_ut
43+
V_t = Q_xtxt + Q_xtut@K[i] + K[i].T@Q_utxt + K[i].T@Q_utut@K[i]
44+
v_t = q_xt + Q_xtut@k[i] + K[i].T@q_ut + K[i].T@Q_utut@k[i]
45+
x_t = x_0.copy()
46+
u = [np.zeros((self.u_n, 1)) for _ in range(T)]
47+
for i in range(T):
48+
u[i] = K[i]@x_t + k[i]
49+
x_t = self.__step(x_t, u[i])
50+
return u

README.html

+449
Large diffs are not rendered by default.

README.md

+182
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,182 @@
1+
# LQR控制一阶倒立摆
2+
3+
## Folder structure
4+
```
5+
├─env # Gym env implementation of Inverted Pendulum
6+
├─figures # save images and result gif
7+
│ G.txt # txt to save G
8+
│ getGH.m # matlab file to get G and H
9+
│ H.txt # txt to save H
10+
│ LQR.py # LQR generalized implementation
11+
│ README.md # README
12+
│ test_gym.py # Run LQR in the environment of Inverted Pendulum
13+
│ utils.py # Some public functions
14+
```
15+
16+
## 动力学方程建立
17+
一阶倒立摆系统的简化模型如图1所示,该模型由一个沿水平方向运动的小车以及与之相连接的单摆构成,其中的一些物理参数如下表所示。
18+
*图中 $\theta$是摆与竖直向下方向的夹角,代码中`theta`表示摆与竖直向上方向的夹角,即与下式推导中的 $\phi$相同;摆杆质量分布均匀,质心位于摆的中心位置。*
19+
<center>
20+
<img style="border-radius: 0.3125em;
21+
box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);"
22+
src="./figures/system%20model.jpg">
23+
<br>
24+
<div style="color:orange; border-bottom: 1px solid #d9d9d9;
25+
display: inline-block;
26+
color: #999;
27+
padding: 2px;">图1 一阶倒立摆系统模型</div>
28+
</center>
29+
30+
<center>
31+
32+
<table>
33+
<tr>
34+
<th> 属性 </th>
35+
<th> 值 </th>
36+
</tr>
37+
<tr>
38+
<td> 小车质量M </td>
39+
<td> 0.5 kg </td>
40+
</tr>
41+
<tr>
42+
<td> 小车受到的阻尼b </td>
43+
<td> 0.1 s^-1</td>
44+
</tr>
45+
<tr>
46+
<td> 摆的质量m </td>
47+
<td> 0.1 kg </td>
48+
</tr>
49+
<tr>
50+
<td> 摆的1/2长度l </td>
51+
<td> 0.3 m </td>
52+
</tr>
53+
<tr>
54+
<td> 摆的转动惯量I </td>
55+
<td> 0.012 kg*m^2 </td>
56+
</tr>
57+
<tr>
58+
<td> 重力加速度g </td>
59+
<td> 9.8 m/s^2 </td>
60+
</tr>
61+
<tr>
62+
<td> 采样周期tau </td>
63+
<td> 0.005 s </td>
64+
</tr>
65+
</table>
66+
</center>
67+
68+
分别对摆、车体建立动力学方程:
69+
$$\begin{equation}
70+
(I+ml^2)\ddot{\theta}+mgl\sin \theta=-ml\ddot{x}\cos \theta
71+
\end{equation}$$
72+
73+
$$\begin{equation}
74+
(M+m)\ddot{x}+b\dot{x}+ml\ddot{\theta}\cos \theta - ml\dot{\theta}^2\sin \theta = F
75+
\end{equation}$$
76+
其中,$\theta$表示摆杆与竖直向下方向的夹角;$x$表示小车的位移
77+
78+
使用完整动力学方程(1)(2)离散化后,作为Gym仿真中的状态转移方程:
79+
1. 假设当前时刻为k,则先通过当前时刻的输入$F(k)$和k-1时刻的$x(k-1), \dot{x}(k-1),\theta(k-1),\dot{\theta}(k-1)$计算当前时刻的$\ddot{x}(k),\ddot{\theta}(k)$:
80+
$$
81+
\begin{cases}
82+
\ddot{\theta}(k)=(-g\sin\theta(k-1) - t\cos\theta(k-1)) / (\frac{I}{m} + l - \frac{ml\cos^2\theta(k-1)}{(M+m)})
83+
\\
84+
\ddot{x}(k)= t - \frac{ml\ddot{\theta}(k)\cos\theta(k-1)}{M+m}
85+
\end{cases}
86+
$$
87+
其中$t=\frac{F(k) + ml[\dot{\theta}(k-1)]^2\sin\theta(k-1) - b\dot{x}(k-1)}{M+m}$
88+
89+
2. 通过当前时刻的$\ddot{x}(k),\ddot{\theta}(k)$计算当前时刻的所有状态量$x(k), \dot{x}(k),\theta(k),\dot{\theta}(k)$:
90+
$$
91+
\begin{cases}
92+
x(k)=x(k-1)+\Delta t\cdot\dot{x}(k-1)
93+
\\
94+
\dot{x}(k)=\dot{x}(k-1)+\Delta t\cdot \ddot{x}(k)
95+
\\
96+
\theta(k)=\theta(k-1)+\Delta t\cdot\dot{\theta}(k-1)
97+
\\
98+
\dot{\theta}(k)=\dot{\theta}(k-1)+\Delta t\cdot\ddot{\theta}(k)
99+
\end{cases}
100+
$$
101+
102+
Gym方程环境程序中需要注意两个地方:
103+
* 方程中$\theta$表示摆杆与竖直向下方向夹角,而程序中`theta`表示摆杆与竖直向下方向夹角,需要修改计算$\ddot{\theta}(k)$时为$(g\sin\theta(k-1) - \dots$,即第一项没有负号;
104+
* 使用增量方式计算`theta`时,在$\pi$也就是摆杆竖直向下附件需要考虑跳变,程序中`theta`(即图中$\phi$)数值对应的位置图示如下图2所示,
105+
<center>
106+
<img style="border-radius: 0.3125em;
107+
box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);"
108+
src="figures\phi数值位置对应图.png">
109+
<br>
110+
<div style="color:orange; border-bottom: 1px solid #d9d9d9;
111+
display: inline-block;
112+
color: #999;
113+
padding: 2px;">图2 角度位置对应图</div>
114+
</center>
115+
116+
考虑跳变后对$\mathrm{theta}(k)$做如下约束:
117+
$$
118+
\mathrm{theta}(k)=
119+
\begin{cases}
120+
\mathrm{theta}(k)-2\pi, & \mathrm{theta}(k-1) < \pi\ and\ \mathrm{theta}(k) \geq \pi
121+
\\
122+
\mathrm{theta}(k)+2\pi, & \mathrm{theta}(k-1) > -\pi\ and\ \mathrm{theta}(k) \leq -\pi
123+
\\
124+
\mathrm{theta}(k), & otherwise
125+
\end{cases}
126+
$$
127+
128+
## 非线性动力学方程的线性化
129+
由于上述非线性方程无法直接使用LQR直接进行状态空间建模和求解最优控制问题,因此需要对非线性动力学方程线性化。
130+
131+
令$\theta=\pi+\phi$,即$\phi=\theta-\pi$,选取状态变量$X=[\begin{array}{cccc}x & \dot{x} & \phi & \dot{\phi}\end{array}]^T$,在$X_0=[\begin{array}{cccc}0 & 0 & 0 & 0\end{array}]^T$附近线性化。得到线性化动力学方程:
132+
$$\begin{equation}
133+
(M+m)\ddot{x}+b\dot{x}-ml\ddot{\phi}=u
134+
\end{equation}$$
135+
136+
$$\begin{equation}
137+
(I+ml^2)\ddot{\phi}-mgl\phi=ml\ddot{x}
138+
\end{equation}$$
139+
140+
## 一阶倒立摆系统的状态空间表达式
141+
根据线性化的动力学方程,得到连续域状态空间表达式$\dot{X}=AX+Bu$,其中$X=[\begin{array}{cccc}x & \dot{x} & \phi & \dot{\phi}\end{array}]^T$,A,B矩阵如下式:
142+
$$
143+
A =
144+
\left[\begin{matrix}
145+
0 & 1 & 0 & 0 \\
146+
0 & -\frac{(I+ml^2)b}{p} & \frac{m^2l^2g}{p} & 0\\
147+
0 & 0 & 0 & 1 \\
148+
0 & -\frac{mlb}{p} & \frac{mgl(M+m)}{p} & 0
149+
\end{matrix}\right],
150+
\quad B=
151+
\left[\begin{matrix}
152+
0 \\
153+
\frac{I+ml^2}{p}\\
154+
0\\
155+
\frac{ml}{p}
156+
\end{matrix}\right]
157+
\\
158+
where,\ p=I(M+m)+Mml^2
159+
$$
160+
161+
## LQR控制细节
162+
* 得到连续状态空间方程$\dot{X}=AX+Bu$后,可以使用matlab`c2d`函数转变为离散化状态空间$\dot{X}(k+1)=Gx(k)+Hu(k)$
163+
```Matlab
164+
[G,H]=c2d(A,B,Ts) % Ts是采样周期
165+
```
166+
167+
* LQR中$\mathbf{F}_t=[G\ H], \mathbf{f}_t=\mathbf{0}$,通过调参最终确定$\mathbf{C}_t=diag(10\ 15\ 30\ 6\ 1), \mathbf{c}_t=\mathbf{0}$;
168+
169+
* 基于状态反馈的动态LQR控制,按如下方式进行控制预测和实际控制:假设当前为k时刻,已知观测状态$X(k)$,以LQR控制时长T进行控制预测,但实际应用于控制的序列为前t步,即到k+t时刻则基于当时的观测状态$X(k+t)$再进行LQR控制预测之后k+t+1到k+t+T的控制输出,取T=100,t=15。
170+
171+
## 控制结果
172+
由于是对非线性状态方程在**摆杆竖直向上附近线性化**得到的表达式进行LQR控制,因此选择初始状态时摆杆偏离竖直向上方向一个较小的值,经测试当初始状态$\phi(0)\in [-0.2\pi, 0.2\pi]$时,均可稳定收敛,$\phi(0)=0.2\pi$的结果如下所示:
173+
<center>
174+
<img style="border-radius: 0.3125em;
175+
box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);"
176+
src="./figures/result.gif">
177+
<br>
178+
<div style="color:orange; border-bottom: 1px solid #d9d9d9;
179+
display: inline-block;
180+
color: #999;
181+
padding: 2px;">图3 初始偏离竖直向上方向36°的LQR控制结果</div>
182+
</center>

__pycache__/LQR.cpython-36.pyc

2.24 KB
Binary file not shown.

__pycache__/utils.cpython-36.pyc

741 Bytes
Binary file not shown.

env/__init__.py

+8
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,8 @@
1+
from gym.envs.registration import register
2+
3+
register(
4+
id='CartPoleContinuous-v0',
5+
entry_point='env.cartpole_continuous:CartPoleContinuousEnv',
6+
max_episode_steps=200,
7+
reward_threshold=195.0,
8+
)
395 Bytes
Binary file not shown.
Binary file not shown.

0 commit comments

Comments
 (0)