Last note, Functionals, Euler’s Equation, and Equilibrium, reviewed the concept of functionals and the method that solves the extremum problems of the simplest integral form of functionals—the Euler’s equation. The derivation adopted in that post is a directional derivatives like method. This treatment was simply to bypass the review of calculus of variations at that time. Calculus of variations is the subject of mathematics directly concerns the extremum problems of functionals, which underlies quite a lot of mechanics laws. The notation \(\delta\) frequently appears in many principles of mechanics and numerical formulations of complex mechanics problems. This memento reviews the concept of calculus of variations.

We will start with the extremum problems of functionals and repeat a bit the method in last memento. Then we will review the concept of variations and the necessary condition for the extremum problems of simplest integral functional. Last we will have a look at the equivalence between two methods and how this leads us to calculus of variations.

## Extremum Problems of Functionals

We reviewed the definition of functionals and gave a visualization of the curve length functionals in the last memento. As soon as the functionals are defined, a main kind of problems emerges, that is the extremum problems of functionals. Here we confine our interests in the necessary conditions for the extremum problems of the simplest integral form of functionals,

$$I[y(x)] = \int_{x_0}^{x_1} f[x, y(x), y'(x)] \, dx. $$

A clear characterization of if the extremum is maximum or minimum or saddle requires an investigation on the higher order terms and will be omitted at this point. In engineering we care about the necessary condition for the minimum first and most because it provides a way to do the computation and we can validate the computation with experiments right on. We sometimes leave the rest of the work (proof of the existence of a functional and its minimum) to the last step to complete the circle. Study of the integral form functionals is due to its dominant appearance in mechanics. The extremum problem for this functional is defined as:

*Suppose we have a real-valued, integral form of functional \( I[y(x)] \), \( I[y(x)] = \int_{x_0}^{x_1} f[x, y(x), y'(x)] \, dx \) , whose input functions are defined on \(x \in [x_0, x_1]\), so \(y(x_0) = y_0, y(x_1)=y_1\), we want to find a specific form of function \(y(x)\) on \(x \in (x_0, x_1)\) that extremes the functional \(I[y(x)]\).*

## Derivative Method

To solve it, the first idea is to transform the functional into a normal function with a single auxiliary parameter so that we can borrow the extremum conditions for single-variable functions. This method is quite similar with taking the directional derivatives of functions of vectors (or Gâteaux derivatives).

Suppose \(y=y(x)\) is the function which extremes the \(I[y(x)]\), it must be so special that it wins out over all other similar (continuity) functions. To make the comparison with all other similar functions, we make a general form to represent all those functions: \(z(x) = y(x) + \epsilon \eta(s)\). Here \(\eta(x)\) is an arbitrary real-valued function that has the same continuity with \(y(x)\). For this integral form functional, \(y(x)\) should have at least continuous first order derivative in the domain of definition. \(\epsilon\) is a real number like a kind of scale. As stated in the definition of extremum problems, \(z(x)\) needs to satisfy \(z(x_0) = y_0\) and \(z(x_1) = y_1\), so \(\eta(x_0) = \eta(x_1) = 0\). The functional, with the input function \(z(x)\) has the value:

$$\begin{eqnarray} I[z(x)] & = & \int_{x_0}^{x_1} f[x, z(x), z'(x)] \, dx \\\\

& = & I[y(x) + \epsilon\eta(x)] = \int_{x_0}^{x_1} f\big(x, y(x)+\epsilon\eta(x), [y(x)+\epsilon\eta(x)]’\big) \, dx \\\\

& = & \int_{x_0}^{x_1} f[x, y(x)+\epsilon\eta(x), y'(x)+\epsilon {\eta}'(x)] \, dx. \end{eqnarray}$$

We try two different paths on interpreting the above functional. First, if \(y(x)\) is the special extremizer (the function to extreme \(I[z(x)]\)) and \(\eta(x)\) is assumed and fixed, after the integration calculation, the functionals falls back to a normal function of \(\epsilon\):

$$g(\epsilon) = I[y(x)+\epsilon \eta(x)] \bigg|_{special \ y(x), \ assumed\ \ \eta(x)}.$$

Continuing on this path, as a normal function, the necessary condition for \(g(\epsilon)\) to extreme is its first derivative vanishes:

$$\frac{dg(\epsilon)}{d\epsilon} = 0.$$

We have:

$$\begin{eqnarray}\frac{dg(\epsilon)}{d\epsilon} & = & \frac{\partial I[y(x)+\epsilon\eta(x)]}{\partial \epsilon} \\

& = & \frac{\partial I[z(x)]}{\partial \epsilon} = \int_{x_0}^{x_1} f[x, z(x), z'(x)] \, dx \\

& = & \int_{x_0}^{x_1} \bigg( \frac{\partial f[x, z(x), z'(x)]}{\partial z} \frac{\partial z(x)}{\partial \epsilon} + \frac{\partial f[x, z(x), z'(x)]}{\partial z’} \frac{\partial z'(x)}{\partial \epsilon} \bigg) \, dx \\

& = & \int_{x_0}^{x_1} \bigg( \frac{\partial f[x, z(x), z'(x)]}{\partial z} \frac{\partial [y(x)+\epsilon \eta(x)]}{\partial \epsilon} + \frac{\partial f[x, z(x), z'(x)]}{\partial z’} \frac{\partial [y'(x)+\epsilon \eta'(x)]}{\partial \epsilon}\bigg) \, dx \\

& = & \int_{x_0}^{x_1} \bigg( \frac{\partial f[x, z(x), z'(x)]}{\partial z} \eta(x) + \frac{\partial f[x, z(x), z'(x)]}{\partial z’} \eta'(x) \bigg) \, dx \\

& = & 0 . \end{eqnarray}$$

So for the special \(y(x)\) and given \(\eta(x)\), solving above equation gives the value of \(\epsilon\) that extremes the functional for those fixed functions.

Now let’s try the second path. If we allow \(\eta(x)\) to vary, the equation \(\frac{dg(\epsilon)}{d\epsilon} = 0\) gives a relation that \(z(x)\), \(\epsilon\), and \(\eta(x)\) need to satisfy to necessarily make the \(I[z(x)]\) an extremum. Noticing when \(\epsilon = 0\), \(z(x) = [y(x) + \epsilon\eta(x)] = y(x)\) , and \(\partial z \) and \(\partial z’ \) fall back to \(\partial y\) and \(\partial y’\), so if we add this condition on \(\epsilon\). Together

$$\begin{cases} \frac{dg(\epsilon)}{d\epsilon} = 0 \\

\epsilon =0, \end{cases}$$

gives the relation that \(y(x)\) and \(\eta(x)\) need to satisfy to necessarily make \(I[y(x)]\) an extremum. In other words,

$$\frac{dg(\epsilon)}{d\epsilon} \bigg|_{\epsilon = 0}= \int_{x_0}^{x_1} \bigg( \frac{\partial f[x, y(x), y'(x)]}{\partial y} \eta(x) + \frac{\partial f[x, y(x), y'(x)]}{\partial y’} \eta'(x) \bigg) \, dx=0,$$

is the necessary condition for functional \(I[y(x)]\) to extreme.

At this point, we are only one step away from the great Euler’s equation (or Euler-Lagrange equation or Lagrange’s equation). The second term in the integral can be converted by the rule of integration by parts:

$$\frac{\partial f[x, y(x), y'(x)]}{\partial y’} \eta'(x) = \big( \frac{\partial f[x, y(x), y'(x)]}{\partial y’} \cdot \eta(x) \big)’ – \frac{d}{dx}\bigg( \frac{\partial f[x, y(x), y'(x)]}{\partial y’} \bigg) \eta(x).$$

So,

$$\frac{dg(\epsilon)}{d\epsilon} \bigg|_{\epsilon = 0}= \int_{x_0}^{x_1} \bigg( \frac{\partial f[x, y(x), y'(x)]}{\partial y} – \frac{d}{dx}\big( \frac{\partial f[x, y(x), y'(x)]}{\partial y’} \big) \bigg) \eta(x) \, dx + \bigg( \frac{\partial f[x, y(x), y'(x)]}{\partial y’} \cdot \eta(x) \bigg) \bigg|_{x_0}^{x_1}. $$

Since \(\eta(x_0) = \eta(x_1) = 0\), the last term vanishes and

$$\frac{dg(\epsilon)}{d\epsilon} \bigg|_{\epsilon = 0}= \int_{x_0}^{x_1} \bigg( \frac{\partial f[x, y(x), y'(x)]}{\partial y} – \frac{d}{dx}\big( \frac{\partial f[x, y(x), y'(x)]}{\partial y’} \big) \bigg) \eta(x) \, dx = 0. $$

Because of the arbitrariness of \(\eta(x)\), this fundamental lemma tells us the above equation holds only when:

$$\frac{\partial f[x, y(x), y'(x)]}{\partial y} – \frac{d}{dx}\big( \frac{\partial f[x, y(x), y'(x)]}{\partial y’} \big)= 0.$$

This is the great famous Euler’s equation (or Euler-Lagrange equation or Lagrange’s equation).

## Variation Method

The second idea to solve the extremum problems of functionals is to treat functionals equally with normal real functions and work directly on the change of the value of the functional due to the “change” of the input functions—just the way the extremum theorem for a normal real functions was developed.

The theorem of extremum problems of normal real functions is based on the analysis of infinitesimal change of value of the function due to the change of the independent variable. If the change of function value at one point is positive definite then walking away from that point will increase the value of the function, which means a minimum (at least a local one) is formed at that point; if on the contrary, the change of function at one point is negative definite, then walking away from the point will decrease the function value, so a maximum is formed at the point. If the change is zero, a saddle point is formed. **So what we need is to find out the points where the the function has the definiteness in the change of its value.**

To work on the analysis, we first need some notations to represent the change of function or functional values due to the change of independent variable of functions or the input function of functionals and second the Taylor theorem to expand the functions or functionals at input variables or functions to quantify the change.

### Variations

As a normal function is defined, there are three parties involved: the independent variable, the dependent variable, and the relation that maps the independent variable to the dependent variable. Therefore, there are **two independent ways we can change the value of the function (i.e. the dependent variable): **

**change the independent variable,****change the mapping relation.**

We like using the upper case Greek letters \(\Delta\) to represent the finite change in quantities. So \(\Delta y\) stands for the finite change of dependent variable \(y\), following the above two ways, we can make change \(\Delta y\) to \(y\) by:

- changing the independent variable \(x \rightarrow x+ \Delta x\),
- changing the mapping relation \(f \rightarrow f+v\).

The corresponding change of dependent variable \(\Delta y\) by the two ways are:

- \(\Delta y = f(x+\Delta x)-f(x)\) by \(x \rightarrow x+ \Delta x\),
- \(\Delta y = (f+v)(x)-f(x)=f(x)+v(x)-f(x)=v(x)\) by \(f \rightarrow f+v\).

We later need to analyze the infinitesimal change of \(y\) due to the infinitesimal change of \(x\) and \(f\), we first need to agree on notations and languages. As the finite change \(\Delta x\) approaches infinitesimally small (\(\Delta x \rightarrow 0\)), we use Leibniz’s notation \(dx\) to represent the infinitesimal change, the corresponding finite change in \(y\) approaches infinitesimal change \(dy\). Similarly, if we have an infinitesimal small change in the mapping relation, which means the change of the function value caused by the change of mapping relation approaches zero, i.e. \(\Delta y =v(x) \rightarrow 0\), we adopt the lower case Greek letter \(\delta\) to represent the infinitesimal change of function value due to the infinitesimal change of mapping relation and to distinguish this change from the previous change due to \(dy\). Not only doing so, we also distinguish this change in the verbal level, **we now call the change of function value due to the mapping relation, the variation of the function**; **\(\delta\) is now the variation notation**. So \(\delta y\) stands for the infinitesimal change of function value due to the infinitesimal change of mapping relation. The flow chart below shows the above blah-blah concisely.

For a functional \(I[y(x)]\), the independent variable is itself a function \(y(x)\). We use \(\Delta I = I[y(x)+v(x)]-I[y(x)]\) to represent the change of functional caused by the change of the input function. Finite or infinitesimal is this change does not matter much, the linear part (and the quadratic part) of the change really matters. We’ll see later. Below animation shows the curve length functional—\(I[y(x)]=\int _{x_0} ^{x_1} \sqrt{1+{y'(x)}^2} \, dx\) when the input function varies in a sinusoidal form \(\delta y(x)=0.2\sin[\frac{i}{4}\pi(x-2)]\, i \in [1,2,3,4,5]\). The values of the functional approaches to \(4\sqrt{2}\), the length of the straight line connecting the two end points \((2, 2)\) and \((6, 6)\), as \(i\) decreases. (Yes, we sacrificed the infinitesimal requirement of using \(\delta\) in this visualization.)

Next we move on to the extremum problems. We will work on necessary condition of the definiteness of the finite change of functions and functionals.

### Single-variable Functions

Taylor’s theorem for a single-variable function \(y=f(x)\) is

$$f(x+\Delta x)=f(x) + f'(x)\Delta x + \frac{1}{2}f”(x)(\Delta x)^2+O({\Delta x}^3).$$

The finite difference (change) of \(y\) caused by \(\Delta x\) (the change in \(x\)) is

$$\Delta y =f(x+\Delta x)-f(x)= f'(x)\Delta x + \frac{1}{2}f”(x)(\Delta x)^2+O({\Delta x}^3).$$

The definiteness of \(\Delta y\) determines the maximum, minimum, or saddle of the function as discussed above. If the term \(f'(x) \Delta x \) vanishes, \(\Delta y\) will have at the least the semi-definiteness based on the sign of \(f”(x)\) since higher order terms \(O({\Delta x}^3)\) will not affect anything. In this case if \(f”(x)\)>0 (or \(f”(x)<0\)), \(\Delta y\) will be positive definite (or negative definite), \(y=f(x)\) will have minimum (or maximum); or if \(f''(x)=0\), \(f(x)\) will have saddle. So the necessary condition for the definiteness, and therefore, the extremum is $$f'(x)=0.$$

### Multi-variable Functions

Taylor’s theorem for a multi-variable function \(z=f(x,y)\) is

$$f(x+\Delta x, y+\Delta y)=f(x,y)+\frac{\partial f(x,y)}{\partial x}\Delta x+\frac{\partial f(x,y)}{\partial y}\Delta y+\frac{1}{2}\frac{\partial^2 f(x,y)}{\partial x^2}(\Delta x)^2+\frac{\partial^2 f(x,y)}{\partial x \partial y}\Delta x \Delta y +\frac{1}{2}\frac{\partial^2 f(x,y)}{\partial y^2}(\Delta y)^2+O({\Delta x}^3, {\Delta y}^3).$$

The finite difference (change) of \(z\) caused by \((\Delta x, \Delta y)\) is

$$\Delta z= f(x+\Delta x, y+\Delta y)-f(x,y)=\frac{\partial f(x,y)}{\partial x}\Delta x+\frac{\partial f(x,y)}{\partial y}\Delta y+\frac{1}{2}\frac{\partial^2 f(x,y)}{\partial x^2}(\Delta x)^2+\frac{\partial^2 f(x,y)}{\partial x \partial y}\Delta x \Delta y +\frac{1}{2}\frac{\partial^2 f(x,y)}{\partial y^2}(\Delta y)^2+O({\Delta x}^3, {\Delta y}^3). $$

Similarly, the necessary condition for the definiteness of \(\Delta z\), and therefore, the extremum of \(z=f(x,y)\) is:

$$\frac{\partial f}{\partial x}=0,$$

$$\frac{\partial f}{\partial y}=0.$$

### Simplest Integral Functionals

For the simplest integral functional

$$I[y(x)]=\int_{x_0} ^{x_1} f[x, y(x), y'(x)]\, dx ,$$

as an integration we don’t directly Taylor-expand the integration. Because

$$I[y(x)+\delta y(x)]=\int _{x_0} ^{x_1} f[x, y(x)+\delta y(x), y'(x)+\delta y'(x)] \, dx,$$

we actually need to expand the integrand rather than the whole integration. Use Taylor theorem for multi-variable functions,

$$\begin{eqnarray}I[y(x)+\delta y(x)] &=& \int _{x_0} ^{x_1} f[x, y(x)+\delta y(x), y'(x)+\delta y'(x)] \, dx \\

&=& \int _{x_0} ^{x_1} \bigg [ f(x, y, y’) + \frac{\partial f(x, y, y’)}{\partial y} \delta y + \frac{\partial f(x, y, y’)}{\partial y’} \delta y’ + \\

&\,& \frac{1}{2} \frac{\partial^2 f}{\partial y^2} (\delta y)^2+ \frac{\partial^2 f}{\partial y \partial y’} \delta y \delta y’ + \frac{1}{2} \frac{\partial^2 f}{\partial {y’}^2} (\delta y’)^2 + O \big[ (\delta y)^3, (\delta y’)^3 \big] \bigg ] \,dx \\

&=& I[y(x)] + \int _{x_0} ^{x_1} \bigg[ \frac{\partial f(x, y, y’)}{\partial y} \delta y + \frac{\partial f(x, y, y’)}{\partial y’} \delta y’ + \\

&\,& \frac{1}{2} \frac{\partial^2 f}{\partial y^2} (\delta y)^2+ \frac{\partial^2 f}{\partial y \partial y’} \delta y \delta y’ + \frac{1}{2} \frac{\partial^2 f}{\partial {y’}^2} (\delta y’)^2 + O \big[ (\delta y)^3, (\delta y’)^3 \big] \bigg ] \,dx .\end{eqnarray}$$

Thus, the variation of \(I[y(x)]\) caused by the small variation \(\delta y(x)\) and thus \(\delta y'(x)\) is

$$\begin{eqnarray}\Delta I &=& I[y(x)+\delta y(x)] – I[y(x)] \\

&=& \int _{x_0} ^{x_1} \bigg[ \frac{\partial f(x, y, y’)}{\partial y} \delta y + \frac{\partial f(x, y, y’)}{\partial y’} \delta y’ \bigg] \, dx \, + \\

&\,& \int _{x_0} ^{x_1} \bigg[ \frac{1}{2} \frac{\partial^2 f}{\partial y^2} (\delta y)^2+ \frac{\partial^2 f}{\partial y \partial y’} \delta y \delta y’ + \frac{1}{2} \frac{\partial^2 f}{\partial {y’}^2} (\delta y’)^2 \bigg] \, dx + O \big[(\delta y)^3, (\delta y’)^3 \big] \\

&=&\delta I + \delta^2 I + O \big[ (\delta y)^3, (\delta y’)^3 \big]. \end{eqnarray}$$

Here we introduced the new term \(\delta I\) to represent the linear part of the variation and \(\delta^2 I\) for the quadratic part of the variation. The necessary condition for the functional \(I[y(x)]\) to extreme is the linear variation vanishes:

$$\delta I = \int _{x_0} ^{x_1} \bigg [ \frac{\partial f(x, y, y’)}{\partial y} \delta y + \frac{\partial f(x, y, y’)}{\partial y’} \delta y’ \bigg ] \, dx =0.$$

Here is a tiny difference from the extreme conditions for multi-variable functions. In multi-variable functions \(\Delta x\) and \(\Delta y\) are two independent infinitesimal change, each constitutes a part of the linear change \(\Delta y\), so the necessary condition requires each coefficient to vanish. Here in the functional case, if \(\delta y\) is determined, \(\delta y’\) is determined, so there is some relation between the tow variations. The relation is the integration by parts. Manipulating using this rule we have

$$\delta I = \int_{x_0}^{x_1} \bigg( \frac{\partial f(x, y, y’)}{\partial y} – \frac{d}{dx}\big( \frac{\partial f(x, y, y’)}{\partial y’} \big) \bigg) \delta y(x) \, dx + \bigg( \frac{\partial f(x, y, y’)}{\partial y’} \cdot \delta y(x) \bigg) \bigg|_{x_0}^{x_1}=0. $$

Since \(\delta y(x)\) vanishes at two ends, we have

$$\delta I = \int_{x_0}^{x_1} \bigg( \frac{\partial f(x, y, y’)}{\partial y} – \frac{d}{dx}\big( \frac{\partial f(x, y, y’)}{\partial y’} \big) \bigg) \delta y(x) \, dx =0. $$

By the arbitrariness of \(\delta y\) and the fundamental theorem of calculus of variation, we also arrive at the Euler’s equation

$$\frac{\partial f[x, y(x), y'(x)]}{\partial y} – \frac{d}{dx}\big( \frac{\partial f[x, y(x), y'(x)]}{\partial y’} \big)= 0.$$

## Calculus of Variations

Comparing the two integral form equations for the fixed boundary condition problem we obtained with the two methods:

$$g(\epsilon) = I[y(x)+\epsilon \eta(x)]$$

$$\frac{dg(\epsilon)}{d\epsilon} \bigg|_{\epsilon = 0}= \int_{x_0}^{x_1} \bigg( \frac{\partial f[x, y(x), y'(x)]}{\partial y} – \frac{d}{dx}\big( \frac{\partial f[x, y(x), y'(x)]}{\partial y’} \big) \bigg) \eta(x) \, dx = 0, $$

and

$$\delta I = \int_{x_0}^{x_1} \bigg( \frac{\partial f[x, y(x), y'(x)]}{\partial y} – \frac{d}{dx}\big( \frac{\partial f[x, y(x), y'(x)]}{\partial y’} \big) \bigg) \delta y(x) \, dx =0, $$

we have the expression for the first variation:

$$\delta I = \frac{dg(\epsilon)}{d\epsilon} \bigg|_{\epsilon = 0} .$$

In fact from the analogy of directional derivative and replacing \(\eta(x)\) with \(\delta y(x)\) we can directly get the above expression

$$\delta I = \lim_{\epsilon \rightarrow 0}\frac{I[y(x)+\epsilon \delta y(x)]-I[y(x)]}{\epsilon}=\lim_{\epsilon \rightarrow 0}\frac{g(\epsilon)-g(0)}{\epsilon}=\frac{d g(\epsilon)}{d\epsilon}\bigg | _{\epsilon =0}.$$

This expression is good for understanding the meaning of variation of functionals but may be a bit obstructive for calculation. Following the analogy, we can continue to define the functional derivative, which is the equivalent of gradient for functions of vectors.

For a scale function of vectors \(y=f(\boldsymbol{x})\), the directional derivative is

$$\boldsymbol{\nabla} _{\boldsymbol{v}}f(\boldsymbol{x})=\lim_{\epsilon \rightarrow 0} \frac{f(\boldsymbol{x}+\epsilon \boldsymbol{v})-f(\boldsymbol{x})}{\epsilon}=\boldsymbol{\nabla} f \cdot \boldsymbol{v}.$$

For the integral form functional, we already have

$$\delta I = \lim_{\epsilon \rightarrow 0}\frac{I[y(x)+\epsilon \delta y(x)]-I[y(x)]}{\epsilon}= \int_{x_0}^{x_1} \bigg( \frac{\partial f[x, y(x), y'(x)]}{\partial y} – \frac{d}{dx}\big( \frac{\partial f[x, y(x), y'(x)]}{\partial y’} \big) \bigg) \delta y(x) \, dx =\int _{x_0}^{x_1} \frac{\delta I}{\delta y} (x) \delta y(x) \, dx. $$

Comparing the above two equations, with the understanding that the first variation of functionals is the equivalent of the directional derivative of functions of vectors and the integration of the product of two functions is the equivalent of the dot product operation of two vectors, we can define the first order functional derivatives, which plays an equivalent role with the gradient of functions of vectors.

$$\frac{\delta I[y(x)]}{\delta y(x)}= \frac{\partial f[x, y(x), y'(x)]}{\partial y} – \frac{d}{dx}\big( \frac{\partial f[x, y(x), y'(x)]}{\partial y’} \big). $$

The functional derivative operation has the similar linear, product, and chain properties as the differential calculus of normal functions. This seemingly justifies the name “calculus” of variations.

## Summary

Here are the gems of this long post.

- Solving the extremum problems of functionals is solving the below equations:

$$\delta I[y(x)]= \frac{dg(\epsilon)}{d\epsilon} \bigg|_{\epsilon = 0} =0.$$

- In the solution process before arriving at the end results, the intermediate results—the integral form equation the above equation produces is actually the weak form:

$$\delta I = \int_{x_0}^{x_1} \bigg( \frac{\partial f[x, y(x), y'(x)]}{\partial y} – \frac{d}{dx}\big( \frac{\partial f[x, y(x), y'(x)]}{\partial y’} \big) \bigg) \delta y(x) \, dx =0. $$

- The ultimate result obtained from the above equation is actually the Euler’s equation or the vanishing of the first order functional derivative, which is also actually the strong form:

$$\frac{\delta I[y(x)]}{\delta y(x)}= \frac{\partial f[x, y(x), y'(x)]}{\partial y} – \frac{d}{dx}\big( \frac{\partial f[x, y(x), y'(x)]}{\partial y’} \big)=0.$$

- A tip for calculation, calculating the first order variation of the integral form functional I[y(x)] is very much like calculating the differential of normal functions—following the chain rule:

$$\delta I = \int _{x_0} ^{x_1} \bigg [ \frac{\partial f(x, y, y’)}{\partial y} \delta y + \frac{\partial f(x, y, y’)}{\partial y’} \delta y’ \bigg ] \, dx =0.$$

I think (at least I hope) I finally sorted out the basic ideas and methods to solve extremum problems of functionals and the concept of variations of functionals. I tried my best to make derivation from scratch and self-contained rather than referring to direct theorem. Although it may not be the most effect way in terms of reviewing, but tracing each derivation step back to its origin and comprehending how the original ideas were raised and worked and composing them is enjoying because I felt like I refreshed and connected many old knowledge together and a bit painful for a mechanical person.