Every time I had a look at proofs of the Euler-Lagrange Equation in the past, I stumbled over certain steps that looked weird. The ones that put me off most involve notation for partial derivatives like
\begin{equation*} \frac{\partial L}{\partial f}(x,f(x),f'(x)) \; \text{and}\; \frac{\partial L}{\partial f'}(x,f(x),f'(x)), \end{equation*}which are also used in the above-mentioned Wikipedia article. In conjunction with mentions of "calculus of variations" I thought: wow, how can a function $L$ have a derivative with respect to a function $f$, or even $f'$? What kind of concept is this? As a software developer trained in strongly typed languages, the $\partial f$ in the denominator just did not make any sense at all.
In the meantime I realised that this is just a silly abuse of notation: $\partial L/\partial f$ denotes the partial derivative of $L$ with respect to its second parameter. I hate it! If we have, for example, $L(x, 3x^2 + px, 6x+p)$, how is that written then? Do we write $\partial L/\partial(3x^2 + px)$? Nonsense.
For the rest of this text, mainly written for my future self to come back here and see the derivation in a way I understand, I try to be as exact as possible about notation. For the partial derivative of a function $L$ of $n$ arguments with respect to its $i$-th argument I'll write $\partial_i L$ for $1\leq i\leq n$. In particular, for $n=1$, $\partial_1 L$ has the exact same meaning as $L'$. The benefit of this notation is that it does not refer to the names of the actual parameters; whether they are $x$ or $x^\mu$ or $f'$ does not matter, the meaning is well defined.
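To make the $\partial_i$ notation concrete in code, here is a minimal numerical sketch with a made-up $L$ (the function $L(a,b,c)=ab+c^2$ and the helper `partial` are my own illustration, not anything from the derivation below): $\partial_i L$ is computed by nudging the $i$-th argument, no matter what expression we later plug into that slot.

```python
# Hypothetical example: L(a, b, c) = a*b + c**2, so by hand
# d1 L = b, d2 L = a, d3 L = 2c -- regardless of what is later
# substituted for a, b, c.

def L(a, b, c):
    return a * b + c**2

def partial(g, i, args, h=1e-6):
    """Central-difference approximation of the i-th partial (1-based)."""
    lo, hi = list(args), list(args)
    lo[i - 1] -= h
    hi[i - 1] += h
    return (g(*hi) - g(*lo)) / (2 * h)

# Evaluate d2 L at the point (x, f(x), f'(x)) for f(x) = 3x**2:
x = 2.0
point = (x, 3 * x**2, 6 * x)   # (x, f(x), f'(x)) = (2, 12, 12)
print(partial(L, 2, point))    # ~ 2.0, i.e. (d2 L)(point) = a = x
```

The point of the sketch: `partial(L, 2, ...)` is a statement about the second slot of `L`, so the question "what is $\partial L/\partial(3x^2+px)$?" never even arises.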
So what is the Euler-Lagrange Equation?
A functional is a map from a vector space to its underlying field. The vector space we have here consists of the sufficiently often continuously differentiable functions on $\R$, say $C^2(\R, \R)$. Let $J: C^2(\R,\R)\to\R$ be a functional mapping functions to values in $\R$, defined as
$$ J(f) = \int_a^b L(x, f(x), f'(x)) dx $$where $L:\R^3\to\R$ is sufficiently often continuously differentiable.
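Such a functional is easy to approximate numerically. The following sketch uses made-up concrete choices (midpoint rule, the arc-length integrand $L(x,y,z)=\sqrt{1+z^2}$, and $f(x)=x$ on $[0,1]$, none of which are fixed by the text) just to show that $J$ eats a whole function and returns a single number:

```python
import math

# J(f) = integral from a to b of L(x, f(x), f'(x)) dx,
# approximated with the midpoint rule on n subintervals.
def J(L, f, fprime, a, b, n=10_000):
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * h
        total += L(x, f(x), fprime(x))
    return total * h

# Arc-length functional: L(x, y, z) = sqrt(1 + z**2).
L = lambda x, y, z: math.sqrt(1 + z * z)
# For f(x) = x on [0, 1] this is the length of the diagonal, sqrt(2).
print(J(L, lambda x: x, lambda x: 1.0, 0.0, 1.0))  # ~ 1.41421...
```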
Now you can ask which function $f$ makes $J$ minimal or maximal, at least within a suitably defined neighbourhood of $f$. To be fair, the Euler-Lagrange Equation does not directly find minimizing or maximizing functions $f$, but rather one such that any sufficiently small variation of $f$ leaves the value $J(f)$ stationary to first order.
The trick employed is to change the problem into one about an ordinary function $\phi:\R\to\R$ whose derivative $\partial_1\phi=\phi'$ shall vanish.
First we introduce a function $\eta:[a,b]\to\R$ such that $\eta(a)=\eta(b)=0$ and then define $g_\epsilon:\R\to\R$ as $g_\epsilon(x) = f(x)+\epsilon\eta(x)$. We see that $f(x) = g_0(x)$. With these definitions we now define our $\phi$ as
$$ \phi(\epsilon) = J(g_\epsilon) = \int_a^b L(x, g_\epsilon(x), g_\epsilon'(x)) dx . $$Note how $\phi(0) = J(f)$, so if we manage to find an $f$ such that $\phi'(0)=0$, then $\phi(0)$ is extremal or a saddle point, and so should be $J(f)$ under small variations of $f$ as implemented by $g_\epsilon$.
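The reduction to one variable can be watched in action numerically. Again all concrete choices below are mine, not the text's: the arc-length integrand, the known minimizer $f(x)=x$ on $[0,1]$, and the perturbation $\eta(x)=\sin(\pi x)$, which satisfies $\eta(0)=\eta(1)=0$. If the setup is right, $\phi(0)$ equals $J(f)=\sqrt{2}$ and $\phi'(0)$ comes out as (numerically) zero:

```python
import math

# phi(eps) = J(g_eps) for g_eps(x) = x + eps*sin(pi*x) on [0, 1],
# with the arc-length integrand L(x, y, z) = sqrt(1 + z**2),
# approximated by the midpoint rule.
def phi(eps, n=20_000):
    a, b = 0.0, 1.0
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * h
        gp = 1.0 + eps * math.pi * math.cos(math.pi * x)  # g_eps'(x)
        total += math.sqrt(1.0 + gp * gp)
    return total * h

d = 1e-5
print(phi(0.0))                      # ~ sqrt(2) = J(f)
print((phi(d) - phi(-d)) / (2 * d))  # ~ 0: phi'(0) vanishes at the minimizer
```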
Showing $\phi'(0)=0$ is mostly elementary calculus on multi-parameter functions, with one nice lemma thrown in at the end for good measure. We want
$$ 0 = \phi'(\epsilon) = \frac{d}{d\epsilon} \int_a^b L(x, f(x)+\epsilon\eta(x), f'(x)+\epsilon\eta'(x)) dx $$We may pull the derivative into the integral (differentiation under the integral sign, legitimate here because $L$ is continuously differentiable and $[a,b]$ is compact) to get
$$ 0 = \int_a^b \frac{d}{d\epsilon} L(x, f(x)+\epsilon\eta(x), f'(x)+\epsilon\eta'(x)) dx . $$Now comes the part where the non-weird notation for the derivatives of $L$ pays off. $L$ is a function from $\R^3$ to $\R$, and it is applied to the function $H: \R^2\to\R^3$ defined as
$$ H(\epsilon, x) = \begin{pmatrix}x \\ f(x)+\epsilon\eta(x) \\ f'(x)+\epsilon\eta'(x)\end{pmatrix}. $$So we can write:
\begin{align*} \frac{d}{d\epsilon} L(H(\epsilon, x)) &= ((\partial_1, \partial_2, \partial_3)L)(H(\epsilon, x)) \cdot \frac{d}{d\epsilon} H(\epsilon, x)\\ &\phantom{=}\scriptstyle \text{(row vector times a column vector)}\\ &= ((\partial_1, \partial_2, \partial_3)L)(H(\epsilon, x)) \cdot \frac{d}{d\epsilon} \begin{pmatrix}x \\ f(x)+\epsilon\eta(x) \\ f'(x)+\epsilon\eta'(x)\end{pmatrix} \\ &= ((\partial_1, \partial_2, \partial_3)L)(H(\epsilon, x)) \cdot \begin{pmatrix}0 \\ \eta(x) \\ \eta'(x) \end{pmatrix} \\ &=(\partial_2 L)(H(\epsilon, x))\cdot\eta(x) + (\partial_3 L)(H(\epsilon, x))\cdot\eta'(x) \end{align*}The latter term of the sum can be integrated by parts.
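The chain-rule expansion above can be spot-checked numerically. All concrete choices here are made up for the check: $L(a,b,c)=ab+c^2$ (so $\partial_2 L = a$ and $\partial_3 L = 2c$ by hand), $f(x)=x^2$, and $\eta(x)=\sin x$:

```python
import math

def L(a, b, c):   return a * b + c * c
def d2L(a, b, c): return a        # partial of L in its 2nd slot
def d3L(a, b, c): return 2 * c    # partial of L in its 3rd slot

x, eps, h = 1.3, 0.2, 1e-6
f, fp = x * x, 2 * x              # f(x), f'(x) for f(x) = x**2
eta, etap = math.sin(x), math.cos(x)

def LH(e):                        # L(H(e, x)) at the fixed x above
    return L(x, f + e * eta, fp + e * etap)

# Left side: d/d(eps) of L(H(eps, x)), by central difference.
lhs = (LH(eps + h) - LH(eps - h)) / (2 * h)
# Right side: (d2 L)(H) * eta + (d3 L)(H) * eta', from the chain rule.
rhs = d2L(x, f + eps * eta, fp + eps * etap) * eta \
    + d3L(x, f + eps * eta, fp + eps * etap) * etap
print(lhs, rhs)  # agree up to finite-difference error
```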
\begin{align*} \int_a^b (\partial_3 L)(H(\epsilon, x))\cdot\eta'(x) dx &= (\partial_3 L)(H(\epsilon, x))\cdot\eta(x)\Bigr\vert_a^b - \int_a^b \ \frac{d}{dx}\Bigl[ (\partial_3 L)(H(\epsilon, x)) \Bigr]\eta(x) dx \\ &= - \int_a^b \frac{d}{dx} \Bigl[(\partial_3 L)(H(\epsilon, x))\Bigr]\eta(x) dx \end{align*}The last line follows because $\eta(a)=\eta(b)=0$. Now we can pick up the full integral from above again, which we want to be zero.
\begin{align*} 0 &= \int_a^b \frac{d}{d\epsilon} L(H(\epsilon, x)) dx \\ &=\int_a^b \left[(\partial_2 L)(H(\epsilon, x)) - \frac{d}{dx} (\partial_3 L)(H(\epsilon, x))\right]\eta(x) dx \end{align*}Since $\eta$ can be chosen arbitrarily, the integral can only vanish for every such $\eta$ if the part inside the brackets is identically zero. This is called the fundamental lemma of the calculus of variations. Therefore, what we need is
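For reference, here is one common form of the lemma, stated from memory, so check a textbook for the precise regularity hypotheses:

```latex
\textbf{Fundamental lemma of the calculus of variations.}
Let $h:[a,b]\to\R$ be continuous. If
$$ \int_a^b h(x)\,\eta(x)\, dx = 0 $$
for every continuously differentiable $\eta:[a,b]\to\R$ with
$\eta(a)=\eta(b)=0$, then $h(x)=0$ for all $x\in[a,b]$.
```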
$$ 0 = (\partial_2 L)(H(\epsilon, x)) - \frac{d}{dx} (\partial_3 L)(H(\epsilon, x)) $$and in particular, recalling where we started from, for the case $\epsilon=0$, which brings us straight to the Euler-Lagrange Equation:
$$ \boxed{0 = (\partial_2 L)(x, f(x), f'(x)) - \frac{d}{dx} (\partial_3 L)(x,f(x), f'(x))} $$
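As a final sanity check, the boxed equation can be evaluated numerically for a classic textbook example of my own choosing: with $L(x,y,z)=(z^2-y^2)/2$ we get $\partial_2 L = -y$ and $\partial_3 L = z$, so the Euler-Lagrange Equation reads $0 = -f(x) - f''(x)$, which $f=\sin$ satisfies:

```python
import math

# Residual of the Euler-Lagrange Equation for L(x, y, z) = (z**2 - y**2)/2:
# (d2 L)(x, f, f') - d/dx (d3 L)(x, f, f') = -f(x) - f''(x),
# with f'' approximated by a second-order central difference.
def residual(f, x, h=1e-4):
    fpp = (f(x + h) - 2 * f(x) + f(x - h)) / h**2  # f''(x)
    return -f(x) - fpp

for x in (0.0, 0.7, 1.5):
    print(residual(math.sin, x))   # ~ 0 at every x
```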