The Geometry of General Relativity: Part I — Smooth Structure

Introduction

This document develops the differential geometry underlying general relativity from the ground up, targeting physicists who want genuine understanding rather than just computational facility. The niche it occupies is narrow but real: most physics-oriented texts (Carroll, Wald) treat the geometry as a tool to reach the physics quickly, and sacrifice either rigor or intuition to get there. Most mathematics-oriented texts (Lee, Spivak) are rigorous but don't connect the formalism to the physical and classical intuitions that make it meaningful. This document tries to do both.

It is worth being clear about what differential geometry is and how it relates to other things called "rigorous calculus." Classical calculus was powerful but imprecise, built on infinitesimals and geometric intuition that resisted formal justification. Real analysis, developed in the 19th century, made calculus rigorous on \(\mathbb{R}^n\): it gave precise definitions of limits, continuity, and differentiability, and (with Lebesgue's work at the start of the 20th century) put integration on solid measure-theoretic foundations. But real analysis is tied to flat space. It tells you how to do calculus on open subsets of \(\mathbb{R}^n\), not on a sphere or a curved spacetime.

Differential geometry takes real analysis as its foundation and asks: what does calculus look like on a space that is only locally flat? All the smoothness conditions in differential geometry are real analysis conditions applied in coordinate charts. The exterior derivative is computed in coordinates using real analysis. The proof of Stokes' theorem reduces to the fundamental theorem of calculus. The "differential geometry" part is the insistence on coordinate-independence: finding definitions of vectors, covectors, derivatives, and integrals that make sense intrinsically on the manifold, not just in a particular chart. This is the sense in which differential geometry extends and clarifies classical calculus: not by providing the \(\varepsilon\)-\(\delta\) foundations that real analysis provides, but by identifying the correct geometric homes for objects like \(df\) that classical calculus described only informally.

On rigor: we aim for the level of a careful first graduate course in mathematics, with precise definitions, explicit calculations, and real proofs where they illuminate rather than obscure. We do not aim for full measure-theoretic or topological generality. Two things are deliberately left informal: the global patching argument in the proof of Stokes' theorem (the verification that partition of unity pieces fit together consistently across overlapping charts), and the deeper topological foundations of manifold theory. These are handled carefully in standard references; our goal is understanding, not encyclopedic completeness.

01 Manifolds

Motivation

Physics takes place on spaces that are not always flat. The surface of a sphere, the spacetime of general relativity, the configuration space of a mechanical system: these are curved spaces that cannot be described by a single global coordinate system. The theory of manifolds is the precise framework for doing calculus on such spaces.

The key insight is that even though a curved space has no global coordinates, it looks flat in small enough regions. The surface of the Earth is curved globally, but locally it looks like a flat plane (which is why maps work). A manifold is a space that is locally flat in exactly this sense, with enough structure to do calculus consistently across overlapping local regions.

Definition

A manifold \(M\) of dimension \(n\) is a topological space that is locally homeomorphic to \(\mathbb{R}^n\), equipped with a smooth structure. A homeomorphism is a continuous bijection with a continuous inverse: the topological notion of "same shape," meaning the two spaces look identical from a purely continuous perspective, without any notion of smoothness or distance. More precisely, \(M\) comes with a collection of coordinate charts \(\{(U_\alpha, \phi_\alpha)\}\) where each \(U_\alpha \subset M\) is an open set and \(\phi_\alpha: U_\alpha \to \mathbb{R}^n\) is a homeomorphism onto an open subset of \(\mathbb{R}^n\). The collection \(\{U_\alpha\}\) must cover \(M\). The smooth structure is the additional requirement, made precise below, that the transition functions between overlapping charts be smooth, which upgrades them from homeomorphisms to diffeomorphisms. (The standard definition also requires \(M\) to be Hausdorff and second countable; these conditions belong to the topological foundations that we leave to the references.)

The coordinates \(x^1,\ldots,x^n\) of a point \(p \in U_\alpha\) are the components of \(\phi_\alpha(p) \in \mathbb{R}^n\). Different charts give different coordinates for the same point.

Transition Functions and Smooth Structure

Where two charts \((U_\alpha, \phi_\alpha)\) and \((U_\beta, \phi_\beta)\) overlap, we need a way to pass between their coordinate systems. The transition function is the map:

\phi_\beta \circ \phi_\alpha^{-1}: \phi_\alpha(U_\alpha \cap U_\beta) \to \phi_\beta(U_\alpha \cap U_\beta),

which is a map between open subsets of \(\mathbb{R}^n\), ordinary multivariable calculus territory. The manifold is smooth if all transition functions are smooth (infinitely differentiable). This is what it means for the smooth structure to be consistent across charts: the notion of a smooth function on \(M\) is chart-independent, because changing charts involves a smooth transition function.

Examples

Euclidean space \(\mathbb{R}^n\) is a manifold covered by a single chart, the identity map. It is the trivial case.

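The sphere \(S^2\) is a 2-manifold. No single coordinate chart covers it: \(S^2\) is compact, so it cannot be homeomorphic to an open subset of \(\mathbb{R}^2\) (any flat map of the whole Earth must cut the surface or omit at least one point). Two charts suffice: stereographic projection from the north pole and from the south pole, each covering the sphere minus its projection point. The transition function between them is smooth on the overlap.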
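To make the smoothness of the stereographic transition function concrete, here is a minimal Python sketch. It assumes one common convention that the text does not fix: the unit sphere in \(\mathbb{R}^3\), projected from the poles \((0,0,\pm1)\) onto the equatorial plane. Under that assumption the transition function should be the inversion \((u,v)\mapsto(u,v)/(u^2+v^2)\), which is smooth wherever \((u,v)\neq(0,0)\). All function names are illustrative.

```python
def chart_north(p):
    """Stereographic projection from the north pole (0, 0, 1); undefined at that pole."""
    x, y, z = p
    return (x / (1 - z), y / (1 - z))

def chart_south(p):
    """Stereographic projection from the south pole (0, 0, -1); undefined at that pole."""
    x, y, z = p
    return (x / (1 + z), y / (1 + z))

def chart_north_inverse(c):
    """Inverse of chart_north: plane coordinates (u, v) back to a point on the unit sphere."""
    u, v = c
    s = u * u + v * v
    return (2 * u / (s + 1), 2 * v / (s + 1), (s - 1) / (s + 1))

def transition(c):
    """chart_south composed with chart_north_inverse, defined for (u, v) != (0, 0)."""
    return chart_south(chart_north_inverse(c))

def inversion(c):
    """Closed form expected for the transition map: (u, v) / (u^2 + v^2)."""
    u, v = c
    s = u * u + v * v
    return (u / s, v / s)

c = (0.3, -1.7)
print(transition(c), inversion(c))  # the two pairs should agree up to rounding
```

The excluded point \((u,v)=(0,0)\) is the image of the south pole, which lies outside the overlap of the two charts, so the transition function is smooth everywhere it is defined.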

Spacetime in GR is a 4-dimensional Lorentzian manifold. It is locally flat (special relativity holds in small enough regions) but globally curved by the distribution of matter and energy.

Smooth Functions and the Smooth Structure

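A function \(f: M \to \mathbb{R}\) is smooth if for every chart \((U_\alpha, \phi_\alpha)\), the composition \(f \circ \phi_\alpha^{-1}: \phi_\alpha(U_\alpha) \to \mathbb{R}\), a function on an open subset of \(\mathbb{R}^n\), is smooth in the ordinary sense. The smoothness of transition functions ensures this definition is chart-independent: on an overlap, \(f \circ \phi_\beta^{-1} = (f \circ \phi_\alpha^{-1}) \circ (\phi_\alpha \circ \phi_\beta^{-1})\) is a composition of smooth maps. The collection of all smooth functions on \(M\) is denoted \(C^\infty(M)\).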

The manifold itself has no preferred coordinates, no metric, no notion of distance. Just this smooth structure, which tells you what it means for a function to be differentiable.

Compactness

A topological space is compact if every open cover has a finite subcover. Intuitively this means the space does not escape to infinity and has no missing boundary points. For subsets of \(\mathbb{R}^n\) this reduces to being closed and bounded (the Heine-Borel theorem): a sphere is compact, a plane is not. A manifold with boundary can also be compact. The closed interval \([a,b]\) is the simplest example, and the generalized Stokes theorem applies to compact manifolds with boundary, as stated in §.

02 The Tangent Space

Tangent Vectors

At each point \(p \in M\) we want to define "directions you can move." The naive definition of an arrow pointing somewhere doesn't make sense on an abstract manifold because there's no ambient space to point into.

The correct definition: a tangent vector at \(p\) is a derivation on smooth functions, i.e. a linear map \(v: C^\infty(M) \to \mathbb{R}\) satisfying the Leibniz rule:

v(fg) = f(p)\,v(g) + g(p)\,v(f).

This captures the idea of "differentiate in a direction" without needing an ambient space. In local coordinates \(\{x^\mu\}\), the partial derivatives \(\partial/\partial x^\mu\) are derivations, and they form a basis for the tangent space at \(p\). We adopt the Einstein summation convention: repeated indices, one upper and one lower, are implicitly summed over. A general tangent vector is then:

v = v^\mu \frac{\partial}{\partial x^\mu}.

The coefficients \(v^\mu\) are the components of the tangent vector in this coordinate basis. The tangent space \(T_pM\) is the vector space of all tangent vectors at \(p\), with dimension \(n\).

Cotangent Vectors

The cotangent space \(T_p^*M\) is the dual space to \(T_pM\), consisting of all linear maps \(\omega: T_pM \to \mathbb{R}\).

Given a smooth function \(f: M \to \mathbb{R}\), its differential \(df\) at \(p\) is the cotangent vector defined by:

\[ df(v) = v(f), \]

for any tangent vector \(v\). This takes a tangent vector and returns the directional derivative of \(f\) in that direction, a number.

Applied to the coordinate functions \(x^\mu\) themselves, the differentials \(dx^\mu\) form the dual basis to \(\{\partial/\partial x^\mu\}\):

dx^\mu\!\left(\frac{\partial}{\partial x^\nu}\right) = \delta^\mu_{\ \nu}.

A general cotangent vector (also called a 1-form) is \(\omega = \omega_\mu\, dx^\mu\), and its action on a tangent vector \(v = v^\nu \partial_\nu\) is:

\omega(v) = \omega_\mu\, dx^\mu(v^\nu \partial_\nu) = \omega_\mu v^\nu\, dx^\mu(\partial_\nu) = \omega_\mu v^\nu \delta^\mu_{\ \nu} = \omega_\mu v^\mu \in \mathbb{R}.

04 Flat Space First: Classical Calculus Revisited

Basis Vectors as Differential Operators

Before going to general manifolds it's worth understanding what's happening in ordinary \(\mathbb{R}^n\), because this is where the classical and modern pictures meet, and where physicists are usually not shown the connection.

In \(\mathbb{R}^n\) you have a completely canonical way to differentiate a function \(f\) in the direction of a vector \(v\):

D_v f = \lim_{\epsilon \to 0} \frac{f(p + \epsilon v) - f(p)}{\epsilon} = v^\mu \partial_\mu f.

The map \(f \mapsto v^\mu \partial_\mu f\) is linear and satisfies Leibniz, making it a derivation. Crucially, different vectors give different derivations, and every derivation arises this way. So there is a perfect correspondence:

v^\mu \hat{e}_\mu \longleftrightarrow v^\mu \partial_\mu.

This is why the coordinate basis vectors are identified with partial derivatives. It isn't saying basis vectors are differential operators in some mysterious sense. It's saying that the only intrinsic thing a tangent vector does is differentiate functions, and in \(\mathbb{R}^n\) that action is in perfect bijection with the naive arrow picture. The arrow \(v\) and the directional derivative \(D_v\) carry exactly the same information.

The reason this feels strange is that in \(\mathbb{R}^n\) you're used to tangent vectors having a life independent of functions. They point somewhere in space, but on a general manifold there's no ambient space to point into, so the directional derivative action is all there is. The \(\mathbb{R}^n\) picture is secretly already this, just disguised by the familiar geometry.
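The correspondence between arrows and derivations in \(\mathbb{R}^n\) can be checked numerically. The sketch below is only an illustration: it picks two hypothetical test functions on \(\mathbb{R}^2\), estimates \(D_v\) by a central difference, and compares it with \(v^\mu\partial_\mu f\) and with the Leibniz rule. The names, the test functions, and the step size are all assumptions made for the example.

```python
import math

def directional_derivative(f, p, v, h=1e-6):
    """Central-difference estimate of D_v f at p: (f(p + h v) - f(p - h v)) / (2h)."""
    forward = [pi + h * vi for pi, vi in zip(p, v)]
    backward = [pi - h * vi for pi, vi in zip(p, v)]
    return (f(forward) - f(backward)) / (2 * h)

def f(x):
    return x[0] ** 2 * x[1]          # f(x, y) = x^2 y

def g(x):
    return math.sin(x[0]) + x[1]     # g(x, y) = sin x + y

def fg(x):
    return f(x) * g(x)

p = [1.0, 2.0]
v = [3.0, -1.0]

# D_v f should match v^mu * partial_mu f, with the partials (2xy, x^2) computed by hand.
by_components = v[0] * (2 * p[0] * p[1]) + v[1] * (p[0] ** 2)
print(directional_derivative(f, p, v), by_components)

# Leibniz rule: v(fg) = f(p) v(g) + g(p) v(f).
lhs = directional_derivative(fg, p, v)
rhs = f(p) * directional_derivative(g, p, v) + g(p) * directional_derivative(f, p, v)
print(lhs, rhs)
```

Both printed pairs should agree up to finite-difference error, which is the numerical shadow of the statement that \(v\) and \(D_v\) carry the same information.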

The Differential as a Cotangent Vector

Now look at the classical expression \(df = \frac{\partial f}{\partial x^\mu}dx^\mu\). In classical calculus this is treated as an infinitesimal, something vaguely defined that becomes rigorous only when you integrate it or cancel it against another \(d\)-something. The modern picture gives it a precise identity: \(df\) is a cotangent vector, a linear map that takes a tangent vector and returns a number. Feed it \(v = v^\mu \partial_\mu\):

df(v) = \frac{\partial f}{\partial x^\mu}v^\mu.

This is just the directional derivative of \(f\) in the direction \(v\), a perfectly concrete number: if \(v\) is a velocity vector, \(df(v)\) is the rate of change of \(f\) along that motion. The mysterious infinitesimal \(df\) was always this linear map. It just needed a tangent vector to act on.

The Differential vs. the Gradient

In flat space with the Euclidean metric, written in Cartesian coordinates, the components of \(df\) are \(\partial_\mu f\) and the components of \(\nabla f\) are also \(\partial_\mu f\), numerically identical. But \(df\) is a cotangent vector that acts on a tangent vector to give a number, while \(\nabla f\) is itself a tangent vector whose dot product with another vector gives a number. The metric \(g_{\mu\nu} = \delta_{\mu\nu}\) is doing the conversion between them invisibly, which is why classical calculus never needed to distinguish the two. In curved space, or even in flat space described in curvilinear coordinates, \(g_{\mu\nu}\) is nontrivial and the distinction becomes essential.
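The distinction already shows up in the flat plane once the coordinates are not Cartesian. The following sketch takes \(f = x = r\cos\theta\) in polar coordinates and compares the components of \(df\) with those of \(\nabla f\). It relies on an assumption not derived in this text: that the Euclidean metric in polar coordinates is \(\mathrm{diag}(1, r^2)\), so its inverse is \(\mathrm{diag}(1, 1/r^2)\). Function names are illustrative.

```python
import math

def df_components_polar(r, theta):
    """Components (df/dr, df/dtheta) of the covector df for f = r cos(theta), i.e. f = x."""
    return (math.cos(theta), -r * math.sin(theta))

def gradient_components_polar(r, theta):
    """Vector components obtained with the inverse polar metric diag(1, 1/r^2) (assumed, not derived)."""
    w_r, w_theta = df_components_polar(r, theta)
    return (w_r, w_theta / r ** 2)

def polar_vector_to_cartesian(r, theta, v):
    """Convert vector components (v^r, v^theta) to (v^x, v^y) using x = r cos(theta), y = r sin(theta)."""
    v_r, v_theta = v
    vx = math.cos(theta) * v_r - r * math.sin(theta) * v_theta
    vy = math.sin(theta) * v_r + r * math.cos(theta) * v_theta
    return (vx, vy)

r, theta = 2.0, 0.7
print(df_components_polar(r, theta))         # covector components
print(gradient_components_polar(r, theta))   # vector components: the theta entries differ
print(polar_vector_to_cartesian(r, theta, gradient_components_polar(r, theta)))
```

The last line should recover the Cartesian gradient of \(f = x\), namely \((1, 0)\), while the first two lines show that \(df\) and \(\nabla f\) no longer share components once the metric is not the identity.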

The Modern Picture: The Chain Rule as the Duality Pairing

With this in place, the chain rule becomes transparent. Consider a curve \(\gamma: \mathbb{R} \to M\) with parameter \(\lambda\), and let \(D_\lambda = d/d\lambda\) denote its tangent vector:

D_\lambda = \frac{dx^\mu}{d\lambda}\frac{\partial}{\partial x^\mu},

with components \(v^\mu = dx^\mu/d\lambda\), the coordinate velocities along the curve. The classical chain rule gives the rate of change of a function \(f\) along the curve:

\frac{df}{d\lambda} = \frac{\partial f}{\partial x^\mu}\frac{dx^\mu}{d\lambda}.

In the modern picture this is just the duality pairing between the cotangent vector \(df\) and the tangent vector \(D_\lambda\):

\[ \frac{df}{d\lambda} = df(D_\lambda) = \frac{\partial f}{\partial x^\mu}\frac{dx^\mu}{d\lambda}. \]

The right side is the pairing \(\omega_\mu v^\mu\) with \(\omega_\mu = \partial_\mu f\) and \(v^\mu = dx^\mu/d\lambda\). The chain rule is the duality pairing between \(T_pM\) and \(T_p^*M\).
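As a quick numerical illustration of the chain rule as a pairing, the sketch below uses a hypothetical helix in \(\mathbb{R}^3\) and a hypothetical function \(f\), both chosen only for the example. It evaluates \(df(D_\lambda) = \partial_\mu f\, dx^\mu/d\lambda\) from hand-computed components and compares the result with a direct finite-difference estimate of \(df/d\lambda\).

```python
import math

def curve(lam):
    """A helix: x^mu(lambda) = (cos lambda, sin lambda, lambda)."""
    return (math.cos(lam), math.sin(lam), lam)

def tangent_components(lam):
    """Components dx^mu/dlambda of D_lambda, differentiated by hand."""
    return (-math.sin(lam), math.cos(lam), 1.0)

def f(p):
    x, y, z = p
    return x * y + z ** 2

def df_components(p):
    """Components partial_mu f of the covector df, differentiated by hand."""
    x, y, z = p
    return (y, x, 2 * z)

def pairing(omega, v):
    """Duality pairing omega_mu v^mu."""
    return sum(w * c for w, c in zip(omega, v))

lam, h = 0.9, 1e-6
via_pairing = pairing(df_components(curve(lam)), tangent_components(lam))
via_difference = (f(curve(lam + h)) - f(curve(lam - h))) / (2 * h)
print(via_pairing, via_difference)
```

For this particular choice the exact answer is \(\cos 2\lambda + 2\lambda\), so both numbers can also be compared against a closed form.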

What Classical Calculus Was Doing

In classical calculus this same result is sometimes hand-waved by writing \(df = \frac{\partial f}{\partial x^\mu}dx^\mu\) and then "dividing by \(d\lambda\)", treating \(dx^\mu\) and \(d\lambda\) as small numbers whose ratio can be taken. The modern picture explains why it works: you are not dividing infinitesimals, you are evaluating the cotangent vector \(df\) on the tangent vector \(D_\lambda\).

The classical \(df\) was always a cotangent vector. The modern framework just gives it a precise home.

06 Tensors

Definition

Given any vector space \(V\) and its dual \(V^*\), a tensor of type \((r,s)\) is a multilinear map taking \(r\) elements of \(V^*\) and \(s\) elements of \(V\) and returning a number. There is nothing specific to manifolds here; this is a construction in linear algebra that applies to any vector space.

On a manifold, we apply this at each point \(p\) with \(V = T_pM\):

T: \underbrace{T_p^*M \times \cdots \times T_p^*M}_{r} \times \underbrace{T_pM \times \cdots \times T_pM}_{s} \to \mathbb{R}.

Tangent vectors are \((1,0)\) tensors and cotangent vectors are \((0,1)\) tensors.

The Tensor Product

Given an \((r,s)\) tensor \(S\) and a \((p,q)\) tensor \(T\), their tensor product \(S \otimes T\) is an \((r+p, s+q)\) tensor defined by evaluating each on its respective arguments and multiplying the results:

(S \otimes T)(\omega^1,\ldots,\omega^r,\alpha^1,\ldots,\alpha^p,\, v_1,\ldots,v_s,u_1,\ldots,u_q) = S(\omega^1,\ldots,\omega^r,v_1,\ldots,v_s)\cdot T(\alpha^1,\ldots,\alpha^p,u_1,\ldots,u_q),

where \(\omega^i, \alpha^i \in T_p^*M\) and \(v_i, u_i \in T_pM\). The tensor product is not commutative: the arguments of \(S\) come first and those of \(T\) second, so \(S\otimes T \neq T\otimes S\) in general.
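The definition is easy to mimic in code. The sketch below is restricted, for brevity, to tensors with only vector arguments (type \((0,s)\)), and represents each tensor as a Python function of its arguments. The helper names and the sample components are hypothetical.

```python
def covector(components):
    """Return the linear map v -> omega_mu v^mu for the given components."""
    def omega(v):
        return sum(w * c for w, c in zip(components, v))
    return omega

def tensor_product(S, s, T):
    """Tensor product of S (taking s vectors) with T: evaluate each on its own arguments and multiply."""
    def product(*vectors):
        return S(*vectors[:s]) * T(*vectors[s:])
    return product

a = covector([1, 0, 2])
b = covector([0, 3, -1])
a_b = tensor_product(a, 1, b)   # (a tensor b)(v, w) = a(v) b(w)
b_a = tensor_product(b, 1, a)   # (b tensor a)(v, w) = b(v) a(w)

v, w = [1, 1, 0], [0, 2, 5]
print(a_b(v, w), b_a(v, w))     # different numbers: the order of the factors matters
```

Here \(a(v)\,b(w) = 1\cdot 1\) while \(b(v)\,a(w) = 3\cdot 10\), a small concrete instance of \(S\otimes T \neq T\otimes S\).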

The tensor products of basis vectors and covectors, \(\partial_{\mu_1}\otimes\cdots\otimes\partial_{\mu_r}\otimes dx^{\nu_1}\otimes\cdots\otimes dx^{\nu_s}\), themselves form a basis for the space of all \((r,s)\) tensors, which is a vector space in the usual sense. So the coordinate expansion:

T = T^{\mu_1\cdots\mu_r}{}_{\nu_1\cdots\nu_s}\; \partial_{\mu_1}\otimes\cdots\otimes\partial_{\mu_r}\otimes dx^{\nu_1}\otimes\cdots\otimes dx^{\nu_s}

is exactly a basis expansion, with the components \(T^{\mu_1\cdots\mu_r}{}_{\nu_1\cdots\nu_s}\) playing the role of coordinates in that vector space.

Note also that the pairing between a vector and a covector is symmetric in the sense that it doesn't matter which one you think of as "acting": a \((1,0)\) tensor acting on a cotangent vector gives the same number as the cotangent vector acting on that tangent vector. Both are just the duality pairing \(\langle \omega, v \rangle = \omega_\mu v^\mu\). The distinction between who is acting and who is being acted on is a matter of perspective, not mathematics.

08 Coordinate Transformations

A tensor is a coordinate-independent geometric object: the same multilinear map from covectors and vectors to numbers regardless of which chart you use. This forces a specific relationship between its components in different coordinate systems.

Tangent Vectors

Take the tangent vector \(v\) to a curve \(x^\mu(\lambda)\), with components \(v^\mu = dx^\mu/d\lambda\) in coordinates \(\{x^\mu\}\). Under a change to coordinates \(\{x'^\mu\}\), the chain rule gives:

v'^\mu = \frac{dx'^\mu}{d\lambda} = \frac{\partial x'^\mu}{\partial x^\nu}\frac{dx^\nu}{d\lambda} = \frac{\partial x'^\mu}{\partial x^\nu}v^\nu.

So tangent vector components transform by multiplying by the Jacobian \(\partial x'^\mu/\partial x^\nu\) of the coordinate change. Components that transform this way are called contravariant: they transform with the forward Jacobian.

Cotangent Vectors

Take, for example, the cotangent vector \(\omega = df\), with components \(\omega_\mu = \partial_\mu f\). Under the same coordinate change:

\omega'_\mu = \frac{\partial f}{\partial x'^\mu} = \frac{\partial x^\nu}{\partial x'^\mu}\frac{\partial f}{\partial x^\nu} = \frac{\partial x^\nu}{\partial x'^\mu}\omega_\nu.

Cotangent components transform by the inverse Jacobian \(\partial x^\nu/\partial x'^\mu\). Components that transform this way are called covariant: they transform with the inverse Jacobian.

Why the Pairing Is Invariant

The duality pairing \(\omega_\mu v^\mu\) is coordinate-independent because the two Jacobians cancel:

\omega'_\mu v'^\mu = \frac{\partial x^\nu}{\partial x'^\mu}\omega_\nu \cdot \frac{\partial x'^\mu}{\partial x^\rho}v^\rho = \delta^\nu_{\ \rho}\,\omega_\nu v^\rho = \omega_\nu v^\nu.

This is not a coincidence; it is forced by the requirement that the pairing return the same number in every coordinate system. The covariant and contravariant transformation laws are precisely defined to make this true.
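The cancellation of the two Jacobians can be watched in a concrete case. The sketch below assumes the change from Cartesian coordinates \((x, y)\) to polar coordinates \((r, \theta)\) on the plane, with both Jacobian matrices written out by hand. The sample point and components are arbitrary, and the function names are illustrative.

```python
import math

def forward_jacobian(x, y):
    """Matrix J[mu][nu] = d x'^mu / d x^nu for the primed coordinates (r, theta)."""
    r2 = x * x + y * y
    r = math.sqrt(r2)
    return [[x / r, y / r],
            [-y / r2, x / r2]]

def inverse_jacobian(x, y):
    """Matrix K[nu][mu] = d x^nu / d x'^mu, expressed at the same Cartesian point."""
    r = math.sqrt(x * x + y * y)
    return [[x / r, -y],
            [y / r, x]]

def transform_vector(J, v):
    """Contravariant law: v'^mu = J[mu][nu] v^nu."""
    return [sum(J[m][n] * v[n] for n in range(2)) for m in range(2)]

def transform_covector(K, omega):
    """Covariant law: omega'_mu = K[nu][mu] omega_nu."""
    return [sum(K[n][m] * omega[n] for n in range(2)) for m in range(2)]

def pairing(omega, v):
    return sum(w * c for w, c in zip(omega, v))

x, y = 1.5, -0.8
v = [2.0, 1.0]
omega = [0.5, -3.0]
J, K = forward_jacobian(x, y), inverse_jacobian(x, y)
unprimed = pairing(omega, v)
primed = pairing(transform_covector(K, omega), transform_vector(J, v))
print(unprimed, primed)   # the same number in both coordinate systems
```

The individual components change substantially between the two charts, but the contraction does not.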

General Tensors

A general \((r,s)\) tensor has components that transform with \(r\) forward Jacobians (one for each upper index) and \(s\) inverse Jacobians (one for each lower index):

T'^{\mu_1\cdots\mu_r}{}_{\nu_1\cdots\nu_s} = \frac{\partial x'^{\mu_1}}{\partial x^{\alpha_1}}\cdots\frac{\partial x'^{\mu_r}}{\partial x^{\alpha_r}}\cdot\frac{\partial x^{\beta_1}}{\partial x'^{\nu_1}}\cdots\frac{\partial x^{\beta_s}}{\partial x'^{\nu_s}}\cdot T^{\alpha_1\cdots\alpha_r}{}_{\beta_1\cdots\beta_s}.

In GR this is often taken as the definition of a tensor: a tensor is precisely an object whose components transform this way. The geometric definition (a multilinear map on \(T_pM\) and \(T_p^*M\)) and the transformation law definition are equivalent.

Lorentz Transformations as a Special Case

The general Jacobian \(\partial x'^\mu/\partial x^\nu\) can be any invertible matrix, varying from point to point. Lorentz transformations are the special case where the coordinate change is linear, \(x'^\mu = \Lambda^\mu_{\ \nu}x^\nu\), with \(\Lambda\) preserving the Minkowski metric, \(\eta_{\mu\nu}\Lambda^\mu_{\ \rho}\Lambda^\nu_{\ \sigma} = \eta_{\rho\sigma}\). The Jacobian is then constant everywhere and equals \(\Lambda^\mu_{\ \nu}\) itself, so the tensor transformation law becomes multiplication by \(\Lambda\) and its inverse, which is exactly how tensors transform in special relativity.
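A short exact-arithmetic sketch of this special case follows. It assumes the signature \((-,+,+,+)\), units with \(c = 1\), and a boost along \(x\) with \(\beta = 3/5\) (so that \(\gamma = 5/4\) exactly). It also uses the standard fact that a Lorentz transformation must preserve the Minkowski metric, which a generic linear map does not. All names are illustrative.

```python
from fractions import Fraction

# Minkowski metric with signature (-,+,+,+), an assumed convention.
ETA = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def boost_x(beta, gamma):
    """Boost along x with speed beta; gamma is passed in so the arithmetic stays exact."""
    return [[gamma, -gamma * beta, 0, 0],
            [-gamma * beta, gamma, 0, 0],
            [0, 0, 1, 0],
            [0, 0, 0, 1]]

def transform(L, x):
    """x'^mu = Lambda^mu_nu x^nu: the constant Jacobian applied to the components."""
    return [sum(L[m][n] * x[n] for n in range(4)) for m in range(4)]

def interval(x):
    """eta_{mu nu} x^mu x^nu."""
    return sum(ETA[m][n] * x[m] * x[n] for m in range(4) for n in range(4))

def preserves_metric(L):
    """Check eta_{mu nu} Lambda^mu_rho Lambda^nu_sigma == eta_{rho sigma}."""
    return all(
        sum(ETA[m][n] * L[m][r] * L[n][s] for m in range(4) for n in range(4)) == ETA[r][s]
        for r in range(4) for s in range(4)
    )

L = boost_x(Fraction(3, 5), Fraction(5, 4))
shear = [[1, 1, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]  # linear, but not Lorentz
x = [Fraction(2), Fraction(1), Fraction(0), Fraction(-3)]

print(preserves_metric(L), preserves_metric(shear))
print(interval(x), interval(transform(L, x)))
```

The boost leaves the interval unchanged, while the shear, although linear with a constant Jacobian, fails the metric condition.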

09 The Wedge Product

Among all \((0,k)\) tensors, the antisymmetric ones change sign whenever any two arguments are swapped:

\omega(v_1,\ldots, v_i, \ldots, v_j, \ldots, v_k) = -\omega(v_1,\ldots, v_j,\ldots, v_i,\ldots, v_k).

These are called \(k\)-forms: antisymmetric multilinear maps at a single point, a purely algebraic object. A differential \(k\)-form, defined in §, is a smooth assignment of a \(k\)-form to every point of the manifold. Given a \(k\)-form \(\alpha\) and an \(l\)-form \(\beta\), the wedge product \(\alpha\wedge\beta\) is their antisymmetrized tensor product:

\alpha\wedge\beta = \frac{(k+l)!}{k!\, l!}\, \mathrm{Alt}(\alpha\otimes\beta),

where \(\mathrm{Alt}\) antisymmetrizes over all arguments by summing over all permutations with signs:

\mathrm{Alt}(\alpha\otimes\beta)(v_1,\ldots,v_{k+l}) = \frac{1}{(k+l)!}\sum_{\sigma\in S_{k+l}} \mathrm{sgn}(\sigma)\,(\alpha\otimes\beta)(v_{\sigma(1)},\ldots,v_{\sigma(k+l)}),

where \(\mathrm{sgn}(\sigma) = \pm 1\) is the sign of the permutation \(\sigma\). The combinatorial prefactor corrects for overcounting: \(\mathrm{Alt}\) sums over all \((k+l)!\) permutations, but \(\alpha\otimes\beta\) is already antisymmetric in its first \(k\) and last \(l\) slots separately, so each distinct term appears \(k!\,l!\) times. The prefactor removes this redundancy, leaving a correctly normalized \((k+l)\)-form. The wedge product is the standard way to combine lower-degree forms into higher-degree antisymmetric forms; its usefulness will become clear shortly.
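The two formulas above translate almost line by line into code. The sketch below represents a \(k\)-form as a Python function of \(k\) vectors and implements \(\mathrm{Alt}\) and the wedge product exactly as defined, using rational arithmetic so the factorials divide cleanly. It is a direct, deliberately inefficient transcription meant for small examples with integer components; all names are illustrative.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def sign(perm):
    """Sign of a permutation of 0..n-1, found by counting inversions."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def alt(T, n):
    """Antisymmetrize an n-argument map: (1/n!) * sum over sigma of sgn(sigma) T(v_sigma(1), ...)."""
    def alternated(*vectors):
        total = sum(sign(p) * T(*(vectors[i] for i in p)) for p in permutations(range(n)))
        return Fraction(total) / factorial(n)
    return alternated

def wedge(alpha, k, beta, l):
    """alpha ^ beta = (k+l)!/(k! l!) * Alt(alpha tensor beta) for a k-form alpha and an l-form beta."""
    def tensor(*vectors):
        return alpha(*vectors[:k]) * beta(*vectors[k:])
    alternated = alt(tensor, k + l)
    prefactor = Fraction(factorial(k + l), factorial(k) * factorial(l))
    return lambda *vectors: prefactor * alternated(*vectors)

def basis_covector(mu):
    """dx^mu as a function: picks out the mu-th component of a vector."""
    return lambda v: v[mu]

dx, dy, dz = basis_covector(0), basis_covector(1), basis_covector(2)
dx_dy = wedge(dx, 1, dy, 1)
v, w = (1, 2, 3), (4, 5, 6)
print(dx_dy(v, w), dx_dy(w, v), dx_dy(v, v))   # antisymmetric, and zero on repeated arguments
```

With this normalization \((dx\wedge dy)(v,w) = v^x w^y - v^y w^x\), with no stray factor of \(1/2\), which is what the prefactor is for.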

On basis covectors the antisymmetry gives:

dx^\mu\wedge dx^\nu = -dx^\nu\wedge dx^\mu, \qquad dx^\mu\wedge dx^\mu = 0.

The wedge product is associative, so it extends to multiple factors by iteration, with each new factor wedged on in turn:

dx^\mu\wedge dx^\nu\wedge dx^\rho = (dx^\mu\wedge dx^\nu)\wedge dx^\rho,

and so on for any number of factors. For general \(k\)- and \(l\)-forms \(\alpha\) and \(\beta\) the antisymmetry generalizes to graded commutativity:

\alpha\wedge\beta = (-1)^{kl}\,\beta\wedge\alpha.

The action of a basis \(k\)-form on \(k\) vectors is a determinant:

\[ (dx^{\mu_1}\wedge\cdots\wedge dx^{\mu_k})(v_1,\ldots,v_k) = \det\begin{pmatrix} v_1^{\mu_1} & \cdots & v_1^{\mu_k} \\ \vdots & & \vdots \\ v_k^{\mu_1} & \cdots & v_k^{\mu_k} \end{pmatrix}. \]

The entries are the \(\{\mu_1,\ldots,\mu_k\}\)-components of each vector, giving the projection onto the chosen coordinate \(k\)-plane. The determinant gives the signed \(k\)-volume of the parallelepiped they span. This is not an analogy: the definition of the wedge product via \(\mathrm{Alt}\) is precisely the Leibniz formula for the determinant. The exterior algebra is the algebra of determinants. (The wedge product is also called the exterior product.) Note also that this determinant interpretation provides a simple way to see that \(dx^\mu\wedge dx^\nu = -dx^\nu\wedge dx^\mu\) and \(dx^\mu\wedge dx^\mu = 0\): swapping two basis covectors swaps two columns of the matrix, and repeating one repeats a column. The table below provides several explicit examples.

\(k\)	Example	Geometric content
1	\(dx^\mu(v) = v^\mu\)	Component of \(v\) in the \(\mu\) direction; a \(1\times1\) determinant.
2	\((dy\wedge dz)(v,w) = v^y w^z - v^z w^y\)	Signed projected area of \((v,w)\) onto the \(yz\)-plane.
3	\((dx\wedge dy\wedge dz)(u,v,w)\)	Full \(3\times3\) determinant; signed volume of the parallelepiped spanned by \(u,v,w\).
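The rows of the table can be reproduced with a few lines of code. The sketch below evaluates a basis \(k\)-form on \(k\) vectors directly through the Leibniz determinant formula, with the matrix entry in row \(i\), column \(j\) equal to \(v_i^{\mu_j}\) as in the display above. The index convention \(x, y, z \to 0, 1, 2\), the sample vectors, and the names are assumptions of the example.

```python
from itertools import permutations

def sign(perm):
    """Sign of a permutation of 0..n-1, found by counting inversions."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def basis_form(*indices):
    """dx^{mu_1} ^ ... ^ dx^{mu_k} as a function of k vectors, via the Leibniz determinant formula."""
    k = len(indices)
    def form(*vectors):
        total = 0
        for perm in permutations(range(k)):
            term = sign(perm)
            for row, col in enumerate(perm):
                term *= vectors[row][indices[col]]   # matrix entry M[row][col] = v_row^{mu_col}
            total += term
        return total
    return form

X, Y, Z = 0, 1, 2
u, v, w = (7, 8, 10), (1, 2, 3), (4, 5, 6)

print(basis_form(X)(v))              # k = 1: the x-component of v
print(basis_form(Y, Z)(v, w))        # k = 2: v^y w^z - v^z w^y
print(basis_form(X, Y, Z)(u, v, w))  # k = 3: the full 3x3 determinant
print(basis_form(Z, Y)(v, w), basis_form(Y, Y)(v, w))  # sign flip and repeated index
```

The last line shows the two algebraic rules for basis covectors emerging from column operations on the determinant.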

10 Differential Forms

Definition

A differential \(k\)-form on \(M\) is a smooth assignment of a \(k\)-form to each point of \(M\), specifically a smooth choice at every \(p\) of an antisymmetric multilinear map \(T_pM\times\cdots\times T_pM \to \mathbb{R}\).

Smoothness

It's worth being precise about what this means. The tangent space \(T_pM\) is a different vector space at each point, so the basis covectors \(dx^\mu|_p\) are literally different objects at different points. There is a separate dual basis at each \(p\), and a differential \(k\)-form \(\omega\) assigns to each \(p\) a \(k\)-form built from that point's basis covectors, with real-number coefficients that depend on \(p\). In a coordinate chart, \(\omega\) has component functions \(\omega_{\mu_1\cdots\mu_k}(p)\), and smoothness means these are smooth functions of the coordinates \(x^1,\ldots,x^n\) on that chart, in the usual multivariable calculus sense. This condition is chart-independent: if the components are smooth in one chart, they are smooth in any overlapping chart, because the transition functions between charts are themselves smooth, which is exactly what it means for \(M\) to be a smooth manifold. Equivalently, for any smooth vector fields \(v_1,\ldots,v_k\) on \(M\), the function \(p \mapsto \omega_p(v_1(p),\ldots,v_k(p))\) is a smooth function \(M\to\mathbb{R}\).

Just as a vector field is a linear combination of basis vector fields \(\partial_\mu\) with smooth coefficient functions, a differential \(k\)-form is a linear combination of basis \(k\)-forms \(dx^{\mu_1}\wedge\cdots\wedge dx^{\mu_k}\) with smooth coefficient functions. The space of all differential \(k\)-forms on \(M\) is denoted \(\Omega^k(M)\): concretely, all possible choices of smooth coefficient functions. It is a vector space over \(\mathbb{R}\) (you can add forms and scale by constants), and infinite-dimensional globally since the coefficients can be any smooth functions. But at each individual point \(p\), the space of \(k\)-forms is just a finite-dimensional vector space of dimension \(\binom{n}{k}\).

Examples in \(\mathbb{R}^3\)

In \(\mathbb{R}^3\) the independent basis forms of each degree, and the corresponding general forms, are:

0-forms: a smooth function \(f\). No arguments, just a number at each point.

1-forms: \(f_1\,dx + f_2\,dy + f_3\,dz\), with \(f_1,f_2,f_3\) arbitrary smooth functions. Takes one tangent vector, returns a number.

2-forms: \(f_1\,dy\wedge dz + f_2\,dz\wedge dx + f_3\,dx\wedge dy\). Takes two tangent vectors, returns a weighted sum of projected areas.

3-forms: \(f\,dx\wedge dy\wedge dz\). Takes three tangent vectors, returns their signed volume weighted by \(f\).

Counting Basis Forms

The number of independent basis \(k\)-forms in \(\mathbb{R}^n\) is \(\binom{n}{k}\): you choose \(k\) coordinate directions from \(n\), and antisymmetry means order doesn't matter up to sign. In \(\mathbb{R}^3\): 0-forms have 1 component, 1-forms have 3, 2-forms have 3, and 3-forms have 1. The coincidence that 1-forms and 2-forms have the same number of components is special to 3 dimensions, and has significant consequences for how vector calculus works, as we will see in §. There are no nonzero \(k\)-forms for \(k>n\): with only \(n\) coordinate directions available, some basis covector must repeat, which gives a repeated column in the determinant and hence zero.
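The count can be enumerated directly. In the sketch below each independent basis \(k\)-form is represented by an increasing tuple of coordinate labels. This is a bookkeeping convention assumed for the example; it lists \(dx\wedge dz\) where the text uses \(dz\wedge dx\), which differs only by a sign.

```python
from itertools import combinations
from math import comb

def basis_k_forms(coords, k):
    """Independent basis k-forms as increasing label tuples, e.g. ('x', 'y') for dx ^ dy."""
    return list(combinations(coords, k))

coords = ("x", "y", "z")
for k in range(5):
    forms = basis_k_forms(coords, k)
    # The number of tuples should equal the binomial coefficient C(n, k).
    print(k, len(forms), comb(len(coords), k), forms)
```

For \(k = 0\) the single empty tuple stands for the constant function 1, and for \(k = 4 > n\) the list is empty, matching the statement that there are no nonzero forms above the dimension.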

Why Forms Are the Right Objects to Integrate

The fact that a \(k\)-form returns signed \(k\)-volumes is not incidental; it is precisely what makes forms the right objects to integrate. A length element, an area element, a volume element are all instances of the same thing: a \(k\)-form evaluated on \(k\) infinitesimal displacement vectors. This gives a single unified definition of integration over curves, surfaces, volumes, and arbitrary manifolds, as we will see in §.

11 Integration

Orientation

An orientation on a manifold \(M\) is a consistent choice of "handedness" across all coordinate charts, concretely a choice of which basis orderings \((\partial_1,\ldots,\partial_k)\) count as positively oriented. On a curve this means a direction of travel; on a surface, a sense of rotation (for a surface embedded in \(\mathbb{R}^3\), equivalently a choice of which side is up); on a volume, a choice of right- vs left-handed axes. Two charts are orientation-compatible if the Jacobian of their transition function has positive determinant. An orientable manifold is one that admits a consistent such choice globally. A Möbius strip is the classic example of a manifold that does not. The orientation of \(M\) also induces a natural orientation on its boundary \(\partial M\) via the outward normal convention, which we make precise in the proof of the generalized Stokes theorem in §.

Definition

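To integrate a \(k\)-form \(\omega\) over an oriented \(k\)-dimensional submanifold \(M\), work in a positively oriented coordinate chart with coordinates \(x^1,\ldots,x^k\), assuming for now that \(\omega\) vanishes outside this chart (the general case is assembled from such pieces with a partition of unity). In this chart \(\omega\) has a single independent coefficient: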

\omega = \omega_{12\cdots k}(x)\, dx^1\wedge\cdots\wedge dx^k.

We define the integral:

\[ \int_M \omega := \int \omega_{12\cdots k}(x)\,dx^1\cdots dx^k, \]

where the right-hand side is a standard multivariable integral over the chart's image in \(\mathbb{R}^k\), and the \(dx^i\) are ordinary Lebesgue measure factors, not wedge products. Orientation enters through the sign: swapping two coordinate directions flips the sign of the coefficient \(\omega_{12\cdots k}\) and hence the sign of the integral, corresponding to integrating over the oppositely oriented manifold. On a non-orientable manifold no global sign convention exists, so forms cannot be integrated globally. No metric is required anywhere in this construction.

Why This Definition Makes Sense

Think about what a Riemann integral is: you chop \(M\) into tiny coordinate boxes, evaluate something on each box, and sum. Each tiny box has edge vectors \(\epsilon_1\partial_1, \ldots, \epsilon_k\partial_k\) where \(\epsilon_i\) is the side length in the \(i\)-th direction (an ordinary real number, not a form). The natural thing to assign to this box is the value of \(\omega\) on its edge vectors:

\omega(\epsilon_1\partial_1,\ldots,\epsilon_k\partial_k) = \omega_{12\cdots k}(x)\,\det\begin{pmatrix}\epsilon_1 & & \\ & \ddots & \\ & & \epsilon_k\end{pmatrix} = \omega_{12\cdots k}(x)\,\epsilon_1\cdots\epsilon_k,

where we used the determinant rule from § and the fact that each edge vector points along a single, distinct coordinate direction, so the matrix of components is diagonal. Summing over all boxes and taking the limit of vanishing box size, in which the products \(\epsilon_1\cdots\epsilon_k\) become the measure \(dx^1\cdots dx^k\), gives exactly the standard Riemann integral of \(\omega_{12\cdots k}\). So the definition is the natural way to sum up what the form assigns to infinitesimal pieces of \(M\).

Coordinate Independence

This might seem anticlimactic. After all that machinery, integrating a form just reduces to a standard integral in \(\mathbb{R}^n\). But that's precisely the point. On a bare manifold there is no canonical way to integrate a plain function: pick different coordinates and you get a different number, because nothing supplies the Jacobian. A differential form fixes this. As shown in §, a \(k\)-form coefficient \(\omega_{12\cdots k}\) has \(k\) lower indices, so it transforms with \(k\) inverse Jacobians, which by antisymmetry combine into an overall factor of \(\det(\partial x/\partial x')\). The Lebesgue measure \(dx^1\cdots dx^k\) transforms by the change of variables formula with the reciprocal factor \(|\det(\partial x'/\partial x)|\). Between positively oriented charts the determinant is positive, so the absolute value is harmless and the two factors cancel exactly, leaving \(\int_M \omega\) the same number in every such chart. This is precisely where the orientation is needed. The entire apparatus exists to guarantee that the reduction to a standard integral is coordinate-independent.
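A numerical experiment makes the cancellation tangible. The sketch below integrates the hypothetical 2-form \(\omega = (x^2+y^2)\,dx\wedge dy\) over the unit disk twice: once in Cartesian coordinates, and once in polar coordinates, where the coefficient picks up the factor \(\det\,\partial(x,y)/\partial(r,\theta) = r\). The midpoint rule, the grid size, and the names are choices made for the example. The polar chart omits a single ray, which has measure zero and does not affect the integral.

```python
import math

def midpoint_rule(g, a, b, n):
    """Midpoint-rule estimate of the integral of g over [a, b] using n cells."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def coefficient_cartesian(x, y):
    """omega_{xy}: the coefficient of dx ^ dy."""
    return x * x + y * y

def coefficient_polar(r, theta):
    """omega_{r theta} = omega_{xy} * det d(x, y)/d(r, theta); that determinant equals r."""
    x, y = r * math.cos(theta), r * math.sin(theta)
    return coefficient_cartesian(x, y) * r

def integral_cartesian(n=400):
    """Iterated integral over the unit disk: slices in x, with |y| <= sqrt(1 - x^2)."""
    def slice_integral(x):
        half = math.sqrt(max(0.0, 1.0 - x * x))
        return midpoint_rule(lambda y: coefficient_cartesian(x, y), -half, half, n)
    return midpoint_rule(slice_integral, -1.0, 1.0, n)

def integral_polar(n=400):
    """The same integral in the chart (r, theta) with 0 < r < 1 and 0 < theta < 2 pi."""
    def ray_integral(theta):
        return midpoint_rule(lambda r: coefficient_polar(r, theta), 0.0, 1.0, n)
    return midpoint_rule(ray_integral, 0.0, 2.0 * math.pi, n)

cartesian_value = integral_cartesian()
polar_value = integral_polar()
print(cartesian_value, polar_value, math.pi / 2)   # all three should be close
```

Dropping the factor of \(r\) in `coefficient_polar`, that is, integrating the bare function instead of the form's coefficient, would give a different number. That is the failure of coordinate independence that the form is designed to prevent.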

12 The Exterior Derivative

Definition

The exterior derivative \(d\) is a map \(d: \Omega^k(M) \to \Omega^{k+1}(M)\), defined on a \(k\)-form \(\omega = \frac{1}{k!}\omega_{\mu_1\cdots\mu_k}\,dx^{\mu_1}\wedge\cdots\wedge dx^{\mu_k}\) by:

\[ d\omega = \frac{1}{k!}\frac{\partial \omega_{\mu_1\cdots\mu_k}}{\partial x^\nu}\,dx^\nu\wedge dx^{\mu_1}\wedge\cdots\wedge dx^{\mu_k}. \]

Differentiate the coefficient and wedge on a new basis covector. On a 0-form this is just the differential \(df = \partial_\mu f\, dx^\mu\). On a 1-form \(\omega = \omega_\mu\,dx^\mu\), applying the definition:

d\omega = \partial_\nu\omega_\mu\, dx^\nu\wedge dx^\mu.

Writing the sum as the average of itself with dummy indices \(\mu \leftrightarrow \nu\) swapped, then using \(dx^\mu\wedge dx^\nu = -dx^\nu\wedge dx^\mu\):

\begin{aligned} d\omega &= \frac{1}{2}\left(\partial_\nu\omega_\mu\, dx^\nu\wedge dx^\mu + \partial_\mu\omega_\nu\, dx^\mu\wedge dx^\nu\right) \\ &= \frac{1}{2}\left(\partial_\nu\omega_\mu\, dx^\nu\wedge dx^\mu - \partial_\mu\omega_\nu\, dx^\nu\wedge dx^\mu\right) \\ &= \frac{1}{2}\left(\partial_\nu\omega_\mu - \partial_\mu\omega_\nu\right)dx^\nu\wedge dx^\mu. \end{aligned}

The antisymmetric combination \(\partial_\nu\omega_\mu - \partial_\mu\omega_\nu\) is the \(n\)-dimensional generalization of the curl: in \(\mathbb{R}^3\) its three independent components are exactly the components of the curl, as we will see below.
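The formula for \(d\omega\) on a 1-form works in any dimension, which a small numerical sketch can illustrate. The code below estimates \(\partial_\nu\omega_\mu - \partial_\mu\omega_\nu\) by central differences for a hypothetical 1-form on \(\mathbb{R}^3\) and another on \(\mathbb{R}^4\). The component functions, the evaluation points, and the step size are all assumptions of the example.

```python
def partial(f, x, nu, h=1e-5):
    """Central-difference estimate of the partial derivative of f along coordinate nu at x."""
    up, down = list(x), list(x)
    up[nu] += h
    down[nu] -= h
    return (f(up) - f(down)) / (2 * h)

def d_of_one_form(components, x):
    """Antisymmetric array A[nu][mu] = partial_nu omega_mu - partial_mu omega_nu at x.

    Since d(omega) = (1/2) A[nu][mu] dx^nu ^ dx^mu, the coefficient of
    dx^nu ^ dx^mu for nu < mu is A[nu][mu].
    """
    n = len(components)
    jac = [[partial(components[mu], x, nu) for mu in range(n)] for nu in range(n)]
    return [[jac[nu][mu] - jac[mu][nu] for mu in range(n)] for nu in range(n)]

# R^3 example with components (y z, x^2, x y z), chosen arbitrarily.
omega_3 = [lambda x: x[1] * x[2], lambda x: x[0] ** 2, lambda x: x[0] * x[1] * x[2]]
A = d_of_one_form(omega_3, [1.0, 2.0, 3.0])
# Coefficients of dy^dz, dz^dx, dx^dy; by hand these are (x z, y - y z, 2 x - z).
print(A[1][2], A[2][0], A[0][1])

# R^4 example: the same recipe yields 4*3/2 = 6 independent coefficients.
omega_4 = [lambda x: x[1], lambda x: 0.0, lambda x: x[0] * x[3], lambda x: 0.0]
B = d_of_one_form(omega_4, [1.0, 2.0, 3.0, 4.0])
independent = [(nu, mu) for nu in range(4) for mu in range(nu + 1, 4)]
print(len(independent), [B[nu][mu] for nu, mu in independent])
```

In three dimensions the three independent entries can be read as a vector, while in four dimensions there are six, so the result of \(d\) on a 1-form can no longer be repackaged as a vector field.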

Nilpotency

The antisymmetry of the wedge product automatically antisymmetrizes the derivatives, which is why \(d^2 = 0\): applying \(d\) twice gives \(\partial_\rho\partial_\nu\omega_\mu\) contracted with \(dx^\rho\wedge dx^\nu\wedge\cdots\), and since \(\partial_\rho\partial_\nu = \partial_\nu\partial_\rho\) (equality of mixed partials) but \(dx^\rho\wedge dx^\nu = -dx^\nu\wedge dx^\rho\) (antisymmetry of wedge), every term cancels with its swap. Symmetry in the indices is killed by antisymmetry in the basis: that's the whole argument.

Grad, Curl, and Div

All of grad, curl, and div are special cases of \(d\). To see this explicitly, associate to any vector field \(F = (F_x, F_y, F_z)\) in \(\mathbb{R}^3\) a 1-form \(\omega_F = F_x\,dx + F_y\,dy + F_z\,dz\) and a 2-form \(\sigma_F = F_x\,dy\wedge dz + F_y\,dz\wedge dx + F_z\,dx\wedge dy\).

Grad: apply \(d\) to a 0-form \(f\):

\[ df = \frac{\partial f}{\partial x}dx + \frac{\partial f}{\partial y}dy + \frac{\partial f}{\partial z}dz. \]

The components of \(df\) are exactly the components of \(\nabla f\), packaged as a 1-form.

Curl: apply \(d\) to the 1-form \(\omega_F\):

\[ d\omega_F = \left(\frac{\partial F_z}{\partial y} - \frac{\partial F_y}{\partial z}\right)dy\wedge dz + \left(\frac{\partial F_x}{\partial z} - \frac{\partial F_z}{\partial x}\right)dz\wedge dx + \left(\frac{\partial F_y}{\partial x} - \frac{\partial F_x}{\partial y}\right)dx\wedge dy. \]

The three antisymmetric combinations of partial derivatives are exactly the components of \(\nabla \times F\), packaged as a 2-form.

Div: apply \(d\) to the 2-form \(\sigma_F\):

\[ d\sigma_F = \left(\frac{\partial F_x}{\partial x} + \frac{\partial F_y}{\partial y} + \frac{\partial F_z}{\partial z}\right)dx\wedge dy\wedge dz. \]

The coefficient is exactly \(\nabla \cdot F\), packaged as a 3-form.

The single operation \(d\) — differentiate and wedge — encodes all three classical operations. The reason they look different in vector calculus is that \(d\) acts on different types of forms each time, and the 3D coincidence \(\binom{3}{1}=\binom{3}{2}=3\) allows both 1-forms and 2-forms to masquerade as vector fields. Curl is the special case that relies on this coincidence, which is why it doesn't generalize beyond 3 dimensions.

And the single identity \(d^2 = 0\) unifies what in vector calculus are two separate identities you memorize:

\[ \nabla \times (\nabla f) = 0, \qquad \nabla \cdot (\nabla \times F) = 0. \]

They're not two facts. They're one fact: \(d^2 = 0\).
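To see the second identity emerge explicitly, here is a short sketch using only the formulas already derived. Apply \(d\) to the 2-form \(d\omega_F\) from the curl computation. In each term only the derivative along the missing coordinate survives, and each resulting 3-form is a cyclic (hence even) permutation of \(dx\wedge dy\wedge dz\), so

\[ d(d\omega_F) = \left[\partial_x\left(\partial_y F_z - \partial_z F_y\right) + \partial_y\left(\partial_z F_x - \partial_x F_z\right) + \partial_z\left(\partial_x F_y - \partial_y F_x\right)\right]dx\wedge dy\wedge dz. \]

The bracket is \(\nabla\cdot(\nabla\times F)\), and its six second derivatives cancel in pairs by equality of mixed partials, so \(d(d\omega_F) = 0\).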

13 Pullbacks and Pushforwards

A smooth map \(\phi: M \to N\) between manifolds induces natural maps on tangent and cotangent spaces: a pushforward on vectors and a pullback on forms.

Pushforward

The pushforward \(\phi_*: T_pM \to T_{\phi(p)}N\) takes a tangent vector at \(p \in M\) to a tangent vector at \(\phi(p) \in N\). Recall that a tangent vector is a derivation that acts on smooth functions and returns a number. So \(\phi_*v\) must be a derivation on functions \(g: N \to \mathbb{R}\). The question is: given such a \(g\), how do we use \(v\), which only knows how to differentiate functions on \(M\)?

The answer is to compose: \(g \circ \phi: M \to \mathbb{R}\) pulls \(g\) back to \(M\), where \(v\) can act on it. So the pushforward is defined by:

\[ (\phi_* v)(g) = v(g \circ \phi). \]

This is the only natural definition: the unique way to turn a function on \(N\) into a function on \(M\) using \(\phi\). To get the coordinate formula, apply \(\phi_*v\) to the coordinate function \(x'^\mu\) on \(N\) and use the chain rule:

\[ (\phi_* v)^\mu = (\phi_*v)(x'^\mu) = v(x'^\mu \circ \phi) = v(\phi^\mu) = v^\nu\frac{\partial \phi^\mu}{\partial x^\nu}, \]

where the last step uses \(v = v^\nu\partial_\nu\) acting on \(\phi^\mu\) as a function on \(M\). So the pushforward acts on components by the Jacobian \(\partial\phi^\mu/\partial x^\nu\).
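The two descriptions of the pushforward, as a derivation and as multiplication by the Jacobian, can be compared numerically. The sketch below is illustrative only: the polar-coordinate map `phi`, the test function `g`, the base point, and the finite-difference helpers are all assumptions made for this example, and central differences stand in for exact derivatives.

```python
import math

def phi(r, theta):
    """Polar-to-Cartesian map phi: (r, theta) -> (x, y), chosen for illustration."""
    return (r * math.cos(theta), r * math.sin(theta))

def jacobian(fn, p, h=1e-6):
    """Central-difference estimate of J[mu][nu] = d fn^mu / d x^nu at the point p."""
    rows = []
    for mu in range(len(fn(*p))):
        row = []
        for nu in range(len(p)):
            plus, minus = list(p), list(p)
            plus[nu] += h
            minus[nu] -= h
            row.append((fn(*plus)[mu] - fn(*minus)[mu]) / (2 * h))
        rows.append(row)
    return rows

def pushforward(fn, p, v):
    """Components (phi_* v)^mu = v^nu d phi^mu / d x^nu."""
    J = jacobian(fn, p)
    return [sum(J[mu][nu] * v[nu] for nu in range(len(v))) for mu in range(len(J))]

def directional_derivative(func, p, v, h=1e-6):
    """The number v(func) at p, i.e. the derivative of func along v."""
    plus = [pi + h * vi for pi, vi in zip(p, v)]
    minus = [pi - h * vi for pi, vi in zip(p, v)]
    return (func(*plus) - func(*minus)) / (2 * h)

def g(x, y):
    """An arbitrary smooth test function on the target."""
    return x**2 * y + math.sin(y)

p = (2.0, math.pi / 6)   # the point r = 2, theta = pi/6
v = (0.0, 1.0)           # the vector d/d(theta) at p

w = pushforward(phi, p, v)                                      # Jacobian formula
lhs = directional_derivative(g, phi(*p), w)                     # (phi_* v)(g)
rhs = directional_derivative(lambda r, t: g(*phi(r, t)), p, v)  # v(g o phi)

print(w)         # approximately [-1.0, 1.732...], i.e. (-y, x) at (x, y) = (sqrt(3), 1)
print(lhs, rhs)  # the two numbers agree up to finite-difference error
```

Pushing \(\partial_\theta\) forward gives \(-y\,\partial_x + x\,\partial_y\), the generator of rotations, and the agreement of `lhs` with `rhs` is the defining equation \((\phi_*v)(g) = v(g\circ\phi)\) evaluated at one point.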

Pullback

The pullback \(\phi^*: \Omega^k(N) \to \Omega^k(M)\) takes a \(k\)-form on \(N\) back to a \(k\)-form on \(M\). It is defined by evaluating the form on pushed-forward vectors:

\[ (\phi^*\omega)(v_1,\ldots,v_k) = \omega(\phi_*v_1,\ldots,\phi_*v_k). \]

This is again the only natural definition: \(\omega\) lives on \(N\) and needs tangent vectors at \(\phi(p) \in N\) to act on, and the pushforward supplies exactly those. For a 0-form (smooth function) \(f: N \to \mathbb{R}\), the pullback is simply precomposition: \(\phi^* f = f \circ \phi\). There are no vectors to push forward, just the function composed with the map.
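A worked instance may help here. As an illustrative choice (not taken from the text above), let \(\phi: (r,\theta) \mapsto (x,y) = (r\cos\theta, r\sin\theta)\) be the polar-coordinate map. Feeding the coordinate formula for the pushforward into the definition gives the component rule \((\phi^*\omega)_\nu = \omega_\mu\,\partial\phi^\mu/\partial x^\nu\) for a 1-form, so

\[ \phi^*(dx) = \cos\theta\,dr - r\sin\theta\,d\theta, \qquad \phi^*(dy) = \sin\theta\,dr + r\cos\theta\,d\theta. \]

Because the definition evaluates \(\omega\) on each pushed-forward vector separately, \(\phi^*\) distributes over wedge products, and

\[ \phi^*(dx\wedge dy) = (\cos\theta\,dr - r\sin\theta\,d\theta)\wedge(\sin\theta\,dr + r\cos\theta\,d\theta) = r\cos^2\theta\,dr\wedge d\theta - r\sin^2\theta\,d\theta\wedge dr = r\,dr\wedge d\theta. \]

The factor \(r\) is the Jacobian determinant familiar from changing variables to polar coordinates.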

Coordinate Changes as a Special Case

A coordinate change \(x \to x'\) is a smooth map \(\phi: U \to U'\) between open sets. The transformation laws of § are exactly the pullback and pushforward of \(\phi\): contravariant components transform by the pushforward Jacobian \(\partial x'^\mu/\partial x^\nu\), and covariant components transform by the pullback inverse Jacobian \(\partial x^\nu/\partial x'^\mu\).

Pullback Commutes with \(d\)

A fundamental property of the exterior derivative is that it commutes with pullback: for any smooth map \(\phi: M \to N\) and any differential form \(\omega\) on \(N\),

\[ \phi^*(d\omega) = d(\phi^*\omega). \]

This says that differentiating and then pulling back gives the same result as pulling back and then differentiating. It reflects the fact that \(d\) is a purely smooth operation, intrinsic to the manifold structure and independent of any particular choice of map or coordinates.

The proof reduces to the 0-form case. For a smooth function \(f: N \to \mathbb{R}\) and a tangent vector \(v \in T_pM\):

\[ \phi^*(df)(v) = df(\phi_*v) = (\phi_*v)(f) = v(f\circ\phi) = d(f\circ\phi)(v) = d(\phi^*f)(v), \]

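where each step uses in turn: the definition of pullback, the definition of \(df\), the definition of pushforward, the definition of \(d\) on a 0-form, and \(\phi^*f = f\circ\phi\). So \(\phi^*(df) = d(\phi^*f)\). For a general \(k\)-form, write it locally as a sum of terms \(f\,dx^{\mu_1}\wedge\cdots\wedge dx^{\mu_k}\). Since \(\phi^*\) commutes with \(\wedge\) (which follows directly from its definition) and with sums, the 0-form case gives \(\phi^*(f\,dx^{\mu_1}\wedge\cdots\wedge dx^{\mu_k}) = (\phi^*f)\,d(\phi^*x^{\mu_1})\wedge\cdots\wedge d(\phi^*x^{\mu_k})\). Applying \(d\) with the Leibniz rule, every term in which \(d\) hits a factor \(d(\phi^*x^{\mu_j})\) vanishes because \(d^2 = 0\), leaving \(d(\phi^*f)\wedge d(\phi^*x^{\mu_1})\wedge\cdots\wedge d(\phi^*x^{\mu_k})\). By the 0-form case again, this equals \(\phi^*(df\wedge dx^{\mu_1}\wedge\cdots\wedge dx^{\mu_k})\), which is \(\phi^*\) applied to \(d\) of the original term. Linearity then extends the identity to all forms.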
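The identity can be checked by hand in a small case. As an illustrative example (the map and the form are chosen for this sketch), take the polar-coordinate map \(\phi(r,\theta) = (r\cos\theta, r\sin\theta)\) and \(\omega = x\,dy\), so that \(d\omega = dx\wedge dy\). Pulling back first,

\[ \phi^*\omega = r\cos\theta\,(\sin\theta\,dr + r\cos\theta\,d\theta) = r\cos\theta\sin\theta\,dr + r^2\cos^2\theta\,d\theta, \]

and then differentiating,

\[ d(\phi^*\omega) = r(\cos^2\theta - \sin^2\theta)\,d\theta\wedge dr + 2r\cos^2\theta\,dr\wedge d\theta = r\,dr\wedge d\theta. \]

In the other order, \(\phi^*(d\omega) = \phi^*(dx)\wedge\phi^*(dy) = (\cos\theta\,dr - r\sin\theta\,d\theta)\wedge(\sin\theta\,dr + r\cos\theta\,d\theta) = r\,dr\wedge d\theta\), which is the same 2-form.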

Restriction to a Submanifold

A submanifold \(S \subset M\) is a subset that is itself a manifold, with the smooth structure inherited from \(M\); a curve or surface sitting inside \(\mathbb{R}^3\) is the standard example. The natural map is the inclusion \(\zeta: S \hookrightarrow M\), which sends each point of \(S\) to itself viewed as a point of \(M\). The pullback \(\zeta^*\omega\) is then the restriction of \(\omega\) to \(S\): it takes vectors tangent to \(S\), pushes them forward into \(M\) via \(\zeta_*\), and evaluates \(\omega\) on the result.

For a coordinate held fixed on \(S\), say \(x^i = b_i\), the inclusion map sends \((x^1,\ldots,x^{i-1},x^{i+1},\ldots,x^k) \mapsto (x^1,\ldots,x^{i-1},b_i,x^{i+1},\ldots,x^k)\). Applying the commutation identity with \(\phi = \zeta\) and \(\omega = x^i\) (a 0-form):

\[ \zeta^*(dx^i) = d(x^i \circ \zeta). \]

Now \(x^i \circ \zeta\) is the function that takes a point of \(S\) and returns its \(i\)-th coordinate, but on \(S\), where \(x^i = b_i\), that is always the constant \(b_i\):

\[ d(x^i \circ \zeta) = d(b_i) = 0. \]

This fact will be useful when we prove the generalized Stokes theorem.

Terminology

There is an asymmetry in direction: vectors go forward with \(\phi\) while forms go backward against \(\phi\), reflecting a fundamental contrast in their nature. A form is a linear functional on vectors, so to evaluate a form from \(N\) at a point in \(M\), you first need to produce a vector in \(N\), which you do by pushing forward. This makes the pullback of forms always well-defined for any smooth map. The pushforward of forms is not generally well-defined: you would need \(\phi\) to be invertible to pull vectors back from \(N\) to \(M\). This directional asymmetry is why the word "pullback" appears so often in differential geometry: forms, metrics, and connections all pull back naturally, while vectors only push forward.

14 The Generalized Stokes Theorem

If \(M\) is a compact oriented \(k\)-dimensional manifold with boundary \(\partial M\), and \(\omega\) is a smooth \((k-1)\)-form on \(M\), then:

\[ \int_{\partial M}\omega = \int_M d\omega. \]

This single statement contains all the classical integral theorems as special cases:

M	ω	Classical name
\([a,b]\)	0-form \(f\)	Fundamental theorem of calculus
Region in \(\mathbb{R}^2\)	1-form	Green's theorem
Surface in \(\mathbb{R}^3\)	1-form	Classical Stokes' theorem
Volume in \(\mathbb{R}^3\)	2-form	Divergence theorem

They were never different theorems. The proof makes this obvious.

Proof

Step 1: reduce to a coordinate patch. A partition of unity is a collection of smooth functions \(\{\rho_i\}\), one for each coordinate chart, satisfying \(0 \leq \rho_i \leq 1\), each \(\rho_i\) vanishing outside its chart, and \(\sum_i \rho_i = 1\) everywhere. This lets us write \(\omega = \sum_i \rho_i\,\omega\), where each piece \(\rho_i\,\omega\) is supported entirely within a single chart. Since the integral is linear, it suffices to prove the theorem for each piece, so assume \(\omega\) is supported in one chart with coordinates \(x^1,\ldots,x^k\).

Step 2: write out both sides. In the chart, a general \((k-1)\)-form on a \(k\)-dimensional manifold can be written as:

\[ \omega = \sum_i (-1)^{i-1} f_i\, dx^1\wedge\cdots\wedge\widehat{dx^i}\wedge\cdots\wedge dx^k, \]

where the hat means that factor is omitted. The sign \((-1)^{i-1}\) is inserted into an otherwise standard expansion (its purpose will become clear momentarily): when we apply \(d\), the new \(dx^i\) lands at the left and must be moved \(i-1\) positions rightward to reach its natural place, picking up a factor of \((-1)^{i-1}\) from the wedge antisymmetry. The inserted sign exactly cancels this. To see this explicitly, the \(i\)-th term after applying \(d\) is:

\[ (-1)^{i-1}\frac{\partial f_i}{\partial x^j}dx^j\wedge dx^1\wedge\cdots\wedge\widehat{dx^i}\wedge\cdots\wedge dx^k. \]

For \(j \neq i\), the factor \(dx^j\) already appears in the sequence, giving zero by antisymmetry. Only \(j = i\) survives, and moving \(dx^i\) past \(i-1\) factors costs \((-1)^{i-1}\), which cancels the prefactor:

\[ d\omega = \sum_i \frac{\partial f_i}{\partial x^i}\, dx^1\wedge\cdots\wedge dx^k. \]

The antisymmetry of the wedge product kills all off-diagonal terms \(\partial_j f_i\) with \(j \neq i\); only the diagonal \(\partial_i f_i\) terms survive.
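For orientation, the case \(k = 2\) can be written out in full, as a sketch in the notation of this step. The general 1-form is \(\omega = f_1\,dx^2 - f_2\,dx^1\), and

\[ d\omega = \partial_1 f_1\,dx^1\wedge dx^2 + \partial_2 f_1\,dx^2\wedge dx^2 - \partial_1 f_2\,dx^1\wedge dx^1 - \partial_2 f_2\,dx^2\wedge dx^1 = \left(\partial_1 f_1 + \partial_2 f_2\right)dx^1\wedge dx^2. \]

The two middle terms vanish because a basis covector is repeated, and the last term changes sign when \(dx^2\wedge dx^1\) is reordered, which is the cancellation of the inserted \((-1)^{i-1}\) for \(i = 2\).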

Step 3: apply the fundamental theorem. Integrating \(d\omega\) over \(M\), we handle each term in the sum separately. For the \(i\)-th term, integrate over \(x^i\) first using the fundamental theorem of calculus:

\[ \int_{a_i}^{b_i} \frac{\partial f_i}{\partial x^i}\,dx^i = f_i\big|_{x^i=b_i} - f_i\big|_{x^i=a_i}. \]

Integrating the remaining \(k-1\) variables over their ranges gives:

\[ \int_M d\omega = \sum_i \int \left[f_i\big|_{x^i=b_i} - f_i\big|_{x^i=a_i}\right] dx^1\cdots\widehat{dx^i}\cdots dx^k. \]

Each term is now an integral over a \((k-1)\)-dimensional face of the boundary, specifically the two faces where \(x^i\) takes its endpoint values.

Step 4: recognize the boundary. Recall from Step 1 that we reduced to a single coordinate patch, which locally looks like a box in \(\mathbb{R}^k\) with coordinates \(x^1,\ldots,x^k\) each ranging over some interval \([a_i, b_i]\). The boundary of this box consists of \(2k\) faces, pairs of \((k-1)\)-dimensional sides where one coordinate is held fixed at its endpoint. In \(\mathbb{R}^3\) for example: if \(M\) is a 2D patch (a piece of surface), its boundary faces are 4 edges; if \(M\) is a 1D patch, its boundary faces are 2 endpoints.

The restriction of \(\omega\) to the face \(x^i = b_i\) is precisely the pullback \(\zeta^*\omega\), where \(\zeta\) is the inclusion of that face into \(M\). By the commutation identity of §, \(\zeta^*(dx^j) = d(x^j \circ \zeta)\). For \(j \neq i\), \(x^j \circ \zeta = x^j\) (those coordinates are free on the face), so \(\zeta^*(dx^j) = dx^j\). For \(j = i\), \(x^i \circ \zeta = b_i\) is constant, so \(\zeta^*(dx^i) = d(b_i) = 0\). Every term in \(\omega\) except the \(i\)-th contains \(dx^i\) and is therefore killed by \(\zeta^*\). The surviving term is:

\[ \zeta^*\omega\big|_{x^i=b_i} = (-1)^{i-1}f_i(x^1,\ldots,b_i,\ldots,x^k)\,dx^1\wedge\cdots\wedge\widehat{dx^i}\wedge\cdots\wedge dx^k. \]

Induced orientation. The boundary \(\partial M\) inherits an orientation from \(M\) by the following convention: a basis \((w_1,\ldots,w_{k-1})\) for \(T_p(\partial M)\) is positively oriented if \((n, w_1,\ldots,w_{k-1})\) is positively oriented in \(T_pM\), where \(n\) is any outward-pointing tangent vector (no metric is needed, since only the direction out of \(M\) matters). On the face \(x^i = b_i\), outward is the \(+x^i\) direction, so the induced orientation agrees with \(dx^1\wedge\cdots\wedge\widehat{dx^i}\wedge\cdots\wedge dx^k\) up to the sign \((-1)^{i-1}\), exactly the sign already present in \(\zeta^*\omega\). On the face \(x^i = a_i\), outward is the \(-x^i\) direction, reversing the orientation and contributing a minus sign. This is precisely the \(\pm\) structure in Step 3.

Integrating \(\zeta^*\omega\) over all \(2k\) faces and summing gives \(\int_{\partial M}\omega\), which is term-by-term identical to what Step 3 produced. \(\square\)
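The bookkeeping is easiest to see for \(k = 2\). As a sketch in the notation of the proof, take \(\omega = f_1\,dx^2 - f_2\,dx^1\) on the rectangle \([a_1,b_1]\times[a_2,b_2]\). The induced orientation traverses the boundary counterclockwise, so the four edges contribute

\[ \int_{\partial M}\omega = \underbrace{-\int_{a_1}^{b_1} f_2(x^1,a_2)\,dx^1}_{\text{bottom}} + \underbrace{\int_{a_2}^{b_2} f_1(b_1,x^2)\,dx^2}_{\text{right}} + \underbrace{\int_{a_1}^{b_1} f_2(x^1,b_2)\,dx^1}_{\text{top}} - \underbrace{\int_{a_2}^{b_2} f_1(a_1,x^2)\,dx^2}_{\text{left}}. \]

On the bottom and top edges \(dx^2\) pulls back to zero, and on the left and right edges \(dx^1\) does; the top and left edges are traversed in the direction of decreasing coordinate, which flips their signs. Regrouping gives

\[ \int_{\partial M}\omega = \int_{a_2}^{b_2}\left[f_1(b_1,x^2) - f_1(a_1,x^2)\right]dx^2 + \int_{a_1}^{b_1}\left[f_2(x^1,b_2) - f_2(x^1,a_2)\right]dx^1, \]

which is the \(k = 2\) case of the Step 3 expression for \(\int_M d\omega\).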

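The whole proof is the fundamental theorem of calculus, applied once per coordinate direction. The exterior derivative is built precisely so that the antisymmetry of the wedge product kills the off-diagonal derivative terms and leaves only those that integrate to boundary values. Stokes' theorem also ties \(d^2 = 0\) to the geometric fact that a boundary has no boundary: applying the theorem twice gives \(\int_M d(d\omega) = \int_{\partial M} d\omega = \int_{\partial(\partial M)}\omega = 0\), since \(\partial(\partial M)\) is empty. In that sense \(d^2 = 0\) and Stokes' theorem are two sides of the same coin.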

We have assumed throughout that the local pieces from the partition of unity sum back up cleanly to give the full global picture, specifically that boundary contributions from overlapping charts cancel consistently. This is intuitively clear but requires careful verification; a rigorous treatment is given in Lee, Introduction to Smooth Manifolds.
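A numerical sanity check of the theorem in its simplest nontrivial case (Green's theorem on the unit square) can be sketched as follows. Everything specific here is an assumption made for illustration: the coefficient functions `P` and `Q` of the 1-form \(\omega = P\,dx + Q\,dy\), the midpoint-rule integrator, and the finite-difference derivative. It is a check on one example, not a proof.

```python
import math

def P(x, y):
    """dx-coefficient of the illustrative 1-form omega = P dx + Q dy."""
    return -y * x**2

def Q(x, y):
    """dy-coefficient of the same 1-form."""
    return x * math.sin(y)

def d_omega_coeff(x, y, h=1e-5):
    """Coefficient of dx^dy in d(omega): dQ/dx - dP/dy, by central differences."""
    dQdx = (Q(x + h, y) - Q(x - h, y)) / (2 * h)
    dPdy = (P(x, y + h) - P(x, y - h)) / (2 * h)
    return dQdx - dPdy

def integrate(fn, a, b, n=200):
    """Midpoint rule on [a, b] with n subintervals."""
    h = (b - a) / n
    return h * sum(fn(a + (i + 0.5) * h) for i in range(n))

# Right-hand side: integral of d(omega) over the square [0, 1] x [0, 1].
interior = integrate(lambda x: integrate(lambda y: d_omega_coeff(x, y), 0.0, 1.0), 0.0, 1.0)

# Left-hand side: integral of omega over the four edges, counterclockwise.
bottom = integrate(lambda x: P(x, 0.0), 0.0, 1.0)   # y = 0, x increasing, dy = 0
right = integrate(lambda y: Q(1.0, y), 0.0, 1.0)    # x = 1, y increasing, dx = 0
top = -integrate(lambda x: P(x, 1.0), 0.0, 1.0)     # y = 1, x decreasing
left = -integrate(lambda y: Q(0.0, y), 0.0, 1.0)    # x = 0, y decreasing
boundary = bottom + right + top + left

exact = (1.0 - math.cos(1.0)) + 1.0 / 3.0  # integral of sin(y) + x^2 over the square
print(interior, boundary, exact)
```

The three printed numbers agree to within the discretization error. The minus signs on the `top` and `left` edges are the induced-orientation signs from Step 4: those edges are traversed in the direction of decreasing coordinate.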

Outlook: Part 2

This document covers the metric-free story: manifolds, tensors, differential forms, the exterior derivative, pullbacks, and integration, ending with a proof of the generalized Stokes theorem. Everything here requires only a smooth structure and an orientation.

Part 2 will introduce the metric \(g_{\mu\nu}\) and develop what depends on it: the Hodge star \(\star\), connections and curvature, and the Einstein field equations. It will also cover closed and exact forms and de Rham cohomology, which are themselves metric-free, and develop Maxwell's equations in the language of forms, where the split between metric-free and metric-dependent structure has direct physical meaning.