Building a PEF-Aligned LCA Engine from Public Data

Every product has an environmental story: the carbon emitted to forge its steel, the water consumed to grow its cotton, the particulate matter released by the electricity grid that powers its manufacture. The EU's Product Environmental Footprint (PEF) framework attempts to standardize how that story is told across 16 impact categories, with official normalization and weighting factors.

We built an engine that computes it. This post is a technical walkthrough of our LCA engine's architecture: what public datasets feed it, what standards it implements, and what we learned building a multi-source impact calculator that runs entirely without external API calls.

The Standard: EU PEF EF 3.1

The PEF methodology, codified in Commission Recommendation (EU) 2021/2279, defines 16 environmental impact categories — from climate change and water use to human toxicity and ecotoxicity. Each category has three components:

Characterization factors — convert substance emissions (kg SO₂, kg NOₓ) into category-specific impact units (mol H+ eq for acidification, disease incidence for particulate matter)
Normalization factors (NF) — per-capita global annual reference values, allowing cross-category comparison
Weighting factors (WF) — political/scientific consensus on relative importance, summing to 1.0

The single score formula is:

Single Score = Σ (characterizedᵢ / NFᵢ) × WFᵢ

The result is expressed in micro-points (µPt), that is, millionths of a point. Lower is better.

Our engine implements this formula end-to-end — from raw substance inventory through characterization, normalization, weighting, and single-score aggregation.
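
As a rough illustration of the normalization, weighting, and aggregation steps, here is a minimal Python sketch. The weighting factors are the ones quoted later in this post; the normalization factors are round placeholder numbers rather than the official EF 3.1 values, and the function and dictionary names are hypothetical.

```python
# Sketch of PEF normalization, weighting, and single-score aggregation.
# WF values are those quoted in this post, expressed as fractions.
WF = {
    "climate_change": 0.2106,
    "water_use": 0.0851,
    "resource_use_fossils": 0.0832,
}

# PLACEHOLDER normalization factors (impact per person per year).
# Round illustrative numbers, NOT the official EF 3.1 values.
NF = {
    "climate_change": 8000.0,         # kg CO2 eq
    "water_use": 11000.0,             # m3 world eq
    "resource_use_fossils": 65000.0,  # MJ
}


def single_score_upt(characterized: dict[str, float]) -> float:
    """Return the weighted single score in micro-points (µPt)."""
    points = sum(
        value / NF[category] * WF[category]
        for category, value in characterized.items()
    )
    return points * 1e6  # 1 point = 1,000,000 µPt


product = {"climate_change": 8.0, "water_use": 2.2}
print(f"{single_score_upt(product):.1f} µPt")
```

A category missing from the factor tables raises a KeyError here, which seems preferable to silently scoring it as zero.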

Data Sources

We deliberately built on publicly available, peer-reviewed datasets. No proprietary databases, no per-query licensing fees, no black-box factors. Here is what feeds the engine:

EPA Supply Chain GHG Emission Factors v1.3

The US Environmental Protection Agency publishes sector-level cradle-to-gate emission factors keyed by NAICS-6 industry codes. We use these as a spend-based estimation layer: given a product's price and sector classification, we can estimate embedded carbon even when material composition is unknown. We currently use factors for 77 manufacturing sectors.
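
In its simplest form, the spend-based estimate is price multiplied by a sector factor. The sketch below assumes a hypothetical lookup table; the NAICS keys and factor values are placeholders, not figures from the EPA dataset. The published factors are tied to a specific dollar year, so a real implementation would also adjust prices for inflation, which is omitted here.

```python
from typing import Optional

# PLACEHOLDER factors in kg CO2e per USD, keyed by NAICS-6 code.
# Codes and values are illustrative, not taken from the EPA dataset.
SECTOR_FACTORS = {
    "111111": 0.40,
    "222222": 0.25,
}


def spend_based_co2e(price_usd: float, naics_code: str) -> Optional[float]:
    """Estimate cradle-to-gate kg CO2e from price and sector, or None if unknown."""
    factor = SECTOR_FACTORS.get(naics_code)
    if factor is None:
        return None  # caller can fall back to another estimation layer
    return price_usd * factor


print(spend_based_co2e(500.0, "111111"))
```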

UK DEFRA/BEIS Conversion Factors (2024)

The UK Department for Environment, Food & Rural Affairs publishes annual greenhouse gas conversion factors for materials, transport modes, and energy sources. We use their material-level factors (40 materials, kg CO₂e per kg), transport emission intensities (8 modes, kg CO₂e per tonne-km), and international shipping distances.
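
To show how material and transport factors of this shape combine, here is a small sketch. All factor values are placeholders rather than DEFRA figures, and the function names are hypothetical.

```python
# PLACEHOLDER factors; real values would come from the published tables.
MATERIAL_FACTORS = {"steel": 2.0, "cotton": 5.0}  # kg CO2e per kg
TRANSPORT_FACTORS = {"sea": 0.01, "road": 0.1}    # kg CO2e per tonne-km


def material_co2e(materials_kg: dict[str, float]) -> float:
    """Sum mass times factor over all identified materials."""
    return sum(kg * MATERIAL_FACTORS[name] for name, kg in materials_kg.items())


def transport_co2e(weight_kg: float, legs: list[tuple[str, float]]) -> float:
    """legs is a list of (mode, distance_km) pairs."""
    tonnes = weight_kg / 1000.0
    return sum(tonnes * km * TRANSPORT_FACTORS[mode] for mode, km in legs)


total = material_co2e({"steel": 2.0}) + transport_co2e(
    2.0, [("sea", 10000.0), ("road", 500.0)]
)
print(round(total, 3))
```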

IPCC AR6 Global Warming Potentials (GWP₁₀₀)

The Sixth Assessment Report from the Intergovernmental Panel on Climate Change provides the canonical 100-year global warming potentials we use for greenhouse gas equivalency: CO₂ = 1, CH₄ = 29.8 (fossil origin), N₂O = 273, SF₆ = 25,200, and so on.
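
Converting a greenhouse gas inventory to CO₂e is a weighted sum over these potentials. A minimal sketch using the values quoted above (the function name is hypothetical):

```python
# GWP100 values as quoted in this post (kg CO2e per kg of gas).
GWP100 = {"CO2": 1.0, "CH4": 29.8, "N2O": 273.0, "SF6": 25200.0}


def to_co2e(inventory_kg: dict[str, float]) -> float:
    """Convert a {gas: kg emitted} inventory to kg CO2e."""
    return sum(kg * GWP100[gas] for gas, kg in inventory_kg.items())


print(round(to_co2e({"CO2": 2.0, "CH4": 1.0}), 1))  # 2*1 + 1*29.8 = 31.8
```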

EPA GHG Emission Factors Hub

Grid-average electricity emission factors for US regions and international electricity grids. We currently model 13 country-specific grids covering roughly 70% of global manufacturing output.

EF 3.1 Characterization Method (JRC, 2022)

The Joint Research Centre of the European Commission publishes the EF 3.1 characterization factors that map individual substances to their impacts across all 16 PEF categories.

Our implementation covers 45 substances — the most common air emissions (CO₂, CH₄, N₂O, SO₂, NOₓ, PM₂.₅, NH₃, NMVOC), water pollutants (PO₄), ozone-depleting substances (CFC-11, CFC-12, HCFC-22), heavy metals (Hg, Pb, Cd), and resource indicators (fossil energy, mineral ores, water consumption).

EU PEF Normalization & Weighting Reference

Official normalization factors (global per-capita annual impacts) and weighting factors from Annex I of Commission Recommendation (EU) 2021/2279. Climate change carries the highest weight at 21.06%, followed by particulate matter (8.96%), water use (8.51%), and resource use of fossils (8.32%).

Data at a glance

77 NAICS sectors
45 tracked substances
16 PEF categories
40 material factors
13 electricity grids
31 product templates

Architecture: Multi-Layer Impact Estimation

A core design challenge in product-level LCA is data availability. A detailed Bill of Materials with supplier-specific emission data is the gold standard — but for consumer products scraped from a URL, you are working with a product title, a price, and maybe a weight.

Our engine addresses this with a multi-layer estimation architecture. Multiple independent calculation methods run in parallel, and the results are blended based on data availability and confidence:

Layer 1 — Spend-based. NAICS sector classification + EPA EEIO factors. Works for any product with a price.
Layer 2 — Activity-based. Material decomposition + DEFRA factors + transport modeling + manufacturing energy. Requires identified materials and weight.
Layer 3 — Reference lookup. Category-level reference values from published product environmental reports and industry studies.

When all layers produce results, they are blended with confidence-based weighting. When materials are unknown, the engine gracefully degrades to spend-based estimation. This ensures every product gets a score — the confidence level just varies.
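
The exact blending formula is not spelled out here, so the following is only one plausible reading: a confidence-weighted mean over whichever layers returned a result. The LayerEstimate type, the 0 to 1 confidence scale, and the example numbers are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LayerEstimate:
    name: str
    kg_co2e: float
    confidence: float  # assumed scale: 0.0 (no trust) to 1.0 (full trust)


def blend(estimates: list[Optional[LayerEstimate]]) -> float:
    """Confidence-weighted mean over the layers that produced a result."""
    usable = [e for e in estimates if e is not None and e.confidence > 0]
    if not usable:
        raise ValueError("no layer produced a usable estimate")
    total_weight = sum(e.confidence for e in usable)
    return sum(e.kg_co2e * e.confidence for e in usable) / total_weight


layers = [
    LayerEstimate("spend", 10.0, 0.2),
    LayerEstimate("activity", 20.0, 0.6),
    None,  # reference lookup found nothing for this product
]
print(round(blend(layers), 2))
```

With only the spend-based layer present, the same function simply returns that layer's estimate, which matches the degradation behaviour described above.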

Multi-Substance Emission Inventory

Most carbon calculators stop at CO₂e. We go further.

The engine builds a multi-substance emission inventory for each product, tracking individual pollutants across the full lifecycle. Each material, energy source, and transport mode maps to a substance-level emission profile:

Steel production emits not just CO₂, but SO₂, NOₓ, and PM₂.₅
Cotton cultivation releases NH₃ and N₂O
A Chinese electricity grid emits different proportions of SO₂ and particulate matter than a French one

These individual substance quantities are then characterized against all 16 PEF categories simultaneously. A single kilogram of NOₓ contributes to acidification, terrestrial eutrophication, marine eutrophication, photochemical ozone formation, and particulate matter formation. The engine computes all of these cross-category impacts.
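
Characterization can be pictured as a sparse substance-by-category matrix applied to the inventory. The sketch below uses placeholder factors, not the EF 3.1 values, and hypothetical category keys.

```python
from collections import defaultdict

# PLACEHOLDER characterization factors: substance -> category -> impact per kg.
# Values are illustrative only, not the published EF 3.1 factors.
CF = {
    "NOx": {
        "acidification": 0.7,
        "eutrophication_terrestrial": 4.0,
        "photochemical_ozone_formation": 1.0,
    },
    "SO2": {
        "acidification": 1.3,
        "particulate_matter": 1e-5,
    },
}


def characterize(inventory_kg: dict[str, float]) -> dict[str, float]:
    """Map a {substance: kg} inventory to {category: characterized impact}."""
    totals: defaultdict[str, float] = defaultdict(float)
    for substance, kg in inventory_kg.items():
        # Substances without factors contribute nothing in this sketch.
        for category, factor in CF.get(substance, {}).items():
            totals[category] += kg * factor
    return dict(totals)


print(characterize({"NOx": 2.0, "SO2": 1.0}))
```

Note that both NOₓ and SO₂ feed the acidification total, which is the cross-category effect described above.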

Full Lifecycle Coverage

The engine models five lifecycle phases:

Raw Materials — Extraction and processing of each identified material, using material-specific emission profiles.
Manufacturing — Assembly energy based on product weight, applied through the country-of-origin electricity grid.
Transport — International sea freight + domestic road distribution, with route-specific distances.
Use Phase — Lifetime energy consumption for electronics and appliances, based on product-category templates with annual energy draw and expected lifespan.
End of Life — Landfill, incineration, and recycling pathways with treatment-specific emission profiles and recycling credits (avoided burden method).

For a refrigerator, the use phase dominates — thousands of kWh over a 15-year lifespan. For a t-shirt, it is raw materials and washing energy. The lifecycle template system ensures each product category gets the right proportional emphasis.
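
The refrigerator case can be made concrete with a small sketch. Every number below is made up for illustration, and the phase keys and function names are hypothetical.

```python
def use_phase_co2e(annual_kwh: float, lifespan_years: float,
                   grid_kg_per_kwh: float) -> float:
    """Lifetime use-phase emissions: annual energy x lifespan x grid intensity."""
    return annual_kwh * lifespan_years * grid_kg_per_kwh


def dominant_phase(phases: dict[str, float]) -> str:
    """Return the lifecycle phase with the largest contribution."""
    return max(phases, key=phases.get)


# Illustrative kg CO2e per phase for a hypothetical refrigerator.
fridge = {
    "raw_materials": 250.0,
    "manufacturing": 60.0,
    "transport": 20.0,
    "use": use_phase_co2e(300.0, 15, 0.4),
    "end_of_life": -15.0,  # net recycling credit (avoided burden)
}

print(dominant_phase(fridge), round(sum(fridge.values()), 1))
```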

Scoring & Grading

The final output is a 0-100 score with a letter grade (A+ through F).

The scoring uses the official EU PEF weighting factors, dynamically renormalized across whichever impact categories have data for a given product. Products with richer data get scored across more categories; products with sparse data fall back to the core three (climate change, water use, resource use of fossils).
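
One straightforward way to renormalize is to divide each available category's weight by the sum of the available weights. The sketch below uses the four weights quoted earlier in this post; the function name is hypothetical, and the mapping from the resulting score to the 0-100 scale and letter grade is not shown.

```python
# Weighting factors in percent, as quoted in this post.
PEF_WF = {
    "climate_change": 21.06,
    "particulate_matter": 8.96,
    "water_use": 8.51,
    "resource_use_fossils": 8.32,
}


def renormalize(available: list[str]) -> dict[str, float]:
    """Rescale weights so the available categories sum to 1.0."""
    total = sum(PEF_WF[c] for c in available)
    return {c: PEF_WF[c] / total for c in available}


core = renormalize(["climate_change", "water_use", "resource_use_fossils"])
print({c: round(w, 4) for c, w in core.items()})
```

For the core three, climate change ends up with 21.06 / 37.89, or roughly 56% of the total weight.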

Every analysis also produces:

The raw PEF single score in µPt
Per-category breakdowns with characterized, normalized, and weighted values
Coverage statistics showing exactly how many of the 16 categories were computed from real emission data
Lifecycle phase indicators showing which phases (materials, manufacturing, transport, use, end-of-life) contributed to the assessment

Validation

Automated impact calculations can produce outliers. The engine includes a validation layer that checks results against category-specific benchmarks:

Mass balance verification — do material weights sum to the declared product weight?
Impact range checks — flagging results that deviate significantly from published benchmarks
Outlier capping — preventing runaway estimates from propagating to user-facing scores
Confidence assignment — high, medium, or low based on data completeness and consistency
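
Two of these checks are simple enough to sketch. The tolerance, the capping ratio, and the function names below are assumptions chosen for illustration, not the engine's actual thresholds.

```python
def mass_balance_ok(materials_kg: dict[str, float], declared_kg: float,
                    tolerance: float = 0.05) -> bool:
    """True if material weights sum to the declared weight within a tolerance."""
    return abs(sum(materials_kg.values()) - declared_kg) <= tolerance * declared_kg


def cap_outlier(value: float, benchmark: float,
                max_ratio: float = 10.0) -> tuple[float, bool]:
    """Cap value at max_ratio x benchmark; return (value, was_capped)."""
    ceiling = benchmark * max_ratio
    if value > ceiling:
        return ceiling, True
    return value, False


print(mass_balance_ok({"steel": 0.9, "plastic": 0.12}, declared_kg=1.0))
print(cap_outlier(500.0, benchmark=20.0))
```

Returning the flag alongside the capped value lets the caller lower the confidence level rather than hide the anomaly.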

What We Learned

Public data is good enough. The combination of EPA EEIO, DEFRA, IPCC AR6, and EF 3.1 characterization covers the vast majority of consumer product impact categories. You do not need an ecoinvent license to build a useful product-level LCA engine.

Multi-source blending matters more than single-source precision. No single dataset covers every product. The ability to gracefully combine spend-based, activity-based, and reference-lookup estimates — and to communicate the confidence of each — is more valuable than having a single high-precision source.

PEF is more than CO₂. The 16-category framework reveals impacts that a CO₂-only metric misses entirely. A cotton t-shirt looks moderate on carbon but heavy on water use and eutrophication. A smartphone is carbon-intensive but negligible on land use. Collapsing everything to a single CO₂e number erases these distinctions.

Lifecycle phase modeling changes the answer. For appliances, adding use-phase energy can flip the ranking. A cheap, inefficient refrigerator can have a significantly higher lifetime impact than an efficient one, even if their manufacturing footprints are similar.

This engine is the foundation of the GreenMetric.AI scoring API. We are continuing to expand substance coverage, improve uncertainty quantification, and add regional specificity. If you are building sustainability tooling and want to integrate product-level environmental scoring, check out our API documentation.