Burning embers: towards more transparent and robust climate-change risk assessments

The Intergovernmental Panel on Climate Change (IPCC) reports provide policy-relevant insights about climate impacts, vulnerabilities and adaptation through a process of peer-reviewed literature assessments underpinned by expert judgement. An iconic output from these assessments is the burning embers diagram, first used in the Third Assessment Report to visualize reasons for concern, which aggregate climate-change-related impacts and risks to various systems and sectors. These burning embers use colour transitions to show changes in the assessed level of risk to humans and ecosystems as a function of global mean temperature. In this Review, we outline the history and evolution of the burning embers and associated reasons for concern framework, focusing on the methodological approaches and advances. While the assessment framework and figure design have been broadly retained over time, refinements in methodology have occurred, including the consideration of different risks, use of confidence statements, more formalized protocols and standardized metrics. Comparison across reports reveals that the risk level at a given temperature has generally increased with each assessment cycle, reflecting accumulating scientific evidence. For future assessments, an explicit, transparent and systematic process of expert elicitation is needed to enhance comparability, quality and credibility of burning embers.

Burning embers figures are used to represent climate-change risk and their transitions. This Review outlines the history and evolution of the burning embers concept, focusing on methodological shifts that increase transparency and allow for a more systematic elicitation process in Intergovernmental Panel on Climate Change (IPCC) reports.

Key points
• The Intergovernmental Panel on Climate Change (IPCC) has used the reasons for concern framework and burning embers diagrams since 2001 to assess and communicate risks from increasing global mean temperature on human and natural systems.
• The framework and burning embers diagrams are developed using expert judgement, based on available information about climate impacts.
• While assessment methods and figure design have been broadly retained across IPCC reports, risk levels at given temperatures have generally increased over time, owing to more comprehensive science.
• Structured expert-elicitation methods can reduce bias and increase reproducibility by specifying the process for selecting experts, providing external information, eliciting individual and consensus judgements, and facilitating group interaction.
• Recent IPCC Special Reports introduced formal protocols and standardized metrics to elicit risk thresholds. Despite challenges, these changes contributed to transparency and reliability.
• Further development and use of standardized and transparent methods to elicit risk thresholds and build burning embers diagrams will continue to increase the robustness and credibility of IPCC assessments.

The IPCC Third Assessment Report (TAR) introduced the reasons for concern (RFC) framework 7,8 . In this framework, risks from global mean temperature (GMT) rise are aggregated into five categories: risks to unique and threatened systems; risks associated with extreme weather events; risks associated with the distribution of impacts; risks associated with global aggregate impacts; and risks associated with large-scale singular events 9 . Risk levels are identified using expert judgement, based on available information about relevant hazards, exposure, vulnerability, capacity to adapt and impacts 10 . RFCs are subsequently visualized as burning embers diagrams (hereafter, burning embers), with risk levels expressed as different colours and uncertainties about precise changes conveyed through graded colour transitions (Fig. 1).
Both the RFC framework and burning embers are now widely used tools to communicate risks related to anthropogenic climate change in a way that can support both public discussions and policy decisions. For example, the goal of the Paris Agreement to hold global average temperature "well below 2°C" (ref. 11) was informed by IPCC reports that indicated increasingly high risks and limited adaptive capacity beyond warming of 1.5 °C or 2 °C (refs 5,6). As governments update their Nationally Determined Contributions and work to develop disaster-risk-reduction strategies mandated by the Sendai Framework for Disaster Risk Reduction, accurate understanding of climate-related risks continues to be critical for decision-making. Indeed, at the 25th UNFCCC Conference of Parties, governments called for increased support on risk assessments, including those from climate-change impacts. Accordingly, the Sixth IPCC Assessment Report (AR6) will include a chapter on risks across sectors and regions, with potential updates to the RFCs and the burning embers 12 , feeding into the first Global Stocktake of the Paris Agreement in 2023.
However, the RFC framework and burning embers have also been the subject of criticism [13][14][15][16][17][18] , with claims that their production should be more systematic, transparent and comparable across reports 19 . Early assessments, for example, did not detail methods (particularly with regard to the assignment of risk thresholds), raising concerns about the reliability and reproducibility of the burning embers, potentially undermining confidence in the robustness of the RFC framework for policymaking. Some critics have further argued that expert judgement in IPCC reports has been implemented inconsistently across chapters and reports 14 . Critics have, thus, called for a more formalized approach integrating numerical modelling with expert judgement, in addition to indicating the full range of assessments made and any disagreements 20 . Despite improved guidance from the IPCC on the role of expert judgement and uncertainty language [21][22][23] , questions about the RFCs and the burning embers remain.
In this Review, we trace the evolution of the RFC framework and burning embers, using published literature and author experience in IPCC assessments to examine changes in methods and expert-elicitation processes. After outlining the history, we highlight relevant lessons from other fields and document efforts to develop a more systematic approach to estimating risk transitions, as used in the IPCC Special Report on Climate Change and Land (SRCCL) 24 and the Special Report on the Ocean and Cryosphere in a Changing Climate (SROCC) 25 . We subsequently discuss possible improvements to expert consensus for future IPCC cycles. Readers are referred to other reviews for in-depth discussion of elicitation methods [26][27][28][29][30][31][32][33] .

History
We begin by outlining the history of the RFC framework and burning embers, highlighting changes in methods, design and risk thresholds.

Emergence of the RFC framework and burning embers.
RFCs and the burning embers first appeared in the Working Group II (WGII) contribution to the TAR (Chapter 19) (ref. 7). During their development, the multidisciplinary author team (which included physical, biophysical and social scientists) used peer-reviewed literature to debate and assess the level of risk from climate change relevant to dangerous anthropogenic interference. In doing so, it was determined that a single metric or measure would not be sufficient to capture the diversity of risks relevant to policy discussions: concerns included not only global economic gains or losses but also ecosystem impacts and human populations affected by projected climatic hazards [34][35][36] .

Early drafts of the chapter, therefore, included three lines of evidence regarding increases in GMT, which were later renamed RFCs: impacts on unique and threatened systems (such as tropical glaciers, coral reefs or indigenous communities); global aggregated impacts (such as net damages for market and non-market sectors at the global scale); and large-scale discontinuities, renamed large-scale singular events in later reports (such as the shutdown of the North Atlantic thermohaline circulation or the collapse of the West Antarctic Ice Sheet). These areas were deemed relevant to the UNFCCC Article 2. However, authors later decided that an assessment of aggregate damages alone was insufficient to capture economic impacts of climate change on developing countries. Additionally, significant concern was being raised in policy discussions about the impacts of extreme events. Thus, two additional RFCs were subsequently added: distribution of impacts (including the heightened vulnerability of developing countries) and probability of extreme climate events (including floods, soil-moisture deficits and tropical storms) 7 (J. Smith, personal communication).
For each RFC, future risks were mapped along a scale of GMT rise above pre-industrial to produce the burning embers, with GMT chosen as the metric owing to the availability of relatable impact literature and its direct relation to greenhouse gas concentrations 16 . GMT as used in the TAR, and in this paper, refers to global mean surface temperature change, as also used by all Working Groups in the Fifth Assessment Report (AR5). GMT increase was assessed in increments of whole degrees as small (<2 °C), medium (2-3 °C) and large (>3 °C) 7 .
Expert judgement of the author team, solicited during lead-author meetings and subject to rounds of expert and government review, was subsequently used to determine the transition between different levels of risk; owing to similar levels of warming being reached at different times in the literature, it was not possible to directly detail rates of change or the dynamics of adaptation. Importantly, judgements were based on the authors' expert assessment of the literature, but this process was neither conducted nor documented systematically.
While a greyscale version of the burning embers was included in Chapter 19 of the TAR, the first colour rendition was published in the WGII TAR Summary for Policy Makers 37 (SPM) (Fig. 1). For this figure, the colours were selected to reflect those of a traffic light, a colour scheme that has been used to inform decision-making in fisheries management [38][39][40] , health 41 , the geosciences 42,43 and climate change 44,45 . Instead of green, however, white was used for the lowest level, to avoid giving the impression of safety or absence of risk and, rather, to show neutral or small negative or positive impacts 5,44 . Yellow was used to indicate negative impacts for some systems or low risks that could become evident at the denoted increase in GMT. Red, by contrast, was used to indicate negative impacts or risks that could be more widespread or greater in magnitude 46 . Gradients of shading across the colours reflected the fact that markers did not define absolute thresholds but, rather, an approximate indicator of when impacts might occur 7,47 . Uncertainties about transitions resulted from various sources, including global-warming projections, changes in adaptation or social vulnerability over time, the nature of the impact assessments and expert judgement itself.

Evolution of methods and design.
Since the TAR, the RFC framework and burning embers have been used extensively in IPCC reports and other scientific literature [48][49][50][51] , with refinement and enhancement of both the construction methods and the design (Fig. 2).
For example, the IPCC's Fourth Assessment Report (AR4) (Chapter 19) included a section updating and revising the RFC assessment 52 . In particular, it introduced a more formal criteria-based approach to ensure there was a clear and transparent logic to the choice of specific RFCs 53 . The criteria included: magnitude of impacts; timing of impacts; persistence and reversibility of impacts; potential for adaptation; distributional aspects of impacts and vulnerabilities; likelihood (estimates of uncertainty) of impacts and vulnerabilities, and confidence in those estimates; and importance of the systems at risk. In the end, only the first six criteria were applied to the assessment, as the last was deemed to be in the realm of the judgements of policymakers 3 .
As in the TAR, RFCs produced during the AR4 were based on expert judgement, with a compilation of evidence used to ascertain the location of transitions between risk levels as a function of GMT. Judgements about 'risks of extreme weather events', for instance, were based on the literature presented on the science and impacts of extreme weather events in the AR4 Working Group I (WGI) and WGII reports [54][55][56],53 and in other chapters of the AR4 WGII report 57,58 . For the first time, risks under different warming trajectories and risk to specific sectors were explored. For instance, in Chapter 4, risks to ecosystems were depicted against different levels of global mean annual temperature rise according to two different global-warming trajectories 59 . Colour transitions corresponding to those of the burning embers were applied to visualize risks for impact categories and ecosystems in general 57 . In Chapter 11, burning embers were used to depict vulnerability of key sectors to climate change against different emissions scenarios, highlighting possible coping and adaptive capacity at different levels of temperature increase 58 .
Fig. 1 | Third Assessment Report representation of burning embers and the reasons for concern framework. The first depiction of burning embers to illustrate the impacts or risks from climate change, with each row corresponding to a specific reason for concern. Shading represents the severity of impact or risk: white signifies no or virtually neutral impact or risk, yellow signifies somewhat negative impacts or low risks and red signifies more negative impacts or higher risks. Figure is reproduced from the Technical Summary of the Contribution of Working Group II to the Third Assessment Report; an identical figure, in greyscale, was included in Chapter 19.
Both RFCs and burning embers were also included in the AR5, specifically, the WGII report (Chapter 19 and the SPM) 60,121 and the Synthesis Report 61 . The AR5 differed from its predecessors in that it built on a new risk framework adapted from the 2012 IPCC Special Report on Managing the Risks of Extreme Events and Disasters to Advance Climate Change Adaptation (SREX) 62,63 . This framework more formally expressed risks as a combination of climate stressors or hazards, and ecosystem and societal exposure and vulnerability to these hazards, which vary widely. In the AR5 (and previous assessments), exposure and vulnerability were incorporated implicitly in risk judgements to the extent that the underlying literature took them into account. Autonomous adaptation as well as limits to adaptation were also considered 9,59 . At the time, the literature was not extensive enough to support explicit differentiation of risks along adaptation pathways or specific dimensions of vulnerability and exposure. However, AR5 authors suggested that future risk judgements in RFCs and burning embers should be a function not only of GMT change but also of levels of exposure, vulnerability and adaptation 9,60 .
Using the literature, an initial assessment of risk transitions was made by AR5 Chapter 19 authors for each RFC. These authors subsequently confirmed or revised their judgements based on input from other chapters in the WGII report. That input took the form of a novel identification of 102 key risks (climate-related risks with the potential to become severe and where there is limited ability to adapt), grouped into eight overarching categories and mapped onto the five RFCs. As before, risk transitions were identified and assessed via group consensus. However, the more systematized approach of using key risks to inform RFCs was deemed to allow consistency and transparency, as well as integration of different types of evidence 9 .
The AR5, and a subsequent academic paper, further introduced confidence levels for transitions based on the type, extent and agreement of evidence 9 . These confidence levels complemented the visual representation of uncertainty as blurred boundaries between risk transitions in the burning embers and helped differentiate transition judgements with higher and lower confidence. For example, transitions from undetected to moderate risk for RFC1 and RFC2 (unique and threatened systems, and extreme weather events) were made with high confidence based, in part, on a growing literature on the detection and attribution of impacts. By contrast, judgements about risk transitions for RFC5 (large-scale singular events) were made with medium confidence, given substantial uncertainty in the projection of the timing of ice-sheet loss 9 .
Although the design and aesthetics of the burning embers have been broadly retained across successive IPCC reports, variations have occurred. In concert with the growing literature on detection and attribution of climate impacts, the colour scale of the embers was slightly modified in the AR5 (ref. 60): white now indicated undetectable risk with no impacts that can be attributed to climate change; yellow indicated moderate risk with detectable impacts that could be attributed to climate change with at least medium confidence; red indicated high risk, where risks of severe and widespread impacts are judged to be high on one or more of the specific criteria for key risks; and purple was introduced to indicate very high risk, where all specific criteria for key risks were at very high levels, including irreversibility of an impact and exceedance of adaptation limits.
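The colour logic described above can be illustrated with a minimal Python sketch. It assumes that each transition between adjacent risk levels is drawn as a linear colour blend between a lower and an upper GMT bound, which is one plausible way to render the graded transitions and not a documented IPCC rendering method. The RGB values, the function name `ember_colour` and the transition bounds in the usage note are hypothetical.

```python
# AR5-style risk levels in increasing order, with illustrative RGB colours.
# The RGB values are assumptions, not official IPCC colour specifications.
LEVELS = [
    ("undetectable", (255, 255, 255)),  # white
    ("moderate", (255, 255, 0)),        # yellow
    ("high", (255, 0, 0)),              # red
    ("very high", (128, 0, 128)),       # purple
]


def ember_colour(gmt, transitions):
    """Return the RGB colour of an ember at warming level `gmt` (in °C).

    `transitions` holds one (lower, upper) pair of GMT bounds for each step
    between consecutive risk levels, ordered from lowest to highest.
    """
    colour = LEVELS[0][1]
    steps = zip(transitions, LEVELS, LEVELS[1:])
    for (lower, upper), (_, start), (_, end) in steps:
        if gmt <= lower:
            break  # this transition has not started yet
        if gmt >= upper:
            colour = end  # transition complete; check the next one
            continue
        # Inside the transition band: blend linearly between the two colours.
        frac = (gmt - lower) / (upper - lower)
        colour = tuple(round(s + frac * (e - s)) for s, e in zip(start, end))
        break
    return colour
```

With hypothetical bounds such as `[(0.5, 1.0), (1.5, 2.5), (3.0, 4.0)]`, a warming level of 1.2 °C falls between the first and second transitions and is drawn in plain yellow, whereas 1.75 °C falls a quarter of the way through the yellow-to-red blend.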
Recent Special Reports further introduced refinements and variations to the burning embers and RFC framework. For example, the 2018 IPCC Special Report on Global Warming of 1.5 °C (SR15) 64 contained sector-specific embers, along with aggregated RFC embers. Given the remit of the SR15, risks were only assessed up to relatively low levels of global warming, focusing on present warming of +1.0 °C and future warming, such as +1.5 °C and +2 °C above pre-industrial levels. The two additional Special Reports in the AR6 cycle integrated an analysis of the role of changes in vulnerability, exposure and adaptation for climate risk more explicitly. Building on previous recommendations and illustrations 9,60 , the SRCCL considered differences in vulnerability (including adaptive capacity) and exposure to risk using the emerging literature based on the Shared Socioeconomic Pathways [65][66][67][68] . It also assessed risks associated with certain land-based mitigation actions. The SROCC introduced burning embers illustrating risk reduction in the presence of specific effective adaptation-related responses 69 . Metrics other than GMT, such as sea-level rise, atmospheric CO2 and land area used for bioenergy, were also used on the vertical axis of burning embers figures, building on innovations that first appeared in the AR5 (Fig. 2).

Climate-risk thresholds.
Alongside the evolution of the RFC framework and burning embers has come a corresponding shift in the quantification of climate-risk transitions. In most cases, the risk level at a given temperature has increased with each subsequent assessment, especially between the TAR and the AR4 (refs 51,70) (Fig. 3). A comprehensive determination of the causes of these changes in judgements would require analysis beyond the scope of this paper, but major features of the changes suggest that advances in science (including detection and attribution) and broadening of the available literature explain most of the differences between assessments 60 . Indeed, for RFC3 (distribution of impacts), it is thought that new knowledge synthesized in the AR4 provided a better identification of systems, sectors and regions that are particularly at risk, especially in Africa 53 .
Similarly, more information on changes in extremes and their impacts led to the increased risk reported in RFC2 (extreme events) in the AR4 (ref. 53). Across RFC categories, a striking change occurred in judgements of the temperature at which risks associated with large-scale singular events (such as ice-sheet collapse) become high, falling from ~5.5 °C (above pre-industrial) in the TAR to <2 °C in the SR15 following new findings in climate science [71][72][73][74][75][76] . For example, the AR5 indicates that the main cause of change since the AR4 was new evidence about ice-sheet loss during the last interglacial period, at no more than 2 °C average global warming above pre-industrial 60 . Adaptation to associated sea-level rise was deemed to be possible if ice-sheet loss occurs slowly, such as over a millennium 9 . However, since the AR5, new observations suggest that the West Antarctic Ice Sheet is already in the early stages of marine-ice-sheet instability 64 . The SR15 also considered findings about the slowdown of the Atlantic meridional overturning circulation, the El Niño-Southern Oscillation and the role of the Southern Ocean in the global carbon cycle, concluding that risk levels at lower temperatures had increased 64 .

Fig. 2 | Panel text.
• (8): risk as 'burning embers' for five RFCs, linked to temperature projections and scenarios (see details in Fig. 1)
AR4 WGII
• Australia and NZ chapter (58): risk for several systems using a specific scale
• Ecosystems chapter (57): risk as burning embers supplemented by specific examples
AR5
• WGII (121): updated risk framework and RFCs; suggestion to differentiate by levels of vulnerability; regional risks considering adaptation
• Synthesis report (61): burning embers for specific systems using hazard indicators other than GMST
• SRCCL (24): risks related to land systems
• Smith et al. (53): updated TAR reasons for concern incorporating AR4 findings
• Fischlin (51): increase in assessed risk from TAR to AR4 illustrated by connecting equal risk points for each RFC
• Vellinga and Swart (45): risk associated with global warming synthesized in a 'traffic light' presentation
• Gattuso et al. (49): impacts of warming and ocean acidification on marine ecosystems (and services) as burning embers
• SR15 (10): updated RFCs and risks for specific natural and human systems, up to 2.5 °C warming above pre-industrial
• Magnan et al. (50): spider chart of risks for a range of ocean ecosystems as a function of GHG emission scenarios
• O'Neill et al. (9): AR5 RFCs with icons for risks that are important for the colour transitions and confidence levels
By contrast, a number of risk transitions have remained relatively stable across multiple reports, such as the transition to medium and to high risk for the RFC on unique and threatened systems and for the RFC on distribution of impacts. New literature is still important in such cases, when it provides more confidence in judgements through broader physical, ecological and socio-economic evidence compared with previous assessments. For example, between the AR4 and the AR5, the literature provided new insight on how ocean acidification and warming together increase long-term coral degradation 77 .
In some cases, the level of risk at a given temperature has decreased slightly in subsequent reports. For example, risk levels associated with extreme events appear at somewhat higher global warming levels in the AR5 compared with the AR4. At least two factors may have contributed to such changes: refinement of the framework, with clearer criteria for judging risk, and more precision in the consideration of temperature levels associated with risks. In particular, the TAR mainly refers to risk estimates for broad temperature ranges (observed past, <2 °C above 1990, 2-3 °C, >3 °C), and it further clarifies that temperatures should be taken as approximate indications of impacts, not as absolute thresholds 7 . Thus, the information from the TAR was both more limited and expressed with less detail, indicating that small differences with the following reports should not be overinterpreted. Moreover, the TAR associated white areas with no or virtually neutral impact or risk 47 , while the AR5 refined the definition for the transition between white and yellow by adding the criterion that impacts had to be detectable and attributable to climate change with at least medium confidence 60 . The requirement of attribution of impacts to climate change in the AR5 has probably also contributed to the judgement of less climate-change-related risk from extreme events at low levels of warming.
These results highlight the potential use of the RFC framework and burning embers in evaluating trends in risks over time. It is also clear that, to allow for comparisons, consistent and fully reproducible techniques must be used in rendering the embers, including choice of colours. Continued use of the RFC framework and burning embers requires enhanced and sustained attention to how the increasing amount of knowledge is taken into account to assess risks. More clarity on the details of the evolution of the conceptual framework and greater rigour in the assessment methodology are needed, while considering that it would be preferable for results to still be comparable with earlier assessments.

Expert elicitation
Given the critical role of expert judgement in the RFC framework and construction of burning embers, it is prudent to explore how expert elicitation is conducted in other, well-established disciplines. Expert elicitation, for example, is used frequently in health sciences 26,28,33 when insufficient empirical evidence exists to inform clinical recommendations, parameters of decision analytic models, research priorities, quality indicators or best practices in research [78][79][80][81][82][83][84][85][86] . Best practices from other disciplines can inform the development of an explicit, systematic and transparent protocol for eliciting expert judgements in IPCC processes, such as the construction of burning embers.
Well-known approaches for structured, formal expert elicitation include the Classical Method 87 , Consensus Development Conference 88 , the Delphi method 89 , the Nominal Group Technique 90 and the Sheffield Elicitation Framework (SHELF) 91 (Box 1). All these approaches share several common practices for recruiting experts, preparing the elicitation exercise, eliciting and aggregating expert judgements, transforming individual judgements into data useful for analyses and aggregation, and providing feedback to experts and the wider scientific and policy communities [27][28][29]33 . However, these formal expert-elicitation methods vary in the specific design features that have been developed to facilitate the performance of experts in providing accurate, reliable and replicable judgements (Table 1).
Previous syntheses of expert-elicitation approaches have not identified any one standard methodology as inherently superior to others but, rather, discuss which approaches may be more or less appropriate for a given context in their traditional form 28 . Aspects of elicitation to consider for a given exercise include the data-collection technique (such as Likert ratings or parameter values), elicitation mode (such as in-person, online or hybrid) and process for synthesizing individual elicitations into a group judgement (such as statistical aggregation or group facilitator discretion). Expert-elicitation processes can take considerable effort on the part of the researcher and require the commitment of respondents, so several trade-offs need to be taken into account. While face-to-face discussions provide an opportunity for experts to examine disagreements in depth 93 and take ownership of the material 28 , they are also more costly in time and money. In addition, facilitators of expert-elicitation exercises need to consider potential biases that can result from the type of respondents selected, the type of preparatory material provided, the elicitation questions and method of analysis, and specific research design. For example, anchoring subsequent questions to answers given to the first question, accessing the easiest-to-retrieve memory to answer questions and lowering probabilities through range-frequency compromise are only a few of the psychological biases found in expert elicitation 94 . Careful consideration and reporting of sources of bias is, therefore, required 87,94,95 .
Despite these challenges, structured expert-elicitation approaches are increasingly used in a variety of fields [96][97][98][99] , including in environmental and climate sciences [100][101][102][103] . Expert judgement has now been employed to examine climate sensitivity 104 , tipping points in the climate system 105 and future sea-level rise 99 . Expert-elicitation processes, as well as techniques such as systematic reviews 102,103,106,107 , may be useful to summarize and evaluate findings or help fill knowledge gaps where insufficient data are available. Such techniques could also be employed constructively in IPCC risk assessments, as a wide range of literature needs to be assessed and emerging trends appraised.

Expert elicitation in IPCC Special Reports
Responding to earlier critiques of the standardization and rigour of the RFC framework and burning embers, Special Reports in the IPCC AR6 cycle 24,25,69,108 have incorporated elements of expert-elicitation protocols used elsewhere. The SRCCL, for example, focused on documenting and standardizing the expert-elicitation process, while the SROCC developed a standardized scoring system for risk-threshold judgements.
Methods in the SRCCL. The SRCCL sought to identify risks to humans and ecosystems from climate-change interactions with land processes. To address these issues, a wide range of literature related to climate change, land-use change and socio-economic development pathways was assessed. A systematic approach was needed to characterize the risks reflected in this wider literature and to account for sources of uncertainty and variability, including professional biases inherent in individual experts' interpretation of the literature.
An expert-elicitation process based on design features commonly employed in the Delphi and SHELF methods was used to combine the benefits of both individual and collective deliberations (Fig. 4; Supplementary Information). To make the methodology more transparent, a protocol for eliciting expert opinions was developed including an a priori plan for analysing the data (a pre-analysis plan), an explicit sampling frame and eligibility criteria for selecting experts to participate in the process. Eight SRCCL authors participated in the elicitation process, representing different chapters, regions, genders and disciplinary backgrounds, decreasing the impact of individual subjective biases on risk transitions and confidence levels 106 .
In the first step, >300 journal articles referenced in SRCCL chapters and beyond were reviewed to extract quantitative and qualitative information about past and future impacts of climate change, socio-economic pathways and land-use scenarios on humans and ecosystems. Evidence from each of these articles was added to a shared database, which experts used to agree upon thresholds for each risk level. A risk was considered moderate if fewer than 1 million people or between 50 and 300 million hectares were likely to be adversely affected.
The expert elicitation took place over three rounds. In the first round, for every ember, experts provided a quantitative judgement of the GMT levels (upper bound, lower bound and best estimate of location of transition) corresponding to each of the three risk-level transitions, along with reasoning for these judgements.

Box 1 | Overview of formal expert-elicitation methods
The Classical Model of structured elicitation scores experts on their performance against empirical data for known parameters, using their performance to create and validate combinations of all expert judgements on the unknown variables of interest 119,120 . Calibration questions are used to assess the statistical accuracy and information provided by each expert. Performance-based weights are then used for combining the expert judgements on the unknown variables of interest.
The Consensus Development Conference involves an open meeting over several days of a selected group of experts 88 . This provides a public forum for the discussion of issues on the topic of interest. Stakeholders external to the expert group make presentations that are considered by the expert group until they reach consensus on a decision. Both the public and the private sessions of the Consensus Development Conference are chaired 33 .
The Delphi method provides a structured, systematic communication approach for experts to independently and anonymously provide their initial judgements about the topic of inquiry. There are iterative rounds of feedback and modification of judgements based on the views of other experts on the panel 86 . The final group consensus 26 is derived from statistical aggregation of the individual responses.
The Nominal Group Technique aims to structure interaction within a group or committee with differing views 90 . Experts first record their ideas independently and privately. After collating these ideas, the facilitator then lists one idea from each expert in front of the group in a 'round-robin' fashion until all ideas have been listed and discussed. Each expert privately records their judgements for each idea until discussion ceases. Lastly, the individual expert judgements are aggregated statistically to derive the group judgement. This technique allows more ideas to be expressed and elaborated due to the initial brainstorming and following discussion of all generated ideas 33 .
The Sheffield Elicitation Framework (SHELF) elicits probability distributions for uncertain quantities from a group of experts to inform policy decisions 91 . SHELF typically involves a face-to-face meeting with a small group (approximately 6-10) of experts led by a trained facilitator. All members are aware of each other's responses and the group discusses modifications to a probability distribution aggregated from their individual responses until consensus is reached. Guidance and templates for pre-elicitation, elicitation, facilitation and achieving group consensus judgements based on the perspective of a rational impartial observer are publicly available.
These results were anonymized, transitions were aggregated and plotted to show the spread of results (Fig. 5), and then shared with the full group of experts. In the second round, experts had the opportunity to revise their quantitative assessment and rationale. This revision typically involved re-examining the literature for transitions in which the initial judgement differed from other experts. Results were again compiled, anonymized and shared with experts. The third round consisted of a group discussion with a facilitator, who ensured that due consideration was given to differing evidence-based viewpoints among panellists 107 . In the group conversation, an expert would present an ember, describing their choice of transition and citing the literature that supports it. Each transition was discussed until consensus was reached. The anonymized results and facilitated discussion diminished the risk of any one expert dominating the consensus process 109 .
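The aggregation step between rounds can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure or code used by the SRCCL authors: the `Judgement` record, the `summarize` function, the choice of the median as the summary statistic and the example numbers are all hypothetical.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Judgement:
    """One anonymized expert judgement for one risk transition (GMT in degrees C)."""
    lower: float
    best: float
    upper: float


def summarize(judgements):
    """Summarize the judgements collected for a single transition.

    ASSUMPTION: the median is used as the summary statistic; the source only
    states that transitions were aggregated and plotted to show the spread.
    """
    if not judgements:
        raise ValueError("at least one judgement is required")
    for j in judgements:
        if not (j.lower <= j.best <= j.upper):
            raise ValueError(f"inconsistent judgement: {j}")
    return {
        "median_lower": median(j.lower for j in judgements),
        "median_best": median(j.best for j in judgements),
        "median_upper": median(j.upper for j in judgements),
        # Full spread across experts, as would be shown to the group.
        "spread": (min(j.lower for j in judgements),
                   max(j.upper for j in judgements)),
    }


# Hypothetical round-one responses for one transition (not real data).
round_one = [
    Judgement(1.0, 1.5, 2.0),
    Judgement(1.2, 1.6, 2.4),
    Judgement(0.8, 1.3, 1.9),
]
summary = summarize(round_one)
```

The same function could be rerun on the revised judgements from the second round, so that the change in spread between rounds is recorded alongside the final consensus.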
Several factors during the group discussion helped experts refine the transition ranges and converge towards consensus. First, 'very high risk' requires that there is limited ability to adapt 60 . Agreeing on what constituted adaptation and what constituted a limit to adaptation narrowed the uncertainty range for the transition to very high risk. Second, the assessment involved a large number of studies reporting a wide variety of indicators and methods (such as detection and attribution, biophysical models and economic models). The group discussion led to a better understanding of how to account for diverse sources in judgements. Lastly, uncertainty in the risk transitions is represented through both the width of the temperature transition and the confidence level of the transition. Narrow transitions may be more informative for policymaking. However, narrowing a transition range typically comes at the cost of reducing the confidence level associated with the transition. A compromise had to be found during the discussion to minimize the width of the transition while maintaining the highest possible confidence level.
Methods in the SROCC. While the SRCCL used a structured process to improve the robustness and traceability of the risk assessments, the SROCC used standardized metrics to evaluate end-of-century risks from sea-level rise. The assessment also considered risks under specific adaptation pathways, as well as four illustrative geographies covering a wide range of low-lying coastal situations across different latitudes, hemispheres, development contexts and urban or rural settings 69 : resource-rich coastal cities, large tropical agricultural deltas, urban atoll islands and Arctic communities. Nine metrics were used as proxies for the components of the IPCC risk framework: the hazard (coastal flooding; coastal erosion; and salinization of groundwater lenses, soils and surface waters) and exposure and vulnerability of ecosystems and people (density of assets and degree of degradation of natural buffer ecosystems). As an innovation, four generic types of response to sea-level rise (hard-engineered coastal defences; restoration of degraded ecosystems; relocation of people and assets; and limiting subsidence) were incorporated into the assessment, making it possible to explore risk levels and transitions by 2100 under low-to-moderate and maximum adaptation.
The SROCC assessment method relied on a scoring system and expert judgement by eight chapter and external contributing authors, but without implementing a formal elicitation method, as done in the SRCCL (Fig. 6; Supplementary Information). The scoring system was designed to assess the relative contribution of each of the nine metrics to risk at present and by 2100 against three sea-level-rise scenarios: mean Representative Concentration Pathway 2.6 (RCP2.6) (+43 cm), mean RCP8.5 (+84 cm) and the upper likely range of RCP8.5 (+110 cm). The scoring exercise relied on the existing literature and considered well-documented specific real-world case studies, such as New York City (USA), Rotterdam (Netherlands) and Shanghai (China) for the resource-rich coastal cities, and Malé (Maldives), South Tarawa (Kiribati) and Fongafale (Tuvalu) for the urban atoll islands. For each illustrative region and each metric, the scores were aggregated by sea-level-rise scenario to highlight two risk levels in 2100, one under a 'low-to-moderate response' scenario (with small additional efforts in adaptation compared with today) and one under a 'maximum potential response' scenario. 'Maximum potential response' in this context referred to an ambitious and effective combination of both incremental and transformational adaptation (population relocation), assuming minimal financial, social and political barriers. Such distinction was made by associating positive scores to the hazard and exposure-vulnerability metrics, as they increase risk, and negative scores to the adaptation-response metrics, as they decrease risk.
The full range of theoretical aggregated scores (minimum to maximum, from 0 to 75) represented the full range of the IPCC risk colouring language (from white to deep purple) to highlight nine scoring levels: undetectable, undetectable to moderate, moderate, moderate to high, high, high to very high, very high, very high to extremely high and extremely high. As such, a specific risk level was assigned to each aggregated score, producing the final burning embers and risk transitions.
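The mapping from aggregated score to risk level can be illustrated with a short Python sketch. The nine level names and the 0 to 75 range come from the text above; everything else is an assumption made for illustration, including the equal-width bins, the function names and the example scores. The actual SROCC cut-offs are documented in that report's supplement and may differ.

```python
RISK_LEVELS = [
    "undetectable",
    "undetectable to moderate",
    "moderate",
    "moderate to high",
    "high",
    "high to very high",
    "very high",
    "very high to extremely high",
    "extremely high",
]

MIN_SCORE, MAX_SCORE = 0, 75  # theoretical range stated in the text


def aggregate(risk_scores, response_scores):
    """Add positive hazard and exposure-vulnerability scores to negative response scores."""
    if any(s < 0 for s in risk_scores):
        raise ValueError("hazard and exposure-vulnerability scores must be >= 0")
    if any(s > 0 for s in response_scores):
        raise ValueError("adaptation-response scores must be <= 0")
    return sum(risk_scores) + sum(response_scores)


def risk_level(score):
    """Map an aggregated score to one of the nine levels.

    ASSUMPTION: equal-width bins across the full score range.
    """
    if not MIN_SCORE <= score <= MAX_SCORE:
        raise ValueError(f"score {score} outside {MIN_SCORE}-{MAX_SCORE}")
    width = (MAX_SCORE - MIN_SCORE) / len(RISK_LEVELS)
    index = min(int((score - MIN_SCORE) / width), len(RISK_LEVELS) - 1)
    return RISK_LEVELS[index]


# Hypothetical scores for one geography and one scenario (not real data).
risk_scores = [10, 9, 8, 9, 10]        # five hazard/exposure-vulnerability metrics
low_response = [-2, -1, -2, -1]        # 'low-to-moderate response'
max_response = [-6, -5, -5, -4]        # 'maximum potential response'

level_low = risk_level(aggregate(risk_scores, low_response))
level_max = risk_level(aggregate(risk_scores, max_response))
```

Keeping the sign convention in `aggregate` explicit mirrors the distinction described above, in which response metrics can only lower the aggregated score.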

Strengths and limitations of the SRCCL and the SROCC approaches.
The use of formal expert-elicitation processes and the development of scoring systems for judgements increased the transparency and robustness of the RFC framework and burning embers. For example, methodological techniques used in the SRCCL, such as a common database of literature, anonymous judgements with justification and multiple expert-elicitation rounds, are considered to have reduced biases arising from pressure to conform to dominant individuals or anchoring in individual opinions 30,107,110 . The scoring process in the SROCC, and the articulation of metrics and judgement criteria and publication of outcome in a detailed supplement, further increased transparency and standardization.
Despite this progress, limitations remain. Both the SRCCL and the SROCC assessments would further benefit from the involvement of a greater diversity of experts. Studies have shown that heterogeneity, or a diversity of profiles and areas of expertise, in expert groups may lead to better performance than a more homogeneous group 106 . Common social and psychological biases in human decision-making, such as groupthink 111 and overconfidence 112 , may limit performance in homogeneous groups. The SRCCL could also have more clearly articulated metrics with which to evaluate risk thresholds. In turn, the SROCC could have used multiple rounds of independent evaluations to better record spread and changes in judgements. Additionally, the relationship between risk and selected SROCC metrics and aggregated scores will need to be further validated.
Ultimately, the selection of an optimal expert-elicitation technique within the IPCC process will need to balance the confidential nature of the process, finite funds, limited time for meetings, coordination challenges related to the geographical distribution of authors across different time zones 113 , the broad scope of relevant literature, the integration of different forms of evidence and different values 114 . Future IPCC assessments may wish to build on, or combine, the advances made in the SRCCL and the SROCC. For example, a set of experts could use a scoring system, as in the SROCC, to estimate the contribution of different human-related drivers to risk levels under various climate-change scenarios. Then, as in the SRCCL, multiple independent assessment rounds could be used in combination with collective deliberations to discuss individual experts' underlying rationales and reach consensus on final risk scores. This approach could be particularly useful for evaluating risks from climate change to human well-being or security, areas in which identifying temperature-related thresholds for risk could be difficult, given both the state of the literature and context specificities. Future assessments could alternatively draw from other relevant expert-elicitation approaches. Either way, we believe that adding structured design features to expert elicitation should be encouraged, and the advantages and disadvantages of different approaches further investigated.

Summary and future recommendations
The RFC framework and burning embers are key components of the IPCC's risk-assessment process. The RFC framework helps aggregate climate-related risks into easily understood and policy-relevant categories, while the burning embers communicate risks using a common colour system and scale. This framework and iconic image have played a role in public policy and discourse 6 . Whereas methods for analysis and design elements have been broadly retained across successive IPCC reports, changes have been made over time, including the consideration of key risks, altered use of colours and the addition of confidence judgements. This Review indicates that the risk level at a given temperature has generally increased with each subsequent assessment. It is critical to ensure that these changes are driven by new science and not by methodological variations between reports or author bias. The SRCCL and the SROCC added innovations to the expert-elicitation process to strengthen scientific robustness and credibility.
To further enhance the usefulness of burning embers in IPCC assessments, authors of IPCC reports may wish to identify and apply a standardized expert-elicitation protocol that combines anonymous judgements with group discussions. Whatever method is selected, at a minimum, the protocol should specify the process for providing external information, eliciting individual judgements, facilitating group interaction and developing consensus judgements. Assessment objectivity can be strengthened by having an independent moderator (not a chapter author or a participant in expert elicitation) facilitate discussions. Assessment transparency can be strengthened by making workflows open, including tables with data from relevant literature and scoring, and by pre-publishing the protocol 69 . Authors might also wish to provide clear criteria or narratives for risk transitions used in ember diagrams. Maintaining consistent risk thresholds and metrics across chapters and reports would help strengthen communication about changes in risk levels for decision-makers as well as the general public. Additionally, clear design protocols should be created for burning embers figure creation, including the use of standardized colours for risk levels and a standardized format to indicate confidence levels, as well as specific requirements on translating numerical risk estimates into ember graphs, preferably using a standard computer code or program.
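To make the idea of a standard code for translating numerical risk estimates into ember colours concrete, a minimal Python sketch follows. It is not an IPCC tool or an agreed standard: the RGB values, the linear interpolation across each transition, the `ember_colour` function and the example transition ranges are all assumptions chosen for illustration.

```python
# ASSUMPTION: illustrative RGB values, not an official IPCC palette.
COLOURS = {
    "undetectable": (255, 255, 255),  # white
    "moderate": (255, 255, 0),        # yellow
    "high": (255, 0, 0),              # red
    "very high": (128, 0, 128),       # purple
}
ORDER = ["undetectable", "moderate", "high", "very high"]


def ember_colour(gmt, transitions):
    """Return the RGB colour of an ember at a given GMT (degrees C).

    `transitions` holds one (start, end) temperature pair per risk-level
    transition, in increasing order and without overlap.
    ASSUMPTION: colour changes linearly with temperature within a transition.
    """
    if len(transitions) != len(ORDER) - 1:
        raise ValueError("expected one transition between each pair of levels")
    level = 0
    for i, (start, end) in enumerate(transitions):
        if start >= end:
            raise ValueError("each transition needs start < end")
        if gmt >= end:
            level = i + 1  # fully past this transition
        elif gmt > start:
            frac = (gmt - start) / (end - start)
            lo, hi = COLOURS[ORDER[i]], COLOURS[ORDER[i + 1]]
            return tuple(round(a + frac * (b - a)) for a, b in zip(lo, hi))
        else:
            break  # below this transition, so below all later ones
    return COLOURS[ORDER[level]]


# Hypothetical transition ranges for one ember (not real assessment results).
transitions = [(0.5, 1.0), (1.5, 2.5), (3.0, 4.0)]
colour_at_1_75 = ember_colour(1.75, transitions)
```

Publishing a function of this kind together with the elicited transition ranges would let readers regenerate an ember from the underlying numbers, although the interpolation rule itself would need to be agreed and documented.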
Future risk assessments should also include consideration of regional risks, the impacts of socio-economic pathways on risk and diversified adaptation and mitigation scenarios and dynamics. Demand for regionally specific RFC or embers has been growing 115 . The SRCCL and the SROCC laid some foundations for such analysis by including illustrative geographies or embers for specific latitudes. The SRCCL paved the way for the consideration of the influence of socio-economic pathways on risk, while the SROCC considered the relationship between different adaptation measures and risk. Assessing the potential benefits of adaptation in terms of risk reduction is key to progressively informing important and emerging policy concerns, such as limits to adaptation, residual risks and benefits to be expected from both mitigation and adaptation. The influence of the rate of climate change on risks and adaptation potential could also benefit from further investigation. Further methodological improvement is needed to evaluate human adaptation, adaptation capacity relative to the degree of climate change and adaptation limits including evolutionary and sociocultural factors. Going forwards, the expert-judgement-based risk assessments should explicitly document how they consider changes in ecological and anthropogenic exposure and vulnerability (driven by socio-economic development as well as adaptation or mitigation responses), in addition to changes in physical climate drivers of risk related to rising GMTs (as provided by IPCC WGI). One important contribution to this integration is provided by new impact modelling initiatives that can streamline the development of burning embers across the RFCs and contribute to tracking their evolution over time.
The Inter-Sectoral Impact Model Intercomparison Project (ISIMIP) 116 , for example, aims at harmonizing impact projections across a range of sectors by providing consistent socio-economic and meteorological forcing data and a unified simulation protocol for multi-model assessments. The ongoing phase (ISIMIP2b) 117 provides a unique opportunity for supporting quantitative risk assessments for the AR6 cycle. Acknowledging that no one modelling approach is free of shortcomings and that models are unable to capture all dynamic processes related to impacts and adaptation, it will be important to continue to consider other sources of evidence of changing risks and effectiveness of various options for risk management.
Finally, there have been calls to better communicate the likelihood of future impacts to help policymakers set priorities on actions 19,118 . Some attempts have been made to do this in the past. For example, Figure SPM.10 in the AR5 links the levels of risk by GMT with over-time-cumulative anthropogenic CO2 emissions, illustrating the emissions reductions required to keep GMT, and risks, below certain levels 118 . However, finding ways to clearly visualize the severity of impacts, the likelihood and rate of change of temperature remains a challenge. WGI and WGII might wish to collaborate and identify ways to further communicate the probability of potential impacts, considering that risk associated with low likelihood but severe impacts cannot be ignored.
The run-up to the first global stocktake of the Paris Agreement in 2023 and the next round of the UNFCCC's periodic review of the long-term global goal under the Convention and of overall progress towards achieving it provide an opportune moment for ongoing work in all these areas. Increasing the reproducibility and credibility of risk assessments, and improving clarity and communication of the results, can facilitate more effective global, regional and local decision-making about consequences of dangerous anthropogenic interference with the climate system. With more credible and traceable information about risk levels associated with different levels and rates of warming, and different socio-economic pathways or adaptation measures, policymakers will be better equipped to make decisions about mitigation, adaptation and disaster-risk reduction. The IPCC has made significant progress in this area, with innovations to the reasons for concern framework and burning embers figures in recent Special Reports. We hope this Review will enable future assessments to continue to strengthen expert elicitation, thereby providing information to aid a range of climate policy decisions that minimize, and preferably avert, dangerous anthropogenic climate change.
Published online 10 September 2020

Fig. 6 | Flow chart of sea-level-rise risk assessment in the Special Report on the Ocean and Cryosphere in a Changing Climate. A standardized scoring system was created to assess the relative contribution of nine metrics to risk at present and by 2100 against three sea-level-rise scenarios 69 . Flow-chart step, development of the assessment protocol: identification of the relevant metrics (based on initial proposal by one lead author and then interaction with all contributors); definition of the scoring system and aggregation rules; establishment of correlations between the scoring system and the IPCC risk colour scale. IPCC, Intergovernmental Panel on Climate Change; SPM, Summary for Policy Makers.