The objective of the Scientific Feasibility (SciFy) program is to develop computational methods
that measure the feasibility of claims in order to enable accurate assessments of scientific
content. The program’s focus will be on claims that express scientific and technological
capabilities. The program aims to demonstrate that the scientific feasibility of claims can be
determined by using automated reasoning to decompose claims into constituent, verifiable parts.
The program will produce methods that perform well beyond current automated fact-checkers,
recognizing that feasibility assessment is a complex process that requires breaking claims down
into constituent components that together contribute to the resulting functionality. Automatically
assessing each component may involve identifying and using existing technological
advancements, foundational scientific principles, data, software or simulation results, as well as
current industry standards or benchmarks. This assessment necessitates the development of
sophisticated automated techniques capable of managing the rapid expansion of evidence, ensuring that the synthesis and explanation of this evidence is both efficient and reliable. It is
also necessary to determine whether the claimed technological capability, while theoretically
possible in parts, is practical and realistic when considered as a whole, which may require
evaluating logical consistency, system integration, and compatibility considerations.
A scientific feasibility assessment must address both the limitations and the potential of
capability claims, which may involve the consideration of scientific knowledge that spans
disciplinary boundaries. The methods produced under the program should not only emulate subject
matter expert validation, but also allow for analyses that extend beyond the limits of human cognition
to create and consider technical hypotheses that are possible based on available scientific
knowledge. Additionally, the approaches developed on the program should consider the
operational feasibility of technologies, for example, under time and cost constraints.
Deadlines:
o Abstract Due Date and Time: March 18, 2024, 12:00 PM ET
o Proposal Due Date and Time: April 25, 2024, 12:00 PM ET
Technical Area 1: Feasibility Assessment
TA1 has two components: (1) claim decomposition to create reasoning chains (or candidate hypotheses), and (2) evaluation of those chains.
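The two TA1 components can be illustrated with a minimal sketch: a claim is decomposed into verifiable sub-claims forming a reasoning chain, and the chain is then evaluated as a whole. All class names, the example claim, and the conjunctive evaluation rule below are illustrative assumptions, not part of the solicitation.

```python
from dataclasses import dataclass, field

@dataclass
class SubClaim:
    text: str
    feasible: bool      # verdict from evidence retrieval (stubbed here)
    evidence: str = ""  # e.g., a principle, datum, or simulation result

@dataclass
class ReasoningChain:
    claim: str
    steps: list[SubClaim] = field(default_factory=list)

    def evaluate(self) -> bool:
        # Illustrative rule: a capability claim is only as feasible
        # as its weakest constituent part.
        return all(step.feasible for step in self.steps)

chain = ReasoningChain(
    claim="Device X converts ambient heat into 1 kW of usable power",
    steps=[
        SubClaim("A heat source with a sufficient gradient exists", True,
                 "thermodynamic tables"),
        SubClaim("Conversion efficiency exceeds the Carnot limit", False,
                 "second law of thermodynamics"),
    ],
)
print(chain.evaluate())  # False: one infeasible step sinks the claim
```

In practice the evaluation step would draw on the evidence sources named above (scientific principles, data, simulation results, industry benchmarks) rather than precomputed booleans.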
Technical Area 2: Test and Evaluation
Previous studies have focused on evaluating reasoning abilities of LLMs for tasks such as
multiple-choice questions or binary classification. Unlike these simple discriminative tasks,
determining the feasibility of scientific claims requires deductive reasoning, a generative activity
where outputs are understood as rationale. Often-used metrics (e.g., BLEU, ROUGE, semantic similarity)
are not sufficient for evaluating over large hypothesis spaces where there may exist
many feasible answers. Quantitative evaluations must also avoid LLM contamination, i.e., when
models have already seen the training or evaluation datasets.
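The inadequacy of n-gram overlap metrics can be seen with a toy example: two rationales that support the same feasibility verdict may share almost no vocabulary, so overlap-based scores penalize a perfectly valid answer. The scoring function and both sentences below are invented for illustration; unigram precision stands in crudely for BLEU/ROUGE-style scoring.

```python
def unigram_precision(candidate: str, reference: str) -> float:
    """Fraction of candidate tokens that also appear in the reference,
    a crude stand-in for n-gram metrics such as BLEU or ROUGE."""
    cand = candidate.lower().split()
    ref = set(reference.lower().split())
    return sum(tok in ref for tok in cand) / len(cand)

# Two rationales reaching the same verdict with little lexical overlap.
reference = "the battery cannot store enough energy for the claimed range"
candidate = "insufficient cell capacity makes the stated distance unreachable"

score = unigram_precision(candidate, reference)
print(score)  # low overlap despite equivalent meaning
```

When the hypothesis space admits many feasible answers, such metrics reward lexical mimicry of a single reference rather than sound reasoning, which is why TA2 must develop evaluation methods beyond them.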
If a performer is selected for a TA2 award, that performer cannot also be selected for TA1, either as a prime or as a subcontractor.