



Collaborative Science: a case study and model

  • Andy Packard, Michael Frenklach

  • Mechanical Engineering

  • jointly with Ryan Feeley and Trent Russi

  • University of California

  • Berkeley, CA

  • Presented 10/28/2005 at CITRIS, Berkeley campus

  • Support from NSF grants: CTS-0113985 and CHE-0535542


Collaborators

    • Pete Seiler (UCB, Honeywell)
    • Adam Arkin and Matt Onsum (UCB)
    • Greg Smith (SRI)
  • GRI-Mech Team: Michael Frenklach, Hai Wang, Michael Goldenberg, Nigel Moriarty, Boris Eiteener, Bill Gardiner, Huixing Yang, Zhiwei Qin, Tom Bowman, Ron Hanson, David Davidson, David Golden, Greg Smith, Dave Crossley

  • PrIMe Team:

    • UCB: Michael Frenklach, Andy Packard, Zoran Djurisic, Ryan Feeley, Trent Russi, Tim Suen
    • Stanford: David Golden, Tom Bowman, …
    • MIT: Bill Green, Greg McRae, …
    • EU: Mike Pilling, …
    • NIST: Tom Allision, Greg Rosasco, …
    • ANL: Branko Ruscic, …
    • CMCS
  • Support from NSF grants:

    • CTS-0113985 (ITR, 2001-2005)
    • CHE-0535542 (CyberInfrastructure, 2005-2010)






How did the GRI-Mech come about?

  • Each of the GRI-Mech ODE releases embodies the work of many people, not explicitly working together. How did this successful collaboration occur?

  • Informal mode?

    • assimilate conclusions of each paper sequentially
    • “read my paper”
    • “data is available on my website”
  • No. This didn’t (and doesn’t) work: the community tried it, but the predictive capability of the model did not reliably improve as more high-quality experiments were done.

    • Papers tend to lump together modeling and theory, experiments, analysis, and convenience assumptions, leading to a concise, text-based conclusion.
    • Conclusions are conditioned on the additional assumptions needed to make them concise.
    • It is impossible to “collaborate” anonymously, since the convenience assumptions are unique to each paper.
    • The goals of one paper are often the convenience assumptions of another.
    • It is difficult or impossible to trace the quality of a conclusion reached sequentially across papers.
    • Posted data is often just the text-based conclusion in electronic form, with little additional information.


Traditional Reporting of Experimental Results

  • The canonical structure of a technical report (a paper) is:

  • Description of experiment: apparatus, conditions, measured observable

    • flow-tube reactors, laminar premixed flames, ignition delay, flame speed
  • Care in eliminating unknown biases, and assessing uncertainty in outcome measurement

  • Informal description of transport and chemistry models that involve uncertain parameters

  • Focus on the parameter(s) to which the outcome is most sensitive

    • evaluate sensitivities (via numerical simulations) at nominal parameter values
  • Convenience assumptions on parameters not being studied

    • freeze low-sensitivity parameters at “nominal” values (obtained elsewhere)
  • Predict one or two parameter values/ranges

  • Post values on website (rarely models, rarely “raw” data)





Lessons Learned

  • Chemical kinetics modeling is a form of system identification that is

    • high dimensional (mechanisms are complex), and
    • distributed (the efforts of many, working separately).

  • The effort of researchers yields complex, intertwined, factual assertions about the unfalsified values of the model parameters

    • Handbook style of {parameter, nominal, range, reference} will not work
    • Each individual assertion is usually not illuminating in the problem’s natural coordinates. Concise individual conclusions are actually rare.
    • Information-rich, “anonymous” collaboration is necessary
    • Machines must do the heavy lifting.
      • Managing lists of assertions, reasoning and inference
    • Useful role of journal paper: document methodology leading to assertion
  • The GRI-Mech approach departed from the informal mode

    • used all of the same information (but none of the “conclusions”)
    • in a distributed fashion, successfully derived a model…
    • but, it was a grassroots effort; an organized, community-wide effort/participation is needed now


Two types of assertions: models and observed behavior

  • Assertion of models of physical processes (e.g., “if we knew the parameter values, this parametrized mathematics would accurately model the process”)

  • Assertion of measured outcomes of physical processes (e.g., “I performed the experiment, and the process behaved as follows…”)

  • Together, these form constraints in the “world”-parameter space of physical constants.
  • Analysis (global optimization) on the constraints

    • Check consistency of a collection of assertions
      • Sensitivity of consistency to changes in a single assertion
      • Discover highly informative (or highly suspect) assertions
    • Explore the information implied by the assertions
      • Prediction: determine possible range of different scalar functions on the feasible set
    • (old standby) Generate parameter samples from the feasible set (a minimal sampling-based sketch of these analyses follows this list).
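
To make the constraint picture concrete, here is a minimal sampling-based sketch in Python. The surrogate models M1 and M2, the prediction target P, and all numbers are invented for illustration, and sampling only gives inner (one-sided) estimates; in practice these analyses are carried out by constrained optimization with certificates.

    # A minimal sketch with invented surrogates and numbers. Each "dataset unit"
    # pairs a model assertion (a surrogate mapping parameters to a predicted
    # outcome) with a measurement assertion (an interval on the observed
    # outcome); the prior information is a coordinate-aligned cube.
    import numpy as np

    rng = np.random.default_rng(0)

    n_params = 3
    lower, upper = -np.ones(n_params), np.ones(n_params)   # prior cube [-1, 1]^3

    def M1(x):                       # surrogate for one measured outcome
        return 0.5 * x[..., 0] + 0.3 * x[..., 1] ** 2

    def M2(x):                       # surrogate for a second measured outcome
        return x[..., 1] - 0.4 * x[..., 0] * x[..., 2]

    dataset_units = [(M1, -0.2, 0.6), (M2, -0.5, 0.3)]      # (model, lo, hi)

    # Sample the prior cube and keep the points satisfying every assertion.
    X = rng.uniform(lower, upper, size=(200_000, n_params))
    feasible = np.ones(len(X), dtype=bool)
    for M, lo, hi in dataset_units:
        y = M(X)
        feasible &= (lo <= y) & (y <= hi)
    F = X[feasible]

    # Consistency: is the feasible set (apparently) nonempty?
    print("consistent (sample-based):", len(F) > 0)

    # Prediction: possible range of another scalar function on the feasible set.
    def P(x):                        # a prediction target not directly measured
        return x[..., 0] + x[..., 2]
    if len(F):
        print("predicted range of P: [%.2f, %.2f]" % (P(F).min(), P(F).max()))

    # (old standby) feasible parameter samples are simply the rows of F.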


GRI-Mech: Successful Data Collaboration

  • Result:

    • High quality, predictive Methane reaction model: 50+ Species/300+ Reactions
  • Based on:

    • 77 peer-reviewed, published Experiments/Measured Outcomes of ~25 groups
  • Infrastructure to use these did not exist

    • Grassroots effort of 4 groups
    • Decide on a common, “encompassing” list of species/reactions
    • Extract the information in each paper, not simply assimilate conclusions
    • Reverse-engineer assertions in light of the common reaction model
  • The rest was relatively “easy”

    • Optimization to get “best” fit single parameter vector
    • Validation (on ~120 other published results)
  • Features (www.me.berkeley.edu/gri_mech)

    • Uses only the “raw” scientific assertions, not the potentially erroneous conclusions
    • Treats the models/experiments as information, and combines them all.
    • Addresses the lack of collaboration in post-experimental data processing.
    • With the assertions now in place, much more can be inferred…




Opposition

  • There were/are criticisms of the overall GRI-Mech approach.

    • “I am unwilling to rely on flame measurements and optimization to extract some fundamental reaction's properties -- I prefer to do that by isolating phenomena”
    • “No one can analyze my data better than me.”
    • “It's too early -- some fundamental knowledge is still lacking”
  • Causes for objection

    • engineering/science distinction
    • distributed effort dilutes any one specific contribution
    • protection of individual’s territory
  • Opposition to the GRI ODE release

    • “Not all relevant data was used to get the latest GRI-Mech release”
    • “The result (one particular rate constant) differs from my results”
  • Our perspective – deploy the data and the tools

    • let everyone “mine” the community information to uncover hidden reality
    • value will entice groups to contribute new assertions as they emerge
    • illustrate concepts with familiar examples


Manual management of uncertainty propagation

  • The informal, manual (journal paper/email) mode would require an efficient uncertainty description (linear in the number of model parameters, say).

    • But it is easy to do this wrong…
    • How about consistent, but simple?
  • For this, use a “CRC Handbook”-type description:

    • parameter values
    • plus/minus uncertainty
  • This is equivalent to requiring a coordinate-aligned cube to contain the feasible set (a small numerical illustration follows).
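
A small numerical illustration (the joint constraint and all numbers are invented): a single experiment typically constrains a combination of parameters, and the tightest coordinate-aligned box containing that feasible set, which is all the handbook-style description can report, may be almost entirely infeasible.

    # The "handbook" description (per-parameter value plus/minus uncertainty)
    # is the smallest coordinate-aligned box containing the feasible set; this
    # sketch shows how much information that box throws away.
    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.uniform(-1.0, 1.0, size=(500_000, 2))         # prior cube [-1, 1]^2

    # Joint constraint from one (toy) experiment: |k1 + k2| <= 0.1
    feasible = np.abs(X[:, 0] + X[:, 1]) <= 0.1
    F = X[feasible]

    # Handbook-style summary: per-parameter ranges, i.e. the bounding box.
    lo, hi = F.min(axis=0), F.max(axis=0)
    box = [(round(float(a), 2), round(float(b), 2)) for a, b in zip(lo, hi)]
    print("handbook-style box:", box)                     # roughly [-1,1] x [-1,1]

    # Fraction of that box which is actually feasible (the rest is slop).
    frac = feasible.mean() * 4.0 / float(np.prod(hi - lo))
    print("feasible fraction of the box: ~%.2f" % frac)   # ~0.1 in this toy case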







Consistency results for GRI-DataSet assertions

  • Collection of 77 assertions is consistent.

  • Nevertheless, a quantitative consistency measure was found to be very sensitive (using the multipliers from the dual form) to 2 particular experimental assertions, but not to the prior information (one formulation of such a measure is sketched below).
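
For reference, one natural way to quantify the consistency of such a dataset D is sketched below; the notation is assumed here, and this conveys the general idea rather than necessarily the exact measure used.

    % Shrink every measurement interval [l_e, u_e] symmetrically by a fraction
    % gamma of its half-width and ask how much shrinkage the dataset tolerates.
    \[
      C_{\mathcal{D}} \;=\; \max_{\gamma,\; x \in \mathcal{H}} \; \gamma
      \quad\text{subject to}\quad
      l_e + \gamma\,\tfrac{u_e - l_e}{2} \;\le\; M_e(x) \;\le\; u_e - \gamma\,\tfrac{u_e - l_e}{2}
      \quad\text{for every dataset unit } e,
    \]

where H is the prior cube, M_e the model of the e’th measured outcome, and [l_e, u_e] its reported uncertainty interval. C_D > 0 means the assertions are jointly consistent (every interval could even be tightened), while C_D < 0 measures how much the intervals must be relaxed to restore feasibility; the Lagrange multipliers of the dual form indicate which assertions drive the measure.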



PrIMe: Process Informatics Model (www.primekinetics.org)

  • Combustion impacts everything

    • Economies
    • Politics
    • Environment
  • Predictive capability leads to informed decisions and policymaking

  • PrIMe: A community activity aimed at the development of predictive reaction models for combustion

  • Challenge

    • to meet immediate needs for predictive reaction models in combustion engineering, the petrochemical industry, and pharmaceuticals
    • build reaction models in a consistent and systematic way incorporating all data and including all members of the scientific community
  • Theme

    • “The scientific community builds the Process Informatics System and Process Informatics builds the community”












Alliance for Cellular Signaling (AfCS)

  • Similar origin to GRI-Mech: a few people, frustrated by the uncoordinated, tunnel-vision approach (deliberately leaving out interactions for simplicity’s sake) of the signaling community

    • brainchild of Gilman (UT Southwestern Medical Center)
    • saw the need for a large-scale examination/treatment of the problem
  • 10 laboratories investigating basic questions in cell signaling

    • How complex is signal processing in cells?
    • What is the structure and dynamics of the network?
    • Can functional modules be defined?
  • Key Advantage of AfCS:

    • High quality data from single cell type
    • All findings/data available to signaling community (www.signaling-gateway.org)
  • From Henry Bourne, UCSF: “The collaboration itself is the biggest experiment of all. After all, the scientific culture of biology is traditionally very individualistic and it will be interesting to see if scientists can work as a large and complex exploratory expedition.” (http://www.nature.com/nature/journal/v420/n6916/full/420600a.html)

  • Vision paper in Nature talks about socialistic aspects of science (http://www.nature.com/nature/journal/v420/n6916/full/nature01304.html)



Calcium Signaling Application

  • Together with AfCS scientists, we extracted key, relevant features of calcium response to create 18 experimental assertions

    • Rise time, peak value, fall time
    • 6 different stimuli levels
  • Published models constitute various model assertions

    • Goldbeter, Proc. Natl. Acad. Sci., 1990
    • Wiesner, Am. J. Physiol., 1996
    • Lemon, J. Theor. Biol., 2003
  • Models are ODEs, each derived from the hypothesized network



Calcium Signaling Application

  • Results

    • Goldbeter, 6 states, 20 parameters: invalidated in 30 minutes
    • Wiesner, 8 states, 27 parameters
      • 10-node “machine”
      • Invalidated in 2 days
    • Lemon, 8 states, 34 parameters
      • Same 10-node cluster
      • Feasible points found in ~8 hours
      • Newly added data subsequently led to invalidation
  • Conclusion: likely that more proteins and accompanying interactions are necessary to mathematically describe the signaling pathway.

  • These tools (e.g., model-directed experimentation) were not part of the original AfCS mission, but the alliance is acquiring an appreciation of modeling and verification.



How are we computing? Invalidation Certificates

  • Consider invalidating the constraints (the prior information plus the N dataset units), i.e., proving that no parameter value satisfies all of them.

  • The invalidation certificate is a binary tree with L leaves. The i’th leaf records

    • a coordinate-aligned cube (a piece of the prior parameter cube),
    • a collection of polynomial (surrogate) models with error bounds that are valid on that cube, and
    • a sum-of-squares certificate proving the emptiness of the constraint set restricted to that cube.
  • Moreover, the leaf cubes together cover the prior cube, so emptiness at every leaf proves global infeasibility (a minimal branch-and-bound sketch follows).
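
A minimal branch-and-bound sketch of this certificate structure, in a single parameter dimension. The surrogate, the measurement interval, and the error bound are invented, and a crude interval-arithmetic range bound stands in for the sum-of-squares emptiness proof at each leaf.

    # Toy invalidation certificate: recursively split the prior cube; at each
    # leaf, record the cube, the surrogate's range bound (with fit error), and
    # the fact that this range cannot meet the measurement interval. A crude
    # interval bound replaces the SOS certificate used in practice.
    LO, HI = 1.2, 2.0       # asserted measurement interval (invented)
    ERR = 0.05              # surrogate fit-error bound, valid on the prior cube

    def surrogate(x):       # quadratic surrogate of the "true" model
        return 2.0 * x - x ** 2

    def leaf_bound(a, b):
        """Crude interval-arithmetic range of the surrogate on [a, b]."""
        sq_lo = 0.0 if a <= 0.0 <= b else min(a * a, b * b)
        sq_hi = max(a * a, b * b)
        return 2.0 * a - sq_hi, 2.0 * b - sq_lo

    def invalidate(a, b, depth=0):
        """Return the certified leaves proving emptiness on [a, b], else None."""
        lo, hi = leaf_bound(a, b)
        if hi + ERR < LO or lo - ERR > HI:          # leaf certificate holds
            return [(a, b, (lo, hi))]
        if depth >= 12:                             # give up at this depth
            return None
        mid = 0.5 * (a + b)
        left = invalidate(a, mid, depth + 1)
        right = invalidate(mid, b, depth + 1) if left is not None else None
        return None if right is None else left + right

    cert = invalidate(-1.0, 1.0)                    # prior cube is [-1, 1]
    if cert is None:
        print("no certificate found at this depth limit")
    else:
        print("invalidated; certificate has", len(cert), "leaves")

The branching is what makes the crude per-leaf bound sufficient: the bound tightens as the cubes shrink, just as the relaxations used in the real computation do.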



How are we computing? Invalidation Certificates

  • Why do the emptiness proofs on the algebraic (surrogate) models rather than on the ODE models themselves?

  • Because it is easier: the original constraints involve ODE simulations of the full reaction model, which are not in a form amenable to algebraic emptiness certificates.

  • Invalidation certificates could, in principle, be derived directly for the ODEs

    • ODE reachability analysis using barrier (Lyapunov-like) functions
      • certify that the ODE solution cannot come within the measurement uncertainty of the reported outcome for any admissible parameter value
    • use sum-of-squares certificates to bound reachability
    • sufficient conditions checked via semidefinite programming
    • but for the methane model, the SDPs would be almost unimaginably large
    • perhaps a fresh look could reveal a new approach… (the standard barrier-certificate conditions are sketched below)
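
For reference, the standard barrier-certificate conditions alluded to above, written in assumed notation: dx/dt = f(x, p) is the parametrized ODE, X_0 the set of initial conditions, X_u the “unsafe” set of states whose measured outcomes are excluded by the experiment, X the state domain, and P the prior parameter set.

    % Find a barrier function B such that
    \[
    \begin{aligned}
      B(x,p) &\le 0      && \forall\, x \in X_0,\ p \in P, \\
      B(x,p) &> 0        && \forall\, x \in X_u,\ p \in P, \\
      \frac{\partial B}{\partial x}(x,p)\, f(x,p) &\le 0 && \forall\, x \in X,\ p \in P.
    \end{aligned}
    \]

Then no trajectory starting in X_0 can reach X_u for any admissible p; restricting B to polynomials and certifying each inequality with sum-of-squares multipliers is what produces the (very large) semidefinite programs mentioned above.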


Error bounds: pragmatic issues

  • Recall that the i’th leaf records

    • a coordinate-aligned cube, and
    • a collection of polynomial models with error bounds that are valid on that cube.
  • The error bounds are estimated statistically (a small sketch of this step follows this list).

  • The bounds are more likely to be “reliable” if M is well-behaved. So, through:

    • experience, and
    • domain-specific knowledge
  • the scientist is responsible for designing/selecting experiments/features that are

    • measurable in the lab
    • reasonably well-behaved over the parameter space
  • Random experimental investigations could break the analysis, and lead nowhere… therefore…

  • Prudent experiment selection is critical to success
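
A minimal sketch of the surrogate-fitting and statistical error-bound step mentioned in the list above. The “true” model, the quadratic feature set, and the random design are invented stand-ins; in practice each model evaluation is an expensive ODE simulation and the design of experiments is factorial on the active parameters.

    # Fit a quadratic surrogate over the (normalized) parameter cube and
    # estimate its error bound from held-out residuals.
    import numpy as np

    rng = np.random.default_rng(2)

    def true_model(X):                    # stand-in for the ODE simulation
        return np.sin(1.3 * X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.2 * X[:, 0] * X[:, 1]

    def quad_features(X):                 # [1, x1, x2, x1^2, x1*x2, x2^2]
        x1, x2 = X[:, 0], X[:, 1]
        return np.column_stack([np.ones(len(X)), x1, x2, x1**2, x1*x2, x2**2])

    # "Design of experiments" on the active-parameter cube (random design here).
    X_train = rng.uniform(-1, 1, size=(200, 2))
    coef, *_ = np.linalg.lstsq(quad_features(X_train), true_model(X_train), rcond=None)

    # Statistical error bound from fresh held-out evaluations.
    X_test = rng.uniform(-1, 1, size=(2000, 2))
    resid = true_model(X_test) - quad_features(X_test) @ coef
    print("estimated fit-error bound (max held-out residual): %.3f" % np.abs(resid).max())

A well-behaved feature keeps this bound small relative to the measurement uncertainty; a badly behaved one makes the surrogate, and hence the whole analysis, useless, which is the point of prudent experiment selection above.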



How are we computing? Summary

  • Transforming real models to polynomial/rational models

    • Large-scale computer “experimentation” on M().
      • Random sampling and sensitivity calculations to determine active parameters
      • Factorial design-of-experiments on active parameter cube
    • Polynomial or rational fit (staying within the sum-of-squares hierarchy)
    • Assess residuals, account for fit error
  • Assertions become polynomial/rational inequality constraints

  • Most analysis is optimization subject to these constraints

    • S-procedure, sum-of-squares (scalable emptiness proofs, outer bounds)
      • Outer bounds are also interpreted as solutions to the original problem when cost is an expected value, constraints are only satisfied on average, and the decision variable is a random variable.
    • Off-the-shelf constrained nonlinear optimization for inner bounds (see the sketch after this list)
      • Use stochastic interpretation of outer bounds to aid search
    • Branch & Bound (or increase order) to eliminate ambiguity due to fit errors
    • Message: overall, straightforward and brute force; it parallelizes rather easily
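
A minimal sketch of the inner/outer-bound pattern for prediction, with invented surrogates and numbers: an off-the-shelf constrained optimizer (scipy’s SLSQP) supplies inner (achievable) bounds, while a deliberately crude relaxation stands in here for the S-procedure/sum-of-squares outer bound computed by semidefinite programming.

    # Predict the range of a target P(x) over { x in [-1,1]^2 : LO <= M(x) <= HI }.
    import numpy as np
    from scipy.optimize import minimize

    LO, HI = -0.2, 0.4                    # measurement interval (invented)

    def M(x):                             # surrogate of the measured outcome
        return x[0] + 0.5 * x[1] ** 2

    def P(x):                             # prediction target
        return x[0] - x[1]

    cons = [{"type": "ineq", "fun": lambda x: M(x) - LO},
            {"type": "ineq", "fun": lambda x: HI - M(x)}]
    bounds = [(-1, 1), (-1, 1)]

    # Inner bounds: best feasible values found by local search from a few starts.
    best_min, best_max = np.inf, -np.inf
    for x0 in np.random.default_rng(3).uniform(-1, 1, size=(20, 2)):
        r = minimize(P, x0, method="SLSQP", bounds=bounds, constraints=cons)
        if r.success and LO - 1e-6 <= M(r.x) <= HI + 1e-6:
            best_min = min(best_min, P(r.x))
        r = minimize(lambda x: -P(x), x0, method="SLSQP", bounds=bounds, constraints=cons)
        if r.success and LO - 1e-6 <= M(r.x) <= HI + 1e-6:
            best_max = max(best_max, P(r.x))

    # Outer bound: drop the measurement constraint and bound P over the prior
    # cube alone (a deliberately loose relaxation, for illustration only).
    outer = 2.0                           # |coeff of x1| + |coeff of x2|
    print("inner (achievable) prediction interval: [%.3f, %.3f]" % (best_min, best_max))
    print("outer (guaranteed)  prediction interval: [%.3f, %.3f]" % (-outer, outer))

The true prediction range lies between the two intervals; branch and bound, or higher-order relaxations, is what shrinks the gap in the real computation.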


Dissemination





