Databrowsers

A scientific databrowser is the ultimate end product of our work. These web-based systems allow users to interactively interrogate their data, creating on-the-fly visualizations based on user input.

In the press ...

Databrowser graphics are "publication ready" and make it easy for any author to include high quality data analysis and visualization in their reports. Below is a sampling of articles, reports and presentations that have used databrowser graphics.

More about databrowsers ...

A databrowser is a web-based interface that allows non-technical users to interact with scientific data.

Making sense of complex datasets depends upon two very different kinds of machines. Silicon-based number crunchers (computers) perform complex mathematical calculations at lightning speed with essentially zero errors, while carbon-based pattern recognizers (our brains) detect visual patterns much faster than any computer and use these patterns to develop further questions about the data.

People enjoy looking at informative scientific graphics if the barrier to creating them is low. When this happens, our species' extraordinary capabilities as pattern recognizers enable us to convert what we see in excellent scientific graphics into a deeper understanding. The problem in many fields of science is that the barriers to creating excellent graphics are discouragingly high.

A bottleneck exists where information is transferred between number crunchers and pattern recognizers. It can take a large amount of time to organize, format and analyze data before generating the graphics that tell the story of the data. Often, the role of data management and analysis is handed over to computer experts rather than left with the scientists and end users who have a real interest in the data. With no easy way to create the graphics that they need, the ability of scientists, managers and interested members of the public to develop their intuition about a dataset is greatly impaired.

Scientific databrowsers attempt to solve this problem by hiding the details of data management and analysis while providing simple, intuitive interfaces to the kinds of analysis that are appropriate for a particular dataset. These analyses are typically vetted statistical routines, coded so that they can be driven by input from a web browser user interface. In this manner, end users including both experts and non-experts can harness the power of (server-side) number crunchers as well as their own (client-side) pattern recognizers without having to learn the arcana of data management and scientific analysis software.

Building a databrowser

The process of building a databrowser involves several steps:

  1. cleaning up any problems with the source data so that they are consistent and well organized
  2. writing code that allows vetted statistical analyses to be run interactively
  3. writing code to create high quality scientific graphics based on the results of the analysis
  4. embedding the analysis and visualization code in a web-server based databrowser engine
  5. creating a user interface that allows users to quickly and easily send requests to the analysis and visualization engine running on the server
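
Steps 2 through 5 above can be sketched in a few lines. This is a minimal illustration only, not the engine behind the databrowsers described here: the parameter names (`station`, `metric`), the `DATA` and `ALLOWED_METRICS` tables and the `handle_request` function are all hypothetical, and the graphic is reduced to a caption string to keep the sketch self-contained.

```python
from statistics import mean, median
from urllib.parse import parse_qs

# Hypothetical, already-cleaned source data (step 1): values per station.
DATA = {
    "STA1": [1.0, 2.0, 4.0, 5.0],
    "STA2": [3.0, 3.0, 9.0],
}

# Vetted analyses that the user interface is allowed to request (step 2).
ALLOWED_METRICS = {"mean": mean, "median": median}


def handle_request(query_string):
    """Turn a UI request such as 'station=STA1&metric=mean' into a result."""
    params = parse_qs(query_string)
    station = params.get("station", [""])[0]
    metric = params.get("metric", [""])[0]

    # Reject anything outside the vetted choices rather than guessing.
    if station not in DATA or metric not in ALLOWED_METRICS:
        raise ValueError("unsupported station or metric")

    value = ALLOWED_METRICS[metric](DATA[station])
    # Step 3 would render a plot here; a caption string stands in for it.
    return {
        "station": station,
        "metric": metric,
        "value": value,
        "caption": f"{metric} for {station}: {value:.2f}",
    }
```

Restricting requests to a fixed table of analyses is one simple way to make sure that users only ever run routines that have been vetted for the dataset.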

When properly designed, the code behind a good databrowser can encapsulate a huge amount of institutional memory about the scientific process. Ideally, databrowser graphics should be of high enough quality that they are immediately ready to be included in scientific publications.

Examples

Examples of our work are included below. Click on an image to go to a working example and learn more by interacting with the data.

MUSTANG Databrowser

A global network of seismic stations, maintained by various international partners, keeps track of the Earth's movements. The seismic traces generated by this network end up in the Data Management Center (DMC) at the Incorporated Research Institutions for Seismology (IRIS) for quality control and archiving. The DMC also generates metrics from the data that help assess the health of remotely located seismometers.

A new effort is underway at the IRIS DMC to create a database of metrics from over 100 terabytes of raw seismic traces. The MUSTANG Databrowser provides easy access to the contents of the database with a variety of plots. Users can review the temporal evolution of daily metrics and statistical comparisons for an entire station or network, as well as probability density function (PDF) plots and raw seismograms for rapid assessment of the seismometers in a particular network.

This work was funded by the Incorporated Research Institutions for Seismology.

Real-Time HYSPLIT Trajectories

Smoke from forest fires can have significant impacts on human health, especially in the western US. The Forest Service runs the Air Resources Laboratory's HYSPLIT model with the latest US Weather Service model output to get up-to-date predictions of where smoke from a particular fire will end up.

The Real-Time HYSPLIT Trajectories interface allows non-expert users to set parameters for fire location and plume height and choose a meteorological input file by picking a date on the calendar. Each request causes the server to set up and run the HYSPLIT trajectory model and returns results as an image, raw CSV values or a fully annotated KMZ file for display in Google Earth.

This work was funded by the US Forest Service Pacific Wildland Fire Sciences Laboratory.

Arctic Transport Potential

The deposition of soot from wildland fires is a significant contributor to observed warming trends in the Arctic. Knowing when wind patterns are likely to transport soot to the Arctic can help guide decisions about when to use prescribed burns in forest management.

The Arctic Transport Potential Forecast viewer uses trajectory model data based on current weather forecasts to provide daily guidance on the likelihood that emissions from different regions will impact the Arctic. Trajectories calculated for a full North American grid are filtered to see which starting grid cells and which starting heights contribute soot to Arctic regions. Originating cells and Arctic-bound trajectories are displayed in semi-transparent colors to highlight source regions and show patterns of atmospheric flow.
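
The filtering step described above can be illustrated with a small sketch. It is not the viewer's actual code: the trajectory layout (a `cell`, a `height_m` and a list of `(lat, lon)` points), the function names and the 66.5 degree latitude threshold used to define "Arctic" are all assumptions made for the example, and the trajectories are made up.

```python
from collections import Counter

# Assumed threshold in degrees north, roughly the Arctic Circle.
ARCTIC_LATITUDE = 66.5


def reaches_arctic(trajectory):
    """True if any point along the trajectory is at or north of the threshold."""
    return any(lat >= ARCTIC_LATITUDE for lat, _lon in trajectory["points"])


def arctic_sources(trajectories):
    """Count Arctic-bound trajectories per (starting cell, starting height)."""
    counts = Counter()
    for traj in trajectories:
        if reaches_arctic(traj):
            counts[(traj["cell"], traj["height_m"])] += 1
    return counts


# Made-up example: two starting heights in one grid cell, one in another.
trajectories = [
    {"cell": (50, -110), "height_m": 500,
     "points": [(50, -110), (60, -100), (70, -90)]},
    {"cell": (50, -110), "height_m": 1500,
     "points": [(50, -110), (52, -100), (55, -90)]},
    {"cell": (40, -100), "height_m": 500,
     "points": [(40, -100), (45, -95)]},
]
sources = arctic_sources(trajectories)
```

In this example only the 500 m trajectory starting in cell (50, -110) crosses the threshold, so it is the only entry in `sources`.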

This work was funded by the US Forest Service Pacific Wildland Fire Sciences Laboratory.

Climatological Trajectory Mapper

Smoke from forest fires can have significant impacts on human health, especially in the western US. The Forest Service uses trajectories from atmospheric models to better understand where forest fire smoke would have gone in past years -- providing a first guess as to where smoke from current fires will go.

The Trajectory Mapper accesses 30 years' worth of 4x daily model output (6.3 terabytes) to provide a climatological look at smoke trajectory paths. Interactive response time allows USFS scientists to explore this large database in new ways and gain new understanding of how smoke from large-scale fires disperses and where it might affect human health.

This work was funded by the US Forest Service Pacific Wildland Fire Sciences Laboratory.

Watershed Explorer

The NetMap community watershed modeling site generates large amounts of GIS data from scientific models that address climate, hydrology, forestry, fish habitat, roads, etc. Providing easy access to the results of these models is highly desirable.

The Watershed Explorer provides access to these data in a new type of web-enabled GIS system that combines powerful statistical features with the ease of use of Google Maps. Exploring large GIS databases can now be done in a web browser and no longer requires special GIS skills.

This work was funded by Earth Systems Institute.

Riparian Management Explorer

Various conflicting pressures affect forest management practices in the Pacific Northwest. These include company profits, community jobs, endangered birds that require mature forests and endangered fish that benefit from large woody debris in streams. Figuring out how to manage streamside timber stands is a challenging task.

The Riparian Management Explorer allows users to ‘adjust the knobs’ on a reach-scale model of wood transport into streams and see the results in real time. Different scenarios can be compared to better understand which silvicultural treatment will provide the optimal benefit.

This work was funded by Earth Systems Institute.

Energy Import/Export

Access to fossil fuels is one of the most important issues of our time. The world's largest economies are extremely dependent upon imported supplies of oil and gas. Understanding who produces and consumes oil, coal and natural gas is critical today and will remain so in the years ahead.

This databrowser uses data from the BP Statistical Review and displays coal, oil & natural gas production and consumption time lines for each country in the database and several political and geographic groupings of nations. Users can dynamically plot import/export curves to get a sense of who the major fossil fuel producers and consumers are and how this has changed in the last four decades.
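
As a rough sketch of what an import/export curve represents, assume it is simply production minus consumption for each year, with negative values indicating net imports. That definition, the `net_exports` function and the numbers below are illustrative assumptions, not values from the BP Statistical Review.

```python
def net_exports(production, consumption):
    """Per-year production minus consumption; negative values mean net imports."""
    years = sorted(set(production) & set(consumption))
    return {year: production[year] - consumption[year] for year in years}


# Made-up figures in arbitrary units.
production = {2000: 10.0, 2001: 9.5, 2002: 9.0}
consumption = {2000: 8.0, 2001: 9.5, 2002: 11.0}
balance = net_exports(production, consumption)
```

Here the hypothetical country moves from net exporter in 2000 to net importer in 2002.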

Natural Gas Trends

This databrowser extends some of the ideas from the Energy Import/Export databrowser and was created for a presentation at the 2010 Peak Oil conference. Data from the BP Statistical Review are combined with gas data from the US Energy Information Administration and population data from the US Census Bureau. Additional plot options are available that give users more specific control of the output graphics to better tell the stories in the data.

This databrowser also showcases the possibility of connecting exploratory graphics with explanatory text by linking specific user-generated graphics to related posts in an associated Energy Trends blog.

Population Trends

Understanding global population trends is extremely important for those trying to make projections regarding economics and natural resource usage.

The Population Trends databrowser provides easy access to graphics derived from the US Census Bureau's International Data Base (IDB). According to the Census Bureau:

"IDB data capture the timing and demographic impact of important events such as wars, famine, and natural disasters, with a precision exceeding that of other online resources for international demographic data."

Mineral Production and Use

The United States Geological Survey (USGS) has the following to say regarding mineral resources within the United States:

"Mineral materials processed domestically accounted for more than $575 billion in the U.S. economy in 2007. U.S. manufacturers and consumers require increasing amounts of imported mineral materials. Making informed decisions about supply and development of mineral commodities that are critical to our economy and security requires current and reliable information about both mineral resources and the consequences of their development."

This databrowser uses data from USGS Data Series 140, covering a wide range of minerals used in all areas of manufacturing, construction and agriculture. Several different visualization styles are available, each tailored to answer a specific set of questions regarding mineral use and availability. The goal of the US Minerals Databrowser is to make it easier to extract meaningful information from this valuable dataset.

Futures Explorer

Futures contracts allow traders to buy and sell commodities for delivery at some future date at a pre-determined price. A futures chain is created when prices for futures of subsequent months are chained together. In the current environment of volatile commodity prices, futures chains provide a snapshot of the "emotional state" of the market -- the level of optimism or pessimism about near term and long term prospects for both the economy and the availability of energy resources such as oil and natural gas.
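
Chaining contracts together can be sketched as follows. The contract layout (`delivery` date and `price`), the `build_futures_chain` function and the prices are hypothetical and only illustrate the idea of ordering unexpired contracts by delivery month.

```python
from datetime import date


def build_futures_chain(contracts, as_of):
    """Return (delivery date, price) pairs for unexpired contracts, nearest first."""
    live = [c for c in contracts if c["delivery"] >= as_of]
    ordered = sorted(live, key=lambda c: c["delivery"])
    return [(c["delivery"], c["price"]) for c in ordered]


# Made-up prices for four monthly contracts.
contracts = [
    {"delivery": date(2011, 6, 1), "price": 101.5},
    {"delivery": date(2011, 4, 1), "price": 99.0},
    {"delivery": date(2011, 2, 1), "price": 95.0},
    {"delivery": date(2011, 5, 1), "price": 100.25},
]
chain = build_futures_chain(contracts, as_of=date(2011, 3, 1))
```

The February contract has already expired on the `as_of` date, so the chain runs from April through June.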

The Futures Explorer uses daily prices for a variety of commodity futures from March 1, 2011 to yesterday's close. (Energy futures go back to January 2010.) Users may select dates to compare market predictions from the past with the actual closing prices they were trying to predict. The data are presented in a manner that shows the weekly, monthly and quarterly variation in futures prices, providing an intuitive picture of volatility in commodity markets.

EPA Probabilistic Sampling Databrowser

EPA Probabilistic Survey Analysis

EPA's regional stream surveys were designed to collect data that are representative of conditions throughout the sampled watershed. Random site selection ensures that summary statistics derived from the data are unbiased and can be used to guide planning and management efforts at a regional scale.

This databrowser displays summary statistics for measures of biological condition, water chemistry, and site condition for wadeable streams. Users may test for correlation among response variables and human influence. Users may also subset the data and compare summary statistics and patterns of correlation according to year, state, stream order, or other factors.

This work was funded by the EPA Office of Environmental Information.

(This databrowser is no longer supported by the EPA and is currently non-functional.)

EPA Relative Risk Databrowser

EPA Relative Risk Analysis

Relative risk is a statistical method widely applied in human health reporting to summarize and compare the risk of developing an illness for a given set of factors. This approach has been adapted to summarize the effects of different stressors on the environmental health of streams. Because streams experience a variety of stressors (e.g., increased nutrients, loss of riparian habitat, sediment), resource managers need a method to identify which threats present the greatest risk in order to implement effective programs to protect them. Relative risk summarizes the strength of the association between a stressor and an indicator of stream condition.
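
In its simplest textbook form, relative risk is the probability of poor condition when a stressor is present divided by the probability of poor condition when it is absent. The sketch below uses raw site counts; that is a simplifying assumption for illustration, since a probabilistic survey would normally apply site weights, and the function name and counts are hypothetical.

```python
def relative_risk(poor_with, good_with, poor_without, good_without):
    """Risk of poor condition with the stressor divided by risk without it."""
    n_with = poor_with + good_with
    n_without = poor_without + good_without
    if n_with == 0 or n_without == 0:
        raise ValueError("each group needs at least one site")
    risk_with = poor_with / n_with
    risk_without = poor_without / n_without
    if risk_without == 0:
        raise ValueError("relative risk is undefined when the baseline risk is zero")
    return risk_with / risk_without


# Made-up counts: 40 of 80 stressed sites are in poor condition (risk 0.5),
# versus 20 of 80 unstressed sites (risk 0.25).
rr = relative_risk(poor_with=40, good_with=40, poor_without=20, good_without=60)
```

With these counts `rr` is 2.0: streams with the stressor are twice as likely to be in poor condition as streams without it. A value near 1 would indicate little association.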

This databrowser links relative risk calculations to the national Wadeable Streams Assessment data set. Benthic macroinvertebrates were used to assess stream condition. The databrowser allows users to compare the relative importance of various stressors across political or ecological regions.

This work was funded by the EPA Office of Environmental Information.

(This databrowser is no longer supported by the EPA and is currently non-functional.)

Rheumatology

Rheumatoid arthritis is a chronic autoimmune disorder affecting millions of individuals. Various drug treatments are available to treat the disease. The recently introduced TNF-alpha inhibitors show promising results in clinical studies but command a high per-treatment price. In an effort to carefully evaluate the effectiveness of these drugs, several European countries have begun systematically tracking patient response to these treatments.

The data in the Rheumatology Databrowser come from a European database of patients undergoing therapy for rheumatoid arthritis. A version of this databrowser is presently in pre-production use in rheumatology hospitals serviced by our international healthcare partners. This databrowser is being evaluated by rheumatology researchers as a way to consistently apply vetted statistical analyses to a growing database of patient data.

This work was funded by the Danish medical informatics company ZiteLab.