
MANAGING AND MINING THE LSST DATA SETS

Astronomy is undergoing an exciting revolution, a revolution in the way we probe the universe and the way we answer fundamental questions. New technology enables this: novel detectors are opening new windows on the universe, creating unprecedented volumes of high quality data, and computing technology is keeping up with this explosion. In turn, this is driving a shift in the way science is produced in astronomy and astrophysics: huge surveys of the sky over wide wavelengths can be analyzed statistically for low-level correlations and inverse problems may be solved by statistical inversion, producing new understanding of the underlying physics.

This parallels progress in high energy physics. Decades ago, a handful of photographs of events sufficed for ground-breaking discoveries. This gave way to experiments in which the systematic measuring (scanning) of many bubble chamber pictures allowed the measurement of statistical properties, such as lifetimes. Current experiments extend the technique by recording all events electronically and subjecting Petabyte data sets to rigorous statistical analysis.

A key ingredient in mining our astronomical science from such huge databases, efficient algorithms for statistical analysis, has been under-emphasized in the rush to utilize new technology and get the data products out to the science community. Past data sets in astronomy (and indeed in most areas of science) have been small enough that one individual could visualize the data and discover unanticipated correlations. This is often how major discoveries have been made. Data sets are now becoming sufficiently large that this is less possible; even prescribed processing of the data to test a hypothesis is becoming challenging. In the near future, analysis of Petabyte databases will require the solution of this problem.

New Horizons

It is worthwhile to briefly review this sea-change in the way astronomers produce science. A giant departure from the tradition of one astronomer and one modest data set per project has been the Sloan Digital Sky Survey: a 15TB imaging data set covering multiple wavelengths and up to 10,000 square degrees of the sky (http://www.sdss.org/). Nearly 100 Co-Is will mine these data in prescribed ways. Current plans do not include mining the 15TB. Rather, 1TB of catalogs of detected objects and another 2TB of their “cutout” pictures will be produced and mined. Nevertheless, this will surely result in new understanding of our universe. Imagine what might be discovered if the full 15TB could be explored efficiently!

Another refreshing and very successful departure from tradition is the 2MASS infrared survey of the sky (http://irsa.ipac.caltech.edu). This group has poured major effort into usability of the data products and efficient remote searching.

A new survey, which will begin in perhaps 2012, will produce several Petabytes of data per year. An 8.4-meter aperture 3-mirror telescope with a 7 square degree field of view and uniformly excellent image quality will provide an unprecedented figure of merit for deep surveys. Equal emphasis will be placed on the survey products, their unique science capability and distribution to the community, the data pipeline, the camera and data system, and the telescope. In the ranked projects in the recent NRC AASC Decadal Survey for Astronomy, this facility was named the Large-aperture Synoptic Survey Telescope (LSST) to emphasize its multiple missions. Advances in three areas of technology (large aspherical optics fabrication and metrology, microelectronics, and Terascale computation) have come together in the design of the LSST system. The LSST fills a nearly unexplored region of parameter space and enables programs that would take many lifetimes on current facilities. These programs range from a nearly complete assay of near-Earth objects down to 300m in size to unique probes of cosmic dark mass-energy.

The data reduction and analysis for the LSST will be done in a way unlike most current observing programs. The data rate, combined with the need for real-time analysis and post-reduction data mining, requires a fresh approach that makes use of the best technology and develops innovative software for efficient data management. While much headway can be made in efficient algorithms and associated software, there will also be CPU and disk hardware challenges. It will be particularly effective to have data analysis innovations in place when the telescope and camera system is in the verification phase, as early as 2009. Achieving this requires parallel efforts on the optics, electronics, and software. The collaboration will include astronomers involved in current and upcoming ultra-large surveys, and experts on statistics and algorithm development, computer science, and data mining and visualization.

The Hardware Challenge

While the telescope optics are unique (they will have over fifty times the optical throughput of the current 4m wide-field cameras!), projections for the hardware associated with data processing and database mining must be studied now. The data rate from the 3 gigapixel camera will be over 20 TB per night, compressed. Pipeline image processing of this data stream will be possible using parallel processors five years from now. More interesting challenges are presented by the archiving and mining tasks. Storage technology is rapidly evolving, so that keeping all the data online will almost certainly be possible. Even more importantly, we need now to discover ways to search for correlations in the resulting massive database, which will be necessary to extract unanticipated science. While the required data hardware and software for the key (prescribed) science programs present challenges, assuring opportunity for unanticipated science using such huge databases presents an even greater challenge.

Designing optimal data handling and search routines will be an exciting aspect of this project. Many science programs may need access to the full imaging data archive. Examples are: searches for low surface brightness objects not detected as individual objects, and searches for patterns of transient objects in time and space.

There are several hardware issues. Most of the computation will necessarily be done locally to the data disks, most likely in a hierarchical nesting of processors and a corresponding hierarchical nesting of slow and fast disks. The computation requirements, while large, appear to be within reach in the next five years (CPU Moore’s law scaling is not expected to saturate until 2015, and there is the likelihood of disruptive innovation avoiding the quantum and charge confinement crisis even then). Capacity per disk is still increasing in a fashion comparable to Moore's law for processors. Today, for low-bandwidth storage, a $300 IDE disk holds about 100 GB; we can expect by 2009 that a comparably priced disk will hold over three terabytes.

The bandwidth to disks is also growing, but not at nearly the same rate. This means that as the current magnetic technology advances, the total time required to read all the data from a disk actually goes up, not down. In the last five years, disk bandwidth has improved a factor of six, so we can estimate at least another factor of six for the next five years.
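The claim that a full-disk read gets slower as disks improve can be checked with simple arithmetic. The sketch below uses the capacities quoted above (about 100 GB now, over 3 TB projected) and the factor-of-six bandwidth growth. The present-day sustained bandwidth of 50 MB/s is an assumed placeholder, not a figure from the text; it was chosen only so that six times it matches the roughly 300 MB/s projected for 2009.

```python
def full_read_time_s(capacity_gb: float, bandwidth_mb_s: float) -> float:
    """Seconds needed to stream an entire disk at a sustained bandwidth."""
    return capacity_gb * 1000.0 / bandwidth_mb_s  # decimal units: 1 GB = 1000 MB


# Capacities quoted in the text.
capacity_now_gb = 100.0
capacity_later_gb = 3000.0

# ASSUMPTION: 50 MB/s today is a placeholder value, not a sourced figure.
bandwidth_now_mb_s = 50.0
bandwidth_later_mb_s = bandwidth_now_mb_s * 6.0  # factor of six from the text

time_now = full_read_time_s(capacity_now_gb, bandwidth_now_mb_s)
time_later = full_read_time_s(capacity_later_gb, bandwidth_later_mb_s)
slowdown = time_later / time_now

print(f"full read now:   {time_now / 60:.0f} min")
print(f"full read later: {time_later / 60:.0f} min")
print(f"read time grows by a factor of {slowdown:.1f}")
```

The ratio does not depend on the assumed starting bandwidth: a thirtyfold gain in capacity divided by a sixfold gain in bandwidth gives a fivefold longer full read.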

At a pace of one 3 Gpixel image per 20 seconds (15 sec exposure, 5 sec read/repoint), each night the LSST will generate 13 TB uncompressed processed data in 8 hours -- that's about 450 MB/s. This suggests that we can assemble the required pipeline processing fast disks by 2009. We predict from scaling that disks by then will handle over 300MB/s, but that is sustained transfer rate under ideal conditions. Several disks in parallel will provide more headroom to get the data written to disk in real time. It is also likely that new technologies such as holographic storage will break this scaling in the next five years, at least for slow storage.
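The rate quoted above follows directly from the cadence and the nightly volume. The short sketch below reproduces it, assuming decimal units (1 TB = 10^12 bytes). The bytes-per-pixel value it prints is only what these numbers imply; it is not a figure stated in the text.

```python
EXPOSURE_S = 15          # exposure time per image, from the text
OVERHEAD_S = 5           # read/repoint time per image, from the text
NIGHT_HOURS = 8          # observing time per night, from the text
NIGHTLY_VOLUME_TB = 13   # uncompressed processed data per night, from the text
PIXELS_PER_IMAGE = 3e9   # 3 Gpixel camera, from the text

cadence_s = EXPOSURE_S + OVERHEAD_S
night_s = NIGHT_HOURS * 3600
images_per_night = night_s // cadence_s

# ASSUMPTION: decimal units, 1 TB = 1e6 MB = 1e12 bytes.
rate_mb_s = NIGHTLY_VOLUME_TB * 1e6 / night_s
bytes_per_image = NIGHTLY_VOLUME_TB * 1e12 / images_per_night
bytes_per_pixel = bytes_per_image / PIXELS_PER_IMAGE  # derived, not sourced

print(f"images per night: {images_per_night}")
print(f"sustained rate:   {rate_mb_s:.0f} MB/s")
print(f"implied size:     {bytes_per_pixel:.2f} bytes per pixel")
```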

Crafting the software pipeline and developing efficient database management tools and the algorithms for data mining will present more of a challenge than the pre-processing computational capacity. The demands of this post-processing will be hardware and software intensive. The effort invested in software, data system design, tools for visualizing and analyzing data, and the science data analysis, may be comparable to that spent on the telescope optics.

The Software Challenge

The enormity of these data sets creates exciting statistical challenges beyond the computational. Many statistical techniques used by astronomers today have been optimized to deal with the small size of their existing data sets. Their algorithms often scale as a power of the number of objects, which is prohibitive for the new data sets with billions of objects, where even log N (base 2) is 30! Algorithms which scale much worse than linearly will be unacceptable computationally. At the same time, the main source of errors will be various systematic effects, and we also have to deal with the fact that we can only observe a single realization of the universe as a random process. There is also “cosmic variance,” the expected statistical variations in space of the power spectrum.
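To make the scaling argument concrete, the following sketch compares operation counts for a catalog of one billion objects. The throughput of 10^9 simple operations per second is an assumed round number for illustration only, and real algorithms carry constant factors that are ignored here.

```python
import math

N = 10**9                  # "billions of objects", as in the text
OPS_PER_SECOND = 1e9       # ASSUMPTION: illustrative round number
SECONDS_PER_YEAR = 365.25 * 24 * 3600

log2_n = math.log2(N)      # close to 30 for N = 1e9

# Operation counts for a few common scalings (constant factors ignored).
costs = {
    "N": N,
    "N log2 N": N * log2_n,
    "N^2": N**2,
    "N^3": N**3,
}

print(f"log2 N = {log2_n:.1f}")
for name, ops in costs.items():
    seconds = ops / OPS_PER_SECOND
    years = seconds / SECONDS_PER_YEAR
    print(f"{name:>9}: {seconds:.3g} s  ({years:.3g} years)")
```

Under these assumptions the N log N case finishes in about half a minute, while a quadratic algorithm already needs decades.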

One can overcome the cosmic variance by surveying larger and larger volumes of the universe, but then the data sizes become even larger. We should think about approximate statistical techniques, where the approximation is within this variance, but the algorithm has a non-polynomial scaling. A recent example of such an attempt is Szapudi et al. 2000. For example, the new sky surveys are sufficiently large to allow accurate estimation of third and fourth order moment structure of galaxy locations. This immediately raises the issue of what functionals of higher order moment structures one should consider and how to relate these to theoretical models for the evolution of the universe. Deep surveys, which necessarily look far into the past, allow one to consider models for evolving large-scale structure, raising questions of how to estimate this evolution.

Furthermore, no matter what the size of the data set, there will always be features and scales for which the estimation error will be important, so the need for developing statistically efficient estimation schemes and methods for assessing estimation error will always remain. Combining statistical efficiency with computational efficiency will be a constant challenge, since the more statistically accurate estimation methods will often be the most computationally intensive.

With the ability to do fast correlation analysis on Petabytes of data, we could revolutionize how we detect faint moving objects or probe the underlying dark mass-energy of our universe. Weak gravitational lensing, the deflection of light by intervening clumps of dark matter, causes distortions in the observed shapes of galaxies. These data may then be inverted to yield a mass map of the intervening universe. Closer to home, potentially devastating near-Earth objects go undetected. New techniques of extracting relevant image parameters can be used on the Petascale imaging data to automatically find such objects. Similar image-mining techniques can be very relevant in other areas of science as well (satellite observations, biology, oceanography, etc). Generally, inverse problems (and the regularization of their solution) are computationally more intensive than simple n-point correlations. With Petabyte databases, new algorithms for statistical regularization must be developed.

Finally, data visualization will present a formidable challenge. Efficient methods for statistical visualization and sampling of large databases are required. User-reconfigurable trees of image feature catalogs driving multi-dimensional displays could help, but the opportunities here are largely unexplored.

A New Collaboration

We see this research program attracting a broad range of mathematical, computer and physical scientists. In addition to the obvious connections to astronomy, statistics and large-scale computation, this program would also include probability, data visualization and data management. We would also seek to include representatives from the high-energy physics community, who have faced somewhat different problems involving massive data sets and immense data streams for many years now. Some representation from theoretical cosmologists who simulate universes would add to the mix and allow the question of comparing simulated universes to the actual universe to be more profitably addressed.

It will be particularly useful to study the characteristics of spatial processes, since it nicely combines the central computational and statistical challenges. Very little work has been done to date in this area, although a recent paper by Moore et al. (2001) recognizes the importance of this problem and describes an algorithm for computing estimates of higher order correlation functions that, for sufficiently large data sets, is much more efficient than the obvious approach.
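As a rough illustration of why the "obvious approach" becomes impractical, the sketch below counts pairs of points closer than a separation r in two ways: by testing every pair, and by binning points into grid cells of side r so that only neighbouring cells are compared. This is not the algorithm of Moore et al. (2001), whose details are not given here; it is a simpler, hypothetical stand-in for the lowest-order (two-point) case, and all function names are invented for this example.

```python
import random
from collections import defaultdict
from itertools import combinations


def pairs_brute_force(points, r):
    """Count pairs closer than r by testing all N(N-1)/2 pairs."""
    r2 = r * r
    return sum(
        1
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if (x1 - x2) ** 2 + (y1 - y2) ** 2 < r2
    )


def pairs_grid(points, r):
    """Count the same pairs, comparing only points in neighbouring cells of side r."""
    r2 = r * r
    cells = defaultdict(list)
    for x, y in points:
        cells[(int(x // r), int(y // r))].append((x, y))

    # Four offsets that visit each unordered pair of adjacent cells exactly once.
    half_neighbours = [(1, -1), (1, 0), (1, 1), (0, 1)]
    count = 0
    for (cx, cy), members in cells.items():
        # Pairs inside the same cell.
        for (x1, y1), (x2, y2) in combinations(members, 2):
            if (x1 - x2) ** 2 + (y1 - y2) ** 2 < r2:
                count += 1
        # Pairs between this cell and half of its neighbours.
        for dx, dy in half_neighbours:
            other = cells.get((cx + dx, cy + dy))
            if not other:
                continue
            for x1, y1 in members:
                for x2, y2 in other:
                    if (x1 - x2) ** 2 + (y1 - y2) ** 2 < r2:
                        count += 1
    return count


rng = random.Random(0)  # fixed seed so the run is repeatable
sample = [(rng.random(), rng.random()) for _ in range(2000)]
print(pairs_brute_force(sample, 0.02), pairs_grid(sample, 0.02))
```

Both functions return the same count. The brute-force cost grows as the square of the number of points, while the grid version grows roughly in proportion to the number of points times the typical cell occupancy, which is the kind of gain that matters at survey scale.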

We need not simply a theoretical study of how massive astronomical data sets should be analyzed, but major efforts to analyze the most recently available data sets. Data from the Sloan Digital Sky Survey should be publicly available by 2003. It will be useful to work with this database in new ways, searching for low-level correlations. Deeper imaging surveys, such as the Deep Lens Survey, are producing imaging data and catalogs nearly to the depth that LSST will reach, but over a very small area of sky by comparison to a decade of LSST operations. Such surveys are precursors to LSST, and their data products will prove to be valuable sandboxes for development of new algorithms.


