Chapter 1

MINING ASTRONOMICAL DATABASES

Roberta M. Humphreys

Astronomy Department

University of Minnesota


Juan Cabanela

Department of Physics, Astronomy and Engineering Science

St. Cloud State University


Jeffrey Kriessler

Efficient Channel Coding, Inc.

Cleveland, Ohio


Abstract

The development of software tools and techniques for the efficient access and analysis of large astronomical databases poses some unique challenges. We briefly describe some of the problems that astronomical data and datasets present, give an example from our own efforts to automate the classification of galaxies, and then discuss where “clustering” algorithms may be applicable.

Keywords: databases, galaxy classification, data mining

Introduction

The number of space-based all-sky surveys, ranging from gamma rays and X-rays to the far-infrared and millimeter wavelengths, together with the supporting digitization programs from the optical photographic sky surveys (POSS I and II and the UK-SRC), is rapidly increasing. When we add in the new ground-based digital surveys, like 2MASS and DENIS in the near-infrared and the optical Sloan Digital Sky Survey (SDSS), an Internet-wide multi-wavelength astronomical dataset of at least 40 TB will soon exist. Deriving the maximum scientific benefit from this vast resource of fundamental data, and from the recently proposed National Virtual Observatory, will require efficient access, such as federated databases that span several databases at physically different locations, and new software techniques and tools, often referred to as “data mining”, for the analysis of large databases.

Astronomical databases, however, pose unique problems and challenges due not only to their very large size, but also to the variable quality of the data and the uncertainty of measurement over the entire electromagnetic spectrum, and to the nature of astronomical objects, with their very wide dynamic range in apparent luminosity and in apparent size (angular diameter). Astronomical databases will not only possess an unprecedented number of objects, but the astronomical objects themselves may also have a large number of attributes, leading to a very high-dimensional dataset.

Many of the necessary techniques and software packages, including artificial intelligence techniques like neural networks and decision trees, have already been successfully applied to astronomical problems such as pattern recognition and object classification, while new clustering and data association algorithms that may have application to large astronomical databases are being developed by computer science groups. However, these new software packages are often developed and tested on idealized or “clean” datasets that lack the “real noise” and uncertainty of measurement encountered particularly in large astronomical databases. The APS Catalog of the POSS I is an excellent resource for perfecting and testing these data mining techniques.

1. THE APS PROJECT

The Automated Plate Scanner (APS) Catalog of the POSS I is an on-line database of fundamental data and parameters for over 100 million stars and galaxies derived from our digitized scans of glass copies of the blue and red plates of the original, first epoch Palomar Observatory Sky Survey (POSS I). It is large enough, 25 GB, to present a realistic challenge for testing “data mining” algorithms on a range of astrophysical applications. Its scientific usefulness and validity have been demonstrated by numerous studies by members of the APS group and by our users.

The Catalog contains coordinates, magnitudes, colors, and several other computed image parameters for all of the matched images on the blue and red plates. It provides information for the individual stars and galaxies down to fainter than 21st magnitude (in the blue). The calculation of accurate image parameters and the reliable separation of stellar and non-stellar images (galaxies) have long been a focus of our work with the APS data. We were the first group to successfully apply AI techniques, specifically neural networks, to the image classification problem (see Odewahn et al. 1992, Odewahn et al. 1993, Odewahn 1995). Our neural network image classifier has been trained to the faint limit of the photographic plates and gives a success rate better than 90% to within one magnitude of the plate limit. It uses various image parameters with a back-propagation algorithm and two hidden layers to generate an output layer with two nodes, star or non-star (= galaxy). This “node gal” value, ranging from 0 to 1, also provides a confidence level for the classification and is cataloged with the image type.
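The structure of such a classifier can be illustrated with a minimal sketch. It is not the APS code: the network sizes, the learning rate, the two standardized input parameters and the synthetic training data are all assumptions made for illustration. It shows a feed-forward network with two hidden layers trained by back-propagation, whose second output node plays the role of a "node gal" value between 0 and 1.

```python
import math
import random

def sigmoid(x):
    x = max(-60.0, min(60.0, x))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-x))

class TinyMLP:
    """Feed-forward net: inputs -> hidden1 -> hidden2 -> 2 softmax outputs."""

    def __init__(self, sizes, rng):
        # sizes e.g. [2, 4, 4, 2]; each weight row carries a trailing bias term.
        self.layers = [
            [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]
            for n_in, n_out in zip(sizes[:-1], sizes[1:])
        ]

    def forward(self, x):
        """Return the activations of every layer, input included."""
        acts = [list(x)]
        for depth, layer in enumerate(self.layers):
            z = [sum(w * a for w, a in zip(row, acts[-1] + [1.0])) for row in layer]
            if depth == len(self.layers) - 1:      # softmax output layer
                m = max(z)
                e = [math.exp(v - m) for v in z]
                acts.append([v / sum(e) for v in e])
            else:                                   # sigmoid hidden layer
                acts.append([sigmoid(v) for v in z])
        return acts

    def train_step(self, x, target, lr):
        """One back-propagation update for a single example (cross-entropy loss)."""
        acts = self.forward(x)
        # Softmax with cross-entropy gives delta = output - target.
        delta = [o - t for o, t in zip(acts[-1], target)]
        for depth in range(len(self.layers) - 1, -1, -1):
            layer, inputs = self.layers[depth], acts[depth] + [1.0]
            if depth > 0:
                # Error for the layer below, computed before the weights change.
                below = [
                    sum(delta[j] * layer[j][i] for j in range(len(layer)))
                    * acts[depth][i] * (1.0 - acts[depth][i])
                    for i in range(len(acts[depth]))
                ]
            for j, row in enumerate(layer):
                for i in range(len(row)):
                    row[i] -= lr * delta[j] * inputs[i]
            if depth > 0:
                delta = below

    def node_gal(self, x):
        """Output of the 'galaxy' node, a value between 0 and 1."""
        return self.forward(x)[-1][1]

def make_synthetic(rng, n):
    """Hypothetical two-parameter images; the class centres are invented."""
    data = []
    for _ in range(n):
        is_gal = rng.random() < 0.5
        centre = (1.0, -1.0) if is_gal else (-1.0, 1.0)
        x = [rng.gauss(centre[0], 0.4), rng.gauss(centre[1], 0.4)]
        data.append((x, [0.0, 1.0] if is_gal else [1.0, 0.0]))
    return data

rng = random.Random(0)
train, test = make_synthetic(rng, 200), make_synthetic(rng, 100)
net = TinyMLP([2, 4, 4, 2], rng)
for _ in range(150):
    for x, t in train:
        net.train_step(x, t, lr=0.5)

correct = sum((net.node_gal(x) > 0.5) == (t[1] == 1.0) for x, t in test)
accuracy = correct / len(test)
print(f"test accuracy: {accuracy:.2f}")
```

Thresholding the galaxy node at 0.5 gives the image type, while the value itself can be stored alongside it as a confidence measure.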

The completed catalog of objects is available as an on-line database over the Internet (URL is http://aps.umn.edu). Querying is achieved with a custom-designed database management system called StarBase, capable of handling millions of entries. StarBase was developed in collaboration with faculty and students of the University of Minnesota Computer Science Department. It uses specialized hashing on each image parameter derived for every catalog entry, including a two-dimensional hierarchical algorithm for positional search and retrieval. This level of optimization provides us with a DBMS that is faster and smaller than a commercial equivalent. A complementary image database is also available and includes all of the matched images in the object catalog as well as the unmatched images above the noise threshold on both the blue and red plates.
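The details of the StarBase positional algorithm are not given here, but the general idea of a two-dimensional hierarchical index can be sketched as follows. This is an assumed, quadtree-style scheme and not the StarBase implementation: the names `cell_key` and `PositionIndex`, the depth, and the sample coordinates are hypothetical, and the sketch ignores RA wrap-around and spherical geometry.

```python
from collections import defaultdict

def cell_key(ra, dec, depth):
    """Quadtree key: one digit (0-3) per level, coarse to fine."""
    ra_lo, ra_hi, dec_lo, dec_hi = 0.0, 360.0, -90.0, 90.0
    digits = []
    for _ in range(depth):
        ra_mid, dec_mid = (ra_lo + ra_hi) / 2, (dec_lo + dec_hi) / 2
        east, north = ra >= ra_mid, dec >= dec_mid
        digits.append(str(2 * north + east))
        ra_lo, ra_hi = (ra_mid, ra_hi) if east else (ra_lo, ra_mid)
        dec_lo, dec_hi = (dec_mid, dec_hi) if north else (dec_lo, dec_mid)
    return "".join(digits)

class PositionIndex:
    """Hash table of sky cells, searched by descending the cell hierarchy."""

    def __init__(self, depth=6):
        self.depth = depth
        self.cells = defaultdict(list)

    def insert(self, obj_id, ra, dec):
        self.cells[cell_key(ra, dec, self.depth)].append((obj_id, ra, dec))

    def box_query(self, ra_min, ra_max, dec_min, dec_max):
        hits = []

        def descend(key, ra_lo, ra_hi, dec_lo, dec_hi):
            # Prune any cell that does not overlap the query box.
            if ra_hi < ra_min or ra_lo > ra_max or dec_hi < dec_min or dec_lo > dec_max:
                return
            if len(key) == self.depth:
                hits.extend(o for o in self.cells.get(key, ())
                            if ra_min <= o[1] <= ra_max and dec_min <= o[2] <= dec_max)
                return
            ra_mid, dec_mid = (ra_lo + ra_hi) / 2, (dec_lo + dec_hi) / 2
            descend(key + "0", ra_lo, ra_mid, dec_lo, dec_mid)
            descend(key + "1", ra_mid, ra_hi, dec_lo, dec_mid)
            descend(key + "2", ra_lo, ra_mid, dec_mid, dec_hi)
            descend(key + "3", ra_mid, ra_hi, dec_mid, dec_hi)

        descend("", 0.0, 360.0, -90.0, 90.0)
        return sorted(o[0] for o in hits)

index = PositionIndex(depth=6)
index.insert("A", 194.95, 27.98)   # invented sample positions
index.insert("B", 195.03, 27.90)
index.insert("C", 10.68, 41.27)
found = index.box_query(194.0, 196.0, 27.0, 29.0)
print(found)
```

Because whole branches of the hierarchy are discarded as soon as they fall outside the query box, only a small fraction of the cells needs to be inspected for a typical small-field query.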

We have recently installed a federated database system (FDBS) called Myriad (Lim et al. 1995), developed by Professor Jaideep Srivastava’s group in our Computer Science Dept. The FDBS integrates the APS catalog and image database so that they appear as one easy-to-use resource. With an FDBS, the queries and transactions on the integrated database are performed as if it were a single database. The separate DBMSs are hidden from view by a flexible interface. It is important to emphasize that the FDBS permits horizontal access to the data, not just vertical. Queries can be made not only by sky position, but also by any parameter in either database.
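The notion of a single query interface over separate databases can be illustrated with a small sketch. It does not reproduce Myriad: the two in-memory SQLite databases stand in for the separate catalog and image DBMSs, and the table layout, column names, the `FederatedView` class and the sample rows are all hypothetical.

```python
import sqlite3

def make_db(schema, rows):
    """Create one independent in-memory database (a stand-in for a separate DBMS)."""
    db = sqlite3.connect(":memory:")
    db.execute(schema)
    db.executemany(f"INSERT INTO t VALUES ({','.join('?' * len(rows[0]))})", rows)
    return db

# Invented sample rows; obj_id links an object across the two databases.
catalog_db = make_db(
    "CREATE TABLE t (obj_id INTEGER, ra REAL, dec REAL, mag_blue REAL)",
    [(1, 194.95, 27.98, 15.2), (2, 195.03, 27.90, 19.8), (3, 10.68, 41.27, 17.1)],
)
image_db = make_db(
    "CREATE TABLE t (obj_id INTEGER, diameter_arcsec REAL)",
    [(1, 42.0), (2, 6.5), (3, 18.0)],
)

class FederatedView:
    """Presents both databases as one set of merged records."""

    def __init__(self, catalog, images):
        self.catalog, self.images = catalog, images

    def select(self, where=lambda rec: True):
        """Return merged records from both databases that satisfy `where`."""
        sizes = dict(self.images.execute("SELECT obj_id, diameter_arcsec FROM t"))
        out = []
        for obj_id, ra, dec, mag in self.catalog.execute("SELECT * FROM t ORDER BY obj_id"):
            rec = {"obj_id": obj_id, "ra": ra, "dec": dec,
                   "mag_blue": mag, "diameter_arcsec": sizes.get(obj_id)}
            if where(rec):
                out.append(rec)
        return out

view = FederatedView(catalog_db, image_db)
# A query on parameters from both databases, with no positional constraint.
bright_and_large = view.select(lambda r: r["mag_blue"] < 18 and r["diameter_arcsec"] > 15)
ids = [r["obj_id"] for r in bright_and_large]
print(ids)
```

The caller never addresses either underlying database directly, and the selection criterion may combine parameters that live in different databases.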

Although we have had considerable success with our neural network-based object classifier for our research applications, for many astrophysical problems the actual morphological type of the galaxy is very important, especially for studies of galaxy formation and evolution and large-scale structure in the universe. Working with members of our Computer Science Department, we have recently had some success applying data mining and pattern recognition codes to identifying the most useful parameters for automating the classification of the galaxy images by their morphological types.



Ever since the discovery of galaxies, it has been known that these assemblies of stars, dust and gas have different morphological shapes.

In 1936, Edwin Hubble established a system to classify galaxies into three fundamental types. Elliptical galaxies had an elliptical shape with no other discernible structure. Spiral galaxies had an elliptical nucleus surrounded by a flattened disk of stars and dust containing a spiral pattern of brighter stars. The irregular galaxies, as their name suggests, were irregularly shaped and did not fit into the other two categories. As more galaxies were observed, it became apparent that the galaxy types formed a continuous sequence: from nearly spherical galaxies to more flattened ellipticals; through the lenticulars, galaxies with a large nucleus and a small disk with no spiral structure; to the true spirals, starting with tightly wound spiral arms and proceeding to less tightly wound arms; and concluding with the irregulars. In other words, Hubble arranged galaxies in order of increasing complexity. Although many subdivisions and refinements have been made within the Hubble classification system, we are primarily concerned here with identifying the four basic types of galaxies: ellipticals, lenticulars, spirals, and irregulars.

The classification of galaxies is typically performed by visual inspection of photographic plates. This is by no means an easy task, requiring a great deal of practice and time on the part of the classifier. Large catalogs containing a few tens of thousands of galaxies [e.g. the Third Reference Catalogue of Bright Galaxies (de Vaucouleurs et al. 1991)] take years to compile. With today’s large all-sky surveys generating millions of galaxy images, human classification is no longer a viable option. Furthermore, although the types are well defined, human classifications tend to be subjective, and it is difficult for independent researchers to reproduce results. Often it is very difficult to distinguish between adjacent types. For instance, a lenticular galaxy viewed face-on, looking down at the disk, looks very similar to an elliptical, and all morphological catalogs have fewer face-on lenticulars than edge-on, where the presence of a disk is more readily discerned. Studies also show that morphological catalogs of galaxies produced by even the best human classifiers disagree with other classifiers between 10% and 20% of the time. Therefore, in order to produce the large, objective, reproducible morphological catalogs necessary for galaxy formation and evolution studies, computer-generated classifications are required.

There have been a few recent attempts to create an automated classification system, generally using artificial neural networks (Odewahn 1995, Naim et al. 1995). While limited success has been achieved, these computer-based classifiers have yet to produce large, unbiased morphological galaxy catalogs. The reason seems to be that, while it has been possible to train a neural network to correctly classify a well-defined, hand-picked set of galaxies, the networks fail to give results that can equal human classifications when applied to the large random samples of galaxy images to which any classifier must ultimately be applied.

In our attempts to solve this problem we have visually classified some 1500 galaxy images obtained from the APS database in the region of the north galactic pole. Although this training set was chosen based solely on the brightness and size of galaxy images on 9 photographic plates, it is important to note that galaxies which were hard to classify (less than 1%) were removed from this sample. The first problem is to identify a set of parameters which can separate the galaxies by their types. This has turned out to be quite challenging. The human eye can easily recognize complicated patterns in images, such as spiral arms, which tend to be spotty, blotchy affairs that are difficult for automated techniques. Often it is necessary to rely on secondary effects such as color (spiral galaxies tend to be bluer than ellipticals), which are not specifically part of the classification system as originally conceived. If a picture is worth a thousand words, with a little imagination a galaxy image can be described by hundreds of parameters, all of which may have some relation to the morphological type. Currently, we calculate over five hundred such parameters for each galaxy in the APS database.

Unfortunately, we cannot simply present all of these parameters to a neural network and let the training algorithm determine which are the most important. We would merely end up with a network that has memorized the training sample perfectly, but performs poorly on samples not seen during training. In order to have a reasonable chance of spanning a five-hundred-dimensional parameter space we would require a training sample of many millions of visually classified galaxies, which is the very thing we are trying to avoid.

With drastic increases in training set size ruled out for practical reasons, another option is to limit the number of parameters presented to a neural network. The question is, which parameters to choose? If one or two parameters yielded adequate separation, we could merely plot all the parameters in turn and see which provided the greatest distance between the clumps defining the various types. Unfortunately, this is not the case. While several parameters show trends with galaxy type, no combination of two or three parameters is capable of solving the problem.

The problem of finding clusters in high-dimensional spaces, however, falls within the sphere of data mining. Working with the data mining group at the University of Minnesota, we applied the program Mineset to the task. This program allows quick evaluation and ranking of the parameters, as well as the creation of a decision tree classifier. Using the 10 best parameters, we have been able to achieve a classifier with an 85% accuracy when treating the ellipticals and the lenticulars as a single class.
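The ranking step can be illustrated with a short sketch. The criterion used by Mineset is not described here, so the sketch substitutes a simple Fisher-type score (between-class variance of the class means divided by the mean within-class variance) as a stand-in. The parameter names, the class offsets and the synthetic data are assumptions made for illustration only.

```python
import random
from statistics import mean, pvariance

def fisher_score(values, labels):
    """Between-class variance of the class means over the mean within-class variance."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    overall = mean(values)
    n = len(values)
    between = sum(len(g) * (mean(g) - overall) ** 2 for g in groups.values()) / n
    within = sum(len(g) * pvariance(g) for g in groups.values()) / n
    return between / within if within > 0 else float("inf")

def rank_parameters(table, labels, top_k):
    """table maps parameter name -> list of values, one per galaxy."""
    scores = {name: fisher_score(col, labels) for name, col in table.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Synthetic sample: two informative parameters and eight pure-noise parameters.
rng = random.Random(1)
types = ["E/S0", "spiral", "irregular"]
labels = [types[i % 3] for i in range(300)]
offset = {"E/S0": 1.0, "spiral": 0.0, "irregular": -1.0}   # invented class offsets
table = {
    "color": [offset[t] + rng.gauss(0, 0.5) for t in labels],
    "concentration": [0.5 * offset[t] + rng.gauss(0, 0.5) for t in labels],
}
for i in range(8):
    table[f"noise_{i}"] = [rng.gauss(0, 1) for _ in labels]

best = rank_parameters(table, labels, top_k=2)
print(best)
```

A score of this kind evaluates each parameter on its own, so it cannot reward parameters that are only useful in combination; a decision tree built on the top-ranked parameters is one way to exploit such combinations.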

While this is still short of our goal of creating a classifier as good as the human classifiers, it is a step in the right direction. In order to improve our classifier we continue to seek parameters that can provide better separation between the morphological types. However, it is possible that our 500 parameters already contain enough information to correctly classify galaxies, and that by limiting the number to only ten we are ignoring useful information. At the same time, examination of the misclassified galaxies often reveals an anomaly in the image which confuses the computer classifier. Examples include foreground stars or faint background galaxies within the galaxy image, or the presence of dust lanes in an otherwise structureless elliptical galaxy. While a human classifier routinely discounts these deviations, automated classifiers see only the parameters presented to them. A possible solution is to train a large number of neural networks, each of which is presented with a small number of the parameters. The final classification is then taken to be a weighted average of the outputs of all the classifiers. This procedure allows a more robust classification to be performed, as one or two deviant parameters can be outweighed by the vast majority of normal parameters for that galaxy type.
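The proposed combination scheme can be sketched as follows. To keep the example short, each member of the ensemble is a nearest-centroid classifier on three parameters rather than a neural network, the weights are simply the members' training accuracies, and the weighted average is taken over one-vote-per-member outputs. These choices, the twelve synthetic parameters and the deviant test image are all assumptions made for illustration.

```python
import random

def train_member(rows, labels, features):
    """Nearest-centroid classifier on a few parameters (stand-in for a small network)."""
    centroids = {}
    for lab in set(labels):
        members = [r for r, l in zip(rows, labels) if l == lab]
        centroids[lab] = [sum(r[f] for r in members) / len(members) for f in features]

    def predict(row):
        return min(centroids, key=lambda lab: sum(
            (row[f] - c) ** 2 for f, c in zip(features, centroids[lab])))

    # Weight each member by its accuracy on the training sample.
    weight = sum(predict(r) == l for r, l in zip(rows, labels)) / len(rows)
    return predict, weight

def ensemble_predict(members, row):
    """Weighted vote: each member adds its weight to the class it predicts."""
    votes = {}
    for predict, weight in members:
        lab = predict(row)
        votes[lab] = votes.get(lab, 0.0) + weight
    return max(votes, key=votes.get)

# Synthetic training sample: 12 parameters, each shifted by +1 or -1 with the type.
rng = random.Random(2)
n_params = 12
labels = ["elliptical", "spiral"] * 100
rows = [[(1.0 if lab == "elliptical" else -1.0) + rng.gauss(0, 1) for _ in range(n_params)]
        for lab in labels]

# Twelve members, each seeing three neighbouring parameters.
members = [train_member(rows, labels, [i, (i + 1) % n_params, (i + 2) % n_params])
           for i in range(n_params)]

# A typical elliptical with one badly deviant parameter (e.g. a foreground star).
deviant = [1.0] * n_params
deviant[0] = -8.0
single_vote = members[0][0](deviant)          # this member sees parameter 0
ensemble_vote = ensemble_predict(members, deviant)
print(single_vote, ensemble_vote)
```

In this sketch only three of the twelve members ever see the deviant parameter, so their votes are outweighed by the nine members that see only normal values.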


