




WFCAM SCIENCE ARCHIVE
SCIENCE REQUIREMENTS ANALYSIS DOCUMENT



Nigel Hambly, Ian Bond & Bob Mann
(with contributions from the UKIDSS Consortium, Andy Adamson & Jim Emerson)
Wide Field Astronomy Unit (WFAU), Institute for Astronomy, University of Edinburgh

Modification history:

Version Date Comments
1.00 28/10/02 Original version (NCH & IAB)
1.10 04/12/02 Updated following input from UKIDSS, AA & JPE (NCH)
1.20 08/01/03 Updated following CASU & WFAU review (NCH)
1.30 20/03/03 Inserted `analysis' in title



INTRODUCTION

Standard, top-level analysis for complex digital systems consists of:

  1. definition of requirements and specifications,
  2. undertaking analysis and design,
  3. code development and debugging,
  4. unit and integration testing,
  5. deployment and maintenance.
This sequence is usually iterative, since changes in scope or specifications feed back to modify the system requirements.

WFCAM (see Appendix A.1) is a new camera for the 3.8m United Kingdom Infrared Telescope. This large format camera will have an unprecedented data rate. Ultimately, successful science exploitation of WFCAM will depend on user access to the large data volumes generated by this instrument. Data volumes are far in excess of those that users can expect to hold and process on their own facilities. This leads to the concept of pipeline processing and the establishment of a centralised `science archive'. The project to develop the WFCAM Science Archive (hereafter WSA) is outlined in Appendix A.2 and references therein.

This `science requirements document' (SRD) details the basic requirements for the WSA, and represents item 1 in the above sequence. The intention is to state the top-level science requirements placed on the WSA as a whole; to give science usage examples of the WSA; and finally to discuss in more detail those requirements pertaining to the WSA, in order to produce a specification for its design. The approach taken in this document is to distil the external, top-level requirements and the usage examples, through analysis and implication, into an explicit statement of the WSA contents and functionality. Hence, the SRD is structured as follows: Section 2 points to the online requirements and usage examples; Section 3 analyses the requirements in detail; and Section 4 summarises the resulting WSA contents and functionality. Background on WFCAM and the WSA project is given in Appendix A.

Subsequently, we intend to follow the sequence above and undertake a design for the WSA, including documentation of Data Products, Data Flow, Hardware Architecture and Software Architecture. The detailed specification for the WSA will be developed in those subsequent documents rather than in the SRD.

The requirements and usage examples are not intended to be set in stone at this stage. Both are `living' documents in the sense that they are available on the web (URLs are given in the next Section) and are subject to small alterations as the WFCAM and UKIDSS projects progress. This document takes these inputs as they stand at the time of writing (Q4 2002) and analyses them in order to provide a working baseline for the WSA development project.

The WFAU WSA development homepage is at http://www.roe.ac.uk/~nch/wfcam.


REQUIREMENTS AND USAGE EXAMPLES

The specified requirements are available online at http://www.jach.hawaii.edu/~adamson/wfarcrq.html, while usage examples are available online at http://www.roe.ac.uk/~nch/wfcam/ and have been developed in collaboration with the UKIDSS Consortium.


REQUIREMENTS ANALYSIS

In the following analysis, we discuss in more detail the top-level requirements referenced in the previous Section. Each item has an associated Rationale; Implications, which discuss the consequences for the WSA design; an optional Note; and finally a concise statement of the Requirement, to be developed further in later Sections. The requirements are not to be changed without consultation (primarily with JAC and UKIDSS).

Top-level requirements

T1:
Science archive shall provide the maximum possible potential for capitalizing on the UKIDSS surveys.
Rationale: UKIDSS will absorb the greater fraction (75%) of all WFCAM time on UKIRT and so is the top priority for WSA usage.
Implications: The UKIDSS programme must be the prime science driver for the WSA. Archive development needs to be an open process, with as much UKIDSS involvement as possible. Hence, full and up-to-date documentation needs to be available in web-browsable form as well as hardcopy. The tight schedule for WFCAM, the competition from CFHT's WIRCAM, and the need for timely release of data for competitive and high-impact science place a correspondingly tight schedule on delivery of the WSA. Resource/time constraints imply a phased approach to WSA development, with a commitment to producing a basic working archive system by instrument first light, followed by development to a fully functioning archive system thereafter. To expedite delivery of the WSA, design should be based on existing archive solutions and code where appropriate.
Note: WFCAM is currently due for delivery by Q4 2003; UKIDSS survey operations will likely begin in earnest in Q1 2004.
Requirement:
A basic working science archive (hereafter `Version 1.0') must be in place at Q4 2003. A fully functioning archive system (hereafter `Version 2.0'), as defined by the requirements herein, must be available as soon as possible after WFCAM first light, and no later than 1 year after survey operations begin in earnest.



T2:
Science Archive must contain and serve pipeline processed data (processed pixels, object catalogues and housekeeping data) from both UKIDSS and other usage (e.g. open time, commissioning time).
Rationale: Even small PATT programmes (for example) may produce large amounts of data that are problematic for the user's home institute resources. Moreover, non-survey data will be a valuable datamining resource (see later).
Implications: WSA data accumulation must take into account non-survey usage. Database schema design must be flexible to allow for non-survey data. Proprietary rights need to be protectable in the WSA.
Note: Pipeline processing and subsequent archiving cannot be undertaken for frames taken in non-standard observing modes. For non-survey data that are taken in standard modes, a limited set of standardised schemas will be set up and the data will be archived; it will not be possible to develop individual schemas on a case-by-case basis.
Requirement:
Science Archive (all Versions) must contain and serve pipeline processed data (pixels, object catalogues and housekeeping data) from both UKIDSS and other usage (e.g. open time, commissioning time).

T3:
Science Archive must be flexible to cope with alterations to UKIDSS survey design over time.
Rationale: The UKIDSS observing allocation and programme are subject to change by the Board on a 2 yearly rolling review.
Implications: WSA design must not preclude changes in design of the major surveys. Again, database design must be sufficiently modular and flexible to cope with this.
Note: Following the initial Board review in May 2002, the next review is expected in mid-2004, with reviews every two years thereafter.
Requirement:
Science Archive (all Versions) will match UKIDSS survey requirements as they are currently specified, but will be flexible enough to follow changes in survey design.



T4:
Science Archive design must facilitate usage from `Grid clients' and inclusion in the Virtual Observatory (VO).
Rationale: Given the legacy aspect of the UKIDSS surveys (especially the LAS and GPS) it is expected that the WSA will form a substantial element in the `datagrid' of the VO (indeed, WFCAM is a prime science driver in the UK's AstroGrid project).
Implications: WSA access tools, data product formats and transfer protocols must conform to internationally agreed VO standards.
Note: The AstroGrid Phase A report (October 2002) is now available and provides information on VO development prototypes.
Requirement:
Version 1.0 Science Archive will conform to existing standards and will be designed such that new standards can be easily incorporated, but must not be delayed by waiting for new developments to crystallise. Ultimately, the Science Archive must conform to internationally agreed VO standards in access tools, data product formats and transfer protocols.



T5:
Science Archive must allow, for example, simple and complex queries, with appropriate interfaces.
Rationale: Many users will query the WSA, from the Grid-client `power user' to the casual, non-expert interactively browsing astronomer. Both are important from the science exploitation point of view.
Implications: Different levels of user interface will be needed for the WSA, from interactive web forms through remote-client GUIs to Grid-enabled clients.
Requirement:
Version 1.0 Science Archive will allow simple (see later) queries. Version 2.0 Science Archive will allow usages at varying levels of complexity (as defined later).

T6:
Science Archive must be simple to use for PR purposes.
Rationale: UKIDSS is the next development in the UK's Wide Field programme. High profile science will emerge from UKIDSS, and as the first point of contact with the data, the WSA must be designed appropriately.
Implications: Again, the WSA must be user-friendly to the casual, browsing user. `Aesthetic' data products (e.g. pseudo-colour images) must be available, in addition to `serious' science products.
Note: The SDSS has good examples of entry points for PR purposes (URL) as well as scientist access points (URL). However, while the production of individual images is a requirement of the WSA, the responsibility for designing and maintaining a `gallery' website of publicity images lies elsewhere (e.g. with JAC and/or UKIDSS).
Requirement:
Science Archive (all Versions) must have interfaces that are open to simple, intuitive use by the non-expert.



T7:
Science Archive must allow access to survey data before all observations are complete, and must not be disrupted by regular ingest of new survey data.
Rationale: Rapid exploitation requires immediate access. The full UKIDSS programme will take six years or more, and users will want to undertake preliminary analysis after months of data accumulation rather than wait until the full survey datasets are released.
Implications: WSA design must allow for constant data ingest and regular data releases (e.g. interim survey products). WSA must allow for updates to calibrated quantities. In the event of bug fixes and/or enhancements of processing algorithms, WSA must allow catalogues from `reruns' of the processing pipeline over pixel datasets to be archived alongside the catalogues from previous runs.
Note: The approach taken with the WFAU's SSS database is to locally mirror the entire released dataset so that two versions are held: a static online version, and another online (but inaccessible from the outside) version for updates. At a release point, the update version becomes the network online version and is copied back to mirror the latest updates, after which the cycle repeats.
Requirement:
Version 1.0 Science Archive must be operable in time for WFCAM first light. Interim survey products must be released to the community on timescales determined by WFCAM observing periods (i.e. a survey `release' will occur as soon as possible after each observing period, and before the end of the following period).
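The two-copy release cycle described in the SSS Note above can be sketched as follows. This is a minimal illustration only, assuming a hypothetical `ReleaseMirror` class that holds each copy as an in-memory dictionary; in practice each copy would be a complete database instance, and none of these names belong to the SSS or WSA software.

```python
import copy


class ReleaseMirror:
    """Hypothetical two-copy scheme: a static public copy plus a private update copy."""

    def __init__(self, initial):
        # Both copies start identical.
        self.public = copy.deepcopy(initial)
        self.staging = copy.deepcopy(initial)
        self.release_number = 0

    def ingest(self, key, record):
        # New or recalibrated data go only into the staging copy,
        # so queries against the public copy are never disrupted.
        self.staging[key] = record

    def query(self, key):
        # External users see only the most recently released version.
        return self.public.get(key)

    def release(self):
        # At a release point the staging copy becomes the public copy,
        # then is copied back so the next cycle starts from the latest release.
        self.public = self.staging
        self.staging = copy.deepcopy(self.public)
        self.release_number += 1
        return self.release_number


mirror = ReleaseMirror({"field_001": {"nframes": 3}})
mirror.ingest("field_002", {"nframes": 1})
print(mirror.query("field_002"))  # None: not yet released
mirror.release()
print(mirror.query("field_002"))  # {'nframes': 1}
```

The design choice this illustrates is that ingest and release are decoupled: continuous ingest touches only the hidden copy, which is what allows interim releases without disrupting online users.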



T8:
Science Archive must allow requests for arithmetic operations, and options from an advanced processing toolkit, on pixel data.
Rationale: Pixel data volumes will be too large for efficient transfer to users' home institutes for manipulation.
Implications: WSA needs sufficient online storage for pixel data, and sufficient CPU, temporary storage and appropriate software toolkits for pixel manipulation.
Note: The astronomy community in general (for example, CASU and Subaru) is developing pixel processing algorithms, so not all routines will need to be coded from scratch.
Requirement: Version 2.0 Science Archive must allow requests for arithmetic operations, and options from an advanced processing toolkit (see later), on pixel data. There is no requirement on the Version 1.0 Science Archive to provide this advanced functionality, since we do not anticipate any demand for it immediately after first light.

T9:
Science Archive must be scalable to VISTA data volumes.
Rationale: The WFCAM and VISTA cameras (and science programmes being pursued with them) are similar enough that it makes sense to produce a scalable solution from WFCAM to VISTA for cost effectiveness.
Implications: WSA developments must be open to scrutiny by, and must receive input from, the VISTA project.
Note: VISTA first light is currently scheduled for Q4 2006.
Requirement:
Despite the need to expedite delivery of the WSA, development will be made at all times with due regard to scalability to VISTA data volumes.



T10:
Science Archive must be able to merge reduced frames taken in non-photometric conditions with other data from the same survey.
Rationale: Rapid progress may require acceptance of sub-optimal observations in lieu of better, later repeated observations.
Implications: WSA must be able to cope with sub-optimal data and their subsequent displacement by better, repeat observations.
Requirement:
Science Archive (all Versions) must be able to cope with sub-optimal survey observations, and their subsequent displacement by better, repeated observations.
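The displacement of sub-optimal data by better repeats can be sketched as a simple ranking per survey field. This is a minimal sketch under loudly stated assumptions: each frame is assumed (hypothetically) to carry a `photometric` flag and a seeing estimate in arcsec, photometric frames always outrank non-photometric ones, and smaller seeing breaks ties. The real quality criteria would come from the survey definitions, not from this example. Displaced frames are retained rather than deleted, so duplicate data remain accessible.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Frame:
    frame_id: str
    field_id: str
    photometric: bool      # hypothetical quality flag
    seeing_arcsec: float   # hypothetical quality measure


@dataclass
class FieldState:
    best: Optional[Frame] = None
    others: list = field(default_factory=list)


def quality_key(frame):
    # Photometric beats non-photometric; then smaller seeing wins.
    return (frame.photometric, -frame.seeing_arcsec)


def ingest(fields, frame):
    state = fields.setdefault(frame.field_id, FieldState())
    if state.best is None:
        state.best = frame
    elif quality_key(frame) > quality_key(state.best):
        # Displace the current best, but keep it as a duplicate observation.
        state.others.append(state.best)
        state.best = frame
    else:
        state.others.append(frame)


fields = {}
ingest(fields, Frame("f1", "LAS_0001", False, 1.1))  # early, non-photometric
ingest(fields, Frame("f2", "LAS_0001", True, 0.9))   # better repeat displaces f1
ingest(fields, Frame("f3", "LAS_0001", True, 1.2))   # worse repeat kept as duplicate
print(fields["LAS_0001"].best.frame_id)               # f2
```

The frame and field identifiers here are invented for illustration.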



T11:
Science Archive must have some capability for the remote user to carry out data exploration and interaction in real time.
Rationale: The UKIDSS programme contains many instances (e.g. see the specific usage examples) where the remote user will want to manipulate and visualise large amounts of data quickly (i.e. without transferring the large dataset to their own machine).
Implications: Remote client GUI tools will need to be developed for the WSA to enable such interactive data exploration and manipulation. `Real time' interaction has implications for WSA response time when trawling Tbyte-sized datasets. Clearly, $\sim1000$sec response time is unacceptable for interactive use, while $\sim10$sec response time is unrealistic given current technological and financial constraints (such a fast response time may be feasible with a very high degree of parallelism, with consequent complexity and cost implications). For these purposes, a figure of $\sim100$sec response time seems reasonable.
Note: Of course, for queries on indexed quantities (position, image class, brightness and other commonly used attributes), WSA response time will be fast but ultimately limited by factors beyond the control of WFAU (e.g. user network connectivity).
Requirement:
Version 2.0 Science Archive must have some capability for the remote user to carry out data exploration and interaction in real time, where `real time' is understood to mean a timescale of $\sim100$sec for wholesale trawls. No requirement on Version 1.0 Science Archive system to provide this speed; the ultimate goal should be a response time of $\sim10$sec.
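A back-of-envelope calculation shows why the response-time figures above translate directly into hardware parallelism. Assuming, for illustration only, a 1 Tbyte dataset that must be read in full for a wholesale trawl and a sustained read rate of 50 Mbyte/s per disk (neither figure is a WSA specification), and ignoring indexing, caching and CPU costs:

```python
# Illustrative assumptions, not measured WSA figures.
DATASET_BYTES = 1e12             # a 1 Tbyte table scanned in full
PER_DISK_BYTES_PER_SEC = 50e6    # assumed sustained read rate per disk


def required_throughput(dataset_bytes, response_sec):
    """Aggregate read rate (bytes/s) needed to scan the whole dataset once."""
    return dataset_bytes / response_sec


for response_sec in (1000, 100, 10):
    rate = required_throughput(DATASET_BYTES, response_sec)
    disks = rate / PER_DISK_BYTES_PER_SEC
    print(f"{response_sec:>5} s -> {rate / 1e9:6.1f} Gbyte/s, ~{disks:.0f} disks in parallel")
```

Under these assumptions the $\sim1000$sec, $\sim100$sec and $\sim10$sec targets require aggregate rates of 1, 10 and 100 Gbyte/s, i.e. roughly 20, 200 and 2000 disks reading in parallel, which illustrates why the $\sim10$sec figure implies a very high degree of parallelism.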

Science archive contents and functions (minimum)

C1:
Contains calibrated object catalogues resulting from the pipeline, for both UKIDSS and open-time observations.
Rationale: These are obvious, basic science archive functions.
Implications: Database schemas must be set up for various tables of object catalogues. Catalogue ingest software and procedures will be required. Software will be required for `post-processing' type operations, for example, merging routines and recalibration routines.
Requirement:
Science Archive (all Versions) must contain calibrated object catalogues resulting from the pipeline, for both UKIDSS and open-time observations.



C2:
Ingests and stores pipeline output frames for later online processing, generates compressed pixel images on the fly for rapid web-based access, carries out immediate cross-referencing with existing UKIDSS survey data, and produces a consolidated UKIDSS catalogue in a given field.
Rationale: Again, basic science archive functionality.
Implications: Database schemas must be designed to track between object catalogue tables and pixel data files. Pixel manipulation software will be required.
Requirement:
Science Archive (all Versions) must ingest and store pipeline output frames, allow rapid web-based access to images, and produce merged UKIDSS catalogues in a given field.
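The merging step behind a consolidated multi-passband catalogue is essentially a positional cross-match. The sketch below is a minimal brute-force illustration, assuming hypothetical per-passband source lists (dictionaries with `ra`, `dec` in degrees and `mag`) and a hypothetical 1 arcsec match radius; it attaches the nearest J-band detection to each K-band source and does not enforce one-to-one matching. A production archive would use a spatial index rather than this O(N×M) loop.

```python
import math


def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcsec (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h))) * 3600.0


def merge_passbands(k_sources, j_sources, radius_arcsec=1.0):
    """Attach the nearest J-band detection within radius_arcsec to each K-band source."""
    merged = []
    for k in k_sources:
        best, best_sep = None, radius_arcsec
        for j in j_sources:
            sep = angular_sep_arcsec(k["ra"], k["dec"], j["ra"], j["dec"])
            if sep <= best_sep:
                best, best_sep = j, sep
        merged.append({"ra": k["ra"], "dec": k["dec"], "kmag": k["mag"],
                       "jmag": best["mag"] if best is not None else None})
    return merged


k_cat = [{"ra": 150.0, "dec": 2.0, "mag": 16.2}, {"ra": 150.01, "dec": 2.0, "mag": 17.5}]
j_cat = [{"ra": 150.0001, "dec": 2.0, "mag": 17.1}]
merged = merge_passbands(k_cat, j_cat)
```

Here the first K-band source (about 0.36 arcsec from the J-band detection) acquires a J magnitude, while the second (about 36 arcsec away) has none.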



C3:
Is able to recalibrate a given field or fields in the event of revised calibration information (specifically, photometric and astrometric), and allow database queries on the recalibrated quantities.
Rationale: Changes in calibration information are frequently encountered in survey operations, and the science archive itself may lead to such changes.
Implications: Database schema must make provision for recalibration, e.g. store positions as pixel co-ordinates plus an astrometric solution (consisting of a specified model and coefficients), and store photometry as flux measures plus calibration data. Calibrated quantities will also need to be stored in tables, since inverting calibration models to translate queries in calibrated units into uncalibrated ones will be difficult in general. The archive must be able to replace calibrated quantities when new ones become available. Calibration version control within the archive is required.
Requirement:
Science Archive must be designed from the start to enable astrometric and photometric recalibration.
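The storage strategy in the C3 Implications (raw measurements kept unchanged, calibration stored separately and versioned, calibrated quantities materialised for querying) can be sketched for photometry as follows. This is a minimal sketch assuming a hypothetical single-zeropoint model $m = ZP - 2.5\log_{10}(f)$ and invented flux and zeropoint values; the real WSA calibration model is not specified here. Astrometric recalibration would follow the same pattern, with model coefficients in place of the zeropoint.

```python
import math

# Raw measurements are stored once and never altered (hypothetical values, counts).
raw_fluxes = {"obj1": 1000.0, "obj2": 250.0}

# Each calibration version supplies a zeropoint for the frame (hypothetical values).
calibrations = {1: 25.00, 2: 25.03}


def calibrated_mags(raw, zeropoint):
    """Assumed simple photometric model: m = ZP - 2.5 log10(flux)."""
    return {obj: zeropoint - 2.5 * math.log10(flux) for obj, flux in raw.items()}


# Calibrated quantities are materialised per calibration version, so they can
# be queried directly, and queries can be repeated with earlier versions.
calibrated = {version: calibrated_mags(raw_fluxes, zp)
              for version, zp in calibrations.items()}
latest = max(calibrated)
```

Keeping every version materialised trades storage for query simplicity; it also supports repeating a query with a previous calibration.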



C4:
Is able to cross-calibrate photometric information using areas of overlap between processed frames, where available.
Rationale: This is not a sensible function of the pipeline, which is required only to produce results on a night-by-night basis. The science archive will have all photometric information and calibrations for all superframes, and is where this should happen.
Implications: Calibration tools will be required to homogenise photometry over surveyed areas using overlap information and photometric zeropoints.
Requirement:
Version 2.0 Science Archive must be able to cross-calibrate using areas of overlap between processed frames, where available (no requirement on Version 1.0 Science Archive to cross-calibrate).
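A single step of overlap-based cross-calibration can be sketched as estimating the zeropoint offset between two frames from objects detected in both. This is a minimal sketch, assuming hypothetical per-frame dictionaries mapping object identifiers to magnitudes; the median is used so that variable stars or mismatches in the overlap do not dominate. A full survey solution would instead solve for all frame zeropoints simultaneously (e.g. by least squares over every overlap), which this example does not attempt.

```python
import statistics


def overlap_offset(mags_a, mags_b):
    """Median magnitude difference (A - B) for objects detected in both frames."""
    common = mags_a.keys() & mags_b.keys()
    if not common:
        raise ValueError("no objects in common in the overlap region")
    return statistics.median(mags_a[k] - mags_b[k] for k in common)


frame_a = {"s1": 15.00, "s2": 16.10, "s3": 17.20}
frame_b = {"s1": 15.05, "s2": 16.14, "s3": 17.40, "s4": 18.00}  # s3 deviates, s4 not in A

offset = overlap_offset(frame_a, frame_b)
# Apply the offset to bring frame B onto frame A's photometric system.
frame_b_corrected = {k: m + offset for k, m in frame_b.items()}
```

In this example the deviant object s3 does not bias the estimated offset, which is the reason for preferring the median to the mean here.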

C5:
Allows public access to subsets of survey data on a variety of different search criteria (specified below).
Rationale: Basic science archive functionality.
Implications: For versatility, SQL-like querying is required, even if this is transparent to the user (e.g. simple access via web-form interface).
Requirement:
Science Archive (all Versions) must be designed to allow public access to subsets of survey data on a variety of different search criteria (specified later).
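The C5 Implications note that SQL-like querying may sit transparently behind a simple web form. The sketch below shows one way the fields of such a form could be translated into a parameterised SQL statement, including a count-only mode of the kind useful for trial-and-error searching. The table name `source` and its columns are hypothetical, SQLite stands in for whatever DBMS is eventually chosen, and the simple RA/Dec box ignores wrap-around at RA 0/360 and convergence near the poles.

```python
import sqlite3


def build_query(ra_min, ra_max, dec_min, dec_max, kmag_max=None, count_only=False):
    """Translate simple web-form fields into a parameterised SQL statement."""
    select = "SELECT COUNT(*)" if count_only else "SELECT obj_id, ra_deg, dec_deg, kmag"
    sql = (f"{select} FROM source "
           "WHERE ra_deg BETWEEN ? AND ? AND dec_deg BETWEEN ? AND ?")
    params = [ra_min, ra_max, dec_min, dec_max]
    if kmag_max is not None:
        sql += " AND kmag <= ?"
        params.append(kmag_max)
    return sql, params


# Toy in-memory table standing in for a survey catalogue.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (obj_id INTEGER PRIMARY KEY, ra_deg REAL, dec_deg REAL, kmag REAL)")
conn.executemany("INSERT INTO source VALUES (?, ?, ?, ?)",
                 [(1, 150.1, 2.1, 16.0), (2, 150.2, 2.2, 18.5), (3, 151.5, 2.0, 15.0)])

sql, params = build_query(150.0, 151.0, 2.0, 2.5, kmag_max=17.0)
rows = conn.execute(sql, params).fetchall()
count = conn.execute(*build_query(150.0, 151.0, 2.0, 2.5, count_only=True)).fetchone()[0]
```

Using bound parameters rather than pasting form values into the SQL string keeps user input from altering the query itself.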



C6:
Allows rapid on-line cross-referencing of search results with other catalogues.
Rationale: Consistent with T1; this requirement is expanded on later.
Implications: The Science Archive must store commonly used catalogues locally, in a queryable database, for combined queries.
Requirement:
Science Archive (all Versions) must have available commonly used catalogues (see later) stored locally. Version 2.0 Science Archive may additionally hold SDSS (and other survey) pixel data for joint querying (see later).



C7:
Allows generation of finder charts via a web form.
Rationale: Simple to provide and useful when observing at a site remote from the UK.
Implications: Software will be required for generation of pixel and/or ellipse plot finder charts. A web form will be required as the user interface.
Requirement:
Science Archive (all Versions) must allow generation of finder charts via a web form.



C8:
Holds housekeeping information for all archived data.
Rationale: It is essential to propagate all available data description (e.g. FITS header data) through to the Science Archive, to enable users to query those data.
Implications: The Science Archive must be able to track between object catalogue records, image data files and the housekeeping data. For example, to protect proprietary data rights the Science Archive will need to validate queries against the source of any particular image subset (e.g. UKIDSS, PATT time, etc.)
Requirement:
Science Archive (all Versions) must hold housekeeping information for all archived data.

Security

A1:
Archived data must be accessible only by validated users.
Rationale: The WSA will contain data resulting from internationally competitive science proposals. Proprietary rights of the UKIDSS consortium and open-time PIs/CoIs must not be compromised by data being freely available through the online archive.
Implications: The Science Archive must have security systems in place that prevent unfettered access by opportunistic users, but at the same time must not become so protected that access by valid users is hampered (e.g. by constantly asking for usernames/passwords). Security systems must be able to cope with various proprietary periods, and allow unfettered access after appropriate time intervals. All of this in turn implies user registration with username/password login and/or `digital certification'.
Note: Any user (not just proprietors) should be able to derive information on what is in the archive without being given access to those data.
Requirement:
Science Archive data (all Versions) must be accessible only by validated users; archive content information should be available without restrictions.



A2:
Archived data must be incorruptible by Science Archive users.
Rationale: Scientific exploitation will be compromised if data are corrupted.
Implications: Constant data ingest, recalibration of photometry/astrometry, and functionality enhancements imply a `living' archive that is subject to change. This opens up the possibility of accidental corruption, especially by local archive managers with read/write access to filesystems. Archive design must minimise the possibility of accidental corruption, and must also insure against data loss and minimise reconstruction times by adopting an appropriate backup policy.
Requirement:
Science Archive (all Versions) must be incorruptible by Science Archive users.



A3:
Science Archive must allow data protection on the basis of proprietary data (per frame).
Rationale: Proprietary periods will be different for different observations (survey/non-survey).
Implications: Security systems must be able to cope with various proprietary periods, and allow unfettered access after appropriate time intervals.
Requirement:
Science Archive (all Versions) must allow data protection on the basis of proprietary data (per frame).
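Per-frame protection with differing proprietary periods (A1 and A3) reduces, at its simplest, to an access check on each frame record. This is a minimal sketch, assuming hypothetical frame records that carry the owning programme and the date on which the proprietary period ends, and a validated user identified by the set of programmes they belong to; how users are validated (passwords, certificates) is outside its scope.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class FrameRecord:
    frame_id: str
    programme: str        # e.g. a UKIDSS survey or an open-time programme (hypothetical labels)
    release_date: date    # date on which this frame's proprietary period ends


def can_access(frame, user_programmes, today):
    """Frames become public once their proprietary period expires;
    before then only members of the owning programme may retrieve them."""
    if today >= frame.release_date:
        return True
    return frame.programme in user_programmes


f = FrameRecord("frame_0042", "PATT_open_time", date(2005, 1, 1))
print(can_access(f, {"UKIDSS"}, date(2004, 6, 1)))          # False: still proprietary
print(can_access(f, {"PATT_open_time"}, date(2004, 6, 1)))  # True: owning programme
```

Because the check is made per frame, survey and non-survey data with different proprietary periods can sit side by side in the same tables.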



A4:
Science Archive must be quickly recoverable in the event of corruption by hardware/software faults etc.
Rationale: There is a clear need to guard against data loss.
Implications: Science Archive will require backup on removable media and/or 100% redundant storage with data striping (i.e. fault tolerant hardware/software).
Requirement:
Science Archive (all Versions) must be quickly recoverable in the event of corruption by hardware/software faults etc.

Detailed requirements

The following requirements form the baseline for the WSA; they expand on the top-level requirements above and on the items labelled D in the online `Detailed Requirements'. Following T1 above, we have divided the requirements into those that must be in place for WFCAM first light and those that need fulfilling only after a significant amount of data has accumulated. There are several reasons for this: i) the timescale for the delivery of WFCAM is short, so there is limited time for R&D concerning a large, scalable archive system; ii) such a system is not required at first light anyway, since data volumes will initially be of limited size; iii) a phased approach means that the final large hardware purchase can be delayed as long as possible. We have therefore grouped the requirements into `Version 1.0 requirements' and `Version 2.0 requirements'; some requirements appear in Version 1.0 with limited scope, and in Version 2.0 with full functionality. We also include some longer-term goals which may or may not be delivered, contingent on implementation and resource constraints and on the delivery of appropriate tools/knowledge from related e-science projects (e.g. AstroGrid).

Version 1.0 requirements

T1/T7: The `Version 1.0' working science archive must be in place in time for WFCAM first light (currently scheduled for September 2003).
T2: Science Archive must contain and serve pipeline processed data (pixels, object catalogues and housekeeping data) from both UKIDSS and other usage (e.g. open time, commissioning time).
T3: Science Archive will match UKIDSS survey requirements as they are currently specified, but will be flexible enough to follow changes in survey design.
T4: Science Archive will conform to any existing `Virtual Observatory' standards and will be designed such that new standards can be easily incorporated, but must not be delayed by waiting for new developments to crystallise.
T5: Science Archive will allow simple (see below) queries.
T6: Science Archive must have an interface that is open to simple, intuitive use by the non-expert.
T9: Despite the need to expedite delivery of the WSA, development will be made at all times with due regard to scalability to VISTA data volumes.
T10: Science Archive must be able to cope with sub-optimal observations, and their subsequent displacement by better, repeated observations.
C1: Science Archive must contain calibrated object catalogues resulting from the pipeline, for both UKIDSS and open-time observations.
C2: Science Archive must ingest and store pipeline output frames, allow rapid web-based access to images, and produce merged UKIDSS catalogues in a given field.
C3: Science Archive must be designed from the start to enable astrometric and photometric recalibration.
C5: Science Archive must be designed to allow public access to subsets of survey data on a variety of different search criteria (specified below).
C6: Science Archive must have available commonly used catalogues (see later) stored locally.
C7: Science Archive must allow generation of finder charts via a web form.
C8: Science Archive must hold housekeeping information for all archived data.
D1: Science archive must allow searching individual (or all) UKIDSS surveys on the following criteria (or combination of them):

D3: Science Archive must allow similar queries to be repeated for all objects in a user-supplied source catalogue.
D4: Science Archive must allow combinations of queries on UKIDSS data and the following other source catalogues:
D6: Science Archive must have a simple interface for very quick searching on a given object name or position.
D8: Science Archive must return pixel images, confidence maps and catalogue data in gzipped FITS format, and must allow users to specify the output format of returned data as follows:
D9: Science Archive must be able to return pixel data in any available passband, over a contiguous field up to one `tile' ($0.8^{\circ}$) across, together with a matched catalogue.
D11: Science Archive must be able to generate and return stacked images given a user-selected list of input images and the standard stacking algorithm in the CASU basic pipeline.
D12: Science Archive must be able to generate and return merged multi-colour, multi-parameter catalogues with the best available photometric and astrometric calibrations.
D13: Science Archive must support federation with the source catalogues specified in D4 above.
D14: Science Archive must be able to generate and return meaningful optical/IR colours for all objects in the overlap with the existing SDSS data where counterpart detections occur in the SDSS object catalogue.
D16: Science Archive must support the returning of only a subset of the entire possible array of object parameters.
D19: Science Archive must be able to produce a finder chart of size up to 10 arcmin for any region within which survey data exist, returning ellipse detection plot and/or a single colour pixel plot, as specified by the user.
D20: Science Archive must allow access to best or duplicate data for objects in overlapping survey data.
D21: Science Archive must allow general access to all housekeeping data - e.g. for a given survey area, what is currently available, how good it is, etc.
D22: Science Archive must store uncalibrated quantities, calibrated quantities and the calibration model/coefficients. Archive output must therefore include (in headers)
D23: Science Archive must allow a summary of data available to be generated for a given search region.
A1: Science Archive must be accessible only by validated users.
A2: Science Archive must be incorruptible by Science Archive users.
A3: Science Archive must allow data protection on the basis of proprietary data (per frame).
A4: Science Archive must be quickly recoverable in the event of corruption by hardware/software faults etc.

User access is to be through web forms providing fill-in boxes and button clicks, and also via an SQL query form interface; a command-line interface for remote users to bypass interactive webforms will also be provided.

The summary in Section 4 gives an explicit statement of the Version 1.0 WSA contents and functionality.

Version 2.0 requirements

In addition to the Version 1.0 requirements:

T1: A fully functioning archive system, as defined by the requirements (and where possible, goals) herein, must be available as soon as possible after WFCAM first light, and no later than 1 year after survey operations begin in earnest.
T4: Science Archive must eventually conform to internationally agreed VO standards in access tools, data product formats and transfer protocols.
T5: Science Archive will allow usages at varying levels of complexity (as defined later).
T7: Interim survey products must be released to the community on timescales determined by WFCAM observing periods (i.e. a survey `release' will occur as soon as possible after each observing period, and before the end of the following period).
T8: Science Archive must allow requests for arithmetic operations, and options from an advanced processing toolkit (see later), on pixel data.
T9: WSA solution must be scalable to VISTA data volumes.
T11: Science Archive must have some capability for the remote user to carry out data exploration and interaction in real time: the Science Archive response time should be $\sim100$sec for wholesale trawl-type queries.
C4: Science Archive must be able to cross-calibrate photometric information using areas of overlap between processed frames, where available.
C6: Science Archive must have the final SDSS catalogues (and, if possible, images) stored locally, in addition to the catalogues specified for the Version 1.0 Science Archive.
D1: Science Archive must allow searching individual (or all) UKIDSS surveys on the following criteria (or combination of them):

D2: Science Archive must allow searching within open-time programme data using the same criteria as D1 (where possible), returning whatever data are available.
D4: Science Archive must allow combinations of queries on UKIDSS data and the following other source catalogues:
D5: Science Archive must allow arithmetic functions to be used in setting up complex queries (e.g. for a colour index not stored in survey catalogue tables).
D6: Science Archive must have a remote GUI application for formulating queries (e.g. an interface analogous to the SDSS Java-based query tool).
D7: Science Archive access GUI must allow plotting of returned parameters, in selected (X,Y) pairs or histograms, and also provide basic fitting routines.
D10: Science Archive must be able to generate (on-the-fly) and return larger (than D9) areas from survey data traversing survey tile boundaries, blocked down as specified by the user, in formats specified in D8.
D11: Science Archive must be able to generate and return stacked images using user-specified (see later) stacking algorithm options.
D12: Science Archive must be able to generate and return merged multi-colour, multi-parameter catalogues with the best (or previous as specified by the user) photometric and astrometric calibrations.
D13: Science Archive must support federation with the source catalogues specified in D4 above.
D14: Science Archive must be able to generate and return meaningful optical/IR colours for all objects in the overlap with the SDSS, whether or not detected in the SDSS data (i.e. it must be possible to place an aperture in and measure the flux from SDSS image data given the position of an IR source detection).
D15: Science Archive must support ANDing of one query with another, where both have already been executed.
D17: Science Archive must allow trial-and-error searches (e.g. return the number of source hits rather than the output results) for any valid query.
D18: Science Archive must allow repetition of queries using previous versions of astrometric and photometric calibrations.
D19: Science Archive must be able to produce a finder chart for any region within which survey data exist, returning a colour pixel plot, as specified by the user, generated from available single-passband images of the same field.
D20: Science Archive must allow access to best or duplicate data for objects in overlapping survey data, and must contain proper motion measures for objects where multi-epoch position measurements exist.
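Requirements D5, D15 and D17 can be illustrated with a toy relational query session. The sketch below uses Python's built-in `sqlite3` purely as a stand-in DBMS; the table name `source`, its columns and the class convention (-1 star-like, 1 galaxy-like) are hypothetical, not the WSA schema. It shows a colour computed inside the query (D5), a count-only trial run (D17), and the ANDing of two already-executed queries whose results have been kept in temporary tables (D15).

```python
import sqlite3

# Hypothetical miniature catalogue: the table name, column names and the
# class convention (-1 = star-like, 1 = galaxy-like) are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE source (id INTEGER PRIMARY KEY, ra REAL, dec REAL, "
    "jmag REAL, kmag REAL, class INTEGER)"
)
conn.executemany(
    "INSERT INTO source VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 150.10, 2.20, 17.9, 16.1, 1),   # J-K = 1.8, galaxy-like
        (2, 150.12, 2.21, 16.0, 15.5, -1),  # J-K = 0.5, star-like
        (3, 150.15, 2.25, 18.5, 16.9, -1),  # J-K = 1.6, star-like
        (4, 150.20, 2.30, 15.2, 14.9, 1),   # J-K = 0.3, galaxy-like
    ],
)

# D5: arithmetic inside the query; J-K is computed, not stored.
red = "SELECT id FROM source WHERE jmag - kmag > 1.0"
stellar = "SELECT id FROM source WHERE class = -1"

# D17: a trial run that returns only the number of hits.
n_red = conn.execute(f"SELECT COUNT(*) FROM ({red})").fetchone()[0]

# D15: keep the results of two executed queries, then AND them.
conn.execute(f"CREATE TEMP TABLE q1 AS {red}")
conn.execute(f"CREATE TEMP TABLE q2 AS {stellar}")
red_stars = [row[0] for row in conn.execute(
    "SELECT id FROM q1 INTERSECT SELECT id FROM q2 ORDER BY id")]

print(n_red, red_stars)  # 2 [3]
```

Storing each result set in a temporary table is one simple way to let a later query combine results without re-running the original searches; it is an illustrative choice, not a statement of how the WSA implements D15.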

Section 4 gives an explicit statement of the Version 2.0 WSA contents and functionality.

Goals

T11: Science Archive response time should be $\sim10$ sec for wholesale trawl-type querying.
C6: Science Archive will, insofar as external developments allow, be integrated into the `Virtual Observatory' (VO) as a general solution to rapid, online cross-referencing with any published astronomical catalogues that are also contained within the VO.
D1: Science Archive may recast web services as `Grid services' (a Grid-based solution to user access) in collaboration with AstroGrid.
D4/13: Science Archive may allow combinatorial queries with catalogues anywhere on the `data-Grid', i.e. may allow database federation across the grid.
D7: Science Archive will aspire to the mantra `ship the results, not the data', i.e. may allow remote procedure calls to advanced manipulation tools and may allow user upload of analysis codes.
D10/11: Science Archive may ultimately support advanced visualisation tools, e.g. large area, panoramic pseudo-colour images with panning in real time; three-dimensional catalogue parameter plotting and rotation.


SPECIFICATION OF CONTENTS AND FUNCTIONALITY

At its meeting on 2002 November 25, the UKIDSS Consortium discussed the requirements and usages along with the WSA development plan. The Consortium suggested several changes and raised some issues for discussion. The results of these discussions have been folded into this document, yielding the following specification (in as much detail as is possible at this time) of the WSA functionality and contents at Versions 1.0 and 2.0 (note: this specification will be developed further in later documents). The V2.0 requirements can be considered `goals' of V1.0.

Version 1.0

WSA Version 1.0 is deliverable at WFCAM first light (currently scheduled for September 2003). In addition to the following, WFAU undertakes to apply UKIDSS-specified algorithms, and import UKIDSS-supplied catalogues, to the WSA in lieu of automatic tools for such functionality (see Version 2.0).

Contents

The V1.0 WSA will contain the following information in a relational DBMS:

  1. Observations Information containing details of observations contained in the archive and their generic properties;
  2. Image Information containing details of all images (stored as flat files) in the archive along with housekeeping data (from stripped FITS headers);
  3. Observations Catalogue Information containing the object catalogues, generated by the CASU standard pipeline, associated with each image, and list-driven source catalogues between the different passbands in any given field;
  4. Merged Catalogue Information for each of the accumulating UKIDSS subsurveys LAS, GPS and GCS (merged in the sense that the `same' objects observed in different colours and/or at different times will be merged into one multi-colour, multi-epoch record);
  5. Catalogues for 2MASS, SDSS DR1, SSS, USNO-B, FIRST, IRAS and ROSAT-ASS surveys;
  6. A Survey Progress Catalogue, containing for each of the 5 UKIDSS subsurveys information on observations taken to date;
and also image data (pixels with confidence maps; default stacks for the deep surveys; and difference images for the GPS K band) in flat files, along with a large reserve (scratch) workspace for use during querying. The V1.0 WSA will also contain online documentation and `cookbook' style worked examples to aid users.

Functionality

The V1.0 WSA will have the following access points:

  1. A web interface allowing searching of individual (or all) UKIDSS survey catalogues on the following criteria: and additionally the same searching functions on a user-specified ASCII (space-separated) list of centres (sexagesimal or decimal degrees) and search radii (i.e. a batch mode search). This interface will also produce ellipse plots for use as finder charts. For an example of such an interface, see WFAU's SuperCOSMOS Sky Survey access page http://www-wfau.roe.ac.uk/sss, particularly the `Get a CATALOGUE' interface.
  2. A web form interface allowing querying of individual (or all) WSA catalogues (e.g. UKIDSS survey catalogues, housekeeping data, details of archived images) via Structured Query Language (SQL), with push-button options for the format of output data: combinatorial queries with the 2MASS, SSS, SDSS-DR1 and USNO-B catalogues will be provided for. For an example of such an SQL interface, see WFAU's 6dF access interface at URL http://www-wfau.roe.ac.uk/6dFGS/SQL.html.
  3. A web form interface that returns pixel data (images and confidence maps) given an arbitrary input position (as in 1 above) and size up to $0.8^{\circ}$ (one WFCAM tile) as follows: For an example of such an interface, see WFAU's SSS page (URL above), particularly the `Get an IMAGE' facility.
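The batch mode search in item 1 amounts to parsing a list of centres and radii and running one cone search per line. The sketch below assumes a hypothetical file layout (three fields for decimal degrees, seven for space-separated sexagesimal, radius in arcsec as the last field) and an in-memory list of `(id, ra, dec)` tuples standing in for the catalogue; the separation uses the standard haversine formula.

```python
import math

def parse_centre(line):
    """Return (ra_deg, dec_deg, radius_arcsec) from one line of a batch file.

    Assumed (hypothetical) space-separated layouts:
      decimal degrees: "ra dec radius"              (3 fields)
      sexagesimal:     "hh mm ss sdd mm ss radius"  (7 fields)
    The radius is taken to be in arcsec, which is also an assumption.
    """
    f = line.split()
    if len(f) == 3:
        ra, dec, radius = (float(x) for x in f)
        return ra, dec, radius
    if len(f) == 7:
        h, m, s = (float(x) for x in f[0:3])
        ra = 15.0 * (h + m / 60.0 + s / 3600.0)
        sign = -1.0 if f[3].startswith("-") else 1.0  # also handles "-00"
        d, dm, ds = abs(float(f[3])), float(f[4]), float(f[5])
        dec = sign * (d + dm / 60.0 + ds / 3600.0)
        return ra, dec, float(f[6])
    raise ValueError(f"cannot parse search centre: {line!r}")

def separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation (haversine formula); inputs in degrees."""
    r1, d1, r2, d2 = (math.radians(v) for v in (ra1, dec1, ra2, dec2))
    h = (math.sin((d2 - d1) / 2) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2) ** 2)
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0

def batch_search(lines, sources):
    """For each non-blank input line, list the ids of sources inside the circle."""
    results = []
    for line in lines:
        if not line.strip():
            continue
        ra, dec, radius = parse_centre(line)
        hits = [sid for sid, sra, sdec in sources
                if separation_arcsec(ra, dec, sra, sdec) <= radius]
        results.append(hits)
    return results

sources = [("a", 187.5, -0.5), ("b", 187.5, -0.5 + 3.0 / 3600), ("c", 188.0, -0.5)]
print(batch_search(["12 30 00 -00 30 00 5", "187.5 -0.5 2"], sources))
# [['a', 'b'], ['a']]
```

A real archive would answer each cone from an indexed table rather than scanning a Python list; the linear scan here only keeps the example self-contained.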
`Remote server' functionality for web-based browsing tools (e.g. SkyCAT/GAIA/Aladin) will be provided for some of the above image/catalogue servers, along with a command-line interface for non-interactive remote access via the web. Archive response time will be rapid for catalogue queries on the following indexed quantities: position, magnitude, colour and image class.

Version 2.0

Version 2.0 is deliverable no later than one year after survey operations begin, and will include more `database driven' products and features. In addition to contents and functionality provided in V1.0, the following specifies the V2.0 contents and functionality.

Contents

The V2.0 WSA will additionally contain:

  1. Externally provided catalogues and pixel data (UKIDSS complementary imaging, and the latest SDSS data release available at that time);
  2. A database of open-time observations;
  3. Enhanced UKIDSS catalogues containing derived information (e.g. proper motions, dereddened colours, catalogue parameters from placing apertures on SDSS pixels at WFCAM detection positions) where possible using available data.
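Proper motions, one of the derived quantities in item 3, follow from a straight-line fit of position against epoch. The sketch below is an unweighted least-squares version under a small-field approximation (offsets from the mean position, RA offsets scaled by cos(Dec)); the function name `proper_motion` and its units (epochs in years, positions in degrees, output in mas/yr) are illustrative assumptions, not a WSA algorithm.

```python
import math

def proper_motion(epochs, ras, decs):
    """Unweighted least-squares proper motion from multi-epoch positions.

    epochs in years, positions in degrees.  Offsets are taken relative to the
    mean position, with RA offsets scaled by cos(dec).  Returns
    (pmra_cosdec, pmdec) in mas/yr.  A sketch only: no weighting, no outlier
    rejection and no handling of RA wrap-around at 0/360 degrees.
    """
    n = len(epochs)
    if n < 2:
        raise ValueError("need at least two epochs")
    t0 = sum(epochs) / n
    ra0 = sum(ras) / n
    dec0 = sum(decs) / n
    cosd = math.cos(math.radians(dec0))
    dt = [t - t0 for t in epochs]
    sxx = sum(d * d for d in dt)
    if sxx == 0:
        raise ValueError("epochs must not all be equal")
    mas_per_deg = 3.6e6
    x = [(ra - ra0) * cosd * mas_per_deg for ra in ras]
    y = [(dec - dec0) * mas_per_deg for dec in decs]
    # Because dt has zero mean, the fitted slope is sum(dt * x) / sum(dt**2).
    pmx = sum(a * b for a, b in zip(dt, x)) / sxx
    pmy = sum(a * b for a, b in zip(dt, y)) / sxx
    return pmx, pmy

# A source drifting 100 mas/yr northwards, observed at three epochs.
pm = proper_motion([2003.0, 2004.0, 2005.0], [150.0] * 3,
                   [10.0, 10.0 + 0.1 / 3600, 10.0 + 0.2 / 3600])
print(round(pm[0], 3), round(pm[1], 3))  # 0.0 100.0
```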

Functionality

In addition to the simple access tools provided in V1.0, one (or more) advanced GUI(s) will be provided that have the following functionality:

  1. User-specified options for stacking pixel data, i.e. select images to be stacked, and the stacking algorithm from a choice of: i) unweighted; ii) sensitivity weighted; iii) PSF matched ...; iv) ...further toolkit options ...;
  2. Arbitrary sized, mosaiced images (across tile boundaries), blocked down as appropriate, with a multi-colour option;
  3. Source extraction options on any specified subset or bespoke stack of pixel data: i) CASU standard source extraction; ii) SExtractor; iii) multiple simultaneous profile fitting (i.e. DAOphot-like); iv) ...further toolkit options ...;
  4. Data exploration/interaction facilities: simple XY plotting; histogram plotting; simple model fitting routines (generalised least-squares with robust outlier rejection);
  5. Automatic user-supplied catalogue ingest facility for joint querying with existing catalogues;
  6. Enhanced output format options to include any new Virtual Observatory standards available at that time;
  7. Ability to analyse archive pixel data (both WFCAM and other, e.g. SDSS) at arbitrary positions defined by an input list of positions, apertures and/or profile types/models (i.e. list-driven photometry for any data);
  8. Generalised difference imaging (and subsequent source analysis);
  9. Persistence of multi-stage usage/query; storage of intermediate user-generated results sets
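The first two stacking options in item 1 differ only in how each input image is weighted. The sketch below assumes the images are already registered onto a common pixel grid, uses inverse-variance weights (1/sigma^2 per image) as one reasonable reading of "sensitivity weighted", and optionally multiplies in per-pixel confidence maps so that bad pixels (confidence 0) drop out; all names are hypothetical.

```python
def stack(images, sigmas=None, confidences=None):
    """Combine registered images pixel by pixel (a sketch, not the WSA toolkit).

    images:      list of equally sized 2-D lists, already on a common grid
    sigmas:      per-image noise levels; None gives an unweighted mean
    confidences: optional per-pixel confidence maps (0 = bad pixel) that
                 scale the weights
    Returns (stacked_image, summed_weight_map); pixels with zero total
    weight are set to 0.0.
    """
    ny, nx = len(images[0]), len(images[0][0])
    if sigmas is None:
        w_img = [1.0] * len(images)
    else:
        w_img = [1.0 / s ** 2 for s in sigmas]  # inverse-variance weights
    out = [[0.0] * nx for _ in range(ny)]
    wsum = [[0.0] * nx for _ in range(ny)]
    for k, img in enumerate(images):
        for j in range(ny):
            for i in range(nx):
                w = w_img[k]
                if confidences is not None:
                    w *= confidences[k][j][i]
                out[j][i] += w * img[j][i]
                wsum[j][i] += w
    for j in range(ny):
        for i in range(nx):
            out[j][i] = out[j][i] / wsum[j][i] if wsum[j][i] > 0 else 0.0
    return out, wsum

imgs = [[[10.0, 20.0]], [[14.0, 20.0]]]
print(stack(imgs)[0])                     # [[12.0, 20.0]]
print(stack(imgs, sigmas=[1.0, 2.0])[0])  # [[10.8, 20.0]]
```

Returning the summed weight map alongside the stack mirrors the idea of carrying a confidence map with every image product.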

Additionally, the web-based access tools in V1.0 will be supplemented with a `web service' interface (e.g. a tool exchanging XML-format data transferred using the Simple Object Access Protocol, SOAP) to provide, where appropriate, non-interactive access to pixel and catalogue data. Archive response time is to be $\sim100$ sec for wholesale catalogue trawls on non-indexed quantities.
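Item 4 of the V2.0 GUI functionality list asks for simple model fitting with robust outlier rejection. One common scheme, shown here only as an illustration and not as the method the WSA would adopt, is an ordinary least-squares straight-line fit repeated with iterative n-sigma clipping of the residuals; the function names and the defaults `nsigma=3.0`, `max_iter=10` are assumptions.

```python
def fit_line(x, y):
    """Ordinary least-squares straight line y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def robust_fit_line(x, y, nsigma=3.0, max_iter=10):
    """Straight-line fit with iterative sigma clipping; returns (a, b, kept)."""
    keep = list(range(len(x)))
    for _ in range(max_iter):
        a, b = fit_line([x[i] for i in keep], [y[i] for i in keep])
        resid = [y[i] - (a + b * x[i]) for i in keep]
        rms = (sum(r * r for r in resid) / len(resid)) ** 0.5
        if rms == 0.0:  # perfect fit: nothing left to reject
            return a, b, keep
        new_keep = [i for i, r in zip(keep, resid) if abs(r) <= nsigma * rms]
        if new_keep == keep or len(new_keep) < 2:
            return a, b, keep
        keep = new_keep
    a, b = fit_line([x[i] for i in keep], [y[i] for i in keep])
    return a, b, keep

# y = 1 + 2x with small alternating scatter and one gross outlier at index 10.
xs = [float(i) for i in range(20)]
ys = [1.0 + 2.0 * x + (0.1 if i % 2 == 0 else -0.1) for i, x in enumerate(xs)]
ys[10] += 20.0
a, b, kept = robust_fit_line(xs, ys)
print(10 in kept, len(kept))  # False 19
```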

Later Versions

At this time, we make no explicit statement concerning the functionality of subsequent WSA versions.

APPENDICES


Background

WFCAM (see http://www.roe.ac.uk/atc/projects/wfcam/index.html) will enable the next-generation wide-angle sky survey to be undertaken in the UK. It follows on from the hugely successful UK Schmidt photographic surveys of the last decades of the twentieth century, the major difference between the old and the new being the data rates and volumes that will be produced. WFCAM employs four 2k$\times$2k Rockwell devices and has an instantaneous field of view of 0.21 square degrees. WFCAM is expected to be on-telescope for the greater fraction of all available UKIRT time, and will have average/peak data rates of 100/230 Gbytes per night. It will commence operations in the final quarter of 2003. VISTA, on the other hand, is a dedicated survey telescope with an IR camera employing 16 2k$\times$2k devices in a 0.44 square degree FOV. The data rate for VISTA will be $\sim400$ Gbytes per 10 hour night, and this facility is expected to begin operations in the third quarter of 2006. In terms of both timescale and scope, WFCAM therefore represents a natural `stepping stone' to VISTA, and in the overall scheme of UK wide-field astronomy the WFCAM project can be thought of as `VISTA phase A'.

There is, of course, a clear need for 4m survey facilities in the era of 8m-class telescopes; the relative performance of WFCAM (as measured by its `grasp', or information gathering product A$\Omega$) shows (see, for example, the original VISTA science case, available from http://www.vista.ac.uk/) that it is amongst the world's leading IR survey instruments, even when including other non-dedicated survey facilities such as VLT-IRMOS. The combined science case (for complete details, follow the URL http://www.ukidss.org/sciencecase/sciencecase.html) proposed by the UKIDSS consortium for WFCAM, for example, details a programme that is unrivalled in terms of depth, field of view and therefore survey volume. UKIDSS proposes a nested series of surveys ranging from the Large Area Survey (`LAS', 4000 sq. deg. to K=18.4), the Galactic Plane Survey (`GPS', 1800 sq. deg. to K=19), the Galactic Clusters Survey (`GCS', 1600 sq. deg. to K=18.7) and the Deep Extragalactic Survey (`DXS', 35 sq. deg. to K=21) to the Ultra-Deep Survey (`UDS', 0.8 sq. deg. to K=23). The image data alone for these surveys amount to $\sim50$ Tbytes, while the object catalogues and ancillary information are likely to be many Tbytes in size. VISTA survey data volumes will likely be more than $5\times$ those of WFCAM.
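As a rough worked example of the grasp figure of merit, take the UKIRT primary as a filled 3.8m diameter circle (ignoring the central obstruction, which is a simplifying assumption) and the 0.21 square degree WFCAM field quoted above:

```latex
A\Omega \approx \pi\left(\frac{D}{2}\right)^{2}\Omega
        = \pi\,(1.9\,\mathrm{m})^{2}\times 0.21\,\mathrm{deg}^{2}
        \approx 11.3\,\mathrm{m}^{2}\times 0.21\,\mathrm{deg}^{2}
        \approx 2.4\,\mathrm{m}^{2}\,\mathrm{deg}^{2}
```

The same product evaluated for any other instrument, with its own aperture and field, gives a like-for-like comparison of survey speed at fixed depth, which is the sense in which grasp is used in the comparison above.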


The need for science archives.

The question naturally arises as to how science exploitation of such large datasets will be undertaken. Data volumes will simply be too large for users to download and keep their own copies. Raw data processing is likely to be complicated, while calibration procedures will evolve as cameras are better characterised and more calibration data are obtained. Reprocessing of substantial amounts of pixel data may be necessary in the light of improved algorithms or for specific `non-standard' science goals. Once data are reduced using standardised pipeline procedures, the establishment of a centralised `science archive' offers the greatest potential for full science exploitation (see the paper presented by Lawrence et al. at the 2002 SPIE meeting in Kona, Hawaii; available online at http://www.roe.ac.uk/~nch/wfcam/misc). Again, calibration procedures can be more easily developed and applied in a controlled manner to data in a central repository - it makes sense to solve data-specific reduction and calibration problems once, yielding an optimal solution. Early community access to well calibrated data will facilitate timely science exploitation. A well constructed science archive will enhance greatly the scope of research that can be done with the survey data; in fact, many science applications will only be feasible via a sophisticated science archive. For example, much of the science that will be done with the UKIDSS LAS will rely on complementary data from the SDSS and other non-IR wavelength surveys. Given the volume of all of these datasets, some thought needs to go into the design of the archive to enable full exploitation.

About this document ...

The translation was initiated by Nigel Hambly on 2003-03-20


Nigel Hambly 2003-03-20