
The WFCAM pipeline/archive project

All UK-owned WFCAM data, whether public UKIDSS data or open-time private data, will be processed through a pipeline producing standardised data products, and ingested into an on-line science archive. The UKATC, as builders of the camera, will provide software for operating the instrument and extracting data from it. Beyond that point, responsibility for the pipeline and archive system is shared by the JAC and the two UK rolling-grant funded wide-field astronomy groups: the Cambridge Astronomical Survey Unit (CASU) of the Institute of Astronomy, Cambridge University, and the Wide Field Astronomy Unit (WFAU) of the Institute for Astronomy, Edinburgh University. In addition, the National Astronomical Observatory of Japan (NAOJ) at Hilo, Hawaii, and Mitaka, Tokyo will collaborate to develop further advanced pipeline facilities. The JAC has end-to-end responsibility. CASU is taking prime responsibility for the pipeline processing, and WFAU for the science archive, but the boundaries between these activities are blurred, and all teams are working closely together. Finally, the pipeline/archive project is formally integrated with the VISTA Data Flow System (VDFS) project, which is led by Jim Emerson at QMUL.

All WFCAM observations will be in a queued mode, using UKIRT's new Observing Management Protocol (OMP) system. Furthermore, there will be very few possible observing modes. For both UKIDSS and open-time observations we will be enforcing fixed standard calibration procedures. All these simplifying factors make a standard pipeline possible for all WFCAM data. However, no standard pipeline will ever squeeze all the possible information out of the data, and there will always be users, or types of question, that require different assumptions or algorithms in the processing. We are not attempting to construct an all-purpose completely flexible pipeline toolkit, but rather a processing pipeline to produce pre-agreed standard data products. These should be good enough for most purposes, but the design is optimised to produce the survey products demanded by the UKIDSS project, with optimal stacking, mosaicing, and source extraction, and uniform astrometric and photometric calibration across survey fields. The Science Requirements Document (SRD) for the pipeline and archive is under construction as we write, and is expected to be agreed with the JAC and with the UKIDSS consortium by the end of 2002. More advanced processing, and an on-the-fly user-pipeline toolkit may also be constructed, but these are over and above the commitment to the standard processing.

The processing can be seen as divided into several stages: data acquisition, the summit pipeline, the basic pipeline, further survey-wide processing, ingestion to the archive, refinement of calibration, and serving the data to users. At the summit, data from all exposures within a single night at the same telescope pointing position (including micro-steps within that position) and using the same filter are co-added on the spot by the Data Acquisition System (DAS). Regardless of the macroscopic dither pattern through the night, data from the four arrays are kept distinct, as are data from different pointings, so that the data written to media consist of a collection of co-added 4096-pixel (4K) frames for each night. These frames are the analogue of traditional plates and form the basic units of the archive from which all other data products are constructed. The main purpose of the summit pipeline is to generate near real-time Data Quality Control (DQC) information from the co-added frames. It will use fixed library frames for instrument signature removal (e.g. flat fielding) and will do a first-cut source extraction. The reduced frames, the DQC information, and a statistical analysis of the source lists will be examined in Hawaii by UKIRT and/or UKIDSS staff and used to update a survey progress database, which then feeds back to the observing queue.
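The grouping rule described above (co-add exposures that share a pointing and filter, but keep the four arrays separate) can be illustrated with a minimal sketch. It is not the DAS implementation: the dictionary keys, the flat pixel lists, and the function name coadd_by_group are hypothetical, and micro-step interleaving is ignored so that co-adding reduces to a pixel-by-pixel sum.

```python
from collections import defaultdict


def coadd_by_group(exposures):
    """Co-add exposures sharing the same pointing, filter and array.

    Each exposure is a dict with the (hypothetical) keys 'pointing',
    'filter', 'array' and 'pixels', where 'pixels' is a flat list of
    pixel values. Returns a dict mapping (pointing, filter, array) to
    the pixel-by-pixel sum of the matching exposures.
    """
    groups = defaultdict(list)
    for exposure in exposures:
        key = (exposure["pointing"], exposure["filter"], exposure["array"])
        groups[key].append(exposure["pixels"])

    # Sum each group pixel by pixel; groups are never merged with each other,
    # so arrays and pointings stay distinct.
    return {
        key: [sum(values) for values in zip(*frames)]
        for key, frames in groups.items()
    }


# Toy night: two exposures of one pointing on array 1, one on array 2,
# and one exposure of a second pointing.
night = [
    {"pointing": "P1", "filter": "K", "array": 1, "pixels": [1.0, 2.0]},
    {"pointing": "P1", "filter": "K", "array": 1, "pixels": [3.0, 4.0]},
    {"pointing": "P1", "filter": "K", "array": 2, "pixels": [5.0, 6.0]},
    {"pointing": "P2", "filter": "K", "array": 1, "pixels": [7.0, 8.0]},
]
frames = coadd_by_group(night)
for key in sorted(frames):
    print(key, frames[key])
```

In this toy example four exposures collapse to three co-added frames, one per distinct (pointing, filter, array) combination.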

The raw data (i.e. the collection of co-added 4K frames from each night) will be sent to the UK on a daily basis for processing with the basic pipeline in Cambridge. The default plan is that the data will be sent on a nightly tape, but we are discussing the possibility of sending a hot-swappable hard disk drive, as used by the ESO NGAST system. The basic pipeline will use the same software as the summit pipeline, but it will use the real calibration data rather than fixed library frames, and will estimate a separate PSF for each frame. The pipeline removes the instrument signature, and performs a first-cut photometric and astrometric calibration and a default source extraction. The result is a calibrated version of the collection of 4K frames, and a separate source list for each frame.
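As an illustration of what instrument signature removal involves, the following minimal sketch applies a conventional dark subtraction followed by division by a mean-normalised flat field. The text names only flat fielding as an example, so the dark-subtraction step, the function name, and the flat-list image representation are assumptions made for illustration, not a description of the CASU software.

```python
def remove_instrument_signature(raw, dark, flat):
    """Dark-subtract and flat-field one frame.

    All three inputs are flat lists of pixel values of equal length.
    The flat is normalised to unit mean before division, so the overall
    count level of the frame is preserved. Dark subtraction is an
    assumed step; the source text mentions flat fielding only.
    """
    if not (len(raw) == len(dark) == len(flat)):
        raise ValueError("raw, dark and flat must have the same length")
    mean_flat = sum(flat) / len(flat)
    return [
        (r - d) / (f / mean_flat)
        for r, d, f in zip(raw, dark, flat)
    ]


# Toy two-pixel frame: the second pixel is three times as sensitive as
# the first, which the flat field corrects for.
corrected = remove_instrument_signature(
    raw=[60.0, 310.0], dark=[10.0, 10.0], flat=[0.5, 1.5]
)
print(corrected)
```

The summit and basic pipelines would differ only in where the calibration frames come from: fixed library frames at the summit, and calibration data from the night itself in Cambridge.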

A series of further processing steps is needed to make final survey products, but these steps can only be carried out as the survey data accumulate in the science archive, so they need to be especially carefully planned between the CASU and WFAU teams. The most obvious step is optimal stacking of matching frames from different nights, and mosaicing to produce a final large pixel map for each survey. (Once again the first aim is a standard single pixel map, but we will also develop the facility to build images of any given sky area on the fly from the constituent frames using different sampling and stacking choices.) The next step is improved PSF generation, including variations over the field and within a frame stack. Next there is improved source extraction from the stacked data, including detection and parameterisation of Low Surface Brightness Objects and transient events. Then we have pairing of sources across catalogues in different filters to make YJHK colours, and pairing with objects in external catalogues, such as the SDSS. Finally, as large-area surveys are accumulated, we will revisit astrometric and photometric calibration, looking for systematic gradients and step functions and making external checks, and eventually, over several years, deriving proper motions and variability parameters.
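The pairing of sources across filters can be sketched as a nearest-neighbour positional match within a fixed radius. This is a minimal illustration under stated assumptions, not the archive's pairing algorithm: the (ra, dec, magnitude) tuple layout, the 1 arcsecond default radius, and both function names are hypothetical, the search is brute force, and no one-to-one constraint is enforced.

```python
import math


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation via the haversine formula.

    Inputs are in degrees; the result is in arcseconds.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to guard against rounding pushing sqrt(a) just above 1.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a)))) * 3600.0


def pair_sources(cat_a, cat_b, radius_arcsec=1.0):
    """Pair each source in cat_a with its nearest neighbour in cat_b.

    Sources are (ra_deg, dec_deg, magnitude) tuples. A pair is kept only
    if the separation is within radius_arcsec. Returns a list of
    (index_a, index_b, colour) with colour = mag_a - mag_b.
    """
    pairs = []
    for i, (ra_a, dec_a, mag_a) in enumerate(cat_a):
        best = None  # (separation, index in cat_b, colour)
        for j, (ra_b, dec_b, mag_b) in enumerate(cat_b):
            sep = angular_separation_arcsec(ra_a, dec_a, ra_b, dec_b)
            if sep <= radius_arcsec and (best is None or sep < best[0]):
                best = (sep, j, mag_a - mag_b)
        if best is not None:
            pairs.append((i, best[1], best[2]))
    return pairs


# Toy J and K source lists: only the first J source has a K counterpart
# within the matching radius.
j_cat = [(10.0, 20.0, 15.5), (10.1, 20.0, 17.0)]
k_cat = [(10.00005, 20.0, 14.5), (11.0, 21.0, 16.0)]
for index_j, index_k, j_minus_k in pair_sources(j_cat, k_cat):
    print(index_j, index_k, j_minus_k)
```

A brute-force search scales as the product of the two catalogue sizes, so a survey-scale implementation would need spatial indexing; the sketch only shows the matching criterion and how a colour follows from a pair.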

Finally, the data are ingested into a public science archive housed in Edinburgh. This will have both interactive and batch modes, and all data in it will be well calibrated and documented. It will contain all the final survey pixel maps and paired source lists, but will also contain all the constituent nightly 4K frames and their standard one-filter source lists, as well as a browsable database of the available frames. As a minimum, the user will be offered the ability to download arbitrary subsets of these data, including on-the-fly mosaics of small areas using specified combinations of frames. However, we expect to offer rather more functionality, as discussed in the section after next.


Nigel Hambly 2002-10-02