
REPORT ON PROGRESS 1999 - 2002

WFAU and CASU have a proven track record in survey astronomy, both independently and in collaboration. Both groups have been routinely handling survey data at rates of several tens of Gbytes per day for some time now, and have much experience in delivering survey products to the international astronomical community - see, for example, details of the SuperCOSMOS Sky Survey in Section 4(d).

With reference to WFCAM, the end-to-end project is overseen by JAC at Hawai`i, and in broad outline consists of:

That is, CASU and WFAU have formally undertaken to collaborate on pipeline processing and science archiving for all WFCAM data (not just UKIDSS data). At present we envisage that a similar approach, overseen instead by ESO and the VISTA Project Office, will be taken for VISTA.

It has been generally accepted that the requirements for processing, archiving and distributing large datasets fall under the general heading of `e-science'. At a recent meeting with representatives from PPARC, CASU and WFAU it was agreed that the WFCAM project is a natural pathfinder for VISTA, and that a formal approach would be made to the PPARC Grid Steering Committee (GSC) detailing, in broad outline, the work that needs to be undertaken to ensure the scientific success of both projects. As a result, funding has now been secured via the GSC (for more details, see document GSC(02)03) for a joint management structure. Funding for development of solutions to the processing, science archiving and hardware needs has been reserved (but we note that this does not include any provision for operations of either project). Furthermore, WFAU and CASU have played a crucial role in the AstroGrid project. AstroGrid is concerned with more generic e-science issues, but it is important to note that the WFCAM and VISTA projects were prime drivers behind the development of the AstroGrid programme. Ultimately, data from all survey programmes using both these instruments are expected to form key components of the UK Virtual Observatory.

During the past three years, WFAU has been instigating R&D projects and establishing collaborative links to address large database problems. At the basic level, we have gained significant experience in the production, curation and user support of the SSS archive (which is $\sim2$ Tbyte in volume). With an eye on more advanced database technology and the future requirements for large astronomical databases in the UK, we have been following the development of SX (Thakar, Kunszt & Szalay 2001), the archive system for the multi-terabyte SDSS$^{7}$. SX provides a solution to the generic large astronomical database problem, and our proposed solution to the specific WFCAM (and subsequently VISTA) science archive development problem is based on it, since we believe that SX, which represents $\sim15$ staff-years of software development, is unlikely to be bettered using available resources. SX is a C++ software suite that sits between remote GUI applications (for example the SDSS query tool sdssQT) and the DBMS (in the case of the SDSS-EDR, Objectivity). It allows SQL-like queries (i.e. much more than simple access to data subsets predicated on position/proximity) and provides the necessary interface between the GUI (which in turn interfaces to the user) and the low-level commercial DBMS software that curates the database. WFAU has established links with the Johns Hopkins University (JHU) group responsible for the development of SX, and has established a UK mirror for the SDSS-EDR$^{8}$ (a mirror of the STScI MAST access point employing PHP software). We are implementing the SSS archive in SX, and are collaborating with JHU on parallelisation of SX employing MPI on Beowulf PC farms. We intend to build on this, and on our experience in creating the SSS and other similar products, to design and implement a science archive for WFCAM data that is scalable to VISTA data volumes.
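The distinction drawn above between positional access and SQL-like queries can be illustrated with a minimal sketch. It does not use SX, its query language or the SDSS schema: Python's built-in sqlite3 module stands in for the DBMS, and the table name `sources`, its columns and the four catalogue rows are hypothetical values invented purely for illustration.

```python
import sqlite3

# Hypothetical catalogue rows: (id, ra_deg, dec_deg, j_mag, k_mag, is_star).
# These values are invented for illustration only.
ROWS = [
    (1, 10.00, -5.00, 15.2, 14.9, 1),
    (2, 10.02, -5.01, 17.8, 15.9, 0),
    (3, 10.50, -4.80, 16.1, 14.0, 1),
    (4, 11.90, -5.30, 18.3, 16.1, 0),
]


def build_catalogue():
    """Create an in-memory database holding the hypothetical source table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sources (id INTEGER PRIMARY KEY, ra REAL, dec REAL, "
        "j_mag REAL, k_mag REAL, is_star INTEGER)"
    )
    conn.executemany("INSERT INTO sources VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    return conn


def box_search(conn, ra_min, ra_max, dec_min, dec_max):
    """Positional subset: ids of sources inside an RA/Dec box."""
    cur = conn.execute(
        "SELECT id FROM sources "
        "WHERE ra BETWEEN ? AND ? AND dec BETWEEN ? AND ? ORDER BY id",
        (ra_min, ra_max, dec_min, dec_max),
    )
    return [row[0] for row in cur]


def red_objects(conn, min_colour, max_k):
    """Attribute query: ids with J-K above min_colour and K brighter than max_k."""
    cur = conn.execute(
        "SELECT id FROM sources "
        "WHERE (j_mag - k_mag) > ? AND k_mag < ? ORDER BY id",
        (min_colour, max_k),
    )
    return [row[0] for row in cur]


if __name__ == "__main__":
    conn = build_catalogue()
    print(box_search(conn, 9.9, 10.1, -5.1, -4.9))
    print(red_objects(conn, 1.5, 16.0))
```

The second function selects on derived source properties anywhere on the sky, which is the kind of request that a purely position-based interface cannot express.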

On the hardware side, we have employed monolithic RAID technology as a solution to mass storage and fast random access for the SSS. We have established an 8-node Beowulf cluster for our SDSS-EDR mirror, and have also acquired a 12-node rack-mounted Beowulf for experimentation with parallelisation techniques (both Beowulfs were funded through University and JREI sources). We have begun an analysis of the hardware requirements for the WFCAM science archive as part of an end-to-end data flow analysis$^{9}$ and have made preliminary approaches to vendors, e.g. Eclipse Computing (providers of bespoke hardware solutions like our monolithic RAID system and Beowulf clusters) and Sun Microsystems$^{10}$. Finally, the Blackford Hill site is now connected to SuperJanet via two 1 Gbit s$^{-1}$ network links as part of an SRIF upgrade award.

CASU has been positioning itself on the data processing side. More details are given in their RG renewal case, but a brief summary is pertinent at this point. Work has proceeded on both the data processing software and the pipeline infrastructure. CASU already has a provisional software architecture in place for the basic and summit pipelines and has started to address several of the elements of advanced pipeline functionality, all based on a modular design using Perl as a high-level user interface and scripting language with C modules as the algorithm engines.
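The modular design described above, a high-level scripting layer chaining separate algorithm engines, can be sketched as follows. This is an illustration of the pattern only, not CASU's software: Python functions stand in for both the Perl scripting layer and the C modules, and the step names `subtract_dark` and `divide_flat`, together with the tiny 2x2 frames, are hypothetical.

```python
def subtract_dark(frame, dark):
    """Stand-in algorithm module: subtract a dark frame pixel by pixel."""
    return [[p - d for p, d in zip(prow, drow)] for prow, drow in zip(frame, dark)]


def divide_flat(frame, flat):
    """Stand-in algorithm module: divide by a flat field pixel by pixel."""
    return [[p / f for p, f in zip(prow, frow)] for prow, frow in zip(frame, flat)]


def run_pipeline(frame, steps):
    """Scripting layer: apply each (module, calibration) pair in order."""
    for module, calibration in steps:
        frame = module(frame, calibration)
    return frame


if __name__ == "__main__":
    raw = [[10.0, 12.0], [14.0, 16.0]]
    dark = [[2.0, 2.0], [2.0, 2.0]]
    flat = [[2.0, 2.0], [1.0, 2.0]]
    print(run_pipeline(raw, [(subtract_dark, dark), (divide_flat, flat)]))
```

Because the driver only knows about the list of steps, modules can be added, removed or reordered without touching the algorithm code, which is the main attraction of separating the scripting layer from the engines.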

In the development work to date, CASU and WFAU have necessarily made provisional assumptions regarding the WFCAM science and data product requirements, basing these on the corresponding VISTA documents and on the UKIDSS and VISTA science cases. We reiterate that we are building on our experience in wide-field survey projects, and emphasise again the VISTA/WFCAM synergy. We expect to modify our plans in response to feedback from the user community, but do not anticipate having to start from scratch in the planning stages of the following proposal.


Nigel Hambly 2002-08-15