Next: Key design and analysis Up: PROPOSED PROGRAMME OF WORK Previous: PROPOSED PROGRAMME OF WORK

The challenge.

Specifying, designing and implementing a science archive is clearly complicated, and requires no small amount of creative R&D and coding. However, the effort will be amply rewarded by enabling the fullest possible science exploitation of the survey data.

With reference to Section 1, one overwhelming problem becomes apparent: the huge increase in data volume. Previous ad hoc solutions based on flat-file storage and basic interfaces (e.g. SSS and APMCAT) simply will not scale to the WFCAM and VISTA surveys. For example, a simple trawl of the 1 Tbyte SSS object catalogue, examining each record in turn (e.g. looking for non-stellar sources with colours redder than a given value), would take several days under the current database system. For a more complex query with many predicates, this I/O-limited trawl may take up to a week. For datasets larger by an order of magnitude or more (e.g. the WFCAM and VISTA surveys), such a simple database solution is clearly impractical. Moreover, users will naturally (and rightly) aspire to apply more sophisticated kinds of analysis to the next-generation survey data, while still requiring the familiar basic position/proximity queries, now on vastly increased data volumes. Hence the problem boils down to providing hugely increased storage and CPU power, to enable the archiving of large datasets and their exploitation via rapid searching and new online analysis tools.
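The scaling argument above can be illustrated with a rough back-of-envelope sketch. It is not a description of the current SSS system: the record fields (`is_stellar`, `blue_mag`, `red_mag`), the function names and the effective throughput of 5 Mbyte/s are all assumptions chosen only so that a 1 Tbyte trawl comes out at a few days, in line with the figure quoted in the text. The point is simply that a record-by-record trawl costs time in direct proportion to the data volume.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

SECONDS_PER_DAY = 86_400


def scan_time_days(volume_bytes: float, throughput_bytes_per_s: float) -> float:
    """Days needed to read every byte once at a fixed effective throughput."""
    if throughput_bytes_per_s <= 0:
        raise ValueError("throughput must be positive")
    return volume_bytes / throughput_bytes_per_s / SECONDS_PER_DAY


@dataclass
class Source:
    """Hypothetical catalogue record; the field names are illustrative only."""
    is_stellar: bool
    blue_mag: float
    red_mag: float


def trawl(records: Iterable[Source], min_colour: float) -> Iterator[Source]:
    """Sequential trawl: every record is examined, whatever the predicate."""
    for rec in records:
        colour = rec.blue_mag - rec.red_mag  # larger value means redder
        if not rec.is_stellar and colour > min_colour:
            yield rec


if __name__ == "__main__":
    assumed_throughput = 5e6  # bytes per second; an assumption, not a measurement
    for label, volume in [("1 Tbyte", 1e12), ("10 Tbyte", 1e13)]:
        days = scan_time_days(volume, assumed_throughput)
        print(f"{label}: {days:.1f} days")

    sample = [
        Source(is_stellar=False, blue_mag=20.0, red_mag=17.5),
        Source(is_stellar=True, blue_mag=20.0, red_mag=17.0),
        Source(is_stellar=False, blue_mag=18.0, red_mag=17.8),
    ]
    print(len(list(trawl(sample, min_colour=2.0))), "matching source(s)")
```

Under these assumptions the trawl time grows tenfold when the volume grows tenfold, which is why indexing or partitioning the data, rather than reading every record, matters at WFCAM and VISTA volumes.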

There are two 'live' issues concerning our proposal; these may require some small adjustments to the programmes of work and to the division of labour, both between us and our collaborators at CASU and between the two Wide Field units and VO initiatives like AstroGrid. Firstly, although the division of responsibilities between CASU and WFAU as stated above seems clear-cut, there are some grey areas where close collaboration will be needed (for example, tasks such as stacking, mosaicing and 'advanced pipeline' operations, e.g. tunable on-the-fly pixel analysis). Secondly, the boundary between the responsibilities of the Wide Field units and those of VO initiatives is presently unclear; e.g. for the science archive, in our proposal WFAU undertakes to produce basic and enhanced user interfaces, whereas full-blown VO functionality will require an as yet unknown resource from VO sources.

Given the scope of the WFCAM and VISTA projects and the number of distinct organisations involved (JAC/ATC/CASU/WFAU for WFCAM and additionally ESO for VISTA), we propose an initial programme of analysis and documentation for '02/'03, followed by a period of implementation in '03/'04 (for WFCAM only, but we reiterate that work will be carried out with due regard to applicability and scalability to VISTA). The second half of the grant period ('04/'06) will consist of curation and development of the WFCAM science archive, together with a programme of analysis, documentation and initial prototyping for scaling our archive solutions to VISTA data volumes. A Gantt chart is shown in Appendix B; this summarises the following description:


Nigel Hambly 2002-08-15