Usages of the WFCAM Science Archive

Revision date: Jan 9th 2003

Nigel Hambly and Ian Bond

Here are some example "usages" of the WFCAM science archive. We use the term "usage" rather than "query" (the term used, for example, in a similar exercise undertaken by the SDSS archive team in specifying the SQL Server implementation of the SDSS Science Archive) because many of the applications that users will want are likely to involve a combination of interactive analysis and on-the-fly processing (or indeed feeding the results of one application into another) as well as formulating SQL queries. Furthermore, these usages concentrate on science applications rather than Science Archive/DBMS implementation details. Note that we prefer the term "usage" to "use case" since the latter is commonly used in the Unified Modeling Language to describe the general, high-level interaction between a system and an "actor", whereas the following examples are specific functions of the Science Archive.

As a starting point for generating these usages, we have used the UKIDSS proposal, specifically the 11 "key science goals" (proposal Section 1.5). We then detail other usages extracted from science goals within the subsurvey components of UKIDSS, and finally give some more general examples of data exploration and data mining. The usages have been supplemented with some discussion and analysis (appended) following review by the UKIDSS Consortium.


U1:  Count the number of sources in the LAS which satisfy the colour constraints (Y-J) > 1.0, (J-H) < 0.5 where SDSS i,z flux limits at the same position are less than 2-sigma. User then refines the query as necessary to give a reasonable number of candidates. When satisfied, the user requests a list, selecting output attributes from those available for the LAS, and finder charts in JHK for each object.

This is to search for the nearest and faintest substellar sources (first UKIDSS key science goal).
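
As a minimal sketch of the U1 selection, the colour cuts and the count could be expressed as below. The field names (`y`, `j`, `h` for LAS magnitudes; `i_snr`, `z_snr` for the significance of the SDSS flux at the same position) and the example rows are hypothetical and are not actual WSA attribute names.

```python
def select_u1_candidates(sources, yj_min=1.0, jh_max=0.5, sdss_snr_max=2.0):
    """Return sources passing the U1 colour cuts with SDSS i,z flux below the limit."""
    selected = []
    for s in sources:
        red_in_yj = (s["y"] - s["j"]) > yj_min
        blue_in_jh = (s["j"] - s["h"]) < jh_max
        # Hypothetical columns: signal-to-noise of SDSS i and z flux at this position.
        optical_dropout = s["i_snr"] < sdss_snr_max and s["z_snr"] < sdss_snr_max
        if red_in_yj and blue_in_jh and optical_dropout:
            selected.append(s)
    return selected


# Hypothetical example rows, for illustration only.
sources = [
    {"id": 1, "y": 19.5, "j": 18.2, "h": 18.0, "i_snr": 0.4, "z_snr": 1.1},
    {"id": 2, "y": 18.0, "j": 17.6, "h": 17.0, "i_snr": 0.2, "z_snr": 0.5},
    {"id": 3, "y": 19.9, "j": 18.5, "h": 18.3, "i_snr": 6.0, "z_snr": 9.0},
]
candidates = select_u1_candidates(sources)
print(len(candidates))  # the count the user inspects before refining the cuts
```

The thresholds are keyword arguments so that the user can tighten or relax them until the candidate count is manageable, as the usage describes.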


U2:  List all star-like objects with izYJHK SDSS/UKIDSS-LAS colours consistent with the colours of quasars at redshifts 5.8 < z < 7.2 or z > 7.2 (user specifies cuts in colour space). Return plots of (i-z) v. (z-J) and (i-Y) v. (Y-J) with these sources plotted in a specified symbol type, with 1 in every 10,000 other stellar sources plotted as points.

This is to search for the highest redshift quasars, and break the z=7 QSO barrier (second and third UKIDSS key science goals).


U3:  For a given cluster target in the UKIDSS GCS, make a candidate membership list via selection of stellar sources in colour-magnitude, colour-colour and proper motion space. Cross-correlate the candidate list against a user-supplied catalogue of optical/near-infrared detections in the same region.

Allows the user to determine the substellar mass function for the cluster (fourth UKIDSS key science goal).


U4:  From the UKIDSS LAS, provide a list of all stellar objects that have measured proper motions greater than 5x their estimated proper motion error; additionally give a count of all stellar objects that are unpaired between the two epochs of the LAS observations with specified conditions on image quality flags. User then refines these conditions to produce a manageable list of very high proper motion candidate stars. Return finder charts in JHK for all candidates.

Investigate potential Population II BDs (fifth UKIDSS key science goal) and also very cool, helium-atmosphere Halo WDs (if they exist; LAS Section 2.2) via their high proper motion.
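
The proper motion cut in U4 can be sketched as follows, assuming hypothetical fields `pm_ra` and `pm_dec` (proper motion components) and `pm_err` (the estimated error on the total proper motion), all in the same units. These names are illustrative and not taken from the WSA schema.

```python
import math


def significant_movers(stars, n_sigma=5.0):
    """Return (id, total proper motion) for stars moving more than n_sigma times their error."""
    out = []
    for s in stars:
        mu = math.hypot(s["pm_ra"], s["pm_dec"])  # total proper motion
        if mu > n_sigma * s["pm_err"]:
            out.append((s["id"], mu))
    return out


# Hypothetical example rows (units assumed to be mas/yr).
stars = [
    {"id": "a", "pm_ra": 30.0, "pm_dec": 40.0, "pm_err": 8.0},
    {"id": "b", "pm_ra": 3.0, "pm_dec": 4.0, "pm_err": 2.0},
]
movers = significant_movers(stars)
print(movers)
```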


U5:  From the UKIDSS DXS & UDS, construct galaxy catalogues. User selects all non-stellar sources satisfying quality criteria. User also requires the spatial sampling of this catalogue. Cross-correlate the galaxy catalogues against user-supplied optical catalogue in the same region.

Construct basic galaxy catalogues as a tool for further study (e.g. for z=1 and 3; measure the growth of structure from z=3 to the present day and clarify the relationship between QSOs, ULIRGs and galaxy formation: sixth, seventh and ninth UKIDSS key science goals).


U6:  From the UKIDSS LAS, construct a galaxy catalogue for all non-stellar sources satisfying K < 18.4 and given quality criteria; return full photometric list from SDSS & UKIDSS: ugrizYJHK. User also requires the spatial sampling of this catalogue.

Construct an infrared-selected galaxy catalogue for local studies (seventh UKIDSS key science goal).


U7:  From the UKIDSS UDS, select a sample of galaxies with colours and morphology consistent with being elliptical galaxies. Provide a spatial mask to enable determination of sample characteristics. Provide a measure of the half-light radius for each galaxy.

Constructs a candidate elliptical galaxy sample at high redshift. This is the first basic step in studying the epoch of spheroid formation (eighth UKIDSS key science goal; see also UKIDSS proposal Sections 6.2 and 6.4 for details).


U8:  From the UKIDSS GPS, provide star counts in 10 arcmin cells on a grid in Galactic longitude and latitude; also provide a list of cells where there is any quality issue rendering that cell's value inaccurate.

Enables mapping of the Milky Way through dust via infrared star counts (tenth UKIDSS key science goal).
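
A minimal sketch of the U8 star counts is given below. It assumes the input is a list of (longitude, latitude) pairs in degrees and that cells are defined directly in coordinate space, with no cos(b) correction to the longitude cell width; both are simplifying assumptions rather than a statement of how the archive would do it.

```python
import math
from collections import Counter


def star_counts(positions, cell_arcmin=10.0):
    """Count stars per (longitude, latitude) cell; cells are indexed by integer pairs."""
    counts = Counter()
    for l_deg, b_deg in positions:
        cell = (
            math.floor(l_deg * 60.0 / cell_arcmin),
            math.floor(b_deg * 60.0 / cell_arcmin),
        )
        counts[cell] += 1
    return counts


# Hypothetical Galactic coordinates in degrees.
positions = [(30.05, 0.05), (30.10, 0.08), (30.30, -0.05)]
counts = star_counts(positions)
print(dict(counts))
```

A list of cells with quality issues could be kept as a separate set of the same integer cell indices.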


U9:  From the UKIDSS GPS, provide a list of all sources that have brightened by a given amount in the K band.

Provides the means to increase the number of known YSOs (including rare types such as FU Orionis stars) by an order of magnitude (eleventh UKIDSS key science goal).


U10:  Provide a plot of g-J vs J-K for all point-like sources detected in the UKIDSS/LAS survey subject to quality constraints. User interacts with the plot to fit a straight line (g-J)=a+b(J-K) to the main sequence stars. Then find all UKIDSS/LAS sources with g-J>a+b(J-K), 4>g-J>-1, and 3>J-K>-1.

This is how a remote user may select quasars using the "K-excess" method. This is an example where remote interactive analysis is required (LAS KX-selected quasars; UKIDSS Section 2.6).
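
The two steps of U10 (fit a straight line to the stellar locus, then select on it) can be sketched as below. An ordinary least-squares fit stands in for the interactive fit, and the inequality is copied as written in U10; the sample colours are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def kx_select(sources, a, b):
    """Apply the U10 cuts: g-J > a + b(J-K), 4 > g-J > -1, 3 > J-K > -1."""
    picked = []
    for src_id, gj, jk in sources:
        if gj > a + b * jk and -1.0 < gj < 4.0 and -1.0 < jk < 3.0:
            picked.append(src_id)
    return picked


# Hypothetical main-sequence points as (J-K, g-J), lying on g-J = 0.5 + 2.5 (J-K).
ms_jk = [0.2, 0.4, 0.6]
ms_gj = [1.0, 1.5, 2.0]
a, b = fit_line(ms_jk, ms_gj)

# Hypothetical sources as (id, g-J, J-K).
sources = [("s1", 2.0, 0.3), ("s2", 1.0, 1.0), ("s3", 4.5, 0.5)]
picked = kx_select(sources, a, b)
print(picked)
```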


U11:  Construct H2 - K difference image maps for all CCD frames within a specified subregion surveyed by the GPS.

These difference image maps are used to study "macrojets" (UKIDSS GPS Section 3.2).


U12:  Find all galaxies with a de Vaucouleurs profile and infrared colours consistent with being an elliptical galaxy in the Virgo region of the UKIDSS LAS.

e.g. UKIDSS LAS Section 2.4 Virgo (4); to study the IR morphology of specific galaxy types in the Virgo cluster.


U13:  Given input co-ordinates and a search radius (arbitrary system and reference frame) provide a list of all WFCAM observations ever taken that contain data in all or part of the specified area.


U14:  Provide a list of point-like sources with multiple epoch measurements which have light variations > 0.1 magnitudes in J, H or K.
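
One way to sketch U14 is shown below. It assumes that "light variation" means the peak-to-peak range of the magnitudes over all epochs in a band, and that each source carries a hypothetical `epochs` mapping from band name to a list of magnitudes; neither assumption comes from the usage itself.

```python
def variable_sources(sources, threshold=0.1, bands=("j", "h", "k")):
    """Return ids of sources whose peak-to-peak range exceeds threshold in any band."""
    flagged = []
    for s in sources:
        for band in bands:
            mags = s["epochs"].get(band, [])
            # At least two epochs are needed to measure any variation.
            if len(mags) >= 2 and (max(mags) - min(mags)) > threshold:
                flagged.append(s["id"])
                break
    return flagged


# Hypothetical multi-epoch photometry.
sources = [
    {"id": "v1", "epochs": {"j": [15.00, 15.04], "k": [14.20, 14.35]}},
    {"id": "c1", "epochs": {"j": [16.00, 16.03], "h": [15.50, 15.52]}},
]
flagged = variable_sources(sources)
print(flagged)
```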


U15:  From any UKIDSS data, where multiple epoch measures exist for the same object, provide a list of anything moving more than X arcsec per hour.
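
The rate in U15 can be estimated from two epochs as sketched below. The sketch uses a small-angle approximation, takes positions in degrees and times as Modified Julian Dates, and ignores RA wrap-around at 0/360 degrees; these are assumptions made for illustration.

```python
import math


def motion_arcsec_per_hour(ra1, dec1, mjd1, ra2, dec2, mjd2):
    """Apparent motion between two epochs in arcsec per hour (small-angle approximation)."""
    mean_dec = math.radians(0.5 * (dec1 + dec2))
    d_ra = (ra2 - ra1) * math.cos(mean_dec) * 3600.0  # arcsec on the sky
    d_dec = (dec2 - dec1) * 3600.0                    # arcsec
    hours = abs(mjd2 - mjd1) * 24.0                   # assumes the epochs differ
    return math.hypot(d_ra, d_dec) / hours


# Hypothetical object: 0.001 degrees of motion in RA over 12 hours at dec = 0.
rate = motion_arcsec_per_hour(150.000, 0.0, 53000.0, 150.001, 0.0, 53000.5)
x_limit = 0.1  # the user-supplied threshold X, in arcsec per hour
print(rate > x_limit)
```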


U16:  Provide a list of star-like objects that are 1% rare for the 3-colour attributes.

This involves classification of the attribute set and then a scan to find objects with attributes close to those of a star that occur in rare categories.


U17:  For a given device in a tile, give me all images from the UDS corresponding to that frame, stacked in 10 day bins.


U18:  Give me a true colour JHK image mosaic using frames in the LAS centred at given co-ordinates (arbitrary reference frame and system) with 2 degree width and rebinned so that the entire mosaic is returned as a 2048x2048 pixel image.
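
As a worked example of the rebinning implied by U18, the output pixel scale follows directly from the mosaic width and the requested image size. The native WFCAM pixel scale of 0.4 arcsec used for the rebinning factor is an assumption of this sketch.

```python
def mosaic_pixel_scale(width_deg, n_pixels):
    """Output pixel scale in arcsec per pixel for a mosaic of the given width."""
    return width_deg * 3600.0 / n_pixels


scale = mosaic_pixel_scale(2.0, 2048)  # 7200 arcsec across 2048 pixels
native_scale = 0.4                     # assumed native pixel scale in arcsec
rebin_factor = scale / native_scale
print(scale, rebin_factor)
```

So each output pixel covers about 3.5 arcsec, or roughly 9 native pixels on a side under the assumed native scale.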


U19:  Find all detected sources from all UKIDSS sub-surveys within 3x the error boxes of a user supplied list of X-ray transient sources.

Here we are searching the archive for counterparts to transient X-ray sources. One could also run this query to request pixel data.


U20:  For all sources in a user-supplied radio catalogue of HII regions in the GPS, return the Br-gamma surface brightness in an aperture of X arcsec.

This usage example specifically requires analysis of H2 - K difference image pixel data with a user-specified list of aperture positions and radii.


Discussion and analysis

Additional usage examples, along with rationales, have been provided by UKIDSS. These were discussed at the 2002 November 25th Consortium meeting and a summary of those deliberations is presented here. In general, the Science Archive should be viewed as a system for managing and analysing large amounts of data and enabling science exploitation; however, it cannot be all things to all people, and will not actually do the required science. Hence, GUI functionality is necessarily restricted to those tasks that it would otherwise be impossible for the user to achieve (for example, those requiring transfer and analysis of large data volumes). Features requested for exploring and analysing small datasets are not a priority for the WSA design.


Large Area Survey (LAS):

No further usage examples were forthcoming from the LAS.

Discussion:

The key issue for the LAS is the cross-match with the SDSS optical survey. The question "is absence from the SDSS catalogues sufficient, or does the LAS require flux measurements at arbitrary positions in pixel data?" was posed; for example, in U1 above, it may be necessary for the Archive to be able to query the LAS catalogue in conjunction with SDSS pixel data (as opposed to simply the SDSS object catalogues). In the ensuing debate it became clear that there was a strong requirement for this. As for timescales, the Consortium was content to state that "it would be nice" to be able to do this at V1.0. For the purposes of the WSA, we will, therefore, undertake to look into this as a goal for V1.0 and a requirement for V2.0, but final implementation will depend on feasibility and resource constraints.


Galactic Plane Survey (GPS):

GPS 1: Return list of all bright and dark nebulae within 10 arcmin of a specified galactic or equatorial coordinate, with size, average surface brightness (darkness) at JHK, and best elliptical fit to shape.

GPS 2 (following U8): Return J:K and H:K ratios of star counts in 10 arcmin cells in the GPS, on a grid in Galactic Longitude and Latitude.

This gives information about extinction through the plane and detects clusters in distant spiral arms.

GPS 3: Return list of all stellar clusters in a specified rectangular galactic or equatorial coordinate box, with number of projected members, cluster radius (from King profile) and cluster contrast (perhaps defined as ratio of cluster stellar density to average local stellar density at the same galactic latitude).

GPS 4: Return K band extinction and individual solutions to distance modulus, true colour, dereddened apparent magnitude, spectral type and luminosity class for all point source J+H+K detections projected within a specified GPS cluster. Each solution must have a confidence rating or distance error attached and ambiguous solutions must have relative probabilities.

This type of data can be used to work out the distance to a cluster, either from the mode of solutions with >90% confidence or by main sequence fitting in a colour-magnitude diagram. This contributes to the 3-D atlas.

GPS 5 (following U11): Make J-K difference image map in magnitudes for a user specified rectangular region.

Provides extinction info for a nebulous star formation region.

GPS 6 (following U4, U1): Return list of GPS stars with detected proper motion (5-sigma) and J-H<0.5.

Searches for nearest and faintest brown dwarfs and white dwarfs.

Additional GPS comment:

U8 asks the archive to return star counts in 10 arcmin grid cells in the GPS. Stellar density does not require extra measurement but it would be worth calculating for each array field and/or for each square arcminute and putting the result in the archive. Similarly, a cluster detection algorithm should be run in regions of high stellar density, e.g. fitting a King model. The GPS is likely to find a lot of new clusters.

In summary the GPS wants to be able to get from the archive:

i) extinction (A(J), A(H), A(K)) for 3 colour detections

ii) solution(s) for true colours, dereddened apparent magnitude, sp. type, luminosity class and distance modulus, with probabilities and confidence attached.

iii) automatic update when multi-epoch data arrives - faint sources with proper motion being confirmed as dwarfs rather than giants.

Discussion:

The generalised querying enabled with SQL makes possible much of GPS 1-6 (for example, new columns of data can be generated from existing columns given a mathematically-expressed algorithm). For the purposes of the WSA, we undertake to work with the UKIDSS Working Groups to apply any supplied algorithm to catalogue tables to store required quantities. Ultimately, we aspire to making available an automatic system of upload of user-supplied codes/algorithms to enable general processing of catalogues and/or images.


Galactic Clusters Survey (GCS):

Following U3:

a) select all candidate cluster members which are within X arcmins of a brighter candidate cluster member to search for the incidence of brown dwarfs as wide companions to higher mass objects.

e.g. I would see if there is a significant number of these, then perhaps use them to test the brown dwarf ejection/formation scenario - one could then examine the `primaries' to see if they were indeed binaries as this theory would indicate.

One might want to append something like:

b) For all candidate members derive offset stars/wavefront sensor stars for X spectrograph on Y telescope. Make finding charts for each object.

Discussion:

Again, we note the flexibility of SQL, which would enable part (a), and also given a list and a set of selection criteria for brightness and proximity of nearby guide stars, part (b).


Deep Extragalactic Survey (DXS):

DXS 1: "I have a catalogue..."

XMM, SWIRE, GALEX etc will all want to input a list of positions and spit out an id so we need a query tool that reads in:

position, 1sigma error radius and N for the number of 1sigma radii you want to search to

and returns:

UKIDSS source, offset, N 1sigma radii.

This catalogue importer MUST be flexible and allow a simple ascii table to be read in.
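
A minimal sketch of such a query tool is given below. It assumes a whitespace-separated ASCII table with columns RA (degrees), Dec (degrees) and 1-sigma error radius (arcsec), with `#` starting a comment; the column layout, the function names and the brute-force search are all assumptions of this sketch. A real archive would presumably use a spatial index rather than comparing every pair.

```python
import math


def parse_ascii_catalogue(text):
    """Parse rows of 'ra_deg dec_deg sigma_arcsec'; '#' starts a comment."""
    rows = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        ra, dec, sigma = (float(v) for v in line.split()[:3])
        rows.append((ra, dec, sigma))
    return rows


def separation_arcsec(ra1, dec1, ra2, dec2):
    """Angular separation in arcsec using the haversine formula."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((d2 - d1) / 2) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h))) * 3600.0


def match(positions, sources, n_sigma):
    """Return (source id, offset in arcsec, offset in units of sigma) within n_sigma radii."""
    results = []
    for ra, dec, sigma in positions:
        for src_id, s_ra, s_dec in sources:
            offset = separation_arcsec(ra, dec, s_ra, s_dec)
            if offset <= n_sigma * sigma:
                results.append((src_id, offset, offset / sigma))
    return results


# Hypothetical user table and archive sources.
table = "# ra dec sigma\n10.0 0.0 2.0\n"
archive = [("A", 10.0, 0.001), ("B", 10.0, 0.01)]
results = match(parse_ascii_catalogue(table), archive, n_sigma=3)
print(results)
```

The same pattern would serve U19, with the X-ray error boxes supplying the 1-sigma radii.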

DXS 2: "Find QSOs..."

I suspect that most of the DXS interest will be in galaxies but we should be able to select on:

"FWHM"=stellar/unresolved .and. J-K>1.0

DXS 3: "Find EROs..."

The vast majority of the DXS work will be on the union of optical and UKIDSS data. This strikes me as the most difficult aspect to include in the archive. We must be able to import optical data into the system with a similar structure and then interrogate them simultaneously to say:

R<26 .and. K<20.5 .and. non-stellar .and. R-K>5

Perhaps a folding into the UKIDSS data of the optical photometry is required (and vice-versa) where each UKIDSS source has a set of comparison magnitudes (2MASS, SDSS, CFHLS, VST etc) that we can query globally and are skipped if they aren't entered.
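
The "skipped if they aren't entered" behaviour can be sketched as below for the ERO cut. The record layout (a hypothetical `r` magnitude folded into each UKIDSS source, absent or `None` where no optical data exist, plus `k` and a `stellar` flag) is an assumption made for illustration.

```python
def is_ero(src):
    """Apply R<26, K<20.5, non-stellar, R-K>5; sources lacking R or K are skipped."""
    r = src.get("r")
    k = src.get("k")
    if r is None or k is None:
        return False  # comparison magnitude not entered: skip rather than fail loudly
    return r < 26.0 and k < 20.5 and not src["stellar"] and (r - k) > 5.0


# Hypothetical sources with folded-in optical photometry.
sources = [
    {"id": "e1", "r": 25.5, "k": 19.8, "stellar": False},
    {"id": "e2", "r": None, "k": 19.0, "stellar": False},
    {"id": "e3", "r": 24.0, "k": 20.0, "stellar": False},
    {"id": "e4", "r": 25.9, "k": 20.0, "stellar": True},
]
eros = [s["id"] for s in sources if is_ero(s)]
print(eros)
```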

DXS 4: "Find clusters...."

I suspect that this is by necessity an off-line process but will require the preselection of all the galaxies in a given area with some magnitude and colour cuts:

In a rectangular region X"xY" return:

non-stellar .and. K<20.5 .and. J-K>1.1

The important factor will be writing the results of such a vast query out to disk. The logistics of extracting large parts of the database need to be thought through now.

DXS 5: "Map me the source density..."

The current queries don't allow for the ability to select objects then display their surface density as a greyscale. This would be a really useful tool for looking for clusters or groups of similar objects but also for checking the completeness across large areas (i.e. take the faintest objects and see whether they are evenly distributed).

So the query would look like:

select non-stellar K>20.5 map 5x5deg with 10' pixels
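
A sketch of what this query might compute is given below: a 5x5 degree field with 10 arcmin pixels gives a 30x30 grid of counts that can be displayed as a greyscale. The field offsets `x_deg` and `y_deg` (measured from one corner of the field) and the other field names are hypothetical.

```python
import math


def density_map(sources, field_deg=5.0, pixel_arcmin=10.0, k_min=20.5):
    """Count non-stellar sources with K > k_min in each pixel of a square field."""
    n = int(round(field_deg * 60.0 / pixel_arcmin))
    grid = [[0] * n for _ in range(n)]
    for s in sources:
        if s["stellar"] or s["k"] <= k_min:
            continue
        ix = math.floor(s["x_deg"] * 60.0 / pixel_arcmin)
        iy = math.floor(s["y_deg"] * 60.0 / pixel_arcmin)
        if 0 <= ix < n and 0 <= iy < n:  # ignore sources outside the field
            grid[iy][ix] += 1
    return grid


# Hypothetical sources with offsets in degrees from the field corner.
sources = [
    {"x_deg": 0.05, "y_deg": 0.05, "k": 21.0, "stellar": False},
    {"x_deg": 0.10, "y_deg": 0.12, "k": 20.8, "stellar": False},
    {"x_deg": 2.50, "y_deg": 2.50, "k": 21.2, "stellar": False},
    {"x_deg": 1.00, "y_deg": 1.00, "k": 21.0, "stellar": True},
    {"x_deg": 1.00, "y_deg": 1.00, "k": 19.0, "stellar": False},
]
grid = density_map(sources)
print(len(grid), grid[0][0], grid[15][15])
```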

The majority of the discussion about optical data is with respect to SDSS, which is not appropriate to the DXS. We will need to bring the deeper optical data in as both pixel and catalogue form (ideally), so including this in the full design is vital in my view.

Discussion:

Clearly the issue of import of catalogue (and ideally pixel) data is open-ended. At this stage in the design of the WSA, we will undertake to ensure sufficient space is available for import of UKIDSS-supplied complementary imaging catalogues, given sufficient information concerning their size; if this information is unavailable then this will be done on a best-efforts basis.


Ultra-Deep Survey (UDS):

No further usage examples were suggested.

Discussion:

Some discussion centred around the availability of space (and ultimately tools) for catalogue import; as stated above, WSA design will allow for this.


Miscellaneous:

1: Ability to semi-automatically prepare an ORAC-OT program based on the selection of objects from the survey.

2: Facility for ORAC load up of archive data at UKIRT while doing follow-up work would be useful.

Discussion:

WSA output formats will be standard, and remote server functionality will be provided to enable item 2.