Package wsatools :: Package DbConnect :: Module DbSession :: Class Ingester

Class Ingester



A special type of database session, where you wish to ingest data into a database table from a native binary or CSV file. Upon initialisation this class verifies that the schemas of the database tables are up-to-date.
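The schema verification performed on initialisation is internal to Ingester and is not documented on this page. As a rough illustration only, the sketch below compares expected (name, type) column pairs against a live table using Python's built-in sqlite3 module. The function check_schema and the list-of-pairs stand-in for a Schema.Table object are hypothetical, and the real class talks to the catalogue server rather than SQLite.

```python
import sqlite3


def check_schema(conn, table_name, expected_columns):
    """Return a list of human-readable mismatches between expected_columns
    (a hypothetical list of (name, type) pairs) and the table in the database.
    An empty list means the schemas agree or the table does not exist yet."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) rows.
    # The table name is interpolated directly, so it must come from a trusted schema.
    rows = conn.execute(f'PRAGMA table_info("{table_name}")').fetchall()
    if not rows:
        return []  # table not created yet: nothing to compare against
    actual = [(row[1], row[2].upper()) for row in rows]
    expected = [(name, type_.upper()) for name, type_ in expected_columns]
    mismatches = []
    for position, (exp, act) in enumerate(zip(expected, actual)):
        if exp != act:
            mismatches.append(f"column {position}: expected {exp}, found {act}")
    if len(expected) != len(actual):
        mismatches.append(f"expected {len(expected)} columns, found {len(actual)}")
    return mismatches
```

A caller in the spirit of skipSchemaCheck would simply not call such a function, and one in the spirit of checkReleasedOnly would call it only for the subset of tables marked as released.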


Notes:

To Do: Always parse indices on initialisation of Ingester. Indices cannot be dropped at that stage, because it would be too early. Then give ingestTable() a default option of dropIndices=True. Ingester can tell from the schema whether or not it should parse non-survey indices. Ingester itself could have a dropIndices option at initialisation, defaulting to False, for IngIngester to override.

Nested Classes
  IngestError
Exception thrown when the data cannot be ingested.
    Errors & Exceptions

Inherited from DbSession: CatalogueServerError, DisconnectError

Instance Methods
__init__(self, dbCon, tableSchema, tag='CuID000000', skipSchemaCheck=False, checkReleasedOnly=False)
Makes connection to the requested database, and checks that the schemas of the tables supplied in the table list are correct.
int
ingestTable(self, tableName, filePathName, idxInfo=None, overWrite=False, isCsv=False, deleteFile=True, isOrdered=False, checkConstraints=True)
Ingest a binary or CSV flat file table into the database.
str
_moveToShare(self, filePathName)
Moves a file to the catalogue load server's share directory as mounted on the curation server.
tuple(str, str)
_normalisePath(self, filePathName)
Move ingest file to a place visible to the catalogue server O/S and normalise different filePathName inputs into a standard (fileName, filePathName) pair.
_removeIngestFile(self, fileName, filePathName, deleteFile)
Remove the ingest file.

Inherited from DbSession: __del__, __str__, addColumn, addIndex, alterColumn, checkConstraints, checkSchema, commitTransaction, copyIntoTable, copyTable, createObjects, createStatistics, createTable, createUser, delete, deleteRows, dropColumn, dropObjects, dropTable, enableDirtyRead, existsTable, freeProcCache, getBestVolume, goOffline, grantAccess, insertData, query, queryAllowsNulls, queryAttrMax, queryColumnNames, queryDataTypes, queryEntriesExist, queryNumRows, queryRowSize, renameTable, rollbackTransaction, runOnServer, sharePath, shrinkTempDb, tablePath, testSharePath, truncate, uncPath, update, updateEntries, updateStatistics

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__, __subclasshook__

Class Variables
  fileOnShare = None
Name of the file on the share; used for IngIngester-type ingests only.
  fileTag = 'CuID000000'
Tag to append to ingest files so that they can be cleaned up later.
  _schema = None
Dictionary of schemas for the tables to be ingested into, keyed by table name.

Inherited from DbSession: database, isLoadDb, isRealRun, isTrialRun, server, sysc, userName

Instance Variables

Inherited from DbSession (private): _dbSessionKey

Properties

Inherited from object: __class__

Method Details

__init__(self, dbCon, tableSchema, tag='CuID000000', skipSchemaCheck=False, checkReleasedOnly=False)
(Constructor)


Makes connection to the requested database, and checks that the schemas of the tables supplied in the table list are correct.

Parameters:
  • dbCon (DbSession) - A database connection.
  • tableSchema (list(Schema.Table) or dict(str; Schema.Table)) - Schema for tables to be ingested this session, as provided by Schema.parseTables().
  • tag (str) - A tag string to append to the file name on the share directory, so file clean up can be rm *tag*.
  • skipSchemaCheck (bool) - If True, don't check the schema for mismatches - use with extreme caution!
  • checkReleasedOnly (bool) - If True, only check the schema of released tables for mismatches.
Overrides: object.__init__
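The tag parameter exists so that leftover ingest files can be removed in one sweep. A minimal sketch of that clean-up, assuming the tag appears somewhere in each file name and the share directory is mounted locally; cleanup_tagged_files is a hypothetical helper, not part of wsatools:

```python
import glob
import os


def cleanup_tagged_files(share_dir, tag):
    """Delete every regular file in share_dir whose name contains tag,
    the equivalent of running rm *tag* in that directory.
    Returns the sorted names of the files that were removed."""
    # glob.escape guards against wildcard characters in the directory or tag.
    pattern = os.path.join(glob.escape(share_dir), f"*{glob.escape(tag)}*")
    removed = []
    for path in glob.glob(pattern):
        if os.path.isfile(path):
            os.remove(path)
            removed.append(os.path.basename(path))
    return sorted(removed)
```

Because every file written during a session carries the same tag (CuID000000 by default), a single call after the session finishes removes all of them without touching files from other sessions.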

ingestTable(self, tableName, filePathName, idxInfo=None, overWrite=False, isCsv=False, deleteFile=True, isOrdered=False, checkConstraints=True)


Ingest a binary or CSV flat file table into the database.

Parameters:
  • tableName (str) - Name of table to create/update in the database.
  • filePathName (str) - Name of the ingest file, optionally with a full catalogue server path or curation server path. If the full path is omitted, the file is assumed to be located in the catalogue server's share directory. If instead a curation server path outside the share is supplied, the file is automatically moved to the share (provided the fileTag was set when the Ingester was initialised).
  • idxInfo (defaultdict(str: list(Schema.Index))) - If you wish to drop indices prior to ingest, then supply Schema.parseIndices() information.
  • overWrite (bool) - If True, overwrite an existing table in the database. If False, update the table if it exists, otherwise create the table. NB: Whenever this method creates a table it does so without applying foreign key constraints.
  • isCsv (bool) - If True, expects to ingest a CSV ASCII file, else expects native binary format.
  • deleteFile (bool) - If True, delete ingest file after ingest, regardless of outcome.
  • isOrdered (bool) - If True, the data is ordered by primary key.
  • checkConstraints (bool) - If True, check foreign key constraints during ingest if any exist, otherwise never check foreign key constraints.
Returns: int
Number of rows ingested.

Note: If the table doesn't already exist in the database it will be created. However, it won't have foreign key constraints (this is a feature intended for release databases made by CU19). If you need foreign key constraints it is best to call createTable() prior to ingestTable(overWrite=False).
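The overWrite behaviour described above (drop and recreate, append to an existing table, or create a missing one, always without foreign keys) can be sketched with sqlite3 standing in for the catalogue server. This handles headerless CSV only, whereas the real method also ingests native binary files; ingest_csv and the (name, type) column list are hypothetical.

```python
import csv
import sqlite3


def ingest_csv(conn, table_name, csv_path, column_defs, overwrite=False):
    """Load a headerless CSV file into table_name and return the number of rows ingested.

    column_defs is a hypothetical list of (name, type) pairs standing in for a Schema.Table.
    overwrite=True drops any existing table first; otherwise rows are appended to the
    existing table, or a new table is created. Tables created here carry no constraints,
    so in particular no foreign keys."""
    columns_sql = ", ".join(f'"{name}" {type_}' for name, type_ in column_defs)
    placeholders = ", ".join("?" for _ in column_defs)
    with conn:  # commit on success; uncommitted inserts are rolled back on error
        if overwrite:
            conn.execute(f'DROP TABLE IF EXISTS "{table_name}"')
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{table_name}" ({columns_sql})')
        with open(csv_path, newline="") as handle:
            cursor = conn.executemany(
                f'INSERT INTO "{table_name}" VALUES ({placeholders})',
                csv.reader(handle))
    # For executemany(), sqlite3 sums the modified rows into rowcount.
    return cursor.rowcount
```

Creating the table inside the ingest routine is what makes a missing foreign key easy to overlook, which is why the note recommends calling createTable() first and then ingesting with overWrite=False.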

To Do: If not isOrdered and overWrite=True, don't apply the primary key until after ingest. Or, even better: if not isOrdered, try to create the table without any constraints; if that succeeds (i.e. there is no existing table, or overWrite=True), apply the primary key after ingest; otherwise don't, but do check foreign keys. This may make parts of NeighbourTableIngester redundant.
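The "load first, apply the key afterwards" idea in this To Do (and the related option of dropping indices via idxInfo before ingest) can be illustrated with a small sqlite3 sketch. SQLite cannot add a PRIMARY KEY to an existing table, so a UNIQUE index stands in for it here; load_then_index and the index naming convention are hypothetical.

```python
import sqlite3


def load_then_index(conn, table_name, rows, key_columns):
    """Insert rows into an existing table with its key index dropped, then rebuild a
    UNIQUE index on key_columns once the load is complete. Returns the number of rows
    inserted. Raises sqlite3.IntegrityError if the loaded data contains duplicate keys."""
    rows = list(rows)
    index_name = f"{table_name}_key"  # hypothetical naming convention
    key_sql = ", ".join(f'"{column}"' for column in key_columns)
    # With no index present, inserts need not maintain one row by row,
    # which is the benefit for data that is not ordered by primary key.
    conn.execute(f'DROP INDEX IF EXISTS "{index_name}"')
    with conn:  # commit on success; uncommitted inserts are rolled back on error
        if rows:
            placeholders = ", ".join("?" * len(rows[0]))
            conn.executemany(f'INSERT INTO "{table_name}" VALUES ({placeholders})', rows)
        # Build the index in one pass; duplicates surface here instead of mid-load.
        conn.execute(f'CREATE UNIQUE INDEX "{index_name}" ON "{table_name}" ({key_sql})')
    return len(rows)
```

Whether this pays off on the catalogue server depends on its bulk-load behaviour; the sketch only shows the ordering of operations the To Do proposes.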

_moveToShare(self, filePathName)


Moves a file to the catalogue load server's share directory as mounted on the curation server. Works around the share violation problem under NFS: a file written to the curation-client / load-server NFS file share cannot be accessed immediately after being written unless it is first renamed!

Parameters:
  • filePathName (str) - Path of file to copy over to the share directory. Must be a full path to the file visible from the curation server O/S.
Returns: str
New file name in share directory, including any sub-directories, or None if original file does not exist.
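The move-then-rename workaround can be sketched as follows: the file is first moved into the share under a temporary name and then renamed to its final, tagged name. The function move_to_share, the "_tag" naming scheme and the ".tmp" suffix are hypothetical, and whether a rename clears the share violation depends on the particular NFS set-up.

```python
import os
import shutil


def move_to_share(file_path, share_dir, tag):
    """Move file_path into share_dir under a tagged name and return that name
    relative to share_dir, or None if the original file does not exist."""
    if not os.path.isfile(file_path):
        return None
    base, ext = os.path.splitext(os.path.basename(file_path))
    final_name = f"{base}_{tag}{ext}"  # hypothetical tagging scheme
    temp_path = os.path.join(share_dir, final_name + ".tmp")
    final_path = os.path.join(share_dir, final_name)
    # Step 1: move the file into the share under a temporary name.
    shutil.move(file_path, temp_path)
    # Step 2: rename it in place; on POSIX this is atomic within one file system.
    os.replace(temp_path, final_path)
    return final_name
```

Returning None rather than raising mirrors the documented return value when the original file does not exist.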

_normalisePath(self, filePathName)


Move ingest file to a place visible to the catalogue server O/S and normalise different filePathName inputs into the format:

filePathName = Full path to file from the catalogue server O/S e.g. H:\dir\subdir\filename.ext

fileName = filename.ext or subdir/filename.ext if file is in a subdirectory of the catalogue server share directory.

Parameters:
  • filePathName (str) - Any acceptable file path definition allowed by the Ingester.ingestTable() method.
Returns: tuple(str, str)
fileName, filePathName (see description above).
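The three accepted input forms (a bare name or sub-path already in the share, a full catalogue server path, or a full curation server path inside the mounted share) can be normalised with the standard ntpath and posixpath modules. In this sketch normalise_path, server_share and client_share are hypothetical; the real method derives the share locations itself and moves files that lie outside the share instead of rejecting them.

```python
import ntpath
import posixpath


def normalise_path(file_path, server_share, client_share):
    """Return (fileName, filePathName) for an ingest file.

    server_share: the share directory as seen by the catalogue server (a Windows path).
    client_share: the same directory as mounted on the curation server (a POSIX path).
    fileName uses forward slashes; filePathName is a Windows path on the catalogue server."""
    drive, _ = ntpath.splitdrive(file_path)
    if drive:
        # Full catalogue server path, e.g. starting with a drive letter.
        relative = ntpath.relpath(file_path, server_share)
    elif posixpath.isabs(file_path):
        # Full curation server path inside the mounted share.
        relative = posixpath.relpath(file_path, client_share)
    else:
        # Bare name or sub-path: assumed to be in the share already.
        relative = file_path
    file_name = relative.replace("\\", "/")
    if file_name.split("/")[0] == "..":
        # The real method would first move such a file onto the share.
        raise ValueError(f"{file_path} is outside the share directory")
    file_path_name = ntpath.join(server_share, *file_name.split("/"))
    return file_name, file_path_name
```

For example, normalise_path("/mnt/share/sub/f.dat", r"H:\share", "/mnt/share") gives ("sub/f.dat", r"H:\share\sub\f.dat").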

_removeIngestFile(self, fileName, filePathName, deleteFile)


Remove the ingest file.

Parameters:
  • fileName (str) - Name and sub-dir of file in the curation server share path.
  • filePathName (str) - Full path to file on the catalogue server.
  • deleteFile (bool) - If False, just issue a warning instead of deleting the file.
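A minimal sketch of the delete-or-warn behaviour, using the standard logging module. For simplicity it takes a single locally visible path, whereas the documented method receives both the curation server share name and the catalogue server path; remove_ingest_file is a hypothetical stand-in.

```python
import logging
import os

logger = logging.getLogger(__name__)


def remove_ingest_file(file_path, delete_file):
    """Delete file_path if delete_file is True; otherwise only log a warning
    that it was left in place. Returns True only if the file was deleted."""
    if not delete_file:
        logger.warning("Ingest file left in place: %s", file_path)
        return False
    try:
        os.remove(file_path)
    except FileNotFoundError:
        # Already gone (e.g. cleaned up elsewhere): warn rather than fail.
        logger.warning("Ingest file not found: %s", file_path)
        return False
    return True
```

Treating a missing file as a warning fits ingestTable()'s promise to delete the file "regardless of outcome", since clean-up should not mask the result of the ingest itself.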