Package wsatools :: Package DbConnect :: Module DbSession :: Class Outgester

Class Outgester


A special type of database session, where you wish to bulk outgest data from the database to a file. This class ensures the connection has the correct permissions and is designed to work around the delays in the appearance of the data on the NFS mounted load server share directory. If the schema for tables that data will be outgested from is supplied, then upon initialisation this class verifies that the schemas of the database tables are up-to-date. Otherwise, the class will query the database schema directly, which is slower.

Nested Classes
    Errors & Exceptions

Inherited from DbSession: CatalogueServerError, DisconnectError

Instance Methods
 
__init__(self, dbCon, tag='CuID000000', honourTrialRun=False)
Makes connection to the requested database.
 
getFilePath(self, fileID)
Returns the location of the file that is outgested with this fileID.
str (or int)
outgestQuery(self, query, fileID='', filePath='', createDtdFile=False, isCsv=False, redirectStdOut=False)
Bulk outgest the results from an SQL query to a file in the catalogue server share directory.
 
_getDataTypes(self, query)
Returns ordered list of data types returned by the given query, either via examining the schema, if available, or by direct database table query.
bool
_outgestFile(self, cmd, redirectStdOut, view)
Performs outgest to file, ensuring it exists.
 
_transferFile(self, filePathName, filePath)
Transfers given file from the share path to the given path on the curation server.
 
_waitForNFS(self, filePathName, query)
Wait until the file on the NFS share is ready to be accessed.

Inherited from DbSession: __del__, __str__, addColumn, addIndex, alterColumn, checkConstraints, checkSchema, commitTransaction, copyIntoTable, copyTable, createObjects, createStatistics, createTable, createUser, delete, deleteRows, dropColumn, dropObjects, dropTable, enableDirtyRead, existsTable, freeProcCache, getBestVolume, goOffline, grantAccess, insertData, query, queryAllowsNulls, queryAttrMax, queryColumnNames, queryDataTypes, queryEntriesExist, queryNumRows, queryRowSize, renameTable, rollbackTransaction, runOnServer, sharePath, shrinkTempDb, tablePath, testSharePath, truncate, uncPath, update, updateEntries, updateStatistics

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__, __subclasshook__

Class Variables
  fileTag = 'CuID000000'
Tag to append to outgest files so they can be cleaned-up later.
  sampleRate = 2
Delay in seconds between testing file size on NFS mounted share.
  tempViewName = 'OutgestTempView51591neferefre'
Name of temporary outgest view that's created if the SQL statement is too long for the BCP statement.
  timeOut = 600
Time in seconds before assuming outgest to NFS has failed.
  _isBcpDeadlocked = False
BCP outgest is currently deadlocked?

Inherited from DbSession: database, isLoadDb, isRealRun, isTrialRun, server, sysc, userName

Instance Variables

Inherited from DbSession (private): _dbSessionKey

Properties

Inherited from object: __class__

Method Details

__init__(self, dbCon, tag='CuID000000', honourTrialRun=False)
(Constructor)

Makes connection to the requested database. The supplied database connection must be under a username that has bcp rights, e.g. ldservrw/pubdbrw. A SETUSER can't be used to make sure we are always ldservrw, because not all users have permission to perform a SETUSER, which defeats the purpose. If the user is not ldservrw or pubdbrw, a new connection will be made as ldservrw (which is safe, as Outgester only reads from the database anyway). In this circumstance, to prevent too many connections being made and dropped, create just a single Outgester object and reuse it throughout.

Parameters:
  • dbCon (DbSession) - A database connection.
  • tag (str) - A tag string to append to the file name on the share directory, so file clean up can be rm *tag*.
  • honourTrialRun (bool) - If True, don't outgest if the database connection is in trial run mode.
Overrides: object.__init__
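
The advice to create a single Outgester and reuse it could be followed with a small caching helper, sketched below. FakeOutgester and shared_outgester are hypothetical names used only so the sketch runs without wsatools; in real code the helper would construct Outgester(dbCon, tag=tag) instead. The cache assumes the connection object is hashable.

```python
import functools


class FakeOutgester:
    """Hypothetical stand-in for Outgester; counts how many are created."""
    created = 0

    def __init__(self, db_con, tag="CuID000000"):
        FakeOutgester.created += 1
        self.db_con = db_con
        self.tag = tag


@functools.lru_cache(maxsize=None)
def shared_outgester(db_con, tag="CuID000000"):
    """Return one cached outgester per distinct set of call arguments."""
    # Assumption: real code would call Outgester(db_con, tag=tag) here.
    return FakeOutgester(db_con, tag)
```

Repeated calls with the same arguments return the same object, so only one extra connection would be opened however many outgests follow.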

outgestQuery(self, query, fileID='', filePath='', createDtdFile=False, isCsv=False, redirectStdOut=False)

Bulk outgest the results from an SQL query to a file in the catalogue server share directory.

Parameters:
  • query (SelectSQL) - SQL query to outgest.
  • fileID (str) - Optionally request a specific unique identifier to be used in the name for the outgest file. The file will be named "[fileID][fileTag].dat". This is useful if multiple outgests from the same table need to be available at the same time.
  • filePath (str) - Optionally outgest to a path on the catalogue server other than the share path (e.g. "G:"), or transfer the outgest file to a different path on the curation server. Either give the full new path name for the file or just provide the path to the new directory, but make sure that directory exists first.
  • createDtdFile (bool) - If True and the outgest is to a binary file on the share path, a file containing the data type definition of the outgest file will be created with the same name as the outgest file, but with a ".dtd" extension rather than ".dat".
  • isCsv (bool) - If True, outputs to a CSV ASCII file, else defaults to native binary format.
  • redirectStdOut (bool) - If True, the results of the outgest will be written to a file on the share path rather than returned through mxODBC. This is required to work around a mysterious bug in very large table outgests only and is slightly less efficient.
Returns: str (or int)
Full path to the outgest file (unless createDtdFile is requested, in which case the number of rows outgested is returned). The path will be a curation server path if the file is outgested to the share directory, which is the default behaviour; otherwise a catalogue server path is returned.
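
The file naming described under fileID and createDtdFile can be illustrated with the minimal sketch below. The helper names are hypothetical and are not part of Outgester; only the "[fileID][fileTag].dat" pattern and the ".dtd" extension come from the documentation above.

```python
import os


def outgest_file_name(file_id="", file_tag="CuID000000"):
    """Build the outgest file name as "[fileID][fileTag].dat"."""
    return "%s%s.dat" % (file_id, file_tag)


def dtd_file_name(dat_name):
    """Swap the ".dat" extension for ".dtd", as described for createDtdFile."""
    root, _ext = os.path.splitext(dat_name)
    return root + ".dtd"
```

Because the tag is embedded in every file name, a clean-up pattern such as *CuID000000* matches both the data file and its type definition file.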
Notes:
  • If data is being copied by bulk outgest/ingest it helps to order by the primary key.
  • This function may hang for up to 10 minutes after outgest if the outgest failed silently (which can happen), whilst waiting for the NFS update to time out. I've chosen not to print a message saying the outgest is complete and that the function is waiting until time-out, because multiple outgests would display too much information, and there's no point making this message appear in debug mode only, because it would be obvious.
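
The 10 minute hang corresponds to the timeOut class variable (600 seconds), with the share sampled every sampleRate seconds (2). A minimal sketch of such a polling loop is given below. It is an assumption about the general technique, not the actual _waitForNFS implementation (which also receives the query), and wait_for_file is a hypothetical name.

```python
import os
import time


def wait_for_file(path, sample_rate=2.0, time_out=600.0):
    """Poll until the file exists and its size stops changing.

    Returns True once two consecutive samples report the same non-zero
    size, or False if time_out seconds pass first.
    """
    deadline = time.monotonic() + time_out
    last_size = -1
    while time.monotonic() < deadline:
        try:
            size = os.path.getsize(path)
        except OSError:
            size = -1  # file not visible on the share yet
        if size > 0 and size == last_size:
            return True
        last_size = size
        time.sleep(sample_rate)
    return False
```

With the defaults, a file that never appears keeps the caller waiting for the full 600 seconds, which matches the behaviour described in the note.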

_getDataTypes(self, query)

Returns ordered list of data types returned by the given query, either via examining the schema, if available, or by direct database table query.

To Do:
  • Select * queries may not return correctly ordered data types. If so, and order matters, this can be achieved by another database query - which of course will slow the code down a bit.
  • Simplify by just running a test query of one row and then inspecting the cursor like pySQL does? Similar to how DbSession.queryRowSize() works. This will render the above todo obsolete.
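
The second To Do item, running a one-row test query and inspecting the cursor, might look like the sketch below. It uses sqlite3 purely as a stand-in database so that it is runnable; the table and function names are hypothetical. sqlite3 fills in only the column name in cursor.description, so this sketch reads types from the fetched row instead of from the description's type code.

```python
import sqlite3


def column_types_from_one_row(connection, query):
    """Run the query, fetch one row, and list (name, type) per column in order."""
    cursor = connection.cursor()
    cursor.execute(query)
    row = cursor.fetchone()
    # cursor.description lists the result columns in the order returned.
    names = [col[0] for col in cursor.description]
    if row is None:
        return [(name, None) for name in names]
    return [(name, type(value).__name__) for name, value in zip(names, row)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Source (sourceID INTEGER, ra REAL, name TEXT)")
conn.execute("INSERT INTO Source VALUES (1, 10.5, 'a')")
print(column_types_from_one_row(conn, "SELECT * FROM Source"))
```

Because the column order comes from the cursor itself, a Select * query yields types in the same order as the data, which is the ordering concern raised in the first To Do item.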

_outgestFile(self, cmd, redirectStdOut, view)

Performs outgest to file, ensuring it exists.

Returns: bool
True, if outgest file successfully created.

Class Variable Details

tempViewName

Name of temporary outgest view that's created if the SQL statement is too long for the BCP statement. Host PID is a safer UID than fileTag.

Value:
'OutgestTempView51591neferefre'
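
The value shown looks like a fixed prefix followed by a process ID and a host name. Assuming that is how it is composed (the source code is not shown here, so this is a guess), a name of the same shape could be built as in the sketch below. temp_view_name is a hypothetical helper.

```python
import os
import re
import socket


def temp_view_name(prefix="OutgestTempView"):
    """Compose a view name from the process ID and short host name."""
    host = socket.gethostname().split(".")[0]
    # Keep only characters that are safe in an unquoted SQL identifier.
    host = re.sub(r"\W", "", host)
    return "%s%d%s" % (prefix, os.getpid(), host)
```

A name built this way differs between processes and hosts, which is the property that makes it a safer unique identifier than a shared fileTag.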