Package wsatools :: Module Utilities

Module Utilities


General utility functions, mostly concerning the manipulation of Python objects and the file system.


Author: I.A. Bond

Organization: WFAU, IfA, University of Edinburgh

Contributors: R.S. Collins, N.J.G. Cross, N.C. Hambly, E. Sutorius

Classes
  ParsedFile
Behaves like a file object, except that when iterating over file lines only non-blank, non-comment lines are returned and any EOL characters are removed together with trailing white-space.
  WordWrapper
Formats long strings so that they neatly fit within a certain width without words being split across lines.
  Ratings
Ratings is mostly like a dictionary, with extra features: the value corresponding to each key is the 'score' for that key, and all keys are ranked in terms of their scores.
Functions
int
_getUserWidth()
Private helper function used by the WordWrapper class.
list
arbSort(unsortedList, kwdList, key=<lambda>, isFullKeySet=True)
Arbitrarily sorts a list of tuples of form (keyword, value) by the order defined in the sequence of specified keywords.
 
ensureDirExist(aDir)
If the supplied directory does not exist then create it.
str
expandNumberRange(numberRange, useTens=False)
Given a human readable compact number range string, expand it to a complete sequence of numbers in a CSV string.
generator(str)
extractColumn(filePathName, colNum)
Gobble all entries in the given column of a space-separated text file, yielding them one at a time.
list(list(dataType))
extractColumns(filePathName, columnList=None, numRows=None, dataType=str)
Extracts from the given file the data in the given list of columns as a list of values for every column.
dict(str:list(int, int))
getDiskSpace(disks)
Gets the available disk space for supplied list of disks.
set
getDuplicates(anIterable)
Returns the set of items for which the groupBy method returns the same item more than once.
list(int)
getListIndices(aList, value)
Returns a list of all indices where a value occurs in the given list.
str
getNextDisk(sysc, spacePerDisk=None, byPercentFree=False, preAllocMem=0)
Gets the next available disk which is less than 99% full.
int
getNumberItems(numberRange, useTens=False)
Calculates the number of items expressed by a human-readable string of number ranges.
int
getSystemMemoryKb()
Returns: Amount of available memory in kilobytes.
list(tuple(int, int))
groupByCounts(keyCounts, groupSize)
Taking an ordered list of counts of a particular key, returns a list of key ranges that contain up to the specified group size of counts.
str
joinDict(aDict, joinStr=' = ', sepStr=', ')
Like string.join, but operates on the contents of a dictionary instead of a list.
str
joinNested(aSeq, joinStr=', ', subJoinStr=None, seqIndex=None)
Like string.join, but can handle nested (or un-nested) sequences of string-castable objects.
dict
invertDict(aDict, forceReturnList=False)
Inverts the dictionary in such a way that if the input dictionary's values are lists, each item of this list will become a key with the input dictionary's key as value.
mx.DateTime
makeDateTime(time=None)
Returns an archive date/time data type, defaulting to the current time if no input argument is given.
str
makeMssqlTimeStamp()
Creates a timestamp using makeTimeStamp and formats appropriately for use in ingest strings for Microsoft SQL Server (and handles a bug in the datetime object creation).
str
makeTimeStamp()
Returns: An archive time stamp as a string, as opposed to the internal date time type.
bool
moreThanOneIn(values)
Tests whether a generator yields more than one item. Equivalent to the memory-hungry len(list(values)) > 1, or to sum(1 for _ in values) > 1, but doesn't iterate through all items.
str
multiSub(text, subs)
Performs multiple string substitutions on the given string.
 
noInterrupt(*args, **kwds)
Disables keyboard interrupts whilst in this context.
list(list)
npop(aList, nMax=2, mode='topbot')
Divides aList into nMax sublists by populating them with items taken alternately from the top and the bottom of the list.
str
numberRange(numbers, sep=', ', useTens=False)
Given a sequence of integers it returns that sequence as a string representation of an ordered range of unique numbers.
list(X)
orderedSet(seq, excludeList=None)
Returns a list of the given sequence in the original order, but with duplicates removed.
float
parseFloat(value)
Parses a string value and converts to float.
generator(list(X))
splitList(longList, chunkSize=2, noSingles=False)
Splits a list into equal-sized chunks.
generator(X)
unpackList(combinedList)
Given a list of lists, return a single sequence containing all of the elements of the combined list, as a generator.
 
naturalSorted(strings)
Sort strings naturally.
 
naturalSortKey(key)
Variables
  __package__ = 'wsatools'
Function Details

_getUserWidth()


Private helper function used by the WordWrapper class.

Returns: int
User's default wrap width read from the preferences file.

arbSort(unsortedList, kwdList, key=<lambda>, isFullKeySet=True)


Arbitrarily sorts a list of tuples of form (keyword, value) by the order defined in the sequence of specified keywords. Example:

>>> arbSort([7, 3, 8, 0, 1], [8, 3], isFullKeySet=False)
[8, 3, 7, 0, 1]
Parameters:
  • unsortedList (list) - Unsorted list of scalars or sequences.
  • kwdList (list) - List of keywords specifying the required sort order; these need to be of the same type/value as the sort key elements of the entries in the given unsorted list.
  • key (function) - Function to fetch the sort key element from the given unsorted list, e.g. operator.itemgetter(0).
  • isFullKeySet (bool) - If True, expect kwdList to contain the complete set of keys in the unsortedList; otherwise entries with non-specified keys are left at the end of the sorted list, in their original order.
Returns: list
The sorted list.

To Do: See if defining my own compare function that calls index() is faster/simpler than the Decorate-Sort-Undecorate method used here.
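
As a rough illustration of the ordering rule described above, the following sketch ranks each entry by the position of its key in the keyword list and relies on Python's stable sort. The name arb_sort_sketch is hypothetical; this is not the module's Decorate-Sort-Undecorate code, it assumes hashable keys, and raising ValueError for an incomplete key set is an assumption rather than documented behaviour.

```python
def arb_sort_sketch(unsorted_list, kwd_list, key=lambda item: item,
                    is_full_key_set=True):
    """Sort items by the position of their key in kwd_list (illustrative only)."""
    # The rank of each keyword is its position in the requested order.
    rank = {kwd: position for position, kwd in enumerate(kwd_list)}
    if is_full_key_set:
        unknown = [item for item in unsorted_list if key(item) not in rank]
        if unknown:
            # Assumption: the real function's behaviour here is not documented.
            raise ValueError("keys missing from kwd_list: %r" % (unknown,))
    # Unlisted keys share the last rank; sorted() is stable, so they keep
    # their original relative order at the end of the result.
    return sorted(unsorted_list,
                  key=lambda item: rank.get(key(item), len(rank)))


print(arb_sort_sketch([7, 3, 8, 0, 1], [8, 3], is_full_key_set=False))
# -> [8, 3, 7, 0, 1]
```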

ensureDirExist(aDir)


If the supplied directory does not exist then create it.

Parameters:
  • aDir (str) - Full path to the directory.
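
A minimal sketch of this kind of check, using only the standard library. The name ensure_dir_exist_sketch is hypothetical, and creating missing parent directories as well is an assumption; the documentation above does not say whether the real function does so.

```python
import os
import tempfile


def ensure_dir_exist_sketch(a_dir):
    """Create a_dir, including missing parent directories, if it is absent."""
    if not os.path.isdir(a_dir):
        os.makedirs(a_dir)


# Usage: create a nested directory under a throwaway temporary location.
base_dir = tempfile.mkdtemp()
target_dir = os.path.join(base_dir, "products", "stacks")
ensure_dir_exist_sketch(target_dir)
ensure_dir_exist_sketch(target_dir)  # A second call is a harmless no-op.
```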

expandNumberRange(numberRange, useTens=False)


Given a human readable compact number range string, expand it to a complete sequence of numbers in a CSV string. Example:

>>> expandNumberRange(numberRange([1, 2, 3, 5, 6, 7]))
'1,2,3,5,6,7'
>>> expandNumberRange('1,2')
'1,2'
>>> expandNumberRange('0')
'0'
Parameters:
  • numberRange (str) - A human readable compact number range string
Returns: str
A complete sequence of numbers in a CSV string.
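
The expansion can be pictured with the sketch below, which parses strings of the form shown in the examples ('1-3, 5-7'). The name expand_number_range_sketch is hypothetical; the sketch assumes non-negative integers and ignores the useTens option, whose behaviour is not described here.

```python
def expand_number_range_sketch(number_range):
    """Expand e.g. '1-3, 5-7' into '1,2,3,5,6,7' (illustrative only)."""
    numbers = []
    for part in number_range.split(","):
        part = part.strip()
        if "-" in part:
            # Assumption: 'a-b' denotes the inclusive range a..b.
            first, last = part.split("-")
            numbers.extend(range(int(first), int(last) + 1))
        else:
            numbers.append(int(part))
    return ",".join(str(number) for number in numbers)


print(expand_number_range_sketch("1-3, 5-7"))  # -> 1,2,3,5,6,7
```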

extractColumn(filePathName, colNum)


Gobble all entries in the given column of a space-separated text file, yielding them one at a time. Lines that begin with the hash mark are treated as comments and are ignored. Example:

>>> column = extractColumn("/disk47/sys/test/Utilities/test.cat", 6)
>>> list(column)[:2]
['14.5723', '15.1406']
Parameters:
  • filePathName (str) - Path to space separated text file to be read.
  • colNum (int) - Column number of text to extract.
Returns: generator(str)
A generator for all text in specified column of the file.

To Do: Replace with extractColumns()? This method could be left here for speed and simplicity, though normally we want more than one column anyway, so maybe extractColumns(file, [3])[0] isn't so bad.

extractColumns(filePathName, columnList=None, numRows=None, dataType=str)


Extracts from the given file the data in the given list of columns as a list of values for every column. Example:

>>> data = extractColumns("/disk47/sys/test/Utilities/test.cat",
...                       columnList=[6, 7], dataType=float)
>>> print(data[0][0], data[0][1], data[1][0], data[1][1])
14.5723 15.1406 0.0033 0.005
Parameters:
  • filePathName (str) - Full path to the file to read.
  • columnList (list(int)) - Optional list of indices for the columns to be read (with the first column at index 0), otherwise all columns are read.
  • numRows (int) - Optionally supply number of rows to read, otherwise all rows are read.
  • dataType (type) - Optionally convert entries from string to this Python type.
Returns: list(list(dataType))
A list of column data, with each column represented by a list of values of the given dataType (strings by default).

To Do: Possibly alter to make use of the CSV module's abilities to handle different dialects? Would simplify code a bit, and make more useful.
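
The column-wise layout of the result can be sketched as follows. The name extract_columns_sketch is hypothetical, and skipping blank lines and '#' comment lines is an assumption carried over from the extractColumn() description; the usage part writes its own small temporary file rather than relying on any real catalogue.

```python
import os
import tempfile


def extract_columns_sketch(file_path, column_list=None, num_rows=None,
                           data_type=str):
    """Read whitespace-separated columns from a text file (illustrative only)."""
    columns = []
    rows_read = 0
    with open(file_path) as lines:
        for line in lines:
            fields = line.split()
            # Assumption: blank lines and '#' comment lines are skipped.
            if not fields or fields[0].startswith("#"):
                continue
            if num_rows is not None and rows_read >= num_rows:
                break
            indices = (column_list if column_list is not None
                       else range(len(fields)))
            if not columns:
                columns = [[] for _ in indices]
            # Append this row's value to each requested column.
            for column, index in zip(columns, indices):
                column.append(data_type(fields[index]))
            rows_read += 1
    return columns


# Usage with a small throwaway catalogue file.
with tempfile.NamedTemporaryFile("w", suffix=".cat", delete=False) as handle:
    handle.write("# magnitude error\n14.5723 0.0033\n15.1406 0.005\n")
data = extract_columns_sketch(handle.name, column_list=[0, 1], data_type=float)
os.remove(handle.name)
print(data)  # -> [[14.5723, 15.1406], [0.0033, 0.005]]
```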

getDiskSpace(disks)


Gets the available disk space for supplied list of disks.

Parameters:
  • disks (sequence(str) or generator(str)) - Sequence of disk paths (e.g. SystemConstants.massStorageRaidFileSystem()).
Returns: dict(str:list(int, int))
A dictionary of sizes of the form (total, free) for each disk.
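
For illustration only, a dictionary of this shape could be built with the standard library as sketched below. The name get_disk_space_sketch is hypothetical, and reporting sizes in bytes is an assumption; the units used by the real function are not stated here.

```python
import shutil
import tempfile


def get_disk_space_sketch(disks):
    """Map each disk path to [total, free] space in bytes (illustrative only)."""
    space = {}
    for disk in disks:
        usage = shutil.disk_usage(disk)
        space[disk] = [usage.total, usage.free]
    return space


# Usage: query the file system that holds the temporary directory.
space_per_disk = get_disk_space_sketch([tempfile.gettempdir()])
```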

getDuplicates(anIterable)


Returns the set of items that occur more than once in the given iterable, i.e. just the values that are duplicated. The former groupBy key option has been removed, because it is better to pre-process the iterable once before passing it to this function, which then sorts it (this applies to this usage of groupby; in other usages a key is useful). Examples:

>>> getDuplicates([1, 2, 1, 0])
set([1])
>>> getDuplicates(x[0] for x in [(1,2), (3,4), (1,4)])
set([1])
Parameters:
  • anIterable (sequence or generator) - Any sequence or generator that can be iterated over.
Returns: set
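
A minimal sketch of the sort-then-group approach mentioned above, using itertools.groupby. The name get_duplicates_sketch is hypothetical and the code is not taken from the module; it assumes the items can be sorted.

```python
from itertools import groupby


def get_duplicates_sketch(an_iterable):
    """Return the set of items that occur more than once (illustrative only)."""
    duplicates = set()
    # groupby() only groups adjacent equal items, hence the sort.
    for item, group in groupby(sorted(an_iterable)):
        if sum(1 for _ in group) > 1:
            duplicates.add(item)
    return duplicates


print(get_duplicates_sketch([1, 2, 1, 0]))  # -> {1}
```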

getListIndices(aList, value)


Returns a list of all indices where a value occurs in the given list. Example:

>>> getListIndices(["bob", "steve", "bob"], "bob")
[0, 2]
Parameters:
  • aList (list) - List to search for occurrences of value.
  • value (object) - Value to find in list, that may occur multiple times.
Returns: list(int)

Note: Can probably avoid the need to use this function by employing a slightly different algorithm design, e.g. use a dictionary.
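
The sketch below shows one way to write the lookup, followed by the dictionary-based alternative that the note hints at. Both function names are hypothetical, and the dictionary variant assumes hashable list items.

```python
from collections import defaultdict


def get_list_indices_sketch(a_list, value):
    """Return every index at which value occurs in a_list."""
    return [index for index, item in enumerate(a_list) if item == value]


def index_by_value_sketch(a_list):
    """Build a value -> list-of-indices mapping in a single pass."""
    indices = defaultdict(list)
    for index, item in enumerate(a_list):
        indices[item].append(index)
    return indices


names = ["bob", "steve", "bob"]
print(get_list_indices_sketch(names, "bob"))   # -> [0, 2]
print(index_by_value_sketch(names)["steve"])   # -> [1]
```

The dictionary form pays for one pass up front, after which each lookup no longer needs to scan the list.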

getNextDisk(sysc, spacePerDisk=None, byPercentFree=False, preAllocMem=0)


Gets the next available disk which is less than 99% full.

Parameters:
  • sysc (SystemConstants) - An initialised SystemConstants object.
  • spacePerDisk (dict(str:list(int, int))) - A dictionary of available disks and their size, obtained automatically by getDiskSpace() if not supplied here.
  • byPercentFree (bool) - If True, choose next disk based on percentage free space available rather than absolute free space.
  • preAllocMem (int) - Pre-allocated memory in GB.
Returns: str
Path to the next free disk.

To Do: Instead of taking a SystemConstants object, why not make this a method of SystemConstants?

getNumberItems(numberRange, useTens=False)


Calculates the number of items expressed by a human-readable string of number ranges.

>>> getNumberItems(numberRange(range(10)))
10
Parameters:
  • numberRange (str) - List of numbers expressed as a human-readable string of number ranges, as returned by Utilities.numberRange().
Returns: int
Number of items expressed in the given list.

Note: This isn't a sensible way of doing things, as numberRange() is designed only for the purpose of printing human readable strings. It shouldn't be used as a data container for processing.

getSystemMemoryKb()

Returns: int
Amount of available memory in kilobytes.

groupByCounts(keyCounts, groupSize)


Taking an ordered list of counts of a particular key, e.g. the results of an SQL "SELECT key, count(*) ... GROUP BY key ORDER BY key", this function returns a list of key ranges that contain up to the specified group size of counts. Example:

>>> groupByCounts([(10001, 5), (10002, 3), (10003, 12), (10004, 6)],
...               groupSize=10)
[(10001, 10002), (10003, 10003), (10004, 10004)]
Parameters:
  • keyCounts (list(tuple(int, int))) - List of keys and their respective counts.
  • groupSize (int) - Maximum total number of counts in each group of keys.
Returns: list(tuple(int, int))
List of ranges in the form (min, max) of values of the keys, where between these values the total counts are less than or equal to the specified group size.
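
One way to picture the grouping is the greedy pass sketched below, which reproduces the example above. The name group_by_counts_sketch is hypothetical, and letting a single key whose count exceeds the group size form its own range is an assumption inferred from that example.

```python
def group_by_counts_sketch(key_counts, group_size):
    """Greedily merge consecutive keys into (min, max) ranges (illustrative)."""
    ranges = []
    start = end = None
    total = 0
    for key, count in key_counts:
        # Close the current range if adding this key would exceed the limit.
        if start is not None and total + count > group_size:
            ranges.append((start, end))
            start, total = None, 0
        if start is None:
            start = key
        end = key
        total += count
    if start is not None:
        ranges.append((start, end))
    return ranges


counts = [(10001, 5), (10002, 3), (10003, 12), (10004, 6)]
print(group_by_counts_sketch(counts, group_size=10))
# -> [(10001, 10002), (10003, 10003), (10004, 10004)]
```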

joinDict(aDict, joinStr=' = ', sepStr=', ')


Like string.join, but operates on the contents of a dictionary instead of a list. Joins dictionary keyword and value pairs into the string: str(keyword) + joinStr + str(value) + sepStr etc. Example:

>>> joinDict(dict(a=1, b=2))
'a = 1, b = 2'
Parameters:
  • aDict (dict) - Dictionary to process into a string.
  • joinStr (str) - String to insert between keywords and values.
  • sepStr (str) - String to insert between dictionary elements.
Returns: str
A string representing the contents of the dictionary.
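
The joining rule can be sketched in a single expression. The name join_dict_sketch is hypothetical, and the output order simply follows the dictionary's iteration order, which is an assumption about the real function.

```python
def join_dict_sketch(a_dict, join_str=" = ", sep_str=", "):
    """Join key/value pairs as 'key = value, key = value' (illustrative only)."""
    return sep_str.join("%s%s%s" % (keyword, join_str, value)
                        for keyword, value in a_dict.items())


print(join_dict_sketch(dict(a=1, b=2)))  # -> a = 1, b = 2
```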

joinNested(aSeq, joinStr=', ', subJoinStr=None, seqIndex=None)


Like string.join, but can handle nested (or un-nested) sequences of string-castable objects. Example:

>>> joinNested([['a', 0], ['b', 1]])
'a, 0, b, 1'
Parameters:
  • aSeq (list or tuple) - A (nested) sequence of string-castable objects.
  • joinStr (str) - String to insert between elements of the main sequence.
  • subJoinStr (str) - String to insert between elements of the nested sequence (defaults to the same as joinStr).
  • seqIndex (int) - If specified, only the elements at this index in the nested sequences are included.
Returns: str
A string containing all of the elements of the (nested) sequence.
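
How the parameters above interact can be sketched as follows. The name join_nested_sketch is hypothetical, and treating only lists and tuples as nested sequences is an assumption based on the documented type of aSeq.

```python
def join_nested_sketch(a_seq, join_str=", ", sub_join_str=None,
                       seq_index=None):
    """Join a possibly nested sequence into one string (illustrative only)."""
    if sub_join_str is None:
        sub_join_str = join_str
    parts = []
    for item in a_seq:
        if isinstance(item, (list, tuple)):
            if seq_index is not None:
                # Keep only the element at the requested index.
                parts.append(str(item[seq_index]))
            else:
                parts.append(sub_join_str.join(str(sub) for sub in item))
        else:
            parts.append(str(item))
    return join_str.join(parts)


print(join_nested_sketch([["a", 0], ["b", 1]]))               # -> a, 0, b, 1
print(join_nested_sketch([["a", 0], ["b", 1]], seq_index=0))  # -> a, b
```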

invertDict(aDict, forceReturnList=False)


Inverts the dictionary in such a way that if the input dictionary's values are lists, each item of this list will become a key with the input dictionary's key as value. If several of the input dictionary's keys map to the same value, the inverted dictionary's values will be lists. Example:

>>> invertDict(dict(males=["bob", "chris"], females=["jane", "chris"]))
{'chris': ['males', 'females'], 'jane': ['females'], 'bob': ['males']}
Parameters:
  • aDict (dict) - Dictionary to invert.
  • forceReturnList (bool) - If True, return dictionary where values are always lists.
Returns: dict
Inverted dictionary.
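
The inversion can be sketched as below. The name invert_dict_sketch is hypothetical, and the rule for when values are returned as lists (always if any inverted key has several values or forceReturnList is set, otherwise as scalars) is an interpretation of the description above, not confirmed behaviour.

```python
def invert_dict_sketch(a_dict, force_return_list=False):
    """Swap keys and values, flattening list values (illustrative only)."""
    inverted = {}
    for key, values in a_dict.items():
        if not isinstance(values, (list, tuple, set)):
            values = [values]
        for value in values:
            inverted.setdefault(value, []).append(key)
    # Assumption: scalars are returned only when every list has one entry.
    if not force_return_list and all(len(keys) == 1
                                     for keys in inverted.values()):
        return {value: keys[0] for value, keys in inverted.items()}
    return inverted


people = dict(males=["bob", "chris"], females=["jane", "chris"])
print(invert_dict_sketch(people))
```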

makeDateTime(time=None)


Returns an archive date/time data type, defaulting to the current time if no input argument is given. This defines the archive date/time data type, and is presently set to the mx.DateTime defined type. This function defines the time system for the archive (which is UTC).

Parameters:
  • time (str) - If given, specify a time in the format: "2005-01-29 23:59:59.99".
Returns: mx.DateTime
A date/time in the archive defined type.

makeMssqlTimeStamp()


Creates a timestamp using makeTimeStamp and formats appropriately for use in ingest strings for Microsoft SQL Server (and handles a bug in the datetime object creation).

Returns: str
A UTC date/time stamp string formatted for MS SQL Server.

makeTimeStamp()

Returns: str
An archive time stamp as a string, as opposed to the internal date time type.

moreThanOneIn(values)


Tests whether a generator yields more than one item. Equivalent to the memory-hungry len(list(values)) > 1, or to sum(1 for _ in values) > 1, but doesn't iterate through all items. In fact, memory is rarely an issue, as the iterable here won't be large compared to the full program usage, but even the first form is slow. Example:

>>> moreThanOneIn(x for x in [])
False
>>> moreThanOneIn(x for x in ['g'])
False
>>> moreThanOneIn(x for x in [0.1, 0.2])
True
>>> moreThanOneIn(x for x in [1, 2, 3])
True
Parameters:
  • values (generator) - A generator to be evaluated.
Returns: bool
True if values contains more than one item.
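
The early-exit idea can be sketched with itertools.islice, which pulls at most two items from the iterable. The name more_than_one_in_sketch is hypothetical and the module's own implementation may differ.

```python
from itertools import islice


def more_than_one_in_sketch(values):
    """Return True if values yields at least two items (illustrative only)."""
    # islice stops after two items, so long iterables are never exhausted.
    return len(list(islice(values, 2))) > 1


print(more_than_one_in_sketch(x for x in [1, 2, 3]))  # -> True
```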

multiSub(text, subs)


Performs multiple string substitutions on the given string. Example:

>>> multiSub("UKIRT and the WSA", [("WSA", "VSA"), ("UKIRT", "VISTA")])
'VISTA and the VSA'
Parameters:
  • text (str) - String containing text to be substituted.
  • subs (list(tuple(str, str))) - List of text marker and substitution value pairs.
Returns: str
The original text string, with all marker values replaced by given substitution values.
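
A minimal sketch of the substitution loop. The name multi_sub_sketch is hypothetical, and applying the pairs one after another in list order is an assumption; with that approach, text produced by an earlier substitution can be matched by a later marker.

```python
def multi_sub_sketch(text, subs):
    """Apply each (marker, value) substitution in turn (illustrative only)."""
    for marker, value in subs:
        text = text.replace(marker, value)
    return text


print(multi_sub_sketch("UKIRT and the WSA",
                       [("WSA", "VSA"), ("UKIRT", "VISTA")]))
# -> VISTA and the VSA
```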

noInterrupt(*args, **kwds)


Disables keyboard interrupts whilst in this context.

Decorators:
  • @contextmanager
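
A context manager of this kind might look like the sketch below. The name no_interrupt_sketch is hypothetical; ignoring SIGINT outright (rather than deferring it until the block exits) is an assumption, and signal handlers can only be changed from the main thread.

```python
import signal
from contextlib import contextmanager


@contextmanager
def no_interrupt_sketch():
    """Ignore Ctrl-C inside the with-block, then restore the old handler."""
    previous = signal.signal(signal.SIGINT, signal.SIG_IGN)
    try:
        yield
    finally:
        signal.signal(signal.SIGINT, previous)


# Usage: protect a short critical section from keyboard interrupts.
with no_interrupt_sketch():
    critical_result = sum(range(10))
```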

npop(aList, nMax=2, mode='topbot')


Divides aList into nMax sublists by populating them with items taken alternately from the top and the bottom of the list. Example:

>>> npop([1, 2, 3, 4, 5])
[[1, 5], [2, 4], [3]]
>>> npop([1, 2, 3, 4, 5, 6], nMax=3)
[[1, 6, 2, 5], [3, 4]]
Parameters:
  • aList (list) - The list (of file names).
  • nMax (int) - Maximal number of sublists. Rounded up to next even number.
  • mode (str) - Mode of picking items from the list: 'asc', 'desc', or 'topbot' (alternately from the top and bottom).
Returns: list(list)
List of sublists.

numberRange(numbers, sep=', ', useTens=False)


Given a sequence of integers it returns that sequence as a string representation of an ordered range of unique numbers. Example:

>>> numberRange([1, 2, 3, 5, 6, 7])
'1-3, 5-7'
>>> numberRange([1, 3, 5, 7])
'1, 3, 5, 7'
>>> numberRange([])
'None'
Parameters:
  • numbers (sequence(int)) - Any sequence of integers.
  • sep (str) - Marker string to separate individual numbers.
Returns: str
A string representation of the range of the unique set of ordered integers.
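
The collapsing of consecutive runs can be sketched as follows. The name number_range_sketch is hypothetical; the sketch ignores the useTens option, and writing a run of exactly two numbers as 'a-b' is an assumption not covered by the examples above.

```python
def number_range_sketch(numbers, sep=", "):
    """Collapse integers into a string such as '1-3, 5-7' (illustrative only)."""
    unique = sorted(set(numbers))
    if not unique:
        return "None"

    def format_run(first, last):
        return str(first) if first == last else "%d-%d" % (first, last)

    parts = []
    start = prev = unique[0]
    for number in unique[1:]:
        if number == prev + 1:
            # Still inside a consecutive run.
            prev = number
            continue
        parts.append(format_run(start, prev))
        start = prev = number
    parts.append(format_run(start, prev))
    return sep.join(parts)


print(number_range_sketch([1, 2, 3, 5, 6, 7]))  # -> 1-3, 5-7
```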

orderedSet(seq, excludeList=None)


Returns a list of the given sequence in the original order, but with duplicates removed. Example:

>>> orderedSet([6, 4, 7, 4, 9, 1, 7])
[6, 4, 7, 9, 1]
Parameters:
  • seq (sequence(X)) - Any ordered sequence containing duplicate values.
  • excludeList (list(X)) - A list of items to exclude from the returned list.
Returns: list(X)
Ordered list without duplicates.
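
An order-preserving de-duplication can be sketched with a set of already-seen items. The name ordered_set_sketch is hypothetical, and the sketch assumes the items are hashable.

```python
def ordered_set_sketch(seq, exclude_list=None):
    """Drop duplicates and excluded items, keeping first-seen order."""
    # Seeding the seen-set with the exclusions filters them out as well.
    seen = set(exclude_list or [])
    result = []
    for item in seq:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


print(ordered_set_sketch([6, 4, 7, 4, 9, 1, 7]))  # -> [6, 4, 7, 9, 1]
```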

parseFloat(value)


Parses a string value and converts to float. NaNs and failures all return None. Example:

>>> parseFloat('3.1')
3.1
Parameters:
  • value (str) - String containing just a floating-point value to parse.
Returns: float
Floating-point value of parsed string or None if fails.
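
The "None on NaN or failure" rule can be sketched as below. The name parse_float_sketch is hypothetical; which exceptions the real function catches is an assumption.

```python
import math


def parse_float_sketch(value):
    """Convert value to float, returning None for NaN or unparsable input."""
    try:
        result = float(value)
    except (TypeError, ValueError):
        return None
    return None if math.isnan(result) else result


print(parse_float_sketch("3.1"))  # -> 3.1
print(parse_float_sketch("nan"))  # -> None
```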

splitList(longList, chunkSize=2, noSingles=False)


Splits a list into equal-sized chunks. If the list is not equally divisible then the last chunk just contains the remaining number of elements. Example:

>>> list(splitList([1, 2, 3, 4]))
[[1, 2], [3, 4]]
>>> list(splitList([1, 2, 3, 4, 5, 6, 7], 3))
[[1, 2, 3], [4, 5, 6], [7]]
>>> list(splitList([1, 2, 3, 4, 5, 6, 7], 3, noSingles=True))
[[1, 2, 3], [4, 5, 6, 7]]
>>> list(splitList([1]))
[[1]]
>>> list(splitList([1], noSingles=True))
[]
>>> list(splitList([]))
[]
Parameters:
  • longList (list(X)) - The long list that needs to be split into a list of smaller sized chunks.
  • chunkSize (int) - Number of elements per sub-list to divide the long list.
  • noSingles (bool) - If True, ensure no single item lists are created.
Returns: generator(list(X))
A generator for sub-lists containing the original list sub-divided into chunks.
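
The chunking behaviour shown in the examples, including the noSingles cases, can be reproduced by the sketch below. The name split_list_sketch is hypothetical, and the way a trailing single item is merged into the previous chunk (or dropped when there is none) is inferred from those examples rather than from the source.

```python
def split_list_sketch(long_list, chunk_size=2, no_singles=False):
    """Yield consecutive chunks of long_list (illustrative only)."""
    chunks = [long_list[start:start + chunk_size]
              for start in range(0, len(long_list), chunk_size)]
    if no_singles and chunks and len(chunks[-1]) == 1:
        single = chunks.pop()
        if chunks:
            # Fold the leftover item into the preceding chunk.
            chunks[-1] = chunks[-1] + single
    for chunk in chunks:
        yield chunk


print(list(split_list_sketch([1, 2, 3, 4, 5, 6, 7], 3, no_singles=True)))
# -> [[1, 2, 3], [4, 5, 6, 7]]
```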

unpackList(combinedList)


Given a list of lists, return a single sequence containing all of the elements of the combined list, as a generator. It also handles the more general case of a sequence of sequences, unlike sum(combinedList, []), which is equivalent only in the standard case of a list of lists. Example:

>>> ', '.join(unpackList([['a', 'b'], ['c', 'd']]))
'a, b, c, d'
>>> list(unpackList(splitList([1, 2, 3, 4])))
[1, 2, 3, 4]
Parameters:
  • combinedList (list(list(X)) or generator(list(X))) - A list of lists or a generator for lists.
Returns: generator(X)
A generator for a sequence of just the elements of the original nested list.
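
The flattening can be sketched as a two-level generator; itertools.chain.from_iterable from the standard library behaves similarly. The name unpack_list_sketch is hypothetical and is not the module's code.

```python
def unpack_list_sketch(combined_list):
    """Yield every element of every sub-sequence, one level deep."""
    for sub_sequence in combined_list:
        for element in sub_sequence:
            yield element


print(", ".join(unpack_list_sketch([["a", "b"], ["c", "d"]])))
# -> a, b, c, d
# Mixed sequence types work too, unlike sum(combinedList, []).
print(list(unpack_list_sketch([(1, 2), [3]])))  # -> [1, 2, 3]
```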