esm_archiving package¶
Top-level package for ESM Archiving.
- esm_archiving.archive_mistral(tfile, rtfile=None)[source]¶
Puts the
tfile
to the tape archive usingtape_command
- Parameters
tfile (str) – The full path of the file to put to tape
rtfile (str) – The filename on the remote tape server. Defaults to None, in which case a replacement is performed to keep as much of the filename the same as possible. Example: /work/ab0246/a270077/experiment.tgz –> /hpss/arch/ab0246/a270077/experiment.tgz
- Returns
- Return type
None
- esm_archiving.delete_original_data(tfile, force=False)[source]¶
Erases data which is found in the tar file.
- Parameters
tfile (str) – Path to the tarfille whose data should be erased.
force (bool) – If False, asks the user if they really want to delete their files. Otherwise just does this silently. Default is
False
- Returns
- Return type
None
- esm_archiving.determine_datestamp_location(files)[source]¶
Given a list of files; figures where the datestamp is by checking if it varies.
- Parameters
files (list) – A list (longer than 1!) of files to check
- Returns
A slice object giving the location of the datestamp
- Return type
slice
- Raises
DatestampLocationError : – Raised if there is more than one slice found where the numbers vary over different files -or- if the length of the file list is not longer than 1.
- esm_archiving.determine_potential_datestamp_locations(filepattern)[source]¶
For a filepattern, gives back index of potential date locations
- Parameters
filepattern (str) – The filepattern to check.
- Returns
A list of slice object which you can use to cut out dates from the filepattern
- Return type
list
- esm_archiving.find_indices_of(char, in_string)[source]¶
Finds indicies of a specific character in a string
- Parameters
char (str) – The character to look for
in_string (str) – The string to look in
- Yields
int – Each round of the generator gives you the next index for the desired character.
- esm_archiving.get_files_for_date_range(filepattern, start_date, stop_date, frequency, date_format='%Y%m%d')[source]¶
Creates a list of files for specified start/stop dates
- Parameters
filepattern (str) – A filepattern to replace dates in
start_date (str) – The starting date, in a pandas-friendly date format
stop_date (str) – Ending date, pandas friendly. Note that for end dates, you need to add one month to assure that you get the last step in your list!
frequency (str) – Frequency of dates, pandas friendly
date_format (str) – How dates should be formatted, defaults to %Y%m%d
- Returns
A list of strings for the filepattern with correct date stamps.
- Return type
list
Example
>>> filepattern = "LGM_24hourly_PMIP4_echam6_BOT_mm_>>>DATE<<<.nc" >>> get_files_for_date_range(filepattern, "1890-07", "1891-11", "1M", date_format="%Y%m") [ "LGM_24hourly_PMIP4_echam6_BOT_mm_189007.nc", "LGM_24hourly_PMIP4_echam6_BOT_mm_189008.nc", "LGM_24hourly_PMIP4_echam6_BOT_mm_189009.nc", "LGM_24hourly_PMIP4_echam6_BOT_mm_189010.nc", "LGM_24hourly_PMIP4_echam6_BOT_mm_189011.nc", "LGM_24hourly_PMIP4_echam6_BOT_mm_189012.nc", "LGM_24hourly_PMIP4_echam6_BOT_mm_189101.nc", "LGM_24hourly_PMIP4_echam6_BOT_mm_189102.nc", "LGM_24hourly_PMIP4_echam6_BOT_mm_189103.nc", "LGM_24hourly_PMIP4_echam6_BOT_mm_189104.nc", "LGM_24hourly_PMIP4_echam6_BOT_mm_189105.nc", "LGM_24hourly_PMIP4_echam6_BOT_mm_189106.nc", "LGM_24hourly_PMIP4_echam6_BOT_mm_189107.nc", "LGM_24hourly_PMIP4_echam6_BOT_mm_189108.nc", "LGM_24hourly_PMIP4_echam6_BOT_mm_189109.nc", "LGM_24hourly_PMIP4_echam6_BOT_mm_189110.nc", ]
- esm_archiving.group_files(top, filetype)[source]¶
Generates quasi-regexes for a specific filetype, replacing all numbers with #.
- Parameters
top (str) – Where to start looking (this should normally be top of the experiment)
filetype (str) – Which files to go through (e.g. outdata, restart, etc…)
- Returns
A dictonary containing keys for each folder found in
filetype
, and values as lists of files with strings where numbers are replaced by #.- Return type
dict
- esm_archiving.group_indexes(index_list)[source]¶
Splits indexes into tuples of monotonically ascending values.
- Parameters
list – The list to split up
- Returns
A list of tuples, so that you can get only one group of ascending tuples.
- Return type
list
Example
>>> indexes = [0, 1, 2, 3, 12, 13, 15, 16] >>> group_indexes(indexes) [(0, 1, 2, 3), (12, 13), (15, 16)]
- esm_archiving.log_tarfile_contents(tfile)[source]¶
Generates a log of the tarball contents
- Parameters
tfile (str) – The path for the tar file to generate a log for
- Returns
- Return type
None
Warning
Note that for this function to work, you need to have write permission in the directory where the tarball is located. If not, this will probably raise an OSError. I can imagine giving the location of the log path as an argument; but would like to see if that is actually needed before implementing it…
- esm_archiving.pack_tarfile(flist, wdir, outname)[source]¶
Creates a compressed tarball (
outname
) with all files found inflist
.- Parameters
flist (list) – A list of files to include in this tarball
wdir (str) – The directory to “change” to when packing up the tar file. This will (essentially) be used in the tar command as the -C option by stripping off the beginning of the flist
outname (str) – The output file name
- Returns
The output file name
- Return type
str
- esm_archiving.purify_expid_in(model_files, expid, restore=False)[source]¶
Puts or restores >>>EXPID<<< marker in filepatterns
- Parameters
model_files (dict) – The model files for archiving
expid (str) – The experiment ID to purify or restore
restore (bool) – Set experiment ID back from the temporary marker
- Returns
Dictionary containing keys for each model, values for file patterns
- Return type
dict
- esm_archiving.stamp_filepattern(filepattern, force_return=False)[source]¶
Transforms # in filepatterns to >>>DATE<<< and replaces other numbers back to original
- Parameters
filepattern (str) – Filepattern to get date stamps for
force_return (bool) – Returns the list of filepatterns even if it is longer than 1.
- Returns
New filepattern, with >>>DATE<<<
- Return type
str
- esm_archiving.stamp_files(model_files)[source]¶
Given a sttandard file dictioanry (keys: model names, values: filepattern); figures out where the date probably is, and replaces the
#
sequence with a >>>DATE<<< stamp.- Parameters
model_files (dict) – Dictionary of keys (model names) where values are lists of files for each model.
- Returns
As the input, but replaces the filepatterns with the >>>DATE<<< stamp.
- Return type
dict
- esm_archiving.sum_tar_lists(tar_lists)[source]¶
Sums up the amount of space in the tar lists dictionary
Given
tar_lists
, which is generally a dicitonary consisting of keys (model names) and values (files to be tarred), figures out how much space the raw, uncompressed files would use. Generally the compressed tarball will take up less space.- Parameters
tar_lists (dict) – Dictionary of file lists to be summed up. Reports every sum as a value for the key of that particular list.
- Returns
Keys are the same as in the input, values are the sums (in bytes) of all files present within the list.
- Return type
dict
- esm_archiving.sum_tar_lists_human_readable(tar_lists)[source]¶
As
sum_tar_lists
but gives back strings with human-readable sizes.
Subpackages¶
Submodules¶
esm_archiving.cli module¶
After installation, you have a new command in your path:
esm_archive
Passing in the argument --help
will show available subcommands:
Usage: esm_archive [OPTIONS] COMMAND [ARGS]...
Console script for esm_archiving.
Options:
--version Show the version and exit.
--write_local_config Write a local configuration YAML file in the current
working directory
--write_config Write a global configuration YAML file in
~/.config/esm_archiving/
--help Show this message and exit.
Commands:
create
upload
To use the tool, you can first create
a tar archive and then use upload
to put it onto the tape server.
Creating tarballs¶
Use esm_archive create
to generate tar files from an experiment:
esm_archive create /path/to/top/of/experiment start_date end_date
The arguments start_date
and end_date
should take the form
YYYY-MM-DD
. A complete example would be:
esm_archive create /work/ab0246/a270077/from_ba0989/AWICM/LGM_6hours 1850-01-01 1851-01-01
The archiving tool will automatically pack up all files it finds matching these
dates in the outdata
and restart
directories and generate logs in the
top of the experiment folder. Note that the final date (1851-01-1 in this
example) is not included. During packing, you get a progress bar indicating
when the tarball is finished.
Please be aware that are size limits in place on DKRZ’s tape server. Any tar files larger than 500 Gb will be trucated. For more information, see: https://www.dkrz.de/up/systems/hpss/hpss
Uploading tarballs¶
A second command esm_archive upload
allows you to put tarballs onto to tape server at DKRZ:
esm_archive upload /path/to/top/of/experiment start_date end_date
The signature is the same as for the create
subcommand. Note that for this
to work; you need to have a properly configured .netrc
file in your home
directory:
$ cat ~/.netrc
machine tape.dkrz.de login a270077 password OMITTED
This file needs to be readable/writable only for you, e.g. chmod 600
.
The archiving program will then be able to automatically log into the tape
server and upload the tarballs. Again, more information about logging onto the
tape server without password authentication can be found here:
https://www.dkrz.de/up/help/faq/hpss/how-can-i-use-the-hpss-tape-archive-without-typing-my-password-every-time-e-g-in-scripts-or-jobs
esm_archiving.config module¶
When run from either the command line or in library mode (note not as an
ESM Plugin), esm_archiving
can be configured to how it looks for specific
files. The configuration file is called esm_archiving_config
, should be
written in YAML, and have the following format:
echam: # The model name
archive: # archive seperator **required**
# Frequency specification (how often
# a datestamp is generated to look for)
frequency: "1M"
# Date format specification
date_format: "%Y%m"
By default, esm_archive
looks in the following locations:
Current working directory
- Any files in the XDG Standard:
https://specifications.freedesktop.org/basedir-spec/basedir-spec-latest.html
If nothing is found, the program reverts to the hard-coded defaults, found in
esm_archiving/esm_archiving/config.py
Note
In future, it might be changed that the program will look for an experiment
specific configuration based upon the path it is given during the
create
or upload
step.
Generating a configuration¶
You can use the command line switches --write_local_config
and
--write_config
to generate configuration files either in the current
working directory, or in the global directory for your user account defined by
the XDG standard (typically ~/.config/esm_archiving):
$ esm_archive --write_local_config
Writing local (experiment) configuration...
$ esm_archive --write_config
Writing global (user) configuration...
esm_archiving.esm_archiving module¶
This is the esm_archiving
module.
- esm_archiving.esm_archiving.archive_mistral(tfile, rtfile=None)[source]¶
Puts the
tfile
to the tape archive usingtape_command
- Parameters
tfile (str) – The full path of the file to put to tape
rtfile (str) – The filename on the remote tape server. Defaults to None, in which case a replacement is performed to keep as much of the filename the same as possible. Example: /work/ab0246/a270077/experiment.tgz –> /hpss/arch/ab0246/a270077/experiment.tgz
- Returns
- Return type
None
- esm_archiving.esm_archiving.delete_original_data(tfile, force=False)[source]¶
Erases data which is found in the tar file.
- Parameters
tfile (str) – Path to the tarfille whose data should be erased.
force (bool) – If False, asks the user if they really want to delete their files. Otherwise just does this silently. Default is
False
- Returns
- Return type
None
- esm_archiving.esm_archiving.determine_datestamp_location(files)[source]¶
Given a list of files; figures where the datestamp is by checking if it varies.
- Parameters
files (list) – A list (longer than 1!) of files to check
- Returns
A slice object giving the location of the datestamp
- Return type
slice
- Raises
DatestampLocationError : – Raised if there is more than one slice found where the numbers vary over different files -or- if the length of the file list is not longer than 1.
- esm_archiving.esm_archiving.determine_potential_datestamp_locations(filepattern)[source]¶
For a filepattern, gives back index of potential date locations
- Parameters
filepattern (str) – The filepattern to check.
- Returns
A list of slice object which you can use to cut out dates from the filepattern
- Return type
list
- esm_archiving.esm_archiving.find_indices_of(char, in_string)[source]¶
Finds indicies of a specific character in a string
- Parameters
char (str) – The character to look for
in_string (str) – The string to look in
- Yields
int – Each round of the generator gives you the next index for the desired character.
- esm_archiving.esm_archiving.get_files_for_date_range(filepattern, start_date, stop_date, frequency, date_format='%Y%m%d')[source]¶
Creates a list of files for specified start/stop dates
- Parameters
filepattern (str) – A filepattern to replace dates in
start_date (str) – The starting date, in a pandas-friendly date format
stop_date (str) – Ending date, pandas friendly. Note that for end dates, you need to add one month to assure that you get the last step in your list!
frequency (str) – Frequency of dates, pandas friendly
date_format (str) – How dates should be formatted, defaults to %Y%m%d
- Returns
A list of strings for the filepattern with correct date stamps.
- Return type
list
Example
>>> filepattern = "LGM_24hourly_PMIP4_echam6_BOT_mm_>>>DATE<<<.nc" >>> get_files_for_date_range(filepattern, "1890-07", "1891-11", "1M", date_format="%Y%m") [ "LGM_24hourly_PMIP4_echam6_BOT_mm_189007.nc", "LGM_24hourly_PMIP4_echam6_BOT_mm_189008.nc", "LGM_24hourly_PMIP4_echam6_BOT_mm_189009.nc", "LGM_24hourly_PMIP4_echam6_BOT_mm_189010.nc", "LGM_24hourly_PMIP4_echam6_BOT_mm_189011.nc", "LGM_24hourly_PMIP4_echam6_BOT_mm_189012.nc", "LGM_24hourly_PMIP4_echam6_BOT_mm_189101.nc", "LGM_24hourly_PMIP4_echam6_BOT_mm_189102.nc", "LGM_24hourly_PMIP4_echam6_BOT_mm_189103.nc", "LGM_24hourly_PMIP4_echam6_BOT_mm_189104.nc", "LGM_24hourly_PMIP4_echam6_BOT_mm_189105.nc", "LGM_24hourly_PMIP4_echam6_BOT_mm_189106.nc", "LGM_24hourly_PMIP4_echam6_BOT_mm_189107.nc", "LGM_24hourly_PMIP4_echam6_BOT_mm_189108.nc", "LGM_24hourly_PMIP4_echam6_BOT_mm_189109.nc", "LGM_24hourly_PMIP4_echam6_BOT_mm_189110.nc", ]
- esm_archiving.esm_archiving.group_files(top, filetype)[source]¶
Generates quasi-regexes for a specific filetype, replacing all numbers with #.
- Parameters
top (str) – Where to start looking (this should normally be top of the experiment)
filetype (str) – Which files to go through (e.g. outdata, restart, etc…)
- Returns
A dictonary containing keys for each folder found in
filetype
, and values as lists of files with strings where numbers are replaced by #.- Return type
dict
- esm_archiving.esm_archiving.group_indexes(index_list)[source]¶
Splits indexes into tuples of monotonically ascending values.
- Parameters
list – The list to split up
- Returns
A list of tuples, so that you can get only one group of ascending tuples.
- Return type
list
Example
>>> indexes = [0, 1, 2, 3, 12, 13, 15, 16] >>> group_indexes(indexes) [(0, 1, 2, 3), (12, 13), (15, 16)]
- esm_archiving.esm_archiving.log_tarfile_contents(tfile)[source]¶
Generates a log of the tarball contents
- Parameters
tfile (str) – The path for the tar file to generate a log for
- Returns
- Return type
None
Warning
Note that for this function to work, you need to have write permission in the directory where the tarball is located. If not, this will probably raise an OSError. I can imagine giving the location of the log path as an argument; but would like to see if that is actually needed before implementing it…
- esm_archiving.esm_archiving.pack_tarfile(flist, wdir, outname)[source]¶
Creates a compressed tarball (
outname
) with all files found inflist
.- Parameters
flist (list) – A list of files to include in this tarball
wdir (str) – The directory to “change” to when packing up the tar file. This will (essentially) be used in the tar command as the -C option by stripping off the beginning of the flist
outname (str) – The output file name
- Returns
The output file name
- Return type
str
- esm_archiving.esm_archiving.purify_expid_in(model_files, expid, restore=False)[source]¶
Puts or restores >>>EXPID<<< marker in filepatterns
- Parameters
model_files (dict) – The model files for archiving
expid (str) – The experiment ID to purify or restore
restore (bool) – Set experiment ID back from the temporary marker
- Returns
Dictionary containing keys for each model, values for file patterns
- Return type
dict
- esm_archiving.esm_archiving.query_yes_no(question, default='yes')[source]¶
Ask a yes/no question via
input()
and return their answer.“question” is a string that is presented to the user. “default” is the presumed answer if the user just hits <Enter>.
It must be “yes” (the default), “no” or None (meaning an answer is required of the user).
The “answer” return value is True for “yes” or False for “no”.
Note: Shamelessly stolen from StackOverflow It’s not hard to implement, but Paul is lazy…
- Parameters
question (str) – The question you’d like to ask the user
default (str) – The presumed answer for
question
. Defaults to “yes”.
- Returns
True if the user said yes, False if the use said no.
- Return type
bool
- esm_archiving.esm_archiving.run_command(command)[source]¶
Runs
command
and directly prints output to screen.- Parameters
command (str) – The command to run, with pipes, redirects, whatever
- Returns
rc – The return code of the subprocess.
- Return type
int
- esm_archiving.esm_archiving.sort_files_to_tarlists(model_files, start_date, end_date, config)[source]¶
- esm_archiving.esm_archiving.stamp_filepattern(filepattern, force_return=False)[source]¶
Transforms # in filepatterns to >>>DATE<<< and replaces other numbers back to original
- Parameters
filepattern (str) – Filepattern to get date stamps for
force_return (bool) – Returns the list of filepatterns even if it is longer than 1.
- Returns
New filepattern, with >>>DATE<<<
- Return type
str
- esm_archiving.esm_archiving.stamp_files(model_files)[source]¶
Given a sttandard file dictioanry (keys: model names, values: filepattern); figures out where the date probably is, and replaces the
#
sequence with a >>>DATE<<< stamp.- Parameters
model_files (dict) – Dictionary of keys (model names) where values are lists of files for each model.
- Returns
As the input, but replaces the filepatterns with the >>>DATE<<< stamp.
- Return type
dict
- esm_archiving.esm_archiving.sum_tar_lists(tar_lists)[source]¶
Sums up the amount of space in the tar lists dictionary
Given
tar_lists
, which is generally a dicitonary consisting of keys (model names) and values (files to be tarred), figures out how much space the raw, uncompressed files would use. Generally the compressed tarball will take up less space.- Parameters
tar_lists (dict) – Dictionary of file lists to be summed up. Reports every sum as a value for the key of that particular list.
- Returns
Keys are the same as in the input, values are the sums (in bytes) of all files present within the list.
- Return type
dict