Data Management
From BG Users Forum
This page covers some basic elements of Grid file management.
Glossary
UI - User Interface
SE - Storage Element
LFN - Logical File Name
LHC - Large Hadron Collider
LCG - LHC Computing Grid
LFC - LCG File Catalog
SURL - Storage URL
GUID - Globally Unique Identifier
Note that the following examples assume that you have already created a proxy, that you are working on a User Interface (UI), and that you are using a Bourne-type shell. The same commands also work inside a running job.
First some terminology. In the Grid, files are stored in Storage Elements (SEs). One logical file may have several identical replicas in different SEs. Files are identified by a Logical File Name (LFN), and a file catalogue stores the mapping between the LFN and pointers to any replicas; these pointers are known as Storage URLs (SURLs). The SURL may be partly specified by the user; otherwise it is generated automatically, and in simple cases you do not need to worry about it. Files are also identified by a Globally Unique Identifier (GUID), a fixed-format string generated by the middleware and guaranteed to be unique. However, GUIDs are not very human-friendly, and for most purposes you can ignore them and just use the LFN.
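The commands below tell these identifier types apart by a scheme prefix: lfn: for LFNs, guid: for GUIDs, srm:// or sfn:// for SURLs, and file: for local files. As a small illustration, the hypothetical shell function id_kind (not part of the LCG tools) classifies a string by that prefix alone; it does not check that the name actually exists in the catalogue.

```shell
#!/bin/sh
# Hypothetical helper: say which kind of identifier a string is,
# judging only by its scheme prefix.
id_kind() {
    case "$1" in
        lfn:*)        echo "LFN" ;;
        guid:*)       echo "GUID" ;;
        srm:*|sfn:*)  echo "SURL" ;;
        file:*)       echo "local file" ;;
        *)            echo "unknown" ;;
    esac
}
```

For example, id_kind guid:c618f4aa-f011-484b-8cdb-7b838a79791a prints GUID.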
For some purposes you need to know the name of the file catalogue for the BalticGrid VO. It can be obtained with the lcg-infosites command:
lcg-infosites --vo balticgrid lfc
(The default LCG File Catalog (LFC) for the balticgrid VO is lfc.balticgrid.org.)
If it is not set by default, store it in the environment variable LFC_HOST, e.g.:
export LFC_HOST=lfc.balticgrid.org
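In a login script it can be convenient to set LFC_HOST only when the site has not already defined it. A minimal sketch using standard POSIX parameter expansion, assuming lfc.balticgrid.org is the catalogue you want as the fallback:

```shell
#!/bin/sh
# Assign LFC_HOST only if it is unset or empty, so a value already
# provided by the site environment is kept.
: "${LFC_HOST:=lfc.balticgrid.org}"
export LFC_HOST
echo "Using LFC: $LFC_HOST"
```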
List files
LFNs follow a Unix-style naming system. You can explore the namespace with the lfc-ls command, which works in a similar way to the standard ls, although you should remember that the underlying structure is quite different from a Unix file system.
The top of the LFN namespace is normally /grid/balticgrid. The organisation of the namespace is defined by each VO, so you may need to consult VO-specific documentation to see if users are expected to create files in a particular area.
$ lfc-ls -l /grid/balticgrid
drwxrwxr-x   6 115 101      0 Aug 28 17:01 andriusj
drwxrwxr-x   7 123 101      0 Jun 26 12:26 arnka
drwxrwxr-x   9 108 101      0 Jun 05 14:10 bartek
drwxrwxr-x   4 104 101      0 Aug 09  2007 biit
-rw-rw-r--   1 116 101 209798 Nov 23  2007 con100_4.tar.gz
drwxrwxr-x  13 116 101      0 May 24 01:41 danila
drwxrwxr-x   1 116 101      0 Aug 19  2007 dapi2826
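Since lfc-ls -l prints an ls-like listing, standard text tools can post-process it. The sketch below (the function name lfc_dirs is hypothetical) keeps only the directory entries, assuming the layout shown above with the mode string in the first column and the name in the last; names containing spaces would be truncated.

```shell
#!/bin/sh
# Read `lfc-ls -l` output on stdin and print the names of directory
# entries (mode string starting with "d"); the name is the last field.
lfc_dirs() {
    awk '$1 ~ /^d/ { print $NF }'
}
```

On a UI it would be used as lfc-ls -l /grid/balticgrid | lfc_dirs.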
Replication operations
The examples below assume that the LFC_HOME variable points to a suitable directory in which to create test files. Directories are not created automatically, so create one first if necessary, e.g.:
$ lfc-mkdir -p /grid/balticgrid/`id -nu`/test
$ export LFC_HOME=/grid/balticgrid/`id -nu`/test
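LFNs that do not start with / are interpreted relative to LFC_HOME; this is why lfn:hw in the examples below is reported by lcg-rep as lfn:/grid/balticgrid/user/bg00785/test/hw. The hypothetical function resolve_lfn mimics that expansion so you can check which full path a command will use. It is an illustration only, not the code the LCG tools run.

```shell
#!/bin/sh
# Hypothetical helper: expand a relative LFN against $LFC_HOME and
# print it with an lfn: prefix; absolute LFNs are returned unchanged.
resolve_lfn() {
    name=${1#lfn:}                 # accept both "hw" and "lfn:hw"
    case "$name" in
        /*) echo "lfn:$name" ;;
        *)
            if [ -z "$LFC_HOME" ]; then
                echo "LFC_HOME is not set" >&2
                return 1
            fi
            # ${LFC_HOME%/} drops a trailing slash to avoid "//"
            echo "lfn:${LFC_HOME%/}/$name" ;;
    esac
}
```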
The following examples illustrate simple cases of storing, replicating, retrieving and deleting Grid files. A -v option can be given to the lcg-* commands to get a more verbose description of what the command is doing. All these commands need to know the name of the VO (balticgrid), which can be given with the --vo option; in these examples it is instead set as a default through the LCG_GFAL_VO variable:
$ export LCG_GFAL_VO=balticgrid
Also make sure that the variable VO_BALTICGRID_DEFAULT_SE names a suitable SE. The available SEs can be listed with the lcg-infosites command:
$ lcg-infosites --vo balticgrid se
Avail Space(Kb)  Used Space(Kb)  Type  SEs
----------------------------------------------------------
7320000000       13153           n.a   se.basnet.by
5630000000       82950000000     n.a   dpm.cyf-kr.edu.pl
365213640        1685427132      n.a   kriit.eenet.ee
86390452         10487652        n.a   birzs.latnet.lv
999800000        271870          n.a   se.puduris.grid.lumii.lv
54266088         7918632         n.a   atomas.itpa.lt
248650000        62              n.a   fobas.itpa.lt
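If you simply want the SE with the most free space, the listing can be filtered with awk. This is only a sketch, assuming the output keeps the column order shown above (available space first, SE name last); the function name largest_se is hypothetical, and free space alone says nothing about whether an SE is currently working.

```shell
#!/bin/sh
# Read `lcg-infosites --vo balticgrid se` output on stdin and print the
# SE with the largest "Avail Space(Kb)" value. Header and separator
# lines are skipped because their first field is not a number.
largest_se() {
    awk 'BEGIN { max = -1 }
         $1 ~ /^[0-9]+$/ && $1 + 0 > max { max = $1 + 0; se = $NF }
         END { if (se != "") print se }'
}
```

For example: export VO_BALTICGRID_DEFAULT_SE=`lcg-infosites --vo balticgrid se | largest_se`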
For example, to use se.puduris.grid.lumii.lv:
$ export VO_BALTICGRID_DEFAULT_SE=se.puduris.grid.lumii.lv
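Before running the commands below it can help to confirm that all the variables introduced so far are set. The hypothetical function check_grid_env reports each missing one on standard error and returns a non-zero status, so it can guard a script.

```shell
#!/bin/sh
# Check the variables used in this walkthrough; print each one that is
# unset or empty to stderr and return 1 if any is missing.
check_grid_env() {
    missing=0
    for var in LFC_HOST LFC_HOME LCG_GFAL_VO VO_BALTICGRID_DEFAULT_SE; do
        eval "val=\${$var:-}"      # indirect lookup of the variable's value
        if [ -z "$val" ]; then
            echo "not set: $var" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A script could then start with check_grid_env || exit 1.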
To begin with, create a test file called hw:
$ echo "Hello World" > hw
Store the file on the Grid with the lcg-cr command (cr = copy and register):
$ lcg-cr file:`pwd`/hw -l lfn:hw
guid:c618f4aa-f011-484b-8cdb-7b838a79791a
Note that local files must be referred to as file: URLs using an absolute path. The command returns the GUID; technically the LFN is optional and you can refer to the file using the GUID, but normally you should use an LFN. You can check that the LFN has been created using the lfc-ls command as before:
$ lfc-ls -l
-rw-rw-r--   1 104 101 12 Sep 03 15:23 hw
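Because lcg-cr (and lcg-cp further below) expects local files as file: URLs with an absolute path, a small hypothetical helper can build that form from a relative name, equivalent to writing file:`pwd`/hw by hand. It does not normalise components such as ./ or ../.

```shell
#!/bin/sh
# Hypothetical helper: turn a local path into an absolute file: URL.
to_file_url() {
    case "$1" in
        /*) echo "file:$1" ;;            # already absolute
        *)  echo "file:$(pwd)/$1" ;;     # prefix the current directory
    esac
}
```

Usage: lcg-cr "$(to_file_url hw)" -l lfn:hw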
You can store the file on some other SE by giving the SE name with a -d option to lcg-cr. Alternatively, replicate the existing file to another SE:
$ lcg-rep lfn:hw -d se.basnet.by -v
Using grid catalog type: lfc
Using grid catalog : lfc.balticgrid.org
Source URL: lfn:/grid/balticgrid/user/bg00785/test/hw
File size: 12
VO name: balticgrid
Destination specified: se.basnet.by
Source URL for copy: gsiftp://kriit.eenet.ee//store/SE/balticgrid/generated/2008-09-03/file33cf60a5-0d96-40e7-ab5a-eaca10a170ca
Destination URL for copy: gsiftp://se.basnet.by/se.basnet.by:/pool/balticgrid/2008-09-03/filecb62a432-601a-4099-9f79-cc9f295fd570.14888.0
# streams: 1
# set timeout to 0
0 bytes 0.00 KB/sec avg 0.00 KB/sec inst
Transfer took 4340 ms
Destination URL registered in file catalog: srm://se.basnet.by/dpm/basnet.by/home/balticgrid/generated/2008-09-03/filecb62a432-601a-4099-9f79-cc9f295fd570
You can see the SURLs of all replicas registered under a given LFN with the lcg-lr command (lr = list replicas):
$ lcg-lr lfn:hw
This produces a response such as the following:
sfn://kriit.eenet.ee/store/SE/balticgrid/generated/2008-09-03/file33cf60a5-0d96-40e7-ab5a-eaca10a170ca
srm://se.basnet.by/dpm/basnet.by/home/balticgrid/generated/2008-09-03/filecb62a432-601a-4099-9f79-cc9f295fd570
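To delete a single replica you need its SURL, as in the lcg-del example below. Assuming lcg-lr prints one SURL per line, the hypothetical function replicas_on keeps only those whose host part matches a given SE name.

```shell
#!/bin/sh
# Read SURLs (one per line) on stdin and print those whose host part
# equals the SE name given as the first argument.
replicas_on() {
    awk -v se="$1" '{
        host = $0
        sub(/^[a-z]+:\/\//, "", host)   # drop the scheme, e.g. srm:// or sfn://
        sub(/[\/:].*$/, "", host)       # keep text before the first / or :
        if (host == se) print $0
    }'
}
```

For example, lcg-lr lfn:hw | replicas_on se.basnet.by prints the srm:// SURL on se.basnet.by, which can then be passed to lcg-del.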
To retrieve a local copy of the file use the lcg-cp command:
$ lcg-cp lfn:hw file:`pwd`/hw2
$ cat hw2
This produces a response such as the following:
Hello World
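Instead of inspecting hw2 by eye, the retrieved copy can be compared byte for byte with the original hw using the standard cmp utility. A minimal sketch; the function name same_content is hypothetical.

```shell
#!/bin/sh
# Compare the local original with the copy fetched by lcg-cp.
# cmp -s prints nothing and only sets its exit status.
same_content() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "differ"
        return 1
    fi
}
```

Usage: same_content hw hw2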
To delete files, there are two variants of the lcg-del command, depending on whether you want to delete just a single replica or every replica together with the LFN. To delete an individual replica, give its SURL (as obtained from lcg-lr):
$ lcg-del srm://se.basnet.by/dpm/basnet.by/home/balticgrid/generated/2008-09-03/filecb62a432-601a-4099-9f79-cc9f295fd570
$ lcg-lr lfn:hw
This produces a response such as the following:
sfn://kriit.eenet.ee/store/SE/balticgrid/generated/2008-09-03/file33cf60a5-0d96-40e7-ab5a-eaca10a170ca
Alternatively, with the -a option you can delete all replicas and the LFN itself:
$ lcg-del -a lfn:hw
$ lcg-lr lfn:hw
This produces a response such as the following:
lcg_lr: No such file or directory
Alternatively, you can check with the lfc-ls command:
$ lfc-ls hw
This produces a response such as the following:
hw: No such file or directory
Finally, to clean up, delete the directory in which the test files were created (this works only if the directory is empty):
$ lfc-rm -r /grid/balticgrid/`id -nu`/test
Advanced Data Management
A case study of advanced data management is described here.
