Data Management

From BG Users Forum

This page covers some basic elements of Grid file management.

Glossary

UI - User Interface
SE - Storage Element
LFN - Logical File Name
LHC - Large Hadron Collider
LCG - LHC Computing Grid
LFC - LCG File Catalog
SURL - Storage URL
GUID - Globally Unique Identifier
VO - Virtual Organisation

Note that the following examples assume that you have already created a proxy, that you are working on a User Interface (UI) machine, and that you are using a Bourne-type shell. The same commands will also work inside a running job.

First some terminology. In the Grid, files are stored in Storage Elements (SEs). One logical file may have several identical replicas in different SEs. Files are identified by a Logical File Name (LFN), and a file catalogue stores the connection between the LFN and pointers to any replicas - the latter are known as Storage URLs (SURLs). The SURL may be partly specified by the user, but otherwise it will be generated automatically and for simple cases you do not need to worry about it. Files are also identified by a Globally Unique Identifier (GUID), which is a fixed-format string generated by the middleware and guaranteed to be absolutely unique. However, this is not very human-friendly, and for most purposes you can ignore it and just use the LFN.
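In the examples on this page the different kinds of name can be told apart by their prefix: lfn: for logical names, guid: for GUIDs, srm:// or sfn:// for SURLs, and file: for local files. As a rough illustration, the sketch below classifies a string by those prefixes. The function name grid_name_kind is hypothetical and is not part of the middleware.

```shell
# Hypothetical helper: classify a Grid file name by its prefix.
# The prefixes are the ones that appear in the examples on this page.
grid_name_kind() {
    case "$1" in
        lfn:*)           echo "LFN" ;;
        guid:*)          echo "GUID" ;;
        srm://*|sfn://*) echo "SURL" ;;
        file:*)          echo "local file" ;;
        *)               echo "unknown" ;;
    esac
}

grid_name_kind lfn:hw
grid_name_kind guid:c618f4aa-f011-484b-8cdb-7b838a79791a
grid_name_kind srm://se.basnet.by/dpm/basnet.by/home/balticgrid/generated/2008-09-03/filecb62a432-601a-4099-9f79-cc9f295fd570
```

The three calls print LFN, GUID and SURL respectively.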

For some purposes you need to know the host name of the file catalogue for the balticgrid VO. This can be obtained with the lcg-infosites command:

$ lcg-infosites --vo balticgrid lfc

The default LCG File Catalog (LFC) for the balticgrid VO is lfc.balticgrid.org.

If it is not already defined in your environment, you should store it in the environment variable LFC_HOST, e.g.:

$ export LFC_HOST=lfc.balticgrid.org
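
In a login script it can be convenient to set LFC_HOST only when nothing else has set it already. A minimal sketch, assuming the default catalogue host named above is the one you want as a fallback:

```shell
# Keep an existing LFC_HOST; otherwise fall back to the default catalogue
# for the balticgrid VO mentioned above.
: "${LFC_HOST:=lfc.balticgrid.org}"
export LFC_HOST
echo "Using file catalogue: $LFC_HOST"
```

The := expansion assigns the default only if the variable is unset or empty, so a value provided by the site configuration is left alone.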

List files

LFNs follow a Unix-style naming system. You can explore the namespace with the lfc-ls command, which works in a similar way to the standard ls, although you should remember that the underlying structure is quite different from a Unix file system.

The top of the LFN namespace is normally /grid/balticgrid. The organisation of the namespace is defined by each VO, so you may need to consult VO-specific documentation to see if users are expected to create files in a particular area.

$ lfc-ls -l /grid/balticgrid
drwxrwxr-x   6 115      101                       0 Aug 28 17:01 andriusj
drwxrwxr-x   7 123      101                       0 Jun 26 12:26 arnka
drwxrwxr-x   9 108      101                       0 Jun 05 14:10 bartek
drwxrwxr-x   4 104      101                       0 Aug 09  2007 biit
-rw-rw-r--   1 116      101                  209798 Nov 23  2007 con100_4.tar.gz
drwxrwxr-x  13 116      101                       0 May 24 01:41 danila
drwxrwxr-x   1 116      101                       0 Aug 19  2007 dapi2826
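
Because the long listing has the same column layout as ls -l, ordinary text tools can filter it. The sketch below keeps only the directory names. The function name lfn_dirs is hypothetical, and here it is fed three lines copied from the listing above rather than live lfc-ls output.

```shell
# Hypothetical filter: print the names of directories from `lfc-ls -l` output.
# Directory entries start with "d"; the name is taken to be the last field,
# so this sketch does not handle names that contain spaces.
lfn_dirs() {
    awk '/^d/ { print $NF }'
}

# Sample input copied from the listing above. On a UI you could instead run:
#   lfc-ls -l /grid/balticgrid | lfn_dirs
lfn_dirs <<'EOF'
drwxrwxr-x   4 104      101                       0 Aug 09  2007 biit
-rw-rw-r--   1 116      101                  209798 Nov 23  2007 con100_4.tar.gz
drwxrwxr-x  13 116      101                       0 May 24 01:41 danila
EOF
```

With the sample input this prints biit and danila, one per line.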

Replication operations

The examples below assume that the LFC_HOME variable points to a suitable directory in which to create test files. Directories are not created automatically, so create the directory first if necessary, e.g.:

$ lfc-mkdir -p /grid/balticgrid/`id -nu`/test
$ export LFC_HOME=/grid/balticgrid/`id -nu`/test
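
Once LFC_HOME is set, a relative name such as lfn:hw is resolved against it, much as a relative path is resolved against the current directory. The sketch below imitates that resolution in plain shell so you can see which full LFN a short name stands for. The function resolve_lfn is hypothetical and only manipulates strings; the middleware does the real lookup. The example directory is the one that appears in the verbose lcg-rep output later on this page.

```shell
# Hypothetical helper: expand a relative LFN using LFC_HOME.
# Absolute LFNs (lfn:/grid/...) are returned unchanged.
resolve_lfn() {
    lfn_name=${1#lfn:}                  # drop the "lfn:" prefix if present
    case "$lfn_name" in
        /*) printf 'lfn:%s\n' "$lfn_name" ;;
        *)  printf 'lfn:%s/%s\n' "${LFC_HOME%/}" "$lfn_name" ;;
    esac
}

# Example directory for illustration only; use your own LFC_HOME in practice.
LFC_HOME=/grid/balticgrid/user/bg00785/test
resolve_lfn lfn:hw
```

This prints lfn:/grid/balticgrid/user/bg00785/test/hw.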

The following examples illustrate simple cases for storing, replicating, retrieving and deleting Grid files. Note that a -v option can be given to the lcg-* commands to get a more verbose description of what the command is doing. The commands all need to know the name of the VO (balticgrid), which can be given with a --vo option. However, for these examples it is assumed to be set as a default using the LCG_GFAL_VO variable:

$ export LCG_GFAL_VO=balticgrid

Also make sure that the VO_BALTICGRID_DEFAULT_SE variable is set correctly. The available storage elements can be listed with the lcg-infosites command:

$ lcg-infosites --vo balticgrid se
Avail Space(Kb) Used Space(Kb)  Type    SEs
----------------------------------------------------------
7320000000      13153           n.a     se.basnet.by
5630000000      82950000000     n.a     dpm.cyf-kr.edu.pl
365213640       1685427132      n.a     kriit.eenet.ee
86390452        10487652        n.a     birzs.latnet.lv
999800000       271870          n.a     se.puduris.grid.lumii.lv
54266088        7918632         n.a     atomas.itpa.lt
248650000       62              n.a     fobas.itpa.lt

For example:

$ export VO_BALTICGRID_DEFAULT_SE=se.puduris.grid.lumii.lv
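
If you have no particular preference, one simple heuristic is to pick the SE that reports the most available space. The sketch below does that with awk. The function name pick_se is hypothetical, the heuristic is only a suggestion, and the sample input is copied from the table above rather than taken from a live query.

```shell
# Hypothetical helper: read `lcg-infosites ... se` output on stdin and print
# the SE with the largest "Avail Space" value. Header and separator lines are
# skipped because their first field is not a number.
pick_se() {
    awk '$1 ~ /^[0-9]+$/ && $1 + 0 > max { max = $1 + 0; se = $NF }
         END { if (se != "") print se }'
}

# Sample rows from the table above. On a UI you could instead run:
#   export VO_BALTICGRID_DEFAULT_SE=$(lcg-infosites --vo balticgrid se | pick_se)
pick_se <<'EOF'
Avail Space(Kb) Used Space(Kb)  Type    SEs
----------------------------------------------------------
7320000000      13153           n.a     se.basnet.by
999800000       271870          n.a     se.puduris.grid.lumii.lv
248650000       62              n.a     fobas.itpa.lt
EOF
```

With the sample rows this prints se.basnet.by. Free space is not the only consideration; a nearby SE may be a better choice for large transfers.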

To begin with, create a test file called hw:

$ echo "Hello World" > hw

Store the file on the Grid with the lcg-cr command (cr = copy and register):

$ lcg-cr file:`pwd`/hw -l lfn:hw
guid:c618f4aa-f011-484b-8cdb-7b838a79791a

Note that local files must be referred to as file: URLs using an absolute path. The command returns the GUID; technically the LFN is optional and you can refer to the file using the GUID, but normally you should use an LFN. You can check that the LFN has been created using the lfc-ls command as before:

$ lfc-ls -l
-rw-rw-r--   1 104      101                      12 Sep 03 15:23 hw
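
In a script you may want to keep the GUID that lcg-cr prints. The sketch below strips the guid: prefix and checks that what remains looks like the 8-4-4-4-12 hexadecimal pattern seen in the output above. The function name guid_of is hypothetical, and the pattern is inferred from that one example rather than from a specification.

```shell
# Hypothetical helper: extract the bare GUID from a "guid:..." string.
# Prints the GUID on success; prints a message to stderr and fails otherwise.
guid_of() {
    guid_value=${1#guid:}
    if printf '%s\n' "$guid_value" |
        grep -Eq '^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'
    then
        printf '%s\n' "$guid_value"
    else
        echo "not a GUID: $1" >&2
        return 1
    fi
}

# GUID copied from the lcg-cr output above.
guid_of guid:c618f4aa-f011-484b-8cdb-7b838a79791a
# On a UI the argument would come from the command itself, e.g.:
#   guid=$(guid_of "$(lcg-cr file:$(pwd)/hw -l lfn:hw)") || echo "upload failed"
```

Checking the format guards against treating an error message from a failed upload as if it were a GUID.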

You can store the file on some other SE by giving the SE name with a -d option to lcg-cr. Alternatively, replicate the existing file to another SE:

$ lcg-rep lfn:hw -d se.basnet.by -v
Using grid catalog type: lfc
Using grid catalog : lfc.balticgrid.org
Source URL: lfn:/grid/balticgrid/user/bg00785/test/hw
File size: 12
VO name: balticgrid
Destination specified: se.basnet.by
Source URL for copy: gsiftp://kriit.eenet.ee//store/SE/balticgrid/generated/2008-09-03/file33cf60a5-0d96-40e7-ab5a-eaca10a170ca
Destination URL for copy: gsiftp://se.basnet.by/se.basnet.by:/pool/balticgrid/2008-09-03/filecb62a432-601a-4099-9f79-cc9f295fd570.14888.0
# streams: 1
# set timeout to 0
            0 bytes      0.00 KB/sec avg      0.00 KB/sec inst
Transfer took 4340 ms
Destination URL registered in file catalog: srm://se.basnet.by/dpm/basnet.by/home/balticgrid/generated/2008-09-03/filecb62a432-601a-4099-9f79-cc9f295fd570

You can see the SURLs of all replicas registered under a given LFN with the lcg-lr command (lr = list replicas):

$ lcg-lr lfn:hw

This produces a response such as the following:

sfn://kriit.eenet.ee/store/SE/balticgrid/generated/2008-09-03/file33cf60a5-0d96-40e7-ab5a-eaca10a170ca
srm://se.basnet.by/dpm/basnet.by/home/balticgrid/generated/2008-09-03/filecb62a432-601a-4099-9f79-cc9f295fd570
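
When a file has several replicas it is often useful to know which SE each one is on. The host name is the part of the SURL between :// and the next slash. A sketch, with a hypothetical helper name surl_host:

```shell
# Hypothetical helper: print the SE host name contained in a SURL.
surl_host() {
    surl_rest=${1#*://}               # drop the scheme, e.g. "srm://"
    surl_hostport=${surl_rest%%/*}    # keep everything up to the first slash
    printf '%s\n' "${surl_hostport%%:*}"   # drop an optional :port suffix
}

# Sample SURLs copied from the lcg-lr output above. On a UI you could run:
#   lcg-lr lfn:hw | while read -r surl; do surl_host "$surl"; done
surl_host sfn://kriit.eenet.ee/store/SE/balticgrid/generated/2008-09-03/file33cf60a5-0d96-40e7-ab5a-eaca10a170ca
surl_host srm://se.basnet.by/dpm/basnet.by/home/balticgrid/generated/2008-09-03/filecb62a432-601a-4099-9f79-cc9f295fd570
```

For the two sample SURLs this prints kriit.eenet.ee and se.basnet.by.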

To retrieve a local copy of the file use the lcg-cp command:

$ lcg-cp lfn:hw file:`pwd`/hw2
$ cat hw2

This produces a response such as the following:

Hello World
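
Both lcg-cr and lcg-cp take local files as file: URLs with an absolute path. The hypothetical helper below builds such a URL from a relative or an absolute path, which saves typing the pwd substitution each time.

```shell
# Hypothetical helper: turn a local path into a file: URL with an absolute path.
file_url() {
    case "$1" in
        /*) printf 'file:%s\n' "$1" ;;              # already absolute
        *)  printf 'file:%s/%s\n' "$(pwd)" "$1" ;;  # relative to the current directory
    esac
}

file_url hw2
# On a UI this could be used as:
#   lcg-cp lfn:hw "$(file_url hw2)"
#   cmp hw hw2 && echo "local copy matches the original"
```

The cmp line in the comment is one way to confirm that the retrieved copy is byte-for-byte identical to the file you uploaded.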

To delete files there are two variants of the lcg-del command, depending on whether you want to delete just a single replica, or every replica plus the LFN. To delete an individual replica, give its SURL (as obtained from lcg-lr):

$ lcg-del srm://se.basnet.by/dpm/basnet.by/home/balticgrid/generated/2008-09-03/filecb62a432-601a-4099-9f79-cc9f295fd570
$ lcg-lr lfn:hw

This produces a response such as the following:

sfn://kriit.eenet.ee/store/SE/balticgrid/generated/2008-09-03/file33cf60a5-0d96-40e7-ab5a-eaca10a170ca

Alternatively, with the -a option you can delete all replicas and the LFN itself:

$ lcg-del -a lfn:hw
$ lcg-lr lfn:hw

This produces a response such as the following:

lcg_lr: No such file or directory

You can also confirm that the LFN has gone with the lfc-ls command:

$ lfc-ls hw

This produces a response such as the following:

hw: No such file or directory
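
Since the two forms of lcg-del shown above differ only in the -a option and the kind of name given, a script can choose between them from the argument. The sketch below only prints the command it would run, so it is safe to try anywhere. The function name del_command is hypothetical.

```shell
# Hypothetical helper: print the lcg-del command matching the kind of name.
del_command() {
    case "$1" in
        lfn:*)           echo "lcg-del -a $1" ;;   # all replicas plus the LFN
        srm://*|sfn://*) echo "lcg-del $1" ;;      # just this one replica
        *)               echo "unsupported name: $1" >&2; return 1 ;;
    esac
}

del_command lfn:hw
del_command srm://se.basnet.by/dpm/basnet.by/home/balticgrid/generated/2008-09-03/filecb62a432-601a-4099-9f79-cc9f295fd570
```

Printing the command first and running it only after a look is a cheap safeguard, because a deletion with -a cannot be undone.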

Finally, to clean up, you can delete the directory in which the test files were created, if necessary. Do this only once the directory is empty:

$ lfc-rm -r /grid/balticgrid/`id -nu`/test
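
The whole walkthrough can be collected into one script. The sketch below is a dry run by default: a hypothetical run helper prints each command instead of executing it unless DRY_RUN=0 is set. It assumes that the file hw exists in the current directory and that LCG_GFAL_VO and VO_BALTICGRID_DEFAULT_SE are set as described above. The names run, grid_demo and DRY_RUN are illustrative only.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# The body runs in a subshell, so LFC_HOME is not changed for the caller.
grid_demo() (
    dir=/grid/balticgrid/$(id -nu)/test
    run lfc-mkdir -p "$dir" &&
    LFC_HOME=$dir && export LFC_HOME &&
    run lcg-cr "file:$(pwd)/hw" -l lfn:hw &&      # store and register
    run lcg-rep lfn:hw -d se.basnet.by &&         # replicate to a second SE
    run lcg-lr lfn:hw &&                          # list replicas
    run lcg-cp lfn:hw "file:$(pwd)/hw2" &&        # retrieve a local copy
    run lcg-del -a lfn:hw &&                      # delete replicas and LFN
    run lfc-rm -r "$dir"                          # remove the test directory
)

grid_demo
```

Chaining the steps with && stops the sequence at the first command that fails, which avoids deleting anything after an unsuccessful upload or copy.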


Advanced Data Management

A case study of advanced data management is described here.