Object Storage Service

Increasingly, data sets archived at GEOFON can be retrieved not only through our standard services, but also from the object storage service hosted at GFZ. Object storage is a standard storage architecture in which data is managed as objects (or "blobs") instead of files. Each object typically carries a variable amount of metadata and a globally unique identifier, and objects are organized into "buckets". This architecture is very effective for managing large amounts of unstructured data.

The vast majority of cloud storage available on the market uses an object-storage architecture. Some notable examples are Amazon Simple Storage Service (S3), Microsoft Azure Blob Storage, OpenStack Swift, and Google Cloud Storage.

To work with data stored in the GFZ object storage service you need a client that supports the S3 protocol. Many such clients exist for every platform. This page shows examples using aws-cli, but any other client should work, as long as it follows the S3 standard.

For instance, s3cmd is a typical command-line client. If you prefer a graphical client, try CloudMounter, which is available for several platforms.

Methods to retrieve data

When you need to work with the data or the buckets, remember that the endpoint of the service provided by GFZ is always https://s3.gfz-potsdam.de. You will also need to know the name of the bucket you would like to access; some bucket names are shown in the examples below.

Types of data stored in buckets at GEOFON

MiniSEED

A typical example of a large permanent network is that of the "GEOFON Seismic Network" (FDSN network code GE). This is found in the s3://gc.ge bucket. To list the files there you can use the ls subcommand of aws.


$ aws --no-sign-request --endpoint-url https://s3.gfz-potsdam.de s3 ls s3://gc.ge
                           PRE 1993/
                           PRE 1994/
                           PRE 1995/
                           PRE 1996/
...
                           PRE 2020/
                           PRE 2021/
                           PRE 2022/
                           PRE 2023/
2025-02-24 13:15:43        545 README.txt
NOTE: In 2023 there was around 1.2 TB of GE data. Please take care to request only the data you need.

Some temporary experiments can also be found, like the dataset from the "Cyclades project 2002-2005 and Libyan Sea offshore project 2003-2004" (FDSN network code ZZ, years 2002-2005; ca. 610 gigabytes) in the s3://gc.zz2002 bucket.


$ aws --no-sign-request --endpoint-url https://s3.gfz-potsdam.de s3 ls s3://gc.zz2002
                           PRE 2002/
                           PRE 2003/
                           PRE 2004/
                           PRE 2005/
2025-02-12 13:51:31        717 README.txt

More data sets will be added over time.

Raw DAS data

In the case of raw distributed acoustic sensing (DAS) data, we store the files coming from the interrogator exactly as we receive them. One example is the "Global DAS Month 2023, Teleseismic Event Recordings, Potsdam Fiber" data set (FDSN network code 3U, 2023; ca. 78 gigabytes), which is available in the s3://gc.3u2023 bucket.

$ aws --no-sign-request --endpoint-url https://s3.gfz-potsdam.de s3 ls s3://gc.3u2023
                           PRE continuous/
                           PRE event_based/
2025-02-24 11:47:40     383235 3u2023.json
2025-02-24 13:40:32        752 README.txt
...

The most important file for each data set is its README.txt, which contains information about the data set. This includes citation and license information and any special instructions for interpreting the data.
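
If you only need a data set's README.txt, you do not have to synchronize the whole bucket. As a sketch, assuming the bucket allows anonymous reads over plain HTTPS (as the public example buckets above do) and that path-style URLs of the form endpoint/bucket/key are supported, the file can be fetched with a few lines of standard-library Python:

```python
# Sketch: fetch a data set's README.txt from the GFZ object storage
# service with an anonymous HTTPS GET. Assumes the bucket is publicly
# readable and that path-style URLs (endpoint/bucket/key) work.
from urllib.request import urlopen

ENDPOINT = "https://s3.gfz-potsdam.de"

def object_url(bucket: str, key: str) -> str:
    """Build the path-style URL for an object in a bucket."""
    return f"{ENDPOINT}/{bucket}/{key}"

if __name__ == "__main__":
    url = object_url("gc.ge", "README.txt")
    try:
        with urlopen(url, timeout=30) as resp:
            print(resp.read().decode("utf-8"))
    except OSError as exc:  # no network access, or bucket not reachable
        print(f"Could not fetch {url}: {exc}")
```

Under the same assumption, that URL should also work directly in a browser or with curl.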

In most cases, next to it you will find a JSON file whose object name is the network code. This is the DAS metadata prepared by the data provider following the proposal by Voon-Hui Lai et al., Seismol. Res. Lett. (2024) 95(3):1986-1999, https://doi.org/10.1785/0220230325.

Citation

If there is no additional information in the README.txt file, we suggest citing the seismic network directly, as shown on its landing page. For the GEOFON Seismic Network (FDSN network code GE) this would be:

GEOFON Data Centre (1993): GEOFON Seismic Network. GFZ Data Services. Dataset/Seismic Network. doi:10.14470/TR560404

In the unlikely event that it is necessary to refer to the data as an object storage S3 bucket, we suggest a citation like this:

GEOFON Data Centre (1993): gc.ge S3 Object Storage with raw data [Dataset]. GFZ Helmholtz Centre for Geosciences. https://s3.gfz-potsdam.de/gc.ge

Here the URL is based on the S3 endpoint, while the author(s) and year of publication are those of the original data set. Digital object identifiers (DOIs) are not yet available for the S3 bucket variant of the data.

The AWS CLI client

Installation

The AWS Command Line Client version 2 can be installed from Amazon:

curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install

It requires Python 3.9 or a more recent version.

You can run the client's "help" command to check that it was installed properly.

$ aws help

Retrieval of data

You can list the contents of a bucket, much as you would list files in a directory/folder with the Unix ls command, using the ls subcommand.

$ aws --no-sign-request --endpoint-url https://s3.gfz-potsdam.de s3 ls s3://bucketname

In the examples here, bucketname is a placeholder for the name of the bucket; replace it with the name of the bucket you want to access.

Now, create a directory to store the data you want to request and move into it.

$ mkdir localdata
$ cd localdata

If you would like to synchronize a whole bucket to your local computer you can do it with the "sync" subcommand.

$ aws --no-sign-request --endpoint-url https://s3.gfz-potsdam.de s3 sync s3://bucketname/ .

Note the trailing dot (".") here, indicating to the aws command where the remote data should be stored locally. If you want to synchronize only part of the bucket, the --exclude and --include command-line options may be helpful; see the documentation for aws. You can also narrow the remote path itself. For instance, to synchronize only data from the year 2000:

$ aws --no-sign-request --endpoint-url https://s3.gfz-potsdam.de s3 sync s3://bucketname/2000 2000
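
The way --exclude and --include combine can be surprising: the filters are applied in the order given on the command line, and later filters take precedence over earlier ones. The Python sketch below mimics that rule for illustration only; it is not the actual aws-cli implementation, and aws additionally evaluates the glob patterns relative to the given prefix.

```python
# Sketch of how aws s3 sync combines --exclude/--include filters:
# filters are applied in command-line order and the LAST matching
# filter wins; objects matching no filter are included by default.
# Illustration only, not the actual aws-cli implementation.
from fnmatch import fnmatch

def is_selected(key: str, filters: list[tuple[str, str]]) -> bool:
    """filters is a list of ("include"|"exclude", glob_pattern) pairs."""
    selected = True  # default: everything is included
    for action, pattern in filters:
        if fnmatch(key, pattern):
            selected = (action == "include")
    return selected

# Exclude everything, then re-include only the year-2000 prefix,
# i.e. the effect of: --exclude "*" --include "2000/*"
filters = [("exclude", "*"), ("include", "2000/*")]
```

With these filters, a hypothetical key such as "2000/example.mseed" is selected, while "2001/example.mseed" is not.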
