Contributing data from a DAS experiment to the GEOFON Data Centre @ GFZ (DAS)¶
Scope of this document¶
This document provides simple guidelines to assist data providers in archiving their DAS data at GEOFON DC, in accordance with the Guidelines on Research Data at the GFZ German Research Centre for Geosciences.
It is based on the Guidelines for the creation of derived product from raw DAS data published during the Geo-INQUIRE project.
General Recommendations¶
If a derived product of a DAS experiment is to be archived at GEOFON, the following data management procedures should be followed.
- The raw DAS data (e.g. HDF5, TDMS files) must be saved either on tapes, or some other type of storage suitable for long-term preservation. This could be done by the hosting data centre, or by the PI, but it should be agreed in advance.
- As a derivative product, we expect the data to be converted from the proprietary format to miniSEED and decimated to a sampling rate between 100 and 250 sps, which is considered suitable for seismological investigations. The unit of the data is expected to be strain rate but may also be strain or velocity.
- Not all channels (spatial resolution) are expected to be converted to and archived in miniSEED format. Channels selected at regular intervals (e.g. 1 channel every 10 m, 1 channel per km), or specific channels, can be used to create the derivative dataset. This will also considerably decrease the amount of storage needed.
- An FDSN network code for the experiment is mandatory if data has to be archived (and distributed) at a seismological data centre. If possible, this should be managed by GEOFON and the DOI should be also minted at GFZ (see next section).
- Network code as registered by FDSN (see next section).
- Station code including the number of the sampling point on the raw data. As in some community recommendations it is suggested that the first character should be a letter, it could be used to identify a cable (or a segment of it), while the rest identifies the sampling point (e.g. ‘A0001’, ‘A0123’, ‘B0234’).
- Empty location code if there are no complex setups and the dataset includes no changes in the configuration. Otherwise, this code could be used to identify a particular configuration (e.g. with a number), which could be explained later in the technical description of the experiment.
- Channel code as ‘HSF’, or ‘FSF’ depending on the sampling rate.
- Blockettes 1000 and 1001 for timing quality should be present.
- Access restrictions: embargo period, contact person to handle restrictions, and rough number of users expected during the embargo period. Are all the users known at the beginning, or will the list change continuously?
For the metadata, a version of the DAS-RCN JSON metadata schema (https://doi.org/10.1785/0220230325) should be provided.
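The channel selection and decimation described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the raw channel spacing and the raw sampling rate are hypothetical, and the actual conversion should be done with the tool of your choice.

```python
def select_channels(n_channels: int, channel_spacing_m: float,
                    target_spacing_m: float) -> list[int]:
    """Return raw channel indices roughly target_spacing_m apart.

    All arguments are illustrative; take the real values from the
    interrogator settings of your experiment.
    """
    if channel_spacing_m <= 0 or target_spacing_m < channel_spacing_m:
        raise ValueError("target spacing must be >= raw channel spacing > 0")
    # Number of raw channels to skip between two archived channels.
    step = round(target_spacing_m / channel_spacing_m)
    return list(range(0, n_channels, step))


def decimation_factor(raw_rate_sps: int, target_rate_sps: int) -> int:
    """Return the integer factor that reduces raw_rate_sps to target_rate_sps."""
    if target_rate_sps <= 0 or raw_rate_sps % target_rate_sps != 0:
        raise ValueError("raw rate must be an integer multiple of the target rate")
    return raw_rate_sps // target_rate_sps


# Hypothetical experiment: 1000 raw channels every 2 m, recorded at 1000 sps.
channels = select_channels(n_channels=1000, channel_spacing_m=2.0,
                           target_spacing_m=10.0)
factor = decimation_factor(raw_rate_sps=1000, target_rate_sps=200)
print(len(channels), "channels kept, decimation factor", factor)
```

Keeping one channel every 10 m in this hypothetical setup reduces the number of archived channels by a factor of five, in addition to the reduction obtained from decimation.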
Request a network code at FDSN¶
The optic fiber cable(s), used as a seismic instrument, can be considered equivalent to an array or network of sensors. If the experiment includes more than one set of cables and interrogators (identified as an acquisition in the DAS-RCN context), it should still be possible to keep everything under a single new network code, or under an existing temporary or permanent network code.
For requesting a temporary network code for your project, we need the following information:
- Title of the experiment
- Start year
- End year
- Short abstract (maximum 200 words)
- Estimated size
- PI (ideally with ORCID identifier)
- Funding agency
- Deployment area
- Will the data be embargoed?
- If so, the expected release date (for GIPP*, up to 4 years after the completion of the experiment)
- License (after releasing the dataset); the default for GIPP* is CC-BY
*Geophysical Instrument Pool Potsdam (GIPP) Terms of use
Most of the information named above will be reused later for minting a DOI. We have therefore prepared a single fillable PDF file that serves both purposes.
Please download and fill in this PDF to provide all the information needed to archive your dataset and create a DOI for it.
Of course, if you prefer, you can request an FDSN network code yourself. In this case please be careful, as there are a few pitfalls. Please keep in mind that we still need the PDF. Please also name the software that was used for downsampling and, ideally, provide a website for it.
- Please make sure to choose Do nothing right now if you do not already have a DOI minted for your network. GFZ will mint a DOI for you and afterwards insert it into the FDSN network mask.
- Please also make sure that you choose the right institution at Operating institution. You can also insert a new institution if you can't find yours in the drop-down menu. Please do not choose GEOFON Program here!
- Choose Geofon as your webservice in order to enable FDSN to get your station metadata from our webservices automatically.
If you prefer to request an FDSN network code on your own, please make sure to forward the response message to geofon_dc(at)gfz-potsdam.de!
Data Preparation¶
For data preparation we recommend the GIPPtools from the Geophysical Instrument Pool Potsdam by Christof Lendl or msmod by Chad Trabant.
The receiver program on our server expects miniSEED data with the following configuration:
- Header containing the final network code (assigned by FDSN) and the appropriate station code, location code and channel code.
- Block size: 512 or 4096 bytes
- Byte order: big-endian
- Compression: Steim1 or Steim2 (please be consistent for each station, to avoid mixing compressions within one day file)
Since the conversion can be very time consuming, especially for large datasets, we strongly recommend trying it on a small amount of data first.
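Before converting a large dataset, the settings above can be verified on a single record. The following is a minimal sketch, not part of any GEOFON tool: it reads the fixed header and blockette 1000 of a miniSEED 2 record using only the Python standard library. The function name is hypothetical, and the parser assumes a big-endian header, so a little-endian file will be reported as an error or as not acceptable.

```python
import struct

# miniSEED 2 encoding codes for the two compressions named above.
STEIM_ENCODINGS = {10: "Steim1", 11: "Steim2"}


def inspect_record(record: bytes) -> dict:
    """Return identifiers and format settings of one miniSEED 2 record."""
    if len(record) < 48:
        raise ValueError("too short to hold a miniSEED 2 fixed header")
    info = {
        "station": record[8:13].decode("ascii").strip(),
        "location": record[13:15].decode("ascii").strip(),
        "channel": record[15:18].decode("ascii").strip(),
        "network": record[18:20].decode("ascii").strip(),
    }
    # Bytes 46-47 hold the offset of the first blockette
    # (read as big-endian, which is an assumption of this sketch).
    (offset,) = struct.unpack(">H", record[46:48])
    while offset and offset + 8 <= len(record):
        btype, next_offset = struct.unpack(">HH", record[offset:offset + 4])
        if btype == 1000:
            encoding, word_order, exponent = struct.unpack(
                ">BBB", record[offset + 4:offset + 7])
            info["record_length"] = 2 ** exponent
            info["big_endian"] = word_order == 1
            info["encoding"] = STEIM_ENCODINGS.get(encoding, f"other ({encoding})")
            info["acceptable"] = (
                info["record_length"] in (512, 4096)
                and info["big_endian"]
                and encoding in STEIM_ENCODINGS
            )
            return info
        if next_offset <= offset:  # guard against a broken blockette chain
            break
        offset = next_offset
    raise ValueError("no blockette 1000 found")
```

A possible use is to read the first 512 bytes of a converted file in binary mode, pass them to `inspect_record`, and check that `acceptable` is true and that the network, station, location and channel codes are the ones you intended.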
Send Metadata¶
Please send the station metadata as StationXML (station.xml), a SeisComP3 inventory.xml, dataless SEED, or a table containing the following information:
| StationCode | Place/Country | Latitude | Longitude | Elevation | SamplingRate | LocalDepth | StartDate | EndDate |
If you provide station.xml, please make sure that your streams have the start date and end date of the deployment (for temporary networks). Even though we may also store data from before and after the deployment, only the data within the deployment epoch will be available to users.
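If you choose the table option, the rows can be generated from your own station list. The sketch below is only an illustration: the column names come from the table above, while the function name, the example values, the units (metres, samples per second) and the date format are assumptions that should be agreed with the data centre.

```python
COLUMNS = ["StationCode", "Place/Country", "Latitude", "Longitude",
           "Elevation", "SamplingRate", "LocalDepth", "StartDate", "EndDate"]


def format_table(rows: list[dict]) -> str:
    """Render station rows as a pipe-delimited table with a header line."""
    lines = ["| " + " | ".join(COLUMNS) + " |"]
    for row in rows:
        missing = [column for column in COLUMNS if column not in row]
        if missing:
            raise ValueError(f"row is missing columns: {missing}")
        lines.append("| " + " | ".join(str(row[c]) for c in COLUMNS) + " |")
    return "\n".join(lines)


# Hypothetical sampling point; every value here is a placeholder.
example_rows = [{
    "StationCode": "A0123", "Place/Country": "Example site/Germany",
    "Latitude": 52.38, "Longitude": 13.06, "Elevation": 80,
    "SamplingRate": 200, "LocalDepth": 0,
    "StartDate": "2024-01-01", "EndDate": "2024-12-31",
}]
print(format_table(example_rows))
```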
How to choose the proper source ID?¶
Station naming¶
Each fiber can have thousands of sampling points, with a spatial separation as small as 2–4 m and a length of tens of kilometers, so a station designation seems to be the best fit. Because a station has a single, unique position, such a deployment cannot be treated as a single-station, multiple-channel configuration.
The most reasonable approach is to define a station for each sampling point, similar to the situation in nodal experiments in which each node is typically assigned a station code.
We recommend using the first character of the station code to identify the segment or cable, and the remaining characters for the numerical channel number assigned by the interrogator. This establishes a clear relation between the station code and the channel in the raw data (e.g. ‘A0123’, ‘A0124’, ‘B0125’).
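The naming scheme recommended above can be expressed as a small helper. This is a sketch with a hypothetical function name; it assumes a single cable letter and a zero-padded four-digit channel number, which keeps the code within the five characters available for a station code in miniSEED 2.

```python
def station_code(cable: str, channel_number: int) -> str:
    """Build a five-character station code such as 'A0123'.

    cable: one ASCII letter identifying the cable or segment.
    channel_number: channel number assigned by the interrogator (0-9999).
    """
    if len(cable) != 1 or not (cable.isascii() and cable.isalpha()):
        raise ValueError("cable must be a single ASCII letter")
    if not 0 <= channel_number <= 9999:
        raise ValueError("channel number must fit in four digits")
    return f"{cable.upper()}{channel_number:04d}"


print(station_code("A", 123))
print(station_code("B", 125))
```

Experiments with more than 10000 sampling points per cable would need a different scheme, for example one letter per cable segment.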
Location codes¶
The acquisition configuration (e.g. number of channels, gauge length, sampling rate) often changes during an experiment with DAS equipment. We suggest using the location code to track these different (sub)experiments, so that different acquisitions can be recognized without checking the details of the metadata. This also makes it possible to reuse the coordinates of the sampling points without defining new (virtual) stations or sampling points.
Channel codes¶
For fibre optic experiments we use the channel code HSF, according to the SEED channel naming.
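Putting the pieces together, the sketch below composes the network, station, location and channel codes into the traditional dotted identifier and into an FDSN Source Identifier. The function names are hypothetical, "XX" is a placeholder for the network code assigned by FDSN, and "01" is an illustrative location code for a second acquisition configuration.

```python
def nslc_id(network: str, station: str, location: str, channel: str) -> str:
    """Return the dotted NET.STA.LOC.CHA identifier."""
    return f"{network}.{station}.{location}.{channel}"


def fdsn_source_id(network: str, station: str, location: str, channel: str) -> str:
    """Return the FDSN Source Identifier for a three-character channel code.

    The channel code is split into band, source and subsource codes,
    e.g. 'HSF' becomes 'H_S_F'.
    """
    if len(channel) != 3:
        raise ValueError("expected a three-character SEED channel code")
    band, source, subsource = channel
    return f"FDSN:{network}_{station}_{location}_{band}_{source}_{subsource}"


# Simple setup: empty location code.
print(nslc_id("XX", "A0123", "", "HSF"))
print(fdsn_source_id("XX", "A0123", "", "HSF"))
# Same sampling point recorded with a different configuration.
print(fdsn_source_id("XX", "A0123", "01", "HSF"))
```

Using the location code for the configuration keeps the station code, and therefore the coordinates of the sampling point, unchanged between acquisitions.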
Data Transfer¶
Miniseed¶
Next, download and install the Ringserver client software miniseed2dmc.
Please read the manual. It's possible to do dry runs (without a server connection) for testing purposes, in order to find out whether the data is recognized as miniSEED.
The next step should be the transfer of a small amount of data to see if all criteria have been met.
Call miniseed2dmc -v 139.17.3.77:port, where port is the port number assigned by our data centre.
Should you have difficulties with one or more of the steps described above, please contact us at geofon_dc(at)gfz.de. Most probably we'll find a solution.
Raw DAS data¶
You can transfer your raw data (e.g. HDF5 files) directly to an S3 bucket (Object Storage) that we create for that purpose. We will tell you the name of the bucket (e.g. "s3://bucketname") and send you a key to use with your preferred S3 client (e.g. s3cmd).
Save the file we send you in your home directory under the name ".s3cfg". Some client tools use that default location. Otherwise, you can always pass the filename as a parameter.
To test your keys, upload a file to the bucket with your credentials, list it, delete it, and list the bucket contents again. The final listing should show that the bucket is empty.
$ s3cmd put /home/user/deleteme.txt s3://bucketname -c ~/.s3cfg
upload: '/home/user/deleteme.txt' -> 's3://bucketname/deleteme.txt' [1 of 1]
103585 of 103585 100% in 0s 2.22 MB/s done
$ s3cmd ls -H -c ~/.s3cfg
$ s3cmd ls s3://bucketname -c ~/.s3cfg
2024-11-25 12:13 103585 s3://bucketname/deleteme.txt
$ s3cmd del s3://bucketname/deleteme.txt -c ~/.s3cfg
delete: 's3://bucketname/deleteme.txt'
$ s3cmd ls s3://bucketname -c ~/.s3cfg
$
DataCite Metadata¶
An important aspect to address for this type of dataset is how to properly create DataCite metadata for it. Some important pieces of information to include are:
- Predefined terms from a vocabulary ("subjects"). Commonly used vocabularies are those from NASA or INSPIRE, but ad hoc vocabularies could also be used, as long as they are formal enough. For instance, the SeisData controlled vocabulary contains some typical terms used to tag and describe seismic datasets.
- Contributors: apart from all the contributors provided by the PI, we could add a "HostingInstitution", a "DataManager", and a "Sponsor", where appropriate.
- Related identifiers: links to the raw data, a paper describing the experiment (if any), and a link to the DAS-RCN JSON metadata are some of the other entities to associate.
- The institution or project of the PI should be included as a funder, if appropriate.
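As an illustration of the items above, the following sketch builds a fragment of a DataCite record as a Python dictionary and prints it as JSON. The property names follow our reading of the DataCite Metadata Schema 4.x JSON representation and should be checked against the schema version in use. Every name, DOI and URL in it is a placeholder, and the choice of relation types is an assumption, not a GEOFON requirement.

```python
import json

# All values below are placeholders for illustration only.
datacite_fragment = {
    "subjects": [
        {"subject": "example controlled term",
         "subjectScheme": "example vocabulary"},
    ],
    "contributors": [
        {"name": "Example Hosting Institution", "nameType": "Organizational",
         "contributorType": "HostingInstitution"},
        {"name": "Example, Data Manager", "nameType": "Personal",
         "contributorType": "DataManager"},
        {"name": "Example Sponsor", "nameType": "Organizational",
         "contributorType": "Sponsor"},
    ],
    "relatedIdentifiers": [
        # Raw DAS data from which the miniSEED product was derived.
        {"relatedIdentifier": "10.0000/example-raw-data",
         "relatedIdentifierType": "DOI", "relationType": "IsDerivedFrom"},
        # Paper describing the experiment, if any.
        {"relatedIdentifier": "10.0000/example-paper",
         "relatedIdentifierType": "DOI", "relationType": "IsDocumentedBy"},
        # DAS-RCN JSON metadata of the experiment.
        {"relatedIdentifier": "https://example.org/das-rcn-metadata.json",
         "relatedIdentifierType": "URL", "relationType": "HasMetadata"},
    ],
    "fundingReferences": [
        {"funderName": "Example institution or project of the PI"},
    ],
}

print(json.dumps(datacite_fragment, indent=2))
```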
Looking forward to hosting your data at our data centre,
The GEOFON DC operators
Updated by Peter Evans about 2 months ago · 11 revisions