SONG Documentation¶
Introduction¶
What is SONG?¶
SONG is a robust metadata and validation system used to quickly and reliably track genome metadata scattered across multiple cloud storage systems. In genomics and bioinformatics, metadata managed by simple solutions such as spreadsheets and text files requires significant time and effort to maintain and keep reliable. With several users and thousands of genomic files, tracking the state of metadata and its associations can become a nightmare. The purpose of SONG is to minimize human intervention by imposing rules and structure on user uploads, which in turn produces high-quality, reliable metadata with minimal effort. SONG is one of many products provided by Overture and is completely open-source and free for everyone to use.
See also
For additional information on other products in the Overture stack, please visit https://overture.bio
Features¶
- Synchronous and asynchronous metadata validation using JsonSchema
- Strictly enforced data relationships and fields
- Optional schema-less JSON info fields for user-specific metadata
- Standard REST API that is easy to understand and work with
- Simple and fast metadata searching
- Export payloads for SONG mirroring
- Clear and concise error handling
- ACL security using OAuth2 and scopes based on study codes
- Unifies metadata with object data stored in SCORE
- Built-in Swagger UI for API interaction
Data Submission Workflow¶
The data submission workflow can be separated into 4 main stages:
- Metadata Upload (SONG)
- Metadata Saving (SONG)
- Object data Upload (SCORE)
- Publishing Metadata (SONG)
The following diagram summarizes the steps involved in a successful data submission using SONG and SCORE:
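Diagram aside, the four stages can be sketched as a sequence of calls. This is an illustrative stand-in only: the function names and canned responses below are hypothetical, not the real SONG/SCORE client APIs (those are covered in the tutorials later in this documentation).

```python
# Illustrative sketch only: these functions and canned responses are
# hypothetical stand-ins for the real SONG/SCORE clients described
# later in this documentation.

def upload_metadata(payload):
    # Stage 1 (SONG): validate the payload; the response carries an uploadId.
    return {"status": "ok", "uploadId": "UP-example"}

def save_metadata(upload_id):
    # Stage 2 (SONG): save the validated upload as an UNPUBLISHED analysis.
    return {"status": "ok", "analysisId": "AN-example"}

def upload_object_data(analysis_id):
    # Stage 3 (SCORE): upload the object files listed in the manifest.
    return {"status": "ok"}

def publish_metadata(analysis_id):
    # Stage 4 (SONG): publish once the files exist in SCORE.
    return {"analysisId": analysis_id, "analysisState": "PUBLISHED"}

upload = upload_metadata({"analysisType": "sequencingRead"})
analysis = save_metadata(upload["uploadId"])
upload_object_data(analysis["analysisId"])
result = publish_metadata(analysis["analysisId"])
print(result["analysisState"])  # PUBLISHED
```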
Projects Using SONG¶

Legend:
- Cancer Collaboratory - Toronto : song.cancercollaboratory.org
- AWS - Virginia : virginia.song.icgc.org
Getting Started¶
The easiest way to understand SONG is to simply use it! Below is a short list of different ways to get started interacting with SONG.
Tutorial using a CLI with Docker for SONG¶
The Docker for SONG tutorial is a great way to spin-up SONG and all its dependent services using Docker on your host machine. Use this if you want to play with SONG locally. Refer to the Docker for SONG documentation.
Tutorial using the Python SDK with SONG¶
The SONG Python SDK Tutorial walks through using the Python client module to interact with a running SONG server. Use it with one of the Projects Using SONG, or in combination with Docker for SONG. For more information about the Python SDK, refer to the SONG Python SDK documentation.
Play with the REST API from your browser¶
If you want to play with SONG from your browser, simply visit the Swagger UI for each server:
- Cancer Collaboratory - Toronto: https://song.cancercollaboratory.org/swagger-ui.html
- AWS - Virginia: https://virginia.song.icgc.org/swagger-ui.html
See also
For more information about user access, refer to the User Access documentation.
Deploy SONG to Production¶
If you want to deploy SONG onto a server, refer to the Deploying a SONG Server in Production documentation.
License¶
Copyright (c) 2018. Ontario Institute for Cancer Research
This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public License along with this program. If not, see <https://www.gnu.org/licenses/>.
User Access¶
DACO Authentication¶
SONG servers use the auth.icgc.org OAuth2 authorization service to authorize secure API requests. In order to create the necessary access tokens to interact with the song-python-sdk and the SONG server, the user must have DACO access. For more information about obtaining DACO access, please visit the instructions for DACO Cloud Access.
OAuth2 Authorization¶
With proper DACO access, the user can create an access token, using the Access Tokens and Token Manager instructions.
For each cloud environment, there is a specific authorization scope that is needed:
- For the Collaboratory - Toronto SONG Server (https://song.cancercollaboratory.org), the required authorization scope needed is collab.upload.
- For the AWS - Virginia SONG Server (https://virginia.song.icgc.org), the required authorization scope needed is aws.upload.
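This scope mapping can be expressed as a small lookup; a minimal sketch, where REQUIRED_SCOPE and can_upload are illustrative helpers rather than part of any SONG client:

```python
# Required upload scope per SONG server, as listed above.
# REQUIRED_SCOPE and can_upload are illustrative, not part of any SONG client.
REQUIRED_SCOPE = {
    "https://song.cancercollaboratory.org": "collab.upload",
    "https://virginia.song.icgc.org": "aws.upload",
}

def can_upload(server_url, token_scopes):
    """Return True if the token's scopes authorize uploads to the given server."""
    required = REQUIRED_SCOPE.get(server_url)
    return required is not None and required in token_scopes

print(can_upload("https://song.cancercollaboratory.org", {"collab.upload"}))  # True
print(can_upload("https://virginia.song.icgc.org", {"collab.upload"}))        # False
```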
SCORE Client¶
The SCORE client (formerly the icgc-storage-client) is used to upload and download object data to and from the SCORE server.
See also
For more information about SCORE, refer to https://www.overture.bio/products/score
Installation¶
For installation, please see Installing icgc-storage-client from Tarball instructions.
Configuration¶
For configuration, after un-archiving the tarball, modify ./conf/application.properties by adding the line:
accessToken=<my_access_token>
where the accessToken has the appropriate scope.
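The edit above can also be scripted; a minimal sketch using only the standard library, writing into a temporary copy of the file (the token value is a placeholder for your real access token):

```python
import tempfile
from pathlib import Path

# Placeholder token; substitute your real access token with the required scope.
access_token = "my_access_token"

# For illustration, write into a temporary copy of conf/application.properties
# rather than the real client directory.
conf_dir = Path(tempfile.mkdtemp()) / "conf"
conf_dir.mkdir(parents=True)
conf = conf_dir / "application.properties"
with conf.open("a") as f:
    f.write("accessToken={}\n".format(access_token))

print(conf.read_text().splitlines()[-1])  # accessToken=my_access_token
```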
Note
There are a few storage servers available for DACO users, and each has its own required upload scope:
- In Collaboratory - Toronto, the https://storage.cancercollaboratory.org storage servers are used and require the collab.upload scope for uploading files.
- In AWS - Virginia, the https://virginia.cloud.icgc.org storage servers are used and require the aws.upload scope for uploading files.
Usage¶
For more information about the usage of the client, refer to ICGC Storage Client Usage documentation.
Deploying a SONG Server in Production¶
The following section describes how to install, configure and run the SONG server in production.
Prerequisites¶
The following software dependencies are required in order to run the server:
- Bash Shell
- Java 8 or higher
- Postgres database
Note
Only a Postgres database can be used, since the SONG server relies on Postgres-specific features
Official Releases¶
Official SONG releases can be found here. Releases follow the semantic versioning specification and contain notes describing bug fixes, new features, enhancements and breaking changes, as well as links to downloads and change logs. All official SONG releases are tagged in the format $COMPONENT-$VERSION, where the $COMPONENT portion follows the regex ^[a-z-]+$ and the $VERSION portion follows ^\d+\.\d+\.\d+$. For the SONG server, the tag format has the regex ^song-\d+\.\d+\.\d+$, for example song-1.0.0.
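A quick way to sanity-check a tag against these patterns; the is_valid_tag helper below is illustrative, not part of any SONG tooling:

```python
import re

# Tag patterns described above.
COMPONENT_RE = re.compile(r"^[a-z-]+$")
VERSION_RE = re.compile(r"^\d+\.\d+\.\d+$")
SONG_SERVER_TAG_RE = re.compile(r"^song-\d+\.\d+\.\d+$")

def is_valid_tag(tag):
    """Validate a $COMPONENT-$VERSION tag by splitting on the last hyphen (illustrative helper)."""
    component, sep, version = tag.rpartition("-")
    return bool(sep) and bool(COMPONENT_RE.match(component)) and bool(VERSION_RE.match(version))

print(SONG_SERVER_TAG_RE.match("song-1.0.0") is not None)  # True
print(is_valid_tag("song-docker-1.0.0"))                   # True
print(is_valid_tag("song-1.0"))                            # False
```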
Installation¶
Once the desired release tag, and therefore $VERSION, is known, the corresponding distribution can be downloaded using the command:
curl "https://artifacts.oicr.on.ca/artifactory/dcc-release/org/icgc/dcc/song-server/$VERSION/song-server-$VERSION-dist.tar.gz" -Ls -o song-server-$VERSION-dist.tar.gz
This distribution contains the default configuration and jars for running the server. To unarchive, run the command:
tar zxvf song-server-$VERSION-dist.tar.gz
Configuration¶
Server¶
By default, the SONG server distribution is configured to run in secure production mode. The server can be configured by creating the file ./conf/application-secure.properties with the following contents:
################################
# SONG Server Config #
################################
server.port=8080
################################
# OAuth2 Server Config #
################################
# Scope prefix used to authorize requests to the SONG server.
auth.server.prefix=collab
# Endpoint to validate OAuth2 tokens
auth.server.url=https://auth.icgc.org/oauth/check_token
auth.server.clientId=<auth-client-id>
auth.server.clientSecret=<auth-client-secret>
################################
# ID Server Config #
################################
# URL of the ID server
id.idUrl=https://id.icgc.org
# ID server auth token, which has id.create scope
id.authToken=<id-server-auth-token>
# Enable the use of a real ID server. If false, an in-memory
# id server is used (use only for testing)
id.realIds=true
################################
# Postgres Database Config #
################################
spring.datasource.url=jdbc:postgresql://localhost:5432/song?stringtype=unspecified
spring.datasource.username=<my-db-username>
spring.datasource.password=<my-db-password>
################################
# SCORE Storage Server Config #
################################
# URL used to ensure files exist in the storage server
# Note: The same SONG auth token will be used for requests sent
# to the SCORE server. This means the same scope must be
# authorized to access the SCORE storage service.
dcc-storage.url=https://storage.cancercollaboratory.org
The example file above configures the server to use the id.icgc.org ID service, the auth.icgc.org auth service, and the storage.cancercollaboratory.org SCORE storage service with a local Postgres database; however, any similar service can be used. For example, the Docker for SONG Microservice Architecture uses a different implementation of an OAuth2 server.
Database¶
If the user chooses to host their own SONG server database, it can easily be set up with a few commands. Assuming PostgreSQL is installed, the following instructions describe how to configure the schema and user roles for the song database using any Linux user with sudo permissions:
- Create the song db as the user postgres:
sudo -u postgres createdb song
- Create the password for the postgres user:
sudo -u postgres psql postgres -c "ALTER USER postgres WITH PASSWORD 'myNewPassword';"
- Download the desired release's song-server jar archive. Refer to Official Releases for more information:
wget "https://artifacts.oicr.on.ca/artifactory/dcc-release/org/icgc/dcc/song-server/$VERSION/song-server-$VERSION.jar" -O /tmp/song-server.jar
- Extract the schema.sql from the song-server jar archive:
unzip -p /tmp/song-server.jar schema.sql > /tmp/schema.sql
- Load the schema.sql into the song db:
sudo -u postgres psql song < /tmp/schema.sql
Running as a Service¶
Although the SONG server distribution can be run as a standalone application, it must then be started and stopped manually by the user. For a long-running server, sudden power loss or a hard reboot would mean the standalone application has to be restarted manually. If the SONG server distribution is instead run as a service, the OS is responsible for automatically restarting it upon reboot. For this reason, the distribution should be configured as a service that is always started on boot.
Linux (SysV)¶
Assuming the directory path of the distribution is $SONG_SERVER_HOME, the following steps will register the SONG server as a SysV service on any Linux host supporting SysV and the Prerequisites, and configure it to start on boot.
# Register the SONG service
sudo ln -s $SONG_SERVER_HOME/bin/song-server /etc/init.d/song-server
# Start on boot (defaults)
sudo update-rc.d song-server defaults
It can also be managed manually using several commands:
# Start the service
sudo service song-server start
# Stop the service
sudo service song-server stop
# Restart the service
sudo service song-server restart
Docker for SONG¶
Introduction¶
Warning
Docker for SONG is meant to demonstrate the configuration and usage of SONG, and is NOT INTENDED FOR PRODUCTION. If you decide to ignore this warning and use this in any public or production environment, please remember to change the passwords, accessKeys, and secretKeys.
What is Docker for SONG?¶
Important Features¶
- Turn-key bring-up of SONG, SCORE, dcc-id and the dcc-auth services
- Completely configurable via docker-compose environment variables (i.e. change ports, jmx ports, hosts, credentials, and other important data fields). Values are injected into configurations using a custom python script
- Data from the databases (song-db and id-db) and the auth service are saved in volumes
- Logs from the song-server, score-server and dcc-id-server are mounted to the docker host for easy viewing via the ./logs directory
- SCORE and SONG clients are automatically downloaded, configured and mounted to the docker host via the ./data directory
- Minio (s3 object storage) data is also mounted via the ./data directory. Files can be uploaded by simply copying into ./data/minio
- Uses base-ubuntu and base-db images to minimize pulling and building of docker images, and maximize reuse
- If you decide to go to production, the databases from the volumes can be easily dumped, and the data from minio can be uploaded directly
Bonus Features¶
The Minio and OAuth2 services can be managed using their UIs!
- Minio UI
- Url: http://localhost:8085
- AccessKey: minio
- SecretKey: minio123
- OAuth2 UI
- Adapted from the wonderful dandric/simpleauth docker container
- Url: http://localhost:8084/admin
- Username: john.doe
- Password: songpassword
Microservice Architecture¶
- Each box represents a docker container, and the lines connecting them indicate a TCP/IP connection.
- Each Postgres database is its own docker container.
- storage-client and song-client are command line tools used locally. They communicate with the storage-server and song-server, respectively.
Prerequisites¶
Mandatory¶
- Docker version 17.09.0-ce or higher
- Linux docker host machine (cannot run on Docker for Mac or Docker for Windows)
- Docker-compose version 1.16.1 and up
- Ports 8080 to 8089 on localhost are unused
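The port requirement can be checked up front; a minimal sketch, where is_port_free is an illustrative helper based on a TCP connect attempt:

```python
import socket

def is_port_free(port, host="127.0.0.1"):
    """Return True if nothing accepts TCP connections on host:port (illustrative helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.2)
        # connect_ex returns 0 on a successful connection, i.e. the port is in use.
        return s.connect_ex((host, port)) != 0

busy = [p for p in range(8080, 8090) if not is_port_free(p)]
if busy:
    print("Ports in use; free or reconfigure them:", busy)
else:
    print("Ports 8080-8089 are free")
```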
Getting Docker for SONG¶
In order to run Docker for SONG, the latest release must be downloaded. Before downloading, the latest release tag must be found.
Find the Latest Official Release Tag¶
To find the latest official release tag, refer to Official Releases. Instead of the ^song- prefix in the regex, use ^song-docker-\d+\.\d+\.\d+$. For example song-docker-1.0.0.
Download¶
Using the desired release tag, the docker repository can be downloaded via:
Download ZIP¶
curl -Ls "https://github.com/overture-stack/SONG/archive/$RELEASE_TAG.zip" -o $RELEASE_TAG.zip
Download TAR.GZ¶
curl -Ls "https://github.com/overture-stack/SONG/archive/$RELEASE_TAG.tar.gz" -o $RELEASE_TAG.tar.gz
Download using GIT¶
git clone --branch $RELEASE_TAG https://github.com/overture-stack/SONG.git $RELEASE_TAG
Build and Run¶
From the song-docker directory, run:
docker-compose build
docker-compose up
Note
An internet connection is only needed for the docker-compose build command. No external services are required for the docker-compose up command.
Configuration¶
- All configuration is contained within the docker-compose.yml
- If a port is occupied on the localhost, it can be reconfigured by changing the value of the environment variable defining it (i.e. SERVER_PORT, PGPORT, ID_PORT … etc)
- Default song-docker credentials and information are stored in the credentials.txt file.
Tutorial¶
The following tutorial executes the complete data submission workflow in 4 stages using the Java CLI Client, which is automatically configured in the song-docker/data/client directory. This tutorial assumes the current working directory is the song-docker directory.
Note
Although this tutorial uses the icgc-storage-client, it is in the process of being renamed to the score-client.
Stage 1: SONG Upload¶
- Check that the SONG server is running:
./data/client/bin/sing status -p
- Upload the example VariantCall payload, which contains the metadata. The response will contain the uploadId:
./data/client/bin/sing upload -f ./example/exampleVariantCall.json
- Check the status of the upload, using the uploadId. Ensure the response has the state VALIDATED:
./data/client/bin/sing status -u <uploadId>
- Record or remember the uploadId from the response for the next phase.
Stage 2: SONG Saving and Manifest Generation¶
- Save or commit the finalized metadata. The response will contain the analysisId:
./data/client/bin/sing save -u <uploadId>
- Search for the saved analysis, and observe the field analysisState is set to UNPUBLISHED:
./data/client/bin/sing search -a <analysisId>
- Optionally, if you have jq installed, you can pipe the output of the search and filter out the analysisState field:
./data/client/bin/sing search -a <analysisId> | jq '.analysisState'
- Generate a manifest for the icgc-storage-client in Stage 3, with the files located in the ./example source directory:
sudo ./data/client/bin/sing manifest -a <analysisId> -f manifest.txt -d ./example
Stage 3: SCORE Upload¶
Upload the manifest file to the score-server (formerly the icgc-dcc-storage server) using the icgc-storage-client. This will upload the files specified in the exampleVariantCall.json payload, which are located in the ./example directory:
./data/storage-client/bin/icgc-storage-client upload --manifest manifest.txt
Stage 4: SONG Publish¶
- Using the same analysisId as before, publish it. Essentially, this is the handshake between the metadata stored in the SONG server (via the analysisId) and the files stored in the score-server (the files described by the analysisId):
./data/client/bin/sing publish -a <analysisId>
- Search the analysisId, pipe it to jq and filter for analysisState, and observe the analysis has finally been published!
./data/client/bin/sing search -a <analysisId> | jq '.analysisState'
Supported SDKs¶
SONG Python SDK¶
The SONG Python SDK is a simple python module that allows you to interact with a SONG server through Python, with a minimum of coding effort.
It lets you upload payloads synchronously or asynchronously, check their status and create analyses. From there, you can use the power of Python to process and analyze the data within those objects however you see fit.
Prerequisites¶
Python 3.6 is REQUIRED, since the SDK uses the dataclasses module.
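A minimal sketch of verifying the version requirement at runtime before importing the SDK:

```python
import sys

# The SDK relies on the dataclasses module, so check for Python 3.6+ up front.
if sys.version_info < (3, 6):
    raise RuntimeError("overture-song requires Python 3.6+, found " + sys.version.split()[0])
print("Python version OK:", sys.version.split()[0])
```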
Installation¶
The official SONG Python SDK is publicly hosted on PyPI. To install it, just run the command below:
pip install overture-song
Configuration¶
The SDK is configured through the ApiConfig class, imported from overture_song.model. An ApiConfig holds the server URL, study ID and access token, and is passed to the Api client imported from overture_song.client. The Configuration step of the Tutorial below shows a complete example.
Tutorial¶
This section demonstrates example usage of the overture_song sdk.
After completing this tutorial, you will have uploaded your first SONG metadata payload!
For the impatient, the code used below can be found in examples/example_upload.py.
Warning
Python 3.6 or higher is required.
Configuration¶
Create an ApiConfig object. This object contains the serverUrl, accessToken, and studyId that will be used to interact with the SONG API. In this example we will use https://song.cancercollaboratory.org for the serverUrl and 'ABC123' for the studyId. For the access token, please refer to Creating an Access Token.
from overture_song.model import ApiConfig
api_config = ApiConfig('https://song.cancercollaboratory.org', 'ABC123', <my_access_token>)
Next the main API client needs to be instantiated in order to interact with the SONG server.
from overture_song.client import Api
api = Api(api_config)
As a sanity check, ensure that the server is running. If the response is True, then you may proceed with the next section; otherwise the server is not running.
>>> api.is_alive()
True
Create a Study¶
If the studyId 'ABC123' does not exist, then the StudyClient must be instantiated in order to read and create studies.
First, create a study client:
from overture_song.client import StudyClient
study_client = StudyClient(api)
If the study associated with the payload does not exist, then create a Study entity:
from overture_song.entities import Study
if not study_client.has(api_config.study_id):
    study = Study.create(api_config.study_id, "myStudyName", "myStudyDescription", "myStudyOrganization")
    study_client.create(study)
Create a Simple Payload¶
Now that the study exists, you can create your first payload! In this example, a SequencingReadAnalysis will be created. It follows the SequencingRead JsonSchema.
See also
Similarly, for the VariantCallAnalysis, refer to the VariantCall JsonSchema.
Firstly, import all the entities to minimize the import statements.
from overture_song.entities import *
Next, create an example Donor entity:
donor = Donor()
donor.studyId = api_config.study_id
donor.donorGender = "male"
donor.donorSubmitterId = "dsId1"
donor.set_info("randomDonorField", "someDonorValue")
Create an example Specimen entity:
specimen = Specimen()
specimen.specimenClass = "Tumour"
specimen.specimenSubmitterId = "sp_sub_1"
specimen.specimenType = "Normal - EBV immortalized"
specimen.set_info("randomSpecimenField", "someSpecimenValue")
Create an example Sample entity:
sample = Sample()
sample.sampleSubmitterId = "ssId1"
sample.sampleType = "RNA"
sample.set_info("randomSample1Field", "someSample1Value")
Create 1 or more example File entities:
# File 1
file1 = File()
file1.fileName = "myFilename1.bam"
file1.studyId = api_config.study_id
file1.fileAccess = "controlled"
file1.fileMd5sum = "myMd51"
file1.fileSize = 1234561
file1.fileType = "VCF"
file1.set_info("randomFile1Field", "someFile1Value")
# File 2
file2 = File()
file2.fileName = "myFilename2.bam"
file2.studyId = api_config.study_id
file2.fileAccess = "controlled"
file2.fileMd5sum = "myMd52"
file2.fileSize = 1234562
file2.fileType = "VCF"
file2.set_info("randomFile2Field", "someFile2Value")
Create an example SequencingRead experiment entity:
# SequencingRead
sequencing_read_experiment = SequencingRead()
sequencing_read_experiment.aligned = True
sequencing_read_experiment.alignmentTool = "myAlignmentTool"
sequencing_read_experiment.pairedEnd = True
sequencing_read_experiment.insertSize = 0
sequencing_read_experiment.libraryStrategy = "WXS"
sequencing_read_experiment.referenceGenome = "GR37"
sequencing_read_experiment.set_info("randomSRField", "someSRValue")
Finally, use the SimplePayloadBuilder class along with the previously created entities to create a payload:
from overture_song.tools import SimplePayloadBuilder
builder = SimplePayloadBuilder(donor, specimen, sample, [file1, file2], sequencing_read_experiment)
payload = builder.to_dict()
Use a Custom AnalysisId¶
In some situations, the user may prefer to use a custom analysisId. If not specified in the payload, it is automatically generated by the SONG server during the Save the Analysis step. Although this tutorial uses the analysisId generated by the SONG server, a custom analysisId can be set as follows:
payload['analysisId'] = 'my_custom_analysis_id'
Upload the Payload¶
With the payload built, the data can now be uploaded to the SONG server for validation. There are 2 modes of validation:
- Synchronous - uploads are validated SYNCHRONOUSLY. Although this is the default mode, it can be selected explicitly by setting the kwarg is_async_validation to False in the upload method.
- Asynchronous - uploads are validated ASYNCHRONOUSLY. This allows the user to upload a batch of payloads. This mode can be selected by setting is_async_validation to True.
After calling the upload method, the payload will be sent to the SONG server for validation, and a response will be returned:
>>> api.upload(json_payload=payload, is_async_validation=False)
{
"status": "ok",
"uploadId": "UP-c49742d0-1fc8-4b45-9a1c-ea58d282ac58"
}
If the status field from the response is ok, the payload was successfully submitted to the SONG server for validation, and a randomly generated uploadId was returned as a receipt for the upload request.
Check the Status of the Upload¶
Before continuing, the previous upload's status must be checked using the status method, in order to ensure the payload was successfully validated. Using the previous uploadId, the status of the upload can be requested and will return the following response:
>>> api.status('UP-c49742d0-1fc8-4b45-9a1c-ea58d282ac58')
{
"analysisId": "",
"uploadId": "UP-c49742d0-1fc8-4b45-9a1c-ea58d282ac58",
"studyId": "ABC123",
"state": "VALIDATED",
"createdAt": [
2018,
2,
16,
0,
54,
31,
73774000
],
"updatedAt": [
2018,
2,
16,
0,
54,
31,
75476000
],
"errors": [
""
],
"payload": {
"analysisState": "UNPUBLISHED",
"sample": [
{
"info": {
"randomSample1Field": "someSample1Value"
},
"sampleSubmitterId": "ssId1",
"sampleType": "RNA",
"specimen": {
"info": {
"randomSpecimenField": "someSpecimenValue"
},
"specimenSubmitterId": "sp_sub_1",
"specimenClass": "Tumour",
"specimenType": "Normal - EBV immortalized"
},
"donor": {
"info": {
"randomDonorField": "someDonorValue"
},
"donorSubmitterId": "dsId1",
"studyId": "Study1",
"donorGender": "male"
}
}
],
"file": [
{
"info": {
"randomFile1Field": "someFile1Value"
},
"fileName": "myFilename1.bam",
"studyId": "Study1",
"fileSize": 1234561,
"fileType": "VCF",
"fileMd5sum": "myMd51",
"fileAccess": "controlled"
},
{
"info": {
"randomFile2Field": "someFile2Value"
},
"fileName": "myFilename2.bam",
"studyId": "Study1",
"fileSize": 1234562,
"fileType": "VCF",
"fileMd5sum": "myMd52",
"fileAccess": "controlled"
}
],
"analysisType": "sequencingRead",
"experiment": {
"info": {
"randomSRField": "someSRValue"
},
"aligned": true,
"alignmentTool": "myAlignmentTool",
"insertSize": 0,
"libraryStrategy": "WXS",
"pairedEnd": true,
"referenceGenome": "GR37"
}
}
}
In order to continue with the next section, the state field MUST have the value VALIDATED, which indicates the upload was validated and there were no errors. If there were errors, the state field would have the value VALIDATION_ERROR, and the errors field would contain details of the validation issues. If there is an error, the user can simply correct the payload, re-upload and check the status again.
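Acting on the state field can be sketched as follows; check_upload_state is an illustrative helper (not part of the SDK), and the trimmed response dict mirrors the status example above:

```python
def check_upload_state(status_response):
    """Raise if the upload is not VALIDATED. Illustrative helper, not part of the SDK."""
    state = status_response["state"]
    if state == "VALIDATED":
        return status_response["uploadId"]
    if state == "VALIDATION_ERROR":
        raise ValueError("Upload failed validation: {}".format(status_response.get("errors")))
    raise ValueError("Unexpected upload state: {}".format(state))

# Trimmed-down response mirroring the status example above.
response = {
    "uploadId": "UP-c49742d0-1fc8-4b45-9a1c-ea58d282ac58",
    "state": "VALIDATED",
    "errors": [""],
}
print(check_upload_state(response))  # UP-c49742d0-1fc8-4b45-9a1c-ea58d282ac58
```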
Save the Analysis¶
Once the upload is successfully validated, it must be saved using the save method. This generates the following response:
>>> api.save(status_response.uploadId, ignore_analysis_id_collisions=False)
{
"analysisId": "23c61f55-12b4-11e8-b46b-23a48c7b1324",
"status": "ok"
}
The value ok in the status field of the response indicates that an analysis was successfully created. The analysis will contain the same data as the payload, with the addition of server-side generated ids, which are generated by a centralized id server. By default, the request DOES NOT IGNORE analysisId collisions; however, by setting the save method parameter ignore_analysis_id_collisions to True, collisions will be ignored. This mechanism is considered an override and is heavily discouraged, however it is necessary considering the complexities associated with managing genomic data.
Observe the UNPUBLISHED Analysis¶
Verify the analysis is unpublished by observing the value of the analysisState field in the response of the get_analysis call. The value should be UNPUBLISHED. Also, observe that the SONG server generated a unique sampleId, specimenId, analysisId and objectId:
>>> api.get_analysis('23c61f55-12b4-11e8-b46b-23a48c7b1324')
{
"analysisType": "sequencingRead",
"info": {},
"analysisId": "23c61f55-12b4-11e8-b46b-23a48c7b1324",
"study": "ABC123",
"analysisState": "UNPUBLISHED",
"sample": [
{
"info": {
"randomSample1Field": "someSample1Value"
},
"sampleId": "SA599347",
"specimenId": "SP196154",
"sampleSubmitterId": "ssId1",
"sampleType": "RNA",
"specimen": {
"info": {
"randomSpecimenField": "someSpecimenValue"
},
"specimenId": "SP196154",
"donorId": "DO229595",
"specimenSubmitterId": "sp_sub_1",
"specimenClass": "Tumour",
"specimenType": "Normal - EBV immortalized"
},
"donor": {
"donorId": "DO229595",
"donorSubmitterId": "dsId1",
"studyId": "ABC123",
"donorGender": "male",
"info": {}
}
}
],
"file": [
{
"info": {
"randomFile1Field": "someFile1Value"
},
"objectId": "f553bbe8-876b-5a9c-a436-ff47ceef53fb",
"analysisId": "23c61f55-12b4-11e8-b46b-23a48c7b1324",
"fileName": "myFilename1.bam",
"studyId": "ABC123",
"fileSize": 1234561,
"fileType": "VCF",
"fileMd5sum": "myMd51 ",
"fileAccess": "controlled"
},
{
"info": {
"randomFile2Field": "someFile2Value"
},
"objectId": "6e2ee06b-e95d-536a-86b5-f2af9594185f",
"analysisId": "23c61f55-12b4-11e8-b46b-23a48c7b1324",
"fileName": "myFilename2.bam",
"studyId": "ABC123",
"fileSize": 1234562,
"fileType": "VCF",
"fileMd5sum": "myMd52 ",
"fileAccess": "controlled"
}
],
"experiment": {
"analysisId": "23c61f55-12b4-11e8-b46b-23a48c7b1324",
"aligned": true,
"alignmentTool": "myAlignmentTool",
"insertSize": 0,
"libraryStrategy": "WXS",
"pairedEnd": true,
"referenceGenome": "GR37",
"info": {
"randomSRField": "someSRValue"
}
}
}
Generate the Manifest¶
With an analysis created, a manifest file must be generated using the ManifestClient, the analysisId from the previously generated analysis, a path to the directory containing the files to be uploaded, and an output file path. If the source_dir does not exist, or if the files to be uploaded are not present in that directory, then an error will be shown. Calling the write_manifest method generates a Manifest object and writes it to a file. This step is required for the next section, which involves uploading the object files to the storage server.
from overture_song.client import ManifestClient
manifest_client = ManifestClient(api)
source_dir = "/path/to/directory/containing/files"
manifest_file_path = './manifest.txt'
manifest_client.write_manifest('23c61f55-12b4-11e8-b46b-23a48c7b1324', source_dir, manifest_file_path)
After successful execution, a manifest.txt file will be generated with the following contents:
23c61f55-12b4-11e8-b46b-23a48c7b1324
f553bbe8-876b-5a9c-a436-ff47ceef53fb /path/to/directory/containing/files/myFilename1.bam myMd51
6e2ee06b-e95d-536a-86b5-f2af9594185f /path/to/directory/containing/files/myFilename2.bam myMd52
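The manifest format is simple: the first line carries the analysisId, and each subsequent line holds an objectId, file path and md5 checksum separated by whitespace. A minimal parsing sketch, where parse_manifest is an illustrative helper rather than part of the SDK:

```python
def parse_manifest(text):
    """Parse manifest.txt: first line is the analysisId, then one (objectId, path, md5) row per file.
    Illustrative helper, not part of the SDK."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    analysis_id = lines[0].strip()
    files = [tuple(line.split()) for line in lines[1:]]
    return analysis_id, files

# The manifest contents shown above.
manifest_text = """\
23c61f55-12b4-11e8-b46b-23a48c7b1324
f553bbe8-876b-5a9c-a436-ff47ceef53fb /path/to/directory/containing/files/myFilename1.bam myMd51
6e2ee06b-e95d-536a-86b5-f2af9594185f /path/to/directory/containing/files/myFilename2.bam myMd52
"""

analysis_id, files = parse_manifest(manifest_text)
print(analysis_id)   # 23c61f55-12b4-11e8-b46b-23a48c7b1324
print(len(files))    # 2
print(files[0][2])   # myMd51
```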
Upload the Object Files¶
Upload the object files specified in the payload, using the icgc-storage-client and the manifest file. This will upload the files specified in the manifest.txt file, which should all be located in the same directory.
For Collaboratory - Toronto:
./bin/icgc-storage-client --profile collab upload --manifest ./manifest.txt
For AWS - Virginia:
./bin/icgc-storage-client --profile aws upload --manifest ./manifest.txt
See also
Refer to the SCORE Client section for more information about installation, configuration and usage.
Publish the Analysis¶
Using the same analysisId as before, publish it. Essentially, this is the handshake between the metadata stored in the SONG server (via the analysisId) and the object files stored in the storage server (the files described by the analysisId):
>>> api.publish('23c61f55-12b4-11e8-b46b-23a48c7b1324')
AnalysisId 23c61f55-12b4-11e8-b46b-23a48c7b1324 successfully published
Observe the PUBLISHED Analysis¶
Finally, verify the analysis is published by observing the value of the analysisState field in the response of the get_analysis call. If the value is PUBLISHED, then congratulations on your first metadata upload!
>>> api.get_analysis('23c61f55-12b4-11e8-b46b-23a48c7b1324')
{
"analysisType": "sequencingRead",
"info": {},
"analysisId": "23c61f55-12b4-11e8-b46b-23a48c7b1324",
"study": "ABC123",
"analysisState": "PUBLISHED",
"sample": [
{
"info": {
"randomSample1Field": "someSample1Value"
},
"sampleId": "SA599347",
"specimenId": "SP196154",
"sampleSubmitterId": "ssId1",
"sampleType": "RNA",
"specimen": {
"info": {
"randomSpecimenField": "someSpecimenValue"
},
"specimenId": "SP196154",
"donorId": "DO229595",
"specimenSubmitterId": "sp_sub_1",
"specimenClass": "Tumour",
"specimenType": "Normal - EBV immortalized"
},
"donor": {
"donorId": "DO229595",
"donorSubmitterId": "dsId1",
"studyId": "ABC123",
"donorGender": "male",
"info": {}
}
}
],
"file": [
{
"info": {
"randomFile1Field": "someFile1Value"
},
"objectId": "f553bbe8-876b-5a9c-a436-ff47ceef53fb",
"analysisId": "23c61f55-12b4-11e8-b46b-23a48c7b1324",
"fileName": "myFilename1.bam",
"studyId": "ABC123",
"fileSize": 1234561,
"fileType": "VCF",
"fileMd5sum": "myMd51",
"fileAccess": "controlled"
},
{
"info": {
"randomFile2Field": "someFile2Value"
},
"objectId": "6e2ee06b-e95d-536a-86b5-f2af9594185f",
"analysisId": "23c61f55-12b4-11e8-b46b-23a48c7b1324",
"fileName": "myFilename2.bam",
"studyId": "ABC123",
"fileSize": 1234562,
"fileType": "VCF",
"fileMd5sum": "myMd52",
"fileAccess": "controlled"
}
],
"experiment": {
"analysisId": "23c61f55-12b4-11e8-b46b-23a48c7b1324",
"aligned": true,
"alignmentTool": "myAlignmentTool",
"insertSize": 0,
"libraryStrategy": "WXS",
"pairedEnd": true,
"referenceGenome": "GR37",
"info": {
"randomSRField": "someSRValue"
}
}
}
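The verification step can be scripted: parse the get_analysis response and check the analysisState field. A minimal sketch against a trimmed copy of the response above:

```python
import json

# Trimmed copy of the get_analysis response shown above.
response = json.loads("""
{
  "analysisId": "23c61f55-12b4-11e8-b46b-23a48c7b1324",
  "study": "ABC123",
  "analysisState": "PUBLISHED"
}
""")

def is_published(analysis: dict) -> bool:
    # An analysis starts UNPUBLISHED and only counts as released
    # once its state transitions to PUBLISHED.
    return analysis.get("analysisState") == "PUBLISHED"

assert is_published(response)
```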
Reference¶
The overture_song module implements a simple Python REST client that can be used to access a SONG server.
client¶
entities¶
class overture_song.entities.Analysis(analysisId: str = None, study: str = None, analysisState: str = 'UNPUBLISHED', sample: List[overture_song.entities.CompositeEntity] = None, file: List[overture_song.entities.File] = None)
    Bases: overture_song.entities.Entity
    Attributes: analysisId = None, analysisState = 'UNPUBLISHED', file = None, sample = None, study = None

class overture_song.entities.CompositeEntity(info: dict = None, sampleId: str = None, specimenId: str = None, sampleSubmitterId: str = None, sampleType: str = None, specimen: Type[overture_song.entities.Specimen] = None, donor: Type[overture_song.entities.Donor] = None)
    Bases: overture_song.entities.Sample
    Attributes: donor = None, specimen = None

class overture_song.entities.Donor(info: dict = None, donorId: str = None, donorSubmitterId: str = None, studyId: str = None, donorGender: str = None)
    Bases: overture_song.entities.Metadata, overture_song.validation.Validatable
    Attributes: donorGender = None, donorId = None, donorSubmitterId = None, studyId = None

class overture_song.entities.File(info: dict = None, objectId: str = None, analysisId: str = None, fileName: str = None, studyId: str = None, fileSize: int = -1, fileType: str = None, fileMd5sum: str = None, fileAccess: str = None)
    Bases: overture_song.entities.Metadata, overture_song.validation.Validatable
    classmethod create(fileName, fileSize, fileType, fileMd5sum, fileAccess, studyId=None, analysisId=None, objectId=None, info=None)
    Attributes: analysisId = None, fileAccess = None, fileMd5sum = None, fileName = None, fileSize = -1, fileType = None, objectId = None, studyId = None

class overture_song.entities.Metadata(info: dict = None)
    Bases: overture_song.entities.Entity
    Attributes: info = None

class overture_song.entities.Sample(info: dict = None, sampleId: str = None, specimenId: str = None, sampleSubmitterId: str = None, sampleType: str = None)
    Bases: overture_song.entities.Metadata, overture_song.validation.Validatable
    Attributes: sampleId = None, sampleSubmitterId = None, sampleType = None, specimenId = None

class overture_song.entities.SequencingRead(info: dict = None, analysisId: str = None, aligned: bool = None, alignmentTool: str = None, insertSize: int = None, libraryStrategy: str = None, pairedEnd: bool = None, referenceGenome: str = None)
    Bases: overture_song.entities.Experiment, overture_song.validation.Validatable
    classmethod create(aligned, alignmentTool, insertSize, libraryStrategy, pairedEnd, referenceGenome, analysisId=None)
    Attributes: aligned = None, alignmentTool = None, analysisId = None, insertSize = None, libraryStrategy = None, pairedEnd = None, referenceGenome = None

class overture_song.entities.SequencingReadAnalysis(analysisId: str = None, study: str = None, analysisState: str = 'UNPUBLISHED', sample: List[overture_song.entities.CompositeEntity] = None, file: List[overture_song.entities.File] = None, analysisType: str = 'sequencingRead', experiment: Type[overture_song.entities.SequencingRead] = None)
    Bases: overture_song.entities.Analysis, overture_song.validation.Validatable
    classmethod create(experiment, sample, file, analysisId=None, study=None, analysisState='UNPUBLISHED', info=None)
    Attributes: analysisType = 'sequencingRead', experiment = None

class overture_song.entities.Specimen(info: dict = None, specimenId: str = None, donorId: str = None, specimenSubmitterId: str = None, specimenClass: str = None, specimenType: str = None)
    Bases: overture_song.entities.Metadata, overture_song.validation.Validatable
    classmethod create(donorId, specimenSubmitterId, specimenClass, specimenType, specimenId=None, info=None)
    Attributes: donorId = None, specimenClass = None, specimenId = None, specimenSubmitterId = None, specimenType = None

class overture_song.entities.Study(info: dict = None, studyId: str = None, name: str = None, organization: str = None, description: str = None)
    Bases: overture_song.entities.Metadata, overture_song.validation.Validatable
    Attributes: description = None, name = None, organization = None, studyId = None

class overture_song.entities.VariantCall(info: dict = None, analysisId: str = None, variantCallingTool: str = None, matchedNormalSampleSubmitterId: str = None)
    Bases: overture_song.entities.Experiment, overture_song.validation.Validatable
    Attributes: analysisId = None, matchedNormalSampleSubmitterId = None, variantCallingTool = None

class overture_song.entities.VariantCallAnalysis(analysisId: str = None, study: str = None, analysisState: str = 'UNPUBLISHED', sample: List[overture_song.entities.CompositeEntity] = None, file: List[overture_song.entities.File] = None, analysisType: str = 'variantCall', experiment: Type[overture_song.entities.VariantCall] = None)
    Bases: overture_song.entities.Analysis, overture_song.validation.Validatable
    classmethod create(experiment, sample, file, analysisId=None, study=None, analysisState='UNPUBLISHED', info=None)
    Attributes: analysisType = 'variantCall', experiment = None
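CompositeEntity is a Sample that additionally carries its parent Specimen and Donor, mirroring the nesting seen in the get_analysis response earlier. A standalone sketch of that nesting using plain dataclasses (illustrative only, not the overture_song classes themselves, and with only a few of the documented fields):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Metadata:
    # Every entity carries a schema-less JSON info field.
    info: dict = field(default_factory=dict)

@dataclass
class Donor(Metadata):
    donorId: Optional[str] = None
    donorGender: Optional[str] = None

@dataclass
class Specimen(Metadata):
    specimenId: Optional[str] = None
    specimenClass: Optional[str] = None

@dataclass
class Sample(Metadata):
    sampleId: Optional[str] = None
    sampleType: Optional[str] = None

@dataclass
class CompositeEntity(Sample):
    # A sample plus its parent specimen and donor, as in the reference above.
    specimen: Optional[Specimen] = None
    donor: Optional[Donor] = None

sample = CompositeEntity(
    sampleId="SA599347",
    sampleType="RNA",
    specimen=Specimen(specimenId="SP196154", specimenClass="Tumour"),
    donor=Donor(donorId="DO229595", donorGender="male"),
)
```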
model¶

class overture_song.model.ApiConfig(server_url: str, study_id: str, access_token: str, debug: bool = False)
    Configuration object for the SONG Api.
    Parameters:
        server_url (str) – URL of a running song-server
        study_id (str) – studyId to interact with
        access_token (str) – access token used to authorize the song-server API
        debug (bool) – enable debug mode
    Attributes: debug = False

class overture_song.model.FileUpdateRequest
    Mutable request object used to update file data.
    Parameters:
        file_md5 (str) – MD5 checksum value to update
        file_size (int) – file size (bytes) to update
        file_access (str) – access type to update
        file_info (dict) – JSON info metadata to update
    Attributes: fileAccess = None, fileMd5sum = None, fileSize = None, info = None

class overture_song.model.Manifest(analysis_id, manifest_entries=None)
    Object representing the contents of a manifest file.
    Parameters:
        analysis_id (str) – analysisId associated with a collection of ManifestEntry
        manifest_entries (List[ManifestEntry] or None) – optionally initialize the manifest with an existing list of ManifestEntry
    add_entry(manifest_entry)
        Add a ManifestEntry to this manifest.
        Parameters: manifest_entry (ManifestEntry) – entry to add
        Returns: None
    write(output_file_path, overwrite=False)
        Write this manifest to a file.
        Parameters:
            output_file_path (str) – output file to write to
            overwrite (bool) – if true, overwrite the file if it exists
        Raises: SongClientException – raised if the file exists and overwrite is False
        Returns: None

class overture_song.model.ManifestEntry(fileId: str, fileName: str, md5sum: str)
    Represents a line in the manifest file pertaining to a file. The string representation of this object is the TSV of the three field values.
    Parameters:
        fileId (str) – objectId of the file
        fileName (str) – name of the file; should not include directories
        md5sum (str) – MD5 checksum of the file
    classmethod create_manifest_entry(input_dir, data)
        Creates a ManifestEntry object.
        Parameters: data – any object with members named 'objectId', 'fileName' and 'fileMd5sum'
        Returns: ManifestEntry object

class overture_song.model.ServerErrors
    Server error definitions used for classifying SongErrors.
    Attributes: STUDY_ID_DOES_NOT_EXIST = 1

exception overture_song.model.SongError(errorId: str, httpStatusName: str, httpStatusCode: int, message: str, requestUrl: str, debugMessage: str, timestamp: str, stackTrace: tuple = <factory>)
    Object containing data related to a song server error.
    Parameters:
        errorId (str) – id of the song server error; gives more meaning to errors than HTTP status codes alone
        httpStatusName (str) – standard HTTP status name for the HTTP status code
        httpStatusCode (int) – standard HTTP status code
        message (str) – text describing the error
        requestUrl (str) – the request URL that caused this error
        debugMessage (str) – additional text describing the error
        timestamp (str) – epoch time of when this error occurred
        stackTrace (tuple) – server stack trace of this error
    classmethod create_song_error(data)
        Creates a new SongError object; used to convert a json/dict server error response to a Python object.
        Parameters: data (dict) – input dictionary containing all the fields necessary to create a song server error
        Return type: SongError
    classmethod get_field_names()
        Get the field names associated with a SongError.
        Return type: List[str]
    classmethod is_song_error(data)
        Determine whether the input dictionary contains all the fields defined in a SongError.
        Parameters: data (dict) – input dictionary to check
        Returns: true if the data contains all the fields, otherwise false
        Return type: bool
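Given the field names documented for SongError, is_song_error amounts to a subset check on the dictionary's keys. A standalone sketch of that logic (looks_like_song_error is illustrative, not the library's implementation; the example error values are hypothetical):

```python
# Field names documented for SongError above.
SONG_ERROR_FIELDS = {
    "errorId", "httpStatusName", "httpStatusCode", "message",
    "requestUrl", "debugMessage", "timestamp", "stackTrace",
}

def looks_like_song_error(data: dict) -> bool:
    # True only if every documented SongError field is present.
    return SONG_ERROR_FIELDS.issubset(data.keys())

# Hypothetical server error response for illustration.
err = {
    "errorId": "study.id.does.not.exist",
    "httpStatusName": "NOT_FOUND",
    "httpStatusCode": 404,
    "message": "The studyId 'XYZ' does not exist",
    "requestUrl": "https://song.example.org/studies/XYZ",
    "debugMessage": "",
    "timestamp": "1518023455",
    "stackTrace": (),
}
```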
rest¶

class overture_song.rest.ObjectRest(*args, **kwargs)
    Bases: overture_song.rest.Rest
tools¶

class overture_song.tools.BatchUploader(server_url, access_token, payload_dir, debug=False)
    Bases: object

class overture_song.tools.FileUploadClient(api, filename, is_async_validation=False, ignore_analysis_id_collisions=False)
    Bases: object

class overture_song.tools.FileUploadState
    Bases: enum.Enum
    An enumeration of upload states:
        NOT_UPLOADED = 0
        SUBMITTED = 1
        VALIDATED = 2
        SAVED = 3
        PUBLISHED = 4
        UNKNOWN_ERROR = -1
        PUBLISH_ERROR = -2
        SAVE_ERROR = -3
        VALIDATION_ERROR = -4
        STATUS_ERROR = -5
        UPLOAD_ERROR = -6

class overture_song.tools.SimplePayloadBuilder(donor: Type[overture_song.entities.Donor], specimen: Type[overture_song.entities.Specimen], sample: Type[overture_song.entities.Sample], files: List[overture_song.entities.File], experiment: object, analysisId: str = None)
    Bases: object
    Attributes: analysisId = None
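SimplePayloadBuilder assembles a donor, specimen, sample, files, and an experiment into an upload payload. The exact payload shape is governed by the server's JsonSchema; purely as an illustration, a rough standalone sketch (build_payload is hypothetical) that mirrors the nested analysis structure shown in the get_analysis response earlier:

```python
def build_payload(donor, specimen, sample, files, experiment,
                  analysis_type="sequencingRead", analysis_id=None):
    # Nest the specimen and donor under the sample, as CompositeEntity does.
    composite = dict(sample)
    composite["specimen"] = dict(specimen)
    composite["donor"] = dict(donor)
    payload = {
        "analysisType": analysis_type,
        "sample": [composite],
        "file": list(files),
        "experiment": dict(experiment),
    }
    if analysis_id is not None:
        payload["analysisId"] = analysis_id
    return payload

payload = build_payload(
    donor={"donorSubmitterId": "dsId1", "donorGender": "male"},
    specimen={"specimenSubmitterId": "sp_sub_1", "specimenClass": "Tumour"},
    sample={"sampleSubmitterId": "ssId1", "sampleType": "RNA"},
    files=[{"fileName": "myFilename1.bam", "fileSize": 1234561,
            "fileMd5sum": "myMd51", "fileAccess": "controlled"}],
    experiment={"aligned": True, "alignmentTool": "myAlignmentTool"},
)
```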
utils¶
Frequently Asked Questions¶
How can I find the latest official release via command line?¶
Official releases can alternatively be found via command line as follows:
- Execute an unauthenticated GitHub API request (rate limited to 60 requests/hour) using curl and jq:
curl --silent "https://api.github.com/repos/overture-stack/SONG/releases" | jq '.[].tag_name | match("^song-\\d+\\.\\d+\\.\\d+$") | .string' | head -1 | xargs echo
OR
- Execute an authenticated GitHub API request (rate limited to 5000 requests/hour) using curl and jq:
curl --silent -H "Authorization: Bearer $MY_GITHUB_OAUTH_TOKEN" "https://api.github.com/repos/overture-stack/SONG/releases" | jq '.[].tag_name | match("^song-\\d+\\.\\d+\\.\\d+$") | .string' | head -1 | xargs echo
Contribute¶
If you’d like to contribute to this project, it’s hosted on GitHub (overture-stack/SONG).