SONG Documentation

Introduction

What is SONG?

SONG is a robust metadata and validation system used to quickly and reliably track genome metadata scattered across multiple cloud storage systems. In the field of genomics and bioinformatics, metadata managed by simple solutions such as spreadsheets and text files requires significant time and effort to maintain and to ensure the data is reliable. With several users and thousands of genomic files, tracking the state of metadata and their associations can become a nightmare. The purpose of SONG is to minimize human intervention by imposing rules and structure on user uploads, which in turn produces high quality and reliable metadata with a minimal amount of effort. SONG is one of many products provided by Overture and is completely open-source and free for everyone to use.

See also

For additional information on other products in the Overture stack, please visit https://overture.bio


Features

  • Synchronous and asynchronous metadata validation using JsonSchema
  • Strictly enforced data relationships and fields
  • Optional schema-less JSON info fields for user specific metadata
  • Standard REST API that is easy to understand and work with
  • Simple and fast metadata searching
  • Export payloads for SONG mirroring
  • Clear and concise error handling
  • ACL security using OAuth2 and scopes based on study codes
  • Unifies metadata with object data stored in SCORE
  • Built-in Swagger UI for API interaction

Data Submission Workflow

The data submission workflow can be separated into 4 main stages:

  1. Metadata Upload (SONG)
  2. Metadata Saving (SONG)
  3. Object data Upload (SCORE)
  4. Publishing Metadata (SONG)

The following diagram summarizes the steps involved in a successful data submission using SONG and SCORE:

_images/song-workflow.svg

Projects Using SONG

_images/song_projects_static_map.png

Legend:

Getting Started

The easiest way to understand SONG is to simply use it! Below is a short list of different ways to get started interacting with SONG.

Tutorial using a CLI with Docker for SONG

The Docker for SONG tutorial is a great way to spin-up SONG and all its dependent services using Docker on your host machine. Use this if you want to play with SONG locally. Refer to the Docker for SONG documentation.

Tutorial using the Python SDK with SONG

The SONG Python SDK Tutorial shows how to use the Python client module to interact with a running SONG server. Use it against one of the Projects Using SONG, or in combination with Docker for SONG. For more information about the Python SDK, refer to the SONG Python SDK documentation.

Play with the REST API from your browser

If you want to play with SONG from your browser, simply visit the Swagger UI for each server:

  1. Cancer Collaboratory - Toronto: https://song.cancercollaboratory.org/swagger-ui.html
  2. AWS - Virginia: https://virginia.song.icgc.org/swagger-ui.html

See also

For more information about user access, refer to the User Access documentation.

Deploy SONG to Production

If you want to deploy SONG onto a server, refer to the Deploying a SONG Server in Production documentation.

License

Copyright (c) 2019. Ontario Institute for Cancer Research

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with this program. If not, see <https://www.gnu.org/licenses/>.

User Access

DACO Authentication

SONG servers use the auth.icgc.org OAuth2 authorization service to authorize secure API requests. In order to create the necessary access tokens to interact with the song-python-sdk and the SONG server, the user must have DACO access. For more information about obtaining DACO access, please visit the instructions for DACO Cloud Access.

OAuth2 Authorization

With proper DACO access, the user can create an access token, using the Access Tokens and Token Manager instructions.

For each cloud environment, there is a specific authorization scope that is needed:

SCORe Client

The SCORe client (formerly the icgc-storage-client) is used to upload and download object data to and from the SCORe Server.

See also

For more information about SCORE, refer to https://www.overture.bio/products/score

Installation

For installation, please see Installing SCORe client from Tarball instructions.

Configuration

For configuration, after un-archiving the tarball, modify the ./conf/application.properties by adding the line:

accessToken=<my_access_token>

where the accessToken has the appropriate scope.

Note

There are a few storage servers available for DACO users, and each has its own required upload scope:

Usage

For more information about the usage of the client, refer to SCORe Client Usage documentation.

Deploying a SONG Server in Production

The following section describes how to install, configure and run the SONG server in production.

Prerequisites

The following software dependencies are required in order to run the server:

  • Bash Shell
  • Java 8 or higher
  • Postgres database

Note

Only a Postgres database can be used, since the SONG server relies on Postgres-specific features.

Official Releases

Official SONG releases can be found here. The releases follow the semantic versioning specification and contain notes describing bug fixes, new features, enhancements and breaking changes, as well as links to downloads and change logs. All official SONG releases are tagged in the format $COMPONENT-$VERSION, where the $COMPONENT portion follows the regex ^[a-z-]+$ and the $VERSION portion follows ^\d+\.\d+\.\d+$. For the SONG server, the tag format has the regex ^song-\d+\.\d+\.\d+$, for example song-1.0.0.
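The tag convention above can be checked mechanically. The following Python sketch (the parse_release_tag helper is ours for illustration, not part of SONG) splits a release tag into its component and version parts:

```python
import re

# Tag format described above: $COMPONENT-$VERSION, where the component
# matches ^[a-z-]+$ and the version matches ^\d+\.\d+\.\d+$
TAG_PATTERN = re.compile(r'^([a-z-]+)-(\d+\.\d+\.\d+)$')

def parse_release_tag(tag):
    """Split an official release tag into (component, version), or raise."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"not a valid release tag: {tag!r}")
    return match.group(1), match.group(2)

print(parse_release_tag("song-1.0.0"))         # ('song', '1.0.0')
print(parse_release_tag("song-server-1.2.3"))  # ('song-server', '1.2.3')
```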

Installation

Once the desired release tag and therefore $VERSION are known, the corresponding distribution can be downloaded using the command:

curl "https://artifacts.oicr.on.ca/artifactory/dcc-release/bio/overture/song-server/$VERSION/song-server-$VERSION-dist.tar.gz" -Ls -o song-server-$VERSION-dist.tar.gz

This distribution contains the default configuration and jars for running the server. To unarchive, run the command:

tar zxvf song-server-$VERSION-dist.tar.gz

Configuration

Server

By default, the SONG server distribution is configured to run in secure production mode. The server can easily be configured by creating the file ./conf/application-secure.properties with the following contents:

################################
#     SONG Server Config       #
################################

server.port=8080

################################
#     OAuth2 Server Config     #
################################

# Scope prefix used to authorize requests to the SONG server.
# For example, using the configurations below, the User-Agent's
# access token would need to have collab.upload scope in order to
# complete an authorized request
auth.server.prefix=collab
auth.server.suffix=upload

# Endpoint to validate OAuth2 tokens
auth.server.url=https://auth.icgc.org/oauth/check_token

auth.server.clientId=<auth-client-id>
auth.server.clientSecret=<auth-client-secret>


################################
#       ID Server Config       #
################################

# URL of the ID server
id.idUrl=https://id.icgc.org

# Application level access token used to interact with the ID server.
# The access token must have id.create scope
id.authToken=<id-server-access-token>

# Enable to use a real ID server. If false, an
# in-memory id server is used (use only for testing)
id.realIds=true

################################
#   Postgres Database Config   #
################################

spring.datasource.url=jdbc:postgresql://localhost:5432/song?stringtype=unspecified
spring.datasource.username=<my-db-username>
spring.datasource.password=<my-db-password>

# Enable flyway to manage database migrations automatically
spring.flyway.enabled=true
spring.flyway.locations=classpath:db/migration

################################
#     SCORE Server Config      #
################################

# URL used to ensure files exist in the score server
score.url=https://storage.cancercollaboratory.org

# Application level access token used internally by the SONG server to download
# additional file metadata from the SCORE server. This access token must have the
# correct download scope in order to download from SCORE. In the case of collab,
# it would be collab.download
score.accessToken=<score-access-token-with-download-scope>

The example file above configures the server to use the id.icgc.org id service, the auth.icgc.org auth service, and the storage.cancercollaboratory.org SCORE service with a local Postgres database; however, any similar service can be used. For example, the Docker for SONG Microservice Architecture uses a different implementation of an OAuth2 server.
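As a quick sanity check before starting the server, the presence of the keys used in the example above can be verified with a short script. This is a convenience sketch, not part of the SONG distribution, and the set of "required" keys below is simply drawn from the example configuration:

```python
# Keys from the example configuration above that the server needs to start.
REQUIRED_KEYS = {
    "server.port",
    "auth.server.url",
    "id.idUrl",
    "spring.datasource.url",
    "score.url",
}

def missing_config_keys(properties_text):
    """Return the required keys absent from a Java-style .properties string."""
    present = set()
    for line in properties_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments; keys are everything before the first '='
        if line and not line.startswith("#") and "=" in line:
            present.add(line.split("=", 1)[0].strip())
    return REQUIRED_KEYS - present

print(missing_config_keys("server.port=8080\nscore.url=https://example.org"))
```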

Database

If the user chooses to host their own SONG server database, it can easily be initialized with a few commands. As of song-1.5.0, SONG server database migrations are managed by Flyway. When upgrading the SONG server version, a Flyway migration must be run.

The following steps show how to create an empty database, and migrate a new or existing database using Flyway.

Migrating a Database

This scenario is relevant to users installing a SONG server for the first time, or for those upgrading the SONG server to a newer version.

If the database doesn't exist yet, a Flyway migration can easily be run on a newly created Postgres database using the example commands below, where the database user is postgres, the password is password, the database name is song, and the database is listening on localhost:8082.

1. Create an empty database with password and user

Skip this step and move to step 2 if the database already exists.

# Create an empty database called "song" with user "postgres"
sudo -u postgres psql -c "CREATE DATABASE song;"

# Set the password "myNewPassword" for the user "postgres"
sudo -u postgres psql -c "ALTER USER postgres WITH PASSWORD 'myNewPassword';"

2. Run a flyway migration on the empty or existing database for a particular SONG server version.

This step should be run initially on an empty database or when upgrading the SONG server version. In either case, the same commands should be executed:

# Clone the SONG repository for version "song-X.X.X"
git clone --branch song-X.X.X https://github.com/overture-stack/song

# Run the migration on the empty database "song" for version "song-X.X.X"
cd song
./mvnw -pl song-server flyway:migrate \
   -Dflyway.url=jdbc:postgresql://localhost:8082/song?stringtype=unspecified \
   -Dflyway.user=postgres \
   -Dflyway.password=password \
   -Dflyway.locations=db/migration

Running as a Service

Although the SONG server distribution can be run as a standalone application, it must then be started and stopped manually by the user. For a long-running server, a sudden power loss or hard reboot would mean the standalone application has to be restarted by hand. If the SONG server distribution is run as a service instead, the OS is responsible for automatically restarting it upon reboot. For this reason, the distribution should be configured as a service that is always started on boot.

Linux (SysV)

Assuming the directory path of the distribution is $SONG_SERVER_HOME, the following steps will register the SONG server as a SysV service on any Linux host supporting SysV and the Prerequisites, and configure it to start on boot.

# Register the SONG service
sudo ln -s $SONG_SERVER_HOME/bin/song-server /etc/init.d/song-server

# Start on boot (defaults)
sudo update-rc.d song-server defaults

It can also be managed manually using several commands:

# Start the service
sudo service song-server start

# Stop the service
sudo service song-server stop

# Restart the service
sudo service song-server restart

Docker for SONG

Introduction

Warning

Docker for SONG is meant to demonstrate the configuration and usage of SONG, and is NOT INTENDED FOR PRODUCTION. If you decide to ignore this warning and use this in any public or production environment, please remember to change the passwords, accessKeys, and secretKeys.

What is Docker for SONG?

Important Features

  • Turn-key bring up of SONG, SCORE, dcc-id and the dcc-auth services
  • Completely configurable via docker-compose environment variables (e.g. ports, JMX ports, hosts, credentials, and other important data fields). Values are injected into configurations using a custom Python script
  • Data from databases (song-db and id-db) and auth service are saved in volumes.
  • Logs from the song-server, score-server and dcc-id-server are mounted to the docker host for easy viewing via the ./logs directory
  • SCORE and SONG clients are automatically downloaded, configured and mounted to the docker host via the ./data directory
  • Minio (s3 object storage) data is also mounted via the ./data directory. Files can be uploaded by simply copying into ./data/minio
  • Uses base-ubuntu and base-db images to minimize pulling and building of docker images, and maximize reuse
  • If you decide to go to production, the databases from the volumes can be easily dumped, and the data from minio can be uploaded directly

Bonus Features

The Minio and OAuth2 services can be managed using their UIs!

  1. Minio UI
  2. OAuth2 UI

Microservice Architecture

  • Each box represents a docker container, and the lines connecting them indicate a TCP/IP connection.
  • Each Postgres database is its own docker container.
  • score-client and song-client are command line tools and used locally. They are used to communicate with the score-server and song-server, respectively

Note

The DCC-Storage Server is now the SCORE Server.

_images/song-docker-service-architecture.svg

Prerequisites

Mandatory

  • Docker version 17.09.0-ce or higher
  • Linux docker host machine (cannot run on Docker for Mac or Docker for Windows)
  • Docker-compose version 1.16.1 or higher
  • Ports 8080 to 8089 on localhost must be unused

Optional

  • jq for json formatting and grepping (install via apt install jq)

Getting Docker for SONG

In order to run the Docker for SONG, the latest release must be downloaded. Before downloading, the latest release tag must be found.

Find the Latest Official Release Tag

To find the latest official release tag, refer to Official Releases.

Download

Using the desired release tag, the docker repository can be downloaded via:

Download ZIP
curl -Ls "https://github.com/overture-stack/SONG/archive/$RELEASE_TAG.zip" -o $RELEASE_TAG.zip
Download TAR.GZ
curl -Ls "https://github.com/overture-stack/SONG/archive/$RELEASE_TAG.tar.gz" -o $RELEASE_TAG.tar.gz
Download using GIT
git clone --branch $RELEASE_TAG https://github.com/overture-stack/SONG.git $RELEASE_TAG

Build and Run

From the root song directory, run:

docker-compose build
docker-compose up -d

Note

An internet connection is only needed for the docker-compose build command, which may take several minutes to complete. No external services are required for the docker-compose up command.

Configuration

  • All contained within the docker-compose.yml
  • If a port is occupied on the localhost, it can be reconfigured by changing the value of the environment variable defining it (e.g. SERVER_PORT, PGPORT, ID_PORT, etc.)
  • Default song-docker credentials and information are stored in the credentials.txt file.

Tutorial

The following tutorial executes the complete data submission workflow in 4 stages using the Java CLI Client which is automatically configured in the song-docker-demo/data/client directory. This tutorial assumes current working directory is the song-docker-demo directory.

Note

The icgc-storage-client has been renamed to score-client.

Stage 1: SONG Upload

  1. Check that the SONG server is running:
./data/client/bin/sing status -p
  2. Upload the example VariantCall payload, which contains the metadata. The response will contain the uploadId:
./data/client/bin/sing upload -f ./example/exampleVariantCall.json
  3. Check the status of the upload, using the uploadId. Ensure the response has the state VALIDATED:
./data/client/bin/sing status -u <uploadId>
  4. Record or remember the uploadId from the response for the next stage.

Stage 2: SONG Saving and Manifest Generation

  1. Save or commit the finalized metadata. The response will contain the analysisId:
./data/client/bin/sing save -u <uploadId>
  2. Search for the saved analysis, and observe the field analysisState is set to UNPUBLISHED:
./data/client/bin/sing search -a <analysisId>
  3. Optionally, if you have jq installed, you can pipe the output of the search and filter out the analysisState field:
./data/client/bin/sing search -a <analysisId> | jq '.analysisState'
  4. Generate a manifest for the score-client in Stage 3, with the files located in the ./example source directory:
sudo ./data/client/bin/sing manifest -a <analysisId> -f manifest.txt -d ./example

Stage 3: SCORE Upload

Upload the manifest file to the score-server (formerly the icgc-dcc-storage server) using the score-client. This will upload the files specified in the exampleVariantCall.json payload, which are located in the ./example directory:

./data/storage-client/bin/score-client upload --manifest manifest.txt

Stage 4: SONG Publish

  1. Using the same analysisId as before, publish it. Essentially, this is the handshake between the metadata stored in the SONG server (via the analysisId) and the files stored in the score-server (the files described by the analysisId):
./data/client/bin/sing publish -a <analysisId>
  2. Search the analysisId, pipe it to jq and filter for analysisState, and observe the analysis has finally been published!
./data/client/bin/sing search -a <analysisId> | jq '.analysisState'

Issues

If you encounter any issues, please report them here

License

Copyright (c) 2019. Ontario Institute for Cancer Research

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with this program. If not, see <https://www.gnu.org/licenses/>.

Supported SDKs

SONG Python SDK

The SONG Python SDK is a simple python module that allows you to interact with a SONG server through Python, with a minimum of coding effort.

It lets you upload payloads synchronously or asynchronously, check their status and create analyses. From there, you can use the power of Python to process and analyze the data within those objects however you see fit.

Prerequisites

Python 3.6 or higher is REQUIRED, since the SDK uses the dataclasses module.

Installation

The official SONG Python SDK is publicly hosted on PyPI. To install it, just run the command below:

pip install overture-song

Configuration

  • The SDK is configured through the ApiConfig class, imported from overture_song.model, which holds the server URL, study id and access token used for all requests. A worked example is given in the Configuration part of the Tutorial below.

Tutorial

This section demonstrates example usage of the overture-song sdk. After completing this tutorial, you will have uploaded your first SONG metadata payload!

For the impatient, the code used below can be found in examples/example_upload.py.

Warning

Python 3.6 or higher is required.

Configuration

Create an ApiConfig object. This object contains the serverUrl, accessToken, and studyId that will be used to interact with the SONG API. In this example we will use https://song.cancercollaboratory.org for the serverUrl and 'ABC123' for the studyId. For the access token, please refer to Creating an Access Token.

from overture_song.model import ApiConfig
api_config = ApiConfig('https://song.cancercollaboratory.org', 'ABC123', <my_access_token>)

Next the main API client needs to be instantiated in order to interact with the SONG server.

from overture_song.client import Api
api = Api(api_config)

As a sanity check, ensure that the server is running. If the response is True, then you may proceed with the next section, otherwise the server is not running.

>>> api.is_alive()
True

Create a Study

If the studyId 'ABC123' does not exist, then the StudyClient must be instantiated in order to read and create studies.

First create a study client,

from overture_song.client import StudyClient
study_client = StudyClient(api)

If the study associated with the payload does not exist, then create a Study entity,

from overture_song.entities import Study
if not study_client.has(api_config.study_id):
     study = Study.create(api_config.study_id, "myStudyName", "myStudyDescription", "myStudyOrganization")
     study_client.create(study)

Create a Simple Payload

Now that the study exists, you can create your first payload! In this example, a SequencingReadAnalysis will be created. It follows the SequencingRead JsonSchema.

See also

Similarly, for the VariantCallAnalysis, refer to the VariantCall JsonSchema.

Firstly, import all the entities to minimize the import statements.

from overture_song.entities import *

Next, create an example Donor entity:

donor = Donor()
donor.studyId = api_config.study_id
donor.donorGender = "male"
donor.donorSubmitterId = "dsId1"
donor.set_info("randomDonorField", "someDonorValue")

Create an example Specimen entity:

specimen = Specimen()
specimen.specimenClass = "Tumour"
specimen.specimenSubmitterId = "sp_sub_1"
specimen.specimenType = "Normal - EBV immortalized"
specimen.set_info("randomSpecimenField", "someSpecimenValue")

Create an example Sample entity:

sample = Sample()
sample.sampleSubmitterId = "ssId1"
sample.sampleType = "RNA"
sample.set_info("randomSample1Field", "someSample1Value")

Create 1 or more example File entities:

# File 1
file1 = File()
file1.fileName = "myFilename1.bam"
file1.studyId = api_config.study_id
file1.fileAccess = "controlled"
file1.fileMd5sum = "myMd51"
file1.fileSize = 1234561
file1.fileType = "VCF"
file1.set_info("randomFile1Field", "someFile1Value")

# File 2
file2 = File()
file2.fileName = "myFilename2.bam"
file2.studyId = api_config.study_id
file2.fileAccess = "controlled"
file2.fileMd5sum = "myMd52"
file2.fileSize = 1234562
file2.fileType = "VCF"
file2.set_info("randomFile2Field", "someFile2Value")

Create an example SequencingRead experiment entity:

# SequencingRead
sequencing_read_experiment = SequencingRead()
sequencing_read_experiment.aligned = True
sequencing_read_experiment.alignmentTool = "myAlignmentTool"
sequencing_read_experiment.pairedEnd = True
sequencing_read_experiment.insertSize = 0
sequencing_read_experiment.libraryStrategy = "WXS"
sequencing_read_experiment.referenceGenome = "GR37"
sequencing_read_experiment.set_info("randomSRField", "someSRValue")

Finally, use the SimplePayloadBuilder class along with the previously created entities to create a payload.

from overture_song.tools import SimplePayloadBuilder
builder = SimplePayloadBuilder(donor, specimen, sample, [file1, file2], sequencing_read_experiment)
payload = builder.to_dict()

Use a Custom AnalysisId

In some situations, the user may prefer to use a custom analysisId. If not specified in the payload, it is automatically generated by the SONG server during the Save the Analysis step. Although this tutorial uses the analysisId generated by the SONG server, a custom analysisId can be set as follows:

payload['analysisId'] = 'my_custom_analysis_id'

Upload the Payload

With the payload built, the data can now be uploaded to the SONG server for validation. There are 2 modes for validation:

  1. Synchronous - uploads are validated SYNCHRONOUSLY. This is the default mode; it can be selected explicitly by setting the kwarg is_async_validation to False in the upload method.
  2. Asynchronous - uploads are validated ASYNCHRONOUSLY. This allows the user to upload a batch of payloads. This mode can be selected by setting is_async_validation to True.

After calling the upload method, the payload will be sent to the SONG server for validation, and a response will be returned:

>>> api.upload(json_payload=payload, is_async_validation=False)
{
    "status": "ok",
    "uploadId": "UP-c49742d0-1fc8-4b45-9a1c-ea58d282ac58"
}

If the status field from the response is ok, the payload was successfully submitted to the SONG server for validation, and the server returned a randomly generated uploadId, which serves as a receipt for the upload request.

Check the Status of the Upload

Before continuing, the previous upload’s status must be checked using the status method, in order to ensure the payload was successfully validated. Using the previous uploadId, the status of the upload can be requested and will return the following response:

>>> api.status('UP-c49742d0-1fc8-4b45-9a1c-ea58d282ac58')
{
    "analysisId": "",
    "uploadId": "UP-c49742d0-1fc8-4b45-9a1c-ea58d282ac58",
    "studyId": "ABC123",
    "state": "VALIDATED",
    "createdAt": [
        2019,
        2,
        16,
        0,
        54,
        31,
        73774000
    ],
    "updatedAt": [
        2019,
        2,
        16,
        0,
        54,
        31,
        75476000
    ],
    "errors": [
        ""
    ],
    "payload": {
        "analysisState": "UNPUBLISHED",
        "sample": [
            {
                "info": {
                    "randomSample1Field": "someSample1Value"
                },
                "sampleSubmitterId": "ssId1",
                "sampleType": "RNA",
                "specimen": {
                    "info": {
                        "randomSpecimenField": "someSpecimenValue"
                    },
                    "specimenSubmitterId": "sp_sub_1",
                    "specimenClass": "Tumour",
                    "specimenType": "Normal - EBV immortalized"
                },
                "donor": {
                    "info": {
                        "randomDonorField": "someDonorValue"
                    },
                    "donorSubmitterId": "dsId1",
                    "studyId": "Study1",
                    "donorGender": "male"
                }
            }
        ],
        "file": [
            {
                "info": {
                    "randomFile1Field": "someFile1Value"
                },
                "fileName": "myFilename1.bam",
                "studyId": "Study1",
                "fileSize": 1234561,
                "fileType": "VCF",
                "fileMd5sum": "myMd51",
                "fileAccess": "controlled"
            },
            {
                "info": {
                    "randomFile2Field": "someFile2Value"
                },
                "fileName": "myFilename2.bam",
                "studyId": "Study1",
                "fileSize": 1234562,
                "fileType": "VCF",
                "fileMd5sum": "myMd52",
                "fileAccess": "controlled"
            }
        ],
        "analysisType": "sequencingRead",
        "experiment": {
            "info": {
                "randomSRField": "someSRValue"
            },
            "aligned": true,
            "alignmentTool": "myAlignmentTool",
            "insertSize": 0,
            "libraryStrategy": "WXS",
            "pairedEnd": true,
            "referenceGenome": "GR37"
        }
    }
}

In order to continue with the next section, the state field MUST have the value VALIDATED, which indicates the upload was validated and there were no errors. If there were errors, the state field would have the value VALIDATION_ERROR, and the errors field would contain details of the validation issues. If there is an error, the user can simply correct the payload, re-upload and check the status again.
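For asynchronous validation in particular, the status check may need to be repeated until the server finishes. A small polling helper along these lines can wrap the check; wait_for_validation is our own illustrative sketch, and status_fn stands for a thin wrapper you write around the status call that returns a dict shaped like the response above:

```python
import time

def wait_for_validation(status_fn, upload_id, attempts=10, delay=1.0):
    """Call status_fn(upload_id) until the upload leaves the validating state.

    status_fn is assumed to return a dict shaped like the status response
    above; the final 'state' string is returned, either 'VALIDATED' or
    'VALIDATION_ERROR'.
    """
    for _ in range(attempts):
        state = status_fn(upload_id)["state"]
        if state in ("VALIDATED", "VALIDATION_ERROR"):
            return state
        time.sleep(delay)  # still validating; back off and retry
    raise TimeoutError(f"upload {upload_id} not validated after {attempts} attempts")
```

The caller can then branch on the returned state and only proceed to saving the analysis when it is VALIDATED.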

Save the Analysis

Once the upload is successfully validated, the upload must be saved using the save method. This generates the following response:

>>> api.save(status_response.uploadId, ignore_analysis_id_collisions=False)
{
    "analysisId": "23c61f55-12b4-11e8-b46b-23a48c7b1324",
    "status": "ok"
}

The value of ok in the status field of the response indicates that an analysis was successfully created. The analysis will contain the same data as the payload, with the addition of server-side generated ids, which are produced by a centralized id server. By default, the request DOES NOT IGNORE analysisId collisions; however, by setting the save method parameter ignore_analysis_id_collisions to True, collisions will be ignored. This mechanism is considered an override and is heavily discouraged, but it is necessary given the complexities associated with managing genomic data.

Observe the UNPUBLISHED Analysis

Verify the analysis is unpublished by observing the value of the analysisState field in the response of the get_analysis call. The value should be UNPUBLISHED. Also, observe that the SONG server generated a unique sampleId, specimenId, analysisId and objectId:

>>> api.get_analysis('23c61f55-12b4-11e8-b46b-23a48c7b1324')
{
    "analysisType": "sequencingRead",
    "info": {},
    "analysisId": "23c61f55-12b4-11e8-b46b-23a48c7b1324",
    "study": "ABC123",
    "analysisState": "UNPUBLISHED",
    "sample": [
        {
            "info": {
                "randomSample1Field": "someSample1Value"
            },
            "sampleId": "SA599347",
            "specimenId": "SP196154",
            "sampleSubmitterId": "ssId1",
            "sampleType": "RNA",
            "specimen": {
                "info": {
                    "randomSpecimenField": "someSpecimenValue"
                },
                "specimenId": "SP196154",
                "donorId": "DO229595",
                "specimenSubmitterId": "sp_sub_1",
                "specimenClass": "Tumour",
                "specimenType": "Normal - EBV immortalized"
            },
            "donor": {
                "donorId": "DO229595",
                "donorSubmitterId": "dsId1",
                "studyId": "ABC123",
                "donorGender": "male",
                "info": {}
            }
        }
    ],
    "file": [
        {
            "info": {
                "randomFile1Field": "someFile1Value"
            },
            "objectId": "f553bbe8-876b-5a9c-a436-ff47ceef53fb",
            "analysisId": "23c61f55-12b4-11e8-b46b-23a48c7b1324",
            "fileName": "myFilename1.bam",
            "studyId": "ABC123",
            "fileSize": 1234561,
            "fileType": "VCF",
            "fileMd5sum": "myMd51                          ",
            "fileAccess": "controlled"
        },
        {
            "info": {
                "randomFile2Field": "someFile2Value"
            },
            "objectId": "6e2ee06b-e95d-536a-86b5-f2af9594185f",
            "analysisId": "23c61f55-12b4-11e8-b46b-23a48c7b1324",
            "fileName": "myFilename2.bam",
            "studyId": "ABC123",
            "fileSize": 1234562,
            "fileType": "VCF",
            "fileMd5sum": "myMd52                          ",
            "fileAccess": "controlled"
        }
    ],
    "experiment": {
        "analysisId": "23c61f55-12b4-11e8-b46b-23a48c7b1324",
        "aligned": true,
        "alignmentTool": "myAlignmentTool",
        "insertSize": 0,
        "libraryStrategy": "WXS",
        "pairedEnd": true,
        "referenceGenome": "GR37",
        "info": {
            "randomSRField": "someSRValue"
        }
    }
}
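The server-generated identifiers can also be pulled out of this response programmatically. The helper below is our own sketch (not part of the SDK) and treats the response as the plain dict printed above:

```python
def collect_generated_ids(analysis):
    """Gather the server-generated ids from a get_analysis response dict."""
    return {
        "analysisId": analysis["analysisId"],
        "objectIds": [f["objectId"] for f in analysis["file"]],
        "sampleIds": [s["sampleId"] for s in analysis["sample"]],
        "specimenIds": [s["specimen"]["specimenId"] for s in analysis["sample"]],
        "donorIds": [s["donor"]["donorId"] for s in analysis["sample"]],
    }
```

For the response above, this would collect the analysisId, the two file objectIds, and the sample, specimen and donor ids generated by the server.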

Generate the Manifest

With an analysis created, a manifest file must be generated using the ManifestClient, the analysisId from the previously created analysis, the path to the directory containing the files to be uploaded, and an output file path. If the source_dir does not exist, or if the files to be uploaded are not present in that directory, an error is raised. Calling the write_manifest method generates a Manifest object and writes it to a file. This step is required for the next section, which uploads the object files to the storage server.

from overture_song.client import ManifestClient
manifest_client = ManifestClient(api)
source_dir = "/path/to/directory/containing/files"
manifest_file_path = './manifest.txt'
manifest_client.write_manifest('23c61f55-12b4-11e8-b46b-23a48c7b1324', source_dir, manifest_file_path)

After successful execution, a manifest.txt file is generated with the following contents:

23c61f55-12b4-11e8-b46b-23a48c7b1324
f553bbe8-876b-5a9c-a436-ff47ceef53fb    /path/to/directory/containing/files/myFilename1.bam    myMd51
6e2ee06b-e95d-536a-86b5-f2af9594185f    /path/to/directory/containing/files/myFilename2.bam    myMd52
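The manifest layout is simple: the first line is the analysisId, and each following line is the objectId, file path, and md5sum separated by tabs. As a minimal sketch, the file above can be reproduced with plain Python, independent of the client (all values are the example values from this walkthrough):

```python
# Minimal sketch: build a SONG manifest by hand.
# Format: first line is the analysisId; each following line is
# objectId <TAB> file path <TAB> md5sum.
analysis_id = "23c61f55-12b4-11e8-b46b-23a48c7b1324"
entries = [
    ("f553bbe8-876b-5a9c-a436-ff47ceef53fb",
     "/path/to/directory/containing/files/myFilename1.bam", "myMd51"),
    ("6e2ee06b-e95d-536a-86b5-f2af9594185f",
     "/path/to/directory/containing/files/myFilename2.bam", "myMd52"),
]

lines = [analysis_id]
lines += ["\t".join(entry) for entry in entries]
manifest_text = "\n".join(lines) + "\n"

with open("manifest.txt", "w") as f:
    f.write(manifest_text)
```

In practice, prefer write_manifest, which also verifies that the files exist in source_dir before writing.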
Upload the Object Files

Upload the object files specified in the payload using the icgc-storage-client and the manifest file. This uploads the files listed in manifest.txt, all of which must be located in the same directory.

For Collaboratory - Toronto:

./bin/icgc-storage-client --profile collab upload --manifest ./manifest.txt

For AWS - Virginia:

./bin/icgc-storage-client --profile aws upload --manifest ./manifest.txt

See also

Refer to the SCORE Client section for more information about installation, configuration and usage.

Publish the Analysis

Using the same analysisId as before, publish the analysis. Publishing is essentially the handshake between the metadata stored in the SONG server (referenced by the analysisId) and the object files stored in the storage server (the files described by that analysis).

>>> api.publish('23c61f55-12b4-11e8-b46b-23a48c7b1324')
AnalysisId 23c61f55-12b4-11e8-b46b-23a48c7b1324 successfully published
Observe the PUBLISHED Analysis

Finally, verify the analysis is published by checking the value of the analysisState field in the response of the get_analysis call. If the value is PUBLISHED, congratulations on your first metadata upload!

>>> api.get_analysis('23c61f55-12b4-11e8-b46b-23a48c7b1324')
{
    "analysisType": "sequencingRead",
    "info": {},
    "analysisId": "23c61f55-12b4-11e8-b46b-23a48c7b1324",
    "study": "ABC123",
    "analysisState": "PUBLISHED",
    "sample": [
        {
            "info": {
                "randomSample1Field": "someSample1Value"
            },
            "sampleId": "SA599347",
            "specimenId": "SP196154",
            "sampleSubmitterId": "ssId1",
            "sampleType": "RNA",
            "specimen": {
                "info": {
                    "randomSpecimenField": "someSpecimenValue"
                },
                "specimenId": "SP196154",
                "donorId": "DO229595",
                "specimenSubmitterId": "sp_sub_1",
                "specimenClass": "Tumour",
                "specimenType": "Normal - EBV immortalized"
            },
            "donor": {
                "donorId": "DO229595",
                "donorSubmitterId": "dsId1",
                "studyId": "ABC123",
                "donorGender": "male",
                "info": {}
            }
        }
    ],
    "file": [
        {
            "info": {
                "randomFile1Field": "someFile1Value"
            },
            "objectId": "f553bbe8-876b-5a9c-a436-ff47ceef53fb",
            "analysisId": "23c61f55-12b4-11e8-b46b-23a48c7b1324",
            "fileName": "myFilename1.bam",
            "studyId": "ABC123",
            "fileSize": 1234561,
            "fileType": "VCF",
            "fileMd5sum": "myMd51                          ",
            "fileAccess": "controlled"
        },
        {
            "info": {
                "randomFile2Field": "someFile2Value"
            },
            "objectId": "6e2ee06b-e95d-536a-86b5-f2af9594185f",
            "analysisId": "23c61f55-12b4-11e8-b46b-23a48c7b1324",
            "fileName": "myFilename2.bam",
            "studyId": "ABC123",
            "fileSize": 1234562,
            "fileType": "VCF",
            "fileMd5sum": "myMd52                          ",
            "fileAccess": "controlled"
        }
    ],
    "experiment": {
        "analysisId": "23c61f55-12b4-11e8-b46b-23a48c7b1324",
        "aligned": true,
        "alignmentTool": "myAlignmentTool",
        "insertSize": 0,
        "libraryStrategy": "WXS",
        "pairedEnd": true,
        "referenceGenome": "GR37",
        "info": {
            "randomSRField": "someSRValue"
        }
    }
}

Reference

The overture_song module implements a simple Python REST client for accessing a SONG server.

client
class overture_song.client.Api(config)[source]

Bases: object

check_is_alive()[source]
config
get_all_studies()[source]
get_analysis(analysis_id)[source]
get_analysis_files(analysis_id)[source]
get_entire_study(study_id)[source]
get_schema(schema_id)[source]
get_study(study_id)[source]
is_alive()[source]
list_schemas()[source]
publish(analysis_id)[source]
save(upload_id, ignore_analysis_id_collisions=False)[source]
save_study(study)[source]
status(upload_id)[source]
suppress(analysis_id)[source]
unpublish(analysis_id)[source]
update_file(object_id, file_update_request)[source]
upload(json_payload, is_async_validation=False)[source]
class overture_song.client.Endpoints(server_url)[source]

Bases: object

get_all_studies()[source]
get_analysis(study_id, analysis_id)[source]
get_analysis_files(study_id, analysis_id)[source]
get_entire_study(study_id)[source]
get_schema(schema_id)[source]
get_study(study_id)[source]
is_alive()[source]
list_schemas()[source]
publish(study_id, analysis_id)[source]
save_by_id(study_id, upload_id, ignore_analysis_id_collisions)[source]
save_study(study_id)[source]
status(study_id, upload_id)[source]
suppress(study_id, analysis_id)[source]
unpublish(study_id, analysis_id)[source]
update_file(study_id, object_id)[source]
upload(study_id, is_async_validation=False)[source]
class overture_song.client.ManifestClient(api)[source]

Bases: object

create_manifest(source_dir, analysis_id)[source]
write_manifest(analysis_id, source_dir, output_file_path)[source]
class overture_song.client.StudyClient(api)[source]

Bases: object

create(study)[source]
has(study_id)[source]
read(study_id)[source]
class overture_song.client.UploadClient(api)[source]

Bases: object

check_upload_status(upload_id)[source]
publish(analysis_id)[source]
save(upload_id, ignore_analysis_id_collisions=False)[source]
upload_file(file_path, is_async_validation=False)[source]
entities
class overture_song.entities.Analysis(analysisId: str = None, study: str = None, analysisState: str = 'UNPUBLISHED', sample: List[overture_song.entities.CompositeEntity] = None, file: List[overture_song.entities.File] = None)[source]

Bases: overture_song.entities.Entity

analysisId = None
analysisState = 'UNPUBLISHED'
classmethod builder()[source]
file = None
classmethod from_json(json_string)[source]
sample = None
study = None
class overture_song.entities.CompositeEntity(info: dict = None, sampleId: str = None, specimenId: str = None, sampleSubmitterId: str = None, sampleType: str = None, specimen: Type[overture_song.entities.Specimen] = None, donor: Type[overture_song.entities.Donor] = None)[source]

Bases: overture_song.entities.Sample

classmethod base_on_sample(sample)[source]
classmethod create(donor, specimen, sample)[source]
donor = None
specimen = None
validate()[source]
class overture_song.entities.Donor(info: dict = None, donorId: str = None, donorSubmitterId: str = None, studyId: str = None, donorGender: str = None)[source]

Bases: overture_song.entities.Metadata, overture_song.validation.Validatable

classmethod create(donorSubmitterId, studyId, donorGender, donorId=None, info=None)[source]
donorGender = None
donorId = None
donorSubmitterId = None
studyId = None
validate()[source]
class overture_song.entities.Entity[source]

Bases: object

to_dict()[source]
to_json()[source]
class overture_song.entities.Experiment(info: dict = None)[source]

Bases: overture_song.entities.Metadata

class overture_song.entities.File(info: dict = None, objectId: str = None, analysisId: str = None, fileName: str = None, studyId: str = None, fileSize: int = -1, fileType: str = None, fileMd5sum: str = None, fileAccess: str = None)[source]

Bases: overture_song.entities.Metadata, overture_song.validation.Validatable

analysisId = None
classmethod create(fileName, fileSize, fileType, fileMd5sum, fileAccess, studyId=None, analysisId=None, objectId=None, info=None)[source]
fileAccess = None
fileMd5sum = None
fileName = None
fileSize = -1
fileType = None
objectId = None
studyId = None
validate()[source]
class overture_song.entities.Metadata(info: dict = None)[source]

Bases: overture_song.entities.Entity

add_info(data: dict)[source]
info = None
set_info(key: str, value: Any)[source]
class overture_song.entities.Sample(info: dict = None, sampleId: str = None, specimenId: str = None, sampleSubmitterId: str = None, sampleType: str = None)[source]

Bases: overture_song.entities.Metadata, overture_song.validation.Validatable

classmethod create(specimenId, sampleSubmitterId, sampleType, sampleId=None, info=None)[source]
sampleId = None
sampleSubmitterId = None
sampleType = None
specimenId = None
validate()[source]
class overture_song.entities.SequencingRead(info: dict = None, analysisId: str = None, aligned: bool = None, alignmentTool: str = None, insertSize: int = None, libraryStrategy: str = None, pairedEnd: bool = None, referenceGenome: str = None)[source]

Bases: overture_song.entities.Experiment, overture_song.validation.Validatable

aligned = None
alignmentTool = None
analysisId = None
classmethod builder()[source]
classmethod create(aligned, alignmentTool, insertSize, libraryStrategy, pairedEnd, referenceGenome, analysisId=None)[source]
insertSize = None
libraryStrategy = None
pairedEnd = None
referenceGenome = None
validate()[source]
class overture_song.entities.SequencingReadAnalysis(analysisId: str = None, study: str = None, analysisState: str = 'UNPUBLISHED', sample: List[overture_song.entities.CompositeEntity] = None, file: List[overture_song.entities.File] = None, analysisType: str = 'sequencingRead', experiment: Type[overture_song.entities.SequencingRead] = None)[source]

Bases: overture_song.entities.Analysis, overture_song.validation.Validatable

analysisType = 'sequencingRead'
classmethod create(experiment, sample, file, analysisId=None, study=None, analysisState='UNPUBLISHED', info=None)[source]
experiment = None
validate()[source]
class overture_song.entities.Specimen(info: dict = None, specimenId: str = None, donorId: str = None, specimenSubmitterId: str = None, specimenClass: str = None, specimenType: str = None)[source]

Bases: overture_song.entities.Metadata, overture_song.validation.Validatable

classmethod create(donorId, specimenSubmitterId, specimenClass, specimenType, specimenId=None, info=None)[source]
donorId = None
specimenClass = None
specimenId = None
specimenSubmitterId = None
specimenType = None
validate()[source]
class overture_song.entities.Study(info: dict = None, studyId: str = None, name: str = None, organization: str = None, description: str = None)[source]

Bases: overture_song.entities.Metadata, overture_song.validation.Validatable

classmethod create(studyId, name=None, description=None, organization=None)[source]
classmethod create_from_raw(study_obj)[source]
description = None
name = None
organization = None
studyId = None
validate()[source]
class overture_song.entities.VariantCall(info: dict = None, analysisId: str = None, variantCallingTool: str = None, matchedNormalSampleSubmitterId: str = None)[source]

Bases: overture_song.entities.Experiment, overture_song.validation.Validatable

analysisId = None
classmethod create(variantCallingTool, matchedNormalSampleSubmitterId, analysisId=None)[source]
matchedNormalSampleSubmitterId = None
validate()[source]
variantCallingTool = None
class overture_song.entities.VariantCallAnalysis(analysisId: str = None, study: str = None, analysisState: str = 'UNPUBLISHED', sample: List[overture_song.entities.CompositeEntity] = None, file: List[overture_song.entities.File] = None, analysisType: str = 'variantCall', experiment: Type[overture_song.entities.VariantCall] = None)[source]

Bases: overture_song.entities.Analysis, overture_song.validation.Validatable

analysisType = 'variantCall'
classmethod create(experiment, sample, file, analysisId=None, study=None, analysisState='UNPUBLISHED', info=None)[source]
experiment = None
validate()[source]
model
class overture_song.model.ApiConfig(server_url: str, study_id: str, access_token: str, debug: bool = False)[source]

Configuration object for the SONG Api

Parameters:
  • server_url (str) – URL of a running song-server
  • study_id (str) – StudyId to interact with
  • access_token (str) – access token used to authorize the song-server api
  • debug (bool) – Enable debug mode
debug = False
class overture_song.model.FileUpdateRequest[source]

Mutable request object used to update file data.

Parameters:
  • file_md5 (str) – MD5 checksum value to update
  • file_size (int) – File size (bytes) to update
  • file_access (str) – Access type to update
  • file_info (dict) – json info metadata to update
fileAccess = None
fileMd5sum = None
fileSize = None
info = None
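For illustration, a file update request can be thought of as a small mutable record carrying only the fields to change. The sketch below uses a plain dataclass that mirrors the attributes listed above; it is a stand-in for illustration, not the client's actual class:

```python
from dataclasses import dataclass, field

# Sketch of the shape of a file update request, mirroring the
# fileAccess/fileMd5sum/fileSize/info attributes listed above.
# Illustrative stand-in only, not overture_song.model.FileUpdateRequest.
@dataclass
class FileUpdateSketch:
    fileAccess: str = None
    fileMd5sum: str = None
    fileSize: int = None
    info: dict = field(default_factory=dict)

# Set only the fields to be changed; unset fields stay None.
request = FileUpdateSketch(fileAccess="open", fileSize=1234561)
```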
class overture_song.model.Manifest(analysis_id, manifest_entries=None)[source]

Object representing the contents of a manifest file

Parameters:
  • analysis_id (str) – analysisId associated with a collection of ManifestEntry
  • manifest_entries (List[ManifestEntry] or None) – optionally initialize a manifest with an existing list of ManifestEntry
add_entry(manifest_entry)[source]

Add a ManifestEntry to this manifest

Parameters:manifest_entry (ManifestEntry) – entry to add
Returns:None
write(output_file_path, overwrite=False)[source]

Write this manifest entry to a file

Parameters:
  • output_file_path (str) – output file to write to
  • overwrite (boolean) – if true, overwrite the file if it exists
Raises:

SongClientException – throws this exception if the file exists and overwrite is False

Returns:

None

class overture_song.model.ManifestEntry(fileId: str, fileName: str, md5sum: str)[source]

Represents a line in the manifest file pertaining to a file. The string representation of this object is the tab-separated (TSV) concatenation of the 3 field values.

Parameters:
  • fileId (str) – ObjectId of the file
  • fileName (str) – name of the file. Should not include directories
  • md5sum (str) – MD5 checksum of the file
classmethod create_manifest_entry(input_dir, data)[source]

Creates a ManifestEntry object

Parameters:data – Any object with members named ‘objectId’, ‘fileName’, ‘fileMd5sum’.
Returns:ManifestEntry object
class overture_song.model.ServerErrors[source]

Server error definitions used for classifying SongErrors

STUDY_ID_DOES_NOT_EXIST = 1
get_error_id()[source]

Get the error id for this error

Return type:str

resolve_server_error = <bound method ServerErrors.resolve_server_error of <enum 'ServerErrors'>>[source]
exception overture_song.model.SongError(errorId: str, httpStatusName: str, httpStatusCode: int, message: str, requestUrl: str, debugMessage: str, timestamp: str, stackTrace: tuple = <factory>)[source]

Object containing data related to a song server error

Parameters:
  • errorId (str) – The id for the song server error. Used to give more meaning to errors instead of just using http status codes.
  • httpStatusName (str) – Standard http status name for a http status code
  • httpStatusCode (int) – Standard http status code
  • message (str) – Text describing the error
  • requestUrl (str) – The request url that caused this error
  • debugMessage (str) – Additional text describing the error
  • timestamp (str) – Epoch time of when this error occurred
  • stackTrace (tuple) – Server stacktrace of this error
classmethod create_song_error(data)[source]

Creates a new song error object. Used to convert a json/dict server error response to a python object

Parameters:data (dict) – input dictionary containing all the fields necessary to create a song server error
Return type:SongError
classmethod get_field_names()[source]

Get the field names associated with a SongError

Return type:List[str]
classmethod is_song_error(data)[source]

Determine if the input dictionary contains all the fields defined in a SongError

Parameters:data (dict) – input dictionary containing all the fields necessary to create a song server error
Returns:true if the data contains all the fields, otherwise false
Return type:boolean
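The check behind is_song_error amounts to verifying that every SongError field name is present in the input dictionary. A sketch of the idea, with the field list taken from the constructor signature above (the helper name is hypothetical):

```python
# Field names from the SongError constructor signature above.
SONG_ERROR_FIELDS = [
    "errorId", "httpStatusName", "httpStatusCode", "message",
    "requestUrl", "debugMessage", "timestamp", "stackTrace",
]

def looks_like_song_error(data: dict) -> bool:
    """Sketch of is_song_error: true only if every field is present."""
    return all(name in data for name in SONG_ERROR_FIELDS)
```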
rest
class overture_song.rest.HeaderGenerator(access_token=None)[source]

Bases: object

get_json_header()[source]
get_plain_header()[source]
class overture_song.rest.ObjectRest(*args, **kwargs)[source]

Bases: overture_song.rest.Rest

class overture_song.rest.Rest(access_token=None, debug=False)[source]

Bases: object

get(url)[source]
get_with_params(url, **kwargs)[source]
post(url, dict_data=None)[source]
put(url, dict_data=None)[source]
overture_song.rest.intercept_response(orig_function, debug=False, convert_to_json=False, convert_to_generic_object=False)[source]
tools
class overture_song.tools.BatchUploader(server_url, access_token, payload_dir, debug=False)[source]

Bases: object

get_all_files()[source]
get_file(study_id, filename)[source]
get_files(study_id)[source]
get_studies()[source]
print_upload_states()[source]
publish_all()[source]
save_all()[source]
status_all()[source]
upload_all()[source]
class overture_song.tools.FileUploadClient(api, filename, is_async_validation=False, ignore_analysis_id_collisions=False)[source]

Bases: object

publish()[source]
save()[source]
update_status()[source]
upload()[source]
class overture_song.tools.FileUploadState[source]

Bases: enum.Enum

An enumeration.

NOT_UPLOADED = 0
PUBLISHED = 4
PUBLISH_ERROR = -2
SAVED = 3
SAVE_ERROR = -3
STATUS_ERROR = -5
SUBMITTED = 1
UNKNOWN_ERROR = -1
UPLOAD_ERROR = -6
VALIDATED = 2
VALIDATION_ERROR = -4
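The values above encode a progression: non-negative states track the happy path from NOT_UPLOADED through PUBLISHED, while negative values flag errors. A sketch replicating the enumeration to make that ordering explicit (the is_error helper is hypothetical, not part of the client):

```python
from enum import Enum

class FileUploadState(Enum):
    # Happy path, in order of progression.
    NOT_UPLOADED = 0
    SUBMITTED = 1
    VALIDATED = 2
    SAVED = 3
    PUBLISHED = 4
    # Error states, all negative.
    UNKNOWN_ERROR = -1
    PUBLISH_ERROR = -2
    SAVE_ERROR = -3
    VALIDATION_ERROR = -4
    STATUS_ERROR = -5
    UPLOAD_ERROR = -6

def is_error(state: FileUploadState) -> bool:
    """Hypothetical helper: error states carry negative values."""
    return state.value < 0
```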
class overture_song.tools.SimplePayloadBuilder(donor: Type[overture_song.entities.Donor], specimen: Type[overture_song.entities.Specimen], sample: Type[overture_song.entities.Sample], files: List[overture_song.entities.File], experiment: object, analysisId: str = None)[source]

Bases: object

analysisId = None
to_dict()[source]
utils
class overture_song.utils.Builder(class_type: type)[source]

Bases: object

build()[source]
class overture_song.utils.GenericObjectType[source]

Bases: type

display()[source]
to_dict()[source]
to_pretty_string()[source]
exception overture_song.utils.SongClientException(error_id, message)[source]

Bases: Exception

class overture_song.utils.Stack[source]

Bases: object

is_empty()[source]
peek()[source]
pop()[source]
push(item)[source]
size()[source]
overture_song.utils.check_dir(dirname)[source]
overture_song.utils.check_file(filename)[source]
overture_song.utils.check_song_state(expression, error_id, formatted_message, *args)[source]
overture_song.utils.check_state(expression, formatted_message, *args)[source]
overture_song.utils.check_type(instance, class_type)[source]
overture_song.utils.convert_to_url_param_list(delimeter='=', **kwargs)[source]
overture_song.utils.create_dir(dir_path)[source]
overture_song.utils.default_value(value, init)[source]
overture_song.utils.get_optional_field(dic, field)[source]
overture_song.utils.get_required_field(dic, field)[source]
overture_song.utils.objectize(original_function)[source]
overture_song.utils.repeat(value, repeat_number)[source]
overture_song.utils.setup_output_file_path(file_path)[source]
overture_song.utils.tab_repeat(repeat_number)[source]
overture_song.utils.to_generic_object(type_name, input_object)[source]

Convert a dictionary to an object (recursive).

overture_song.utils.to_pretty_json_string(json_data_string)[source]
overture_song.utils.whitespace_repeat(repeat_number)[source]
overture_song.utils.write_object(obj, output_file_path, overwrite=False)[source]
validation
class overture_song.validation.DataField(name, *types, required=True, multiple=False)[source]

Bases: object

validate(value)[source]
class overture_song.validation.Validatable[source]

Bases: object

validate()[source]
overture_song.validation.non_null(exclude=None)[source]
class overture_song.validation.validation(*datafields)[source]

Bases: object

Frequently Asked Questions

How can I find the latest official release via command line?

Official releases can alternatively be found via command line as follows:

  1. Execute an unauthenticated GitHub request (rate limited to 60 requests/hour) using curl and jq
curl --silent "https://api.github.com/repos/overture-stack/SONG/releases" | jq '.[].tag_name | match("^song-\\d+\\.\\d+\\.\\d+$") | .string' | head -1 | xargs echo

OR

  1. Execute an authenticated GitHub request (rate limited to 5000 requests/hour) using curl and jq
curl --silent -H "Authorization: Bearer $MY_GITHUB_OAUTH_TOKEN" "https://api.github.com/repos/overture-stack/SONG/releases" | jq '.[].tag_name | match("^song-\\d+\\.\\d+\\.\\d+$") | .string' | head -1 | xargs echo
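The jq filter in both commands keeps only tags of the form song-X.Y.Z and takes the first (newest) match. The same selection can be sketched in Python over an already-fetched list of tag names (the sample tags below are illustrative):

```python
import re

# Same pattern the jq filter uses: song-<major>.<minor>.<patch>
RELEASE_TAG = re.compile(r"^song-\d+\.\d+\.\d+$")

def latest_release(tag_names):
    """Return the first tag matching the release pattern, like `head -1`."""
    for tag in tag_names:
        if RELEASE_TAG.match(tag):
            return tag
    return None

# Illustrative tag names, newest first (as the GitHub API returns them).
tags = ["song-client-4.2.1", "song-4.2.1", "song-4.2.0"]
latest_release(tags)  # -> "song-4.2.1"
```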

Contribute

If you’d like to contribute to this project, it’s hosted on GitHub.