Welcome to SciDataLib’s documentation!

Python library for writing SciData JSON-LD files

class scidatalib.scidata.SciData(uid: str)

This class is used to create and populate a SciData object, to be output as a SciData JSON-LD document

A SciData object is created by calling the SciData class, i.e. SciDataObject = SciData(<uid>)

The meta variable defines the keys that make up the backbone structure of the JSON-LD document. Class methods are called to populate the meta keys
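
As a rough orientation, the sketch below lays out where the keys named by the methods on this page might sit in the document backbone. It is an illustrative plain dictionary, not the library's actual meta variable: the key set and nesting are assumptions inferred from the method descriptions, and example_meta is a hypothetical name.

```python
import json

# Hypothetical skeleton inferred from the method descriptions on this page;
# the real meta variable in SciDataLib may hold more keys or differ in layout.
example_meta = {
    "@context": [],         # populated by context() and namespaces()
    "@id": "",              # populated by docid()
    "version": "",          # populated by version()
    "@graph": {
        "@id": "",          # set alongside @base by base()
        "uid": "",          # populated by graph_uid()
        "title": "",        # populated by title()
        "description": "",  # populated by description()
    },
}

print(json.dumps(example_meta, indent=2))
```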

context(context: [str, list], replace=False) → list

Add to or replace the list of external context files

Parameters
  • context – context URL string or list of context URL strings

  • replace – boolean to replace or not the existing data

When called, the list of context URLs in the @context JSON object is replaced or updated with the supplied list of context URLs

Example:

SciDataObject.context(
['https://stuchalk.github.io/scidata/contexts/scidata.jsonld'])
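
The add-or-replace behaviour described above can be pictured with a small standalone sketch. The helper update_context and the example.org URL are hypothetical, and skipping URLs that are already present is an assumption rather than documented library behaviour.

```python
from typing import List, Union


def update_context(existing: List[str],
                   context: Union[str, List[str]],
                   replace: bool = False) -> List[str]:
    """Hypothetical helper mimicking the add-or-replace behaviour."""
    # Accept a single URL string as well as a list of URL strings
    new = [context] if isinstance(context, str) else list(context)
    if replace:
        return new
    # Assumption: URLs already present are not added a second time
    return existing + [url for url in new if url not in existing]


urls = ['https://stuchalk.github.io/scidata/contexts/scidata.jsonld']
added = update_context(urls, 'https://example.org/extra.jsonld')
replaced = update_context(urls, 'https://example.org/extra.jsonld',
                          replace=True)
print(added)
print(replaced)
```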

namespaces(namespaces: dict, replace=False) → dict

Add to or replace the dictionary of namespaces within @context. Namespaces are needed for values in a file that reference external resources that define something (vocabulary/taxonomy/ontology entries).

Parameters
  • namespaces – dictionary of namespaces (key->ns, val->URI start)

  • replace – boolean to replace or not the existing data

When called, the dictionary of namespaces within the @context key of the meta variable will be replaced or updated with the supplied dictionary of namespaces

Example:

SciDataObject.namespaces(
{"sci": "https://stuchalk.github.io/scidata/"
        "ontology/scidata.owl#"})
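
To illustrate why a namespace entry matters, the sketch below expands a prefixed value into a full URI by simple concatenation, which is how JSON-LD compact IRIs are resolved. The helper expand and the term "sci:datapoint" are hypothetical and are not part of SciDataLib.

```python
def expand(value: str, namespaces: dict) -> str:
    """Expand '<prefix>:<local>' into a full URI if the prefix is known."""
    prefix, sep, local = value.partition(":")
    if sep and prefix in namespaces:
        return namespaces[prefix] + local
    return value  # unknown prefix or no prefix: leave the value untouched


namespaces = {
    "sci": "https://stuchalk.github.io/scidata/ontology/scidata.owl#"}
expanded = expand("sci:datapoint", namespaces)   # hypothetical term
unchanged = expand("unknown:term", namespaces)   # prefix not registered
print(expanded)
print(unchanged)
```

A value whose prefix has no namespace entry cannot be expanded, which is why the namespace must be added before such values are used.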

base(base: str) → dict

Assign the JSON-LD @base URL (also defines ‘@id’ under ‘@graph’ for consistency). See: https://www.w3.org/TR/json-ld/#base-iri

Parameters

base – @base URL for a JSON-LD file

Defines the base URL for all internal unique identifiers (defined through ‘@id’ values). For consistency, the code also sets the ‘@id’ field under ‘@graph’ so that all triple subjects are unique and associated with the same graph

Example:

SciDataObject.base("<baseurl>")
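
The effect of @base on internal identifiers can be sketched with the standard library: relative @id values resolve against the base URL in the RFC 3986 style that urljoin also follows. The base URL and the relative ids below are hypothetical.

```python
from urllib.parse import urljoin

base = "https://example.org/data/"               # hypothetical @base value
relative_ids = ["compound/1/", "datapoint/1/"]   # hypothetical internal @id values

# Each relative @id resolves against @base to give a globally unique IRI
absolute_ids = [urljoin(base, rid) for rid in relative_ids]
print(absolute_ids)
```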

docid(docid: str) → dict

Assign the document identifier. This will become the graph name if the file is uploaded to a graph database

Parameters

docid – the root level @id value

version(version: str) → dict

Assign the version of this file (not the version of the data)

Parameters

version – the top level ‘version’ value

graph_uid(guid: str) → dict

Assign the uid value within the @graph JSON object

Parameters

guid – the @graph uid value

Normally the same as the unique id used in the @graph @id value and used to easily find the data in a file system.

Example:

SciDataObject.graph_uid("<uniqueidentifier>")

author(authors: list, replace=False) → list

Add to or replace the list of authors within the @graph authors section

Parameters
  • authors – list of names, or list of dicts with multiple fields

  • replace – boolean to replace or not the existing data

Add the list of authors of a set of data with the following defined fields in the SciData context file: name, address, organization, email, orcid.

Expects either:

1) a list of dictionaries, where each dictionary contains at minimum a ‘name’ key

Example:

SciDataObject.author(
[{'name': 'George Washington', 'ORCID': 1},
{'name': 'John Adams', 'ORCID': 2}])

2) a list of strings that are author names

Example:

SciDataObject.author(['George Washington', 'John Adams'])
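
The two accepted input forms can be reconciled as in the sketch below, which turns either form into a list of dictionaries. The helper normalize_authors is hypothetical, and raising ValueError for an entry without a name is an assumption, not documented library behaviour.

```python
from typing import Dict, List, Union


def normalize_authors(authors: List[Union[str, Dict]]) -> List[Dict]:
    """Hypothetical helper: turn either accepted form into a list of dicts."""
    normalized = []
    for entry in authors:
        if isinstance(entry, str):
            # A bare string is taken to be the author's name
            normalized.append({"name": entry})
        elif isinstance(entry, dict) and "name" in entry:
            normalized.append(dict(entry))
        else:
            # Assumption: entries without a name are rejected
            raise ValueError(f"author entry needs a 'name': {entry!r}")
    return normalized


from_names = normalize_authors(['George Washington', 'John Adams'])
from_dicts = normalize_authors([{'name': 'George Washington', 'ORCID': 1}])
print(from_names)
print(from_dicts)
```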

title(title: str) → str

Create or replace the title key within @graph

Parameters

title – descriptive title of the dataset

For a data source such as a journal article, this would typically be the title of the article

Example:

SciDataObject.title("The Hitchhiker's Guide to the Galaxy")

description(description: str) → str

Assign the description field within @graph

Parameters

description – textual description of the dataset

Used as a brief description of the type of data. For a journal article, this might house the abstract

Example:

SciDataObject.description('a brief description')

publisher(publisher: str) → str

Assign the publisher field within @graph

Parameters

publisher – the name or title of the publisher of the data

This is a person, project, research group, organization etc.

Example:

SciDataObject.publisher('The Daily Prophet')

graphversion(version: str) → str

Assign the data version

Parameters

version – the version assigned to the data

If a version is not available, the date it was accessed online can be used to indicate the ‘state’ of the data as downloaded

Example:

SciDataObject.graphversion('ChEMBL database v28')

keywords(keywords: [str, list], replace=False) → list

Add to or replace the keywords of the instance

Parameters
  • keywords – important keywords to improve data findability

  • replace – boolean to replace or not the existing data

Example:

SciDataObject.keywords('important')

starttime(stime: str) → str

Assign the start time

Parameters

stime – datetime string

Typically in “%m-%d-%y %H:%M:%S” format

Example:

SciDataObject.starttime('04-05-21 06:14:53')
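
A string in the format mentioned above can be produced from a datetime object with the standard library. This is a minimal sketch; the variable names are illustrative only.

```python
from datetime import datetime

moment = datetime(2021, 4, 5, 6, 14, 53)
stime = moment.strftime("%m-%d-%y %H:%M:%S")
print(stime)  # 04-05-21 06:14:53

# Parsing the string back with the same format recovers the datetime
parsed = datetime.strptime(stime, "%m-%d-%y %H:%M:%S")
```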

permalink(link: str) → str

Assign the document permanent link

Parameters

link – URL to the location where this document can be found

Example:

SciDataObject.permalink('https://permanent.link.com/data1')

related(related: [str, list], replace=False) → list

Add to or replace the related URLs

Parameters
  • related – URLs to other data related to this dataset

  • replace – boolean to replace or not the existing data

Example:

SciDataObject.related('http://example.com/greatdata.jsonld')

ids(ids: [str, list]) → list

Add to the ids list

Parameters

ids – string or list of strings that are external references to ontological concepts

When called, the contents of ‘ids’ are added to the ids list. Note that when the output property is accessed, it iterates over the instance content to find any values that are ontological references, in the format “<namespace>:<uniquevalue>”, and adds them to ids. Only ids provided in this format will be added, and duplicates are ignored. Remember to add namespaces for ids.

Example:

SciDataObject.ids(['chebi:00001','qudt:GM'])

(requires the addition of the ‘chebi’ and ‘qudt’ namespaces)
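
The scan for “<namespace>:<uniquevalue>” values described above can be approximated as follows. This is a sketch under assumptions: the regular expression and the helper find_refs are hypothetical, and the library's own matching rule may differ.

```python
import re

# Assumed shape of an ontological reference: "<namespace>:<uniquevalue>".
# Requiring a non-slash after the colon keeps URLs such as "https://..." out.
REF = re.compile(r"^[A-Za-z][\w.-]*:[^/\s]\S*$")


def find_refs(content, found=None):
    """Recursively collect reference-like strings from nested dicts/lists."""
    if found is None:
        found = []
    if isinstance(content, dict):
        for value in content.values():
            find_refs(value, found)
    elif isinstance(content, list):
        for item in content:
            find_refs(item, found)
    elif isinstance(content, str) and REF.match(content):
        if content not in found:  # duplicates are ignored
            found.append(content)
    return found


# Hypothetical instance content
doc = {"facets": [{"units": "qudt:GM", "url": "https://example.org/x"},
                  {"units": "qudt:GM"}],
       "compound": "chebi:00001"}
refs = find_refs(doc)
print(refs)
```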

discipline(disc: str) → str

Assign the discipline area of the data

Parameters

disc – a discipline name or identifier (preferred)

Best practice is to use an entry in an ontology, e.g. the Modern Science Ontology (https://w3id.org/skgo/modsci#)

Example:

SciDataObject.discipline('w3i:Chemistry')

(requires the addition of the ‘w3i’ namespace)

subdiscipline(subdisc: str) → str

Assign the subdiscipline area of the data

Parameters

subdisc – a subdiscipline name or identifier (preferred)

Best practice is to use an entry in an ontology, e.g. the Modern Science Ontology (https://w3id.org/skgo/modsci#)

Example:

SciDataObject.subdiscipline('w3i:AnalyticalChemistry')

evaluation(evaln: str) → str

Assign the evaluation field

Parameters

evaln – the method of evaluation of research data

Recommended values of this field are: experimental, theoretical, computational

Example:

SciDataObject.evaluation('experimental')

aspects(aspects: list) → list

Add to or replace the aspects of the file

Example:

SciDataObject.aspects(
[{"@id": "assay",
 "@type": "sdo:assay",
 "description": "Inhibition of human ERG "
                "by MK499 binding assay",
 "assay_organism": "Homo sapiens"}])

Method also accepts the keyword ‘#intlinks’. See the documentation for scidatapacket.

facets(facets: list) → list

Add to or replace the facets of the file

Example:

SciDataObject.facets(
[{"@id": "compound",
"@type": "sdo:compound",
"mw_freebase": "491.52",
"full_molformula": "C26H26FN5O4"}])

Method also accepts the keyword ‘#intlinks’. See the documentation for scidatapacket.

scope(scope: [str, list]) → str

Assign what thing(s) the dataset relates to

Parameters

scope – str or list of internal unique id(s) of the entity(ies) in the system that the data describes

The scope of a dataset should be described in the ‘system’ (‘facets’) section. Each entity described there, e.g. a chemical system, organism, or specimen, should be included as a scope using the unique ‘@id’ defined for that section

Example:

SciDataObject.scope('chemicalsystem/1/')

attribute(attributes: list) → list

Add one or more attributes

datapoint(points: list) → list

Add one or more datapoints

Example:

SciDataObject.datapoint(
[{"@id": "datapoint",
 "@type": "sdo:datapoint",
 "data": [{"@id": "datum",
           "@type": "sdo:exptdata",
           "type": "IC50",
           "value": "15.2",
           "units": "uM"}]}])

dataseries(series: list) → list

Add one or more dataseries

datagroup(group: list) → list

Add one or more datagroups

scidatapacket(packet)

Add a packet of data where the datapoints are linked with the associated aspects and facets

Template:

packet = [{'aspects':{},'facets':{},'datapoint':{}},
    {'aspects':{},'facets':{},'datapoint':{}}]

Example:

SciDataObject.scidatapacket([{
"aspects": [
    {"@id": "assay",
    "@type": "sdo:assay",
    "description": "Inhibition of human ERG "
                   "by MK499 binding assay",
    "assay_organism": "Homo sapiens"}],
"facets": [
    {"@id": "compound",
    "@type": "sdo:compound",
    "mw_freebase": "491.52",
    "full_molformula": "C26H26FN5O4",
    "#intlinks": [
        {"@id": "identifier",
        "@type": "sdo:identifier",
        "standard_inchi_key": "OINHUVBCKUJZAG-UHFFFAOYSA-N"}]},
    {"@id": "target",
    "@type": "sdo:target",
    "pref_name": "HERG",
    "tax_id": 9606,
    "organism": "Homo sapiens"}],
"dataset": [
    {"@id": "datapoint",
    "@type": "sdo:datapoint",
    "data":[
        {"@id": "datum",
        "@type": "sdo:exptdata",
        "type": "IC50",
        "value": "15.2",
        "units": "uM"}]}]}])

sources(sources: list, replace=False) → dict

Add to or replace the source reference list

Parameters
  • sources – information about where the data came from

  • replace – boolean to replace or not the existing data

Add a list of sources with any of the available defined fields in the SciData context file: citation, reftype, url, doi

Example:

SciDataObject.sources([
{'citation': 'Chalk, S.J. SciData: a data model and '
             'ontology for semantic representation of scientific data. '
             'J Cheminform 8, 54 (2016)',
 'doi': 'https://doi.org/10.1186/s13321-016-0168-9'}])
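
Before passing a list of sources, it can be useful to check the keys against the fields named above. The sketch below is a hypothetical pre-check and not part of SciDataLib; whether the library itself accepts or rejects other keys is not stated here, and the second entry is invented for illustration.

```python
# Fields named in the text above as defined in the SciData context file
SOURCE_FIELDS = {"citation", "reftype", "url", "doi"}


def check_sources(sources: list) -> list:
    """Hypothetical pre-check: report keys outside the documented fields."""
    problems = []
    for index, source in enumerate(sources):
        unknown = sorted(set(source) - SOURCE_FIELDS)
        if unknown:
            problems.append((index, unknown))
    return problems


sources = [
    {"citation": "Chalk, S.J. J Cheminform 8, 54 (2016)",
     "doi": "https://doi.org/10.1186/s13321-016-0168-9"},
    {"citation": "A hypothetical entry", "link": "https://example.org"},
]
problems = check_sources(sources)
print(problems)
```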

rights(holder: str, license: str) → dict

Add the rights section to the file (max: 1 entry)

Parameters
  • holder – the entity that holds the license to this data

  • license – the assigned license

property output: dict

Cleans up the SciData object and returns it as a dictionary
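
Because output is a dictionary, it can be serialized with the standard json module. The sketch below uses a stand-in dictionary so that it runs without the library; in real use the dictionary would come from SciDataObject.output.

```python
import json

# Stand-in for the dictionary returned by the output property
# (in real use: document = SciDataObject.output)
document = {"@context": [], "@id": "", "@graph": {"title": "example"}}

serialized = json.dumps(document, indent=2, ensure_ascii=False)
print(serialized)

# Round-trip check: the text parses back to the same dictionary
restored = json.loads(serialized)
```

The serialized text can then be written to a file with an ordinary open() call.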
