Welcome to SciDataLib’s documentation!
Python library for writing SciData JSON-LD files
- class scidatalib.scidata.SciData(uid: str)
This class is used to create and populate a SciData object, which is then output as a SciData JSON-LD document.
A SciData object is created by calling the SciData class, i.e. SciDataObject = SciData(<uid>).
The meta variable defines the keys that make up the backbone structure of the JSON-LD document. Class methods are called to populate the meta keys.
- context(context: [str, list], replace=False) → list
Add to or replace the list of external context files
- Parameters
context – context URL string or list of context URL strings
replace – boolean to replace or not the existing data
When called, the list of context URLs in the @context JSON object is replaced by, or extended with, the supplied context URLs
Example:
SciDataObject.context(['https://stuchalk.github.io/scidata/contexts/scidata.jsonld'])
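The add-or-replace behaviour described for `replace` can be pictured with a small stand-alone sketch. The helper `merge_contexts` is hypothetical and not part of SciDataLib; it assumes that a single string is treated as a one-item list and that URLs already present are not added twice, neither of which is confirmed by this page. The second URL is a placeholder.

```python
from typing import List, Union


def merge_contexts(existing: List[str],
                   new: Union[str, List[str]],
                   replace: bool = False) -> List[str]:
    """Hypothetical helper: add to or replace a list of context URLs."""
    # Accept a single URL string as well as a list of URLs
    incoming = [new] if isinstance(new, str) else list(new)
    if replace:
        return incoming
    # Assumption: URLs already present are skipped, order is preserved
    merged = list(existing)
    for url in incoming:
        if url not in merged:
            merged.append(url)
    return merged


base_ctx = ["https://stuchalk.github.io/scidata/contexts/scidata.jsonld"]
extra = "https://example.org/contexts/extra.jsonld"  # placeholder URL

added = merge_contexts(base_ctx, extra)                   # both URLs kept
replaced = merge_contexts(base_ctx, extra, replace=True)  # only the new URL
```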
- namespaces(namespaces: dict, replace=False) → dict
Add to or replace the dictionary of namespaces within @context. Namespaces are needed for values in a file that reference external resources that define something (vocabulary/taxonomy/ontology entries).
- Parameters
namespaces – dictionary of namespaces (key->ns, val->URI start)
replace – boolean to replace or not the existing data
When called, the dictionary of namespaces within the @context key of the meta variable will be replaced or updated with the supplied dictionary of namespaces
Example:
SciDataObject.namespaces({"sci": "https://stuchalk.github.io/scidata/ontology/scidata.owl#"})
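To show why the namespace dictionary maps a prefix to the start of a URI, here is a minimal sketch of how a JSON-LD compact value such as `sci:<term>` expands to a full URI. The function `expand_compact_iri` and the term name `exampleTerm` are illustrative assumptions, not SciDataLib API or real ontology entries.

```python
def expand_compact_iri(value: str, namespaces: dict) -> str:
    """Expand 'prefix:suffix' using a prefix -> URI-start mapping."""
    prefix, sep, suffix = value.partition(":")
    # Leave the value untouched if it has no colon or an unknown prefix
    if not sep or prefix not in namespaces:
        return value
    return namespaces[prefix] + suffix


namespaces = {"sci": "https://stuchalk.github.io/scidata/ontology/scidata.owl#"}

# 'exampleTerm' is a made-up term used only to show the expansion
expanded = expand_compact_iri("sci:exampleTerm", namespaces)
```

A value whose prefix has not been registered, for instance a plain `https://...` URL, is returned unchanged, which is why namespaces need to be added before compact identifiers are used.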
- base(base: str) → dict
Assign the JSON-LD @base URL (also defines ‘@id’ under ‘@graph’ for consistency). See: https://www.w3.org/TR/json-ld/#base-iri
- Parameters
base – @base URL for a JSON-LD file
Defines the base URL for all internal unique identifiers (defined through ‘@id’s). For consistency, the code also sets the ‘@id’ field under ‘@graph’ so that all triple subjects are unique and associated with the same graph
Example:
SciDataObject.base("<baseurl>")
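As an illustration of what a base URL does for internal identifiers, the sketch below resolves a relative ‘@id’ against a base using the standard library's `urljoin`. This mirrors ordinary relative-URL resolution and is not SciDataLib code; the base URL is a placeholder.

```python
from urllib.parse import urljoin

base = "https://example.org/scidata/dataset1/"  # placeholder @base URL
relative_id = "compound/1/"                      # a relative '@id' value

# A relative '@id' is interpreted against the base URL
absolute_id = urljoin(base, relative_id)
```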
- docid(docid: str) → dict
Assign the document identifier. This will become the graph name if the file is uploaded to a graph database
- Parameters
docid – the root level @id value
- version(version: str) → dict
Assign the version of this file (not the version of the data)
- Parameters
version – the top level ‘version’ value
- graph_uid(guid: str) → dict
Assign the uid value within the @graph JSON object
- Parameters
guid – the @graph uid value
Normally the same as the unique id used in the @graph @id value and used to easily find the data in a file system.
Example:
SciDataObject.graph_uid("<uniqueidentifier>")
- author(authors: list, replace=False) → list
Add to or replace the list of authors within the @graph authors section
- Parameters
authors – list of names, or list of dicts with multiple fields
replace – boolean to replace or not the existing data
Add the list of authors of a set of data with the following defined fields in the SciData context file: name, address, organization, email, orcid.
Expects either:
1) a list of dictionaries where each dictionary contains at minimum a ‘name’ key
Example:
SciDataObject.author([{'name': 'George Washington', 'ORCID': 1}, {'name': 'John Adams', 'ORCID': 2}])
2) a list of strings which are author names
Example:
SciDataObject.author(['George Washington', 'John Adams'])
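The two accepted input forms can be reduced to one shape before further processing. The sketch below is an assumption about how that might look, not the library's implementation: `normalize_authors` is a hypothetical helper that turns each plain name into a dictionary with a ‘name’ key and rejects entries without one.

```python
from typing import Dict, List, Union


def normalize_authors(authors: List[Union[str, Dict]]) -> List[Dict]:
    """Hypothetical helper: coerce both accepted input forms to dicts."""
    normalized = []
    for entry in authors:
        if isinstance(entry, str):
            # A bare string is taken to be the author's name
            normalized.append({"name": entry})
        elif isinstance(entry, dict) and "name" in entry:
            # Copy so the caller's dictionary is not modified later
            normalized.append(dict(entry))
        else:
            raise ValueError("each author needs at least a 'name'")
    return normalized


from_names = normalize_authors(["George Washington", "John Adams"])
from_dicts = normalize_authors([{"name": "George Washington", "ORCID": 1}])
```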
- title(title: str) → str
Used to create or replace title key within @graph
- Parameters
title – descriptive title of the dataset
For a data source such as a journal article, this would typically be the title of the article
Example:
SciDataObject.title("The Hitchhiker's Guide to the Galaxy")
- description(description: str) → str
Assign the description field within @graph
- Parameters
description – textual description of the dataset
Used as a brief description of the type of data. For a journal article, this might house the abstract
Example:
SciDataObject.description('a brief description')
- publisher(publisher: str) → str
Assign the publisher field within @graph
- Parameters
publisher – the name or title of the publisher of the data
This is a person, project, research group, organization etc.
Example:
SciDataObject.publisher('The Daily Prophet')
- graphversion(version: str) → str
Assign the data version
- Parameters
version – the version assigned to the data
If a version is not available, the date it was accessed online can be used to indicate the ‘state’ of the data as downloaded
Example:
SciDataObject.graphversion('ChEMBL database v28')
- keywords(keywords: [str, list], replace=False) → list
Add to or replace the keywords of the instance
- Parameters
keywords – important keywords to improve data findability
replace – boolean to replace or not the existing data
Example:
SciDataObject.keywords('important')
- starttime(stime: str) → str
Assign the start time
- Parameters
stime – datetime string
Typically in “%m-%d-%y %H:%M:%S” format
Example:
SciDataObject.starttime('04-05-21 06:14:53')
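A string in the “%m-%d-%y %H:%M:%S” format can be produced from a `datetime` object with the standard library. This is a minimal sketch; the constant name `STIME_FORMAT` is only illustrative.

```python
from datetime import datetime

STIME_FORMAT = "%m-%d-%y %H:%M:%S"  # month-day-2-digit-year, 24-hour time

start = datetime(2021, 4, 5, 6, 14, 53)
stime = start.strftime(STIME_FORMAT)             # string to pass as stime
parsed = datetime.strptime(stime, STIME_FORMAT)  # round-trips to 'start'
```

Note that this format puts the month first and uses a two-digit year, so '04-05-21' means 5 April 2021.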
- permalink(link: str) → dict
Assign the document permanent link
- Parameters
link – URL to the location where this document can be found
Example:
SciDataObject.permalink('https://permanent.link.com/data1')
- related(related, replace=False)
Add to or replace the related URLs
- Parameters
related – URLs to other data related to this dataset
replace – boolean to replace or not the existing data
Example:
SciDataObject.related('http://example.com/greatdata.jsonld')
- ids(ids: [str, list]) → list
Add to the ids list
- Parameters
ids – string or list of strings that are external references to ontological concepts
When called, the contents of ‘ids’ are added to the ids list. Note that when the output function is called, it iterates over the instance content to find any values that are ontological references, in the format “<namespace>:<uniquevalue>”, and adds them to ids. Only ids provided in this format will be added, and duplicates are ignored. Remember to add namespaces for ids.
Example:
SciDataObject.ids(['chebi:00001', 'qudt:GM'])
(requires the addition of the ‘chebi’ and ‘qudt’ namespaces)
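The scan for “<namespace>:<uniquevalue>” values can be sketched as a recursive walk over nested dictionaries and lists. This is an illustrative approximation only: the regular expression, the `collect_ids` helper and the sample `graph` are assumptions, and the library's own matching rules may differ.

```python
import re
from typing import Any, List, Optional

# 'prefix:value' with no whitespace; '(?!//)' skips URLs such as 'https://...'
COMPACT_ID = re.compile(r"^[A-Za-z_][\w.-]*:(?!//)\S+$")


def collect_ids(content: Any, found: Optional[List[str]] = None) -> List[str]:
    """Walk nested dicts/lists and gather unique 'prefix:value' strings."""
    found = [] if found is None else found
    if isinstance(content, dict):
        for value in content.values():
            collect_ids(value, found)
    elif isinstance(content, list):
        for item in content:
            collect_ids(item, found)
    elif isinstance(content, str):
        # Keep first occurrence only, so duplicates are ignored
        if COMPACT_ID.match(content) and content not in found:
            found.append(content)
    return found


graph = {
    "discipline": "w3i:Chemistry",
    "permalink": "https://permanent.link.com/data1",
    "facets": [{"@id": "compound", "unit": "qudt:GM"},
               {"@id": "target", "unit": "qudt:GM"}],
}
ids = collect_ids(graph)
```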
- discipline(disc: str) → str
Assign the discipline area of the data
- Parameters
disc – a discipline name or identifier (preferred)
Best practice is to use an entry in an ontology, e.g. the Modern Science Ontology (https://w3id.org/skgo/modsci#)
Example:
SciDataObject.discipline('w3i:Chemistry')
(requires the addition of the ‘w3i’ namespace)
- subdiscipline(subdisc: str) → str
Assign the subdiscipline area of the data
- Parameters
subdisc – a subdiscipline name or identifier (preferred)
Best practice is to use an entry in an ontology, e.g. the Modern Science Ontology (https://w3id.org/skgo/modsci#)
Example:
SciDataObject.subdiscipline('w3i:AnalyticalChemistry')
- evaluation(evaln: str) → str
Assign the evaluation field
- Parameters
evaln – the method of evaluation of research data
Recommended values of this field are: experimental, theoretical, computational
Example:
SciDataObject.evaluation('experimental')
- aspects(aspects: list) → list
Add to or replace the aspects of the file
Example:
SciDataObject.aspects([{"@id": "assay", "@type": "sdo:assay", "description": "Inhibition of human ERG by MK499 binding assay", "assay_organism": "Homo sapiens"}])
Method also accepts the keyword ‘#intlinks’. See the documentation for scidatapacket.
- facets(facets: list) → list
Add to or replace the facets of the file
Example:
SciDataObject.facets([{"@id": "compound", "@type": "sdo:compound", "mw_freebase": "491.52", "full_molformula": "C26H26FN5O4"}])
Method also accepts the keyword ‘#intlinks’. See the documentation for scidatapacket.
- scope(scope: [str, list]) → str
Assign what thing(s) the dataset relates to
- Parameters
scope – str or list of internal unique ids of the entity or entities in the system that the data describes
The scope of a dataset should be described in the ‘system’ ‘facets’ section. Whatever the data relates to (e.g. chemical system, organism, specimen) should be included as a scope using the unique ‘@id’ defined for that section
Example:
SciDataObject.scope('chemicalsystem/1/')
- datapoint(points: list) → list
Add one or more datapoints
Example:
SciDataObject.datapoint( [{"@id": "datapoint", "@type": "sdo:datapoint", "data": [{"@id": "datum", "@type": "sdo:exptdata", "type": "IC50", "value": "15.2", "units": "uM"}]}])
- scidatapacket(packet)
Add a packet of data where the datapoints are linked with the associated aspects and facets
Template:
packet = [{'aspects':{},'facets':{},'datapoint':{}}, {'aspects':{},'facets':{},'datapoint':{}}]
Example:
SciDataObject.scidatapacket([{
    "aspects": [
        {"@id": "assay", "@type": "sdo:assay",
         "description": "Inhibition of human ERG by MK499 binding assay",
         "assay_organism": "Homo sapiens"}],
    "facets": [
        {"@id": "compound", "@type": "sdo:compound",
         "mw_freebase": "491.52", "full_molformula": "C26H26FN5O4",
         "#intlinks": [
             {"@id": "identifier", "@type": "sdo:identifier",
              "standard_inchi_key": "OINHUVBCKUJZAG-UHFFFAOYSA-N"}]},
        {"@id": "target", "@type": "sdo:target", "pref_name": "HERG",
         "tax_id": 9606, "organism": "Homo sapiens"}],
    "dataset": [
        {"@id": "datapoint", "@type": "sdo:datapoint",
         "data": [
             {"@id": "datum", "@type": "sdo:exptdata", "type": "IC50",
              "value": "15.2", "units": "uM"}]}]}])
- sources(sources: list, replace=False) → dict
Add to or replace the source reference list
- Parameters
sources – information about where the data came from
replace – boolean to replace or not the existing data
Add a list of sources with any of the available defined fields in the SciData context file: citation, reftype, url, doi
Example:
SciDataObject.sources([{'citation': 'Chalk, S.J. SciData: a data model and ontology for semantic representation of scientific data. J Cheminform 8, 54 (2016)', 'doi': 'https://doi.org/10.1186/s13321-016-0168-9'}])
- rights(holder: str, license: str) → dict
Add the rights section to the file (max: 1 entry)
- Parameters
holder – the entity that holds the license to this data
license – the assigned license
- property output: dict
Cleans up the SciData object for output as a dictionary
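Since `output` is a dictionary, writing a JSON-LD document comes down to serialising it with the standard `json` module. The sketch below uses a hand-written stand-in dictionary rather than a real SciData object; its keys are taken from the fields named on this page, but the exact layout is an assumption.

```python
import json

# Hand-written stand-in for the dictionary returned by `output`;
# the real layout is defined by SciDataLib, not by this sketch.
doc = {
    "@context": ["https://stuchalk.github.io/scidata/contexts/scidata.jsonld"],
    "@id": "<docid>",
    "version": "1",
    "@graph": {"@id": "<baseurl>", "uid": "<uniqueidentifier>",
               "title": "The Hitchhiker's Guide to the Galaxy"},
}

# Serialise to JSON-LD text; ensure_ascii=False keeps non-ASCII readable
text = json.dumps(doc, indent=4, ensure_ascii=False)
restored = json.loads(text)  # parses back to an equal dictionary
```

The resulting `text` can be written to a `.jsonld` file with an ordinary `open(..., "w", encoding="utf-8")` call.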