Basic functionality for creating Nif data#

Below a short overview of the functionality of nifigator is given.

Nif contexts#

A NifContext contains a string of a document or part of it. Here’s how to create a simple NifContext.

# The NifContext contains a context which uses a URI scheme
from nifigator import NifContext, OffsetBasedString

# Make a context by passing uri, uri scheme and string
context = NifContext(
  uri="https://mangosaurus.eu/rdf-data/doc_1",
  URIScheme=OffsetBasedString,
  isString="The cat sat on the mat. Felix was his name."
)

# Show the string representation of the context
print(context)

The output shows the string representation of the NifContext object:

(nif:Context) uri = <https://mangosaurus.eu/rdf-data/doc_1&nif=context>
  isString : "The cat sat on the mat. Felix was his name."

Note

Of each Nif object the string representation starts with the object class name between parenthesis and the specific uri of the Nif object. In the lines below the Nif predicates are shown with their value. In this case only one predicate is specified (isString).

Linguistic annotations#

It is possible to add linguistic annotations to the context from the output of a Stanza pipeline.

import stanza

# Create the Stanza pipeline for English language
nlp = stanza.Pipeline("en", verbose=False)

# Process the string of the context and convert is to a dictionary
stanza_dict = nlp(context.isString).to_dict()

# Load the dictionary in the context
context.load_from_stanza_dict(stanza_dict)

Now all data can be accessed from the NifContext object.

The first sentence in the context:

print(context.sentences[0])

This gives:

(nif:Sentence) uri = https://mangosaurus.eu/rdf-data/doc_1&nif=sentence_0_23
  referenceContext : https://mangosaurus.eu/rdf-data/doc_1&nif=context
  beginIndex : 0
  endIndex : 23
  anchorOf : "The cat sat on the mat."
  nextSentence : "Felix was his name."
  firstWord : "The"
  lastWord : "."

The uri of this sentences is derived from the uri of the context by adding the specific offsets of the sentence within the context to the context uri. This is called an OffsetBasedString uri; it provides a unique uri for each sentence, word and phrase of the context.

The first word of the second sentence in the context:

print(context.sentences[1].words[0])

This results in:

(nif:Word) uri = https://mangosaurus.eu/rdf-data/doc_1&nif=word_24_29
  referenceContext : https://mangosaurus.eu/rdf-data/doc_1&nif=context
  beginIndex : 24
  endIndex : 29
  anchorOf : "Felix"
  lemma : "Felix"
  pos : olia:ProperNoun
  morphofeats : olia:Singular
  dependency : https://mangosaurus.eu/rdf-data/doc_1&nif=word_42_43
  dependencyRelationtype : nsubj

Note

The part-of-speech tags and the morphological features are converted from Universal Dependencies (the output of the Stanza NLP processor) to core OLiA classes.

All individual predicates can be accessed from the object. For example, the lemma of the third word of the first sentence:

print(context.sentences[0].words[2].lemma)

This gives:

'sit'

This is the lemma of the word ‘sat’ (the third word of the first sentence).

Nif collections#

You can collect mutliple contexts in a NifContextCollection.

# A NifContextCollection contains a set of contexts
from nifigator import NifContextCollection

# Make a collection by passing a uri
collection = NifContextCollection(uri="https://mangosaurus.eu/rdf-data")

# Add the context that was made earlier
collection.add_context(context)

# show the string representation of the collection
print(collection)

This gives:

(nif:ContextCollection) uri = https://mangosaurus.eu/rdf-data
  hasContext : https://mangosaurus.eu/rdf-data/doc_1&nif=context
  conformsTo : http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core/2.1

The contexts are retrievable as a list of the collection and can be accessed in the following way:

# Retrieving the first context in the collection
collection.contexts[0]

Creating a graph from a collection#

A NifGraph is a rdflib.Graph with additional functionality to convert to and from the Nif objects.

You an create a NifGraph from a NifContextCollection in the following way.

from nifigator import NifGraph

g = NifGraph(collection=collection)

You can then use all the functions of a rdflib.Graph such as serializing the graph.

print(g.serialize(format="turtle")[0:1890])

This gives the Nif data in RDF/turtle format:

@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix olia: <http://purl.org/olia/olia.owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<https://mangosaurus.eu/rdf-data> a nif:ContextCollection ;
    nif:hasContext <https://mangosaurus.eu/rdf-data/doc_1> ;
    dcterms:conformsTo <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core/2.1> .

<https://mangosaurus.eu/rdf-data/doc_1&nif=word_15_18> a nif:OffsetBasedString,
        nif:String,
        nif:Word ;
    nif:anchorOf "the"^^xsd:string ;
    nif:anchorOf_no_accents "the"^^xsd:string ;
    nif:anchorOf_no_diacritics "the"^^xsd:string ;
    nif:beginIndex "15"^^xsd:nonNegativeInteger ;
    nif:dependency <https://mangosaurus.eu/rdf-data/doc_1&nif=word_22_23> ;
    nif:dependencyRelationType "det"^^xsd:string ;
    nif:endIndex "18"^^xsd:nonNegativeInteger ;
    nif:lemma "the"^^xsd:string ;
    nif:oliaLink olia:Article,
        olia:Definite ;
    nif:pos olia:Determiner ;
    nif:referenceContext <https://mangosaurus.eu/rdf-data/doc_1&nif=context> ;
    nif:sentence <https://mangosaurus.eu/rdf-data/doc_1&nif=sentence_0_23> .

<https://mangosaurus.eu/rdf-data/doc_1&nif=word_19_22> a nif:OffsetBasedString,
        nif:String,
        nif:Word ;
    nif:anchorOf "mat"^^xsd:string ;
    nif:anchorOf_no_accents "mat"^^xsd:string ;
    nif:anchorOf_no_diacritics "mat"^^xsd:string ;
    nif:beginIndex "19"^^xsd:nonNegativeInteger ;
    nif:dependency <https://mangosaurus.eu/rdf-data/doc_1&nif=word_12_14> ;
    nif:dependencyRelationType "obl"^^xsd:string ;
    nif:endIndex "22"^^xsd:nonNegativeInteger ;
    nif:lemma "mat"^^xsd:string ;
    nif:oliaLink olia:Singular ;
    nif:pos olia:CommonNoun ;
    nif:referenceContext <https://mangosaurus.eu/rdf-data/doc_1&nif=context> ;
    nif:sentence <https://mangosaurus.eu/rdf-data/doc_1&nif=sentence_0_23> .

You can also parse the serialized data from this graph into another NifGraph and check whether they are isomorphic (meaning that they contain the same triples excepts from the blank nodes).

# Create an empty NifGraph
g1 = NifGraph()

# parse the serialized graph in turtle format
g1.parse(data=g.serialize(format="turtle"))

# Check whether the graphs are isomorphic
print(g1.isomorphic(g))

This gives:

True

With the NifGraph you can store the Nif data in a database or in a file with the functionality provided by RDFLib.

If you have read data into a graph then you can create a NifContextCollection from this in the following way:

# generate a NifContextCollection from a `NifGraph`
collection = g1.collection

# show the string representation of the result
print(collection)

The code will look for data in the graph that satisfies the Nif data format. This shows:

(nif:ContextCollection) uri = https://mangosaurus.eu/rdf-data
  hasContext : https://mangosaurus.eu/rdf-data/doc_1&nif=context
  conformsTo : http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core/2.1

All underlying Nif data can be accessed from this collection in the manner described above, so you can do

print(collection.contexts[0].sentences[0].words[0])

Which returns:

(nif:Word) uri = https://mangosaurus.eu/rdf-data/doc_1&nif=word_0_3
  referenceContext : https://mangosaurus.eu/rdf-data/doc_1&nif=context
  nifsentence : https://mangosaurus.eu/rdf-data/doc_1&nif=sentence_0_23
  beginIndex : 0
  endIndex : 3
  anchorOf : "The"
  lemma : "the"
  pos : olia:Determiner
  morphofeats : olia:Article, olia:Definite