Adding annotations#

You can add annotations to the Nif data in the following way.

Entity occurrences#

First we create a collection with one context.

# For the NLP data we create a NifContext and a NifContextCollection
from nifigator import NifContext, NifContextCollection, OffsetBasedString
from rdflib import URIRef

# Create context with two sentences
context = NifContext(
  base_uri=URIRef("https://mangosaurus.eu/rdf-data/doc_1"),
  URIScheme=OffsetBasedString,
  isString="The cat sat on the mat. Felix was his name."
)
# Create a collection and add the context above
collection = NifContextCollection(uri="https://mangosaurus.eu/rdf-data")
collection.add_context(context)

An annotation refers to a part of the string of the NifContext called a Phrase.

Here is how to create a new annotations for the named entity ‘Felix’.

# a NifPhrase can be an EntityOccurrence or a TermOccurrence
from nifigator import NifPhrase, EntityOccurrence

# Create the EntityOccurrence
entity = NifPhrase(
    base_uri="https://mangosaurus.eu/rdf-data/doc_1",
    URIScheme=OffsetBasedString,
    referenceContext=context,
    beginIndex=24,
    endIndex=29,
    taIdentRef="https://mangosaurus.eu/rdf-data/entities/Felix",
    taClassRef="https://mangosaurus.eu/rdf-data/classes/cat",
    taConfidence=1.0,
    PhraseType=EntityOccurrence,
)

# set the phrases as a list with one element
context.set_Phrases([entity])

For referencing the taIdentRef, taClassRef and taConfidence from the Internationalization Tag Set (itsrdf) are used.

The phrases can then be accessed with

# the string representation of the phrases in the context then looks like this
print(context.phrases)
[(nif:EntityOccurrence) uri = <https://mangosaurus.eu/rdf-data/doc_1&nif=phrase_24_29>
  referenceContext : https://mangosaurus.eu/rdf-data/doc_1
  beginIndex : 24
  endIndex : 29
  anchorOf : "Felix"
  taIdentRef : https://mangosaurus.eu/rdf-data/entities/Felix
  taClassRef : https://mangosaurus.eu/rdf-data/classes/cat
  taConfidence : 1.0
]

We can then create a graph and convert back to a collection.

from nifigator import NifGraph

g = NifGraph(collection=collection)

The phrases can be accessed with

g.collection.contexts[0].phrases
[(nif:TermOccurrence) uri = <https://mangosaurus.eu/rdf-data/doc_1&nif=phrase_24_29>
   referenceContext : https://mangosaurus.eu/rdf-data/doc_1
   beginIndex : 24
   endIndex : 29
   anchorOf : "Felix"]
   taIdentRef : https://mangosaurus.eu/rdf-data/entities/Felix
   taClassRef : https://mangosaurus.eu/rdf-data/classes/cat
   taConfidence : 1.0   

Checking whether the serialized data is the samen as the NifGraph:

from rdflib import Graph
g1 = Graph().parse(data=g.serialize(format="ttl"))
print(g1.isomorphic(g))

In some situations you might want to store the annotations in a different graph. You can do that in the following way:

g_annotations = NifGraph()

for phrase in collection.contexts[0].phrases:
    for triple in phrase.triples():
        g_annotations.add(triple)

Then the graph contains all data about the phrases and nothing else.

from rdflib import RDF

# retrieve and print all triples where the predicatie is rdf:type
for triple in g_annotations.triples([None, RDF.type, None]):
    print(triple)
(rdflib.term.URIRef('https://mangosaurus.eu/rdf-data/doc_1&nif=phrase_24_29'), rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'), rdflib.term.URIRef('http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#Phrase'))
(rdflib.term.URIRef('https://mangosaurus.eu/rdf-data/doc_1&nif=phrase_24_29'), rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'), rdflib.term.URIRef('http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#EntityOccurrence'))
(rdflib.term.URIRef('https://mangosaurus.eu/rdf-data/doc_1&nif=phrase_24_29'), rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'), rdflib.term.URIRef('http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#OffsetBasedString'))
(rdflib.term.URIRef('https://mangosaurus.eu/rdf-data/doc_1&nif=phrase_24_29'), rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'), rdflib.term.URIRef('http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#String'))