We’ve added a new component to the Blackstone model pipeline that facilitates rules-based entity recognition for “concepts” peculiar to legal texts, such as names of offences (e.g. murder) and common legal phrases (e.g. extension of time).

Example
The following example shows how the Concept pipeline extension is used. We use Gensim's summarize() function to reduce the input text (Igor Judge CJ's judgment in Chambers v DPP 1 WLR 1833) to about 20 percent of its original length, measured in sentences:
```python
import spacy
from spacy import displacy
from blackstone.concepts import Concepts
from blackstone.rules.concept_rules import CONCEPT_PATTERNS
from blackstone.displacy_palette import ner_displacy_options
from gensim.summarization import summarize

nlp = spacy.load('en_blackstone_proto')

# Add the Concepts component to the end of the model's pipeline
concepts_pipe = Concepts(nlp)
nlp.add_pipe(concepts_pipe)

# Load the text of Chambers v DPP 1 WLR 1833
with open('chambers_v_dpp.txt', 'r') as data:
    text = data.read()

# Squeeze the text down into a more compressed form using
# Gensim's summariser
summary = summarize(text, ratio=0.2)

# Run the model over the summary rather than the full text
doc = nlp(summary)

print(doc._.concepts)
# >>> [('twitter', 34), ('electronic communications', 20), ('police', 11), ('mens rea', 7), ('evidence', 6)]
```
How does it work?
The implementation of the Concept component is actually rather crude, but it seems to do a decent job nevertheless.
Adding the Concept component to Blackstone's model places spaCy's EntityRuler() at the end of the model's pipeline and adds terms derived from (currently only part of) the subject matter section of ICLR's Red Index, a cumulative case law index that has been in circulation since the 1950s, to the EntityRuler as patterns. Once the EntityRuler has been updated with the concept rules, a second pipeline component, Concepts, is added to the end of the pipeline. Term matches are then exposed via doc._.concepts, which, as the example above shows, holds (concept, count) pairs ranked by frequency.
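To make the two steps above concrete, here is a minimal, dependency-free sketch under stated assumptions. Phrase patterns are written as dicts of the form {"label": ..., "pattern": ...}, the shape spaCy's EntityRuler accepts. They are matched (longest phrase first, non-overlapping) against lowercased tokens, and the matches are then counted into (concept, count) pairs, most frequent first, mirroring the output printed in the example. The names PATTERNS, find_concepts and rank_concepts, the "CONCEPT" label, the sample terms and the lowercasing are all hypothetical. This is not Blackstone's code: the real component works on entities in a spaCy Doc, not on a regex tokenizer.

```python
import re
from collections import Counter

# Hypothetical stand-ins for concept rules; the real patterns are derived
# from ICLR's Red Index and are not reproduced here.
PATTERNS = [
    {"label": "CONCEPT", "pattern": "mens rea"},
    {"label": "CONCEPT", "pattern": "extension of time"},
    {"label": "CONCEPT", "pattern": "murder"},
    {"label": "CONCEPT", "pattern": "evidence"},
]


def tokenize(text):
    """Lowercase word tokenizer (a crude stand-in for spaCy's tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def find_concepts(text, patterns):
    """Return matched concept phrases, preferring the longest match at each position."""
    tokens = tokenize(text)
    # Tokenize each pattern once, drop empty ones, and try longer phrases first.
    phrases = sorted(
        {tuple(tokenize(p["pattern"])) for p in patterns if tokenize(p["pattern"])},
        key=len,
        reverse=True,
    )
    matches = []
    i = 0
    while i < len(tokens):
        for phrase in phrases:
            if tuple(tokens[i:i + len(phrase)]) == phrase:
                matches.append(" ".join(phrase))
                i += len(phrase)  # skip past the match so matches never overlap
                break
        else:
            i += 1  # no pattern starts here; move to the next token
    return matches


def rank_concepts(matches, top_n=5):
    """Count matches and return (concept, count) pairs, most frequent first."""
    return Counter(matches).most_common(top_n)


SAMPLE = (
    "The mens rea of the offence was disputed. The evidence showed no mens rea; "
    "the evidence was thin. An extension of time was refused."
)
print(rank_concepts(find_concepts(SAMPLE, PATTERNS)))
# [('mens rea', 2), ('evidence', 2), ('extension of time', 1)]
```

Counter.most_common keeps ties in first-encountered order, so concepts with equal counts appear in the order they first occur in the text.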