Blackstone Concept Extractor / by Daniel Hoadley


We’ve added a new component to the Blackstone model pipeline that facilitates rules-based entity recognition for “concepts” peculiar to legal texts, such as names of offences (e.g. murder) and common legal phrases (e.g. extension of time).

Example

The following example shows how the Concept pipeline extension is used. We use Gensim's summarize() function to reduce the input text (Igor Judge CJ's judgment in Chambers v DPP [2013] 1 WLR 1833) to 20 percent of its original size:

import spacy
from spacy import displacy

from blackstone.concepts import Concepts
from blackstone.rules.concept_rules import CONCEPT_PATTERNS
from blackstone.displacy_palette import ner_displacy_options

from gensim.summarization import summarize

nlp = spacy.load('en_blackstone_proto')
concepts_pipe = Concepts(nlp)
# Register the component so that doc._.concepts is populated
nlp.add_pipe(concepts_pipe)

# Load the text of Chambers v DPP [2013] 1 WLR 1833
with open('chambers_v_dpp.txt', 'r') as data:
    text = data.read()

# Squeeze the text down into a more compressed form using
# Gensim's summariser

summary = summarize(text, ratio=0.2)
doc = nlp(text)

print(doc._.concepts)

>>> [('twitter', 34), ('electronic communications', 20), ('police', 11), ('mens rea', 7), ('evidence', 6)]

How does it work?

The implementation of the Concept component is actually rather crude, but it seems to do a decent job nevertheless.

Adding the Concept component to Blackstone's model places spaCy's EntityRuler at the end of the model's pipeline and adds terms to the EntityRuler's patterns. Those terms are derived from the subject matter section of ICLR's Red Index (a cumulative case law index that has been in circulation since the 1950s), of which only part is currently covered.
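To make the idea of "adding terms to the EntityRuler's patterns" concrete, here is a minimal sketch of how index terms could be turned into pattern dictionaries in the token-pattern format that spaCy's EntityRuler accepts. The helper name term_to_pattern, the CONCEPT label and the sample terms are assumptions made for illustration; this is not necessarily how Blackstone's CONCEPT_PATTERNS are built. Splitting on whitespace is also a simplification of spaCy's tokenizer.

```python
def term_to_pattern(term, label="CONCEPT"):
    """Turn an index term into a case-insensitive, token-level pattern dict.

    Hypothetical helper: the label name and the whitespace tokenisation
    are assumptions, not Blackstone's actual implementation.
    """
    tokens = term.lower().split()
    return {"label": label, "pattern": [{"LOWER": token} for token in tokens]}


# Sample terms standing in for entries from a subject matter index
index_terms = ["mens rea", "extension of time", "murder"]
patterns = [term_to_pattern(term) for term in index_terms]

print(patterns[0])
# {'label': 'CONCEPT', 'pattern': [{'LOWER': 'mens'}, {'LOWER': 'rea'}]}
```

A list of dictionaries in this shape is what an EntityRuler expects when patterns are added to it, so matching on LOWER would make "Mens Rea" and "mens rea" resolve to the same concept.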

Once the EntityRuler has been updated with the concept rules, a second pipeline component, Concepts, is added to the end of the pipeline.

Term matches are then made accessible via doc._.concepts, which holds (term, count) pairs like those printed in the example above.
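The tallying step can be pictured with a small, spaCy-free sketch. It assumes that matched entities arrive as (text, label) pairs and that concept matches carry a CONCEPT label. The function name tally_concepts, the label names and the top_n cut-off are illustrative assumptions, not Blackstone's actual code.

```python
from collections import Counter


def tally_concepts(entities, label="CONCEPT", top_n=5):
    """Count concept matches case-insensitively and return the most frequent.

    `entities` is an iterable of (text, label) pairs; anything that does not
    carry the concept label is ignored.
    """
    terms = [text.lower() for text, ent_label in entities if ent_label == label]
    return Counter(terms).most_common(top_n)


# Hypothetical matches, standing in for the entities found in a document
entities = [
    ("Twitter", "CONCEPT"),
    ("twitter", "CONCEPT"),
    ("mens rea", "CONCEPT"),
    ("Chambers v DPP", "CASENAME"),
]

print(tally_concepts(entities))
# [('twitter', 2), ('mens rea', 1)]
```

Lower-casing before counting means that differently capitalised mentions of the same term are counted together, which matches the all-lowercase terms in the example output.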