We’ve added a new component to the Blackstone model pipeline that facilitates rules-based entity recognition for “concepts” peculiar to legal texts, such as names of offences (e.g. murder) and common legal phrases (e.g. extension of time).
The following provides an example of how the Concept pipeline extension is used. We make use of Gensim's summarize() function to reduce the input text (Igor Judge CJ's judgment in Chambers v DPP [2013] 1 WLR 1833) down to 20 percent of its original size:
How does it work?
The implementation of the Concept component is rather crude, but it seems to do a decent job nevertheless.
Adding the Concept component to Blackstone's model places spaCy's EntityRuler() at the end of the model's pipeline. Terms are then added to the EntityRuler's patterns. These terms are derived from the subject matter section of ICLR's Red Index (a cumulative case law index that has been in circulation since the 1950s), of which only part is currently covered.
Once the EntityRuler has been updated with the concept rules, a second pipeline component, Concepts, is added to the end of the pipeline. Term matches are then accessible via doc._.concepts.
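To make the two-stage flow concrete, here is a minimal standard-library sketch of the same idea: a rules-based matcher labels known terms, and a second step collects the matches onto the document. It does not use spaCy or Blackstone. The Doc class, the entity_ruler and concepts functions, the CONCEPT label and the three sample terms are all hypothetical stand-ins, and the shape of the collected matches (a list of lowercased strings) is an assumption rather than a description of what doc._.concepts actually returns.

```python
import re
from dataclasses import dataclass, field

# Hypothetical sample terms; the real patterns come from ICLR's Red Index.
CONCEPT_TERMS = ["extension of time", "murder", "grossly offensive"]


@dataclass
class Doc:
    """Toy document: raw text plus the annotations each stage adds."""
    text: str
    ents: list = field(default_factory=list)      # (start, end, label) character spans
    concepts: list = field(default_factory=list)  # matched concept strings


def entity_ruler(doc, terms=CONCEPT_TERMS):
    """Stage 1: label each case-insensitive whole-word term match as CONCEPT."""
    for term in terms:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(doc.text):
            doc.ents.append((match.start(), match.end(), "CONCEPT"))
    doc.ents.sort()  # keep spans in document order
    return doc


def concepts(doc):
    """Stage 2: gather the text of every CONCEPT span onto the document."""
    doc.concepts = [
        doc.text[start:end].lower()
        for start, end, label in doc.ents
        if label == "CONCEPT"
    ]
    return doc


# Order matters: the collecting step must run after the matcher.
PIPELINE = [entity_ruler, concepts]


def run(text):
    """Pass a text through each pipeline stage in turn."""
    doc = Doc(text)
    for component in PIPELINE:
        doc = component(doc)
    return doc


if __name__ == "__main__":
    doc = run("The appellant sought an extension of time to appeal his conviction for murder.")
    print(doc.concepts)
```

The ordering of the two stages mirrors the description above: the collecting step only sees what the matcher has already labelled, which is why it has to sit after the matcher in the pipeline.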