We’ve added a new component to the Blackstone model pipeline that facilitates rules-based entity recognition for “concepts” peculiar to legal texts, such as names of offences (e.g. murder) and common legal phrases (e.g. extension of time).
The following provides an example of how the Concept pipeline extension is used. We make use of Gensim's summarize() function to reduce the input text (Igor Judge CJ's judgment in Chambers v DPP 1 WLR 1833) down to 20 percent of its original size:
How does it work?
The implementation of the Concept component is actually rather crude, but it seems to do a decent job nevertheless.
Adding the Concept component to Blackstone's model places spaCy's EntityRuler() at the end of the model's pipeline and adds terms derived from a (currently) incomplete portion of the subject matter section of ICLR's Red Index (a cumulative case law index that has been in circulation since the 1950s) to the EntityRuler. Once the EntityRuler has been updated with the concept rules, a second pipeline component, Concepts, is added to the end of the pipeline.
Term matches are then rendered accessible via
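
The two-stage idea described in this section (first load a list of concept terms into a rules-based matcher, then collect whatever it matched) can be illustrated with a small library-free sketch. This is not Blackstone's code and it does not use spaCy's EntityRuler; it swaps in a plain regular-expression matcher purely to show the shape of the approach. The term list, the function names (build_matcher, find_concepts, count_concepts) and the sample text are all hypothetical; the real rules are derived from ICLR's Red Index.

```python
import re
from collections import Counter

# Hypothetical term list for illustration only; Blackstone's real rules
# are derived from the subject matter section of ICLR's Red Index.
CONCEPT_TERMS = ["murder", "extension of time", "grievous bodily harm"]


def build_matcher(terms):
    """Compile a single case-insensitive pattern from the term list.

    Longer terms are tried first so that a multi-word phrase is preferred
    over any shorter term it happens to contain.
    """
    ordered = sorted(terms, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def find_concepts(text, matcher):
    """Stage 1: return (start, end, lower-cased term) for every match."""
    return [(m.start(), m.end(), m.group(0).lower()) for m in matcher.finditer(text)]


def count_concepts(matches):
    """Stage 2: collect the matched terms into a frequency table."""
    return Counter(term for _, _, term in matches)


# Made-up sample text, not taken from any judgment.
text = (
    "The appellant sought an extension of time to appeal his conviction for murder. "
    "The extension of time was refused."
)

matcher = build_matcher(CONCEPT_TERMS)
matches = find_concepts(text, matcher)
counts = count_concepts(matches)
print(counts.most_common())
```

An exact-phrase matcher like this is about as crude as the approach described above: it will miss inflected or reordered forms of a term, which is the trade-off for being simple and predictable.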