Something we’re quite passionate about here at ICLR&D is demystifying the business of doing “innovative” things with legal information. That hinges on honesty and transparency, which means avoiding unhelpful buzzwords and showing our “working out”: how we go about scoping and solving problems.
As a starting point, we thought it might be helpful for us to set out what we have in our toolbox. Here we go.
Basic, boring and really useful stuff
Pencils (I’m a fan of the propelling variety)
Post-its (it’s nice to have a mixture of sizes and colours)
Pilot pens (the finer the better)
A3 paper
Sharpies
Whiteboard
Project management and communications
Trello (the free version does everything we need it to)
Slack (keep it contained and make use of channels to manage conversations on particular topics, but don’t go bonkers and make too many channels)
Dropbox (for moving big stuff around)
Rough and ready data management
Excel (albeit begrudgingly)
OpenRefine (really good for getting a look at large tabular datasets that cause Excel to fall over)
Databases (not including Graph)
MarkLogic (for heavy-weight, production-grade stuff)
MongoDB (mainly for holding text data that doesn’t have structure)
PostgreSQL (for structured data)
Servers and cloud services
Amazon EC2 (for deployments and data jobs that require lots of processing power)
Amazon S3 (for storage)
Amazon Kinesis (when we’re playing with streamed data)
AWS Lambda (for triggering event-driven scripts)
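As a rough illustration of what an event-driven script can look like, here is a minimal sketch of a Python Lambda handler. It assumes the function is triggered by S3 upload notifications; that trigger, and the bucket and file names used to try it out, are assumptions made for illustration rather than a description of any particular ICLR&D job.

```python
from urllib.parse import unquote_plus


def lambda_handler(event, context):
    """Collect the bucket and key of every S3 object named in the event."""
    uploaded = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # S3 URL-encodes object keys in event notifications, so decode them.
        key = unquote_plus(record["s3"]["object"]["key"])
        uploaded.append({"bucket": bucket, "key": key})
    # A real function would hand each object to a processing step here
    # (that step is left out because it depends entirely on the job).
    return {"processed": len(uploaded), "objects": uploaded}
```

Because the handler is just a function that takes a dictionary, it can be tried out locally by passing it a hand-written event before anything is deployed.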
Development and IDEs
Data Science-y stuff
Pandas (data wrangling)
spaCy (for NLP)
NumPy (for maths)
Scikit-learn (for machine learning)
TensorFlow (for deep learning)
Keras (also for deep learning)
BeautifulSoup (for parsing XML)
Altair (for visualisation)
Seaborn (also for visualisation)
Graph (not to be confused with charts)
Neo4j (I can’t even begin to explain how brilliant this thing is)
Arrows (for modelling graph networks before moving to Neo4j and after drawing it on paper)
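For anyone unsure what "modelling a graph" involves, the sketch below shows the idea in plain Python: nodes, plus typed relationships between them. It is only a toy stand-in for the paper-and-Arrows stage, not Neo4j itself, and the "Judgment" label, the "CITES" relationship and the case names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    label: str  # hypothetical label, e.g. "Judgment"
    name: str


@dataclass
class Graph:
    # Each relationship is a (start node, relationship type, end node) triple.
    relationships: list = field(default_factory=list)

    def relate(self, start, rel_type, end):
        """Record a directed, typed relationship between two nodes."""
        self.relationships.append((start, rel_type, end))

    def outgoing(self, node, rel_type):
        """Return the nodes that `node` points to via `rel_type`."""
        return [end for start, rel, end in self.relationships
                if start == node and rel == rel_type]


# Hypothetical example data: three judgments and which ones cite which.
case_a = Node("Judgment", "Case A")
case_b = Node("Judgment", "Case B")
case_c = Node("Judgment", "Case C")

graph = Graph()
graph.relate(case_a, "CITES", case_b)
graph.relate(case_a, "CITES", case_c)
graph.relate(case_b, "CITES", case_c)

cited_by_a = graph.outgoing(case_a, "CITES")
```

Once a model like this feels right on paper, the same nodes and relationships are what would be created in a graph database.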