Dataset focused on text mining and computational linguistics