Using machine learning for concept extraction on clinical documents from multiple data sources