Susan Li
1 min read · Nov 22, 2018


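By default, max_df is 1.0, which means “ignore terms that appear in more than 100% of the documents” (so no term is dropped for being too common), while min_df=5 means “ignore terms that appear in fewer than 5 documents”. An integer value is read as an absolute document count, and a float as a proportion of the documents.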
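To make the two thresholds concrete, here is a minimal plain-Python sketch of the document-frequency rule, assuming max_df and min_df are the scikit-learn vectorizer parameters of the same name. It is not scikit-learn's own code: the helper names `document_frequencies` and `filter_vocabulary`, the toy corpus, and the whitespace tokenization are all made up for illustration.

```python
from collections import Counter


def document_frequencies(docs):
    """Count, for each term, how many documents contain it at least once."""
    df = Counter()
    for doc in docs:
        # set() so a term repeated inside one document is counted once
        df.update(set(doc.lower().split()))
    return df


def filter_vocabulary(docs, min_df=1, max_df=1.0):
    """Keep terms whose document frequency lies within [min_df, max_df].

    Hypothetical helper: an int threshold is an absolute document count,
    a float threshold is a proportion of the corpus.
    """
    n_docs = len(docs)
    low = min_df if isinstance(min_df, int) else min_df * n_docs
    high = max_df if isinstance(max_df, int) else max_df * n_docs
    df = document_frequencies(docs)
    return sorted(term for term, count in df.items() if low <= count <= high)


# Toy corpus (assumption): "the" is in all 4 documents, "dog" and "bird" in 1 each
docs = [
    "the cat sat",
    "the dog sat",
    "the bird flew",
    "the cat flew",
]

everything = filter_vocabulary(docs)                    # defaults drop nothing
no_rare = filter_vocabulary(docs, min_df=2)             # drops "bird" and "dog"
no_rare_no_common = filter_vocabulary(docs, min_df=2, max_df=0.75)  # also drops "the"
```

With the defaults nothing is filtered out, min_df=2 removes the terms found in only one document, and max_df=0.75 additionally removes “the” because it appears in more than 75% of the documents.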



Susan Li

Changing the world, one post at a time. Sr Data Scientist, Toronto Canada.