In fact the problem is worse, because the file gets the control characters as well. But the thread that you pointed to did help, and the parsing is better. For future reference, here is a way to make the file readable to human eyes again :o
cat dic.dat | tr -d "\173" | tr -d "\175"...
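
For what it's worth, the part of the pipeline that is visible above deletes octal 173 and 175, which are `{` and `}` in ASCII. The rest of the command is cut off, so the sketch below only covers those two characters. The function name `strip_braces` is made up for illustration, and it assumes a POSIX `tr`, which accepts several characters in one `-d` set, so the `cat` and the second `tr` are not strictly needed.

```shell
# Hypothetical helper (name is an assumption, not from the original post):
# delete every "{" (octal 173) and "}" (octal 175) read from stdin.
strip_braces() {
    # Single quotes pass the backslash escapes to tr unchanged;
    # tr itself interprets \173 and \175 as octal character codes.
    tr -d '\173\175'
}
```

It could then be used as `strip_braces < dic.dat > dic.txt`, where `dic.txt` is just an example output name. Any further characters removed by the truncated part of the original command would have to be added to the set by hand.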