Sunday, May 15, 2011

findstr does not support unicode

I often use the findstr command on Windows XP to search for information in text files (it is the equivalent of grep on Unix).

For example, with this command

findstr /i "PC2K01" *.*

you will get something like:

File_WIN_PC01_20110414.csv:2011-04-14;PC2K01;;G:;72;62;WINDOWS
File_WIN_PC01_20110414.csv:2011-04-14;PC2K01;;H:;54;49;WINDOWS


I don't remember ever having trouble finding something with findstr. Until today, when I tried to find a computer name in some files. I was completely sure this computer existed in my files, yet findstr did not find it...
In the end, I found the file containing my computer manually, so I decided to understand why findstr had not worked. Two possible reasons came to mind immediately: first, findstr is buggy; second, there is an encoding problem. To check the second one, I tested all my files with Get-FileEncoding.ps1, like this:

Import-Module .\Get-FileEncoding.ps1
Get-ChildItem | select name, @{n='Encoding';e={Get-FileEncoding $_.FullName}}


It turned out that not all the files are ASCII: 2.csv is in Unicode (UTF-16).

Name   Encoding
----   --------
1.csv  ASCII
2.csv  Unicode UTF-16 Little-Endian
3.csv  ASCII
4.csv  ASCII
5.csv  ASCII
6.csv  ASCII
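
Get-FileEncoding.ps1 is not reproduced here, so as a rough stand-in, here is a minimal sketch that only checks whether a file starts with the UTF-16 Little-Endian byte order mark (FF FE). The function name Test-Utf16LeBom is hypothetical, and the check is deliberately simplistic: it will miss UTF-16 files saved without a BOM, and a UTF-32 Little-Endian file also starts with FF FE.

```powershell
# Hypothetical helper: returns $true when the file starts with FF FE,
# the byte order mark of UTF-16 Little-Endian.
function Test-Utf16LeBom {
    param([string]$Path)

    # Resolve to a full file-system path, because .NET does not follow
    # PowerShell's current location.
    $full = (Resolve-Path -LiteralPath $Path).ProviderPath
    $stream = [System.IO.File]::OpenRead($full)
    try {
        # Read only the first two bytes instead of the whole file.
        $bom = New-Object byte[] 2
        $read = $stream.Read($bom, 0, 2)
        return ($read -eq 2 -and $bom[0] -eq 0xFF -and $bom[1] -eq 0xFE)
    }
    finally {
        $stream.Close()
    }
}

# List the .csv files in the current directory that carry this BOM.
Get-ChildItem *.csv | Where-Object { Test-Utf16LeBom $_.FullName } | Select-Object Name
```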


To verify my idea, I searched for something that exists in the Unicode file 2.csv. The string mystring appears in 2.csv, and only in this file.

findstr /i "mystring" 2.csv

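and nothing appears. Findstr does not support Unicode: it compares bytes, and in a UTF-16 file each ASCII character is followed by a zero byte, so the bytes of the search string never match.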
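
To see why a byte-oriented search misses the string, here is a small sketch that prints the bytes of mystring in ASCII and in UTF-16 Little-Endian. The variable names are only for illustration.

```powershell
$text = 'mystring'

# The same string encoded as single-byte ASCII and as UTF-16 Little-Endian.
$ansiBytes  = [System.Text.Encoding]::ASCII.GetBytes($text)
$utf16Bytes = [System.Text.Encoding]::Unicode.GetBytes($text)

# Format each byte as two hex digits.
$ansiHex  = ($ansiBytes  | ForEach-Object { $_.ToString('X2') }) -join ' '
$utf16Hex = ($utf16Bytes | ForEach-Object { $_.ToString('X2') }) -join ' '

"ASCII : $ansiHex"
"UTF-16: $utf16Hex"
# ASCII : 6D 79 73 74 72 69 6E 67
# UTF-16: 6D 00 79 00 73 00 74 00 72 00 69 00 6E 00 67 00
```

The ASCII byte sequence never appears contiguously in the UTF-16 version, which is consistent with findstr finding nothing in 2.csv.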

The solution: convert all your files to ANSI.
To do this, use this command to create the files ANSI_1.csv, ANSI_2.csv, ...

Get-ChildItem | ForEach-Object { Get-Content -Path $_.FullName | Out-File "ANSI_$($_.Name)" -Encoding Default }

(For information, you can also do it manually in Notepad: File / Save As, then change the encoding.)

Some other ideas if you don't want to convert your files to ANSI:
- use the find command, which supports Unicode (yes, find handles Unicode, but findstr does not...)

- use the Microsoft Sysinternals Strings utility
http://technet.microsoft.com/en-us/sysinternals/bb897439

- use PowerShell

Select-String -Path *.* -Pattern "string" | Select-Object Filename, Pattern, Line
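
If a UTF-16 file has no byte order mark, Select-String may not guess its encoding. A possible workaround is to state the encoding explicitly with the -Encoding parameter. The sketch below is only an illustration: the sample file name and its content are made up, and the file is written to the temp directory just to have something to search.

```powershell
# Hypothetical sample file: UTF-16 Little-Endian, written WITHOUT a BOM.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'utf16-sample.csv'
$noBomUtf16 = New-Object System.Text.UnicodeEncoding $false, $false
[System.IO.File]::WriteAllText($path, "2011-04-14;PC2K01;;G:;72;62;WINDOWS`r`n", $noBomUtf16)

# Tell Select-String to read the file as UTF-16 LE ("Unicode").
$hits = Select-String -Path $path -Pattern 'PC2K01' -Encoding Unicode
$hits | Select-Object Filename, Line
```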

2 comments:

Pollus Brodeur said...

I found a workaround.
You can use this:
TYPE UTF16.txt > ASCII.txt
FINDSTR object ASCII.TXT

Anonymous said...

Or skip creating a text file by using a pipe (vertical bar):
TYPE UTF16.txt | FINDSTR object