"Search in files" dialog (regexp, replace)
#1
Maybe I'm missing something, but when I search for text in files on Windows XP and Vista, it often fails to find it. That is why I created this dialog. It can also find text using regular expressions, and it can replace text.

Also included is a function to get the path of the folder open in Windows Explorer.


.qml   Search in files.qml (Size: 7.08 KB / Downloads: 562)


.qml   Search in files.qml (Size: 6.18 KB / Downloads: 755)


.zip   SearchInFiles.zip (Size: 254.46 KB / Downloads: 494)
#2
I had to change the code to

Code:
,if(but(18 hDlg)) _s.getpath(_s "")
,run _s "" "Explore"

Otherwise, for some reason, the Explorer 'Open with' dialog opens.
It asks which application should open 'foldername' (e.g. c:\shell -> shell).
Maybe something is wrong in my registry.
pi
#3
This is better:

Code:
,if(but(18 hDlg)) run _s.getpath(_s "") "" "explore"
,else run _s
#4
Is it possible to get the line numbers where a word (term) was found when searching for text in files?

If so, a context menu for opening the file in an external editor would be nice.
pi
#5
Possible. Maybe some day I'll need this feature too, and then I'll add it.

Also, before searching binary files it may be better to replace all 0 characters with spaces, because find and findrx search only until the first 0 character.
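
A minimal sketch of that idea, in Python rather than QM, for illustration only. Both helper names are hypothetical: `c_style_find` only emulates a search that stops at the first 0 character, and `nul_to_space` is the suggested preprocessing step.

```python
def c_style_find(data: bytes, needle: bytes) -> int:
    """Emulate a search that treats the first 0 byte as the end of the text."""
    end = data.find(b"\x00")
    visible = data if end < 0 else data[:end]
    return visible.find(needle)


def nul_to_space(data: bytes) -> bytes:
    """Replace every 0 byte with a space; offsets stay the same."""
    return data.replace(b"\x00", b" ")


sample = b"MZ\x00\x00header\x00payload text"
print(c_style_find(sample, b"payload"))                # -1: hidden behind a 0 byte
print(c_style_find(nul_to_space(sample), b"payload"))  # 11: found after cleaning
```

Because a space has the same length as the byte it replaces, offsets found in the cleaned copy are still valid in the original data.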
#6
I have a folder with many .ppt and .pps files. They contain title text, text on the slides, and text in the "notes sections".
If I leave the text area blank and just search for *.pps, all the files come up.
Is there any way I could search them using the text inside the ppt/pps files?
Stuart
#7
Hi Gintaras,
Is there any way to modify the text that goes to the ListBox so that the user sees only the file names (i.e. no full path and no extension)?

I took off the (1) from dd.FileName(1) to get just the filename.ext,
and then used replacerx to replace the extension with "". But then, when I click on a result in the list box, it of course doesn't know the path of the file.

Do I have to set up a two-dimensional array that maps (no file path, no extension) to (with file path, with extension)?

Thanks for any help!!!

Stuart
#8
Updated. Now the list box contains relative paths.
#9
OK

I just tried your search program and there is no reason to use any other search program than yours. I gave it the hardest test I could (one which I know it could not find) and it found it.
#10
Hi All,
I am now using QM code derived from the Search in Files dialog to find files and file paths, but the number of files is so huge (about 100,000 files in about 65 GB) that searching through them takes too long. I have found software online that creates an index (in about 2 minutes or so) and then returns lightning-fast results. The best I have found so far (shareware) is "Index Your Files" http://www.indexyourfiles.com/. The problem is that it doesn't support a command-line interface (to my knowledge), so I can't automate its usage within my QM file. I was just wondering the following:

1) any recommendations for software that people are using to do this...preferably with command-line search query submit and output to textfile so I can use it in QM dialog

2) What is the nature of "indexing" and can I do it myself easily within QM using SQLITE as Index database.
- I tried just outputting the filepaths of my folders to .txt file but it got so huge that it wouldn't open properly and I know it would be too big to do simple text string searching in QM. But maybe this could be done with SQLITE database. If it was just a list of filepaths, would searching this as a SQLITE database be faster than going through the regular Windows file structure like in the Search in Files dialog? Any advice on how to structure such an index database if so?

3) Alternatively, if there is software that creates an index that could then be read by QM rather than that software, then I would only need the external software at time of index creation (won't need to be updated too often). Maybe there is some common database format that such software uses. Anybody's experience would be greatly appreciated. For example, the Index created by "Index Your Files" is only 3.42 MB containing 89,030 files and is .EXP file but I don't know much about what that format is.

Thanks, Stuart
#11
I cannot answer 1) and 3).

Quote:2) What is the nature of "indexing" and can I do it myself easily within QM using SQLITE as Index database.
- I tried just outputting the filepaths of my folders to .txt file but it got so huge that it wouldn't open properly and I know it would be too big to do simple text string searching in QM. But maybe this could be done with SQLITE database. If it was just a list of filepaths, would searching this as a SQLITE database be faster than going through the regular Windows file structure like in the Search in Files dialog? Any advice on how to structure such an index database if so?

Indexing nature, as I imagine it:
1. Indexing:
a) Get all filenames.
b) Get all unique words.
c) Bind each word with all filenames that contain it.
d) Save all to a database.
2. Searching:
a) Open the database.
b) Find the word in the word list (1 b).
c) Get filenames that are bound to the word.

This is simplified. It would need a lot of programming to be useful. But if you don't need to search in file data, it would be quite simple.

Or you can use Windows Search: Enumerate files (with text in file)
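
The indexing and searching steps listed above can be sketched roughly as follows. This is Python rather than QM, the function names `build_index` and `search` are hypothetical, and an in-memory dictionary stands in for the database of step 1 d.

```python
import re
from collections import defaultdict


def build_index(files):
    """files maps filename -> text. Returns a map of word -> set of filenames."""
    index = defaultdict(set)
    for name, text in files.items():
        # Step 1 b: unique words; step 1 c: bind each word to the file.
        for word in set(re.findall(r"\w+", text.lower())):
            index[word].add(name)
    return index


def search(index, word):
    """Step 2: look up the word and return the filenames bound to it."""
    return sorted(index.get(word.lower(), set()))


files = {"a.txt": "Cats and dogs", "b.txt": "Dogs only", "c.txt": "Nothing here"}
index = build_index(files)
print(search(index, "dogs"))  # ['a.txt', 'b.txt']
```

A search then costs one dictionary lookup instead of a scan of every file, which is where the speed of an index comes from.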
#12
Thanks so much for your explanation. I just need filename/filepath. I don't need to index the actual filetext (pdf's).
To bind each word to the files, I guess I would have to put each filename first into its own db lookup table, so that I would then bind only the row number for the article in the main table (otherwise the list of full filenames for common words would be huge if there were a full-text entry for each bound file).
After that, it would be easy to match an OR statement (additive), but the much more useful and specific AND statements seem more complicated. Would I have to do a series of nested loops for each AND word in the search query?
Thanks for any thoughts, Stuart
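
One possible way to avoid nested loops for AND queries, sketched in Python with its built-in sqlite3 module rather than in QM. The table layout and the names `build` and `find_all_words` are assumptions made for illustration. Each word row stores only the row id of the file, and a single GROUP BY ... HAVING query keeps the files that contain every query word.

```python
import re
import sqlite3


def build(paths):
    """Create an in-memory database: a files lookup table plus word -> file id rows."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, path TEXT)")
    db.execute("CREATE TABLE words (word TEXT, file_id INTEGER)")
    db.execute("CREATE INDEX words_word ON words (word)")
    for path in paths:
        file_id = db.execute("INSERT INTO files (path) VALUES (?)", (path,)).lastrowid
        for word in set(re.findall(r"\w+", path.lower())):
            db.execute("INSERT INTO words VALUES (?, ?)", (word, file_id))
    return db


def find_all_words(db, query):
    """Return paths that contain every word of the query (AND search)."""
    words = sorted(set(re.findall(r"\w+", query.lower())))
    if not words:
        return []
    marks = ",".join("?" * len(words))
    sql = (
        "SELECT f.path FROM words w JOIN files f ON f.id = w.file_id "
        f"WHERE w.word IN ({marks}) GROUP BY f.id "
        "HAVING COUNT(DISTINCT w.word) = ? ORDER BY f.path"
    )
    return [row[0] for row in db.execute(sql, (*words, len(words)))]


db = build([r"C:\docs\cats and dogs.pdf", r"C:\docs\dogs.pdf", r"C:\pets\cats.txt"])
print(find_all_words(db, "dogs cats"))
```

Dropping the HAVING clause turns the same query into an OR search, so both kinds of query can share one table layout.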
#13
Function QuickFindFile_Index
Code:
;/
function $databaseFile $folders

;Creates database for <tip>QuickFindFile_Find</tip>.
;Error if fails.

;databaseFile - database file. Ex: "$my qm$\x.db3"
;folders - list of folders. Will get all file and folder paths from these folders. Ex: "C:[]E:\Folder"


Sqlite db.Open(databaseFile)
db.Exec("DROP TABLE files"); err
db.Exec("BEGIN TRANSACTION")
db.Exec("CREATE TABLE files (path)")

str f
foreach f folders
,f+iif(f.end("\") "*" "\*")
,Dir d
,foreach(d f FE_Dir 0x6)
,,str sPath=d.FileName(1)
,,sPath.lcase ;;in Sqlite LIKE, Unicode chars case sensitive
,,sPath.SqlEscape
,,db.Exec(F"INSERT INTO files VALUES ('{sPath}')")

db.Exec("END TRANSACTION")

err+ end _error

;info: with sqlite 50% slower than with raw txt file

Function QuickFindFile_Find
Code:
;/
function $databaseFile $filePattern ARRAY(str)&results

;Finds files in database created by <tip>QuickFindFile_Index</tip>.
;Error if fails.

;databaseFile - database file.
;filePattern - file pattern. Must match full path. Examples: "*.txt", "C:\*.txt", "C:\Folder\*", "*\file.txt".
;results - receives full paths of found files and folders.


Sqlite db.Open(databaseFile)

str s=filePattern
s.lcase
s.findreplace("`" "``")
s.findreplace("%" "`%")
s.findreplace("_" "`_")
s.findreplace("*" "%")
s.findreplace("?" "_")
s.SqlEscape
db.Exec(F"SELECT path FROM files WHERE path LIKE '{s}' ESCAPE '`'" results)

err+ end _error

;info: with sqlite 5 times slower than with raw txt file. Not faster if we get all and use matchw.
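
For readers unfamiliar with the escaping above, here is a rough Python equivalent of the wildcard-to-LIKE translation, using the built-in sqlite3 module. The names `wildcard_to_like` and `find` are hypothetical, and parameter binding is used here in place of SqlEscape.

```python
import sqlite3


def wildcard_to_like(pattern):
    """Translate a * and ? wildcard pattern into a LIKE pattern with ` as escape."""
    s = pattern.lower()
    s = s.replace("`", "``")   # escape the escape character first
    s = s.replace("%", "`%")   # literal % must not act as a LIKE wildcard
    s = s.replace("_", "`_")   # literal _ must not act as a LIKE wildcard
    s = s.replace("*", "%")    # * matches any run of characters
    s = s.replace("?", "_")    # ? matches exactly one character
    return s


def find(db, pattern):
    """Return stored paths matching the wildcard pattern."""
    sql = "SELECT path FROM files WHERE path LIKE ? ESCAPE '`' ORDER BY path"
    return [row[0] for row in db.execute(sql, (wildcard_to_like(pattern),))]


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE files (path)")
for p in (r"c:\my_file.txt", r"c:\myxfile.txt", r"c:\help.chm"):
    db.execute("INSERT INTO files VALUES (?)", (p,))

print(wildcard_to_like("*my_file.txt"))  # %my`_file.txt
print(find(db, "*my_file.txt"))          # only the path with a real underscore
```

The order of the replacements matters: the escape character and the literal % and _ are escaped before * and ? are turned into LIKE wildcards.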

test

Macro Macro1479
Code:
out

str dbFile="$my qm$\QuickFindFile_Index.db3"

int t1=timeGetTime

QuickFindFile_Index dbFile "$qm$[]$my qm$"
;QuickFindFile_Index dbFile "c:"

int t2=timeGetTime

ARRAY(str) a
QuickFindFile_Find dbFile "*.chm" a

int t3=timeGetTime
out "%i %i" t2-t1 t3-t2

for(_i 0 a.len) out a[0 _i]


_____________________________________________________________________________________

The same with txt database.
Indexing 33% faster.
Searching 5 times faster.
Speed tested with ~150000 files.

Function QuickFindFile_Index2
Code:
;/
function $databaseFile $folders

;Creates database for <tip>QuickFindFile_Find2</tip>.
;Error if fails.

;databaseFile - database file. Ex: "$my qm$\x.txt"
;folders - list of folders. Will get all file and folder paths from these folders. Ex: "C:[]E:\Folder"


str s f
__HFile h.Create(databaseFile CREATE_ALWAYS GENERIC_WRITE)

foreach f folders
,f+iif(f.end("\") "*" "\*")
,Dir d
,foreach(d f FE_Dir 0x6)
,,s.addline(d.FileName(1))
,,if s.len>10000
,,,;g1
,,,if(!WriteFile(h s s.len &_i 0)) end _s.dllerror
,,,s.fix(0 1)
,if(s.len) goto g1

err+ end _error

Function QuickFindFile_Find2
Code:
;/
function $databaseFile $filePattern ARRAY(str)&results

;Finds files in database created by <tip>QuickFindFile_Index2</tip>.
;Error if fails.

;databaseFile - database file.
;filePattern - file pattern. Must match full path. Examples: "*.txt", "C:\*.txt", "C:\Folder\*", "*\file.txt".
;results - receives full paths of found files and folders.


str s f
int na(65000) nr nrTotal

results=0

__HFile h.Create(databaseFile OPEN_EXISTING GENERIC_READ FILE_SHARE_READ)
rep
,if(!ReadFile(h s.all(na) na &nr 0)) end _s.dllerror
,if(!nr) break
,s.fix(nr)
,s.fix(findcr(s 10)+1)
,
,foreach f s
,,if(matchw(f filePattern 1)) results[]=f
,
,nrTotal+s.len
,SetFilePointer h nrTotal 0 0

err+ end _error

test

Macro Macro1481
Code:
out

str dbFile="$my qm$\QuickFindFile_Index.txt"

int t1=timeGetTime

QuickFindFile_Index2 dbFile "$qm$[]$my qm$"
;QuickFindFile_Index2 dbFile "c:"

int t2=timeGetTime

ARRAY(str) a
QuickFindFile_Find2 dbFile "*.chm" a

int t3=timeGetTime

out "%i %i" t2-t1 t3-t2

out a
#14
Holy Guacamole!! Just the trick.... I can now use a regex to parse the query string and submit each AND word in the statement, get the results using your new functions, and then get the common (overlapping) subsets.

To get the individual words, I will use something like this:

Macro Macro55
Code:
str Query = "test phrase here";; from input box
str Pattern_1w = "\b\w+?\b"
ARRAY(str) arr_1w
int Total_1w = findrx(Query Pattern_1w 0 4 arr_1w)
out Total_1w
out "[][]"
for _i 0 Total_1w
,out arr_1w[0 _i]

rep 1000
,Thanks Gintaras!!!

Stuart
#15
Is the Quick Find packaged in the exe above? Looks amazing!
Thanks Gintaras!
#16
TheVig Wrote:Is the Quick Find packaged in the exe above? Looks amazing!
Thanks Gintaras!

No.
#17
The indexing is fantastic for searches of the type "Exact Phrase with Wildcards" but I would like to be able to search where the results come back in this order: Exact Entire Search Phrase Match (whole exact > partial word match) > All Words (order not important) > some (variable) words.

This will help because when sending the search query "cats dogs" some of the matches may be in reverse order so even "cats*dogs" won't match "dogs and cats" though this would be desirable.

Ideally, even if you just threw a bunch of words at it, the items that match the most of them would come to the top even if some didn't match. This would make the search much more "google-like".

To extract each word from the search query I would start with code like this:
Macro Macro79
Code:
str Query = "test phrase here";; from input box
str Pattern_1w = "\b\w+?\b"
ARRAY(str) arr_1w
int Total_1w = findrx(Query Pattern_1w 0 4 arr_1w)
out Total_1w
out "[][]"
for _i 0 Total_1w
,out arr_1w[0 _i]

But then I would have to run them through the Index Search in variable combinations and then figure out a way to remove the overlaps. Any suggestions for a strategy for this, or am I underestimating the complexity of this problem?
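
One possible strategy, sketched in Python rather than QM: score every indexed path once instead of running separate searches and merging them, so no overlaps arise. The tier values and the function name `rank` are assumptions for illustration, and matching is by plain substring, not by whole word.

```python
import re


def rank(paths, query):
    """Order paths: exact phrase first, then all words, then some words."""
    phrase = " ".join(query.lower().split())
    words = set(re.findall(r"\w+", phrase))
    scored = []
    for path in paths:
        text = path.lower()
        hits = sum(1 for w in words if w in text)
        if hits == 0:
            continue  # no query word matched: drop the path
        if phrase in text:
            tier = 3  # exact phrase
        elif hits == len(words):
            tier = 2  # all words, any order
        else:
            tier = 1  # only some words
        scored.append((-tier, -hits, path))
    # Sort by tier, then by number of matched words, then alphabetically.
    return [path for _, _, path in sorted(scored)]


paths = ["dogs and cats.pdf", "cats dogs.pdf", "cats only.pdf", "birds.pdf"]
print(rank(paths, "cats dogs"))
# ['cats dogs.pdf', 'dogs and cats.pdf', 'cats only.pdf']
```

Since each path gets exactly one score, a path that would match several search variants still appears only once in the result.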

Thanks for any suggestions or directions to go on this.

Stuart
#18
Quite complex. Look at QM 'Search Help and Tools'. It is only part of what you need. It is in System folder, look for CHI_CreateIndex.

