regexp with quotes
#1
I'm trying to search through HTML using findrx and pull out a numeric string within quotes. Obviously, I can't include the double quote character directly in the search pattern, so the code I'd LIKE to use is below, though it isn't working. \042 is the octal code for the double quote.
Any ideas?


Htm el=htm("BODY" "" "" " Internet Explorer" "" 0 0x20)
findrx(el.HTML "value=\042(\d+)\042" 0 4 tmp)
#2
I tested your regex in other software and it works. Here are some things to try:
  • Make sure "tmp" is an array, or flag 4 will not work.
  • Index 0 of "tmp" contains the entire match; index 1 will contain your value.
  • Check your input.
  • Use this regex to account for possible spaces: "value\s?=\s?\042(\d+)\042".
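To check the pattern itself outside QM, here is a minimal Python sketch. It is not QM code, and the sample string is made up for illustration, since the real page markup isn't shown in this thread. It only shows that \042 matches a double quote and how the whole match differs from the captured group:

```python
import re

# Hypothetical sample input; the actual page HTML is not shown in the thread.
sample = '<input type="hidden" name="id" value = "12345">'

# \042 is the octal escape for the double quote character.
# \s? allows one optional whitespace character on each side of "=".
pattern = re.compile(r'value\s?=\s?\042(\d+)\042')

match = pattern.search(sample)
whole = match.group(0)   # the entire match, like index 0 of the result array
number = match.group(1)  # the captured digits, like index 1 of the result array
```

If the pattern matches here but not in the macro, the input text is the more likely culprit than the regex.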
Matt B
#3
Well, after some struggling, I dumped out the contents of el.HTML and saw that it did not match what I saw when I viewed the HTML source in IE. It rearranged the attributes and removed the quotes, which is why my pattern wouldn't match. After that, I was able to code up something that worked.

My assumption is that the IE DOM doesn't return the exact source that was loaded to create the page?
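For anyone hitting the same thing, one way to cope is a pattern that treats the quotes as optional. The following is a minimal Python sketch, not QM code and not the code I actually ended up with; the function name extract_value and both sample strings are hypothetical, and the case-insensitive flag is an assumption in case the DOM changes letter case:

```python
import re

# Quotes around the number are optional, because the DOM output may drop them.
# \s* tolerates any whitespace around "=".
VALUE_RX = re.compile(r"""value\s*=\s*["']?(\d+)""", re.IGNORECASE)

def extract_value(html):
    """Return the first numeric value attribute as a string, or None if absent."""
    match = VALUE_RX.search(html)
    return match.group(1) if match else None

# Hypothetical inputs: the original source form and a DOM-style form
# with the attributes reordered and the quotes removed.
from_source = '<input type="hidden" name="id" value="12345">'
from_dom = '<INPUT value=12345 type=hidden name=id>'

source_result = extract_value(from_source)
dom_result = extract_value(from_dom)
```

Because the pattern anchors only on the attribute name and the digits, it does not depend on attribute order either.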
#4
Quote: My assumption is that the IE DOM doesn't return the exact source that was loaded to create the page?

Yes, even if you use el.DocText(1).

Quote: I can't include the double quote character directly into the search pattern

You can. In a QM string, two single quotes ('') stand for a double quote:
Code:
findrx(el.HTML "value=''(\d+)''" 0 4 tmp)

