Get text of webpage
#1
Is it possible to get the text of a web page without opening Explorer?
#2
Member function str.Html2Text
Code:
function $HTML

;Extracts text from HTML.

;EXAMPLE
;str s
;IntGetFile "http://www.quickmacros.com" s
;s.Html2Text(s)
;out s



MSHTML.IHTMLDocument2 d._create(uuidof(MSHTML.HTMLDocument))
ARRAY(VARIANT) a.create(1)
a[0]=HTML
d.write(a)
this=d.body.innerText
#3
Thanks.
#4

How would I grab specific information, rather than the whole web page?

For example, to just grab the "Requirements" from the QM home page.
Taking on Quick Macros one day at a time
#5
Use a regular expression.

The next QM release will also have a function, HtmlParse, that gets the document object model of HTML.

Code:
function $HTML [MSHTML.IHTMLDocument2&doc2] [MSHTML.IHTMLDocument3&doc3]

;Creates document object model of HTML.

;HTML - HTML.
;doc2, doc3 - variables that, after calling the function, can be used to get parsed HTML information. Can be omitted or 0.


;EXAMPLE
;str s
;IntGetFile "http://www.quickmacros.com" s ;;download a html file
;
;MSHTML.IHTMLDocument2 d; MSHTML.IHTMLDocument3 d3
;HtmlParse(s d d3)
;
;s=d.body.innerText
;ShowText "body text" s
;
;MSHTML.IHTMLElement el=d.links.item(6)
;s=el.getAttribute("href" 0)
;s.replacerx("^about:") ;;relative links have "about:" at the beginning
;ShowText "URL of 7-th link" s
;
;MSHTML.IHTMLElement eb=d3.getElementsByTagName("B").item(0)
;s=eb.innerText
;ShowText "text of first bold text" s



MSHTML.IHTMLDocument2 d._create(uuidof(MSHTML.HTMLDocument))
ARRAY(VARIANT) a.create(1)
a[0]=HTML
d.write(a)

if(&doc2) doc2=d
if(&doc3) doc3=+d
#6

Would findrx be better at grabbing specific information on a page compared to HtmlParse?
#7
findrx is faster, and is easy to use if you are familiar with regular expressions. Sometimes HtmlParse is more useful, especially when you want to extract text or part of text without any HTML, because it is not easy to correctly parse HTML using regular expressions. It depends on what you want to find in the page.
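
To illustrate the "text without any HTML" case, here is a minimal sketch in Python rather than QM. It uses the standard `html.parser` module as a stand-in for the idea behind HtmlParse; the `TextExtractor` class, the `html_to_text` helper and the sample markup are made up for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text between tags and ignores the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Skip chunks that contain only whitespace.
        if data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML fragment (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical sample markup, not taken from a real page.
sample = "<p>High: <b>34&deg;F</b> <i>sunny</i></p>"
text = html_to_text(sample)
print(text)
```

The parser also decodes entities such as `&deg;`, which a tag-stripping regular expression would leave in the text.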
#8

Oh OK, which do you think would be better to use if I was trying to grab all the "High/Low" temperatures?
http://www.weather.com/weather/tenday/4 ... undeclared
#9
Code:
out

str s
IntGetFile "http://www.weather.com/weather/tenday/48183?from=36hr_fcst10DayLink_undeclared" s

ARRAY(str) a; int i
if(!findrx(s ">(\d+)&deg;F</B>" 0 1|4 a)) ret
for i 0 a.len
,out a[1 i]
#10

Thanks, will this only grab what is in bold print?
#11
It grabs all numbers followed by degrees F.
#12

Okay, any reason why when I run this it only gives me the high temperatures?
#13
high

http://www.weather.com/weather/tenday/48183?dp=htempdp

low

http://www.weather.com/weather/tenday/48183?dp=ltempdp
#14

Great, thanks. Any idea why I get this error: Error (RT) in 10 Day Forecast: array is not created. *NOTE: this macro is not done yet, I'm just curious why I'm getting the error in one and not the other.*

EDIT: Posted wrong code for first macro, corrected now.

Macro ( 10 Day Forecast ) Trigger ( @11 )
Code:
str Src Src2 Message zipcode url url2 s a
zipcode="48183" ;;<----------Enter your zipcode
ARRAY(str) Temp FeelsLike
url.from("http://www.weather.com/weather/tenday/" zipcode "?dp=htempdp")
url2.from("http://www.weather.com/weather/tenday/" zipcode "?dp=ltempdp")
int MatchSuccess i

IntGetFile url Src
Src.setclip
if findrx(Src "<B CLASS=obsTempTextA>([\d]{1,3})" 0 0 Temp) ;;([\d]{1,3}+)
,IntGetFile url2 Src2
,Src2.setclip
,if findrx(Src "<B CLASS=obsTempTextA>([\d]{1,3})" 0 0 Temp) ;;([\d]{1,3}+)  
,,Message.from("Temperature: " Temp[1] "F" "[]Feels Like: " FeelsLike[1] "F")
,,mes(Message "Current Temperature" "isa")


But in this macro I don't get the error.

Macro ( Current Weather ) Trigger ( @11 )
Code:
str Src Message zipcode url
zipcode="48183" ;;<----------Enter your zipcode
ARRAY(str) Temp FeelsLike
url.from("http://www.weather.com/weather/local/" zipcode "?lswe=" zipcode "&lwsa=WeatherLocalUndeclared&from=whatwhere")
int MatchSuccess

IntGetFile url Src
Src.setclip
if findrx(Src "<B CLASS=obsTempTextA>([\d]{1,3})" 0 0 Temp) ;;([\d]{1,3}+)
,if findrx(Src "<B CLASS=obsTextA>Feels Like<BR> ([\d]{1,3})" 0 0 FeelsLike)
,,Message.from("Temperature: " Temp[1] "F" "[]Feels Like: " FeelsLike[1] "F")
,,mes(Message "Current Temperature" "isa")
#15
findrx returns -1 when it does not find a match, so a plain "if findrx(...)" is also true in that case. Test the result explicitly:

if(findrx(...)>=0)
,code

With flag 4, it returns 0 when it does not find a match.
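
The same pitfall exists in any language with a "position or -1" convention. A minimal sketch in Python, not QM: here `str.find` stands in for findrx without flag 4, and the sample string is made up.

```python
# Hypothetical page fragment that does not contain the text we search for.
text = "Feels Like 30"

# Like findrx without flag 4, str.find returns -1 when nothing is found.
position = text.find("obsTempTextA")

# Bug pattern: -1 is non-zero, so a bare truth test treats "not found" as true.
wrong_branch_taken = bool(position)

# Correct pattern: compare the result with 0 explicitly.
found = position >= 0

print(position, wrong_branch_taken, found)
```

This is why the first macro reaches the line that reads the array even though the array was never created.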
#16

I see, I'm just not sure how this is not finding it. Not sure what I'm doing wrong.
#17
When a regular expression does not want to work, I remove some parts of it until it begins to work; then I can see where the mistake was.
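
The same idea can be run in the other direction: build the expression up piece by piece and note the first piece that makes it stop matching. A rough sketch in Python, not QM; the `first_failing_piece` helper and the sample text are hypothetical.

```python
import re


def first_failing_piece(pieces, text):
    """Return the index of the first piece that breaks the match, or None."""
    for count in range(1, len(pieces) + 1):
        pattern = "".join(pieces[:count])
        if re.search(pattern, text) is None:
            return count - 1
    return None


# Hypothetical fragment of page source.
sample = "<B>34&deg;F</B>"

# The expression split into parts; the last part has the wrong letter case.
pieces = [">", r"(\d+)", "&deg;", "F", "</b>"]

culprit = first_failing_piece(pieces, sample)
print(culprit, pieces[culprit])
```

Here the last piece is reported, which points at a case mismatch. A case-insensitive search would fix it, which is what flag 1 appears to do in the findrx calls in this thread.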
#18

That's what I do, but now I'm not getting any error; it's simply not working.
I've changed this just to display the highs.


Macro ( 10 Day Forecast2 )
Code:
str Src Src2 Message zipcode url url2 s a
zipcode="48183"
ARRAY(str) Temp FeelsLike
url.from("http://www.weather.com/weather/tenday/" zipcode "?dp=htempdp")
url2.from("http://www.weather.com/weather/tenday/" zipcode "?dp=ltempdp")
int MatchSuccess i

IntGetFile url Src
Src.setclip
if(findrx(Src ">(\d+)&deg;F</B>" 0 1|4 Temp)<0) ret
,Message.from("Temperature: " Temp[1] "F")
,mes(Message "10-Day Forecast" "isa")

I get no error, but it doesn't put anything into a message.
#19
Don't use tabs/commas where they should not be.
#20
You're right. Okay, after a little work I have it giving me a message with the weather, but it doesn't give me all 10 temperatures. I only get 1 high and 1 low, and it might actually just be the same temperature labeled under each one. Any ideas on how to get it to show all 10 temperatures?

Macro ( 10 Day Forecast3 )
Code:
str Src Src2 Message zipcode url url2 s a
zipcode="48183"
ARRAY(str) High Low
url.from("http://www.weather.com/weather/tenday/" zipcode "?dp=htempdp")
url2.from("http://www.weather.com/weather/tenday/" zipcode "?dp=ltempdp")
int MatchSuccess i

IntGetFile url Src
Src.setclip
if(findrx(Src ">(\d+)&deg;F</B>" 0 1|4 High)<0) ret
for i 0 High.len
,IntGetFile url2 Src2
,Src2.setclip
,if(findrx(Src ">(\d+)&deg;F</B>" 0 1|4 Low)<0) ret
,for i 0 Low.len
,,Message.from("Highs: " High[1 i] "F" "[]Lows: " Low[1 i] "F")
mes(Message "10-Day Forecast" "isa")
#21
I don't know why you call IntGetFile for url2 repeatedly, or why you use the clipboard. My version is:

Code:
out

str Src Src2 message zipcode url url2 s a
zipcode="48183"
ARRAY(str) High Low
url.from("http://www.weather.com/weather/tenday/" zipcode "?dp=htempdp")
url2.from("http://www.weather.com/weather/tenday/" zipcode "?dp=ltempdp")
int MatchSuccess i

IntGetFile url Src
IntGetFile url2 Src2

if(findrx(Src ">(\d+)&deg;F</B>" 0 1|4 High)=0) ret
if(findrx(Src2 ">(\d+)&deg;F</B>" 0 1|4 Low)=0) ret

for i 0 High.len
,message.formata("%s %s[]" Low[1 i] High[1 i])

mes message
#22

Oh okay, not sure why I did that either. Thanks for the help. One last question: is it possible to add labels for each row? Such as
Monday: 34 28
Tuesday: 22 18
etc.?
#23
Extract them from one of the pages using findrx, like you do with temperatures.
#24

Good idea, I'll give it a try. Thanks for all the help.
#25

I hate to ask so many questions; I'm just trying to learn how to do this. But how do I know what to fill in for the macro, for the dates or days? Is there a specific code for links, bold letters, or underlined statements, or is there a code I can find on the bottom of the QM screen? I've never really messed with findrx before, and after reading through the QM reference I'm a bit confused.
#26
In a web browser, right-click somewhere and click 'View source' or a similar menu item. It opens the page source, i.e. HTML. Find the text you need. Usually it is surrounded by HTML tags, for example <b>bold text</b> or <a href="url">link text</a>.

In the macro, use findrx to find and extract the text (e.g. the weekday). You have to create a regular expression that matches the text (possibly with HTML tags) you want to find. If you need multiple instances, it must also match all the other instances. For example, the regular expression ">(\d+)&deg;F</B>" matches every temperature in the page.

Regular expression syntax is completely documented in QM help. Often-used regular expression parts are in floating toolbar -> more tools -> reg expr menu.
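
As an illustration of one expression matching every instance, here is a small sketch in Python, not QM. It applies the expression from this thread to a made-up fragment of page source; the real page markup may differ.

```python
import re

# Hypothetical fragment of page source with two table rows.
sample_html = (
    "<TR><TD><B>Thu</B></TD><TD><B>34&deg;F</B></TD></TR>"
    "<TR><TD><B>Fri</B></TD><TD><B>28&deg;F</B></TD></TR>"
)

# Same expression as in the macros above; \d+ matches any run of digits,
# so one expression covers every temperature cell.
temperatures = re.findall(r">(\d+)&deg;F</B>", sample_html, re.IGNORECASE)
print(temperatures)
```

Only the part in parentheses is returned for each match, which corresponds to reading submatch 1 as a[1 i] in the macros above.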
#27

Ohhhhhh, that helps out a lot. What do I do, though, if the line is different for each date? I have posted part of the line that gives the day. Each day ranges from dayNum=1 to dayNum=7.

A HREF="/weather/wxdetail/48183?dayNum=3">

There is also a part that says Thu</B right after the dayNum.

So what do I do if there is a different code for each date?
#28
Regular expressions are usually used not to find multiple instances of the SAME text but rather to find multiple instances of different but SIMILAR text. For example, \d matches a digit, and \b\w\w\w\b matches a 3-character word.
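
Applied to the day links, that means replacing the changing parts with \d and \w and keeping the fixed parts as literal text. A sketch in Python, not QM; the sample markup is a guess, because parts of the real line were left out of the post.

```python
import re

# Hypothetical markup for two day links; the real page may differ.
sample_html = (
    '<A HREF="/weather/wxdetail/48183?dayNum=1"><B>Thu</B></A>'
    '<A HREF="/weather/wxdetail/48183?dayNum=2"><B>Fri</B></A>'
)

# \d+ covers dayNum=1 to dayNum=7, and \w{3} covers any 3-letter day name.
day_pattern = re.compile(r'dayNum=(\d+)"><B>(\w{3})</B', re.IGNORECASE)
days = day_pattern.findall(sample_html)
print(days)
```

Each match yields the day number and the day name, so one expression covers all seven links.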
#29

Wait, then how would I set it for a range from dayNum=1 to dayNum=7? I tried to implement
<B<A HREF="/weather/wxdetail/48183?dayNum="<B
into the code, and I believe the quotation marks are screwing up the arguments in the macro. Or maybe I'm just putting the wrong code into the macro.
*NOTE - some minor parts of the code above were taken out so I could show the code.
#30
Here is another way, using HtmlParse. You still need some parsing using findrx or other string functions, but it should be easier.

Function HtmlTableToArray
Code:
;/
function $HTML VARIANT'tableNameOrIndex ARRAY(str)&a [flags] [str&tableText] ;;flags: 1 get HTML

;Gets cells of a HTML table into array.

;HTML - all HTML (page source).
;tableNameOrIndex - table name or 0-based index in the HTML.
;a - array variable for results. The function creates 1-dimension array where each element is cell text.
;tableText - optional str variable that receives whole text of the table.


;EXAMPLE
;out
;str s
;IntGetFile "http://www.weather.com/weather/tenday/48183" s
;
;ARRAY(str) a
;HtmlTableToArray s 12 a
;
;;display text in first cell of each row
;int i ncolumns=2
;for i 0 a.len ncolumns

,;out a[i]
,;;out a[i+1] ;;second cell, and so on
,;out "---------"


MSHTML.IHTMLDocument2 d; MSHTML.IHTMLDocument3 d3
HtmlParse HTML d d3

MSHTML.IHTMLTable2 table=d3.getElementsByTagName("TABLE").item(tableNameOrIndex); err end "the specified table does not exist"

MSHTML.IHTMLElementCollection cells=table.cells
a.create(cells.length)
int i
for i 0 a.len
,MSHTML.IHTMLElement el=cells.item(i)
,if(flags&1) a[i]=el.innerHTML
,else a[i]=el.innerText

if(&tableText)
,el=+table
,if(flags&1) tableText=el.innerHTML
,else tableText=el.innerText

