Alright, apparently I wasn't using the right terms to get ideas flowing here. After searching the forum further, I realized that the proper term for what I am trying to do is HTML extraction. I put together the code below from some of the other posts I have read, and now I can get just the text out of the web page I am using. However, I need to format the extracted text as a CSV file, and I am at a loss as to how to do this. The main problem is knowing which piece of text is coming out, so that I can put it in the correct column of the CSV file.
int w=wait(3 WV win("List Details - Windows Internet Explorer" "IEFrame"))
Acc a1.Find(w "PANE" "List Details" "" 0x3001 3)
str html
a1.WebPageProp(0 0 html) ;;get the page HTML into html
HtmlDoc d.InitFromText(html)
ARRAY(MSHTML.IHTMLElement) a
d.GetHtmlElements(a "")
int i
for i 0 a.len ;;the loop body must be tab-indented
	out "----------"
	str s2=a[i].innerText ;;text of this element, including all nested elements
	out s2
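The loop above prints the innerText of every element, so container elements repeat the text of everything nested inside them, and nothing says which field a piece of text belongs to. One way to find out is to print each text chunk next to the class of the innermost element that holds it; in the QM loop the same information is available from each element's className property (part of MSHTML's IHTMLElement interface). A minimal sketch of that diagnostic, written in Python with only the standard library's html.parser rather than in QM, and assuming the page HTML is already available as a string:

```python
from html.parser import HTMLParser

# Elements that never get an end tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TextWithClass(HTMLParser):
    """Records (class of innermost classed element, text) for each non-blank text chunk."""

    def __init__(self):
        super().__init__()
        self.class_stack = []  # class attribute of each open element ("" if none)
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.class_stack.append(dict(attrs).get("class") or "")

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing tags hold no text

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.class_stack:
            self.class_stack.pop()

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse newlines and runs of spaces
        if text:
            # Walk outward until an element with a class attribute is found.
            owner = next((c for c in reversed(self.class_stack) if c), "")
            self.chunks.append((owner, text))
```

Feeding it the page HTML (parser.feed(html), then parser.close()) and printing parser.chunks shows, for the excerpt below, that the name sits under name-row and the city under location, which is exactly the class-to-column mapping the CSV needs.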
I have also posted an excerpt of the HTML I am trying to extract from. This excerpt corresponds to one row of the CSV file, and there are 20 more blocks of HTML just like it on the page that I would like to extract. Any help capturing these unique pieces of information would be a huge help. Below the HTML, you can also see how I would like the CSV file to be formatted.
<div class="search-result-container contact-result row-fluid"><div class="span12">
<div class="item-actions-container">
<div class="actions-row long-line">
<div class="actions-container inline-block" style="width: 80px;">
<div class="touch-button-container inline-block pull-right">
<div title="Pin" class="pin-this"></div>
</div>
<div class="touch-button-container inline-block pull-right">
</div>
<div class="touch-button-container touch-right-divider inline-block pull-right">
<div title="Quick View" class="quick-view"></div>
<div class="right-divider"></div>
</div>
</div><div class="social-row">
<div class="search-result-google search-result-social pull-right">
<a href="https://plus.google.com/s/Alex%20Abadi" target="_blank"></a>
</div>
<div class="search-result-facebook search-result-social pull-right">
<a href="https://www.facebook.com/search/more/?q=Alex%20Abadi" target="_blank"></a>
</div>
<div class="search-result-twitter search-result-social pull-right">
<a href="https://twitter.com/search?q=Alex%20Abadi&mode=users" target="_blank"></a>
</div>
<div class="search-result-linkedin search-result-social pull-right">
<a href="http://www.linkedin.com/vsearch/f?keywords=Alex+Abadi" target="_blank"></a>
</div>
<!--<div class="search-result-companyURL inline-block">-->
<div class="search-result-url search-result-social pull-right">
<a href="http://www.imagemicrosystems.com" target="_blank"></a>
</div>
</div><div class="connection-meter list-only pull-left">
<!--<div class="left-side"></div>-->
<!--<div class="middle"></div>-->
<!--<div class="right-side"></div>-->
</div>
</div>
</div>
<div class="logo-container">
<div class="selected-status pull-left"></div>
<input class="pull-left" type="checkbox" name="searchResults-10611e14-c5b5-3cac-9679-7b69997eb75d" id="10611e14-c5b5-3cac-9679-7b69997eb75d" data-primitive-type="contact">
<div class="image-wrapper">
<!--<div class="p-meter-wrapper"><i class="icon p-meter list-only" ></i></div>-->
<div class="search-result-icon contact-icon"></div>
<div class="favicon-container">
</div>
</div>
<i class="icon ideal-prospect-img list-only"></i>
<div class="ideal-prospect-val list-only">
0
</div>
</div>
<div class="detail-container">
<div class="name-row">
<a href="/contact/10611e14-c5b5-3cac-9679-7b69997eb75d">Alex Abadi</a>
</div>
<div class="search-result-subheadline">
<span class="large-black-text">Chief Executive Officer at </span>
<span class="contact-company-name"><a href="/company/d0a95324-611b-36b7-8a5b-b753ab957e36" class="clickable">Image Microsystems, Inc.</a></span>
</div>
<div class="compact-section">
<div class="location">Austin,
Texas,
United States
<div class="contact-industry">Computer and Peripheral Equipment Manufacturing</div>
</div>
<div class="compact-section">
<div class="small-data-label">Main:</div>
<div class="inline-block black-text"><span id="gc-number-24" class="gc-cs-link" title="Call with Google Voice">512-623-5621</span></div>
<div>
<div class="small-data-label">Direct:</div>
<div class="inline-block black-text"><span id="gc-number-25" class="gc-cs-link" title="Call with Google Voice">512-623-5642</span></div>
</div>
<div>
<div class="small-data-label">Email:</div>
<a class="black-text" href="mailto:alex_abadi@imagemicrosystems.com">alex_abadi@imagemicrosystems.com</a>
</div>
</div>
<div class="">
</div>
</div>
</div>
<div class="right-wrapper">
<div class="stick-bottom pull-right">
<div class="notification-container list-only">
<a class="trigger-wrapper pull-right hidden" href="/contact/10611e14-c5b5-3cac-9679-7b69997eb75d?report=company_triggers">
<span class="trigger-count pull-right"></span>
<div class="trigger-icon-color pull-right"></div>
</a>
<div class="notes-wrapper dropdown text-right">
<a class="notes dropdown-toggle" data-toggle="dropdown" role="button" data-target="dropdown" data-item-id="10611e14-c5b5-3cac-9679-7b69997eb75d">
Notes <span class="note-count"></span></a>
<div class="dropdown-menu text-left">
<form class="noteEditForm">
<div class="helptext noteActionLabel">Add a New Note:</div>
<input type="text" name="label" class="noteLabel" placeholder="Title">
<textarea name="messageBody" class="noteBody" placeholder="Body"></textarea>
<input type="hidden" name="entityId" class="entityId" value="10611e14-c5b5-3cac-9679-7b69997eb75d">
<input type="hidden" name="entityType" class="entityType" value="contact">
<input type="hidden" name="id" class="noteId">
<div class="button-wrapper pull-right">
<a class="cancelNoteButton cancel-link" data-dismiss="dropdown" aria-hidden="true">Cancel</a>
<input type="submit" class="saveNoteButton btn btn-blue-small" value="Save">
</div>
<div class="clearfix"></div>
</form>
<div class="existing-notes hide">
<div class="helptext">Open an Existing Note:</div>
<ul>
</ul>
</div>
</div>
</div>
</div>
<div class="crm-status" data-id="10611e14-c5b5-3cac-9679-7b69997eb75d">
</div>
<div class="list-add-date text-right pull-right list-only">Added 6-Jan-2016</div>
</div>
</div></div></div>
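Each contact in the excerpt is wrapped in a div whose class includes search-result-container, and the fields inside it carry distinctive classes (name-row, large-black-text, contact-company-name, location, contact-industry, list-add-date). A minimal sketch, again in Python's html.parser rather than QM, that starts a new row at every search-result-container block and files each piece of text under the innermost field class that encloses it. The class-to-column mapping (FIELD_CLASSES) is an assumption read off this one excerpt; the "innermost wins" rule is what keeps the industry div, which is nested inside the location div, out of the City/State column:

```python
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

# Assumed mapping, taken from the excerpt: class name -> CSV column.
FIELD_CLASSES = {
    "name-row": "Name",
    "large-black-text": "Title",
    "contact-company-name": "Company",
    "location": "City/State",
    "contact-industry": "Industry",
    "list-add-date": "Added",
}
RECORD_CLASS = "search-result-container"  # one per contact block


class ContactParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []    # CSV column (or None) for each open element
        self.records = []  # one dict per contact block

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # no end tag will follow, so do not push
        classes = (dict(attrs).get("class") or "").split()
        if RECORD_CLASS in classes:
            self.records.append({})  # a new result block starts a new row
        column = next((FIELD_CLASSES[c] for c in classes if c in FIELD_CLASSES), None)
        self.stack.append(column)

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing tags hold no text

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = " ".join(data.split())  # "Austin,\nTexas,..." -> "Austin, Texas, ..."
        if not text or not self.records:
            return
        # The innermost enclosing field element owns the text.
        column = next((c for c in reversed(self.stack) if c), None)
        if column:
            row = self.records[-1]
            row[column] = (row.get(column, "") + " " + text).strip()
```

After parser.feed(html) and parser.close(), parser.records holds one dict per contact block. If the site also uses large-black-text elsewhere inside a result block, that entry of the mapping would need to be made more specific.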
CSV file example: the file needs to be separated by tabs, because the data coming out of the HTML contains commas that should not act as separators.
Name	Title	Company	City/State	Industry	Main Phone	Direct Phone	Email	Added
Alex Abadi	Chief Executive Officer at	Image Microsystems, Inc.	Austin, Texas, United States	Computer and Peripheral Equipment Manufacturing	512-623-5621	512-623-5642	alex_abadi@imagemicrosystems.com	Added 6-Jan-2016
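Writing the rows is easiest with a CSV library that takes the delimiter as a parameter. A minimal sketch using Python's csv.DictWriter with a tab delimiter, assuming each contact has already been collected into a dict keyed by the column names above; any column missing from a dict becomes an empty cell, so the columns stay aligned:

```python
import csv
import io

COLUMNS = ["Name", "Title", "Company", "City/State", "Industry",
           "Main Phone", "Direct Phone", "Email", "Added"]


def to_tsv(rows):
    """Serialize dict rows as tab-separated text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, delimiter="\t",
                            restval="",          # missing field -> empty cell
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

To write a file instead of a string, pass a file opened with open(path, "w", newline="", encoding="utf-8"), as the csv documentation recommends. A comma delimiter would also work: with the default QUOTE_MINIMAL setting, the csv module wraps any field that contains the delimiter in double quotes, so "Austin, Texas, United States" still stays a single field.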
Any help identifying which particular HTML elements to pull out would be great; currently I can only dump all of the text into a text file, which isn't helpful for the project I am working on.
I really appreciate any help you can give me.
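The two phone numbers are the hardest to tell apart, because both sit in span elements with the same class (gc-cs-link). What distinguishes them is the small-data-label text (Main:, Direct:, Email:) just before each value. The numeric ids such as gc-number-24 look like page-wide counters, so relying on them is probably unsafe. A minimal sketch in Python that pairs each label with the first text that follows it; LABEL_TO_COLUMN is a hypothetical mapping taken from the excerpt:

```python
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

# Hypothetical mapping from the visible label text to CSV columns.
LABEL_TO_COLUMN = {"Main:": "Main Phone", "Direct:": "Direct Phone", "Email:": "Email"}


class LabeledValueParser(HTMLParser):
    """Pairs each small-data-label with the first text that follows it."""

    def __init__(self):
        super().__init__()
        self.stack = []      # True for each open element that is a label
        self.pending = None  # column still waiting for its value
        self.values = {}

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            classes = (dict(attrs).get("class") or "").split()
            self.stack.append("small-data-label" in classes)

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing tags hold no text

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if any(self.stack):  # text inside a label element
            self.pending = LABEL_TO_COLUMN.get(text)
        elif self.pending:   # first text after a known label is its value
            self.values[self.pending] = text
            self.pending = None
```

If a contact has no direct line and the page simply leaves out the Direct: label (an assumption, since only one record is shown), that column stays empty instead of the email shifting into it. The email could alternatively be read from the mailto: link's href.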
Best Regards,
Paul