Screen Scraping (C#)



DEElekgolo
June 8th, 2009, 01:04 AM
I need help with an application I have in mind. I plan on making a screenshot dumper that would load this (http://www.bungie.net/Stats/Halo3/Screenshots.aspx?player=DEEhunter1) page into memory and then find all occurrences of

size=full&ssid=

and read the next 32 characters after each one into an array or variable of some sort
(depending on whether I use an array or a foreach loop).
Once it has a bunch of screenshot IDs, it can dump all of them by adding

http://www.bungie.net/Stats/Halo3/Screenshots.aspx?size=full&ssid=

in front of each one and using the results as download links to dump all the pictures into a folder.
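
For anyone following along, the "prefix each ID and download" step could look roughly like the sketch below. It is only a sketch under assumptions: the class name ScreenshotDumper, the method names, and the ".jpg" file extension are made up for illustration, and nothing here has been checked against what Bungie.net actually serves.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Net;

static class ScreenshotDumper
{
    // URL prefix quoted in the post above.
    const string FullSizePrefix =
        "http://www.bungie.net/Stats/Halo3/Screenshots.aspx?size=full&ssid=";

    // Turns each screenshot ID into a full-size download link.
    public static List<string> BuildDownloadUrls(IEnumerable<string> ssids)
    {
        var urls = new List<string>();
        foreach (string id in ssids)
            urls.Add(FullSizePrefix + id);
        return urls;
    }

    // Downloads every screenshot into the given folder.
    // Assumption: files are saved as "<id>.jpg"; the real image format may differ.
    public static void DumpAll(IEnumerable<string> ssids, string folder)
    {
        Directory.CreateDirectory(folder);
        using (var client = new WebClient())
        {
            foreach (string id in ssids)
                client.DownloadFile(FullSizePrefix + id, Path.Combine(folder, id + ".jpg"));
        }
    }
}
```

Calling ScreenshotDumper.DumpAll(ids, "Screenshots") with a list of IDs would then fetch each link in turn. Keeping the URL building separate from the downloading makes it easy to print the links first and check them by hand.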

Kind of a dirty method to do it but I am sure it will work.

Problem is, I'm not that good at C# and will require a lot of help.
I looked into Regex and currently got my mind hovering around a command like this.

Regex descRegex = new Regex("<p><b>Description</b></p>\n<p>(?<description>.+?)</p>", RegexOptions.Singleline);
string description = descRegex.Match(html).Groups["description"].Value.Trim();
Taken from here. (http://mhinze.com/archive/screen-scraping-tutorial-using-c-net/)
But again, I'm not that good at C#, so I'll be needing help...

Advancebo
June 8th, 2009, 01:14 AM
This would help the people who take a lot of references, and it goes beyond the 30 recent screenshots in your profile on Bungie.net.

Awesome, but I wish I could help.

Kornman00
June 8th, 2009, 02:45 AM
Regex descRegex = new Regex("size=full&amp;ssid=(?<SSID>.+?)\"", RegexOptions.Singleline);
string description = descRegex.Match(html).Groups["SSID"].Value.Trim();
Try that against each line of HTML read.
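
If the whole page is already in one string, a variation is to run the regex once with Regex.Matches instead of going line by line. The sketch below is just one way to do that and makes a few assumptions: the SsidFinder class and FindAll method are made-up names, the "&" in the link may or may not be written as "&amp;" in the page source (so the pattern accepts both), and the ID is assumed to end at the next quote or ampersand.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class SsidFinder
{
    // Accepts both "&ssid=" and "&amp;ssid=", then captures up to the next quote or '&'.
    static readonly Regex SsidRegex =
        new Regex("size=full&(?:amp;)?ssid=(?<SSID>[^\"&]+)");

    // Returns every distinct screenshot ID found in the page, in order of appearance.
    public static List<string> FindAll(string html)
    {
        var ids = new List<string>();
        foreach (Match m in SsidRegex.Matches(html))
        {
            string id = m.Groups["SSID"].Value;
            if (!ids.Contains(id))
                ids.Add(id);
        }
        return ids;
    }
}
```

Because the capture group uses a character class that excludes the quote, the closing quote of the link never ends up in the captured value.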

EDIT:
This will help you: Regular Expression Pocket Reference (http://my.safaribooksonline.com/9780596514273). I use SlickEdit for VS to test regex expressions, really neat. VAssistX is nice too.

DEElekgolo
June 8th, 2009, 03:54 PM
Quoting Kornman00:
Regex descRegex = new Regex("size=full&amp;ssid=(?<SSID>.+?)\"", RegexOptions.Singleline);
string description = descRegex.Match(html).Groups["SSID"].Value.Trim();

The problem with this is that it captures the SSID, but it captures it as:

;ssid=(screenshot ID)"
It captures the " after the SSID as well. How can I change that?

Limited
June 8th, 2009, 04:33 PM
Cut the string up then...cut 7 off the front, 1 off the back.

Use the substring method of string.

description = description.Substring(7, 32);
Syntax: Substring(<how many characters in>, <how many characters to use after split>);

After using that code, description should equal the ID of the screenshot.
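
A version that does not depend on counting characters by hand might look like the sketch below. It assumes the captured text has the shape described earlier in the thread (something like ;ssid=<ID>"), and the SsidTrimmer class and Isolate method are made-up names for illustration. Note that string methods such as Substring and TrimEnd return a new string rather than changing the original, so the result has to be stored somewhere.

```csharp
static class SsidTrimmer
{
    // Drops everything up to and including the first '=' and any trailing quote.
    public static string Isolate(string captured)
    {
        // IndexOf returns -1 when there is no '=', so start becomes 0
        // and the whole string is kept.
        int start = captured.IndexOf('=') + 1;
        return captured.Substring(start).TrimEnd('"');
    }
}
```

Usage would be along the lines of description = SsidTrimmer.Isolate(description);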

DEElekgolo
June 8th, 2009, 07:37 PM
Ok I got the SSID isolated. Thanks guys!