What a Great Day

September 11, 2006

[Off topic, yet cool] ICFES

Filed under: Uncategorized — f3lheadhunter @ 12:20 pm

Disclaimer: If you don’t know what VIM, WGET, or a URL is, get away from here (to Google, of course, and then come back).

so I run into this page:

http://200.14.205.63:8080/portalicfes/home_2/htm/cont_63.jsp?rec=not_4676.jsp

which contains a small number of links to ICFES tests (kinda like the SAT), and I hit the middle mouse button, just to find out they use “javascript”. ARGH!, I think to myself.
but then, looking a little closer, I find this:

javascript:ventanaNueva('../rec/arc_4719.pdf')

this “rec” directory arouses my curiosity, so I open it

(HUGE)
http://200.14.205.63:8080/portalicfes/home_2/rec/

there must be 4000 documents or so!
OMG!
I have to download this stuff RIGHT AWAY!
so I proceed to do the following.

1. view source
2. use a VIM regular expression to find all the links

href=".*"

and I pass those to a new buffer using a VIM macro: record it with q … and then replay it with @
however, pressing @@ 4000 times is kinda lame, so I just use
(normal mode)
30000@@

which works wonders.

now I have a new buffer with something like this:

<a href="/a/b/c/x.pdf">whatever</a>
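
The same link hunt can be sketched outside VIM too. This is only a minimal Python sketch, not what was actually run: `SAMPLE_SOURCE` is a made-up stand-in for the page source, and `find_href_attributes` is a hypothetical helper name. It uses `[^"]*` instead of `.*`, so each match stops at the first closing quote even when two links share a line.

```python
import re

# Made-up stand-in for the directory listing's page source (an assumption,
# not the real markup served by the site).
SAMPLE_SOURCE = """
<html><body>
<a href="arc_1258.xls">arc_1258.xls</a> <a href="arc_1257.xls">arc_1257.xls</a>
<a href="arc_1256.xls">arc_1256.xls</a>
</body></html>
"""

# Same idea as href=".*", but with [^"]* in place of .* so that a match
# ends at the first closing quote instead of the last one on the line.
HREF_ATTRIBUTE = re.compile(r'href="[^"]*"')


def find_href_attributes(source):
    """Return every href="..." attribute found in the text, in order."""
    return HREF_ATTRIBUTE.findall(source)


if __name__ == "__main__":
    for attribute in find_href_attributes(SAMPLE_SOURCE):
        print(attribute)
```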

3. to get just the URLs, I use a regular expression like this one:

%s/.*="\(.*\)".*/\1

however, all these addresses are in the form:
/a/b/c/x.pdf

but I will need the full URL, so I must prepend the server name:

%s#.*#http://200.14.205.63:8080&#

now I have something like

http://200.14.205.63:8080/a/b/c/x.pdf
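
For anyone who prefers a script, the extract-and-prefix step might look roughly like this in Python. It is a sketch under assumptions: `to_full_url` is a hypothetical helper, and the sample line just reuses the /a/b/c/x.pdf placeholder. `urljoin` from the standard library handles the slash between the server and the path, whether or not the path starts with one.

```python
import re
from urllib.parse import urljoin

# Server address taken from the post.
SERVER = "http://200.14.205.63:8080/"

# Capture whatever sits between the quotes of an href attribute.
HREF_VALUE = re.compile(r'href="([^"]*)"')


def to_full_url(line, base=SERVER):
    """Pull the href value out of one line and make it absolute.

    Returns None when the line has no href attribute.
    """
    match = HREF_VALUE.search(line)
    if match is None:
        return None
    # urljoin copes with paths that do or do not start with a slash.
    return urljoin(base, match.group(1))


if __name__ == "__main__":
    # /a/b/c/x.pdf is the placeholder path, not a real file on the server.
    print(to_full_url('<a href="/a/b/c/x.pdf">whatever</a>'))
```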

BUT I need these to be commands I can run to download the files, so I’ll just use wget (look it up)

%s/.*/wget &/

and now I have a 4000-line download program, like:

wget http://200.14.205.63:8080/portalicfes/home_2/rec/arc_1258.xls
wget http://200.14.205.63:8080/portalicfes/home_2/rec/arc_1257.xls
wget http://200.14.205.63:8080/portalicfes/home_2/rec/arc_1256.xls
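
The "stick wget in front of every line" step can also be sketched in Python. Again, this is an illustration and not the method used here: `to_wget_commands` is a hypothetical helper, and `URLS` holds only the three addresses shown above. `shlex.quote` leaves plain URLs alone and quotes anything the shell would otherwise try to interpret.

```python
import shlex

# The three URLs shown in the post; the real list had about 4000 entries.
URLS = [
    "http://200.14.205.63:8080/portalicfes/home_2/rec/arc_1258.xls",
    "http://200.14.205.63:8080/portalicfes/home_2/rec/arc_1257.xls",
    "http://200.14.205.63:8080/portalicfes/home_2/rec/arc_1256.xls",
]


def to_wget_commands(urls):
    """Turn each URL into one `wget <url>` shell command line."""
    # shlex.quote only adds quotes when the URL contains characters the
    # shell treats specially (spaces, &, ? and so on).
    return ["wget " + shlex.quote(url) for url in urls]


if __name__ == "__main__":
    print("\n".join(to_wget_commands(URLS)))
```

If the URLs are saved one per line in a file (say a hypothetical urls.txt), wget can read them directly with `wget -i urls.txt`, which skips the command-generation step altogether.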

I know I could have just downloaded HTTrack, or something like that, BUT I just couldn’t resist using VIM.

There are easier ways.

all of this could have been avoided by the sysadmins if they had just put a “don’t show directory contents” server directive on this folder, or, even easier, a blank index.html file, but they were far too lazy to do that.
