#1
Web get command (wget) to download all icons/pics on a web page (too large or too small)
How do I get the Windows/Linux web get (wget) command to ignore all files that are too small or too large?
Like everyone, I often use the Windows/Linux "Free Software Foundation" web-get (wget) command to download all the PDFs, GIFs, or JPEGs on a web site onto my hard disk. The basic command we all use is:

EXAMPLE FOR WINDOWS: c:\> wget -prA.gif http://machine/path
EXAMPLE FOR LINUX: % wget -prA.jpg http://machine/path

This famous wget command works great, except it downloads ALL the JPG and GIF icons and photos at the targeted web site - large or small. How do we tell wget to skip files of a certain size? For example, assume we wish to skip anything smaller than, say, 10KB and anything larger than, say, 100KB. Can we get wget to skip files that are too small or too large?

barb
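For reference, -prA simply bundles three standard wget options. Spelled out, with the URL as a placeholder and the accept list only as an example, the same command looks like this:

# -r  recurse through links found on the page
# -p  also fetch the page requisites (images, etc.) needed to render the pages
# -A  keep only files whose names match this comma-separated accept list
wget -r -p -A ".gif,.jpg" http://machine/path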
#2
On Fri, 04 Aug 2006 12:00:45 -0400, Marvin wrote:
> > % wget -prA.jpg http://machine/path
> > Can we get wget to skip files that are too small or too large?
>
> I don't know how to do that, but it would be easy to erase all the
> small files. When the images have been downloaded to a directory, sort
> the directory by file size and erase those below the minimum size you
> want to keep.

Hi Marvin,

Thank you for your help. I thought of this, but I was kind of hoping that wget would have a "size" range option that handled it. Something like:

wget -prA.pdf http://www.consumerreports.com --sizemin:max

What I do today is sort by file size and then delete the too-large files and the too-small files, but that is obviously not optimal.

barb
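A post-download cleanup along the lines Marvin suggests can also be scripted rather than done by eye. A minimal sketch, assuming GNU find and using the 10KB/100KB limits from the original question (the URL and accept list are placeholders):

# fetch the images, then prune anything outside the size window
wget -r -p -A ".gif,.jpg" http://machine/path
# -size -10k matches files under roughly 10KB, -size +100k files over roughly 100KB
find . -type f \( -size -10k -o -size +100k \) -delete

The same pair of commands could go in a cron job, so the pruning happens automatically after each scheduled run.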
#3
On Fri, 04 Aug 2006 11:28:07 -0500, Dances With Crows wrote:
> If you really want this, I think you're going to have to hack wget so
> that it takes another option, --size-range or something. Then wget
> would have to parse the server's 200 responses and either halt the
> download if the 200 said the file wasn't in --size-range, or unlink()
> the file after it finished. The exact approach you'd take depends on
> the wget code itself, and your level of C skill.

Hi Dances With Crows,

Thank you for your kind help. As you surmised, I do not have the skill set to "hack" the venerable wget command so that it downloads only files within a certain size range. I had also read the man page and searched beforehand, but I did not see that anyone had done this yet. I am kind of surprised, since it seems like the most basic of things you would want to do.

For example, let's say we went to a free icon site, and let's say they updated that site periodically with little web-page bitmaps, better icons usable for PowerPoint slides, and too-big images suitable for photo sessions. Let's say you had a scheduled wget go to that site daily and download all the icons automatically from that HTTP web page, but not the large ones or the really, really small ones. Let's say there were thousands of these. Of course, FTP would be a pain - you likely wouldn't even have FTP access anyway - and downloading them manually isn't in the cards. What I'd want to schedule is:

wget -prA.gif,jpg,bmp http://that/freeware/icon/web/page --size:low:high

barb
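Short of patching wget's C source, one middle-ground approach is to check each file's size with a HEAD-style request before deciding whether to fetch it. This is only a rough sketch, assuming the candidate URLs are already listed one per line in a file (urls.txt is a hypothetical name) and that the server reports Content-Length:

#!/bin/sh
# Download only files whose reported size falls between MIN and MAX bytes.
MIN=10000
MAX=100000
while read url; do
    # --spider makes wget check the URL without saving it; -S prints the
    # server's response headers (to stderr), from which we pull Content-Length
    size=`wget --spider -S "$url" 2>&1 | awk '/Content-Length/ {print $2}' | tr -d '\r' | tail -1`
    if [ -n "$size" ] && [ "$size" -ge "$MIN" ] && [ "$size" -le "$MAX" ]; then
        wget "$url"
    fi
done < urls.txt

Servers that omit Content-Length would slip through, so a post-download size check is still worth keeping as a backstop.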
#4
On 4 Aug 2006 10:43:53 -0700, poddys wrote:
> I'm just wondering why you need to do this... You might be getting
> into copyright issues here....

Hi poddys,

Thank you very much for asking the right questions. Let's say I went to http://www.freeimages.co.uk or http://www.bigfoto.com or http://www.freefoto.com/index.jsp or any of a zillion sites which supply royalty-free images or GIFs or bitmaps or PDFs or HTML files, etc. Why wouldn't I want to use wget to obtain all the images, PDFs, Word documents, PowerPoint templates, whatever, that the site offers?

Even for sites I PAY for, such as Consumer Reports and technical data sites: why wouldn't I want to just use wget to download every single PDF or Microsoft Office document or graphic at that web site? There's no copyright infringement in that, is there?

I can do all that today with wget. The only problem I have is that the really large (too large) files get downloaded too, and that the really small (too small) files seem to be useless clutter.

barb
#5
On Fri, 04 Aug 2006 13:34:13 -0500, Dances With Crows wrote:
> barb never stated what barb was doing with the images. It's a legit
> and semi-interesting question, though, regardless of what the final
> purpose is. Too bad there's nothing in wget that does what barb wants.
> barb will have to either hack wget or write a small script to remove
> all files between sizes X and Y after wget's finished.

Hi Dances With Crows,

I don't know yet what I want to do with the images or PDFs or PowerPoint templates. For example, recently I found a page of royalty-free PowerPoint calendar templates. The web page had scores and scores of them. Nobody in their right mind is going to click through them link by link when they can run a simple wget command and get them all in one fell swoop (are they?):

wget -prA.ppt http://that/web/page

My older brother pointed me to one of his Yahoo web pages which contained hundreds of photos. I picked them all up in seconds using:

wget -prA.jpg http://that/web/page

I wouldn't THINK of downloading a hundred photos manually (would you?). Do people REALLY download documents MANUALLY nowadays? Oh my. They're crazy, in my opinion (although I did write and file this letter manually myself :P).

barb
#6
On Fri, 04 Aug 2006 18:51:11 GMT, Ben Dover wrote:
> You could probably write a script or batchfile to process the results
> of the wget download based on filesize.

Hi Ben Dover,

Thank you very much for your kind advice. I am not a programmer, but I guess it could look like this (in DOS)?

REM wget.bat
wget -prA.ppt,jpg,doc,pdf,gif http://some/web/page
REM delete anything smaller than 10KB or larger than 100KB
for %%F in (*) do (
    if %%~zF LSS 10240 del "%%F"
    if %%~zF GTR 102400 del "%%F"
)

And, in Linux, maybe something like this (based on a csh snippet I found on the web)?

#!/bin/csh
wget -prA.ppt,jpg,doc,pdf,gif http://some/web/page
# remove files outside the 10KB-100KB range
foreach file (*)
    set size = `ls -l $file | awk '{print $5}'`
    if ($size < 10000 || $size > 100000) then
        rm $file
    endif
end

Is this a good start (and which newsgroup could we ask)?

barb