One of my current duties is examining a large number of scanned documents (TIFs and JPEGs). The documents are stored in a document management system that I access through a web browser (mostly Chrome). We have a proprietary document viewer that displays all the scanned documents for us; however, it is incredibly slow for folders containing 200+ documents. IT is working on the issue, but I know it's going to take a long time to fix. Therefore, to speed up the process, we have resorted to downloading and saving every document to the computer.
Chrome has sped up the process for us, because it doesn't ask Yes or No questions when downloading. However, we still have to click on every single link to download the documents. Our internet access is limited, so we cannot install many of the popular download managers like FlashGet, JDownloader, etc.; Orbit, however, installed successfully. With Orbit, the downloads do go through, but every file comes back as an .ASP file instead of the actual image. I believe it has something to do with the security certificates, so the download must take place within the web browser.
I was successful in extracting the full URL for each document using a simple function in Excel that I found through Google. The original links were just relative names like name.tif or name.jpg, so I used Excel to build the full URLs in the hope that it would solve the problem with the download manager. However, I ran into the same issue: I still received ASP files. I have come to the conclusion that the links must be downloaded through the web browser and not through a download manager. I also tried a Firefox extension called DownThemAll, but it ran into security certificate issues and constantly asked for my username and password to the server.
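For anyone trying the same thing without Excel, the URL-building step can be done in a few lines of Python. This is just a sketch: the base path and file names below are made up, so substitute the real folder URL from your document viewer.

```python
from urllib.parse import urljoin

# Hypothetical base folder of the document viewer; replace with the real one.
BASE = "https://docserver.example.com/viewer/folder123/"

def to_absolute(names, base=BASE):
    """Turn bare file names like 'name.tif' into full URLs under the base path."""
    return [urljoin(base, name) for name in names]

# Example with made-up file names:
print(to_absolute(["scan001.tif", "scan002.jpg"]))
```

You could paste the column of file names from Excel into a text file and feed them to `to_absolute` instead of typing them by hand.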
One of my co-workers has set up a simple macro that moves his mouse pointer over the links and clicks each one to download. It works, but it is still slow when you have 200+ files to download.
Does anyone have any suggestions? Another thought was to have an Excel program open every link, with Chrome set so that it doesn't automatically open every downloaded file.
If the site is accessed through an ASP page, the server has to run code to render the page that shows you the file. ASP is basically a way to create a dynamic HTML page from code in the file, not just a static HTML page the way you think of one. Your click runs the code that fetches the image file, which is why your automated downloaders fail: they don't run that code, they just search the page for files to grab, and those files don't exist yet before the ASP code runs.
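One quick way to confirm this failure mode is to look at the first bytes of what a download manager saved: a real TIFF or JPEG starts with well-known magic numbers, while the "download" from an ASP link is usually HTML text. A minimal check (not tied to any particular system):

```python
def looks_like_image(data: bytes) -> bool:
    """True if the bytes start with TIFF (II*\\0 / MM\\0*) or JPEG (FF D8)
    magic numbers; False for anything else, such as the HTML an ASP page
    returns when the server-side code didn't run."""
    return data[:4] in (b"II*\x00", b"MM\x00*") or data[:2] == b"\xff\xd8"
```

Running this over the saved files would show that the downloader received script output rather than the images themselves.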
You'd need to get direct access to the data store (an FTP site or file share) or have the page rewritten; I don't know of an automated way of grabbing info from a scripted site.
Thanks for the reply. I actually stumbled on a really nice add-on for Chrome called Linkclump. It lets you highlight all the links you want to download using the right mouse button, then opens them all in separate tabs. Since Chrome doesn't ask those Yes or No questions before downloading, it automatically downloads each file. It does open a tab for every single download, but the tabs disappear when the downloads finish. Now my job has become 100 times easier.