Large Scale Central

Utilities to help ease the transition

As we all probably know, our benevolent Bob is updating the software for LSC. These kinds of updates have risks, notably in this case around the freight sheds.

To help, I’ve created a website with a few utilities that anyone can use for free. The first is a way to export a complete topic (all of its pages) into a single HTML page. I also hope to write a utility to download all the photos from a freight shed.

The web page is available at https://lsc.jimrowson.com and looks like this right now:

If you enter the URL of a topic and click the button, you get a web page that looks like this:

Feel free to complain or point out bugs here. I reserve the right to work on this only as much as I feel like, though, so purely visual changes may not be a high priority for me.

Cheers!

Jim,

Thank you for stepping up and helping out.

BUT, for those of us who are less computer literate, just what is this, and how does it work? Does it just copy a forum topic to a new website? And if so, what do we do with it from there?

Thanks

Rick

Rick:

All this does is gather together all the posts from all the pages of a topic onto a single page.

Once you have that you can:

  • copy it and paste it into an editor
  • print it
  • print it to PDF
  • save the HTML for later use

Does that help?

The goal is to let you save your hard work just in case there is a glitch, say with the freight sheds.

Thank you for your work, Jim.

But I still have questions.

I right-clicked on the pic in your example to see the pic info, but the whole example is a pic.

So, if I convert a thread, where are the pics stored?

If I save as PDF, they are part of the document. Clear so far.

If I save as HTML, where are they stored?

Same question for copying and saving in an editor?

Korm:

Sorry if I seem vague, but there are a LOT of variables: your operating system, your browser, your editor. I can only tell you what happens on my computer (a Mac) using Chrome and the Google Docs editor.

  • As you note, if you print to PDF, it includes all the images, but the PDF is a little hard to work with, and browsers are notoriously bad at making reasonable choices when printing.
  • For me there are 3 ways to save HTML:
      • as a “Web page, HTML file”, which just saves the HTML and leaves the images, etc. where they are now (a much smaller file, but it doesn’t help if the files on the web disappear);
      • as a “Web page, single file”, which produces an MHTML file that encapsulates all the images internally (I think? maybe at low resolution?);
      • as a “Web page, complete”, which also produces an MHTML file that is much bigger than the previous one, so maybe it has the images at full resolution?
  • For me, when I copy and paste into a Google Doc, it seems to save the images (not sure at what resolution).

The MHTML files (the extension is .htm) can be loaded into a browser, and presumably you can copy and paste from the browser into an editor if you want.

So which of these you use depends on what you want to do.

I suggest you try this with your computer, your browser, your editor and see what works and what doesn’t. I can’t begin to enumerate all the variations.

Also, just so you know, I’ve passed all this by Bob and he’s fine with it. I wouldn’t have done this without his blessing.

One more use for Jim’s program: you can also print the result to a PDF file.

Jim showed what you will see when you scrape a page. Once you have scraped the page, Windows Firefox users can make a PDF like this:

  • Click on the ‘hamburger’ menu in the upper right of your browser.
  • Click on the Print button. This brings up the preview pane, which shows you what the top of the file will look like.
  • In the upper left corner of the preview pane, click on the Print button. This opens a pane where you can choose a printer.
  • In this pane, click on ‘Microsoft Print to PDF’.
  • You will then be asked to give your PDF a name and choose a folder to put it in.

Another interesting thing I found: if you right-click on a picture in your scrape and pick ‘View Image Info’, it brings up a pane with a list of all the image files in the scrape, including their path info. In the image list pane, click Select All to select every image in the thread. If you wish to save the images to a local folder, choose ‘Save All’ and then choose the folder where you want to store them. You can also save the list itself: right-click in the list area, which highlights the list and brings up a small menu, then click Copy. Open Notepad or Word and paste the list into it.
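For anyone comfortable running a small script, roughly the same list can be pulled out of a saved copy of the scrape. This is a minimal Python sketch, not part of Jim’s tool, assuming the scrape has been saved as an HTML file; the names `ImageCollector` and `list_images` are made up for illustration. It uses only the standard library’s `html.parser` to collect the `src` of every `<img>` tag, skipping repeats.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Hypothetical helper: collects each <img> src in document order, without duplicates."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if not src:
            return  # an <img> with no src has nothing to list
        # Resolve relative paths against the page's address, if one was given.
        full = urljoin(self.base_url, src)
        if full not in self.sources:
            self.sources.append(full)


def list_images(html_text, base_url=""):
    """Return the image addresses found in a saved page's HTML text."""
    parser = ImageCollector(base_url)
    parser.feed(html_text)
    parser.close()
    return parser.sources
```

For example, reading a saved `topic.html` and printing `"\n".join(list_images(text))` gives one image address per line, which can be pasted into Notepad just like the list copied from Firefox.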

Ah, sorry, I just got a chance to look more closely at the difference between the “single file” and “complete” Save As options…

The single file version saves the images in-line within the MHTML file, so you get a single file that is self-contained. This makes it easy to copy around, email to somebody, etc. For those who care, this uses the same mechanism (MIME) that emails with attachments use…
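Because a single-file save is a MIME message, Python’s standard `email` package can open it. The sketch below is a hypothetical helper (`list_mhtml_parts` is not part of any tool mentioned in this thread), assuming the file is an ordinary `multipart/related` MHTML file. It lists each embedded part with its type, its original web address (the `Content-Location` header), and its decoded size, which is one way to check whether the images really are stored inside and how big they are.

```python
import email
from email import policy


def list_mhtml_parts(data: bytes):
    """Return (content_type, content_location, size_in_bytes) for each part of an MHTML file."""
    msg = email.message_from_bytes(data, policy=policy.default)
    parts = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # the multipart container itself holds no payload
        # decode=True undoes base64 / quoted-printable, so len() is the real file size.
        payload = part.get_payload(decode=True) or b""
        parts.append((part.get_content_type(), part.get("Content-Location"), len(payload)))
    return parts
```

Usage would look like opening `topic.mhtml` in binary mode (`"rb"`), passing `f.read()` to `list_mhtml_parts`, and printing each row; a handful of `image/jpeg` rows with large sizes would suggest the photos were kept at a useful resolution.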

The complete version saves the HTML in one file plus a separate folder with everything needed to make that HTML file work. So all the images are in that subfolder (along with JavaScript, CSS, and other obscure web stuff). The HTML references are modified so they look things up in the subfolder. This makes it easy to, for example, grab all the images referenced from a web page, as they are all sitting as separate files within the subfolder. It is harder to copy around or email to somebody, though, because all the files are separate. You would need to wrap it all up into a zip file, for example, to send it as a single file.
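To check that a “complete” save really captured everything, a short script can compare the page’s local references against what is actually on disk. This is a minimal Python sketch under stated assumptions: the saved page is UTF-8 HTML, and its references into the subfolder are relative paths as described above. `missing_local_files` and `LocalRefCollector` are hypothetical names. Anything with a web address, or a bare `#anchor`, is skipped.

```python
import os
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse


class LocalRefCollector(HTMLParser):
    """Hypothetical helper: collects src/href values that point at local files."""

    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name not in ("src", "href") or not value:
                continue
            parsed = urlparse(value)
            # Skip web addresses (http:, https:, data:, //host/...) and bare #anchors.
            if parsed.scheme or parsed.netloc or not parsed.path:
                continue
            self.refs.append(unquote(parsed.path))


def missing_local_files(html_path):
    """Return local references in a saved page whose files are not on disk."""
    with open(html_path, encoding="utf-8") as f:
        collector = LocalRefCollector()
        collector.feed(f.read())
        collector.close()
    # Relative references are resolved against the folder holding the HTML file.
    base_dir = os.path.dirname(os.path.abspath(html_path))
    return [ref for ref in collector.refs
            if not os.path.exists(os.path.join(base_dir, ref))]
```

An empty list means every local image, script, and stylesheet the page points to was found in the subfolder.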

So, your mileage may vary as to how you want to use the data. It is up to you and what you find most important and convenient.

I appreciate Jim taking the initiative to do this. This is a 100% LSC approved project. :slight_smile:

Jim,

Everything works great for me (Mac/Safari).

Can you save the entire document to your hard drive as a PDF for offline viewing?

Thanks!!!

Very nice work, Jim. Works fine on Mac with Firefox.

John Bouck:

I think I figured out how to export to PDF using Safari on a Mac. In the window generated with the topic all in one page you need to hold the control key and click the mouse button (the classic way to do a “right click” on a single button mouse) which will bring up a small menu, where one option is to “Print Page…”. In the print dialog box that appears, in the lower left corner is a PDF menu that has an option to “Save as PDF”:

That will bring up yet another dialog box allowing you to choose where to save the PDF. Here’s what the resultant PDF looked like for me:

UPDATED to add a simpler method. In the File menu at the top of the screen there’s an “Export as PDF…” item you can click to do the same basic task:

And there are lots of web pages with help available if you search for “safari save as pdf”.

Hope that helps…

Thanks, Jim, for the information and help.


Jim, that’s a wonderful resource. We used to be able to do that on MLS, and several moderators, like Dwight, converted useful threads and then re-linked them in a forum section for future reading. It is especially useful when all the photo storage vanishes! See attached pic for an example.

I did find a bug with “Microsoft Print to PDF”, which may be Microsoft’s or Firefox’s doing. I used an old thread of mine which has 7 photos in the initial post.

http://www.largescalecentral.com/forums/topic/29805/east-broad-top-coach-5/view/post_id/391278

The Firefox ‘Print’ preview showed only two of the pics, and shrinking to 70% only got me 2 1/2. See attached screenshot (which shows the only pics that came out). So a PDF may not be a good option, and it should certainly be checked carefully.

I did save it as a “Web Page Complete” which got me an html page and a folder of files - with all the photos in it.

One minor comment: your (link) doesn’t open in a new window, or at least it didn’t for me (Firefox/W10).

Pete:

Yeah, converting to PDF comes with the caveat that all the browsers I know of do a really lousy job of figuring out page boundaries. Chrome does a similar thing, splitting photos across pages, which is truly annoying, except that sometimes it puts page boundaries at inexplicable spots, leaving weird large blank areas. Sigh. Fixing that is sadly beyond my ability to work on right now (I took a try at it a while ago when I worked in the print part of HP, but sadly it never came to fruition).

If you copy the entire thing and paste it into a word processor (like MS Word or Google docs), you might be able to gain more control by selectively changing the sizes and layouts of various images. But that’s a lot of work to go through.

I’m not sure what you mean by the (link) not opening in a new window but I’m not surprised that the behavior varies in different browsers…

Atta boy, Jim!!

I’m not sure what you mean by the (link) not opening in a new window but I’m not surprised that the behavior varies in different browsers…

Sorry, I meant opening in a new tab. As in target="_blank".

A neat thing about Jim’s tool is that one can easily snapshot one’s entire multi-page build thread and:

  • copy-paste it into your favorite editor

  • edit it to your liking

  • pump it to PDF, and

  • upload it as an LSC article

Cliff

I tried the scraper using Microsoft Edge and Windows 10. Worked like a charm. Thanks for doing this!

Greg
