Utilities to help ease the transition

Jim_Rowson · October 29, 2020, 11:02pm

As we all probably know, our benevolent Bob is updating the software for LSC. These kinds of updates have risks, notably in this case around the freight sheds.

To attempt to help, I’ve created a website that will have a few utilities for use by anyone (for free). First is a way to export a complete topic (all the pages) into a single HTML page. I hope to also do a utility to download all photos from a freight shed.

The web page is available at https://lsc.jimrowson.com and looks like this right now:

If you provide a url to a topic and click the button, you get a web page that looks like this:

Feel free to complain or point out bugs here. I reserve the right to work on this as much as I feel like, though. So purely visual changes may not be a high priority for me.

Cheers!

Rick_Marty · October 30, 2020, 1:32am

Jim,

Thank you for stepping up and helping out.

BUT, for us less computer literate, just what is this and how does it work? Does it just copy and paste a forum topic to a new website and if so what do we do with it from there??

Thanks

Rick

Jim_Rowson · October 30, 2020, 1:37am

Rick:

All this does is gather together all the posts from all the pages in a topic in a single page.

Once you have that you can:

copy it and paste it into an editor
print it
print it to PDF
save the HTML for later use

Does that help?

The goal is to let you save your hard work just in case there is a glitch, say with the freight sheds.
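
For anyone curious what "gathering together" means in practice, here is a minimal Python sketch of the idea. It is only an illustration and not the code running on the site: the function name `combine_pages` and the (author, body) tuple layout are made up for the example, and it assumes the posts from each page of the topic have already been fetched.

```python
from html import escape


def combine_pages(title, pages):
    """Join the posts from every page of a topic into one HTML document.

    `pages` is a list of pages; each page is a list of
    (author, body_html) tuples in the order they appear in the topic.
    This layout is an assumption made for the sketch.
    """
    parts = [
        "<html><head><meta charset='utf-8'>",
        f"<title>{escape(title)}</title></head><body>",
        f"<h1>{escape(title)}</h1>",
    ]
    for page in pages:
        for author, body_html in page:
            # Each post keeps its own HTML body; only the author is escaped.
            parts.append(
                f"<div class='post'><h3>{escape(author)}</h3>{body_html}</div>"
            )
    parts.append("</body></html>")
    return "\n".join(parts)
```

The result is one string you could write to a file and open in a browser, then print, save, or copy from as described above.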

Korm · October 30, 2020, 2:45am

thank you for your work, Jim.

but i still got questions.

i right clicked on the pic in your example, to see the pic info. but the whole example is a pic.

so, if i convert a thread: where are the pics stored?

if i save as pdf, they are part of the document. clear so far.

if i save as html, where are they stored?

same question for copy-saving in an editor?

Jim_Rowson · October 30, 2020, 3:08am

Korm:

Sorry if I seem vague but there are a LOT of variables: your operating system, your browser, your editor. I can only tell you what happens on my computer (a Mac) using Chrome and Google Docs editor.

as you note, if you print to PDF, it includes all the images, but the PDF is a little hard to work with, and browsers are notoriously bad at making reasonable choices when printing
for me there are 3 ways to save HTML:
as a “Web page, HTML file” just saves the HTML and leaves the images, etc. where they are now (a much smaller file but doesn’t help if the files on the web disappear),
as a “Web page, single file” which produces an MHTML that encapsulates all the images internally (I think? maybe at low resolution?)
as a “Web page, complete” which also produces an MHTML that is much bigger than the previous one, so maybe has the images at full resolution?
for me when I copy and paste into a Google Doc it seems to be saving the image (not sure of the resolution)

The MHTML files (extension is .htm) are loadable into a browser, and presumably you can copy/paste into an editor from the browser if you want.

So which of these you use depends on what you want to do.

I suggest you try this with your computer, your browser, your editor and see what works and what doesn’t. I can’t begin to enumerate all the variations.

Jim_Rowson · October 30, 2020, 3:12am

Also, just so you know, I’ve passed all this by Bob and he’s fine with it. I wouldn’t have done this without his blessing.

Bob_Cope · October 30, 2020, 3:31am

One more use for Jim’s program: you can also print the result to a PDF file.

Jim showed what you will see when you scrape a page. Once you scrape the page, Windows Firefox users can click on the ‘hamburger’ in the upper right of the browser. Click on the Print button. This will bring up the preview pane, allowing you to see what the top of the file will look like. In the upper left corner of the preview pane, click on the Print button. This will open a pane where you can choose a printer. In this pane, click on ‘Microsoft Print to PDF’. This will then bring you to a dialog where you give your PDF a name and choose a folder to place it in.

Another interesting thing I found is that if you right click on a picture in your scrape and pick ‘View Image Info’, it will bring up a pane with a list of all the image files in the scrape, including the path info. In the image list pane, choose ‘Select All’. This will give you a complete list in the pane of all the images in the thread. If you wish to save the images to a local folder, choose ‘Save All’ and then choose a folder where you want to store the images. You can also save the list by right clicking in the list area, which will highlight the list and bring up a small dialog box. Click Copy in the dialog box. Open Notepad or Word and paste the list into the text editor.
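
For those who would rather script it, the same kind of image list can be pulled out of a saved HTML file with a few lines of Python. This is only a rough sketch using the standard library; the names `ImageLister` and `list_images` are invented for the example, and it only looks at plain img tags.

```python
from html.parser import HTMLParser


class ImageLister(HTMLParser):
    """Collect the src attribute of every <img> tag that has one."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.urls.append(src)


def list_images(html_text):
    """Return the image paths found in html_text, in document order."""
    parser = ImageLister()
    parser.feed(html_text)
    return parser.urls
```

You would read the saved page into a string, pass it to `list_images`, and paste or write out the resulting list much like the copied list from the Firefox pane.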

Jim_Rowson · October 30, 2020, 4:35am

Ah, sorry, just got a chance to look more into the difference between a Single and a Complete mhtml save as…

The Single file version saves the images in-line within the mhtml file, so you get a single file that is self-contained. This makes it easy to copy around, or email to somebody, etc. For those that care, this uses the same mechanism that emails with attachments use…
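
To make the email comparison concrete, here is a small Python sketch that bundles an HTML page and one image into a single multipart message using the standard `email` package. It is an illustration of the general mechanism only: the function name `build_single_file` is made up, the image is assumed to be a JPEG, and the files a browser actually writes differ in their details.

```python
from email.message import EmailMessage


def build_single_file(html_text, image_name, image_bytes):
    """Bundle an HTML page and one image into a multipart/related message.

    The HTML is expected to refer to the image as cid:<image_name>,
    which is how the email-style container links the parts together.
    """
    msg = EmailMessage()
    msg.set_content(html_text, subtype="html")
    # add_related turns the message into multipart/related and
    # attaches the image as a second, base64-encoded part.
    msg.add_related(
        image_bytes,
        maintype="image",
        subtype="jpeg",
        cid=f"<{image_name}>",
    )
    return msg
```

Calling `bytes(msg)` on the result gives one blob with the page and the picture together, which is the reason the single file is so easy to copy around.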

The Complete version saves the html in one file and a separate folder with everything needed to make that html file work. So all the images are in that sub folder (along with javascript, css, and other obscure web stuff). The html references are modified so they look things up in the subfolder. This makes it easy to, for example, grab all the images referenced from a web page, as they are all sitting as separate files within the subfolder. It makes it harder to copy it around or email it to somebody because all the files are separate. You would need to wrap it all up into a zip file, for example, to send it as a single file.
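
If you do want to send a Complete save as one file, zipping it is easy to script. Here is a minimal Python sketch; the function name `zip_saved_page` is invented for the example, and it assumes the subfolder sits right next to the html file, which is where my browser puts it.

```python
import zipfile
from pathlib import Path


def zip_saved_page(html_file, files_folder, zip_path):
    """Pack a saved html file and its companion folder into one zip.

    Paths inside the zip are kept relative to the html file's folder,
    so the html references still resolve after unzipping.
    """
    html_file = Path(html_file)
    files_folder = Path(files_folder)
    base = html_file.parent
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(html_file, html_file.relative_to(base).as_posix())
        for path in sorted(files_folder.rglob("*")):
            if path.is_file():
                zf.write(path, path.relative_to(base).as_posix())
    return zip_path
```

Whoever receives the zip just needs to unpack it into one folder and open the html file.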

So, your mileage may vary as to how you want to use the data. It is up to you and what you find most important and convenient.

admin1 · October 30, 2020, 12:43pm

I appreciate Jim taking the initiative to do this. This is a 100% LSC approved project.

John_Bouck · October 30, 2020, 2:26pm

Jim,

Everything works great for me (Mac/Safari).

Can you save the entire document to your hard drive as a PDF document for off line viewing?

Thanks!!!

Bruce_Chandler · October 30, 2020, 3:17pm

Very nice work, Jim. Works fine on Mac with Firefox.

Jim_Rowson · October 30, 2020, 4:47pm

John Bouck:

I think I figured out how to export to PDF using Safari on a Mac. In the window generated with the topic all in one page you need to hold the control key and click the mouse button (the classic way to do a “right click” on a single button mouse) which will bring up a small menu, where one option is to “Print Page…”. In the print dialog box that appears, in the lower left corner is a PDF menu that has an option to “Save as PDF”:

That will bring up yet another dialog box allowing you to choose where to save the PDF. Here’s what the resultant PDF looked like for me:

UPDATED to add a simpler method. In the File menu at the top of the screen there’s an “Export as PDF…” item you can click to do the same basic task:

And there are lots of web pages with help available if you search for “safari save as pdf”…

Hope that helps…

Noel_Wilson · October 30, 2020, 5:02pm

Thanks Jim for the information and help.

PeterT · October 30, 2020, 6:37pm

Feel free to complain or point out bugs here. I reserve the right to work on this as much as I feel like, though. So purely visual changes may not be a high priority for me

Jim, that’s a wonderful resource. We used to be able to do that on MLS, and several moderators, like Dwight, converted useful threads and then re-linked them on a forum section for future reading. It is especially useful when all the photo storage vanishes! See attached pic for an example.

I did find a bug in the “Microsoft Print to PDF”, which may be of MS or Firefox origin. I used an old thread of mine which has 7 photos in the initial post.

http://www.largescalecentral.com/forums/topic/29805/east-broad-top-coach-5/view/post_id/391278

The Firefox ‘Print’ preview only showed two of the pics and shrinking to 70% only got me 2 1/2. See attached screenshot (which shows the only pics that came out.) So a PDF may not be a good option - and should certainly be checked carefully.

I did save it as a “Web Page Complete” which got me an html page and a folder of files - with all the photos in it.

I would make a minor comment that your (link) doesn’t open in a new window - or at least it didn’t for me. (Firefox/W10).

Jim_Rowson · October 30, 2020, 7:23pm

Pete:

Yeah, converting to PDF comes with the caveat that all the browsers I know of do a really lousy job of figuring out page boundaries. Chrome does a similar thing of splitting photos across pages which is truly annoying. Except that sometimes it creates page boundaries at inexplicable spots giving weird large blank areas. Sigh. Fixing that is sadly beyond my ability to work on right now (I took a try at it a while ago when I worked in the print part of HP, but sadly it never came to fruition).

If you copy the entire thing and paste it into a word processor (like MS Word or Google Docs), you might be able to gain more control by selectively changing the sizes and layouts of various images. But that’s a lot of work to go through.

I’m not sure what you mean by the (link) not opening in a new window but I’m not surprised that the behavior varies in different browsers…

Cliff_Jennings · October 30, 2020, 7:58pm

Atta boy, Jim!!

PeterT · October 30, 2020, 8:32pm

I’m not sure what you mean by the (link) not opening in a new window but I’m not surprised that the behavior varies in different browsers…

Sorry, I meant opening in a new tab. As in target="_blank".

Cliff_Jennings · October 30, 2020, 8:51pm

A neat thing about Jim’s tool is that one can easily snapshot one’s entire multi-page build thread and:

copy-paste it into your favorite editor
edit it to your liking
pump it to PDF, and
upload it as an LSC article

Cliff

Gregory_Hile · October 30, 2020, 11:30pm

I tried the scraper using Microsoft Edge and Windows 10. Worked like a charm. Thanks for doing this!

Greg

Rooster · October 30, 2020, 11:37pm

Gregory Hile said:

I tried the scraper using Microsoft Edge and Windows 10. Worked like a charm. Thanks for doing this!

Greg