#StackBounty: #google-search-console #google-chrome #download "Uncommon download" warnings in Chrome

Bounty: 50

Our site is a private service; that is, firewalls and user authorisation keep both the general public and Google’s indexing off it.

The site generates dynamic DocX files, which the customer downloads with their browser. It’s been pretty much the same way for years.

This week we have started to intermittently see warnings in the Chrome downloads shelf:

Screenshot: "suitability-rep....docx is not commonly downloaded and may be dangerous"

I’ve dived into Chrome’s safe browsing stuff and discovered that you can get diagnostics at chrome://safe-browsing/. There I can see that:

  • Chrome sends a safe browsing “ping” when it downloads .docx files
  • Most of the time the response to this contains "verdict": "SAFE"
  • Sometimes the response contains "verdict": "UNCOMMON", and this is when the user gets a warning.

I’ve also had a bit of a hunt through the Chromium source and the Chrome blogs, and it doesn’t look as if either Chrome’s approach or its policy for the .docx file type has changed recently. All I can think of is that the Safe Browsing service has changed its rules.

Here’s an anonymised ping payload that gave an UNCOMMON verdict:

{
   "archive_directory_count": 0,
   "archive_file_count": 0,
   "archived_binary": [  ],
   "download_type": 14,
   "file_basename": "doc.docx",
   "length": 493030,
   "referrer_chain": [ {
      "ip_addresses": [  ],
      "is_retargeting": false,
      "main_frame_url": "",
      "maybe_launched_by_external_application": false,
      "navigation_initiation": "UNDEFINED",
      "navigation_time_msec": 1.591196304544e+12,
      "referrer_main_frame_url": "",
      "referrer_url": "https://foo.bar/baz",
      "server_redirect_chain": [ "https://foo.bar/5ca70c26-9b9a-4cda-a99e-27394630910d" ],
      "type": "EVENT_URL",
      "url": "https://foo.bar/5ca70c26-9b9a-4cda-a99e-27394630910d"
   }, {
      "ip_addresses": [ "1.2.3.4" ],
      "is_retargeting": false,
      "main_frame_url": "",
      "maybe_launched_by_external_application": false,
      "navigation_initiation": "RENDERER_INITIATED_WITHOUT_USER_GESTURE",
      "navigation_time_msec": 1.591196217058e+12,
      "referrer_main_frame_url": "",
      "referrer_url": "https://foo.bar/baz/bap",
      "server_redirect_chain": [  ],
      "type": "LANDING_PAGE",
      "url": "https://foo.bar/baz"
   }, {
      "ip_addresses": [ "1.2.3.4" ],
      "is_retargeting": false,
      "main_frame_url": "",
      "maybe_launched_by_external_application": false,
      "navigation_initiation": "RENDERER_INITIATED_WITH_USER_GESTURE",
      "navigation_time_msec": 1.591196195633e+12,
      "referrer_main_frame_url": "",
      "referrer_url": "foo.bar/baz/bat",
      "server_redirect_chain": [  ],
      "type": "CLIENT_REDIRECT",
      "url": "https://foo.bar/bap"
   }, {
      "ip_addresses": [ "1.2.3.4" ],
      "is_retargeting": false,
      "main_frame_url": "",
      "maybe_launched_by_external_application": false,
      "navigation_initiation": "RENDERER_INITIATED_WITH_USER_GESTURE",
      "navigation_time_msec": 1.591196152516e+12,
      "referrer_main_frame_url": "",
      "referrer_url": "",
      "server_redirect_chain": [  ],
      "type": "LANDING_REFERRER",
      "url": "foo.bar/baz/bat"
   } ],
   "request_ap_verdicts": false,
   "url": "blob:https://foo.bar/5ca70c26-9b9a-4cda-a99e-27394630910d"
}

(The download is initiated by a user click on a React component, which triggers JavaScript using the file-saver library to deliver the file as a blob.)
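For context, here is a minimal sketch of that client-side flow, assuming file-saver’s saveAs API and the Fetch API; the endpoint, function name and filename below are placeholders rather than our real code:

import { saveAs } from "file-saver";

// Illustrative only: fetch the dynamically generated .docx and hand it to file-saver.
async function downloadReport() {
  const response = await fetch("/api/reports/suitability", { credentials: "include" }); // placeholder endpoint
  const blob = await response.blob();
  // file-saver wraps the blob in a blob: URL and triggers the download from it, which is
  // why the Safe Browsing ping above reports a "blob:https://..." URL rather than a server URL.
  saveAs(blob, "suitability-report.docx"); // placeholder filename
}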

Google Search Console indicates no problems with our site — which is to be expected because the site is private.

I understand that UNCOMMON means “we can’t say this is safe because it’s not a file we’ve seen before”. It sort of makes sense to me that our .docx files could be described as uncommon, because each one is dynamically generated and therefore unique. However, I don’t understand why the service sometimes reaches a verdict of SAFE and sometimes UNCOMMON; at a guess there are some heuristics based on the contents of the referrer chain, but I’ve not been able to spot any correlations so far.

I know we can advise our users or their Enterprise admins to whitelist our site, and while this is certainly an option, I’d really rather not put them to that inconvenience.

So, questions:

  • Does Google publish anything to explain their criteria for reaching an UNCOMMON verdict?
  • What can we do at the server side to prevent an UNCOMMON verdict?
  • Any other solutions to this problem?


Get this bounty!!!

#StackBounty: #download #print-to-pdf #file-download #user-experience Can this PDF be downloaded to internal storage?

Bounty: 50

There’s a website where information is publicly viewable, and one can share that information anywhere, but only by taking screenshots. I want to save it as a PDF on my device for future reference.

I tried looking for ways on the web but found nothing helpful.

Here’s the link of the PDF that I want to download: https://www.ibps.in/pdfview.html?pdfNameaHR0cHM6Ly93d3cuaWJwcy5pbi93cC1jb250ZW50L3VwbG9hZHMvQ1JQLVBPLUlYdmdndi1OT1RJQ0UucGRm

Instead of saving it to my storage as a screenshot, it would be much more user-friendly to save it as a PDF for future use.

Note: It takes a long time to load if you are visiting the link via smartphone or trying to open it using Google Chrome. I had to use Safari to open the link and it opened immediately.


Get this bounty!!!

#StackBounty: #python #html #shell #download #wget Shell script to download a lot of HTML files and store them statically with all CSS

Bounty: 50

I have posted a lot of posts (roughly 290 questions) on a science forum, and I would like to get them back by downloading them with all the associated answers.

The first issue is that I have to be logged in to my personal space to see the list of all my messages. How can I get around this first barrier so that a shell script or a single wget command can retrieve all the URLs and their content? Can I pass a login and password to wget so that I am logged in and redirected to the appropriate URL containing the list of all messages?

Once this first issue is solved, the second issue is that I have to start from 6 different menu pages, each of which contains the titles and links of my questions.

Moreover, for some of my questions, the answers and the discussions may span multiple pages.

So I wonder whether I can achieve this bulk download, given that I would like to store the pages statically, with the CSS also stored locally on my computer (to keep the same formatting in my browser when I consult them on my PC).

The URL of the first menu page of questions is only reachable once I am logged in on the website (which could also be an issue for downloading with wget, if I am obliged to be connected).

An example of a URL containing the list of messages, once I am logged in, is:

https://forums.futura-sciences.com/search.php?searchid=22897684

the other pages (there are 6 or 7 pages of discussion titles in total appearing in the main menu page) have the format:
https://forums.futura-sciences.com/search.php?searchid=22897684&pp=&page=2
(for page 2)

https://forums.futura-sciences.com/search.php?searchid=22897684&pp=&page=5
(for page 5)

On each of these pages one can see the title and the link of each of the discussions that I would like to download, along with the CSS (bearing in mind that each discussion may itself contain multiple pages):

for example, the first page of a discussion is https://forums.futura-sciences.com/archives/804364-demonstration-dilatation-temps.html

its page 2 is https://forums.futura-sciences.com/archives/804364-demonstration-dilatation-temps-2.html

and its page 3 is https://forums.futura-sciences.com/archives/804364-demonstration-dilatation-temps-3.html

Naively, I tried to do all this with a single command (using the example URL from my personal space that I gave at the beginning of the post, i.e. https://forums.futura-sciences.com/search.php?searchid=22897684):

wget -r --no-check-certificate --html-extension --convert-links "https://forums.futura-sciences.com/search.php?searchid=22897684"

but unfortunately, this command downloads all sorts of files, and maybe not even what I want, i.e. my discussions.

I don’t know what approach to use: must I first store all the URLs in a file (with all the sub-pages containing all the answers and the whole discussion for each of my initial questions)?

and afterwards maybe run wget -i all_URL_questions.txt?

Could anyone help me to carry out this operation?

UPDATE 1: my issue needs a script; I tried the following things with Python:

1)

import urllib, urllib2, cookielib
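# Note: urllib, urllib2 and cookielib are Python 2 modules; in Python 3 the equivalents are urllib.parse, urllib.request and http.cookiejar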

username = 'USERNAME'
password = 'PASSWORD'

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
login_data = urllib.urlencode({'username' : username, 'password' : password})
opener.open('https://forums.futura-sciences.com/login.php', login_data)
resp = opener.open('https://forums.futura-sciences.com/search.php?do=finduser&userid=253205&contenttype=vBForum_Post&showposts=1')
print resp.read()

But the page printed is not the home page of my personal space.

2)

import requests

# Fill in your details here to be posted to the login form.
payload = { 
    'inUserName': 'USERNAME',
    'inUserPass': 'PASSWORD'
}

# Use 'with' to ensure the session context is closed after use.
with requests.Session() as s:
    p = s.post('https://forums.futura-sciences.com/login.php?do=login', data=payload)
    # print the html returned or something more intelligent to see if it's a successful login page.
    print p.text.encode('utf8')

    # An authorised request.
    r = s.get('https://forums.futura-sciences.com/search.php?do=finduser&userid=253205&contenttype=vBForum_Post&showposts=1')
    print r.text.encode('utf8')

Here too, this doesn’t work.

3)

import requests
import bs4 

site_url = 'https://forums.futura-sciences.com/login.php?do=login'
userid = 'USERNAME'
password = 'PASSWORD'

file_url = 'https://forums.futura-sciences.com/search.php?do=finduser&userid=253205&contenttype=vBForum_Post&showposts=1' 
o_file = 'abc.html'  

# create session
s = requests.Session()
# GET request. This will generate cookie for you
s.get(site_url)
# login to site.
s.post(site_url, data={'vb_login_username': userid, 'vb_login_password': password})
# Next thing will be to visit URL for file you would like to download.
r = s.get(file_url)

# Download file
with open(o_file, 'wb') as output:
    output.write(r.content)
print("requests:: File {o_file} downloaded successfully!")

# Close session once all work done
s.close()

Same thing: the content is wrong.

4)

from selenium import webdriver

# To prevent download dialog
profile = webdriver.FirefoxProfile()
profile.set_preference('browser.download.folderList', 2) # custom location
profile.set_preference('browser.download.manager.showWhenStarting', False)
profile.set_preference('browser.download.dir', '/tmp')
profile.set_preference('browser.helperApps.neverAsk.saveToDisk', 'text/csv')

driver = webdriver.Firefox(firefox_profile=profile)  # use the profile configured above
driver.get('https://forums.futura-sciences.com/')
driver.find_element_by_id('ID').send_keys('USERNAME')  # 'ID' is a placeholder for the username field id
driver.find_element_by_id('ID').send_keys('PASSWORD')  # 'ID' is a placeholder for the password field id
driver.find_element_by_id('submit').click()
driver.get('https://forums.futura-sciences.com/search.php?do=finduser&userid=253205&contenttype=vBForum_Post&showposts=1')

Still not able to log in with USERNAME and PASSWORD and get the content of the home page of my personal space.

5)

from selenium import webdriver
from selenium.webdriver.firefox.webdriver import FirefoxProfile
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
import time

def MS_login(username, passwd):  # call this with username and password

    firefox_capabilities = DesiredCapabilities.FIREFOX
    firefox_capabilities['moz:webdriverClick'] = False
    fp = webdriver.FirefoxProfile()
    fp.set_preference("browser.download.folderList", 2) # 0 means the desktop, 1 means the default "Downloads" directory, 2 means the directory set below
    fp.set_preference("browser.download.dir", "/Users/user/work_archives_futura/")
    driver = webdriver.Firefox(capabilities=firefox_capabilities, firefox_profile=fp)  # pass the profile so it is actually used
    driver.get('https://forums.futura-sciences.com/') # change the url to your website
    time.sleep(5) # wait for redirection and rendering
    driver.delete_all_cookies() # clean up the prior login sessions
    driver.find_element_by_xpath("//input[@name='vb_login_username']").send_keys(username)

    elem = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, "//input[@name='vb_login_password']")))
    elem.send_keys(Keys.TAB)

    driver.find_element_by_xpath("//input[@type='submit']").click()

    print("success !!!!")

    driver.close() # close the browser
    return driver

if __name__ == '__main__':
    MS_login("USERNAME","PASSWORD")

The window opens fine and the username is filled in, but it is impossible to fill in the password or to click on submit.

I am beginning to get discouraged.

PS: the main issue could come from the fact that the password field has the display:none property, so I can’t simulate the TAB operation to reach the password field and fill it in once I have entered the login.

Any help is welcome

EDIT 1: Is there really nobody who could provide suggestions? I know this is a little bit tricky, but a solution should exist, at least I hope so…


Get this bounty!!!

#StackBounty: #javascript #jquery #download Force template download instead of opening file using code

Bounty: 50

I have a Templates document library, where the piece of code below has happily been forcing the download of a template document instead of the file being opened. However, it has stopped working.

The reason for needing to force the download is that users actually need a copy of the template and don’t need to open the actual original template. (There is an approval workflow, as part of a QMS, which manages the templates and tracks the versions; users accidentally interfere with this otherwise.)

Currently, if any user clicks on a file name they see this error message (which isn’t really telling us anything, other than that the code is not rerouting the ‘open file’ request to a ‘download document’ request):

file not found

Here is the code that is being used:

n.b. there is a Templates Editors group for certain users who have permission to edit the templates in order to keep them updated. For everyone else, force the download.

<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.8.3/jquery.min.js"></script>
<script src="/uk_qhse/Style%20Library/jquery.SPServices-0.7.2.min.js"></script>
jQuery.noConflict();
jQuery(document).ready(function(){
  jQuery().SPServices({  
      operation: "GetGroupCollectionFromUser",  
      userLoginName: jQuery().SPServices.SPGetCurrentUser(),  
      async: false,  
      completefunc: function(xData, Status) { 
        //if current user is not a member of this group...       
        if(jQuery(xData.responseXML).find("Group[Name='Templates Editors']").length != 1)  
        {  
           //force download file on click
           jQuery("a[onclick*='DispEx']").each( function(data){
            var href = this.href;
            this.href = '/uk_qhse/_layouts/download.aspx?SourceURL=' + this.href;
            });
            jQuery("a[onclick*='DispEx']").removeAttr('onclick');  
        }  
      }  
    });  
});
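
One thing that may be worth checking, as an assumption rather than a confirmed cause: the rewrite above passes the raw document URL to download.aspx, so characters such as ‘&’ or ‘%’ in the document path can break the SourceURL query-string value. A minimal variation of the same rewrite that encodes the value first:

//force download on click, URL-encoding the document URL before handing it to download.aspx
jQuery("a[onclick*='DispEx']").each(function(){
    this.href = '/uk_qhse/_layouts/download.aspx?SourceURL=' + encodeURIComponent(this.href);
}).removeAttr('onclick');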


Get this bounty!!!