jSoup: Java HTML Parser

jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jquery-like methods.

jsoup implements the WHATWG HTML5 specification, and parses HTML to the same DOM as modern browsers do.

  • scrape and parse HTML from a URL, file, or string
  • find and extract data, using DOM traversal or CSS selectors
  • manipulate the HTML elements, attributes, and text
  • clean user-submitted content against a safe white-list, to prevent XSS attacks
  • output tidy HTML

jsoup is designed to deal with all varieties of HTML found in the wild; from pristine and validating, to invalid tag-soup; jsoup will create a sensible parse tree.


Fetch the Wikipedia homepage, parse it to a DOM, and select the headlines from theIn the news section into a list of Elements (online sample):

Document doc = Jsoup.connect("http://en.wikipedia.org/").get();
Elements newsHeadlines = doc.select("#mp-itn b a");

Open source

jsoup is an open source project distributed under the liberal MIT license. The source code is available at GitHub.

Getting started

  1. Download the jsoup jar (version 1.8.3)
  2. Read the cookbook introduction
  3. Enjoy!


ADF – Displaying Multi-Line Text with outputText Component

To display multi-line text, the basic idea is applying the white-space CSS property to the element with one of following values: pre, pre-wrap and pre-line. This property controls the processing of whitespace inside an element. The exact value you need depends on the behavior you want. In this post, I’m using pre-wrap as the example, which means “Sequences of whitespace are preserved. Lines are broken at newline characters, at
, and as necessary to fill line boxes.”. Here’s the code snippet from the skin of the sample application:

.PreWrap {
white-space: -moz-pre-wrap; /* Mozilla /
white-space: pre; /
CSS 2.0 /
white-space: pre-wrap; /
CSS 2.1 /
word-wrap: break-word; /
IE 5.5-7 */

In this rule, the word-wrap CSS property specifies whether the break within a word is allowed to prevent overflow when an otherwise-unbreakable string is too long to fit within line box. The property is applies only when the white-space property allows wrapping. The value of break-word means “Normally unbreakable words may be broken at arbitrary points if there are no otherwise acceptable break points in the line.” This property is useful for the cases, such as really long URLs.

How to include a JavaScript file in another JavaScript file

There are two main ways to achieve this:

1 – You can load it with an Ajax call and then use eval.

This is the most straightforward way, but it’s limited to your domain because of the JavaScript safety settings, and using eval is opening the door to bugs and hacks.

2 – Add a script tag with the script URL in the HTML.

This is definitely the best way to go. You can load the script even from a foreign server, and it’s clean as you use the browser parser to evaluate the code. You can put the <script /> tag in the head of the web page, or at the bottom of the body.

Both of these solutions are discussed and illustrated in JavaScript Madness: Dynamic Script Loading.

Now, there is a big issue you must know about. Doing that implies that you remotely load the code. Modern web browsers will load the file and keep executing your current script because they load everything asynchronously to improve performance.

It means that if you use these tricks directly, you won’t be able to use your newly loaded code the next line after you asked it to be loaded, because it will be still loading.

For example: my_lovely_script.js contains MySuperObject:

var js = document.createElement("script");

js.type ="text/javascript";
js.src = jsFilePath;

document.body.appendChild(js);var s =newMySuperObject();Error:MySuperObjectisundefined

Then you reload the page hitting F5. And it works! Confusing…

So what to do about it ?

Well, you can use the hack the author suggests in the link I gave you. In summary, for people in a hurry, he uses en event to run a callback function when the script is loaded. So you can put all the code using the remote library in the callback function. For example:

function loadScript(url, callback){// Adding the script tag to the head as suggested beforevar head = document.getElementsByTagName('head')[0];var script = document.createElement('script');
    script.type ='text/javascript';
    script.src = url;// Then bind the event to the callback function.// There are several events for cross browser compatibility.
    script.onreadystatechange = callback;
    script.onload = callback;// Fire the loading

Then you write the code you want to use AFTER the script is loaded in a lambda function:

var myPrettyCode =function(){// Here, do what ever you want};

Then you run all that:

loadScript("my_lovely_script.js", myPrettyCode);

OK, I got it. But it’s a pain to write all this stuff.

Well, in that case, you can use as always the fantastic free jQuery library, which let you do the very same thing in one line:


   alert("Script loaded and executed.");// Here you can use anything you defined in the loaded script});


Source: StackOverflow Question

What’s the difference between display: none and visibility: hidden?

Question: What’s the difference between display: none and visibility: hidden?
The display: none and visibility: hidden CSS properties appear to be the same thing, but they aren’t.

These two style properties do two different things.

visibility: hidden hides the element, but it still takes up space in the layout.

display: none removes the element completely from the document. It does not take up any space, even though the HTML for it is still in the source code.

You can see the effect of these style properties on this page. I created three identical swatches of code, and then set the display and visibility properties on two so you can see how they look.

Source: http://webdesign.about.com/od/css/f/blfaqhidden.htm