Aquileo | Recent changes to ticketshttps://sourceforge.net/p/thehtmldom/tickets/Recent changes to ticketsenSun, 08 Feb 2015 17:15:04 -0000Aquileo | HtmlNodeList iterators are brokenhttps://sourceforge.net/p/thehtmldom/tickets/6/<div class="markdown_content"><p>The HtmlNodeList iterator contract is broken for Python 2.7. The <code>__next__()</code> method is not called; the <code>next()</code> method is called instead. To fix this, create a new class <code>HtmlNodeListIter</code> with <code>__init__</code> and <code>next</code> methods. In <code>__init__()</code>, pass the nodeList being iterated and save it and the counter, and return self. In <code>next()</code>, use the same logic as in <code>__next__</code> from HtmlNodeList. Make sure to actually call the <code>StopIteration()</code> constructor. <br />
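The fix proposed above could look roughly like the following sketch. Only the name HtmlNodeListIter comes from the ticket; NodeListStandIn and every attribute name are assumptions, since HtmlNodeList's real internals are not shown here. Note that in Python __init__ cannot return a value, so this sketch returns self from __iter__ instead.

```python
class HtmlNodeListIter(object):
    """Separate iterator object; attribute names are hypothetical."""

    def __init__(self, node_list):
        # Save the list being iterated and a position counter.
        self.node_list = node_list
        self.counter = 0

    def __iter__(self):
        # An iterator returns itself here (not from __init__).
        return self

    def next(self):
        # Python 2 calls next(); signal exhaustion with StopIteration.
        if self.counter >= len(self.node_list):
            raise StopIteration()
        item = self.node_list[self.counter]
        self.counter += 1
        return item

    # Python 3 calls __next__(); alias it so one class serves both.
    __next__ = next


class NodeListStandIn(object):
    """Stand-in for HtmlNodeList (assumed to support len() and indexing)."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def __len__(self):
        return len(self.nodes)

    def __getitem__(self, index):
        return self.nodes[index]

    def __iter__(self):
        # A fresh iterator per loop, so nested loops do not share a counter.
        return HtmlNodeListIter(self)


nodes = NodeListStandIn(["div", "p", "span"])
result = [n for n in nodes]
pairs = [(a, b) for a in nodes for b in nodes]
```

Handing out a new iterator object from __iter__ keeps the counter out of the list itself, which is one plausible way to make repeated and nested loops behave.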
</p></div>AnonymousSun, 08 Feb 2015 17:15:04 -0000https://sourceforge.netae7299a18a12fb2e040aff55397864141633b509Aquileo | HTMLDOM cannot correctly parse HTML element without proper closing taghttps://sourceforge.net/p/thehtmldom/tickets/5/<div class="markdown_content"><p>Htmldom cannot parse tags which aren't closed properly.<br />
For example: <br />
<code><br />
<AREA SHAPE="RECT" COORDS="2,2,95,30" HREF="../index.shtml" alt="Home"><br />
</code><br />
According to the HTML standard this is acceptable, but htmldom fails to parse it correctly. </p>
<p>For example:</p>
<div class="codehilite"><pre><span class="nb">from</span> <span class="nx">htmldom</span> <span class="k">import</span> <span class="nx">htmldom</span>
<span class="n">dom</span> <span class="o">=</span> <span class="nx">htmldom.HtmlDom</span><span class="p">()</span><span class="bp">.</span><span class="nx">createDom</span><span class="p">(</span><span class="s2">"""</span>
<span class="s2"><body></span>
<span class="s2"><MAP NAME="</span><span class="nx">top_nav_map</span><span class="s2">"></span>
<span class="s2"> <AREA SHAPE="</span><span class="nb">RECT</span><span class="s2">" COORDS="</span><span class="mi">2</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">95</span><span class="p">,</span><span class="mi">30</span><span class="s2">" HREF="</span><span class="nx">..</span><span class="p">/</span><span class="nx">index.shtml</span><span class="s2">" alt="</span><span class="nb">Home</span><span class="s2">"></span>
<span class="s2"> <AREA SHAPE="</span><span class="nb">RECT</span><span class="s2">" COORDS="</span><span class="mi">99</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">220</span><span class="p">,</span><span class="mi">30</span><span class="s2">" HREF="</span><span class="nx">..</span><span class="p">/</span><span class="nx">Components.shtml</span><span class="s2">" alt="</span><span class="nb">Components</span><span class="s2">"></span>
<span class="s2"> <AREA SHAPE="</span><span class="nb">RECT</span><span class="s2">" COORDS="</span><span class="mi">224</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">319</span><span class="p">,</span><span class="mi">30</span><span class="s2">" HREF="</span><span class="nx">..</span><span class="p">/</span><span class="nx">HardwareMain.shtml</span><span class="s2">" alt="</span><span class="nx">Hardware</span><span class="s2">"></span>
<span class="s2"> <AREA SHAPE="</span><span class="nb">RECT</span><span class="s2">" COORDS="</span><span class="mi">324</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">402</span><span class="p">,</span><span class="mi">30</span><span class="s2">" HREF="</span><span class="nx">..</span><span class="p">/</span><span class="nx">Boards.shtml</span><span class="s2">" alt="</span><span class="nx">Boards</span><span class="s2">"></span>
<span class="s2"> <AREA SHAPE="</span><span class="nb">RECT</span><span class="s2">" COORDS="</span><span class="mi">406</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">477</span><span class="p">,</span><span class="mi">30</span><span class="s2">" HREF="</span><span class="nx">..</span><span class="p">/</span><span class="nx">BooksMain.shtml</span><span class="s2">" alt="</span><span class="nx">Books</span><span class="s2">"></span>
<span class="s2"> <AREA SHAPE="</span><span class="nb">RECT</span><span class="s2">" COORDS="</span><span class="mi">482</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">535</span><span class="p">,</span><span class="mi">30</span><span class="s2">" HREF="</span><span class="nx">..</span><span class="p">/</span><span class="nx">Kits.shtml</span><span class="s2">" alt="</span><span class="nx">Kits</span><span class="s2">"></span>
<span class="s2"></MAP></span>
<span class="s2"><h1>Hello</h1></span>
<span class="s2"></body></span>
<span class="s2">"""</span><span class="p">)</span>
<span class="n">table</span> <span class="o">=</span> <span class="nx">dom.find</span><span class="p">(</span><span class="s2">"body"</span><span class="p">)</span>
<span class="nx">print</span><span class="p">(</span><span class="nx">table.html</span><span class="p">())</span>
</pre></div>
<p>This code prints:</p>
<div class="codehilite"><pre><span class="nt"><body></span>
<span class="nt"><map</span> <span class="na">NAME=</span><span class="s">"top_nav_map"</span><span class="nt">></span>
<span class="nt"><area</span> <span class="na">COORDS=</span><span class="s">"2,2,95,30"</span> <span class="na">SHAPE=</span><span class="s">"RECT"</span> <span class="na">alt=</span><span class="s">"Home"</span> <span class="na">HREF=</span><span class="s">"../index.shtml"</span><span class="nt">></span>
<span class="nt"><area</span> <span class="na">COORDS=</span><span class="s">"99,2,220,30"</span> <span class="na">SHAPE=</span><span class="s">"RECT"</span> <span class="na">alt=</span><span class="s">"Components"</span> <span class="na">HREF=</span><span class="s">"../Components.shtml"</span><span class="nt">></span>
<span class="nt"><area</span> <span class="na">COORDS=</span><span class="s">"224,2,319,30"</span> <span class="na">SHAPE=</span><span class="s">"RECT"</span> <span class="na">alt=</span><span class="s">"Hardware"</span> <span class="na">HREF=</span><span class="s">"../HardwareMain.shtml"</span><span class="nt">></span>
<span class="nt"><area</span> <span class="na">COORDS=</span><span class="s">"324,2,402,30"</span> <span class="na">SHAPE=</span><span class="s">"RECT"</span> <span class="na">alt=</span><span class="s">"Boards"</span> <span class="na">HREF=</span><span class="s">"../Boards.shtml"</span><span class="nt">></span>
<span class="nt"><area</span> <span class="na">COORDS=</span><span class="s">"406,2,477,30"</span> <span class="na">SHAPE=</span><span class="s">"RECT"</span> <span class="na">alt=</span><span class="s">"Books"</span> <span class="na">HREF=</span><span class="s">"../BooksMain.shtml"</span><span class="nt">></span>
<span class="nt"><area</span> <span class="na">COORDS=</span><span class="s">"482,2,535,30"</span> <span class="na">SHAPE=</span><span class="s">"RECT"</span> <span class="na">alt=</span><span class="s">"Kits"</span> <span class="na">HREF=</span><span class="s">"../Kits.shtml"</span><span class="nt">></span>
<span class="nt"></area></span>
<span class="nt"><h1></span>
Hello
<span class="nt"></h1></span>
<span class="nt"></area></span>
<span class="nt"></area></span>
<span class="nt"></area></span>
<span class="nt"></area></span>
<span class="nt"></area></span>
<span class="nt"></map></span>
<span class="nt"></body></span>
</pre></div>
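<p>The nesting in the output above is what one would expect if each AREA tag were pushed onto the open-element stack and never popped. A common remedy is to treat void elements as childless. The following is only a sketch of that technique using Python 3's standard-library html.parser, not htmldom's own code; TreeBuilder, child_tags and the dict-based node layout are hypothetical.</p>

```python
from html.parser import HTMLParser

# Void elements in the HTML standard: they never take a closing tag.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
}


class TreeBuilder(HTMLParser):
    """Builds a tiny tag tree; text nodes are ignored for brevity."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": "#root", "children": []}
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "children": []}
        self.stack[-1]["children"].append(node)
        # Only non-void elements stay open and can receive children.
        if tag not in VOID_ELEMENTS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name, if there is one.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                break


def child_tags(node):
    return [child["tag"] for child in node["children"]]


builder = TreeBuilder()
builder.feed(
    '<body><MAP NAME="top_nav_map">'
    '<AREA SHAPE="RECT" HREF="../index.shtml">'
    '<AREA SHAPE="RECT" HREF="../Kits.shtml">'
    "</MAP><h1>Hello</h1></body>"
)
body = builder.root["children"][0]
map_node = body["children"][0]
```

<p>With this approach the h1 ends up as a sibling of the map rather than inside an area element.</p>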
</div>AnonymousWed, 03 Sep 2014 12:24:09 -0000https://sourceforge.netc52f8ab730c0e2daa6a1e5e366703b83b1b3b63bAquileo | Attributes cannot contain quoteshttps://sourceforge.net/p/thehtmldom/tickets/4/<div class="markdown_content"><p>The HTML spec allows attributes delimited with double quotes to contain single quotes and vice versa. However code like</p>
<div class="codehilite"><pre><span class="n">page</span> <span class="o">=</span> <span class="s2">"""<a title="</span><span class="nx">It</span><span class="s1">'s bugged!"></a>"""</span>
<span class="s1">dom = htmldom.HtmlDom().createDom(page)</span>
</pre></div>
<p>enters an infinite loop. It looks like the regular expression used for attributes does not allow for this.</p></div>AnonymousSat, 28 Jun 2014 06:05:46 -0000https://sourceforge.net0b015d466e605786b54d4171ab9690467104a55dAquileo | #3 Parse issue when element attributes either don't have a value or are not wrapped in quoteshttps://sourceforge.net/p/thehtmldom/tickets/3/?limit=50#0203<div class="markdown_content"><p>Sorry, I actually forgot that I e-mailed you about this issue. I had prepared this ticket before I e-mailed you, and now I saw that this ticket was unsaved, and I saved it without a second thought.</p></div>AnonymousWed, 19 Mar 2014 19:48:06 -0000https://sourceforge.netecda79d46f232ffd638aed4028b0af5eabc17059Aquileo | Parse issue when element attributes either don't have a value or are not wrapped in quoteshttps://sourceforge.net/p/thehtmldom/tickets/3/<div class="markdown_content"><p>Hello,</p>
<p>I'm having unexpected trouble while parsing HTML code when the value-part for an element attribute is either omitted or is not wrapped in quotes. While I'm not completely certain that it's a bug, it's not obvious to me that it's not a bug.</p>
<p>Code:</p>
<div class="codehilite"><pre><span class="kn">from</span> <span class="nn">htmldom</span> <span class="kn">import</span> <span class="n">htmldom</span>
<span class="n">htmlInput</span> <span class="o">=</span> <span class="s">"""<form></span>
<span class="s"> <select name="country_code"></span>
<span class="s"> <option value="GB" selected>United Kingdom</option></span>
<span class="s"> <option value="AL">Albania</option></span>
<span class="s"> </select></span>
<span class="s"></form>"""</span>
<span class="n">dom</span> <span class="o">=</span> <span class="n">htmldom</span><span class="o">.</span><span class="n">HtmlDom</span><span class="p">()</span><span class="o">.</span><span class="n">createDom</span><span class="p">(</span><span class="n">htmlInput</span><span class="p">)</span>
<span class="n">form</span> <span class="o">=</span> <span class="n">dom</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="s">"form"</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Countries:"</span><span class="p">)</span>
<span class="k">for</span> <span class="n">option</span> <span class="ow">in</span> <span class="n">form</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="s">"select[name=country_code] > option"</span><span class="p">):</span>
    <span class="n">value</span> <span class="o">=</span> <span class="n">option</span><span class="o">.</span><span class="n">attr</span><span class="p">(</span><span class="s">"value"</span><span class="p">)</span>
    <span class="n">text</span> <span class="o">=</span> <span class="n">option</span><span class="o">.</span><span class="n">text</span><span class="p">()</span>
    <span class="k">print</span><span class="p">(</span><span class="s">" {0} = {1}"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="n">text</span><span class="p">))</span>
</pre></div>
<p>Output:</p>
<div class="codehilite"><pre><span class="n">Countries</span><span class="o">:</span>
<span class="n">AL</span> <span class="o">=</span> <span class="n">selected</span><span class="o">></span><span class="n">United</span> <span class="n">Kingdom</span>
<span class="n">AL</span> <span class="o">=</span> <span class="n">Albania</span>
</pre></div>
<p>I expected the first element to have the value "GB" and the text "United Kingdom".</p>
<p>If I do this:</p>
<div class="codehilite"><pre><span class="nt"><option</span> <span class="na">value=</span><span class="s">"GB"</span> <span class="err">selected</span><span class="nt">></span>United Kingdom<span class="nt"></option></span>
</pre></div>
<p>or</p>
<div class="codehilite"><pre><span class="nt"><option</span> <span class="na">value=</span><span class="s">"GB"</span> <span class="na">selected=</span><span class="err">></span><span class="s">United</span> <span class="err">Kingdom</option</span><span class="nt">></span>
</pre></div>
<p>or</p>
<div class="codehilite"><pre><span class="nt"><option</span> <span class="na">value=</span><span class="s">"GB"</span> <span class="na">selected=</span><span class="s">something</span><span class="nt">></span>United Kingdom<span class="nt"></option></span>
</pre></div>
<p>then the issue exists.</p>
<p>However, if I do this:</p>
<div class="codehilite"><pre><span class="nt"><option</span> <span class="na">value=</span><span class="s">"GB"</span> <span class="na">selected=</span><span class="s">""</span><span class="nt">></span>United Kingdom<span class="nt"></option></span>
</pre></div>
<p>or</p>
<div class="codehilite"><pre><span class="nt"><option</span> <span class="na">value=</span><span class="s">"GB"</span><span class="nt">></span>United Kingdom<span class="nt"></option></span>
</pre></div>
<p>then I get the expected result.</p>
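<p>The failing and working variants above differ only in attribute syntax, so the attribute pattern is a likely suspect. As an illustration only (this is not htmldom's actual regular expression, and ATTR_RE and parse_attributes are hypothetical names), a pattern that accepts valueless, unquoted, single-quoted and double-quoted attributes might look like this. It also lets a value quoted one way contain the other kind of quote.</p>

```python
import re

# One attribute: a name, optionally followed by "=" and a value that is
# double-quoted, single-quoted, or unquoted.
ATTR_RE = re.compile(
    r"""([^\s"'<>/=]+)           # attribute name
        (?:\s*=\s*
            (?: "([^"]*)"        # double-quoted value, may contain '
              | '([^']*)'        # single-quoted value, may contain "
              | ([^\s"'=<>`]+)   # unquoted value
            )
        )?                       # the whole value part is optional
    """,
    re.VERBOSE,
)


def parse_attributes(text):
    """Parse the text between a tag name and the closing '>' into a dict."""
    attrs = {}
    for match in ATTR_RE.finditer(text):
        name = match.group(1)
        # Whichever value alternative matched; empty string for bare names.
        value = next((g for g in match.groups()[1:] if g is not None), "")
        attrs[name] = value
    return attrs


bare = parse_attributes('value="GB" selected')
unquoted = parse_attributes('value="GB" selected=something')
mixed = parse_attributes('title="It\'s bugged!"')
```

<p>Because every alternative consumes at least one character and there are no nested quantifiers, the pattern cannot match an empty string repeatedly, which is one way such loops are avoided.</p>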
<p>Thank you for any help you can provide!</p></div>AnonymousWed, 19 Mar 2014 19:44:07 -0000https://sourceforge.net190b95a9c76ff6d5a8b88df15d5539b9731d6682Aquileo | #2 Nodelists containing nested elements loop infinitely.https://sourceforge.net/p/thehtmldom/tickets/2/?limit=25#7184<div class="markdown_content"><p>To make an object an iterator, Python requires us to implement "__iter__" together with a "next" function (for Python 2.7) or a "__next__" function (for Python 3.x). You must be testing this code in a Python 2.x interpreter. The current HtmlNodeList does not implement the iterator "next" function, which is why the above code does not work there, although it works fine in a Python 3.x interpreter.</p>
<p>I knew about this bug when I implemented the code (I am really ashamed to say this). I implemented the "next" function but it didn't work: control was not even entering it. I spent hours on it but failed to identify the bug, so I just implemented the "__next__" function to make it work on Python 3.x interpreters. </p>
<p>But after seeing this ticket, I went through the entire code and, to my horror, saw a second "next" function (which returns the siblings of the current node list) defined below the "__next__" function. This later def was overriding the earlier definition of "next", resulting in the infinite loop.</p>
<p>This is one of those tricky bugs where you are really confident that some part of the code works correctly and you don't even want to take a look at it. It is also one of the disadvantages of dynamic languages: they don't complain even when they see multiple functions with the same name.</p>
<p>I will correct the code and upload the new version as soon as possible.</p>
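<p>The overriding described above is easy to reproduce in isolation. The class below is a hypothetical stand-in, not the real HtmlNodeList; it only shows that a second def with the same name in a class body silently replaces the first one.</p>

```python
class NodeListSketch(object):
    """Hypothetical stand-in that mimics the shape of the bug."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.counter = 0

    def __iter__(self):
        return self

    def __next__(self):
        # Python 3 iterator step.
        if self.counter >= len(self.nodes):
            raise StopIteration
        node = self.nodes[self.counter]
        self.counter += 1
        return node

    # First definition: meant to be the Python 2 iterator step.
    def next(self):
        return self.__next__()

    # Second definition with the same name: it replaces the one above
    # without any warning from the interpreter.
    def next(self):
        return "siblings"


which = NodeListSketch(["a", "b"]).next()
```

<p>Here the value of which comes from the second definition. Under Python 2 a for loop would call that method on every step and never see StopIteration, which matches the infinite loop reported. A static checker such as pyflakes can usually flag this kind of redefinition.</p>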
<p>Happy coding :)</p></div>Bhimsen.S.KularniThu, 02 Jan 2014 17:30:38 -0000https://sourceforge.net81fdf58859ee4dca533099e604273198b7e60cbbAquileo | Nodelists containing nested elements loop infinitely.https://sourceforge.net/p/thehtmldom/tickets/2/<div class="markdown_content"><p>If you create a dom containing nested elements, ala:</p>
<div class="codehilite"><pre><span class="nt"><html><body></span>
<span class="nt"><div><p></span>Level One<span class="nt"></p></span>
<span class="nt"><div><p></span>Level Two<span class="nt"></p></div></span>
<span class="nt"></div></span>
<span class="nt"></body></html></span>
</pre></div>
<p>And then try to iterate over a nodelist containing the nested elements, it loops infinitely</p>
<div class="codehilite"><pre><span class="k">for</span> <span class="n">item</span> <span class="n">in</span> <span class="n">dom</span><span class="p">.</span><span class="n">find</span><span class="p">(</span> <span class="s">"div"</span> <span class="p">)</span><span class="o">:</span>
    <span class="n">print</span> <span class="n">item</span><span class="p">.</span><span class="n">html</span><span class="p">()</span>
</pre></div>
</div>AnonymousThu, 02 Jan 2014 09:02:03 -0000https://sourceforge.netbdc97c163e2067dcf7b530280d21a4e560dcd3c4Aquileo | #1 Problem found in createDom() method https://sourceforge.net/p/thehtmldom/tickets/1/?limit=25#e816<div class="markdown_content"><p>Thank you very much for finding the bug.</p></div>Bhimsen.S.KularniSun, 07 Apr 2013 09:57:42 -0000https://sourceforge.net9f4d6a1023f039055f4faf7a8e824393dd893d52Aquileo | Problem found in createDom() method https://sourceforge.net/p/thehtmldom/tickets/1/<div class="markdown_content"><p>Hello,<br />
I wanted to start using htmldom just a while ago, and every time I called createDom() an exception was raised. I've read the code and found a solution: changing the import statement on line 317 from <code>import urllib.request as urllib2</code> to <code>import urllib2</code>. Hope it helps if anyone has a similar problem.</p></div>AnonymousTue, 02 Apr 2013 17:34:07 -0000https://sourceforge.net6ff877cf9ae27cf6310a0a55f9e6b40e5cb8126b
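<p>The import change in the ticket above makes the module work on Python 2 only. A common alternative, sketched here as a suggestion and not as what htmldom actually ships, is to try one import and fall back to the other so that the same file loads on both interpreters.</p>

```python
try:
    import urllib2  # Python 2 module name
except ImportError:
    # Python 3 moved urlopen and friends into urllib.request.
    import urllib.request as urllib2

# Either way, the name urllib2 now exposes urlopen().
has_urlopen = hasattr(urllib2, "urlopen")
```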