#StackBounty: #php #html #regex #parsing #pcre How to exclude text in HTML tags when replacing text with links in PHP?

Bounty: 50

This is the broken output I am trying to fix:

<img class="lazy_responsive" title="<a href='kathryn-kuhlman-language-en-topics-718-page-1' title='Kathryn Kuhlman'>Kathryn Kuhlman</a> - iUseFaith.com" src="ojm_thumbnail/1000/32f808f79011a7c0bd1ffefc1365c856.jpg" alt="<a href='kathryn-kuhlman-language-en-topics-718-page-1' title='Kathryn Kuhlman'>Kathryn Kuhlman</a> - iUseFaith.com" width="1600" height="517">

If you look carefully at the code above, you will see that the text in the alt and title attributes was replaced with a link, because the keyword appears in that text. As a result, my image is displayed with a tooltip that shows the link markup instead of just the name.

Problem: I have an array of keywords in which each keyword has its own URL, to be used as the link target:

$keywords["Kathryn Kuhlman"] = "https://www.iusefaith.com/en-354";
$keywords["Max KANTCHEDE"] = "https://www.iusefaith.com/MaxKANTCHEDE";

I have a text with images and links … in which those keywords may appear.

$text = 'Meet God\'s General Kathryn Kuhlman. <br>
<img class="lazy_responsive" title="Kathryn Kuhlman - iUseFaith.com" src="https://www.iusefaith.com/ojm_thumbnail/1000/32f808f79011a7c0bd1ffefc1365c856.jpg" alt="Kathryn Kuhlman - iUseFaith.com" width="1600" height="517" />
<br>
Follow <a href="https://www.iusefaith.com/en-354" title="Kathryn Kuhlman">Kathryn Kuhlman</a>
<br>
Max KANTCHEDE
';

I want to replace each keyword with a link to its URL (using the keyword as the link title), without touching the contents of the href, alt, or title attributes already present in the text. This is what I did:

$lien_existants = array();

// Matches <a ... href=...>link text</a>; the / delimiters are added in preg_match_all()
$regexp = "<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>";

if (preg_match_all("/$regexp/siU", $text, $matches, PREG_SET_ORDER))
{
    foreach ($matches as $match)
    {
        // $match[2] = link address
        // $match[3] = link text
        // strip_tags() instead of FILTER_SANITIZE_STRING, which is deprecated as of PHP 8.1
        $lien_actuels_existant = strip_tags($match[3]);
        $lien_existants[] = trim($lien_actuels_existant);

        echo $match[2], '', $match[3], '<br>';
    }
}


foreach ($keywords as $name => $value)
{
    // \b = word boundary, so only whole-word occurrences of $name are linked
    if (!in_array($name, $lien_existants) && !preg_match("/'/i", $name) && !preg_match('/"/i', $name))
    {
        $text = trim(preg_replace('~(\b' . $name . '\b)~ui', "<a href='$value' title='$name'>$1</a>", $text));
    }
    else
    {
        $name = addslashes($name);

        $text = trim(preg_replace('~(\b' . $name . '\b)~ui', "<a href='$value' title='$name'>$1</a>", $text));
    }
}

This replaces the keywords with links, but it also replaces them inside the alt and title attributes of the images.
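To see why the loop also rewrites the attributes, it helps to count what the word-boundary pattern actually matches. The sketch below uses a shortened copy of `$text` (the image `src` is replaced by a placeholder `x.jpg`, an assumption for brevity). `\b` only inspects the characters on either side of a match, so it cannot tell body text from attribute values or from the text of an existing link.

```php
<?php
// Shortened copy of the question's $text; "x.jpg" is a placeholder src.
$text = 'Meet God\'s General Kathryn Kuhlman. <br>
<img title="Kathryn Kuhlman - iUseFaith.com" src="x.jpg" alt="Kathryn Kuhlman - iUseFaith.com" />
<br>
Follow <a href="https://www.iusefaith.com/en-354" title="Kathryn Kuhlman">Kathryn Kuhlman</a>';

$name = 'Kathryn Kuhlman';

// Same kind of pattern as in the loop: a whole word, case-insensitive, UTF-8.
$count = preg_match_all('~\b' . preg_quote($name, '~') . '\b~ui', $text);

// Matches: body text, img title, img alt, <a> title, existing link text.
echo $count, PHP_EOL; // 5
```

Only the first of those five matches is text that sits outside every tag and link, so any fix has to skip markup before the keyword pattern is applied.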

How can I prevent it from replacing the text inside the alt, title, and href attributes?
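One way to keep the regex-based loop is PCRE's backtracking control verbs: match the parts that must stay untouched first, then discard them with `(*SKIP)(*FAIL)` so the engine resumes after them and only keywords in the remaining text are replaced. This is a minimal sketch, not a drop-in fix: `linkKeywordsSkippingTags()` is a hypothetical helper, it assumes PHP 7 or later, and because it treats anything from `<` to the next `>` as a tag, it will misbehave on a `>` inside an attribute value or a bare `<` in the text.

```php
<?php
// Hypothetical helper: link keywords outside tags and outside existing <a> elements.
function linkKeywordsSkippingTags(string $html, array $keywords): string
{
    foreach ($keywords as $name => $url) {
        $name = (string) $name;
        $pattern = '~<a\b[^>]*>.*?</a>(*SKIP)(*FAIL)'  // whole existing links: skip
                 . '|<[^>]*>(*SKIP)(*FAIL)'            // any other tag incl. attributes: skip
                 . '|\b' . preg_quote($name, '~') . '\b~uis';

        // A callback avoids "$" or "\" in the URL being read as back-references.
        $html = preg_replace_callback($pattern, function (array $m) use ($name, $url): string {
            return '<a href="' . htmlspecialchars($url, ENT_QUOTES) . '" title="'
                . htmlspecialchars($name, ENT_QUOTES) . '">' . $m[0] . '</a>';
        }, $html);
    }
    return $html;
}

$keywords = [
    'Kathryn Kuhlman' => 'https://www.iusefaith.com/en-354',
    'Max KANTCHEDE'   => 'https://www.iusefaith.com/MaxKANTCHEDE',
];

// Shortened version of the question's $text ("x.jpg" is a placeholder src).
$text = 'Meet God\'s General Kathryn Kuhlman. <br>'
      . '<img title="Kathryn Kuhlman - iUseFaith.com" src="x.jpg" alt="Kathryn Kuhlman - iUseFaith.com" />'
      . '<br>Follow <a href="https://www.iusefaith.com/en-354" title="Kathryn Kuhlman">Kathryn Kuhlman</a>'
      . '<br>Max KANTCHEDE';

echo linkKeywordsSkippingTags($text, $keywords), PHP_EOL;
```

Keywords are processed one at a time, so a link created for one keyword is itself skipped when the next keyword runs; that also removes the need for the `$lien_existants` check.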

Note: I have tried all the other solutions I found on Stack Overflow, so if you think one of them works, please apply it to my code above and show me how it should be done. If I knew how to make them work, I would not be asking here.
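If the markup can vary, a sturdier option than any regex is to let an HTML parser separate text from markup. In this sketch (hypothetical function `linkKeywordsInTextNodes()`, assuming PHP 7.4+ with the bundled DOM extension), `DOMXPath` selects only text nodes that are not inside an `<a>`, `<script>` or `<style>` element. Attribute values such as alt, title and href are not text nodes, so they are never touched. The `<meta http-equiv>` prefix is a common way to tell libxml the input is UTF-8.

```php
<?php
// Hypothetical helper: link keywords only inside visible text nodes.
function linkKeywordsInTextNodes(string $html, array $keywords): string
{
    if ($keywords === []) {
        return $html; // an empty alternation would match everywhere
    }
    // Longest names first, so a longer keyword wins over one it contains.
    $names = array_map('strval', array_keys($keywords));
    usort($names, fn($a, $b) => strlen($b) <=> strlen($a));
    $quoted = array_map(fn($n) => preg_quote($n, '~'), $names);
    // Exactly one capturing group, so preg_split() yields text, match, text, ...
    $pattern = '~\b(' . implode('|', $quoted) . ')\b~ui';

    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true); // fragments often trigger parser warnings
    $doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8"><div>'
        . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//text()[not(ancestor::a or ancestor::script or ancestor::style)]');

    foreach ($nodes as $node) {
        $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) === 1) {
            continue; // no keyword in this text node
        }
        $parent = $node->parentNode;
        foreach ($parts as $i => $part) {
            if ($i % 2 === 0) {
                // Even index: plain text between matches (may be empty).
                if ($part !== '') {
                    $parent->insertBefore($doc->createTextNode($part), $node);
                }
                continue;
            }
            // Odd index: a matched keyword; look up its canonical name and URL.
            foreach ($keywords as $name => $url) {
                if (preg_match('~^' . preg_quote((string) $name, '~') . '$~ui', $part)) {
                    $link = $doc->createElement('a');
                    $link->setAttribute('href', $url);
                    $link->setAttribute('title', (string) $name);
                    $link->appendChild($doc->createTextNode($part));
                    $parent->insertBefore($link, $node);
                    break;
                }
            }
        }
        $parent->removeChild($node); // the original text node is now replaced
    }

    // Serialize the children of the wrapper <div> back to an HTML string.
    $out = '';
    foreach ($doc->getElementsByTagName('div')->item(0)->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$keywords = [
    'Kathryn Kuhlman' => 'https://www.iusefaith.com/en-354',
    'Max KANTCHEDE'   => 'https://www.iusefaith.com/MaxKANTCHEDE',
];

// Shortened version of the question's $text ("x.jpg" is a placeholder src).
$text = 'Meet God\'s General Kathryn Kuhlman. <br>'
      . '<img title="Kathryn Kuhlman - iUseFaith.com" src="x.jpg" alt="Kathryn Kuhlman - iUseFaith.com" />'
      . '<br>Follow <a href="https://www.iusefaith.com/en-354" title="Kathryn Kuhlman">Kathryn Kuhlman</a>'
      . '<br>Max KANTCHEDE';

echo linkKeywordsInTextNodes($text, $keywords), PHP_EOL;
```

Because the result is re-serialized by libxml, expect cosmetic differences from the input, such as `<img ...>` instead of `<img ... />`.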

Thanks

