Prime 357

We'll learn something

Code - replace html entities with symbols

Submitted by Steve on Wed, 25 Jun, 2008 - 16:04
  • PHP
  • RealBasic
  • code

Following an enquiry in a forum post, I put together a very quick "pseudo code" version of how to find HTML entities in text and replace them with their corresponding symbols.

The Drupal Aggregator module has the same problem: HTML entities in post titles are not rendered. Although the following code is RealBasic rather than PHP, any skilled PHP developer should be able to translate it.

Without further ado, here's the code.

Function Convert_Html_Code_To_Symbol(pStr as string) As string
  // Version 1.04 - 2 Jun 2008.
  // Converts raw text containing named HTML entities (such as &quot; and &amp;) to their numeric equivalents (&#34; and &#38;).
  // The numeric codes are retrieved from the 'charactercodes.txt' text file in the same directory as this program.
  // Once the named entities have been changed to numeric codes, the string is run through the RegEx converter, which changes them to symbols.
 
  Dim intPos as Integer
  Dim intAmp as Integer
  Dim intSemiColon as Integer
  Dim strKey as String
  Dim strReplace as String
  Dim strDone as String
 
  While len(pStr) > 0
   
    intPos = instr(pStr, "&")

    if intPos = 0 then
      strDone = strDone + pStr
      pStr = ""
      exit while
    else
      if intPos > 1 then
        strDone = strDone + left(pStr,intPos - 1)
        pStr = right(pStr,len(pStr) - intPos +1)
      end if

      intAmp = instr(2,pstr,"&")
      intSemiColon = instr(pStr, ";")

      if intSemiColon = 0 then
        strDone = strDone + pStr
        pStr = ""
        exit while
      else
        if intAmp > 0 and intSemiColon > intAmp then
          // Another "&" comes before the next ";", so the leading "&" does not start an entity.
          // Copy everything up to that next "&" unchanged and carry on from there.
          strDone = strDone + left(pStr, intAmp - 1)
          pStr = right(pStr, len(pStr) - intAmp + 1)
        else
          // pStr starts with a candidate entity such as "&amp;".
          strKey = left(pStr, intSemiColon)

          if gDictCharCodes.HasKey(strKey) then
            // Known named entity: emit its numeric code instead.
            strReplace = gDictCharCodes.Value(strKey)
            strDone = strDone + strReplace
          else
            // Unknown, or already numeric: emit it unchanged.
            strDone = strDone + strKey
          end if

          pStr = right(pStr, len(pStr) - len(strKey))
        end if
      end if
    end if
  wend

  strDone = Convert_Html_Reg_Ex_To_Symbol(strDone)
  return strDone

End Function

The above function first ensures that every HTML entity is in numeric form (such as &#38;) rather than named form (such as &amp;).

Note that gDictCharCodes is a global Dictionary variable, which is why it is not declared in this function. It stores key/value pairs that map each named HTML entity to its corresponding numeric code.

With this out of the way, the string is passed to the next function, which replaces the numeric codes with their corresponding symbols.
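For anyone translating this to another language, here is a rough Python sketch of the same first step. It is not the RealBasic code above: CHAR_CODES is a hypothetical stand-in for gDictCharCodes, built from Python's standard html.entities table because the contents of charactercodes.txt are not shown here, and a regular expression stands in for the manual InStr scan. The function name is also made up for illustration.

```python
import re
from html.entities import name2codepoint

# Hypothetical stand-in for gDictCharCodes: maps "&amp;" -> "&#38;" and so on.
# Assumption: the real table comes from charactercodes.txt, which is not shown.
CHAR_CODES = {f"&{name};": f"&#{code};" for name, code in name2codepoint.items()}


def named_to_numeric(text):
    """Replace known named entities with numeric ones; leave anything else alone."""
    return re.sub(
        r"&[A-Za-z][A-Za-z0-9]*;",
        lambda m: CHAR_CODES.get(m.group(0), m.group(0)),
        text,
    )


print(named_to_numeric("Fish &amp; Chips &copy; 2008 &bogus;"))
# Fish &#38; Chips &#169; 2008 &bogus;
```

As in the RealBasic version, an entity that is not in the table passes through unchanged.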

Function Convert_Html_Reg_Ex_To_Symbol(pStrSource as string) As string
  Dim strReturn as string
 
  strReturn = pStrSource
 
  Dim expresion As New RegEx
  Dim expresionmatch As New RegExMatch
  expresion.SearchPattern = "\&#(\d+);"
 
  expresionmatch = expresion.Search(strReturn)
  While expresionmatch<>Nil
    expresion.ReplacementPattern = Encodings.UTF8.Chr(Val(expresionmatch.SubExpressionString(1)))
   
    strReturn = expresion.Replace(strReturn)
    expresionmatch = expresion.Search(strReturn)
   
  WEnd
 
  return strReturn
End Function

The whole process comes down to this one line of code:

  expresion.ReplacementPattern = Encodings.UTF8.Chr(Val(expresionmatch.SubExpressionString(1)))
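That line takes the digits captured by the pattern, converts them to a number, and asks for the character with that code. A minimal Python sketch of the same idea follows, assuming decimal entities only, as in the pattern above. The function name is hypothetical, and this is a translation for illustration rather than the original method.

```python
import re


def numeric_to_symbols(text):
    """Replace decimal entities such as &#34; with the character they encode."""

    def decode(match):
        code = int(match.group(1))
        # Leave out-of-range codes untouched rather than raising ValueError.
        return chr(code) if code <= 0x10FFFF else match.group(0)

    return re.sub(r"&#(\d+);", decode, text)


print(numeric_to_symbols("Fish &#38; Chips &#34;2008&#34;"))
# Fish & Chips "2008"
```

One difference worth knowing: re.sub makes a single pass over the input, so text produced by a replacement is not scanned again. The While loop above searches the whole string again after each replacement, so an input like &#38;#34; would end up decoded twice.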

I had held off publishing this code because I didn't have the time to explain it properly. Rather than delay any longer, I'm publishing it as it stands and will address questions, if any, later.
