Code - replace HTML entities with symbols
Following an enquiry in this forum post, I put together a very quick "pseudo code" version of how to find HTML entities in text and replace them with their corresponding symbols.
The Drupal Aggregator module suffers from the same problem: HTML entities in post titles are not rendered. The following code is REALbasic rather than PHP, but any skilled PHP developer should be able to translate it.
Without further ado, here's the code.
// Version 1.04 - 2 Jun 2008.
// Converts raw text containing named HTML entity codes (such as &quot; and &amp;) to their numeric equivalents (&#34; and &#38;).
// The numeric codes are read from the 'charactercodes.txt' text file in the same directory as this program.
// Once the named codes have been changed to numeric codes, the string can be run through the RegEx converter, which changes them to symbols.
// Parameter: pStr As String (the raw text). Returns the converted String.
Dim intPos As Integer
Dim intAmp As Integer
Dim intSemiColon As Integer
Dim strKey As String
Dim strReplace As String
Dim strDone As String

While Len(pStr) > 0
  intPos = InStr(pStr, "&")
  If intPos = 0 Then
    // No ampersand left: copy the remainder unchanged and stop.
    strDone = strDone + pStr
    pStr = ""
    Exit While
  Else
    If intPos > 1 Then
      // Copy everything before the ampersand, so pStr now starts with "&".
      strDone = strDone + Left(pStr, intPos - 1)
      pStr = Right(pStr, Len(pStr) - intPos + 1)
    End If
    // Position of the next ampersand after the leading one, if any.
    intAmp = InStr(2, pStr, "&")
    intSemiColon = InStr(pStr, ";")
    If intSemiColon = 0 Then
      // No semicolon left, so no complete entity remains.
      strDone = strDone + pStr
      pStr = ""
      Exit While
    ElseIf intAmp > 0 And intSemiColon > intAmp Then
      // Another "&" comes before the ";", so the leading "&" is not an entity.
      // Copy up to the next "&" and carry on from there.
      strDone = strDone + Left(pStr, intAmp - 1)
      pStr = Right(pStr, Len(pStr) - intAmp + 1)
    Else
      // Candidate entity: everything from "&" through ";" inclusive.
      strKey = Left(pStr, intSemiColon)
      If gDictCharCodes.HasKey(strKey) Then
        strReplace = gDictCharCodes.Value(strKey)
        strDone = strDone + strReplace
      Else
        // Not in the dictionary (for example, already numeric): keep it as is.
        strDone = strDone + strKey
      End If
      pStr = Right(pStr, Len(pStr) - Len(strKey))
    End If
  End If
Wend
// Second pass: turn the numeric codes into symbols.
strDone = Convert_Html_Reg_Ex_To_Symbol(strDone)
Return strDone
End Function
The above function first ensures that all HTML entities are in numeric form (such as &#34;) rather than named form (such as &quot;).
Note that gDictCharCodes is a global dictionary variable, which is why it is not declared in this function. It stores key/value pairs that map each named HTML entity code to its corresponding numeric code.
With this out of the way, the string is passed through the next function, which actually replaces the numeric codes with their corresponding symbols.
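For readers who want to port the first pass, here is a minimal Python sketch of the same idea. It is not the original REALbasic code: the table `CHAR_CODES` is a small hypothetical stand-in for gDictCharCodes (the real table comes from 'charactercodes.txt', whose format is not shown in this post), and the names `ENTITY` and `named_to_numeric` are made up for illustration. The regular expression mirrors the scanning rule above: an ampersand only counts as the start of an entity if a semicolon follows before any other ampersand.

```python
import re

# Hypothetical stand-in for gDictCharCodes: named entity -> numeric entity.
# The real table is loaded from 'charactercodes.txt' (format not shown in the post).
CHAR_CODES = {
    "&quot;": "&#34;",
    "&amp;": "&#38;",
    "&lt;": "&#60;",
    "&gt;": "&#62;",
    "&copy;": "&#169;",
}

# An ampersand, then any characters that are neither "&" nor ";", then a semicolon.
ENTITY = re.compile(r"&[^&;]*;")


def named_to_numeric(text, table=CHAR_CODES):
    """Replace each known named entity with its numeric form; leave the rest alone."""
    # Unknown keys (including entities that are already numeric) are returned unchanged.
    return ENTITY.sub(lambda m: table.get(m.group(0), m.group(0)), text)


if __name__ == "__main__":
    print(named_to_numeric("Tom &amp; Jerry &quot;live&quot;"))
```

As in the REALbasic version, anything that looks like an entity but is not in the table passes through untouched, so stray ampersands such as the one in "R&D" are safe.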
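// Convert_Html_Reg_Ex_To_Symbol. Parameter: pStrSource As String. Returns the converted String.
Dim strReturn As String
strReturn = pStrSource
Dim expresion As New RegEx
Dim expresionmatch As New RegExMatch
// Match a decimal entity such as &#34; and capture its digits.
expresion.SearchPattern = "\&#(\d+);"
expresionmatch = expresion.Search(strReturn)
While expresionmatch <> Nil
  // Replace the matched entity with the character for the captured number.
  expresion.ReplacementPattern = Encodings.UTF8.Chr(Val(expresionmatch.SubExpressionString(1)))
  strReturn = expresion.Replace(strReturn)
  expresionmatch = expresion.Search(strReturn)
Wend
Return strReturn
End Function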
The whole process comes down to this particular line of code:
expresion.ReplacementPattern = Encodings.UTF8.Chr(Val(expresionmatch.SubExpressionString(1)))
I had not published this code until now because I lacked the time to explain it properly. Rather than delay any further, I am publishing it as is and will address any questions later.
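In short, that line takes the digits captured by the search pattern, converts them to a number with Val, and asks Encodings.UTF8.Chr for the character with that code. A rough Python equivalent of the second function might look like the sketch below. It is a translation under assumptions, not the original code: the names `NUMERIC_ENTITY` and `numeric_to_symbol` are made up, and it makes a single pass over the text instead of searching again from the start after every replacement.

```python
import re

# Same pattern as the REALbasic SearchPattern: "&#", one or more digits, ";".
NUMERIC_ENTITY = re.compile(r"&#(\d+);")


def numeric_to_symbol(text):
    """Replace every decimal entity such as &#34; with the character it encodes."""

    def decode(match):
        code_point = int(match.group(1))
        # Values beyond the Unicode range are left untouched instead of raising an error.
        if code_point > 0x10FFFF:
            return match.group(0)
        return chr(code_point)

    # re.sub walks the text once and never rescans text it has already replaced.
    return NUMERIC_ENTITY.sub(decode, text)


if __name__ == "__main__":
    print(numeric_to_symbol("&#34;Caf&#233;&#34; &#38; bar"))
```

The single pass is a deliberate design choice. A loop that searches again from the start after each replacement would decode input such as &#38;#34; twice (first to &#34;, then to a quotation mark), whereas one pass leaves it as &#34;. Python's standard library also offers html.unescape, which handles named, decimal and hexadecimal entities in one call, so the hand-rolled version is mainly useful as a guide for ports to other languages.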

