remove_corrupt_utf8() not working #41
Comments
Sure, thanks. Looks OK. Note that

So in 0.5 I had to adapt further, due to
Further, I'm not sure if the index stepping with

    function remove_corrupt_utf8(s::AbstractString)
        r = zeros(UInt8, endof(s)+1)
        i = 1
        for chr in s
            try
                r[i] = (UInt8(chr) != 0xfffd) ? chr : ' '
            catch
                r[i] = ' '
            end
            i = nextind(s,i)
        end
        return Compat.UTF8String(r)
    end

Seems reasonable?

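The index-stepping concern can be illustrated in Julia 1.x, where endof is called lastindex. String indices are byte offsets, so nextind jumps over the extra bytes of a multibyte character, while the loop above writes a single byte at each of those offsets and leaves the skipped positions as zero bytes. The helper name char_start_indices is hypothetical; it only collects the indices that nextind visits.

```julia
# Hypothetical helper: collect the byte index at which each character
# of `s` starts, stepping with nextind the same way the loop above does.
function char_start_indices(s::AbstractString)
    idx = Int[]
    i = firstindex(s)
    while i <= lastindex(s)
        push!(idx, i)
        i = nextind(s, i)
    end
    return idx
end

s = "\u00e9a"                     # 'é' takes 2 bytes in UTF-8, 'a' takes 1
println(length(s))               # 2 characters
println(ncodeunits(s))           # 3 bytes
println(char_start_indices(s))   # [1, 3]: byte offset 2 is skipped
```

In Julia 1.x, collect(eachindex(s)) returns the same indices.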
Not all Unicode characters fit in a UInt8. The line above will lose all non-ASCII characters from the string, I think. I'd use something like this:

    function remove_corrupt_utf8(s::AbstractString)
        r = IOBuffer()
        for chr in s
            # keep every character except U+FFFD; U+FFFD is dropped
            if chr != Char(0xfffd)
                write(r, chr)
            end
        end
        return takebuf_string(r)
    end

Are there any tests for this?

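For comparison, the same IOBuffer approach might look like this in Julia 1.x. takebuf_string no longer exists there, so String(take!(io)) is used instead; and because invalid UTF-8 bytes are iterated as invalid Char values (for which isvalid returns false) rather than as U+FFFD, the check uses isvalid. This is a sketch under those assumptions, and the name drop_corrupt_utf8 is hypothetical. Like the version above, it drops invalid characters instead of replacing them with a space.

```julia
# Hypothetical Julia 1.x sketch of the IOBuffer approach:
# invalid characters are dropped, valid ones are copied through.
function drop_corrupt_utf8(s::AbstractString)
    io = IOBuffer()
    for c in s
        # invalid bytes show up as Chars for which isvalid(c) is false
        isvalid(c) && write(io, c)
    end
    return String(take!(io))
end

println(drop_corrupt_utf8("ab\xffc"))   # the stray 0xff byte is removed
```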
Are there any updates or resolutions on this?

This should work with Julia 1.0 and later, using an implementation like:

    function remove_corrupt_utf8(s::AbstractString)
        # replace every invalid character with a space
        return map(x -> isvalid(x) ? x : ' ', s)
    end

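The question about tests was not answered in the thread. A minimal sketch using Julia's Test standard library against the Julia 1.0+ implementation might look like the following; the test strings are illustrative examples, not taken from the package.

```julia
using Test

# The Julia >= 1.0 implementation from the comment above.
function remove_corrupt_utf8(s::AbstractString)
    return map(x -> isvalid(x) ? x : ' ', s)
end

@testset "remove_corrupt_utf8" begin
    @test remove_corrupt_utf8("hello") == "hello"              # ASCII is unchanged
    @test remove_corrupt_utf8("h\u00e9llo") == "h\u00e9llo"    # valid non-ASCII is kept
    @test remove_corrupt_utf8("ab\xffc") == "ab c"             # stray byte becomes ' '
    @test remove_corrupt_utf8("\xc3\x28") == " ("              # truncated sequence
    @test isvalid(remove_corrupt_utf8("\xc3\x28"))             # result is valid UTF-8
end
```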
The function remove_corrupt_utf8() does not work under Julia v0.4.6. The problem is the line

    zeros(Char, endof(s)+1)

where it complains that zero is not defined for type Char. When using UInt8 instead, I could make it run without error, but please check whether it does what it is supposed to do.
Note that in the return statement I also got rid of CharString(). If this is OK, I can make another pull request.
Cheers,
Andre
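Since zeros(Char, n) failed here because zero was not defined for Char, one alternative is to preallocate the Char buffer with fill. A minimal Julia 1.x sketch, assuming the goal is to replace invalid characters with a space as in the later comments (replace_invalid_chars is a hypothetical name): filling with ' ' avoids needing zero(Char), and enumerate counts characters rather than byte indices, so no nextind stepping is involved.

```julia
# Hypothetical Julia 1.x sketch: preallocate one Char slot per character
# with fill instead of zeros(Char, n), then copy valid characters over.
function replace_invalid_chars(s::AbstractString)
    buf = fill(' ', length(s))      # every slot starts as a space
    for (k, c) in enumerate(s)      # k is the character count, not a byte index
        if isvalid(c)
            buf[k] = c
        end
    end
    return String(buf)              # encodes the Vector{Char} as UTF-8
end

println(replace_invalid_chars("ab\xffc"))   # the invalid byte becomes a space
```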