Sunday, February 18, 2007

Unicode

Unicode is a standard character encoding designed to replace older encodings such as ASCII. Characters are stored using one of the Unicode transformation formats described below.

ASCII represents English letters, digits and some punctuation marks by giving each printable character a code between 32 and 126. The remaining codes (0 to 31 and 127) are unprintable control characters reserved for special use, e.g. '\b', '\n', '\a', '\v', '\r'.


ASCII character set

ASCII can be represented in 7 bits, but for convenience it is stored in 8 bits.


Codes from 128 to 255 were free to be used for any purpose. The IBM PC had what was called the OEM character set, which provides the 128 ASCII characters plus some drawing characters such as horizontal and vertical bars.



OEM character set

Here the trouble began. Everyone could use the codes from 128 to 255 as they wished, typically to support a regional language. The problem became worse with Asian languages, which need far more than 128 extra characters.


Unicode provides a unique number for every character,
no matter what the platform,
no matter what the program,
no matter what the language.


Originally Unicode was designed to provide a single character set that contains every character in use on Earth. Every character is assigned a value, written as U+XXXX in hexadecimal. So A is assigned U+0041.


There are some design principles:

  • The Unicode Standard is concerned with characters, not glyphs. Glyphs are the shapes that appear on screen or in a printed document.
    A character can have more than one glyph.
    e.g.: A, A and A are different glyphs of the letter A, while Ain (U+0639) (ﻋ ﻊ ) has 4 forms (isolated, initial, medial, final). Drawing these glyphs is the responsibility of the text renderer; the Unicode Standard is concerned only with how characters are represented in memory or other media.

  • Plain text is a sequence of character codes while rich text or fancy text contains additional information such as text color, size, font…
    Unicode standard encodes only plain text.

  • Unicode text is stored in logical order. When different languages are mixed, Unicode still preserves the logical order.
    e.g.: mixed English and Arabic text is stored in logical order, while rendering follows the direction of each language.
    A related issue is combining marks (accents in some Latin scripts and vowel marks in Arabic (التشكيل)): they follow their base characters in storage, while in rendering they do not appear linearly.

  • Unicode provides a unification concept, that is, each character has a unique code. Common letters and punctuation marks are given a single code regardless of the language.

  • Conversion between Unicode and other standards must be guaranteed.
    In general a single code in another standard maps to a single code in Unicode.


These design principles are not always fully satisfied by Unicode in practice. They are guiding principles only.


Encoding forms for Unicode:


There are two major families of Unicode encodings:

UTF (Unicode Transformation Format) and UCS (Universal Character Set). The major difference is that the common UTF forms (UTF-8 and UTF-16) are variable-length encodings, while UCS-2 is a fixed-length encoding.


UCS-2

A fixed-length encoding that uses 2 bytes per character, sometimes called plain Unicode. It can only represent code points up to U+FFFF.

Let's encode Hello world!

U+0048 U+0065 U+006C U+006C U+006F U+0020 U+0077 U+006F U+0072 U+006C U+0064 U+0021.

This can be stored as:

Little-Endian: 4800 6500 6C00 6C00 6F00 2000 7700 6F00 7200 6C00 6400 2100.

Or Big-Endian: 0048 0065 006C 006C 006F 0020 0077 006F 0072 006C 0064 0021.

Little-Endian is more common on PCs, because x86 processors are little-endian.
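The two byte orders can be sketched in code. This is only an illustration: ucs2_bytes is a hypothetical helper name, and it assumes every character fits in a single 16-bit unit, as UCS-2 requires.

```cpp
#include <string>
#include <vector>

// Serialize 16-bit code units to bytes in the requested byte order.
// ucs2_bytes is a hypothetical helper, not part of any library.
std::vector<unsigned char> ucs2_bytes(const std::u16string& text,
                                      bool little_endian) {
    std::vector<unsigned char> bytes;
    for (char16_t unit : text) {
        unsigned char low = static_cast<unsigned char>(unit & 0xFF);
        unsigned char high = static_cast<unsigned char>(unit >> 8);
        if (little_endian) {
            bytes.push_back(low);   // least significant byte first
            bytes.push_back(high);
        } else {
            bytes.push_back(high);  // most significant byte first
            bytes.push_back(low);
        }
    }
    return bytes;
}
```

Calling ucs2_bytes(u"H", true) yields the bytes 48 00, while ucs2_bytes(u"H", false) yields 00 48.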


UTF-8:


  • Uses from one to four bytes. All ASCII symbols require one byte.

  • Two bytes for Greek, Armenian, Arabic, Hebrew…


Range            Encoding

000000–00007F    ASCII characters; the single byte begins with a 0 bit (coded as in the original ASCII encoding).
000080–0007FF    First byte begins with the bits 110; the following byte begins with 10.
000800–00FFFF    First byte begins with the bits 1110; the two following bytes begin with 10.
010000–10FFFF    First byte begins with the bits 11110; the three following bytes begin with 10.


E.g.: the value 0x3D6 (binary 01111010110) is encoded in 2 bytes as 11001111 10010110 (0xCF 0x96).
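The UTF-8 ranges above translate almost directly into code. The following is a minimal sketch, not a production encoder; encode_utf8 is a hypothetical name, and returning an empty string for invalid input is an assumption made for brevity.

```cpp
#include <string>

// Encode one code point as UTF-8 (hypothetical helper).
// Returns an empty string for surrogates and values above U+10FFFF.
std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp <= 0x7F) {                         // 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp <= 0x7FF) {                 // 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0xFFFF) {                // 1110xxxx 10xxxxxx 10xxxxxx
        if (cp >= 0xD800 && cp <= 0xDFFF) return out;  // surrogates
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {              // 11110xxx + three 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For the value 0x3D6 this produces the two bytes 0xCF 0x96, matching the bit pattern worked out by hand.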


UTF-16:

  • Uses 2-bytes or 4-bytes encoding.

  • More complex than UTF-8, designed mainly to extend the range of UCS-2.

  • For encoding: all codes above U+FFFF must be encoded in two 16-bit words (a surrogate pair).
    E.g. U+23458 :

    • First subtract 0x10000, giving 0x13458.

    • Divide the resulting 20-bit value into two 10-bit halves: 0001001101, 0001011000.

    • Initialize the first 6 bits of the first word with 110110, then append the higher half: 1101100001001101 or 0xD84D.

    • Initialize the first 6 bits of the second word with 110111, then append the lower half: 1101110001011000 or 0xDC58.

    • Finally the encoding is 0xD84D 0xDC58.

  • For safety, values from U+D800 to U+DBFF and from U+DC00 to U+DFFF are not available to represent characters in the 2-byte form; they are reserved for the pairs that represent values higher than 0xFFFF.
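The surrogate steps can be sketched as a small function. to_surrogates is a hypothetical name, and the sketch assumes the caller passes a code point between U+10000 and U+10FFFF.

```cpp
#include <cstdint>
#include <utility>

// Split a code point above U+FFFF into a UTF-16 surrogate pair.
// Assumes 0x10000 <= cp <= 0x10FFFF (not checked here).
std::pair<std::uint16_t, std::uint16_t> to_surrogates(std::uint32_t cp) {
    std::uint32_t v = cp - 0x10000;  // 20-bit value
    // 110110 followed by the upper 10 bits
    std::uint16_t high = static_cast<std::uint16_t>(0xD800 | (v >> 10));
    // 110111 followed by the lower 10 bits
    std::uint16_t low = static_cast<std::uint16_t>(0xDC00 | (v & 0x3FF));
    return std::make_pair(high, low);
}
```

For U+23458 the function returns the pair 0xD84D, 0xDC58, the same result as the manual steps.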


Java supports UTF-8 in normal programming through InputStreamReader and OutputStreamWriter.


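To create a string literal in C you write char str[] = "Hello world!"; which is stored in the compiler's narrow character set (usually ASCII-compatible). To create a wide-character literal use wchar_t str[] = L"Hello world!"; the encoding of wchar_t is implementation-defined (16-bit units on Windows, typically 32-bit units on Linux).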


Here are some values from Unicode:

A A A            Letter A                   U+0041
AB               Letter A + Letter B        U+0041 + U+0042
a                Small A                    U+0061
ﻫ ﻪ ﻬ ﻫ ﻪ ﻬ      Letter Heh                 U+0647
َُ                Damma ضمه                  U+064F
نُ               Noon + Damma نون + ضمه     U+0646 + U+064F


The Unicode Consortium is a non-profit organization that is responsible for Unicode's development.


Try playing with Unicode encodings: create a simple HTML file, perhaps with a hex editor, write the bytes 0048 0065 006C 006C 006F 0020 0077 006F 0072 006C 0064, then view the file in a browser while changing the browser's character encoding.


Finally, this article turned out longer than I expected, and I still couldn't cover everything I wanted. You can read The Unicode Standard book, versions 3.0, 4.0 and 5.0.


You can refer to www.unicode.org for more info.


Monday, August 07, 2006

Smart pointers

Smart pointers are template classes that store pointers to objects. They provide the same functionality as built-in pointers, plus some smarter operations.

First let us look at the drawbacks of ordinary pointers:

  1. No automatic deletion.

  2. More than one reference.

Class1* x = new Class1() ;

Class1* y = x ;

Now who will delete the Class1 object, x or y?

  3. No garbage collection.

  4. The assignment operator is not suitable: it copies only the address.


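#include <cstddef>

template <class T>
class SmartPointer
{
public:
    // Takes ownership of pointer; defaults to null.
    explicit SmartPointer(T* pointer = NULL) : ptr(pointer) {}
    // Transfers ownership from other to this object (defined below).
    SmartPointer& operator=(SmartPointer& other);
    // Frees the owned object.
    ~SmartPointer() { delete ptr; }
    T& operator*() const
    {
        return *ptr;
    }
    T* operator->() const
    {
        return ptr;
    }
private:
    T* ptr;
};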


Smart pointers support the same operations as ordinary pointers, such as:

  • T& operator*() const

  • T* operator->() const

Other smart capabilities can be implemented, such as:

  • Auto initialize pointer to null.

  • Destructor frees allocated memory.

  • Smart assignment

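template <class T>
SmartPointer<T>& SmartPointer<T>::operator=(SmartPointer<T>& sptr)
{
    // Guard against self-assignment.
    if (this != &sptr) {
        delete ptr;        // free the object we currently own
        ptr = sptr.ptr;    // take over the other object
        sptr.ptr = NULL;   // the source no longer owns it
    }
    return *this;
}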

Now only the newly assigned smart pointer owns the object; the source pointer has been set to null, so the object cannot be deleted twice.
Other policies may be implemented as needed, such as allocating new data and copying it.

  • Garbage collection (C++ doesn't have garbage collection, so smart pointers can be used for that purpose, typically through reference counting).

Smart pointers in STL

  • auto_ptr is an example of a smart pointer (it was deprecated in C++11 in favor of unique_ptr).
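For comparison, here is a small sketch of the same ownership transfer using std::unique_ptr from C++11 and later. It is not the class developed in this post; Class1 and take_over are placeholder names chosen for illustration.

```cpp
#include <memory>
#include <utility>

// Placeholder class for the example.
struct Class1 {
    int value = 100;
};

// Moves ownership out of `from` into a new owner and returns it.
// After the call `from` holds a null pointer.
std::unique_ptr<Class1> take_over(std::unique_ptr<Class1>& from) {
    std::unique_ptr<Class1> to;
    to = std::move(from);  // explicit transfer; plain copying is forbidden
    return to;
}
```

Unlike the assignment operator above, std::unique_ptr makes the transfer explicit through std::move, so ownership never changes hands by accident, and the object is deleted automatically when its last owner goes out of scope.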

Sunday, July 30, 2006

Directories in c

This tutorial covers directories in C.

Before proceeding, include the following file:

dirent.h:
Contains the definition of the DIR structure, which plays a role similar to the FILE structure, as you will see later.
To open a directory use
DIR* dir = opendir(".");
"." means the current directory.
".." means the parent directory.
Anything else is a subdirectory or a complete path, e.g. "e:\\" (remember why the backslash is doubled? A single backslash starts an escape sequence in a C string).

Now the directory is open. How do we loop over the directories and files inside it?
struct dirent* dent = readdir(dir) ;
This reads the next entry in the opened directory. The entry may be a file or a directory. readdir returns NULL when there are no more entries.

Finally, to close the opened directory:
closedir(dir);

The following code is self-explanatory.

/*
file : dir.c
A tutorial on opening directories in C.
*/

#include <stdio.h>
#include <dirent.h>

int main(int argc, char **argv)
{
    /* A directory entry pointer. */
    struct dirent* dent;

    /* Open the current directory ("." means the current directory). */
    DIR* dir = opendir(".");

    printf("\n***** DIR LISTING *****\n\n");

    /*
    If dir is null then the open failed, maybe because the directory
    is not present or access is denied.
    */
    if (dir)
    {
        /*
        readdir reads the next directory entry. It returns a struct dirent*
        whose d_name member holds the entry name (a file or a directory),
        and returns null after the last entry.
        */
        while ((dent = readdir(dir)))
        {
            FILE* fptr;

            /* Print through "%s" so a '%' in a name is not read as a format. */
            printf("%s", dent->d_name);

            /*
            Try to open the entry as a file in read mode. If it opens, treat
            it as a file, else as a directory. This heuristic is platform
            dependent: on some systems fopen also succeeds on directories.
            */
            if ((fptr = fopen(dent->d_name, "r")))
            {
                printf("\t\tFile");
                fclose(fptr);
            }
            else
                printf("\t\tDirectory");

            printf("\n");
        }

        /* Close the opened directory. */
        closedir(dir);
    }
    else
        printf("Err. opening directory\n");

    printf("\n");
    getchar();
    return 0;
}
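The fopen test in the listing is only a heuristic. On POSIX systems a more reliable check is stat(), sketched below; is_directory is a hypothetical helper name, and the sketch assumes <sys/stat.h> is available (it is not part of standard C).

```cpp
#include <sys/stat.h>

// Returns true when path names an existing directory (POSIX only).
bool is_directory(const char* path) {
    struct stat info;
    if (stat(path, &info) != 0) {
        return false;  // path does not exist or cannot be examined
    }
    return S_ISDIR(info.st_mode);
}
```

Inside the readdir loop, is_directory(dent->d_name) could replace the fopen call, as long as the listed directory is the current one so that the relative names resolve correctly.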

Sunday, July 09, 2006

Lvalues and rvalues


Have you ever seen this error?
main.cc:12: non-lvalue in assignment
or any other error involving lvalues?

What are lvalues?
Lvalues are often defined as expressions referring to objects that can appear on the left side of an assignment. This definition is not accurate, because const objects are lvalues although they cannot appear on the left side of an assignment. A better definition is "expressions that refer to objects". Such objects must have memory locations; in other words, lvalues have addresses.

While rvalues:
Expressions that can appear only on the right side of an assignment. Rvalues cannot appear on the left side of an assignment.

Lvalues and rvalues:
An lvalue can appear where an rvalue is required; it is then converted to an rvalue. Rvalues cannot be converted to lvalues. Therefore every lvalue expression can be used where an rvalue is expected, but not vice versa.
These operators must be applied to lvalues:
1. & (address-of)
2. ++ --
3. = += -= *= /= %= <<= >>= &= ^= |= (left operand)

Examples of lvalues and rvalues:
1. x = 5; // x is an lvalue; 5 is an rvalue
2. array[10] = 3; // array[10] is an lvalue while 3 is an rvalue
3. X = Y + 5 ; // X is an lvalue; Y + 5 is an rvalue
4. A call to a function that returns a reference is an lvalue; any other function call is an rvalue expression.
5. *ptr = 10; // *ptr is an lvalue; 10 is an rvalue
6. x = y ; // x is an lvalue; y is an lvalue converted to an rvalue.
7. A cast to a reference type is an lvalue; a cast to a non-reference type is an rvalue.
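The function-call rule can be illustrated with a short sketch. The names counter, counter_ref, counter_copy and demo are made up for this example.

```cpp
int counter = 0;

// Returns a reference, so a call to it is an lvalue.
int& counter_ref() { return counter; }

// Returns by value, so a call to it is an rvalue.
int counter_copy() { return counter; }

void demo() {
    counter_ref() = 5;           // OK: assigns to counter
    int* p = &counter_ref();     // OK: address-of needs an lvalue
    *p += 1;                     // counter is now 6
    // counter_copy() = 5;       // error: rvalue on the left side
    // int* q = &counter_copy(); // error: cannot take the address of an rvalue
}
```

Uncommenting either of the last two lines makes the compiler reject the program with an lvalue-related error like the one quoted at the top of this post.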

Enumerations and const data:
Neither const data nor enumerators can appear on the left side of an assignment. Enumerators aren't lvalues, while const data are lvalues. Enumerators don't have addresses: they are translated at compile time to their numeric values, while const data members like the one below are initialized at run time.

class Class1{
public:
Class1( int x=100 ):MAX(x){}

const int MAX ;
};

Here MAX is initialized with x. MAX is const and has an address. Every instance of Class1 has its own const member MAX, and each instance may have a different value. You may use static const data if the value is common to all instances.

class Class2{

public:

enum { MAX=100 } ;

};

Notice that an enum cannot be declared static: enumerators aren't lvalues and aren't constructed at run time.

The statement y+5 = 10 generates the error shown at the top,
because y+5 is an rvalue placed where an lvalue is required.

A final note: #define statements differ from enumerations and const data because they are macros, substituted by the preprocessor before compilation.

Sunday, July 02, 2006

Metaprogramming




The phrase "programming a program" captures the idea of metaprogramming: writing a program that writes or maintains other programs. This may seem confusing at first: you write code that generates the code which is actually needed. In this article, metaprogramming means generating something (actually code) at compile time rather than at run time.
Metaprogramming is very language dependent.

Article Contents:

1. Uses of metaprogramming.
2. Benefits of metaprogramming.
3. Drawbacks of metaprogramming.
4. Examples of metaprogramming.
5. Starter example of metaprogramming.
6. Other template metaprogramming example.
7. Using metaprogramming in numerical analysis.
8. Unrolling loops.

Uses of metaprogramming:

1. Generation of lookup tables.
2. Small functions that are parametrized and used all over the program can be generated by metaprogramming.
3. Loop unrolling. Unrolled loops are often more efficient than rolled loops.
4. Other uses you can imagine.

Benefits of metaprogramming:

1. Minimizes code.
2. Achieves more functionality.
3. Requires less effort from programmers and reduces maintenance effort.
4. Meta code is evaluated at compile time, which results in faster programs.

Drawbacks of metaprogramming:

1. Very hard to debug.
2. Hard to trace.
3. Ambiguous to some extent.

Examples of metaprogramming:

1. C/C++ macro processor.
2. C++ templates.
3. M4 processor.

Starter example of metaprogramming:

This small macro can be considered metaprogramming:
#define swap(x,y,type) { type temp_ = x ; x = y ; y = temp_ ; }
int x=4 , y=7 ;
swap(x,y,int) ;

Other template metaprogramming example:

This template class has an enumeration whose value is 3 ^ n, where n is the template parameter.
Note that template parameters can be ordinary values such as integers.

// A template class to compute 3 to the power n
template <int N>
class Pow3 {
public:
enum { result=3*Pow3<N-1>::result };
};

// Base condition to end the recursion
template<>
class Pow3<0> {
public:
enum { result = 1 };
};

The line:
enum { result=3*Pow3<N-1>::result };
sets result to 3 times the result of Pow3<N-1>, and so on recursively until it reaches Pow3<0>.
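Since C++11 the same compile-time computation can also be written as a constexpr function. This is a different technique from the template recursion described here, shown only for comparison; the name pow3 is made up for the example.

```cpp
// Computes 3 to the power n; usable in constant expressions (C++11).
constexpr int pow3(int n) {
    return n == 0 ? 1 : 3 * pow3(n - 1);
}

// Evaluated by the compiler: the build fails if the value is wrong.
static_assert(pow3(4) == 81, "3^4 should be 81");
```

The static_assert shows that the value is known at compile time, just as the enum value is in the template version.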

Using metaprogramming in numerical analysis:
Bisection method:

Solve the following equation:
x*x*x +x*x - 6*x = 0

// A template to solve the equation.
template <int low, int high>
class Solve
{

public:

// compute the midpoint
enum { mid = (low+high+1)/2 };

// search in the halved interval
// equation is mid*mid*mid +mid*mid - 6*mid
enum { result = ( (mid*mid*mid +mid*mid - 6*mid) > 0 ) ?
Solve<low,mid-1>::result : Solve<mid,high>::result };

};

// partial specialization that ends the recursion when low == high
template <int N>
class Solve<N,N>
{
public:
enum { result=N};
};


#include <iostream>

int main(void)
{
// the interval 1..10 is an example choice
std::cout << "Sol of the equation is " << Solve<1,10>::result;
return 0 ;
}

The enumerator mid holds the midpoint for the bisection method:
mid = (low+high+1)/2
while result holds the result:
result = ( (mid*mid*mid +mid*mid - 6*mid) > 0 ) ? Solve<low,mid-1>::result : Solve<mid,high>::result
This code can't find the zero solution. Do you know why?
Try using the interval 0,1000: this results in a longer compilation time but the same run time.

Unrolling loops:

#include <iostream>

// primary template
template <int N>
class ADD {
public:
static int result (int* a) {
return *a + ADD<N-1>::result(a+1);
}
};

// full specialization as end criterion
template <>
class ADD<1> {
public:
static int result (int* a) {
return *a ;
}
};

int main(void)
{
int a[] = { 1,3,5 } ;
std::cout << "Adding 3 elements: " << ADD<3>::result(a) ;
return 0 ;
}

Here we used ADD template class to unroll the addition.
Unrolling generally reduces the run time but leads to bigger output file.

Conclusion:

Metaprogramming enhances program efficiency by computing some things at compile time rather than at run time. It also reduces source code size and increases its functionality.

Links:

http://en.wikipedia.org/wiki/Metaprogramming
http://en.wikipedia.org/wiki/Template_metaprogramming
http://boost-consulting.com/mplbook/
http://www-128.ibm.com/developerworks/linux/library/l-metaprog1.html
http://www-128.ibm.com/developerworks/linux/library/l-metaprog2.html
http://www-128.ibm.com/developerworks/linux/library/l-metaprog3/?ca=dgr-wikiaMetaprogP3

Friday, June 16, 2006

Q & A Bitwise operations in C/C++


Bitwise operations are among the fastest operations a machine performs.
Using macros eliminates the time cost of a function call.

Bitwise operations:


& Bitwise AND
| Bitwise OR
^ XOR
~ One's complement (flips bits)
<< Left shift
>> Right shift

Bitwise Q & A:


#define UNITY 0x00000001

Q: I want to raise 2 to the power x (2^x)?
A:
#define TWOPWR( x ) ( UNITY << (x) )
int z = TWOPWR(x) ;

Q: I want to get the bit at position pos from x?

A:
#define GETBIT(x,pos) ( ((x) & ( UNITY << (pos) ))!=0 )
unsigned x = 25 ;
bool bit = GETBIT(x,5) ;

Q: I want to set the bit at position pos in x to true?

A:
#define SETBIT(x,pos) ( (x) | ( UNITY << (pos) ) )
unsigned x = 25 ;
x = SETBIT(x,10) ;

Q: I want to reset the bit at position pos in x to false?

A:
#define RESETBIT(x,pos) ( (x) & ~( UNITY << (pos) ) )
unsigned x = 25 ;
x = RESETBIT(x,0) ;

Q: I want to check whether x is a power of two?

A:
#define ISPWRTWO(x) (!((x) & ((x) - 1)))
bool bit = ISPWRTWO(5) ;

Q: I want to swap 2 variables a, b without extra storage?
A:
#define SWAP(x, y) (((x) ^= (y)), ((y) ^= (x)), ((x) ^= (y)))
int d = 8 , e = 9 ;
SWAP(d,e) ;
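The bit macros can also be written as inline functions, which a compiler can usually expand in place just like a macro while still checking argument types. The function names below are made up for this sketch. Note one deliberate difference: is_power_of_two rejects 0, whereas the ISPWRTWO macro reports true for 0.

```cpp
// Hypothetical function equivalents of the bit macros.
inline bool get_bit(unsigned x, unsigned pos) {
    return (x & (1u << pos)) != 0;
}

inline unsigned set_bit(unsigned x, unsigned pos) {
    return x | (1u << pos);
}

inline unsigned reset_bit(unsigned x, unsigned pos) {
    return x & ~(1u << pos);
}

// A power of two has exactly one bit set; 0 is excluded explicitly.
inline bool is_power_of_two(unsigned x) {
    return x != 0 && (x & (x - 1)) == 0;
}
```

With x = 25 (binary 11001), get_bit(x, 5) is false, set_bit(x, 10) gives 1049 and reset_bit(x, 0) gives 24.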

Q: In my assembler I want to set n, i, x, ….?
A: Just use the SETBIT macro with bit 6
#define NPOSITION 6
#define SETBIT(x,pos) ( (x) | ( UNITY << (pos) ) )
unsigned x = 25 ;
x = SETBIT(x, NPOSITION);
Other faster idea is to compute UNITY << …

Q: I want to multiply an integer by 640?
A: Multiplying an integer by a constant can be rewritten with shift and addition operations.
#define MUL640(x) ( ((x)<<7) + ((x)<<9) )
int x = MUL640(3) ;
We used 3 operations to multiply x by 640. On older processors they can be faster than operator *; modern compilers usually perform this optimization themselves.
x = x *(128 + 512) ;
x = x * 128 + x * 512 ;
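The decomposition 640 = 128 + 512 = 2^7 + 2^9 can be checked with a small sketch; mul640 is a hypothetical function version of the macro, using unsigned arithmetic to keep the shifts well defined.

```cpp
// Multiply by 640 using two shifts and one addition:
// x * 640 = x * 128 + x * 512 = (x << 7) + (x << 9)
inline unsigned mul640(unsigned x) {
    return (x << 7) + (x << 9);
}
```

For example, mul640(3) equals 3 * 640 = 1920.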