

3. String storage

ASCII is a widely used standard for storing characters and groups of characters, i.e. strings.

It is a 1 byte per character system. The string:

                    MyString = "hello"

is five characters long, so it needs 5 bytes of storage.
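The byte count can be checked with a short Python sketch. The variable names here are only illustrative; `str.encode` is the standard way to turn a string into bytes.

```python
# Encode the string as ASCII and count the bytes it occupies.
my_string = "hello"
ascii_bytes = my_string.encode("ascii")

print(len(my_string))     # number of characters
print(len(ascii_bytes))   # number of bytes: one per character
print(list(ascii_bytes))  # the denary ASCII code of each character
```

Each character maps to exactly one byte, so the character count and the byte count match.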

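ASCII is used for standard 'Latin' characters (this includes English). Standard ASCII is a 7-bit code using denary values 0-127. Extended 8-bit versions use the full range of a byte, 0-255. Between them they can store:

  • upper case letters
  • lower case letters
  • some control characters
  • symbols
  • a few accented characters (in the extended versions only).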

However, the world speaks many languages which use non-Latin characters such as Ў (a Cyrillic letter used in Belarusian). To handle this, Unicode was developed. One common Unicode encoding, UTF-16, uses 2 bytes for most characters and 4 bytes for the rest.
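As a rough illustration, the Python sketch below tries to store Ў in ASCII and then in UTF-16. The names `letter` and `fits_in_ascii` are made up for this example, and the `utf-16-le` codec is chosen so that no byte order mark is added to the count.

```python
letter = "Ў"  # a Cyrillic letter that has no ASCII code

# ASCII has no code for this letter, so encoding it raises an error.
try:
    letter.encode("ascii")
    fits_in_ascii = True
except UnicodeEncodeError:
    fits_in_ascii = False

# UTF-16 stores this letter in 2 bytes.
utf16_bytes = letter.encode("utf-16-le")

print(fits_in_ascii)
print(len(utf16_bytes))
```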

Many programming languages support both, and will switch storage size depending on whether a single-byte or a 2-byte Unicode representation is needed.

Python, for example, uses Unicode for all strings. When a string is stored in the common UTF-8 encoding, any character that is part of the ASCII set takes a single byte.

For example, the string

                          mystring = "helЎlo"

is six characters long and takes 7 bytes of storage when encoded as UTF-8, because the Cyrillic character Ў needs 2 bytes.
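The 7-byte figure can be reproduced with a minimal Python sketch, assuming the string is encoded as UTF-8. The variable names are illustrative only.

```python
my_string = "helЎlo"
utf8_bytes = my_string.encode("utf-8")

print(len(my_string))   # number of characters
print(len(utf8_bytes))  # number of bytes in UTF-8

# Show how many bytes each character needs on its own.
for ch in my_string:
    print(ch, len(ch.encode("utf-8")))
```

The five Latin letters take 1 byte each and Ў takes 2, which gives 7 bytes in total. Other encodings give different totals, so the byte count always depends on which encoding is used.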

 

Challenge: see if you can find out one extra fact on this topic that we haven't already told you.
