Java Tips: Memory Optimization for String

String is a special object in Java. The Java Language Specification describes several special properties of String, and you may already know some of them. First, a String can be created without the new keyword, as in the example below.

String s = "new String";

I should mention that you can still create a String object using the new keyword, like this:

String s = new String("new String");

Are both statements exactly equivalent? As most of you already know, they are not. The first statement reuses the same object from the string pool whenever possible (which is safe because String is immutable), while the second forces the creation of a new String object. Consider this example:

System.out.println("b" == "b");
System.out.println(new String("b") == new String("b"));

The first line prints “true”, while the second prints “false”.
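
The two lines above can be expanded into a small, self-contained program that also contrasts == with equals() and shows that intern() maps a String created with new back to the pooled literal. The class name LiteralVsNew is only an illustrative choice.

```java
public class LiteralVsNew {
    public static void main(String[] args) {
        String literal1 = "b";
        String literal2 = "b";
        String created1 = new String("b");
        String created2 = new String("b");

        // Identical literals resolve to the same pooled object.
        System.out.println(literal1 == literal2);          // true
        // Each new String(...) call allocates a distinct object.
        System.out.println(created1 == created2);          // false
        // equals() compares contents, so it is true for both pairs.
        System.out.println(created1.equals(created2));     // true
        // intern() returns the pooled instance, which is the literal itself.
        System.out.println(created1.intern() == literal1); // true
    }
}
```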

I am almost certain that experienced programmers never create a String using new in normal code. But sometimes we are forced to. One case I can think of is parsing an XML file with a SAX parser:

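import java.util.ArrayList;
import java.util.List;

import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class Reader extends DefaultHandler {

    private List<String> listString = new ArrayList<String>();

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // Every call allocates a new String, even if an equal one already exists.
        String content = new String(ch, start, length);
        listString.add(content);
    }
}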

This example works correctly but is not efficient. Suppose you have a document like this:

<test>
    <string>String</string>
    <string>String</string>
    <string>String</string>
    <string>String</string>
    <string>String</string>
    <string>String</string>
    <string>String</string>
    <string>String</string>
    <string>String</string>
    <string>String</string>
</test>

Profile your application and force a garbage collection: you will still find ten String objects with the value “String” in memory, one per element, because the list keeps a reference to each of them.

Fortunately, Java provides a method to avoid this. String.intern() returns a canonical instance from a pool maintained by the String class, so all equal Strings passed through it end up as the same object. For the example above, you can change the code to something like this:

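import java.util.ArrayList;
import java.util.List;

import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class Reader extends DefaultHandler {

    private List<String> listString = new ArrayList<String>();

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // intern() returns the pooled instance, so equal texts share one object.
        String content = new String(ch, start, length).intern();
        listString.add(content);
    }
}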

Now re-profile the application and force a garbage collection: only one String object with that value is left in memory, and all ten list entries point to it. You can save a lot of memory if you make sure there is only one instance of each String value in your JVM.
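
To check the ten-versus-one claim without a profiler, the sketch below parses a document like the one above with the JDK's built-in SAX parser and counts distinct String objects by identity, using an IdentityHashMap-backed set that compares with == rather than equals(). It differs from the Reader class in two assumed ways: the hypothetical TextCollector handler buffers text in a StringBuilder and builds the String in endElement(), because a SAX parser may deliver one text node through several characters() calls, and it collects only <string> elements, so whitespace between elements is ignored. The names DuplicateStringDemo, TextCollector and countDistinctObjects are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class DuplicateStringDemo {

    // Hypothetical handler: collects the text of each <string> element, optionally interned.
    static class TextCollector extends DefaultHandler {
        final List<String> texts = new ArrayList<>();
        private final StringBuilder buffer = new StringBuilder();
        private final boolean intern;

        TextCollector(boolean intern) {
            this.intern = intern;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            buffer.setLength(0); // start a fresh text buffer for each element
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            buffer.append(ch, start, length); // may be called several times per text node
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("string".equals(qName)) {
                String content = buffer.toString(); // always a newly allocated String
                texts.add(intern ? content.intern() : content);
            }
        }
    }

    // Parses the XML and returns how many distinct String objects the collected list holds.
    static int countDistinctObjects(String xml, boolean intern) throws Exception {
        TextCollector collector = new TextCollector(intern);
        SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), collector);

        // Identity-based set: equal Strings count separately unless they are the same object.
        Set<String> distinct = Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        distinct.addAll(collector.texts);
        return distinct.size();
    }

    public static void main(String[] args) throws Exception {
        StringBuilder xml = new StringBuilder("<test>");
        for (int i = 0; i < 10; i++) {
            xml.append("<string>String</string>");
        }
        xml.append("</test>");

        System.out.println("without intern: " + countDistinctObjects(xml.toString(), false)); // 10
        System.out.println("with intern:    " + countDistinctObjects(xml.toString(), true));  // 1
    }
}
```

In both runs the list holds ten references; only the number of distinct objects behind them changes.";
assert DuplicateStringDemo.countDistinctObjects(xml, false) == 3;
assert DuplicateStringDemo.countDistinctObjects(xml, true) == 1;
</test>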

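This technique also has a nice side effect: if your application performs many String equality comparisons, comparing two references to the same object is faster. To see why, look at the source code of String.equals() (this excerpt is from an older JDK, in which String still stored offset and count fields):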

...
public boolean equals(Object anObject) {
    if (this == anObject) {
        return true;
    }
    if (anObject instanceof String) {
        String anotherString = (String)anObject;
        int n = count;
        if (n == anotherString.count) {
            char v1[] = value;
            char v2[] = anotherString.value;
            int i = offset;
            int j = anotherString.offset;
            while (n-- != 0) {
                if (v1[i++] != v2[j++])
                    return false;
            }
            return true;
        }
    }
    return false;
}
...

If both references point to the same object, the method returns true immediately at the check if (this == anObject), without comparing any characters. This is very fast and can save a lot of processing time if your application performs this operation many times.
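
The identity shortcut only pays off when both sides really are the same object. The sketch below (class and variable names are hypothetical) builds two equal Strings at run time: they are separate objects, so equals() has to fall through to the character loop. After intern(), both references point to the single pooled instance, so equals() returns at its first check, and comparing with == becomes reliable as well.

```java
public class InternedEquality {
    public static void main(String[] args) {
        String prefix = "hel";                 // not final, so not a compile-time constant
        String fromConcat = prefix + "lo";     // concatenated at run time: a new object
        String fromBuilder = new StringBuilder("hel").append("lo").toString();

        // Equal contents, but two separate objects: equals() must compare characters.
        System.out.println(fromConcat == fromBuilder);      // false
        System.out.println(fromConcat.equals(fromBuilder)); // true

        // After interning, both refer to the same pooled instance.
        String a = fromConcat.intern();
        String b = fromBuilder.intern();
        System.out.println(a == b);                          // true
        System.out.println(a == "hello");                    // true: the literal is pooled too
    }
}
```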


25 Responses to “Java Tips: Memory Optimization for String”


  • This advice can have bad side effects. A String object is actually just a view on an underlying char[]. If the char[] is very large and the view very small (as can often happen when using substring on a large string such as XML), then your advice would end up caching the whole large char[] even though you are only using a small fraction of it. And this is a cache that can never be freed.

    In general, String.intern() is a pretty low-level method (originally designed for JVM/compiler caching) and great care should be taken when using it in your own code.

  • Besides the above-mentioned problem, isn’t this also a performance hit?

  • Wouldn’t there be a lookup cost as the pool of unique strings grows?
    I’d rather sacrifice a little memory than performance.

    As for parsing the XML in your example, I understand that it depends on the XML

  • (sorry, I don’t know how my comment got cut off)

    Wouldn’t it be better to use regex if you really, really need to have the unique string(s)?

  • For Stephen: Sorry, I don’t understand your explanation. I only cache part of the String, not the whole XML:

    new String(ch, start, length)

    And AFAIK, entries in the String pool are freed once there are no more soft or hard references to that String.

  • For Ariya: intern is a native method, so I really believe the performance will not be affected that much. Of course, you then have to decide which is more important to you, performance or memory. But as I said, if we don’t cache the String, we get a performance problem in the equality operation.

  • For Bristol: Maybe you misunderstood my small example. I want to collect all text contents from the XML file and put them in the list. The list will not contain only unique Strings, but if two elements are equal, they will share the same object by using intern. This is not the case if you don’t use intern.

    Maybe I should explain my view of this problem in more detail. It is a real problem that I faced in my application, and by using a profiler I know exactly that it lies in the number of duplicate String objects I have at one specific time. Using intern solved that problem.

  • I found this article about String.intern().

    String.intern() saves heap space, but at the expense of using up the more precious PermGen space.

  • For Ronsen: Of course the String has to be stored somewhere. I haven’t managed to test whether it is stored in PermGen memory or heap memory, but I certainly wouldn’t suggest using intern if your String is not unique (as in the case explained in the article).

    Using weak references might be the best solution for my problem. The only concern I can see now is the performance compared to intern; I’m not sure yet. I will try that technique later to optimize my application. Thank you for the reference!

  • Hi,
    Very good explanation of why one should be careful not to duplicate Strings!
    @Bristol, typically the extra cost of String.intern is minimal compared to the increase in memory usage.
    Often you would hold a copy of a certain String for each user, which means for n users you will need n times more memory for this String.
    http://www.eclipse.org/mat/ allows you to find those duplicates easily. Check some examples at http://kohlerm.blogspot.com/search/label/memory

    Regards,
    Markus

  • Hi Nanda,

    The problem with intern, as mentioned before, is that it eats up your PermGen slowly but surely.
    If the String you create is the result of user input, your PermGen will keep growing until the inevitable OutOfMemoryError is thrown!
    Also, for something that was said earlier, you really do keep only the subset of chars; see String’s implementation for reference. :)

  • Hi Aviad,
    I don’t deny that intern uses PermGen memory, but so do normal Strings. As I explained in the article, you should do this only for Strings that you know are used many times in your program and that you keep around. You find out which ones those are by profiling your program. This is a performance problem that should only be handled after the application is known to work correctly.

  • Nanda,

    Since you’re accepting user input as XML, your application will bloat over time and eventually crash, which is not behavior a user would expect, as it’s random. I don’t like the suggestion because it’s extremely bad: long-running applications would definitely crash, and short-running applications would die randomly depending on input size.

    I would prefer the usage of a weakly referenced map, since the GC can clean it later.

  • Do you mean that GC will not free PermGen memory? AFAIK, that’s not the case.

  • Artur Biesiadowski

    Few clarifications:

    1) As far as I know, interned strings can be freed; the pool is a weak hash map. If this were not the case, unloading classes/classloaders would leak memory (as all literals are interned by default).

    2) For me, the biggest issue with intern was synchronization. It is a huge contention point for an application. I have seen a real-world application where a part of the code that took around 2-3% in profiling rose to around 80% when ‘optimized’ with intern, because it blocked all the threads that could have been doing useful work in the meantime.

    3) For relatively short-lived strings (short-lived meaning not promoted to the old generation, though maybe surviving a few new-generation collections), the cost of a lookup in the intern map can be considerable compared to the savings.

  • @artur: I agree with you regarding synchronization, but I’m really curious how much it affected your application. Can you describe how many threads you are talking about? Thanks.

  • Artur Biesiadowski

    @Nanda
    The application itself had probably around 500 threads, but there were probably at most 100 around that section (and probably no more than around 5 executing it at any given moment). It was protocol parsing/encoding code handling hundreds of concurrent connections and distributing 20-50k events per second. All of those messages had small identifier strings, which were often the same but had to be moved through a few layers of code, sometimes staying in memory for a few seconds. As they were probably around 50% of the short-lived garbage in the system, we came up with the idea of using intern to merge that memory. Unfortunately, intern was considerably slower than string allocation (obviously), which caused more than 5 threads to be in that area of code at the same time, and on top of that the code became single-threaded because of intern.

    As far as I remember, we ended up with some higher-level id cache, using byte[] arrays with index/offset as keys (to avoid temp object creation for key lookup).

  • Thank you for the explanation, @artur. One question: do you think the same problem will also happen if we define a normal String, like String a = "String", in several threads?

  • Artur Biesiadowski

    @Nanda – no, in that case you would not pay the cost of intern. Literal strings are interned at class loading time (or class resolution, I don’t remember now), and after that they are just references. The only way you could have an issue here would be if you defined new classes over and over with different literals, but class loading is a much bigger performance hit than string interning can ever be.

  • Great! Then there will be a problem if the definition is something like String a = "String" + b, where b is only known at run time. Am I correct?

  • Artur Biesiadowski

    No, you will have the problem only if you do something like

    String a = ("String" + b).intern();

    where b is a runtime variable. We are talking about the possible dangers of intern() here; normal String operations are not an issue for multiple threads at once.

  • OK thanks… I got your explanation.

  • If I understand this correctly, for long-lived String repositories such as ResourceBundle et al., using intern() for the keys (but not the values) should make sense in terms of both speed and memory consumption.

    I will give it a try; if something blows up, I will keep you posted. Thanks!

  • Yup… As long as two conditions are fulfilled (lots of duplication, long-term use), I think we can safely say it’s worth it, except if your application is multi-threaded.

    It’s even better if you do some profiling beforehand.
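
Several commenters suggest a weakly referenced map as an alternative to intern(). The sketch below is one common way to build such a cache with java.util.WeakHashMap, under stated assumptions: the class name WeakInterner is hypothetical, and a single synchronized method is accepted, so Artur’s contention warning applies to it too. Keys are held weakly, and the values are WeakReferences because a WeakHashMap value that strongly referenced its own key would keep the entry alive forever. Guava offers a ready-made equivalent, Interners.newWeakInterner().

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical weak interner: returns one canonical instance per value, but lets the
// garbage collector drop an entry once nothing else references that instance.
public class WeakInterner {
    private final Map<String, WeakReference<String>> pool = new WeakHashMap<>();

    public synchronized String intern(String s) {
        WeakReference<String> ref = pool.get(s);
        String canonical = (ref == null) ? null : ref.get();
        if (canonical == null) {
            // First occurrence (or the previous canonical copy was collected):
            // this instance becomes the canonical one.
            pool.put(s, new WeakReference<>(s));
            canonical = s;
        }
        return canonical;
    }

    public static void main(String[] args) {
        WeakInterner interner = new WeakInterner();
        String first = interner.intern(new String("String"));
        String second = interner.intern(new String("String"));
        System.out.println(first == second); // true: both calls return the same object
    }
}
```

Unlike String.intern(), this pool lives in the ordinary heap and belongs to the application, so it can be scoped to one parser or one document and discarded with it.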
