Static library: removing non-API symbols
Norbert Juffa
2004-02-15 22:21:23 UTC
My apologies if someone has already seen this post over
in gnu.gcc. I belatedly noticed that gnu.gcc is a very
low-volume group and probably not suitable for questions
such as this, which could explain the lack of replies.

I am trying to find a way of hiding all non-API symbols
contained in a static library (i.e., lib*.a). I am using
the gcc 3.2 tool chain, and the static library consists
of a fairly large number of object files generated from C
sources. The modularity of the library is of no concern,
as any real-world application will pull in 90+% of the
library code anyhow.

What I am trying to accomplish is easily implemented for
a DSO by use of --version-script combined with aggressive
stripping. However in this case I need to deliver a static
library, not a DSO.

Here is what I am doing right now. First, I converted as
many functions as possible to static functions. I figured
further that by combining the object files comprising the
library into one giant new object file, all inter-object
references will be resolved and thus no longer needed. I
could then strip off the names of everything but the API
symbols and undefined symbols (i.e., references to
functions in other libraries):

$(LD) -o super.o -O5 -Ur --retain-symbols-file api.txt *.o
$(AR) rcvs $(STAT_LIB_NAME) super.o
$(STRIP) --strip-debug --discard-all -R .note -R .comment

Here, api.txt refers to a list of API functions using the
format required by --retain-symbols-file (flat file with
one symbol name per line).

Unfortunately, the above approach does not do what I want
to accomplish. Using nm on the resulting static library
shows me that all symbols with external linkage from the
original object files are preserved. I experimented with
various switch combinations of strip, objcopy, and ld, but
can't find a way to make this work.

The description of --retain-symbols-file says it "does not
discard undefined symbols, or symbols needed for relocations".
So apparently the linker considers the symbols I would like
to remove as "needed for relocation", although I have created
a single object file?

Is there a way to solve this issue without obfuscating the
names of non-API symbols?

-- Norbert
Paul Pluzhnikov
2004-02-16 02:36:41 UTC
"Norbert Juffa" <***@earthlink.net> writes:

You want:

objcopy --strip-all --keep-symbols api.txt \
-R .note -R .comment super.o super-s.o

For example:

$ cat junk.c
int a() { return 1; }
int b() { return a(); }
int c() { return b(); }
int d() { return c(); }
int api() { return a() + b() + c() + d(); }

$ cat junk.txt
api

$ gcc -c junk.c
$ objcopy --strip-all --keep-symbols junk.txt junk.o junk-s.o
$ nm junk-s.o
junk-s.o:00000080 T api

Voila!
--
In order to understand recursion you must first understand recursion.
Remove /-nsp/ for email.
Norbert Juffa
2004-02-16 07:38:34 UTC
Post by Paul Pluzhnikov
objcopy --strip-all --keep-symbols api.txt \
-R .note -R .comment super.o super-s.o
$ cat junk.c
int a() { return 1; }
int b() { return a(); }
int c() { return b(); }
int d() { return c(); }
int api() { return a() + b() + c() + d(); }
$ cat junk.txt
api
$ gcc -c junk.c
$ objcopy --strip-all --keep-symbols junk.txt junk.o junk-s.o
$ nm junk-s.o
junk-s.o:00000080 T api
Paul, thanks for the speedy reply. I wonder though: Does your method
work if functions a(), b(), ..., d() are defined in different source
files and therefore initially reside in separate object files? In my
situation there are numerous .o files and I combine them into a single
super.o by using ld -o super.o -O5 -Ur *.o

I am pretty sure I tried something along the lines of what you suggest
without success. Unfortunately I am not in front of my Linux box right
now to give your suggestion a try. I'll check it out as soon as possible
(i.e. Tuesday) and will report back my findings.

Again, thanks for your help.


-- Norbert
Paul Pluzhnikov
2004-02-16 15:50:32 UTC
Post by Norbert Juffa
Paul, thanks for the speedy reply. I wonder though: Does your method
work if functions a(), b(), ..., d() are defined in different source
files and therefore initially reside in separate object files?
Obviously you don't want to strip them *before* they are linked
together into the "super.o".
Post by Norbert Juffa
In my
situation there are numerous .o files and I combine them into a single
super.o by using ld -o super.o -O5 -Ur *.o
By which time all traces of their origin (in separate objects) are gone.
They are bound together and "inseparable" from this point on [1].
Post by Norbert Juffa
I am pretty sure I tried something along the lines of what you suggest
without success.
You probably *almost* had it working. I've just confirmed that
it works with multiple "files of origin" on a Linux box.

[1] Inseparable on all UNIXes except on AIX, where the linker can
still "pull" them apart via their csect's.
Post by Norbert Juffa
Again, thanks for your help.
Welcome from Pasadena.
--
In order to understand recursion you must first understand recursion.
Remove /-nsp/ for email.
Norbert Juffa
2004-02-17 21:44:52 UTC
[...]
Post by Paul Pluzhnikov
Post by Norbert Juffa
I am pretty sure I tried something along the lines of what you suggest
without success.
You probably *almost* had it working. I've just confirmed that
it works with multiple "files of origin" on a Linux box.
[1] Inseparable on all UNIXes except on AIX, where the linker can
still "pull" them apart via their csect's.
Post by Norbert Juffa
Again, thanks for your help.
Welcome from Pasadena.
I gave Paul's suggestion a try and the resulting library is no longer
functional. Using the example setup mentioned above, all the calls to
functions a(), b(), ..., d() branch to an incorrect address. Basically
they branch to the calling instruction itself. I seem to recall that
this is the same effect I noticed before as I was struggling to find a
solution. Apparently the symbols for functions a() through d() are
needed for relocation. For reference, here is the sequence of
commands I used.

rm *.o
rm *.a
gcc -O2 -g -c -o funca.o funca.c
gcc -O2 -g -c -o funcb.o funcb.c
gcc -O2 -g -c -o funcc.o funcc.c
gcc -O2 -g -c -o api.o api.c
ld -o super.o -Ur *.o
objcopy --strip-all --keep-symbols api.txt -R .note -R .comment \
super.o super-api.o
ar rcvs libfunc.a super-api.o
gcc -O2 -L. -o tester -static tester.c -lfunc
nm libfunc.a

super-api.o:
00000000 T api

-- Norbert
Paul Pluzhnikov
2004-02-18 04:13:54 UTC
Post by Norbert Juffa
I gave Paul's suggestion a try and the resulting library is no longer
functional. Using the example setup mentioned above, all the calls to
functions a(), b(), ..., d() branch to an incorrect address. Basically
they branch to the calling instruction itself.
Indeed. They actually branch to $pc+1, which results in a SIGILL ...
Post by Norbert Juffa
Apparently the symbols for functions a() through d() are
needed for relocation.
Yes, of course :-(
However, there is a solution: :-)

gcc -O2 -g -c -o funca.o funca.c
gcc -O2 -g -c -o funcb.o funcb.c
gcc -O2 -g -c -o funcc.o funcc.c
gcc -O2 -g -c -o api.o api.c

Produce *relocated* file:

ld -eapi -o super.o *.o

["-eapi" is just to suppress "ld: warning: cannot find entry symbol
_start; defaulting to 08048080"]

Now strip all symbols except the ones you want to keep:

objcopy --strip-all --keep-symbols api.txt -R .note -R .comment \
super.o super-api.o

nm super-api.o
080480a0 T api

gcc -o tester -static tester.c super-api.o -Wl,-y,api
/tmp/cc9i2onI.o: reference to api
super-api.o: definition of api

nm tester | grep api
08048450 T api ## good: symbol was relocated elsewhere

./tester
echo $?
4 ## correct answer: a()+b()+c()+d()

Unfortunately, things grow even more complicated if super.o has
any unresolved symbols ...

So in the end, the simplest solution appears to be:

- make all symbols 'static', except the public-API ones
- create 'super.c' which '#includes' all the other .c files
- compile super.c, 'objcopy --strip-unneeded super.o super-api.o',
and you are done.

Cheers,
--
In order to understand recursion you must first understand recursion.
Remove /-nsp/ for email.
Norbert Juffa
2004-02-19 16:33:24 UTC
Post by Paul Pluzhnikov
Post by Norbert Juffa
I gave Paul's suggestion a try and the resulting library is no longer
functional. Using the example setup mentioned above, all the calls to
functions a(), b(), ..., d() branch to an incorrect address. Basically
they branch to the calling instruction itself.
Indeed. They actually branch to $pc+1, which results in a SIGILL ...
That is the behavior I observed on x86. On ARM, I got BL (branch and link)
instructions branching to themselves, causing an infinite loop.


[..]
Post by Paul Pluzhnikov
Unfortunately, things grow even more complicated if super.o has
any unresolved symbols ...
This is the case for my library. It calls several functions from libc
and libm. Paul, thanks for putting so much effort into trying to find
a solution, but I think it is becoming clear that trying to go the .o
route here is not very workable.
Post by Paul Pluzhnikov
- make all symbols 'static', except the public-API ones
- create 'super.c' which '#includes' all the other .c files
- compile super.c, 'objcopy --strip-unneeded super.o super-api.o',
and you are done.
I am considering combining the sources, or obfuscating the symbols
after all.


-- Norbert