
Minor inconsistencies in tools:::funAPI()

3 messages · Toby Hocking, Ivan Krylov

#
Hi all,

I've noticed some peculiarities in the tools:::funAPI output that
complicate its programmatic use a bit.

 - Is it for remapped symbol names (with Rf_ or the Fortran
   underscore), or for unmapped names (without Rf_ or the underscore)?

I see that the functions marked in WRE are almost all (except
Rf_installChar and Rf_installTrChar) unmapped. This makes a lot of
sense because some of those interfaces (e.g. CONS(), CHAR(),
NOT_SHARED()) are C preprocessor macros, not functions. I also see that
installTrChar is not explicitly marked.

Are we allowed to call tools:::unmap(tools:::funAPI()$name) and
consider the return value to be the list of all unmapped APIs, despite,
e.g., installTrChar not being explicitly marked?

 - Should R_PV be an @apifun if it's currently caught by checks in
   sotools.R?

 - Should R_FindSymbol be commented /* Not API */ if it's marked as
   @apifun in WRE and not caught by sotools.R? It is currently used by 8
   CRAN packages.

 - The names 'select', 'delztg' from R_ext/Lapack.h are function
   pointer arguments, not functions or type declarations. They are
   being found because funcRegexp is written to match incomplete
   function declarations (e.g. when they end up being split over
   multiple lines, like in R_ext/Lapack.h), and function pointer
   argument declarations look sufficiently similar.
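
To illustrate the last point, here is a small sketch of how a line-based
pattern that tolerates incomplete declarations also matches a function
pointer argument. Both the sample header text and the pattern loose_rx
are made up for this illustration; they are not the real R_ext/Lapack.h
contents nor the actual funcRegexp.

```r
# Hypothetical sample: a declaration split over several lines, with a
# function pointer argument on its own line.
hdr <- c(
  "void dgees_(const char *jobvs,",
  "        int (*select)(const double *, const double *),",
  "        const int *n);"
)

# Illustrative pattern (NOT tools' funcRegexp): a type word, then a name
# that may be wrapped as (*name), followed by an opening parenthesis.
# It does not require the declaration to be complete on the same line.
loose_rx <- "^\\s*\\w+\\s+\\(?\\*?(\\w+)\\)?\\s*\\("

hits <- regmatches(hdr, regexec(loose_rx, hdr, perl = TRUE))
# Take the first capture group where there is a match, NA otherwise
found <- vapply(hits, function(m) if (length(m)) m[2] else NA_character_, "")
found
```

With this sample, the second line is reported as if it declared a
function named "select", because nothing in a single line distinguishes
it from the start of a multi-line declaration.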

A relatively compact (but still brittle) way to match function
declarations in C header files is shown at the end of this message. I
have confirmed that compared to tools:::getFunsHdr, the only extraneous
symbols that it finds in preprocessed headers are "R_SetWin32",
"user_unif_rand", "user_unif_init", "user_unif_nseed",
"user_unif_seedloc", "user_norm_rand", which are special-cased in
tools:::getFunsHdr, and the only symbols it doesn't find are "select"
and "delztg" in R_ext/Lapack.h, which we should not be finding.

# "Bird's eye" view, gives unmapped names on non-preprocessed headers
getdecl <- function(file, lines = readLines(file)) {
	# have to combine to perform multi-line matches
	lines <- paste(c(lines, ''), collapse = '\n')
	# first eat the C comments, dotall but non-greedy match
	lines <- gsub('(?s)/\\*.*?\\*/', '', lines, perl = TRUE)
	# C++-style comments too, multiline not dotall
	lines <- gsub('(?m)//.*$', '', lines, perl = TRUE)
	# drop all preprocessor directives
	lines <- gsub('(?m)^\\s*#.*$', '', lines, perl = TRUE)

	rx <- r"{(?xs)
		(?!typedef)(?<!\w) # please no typedefs
		# return type with attributes
		(
			# words followed by whitespace or stars
			(?: \w+ (?:\s+ | \*)+)+
		)
		# function name, assumes no extra whitespace
		(
			\w+\(\w+\) # macro call
			| \(\w+\)  # in parentheses
			| \w+      # a plain name
		)
		# arguments: non-greedy match inside parentheses
		\s* \( (.*?) \) \s* # using dotall here
		# will include R_PRINTF_FORMAT(1,2 but we don't care
		# finally terminated by semicolon
		;
	}"

	regmatches(lines, gregexec(rx, lines, perl = TRUE))[[1]][3,]
}

# Preprocess then extract remapped function names like getFunsHdr
getdecl2 <- function(file)
	file |>
	readLines() |>
	grep('^\\s*#\\s*error', x = _, value = TRUE, invert = TRUE) |>
	tools:::ccE() |>
	getdecl(lines = _)
14 days later
#
Hi Ivan
Can you please clarify what input files should be used with your
proposed function? I tried a few files in r-svn/src/include and one of
them gave me an error.
[1] "R_FlushConsole"  "R_ProcessEvents" "R_WaitEvent"
Error in regmatches(lines, gregexec(rx, lines, perl = TRUE))[[1]][3, ] :
  incorrect number of dimensions

On Mon, Jul 15, 2024 at 10:32 AM Ivan Krylov via R-devel
<r-devel at r-project.org> wrote:
#
On Mon, 29 Jul 2024 16:29:42 -0400
Toby Hocking <tdhock5 at gmail.com> wrote:
This is a good illustration of the brittleness of the regexp approach.
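
One plausible reading of the error quoted above: when the pattern
matches nothing in a file, regmatches() returns character(0) for that
input instead of a matrix, and indexing it with [3, ] then fails with
"incorrect number of dimensions". A minimal sketch of a guard follows.
The helper extract_group() and the simplified pattern are hypothetical
and only stand in for the rx used in getdecl.

```r
# Hypothetical helper: return capture group `group` from every match of
# `rx` in `text`, or character(0) when there is no match at all.
extract_group <- function(rx, text, group) {
	m <- regmatches(text, gregexec(rx, text, perl = TRUE))[[1]]
	# With no match, m is character(0), not a matrix, so m[i, ] would fail
	if (!is.matrix(m)) return(character(0))
	# Row 1 is the whole match; capture groups start at row 2
	m[group + 1L, ]
}

# Deliberately simplified pattern: "<type> <name>(" (not the full rx)
simple_rx <- "(\\w+)\\s+(\\w+)\\("

with_decls <- extract_group(simple_rx, "int foo(void); double bar(int);", 2)
no_decls   <- extract_group(simple_rx, "/* nothing to see here */", 2)
```

The same is.matrix() check could be applied to the last line of getdecl
so that a header without any matching declarations yields an empty
result rather than an error.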
I focused on the header files marked as API:
 [1] "R_ext/GraphicsDevice.h" "Rmath.h"
 [3] "R_ext/GraphicsEngine.h" "R_ext/BLAS.h"
 [5] "R_ext/Lapack.h"         "R_ext/Linpack.h"
 [7] "Rembedded.h"            "Rinterface.h"
 [9] "R_ext/Altrep.h"         "R_ext/Memory.h"
[11] "R_ext/RStartup.h"       "R_ext/Arith.h"
[13] "R_ext/Random.h"         "R_ext/Error.h"

I also wanted the function not to crash with Rinternals.h, but getdecl
/ getdecl2 / tools:::getFunsHdr all give different answers for it.

I think this can be done in a more reliable manner using a recursive
descent parser, but that would take some screenfuls of R that will need
to be very carefully written.

Speaking of discrepancies, here are a few functions declared in API
headers but marked with attribute_hidden:

R_ext/Error.h:NORET void WrongArgCount(const char *);
R_ext/Memory.h:int      R_gc_running(void);

And some minor headaches for people who would like a full
programmatic list of entry points:

 - The functions [dpq]norm are unconditionally remapped to dnorm4,
   pnorm5, qnorm5, and the header file parser only picks up the
   numbered function names.

 - 'optimfn', 'optimgr', 'integr_fn' are marked in WRE as @apifun
   despite not directly being functions or symbol names exported by R
   binaries. May I suggest a separate category for types?
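
For the [dpq]norm remapping mentioned in the first item, a consumer of
the parsed list could translate the numbered names back by hand. The
following is only a sketch: the lookup table contains just the three
names given above, and undo_norm_remap() is a hypothetical helper, not
something provided by the tools package.

```r
# Numbered names as picked up by the header parser, mapped back to the
# names used in source code (only the three cases named above).
norm_remap <- c(dnorm4 = "dnorm", pnorm5 = "pnorm", qnorm5 = "qnorm")

# Hypothetical helper: replace remapped names, leave everything else alone
undo_norm_remap <- function(syms) {
	hit <- syms %in% names(norm_remap)
	syms[hit] <- unname(norm_remap[syms[hit]])
	syms
}

restored <- undo_norm_remap(c("pnorm5", "rnorm", "dnorm4"))
```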