  1. 13 Dec, 2022 2 commits
    • Prevent all CHECK functions from expanding macros for the error string. · a13ef44b
      This was likely an unintentional behavior change made some time ago while trying to reduce duplication.  The new behavior will always include the unexpanded macro in the error string.  For example, `CHECK_EQ(MACRO(x), MACRO(y))` will now output "MACRO(x) == MACRO(y)" if it fails.  Before this change, CHECK and QCHECK were the only macros that had this behavior.
      
      Not using function-like macro aliases is a possible alternative here, but unfortunately that would flood the macro namespace downstream with CHECK* and break existing code.
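
      A minimal sketch of the preprocessor behavior at play, assuming a hypothetical user macro `MACRO` and hypothetical `DEMO_*` helpers (none of these are Abseil's actual definitions): applying `#` directly keeps the argument as written, while forwarding it through another function-like macro expands it first.

      ```cpp
      #include <cassert>
      #include <string>

      // Hypothetical user macro, standing in for MACRO in the commit message.
      #define MACRO(v) ((v) + 1)

      // `#` is applied to the argument as written, before any macro expansion.
      #define DEMO_STRINGIFY_RAW(expr) #expr

      // The extra function-like layer expands `expr` before the inner `#` sees it.
      #define DEMO_STRINGIFY_IMPL(expr) #expr
      #define DEMO_STRINGIFY_EXPANDED(expr) DEMO_STRINGIFY_IMPL(expr)

      // `x` and `y` are never evaluated; they only appear inside string literals.
      std::string RawText() { return DEMO_STRINGIFY_RAW(MACRO(x) == MACRO(y)); }
      std::string ExpandedText() {
        return DEMO_STRINGIFY_EXPANDED(MACRO(x) == MACRO(y));
      }
      ```

      Here `RawText()` yields "MACRO(x) == MACRO(y)", the form the commit describes, while `ExpandedText()` contains the macro's replacement text instead.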
      
      PiperOrigin-RevId: 495138582
      Change-Id: I6a1afd89a6b9334003362e5d3e55da68f86eec98
      Mike Kruskal committed
    • Add prefetch to crc32 · 4cb6c389
      We already prefetch for large inputs; do the same for
      medium-sized inputs as well. This is performance-neutral
      in most cases, so this change also adds a new benchmark
      with working size >> cache size to confirm that
      prefetching actually helps. The main benefits
      are on AMD with hardware prefetchers turned off:
      
      AMD prefetchers on:
      name                           old time/op  new time/op  delta
      BM_Calculate/0                 2.43ns ± 1%  2.43ns ± 1%     ~     (p=0.814 n=40+40)
      BM_Calculate/1                 2.50ns ± 2%  2.50ns ± 2%     ~     (p=0.745 n=39+39)
      BM_Calculate/100               9.17ns ± 1%  9.17ns ± 2%     ~     (p=0.747 n=40+40)
      BM_Calculate/10000              474ns ± 1%   474ns ± 2%     ~     (p=0.749 n=40+40)
      BM_Calculate/500000            22.8µs ± 1%  22.9µs ± 2%     ~     (p=0.298 n=39+40)
      BM_Extend/0                    1.38ns ± 1%  1.38ns ± 1%     ~     (p=0.651 n=40+40)
      BM_Extend/1                    1.53ns ± 2%  1.53ns ± 1%     ~     (p=0.957 n=40+39)
      BM_Extend/100                  9.48ns ± 1%  9.48ns ± 2%     ~     (p=1.000 n=40+40)
      BM_Extend/10000                 474ns ± 2%   474ns ± 1%     ~     (p=0.928 n=40+40)
      BM_Extend/500000               22.8µs ± 1%  22.9µs ± 2%     ~     (p=0.331 n=40+40)
      BM_Extend/100000000            4.79ms ± 1%  4.79ms ± 1%     ~     (p=0.753 n=38+38)
      BM_ExtendCacheMiss/10          25.5ms ± 2%  25.5ms ± 2%     ~     (p=0.988 n=38+40)
      BM_ExtendCacheMiss/100         23.1ms ± 2%  23.1ms ± 2%     ~     (p=0.792 n=40+40)
      BM_ExtendCacheMiss/1000        37.2ms ± 1%  28.6ms ± 2%  -23.00%  (p=0.000 n=38+40)
      BM_ExtendCacheMiss/100000      7.77ms ± 2%  7.74ms ± 2%   -0.45%  (p=0.006 n=40+40)
      
      AMD prefetchers off:
      name                           old time/op  new time/op  delta
      BM_Calculate/0                 2.43ns ± 2%  2.43ns ± 2%     ~     (p=0.351 n=40+39)
      BM_Calculate/1                 2.51ns ± 2%  2.51ns ± 1%     ~     (p=0.535 n=40+40)
      BM_Calculate/100               9.18ns ± 2%  9.15ns ± 2%     ~     (p=0.120 n=38+39)
      BM_Calculate/10000              475ns ± 2%   475ns ± 2%     ~     (p=0.852 n=40+40)
      BM_Calculate/500000            22.9µs ± 2%  22.8µs ± 2%     ~     (p=0.396 n=40+40)
      BM_Extend/0                    1.38ns ± 2%  1.38ns ± 2%     ~     (p=0.466 n=40+40)
      BM_Extend/1                    1.53ns ± 2%  1.53ns ± 2%     ~     (p=0.914 n=40+39)
      BM_Extend/100                  9.49ns ± 2%  9.49ns ± 2%     ~     (p=0.802 n=40+40)
      BM_Extend/10000                 475ns ± 2%   474ns ± 1%     ~     (p=0.589 n=40+40)
      BM_Extend/500000               22.8µs ± 2%  22.8µs ± 2%     ~     (p=0.872 n=39+40)
      BM_Extend/100000000            10.0ms ± 3%  10.0ms ± 4%     ~     (p=0.355 n=40+40)
      BM_ExtendCacheMiss/10           196ms ± 2%   196ms ± 2%     ~     (p=0.698 n=40+40)
      BM_ExtendCacheMiss/100          129ms ± 1%   129ms ± 1%     ~     (p=0.602 n=36+37)
      BM_ExtendCacheMiss/1000        88.6ms ± 1%  57.2ms ± 1%  -35.49%  (p=0.000 n=36+38)
      BM_ExtendCacheMiss/100000      14.9ms ± 1%  14.9ms ± 1%     ~     (p=0.888 n=39+40)
      
      Intel Skylake:
      name                           old time/op  new time/op  delta
      BM_Calculate/0                 2.49ns ± 2%  2.44ns ± 4%  -2.15%  (p=0.001 n=31+34)
      BM_Calculate/1                 3.04ns ± 2%  2.98ns ± 9%  -1.95%  (p=0.003 n=31+35)
      BM_Calculate/100               8.64ns ± 3%  8.53ns ± 5%    ~     (p=0.065 n=31+35)
      BM_Calculate/10000              290ns ± 3%   285ns ± 7%  -1.80%  (p=0.004 n=28+34)
      BM_Calculate/500000            11.8µs ± 2%  11.6µs ± 8%  -1.59%  (p=0.003 n=26+34)
      BM_Extend/0                    1.56ns ± 1%  1.52ns ± 3%  -2.44%  (p=0.000 n=26+35)
      BM_Extend/1                    1.88ns ± 3%  1.83ns ± 6%  -2.17%  (p=0.001 n=27+35)
      BM_Extend/100                  9.31ns ± 3%  9.13ns ± 7%  -1.92%  (p=0.000 n=33+38)
      BM_Extend/10000                 290ns ± 3%   283ns ± 3%  -2.45%  (p=0.000 n=32+38)
      BM_Extend/500000               11.8µs ± 2%  11.5µs ± 8%  -1.80%  (p=0.001 n=35+37)
      BM_Extend/100000000            6.39ms ±10%  6.11ms ± 8%  -4.34%  (p=0.000 n=40+40)
      BM_ExtendCacheMiss/10          36.2ms ± 7%  35.8ms ±14%    ~     (p=0.281 n=33+37)
      BM_ExtendCacheMiss/100         26.9ms ±15%  25.9ms ±12%  -3.93%  (p=0.000 n=40+40)
      BM_ExtendCacheMiss/1000        23.8ms ± 5%  23.4ms ± 5%  -1.68%  (p=0.001 n=39+40)
      BM_ExtendCacheMiss/100000      10.1ms ± 5%  10.0ms ± 4%    ~     (p=0.051 n=39+39)
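
      The idea of prefetching ahead of the data being checksummed can be sketched as follows. This is not Abseil's implementation: the bitwise CRC-32C, the `Demo*` names, the 64-byte chunk, and the 256-byte prefetch distance are all assumptions chosen for illustration, and `__builtin_prefetch` is a GCC/Clang builtin.

      ```cpp
      #include <algorithm>
      #include <cassert>
      #include <cstddef>
      #include <cstdint>

      // Assumed tuning constants, for illustration only.
      constexpr std::size_t kDemoChunkBytes = 64;
      constexpr std::size_t kDemoPrefetchDistance = 256;

      inline void DemoPrefetch(const void* p) {
      #if defined(__GNUC__) || defined(__clang__)
        __builtin_prefetch(p, /*rw=*/0, /*locality=*/0);
      #else
        (void)p;  // No prefetch hint available in this sketch; do nothing.
      #endif
      }

      // Folds one byte into a bitwise, reflected CRC-32C (polynomial 0x82F63B78).
      inline std::uint32_t DemoCrcByte(std::uint32_t crc, unsigned char byte) {
        crc ^= byte;
        for (int bit = 0; bit < 8; ++bit) {
          crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
        }
        return crc;
      }

      std::uint32_t DemoCrc32cWithPrefetch(const unsigned char* data,
                                           std::size_t n) {
        std::uint32_t crc = 0xFFFFFFFFu;
        std::size_t i = 0;
        while (i < n) {
          // Request data further ahead while the current chunk is processed.
          if (n - i > kDemoPrefetchDistance) {
            DemoPrefetch(data + i + kDemoPrefetchDistance);
          }
          const std::size_t end = i + std::min(kDemoChunkBytes, n - i);
          for (; i < end; ++i) crc = DemoCrcByte(crc, data[i]);
        }
        return crc ^ 0xFFFFFFFFu;
      }
      ```

      The prefetch is only a hint, so the result is identical with or without it; whether it pays off depends on the working set and the hardware prefetchers, which is what the benchmark tables above measure.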
      
      PiperOrigin-RevId: 495119444
      Change-Id: I67bcf3b0282b5e1c43122de2837a24c16b8aded7
      Ilya Tokar committed
  2. 12 Dec, 2022 4 commits
  3. 09 Dec, 2022 1 commit
  4. 08 Dec, 2022 5 commits
    • Fix some ClangTidy warnings in raw_hash_set code. · 522606b7
      PiperOrigin-RevId: 493993005
      Change-Id: I0705be8678022a9e08a1af9972687b7955593994
      Evan Brown committed
    • Fixing macro expansion changes in new logging macros. · ec583f2d
      This was an unintentional behavior change when we added a new layer of macros.  Not using function-like macro aliases would get around this, but unfortunately that would flood the macro namespace downstream with CHECK and LOG (and break existing code).
      
      Note, the old behavior only applied to CHECK and QCHECK.  Other CHECK macros already had multiple layers of function-like macros and were unaffected.
      
      PiperOrigin-RevId: 493984662
      Change-Id: I9a050dcaf01f2b6935f02cd42e23bc3a4d5fc62a
      Mike Kruskal committed
    • Eliminate AArch64-specific code paths from LowLevelHash · c353e259
      After internal investigation, it’s no longer clear that the alternative
      LowLevelHash mixer committed in a05366d8
      unequivocally improves performance on AArch64. It unnecessarily reduces
      performance on Apple Silicon and the AWS Graviton. It also lowers hash
      quality, which offsets much of the performance gain it provides on the
      Arm Neoverse N1 (see https://github.com/abseil/abseil-cpp/issues/1093).
      Switch back to the original mixer.
      
      Closes: https://github.com/abseil/abseil-cpp/issues/1093
      PiperOrigin-RevId: 493941913
      Change-Id: I84c789b2f88c91dec22f6f0f6e8c5129d2939a6f
      Benjamin Barenblat committed
    • Change CommonFields from a private base class of raw_hash_set to be the first… · 523b8699
      Change CommonFields from a private base class of raw_hash_set to be the first member of the settings_ CompressedTuple so that we can move growth_left into CommonFields.
      
      This allows for removing growth_left as a separate argument for a few functions.
      
      Also, move the infoz() accessor functions to be before the data members of CommonFields to comply with the style guide.
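
      As a rough sketch of why moving growth_left into CommonFields shortens function signatures, assuming a hypothetical trimmed-down struct and helper names (`DemoCommonFields`, `DemoResetGrowthLeft`) and an assumed 7/8 load-factor rule rather than the real raw_hash_set code:

      ```cpp
      #include <cassert>
      #include <cstddef>

      // Hypothetical, trimmed-down stand-in for the real struct.
      struct DemoCommonFields {
        std::size_t size = 0;
        std::size_t capacity = 0;
        std::size_t growth_left = 0;  // now stored next to size and capacity
      };

      // Assumed rule for this sketch: at most 7/8 of the slots may be used.
      inline std::size_t DemoCapacityToGrowth(std::size_t capacity) {
        return capacity - capacity / 8;
      }

      // Before: growth_left lived elsewhere and had to be passed separately.
      inline void DemoResetGrowthLeftOld(const DemoCommonFields& common,
                                         std::size_t& growth_left) {
        growth_left = DemoCapacityToGrowth(common.capacity) - common.size;
      }

      // After: one argument carries everything the helper needs.
      inline void DemoResetGrowthLeft(DemoCommonFields& common) {
        common.growth_left = DemoCapacityToGrowth(common.capacity) - common.size;
      }
      ```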
      
      PiperOrigin-RevId: 493918310
      Change-Id: I58474e37d3b16a1513d2931af6b153dea1d809c2
      Evan Brown committed
    • The abridged justification is as follows: · 2e177685
      -   The deadlock seems to occur if flag initialization happens while a sample is being created.
          -   Each sample has its own mutex that is locked when a new sample is registered, i.e. created for the first time.
          -   The flag implicitly creates a global sampler object which locks `graveyard_`'s mutex.
      -   Usually, in `PushDead`, `graveyard_` is locked before the sample, hence triggering deadlock detection.
      -   This lock order can never be recreated since this code is executed exactly once per sample object, and the sample object cannot be accessed until after the method returns.
      -   It should therefore be safe to ignore any locking order condition that may occur during sample creation.
      
      PiperOrigin-RevId: 493901903
      Change-Id: I094abca82c1a8a82ac392383c72469d68eef09c4
      Abseil Team committed
  5. 07 Dec, 2022 3 commits
  6. 06 Dec, 2022 4 commits
  7. 05 Dec, 2022 2 commits
  8. 02 Dec, 2022 3 commits
  9. 01 Dec, 2022 2 commits
  10. 30 Nov, 2022 3 commits
  11. 29 Nov, 2022 6 commits
  12. 28 Nov, 2022 5 commits
    • Write (more) directly into the structured buffer from StringifySink, including… · 13708db8
      Write (more) directly into the structured buffer from StringifySink, including for (size_t, char) overload.
      
      PiperOrigin-RevId: 491456410
      Change-Id: I76dec24b0bd02204fa38419af9247cee38b1cf50
      Andy Getzendanner committed
    • Avoid using the non-portable type __m128i_u. · 558a0e46
      According to https://stackoverflow.com/a/68939636 it is safe to use
      __m128i instead.
      
      https://learn.microsoft.com/en-us/cpp/intrinsics/x86-intrinsics-list?view=msvc-170 also uses this type.
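
      A minimal sketch of an unaligned load written with the portable `__m128i` type, assuming an x86-64 target with SSE2; the `DemoLoadLow64` name and the `memcpy` fallback for other targets are illustrative assumptions, not code from this change:

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <cstring>
      #if defined(__x86_64__) || defined(_M_X64)
      #include <emmintrin.h>  // SSE2 intrinsics
      #endif

      // Returns the low 64 bits of the 16 bytes at `p`, which may be unaligned.
      // Precondition: at least 16 readable bytes at `p`.
      inline std::uint64_t DemoLoadLow64(const void* p) {
      #if defined(__x86_64__) || defined(_M_X64)
        // Cast to `const __m128i*` and rely on the unaligned-load intrinsic.
        const __m128i v = _mm_loadu_si128(static_cast<const __m128i*>(p));
        return static_cast<std::uint64_t>(_mm_cvtsi128_si64(v));
      #else
        std::uint64_t v;
        std::memcpy(&v, p, sizeof(v));
        return v;
      #endif
      }
      ```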
      
      Fixes #1330
      
      PiperOrigin-RevId: 491427300
      Change-Id: I4a1d44ac4d5e7c1e1ee063ff397935df118254a1
      Derek Mauro committed
    • Reduce flat_hash_{set,map} generated code size. · e5a7979d
      This CL makes a bunch of changes (mostly to raw_hash_set which
      underlies flat_hash_set and flat_hash_map). Techniques used:
      
      * Extract code that does not depend on the specific hash table type
        into common (non-inlined) functions.
      * Place ABSL_ATTRIBUTE_NOINLINE directives judiciously.
      * Out-of-line some slow paths.
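
      The techniques above can be sketched on a toy container. Everything here is hypothetical (`DemoVector`, `DemoGrowBytes`, and `DEMO_NOINLINE`, which stands in for ABSL_ATTRIBUTE_NOINLINE so the sketch has no Abseil dependency); it only illustrates keeping the rare, type-independent work in one non-inlined function.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <cstring>
      #include <new>
      #include <type_traits>

      #if defined(__GNUC__) || defined(__clang__)
      #define DEMO_NOINLINE __attribute__((noinline))
      #elif defined(_MSC_VER)
      #define DEMO_NOINLINE __declspec(noinline)
      #else
      #define DEMO_NOINLINE
      #endif

      // Type-independent slow path: compiled once, shared by every DemoVector<T>.
      DEMO_NOINLINE void* DemoGrowBytes(void* old_data, std::size_t used_bytes,
                                        std::size_t new_bytes) {
        void* fresh = ::operator new(new_bytes);
        if (old_data != nullptr) {
          std::memcpy(fresh, old_data, used_bytes);
          ::operator delete(old_data);
        }
        return fresh;
      }

      template <typename T>
      class DemoVector {
        static_assert(std::is_trivially_copyable<T>::value,
                      "sketch only handles trivially copyable types");

       public:
        DemoVector() = default;
        DemoVector(const DemoVector&) = delete;
        DemoVector& operator=(const DemoVector&) = delete;
        ~DemoVector() { ::operator delete(data_); }

        void push_back(const T& value) {
          if (size_ == capacity_) Grow();  // rare: kept out of the hot path
          data_[size_++] = value;
        }
        std::size_t size() const { return size_; }
        const T& operator[](std::size_t i) const { return data_[i]; }

       private:
        // Thin typed wrapper; the real work happens in the shared helper.
        void Grow() {
          const std::size_t new_capacity = capacity_ == 0 ? 4 : capacity_ * 2;
          data_ = static_cast<T*>(
              DemoGrowBytes(data_, size_ * sizeof(T), new_capacity * sizeof(T)));
          capacity_ = new_capacity;
        }

        T* data_ = nullptr;
        std::size_t size_ = 0;
        std::size_t capacity_ = 0;
      };
      ```

      Each `DemoVector<T>` instantiation then contributes only the small `push_back` and `Grow` wrappers, while the copying logic exists once in the binary.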
      
      Reduces sizes of some large binaries by ~0.5%.
      
      Has no significant performance impact on a few performance critical
      binaries.
      
      ## Speed of fleetbench micro-benchmarks
      
      The following is a histogram of percentage changes in
      [fleetbench](https://github.com/google/fleetbench)
      hot_swissmap_benchmark results. Negative numbers indicate a speedup
      caused by this change. Statistically insignificant changes are mapped
      to zero.
      
      XXX Also run and merge in cold_swissmap_benchmark
      
      Across all 351 benchmarks, the average speedup is 0.38%.
      The best speedup was -25%; the worst slowdown was +6.81%.
      
      ```
      Count: 351  Average: -0.382764  StdDev: 3.77807
      Min: -25  Median: 0.435135  Max: 6.81
      ---------------------------------------------
      [ -25, -10)  16  4.558%   4.558% #
      [  -9,  -8)   2  0.570%   5.128%
      [  -8,  -7)   1  0.285%   5.413%
      [  -7,  -6)   1  0.285%   5.698%
      [  -6,  -5)   2  0.570%   6.268%
      [  -5,  -4)   5  1.425%   7.692%
      [  -4,  -3)  13  3.704%  11.396% #
      [  -3,  -2)  15  4.274%  15.670% #
      [  -2,  -1)  26  7.407%  23.077% ##
      [  -1,   0)  14  3.989%  27.066% #
      [   0,   1) 185 52.707%  79.772% ############
      [   1,   2)  14  3.989%  83.761% #
      [   2,   3)   8  2.279%  86.040% #
      [   3,   4)   7  1.994%  88.034%
      [   4,   5)  32  9.117%  97.151% ##
      [   5,   6)   6  1.709%  98.860%
      [   6,   7)   4  1.140% 100.000%
      ```
      
      We looked at the slowdowns and they do not seem worth worrying
      about. E.g., the worst one was:
      
      ```
      BM_FindHit_Hot<::absl::node_hash_set,64>/set_size:4096/density:0
        2.61ns ± 1%  2.79ns ± 1%   +6.81%  (p=0.008 n=5+5)
      ```
      
      ## Detailed changes
      
      * Out-of-line slow paths in hash table sampler methods.
      * Explicitly unregister from sampler instead of from destructor.
      * Introduced a non-templated CommonFields struct that holds some of
        the hash table fields (infoz, ctrl, slots, size, capacity). This
        struct can be passed to new non-templated helpers. The struct is
        a private base class of raw_hash_set.
      * Made non-inlined InitializeSlots<> that is only templated on
        allocator and size/alignment of the slot type so that we can share
        instantiations across types that have the same size/alignment.
      * Moved some infrequently called code paths into non-inlined type-erased
        functions. Pass a suite of type-specific function pointers to these
        routines for when they need to operate on slots.
      * Marked some methods as non-inlined.
      * Avoid unnecessary reinitialization in destructor.
      * Introduce UpdateSpine type-erased helper that is called from
        clear() and rehash().
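
      The "suite of type-specific function pointers" idea from the list above can be sketched like this. The names (`DemoPolicyFunctions`, `DemoTransferSlots`, `DemoGetPolicy`, `DemoRun`) are hypothetical and far simpler than the real helpers; the point is only that the loop is compiled once and reaches the slot type through pointers.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <new>
      #include <string>
      #include <utility>

      // Hypothetical suite of type-specific operations.
      struct DemoPolicyFunctions {
        std::size_t slot_size;
        // Move-constructs *dst from *src, then destroys *src.
        void (*transfer)(void* dst, void* src);
      };

      // Non-templated: a single copy serves every slot type.
      void DemoTransferSlots(const DemoPolicyFunctions& policy, void* dst,
                             void* src, std::size_t count) {
        char* d = static_cast<char*>(dst);
        char* s = static_cast<char*>(src);
        for (std::size_t i = 0; i < count; ++i) {
          policy.transfer(d + i * policy.slot_size, s + i * policy.slot_size);
        }
      }

      template <typename T>
      void DemoTransferOne(void* dst, void* src) {
        T* from = static_cast<T*>(src);
        ::new (dst) T(std::move(*from));
        from->~T();
      }

      // Only this small table is instantiated per slot type.
      template <typename T>
      const DemoPolicyFunctions& DemoGetPolicy() {
        static const DemoPolicyFunctions kPolicy = {sizeof(T), &DemoTransferOne<T>};
        return kPolicy;
      }

      // Usage: move three strings between raw buffers via the erased helper.
      std::string DemoRun() {
        using S = std::string;
        alignas(S) unsigned char src[3 * sizeof(S)];
        alignas(S) unsigned char dst[3 * sizeof(S)];
        const char* words[3] = {"ctrl", "slots", "size"};
        for (int i = 0; i < 3; ++i) ::new (src + i * sizeof(S)) S(words[i]);
        DemoTransferSlots(DemoGetPolicy<S>(), dst, src, 3);
        S joined;
        for (int i = 0; i < 3; ++i) {
          S* slot = reinterpret_cast<S*>(dst + i * sizeof(S));
          joined += *slot;
          slot->~S();  // the moved-to strings are owned by dst; clean them up
        }
        return joined;
      }
      ```

      The trade-off is an indirect call per slot in exchange for less duplicated code, which is why the commit applies it to infrequently called paths.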
      
      PiperOrigin-RevId: 491413386
      Change-Id: Ia5495c5a6ec73622a785a0d260e406ddb9085a7c
      Abseil Team committed
    • Use ABSL_HAVE_BUILTIN to fix -Wundef __has_builtin warning · e3158086
      Fixes #1329
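
      A sketch of the general pattern, using a hypothetical `DEMO_HAVE_BUILTIN` wrapper in place of ABSL_HAVE_BUILTIN so that it compiles without Abseil; the exact call site fixed by this commit is not shown in the message, so the popcount example is an assumption:

      ```cpp
      #include <cassert>

      // Using __has_builtin(...) directly in an #if is a problem on compilers that
      // do not define it (-Wundef flags the undefined identifier), so check that
      // the macro exists before using it.
      #ifdef __has_builtin
      #define DEMO_HAVE_BUILTIN(x) __has_builtin(x)
      #else
      #define DEMO_HAVE_BUILTIN(x) 0
      #endif

      inline int DemoPopcount(unsigned int v) {
      #if DEMO_HAVE_BUILTIN(__builtin_popcount)
        return __builtin_popcount(v);
      #else
        int count = 0;
        for (; v != 0; v &= v - 1) ++count;  // clears the lowest set bit each round
        return count;
      #endif
      }
      ```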
      
      PiperOrigin-RevId: 491372279
      Change-Id: I93c094b06ece9cb9bdb39fd4541353e0344a1a57
      Derek Mauro committed
    • Add a TODO for the deprecation of absl::aligned_storage_t · 04596b25
      PiperOrigin-RevId: 491367420
      Change-Id: I6a0ab74bb0675fd910ed9fc95ee20c5023eb0cb6
      Derek Mauro committed