- 15 Sep, 2023 2 commits
-
-
1. Remove special handling of Condition::kTrue. Condition::kTrue is used very rarely (frequently its uses even indicate confusion and bugs). But we pay few additional branches for kTrue on all Condition operations. Remove that special handling and simplify logic. 2. And remove known_false condition in Mutex code. Checking known_false condition only causes slow down because: 1. We already built skip list with equivalent conditions (and keep improving it on every Skip call). And when we built the skip list, we used more capable GuaranteedEqual function (it does not just check for equality of pointers, but for also for equality of function/arg). 2. Condition pointer are rarely equal even for equivalent conditions becuase temp Condition objects are usually created on the stack. We could call GuaranteedEqual(cond, known_false) instead of cond == known_false, but that slows down things even more (see point 1). So remove the known_false optimization. Benchmark results for this and the previous change: name old cpu/op new cpu/op delta BM_ConditionWaiters/0/1 36.0ns ± 0% 34.9ns ± 0% -3.02% (p=0.008 n=5+5) BM_ConditionWaiters/1/1 36.0ns ± 0% 34.9ns ± 0% -2.98% (p=0.008 n=5+5) BM_ConditionWaiters/2/1 35.9ns ± 0% 34.9ns ± 0% -3.03% (p=0.016 n=5+4) BM_ConditionWaiters/0/8 55.5ns ± 5% 49.8ns ± 3% -10.33% (p=0.008 n=5+5) BM_ConditionWaiters/1/8 36.2ns ± 0% 35.2ns ± 0% -2.90% (p=0.016 n=5+4) BM_ConditionWaiters/2/8 53.2ns ± 7% 48.3ns ± 7% ~ (p=0.056 n=5+5) BM_ConditionWaiters/0/64 295ns ± 1% 254ns ± 2% -13.73% (p=0.008 n=5+5) BM_ConditionWaiters/1/64 36.2ns ± 0% 35.2ns ± 0% -2.85% (p=0.008 n=5+5) BM_ConditionWaiters/2/64 290ns ± 6% 250ns ± 4% -13.68% (p=0.008 n=5+5) BM_ConditionWaiters/0/512 5.50µs ±12% 4.99µs ± 8% ~ (p=0.056 n=5+5) BM_ConditionWaiters/1/512 36.7ns ± 3% 35.2ns ± 0% -4.10% (p=0.008 n=5+5) BM_ConditionWaiters/2/512 4.44µs ±13% 4.01µs ± 3% -9.74% (p=0.008 n=5+5) BM_ConditionWaiters/0/4096 104µs ± 6% 101µs ± 3% ~ (p=0.548 n=5+5) BM_ConditionWaiters/1/4096 36.2ns ± 0% 35.1ns ± 0% -3.03% (p=0.008 n=5+5) BM_ConditionWaiters/2/4096 90.4µs ± 5% 85.3µs ± 7% ~ (p=0.222 n=5+5) BM_ConditionWaiters/0/8192 384µs ± 5% 367µs ± 7% ~ (p=0.222 n=5+5) BM_ConditionWaiters/1/8192 36.2ns ± 0% 35.2ns ± 0% -2.84% (p=0.008 n=5+5) BM_ConditionWaiters/2/8192 363µs ± 3% 316µs ± 7% -12.84% (p=0.008 n=5+5) PiperOrigin-RevId: 565669535 Change-Id: I5180c4a787933d2ce477b004a111853753304684
Dmitry Vyukov committed -
PiperOrigin-RevId: 565662176 Change-Id: I18d5d9eb444b0090e3f4ab8f66ad214a67344268
Abseil Team committed
-
- 14 Sep, 2023 1 commit
-
-
PiperOrigin-RevId: 565330231 Change-Id: I84f0e9065986bb592b5bfb196b3fc221feb14bc4
Abseil Team committed
-
- 13 Sep, 2023 2 commits
-
-
PiperOrigin-RevId: 565050503 Change-Id: I8f4c463be4ef513a2788745d1b454a7ede489152
Abseil Team committed -
PiperOrigin-RevId: 565040001 Change-Id: I1c2e715c97375754c8d863132be2c388265ca4ad
Abseil Team committed
-
- 12 Sep, 2023 3 commits
-
-
PiperOrigin-RevId: 564779671 Change-Id: I8cae825a533a00ff1983b48782486d5d00dae69a
Abseil Team committed -
This should enable binary size savings for now and more efficiency improvements with small buffer optimization. PiperOrigin-RevId: 564741270 Change-Id: Icf204d88256243eb60464439a52dd589d7a559cb
Evan Brown committed -
Add a flat_hash_set_test that we use value_type member functions to read/write from value_types when we aren't allowed to memcpy them. The motivation is to prevent a bug in small buffer optimization for swisstables. PiperOrigin-RevId: 564726325 Change-Id: Id0df5d28d65c7586428001fcb266886988cd481e
Evan Brown committed
-
- 11 Sep, 2023 2 commits
-
-
65d7b6d4 changed StrCat() to not use an intermediate buffer when the result fits in the SSO buffer, but only libc++ has an SSO buffer large enough for this optimization to work. PiperOrigin-RevId: 564447163 Change-Id: I0c7fa4afed3369b36e13e7d1691eb7f933ea0091
Derek Mauro committed -
PiperOrigin-RevId: 564296635 Change-Id: I13ca663cdb676948a7041c5671b82a97a4388ff1
Abseil Team committed
-
- 08 Sep, 2023 7 commits
-
-
We have no intention to use it instead of the CordRepBtree implementation, so cleanup up and remove all code and references. PiperOrigin-RevId: 563803813 Change-Id: I95a67318d0f722f3eb7ecdcc7b6c87e28f2e26dd
Martijn Vels committed -
It sorts NaNs and the test became flaky. Flakiness arises from the fact that sorting checks randomize and check for 100 elements but we sort here around a thousand PiperOrigin-RevId: 563783036 Change-Id: Id25bcb47483acf9c40be3fd1747c37d046197330
Abseil Team committed -
absl: remove special handling of Condition::kTrue absl: remove known_false condition in Mutex code There are some test breakages. PiperOrigin-RevId: 563751370 Change-Id: Ie14dc799e0a0d286a7e1b47f0a9bbe59dfb23f70
Abseil Team committed -
When CondVar accepted generic non-Mutex mutexes, Mutex pointer could be nullptr. Now that support is removed, but we still have some lingering checks for Mutex* == nullptr. Remove them. PiperOrigin-RevId: 563740239 Change-Id: Ib744e0b991f411dd8dba1b0da6477c13832e0f65
Abseil Team committed -
Mutex::Await/LockWhen/CondVar::Wait duplicate code, and cause additional calls at runtime and code bloat. Inline thin wrappers that just convert argument types and add a single de-duped implementation for these methods. This reduces code size, shaves off 55K from the mutex_test in release build, and should make things marginally faster. $ nm -nS mutex_test | egrep "(_ZN4absl5Mutex.*(Await|LockWhen))|(_ZN4absl7CondVar.*Wait)" before: 00000000000912c0 00000000000001a8 T _ZN4absl7CondVar4WaitEPNS_5MutexE 00000000000988c0 0000000000000c36 T _ZN4absl7CondVar16WaitWithDeadlineEPNS_5MutexENS_4TimeE 000000000009a6e0 0000000000000041 T _ZN4absl5Mutex19LockWhenWithTimeoutERKNS_9ConditionENS_8DurationE 00000000000a28c0 0000000000000779 T _ZN4absl5Mutex17AwaitWithDeadlineERKNS_9ConditionENS_4TimeE 00000000000cf4e0 0000000000000011 T _ZN4absl5Mutex8LockWhenERKNS_9ConditionE 00000000000cf500 0000000000000041 T _ZN4absl5Mutex20LockWhenWithDeadlineERKNS_9ConditionENS_4TimeE 00000000000cf560 0000000000000011 T _ZN4absl5Mutex14ReaderLockWhenERKNS_9ConditionE 00000000000cf580 0000000000000041 T _ZN4absl5Mutex26ReaderLockWhenWithDeadlineERKNS_9ConditionENS_4TimeE 00000000000cf5e0 0000000000000766 T _ZN4absl5Mutex5AwaitERKNS_9ConditionE 00000000000cfd60 00000000000007b5 T _ZN4absl5Mutex16AwaitWithTimeoutERKNS_9ConditionENS_8DurationE 00000000000d0700 00000000000003cf T _ZN4absl7CondVar15WaitWithTimeoutEPNS_5MutexENS_8DurationE 000000000011c280 0000000000000041 T _ZN4absl5Mutex25ReaderLockWhenWithTimeoutERKNS_9ConditionENS_8DurationE after: 000000000009c300 00000000000007ed T _ZN4absl7CondVar10WaitCommonEPNS_5MutexENS_24synchronization_internal13KernelTimeoutE 00000000000a03c0 00000000000006fe T _ZN4absl5Mutex11AwaitCommonERKNS_9ConditionENS_24synchronization_internal13KernelTimeoutE 000000000011ae00 0000000000000025 T _ZN4absl5Mutex14LockWhenCommonERKNS_9ConditionENS_24synchronization_internal13KernelTimeoutEb PiperOrigin-RevId: 563729364 Change-Id: Ic6b43761f76719c01e03d43cc0e0c419e41a85c1
Abseil Team committed -
Checking known_false condition only causes slow down because: 1. We already built skip list with equivalent conditions (and keep improving it on every Skip call). And when we built the skip list, we used more capable GuaranteedEqual function (it does not just check for equality of pointers, but for also for equality of function/arg). 2. Condition pointer are rarely equal even for equivalent conditions becuase temp Condition objects are usually created on the stack. We could call GuaranteedEqual(cond, known_false) instead of cond == known_false, but that slows down things even more (see point 1). So remove the known_false optimization. Benchmark results for this and the previous change: name old cpu/op new cpu/op delta BM_ConditionWaiters/0/1 36.0ns ± 0% 34.9ns ± 0% -3.02% (p=0.008 n=5+5) BM_ConditionWaiters/1/1 36.0ns ± 0% 34.9ns ± 0% -2.98% (p=0.008 n=5+5) BM_ConditionWaiters/2/1 35.9ns ± 0% 34.9ns ± 0% -3.03% (p=0.016 n=5+4) BM_ConditionWaiters/0/8 55.5ns ± 5% 49.8ns ± 3% -10.33% (p=0.008 n=5+5) BM_ConditionWaiters/1/8 36.2ns ± 0% 35.2ns ± 0% -2.90% (p=0.016 n=5+4) BM_ConditionWaiters/2/8 53.2ns ± 7% 48.3ns ± 7% ~ (p=0.056 n=5+5) BM_ConditionWaiters/0/64 295ns ± 1% 254ns ± 2% -13.73% (p=0.008 n=5+5) BM_ConditionWaiters/1/64 36.2ns ± 0% 35.2ns ± 0% -2.85% (p=0.008 n=5+5) BM_ConditionWaiters/2/64 290ns ± 6% 250ns ± 4% -13.68% (p=0.008 n=5+5) BM_ConditionWaiters/0/512 5.50µs ±12% 4.99µs ± 8% ~ (p=0.056 n=5+5) BM_ConditionWaiters/1/512 36.7ns ± 3% 35.2ns ± 0% -4.10% (p=0.008 n=5+5) BM_ConditionWaiters/2/512 4.44µs ±13% 4.01µs ± 3% -9.74% (p=0.008 n=5+5) BM_ConditionWaiters/0/4096 104µs ± 6% 101µs ± 3% ~ (p=0.548 n=5+5) BM_ConditionWaiters/1/4096 36.2ns ± 0% 35.1ns ± 0% -3.03% (p=0.008 n=5+5) BM_ConditionWaiters/2/4096 90.4µs ± 5% 85.3µs ± 7% ~ (p=0.222 n=5+5) BM_ConditionWaiters/0/8192 384µs ± 5% 367µs ± 7% ~ (p=0.222 n=5+5) BM_ConditionWaiters/1/8192 36.2ns ± 0% 35.2ns ± 0% -2.84% (p=0.008 n=5+5) BM_ConditionWaiters/2/8192 363µs ± 3% 316µs ± 7% -12.84% (p=0.008 n=5+5) PiperOrigin-RevId: 563717887 Change-Id: I9a62670628510d764a4f2f88a047abb8f85009e2
Abseil Team committed -
Condition::kTrue is used very rarely (frequently its uses even indicate confusion and bugs). But we pay few additional branches for kTrue on all Condition operations. Remove that special handling and simplify logic. PiperOrigin-RevId: 563691160 Change-Id: I76125adde4872489da069dd9c894ed73a65d1d83
Abseil Team committed
-
- 07 Sep, 2023 4 commits
-
-
PiperOrigin-RevId: 563500989 Change-Id: Ie86c5913d498aa2eea9ca6a8de045eaf3bce66f5
Abseil Team committed -
https://github.com/abseil/abseil-cpp/issues/1518#issuecomment-1709098904 pointed out that the previous untested fix doesn't work because pthread_getthreadid_np() has a different signature on Darwin. Follow up to https://github.com/abseil/abseil-cpp/commit/b9707b7d7845f9710ae6d5906827b833fdcc2754 Fixes #1518 PiperOrigin-RevId: 563432451 Change-Id: Id0a9212e9c4413fa520a42934efaed2a06ca5dbc
Derek Mauro committed -
This is a rename only with no other changes. PiperOrigin-RevId: 563428969 Change-Id: Iefc184bf9a233cb72649bc20b8555f6b662cac6d
Abseil Team committed -
This CL rolls forward a previous change which we rolled back temporarily due to compilation errors on x86 when PCLMUL intrinsics were unavailable. *** Original change description *** This change replaces inline x86 intrinsics with generic versions that compile for both x86 and ARM depending on the target arch. This change does not enable the accelerated crc memcpy engine on ARM. That will be done in a subsequent change after the optimal number of vector and integer regions for different CPUs is determined. *** PiperOrigin-RevId: 563416413 Change-Id: Iee630a15ed83c26659adb0e8a03d3f3d3a46d688
Abseil Team committed
-
- 06 Sep, 2023 3 commits
-
-
PiperOrigin-RevId: 563201556 Change-Id: I2e1abde3944f3f3db532ce15db12adaf89207515
Abseil Team committed -
FreeBSD, NetBSD, and OpenBSD https://man.freebsd.org/cgi/man.cgi?query=pthread_getthreadid_np https://man.netbsd.org/_lwp_self.2 https://man.openbsd.org/getthrid.2 This fixes a build break caused by https://github.com/abseil/abseil-cpp/commit/88cc63ef739d83277b492e881be72e9069fcb1fe Fixes #1518 PiperOrigin-RevId: 563200172 Change-Id: Ifd1b65c84e3631075248bc2e01b8f047dc72d201
Derek Mauro committed -
This change makes b-tree sets conform to the std::set API of having const access through `iterator`s as well as `const_iterator`s. This change can cause breakages for user code that depends on having mutable access to keys. If your code breaks, then there a couple of options to fix the issue: - If you are mutating a part of the key that can impact the sorted order, then this is a bug and you need to extract the key, mutate it, and then re-insert the mutated key. - If you are mutating a part of the key that can't impact the sorted order, then you can potentially (a) refactor to use btree_map instead of btree_set and make the part of the key that doesn't impact that ordering be the mapped_type, (b) change the part of the key that doesn't impact the ordering to be a `mutable` member of the key_type, (c) refactor from btree_set<K> to btree_set<K*> (but this also permits mutable access to the part of the key that determines the ordering). PiperOrigin-RevId: 563118156 Change-Id: I243558e74c43aa6655290099494b411d15298f4c
Evan Brown committed
-
- 05 Sep, 2023 4 commits
-
-
PiperOrigin-RevId: 562832827 Change-Id: If37f83e67b3b2ea350f74dd6bffae51ea5508f12
Evan Brown committed -
This change makes RepToPointer/PointerToRep have 0 instructions. This makes IsMovedFrom simpler (although this could always have left out the IsInlined check since that bit can never be set on the aligned pointer) In exchange, it makes CodeToInlinedRep slower, but does not inhibit replacing it with a constant. InlinedRepToCode is unaffected. PiperOrigin-RevId: 562826801 Change-Id: I2732f04ab293b773edc2efdec546b3a287b980c2
Abseil Team committed -
In some configurations this change causes compilation errors. We will roll this forward again after those issue are addressed. PiperOrigin-RevId: 562810916 Change-Id: I45b2a8d456273e9eff188f36da8f11323c4dfe66
Abseil Team committed -
This change replaces inline x86 intrinsics with generic versions that compile for both x86 and ARM depending on the target arch. This change does not enable the accelerated crc memcpy engine on ARM. That will be done in a subsequent change after the optimal number of vector and integer regions for different CPUs is determined. PiperOrigin-RevId: 562785420 Change-Id: I8ba4aa8de17587cedd92532f03767059a481f159
Abseil Team committed
-
- 01 Sep, 2023 1 commit
-
-
PiperOrigin-RevId: 561954737 Change-Id: I0e29a4f4523e4b3eafbd26d1d96f6e1c8deed957
Evan Brown committed
-
- 31 Aug, 2023 3 commits
-
-
Revise a comment regarding casting `memchr()`'s return type. The original comment seemed to suggest that the return type varies depending on the compiler and hence the `static_cast` is needed. PiperOrigin-RevId: 561706189 Change-Id: Ie244d235d082536edfca263d4165e16171cbc6c7
Abseil Team committed -
This bug does not affect any users currently since AcceleratedCrcMemcpyEngine is never configured with a single region currently. Before this CL, if the number of regions for the AcceleratedCrcMemcpyEngine was set to one, the CRC for the sole region would be incorrectly concatenated onto itself and corrupted. PiperOrigin-RevId: 561663848 Change-Id: Ibfc596306ab07db906d2e3ecf6eea3f6cb9f1b2b
Abseil Team committed -
Public functions are now available in //absl/base/prefetch.h PiperOrigin-RevId: 561649346 Change-Id: Ic377ad5f21c9a44ea1f213dec17f5cf97795ebde
Derek Mauro committed
-
- 30 Aug, 2023 5 commits
-
-
PiperOrigin-RevId: 561444343 Change-Id: I26c648b28b626e11caa32b0a34aef92932d5ddb9
Tomas Dzetkulic committed -
PiperOrigin-RevId: 561444259 Change-Id: I205ba9f11f4d41163ce74ae9cfa417fe500ccab3
Abseil Team committed -
There is a few cycles of overhead when transfering between GPR and Neon registers. We pay this cost for GroupAarch64Impl, largely because the speedup we get in Match() makes it profitable. After a Match call, if we do subsequent Group operations, we don't have to pay the full GPR <-> Neon cost, so it makes sense to do them with Neon instructions as well. However, in iteration and find_first_non_full(), we do not do a prior Match(), so the Mask/Count EmptyOrDeleted calls pay the full GPR <-> Neon cost. We can avoid this by using the GPR versions of the functions in the portable implementation of Group instead. We slightly change the order of operations in these functions (should be functionally a nop) in order to take advantage of Arm's free flexible second operand shifts with Logical operations. Iteration and Resize are roughly 8% and 12.6% faster respectively. This is not profitable on x86 because there is much lower GPR <-> xmm register latency and we use a 16-bit wide Group size. PiperOrigin-RevId: 561415183 Change-Id: I660b5bb84afedb05a12dcdf04d5b2e1514902760
Connal de Souza committed -
options.h was already included indirectly from config.h. This CL is just to include what you use. PiperOrigin-RevId: 561376910 Change-Id: I5b96b2aedc1e02eddc049f5bf0e6faa91799930d
Abseil Team committed -
Enqueue updates priority of the queued thread. It was assumed that the queued thread is the current thread. But it's not the case in CondVar wait morhping, where we requeue an existing CondVar waiter on the Mutex. As the result one thread can falsely get priority of another thread. Fix this by not updating priority in this case. And make the assumption explicit and checked. PiperOrigin-RevId: 561249402 Change-Id: I9476c047757090b893a88a2839b795b85fe220ad
Abseil Team committed
-
- 29 Aug, 2023 3 commits
-
-
PiperOrigin-RevId: 561119886 Change-Id: Ia1483fdb237f4b211068c7ad1f780ab3e6b81eca
Abseil Team committed -
PiperOrigin-RevId: 561108037 Change-Id: Idff65e288384cb55ce69f789db2d9374ae781d3d
Abseil Team committed -
Using the non-temporal AVX engine for unknown CPU types looks like a mistake to me, and the default built into the switch case is to use the fallback engine. I don't think this is causing issues now, but it might once we add ARM support. PiperOrigin-RevId: 561097994 Change-Id: I7f0edd447017c09acd49e4ea11476e32740d630a
Abseil Team committed
-